Linguistics students investigating what fictional languages in pop culture can teach us about human communication

Monday, March 11, 2024

by WALKER SMART


UNT linguistics students Carter Smith (left) and Analiese Beeler. (Photo: Ahna Hubnik/UNT)

Dothraki, Klingon and Quenya may not be languages used in everyday life, but it turns out there is much to learn from these languages that have been designed to make fictional worlds come alive in popular books, films and television shows.

Analiese Beeler and Carter Smith, UNT undergraduate linguistics majors, are using computational methods and statistical modeling to compare the constructed languages — including Dothraki in Game of Thrones, Klingon in Star Trek and Quenya in The Lord of the Rings — with natural languages such as Dutch, English and German that evolved over millennia.

To compare these languages, they are using information theory, a mathematical framework for measuring how much information a given message carries. Specifically, they’re studying segment surprisal, a way to quantify how surprising it is to see a particular sound in a word given its position relative to the other sounds. Many people who play Wordle or do crosswords use this knowledge instinctively while working to solve the puzzle.
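For readers curious what a surprisal calculation can look like, here is a minimal sketch. It is not the students' actual method, which the article does not describe. It assumes a tiny made-up word list, treats letters as stand-ins for sounds, and conditions each segment only on the one before it (a bigram model with add-one smoothing). Surprisal is computed as the negative base-2 logarithm of that conditional probability, so rarer continuations cost more bits. The function names and the toy lexicon are hypothetical.

```python
import math
from collections import Counter

def train_bigram_counts(words):
    """Count how often each segment follows each preceding segment.
    '#' marks the start of a word."""
    context_counts = Counter()
    pair_counts = Counter()
    for word in words:
        segments = ["#"] + list(word)
        for prev, cur in zip(segments, segments[1:]):
            context_counts[prev] += 1
            pair_counts[(prev, cur)] += 1
    return context_counts, pair_counts

def segment_surprisal(word, context_counts, pair_counts, alphabet_size):
    """Return (segment, surprisal in bits) for each segment of the word,
    using add-one smoothing so unseen pairs get a small nonzero probability."""
    segments = ["#"] + list(word)
    result = []
    for prev, cur in zip(segments, segments[1:]):
        prob = (pair_counts[(prev, cur)] + 1) / (context_counts[prev] + alphabet_size)
        result.append((cur, -math.log2(prob)))
    return result

# Assumption: a toy lexicon invented for illustration, not real research data.
toy_lexicon = ["queen", "quest", "quiz", "cat", "cot"]
alphabet = {ch for word in toy_lexicon for ch in word}
ctx, pairs = train_bigram_counts(toy_lexicon)

surprisals = segment_surprisal("quit", ctx, pairs, len(alphabet))
for segment, bits in surprisals:
    print(f"{segment}: {bits:.2f} bits")
```

In this toy lexicon, "u" after "q" comes out less surprising than "t" after "i," which mirrors the intuition a Wordle player relies on when guessing the next letter.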

“We are applying Bayesian inference techniques to compare the patterns of information between constructed languages and natural languages in hopes of learning more about the nature of constructed languages,” Beeler says.

Their project is a way to explore humans’ innate understanding of language and how it’s unconsciously applied.

“We want to expand understanding within the field of linguistics,” Smith says. “Beyond that, we also hope to draw more interest into computational linguistics and linguistic research in general by connecting with a topic that has a larger presence in popular culture.”

They found inspiration through their collaborator and mentor, Frederik Hartmann, assistant professor of linguistics at UNT. Hartmann studies the Voynich manuscript, a 15th-century European book written in a mysterious script that researchers are still working to understand. While discussing whether the manuscript’s “language” could be computationally distinguished from natural language, Beeler and Smith were driven to investigate whether the same could be said for constructed languages.

“Dr. Hartmann is keen on supporting students in research endeavors, and he has made major contributions to this project,” Beeler says. “It has been amazing to be able to learn from him, both about this field and research in general, as we work alongside him on this project.”

While there is other existing research on constructed languages, it is a relatively new field and no one has looked at this exact topic as far as they know. That means they’ve had to pave their own way in terms of how they conduct the research.

UNT student researchers Analiese Beeler and Carter Smith. (Photo: Ahna Hubnik/UNT)

One of the challenges they are working through involves the large variance in the size of lexicon data between natural languages and constructed languages. Natural languages include hundreds of thousands of words, while constructed languages usually have a much smaller lexicon.

“We want to mitigate any effect smaller lexicon sizes in constructed languages could have on our comparisons, and we are exploring a few possible solutions to this problem,” Smith says.

While the project is in its early stages, their metrics have shown some variation between languages. Some closely related languages, such as German, Dutch and English, exhibit very similar patterns that are distinct from what they have observed in constructed languages.

“We’ve found that computer models can effectively estimate the effect of different variables. This helps us see how various measurements relate to each other and explore potential distinctions between human-made languages and natural ones,” Hartmann says.

However, as they look forward, the amount of data they would need to analyze in a final model may exceed what their computational resources can handle. One of the next steps is eliminating confounding factors that might obscure base-level differences between natural languages and constructed languages.

Recently, Beeler and Smith presented their research at the DFW Metroplex Linguistics Conference, where they had the opportunity to discuss their work with professionals in the field.

“It was a valuable experience,” Beeler says. “They offered some great suggestions on how we could further this research and how we could clarify our work, such as collecting more natural language data for comparison.”

Smith, a junior, plans to pursue graduate studies in computational linguistics and stay involved in research.

“Working with and learning from faculty members like Dr. Hartmann and Dr. Schuelke on different projects has helped me to find a love for linguistics and a field within it that appeals to me,” Smith says. “I am very grateful for the way that the linguistics faculty has always been ready to support and encourage me to challenge myself and find a path in the world of academia.”

Beeler is graduating this spring and plans to work as a proofreader and editor while she considers which way her passions will lead her.

“UNT faculty have been enormously helpful in supporting me in my plans for the future,” she says. “I’m taking a gap year, but assuming I do return, it will be because of the generous support of UNT staff. Particular thanks go to Dr. Hartmann and Dr. Cukor-Avila for all their mentorship and research guidance.”