Processing Language: Kenji Sagae

By Alan Wong

For decades, linguists, philosophers, psychologists, anthropologists and others have explored how language "works." Today, computational linguists like Kenji Sagae are using cutting-edge techniques to ask fundamental questions about the nature of language.

Like all linguists, computational linguists are interested in human language. But the methods they use to study natural language set them apart. They study language in the context of computation, which often involves programming machines to do some sort of Natural Language Processing (NLP).

Kenji Sagae joined UC Davis in 2016 as a computational linguist. His research asks fundamental questions. For instance: how do human cognitive capacities shape the form of language? What features are shared among all human languages? Hypotheses aimed at answering these sorts of questions, which are shared by many language researchers, can be framed in formal ways, and implemented and tested using computational methods.

Many issues that fascinate computational linguists may not interest their more "traditional" counterparts. (A typical sociolinguist, for example, is unlikely to care whether online processing of a sentence can be accomplished in polynomial time under a particular model of syntax.) But the results derived from computational methods can be fascinating to all.

Interdisciplinary beginnings

From the beginning of his career, Sagae has collaborated with researchers from many fields, including linguistics, computer science, and informatics. This is not unusual, Sagae says, since computational linguistics exists at a point of intersection between various topics which—for various historical reasons—happen to be divided among different academic departments.

In 2006, Sagae joined the Tsuji Laboratory at the University of Tokyo, combining linguistic theory and NLP with practical applications in bioinformatics and biomedical research. From 2008 to 2015, he worked at the University of Southern California as a research assistant professor in the Department of Computer Science, and as a research scientist and project leader at the Institute for Creative Technologies. At USC, his research ranged from extracting semantics from natural language data to analyzing user interfaces for healthcare information and support.

In 2015, Sagae co-founded KITT.AI, a startup that produced a visual toolkit for building natural language systems. In this project, he was able to directly apply his knowledge of linguistics, computation, and psychology.

Diverse approaches

One of the major challenges of NLP is representing human language in a way that can be processed by computers. Computational approaches to studying language require researchers to be very explicit about the language in their models. However, there is currently no widespread agreement about the fundamental organization and structure of language. When it comes to the theoretical adequacy of their models, then, computational linguists must often settle for compromise.

A given model of language may have strengths in one area, such as expressive power, while being weak in another, such as simplicity or intuitive appeal. By taking a more pragmatic, flexible approach to the study of language, computational linguists are able to conduct rigorous evaluations. Comparing competing models in this way has allowed computational linguists to make particularly rapid progress in recent years.

Sagae has worked with many theories of grammar, and sees the diversity of approaches possible in computational linguistics as a strength of his sub-field. Without becoming particularly attached to any one theory of language, computational linguists are able to objectively evaluate which theories do a better job of meeting certain objectives. These objectives may be linguistic (correctly attaching adjuncts, for example, or resolving long-distance dependencies) or computational (creating a scalable, generalizable solution).

This pragmatic approach follows the old aphorism that all models are wrong, but some are more useful than others. Sagae's multi-pronged approach to studying natural language reflects an appreciation for using a wide array of strategies to answer big research questions.

Leveraging new technologies

Computational linguists are relatively scarce. This is likely due to the vast body of working knowledge necessary not only to understand problems, but also to engineer workable, testable models that stand a chance of achieving progress. It is not uncommon for a computational linguist to be all at once a computer scientist, linguist, and engineer. In addition to earning all three labels through his NLP work, Sagae has conducted research on child language acquisition and on the use of NLP technologies in biomedical research and commercial applications.

It is no surprise, then, that skilled computational linguists are in great demand, both in academia and private industry. Sagae notes that many of his former colleagues and collaborators have since left academia for the corporate world, where more data and computing power are available.

Nevertheless, an academic institution such as UC Davis offers many resources not easily accessible to private-sector researchers—proximity to other researchers from many fields, for example. Universities also offer unique opportunities for studying topics—child language acquisition, say—not necessarily of great interest to companies like Facebook and Google.

Perhaps more so than researchers in any other sub-field of linguistics, computational linguists are able to leverage new technologies and data in both theoretically motivated and practically applicable ways. Sagae's research has already shown how theoretical work and practical applications may mutually inform one another to help us answer old questions, and find new problems to solve.

