The UniMorph Project
Resources for Processing Complex Morphology in Many Languages with Human Language Technology (NLP, MT, IE)
As a postdoctoral fellow at the Center for Language and Speech Processing at Johns Hopkins, I work in computational linguistics. My research focuses on improving how human language technology (including machine translation and natural language processing) handles languages with rich surface morphology, such as Russian, Arabic, and Turkish, among many others. I have collaborated on developing the UniMorph Schema, a language-independent, universal framework for annotating inflectional morphology. It has been used to richly annotate data from the 350 languages in the English edition of Wiktionary. For more details, see the UniMorph project page. Recently, I collaborated on the 2016 ACL SIGMORPHON Shared Task on morphological reinflection, which led to the development of better technology for automatically generating inflected word forms in 10 typologically diverse languages.
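To make "annotating inflectional morphology" concrete, here is a minimal sketch. It assumes the plain-text layout commonly used for UniMorph-style data: one inflected form per line, given as lemma, form, and a semicolon-separated bundle of schema tags, with tabs between the three fields. The Spanish rows, tag order, and the helper names `parse_unimorph` and `inflect` are illustrative assumptions, not taken from an official release or tool. The lookup only shows the shape of the reinflection problem: a system receives a lemma and a feature bundle and must produce the form. Real shared-task systems learn this mapping rather than memorizing it.

```python
from collections import defaultdict

# Illustrative rows in an assumed UniMorph-style layout:
# lemma <TAB> inflected form <TAB> semicolon-separated feature tags
SAMPLE = """\
hablar\thablo\tV;IND;PRS;1;SG
hablar\thablas\tV;IND;PRS;2;SG
hablar\thabla\tV;IND;PRS;3;SG
"""


def parse_unimorph(text):
    """Group forms into paradigms: lemma -> {frozenset of tags -> form}."""
    paradigms = defaultdict(dict)
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        lemma, form, features = line.split("\t")
        # A feature bundle is treated as an unordered set of tags.
        paradigms[lemma][frozenset(features.split(";"))] = form
    return paradigms


def inflect(paradigms, lemma, features):
    """Return the stored form for (lemma, feature bundle), or None if unseen.

    This is a dictionary lookup, not a learned reinflection model.
    """
    return paradigms.get(lemma, {}).get(frozenset(features.split(";")))


paradigms = parse_unimorph(SAMPLE)
print(inflect(paradigms, "hablar", "V;IND;PRS;2;SG"))  # hablas
```

Storing each bundle as a `frozenset` makes the lookup independent of tag order, so `SG;2;PRS;IND;V` finds the same form.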
Apart from my work in computational morphology, I also do research in theoretical and descriptive linguistics, namely phonetics, phonology, and morphology. My dissertation developed a novel framework for deriving (rather than representationally specifying) natural classes based on phonetic connections between phonemes. The framework effectively derives natural classes composed of post-velar consonants (i.e., uvular, pharyngeal, and glottal consonants, which occur in Semitic languages such as Arabic, in North American languages, and in Caucasian languages, among others). Previous theoretical frameworks could not adequately derive these natural classes, particularly across the full range of languages in which they occur. The dissertation also contributed a novel survey of 291 languages.
14+ Million Inflected Forms from ~1 Million Lexemes/Lemmas from 350 Languages
Language-independent, Typologically-informed Annotation Scheme for Inflectional Morphology
The SIGMORPHON 2016 Shared Task on Morphological Reinflection
Very-large Scale Parsing and Normalization of Wiktionary Morphological Paradigms
Contrastive Morphological Typology and Logical Hierarchies
I'd love to hear from you!