Daya, Ezra; Roth, Dan; Wintner, Shuly

doi:10.1162/coli.2008.07-002-R1-06-30

Title: Identifying Semitic Roots: Machine Learning with Linguistic Constraints.
Authors: Daya, Ezra; Roth, Dan; Wintner, Shuly
Abstract: Words in Semitic languages are formed by combining two morphemes: a root and a pattern. The root consists of consonants only, by default three, and the pattern is a combination of vowels and consonants, with non-consecutive "slots" into which the root consonants are inserted. Identifying the root of a given word is an important task, considered to be an essential part of the morphological analysis of Semitic languages, and information on roots is important for linguistics research as well as for practical applications. We present a machine learning approach, augmented by limited linguistic knowledge, to the problem of identifying the roots of Semitic words. Although programs exist which can extract the root of words in Arabic and Hebrew, they are all dependent on labor-intensive construction of large-scale lexicons which are components of full-scale morphological analyzers. The advantage of our method is an automation of this process, avoiding the bottleneck of having to laboriously list the root and pattern of each lexeme in the language. To the best of our knowledge, this is the first application of machine learning to this problem, and one of the few attempts to directly address non-concatenative morphology using machine learning. More generally, our results shed light on the problem of combining classifiers under (linguistically motivated) constraints.
Subjects: SEMITIC languages; AFROASIATIC languages; MACHINE learning; CONSTRAINTS (Linguistics); MORPHOLOGY (Grammar); EDUCATION; WORD stems (Linguistics)
Publication: Computational Linguistics, 2008, Vol 34, Issue 3, p429
ISSN: 0891-2017
Publication type: Article
DOI: 10.1162/coli.2008.07-002-R1-06-30
