
Conditional Arabic Light Stemmer: CondLight
Yaser Al-Lahham, Khawlah Matarneh, and Mohammad Hassan
The Arabic language has a complex morphological structure, which makes it hard to select index terms for an information retrieval (IR) system.
This complexity arises from several sources: multimode terms, the use of diacritics, letters whose form changes according to their position in the word, and affixes that can be attached at any position in a word. Several methods have been proposed to overcome these problems, such as root extraction and light stemming. Light stemming has shown better retrieval performance, and Light10 is the best stemmer among a series of light stemmers: it simply removes prefixes and suffixes that are listed in a predefined table. Because Light10 places no restrictions on affix removal, two terms with different meanings can be reduced to the same token. This paper proposes the CondLight stemmer, which adds new prefixes and suffixes to the Light10 table and imposes a set of conditions on removing these affixes. Implementation and testing show that CondLight achieves an average precision of 38%, compared with 36.7% for Light10. Moreover, CondLight still gives better average precision whether all of the conditions or only some of them are imposed.
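The contrast between unrestricted, table-driven affix removal and conditional removal can be sketched as follows. This is a minimal illustration under stated assumptions, not the CondLight algorithm: the affix lists are small illustrative subsets (not the full Light10 or CondLight tables), and the single condition used here (an affix is removed only if at least `min_stem` letters remain) is a hypothetical stand-in, because this abstract does not list CondLight's actual conditions. The function name `strip_affixes` is also hypothetical.

```python
# Hypothetical, simplified affix tables for illustration only
# (assumed subsets, not the complete Light10 or CondLight tables).
PREFIXES = ["وال", "بال", "كال", "فال", "ال", "لل", "و"]
SUFFIXES = ["ها", "ان", "ات", "ون", "ين", "يه", "ية", "ه", "ة", "ي"]

# Try longer affixes first so that, e.g., "وال" is matched before "و".
PREFIXES_BY_LEN = sorted(PREFIXES, key=len, reverse=True)
SUFFIXES_BY_LEN = sorted(SUFFIXES, key=len, reverse=True)


def strip_affixes(word: str, min_stem: int = 0) -> str:
    """Remove at most one listed prefix and then at most one listed suffix.

    min_stem=0 imposes no restriction: any listed affix is removed
    (table-only stripping). min_stem > 0 is a hypothetical condition:
    an affix is removed only if at least min_stem letters remain.
    """
    for prefix in PREFIXES_BY_LEN:
        if word.startswith(prefix) and len(word) - len(prefix) >= min_stem:
            word = word[len(prefix):]
            break
    for suffix in SUFFIXES_BY_LEN:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
            word = word[: -len(suffix)]
            break
    return word


if __name__ == "__main__":
    # Without conditions, "وجد" (found) loses its first root letter and
    # collapses onto "جد" (grandfather): two meanings, one token.
    print(strip_affixes("وجد"), strip_affixes("جد"))
    # With the hypothetical 3-letter condition, the two terms stay distinct.
    print(strip_affixes("وجد", min_stem=3), strip_affixes("جد", min_stem=3))
```

The example shows the failure mode the abstract attributes to unrestricted stripping: the conjunction-like prefix "و" is also the first root letter of "وجد", so removing it blindly merges two unrelated words. A length condition is only one possible guard; a real conditional stemmer may combine several conditions on each affix.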
