3.2. Lexicon
The Iraqi dialects have a rich lexicon, as many words are borrowed from the Mesopotamian civilizations and
from the languages of neighboring countries, such as Turkish, Persian, and Assyrian, as well as from other
Arabic dialects. For example, the word "روزنامه, rwznAmh / calendar" is borrowed from Persian, while the word
"خاشوكه, ḫʾašwkh / spoon" is borrowed from Turkish.
4. The Development of Iraqi Corpus
Iraqi dialects are increasingly used on social media, which provides a good source of data for NLP
tasks. We have developed an annotated morphological Iraqi corpus called Al-Rafidayn, which contains 3,000
sentences. The corpus was developed in four stages. In the first stage, annotation labels were collected
through an online web survey. In the second stage, Arabic Buckwalter transliteration and lemmas were used to
recognize the Iraqi Indo-Iranian letters. In the third stage, online morphological tools were used to
annotate grammatical properties. Finally, in the fourth stage, the annotators reached an agreement to
correct errors and verify the quality of the previous stages.
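The transliteration step in the second stage can be illustrated with a minimal sketch. It assumes the standard Buckwalter character mapping and covers only part of it; the name to_buckwalter is hypothetical and is not taken from the corpus tooling. The text does not say how the Indo-Iranian letters are encoded, so this sketch simply passes any unmapped character through unchanged.

```python
# Partial Buckwalter table, built from contiguous Unicode ranges.
# Hamza-carrier letters (U+0622..U+0626) are left out to keep the sketch short.
_RANGES = [
    (0x0621, "'"),                     # hamza
    (0x0627, "AbptvjHxd*rzs$SDTZEg"),  # alif .. ghain
    (0x0641, "fqklmnhwYy"),            # fa .. ya
    (0x064B, "FNKaui~o"),              # tanwin, short vowels, shadda, sukun
]

BUCKWALTER = {
    chr(start + offset): symbol
    for start, symbols in _RANGES
    for offset, symbol in enumerate(symbols)
}


def to_buckwalter(text: str) -> str:
    """Map each Arabic character to its Buckwalter symbol; pass others through."""
    return "".join(BUCKWALTER.get(ch, ch) for ch in text)


# The Persian loanword for "calendar" from Section 3.2, written with escapes.
print(to_buckwalter("\u0631\u0648\u0632\u0646\u0627\u0645\u0647"))  # rwznAmh
```

Letters outside the table, including any Indo-Iranian letters, stay visible in the output, which makes them easy to locate in a later pass.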
5. Methodology
This work aims to identify a set of features that improves the classification of Iraqi dialects. The study
combines a variety of feature extraction methods with a machine learning model, Multinomial Naive Bayes
(MNB), which is used for training and testing.
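As a rough illustration of how an MNB classifier works on count-valued features, the following sketch implements the model in plain Python. The training settings are not given in the text, so the add-one smoothing value, the class name TinyMultinomialNB, and the toy count vectors are all assumptions made for illustration; none of them are corpus data or the authors' configuration.

```python
import math
from collections import defaultdict


class TinyMultinomialNB:
    """Minimal Multinomial Naive Bayes over non-negative count vectors."""

    def __init__(self, alpha: float = 1.0):
        self.alpha = alpha  # additive smoothing; an assumed setting

    def fit(self, X, y):
        n_features = len(X[0])
        class_docs = defaultdict(int)                          # samples per class
        feat_counts = defaultdict(lambda: [0.0] * n_features)  # summed counts per class
        for row, label in zip(X, y):
            class_docs[label] += 1
            for j, value in enumerate(row):
                feat_counts[label][j] += value
        self.classes_ = sorted(class_docs)
        self.log_prior_ = {c: math.log(class_docs[c] / len(y)) for c in self.classes_}
        self.log_lik_ = {}
        for c in self.classes_:
            denom = sum(feat_counts[c]) + self.alpha * n_features
            self.log_lik_[c] = [math.log((n + self.alpha) / denom) for n in feat_counts[c]]
        return self

    def predict(self, X):
        predictions = []
        for row in X:
            # Score = log prior + sum of count * log likelihood per feature.
            scores = {
                c: self.log_prior_[c] + sum(v * w for v, w in zip(row, self.log_lik_[c]))
                for c in self.classes_
            }
            predictions.append(max(scores, key=scores.get))
        return predictions


# Made-up toy counts for three hypothetical features (not corpus data).
X_train = [[3, 0, 1], [4, 1, 0], [0, 3, 2], [1, 4, 1]]
y_train = ["dialect_a", "dialect_a", "dialect_b", "dialect_b"]
model = TinyMultinomialNB().fit(X_train, y_train)
print(model.predict([[5, 0, 0], [0, 2, 1]]))  # ['dialect_a', 'dialect_b']
```

Working in log space avoids numerical underflow when many feature counts are multiplied together, and the smoothing term keeps a feature that was never seen with a class from driving that class's probability to zero.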
5.1. Feature Extraction
Extracting a set of discriminative features from the data helps in distinguishing the different classes. This
study extracts handcrafted morphological and lexical features specific to Iraqi Arabic from the annotated
morphological corpus in order to distinguish the Iraqi sub-dialects. The corpus includes inflections and
diacritics because the Iraqi dialects use them as characters. For example, in the word “لكِ, laki”, which means
"is yours", the diacritic kasra is replaced by the letter "ya, ي" to become “الكي, laky” in the MOS dialect to
express the feminine pronoun, while BAG and BAS tend to convert the letter “kaf” and the kasra into the letter
“jim, ج”, giving “الج, laǧ”. Including these forms lets the feature extraction process use a variety of
linguistically motivated feature sets, namely morphological content. The features considered are shown in Table 1.
Table 1. Features extracted for Iraqi Arabic dialects
Features                      Description
Special character             The five special Iraqi letters (Indo-Iranian letters) along with possible
                              derivational inflections.
POS features                  Nouns, number of words, proper nouns, adjectives, adverbs, number of pronouns,
                              verbs, particles, prepositions, abbreviations, punctuation, conjunctions,
                              interjections, foreign letters.
Case features                 Nominative, accusative, and genitive.
Gender features               Feminine and masculine.
Number features               Singular words, plural words, and dual words.
Grammatical person features   1st person, 2nd person, 3rd person.
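The feature groups in Table 1 could be turned into count features along the following lines. This is only a sketch under stated assumptions: the text does not list the five special letters, so the code points below are placeholders to be replaced with the actual set, and the token format (a dict with a "form" string plus optional tag keys) and the name extract_features are hypothetical.

```python
from collections import Counter

# Assumption: placeholder set of five Perso-Arabic letters
# (peh, tcheh, jeh, veh, gaf); the source text does not name them.
SPECIAL_LETTERS = "\u067E\u0686\u0698\u06A4\u06AF"

# Hypothetical annotation keys, one per tag-based feature group in Table 1.
TAG_GROUPS = ("pos", "case", "gender", "number", "person")


def extract_features(tokens):
    """Return a Counter of feature counts for one annotated sentence.

    `tokens` is a list of dicts with a "form" string plus optional tag keys.
    """
    feats = Counter()
    feats["n_words"] = len(tokens)
    for tok in tokens:
        # Count occurrences of the special letters in the surface form.
        feats["special_char"] += sum(ch in SPECIAL_LETTERS for ch in tok["form"])
        # Count each tag value that the annotation provides.
        for group in TAG_GROUPS:
            tag = tok.get(group)
            if tag is not None:
                feats[f"{group}={tag}"] += 1
    return feats


# Illustrative two-token input with made-up annotations.
sentence = [
    {"form": "\u0686\u0627\u064A", "pos": "noun", "gender": "masc", "number": "sing"},
    {"form": "\u0644\u0643\u0650", "pos": "pron", "gender": "fem", "person": "2"},
]
feats = extract_features(sentence)
print(feats["n_words"], feats["special_char"], feats["gender=fem"])  # 2 1 1
```

Because every value is a non-negative count, the resulting vectors are directly usable by a multinomial model such as MNB once the keys are mapped to fixed column positions.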
E- Proceedings of The 5th International Multi-Conference on Artificial Intelligence Technology (MCAIT 2021) [61]
Artificial Intelligence in the 4th Industrial Revolution