5.2. Feature selection

Feature selection is an optimization technique that narrows down the feature space by selecting the most
important subset of the original features. In this work, the Random Forest algorithm is used to select the top
features. Random Forest is an ensemble learning algorithm that combines a number of de-correlated decision
trees, and its tree-based structure provides a natural ranking of the features by importance.
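
As an illustration only, the ranking step described above could be sketched as follows. The sketch assumes scikit-learn's RandomForestClassifier and its impurity-based feature_importances_ attribute; the helper select_top_features, the number of trees, the value of k, and the toy count data are all hypothetical and are not the settings or data of this study.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier


def select_top_features(X, y, feature_names, k, seed=0):
    """Rank features by Random Forest importance and keep the top k (hypothetical helper)."""
    forest = RandomForestClassifier(n_estimators=200, random_state=seed)
    forest.fit(X, y)
    # feature_importances_ holds the impurity-based importance of each column,
    # averaged over all trees in the forest.
    order = np.argsort(forest.feature_importances_)[::-1][:k]
    return [feature_names[i] for i in order], X[:, order]


# Toy data (assumption, not the paper's corpus): 300 samples, 6 count features, 3 classes.
rng = np.random.default_rng(0)
y = rng.integers(0, 3, size=300)
X = rng.poisson(1.0, size=(300, 6))
X[:, 0] += 4 * (y == 0)  # feature f0 is made informative for class 0
X[:, 3] += 4 * (y == 2)  # feature f3 is made informative for class 2
names = [f"f{i}" for i in range(6)]

top_names, X_selected = select_top_features(X, y, names, k=2)
print(top_names)
```

The reduced matrix X_selected can then be passed to the classifier in place of the full feature set. Impurity-based importance is only one possible ranking criterion; it is used here because the fitted forest provides it directly.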

5.3. Classification

This study performs multi-class classification in the experiments using Multinomial Naive Bayes (MNB), a
variation of Naive Bayes that estimates the conditional probability of a token t given its class c as the
relative frequency of t in all documents belonging to class c. MNB has proven to be suitable for classification
tasks with discrete features (e.g., word or character counts in text classification) (Manning et
al. 2008).
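
To make the relative-frequency estimate concrete, a minimal pure-Python sketch of MNB training and prediction is given below. It is an illustration under stated assumptions, not the implementation used in this study: the functions train_mnb and predict_mnb, the additive smoothing parameter alpha (which avoids zero probabilities for tokens unseen in a class), and the placeholder tokens are all hypothetical. Only the class labels BAG, MOS, and BAS are taken from the paper.

```python
import math
from collections import Counter, defaultdict


def train_mnb(docs, labels, alpha=1.0):
    """Estimate log P(c) and log P(t | c) from tokenized documents."""
    vocab = {t for doc in docs for t in doc}
    token_counts = defaultdict(Counter)  # class -> token -> count
    class_counts = Counter(labels)       # class -> number of documents
    for doc, c in zip(docs, labels):
        token_counts[c].update(doc)
    log_prior = {c: math.log(n / len(labels)) for c, n in class_counts.items()}
    log_cond = {}
    for c in class_counts:
        total = sum(token_counts[c].values()) + alpha * len(vocab)
        # Relative frequency of t among all tokens of class c, smoothed by alpha
        # (the smoothing is an assumption added for this sketch).
        log_cond[c] = {
            t: math.log((token_counts[c][t] + alpha) / total) for t in vocab
        }
    return log_prior, log_cond


def predict_mnb(model, doc):
    """Return the class with the highest log posterior for a tokenized document."""
    log_prior, log_cond = model
    scores = {
        # Tokens outside the training vocabulary are skipped.
        c: log_prior[c] + sum(log_cond[c][t] for t in doc if t in log_cond[c])
        for c in log_prior
    }
    return max(scores, key=scores.get)


# Placeholder tokens (assumption): stand-ins for word or character n-gram features.
docs = [["a", "b", "a"], ["a", "c"], ["d", "e"], ["d", "d", "f"], ["g", "h"], ["g", "g", "i"]]
labels = ["BAG", "BAG", "MOS", "MOS", "BAS", "BAS"]
model = train_mnb(docs, labels)
print(predict_mnb(model, ["a", "b"]))
```

Working in log space avoids numerical underflow when many small probabilities are multiplied for long documents.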

6. Conclusion

This study aims to identify the Iraqi Arabic dialects. To achieve this goal, an annotated morphosyntactic
corpus of Iraqi dialects that includes the three main dialects in Iraq (BAG, MOS, and BAS) has been created.
Then, this corpus was used with the proposed approach to extract features and to train an MNB classifier that
identifies the sub-dialects of the Iraqi dialect. As future work, we plan to carry out the experiments and
analyze the obtained results in order to determine the best subset of features.

Acknowledgements

This publication was supported by the Universiti Kebangsaan Malaysia (UKM) under GGP-2020-041.

References
Alshutayri, A. & Atwell, E. 2018a. Creating an Arabic Dialect Text Corpus by Exploring Twitter, Facebook,
and Online Newspapers (May). Retrieved from http://eprints.whiterose.ac.uk/128607/
Alshutayri, A. & Atwell, E. 2018b. Creating an Arabic dialect text corpus by exploring Twitter, Facebook,
and online newspapers. OSACT 3 Proceedings. LREC.
Bouamor, H., Hassan, S. & Habash, N. 2019. The MADAR shared task on Arabic fine-grained dialect
identification. Proceedings of the Fourth Arabic Natural Language Processing Workshop, pp. 199–207.
El-Haj, M., Rayson, P. & Aboelezz, M. 2019. Arabic dialect identification in the context of bivalency and
code-switching. LREC 2018 - 11th International Conference on Language Resources and Evaluation 3622–
3627.
Eltanbouly, S., Bashendy, M. & Elsayed, T. 2019. Simple But Not Naïve: Fine-Grained Arabic Dialect
Identification Using Only N-Grams 214–218. doi:10.18653/v1/w19-4624
Ibrahim, H. S., Abdou, S. M. & Gheith, M. 2015. Sentiment analysis for modern standard Arabic and
colloquial. arXiv preprint arXiv:1505.03105.
Khoshaba, M. P. 2006. Iraqi dialect versus standard Arabic. Medius Corporation.
Kwaik, K. A., Saad, M., Chatzikyriakidis, S. & Dobnik, S. 2019. Shami: A corpus of levantine Arabic
dialects. LREC 2018 - 11th International Conference on Language Resources and Evaluation 3645–3652.

E-Proceedings of The 5th International Multi-Conference on Artificial Intelligence Technology (MCAIT 2021) [62]
Artificial Intelligence in the 4th Industrial Revolution
