
5.2. Feature selection


           Feature selection is an optimization technique that reduces the dimensionality of the feature space by
        selecting the most important subset of the original features. In this work, the Random Forest algorithm is used
        to select the top-ranked features. Random Forest is an ensemble learning algorithm that combines a number of
        de-correlated decision trees; its tree-based structure provides a natural way to rank features, for example by
        how much each feature reduces node impurity across the splits that use it.
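           The importance measure and forest settings are not specified here, so the following is only a minimal
        pure-Python sketch under stated assumptions. It ranks features by mean decrease in Gini impurity, averaged over
        trees grown on bootstrap samples, with each split considering a random subset of about sqrt(d) features. All
        function names and the defaults n_trees=50 and max_depth=3 are hypothetical. In practice a library
        implementation, such as scikit-learn's RandomForestClassifier and its feature_importances_ attribute, would
        normally be used instead.

```python
import math
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((k / n) ** 2 for k in Counter(labels).values())


def best_split(X, y, features):
    """Best (feature, threshold, impurity decrease) over the candidate features, or None."""
    parent, n, best = gini(y), len(y), None
    for f in features:
        for thr in sorted({row[f] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[f] <= thr]
            right = [lab for row, lab in zip(X, y) if row[f] > thr]
            if not left or not right:
                continue
            gain = parent - (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or gain > best[2]:
                best = (f, thr, gain)
    return best


def grow_tree(X, y, imp, depth, max_depth, n_sub, rng):
    """Grow one tree recursively, accumulating weighted impurity decreases in `imp`."""
    if depth >= max_depth or len(set(y)) < 2:
        return
    features = rng.sample(range(len(X[0])), n_sub)  # random feature subset at each split
    split = best_split(X, y, features)
    if split is None or split[2] <= 0:
        return
    f, thr, gain = split
    imp[f] += gain * len(y)  # weight the decrease by the number of samples at this node
    left = [i for i, row in enumerate(X) if row[f] <= thr]
    right = [i for i, row in enumerate(X) if row[f] > thr]
    for idx in (left, right):
        grow_tree([X[i] for i in idx], [y[i] for i in idx], imp, depth + 1, max_depth, n_sub, rng)


def rank_features(X, y, n_trees=50, max_depth=3, seed=0):
    """Rank features by mean decrease in impurity averaged over bootstrapped trees."""
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    n_sub = max(1, int(math.sqrt(d)))  # number of features considered per split
    totals = [0.0] * d
    for _ in range(n_trees):
        boot = [rng.randrange(n) for _ in range(n)]  # bootstrap sample (with replacement)
        imp = [0.0] * d
        grow_tree([X[i] for i in boot], [y[i] for i in boot], imp, 0, max_depth, n_sub, rng)
        tree_total = sum(imp)
        if tree_total > 0:
            totals = [t + v / tree_total for t, v in zip(totals, imp)]  # normalise per tree
    grand = sum(totals)
    scores = [t / grand for t in totals] if grand > 0 else totals
    ranking = sorted(range(d), key=lambda j: scores[j], reverse=True)
    return ranking, scores


if __name__ == "__main__":
    # Hypothetical toy data: feature 0 equals the label; features 1 and 2 are uncorrelated with it.
    X = [[i % 2, (7 * i) % 5, (i // 2) % 3] for i in range(40)]
    y = [i % 2 for i in range(40)]
    ranking, scores = rank_features(X, y)
    print(ranking, [round(s, 3) for s in scores])
```

           On the toy data, feature 0 should be ranked first. Selecting the top k entries of ranking would then give
        the reduced feature subset, with k left as a tuning choice.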

        5.3. Classification

           The experiments in this study perform multi-class classification using Multinomial Naive Bayes (MNB), a
        variant of Naive Bayes that estimates the conditional probability of a token t given a class c as the relative
        frequency of t among all tokens in the documents belonging to class c. MNB has proven suitable for
        classification tasks with discrete features, such as word or character counts in text classification (Manning
        et al. 2008).
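           The estimation and decision steps described above can be sketched as follows. This is a minimal
        illustration, not the study's implementation. P(t|c) is the relative frequency of token t among all tokens in
        class c's documents, and a document is assigned to the class that maximises log P(c) plus the sum of
        log P(t|c) over its tokens. The add-alpha (Laplace) smoothing, which prevents unseen class/token pairs from
        receiving zero probability, follows the standard formulation in Manning et al. (2008). It is an assumption
        for this study, as are the hypothetical function names and toy data.

```python
import math
from collections import Counter, defaultdict


def train_mnb(docs, labels, alpha=1.0):
    """Estimate log class priors and log P(t|c) from tokenised documents."""
    vocab = {tok for doc in docs for tok in doc}
    class_docs = Counter(labels)
    token_counts = defaultdict(Counter)  # class -> token -> count over all its documents
    for doc, c in zip(docs, labels):
        token_counts[c].update(doc)
    priors = {c: math.log(k / len(docs)) for c, k in class_docs.items()}
    cond = {}
    for c in class_docs:
        total = sum(token_counts[c].values())
        # relative frequency of t in class c, with add-alpha smoothing (assumption)
        cond[c] = {
            t: math.log((token_counts[c][t] + alpha) / (total + alpha * len(vocab)))
            for t in vocab
        }
    return priors, cond


def predict_mnb(model, doc):
    """Return the class maximising log P(c) + sum of log P(t|c) over the tokens of `doc`."""
    priors, cond = model

    def score(c):
        # each token occurrence contributes once; tokens unseen in training are ignored
        return priors[c] + sum(cond[c][t] for t in doc if t in cond[c])

    return max(priors, key=score)


if __name__ == "__main__":
    # Hypothetical toy data: tokens stand in for extracted features.
    docs = [["x", "x", "y"], ["x", "z"], ["y", "y", "w"], ["w", "y"]]
    labels = ["A", "A", "B", "B"]
    model = train_mnb(docs, labels)
    print(predict_mnb(model, ["x", "z"]))  # → A
```

           Working in log space avoids numerical underflow when many token probabilities are multiplied. A library
        equivalent is scikit-learn's MultinomialNB, whose alpha parameter plays the same smoothing role.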

        6. Conclusion

           This study aims to identify Iraqi Arabic dialects. To achieve this goal, a morphosyntactically annotated
        corpus of Iraqi Arabic covering the three main dialects in Iraq (BAG, MOS, and BAS) has been created. This
        corpus was then used to train the proposed approach, which extracts features and uses an MNB classifier to
        identify the sub-dialects of Iraqi Arabic. As future work, we plan to carry out the experiments and analyze the
        results in order to determine the best subset of features.


        Acknowledgements

           This publication was supported by Universiti Kebangsaan Malaysia (UKM) under grant GGP-2020-041.



        References
        Alshutayri, A. & Atwell, E. 2018a. Creating an Arabic dialect text corpus by exploring Twitter, Facebook,
        and online newspapers (May). Retrieved from http://eprints.whiterose.ac.uk/128607/
        Alshutayri, A. & Atwell, E. 2018b. Creating an Arabic dialect text corpus by exploring Twitter, Facebook,
        and online newspapers. OSACT 3 Proceedings. LREC.
        Bouamor, H., Hassan, S. & Habash, N. 2019. The MADAR shared task on Arabic fine-grained dialect
        identification. Proceedings of the Fourth Arabic Natural Language Processing Workshop, pp. 199–207.
        El-Haj, M., Rayson, P. & Aboelezz, M. 2019. Arabic dialect identification in the context of bivalency and
        code-switching. LREC 2018 - 11th International Conference on Language Resources and Evaluation, pp. 3622–3627.
        Eltanbouly, S., Bashendy, M. & Elsayed, T. 2019. Simple but not naïve: Fine-grained Arabic dialect
        identification using only n-grams, pp. 214–218. doi:10.18653/v1/w19-4624
        Ibrahim, H. S., Abdou, S. M. & Gheith, M. 2015. Sentiment analysis for modern standard Arabic and
        colloquial. arXiv preprint arXiv:1505.03105.
        Khoshaba, M. P. 2006. Iraqi dialect versus standard Arabic. Medius Corporation.
        Kwaik, K. A., Saad, M., Chatzikyriakidis, S. & Dobnik, S. 2019. Shami: A corpus of Levantine Arabic
        dialects. LREC 2018 - 11th International Conference on Language Resources and Evaluation, pp. 3645–3652.







        E- Proceedings of The 5th International Multi-Conference on Artificial Intelligence Technology (MCAIT 2021)   [62]
        Artificial Intelligence in the 4th Industrial Revolution