
                Analyzing Iraqi Dialects Unique Features for Dialect Identification


                      Ali Abdulraheem a*, Lailatul Qadri Zakaria b, Nazlia Omar c
           a,b,c  Center for Artificial Intelligence Technology (CAIT), Faculty of Information Science and Technology, University Kebangsaan
                                   Malaysia, 43600 Bangi, Selangor Darul Ehsan, Malaysia
                                           a* Email: aaj8068@gmail.com

        Abstract
        With the dramatic expansion of textual information, language identification has emerged as a key task for analyzing such
        huge amounts of text. Dialect identification is a sub-task of language identification that addresses a particular language
        and its sub-dialects. This paper presents a series of features for improving the classification of Iraqi Arabic sub-dialects.
        It addresses sentence-level, fine-grained Iraqi Arabic dialect identification across three distinct sub-dialects (Baghdadi,
        Maslawi, and Basrawi). Iraqi Arabic dialect identification is a complex process because closely related language varieties
        share common traits, such as the same characters and vocabulary. This paper investigates an extensive feature space for
        identifying Iraqi Arabic sub-dialects by exploring a variety of feature extraction techniques (special characters, POS
        features, individual grammatical features, case features, gender features, and number features) together with a machine
        learning model based on Multinomial Naive Bayes (MNB). This is a first preliminary analysis of Iraqi Arabic sub-dialects,
        which have not yet received attention in computational linguistics.

        Keywords: Iraqi Arabic; Arabic morphology; Dialectal Arabic


        1. Introduction

           Arabic is one of the world's oldest languages, and it has been evolving over the decades. The Arabic language can
        be classified into three categories: Modern Standard Arabic (MSA), Classical Arabic (CA), and Arabic dialects
        (AD). MSA is used formally on official platforms, including educational institutes, television broadcasts, and
        newspapers. CA is the language of the Holy Quran and Hadiths. It can also be viewed as the language of
        pre-Islamic poets. AD is the collection of different Arabic dialects spoken in different Arab countries. Such
        dialects have no written tradition, and they are formed by accommodating the varying degrees of accent used
        in different cultures (Belkredim and Sebai 2009). Arab people use AD more than MSA in their everyday lives.
        AD differs from CA and MSA in terms of morphology, phonology, lexicon, and syntax (Janet 2007).
        The different varieties of AD pose significant challenges for natural language processing tasks such as
        sentiment analysis, opinion mining, author profiling, and machine translation.

        2. Related Work and Background

           Arabic is known as a morphologically rich and complex language, which presents significant challenges for
        dialect identification. Arabic dialect identification is a crucial topic for most Arabic NLP research because of
        the diversity of the Arabic dialects. Some ADs within the same country share features such as characters,
        vocabulary, and basic language sets, which amplifies the complexity of the dialect identification task.
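
           As an illustration only, sentence-level dialect identification with a Multinomial Naive Bayes (MNB)
        classifier, the model family named in the abstract, might be sketched as below. This is not the authors'
        implementation. The character-bigram features, the add-one smoothing, the class and function names, and the
        toy training strings are all assumptions made for the example; the strings are placeholders, not real
        dialect text.

```python
import math
from collections import Counter, defaultdict


def char_ngrams(text, n=2):
    """Return overlapping character n-grams of a space-padded sentence."""
    padded = f" {text.strip()} "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


class MultinomialNB:
    """Minimal Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, sentences, labels):
        self.class_counts = Counter(labels)          # sentences per dialect
        self.feature_counts = defaultdict(Counter)   # n-gram counts per dialect
        self.vocab = set()
        for sentence, label in zip(sentences, labels):
            grams = char_ngrams(sentence)
            self.feature_counts[label].update(grams)
            self.vocab.update(grams)
        return self

    def log_scores(self, sentence):
        total = sum(self.class_counts.values())
        vocab_size = len(self.vocab)
        scores = {}
        for label, n_sentences in self.class_counts.items():
            counts = self.feature_counts[label]
            denom = sum(counts.values()) + vocab_size
            score = math.log(n_sentences / total)    # log prior
            for gram in char_ngrams(sentence):
                if gram in self.vocab:                # ignore n-grams unseen in training
                    score += math.log((counts[gram] + 1) / denom)
            scores[label] = score
        return scores

    def predict(self, sentence):
        scores = self.log_scores(sentence)
        return max(scores, key=scores.get)


# Hypothetical placeholder strings standing in for labelled sentences.
train_sentences = ["aaa aab", "aab aaa aaa", "bbb bba", "bba bbb bbb", "ccc cca", "cca ccc"]
train_labels = ["Baghdadi", "Baghdadi", "Maslawi", "Maslawi", "Basrawi", "Basrawi"]

model = MultinomialNB().fit(train_sentences, train_labels)
prediction = model.predict("bbb bba bbb")
```

           When sub-dialects share most of their characters and vocabulary, the per-class n-gram counts in such a
        model become nearly identical, so the log scores differ only slightly. This is one way to see why surface
        features alone may be insufficient for fine-grained identification.
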
           Some studies have used different methods, such as game-based theory (Alshutayri & Atwell 2018a; Osman
        et al. 2016), to automatically identify dialects in Arabic text. Bouamor et al. (2019) proposed a simple






        E- Proceedings of The 5th International Multi-Conference on Artificial Intelligence Technology (MCAIT 2021)   [59]
        Artificial Intelligence in the 4th Industrial Revolution