
Review of Malay Named Entity Recognition




                        Hafsah a*, Saidah Saad b, Lailatul Qadri Zakaria c
          a,b,c  Faculty of Information Science and Technology, Universiti Kebangsaan Malaysia, 43600 UKM Bangi,
                                             Selangor, Malaysia
                                        *Email: p93826@siswa.ukm.edu.my

        Abstract

        Named Entity Recognition (NER) is a technique used extensively to extract useful information from unstructured natural
        language document collections. NER is an important information extraction task that should be developed for all
        languages and almost all domains. Most NER research has been done for English. Malay NER cannot reuse an English
        corpus because English and Malay differ in speech structure and morphology. The discussion shows that research on
        Malay named entity recognition is still at an early stage.

        Keywords:  Named Entity Recognition; Natural Language Processing; Malay language; NER approach



        1.  Introduction

           Named Entity Recognition (NER) is a subfield of Natural Language Processing (NLP) research, which in turn
        belongs to the field of Artificial Intelligence (AI). NER is the initial step in information extraction: it
        seeks to find the entities mentioned in a text and classify them into predetermined categories, such as
        person names, organizations, locations, time expressions, amounts, monetary values, and percentages
        (Saad & Mansor, 2018).
           NER research has been carried out for various purposes and with various methods, ranging from rule-based
        approaches to Machine Learning (ML) (Saad & Mansor, 2018). The rule-based approach uses rules defined from
        linguistic knowledge, with analysis carried out at the syntactic and semantic levels (Goyal et al., 2018).
        Its limitation is that as many rules as possible must be defined to get optimal results (Nadia & Omar, 2019).
        The ML approach overcomes this limitation by learning patterns from the data, provided that sufficient
        data sets are available (Salini et al., 2017).
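
        The rule-based approach described above can be illustrated with a minimal sketch. The trigger-word lists,
        the category labels, and the function names below are hypothetical choices made only for illustration;
        they are not taken from any of the systems cited in this review, and a practical rule set would need to
        be far larger.

```python
import re

# Hypothetical trigger words, chosen only for illustration (assumptions,
# not rules from any cited Malay NER system).
PERSON_TITLES = {"Encik", "Puan", "Datuk"}
ORGANIZATION_PREFIXES = {"Universiti", "Syarikat", "Kementerian"}
LOCATION_PREPOSITIONS = {"di", "ke", "dari"}


def tokenize(text):
    """Split text into word tokens and single punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", text)


def capitalized_run(tokens, start):
    """Return the index just past the run of capitalized tokens at `start`."""
    end = start
    while end < len(tokens) and tokens[end][:1].isupper():
        end += 1
    return end


def recognize(text):
    """Label capitalized token runs that follow (or begin with) a trigger word."""
    tokens = tokenize(text)
    entities = []
    i = 0
    while i < len(tokens):
        token = tokens[i]
        next_token = tokens[i + 1] if i + 1 < len(tokens) else ""
        if token in PERSON_TITLES:
            label, start = "PERSON", i + 1
        elif token in ORGANIZATION_PREFIXES:
            label, start = "ORGANIZATION", i
        elif (token.lower() in LOCATION_PREPOSITIONS
              and next_token not in ORGANIZATION_PREFIXES):
            label, start = "LOCATION", i + 1
        else:
            i += 1
            continue
        end = capitalized_run(tokens, start)
        if end > start:
            entities.append((" ".join(tokens[start:end]), label))
            i = end
        else:
            i += 1
    return entities


sentence = "Encik Ahmad bekerja di Universiti Kebangsaan Malaysia di Bangi."
for name, label in recognize(sentence):
    print(label, name)
```

        Even in this small sketch, an extra condition is needed so that the preposition rule does not label an
        organization name as a location. This is the kind of rule growth that the limitation noted above refers to.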
           Another approach that has recently become widely used is deep learning (DL), which recognizes patterns of
        entities in sentences (Li et al., 2020). NER is an important information extraction task that should be
        developed for all languages and almost all domains. However, the task differs according to language, domain,
        and system development approach (Patil et al., 2019).
           Most NER research focuses on English and other European languages, but as the field has developed, more
        and more languages have been studied. English and Japanese were well explored in MUC-6 [5] and earlier
        works. German, Dutch, and Spanish were addressed at the CoNLL conferences. Chinese has been studied in an
        abundant literature, as have French, Greek, and Italian. Arabic has started to receive a lot of attention
        in large-scale projects such as Global Autonomous Language Exploitation (GALE). Over time, Asian and several
        other languages have also been considered (Goyal et al., 2018).









        E- Proceedings of The 5th International Multi-Conference on Artificial Intelligence Technology (MCAIT 2021)   [200]
        Artificial Intelligence in the 4th Industrial Revolution