
This paper is organized as follows. The second section presents a literature review of NER and discusses
        the approaches taken by previous researchers. The third section gives an overview of the issues and
        challenges in Malay named entity recognition.

        2.  Literature Review

           Approaches to named entity recognition fall into three main streams: rule-based approaches, machine
        learning approaches, and hybrid approaches.

        2.1.  Rule-Based Named Entity Recognizers
           Rule-based and dictionary-based methods are the earliest methods used in NER (Ji et al., 2019). They rely
        on handcrafted rules, draw on named entity libraries, and assign a weight to each rule. When rules conflict,
        the rule with the highest weight is selected to determine the named entity type. However, these rules
        often depend on the specific language, domain, and text style (Ji et al., 2019). Alfred et al. (2014) proposed
        a Malay named entity recognizer developed on Malay articles. Their approach is based on a rule-based
        part-of-speech (POS) tagging process and contextual feature rules. Dictionaries were also manually created
        to detect three named entity types: person, location, and organization. Evaluation using standard
        performance metrics showed a Recall of 94.44%, a Precision of 85%, and an F-score of 89.47%.
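
           The weighted conflict resolution described above can be sketched as follows. This is a minimal
        illustration, not the implementation of any cited system: the rule patterns, the gazetteer entries,
        and all weights are hypothetical, and the left context of a candidate is assumed to be available as a string.

```python
import re
from dataclasses import dataclass


@dataclass
class Rule:
    pattern: str      # regular expression applied to the text left of the candidate
    entity_type: str  # label assigned when the rule fires
    weight: float     # hand-assigned priority used to settle conflicts


# Hypothetical contextual rules and dictionary entries, for illustration only.
RULES = [
    Rule(r"\b(Encik|Puan|Datuk)\s+$", "PERSON", 0.9),
    Rule(r"\b(di|ke|dari)\s+$", "LOCATION", 0.6),
    Rule(r"\b(Syarikat|Universiti)\s+$", "ORGANIZATION", 0.8),
]
GAZETTEER = {"Kuala Lumpur": "LOCATION", "Sabah": "LOCATION"}
GAZETTEER_WEIGHT = 0.7  # assumed weight of a dictionary hit


def classify(left_context: str, candidate: str):
    """Return the entity type proposed by the highest-weighted match, or None."""
    matches = []
    for rule in RULES:
        if re.search(rule.pattern, left_context):
            matches.append((rule.weight, rule.entity_type))
    if candidate in GAZETTEER:
        matches.append((GAZETTEER_WEIGHT, GAZETTEER[candidate]))
    if not matches:
        return None
    # Conflict resolution: the match with the highest weight wins.
    return max(matches, key=lambda m: m[0])[1]
```

           In this sketch, the candidate "Sabah" preceded by "Universiti " triggers both the dictionary entry
        (LOCATION, weight 0.7) and the organization rule (weight 0.8), so the organization label is selected.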
           Wulandari et al. (2018) conducted research on NER for biological documents using a combination of
        rule-based and naïve Bayes classifier methods. The study used 19 training documents, which were processed
        and manually annotated with NEs, yielding 1,135 training instances in the form of words. Pre-processing
        included POS tagging and n-gram extraction. With the combined rule-based and naïve Bayes methods, the
        study obtained a micro-averaged Precision, Recall, and F-measure of 0.8.
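
           A micro average pools the true positives, false positives, and false negatives of all entity classes
        before the metrics are computed. The sketch below shows that computation; the class names and counts are
        made up for illustration and are not taken from the cited study.

```python
def micro_average(counts):
    """Compute micro-averaged precision, recall, and F1.

    `counts` maps each entity class to a (true_pos, false_pos, false_neg) tuple.
    """
    tp = sum(c[0] for c in counts.values())
    fp = sum(c[1] for c in counts.values())
    fn = sum(c[2] for c in counts.values())
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical per-class counts: (true positives, false positives, false negatives).
example_counts = {"CLASS_A": (9, 1, 3), "CLASS_B": (30, 10, 2)}
precision, recall, f1 = micro_average(example_counts)
```

           Because the counts are pooled, frequent classes dominate a micro average, whereas a macro average
        would weight every class equally.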
           For Malay, research on crime news documents was conducted by Saad and Mansor (2018). This
        research built a crime news corpus sourced from BERNAMA news. Linguists manually checked the corpus to
        identify named entities such as individuals, organizations, locations, dates, times, finances, percentages,
        crimes, and weapons. Testing of the prototype system showed promising results, with a Recall of 78.67%,
        a Precision of 71.11%, and an F-measure of 74.7%.
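
           The relationship between these three metrics can be checked with a short calculation. The sketch below
        assumes the balanced F-measure (the harmonic mean of Precision and Recall, beta = 1); the source does not
        state the formula explicitly, but the figures reported above are consistent with it.

```python
def f_measure(precision: float, recall: float, beta: float = 1.0) -> float:
    """F-measure of precision and recall; beta = 1 gives the harmonic mean."""
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Precision and Recall as reported by Saad and Mansor (2018).
score = f_measure(0.7111, 0.7867)
print(round(100 * score, 2))  # about 74.7, matching the reported F-measure
```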
           Nadia and Omar (2019) proposed a rule-based Malay NER. This research identifies nine named entity
        types: individual name, location, organization, position, date, time, finance, measurement, and
        percentage. Testing showed promising results, with a Recall of 92.13%, a Precision of 90.23%, and an
        F-score of 91.05%.

        2.2.  Machine Learning-Based Named Entity Recognizers

           Machine learning methods use a statistical classification model to recognize named entities. These
        methods look for patterns and their relationships in a text, build models with statistical approaches
        and machine learning algorithms, and identify and classify nouns into several classes, such as person,
        location, and time (Jurafsky and Martin, 2017).
           Surwaningsih et al. (2014) conducted a study on Indonesian Medical Named Entity Recognition (ImNER)
        using a Support Vector Machine (SVM). They used 3,000 randomly selected sentences as data and obtained
        an accuracy of 90%. Their features cover word types, contextual characteristics of words, word writing
        systems, and common word lists. In addition, they make use of medical-related word lists.
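
           Feature groups of this kind are typically turned into one feature record per token before a classifier
        such as an SVM is trained. The sketch below illustrates only that feature extraction step, under stated
        assumptions: the word lists, the POS tag names, and the feature names are hypothetical and are not the
        features of the cited study, and the SVM itself is not shown.

```python
# Hypothetical word lists, for illustration only.
COMMON_WORDS = {"dan", "yang", "di"}
MEDICAL_WORDS = {"demam", "paracetamol"}


def token_features(tokens, pos_tags, i):
    """Build a feature dictionary for the token at position i."""
    word = tokens[i]
    return {
        # Word type: the POS tag supplied by an upstream tagger.
        "pos": pos_tags[i],
        # Contextual characteristics: the neighbouring words.
        "prev_word": tokens[i - 1].lower() if i > 0 else "<BOS>",
        "next_word": tokens[i + 1].lower() if i < len(tokens) - 1 else "<EOS>",
        # Writing system (orthography) of the word.
        "is_capitalized": word[:1].isupper(),
        "has_digit": any(ch.isdigit() for ch in word),
        # Word-list membership.
        "is_common": word.lower() in COMMON_WORDS,
        "is_medical": word.lower() in MEDICAL_WORDS,
    }


tokens = ["Pasien", "minum", "paracetamol", "500mg"]
pos_tags = ["NN", "VB", "NN", "CD"]  # assumed tag set
features = [token_features(tokens, pos_tags, i) for i in range(len(tokens))]
```

           Such dictionaries would still need to be converted into numeric vectors (for example by one-hot
        encoding the categorical values) before being passed to an SVM.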
           Aryoyudanta et al. (2016) used the Co-Training algorithm to exploit unlabeled data in order to obtain
        new labeled data. The study used news articles as unlabeled data and DBpedia as labeled data. The initial
        stage performs text processing on the unlabeled data, followed by POS tagging. The purpose of POS tagging
        is to find the words that are most likely to be named entities. Furthermore, they use the Co-Training algorithm






        E- Proceedings of The 5th International Multi-Conference on Artificial Intelligence Technology (MCAIT 2021)   [201]
        Artificial Intelligence in the 4th Industrial Revolution