
to do entity labeling on unlabeled data with data from DBpedia. For testing, they used the SVM algorithm to
        model the labeled data. The study reports a precision of 73.6%, a recall of 80.1%, and an F1 score of
        76.5%.
           Bhasuran et al. (2016) proposed a biomedical NER based on a stacked ensemble approach. The authors
        applied several domain-specific, morphological, orthographical, and contextual features, Conditional Random
        Fields (CRF) based modeling, and two fuzzy matching algorithms to extract disease named entities. Some
        post-processing measures were also applied to enhance the performance of the model.
           Salleh et al. (2017) proposed a Malay language NER that uses Python CRFsuite and several features.
        Features such as capitalization, lowercase, previous and closest words, digits, word forms, and word POS
        tags, among others, show potential for increasing the accuracy of recognizing Malay named entities.
        Salleh et al. (2018) proposed a Malay NER using the fuzzy c-means method with the RapidMiner software and
        a dataset from Bernama Malay news. The types of named entities analyzed are person, location, organization,
        and facility. Overall, the approach achieved an accuracy of 98.57% based on cluster matching.
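
To make the feature types listed for the CRF-based Malay NER more concrete, the sketch below builds one feature dictionary per token. It is a minimal illustration, not the feature set of Salleh et al. (2017): the function names, feature keys, example sentence, and POS tags are all assumptions made for this example. Dictionaries of this kind are the sort of input that CRF toolkits typically accept.

```python
def word_shape(word):
    """Map each character to a coarse class: 'Kuala' -> 'Xxxxx', '2021' -> 'dddd'."""
    shape = []
    for ch in word:
        if ch.isupper():
            shape.append("X")
        elif ch.islower():
            shape.append("x")
        elif ch.isdigit():
            shape.append("d")
        else:
            shape.append(ch)
    return "".join(shape)


def token_features(tokens, pos_tags, i):
    """Build a feature dict for the token at position i (hypothetical feature keys)."""
    word = tokens[i]
    features = {
        "word.lower": word.lower(),
        "word.istitle": word.istitle(),   # capitalization cue
        "word.islower": word.islower(),   # lowercase cue
        "word.isdigit": word.isdigit(),   # digit cue
        "word.shape": word_shape(word),   # word form
        "pos": pos_tags[i],               # POS tag of the current word
    }
    if i > 0:
        features["prev.lower"] = tokens[i - 1].lower()
        features["prev.pos"] = pos_tags[i - 1]
    else:
        features["BOS"] = True            # beginning of sentence
    if i < len(tokens) - 1:
        features["next.lower"] = tokens[i + 1].lower()
        features["next.pos"] = pos_tags[i + 1]
    else:
        features["EOS"] = True            # end of sentence
    return features


def sentence_features(tokens, pos_tags):
    """Feature dicts for every token in one sentence."""
    return [token_features(tokens, pos_tags, i) for i in range(len(tokens))]


# Illustrative Malay sentence with assumed POS tags (not from the cited dataset).
tokens = ["Ali", "tiba", "di", "Kuala", "Lumpur", "pada", "2021"]
pos_tags = ["NNP", "VB", "IN", "NNP", "NNP", "IN", "CD"]
features = sentence_features(tokens, pos_tags)
print(features[3])
```

Capitalization and the proper-noun tag on "Kuala" and "Lumpur" are the kind of cues a sequence model could learn to associate with location entities.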

        2.3.  Hybrid Named Entity Recognizers

           A hybrid named entity recognition system combines rule-based and machine learning techniques.
        These methods combine the strongest points of each approach: the adaptability and flexibility of
        machine learning, together with rules that improve efficiency. Keretna et al. (2014) present a hybrid model
        comprising rule-based and lexicon-based techniques for extracting drug named entities from informal
        and unstructured medical text. The experimental outcome indicates that integrating many valuable rules into a
        lexicon-based technique can enhance performance on the BioNER problem. The proposed model
        achieves an F-score of 66.97%.
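
The idea of layering rules on top of a lexicon can be sketched as follows. This is only an illustration of the general pattern, not the model of Keretna et al. (2014): the lexicon entries, the suffix list, the dosage pattern, and the `tag_drugs` function are all hypothetical.

```python
import re

# Hypothetical lexicon and rules, chosen only for illustration.
DRUG_LEXICON = {"aspirin", "ibuprofen", "metformin"}
DRUG_SUFFIXES = ("cillin", "statin", "prazole")
DOSAGE = re.compile(r"^\d+(\.\d+)?(mg|ml|mcg)$", re.IGNORECASE)


def tag_drugs(tokens):
    """Tag each token as 'DRUG' or 'O' using a lexicon lookup plus two rules."""
    tags = ["O"] * len(tokens)
    for i, tok in enumerate(tokens):
        low = tok.lower()
        if low in DRUG_LEXICON:
            # Lexicon match: the token is a known drug name.
            tags[i] = "DRUG"
        elif low.endswith(DRUG_SUFFIXES):
            # Rule 1: the token ends with a common drug-name suffix.
            tags[i] = "DRUG"
        elif tok.isalpha() and i + 1 < len(tokens) and DOSAGE.match(tokens[i + 1]):
            # Rule 2: an alphabetic token directly followed by a dosage.
            tags[i] = "DRUG"
    return tags


tokens = "i took amoxicillin and aspirin then zyrtec 10mg".split()
tags = tag_drugs(tokens)
print(list(zip(tokens, tags)))
```

Here the lexicon alone would find only "aspirin"; the suffix rule adds "amoxicillin" and the dosage-context rule adds "zyrtec", which is the sense in which rules can extend a lexicon-based technique.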
           Munkhjargal et al. (2015) have introduced a Mongolian named entity recognizer. The authors used statistical
        techniques, namely Maximum Entropy, SVM, CRF, gazetteers, and string matching patterns, to handle the
        vocabulary words. The optimal ensemble reached 90.59% precision, 85.88% recall, and 88.17% F1 score.
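
The F1 score reported here is the harmonic mean of precision and recall. The short check below recomputes it from the precision and recall figures quoted above; the function name `f1_score` is chosen for this example.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (both on the same scale)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall quoted for the Mongolian NER ensemble, in percent.
f1 = f1_score(90.59, 85.88)
print(f"{f1:.2f}")  # 88.17
```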

        3.  Issues and Challenges in Malay Named Entity Recognition

           Most documents on websites are unstructured, making it difficult to obtain the relevant information as
        structured data. Information extraction is the process of converting unstructured data into structured data,
        and the extraction of named entities is a challenging part of that task. Apart from the techniques used,
        several factors affect the performance of NER tasks, such as language factors, domain factors, and entity
        type factors. Several researchers have studied Malay language NER. Most research on Malay NER uses a
        rule-based approach and a supervised approach (Nadia & Omar, 2019).
           The performance of an NER system is highly dependent on language resources such as a POS tagger,
        morphological analyzer, chunker, and parser. The Malay language shares some features with English,
        such as capitalization and POS tags such as proper noun, that help recognize entities (Morsidi et al., 2016).
        A supervised named entity recognition system requires large annotated corpora to classify named entities
        in the test data. This is a challenge because the Malay language corpus is still limited compared to the
        English corpus. Malay language NER cannot use the English corpus because there are differences in speech
        structure and morphology between English and Malay (Nadia & Omar, 2019).
           Domain factors have a significant influence on the named entity recognition task. Various domains have
        been explored for NER tasks, such as news articles, crime, and medical text.










        E-Proceedings of The 5th International Multi-Conference on Artificial Intelligence Technology (MCAIT 2021)
        Artificial Intelligence in the 4th Industrial Revolution