Page 216 - The-5th-MCAIT2021-eProceeding
This paper is organized as follows. The second section presents a literature review of NER and discusses
the approaches taken by previous researchers. The third section gives an overview of the issues and
challenges in Malay named entity recognition.
2. Literature Review
Approaches to named entity recognition fall into three main streams: rule-based approaches, machine
learning approaches, and hybrid approaches.
2.1. Rule-Based Named Entity Recognizers
Rule-based and dictionary-based methods are the earliest methods used in NER (Ji et al., 2019). They rely
on handcrafted rules, use named entity libraries, and assign a weight to each rule. When rules conflict,
the rule with the highest weight is selected to determine the named entity type. However, these rules
often depend on the specific language, domain, and text style (Ji et al., 2019). Alfred et al. (2014) proposed
a Malay named entity recognizer for Malay articles. Their approach is based on a rule-based part-of-speech (POS)
tagging process and contextual feature rules. Several dictionaries were also created manually to detect three
named entity types: person, location, and organization. Evaluation using standard performance metrics showed a
Recall of 94.44%, Precision of 85%, and F-score of 89.47%.
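
The conflict-resolution step described above can be illustrated with a minimal sketch. The rules, trigger
words, and weights below are hypothetical and chosen only for illustration; they are not taken from Ji et al.
(2019) or Alfred et al. (2014). Each rule proposes an entity type, and when several rules match the same span,
the one with the highest weight decides.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    pattern: str   # regular expression applied to a candidate span
    label: str     # named entity type the rule proposes
    weight: float  # hand-assigned weight (hypothetical values)


# Hypothetical rules for illustration only.
RULES = [
    Rule(r"^(Encik|Puan|Datuk)\s+[A-Z]\w+$", "PERSON", 0.9),
    Rule(r"^di\s+[A-Z]\w+$", "LOCATION", 0.7),
    Rule(r"^\w+\s+[A-Z]\w+$", "ORGANIZATION", 0.3),  # weak fallback rule
]


def classify(span):
    """Return the label of the highest-weight matching rule, or None."""
    matches = [rule for rule in RULES if re.match(rule.pattern, span)]
    if not matches:
        return None
    # Conflict resolution: the matching rule with the largest weight wins.
    return max(matches, key=lambda rule: rule.weight).label


print(classify("Encik Ahmad"))  # matched by the PERSON rule and the fallback rule
print(classify("Bank Negara"))  # matched by the fallback rule only
```

In this sketch "Encik Ahmad" matches two rules, and the higher weight of the honorific rule settles the
conflict in favour of PERSON.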
Wulandari et al. (2018) conducted research on NER for biological documents using rule-based and naïve Bayes
classifier methods. The study used 19 training documents, which were processed and manually annotated with
named entities, yielding 1,135 training instances at the word level. Pre-processing of the data includes
POS tagging and n-gram extraction. By combining the rule-based and naïve Bayes methods, the study obtained
micro-averaged Precision, Recall, and F-measure of 0.8.
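
As a reminder of what a micro average means here, the following sketch pools true positives, false positives,
and false negatives over all entity types before computing the scores. It assumes token-level evaluation with
an "O" label for non-entities; the function name and the toy labels are illustrative and do not reproduce the
evaluation setup of Wulandari et al. (2018).

```python
def micro_prf(gold, pred, ignore="O"):
    """Micro-averaged precision, recall, and F1 over token labels.

    Counts are pooled across all entity types; `ignore` marks non-entity tokens.
    """
    tp = fp = fn = 0
    for g, p in zip(gold, pred):
        if p != ignore and p == g:
            tp += 1              # correct entity label
        else:
            if p != ignore:
                fp += 1          # predicted an entity that is wrong
            if g != ignore:
                fn += 1          # missed (or mislabeled) a gold entity
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


gold = ["PER", "O", "LOC", "ORG", "O"]
pred = ["PER", "LOC", "LOC", "O", "O"]
print(micro_prf(gold, pred))
```

With these toy labels there are two true positives, one false positive, and one false negative, so all three
scores come out at 2/3.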
For Malay, Saad and Mansor (2018) conducted research on crime news documents. Their work builds a crime news
corpus sourced from BERNAMA news. Linguists manually checked the corpus to identify named entities such as
individuals, organizations, locations, dates, times, finances, percentages, crimes, and weapons. Testing of
the prototype system shows promising results, with a Recall of 78.67%, Precision of 71.11%, and F-measure
of 74.7%.
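
The three scores are linked. Assuming the F-measure reported here is the balanced harmonic mean of Precision
(P) and Recall (R), which the cited work is not quoted as stating explicitly, the figures above can be checked
as follows:

```latex
F = \frac{2 \, P \, R}{P + R}
  = \frac{2 \times 71.11 \times 78.67}{71.11 + 78.67}
  = \frac{11188.45}{149.78}
  \approx 74.70
```

The result agrees with the reported F-measure of 74.7%.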
Nadia and Omar (2019) proposed a rule-based Malay NER. Their research identifies nine named entity types:
individual name, location, organization, position, date, time, finance, measurement, and percentage.
Testing shows promising results, with a Recall of 92.13%, a Precision of 90.23%, and an F-score of 91.05%.
2.2. Machine Learning-Based Named Entity Recognizers
Machine learning methods use a statistical classification model to recognize named entities. They look for
patterns and relationships in a text, build models with statistical and machine learning algorithms, and
identify and classify nouns into several classes, such as person, location, and time (Jurafsky and Martin,
2017). Surwaningsih et al. (2014) conducted a study on Indonesian Medical Named Entity Recognition (ImNER)
using a Support Vector Machine (SVM). They used 3,000 randomly selected sentences and obtained an accuracy
of 90%. Their features cover word types, contextual characteristics of words, word writing systems, and
common word lists. In addition, they make use of medical-related word lists.
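
To make the feature groups listed above concrete, the sketch below builds a feature dictionary for one token.
It is an assumption-laden illustration, not the ImNER feature set: the function name, feature names, POS tags,
and word lists are all hypothetical, and "word type" is interpreted here as the POS tag.

```python
def token_features(tokens, i, pos_tags, common_words, medical_terms):
    """Build a feature dict for tokens[i]; all feature names are illustrative."""
    word = tokens[i]
    lower = word.lower()
    return {
        "lower": lower,
        # Word type: interpreted here as the part-of-speech tag (assumption).
        "pos": pos_tags[i],
        # Word writing system: simple orthographic cues.
        "is_capitalized": word[:1].isupper(),
        "is_all_caps": word.isupper(),
        "has_digit": any(ch.isdigit() for ch in word),
        # Contextual characteristics: the neighbouring words.
        "prev": tokens[i - 1].lower() if i > 0 else "<BOS>",
        "next": tokens[i + 1].lower() if i < len(tokens) - 1 else "<EOS>",
        # Word lists: common words and domain (medical) terms.
        "in_common_list": lower in common_words,
        "in_medical_list": lower in medical_terms,
    }


# Toy Indonesian sentence with hypothetical tags and word lists.
tokens = ["Pasien", "menderita", "demam", "tinggi"]
pos_tags = ["NN", "VB", "NN", "JJ"]
features = token_features(tokens, 2, pos_tags, {"tinggi"}, {"demam"})
print(features)
```

Dictionaries of this kind could then be vectorized and passed to an SVM classifier, for example with
scikit-learn's DictVectorizer and LinearSVC, although the cited study is not stated to have used those tools.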
Aryoyudanta et al. (2016) use the Co-Training algorithm to exploit unlabeled data in order to obtain new labeled
data. Their study uses news articles as unlabeled data and DBpedia as labeled data. The initial stage of the
research performs text pre-processing on the unlabeled data, followed by POS tagging. The purpose of POS tagging
is to find the words that are most likely to be named entities. Furthermore, they use the Co-Training algorithm
E-Proceedings of The 5th International Multi-Conference on Artificial Intelligence Technology (MCAIT 2021) [201]
Artificial Intelligence in the 4th Industrial Revolution