2. Method
A literature study was conducted to identify and map the results of previous studies related to a given theme.
A good literature study also produces a map of knowledge about a research topic that can guide researchers to
dig deeper into areas that are not yet mature (Fisch & Block, 2018). The literature in this study was collected
from Google Scholar using the keyword "Named Entity Recognition". The literature on named-entity recognition
was then selected according to two criteria: 1) the approach used must be a machine learning approach, and
2) the publication year must be between 2018 and 2020. This staged selection yielded seven pieces of
literature, which are used as material for comparison.
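The two selection criteria can be read as a simple filter over the search results. The following is a minimal sketch under stated assumptions: the `Paper` record, the `select_papers` function, and the placeholder entries are hypothetical and are not taken from the study, which does not describe how the screening was carried out in practice.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Paper:
    title: str                    # placeholder title, not a real publication
    year: int                     # publication year
    uses_machine_learning: bool   # criterion 1: machine learning approach


def select_papers(candidates, first_year=2018, last_year=2020):
    """Keep papers that use machine learning and fall inside the year window."""
    return [
        p for p in candidates
        if p.uses_machine_learning and first_year <= p.year <= last_year
    ]


# Hypothetical search results, for illustration only.
candidates = [
    Paper("Placeholder A", 2017, True),    # excluded: published before 2018
    Paper("Placeholder B", 2019, True),    # kept
    Paper("Placeholder C", 2020, False),   # excluded: not a machine learning approach
    Paper("Placeholder D", 2018, True),    # kept
]

selected = select_papers(candidates)
print([p.title for p in selected])
```

Both year bounds are inclusive here, matching the wording "from the year 2018 to 2020".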
3. Results and Discussion
The process of analyzing and extracting information from large amounts of unstructured text or documents using
Artificial Intelligence algorithms is often referred to as text mining. One part of text mining is named-entity
recognition, which can be used in various fields such as economics, health, social affairs, politics, or culture.
Of the seven pieces of literature analyzed in this study, six apply named-entity recognition in the health
sector, especially in biomedicine and medicine. The exception is the research of Wintaka et al., which used data
from Twitter to identify entity names, location names, and organization names (Wintaka et al., 2019). The pieces
of literature used in this research are shown in Table 1.
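To make the recognition task concrete, named-entity recognition is commonly framed as assigning one label to each token. The sketch below assumes the widely used BIO scheme (B = beginning of an entity, I = inside, O = outside); the sentence, the helper `spans_to_bio`, and the tag names PER, ORG, and LOC are illustrative assumptions and are not drawn from the reviewed literature.

```python
def spans_to_bio(tokens, spans):
    """Convert (start, end, label) token spans (end exclusive) into BIO tags."""
    tags = ["O"] * len(tokens)
    for start, end, label in spans:
        tags[start] = f"B-{label}"          # first token of the entity
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"          # remaining tokens of the entity
    return tags


# Made-up sentence; the labels PER/ORG/LOC are assumed tag names.
tokens = ["Siti", "joined", "Example", "Corp", "in", "Jakarta"]
spans = [(0, 1, "PER"), (2, 4, "ORG"), (5, 6, "LOC")]

tags = spans_to_bio(tokens, spans)
for token, tag in zip(tokens, tags):
    print(f"{token}\t{tag}")
```

A sequence model such as those listed in Table 1 would then be trained to predict the tag column from the token column.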
The health sector, especially the pharmaceutical industry, requires research on the recognition of named
entities, especially medicine entities. The pharmaceutical industry closely monitors how a particular medicine
interacts with other medicines in order to protect patients from side effects caused by drug interactions
(Chukwuocha et al., 2018). The biomedical field also has a very large corpus and requires information
extraction to reduce the ambiguity that arises when several different entities share the same acronym.
Furthermore, several biomedical entities use prefixes and suffixes inconsistently (Cho et al., 2020).
Table 1. Literature Review Data based on Dataset
No | Ref | Year | Object / Dataset | Machine learning
1 | (Chukwuocha et al., 2018) | 2018 | Medicine names / PubMed dataset | Conditional Random Field (CRF) and Naive Bayes (NB)
2 | (Phan et al., 2019) | 2019 | Biomedical texts / BioNLP 2004 Challenge dataset | Convolutional Neural Network (CNN) and Recurrent Neural Network (RNN)
3 | (Casillas et al., 2019) | 2019 | Medical Online Corpus (GEN-MED); IXAMed Spanish EHR Corpus (EHR) | Bidirectional Long Short-Term Memory (Bi-LSTM) and Conditional Random Field (CRF)
4 | (Suárez-Paniagua et al., 2019) | 2019 | eHealth-KD dataset | Bidirectional Long Short-Term Memory (Bi-LSTM) and Conditional Random Field (CRF)
5 | (Wintaka et al., 2019) | 2019 | 600 manually-labeled tweets in Bahasa Indonesia from Twitter social media | Bidirectional Long Short-Term Memory (Bi-LSTM) and Support Vector Machine (SVM)
6 | (Gligic et al., 2019) | 2019 | Informatics for Integrating Biology & the Bedside – i2b2 dataset (2007-2012) | Feed-Forward Neural Network (FFN), Recurrent Neural Network (RNN), and Bidirectional Long Short-Term Memory
E- Proceedings of The 5th International Multi-Conference on Artificial Intelligence Technology (MCAIT 2021) [175]
Artificial Intelligence in the 4th Industrial Revolution