Page 142 - The-5th-MCAIT2021-eProceeding
2. Literature Review
2.1 Halal Food Certification in Malaysia
Nowadays, Muslims can choose from a wide variety of halal products, food, beverages, and services.
However, Nurrachmi (2018) reported that halal food suppliers mostly come from non-Muslim countries
such as New Zealand, Australia, France, and Canada. This shows that countries with smaller Muslim populations
are well aware of halal sources. In Malaysia, halal goods are recognized by looking for a halal logo issued
by JAKIM or any other halal-certifying organization. In addition, a considerable amount of literature on halal
food has been published. One study found that roughly 70% of Muslims around the world adhere to
at least some of the halal food restrictions (Ahmad et al., 2018).
2.2 Sentiment Analysis
Sentiment refers to the opinions expressed by individuals, which contain feelings, attitudes, and thoughts.
Sentiment analysis analyses textual content using natural language processing and classifies it as positive,
negative, or neutral (Hassan 2019). It is broadly applied to analyze how people feel about something based
on the sentiments they express. According to Chen & Zhang (2018), sentiment analysis generally uses natural language
processing (NLP), text interpretation, machine learning, computational linguistics, and other approaches to
interpret, process, and trigger emotionally colored messages. The two most widely used approaches to
sentiment analysis are the machine learning approach and the lexicon-based approach (Sarlan et al. 2015).
● Machine learning approach: According to Hasan et al. (2018), the machine learning approach is
essentially intended to classify textual content by applying algorithms such as Naïve Bayes and support
vector machine (SVM). Naïve Bayes, deep learning, and support vector machine are examples of
supervised machine learning algorithms, while k-means is an unsupervised algorithm. The goal of
supervised learning is to predict the final outcome variable from the predictor variables. Moreover,
supervised learning aims at automating time-consuming or costly manual tasks (Mittal & Patidar 2019).
● Lexicon-based approach: Lexicon-based approaches are a form of unsupervised learning.
With this approach, the words in a tweet are matched against the positive and negative words in a dictionary.
These techniques, however, depend entirely on lexical resources that map
words to categorical or numerical sentiment scores. Lexicon-based approaches therefore require
no training data and depend solely on the dictionary. The sentiment lexicon comprises an index of words
together with the polarity of each term, whether positive or negative. However, a
limitation of the lexicon dictionary is that not every word in a text can be assigned a value (Sarlan
et al. 2015).
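The contrast between the two approaches listed above can be illustrated with a minimal sketch. It is not the implementation used in this study: the toy lexicon, the labelled example sentences, and the names `tokenize`, `lexicon_sentiment`, and `NaiveBayes` are all hypothetical and chosen only for illustration. The lexicon scorer needs no training data and simply ignores words that are missing from the dictionary, whereas the Naïve Bayes classifier has to be fitted on labelled examples first.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy lexicon (an assumption): word -> numerical sentiment score.
LEXICON = {"delicious": 1, "clean": 1, "great": 1,
           "bad": -1, "dirty": -1, "slow": -1}


def tokenize(text):
    """Lowercase, split on whitespace, and strip basic punctuation."""
    tokens = (w.strip(".,!?#@") for w in text.lower().split())
    return [t for t in tokens if t]


def lexicon_sentiment(text, lexicon=LEXICON):
    """Lexicon-based: sum word scores; words not in the lexicon count as 0."""
    score = sum(lexicon.get(tok, 0) for tok in tokenize(text))
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


class NaiveBayes:
    """Supervised: multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        best_label, best_logp = None, -math.inf
        for label, n_docs in self.class_counts.items():
            total_words = sum(self.word_counts[label].values())
            logp = math.log(n_docs / total_docs)  # class prior
            for tok in tokenize(text):
                if tok not in self.vocab:
                    continue  # skip words never seen during training
                count = self.word_counts[label][tok]
                logp += math.log((count + 1) / (total_words + len(self.vocab)))
            if logp > best_logp:
                best_label, best_logp = label, logp
        return best_label


# Hypothetical labelled examples; a real study would use annotated tweets.
train_texts = [
    "the halal restaurant was great and clean",
    "delicious halal food",
    "bad service and dirty tables",
    "slow and bad food",
]
train_labels = ["positive", "positive", "negative", "negative"]
model = NaiveBayes().fit(train_texts, train_labels)

print(lexicon_sentiment("Delicious halal food!"))
print(model.predict("dirty and slow"))
```

A text such as "halal restaurant near me" contains no lexicon words, so the lexicon scorer falls back to neutral, which mirrors the limitation noted above that not every word can be assigned a value.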
3. Methodology
3.1 Data Collection and Pre-processing
Tweets were collected by scraping Twitter with the TWINT module, using the Twitter search function to find
tweets related to halal food and restaurants from recent years. This project did not use the Twitter API for data collection
because, although it is the most conventional method to extract data from Twitter, it has many limitations, such as
a limited time span and limited access to the Twitter server. The data scraped using TWINT totalled approximately
72,000 tweets. Tweets were also collected using the keywords identified. The dataset consists of details such as tweet
ID, time and date of the tweet, and location of the tweet. Several data pre-processing activities were conducted to
obtain the cleaned dataset, such as data transformation, filtering, tokenization, normalization, and application
of N-grams. Duplicate tweets were also removed from the dataset using RapidMiner software. And finally, the
E-Proceedings of The 5th International Multi-Conference on Artificial Intelligence Technology (MCAIT 2021) [129]
Artificial Intelligence in the 4th Industrial Revolution