
2.  Literature Review

        2.1  Halal Food Certification in Malaysia

           Nowadays, Muslims have a wide variety of halal products, food, beverages, and services to choose from.
        However, Nurrachmi (2018) reported that halal food suppliers mostly come from non-Muslim countries
        such as New Zealand, Australia, France, and Canada. This shows that countries with smaller Muslim populations
        are well aware of halal sources. In Malaysia, halal goods are recognized by looking for a halal logo issued
        by JAKIM or another halal certification body. A considerable amount of literature on halal
        food has also been published. One study found that roughly 70% of Muslims around the world adhere to
        at least some halal food restrictions (Ahmad et al., 2018).

        2.2  Sentiment Analysis

           Sentiment refers to the opinions expressed by individuals, which contain feelings, attitudes, and thoughts.
        Sentiment analysis analyses textual content using natural language processing and classifies it as positive,
        negative, or neutral (Hassan 2019). It is broadly applied to analyze how people feel about a subject based
        on the opinions they express. According to Chen & Zhang (2018), sentiment analysis generally uses natural language
        processing (NLP), text interpretation, machine learning, computational linguistics, and other approaches to
        interpret, process and trigger emotionally colored messages. The two most widely used methods for conducting
        sentiment analysis are the machine learning approach and the lexicon-based approach (Sarlan et al. 2015).
           ●  Machine learning approach: According to Hasan et al. (2018), the machine learning approach is
              essentially intended to identify textual content by applying algorithms such as naïve Bayes and support
              vector machine (SVM). Naïve Bayes, deep learning, and support vector machine are examples of
              supervised machine learning algorithms, while k-means is an unsupervised algorithm. The goal of
              supervised learning is to predict the outcome variable from the predictor variables. Moreover,
              supervised learning aims at automating time-consuming or costly manual tasks (Mittal & Patidar 2019).
           ●  Lexicon-based approach: Lexicon-based approaches are a form of unsupervised learning.
              With this approach, the words in a tweet are matched against the positive and negative words in a
              dictionary. These techniques, however, depend entirely on lexical resources that map
              words to categorical or numerical sentiment scores. Additionally, lexicon-based approaches require
              no training data and depend solely on the dictionary. The sentiment lexicon comprises an index of words
              and contains the polarity of each term, whether positive or negative. However, a
              limitation of a lexicon dictionary is that not every word in a sentence can be assigned a value (Sarlan
              et al. 2015).
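
        The lexicon-based matching described above can be illustrated with a minimal sketch. It is not the
        lexicon or tool used in this study: the word list, its scores, and the function names (tokenize,
        score_tweet, classify) are hypothetical and chosen only for illustration. The sketch sums the polarity
        of the words found in the dictionary and also returns the words that could not be matched, which
        reflects the limitation noted by Sarlan et al. (2015).

```python
import re

# Hypothetical toy lexicon (assumption): word -> polarity score.
# A real sentiment lexicon would contain thousands of scored entries.
LEXICON = {
    "good": 1, "delicious": 1, "clean": 1, "love": 1,
    "bad": -1, "dirty": -1, "expensive": -1, "doubtful": -1,
}


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def score_tweet(text, lexicon=LEXICON):
    """Return (score, unmatched): summed polarity and the words not in the lexicon."""
    tokens = tokenize(text)
    score = sum(lexicon.get(token, 0) for token in tokens)
    unmatched = [token for token in tokens if token not in lexicon]
    return score, unmatched


def classify(text, lexicon=LEXICON):
    """Map the summed polarity score to a positive, negative, or neutral label."""
    score, _ = score_tweet(text, lexicon)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```

        For example, classify("The halal restaurant was clean and delicious") returns "positive" because two
        words match positive entries, while a tweet with no matching words is labelled "neutral". No training
        data is needed, but every word outside the dictionary contributes nothing to the score.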

        3.  Methodology

        3.1  Data Collection and Pre-processing

           Tweets were collected by scraping Twitter with the TWINT module, using the Twitter search function
        for content related to halal food and restaurants from recent years. This project did not use the Twitter API
        for data collection because, even though it is the most conventional method to extract data from Twitter, it has
        many limitations, such as a limited time span and limited access to the Twitter server. Approximately 72,000
        tweets were scraped using TWINT. Tweets were also collected using the keywords identified. The dataset consists
        of details such as tweet ID, time and date of the tweet, and location of the tweet. Several data pre-processing
        activities were conducted to obtain the cleaned dataset, such as data transformation, filtering, tokenization,
        normalization, and application of N-grams. Duplicate tweets were also removed from the dataset using RapidMiner
        software. And finally, the







        E- Proceedings of The 5th International Multi-Conference on Artificial Intelligence Technology (MCAIT 2021)   [129]
        Artificial Intelligence in the 4th Industrial Revolution