
dataset was labeled as positive, neutral, or negative sentiment based on the polarity score, also using RapidMiner software.
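As an illustration of this labeling step, the sketch below shows how polarity-based labels could be assigned with the VADER lexicon (one of the two lexicon approaches used in this work). It is a minimal sketch outside RapidMiner, assuming plain-text tweets and the usual ±0.05 compound-score thresholds; the exact operator settings used in the project are not reproduced here.

```python
# Minimal sketch: label tweets as positive/neutral/negative from the VADER polarity score.
# Thresholds follow the common VADER convention (assumption, not the project's exact settings).
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

analyzer = SentimentIntensityAnalyzer()

def label_sentiment(tweet: str) -> str:
    compound = analyzer.polarity_scores(tweet)["compound"]
    if compound >= 0.05:
        return "positive"
    if compound <= -0.05:
        return "negative"
    return "neutral"

tweets = ["The food here is certified halal and delicious!",
          "Not sure about this place.",
          "Terrible service, avoid it."]
print([(t, label_sentiment(t)) for t in tweets])
```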

        3.2  Modelling and System Development

   Three machine learning classifiers were applied and their accuracy and overall performance compared using RapidMiner software. The classifiers chosen were SVM, Deep Learning, and Naïve Bayes, and the dataset was split into a 90:10 training-to-testing ratio. The performance metrics considered in this research were accuracy and F1-score. Subsequently, the results were published through a dashboard created with Power BI and through a web-based system named Halalopedia built with PHP and HTML.
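The modelling itself was done in RapidMiner; as a rough, hypothetical scikit-learn equivalent of the same evaluation step (90:10 split, three classifiers, accuracy and F1-score), one could write something like the sketch below. The `tweets` and `labels` variables stand in for the labeled Twitter dataset, and `MLPClassifier` is only a stand-in for RapidMiner's Deep Learning operator.

```python
# Hypothetical scikit-learn sketch of the evaluation workflow (the project used RapidMiner).
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC
from sklearn.naive_bayes import MultinomialNB
from sklearn.neural_network import MLPClassifier   # rough stand-in for the Deep Learning operator
from sklearn.metrics import accuracy_score, f1_score

# 90:10 training-to-testing split, stratified on the sentiment labels
X_train, X_test, y_train, y_test = train_test_split(
    tweets, labels, test_size=0.10, stratify=labels, random_state=42)

vectorizer = TfidfVectorizer(ngram_range=(1, 3))   # e.g. the [1,3] N-gram range
X_train_vec = vectorizer.fit_transform(X_train)
X_test_vec = vectorizer.transform(X_test)

for name, clf in [("SVM", LinearSVC()),
                  ("Naive Bayes", MultinomialNB()),
                  ("Deep Learning (MLP)", MLPClassifier(max_iter=500))]:
    clf.fit(X_train_vec, y_train)
    pred = clf.predict(X_test_vec)
    print(name,
          "accuracy:", round(accuracy_score(y_test, pred), 4),
          "macro F1:", round(f1_score(y_test, pred, average="macro"), 4))
```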

        4.  Results and Discussion

        4.1  Machine Learning Classifiers Performance

   In our results, SVM and Naïve Bayes work better with the VADER approach and the [1,3] N-gram range, reaching 68.44% and 67.03% accuracy respectively. Meanwhile, Deep Learning works best with the SentiWordNet approach on the [1,2] N-gram range. In general, the [1,2] range gives the lowest accuracy, except for SVM + SentiWordNet and NB + SentiWordNet. Comparing the accuracy of each model, Deep Learning achieved the highest accuracy, 73.18%, using the [1,2] N-gram range with SentiWordNet. In contrast, SVM and Naïve Bayes achieved their highest F1-scores when used with the SentiWordNet approach, at 56.28% and 57% respectively, whereas Deep Learning scored highest with the VADER approach at 58.16%. However, the highest F1-score for every classifier was obtained on the [2,3] N-gram range. Since Deep Learning achieved the highest values for both the accuracy and F1-score metrics, it can be concluded that the Deep Learning classifier is the best model for this project.
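The comparison across N-gram ranges described above could be sketched as a simple loop over the three ranges, again using scikit-learn rather than RapidMiner and the placeholder split from the earlier sketch; the numbers it prints would not be the figures reported here.

```python
# Sketch: compare accuracy and macro F1 across the three N-gram ranges used in the experiments.
# Assumes X_train, X_test, y_train, y_test from the earlier 90:10 split.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC
from sklearn.metrics import accuracy_score, f1_score

for ngram_range in [(1, 2), (1, 3), (2, 3)]:
    vec = TfidfVectorizer(ngram_range=ngram_range)
    Xtr, Xte = vec.fit_transform(X_train), vec.transform(X_test)
    pred = LinearSVC().fit(Xtr, y_train).predict(Xte)
    print(ngram_range,
          "accuracy:", round(accuracy_score(y_test, pred), 4),
          "macro F1:", round(f1_score(y_test, pred, average="macro"), 4))
```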
   To summarize, Deep Learning is found to produce better accuracy than SVM and Naïve Bayes, with the highest accuracy of 73.18%. Deep Learning results are also noticeably better when combined with the SentiWordNet approach. The only downside of Deep Learning is its processing time, which is the longest at approximately 20 minutes, while the other classifiers take between 10 and 15 minutes. Additionally, the results indicate that Deep Learning can cope with the short-text nature of the dataset. However, the confusion matrix shows that the class labels of the dataset used in this project are imbalanced. Therefore, accuracy might not be the best performance measure in this case, as accuracy on imbalanced data can give a misleading picture of the sentiment analysis performance. The F1-score is a better metric when the data is imbalanced because it takes both the precision and recall of each class into account. Considering the F1-score, the model with the highest score is regarded as the best model for the imbalanced data, as it predicts better in multiclass classification.
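The effect of imbalance can be illustrated with a small toy example: a classifier that always predicts the majority class gets a flattering accuracy but a low macro F1-score. The figures below come from the toy labels, not from this project's dataset.

```python
# Toy illustration: accuracy flatters a majority-class predictor, macro F1 does not.
from sklearn.metrics import accuracy_score, f1_score

y_true = ["positive"] * 8 + ["neutral"] + ["negative"]   # imbalanced labels
y_pred = ["positive"] * 10                               # always predict the majority class

print("accuracy:", accuracy_score(y_true, y_pred))                        # 0.8
print("macro F1:", round(f1_score(y_true, y_pred, average="macro"), 3))   # ~0.296
```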

4.2  Results Discussion

   Based on the experiments conducted, it can be seen that the pre-processing phase is very important and needs to be done thoroughly, because tweets contain many slang, dialect, and short-form words. Nevertheless, some slang and Malay words could not be detected during data pre-processing, which is one of the reasons why this project did not achieve accuracy above 80%. Therefore, a variety of N-gram ranges was used in this project to improve the sentiment analysis performance.
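A minimal sketch of the kind of slang and short-form normalization meant here is given below; the dictionary entries and the normalize_tweet helper are illustrative assumptions, not the actual mapping or cleaning pipeline used in the project.

```python
# Sketch: normalize common Twitter slang and Malay short forms before tokenization.
# The SLANG_MAP below is only an illustrative assumption.
import re

SLANG_MAP = {
    "x": "tidak",       # Malay short form for "not"
    "mkn": "makan",     # "eat"
    "sedap2": "sedap",  # reduplicated "delicious"
    "gr8": "great",
}

def normalize_tweet(text: str) -> str:
    text = text.lower()
    text = re.sub(r"http\S+|@\w+", " ", text)   # strip URLs and mentions
    text = re.sub(r"[^\w\s]", " ", text)        # strip punctuation and hashtag marks
    tokens = [SLANG_MAP.get(tok, tok) for tok in text.split()]
    return " ".join(tokens)

print(normalize_tweet("Mkn kat sini gr8, sedap2! x rugi https://t.co/xyz"))
# -> "makan kat sini great sedap tidak rugi"
```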

        5.  Conclusion and Future Works

   In conclusion, this project managed to accomplish all the research objectives stated earlier. The sentiment analysis results are able to analyze and classify tweets on food reviews into halal, non-halal and pork-






