dataset was labeled with positive, neutral and negative sentiments based on the polarity score, also using
RapidMiner software.
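The labeling rule is not spelled out beyond the use of a polarity score. As a rough illustration only, the sketch below maps a score in [-1, 1] to the three labels using ±0.05 thresholds, the cut-offs commonly recommended for VADER's compound score. The function name `label_from_polarity` and the threshold value are assumptions, not settings reported for this project.

```python
def label_from_polarity(score: float, threshold: float = 0.05) -> str:
    """Map a polarity score in [-1, 1] to a three-way sentiment label.

    The +/-0.05 cut-off follows the convention commonly used with VADER's
    compound score; it is an assumption here, not a value from the paper.
    """
    if score >= threshold:
        return "positive"
    if score <= -threshold:
        return "negative"
    return "neutral"


scores = [0.62, 0.01, -0.40, 0.05, -0.049]
print([label_from_polarity(s) for s in scores])
# ['positive', 'neutral', 'negative', 'positive', 'neutral']
```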
3.2 Modelling and System Development
Three machine learning classifiers were applied using RapidMiner software to compare their accuracy and
performance. The classifiers chosen were SVM, Deep Learning and Naïve Bayes, and the dataset was split into
training and test sets at a 90:10 ratio. The performance metrics considered in this research were accuracy and
F1-score. Subsequently, the results were published through a dashboard created using Power BI software and
through a web-based system named Halalopedia, built using PHP and HTML.
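The 90:10 split can be sketched as below. The paper does not state whether the split was shuffled, stratified or seeded, so a seeded random shuffle is assumed here, and `train_test_split_90_10` is a hypothetical helper rather than a RapidMiner operator.

```python
import random


def train_test_split_90_10(rows, seed=42):
    """Shuffle a copy of the rows and split them 90:10 into train and test sets.

    A plain seeded random split is assumed; the paper does not say whether
    the sampling was shuffled or stratified.
    """
    shuffled = list(rows)                 # copy so the caller's list is untouched
    random.Random(seed).shuffle(shuffled)
    cut = len(shuffled) * 9 // 10         # integer arithmetic: first 90% for training
    return shuffled[:cut], shuffled[cut:]


tweets = [f"tweet {i}" for i in range(1000)]
train, test = train_test_split_90_10(tweets)
print(len(train), len(test))  # 900 100
```

A stratified split, which keeps the class proportions equal in both sets, would be a reasonable alternative given the class imbalance discussed in Section 4.1.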
4. Results and Discussion
4.1 Machine Learning Classifiers Performance
In our results, SVM and Naïve Bayes performed best with the VADER approach and the [1,3] N-gram range,
achieving 68.44% and 67.03% accuracy, respectively. Meanwhile, deep learning performed best with the
SentiWordNet approach on the [1,2] N-gram range. In general, the [1,2] range gives the lowest accuracy, except
for SVM + SentiWordNet and Naïve Bayes (NB) + SentiWordNet. Comparing the accuracy of all models, deep
learning achieved the highest accuracy, 73.18%, using the [1,2] N-gram range and SentiWordNet. In contrast,
SVM and Naïve Bayes achieved their highest F1-scores with the SentiWordNet approach (56.28% and 57%,
respectively), whereas deep learning scored highest with the VADER approach (58.16%). Each classifier
achieved its highest F1-score on the [2,3] N-gram range. Since deep learning achieved both the highest accuracy
and the highest F1-score, it can be concluded that the deep learning classifier is the best model for this
project.
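An N-gram range [a,b] is read here as generating every contiguous sequence of a to b tokens, in the same sense as the `ngram_range` parameter of scikit-learn's `CountVectorizer`; whether RapidMiner builds its features exactly this way is an assumption. The hypothetical helper below shows one reason longer ranges can help: with [2,3], the negation "not halal" survives as a single feature.

```python
def ngrams_in_range(tokens, n_min, n_max):
    """Return all contiguous n-grams with n_min <= n <= n_max, joined by spaces.

    A range such as [1,3] therefore yields unigrams, bigrams and trigrams.
    """
    features = []
    for n in range(n_min, n_max + 1):
        # slide a window of length n across the token list
        for i in range(len(tokens) - n + 1):
            features.append(" ".join(tokens[i:i + n]))
    return features


tokens = "food is not halal".split()
print(ngrams_in_range(tokens, 2, 3))
# ['food is', 'is not', 'not halal', 'food is not', 'is not halal']
```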
To summarize, deep learning produced better accuracy than SVM and Naïve Bayes, with a highest accuracy of
73.18%. Deep learning also achieved its best accuracy with the SentiWordNet approach. The only downside of
deep learning is its processing time, which was the longest at approximately 20 minutes, whereas the other
classifiers took between 10 and 15 minutes. Additionally, the results show that deep learning can overcome the
problem of a dataset made up of short texts. However, the confusion matrix shows that the dataset used in this
project is imbalanced across the class labels. Therefore, accuracy might not be the best performance measure
in this case, since accuracy on imbalanced data can misrepresent the performance of the sentiment analysis.
The F1-score is a more suitable metric for imbalanced data because it takes both precision and recall into
account. Based on the F1-score, the model with the highest score is considered the best for imbalanced data,
as it predicts better across all classes in this multiclass classification task.
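The gap between accuracy and F1-score on imbalanced data can be illustrated with a small worked example. The confusion matrix below is invented for illustration and is not the project's data; macro-averaged F1 is assumed because the paper does not state which averaging was used.

```python
def accuracy_and_macro_f1(cm):
    """Compute accuracy and macro-averaged F1 from a square confusion matrix.

    cm[i][j] counts samples whose true class is i and predicted class is j.
    """
    k = len(cm)
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(k))
    f1_scores = []
    for c in range(k):
        tp = cm[c][c]
        predicted = sum(cm[r][c] for r in range(k))  # column sum: predicted as c
        actual = sum(cm[c])                          # row sum: truly c
        precision = tp / predicted if predicted else 0.0
        recall = tp / actual if actual else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        f1_scores.append(f1)
    return correct / total, sum(f1_scores) / k


# Hypothetical, imbalanced 3-class matrix: rows/cols = positive, neutral, negative
cm = [
    [5, 15, 0],
    [2, 70, 3],
    [0, 3, 2],
]
acc, macro_f1 = accuracy_and_macro_f1(cm)
print(round(acc, 2), round(macro_f1, 2))  # 0.77 0.54
```

Here the model mostly predicts the majority neutral class, so accuracy is 0.77 while macro F1 is only about 0.54, because recall on the minority positive and negative classes is poor.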
4.2 Results Discussion
The experiments show that the pre-processing phase is very important and needs to be done thoroughly,
because tweets contain many slang, dialect and abbreviated words. Even so, some slang and Malay words could
not be detected during data pre-processing, which is one reason this project did not achieve accuracy above
80%. Therefore, a variety of N-gram ranges was used in this project to improve sentiment analysis
performance.
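A dictionary-based slang normalization step of the kind discussed above might look like the sketch below. The slang dictionary, the cleaning rules and the function name `normalize_tweet` are all hypothetical; the paper does not list its actual pre-processing rules. Words missing from the dictionary, such as "gila" in the example, pass through unchanged, which is the limitation described in this section.

```python
import re

# Hypothetical slang dictionary for illustration only; the project's actual
# normalization list is not given in the paper.
SLANG = {
    "tak": "tidak",   # Malay: "not"
    "x": "tidak",     # texting shorthand for "tidak"
    "sdp": "sedap",   # Malay: "delicious"
    "gud": "good",
    "u": "you",
}


def normalize_tweet(text: str) -> list[str]:
    """Lower-case, drop URLs/mentions and '#' symbols, and expand known slang."""
    text = text.lower()
    text = re.sub(r"https?://\S+|@\w+", " ", text)  # remove URLs and @mentions
    text = text.replace("#", " ")                  # keep hashtag words, drop '#'
    tokens = re.findall(r"[a-z0-9']+", text)
    # Tokens not in the dictionary (e.g. unseen slang) are left unchanged.
    return [SLANG.get(tok, tok) for tok in tokens]


print(normalize_tweet("Nasi lemak ni sdp gila, tapi x halal @kedai https://t.co/abc"))
# ['nasi', 'lemak', 'ni', 'sedap', 'gila', 'tapi', 'tidak', 'halal']
```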
5. Conclusion and Future Works
In conclusion, this project accomplished all the research objectives stated earlier. The sentiment analysis is
able to analyze the tweets on food reviews and classify them into halal, non-halal and pork-
E- Proceedings of The 5th International Multi-Conference on Artificial Intelligence Technology (MCAIT 2021) [130]
Artificial Intelligence in the 4th Industrial Revolution