Page 49 - The-5th-MCAIT2021-eProceeding
features and using these features to train a Bayesian Linear Ridge Regression (BLRR) model, which scored an
average Quadratic Weighted Kappa (QWK) of 0.7045. The EASE engine organizes its features into four groups:
length, part-of-speech (PoS), bag of words (BoW), and prompt. It is often used as the baseline feature
engineering comparison in other research projects (Eid & Wanas, 2017; Latifi & Gierl, 2020;
Janda et al., 2019), as it was developed by one of the top three winners of the Automated Student Assessment
Prize (ASAP) competition. Hence, the EASE engine is considered a robust baseline engine for AES.
Coh-Metrix is a system proposed by Graesser et al. (2004) that integrates various software modules to
extract features based on language, discourse, cohesion, and world knowledge (McNamara et al., 2010).
Latifi & Gierl (2020) used the ASAP dataset to build a random forest model based on Coh-Metrix features,
which scored an average QWK of 0.7.
Eid & Wanas (2017) proposed focusing on lexical features for AES, gathering 22 lexical features from
three other studies of lexical features; their approach scored an average QWK of 0.684.
Janda et al. (2019) proposed three main groups of features (syntactic, semantic, and sentiment) consisting of
30 features, and applied several feature selection techniques to select the top features. These were then used
in a three-layer neural network classification model that achieved an average QWK of 0.793.
From our review of related work, the results reported by Phandi et al. (2015) using the EASE feature
extractor provide a good baseline for investigation. We therefore base our evaluation on Phandi et al. (2015)
in order to identify the influential feature groups.
3. Evaluation Methodology
3.1. Data preprocessing
We use essay set 2 from the dataset released for the ASAP competition so that we can focus our
investigation. We extract features from the dataset using the functions provided by EASE. Following
Phandi et al. (2015), the features generated by EASE fall into four feature groups: length, part of speech
(PoS), prompt, and bag of words (BoW). We also followed Phandi et al.'s (2015) data preprocessing method
to reproduce their EASE results as closely as possible.
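To illustrate the kind of feature matrix produced at this stage, the following is a minimal stand-in sketch of two of the EASE-style feature groups (length and bag of words; the PoS and prompt groups are omitted). The essays, the specific length statistics, and the use of scikit-learn's `CountVectorizer` are illustrative assumptions, not EASE's actual API.

```python
# Illustrative sketch of EASE-style feature extraction (NOT EASE's real API).
# Length group + bag-of-words group, concatenated into one feature matrix.
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer

essays = [
    "The quick brown fox jumps over the lazy dog.",
    "A short essay with few words.",
]

# Length group: word count and average word length per essay
length_feats = np.array(
    [[len(e.split()), np.mean([len(w) for w in e.split()])] for e in essays]
)

# Bag-of-words group: token counts over the vocabulary of the corpus
bow = CountVectorizer().fit_transform(essays).toarray()

# Full feature matrix: one row per essay, groups side by side
features = np.hstack([length_feats, bow])
print(features.shape)
```

In the actual pipeline each group would be extracted by EASE's own functions; keeping the groups as separable column blocks is what makes the later "exclude one feature group" experiments straightforward.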
3.2. Learning algorithm and evaluation metric
Multinomial Naïve Bayes (NB) was added on top of the learning algorithms Support Vector Machine (SVM)
and BLRR that were applied in Phandi et al. (2015). NB is known to be suitable for multinomially distributed
data such as that found in short-text classification. As reported by Phandi et al. (2015), BLRR has often been
shown to provide good results in natural language processing tasks. SVM regression was selected as the
comparison against BLRR. The learning algorithms were implemented in the Python programming language
(version 3.8) using the scikit-learn library.
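The three learners can be instantiated directly from scikit-learn; a minimal sketch with default hyperparameters and synthetic stand-in data follows (the feature matrix, score range, and model settings are assumptions for illustration, not the study's exact configuration). BLRR corresponds to scikit-learn's `BayesianRidge`.

```python
# Sketch of the three learners compared in this study (scikit-learn defaults;
# synthetic data stands in for the EASE feature matrix and essay scores).
import numpy as np
from sklearn.naive_bayes import MultinomialNB
from sklearn.svm import SVR
from sklearn.linear_model import BayesianRidge  # Bayesian Linear Ridge Regression

rng = np.random.default_rng(0)
X = rng.integers(0, 5, size=(100, 20)).astype(float)  # non-negative counts for NB
y = rng.integers(1, 7, size=100)                      # essay scores on a 1-6 scale

models = {
    "Multinomial NB": MultinomialNB(),  # treats each score as a class
    "SVM regression": SVR(),            # regression counterpart to BLRR
    "BLRR": BayesianRidge(),
}
for name, model in models.items():
    model.fit(X, y)
    preds = model.predict(X[:5])
    print(name, np.round(preds, 2))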
To evaluate the trained models, QWK was used to measure the agreement between two raters: the human
rater and the trained model. It accounts for the possibility of agreement occurring by chance (Vanbelle &
Albert, 2009). QWK is the official evaluation metric of the ASAP competition, and the work of Phandi et al.
(2015), Latifi & Gierl (2020), and Janda et al. (2019) also uses QWK for evaluation.
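QWK can be computed with scikit-learn's `cohen_kappa_score` using quadratic weights; the rating vectors below are hypothetical examples, not values from the dataset.

```python
# QWK: quadratically weighted agreement between human scores and model scores.
from sklearn.metrics import cohen_kappa_score

# Hypothetical ratings on the essay score scale (not real ASAP data)
human = [4, 3, 5, 2, 4, 3]
model = [4, 4, 5, 2, 3, 3]

# weights="quadratic" penalizes disagreements by the squared score distance
qwk = cohen_kappa_score(human, model, weights="quadratic")
print(round(qwk, 4))  # 0.8182 for these ratings
```

A QWK of 1 indicates perfect agreement and 0 indicates agreement no better than chance, which is why it is preferred over raw accuracy for ordinal essay scores.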
3.3. Experimental Setup
The pre-processed data will be duplicated into four sets for the purpose of generating the "Exclude one
feature group" datasets, where each of these new datasets excludes one feature group, and these are referred to as
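The "exclude one feature group" construction can be sketched as follows, assuming the features are held as one matrix per group (the group names and matrix shapes below are hypothetical placeholders).

```python
# Sketch of the "exclude one feature group" datasets: for each group, build a
# dataset from all remaining groups. Shapes and names are illustrative only.
import numpy as np

groups = {
    "length": np.ones((4, 2)),
    "pos": np.ones((4, 3)),
    "prompt": np.ones((4, 1)),
    "bow": np.ones((4, 5)),
}

# One ablation dataset per excluded group
ablation_sets = {
    f"exclude_{left_out}": np.hstack(
        [m for name, m in groups.items() if name != left_out]
    )
    for left_out in groups
}

for name, X in ablation_sets.items():
    print(name, X.shape)
```

Comparing each ablated dataset's QWK against the full-feature baseline then reveals how much each group contributes to scoring performance.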