According to Lehečka, Švec, Ircing, and Šmídl (2020), the BERT architecture contains a large number of parameters, and it can be adapted to any task by tuning them. One way of suiting the BERT architecture to a particular task is to alter the learning rate. However, how the learning rate should be altered is a highly domain-specific issue in which the task at hand (i.e., AES) must be considered (Howard & Ruder, 2018). Therefore, an adjustment of the learning mechanism for the AES task is needed. This paper aims to propose an adjusted BERT architecture based on an unfreezing fine-tuning mechanism for the AES task in order to overcome the ‘catastrophic forgetting’ problem.
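To make the unfreezing idea concrete, the following is a minimal sketch of gradual unfreezing (Howard & Ruder, 2018) in PyTorch with the Hugging Face transformers library; the model name, the three-epoch schedule, and the one-layer-per-epoch step are illustrative assumptions, not the exact configuration proposed in this paper.

```python
from transformers import BertModel

model = BertModel.from_pretrained("bert-base-uncased")

# Start with the whole encoder frozen; only a task-specific head would train.
for param in model.parameters():
    param.requires_grad = False

def unfreeze_top_layers(model, n):
    """Unfreeze the top n of BERT-base's 12 encoder layers."""
    for layer in model.encoder.layer[-n:]:
        for param in layer.parameters():
            param.requires_grad = True

# Unfreeze one additional layer per epoch (gradual unfreezing);
# the 3-epoch schedule here is an assumption for the sketch.
for epoch in range(1, 4):
    unfreeze_top_layers(model, epoch)
    # ... train for one epoch on the AES data here ...
```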
2. Related Work
Liu et al. (2019) proposed a multi-way attention architecture for the AES task. The architecture begins with a transformer layer that processes pre-trained GloVe word embeddings of the student’s answer and the model answer. The following layer performs the multi-way attention, in which three attention vectors are computed: self-attention over the student’s answer, self-attention over the model answer, and cross-attention between the two. This is followed by an aggregation layer in which word-position vectors are added. The final layer contains the regressor that predicts the essay score. The authors evaluated the architecture on a real-world educational dataset of questions and answers and reported an accuracy of 88.9%.
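As an illustration of the three attention vectors, the sketch below computes them with scaled dot-product attention in PyTorch; the tensor shapes and the 300-dimensional GloVe vectors are assumptions for the example, not details from Liu et al.

```python
import torch
import torch.nn.functional as F

def attend(query, key, value):
    """Scaled dot-product attention over batches of token vectors."""
    scores = query @ key.transpose(-2, -1) / key.size(-1) ** 0.5
    return F.softmax(scores, dim=-1) @ value

# Assumed shapes: (batch, n_tokens, dim) GloVe embeddings of each text.
student = torch.randn(2, 20, 300)    # student's answer
reference = torch.randn(2, 15, 300)  # model answer

self_student = attend(student, student, student)          # self-attention
self_reference = attend(reference, reference, reference)  # self-attention
cross = attend(student, reference, reference)             # cross-attention
```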
Zhang & Litman (2019) proposed a deep learning architecture for the AES task. The architecture begins with pre-trained GloVe word-embedding vectors, which are processed by a Convolutional Neural Network (CNN) layer. The resulting features are then passed through a Long Short-Term Memory (LSTM) network to generate a sentence embedding for each answer. The key distinction of this study lies in the addition of a co-attention layer that considers the similar sentences between the student’s answer and the model answer. The final layer outputs the score for each answer. Using the Automated Student Assessment Prize (ASAP) benchmark dataset, the architecture achieved an accuracy of 81.5%.
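The co-attention step can be illustrated as follows; this is a simplified PyTorch sketch in which each student sentence attends to the model-answer sentences it most resembles, with sentence counts and embedding sizes chosen arbitrarily.

```python
import torch
import torch.nn.functional as F

# Assumed sentence embeddings (one row per sentence) for the two answers.
student_sents = torch.randn(6, 128)   # 6 sentences in the student's answer
model_sents = torch.randn(4, 128)     # 4 sentences in the model answer

# Affinity between every student sentence and every model-answer sentence.
affinity = student_sents @ model_sents.t()        # shape (6, 4)

# Each student sentence attends to the model-answer sentences it resembles.
weights = F.softmax(affinity, dim=-1)             # rows sum to 1
co_attended = weights @ model_sents               # shape (6, 128)
```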
Kyle (2020) examined lexical sophistication for evaluating second-language (L2) writing proficiency, using a corpus from an English placement test (i.e., TOEFL). With lexical features such as word and n-gram overlap, along with a semantic approach based on Latent Semantic Analysis (LSA), a simple regression model was applied to predict the scores of the tested answers.
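A toy sketch of this feature-based approach is shown below using scikit-learn; the two overlap features and the tiny dataset are illustrative stand-ins for Kyle’s much richer lexical-sophistication indices and corpus.

```python
from sklearn.linear_model import LinearRegression

def overlap_features(answer, reference):
    """Word- and bigram-overlap ratios between an answer and a reference."""
    a_words, r_words = answer.lower().split(), reference.lower().split()
    a_bi, r_bi = set(zip(a_words, a_words[1:])), set(zip(r_words, r_words[1:]))
    word_overlap = len(set(a_words) & set(r_words)) / max(len(set(a_words)), 1)
    bigram_overlap = len(a_bi & r_bi) / max(len(a_bi), 1)
    return [word_overlap, bigram_overlap]

# Toy data standing in for graded placement-test answers.
reference = "plants use sunlight to make food by photosynthesis"
answers = ["plants use sunlight to make food",
           "animals eat food to survive"]
scores = [5.0, 1.0]

X = [overlap_features(a, reference) for a in answers]
model = LinearRegression().fit(X, scores)
print(model.predict([overlap_features("plants make food using sunlight",
                                      reference)]))
```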
Li et al. (2020) proposed a deep learning method for the AES task that employs two architectures, CNN and LSTM. First, the word vectors of each answer, taken from a pre-trained GloVe embedding model, are processed by the CNN to obtain sentence embeddings. The resulting sentence embeddings are then processed by the LSTM to produce the score. Using the ASAP benchmark dataset, the authors reported an accuracy of 72.65%.
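A compact sketch of such a CNN-then-LSTM scorer in PyTorch follows; the filter count, window size, and hidden sizes are illustrative assumptions, not the values used by Li et al.

```python
import torch
import torch.nn as nn

class CnnLstmScorer(nn.Module):
    """Sketch: CNN builds sentence embeddings, LSTM scores the answer."""
    def __init__(self, emb_dim=300, n_filters=100, hidden=128):
        super().__init__()
        # 1-D convolution over each sentence's word vectors (trigram window).
        self.conv = nn.Conv1d(emb_dim, n_filters, kernel_size=3, padding=1)
        self.lstm = nn.LSTM(n_filters, hidden, batch_first=True)
        self.score = nn.Linear(hidden, 1)

    def forward(self, sents):
        # sents: (n_sentences, n_words, emb_dim) pre-trained word vectors
        conv = torch.relu(self.conv(sents.transpose(1, 2)))  # (n_s, F, n_w)
        sent_emb = conv.max(dim=2).values                    # max-pool -> (n_s, F)
        _, (h, _) = self.lstm(sent_emb.unsqueeze(0))         # sentences as a sequence
        return self.score(h[-1])                             # predicted essay score
```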
Tashu (2020) proposed a deep learning architecture for the AES task. The architecture begins with word-embedding vectors generated by Word2Vec, which are processed by a CNN layer to extract n-gram features. A recurrent layer, the Bidirectional Gated Recurrent Unit (BGRU), is then used to predict the score of the answer. Using the ASAP benchmark dataset, the architecture showed an accuracy of 86.5%.
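The BGRU scoring step can be sketched as below in PyTorch, where the CNN’s n-gram feature maps are read in both directions and the final states feed a linear scoring layer; all dimensions here are assumptions for the example.

```python
import torch
import torch.nn as nn

# Assumed CNN output: n-gram feature maps of shape (batch, seq_len, n_filters).
features = torch.randn(2, 30, 100)

# A bidirectional GRU reads the feature sequence in both directions.
bgru = nn.GRU(input_size=100, hidden_size=64,
              batch_first=True, bidirectional=True)
out, _ = bgru(features)                          # (batch, seq_len, 128)

# The final time step from both directions feeds a linear scoring layer.
score = nn.Linear(128, 1)(out[:, -1, :])         # predicted answer score
```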
The advancement of deep learning architectures led to the emergence of Transformers, which introduced a novel learning mechanism: synchronized bidirectional learning. This architecture in turn gave rise to Bidirectional Encoder Representations from Transformers (BERT) embeddings. BERT has a fixed, indexed pre-trained embedding model whose vocabulary stores roughly 30,000 English WordPiece terms. BERT has shown remarkably superior performance in natural language processing applications.
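This fixed vocabulary can be inspected directly with the Hugging Face transformers library, as in the brief sketch below (for the bert-base-uncased model the WordPiece vocabulary holds 30,522 entries):

```python
from transformers import BertTokenizer

# Load the fixed, indexed WordPiece vocabulary shipped with BERT.
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
print(tokenizer.vocab_size)     # 30522 entries for bert-base-uncased
# Words outside the vocabulary are split into indexed subword pieces.
print(tokenizer.tokenize("The essay was scored automatically."))
```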
More recently, however, Rodriguez et al. (2019) utilized the BERT architecture for the AES task. Using the ASAP dataset, BERT showed an accuracy of 74.75%. The authors compared BERT against the LSTM