
According to Lehečka, Švec, Ircing, and Šmídl (2020), the BERT architecture contains plenty of parameters, and any task can be accommodated by tuning them. One common way of suiting the BERT architecture to a particular task is to alter the learning rate. However, how the learning rate should be altered is a highly domain-specific issue in which the targeted task (i.e., AES) must be considered (Howard & Ruder, 2018). Therefore, an adjustment of the learning mechanism is needed for the AES task. This paper aims to propose an adjusted BERT architecture based on an unfreezing fine-tuning mechanism for the AES task in order to overcome the 'catastrophic forgetting' problem.
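To make the idea concrete, the following is a minimal sketch of fine-tuning BERT while gradually unfreezing its encoder layers. PyTorch and the Hugging Face transformers library are assumed, and the unfreezing schedule and learning rate are illustrative choices, not values prescribed by the cited works or by the architecture proposed in this paper.

```python
# A minimal sketch of gradual unfreezing during BERT fine-tuning.
# The schedule and learning rate below are illustrative assumptions.
import torch
from transformers import BertModel

model = BertModel.from_pretrained("bert-base-uncased")

# Freeze every parameter first, so early task-specific updates cannot
# overwrite the pre-trained weights (the 'catastrophic forgetting' risk).
for param in model.parameters():
    param.requires_grad = False

def unfreeze_top_layers(model, n_layers):
    """Unfreeze the top n_layers of the BERT encoder plus the pooler."""
    for layer in model.encoder.layer[-n_layers:]:
        for param in layer.parameters():
            param.requires_grad = True
    for param in model.pooler.parameters():
        param.requires_grad = True

# Example schedule: unfreeze two additional layers after each epoch.
for epoch in range(1, 4):
    unfreeze_top_layers(model, n_layers=2 * epoch)
    optimizer = torch.optim.AdamW(
        [p for p in model.parameters() if p.requires_grad], lr=2e-5)
    # ... one epoch of task-specific (e.g., AES) training would run here ...
```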
        2. Related Work


   Liu et al. (2019) proposed a multi-way attention architecture for the AES task. The architecture begins with a transformer layer that processes pre-trained GloVe word embeddings of the student's answer and the model answer. The following layer performs the multi-way attention, in which three attention vectors are produced: self-attention over the student's answer, self-attention over the model answer, and a cross vector between them. An aggregation layer then adds word-position vectors, and the final layer contains the regressor that predicts the essay score. The authors used a real-world educational dataset of questions and answers and reported an accuracy of 88.9%.
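As a rough illustration of the multi-way attention idea, the sketch below computes self-attention over the student's answer, self-attention over the model answer, and a cross attention between the two. The tensor shapes and the mean-pooling aggregation are illustrative assumptions and do not reproduce the exact architecture of Liu et al. (2019).

```python
# A simplified PyTorch sketch of "multi-way" attention over two answers.
import torch
import torch.nn.functional as F

def attend(query, key, value):
    """Scaled dot-product attention: each query position attends to the keys."""
    scores = query @ key.transpose(-2, -1) / key.size(-1) ** 0.5
    return F.softmax(scores, dim=-1) @ value

d_model = 300                                # e.g., GloVe embedding size
student = torch.randn(1, 40, d_model)        # (batch, tokens, dim) student answer
reference = torch.randn(1, 30, d_model)      # reference (model) answer

self_student = attend(student, student, student)
self_reference = attend(reference, reference, reference)
cross = attend(student, reference, reference)   # student attends to reference

# Aggregate the three views into one fixed-size vector for a score regressor.
pooled = torch.cat([v.mean(dim=1) for v in
                    (self_student, self_reference, cross)], dim=-1)
print(pooled.shape)                          # torch.Size([1, 900])
```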
   Zhang and Litman (2019) proposed a deep learning architecture for the AES task. It begins with pre-trained GloVe word-embedding vectors processed by a Convolutional Neural Network (CNN) layer. The resulting features are then passed through a Long Short-Term Memory (LSTM) layer to generate a sentence embedding for each answer. The key distinction of this study lies in adding a co-attention layer that considers the similar sentences between the student's answer and the model answer. A final layer produces the score for each answer. Using the Automated Student Assessment Prize (ASAP) benchmark dataset, the proposed architecture achieved an accuracy of 81.5%.
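A simplified sketch of the sentence-level co-attention step is given below. Random tensors stand in for the CNN-LSTM sentence embeddings, and all dimensions are illustrative assumptions rather than the configuration used by Zhang and Litman (2019).

```python
# A minimal sketch of co-attention between student and reference sentences.
import torch
import torch.nn.functional as F

hidden = 128
student_sents = torch.randn(6, hidden)     # 6 sentence embeddings, student answer
reference_sents = torch.randn(4, hidden)   # 4 sentence embeddings, model answer

# Affinity between every student sentence and every reference sentence.
affinity = student_sents @ reference_sents.T               # (6, 4)

# For each student sentence, a weighted summary of similar reference sentences.
attn = F.softmax(affinity, dim=-1)                         # (6, 4)
attended_reference = attn @ reference_sents                # (6, hidden)

# Concatenate each student sentence with its attended reference context,
# then pool into a single representation for the scoring layer.
combined = torch.cat([student_sents, attended_reference], dim=-1)
essay_repr = combined.mean(dim=0)                          # (2 * hidden,)
```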
   Kyle (2020) examined lexical sophistication as a means of evaluating second-language (L2) writing proficiency. The author used a corpus from an English placement test (i.e., TOEFL). Using lexical features such as word and n-gram overlap, together with the semantic approach of Latent Semantic Analysis (LSA), a simple regression was applied to predict the scores of the tested answers.
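The general recipe of overlap features followed by a simple regression can be sketched as follows. The two features, the toy reference and answers, and the scores are purely illustrative assumptions; they are not the feature set or corpus used by Kyle (2020).

```python
# A minimal sketch: lexical-overlap features + simple linear regression.
from sklearn.linear_model import LinearRegression

def ngrams(tokens, n):
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def overlap_features(answer, reference):
    a, r = answer.lower().split(), reference.lower().split()
    word_overlap = len(set(a) & set(r)) / max(len(set(r)), 1)
    bigram_overlap = len(ngrams(a, 2) & ngrams(r, 2)) / max(len(ngrams(r, 2)), 1)
    return [word_overlap, bigram_overlap]

reference = "photosynthesis converts light energy into chemical energy"
answers = ["plants convert light energy into chemical energy",
           "the mitochondria is the powerhouse of the cell"]
scores = [0.9, 0.1]                         # hypothetical human scores

X = [overlap_features(ans, reference) for ans in answers]
model = LinearRegression().fit(X, scores)

new_answer = "light energy becomes chemical energy in plants"
print(model.predict([overlap_features(new_answer, reference)]))
```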
   Li et al. (2020) proposed a deep learning method for the AES task that employs both CNN and LSTM architectures. First, the word vectors of each answer, obtained from a pre-trained GloVe word-embedding model, are processed by the CNN in order to obtain sentence embeddings. The resulting sentence embeddings are then further processed by the LSTM in order to produce the score. Using the ASAP benchmark dataset, the authors reported an accuracy of 72.65%.
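The CNN-then-LSTM pattern can be sketched as follows. The layer sizes, kernel width, and pooling choice are illustrative assumptions, not the configuration reported by Li et al. (2020).

```python
# A minimal PyTorch sketch: CNN over word vectors -> sentence features,
# LSTM over the sentence sequence -> essay score.
import torch
import torch.nn as nn

emb_dim, conv_dim, hidden = 300, 100, 64
conv = nn.Conv1d(emb_dim, conv_dim, kernel_size=3, padding=1)
lstm = nn.LSTM(conv_dim, hidden, batch_first=True)
scorer = nn.Linear(hidden, 1)

# One essay: 5 sentences, 20 word vectors each (e.g., pre-trained GloVe).
words = torch.randn(5, 20, emb_dim)

# CNN over each sentence, then max-pool the positions into a sentence vector.
sent_feats = conv(words.transpose(1, 2)).max(dim=-1).values   # (5, conv_dim)

# LSTM over the sentence sequence; the last hidden state scores the essay.
_, (h_n, _) = lstm(sent_feats.unsqueeze(0))                    # h_n: (1, 1, hidden)
score = scorer(h_n[-1]).squeeze()
print(score)
```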
   Tashu (2020) proposed a deep learning architecture for the AES task. It begins with word-embedding vectors generated by Word2Vec, which are processed by a CNN layer in order to extract n-gram features. Finally, a recurrent layer, the Bidirectional Gated Recurrent Unit (BGRU), is used to predict the score of the answer. Using the ASAP benchmark dataset, the proposed architecture showed an accuracy of 86.5%.
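The BGRU variant differs from the previous sketch mainly in the recurrent layer. A minimal sketch, assuming the CNN n-gram features are already available, is shown below; sizes remain illustrative assumptions.

```python
# A minimal sketch of a bidirectional GRU scoring head over CNN features.
import torch
import torch.nn as nn

conv_dim, hidden = 100, 64
bgru = nn.GRU(conv_dim, hidden, batch_first=True, bidirectional=True)
scorer = nn.Linear(2 * hidden, 1)          # the two directions are concatenated

sent_feats = torch.randn(1, 5, conv_dim)   # e.g., CNN n-gram features, 5 positions
out, _ = bgru(sent_feats)                  # (1, 5, 2 * hidden)
score = scorer(out[:, -1, :]).squeeze()
print(score)
```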
   The advancement of deep learning architectures led to the emergence of Transformers, which introduced a novel learning mechanism: synchronized bidirectional learning. This architecture in turn led to Bidirectional Encoder Representations from Transformers (BERT) embeddings. BERT relies on a fixed, indexed pre-trained embedding model in which a vocabulary of approximately 30,000 English terms is stored. BERT has shown remarkably superior performance across a wide range of natural language processing applications.
   However, Rodriguez et al. (2019) recently utilized the BERT architecture for the AES task. Using the ASAP dataset, BERT achieved an accuracy of 74.75%. The authors compared BERT against the LSTM