
“Exclude Length”, “Exclude PoS”, “Exclude BoW” and “Exclude Prompt”. A set of pre-processed data with all features kept will be retained for comparison, referred to as “All features”. The five training datasets will be used to train the three models separately. We then predict scores on the test sets, and the predicted scores are fed into the QWK evaluation metric to compute the agreement between the human rater’s scores and the AES predicted scores.
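To make the evaluation step concrete, the following minimal sketch computes QWK as Cohen's kappa with quadratic weights using scikit-learn; the score lists are hypothetical and serve only to illustrate the call.

from sklearn.metrics import cohen_kappa_score

# Hypothetical scores: what a human rater assigned vs. what a model predicted.
human_scores = [2, 3, 4, 3, 1, 4]
predicted_scores = [2, 3, 3, 3, 2, 4]

# QWK is Cohen's kappa with quadratic weights; 1.0 means perfect agreement,
# 0 means agreement no better than chance.
qwk = cohen_kappa_score(human_scores, predicted_scores, weights="quadratic")
print(f"QWK = {qwk:.3f}")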

        4. Results and Discussion

4.1. QWK score results for comparison

   The QWK scores for the “All features” dataset and the four “Exclude one feature group” datasets were computed for the three trained models; the results are summarized in Table 1. On the “All features” dataset, the BLRR model outperforms the other models, which agrees with Phandi et al. (2015). For the “Exclude one feature group” datasets, the most influential feature groups are bold-faced and the least influential are underlined. The length feature is the most influential feature group, whereas the prompt feature appears to be the weakest. The higher QWK scores of “Exclude Prompt” for SVM and BLRR compared to “All features” indicate that the prompt feature causes the trained models to overfit; that is, including the prompt feature worsens the models.

Table 1. Results for all EASE features and for excluding one feature group at a time.

                                                              QWK Score
   Feature Group                 Features Used         NB       SVM      BLRR
   All feature group             All Features          0.517    0.601    0.626
   Exclude one feature group     Exclude Length        0.444    0.565    0.601
                                 Exclude PoS           0.511    0.583    0.617
                                 Exclude BoW           0.546    0.599    0.604
                                 Exclude Prompt        0.494    0.636    0.657

   The EASE function uses the Natural Language Toolkit (NLTK) to tokenize the essay topic into prompt words. It then finds synonyms of the prompt words through the WordNet corpus in NLTK and counts the occurrences of the prompt words and their synonyms in the essay. We postulate that the prompt feature is the least influential and causes overfitting because of its weakness in extracting semantic attributes. Semantic attributes correspond to the contextual meaning of a word or a set of words (Janda et al., 2019). It is crucial for essay evaluation that the essay is written semantically around the prompt, or essay topic (Norton, 1990). We therefore believe the EASE engine takes all PoS into consideration when building the prompt feature, which causes it to overfit. PoS such as conjunctions and adpositions do not carry contextual meaning and can add noise to the dataset.
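The following sketch illustrates the idea behind the prompt feature as described above; it is not EASE's exact implementation, and the function name and example strings are our own. It tokenizes the prompt with NLTK, expands it with WordNet synonyms, and counts how many essay tokens fall inside the expanded prompt vocabulary.

import nltk
from nltk.corpus import wordnet
from nltk.tokenize import word_tokenize

# Requires the 'punkt' and 'wordnet' NLTK data packages:
# nltk.download("punkt"); nltk.download("wordnet")

def prompt_overlap_count(prompt, essay):
    prompt_words = {w.lower() for w in word_tokenize(prompt) if w.isalpha()}
    # Expand the prompt vocabulary with WordNet synonyms of each prompt word.
    expanded = set(prompt_words)
    for word in prompt_words:
        for synset in wordnet.synsets(word):
            expanded.update(lemma.name().lower() for lemma in synset.lemmas())
    # Count essay tokens that are prompt words or synonyms of prompt words.
    essay_tokens = [w.lower() for w in word_tokenize(essay) if w.isalpha()]
    return sum(1 for token in essay_tokens if token in expanded)

print(prompt_overlap_count("Effects of computers on society",
                           "Computers influence how communities work and learn."))

Because every prompt token is expanded regardless of its PoS, function words such as conjunctions and adpositions are counted as well, which is the source of noise discussed above.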
   In addition, the method EASE applies to extract semantic attributes is overly simplistic and could be improved in future work. It considers only individual words rather than pairs of words or sentences, so it cannot capture the context in which a sentence or essay starts to digress. As reported by Miltsakaki & Kukich (2000), coherence between pairs of words or sentences is the key to making text semantically meaningful. A rough sketch of this direction is given below.
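As an illustration of that direction, the sketch below scores coherence between adjacent sentences with a simple bag-of-words cosine similarity; the sentences, the representation, and any threshold one might apply are illustrative assumptions rather than part of EASE.

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity

sentences = [
    "Computers help students practice their writing.",
    "Students can revise essays quickly with a computer.",
    "My favourite football team won last weekend.",  # digresses from the topic
]

# Bag-of-words vectors; a low similarity between neighbouring sentences can
# flag the point where an essay starts to digress.
vectors = CountVectorizer().fit_transform(sentences)
for i in range(len(sentences) - 1):
    sim = cosine_similarity(vectors[i], vectors[i + 1])[0, 0]
    print(f"similarity(sentence {i}, sentence {i + 1}) = {sim:.2f}")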

        5. Conclusion

   We have conducted experiments to investigate the weak points of the generic feature-engineering approach in AES. We compare the four types of features extracted by EASE using the “Exclude one feature group” datasets and compare their QWK scores with those of the “All features” set. Through this comparison, our work has shown that the prompt feature is the weakest of the four feature types. The






