Page 48 - The-5th-MCAIT2021-eProceeding

Investigating Feature Relevance for Essay Scoring

                                    Jih Soong Tan a,*, Ian K. T. Tan b
                             a Priority Dynamics Sdn Bhd, One City, Subang Jaya 47650, Malaysia
                          b Monash University Malaysia, Bandar Sunway, Subang Jaya 47500, Malaysia
                                        *Email: jsoong@prioritydynamics.com


        Abstract
        Human grading of essays requires significant, time-consuming effort and is vulnerable to bias that varies across human
        graders. There have been numerous research efforts in recent years on automated essay scoring (AES). Most of this
        research is based on extracting multiple linguistic features and using them to build a classification model for essay
        scoring. Three main groups of features are commonly investigated for AES, namely lexical, grammatical,
        and semantic features. In this paper, we conduct empirical studies to investigate the influence of the different groups of
        features on the accuracy of AES classification models, based on a commonly used approach for AES research. The results
        show that the semantic feature, prompt, is the weakest of the feature groups, and this is due to the typical
        overfitting of the classification model when using the essay prompt.

        Keywords: Auto Essay Scoring; Features; Importance; EASE; ASAP


        1. Introduction

           Essays are widely used in academic writing to determine students’ understanding based on their
        arguments. However, grading these essays fairly requires considerable time and effort from human graders.
        This is because human grading is vulnerable to bias and varies depending on
        preceding events in the human grader’s life (Shermis & Burstein, 2003). An automated essay scoring (AES)
        computing system ought to be capable of overcoming these shortcomings of human graders by being
        consistent and fair throughout the essay evaluation (Shermis & Burstein, 2003; Janda et al., 2019). As far back
        as 1966, Page (1966) developed an AES system called Project Essay Grade (PEG). Since then, there have
        been innovations and new systems in the AES field, such as a newer version of PEG (Page, 1994), e-rater V2
        (Attali & Burstein, 2006), and IntelliMetric (Elliot, 2001). Across all these systems, the linguistic features can
        be grouped into three groups: lexical, grammatical, and semantic features.
           In a previous study, Shermis & Burstein (2003) reported that the key properties of a good
        essay are that it is written around the given prompt, is well structured, flows smoothly, and shows good
        grammar, length, spelling, and punctuation. Hence, we propose a feature influence study, using a generic
        feature engineering approach for AES, to find the weak points of current feature engineering and to identify
        potential improvements to AES classification accuracy. Using known state-of-the-art learning algorithms for
        the classification models, the most influential and the least influential parts (the weak point) of the current
        feature engineering method are identified.
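
        As a rough illustration of what grouping features and leaving one group out can look like, the sketch
        below computes a few toy features per group and flattens them into a vector with an optional group
        dropped. It is only a sketch under stated assumptions: the function names and the specific features
        (word counts, punctuation counts, prompt word overlap) are hypothetical stand-ins chosen for brevity,
        not the features extracted by EASE or used in this study.

```python
import re
import string

WORD_RE = re.compile(r"[A-Za-z']+")

def lexical_features(essay):
    """Toy lexical group: essay length and vocabulary size (illustrative only)."""
    words = WORD_RE.findall(essay.lower())
    return {"word_count": len(words), "unique_words": len(set(words))}

def grammatical_features(essay):
    """Toy grammatical group: crude punctuation and sentence-end counts."""
    return {
        "punctuation_count": sum(ch in string.punctuation for ch in essay),
        "sentence_count": len(re.findall(r"[.!?]+", essay)),
    }

def prompt_features(essay, prompt):
    """Toy semantic group: fraction of prompt words that also occur in the essay."""
    essay_words = set(WORD_RE.findall(essay.lower()))
    prompt_words = set(WORD_RE.findall(prompt.lower()))
    if not prompt_words:
        return {"prompt_overlap": 0.0}
    return {"prompt_overlap": len(essay_words & prompt_words) / len(prompt_words)}

def grouped_features(essay, prompt):
    """Collect the three hypothetical feature groups under their group names."""
    return {
        "lexical": lexical_features(essay),
        "grammatical": grammatical_features(essay),
        "semantic": prompt_features(essay, prompt),
    }

def feature_vector(groups, drop=None):
    """Flatten the groups in a fixed (sorted) order, optionally leaving one group out."""
    vector = []
    for name in sorted(groups):
        if name == drop:
            continue
        vector.extend(groups[name][key] for key in sorted(groups[name]))
    return vector

# Example: one essay, full vector versus the vector without the semantic group.
essay = "Computers help students. They save time!"
prompt = "Do computers help people?"
groups = grouped_features(essay, prompt)
full_vector = feature_vector(groups)
without_semantic = feature_vector(groups, drop="semantic")
```

        Training the same classifier once on the full vectors and once per dropped group, then comparing the
        resulting accuracies, is one simple way to estimate how much each group contributes.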

        2. Related Work

           For feature engineering in AES, there have been several efforts by other researchers. Phandi et
        al. (2015) worked on AES by using the Enhanced AI Scoring Engine (EASE)* to extract

        *https://github.com/edx/ease






        E- Proceedings of The 5th International Multi-Conference on Artificial Intelligence Technology (MCAIT 2021)   [36]
        Artificial Intelligence in the 4th Industrial Revolution