Investigating Feature Relevance for Essay Scoring

Jih Soong Tan a,*, Ian K. T. Tan b
a Priority Dynamics Sdn Bhd, One City, Subang Jaya 47650, Malaysia
b Monash University Malaysia, Bandar Sunway, Subang Jaya 47500, Malaysia
*Email: jsoong@prioritydynamics.com

Abstract
Human grading of essays requires significant, time-consuming effort and is vulnerable to bias that varies between human
graders. There have been numerous research efforts in recent years on automated essay scoring (AES). The majority of this
research is based on extracting multiple linguistic features and using them to build a classification model for essay
scoring. Three main groups of features are commonly investigated for AES, namely lexical, grammatical,
and semantic features. In this paper, we conduct empirical studies to investigate the influence of the different groups of
features on the accuracy of AES classification models, based on a commonly used approach in AES research. The results
show that the semantic feature group, derived from the essay prompt, is the weakest of the feature groups, and that this is
due to the typical overfitting of the classification model when the essay prompt is used.

Keywords: Auto Essay Scoring; Features; Importance; EASE; ASAP

1. Introduction

Essays are widely used in academic assessment to gauge students' understanding through the arguments they
present. Grading these essays fairly, however, demands considerable time from human graders. Human grading
is also vulnerable to bias and can vary depending on events in the human grader's life that precede the
grading (Shermis & Burstein, 2003). An automated essay scoring (AES) system ought to be capable of
overcoming these shortcomings of human graders by being consistent and fair throughout the essay
evaluation (Shermis & Burstein, 2003; Janda et al., 2019). As far back as 1966, Page (1966) introduced an
AES system called Project Essay Grade (PEG). Since then, there have been innovations and new systems in the
AES field, such as a newer version of PEG (Page, 1994), e-rater V2 (Attali & Burstein, 2006), and
IntelliMetric (Elliot, 2001). Across these systems, the linguistic features used can be grouped into three
groups: lexical, grammatical, and semantic features.
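To make the feature groups concrete, the following is a minimal sketch of what a lexical feature extractor and a prompt-based semantic feature extractor might look like. It is an illustration under stated assumptions, not the feature set of any system cited above: the function names, the feature names, and the use of simple token overlap as a stand-in for a semantic feature are all hypothetical. Grammatical features are omitted because they normally require a part-of-speech tagger, which is outside the scope of a standard-library sketch.

```python
import re


def tokenize(text):
    """Lowercase a text and split it into word tokens (letters, optional apostrophe)."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def lexical_features(essay):
    """Hypothetical lexical group: surface counts computed from the essay alone."""
    words = tokenize(essay)
    n = len(words)
    return {
        "word_count": n,
        "unique_words": len(set(words)),
        "avg_word_length": sum(len(w) for w in words) / n if n else 0.0,
    }


def prompt_overlap_features(essay, prompt):
    """Hypothetical semantic group: share of essay tokens that also occur in the prompt."""
    essay_words = tokenize(essay)
    prompt_vocab = set(tokenize(prompt))
    hits = sum(1 for w in essay_words if w in prompt_vocab)
    return {
        "prompt_overlap_count": hits,
        "prompt_overlap_ratio": hits / len(essay_words) if essay_words else 0.0,
    }
```

Keeping each group in its own function makes it straightforward to switch a whole group on or off when its influence on a model is examined.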
Shermis & Burstein (2003) reported that the key properties of a good essay are being written around the
given prompt, good structure, smooth flow, good use of grammar, length, good spelling, and punctuation.
Hence, we propose a feature influence study that uses a generic approach to feature engineering for AES to
find the weak points of current feature engineering, with a view to further improving AES classification
accuracy. Using known state-of-the-art learning algorithms for the classification models, we identify the
most influential and the least influential feature groups, the latter being the weak point of the current
feature engineering method.
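One common way to carry out a feature influence study of this kind is leave-one-group-out ablation: score a model on all features, then again with each group removed, and compare. The sketch below assumes this design; it is not necessarily the procedure used in this paper. The `evaluate` callable is hypothetical and stands for whatever training and scoring routine is in use, for example cross-validated accuracy of a classifier.

```python
def group_ablation(groups, evaluate):
    """Leave-one-group-out ablation over named feature groups.

    groups   : dict mapping a group name to a list of feature names
    evaluate : hypothetical callable that takes a list of feature names
               and returns a score such as classification accuracy
    Returns a dict mapping each group name to the score lost when that
    group is removed. A larger drop suggests a more influential group.
    """
    all_features = [f for names in groups.values() for f in names]
    baseline = evaluate(all_features)
    drops = {}
    for name, names in groups.items():
        excluded = set(names)
        kept = [f for f in all_features if f not in excluded]
        drops[name] = baseline - evaluate(kept)
    return drops
```

The group with the smallest drop is then the candidate weak point. A drop near zero, or a negative one, indicates that the model does no worse, or even does better, without that group.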

2. Related Work

For feature engineering in AES, there have been several efforts by other researchers. Phandi et
al. (2015) worked on AES by using the Enhanced AI Scoring Engine (EASE)* to extract

*https://github.com/edx/ease

E- Proceedings of The 5th International Multi-Conference on Artificial Intelligence Technology (MCAIT 2021) [36]
Artificial Intelligence in the 4th Industrial Revolution
