Is your auto-grading software smarter than a fifth grader?


One automated grader can review 16,000 essays in 20 seconds. A human grader would need more than three months of 9-to-5 work to grade as many, and that's at a fast clip of one essay every two minutes. To state and district leaders facing recession-era budgets, the possibility of automated essay grading must be mighty enticing.
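For the curious, the three-months figure follows from simple arithmetic. Here's a quick back-of-the-envelope check in Python; the 21 workdays per month is my own rough assumption:

    # Back-of-the-envelope check on the human-grader estimate.
    ESSAYS = 16_000
    MINUTES_PER_ESSAY = 2      # the "fast clip" from the comparison above
    HOURS_PER_WORKDAY = 8      # a 9-to-5 day
    WORKDAYS_PER_MONTH = 21    # rough average; an assumption

    total_hours = ESSAYS * MINUTES_PER_ESSAY / 60
    workdays = total_hours / HOURS_PER_WORKDAY
    months = workdays / WORKDAYS_PER_MONTH

    print(f"{total_hours:.0f} hours = {workdays:.0f} workdays = {months:.1f} months")
    # -> 533 hours = 67 workdays = 3.2 months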

In a new study shaped by the two state consortia tasked with developing assessments aligned with the Common Core State Standards, researchers submitted thousands of pre-graded essays to nine different auto-readers and compared the machines' scores against the human ones. Variability among the software programs was slight: all of them produced mean scores within 0.10 of the human means on rubrics scored 0-3, and within 1 point on rubrics scored 0-60. The auto-readers also faced several logistical handicaps throughout the study, leading the researchers to argue that this performance represents only the floor of what is possible.
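To make that comparison concrete, here's a minimal sketch of what "mean scores within 0.10" means, using ratings I invented for a 0-3 rubric; the study itself used real essays and more sophisticated agreement statistics:

    # Illustrative comparison of human vs. machine mean scores.
    # All numbers below are invented for the example.
    human_scores   = [2, 3, 1, 2, 2, 3, 1, 2, 2, 2]   # hypothetical human ratings
    machine_scores = [2, 3, 1, 2, 3, 3, 1, 2, 2, 2]   # hypothetical auto-reader ratings

    human_mean = sum(human_scores) / len(human_scores)        # 2.0
    machine_mean = sum(machine_scores) / len(machine_scores)  # 2.1

    # The study's headline result: gaps like this one stayed
    # within 0.10 on 0-3 rubrics.
    print(f"difference in means: {abs(human_mean - machine_mean):.2f}")  # -> 0.10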

This is not to suggest that there aren't some big kinks to work out. For one, a recent New York Times column describes how easy it is to game the auto-readers, suggesting that relying on them exclusively would be foolish. (Such is the nature of computer programming: for every cyber-hole there is a hacker.) But in the world of programming, trying to break code is just part of the process that ultimately makes software more robust. One current weakness, for example, is the auto-graders' blindness to factual accuracy and plagiarism. Yet computers can already reliably beat humans at Jeopardy and flag plagiarism in college essays, so solutions on both fronts are likely on the way.
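To see why gaming is even possible, consider a deliberately naive scorer that rewards only surface features like length and big words. This toy is my own construction, not how the engines in the study actually work, but it shows the kind of cyber-hole a clever student could drive through:

    # A toy essay scorer based purely on surface features, to show why
    # such systems can be gamed. Illustrative only; not the study's engines.
    def naive_score(essay: str) -> int:
        words = essay.split()
        length_points = min(len(words) / 100, 2)     # longer looks better
        big_words = sum(1 for w in words if len(w) > 7)
        vocab_points = min(big_words / 10, 1)        # fancy words look better
        return round(length_points + vocab_points)   # a 0-3 "rubric" score

    # Two hundred words of impressive-sounding nonsense earn a perfect score.
    nonsense = "notwithstanding heretofore magnanimous paradigm " * 50
    print(naive_score(nonsense))  # -> 3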

Auto-grading district and state-mandated tests would in itself be revolutionary in terms of cost and turnaround time for scores, but the real win could be for teachers. Several of the software packages in this study required no further programming, and one was open-source, meaning this software can be both accessible and cheap. For a teacher who weighs every essay assignment against the stack of papers it will generate, access to auto-graders could offer a whole new perspective: software could grade the more formulaic assignments, leaving the teacher free to tackle the meatier writing.

This study was the first of three; follow-up work will cover computer capabilities for short-answer constructed responses and math items. Cash prizes are also on offer for newly developed scoring engines. So, skeptics, make your voices heard on the limitations of robo-graders, because somewhere out there a hungry programmer is already developing a smart workaround.