Josh Blackman » Which grades essays more accurately

Which grades essays more accurately–Humans or Algorithms?

June 11th, 2012

The standardized tests administered by the states at the end of the school year typically have an essay-writing component, requiring the hiring of humans to grade them one by one. This spring, the William and Flora Hewlett Foundation sponsored a competition to see how well algorithms submitted by professional data scientists and amateur statistics wizards could predict the scores assigned by human graders. The winners were announced last month — and the predictive algorithms were eerily accurate. . . .

If the thought of an algorithm replacing a human causes queasiness, consider this: In states’ standardized tests, each essay is typically scored by two human graders; machine scoring replaces only one of the two. And humans are not necessarily ideal graders: they provide an average of only three minutes of attention per essay, Ms. Chow says.

We are talking here about providing a very rough kind of measurement, the assignment of a single summary score on, say, a seventh grader’s essay, not commentary on the use of metaphor in a college senior’s creative writing seminar.

Software sharply lowers the cost of scoring those essays — a matter of great importance because states have begun to toss essay evaluation to the wayside.