Which grades essays more accurately: Humans or Algorithms?

June 11th, 2012

It’s a close call.

The standardized tests administered by the states at the end of the school year typically have an essay-writing component, requiring the hiring of humans to grade them one by one. This spring, the William and Flora Hewlett Foundation sponsored a competition to see how well algorithms submitted by professional data scientists and amateur statistics wizards could predict the scores assigned by human graders. The winners were announced last month — and the predictive algorithms were eerily accurate. . . .
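The excerpt doesn't describe what the winning algorithms looked like, but a toy baseline gives a feel for the prediction task: reduce each essay to a handful of surface features and fit them against the human-assigned scores. The features, essays, and scores below are invented for illustration, not drawn from the competition.

```python
# A minimal sketch of a baseline essay scorer: crude surface features fit
# against human-assigned scores with ordinary least squares.
# All essays, scores, and features here are made up for illustration.
import numpy as np

def surface_features(essay):
    """Very rough proxies for essay quality: length, word length, vocabulary."""
    words = essay.split()
    n_words = len(words)
    avg_word_len = sum(len(w) for w in words) / n_words if n_words else 0.0
    vocab_ratio = len({w.lower() for w in words}) / n_words if n_words else 0.0
    return [1.0, n_words, avg_word_len, vocab_ratio]  # leading 1.0 = intercept

# Toy training set: essays paired with scores a human grader might assign (0-6 scale).
essays = [
    "The dog ran. It was fast.",
    "Although the experiment failed at first, we revised our hypothesis and "
    "gathered additional evidence before drawing a conclusion.",
    "I like school because school is fun and school is good.",
    "The poem uses repetition to build a sense of urgency, and the final "
    "stanza resolves it.",
]
human_scores = np.array([2.0, 5.0, 1.0, 4.0])

X = np.array([surface_features(e) for e in essays])
coeffs, *_ = np.linalg.lstsq(X, human_scores, rcond=None)

new_essay = "The results, while preliminary, suggest a consistent pattern."
predicted = float(np.array(surface_features(new_essay)) @ coeffs)
print(f"predicted score: {predicted:.2f}")
```

Competition entries were presumably far more elaborate than this, but the shape of the problem is the same: learn a mapping from essay features to the score a human grader would have given.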

If the thought of an algorithm replacing a human causes queasiness, consider this: In states’ standardized tests, each essay is typically scored by two human graders; machine scoring replaces only one of the two. And humans are not necessarily ideal graders: they provide an average of only three minutes of attention per essay, Ms. Chow says.
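The excerpt also doesn't say how "eerily accurate" was measured, but agreement between two graders of ordinal scores, whether human and human or human and machine, is often summarized with a statistic such as quadratic weighted kappa, which gives full credit for exact matches and penalizes disagreements by how far apart the two scores are. Here is a minimal sketch with invented scores; it is not necessarily the statistic the competition used.

```python
import numpy as np

def quadratic_weighted_kappa(a, b, n_categories):
    """Agreement between two sets of ordinal scores: 1.0 = perfect, 0.0 = chance."""
    a = np.asarray(a, dtype=int)
    b = np.asarray(b, dtype=int)
    observed = np.zeros((n_categories, n_categories))
    for i, j in zip(a, b):
        observed[i, j] += 1
    # Expected agreement if the two raters' scores were independent.
    hist_a = np.bincount(a, minlength=n_categories).astype(float)
    hist_b = np.bincount(b, minlength=n_categories).astype(float)
    expected = np.outer(hist_a, hist_b) / len(a)
    # Penalize disagreements by the squared distance between the two scores.
    idx = np.arange(n_categories)
    weights = (idx[:, None] - idx[None, :]) ** 2 / (n_categories - 1) ** 2
    return 1.0 - (weights * observed).sum() / (weights * expected).sum()

# Example: scores from a human grader and a machine on eight essays (0-4 scale).
human   = [3, 2, 4, 1, 3, 0, 2, 4]
machine = [3, 2, 3, 1, 4, 0, 2, 4]
print(f"quadratic weighted kappa: {quadratic_weighted_kappa(human, machine, 5):.3f}")
```

By this kind of measure, a machine score only needs to agree with one human about as well as a second human would, which is a lower bar than it first sounds given three minutes of attention per essay.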

We are talking here about providing a very rough kind of measurement, the assignment of a single summary score on, say, a seventh grader’s essay, not commentary on the use of metaphor in a college senior’s creative writing seminar.

Software sharply lowers the cost of scoring those essays, a matter of great importance because states have begun to toss essay evaluation aside.