Comparing Human and Algorithm-Based Curation of Legal Data

August 8th, 2012

Susan Nevelow Mart and Jeffrey Lutig posted a paper to SSRN titled “”The Case for Curation: The Relevance of Digest and Citator Results in Westlaw and Lexis” that looks at how Westlaw, which relies primarily on human curators, and Lexis, which relies primarily on algorithm-based curators compare.

 Humans and machines are both involved in the creation of legal research resources. For legal information retrieval systems, the human-curated finding aid is being overtaken by the computer algorithm. But human-curated finding aids still exist. One of them is the West Key Number system. The Key Number system’s headnote classification of case law, started back in the nineteenth century, was and is the creation of humans. The retrospective headnote classification of the cases in Lexis’s case databases, started in 1999, was created primarily although not exclusively with computer algorithms. So how do these two very different systems deal with a similar headnote from the same case, when they link the headnote to the digesting and citator functions in their respective databases? This paper continues an investigation into this question, looking at the relevance of results from digest and citator search run on matching headnotes in ninety important federal and state cases, to see how each performs. For digests, where the results are curated – where a human has made a judgment about the meaning of a case and placed it in a classification system – humans still have an advantage. For citators, where algorithm is battling algorithm to find relevant results, it is a matter of the better algorithm winning. But no one algorithm is doing a very good job of finding all the relevant results; the overlap between the two citator systems is not that large. The lesson for researchers: know how your legal research system was created, what involvement, if any, humans had in the curation of the system, and what a researcher can and cannot expect from the system you are using.

I cannot cite from the article without the author’s permission, so I won’t.