Predictive Coding Identifies 80% of Relevant Documents

January 24th, 2013

Interesting developments from Loudoun County, Virginia.

ABA Journal reports on a WSJ Blog Post ($$$):

The e-discovery process got under way when lawyers coded a sample of 5,000 documents out of 1.3 million as either relevant or irrelevant. The information was then used to develop algorithms for a computer search of the remaining documents. The program turned up about 173,000 documents deemed relevant.

To see how well the computer program worked, the lawyers checked a sample of about 400 documents deemed relevant by the computer program. About 80 percent were indeed relevant. The lawyers then checked a sample of the documents deemed irrelevant. About 2.9 percent were possibly relevant. The statistics mean that about 81 percent of all relevant documents were found.
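The 81 percent figure can be reproduced from the numbers quoted above. Here is a rough sketch of that arithmetic, not the method the lawyers actually used: the function name and parameters are my own, and it assumes the 2.9 percent rate applies to every document the program did not flag (1.3 million minus 173,000), ignoring the 5,000-document training sample.

```python
def estimate_recall(total_docs, flagged, precision, miss_rate):
    """Estimate recall: the share of all relevant documents that were found.

    Assumes `miss_rate` applies uniformly to every unflagged document.
    """
    # Relevant documents among those the program flagged
    found = flagged * precision
    # Relevant documents estimated to remain among the unflagged ones
    missed = (total_docs - flagged) * miss_rate
    return found / (found + missed)


# Figures as reported in the quoted article
recall = estimate_recall(
    total_docs=1_300_000,
    flagged=173_000,
    precision=0.80,
    miss_rate=0.029,
)
print(f"Estimated recall: {recall:.1%}")
```

Roughly 138,400 relevant documents were found and about 32,700 were missed, which works out to a little under 81 percent.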

By comparison, human reviewers can generally identify only about 60% of relevant documents.