Understanding “Extremely Randomized Trees” As Thousands of Supreme Court Predicting Bloggers

August 5th, 2014

In our article, Dan Katz, Mike Bommarito, and I rely on a machine learning technique known as “extremely randomized trees” to build a model that predicts case outcomes with 69.7% accuracy, and individual justice votes with 70.9% accuracy, over all cases from 1953 to 2013.

One of the most common questions I get, in various forms, is how our model can predict cases that were already decided. This is often followed by a critique that our model is backwards-looking, and simply fits a model to what already happened. That isn’t how our model works, so let me try to use an analogy, with law bloggers we all know, to explain it.

On Tuesday, the Supreme Court announces that Thursday will be the final day of the term. We know that there are five outstanding cases. On Wednesday, Orin at VC offers his predictions for the five cases. He makes these predictions based on everything he knows about the Court from this term, and from the last few terms. He also considers factors like what the lower court did, how out of line it was with the Court’s precedents, and perhaps how often that circuit had been reversed. He also considers how specific Justices have treated cases involving this particular issue. Based on all of these factors, to which he assigns varying weights, he posts his predictions on Wednesday.

The Court hands down its decisions on Thursday. On Friday, Orin takes a retrospective look, analyzes his predictions, and tries to figure out what went right and what went wrong. Maybe he weighted some factors too much, and other factors too little. He notes that next term he will try a slightly different framework.

Come the following June, Orin, drawing on the good and bad steps he took the previous term, approaches the decision-making process slightly differently, and comes up with better or worse results. Before each day the Court is in session to hand down opinions, Orin takes a look back, figures out what worked and what didn’t, and revises his methodology. Then he offers a new set of predictions. After the cases are decided, he again looks back, revises his approach, and offers new predictions. Tedious, for sure.

After doing this for, let’s say, 60 years, Orin has a pretty good sense of the right weights to assign to the various factors. This becomes his standard model. With this standard formula, he goes back and runs predictions for each of the 60 terms, limiting himself only to information he knew at the time each case was decided (as hard as this may be, I’m positive Orin can do this). In other words, the ideologies of the Justices in 2014 will be different than the ideologies of the Justices in 2074. The model is the same, but the variables are time-sensitive. After Orin runs this 60-year experiment in the year 2074, he finds that the finely calibrated weights of the variables achieved an overall accuracy rate of (let’s say) 70%.
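
For the programmers in the audience, here is a toy version of that 60-year, forward-only exercise, written as if Orin were a Python script. Everything below (the data, the variable names, the settings) is a made-up placeholder; our real evaluation works case-by-case from the decision date rather than term-by-term, and the classifier used is simply scikit-learn’s off-the-shelf implementation of the algorithm named at the top of this post. Skip ahead if the analogy alone works for you.

```python
# A toy version of the 60-year, forward-only experiment: for each term, fit a model
# using only cases from earlier terms, then predict that term's cases.
# All data, variable names, and settings here are made-up placeholders.
import numpy as np
import pandas as pd
from sklearn.ensemble import ExtraTreesClassifier  # extremely randomized trees

rng = np.random.default_rng(0)
n_cases = 3000
cases = pd.DataFrame(rng.random((n_cases, 10)),
                     columns=[f"var_{i}" for i in range(10)])
cases["term"] = rng.integers(1953, 2014, size=n_cases)           # terms 1953-2013
cases["reversed"] = (cases["var_0"] + cases["var_3"] > 1).astype(int)

feature_cols = [f"var_{i}" for i in range(10)]
hits, total = 0, 0
for term in range(1954, 2014):
    past = cases[cases["term"] < term]      # only information known before this term
    now = cases[cases["term"] == term]      # the cases being "predicted"
    if past.empty or now.empty:
        continue
    model = ExtraTreesClassifier(n_estimators=300, random_state=0)
    model.fit(past[feature_cols], past["reversed"])
    hits += int((model.predict(now[feature_cols]) == now["reversed"]).sum())
    total += len(now)

print("overall accuracy over all terms:", hits / total)
```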

Now, imagine that instead of just Orin doing this 60-year process, every member of the VC did it. And each member came up with different weightings of the variables. Will went one way, David went another, Ilya a third, Randy a fourth, etc. Each of their models, when run over 60 years, achieved accuracy rates around 70%; some higher, some lower.

Now, imagine thousands of bloggers across the legal blogosphere attempting the same 60-year experiment, each designing different weights that may work better or worse.

The king of the blogosphere then averages together all of the different models to figure out the “gold standard” that can be used to predict any case during the previous 60 years, or for the upcoming October 2074 term (where the Court will decide whether robots have a right to privacy, or something like that).
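
If you prefer code to analogy, that averaging step is nothing more mysterious than pooling votes. Here is a toy illustration, where each “blogger” is just a different random weighting of ten made-up variables for a single hypothetical case:

```python
# A toy version of "averaging together all of the different models": each blogger
# weights the same variables differently, and the ensemble prediction is simply
# the average of their individual predictions. All numbers here are made up.
import numpy as np

rng = np.random.default_rng(0)
case_features = rng.random(10)              # ten made-up variables for one pending case

def blogger_prediction(weights, features, cutoff=0.5):
    """One 'blogger': a weighted score of the variables, compared to a cutoff.
    Returns 1 for 'reverse' and 0 for 'affirm'."""
    score = np.dot(weights, features) / weights.sum()
    return int(score > cutoff)

weightings = rng.random((5000, 10))         # thousands of bloggers, different weights
votes = [blogger_prediction(w, case_features) for w in weightings]
print("share of bloggers predicting reversal:", np.mean(votes))
```

With enough differently weighted bloggers, the individual quirks tend to wash out in the average, which is the whole point of pooling them.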

This is effectively how our machine-learning model works. When designing the method, we attempted to model how hundreds or thousands of law professors, if glued to their computers for 60 years, could have designed a prediction methodology. Instead of working with an individual blogger, or even a group of bloggers, our model generates thousands of extremely randomized decision trees. Each tree assigns different weights to a series of over 95 variables. Then, based on these predictions, we figure out which weights work better than others, and refine our methodology. But all the predictions are made before the case was decided, as if we were using a time machine. When going through the 60-year history, we only look at information known prior to the case decision. As I explained to Vox:

 Instead of making this backward-looking, we wanted to try to be forward looking. Say a case was decided on March 12th 1954. What information was known to the world on March 11th 1954, the day before it was decided (which was actually Antonin Scalia’s eighteenth birthday — sorry, I’m a nerd). What did we know on March 11th 1954? Based only on that data we started using a machine learning process called extremely randomized trees.

In the same way that other studies generate a decision tree, our algorithms basically spit out lots of trees, randomly designed. Each tree put different weights on different variables. Then we checked the trees. Some trees happen to work better than others. The weights of the variables are calculated to four or five decimal places. They use very precise weights. By creating enough trees, we were able to figure out which ones did best.

Once we make our predictions, we have to test them against the ultimate outcome to see what worked and didn’t. Then we can go back with the revised weights of variables.
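
To make the quoted description a bit more concrete, here is a stripped-down sketch of that workflow using scikit-learn’s ExtraTreesClassifier, an off-the-shelf implementation of extremely randomized trees. Everything below (the synthetic data, the number of variables and trees, the settings) is a placeholder for illustration, not our actual code, features, or results.

```python
# A hedged sketch of the workflow described above, using scikit-learn's
# ExtraTreesClassifier. The data is synthetic and the variables are unnamed
# stand-ins, not our real inputs.
import numpy as np
from sklearn.ensemble import ExtraTreesClassifier

rng = np.random.default_rng(0)
n_cases, n_variables = 2000, 95                        # roughly the scale of variables we use
X = rng.random((n_cases, n_variables))                 # stand-in case- and justice-level features
y = (0.6 * X[:, 0] + 0.4 * X[:, 7] > 0.5).astype(int)  # stand-in affirm/reverse outcomes

decided, pending = slice(0, 1900), slice(1900, None)   # "already decided" vs. "not yet decided"

model = ExtraTreesClassifier(
    n_estimators=2000,      # thousands of randomly grown trees, each weighting variables differently
    max_features="sqrt",    # each split considers a random subset of the variables
    random_state=0,
)
model.fit(X[decided], y[decided])

# Each tree casts a vote; predict_proba reports the share of trees voting each way.
print(model.predict_proba(X[pending])[:3])

# Once the outcomes are known, check the predictions and inspect the per-variable
# weights ("feature importances") the ensemble settled on, to several decimal places.
print("accuracy:", model.score(X[pending], y[pending]))
print(np.round(model.feature_importances_[:5], 5))
```

The “extremely randomized” part refers to how each tree is grown: the split thresholds are drawn at random rather than fully optimized, which keeps each tree cheap to build and different from its neighbors, so the ensemble as a whole has plenty of distinct “bloggers” to average over.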

In the end, our model will be put to the test this year. We will be predicting cases in real time, and posting all of our predictions online. It’s hard to predict in advance how we will do. Some years we were around 80%. Some years we were around 60%. There is a lot of variability in the Court’s docket. And the recent unanimity of the Court has thrown a wrench in things a bit. But we are really excited to see how we do this year. We are even more excited for the players in our tournament to compete against, and beat, our model. This will give us far more insight into where humans excel, and where machines excel.