Predictions of the 10th Justice: Testing the Wisdom of the Crowds

February 4th, 2010

Welcome to the ninth installment of Predictions of the 10th Justice. The league has over 3,800 members, who have made predictions on all cases currently pending before the Supreme Court.

How large of a crowd do we need before the crowd becomes “wise”? In this week’s 10th Justice, we will check how accurately our members predicted five cases recently handed down: Smith v. Spisak, NRG Power Marketing, LLC v. Maine Pub. Util. Comm’n, Kucana v. Holder, South Carolina v. North Carolina, and Wood v. Allen. While these cases are not among the most important cases of the term, they still contribute important data points to test the predictive capabilities of our league, and enable us to further understand the wisdom (and limitations) of our crowd of users. For a refresher on confidence intervals and margins of error in predictions, check out last week’s column.

The first case we considered is Smith v. Spisak, which asked whether the 6th Circuit overreached by vacating Spisak’s death penalty, violating the Antiterrorism and Effective Death Penalty Act in the process. The Supreme Court reversed the 6th Circuit’s ruling in a unanimous decision. Out of 114 members who predicted this case, 67 (59%) correctly predicted the Supreme Court would reverse the 6th Circuit. Only 8 users predicted the unanimous decision, approximately 12% of those who voted for reversal. Among the voting splits, a 6-3 split (22 predictions) was the most predicted, followed by a 5-4 outcome (17 predictions). At a 90% confidence level, these numbers yielded a margin of error of +/-7.58%, so this was a relatively close case: the actual proportion fell somewhere between 51.42% and 66.58%. Had the lower bound dipped below 50%, we would have been unable to confidently predict the outcome. For this case, the members generally predicted the correct disposition, but mostly failed to determine the correct split.
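The margin-of-error figures above are consistent with the standard normal-approximation interval for a sample proportion. As a rough sketch (we assume the column uses this textbook formula; the function name is ours, not the column’s), the Spisak numbers can be reproduced like so:

```python
from statistics import NormalDist

def margin_of_error(successes, n, confidence=0.90):
    """Normal-approximation margin of error for a sample proportion."""
    p = successes / n
    # Two-sided critical value, e.g. ~1.645 for 90% confidence
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return z * (p * (1 - p) / n) ** 0.5

# Smith v. Spisak: 67 of 114 members predicted reversal
moe = margin_of_error(67, 114, confidence=0.90)
print(round(100 * moe, 2))  # ≈ 7.58 percentage points
```

Subtracting and adding this margin to the 59% reversal proportion gives the 51.42%–66.58% range quoted above.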

The second case we considered is NRG Power Marketing, LLC v. Maine Pub. Util. Comm’n, which asked whether the Mobile-Sierra doctrine applied to the Federal Energy Regulatory Commission’s review of wholesale electricity rates. In an 8-1 decision, with Justice Stevens writing the lone dissent, the Supreme Court reversed the D.C. Circuit’s holding that the doctrine applies only when a party to the contract attempts a unilateral rate change. As to the split, only one member predicted that Stevens alone would dissent from the majority; this member was also the only one to predict an 8-1 split. The most popular split among reversal predictions was a unanimous decision, with 13 predictions. Due to the close nature of the predictions, we could not conclude with any level of confidence that this outcome would be accurate. Members generally predicted the correct overall outcome, but had difficulty with the split.

What do these cases teach us about the reliability of our predictions? That, and three more predictions, after the jump.

The third case we considered is Kucana v. Holder, which asked about the scope of the jurisdiction-stripping provision of 8 U.S.C. 1252(a)(2)(B)(ii) and whether it removes jurisdiction from federal courts to review rulings on motions to reopen by the Board of Immigration Appeals. The Court reversed the 7th Circuit’s ruling in a unanimous decision. Out of 59 predictions, 30 members (51%) predicted that the Court would reverse. Further, 10 members (33% of those who voted reverse) also guessed the correct split. The unanimous decision was the most popular split in the reverse category. Due to both the sample size and the close outcome of the vote, we could not predict this case with any level of confidence. In this case, members only narrowly predicted the correct overall outcome, but were much closer on the split than in the previous two cases.

The fourth case we considered is South Carolina v. North Carolina, which asked whether other parties can intervene in cases where one state is suing another under the Supreme Court’s original jurisdiction. The Court reversed the Special Master’s decision to let the City of Charlotte, a hydroelectric company, and an interstate water supply company join the suit in a 5-4 decision. In an odd alignment, Chief Justice Roberts dissented, joined by Justices Thomas, Ginsburg, and Sotomayor. Out of 139 predictions, only 50 members (36%) correctly guessed that the Supreme Court would reverse the Special Master. At a 99% confidence level, these numbers yielded a margin of error of +/-10.49%. Only 2 members predicted that 4 Justices would dissent, and not a single member correctly guessed this unusual quartet of dissenters. The 9-0 split in the reverse category attracted the most predictions. For this case, the majority of users were wrong about both the overall outcome and the split.

The final case we considered is Wood v. Allen, which asked whether the failure to present evidence of mental impairment during the sentencing phase of a capital case constituted ineffective assistance of counsel. The Court affirmed the 11th Circuit’s ruling by a 7-2 margin. Justice Stevens dissented, joined by Justice Kennedy. Out of 79 predictions, 63 members (80%) correctly guessed that the Supreme Court would affirm the lower court’s decision. A 99% confidence level yielded a margin of error of +/-11.65%, generating a very reliable prediction. Thirteen members correctly predicted that the split would be 7-2, constituting 21% of all affirm predictions. This split was the second-most-predicted category after 5-4 affirm, with 19 predictions. Although many users predicted Stevens would dissent, none predicted that both Stevens and Kennedy would dissent together. In this case, users were extremely accurate at predicting the general outcome, but had slightly more trouble predicting the split.

Overall, these cases illustrate important points about relying on prediction pools. First, while larger sample sizes increase the certainty of prediction proportions, they cannot always resolve the issue of reliability. This issue could be due to uncertainty in the overall population of predictions. Second, predicting the split, with the 6 possible choices under each affirm/reverse category, is significantly more difficult than predicting the outcome of a case. Third, lacking a high confidence level in cases such as NRG or Kucana means that the predictions might as well be decided by a random coin flip. This is apt, since proportion confidence intervals are derived from the binomial probability distribution — the mathematics of coin flips. Finally, South Carolina v. North Carolina illustrates that regardless of how accurately we can determine overall feelings about a specific case, the prediction pool is by no means infallible and may get cases wrong. And as the above cases show, it can be difficult to pin down Supreme Court behavior at the Justice level, especially in unorthodox alignments.
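The coin-flip point can be made concrete: a majority prediction is only reliable when its entire confidence interval sits above 50%. A minimal sketch of that check (the function name and the 95% default are our assumptions, not the column’s method):

```python
from statistics import NormalDist

def reliable_prediction(successes, n, confidence=0.95):
    """True if the confidence interval for the majority proportion
    lies entirely above 50% -- i.e., better than a coin flip."""
    p = successes / n
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    moe = z * (p * (1 - p) / n) ** 0.5
    return p - moe > 0.5

# Kucana v. Holder: 30 of 59 predicted reversal -- interval straddles 50%
print(reliable_prediction(30, 59))  # False
# Wood v. Allen: 63 of 79 predicted affirmance -- comfortably above 50%
print(reliable_prediction(63, 79))  # True
```

By this test, the Kucana and NRG pools were statistically indistinguishable from a coin flip, while the Wood pool was not.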

Many thanks to Corey Carpenter for his fantastic assistance with this post.