A great interview in Slate with Jeremy Howard of Kaggle, an online marketplace for data scientists. A few relevant bits about how experts fare against super crunchers (hint: experts lose):
PA: How exactly do these competitions work?
JH: They rely on techniques like data mining and machine learning to predict future trends from current data. Companies, governments, and researchers present data sets and problems, and offer prize money for the best solutions. Anyone can enter: We have nearly 64,000 registered users. We’ve discovered that creative-data scientists can solve problems in every field better than experts in those fields can.
PA: These competitions deal with very specialized subjects. Do experts enter?
JH: Oh yes. Every time a new competition comes out, the experts say: “We’ve built a whole industry around this. We know the answers.” And after a couple of weeks, they get blown out of the water.…
PA: That sounds very different from the traditional approach to building predictive models. How have experts reacted?
JH: The messages are uncomfortable for a lot of people. It’s controversial because we’re telling them: “Your decades of specialist knowledge are not only useless, they’re actually unhelpful; your sophisticated techniques are worse than generic methods.” It’s difficult for people who are used to that old type of science. They spend so much time discussing whether an idea makes sense. They check the visualizations and noodle over it. That is all actively unhelpful.
PA: Is there any role for expert knowledge?
JH: Some kinds of experts are required early on, for when you’re trying to work out what problem you’re trying to solve. The expertise you need is strategy expertise in answering these questions.
Wait till the experts try to shut down these algorithms (think suits over the unauthorized practice of law and medicine). Such suits are going to be a really, really big issue in a few years, once the experts recognize the threat technology poses to their hegemony.
This is why understanding how the First Amendment applies to big data will soon be very, very important. See my Op-Ed in the Houston Chronicle.