Matt Jones

Opacity and its History: Decision Trees, Random Forests, and the Genealogy of the Algorithmic Black Box

My talk concerns the genesis and the development of one of the foremost kinds of algorithms for supervised learning: decision trees. A series of researchers, each slightly askew to the dominant practices and epistemic virtues of their fields, came obliquely to trees in the 1970s: a data-driven statistician, a machine learning expert focused on large data sets, social scientists unhappy with multivariate statistics, and a physicist interested mostly in computers who was eventually tenured in a statistics department. In case after case, the creators of different forms of trees deployed “applied” philosophies of science in critiquing contemporary practices, epistemic criteria and even promotion practices in academic disciplines. Faced with increasing amounts of high-dimensional data, these authors time and again advocated a data-focused positivism. The history of trees does not cleanly divide into a theoretical and an applied stage; an academic and a commercial phase; a statistical and a computational stage; or even an algorithm design and an implementation stage. This history is iterative: the implementation of algorithms on actually existing computers with various limitations drives the development and transformation of the techniques. Before the very recent renaissance and current triumph of neural networks, decision trees were central to the transformation of artificial intelligence and machine learning of recent years: the shift in the central goal to a focus on prediction at the expense of concerns with human intelligibility, and the shift from symbolic interpretation to potent but inscrutable black boxes. Trees exploded in the 1980s as paragons of interpretable algorithms but developed in the late 1990s into a key example of powerful but opaque ensemble models, predictive but almost unknowable.
We need to explain, rather than take as given, the shift in values to prediction—to an instrumentalism—central to the ethos and practice of the contemporary data sciences. Opacity needs its history—just as transparency does.