Big Data & AI

My love for data began almost 20 years ago when I was (seemingly in a different life) a database engineer. At the time, I was responsible for automating the collection, collation, and querying of messaging metadata from tens of thousands of trade confirmations, SMS messages, and even TELEX messages (!) sent monthly on my company’s network. Back then, that was a lot of data. Times have changed.

Now, really big data determines the products we buy, the news we read, our social networks, even the prison terms we are likely to serve if we run afoul of the law. These (and many more) are all determined by data-driven formulas we neither control nor, for the most part, understand. Big as it is, big data is dwarfed by the questions it raises, if only because of its central position in modern life. I’ve taught classes about the role of big data in society, focusing on its political and ethical implications (link). I’ve also given talks at UC on the subject, one of which won first prize at the 2019 “AI and the Future of (No) Work” Colloquium (link).

But my main interest in data concerns questions of scientific methodology. Specifically, I’m interested in challenges to causal reasoning stemming from the nominalist metaphysics underlying “natural” kind construction in current data techniques. In the popular press, this has been presented as the death of theorizing in science.¹ But while that prognosis is overly dramatic, I’m interested in what truth there is to it, and how we can expect to ‘understand’ in a future in which opaque AI guides our construction of scientific theories.

¹ It all started here: https://www.wired.com/2008/06/pb-theory/.