Whatever Happened to Big Data?

By Keegan O’Shea, Head of Data & Analytics at Baby Bunting

It’s been a decade since I first heard the phrase “Big Data.” I was told that it represented the future and that those of us in ‘reporting’ roles needed to take note, as it would transform our lives and the lives of those around us. I’ll admit that my first thoughts drew me to phrases like Big Pharma or Big Tobacco, conjuring images of titans of industry colluding in smoky back rooms, arguing the relative merits of ones and zeroes. Thankfully, my curiosity got the better of me, and I read my first-ever book on the topic of data: Big Data: A Revolution That Will Transform How We Live, Work, and Think by Viktor Mayer-Schönberger and Kenneth Cukier.

The book itself was well reasoned: it argued that data would become its own form of capital, explored the impact its growth would have on industry and privacy, and cautioned against mistaking correlation for causation. At the time, data was still very much a tabular affair, with the end of the AI winter arriving around the time of the book’s publication. It wasn’t the first time I’d taken an interest in a promising new technology - I had been driving on the “Information Superhighway” since the late 90s, a few years shy of when I would be allowed to drive on an actual highway, and I’d always had a fascination with emerging technologies. Big Data, however, seemed different - if all of information technology is the process of receiving inputs, processing them, and producing outputs, the growth of data meant we now had all the raw ingredients to transform the world.

The exponential growth of data is not easy to grasp. The scale of terabytes, petabytes, and exabytes is difficult to interpret, much as switching an ‘m’ for a ‘b’ would dramatically change the fortunes of a millionaire. To put it in perspective, I like to think about the volume of data generated in terms of people. While exact estimates are difficult, roughly 2 petabytes (or 2,000 terabytes) of data are believed to have been generated in the year 1984. For a moment, let’s assume each person is a petabyte - so that data could be represented by a young couple living in a one-bedroom flat. Skip to 1994, and the couple has grown to a group of ten, sitting around a long dinner table debating drink purchases over the bill. This was around the time digital storage became more cost-effective than paper, initiating an explosion of information. By 2004, we’re at 100,000 - the population of Bendigo, Victoria. Skipping to 2014, we hit 12.5 million, around the combined populations of Sydney, Melbourne, and Brisbane.
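As an aside, the arithmetic behind that analogy is easy to sketch. The snippet below is a rough illustration using the approximate figures quoted above (not precise measurements), computing the overall and annualised growth between each milestone:

```python
# Rough annual data-generation estimates quoted in the text, in petabytes.
data_pb = {1984: 2, 1994: 10, 2004: 100_000, 2014: 12_500_000}

milestones = sorted(data_pb.items())
for (y0, v0), (y1, v1) in zip(milestones, milestones[1:]):
    growth = v1 / v0                      # overall growth across the decade
    cagr = growth ** (1 / (y1 - y0)) - 1  # compound annual growth rate
    print(f"{y0}->{y1}: {growth:,.0f}x overall, ~{cagr:.0%} per year")
```

Even the “slow” decade from 1984 to 1994 compounds at roughly 17% a year, while the 1994-2004 leap works out to well over 100% a year.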

The book Big Data was published in 2013, and in the years since, its dream has been fully realised. In three decades, the scale of data exploded, with the growth of Web 2.0 generating images, videos, and the general proliferation of data across the globe. We were all right to be wowed by the promise of Big Data, especially with the rise of at-scale machine learning fueling the growth of the trillion-dollar businesses that dictate so much of our daily lives. So… why has the phrase fallen out of fashion?

A quick Google Trends search suggests that interest in the phrase peaked in 2017, with a steady decline putting it slightly below where it sat in 2013. Data is everywhere and fuels so much - why is the phrase no longer part of the lexicon? While there are a few theories out there, my assessment is rather straightforward - the use of data has become so widespread that we now have a suite of commonly accepted terms that go a level deeper and speak to explicit functionality. Data warehouse, data lake, API call, S3 bucket, Google Drive - all these terms assume that data is big and that it needs to be stored and processed somewhere. They are a subset of a broader vocabulary we use now that we’re further down the value chain - artificial intelligence, business intelligence, machine learning. We no longer look upon Big Data as the black monolith from 2001: A Space Odyssey; we smashed the stone apart and have been building widgets for years, demystifying the unknown and making it commonplace.

While some may argue that the hype of Big Data has given way to Data Science, AI, Crypto, or even Web 3.0 (remember Web 3.0?), its value is only amplified as a foundational component of each of these new trends. Coming back to our population growth example, in the decade since 2014 our population has again exploded from the size of Australia’s eastern capitals to around 150 million petabytes (150 zettabytes) a year - roughly the population of Russia. In just 40 years, we’ve gone from a couple to a major nation, and in that time we’ve built industries and apparatus that produce outputs that even the most prescient of us would have been shocked by a mere three years ago.

So - do we mourn the decline of the phrase ‘Big Data’? Hardly; it’s the natural order of things. Famed linguist Ferdinand de Saussure once said, “Time changes all things; there is no reason why language should escape this universal law.” And if it’s true of all things, it should be especially true of technology. Big Data arrived, and it’s here to stay - we’ve just traded our automobiles in for cars, hot rods, and hatchbacks.

What I believe we must remain focused on, however, is how we treat data. Today it is too easy to throw caution to the wind and build AI solutions without appropriate concern for governance, ownership, privacy, or accuracy. I was drawn to data for its ability to uncover the truth, and the cruel irony is that data’s very scale now forces us to navigate this space with measured consideration. Mayer-Schönberger and Cukier’s concerns about correlation versus causation seem quaint next to the AI tooling available to the general public today - and those of us in the data profession must not lose sight of its truth-telling origins.


About the Author

Keegan O'Shea is Head of Data & Analytics at Baby Bunting and an IAPA Advisory Committee member.