Nobody likes SQL. In fact, I’m almost certain Turing would have hated it the most.
Before I back up my reasonably wild and sensational statement with what I hope is a somewhat cogent argument, let me just list some stats as background:
- SQL is around 40 years old
- Tableau and Splunk are almost 15 years old
- Hadoop is ~ 6 years old
- Spark is ~ 3 years old
SQL Sucks! And so do RDBMS!
I’ve never liked SQL, and I’ve never liked relational data stores either. Why? Because I’ve spent most of my career working in industries that were very dynamic, both from a business and technology perspective. The data landscape changed rapidly, and the folks in the front office needed insights quickly, so they leaned heavily on the IT folks in the back, the keepers of the data, to get those insights. Only problem was, the folks in the back often didn’t understand the intricacies of the business domain, but were great at plugging systems together and getting the data to flow. There was a gap between the people who needed answers and the ones tasked with getting those answers to them.
40 or so years ago, those systems were big monolithic mainframes, where reasonably sized data stores lived. IBM needed a way to pull basic representations from those data stores, and they invented SQL. SQL is, by its own definition, structured, and hence needed a data source that was structured as well. It’s no surprise then that one of the fathers of SQL was also the father of one of the most painful and onerous data normalization strategies on earth. And so RDBMS became a thing, and the wretched union of SQL and RDBMS came to be.
But it was created at a time when technology, whether it was a printer or an “advanced” system, was kept far away from anyone who wasn’t a technologist, which back then meant anyone who had never loaded a tape reel or compiled a kernel.
Times have changed, BI hasn’t!
Back when SQL was growing facial hair, the weapon of choice for business was this:
Today, businesses change at the speed of sound, and the tools of the trade have changed with them:
And even though most aspects of corporate IT have changed with the times, the divide between the business and the data they need to make decisions hasn’t. We may have made it more efficient for the IT folks to build reports etc., but truly letting business folks get access to the right data is still no different than it was 40 years ago: you cut a ticket, try your best to describe your need, and someone in IT will deliver it a few days later.
What has this got to do with Turing?
Turing believed that the most powerful way to deal with computers should be to simply have a conversation with them. If the computer was built right, it would be able to respond intelligently to the human, and together they could get the job done.
SQL is the absolute antithesis of this in the data context. SQL requires the human to express their needs in a language understandable by the computer. It requires that the human be able to express their needs in a highly reduced language form. SQL lives in a one-way world: you submit your request, it runs it, and it returns data.
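To make that one-way exchange concrete, here is a minimal sketch using Python’s built-in sqlite3 module. The `sales` table, its rows, and the question being asked are all invented for illustration; the only point is that the human has to do the translating up front, and anything phrased the way a person would say it is rejected outright.

```python
import sqlite3

# Hypothetical table and rows, invented purely for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("EMEA", 120.0), ("EMEA", 80.0), ("APAC", 50.0)],
)

# The human translates "which regions sell the most" into SQL before asking.
rows = conn.execute(
    "SELECT region, SUM(amount) AS total "
    "FROM sales GROUP BY region ORDER BY total DESC"
).fetchall()
print(rows)

# The same need, phrased the way a person would say it, is simply refused.
error = None
try:
    conn.execute("which regions sell the most")
except sqlite3.Error as exc:
    error = str(exc)
print("rejected:", error)
```

There is no back and forth here: the engine either runs exactly what it was handed or fails, and it never asks what you meant.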
In a Turing world, however, the machine sits between the human and the data. It understands the data to a degree, and machine learning can take this understanding past basic non-linearity. As the human makes requests, the machine tries to understand the intent and respond. The human-machine interaction resembles a human-human one, such as requesting a new BI report and going back and forth until the right information is presented: the human helps the machine learn the right way to satisfy the request, until it gets it right. But here is where the world changes forever.
See, that human-human iteration process, which we go through every day when trying to get data out of a system, is by its very nature not unbounded. It may sometimes feel that way, but largely it’s not, especially in business, where the bounds get narrower because of vertical specialization. As humans, we suck at remembering vast frames of data and recalling them. So when we receive requests for data in the form of reports, we kinda start from scratch each time, and introduce a certain bias based on what we know and don’t know.
Machines don’t suffer the same issues. In fact, “learning” about the business intelligence domain is a very tractable problem, and over time, privy to all requests for information within an organization, a machine could become far more powerful in assisting humans with their data needs.
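The loop described above, where the machine proposes, the human corrects, and the machine remembers, can be sketched in a few lines. This is a toy under loud assumptions: the `ConversationalLayer` class, the report names, and the keyword sets are all hypothetical, and plain keyword overlap plus a dictionary stand in for the machine learning a real system would need.

```python
class ConversationalLayer:
    """Toy stand-in for the machine that sits between the human and the data."""

    def __init__(self, candidates):
        # Hypothetical mapping: report name -> keywords describing it.
        self.candidates = candidates
        # Request text -> report the human eventually accepted.
        self.learned = {}

    def propose(self, request, rejected=()):
        # A phrasing that was resolved before is answered from memory.
        if request in self.learned:
            return self.learned[request]
        words = set(request.lower().split())
        options = [name for name in self.candidates if name not in rejected]
        # Otherwise guess by keyword overlap; ties go to the first candidate.
        return max(
            options,
            key=lambda name: len(words & self.candidates[name]),
            default=None,
        )

    def confirm(self, request, name):
        # The human's acceptance is the feedback the layer keeps.
        self.learned[request] = name


layer = ConversationalLayer({
    "sales_by_region": {"sales", "region"},
    "sales_by_month": {"sales", "month"},
})

request = "show me how sales are moving"
first = layer.propose(request)                     # initial guess
second = layer.propose(request, rejected=[first])  # human said "not that one"
layer.confirm(request, second)                     # human said "yes, that"
again = layer.propose(request)                     # same request, asked later

print(first, second, again)
```

The interesting part is the last call: unlike the human in the back office, the layer does not start from scratch the second time the question comes around.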
What does this mean for the future of SQL?
Hopefully it means SQL can one day disappear from the data landscape, much like CSV is doing now that more advanced formats like Parquet have emerged. There’s no need to get nostalgic or dogmatic about these things. The data world is changing as fast as the businesses that use it, and it’s time to get behind more advanced ways of thinking that embrace intelligent machine-based methods.
For example, something I’ve been working on is a more natural method for interacting with data, aimed at empowering end-users and truly democratizing the vast stores of data within an organization.
For me, being able to explore data as a conversation is the right direction for advanced analytics to take, and with the right use of deep learning, natural language understanding, and semantic data modelling, the possibilities are endless. It also requires us to start embracing non-linear and probabilistic methods, which is, frankly, a good thing.