The Data Science evolution

Data has been around for centuries, and computers have been in our homes and offices for decades, so why all this talk about Data Science now?

Because data science is the "sexiest job of the twenty-first century". Since an article with this title was published in the October 2012 issue of the revered Harvard Business Review, data science has become more than a catchy buzzword. It is developing and growing so fast that it could leave Moore's Law breathless.

What is Data Science?

Simply put, it is the science of solving problems using data. Here at Data Dialect, we've distilled our data science work into four simple steps:

  1. Define a question
  2. Source the related data
  3. Apply an algorithm to the data to answer the question
  4. Tell a story to communicate the result
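The four steps above can be sketched in a few lines of Python. This is a minimal illustration, not Data Dialect's actual workflow: the question, the `source_data`, `fit_line` and `tell_story` helpers, and the ad-spend figures are all hypothetical, and a simple least-squares line stands in for "an algorithm".

```python
# A minimal sketch of the four steps; all names and data here are hypothetical.

def source_data():
    """Step 2: source the related data (made-up ad spend vs sales figures)."""
    ad_spend = [1.0, 2.0, 3.0, 4.0, 5.0]
    sales = [3.0, 5.0, 7.0, 9.0, 11.0]
    return ad_spend, sales


def fit_line(xs, ys):
    """Step 3: ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-style and variance-style sums around the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def tell_story(slope, intercept):
    """Step 4: communicate the result in plain language."""
    return (f"Each extra unit of ad spend is associated with about "
            f"{slope:.1f} extra units of sales (baseline {intercept:.1f}).")


# Step 1: define a question.
question = "How do sales change as ad spend increases?"
xs, ys = source_data()
slope, intercept = fit_line(xs, ys)
print(question)
print(tell_story(slope, intercept))
```

With real data the third step would usually be a proper statistical model and the fourth a chart or report, but the shape of the workflow stays the same.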

So, using loads and loads of data and computing power, data science solves problems with mathematical statistics, such as regression analysis and hypothesis testing, applied to many variables simultaneously. But mathematical statistics has been around for decades. Why are we only talking about Data Science now?

For Data Science to evolve from modern mathematical statistics, it needed two more ingredients: data and computing power.

Since the early 2000s, the world around us has become more and more digitised. In the previous century, websites were little more than digital pamphlets. Then Web 2.0 emerged, and websites became a shared experience.

In 2003 we got MySpace, in 2004 Facebook, in 2005 YouTube. More commercial platforms grew alongside them: Twitter, LinkedIn, Snapchat, Skype, Instagram, Tinder, Netflix, Airbnb, Uber. These sites invited the person behind the keyboard to contribute, post, comment, upload and share, leaving a footprint in the digital landscape.

In addition, the Internet of Things emerged, in which sensor readings are recorded and used to trigger events. The internet created and shaped an ecosystem of data.

Companies, too, digitised their client data, and interactions and transactions moved from paperwork to digital channels. A flood of data became available.

Data creation has grown to the point where, to paraphrase a remark widely attributed to former Google CEO Eric Schmidt, we now create as much data every two days as was created between the dawn of civilization and 2003.

We have Big Data! All of this resulted in data too big for traditional computing power to handle.

But what has happened to traditional computing power?

    • In the 1960s, computing power was confined to specialised data centres.
    • In the 1980s, computing power was liberated onto the personal computer, reaching households and office workers' desktops.
    • Then, in the early to mid 2000s, distributed computing power came alive in cloud infrastructure.

Maybe big data came about because of the enhanced computing power the cloud offered, maybe the cloud came about because big data demanded it, or maybe the two fed on each other. Either way, mathematical statistics got what it needed to evolve into data science: Big Data and Big Computing Power.