Data science is the science of solving problems using data. Data Dialect has adopted four steps to manage the data science workflow:

  1. Define the business question

  2. Source the related data

  3. Apply an algorithm to the data to answer the question

  4. Tell a story to communicate the result

Step 2, sourcing the data, is a tedious and often the most time-consuming task in any data science project. The Data Dialect framework was designed to let a business outsource this step.

The data is sourced through three iterative flows that apply data wrangling in the R scripting language.

This sourcing step has a clear start and end, with defined deliverables in the form of scripts, data files and documented reports.

Once wrangling is complete, the step to source the data is done, and this opens possibilities for many business applications:

The Data Dialect framework is closely aligned with two well-published data science models:

The results of the Data Dialect framework are reproducible, and the resulting work is owned by the client.

See our wrangling of the Fire Incidents data for the City of Cape Town, where we applied the Data Dialect framework to source the data.

All the R scripts applied, along with the data files they produce, are available on GitHub.