The Harvard Business Review recently declared Data Scientist the “sexiest job of the 21st century”, and in the United States it is now ranked as the best-paying job around, with an average salary of $110,000. On top of this, there is a dire shortage of qualified people to fill all the Data Scientist positions that are opening up. IBM estimates that the number of Data Scientist positions will grow by 364,000 in the US alone by 2020.
Suffice it to say, Data Scientists are hot right now. Most people would probably tell you that you have to hire one, and soon. But in the immortal words of Mark Twain: “Whenever you find yourself on the side of the majority, it is time to pause and reflect.” And that’s exactly what this article is about. We ask the question: do business owners and managers really need to hire a Data Scientist?
Of course, for large enterprises with sufficient budget, a skilled and experienced Data Scientist can be a valuable addition to the team. The massive volumes of data that large enterprises face easily justify a dedicated resource (or two or three or more) to study the problem and find solutions.
However, the question is whether you ‘have to’ hire a Data Scientist, and at dataWerks our view is: perhaps not. This may go against the grain of popular opinion these days, so you might expect us to have some strong evidence to support our claim. Indeed, we do. We recently implemented a BI solution for a Fortune 100 media company that makes our point.
This customer presented us with a massive challenge: 150,000 visitors per day at their flagship theme park creating approximately four billion data records stored in more than 40 disparate data sources. They asked us to deliver real-time BI related to theme park visitors so they could immediately address any issues hindering an optimum customer experience.
For example, events that reduce customer satisfaction, such as waiting in line for too long or being caught in a rain shower, have been defined and are tracked in real time. The tracking indicates which visitors are most likely to have a bad experience and need immediate intervention to recover it, for instance in the form of coupons for the merchandise store.
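To make the idea concrete, here is a minimal sketch of how such rule-based tracking could look. It is not the customer's actual system: the event types, the 45-minute wait limit, the risk threshold and every name in the code are assumptions made up for illustration.

```python
from dataclasses import dataclass

# Hypothetical thresholds; the real rules are not described in this article.
MAX_WAIT_MINUTES = 45   # a queue wait longer than this counts as a negative event
RISK_THRESHOLD = 2      # negative events needed before a visitor is flagged


@dataclass
class VisitorEvent:
    visitor_id: str
    kind: str     # "queue_wait" or "rain_exposure" (illustrative event types)
    value: float  # minutes waited, or minutes spent in the rain


def score_event(event: VisitorEvent) -> int:
    """Return 1 if the event counts as a negative experience, else 0."""
    if event.kind == "queue_wait" and event.value > MAX_WAIT_MINUTES:
        return 1
    if event.kind == "rain_exposure" and event.value > 0:
        return 1
    return 0


def visitors_needing_intervention(events):
    """Count negative events per visitor and return those at or above the threshold."""
    scores = {}
    for event in events:
        scores[event.visitor_id] = scores.get(event.visitor_id, 0) + score_event(event)
    return sorted(v for v, s in scores.items() if s >= RISK_THRESHOLD)


events = [
    VisitorEvent("v1", "queue_wait", 60),
    VisitorEvent("v1", "rain_exposure", 10),
    VisitorEvent("v2", "queue_wait", 20),
    VisitorEvent("v3", "queue_wait", 50),
]
flagged = visitors_needing_intervention(events)
print(flagged)
```

In this toy data only the visitor with both a long wait and a rain shower crosses the threshold. A production system would consume a live event stream rather than a list, but the rules themselves can stay this simple, which is why business users can define them.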
Oh, and by the way, the customer wanted to see a sales uplift of half a billion in revenue per year by optimising the customer experience in this way. Impossible, right? Not at all. We delivered the solution in less than a year, and the customer is now enjoying half a billion in sales growth each year with a 20% operating margin.
Total number of Data Scientists on the project: 0. So this is why we answer the question with: “perhaps not”. The next logical question is: how is this possible?
The short answer: empower the business users instead of hiring Data Scientists. This is exactly what we did for the customer described above. They knew everything about their data, including where it was stored. They just couldn’t access it quickly and simply enough. It typically took them 24-36 hours to get the data they really needed, and by that time the visitor would usually have left the theme park.
Hiring several Data Scientists for this project would undoubtedly have meant waiting six months for them to come up to speed on the vast amount of data stored in those 40 data sources. Instead, we delivered 80% of the requirements within six weeks by simply empowering the business users with access to the data they needed, and giving them the ability to mash up the data as required.
Indeed, this is the mission of dataWerks: revolutionize the way companies access data and, in turn, save our customers a great deal of time and money. If this involves negating the need for a full-time Data Scientist, then so be it. To be very frank: on this project a Data Scientist might have slowed us down.
But why? Here the answer lies in another of our core principles. The closer to the source of the data, the more accurate and valid the data usually is. The same holds true of the data experts: the business users have typically been working with the data for 10 or 20 years so they know it best. Even the most talented of Data Scientists would need a few months to learn what the business users already know. And that’s just Step 1 of what a Data Scientist does. There are five more steps.
Let’s take a look by deconstructing the job of a Data Scientist. A.J. Goldstein recently summarized the six typical steps in the Data Science process in his article Deconstructing Data Science. The steps are illustrated below:
The first and most fundamental problem lies in those three words in the middle: tries to understand. If the Data Scientist succeeds, then you are off to the races, and with the six steps shown they can contribute some incredibly valuable insights. However, if they don’t fully understand the data, this can introduce even more complexity into a situation already fraught with complexity. Paralysis by analysis, anyone?
Claudia Perlich, a leading Data Scientist herself, described this problem when she wrote that Data Scientists “end up solving whatever they understood might be the problem, ultimately creating a solution that is not really helpful (and often far too complicated)”. And that’s just the risk with Step 1.
Steps 2, 3 and 4 are also inherently risky, and they all involve significant time and cost, not to mention all sorts of old-fashioned ETL, data lakes and so forth, all of which may not even be necessary. We cover this subject in Beyond Data Lakes: The Total Integration Revolution. The key point: you may not need a data lake, so you may not need a Data Scientist to handle it. Indeed, dataWerks can negate the need for data lakes altogether by virtualising access to all your data stores.
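For readers who want a feel for what virtualised access means in practice, here is a toy sketch. It is not the dataWerks product or its API: two in-memory SQLite databases stand in for separate data silos, and the hypothetical VirtualView class, table names and columns are all assumptions for illustration. The point is that the data is read from each source at query time and combined on the fly, with no copy loaded into a central store first.

```python
import sqlite3

# Two in-memory SQLite databases stand in for separate data silos (assumed schema).
ticketing = sqlite3.connect(":memory:")
ticketing.execute("CREATE TABLE tickets (visitor_id TEXT, ticket_type TEXT)")
ticketing.executemany(
    "INSERT INTO tickets VALUES (?, ?)", [("v1", "day"), ("v2", "annual")]
)

queues = sqlite3.connect(":memory:")
queues.execute("CREATE TABLE waits (visitor_id TEXT, minutes INTEGER)")
queues.executemany(
    "INSERT INTO waits VALUES (?, ?)", [("v1", 60), ("v2", 15), ("v1", 30)]
)


class VirtualView:
    """Hypothetical virtual layer: queries each source on demand and stores nothing."""

    def __init__(self, ticket_conn, queue_conn):
        self.ticket_conn = ticket_conn
        self.queue_conn = queue_conn

    def total_wait_by_ticket_type(self):
        # Read the visitor-to-ticket mapping from the first source at query time.
        ticket_types = dict(
            self.ticket_conn.execute("SELECT visitor_id, ticket_type FROM tickets")
        )
        # Stream wait records from the second source and combine them on the fly.
        totals = {}
        for visitor_id, minutes in self.queue_conn.execute(
            "SELECT visitor_id, minutes FROM waits"
        ):
            ticket_type = ticket_types.get(visitor_id, "unknown")
            totals[ticket_type] = totals.get(ticket_type, 0) + minutes
        return totals


view = VirtualView(ticketing, queues)
result = view.total_wait_by_ticket_type()
print(result)
```

Because nothing is copied, a record added to either source shows up the next time the view is queried, which is the property that matters when answers are needed while the visitor is still in the park.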
The key difference: we remove ‘tries to understand’. Instead, we empower business users with access to all your data silos using the front-end tools you already have in place. So to answer the original question of whether you have to hire a Data Scientist: perhaps not.