In the slipstream of the data revolution, demand for data scientists has surged and the number of employment opportunities has grown rapidly. As a result, over the past several years, a growing number of educational institutions have begun offering programs and degrees focused on this area of expertise.
However, this influx has led business and technology leaders to wonder whether the clamor for these specialists is based more on hype than on reality.
As it stands, the increased need for data scientists is not merely hype, but perhaps a distorted reality. In fact, the demand for data scientists closely mirrors the lack of great data tools that work together with speed and efficiency.
Data scientists, as companies currently define the role, typically design data sets and work directly with tools, while data analysts provide the human ingenuity that makes data sets relevant to a company’s business model and its target audiences. But the more the role of the data scientist continues to transform, the harder it becomes to define exactly what a data scientist is.
Increasingly, companies are finding ways to consolidate these positions, blurring the lines between modelers, data scientists and data analysts. In part, this is because data scientists have become very effective interpreters of big data, which often makes their key findings more relevant to business strategy than one would think. As a result, companies are constantly reassessing how well these positions function together and restructuring accordingly to help unleash the full potential of these talents.
As companies sort through the need for data scientists versus other analyst and modeler positions, several key questions should be considered:
- Who provides the biggest value to the business?
- What data is needed and how is it best accessed so that it is easily analyzed?
- Is the right technology in place?
- How can the massive amounts of data available be processed without overwhelming existing infrastructure?
The conundrum of the data scientist is not unlike the situation in the 1970s and 1980s, when the programming world was at the mercy of a handful of “UNIX gurus.” Increasingly, the notion that companies need to hire an army of data scientists with “superhuman skills” simply reflects the scarcity of the right tools required to tame the data.
The tools problem is three-pronged. First, data tools tend to be geared toward programmers rather than toward the modelers and statisticians who typically understand the business problem better but, for the most part, would rather not write code.
Second, conventional, well-established tools that work effectively on smaller data sets tend to fall apart when grappling with much larger ones. Unearthing insights from several petabytes of data requires a different approach from one that deals with just a few gigabytes, especially when the larger data sets exhibit the characteristics of volume, variety and velocity. Even when the tools can be made to work, productivity suffers because of the effort it takes a human to wrestle them into yielding useful information. Performance is also inadequate, and most jobs take a significant amount of time to run.
Finally, it takes a few super-humans (or at least a cadre of data scientists) to put together an interlocking set of tools from the current data ecosystem – some open source, some not – and to get these tools to play with each other nicely.
This combination of factors has given rise to the notion that more data scientists are needed.
But today, companies do not need to find polymath data scientists who are well-versed in Computer Science, Statistics and Predictive Analytics. Instead, companies can look to build or acquire better technology tools that empower business analysts and others to conduct work that is currently thought of as requiring high-powered data scientists.
Tools are available that resolve the issues above and connect the dots for organizations by providing smart, usable data: actionable information that helps to advance their goals. Further, the right technology often enables data scientists to do this in seconds or minutes, compared with the hours or days it used to take an entire team of data scientists.
One such example is the open source, massively parallel processing platform HPCC Systems (High Performance Computing Cluster), which solves complex data problems by employing a harmonized platform to fabricate, link and analyze data. This system provides a single, unified environment that ensures speed and efficiency.
This type of newer, more functional technology tends to improve the efficiency of a business and helps spur innovation, because data scientists can work with the correct tools rather than fight with the wrong ones. As these new approaches and tools emerge, data scientists can shift their focus to evaluating and using data in more creative ways, devoting their time and resources to additional problem solving. This redirection is leading many companies to realize that they may already have the right talent in-house, likely in the form of data scientists, data analysts, modelers, or a combination of roles, which directly undermines the perception that more data scientists are needed. With the right set of tools, companies will no longer have to limit their pace of growth, because they will not be constrained by an artificial need for resources that they do not have and cannot procure.
Companies that take the time to acquire or create tools enabling developers, modelers and business analysts to be more productive are setting a new standard by which modern businesses can be measured. These organizations are not only in a position to tap into technology that advances their own objectives, but they also can benefit society at large.
Flavio Villanustre, VP, Infrastructure & Security, HPCC Systems
Image Credit: Sergey Nivens / Shutterstock