Data scientists spend a lot of time doing things they don't like, such as sorting out problems with unprocessed information, but they still love their jobs, according to a new survey.
The second annual Data Science report from data enrichment platform CrowdFlower shows that there’s a perceived shortage of data scientists, with 83 per cent saying there aren’t enough to go around, up from 79 per cent last year.
The results of asking how data scientists spend their time are revealing too. They spend 60 per cent of their time acting as "digital janitors", cleaning and organising data prior to processing. Only nine per cent of their time is spent mining for patterns and only four per cent building algorithms, the sort of tasks that we think of data scientists performing.
When asked which part of the job they enjoyed least, 57 per cent named the data wrangling aspect of cleaning and organising information. Collecting data sets was cited by 21 per cent. The tasks they do the most are therefore the ones they get least enjoyment from.
Yet despite this, data scientists are overwhelmingly happy in their work. When asked to rank how happy they felt in their current position on a simple five-point scale, 35 per cent gave it a five and 47 per cent a four, meaning that 82 per cent like their jobs.
The survey also asked respondents if they felt they had the right tools to do their jobs. Just 14 per cent disagreed, indicating that enterprises are committed to giving data scientists what they need to succeed. When asked about the skills that are most in demand, SQL came out top on 56 per cent, followed by big data favourite Hadoop on 49 per cent, Python on 39 per cent and Java on 36 per cent.
The report concludes, "As more and more organisations adopt data as a key driver of decision making, the importance of streamlined, well-oiled data science teams is going to remain paramount. But the current status quo probably isn't sustainable.
"On the one hand, we see a shortage of data scientists while on the other, they’re spending too much time cleaning and munging data. This is time that could be much better served doing predictive analysis and building out machine learning practices".
You can find out more about the report's findings on the CrowdFlower blog.