Interview: Cashing in on the big data goldmine

With the big data technology and services market set to reach $6.8 billion in Western Europe by 2018, organisations are finding themselves sitting on a data goldmine.

With this boom, organisations across every industry, vertical market and location face the challenge of transforming increasingly diverse and complex data into something meaningful and valuable.

We recently caught up with Adam Wilson, CEO of Trifacta, to discuss how organisations can use data wrangling to turn their data deluge into intelligent data, and to hear his predictions for the big data economy in 2016.

1. You’ve coined the term 'data wrangling'. What does that mean, and how does it fit into the analytics cycle?

Data wrangling describes the crucial first steps in preparing data for broader analysis. These steps – discovering, structuring, cleaning, enriching, validating and publishing – consume a significant amount of time and effort, as much as 80 per cent of the data analysis process. Many people view wrangling as “data janitorial work” – a necessary evil before sitting down to do “real” work.

Data wrangling is just as important to the analysis cycle as the final results. Properly conducted, wrangling gives you insights into the nature of your data that then allow you to ask better questions of it. Wrangling is not something that’s done in one fell swoop, but iteratively. Each step in the wrangling process exposes new potential ways that the data might be “re-wrangled,” all driving toward the goal of generating the most robust final analysis.
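As a rough illustration of the six steps named above, here is a minimal Python sketch that runs each one over a tiny invented dataset. It is not Trifacta's product or API: the sample data, the column names, the REGION lookup table and every function name are assumptions made up for this example, and it uses only the Python standard library.

```python
import csv
import io

# Hypothetical raw export: inconsistent casing, stray spaces, a missing
# value and a duplicate row. All of this data is invented.
RAW = """customer, channel ,spend
alice,Mobile,120.50
BOB, web ,
carol,ATM,40
alice,Mobile,120.50
"""

# Hypothetical lookup table used for the enrichment step.
REGION = {"alice": "UK", "bob": "DE", "carol": "FR"}


def discover(text):
    """Discover: parse the raw text and find out which columns exist."""
    rows = list(csv.reader(io.StringIO(text)))
    header = [name.strip() for name in rows[0]]
    return header, rows[1:]


def structure(header, rows):
    """Structure: turn positional rows into records keyed by column name."""
    return [dict(zip(header, row)) for row in rows]


def clean(records):
    """Clean: normalise text, drop duplicates and rows with no spend."""
    seen, cleaned = set(), []
    for r in records:
        rec = {
            "customer": r["customer"].strip().lower(),
            "channel": r["channel"].strip().lower(),
            "spend": r["spend"].strip(),
        }
        key = tuple(rec.values())
        if rec["spend"] == "" or key in seen:
            continue
        seen.add(key)
        rec["spend"] = float(rec["spend"])
        cleaned.append(rec)
    return cleaned


def enrich(records):
    """Enrich: add a region column from the lookup table."""
    return [{**r, "region": REGION.get(r["customer"], "unknown")} for r in records]


def validate(records):
    """Validate: fail loudly if a record breaks a basic rule."""
    for r in records:
        if r["spend"] < 0:
            raise ValueError(f"negative spend for {r['customer']}")
    return records


def publish(records):
    """Publish: serialise the result as CSV text for a downstream tool."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(records[0]), lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()


header, rows = discover(RAW)
wrangled = validate(enrich(clean(structure(header, rows))))
output = publish(wrangled)
print(output)
```

In practice the steps would rarely run once in a straight line like this. A failed validation or a surprise found while enriching would typically send you back to adjust the cleaning rules, which is the iterative "re-wrangling" described above.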

2. Big data is such a buzzword – how can organisations turn the big data deluge into intelligent data?

In the big data ecosystem, organisations ultimately must turn data into innovations and efficiencies. This means examining a wide variety of disparate data sources, including new variably structured sources, as well as new slices and aggregations of existing datasets.

For example, banks and financial institutions are striving to better understand their customers by leveraging data more effectively. Data from mobile, web, email, ATM and social media interactions enables these institutions to better understand the needs of their customers and provide more personalised experiences. That said, working with large, diverse data sets is a massive undertaking. Many big data initiatives fail because organisations do not align technology investments with business initiatives and don’t provide their users with the appropriate applications to effectively access and make use of the data.

To take advantage of all of this data in a meaningful way, organisations need to augment their infrastructure investments with user-friendly tools empowering analysts to explore and transform these large volumes of diverse data, reducing the time spent cleaning and structuring data for analysis.

3. What are your predictions for the big data economy in 2016?

In 2016, we will start to see big data proofs of concept and prototypes mature into initiatives that transform organisational strategies and industry dynamics. As executives gain a better understanding of how to implement the technology and analytics practices to gain the most valuable insights, initiatives will improve and emerge as repeatable processes that can be put to regular use. This maturing of data projects is further fuelling the booming big data market in Europe.

4. What has led to the emergence of self-service data-preparation capabilities?

With the rise of big data, data scientists have become a fixture in almost every organisation for their ability to interpret and find value in this influx of data. While most organisations have at least one data scientist on staff these days, data is useful to nearly every part of a business and doesn’t necessarily require the input of a PhD to provide value. As organisations have begun to realise the value of having data in the hands of more people, self-service analysis tools like Tableau have grown in popularity. Whether you’re a marketer, a small business owner or an educator, you can benefit from making data-driven decisions at some level. However, before data can be analysed in a system like Tableau, it must be effectively wrangled. As more end users analyse data themselves, demand for intuitive data wrangling solutions has grown with them, and it continues to grow rapidly.

5. Data security is a contentious issue at the moment – how do you ensure that users only access their data and not someone else's?

Ensuring secure data access is an essential part of Trifacta’s capabilities, as is providing security in a way that is user-friendly enough that it doesn’t hamper the data analysis process. Rather than drape a new Trifacta security layer on top of the existing components, Trifacta provides the flexibility to use standardised Hadoop frameworks already in place, including support for Kerberos security and user authentication through LDAP. This integration ensures that user access to data is consistent and hassle-free for the end user.

6. With the big data technology and services market set to reach $6.8 billion in Western Europe by 2018, what do you see as the key drivers of this movement?

We are working with users at more than 500 organisations in 35 countries across Europe, each finding real value from data. What we are seeing is that projects are moving beyond prototypes and proofs of concept and becoming integral to business success. Big data has moved beyond buzzword and is now a competitive advantage for nearly every organisation and every business unit within that organisation.

7. How can organisations ensure that they keep up with evolving data governance needs?

Our biggest investment in 2015 was in security and data governance. We chose to focus on three foundational capabilities: security, metadata and lineage, and operationalising transformation workflows.

As prototypes evolve into organisation-wide production projects, companies will need to manage a fast-growing user base alongside large, diverse data sets, making all three of these capabilities crucial to running a data-driven organisation.

Image Credit: Shutterstock/McIek