
How to build an agile data pipeline

(Image credit: Shutterstock/alexskopje)

Agility and data are two of the most overused buzzwords of the business community – and for good reason.

Every business wants to be agile: responsive to a changing environment, able to survive and thrive. Likewise, forward-thinking businesses are focused on data as a route to greater insight, creativity and efficiency. Putting the two concepts together may seem like buzzword squared, but an agile data pipeline is not a technology to hype. It is a smarter way of working with what enterprises already have, or with readily acquired skills.

An agile data pipeline is what data-centric organisations are putting in place in order to make the best use out of their data investments and ensure that the business can incorporate data-led analytical decision-making in a healthy and sustainable way.

As with any business process, building an agile pipeline involves several stages and should encompass a range of appropriate stakeholders within the business. In practice, that’s not always the case, as many organisations tend to develop their analytics functions in a higgledy-piggledy manner.

It’s no surprise that the data estate of a business can quickly grow out of control. The four Vs of big data, as defined by IBM, are variety, velocity, volume and veracity, and they show that data is no monolithic thing. It’s a living, changing entity. So fluid, in fact, that in 2017 Experian built on this format and added two more Vs: vulnerability and value.

So how do you corral and harness the bucking bronco of data and put it behind the corporate plough, to turn up the nuggets of true insight?

Catalogue shopping

A data catalogue makes storing, finding and using data a much more seamless experience. It’s an organised inventory that allows business users to explore data sources and understand them. It saves users time and stops them recreating data that already exists but that they could not find in an uncatalogued estate. It’s a great resource for keeping the analytical process ticking over at speed, without slowing down the work of data scientists or ‘line of business’ analysts.
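To make the idea concrete, here is a minimal sketch of what "explore and find" could look like in code. It is not any vendor's product or API: the `CatalogueEntry` fields, the `DataCatalogue` class and the sample assets are all hypothetical, chosen only to illustrate keyword search over registered data sources.

```python
from dataclasses import dataclass, field


@dataclass
class CatalogueEntry:
    """One data asset as a business user might see it (hypothetical schema)."""
    name: str
    owner: str
    description: str
    tags: set = field(default_factory=set)


class DataCatalogue:
    """A toy in-memory catalogue: register assets, then search them by keyword."""

    def __init__(self):
        self._entries = {}

    def register(self, entry):
        # Later registrations with the same name overwrite earlier ones.
        self._entries[entry.name] = entry

    def search(self, keyword):
        """Return entries whose name, description or tags mention the keyword."""
        needle = keyword.lower()
        hits = [
            e for e in self._entries.values()
            if needle in e.name.lower()
            or needle in e.description.lower()
            or any(needle in t.lower() for t in e.tags)
        ]
        return sorted(hits, key=lambda e: e.name)


# Sample assets, invented for illustration only.
catalogue = DataCatalogue()
catalogue.register(CatalogueEntry("sales_2023", "finance", "Monthly sales by region", {"sales", "revenue"}))
catalogue.register(CatalogueEntry("crm_contacts", "marketing", "Customer contact records", {"customers"}))
catalogue.register(CatalogueEntry("churn_scores", "analytics", "Predicted churn per customer", {"customers", "model"}))

found = [e.name for e in catalogue.search("customer")]
print(found)
```

An analyst searching for "customer" finds both existing customer assets rather than building a third from scratch, which is the time saving described above.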

A faultless data catalogue doesn’t arrive fully formed, and the history of data governance integrations is littered with solutions that failed to achieve a critical mass of adoption within an organisation. To truly deliver on a data catalogue, the business must also focus on the people and the process, not just the technology. Analytic leaders must build a culture that enables users to succeed with data.

Discover together

Data discovery can be fun, but it’s a hygiene factor that analysts need to get through before they can do the job they want to do: analytics, insights, and adding value to the business. Really, the organisation wants to unite all of its data workers with the data and analytic assets they could possibly (but legitimately!) need, in a controlled and secure way. It’s important to take steps to make data both searchable and trackable. A platform will offer this and even data lineage, giving more visibility for better governance. When data discovery and data security are breathtakingly easy, there is far less room for data governance missteps. It’s a great first step before an enterprise can create a culture of collaboration, sharing, and innovation by extending formerly tribal knowledge across the organisation.
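Data lineage is easier to picture with a small example. The sketch below assumes lineage is recorded as a simple mapping from each asset to the assets it was derived from; the asset names and the `LINEAGE` structure are hypothetical, and a real platform would store far richer metadata.

```python
# Hypothetical lineage map: each asset lists the assets it was derived from.
LINEAGE = {
    "exec_dashboard": ["sales_summary", "churn_scores"],
    "sales_summary": ["raw_sales"],
    "churn_scores": ["crm_contacts", "raw_sales"],
    "raw_sales": [],
    "crm_contacts": [],
}


def upstream_of(asset, lineage):
    """Return every asset that feeds into `asset`, directly or indirectly."""
    seen = set()
    stack = list(lineage.get(asset, []))
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # already visited via another path
        seen.add(current)
        stack.extend(lineage.get(current, []))
    return seen


def downstream_of(asset, lineage):
    """Return every asset that would be affected if `asset` changed."""
    return {name for name in lineage if asset in upstream_of(name, lineage)}


sources = upstream_of("exec_dashboard", LINEAGE)
impacted = downstream_of("raw_sales", LINEAGE)
print(sorted(sources))
print(sorted(impacted))
```

The first question ("where did this dashboard's numbers come from?") supports trust; the second ("what breaks if this source changes?") supports governance.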

Culture the data culture

The data catalogue is the starting point for most analytical activities. Searching and finding content, understanding context and gaining trust in the results through community feedback and interaction – it’s a great resource when it’s used correctly, saving time and energy, and greatly aiding productivity.

The success of the catalogue is tied into the success of the organisation. Track and reward the most active contributors who add value to the analytic process, understand the assets that are creating the most impactful results, and promote those users to ensure that information assets are well curated and maintained.
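Tracking contributors need not be elaborate. As a rough sketch, assuming the catalogue keeps a log of who did what (the event tuples and user names below are invented for illustration), a simple tally is enough to surface the most active curators:

```python
from collections import Counter

# Hypothetical activity log: (user, action) pairs recorded by the catalogue.
events = [
    ("amira", "annotate"),
    ("ben", "certify"),
    ("amira", "comment"),
    ("chen", "annotate"),
    ("amira", "certify"),
    ("ben", "comment"),
]

# Count contributions per user, whatever the action type.
contributions = Counter(user for user, _action in events)

# The two most active contributors, most active first.
top_contributors = contributions.most_common(2)
print(top_contributors)
```

In practice an organisation would probably weight actions differently, for example counting a certification as more valuable than a comment, but that choice is a matter of local policy.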

The right data culture is socially engaging. It empowers users to impart and share knowledge, and is backed by technology that supports the different ways users bring their experience together to solve problems. This includes creating and annotating definitions and discussing quality and purpose in conversation threads. Even simple social gestures, like sharing a link or giving a 'thumbs up', reinforce the value of the underlying asset and make it richer and easier to find for future users.

Collaborate or die!

It might be that during the course of the pre-data-focused days others in the organisation have already collected the same information or performed a similar analysis, but different analysts have no good way of finding it. Data assets and resulting information proliferate, thus compounding the problem and creating inefficiencies and delays in answering critical business questions.

Taking a cue from social media and wiki techniques, social interactions can help users share and utilise organisational tribal knowledge easily. Everything in the analytic process should be shareable: data, analytic apps, workflows, macros, visualisations, and dashboards. When everything is seamlessly shareable, and it is quick to identify trusted information assets along with insights into their usage and lineage, it becomes much simpler to make impactful business decisions.

One of the most important pieces of this is closing the gap, not only around finding the right data but also between the roles within an organisation: IT, business analysts, data scientists, everyday ‘citizen data scientists’, and onwards to all who use data. Sharing across an organisation is the grease on the wheels of innovation.

Define the best working practices

From the moment you embark on an analytics project, you stand at a base camp with the peak of expectations staring at you from across the chasm of ignorance. Building a social repository of all the organisation's data sources, reports, workflows, terminology, and more (potentially thousands of lifetimes of accumulated knowledge) is as daunting as climbing Mount Everest. So, don't.

Start small, but think big. Tackle smaller challenges to get some early victories and build momentum from there.

  • Pick a single department or project. Perhaps start with a handful of critical datasets
  • Document expertise while reports and data sources are being created, before the skills and the knowledge leave the project (or the company!). Ensure that new people can understand the function of dashboards, reports and other datasets
  • Follow your business strategy: Document and socialise the assets associated with key strategic projects, and use the catalogue as a means to change the culture towards greater collaboration
  • To ensure adoption, it’s vital that users always find the information up to date. Without timeliness, the catalogue immediately loses trust and credibility and the pipeline starts to leak
  • A business glossary is a critical component of your data strategy. A glossary can take many forms: definitions, concepts, subject areas, etc. It captures the unique language of your organisation in a central location, and then connects that meaning with the contents of the catalogue
  • A proper analytics pipeline lives or dies on whether users find value in the information within. No one at the centre of the organisation, not even the BI and IT teams, has a 100 per cent understanding of all its data sources, data sets, reports and other types of assets. This expertise and 'know-how' is in the heads of staff: business teams, analysts, knowledge workers, analytics groups, and more. It's pervasive and waiting to be harnessed
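Two of the practices above, the business glossary and the need for timeliness, lend themselves to a short illustration. The sketch below is a simplification under stated assumptions: the `GLOSSARY` terms, the asset names, the refresh dates and the 30-day freshness threshold are all hypothetical and would differ in every organisation.

```python
from datetime import date, timedelta

# Hypothetical glossary: business terms linked to the catalogue assets that use them.
GLOSSARY = {
    "active customer": {
        "definition": "A customer with at least one purchase in the last 90 days.",
        "assets": ["crm_contacts", "churn_scores"],
    },
    "net revenue": {
        "definition": "Gross sales minus returns and discounts.",
        "assets": ["sales_summary"],
    },
}

# Hypothetical record of when each asset was last refreshed.
LAST_REFRESHED = {
    "crm_contacts": date(2024, 3, 1),
    "churn_scores": date(2024, 1, 10),
    "sales_summary": date(2024, 2, 27),
}


def assets_for_term(term):
    """Look up which catalogue assets are tied to a business term."""
    entry = GLOSSARY.get(term.lower())
    return entry["assets"] if entry else []


def stale_assets(today, max_age_days=30):
    """List assets not refreshed within the allowed window (assumed 30 days)."""
    cutoff = today - timedelta(days=max_age_days)
    return sorted(name for name, refreshed in LAST_REFRESHED.items() if refreshed < cutoff)


linked = assets_for_term("Active Customer")
stale = stale_assets(date(2024, 3, 5))
print(linked)
print(stale)
```

A check like `stale_assets` could run on a schedule and flag out-of-date entries to their owners before users lose trust in the catalogue.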

Trusting data

It’s one thing to have data; it’s another to trust it and use it properly. Famously, executives relied on their experience, their ‘gut’, when making decisions, and sometimes that’s not necessarily a bad idea. Where data is not cleaned, rated and trusted, it might not be worth the time to review. But where the right steps are in place, the data can tell a very honest and trustworthy story. It is a better resource than the thoughts and opinions of an executive who may not have access to all the facts, the long-term trends, or the analytical power to correlate them appropriately.

So, to stock the data pipeline, put in place some simple best practices, encourage your people with good processes, and give them the technology that makes this all easy. We’re no longer in the days of needing to know how to code to operate analytical tools, and end-to-end platforms take the sting out of finding, moving, prepping and using data. In fact, stocking the analytics pipeline should be a breezy, exhilarating process, the opening stages in a virtuoso performance by a data maestro.

Nick Jewell, Director of Product Strategy, Alteryx

Nick is Lead Technology Evangelist at Alteryx. He works within the product management team to present its end-to-end platform vision, as an evangelist with analysts, data scientists and the public.