
Learning from 2020: Decentralized workforces need decentralized data


The goals of data-oriented projects tend to follow a familiar pattern: use the powerful and valuable data companies generate to drive predictive, insight-led decisions and products. In practice, however, we frequently see the shortcomings of the data platform cause teams to become disillusioned and start working around it, finding ad hoc ways of acquiring data. The result is silos, inconsistent analytics and reporting, and duplicated effort.

Working in this haphazard way often leaves business units unable to scale out their analytics capability, and it can become incredibly difficult to integrate analytics into production products and services. It also fails to deliver any meaningful ROI, and these issues have only been exacerbated in the past year.

I likely do not need to spend time explaining what the past year has been like for workforces, or to reach for phrases like disruption, unprecedented or new normal. But it has made the task of integrating data-centric approaches across the business even tougher. In traditional data organization structures, where the data department sits remote from the business, it often cannot keep pace with the data changes made by each business area – and that was before 2020. The siloed working that we all now understand from a year of working from home mirrors the wider complexity of harnessing data – and a culture shift is very much required.

The longstanding bottleneck

Traditionally, organizations have relied on a centralized analytics and data department to support innovation pipelines, products and projects. Having a centralized function, though, often creates a bottleneck. If other teams cannot quickly access and understand the data they need, as is often the case, they will find other ways to solve their problems and try to pull and utilize their own data.

There are no guarantees the insights they find will be valid, or will stay valid as the data landscape evolves, and there are often real difficulties getting those analytics pipelines into production. The result is low-quality data of questionable reliability, which in turn breeds a lack of trust in the data platform.

Even when teams do engage the data function, it is often brought in late in the project, creating further delay and disincentivizing the team from following the process. Rather than being engaged, the centralized data team is cut adrift, and therefore lacks the domain knowledge to apply context and understanding to the data. So while the appetite to utilize data is certainly there, there is a clear need to spread the expertise throughout the organization.

Breaking out from the center 

The approach should be to decentralize, not disincentivize. Business units and development teams should be responsible for managing their own data provision and consumption, and should be given the authority to change their data domain freely and easily.

This way, business units and development teams take the problem into their own hands, reducing the latency of data change. They are best placed to apply intelligence to data, find insights that improve their projects and products, and keep pace with the constantly moving data terrain. So even if a team operates in a silo, there is already a data expert in that silo.

Another area that benefits from a decentralized data organization is experimentation. Experimenting with data comes in various forms – machine learning experiments, or shipping product changes to targeted demographics and measuring their success – but in every case the ability to change quickly and often, and then measure the outcome, is key. The length of that change-and-measure cycle correlates directly with an organization's ability to innovate.
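To make that loop concrete, here is a minimal sketch in Python of the "measure the outcome" step for a hypothetical product experiment. The scenario, conversion counts and function name are illustrative assumptions, not anything from a specific toolchain; the point is how short the change-and-measure cycle can be when the team owns its own data.

```python
import math

def two_proportion_z_test(conv_a: int, n_a: int, conv_b: int, n_b: int) -> float:
    """Two-sided p-value for the difference between two conversion rates
    (a standard two-proportion z-test)."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)            # pooled conversion rate
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    return math.erfc(abs(z) / math.sqrt(2))             # two-sided normal tail

# Hypothetical experiment: a product change shown to a targeted demographic.
p_value = two_proportion_z_test(conv_a=120, n_a=2400, conv_b=165, n_b=2500)
print(f"p-value: {p_value:.4f}")  # a small value suggests the change moved the metric
```

A team with this capability embedded can run the whole cycle – ship the change, collect the events, test the result – without queueing behind a central data function.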

By creating distributed, data-centric teams with high levels of data literacy, you encourage those teams to experiment, innovate and iterate quickly, reducing this cycle time and increasing innovation within the organization. For any company there is a steep learning curve to applying new technical and structural paradigms; to reduce risk and embed new ways of working effectively and efficiently, teams should strive to learn quickly and apply those lessons across domain verticals.

DataOps – continuous delivery

If we go back ten years, most IT departments were separated into sub-departments, one of which was operations. Operations was the gatekeeper for software and releases.

Then came the advent of feature teams and DevOps. Suddenly, feature teams working together through all of the sprint ceremonies, design sessions and discussions included the work necessary for repeatable, reliable releases. Those releases require only a cursory sign-off from a centralized team, there to ensure governance and policies have been adhered to. This takes us one step further, to DataOps.

One of the common hurdles businesses experience is taking analytics into production. DataOps as a practice differs from DevOps and requires a specialized focus. With the central data team removed from implementation responsibilities, communication flows directly between the business experts, which improves data literacy and understanding. By incentivizing communication between the different data domains, each department increases its data vocabulary, slowly but surely becoming organizational information experts.

Much as we have seen with feature squads that include QA and DevOps capability, where planning how a change is tested and released through different environments and to different customers is part of the doctrine of feature design, we are now in a position to apply the same thinking to data. An embedded data engineering and/or analytics capability within the team, present during design and even implementation, keeps a data-centric voice nearby and encourages fact-based decision making. A data engineer is more likely to promote data-based decisions and ensure data is always core to product or feature development, no matter the team or the working situation.

Enabling the future business 

In data, and therefore in DataOps, the flow, monitoring and orchestration of analytics and data pipelines take a different skill set and involve different steps in the cycle. A big data platform brings a different focus on scale, robustness and a constantly moving terrain. There is a direct correlation between a team containing DataOps capability, even if only through its data engineers, and its ability to take analytics cycles into production.
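As a rough illustration of what those extra concerns look like, the sketch below is a deliberately stripped-down pipeline runner in plain Python. It is not any particular orchestration tool, and the step names are hypothetical; it simply shows DataOps treating retries, timings and failure logging as first-class parts of the analytics cycle, so pipelines can be monitored in production rather than run by hand.

```python
import logging
import time
from typing import Callable, Iterable

logging.basicConfig(level=logging.INFO, format="%(asctime)s %(levelname)s %(message)s")
log = logging.getLogger("pipeline")

def run_pipeline(steps: Iterable[tuple[str, Callable[[], None]]], retries: int = 2) -> None:
    """Run named steps in order, retrying transient failures and logging
    timings so the pipeline can be monitored and alerted on."""
    for name, step in steps:
        for attempt in range(1, retries + 2):
            start = time.monotonic()
            try:
                step()
                log.info("step=%s status=ok duration=%.2fs", name, time.monotonic() - start)
                break
            except Exception:
                log.warning("step=%s status=failed attempt=%d", name, attempt)
                if attempt == retries + 1:
                    raise  # surface the failure so on-call monitoring fires

# Hypothetical stages of an analytics pipeline.
run_pipeline([
    ("extract", lambda: None),    # e.g. pull raw events from a source system
    ("validate", lambda: None),   # e.g. schema and freshness checks
    ("transform", lambda: None),  # e.g. build analytics-ready tables
    ("publish", lambda: None),    # e.g. push results to production consumers
])
```

In a real platform a scheduler or orchestration framework would handle these concerns, but the shape of the problem – ordered steps, observability, failure handling – stays the same, and it is exactly this shape that data engineers embedded in a team bring with them.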

Data, and certainly big data, is a comparatively young technology practice, and there is still a lot of work to do. Maintaining a centralized data capability can create hurdles for change and innovation. If companies want to create data-centric thinking within their organization, where business units and development teams integrate data and analytics into their processes, projects and products, then the confluence of data and business, delivered via DataOps, must be core to the thinking. That is especially true when the workforce is spread across the country or across borders, each in their home office, where it is far harder for a central function to truly reach all the teams around it.

Data, much like other areas of software development before it, must be there to enable the business. Data, and especially big data, is a substantial challenge, and with a monolithic data platform it can be very tricky to solve everyone's problems. By fostering the right culture and ensuring teams have the right skills, companies can ensure continuous delivery of new features – fuelled by continuous and accurate insight.

Ian Cowley, Principal Consultant Data Engineer, Amido

Ian has been the Principal Data Engineer at Amido since 2018 and oversees the process of designing and delivering key data engineering projects within the company.