
How a grand ambition eased our data complexity

(Image credit: Flickr / janneke staaks)

The bigger you get as a company, the more careful you have to be to keep your IT and data systems in check. For a business that processes billions of interactions a day, it’s no easy feat. A few years ago we transformed the way we ingest, process, clean, analyse and report on corporate data, and we now have a more streamlined operation that sets us up for success as a growing company. But it wasn’t always that easy.

3.5 billion people visit a website hosted by WP Engine every single day. That’s a huge number that comes with a lot of complexity. On top of that, we’ve had massive growth since our founding in 2010, and the growing pains that go along with it.

This was a question of scale – more specifically, ramping up the servers, processes, corporate data and many other assets. We had to take action proactively, before we were forced to do so by systems starting to creak under the pressure of the nine billion new records WP Engine pulls in every day. We had to architect for the future if we wanted to optimise both workflow and innovation.

Wouldn’t it be easier to just take all the data from the various sources we have, put it in one place and then pull the reports from there? I wish it were that simple. What makes this more complex isn’t just the data itself but the environment it lives in, the people who access it and the legislation that affects how the data is treated. Not all customers have the same requirements, not all employees are as technically adept as each other, and not all countries have the same data laws.

We wanted to streamline the data from half a million websites and systems of record into one accessible place. We wanted to give our employees access to that data and make it easier to run reports on the data. At the same time, we wanted to be able to integrate everything into WP Engine’s existing systems. It was an ambitious project but we found a solution in what we call the Grand Central Data Station.

Finding simplicity and avoiding chaos

Think about Grand Central Station in New York. Even if you’ve never been Stateside, you’ve probably heard of it. It’s a feat of engineering and planning: 750,000 people pass through its doors every day, and it has 44 platforms across its mammoth 48 acres, yet all of its services coexist. It’s one of the busiest train stations in North America, yet it still serves the needs of modern commuters. We wanted to take the same approach with our Grand Central Data Station.

Simply put, our Grand Central Data Station cleans the data we have and documents the results. There was a series of steps we had to take to get it to where it is now: a successful solution that can scale with the business. First things first, we had to carry out an audit of all of our corporate data so that we could understand what types of data we had and what our data sources were. We also needed to do the same for the systems of record that govern functional areas in specific applications.
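To make the shape of that audit concrete, here is a minimal sketch in Python of the kind of inventory it might produce. Every system name, functional area and source listed below is a hypothetical example for illustration, not our actual catalogue.

```python
from dataclasses import dataclass, field

# Hypothetical inventory entries; the names below are illustrative
# placeholders, not actual systems of record.

@dataclass
class SystemOfRecord:
    name: str                  # the application that owns the data
    functional_area: str       # the business function it governs
    data_types: list = field(default_factory=list)
    sources: list = field(default_factory=list)  # tables, exports, APIs

catalogue = [
    SystemOfRecord(
        name="crm",
        functional_area="sales",
        data_types=["accounts", "opportunities"],
        sources=["nightly CSV export", "REST API"],
    ),
    SystemOfRecord(
        name="billing",
        functional_area="finance",
        data_types=["invoices", "subscriptions"],
        sources=["PostgreSQL replica"],
    ),
]

for system in catalogue:
    print(f"{system.name} ({system.functional_area}): {', '.join(system.sources)}")
```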

Once we completed this stage, we had to ask where we were going to keep all this data, with the secondary question of how to get it to that destination. We decided to use BigQuery as the data warehouse where we would store everything. This enabled us to create a data lake, but that approach usually comes with its own security issues, because not everyone should have access to all the data we were storing. To combat this, we created datasets to organise views and tables, which lets us grant access to specific teams so that each team only sees the data that applies to it.
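The sketch below shows this dataset-per-team pattern using the google-cloud-bigquery Python client. The project, dataset and group address are hypothetical placeholders; it illustrates the approach under those assumptions rather than reproducing a production setup.

```python
from google.cloud import bigquery

# Hypothetical project, dataset and group names throughout.
client = bigquery.Client(project="example-project")

# Create a dataset that holds the views curated for one team.
dataset = bigquery.Dataset("example-project.finance_views")
dataset.location = "US"
dataset = client.create_dataset(dataset, exists_ok=True)

# Grant the team read access to this dataset only: they can query the
# views placed here without seeing the underlying raw-data tables.
entries = list(dataset.access_entries)
entries.append(
    bigquery.AccessEntry(
        role="READER",
        entity_type="groupByEmail",
        entity_id="finance-team@example.com",
    )
)
dataset.access_entries = entries
dataset = client.update_dataset(dataset, ["access_entries"])
```

Because access is granted at the dataset level, adding a new team is a matter of creating another dataset and curating the views it should contain.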

Reaping the benefits

So far so good? Well, funnelling all of this data isn’t without its challenges, because each system of record has a variety of data sources to pull from, and some are even housed behind an API. We soon realised that the only way to integrate data across such a broad data environment would be to code a solution ourselves. After selecting Looker as the BI tool to query, visualise and report on the data, we ended up with the Grand Central Data Station we envisioned.
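As an illustration of the kind of glue code this involves, here is a sketch that pulls records from a system of record sitting behind an API and appends them to a warehouse table. The URL, token and table name are hypothetical, and a real pipeline would add pagination, retries and schema management.

```python
import requests
from google.cloud import bigquery

# Hypothetical endpoint and table; illustrative only.
API_URL = "https://api.example.com/v1/records"
TABLE_ID = "example-project.raw_sources.example_records"

client = bigquery.Client(project="example-project")

# Fetch records from the API-housed source system.
resp = requests.get(
    API_URL,
    headers={"Authorization": "Bearer <token>"},  # placeholder credential
    timeout=30,
)
resp.raise_for_status()
rows = resp.json()  # assume the API returns a JSON array of flat records

# Append the rows to the warehouse, letting BigQuery infer the schema.
job_config = bigquery.LoadJobConfig(
    autodetect=True,
    write_disposition=bigquery.WriteDisposition.WRITE_APPEND,
)
job = client.load_table_from_json(rows, TABLE_ID, job_config=job_config)
job.result()  # wait for the load job to finish
print(f"Loaded {job.output_rows} rows into {TABLE_ID}")
```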

Our Grand Central Data Station is refreshed every hour, gives us a clear audit trail and grants the highest level of transparency to all who need it. Employees can easily see how all values are calculated from immutable source records, and they also have access to searchable documentation on any specific topic they need. The system provides a comprehensive view of metadata from across the entire company. It’s auditable and documented, with controls and security baked in, so there will be no concerns about regulations further down the line.
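One way to picture how calculated values stay traceable is a warehouse view whose documentation travels with it, as in the hypothetical sketch below. The names and query are placeholders: the view recomputes from the immutable source table on every query, and the hourly refresh itself would be handled by whatever scheduler the pipeline uses.

```python
from google.cloud import bigquery

# Hypothetical names throughout; a sketch of a documented view over
# immutable source records, not an actual production definition.
client = bigquery.Client(project="example-project")

view = bigquery.Table("example-project.finance_views.mrr_by_plan")
view.view_query = """
    SELECT plan, SUM(amount) AS mrr
    FROM `example-project.raw_sources.example_records`
    GROUP BY plan
"""

# Attach the documentation to the object itself, so anyone querying
# the view can see how the value is calculated and where it comes from.
view.description = (
    "Monthly recurring revenue by plan, derived from the immutable "
    "records in raw_sources.example_records."
)

view = client.create_table(view, exists_ok=True)
print(f"Created view {view.full_table_id}")
```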

All in all, our Grand Central Data Station has helped our business navigate rapid growth and has positioned us to innovate for the future. Now that the data is unified, centralised and defined in the same way across the business, we can be assured we’re all speaking the same language. And with that newfound synergy, we can work in a more joined-up way across all projects to secure additional growth in the years to come.

Fabio Torlini, Senior Vice President and Managing Director, International, WP Engine