Automating data centre operations with intent-based networking - Yahoo Japan shows how


A 2017 Gartner report states emphatically that “Digital business initiatives will struggle unless CIOs and business leaders change the way they think about networking.” Gartner states that “by 2022, the percentage of enterprises that deem networking core to their digital initiative success will increase to over 75 per cent”, up from 25 per cent today. Gartner believes that enterprises should consider networking to be a profit centre rather than a cost centre. In fact, for the first time since the early 2000s, Gartner also urges CIOs to make “Networking a critical strategic infrastructure resource for enabling digital business.”

Clearly, complexities, inefficiencies and high costs plague data centre network operations today and prevent organisations from delivering on their digital transformation goals. Digital transformation requires successfully eliminating these complexities to achieve log-scale improvements in network infrastructure CapEx, OpEx, and capacity.

We are living in pivotal times. Compute power has reached unprecedented levels and is affecting how we do business at every level. Companies that embrace these technological advances accelerate their business velocity by orders of magnitude and develop an unfair advantage over their competition. Technology is driving business velocity, and is increasingly determining winners and losers.

Yahoo Japan is one of those winners. They have embraced technological advances and are among the most innovative, technologically capable and credible companies anywhere. The company is the largest Internet provider in Japan: stronger than Google in search and mail, stronger than Netflix in video streaming, stronger than eBay in auctions and stronger than PayPal in financial transactions.

Yahoo Japan is a webscale company and, like other webscale companies, they are seeing explosive growth. For this reason, they have built their data centres using the same state-of-the-art principles other webscale companies have adopted:

  • A leaf-spine Clos design that accommodates large amounts of East-West traffic, essential for supporting today’s web applications
  • A multi-hardware vendor strategy, leveraging both established hardware vendors (Arista, Cisco), and open alternatives (Cumulus and OCP hardware)
  • Major investments in automation and analytics for efficient, scalable operations
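
To illustrate why a leaf-spine Clos design suits East-West traffic, here is a minimal Python sketch. It is not Yahoo Japan's actual topology: the switch names, the fabric size and the helper functions `build_leaf_spine` and `paths_between` are all assumptions made for illustration. It models a two-tier fabric in which every leaf connects to every spine, so any two leaves are always two hops apart and have as many equal-cost paths as there are spines.

```python
from __future__ import annotations

from itertools import product


def build_leaf_spine(num_spines: int, num_leaves: int) -> set[tuple[str, str]]:
    """Return the (leaf, spine) links of a hypothetical two-tier fabric."""
    spines = [f"spine{i}" for i in range(1, num_spines + 1)]
    leaves = [f"leaf{i}" for i in range(1, num_leaves + 1)]
    # Every leaf connects to every spine; leaves never connect to each other.
    return set(product(leaves, spines))


def paths_between(
    links: set[tuple[str, str]], leaf_a: str, leaf_b: str
) -> list[tuple[str, str, str]]:
    """List the leaf -> spine -> leaf paths between two leaves."""
    spines_a = {spine for leaf, spine in links if leaf == leaf_a}
    spines_b = {spine for leaf, spine in links if leaf == leaf_b}
    # A path exists through every spine that both leaves are attached to.
    return sorted((leaf_a, spine, leaf_b) for spine in spines_a & spines_b)


# Assumed fabric size, chosen only for the example.
links = build_leaf_spine(num_spines=4, num_leaves=8)
paths = paths_between(links, "leaf1", "leaf5")

print(len(links))  # 8 leaves x 4 spines = 32 links
for path in paths:
    print(" -> ".join(path))
```

Adding a spine to this model adds one more path between every pair of leaves, which is how such fabrics are typically scaled for server-to-server traffic.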

Also like other webscale companies, Yahoo Japan embraced a disaggregated approach, separating the network into a hardware layer, a network operating system (NOS) layer, and an automation layer.

In their search for the right operational model for their data centre network, Yahoo Japan had three options:

  • Use the automation software provided by the hardware vendors.
  • Build it themselves, also known as DIY.
  • Find and use a vendor-agnostic automation layer.

The company quickly concluded that the first approach didn’t work for them because of their multi-vendor hardware strategy. They also decided against building their own automation software, because doing so would require hiring and retaining a large team, at a high cost, over many years.

The first two choices are flawed. With Choice 1, software provided by a hardware vendor locks an organisation into that vendor, making a dual-vendor strategy exceedingly hard to pursue. How can you believe that switch vendor C would support switch vendor A's hardware, even if vendor C claims so, when A is C's most feared competitor?

Choice 2 is fraught with danger and risk. I’ve seen organisations spend $20M over several years investing in DIY and get nowhere. DIY requires the ability to hire dozens of top software engineers and keep them focused on building the solution from the ground up. Just as important, DIY requires the ability to retain those top engineers so they are able to support the solution they’ve built over many years.

To quote Tsvi Gal, CTO at Morgan Stanley, “the worst vendor lock-in is our own... We are basically locked into our own environment.”

So the company chose option 3 and decided to use a commercial offering to handle automation of the data centre.

Choosing an automation platform

Yahoo Japan’s list of requirements for an automation platform was significant. Among the features they wanted, and found, were:

  • A highly scalable distributed data store.
  • Abstractions that capture user intent.
  • A graph representation of all intent and infrastructure state, which captures in real time all the relationships between objects, e.g. user intent, topology, physical elements (including switches, interfaces, transceivers, links), logical elements (virtual networks, security zones), and telemetry.
  • Extensible telemetry agents that can extract telemetry across platforms.
  • Device drivers for various vendors’ devices, used both to configure these devices and to extract telemetry from them.
  • Design tools that architecture teams can use to design data centre pods in a matter of minutes.
  • Build tools to stand up pods in minutes.
  • A continuous validation engine that generates anomaly alerts in real time whenever infrastructure state deviates from intent.
  • A web interface that one can use to design, build, deploy, and operate these networks with unmatched simplicity.
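
Two of these requirements, the graph representation of intent and the continuous validation engine, can be sketched in a few lines of Python. This is a toy model under stated assumptions and not the API of any commercial platform: the `IntentGraph` class, the `validate` function, the node attributes and the telemetry format are all hypothetical names invented for this example. It stores intended interface state as graph nodes, compares it with observed telemetry, and reports an anomaly wherever the two differ.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class IntentGraph:
    """Hypothetical graph of intended state: nodes plus typed relationships."""

    nodes: dict[str, dict] = field(default_factory=dict)
    edges: list[tuple[str, str, str]] = field(default_factory=list)

    def add_node(self, node_id: str, **attrs: str) -> None:
        self.nodes[node_id] = attrs

    def add_edge(self, source: str, relation: str, target: str) -> None:
        self.edges.append((source, relation, target))

    def neighbors(self, source: str, relation: str) -> list[str]:
        """Follow one relationship type outward from a node."""
        return [t for s, r, t in self.edges if s == source and r == relation]


def validate(graph: IntentGraph, telemetry: dict[str, str]) -> list[str]:
    """Return one anomaly message per interface whose state deviates from intent."""
    anomalies = []
    for node_id, attrs in graph.nodes.items():
        if attrs.get("type") != "interface":
            continue
        expected = attrs["expected_state"]
        # Missing telemetry is treated as a deviation, not as success.
        observed = telemetry.get(node_id, "unknown")
        if observed != expected:
            anomalies.append(f"{node_id}: expected {expected}, observed {observed}")
    return anomalies


# Intent: switch leaf1 has two interfaces, and both should be up.
graph = IntentGraph()
graph.add_node("leaf1", type="switch")
graph.add_node("leaf1:eth1", type="interface", expected_state="up")
graph.add_node("leaf1:eth2", type="interface", expected_state="up")
graph.add_edge("leaf1", "hosts_interface", "leaf1:eth1")
graph.add_edge("leaf1", "hosts_interface", "leaf1:eth2")

# Assumed telemetry snapshot, as a vendor-specific agent might report it.
telemetry = {"leaf1:eth1": "up", "leaf1:eth2": "down"}
anomalies = validate(graph, telemetry)
print(anomalies)
```

A real validation engine would run such checks continuously against streaming telemetry; the single pass here only shows the comparison of observed state against intent.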

Historically, such cross-platform, vendor-agnostic capabilities were not available to webscale data centre companies. They were forced either to use their hardware providers’ automation tools or to build their own automation solution.

Fortunately, the market has matured, and there are now companies focused on serving this market with data centre solutions that avoid hardware lock-in and save data centre operators from reinventing the wheel by building their own automation platforms. Webscale organisations can now focus management and development talent on areas that are strategic to the business, not on building automation software for data centres.

Mansour Karam, CEO, Apstra
Image Credit: Welcomia / Shutterstock