Recent events have shone a spotlight on data centers and the need to make their network fabrics more flexible and automated. Shelter-in-place orders during the first wave of the pandemic drove a dramatic increase in remote work. Some home workers experienced interrupted network services and were unable to access shared resources in enterprise data centers. Unexpectedly, the problems originated not so much in the wide area network (WAN) operated by communications service providers (CSPs) as in the enterprises themselves and the way their internal network fabrics had been architected.
Wide area networks, and the loads they carry, are unpredictable, so CSPs tend to be conservative and build in a lot of redundant capacity to cope with rapid shifts in demand. CSPs also benefit from an averaging effect: when traffic is aggregated across many customers and applications, individual peaks tend to smooth out. They also architect their networks differently, building multipoint-to-multipoint fabrics rather than hierarchical trees. Together, these approaches keep the network available even under unusual circumstances such as the recent pandemic.
In contrast to the WAN, enterprise networks, which connect data centers, head offices and branches, are relatively stable and predictable. Enterprises also tend to organize their networks hierarchically, reflecting a head office/centralized data center/branch office tree topology. The applications in the enterprise are well known, as are their traffic patterns. As a result, enterprises tend to build only for what they need, with little margin for change beyond steady growth.
The main function of this traditional enterprise structure is to move traffic from branch to data center and from branch to branch. In the cloud era, however, traffic patterns have changed. Many enterprises have adopted hybrid clouds comprising a mix of public and private cloud, with applications and workloads running across both on- and off-premises data centers. Most traffic now flows from branch to cloud as users access applications and workloads running in, and distributed across, hybrid cloud data centers.
Many enterprise IT organizations weren’t prepared for the dramatic shift in traffic patterns caused by the pandemic. Most workers left the office and began working from home, accessing applications and running workloads in the hybrid cloud with the same secure connectivity and team collaboration needs they had in the office. These remote workers significantly increased demand on network security and collaboration tools, and their need to reach both enterprise and cloud-based applications forced cloud providers and enterprises alike to scale applications and reconfigure their data center networks to cope with the new traffic patterns. With limited flexibility and automation in their data centers, enterprises were impacted the most. As a result, many have accelerated their data center transformation plans to adjust to a new employee working model, which may be here to stay.
The challenges of a data center fabric approach in a Covid world
While most enterprises have adopted a data center fabric approach, which uses a leaf-spine architecture, full-fledged Internet Protocol (IP) routing and Virtual Extensible Local Area Network (VXLAN), many have found themselves challenged in today’s environment. Containers and microservices, combined with widespread remote working and the expansion of cloud-based applications and workloads, are causing traffic to explode. In the current cloud paradigm, IT requires a more resilient and dynamic network fabric, one agile enough to adjust automatically to rapid shifts in how users access distributed applications and workloads running across hybrid cloud environments.
The problem with existing leaf-spine data center networks is that, despite their improvement over tree topologies, they are not dynamic enough. It takes only minutes to instantiate new compute resources, but it can take hours or even days to reconfigure the network fabric manually. If the fabric cannot be reconfigured dynamically to keep pace with data volumes and workload changes, applications will underperform.
Transforming to next-generation data center fabrics
What is needed is a more dynamic network fabric. It adds Border Gateway Protocol (BGP) and Ethernet virtual private network (EVPN) to the existing VXLAN, IP routing and leaf-spine topology, with BGP EVPN acting as the control plane that distributes MAC and IP reachability for the VXLAN overlay. It also employs a more flexible, fully open network operating system (NOS) that integrates seamlessly with the DevOps goals of continuous integration/continuous delivery (CI/CD), so that networking can be consumed with the same ease and speed as compute. Applying DevOps principles and practices to the network in this way, an approach known as NetOps, enables rapid development of network applications, automation of network operations and easy integration with cloud-native environments. Next-generation data center fabrics that combine an open NOS with a NetOps approach can adopt innovations that address these needs and provide significant operational benefits, including:
- Model-driven architecture – Using an open, extensible and resilient NOS with a model-driven architecture allows network applications to define and declare their own schemas, enabling them to set configurations and retrieve fine-grained system state and data with push-based streaming telemetry. This ensures that data center operators have full visibility of the state of the entire fabric, allowing operators to manage the network just like they manage cloud applications, compute and storage resources.
- Certified fabric designs – Network engineers don’t have time to work out fabric designs using spreadsheets and manual templates. Instead, the design can be expressed “as code” as a declarative intent, for example in a YAML file, from which tooling automatically generates the fabric design using pre-determined templates and parameters such as the number of racks and servers per rack, microservice endpoint locations, and quality of service and security settings. Network vendors need to provide certified designs that capture best-practice domain knowledge, easing the engineering burden of learning and testing new designs while lowering the operational risk of deploying them.
- Digital sandbox – DevOps teams want deployment speed and flexibility, while network operations teams must manage change risk and uptime. A properly sandboxed NetOps environment meets the goals of both: the fabric can be designed, fine-tuned and pre-validated virtually before it is deployed physically. Such a digital sandbox is an operational tool that provides a true emulation of the data center fabric and its application workloads, creating a virtual digital twin of the physical network. It can emulate Day 0 fabric design, Day 1 fabric deployment and Day 2+ operations such as change management and troubleshooting. This allows data center operators to reduce risk and ensure that network operations are not disrupted while network applications are validated, deployed or debugged.
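As a rough illustration of the “fabric as code” idea behind certified designs, the following minimal Python sketch expands a small intent into a leaf-spine plan. Everything in it is an assumption for illustration, not any vendor’s certified design or tooling: the intent is a plain dictionary standing in for a parsed YAML file, the parameter names (`racks`, `leaves_per_rack`, `spine_ports` and so on) are hypothetical, and the numbering follows one common eBGP underlay convention (a shared ASN for the spine tier and a unique ASN per leaf).

```python
import ipaddress
from dataclasses import dataclass

# Hypothetical intent; in practice this might be parsed from a YAML file.
INTENT = {
    "racks": 4,
    "leaves_per_rack": 2,
    "spines": 2,
    "spine_ports": 32,           # assumed downlink ports available on each spine
    "spine_asn": 65000,          # one shared ASN for the spine tier
    "leaf_asn_base": 65101,      # each leaf gets its own ASN, counting up from here
    "loopback_pool": "10.0.0.0/24",
}


@dataclass
class Device:
    name: str
    role: str
    asn: int
    loopback: str


def render_fabric(intent):
    """Expand a small intent into devices and leaf-spine links."""
    leaf_count = intent["racks"] * intent["leaves_per_rack"]
    # Each spine needs one port per leaf, so reject designs that cannot be cabled.
    if leaf_count > intent["spine_ports"]:
        raise ValueError(
            f"{leaf_count} leaves exceed {intent['spine_ports']} ports per spine"
        )

    # Hand out /32 loopbacks in order from the pool.
    hosts = ipaddress.ip_network(intent["loopback_pool"]).hosts()
    devices = []
    for s in range(1, intent["spines"] + 1):
        devices.append(
            Device(f"spine{s}", "spine", intent["spine_asn"], f"{next(hosts)}/32")
        )
    for i in range(leaf_count):
        devices.append(
            Device(f"leaf{i + 1}", "leaf", intent["leaf_asn_base"] + i, f"{next(hosts)}/32")
        )

    # Full mesh between tiers: every leaf connects to every spine.
    spines = [d.name for d in devices if d.role == "spine"]
    leaves = [d.name for d in devices if d.role == "leaf"]
    links = [(leaf, spine) for leaf in leaves for spine in spines]
    return devices, links


if __name__ == "__main__":
    devices, links = render_fabric(INTENT)
    for d in devices:
        print(d.name, d.role, d.asn, d.loopback)
    print(len(links), "leaf-spine links")
```

For the example intent this yields two spines, eight leaves and 16 leaf-spine links. Raising `racks` from 4 to 20 would need 40 spine ports and raise an error, the kind of check a certified design is meant to catch before anything is deployed.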
The goal of this architectural shift is to make the network an extension of the application layer, or what some call ‘fabric as code.’ The industry needs to evolve to treat the network as an integral part of delivering ‘x-as-a-service.’ The data center NOS cannot be a closed, proprietary system; it needs to be an open development platform on which to build and deploy network applications. Combined with a NetOps approach, such a platform lets operators rapidly develop new network applications, automate data center operations and integrate them easily with cloud-native environments.
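One way to picture “fabric as code” together with a digital sandbox is declarative reconciliation: compare the desired state derived from intent with the state reported by streaming telemetry, compute a change plan, and dry-run that plan against a copy of the state before touching the physical network. The Python sketch below assumes flat, path-keyed state; the paths are hypothetical and do not follow any particular NOS schema, and `apply_to_twin` is only a dictionary copy standing in for a real digital twin.

```python
from typing import Dict, List, Optional, Tuple

State = Dict[str, str]
Op = Tuple[str, str, Optional[str]]


def plan_changes(desired: State, observed: State) -> List[Op]:
    """Return the set/delete operations that turn observed state into desired state."""
    ops: List[Op] = []
    # Set every path that is missing or holds a different value.
    for path in sorted(desired):
        if observed.get(path) != desired[path]:
            ops.append(("set", path, desired[path]))
    # Delete every path that exists but is not part of the intent.
    for path in sorted(observed.keys() - desired.keys()):
        ops.append(("delete", path, None))
    return ops


def apply_to_twin(state: State, ops: List[Op]) -> State:
    """Apply a plan to a copy of the state (a stand-in for a digital twin)."""
    twin = dict(state)
    for op, path, value in ops:
        if op == "set":
            twin[path] = value
        else:
            twin.pop(path, None)
    return twin


# Hypothetical example: desired state from intent, observed state from telemetry.
desired = {
    "leaf1/interface/e1/admin-state": "up",
    "leaf1/interface/e2/admin-state": "up",
    "leaf1/tenant-a/vni": "10010",
}
observed = {
    "leaf1/interface/e1/admin-state": "up",
    "leaf1/interface/e2/admin-state": "down",
    "leaf1/interface/e3/admin-state": "up",
}

plan = plan_changes(desired, observed)
for op in plan:
    print(op)

# Validate in the "twin" before deploying: the plan must converge on the intent.
assert apply_to_twin(observed, plan) == desired
```

Here the plan has three operations: bring `e2` up, add the tenant VNI, and remove the `e3` entry that is not in the intent. Keeping the diff separate from the apply step is what makes a dry run possible: the same plan can be reviewed, validated against the twin, and only then pushed to the physical fabric.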
Enterprise IT managers have a lot on their plates these days. Adding automation and flexibility to their data center networks has likely been on the to-do list for a while, but it took the pandemic to wake them up to the issue in a new way. It often takes an event like this to drive innovation and change, as we have seen in several ways, including the sudden rise of remote access to cloud-based applications, which many industries hadn’t fully embraced.
The silver lining for enterprises is that by embracing the idea of an automated, programmable data center network fabric with an open NOS and a NetOps approach to operations and automation, they can do more than deal with the present and future crises. They can lay the groundwork for a new, more robust and resilient digital transformation of their businesses. With the move to cloud (and 5G and IoT), it is inevitable that the data center network fabric needs to become more dynamic and programmable. It has just taken a crisis to get people to start making it a reality.
Jon Lundstrom is Business Development Lead, Webscale Segment, Nokia