Automating data center operations with intent-based networking – Yahoo Japan shows how
A recent Gartner report emphatically points out that “Digital business initiatives will struggle unless CIOs and business leaders change the way they think about networking.” Gartner states that “by 2022, the percentage of enterprises that deem networking core to their digital initiative success will increase to over 75%”, up from 25% today.
Clearly, complexities, inefficiencies and high costs plague data center network operations today and prevent organisations from delivering on their digital transformation goals, says Mansour Karam, CEO and founder of Apstra.
We are living in pivotal times. Compute power has reached unprecedented levels and is affecting how we do business at every level. Companies that embrace those technological advances accelerate their business velocity by orders of magnitude and develop an unfair advantage over their competition. Technology is driving business velocity, and is increasingly determining winners and losers.
Yahoo Japan is one of those winners. They have embraced technological advances and are among the most innovative, technologically capable and credible companies anywhere. The company is the largest Internet provider in Japan: stronger than Google in search and mail, stronger than Netflix in video streaming, stronger than eBay in auctions and stronger than PayPal in financial transactions.
Yahoo Japan is a webscale company and, like other webscale companies, is seeing explosive growth. For this reason, they have built their data centers using the same state-of-the-art principles other webscale companies have adopted:
- A leaf-spine Clos design, which accommodates the large amounts of East-West traffic that today’s web applications generate
- A multi-hardware vendor strategy, leveraging both established hardware vendors (Arista, Cisco), and open alternatives (Cumulus and OCP hardware)
- Major investments in automation and analytics for efficient scalable operations
Also like other webscale companies, Yahoo Japan embraced a disaggregated approach, separating the network into a hardware layer, a network operating system (NOS) layer, and an automation layer.
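To make the leaf-spine design mentioned above concrete, here is a minimal sketch in Python. It assumes a simple two-tier fabric in which every leaf connects to every spine; the function names, device names and fabric size are hypothetical and are not taken from Yahoo Japan’s actual network.

```python
from itertools import product


def build_leaf_spine(num_spines, num_leaves):
    """Return the set of (leaf, spine) links in a two-tier leaf-spine fabric."""
    spines = [f"spine{i}" for i in range(1, num_spines + 1)]
    leaves = [f"leaf{i}" for i in range(1, num_leaves + 1)]
    # Every leaf connects to every spine; leaves never connect to each other.
    return set(product(leaves, spines))


def spines_between(links, leaf_a, leaf_b):
    """List the spines that offer a leaf -> spine -> leaf path between two leaves."""
    spines_a = {spine for leaf, spine in links if leaf == leaf_a}
    spines_b = {spine for leaf, spine in links if leaf == leaf_b}
    return sorted(spines_a & spines_b)


# Hypothetical fabric size, chosen only for illustration.
links = build_leaf_spine(num_spines=4, num_leaves=8)
print(len(links))                               # 32 links: 8 leaves x 4 spines
print(spines_between(links, "leaf1", "leaf5"))  # one equal-length path per spine
```

Because any two leaves are always one spine apart, server-to-server (East-West) traffic sees the same hop count regardless of where the servers sit, and adding a spine adds a path between every pair of leaves.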
In their search for the right operational model for their data center network, Yahoo Japan had three options:
- Use the automation software provided by the hardware vendors.
- Build it themselves, also known as DIY.
- Find and use a vendor-agnostic automation layer.
The company quickly concluded that the first approach didn’t work for them because of their multi-vendor hardware strategy. They also decided against building their own automation software, because it requires hiring and retaining a large team, at a high cost over many years.
The first two choices are flawed. With Choice 1, hardware-vendor-provided software locks an organisation into that hardware vendor, making it exceedingly hard to pursue a dual-vendor strategy. How can you believe that switch vendor C would support switch vendor A’s hardware, even if vendor C claims so, when A is C’s most feared competitor?
Choice 2 is fraught with danger and risk. I’ve seen organisations spend $20M (€17.18m) over several years investing in DIY and get nowhere. DIY requires the ability to hire dozens of top software engineers and keep them focused on building the solution from the ground up. Just as important, DIY requires the ability to retain those top engineers so that they can support the solution they’ve built over many years.
To quote Tsvi Gal, CTO at Morgan Stanley, “the worst vendor lock-in is our own… We are basically locked into our own environment.”
So the company chose option 3 and decided to use a commercial offering to handle automation of the data center.
Choosing an automation platform
Yahoo Japan’s list of requirements for an automation platform was significant.
Among the features they wanted, and found, were:
- A highly scalable distributed data store.
- Abstractions that capture user intent.
- A graph representation of all intent and infrastructure state, which captures in real-time all the relationships between objects, e.g. user intent, topology, physical elements (including switches, interfaces, transceivers, links), logical elements (virtual networks, security zones), and telemetry.
- Extensible telemetry agents that can extract telemetry across platforms.
- Device drivers for the various vendors’ devices, used both to configure those devices and to extract telemetry from them.
- Design tools that architecture teams can use to design data center pods in a matter of minutes.
- Build tools to stand up pods in minutes.
- A continuous validation engine that generates anomaly alerts in real time whenever infrastructure state deviates from intent.
- A web interface that one can use to design, build, deploy, and operate these networks with unmatched simplicity.
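The idea behind the graph representation and the continuous validation engine in the list above can be sketched in a few lines of Python. This is an illustrative toy under stated assumptions, not the data model or API of any commercial product: the `Graph` class, the `find_anomalies` function, and the relationship names `link` and `hosted_on` are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Graph:
    """A tiny graph: each edge is a (source, relation, target) triple."""
    edges: set = field(default_factory=set)

    def add_edge(self, source, relation, target):
        self.edges.add((source, relation, target))


def find_anomalies(intent, observed):
    """Compare intended and observed graphs and describe every deviation."""
    anomalies = []
    # Relationships the operator asked for that telemetry does not confirm.
    for source, relation, target in sorted(intent.edges - observed.edges):
        anomalies.append(f"missing: {source} -{relation}-> {target}")
    # Relationships seen in the infrastructure that nobody asked for.
    for source, relation, target in sorted(observed.edges - intent.edges):
        anomalies.append(f"unexpected: {source} -{relation}-> {target}")
    return anomalies


# Intent: leaf1 is cabled to both spines and hosts virtual network "blue".
intent = Graph()
intent.add_edge("leaf1", "link", "spine1")
intent.add_edge("leaf1", "link", "spine2")
intent.add_edge("blue", "hosted_on", "leaf1")

# Observed state (as telemetry might report it): the link to spine2 is down.
observed = Graph()
observed.add_edge("leaf1", "link", "spine1")
observed.add_edge("blue", "hosted_on", "leaf1")

anomalies = find_anomalies(intent, observed)
print(anomalies)  # ['missing: leaf1 -link-> spine2']
```

A real engine would run this comparison continuously against live telemetry and over far richer objects (interfaces, transceivers, security zones), but the principle is the same: intent and observed state live in the same structure, so any deviation is a simple difference between the two.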
Historically, such cross-platform, vendor-agnostic capabilities were not available to webscale data center companies. They were forced either to use their hardware providers’ automation tools or to build their own automation solution.
Fortunately, the market has matured, and there are companies that are focused on serving this market with data center solutions that avoid hardware lock-in, and also save data center operators from having to invest in reinventing the wheel and building their own automation platforms. Webscale organisations can now focus management and development talent on areas that are strategic to the business – and not on building automation software for data centers.
The author of this case study is Mansour Karam, CEO and founder of Apstra.
About the author
Mansour Karam is an entrepreneur and executive with a passion for, and a successful track record of, building high-tech infrastructure companies from the ground up. As CEO and founder of Apstra, Mansour is responsible for setting the product vision and leading the company’s culture and business.