The future of cloud computing can be seen in serverless functions, and storage technologies will have to adapt, again. Innovations in serverless computing promise a new era of auto-scaling, elastic, subscription-based, pay-as-you-go software architectures that align perfectly with increasingly popular SaaS business models. But before serverless compute platforms can join the mainstream, significant technological hurdles in the rest of the workflow will need to be overcome. We can be sure huge efforts will be made across the computing industry to clear these obstacles as quickly as possible and unlock the intrinsic potential of serverless.
It is not clear that every storage vendor is currently prepared for this transition. Major innovations in computing typically, and swiftly, mandate corresponding innovations in storage technology. The coming shift to serverless software design will be no different, and storage platforms will need to be ready for what lies ahead.
In reality, the term “serverless computing” is an oxymoron; compute cannot happen without servers. However, hardware can be abstracted so far away from developers that they feel they never deploy their code to a physical machine, even though this is not actually the case. With serverless application architectures, developers might never have to consider the infrastructure they are working with at any point in their design process.
Developers interested in experiencing this “freedom” from infrastructure will need to modularise their applications into a series of independent, “stateless” functions, which are then executed in response to a developer-defined trigger.
Public cloud providers already offer easy-to-integrate triggers for most of their cloud services. In AWS, for example, developers can configure a function to run when an object is uploaded to a specified bucket. Third-party applications can also trigger functions via the AWS SDK or API. The same is, of course, true of the other public clouds.
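As a minimal sketch, an AWS Lambda function triggered by an S3 upload might look like the following. The handler reads the bucket and object key out of the event payload that S3 delivers to the function; the function name, the processing step, and the sample event below are illustrative, not taken from any real deployment.

```python
import json


def handler(event, context):
    """Entry point the platform invokes when an object lands in the bucket."""
    results = []
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]
        # Real work (resize an image, index a document, ...) would go here.
        results.append(f"processed s3://{bucket}/{key}")
    return {"statusCode": 200, "body": json.dumps(results)}


# A simulated S3 event, shaped like the notification the platform delivers:
sample_event = {
    "Records": [
        {"s3": {"bucket": {"name": "uploads"}, "object": {"key": "report.pdf"}}}
    ]
}
print(handler(sample_event, None))
```

The developer never specifies a machine, operating system, or scaling policy; the trigger configuration and the handler are the whole deployment surface.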
But serverless technology still has some distance to travel before it reaches its full potential. Auto-scaling pay-as-you-go stateless functions are great, until they’re not. Fundamentally, most customers are still busy virtualising or containerising applications, so rewriting an application into many serverless functions is still a long way off. When application rewrites do occur, or when new application development begins, developers find the job of implementing massively parallel systems composed of stateless functions much more difficult to manage than they might initially have supposed.
It seems that, like “serverless,” the idea that rewriting code into cloud functions frees developers from ever worrying about application state is also something of a misnomer. In the case of functions, “state” might not be stored in the function itself, but it does need to be stored somewhere, usually in shared storage such as S3. And here we come to one of the major pain points of this innovation. Shared storage can be slow, and can be expensive when read frequently. So companies are paying for the time it takes to execute the function, as well as the time it takes to store and retrieve data from the shared storage. The pinch is felt most keenly when a function is spun up for the first time and needs to load all of its code dependencies. For now, the major function-as-a-service (FaaS) platforms must load the entire codebase of a dependency library instead of just the parts the function requires. This leads to cold starts downloading hundreds of megabytes of dependency libraries, driving up costs substantially.
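The pattern is easy to see in miniature. Below is a hedged sketch of a “stateless” function whose state survives only because it is written back to shared storage on every invocation; the in-memory dictionary stands in for a store such as S3, and every name here (`put_state`, `get_state`, `count_invocations`) is hypothetical. In production, each of those reads and writes is a billed network round-trip.

```python
# Stand-in for shared storage such as S3. In a real deployment every
# access below would be a network call, adding latency and per-request cost.
shared_store = {}


def put_state(key, value):
    shared_store[key] = value


def get_state(key):
    return shared_store.get(key)


def count_invocations(event):
    """A 'stateless' function: it holds nothing between calls, so its
    counter must be fetched from and written back to shared storage."""
    key = f"counter:{event['user']}"
    current = int(get_state(key) or b"0")
    current += 1
    put_state(key, str(current).encode())
    return current


print(count_invocations({"user": "alice"}))  # → 1
print(count_invocations({"user": "alice"}))  # → 2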
It is common for developers to try to overcome the performance limitations of slow shared storage by implementing one of two anti-patterns. One option is to build caches into the functions that need quick access to data, resulting in an anti-pattern of high fan-out systems with caches everywhere. Another possibility is that they fall into a “ship data to code” anti-pattern, where data gets passed around from function to function. Not only can moving high volumes of data in this way be highly inefficient, but it can also make functions exceedingly dependent on one another, losing the promise of a highly decoupled system.
Despite these limitations, solutions are on the horizon, and a new era of cloud computing is about to go mainstream. The widespread adoption of serverless computing will require next-generation storage to be instantly available with provisioning happening in milliseconds. As requests increase and new functions are created, a central data repository should be able to grow to meet the increasing capacity and performance demands.
Furthermore, additional storage innovation is required to keep necessary data as close to a serverless function as possible, so that the function can execute quickly and, as a result, costs are minimised for the customer. This will most likely require storage vendors to invest heavily in virtualisation technologies, as well as in deep integrations with the major public cloud providers, so that their storage can automatically and quickly move data close to customer functions.
One of the main reasons S3 has been adopted by serverless applications is that it was designed for web applications and launched with already hugely impressive APIs and SDKs. To keep up, storage used in serverless computing will also need to be easy for both applications and functions to access, through an equally well-designed developer SDK and API.
In my mind, there is no doubt that serverless computing is firmly on the horizon and it will be interesting to see how storage and file system providers will work to meet this new challenge. The storage industry is about to experience a massive disruption, yet again. Luckily, the companies at the forefront of this sector are uniquely positioned to rise to the challenge as they have done so often in the past.
Grant J. Gumina, Product Manager, Qumulo