The data lake concept has been around for a while now. While definitions vary, most agree that, conceptually, a data lake is a shared data environment that can handle a variety of datasets, including semi-structured and unstructured data. Companies use data lakes in a variety of ways, from IT-managed data environments to analytics-centric platforms for data science.
Data lakes have several attributes that make them good candidates for a public cloud deployment on AWS or Azure. The data is often large in scale, and much of it is generated outside corporate firewalls, as with sensor logs or streaming data. Ease of access can be an important requirement, which is another potential benefit of the cloud model. And data lakes often have uncertain growth rates: a quick success could mean immediate expansion, and scaling is more easily accomplished in public clouds.
That’s why more organisations are interested in cloud-based data lakes, also known as data lakes as a service. Again, definitions vary: some cloud data lake solutions are more storage-oriented, while others include analytic capabilities.
Given the rise in data lake adoption, it’s important to know where and when a cloud data lake as a service may be most effective.
Key drivers for a cloud data lake
Here are seven scenarios where a cloud data lake as a service is your best option.
- When you’re short on time. While early data lakes were often development projects that took months, there are now prebuilt platforms and services. Using a data lake service can jumpstart projects and speed up deployments. Recent years have seen a major increase in secure cloud services, so AWS or Azure data lakes are now viable for all, even in highly regulated industries.
- When you run a lean team and don’t like troubleshooting. Getting a data lake implemented is just the first step. Data lakes require ongoing maintenance, optimisation and upgrades. A cloud data lake as a service often includes some or all of those operations. For IT, that can make a huge difference. Plus, IT doesn’t have to train or assign precious staff to cloud troubleshooting and operations.
- When security and risk reduction are important. A cloud data lake can help consolidate data into one secure and monitored environment, reducing the need for multiple silos. A data lake architecture can also reduce risky practices, such as employees manually storing copies of data, emailing datasets or overloading sensitive systems with requests. It can also make IT and compliance processes more efficient by making popular datasets available in the cloud once, with appropriate governance and logging.
- When you’re trying to reduce platforms. Most data lakes offer a place to collect, store and consolidate data, which is certainly helpful. Some data lakes also serve as analytic environments, and a subset support multiple kinds of workloads in one platform; for example, batch SQL processing as well as R or Python interfaces for data scientists. That can improve analytic performance, and it saves time for analysts, who don’t have to switch between platforms or tools.
- When you want the latest and greatest capabilities. Both big data and cloud capabilities are upgraded often. The cloud can make it easier to scale or adopt new capabilities, such as new analytics methods or applications. But be sure to understand exactly what each cloud data lake service includes: some cover only basic operations, while others include performance optimisation, upgrades and testing.
- When you are geographically dispersed. The cloud remains a helpful delivery model for teams that need to share data or would benefit from easy data sharing mechanisms. Rather than use a commercial service (yikes) or email data around (obvi) or navigate complex enterprise networking requests (no thanks), secure cloud data lakes can simplify access to data for teams around the world.
- When you want to improve collaboration and innovation. Relatedly, by making data accessible and easily available across business units, cloud data lakes can empower different parts of the company to work together. A collaborative environment can also inspire new ideas, as a place to surface new datasets and experiments. This improves agility and outcomes.
Why a data lake as a service for CWT?
For example, Carlson Wagonlit Travel, the world’s largest corporate travel provider, implemented a data lake as a service when it added a data science team. Using a prebuilt cloud solution accelerated delivery, allowing the data science program to get started quickly and show results faster. It worked, and their data lake as a service has driven several success stories, including more collaboration between the data science team and corporate clients around the world. Now, various teams access data in the data lake for a variety of different business activities, including streamlining customer experiences, improving cost utilisation and offering strategic insights to clients.
Data lakes in the clouds
As with all technologies, but particularly with data lakes, read labels carefully. The last few years have seen the introduction of many new services and solutions, cloud and otherwise, so there’s no longer any need to “do it yourself” (DIY) when it comes to a data lake. However, many of these new data lake services have very different capabilities, and organisations have very different requirements. If there’s one thing we’ve learned in the evolution of data lakes (and really all data platforms), it’s that it is critical to have a key use case or two in mind before picking a platform.
Then, consider those requirements and pick a platform or service that can be deployed quickly and can flex with your future needs. Showing quick wins is critical, or you risk the “data swamp” moniker (and that’s hard to get rid of). Data lake implementations don’t have to take a long time, and a little upfront planning can go a long way. Don’t believe the hype (even from me), and make sure your solution fits your requirements.
You can learn more at this new online resource, Data Lake Concepts, which offers definitions, news and articles about data lakes. If you’re considering a cloud data lake, Cazena’s Fully-Managed Data Lake as a Service has been successfully deployed in days in a variety of industries.
Hannah Smalltree, Vice President - Marketing, Cazena
Image source: Shutterstock/Carlos Amarillo