The amount of unstructured data—freeform information that does not fit neatly into databases organized by fixed categories—is exploding. By 2022, 93% of all digital data will be unstructured, IDG Research predicts. Much of this will be corporate data, including email and text messages, audio files of customer service recordings, video files of YouTube uploads, text files of Word documents and PDFs, social media posts and more.
Having reached epic proportions, the proliferation of unstructured data presents huge storage and processing challenges. Machine-generated data such as medical 3D imaging and satellite imagery, and data created by the Internet of Things, is adding new and even larger streams of unstructured data to the flood.
Because unstructured data is far more accessible and easier to share than structured data, it is significantly more vulnerable to cyber-attacks. Due to its varying nature, and the challenges of identifying where it resides in the enterprise network, it is difficult to protect unstructured data from unauthorized access or to prevent it from leaving the secure company environment.
The challenges presented by storing and protecting unstructured data in turn raise concerns relating to compliance with data protection and privacy laws such as the EU’s General Data Protection Regulation (GDPR) and Australia’s Notifiable Data Breaches (NDB) scheme.
Unstructured data has traditionally been of lesser value than structured data. But that’s changing. Its value is quickly rising because, as it grows unabated, it tends to contain more and more vital information. And because it’s increasingly spread across an organization’s infrastructure and different devices, this data is a treasure trove for hackers. Often, organizations have little awareness of the volume, composition, risk, and business value of their unstructured data, which makes awareness a critical first step in its protection.
Unlike structured data, which grows predictably, unstructured data grows unpredictably and exponentially. Predictable, linear growth can be managed with old-school storage techniques, such as overprovisioning—purchasing more storage than is needed to meet anticipated growth. But unstructured data is another animal entirely and most enterprise IT infrastructures are not designed to handle it. This is a problem not only in management terms but also from the perspective of data security. Infrastructure that can’t scale and adapt to unpredictable data growth exposes the data to risk.
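To make the difference concrete, here is a rough back-of-the-envelope sketch. Every figure in it is a made-up assumption for illustration (100 TB in use, 200 TB purchased, 10 TB of linear growth per year versus 50% compound growth per year), and the function name `years_until_full` is hypothetical. It only shows how quickly overprovisioned headroom disappears once growth compounds.

```python
def years_until_full(capacity_tb, start_tb, grow):
    """Count whole years until usage reaches or exceeds the purchased capacity."""
    used = start_tb
    years = 0
    while used < capacity_tb:
        used = grow(used)  # apply one year of growth
        years += 1
    return years


# Illustrative assumptions only, not figures from any study.
CAPACITY_TB = 200  # storage purchased up front (overprovisioned)
START_TB = 100     # storage in use today

linear_years = years_until_full(CAPACITY_TB, START_TB, lambda tb: tb + 10)
compound_years = years_until_full(CAPACITY_TB, START_TB, lambda tb: tb * 1.5)

print(f"Linear growth fills the headroom in {linear_years} years")
print(f"Compound growth fills the headroom in {compound_years} years")
```

Under these assumptions the same purchase lasts a decade with steady growth but only a couple of years with compounding growth, which is why buying ahead stops working as a strategy.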
Organizations understand this. They know they need multiple layers of protection, from endpoint and network security to patch management and Identity and Access Management (IAM), to safeguard their data and protect the crown jewels of their information. However, security measures are only perfect until they are penetrated: organizations need to get security right 100% of the time, while hackers need just a single successful breach to compromise data.
That’s why enterprises need to reexamine not only their security infrastructure but also their storage infrastructure from the ground up. They need to implement modern storage techniques if they truly want to defend against attacks. They need to realize that storage is not merely a container for enterprise data; it can also be designed to successfully mitigate the risks associated with unstructured data.
All organizations face the danger of a hacker destroying their data. And they all need a backup storage solution to be able to recover it, in that worst-case scenario. So, in a perfect world, they would all keep extremely close track of their data and back it up to ensure it can be recovered easily.
Trouble is, the closer an organization comes to real-time data saving, the more expensive it gets. And with that, they make a choice: depending on the value of the data they’re protecting, they may decide they’re willing to accept one minute of data loss—or perhaps one day.
Fortunately, this dilemma can be solved. Innovations such as continuous storage snapshots give organizations the ability to capture and “back up” their unstructured data cost-effectively in near real-time. A storage snapshot is a set of reference markers for data at a particular point in time. It serves as a detailed table of contents, providing organizations with instant access to previous copies of their data. IT managers are realizing that storage-based technologies like snapshots provide an additional form of backup and offer the next level of reactive recovery from an attack. For example, the data referenced by a snapshot is immutable, which makes it highly resilient to ransomware attacks.
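The "table of contents" idea can be illustrated with a small toy model. This is only a sketch under simplifying assumptions, not how any particular storage product implements snapshots: the class `SnapshotStore` and its methods are hypothetical names, data lives in memory, and each file is treated as a single block. What it shows is that a snapshot copies only references, and that stored blocks are never modified once written.

```python
import hashlib
import time


class SnapshotStore:
    """Toy in-memory store with point-in-time snapshots (illustrative only)."""

    def __init__(self):
        self._blocks = {}     # digest -> bytes; written once, never modified
        self._live = {}       # path -> digest for the current state
        self._snapshots = []  # (timestamp, frozen copy of the path -> digest table)

    def write(self, path, data):
        digest = hashlib.sha256(data).hexdigest()
        self._blocks.setdefault(digest, data)  # existing blocks are left untouched
        self._live[path] = digest              # only the reference changes

    def read(self, path):
        return self._blocks[self._live[path]]

    def snapshot(self):
        """Record the reference markers only; no file data is copied."""
        self._snapshots.append((time.time(), dict(self._live)))
        return len(self._snapshots) - 1

    def restore(self, snapshot_id):
        """Point the live view back at the blocks the snapshot references."""
        _, table = self._snapshots[snapshot_id]
        self._live = dict(table)


store = SnapshotStore()
store.write("report.docx", b"quarterly figures")
snap = store.snapshot()

# Simulate an attack that overwrites the live file.
store.write("report.docx", b"ENCRYPTED BY RANSOMWARE")

store.restore(snap)
recovered = store.read("report.docx")
print(recovered)
```

Because the overwrite creates a new block instead of changing the old one, the snapshot still points at the original contents and recovery is just a matter of switching references back.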
The second challenge is that unstructured data can simply overwhelm organizations. Just when they have dealt with one terabyte, 10 more terabytes come in. In fact, market analyst firm Gartner expects to see growth of 800% over the next five years. This calls for a storage infrastructure that can do both: manage growth and secure all that unstructured data. This is where object-based scale-out storage stands out.
Object-based scale-out storage gives organizations the means to deal with exponential data growth in a cost-effective way. With features such as deduplication and compression, which reduce the overall volume of stored data, it helps organizations get out of reactive mode and frees them from constantly dealing with storage provisioning. Object-based scale-out storage lets organizations focus on the higher-value tasks of data management and protection.
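As a rough illustration of how deduplication and compression reduce stored volume, the following sketch keys each object by a hash of its contents, so identical copies are stored once, and compresses each unique object with `zlib`. The function names and sample data are hypothetical, and real systems typically deduplicate at the block or chunk level; this whole-object version is a simplification.

```python
import hashlib
import zlib


def store_objects(objects):
    """Return (index, blocks): name -> digest, and digest -> compressed bytes."""
    index = {}
    blocks = {}
    for name, data in objects.items():
        digest = hashlib.sha256(data).hexdigest()
        index[name] = digest
        if digest not in blocks:  # deduplication: identical content is stored once
            blocks[digest] = zlib.compress(data)  # compression of each unique object
    return index, blocks


def load_object(index, blocks, name):
    """Look up an object by name and return its original bytes."""
    return zlib.decompress(blocks[index[name]])


# Hypothetical sample data: the same attachment saved in two mailboxes.
attachment = b"Minutes of the weekly meeting. " * 200
objects = {
    "alice/inbox/minutes.txt": attachment,
    "bob/inbox/minutes.txt": attachment,
    "carol/notes.txt": b"Remember to renew the certificate.",
}

index, blocks = store_objects(objects)
raw_bytes = sum(len(data) for data in objects.values())
stored_bytes = sum(len(blob) for blob in blocks.values())
print(f"{len(objects)} objects, {len(blocks)} unique blocks")
print(f"{raw_bytes} bytes before, {stored_bytes} bytes after")
```

The duplicate attachment costs nothing extra to keep, and every object can still be read back in full through its name.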
Despite the numerous challenges, the good news is that modern storage technology can come to the rescue of organizations: it can deliver quicker, more granular recovery points so they don’t lose data in the event of an attack. It can also lock down certain types of data to ensure a higher level of protection. Indeed, object-based scale-out storage can be an organization’s secret weapon in getting better control of its unstructured data and mitigating risk once and for all.
Florian Malecki, International Product Marketing Director at StorageCraft