By now we are all aware that there is more data in the world than ever before. The rate at which data is produced is truly incredible and provides a massive opportunity for those who choose to harness it.
A brilliant example of this is the quantified self movement, whose followers use their smartphones to produce data about their mental and physical states. Apps today can record everything the user eats, as well as their physical activities, sleeping patterns and social engagements. This data can then be analysed to inform positive lifestyle changes. If it’s discovered that a small increase in the amount of sleep the user gets leads to their being more active the following day, the user can make a small change to their routine that has a larger, previously unrealised, positive impact on their daily life.
There are obvious ways in which this approach could be applied to business, but as with many big IT projects, concerns about risk often get in the way. IT managers can be wary of spiralling costs and busted deadlines at the best of times, and without a proven path to getting the best from big data, management anxiety can skyrocket.
This is exactly why I’m so interested in developing a safe 'on-ramp' to big data. Big data risks can indeed be substantial but the benefits to taking a bold approach – ‘quantified self for business’ if you will – can be immense. In this article I’d like to run through these perceived risks along with some suggestions for overcoming them and achieving more effective, safer big data solutions to unleash more rapid innovation for enterprises.
Perceptions of risk and reward in big data
Everywhere you look people are talking about big data and businesses are wondering how to make the most of what's available to them. Most businesses recognise the need to cost-effectively manage the growing volume of big transaction data, but that’s not where the real opportunity lies.
Big interaction data – information we get from social media, machines and sensors, call detail records, geospatial systems and the web – opens a whole new frontier for business opportunity. The most innovative enterprises are already leveraging their interaction data by combining this new data type with conventional data sources. When done effectively, this enables leading organisations to secure more customers and keep them for longer. They can increase the efficiency of their operations, improve their products and the services they deliver and – crucially – generate breakthrough results with insights that they couldn’t get any other way.
However, a surprising number of companies have opted to take a wait-and-see approach to adopting big data. Research Informatica conducted last year found that 31 per cent of IT and business professionals had no plans to pursue big data initiatives, at least not any time soon. These businesses run the risk of being left behind by their more adventurous competitors, which raises the question: why on earth are they holding back?
It turns out that whilst businesses recognise the massive benefits of pursuing a big data strategy, many also see a number of potential pitfalls along the road to big data brilliance.
Most worrying for businesses is the maturity of the tools available to them: more than half (52 per cent) consider this a problem. Emerging technologies such as Hadoop are still maturing in their capabilities for data management and analytics in a heterogeneous environment. Such frameworks can lack support for reusability and metadata, making it tough to extend projects and ensure consistency, which has led some organisations to resort to manual scripting. To mitigate the risk of big data projects ending up as just another data silo, we need to develop these tools to a more advanced and mature level.
Unfortunately, this is just one of a number of risks that put some businesses off unleashing their data potential. Others include a lack of support for real-time data and concerns over poor data quality, security, and privacy. IT professionals are also worried about a perceived limit to the availability of skilled developers to manage big data and the potential for difficult, and time-consuming development in Hadoop and other new technologies.
A bullish approach to big data opportunities
Despite all of these nervous organisations sitting on the sidelines, the majority (69 per cent) are being much more ambitious. These more innovative businesses have chosen to move ahead with big data projects, with many in production or testing phases. Overall, businesses are feeling optimistic – 67 per cent view big data as more of an opportunity than a challenge, compared to just 17 per cent who expect difficulties building a business case and mapping a return on investment for big data projects.
From my conversations with customers and interactions at conferences, I’ve found that the most bullish organisations have tended to equip themselves to deal with the risks that hold back the less courageous firms. This is especially true for those dealing with data integration and quality, where 80 per cent of big data work takes place (the remaining 20 per cent focuses on analytics). Be it technology or best practice, pioneering companies have found ways to address many of the concerns listed above.
My focus is to try to provide these companies with a safe on-ramp to big data brilliance. This ramp needs to work with both emerging technologies and traditional data management infrastructures to minimise risk, reduce costs and fuel innovation. In a nutshell, I aim to address the following key challenges so my customers can reap the benefits of big data:
Lack of real-time data
It’s all well and good having access to all of this data, but it’s of no use to anybody if it’s obsolete by the time you get to use it. Given our rapidly changing world, the lifecycle of data is shorter than ever. Businesses need access to their data in real time so they can make the right decisions at the moment those decisions need to be made. We need to continue developing technologies such as data streaming, messaging, complex event processing, data virtualisation, changed data capture and high-speed data replication so we can power real-time data transformation and replication.
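To make the changed data capture idea above concrete, here is a minimal sketch in Python of the watermark pattern: rather than reloading an entire table, each sync pulls only the rows modified since the last high-water mark. The table, column names and timestamps are hypothetical, and a real CDC tool would read the database's transaction log rather than poll a timestamp column.

```python
import sqlite3

# A toy source system with a last-modified timestamp on each row.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, amount REAL, updated_at INTEGER)")
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                 [(1, 10.0, 100), (2, 20.0, 200), (3, 30.0, 300)])

def capture_changes(conn, last_watermark):
    """Return rows changed since the watermark, plus the new watermark."""
    rows = conn.execute(
        "SELECT id, amount, updated_at FROM orders WHERE updated_at > ?",
        (last_watermark,)).fetchall()
    new_watermark = max((r[2] for r in rows), default=last_watermark)
    return rows, new_watermark

# Only orders 2 and 3 were modified after the previous sync at time 150,
# so only they are shipped downstream.
changes, watermark = capture_changes(conn, last_watermark=150)
```

The saving comes from moving two rows instead of three here; against a source table of millions of rows with a few thousand daily changes, the same pattern is what lets replication run continuously without hammering the source system.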
Data security and privacy
This is a particularly interesting challenge. We need to be realistic and recognise that in our inter-connected world, 20th century standards of privacy are fast becoming a thing of the past. The rise of social media means we really do live in a global village, and as anyone who has lived in a village knows, it’s impossible to keep a secret. However, just because businesses have access to all of this data doesn’t mean they can abuse it. What customers choose to tell businesses has to be treated in confidence – much as you might keep a secret for a friend. Only by maintaining this trust can businesses really help customers to unleash their data potential. Techniques such as data masking therefore become critical in ensuring a business can continue to grow and flourish.
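As a rough illustration of the data masking technique mentioned above, the Python sketch below pseudonymises an email address and truncates a card number. The field formats are assumptions for the example; the point is that the same input always masks to the same token, so masked records remain joinable for analytics without exposing the underlying identity.

```python
import hashlib

def mask_email(email: str) -> str:
    """Replace the local part of an email with a stable pseudonym, keeping
    the domain so aggregate analysis (e.g. by provider) still works."""
    local, _, domain = email.partition("@")
    token = hashlib.sha256(local.encode()).hexdigest()[:8]
    return f"user_{token}@{domain}"

def mask_card(number: str) -> str:
    """Blank out all but the last four digits of a card number."""
    digits = number.replace(" ", "")
    return "*" * (len(digits) - 4) + digits[-4:]
```

Production masking tools add salting, format preservation and key management on top, but the principle is the same: analysts see consistent, useful stand-ins rather than the raw values.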
By addressing these key challenges we can reap multiple rewards, namely through reduced costs and risks, alongside more rapid innovation.
Reducing the cost of big data
Big data projects can be a minefield of escalating costs. Hardware investments, poor hardware utilisation, and costly manual development (that needs to be repeated for each project) can all be massive drains on resources. The most innovative companies have realised this and are saving huge amounts in a number of ways.
Optimal hardware investments
This is crucial. Businesses can make enormous savings by identifying performance bottlenecks and unused data so that they can work to get more out of their data infrastructure. Smart businesses deploy their big data processing on the highest performance and most cost-effective platforms, from symmetric multiprocessing machines all the way to distributed platforms like Hadoop, or even traditional grid clusters or data warehouse appliances. For example, moving data processing from an appliance to a grid or Hadoop can minimise loads and extend appliance capacity.
Real-time data replication
The benefits of working in real-time are huge. Real-time data replication and changed data capture can offload up to 60 per cent of processing from source systems, while data archiving lets you offload infrequently used data from warehouses or other sources to low-cost commodity hardware or Hadoop. These approaches optimise hardware utilisation and enable you to avoid investments in additional infrastructure to accommodate growing data volumes, velocity, and variety.
In an effort to jump-start big data projects, some organisations have resorted to hand-coding and manual scripting in MapReduce, Pig, and other technologies. This is a needless waste of time and energy, increasing both personnel costs and the time to deploy; all the while introducing the risk of error, troubleshooting and delay. If we were to get rid of such practices we could increase productivity by an astounding 500 per cent. Benefits like this are what make me so excited to work in the data industry.
As I’ve already mentioned however, it’s not just costs that businesses can reduce by being bold with big data. By being brave they can reduce the very risks they’re worried about in the first place too.
Minimising big data risk
I’ve said a lot in this piece about risk already. You may be thinking: risk isn’t unique to big data – it’s a common factor of all technological developments. However, with something as game-changing as big data this risk can be terrifying. Make no mistake, big data is game-changing. If quantified self for business comes to fruition, no detail whatsoever about a business’ operations will be inscrutable. Even the water cooler can be positioned according to the desired balance between productivity and morale.
But these game-changing developments present an even greater risk to those who let their fears paralyse them. Businesses that don’t adapt and act quickly can be left behind by more nimble competitors.
This mix of inaction, implementation misfires and inadequate integration means that many of those who are undertaking big data projects face real risks. In fact, Gartner predicts that most large enterprises will fall short: through 2015, it expects more than 85 per cent of Fortune 500 firms to fail to effectively exploit big data for competitive advantage.
I believe that the best way to insulate your organisation against big data risk is to adopt a single platform that combines your existing infrastructure with emerging technologies like Hadoop. There are two major benefits I’ve found with this approach. First, you can minimise your risk of high costs, delays and subpar results when taking on a big data project. Second, it leaves you in a fantastic position to use your existing staff, rather than having to hunt for expensive in-demand talent with really specialised skills. With a single platform, you can be flexible and easily equip your business to tackle low-risk, high-value projects very quickly; all the while laying the foundation for a cross-enterprise infrastructure that maximises your return from big data too.
Finally, the benefit everyone really cares about – innovation. Big data enables businesses to innovate like never before. Not for nothing has it been called a new industrial revolution.
Analytics and business intelligence (BI) is the key focus area of big data projects. Nearly four in five (78 per cent) businesses cite this as their top project – that’s double the next priority (master data management got 39 per cent of the vote). And no wonder; if there’s one reason to delve into big data it’s to gain insights that could fuel innovation.
To get to these insights, organisations are now beginning to try their hands at data science. What that means is that they are combining technical skills like statistical modelling, data discovery and data visualisation with their business acumen to really interrogate their data and generate useful insights – insights that can have a tangible impact on their businesses.
The problem with such an approach is that data scientists have so little time to actually analyse data. DJ Patil’s excellent book Data Jujitsu points out that 80 per cent of the work on data projects is merely cleaning the data and preparing it for use. This assertion is backed up by a study conducted by Stanford University and the University of California, Berkeley, which looked at 35 data scientists at 25 organisations and found that they were overburdened by tedious tasks like data access, manipulation and integration, with little time to spend on the meaty work of analytics.
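To give a flavour of the preparation work that consumes so much of a data scientist's time, here is a small Python sketch of routine cleaning: normalising inconsistent formatting, dropping duplicates and handling missing values before any analysis can begin. The records and field names are invented for the example.

```python
# Three "customer" records typical of raw operational data: inconsistent
# casing, stray whitespace, a duplicate, and a missing value.
raw = [
    {"name": " Alice ", "age": "34", "city": "london"},
    {"name": "alice",   "age": "34", "city": "London"},
    {"name": "Bob",     "age": "",   "city": "PARIS"},
]

def clean(records):
    seen, out = set(), []
    for r in records:
        name = r["name"].strip().title()      # normalise casing/whitespace
        city = r["city"].strip().title()
        age = int(r["age"]) if r["age"].strip() else None  # missing -> None
        key = (name, city)
        if key in seen:                       # duplicate after normalisation
            continue
        seen.add(key)
        out.append({"name": name, "age": age, "city": city})
    return out

cleaned = clean(raw)  # two distinct customers remain
```

Every rule here – what counts as a duplicate, how to treat a missing age – is a judgement call, which is exactly why this work is hard to automate away entirely and why tooling that streamlines it frees up so much analyst time.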
It doesn’t need to be this way. A single platform with rapid development and streamlined data integration can relieve data scientists and IT professionals of this time-consuming manual work and address a key concern of professionals surveyed by Gartner. Gartner found that 30 per cent of respondents viewed the scarcity of analytics capabilities and skills as the biggest barrier to big data benefits.
Once such analytics dreams have been realised, businesses will finally be able to innovate in the way you might hope. By acting on even the most minute details gleaned from their data, businesses can be fully optimised all of the time.
Quantified self for business
I’ve spent a lot of time throughout this article addressing the various risks businesses face in achieving big data brilliance, along with some thoughts on overcoming these risks. We must understand that these risks are not insurmountable.
Businesses that are bold and bullish enough to overcome their fears first will unlock a huge competitive advantage by taking the same approach to their data as quantified self devotees already take to theirs. By using all their data – from customers, staff, sales, social channels, all the way through to data generated through the Internet of Things – they will be able to make constant incremental improvements to everything they do. This is what I mean when I talk about quantified self for business.
Imagine a business where everything is constantly optimised, right down to the colour of the walls in different areas of the building, depending on each area’s purpose. Seating position, lighting levels, temperature; businesses will be able to use their data to optimise absolutely everything and maximise their outputs. This is big data brilliance.
It’s a privilege to work on building such a world and overcoming such complex challenges to achieve it. Already we’re making big strides – leading organisations are managing to innovate up to three times faster than before, whilst halving their big data costs. That’s no mean feat, and we’re only just beginning.
Greg Hanson is the CTO of Informatica EMEA.