Since its inception in the 1990s, the internet as we know it has grown astronomically. What was once a U.S. government research project is now a backbone component of everyday life, as the average adult spends 6.3 hours per day on the internet, per Mary Meeker’s 2019 Internet Trends Report.
In the early days, the internet was limited to experimental websites users found through portals such as America Online. Then, the wave of search engines hit, essentially opening the doors to the far reaches of the internet. Now, with the rise of social media, being disconnected from the World Wide Web is the exception rather than the rule.
As the internet evolves and expands, so too does the way people gather, organise and use the data that comes with it. In what could be considered the Stone Age of the Web, users would manually scroll through AOL web portals and create Boolean searches in their search engines.
Now, if users were to do the same, gathering every iota of data would take them longer than the internet has existed. So users – especially in the enterprise – have started to employ web scraping tools to streamline the collection of web data. However, even these tools are not quite enough to make full use of the exponentially increasing amount of data, as 90 per cent of the world’s data has been created within the past two years, according to Opimas Research.
Instead, enterprises must adopt more sophisticated, automated, end-to-end approaches to the way they perceive, receive and work with web data. The Web Data Integration approach prepares enterprises for the next revolutionary step of working with the data available on the internet.
The world of web data
Organisations across the globe are leveraging web data on a daily basis to drive business insights. The web stands alone as the single largest data source – among both traditional and alternative data sources – that is available at an enterprise’s fingertips. Not only is the web the largest source of data, but it is also one that is growing exponentially and adapting constantly. From financial and equity research, to retail and manufacturing, to travel and hospitality, businesses across many verticals rely on the web to access real-time information that can be used to inform decision-making, fuel investment models, provide unique alternative data sets and offer insights.
Despite the opportunity the Web brings as a data source, businesses worldwide are losing trillions of dollars due to lack of timely access to high-quality data. IBM estimates that poor-quality data costs businesses in the U.S. alone more than $3 trillion annually.
Today, most organisations that try to leverage web data use a technique called web scraping. However, while traditional web scraping can give users initial access to a wealth of web data, that is where its usefulness ends.
Just as the internet revolutionised information by making it possible to access almost anything, communicate with people across the world, and so much more, organisations can do better when it comes to leveraging the data that the internet provides. Enter Web Data Integration (WDI).
Web data integration, a more sophisticated perspective on scraping
WDI is an emerging category of web data solutions that goes beyond traditional web scraping. Web Data Integration is a revolutionary approach to acquiring and managing web data that focuses not only on data extraction but also on data quality and control. While WDI still does everything that a web scraper can do, it also provides users with a much more sophisticated, holistic, end-to-end solution that treats the entire web data lifecycle as a single, integrated process.
In the five stages of Web Data Integration – Identify, Extract, Prepare, Integrate, Consume – web scraping stops at the Extract stage. While web scraping is in fact a component of Web Data Integration, the WDI approach also enables users to work with their web data within a single platform to:
- Extract data from non-human-readable output (hidden data)
- Programmatically extract data several screens deep into transaction flows
- Perform calculations and combinations on data to make it richer and more meaningful
- Cleanse the data
- Transform and normalise the data
- Apply additional QA processes
- Integrate the data not just via files but also via APIs and streaming capabilities
- Extract additional data on demand
- Analyse data with change, comparison, and custom reports
- View analytics, charts, graphs for visual interpretation of data and trends
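To make the difference between scraping and the later stages concrete, here is a minimal sketch in Python of an Extract step followed by a Prepare step with basic cleansing, normalisation and QA. It is an illustration only, not how any WDI product works: the sample HTML, the two-column table layout, and the names `PriceRowParser`, `extract` and `prepare` are all assumptions made for this example.

```python
from html.parser import HTMLParser


class PriceRowParser(HTMLParser):
    """Extract stage: collect the text of each <td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._in_cell = True
            self._row.append("")

    def handle_data(self, data):
        if self._in_cell:
            self._row[-1] += data

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def extract(html):
    """Return the raw cell text of every table row (this is where scraping stops)."""
    parser = PriceRowParser()
    parser.feed(html)
    parser.close()
    return parser.rows


def prepare(rows):
    """Prepare stage: cleanse whitespace, normalise prices, drop rows that fail QA."""
    records = []
    for row in rows:
        if len(row) != 2:
            continue  # QA: expected exactly a name cell and a price cell
        name = " ".join(row[0].split())  # cleanse: collapse stray whitespace
        price_text = row[1].strip().replace("$", "").replace(",", "")
        try:
            price = float(price_text)  # normalise: "$1,299.00" -> 1299.0
        except ValueError:
            continue  # QA: price is not a number
        if name and price >= 0:
            records.append({"product": name, "price": price})
    return records


# Hypothetical page fragment, invented for this example.
SAMPLE_HTML = """
<table>
  <tr><td>  Widget   A </td><td>$1,299.00</td></tr>
  <tr><td>Widget B</td><td>n/a</td></tr>
  <tr><td>Widget C</td><td> $15.50 </td></tr>
</table>
"""

records = prepare(extract(SAMPLE_HTML))
for record in records:
    print(record)
```

The raw output of `extract` still contains stray whitespace, currency symbols and an unusable "n/a" price; only after `prepare` is the data in a shape that could be integrated with other systems.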
Web data integration unlocks the value of web data
According to Opimas Research, total spend on Web Data Integration is estimated to hit $5 billion in 2019. As companies urgently try to become “data-driven” as a part of their digital transformation, they recognise that a key piece of that effort is their web data: its value and how they work with it. Furthermore, Ovum reports that when web data is treated as a single, holistic workflow (from web data extraction to insight), with the same level of data validation discipline that is normally accorded to conventional business intelligence data or big data, it can yield insights at least as valuable as those from traditional sources.
Market research, business intelligence, analyst, and data teams in companies from a broad range of industries are now realising the value that can be found in alternative datasets that reside outside of their organisations’ walls, as they turn to the web as a key source of intelligence. With WDI, organisations can achieve speedy and repeatable automation of web data capture and aggregation to fuel a broad array of mission-critical strategies. Use cases include:
- Price monitoring and competitive analysis to drive product and pricing strategy
- Sentiment analysis of products, brands and services to inform business strategies
- Collection of alternative data sets from non-traditional sources such as ecommerce sites, industry blogs, social media sites, world news, and any other target source to augment decision making and fuel investment models
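As a rough illustration of the price monitoring use case, the sketch below compares two snapshots of prepared price data and reports what changed between them. The function name `price_changes`, the snapshot format (a dictionary mapping product name to price) and the sample figures are assumptions for the example, not part of any particular product.

```python
def price_changes(previous, current):
    """Compare two {product: price} snapshots and list additions, removals and moves."""
    report = []
    for product in sorted(set(previous) | set(current)):
        old = previous.get(product)
        new = current.get(product)
        if old is None:
            report.append((product, "added", None, new))
        elif new is None:
            report.append((product, "removed", old, None))
        elif new != old:
            if old == 0:
                label = "changed"  # a percentage is undefined for a zero base price
            else:
                label = f"{(new - old) / old * 100:+.1f}%"
            report.append((product, label, old, new))
    return report


# Invented sample snapshots from two consecutive collection runs.
yesterday = {"Widget A": 100.0, "Widget B": 40.0, "Widget C": 15.5}
today = {"Widget A": 90.0, "Widget C": 15.5, "Widget D": 20.0}

changes = price_changes(yesterday, today)
for product, label, old, new in changes:
    print(product, label, old, new)
```

Because the comparison runs on cleansed, normalised records rather than raw page text, the same report can be repeated on every collection run without manual checking.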
When combined with enterprise data, analytic data or big data, web data brings the best of all data worlds together by adding evidence and providing context. WDI yields hidden insights about the market that are otherwise unattainable, giving today’s enterprises the competitive edge they need to become leaders in their respective markets.
Gary Read, CEO and Chairman, Import.io