Hard lessons won well - delivering on the promises of the Semantic Web

The Semantic Web (the original "Web 3.0", as proposed by Sir Tim Berners-Lee back in 1999) was supposed to make information about organizations, people, places and things machine-readable. Sir Tim foresaw where we are today: we want to use data in an ever-increasing number of ways (apps, devices, assistants, services) to do an ever-increasing number of things. This demands that the machines we use understand online data.

Based on the romantic idea that “The Semantic Web will enable machines to understand everything on the internet”, the original vision was that all of our devices would be able to understand an organization and its offerings based solely on its website.

This would enable a range of scenarios across both B2C and B2B users. For example, entering a URL into a vehicle to be navigated to the nearest store, dialing a URL and being able to select the most relevant phone number for an inquiry, or accountancy software that could automatically fetch tax data for suppliers using their URL.

But the truth is that, as of today, the Semantic Web enables us to do none of these things. As of 2021, what the Semantic Web has enabled us to do is share fancy links on Facebook.

This is because most websites still don’t use the structured data formats that the Semantic Web relies upon (microdata, JSON-LD, RDFa). Of those that have adopted structured data formats, the overwhelming majority have done so only to the extent needed for users of Facebook, WhatsApp, Twitter, LinkedIn etc. to share fancy links to webpages.

Many of these sites don’t use real Semantic Web standards, opting instead for simpler, massively pared-down standards created by Facebook (Open Graph), Twitter (Twitter Cards) and others.
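To make the difference concrete, here is a minimal sketch, using only the Python standard library, that pulls both kinds of markup out of a page: a JSON-LD block (embedded in a script tag of type application/ld+json) and Open Graph meta tags. The sample HTML, the company name, the phone number and the helper names (StructuredDataParser, extract_structured_data) are all hypothetical and exist only for illustration.

```python
import json
from html.parser import HTMLParser


class StructuredDataParser(HTMLParser):
    """Collects JSON-LD blocks and Open Graph <meta> tags from an HTML page."""

    def __init__(self):
        super().__init__()
        self.json_ld = []      # parsed JSON-LD objects
        self.open_graph = {}   # og:* property -> content
        self._in_json_ld = False
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "script" and attrs.get("type") == "application/ld+json":
            # Start buffering the script body; it is parsed on the end tag.
            self._in_json_ld = True
            self._buffer = []
        elif tag == "meta" and (attrs.get("property") or "").startswith("og:"):
            self.open_graph[attrs["property"]] = attrs.get("content", "")

    def handle_data(self, data):
        if self._in_json_ld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_json_ld:
            self._in_json_ld = False
            try:
                self.json_ld.append(json.loads("".join(self._buffer)))
            except json.JSONDecodeError:
                pass  # skip malformed blocks rather than fail the whole page


def extract_structured_data(html):
    """Return (list of JSON-LD objects, dict of Open Graph properties)."""
    parser = StructuredDataParser()
    parser.feed(html)
    parser.close()
    return parser.json_ld, parser.open_graph


# Hypothetical page: Open Graph describes a link preview,
# while JSON-LD describes the organization itself.
SAMPLE_HTML = """
<html><head>
<meta property="og:title" content="Example Ltd">
<meta property="og:image" content="https://example.com/logo.png">
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "Organization",
 "name": "Example Ltd", "telephone": "+44 20 7946 0000"}
</script>
</head><body></body></html>
"""

json_ld, open_graph = extract_structured_data(SAMPLE_HTML)
print(json_ld)
print(open_graph)
```

In this sample, the Open Graph tags carry only what a link preview needs (a title and an image), whereas the JSON-LD block carries typed facts about the organization, such as a phone number, that an app or device could act on.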

How was this opportunity missed? Partly because of issues of complexity and partly because of the chicken-and-egg situation of innovative technologies.

Outside of academia and research organizations, the general Semantic Web failed because it was too complex for most businesses to adopt. Ultimately, not enough companies published machine-readable data to make it generally and reliably useful so that apps, devices, assistants and services could be built on top of it. In other words, it failed because complexity created the chicken-and-egg problem.

It is only now, as consumer and business demand for what can be done with machine-readable data has grown so much, that the commercial opportunity has come to dwarf the issue of complexity. The trouble is that the companies that have stepped in to provide this data hold an incredible monopoly.

Gatekeepers of the web

Our ever-increasing need for machine-readable data combined with the failure of the Semantic Web has resulted in the rise of centralized APIs offered by the giants of the web. Companies like Alphabet have built an empire by crawling the web, indexing its content and storing it in their Knowledge Graph. They’ve built devices, apps, services, operating systems and entire ecosystems on top of this data.

Developers looking to build apps that use machine-readable data about organizations are faced with two options:

  • Crawl the web as Alphabet have done, store that data and try to keep it up to date
  • Use a paid, restricted, rate-limited API

Option 1 is outside the realms of possibility for most developers and would only serve to fragment internet data further. 

Option 2 comes with rate limits and use restrictions that are designed to limit competition. 

The result is that the creativity of millions of developers is stifled, and users are left with little choice but to use privacy-compromising apps offered by the giants of the web. The way the web's gatekeepers make data available to developers through APIs only further entrenches their position. This must change.

Embracing machine-readable data

In the same way that organizations can independently deliver human-readable websites direct to customers using the web’s open standards, organizations need to be able to supply machine-readable data direct to the devices, apps and services used by their customers.

We know that this can be done. We’ve created NUM, a DNS-based alternative to the Semantic Web, providing the sort of data that has previously only been available through APIs offered by giants of the web.

Unlike APIs, access to NUM data is available to developers free, unlimited and unrestricted. We’ve launched in the UK, pre-populating data for millions of domains. Any domain owner can override their pre-populated data by adding NUM records to their own DNS or claim and update their pre-populated records using a simple user interface.

Almost all online organizations have a domain name: it’s their unique identifier and their own little piece of the internet. The World Wide Web and email are two of the most successful standards ever created and both are built on top of the Domain Name System (DNS). NUM is built on top of the DNS too, but crucially doesn’t suffer from the same chicken-and-egg problem that killed the Semantic Web because we’ve pre-populated the DNS with useful data.

To find this data we crawled 18 million domains for UK companies and found around five million active company websites with useful public data. From these we extracted contact data, logos, company numbers, VAT numbers and more. We matched and combined this data with other open, public data sources such as Companies House and published it all to DNS in the form of NUM records, almost 10 million of them.

This data is available in DNS because it’s one of the most efficient ways to store and serve small packets of data, partly due to the cached and distributed nature of DNS. Because the data is stored and served using DNS, access to it can’t be tracked, limited or restricted. Developers can use it today and build apps with open-source libraries.
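To illustrate how small a DNS lookup is on the wire, the following is a minimal sketch that builds a standard DNS query for TXT records in the RFC 1035 wire format. It does not implement the NUM protocol: the article does not say which record type or record name NUM uses, so the choice of TXT records, the placeholder domain example.com and the helper names (encode_name, build_txt_query) are all assumptions made for illustration.

```python
import struct

TXT_RECORD_TYPE = 16   # QTYPE value for TXT records (RFC 1035)
CLASS_IN = 1           # QCLASS value for the Internet class
FLAG_RECURSION_DESIRED = 0x0100


def encode_name(name):
    """Encode a domain name as length-prefixed labels ending in a zero byte."""
    encoded = b""
    for label in name.rstrip(".").split("."):
        raw = label.encode("ascii")
        if not 0 < len(raw) <= 63:
            raise ValueError(f"invalid DNS label: {label!r}")
        encoded += bytes([len(raw)]) + raw
    return encoded + b"\x00"


def build_txt_query(name, query_id=0x1234):
    """Build the raw bytes of a DNS query asking for TXT records of `name`."""
    # Header: ID, flags, QDCOUNT=1, ANCOUNT=0, NSCOUNT=0, ARCOUNT=0
    header = struct.pack("!HHHHHH", query_id, FLAG_RECURSION_DESIRED, 1, 0, 0, 0)
    question = encode_name(name) + struct.pack("!HH", TXT_RECORD_TYPE, CLASS_IN)
    return header + question


# "example.com" is a placeholder; where a NUM record actually lives is
# defined by the NUM protocol, not by this sketch.
query = build_txt_query("example.com")
print(len(query), "bytes:", query.hex())
```

The whole question fits in a few dozen bytes, and the answer can be served from any caching resolver along the way, which is the property the paragraph above relies on. In practice a developer would normally use a resolver library, or the open-source NUM libraries the article mentions, rather than building packets by hand.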

As an example of how this data and standard can solve real problems for real users, we developed CompanyDirectory.UK – a directory of some of the UK's largest companies, with all their specific, departmental contact information provided in a simple, searchable list. 

But the list of applications is endless. Backed by free, unlimited data, developers can experiment and innovate far beyond the boundaries of today’s Semantic Web.

Elliott Brown, founder, Num.