I have a master’s degree in information studies, a field that aims to collect, organise, and spread structured information. The Semantic Web, originally envisioned by Tim Berners-Lee as the next evolutionary step of the World Wide Web, was one of the major topics in the master’s programme. Aside from a brief visit to the Web Science conference in 2018, I haven’t really done anything on the subject since my graduation.
So this week we’re reviewing a review article about the state of Semantic Web research in 2021!
The Semantic Web is a field of research that has been with us for two decades now, so it shouldn’t be too hard to figure out what it is about, right? Nope. The field actually consists of a large number of different but interconnected subcommunities that all define the field differently.
Hitzler lists a few common perspectives:
It’s about creating the Semantic Web itself, along with all the other stuff that’s needed to make it useful. The Semantic Web is an improved version of the World Wide Web with data and information that intelligent machines can use.
Another, more recent perspective is that the methods and tools developed by the field do not necessarily have to be tied to the World Wide Web or used by intelligent machines – the field is primarily about the sharing, discovery, integration and reuse of data. This perspective is similar to that of the databases and (data management part of) data science fields, except that the Semantic Web has a strong focus on integrating vastly different data sources.
Finally, one might also say that it is about the foundations and applications of its three major concepts: ontologies, linked data, and knowledge graphs.
While no perspective is necessarily right or wrong, this article mainly discusses the evolution of the field from that third perspective.
Since its birth in the early 2000s, the Semantic Web field has gone through three overlapping phases, each focussed on a different concept.
An ontology is a “formal, explicit specification of a shared conceptualisation”.
For the domain-driven developers and designers among you: an ontology is basically a domain model with concepts (types, classes) and their relationships, where each concept has a unique identifier in the form of an IRI (Internationalized Resource Identifier).
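To make the domain model comparison a bit more concrete, here is a minimal sketch in plain Python. It is not how real ontologies are written (those use languages like OWL), and the `example.org` namespace and the `Concept`, `Relationship`, and `describe` names are made up purely for illustration.

```python
from dataclasses import dataclass

# Hypothetical namespace, used only for this illustration.
EX = "https://example.org/ontology/"


@dataclass(frozen=True)
class Concept:
    """A concept (type, class) identified by a globally unique IRI."""
    iri: str
    label: str


@dataclass(frozen=True)
class Relationship:
    """A named relationship between two concepts, also identified by an IRI."""
    iri: str
    label: str
    source: Concept  # concept the relationship starts from
    target: Concept  # concept the relationship points to


person = Concept(EX + "Person", "Person")
organisation = Concept(EX + "Organisation", "Organisation")
works_for = Relationship(EX + "worksFor", "works for", person, organisation)


def describe(rel: Relationship) -> str:
    """Render a relationship as a human-readable arrow."""
    return f"{rel.source.label} --{rel.label}--> {rel.target.label}"


print(describe(works_for))
```

Because every concept and relationship carries an IRI rather than a local name, two parties that use the same IRI are, at least in principle, talking about the same thing.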
The idea of ontologies was that they are universally usable and reusable, which makes it easier to integrate, share, and discover data. In theory, that is.
While ontologies were reasonably successful in the medical and life sciences, it turned out that for most domains ontologies were hard to get right, costly to develop, and difficult to maintain and reuse.
Once the research community accepted that “Grand Unified Domain Models” weren’t going to work, it came up with the idea of linked (open) data.
The idea of linked data is that everyone publishes their own mini-ontologies with their own concepts and IRIs, in the form of RDF graphs, which also include links to IRIs in RDF graphs from other creators. Examples of such RDF graphs include DBpedia and BBC Things. Combined, all the linked RDF graphs form one very large RDF graph: the Linked Open Data Cloud.
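The linking idea can be sketched with nothing more than sets of (subject, predicate, object) triples. This is a toy model, not a real RDF library: the two namespaces, the data, and the `objects` and `follow` helpers are assumptions made for the example. Only the `owl:sameAs` IRI is a real, standardised predicate.

```python
# Hypothetical namespaces for two independent publishers.
MY = "https://example.org/my-data/"
OTHER = "https://example.net/other-data/"

# Real predicate from the OWL vocabulary, commonly used to link equivalent resources.
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"

# Each graph is a set of (subject, predicate, object) triples.
my_graph = {
    (MY + "amsterdam", MY + "name", "Amsterdam"),
    # The link: my resource points to an IRI owned by someone else.
    (MY + "amsterdam", OWL_SAME_AS, OTHER + "Amsterdam"),
}
other_graph = {
    (OTHER + "Amsterdam", OTHER + "country", OTHER + "Netherlands"),
}

# Combining graphs is just a set union, because IRIs are globally unique.
combined = my_graph | other_graph


def objects(graph, subject, predicate):
    """Return all objects of triples matching the subject and predicate."""
    return {o for s, p, o in graph if s == subject and p == predicate}


def follow(graph, subject, link_predicate, predicate):
    """Hop over a link, then look up a predicate on the linked resource."""
    results = set()
    for linked in objects(graph, subject, link_predicate):
        results |= objects(graph, linked, predicate)
    return results


print(follow(combined, MY + "amsterdam", OWL_SAME_AS, OTHER + "country"))
```

On its own, `my_graph` knows nothing about countries. Only after it is combined with `other_graph` does following the link yield an answer, which is the whole point of linking.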
Unfortunately, linked data didn’t solve all problems either. While there was one very popular initiative that gained a lot of traction (schema.org), most data was hidden behind hard-to-use query interfaces or in human-readable Web pages – so things were not very “open”.
Furthermore, the information in RDF graphs was much simpler than what was originally envisioned during the ontologies era. To make matters worse, integrating and using linked data still took more effort than expected, despite all the simplifications.
In 2012, Google launched its Knowledge Graph, which it still uses to present infoboxes on its search results pages for prominent entities (people, organisations, etc.). Other companies, like Facebook, Microsoft, and IBM, also have their own knowledge graphs.
Knowledge graphs aren’t entirely new, but reuse a lot of existing ideas from the Semantic Web field. The primary difference between knowledge graphs and the Linked Open Data Cloud is that knowledge graphs are industry-led, strongly centralised, not very open, and don’t play very well with other graphs. Is this really what we want?
Regardless of your perspective on the Semantic Web field, it is clear that “we” have not achieved the grand goal yet. Before that can happen, some major advances are needed in its many, many subfields, which include artificial intelligence, databases, natural language processing, and machine learning.
Moreover, the field needs consolidation of ideas, approaches, best practices, and tools that not only work well on their own, but also work well with each other. These are all obstacles that can be overcome, but it’s not going to be easy.
There’s little consensus on what the Semantic Web field is about, other than that there is still a lot of work to be done
Ontologies, linked data, and knowledge graphs are three key concepts within the Semantic Web field