Evaluating ontological decisions with OntoClean (2002)
I’ve technically been an information scientist for over seven years now and I still can’t explain what the field is really about. What I do know though is that it includes things like ontology engineering: a subject that might feel a bit academic at first, but really should be required reading for anyone who designs domain models.
Why it matters
Ontology engineering is the study and construction of ontologies: formal representations of concepts, and of the relations between those concepts, in a specific domain.
Ontologies are somewhat akin to models in object-oriented design and domain-driven design, as they give domain experts access to a shared vocabulary. But ontologies are much more than that, because they also make it easier to relate concepts from different domain models.
They can even be used to let AI systems reason about concepts within a domain – provided that they’re constructed correctly.
Ontology construction is (or was, when the article was originally published) a bit of an arcane art, and consequently hard for newcomers to learn and master. The authors therefore introduce a methodology that helps ontology engineers make the right modelling decisions by drawing lessons from philosophical ontology.
The article is also a good read for those who design more informal models though!
How the study was conducted
Probably in front of a whiteboard, with a lot of discussion. (The article doesn’t have, or need, a methodology section.)
What discoveries were made
The article explains a few key concepts that you should know and how you can use these concepts to discover modelling mistakes.
Essence and rigidity
Entities have properties. Some of those properties are essential to an entity, i.e. the property must – by definition – always be true for an entity.
- Being hard is an essential property for hammers. Something can only be a hammer if it’s hard.
- Being hard is not an essential property for sponges. Sponges can (but don’t have to!) be hard. Of course it’s possible for a sponge to be hard throughout its entire existence, by chance. This doesn’t matter though: the point is that it could have been soft at some point in time.
Rigidity is a special form of essence. If a property is rigid, then every entity that can exhibit it must exhibit it. We can distinguish between three levels of rigidity:
- Rigid: Properties that are essential to all instances, e.g. being a person;
- Semi-rigid: Properties that are essential to some instances, but not to others, e.g. being hard;
- Anti-rigid: Properties that are not essential at all, e.g. being a student.
All properties in an ontology must be labelled with their rigidity. This makes it possible to verify the consistency of taxonomic links between entities, as anti-rigid properties cannot subsume (be a superclass of) rigid properties.
- The class Student cannot subsume the class Person, because students may cease being a student, while persons must always be persons. If Student did subsume Person, persons would only be persons as long as they’re students, which is obviously wrong.
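The rigidity rule is mechanical enough to check in code. Below is a minimal sketch of my own, not something taken from the article: the `Rigidity` enum, the `violates_rigidity` helper and the labels assigned to Person and Student are all assumptions made up for illustration.

```python
from enum import Enum, auto


class Rigidity(Enum):
    RIGID = auto()       # essential to all instances
    SEMI_RIGID = auto()  # essential to some instances only
    ANTI_RIGID = auto()  # essential to no instance


def violates_rigidity(parent: Rigidity, child: Rigidity) -> bool:
    """An anti-rigid property cannot subsume a rigid one."""
    return parent is Rigidity.ANTI_RIGID and child is Rigidity.RIGID


# Hypothetical labels for the two properties used in the text.
labels = {"Person": Rigidity.RIGID, "Student": Rigidity.ANTI_RIGID}

# (child, parent) pairs, read as "child is a kind of parent".
links = [("Student", "Person"), ("Person", "Student")]

violations = [
    (child, parent)
    for child, parent in links
    if violates_rigidity(labels[parent], labels[child])
]
print(violations)  # [('Person', 'Student')]
```

Only the second link is flagged: Student may sit below Person, but not the other way around.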
Identity and unity
Two other important concepts are identity and unity.
Identity is about being able to tell when two different entities in the world are actually the same (or not).
One might say that a time slot of “1:00–2:00 next Tuesday” is a time duration of “one hour”, at a specific moment in time. Does that mean that Time slot is a kind of (or a subclass of) Time duration?
The following analysis shows that it isn’t:
- Two time durations are the same if they have the same length: there’s no difference between any two “one-hour” durations, so all instances of “one-hour” durations are actually the same.
- Two time slots occurring at the same time are the same (there’s only one “1:00–2:00 next Tuesday”). But two time slots occurring on different days are not – even if they have the same time duration.
Modelling Time slots as a subclass of Time duration would lead to inconsistencies! Instead, we should say that a Time slot has a Time duration.
Identity criteria are inherited over subsumption relations. Any subclass must therefore have the same identity criteria as its ancestors.
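If you think in code, identity criteria look a lot like equality definitions. The sketch below is my own illustration of the time slot example, assuming two hypothetical dataclasses, `TimeDuration` and `TimeSlot`, and arbitrary dates; the article itself contains no code.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class TimeDuration:
    length: timedelta  # identity criterion: same length, same duration


@dataclass(frozen=True)
class TimeSlot:
    start: datetime         # identity criterion: when the slot occurs
    duration: TimeDuration  # a slot *has* a duration, it is not one


one_hour = TimeDuration(timedelta(hours=1))
tuesday = TimeSlot(datetime(2024, 1, 2, 13, 0), one_hour)
wednesday = TimeSlot(datetime(2024, 1, 3, 13, 0), one_hour)

# Any two one-hour durations are the same duration.
print(one_hour == TimeDuration(timedelta(minutes=60)))  # True

# Slots on different days differ, even though their durations match.
print(tuesday == wednesday)                              # False
print(tuesday.duration == wednesday.duration)            # True
```

Had `TimeSlot` inherited from `TimeDuration`, it would also have inherited "same length means same thing" as its equality rule, and the two slots would have had to count as identical.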
Unity is about being able to tell whether something is a single “thing”.
- Water cannot be recognised as an isolated entity and is therefore not a whole.
- Oceans on the other hand do represent whole objects, as you can easily name instances of oceans, e.g. “the Atlantic Ocean”.
Wholes should never be subclasses of non-wholes.
In the case of oceans and water, we cannot say that oceans are a kind of water; that would imply that the parent class (water) is not a whole, but the child class is. This is clearly a contradiction. Instead, we should say that oceans are composed of water.
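The unity rule can be checked in much the same way as rigidity. This is again a rough sketch under my own assumptions: the `Concept` class, its `is_whole` flag and the `composed_of` mapping are invented names, not part of OntoClean itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Concept:
    name: str
    is_whole: bool  # hypothetical unity label: are instances single "things"?


def violates_unity(child: Concept, parent: Concept) -> bool:
    """A whole should never be a subclass of a non-whole."""
    return child.is_whole and not parent.is_whole


water = Concept("Water", is_whole=False)
ocean = Concept("Ocean", is_whole=True)

# "Ocean is a kind of Water" breaks the rule ...
print(violates_unity(child=ocean, parent=water))  # True

# ... so the link is recorded as constitution rather than subsumption.
composed_of = {ocean.name: water.name}
print(composed_of)  # {'Ocean': 'Water'}
```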
Discovering misuse of subsumption
Ontological analysis can be used to identify a backbone taxonomy that consists of all rigid properties in the ontology. Every entity within a domain should instantiate at least one of the properties in the backbone.
Ontological analysis is also a useful way to explain mistakes in the modelling of subsumption relations. The authors list several examples of common mistakes:
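As a small sketch of what that could look like in practice: the property labels and the two example entities below are hypothetical, and plain strings stand in for whatever labelling scheme a real ontology tool would use.

```python
# Hypothetical rigidity labels for a handful of properties.
rigidity = {
    "Person": "rigid",
    "Engine": "rigid",
    "Student": "anti-rigid",
    "Car part": "anti-rigid",
}

# The backbone taxonomy keeps only the rigid properties.
backbone = {name for name, label in rigidity.items() if label == "rigid"}

# Hypothetical entities and the properties they instantiate.
instances = {
    "alice": {"Person", "Student"},
    "spare-42": {"Car part"},
}

# Entities that instantiate no backbone property point at a modelling gap.
uncovered = sorted(
    name for name, props in instances.items() if not props & backbone
)
print(sorted(backbone))  # ['Engine', 'Person']
print(uncovered)         # ['spare-42']
```

Here `spare-42` is only known as a car part, which is anti-rigid, so the model still has to say what kind of thing it rigidly is.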
- Instantiation: modelling Human as a subclass of Species. The location of a particular human in the biological taxonomy does not help you identify a specific human. Human is an instance of Species!
- Part/whole: modelling Engine as a subclass of Car. Engines do not share the essential properties of cars (such as accommodating people). An Engine is part of a Car!
- Disjunction/type restriction: modelling Engine as a subclass of Car part, as a workaround. Engines are not necessarily car parts: if you take an engine from a car and put it into a boat, it’s no longer a car part. Note that Car part is anti-rigid (something can cease to be a car part), but Engine is rigid (engines cannot become not-engines).
- Polysemy: using the same word or entity to refer to things that are fundamentally different. Take for example the word “book”: it can refer to a bound volume, which has a location in time and space. But it can also refer to the abstract notion of a book, which is not identified by its location, but by its author, title, and other criteria.
- Constitution: modelling Ocean as a subclass of Water. Oceans aren’t water – they consist of water.