Only answering competency questions is not enough to evaluate your ontology

How do you know whether the ontology you developed or want to reuse is any good? It’s not a new question. It has been investigated quite a bit, and so the answer to that is not a short one. Based on a number of anecdotes, however, it seems ever more people are leaning toward a short answer along the line of “it’ll be fine if it can answer my competency questions”.  That is most certainly not the right answer. Let me illustrate  this.

Here’s a set of 5 competency questions and a bad ontology (with the OWL file), being a newly mutilated version of the African Wildlife Ontology [1] modified with a popular South African pastime: the braai, i.e., a barbecue.

  • CQ1: Which animals are served at a barbecue? (Sample answers: kudu, impala,  warthog)
  • CQ2: What are the materials used for a barbecue? (Sample answers: tongs, skewers, poolbraai)
  • CQ3: What is the energy source for a braai device? (Sample answers: gas, coal)
  • CQ4: Which vegetables taste good with a braai? (Sample answers: tomatoes, onion, butternut)
  • CQ5: What food is eaten at a braai, or: what collection of edible things are offered?

The bad ontology does have answers to the competency questions, so a ‘CQs-only’ criterion for quality would suggest that the bad ontology is a good one. 100% good, even.

Why is it a bad one nonetheless?

That’s where years of methods, techniques, and tool development enter the stage (my textbook dedicates Section 5.2 to that), there are heuristics-based tips to prevent pitfalls [2] in general and for bio-ontologies with GoodOD, and there’s also a framework for ontology quality, OQuaRE [3], that all aim to approach this issue of quality systematically. Let’s have look at some of that.

Low-hanging fruit for a quick sanity check is to run the ontology through the Ontology Pitfall Scanner OOPS! [4]. Here’s the summary result, with two opened up that show what was flagged and why:

Mixing naming conventions is not neat. Examples of those in the badBBQ ontology are using CamelCase with PoolBraai but dash in tasty-plant and spaces converted to underscores in Food_Preparation_Material, and lower-case for some classes and upper case for others (PoolBraai and plant). An example of unconnected ontology element is Site: the idea is that if it isn’t really used anywhere in the ontology, then maybe it shouldn’t be in the ontology, or you forgot to add something there and OOPS! points you to that. Pitfall P11 may be contested, but if at all possible, one really should add domain and range to the object property so as to minimise unintended models and make the ontology closer to the reality (or understanding thereof) one aims to present. For instance, surely eats should not have any of the braai equipment on the left-hand side in the domain position, because equipment does not eat—only organisms do.

At the other end of the spectrum are the philosophy and Ontology-inspired methods. The most well-known one is OntoClean [5], which is summarised in the textbook and there’s a tutorial for it in Appendix A. The, perhaps, most straightforward (and simplified) rule within that package is that anti-rigid classes cannot subsume rigid classes, or, in layperson terminology: (physical) entities cannot be subclasses of things that are roles that entities play. Person cannot be a subclass of Employee, since not all persons are always employees. For the badBBQ: Food is a role that an organism or part thereof plays in a certain context, and animals and plants are not always food—they are organisms (or part thereof) irrespective of the roles they may play (or, worded differently: of the roles that they are the ‘bearer of’). 

Then there are the methods and tools in-between these two extremes. Take, for instance, Advocatus Diaboli / PEW (Possible World Explorer) [6], which helps you find places where disjointness axioms ought to be added. This is in the same line of thinking as adding those domain and range axioms: it helps you to be more precise and find mistakes. For instance, Site and BraaiEquipment are definitely intended to be disjoint: some location cannot be a concrete physical object. Adding the disjointness axiom results in an error, however: the PoolBraai is unsatisfiable because it was declared to be both a subclass of Site and of BraaiEquipment. Pool braais do exist, as there are braais that can be placed in or next to a pool. What the issue is here, is that there are two different meanings of the same term: once that device for the barbecue and once the ‘braai area by the pool’. That is, they are two different entities, not one, and so they either have to appear as two different entities in the ontology, with different names, or the intended one chosen and one of the subsumption axioms removed.

I also put some ugly things in the description of Braai: both those two ways of the source of heating and the member. While one may say informally that a braai involves a collection of things (CQ5), ontologically, it won’t fly with ‘member’. Membership is not arbitrary. There are foundational (or top-level) ontologies whose developers already did the heavy-lifting of ontological analysis of key elements and membership is one of them (see, among others, [7-9]). Such relations can simply be reused in one’s own ontology (e.g., imported from here), with their widely-agreed upon meaning; there’s even a tool to assist you with that [10]. If what you want is something else than that, then that relation is not membership but indeed something else. In this case, there are two options to fix it: 1) a braai as an event (rather than the device) will have objects (such as food, the tongs) participating in the event, or 2) for the braai as a device, it has accessories (related with has Accessory, if you will), such as the tongs, and it is used for preparing (/barbecuing/cooking/frying) food (/meals/dinners).

Then the source of heating. The one-off construct (with the {…}) is relatively popular in conceptual data modelling when you know the set of values is ever only allowed to be that, like the days of the week. But in our open world of ontologies, more just might be added or removed. And, ontologically, coal, gas, and electricity are not individuals, so also that is incorrect. The other option, with heatedBy xsd:String, has its own set of problems, largely because data properties with their data types entail application implementation decisions that ought not to be in an ontology that is supposed to be usable across multiple applications (see Section 6.1 ‘attributions’ for a longer explanation). It can be addressed by granting them their rightful status as classes in the OWL file and relating that to the braai.

This is not an exhaustive analysis of the badBBQ ontology, nor even close to a full list of the latest methods and techniques for good ontology development, but I hope I’ve illustrated my point about not relying on just CQs as evaluation of your ontology. Sample changes made to the badBBQ are included in the improvedBBQ OWL file. Here’s snapshot of the differences in the basic metrics (on the left). There’s room for another round of improvements, but I’ll leave that for later.

All this was not to say that competency questions are useless. They are not. They can be very useful to demarcate the scope of the ontology’s content, to keep on track with that since it’s easy to go astray from the intended scope once you begin or be subjected to scope creep, and to check whether at least the minimum content is in there somehow (and if not, why not). It’s the easy thing to check compared to the methods, techniques, and theory about good, sub-optimal, and bad ways of representing something. But such relative ease with CQs, perhaps unfortunately, does not mean it suffices to obtain a ‘good quality’ stamp of approval. Why the plethora of methods, techniques, theories, and tools aren’t used as often as they should, is a question I’d like to know the answer to, and may be a topic for another time.

References

[1] Keet, C.M. The African Wildlife Ontology tutorial ontologies. Journal of Biomedical Semantics, 2020, 11:4.

[2] Keet, C.M., Suárez-Figueroa, M.C., Poveda-Villalón, M. Pitfalls in Ontologies and TIPS to Prevent Them. Knowledge Discovery, Knowledge Engineering and Knowledge Management: IC3K 2013 Selected Papers. A. Fred et al. (Eds.). Springer CCIS vol. 454, pp. 115-131, 2015. preprint

[3] Duque-Ramos, A. et al. OQuaRE: A SQuaRE-based approach for evaluating the quality of ontologies. Journal of research and practice in information technology, 2011, 43(2): 159-176

[4] Poveda-Villalón, M., Gómez-Pérez, A., Suárez-Figueroa, M. C.. OOPS!(Ontology Pitfall Scanner!): An on-line tool for ontology evaluation. International Journal on Semantic Web and Information Systems, 2014, 10(2): 7-34.

[5] Guarino, N., Welty,C. An overview of OntoClean. In S. Staab and R. Studer (Eds.), Handbook on Ontologies, pp 201-220. Springer Verlag, 2009.

[6] Ferré, S., Rudolph, S. Advocatus diaboli exploratory enrichment of ontologies with negative constraints. In A ten Teije et al., editors, 18th International Conference on Knowledge Engineering and Knowledge Management (EKAW’12), volume 7603 of LNAI, pages 42-56. Springer, 2012. Oct 8-12, Galway, Ireland.

[7] Keet, C.M. and Artale, A. Representing and Reasoning over a Taxonomy of Part-Whole Relations. Applied Ontology, 2008, 3(1-2): 91-110.

[8] Masolo, C., Borgo, S., Gangemi, A., Guarino, N., Oltramari, A. WonderWeb Deliverable D18–Ontology library. WonderWeb. 2003.

[9] Smith, B., et al. Relations in biomedical ontologies. Genome biology, 2005, 6.5: 1-15.

[10] Keet, C.M., Fernández-Reyes, F.C., Morales-González, A. Representing mereotopological relations in OWL ontologies with OntoPartS. 9th Extended Semantic Web Conference (ESWC’12), Simperl et al. (eds.), 27-31 May 2012, Heraklion, Crete, Greece. Springer, LNCS 7295, 240-254.