Quantitative results on pitfalls in ontologies

The number of ontologies in the world is already large, and both their number and their size are increasing rapidly. Such ontologies are only to some extent developed by ontologists/knowledge engineers and increasingly also by ‘novice modellers’ and domain experts. Does one do better than the other? When is an ontology ‘better’? What is the prevalence of pitfalls (potential errors or problems) in those ontologies? Which aspects are pitfalls?

I could go on with such general questions one would want to see answered, for the answers can help in designing better guidelines, methods, and methodologies, and thereby improve both the teaching of ontology engineering and our understanding of what the ‘quality’ of an ontology really entails. To be sure, work has been done in this direction, such as [1,2] for general modelling issues and authoring suggestions, methods [3,4], and structured cataloguing of “antipatterns” [5] and “pitfalls” [6,7]. The pitfall catalogue includes 29 types of pitfalls (and growing), of which 21 are implemented in the OntOlogy Pitfall Scanner! (OOPS!). The nice thing about having such an automated pitfall detection tool for OWL ontologies is that it offers the opportunity to obtain quantitative results on the assumed quality of many ontologies (or, at least, on the presence of pitfalls that can be detected automatically), which has been noted to be lacking [8].

As you may have guessed already, we did exactly that (the ‘we’ being Mari Carmen Suárez Figueroa, María Poveda-Villalón, and I) with 406 OWL ontologies, and we report the results in a recently accepted paper [9] at the 5th International Conference on Knowledge Engineering and Ontology Development (KEOD’13). We did not seek the answer to everything, but narrowed it down to the following questions and hypotheses (copied from the paper):

  A. What is the prevalence of each of those pitfalls in existing ontologies?
  B. To what extent do the pitfalls say something about quality of an ontology?

Question B is refined into two sub-questions:

  1. Which anomalies that appear in OWL ontologies are the most common?
  2. Are the ontologies developed by experienced developers and/or well-known or mature ontologies ‘better’ in some modelling quality sense than the ontologies developed by novices? This is refined into the following hypotheses:
    i. The prevalence and average of pitfalls is significantly higher in ontologies developed by novices compared to ontologies deemed established/mature.
    ii. The kind of pitfalls observed in novices’ ontologies differs significantly from those in well-known or mature ontologies.
    iii. The statistics on observed pitfalls of a random set of ontologies is closer to those of novices’ ontologies than the well-known or mature ones.
    iv. There exists a positive correlation between the detected pitfalls and the size or number of particular elements of the ontology.
    v. There exists a positive correlation between the detected pitfalls and the DL fragment of the OWL ontology.
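To give an idea of what checking a hypothesis like (iv) could look like computationally, here is a minimal sketch that computes a Pearson correlation coefficient between ontology size and number of detected pitfalls. It is not the analysis from the paper (see the paper for the actual statistics used); the function name `pearson` and the numbers in `class_counts` and `pitfall_counts` are made up purely for illustration.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of co-deviations and of squared deviations from the means
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


# Hypothetical toy values, NOT data from the paper:
# number of classes per ontology and number of pitfalls detected in it.
class_counts = [10, 50, 120, 300, 800]
pitfall_counts = [2, 5, 9, 20, 41]

r = pearson(class_counts, pitfall_counts)
print(f"r = {r:.3f}")
```

A coefficient close to 1 would suggest that larger ontologies tend to have more detected pitfalls; whether such a correlation is statistically significant would need a proper test on the real data.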

The set of 406 ontologies we used in trying to answer these questions consists of three subsets: 362 ontologies that had already been scanned by OOPS! in the year up to October 2012 (the tool was available online, so this can be considered a set of ‘random’ ontologies), 23 ontologies made by novices (students enrolled in an ontology engineering course), and 21 well-known ontologies that we assumed to be relatively mature (developed by ontologists, used in applications, etc.), such as DOLCE, GFO, and GoodRelations. The ‘novices’ and ‘mature’ ontologies were scanned by OOPS! as well and also evaluated manually.

To make a long story short, I’ll go straight to the outcome (the materials & methods, data, and statistical and qualitative analysis can be found in the paper and its supplementary material [9]). First, all 21 types of pitfalls that OOPS! scans for have been detected in the full set of 406 ontologies. The most common ones are the absence of annotations, object properties declared without their domain and range classes, and some issues with inverses. To a lesser extent, there are also issues with unconnected ontology elements and with definitions that are recursive. Second, with respect to the five hypotheses: the results falsify hypotheses (i), (ii), and (v), partially validate (iv), and validate (iii); regarding (iv), for novices, the number of pitfalls per ontology does relate to the size and complexity of the ontology. Or, to put it bluntly: there are no striking differences between the sets of ‘novices’, ‘random’, and ‘mature’ ontologies, so the results provide a general landscape of pitfalls in ontologies.
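For readers who want to compute this kind of prevalence figure on their own scan results, the following is a minimal sketch. It assumes the results are available as a mapping from ontology name to the set of pitfall types detected in it; that data layout, the function name `pitfall_prevalence`, the pitfall labels, and the toy values are all assumptions for illustration and are neither OOPS! output nor data from the paper.

```python
from collections import Counter


def pitfall_prevalence(scan_results):
    """Return {pitfall: fraction of ontologies in which it was detected}."""
    if not scan_results:
        return {}
    counts = Counter()
    for pitfalls in scan_results.values():
        # Count each pitfall type at most once per ontology
        counts.update(set(pitfalls))
    total = len(scan_results)
    return {pitfall: n / total for pitfall, n in counts.items()}


# Hypothetical toy input: ontology name -> pitfall types detected in it.
toy_scan = {
    "onto_a": {"missing_annotations", "missing_domain_or_range"},
    "onto_b": {"missing_annotations"},
    "onto_c": set(),
    "onto_d": {"missing_annotations", "unconnected_elements"},
}

prevalence = pitfall_prevalence(toy_scan)
# List the pitfall types from most to least prevalent
for pitfall, share in sorted(prevalence.items(), key=lambda kv: -kv[1]):
    print(f"{pitfall}: {share:.0%}")
```

Running the same function separately on the ‘novices’, ‘random’, and ‘mature’ subsets would give per-set figures that can then be compared.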

This wasn’t quite what we had expected. We analysed the ‘novices’ and ‘mature’ ontologies in detail, and found a few more candidate pitfalls as well as a few false positives (which, when taken into account, have an equalizing effect). Then, one has to ask whether the assumptions were valid. We discuss this in some detail in section 4 of the paper, and we could come up with several pros and cons. Based on the varied reviewers’ comments, I presume you’ll have your own opinion about it as well. Either way, the pitfall catalogue is being extended, and the community may need to come up with a better way of defining what ‘maturity’ and ‘ontology quality’ mean and how they relate to the presence of pitfalls in an ontology.

While it is certainly interesting and useful (imho) to have insight into the presence of pitfalls, the data and analysis also show that we need more quantitative data on the notion of ontology quality to better figure out what’s going on. That, in turn, can inform the development of guidelines and methods, so that ontology development can be pushed from an art further into the realm of solid engineering underpinned with science.


[1] Noy, N. and McGuinness, D. (2001). Ontology development 101: A guide to creating your first ontology. TR KSL-01-05, Stanford Knowledge Systems Laboratory.

[2] Rector, A. et al. (2004). OWL pizzas: Practical experience of teaching OWL-DL: Common errors & common patterns. In Proc. of EKAW’04, volume 3257 of LNCS, pages 63–81. Springer.

[3] Keet, C. M. (2012). Detecting and revising flaws in OWL object property expressions. In Proc. of EKAW’12, volume 7603 of LNAI, pages 252–266. Springer.

[4] Guarino, N. and Welty, C. (2009). An overview of OntoClean. In Staab, S. and Studer, R., editors, Handbook on Ontologies, pages 201–220. Springer, 2nd edition.

[5] Roussey, C., Corcho, O., and Vilches-Blázquez, L. (2009). A catalogue of OWL ontology antipatterns. In Proc. of K-CAP’09, pages 205–206.

[6] Poveda, M., Suárez-Figueroa, M.C., and Gómez-Pérez, A. (2010). Common pitfalls in ontology development. In Current Topics in Artificial Intelligence, CAEPIA 2009 Selected Papers, volume 5988 of LNAI, pages 91–100. Springer.

[7] Poveda-Villalón, M., Suárez-Figueroa, M. C., and Gómez-Pérez, A. (2012). Validating ontologies with OOPS! In Proc. of EKAW’12, volume 7603 of LNAI, pages 267–281. Springer.

[8] Vrandečić, D. (2009). Ontology evaluation. In Staab, S. and Studer, R., editors, Handbook on Ontologies, pages 293–313. Springer, 2nd edition.

[9] Keet, C.M., Suárez Figueroa, M.C., and Poveda-Villalón, M. (2013). The current landscape of pitfalls in ontologies. In Proc. of the 5th International Conference on Knowledge Engineering and Ontology Development (KEOD’13), 19-22 September, Vilamoura, Portugal.


