KCAP13 poster on aligning and mapping foundational ontologies

I announced in an earlier post the realisation of the Repository of Ontologies for MULtiple USes (ROMULUS) foundational ontology library, developed as part of Zubeida’s MSc thesis, as well as that a very brief overview describing it was accepted as a poster/demo paper [1] at the 7th International Conference on Knowledge Capture (KCAP’13), which will take place next week in Banff, Canada. The ‘sneak preview’ of the poster in jpeg format is included below. To stay in style, it has roughly the same colour scheme as the ontology library.

[Poster preview: KCAP’13 ROMULUS poster]

The poster’s content is slightly updated compared to that of the 2-page poster/demo paper: it has more detail on the results obtained with the automated alignments. One reason for that is the limited space of the KCAP paper; another is that a more comprehensive evaluation has been carried out in the meantime. We report on those results in a paper [2] recently accepted at the 5th International Conference on Knowledge Engineering and Ontology Development (KEOD’13). The results of the tools aren’t great when compared to the ‘gold standard’ of manual alignments and mappings, but there are some interesting differences due to, and thanks to, the differences in the algorithms that the tools use. Mere string matching generates false positives and misses ‘semantic [near-]synonyms’ (e.g., it aligns site with situoid, but misses perdurant/occurrent), and a heavy reliance on structural similarity causes a tool to miss alignments (compare, e.g., the first subclasses in GFO vs. those in DOLCE). One feature that surely helps to weed out false positives is a cross-check on whether an alignment would be logically consistent, as LogMap does. That is also what Zubeida did with the complete set of alignments between DOLCE, BFO, and GFO, aided by HermiT and Protégé’s explanation feature.
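To make that consistency cross-check concrete, here is a minimal sketch in Python with owlready2, which calls HermiT under the hood. It is illustrative of the idea rather than the actual procedure used for ROMULUS, and the file names and class IRI fragments are hypothetical placeholders: load the two ontologies, assert a candidate equivalence, run the reasoner, and inspect which classes, if any, have become unsatisfiable.

```python
from owlready2 import get_ontology, sync_reasoner, default_world

# Hypothetical local copies of the two foundational ontologies to be aligned.
onto_a = get_ontology("file://dolce.owl").load()
onto_b = get_ontology("file://gfo.owl").load()

# A candidate equivalence produced by, e.g., a string- or structure-based matcher;
# the IRI fragments below are placeholders.
cls_a = onto_a.search_one(iri="*temporal-region")
cls_b = onto_b.search_one(iri="*temporal_region")

# Assert the candidate alignment and let HermiT check the consequences.
with onto_a:
    cls_a.equivalent_to.append(cls_b)

sync_reasoner()  # runs HermiT over everything loaded in the default world
unsatisfiable = list(default_world.inconsistent_classes())
if unsatisfiable:
    print("Alignment rejected; unsatisfiable classes:", unsatisfiable)
else:
    print("No unsatisfiable classes; the alignment can become a mapping.")
```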

The KEOD paper describes those ‘trials and tribulations’, or, put differently: there are many equivalence alignments that do not become mappings due to a logical inconsistency. These have been analysed for their root cause (mainly: disjointness axioms between higher-level classes), and, where possible, solutions are proposed, such as using subsumption instead of equivalence or making the classes siblings. Two such examples of alignments that do not map are shown graphically in the poster: a faltering temporal region, which apparently means something different in each of the ontologies, and necessary-for, which does not map to generic-dependent due to conflicting domain/range axioms. The full list of alignments, mappings, and logical inconsistencies is now not only browsable on ROMULUS, as announced in the KCAP demo paper, but also searchable.
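That disjointness pattern can be reconstructed in a few lines. The toy example below (all class names are made up, and it deliberately simplifies the actual DOLCE/GFO axioms) shows how an equivalence between classes that sit under disjoint top-level classes renders both of them unsatisfiable:

```python
from owlready2 import get_ontology, Thing, AllDisjoint, sync_reasoner, default_world

onto = get_ontology("http://example.org/toy-merge.owl")

with onto:
    class TopA(Thing): pass          # stands in for a top-level class of ontology A
    class TopB(Thing): pass          # stands in for a top-level class of ontology B
    AllDisjoint([TopA, TopB])        # the higher-level disjointness axiom

    class TemporalRegionA(TopA): pass   # 'temporal region' as modelled in A
    class TemporalRegionB(TopB): pass   # 'temporal region' as modelled in B

    # the candidate equivalence alignment between the two 'temporal region' classes
    TemporalRegionA.equivalent_to.append(TemporalRegionB)

sync_reasoner()
print(list(default_world.inconsistent_classes()))
# Both TemporalRegion classes come out unsatisfiable because of the disjointness
# higher up; the repairs discussed in the paper then weaken the alignment, e.g.,
# to a subsumption or to sibling classes, depending on the case at hand.
```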

Having said that, it is probably worthwhile repeating the same caution made in the paper and previous blog post: what should be done with the inconsistencies is a separate issue, but at least now it is known in detail where the matching problems really are, so that we can go to the next level. And some mappings are possible, so some foundational ontology interchangeability is possible (at least from a practical engineering viewpoint).

References

[1] Khan, Z.C., Keet, C.M. Toward semantic interoperability with aligned foundational ontologies in ROMULUS. Seventh International Conference on Knowledge Capture (K-CAP’13), ACM proceedings. 23-26 June 2013, Banff, Canada. (poster & demo)

[2] Khan, Z.C., Keet, C.M. Addressing issues in foundational ontology mediation. Fifth International Conference on Knowledge Engineering and Ontology Development (KEOD’13). 19-22 September, Vilamoura, Portugal.

Quantitative results on pitfalls in ontologies

The number of ontologies in the world is becoming large, and it keeps increasing rapidly, both in the number of ontologies and in their size. Such ontologies are only to some extent developed by ontologists/knowledge engineers, and increasingly also by ‘novice modellers’ and domain experts. Does one do better than the other? When is an ontology ‘better’? What is the prevalence of pitfalls (potential errors or problems) in those ontologies? Which aspects of an ontology are prone to pitfalls?

I can go on with such general questions one would want to see answered, for the answers can help in designing better guidelines, methods, and methodologies, and therewith improve both the teaching of ontology engineering and the cracking of the nut of what the ‘quality’ of an ontology really entails. To be sure, work has been done in this direction, such as [1,2] for general modelling issues and authoring suggestions, methods [3,4], and the structured cataloguing of “antipatterns” [5] and “pitfalls” [6,7]. The pitfall catalogue includes 29 types of pitfalls (and growing), of which 21 are implemented in the OntOlogy Pitfall Scanner! (OOPS!). The nice thing about having such an automated pitfall detection tool for OWL ontologies is that it offers the opportunity to obtain quantitative results on the assumed quality of many ontologies (or, at least, on the presence of pitfalls that can be detected automatically), which has been noted to be lacking [8].

As you may have guessed already, we did exactly that (the ‘we’ being Mari Carmen Suárez Figueroa, María Poveda-Villalón, and myself) with 406 OWL ontologies, and we report the results in a recently accepted paper [9] at the 5th International Conference on Knowledge Engineering and Ontology Development (KEOD’13). We did not seek the answer to everything, but narrowed it down to the following questions and hypotheses (copied from the paper):

  A. What is the prevalence of each of those pitfalls in existing ontologies?
  B. To what extent do the pitfalls say something about the quality of an ontology?

Question B is refined into two sub-questions:

  1. Which anomalies that appear in OWL ontologies are the most common?
  2. Are the ontologies developed by experienced developers and/or well-known or mature ontologies ‘better’ in some modelling quality sense than the ontologies developed by novices? This is refined into the following hypotheses:
    i. The prevalence and average of pitfalls is significantly higher in ontologies developed by novices compared to ontologies deemed established/mature.
    ii. The kind of pitfalls observed in novices’ ontologies differs significantly from those in well-known or mature ontologies.
    iii. The statistics on observed pitfalls of a random set of ontologies is closer to those of novices’ ontologies than to those of the well-known or mature ones.
    iv. There exists a positive correlation between the detected pitfalls and the size or number of particular elements of the ontology (see the correlation sketch after this list).
    v. There exists a positive correlation between the detected pitfalls and the DL fragment of the OWL ontology.
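As an aside on how a hypothesis like (iv) can be checked once the numbers are in: a minimal sketch with a rank correlation in Python is shown below. The actual statistical setup in the paper may differ, and the figures here are invented purely for illustration.

```python
from scipy.stats import spearmanr

# Hypothetical per-ontology figures, purely for illustration: the number of
# pitfalls reported and the size of the ontology (e.g., number of classes).
pitfall_counts = [12, 3, 45, 7, 21, 9]
ontology_sizes = [150, 40, 600, 90, 300, 75]

rho, p_value = spearmanr(pitfall_counts, ontology_sizes)
print(f"Spearman rho = {rho:.2f}, p = {p_value:.4f}")
# A significantly positive rho would count as evidence for hypothesis (iv).
```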

The set of 406 ontologies we used in trying to answer these questions consists of three subsets: 362 ontologies that had already been scanned by OOPS! in the year until Oct 2012 (the tool was available online, so this can be considered a set of ‘random’ ontologies), 23 ontologies made by novices (students enrolled in an ontology engineering course), and 21 well-known ontologies that we assumed to be relatively mature (developed by ontologists, used in applications, etc.), such as DOLCE, GFO, and GoodRelations. The ‘novices’ and ‘mature’ ontologies were scanned by OOPS! as well and also evaluated manually.

To make a long story short, I’ll go straight to the outcome (the materials & methods, data, and statistical and qualitative analyses can be found in the paper and its supplementary material [9]). First, all 21 types of pitfalls that OOPS! scans for have been detected in the full set of 406 ontologies. The most common ones detected in the ontologies are the absence of annotations, declaring object properties but not their domain and range classes, and some issues with inverses. To a lesser extent, there are also issues with unconnected ontology elements and with declaring a definition that is recursive. Second, with respect to the five hypotheses: the results falsify hypotheses (i), (ii), and (v), validate (iii), and partially validate (iv); regarding (iv), for the novices’ ontologies the number of pitfalls per ontology does relate to the size and complexity of the ontology. Or, to put it bluntly: there are no striking differences between the sets of ‘novices’, ‘random’, and ‘mature’ ontologies, therewith providing a general landscape of pitfalls in ontologies.
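To give a concrete idea of what the two most frequent pitfalls amount to, here is a hedged illustration in Python with owlready2; this is not OOPS! itself, and the file name is a placeholder. It lists the object properties that lack a declared domain and/or range, and the classes and properties that have no label or comment annotation.

```python
from owlready2 import get_ontology

onto = get_ontology("file://some_ontology.owl").load()  # placeholder file name

# Object properties declared without a domain and/or a range.
no_domain_or_range = [p for p in onto.object_properties()
                      if not p.domain or not p.range]

# Classes and object properties without any label or comment annotation.
no_annotations = [e for e in list(onto.classes()) + list(onto.object_properties())
                  if not e.label and not e.comment]

print(len(no_domain_or_range), "object properties lack a domain and/or range")
print(len(no_annotations), "classes/properties have no rdfs:label or rdfs:comment")
```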

This wasn’t quite what we had expected. We analysed the ‘novices’ and ‘mature’ ontologies in detail, and could find a few more candidate pitfalls as well as a few false positives (which, when taken into account, have an equalizing effect). Then, one has to ask whether the assumptions were valid. This we discuss in some detail in section 4 of the paper, where we come up with several pros and cons. Based on the varied reviewers’ comments, I presume you’ll have your own opinion about it as well. Either way, the pitfall catalogue is being extended, and the community may need to come up with a better way of defining what ‘maturity’ and ‘ontology quality’ mean and how that relates to the presence of pitfalls in an ontology.

While it is certainly interesting and useful (imho) to have insight into the presence of pitfalls, the data and analysis also show that we need more quantitative data on the notion of ontology quality to figure out better what is going on, so that it can inform the development of guidelines and methods for ontology development, pushing it from an art further into the realm of solid engineering underpinned by science.

References

[1] Noy, N. and McGuinness, D. (2001). Ontology development 101: A guide to creating your first ontology. TR KSL-01-05, Stanford Knowledge Systems Laboratory.

[2] Rector, A. et al. (2004). OWL pizzas: Practical experience of teaching OWL-DL: Common errors & common patterns. In Proc. of EKAW’04, volume 3257 of LNCS, pages 63–81. Springer.

[3] Keet, C. M. (2012). Detecting and revising flaws in OWL object property expressions. In Proc. of EKAW’12, volume 7603 of LNAI, pages 252–266. Springer.

[4] Guarino, N. and Welty, C. (2009). An overview of OntoClean. In Staab, S. and Studer, R., editors, Handbook on Ontologies, pages 201–220. Springer, 2nd edition.

[5] Roussey, C., Corcho, O., and Vilches-Blázquez, L. (2009). A catalogue of OWL ontology antipatterns. In Proc. of K-CAP’09, pages 205–206.

[6] Poveda, M., Suárez-Figueroa, M.C., and Gómez-Pérez, A. (2010). Common pitfalls in ontology development. In Current Topics in Artificial Intelligence, CAEPIA 2009 Selected Papers, volume 5988 of LNAI, pages 91–100. Springer.

[7] Poveda-Villalón, M., Suárez-Figueroa, M. C., and Gómez-Pérez, A. (2012). Validating ontologies with OOPS! In Proc. of EKAW’12, volume 7603 of LNAI, pages 267–281. Springer.

[8] Vrandecic, D. (2009). Ontology evaluation. In Staab, S. and Studer, R., editors, Handbook on Ontologies, pages 293–313. Springer, 2nd edition.

[9] Keet, C.M., Suárez Figueroa, M.C., and Poveda-Villalón, M. (2013) The current landscape of pitfalls in ontologies. International Conference on Knowledge Engineering and Ontology Development (KEOD’13). 19-22 September, Vilamoura, Portugal.