Design rationale and overview of the African Wildlife tutorial ontologies

(update 30-7-2020: more details are described in the journal article published in the Journal of Biomedical Semantics)

There are several tutorial ontologies, which typically focus on illustrating one or two aspects of ontology development, notably language features and automated reasoning. This may suffice for one’s aims, but for an ontology engineering course, one needs to be able to illustrate a myriad of development factors and devise exercises for a wider range of ontology development tasks. For instance, one may want to illustrate the use of ontology design patterns, competency questions, foundational ontologies, and science-based modelling practices, none of which is addressed easily by the popular tutorial ontologies (notably: wine and pizza), perhaps because they predate most of the advances made in ontology engineering research. Also, I have noticed that my students replicate examples from the exercises they carry out and from inspecting popular and easy-to-find ontologies. Marking the practical assignments, I got to see sandwich and ice cream and burger ontologies with toppings and value partitions, and software and mobile phone ontologies where laptop models are instances rather than classes. Not providing good and versatile examples holistically causes the propagation of sub-optimal ontology development, at least in the exercises, and it may also negatively affect the operational domain ontologies that the graduates may have to develop later on.

I’ve been exploring alternatives and variants over the past 11 years in the ontology engineering courses that I have taught yearly to about 8-40 students/year. In an attempt to systematise and possibly generalise from that, I’ve identified 22 requirements that contribute to a good tutorial ontology, which concern the suitability of the subject domain (7 factors), the ease of demonstrating logics and reasoning tasks (7), and assistance with demonstrating engineering aspects (8). The details are described in a technical report [1]. I don’t claim that it’s an exhaustive list, but it is one that may help someone to develop their own tutorial ontology on a fun or interesting topic if they so wish—after all, not everyone is interested in pizzas, wines, African wildlife, pets, shirts, a small university, or Robert Stevens’ family.

I’ve tried out a variety of extant tutorial ontologies as well as a range of versions of the African Wildlife Ontology (AWO) over the years (early experiences), eventually settling for a set of 14 versions, ranging from the example in the Primer [2] to DOLCE- and BFO-aligned versions to versions translated into several languages, and some with possible answers to some of the exercises. A graphical rendering of the main classes and relations is shown in the following figure:

The versions of the AWO are summarised in the following table, which is also included as an annotation in the OWL files.


The AWO meets a majority of the 22 requirements, is mature by now, and has been used yearly in an ontology engineering course or tutorial since 2010. Also, it links up with my ontology engineering textbook with relevant examples and exercises. The AWO provides a wide range of options concerning examples and exercises for ontology engineering, well beyond illustrating only logic features and automated reasoning. For instance, it assists in demonstrating tasks about ontology quality, such as alignment to a foundational ontology and satisfying competency questions, versioning, and multilingual ontologies. It is easier to demonstrate alignment of a class Animal to DOLCE’s (Non-Agentive) Physical Object than, say, to debate what Algorithm aligns with or to descend into political debates on the gender binary or what constitutes a family. One can use the height or the colours of the plants and animals to discuss how to model attributes as qualities or dependent entities cf. OWL’s data properties or an artificial ValuePartition. One can declare, say, the individual lion simba as an instance of Lion, rather than deal with the confusion regarding grape varieties. One can use the intuitively obvious disjointness between animals and plants, and subsequently use easy catches to sensitise modellers to the far-reaching effects of declaring domain and range axioms, by first asserting that animals eat animals and then adding that carnivorous plants eat insects. In addition, it links up easily to topics for ontology integration activities, such as with biodiversity data, wildlife trade, and tourism, to create, e.g., an OBDA system with freely available data (e.g., taken from here) or an ontology-enhanced website for an organisation that offers environmentally sustainable safaris. More examples of broad usage options are described in section 2.3 of the tech report.
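To make that domain-and-range catch concrete, here is a minimal sketch in Python with owlready2 (my illustration, not part of the AWO files themselves; the ontology IRI is made up, and the bundled HermiT reasoner requires Java to be installed):

```python
from owlready2 import *

onto = get_ontology("http://example.org/awo-demo.owl")  # hypothetical IRI

with onto:
    class Animal(Thing): pass
    class Plant(Thing): pass
    AllDisjoint([Animal, Plant])   # the intuitively obvious disjointness

    class eats(ObjectProperty):    # 'animals eat animals'
        domain = [Animal]
        range  = [Animal]

    class Insect(Animal): pass
    class CarnivorousPlant(Plant): pass
    # 'carnivorous plants eat insects': via the domain axiom, the reasoner
    # infers CarnivorousPlant to be an Animal, clashing with the disjointness
    CarnivorousPlant.is_a.append(eats.some(Insect))

sync_reasoner()  # runs HermiT
print(list(default_world.inconsistent_classes()))
# expect CarnivorousPlant to be listed as unsatisfiable
```

The unsatisfiable CarnivorousPlant is exactly the kind of easy catch that makes the far-reaching effects of domain and range axioms tangible to students.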

The AWO is freely available under a CC-BY licence through the textbook’s webpage at https://people.cs.uct.ac.za/~mkeet/OEbook/ in this folder. A more comprehensive description of the requirements, design, and content is available in a technical report [1] for the time being.


References

[1] Keet, CM. The African Wildlife Ontology tutorial ontologies: requirements, design, and content. Technical Report 1905.09519. 23 May 2019. https://arxiv.org/abs/1905.09519.

[2] Antoniou, G., van Harmelen, F. A Semantic Web Primer. MIT Press, USA. 2003.


A controlled language for competency questions

The formulation of so-called competency questions (CQs) at the start of the development of an ontology or a similar artefact is a recurring exercise in various ontology development methodologies. For instance, an African Wildlife ontology should be able to answer “Which animals are the predators of impalas?”, and a software ontology may be able to answer “What are the main parsers for compilers?”. Yet, going by the small number of publicly available CQs, it seems that many developers skip that step in the process. And it turned out that, for those who at least try, a considerable number of purported CQs actually aren’t CQs at all, are mis-formulated such that they stand little chance of working smoothly (for, say, automated formalisation), or are grammatically incorrect (a depressing 1/3 of the sentences in our test set, to be more precise). Also, there’s no software support for guiding a modeller in formulating CQs, nor for actually doing something with them, such as converting them automatically into SPARQL; hence, the CQs are disjointed from the actual artefact under development, which doesn’t help uptake.
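To illustrate that formalisation step: the impala CQ above could be converted into a SPARQL query over the ontology’s TBox. A minimal sketch with rdflib, where the file name and the awo: IRIs are my assumptions rather than the actual AWO identifiers:

```python
from rdflib import Graph

g = Graph()
g.parse("AfricanWildlifeOntology1.owl")  # hypothetical local copy of the AWO

# 'Which animals are the predators of impalas?' read as: named classes
# that are subclasses of (eats some impala)
q = """
PREFIX owl:  <http://www.w3.org/2002/07/owl#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX awo:  <http://example.org/awo#>

SELECT ?predator WHERE {
  ?predator rdfs:subClassOf [
      a owl:Restriction ;
      owl:onProperty awo:eats ;
      owl:someValuesFrom awo:impala ] .
}
"""
for row in g.query(q):
    print(row.predator)
```

Note that plain SPARQL only finds the explicitly asserted pattern; a reasoner-backed query engine would also return predators whose diet is merely entailed.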

In an attempt to narrow this gap, we have developed a controlled natural language (CNL) called CLaRO: a Competency question Language for specifying Requirements for an Ontology, model, or specification [1], for CQs for ‘TBoxes’ (type-level information and knowledge, not instances). Advantages of a CNL for CQs include that it should be easier—or at least less hard—to formalise a CQ into a query over the model and to formulate a CQ in the first place. CLaRO, more specifically, operates at the language layer, so it deals with noun and verb phrases, rather than the primitives of a representation language and the predetermined modelling style that comes with it. It is also the first one that has been evaluated on coverage, which turned out to be good, and better than that of earlier works on templates for CQs. On top of that, we also made a basic tool that offers assistive authoring for writing CQs (screencast).
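To give an idea of how such template-based support can work: a CQ can be checked against a set of templates by treating each template’s slots as placeholders to be filled with noun or verb phrases. A simplified sketch in Python, with made-up placeholder names and templates (the actual CLaRO templates and their notation are in the csv and XML files in the repo):

```python
import re

# Hypothetical, simplified templates in the spirit of CLaRO's;
# slots in [brackets] stand for entity/predicate phrases.
TEMPLATES = [
    "Which [EC1] [PE1] [EC2]?",
    "What is [EC1]?",
    "How many [EC1] are there?",
]

def template_to_regex(template):
    # Escape the literal text, then turn each [slot] into a named capture group.
    pattern = re.sub(r"\\\[(\w+)\\\]", r"(?P<\1>.+?)", re.escape(template))
    return re.compile("^" + pattern + "$", re.IGNORECASE)

def match_cq(cq):
    """Return the first template a CQ conforms to, plus its slot fillers."""
    for t in TEMPLATES:
        m = template_to_regex(t).match(cq)
        if m:
            return t, m.groupdict()
    return None, {}  # free-form CQ: no template matched

print(match_cq("Which animals eat impalas?"))
# ('Which [EC1] [PE1] [EC2]?', {'EC1': 'animals', 'PE1': 'eat', 'EC2': 'impalas'})
```

An authoring tool can run such matching incrementally while the user types, which is essentially what autocomplete-style template suggestion amounts to.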

We got there by availing of a recently published dataset of 234 CQs that had been analysed linguistically into patterns. We analysed those patterns, and that outcome informed the design of CLaRO. Given the size of the dataset, this first version of CLaRO is template-based, with core CQs and several variants, totalling 134 templates. CLaRO was evaluated with a random sample from the original 234 CQs, a newly created set of CQs scrambled together from related work, and half of the Pizza CQs, and it was also evaluated against templates presented elsewhere [2,3]. The results are summarised in the paper and discussed in more detail in a related, longer technical report [4]. Here’s the nice table with the aggregate data:

Aggregated results for coverage of the three test sets. The best values are highlighted in italics. (CLaRO results are for the complete set of 134 templates; source: based on [1])

Given the encouraging results, we also created a proof-of-concept CQ authoring tool, which can both assist in the authoring of CQs and contribute to getting a better idea of the requirements for such a tool. One can use autocomplete so that it proposes a template and then fill in the selected template, or just ignore the suggestions and write a free-form CQ, hit enter, and save it to a text file; the file can also be opened and CQs can be deleted. Here are a few screenshots on selecting and adding a CQ in the tool:

We will be presenting CLaRO at the 13th International Conference on Metadata and Semantics Research (MTSR’19) in Rome at the end of October. If you have any questions, comments, or suggestions, please don’t hesitate to contact us. The CNL specification in csv and XML formats, the evaluation data, and the tool with the source code are available from the CLaRO Github repo.


References

[1] Keet, C.M., Mahlaza, Z., Antia, M.-J. CLaRO: a Controlled Language for Authoring Competency Questions. 13th Metadata and Semantics Research Conference (MTSR’19). 28-31 Oct 2019, Rome, Italy. Springer CCIS. (in print)

[2] Ren, Y., Parvizi, A., Mellish, C., Pan, J.Z., van Deemter, K., Stevens, R.: Towards competency question-driven ontology authoring. In: Extended Semantic Web Conference (ESWC’14). LNCS, Springer (2014)

[3] Bezerra, C., Freitas, F., Santana, F.: Evaluating ontologies with competency questions. In: Proceedings of the 2013 IEEE/WIC/ACM International Joint Conferences on Web Intelligence (WI) and Intelligent Agent Technologies (IAT) – Volume 03. pp. 284-285. IEEE Computer Society, Washington, DC, USA (2013)

[4] Keet, C.M., Mahlaza, Z., Antia, M.-J. CLaRO: A data-driven CNL for specifying competency questions. University of Cape Town. Technical Report. 17 July 2019. https://arxiv.org/abs/1907.07378