A set of competency questions and SPARQL-OWL queries, with analysis

As a good beginning of the new year, our Data in Brief article Dataset of Ontology Competency Questions to SPARQL-OWL Queries Translations [1] was accepted and came online this week, which accompanies our Journal of Web Semantics article Analysis of Ontology Competency Questions and their Formalisations in SPARQL-OWL [2] that was published in December 2019—with ‘our’ referring to my collaborators in Poznan, Dawid Wisniewski, Jedrzej Potoniec, and Agnieszka Lawrynowicz, and myself. The former article provides extensive detail of a dataset we created that was subsequently used for analysis that provided new insights that is described in the latter article.

The dataset

In short, we tried to find existing good TBox-level competency questions (CQs) for available ontologies and manually formulate (i.e., formalise the CQ in) SPARQL-OWL queries for each of the CQs over said ontologies. We ended up with 234 CQs for 5 ontologies, with 131 accompanying SPARQL-OWL queries. This constitutes the first gold standard pipeline for verifying an ontology’s requirements and it presents the systematic analyses of what is translatable from the CQs and what not, and when not, why not. This may assist in further research and tool development on CQs, automating CQ verification, assessing the main query language constructs and therewith language optimisation, among others. The dataset itself is indeed independently reusable for other experiments, and has been reused already [3].

The key insights

The first analysis we conducted on it, reported in [2], revealed several insights. First, a larger set of CQs (cf. earlier work) indeed did increase the number of CQ patterns. There are recurring patterns in the shape of the CQs, when analysed linguistically; a popular one is What EC1 PC1 EC2? obtained from CQs like “What data are collected for the trail making test?” (a Dem@care CQ). Observe that, yes, indeed, we did decouple the language layer from the formalisation layer rather than mixing the two; hence, the ECs (resp. PCs) are not necessarily classes (resp. object properties) in an ontology. The SPARQL-OWL queries were also analysed at to what is really used of that query language, and used most often (see table 7 of the paper).

Second, these characteristics are not the same across CQ sets by different authors of different ontologies in different subject domains, although some patterns do recur and are thus somehow ‘popular’ regardless. Third, the relation CQ (pattern or not) : SPARQL-OWL query (or its signature) is m:n, not 1:1. That is, a CQ may have multiple SPARQL-OWL queries or signatures, and a SPARQL-OWL query or signature may be put into a natural language question (CQ) in different ways. The latter sucks for any aim of automated verification, but unfortunately, there doesn’t seem to be an easy way around that: 1) there are different ways to say the same thing, and 2) the same knowledge can be represented in different ways and therewith leading to a different shape of the query. Some possible ways to mitigate either is being looked into, like specifying a CQ controlled natural language [3] and modelling styles [4] so that one might be able to generate an algorithm to find and link or swap or choose one of them [5,6], but all that is still in the preliminary stages.

Meanwhile, there is that freely available dataset and the in-depth rigorous analysis, so that, hopefully, a solution may be found sooner rather than later.



[1] Potoniec, J., Wisniewski, D., Lawrynowicz, A., Keet, C.M. Dataset of Ontology Competency Questions to SPARQL-OWL Queries Translations. Data in Brief, 2020, in press.

[2] Wisniewski, D., Potoniec, J., Lawrynowicz, A., Keet, C.M. Analysis of Ontology Competency Questions and their Formalisations in SPARQL-OWL. Journal of Web Semantics, 2019, 59:100534.

[3] Keet, C.M., Mahlaza, Z., Antia, M.-J. CLaRO: a Controlled Language for Authoring Competency Questions. 13th Metadata and Semantics Research Conference (MTSR’19). 28-31 Oct 2019, Rome, Italy. Springer CCIS vol 1057, 3-15.

[4] Fillottrani, P.R., Keet, C.M. Dimensions Affecting Representation Styles in Ontologies. 1st Iberoamerican conference on Knowledge Graphs and Semantic Web (KGSWC’19). Springer CCIS vol 1029, 186-200. 24-28 June 2019, Villa Clara, Cuba. Paper at Springer

[5] Fillottrani, P.R., Keet, C.M. Patterns for Heterogeneous TBox Mappings to Bridge Different Modelling Decisions. 14th Extended Semantic Web Conference (ESWC’17). Springer LNCS vol 10249, 371-386. Portoroz, Slovenia, May 28 – June 2, 2017.

[6] Khan, Z.C., Keet, C.M. Automatically changing modules in modular ontology development and management. Annual Conference of the South African Institute of Computer Scientists and Information Technologists (SAICSIT’17). ACM Proceedings, 19:1-19:10. Thaba Nchu, South Africa. September 26-28, 2017.

Computer ethics (SIPP) notes relevant to South Africa

Social issues and Professional Practice in IT & Computing (formerly known as ‘computer ethics’ in our curriculum) increased in prominence in curriculum guidelines in recent years. Also, there is an increase in popular and scientific literature on computer ethics especially since Big Data, the popularisation of Artificial Intelligence, and now the 4th Industrial Revolution. Most of the articles and books are focussed on ethical and social issues where SIPP is taught mostly, being in ‘the West’.

It is taught elsewhere as well. For instance, since the early 2000s, the Computer Science Department at the University of Cape Town has taught it as part of a Masters in IT conversion course and as a block in a first-year computer science course. While initial material and lecture notes were reused from one of those universities in ‘the West’, over time, attempts have been made to localise it to some extent at least. For instance, South Africa has its own version of EU’s GDPR (the POPI Act), there is a South African IT organisation (IITPSA) with its code of conduct, and is the textbook case that illustrates the concept of leapfrogging with its wireless network (and perhaps also with the digital divide). In addition, some ‘aspects’ look different from a country that is classified as an emerging economy than for a high-income country; e.g., as patent protection and Silicon Valley’s data collection vs. potentially stifling emerging local tech companies and digital colonialism, respectively.

Updating lecture notes takes time, and so it is typically a multi-author effort carried out every few years, as it is in this case. Differently from the previous main update, is that, in line with teaching and with the times, the lecture notes are now publicly available for free on UCT’s “Open Educational Resources” site. It is with some hesitation, as it clearly does not have the quality of a textbook and we know of certain limitations that I would have liked to be better. Yet, I hope that it may be of some use already nonetheless, be it for people in the region or from ‘outside’ looking in.

I have contributed some sections as well, partially because I think it’s an interesting theme and partially because I have to teach it. I would have liked to add more, but time was running out (i.e., it’s a balancing act with other commitments, like research, teaching, and admin). With more time, the privacy chapter would have been updated better (e.g., also touching upon privacy in the context of the common practice of mobile phone sharing), emerging concepts would have been better integrated (e.g., digital colonialism, surveillance capitalism), some of the separate exercises could have been integrated, and so on and so forth. Alas, maybe a next time. (To any of my students reading this: some of these aspects are already integrated in the slides that are used in the CSC1016S lectures, which are running ahead in content compared to the written notes, and that is examinable content as well.)

More and better TDD for ontology authoring

Test-driven development (TDD) for ontology authoring [1] has received attention previously, including its accompanying tool TDDOnto [2] that was subsequently improved upon into the (also open source) TDDonto2 tool [3]. The TDDonto2 demo paper [3] did not contain the technical details about the new-and-improved algorithms and specification for TDD testing that we claimed it had. They are published just now in the International Journal on Artificial Intelligence Tools, as the article entitled More Effective Ontology Authoring with Test-Driven Development and the TDDonto2 tool [4]. The better algorithms cover more OWL language features than the original v1 of the theory and tool and it includes a specification for TDD testing such that there is not just pass/fail/absent as test result, but specific outcomes of the TDD test that are more informative, like that the ontology will become incoherent if that axiom were to be added. Given that model, the general flow for a simple standard case of a single TDD test (though more axioms can be tested at once) is as follows:

simplified view of the extended TDD process (source: adapted from [4])

The elements in the figure that are coloured light grey are the steps covered by the specification for TDD testing, algorithms, and TDDonto2 tool that is introduced in the paper.

The paper’s title clearly also hints to another contribution: using TDDonto2 for ontology authoring is significantly more effective. It was compared against the commonly used (and test-last) Protégé interface, which showed that the participants completed a larger part of the task in less time and with fewer mistakes. It also requires fewer interactions (clicking and typing) in the interface, which we reported on in an earlier (longer) tech report [5].

screenshot of the outcome of running the four tests on the sample ontology, in TDDonto2

As usual with research, more can be done. This is especially with respect to the white boxes in the figure above, i.e., the other aspects that would contribute toward a complete TDD methodology for ontology development. One step that we have been working on, is the idea of turning competency questions into axioms for TDD, which now is doable from CQ to SPARQL-OWL query [6] (more about that later), a CNL that may contribute to the authoring [7], and trying to figure out the modelling styles more precisely [8], since they hamper automation of these first steps in the process to get those axioms into the TDD plugin in a user-friendly way.



[1] Keet, C.M., Lawrynowicz, A. Test-Driven Development of Ontologies. 13th Extended Semantic Web Conference (ESWC’16). Springer LNCS vol. 9678, 642-657. 29 May – 2 June, 2016, Crete, Greece.

[2] Lawrynowicz, A., Keet, C.M. The TDDonto Tool for Test-Driven Development of DL Knowledge bases. 29th International Workshop on Description Logics (DL’16). April 22-25, Cape Town, South Africa. CEUR WS vol. 1577.

[3] Davies, K. Keet, C.M., Lawrynowicz, A. TDDonto2: A Test-Driven Development Plugin for arbitrary TBox and ABox axioms. The Semantic Web: ESWC 2017 Satellite Events, Blomqvist, E et al. (eds.). Springer LNCS vol 10577, 120-125. Portoroz, Slovenia, May 28 – June 2, 2017.

[4] Davies, K., Keet, C.M., Lawrynowicz, A. More Effective Ontology Authoring with Test-Driven Development and the TDDonto2 tool. International Journal on Artificial Intelligence Tools, 2019, 28(7): 1950023.

[5] Keet, C.M., Davies, K., Lawrynowicz, A. More Effective Ontology Authoring with Test-Driven Development. Technical Report 1812.06015. December 2018

[6] Wisniewski, D., Potoniec, J., Lawrynowicz, A., Keet, C.M. Analysis of Ontology Competency Questions and their Formalisations in SPARQL-OWL. Journal of Web Semantics. (in print)

[7] Keet, C.M., Mahlaza, Z., Antia, M.-J. CLaRO: a Controlled Language for Authoring Competency Questions. 13th Metadata and Semantics Research Conference (MTSR’19). 28-31 Oct 2019, Rome, Italy. Springer CCIS. (in print)

[8] Fillottrani, P.R., Keet, C.M.. Dimensions Affecting Representation Styles in Ontologies. 1st Iberoamerican conference on Knowledge Graphs and Semantic Web (KGSWC’19). Springer CCIS vol. 1029, 186-200. 23-30 June 2019, Villa Clara, Cuba.

Localising Protégé with Manchester syntax into your language of choice

Some people like a quasi natural language interface in ontology development tools, which is why Manchester Syntax was proposed [1]. A downside is that it locks the ontology developer into English, so that weird chimaeras are generated in the interface if the author prefers another language for the ontology, such as, e.g., the “jirafa come only (oja or ramita)” mentioned in an earlier post and that was deemed unpleasant in an experiment a while ago [2]. Those who prefer the quasi natural language components will have to resort to localising Manchester syntax and the tool’s interface.

This is precisely what two of my former students—Adam Kaliski and Casey O’Donnell—did during their mini-project in the ontology engineering course of 2017. A localisation in Afrikaans, as the case turned out to be. To make this publicly available, Michael Harrison brushed up the code a bit and tested it worked also in the new version of Protégé. It turned out it wasn’t that easy to localise it to another language the way it was done, so one of my PhD students, Toky Raboanary, redesigned the whole thing. This was then tested with Spanish, and found to be working. The remainder of the post describes informally some main aspects of it. If you don’t want to read all that but want to play with it right away: here are the jar files, open source code, and localisation instructions for if you want to create, say, a French or Dutch variant.

Some sensible constraints, some slightly contrived ones (and some bad ones), for the purpose of showing the localisation of the interface for the various keywords. The view in English is included in the screenshot to facilitate comparison.

Some sensible constraints, some slightly contrived ones (and some bad ones), for the purpose of showing the localisation of the interface for the various keywords. The view in English is included in the screenshot to facilitate comparison.

The localisation functions as a plugin for Protégé as a ‘view’ component. It can be selected under “Windows – Views – Class views” and then Beskrywing for the Afrikaans and Descripción for Spanish, and dragged into the desired position; this is likewise for object properties.

Instead of burying the translations in the code, they are specified in a separate XML file, whose content is fetched during the rendering. Adding a new ‘simple’ (more about that later) language merely amounts to adding a new XML file with the translations of the Protégé labels and of the relevant Manchester syntax. Here are the ‘simple’ translations—i.e., where both are fixed strings—for Afrikaans for the relevant tool interface components:


Class Description



(Label in Afrikaans)

Equivalent To Dieselfde as
SubClass Of Subklas van
General Class axioms Algemene Klasaksiomas
SubClass Of (Anonymous Ancestor) Subklas van (Naamlose Voorvader)
Disjoint With Disjunkte van
Disjoint Union Of Disjunkte Unie van


The second set of translations is for the Manchester syntax, so as to render that also in the target language. The relevant mappings for Afrikaans class description keywords are listed in the table below, which contain the final choices made by the students who developed the original plugin. For instance, min and max could have been rendered as minimum and maksimum, but the ten minste and by die meeste were deemed more readable despite being multi-word strings. Another interesting bit in the translation is negation, where there has to be a second ‘no’ since Afrikaans has double negation in this construction, so that it renders it as nie <expression> nie. That final rendering is not grammatically perfect, but (hopefully) sufficiently clear:

An attempt at double negation with a fixed string

An attempt at double negation with a fixed string

Manchester OWL Keyword Afrikaans Manchester OWL

Keyword or phrase

some sommige
only slegs
min ten minste
max by die meeste
exactly precies
and en
or of
not nie <expression> nie
SubClassOf SubklasVan
EquivalentTo DieselfdeAs
DisjointWith DisjunkteVan


The people involved in the translations for the object properties view for Afrikaans are Toky, my colleague Tommie Meyer (also at UCT), and myself; snyding for ‘intersection’ sounds somewhat odd to me, but the real tough one to translate was ‘SuperProperty’. Of the four options that were considered—SuperEienskap, SuperVerwantskap, SuperRelasie, and SuperVerband SuperVerwantskap was chosen with Tommie having had the final vote, which is also a semantic translation, not a literal translation.

Screenshot of the object properties description, with comparison to the English

The Spanish version also has multi-word strings, but at least does not do double negation. On the other hand, it has accents. To generate the Spanish version, myself, my collaborator Pablo Fillottrani from the Universidad Nacional del Sur, Argentina, and Toky had a go at it in translating the terms. This was then implemented with the XML file. In case you do not want to dig into the XML file and not install the plugin either, but have a quick look at the translations, they are as follows for the class description view:


Class Description



(in Spanish)

Equivalent To Equivalente a
SubClass Of Subclase de
General Class axioms Axiomas generales de clase
SubClass Of (Anonymous Ancestor) Subclase de (Ancestro Anónimo)
Disjoint With Disjunto con
Disjoint Union Of Unión Disjunta de
Instances Instancias


Manchester OWL Keyword Spanish Manchester OWL Keyword
some al menos uno
only sólo
min al mínimo
max al máximo
and y
or o
not no
exactly exactamente
SubClassOf SubclaseDe
EquivalentTo EquivalenteA
DisjointWith DisjuntoCon

And here’s a rendering of a real ontology, for geo linked data in Spanish, rather than African wildlife yet again:

screenshot of the plugin behaviour with someone else’s ontology in Spanish

One final comment remains, which has to do with the ‘simple’ mentioned above. The approach of localisation presented here works only with fixed strings, i.e., the strings do not have to change depending on the context where it is uses. It won’t work with, say, isiZulu—a highly agglutinating and inflectional language—because isiZulu doesn’t have fixed strings for the Manchester syntax keywords nor for some other labels. For instance, ‘at least one’ has seven variants for nouns in the singular, depending on the noun class of the noun of the OWL class it quantifies over; e.g., elilodwa for ‘at least one’ apple, and esisodwa for ‘at least one’ twig. Also, the conjugation of the verb for the object property depends on the noun class of the noun of the OWL class, but in this case for the one that plays the subject; e.g., it’s “eats” in English for both humans and elephants eating, say, fruit, so one string for the name of the object property, but that’s udla and idla, respectively, in isiZulu. This requires annotations of the classes with ontolex-lemon or a similar approach and a set of rules (which we have, btw) to determine what to do in which case, which requires on-the-fly modifications to Manchester syntax keywords and elements’ names or labels. And then there’s still phonological conditioning to account for. It surely can be done, but it is not as doable as with the ‘simple’ languages that have at least a disjunctive orthography and much less genders or noun classes for the nouns.

In closing, while there’s indeed more to translate in the Protégé interface in order to fully localise it, hopefully this already helps as-is either for reading at least a whole axiom in one’s language or as stepping stone to extend it further for the other terms in the Manchester syntax and the interface. Feel free to extend our open source code.



[1] Matthew Horridge, Nicholas Drummond, John Goodwin, Alan Rector, Robert Stevens and Hai Wang (2006). The Manchester OWL syntax. OWL: Experiences and Directions (OWLED’06), Athens, Georgia, USA, 10-11 Nov 2016, CEUR-WS vol 216.

[2] Keet, C.M. The use of foundational ontologies in ontology development: an empirical assessment. 8th Extended Semantic Web Conference (ESWC’11), G. Antoniou et al (Eds.), Heraklion, Crete, Greece, 29 May-2 June, 2011. Springer LNCS 6643, 321-335.

[3] Keet, C.M., Khumalo, L. Toward a knowledge-to-text controlled natural language of isiZulu. Language Resources and Evaluation, 2017, 51:131-157. accepted version

Design rationale and overview of the African Wildlife tutorial ontologies

(update 30-7-2020: more details are described in the journal article published in the Journal of Biomedical Semantics)

There are several tutorial ontologies, which typically focus on illustrating one or two aspects of ontology development, notably language features and automated reasoning. This may suffice for one’s aims, but for an ontology engineering course, one would need to be able to illustrate a myriad of development factors and devise exercises for a wider range of tasks of ontology development. For instance, to illustrate the use of ontology design patterns, competency questions, foundational ontologies, and science-based modelling practices, neither of which is addressed easily by the popular tutorial ontologies (notably: wine and pizza), perhaps because they predate most of the advances made in ontology engineering research. Also, I have noticed that my students replicate examples from the exercises they carry out and from inspecting popular and easy-to-find ontologies. Marking the practical assignments, I got to see sandwich and ice cream and burger ontologies with toppings and value partitions, and software and mobile phone ontologies where laptop models are instances rather than classes. Not providing good and versatile examples holistically, causes the propagation of sub-optimal ontology development at least in the exercises, which then also may affect negatively the development of an operational domain ontology that the graduates may have to develop later on.

I’ve been exploring alternatives and variants over the past 11 years in the ontology engineering courses that I have taught yearly to about 8-40 students/year. In an attempt to systematise and possibly generalise from that, I’ve identified 22 requirements that contribute to a good tutorial ontology, which concern the suitability of the subject domain (7 factors), the ease of demonstrating logics and reasoning tasks (7), and assistance with demonstrating engineering aspects (8). Its details are described in a technical report [1]. I don’t claim that it’s an exhaustive list, but that it is one that may help someone to develop their own tutorial ontology in a fun or interesting topic if they so wish—after all, not everyone is interested in pizzas, wines, African wildlife, pets, shirts, a small university, or Robert Stevens’ family.

I’ve tried out a variety of extant tutorial ontologies as well as a range of versions of the African Wildlife Ontology (AWO) over the years (early experiences), eventually settling for a set of 14 versions, all the way from the example from the Primer [2] to DOLCE- and BFO-aligned to translated in several languages, and some with possible answers to some of the exercises. A graphical rendering of the main classes and relations is shown in the following figure:

The versions of the AWO are summarised in the following table, which is also mentioned as annotation in the OWL files.


The AWO meets a majority of the 22 requirements, is mature by now, and it has been used yearly in an ontology engineering course or tutorial since 2010. Also, it is links up with my ontology engineering textbook with relevant examples and exercises. The AWO provides a wide range of options concerning examples and exercises for ontology engineering well beyond illustrating only logic features and automated reasoning. For instance, it assists in demonstrating tasks about ontology quality, such as alignment to a foundational ontology and satisfying competency questions, versioning, and multilingual ontologies. For instance, it is easier to demonstrate alignment of a class Animal to DOLCE’s (Non-Agentive) Physical Object than, say, debating what Algorithm aligns with or descend into political debates on the gender binary or what constitutes a family. One can use the height or the colours of the plants and animals to discuss how to model attributes as qualities or dependent entities cf. OWL’s data properties or an artificial ValuePartition. Declare, say, de individual lion simba as an instance of Lion, rather than the confusion regarding grape varieties. Use intuitively obvious disjointness between animals and plants, and subsequently easy catches on sensitising modellers to the far-reaching effects of declaring domain and range axioms by first asserting that animals eat animals, and then adding that carnivorous plants eat insects. In addition, it links up easily to topics for ontology integration activities, such as with biodiversity data, wildlife trade, and tourism to create, e.g., an OBDA system with freely available data (e.g., taken from here) or an ontology-enhanced website for an organisation that offers environmentally sustainable safaris. More examples of broad usage options are described in section 2.3 in the tech report.

The AWO is freely available under a CC-BY licence through the textbook’s webpage at https://people.cs.uct.ac.za/~mkeet/OEbook/ in this folder. A more comprehensive description of the requirements, design, and content is described in a technical report [1] for the time being.



[1] Keet, CM. The African Wildlife Ontology tutorial ontologies: requirements, design, and content. Technical Report 1905.09519. 23 May 2019. https://arxiv.org/abs/1905.09519.

[2] Antoniou, G., van Harmelen, F. A Semantic Web Primer. MIT Press, USA. 2003.

A controlled language for competency questions

The formulation of so-called competency questions (CQs) at the start of the development of an ontology or a similar artefact is a recurring exercise in various ontology development methodologies. For instance, “Which animals are the predators of impalas?” that an African Wildlife ontology should be able to answer and “What are the main parsers for compilers?” that a software ontology may be able to answer. Yet, going by the small number of publicly available CQs, it seems like that many developers skip that step in the process. And it turned out that for those who at least try, a considerable number of purported CQs, actually aren’t at all, are mis-formulated for even having a chance for it to work smoothly (for, say, automated formalisation), or are grammatically incorrect (a depressing 1/3 of the sentences in our test set, to be more precise). Also, there’s no software support in guiding a modeller to formulate CQs, nor to actually do something with it, such as converting it automatically into SPARQL; hence, it is disjointed from the actual artefact under development, which doesn’t help uptake.

In an attempt to narrow this gap, we have developed a controlled natural language (CNL) called CLaRO: a Competency question Language for specifying Requirements for an Ontology, model, or specification [1] for CQs for ‘TBoxes’ (type-level information and knowledge, not instances). Advantages of a CNL for CQs include that it should be easier—or at least less hard—to formalise a CQ into a query over the model and to formulate a CQ in the first place. CLaRO more specifically operates at the language layer, so it deals with noun and verb phrases, rather that the primitives of a representation language and the predetermined modeling style that comes with it. It is also the first one that has been evaluated on coverage, which turned out to be good and better than earlier works on templates for CQs. To add more to it, we also made a basic tool that offers assistive authoring to write CQs (screencast).

We got there by availing of a recently published dataset of 234 CQs that had been analysed linguistically into patterns. We analysed those patterns, and that outcome informed the design of CLaRO. Given the size, this first version pf CLaRO is template-based, with core CQs and several variants, totalling to 134 templates. CLaRO was evaluated with a random sample from the original 234 CQs, a newly created set of CQs scrambled together for related work, and half of the Pizza CQs, as well as evaluated against templates presented elsewhere [2,3]. The results are summarised in the paper and discussed in more detail in a related longer technical report [4]. Here’s the nice table with the aggregate data:

Aggregated results for coverage of the three test sets. The best values are highlighted in italics. (CLaRO results are for the complete set of 134 templates) (source: based on [1])

Given the encouraging results, we also created a proof of concept CQ authoring tool, which both can assist in the authoring of CQs and may contribute to get a better idea of requirements for such a tool. One can use autocomplete so that it proposes a template, and then fill in a selected template, or just ignore it and write a free-form CQ, hit enter, and save it to a text file; the file can also be opened and CQs deleted. Here are a few screenshots on selecting and adding a CQ in the tool:

We will be presenting CLaRO at the 13th International Conference on Metadata and Semantics Research (MTSR’19) in Rome at the end of October. If you have any questions, comments, or suggestions, please don’t hesitate to contact us. The CNL specification in csv and XML formats, the evaluation data, and the tool with the source code are available from the CLaRO Github repo.



[1] Keet, C.M., Mahlaza, Z., Antia, M.-J. CLaRO: a Controlled Language for Authoring Competency Questions. 13th Metadata and Semantics Research Conference (MTSR’19). 28-31 Oct 2019, Rome, Italy. Springer CCIS. (in print)

[2] Ren, Y., Parvizi, A., Mellish, C., Pan, J.Z., van Deemter, K., Stevens, R.: Towards competency question-driven ontology authoring. In: Extended Semantic Web Conference (ESWC’14). LNCS, Springer (2014)

[3] Bezerra, C., Freitas, F., Santana, F.: Evaluating ontologies with competency questions. In: Proceedings of the 2013 IEEE/WIC/ACM International Joint Conferences on Web Intelligence (WI) and Intelligent Agent Technologies (IAT) – Volume 03. pp. 284-285. IEEE Computer Society, Washington, DC, USA (2013)

[4] Keet, C.M., Mahlaza, Z., Antia, M.-J. CLaRO: A data-driven CNL for specifying competency questions. University of Cape Town. Technical Report. 17 July 2019. https://arxiv.org/abs/1907.07378

Tutorial: OntoClean in OWL and with an OWL reasoner

The novelty surrounding all things OntoClean described here, is that we made a tutorial out of a scientific paper and used an example that is different from the (in?)famous manual example to clean up a ‘dirty’ taxonomy.

I’m assuming you have at least heard of OntoClean, which is an ontology-inspired method to examine the taxonomy of an ontology, which may be useful especially when the classes (/universals/concepts/..) have no or only a few properties or attributes declared. Based on that ontological information provided by the modeller, it will highlight violations of ontological principles in the taxonomy so that the ontologist may fix it. Its most recent overview is described in Guarino & Welty’s book chapter [1] and there are handouts and slides that show some of the intermediate steps; a 1.5-page summary is included as section 5.2.2 in my textbook [2].

Besides that paper-based description [1], there have been two attempts to get the reasoning with the meta-properties going in a way that can exploit existing technologies, which are OntOWLClean [3] and OntOWL2Clean [4]. As the names suggest, those existing and widely-used mechanisms are OWL and the DL-based reasoners for OWL, and the latter uses OWL2-specific language features (such as role chains) whereas the former does not. As it happened, some of my former students of the OE course wanted to try the OntoOWLClean approach by Welty, and, as they were with three students in the mini-project team, they also had to make their own example taxonomy, and compare the two approaches. It is their—Todii Mashoko, Siseko Neti, and Banele Matsebula’s—report and materials we—Zola Mahlaza and I—have brushed up and rearranged into a tutorial on OntoClean with OWL and a DL reasoner with accompanying OWL files for the main stages in the process.

There are the two input ontologies in OWL (the domain ontology to clean and the ‘ontoclean ontology’ that codes the rules in the TBox), an ontology for the stage after punning the taxonomy into the ABox, and one after having assigned the meta-properties, so that students can check they did the steps correctly with respect to the tutorial example and instructions. The first screenshot below shows a section of the ontology after pushing the taxonomy into the ABox and having assigned the meta-properties. The second screenshot illustrates a state after having selected, started, and run the reasoner and clicked on “explain” to obtain some justifications why the ontology is inconsistent.

section of the punned ontology where meta-properties have been assigned to each new individual.

A selection of the inconsistencies (due to violating OntoClean rules) with their respective explanations

Those explanations, like shown in the second screenshot, indicate which OntoClean rule has been violated. Among others, there’s the OntoClean rule that (1) classes that are dependent may have as subclasses only those classes that are also dependent. The ontology, however, has: i) Father is dependent, ii) Male is non-dependent, and iii) Father has as subclass Male. This subsumption violates rule (1). Indeed, not all males are fathers, so it would be, at least, the other way around (fathers are males), but it also could be remodelled in the ontology such that father is a role that a male can play.

Let us look at the second generated explanation, which is about violating another OntoClean rule: (2) sortal classes have only as subclasses classes that are also sortals. Now, the ontology has: i) Ball is a sortal, ii) Sphere is a non-sortal, and iii) Ball has as subclass Sphere. This violates rule (2). So, the hierarchy has to be updated such that Sphere is not subsumed by Ball anymore. (e.g., Ball has as shape some Sphere, though note that not all balls are spherical [notably, rugby balls are not]). More explanations of the rule violations are described in the tutorial.

Seeing that there are several possible options to change the taxonomy, there is no solution ontology. We considered creating one, but there are at least two ‘levels’ that will influence what a solution may look like: one could be based on a (minimum or not) number of changes with respect to the assigned meta-properties and another on re-examining the assigned meta-properties (and then restructuring the hierarchy). In fact, and unlike the original OntoClean example, there is at least one case where there is a meta-property assignment that would generally be considered to be wrong, even though it does show the application of the OntoClean rule correctly. How best to assign a meta-property, i.e., which one it should be, is not always easy, and the student is also encouraged to consider that aspect of the method. Some guidance on how to best modify the taxonomy—like Father is-a Male vs. Father inheres-in some Male—may be found in other sections and chapters of the textbook, among other resources.


p.s.: this tutorial is the result of one of the activities to improve on the OE open textbook, which are funded by the DOT4D project, as was the tool to render the axioms in DL in Protégé. A few more things are in the pipeline (TBC).



[1] Guarino, N. and Welty, C. A. (2009). An overview of OntoClean. In Staab, S. and Studer, R., editors, Handbook on Ontologies, International Handbooks on Information Systems, pages 201-220. Springer.

[2] Keet, C. M. (2018). An introduction to ontology engineering. College Publications, vol 20. 344p.

[3] Welty, C. A. (2006). OntOWLClean: Cleaning OWL ontologies with OWL. In Bennett, B. and Fellbaum, C., editors, Proceedings of the Fourth International Conference on Formal Ontology in Information Systems (FOIS 2006), Baltimore, Maryland, USA, November 9-11, 2006, volume 150 of Frontiers in Artificial Intelligence and Applications, pages 347-359. IOS Press.

[4] Glimm, B., Rudolph, S., Volker, J. (2010). Integrated metamodeling and diagnosis in OWL 2. In Peter F. Patel-Schneider, Yue Pan, Pascal Hitzler, Peter Mika, Lei Zhang, Je_ Z. Pan, Ian Horrocks, and Birte Glimm, editors, Proceedings of the 9th International Semantic Web Conference, LNCS vol 6496, pages 257-272. Springer.

DL notation plugin for Protégé 5.x

Once upon a time… the Protégé ontology development environment used Description Logic (DL) symbols and all was well—for some users at least. Then Manchester Syntax came along as the new kid on the block, using hearsay and opinion and some other authors’ preferences for an alternative rendering to the DL notation [1]. Subsequently, everyone who used Protégé was forced to deal with those new and untested keywords in the interface, like ‘some’ and ‘only’ and such, rather than the DL symbols. That had another unfortunate side-effect, being that it hampers internationalisation, for it jumbles up things rather awkwardly when your ontology vocabulary is not in English, like, say, “jirafa come only (oja or ramita)”. Even in the same English-as-first-language country, it turned out that under a controlled set-up, the DL axiom rendering in Protégé fared well in a fairly large sized experiment when compared to the Protégé interface with the sort of Manchester syntax with GUI [2], and also the OWL 2 RL rules rendering appear more positive in another (smaller) experiment [3]. Various HCI factors remain to be examined in more detail, though.

In the meantime, we didn’t fully reinstate the DL notation in Protégé in the way it was in Protégé v3.x from some 15 years ago, but with our new plugin, it will at least render the class expression in DL notation in the tool. This has the benefits that

  1. the modeller will receive immediate feedback during the authoring stage regarding a notation that may be more familiar to at least a knowledge engineer or expert modeller;
  2. it offers a natural language-independent rendering of the axioms with respect to the constructors, so that people may develop their ontology in their own language if they wish to do so, without being hampered by continuous code switching or the need for localisation; and
  3. it also may ease the transition from theory (logics) to implementation for ontology engineering novices.

Whether it needs to be integrated further among more components of the tabs and views in Protégé or other ODEs, is also a question for HCI experts to answer. The code for the DL plugin is open source, so you could extend it if you wish to do so.

The plugin itself is a jar file that can simply be dragged into the plugin folder of a Protégé installation (5.x); see the github repo for details. To illustrate it briefly, after dragging the jar file into the plugin folder, open Protégé, and add it as a view:

Then when you add some new axioms or load an ontology, select a class, and it will render all the axioms in DL notation, as shown in the following two screenshots form different ontologies:

For the sake of illustration, here’s the giraffe that eats only leaves or twigs, in the Spanish version of the African Wildlife Ontology:

The first version of the tool was developed by Michael Harrison and Larry Liu as part of their mini-project for the ontology engineering course in 2017, and it was brushed up for presentation beyond that just now by Michael Harrison (meanwhile an MSc student a CS@UCT), which was supported by a DOT4D grant to improve my textbook on ontology engineering and accompanying educational resources. We haven’t examined all possible ‘shapes’ that a class expression can take, but it definitely processes the commonly used features well. At the time of writing, we haven’t detected any errors.

p.s.: if you want your whole ontology exported at once in DL notation and to latex, for purposes of documentation generation, that is a different usage scenario and is already possible [4].

p.p.s.: if you want more DL notation, please let me know, and I’ll try to find more resources to make a v2 with more features.


[1] Matthew Horridge, Nicholas Drummond, John Goodwin, Alan Rector, Robert Stevens and Hai Wang (2006). The Manchester OWL syntax. OWL: Experiences and Directions (OWLED’06), Athens, Georgia, USA, 10-11 Nov 2016, CEUR-WS vol 216.

[2] E. Alharbi, J. Howse, G. Stapleton, A. Hamie and A. Touloumis. The efficacy of OWL and DL on user understanding of axioms and their entailments. The Semantic Web – ISWC 2017, C. d’Amato, M. Fernandez, V. Tamma, F. Lecue, P. Cudre-Mauroux, J. Sequeda, C. Lange and J. He (eds.). Springer 2017, pp20-36.

[3] M. K. Sarker, A. Krisnadhi, D. Carral and P. Hitzler, Rule-based OWL modeling with ROWLtab Protégé plugin. Proceedings of ESWC’17, E. Blomqvist, D. Maynard, A. Gangemi, R. Hoekstra, P. Hitzler and O. Hartig (eds.). Springer. 2017, pp 419-433.

[4] Cogan Shimizu, Pascal Hitzler, Matthew Horridge: Rendering OWL in Description Logic Syntax. ESWC (Satellite Events) 2017. Springer LNCS. pp109-113

Language annotation on the Web with MoLA

The Web consists of very many resources in many languages and has information about even more. Sure, the majority of Internet users speak English, Chinese, or Spanish, but there are sites, pages, paragraphs, and documents in other languages and about lesser known ‘languoids’ (language, dialect, variant, etc.), ranging from, say, a poem about the poor man’s dinner written in an old Brabants dialect that used to be spoken in the south of the Netherlands to the effects of mobile phones on Zimbabwean (cf. South African) isiNdebele [1]. How should that be annotated? Here’s a complex use case of languoids for old French:

(source: [2])

The extant multilingual Semantic Web models, such as W3C’s ontolex-lemon community standard have outsourced that to a ‘the language tag comes from some place’, as they focus on the word-level and/or sentence-level for the multilingual (semantic) Web. There are indeed standardisations of language tags. Notably, there are the ISO 639 codes (parts 1, 2, 3 and 5) for some 8000 languages—but there are more languoids that are not covered by the ISO list, with an estimated 8-15K or so currently spoken languoids and 150K extinct. There are also Glottolog, Ethnologue, and MultiTree, which are more comprehensive in some respect, but they are limited and problematic in other cases. For instance, Glottolog—the best among them—still uses the broader/narrower than, has artificial names for grouping languoids, has inconsistencies in modelling decisions, and is still incomplete in coverage.

My co-authors—Frances Gillis-Webber, also at UCT, and Sabine Tittel, with the Heidelberg Academy of Sciences and Humanities—and I aim to change that so as to allow for more comprehensive and more inclusive language tags and annotations on the Semantic Web.

In order to be able to do so, we developed a Model for Language Annotation (MoLA) that caters for relatively comprehensive languoid annotations and how they are related, such as allowing recording which languoid evolved from or was influenced by which other languoid, when it was spoken and where, its preferred and alternate names, what sort of lect it is (e.g., dialect, pidgin), which dialect cluster or language family it is a member of, and backward compatibility with the ISO 639 codes.

The design approach was that of labour-intensive manual modelling, including competency questions, an extensive use case, and iterative development of the model at the conceptualisation stage using the Object-Role Modeling (ORM) language. This model was then formalised in OWL (well, most of it). It was tested on the competency questions, smaller use case scenarios, and validated with the large use case. A snippet for Spanish is as follows, as the one for Old French gets quite lengthy (some details).

It enhances Glottolog’s model on several key points, including proper relations between languoids cf BT/RT, a languoid can be associated with zero or more regions, and it allows for multiple names of a languoid both concurrently and over time.

This sounds like a smooth process, but there were a few modelling hurdles that had to be overcome. One of them is level of granularity of analysis of a languoid. For instance, one could argue both that isiXhosa is a language—it’s one of the 11 official languages of South Africa—but also that it is a dialect cluster (i.e., a collection), as there are multiple dialects of isiXhosa. This is a similar case for Old French that’s a language and member of the Romance family of languages, but also can be seen as a collection of dialects (e.g., Picard and Norman), and dialects, in turn, may have varieties (e.g., Artois and Santerre for Picard). On the bright side, this now can be represented and, because it is represented explicitly, it can be queried, such as “Which languoids are dialects of French that were spoken in the middle ages in France?” and “Which languoids are a member of Nguni?”. The knowledgebase still needs to be populated, though, so it won’t work yet with all languoids.

More details can be found in the paper that was recently published [2]. It will be presented in a few weeks at the 1st Iberoamerican Conference on Knowledge Graphs and Semantic Web (KGSWC’19), in Villa Clara, Cuba, alongside 13 other papers on ontologies. The first author, Frances, is soon travelling to the ISWS summer school, so I will present it at KGSWC’19.



[1] Nkomo, D., Khumalo, L. Embracing the mobile phone technology: its social and linguistic impact with special reference to Zimbabwean Ndebele. African Identities, 10(2): 143-153.

[2] Gillis-Webber, F., Tittel, S., Keet, C.M.. A Model for Language Annotations on the Web. 1st Iberoamerican conference on Knowledge Graphs and Semantic Web (KGSWC’19). Springer CCIS. 23-30 June 2019, Villa Clara, Cuba.

About modelling styles in ontologies

As any modeller will know, there are pieces of information or knowledge that can be represented in different ways. For instance, representing ‘marriage’ as class or as a ‘married to’ relationship, adding ‘address’ as an attribute or a class in one’s model, and whether ‘employee’ will be positioned as a subclass of ‘person’ or as a role that ‘person’ plays. In some cases, there a good ontological arguments to represent it in one way or the other, in other cases, that’s less clear, and in yet other cases, efficiency is king so that the most compact way of representing it is favoured. This leads to different design decisions in ontologies, which hampers ontology reuse and alignment and affects other tasks, such as evaluating competency questions over the ontology and verbalising ontologies.

When such choices are made consistently throughout the ontology, one may consider this to be a modelling style or representation style. If one then knows which style an ontology is in, it would simplify use and reuse of the ontology. But what exactly is a representation style?

While examples are easy to come by, shedding light on that intuitive notion turned out to be harder than it looked like. My co-author Pablo Fillottrani and I tried to disentangle it nonetheless, by characterising the inherent features and the dimensions by which a style may differ. This resulted in 28 different traits for the 10 identified dimensions.  For instance, the dimension “modular vs. monolithic” has three possible options: 1) ‘Monolithic’, where the ontology is stored in one file (no imports or mergers); 2) ‘Modular, external’, where at least one ontology is imported or merged, and it kept its URI (e.g., importing DOLCE into one’s domain ontology, not re-creating it there); 3) ‘Modular, internal’, where there’s at least one ontology import that’s based on having carved up the domain in the sense of decomposition of the domain (e.g., dividing up a domain into pizzas and drinks at pizzerias).  Other dimensions include, among others, the granularity of relations (many of few), how the hierarchy looks like, and attributes/data properties.

We tried to “eat our own dogfood” and applied the dimensions and traits to a set of 30 ontologies. This showed that it is feasible to do, although we needed two rounds to get to that stage—after the first round of parallel annotation, it turned out we had interpreted a few traits differently, and needed to refine the number of traits and be more precise in their descriptions (which we did). Perhaps unsurprising, some tendencies were observed, and we could identify three easily recognisable types of ontologies because most ontologies had clearly one or the other trait and similar values for sets of trait. Of course, there were also ontologies that were inherently “mixed” in the sense of having applied different and conflicting design decisions within the same ontology, or even included two choices. Coding up the results, we generated two spider diagrams that visualise that difference. Here’s one:

Details of the dimensions, traits, set-up and results of the evaluation, and discussion thereof have been published this week [1] and we’ll present it next month at the 1st Iberoamerican Conference on Knowledge Graphs and Semantic Web (KGSWC’19), in Villa Clara, Cuba, alongside 13 other papers on ontologies. I’m looking forward to it!



[1] Keet, C.M., Fillottrani, P.R.. Dimensions Affecting Representation Styles in Ontologies. 1st Iberoamerican conference on Knowledge Graphs and Semantic Web (KGSWC’19). Springer CCIS vol 1029, 186-200. 24-28 June 2019, Villa Clara, Cuba. Paper at Springer