# Progress on generating educational questions from ontologies

With increasing student numbers, but not as much more funding for schools and universities, and the desire to automate certain tasks anyhow, there have been multiple efforts to generate and mark educational exercises automatically. There are a number of efforts for the relatively easy tasks, such as for learning a language, which range from the entry level with simple vocabulary exercises to advanced ones of automatically marking essays. I’ve dabbled in that area as well, mainly with 3rd-year capstone projects and 4th-year honours project student projects [1]. Then there’s one notch up with fact recall and concept meaning recall questions, and further steps up, such as generating multiple-choice questions (MCQs) with not just obviously wrong distractors but good distractors to make the question harder. There’s quite a bit of work done on generating those MCQs in theory and in tooling, notably [2,3,4,5]. As a recent review [6] also notes, however, there are still quite a few gaps. Among others, about generalisability of theory and systems – can you plug in any structured data or knowledge source to question templates – and the type of questions. Most of the research on ‘not-so-hard to generate and mark’ questions has been done for MCQs, but there are multiple of other types of questions that also should be doable to generate automatically, such as true/false, yes/no, and enumerations. For instance, with an axiom such as $impala \sqsubseteq \exists livesOn.land$ in a ontology or knowledge graph, a suitable question generation system may then generate “Does an impala live on land?” or “True or false: An impala lives on land.”, among other options.

We set out to make a start with tackling those sort of questions, for the type-level information from an ontology (cf. facts in the ABox or knowledge graph). The only work done there, when we started with it, was for the slick and fancy Inquire Biology [5], but which did not have their tech available for inspection and use, so we had to start from scratch. In particular, we wanted to find a way to be able to plug in any ontology into a system and generate those non-MCQ other types of educations questions (10 in total), where the questions generated are at least grammatically good and for which the answers also can be generated automatically, so that we get to automated marking as well.

Initial explorations started in 2019 with an honours project to develop some basics and a baseline, which was then expanded upon. Meanwhile, we have some more designed, developed, and evaluated, which was written up in the paper “Generating Answerable Questions from Ontologies for Educational Exercises” [7] that has been accepted for publication and presentation at the 15th international conference on metadata and semantics research (MTSR’21) that will be held online next week.

In short:

• Different types of questions and the answer they have to provide put different prerequisites on the content of the ontology with certain types of axioms. We specified those for 10 types of educational questions.
• Three strategies of question generation were devised, being ‘simple’ from the vocabulary and axioms and plug it into a template, guided by some more semantics in the ontology (a foundational ontology), and one that didn’t really care about either but rather took a natural language approach. Variants were added to cater for differences in naming and other variations, amounting to 75 question templates in total.
• The human evaluation with questions generated from three ontologies showed that while the semantics-based one was slightly better than the baseline, the NLP-based one gave the best results on syntactic and semantic correctness of the sentences (according to the human evaluators).
• It was tested with several ontologies in different domains, and the generalisability looks promising.

To be honest to those getting their hopes up: there are some issues that cause it never to make it to the ‘100% fabulous!’ if one still wants to designs a system that should be able to take any ontology as input. A main culprit is naming of elements in the ontology, which varies widely across ontologies. There are several guidelines for how to name entities, such as using camel case or underscores, and those things easily can be coded into an algorithm, indeed, but developers don’t stick to them consistently or there’s an ontology import that uses another naming convention so that there likely will be a glitch in the generated sentences here or there. Or they name things within the context of the hierarchy where they put the class, but in the question it is out of that context and then looks weird or is even meaningless. I moaned about this before; e.g., ‘American’ as the name of the class that should have been named ‘American Pizza’ in the Pizza ontology. Or the word used for the name of the class can have different POS tags such that it makes the generated sentence hard to read; e.g., ‘stuff’ as a noun or a verb.

Be this as it may, overall, promising results were obtained and are being extended (more to follow). Some details can be found in the (CRC of the) paper and the algorithms and data are available from the GitHub repo. The first author of the paper, Toky Raboanary, recently made a short presentation video about the paper for the yearly Open Evening/Showcase, which was held virtually and that page is still online available.

References

[1] Gilbert, N., Keet, C.M. Automating question generation and marking of language learning exercises for isiZulu. 6th International Workshop on Controlled Natural language (CNL’18). Davis, B., Keet, C.M., Wyner, A. (Eds.). IOS Press, FAIA vol. 304, 31-40. Co. Kildare, Ireland, 27-28 August 2018.

[2] Alsubait, T., Parsia, B., Sattler, U. Ontology-based multiple choice question generation. KI – Kuenstliche Intelligenz, 2016, 30(2), 183-188.

[3] Rodriguez Rocha, O., Faron Zucker, C. Automatic generation of quizzes from dbpedia according to educational standards. In: The Third Educational Knowledge Management Workshop. pp. 1035-1041 (2018), Lyon, France. April 23 – 27, 2018.

[4] Vega-Gorgojo, G. Clover Quiz: A trivia game powered by DBpedia. Semantic Web Journal, 2019, 10(4), 779-793.

[5] Chaudhri, V., Cheng, B., Overholtzer, A., Roschelle, J., Spaulding, A., Clark, P., Greaves, M., Gunning, D. Inquire biology: A textbook that answers questions. AI Magazine, 2013, 34(3), 55-72.

[6] Kurdi, G., Leo, J., Parsia, B., Sattler, U., Al-Emari, S. A systematic review of automatic question generation for educational purposes. Int. J. Artif. Intell. Edu, 2020, 30(1), 121-204.

[7] Raboanary, T., Wang, S., Keet, C.M. Generating Answerable Questions from Ontologies for Educational Exercises. 15th Metadata and Semantics Research Conference (MTSR’21). 29 Nov – 3 Dec, Madrid, Spain / online. Springer CCIS (in print).

# Bias in ontologies?

Bias in models in the area of Machine Learning and Deep Learning are well known. They feature in the news regularly with catchy headlines and there are longer, more in-depth, reports as well, such as the Excavating AI by Crawford and Paglen and the book Weapons of Math Destruction by O’Neil (with many positive reviews). What about other types of ‘models’, like those that are not built in a data-driven bottom-up way from datasets that happen to lie around for the taking, but that are built by humans? Within Artificial Intelligence still, there are, notably, ontologies. I searched for papers about bias in ontologies, but could find only one vision paper with an anecdote for knowledge graphs [1], one attempt toward a framework but looking at FOAF only [2], which is stretching it a little for what passes as an ontology, and then stretching it even further, there’s an old one of mine on bias in relation to conceptual data models for databases [3].

We simply don’t have bias in ontologies? That sounds a bit optimistic since it’s pervasive elsewhere, and at least worthy of examination whether there is such notion as bias in ontologies and if so, what the sources of that may be. And, if one wants to dig deeper, since Ontology: what is bias anyhow? The popular media is much more liberal in the use of the term ‘bias’ than scientific literature and I’m not going to answer that last question here now. What I did do, is try to identify sources of bias in the context of ontologies and I took a relevant selection of Dimara et al’s list of 154 biases [4] (just like only a subset is relevant to their scope) to see whether they would apply to a set of existing ontologies in roughly the same domain.

The outcome of that exploratory analysis [5], in short, is: yes, there is such notion as bias in ontologies as well. First, I’ve identified 8 types of sources, described them, and illustrated them with hand-picked examples from extant ontologies. Second, I examined the three COVID-19 ontologies (CIDO, CODO, COVoc) on possible bias, and they exhibited different subsets indeed.

The sources can be philosophical, by purpose (commonly known as encoding bias), and ‘subject domain’ source, such as scientific theory, granularity, linguistic, social-cultural, political or religious, and economic motivations, and they may be explicit choices or implicit.

An example of an economic motivation is to (try to) categorise some disorder as a type of disease: there latter gets more resources for medicines, research, treatments and is more costly for insurers who’s rather keep it out of the terminology altogether. Or modifying the properties of a disease or disorder in the classification in the medical ontology so that more people will be categorised as having the disorder even when they don’t. It has happened (see paper for details). Terrorism ontologies can provide ample material for political views to creep in.

Besides the hand-picked examples, I did assess the three COVID-19 ontologies in more detail. Not because I wanted to pick on them—I actually think it’s laudable they tried in trying times—but because they were developed in the same timeframe by three different groups in relative isolation from each other. I looked at both the sources, which can be argued to be present and identified some from a selection of Dimara et al’s list, such as the “mere exposure/familiarity” bias and “false consensus” bias (see table below). How they are present, is also described in that same paper, entitled “An exploration into cognitive bias in ontologies”, which has recently been accepted at the workshop on Cognition And OntologieS V (CAOS’21), which is part of the Joint Ontology Workshops Episode VII at the Bolzano Summer of Knowledge.

Will it matter for automated reasoning when the ontologies are deployed in various information systems? For reasoning over the TBox only, perhaps not so much, or, at least, any inconsistencies that it would have caused should have been detected and discussed during the ontology development stage, rather.

Will it matter for, say, annotating data or literature etc? Some of it yes, for sure. For instance, COVoc has only ‘male’ in the vocabulary, not female (in line with a well-known issue in evidence-based medicine), so when it is used for the “scientific literature triage” they want to, then it’s going to be even harder to retrieve COVID-19 research papers in relation to women specifically. Similarly, when ontologies are used with data, such as for ontology-based data access, bias may have negative effects. Take as example CIDO’s optimism bias, where a ‘COVID-19 experimental drug in a clinical trial’ is a subclass of ‘COVID-19 drug’, and this ontology would be used for OBDA and data integration, as illustrated in the following use case scenario with actual data from the ClinicalTrials database and the FDA approved drugs database:

The data together with the OBDA-enabled reasoner will return ‘hydroxychloroquine’, which is incorrect and the error is due to the biased and erroneous class subsumption declared in the ontology, not the data source itself.

Some peculiarities of content in an ontology may not be due to an underlying bias, but merely a case of ‘ran out of time’ rather than an act of omission due to a bias, for instance. Or it may not be an honest mistake due to bias but a mistake because of some other reason, such as due to having clicked erroneously on a wrong button in the tool’s interface, say, or having misunderstood the modelling language’s features. Disentangling the notion of bias from attendant ontology quality issues is one of the possible avenues of future work. One also can have a go at those lists and mini-taxonomies of cognitive biases and make a better or more comprehensive one, or to try to harmonise the multitude of definitions of what bias is exactly. Methods and supporting software may also assist ontology developers more concretely further down the line. Or: there seems to be enough to do yet.

Lastly, I still hope that I’ll be allowed to present the paper in person at the CAOS workshop, but it’s increasingly looking less and less likely, as our third wave doesn’t seem to want to quiet down and Italy is putting up more hurdles. If not, I’ll try to make a fancy video presentation.

References

[1] K. Janowicz, B. Yan, B. Regalia, R. Zhu, G. Mai, Debiasing knowledge graphs: Why female presidents are not like female popes, in: M. van Erp, M. Atre, V. Lopez, K. Srinivas, C. Fortuna (Eds.), Proceeding of ISWC 2018 Posters & Demonstrations, Industry and Blue Sky Ideas Tracks, volume 2180 of CEUR-WS, 2017.

[2] D. L. Gomes, T. H. Bragato Barros, The bias in ontologies: An analysis of the FOAF ontology, in: M. Lykke, T. Svarre, M. Skov, D. Martínez-Ávila (Eds.), Proceedings of the Sixteenth International ISKO Conference, Ergon-Verlag, 2020, pp. 236 – 244.

[3] Keet, C.M. Dirty wars, databases, and indices. Peace & Conflict Review, 2009, 4(1):75-78.

[4] E. Dimara, S. Franconeri, C. Plaisant, A. Bezerianos, P. Dragicevic, A task-based taxonomy of cognitive biases for information visualization, IEEE Transactions on Visualization and Computer Graphics 26 (2020) 1413–1432.

[5] Keet, C.M. An exploration into cognitive bias in ontologies. Cognition And OntologieS (CAOS’21), part of JOWO’21, part of BoSK’21. 13-16 September 2021, Bolzano, Italy. (in print)

# Automatically simplifying an ontology with NOMSA

Ever wanted only to get the gist of the ontology rather than wading manually through thousands of axioms, or to extract only a section of an ontology for reuse? Then the NOMSA tool may provide the solution to your problem.

There are quite a number of ways to create modules for a range of purposes [1]. We zoomed in on the notion of abstraction: how to remove all sorts of details and create a new ontology module of that. It’s a long-standing topic in computer science that returns every couple of years with another few tries. My first attempts date back to 2005 [2], which references modules & abstractions for conceptual models and logical theories to works published in the mid-1990s and, stretching the scope to granularity, to 1985, even. Those efforts, however, tend to halt at the theory stage or worked for one very specific scenario (e.g., clustering in ER diagrams). In this case, however, my former PhD student and now Senior Research at the CSIR, Zubeida Khan, went further and also devised the algorithms for five types of abstraction, implemented them for OWL ontologies, and evaluated them on various metrics.

The tool itself, NOMSA, was presented very briefly at the EKAW 2018 Posters & Demos session [3] and has supplementary material, such as the definitions and algorithms, a very short screencast and the source code. Five different ways of abstraction to generate ontology modules were implemented: i) removing participation constraints between classes (e.g., the ‘each X R at least one Y’ type of axioms), ii) removing vocabulary (e.g., remove all object properties to yield a bare taxonomy of classes), iii) keeping only a small number of levels in the hierarchy, iv) weightings based on how much some element is used (removing less-connected elements), and v) removing specific language profile features (e.g., qualified cardinality, object property characteristics).

In the meantime, we have added a categorisation of different ways of abstracting conceptual models and ontologies, a larger use case illustrating those five types of abstractions that were chosen for specification and implementation, and an evaluation to see how well the abstraction algorithms work on a set of published ontologies. It was all written up and polished in 2018. Then it took a while in the publication pipeline mixed with pandemic delays, but eventually it has emerged as a book chapter entitled Structuring abstraction to achieve ontology modularisation [4] in the book “Advanced Concepts, methods, and Applications in Semantic Computing” that was edited by Olawande Daramola and Thomas Moser, in January 2021.

Since I bought new video editing software for the ‘physically distanced learning’ that we’re in now at UCT, I decided to play a bit with the software’s features and record a more comprehensive screencast demo video. In the nearly 13 minutes, I illustrate NOMSA with four real ontologies, being the AWO tutorial ontology, BioTop top-domain ontology, BFO top-level ontology, and the Stuff core ontology. Here’s a screengrab from somewhere in the middle of the presentation, where I just automatically removed all 76 object properties from BioTop, with just one click of a button:

The embedded video (below) might keep it perhaps still readable with really good eyesight; else you can view it here in a separate tab.

The source code is available from Zubeida’s website (and I have a local copy as well). If you have any questions or suggestions, please feel free to contact either of us. Under the fair use clause, we also can share the book chapter that contains the details.

References

[1] Khan, Z.C., Keet, C.M. An empirically-based framework for ontology modularization. Applied Ontology, 2015, 10(3-4):171-195.

[2] Keet, C.M. Using abstractions to facilitate management of large ORM models and ontologies. International Workshop on Object-Role Modeling (ORM’05). Cyprus, 3-4 November 2005. In: OTM Workshops 2005. Halpin, T., Meersman, R. (eds.), LNCS 3762. Berlin: Springer-Verlag, 2005. pp603-612.

[3] Khan, Z.C., Keet, C.M. NOMSA: Automated modularisation for abstraction modules. Proceedings of the EKAW 2018 Posters and Demonstrations Session (EKAW’18). CEUR-WS vol. 2262, pp13-16. 12-16 Nov. 2018, Nancy, France.

[4] Khan, Z.C., Keet, C.M. Structuring abstraction to achieve ontology modularisation. Advanced Concepts, methods, and Applications in Semantic Computing. Daramola O, Moser T (Eds.). IGI Global. 2021, 296p. DOI: 10.4018/978-1-7998-6697-8.ch004

# A requirements catalogue for ontology languages

If you could ‘mail order’ a language for representing ontologies or fancy knowledge graphs, what features would you want it to have? Or, from an artefact development viewpoint: what requirements would it have to meet? Perhaps it may not be a ‘Christmas wish list’ in these days, but a COVID-19 lockdown ‘keep dreaming’ one instead, although perhaps it may even be feasible to realise if you don’t ask for too much. Either way, answering this on the spot may not be easy, and possibly incomplete. Therefore, I have created a sample catalogue, based on the published list of requirements and goals for OWL and CL, and I added a few more. The possible requirements to choose from currently are loosely structured into six groups: expressiveness/constructs/modelling features; features of the language as a whole; usability by a computer; usability for modelling by humans; interaction with ‘outside’, i.e., other languages and systems; and ontological decisions. If you think the current draft catalogue should be extended, please leave a comment on this post or contact the author, and I’ll update accordingly.

Expressiveness/constructs/modelling features

E-1 Equipped with basic language elements: predicates (1, 2, n-ary), classes, roles, properties, data-types, individuals, … [select or add as appropriate].

E-2 Equipped with language features/constraints/constructs: domain/range axioms, equality (for classes, for individuals), cardinality constraints, transitivity, … [select or add as appropriate].

E-3 Sufficiently expressive to express various commonly used ‘syntactic sugarings’ for logical forms or commonly used patterns of logical sentences.

E-4 Such that any assumptions about logical relationships between different expressions can be expressed in the logic directly.

Features of the language as a whole

F-1 It has to cater for meta-data; e.g., author, release notes, release date, copyright, … [select or add as appropriate].

F-2 An ontology represented in the language may change over time and it should be possible to track that.

F-3 Provide a general-purpose syntax for communicating logical expressions.

F-4 Unambiguous, i.e., not needed to have negotiation about syntactic roles of symbols, or translations between syntactic roles.

F-5 Such that every name has the same logical meaning at every node of the network.

F-6 Such that it is possible to refer to a local universe of discourse (roughly: a module).

F-7 Such that it is possible to relate the ontology to other such universes of discourse.

F-8 Specified with a particular semantics.

F-9 Should not make arbitrary assumptions about semantics.

F-10 Cater for internationalization (e.g., language tags, additional language model).

F-11 Extendable (e.g., regarding adding more axioms to same ontology, add more vocabulary, and/or in the sense of importing other ontologies).

F-12 Balance expressivity and complexity (e.g., for scalable applications, for decidable automated reasoning tasks).

F-13 Have a query language for the ontology.

F-14 Declared with Closed World Assumption.

F-15 Declared with Open World Assumption.

F-16 Use Unique Name Assumption.

F-17 Do not use Unique Name Assumption.

F-18 Ability to modify the language with the language features.

F-19 Ability to plug in language feature extensions; e.g., ‘loading’ a module for a temporal extension.

Usability by computer

UC-1 Be an (identifiable) object on the Web.

UC-2 Be usable on the Web.

UC-3 Using URIs and URI references that should be usable as names in the language.

UC-4 Using URIs to give names to expressions and sets of expressions, in order to facilitate Web operations such as retrieval, importation and cross reference.

UC-5 Have a serialisation in [XML/JSON/…] syntax.

UC-6 Have symbol support for the syntax in LaTeX/…

UC-7 Such that the same entailments are supported, everywhere on the network of ontologies.

UC-8 Able to be used by tools that can do subsumption reasoning/taxonomic classification.

UC-9 Able to be used by tools that can detect inconsistency.

UC-10 Possible to read and write in the document with simple tools, such as a text editor.

UC-11 Unabiguous and simple grammar to ensure parsing documents as simple as possible.

Usability & modelling by humans

HU-1 Easy to use

HU-2 Have at least one compact, human-readable syntax defined which can be used to express the entire language

HU-3 Have at least one compact, human-readable syntax defined so that it can be easily typed up in emails

HU-4 Such that no agent should be able to limit the ability of another agent to refer to any entity or to make assertions about any entity

HU-5 Such that a modeller is free to invent new names and use them in published content.

HU-6 Have clearly definined syntactic sugar, such as a controlled natural language for authoring or rendering the ontology or an exhaustive diagramamtic notation

Interaction with outside

I-1 Shareable (e.g., on paper, on the computer, concurrent access)

I-2 Interoperable (with what?)

I-3 Compatible with existing standards (e.g., RDF, OWL, XML, URIs, Unicode)

I-4 Support an open networks of ontologies

I-5 Possible to import ontologies (theories, files)

I-6 Option ot declare inter-ontology assertions

Ontological decisions

O-1 3-Dimensionalist commitment, where entities are in space but one doesn’t care about time

O-2 3-Dimensionalist with a temporal extension

O-3 4-Dimensionalist commitment, where entities are in spacetime

O-4 Standard view of relations and relationships (there is an order in which the entities participare)

O-5 Positionalist relations and relationships (there’s no order, but entities play a role in the relation/relationship)

O-6 Have additional primitives, such as for subsumption, parthood, collective, stuff, sortal, anti-rigid entities, … [select or add as appropriate]

O-7 Statements are either true or false

O-8 Statements may vague or uncertain; e.g., fuzzy, rough, probabilistic [select as appropriate]

O-9 There should be a clear separation between natural language and ontology

O-10 Ontology and natural language are intertwined

That’s all, for now.

# Design rationale and overview of the African Wildlife tutorial ontologies

(update 30-7-2020: more details are described in the journal article published in the Journal of Biomedical Semantics)

There are several tutorial ontologies, which typically focus on illustrating one or two aspects of ontology development, notably language features and automated reasoning. This may suffice for one’s aims, but for an ontology engineering course, one would need to be able to illustrate a myriad of development factors and devise exercises for a wider range of tasks of ontology development. For instance, to illustrate the use of ontology design patterns, competency questions, foundational ontologies, and science-based modelling practices, neither of which is addressed easily by the popular tutorial ontologies (notably: wine and pizza), perhaps because they predate most of the advances made in ontology engineering research. Also, I have noticed that my students replicate examples from the exercises they carry out and from inspecting popular and easy-to-find ontologies. Marking the practical assignments, I got to see sandwich and ice cream and burger ontologies with toppings and value partitions, and software and mobile phone ontologies where laptop models are instances rather than classes. Not providing good and versatile examples holistically, causes the propagation of sub-optimal ontology development at least in the exercises, which then also may affect negatively the development of an operational domain ontology that the graduates may have to develop later on.

I’ve been exploring alternatives and variants over the past 11 years in the ontology engineering courses that I have taught yearly to about 8-40 students/year. In an attempt to systematise and possibly generalise from that, I’ve identified 22 requirements that contribute to a good tutorial ontology, which concern the suitability of the subject domain (7 factors), the ease of demonstrating logics and reasoning tasks (7), and assistance with demonstrating engineering aspects (8). Its details are described in a technical report [1]. I don’t claim that it’s an exhaustive list, but that it is one that may help someone to develop their own tutorial ontology in a fun or interesting topic if they so wish—after all, not everyone is interested in pizzas, wines, African wildlife, pets, shirts, a small university, or Robert Stevens’ family.

I’ve tried out a variety of extant tutorial ontologies as well as a range of versions of the African Wildlife Ontology (AWO) over the years (early experiences), eventually settling for a set of 14 versions, all the way from the example from the Primer [2] to DOLCE- and BFO-aligned to translated in several languages, and some with possible answers to some of the exercises. A graphical rendering of the main classes and relations is shown in the following figure:

The versions of the AWO are summarised in the following table, which is also mentioned as annotation in the OWL files.

The AWO meets a majority of the 22 requirements, is mature by now, and it has been used yearly in an ontology engineering course or tutorial since 2010. Also, it is links up with my ontology engineering textbook with relevant examples and exercises. The AWO provides a wide range of options concerning examples and exercises for ontology engineering well beyond illustrating only logic features and automated reasoning. For instance, it assists in demonstrating tasks about ontology quality, such as alignment to a foundational ontology and satisfying competency questions, versioning, and multilingual ontologies. For instance, it is easier to demonstrate alignment of a class Animal to DOLCE’s (Non-Agentive) Physical Object than, say, debating what Algorithm aligns with or descend into political debates on the gender binary or what constitutes a family. One can use the height or the colours of the plants and animals to discuss how to model attributes as qualities or dependent entities cf. OWL’s data properties or an artificial ValuePartition. Declare, say, de individual lion simba as an instance of Lion, rather than the confusion regarding grape varieties. Use intuitively obvious disjointness between animals and plants, and subsequently easy catches on sensitising modellers to the far-reaching effects of declaring domain and range axioms by first asserting that animals eat animals, and then adding that carnivorous plants eat insects. In addition, it links up easily to topics for ontology integration activities, such as with biodiversity data, wildlife trade, and tourism to create, e.g., an OBDA system with freely available data (e.g., taken from here) or an ontology-enhanced website for an organisation that offers environmentally sustainable safaris. More examples of broad usage options are described in section 2.3 in the tech report.

The AWO is freely available under a CC-BY licence through the textbook’s webpage at https://people.cs.uct.ac.za/~mkeet/OEbook/ in this folder. A more comprehensive description of the requirements, design, and content is described in a technical report [1] for the time being.

References

[1] Keet, CM. The African Wildlife Ontology tutorial ontologies: requirements, design, and content. Technical Report 1905.09519. 23 May 2019. https://arxiv.org/abs/1905.09519.

[2] Antoniou, G., van Harmelen, F. A Semantic Web Primer. MIT Press, USA. 2003.

# DL notation plugin for Protégé 5.x

Once upon a time… the Protégé ontology development environment used Description Logic (DL) symbols and all was well—for some users at least. Then Manchester Syntax came along as the new kid on the block, using hearsay and opinion and some other authors’ preferences for an alternative rendering to the DL notation [1]. Subsequently, everyone who used Protégé was forced to deal with those new and untested keywords in the interface, like ‘some’ and ‘only’ and such, rather than the DL symbols. That had another unfortunate side-effect, being that it hampers internationalisation, for it jumbles up things rather awkwardly when your ontology vocabulary is not in English, like, say, “jirafa come only (oja or ramita)”. Even in the same English-as-first-language country, it turned out that under a controlled set-up, the DL axiom rendering in Protégé fared well in a fairly large sized experiment when compared to the Protégé interface with the sort of Manchester syntax with GUI [2], and also the OWL 2 RL rules rendering appear more positive in another (smaller) experiment [3]. Various HCI factors remain to be examined in more detail, though.

In the meantime, we didn’t fully reinstate the DL notation in Protégé in the way it was in Protégé v3.x from some 15 years ago, but with our new plugin, it will at least render the class expression in DL notation in the tool. This has the benefits that

1. the modeller will receive immediate feedback during the authoring stage regarding a notation that may be more familiar to at least a knowledge engineer or expert modeller;
2. it offers a natural language-independent rendering of the axioms with respect to the constructors, so that people may develop their ontology in their own language if they wish to do so, without being hampered by continuous code switching or the need for localisation; and
3. it also may ease the transition from theory (logics) to implementation for ontology engineering novices.

Whether it needs to be integrated further among more components of the tabs and views in Protégé or other ODEs, is also a question for HCI experts to answer. The code for the DL plugin is open source, so you could extend it if you wish to do so.

The plugin itself is a jar file that can simply be dragged into the plugin folder of a Protégé installation (5.x); see the github repo for details. To illustrate it briefly, after dragging the jar file into the plugin folder, open Protégé, and add it as a view:

Then when you add some new axioms or load an ontology, select a class, and it will render all the axioms in DL notation, as shown in the following two screenshots form different ontologies:

For the sake of illustration, here’s the giraffe that eats only leaves or twigs, in the Spanish version of the African Wildlife Ontology:

The first version of the tool was developed by Michael Harrison and Larry Liu as part of their mini-project for the ontology engineering course in 2017, and it was brushed up for presentation beyond that just now by Michael Harrison (meanwhile an MSc student a CS@UCT), which was supported by a DOT4D grant to improve my textbook on ontology engineering and accompanying educational resources. We haven’t examined all possible ‘shapes’ that a class expression can take, but it definitely processes the commonly used features well. At the time of writing, we haven’t detected any errors.

p.s.: if you want your whole ontology exported at once in DL notation and to latex, for purposes of documentation generation, that is a different usage scenario and is already possible [4].

p.p.s.: if you want more DL notation, please let me know, and I’ll try to find more resources to make a v2 with more features.

References

[1] Matthew Horridge, Nicholas Drummond, John Goodwin, Alan Rector, Robert Stevens and Hai Wang (2006). The Manchester OWL syntax. OWL: Experiences and Directions (OWLED’06), Athens, Georgia, USA, 10-11 Nov 2016, CEUR-WS vol 216.

[2] E. Alharbi, J. Howse, G. Stapleton, A. Hamie and A. Touloumis. The efficacy of OWL and DL on user understanding of axioms and their entailments. The Semantic Web – ISWC 2017, C. d’Amato, M. Fernandez, V. Tamma, F. Lecue, P. Cudre-Mauroux, J. Sequeda, C. Lange and J. He (eds.). Springer 2017, pp20-36.

[3] M. K. Sarker, A. Krisnadhi, D. Carral and P. Hitzler, Rule-based OWL modeling with ROWLtab Protégé plugin. Proceedings of ESWC’17, E. Blomqvist, D. Maynard, A. Gangemi, R. Hoekstra, P. Hitzler and O. Hartig (eds.). Springer. 2017, pp 419-433.

[4] Cogan Shimizu, Pascal Hitzler, Matthew Horridge: Rendering OWL in Description Logic Syntax. ESWC (Satellite Events) 2017. Springer LNCS. pp109-113

# Some experiences on making a textbook available

I did make available a textbook on ontology engineering for free in July 2018. Meanwhile, I’ve had several “why did you do this and not a proper publisher??!?” I had tried to answer that already in the textbook’s FAQ. Turns out that that short answer may be a bit too short after all. So, here follows a bit more about that.

The main question I tried to answer in the book’s FAQ was “Would it not have been better with a ‘proper publisher’?” and the answer to that was:

Probably. The layout would have looked better, for sure. There are several reasons why it isn’t. First and foremost, I think knowledge should be free, open, and shared. I also have benefited from material that has been made openly available, and I think it is fair to continue contributing to such sharing. Also, my current employer pays me sufficient to live from and I don’t think it would sell thousands of copies (needed for making a decent amount of money from a textbook), so setting up such a barrier of high costs for its use does not seem like a good idea. A minor consideration is that it would have taken much more time to publish, both due to the logistics and the additional reviewing (previous multi-author general textbook efforts led to nothing due to conflicting interests and lack of time, so I unlikely would ever satisfy all reviewers, if they would get around reading it), yet I need the book for the next OE installment I will teach soon.

Ontology Engineering (OE) is listed as an elective in the ACM curriculum guidelines. Yet, it’s suited best for advanced undergrad/postgrad level because of the prerequisites (like knowing the basics of databases and conceptual modeling). This means there won’t be big 800-students size classes all over the world lining up for OE. I guess it would not go beyond some 500-1000/year throughout the world (50 classes of 10-20 computer science students), and surely not all classes would use the textbook. Let’s say, optimistically, that 100 students/year would be asked to use the book.

With that low volume in mind, I did look up the cost of similar books in the same and similar fields with the ‘regular’ academic publishers. It doesn’t look enticing for either the author or the student. For instance this one from Springer and that one from IGI Global are all still >100 euro. for. the. eBook., and they’re the cheap ones (not counting the 100-page ‘silver bullet’ book). Handbooks and similar on ontologies, e.g., this and that one are offered for >200 euro (eBook). Admitted there’s the odd topical book that’s cheaper and in the 50-70 euro range here and there (still just the eBook) or again >100 as well, for a, to me, inexplicable reason (not page numbers) for other books (like these and those). There’s an option to publish a textbook with Springer in open access format, but that would cost me a lot of money, and UCT only has a fund for OA journal papers, not books (nor for conference papers, btw).

IOS press does not fare much better. For instance, a softcover version in the studies on semantic web series, which is their cheapest range, would be about 70 euro due to number of pages, which is over R1100, and so again above budget for most students in South Africa, where the going rate is that a book would need to be below about R600 for students to buy it. A plain eBook or softcover IOS Press not in that series goes for about 100 euro again, i.e., around R1700 depending on the exchange rate—about three times the maximum acceptable price for a textbook.

The MIT press BFO eBook is only R425 on takealot, yet considering other MIT press textbooks there, with the size of the OE book, it then would be around the R600-700. Oxford University Press and its Cambridge counterpart—that, unlike MIT press, I had checked out when deciding—are more expensive and again approaching 80-100 euro.

One that made me digress for a bit of exploration was Macmillan HE, which had an “Ada Lovelace day 2018” listing books by female authors, but a logics for CS book was again at some 83 euros, although the softer area of knowledge management for information systems got a book down to 50 euros, and something more popular, like a book on linguistics published by its subsidiary “Red Globe Press”, was down to even ‘just’ 35 euros. Trying to understand it more, Macmillan HE’s “about us” revealed that “Macmillan International Higher Education is a division of Macmillan Education and part of the Springer Nature Group, publishers of Nature and Scientific American.” and it turns out Macmillan publishes through Red Globe Press. Or: it’s all the same company, with different profit margins, and mostly those profit margins are too high to result in affordable textbooks, whichever subsidiary construction is used.

So, I had given up on the ‘proper publisher route’ on financial grounds, given that:

• Any ontology engineering (OE) book will not sell large amounts of copies, so it will be expensive due to relatively low sales volume and I still will not make a substantial amount from royalties anyway.
• Most of the money spent when buying a textbook from an established publisher goes to the coffers of the publisher (production costs etc + about 30-40% pure profit [more info]). Also, scholarships ought not to be indirect subsidy schemes for large-profit-margin publishers.
• Most publishers would charge an amount of money for the book that would render the book too expensive for my own students. It’s bad enough when that happens with other textbooks when there’s no alternative, but here I do have direct and easy-to-realise agency to avoid such a situation.

Of course, there’s still the ‘knowledge should be free’ etc. argument, but this was to show that even if one were not to have that viewpoint, it’s still not a smart move to publish the textbook with the well-known academic publishers, even more so if the topic isn’t in the core undergraduate computer science curriculum.

Interestingly, after ‘publishing’ it on my website and listing it on OpenUCT and the Open Textbook Archive—I’m certainly not the only one who had done a market analysis or has certain political convictions—one colleague pointed me to the non-profit College Publications that aims to “break the monopoly that commercial publishers have” and another colleague pointed me to UCT press. I had contacted both, and the former responded. In the meantime, the book has been published by CP and is now also listed on Amazon for just \$18 (about 16 euro) or some R250 for the paperback version—whilst the original pdf file is still freely available—or: you pay for production costs of the paperback, which has a slightly nicer layout and the errata I knew of at the time have been corrected.

I have noticed that some people don’t take the informal self publishing seriously—even below the so-called ‘vanity publishers’ like Lulu—notwithstanding the archives to cater for it, the financial take on the matter, the knowledge sharing argument, and the ‘textbooks for development’ in emerging economies angle of it. So, I guess no brownie points from them then and, on top of that, my publication record did, and does, take a hit. Yet, writing a book, as an activity, is a nice and rewarding change from just churning out more and more papers like a paper production machine, and I hope it will contribute to keeping the OE research area alive and lead to better ontologies in ontology-driven information systems. The textbook got its first two citations already, the feedback is mostly very positive, readers have shared it elsewhere (reddit, ungule.it, Open Libra, Ebooks directory, and other platforms), and I recently got some funding from the DOT4D project to improve the resources further (for things like another chapter, new exercises, some tools development to illuminate the theory, a proofreading contest, updating the slides for sharing, and such). So, overall, if I had to make the choice again now, I’d still do it again the way I did. Also, I hope more textbook authors will start seeing self-publishing, or else non-profit, as a good option. Last, the notion of open textbooks is gaining momentum, so you even could become a trendsetter and be fashionable 😉

# ISAO 2018, Cape Town, ‘trip’ report

The Fourth Interdisciplinary School on Applied Ontology has just come to an end, after five days of lectures, mini-projects, a poster session, exercises, and social activities spread over six days from 10 to 15 September in Cape Town on the UCT campus. It’s not exactly fair to call this a ‘trip report’, as I was the local organizer and one of the lecturers, but it’s a brief recap ‘trip report kind of blog post’ nonetheless.

The scientific programme consisted of lectures and tutorials on:

The linked slides (titles of the lectures, above) reveal only part of the contents covered, though. There were useful group exercises and plenary discussion with the ontological analysis of medical terms such as what a headache is, a tooth extraction, blood, or aspirin, an exercises on putting into practice the design process of a conceptual modelling language of one’s liking (e.g.: how to formalize flowcharts, including an ontological analysis of what those elements are and ontological commitments embedded in a language), and trying to prove some theorems of parthood theories.

There was also a session with 2-minute ‘blitztalks’ by participants interested in briefly describing their ongoing research, which was followed by an interactive poster session.

It was the first time that an ISAO had mini-projects, which turned out to have had better outcomes than I expected, considering the limited time available for it. Each group had to pick a term and investigate what it meant in the various disciplines (task description); e.g.: what does ‘concept’ or ‘category’ mean in psychology, ontology, data science, and linguistics, and ‘function’ in manufacturing, society, medicine, and anatomy? The presentations at the end of the week by each group were interesting and most of the material presented there easily could be added to the IAOA Education wiki’s term list (an activity in progress).

What was not a first-time activity, was the Ontology Pub Quiz, which is a bit of a merger of scientific programme and social activity. We created a new version based on questions from several ISAO’18 lecturers and a few relevant questions created earlier (questions and answers; we did only questions 1-3,6-7). We tried a new format compared to the ISAO’16 quiz and JOWO’17 quiz: each team had 5 minutes to answer a set of 5 questions, and another team marked the answers. This set-up was not as hectic as the other format, and resulted in more within-team interaction cf. among all participants interaction. As in prior editions, some questions and answers were debatable (and there’s still the plan to make note of that and fix it—or you could write an article about it, perhaps :)). The students of the winning team received 2 years free IAOA membership (and chocolate for all team members) and the students of the other two teams received one year free IAOA membership.

Impression of part of the poster session area, moving into the welcome reception

As with the three previous ISAO editions, there was also a social programme, which aimed to facilitate getting to know one another, networking, and have time for scientific conversations. On the first day, the poster session eased into a welcome reception (after a brief wine lapse in the coffee break before the blitztalks). The second day had an activity to stretch the legs after the lectures and before the mini-project work, which was a Bachata dance lesson by Angus Prince from Evolution Dance. Not everyone was eager at the start, but it turned out an enjoyable and entertaining hour. Wednesday was supposed to be a hike up the iconic Table Mountain, but of all the dry days we’ve had here in Cape Town, on that day it was cloudy and rainy, so an alternative plan of indoor chocolate tasting in the Biscuit Mill was devised and executed. Thursday evening was an evening off (from scheduled activities, at least), and Friday early evening we had the pub quiz in the UCT club (the campus pub). Although there was no official planning for Saturday afternoon after the morning lectures, there was again an attempt at Table Mountain, concluding the week.

The participants came from all over the world, including relatively many from Southern Africa with participants coming also from Botswana and Mauritius, besides several universities in South Africa (UCT, SUN, CUT). I hope everyone has learned something from the programme that is or will be of use, enjoyed the social programme, and made some useful new contacts and/or solidified existing ones. I look forward to seeing you all at the next ISAO or, better, FOIS, in 2020 in Bolzano, Italy.

Finally, as a non-trip-report comment from my local chairing viewpoint: special thanks go to the volunteers Zubeida Khan for the ISAO website, Zola Mahlaza and Michael Harrison for on-site assistance, and Sam Chetty for the IT admin.

# An Ontology Engineering textbook

My first textbook “An Introduction to Ontology Engineering” (pdf) is just released as an open textbook. I have revised, updated, and extended my earlier lecture notes on ontology engineering, amounting to about 1/3 more new content cf. its predecessor. Its main aim is to provide an introductory overview of ontology engineering and its secondary aim is to provide hands-on experience in ontology development that illustrate the theory.

The contents and narrative is aimed at advanced undergraduate and postgraduate level in computing (e.g., as a semester-long course), and the book is structured accordingly. After an introductory chapter, there are three blocks:

• Logic foundations for ontologies: languages (FOL, DLs, OWL species) and automated reasoning (principles and the basics of tableau);
• Developing good ontologies with methods and methodologies, the top-down approach with foundational ontologies, and the bottom-up approach to extract as much useful content as possible from legacy material;
• Advanced topics that has a selection of sub-topics: Ontology-Based Data Access, interactions between ontologies and natural languages, and advanced modelling with additional language features (fuzzy and temporal).

Each chapter has several review questions and exercises to explore one or more aspects of the theory, as well as descriptions of two assignments that require using several sub-topics at once. More information is available on the textbook’s page [also here] (including the links to the ontologies used in the exercises), or you can click here for the pdf (7MB).

Feedback is welcome, of course. Also, if you happen to use it in whole or in part for your course, I’d be grateful if you would let me know. Finally, if this textbook will be used half (or even a quarter) as much as the 2009/2010 blogposts have been visited (around 10K unique visitors since posting them), that would mean there are a lot of people learning about ontology engineering and then I’ll have achieved more than I hoped for.

UPDATE: meanwhile, it has been added to several open (text)book repositories, such as OpenUCT and the Open Textbook Archive, and it has been featured on unglue.it in the week of 13-8 (out of its 14K free ebooks).

# Orchestrating 28 logical theories of mereo(topo)logy

Parts and wholes, again. This time it’s about the logic-aspects of theories of parthood (cf. aligning different hierarchies of (part-whole) relations and make them compatible with foundational ontologies). I intended to write this post before the Ninth Conference on Knowledge Capture (K-CAP 2017), where the paper describing the new material would be presented by my co-author, Oliver Kutz. Now, afterwards, I can add that “Orchestrating a Network of Mereo(topo) logical Theories” [1] even won the Best Paper Award. The novelties, in broad strokes, are that we figured out and structured some hitherto messy and confusing state of affairs, showed that one can do more than generally assumed especially with a new logics orchestration framework, and we proposed first steps toward conflict resolution to sort out expressivity and logic limitations trade-offs. Constructing a tweet-size “tl;dr” version of the contents is not easy, and as I have as much space here on my blog as I like, it ended up to be three paragraphs here: scene-setting, solution, and a few examples to illustrate some of it.

Problems

As ontologists know, parthood is used widely in ontologies across most subject domains, such as biomedicine, geographic information systems, architecture, and so on. Ontology (the philosophers) offer a parthood relation that has a bunch of computationally unpleasant properties that are structured in a plethora of mereologicial and meretopological theories such that it has become hard to see the forest for the trees. This is then complicated in practice because there are multiple logics of varying expressivity (support more or less language features), with the result that only certain fragments of the mereo(topo)logical theories can be represented. However, it’s mostly not clear what can be used when, during the ontology authoring stage one may want to have all those features so as to check correctness, and it’s not easy to predict what will happen when one aligns ontologies with different fragments of mereo(topo)logy.

Solution

We solved these problems by specifying a structured network of theories formulated in multiple logics that are glued together by the various linking constructs of the Distributed Ontology, Model, and Specification Language (DOL). The ‘structured network of theories’-part concerns all the maximal expressible fragments of the KGEMT mereotopological theory and five of its most well-recognised sub-theories (like GEM and MT) in the seven Description Logics-based OWL species, first-order logic, and higher order logic. The ‘glued together’-part refers to relating the resultant 28 theories within DOL (in Ontohub), which is a non-trivial (understatement, unfortunately) metalanguage that has the constructors for the glue, such as enabling one to declare to merge two theories/modules represented in different logics, extending a theory (ontology) with axioms that go beyond that language without messing up the original (expressivity-restricted) ontology, and more. Further, because the annoying thing of merging two ontologies/modules can be that the merged ontology may be in a different language than the two original ones, which is very hard to predict, we have a cute proof-of-concept tool so that it assists with steps toward resolution of language feature conflicts by pinpointing profile violations.

Examples

The paper describes nine mechanisms with DOL and the mereotopological theories. Here I’ll start with a simple one: we have Minimal Topology (MT) partially represented in OWL 2 EL/QL in “theory8” where the connection relation (C) is just reflexive (among other axioms; see table in the paper for details). Now what if we add connection’s symmetry, which results in “theory4”? First, we do this by not harming theory8, in DOL syntax (see also the ESSLI’16 tutorial):

```logic OWL2.QL
ontology theory4 =
theory8
then
ObjectProperty: C Characteristics: Symmetric %(t7)```

What is the logic of theory4? Still in OWL, and if so, which species? The Owl classifier shows the result:

Another case is that OWL does not let one define an object property; at best, one can add domain and range axioms and the occasional ‘characteristic’ (like aforementioned symmetry), for allowing arbitrary full definitions pushes it out of the decidable fragment. One can add them, though, in a system that can handle first order logic, such as the Heterogeneous toolset (Hets); for instance, where in OWL one can add only “overlap” as a primitive relation (vocabulary element without definition), we can take such a theory and declare that definition:

```logic CASL.FOL
ontology theory20 =
theory6_plus_antisym_and_WS
then %wdef
. forall x,y:Thing . O(x,y) <=> exists z:Thing (P(z,x) /\ P(z,y)) %(t21)
. forall x,y:Thing . EQ(x,y) <=> P(x,y) /\ P(y,x) %(t22)```

As last example, let me illustrate the notion of the conflict resolution. Consider theory19—ground mereology, partially—that is within OWL 2 EL expressivity and theory18—also ground mereology, partially—that is within OWL 2 DL expressivity. So, they can’t be the same; the difference is that theory18 has parthood reflexive and transitive and proper parthood asymmetric and irreflexive, whereas theory19 has both parthood and proper parthood transitive. What happens if one aligns the ontologies that contain these theories, say, O1 (with theory18) and O2 (with theory19)? The Owl classifier provides easy pinpointing and tells you the profile: OWL 2 full (or: first order logic, or: beyond OWL 2 DL—top row) and why (bottom section):

Now, what can one do? The conflict resolution cannot be fully automated, because it depends on what the modeller wants or needs, but there’s enough data generated already and there are known trade-offs so that it is possible to describe the consequences:

• Choose the O1 axioms (with irreflexivity and asymmetry on proper part of), which will make the ontology interoperable with other ontologies in OWL 2 DL, FOL or HOL.
• Choose O2’s axioms (with transitivity on part of and proper part of), which will facilitate linking to ontologies in OWL 2 RL, 2 EL, 2 DL, FOL, and HOL.
• Choose to keep both sets will result in an OWL 2 Full ontology that is undecidable, and it is then compatible only with FOL and HOL ontologies.

As serious final note: there’s still fun to be had on the logic side of things with countermodels and sub-networks and such, and with refining the conflict resolution to assist ontology engineers better. (or: TBC)

As less serious final note: the working title of early drafts of the paper was “DOLifying mereo(topo)logy”, but at some point we chickened out and let go of that frivolity.

References

[1] Keet, C.M., Kutz, O. Orchestrating a Network of Mereo(topo)logical Theories. Ninth International Conference on Knowledge Capture (K-CAP’17), Austin, Texas, USA, December 4-6, 2017. ACM Proceedings.