# An Ontology Engineering textbook

My first textbook “An Introduction to Ontology Engineering” (pdf) is just released as an open textbook. I have revised, updated, and extended my earlier lecture notes on ontology engineering, amounting to about 1/3 more new content cf. its predecessor. Its main aim is to provide an introductory overview of ontology engineering and its secondary aim is to provide hands-on experience in ontology development that illustrate the theory.

The contents and narrative is aimed at advanced undergraduate and postgraduate level in computing (e.g., as a semester-long course), and the book is structured accordingly. After an introductory chapter, there are three blocks:

• Logic foundations for ontologies: languages (FOL, DLs, OWL species) and automated reasoning (principles and the basics of tableau);
• Developing good ontologies with methods and methodologies, the top-down approach with foundational ontologies, and the bottom-up approach to extract as much useful content as possible from legacy material;
• Advanced topics that has a selection of sub-topics: Ontology-Based Data Access, interactions between ontologies and natural languages, and advanced modelling with additional language features (fuzzy and temporal).

Each chapter has several review questions and exercises to explore one or more aspects of the theory, as well as descriptions of two assignments that require using several sub-topics at once. More information is available on the textbook’s page [also here] (including the links to the ontologies used in the exercises), or you can click here for the pdf (7MB).

Feedback is welcome, of course. Also, if you happen to use it in whole or in part for your course, I’d be grateful if you would let me know. Finally, if this textbook will be used half (or even a quarter) as much as the 2009/2010 blogposts have been visited (around 10K unique visitors since posting them), that would mean there are a lot of people learning about ontology engineering and then I’ll have achieved more than I hoped for.

UPDATE: meanwhile, it has been added to several open (text)book repositories, such as OpenUCT and the Open Textbook Archive, and it has been featured on unglue.it in the week of 13-8 (out of its 14K free ebooks).

# Orchestrating 28 logical theories of mereo(topo)logy

Parts and wholes, again. This time it’s about the logic-aspects of theories of parthood (cf. aligning different hierarchies of (part-whole) relations and make them compatible with foundational ontologies). I intended to write this post before the Ninth Conference on Knowledge Capture (K-CAP 2017), where the paper describing the new material would be presented by my co-author, Oliver Kutz. Now, afterwards, I can add that “Orchestrating a Network of Mereo(topo) logical Theories” [1] even won the Best Paper Award. The novelties, in broad strokes, are that we figured out and structured some hitherto messy and confusing state of affairs, showed that one can do more than generally assumed especially with a new logics orchestration framework, and we proposed first steps toward conflict resolution to sort out expressivity and logic limitations trade-offs. Constructing a tweet-size “tl;dr” version of the contents is not easy, and as I have as much space here on my blog as I like, it ended up to be three paragraphs here: scene-setting, solution, and a few examples to illustrate some of it.

Problems

As ontologists know, parthood is used widely in ontologies across most subject domains, such as biomedicine, geographic information systems, architecture, and so on. Ontology (the philosophers) offer a parthood relation that has a bunch of computationally unpleasant properties that are structured in a plethora of mereologicial and meretopological theories such that it has become hard to see the forest for the trees. This is then complicated in practice because there are multiple logics of varying expressivity (support more or less language features), with the result that only certain fragments of the mereo(topo)logical theories can be represented. However, it’s mostly not clear what can be used when, during the ontology authoring stage one may want to have all those features so as to check correctness, and it’s not easy to predict what will happen when one aligns ontologies with different fragments of mereo(topo)logy.

Solution

We solved these problems by specifying a structured network of theories formulated in multiple logics that are glued together by the various linking constructs of the Distributed Ontology, Model, and Specification Language (DOL). The ‘structured network of theories’-part concerns all the maximal expressible fragments of the KGEMT mereotopological theory and five of its most well-recognised sub-theories (like GEM and MT) in the seven Description Logics-based OWL species, first-order logic, and higher order logic. The ‘glued together’-part refers to relating the resultant 28 theories within DOL (in Ontohub), which is a non-trivial (understatement, unfortunately) metalanguage that has the constructors for the glue, such as enabling one to declare to merge two theories/modules represented in different logics, extending a theory (ontology) with axioms that go beyond that language without messing up the original (expressivity-restricted) ontology, and more. Further, because the annoying thing of merging two ontologies/modules can be that the merged ontology may be in a different language than the two original ones, which is very hard to predict, we have a cute proof-of-concept tool so that it assists with steps toward resolution of language feature conflicts by pinpointing profile violations.

Examples

The paper describes nine mechanisms with DOL and the mereotopological theories. Here I’ll start with a simple one: we have Minimal Topology (MT) partially represented in OWL 2 EL/QL in “theory8” where the connection relation (C) is just reflexive (among other axioms; see table in the paper for details). Now what if we add connection’s symmetry, which results in “theory4”? First, we do this by not harming theory8, in DOL syntax (see also the ESSLI’16 tutorial):

logic OWL2.QL
ontology theory4 =
theory8
then
ObjectProperty: C Characteristics: Symmetric %(t7)

What is the logic of theory4? Still in OWL, and if so, which species? The Owl classifier shows the result:

Another case is that OWL does not let one define an object property; at best, one can add domain and range axioms and the occasional ‘characteristic’ (like aforementioned symmetry), for allowing arbitrary full definitions pushes it out of the decidable fragment. One can add them, though, in a system that can handle first order logic, such as the Heterogeneous toolset (Hets); for instance, where in OWL one can add only “overlap” as a primitive relation (vocabulary element without definition), we can take such a theory and declare that definition:

logic CASL.FOL
ontology theory20 =
theory6_plus_antisym_and_WS
then %wdef
. forall x,y:Thing . O(x,y) <=> exists z:Thing (P(z,x) /\ P(z,y)) %(t21)
. forall x,y:Thing . EQ(x,y) <=> P(x,y) /\ P(y,x) %(t22)

As last example, let me illustrate the notion of the conflict resolution. Consider theory19—ground mereology, partially—that is within OWL 2 EL expressivity and theory18—also ground mereology, partially—that is within OWL 2 DL expressivity. So, they can’t be the same; the difference is that theory18 has parthood reflexive and transitive and proper parthood asymmetric and irreflexive, whereas theory19 has both parthood and proper parthood transitive. What happens if one aligns the ontologies that contain these theories, say, O1 (with theory18) and O2 (with theory19)? The Owl classifier provides easy pinpointing and tells you the profile: OWL 2 full (or: first order logic, or: beyond OWL 2 DL—top row) and why (bottom section):

Now, what can one do? The conflict resolution cannot be fully automated, because it depends on what the modeller wants or needs, but there’s enough data generated already and there are known trade-offs so that it is possible to describe the consequences:

• Choose the O1 axioms (with irreflexivity and asymmetry on proper part of), which will make the ontology interoperable with other ontologies in OWL 2 DL, FOL or HOL.
• Choose O2’s axioms (with transitivity on part of and proper part of), which will facilitate linking to ontologies in OWL 2 RL, 2 EL, 2 DL, FOL, and HOL.
• Choose to keep both sets will result in an OWL 2 Full ontology that is undecidable, and it is then compatible only with FOL and HOL ontologies.

As serious final note: there’s still fun to be had on the logic side of things with countermodels and sub-networks and such, and with refining the conflict resolution to assist ontology engineers better. (or: TBC)

As less serious final note: the working title of early drafts of the paper was “DOLifying mereo(topo)logy”, but at some point we chickened out and let go of that frivolity.

References

[1] Keet, C.M., Kutz, O. Orchestrating a Network of Mereo(topo)logical Theories. Ninth International Conference on Knowledge Capture (K-CAP’17), Austin, Texas, USA, December 4-6, 2017. ACM Proceedings.

# Figuring out the verbalisation of temporal constraints in ontologies and conceptual models

Temporal conceptual models, ontologies, and their logics are nothing new, but that sort of information and knowledge representation still doesn’t gain a lot of traction (cf. say, formal methods for verification). This is in no small part because modelling temporal information is not easy. Several conceptual modelling languages do have various temporal extensions, but most modellers don’t even use all of the default language features yet [1]. How could one at least reduce the barrier to adoption of temporal logics and modelling languages? The two principle approaches are visualisation with a diagrammatic language and rendering it in a (pseudo-)natural language. One of my postgraduate students looked at the former, trying to figure out what would be the best icons and such, which showed there was still a steep learning curve [2]. Before examining whether that could be optimised, I wondered whether the natural language option might be promising. The problem was, that no-one had yet tried to determine what the natural language counterpart of the temporal constraints were supposed to be, let alone whether they be ‘adequate’ or the ‘best’ way of rendering the temporal constraints in tolerable natural language sentences. I wanted to know that badly enough that I tried to find out.

Given that using templates is a tried-and-tested relatively successful approach for atemporal conceptual models and ontologies (e.g., for ORM, the ACE system), it makes sense to do something similar, but then for some temporal extension. As temporal conceptual modelling language I used one that has a Description Logics foundation (DLRUS [3,4]) for that easily links to ontologies as well, added a few known temporal constraints (like for relationships/DL roles, mandatory) and removing others (some didn’t seem all that interesting), which resulted in 34 constraints, still. For each one, I tried to devise more and less reasonable templates, resulting in 101 templates overall. Those templates were evaluated on semantics and preference by three temporal logic experts and five ‘mixed experts’ (experts in natural language generation, logic, or modelling). This resulted in a final set of preferred templates to verbalise the temporal constraints. The remainder of this post first describes a bit about the templates and then the results of which I think they are most interesting.

Templates

The basic idea of a template—in the context of the verbalisation of conceptual models and ontologies—is to have some natural language for the constraint where then the vocabulary gets slotted in at runtime. Take, for instance, simple named class subsumption in an ontology, $C \sqsubseteq D$, for which one could define a template “Each [C] is a(n) [D]”, so that with some axiom $Manager \sqsubseteq Employee$, it would generate the sentence “Each Manager is an Employee”. One also could have devised the template “All [C] are [D]” and then it would have generated “All Managers are Employees”. The choice between the two templates in this case is just taste, for in both cases, the semantics is the same. More complex axioms are not always that straightforward. For instance, for the axiom type $C \sqsubseteq \exists R.D$, would “Each [C] [R] some [D]” be good enough, or would perhaps “Each [C] must [R] at least one [D]” be better? E.g., “Each Professor teaches some Course” vs “Each Professor must teach at least one Course”.

The same can be done for the temporal constraints. To get there, I did a bit of a linguistic detour that informed the template design (described in the paper [5]). Let us take as first example for templates temporal class that has a semantics of $o \in C^{\mathcal{I}(t)} \rightarrow \exists t' \neq t. o \notin C^{\mathcal{I}(t')}$; for instance, UndergraduateStudent (assuming they graduate and end up as alumni or as drop outs, and weren’t undergrads from birth):

1. If an object is an instance of entity type [C], then there is some time where it is not a(n) [C].
2. [C] is an entity type whose objects are, for some time in their existence, not instances of [C].
3. [C] is an entity type of which each object is not a(n) [C] for some time during its existence.
4. All instances of entity type [C] are not a(n) [C] for some time.
5. Each [C] is not a(n) [C] for some time.
6. Each [C] is for some time not a(n) [C].

Which one(s) do you think captures the semantics, and which one(s) do you prefer?

A more elaborate constraint for relationships is ‘dynamic extension for relationships, past, mandatory], which is formalised as $\langle o , o' \rangle \in \mbox{{\sc RDexM}-}_{R_1,R_2}^{\mathcal{I}(t)} \rightarrow (\langle o , o' \rangle \in{\tt R_1}^{\mathcal{I}(t)} \rightarrow \exists t' where $\langle o , o' \rangle \in \mbox{{\sc RDex}}_{R_1,R_2}^{\mathcal{I}(t)} \rightarrow ( \langle o , o' \rangle \in{\tt R_1}^{\mathcal{I}(t)} \rightarrow \exists t'>t. \langle o , o' \rangle \in {\tt R_2}^{\mathcal{I}(t')})$.; e.g., every passenger who boards a flight must have checked in for that flight. Two options could be:

1. Each ..C_1.. ..R_1.. ..C_2.. was preceded by ..C_1.. ..R_2.. ..C_2.. some time earlier.
2. Each ..C_1.. ..R_1.. ..C_2.. must be preceded by ..C_1.. ..R_2.. ..C_2.. .

I’m not saying they are all correct; they were some of the options given, which the participants could choose from and comment on. The full list of constraints and template options are available in the supplementary material, which also contains a file where you can fill in your own answers, see what the (anonymised) participants said, and it has the final list of ‘best’ constraints.

Results

The main aggregate quantitative results are shown in the following table.

Many observations can be made from the data (see the paper for details). Some of the salient aspects are that there was low inter-annotator agreement among the experts, despite that they know each other (temporal logics is a small community) and that the ‘mixed group’ deemed many sentences correct that the experts deemed wrong in the sense of not properly capturing the semantics of the constraint. Put differently, it looks like the mixed experts, as a group, did not fully grasp some subtle distinction in the temporal constraints.

With respect to the templates, the preferred ones don’t follow the structure of the logic, but are, in a way, a separate rendering, or: there’s no neat 1:1 mapping between axiom type and template structure. That said, that doesn’t mean that they always chose the shortest template: the experts definitely did not, while the mixed experts leaned a bit toward preferring templates with fewer words even though they were surely not always the semantically correct option.

It may not look good that the experts preferred different templates, but in a follow-up interview with one of the experts, the expert noted that it was not really a problem “for there is the logic that does have the precise meaning anyway” and thus “resolves any confusion that may arise from using slightly different terminology”. The temporal logic expert does have a point from the expert’s view, fair enough, but that pretty much defeats my aim with the experiment. Asking more non-experts may not be a good strategy either, for they are, on average, too lenient.

So, for now, we do have a set of, relatively, ‘best’ templates to verbalise temporal constraints in temporal conceptual models and ontologies. The next step is to compare that with the diagrammatic representation. This we did [6], and I’ll describe those results informally in a next post.

I’ll present more details at the upcoming CREOL: Contextual Representation of Events and Objects in Language Workshop that is part of the Joint Ontology Workshops 2017, which will be held next week (21-23 September) in Bolzano, Italy. As the KRDB group at FUB in Bolzano has a few temporal logic experts, I’m looking forward to the discussions! Also, I’d be happy if you would be willing to fill in the spreadsheet with your preferences (before looking at the answers given by the participants!), and send them to me.

References

[1] Keet, C.M., Fillottrani, P.R. An analysis and characterisation of publicly available conceptual models. 34th International Conference on Conceptual Modeling (ER’15). Johannesson, P., Lee, M.L. Liddle, S.W., Opdahl, A.L., Pastor López, O. (Eds.). Springer LNCS vol 9381, 585-593. 19-22 Oct, Stockholm, Sweden.

[2] T. Shunmugam. Adoption of a visual model for temporal database representation. M. IT thesis, Department of Computer Science, University of Cape Town, South Africa, 2016.

[3] A. Artale, E. Franconi, F. Wolter, and M. Zakharyaschev. A temporal description logic for reasoning about conceptual schemas and queries. In S. Flesca, S. Greco, N. Leone, and G. Ianni, editors, Proceedings of the 8th Joint European Conference on Logics in Artificial Intelligence (JELIA-02), volume 2424 of LNAI, pages 98-110. Springer Verlag, 2002.

[4] A. Artale, C. Parent, and S. Spaccapietra. Evolving objects in temporal information systems. Annals of Mathematics and Artificial Intelligence, 50(1-2):5-38, 2007.

[5] Keet, C.M. Natural language template selection for temporal constraints. CREOL: Contextual Representation of Events and Objects in Language, Joint Ontology Workshops 2017, 21-23 September 2017, Bolzano, Italy. CEUR-WS Vol. (in print).

[6] Keet, C.M., Berman, S. Determining the preferred representation of temporal constraints in conceptual models. 36th International Conference on Conceptual Modeling (ER’17). Springer LNCS. 6-9 Nov 2017, Valencia, Spain. (in print)

# Our ESWC17 demos: TDDonto2 and an OWL verbaliser for isiZulu

Besides the full paper on heterogeneous alignments for 14th Extended Semantic Web Conference (ESWC’17) that will take place next week in Portoroz, Slovenia, we also managed to squeeze out two demo papers. You may already know of TDDonto2 with Kieren Davies and Agnieszka Lawrynowicz, which was discussed in an earlier post that has been updated with a tutorial video. It now has a demo paper as well [1], which describes the rationale and a few scenarios. The other demo, with Musa Xakaza and Langa Khumalo, is new-new, but the regular reader might have seen it coming: we finally managed to link the verbalisation patterns for certain Description Logic axiom types [2,3] to those in OWL ontologies. The tool takes as input an ontology in isiZulu and the verbalisation algorithms, and out come the isiZulu sentences, be this in plain text for further processing or in a GUI for inspection by a domain expert [4]. There is a basic demo-screencast to show it’s all working.

The overall architecture may be of interest, for it deviates from most OWL verbalisers. It is shown in the following figure:

For instance, we use the Python-based OWL API Owlready, rather than a Java-based app, for Python is rather popular in NLP and the verbalisation algorithms may be used elsewhere as well. We made more such decisions with the aim to make whatever we did as multi-purpose usable as possible, like the list of nouns with noun classes (surprisingly, and annoyingly, there is no such readily available list yet, though isizulu.net probably will have it somewhere but inaccessible), verb roots, and exceptions in pluralisation. (Problems for integrating the verbaliser with, say, Protégé will be interesting to discuss during the demo session!)

The text-based output doesn’t look as nice as the GUI interface, so I will show here only the GUI interface, which is adorned with some annotations to illustrate that those verbalisation algorithms in the background are far from trivial templates:

For instance, while in English the universal quantification is always ‘Each’ or ‘All’ regardless the named class quantified over, in isiZulu it depends on the noun class of the noun that is the name of the OWL class. For instance, in the figure above, izingwe ‘leopards’ is in noun class 10, so the ‘Each/All’ is Zonke, amavazi ‘vases’ is in noun class 6, so ‘Each/All’ then becomes Onke, and abantu ‘people’/’humans’ is in noun class 2, making Bonke. There are 17 noun classes. They also determine the subject concords (SC, alike conjugation) for the verbs, with zi- for noun class 10, ­a- for noun class 6, and ba- for noun class 2, to name a few. How this all works is described in [2,3]. We’ve implemented all those algorithms and integrated the pluraliser [5] in it to make it work. The source files are available to check and play with already, you can do so and ask us during the ESWC17 demo session, and/or also have a look at the related outputs of the NRF-funded project Grammar Engine for Nguni natural language interfaces (GeNi).

References

[1] Davies, K. Keet, C.M., Lawrynowicz, A. TDDonto2: A Test-Driven Development Plugin for arbitrary TBox and ABox axioms. Extended Semantic Web Conference (ESWC’17), Springer LNCS. Portoroz, Slovenia, May 28 – June 2, 2017. (demo paper)

[2] Keet, C.M., Khumalo, L. Toward a knowledge-to-text controlled natural language of isiZulu. Language Resources and Evaluation, 2017, 51:131-157.

[3] Keet, C.M., Khumalo, L. On the verbalization patterns of part-whole relations in isiZulu. 9th International Natural Language Generation conference (INLG’16), 5-8 September, 2016, Edinburgh, UK. Association for Computational Linguistics, 174-183.

[4] Keet, C.M. Xakaza, M., Khumalo, L. Verbalising OWL ontologies in isiZulu with Python. 14th Extended Semantic Web Conference (ESWC’17). Springer LNCS. Portoroz, Slovenia, May 28 – June 2, 2017. (demo paper)

[5] Byamugisha, J., Keet, C.M., Khumalo, L. Pluralising Nouns in isiZulu and Related Languages. 17th International Conference on Intelligent Text Processing and Computational Linguistics (CICLing’16), Springer LNCS. April 3-9, 2016, Konya, Turkey.

# Relations with roles / verbalising object properties in isiZulu

The narratives can be very different for the paper “A model for verbalising relations with roles in multiple languages” that was recently accepted paper at the 20th International Conference on Knowledge Engineering and Knowledge management (EKAW’16), for the paper makes a nice smoothie of the three ingredients of language, logic, and ontology. The natural language part zooms in on isiZulu as use case (possibly losing some ontologist or logician readers), then there are the logics about mapping the Description Logic DLR’s role components with OWL (lose possible interest of the natural language researchers), and a bit of philosophy (and lose most people…). It solves some thorny issues when trying to verbalise complicated verbs that we need for knowledge-to-text natural language generation in isiZulu and some other languages (e.g., German). And it solves the matching of logic-based representations popularised in mainly UML and ORM (that typically uses a logic in the DLR family of Description Logic languages) with the more commonly used OWL. The latter is even implemented as a Protégé plugin.

Let me start with some use-cases that cause problems that need to be solved. It is well-known that natural language renderings of ontologies facilitate communication with domain experts who are expected to model and validate the represented knowledge. This is doable for English, with ACE in the lead, but it isn’t for grammatically richer languages. There, there are complications, such as conjugation of verbs, an article that may be dependent on the preposition, or a preposition may modify the noun. For instance, works for, made by, located in, and is part of are quite common names for object properties in ontologies. They all do have a dependent preposition, however, there are different verb tenses, and the latter has a copulative and noun rather than just a verb. All that goes into the object properties name in an ‘English-based ontology’ and does not really have to be processed further in ontology verbalisation other than beautification. Not so in multiple other languages. For instance, the ‘in’ of located in ends up as affixes to the noun representing the object that the other object is located in. Like, imvilophu ‘envelope’ and emvilophini ‘in the envelope’ (locative underlined). Even something straightforward like a property eats can end up having to be conjugated differently depending on who’s eating: when a human eats, it is udla in isiZulu, but for, say, a dog, it is idla (modification underlined), which is driven by the system of noun classes, of which there are 17 in isiZulu. Many more examples illustrating different issues are described in the paper. To make a long story short, there are gradations in complicating effects, from no effect where a preposition can be squeezed in with the verb in naming an OP, to phonological conditioning, to modifying the article of the noun to modifying the noun. A ‘3rd pers. sg.’ may thus be context-dependent, and notions of prepositions may modify the verb or the noun or the article of the noun, or both. For a setting other than English ontologies (e.g., Greek, German, Lithuanian), a preposition may belong neither to the verb nor to the noun, but instead to the role that the object plays in the relation described by the verb in the sentence. For instance, one obtains yomuntu, rather than the basic noun umuntu, if it plays the role of the whole in a part-whole relation like in ‘heart is part of a human’ (inhliziyo iyingxenye yomuntu).

The question then becomes how to handle such a representation that also has to include roles? This is quite common in conceptual data modelling languages and in the DLR family of DL languages, which is known in ontology as positionalism [2]. Bumping up the role to an element in the representation language—thus, in addition to the relationship—enables one to attach information to it, like whether there is a (deep) preposition associated with it, the tense, or the case. Such role-based annotations can then be used to generate the right element, like einen Betrieb ‘some company’ to adjust the article for the case it goes with in German, or ya+umuntu=yomuntu ‘of a human’, modifying the noun in the object position in the sentence.

To get this working properly, with a solid theoretical foundation, we reused a part of the conceptual modelling languages’ metamodel [3] to create a language model for such annotations, in particular regarding the attributes of the classes in the metamodel. On its own, however, it is rather isolated and not immediately useful for ontologies that we set out to be in need of verbalising. To this end, it links to the ‘OWL way of representing relations’ (ontologically: the so-called standard view), and we separate out the logic-based representation from the readings that one can generate with the structured representation of the knowledge. All in all, the simplified high-level model looks like the picture below.

Simplified diagram in UML Class Diagram notation of the main components (see paper for attributes), linking a section of the metamodel (orange; positionalist commitment) to predicates (green; standard view) and their verbalisation (yellow). (Source: [1])

That much for the conceptual part; more details are described in the paper.

Just a fluffy colourful diagram isn’t enough for a solid implementation, however. To this end, we mapped one of the logics that adhere to positionalism to one of the standard view, being DLR [4] and OWL, respectively. It equally well could have been done for other pairs of languages (e.g., with Common Logic), but these two are more popular in terms of theory and tools.

Having the conceptual and logical foundations in place, we did implement it to see whether it actually can be done and to check whether the theory was sufficient. The Protégé plugin is called iMPALA—it could be an abbreviation for ‘Model for Positionalism And Language Annotation’—that both writes all the non-OWL annotations in a separate XML file and takes care of the renderings in Protégé. It works; yay. Specifically, it handles the interaction between the OWL file, the positionalist elements, and the annotations/attributes, plus the additional feature that one can add new linguistic annotation properties, so as to cater for extensibility. Here are a few screenshots:

OWL’s arbeitetFuer ‘works for’ is linked to the relationship arbeiten.

The prey role in the axiom of the impala being eaten by the ibhubesi.

Annotations of the prey role itself, which is a role in the relationship ukudla.

We did test it a bit, from just the regular feature testing to the African Wildlife ontology that was translated into isiZulu (spoken in South Africa) and a people and pets ontology in ciShona (spoken in Zimbabwe). These details are available in the online supplementary material.

The next step is to tie it all together, being the verbalisation patterns for isiZulu [5,6] and the OWL ontologies to generate full sentences, correctly. This is set to happen soon (provided all the protests don’t mess up the planning too much). If you want to know more details that are not, or not clearly, in the paper, then please have a look at the project page of A Grammar engine for Nguni natural language interfaces (GeNi), or come visit EKAW16 that will be held from 21-23 November in Bologna, Italy, where I will present the paper.

References

[1] Keet, C.M., Chirema, T. A model for verbalising relations with roles in multiple languages. 20th International Conference on Knowledge Engineering and Knowledge Management EKAW’16). Springer LNAI, 19-23 November 2016, Bologna, Italy. (in print)

[2] Leo, J. Modeling relations. Journal of Philosophical Logic, 2008, 37:353-385.

[3] Keet, C.M., Fillottrani, P.R. An ontology-driven unifying metamodel of UML Class Diagrams, EER, and ORM2. Data & Knowledge Engineering, 2015, 98:30-53.

[4] Calvanese, D., De Giacomo, G. The Description Logics Handbook: Theory, Implementation and Applications, chap. Expressive description logics, pp. 178-218. Cambridge University Press (2003).

[5] Keet, C.M., Khumalo, L. Toward a knowledge-to-text controlled natural language of isiZulu. Language Resources and Evaluation, 2016, in print.

[6] Keet, C.M., Khumalo, L. On the verbalization patterns of part-whole relations in isiZulu. Proceedings of the 9th International Natural Language Generation conference 2016 (INLG’16), Edinburgh, Scotland, Sept 2016. ACL, 174-183.

# An exhaustive OWL species classifier

Students enrolled in my ontology engineering course have to do a “mini-project” on a particular topic, chosen from a list of topics, such as on ontology quality, verbalisations, or language features, and may be theoretical or software development-oriented. In terms of papers, the most impressive result was OntoPartS that resulted in an ESWC2012 paper with the two postgraduate students [1], but also quite some other useful results have come out of it over the past 7 years that I’m teaching it in one form or another. This year’s top project in terms of understanding the theory, creativity to do something with it that hasn’t been done before, and working software using Semantic Web technologies was the “OWL Classifier” by Aashiq Parker, Brian Mc George, and Muhummad Patel.

The OWL classifier classifies an OWL ontology in any of its ‘species’, which can be any of the 8 specified in the standard, i.e., the 3 OWL 1 ones and the 5 OWL 2 ones. It also gives information on the DL ‘alphabet soup’—which axioms use which language feature with which letter, and an explanation of the letters—and reports on which axioms are the ones that violate a particular species. An example is shown in the following screenshot, with an exercise ontology on phone points:

The students’ motivation to develop it was because they had to learn about DLs and the OWL species, but Protégé 4.x and 5.x don’t tell you the species and the interfaces have only a basic, generic, explanation for the DL expressivity. I concur. And is has gotten worse with Protégé 5.0: if an ontology is outside OWL 2 DL, it still says the ‘old’ DL expressivity plus an easy-to-overlook tiny red triangle in the top-right corner once the reasoner was invoked (using Hermit 1.3.8) or a cryptic “internal reasoner error” message (Pellet), whereas with Protégé 4.x you at least got a pop-up box complaining about the ‘non-simple role…’ issues. Compare that with the neat feedback like this:

It is also very ‘sensitive’—more so than one would be with Protégé alone. Any remote ontology imports have to be available at the location specified with the IRI. Violations due to wrong datatype usage is a known issue with the OWL Reasoner Evaluation set of ontologies, and which we’ve bumped into with the TDD testing as well. The tool doesn’t accept the invalid ones (wrong datatypes—one can select any XML data type in Protégé, but the OWL standard doesn’t support them all). In addition, a language such as OWL 2 QL has further restrictions on types of datatypes. (It is also not trivial to figure out manually whether some ontology is suitable for OBDA or not.) So I tried one from the Ontop website’s examples, presumably in OWL 2 QL:

Strictly speaking, it isn’t in OWL 2 QL! The OWL 2 QL profile does have xsd:integer as datatype [2], not xsd:int, as, and I quote the standard, “the intersection of the value spaces of any set of these datatypes [including xsd:integer but not xsd:int, mk] is either empty or infinite, which is necessary to obtain the desired computational properties”. [UPDATE 24-6, thanks to Martin Rezk:] The main toolset for OWL 2 QL, Ontop, actually does support xsd:int and a few other datatypes beyond the standard (e.g.: also float and boolean). There is similar syntax fun to be had with the pizza ontology: the original one is indeed in OWL DL, but if you open the file in Protégé 5 and save it, it is not in OWL DL anymore but in OWL 2 DL, for the save operation snuck in an owl#NamedIndividual. Click on the thumbnails below to see the before-and-after in the OWL classifier. This is not an increase in expressiveness—both are in SHOIN—just syntax and tooling.

The OWL Classifier can thus classify both OWL 1 and OWL 2 ontologies, which it does through a careful orchestration of two OWL APIs: v1.4.3 was the last one to support OWL 1 species checking, whereas for the OWL 2 ontologies, the latest version is used (v4.2.3). The jar file and the source code are freely available on github for anyone to use and to take further. Turning it into a Protégé plugin very likely will make at least next year’s ontology engineering students happy. Comments, questions, and suggestion are welcome!

References

[1] Keet, C.M., Fernandez-Reyes, F.C., Morales-Gonzalez, A. Representing mereotopological relations in OWL ontologies with OntoPartS. 9th Extended Semantic Web Conference (ESWC’12), Simperl et al. (eds.), 27-31 May 2012, Heraklion, Crete, Greece. Springer, LNCS 7295, 240-254.

[2] Boris Motik, Bernardo Cuenca Grau, Ian Horrocks, Zhe Wu, Achille Fokoue, Carsten Lutz, eds. OWL 2 Web Ontology Language: Profiles. W3C Recommendation, 11 December 2012 (2nd ed.).

# The TDDonto tool to try out TDD for ontology authoring

Last month I wrote about Test-Driven Development for ontologies, which is described in more detail in the ESWC’16 paper I co-authored with Agnieszka Lawrynowicz [1]. That paper does not describe much about the actual tool implementing the tests, TDDonto, although we have it and used it for the performance evaluation. Some more detail on its design and more experimental results are described in the paper “The TDDonto Tool for Test-Driven Development of DL Knowledge Bases” [2] that has just been published in the proceedings of the 29th International Workshop on Description Logics, which will take place next weekend in Cape Town (22-25 April 2016).

What we couldn’t include there in [2] is multiple screenshots to show how it works, but a blog is a fine medium for that, so I’ll illustrate the tool with some examples in the remainder of the post. It’s an alpha version that works. No usability and HCI evaluations have been done, but at least it’s a Protégé plugin rather than command line :).

First, you need to download the plugin from Agnieszka’s ARISTOTELES project page and place the jar file in the plugins folder of Protégé 5.0. You can then go to the Protégé menu bar, select Windows – Views – Evaluation views – TDDOnto, and place it somewhere on the screen and start using it. For the examples here, I used the African Wildlife Ontology tutorial ontology (AWO v1) from my ontology engineering course.

Make sure to have selected an automated reasoner, and classify your ontology. Now, type a new test in the “New test” field at the top, e.g. carnivore DisjointWith: herbivore, click “Add test”, select the checkbox of the test to execute, and click the “Execute test”: the status will be returned, as shown in the screenshot below. In this case, the “OK” says that the disjointness is already asserted or entailed in the ontology.

Now let’s do a TDD test that is going to fail (you won’t know upfront, of course); e.g., testing whether impalas are herbivores:

The TDD test failed because the subsumption is neither asserted nor entailed in the ontology. One can then click “add to ontology”, which updates the ontology:

Note that the reasoner has to be run again after a change in the ontology.

Lets do two more: testing whether lion is a carnivore and that flower is a plan part. The output of the tests is as follows:

It returns “OK” for the lion, because it is entailed in the ontology: a carnivore is an entity that eats only animals or parts thereof, and lions eat only herbivore and eats some impala (which are animals). The other one, Flower SubClassOf: PlantParts fails as “undefined”, because Flower is not in the ontology.

Ontologies do not have only subsumption and disjointness axioms, so let’s assume that impalas eat leaves and we want check whether that is in the ontology, as well as whether lions eat animals:

The former failed because there are no properties for the impala in the AWO v1, the latter passed, because a lion eats impala, and impala is an animal. Or: the TDDOnto tool indeed behaves as expected.

Currently, only a subset of all the specified tests have been implemented, due to some limitations of existing tools, but we’re working on implementing those as well.

If you have any feedback on TDDOnto, please don’t hesitate to tell us. I hope to be seeing you later in the week at DL’16, where I’ll be presenting the paper on Sunday afternoon (24th) and I also can give a live demo any time during the workshop (or afterwards, if you stay for KR’16).

References

[1] Keet, C.M., Lawrynowicz, A. Test-Driven Development of Ontologies. 13th Extended Semantic Web Conference (ESWC’16). Springer LNCS. 29 May – 2 June, 2016, Crete, Greece. (in print)

[2] Lawrynowicz, A., Keet, C.M. The TDDonto Tool for Test-Driven Development of DL Knowledge bases. 29th International Workshop on Description Logics (DL’16). April 22-25, Cape Town, South Africa. CEUR WS vol. 1577.