# Orchestrating 28 logical theories of mereo(topo)logy

Parts and wholes, again. This time it’s about the logic-aspects of theories of parthood (cf. aligning different hierarchies of (part-whole) relations and make them compatible with foundational ontologies). I intended to write this post before the Ninth Conference on Knowledge Capture (K-CAP 2017), where the paper describing the new material would be presented by my co-author, Oliver Kutz. Now, afterwards, I can add that “Orchestrating a Network of Mereo(topo) logical Theories” [1] even won the Best Paper Award. The novelties, in broad strokes, are that we figured out and structured some hitherto messy and confusing state of affairs, showed that one can do more than generally assumed especially with a new logics orchestration framework, and we proposed first steps toward conflict resolution to sort out expressivity and logic limitations trade-offs. Constructing a tweet-size “tl;dr” version of the contents is not easy, and as I have as much space here on my blog as I like, it ended up to be three paragraphs here: scene-setting, solution, and a few examples to illustrate some of it.

Problems

As ontologists know, parthood is used widely in ontologies across most subject domains, such as biomedicine, geographic information systems, architecture, and so on. Ontology (the philosophers) offer a parthood relation that has a bunch of computationally unpleasant properties that are structured in a plethora of mereologicial and meretopological theories such that it has become hard to see the forest for the trees. This is then complicated in practice because there are multiple logics of varying expressivity (support more or less language features), with the result that only certain fragments of the mereo(topo)logical theories can be represented. However, it’s mostly not clear what can be used when, during the ontology authoring stage one may want to have all those features so as to check correctness, and it’s not easy to predict what will happen when one aligns ontologies with different fragments of mereo(topo)logy.

Solution

We solved these problems by specifying a structured network of theories formulated in multiple logics that are glued together by the various linking constructs of the Distributed Ontology, Model, and Specification Language (DOL). The ‘structured network of theories’-part concerns all the maximal expressible fragments of the KGEMT mereotopological theory and five of its most well-recognised sub-theories (like GEM and MT) in the seven Description Logics-based OWL species, first-order logic, and higher order logic. The ‘glued together’-part refers to relating the resultant 28 theories within DOL (in Ontohub), which is a non-trivial (understatement, unfortunately) metalanguage that has the constructors for the glue, such as enabling one to declare to merge two theories/modules represented in different logics, extending a theory (ontology) with axioms that go beyond that language without messing up the original (expressivity-restricted) ontology, and more. Further, because the annoying thing of merging two ontologies/modules can be that the merged ontology may be in a different language than the two original ones, which is very hard to predict, we have a cute proof-of-concept tool so that it assists with steps toward resolution of language feature conflicts by pinpointing profile violations.

Examples

The paper describes nine mechanisms with DOL and the mereotopological theories. Here I’ll start with a simple one: we have Minimal Topology (MT) partially represented in OWL 2 EL/QL in “theory8” where the connection relation (C) is just reflexive (among other axioms; see table in the paper for details). Now what if we add connection’s symmetry, which results in “theory4”? First, we do this by not harming theory8, in DOL syntax (see also the ESSLI’16 tutorial):

logic OWL2.QL
ontology theory4 =
theory8
then
ObjectProperty: C Characteristics: Symmetric %(t7)

What is the logic of theory4? Still in OWL, and if so, which species? The Owl classifier shows the result:

Another case is that OWL does not let one define an object property; at best, one can add domain and range axioms and the occasional ‘characteristic’ (like aforementioned symmetry), for allowing arbitrary full definitions pushes it out of the decidable fragment. One can add them, though, in a system that can handle first order logic, such as the Heterogeneous toolset (Hets); for instance, where in OWL one can add only “overlap” as a primitive relation (vocabulary element without definition), we can take such a theory and declare that definition:

logic CASL.FOL
ontology theory20 =
theory6_plus_antisym_and_WS
then %wdef
. forall x,y:Thing . O(x,y) <=> exists z:Thing (P(z,x) /\ P(z,y)) %(t21)
. forall x,y:Thing . EQ(x,y) <=> P(x,y) /\ P(y,x) %(t22)

As last example, let me illustrate the notion of the conflict resolution. Consider theory19—ground mereology, partially—that is within OWL 2 EL expressivity and theory18—also ground mereology, partially—that is within OWL 2 DL expressivity. So, they can’t be the same; the difference is that theory18 has parthood reflexive and transitive and proper parthood asymmetric and irreflexive, whereas theory19 has both parthood and proper parthood transitive. What happens if one aligns the ontologies that contain these theories, say, O1 (with theory18) and O2 (with theory19)? The Owl classifier provides easy pinpointing and tells you the profile: OWL 2 full (or: first order logic, or: beyond OWL 2 DL—top row) and why (bottom section):

Now, what can one do? The conflict resolution cannot be fully automated, because it depends on what the modeller wants or needs, but there’s enough data generated already and there are known trade-offs so that it is possible to describe the consequences:

• Choose the O1 axioms (with irreflexivity and asymmetry on proper part of), which will make the ontology interoperable with other ontologies in OWL 2 DL, FOL or HOL.
• Choose O2’s axioms (with transitivity on part of and proper part of), which will facilitate linking to ontologies in OWL 2 RL, 2 EL, 2 DL, FOL, and HOL.
• Choose to keep both sets will result in an OWL 2 Full ontology that is undecidable, and it is then compatible only with FOL and HOL ontologies.

As serious final note: there’s still fun to be had on the logic side of things with countermodels and sub-networks and such, and with refining the conflict resolution to assist ontology engineers better. (or: TBC)

As less serious final note: the working title of early drafts of the paper was “DOLifying mereo(topo)logy”, but at some point we chickened out and let go of that frivolity.

References

[1] Keet, C.M., Kutz, O. Orchestrating a Network of Mereo(topo)logical Theories. Ninth International Conference on Knowledge Capture (K-CAP’17), Austin, Texas, USA, December 4-6, 2017. ACM Proceedings.

# Figuring out the verbalisation of temporal constraints in ontologies and conceptual models

Temporal conceptual models, ontologies, and their logics are nothing new, but that sort of information and knowledge representation still doesn’t gain a lot of traction (cf. say, formal methods for verification). This is in no small part because modelling temporal information is not easy. Several conceptual modelling languages do have various temporal extensions, but most modellers don’t even use all of the default language features yet [1]. How could one at least reduce the barrier to adoption of temporal logics and modelling languages? The two principle approaches are visualisation with a diagrammatic language and rendering it in a (pseudo-)natural language. One of my postgraduate students looked at the former, trying to figure out what would be the best icons and such, which showed there was still a steep learning curve [2]. Before examining whether that could be optimised, I wondered whether the natural language option might be promising. The problem was, that no-one had yet tried to determine what the natural language counterpart of the temporal constraints were supposed to be, let alone whether they be ‘adequate’ or the ‘best’ way of rendering the temporal constraints in tolerable natural language sentences. I wanted to know that badly enough that I tried to find out.

Given that using templates is a tried-and-tested relatively successful approach for atemporal conceptual models and ontologies (e.g., for ORM, the ACE system), it makes sense to do something similar, but then for some temporal extension. As temporal conceptual modelling language I used one that has a Description Logics foundation (DLRUS [3,4]) for that easily links to ontologies as well, added a few known temporal constraints (like for relationships/DL roles, mandatory) and removing others (some didn’t seem all that interesting), which resulted in 34 constraints, still. For each one, I tried to devise more and less reasonable templates, resulting in 101 templates overall. Those templates were evaluated on semantics and preference by three temporal logic experts and five ‘mixed experts’ (experts in natural language generation, logic, or modelling). This resulted in a final set of preferred templates to verbalise the temporal constraints. The remainder of this post first describes a bit about the templates and then the results of which I think they are most interesting.

Templates

The basic idea of a template—in the context of the verbalisation of conceptual models and ontologies—is to have some natural language for the constraint where then the vocabulary gets slotted in at runtime. Take, for instance, simple named class subsumption in an ontology, $C \sqsubseteq D$, for which one could define a template “Each [C] is a(n) [D]”, so that with some axiom $Manager \sqsubseteq Employee$, it would generate the sentence “Each Manager is an Employee”. One also could have devised the template “All [C] are [D]” and then it would have generated “All Managers are Employees”. The choice between the two templates in this case is just taste, for in both cases, the semantics is the same. More complex axioms are not always that straightforward. For instance, for the axiom type $C \sqsubseteq \exists R.D$, would “Each [C] [R] some [D]” be good enough, or would perhaps “Each [C] must [R] at least one [D]” be better? E.g., “Each Professor teaches some Course” vs “Each Professor must teach at least one Course”.

The same can be done for the temporal constraints. To get there, I did a bit of a linguistic detour that informed the template design (described in the paper [5]). Let us take as first example for templates temporal class that has a semantics of $o \in C^{\mathcal{I}(t)} \rightarrow \exists t' \neq t. o \notin C^{\mathcal{I}(t')}$; for instance, UndergraduateStudent (assuming they graduate and end up as alumni or as drop outs, and weren’t undergrads from birth):

1. If an object is an instance of entity type [C], then there is some time where it is not a(n) [C].
2. [C] is an entity type whose objects are, for some time in their existence, not instances of [C].
3. [C] is an entity type of which each object is not a(n) [C] for some time during its existence.
4. All instances of entity type [C] are not a(n) [C] for some time.
5. Each [C] is not a(n) [C] for some time.
6. Each [C] is for some time not a(n) [C].

Which one(s) do you think captures the semantics, and which one(s) do you prefer?

A more elaborate constraint for relationships is ‘dynamic extension for relationships, past, mandatory], which is formalised as $\langle o , o' \rangle \in \mbox{{\sc RDexM}-}_{R_1,R_2}^{\mathcal{I}(t)} \rightarrow (\langle o , o' \rangle \in{\tt R_1}^{\mathcal{I}(t)} \rightarrow \exists t' where $\langle o , o' \rangle \in \mbox{{\sc RDex}}_{R_1,R_2}^{\mathcal{I}(t)} \rightarrow ( \langle o , o' \rangle \in{\tt R_1}^{\mathcal{I}(t)} \rightarrow \exists t'>t. \langle o , o' \rangle \in {\tt R_2}^{\mathcal{I}(t')})$.; e.g., every passenger who boards a flight must have checked in for that flight. Two options could be:

1. Each ..C_1.. ..R_1.. ..C_2.. was preceded by ..C_1.. ..R_2.. ..C_2.. some time earlier.
2. Each ..C_1.. ..R_1.. ..C_2.. must be preceded by ..C_1.. ..R_2.. ..C_2.. .

I’m not saying they are all correct; they were some of the options given, which the participants could choose from and comment on. The full list of constraints and template options are available in the supplementary material, which also contains a file where you can fill in your own answers, see what the (anonymised) participants said, and it has the final list of ‘best’ constraints.

Results

The main aggregate quantitative results are shown in the following table.

Many observations can be made from the data (see the paper for details). Some of the salient aspects are that there was low inter-annotator agreement among the experts, despite that they know each other (temporal logics is a small community) and that the ‘mixed group’ deemed many sentences correct that the experts deemed wrong in the sense of not properly capturing the semantics of the constraint. Put differently, it looks like the mixed experts, as a group, did not fully grasp some subtle distinction in the temporal constraints.

With respect to the templates, the preferred ones don’t follow the structure of the logic, but are, in a way, a separate rendering, or: there’s no neat 1:1 mapping between axiom type and template structure. That said, that doesn’t mean that they always chose the shortest template: the experts definitely did not, while the mixed experts leaned a bit toward preferring templates with fewer words even though they were surely not always the semantically correct option.

It may not look good that the experts preferred different templates, but in a follow-up interview with one of the experts, the expert noted that it was not really a problem “for there is the logic that does have the precise meaning anyway” and thus “resolves any confusion that may arise from using slightly different terminology”. The temporal logic expert does have a point from the expert’s view, fair enough, but that pretty much defeats my aim with the experiment. Asking more non-experts may not be a good strategy either, for they are, on average, too lenient.

So, for now, we do have a set of, relatively, ‘best’ templates to verbalise temporal constraints in temporal conceptual models and ontologies. The next step is to compare that with the diagrammatic representation. This we did [6], and I’ll describe those results informally in a next post.

I’ll present more details at the upcoming CREOL: Contextual Representation of Events and Objects in Language Workshop that is part of the Joint Ontology Workshops 2017, which will be held next week (21-23 September) in Bolzano, Italy. As the KRDB group at FUB in Bolzano has a few temporal logic experts, I’m looking forward to the discussions! Also, I’d be happy if you would be willing to fill in the spreadsheet with your preferences (before looking at the answers given by the participants!), and send them to me.

References

[1] Keet, C.M., Fillottrani, P.R. An analysis and characterisation of publicly available conceptual models. 34th International Conference on Conceptual Modeling (ER’15). Johannesson, P., Lee, M.L. Liddle, S.W., Opdahl, A.L., Pastor López, O. (Eds.). Springer LNCS vol 9381, 585-593. 19-22 Oct, Stockholm, Sweden.

[2] T. Shunmugam. Adoption of a visual model for temporal database representation. M. IT thesis, Department of Computer Science, University of Cape Town, South Africa, 2016.

[3] A. Artale, E. Franconi, F. Wolter, and M. Zakharyaschev. A temporal description logic for reasoning about conceptual schemas and queries. In S. Flesca, S. Greco, N. Leone, and G. Ianni, editors, Proceedings of the 8th Joint European Conference on Logics in Artificial Intelligence (JELIA-02), volume 2424 of LNAI, pages 98-110. Springer Verlag, 2002.

[4] A. Artale, C. Parent, and S. Spaccapietra. Evolving objects in temporal information systems. Annals of Mathematics and Artificial Intelligence, 50(1-2):5-38, 2007.

[5] Keet, C.M. Natural language template selection for temporal constraints. CREOL: Contextual Representation of Events and Objects in Language, Joint Ontology Workshops 2017, 21-23 September 2017, Bolzano, Italy. CEUR-WS Vol. (in print).

[6] Keet, C.M., Berman, S. Determining the preferred representation of temporal constraints in conceptual models. 36th International Conference on Conceptual Modeling (ER’17). Springer LNCS. 6-9 Nov 2017, Valencia, Spain. (in print)

# Our ESWC17 demos: TDDonto2 and an OWL verbaliser for isiZulu

Besides the full paper on heterogeneous alignments for 14th Extended Semantic Web Conference (ESWC’17) that will take place next week in Portoroz, Slovenia, we also managed to squeeze out two demo papers. You may already know of TDDonto2 with Kieren Davies and Agnieszka Lawrynowicz, which was discussed in an earlier post that has been updated with a tutorial video. It now has a demo paper as well [1], which describes the rationale and a few scenarios. The other demo, with Musa Xakaza and Langa Khumalo, is new-new, but the regular reader might have seen it coming: we finally managed to link the verbalisation patterns for certain Description Logic axiom types [2,3] to those in OWL ontologies. The tool takes as input an ontology in isiZulu and the verbalisation algorithms, and out come the isiZulu sentences, be this in plain text for further processing or in a GUI for inspection by a domain expert [4]. There is a basic demo-screencast to show it’s all working.

The overall architecture may be of interest, for it deviates from most OWL verbalisers. It is shown in the following figure:

For instance, we use the Python-based OWL API Owlready, rather than a Java-based app, for Python is rather popular in NLP and the verbalisation algorithms may be used elsewhere as well. We made more such decisions with the aim to make whatever we did as multi-purpose usable as possible, like the list of nouns with noun classes (surprisingly, and annoyingly, there is no such readily available list yet, though isizulu.net probably will have it somewhere but inaccessible), verb roots, and exceptions in pluralisation. (Problems for integrating the verbaliser with, say, Protégé will be interesting to discuss during the demo session!)

The text-based output doesn’t look as nice as the GUI interface, so I will show here only the GUI interface, which is adorned with some annotations to illustrate that those verbalisation algorithms in the background are far from trivial templates:

For instance, while in English the universal quantification is always ‘Each’ or ‘All’ regardless the named class quantified over, in isiZulu it depends on the noun class of the noun that is the name of the OWL class. For instance, in the figure above, izingwe ‘leopards’ is in noun class 10, so the ‘Each/All’ is Zonke, amavazi ‘vases’ is in noun class 6, so ‘Each/All’ then becomes Onke, and abantu ‘people’/’humans’ is in noun class 2, making Bonke. There are 17 noun classes. They also determine the subject concords (SC, alike conjugation) for the verbs, with zi- for noun class 10, ­a- for noun class 6, and ba- for noun class 2, to name a few. How this all works is described in [2,3]. We’ve implemented all those algorithms and integrated the pluraliser [5] in it to make it work. The source files are available to check and play with already, you can do so and ask us during the ESWC17 demo session, and/or also have a look at the related outputs of the NRF-funded project Grammar Engine for Nguni natural language interfaces (GeNi).

References

[1] Davies, K. Keet, C.M., Lawrynowicz, A. TDDonto2: A Test-Driven Development Plugin for arbitrary TBox and ABox axioms. Extended Semantic Web Conference (ESWC’17), Springer LNCS. Portoroz, Slovenia, May 28 – June 2, 2017. (demo paper)

[2] Keet, C.M., Khumalo, L. Toward a knowledge-to-text controlled natural language of isiZulu. Language Resources and Evaluation, 2017, 51:131-157.

[3] Keet, C.M., Khumalo, L. On the verbalization patterns of part-whole relations in isiZulu. 9th International Natural Language Generation conference (INLG’16), 5-8 September, 2016, Edinburgh, UK. Association for Computational Linguistics, 174-183.

[4] Keet, C.M. Xakaza, M., Khumalo, L. Verbalising OWL ontologies in isiZulu with Python. 14th Extended Semantic Web Conference (ESWC’17). Springer LNCS. Portoroz, Slovenia, May 28 – June 2, 2017. (demo paper)

[5] Byamugisha, J., Keet, C.M., Khumalo, L. Pluralising Nouns in isiZulu and Related Languages. 17th International Conference on Intelligent Text Processing and Computational Linguistics (CICLing’16), Springer LNCS. April 3-9, 2016, Konya, Turkey.

# Relations with roles / verbalising object properties in isiZulu

The narratives can be very different for the paper “A model for verbalising relations with roles in multiple languages” that was recently accepted paper at the 20th International Conference on Knowledge Engineering and Knowledge management (EKAW’16), for the paper makes a nice smoothie of the three ingredients of language, logic, and ontology. The natural language part zooms in on isiZulu as use case (possibly losing some ontologist or logician readers), then there are the logics about mapping the Description Logic DLR’s role components with OWL (lose possible interest of the natural language researchers), and a bit of philosophy (and lose most people…). It solves some thorny issues when trying to verbalise complicated verbs that we need for knowledge-to-text natural language generation in isiZulu and some other languages (e.g., German). And it solves the matching of logic-based representations popularised in mainly UML and ORM (that typically uses a logic in the DLR family of Description Logic languages) with the more commonly used OWL. The latter is even implemented as a Protégé plugin.

Let me start with some use-cases that cause problems that need to be solved. It is well-known that natural language renderings of ontologies facilitate communication with domain experts who are expected to model and validate the represented knowledge. This is doable for English, with ACE in the lead, but it isn’t for grammatically richer languages. There, there are complications, such as conjugation of verbs, an article that may be dependent on the preposition, or a preposition may modify the noun. For instance, works for, made by, located in, and is part of are quite common names for object properties in ontologies. They all do have a dependent preposition, however, there are different verb tenses, and the latter has a copulative and noun rather than just a verb. All that goes into the object properties name in an ‘English-based ontology’ and does not really have to be processed further in ontology verbalisation other than beautification. Not so in multiple other languages. For instance, the ‘in’ of located in ends up as affixes to the noun representing the object that the other object is located in. Like, imvilophu ‘envelope’ and emvilophini ‘in the envelope’ (locative underlined). Even something straightforward like a property eats can end up having to be conjugated differently depending on who’s eating: when a human eats, it is udla in isiZulu, but for, say, a dog, it is idla (modification underlined), which is driven by the system of noun classes, of which there are 17 in isiZulu. Many more examples illustrating different issues are described in the paper. To make a long story short, there are gradations in complicating effects, from no effect where a preposition can be squeezed in with the verb in naming an OP, to phonological conditioning, to modifying the article of the noun to modifying the noun. A ‘3rd pers. sg.’ may thus be context-dependent, and notions of prepositions may modify the verb or the noun or the article of the noun, or both. For a setting other than English ontologies (e.g., Greek, German, Lithuanian), a preposition may belong neither to the verb nor to the noun, but instead to the role that the object plays in the relation described by the verb in the sentence. For instance, one obtains yomuntu, rather than the basic noun umuntu, if it plays the role of the whole in a part-whole relation like in ‘heart is part of a human’ (inhliziyo iyingxenye yomuntu).

The question then becomes how to handle such a representation that also has to include roles? This is quite common in conceptual data modelling languages and in the DLR family of DL languages, which is known in ontology as positionalism [2]. Bumping up the role to an element in the representation language—thus, in addition to the relationship—enables one to attach information to it, like whether there is a (deep) preposition associated with it, the tense, or the case. Such role-based annotations can then be used to generate the right element, like einen Betrieb ‘some company’ to adjust the article for the case it goes with in German, or ya+umuntu=yomuntu ‘of a human’, modifying the noun in the object position in the sentence.

To get this working properly, with a solid theoretical foundation, we reused a part of the conceptual modelling languages’ metamodel [3] to create a language model for such annotations, in particular regarding the attributes of the classes in the metamodel. On its own, however, it is rather isolated and not immediately useful for ontologies that we set out to be in need of verbalising. To this end, it links to the ‘OWL way of representing relations’ (ontologically: the so-called standard view), and we separate out the logic-based representation from the readings that one can generate with the structured representation of the knowledge. All in all, the simplified high-level model looks like the picture below.

Simplified diagram in UML Class Diagram notation of the main components (see paper for attributes), linking a section of the metamodel (orange; positionalist commitment) to predicates (green; standard view) and their verbalisation (yellow). (Source: [1])

That much for the conceptual part; more details are described in the paper.

Just a fluffy colourful diagram isn’t enough for a solid implementation, however. To this end, we mapped one of the logics that adhere to positionalism to one of the standard view, being DLR [4] and OWL, respectively. It equally well could have been done for other pairs of languages (e.g., with Common Logic), but these two are more popular in terms of theory and tools.

Having the conceptual and logical foundations in place, we did implement it to see whether it actually can be done and to check whether the theory was sufficient. The Protégé plugin is called iMPALA—it could be an abbreviation for ‘Model for Positionalism And Language Annotation’—that both writes all the non-OWL annotations in a separate XML file and takes care of the renderings in Protégé. It works; yay. Specifically, it handles the interaction between the OWL file, the positionalist elements, and the annotations/attributes, plus the additional feature that one can add new linguistic annotation properties, so as to cater for extensibility. Here are a few screenshots:

OWL’s arbeitetFuer ‘works for’ is linked to the relationship arbeiten.

The prey role in the axiom of the impala being eaten by the ibhubesi.

Annotations of the prey role itself, which is a role in the relationship ukudla.

We did test it a bit, from just the regular feature testing to the African Wildlife ontology that was translated into isiZulu (spoken in South Africa) and a people and pets ontology in ciShona (spoken in Zimbabwe). These details are available in the online supplementary material.

The next step is to tie it all together, being the verbalisation patterns for isiZulu [5,6] and the OWL ontologies to generate full sentences, correctly. This is set to happen soon (provided all the protests don’t mess up the planning too much). If you want to know more details that are not, or not clearly, in the paper, then please have a look at the project page of A Grammar engine for Nguni natural language interfaces (GeNi), or come visit EKAW16 that will be held from 21-23 November in Bologna, Italy, where I will present the paper.

References

[1] Keet, C.M., Chirema, T. A model for verbalising relations with roles in multiple languages. 20th International Conference on Knowledge Engineering and Knowledge Management EKAW’16). Springer LNAI, 19-23 November 2016, Bologna, Italy. (in print)

[2] Leo, J. Modeling relations. Journal of Philosophical Logic, 2008, 37:353-385.

[3] Keet, C.M., Fillottrani, P.R. An ontology-driven unifying metamodel of UML Class Diagrams, EER, and ORM2. Data & Knowledge Engineering, 2015, 98:30-53.

[4] Calvanese, D., De Giacomo, G. The Description Logics Handbook: Theory, Implementation and Applications, chap. Expressive description logics, pp. 178-218. Cambridge University Press (2003).

[5] Keet, C.M., Khumalo, L. Toward a knowledge-to-text controlled natural language of isiZulu. Language Resources and Evaluation, 2016, in print.

[6] Keet, C.M., Khumalo, L. On the verbalization patterns of part-whole relations in isiZulu. Proceedings of the 9th International Natural Language Generation conference 2016 (INLG’16), Edinburgh, Scotland, Sept 2016. ACL, 174-183.

# An exhaustive OWL species classifier

Students enrolled in my ontology engineering course have to do a “mini-project” on a particular topic, chosen from a list of topics, such as on ontology quality, verbalisations, or language features, and may be theoretical or software development-oriented. In terms of papers, the most impressive result was OntoPartS that resulted in an ESWC2012 paper with the two postgraduate students [1], but also quite some other useful results have come out of it over the past 7 years that I’m teaching it in one form or another. This year’s top project in terms of understanding the theory, creativity to do something with it that hasn’t been done before, and working software using Semantic Web technologies was the “OWL Classifier” by Aashiq Parker, Brian Mc George, and Muhummad Patel.

The OWL classifier classifies an OWL ontology in any of its ‘species’, which can be any of the 8 specified in the standard, i.e., the 3 OWL 1 ones and the 5 OWL 2 ones. It also gives information on the DL ‘alphabet soup’—which axioms use which language feature with which letter, and an explanation of the letters—and reports on which axioms are the ones that violate a particular species. An example is shown in the following screenshot, with an exercise ontology on phone points:

The students’ motivation to develop it was because they had to learn about DLs and the OWL species, but Protégé 4.x and 5.x don’t tell you the species and the interfaces have only a basic, generic, explanation for the DL expressivity. I concur. And is has gotten worse with Protégé 5.0: if an ontology is outside OWL 2 DL, it still says the ‘old’ DL expressivity plus an easy-to-overlook tiny red triangle in the top-right corner once the reasoner was invoked (using Hermit 1.3.8) or a cryptic “internal reasoner error” message (Pellet), whereas with Protégé 4.x you at least got a pop-up box complaining about the ‘non-simple role…’ issues. Compare that with the neat feedback like this:

It is also very ‘sensitive’—more so than one would be with Protégé alone. Any remote ontology imports have to be available at the location specified with the IRI. Violations due to wrong datatype usage is a known issue with the OWL Reasoner Evaluation set of ontologies, and which we’ve bumped into with the TDD testing as well. The tool doesn’t accept the invalid ones (wrong datatypes—one can select any XML data type in Protégé, but the OWL standard doesn’t support them all). In addition, a language such as OWL 2 QL has further restrictions on types of datatypes. (It is also not trivial to figure out manually whether some ontology is suitable for OBDA or not.) So I tried one from the Ontop website’s examples, presumably in OWL 2 QL:

Strictly speaking, it isn’t in OWL 2 QL! The OWL 2 QL profile does have xsd:integer as datatype [2], not xsd:int, as, and I quote the standard, “the intersection of the value spaces of any set of these datatypes [including xsd:integer but not xsd:int, mk] is either empty or infinite, which is necessary to obtain the desired computational properties”. [UPDATE 24-6, thanks to Martin Rezk:] The main toolset for OWL 2 QL, Ontop, actually does support xsd:int and a few other datatypes beyond the standard (e.g.: also float and boolean). There is similar syntax fun to be had with the pizza ontology: the original one is indeed in OWL DL, but if you open the file in Protégé 5 and save it, it is not in OWL DL anymore but in OWL 2 DL, for the save operation snuck in an owl#NamedIndividual. Click on the thumbnails below to see the before-and-after in the OWL classifier. This is not an increase in expressiveness—both are in SHOIN—just syntax and tooling.

The OWL Classifier can thus classify both OWL 1 and OWL 2 ontologies, which it does through a careful orchestration of two OWL APIs: v1.4.3 was the last one to support OWL 1 species checking, whereas for the OWL 2 ontologies, the latest version is used (v4.2.3). The jar file and the source code are freely available on github for anyone to use and to take further. Turning it into a Protégé plugin very likely will make at least next year’s ontology engineering students happy. Comments, questions, and suggestion are welcome!

References

[1] Keet, C.M., Fernandez-Reyes, F.C., Morales-Gonzalez, A. Representing mereotopological relations in OWL ontologies with OntoPartS. 9th Extended Semantic Web Conference (ESWC’12), Simperl et al. (eds.), 27-31 May 2012, Heraklion, Crete, Greece. Springer, LNCS 7295, 240-254.

[2] Boris Motik, Bernardo Cuenca Grau, Ian Horrocks, Zhe Wu, Achille Fokoue, Carsten Lutz, eds. OWL 2 Web Ontology Language: Profiles. W3C Recommendation, 11 December 2012 (2nd ed.).

# The TDDonto tool to try out TDD for ontology authoring

Last month I wrote about Test-Driven Development for ontologies, which is described in more detail in the ESWC’16 paper I co-authored with Agnieszka Lawrynowicz [1]. That paper does not describe much about the actual tool implementing the tests, TDDonto, although we have it and used it for the performance evaluation. Some more detail on its design and more experimental results are described in the paper “The TDDonto Tool for Test-Driven Development of DL Knowledge Bases” [2] that has just been published in the proceedings of the 29th International Workshop on Description Logics, which will take place next weekend in Cape Town (22-25 April 2016).

What we couldn’t include there in [2] is multiple screenshots to show how it works, but a blog is a fine medium for that, so I’ll illustrate the tool with some examples in the remainder of the post. It’s an alpha version that works. No usability and HCI evaluations have been done, but at least it’s a Protégé plugin rather than command line :).

First, you need to download the plugin from Agnieszka’s ARISTOTELES project page and place the jar file in the plugins folder of Protégé 5.0. You can then go to the Protégé menu bar, select Windows – Views – Evaluation views – TDDOnto, and place it somewhere on the screen and start using it. For the examples here, I used the African Wildlife Ontology tutorial ontology (AWO v1) from my ontology engineering course.

Make sure to have selected an automated reasoner, and classify your ontology. Now, type a new test in the “New test” field at the top, e.g. carnivore DisjointWith: herbivore, click “Add test”, select the checkbox of the test to execute, and click the “Execute test”: the status will be returned, as shown in the screenshot below. In this case, the “OK” says that the disjointness is already asserted or entailed in the ontology.

Now let’s do a TDD test that is going to fail (you won’t know upfront, of course); e.g., testing whether impalas are herbivores:

The TDD test failed because the subsumption is neither asserted nor entailed in the ontology. One can then click “add to ontology”, which updates the ontology:

Note that the reasoner has to be run again after a change in the ontology.

Lets do two more: testing whether lion is a carnivore and that flower is a plan part. The output of the tests is as follows:

It returns “OK” for the lion, because it is entailed in the ontology: a carnivore is an entity that eats only animals or parts thereof, and lions eat only herbivore and eats some impala (which are animals). The other one, Flower SubClassOf: PlantParts fails as “undefined”, because Flower is not in the ontology.

Ontologies do not have only subsumption and disjointness axioms, so let’s assume that impalas eat leaves and we want check whether that is in the ontology, as well as whether lions eat animals:

The former failed because there are no properties for the impala in the AWO v1, the latter passed, because a lion eats impala, and impala is an animal. Or: the TDDOnto tool indeed behaves as expected.

Currently, only a subset of all the specified tests have been implemented, due to some limitations of existing tools, but we’re working on implementing those as well.

If you have any feedback on TDDOnto, please don’t hesitate to tell us. I hope to be seeing you later in the week at DL’16, where I’ll be presenting the paper on Sunday afternoon (24th) and I also can give a live demo any time during the workshop (or afterwards, if you stay for KR’16).

References

[1] Keet, C.M., Lawrynowicz, A. Test-Driven Development of Ontologies. 13th Extended Semantic Web Conference (ESWC’16). Springer LNCS. 29 May – 2 June, 2016, Crete, Greece. (in print)

[2] Lawrynowicz, A., Keet, C.M. The TDDonto Tool for Test-Driven Development of DL Knowledge bases. 29th International Workshop on Description Logics (DL’16). April 22-25, Cape Town, South Africa. CEUR WS vol. 1577.

# Ontology authoring with a Test-Driven Development approach

Ontology development has its processes and procedures—conducting a domain analysis, the implementation, maintenance, and so on—which have been developed since the mid 1990s. These high-level information systems-like methodologies don’t tell you what and how to add the knowledge to the ontology, however, i.e., the ontology authoring stage is a ‘just do it’, but not how. There are, perhaps surprisingly, few methods for how to do that; notably, FORZA uses domain and range constraints and the reasoner to propose a suitable part-whole relation [1] and advocatus diaboli zooms in on disjointness constraints among classes [2]. In a way, they all use what can be considered as tests on the ontology before adding an axiom. This smells of notions that are well-known in software engineering: unit tests, test-driven development (TDD), and Agile, with the latter two relying on different methodologies cf. the earlier ones (waterfall, iterative, and similar).

Some of those software engineering approaches have been adjusted and adopted for ontology engineering; e.g., the Agile-inspired OntoMaven that uses the standard reasoning services as tests [3], eXtreme Design with ODPs [4] that have been prepared previously, and Vrandecic and Gangemi’s early exploration of possibilities for unit tests [5]. Except for Warrender & Lord’s TDD tests for subsumption [6], they are all test-last approaches (design, author, test), rather than a test-first approach (test, author, test). The test-first approach is called test-driven development in software engineering [7], which has been ported to conceptual modelling recently as well [8]. TDD is a step up from a “add something and lets see what the reasoner says” stance, because one has to think and check upfront first before doing. (Competency questions can help with that, but they don’t say how to add the knowledge.) The question that arises, then, is how such a TDD approach would look like for ontology development. Some of the more detailed questions to be answered are (from [9]):

• Given the TDD procedure for programming in software engineering, then what does that mean for ontologies during ontology authoring?
• TDD uses mock objects for incomplete parts of the code, and mainly for methods; is there a parallel to it in ontology development, or can that aspect of TDD be ignored?
• In what way, and where, (if at all) can this be integrated as a methodological step in existing ontology engineering methodologies?

We—Agnieszka Lawrynowicz from Poznan University of Technology in Poland and I—answer these questions in our paper that was recently accepted at the 13th Extended Semantic Web Conference (ESWC’16), to be held in May 2016 in Crete, Greece: Test-Driven Development of Ontologies [9]. In short: we specified tests for the OWL 2 DL language features and basic types of axioms one can add, implemented it as a Protégé plug-in, and tested it on performance with 67 ontologies (result: great). The tool and test data can be downloaded from Agnieszka’s ARISTOTELES project page.

Now slightly less brief than that one-liner. The tests—like for class subsumption, domain and range, a property chain—can be specified in two principal ways, called TBox tests and ABox tests. The TBox tests rely solely on knowledge represented in the TBox, whereas for the ABox tests, mock individuals are explicitly added for a particular TBox axiom. For instance, the simple existential quantification, as shown below, where the TBox query is denoted in SPARQL-OWL notation.

For the implementation, there is (1) a ‘wrapping’ component that includes creating the mock entities, checking the condition (line 2 in the TBox test example in the figure above, and line 4 in ABox TDD test), returning the TDD test result, and cleaning up afterward; and (2) the interaction with the ontology doing the actual testing, such as querying the ontology and class and instance classification. There are several options to realise component (2) of the TBox TDD tests. The query can be sent to, e.g., OWL-BGP [10] that uses SPARQL-OWL and Hermit to return the answer (line 1), but one also could use just the OWL API and send it to the automated reasoner, among other options.

Because OWL-BGP and the other options didn’t cater for the tests with OWL’s object properties, such as a property chain, so a full implementation would require extending current tools, we decided to first examine performance of the different options for (2) for those tests that could be implemented with current tools so as to get an idea of which approach would be the best to extend, rather than gambling on one, implement all, and go on with user testing. This TDD tool got the unimaginative name TDDonto and can be installed as a Protégé plugin. We tested the performance with 67 TONES ontology repository ontologies. The outcome of that is that the TBox-based SPARQL-OWL approach is faster than the ABox TDD tests (except for class disjointness; see figure below), and the OWL API + reasoner for the TBox TDD tests is again faster in general. These differences are bigger with larger ontologies (see paper for details).

Test computation times per test type (ABox versus TBox-based SPARQL-OWL) and per the kind of the tested axiom (source: [9]).

Finally, can this TDD be simply ‘plugged in’ into one of the existing methodologies? No. As with TDD for software engineering, it has its own sequence of steps. An initial sketch is shown in the figure below. It outlines only the default scenario, where the knowledge to be added wasn’t there already and adding it doesn’t result in conflicts.

Sketch of a possible ontology lifecycle that focuses on TDD, and the steps of the TDD procedure summarised in key terms (source: [9]).

The “select scenario” has to do with what gets fed into the TDD tests, and therewith also where and how TDD could be used. We specified three of them: either (a) the knowledge engineer provides an axiom, (b) a domain expert fills in some template (e.g., the ‘all-some’ one) and that software generates the axiom for the domain ontology (e.g., $Professor \sqsubseteq \exists teaches.Course$), or (c) someone wrote a competency question that is either manually or automatically converted into an axiom. The “refactoring” could include a step for removing a property from a subclass when it is added to its superclass. The “regression testing” considers previous tests and what to do when any conflicts may have arisen (which may need an interaction with step 5).

There is quite a bit of work yet to be done on TDD for ontologies, but at least there is now a first comprehensive basis to work from. Both Agnieszka and I plan to go to ESWC’16, so I hope to see you there. If you want more details or read the tests with a nicer layout than how it is presented in the ESWC16 paper, then have a look at the extended version [11] or contact us, or leave a comment below.

References

[1] Keet, C.M., Khan, M.T., Ghidini, C. Ontology Authoring with FORZA. 22nd ACM International Conference on Information and Knowledge Management (CIKM’13). ACM proceedings. Oct. 27 – Nov. 1, 2013, San Francisco, USA. pp569-578.

[2] Ferre, S. and Rudolph, S. (2012). Advocatus diaboli exploratory enrichment of ontologies with negative constraints. In ten Teije, A. et al., editors, 18th International Conference on Knowledge Engineering and Knowledge Management (EKAW’12), volume 7603 of LNAI, pages 42-56. Springer. Oct 8-12, Galway, Ireland

[3] Paschke, A., Schaefermeier, R. Aspect OntoMaven – aspect-oriented ontology development and configuration with OntoMaven. Tech. Rep. 1507.00212v1, Free University of Berlin (July 2015)

[4] Blomqvist, E., Sepour, A.S., Presutti, V. Ontology testing — methodology and tool. In: Proc. of EKAW’12. LNAI, vol. 7603, pp. 216-226. Springer (2012)

[5] Vrandecic, D., Gangemi, A. Unit tests for ontologies. In: OTM workshops 2006. LNCS, vol. 4278, pp. 1012-1020. Springer (2006)

[6] Warrender, J.D., Lord, P. How, What and Why to test an ontology. Technical Report 1505.04112, Newcastle University (2015), http://arxiv.org/abs/1505.04112

[7] Beck, K.: Test-Driven Development: by example. Addison-Wesley, Boston, MA (2004)

[8] Tort, A., Olive, A., Sancho, M.R. An approach to test-driven development of conceptual schemas. Data & Knowledge Engineering 70, 1088-1111 (2011)

[9] Keet, C.M., Lawrynowicz, A. Test-Driven Development of Ontologies. 13th Extended Semantic Web Conference (ESWC’16). Springer LNCS. 29 May – 2 June, 2016, Crete, Greece. (in print)

[10] Kollia, I., Glimm, B., Horrocks, I. SPARQL Query Answering over OWL Ontologies. In: Proc, of ESWC’11. LNCS, vol. 6643, pp. 382-396. Springer (2011)

[11] Keet, C.M., Lawrynowicz, A. Test-Driven Development of Ontologies (extended version). Technical Report, Arxiv.org http://arxiv.org/abs/1512.06211. Dec 19, 2015.