Localising Protégé with Manchester syntax into your language of choice

Some people like a quasi natural language interface in ontology development tools, which is why Manchester Syntax was proposed [1]. A downside is that it locks the ontology developer into English, so that weird chimaeras are generated in the interface if the author prefers another language for the ontology, such as, e.g., the “jirafa come only (oja or ramita)” mentioned in an earlier post and that was deemed unpleasant in an experiment a while ago [2]. Those who prefer the quasi natural language components will have to resort to localising Manchester syntax and the tool’s interface.

This is precisely what two of my former students—Adam Kaliski and Casey O’Donnell—did during their mini-project in the ontology engineering course of 2017. A localisation in Afrikaans, as the case turned out to be. To make this publicly available, Michael Harrison brushed up the code a bit and tested it worked also in the new version of Protégé. It turned out it wasn’t that easy to localise it to another language the way it was done, so one of my PhD students, Toky Raboanary, redesigned the whole thing. This was then tested with Spanish, and found to be working. The remainder of the post describes informally some main aspects of it. If you don’t want to read all that but want to play with it right away: here are the jar files, open source code, and localisation instructions for if you want to create, say, a French or Dutch variant.

Some sensible constraints, some slightly contrived ones (and some bad ones), for the purpose of showing the localisation of the interface for the various keywords. The view in English is included in the screenshot to facilitate comparison.

Some sensible constraints, some slightly contrived ones (and some bad ones), for the purpose of showing the localisation of the interface for the various keywords. The view in English is included in the screenshot to facilitate comparison.

The localisation functions as a plugin for Protégé as a ‘view’ component. It can be selected under “Windows – Views – Class views” and then Beskrywing for the Afrikaans and Descripción for Spanish, and dragged into the desired position; this is likewise for object properties.

Instead of burying the translations in the code, they are specified in a separate XML file, whose content is fetched during the rendering. Adding a new ‘simple’ (more about that later) language merely amounts to adding a new XML file with the translations of the Protégé labels and of the relevant Manchester syntax. Here are the ‘simple’ translations—i.e., where both are fixed strings—for Afrikaans for the relevant tool interface components:

 

Class Description

(Label)

Klasbeskrywing

(Label in Afrikaans)

Equivalent To Dieselfde as
SubClass Of Subklas van
General Class axioms Algemene Klasaksiomas
SubClass Of (Anonymous Ancestor) Subklas van (Naamlose Voorvader)
Disjoint With Disjunkte van
Disjoint Union Of Disjunkte Unie van

 

The second set of translations is for the Manchester syntax, so as to render that also in the target language. The relevant mappings for Afrikaans class description keywords are listed in the table below, which contain the final choices made by the students who developed the original plugin. For instance, min and max could have been rendered as minimum and maksimum, but the ten minste and by die meeste were deemed more readable despite being multi-word strings. Another interesting bit in the translation is negation, where there has to be a second ‘no’ since Afrikaans has double negation in this construction, so that it renders it as nie <expression> nie. That final rendering is not grammatically perfect, but (hopefully) sufficiently clear:

An attempt at double negation with a fixed string

An attempt at double negation with a fixed string

Manchester OWL Keyword Afrikaans Manchester OWL

Keyword or phrase

some sommige
only slegs
min ten minste
max by die meeste
exactly precies
and en
or of
not nie <expression> nie
SubClassOf SubklasVan
EquivalentTo DieselfdeAs
DisjointWith DisjunkteVan

 

The people involved in the translations for the object properties view for Afrikaans are Toky, my colleague Tommie Meyer (also at UCT), and myself; snyding for ‘intersection’ sounds somewhat odd to me, but the real tough one to translate was ‘SuperProperty’. Of the four options that were considered—SuperEienskap, SuperVerwantskap, SuperRelasie, and SuperVerband SuperVerwantskap was chosen with Tommie having had the final vote, which is also a semantic translation, not a literal translation.

Screenshot of the object properties description, with comparison to the English

The Spanish version also has multi-word strings, but at least does not do double negation. On the other hand, it has accents. To generate the Spanish version, myself, my collaborator Pablo Fillottrani from the Universidad Nacional del Sur, Argentina, and Toky had a go at it in translating the terms. This was then implemented with the XML file. In case you do not want to dig into the XML file and not install the plugin either, but have a quick look at the translations, they are as follows for the class description view:

 

Class Description

Label

Descripción

(in Spanish)

Equivalent To Equivalente a
SubClass Of Subclase de
General Class axioms Axiomas generales de clase
SubClass Of (Anonymous Ancestor) Subclase de (Ancestro Anónimo)
Disjoint With Disjunto con
Disjoint Union Of Unión Disjunta de
Instances Instancias

 

Manchester OWL Keyword Spanish Manchester OWL Keyword
some al menos uno
only sólo
min al mínimo
max al máximo
and y
or o
not no
exactly exactamente
SubClassOf SubclaseDe
EquivalentTo EquivalenteA
DisjointWith DisjuntoCon

And here’s a rendering of a real ontology, for geo linked data in Spanish, rather than African wildlife yet again:

screenshot of the plugin behaviour with someone else’s ontology in Spanish

One final comment remains, which has to do with the ‘simple’ mentioned above. The approach of localisation presented here works only with fixed strings, i.e., the strings do not have to change depending on the context where it is uses. It won’t work with, say, isiZulu—a highly agglutinating and inflectional language—because isiZulu doesn’t have fixed strings for the Manchester syntax keywords nor for some other labels. For instance, ‘at least one’ has seven variants for nouns in the singular, depending on the noun class of the noun of the OWL class it quantifies over; e.g., elilodwa for ‘at least one’ apple, and esisodwa for ‘at least one’ twig. Also, the conjugation of the verb for the object property depends on the noun class of the noun of the OWL class, but in this case for the one that plays the subject; e.g., it’s “eats” in English for both humans and elephants eating, say, fruit, so one string for the name of the object property, but that’s udla and idla, respectively, in isiZulu. This requires annotations of the classes with ontolex-lemon or a similar approach and a set of rules (which we have, btw) to determine what to do in which case, which requires on-the-fly modifications to Manchester syntax keywords and elements’ names or labels. And then there’s still phonological conditioning to account for. It surely can be done, but it is not as doable as with the ‘simple’ languages that have at least a disjunctive orthography and much less genders or noun classes for the nouns.

In closing, while there’s indeed more to translate in the Protégé interface in order to fully localise it, hopefully this already helps as-is either for reading at least a whole axiom in one’s language or as stepping stone to extend it further for the other terms in the Manchester syntax and the interface. Feel free to extend our open source code.

 

References

[1] Matthew Horridge, Nicholas Drummond, John Goodwin, Alan Rector, Robert Stevens and Hai Wang (2006). The Manchester OWL syntax. OWL: Experiences and Directions (OWLED’06), Athens, Georgia, USA, 10-11 Nov 2016, CEUR-WS vol 216.

[2] Keet, C.M. The use of foundational ontologies in ontology development: an empirical assessment. 8th Extended Semantic Web Conference (ESWC’11), G. Antoniou et al (Eds.), Heraklion, Crete, Greece, 29 May-2 June, 2011. Springer LNCS 6643, 321-335.

[3] Keet, C.M., Khumalo, L. Toward a knowledge-to-text controlled natural language of isiZulu. Language Resources and Evaluation, 2017, 51:131-157. accepted version

Tutorial: OntoClean in OWL and with an OWL reasoner

The novelty surrounding all things OntoClean described here, is that we made a tutorial out of a scientific paper and used an example that is different from the (in?)famous manual example to clean up a ‘dirty’ taxonomy.

I’m assuming you have at least heard of OntoClean, which is an ontology-inspired method to examine the taxonomy of an ontology, which may be useful especially when the classes (/universals/concepts/..) have no or only a few properties or attributes declared. Based on that ontological information provided by the modeller, it will highlight violations of ontological principles in the taxonomy so that the ontologist may fix it. Its most recent overview is described in Guarino & Welty’s book chapter [1] and there are handouts and slides that show some of the intermediate steps; a 1.5-page summary is included as section 5.2.2 in my textbook [2].

Besides that paper-based description [1], there have been two attempts to get the reasoning with the meta-properties going in a way that can exploit existing technologies, which are OntOWLClean [3] and OntOWL2Clean [4]. As the names suggest, those existing and widely-used mechanisms are OWL and the DL-based reasoners for OWL, and the latter uses OWL2-specific language features (such as role chains) whereas the former does not. As it happened, some of my former students of the OE course wanted to try the OntoOWLClean approach by Welty, and, as they were with three students in the mini-project team, they also had to make their own example taxonomy, and compare the two approaches. It is their—Todii Mashoko, Siseko Neti, and Banele Matsebula’s—report and materials we—Zola Mahlaza and I—have brushed up and rearranged into a tutorial on OntoClean with OWL and a DL reasoner with accompanying OWL files for the main stages in the process.

There are the two input ontologies in OWL (the domain ontology to clean and the ‘ontoclean ontology’ that codes the rules in the TBox), an ontology for the stage after punning the taxonomy into the ABox, and one after having assigned the meta-properties, so that students can check they did the steps correctly with respect to the tutorial example and instructions. The first screenshot below shows a section of the ontology after pushing the taxonomy into the ABox and having assigned the meta-properties. The second screenshot illustrates a state after having selected, started, and run the reasoner and clicked on “explain” to obtain some justifications why the ontology is inconsistent.

section of the punned ontology where meta-properties have been assigned to each new individual.

A selection of the inconsistencies (due to violating OntoClean rules) with their respective explanations

Those explanations, like shown in the second screenshot, indicate which OntoClean rule has been violated. Among others, there’s the OntoClean rule that (1) classes that are dependent may have as subclasses only those classes that are also dependent. The ontology, however, has: i) Father is dependent, ii) Male is non-dependent, and iii) Father has as subclass Male. This subsumption violates rule (1). Indeed, not all males are fathers, so it would be, at least, the other way around (fathers are males), but it also could be remodelled in the ontology such that father is a role that a male can play.

Let us look at the second generated explanation, which is about violating another OntoClean rule: (2) sortal classes have only as subclasses classes that are also sortals. Now, the ontology has: i) Ball is a sortal, ii) Sphere is a non-sortal, and iii) Ball has as subclass Sphere. This violates rule (2). So, the hierarchy has to be updated such that Sphere is not subsumed by Ball anymore. (e.g., Ball has as shape some Sphere, though note that not all balls are spherical [notably, rugby balls are not]). More explanations of the rule violations are described in the tutorial.

Seeing that there are several possible options to change the taxonomy, there is no solution ontology. We considered creating one, but there are at least two ‘levels’ that will influence what a solution may look like: one could be based on a (minimum or not) number of changes with respect to the assigned meta-properties and another on re-examining the assigned meta-properties (and then restructuring the hierarchy). In fact, and unlike the original OntoClean example, there is at least one case where there is a meta-property assignment that would generally be considered to be wrong, even though it does show the application of the OntoClean rule correctly. How best to assign a meta-property, i.e., which one it should be, is not always easy, and the student is also encouraged to consider that aspect of the method. Some guidance on how to best modify the taxonomy—like Father is-a Male vs. Father inheres-in some Male—may be found in other sections and chapters of the textbook, among other resources.

 

p.s.: this tutorial is the result of one of the activities to improve on the OE open textbook, which are funded by the DOT4D project, as was the tool to render the axioms in DL in Protégé. A few more things are in the pipeline (TBC).

 

References

[1] Guarino, N. and Welty, C. A. (2009). An overview of OntoClean. In Staab, S. and Studer, R., editors, Handbook on Ontologies, International Handbooks on Information Systems, pages 201-220. Springer.

[2] Keet, C. M. (2018). An introduction to ontology engineering. College Publications, vol 20. 344p.

[3] Welty, C. A. (2006). OntOWLClean: Cleaning OWL ontologies with OWL. In Bennett, B. and Fellbaum, C., editors, Proceedings of the Fourth International Conference on Formal Ontology in Information Systems (FOIS 2006), Baltimore, Maryland, USA, November 9-11, 2006, volume 150 of Frontiers in Artificial Intelligence and Applications, pages 347-359. IOS Press.

[4] Glimm, B., Rudolph, S., Volker, J. (2010). Integrated metamodeling and diagnosis in OWL 2. In Peter F. Patel-Schneider, Yue Pan, Pascal Hitzler, Peter Mika, Lei Zhang, Je_ Z. Pan, Ian Horrocks, and Birte Glimm, editors, Proceedings of the 9th International Semantic Web Conference, LNCS vol 6496, pages 257-272. Springer.

DL notation plugin for Protégé 5.x

Once upon a time… the Protégé ontology development environment used Description Logic (DL) symbols and all was well—for some users at least. Then Manchester Syntax came along as the new kid on the block, using hearsay and opinion and some other authors’ preferences for an alternative rendering to the DL notation [1]. Subsequently, everyone who used Protégé was forced to deal with those new and untested keywords in the interface, like ‘some’ and ‘only’ and such, rather than the DL symbols. That had another unfortunate side-effect, being that it hampers internationalisation, for it jumbles up things rather awkwardly when your ontology vocabulary is not in English, like, say, “jirafa come only (oja or ramita)”. Even in the same English-as-first-language country, it turned out that under a controlled set-up, the DL axiom rendering in Protégé fared well in a fairly large sized experiment when compared to the Protégé interface with the sort of Manchester syntax with GUI [2], and also the OWL 2 RL rules rendering appear more positive in another (smaller) experiment [3]. Various HCI factors remain to be examined in more detail, though.

In the meantime, we didn’t fully reinstate the DL notation in Protégé in the way it was in Protégé v3.x from some 15 years ago, but with our new plugin, it will at least render the class expression in DL notation in the tool. This has the benefits that

  1. the modeller will receive immediate feedback during the authoring stage regarding a notation that may be more familiar to at least a knowledge engineer or expert modeller;
  2. it offers a natural language-independent rendering of the axioms with respect to the constructors, so that people may develop their ontology in their own language if they wish to do so, without being hampered by continuous code switching or the need for localisation; and
  3. it also may ease the transition from theory (logics) to implementation for ontology engineering novices.

Whether it needs to be integrated further among more components of the tabs and views in Protégé or other ODEs, is also a question for HCI experts to answer. The code for the DL plugin is open source, so you could extend it if you wish to do so.

The plugin itself is a jar file that can simply be dragged into the plugin folder of a Protégé installation (5.x); see the github repo for details. To illustrate it briefly, after dragging the jar file into the plugin folder, open Protégé, and add it as a view:

Then when you add some new axioms or load an ontology, select a class, and it will render all the axioms in DL notation, as shown in the following two screenshots form different ontologies:

For the sake of illustration, here’s the giraffe that eats only leaves or twigs, in the Spanish version of the African Wildlife Ontology:

The first version of the tool was developed by Michael Harrison and Larry Liu as part of their mini-project for the ontology engineering course in 2017, and it was brushed up for presentation beyond that just now by Michael Harrison (meanwhile an MSc student a CS@UCT), which was supported by a DOT4D grant to improve my textbook on ontology engineering and accompanying educational resources. We haven’t examined all possible ‘shapes’ that a class expression can take, but it definitely processes the commonly used features well. At the time of writing, we haven’t detected any errors.

p.s.: if you want your whole ontology exported at once in DL notation and to latex, for purposes of documentation generation, that is a different usage scenario and is already possible [4].

p.p.s.: if you want more DL notation, please let me know, and I’ll try to find more resources to make a v2 with more features.

References

[1] Matthew Horridge, Nicholas Drummond, John Goodwin, Alan Rector, Robert Stevens and Hai Wang (2006). The Manchester OWL syntax. OWL: Experiences and Directions (OWLED’06), Athens, Georgia, USA, 10-11 Nov 2016, CEUR-WS vol 216.

[2] E. Alharbi, J. Howse, G. Stapleton, A. Hamie and A. Touloumis. The efficacy of OWL and DL on user understanding of axioms and their entailments. The Semantic Web – ISWC 2017, C. d’Amato, M. Fernandez, V. Tamma, F. Lecue, P. Cudre-Mauroux, J. Sequeda, C. Lange and J. He (eds.). Springer 2017, pp20-36.

[3] M. K. Sarker, A. Krisnadhi, D. Carral and P. Hitzler, Rule-based OWL modeling with ROWLtab Protégé plugin. Proceedings of ESWC’17, E. Blomqvist, D. Maynard, A. Gangemi, R. Hoekstra, P. Hitzler and O. Hartig (eds.). Springer. 2017, pp 419-433.

[4] Cogan Shimizu, Pascal Hitzler, Matthew Horridge: Rendering OWL in Description Logic Syntax. ESWC (Satellite Events) 2017. Springer LNCS. pp109-113

Orchestrating 28 logical theories of mereo(topo)logy

Parts and wholes, again. This time it’s about the logic-aspects of theories of parthood (cf. aligning different hierarchies of (part-whole) relations and make them compatible with foundational ontologies). I intended to write this post before the Ninth Conference on Knowledge Capture (K-CAP 2017), where the paper describing the new material would be presented by my co-author, Oliver Kutz. Now, afterwards, I can add that “Orchestrating a Network of Mereo(topo) logical Theories” [1] even won the Best Paper Award. The novelties, in broad strokes, are that we figured out and structured some hitherto messy and confusing state of affairs, showed that one can do more than generally assumed especially with a new logics orchestration framework, and we proposed first steps toward conflict resolution to sort out expressivity and logic limitations trade-offs. Constructing a tweet-size “tl;dr” version of the contents is not easy, and as I have as much space here on my blog as I like, it ended up to be three paragraphs here: scene-setting, solution, and a few examples to illustrate some of it.

 

Problems

As ontologists know, parthood is used widely in ontologies across most subject domains, such as biomedicine, geographic information systems, architecture, and so on. Ontology (the philosophers) offer a parthood relation that has a bunch of computationally unpleasant properties that are structured in a plethora of mereologicial and meretopological theories such that it has become hard to see the forest for the trees. This is then complicated in practice because there are multiple logics of varying expressivity (support more or less language features), with the result that only certain fragments of the mereo(topo)logical theories can be represented. However, it’s mostly not clear what can be used when, during the ontology authoring stage one may want to have all those features so as to check correctness, and it’s not easy to predict what will happen when one aligns ontologies with different fragments of mereo(topo)logy.

 

Solution

We solved these problems by specifying a structured network of theories formulated in multiple logics that are glued together by the various linking constructs of the Distributed Ontology, Model, and Specification Language (DOL). The ‘structured network of theories’-part concerns all the maximal expressible fragments of the KGEMT mereotopological theory and five of its most well-recognised sub-theories (like GEM and MT) in the seven Description Logics-based OWL species, first-order logic, and higher order logic. The ‘glued together’-part refers to relating the resultant 28 theories within DOL (in Ontohub), which is a non-trivial (understatement, unfortunately) metalanguage that has the constructors for the glue, such as enabling one to declare to merge two theories/modules represented in different logics, extending a theory (ontology) with axioms that go beyond that language without messing up the original (expressivity-restricted) ontology, and more. Further, because the annoying thing of merging two ontologies/modules can be that the merged ontology may be in a different language than the two original ones, which is very hard to predict, we have a cute proof-of-concept tool so that it assists with steps toward resolution of language feature conflicts by pinpointing profile violations.

 

Examples

The paper describes nine mechanisms with DOL and the mereotopological theories. Here I’ll start with a simple one: we have Minimal Topology (MT) partially represented in OWL 2 EL/QL in “theory8” where the connection relation (C) is just reflexive (among other axioms; see table in the paper for details). Now what if we add connection’s symmetry, which results in “theory4”? First, we do this by not harming theory8, in DOL syntax (see also the ESSLI’16 tutorial):

logic OWL2.QL
ontology theory4 =
theory8
then
ObjectProperty: C Characteristics: Symmetric %(t7)

What is the logic of theory4? Still in OWL, and if so, which species? The Owl classifier shows the result:

 

Another case is that OWL does not let one define an object property; at best, one can add domain and range axioms and the occasional ‘characteristic’ (like aforementioned symmetry), for allowing arbitrary full definitions pushes it out of the decidable fragment. One can add them, though, in a system that can handle first order logic, such as the Heterogeneous toolset (Hets); for instance, where in OWL one can add only “overlap” as a primitive relation (vocabulary element without definition), we can take such a theory and declare that definition:

logic CASL.FOL
ontology theory20 =
theory6_plus_antisym_and_WS
then %wdef
. forall x,y:Thing . O(x,y) <=> exists z:Thing (P(z,x) /\ P(z,y)) %(t21)
. forall x,y:Thing . EQ(x,y) <=> P(x,y) /\ P(y,x) %(t22)

As last example, let me illustrate the notion of the conflict resolution. Consider theory19—ground mereology, partially—that is within OWL 2 EL expressivity and theory18—also ground mereology, partially—that is within OWL 2 DL expressivity. So, they can’t be the same; the difference is that theory18 has parthood reflexive and transitive and proper parthood asymmetric and irreflexive, whereas theory19 has both parthood and proper parthood transitive. What happens if one aligns the ontologies that contain these theories, say, O1 (with theory18) and O2 (with theory19)? The Owl classifier provides easy pinpointing and tells you the profile: OWL 2 full (or: first order logic, or: beyond OWL 2 DL—top row) and why (bottom section):

Now, what can one do? The conflict resolution cannot be fully automated, because it depends on what the modeller wants or needs, but there’s enough data generated already and there are known trade-offs so that it is possible to describe the consequences:

  • Choose the O1 axioms (with irreflexivity and asymmetry on proper part of), which will make the ontology interoperable with other ontologies in OWL 2 DL, FOL or HOL.
  • Choose O2’s axioms (with transitivity on part of and proper part of), which will facilitate linking to ontologies in OWL 2 RL, 2 EL, 2 DL, FOL, and HOL.
  • Choose to keep both sets will result in an OWL 2 Full ontology that is undecidable, and it is then compatible only with FOL and HOL ontologies.

As serious final note: there’s still fun to be had on the logic side of things with countermodels and sub-networks and such, and with refining the conflict resolution to assist ontology engineers better. (or: TBC)

As less serious final note: the working title of early drafts of the paper was “DOLifying mereo(topo)logy”, but at some point we chickened out and let go of that frivolity.

 

References

[1] Keet, C.M., Kutz, O. Orchestrating a Network of Mereo(topo)logical Theories. Ninth International Conference on Knowledge Capture (K-CAP’17), Austin, Texas, USA, December 4-6, 2017. ACM Proceedings.

Part-whole relations and foundational ontologies

Part-whole relations seem like a never-ending story—and it still doesn’t bore me. In this case, the ingredients were the taxonomy of part-whole relations [1] and a couple of foundational ontologies and the aim was to link the former to the latter. But what started off with the intention to write just a short workshop note, for seemingly clear and just in need of actually doing it, turned out to be not so straightforward after all. The selected foundational ontologies were not as compatible as assumed, and creating the corresponding orchestration of OWL files was a ‘non-trivial exercise’.

What were (some of) the issues? On the one hand, there are multiple part-whole relations, which are typically named differently when they have a specific domain or range. For instance, to relate a process to a sub-process (e.g., eating involves chewing), to relate a region to a region it contains, relating portions of stuff, and so on. Those relations are fairly well established in the literature. What they do demand for, however, is clarity as to what those categories really are. For instance, with the process example, is that to be understood as Process as meant in the DOLCE ontology, or, say, Process in BFO? What if a foundational ontology does not have a category needed for a commonly used part-whole relation?

The first step to answer such questions was to assess several foundational ontologies on 1) which of the part-whole relations they have now, and which categories are present that are needed for the domain and range declarations for those common part-whole relations. I assessed that for DOLCE, BFO, GFO, SUMO, GIST, and YAMATO. This foundational ontology comparison is summarised in tables 1 and 2 in the paper that emanated from the assessment [2], entitled “A note on the compatibility of part-whole relations with foundational ontologies” that I recently presented at FOUST-II: 2nd Workshop on Foundational Ontology, Joint Ontology Workshops 2017 in Bolzano, Italy. In short: none fits perfectly for various reasons, but there are more and less suitable ontologies for a possible alignment. DOLCE and SUMO were evaluated to have the best approximations. It appeared at the workshops presentation’s Q&A session, where two of the DOLCE developers were present, that the missing Collective was an oversight, or: the ontology is incomplete and it was not an explicit design choice to exclude it. This, then, would make DOLCE the best/easiest fit.

I’ll save you the trials and tribulations creating the orchestrated OWL files. The part-whole relations, their inverses, and their proper parthood versions were manually linked to modules of DOLCE and SUMO, and automatically linked to BFO and GFO. That was an addition of 49 relations (OWL object properties) and 121 logical axioms, which were then extended further with another 11 mereotopological relations and its 16 logical axioms. These files are accessible online directly here and also listed with brief descriptions.

While there is something usable now and, by design at least, these files are reusable as well, what it also highlighted is that there are still some outstanding questions, as there already were for the top-level categories of previously aligned foundational ontologies [3]. For instance, some categories seem the same, but they’re in ‘incompatible’ parts of the taxonomy (located in disjoint branches), so then either not the same after all, or this happened unintentionally. Only GIST has been updated recently, and it may be useful if the others foundational ontologies were to be as well, so as to obtain clarity on these issues. The full interaction of part-whole relations with classical mereology is not quite clear either: there are various extensions and deviations, such as specifically for portions [4,5], but one for processes may be interesting as well. Not that such prospective theories would be usable as-is in OWL ontology development, but there are more expressive languages that start having tooling support where it could be an interesting avenue for future work. I’ll write more about the latter in an upcoming post (covering the K-CAP 2017 paper that was recently accepted).

On a last note: the Joint Ontology Workshops (JOWO 2017) was a great event. Some 100 ontologists from all over the world attended. There were good presentations, lively conversations, and it was great to meet up again with researchers I had not seen for years, finally meet people I knew only via email, and make new connections. It will not be an easy task to surpass this event next year at FOIS 2018 in Cape Town.

 

References

 

[1] Keet, C.M., Artale, A. Representing and Reasoning over a Taxonomy of Part-Whole Relations. Applied Ontology, 2008, 3(1-2):91-110.

[2] Keet, C.M. A note on the compatibility of part-whole relations with foundational ontologies. FOUST-II: 2nd Workshop on Foundational Ontology, Joint Ontology Workshops 2017, 21-23 September 2017, Bolzano, Italy. CEUR-WS Vol. (in print)

[3] Khan, Z.C., Keet, C.M. Foundational ontology mediation in ROMULUS. Knowledge Discovery, Knowledge Engineering and Knowledge Management: IC3K 2013 Selected Papers. A. Fred et al. (Eds.). Springer CCIS vol. 454, pp. 132-152, 2015. preprint

[4] Donnelly, M., Bittner, T. Summation relations and portions of stuff. Philosophical Studies, 2009, 143, 167-185.

[5] Keet, C.M. Relating some stuff to other stuff. 20th International Conference on Knowledge Engineering and Knowledge Management (EKAW’16). Blomqvist, E., Ciancarini, P., Poggi, F., Vitali, F. (Eds.). Springer LNAI vol. 10024, 368-383. 19-23 November 2016, Bologna, Italy.

Relations with roles / verbalising object properties in isiZulu

The narratives can be very different for the paper “A model for verbalising relations with roles in multiple languages” that was recently accepted paper at the 20th International Conference on Knowledge Engineering and Knowledge management (EKAW’16), for the paper makes a nice smoothie of the three ingredients of language, logic, and ontology. The natural language part zooms in on isiZulu as use case (possibly losing some ontologist or logician readers), then there are the logics about mapping the Description Logic DLR’s role components with OWL (lose possible interest of the natural language researchers), and a bit of philosophy (and lose most people…). It solves some thorny issues when trying to verbalise complicated verbs that we need for knowledge-to-text natural language generation in isiZulu and some other languages (e.g., German). And it solves the matching of logic-based representations popularised in mainly UML and ORM (that typically uses a logic in the DLR family of Description Logic languages) with the more commonly used OWL. The latter is even implemented as a Protégé plugin.

Let me start with some use-cases that cause problems that need to be solved. It is well-known that natural language renderings of ontologies facilitate communication with domain experts who are expected to model and validate the represented knowledge. This is doable for English, with ACE in the lead, but it isn’t for grammatically richer languages. There, there are complications, such as conjugation of verbs, an article that may be dependent on the preposition, or a preposition may modify the noun. For instance, works for, made by, located in, and is part of are quite common names for object properties in ontologies. They all do have a dependent preposition, however, there are different verb tenses, and the latter has a copulative and noun rather than just a verb. All that goes into the object properties name in an ‘English-based ontology’ and does not really have to be processed further in ontology verbalisation other than beautification. Not so in multiple other languages. For instance, the ‘in’ of located in ends up as affixes to the noun representing the object that the other object is located in. Like, imvilophu ‘envelope’ and emvilophini ‘in the envelope’ (locative underlined). Even something straightforward like a property eats can end up having to be conjugated differently depending on who’s eating: when a human eats, it is udla in isiZulu, but for, say, a dog, it is idla (modification underlined), which is driven by the system of noun classes, of which there are 17 in isiZulu. Many more examples illustrating different issues are described in the paper. To make a long story short, there are gradations in complicating effects, from no effect where a preposition can be squeezed in with the verb in naming an OP, to phonological conditioning, to modifying the article of the noun to modifying the noun. A ‘3rd pers. sg.’ may thus be context-dependent, and notions of prepositions may modify the verb or the noun or the article of the noun, or both. For a setting other than English ontologies (e.g., Greek, German, Lithuanian), a preposition may belong neither to the verb nor to the noun, but instead to the role that the object plays in the relation described by the verb in the sentence. For instance, one obtains yomuntu, rather than the basic noun umuntu, if it plays the role of the whole in a part-whole relation like in ‘heart is part of a human’ (inhliziyo iyingxenye yomuntu).

The question then becomes how to handle such a representation that also has to include roles? This is quite common in conceptual data modelling languages and in the DLR family of DL languages, which is known in ontology as positionalism [2]. Bumping up the role to an element in the representation language—thus, in addition to the relationship—enables one to attach information to it, like whether there is a (deep) preposition associated with it, the tense, or the case. Such role-based annotations can then be used to generate the right element, like einen Betrieb ‘some company’ to adjust the article for the case it goes with in German, or ya+umuntu=yomuntu ‘of a human’, modifying the noun in the object position in the sentence.

To get this working properly, with a solid theoretical foundation, we reused a part of the conceptual modelling languages’ metamodel [3] to create a language model for such annotations, in particular regarding the attributes of the classes in the metamodel. On its own, however, it is rather isolated and not immediately useful for ontologies that we set out to be in need of verbalising. To this end, it links to the ‘OWL way of representing relations’ (ontologically: the so-called standard view), and we separate out the logic-based representation from the readings that one can generate with the structured representation of the knowledge. All in all, the simplified high-level model looks like the picture below.

Simplified diagram in UML Class Diagram notation of the main components (see paper for attributes), linking a section of the metamodel (orange; positionalist commitment) to predicates (green; standard view) and their verbalisation (yellow). (Source: [1])

Simplified diagram in UML Class Diagram notation of the main components (see paper for attributes), linking a section of the metamodel (orange; positionalist commitment) to predicates (green; standard view) and their verbalisation (yellow). (Source: [1])

That much for the conceptual part; more details are described in the paper.

Just a fluffy colourful diagram isn’t enough for a solid implementation, however. To this end, we mapped one of the logics that adhere to positionalism to one of the standard view, being DLR [4] and OWL, respectively. It equally well could have been done for other pairs of languages (e.g., with Common Logic), but these two are more popular in terms of theory and tools.

Having the conceptual and logical foundations in place, we did implement it to see whether it actually can be done and to check whether the theory was sufficient. The Protégé plugin is called iMPALA—it could be an abbreviation for ‘Model for Positionalism And Language Annotation’—that both writes all the non-OWL annotations in a separate XML file and takes care of the renderings in Protégé. It works; yay. Specifically, it handles the interaction between the OWL file, the positionalist elements, and the annotations/attributes, plus the additional feature that one can add new linguistic annotation properties, so as to cater for extensibility. Here are a few screenshots:

OWL’s arbeitetFuer ‘works for’ is linked to the relationship arbeiten.

OWL’s arbeitetFuer ‘works for’ is linked to the relationship arbeiten.

The prey role in the axiom of the impala being eaten by the ibhubesi.

The prey role in the axiom of the impala being eaten by the ibhubesi.

 Annotations of the prey role itself, which is a role in the relationship ukudla.

Annotations of the prey role itself, which is a role in the relationship ukudla.

We did test it a bit, from just the regular feature testing to the African Wildlife ontology that was translated into isiZulu (spoken in South Africa) and a people and pets ontology in ciShona (spoken in Zimbabwe). These details are available in the online supplementary material.

The next step is to tie it all together, being the verbalisation patterns for isiZulu [5,6] and the OWL ontologies to generate full sentences, correctly. This is set to happen soon (provided all the protests don’t mess up the planning too much). If you want to know more details that are not, or not clearly, in the paper, then please have a look at the project page of A Grammar engine for Nguni natural language interfaces (GeNi), or come visit EKAW16 that will be held from 21-23 November in Bologna, Italy, where I will present the paper.

 

References

[1] Keet, C.M., Chirema, T. A model for verbalising relations with roles in multiple languages. 20th International Conference on Knowledge Engineering and Knowledge Management EKAW’16). Springer LNAI, 19-23 November 2016, Bologna, Italy. (in print)

[2] Leo, J. Modeling relations. Journal of Philosophical Logic, 2008, 37:353-385.

[3] Keet, C.M., Fillottrani, P.R. An ontology-driven unifying metamodel of UML Class Diagrams, EER, and ORM2. Data & Knowledge Engineering, 2015, 98:30-53.

[4] Calvanese, D., De Giacomo, G. The Description Logics Handbook: Theory, Implementation and Applications, chap. Expressive description logics, pp. 178-218. Cambridge University Press (2003).

[5] Keet, C.M., Khumalo, L. Toward a knowledge-to-text controlled natural language of isiZulu. Language Resources and Evaluation, 2016, in print.

[6] Keet, C.M., Khumalo, L. On the verbalization patterns of part-whole relations in isiZulu. Proceedings of the 9th International Natural Language Generation conference 2016 (INLG’16), Edinburgh, Scotland, Sept 2016. ACL, 174-183.

Bootstrapping a Runyankore CNL from an isiZulu one mostly works well

Earlier this week the 5th Workshop on Controlled Natural Language (CNL’16) was held in Aberdeen, Scotland, where I presented progress made on a Runyankore CNL [1], rather than my student, Joan Byamugisha, who did most of the work on it (she could not attend due to nasty immigration rules by the UK, not a funding issue).

“Runyankore?”, you might ask. It is one of the languages spoken in Uganda. As Runyankore is very under-resourced, any bootstrapping to take a ‘shortcut’ to develop language resources would be welcome. We have a CNL for isiZulu [2], but that is spoken in South Africa, which is a few thousand kilometres further south of Uganda, and it is in a different Guthrie zone of the—in linguistics still called—Bantu languages, so it was a bit of a gamble to see whether those results could be repurposed for Runynakore. They could, needing only minor changes.

What stayed the same were the variables, or: components to make up a grammatically correct sentence when generating a sentence within the context of OWL axioms (ALC, to be more precise). They are: the noun class of the name of the concept (each noun is assigned a noun class—there are 20 in Runyankore), the category of the concept (e.g., noun, adjective), whether the concept is atomic (named OWL class) or an OWL class expression, the quantifier used in the axiom, and the position of the concept in the axiom. The only two real differences were that for universal quantification the word for the quantifier is the same when in the singular (cf. isiZulu, where it changes for both singular or plural), and for disjointness there is only one word, ti ‘is not’ (cf. isiZulu’s negative subject concord + pronomial). Two very minor differences are that for existential quantification ‘at least one’, the ‘at least’ is in a different place in the sentence but the ‘one’ behaves exactly the same, and ‘all’ for universal quantification comes after the head noun rather than before (but is also still dependent on the noun class).

It goes without saying that the vocabulary is different, but that is a minor aspect compared to figuring out the surface realisation for an axiom. Where the bootstrapping thus came in handy was that that arduous step of investigating from scratch the natural language grammar involved in verbalising OWL axioms could be skipped and instead the ones for isiZulu could be reused. Yay. This makes it look very promising to port to other languages in the Bantu language family. (yes, I know, “one swallow does not a summer make” [some Dutch proverb], but one surely is justified to turn up one’s hope a notch regarding generalizability and transferability of results.)

Joan also conducted a user survey to ascertain which surface realisation was preferred among Runyankore speakers, implemented the algorithms, and devised a new one for the ‘hasX’ naming scheme of OWL object properties (like hasSymptom and hasChild). All these details, as well as the details of the Runyankore CNL and the bootstrapping, are described in the paper [1].

 

I cannot resist a final comment on all this. There are people who like to pull it down and trivialise natural language interfaces for African languages, on the grounds of “who cares about text in those kind of countries; we have to accommodate the illiteracy with pictures and icons and speech and such”. People are not as illiterate as is claimed here and there (including by still mentally colonised people from African countries)—if they were, then the likes of Google and Facebook and Microsoft would not invest in localising their interfaces in African languages. The term “illiterate” is used by those people to include also those who do not read/write in English (typically an/the official language of government), even though they can read and write in their local language. People who can read and write—whichever natural language it may be—are not illiterate, neither here in Africa nor anywhere else. English is not the yardstick of (il)literacy, and anyone who thinks it is should think again and reflect a bit on cultural imperialism for starters.

 

References

[1] Byamugisha, J., Keet, C.M., DeRenzi, B. Bootstrapping a Runyankore CNL from an isiZulu CNL. 5th Workshop on Controlled Natural Language (CNL’16), Springer LNAI vol. 9767, 25-36. 25-27 July 2016, Aberdeen, UK. Springer’s version

[2] Keet, C.M., Khumalo, L. Toward a knowledge-to-text controlled natural language of isiZulu. Language Resources and Evaluation, 2016. DOI: 10.1007/s10579-016-9340-0 (in print) accepted version