Aligning different relations: the case of part-whole relations—LDK2017

Despite the best intentions, I did not get around to writing a post on the paper that I presented last week at the First International Conference on Language, Data and Knowledge 2017, 19-20 June, Galway, Ireland, and now Paul Groth has also ‘beaten’ me to it by writing a nice conference report. On the bright side, it is an opportunity to say upfront that I really enjoyed the conference and look forward to the next edition in 2019. The ESWC’17 organisers might be slightly disappointed that there was no special track on the multilingual semantic web after all, but I did get the distinct impression that the LDK17 authors might just all have gambled on LDK17 (an opportunity to binge two days on all things natural language & Semantic Web) rather than on one track at an overpriced conference (despite the allure of it being A-rated).

So, what was my paper about that could have been submitted to either? I ended up struggling with, and solving, an issue with aligning OWL object properties that were not simple 1:1 mappings, in a similar scope as our ESWC17 paper (introduced here) [4], but then with too many complications. The complications were due to the different conceptualisations of part-whole relations, and to the requirement to sort out what to do with an object property (relation, relationship) that does not have a neat, single label, and therefore fits neither the common OWL modelling paradigm nor the recently agreed-upon ontolex-lemon model for linguistic annotations.

The start of all this sounded nice and doable: we need to generate natural language for healthcare, using, e.g., SNOMED CT, in local languages in South Africa, focussing on the largest one, being isiZulu. Medical terminologies are riddled with part-whole relations, so we sought to address that one (simple existentials already having been solved), availing of a standard list of part-whole relations (e.g. [1]). That turned out to be a non-trivial exercise, but doable eventually [2]. What wasn’t addressed in [2] was that some ‘common’ part-whole relations, such as membership and containment, weren’t like that in isiZulu, at all. Moreover, it wasn’t just a language issue, but ontological as well. The LDK17 paper “Representing and aligning similar relations: parts and wholes in isiZulu vs English” [3] describes this in some detail.

Here’s a (simplified) list of (assumed to be) common part-whole relations, which takes into account both transitivity differences and domain and range:

Now here’s the one based on the isiZulu language and some ontological analysis of that:

That is: there are both generalisations (some distinctions are not being made) and specialisations (some distinctions are made here but not elsewhere). For instance, ‘musician is part of some orchestra’ and ‘heart is part of some human’ (or vv.) are both described in the same way (ingxenye ‘part of’ and SC+CONJ for ‘has part’ [more about that below]). Yet, there is a difference between an individual (e.g., a voter) participating in some process and a collective (e.g., the electorate) participating in a process, or vv. The paper describes this more precisely, going into some detail regarding the differences in categories of domain and range and into the consequences for transitivity of mereological parthood.

The other ‘odd thing’ (cf. current multilingual Semantic Web assumptions and technologies, that is) is that while the conceptualisation of ‘has part’ exists, it does not have a single label as in English (or in several other languages, such as heeft as deel); instead, it depends on the noun classes of the nouns that play the part and the whole in the relation. It combines the subject concord (~conjugation) of the noun class of the noun that plays the whole with a conjunction that is phonologically conditioned based on the first letter of the noun that plays the part; with verbalisation in the plural and three phonological cases, there are 18 possible strings all denoting ‘has part’. This could still be sorted out in a language with inverses, provided the part-of direction has a name, like ingxenye. This is not the case for containment, however. Instead of the relation (object property) having a name, be this a verb like ‘contained in’ or some noun phrase, it is the noun that plays the whole (the container, if you will) that gets modified. For instance, imvilophu ‘envelope’ and emvilophini denoting ‘contained in the envelope’, or, for individuals and locations, the city iTheku ‘Durban’ and eThekwini meaning ‘located in Durban’ (no typo; there’s some phonological conditioning I’m brushing over). While I have gotten used to such constructions, it generated some surprise among several attendees that one can have notions, concepts, views on or interpretations or descriptions of reality, that exist but do not have even a single string of text that refers to them regardless of the context in which they are used.
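To make the ‘has part’ construction a bit more concrete, here is a minimal Python sketch of the idea of combining a subject concord with a phonologically conditioned conjunction. It is not the code from the paper or the verbalisation tool. The function names are made up for this illustration, and the three conjunction forms (na/ne/no, selected by the initial vowel of the part noun) are my shorthand for the three phonological cases, based on general isiZulu grammar rather than on the rules as specified in [2,3].

```python
# Illustrative sketch only; names and the vowel table are assumptions, not the tool's code.
# Assumed phonological conditioning: na- merges with the initial vowel of the part noun.
CONJUNCTION_BY_INITIAL_VOWEL = {"a": "na", "i": "ne", "u": "no"}


def has_part_string(subject_concord: str, part_noun: str) -> str:
    """Return the string denoting 'has part' for a given whole/part combination.

    subject_concord: concord of the noun class of the noun playing the whole
                     (supplied by the caller, e.g. 'ba' for a class 2 noun).
    part_noun:       the noun playing the part; its first letter selects the conjunction.
    """
    initial = part_noun[0].lower()
    if initial not in CONJUNCTION_BY_INITIAL_VOWEL:
        raise ValueError(f"no conjunction form assumed for initial letter {initial!r}")
    return subject_concord + CONJUNCTION_BY_INITIAL_VOWEL[initial]


def attach_to_part(subject_concord: str, part_noun: str) -> str:
    """Attach the 'has part' string to the part noun, dropping the noun's initial vowel."""
    return has_part_string(subject_concord, part_noun) + part_noun[1:]


if __name__ == "__main__":
    # Whole in noun class 2 (concord 'ba'), part 'inhliziyo' (heart).
    print(has_part_string("ba", "inhliziyo"))
    print(attach_to_part("ba", "inhliziyo"))
```

The point of the sketch is only that the ‘label’ is computed from properties of the two nouns, so there is no single string to store as the name of the object property.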

The naming issue was solved by adding some arbitrary string as ‘name’ of the object property, and relating that to the function that verbalises that specific part-whole relation. The former issue, i.e., not all the same part-whole relations, required a bit more work, using ontology pattern alignments, by extending one correspondence pattern from the ODP catalogue and introducing a new one (see paper for the formal details), using the same broad framework of formalisation as proposed in [4].
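The naming workaround can be pictured as a lookup from the arbitrary property name to a verbalisation function. The following is a minimal sketch under that reading; the property names (zuOP_haspart, zuOP_containment), the registry, and the placeholder functions are all hypothetical and are not taken from the ontology files or the tool.

```python
from typing import Callable, Dict

# A verbaliser takes the two nouns in the relation and returns a sentence fragment.
Verbaliser = Callable[[str, str], str]

# Hypothetical registry: arbitrary object property name -> verbalisation function.
VERBALISERS: Dict[str, Verbaliser] = {}


def register(op_name: str) -> Callable[[Verbaliser], Verbaliser]:
    """Decorator that links an (arbitrary) object property name to its verbaliser."""
    def wrap(fn: Verbaliser) -> Verbaliser:
        VERBALISERS[op_name] = fn
        return fn
    return wrap


@register("zuOP_haspart")  # hypothetical name; the string itself carries no meaning
def has_part_placeholder(whole: str, part: str) -> str:
    # Placeholder: a real function would apply the concord and conjunction rules.
    return f"[has-part pattern applied to {whole} and {part}]"


@register("zuOP_containment")  # hypothetical name
def containment_placeholder(contained: str, container: str) -> str:
    # Placeholder: a real function would modify the container noun itself.
    return f"[containment pattern applied to {contained} and {container}]"


def verbalise(op_name: str, first: str, second: str) -> str:
    """Look up the verbaliser for an object property name and apply it."""
    if op_name not in VERBALISERS:
        raise KeyError(f"no verbaliser registered for {op_name!r}")
    return VERBALISERS[op_name](first, second)


if __name__ == "__main__":
    print(verbalise("zuOP_containment", "letter", "imvilophu"))
```

The design choice illustrated here is that the ontology only needs a stable identifier for the property, while everything language-specific lives in the function behind it.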

All this was then implemented and aligned, and verified not to result in any unsatisfiable classes or object properties, or in an inconsistency (files). This also works in the isiZulu verbalisation tool we demo-ed at ESWC17 (described in the previous post) [5], all as part of the NRF-funded GeNI project.

Now, ideally, I already would have had the time to read the papers I flagged in my LDK17 conference notes with “check paper”. I haven’t yet due to end-of-semester tasks. So, on the basis of just a positive-seeming presentation, here are a few that are on the top of my list to check out first, for quite different reasons:

  • Interaction between natural language reading capabilities and math education, focusing on language production (i.e., ‘can you talk about it?’) [6], mainly because math education in South Africa faces a lot of problems. It also generated a lively discussion in the Q&A session.
  • The OnLiT ontology for linguistic terminology [7] and LLODifying linguistic glosses [8] (also: one of the two won the best paper award).
  • Deep text generation, for it was looking at trying to address skewed or limited data to learn from [9], which is an issue we face when trying to do some NLP with most South African languages.

 

References

[1] Keet, C.M., Artale, A. Representing and Reasoning over a Taxonomy of Part-Whole Relations. Applied Ontology, 2008, 3(1-2):91-110.

[2] Keet, C.M., Khumalo, L. On the verbalization patterns of part-whole relations in isiZulu. 9th International Natural Language Generation conference (INLG’16), September 5-8, 2016, Edinburgh, UK. ACL.

[3] Keet, C.M. Representing and aligning similar relations: parts and wholes in isiZulu vs English. In: Gracia J., Bond F., McCrae J., Buitelaar P., Chiarcos C., Hellmann S. (eds) Language, Data, and Knowledge LDK 2017. Springer LNAI vol 10318, 58-73.

[4] Fillottrani, P.R., Keet, C.M. Patterns for Heterogeneous TBox Mappings to Bridge Different Modelling Decisions. 14th Extended Semantic Web Conference (ESWC’17). Springer LNCS. Portoroz, Slovenia, May 28 – June 2, 2017.

[5] Keet, C.M., Xakaza, M., Khumalo, L. Verbalising OWL ontologies in isiZulu with Python. 14th Extended Semantic Web Conference (ESWC’17). Springer LNCS. Portoroz, Slovenia, May 28 – June 2, 2017. (demo paper)

[6] Crossley, S., Kostyuk, V. Letting the genie out of the lamp: using natural language processing tools to predict math performance. In: Gracia J., Bond F., McCrae J., Buitelaar P., Chiarcos C., Hellmann S. (eds) Language, Data, and Knowledge LDK 2017. Springer LNAI vol 10318, 330-342.

[7] Klimek, B., McCrae, J.P., Lehmann, C., Chiarcos, C., Hellmann, S. OnLiT: an ontology for linguistic terminology. In: Gracia J., Bond F., McCrae J., Buitelaar P., Chiarcos C., Hellmann S. (eds) Language, Data, and Knowledge LDK 2017. Springer LNAI vol 10318, 42-57.

[8] Chiarcos, C., Ionov, M., Rind-Pawlowski, M., Fäth, C., Wichers Schreur, J., Nevskaya, I. LLODifying linguistic glosses. In: Gracia J., Bond F., McCrae J., Buitelaar P., Chiarcos C., Hellmann S. (eds) Language, Data, and Knowledge LDK 2017. Springer LNAI vol 10318, 89-103.

[9] Dethlefs, N., Turner, A. Deep Text Generation: Using Hierarchical Decomposition to Mitigate the Effect of Rare Data Points. In: Gracia J., Bond F., McCrae J., Buitelaar P., Chiarcos C., Hellmann S. (eds) Language, Data, and Knowledge LDK 2017. Springer LNAI vol 10318, 290-298.

On heterogeneous mappings between ontologies

Representing information and knowledge often can be done in different ways even when the same representation language is used. In some cases, one way of representing it is always better than another (or: the other option is sub-optimal or plain wrong), but in other cases the distinction is not all that clear-cut. For instance, whether to represent ‘Employee’ as a subclass of ‘Person’ or as a role that inheres in ‘Person’. Now, if two ontologies (or conceptual models) represent it differently but they have to be aligned, then how to find such different modelling patterns and how to align them? And, taking a step back: which alternate modelling patterns are there, and why those? We sought to answer these questions, and the outcome will be presented at (and appear in the proceedings of) the 14th Extended Semantic Web Conference (ESWC’17) [1] that will take place later this month in Portoroz, Slovenia.

Setting aside the formal stuff in this blog post, let’s first have a look at some of those different modelling patterns. At its core, there are 1) modelling practices in ontologies vs conceptual models and 2) foundational [or: top-level, or upper] ontology guidance vs being ‘compacter’ in representing the knowledge. The generalisations of the following hand-wavy examples are described in more detail in the paper, but for this blog post, they hopefully will do as a teaser of the six formalised patterns. Take, e.g., the following examples that are all variations on the same theme: to-reify-or-not-to-reify, where the example in B is further dressed up with content from a foundational ontology:

Indeed, in the examples, what is shown on the left-hand side does not have the exact same information content as what is shown on the right-hand side, but the underlying conceptualization is pretty much the same. The models on the right-hand side are more precise, for one has the opportunity to specify those, like stating that a particular marriage is between two persons (so, no group marriages allowed). Whether one always needs such more precise constraints is a separate matter.

Then there’s the Employee example mentioned in this post’s introduction with two alternate ways of representing it:

That is, a modeller chooses between representing the role an object performs/has as a subclass of that object or in a separate hierarchy of roles. Foundational ontologies take the latter option, domain ontologies the former.
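In description logic notation, the two options can be sketched roughly as follows. This is only an informal rendering of the Employee example, not the formalisation from the paper; the names Role and inheresIn are assumptions chosen for illustration.

```latex
% Option 1 (typical for domain ontologies): the role is a subclass of the object
\[ \mathit{Employee} \sqsubseteq \mathit{Person} \]

% Option 2 (typical for foundational ontologies): the role sits in a separate
% hierarchy of roles and is related to the object that bears it
\[ \mathit{Employee} \sqsubseteq \mathit{Role} \sqcap \exists \mathit{inheresIn}.\mathit{Person} \]
```

An alignment between the two then has to relate the single subsumption of option 1 to the combination of class and object property in option 2, which is why a plain 1:1 class mapping does not suffice.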

These examples are instantiations of small modelling patterns (of which there may be more than the six formalised in the paper). To devise mappings between them, one ends up with alignments that are between two patterns, rather than 1:1 mappings. To get there, we had to take some preliminary steps on how to represent it all formally, such as specifying the language for a pattern and defining an ontology pattern alignment. This allowed us to formalise the patterns and devise the formal specification of the heterogeneous alignments.

That outcome, in turn, feeds into the alignment pattern search and checking algorithms. The algorithms show that it is feasible to find those patterns automatically, which then can propose possible alignments to the modeller, and that, upon aligning, one can check whether that’s done correctly. For instance, take the following two ontologies graphically represented in an (extended, enhanced) ICOM tool:

Two inter-ontology assertions have been made, pointed out with the two yellow arrows; i.e., ‘Tennis’ is a subclass of ‘Tournament’ and ‘TennisPlayer’ is a subclass of ‘Athlete’. The pattern search algorithm then will try to find instantiations for the small modelling patterns for alignment. Once something is found—in this case, pattern A fits—it will check whether all conditions for the alignment can be satisfied, and if so, it will propose a possible alignment, which is shown in the following illustrative figure:

Of interest here is, perhaps, the ‘new’ object property being proposed, indicated with the yellow arrow, that amounts to an equivalence to the partOf+Match+played. (That threesome can’t be mapped as equivalent to ‘participated’ due to differences in domain and range axioms, and drawing three subsumption lines from ‘participated’ to ‘part of’, ‘Match’, and ‘played’ is awkward.) The algorithms’ output thus reduces the alignment to a final question to the modeller along the lines of “are you ok with the alignment between the purple elements in the two diagrams?”, which the modeller can then accept or reject. Please refer to the paper for further details.
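To give a flavour of what such a pattern search does, here is a toy Python sketch. It is not the algorithm from the paper. It assumes that one ontology relates two classes directly (Athlete participated Tournament), that the other goes via an intermediate class (TennisPlayer played Match, Match partOf Tennis), and that the inter-ontology subclass assertions are given as pairs. The domain and range directions of the properties, the Edge type, and the function name are my assumptions for the illustration.

```python
from typing import List, NamedTuple, Set, Tuple


class Edge(NamedTuple):
    """An object property with its domain (source) and range (target) class."""
    source: str
    prop: str
    target: str


def find_chain_candidates(
    o1: List[Edge],
    o2: List[Edge],
    subclass_of: Set[Tuple[str, str]],
) -> List[Tuple[Edge, Edge, Edge]]:
    """Find a direct property in o1 that may correspond to a two-step chain in o2.

    A candidate is reported when the start of the chain is asserted to be a
    subclass of the direct property's domain, and the end of the chain is
    asserted to be a subclass of its range.
    """
    candidates = []
    for direct in o1:
        for first in o2:
            if (first.source, direct.source) not in subclass_of:
                continue
            for second in o2:
                chained = second.source == first.target
                if chained and (second.target, direct.target) in subclass_of:
                    candidates.append((direct, first, second))
    return candidates


if __name__ == "__main__":
    ontology_1 = [Edge("Athlete", "participated", "Tournament")]
    ontology_2 = [
        Edge("TennisPlayer", "played", "Match"),
        Edge("Match", "partOf", "Tennis"),
    ]
    # The two inter-ontology assertions made by the modeller.
    assertions = {("TennisPlayer", "Athlete"), ("Tennis", "Tournament")}

    for direct, first, second in find_chain_candidates(ontology_1, ontology_2, assertions):
        print(f"Propose aligning '{direct.prop}' with "
              f"'{first.prop}' + {first.target} + '{second.prop}'?")
```

A real implementation would also have to check the remaining conditions of the pattern before proposing the alignment; the sketch only covers the structural match.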

The principles presented could possibly be used also for refactoring of an ontology, like in TDD [2] or when ‘preparing’ an ontology to align to a foundational ontology. More results on this topic are in the pipeline, and if you want to know now already, we can have a chat at ESWC.

References

[1] Fillottrani, P.R., Keet, C.M. Patterns for Heterogeneous TBox Mappings to Bridge Different Modelling Decisions. 14th Extended Semantic Web Conference (ESWC’17). Springer LNCS. Portoroz, Slovenia, May 28 – June 2, 2017. (in print)

[2] Keet, C.M., Lawrynowicz, A. Test-Driven Development of Ontologies. In: Proceedings of the 13th Extended Semantic Web Conference (ESWC’16). Springer LNCS 9678, 642-657. 29 May – 2 June, 2016, Crete, Greece.