Despite the best intentions, I did not get around to writing a post on the paper I presented last week at the First International Conference on Language, Data and Knowledge (LDK 2017), 19-20 June, Galway, Ireland, and now Paul Groth has also ‘beaten’ me to it with a nice conference report. On the bright side, it is an opportunity to say up front that I really enjoyed the conference and look forward to the next edition in 2019. The ESWC’17 organisers might be slightly disappointed that there was no special track on the multilingual Semantic Web after all, but I got the distinct impression that the LDK17 authors may simply all have gambled on LDK17, an opportunity to binge for two days on all things natural language and Semantic Web, rather than on a single track at an overpriced conference (despite the allure of its A rating).
So, what was my paper about that could have been submitted to either? I ended up struggling with, and eventually solving, an issue of aligning OWL object properties that were not simple 1:1 mappings, similar in scope to our ESWC17 paper (introduced here) [4], but with many more complications. These arose from the different conceptualisations of part-whole relations, and from the requirement to decide what to do with an object property (relation, relationship) that does not have a neat, single label, and therefore fits neither the common OWL modelling paradigm nor the recently agreed-upon ontolex-lemon model for linguistic annotations.
The start of all this sounded nice and doable: we need to generate natural language for healthcare, using, e.g., SNOMED CT, in local languages in South Africa, focussing on the largest one, isiZulu. Medical terminologies are riddled with part-whole relations, so we sought to address those (simple existentials having already been solved), availing of a standard list of part-whole relations (e.g., [1]). That turned out to be a non-trivial exercise, but doable eventually [2]. What was not addressed in [2] was that some ‘common’ part-whole relations, such as membership and containment, are not like that in isiZulu at all. Moreover, it was not just a language issue, but an ontological one as well. The LDK17 paper “Representing and aligning similar relations: parts and wholes in isiZulu vs English” [3] describes this in some detail.
Here’s a (simplified) list of (assumed to be) common part-whole relations, which takes into account both transitivity differences and domain and range:
Now here’s the one based on the isiZulu language and some ontological analysis of that:
That is, there are both generalisations (some distinctions are not made) and specialisations (some distinctions are made here but not elsewhere). For instance, ‘musician is part of some orchestra’ and ‘heart is part of some human’ (or vice versa) are conceptualised and described in the same way (ingxenye for ‘part of’ and SC+CONJ for ‘has part’ [more about that below]). Yet there is a difference between an individual (e.g., a voter) participating in some process and a collective (e.g., the electorate) participating in a process, or vice versa. The paper describes this more precisely, going into some detail on the differences in the categories of domain and range, and on the consequences for the transitivity of mereological parthood.
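To make the generalisation/specialisation idea a bit more concrete, here is a minimal Python sketch of a correspondence table between the two sets of relations. It is an illustration under stated assumptions, not the paper's formalisation: the relation names (`structural_part_of`, `member_of`, `participates_in`, and the two `participates_in_*` targets) and the `align` function are hypothetical; only ingxenye and the examples in the comments come from the text above.

```python
# HYPOTHETICAL relation names, for illustration only; the paper's own
# relations and their formalisation differ.

# Generalisation: distinct English-side relations map to one isiZulu-side relation.
GENERALISE = {
    "structural_part_of": "ingxenye",  # heart is part of some human
    "member_of": "ingxenye",           # musician is part of some orchestra
}

# Specialisation: one English-side relation splits by the category of its domain.
SPECIALISE = {
    ("participates_in", "individual"): "participates_in_individual",  # a voter
    ("participates_in", "collective"): "participates_in_collective",  # the electorate
}


def align(relation: str, domain_category: str) -> str:
    """Map an English-side part-whole relation to its isiZulu-side counterpart."""
    if relation in GENERALISE:
        # Many-to-one: the domain category does not matter here.
        return GENERALISE[relation]
    try:
        # One-to-many: the domain category selects the target relation.
        return SPECIALISE[(relation, domain_category)]
    except KeyError:
        raise ValueError(
            f"no correspondence for {relation!r} with a {domain_category} domain"
        ) from None
```

Keying the specialisation on the category of the domain, rather than on the relation alone, is what makes the mapping one-to-many in one direction and many-to-one in the other; a plain 1:1 property equivalence cannot express either.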
The other ‘odd thing’ (odd with respect to current multilingual Semantic Web assumptions and technologies, that is) is that while the conceptualisation of ‘has part’ exists, it does not have a single label as in English (or in several other languages, such as Dutch heeft als deel); instead, it depends on the noun classes of the nouns that play the part and the whole in the relation. It combines the subject concord (~conjugation) of the noun class of the noun that plays the whole with a conjunction that is phonologically conditioned by the first letter of the noun that plays the part; with verbalisation in the plural and three phonological cases, there are 18 possible strings, all denoting ‘has part’. This could still be handled in a language that supports inverses, provided the part-of direction has a name, such as ingxenye. This is not the case for containment, however. Instead of the relation (object property) having a name, be it a verb like ‘contained in’ or some noun phrase, it is the noun that plays the whole (the container, if you will) that gets modified. For instance, imvilophu ‘envelope’ becomes emvilophini ‘contained in the envelope’, and, for individuals and locations, the city iTheku ‘Durban’ becomes eThekwini ‘located in Durban’ (no typo: there is some phonological conditioning I am brushing over). While I have gotten used to such constructions, several attendees were surprised that one can have notions, concepts, views on, or interpretations or descriptions of reality that exist but do not have even a single string of text to refer to them, regardless of the context in which they are used.
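The ‘has part’ construction described above (subject concord of the whole plus a phonologically conditioned conjunction) can be sketched in a few lines of Python. This is a simplified illustration, not the GeNI code: the concord table covers only a small subset of plural noun classes, the coalescence rules for the conjunction na- (na- + a → na, na- + i → ne, na- + u → no) are simplified, and `has_part_verb` is a made-up name.

```python
# Subject concords (SC) for a SUBSET of isiZulu plural noun classes,
# for illustration only; a full implementation needs all classes.
SUBJECT_CONCORDS = {2: "ba", 4: "i", 6: "a", 8: "zi", 10: "zi"}

# Simplified coalescence of the conjunction na- with the initial vowel
# of the noun that plays the part.
NA_COALESCENCE = {"a": "na", "i": "ne", "u": "no"}


def has_part_verb(whole_class: int, part_noun: str) -> str:
    """Build the 'has part' string from the whole's noun class and the part noun."""
    sc = SUBJECT_CONCORDS[whole_class]
    first = part_noun[0].lower()
    if first not in NA_COALESCENCE:
        raise ValueError(f"unexpected initial vowel in {part_noun!r}")
    # The part noun's initial vowel merges into the conjunction, so it is dropped.
    return sc + NA_COALESCENCE[first] + part_noun[1:]
```

For example, `has_part_verb(2, "inhliziyo")` yields banenhliziyo, for a class 2 whole such as abantu ‘people’ and the part inhliziyo ‘heart’: the string changes with both the whole's noun class and the part's first vowel, which is why no single label can be attached to the object property.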
The naming issue was solved by giving the object property an arbitrary string as its ‘name’ and linking that name to the function that verbalises that specific part-whole relation. The other issue, i.e., that the two languages do not have the same set of part-whole relations, required a bit more work. It was addressed with ontology pattern alignments, by extending one correspondence pattern from the ODP catalogue and introducing a new one (see the paper for the formal details), within the same broad formalisation framework as proposed in [4].
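The naming workaround can be pictured as a lookup from an opaque property identifier to a verbalisation function. In this minimal sketch, the identifiers (`zu_rel_001`, `zu_rel_002`) and all function names are hypothetical, and the locative forms are looked up in a tiny lexicon holding only the two examples given earlier (imvilophu/emvilophini and iTheku/eThekwini) rather than computed, precisely because of the phonological conditioning mentioned above.

```python
from typing import Callable, Dict

# Locative forms for the two examples in the text only; real locative
# formation involves phonological conditioning, so this sketch looks them up.
LOCATIVES: Dict[str, str] = {
    "imvilophu": "emvilophini",  # 'envelope' -> 'contained in the envelope'
    "iTheku": "eThekwini",       # 'Durban' -> 'located in Durban'
}


def verbalise_part_of(whole_noun: str) -> str:
    """Part-of has its own word, so the whole noun is not needed here.
    (The surrounding sentence morphology is omitted in this sketch.)"""
    return "ingxenye"


def verbalise_by_locative(whole_noun: str) -> str:
    """Containment/location: no relation word; the noun playing the whole is modified."""
    try:
        return LOCATIVES[whole_noun]
    except KeyError:
        raise ValueError(f"no locative form known for {whole_noun!r}") from None


# HYPOTHETICAL, deliberately meaningless object property names, each mapped
# to the function that verbalises that part-whole relation.
VERBALISERS: Dict[str, Callable[[str], str]] = {
    "zu_rel_001": verbalise_part_of,
    "zu_rel_002": verbalise_by_locative,
}


def verbalise(property_name: str, whole_noun: str) -> str:
    """Return the string that carries the relation for the given property."""
    return VERBALISERS[property_name](whole_noun)
```

The identifier itself carries no linguistic content; all the language-specific work happens in the function it points to, which is what lets a relation without any single label still have an entry in the ontology.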
All this was then implemented and aligned, and verified not to result in any unsatisfiable classes or object properties, or in an inconsistent ontology (files). It also works in the isiZulu verbalisation tool we demoed at ESWC17 (described in the previous post) [5], all as part of the NRF-funded GeNI project.
Ideally, I would already have had the time to read the papers I flagged in my LDK17 conference notes with “check paper”. I haven’t yet, due to end-of-semester tasks. So, on the basis of just a positive-seeming presentation, here are a few that are at the top of my list to check out first, for quite different reasons:
- Interaction between natural language reading capabilities and math education, focusing on language production (i.e., ‘can you talk about it?’) [6], mainly because math education in South Africa faces a lot of problems. It also generated a lively discussion in the Q&A session.
- The OnLiT ontology for linguistic terminology [7] and LLODifying linguistic glosses [8] (one of the two also won the best paper award).
- Deep text generation, because it tries to address skewed or limited data to learn from [9], which is an issue we face when trying to do NLP with most South African languages.
[1] Keet, C.M., Artale, A. Representing and Reasoning over a Taxonomy of Part-Whole Relations. Applied Ontology, 2008, 3(1-2):91-110.
[2] Keet, C.M., Khumalo, L. On the verbalization patterns of part-whole relations in isiZulu. 9th International Natural Language Generation conference (INLG’16), September 5-8, 2016, Edinburgh, UK. ACL.
[3] Keet, C.M. Representing and aligning similar relations: parts and wholes in isiZulu vs English. In: Gracia J., Bond F., McCrae J., Buitelaar P., Chiarcos C., Hellmann S. (eds) Language, Data, and Knowledge LDK 2017. Springer LNAI vol 10318, 58-73.
[4] Fillottrani, P.R., Keet, C.M. Patterns for Heterogeneous TBox Mappings to Bridge Different Modelling Decisions. 14th Extended Semantic Web Conference (ESWC’17). Springer LNCS. Portoroz, Slovenia, May 28 – June 2, 2017.
[5] Keet, C.M., Xakaza, M., Khumalo, L. Verbalising OWL ontologies in isiZulu with Python. 14th Extended Semantic Web Conference (ESWC’17). Springer LNCS. Portoroz, Slovenia, May 28 – June 2, 2017. (demo paper)
[6] Crossley, S., Kostyuk, V. Letting the genie out of the lamp: using natural language processing tools to predict math performance. In: Gracia J., Bond F., McCrae J., Buitelaar P., Chiarcos C., Hellmann S. (eds) Language, Data, and Knowledge LDK 2017. Springer LNAI vol 10318, 330-342.
[7] Klimek, B., McCrae, J.P., Lehmann, C., Chiarcos, C., Hellmann, S. OnLiT: an ontology for linguistic terminology. In: Gracia J., Bond F., McCrae J., Buitelaar P., Chiarcos C., Hellmann S. (eds) Language, Data, and Knowledge LDK 2017. Springer LNAI vol 10318, 42-57.
[8] Chiarcos, C., Ionov, M., Rind-Pawlowski, M., Fäth, C., Wichers Schreur, J., Nevskaya, I. LLODifying linguistic glosses. In: Gracia J., Bond F., McCrae J., Buitelaar P., Chiarcos C., Hellmann S. (eds) Language, Data, and Knowledge LDK 2017. Springer LNAI vol 10318, 89-103.
[9] Dethlefs, N., Turner, A. Deep Text Generation – Using Hierarchical Decomposition to Mitigate the Effect of Rare Data Points. In: Gracia J., Bond F., McCrae J., Buitelaar P., Chiarcos C., Hellmann S. (eds) Language, Data, and Knowledge LDK 2017. Springer LNAI vol 10318, 290-298.