Preliminary results on multilingual ontologies in Bantu languages

As the avid reader of this blog may remember, I wrote about isiZulu verbalization of ontologies before, which presupposed that there was some way in which the isiZulu terms were stored in the ontology, but it did not say anything about those details. In addition, with the 11 official languages in South Africa, some multilingualism may have to be catered for as well. Multilingual ontologies, be it for localization or internationalization of ontologies, are a hot topic: lots of results are becoming available, and one of the linguistic models for multilingual ontologies, ontolex-lemon, is a W3C Community Group result (specs). We, being my co-author Catherine Chavula and I, now have some first insights into that for Bantu languages, which are described in the paper “Is Lemon Sufficient for Building Multilingual Ontologies for Bantu Languages?” that was accepted recently at the 11th OWL: Experiences and Directions Workshop (OWLED’14), where I’ll present the paper (Riva del Garda, Italy, Oct 17-18, 2014).

The answer to the question in the title of the paper is a ‘not quite’. To justify that, we first identify the requirements for building Bantu lexica, be it in lemon format or another, with a focus in the paper on Chichewa (a language spoken widely in Malawi) and a bit on isiZulu. The Bantu noun class system is challenging, especially when taken together with the verb conjugation that is necessary for the OWL object properties. Noun classes are used to group nouns together, like masculine and feminine in some languages, but based on semantic criteria, like whether the noun refers to a person, an animal, a long thin object, etc. Bantu languages have somewhere between 10 and 23 noun classes, and they affect word forms. This in itself requires some creativity for creating a lexicon for an ontology, but the issue is exacerbated when considering the verbs, which are used to name object properties in an OWL ontology.

The common ontology development suggestion is to put a verb in 3rd person singular to name the object property. That won’t work that easily for Bantu languages, however: the noun class of the noun (of the OWL class) that plays the subject (or: the first class in, say, an all-some axiom) determines how a verb is conjugated. For instance, if a person (in noun class 1) eats something, it is udla (in isiZulu), whereas when a giraffe (in noun class 9 in isiZulu) eats something, it is idla. In lemon, this would amount to an awful lot of rules snuck into each lemon lexicon, hand-crafted for each OWL class where it applies (i.e., for those axioms in which a particular object property appears), and thus also with a lot of duplication, which is undesirable. Even when you know that the domain and range will be one OWL class (e.g., always person), the entry, which uses the lemon Morphology module, is non-trivial (fig 5 in the paper shows it for foaf:knows in Chichewa).
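
To make the udla/idla contrast concrete, here is a minimal Python sketch of the dependency: the verb form is computed from the noun class of the subject rather than stored once per object property. It is not how lemon or the paper encodes it; the names `SUBJECT_CONCORD` and `conjugate` are made up for illustration, and the table holds only the two isiZulu noun classes mentioned above.

```python
# Hypothetical, partial lookup table: only the two isiZulu subject
# concords that appear in the example above (noun classes 1 and 9).
SUBJECT_CONCORD = {
    1: "u",  # e.g. a person eats: udla
    9: "i",  # e.g. a giraffe eats: idla
}


def conjugate(verb_root: str, noun_class: int) -> str:
    """Prefix the verb root with the subject concord of the noun class."""
    try:
        concord = SUBJECT_CONCORD[noun_class]
    except KeyError:
        # The table is deliberately incomplete; fail loudly instead of guessing.
        raise ValueError(
            f"no subject concord recorded for noun class {noun_class}"
        ) from None
    return concord + verb_root


if __name__ == "__main__":
    print(conjugate("dla", 1))  # udla
    print(conjugate("dla", 9))  # idla
```

Keeping one rule per noun class, instead of one hand-crafted rule per OWL class, is exactly the kind of reuse that would avoid the duplication described above.
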
Annotating an ontology with noun classes and lemon is possible, but not immediately with an ‘out of the box’ lemon. The reason for this is that there was no linguistic resource that actually had sufficient information on the noun class system. So we had to develop a small noun class ontology so that it can be used in conjunction with other linguistic resources such as LexInfo. This is described in some detail in the paper. An example of the Chichewa nc:1 and nc:2 morphology using lemon rules is as follows:

[Figure: fig3owled, lemon rules for the Chichewa nc:1 and nc:2 morphology]

To put lemon to the test with this ncs ontology, Catherine made a version of FOAF in Chichewa using lemon, and did part of the GoodRelations ontology as well (available here). The Chichewa lexicon entry for foaf:Person, which uses lemon, LexInfo, and the ncs ontology, looks like this:

[Figure: fig4owled, the Chichewa lexicon entry for foaf:Person]

The paper closes with some open issues that will have to be addressed to increase usability of lemon and ‘Bantu ontologies’, and we’re working on some of them (to be continued…).

The presentation of this paper and 10 other full presentations, 2 short presentations, several posters and demos, and two invited talks (by Nicola Guarino and Claudia d’Amato) are on the programme of OWLED’14. Registration is open, and I hope to see you there!


