More results on a CNL for isiZulu

Although it has been a bit quiet here on the controlled natural languages for isiZulu front, lots of new stuff is in the pipeline, and the substantially extended version of our CNL14 and RuleML14 papers [1,2] is in print for publication in the Language Resources and Evaluation journal: Toward a knowledge-to-text controlled natural language of isiZulu [1] (online at LRE as well).

For those who haven’t read the other blog post or the papers on the topic, a brief introduction: for a plethora of reasons, one would want to generate natural language sentences based on some data, information, or knowledge stored on the computer. For instance, to generate automatically weather reports in isiZulu or to browse or query ‘intelligently’ online annotated newspaper text that is guided by an ontology behind-the-scenes in the inner workings of the interface. This means ‘converting’ structured input into structured natural language sentences, which amounts to a Controlled Natural Language (CNL) that is a fragment of the full natural language. For instance, class subsumption in DL (“\sqsubseteq “) is verbalised in English as ‘is a/an’. In isiZulu, it is y- or ng- depending on the first character of the name of the superclass. So, in its simplest form, indlovu \sqsubseteq isilwane (that is, elephant \sqsubseteq animal in an ‘English ontology’) would, with the appropriate algorithm, generate the sentence (be verbalized as) indlovu yisilwane (‘elephant is an animal’).

In the CNL14 and RuleML14 papers, we looked into what could be the verbalisation patterns for subsumption, disjointness, conjunction, and simple existential quantification, we evaluated which ones were preferred, and we designed algorithms for them, as none of them could be done with a template. The paper in the LRE journal extends those works with, mainly: a treatment of verbs (OWL object properties) and their conjugation, updated/extended algorithms to deal with that, design considerations for those algorithms, walk-throughs of the algorithms, and an exploratory evaluation to assess the correctness of the algorithm (is the sentence generated [un]grammatical and [un]ambiguous?). There’s also a longer discussion section and more related works.

Conjugation of the verb in isiZulu is not as trivial as in English, where, for verbalizing knowledge represented in ontologies, one simply uses the 3rd person singular (e.g., ‘eats’) or plural (‘eat’) anywhere it appears in an axiom. In isiZulu, it is conjugated based on the noun class of the noun to which it applies. There are 17 noun classes. For instance, umuntu ‘human’ is in noun class 1, and indlovu in noun class 9. Then, when a human eats something, it is umuntu udla whereas with the elephant, it is indlovu idla. Negating it is not simply putting a ‘not’ or ‘does not’ in front of it, as is the case in English (‘does not eat’), but it has its own conjugation (called negative subject concord) again for each noun class, and modifying the final vowel; the human not eating something then becomes umuntu akadli and for the elephant indovu ayidli. This is now precisely captured in the verbalization patterns and algorithms.

Though a bit tedious and not an easy ride compared to a template-based approach, but surely doable to put in an algorithm. Meanwhile, I did implement the algorithms. I’ll readily admit it’s a scruffy Python file and you’ll have to type the function in the interpreter rather than having it already linked to an ontology, but it works, and that’s what counts. (see that flag put in the sand? 😉 ) Here’s a screenshot with a few examples, just to show that it does what it should do.

Screenshot showing the working functions for verbalising subsumption, disjointness, universal quantificaiton, existential quantification and its negation, and conjunction.

Screenshot showing the working functions for verbalising subsumption, disjointness, universal quantificaiton, existential quantification and its negation, and conjunction.

The code and other files are available from the GeNi project page. The description of the implementation, and the refinements we made along the way in doing so (e.g., filling in that ‘pluralise it’ of the algorithm), is not part of the LRE article, for we were already pushing it beyond the page limit, so I’ll describe that in a later post.

 

References

[1] Keet, C.M., Khumalo, L. Toward verbalizing logical theories in isiZulu. 4th Workshop on Controlled Natural Language (CNL’14), Davis, B, Kuhn, T, Kaljurand, K. (Eds.). Springer LNAI vol. 8625, 78-89. 20-22 August 2014, Galway, Ireland.

[2] Keet, C.M., Khumalo, L. Basics for a grammar engine to verbalize logical theories in isiZulu. 8th International Web Rule Symposium (RuleML’14), A. Bikakis et al. (Eds.). Springer LNCS vol. 8620, 216-225. August 18-20, 2014, Prague, Czech Republic.

[3] Keet, C.M., Khumalo, L. Toward a knowledge-to-text controlled natural language of isiZulu. Language Resources and Evaluation, 2016: in print. DOI: 10.1007/s10579-016-9340-0

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s