Yes, Google Translate English-isiZulu does exist, but it has many errors (some very funny) and there’s a lot more to Natural Language Generation (NLG) than machine translation, such as natural language-based query interfaces that has some AI behind it, and they are needed, too [1]. Why should one bother with isiZulu? Muendane has his lucid opinions about that [2], and in addition to that, it is the first language of about 23% of the population of South Africa (amounting to some 10 million people), about half can speak it, and it is a Bantu language, which is spoken by nearly 300 million people—what works for isiZulu grammar may well be transferrable to its related languages. Moreover, it being in a different language family than the more well-resourced languages, it can uncover some new problems to solve for NLG, and facilitate access to online information without the hurdle of having to learn English or French first, as is the case now in Sub-Saharan Africa.
The three principal approaches for NLG are canned text, templates, and grammars. While I knew from previous efforts [3] that the template-based approach is very well doable but has its limitations, and knowing some basic isiZulu, I guessed it might not work with the template-based approach but appealing if it would (for a range of reasons), that no single template could be identified so far was the other end of the spectrum. Put differently: we had to make a start with something resembling the foundations of a grammar engine.
Langa Khumalo, with the Linguistics program and director of the University Language Planning and Development Office at the University of KwaZulu-Natal, and I have been trying to come up with isiZulu NLG. We have patterns and algorithms for (‘simple’) universal and existential quantification, subsumption, negation (class disjointness), and conjunction; or: roughly OWL 2 EL and a restricted version of ALC. OWL 2 EL fist neatly with SNOMED CT, and therewith has the potential for interactive healthcare applications with the isiZulu healthcare terminologies that are being developed at UKZN.
The first results on isiZulu NLG are described in [4,5], which was not an act of salami-slicing, but we had more results than that fitted in a single paper. The first paper [4] will appear in the proceedings ofthe 4th workshop on Controlled Natural language (CNL’14), and is about finding those patterns and, for the options available, an attempt at figuring out which one would be best. The second paper [5], which will appear in the 8th International Web Rule Symposium (RuleML’14) conference proceedings, is more about devising the algorithms to make it work and how to actually generate those sentences. Langa and I plan to attend both events, so you can ask us about the details either in Prague (18-20 Aug) or Galway (20-22 Aug) in person. In the meantime, the CRCs of the papers are online (here and here).
Regarding the technical aspects, the main reasons why we cannot get away with devising templates to generate isiZulu controlled natural language is that isiZulu is non-trivial:
- There is a whole system of noun classes: nouns are grouped in one of the 17 noun classes, each with their own peculiarities, which is illustrated in Figure 1, below;
- Agglutination, informally: putting lots of bits and pieces together to make a word. A selection of those so-called ‘concords’ is included in Figure 2, below;
- Phonological conditioned copulatives, meaning that the ‘is a’ depends on the term that comes after it (ng or y); and
- Complex verb conjugation.
![A few DL symbols, their typical verbalization options in English, and an indication of possible patterns (source: [4])](https://keet.files.wordpress.com/2014/06/zuluconsoptions.png?w=538)
A few DL symbols, their typical verbalization options in English, and an indication of possible patterns (source: [4])
If you don’t feel like reading the two papers, but still would like to have some general overview and examples, you also can check out the slides of the CS colloquium I gave last week. I managed to ‘lure in’ also ICT4D people—and then smack them with a bit of logic and algorithms—but the other option, being talking about the other paper accepted at RuleML, probably would have had to be a ‘cookie colloquium’ to get anyone to attend (more about that paper in another post—it is fascinating, but possibly of less interest to a broader audience). If you want to skip the tedious bits and just get a feel of how one of the algorithms works out: check out the example starting on slide 63, which shows the steps to go from in FOL, or
in DL (“Each professor teaches at least one course”, if the vocabulary were in English), to “Bonke oSolwazi bafundisa isifundo esisodwa”.
Clearly, a lot remains to be done.
References
[1] Alberts, R., Fogwill, T., Keet, C.M. Several Required OWL Features for Indigenous Knowledge Management Systems. 7th Workshop on OWL: Experiences and Directions (OWLED’12). 27-28 May, Heraklion, Crete, Greece. CEUR-WS Vol-849. 12p
[2] Muendane, N.M. I am an African. 2006, Soultalk CC.
[3] Jarrar, M., Keet, C.M., Dongilli, P. Multilingual verbalization of ORM conceptual models and axiomatized ontologies. STARLab Technical Report, Vrije Universiteit Brussels, Belgium. February 2006.
[4] Keet, C.M., Khumalo, L. Toward verbalizing logical theories in isiZulu. 4th Workshop on Controlled Natural Language (CNL’14), 20-22 August 2014, Galway, Ireland. Springer LNAI. (in press)
[5] Keet, C.M., Khumalo, L. Basics for a grammar engine to verbalize logical theories in isiZulu. 8th International Web Rule Symposium (RuleML’14), August 18-20, 2014, Prague, Czech Republic. Springer LNCS (in press).
Hi Maria,
As you know we are here strongly devoted to Prolog and it seems to me that a FOL approach to NLG could benefit from tweaking the Prolog grammar machinery (if needed) to implement the isizulu particularities.
It does not look, however, that we have to mess with Prolog at a low level for we can implement it with normal Prolog constraint programs.
Should you be interested in developing this idea just contact me.
Yours
David
Hi David,
Thanks for your interest, and yes, we’d surely be intested in developing the idea of implementing it, be it prolog (should be possible with the constraints) or some other language. The main constraint would be that it should be able to handle OWL files (one of the application scenarios is with SNOMED CT, which you may find of interest as well).
Regards,
Maria
Pingback: Enjoyable and interesting controlled natural languages workshop (CNL’14) | Keet blog
Pingback: Preliminary results on multilingual ontologies in Bantu languages | Keet blog
Pingback: First steps for isiZulu natural language genera...
Pingback: First steps for isiZulu natural language genera...
Pingback: More results on a CNL for isiZulu | Keet blog