If you have read any of the blog posts on (automated) natural language generation for isiZulu, then you’ll probably agree with me that isiZulu verbs are non-trivial. True, verbs in other languages are most likely not as easy as in English, or Afrikaans for that matter (e.g., they made irregular verbs regular), but there are many little ‘bits and pieces’ ‘glued’ onto the verb root that make it semantically a ‘heavy’ element in a sentence. For instance:
- Aba-shana ba-ya-zi-theng-is-el-an-a izimpahla
- Children 2.SC-Pres-8.OC-buyVR -C-A-R-FV 8.clothes
- ‘The children are selling the clothes to each other’
The ba is the subject concord (~conjugation) to match with the noun class (which is 2) of the noun that plays the subject in the sentence (abashana), the ya denotes a continuous action (‘are doing something’ in the present), the zi is the object concord for the noun class (8) of the noun that plays the object in the sentence (izimpahla), theng is the verb root, then comes the CARP extension with is the causative (turning ‘buy’ into ‘sell’), and el the applicative and an the reciprocative, which take care of the ‘to each other’, and then finally the final vowel a.
More precisely, the general basic structure of the verb is as follows:
where NEG is the negative; SC the subject concord; T/A denotes tense/aspect; MOD the mood; OC the object concord; Verb Rad the verb radical; C the causative; A the applicative; R the reciprocal; and P the passive. For instance, if the children were not selling the clothes to each other, then instead of the SC, there would be the NEG SC in that position, making the verb abayazithengiselana.
To make sense of all this in a way that it would be amenable to computation, we—my co-author Langa Khumalo and I—specified the grammar of the complex verb for the present tense in a CFG using an incremental process of development. To the best of our (and the reviewer’s) knowledge, the outcome of the lengthy exercise is (1) the first comprehensive and precisely formulated documentation of the grammar rules for the isiZulu verb present tense, (2) all together in one place (cf. fragments sprinkled around in different papers, Wikipedia, and outdated literature (Doke in 1927 and 1935)), and (3) goes well beyond handling just one of the CARP, among others. The figure below summarises those rules, which are explained in detail in the forthcoming paper “Grammar rules for the isiZulu complex verb”, which will be published in the Southern African Linguistics and Applied Language Studies  (finally in print, yay!).
It is one thing to write these rules down on paper, and another to verify whether they’re actually doing what they’re supposed to be doing. Instead of fallible and laborious manual checking, we put them in JFLAP (for the lack of a better alternative at the time; discussed in the paper) and tested the CFG both on generation and recognition. The tests went reasonably well, and it helped fixing a rule during the testing phase.
Because the CFG doesn’t take into account phonological conditioning for the vowels, it generates strings not in the language. Such phonological conditioning is considered to be a post-processing step and was beyond the scope of elucidating and specifying the rules themselves. There are other causes of overgeneration that we did not get around to doing, for various reasons: there are rules that go across the verb root, which are simple to represent in coding-style notation (see paper) but not so much in a CFG, and rules for different types of verbs, but there’s no available resource that lists which verb roots are intransitive, which as monosyllabic and so on. We have started with scoping rules and solving issues for the latter, and do have a subset of phonological conditioning rules; so, to be continued… For now, though, we have completed at least one of the milestones.
Last, but not least, in case you wonder what’s the use of all this besides the linguistics to satisfy one’s curiosity and investigate and document an underresourced language: natural language generation for intelligent user interfaces in localised software, spellcheckers, and grammar checkers, among others.
 Keet, C.M., Khumalo, L. Grammar rules for the isiZulu complex verb. Southern African Linguistics and Applied Language Studies, (in print). Submitted version (the rules are the same as in the final version)
Pingback: ICTs for the South African indigenous languages should be a national imperative, too | Keet blog