Launch of the isiZulu spellchecker

launchspellchecker

Langa Khumalo, ULPDO director, giving the spellchecker demo, pointing out a detected spelling error in the text. On his left, Mpho Monareng, CEO of PanSALB.

Yesterday, the isiZulu spellchecker was launched at UKZN’s “Launch of the UKZN isiZulu Books and Human Language Technologies” event, which was also featured on 702 live radio, SABC 2 Morning Live, and e-news during the day. What we at UCT have to do with it is that both the theory and the spellchecker tool were developed in-house by members of the Department of Computer Science at UCT. The connection with UKZN’s University Language Planning & Development Office is that we used a section of their isiZulu National Corpus (INC) [1] to train the spellchecker with, and that they wanted a spellchecker (the latter came first).

The theory behind the spellchecker was described briefly in an earlier post and it has been presented at IST-Africa 2016 [2]. Basically, we don’t use a wordlist + rules-based approach as some experiments of 20 years ago did, nor a wordlist + a few rules of the now-defunct translate.org.za OpenOffice v3 plugin seven years ago, but a data-driven approach with a statistical language model that uses tri-grams. The section of the INC we used were novels and news items, so, including present-day isiZulu texts. At the time of the IST-Africa’16 paper, based on Balone Ndaba’s BSc CS honours project, the spell checking was very proof-of-concept, but it showed that it could be done and still achieve a good enough accuracy. We used that approach to create an enduser-usable isiZulu spellchecker, which saw the light of day thanks to our 3rd-year CS@UCT student Norman Pilusa, who both developed the front-end and optimised the backend so that it has an excellent performance.

Upon starting the platform-independent isiZulu_spellchecker.jar file, the English interface version looks like this:

zuspellopen

You can write text in the text box, or open a txt or docx file, which then is displayed in the textbox. Click “Run”. Now there are two options: you can choose to step-through the words that are detected as misspelled one at a time or “Show All” words that are detected as misspelled. Both are shown for some sample text in the screenshot below.

zuspellonessection

processing one error at a time

zuspellallsection

highlighting all words detected as very probably misspelled

Then it is up to you to choose what to do with it: correct it in the textbox, “Ignore once”, “Ignore all”, or “Add” the word to your (local) dictionary. If you have modified the text, you can save it with the changes made by clicking “Save correction”. You also can switch the interface from the default English to isiZulu by clicking “File – Use English”, and back to English via “iFayela – ulimi lesingisi”. You can download the isiZulu spellchecker from the ULPDO website and from the GitHub repository for those who want to get their hands on the source code.

To anticipate some possible questions you may have: incorporating it as a plugin to Microsoft word, OpenOffice/LibreOffice, and Mozilla Firefox was in the planning. The former is technologically ‘closed source’, however, and the latter two have a certain way of doing spellchecking that is not amenable to the data-driven approach with the trigrams. So, for now, it is a standalone tool. By design, it is desktop-based rather than for mobile phones, because according to the client (ULPDO@UKZN), they expect the first users to be professionals with admin documents and emails, journalists writing articles, and such, writing on PCs and laptops.

There was also a trade-off between a particular sort of error: the tool now flags more words as probably incorrect than it could have, yet it will detect (a subset of) capitalization, correctly, such as KwaZulu-Natal whilst flagging some of the deviant spellings that go around, as shown in the screenshot below.

zuspellkznThe customer preferred recognising such capitalisation.

Error correction sounds like an obvious feature as well, but that will require a bit more work, not just technologically, but also the underlying theory. It will probably be an honours project topic for next year.

In the grand scheme of things, the current v1 of the spellchecker is only a small step—yet, many such small steps in succession will get one far eventually.

The launch itself saw an impressive line-up of speeches and introductions: the keynote address was given by Dr Zweli Mkhize, UKZN Chancellor and member of the ANC NEC; Prof Ramesh Krishnamurthy, from Aston University UK, gave the opening address; Mpho Monareng, CEO of PanSALB gave an address and co-launched the human language technologies; UKZN’s VC Andre van Jaarsveld provided the official welcome; and two of UKZN’s DVCs, Prof Renuka Vithal and Prof Cheryl Potgieter, gave presentations. Besides our ‘5-minutes of fame’ with the isiZulu spellchecker, the event also launched the isiZulu National Corpus, the isiZulu Term Bank, the ZuluLex mobile-compatible application (Android and iPhone), and two isiZulu books on collected short stories and an English-isiZulu architecture glossary.

 

References

[1] Khumalo, L. Advances in developing corpora in African languages. Kuwala, 2015, 1(2): 21-30.

[2] Ndaba, B., Suleman, H., Keet, C.M., Khumalo, L. The Effects of a Corpus on isiZulu Spellcheckers based on N-grams. IST-Africa 2016. May 11-13, 2016, Durban, South Africa.

Advertisements