Surprising similarities and differences in orthography across several African languages

It is well known that natural language interfaces and tools in one’s own language are useful in ICT-mediated communication: for instance, spellcheckers, Web search engines, machine translation, or even just straightforward natural language processing to at least ‘understand’ documents and find the right one with a keyword search. Most languages in Southern Africa, and those in the (linguistically called) Bantu language family, are still under-resourced, however, so developing such tools is not a trivial task, due to the limited data and the limited researched and documented grammar. Any possibility to ‘bootstrap’ theory, techniques, and tools developed for one language, and to tweak them just a bit to make them work for a similar one, will save many resources compared to starting from scratch time and again. Likewise, it would be very useful if both the generic and the few language-specific NLP tools for the well-resourced languages could be reused or easily adapted across languages. The question is: does that work? We know very little about whether it does. Taking one step back, then: for that bootstrapping to work well, we need to have insight into how similar the languages are. And we may be able to find that out if only we knew how to measure similarity of languages.

The best-known qualitative way of determining some notion of similarity started with Meinhof’s noun class system [1] and the Guthrie zones. That’s interesting, but not nearly enough for computational tools. An experiment has been done for morphological analysers [2], with promising results, yet it also had more of a qualitative flavour to it.

I’m adding here another proverbial “2 cents” to it, by taking a mostly quantitative approach to it, and focusing on orthography (how things are written down) in text documents and corpora. This was a two-step process. First, 12 versions of the Universal Declaration of Human Rights were examined on tokens and their word length; second, because the UDHR is a quite small document, isiZulu corpora were examined to see whether the UDHR was a representative sample, i.e., whether extrapolation from its results may be justified. The methods, results, and discussion are described in “An assessment of orthographic similarity measures for several African languages” [3].
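To make the first step concrete, here is a minimal sketch of how a cumulative frequency distribution of word length could be computed. It is not the code used for the paper [3]: the function name `word_length_cdf` and the crude regular-expression tokenisation are assumptions for illustration, and the sample input is just a handful of isiZulu words rather than the UDHR text.

```python
import re
from collections import Counter


def word_length_cdf(text):
    """Map each word length to the fraction of tokens with at most that length."""
    # Crude tokenisation (an assumption): maximal runs of letters,
    # so digits and punctuation are ignored.
    tokens = re.findall(r"[^\W\d_]+", text)
    counts = Counter(len(token) for token in tokens)
    total = sum(counts.values())
    cdf = {}
    running = 0
    for length in sorted(counts):
        running += counts[length]
        cdf[length] = running / total
    return cdf


# Four isiZulu tokens with lengths 5, 7, 9, and 8 characters.
print(word_length_cdf("imali kwemali ngezimali abafundi"))
# {5: 0.25, 7: 0.5, 8: 0.75, 9: 1.0}
```

Plotting one such curve per language gives a figure of the kind discussed next; a language with many short tokens climbs towards 1.0 faster than one with many long tokens.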

The really cool thing about the language comparison is that it shows clusters of languages, indicating where bootstrapping may have more or less success, and they do not quite match the Guthrie zones. The cumulative frequency distributions of the words in the UDHR of several languages spoken in Sub-Saharan Africa are shown in the figure below, where the names of the languages are those of the file names of the NLTK data kit that contains the quality translations of the UDHR.

Cumulative frequency distributions of the words in the UDHR of several languages spoken in Sub-Saharan Africa (Source: [3]).


The paper contains some statistical tests, showing that the languages in the bottom cluster are not statistically significantly different from each other, but they are from the ‘middle’ cluster. So, the word length distribution of Kiswahili is substantially different from that of, among others, isiZulu, in that it has more shorter words and isiZulu more longer words, but Kiswahili’s pattern is similar to that of Afrikaans and English. This is important for NLP, for isiZulu is known to be highly agglutinating, but English (and thus also Kiswahili) is disjunctive. How important is such a difference? The simple answer is that grammatical elements of a sentence get ‘glued’ together in isiZulu, whereas at least some of them are written as separate words in Kiswahili. This is not to be conflated with compounding in, say, German, Dutch, and Afrikaans, where nouns can be concatenated to form new words; rather, in isiZulu, e.g., a preposition is glued onto a noun. For instance, ‘of clay’ is ngobumba, contracting nga+ubumba with a vowel coalescence rule (-a + u- = -o-), which thus happens much less often in a language with disjunctive orthography. This, in turn, affects the algorithms needed to computationally process the languages and, hence, the prospects for bootstrapping.
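The vowel coalescence example can be sketched in a few lines of Python. This is only an illustration: the function name `attach` and the rule table are hypothetical, and the table contains only the single rule mentioned above (-a + u- = -o-); a real implementation would need the full set of phonological rules of the language.

```python
# Only the rule given in the text; other rules would be added to this table.
COALESCENCE = {("a", "u"): "o"}


def attach(prefix, noun):
    """Glue a prefix onto a noun, coalescing the two adjacent vowels if a rule applies."""
    if not prefix or not noun:
        return prefix + noun
    key = (prefix[-1], noun[0])
    if key in COALESCENCE:
        # Replace the final vowel of the prefix and the initial vowel
        # of the noun by the single coalesced vowel.
        return prefix[:-1] + COALESCENCE[key] + noun[1:]
    return prefix + noun


print(attach("nga", "ubumba"))  # ngobumba
```

A tokeniser or stemmer for an agglutinating language has to undo such contractions to recover the noun, which is work that is largely unnecessary for a language with disjunctive orthography.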

Note that the middle cluster looks deceptively isolating, but it isn’t. Sesotho and Setswana are statistically significantly different from the others, in that they are even more disjunctive than English. Sepedi (top-most line) even more so. While I don’t know that language, a hypothetical example suffices to illustrate this notion. There is conjugation of verbs, like ‘works’ or trabajas or usebenza (with inflections -s, -as, and u-, respectively), but some orthographer a while ago could have decided to write the inflection separately from the verb stem (e.g., trabaj as and u sebenza instead), hence generating more tokens with fewer characters.

There are other aspects of language and orthography one can ‘play’ with to analyse quantitatively, like whether words mainly end in a vowel or not, and which vowel mostly, and whether two successive vowels are acceptable for a language (for some, it isn’t). This is further described in the paper [3].
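As a rough illustration of such measures, the following sketch computes the share of tokens ending in a vowel and checks a token for two successive vowels. The function names and the simplification that the vowels are exactly a, e, i, o, u are assumptions made here, not the definitions used in the paper [3].

```python
import re

VOWELS = "aeiou"  # simplifying assumption: plain Latin vowels only


def final_vowel_share(tokens):
    """Fraction of tokens whose last character is a vowel."""
    tokens = [token.lower() for token in tokens if token]
    if not tokens:
        return 0.0
    return sum(token[-1] in VOWELS for token in tokens) / len(tokens)


def has_successive_vowels(token):
    """True if the token contains two vowels directly after each other."""
    return re.search(r"[aeiou]{2}", token.lower()) is not None


words = ["imali", "abafundi", "iFacebook", "EFF"]
print(final_vowel_share(words))            # 0.5
print(has_successive_vowels("iFacebook"))  # True
print(has_successive_vowels("imali"))      # False
```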

Yet, the UDHR is just one document. To examine the generalisability of these observations, we need to know whether the UDHR text is a ‘typical’ one. This was assessed in more detail by zooming in on isiZulu both quantitatively and qualitatively with four other corpora and texts in different genres. The results show that the UDHR is a typical text document orthographically, at least for the cumulative frequency distribution of the word length.

There were some other differences across the other corpora, which have to do with genre and datedness, which was observed elsewhere for whole words [4]. For instance, news items of isiZulu newspapers nowadays include words like iFacebook and EFF, which surely don’t occur in a century-old bible translation. They do violate the ‘no two successive vowels’ rule and the ‘final vowel’ rule, though.

On the qualitative side of the matter, what will affect searching for information in texts, text summarization, and error correction in spellcheckers is, again, that agglutination. For instance, searching on imali ‘money’ alone would be woefully inadequate to find all relevant texts; e.g., those news items also include kwemali, yimali, onemali, osozimali, kwezimali, and ngezimali, which are, respectively, of -, and -, that/which/who has -, of – (pl.), about/by/with/per – (pl.) money. Searching on the stem or root only is not going to help you much either, however. Take, for instance, -fund-, of which the results of just two days of Isolezwe news articles are shown in the table below (articles from 2015, when there were protests, too). Depending on what comes before fund and what comes after it, it can have a different meaning, such as abafundi ‘students’ and azifundi ‘they do not learn’.

[Table: occurrences of -fund- in two days of Isolezwe news articles]
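The search problem can be illustrated with the words mentioned above. This is a deliberately naive sketch: the helper names `exact_matches` and `stem_matches` are made up for the example, and plain substring matching stands in for whatever a real search tool would do.

```python
def exact_matches(tokens, query):
    """Tokens that are identical to the query word."""
    return [token for token in tokens if token == query]


def stem_matches(tokens, stem):
    """Tokens that contain the stem anywhere (naive substring match)."""
    return [token for token in tokens if stem in token]


money = ["imali", "kwemali", "yimali", "onemali",
         "osozimali", "kwezimali", "ngezimali"]
print(len(exact_matches(money, "imali")))  # 1: only imali itself is found
print(len(stem_matches(money, "mali")))    # 7: all the variants are found

# The stem alone over-generates: both words contain -fund-,
# but one is 'students' and the other 'they do not learn'.
print(stem_matches(["abafundi", "azifundi"], "fund"))
```

Exact matching misses most relevant tokens, whereas stem matching finds them all but conflates words with different meanings, so neither is adequate without some morphological analysis.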

Placing this in the broader NLP scope, it also affects the widely-used notion of lexical diversity, which, in its basic form, is a type-to-token ratio. Lexical diversity is used as a proxy measure for ‘difficulty’ or level of a text (the higher the more difficult), language development in humans as they grow up, second-language learning, and related topics. Letting that loose on isiZulu text, it will count abafundi, bafundi, and nabafundi as three different types, so wheehee, high lexical diversity, yet in English, it amounts to ‘students’, ‘students’ and ‘and the students’. Put differently, somehow we have to come up with a more meaningful notion of lexical diversity for agglutinating languages. A first attempt is made in the paper in its section 4 [3].
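A small worked example of the basic type-to-token ratio shows the effect. The function below is a minimal sketch of the basic measure only (the name `type_token_ratio` is chosen here), not the adapted measure proposed in the paper [3].

```python
def type_token_ratio(tokens):
    """Number of distinct tokens (types) divided by the number of tokens."""
    tokens = [token.lower() for token in tokens]
    if not tokens:
        return 0.0
    return len(set(tokens)) / len(tokens)


zulu = ["abafundi", "bafundi", "nabafundi"]
# The English glosses of the same three words, split on whitespace.
english = ["students", "students", "and", "the", "students"]

print(type_token_ratio(zulu))     # 1.0: three types in three tokens
print(type_token_ratio(english))  # 0.6: three types in five tokens
```

The same content thus scores maximal diversity in isiZulu and a much lower value in English, purely because of how the words are written.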

Thus, the last word has not been said yet about orthographic similarity, yet we now do have more insight into it. The surprising similarity of isiZulu (South Africa) with Runyankore (Uganda) was exploited in another research activity, and shown to be very amenable to bootstrapping [5], so, in its own way providing supporting evidence for bootstrapping potential that the figure above also indicated as promising.

As a final comment on the tooling side of things, I did use NLTK (Python). It worked well for basic analyses of text, but it (and similar NLP tools) will need considerable customization for the agglutinating languages.

 

References

[1] C. Meinhof. 1932. Introduction to the phonology of the Bantu languages. Dietrich Reiner/Ernst Vohsen, Johannesburg. Translated, revised and enlarged in collaboration with the author and Dr. Alice Werner by N.J. Van Warmelo.

[2] L. Pretorius and S. Bosch. Exploiting cross-linguistic similarities in Zulu and Xhosa computational morphology: Facing the challenge of a disjunctive orthography. In Proceedings of the EACL 2009 Workshop on Language Technologies for African Languages – AfLaT 2009, pages 96–103, 2009.

[3] C.M. Keet. An assessment of orthographic similarity measures for several African languages. Technical report, arXiv 1608.03065. August 2016.

[4] Ndaba, B., Suleman, H., Keet, C.M., Khumalo, L. The Effects of a Corpus on isiZulu Spellcheckers based on N-grams. IST-Africa 2016. May 11-13, 2016, Durban, South Africa.

[5] J. Byamugisha, C. M. Keet, and B. DeRenzi. Bootstrapping a Runyankore CNL from an isiZulu CNL. In B. Davis et al., editors, 5th Workshop on Controlled Natural Language (CNL’16), volume 9767 of LNAI, pages 25–36. Springer, 2016. 25-27 July 2016, Aberdeen, UK.

An orchestration of ontologies for linguistic knowledge

Starting from multilingual knowledge representation in ontologies and an eye on linguistic linked data and controlled natural languages, we had developed a basic ontology for the Bantu noun class system [1] to link with the lemon model [2]. The noun class system is akin to grammatical gender in, e.g., German and Italian, but somewhat different. It is based on semantics of the nouns and each Bantu language has some 12-23 noun classes. For instance, noun classes 1 and 2 are for singular and plural humans, 9 and 10 for animals (singular and plural, respectively), 11 for inanimates and long thin objects (e.g., a telephone cable), and class 14 has abstract nouns (e.g., beauty). Each class has its own augment or augment+prefix to be added to the stem. None of the other linguistic resources, such as ISOcat or the GOLD ontology, dealt with them, so lemon did not either, but we needed it. The first version of the ontology we introduced in [1] had its limitations, but it mostly did its job. Mostly, but not fully.

Lemon needs that morphology module and then some for the rules. The ontology did not fully satisfy Bantu languages other than Chichewa and isiZulu. Built with knowledge of only those two languages, it was more like a merged conceptual data model, for it was tailored to them. Also, it wasn’t aligned to other models or ontologies, thus hampering interoperability and reuse. We didn’t have any competency questions or cool inferences either, because our scope then was just to annotate the names of the classes in an ontology. Hence, it was time for an improvement.

Among others, we don’t want just to annotate, but, given that Bantu languages are under-resourced, see what we can add to derive implicit information, which could help with tagging terms. For instance:

  • if you know abantu is a plural and in noun class 2 and umuntu is the singular of it, then umuntu is in noun class 1, or
  • when it is declared that inja is in noun class 9, then so is its stem -ja (or vv), or
  • language specific, which singular (plural) noun class goes with which plural (singular) noun class: while the majority neatly has a pair of successive odd and even numbers (1-2, 3-4, 5-6 etc), this is not always the case; e.g., in isiZulu, noun class 11 does not have noun class 12 as plural, but noun class 10 (which has its own augment and prefix).
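The kind of derivation listed above can be mimicked in plain Python to show the idea. This is not how the ontologies do it (there, the inferences would follow from OWL axioms and a reasoner); the pairing table, function names, and data structures below are assumptions for illustration, and the table contains only the isiZulu pairs mentioned in this post.

```python
# Singular class -> plural class, for isiZulu (only pairs named in the text).
PLURAL_OF = {1: 2, 3: 4, 5: 6, 9: 10, 11: 10}


def singular_candidates(plural_class):
    """Singular noun classes that have the given class as their plural."""
    return sorted(s for s, p in PLURAL_OF.items() if p == plural_class)


def infer_classes(noun_class, singular_of, stem_of):
    """Derive implicit noun classes from declared ones.

    noun_class:  {word: class} as declared
    singular_of: {plural word: its singular word}
    stem_of:     {word: its stem}
    """
    inferred = dict(noun_class)
    for plural, singular in singular_of.items():
        if plural in inferred and singular not in inferred:
            candidates = singular_candidates(inferred[plural])
            if len(candidates) == 1:  # only infer when unambiguous
                inferred[singular] = candidates[0]
    for word, stem in stem_of.items():
        # A noun and its stem are in the same noun class (either direction).
        if word in inferred and stem not in inferred:
            inferred[stem] = inferred[word]
        elif stem in inferred and word not in inferred:
            inferred[word] = inferred[stem]
    return inferred


result = infer_classes({"abantu": 2, "inja": 9},
                       {"abantu": "umuntu"},
                       {"inja": "-ja"})
print(result["umuntu"], result["-ja"])  # 1 9
print(singular_candidates(10))          # [9, 11]
```

The last line shows why the pairing has to be language-specific: in isiZulu, a plural in class 10 may have its singular in class 9 or class 11, so the singular class cannot be derived from the plural alone.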

Then, besides the interoperability and reuse requirements, we needed to distinguish between language-specific axioms and those that hold across the language family. To solve all that, we developed a framework, reusing the pyramid structure idea from BioTop [3] and the so-called “double articulation principle” of DOGMA [4], where the language-specific axioms are at the level of DOGMA’s conceptual model, for they add specific constraints.

To make a long story short, the framework/orchestration applied to the linguistic knowledge of Bantu noun classes in general, and specific to some language, looks as follows:

framework applied to some linguistics ontologies (source: [5])


More details are described in the recently accepted paper “An orchestration framework for linguistic task ontologies” [5], to be presented at the 9th Metadata and Semantics Research Conference (MTSR’15), to be held from 9 to 11 September in Manchester, UK. My co-author Catherine Chavula will be attending MTSR’15 and present our paper, hoping/assuming that all those last-minute things, like visa and money actually being transferred to buy that plane ticket, will be sorted this month. (Odd ‘checks and balances’ that make life harder and more expensive for people outside of a visa-free zone and tied to a funding benefactor is a topic for some other time.)

The set of ontologies (in OWL) is available in NCS1.zip from my ontologies directory. It contains the goldModule (a module extracted from the GOLD ontology for general linguistics knowledge and that is aligned to the foundational ontology SUMO), the NCS ontology, and three language-specific axiomatizations for the noun classes, being Chichewa, isiXhosa, and isiZulu (more TBA). The same approach can be used for other linguistic features in other language groups or families; e.g., instead of the NCS, one could have knowledge represented about conjugation in the Romance languages (Italian, Spanish etc.), and then the more precise axiomatization (conceptual data model, if you will) for constraints unique to each language.

 

p.s.: Bantu languages is the term used in linguistics, so that’s why it’s used here. Elsewhere, they are also called African languages. They’re not synonymous, however, as the latter includes also other, non-Bantu, languages, as it can designate any language spoken in Africa that may have a wholly different grammar, hence, the difference linguists make to avoid misinterpretation.

 

References

[1] Chavula, C., Keet, C.M. Is Lemon Sufficient for Building Multilingual Ontologies for Bantu Languages? 11th OWL: Experiences and Directions Workshop (OWLED’14). Keet, C.M., Tamma, V. (Eds.). Riva del Garda, Italy, Oct 17-18, 2014. CEUR-WS vol. 1265, 61-72.

[2] McCrae, J., Aguado-de Cea, G., Buitelaar, P., Cimiano, P., Declerck, T., Gómez-Pérez, A., Gracia, J., Hollink, L., Montiel-Ponsoda, E., Spohr, D., Wunner, T.: Interchanging lexical resources on the Semantic Web. Language Resources & Evaluation, 2012, 46(4), 701-719

[3] Beißwanger, E., Schulz, S., Stenzhorn, H., Hahn, U.: Biotop: An upper domain ontology for the life sciences: A description of its current structure, contents and interfaces to obo ontologies. Applied Ontology, 2008, 3(4), 205-212

[4] Jarrar, M., Meersman, R.: Ontology Engineering The DOGMA Approach. In: Advances in Web Semantics I, LNCS, vol. 4891, pp. 7-34. Springer (2009)

[5] Chavula, C., Keet, C.M. An Orchestration Framework for Linguistic Task Ontologies. 9th Metadata and Semantics Research Conference (MTSR’15), Springer CCIS. 9-11 September, 2015, Manchester, UK. (in print)

Annotated list of books on (South) Africa I read last year

I mentioned in the New Year’s post that I’ve been reading up on (South) Africa to obtain some more background information than provided in the online and printed newspapers and monthlies (such as The Africa Report, with, e.g., its article on Google in Africa). The remainder of this post is an annotated list of fiction and non-fiction books, collections, and pamphlets on Africa I read in 2011, which to quite an extent had to do with availability in the nearby bookshops. Yes, I’m talking about hardcopies. Looking them up online for this post, some are out of print, and less than half are available as eBook, Kindle edition, etc. The links are to the Kalahari.com online bookstore, when available, but several are available also internationally through booksellers such as Amazon.

Suggestions for “must reads” that can help me to understand this complex country and continent are welcome!

Non-Fiction

Long Walk to Freedom by Nelson Mandela (1994, Abacus). Highly recommendable to anyone interested in the struggle and appalling situations and injustices under the Apartheid regime. It is easily readable, and makes a man out of the myth. It is a personal account, and not so much an exposé of ideas (cf., e.g., Fidel’s “my life” or “la historia me absolverá”).

Terrific Majesty: the Powers of Shaka Zulu and the Limits of Historical Invention by Carolyn Hamilton (1998, Harvard University Press). After the first chapter of academese, the remaining chapters provide a highly readable and fascinating picture of the life of King Shaka as well as the agendas of the multiple narrators of those times, somewhat alike a two-layered ‘soap opera’. 

The Racist’s Guide to the People in South Africa by Simon Kilpatrick (2010, Two Dogs). Illustrates well the new term I learned here, “equal opportunity offender”, although he does it in a satirical, witty, way. For the record, I can confirm Kilpatrick’s description of the Dutch [described in the same paragraph as the Germans]: yes, I do commit the cardinal sin of wearing socks in sandals, eat liquorice and lots of cheese, don’t leave a tip if the service or food is crappy, and as a child I went many times on summer holidays in France bringing most of our food from the Netherlands (indeed, that was cheaper). But, to some extent, I still wonder how accurate and/or exaggerated some of the descriptions of the other groups are. 

The End of Poverty: Economic Possibilities for our Time by Jeffrey D. Sachs (2005, Penguin Books). Appeared to be written for people who politically lean to the right to convince them to move toward a centrist position, for Sachs’ ego as do-good-er within a capitalist framework, and serves as an appeal to the baby boomers to let go of the generational egoism so as to come off less bad (or a bit better) in history. 

Persons in Community: African Ethics in Global Culture, edited by Ronald Nicholson (2008, UKZN Press). Various essays of varying quality. Positive: Ubuntu from different perspectives and in different contexts. One can safely skip the annoying writings with Christian religious stuff, which has done more harm than good, notwithstanding the attempts at revisionary history writing.

African Renaissance (read in part), edited by Malegapuru William Makgoba (1999, Mafube Publishing). A collection of essays written in 1999 on problems and looking forward on what to do to realise a better future for South Africa and the continent. I think this will become a useful document for assessing if, and if yes how, the hopes and ideas have been realized over time. As an aside, it introduced me to the term “potted plants in green houses” that refers to certain academics in South Africa (note: they can be found in other countries as well, albeit due to different reasons).

Currently reading: Africa’s Peacemaker? Lessons from South African Conflict Mediation, edited by Kurt Shillinger (2009, Fanele). The collection contains analyses of several conflicts in Africa, and lessons learnt from South African efforts in conflict mediation. From the parts I have read, this would have been useful to read for one of the courses of the MA in Peace & development I did a while ago.

Lined up to read: Chabal’s Africa: the Politics of Suffering and Smiling (2009, Zed Books).

Pamphlets

The following three pamphlets are from New Frank Talk, and give plenty of food for thought—not just to me, but if you do a search on it, you’ll see various sources, including news articles, discussing the topics.

Black Colonialists: the root of the trouble with Africa by Chinweizu. On post-colonial time, loathing Blacks in government who behave like their former colonialist masters.

Blacks can’t be racist by Andile Mngxitama. The thesis is that if you are not in a position of power, you cannot be racist, as one cannot act upon one’s prejudices about certain identified groups of people (if one has them); hence: ‘race’-based prejudice + power + acting upon it = racist. (Most) Blacks are not in a position of power, hence, cannot be racist, or so goes the argument in a nutshell. 

The white revolutionary as a missionary? Contemporary travels and researches in Caffraria by Heinrich Böhmke. On the ‘well-meaning left’ going to Africa to ‘help the poor and do good’ as a modern-day version of the colonialist-missionary with its negative influences.

Fiction

The Angina Monologues by Rosamund Kendal (2010, Jacana Media). One of those books you just have to finish reading quickly to see how events unfold with the characters. It describes the experiences of three South African interns in a remote hospital in South Africa and how they come to grips with that new situation and their heritage with the different situations and mores they each grew up with.

The Master’s Ruse by Patricia Schonstein (2008, African Sun Press). The author has been so friendly to me, but it was not easy finishing reading the book. Perhaps it is a good book, attested by the freedom of the reader to read in it what fits the reader (and that wasn’t pretty). 

Black Diamond by Zakes Mda (2009, Penguin Books). Criticism of recent developments in South Africa is woven into the storyline. It also claims to insert all sorts of clichés, which is harder for me to assess. Disappointing is the portrayal of most of the female story characters who all happen to have all sorts of negative character traits and behaviours, with the male lead—having fought in the struggle, but not getting his share of the money and fame to become a ‘Black Diamond’—the good guy. It reads as if it were a Bouquet-book but then for a male readership.

Can he be the one? by Lauri Kubuitsile (2010, Sapphire press). Now this is a real Bouquet-book (called Sapphire here), but then with a cast of successful Black South Africans.

Earlier

Regarding possible suggestions, I have read several fiction and non-fiction books over the years, so possible glaring omissions from the aforementioned list may have been covered already—or: if you consider reading something about (South) Africa and none of the above piqued your interest, then maybe one or more of these ones do. Some of those books are, in alphabetical order by surname of author:

I write what I like by Steve Biko (1987, Heinemann). A must read. Writings from the ‘70s, on the Black Consciousness Movement. Introduced me to the term “Whitey” and (problems with) the “White liberal left”.

Elizabeth Costello by J.M. Coetzee (2004, Vintage).

Lettera ad un consumatore del nord by Centro Nuovo Modello di Sviluppo.

Concerning violence by Frantz Fanon (part of Wretched of the earth, which, when you search a bit, is available in whole as a free pdf download). Highly recommendable.

Hacia el reino del silencio by Miguel Díaz Nápoles (2008, Pablo de la Torriente, Editorial). On Cuban doctors in Ghana.

The challenge for Africa by Wangari Maathai (2010, Arrow Books). Highly recommendable. Interesting analyses of problems, ideas and successes for self-empowerment. If you have any difficulty choosing between this and Sachs’ book, take this one.

I am an African by Ngila Michael Muendane (2006, Soultalk CC). About decolonization of the mind. A must read.

How can man die better: the life of Robert Sobukwe by Benjamin Pogrund (version of 2006, Jonathan Ball Publishers). Highly recommendable to anyone interested in the struggle and appalling situations and injustices under the Apartheid regime; Sobukwe was with the PAC.

Inside rebellion: the politics of insurgent violence by Jeremy M. Weinstein (2007, Cambridge University Press). Highly recommendable, if you’re into this topic.

As mentioned, if you have any good suggestions, please leave them in the comments or email me off-line, lest I keep on picking books fairly randomly and hoping they are worth the price and reading time. But maybe I should venture more often into the real world, instead of ‘reading this one more book to be better prepared for it’.

TAR article on Google in Africa

The Africa Report magazine’s cover story was “Is Google good for Africa?” [1] (the online page provides only an introduction to the longer article in the print/paid edition). Google is investing in Africa, both regarding connectivity and content: if there’s no content then there’s no need to go online, and if there’s no or a very slow connection, then there won’t be enough people online to make online presence profitable. In the words of Nelson Mattos, Google’s VP for EMEA: “Our business model works only when you have enough advertisements and lots of users online, and that’s the environment we are trying to create in Africa” (p24). Gemma Ware notes that “by investing now into Africa’s internet ecosystem, Google hopes to hardwire it with tools that will make people click through its websites”, and, as she aptly puts it: they have raised the flag first.

(Picture from WhiteAfrican's blogpost on "What should Google do in Africa?" [2])

On average, there is one web domain for every 94 people in the world, but for Africa, this is 1 in 10,000. Somewhere buried on p24 and p26 of the TAR article, two reasons are given: no credit card to buy space online, and a ‘.[country]’ domain costs more than a ‘.com’ domain. There’s no lack of creativity (e.g., the Ushahidi platform co-founded by the new head of Google’s Africa policy Ory Okolloh, and much more).

In percentages of Google hits around the world, the USA tops with 31%, then India with 8%, China with 4.2%, UK 3%, Italy 2.3%, Germany and Brazil 2.9%, Russia 2.8%, France and Spain 2%, and at the lower end of the chart South Africa with 0.7%, Algeria and Nigeria with 0.6% and Sweden with 0.5%. The other African countries are not mentioned and have a lighter colour in the diagram than the lowest given value of 0.5%. These data should have been normalized by population size, but give a rough idea nevertheless.

40% of the Google searches in Africa are through mobile internet—including mine outside the office (unlike in Italy [well, Bolzano], here in South Africa they actually do sell functioning USB/Internet keys and SIM cards to foreigners). They estimated that there were about 14 million users in Africa in 2010 (the Facebook numbers on p26 total to about 28 million), which they expect to grow to 800 million by 2015. Now that’s what you can call a growth market.

There’s no Google data centre in Africa yet, but there are caches at several ISPs, which brings to mind the filter bubble. One can ponder about whether a cache and a bubble are better than practicing one’s patience. What you might not have considered, however, is that there are apparently (i.e.: so I was told, but did not check it) Internet access packages that charge lower rates for browsing national Web content and higher rates for international content where the data has to travel through the new fibre optic cable. So the caching isn’t necessarily a bad idea.

On content generation, Google has been holding “mapping parties” to add content to Google MapMaker, which also pleased its participants, because, as quoted in the article, they didn’t like seeing a blank spot as if there’s nothing, even though clearly there are roads, villages, communities, businesses in reality. There are funded projects to digitize Nelson Mandela’s documentary archives, crowd sourcing to generate content, Google Technology User Groups, helping businesses to create websites, and many other activities. In short, according to Google’s Senegal representative Tidjane Deme: “What Google is doing in Africa is very sexy”.

One of the ‘snapshots’ in the article mentions that Google now supports 31 African languages. I had a look at http://www.google.co.za, which has localized interfaces to 5 of the 9 official African languages in South Africa (isiZulu, Sesotho, isiXhosa, Setswana, Northern Sotho). As I have only rudimentary knowledge of isiZulu, and of none of the others, I had a look at that one to see how the localization has been done. Aside from the direct translations, such as izithombe for images and usesho for search, there are new concoctions. Apparently there is little IT and computing vocabulary in isiZulu, so new words have to be made up, or meanings of existing ones stretched liberally. For instance, logout has become phuma ngemvume (out/exit from authorization/permission) and when clicking on izigcawu (literally: open air meeting places) you navigate to the Google groups page, which are sort of understandable. This is different for izilungiselelo (noun class 8 or 10?) that brings you to Settings in the interface. There is no such word in the dictionary, although the stem –lungiselelo (noun class 6) translates as preparations/arrangements; my dictionary translates ‘setting’ (noun) into ukubeka (verb, in back-translation it means put/place, install; bilingual dictionaries are inconsistent, I know). It’s not just that Google is “hardwir[ing] [Africa] with tools”, they are ‘soft-wiring’ by unilaterally inventing a vocabulary, it seems, which reeks of cultural imperialism.

Admittedly, I have not (yet) seen much IT for African languages, other than spell checkers for all 11 official languages in South Africa that work for OpenOffice and Mozilla, a nice online isiZulu-English dictionary and conjugation, and Laurette Pretorius’ research in computational linguistics; the first was heavily funded by outside funds and the second one is a hobby project by German isiZulu enthusiast Carsten Gaebler. Nevertheless, it would have been nice if there were some coordinated, participatory, effort.

Writes the article’s author, Gemma Ware: “as Google’s influence grows, Africa’s techies are aware of the urgency to stake their own territorial claim”. This awareness has yet to be transformed into more action by more people. Overall, my impression is that ICT (and the shortage of ICT professionals) already has generated the buzz of excitement where people see plenty of possibilities, which makes it a stimulating environment down here.

References

[1] Gemma Ware. Is Google good for Africa? The Africa Report, No 32, July 2011, pp20-26.

[2] Erik Hersman (WhiteAfrican). What Should Google do in Africa? June 28, 2011.

p.s.: The article does not really answer the question whether Google is good for Africa, and I didn’t either in the blog post; that’s a topic for a later date when I know more about what’s going on here.

ICT, Africa, peace, and gender

If you thought that the terms in the title are rather eclectic, or even mutually exclusive, then you are wrong. ICT4Peace is a well-known combination, likewise for other organisations and events, such as the ICT for peace symposium in the Netherlands that I wrote about earlier. ICT & development activities, e.g., by Informatici Senza Frontiere, and ICT & Africa (or here or here, among many sites) are also well-known. There is even more material for ICT & gender. But what, then, about the combination of them?

Shastry Njeru sees links between them and many possibilities to put ICT to good use in Africa to enhance peaceful societies and post-conflict reconstruction where women play a pivotal role [1]. Not that much has been realized yet; so, if you are ever short on research or implementation topics, then Njeru’s paper undoubtedly will provide you with more topics than you can handle.

So, what, then, can ICT be used for in peacebuilding, in Africa, by women? One topic that features prominently in Njeru’s paper is communication among women to share experiences, exchange information, build communities, keep in contact, have “discussion in virtual spaces, even when physical, real world meetings are impossible on account of geographical distance or political sensitivities” and so forth, using Skype, blogs and other Web 2.0 tools such as Flickr, podcasts, etc., Internet access in their own language, and voice and video to text hardware and software to record the oral histories. A more general suggestion, i.e., not necessarily related to only women or only Africa, is that “ICT for peacebuilding should form the repository for documents, press releases and other information related to the peace process”.

Some examples of what has been achieved already are: the use of mobile phone networks in Zambia to advocate women’s rights, Internet access for women entrepreneurs in textile industries in Douala in Cameroon, and the use of ICT and mobile phone businesses as instruments of change by rural women in various ways in Uganda [1], including the Ugandan CD-ROM project [2].

Njeru thinks that everything can be done already with existing technologies, provided that they are used more creatively and that there are policies, programmes, and funds to overcome the social, political, and economic hurdles to realising gendered ICT for peace in Africa. For hardware, maybe yes; for software, surely not.

Regarding the hardware, mobile phone usage is growing fast (some reasons why) and Samsung, Sharp and Sanyo have jumped on board already with solar panel-fuelled mobile phones to solve the problem of (lack of reliable) energy supply. The Eee PC, the One Laptop per Child project and the like are nothing new either, nor are the Palm Pilots that are used for OpenMRS’s electronic health records in rural areas in, among others, Kenya. But this is not my area of expertise, so I will leave it to the hardware developers for the final [yes/no] on the question whether extant hardware suffices.

Regarding software, developing a repository for the documents, press releases etc. is doable with current software as well, but a usable repository requires insight into how the interfaces have to be designed so that they best suit the intended users, and into how the data should be searched; thus, overall, it may not be simply a case of deploying software, but may also involve developing new applications. Internet access, including those Web 2.0 applications, in one’s own language requires localization of the software and a good strategy on how one can coordinate and maintain such software. This is certainly doable, but it is not lying on the shelf waiting to be deployed.

More challenging will be figuring out the best way to manage all the multimedia of photos, video reports, logged Skype meetings and so forth. If one does not annotate them, then they are bound to end up in a ‘write-only’ data silo. However, those reports should not be (nor have been) made merely to be saved; one should also be able to find, retrieve, and use the information contained in them. A quick-and-dirty tagging system or somewhat more sophisticated wisdom-of-the-crowds tagging methods might work in the short term, but they will not in the long run, which still leaves those inadequately annotated multimedia pieces gathering dust. An obvious direction for a solution is to create the annotation mechanism and develop an ontology about conflict & peacebuilding, develop a software system to put the two together, develop applications to access the properly annotated material, and train the annotators. This can easily take up the time and resources of an EU FP7 Integrated Project.
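To make that direction slightly more concrete, here is a minimal sketch in Python of what ontology-based annotation buys over free-form tags. Everything in it is an assumption made up for illustration: the four-term toy ontology, the MediaItem class, and the annotate and search functions are hypothetical and do not come from Njeru’s paper or from any existing conflict & peacebuilding ontology. It only shows the idea that annotations are restricted to known terms and that a search for a broad term also finds items annotated with narrower ones.

```python
from dataclasses import dataclass, field

# Hypothetical toy ontology: each term maps to its broader (parent) term.
# A real conflict & peacebuilding ontology would be far richer than this.
BROADER = {
    "peacebuilding": None,
    "reconciliation": "peacebuilding",
    "oral_history": "reconciliation",
    "womens_rights_advocacy": "peacebuilding",
}


@dataclass
class MediaItem:
    """A photo, video report, or logged call stored in the repository."""
    title: str
    kind: str  # e.g. "video", "photo", "audio"
    terms: set = field(default_factory=set)


def annotate(item, term):
    """Attach an ontology term to an item; free-form tags are rejected."""
    if term not in BROADER:
        raise ValueError(f"unknown ontology term: {term}")
    item.terms.add(term)


def is_a(term, ancestor):
    """True if term equals ancestor or is narrower than it."""
    while term is not None:
        if term == ancestor:
            return True
        term = BROADER[term]
    return False


def search(items, query_term):
    """Titles of items annotated with query_term or any narrower term."""
    return [i.title for i in items
            if any(is_a(t, query_term) for t in i.terms)]


# Usage with made-up items.
interview = MediaItem("Interview A", "video")
broadcast = MediaItem("Radio B", "audio")
annotate(interview, "oral_history")
annotate(broadcast, "womens_rights_advocacy")
print(search([interview, broadcast], "peacebuilding"))
```

With plain tags, a search for ‘peacebuilding’ would miss both items unless someone happened to add that tag too; with the hierarchy, the broader query retrieves them. The hard part, of course, is the ontology and the trained annotators, not these few lines.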

Undoubtedly, observation of current practices, their limitations, and subsequent requirements analysis will bring to the fore more creative opportunities for using ICT in a peacebuilding setting that targets women as the mostly untapped prime user base. A quick search on ICT jobs in Africa or peacebuilding (on the UN system and its affiliated organizations, and the NGO industry) to see if the existing structures invest in this area did not show anything other than jobs at their respective headquarters such as website development, network administration, or ICT group team leader. Maybe upper management does not realise the potential, or it is seen merely as an afterthought? Or maybe more grassroots initiatives have to be set up, be successful, and then organisations will come on board and devote resources to it? Or perhaps companies and venture capital should be more daring and give it a try (mobile phone companies already make a profit, and some ‘philanthropy’ does well for a company’s image anyway), and there is always the option to take away some money from the military-industrial complex.

Whose responsibility would it be (if any) to take the lead (if necessary) in such endeavours? Either way, if investment in green technologies can be positioned as a way out of the recession, then so can investment in ICT for peace(building) aimed at women, be they in Africa or on other continents where people suffer from conflicts or are in the process of reconciliation and peacebuilding. One just has to shift the focus from ICT for destruction, fear-moderation, and the like to ICT for constructive engagement, aiming at inclusive technologies and those applications that facilitate development of societies and empower people.

References

[1] Shastry Njeru. (2009). Information and Communication Technology (ICT), Gender, and Peacebuilding in Africa: A Case of Missed Connections. Peace & Conflict Review, 3(2), 32-40.

[2] Huyer S and Sikoska T. (2003). Overcoming the Gender Digital Divide: Understanding the ICTs and their potential for the Empowerment of Women. United Nations International Research and Training Institute for the Advancement of Women (UN-INSTRAW), Instraw Research Paper Series No. 1, 36p.