ICTs for South Africa’s indigenous languages should be a national imperative, too

South Africa has 11 official languages with English as the language of business, as decided during the post-Apartheid negotiations. In practice, that decision has resulted in the other 10 being sidelined, which holds even more so for the nine indigenous languages, as they were already underresourced. This trend runs counter to the citizens’ constitutional rights and the state’s obligations, as it “must take practical and positive measures to elevate the status and advance the use of these languages” (Section 6 (2)). But the obligations go beyond just language promotion. Take, e.g., the right to have access to the public health system: one study showed that only 6% of patient-doctor consultations were held in the patient’s home language[1], with the other 94% essentially not receiving the quality care they deserve due to language barriers[2].

Learning 3-4 languages up to practical multilingualism is obviously a step toward achieving effective communication, which reduces divisions in society, which in turn fosters cohesion-building and inclusion, and may contribute to achieving redress of the injustices of the past. This route does tick multiple boxes of the aims presented in the National Development Plan 2030 (NDP). How to achieve all that is another matter. Moreover, just learning a language is not enough if there’s no infrastructure to support it. For instance, what’s the point of searching the Web in, say, isiXhosa when there are only a few online documents in isiXhosa and the search engine algorithms can’t process the words properly anyway, and hence do not return the results you’re looking for? Where are the spellcheckers to assist writing emails, school essays, or news articles? Can’t the language barrier in healthcare be bridged by on-the-fly machine translation for any pair of languages, rather than using the Mobile Translate MD system that is based on canned text (i.e., a small set of manually translated sentences)?


Rule-based approaches to develop tools

Research is being carried out to devise Human Language Technologies (HLTs) to answer such questions and contribute to realizing those aspects of the NDP. This is not simply a case of copying-and-pasting tools for the more widely-spoken languages. For instance, even just automatically generating the plural noun in isiZulu from a noun in the singular required a new approach that combined syntax (how it is written) with semantics (the meaning) through inclusion of the noun class system in the algorithms[3] [summary]. In contrast, for English, just syntax-based rules can do the job[4] (more precisely: regular expressions in a Perl script). Rule-based approaches are also preferred for morphological analysers for the regional languages[5], which split each word into its constituent parts, and for natural language generation (NLG). An NLG system generates natural language text from structured data, information, or knowledge, such as data in spreadsheets. A simple way of realizing that is to use templates where the software slots in the values given by the data. This is not possible for isiZulu, because the sentence constituents are context-dependent, of which the idea is illustrated in Figure 1[6].
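To make the pluralisation contrast concrete, here is a minimal Python sketch. It is an illustration only, not the algorithm from [3]: the English function looks at spelling alone, whereas the isiZulu function also needs the noun class. The prefix table is a deliberately small and simplified assumption (a few classes, no irregular nouns, no phonological conditioning), and the names pluralise_en and pluralise_zu are hypothetical.

```python
import re


def pluralise_en(noun):
    """Spelling-only rules, enough for regular English nouns."""
    if re.search(r"(s|x|z|ch|sh)$", noun):
        return noun + "es"
    if re.search(r"[^aeiou]y$", noun):
        return noun[:-1] + "ies"
    return noun + "s"


# Simplified singular -> plural prefix pairs per noun class.
# Assumption: only a handful of classes; longer prefixes are listed first.
ZU_PLURAL_PREFIXES = {
    "1": [("umu", "aba"), ("um", "aba")],
    "1a": [("u", "o")],
    "3": [("umu", "imi"), ("um", "imi")],
    "7": [("isi", "izi")],
    "9": [("in", "izin"), ("im", "izim")],
}


def pluralise_zu(noun, noun_class):
    """The spelling alone is ambiguous, so the noun class picks the rule."""
    for singular_prefix, plural_prefix in ZU_PLURAL_PREFIXES[noun_class]:
        if noun.startswith(singular_prefix):
            return plural_prefix + noun[len(singular_prefix):]
    raise ValueError(f"no rule for {noun!r} in noun class {noun_class}")


print(pluralise_en("box"))              # boxes
print(pluralise_zu("umfana", "1"))      # abafana (boys)
print(pluralise_zu("umfula", "3"))      # imifula (rivers)
print(pluralise_zu("uSolwazi", "1a"))   # oSolwazi (professors)
```

Note that umfana and umfula both start with um-, yet take different plural prefixes; that is exactly the information a purely spelling-based rule does not have.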

Figure 1. Illustration of a template for the ‘all-some’ axiom type of a logical theory (structured knowledge) and some values that are slotted in, such as Professors, resp. oSolwazi, and eat, resp. adla and zidla; ‘nc’ denotes the noun class of the noun, which governs agreement across related words in a sentence. The four sample sentences in English and isiZulu represent the same information.
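The agreement shown in Figure 1 can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: the concord table lists only a few plural noun classes, the function name all_subject_verb is hypothetical, and it generates only the 'all <nouns> <verb>' part of a sentence rather than the full all-some pattern that the grammar engine handles.

```python
# noun class -> (form of the quantifier 'all', subject concord for the verb).
# Assumption: a partial table with a few plural classes only.
CONCORDS = {
    2: ("bonke", "ba"),
    6: ("onke", "a"),
    8: ("zonke", "zi"),
    10: ("zonke", "zi"),
}


def all_subject_verb(plural_noun, noun_class, verb_root):
    """Build 'All <nouns> <verb>' with forms that agree with the noun class."""
    quantifier, subject_concord = CONCORDS[noun_class]
    verb = subject_concord + verb_root
    return f"{quantifier.capitalize()} {plural_noun} {verb}"


print(all_subject_verb("izinja", 10, "dla"))     # Zonke izinja zidla
print(all_subject_verb("amahhashi", 6, "dla"))   # Onke amahhashi adla
```

An English template such as "All {noun} eat" needs no such table, which is why plain slot-filling works there and not here.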

Therefore, a grammar engine is needed to generate even the most basic sentences correctly. The core aspects of the workflow in the grammar engine [summary] are presented schematically in Figure 2[7], which is being extended with more precise details of the verbs as a context-free grammar [summary][8]. Such NLG could contribute to, e.g., automatically generating patient discharge notes in one’s own language, text-based weather forecasts, or online language learning exercises.

Figure 2. The isiZulu grammar engine for knowledge-to-text consists conceptually of three components: the verbalisation patterns with their algorithms to generate natural language for a selection of axiom types, a way of representing the knowledge in a structured manner, and the linking of the two to realize the generation of the sentences on-the-fly. It has been implemented in Python and Owlready.


Data-driven approaches that use lots of text

The rules-based approach is known to be resource-intensive. Therefore, and in combination with the recent Big Data hype, data-driven approaches with lots of text are on the rise: they offer the hope to achieve more with less effort, not even having to learn the language, and easier bootstrapping of tools for related languages. This can work, provided one has a lot of good quality text (a corpus). Corpora are being developed, such as the isiZulu National Corpus[9], and the recently established South African Centre for Digital Language Resources (SADiLaR) aims to pool the resources. We investigated the effects of a corpus on the quality of an isiZulu spellchecker [summary], which showed that learning the statistics-driven language model on old texts like the Bible does not transfer well to modern-day texts such as news items, nor vice versa[10]. The spellchecker has about 90% accuracy in single-word error detection and it seems to contribute to the intellectualisation[11] of isiZulu [summary][12]. Its algorithms use trigrams and probabilities of their occurrence in the corpus to compute the probability that a word is spelled correctly, illustrated in Figure 3, rather than a dictionary-based approach that is impractical for agglutinating languages. The algorithms were reused for isiXhosa simply by feeding them a small isiXhosa corpus: this achieved about 80% accuracy already even without optimisations.

Figure 3. Illustration of the underlying approach of the isiZulu spellchecker
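The idea behind Figure 3 can be approximated with a short Python sketch. It is not the spellchecker's actual implementation: it assumes character trigrams, a tiny made-up corpus, and a simple rule that flags a word when any of its trigrams is too rare in the corpus. The function names and the threshold parameter are hypothetical.

```python
from collections import Counter


def char_trigrams(word):
    """Split a word into character trigrams, with ^ and $ as boundary marks."""
    padded = f"^{word.lower()}$"
    return [padded[i:i + 3] for i in range(len(padded) - 2)]


def train(corpus_words):
    """Count how often each trigram occurs in the corpus."""
    counts = Counter()
    for word in corpus_words:
        counts.update(char_trigrams(word))
    return counts


def is_probably_correct(word, counts, threshold=0.0):
    """Accept a word only if every trigram is frequent enough in the corpus."""
    total = sum(counts.values())
    if total == 0:
        return False
    return all(counts[t] / total > threshold for t in char_trigrams(word))


counts = train(["ngiyabonga", "siyabonga", "ngiyafunda"])
print(is_probably_correct("siyafunda", counts))   # True, although unseen as a whole word
print(is_probably_correct("ngiyabxnga", counts))  # False, the trigram 'abx' never occurs
```

The first call shows why this suits agglutinating languages: a word that never appeared in the corpus is still accepted because all its parts did, which a word-list lookup cannot do.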

Data-driven approaches are also pursued in information retrieval to, e.g., develop search engines for isiZulu and isiXhosa[13]. Algorithms for data-driven machine translation (MT), on the other hand, can easily be misled by out-of-domain training data of parallel sentences in both languages from which they have to learn the patterns, such as concordial agreement like izi- zi- (see Figure 1). In one of our experiments where the MT system learned from software localization texts, an isiXhosa sentence in the context of health care, Le nto ayiqhelekanga kodwa ngokwenene iyenzeka ‘This is not very common, but certainly happens.’ came out as ‘The file is not valid but cannot be deleted.’, which is just wrong. We are currently creating a domain-specific parallel corpus to improve the MT quality that, it is hoped, will eventually replace the afore-mentioned Mobile Translate MD system. It remains to be seen whether such a data-driven MT or an NLG approach, or a combination thereof, may eventually further alleviate the language barriers in healthcare.


Because of the ubiquity of ICTs in all of society in South Africa, HLTs for the indigenous languages have become a necessity, be it for human-human or human-computer interaction. Profit-driven multinationals such as Google, Facebook, and Microsoft already put resources into the development of HLTs for African languages. Languages, and the identities and cultures intertwined with them, are a national resource, however, which suggests the need for more research and for the creation of a substantial public good in the form of a wide range of HLTs to assist people in the use of their language in the digital age and to contribute to effective communication in society.

Water scarcity in Cape Town—not the first city and the new levies are fine

I very well remember the water shortage in Lima when I was there from mid-1996 to early 1997, so I already played with the idea of writing a post on the water shortage in Cape Town with the aim to scare some people here with what it’s like when the water runs out and to provide some more suggestions to save water or stay clean with less water. For instance, you can greatly reduce the need for washing your hair if you shave it off, or let it grow and put it in one or more braids. Buy more underwear now if you can, because fishing them out of the unwashed laundry basket for reuse is gross (whether to wear them again inside-out or not). A difference between Lima then and Cape Town now is that a lot of people there relied on bottled water to drink already, whereas Cape Town water is (still) potable, so there is no real bottled-water logistics here, at least not to the same extent. On logistics: ‘Day Zero’ is the day when the taps run dry entirely and some 200 water distribution points will be the only source of water for 4 million people in the city. The current ‘Day Zero’ estimate is around April 22, give or take a few days depending on the scenario. There’s a nice app by Piotr Wolski where you can run through alternative scenarios to estimate Day Zero.

What pushed me to write a post is one nuisance about the claim that Cape Town is the ‘first major city in the world’ that faces this problem [1]—she isn’t!—and a real annoyance with a recent UCT News article by Kevin Winter on the planned water levy [2]. (The latter indirectly also relates to another irritation I have, in that it is mainly the affluent in leafy suburbia who keep using water excessively and think that somehow running out of water is not going to affect them (lifetime of being privileged, living in a bubble and all that)). So let me discuss both.

Cape Town is not the first major city with a water shortage

Search results for the water shortage in Lima back then are crowded out by reports on the many year-on-year shortages that this huge city of 8 million (!) has been having since (e.g., a picture of a water distribution point in 2016). Lima is expected to become the first major city to become uninhabitable due to the persistent water crisis; e.g., see what it is like when people are running around to find a litre or two. This is due to less precipitation in the mountains (back then at least) and receding glaciers on the climate side of the issues, and more people in the city, economics, and socio-political issues as the human and systemic dimension. This has worsened over the past decades: Ioris also lists receding aquifers and degraded catchments, which are blamed on human factors, notably uncontrolled mining, over-abstraction, and untreated effluents [3].

Water rationing back then in late 1996 was by quarter: the poorer quarters got cut off earlier than the richer ones. I rented a room in a lower middle income quarter (Pueblo Libre), and recall the taps running dry (and thus also no flushing the toilet—forget about washing clothes) first at 9pm to come back on in the early morning, then 6pm, then 3pm, 12noon, 9am, and some days nothing; the service came and went. No hay agua. When I didn’t have water anymore at home, the richer quarters still had some, as had the International Potato Center (CIP) where I did a research project on sweet potato. Initially, some people were pretending to play sports at the CIP during lunch hour so that they could have a shower afterward without losing face. At some point when the water outages became more prevalent in most quarters, pretences fell. At a later point, there was no water coming out of those showers there anymore either, nor was there water in the labs anymore, even though the CIP is located in the affluent La Molina district. So yes, eventually also the relatively rich had to do without water, in a society that has already an unequal distribution of resources. The number of foul-smelling people increased. The upside of not being clean yourself either is that then at least you don’t smell the stench of others anymore. This is just one anecdote of what I observed and experienced. Check out Ioris’s paper on Water scarcity and the exclusionary city: the struggle for water justice in Lima, Peru [3] for results from qualitative empirical research from 2009-2013. In short: it’s gotten worse and the problems have become more complex. 
Relevant for the next section is also its Table 2, which lists data of 12 municipalities: lowest income Villa El Salvador has an average household income of 881.8 PEN, 14.2 cubic metre water p.p., 27.2 water tariff, and they spend 3.1% of their income on water (highest is 4.1%, in low-income Chaclacayo), whereas the figures for highest-income San Isidro are 8303.4 PEN, 29.6 p.p., 68.6 water tariff, and 0.8% of their income is spent on water [3]. Or: the richest use most water and relatively pay the least. 2011 figures state 15.2 litres p.p./day water use by people in Lurigancho-Chosica and 447.5 litres in San Isidro, on average, and large disparities in the cost of water based on one’s socio-economic status [4].

So, Cape Town most definitely is not the first major city with water scarcity issues, nor the first one with a socio-economic and political dimension to it. Apparently, the ‘driest capital’ claim goes to Cairo, with Lima coming in second [5]. As a final note in this section, some 4 billion people have to put up with water scarcity for at least one month per year; more precisely, 71% or 4.3 billion people [6]. Sure, the Western Cape region is firmly in the red in the figures there, but, also, the “Regions with moderate to severe water scarcity during more than half of the year include northern Mexico and parts of the western United States, parts of Argentina and northern Chile, North Africa and Somalia, Southern Africa, the Middle East, Pakistan, and Australia” and “[h]igh water scarcity levels appear to prevail in areas with either high population density (for example, Greater London area) or the presence of much irrigated agriculture (High Plains in the United States), or both (India, eastern China, Nile delta)” [6].

Comments on polemics and facts on Cape Town’s water shortage

I chatted the other day with someone who’s in the water business here in Cape Town and asked his opinion on the shortage. The answer was “It’s very, very, very, bad… And I’m an optimist!”. You probably can find a few articles online that claim it’s all exaggerated; Olivier busts some of those claims [7] and the GroundUp water crisis articles also provide ample investigative news reporting on the dire state of affairs. Fact is, the dam levels on 29 December were at 31.4% and the rainy season will start only in late April (hopefully) or in May and, as mentioned before, Day Zero is expected to be in the second half of April, if everything continues as ‘business as usual’.

In another attempt to change the current ‘business as usual’, new “level 6” water restrictions came into effect yesterday. Another measure may be implemented on February 1, if the Cape Town government gets its way: an extra water tax proportional to the value of the property, where the raised revenue will be used to fund water augmentation schemes. It is this that Winter is whining about in the UCT news article [2]. He’s not the only one: there’s also an ‘Organisation Undoing Tax Abuse’ that claims they’ll sue the national government for it if it goes ahead. Winter speculates that the increase is more because of falling revenue due to lower water use, but he provides no evidence that the extra revenue will be used for something other than the reason stated by the municipality.

Winter argues that residents who have invested in water saving devices are “punished” with the levy and, rather, one should “use the opportunity to encourage this water to be shared with others in need, at a marginal cost”. In other words: the few with sufficient money to spare who invested in such measures (e.g., installing a borehole) should be offered a way to get a quicker return on investment. Trying to make money from an impending disaster. He continues “Non-potable, fit-for-purpose water can be used for flushing toilets, irrigating gardens, topping up swimming pools,…” WTF? He’s still fine with topping up swimming pools, caring about gardens? You shouldn’t have that much excess water from various uses to begin with. If you actually use the targeted 87 litres/day or, preferably, less, you will not have excess water. I surely don’t and I’m at about 45 litres/day[1]. Take a short shower instead of diving into the pool and if you swim for exercise then go running or to the gym for a change. Get your garden sorted out with indigenous plants that are already drought-resistant, so you don’t have to water the plants in the first place (or let them die; if it’s cleaning yourself vs. saving imported plants, please let the plants die now and redesign your garden next year).

Then, “it is also time to rethink expensive centralised schemes and the role of local government control in the distribution of water”. Sure, everyone is allowed to have an opinion, but a libertarian anti-government stance is surely not going to help. Read up on Lima as an example. Winter goes on “The drought levy appears to send the wrong message. It fails to incentivise local initiatives that will enable access to a local water supply. Neighbourhood-scale water supplies offer a promising alternative.” The policy target is to reduce water usage across the board. Thus, it makes total sense that the Cape Town government would not want to incentivise such local initiatives, for Winter’s proposal amounts to incentivising water wasters wasting more water by allowing them to make money off the excess water they’re using. The water wasters must reduce their water use so that also they will not have any excess water anymore.

He goes on fantasising about whether decentralised water systems would work, as if that could be implemented now-now. There’s not even a rough calculation of whether such a scenario might work at all, let alone, if so, how much it would contribute to easing the pressures on the water demand and how much it would cost to implement it all. So, Winter’s proposal is just braai-talk at best, in the most generous and favourable reading of the piece. Oh, and if the reader did not get the message yet: “Property owners will then become the responsible owners of these systems”. So that the rich can become richer by exploiting the poor even more, also when it comes to the very basic necessity for life! Those property owners (the ones who’ll have to pay most levy in particular) aren’t particularly responsible now, neither on the water usage nor on a less unfair wealth distribution, so I don’t see why I should believe Winter’s word that they then would suddenly become “responsible” sharing citizens in his decentralised water system. If ‘leafy suburbia’ were only to have their bubble punctured and would have reduced their water usage substantially already[2], we’d have (had?) sufficient water to make it to the rainy season before Day Zero would be upon us.

You could say, ‘but I know x and y in leafy suburbia who are saving water and they’re nice people’, and I do save too, but I’m willing to pay my bit of the levy, hoping that it hits home that everyone will have to start taking it seriously, and to have it invested in those water augmentation measures for the public good, including more distribution points at least. Do the math: 200 distribution points for 4 million people is 20000 people per point, which is 4.3 seconds per person per day (with 24/7 service) in which you can tap water to carry home, or worse. Other investment measures should help as well, such as the desalination plants under construction, but it is not clear at all whether that will be enough [8].
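For those who want to check that back-of-the-envelope calculation, here is a minimal Python sketch; it assumes, as above, that the distribution points run 24/7 and that people are spread evenly over them. The variable names are just illustrative.

```python
population = 4_000_000          # people in the city
distribution_points = 200       # planned water distribution points
seconds_per_day = 24 * 60 * 60  # assumes round-the-clock service

people_per_point = population / distribution_points
seconds_per_person = seconds_per_day / people_per_point

print(people_per_point)               # 20000.0
print(round(seconds_per_person, 2))   # 4.32
```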

If the electricity load shedding policy is anything to go by (everyone affected equally), then so will, or should, the upcoming ‘water shedding’ affect everyone equally. Fact is that domestic users are the biggest water users [9], with 15% by business and industry, and only 4.7% by informal settlements in 2015/2016 (checked), and the rich leafy suburbs use the most (open data). I would suggest to the Cape Town government that they turn off the water for a bit to teach the privileged class in the leafy suburbs the hard way what it will be like if they keep using too much water. They also could take up Lima’s costing plan for water, as is already done with property tax in Cape Town, i.e., charge more for water in those suburbs with higher property values.

Property values are a reasonable indicator of affluence. So if residents in the leafy suburbs keep on using too much, then they surely should cough up at least some of the money for the water augmentation schemes. Not reducing water usage now and not coughing up money for bad behaviour over the past months is bound to result in some unpleasant situations. Have a look again at the video clip from Lima if you must—people at night running on the streets with a bucket, desperate to find a litre. Not to mention the class and race tensions that are already being amplified with the water crisis; the ‘leafy suburbs’ were designated white areas under Apartheid, and not much of the demographics nor of the wealth distribution has changed since then. Little has been written about that, but talk on the ground is going around.


There are issues that can be traced to politics at the local, provincial, and national levels [10]. However, in the context of the article, let me point out that the city and the province have a majority Democratic Alliance, a political party way on the right end of the political spectrum (very capitalist etc.), and even they don’t propose the water-spending and money-making-scheme-for-the-rich that Winter proposes. They at least get their facts straight, run through scenarios, and they probably can count how many votes they’d lose if the majority of the people in Cape Town would have to pay the predominantly rich water wasters to access second-grade water to survive. Having such a politically slanted opinion piece on UCT News is an embarrassment, to say the least, and is counter-productive for managing the water crisis in a manner that will make everyone get through this.



A few refreshing feminist articles—to point out and fix bugs in the game

Most articles on gender issues and feminism regurgitate the same old story and arguments, or are reports on more data and experiments with similar results popping up. Some articles or blog posts do bring something relatively new to the table, or apply a feminist analysis to something else, or explain things in a novel way that resonates better in this day and age. Upfront, to those who think gender issues and feminism are mostly rubbish, please read the parable by John Scalzi about the computer game, which is set at the lowest difficulty setting in the Game of the Real World for the Straight White Male; then read ‘those feminazi articles’ as pointing out bugs in the code, and as suggesting bug fixes or a slight rewriting of the game logic to level the playing field. So, here are a few links to some such articles that otherwise may be snowed under by the online articles on women in STEM, IT, management etc.

The feminist appraisal of Dirty Dancing over at Jezebel’s blog, or, as another one puts it “It’s the feminist sleeper agent of chick flicks” (and some class issues); after reading this, you won’t see the movie the same anymore. (Yes, I did watch the movie again, and the points made in the articles are valid, which, honestly, had escaped me when I watched it in the 80s.)

The many shortcomings of (old) white men futurology, who have a rather limited set of imaginations (fantasies?) in prognosticating. Maybe people in that (non-STEM) discipline already know about the issues and limitations, but I’m in another field of research, so it was new to me. Obviously, if futurology is a science, then it should not make a difference whether men or women do it, but that’s another discussion.

The Super-exploitation of women by Marlene Dixon on capitalism and patriarchy in cahoots to keep women as their unpaid servants and labour-producers wives. I did search for more recent analyses, but they don’t compare in content and clarity to this one.

I did not manage to find again the recent fine rant on feminist issues in Africa that are, at least in part, different from ‘the [white middle-class] feminism in the West’, but these will do on scope as well: feminism here on the continent driven by African women who really do have lots of agency (e.g., all the way up to presidents/prime ministers and Nobel Peace Prize winners) and where certain types of ‘help’ from the outside are counterproductive, for they enforce dependence. An example of a currently hot topic here (and one that, afaik, never was in Europe) is the need for free sanitary pads for girls whose family cannot afford them, so that they can keep going to school to learn rather than miss out on it for a few days each month.

Finally, a slightly crudely formulated article that discusses a whining “pick-up artist” who is “cockblocked by redistribution” in Denmark, a socialist-like and feminist-friendly country. Squeezed between the chatter are notes on flaws in evolutionary psychology and the criticism of feminism as an individual pursuit (e.g., ‘lean in’) versus as a collective goal. Even the pick-up artist eventually notes “we can’t fulfill basic human rights for all without viewing everyone as equal”.

Reblogging 2011: Essay on the Nonviolent Personality

From the “10 years of keetblog – reblogging: 2011”: of the general interest ones, this was most definitely the one that has taken up most time—not to write the post, but what it talks about: it reports on the Italian->English translation of a booklet “The nonviolent personality”, which took over 2 years to complete. Giuliano Pontara, whom I had the pleasure to finally meet in person in Stockholm last October, wrote the original in Italian. 

Essay on the Nonviolent Personality; March 3


La personalità nonviolenta (the nonviolent personality) is the title of a booklet I stumbled upon in a bookshop in Trento in spring 2004 whilst being in the city for my internship at the Laboratory for Applied Ontology. The title immediately raises the question: what, then, actually does constitute a nonviolent personality? The author of the booklet, Giuliano Pontara, since recently an emeritus of Philosophy at the University of Stockholm, aims to contribute to answering this question, which certainly does not have a simple answer.

The booklet itself is out of print (having been published in 1996) and, moreover, written in Italian, which most people in the world cannot understand. However, in my opinion at least, Pontara’s proposed answer certainly deserves a wider audience, contemplation, and further investigation. So I set out to translate it into English and put it online for free. That took a while to accomplish, and the last year was certainly the most interesting one with multiple email exchanges with Giuliano Pontara about the finer details of the semantics of the words and sentences in both the Italian original and the English translation. Now here it is: The Nonviolent Personality [pdf, 1.7MB] (low bandwidth version [pdf, 287KB]).

So what is it about? Here is the new back flap summary of the booklet:

At the beginning of the new century, the culture of peace finds itself facing many and difficult challenges. This booklet surveys some of these challenges and the characteristics that a mature culture of peace should have in order to respond to them. Particularly, it investigates what type of person is more apt to be a carrier of such a mature culture of peace: the nonviolent personality. Finally, it addresses the question regarding the factors that in the educative process tend to impede and favour, respectively, the development of moral subjects equipped with a nonviolent personality.

The original Italian version was written by Giuliano Pontara, emeritus of Philosophy at the University of Stockholm, and published in 1996, but its message is certainly not outdated and perhaps even more important in the current climate. Why this is so, and why it is useful to have a more widely accessible version of the booklet available, is motivated in the introduction by Maria Keet, Senior Lecturer at the University of KwaZulu-Natal.

A slightly longer description

The first chapter of the book, having been written in the mid-nineties, discusses the then-current political situation in the world. Pontara describes the post-Cold War situation, touches upon separatism, nationalism, fundamentalism, exploitation and totalitarian capitalism (among other challenges). This includes the “cow-boy ethics” and the return of the Nazi mentality, the latter not being about Aryan supremacy, but the glorification of force (and violence in general) and contempt for ‘the weak’, the might-is-right adage, and that cow-boy ethics has been elevated to the prime principle of conducting international politics. You can analyse and decide yourself if the shoe fits for a particular country’s culture and politics, be it then or now. After this rather gloomy first chapter, the first step toward a positive outlook is described in Chapter 2, which looks at several basic features of a mature culture of peace.

The core of the booklet is Chapter 3, which commences with listing ten characteristics of a nonviolent personality:

  • Rejection of violence
  • The capability to identify violence
  • The capability to have empathy
  • Refusal of authority
  • Trust in others
  • The disposition to communicate
  • Mildness
  • Courage
  • Self-sacrifice
  • Patience

These characteristics are discussed in detail in the remainder of the chapter. Read it if you want to know what is meant by these characteristics, and why.

Chapter 4 considers education at school, at home, and through other influences (such as the TV), describing both the problems in the present systems and what can be done to change them. For example, educating students to develop a critical moral conscience, to analyse, and to be able to think for themselves (as opposed to rote-learning in a degree-factory), not taking a dualistic approach but facilitating creative constructive solutions instead, and creating an atmosphere that prevents the numbing of conscience, the weakness of the senses, consumerism, and conformism of the mass-media. Some class activities are suggested as well, but note that education does not end there: it is a continuous process in life.

The English writing style may not be perfect (the spelling and grammar checkers do not complain though); either way, it tries to strike a balance between the writing style of the original and readability of the English text. And no, the translation was not done with Google Translate or a similar feature, but manually, and there are some notes on the translation at the end of the new booklet. Other changes or additions compared to the Italian original are the new foreword by Pontara and introduction by me, an index, a bibliography in alphabetical order in which several Italian translations cited in the original have been substituted with the original English reference, and biographical sketches. I did the editing and typesetting in LaTeX, so it looks nice and presentable.

Last, but not least:
Reblogging 2010: South African women on leadership in science, technology and innovation

From the “10 years of keetblog – reblogging: 2010”: while the post’s data are from 5 years ago, there’s still room for improvement. That said, it’s not nearly as bad as in some other countries, like the Netherlands (though the university near my home town improved from 1.6% to 5% women professors over the past 5 years). As for the places I worked post-PhD, the percentage of female academics with a full-time permanent contract: FUB-KRDB group 0% (still now), UKZN-CS-Westville: 12.5% (me; 0% now), UCT-CS: 42%.

South African Women on leadership in science, technology and innovation; August 13, 2010


Today I participated in the Annual NACI symposium on the leadership roles of women in science, technology and innovation in Pretoria, organized by the National Advisory Council on Innovation; I report on it further below. As preparation for the symposium, I searched a bit to consult the latest statistics and see if there are any ‘hot topics’ or ‘new approaches’ to improve the situation.

General statistics and their (limited) analyses

The Netherlands used to be at the bottom end of the country league tables on women professors (from my time as elected representative in the university council at Wageningen University, I remember a UN table from ’94 or ’95 where the Netherlands was third last of all countries). It has not improved much over the years. From Myklebust’s news item [1], I traced the statistics to Monitor Women Professors 2009 [2] (carried out by SoFoKleS, the Dutch social fund for the knowledge sector): less than 12% of the full professors in the Netherlands are women, with the Universities of Leiden, Amsterdam, and Nijmegen leading the national league table and the testosterone bastion Eindhoven University of Technology closing the ranks with a mere 1.6% (2 out of 127 professors are women). With the baby boom generation lingering on and clogging the pipeline for a while now, the average increase has been about 0.5 percentage points a year, which is way too low to come even near the EU Lisbon Agreement Recommendation’s target of 25% by 2010, or even the Dutch target of 15%. This large cohort will retire soon, however, which, in the words of the report authors, makes for a golden opportunity to move toward gender equality more quickly. The report also has come up with a “Glass Ceiling Index” (GCI, the percentage of women in job category X-1 divided by the percentage of women in job category X) and, implicitly, an “elevator” index for men in academia. In addition to the hard data to back up the claim that the pipeline is leaking at all stages, they note it varies greatly across disciplines (see Table 6.3 of the report): in science, the most severe blockage is from PhD to assistant professor; in Agriculture, Technology, Economics, and Social Sciences it is the step from assistant to associate professor; and for Law, Language & Culture, and ‘miscellaneous’, the biggest hurdle is from associate to full professor. Of all GCIs, the highest (2.7) is in Technology in the promotion from assistant to associate professor, whereas there is almost parity at that stage in Language & Culture (GCI of 1.1, the lowest value anywhere in Table 6.3).
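To make the Glass Ceiling Index definition concrete, here is a minimal sketch in Python. The function name and the percentages in the ladder are made up for illustration; they are not taken from the SoFoKleS report. A GCI of 1 means the share of women is the same on both sides of a career step, and the higher the value, the more women drop out at that step.

```python
def glass_ceiling_index(pct_women_lower: float, pct_women_higher: float) -> float:
    """GCI = % women in job category X-1 divided by % women in job category X."""
    if pct_women_higher <= 0:
        raise ValueError("share of women in the higher category must be positive")
    return pct_women_lower / pct_women_higher


# Hypothetical shares of women (in %) on an academic ladder, lowest rank first.
# These numbers are illustrative assumptions, not data from the report.
ladder = [
    ("PhD", 42.0),
    ("assistant professor", 30.0),
    ("associate professor", 20.0),
    ("full professor", 12.0),
]

# One GCI per career step, pairing each rank with the next one up.
gci_per_step = {
    f"{lower} -> {higher}": round(glass_ceiling_index(p_low, p_high), 2)
    for (lower, p_low), (higher, p_high) in zip(ladder, ladder[1:])
}

for step, gci in gci_per_step.items():
    print(f"{step}: GCI {gci}")
```

In this made-up ladder the index rises at every step, so the last promotion would be the biggest hurdle; the report's Table 6.3 does the same comparison per discipline.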

“When you’re left out of the club, you know it. When you’re in the club, you don’t see what the problem is.” Prof. Jacqui True, University of Auckland [4]

Elsewhere in ‘the West’, statistics can look better (see, e.g., The American Association of University Professors (AAUP) survey on women 2004-05), or are not great either (UK, see [3], but the numbers are a bit outdated). However, one can wonder about the meaning of such statistics. Take, for instance, the NYT article on a poll about paper rights vs. realities carried out by the Pew Research Center in 22 countries [4]: in France, some 100% paid lip service to being in favour of equal rights, yet 75% also said that men had a better life. It is only in Mexico (56%), Indonesia (55%) and Russia (52%) that the people who were surveyed said that women and men have achieved a comparable quality of life. But note that the latter statement is not the same as gender equality. And equal rights and opportunities by law do not automatically imply that the operational structures are non-discriminatory and an adequate reflection of the composition of society.

A table that has generated much attention and questions over the years (but, as far as I know, no conclusive answers) is the one published in Science Magazine [5] (see figure below). Why is it the case that there are relatively many more women physics professors in countries like Hungary, Portugal, the Philippines and Italy than in, say, Japan, USA, UK, and Germany? Recent guesses at the answer (see blog comments) are as varied as the anecdotes mentioned in the paper.

Physics professors in several countries (Source: 5).

Barinaga’s [5] collection of anecdotes on several influential factors across cultures includes: a country’s level of economic development (longer established science propagates the highly patriarchal society of previous centuries), the status of science there (e.g., low and ‘therefore’ open to women), class structure (pecking order: rich men, rich women, poor men, poor women vs. gender structure rich men, poor men, rich women, poor women), educational system (science and mathematics compulsory subjects at school, all-girls schools), and the presence or absence of support systems for combining work and family life (integrated society and/or socialist vs. ‘Protestant work ethic’), but the anecdotes “cannot purport to support any particular conclusion or viewpoint”. It also notes that “Social attitudes and policies toward child care, flexible work schedules, and the role of men in families dramatically color women’s experiences in science”. More details on statistics of women in science in Latin America can be found in [6] and [7], which look a lot better than those of Europe.

Barbie the computer engineer

Bonder, in her analysis for Latin America [7], has an interesting table (cuadro 4) on the changing landscape for trying to improve the situation: data is one thing, but how to struggle, and which approaches, advertisements, and policies have been, can, or should be used to increase women’s participation in science and technology? Her list is certainly more enlightening than the lame “We need more TV shows with women forensic and other scientists. We need female doctor and scientist dolls.” (says Lotte Bailyn, a professor at MIT) or “Across the developed world, academia and industry are trying, together or individually, to lure women into technical professions with mentoring programs, science camps and child care.” [8], which only very partially address the issues described in [5]. Bonder notes shifts in approaches from focusing only on women/girls to both sexes, from change in attitude to change in structure, from change of women (taking men as the norm) to change in power structures, from focusing on formal opportunities to changing the real opportunities in discriminatory structures, from making visible non-traditional role models to making visible the values, interests, and perspectives of women, and from the simplistic gender dimension to the broader articulation of gender with race, class, and ethnicity.

The NACI symposium

The organizers of the Annual NACI symposium on the leadership roles of women in science, technology and innovation provided several flyers and booklets with data about women and men in academia and industry, so let us start with those. Page 24 of Facing the facts: Women’s participation in Science, Engineering and Technology [9] shows the figures for women by occupation: 19% full professor, 30% associate professor, 40% senior lecturer, 51% lecturer, and 56% junior lecturer, which are in a race distribution of 19% African, 7% Coloured, 4% Indian, and 70% White. The high percentage of women’s participation (compared to, say, the Netherlands, as mentioned above) is somewhat overshadowed by the statistics on research output among South African women (p29, p31): female publishing scientists are just over 30% and women contributed only 25% of all article outputs. That low percentage clearly has to do with the lopsided distribution of women on the lower end of the scale, with many junior lecturers who conduct much less research because they have a disproportionately heavy teaching load (a recurring topic during the breakout session). Concerning distribution of grant holders in 2005, in the Natural & agricultural sciences, about 24% of the total grants (211 out of 872) have been awarded to women and in engineering & technology it is 11% (24 out of 209 grants) (p38). However, in Natural & agricultural sciences, women make up 19% and in engineering and technology 3%, which, taken together with the grant percentages, shows that a disproportionate number of women have been obtaining grants in recent years. This leads one to suggest that the ones who actually do make it to the advanced research stage are at least as good as, if not better than, their male counterparts. Last year, women researchers (PIs) received more than half of the grants and more than half of the available funds (table in the ppt presentation of Maharaj, which will be made available online soon).
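The grant figures above can be checked with a few lines of Python. This is a back-of-the-envelope sketch, not an analysis from the booklet: it assumes that the 19% and 3% are women's share of the researchers in each field (the text does not say of what exactly), and the "representation ratio" is a name made up here for the grant share divided by that share.

```python
def share(part: int, total: int) -> float:
    """Return part/total as a percentage."""
    return 100.0 * part / total


# (grants awarded to women, total grants, assumed % women among researchers)
# Grant counts are the 2005 figures quoted above; the last column is the
# 19% / 3% from the text, interpreted (assumption) as share of researchers.
fields = {
    "natural & agricultural sciences": (211, 872, 19.0),
    "engineering & technology": (24, 209, 3.0),
}

results = {}
for field, (to_women, total, pct_women) in fields.items():
    grant_share = share(to_women, total)
    results[field] = {
        "grant_share": grant_share,
        # above 1: women hold more grants than their share would predict
        "representation_ratio": grant_share / pct_women,
    }

for field, r in results.items():
    print(f"{field}: {r['grant_share']:.1f}% of grants, "
          f"ratio {r['representation_ratio']:.2f}")
```

Under that assumption both ratios come out above 1, which is the arithmetic behind calling the number of women grant holders disproportionate.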

Mrs Naledi Pandor, the Minister for Science and Technology, gave the opening speech of the event, which was a good and entertaining presentation. She talked about the lack of qualified PhD supervisors needed to open more PhD positions, which are desired so as to move to the post-industrial, knowledge-based economy, which, in theory at least, should make it easier for women to participate than in an industrial economy. She also mentioned that one should not look at just the numbers, but instead at the institutional landscape so as to increase opportunities for women. Last, she summarized the “principles and good practice guidelines for enhancing the participation of women in the SET sector”, which are threefold: (1) sectoral policy guidelines, such as gender mainstreaming, transparent recruiting policies, and health and safety at the workplace, (2) workplace guidelines, such as flexible working arrangements, remuneration equality, mentoring, and improving communication lines, and (3) re-entry into the Science, Engineering and Technology (SET) environment, such as catch-up courses, financing fellowships, and remaining in contact during a career break.

Dr. Thema, former director of international cooperation at the Department of Science and Technology, added the issues of the excessive focus on administrative practicalities, the apartheid legacy and frozen demographics, and noted that where there is no women’s empowerment, this is in violation of the constitution. My apologies if I have written her name and details wrongly: she was a last-minute replacement for Prof. Immaculada Garcia Fernández, department of computer science at the University of Malaga, Spain. Garcia Fernández did make her slides available, which focused on international perspectives on women’s leadership in STI. Among many points, she notes that the working conditions for researchers “should aim to provide… both women and men researchers to combine work and family, children and career” and “Particular attention should be paid, to flexible working hours, part-time working, tele-working and sabbatical leave, as well as to the necessary financial and administrative provisions governing such arrangements”. She poses the question “The choice between family and profession, is that a gender issue?”

Dr. Romilla Maharaj, executive director for human and institutional capacity development at the National Research Foundation, came with much data from the same booklet I mentioned in the first paragraph, but little qualitative analysis of this data (there is some qualitative information). She wants to move from the notion of “incentives” for women to “compensation”. The aim is to increase the number of PhDs five-fold by 2018 (currently the rate is about 1200 each year), which is not going to be easy (recollect the comment by the Minister, above). Concerning policies targeted at women’s participation, they appear to be successful for white women only (in postdoc bursaries, white women even outnumber white men). In my opinion, this smells more of a class/race structure issue than a gender issue, as mentioned above and in [5]. Last, the focus of improvements, according to Maharaj, should be on institutional improvements. However, during the break-out session in the afternoon, which she chaired, she seemed to be selectively deaf on this issue. The problem statement for the discussion was the low research output by women scientists compared to men, and how to resolve that. Many participants reiterated the lack of research time due to the disproportionately heavy teaching load (compared to men) and what is known as ‘death by committee’, and the disproportionate number of (junior) lecturers who are counted in the statistics as scientists but, in practice, do no (or very little) research, thereby pulling down the overall statistics for women’s research output. Another participant wanted to see a further breakdown of the numbers by age group, as the suspicion was that it is old white men who produce most papers (who teach less, have more funds, supervise more PhD students etc.) (UPDATE 13-10-’10: I found some data that seems to support this).
In addition, someone pointed out that counting publications is one thing, but considering their impact (by citations) is another, for which no data was available, so a recommendation was made to investigate this further as well (and to set up a gender research institute, which apparently does not yet exist in South Africa). The pay-per-publication scheme implemented at some universities could thus backfire for women (who require the time and funds to do research in the first place so as to get at least a chance to publish good papers). Maharaj’s own summary of the break-out session was an “I see, you want more funds”, but that does not fully square with the institutional change she mentioned earlier, nor with the multi-faceted problems raised during the break-out session that did reveal institutional hurdles.

Prof. Catherine Odora Hoppers, DST/NRF South African Research Chair in Development Education (among many things), gave an excellent speech with provocative statements (or: calling a spade a spade). She noted that going into SET means entering an arena of bad practice and intolerance; to fix that, one first has to understand how bad culture reproduces itself. The problem is not the access, she said, but the terms and conditions. In addition, and as several other speakers already had alluded to as well, she noted that one has to deal with the ghosts of the past. She put this in a wider context of the history of science with the value system it propagates (Francis Bacon, my one-line summary of the lengthy quote: science as a means to conquer nature so that man can master and control it), and the ethics of SET: SET outcomes have, and have had, some dark results, where she used the examples of the atom bomb, gas chambers, how SET was abused by the white male belittling the native and that it has been used against the majority of people in South Africa, and climate change. She sees the need for a “broader SET”, meaning ethical, and (in my shorthand notation) with social responsibility and sustainability as essential components. She is putting this into practice by stimulating transdisciplinary research at her research group, and, at least and as a first step: people from different disciplines must be able to talk to each other and understand each other.

To me, as an outsider, it was very interesting to hear what the current state of affairs is regarding women in SET in South Africa. While there were complaints, there were also suggestions for solutions, and it was clear from the data available that some improvements have been made over the years, albeit only in certain pockets. More people registered for the symposium than places available, and with some 120 attendees from academia and industry at all stages of the respective career paths, it was a stimulating mix of input that I hope will further improve the situation on the ground.


Yes, the protests reduce productivity of academics as well…

…and no, we’re not worried that we won’t get our bonus this year because as academics we don’t get any bonuses anyway. Just to answer two recent ‘interesting’ questions in these times of nation-wide student protests in South Africa. With everything that’s been going on here, writing a report on attending the 34th International Conference on Conceptual Modelling (ER’15) ended up lower on the list of activities, and by now it’s almost a month ago, so I’ll let that slip by, even though it was great and deserves attention. At the time I was in Stockholm for ER’15 and afterward a week at FUB in Bolzano (Italy), nation-wide coordinated student protests were going on, and still are, albeit with fewer participants. Most people who heard of it, at ER, in Bolzano, and among collaborators, only saw a brief international news item of the violence (police using stun grenades, rubber bullets) and assumed they were some typical run-of-the-mill student protests that happen also in other countries. I think this one is different from others, and more complex. Fundamentally, the protests are about the (mostly) young generation expressing that post-apartheid South Africa hasn’t improved nearly enough (neither the societal nor the educational nor the economic dimension) and demanding a better deal. So, here’s a coloured version of some of it, mainly intended for a non-South African readership to get a bit of an idea what’s going on and put some figures into perspective w.r.t. what I assume most of you are more familiar with. I could try to put up the pretence of objectivity, but I’m probably not objective. Some useful sources are news24, for quick short updates of events as they unfold, and Groundup, for some in-depth articles.


Main concrete issues

Over the past years, government funding of universities has been diminishing, with the shortfall being made up by yearly fees increases. That is an unsustainable financial model, and it excludes more and more qualifying students from studying at a university, especially since the student financial aid scheme hasn’t kept up and the fees increases are higher than the inflation rate and wage increases, which are 4-7% per year. The scheduled 10% for next year was the last straw. After the first week of protests, the students managed to get a commitment from Zuma on Oct 23 for a 0% fees increase for next year. While this is more than we achieved back in the ’90s in the Netherlands when we were protesting against fees increases (among other things), at that time, anyone who qualified still could get just about sufficient funds to attend university for 5 years to get a (Bachelors +) Masters degree (without it, I probably wouldn’t have gone to the university either). The latter is not the case here, not even close: the scholarship (‘studiebeurs’ in NL) we had then and there amounts to about 100000 Rand a year here now; with the average monthly salary of 17000 Rand gross, that’s about half a parent’s net income/year for one year of study. But the average wage is not the kind of amount that leaves extras for saving. Apparently, for a nuclear-family household, one needs a sustained income of at least 500000/year to have enough to save over the years to pay for going to university: yes, at least twice the ‘jan modaal’/average income to be able to afford it. With South Africa having a shameful Gini coefficient of 0.71, go figure how many are in that category.

This was only the first core demand. Here, as in many universities across the world, there has been a drive for outsourcing certain types of work (cleaning, garden maintenance and the like) so as to push down overhead costs. This might have looked good on the balance sheets at the time when the decisions were made, but the ‘collateral damage’ was that the outsourced workers no longer got the benefits that they had as employees of the university, notably the fee rebate for themselves and their family members. So, this is a double whammy for workers, making it even harder for their kids to go to university, for having to pay the full fee and for generally being on the really low pay scales that make attending university totally unaffordable and out of reach. At various points in or at the end of the second week of the protests, several universities (including UCT) committed to insourcing: when the contracts with the outsourcing companies terminate, the workers will become university employees again, with the fee rebate benefits.

That’s not all. A dastardly practice that cash-strapped universities resort to in a desperate attempt to get the unpaid or only partially paid fees from students (down to the last cent), is that when students still have outstanding fees to pay, they won’t get their final exam results and won’t be allowed to graduate. But that having-completed-the-degree-but-no-parchment-to-show-for limbo is precisely what prevents students from getting decent-paying jobs, or even a job at all, making it harder to pay up the remaining debt; double whammy here as well. Hence, the demand of clearing such historical debt, or at least to let them graduate, so they can get a job and start paying back soon (2? 3? 5? years) thereafter. The latter is quite common in other countries, including the country where I studied. (Had they not had that pay-back-later system, many a door would have remained closed to me as well: I had to borrow money for 4 months because of delays due to a serious sports injury near the end of my studies, after the 5 years funded; see above.) This issue is mostly still unresolved in South Africa. To relate to elsewhere: there’s many a sob story about graduates in the USA with “crippling college debts”, but what’s really crippling for one’s career is being stuck with the debt but not having the proof of the degree even though you satisfied its requirements. There’s some 25-30% unemployment rate in South Africa, and a degree paper really does make a difference.


Fair play to them, and I hope they achieve the demands. I would be very hypocritical if I were to not support them, as I have benefited from those things they want to have, and I wish that all countries would have the system we had back in the 1990s. True, I was then at one of the fronts of protests against the breakdown of it, and what we had certainly was not perfect. However, compared to what it has descended into in the Netherlands and other EU countries, and the lamentable state of the funding systems (well: the lack thereof) in most countries of the world, it almost sounds like an education paradise nowadays: finish highest level of (fee-free) secondary school, sign up for a degree at a university of your preference[1], get enough funding for 5 years that covers fees, books, living expenses, and free public transport (condition: >=25% courses passed/year). It should be at least like that, if not better, everywhere.


Other issues intersecting with it

It is not just about access to higher education, though. Once in, there’s still the so-called ‘legacy of apartheid’ to put up with, which many a student wants to see changed. This sneer-quoted term surely includes the racism, which is, perhaps, the only thing non-SA readers from my generation and older may think of. Perhaps less obvious are the issues of the “dead white men”-infested curricula, especially in the humanities, or, to phrase it positively: how to change a Euro-centric curriculum to one that is more relevant to Africa? There are notable African writers, philosophers, etc., but they don’t feature much now.

There’s the oppressive space and naming of buildings, with the #RhodesMustFall movement but one instance of trying to change this (tl;dr: Cecil Rhodes was an über-badass among the badass colonisers, yet had a statue in a central place on campus, which was removed earlier this year).

Government funding post-1994 has focussed primarily on making the lives of the poorest-of-the-poor less hard, by building houses, working on providing potable water, electricity, and the like. Poor students somehow were not allowed to complain, for having the privilege of going to university. However, really scraping by is hard. That’s not of the type ‘just about enough’ I mentioned above, where we could afford cheap food, clothing, and housing, i.e., the basic necessities in Maslow’s pyramid. For instance, at the university where I worked before (UKZN), a call to employees was put out in exam time at the end of the year to donate money so that the destitute students would be able to get a meal/day in exam time, as the alternative for them was no food at all. It was also not unusual that students were locked out of residence for not having paid (an unlocked lecture hall serving as make-shift sleeping place). The current protests created a space where such hardships were allowed to be voiced.

Then there’s the crazy police violence. It was not part of the original narrative for the protests, but it has become part of it. Universities here have a tendency to call in the police when there are protests. Once they’re in, they take over. Unpredictable horses and ‘refreshing’ water cannons are one thing (I know of those), and even tear gas (experienced that too), but rubber bullets (!) and the (wtf!) stun grenades, that’s of a yet different level of dastardliness. To add insult to injury, the police spokesperson even declared to be proud/satisfied that the police had acted with restraint. Compared to the massacre that Marikana was (police killed 34 strikers), I guess so, yes, but that certainly ought not to be the yardstick to measure up against. Although there are reports that some more recent protests did not remain peaceful from the protester side, they were peaceful in the early days, when the police provoked with the violence. On a related note: I heard that during the protests, academics on the frontline couldn’t stop the police from charging, but a ‘buffer’ of white students could make them hesitate at least. I’ll leave that fact for you to chew on.

This is not all, but, for now, it’ll have to do for this item, lest the blog post ends up way too long.


On the academics side

On the whole, I have the impression that the majority of academics have been supportive of the initial students’ demands, if not from day one then in hindsight. There have been supportive open letters signed by lots of academics, and a bunch joined in the protests. I cannot recall many supportive statements explicitly from staff/academics unions, however, but this may also be due to news reporting, or perhaps there’s room for a more progressive union. Some are pushed out of their comfort zone and feel it’s a bit scary but ok actually, others desperately want to remain in their comfy bubble and are afraid. Some academics are yelled at for being just too melanin-deficient that they could not possibly support the cause (even when they actually do), but are perceived to be part of the problem; this kind of over-generalising isn’t the way to get more academics on board to support the students’ cause. There’s the term coconut (black on the outside, white on the inside); what would the reverse be? The ‘schoolkrijt’ liquorice sweets they sell at Pick ‘n Pay (white on the outside, some brown-ish mixture on the inside)? Or, better, just human.

UCT was closed for two weeks due to the protests, which was a management decision that most academics did not like. Not for disruption of the daily routine, but for the notion of closing that space where ideas are posed, discussed, analysed, debated, contested, and possibly some solutions found.

It is not at all clear whether admin staff and academics will have to cough up the shortfall due to government’s insufficient compensation of the 0% and the insourcing, so there may be an aftermath match there. The tl;dr of many articles: education is a public good, not an individualist benefit, so society should pay, and a university is not a corporation.

At the same time, we’re devising a range of scenarios to cope with changing situations (like how to handle exam disruption), inform students, adjust things (e.g., rescheduling of revision lectures, the content of the actual exam papers, setting an extra exam) and so on. This takes time away from research and from other activities academics do. Which brings me back to the post’s title: yes, our work is affected in that we don’t get as much done as we usually do, and things slip through (deadline missed, belated response to a student query). In the grand scheme of things, they are minor compared to your (from abroad) typesetted-paper-chasing/article-review-invitation/…, and I hope you can bear with the occasional slight delay in my response (for the benefit of SA).

[1] provided you chose the right exam subjects (e.g., to study computer science you needed maths, and to study physics you needed physics as a subject in your high school exam), and only medicine, physio, and dentistry were numerus fixus.

Reblogging 2009: Building bias into your database

From the “10 years of keetblog – reblogging: 2009”: The tl;dr of it: bad data management -> bad policy decisions, and how you can embed political preferences and prejudices in a conceptual data model.

While the post has a computing flavor to it especially on the database design and a touch of ontologies, it is surely also of general interest, because it gives some insight into the management of data that is used for policy-making in and for conflict zones. A nicer version of this blog post and the one after that made it into a paper-review article “Dirty wars, databases, and indices” in the Peace & Conflict Review journal (Fall 2009 issue) of the UN-mandated University for Peace in Costa Rica.

Building bias into your database; Jan 7, 2009

p.s.: while I intended to write a post on attending the ER’15 conference, the exciting times with the student protests in South Africa put that plan on the backburner for a few more days at least.


For developing bio-ontologies, if one follows Barry Smith and colleagues, then one is solely concerned with the representation of reality; moreover, it has been noted that ontologies can, or should be, seen as a representation of a scientific theory [1] or at least that they are an important part of doing science [2]. In that case, life is easy, not hard, for we have the established method of scientific inquiry to settle disputes (among others, by doing additional lab experiments to figure out more about reality). Domain and application ontologies, as well as conceptual data models, for the enterprise universe of discourse require, at times, a consensus-based approach where some parts of the represented information are the outcome of negotiations and agreements among the stakeholders.

Going one step further on the sliding scale: for databases and application software for the humanities, and conflict databases in particular, one makes an ontology or conceptual data model conforming to one’s own (or the funding organisation’s) political convictions and with the desired conclusions in mind. Building data vaults seems to be the intended norm rather than the exception; hence, maintenance, usage, and data analysis beyond the developers’ limited intentions, let alone integration, are a nightmare.

 In this post, I will outline some suggestions for building your own politicized representation—be it an ontology or conceptual data model—for armed conflict data, such as terrorist incidents, civil war, and inter-state war. I will discuss in the next post a few examples of conflict data analysis, both regarding extant databases and the ‘dirty war index’ application built on top of them. A later post may deal with a solution to the problems, but for now, it would already be a great help not to adhere to the tips below.

Tips for biasing the representation

In random order, you could do any of the following to pollute the model and hamper data analysis so as to ensure your data is scientifically unreliable but suitable to serve your political agenda.

1. Have a fairly flat taxonomy of types of parties; in fact, just two subtypes suffice: US and THEM, although one could subtype the latter into ‘they’, ‘with them’, and ‘for them’. The analogue, with ‘we’, ‘with us’, and ‘for us’, is too risky because of the potential contagion of responsibility for atrocities and is therefore not advisable to include; if you want to record any of it, then it is better to introduce types such as ‘unknown perpetrator’ or ‘not officially claimed event’ or ‘independent actor’.

2. Aggregate creatively. For instance, if some of the funding for your database comes from a building construction or civil engineering company, refine that section of target types, or include a new target type only when you feel it is targeted sufficiently often by the opponent to warrant a whole new tuple or table from then onwards. Likewise, some funding agencies would like to see a more detailed breakdown of types of victims by types of violence, some don’t. Last, be careful with the typology of arms used, in particular when your country is producing them; a category like ‘DIY explosive device’ helps mask the producer.

3. Under-/over-represent geography. Play with granularity (by city/village, region, country, continent) and categorization criteria (state borders, language, former chiefdoms, parishes, and so forth), e.g., include (or not) notions such as ‘occupied territory’ (related to the actors) and ‘liberated region’ or ‘autonomous zone’, or that an area may, or may not, be categorized or named differently at the same time. Above all, make the modelling decisions in an inconsistent way, so that no single dimension can be analysed properly.

4. Make an a-temporal model and pretend not to change it, but (a) allow non-traceable object migration so that defecting parties who used to be with US (see point 1) can be safely re-categorised as THEM, and (b) refine the hierarchy over time anyway so as to generate time-inconsistency for target types (see point 2) and geography (see point 3), in order to avoid time series analyses and prevent discovering possible patterns.

5. Have a minimal number of classes for bibliographic information, lest someone would want to verify the primary/secondary sources that report on numbers of casualties and discover that you only included media reports from the government-censored newspapers (or the proxy-funding agency, or the rebel radio station, or the guerrilla pamphlets).

6. Keep natural language definitions for key concepts in a separate file, if recorded at all. This allows for time-inconsistency in operational definitions as well as ignorance among the data entry clerks, so that each one can have their own ideas about where in the database the conflict data should go.

7. Minimize the use of database integrity constraints, hence, minimize representing constraints in the ontology to begin with, hence, use a very simple modelling language so you can blame the language for not representing the subject domain adequately.
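To make tips 4 and 7 a bit more tangible, here is a minimal sketch of what they tell you to steer clear of: party affiliations recorded with validity intervals, so that a defection stays traceable, plus a few integrity constraints that reject nonsensical rows. It uses Python's standard `sqlite3` module; all table names, column names, and the example party are hypothetical and serve only as an illustration, not as a proposal for how an actual conflict database is or should be structured.

```python
import sqlite3

# Hypothetical schema: every table and column name here is an assumption.
SCHEMA = """
CREATE TABLE side (
    side_id INTEGER PRIMARY KEY,
    label   TEXT NOT NULL UNIQUE
);
CREATE TABLE party (
    party_id INTEGER PRIMARY KEY,
    name     TEXT NOT NULL UNIQUE
);
-- One row per period in which a party sided with someone (contra tip 4).
CREATE TABLE affiliation (
    party_id   INTEGER NOT NULL REFERENCES party(party_id),
    side_id    INTEGER NOT NULL REFERENCES side(side_id),
    valid_from TEXT NOT NULL,   -- ISO date, inclusive
    valid_to   TEXT,            -- ISO date, exclusive; NULL means still valid
    PRIMARY KEY (party_id, valid_from),
    CHECK (valid_to IS NULL OR valid_to > valid_from)  -- contra tip 7
);
"""


def open_db():
    """Create an in-memory database with foreign key enforcement switched on."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves this off by default
    conn.executescript(SCHEMA)
    return conn


def demo_db():
    """Load one made-up party that switches sides on 2004-06-01."""
    conn = open_db()
    conn.executemany("INSERT INTO side VALUES (?, ?)", [(1, "US"), (2, "THEM")])
    conn.execute("INSERT INTO party VALUES (1, 'Militia X')")
    conn.executemany(
        "INSERT INTO affiliation VALUES (?, ?, ?, ?)",
        [(1, 1, "2001-01-01", "2004-06-01"), (1, 2, "2004-06-01", None)],
    )
    return conn


def side_on(conn, party_name, day):
    """Return the side label a party had on a given ISO date, or None."""
    row = conn.execute(
        """SELECT s.label
           FROM affiliation a
           JOIN party p ON p.party_id = a.party_id
           JOIN side  s ON s.side_id  = a.side_id
           WHERE p.name = ?
             AND a.valid_from <= ?
             AND (a.valid_to IS NULL OR ? < a.valid_to)""",
        (party_name, day, day),
    ).fetchone()
    return row[0] if row else None
```

With this in place, `side_on(demo_db(), "Militia X", "2003-05-05")` looks up the affiliation that held at the time of an event rather than the one that is convenient today, and an attempt to insert an affiliation with an unknown side or an interval that ends before it starts raises `sqlite3.IntegrityError`. Overlapping intervals for the same party are not ruled out by this sketch; that would need a trigger or application-level check.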

I’m not saying all conflict databases use all of these tricks; but some use at least most of them, which ruins the credibility of those databases whose analysts actually did try to avoid these pitfalls (assuming there are such databases, that is). Optimism wants me to believe that developers did not think of all those issues when designing the database. However, there is a tendency for each conflict researcher to compile his own data set and for each database to be built from scratch.

For the current scope, I will set aside the problems with data collection and how to arrive at guesstimated semi-reliable approximations of deaths, severe injuries, rape, torture victims and so forth (see e.g. [3] and appendix B of [4]). Inherent problems with data collection are one thing and difficult to fix; bad modelling and dubious or partial data analysis are a whole different thing and doable to fix. I elaborate on the latter claim in the next post.


