Robot peppers, monkey gland sauce, and go well—Say again? reviewed

The previous post about TDDonto2 had as toy example a pool braai, which does exist in South Africa at least, but perhaps also elsewhere under a different name: the braai is the ‘South African English’ (SAE) for the barbecue. There are more such words and phrases peculiar to SAE, and after the paper deadline last week, I did finish reading the book Say again? The other side of South African English by Jean Branford and Malcolm Venter (published earlier this year) that has many more examples of SAE and a bit of sociolinguistics and some etymology of that. Anyone visiting South Africa will encounter at least several of the words and sentence constructions that are SAE, but probably would raise eyebrows elsewhere. Let me start with some examples.

Besides the braai, one certainly will encounter the robot, which is a traffic light (automating the human police officer). A minor extension to that term can be found in the supermarket (see figure on the right): robot peppers, being a bag of three peppers in the colours of red, yellow, and green; no vegetable AI, sorry.

How familiar the other ones discussed in the book are depends on how much you interact with South Africans, where you stay(ed), and how much you read and knew about the country before visiting it, I suppose. For instance, when I visited Pretoria in 2008, I had not come across the bunny, but did so upon my first visit to Durban in 2010 (it’s a hollowed-out half a loaf of bread, filled with a curry) and bush college upon starting to work at a university (UKZN) here in 2011. The latter is a derogatory term that was used for universities for non-white students in the Apartheid era, with the non-white being its own loaded term from the same regime. (It’s better not to use it; all terms for classifying people one way or another are a bit of a minefield, whose nuances I’m still trying to figure out, and the book didn’t help with that.)

Then there’s the category of words one may know from ‘general English’, but that the authors claim have a different meaning here. One is the sell-outs, which is “to apply particularly to black people who were thought to have betrayed their people” (p143), though I have the impression it can be applied generally. Another is townhouse, which supposedly has narrowed its meaning cf. British English (p155), but from having lived on the isles some years ago, I know it was used there in the very same way as it is here; the book’s authors just stick to its older meaning and assume the British and Irish do so too (they don’t, though). One that indeed does fall in the category ‘meaning restriction’ is transformation (an explanation of the narrower sense will take up too much space). While I’ve learned a bunch of the ‘unusual’ usual words in the time I’ve worked here, there were others that I still did wonder about. For instance, the lay-bye, which the book explained to be the situation when the shop sets aside a product the customer wants, and the customer pays the price in instalments until it is fully paid before taking the product home. The monkey gland sauce one can buy in the supermarket is another, which is a sauce based on ketchup and onion with some chutney in it (no monkeys and no glands), but, I’ll readily admit, I still have not tried it due to its awful name.

There are many more terms described and discussed in the book, and it has a useful index at the end, especially given that it gives the impression of being a very popsci-like book. The content is very nicely typeset, with news item snippets and aside-boxes and such. Overall, though, while it’s ok to read in the gym on the bicycle for a foreigner who sometimes wonders about certain terms and constructions, it is rather uni-dimensional from a British White South African perspective and the authors are clearly Cape Town-based, with the majority of examples from SA media from Cape Town’s news outlets. They have a heavy Afrikaans-influence-only bias, with, iirc, only four examples of the influence of, e.g., isiZulu on SAE (e.g., the ‘go well’ literal translation of isiZulu’s hamba kahle), which is a missed opportunity. A quick online search reveals quite a list of words from indigenous languages that have been adopted (and more here and here and here and here) such as muti (medicine; from the isiZulu umuthi) and maas (thick sour milk; from the isiZulu amasi) and dagga (marijuana; from the Khoe daxa-b), not to mention the many loan words, such as indaba (conference; isiZulu) and ubuntu (the concept, not the operating system, and something the authors seem to be a bit short of, given the near blind spot on import of words with a local origin). If that does not make you hesitant to read it, then let me illustrate some more inaccuracies beyond the aforementioned townhouse squabble, which mean that one probably has to take the book’s contents with a grain of salt and heavily contextualise them, and/or at least fact-check them yourself. They fall in at least three categories: vocabulary, grammar, and etymology.

To quote: “This came about because the Dutch term tijger means either tiger or leopard” (p219): no, we do have a word for leopard: luipaard. That word is included even in a pocket-size Prisma English-Dutch dictionary or any online EN-NL dictionary, so a simple look-up to fact-check would have sufficed (and it existed already in Dutch before a bunch of them started colonising South Africa in 1652; originating from old French in ~1200). Not having done so smells of either sloppiness or arrogance. And I’m not so sure about the widespread use of pavement special (stray or mongrel dogs or cats), as my backyard neighbours use just stray for ‘my’ stray cat (whom they want to sterilise because he meows in the morning). It is a fun term, though.

Then there’s stunted etymology of words. The coconut is not a term that emerged in the “new South Africa” (pp145-146), but is transferred from the Americas, where it had already been in use since at least the 1970s to denote the same concept (in short: a brown skinned person who is White on the inside), applied there to some people from Central and South America [Latino/Hispanic; take your pick].

Extending the criticism also to the grammar explanations, the “with” aside box on pp203-204 is wrong as well, though perhaps not as blatantly obvious as the leopard and coconut ones. The authors stipulate that phrases like “Is So-and-So coming with?” (p203) are Afrikaans influence of kom saam “where saam sounds like ‘with’” (p203) (uh, no, it doesn’t), and as more guessing they drag a bit of German influence in US English into it. This use, and the related examples like the “…I have to take all my food with” (p204), is the same construction and similar word order as for the Dutch adverb mee ‘with’ (and German mit), such as in the infinitives meekomen ‘to come with’ (komen = to come), meenemen ‘to take with’, meebrengen ‘to bring with’, and meegaan ‘to go with’. In a sentence, the mee may be separated from the rest of the verb and put somewhere else, including at the end of the sentence, like in ik neem mijn eten mee ‘I take my food with’ (word-by-word translated) and komt d’n dieje mee? ‘comes so-and-so with?’ (word-by-word translated, with a bit of ABB in the Dutch). German has similar infinitives (mitkommen, mitnehmen, mitbringen, and mitgehen, respectively), sure, but the grammar construction the book’s authors highlight is so much more likely to come from Dutch as first step of tracing it back, given that Afrikaans is a ‘simplified’ version of Dutch, not of German. (My guess would be that the Dutch mee- can be traced back, in turn, to the German mit, as Dutch is a sort of ‘simplified’ German, but that’s a separate story.)

In closing, I could go on with examples and corrections, and maybe I should, but I think I made the point clear. The book didn’t read as badly as it may seem from this review, but writing the review required me to fact-check a few things, rather than taking most of it at face value, which made it turn out more and more mediocre than the couple of irritations I had whilst reading it.

Reblogging 2010: South African women on leadership in science, technology and innovation

From the “10 years of keetblog – reblogging: 2010”: while the post’s data are from 5 years ago, there’s still room for improvement. That said, it’s not nearly as bad as in some other countries, like the Netherlands (though the university near my home town improved from 1.6% to 5% women professors over the past 5 years). As for the places I worked post-PhD, the percentage of female academics with a full-time permanent contract: FUB-KRDB group 0% (still now), UKZN-CS-Westville: 12.5% (me; 0% now), UCT-CS: 42%.

South African Women on leadership in science, technology and innovation; August 13, 2010


Today I participated in the Annual NACI symposium on the leadership roles of women in science, technology and innovation in Pretoria, organized by the National Advisory Council on Innovation, which I will report on further below. As preparation for the symposium, I searched a bit to consult the latest statistics and see if there are any ‘hot topics’ or ‘new approaches’ to improve the situation.

General statistics and their (limited) analyses

The Netherlands used to be at the bottom end of the country league tables on women professors (from my time as elected representative in the university council at Wageningen University, I remember a UN table from ’94 or ’95 where the Netherlands was third from last of all countries). It has not improved much over the years. From Myklebust’s news item [1], I traced the statistics to Monitor Women Professors 2009 [2] (carried out by SoFoKleS, the Dutch social fund for the knowledge sector): less than 12% of the full professors in the Netherlands are women, with the Universities of Leiden, Amsterdam, and Nijmegen leading the national league table and the testosterone bastion Eindhoven University of Technology closing the ranks with a mere 1.6% (2 out of 127 professors are women). With the baby boom generation lingering on and clogging the pipeline for a while now, the average percentage increase has been about 0.5% a year, which is way too low to come even near the EU Lisbon Agreement Recommendation’s target of 25% by 2010, or even the Dutch target of 15%, but this large cohort will retire soon, which, in the words of the report authors, makes for a golden opportunity to move toward gender equality more quickly. The report also has come up with a “Glass Ceiling Index” (GCI, the percentage of women in job category X-1 divided by the percentage of women in job category X) and, implicitly, an “elevator” index for men in academia. In addition to the hard data to back up the claim that the pipeline is leaking at all stages, they note it varies greatly across disciplines (see Table 6.3 of the report): in science, the most severe blockage is from PhD to assistant professor, in Agriculture, Technology, Economics, and Social Sciences it is the step from assistant to associate professor, and for Law, Language & Culture, and ‘miscellaneous’, the biggest hurdle is from associate to full professor. Of all GCIs, the highest (2.7) is in Technology in the promotion from assistant to associate professor, whereas there is almost parity at that stage in Language & Culture (GCI of 1.1, the lowest value anywhere in Table 6.3).
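To make the Glass Ceiling Index a bit more tangible, here is a minimal sketch of the calculation as described above. The function name and the two percentages fed into it are my own illustrative assumptions and are not taken from Table 6.3 of the report; only the Eindhoven figure (2 out of 127) comes from the text.

```python
def glass_ceiling_index(pct_women_lower: float, pct_women_higher: float) -> float:
    """GCI as described in the text: the percentage of women in job
    category X-1 divided by the percentage of women in job category X.
    A value of 1 means parity between the two levels; the higher the
    value, the thicker the ceiling."""
    if pct_women_higher <= 0:
        raise ValueError("percentage of women in the higher category must be positive")
    return pct_women_lower / pct_women_higher


# Hypothetical percentages, chosen only to illustrate the formula.
assistant_pct = 27.0   # assumed share of women among assistant professors
associate_pct = 10.0   # assumed share of women among associate professors
gci = glass_ceiling_index(assistant_pct, associate_pct)
print(f"GCI assistant -> associate: {gci:.1f}")

# Figure quoted in the text: 2 women out of 127 full professors.
eindhoven_pct = 100 * 2 / 127
print(f"Eindhoven: {eindhoven_pct:.1f}% women professors")
```

With these made-up inputs, women are 2.7 times better represented one rung lower than on the rung above, which is how a GCI of 2.7 should be read.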

“When you’re left out of the club, you know it. When you’re in the club, you don’t see what the problem is.” Prof. Jacqui True, University of Auckland [4]

Elsewhere in ‘the West’, statistics can look better (see, e.g., The American Association of University Professors (AAUP) survey on women 2004-05), or are not great either (UK, see [3], but the numbers are a bit outdated). However, one can wonder about the meaning of such statistics. Take, for instance, the NYT article on a poll about paper rights vs. realities carried out by The Pew Research in 22 countries [4]: in France, some 100% paid their lip service to being in favour of equal rights, yet 75% also said that men had a better life. It is only in Mexico (56%), Indonesia (55%) and Russia (52%) that the people who were surveyed said that women and men have achieved a comparable quality of life. But note that the latter statement is not the same as gender equality. And equal rights and opportunities by law does not magically automatically imply the operational structures are non-discriminatory and an adequate reflection of the composition of society.

A table that has generated much attention and questions over the years (but, as far as I know, no conclusive answers) is the one published in Science Magazine [5] (see figure below). Why is it the case that there are relatively many more women physics professors in countries like Hungary, Portugal, the Philippines and Italy than in, say, Japan, USA, UK, and Germany? Recent guesses at the answer (see blog comments) are as varied as the anecdotes mentioned in the paper.

Physics professors in several countries (Source: 5).

Barinaga’s [5] collection of anecdotes about several influential factors across cultures includes: a country’s level of economic development (longer established science propagates the highly patriarchal society of previous centuries), the status of science there (e.g., low and ‘therefore’ open to women), class structure (pecking order: rich men, rich women, poor men, poor women vs. gender structure rich men, poor men, rich women, poor women), educational system (science and mathematics compulsory subjects at school, all-girls schools), and the presence or absence of support systems for combining work and family life (integrated society and/or socialist vs. ‘Protestant work ethic’), but the anecdotes “cannot purport to support any particular conclusion or viewpoint”. It also notes that “Social attitudes and policies toward child care, flexible work schedules, and the role of men in families dramatically color women’s experiences in science”. More details on statistics of women in science in Latin America can be found in [6] and [7], which look a lot better than those of Europe.

Barbie the computer engineer

Bonder, in her analysis for Latin America [7], has an interesting table (cuadro 4) on the changing landscape for trying to improve the situation: data is one thing, but how to struggle, which approaches, advertisements, and policies have been, can, or should be used to increase women participation in science and technology? Her list is certainly more enlightening than the lame “We need more TV shows with women forensic and other scientists. We need female doctor and scientist dolls.” (says Lotte Bailyn, a professor at MIT) or “Across the developed world, academia and industry are trying, together or individually, to lure women into technical professions with mentoring programs, science camps and child care.” [8] that only very partially addresses the issues described in [5]. Bonder notes shifts in approaches from focusing only on women/girls to both sexes, from change in attitude to change in structure, from change of women (taking men as the norm) to change in power structures, from focusing on formal opportunities to targeting to change the real opportunities in discriminatory structures, from making visible non-traditional role models to making visible the values, interests, and perspectives of women, and from the simplistic gender dimension to the broader articulation of gender with race, class, and ethnicity.

The NACI symposium

The organizers of the Annual NACI symposium on the leadership roles of women in science, technology and innovation provided several flyers and booklets with data about women and men in academia and industry, so let us start with those. Page 24 of Facing the facts: Women’s participation in Science, Engineering and Technology [9] shows the figures for women by occupation: 19% full professor, 30% associate professor, 40% senior lecturer, 51% lecturer, and 56% junior lecturer, which are in a race distribution of 19% African, 7% Coloured, 4% Indian, and 70% White. The high percentage of women participation (compared to, say, the Netherlands, as mentioned above) is somewhat overshadowed by the statistics on research output among South African women (p29, p31): female publishing scientists are just over 30% and women contributed only 25% of all article outputs. That low percentage clearly has to do with the lopsided distribution of women on the lower end of the scale, with many junior lecturers who conduct much less research because they have a disproportionately heavy teaching load (a recurring topic during the breakout session). Concerning distribution of grant holders in 2005, in the Natural & agricultural sciences, about 24% of the total grants (211 out of 872) have been awarded to women and in engineering & technology it is 11% (24 out of 209 grants) (p38). However, in Natural & agricultural sciences, women make up 19% and in engineering and technology 3%, which, taken together with the grant percentages, shows there is a disproportionate number of women obtaining grants in recent years. This leads one to suggest that the ones that actually do make it to the advanced research stage are at least as good as, if not better than, their male counterparts. Last year, women researchers (PIs) received more than half of the grants and more than half of the available funds (table in the ppt presentation of Maharaj, which will be made available online soon).
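For those who want to check the arithmetic behind the grant figures, a small sketch follows. The grant counts and the 19% and 3% shares are those quoted above from the booklet; the ‘representation ratio’ (share of grants divided by share of women in the field) is my own illustrative measure, not one used in the booklet, and I am assuming the 19% and 3% refer to women’s share among researchers in each field.

```python
def percentage(part: int, whole: int) -> float:
    """Return part as a percentage of whole."""
    return 100 * part / whole


# Grant counts for 2005 as quoted in the text (p38 of the booklet).
natural_grant_pct = percentage(211, 872)      # Natural & agricultural sciences
engineering_grant_pct = percentage(24, 209)   # Engineering & technology

# Shares of women in each field as quoted in the text
# (assumed here to be the share among researchers).
natural_women_pct = 19.0
engineering_women_pct = 3.0

# Illustrative measure (an assumption of this sketch): a ratio above 1
# means women hold a larger share of grants than of positions.
natural_ratio = natural_grant_pct / natural_women_pct
engineering_ratio = engineering_grant_pct / engineering_women_pct

print(f"Natural & agricultural: {natural_grant_pct:.1f}% of grants, ratio {natural_ratio:.2f}")
print(f"Engineering & technology: {engineering_grant_pct:.1f}% of grants, ratio {engineering_ratio:.2f}")
```

Both ratios come out above 1 (roughly 1.3 and 3.8), which is the ‘disproportionate’ effect mentioned in the text, and it is strongest in engineering and technology.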

Mrs Naledi Pandor, the Minister for Science and Technology, held the opening speech of the event, which was a good and entertaining presentation. She talked about the lack of qualified PhD supervisors to open more PhD positions, where the latter is desired so as to move to the post-industrial, knowledge-based economy, which, in theory at least, should make it easier for women to participate than in an industrial economy. She also mentioned that one should not look at just the numbers, but instead at the institutional landscape so as to increase opportunities for women. Last, she summarized the “principles and good practice guidelines for enhancing the participation of women in the SET sector”, which are threefold: (1) sectoral policy guidelines, such as gender mainstreaming, transparent recruiting policies, and health and safety at the workplace, (2) workplace guidelines, such as flexible working arrangements, remuneration equality, mentoring, and improving communication lines, and (3) re-entry into the Science, Engineering and Technology (SET) environment, such as catch-up courses, financing fellowships, and remaining in contact during a career break.

Dr. Thema, former director of international cooperation at the Department of Science and Technology added the issues of the excessive focus on administrative practicalities, the apartheid legacy and frozen demographics, and noted that where there is no women’s empowerment, this is in violation of the constitution. My apologies if I have written her name and details wrongly: she was a last-minute replacement for Prof. Immaculada Garcia Fernández, department of computer science at the University of Malaga, Spain. Garcia Fernández did make available her slides, which focused on international perspectives on women leadership in STI. Among many points, she notes that the working conditions for researchers “should aim to provide… both women and men researchers to combine work and family, children and career” and “Particular attention should be paid, to flexible working hours, part-time working, tele-working and sabbatical leave, as well as to the necessary financial and administrative provisions governing such arrangements”. She poses the question “The choice between family and profession, is that a gender issue?”

Dr. Romilla Maharaj, executive director for human and institutional capacity development at the National Research Foundation came with much data from the same booklet I mentioned in the first paragraph, but little qualitative analysis of this data (there is some qualitative information). She wants to move from the notion of “incentives” for women to “compensation”. The aim is to increase the number of PhDs five-fold by 2018 (currently the rate is about 1200 each year), which is not going to be easy (recollect the comment by the Minister, above). Concerning policies targeted at women participation, they appear to be successful for white women only (in postdoc bursaries, white women even outnumber white men). In my opinion, this smells more of a class/race structure issue than a gender issue, as mentioned above and in [5]. Last, the focus of improvements, according to Maharaj, should be on institutional improvements. However, during the break-out session in the afternoon, which she chaired, she seemed to be selectively deaf on this issue. The problem statement for the discussion was the low research output by women scientists compared to men, and how to resolve that. Many participants reiterated the lack of research time due to the disproportionately heavy teaching load (compared to men) and what is known as ‘death by committee’, and the disproportionate number of (junior) lecturers who are counted in the statistics as scientists but, in practice, do no (or very little) research, thereby pulling down the overall statistics for women’s research output. Another participant wanted to see a further breakdown of the numbers by age group, as the suspicion was that it is old white men who produce most papers (who teach less, have more funds, supervise more PhD students etc.) (UPDATE 13-10-’10: I found some data that seems to support this). In addition, someone pointed out that counting publications is one thing, but considering their impact (by citations) is another, for which no data was available, so that a recommendation was made to investigate this further as well (and to set up a gender research institute, which apparently does not yet exist in South Africa). The pay-per-publication scheme implemented at some universities could thus backfire for women (who require the time and funds to do research in the first place so as to get at least a chance to publish good papers). Maharaj’s own summary of the break-out session was an “I see, you want more funds”, but that does not fully tally with the institutional change she mentioned earlier nor with the multi-faceted problems raised during the break-out session that did reveal institutional hurdles.

Prof. Catherine Odora Hoppers, DST/NRF South African Research Chair in Development Education (among many things), gave an excellent speech with provocative statements (or: calling a spade a spade). She noted that going into SET means entering an arena of bad practice and intolerance; to fix that, one first has to understand how bad culture reproduces itself. The problem is not the access, she said, but the terms and conditions. In addition, and as several other speakers already had alluded to as well, she noted that one has to deal with the ghosts of the past. She put this in a wider context of the history of science with the value system it propagates (Francis Bacon, my one-line summary of the lengthy quote: science as a means to conquer nature so that man can master and control it), and the ethics of SET: SET outcomes have, and have had, some dark results, where she used the examples of the atom bomb, gas chambers, how SET was abused by the white male belittling the native and that it has been used against the majority of people in South Africa, and climate change. She sees the need for a “broader SET”, meaning ethical, and, (in my shorthand notation) with social responsibility and sustainability as essential components. She is putting this into practice by stimulating transdisciplinary research at her research group, and, at least and as a first step: people from different disciplines must be able to talk to each other and understand each other.

To me, as an outsider, it was very interesting to hear what the current state of affairs is regarding women in SET in South Africa. While there were complaints, there were also suggestions for solutions, and it was clear from the data available that some improvements have been made over the years, albeit only in certain pockets. More people registered for the symposium than places available, and with some 120 attendees from academia and industry at all stages of the respective career paths, it was a stimulating mix of input that I hope will further improve the situation on the ground.


[1] Jan Petter Myklebust. THE NETHERLANDS: Too few women are professors. University World News, 17 January 2010, Issue: 107.

[2] Marinel Gerritsen, Thea Verdonk, and Akke Visser. Monitor Women Professors 2009. SoFoKleS, September 2009.

[3] Helen Hague. 9.2% of professors are women. Times Higher Education, May 28, 1999.

[4] Victoria Shannon. Equal rights for women? Survey says: yes, but…. New York Times/International Herald Tribune, The female factor, June 30, 2010.

[5] Marcia Barinaga. Overview: Surprises Across the Cultural Divide. Compiled in: Comparisons across cultures. Women in science 1994. Science, 11 March 1994 263: 1467-1496 [DOI: 10.1126/science.8128232]

[6] Beverley A. Carlson. Mujeres en la estadística: la profesión habla. Red de Reestructuración y Competitividad, CEPAL – SERIE Desarrollo productivo, nr 89. Santiago de Chile, Noviembre 2000.

[7] Gloria Bonder. Mujer y Educación en América Latina: hacia la igualdad de oportunidades. Revista Iberoamericana de Educación, Número 6: Género y Educación, Septiembre – Diciembre 1994.

[8] Katrin Bennhold. Risk and Opportunity for Women in 21st Century. New York Times/International Herald Tribune, The female factor, March 5, 2010.

[9] Anon. Facing the facts: Women’s participation in Science, Engineering and Technology. National Advisory Council on Innovation, August 2009.

Reblogging 2008: Failing to recognize your own incompetence

From the “10 years of keetblog – reblogging: 2008”: On those uncomfortable truths on the difference between knowing what you don’t know and not knowing what you don’t know… (and one of the earlier Ig Nobel prize winners 15 years ago)

Failing to recognize your own incompetence; Aug 25, 2008


Somehow, each time when I mention to people the intriguing 2000 Ig Nobel prize winning paper “Unskilled and unaware of it: How difficulties in recognizing one’s own incompetence lead to inflated self-assessments” [1], they tend to send (non)verbal signals demonstrating a certain discomfort. Then I tone it down a bit, saying that one could argue about the set up of the experiment that led Kruger & Dunning to their conclusion. Now—well, based on material from a few years ago but I found out recently—I cannot honestly say that anymore either. A paper from the same authors, “Why people fail to recognize their own incompetence” [2], reports not only more of their experiments in different settings, but also different experiments by other researchers validating the basic tenet that ignorant and incompetent people do not realize they are incompetent but rather think more favourably of themselves—“tend to hold overinflated views of their skills”—than can be justified based on their performance.

Yeah, the shoe might fit. Or not. In addition to the lower end of the scale overestimating their competencies by a large margin, the converse happens, though to a lesser extent, at the other end of the scale, where top-experts underestimate their actual capabilities. The latter brings its own set of problems and research directions, which I will set aside for the remainder of this blog post. Instead, I will dwell a bit on those people bragging to know this, that, and the other, but, alas, do not perform properly and, moreover, do not even realize they do not. Facing a person who knows s/he does not have the required skills is one thing and generally s/he’s willing to listen and learn or say to not care about it, but those people who do not realize the knowledge/skills gap they have are, well, a hopeless bunch futile to waste your time on (unless you teach them anyway).

Let us have a look at what those psychologists provided to come to this conclusion. Aside from the experiment about jokes in the ’99 paper, which are at least (sub)culture-dependent, the data about the introductory-level psychology class taken by 141 students is quite telling. Right after the psych exam, the students were asked about their own estimate of performance & mastery of the course material (relative to other students in their class) and to estimate their raw score of the exam. These were the results ([2] p84, Fig.1):

If you think this kind of data is only observed with undergraduates in psychology, well, then check [2]’s references: debate teams, hunters about their firearms, medical residents (over)estimating their patient-interviewing techniques, medical lab technicians overestimating their knowledge of medical terminology; you name it, the same pattern shows up, even when the subjects were offered the carrot of a monetary incentive to assess themselves honestly.

Imagine going to a GP or doctor of a regional hospital who has the arrogance to know it all and does not call in a specialist on time. One can debate about the harmfulness or harmlessness of such cases. A very recent incident I observed was where x1 and x2 demanded from y to do nonsensical task z. Task z, exemplifying ignorance and incompetence of x1 and x2, was not carried out by y for it could not be done, but it was nevertheless used by x1 and x2 to “demonstrate” “(inherent) incompetence” of y because y did not do task z, whereas, in fact, the only thing it shows is that y, unlike x1 and x2, may actually have realized z could not be done, hence, understands z better than x1 and x2 do. One’s incompetence [in this case, of x1 and x2] can have far-reaching effects on others around oneself. Trying to get x1 and x2 to realize their shortcomings has not worked thus far. Dunning et al’s students, however, had exam results for unequivocal feedback and there was an additional test set up with a controlled setting where they had built in a lecture to teach the incompetent so as to rate their competencies better (which worked to some extent), but in real life those options are not always available. What options are available, if any? A prevalent approach I observed here in Italy (well, in academia at least) is that Italians tend to ignore those xs so as to limit as much as possible the ‘air time’ and attention they have, i.e., an avoidance strategy to leave the incompetent be, whereas, e.g., in the Netherlands people will tend to keep on talking until they have blisters on their tongues (figuratively) to try to get some sense in the xs’ heads, and yet others attempt to sweep things under the carpet and pray there will not appear any wobbles one could fall over. Research directions, let alone some practical suggestions on “how to let people become aware of their intellectual and social deficiencies” (other than ‘teach them’), were not mentioned in the article, but made it to the list of future works.

You might wonder: does this hold across cultures? The why of the ‘ignorant and unaware of it’ gives some clues that, in theory, culture may not have anything to do with it.

“In many intellectual and social domains, the skills needed to produce correct responses are virtually identical to those needed to evaluate the accuracy of one’s responses… Thus, if people lack the skills to produce correct answers, they are also cursed with an inability to know when their, or anyone else’s, are right or wrong. They cannot recognize their responses as mistaken, or other people’s responses as superior to their own.” ([2], p. 86—emphasis added)

The principal problem has to do with so-called meta-cognition, which “refers to the ability to evaluate responses as correct or incorrect”, and incompetence then entails that one cannot successfully complete such a task; this is a catch-22, but, as mentioned, ‘outside intervention’ through teaching appeared to work and other means are a topic of further investigation. Clearly, a culture of arrogance can make significant stats more significant, but it does not change the principle of the cause. In this respect, the start of the article aptly quotes Confucius: “Real knowledge is to know the extent of one’s ignorance”. Conversely, according to Whitehead (quoted on p. 86 of [2]), “it is not ignorance, but ignorance of ignorance, that is the death of knowledge”.


[1] Kruger, J., Dunning, D. Unskilled and unaware of it: How difficulties in recognizing one’s own incompetence lead to inflated self-assessments. Journal of Personality and Social Psychology, 1999, 77: 1121-1134.

[2] Dunning, D., Johnson, K., Ehrlinger, J., Kruger, J. Why people fail to recognize their own incompetence. Current Directions in Psychological Science, 2003, 12(3): 83-87.

 p.s.: I am aware of the fact that I do not know much about psychology, so my rendering, interpretation, and usage of the content of those papers may well be inaccurate, although I fancy the thought that I understood them.

Reblogging 2007: AI and cultural heritage workshop at AI*IA’07

From the “10 years of keetblog – reblogging: 2007”: a happy serendipity moment when I stumbled into the AI & Cultural heritage workshop, which had its presentations in Italian. Besides the nice realisation I actually could understand most of it, I learned a lot about applications of AI to something really useful for society, like the robot-guide in a botanical garden, retracing the silk route, virtual Rome in the time of the Romans, and more.

AI and cultural heritage workshop at AI*IA’07, originally posted on Sept 11, 2007. For more recent content on AI & cultural heritage, see e.g., the workshop’s programme of 2014 (also collocated with AI*IA).


I’m reporting live from the Italian conference on artificial intelligence (AI*IA’07) in Rome (well, Villa Mondragone in Frascati, with a view of Rome). My own paper on abstractions is rather distant from near-immediate applicability in daily life, so I’ll leave that be and instead write about an entertaining co-located workshop about applying AI technologies for the benefit of cultural heritage that, e.g., improve tourists’ experience and satisfaction when visiting the many historical sites, museums, and buildings that are all over Italy (and abroad).

I can remember well the handheld guide at the Alhambra back in 2001, which had a story by Mr. Irving at each point of interest, but there was only one long story and the same one for every visitor. Current research in AI & cultural heritage looks into how this can be personalized and made more interactive, and several directions are being investigated. They range from the amount of information provided at each point of interest (e.g., for the art buff, the casual American visitor who ‘does’ a city in a day or two, or narratives for children), to location-aware information display (the device will detect which point of interest you are closest to), to cataloguing and structuring the vast amount of archeological information, to the software monitoring of Oetzi the Iceman. The remainder of this blog post describes some of the many behind-the-scenes AI technologies that aim to give a tourist the desired amount of relevant information at the right time and right place (see the workshop website for the list of accepted papers). I’ll add more links later; any misunderstandings are mine (the workshop was held in Italian).

First something that relates somewhat to bioinformatics/ecoinformatics: the RoBotanic [1], which is a robot guide for botanical gardens – not intended to replace a human, but as an add-on that appeals in particular to young visitors and gets them interested in botany and plant taxonomy. The technology is based on the successful ciceRobot that has been tested in the Archeological Museum Agrigento, but because it has to operate outside in a botanical garden (in Palermo), new issues have to be resolved, such as tuff powder, irregular surfaces, lighting, and leaves that interfere with the GPS system (for the robot to stop at plants of most interest). Currently, the RoBotanic provides one-way information, but in the near future interaction will be built in so that visitors can ask questions as well (ciceRobot is already interactive). Both the RoBotanic and ciceRobot are customized off-the-shelf robots.

Continuing with the artificial, there were three presentations about virtual reality. VR can be a valuable add-on to visualize lost or severely damaged property, timeline visualizations of rebuilding over old ruins (building a church over a mosque or vice versa was not uncommon), to prepare future restorations, and general reconstruction of the environment, all based on the real archeological information (not Hollywood fantasy and screenwriting). The first presentation [2] explained how the virtual reality tour of the Church of Santo Stefano in Bologna was made, using Creator, Vega, and many digital photos that served for the texture-feel in the VR tour. [3] provided technical details and software customization for VR & cultural heritage. The third presentation [4], on the other hand, was from a scientific point of view the most interesting and too full of information to cover it all here. E. Bonini et al. investigated if, and if so how, VR can give added value. Current VR being insufficient for the cultural heritage domain, they look at how one can do an “expansion of reality” to give the user a “sense of space”. MUDing on the via Flaminia Antica in the virtual room in the National Museum in Rome should be possible soon (CNR-ITABC project started). Another issue came up during the concluded Appia Antica project for Roman era landscape VR: the behaviour of, e.g., animals is now pre-coded and becomes boring to the user quickly. So, what these VR developers would like to see (i.e., future work) is to have technologies for autonomous agents integrated with VR software in order to make the ancient landscape & environment more lively: artificial life in the historical era one wishes, based on – and constrained by – scientific facts so as to be both useful for science and educational & entertaining for interested laymen.

A different strand of research is that of querying & reasoning, ontologies, planning and constraints.
Arbitrarily, I’ll start with the SIRENA project in Naples (the Spanish Quarter) [5], which aims to provide automatic generation of maintenance plans for historical residential buildings in order to make the current manual plans more efficient, cost effective, and maintain them just before a collapse. Given the UNI 8290 norms for technical descriptions of parts of buildings, they made an ontology, and used FLORA-2, Prolog, and PostgreSQL to compute the plans. Each element has its own interval for maintenance, but I didn’t see much of the partonomy, and don’t know how they deal with the temporal aspects. Another project [6] also has an ontology, in OWL-DL, but it is not used for DL reasoning yet. The overall system design, including the use of Sesame, Jena, and SPARQL, can be read here and after server migration, their portal for the archeological e-Library will be back online. Another component is the webGIS for pre- and proto-historical sites in Italy, i.e., spatio-temporal stuff, and the hope is to get interesting inferences – novel information – from that (e.g., discover new connections between epochs). A basic online accessible version of webGIS is already running for the Silk Road.
A third, different approach to and usage of ontologies was presented in [7]. With the aim of digital archive interoperability in mind, D’Andrea et al. took the CIDOC-CRM common reference model for cultural heritage and enriched it with the DOLCE D&S foundational ontology to better describe and subsequently analyse iconographic representations, from, in this particular work, scenes and reliefs from the Meroitic time in Egypt.
With In.Tou.Sys for intelligent tourist systems [8] we move to almost-industry-grade tools to enhance visitor experience. They developed software for PDAs one takes around in a city, which then through GPS can provide contextualized information to the tourist, such as about the building you’re walking by, or give suggestions for the best places to visit based on your preferences (e.g., only baroque era, or churches, etc.). The latter uses a genetic algorithm to compute the preference list, the former a mix of RDBMS on the server side, OODBMS on the client (PDA) side, and F-Logic for the knowledge representation. They’re now working on the “admire” system, which has a time component built in to keep track of what the tourist has visited before so that the PDA-guide can provide comparative information. Also for city-wide scale and guiding visitors is the STAR project [9]. A bit different from the previous one, it combines the usual tourist information and services – represented in a taxonomy, partonomy, and a set of constraints – with problem solving and a recommender system to make an individualized agenda for each tourist, so you won’t stand in front of a closed museum, will be alerted of a festival, etc. A different PDA-guide system was developed in the PEACH project for group visits in a museum. It provides limited personalized information and canned Q & A, and visitors can send messages to their friends and tag points of interest that are of particular interest.
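To make the idea of contextualized suggestions a bit more concrete, here is a minimal sketch in Python. It is not how In.Tou.Sys or STAR work internally (the former uses a genetic algorithm, the latter constraint-based problem solving); it only illustrates filtering points of interest by the tourist's preferences and by opening hours, and then ordering them by distance from the current GPS position. All names, coordinates, and opening hours are hypothetical.

```python
from dataclasses import dataclass
from math import radians, sin, cos, asin, sqrt


@dataclass
class PointOfInterest:
    name: str
    lat: float
    lon: float
    tags: frozenset  # e.g. {"baroque", "church"}
    opens: int       # opening hour, 24h clock
    closes: int      # closing hour, 24h clock


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two coordinates, in kilometres."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(a))


def suggest(pois, lat, lon, preferences, hour):
    """Keep open POIs that match at least one preference, nearest first."""
    candidates = [
        p for p in pois
        if p.tags & preferences and p.opens <= hour < p.closes
    ]
    return sorted(candidates, key=lambda p: haversine_km(lat, lon, p.lat, p.lon))


# Hypothetical data, invented for illustration only.
POIS = [
    PointOfInterest("Baroque church (near)", 41.9000, 12.4800, frozenset({"baroque", "church"}), 9, 18),
    PointOfInterest("Baroque palace (far)", 41.9100, 12.5000, frozenset({"baroque"}), 9, 18),
    PointOfInterest("Modern gallery", 41.9010, 12.4810, frozenset({"modern"}), 9, 18),
    PointOfInterest("Baroque museum (closed)", 41.9005, 12.4805, frozenset({"baroque"}), 9, 13),
]

for poi in suggest(POIS, 41.9000, 12.4800, {"baroque"}, hour=15):
    print(poi.name)
```

The opening-hours filter is the simplest possible stand-in for the kind of constraint that keeps a tourist from standing in front of a closed museum.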

Utterly different from the previous, but probably of interest to the linguistically-oriented reader is philology & digital documents. Or: how to deal with representing multiple versions of a document. Poets and authors write and rewrite, brush up, strike through etc. and it is the philologist’s task to figure out what constitutes a draft version. Representing the temporality and change of documents (words, order of words, notes about a sentence) is another problem, which [10] attempts to solve by representing it as a PERT/CPM graph structure augmented with labeling of edges, the precise definition of a ‘variant graph’, and a method of compactly storing it (ultimately stored in XML). The test case was a poem by Valerio Magrelli.
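As a rough illustration of the general idea of a graph with labeled edges for multiple versions, consider the following Python sketch. It is my own simplification, not the definition of the ‘variant graph’ in [10]: each edge carries a text fragment and the set of versions that contain it, and one version is read by following only the edges labeled with that version. The sample text and the version names "A" and "B" are made up.

```python
# Each edge: (source node, target node, text fragment, versions containing it).
# Hypothetical two-version example; shared fragments are stored only once.
EDGES = [
    (0, 1, "The", {"A", "B"}),
    (1, 2, "quick", {"A"}),
    (1, 2, "slow", {"B"}),
    (2, 3, "fox", {"A", "B"}),
]


def read_version(edges, version, start=0):
    """Reconstruct the text of one version by walking its labeled edges."""
    fragments, node = [], start
    while True:
        outgoing = [e for e in edges if e[0] == node and version in e[3]]
        if not outgoing:
            return " ".join(fragments)
        _, node, fragment, _ = outgoing[0]
        fragments.append(fragment)


print(read_version(EDGES, "A"))
print(read_version(EDGES, "B"))
```

The point of such a structure is compactness: text shared between drafts appears on a single edge, and only the variants are duplicated.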

The proceedings will be put online soon (I presume), are also available on CD (contact the WS organizer Luciana Bordoni), and probably several of the articles are online on the authors’ homepages.

[1] A. Chella, I. Macaluso, D. Peri, L. Riano. RoBotanic: a Robot Guide for Botanical Gardens. Early Steps.
[2] G. Adorni. 3D Virtual Reality and the Cultural Heritage.
[3] M.C.Baracca, E.Loreti, S. Migliori, S. Pierattini. Customizing Tools for Virtual Reality Applications in the Cultural Heritage Field.
[4] E. Bonini, P. Pierucci, E. Pietroni. Towards Digital Ecosystems for the Transmission and Communication of Cultural Heritage: an Epistemological Approach to Artificial Life.
[5] A. Calabrese, B. Como, B. Discepolo, L. Ganguzza , L. Licenziato, F. Mele, M. Nicolella, B. Stangherling, A. Sorgente, R Spizzuoco. Automatic Generation of Maintenance Plans for Historical Residential Buildings.
[6] A.Bonomi, G. Mantegari, G.Vizzari. Semantic Querying for an Archaeological E-library.
[7] A. D’Andrea, G. Ferrandino, A. Gangemi. Shared Iconographical Representations with Ontological Models.
[8] L. Bordoni, A. Gisolfi, A. Trezza. INTOUSYS: a Prototype Personalized Tourism System.
[9] D. Magro. Integrated Promotion of Cultural Heritage Resources.
[10] D. Schmidt, D. Fiormonte. Multi-Version Documents: a Digitisation Solution for Textual Cultural Heritage Artefacts

Reblogging 2006: “We are what we repeatedly do…

This is the 10th year of my blog, which started off as a little experiment and ‘seeing where it ends up’. In numbers, there are over 200 posts and I estimate that in September, the blog will clock its 100,000th visitor. I had a look at the list of posts, and I’ll reblog about 2 blog posts from each year, trying to pick one ‘general’ topic and one about my research that will also note some follow-ups that happened after the post. I’ve selected them ignoring the ratings or visits of the posts, as I still haven’t figured out why some posts get lots of hits whilst others don’t; shouldn’t you all want to know about changes in the ingredients of people’s meals or strive to be a nonviolent person, rather than solving a problem on rearranging luggage in an airport carousel or looking into money-making or self-indulgence on mapmaking showing all and sundry the countries you visited? Anyway, this is the first installment of it.

From the “10 years of keetblog – reblogging: 2006” (June 11, 2006): A summary on what to do (repeatedly) to become a good researcher.

“We are what we repeatedly do…

…excellence, then, is not an act but a habit”, Aristotle has said. Being an excellent researcher then amounts to habitually doing excellent research. A prerequisite of doing excellent research is to do research effectively. Even the famed, and idealized, eureka! moment scientists occasionally (are supposed to) have is based on sound foundations acquired through searching, reading, researching, thinking, testing, and integrating new scientific developments with extant knowledge already accumulated. But how to get there? I don’t know – I’m only studying to become an excellent scientist.
Besides the aforementioned list of activities, I occasionally browse the Internet and procrastinate by reading how to write a thesis, improve the English grammar and word use, plan activities to avoid unemployment, the PhD comic, and more of those types of suggestions that don’t help me with the topic itself (granularity) but are about how to do things.

Serendipity, perhaps, it was that brought me to an essay by Michael Nielsen, entitled “Principles of effective research” [1]. I summarise it here briefly, but it would be better if you read the 12 pages in full.

The first section is about integrating research into the rest of your life. So, unlike the narratives and jokes that tell you it’s normal to not have a life as a PhD student or researcher, this would be the wrong direction to go or stage to be at. Be fit, have fun.

The principles of personal behaviour to achieve effective research are proactivity, vision, and self-discipline. Don’t abdicate responsibility, and be accountable to other people. Vision does not apply to where you think your research field will be in 20 years, but where do you want to be then, what sort of researcher do you want to be, which areas are you interested in (etc)? Have clear for yourself what you want to achieve, why, and how.

Regarding the research itself, self-development and the creative process are important. But focusing on self-development only is not ok, because then one fails to make a contribution to the community, whose viability and success depend on input from scientists (among others). On the other hand, continually organizing workshops and conferences, doing reviewing etc. leaves little time for the self-development and creative process of doing research to make scientific contributions. That is, one should strive for a balance of the two.
Self-development includes developing research strengths, your ‘niche’ with a unique combination of abilities to get a comparative advantage. Emerging disciplines, mostly interdisciplinary, are a nice mess to sort out, for instance. Then, read the 10 seminal contributions in the other field as opposed to skimming several hundred mediocre articles that are fillers of the conferences or journals. (This doesn’t sound particularly friendly, but if I take bioinformatics or the ontologies hype as examples, there are quite a lot of articles that don’t add new ideas [but more descriptions of yet more software tools] and interdisciplinary articles are known to be not easy to review, hence more articles with confused content fall through the cracks and make it into archived publications.) A high-quality research environment helps.
Concerning the creative process, this depends on if you’re a problem-solver or a problem-creator, with each requiring specific skills. The former generally receives more attention, because there are so many things unknown and then figuring out how/what/why it works gives sought-for answers, technologies, or methodologies. Problem-creators, on the other hand, generate new work so to speak; by asking interesting questions, reformulating old nigh unsolvable problems in a new way, or showing connections nobody has thought of before. Read Nielsen’s article for details on the suggested skills set for each type.

Wishing you good luck with all this, then, is inappropriate, as luck does not seem to have much to do with becoming an effective researcher. So, go forth, improve your habits, and reap what you sow.

[1] Nielsen, M.A. Principles of effective research. July 27th, 2004. UPDATE 22-7-2015: this is the new URL:

Wikipedia + open access = not quite a revolution (not yet at least)

The title of the arxiv blog post sounded so catchy, and like wishful thinking with a high truthlikeness: “Why Wikipedia + open access = revolution”, summarizing and expanding on the paper with the title “Amplifying the Impact of Open Access: Wikipedia and the Diffusion of Science” [1], with some quotes:

“The odds that an open access journal is referenced on the English Wikipedia are 47% higher compared to closed access journals,” say Teplitskiy and co.

Open access publishing has changed the way scientists communicate with each other but Teplitskiy and buddies have now shown that its influence is much more significant. “Our research suggests that open access policies have a tremendous impact on the diffusion of science to the broader general public through an intermediary like Wikipedia,” says Teplitskiy and co.

It means that open access publications are dramatically amplifying the way science diffuses through the world and ultimately changing the way we understand the universe around us.

I sooo want to believe. And, honestly, when I search for something and Wikipedia is the first hit and I do click, it does seem to give a decent introductory overview of something I know little about so that I can make a better start for searching the real sources. I never bothered to look up my own areas of specialisation, other than when a co-author mentioned there was (she put?—I can’t recall) a reference to her tool in Wikipedia some time ago. But there’s that nagging comment to the technologyreview blog post saying the same thing, and adding that when s/he looked up his/her own field, s/he

“then realized that in my own field, my main reaction was to want to scream at the cherry picking of sources to promote some minor researcher.”

So, I looked up “ontology engineering” and “Ontologies” that redirected to “Ontology (information science)” (‘information science’, tsk!)… and I kinda screamed. The next sections are, first, about the merits of the arxiv paper (outcome: their conclusions are rather exaggerated) and, second, I’ll use that ‘ontology (information science)’ entry to dig a bit deeper as use case, using both the English entry and those in several other languages, as that’s what the arxiv paper covers as well. I’ll close with some thoughts on what to do about it.


On the arxiv paper’s data and results

There are several limitations to the paper; some of them are discussed by its authors, some are not. The arxiv paper does not distinguish between scientific literature that is freely available online where only the final typeset version is behind a paywall and official ‘open access’. This is problematic for processing the computer science entries in Wikipedia for trying to validate their hypothesis. In addition, they considered only journals and their open access policy, i.e., a journal-level analysis (cf. article-level analysis), idem for the problematic ISI impact factor, and only the 21000 journals listed in Scopus, amounting eventually to the (ISI index-)top 4721 journals, of which 335 are open access, to test Wikipedia content against. The open access list was based on being listed in the directory of OA journals, ignoring the difference between ‘green’ and ‘gold’ and paywall-access from, say, Elsevier. Overall, this already does not bode well for extending the obtained conclusion to computer science entries and, hence, the diffusion of knowledge claim.

The authors admit they may undercount references for the non-English entries, but those have few references anyway (Fig 1 in the arxiv paper), so it’s basically largely an English-Wikipedia analysis after all, i.e., the conclusion does not straightforwardly extend to ‘diffusion of knowledge’ for the non-English speaking world.

The statistical model is described on p19 of the pdf, and I don’t quite follow the rationale, with an elusive set of ‘journal characteristics’ and some estimated variables without detail. Maybe some stats person can shed a light on it.

Then there is the bubble figure in the technologyreview post, which is Fig 8 in the arxiv paper and is reproduced in the screenshot below; it “shows that across 50 [non-English] Wikipedias, there is an inverse relationship between the effects of accessibility and status on referencing”. Come again? It’s not like the regression line fits well. And why are the language entries (presumably independent of one another) in a relation after all? Notwithstanding, the odds for a Serbian entry to have a reference to an open access journal are some 275% higher than to a paywalled one, vs entries in Turkish that cite higher impact factor journals some 200% more often, according to the arxiv paper. I haven’t found details of that data, though, other than a back-of-the-envelope calculation when glancing over the figure: Serbian has a 1.5 for impact and a 3.75 or so for open access, Turkish 3 and 1.3-ish. Of how many entries and how many citations for those languages? They state that “While the English Wikipedia references ~32,000 articles from top journals, the Slovak Wikipedia references only 108 and Volapuk references 0.”. But Volapuk still ends up with an open access odds ratio of 0.588 and an ln(impact factor) of 2.330 (Appendix A3), which is computed with the set of top-rated journals only; how is that possible when there are no references to those top journals? The number of counted journal citations is not given for each language, so a ‘statistically significant’ result may well actually be over a number that’s too low to do your statistics with. Waray-Waray is a very small dot, and reading from Fig 1, it’s probably not more than those 108 references in the Slovak entries.
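The back-of-the-envelope conversion between an odds ratio and the “x% higher” phrasing can be written down in a few lines of Python. This is only the standard arithmetic, (odds ratio − 1) × 100, applied to the values quoted or eyeballed above; the function name is my own and nothing here comes from the arxiv paper’s code or data files.

```python
def percent_change_in_odds(odds_ratio):
    """Convert an odds ratio into a percentage change in the odds."""
    return (odds_ratio - 1.0) * 100.0


# Values as quoted in the text; the Serbian and Turkish ones are read off
# Fig 8 by eye, so they are approximate.
examples = {
    "English, open access (47% quote)": 1.47,
    "Serbian, open access": 3.75,
    "Turkish, impact factor": 3.0,
    "Volapuk, open access (Appendix A3)": 0.588,
}

for label, ratio in examples.items():
    print(f"{label}: {percent_change_in_odds(ratio):+.1f}%")
```

An odds ratio of 3.75 thus corresponds to the 275% higher odds mentioned for Serbian, and a ratio below 1, as for Volapuk, means lower rather than higher odds.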

All in all, there is some room for improvement on this paper, and, in any case, some toning down of the conclusions, let alone technologyreview’s sensationalist blog title.


Fig 8 from Teplitskiy et al (2015)

Ontology (information science) Wikipedia entry, some issues

Let me not be petty whining that none of my papers are in the references, but take a small example of the myriad of issues.

Take the statement “There are studies on generalized techniques for merging ontologies,[12] but this area of research is still largely theoretical.” Uh? The reference is to an obscure ‘dynamic ontology repair’ project pdf from the University of Edinburgh, retrieved in 2012. We merged DMOP’s domain content with DOLCE in 2011, with tool support (Protégé, to be precise). owl:import was around and working at that time as well. Not to mention the very large body of papers on ontology alignment, reference book by Shvaiko & Euzenat, and the Ontology Alignment Evaluation Initiative.

The list of ontology languages even includes SBVR and IDEF5 (not ontology languages), and, for good measure of scruffiness, a project (TOVE).

The obscure “Gellish” appears literally everywhere: it is an ontology, it is a language, it is an English dictionary (yes, the latter apparently also falls under ‘examples’ of ontologies. not), and it is even the one and only instantiation of a “hybrid ontology” combining a domain and an upper ontology. Yeah, right. Looking it up, Gellish is van Renssen’s PhD thesis of 2005 that has an underwhelming 2 citations according to Google Scholar (10 for the related journal paper), and there’s a follow-up 2nd edition of 2014 by the same author, published with lulu, no citations. That does not belong in an introductory overview of ontologies in computing. Dublin core as an example of an ontology? No (but it is a useful artefact for metadata annotations).

Under “criticisms”: aside from a Werner Ceusters statement from a commentary on someone from his website (since when does that deserve to be upgraded to Wikipedia content?!?), there’s also “It’s also not clear how ontology fits with Schema on Read (NoSQL) databases.”. Ontologies with NoSQL? sigh.

“Further readings” would, I expect, have a fine set of core readings to get a more comprehensive overview of the field. While some relevant ones are there (e.g., the “what is an ontology?” paper by Oberle, Guarino, and Staab; “Ontology (Science)” by Smith, Gruber’s paper despite the flawed definition), numerous ones are the result of some authors’ self-promotion, like the one on bootstrapping biomedical ontologies, an ontology for user profiles, IE for disease intelligence—they’re not even close to ‘staple food’ for ontologies—and the 2001 OIL paper and Ontology Development 101 technical report are woefully out-dated. The “References” section is a mishmash of webpages, slides, and a few scientific papers most of which are not from mainstream ontology research venues.

And that’s just a sampling of the issues with the “Ontology (information science)” Wikipedia entry; the ontology engineering entry is worse. No wonder my students—having grown up with treating Wikipedia as gospel—get confused.


Ontologies entries in other languages

That much about the English language version of ‘ontology (information science)’. I happen to speak a few other languages as well, so I also checked most of those for their ‘ontology (information science)’ entry. For future reference as a stock-taking of today’s contents, I’ve pdf-printed them all (zipped). For starters, they all had ontologies at least categorised properly into ‘informatica’. +1.

The entry in Dutch is very short; one can quibble and nit-pick about term usage, and it is disappointing that there’s only one reference (in Dutch, so wouldn’t count in the arxiv analysis), but at least it’s not riddled with mistakes and inappropriate content.

The German one is quite elaborate, and starts off reasonably well, but has some mistakes. Among others, the typical novice pitfall of confusing classes for instances [“Stadt als Instanz des Begriffs topologisches Element der Klasse Punkte”] and the sample ontology—which of itself is a good idea to add to an overview page—has lots of modelling issues, such as datatypes and mixing subclasses with properties (the Maler [painter] with region of origin Flämish [Flemish]). Interestingly, ontology types for the English reader are foundational, domain, and hybrid, whereas the German reader has only lightweight and heavyweight ones. As for the references, there are some oddball ones, but the fair/good ones are in the majority, if incomplete, and perhaps a bit lopsided to Barry Smith material.

The Italian entry is of similar length to the German entry, but, unfortunately, has some copy-and-paste from the English one when it comes to the list of languages and examples, so, a propagation of issues; the ‘example of applications’ does list another project, and there is no ‘criticisms’ section. The text has been written separately instead of being a translation-of-English (idem ditto for the other entries, btw), and thus also contains some other information. For starters, removing most of the ‘Premesse’ would be helpful (or elaborating on it in a criticism section; starting the topic with information warfare and terrorism? nah). The section after that (‘uso come glossario di base’) is chaotic, reading like a competitor-author per paragraph, and riddled with problematic statements like that all computer programs are based on foundational ontologies (“Tutti i programmi per computer si basano su ontologie fondazionali,”), and that the scope of an ontology is to develop a database (“Lo scopo di un’ontologia computazionale […] [è] di creare una base di dati”). It does mention OntoClean. Italian readers will also be treated to a brief discussion of the debate on one or multiple ontologies (absent from the other entries). It has a quite different set of ‘external links’ compared to the other entries, and there are hardly any references. All in all, one leaves with a quite distinct impression of ontologies after reading the Italian one cf. the Dutch, German, and English ones.

Last, the Spanish entry is about as short as the Dutch one. There’s overlap in content with the Italian entry in the sense of near-literal translation (on the foundational ontology and that Murray-Rust guy on the ‘semantic and ontological war’ due to ‘competition between standards’), and it has a plug for MathWorld (?!).

So, if the entries on topics I’m an expert in are of such dubious quality (the German entry is, relatively, the best), then what does that imply for the other entries that superficially may seem potentially useful introductory overviews? By the same token, they probably are not. And the ontology topics are not even in an area with as much contention as topics in political sciences, history, etc. Go figure.


Now what?

Is this a bad thing? I already can see a response in the making along the line of “well, it’s crowdsourced and everyone can contribute, we invite you to not just complain, but instead improve the entry; really, you’re welcome to do so”. Maybe I will. But first, two other questions have to be answered. The arxiv paper that got my rant started claimed that open access papers are good, and that they’re reworked into interested-layperson digestible bites in Wikipedia to spread and diffuse knowledge in the world. The idea is nice, but the reality is different. Pretty much all the main papers on ontologies are freely available online even if not published ‘open access’ (computer science praxis, thank you), yet they are not the ones that appear in Wikipedia. Question 1: Why are those freely available main references of ontologies not referenced there already?

A concern of a different type is that several schools in South Africa have petitioned to get free Internet access to search Wikipedia as a source of information for their studies. Their main argument was that books don’t arrive, or arrive late, and there is no library in many schools, which is a common problem. They got the zero-rate Wikipedia from MTN; more info here. (I’ll let you mull over its effects on the quality of education they get from that.) Question 2: Can Wikipedia be made a really authoritative resource with the current set-up so as to live up to what the learners [and interested laypersons] need? If I were to rewrite an update to the Wikipedia pages today, a pesky editor or someone else simply can click to roll it back to the previous version, or slowly but steadily have funny references seeping back in and sentences cut and rephrased. Writing free textbooks, or at least extensive lecture notes, seems a better option, or a ‘synthesis lectures’ booklet endorsed by lots of people researching and using ontologies. What about a ‘this version is endorsed by …’ button for Wikipedia entries?

Any better ideas, or answers to those questions, perhaps? Free diffusion of digested high quality scientific knowledge really does sound very appealing…


[1] Teplitskiy, M., Lu, G., Duede, E. Amplifying the Impact of Open Access: Wikipedia and the Diffusion of Science.

On the need for bottom-up language-specific terminology development

Peoples of several languages intellectualise their vocabulary so as to maintain their own language as medium of instruction (or: LoLT, language of learning and teaching), to conduct scientific discussions among peers and, in some cases, still, publish research in their own language. Some languages I know of that do this are French, Spanish, German, and Italian; e.g., the English ‘set’ is conjunto (Sp.) and insieme (It.), and the Dutch for ‘garbage collection’ (in computing) is geheugensanering. I found out the hard way last month that my Italian scientific vocabulary was better than my Dutch one, never really having practiced the latter in my field of specialisation, and I noticed that over the years that I have been globetrotting, quite a few Anglicisms in Dutch had been replaced with Dutch words and some had been there for a while already (as excuse: I studied a different discipline in the Netherlands). How do these new words come about? There are many ways of word creation, and then it depends on the country or language region how it gets incorporated in the language. For instance, French uses a top-down approach with the Académie Française and Spain has the Real Academia Española. The Netherlands has De Nederlandse Taalunie that isn’t as autocratic, it seems; for instance, to follow suit with the French mot dièse for the twitter ‘hashtag’, there was some consultation and online voting (sound file) to come up with an agreeable Dutch term for hashtag. But how does that happen elsewhere?

We found out that there is a mode of practice for language-specific terminology development that happens in small ‘workshops’ of some 13-15 people, consisting mainly of terminologists and linguists, and 1-3 subject matter experts. There may be a consultative event with stakeholders, who are not necessarily subject matter experts. Shocking. The sheer arrogance of the former, who ‘magically’ grasp concepts that, when it comes to science, typically take a while to understand, and supposedly understand them well enough to come up with a meaningful local-language word. But maybe, you say, I’m too arrogant in thinking subject matter experts, such as myself, can come up with decent local-language terms. Maybe that’s partially true, but what may be more problematic is that only a few subject matter experts are involved, so there is an over-reliance on those mere few. Maybe, you say, that’s not a problem. We put that to the test for computing and computer literacy terminology development for isiZulu, and found out it was: what comes out of the term harvesting and term preference depends on whom you ask. And then asking just a few people is a problem for a term’s uptake. (The students involved in the experiments did not even know there was a computer literacy term list from the South African Department of Arts and Culture, published in 2005, and booed away several of the terms.)

We tested it with three experiments. The first experiment was an experts-only workshop, with ‘experts’ being 4th-year computer science students who have isiZulu as home language, as there were no isiZulu-speaking MSc and PhD students, nor colleagues, in CS at the University of KwaZulu-Natal, where we did the experiment. The second experiment was an isiZulu-localised survey among undergraduate CS students to collect terms, where we hoped to see a difference between a survey where they were given the entity with an English name and one where they were given the entity as a picture. The third experiment was a survey where computer literacy students (1st-year science students) could vote for terms for which there was more than one isiZulu term proposed. The details of the set-up and the results have been published recently in the Alternation open-access journal article “Limitations of Regular Terminology Development Practices: The Case of isiZulu Computing Terminology” [1], in the special issue on “Re-envisioning African Higher Education: Alternative Paradigms, Emerging Trends and New Directions”, edited by Rubby Dhunpath, Nyna Amin and Thabo Msibi. It describes which isiZulu terms are affected and where they come from, ranging from a higher incidence of ‘zulufying’ English terms in the aforementioned list by the South African Department of Arts and Culture cf. the proposals by the experiments’ participants, to, e.g., expert consensus for inqolobane for database, versus a preference for imininingo egciniwe by the computer literacy students (see paper for more cases). Further, when all respondents across the surveys are aggregated and go for majority voting, the proposed terms by the experts are snowed under. The latter is particularly troublesome in a country where computing is a designated critical skill (or: there aren’t nearly enough people who have it).
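To make the ‘snowed under’ effect concrete, here is a minimal sketch of plain majority voting over pooled respondents. The two isiZulu terms are the ones mentioned above; the vote counts, the group labels, and the function name `majority_term` are made up for illustration and are not the figures from the paper.

```python
from collections import Counter

# Hypothetical ballots for the term 'database': (respondent group, chosen term).
# The counts are invented for illustration; see the paper for the real data.
ballots = (
    [("expert", "inqolobane")] * 4
    + [("student", "imininingo egciniwe")] * 30
    + [("student", "inqolobane")] * 10
)

def majority_term(ballots, group=None):
    """Return the most-voted term, optionally restricted to one respondent group."""
    votes = Counter(term for g, term in ballots if group is None or g == group)
    return votes.most_common(1)[0][0]

experts_choice = majority_term(ballots, group="expert")
overall_choice = majority_term(ballots)
print(experts_choice, "|", overall_choice)
```

With a handful of experts and a much larger group of students, the pooled majority simply follows the larger group, whatever the experts agreed on.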

A byproduct of the experiments was that we have collected the, to date, longest list of isiZulu computing terms, which have gone through a standardisation process in the meantime. The latter is mainly thanks to the tireless efforts of Khumbulani Mngadi of the ULPDO of UKZN, and the two expert CS honours students who volunteered in the process, Sibonelo Dlamini and Tanita Singano.

Our approach was already less exclusionary than the aforementioned traditional/standard way, but it also shows that broader participation is needed both to collect and to choose terms; or, in the words of the special issue editors [2]: a “democratization of the terminology development process” that “transcends the insularity and purism which characterises traditional laboratory approaches to development”. We are still working on and off to achieve this with crowdsourcing, and maybe we should start thinking of crowdfunding that crowdsourcing effort to speed up the whole thing and complete the commuterm project.

As a last note: in case you are interested in other contributions to “re-envisioning African higher education”: scan through the main page online, read the editorial [2] for main outcomes of each of the papers, and/or read the papers, on topics as diverse as postgrad supervision in isiZulu, teaching sexual and gender diversity to pre-service teachers, maths education, IKS in HE, and much more.


[1] Keet, C.M., Barbour, G. Limitations of Regular Terminology Development practices: the case of the isiZulu Computing Terminology. Alternation, 2014, 12: 13-48.

[2] Dhunpath, R., Amin, N., Msibi, T. Editorial: Re-envisioning African Higher Education: Alternative Paradigms, Emerging Trends and New Directions. Alternation, 2014, 12: 1-12.