What about ethics and responsible data integration and data firewalls?

With another level 4 lockdown and a curfew from 9pm for most of July, I eventually gave in and decided to buy a TV, for some diversion with the national TV channels. In the process of buying, it appeared that here in South Africa, you have to have a valid paid-up TV licence to be allowed to buy a TV. I had none yet. So there I was in the online shopping check-out on a Sunday evening, held up by a message that boiled down to ‘we don’t recognise your ID or passport number as having a TV licence’. As advances in the state’s information systems would have it, you can register for a TV licence online and pay by credit card to obtain one near-instantly. The interesting question from an IT perspective then was: how long would it take for the online retailer to know that I had duly registered and paid for the licence? In other words: are the two systems integrated and, if so, how? It definitely is not based on a simple live SPARQL query from the retailer to a SPARQL endpoint of the TV licences database, as I still failed the retailer’s TV licence check immediately after paying for the licence and receiving confirmation of it. Some time passed, perhaps 30-45 minutes or so, with refreshing the page, trying again, and writing a message to the retailer. And then it worked! A periodic data push or pull it is then, either between the licence database and the retailer or within the state’s back-end system and any front-end query interface. Not bad, not bad at all.
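The inference that a periodic push or pull sits between the licence register and the retailer can be made concrete with a small simulation. This is a minimal sketch under assumptions: the retailer checks its own copy of the register that is refreshed in batches every 30 minutes, and the interval as well as all class and function names are hypothetical. With such a batch sync, how long you wait after paying depends on when you pay relative to the next refresh.

```python
from dataclasses import dataclass, field

# Hypothetical sketch of a batch-synchronised licence check. The retailer
# consults its own copy of the register, refreshed every `interval_min`
# minutes, instead of querying the live register. All names and the
# 30-minute default are illustrative assumptions, not the real systems.


@dataclass
class LicenceRegister:
    """The authoritative register on the state's side."""
    paid: set = field(default_factory=set)

    def register(self, id_number: str) -> None:
        self.paid.add(id_number)


@dataclass
class RetailerMirror:
    """The retailer's periodically refreshed copy of the register."""
    source: LicenceRegister
    interval_min: int = 30
    snapshot: frozenset = frozenset()
    last_sync: int = -10**9  # far in the past, so the first tick syncs

    def tick(self, now_min: int) -> None:
        # Batch pull: copy the whole register once the interval has elapsed.
        if now_min - self.last_sync >= self.interval_min:
            self.snapshot = frozenset(self.source.paid)
            self.last_sync = now_min

    def has_licence(self, id_number: str) -> bool:
        return id_number in self.snapshot


def minutes_until_visible(pay_at: int, interval_min: int = 30) -> int:
    """Minutes between paying (at minute pay_at) and passing the retailer's check."""
    register = LicenceRegister()
    mirror = RetailerMirror(register, interval_min)
    for now in range(pay_at + 2 * interval_min + 1):
        mirror.tick(now)  # a sync in the same minute runs before the payment
        if now == pay_at:
            register.register("ID-123")
        if now >= pay_at and mirror.has_licence("ID-123"):
            return now - pay_at
    raise RuntimeError("payment never became visible")
```

For example, `minutes_until_visible(5)` gives 25 and `minutes_until_visible(30)` gives 30: paying just after a refresh means waiting almost a full interval, so under this assumption the delay is never longer than one interval.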

One may question from a privacy viewpoint whether this is the right process. Why could the retailer not simply check by, say, just TV licence number and surname, instead of my having to hand over my ID or passport number for the check? Should it even be the retailer’s responsibility to check whether their customer has paid the tax?
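The data-minimising alternative raised in that question can be sketched as a check that takes only a licence number and a surname and answers yes or no. This is a hypothetical illustration: the record layout, the function name `licence_is_valid`, and the sample entries are made up, and nothing here describes how the actual licence database works.

```python
from dataclasses import dataclass

# Hypothetical sketch of a data-minimising licence check: the retailer sends
# only a licence number and surname and gets back a yes/no answer. The record
# layout, names, and sample data are illustrative assumptions.


@dataclass(frozen=True)
class LicenceRecord:
    licence_number: str
    surname: str
    id_number: str  # kept on the register's side, never returned
    paid_up: bool


# Toy stand-in for the licence register, keyed by licence number.
REGISTER = {
    "TVL-0001": LicenceRecord("TVL-0001", "Dlamini", "ID-NOT-SHARED", True),
    "TVL-0002": LicenceRecord("TVL-0002", "Smith", "ID-NOT-SHARED", False),
}


def licence_is_valid(licence_number: str, surname: str) -> bool:
    """Return only a boolean: no ID number or other personal data leaves the register."""
    record = REGISTER.get(licence_number)
    if record is None:
        return False
    return record.paid_up and record.surname.casefold() == surname.casefold()
```

The design choice is that the answer carries one bit of information; the retailer learns whether the sale may proceed, not who the licence holder is.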

There are other places in the state’s systems with relatively advanced integration of data between the state and companies as well. Notably, the SA Revenue Service (SARS) system pulls data from any company you work for (or they submit it via some ETL process) and from any bank you bank with, to check whether you paid the right amount (if you owe them, they send the payment order straight to your bank, but you still have to click ‘approve’ online). No doubt it will help reduce fraud, and by making it easier to fill in tax forms, it will likely increase the amount collected and cause fewer errors that otherwise may be costly to fix. Clearly, the system amounts to reduced privacy, but it remains within the legal framework (someone trying to evade paying taxes is the one breaking the law), and I support the notion of redistributive taxation and of achieving it with as little admin as possible.

These examples do raise broader questions, though: when is data integration justified? Always? If not always, then when is it not? How to ensure that it won’t happen when it should not? Who regulates data integration, if anyone? Are there any guidelines or a checklist for doing it responsibly so that it at least won’t cause unintentional harm? Which steps in the data integration, if any, are crucial from a responsibility and ethical point of view?

No good answers

pretty picture of a selection of data integration tasks. source: https://datawarehouseinfo.com/wp-content/uploads/2018/10/data-integration-1024x1022.png

I did search for academic literature, but found only one paper mentioning that we should think about at least some of these sorts of questions [1]. There are plenty of ethics & Big Data papers (e.g., [2,3]), but those papers focus on the algorithms let loose on the data, and their consequences, once the data has been integrated, rather than on whether to integrate at all or on any of the preceding integration processes themselves. There are, among others, data cleaning, data harmonisation and algorithms for that, schema-based integration (LAV, GAV, or GLAV), conceptual model-based integration, ontology-driven integration, possibly recurring ETL processes, and so on; something may go wrong at each step, and any step may turn out to be the fine-grained component that is crucial to the ethical considerations. I devised one toy example in the context of ontology-based data access and integration where things would go wrong because of a bias [4] in a COVID-19 ontology that has data integration as its explicit purpose [5]. There are also informal [page offline dd 25-7-2021] descriptions of cases where things went wrong, such as the data integration issues with the City of Johannesburg that caused multiple riots in 2011, and no doubt there will be more.

Taking the ‘non-science’ route further to see if I could find something, I did find a few websites with some ‘best practices’ and ‘guidelines’ for data integration (e.g., here and here), with the brand new and most comprehensive set of data integration guidelines at end-user level by UN’s ESCAP that focuses on data integration for statistics offices on what to do and where errors may creep in [6]. But that’s all. No substantive hits with ‘ethics in data integration’ and similar searches in the academic literature. Maybe I’m searching in the wrong places. Wading through all ‘data ethics’ papers to find the needle in the haystack may have to be done some other time. If you know of scientific literature that I missed specifically regarding data integration, I’d be most grateful if you’d let me know.

The ‘recurring reliables’ for issues: health and education

Meanwhile, to take a step toward an answer of at least a subset of the aforementioned questions, let me first mention two other recent cases, also from South Africa, although the second issue happened in the Netherlands as well.

The first one is about healthcare data. I’m trying to get a SARS-CoV-2 vaccine. Registration for the age group I’m in opened on the 14th in the evening and so I did register in the state’s electronic vaccination data system (EVDS), which is the basic requirement for getting a vaccine. The next day, it appeared that we could book a slot via the health insurance I’m a member of. Their database and the EVDS are definitely not integrated, and so my insurer spammed me for a while with online messages in red, via email, and via SMS that I should register with the EVDS, even though I had already done that well before trying out their app.

Perhaps the health data are not integrated because it’s health; perhaps it was just time pressure so as not to delay the rollout of the SARS-CoV-2 vaccination programme. For some sectors, such as basic education and the police, staff were loaded into the EVDS by the respective state department in one go via some ETL process, rather than people having to bother with individual registration: ID number, names, health insurance, dependants, home address, phone number, and whatnot that the EVDS asked for. And that regardless of whether you want the vaccine or not, although at least most people do. I don’t recall anyone having had a problem with the fact that this back-end process happened, aside from reported glitches in the basic education sector’s ETL process, with reports of foreign national teachers and employees of independent schools who wanted in but were missing.

Both the IT systems for vaccination management and any app for a ‘pass’ for having been vaccinated are the subject of some debate on privacy internationally. Should they be self-standing systems? If some integration is allowed, then with what? Should a healthcare provider or insurer be informed of the vaccination status of a member (and, consequently, act accordingly, whatever that may be): always, only if the member voluntarily discloses it (like with the vaccination scheduling app), or never? One’s employer? The movie theatre or mall you may want to enter? Perhaps airline companies want access to the vaccine database as well, so that they could choose to let only vaccinated people on their planes? The latter happens with other vaccinations for sure; e.g., yellow fever vaccination proof to enter SA from some countries, which the airline staff did ask for when I checked in in Argentina when travelling back to SA in 2012. That vaccination proof had gone into the physical yellow fever vaccination booklet that I carried with me; no app was involved in that process, ever. But now more things are digital. Must any such ‘covid-19 pass’ necessarily be digital? If so, who decides who, if anyone, will get access to the vaccination data, be it the EVDS data in SA or their homologous systems in other countries? To the best of my knowledge, no regulations exist yet. Since the EVDS is an IT system of the state, I presume they will decide. If they don’t, it will be up to the whims of each company, municipality, or province, which is bound to generate lots of confusion among people.

The other case, of a different nature, comes up in the news regularly; e.g., here, here, and here. It’s the tension that exists between children’s right to education and the paperwork to apply for a school. This runs into complications when they have an “undocumented” status, be it because of an absent birth certificate or because of their and their parents’ status as legal/illegal and their related ID documents or the absence thereof. It is forbidden for a school to contact Home Affairs to get the prospective pupil’s and their respective parents’/guardians’ status, and for Home Affairs to provide that data to the schools, let alone integrate those two databases at the ministerial level. Essentially, it is an intentional ‘Chinese wall’ between the two databases: the right to education of a child trumps any possible violation of legality of stay in the country or missing paperwork of the child or their parents/guardians.
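Such a ‘Chinese wall’ can be pictured in software as a deny-by-default access policy between systems, with the prohibited pair written down explicitly. The sketch is hypothetical: the requester and source names are made up, the allowed pairs merely echo the TV licence and SARS examples in this post, and it says nothing about how the South African systems actually enforce the prohibition.

```python
# Hypothetical sketch of a 'data firewall': a deny-by-default policy listing
# which requester may query which data source. Sector and source names are
# illustrative, not the real South African systems.

ALLOWED = {
    ("retailer", "tv_licence_register"),
    ("tax_service", "employer_payroll"),
    ("tax_service", "bank_records"),
}

FORBIDDEN = {
    # The right to education trumps checks on residence status.
    ("school", "home_affairs_population_register"),
}


class PolicyViolation(Exception):
    """Raised when a query crosses the data firewall."""


def may_query(requester: str, source: str) -> bool:
    """Deny by default; an explicit prohibition wins even if the pair is later added to ALLOWED."""
    if (requester, source) in FORBIDDEN:
        return False
    return (requester, source) in ALLOWED


def query(requester: str, source: str, key: str) -> str:
    """Run a lookup only if the policy permits it."""
    if not may_query(requester, source):
        raise PolicyViolation(f"{requester} may not query {source}")
    return f"lookup of {key} in {source}"  # placeholder for a real lookup
```

Such a check guards only the technical channel; it cannot stop an admission form that asks the applicant for the same data directly.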

Notwithstanding, exclusive or exclusionary schools try to filter such children out by other means, such as by demanding that sort of data when you apply for admission; here’s an example. Compare this with public schools, where evidence of an application for permission to stay suffices, or at least evidence of efforts to engage with Home Affairs will do already. When the law says ‘no’ to the integration, how can you guarantee it won’t happen, either through the software or by other means (like by de facto requiring, in an admission form, the relevant data stored in the Home Affairs database)? Policing it? People reporting it somewhere? Would requesting such information now be a violation of the Protection of Personal Information Act (POPIA) that came into force on the 1st of July, since it asks for more personal data than needed by law?

Regulatory aspects

These cases, that is, the TV licence, SARS (the tax, not the syndrome), the vaccine database, and school admissions, are just a few anecdotes. Data integration clearly is not always allowed, and when it is not, that has been a deliberate decision, because its outcome is easy to predict and deemed unwanted. Notably for the education case, it is the government who devised the policy for a regulatory Chinese wall between its systems. The TV licence appears to lie at the other end of the spectrum. The Broadcasting Act of 1999 implicitly puts the onus on the seller of TVs: the licence is not a fee to watch public TV, but gives the licence holder the right to use a TV (article 27, if you must know), so if you don’t have the right to have one, then you can’t buy one. It’s analogous to having to be over 18 to buy alcohol, where the seller is held culpable if the buyer isn’t. That said, there are differences in what the sellers request from the customer: Makro requires only the licence number and asks for ID only if you can’t remember the licence number, so as to ‘help you find it’, whereas takealot demands both ID and licence in any case, and therewith is perhaps asking for more than strictly needed. Either way, since any retailer thus should be able to access the licence information instantly to check whether you have the right to own a TV, it’s a bit as if “come in and take my data” is written all over the TV licence database. I haven’t seen any news articles about abuse.

For the SARS-CoV-2 vaccine and the EVDS data, there is, to the best of my knowledge, no specific regulation governing data flows from the EVDS to third parties, other than that vaccination is voluntary and that there is SA’s version of the GDPR, the aforementioned POPIA, which is based on the GDPR principles. I haven’t seen much debate about organisations requiring vaccination, but they can make vaccination mandatory if they want to, from which it follows that there will have to be some data exchange, either between the EVDS and third parties or from the EVDS to the person and from there to the company. Would it then become another “come in and take my data”? We’ll cross that bridge when we come to it, I suppose; coverage is currently at about 10% of the population and not everyone who wants to could get vaccinated yet, so we’re still in limbo.

What could possibly go wrong with widespread access, as with the TV licence database? A lot, of course. There are the usual privacy and interoperability issues (also noted here), and there are calls even in the laissez-faire USA to put a framework in place to provide companies with “standards and bounds”. These issues are unlikely to be solved by the CommonPass of the Commons Project bottom-up initiative, since there are so many countries with so many rules on privacy and data sharing. Interoperability between some systems is one thing; one world-wide system is another cup of tea.

What all this boils down to is not unlike Moshe Vardi’s argument that more policy is needed to reduce and avoid ethical issues in IT, AI, and computing, rather than that computing would be facing an ethics crisis [7]. His claim is that failures of policy cause problems and that the “remedy is public policy, in the form of laws and regulations”, not some more “ethics outrage”. Presumably, there is no ethics crisis in the sense of a lack of understanding of ethical behaviour among computer scientists and their managers. Seeing each year how students’ arguments improve between the start of the ethics course and the essay and exam at the end, I’d argue that basic sensitization is still needed, but on the whole, more and better policy could go a long way indeed.

More research on possible missteps in the various data integration processes would also be helpful, including from a technical angle, as would learning from case studies and contextual inquiries [8], and a rigorous assessment of possible biases, like the one carried out for software development processes [9]. Those outcomes then may end up as a set of guidelines for data integration practitioners and the companies they work for, and inform government in devising policies. For now, the ESCAP guidelines [6] probably will be of most use to a data integration practitioner. They won’t catch all biases and algorithmic issues & tools, and they assume that one is allowed to integrate already, but they are a step in the direction of responsible data integration. I’ll think about it a bit more, too, and for the time being I won’t bother my students with writing an essay about the ethics of data integration just yet.

References

[1] Firmani, D., Tanca, L., Torlone, R. Data processing: reflection on ethics. International Workshop on Processing Information Ethically (PIE’19). CEUR-WS vol. 2417. 4 June 2019.

[2] Herschel, R., Miori, V.M. Ethics & Big Data. Technology in Society, 2017, 49:31–36.

[3] Sax, M. Finders keepers, losers weepers. Ethics and Information Technology, 2016, 18:25–31.

[4] Keet, C.M. Bias in ontologies — a preliminary assessment. Technical Report, Arxiv.org, January 20, 2021. 10p

[5] He, Y., et al. CIDO: The Community-based Coronavirus Infectious Disease Ontology. In Hastings, J., Loebe, F., eds., Proceedings of the 11th International Conference on Biomedical Ontologies, CEUR-WS vol. 2807, 2020.

[6] Economic and Social Commission for Asia and the Pacific (ESCAP). Asia-Pacific Guidelines to Data Integration for Official Statistics. Training manual. 15 April 2021.

[7] Vardi, M.Y. Are We Having An Ethical Crisis in Computing? Communications of the ACM, 2019, 62(1):7.

[8] McKeown, A., Cliffe, C., Arora, A. et al. Ethical challenges of integration across primary and secondary care: a qualitative and normative analysis. BMC Med Ethics 20, 42 (2019).

[9] R. Mohanani, I. Salman, B. Turhan, P. Rodriguez, P. Ralph, Cognitive biases in software engineering: A systematic mapping study, IEEE Transactions on Software Engineering, 46 (2020): 1318–1339.


A grammar of the isiZulu verb (present tense)

If you have read any of the blog posts on (automated) natural language generation for isiZulu, then you’ll probably agree with me that isiZulu verbs are non-trivial. True, verbs in other languages are most likely not as easy as in English, or Afrikaans for that matter (e.g., they made irregular verbs regular), but there are many little ‘bits and pieces’ ‘glued’ onto the verb root that make it semantically a ‘heavy’ element in a sentence. For instance:

  • Aba-shana ba-ya-zi-theng-is-el-an-a                izimpahla
  • Children   2.SC-Pres-8.OC-buy.VR-C-A-R-FV 8.clothes
  • ‘The children are selling the clothes to each other’

The ba is the subject concord (~conjugation) that matches the noun class (which is 2) of the noun that plays the subject in the sentence (abashana), the ya denotes a continuous action (‘are doing something’ in the present), the zi is the object concord for the noun class (8) of the noun that plays the object in the sentence (izimpahla), theng is the verb root, then come the CARP extensions: is, the causative (turning ‘buy’ into ‘sell’), el, the applicative, and an, the reciprocal, which take care of the ‘to each other’, and then finally the final vowel a.

More precisely, the general basic structure of the verb is as follows:

where NEG is the negative; SC the subject concord; T/A denotes tense/aspect; MOD the mood; OC the object concord; Verb Rad the verb radical; C the causative; A the applicative; R the reciprocal; and P the passive. For instance, if the children were not selling the clothes to each other, then instead of the SC, there would be the NEG SC in that position, making the verb abayazithengiselana.
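The slot structure described above, filled with the example verb, can be sketched as a fixed sequence of optional positions. This is a minimal illustration only: the slot order simply follows the order in which the abbreviations are listed in the text, with the final vowel (FV) from the example appended, and the names `SLOTS` and `assemble` are hypothetical. It does not encode which combinations are grammatical; that is what the CFG in the paper is for.

```python
# Hypothetical sketch: the verb as ordered slots, filled with the morphemes
# of the example. Slot order follows the list in the text (plus FV); this is
# NOT the grammar from the paper and does no grammaticality checking.

SLOTS = ["NEG", "SC", "T/A", "MOD", "OC", "VR", "C", "A", "R", "P", "FV"]


def assemble(morphemes: dict[str, str], sep: str = "") -> str:
    """Concatenate the filled slots in their fixed order, skipping empty ones."""
    return sep.join(morphemes[s] for s in SLOTS if morphemes.get(s))


example = {
    "SC": "ba",     # subject concord, noun class 2 (abashana)
    "T/A": "ya",    # present tense, continuous
    "OC": "zi",     # object concord, noun class 8 (izimpahla)
    "VR": "theng",  # verb root 'buy'
    "C": "is",      # causative: 'buy' -> 'sell'
    "A": "el",      # applicative
    "R": "an",      # reciprocal
    "FV": "a",      # final vowel
}

print(assemble(example))       # bayazithengiselana
print(assemble(example, "-"))  # ba-ya-zi-theng-is-el-an-a
```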

To make sense of all this in a way that is amenable to computation, we (my co-author Langa Khumalo and I) specified the grammar of the complex verb for the present tense in a CFG, using an incremental process of development. To the best of our (and the reviewer’s) knowledge, the outcome of the lengthy exercise is (1) the first comprehensive and precisely formulated documentation of the grammar rules for the isiZulu verb in the present tense, (2) all together in one place (cf. fragments sprinkled around in different papers, Wikipedia, and outdated literature (Doke in 1927 and 1935)), and (3) it goes well beyond handling just one of the CARP extensions, among others. The figure below summarises those rules, which are explained in detail in the forthcoming paper “Grammar rules for the isiZulu complex verb”, which will be published in Southern African Linguistics and Applied Language Studies [1] (finally in print, yay!).

It is one thing to write these rules down on paper, and another to verify whether they’re actually doing what they’re supposed to be doing. Instead of fallible and laborious manual checking, we put them in JFLAP (for lack of a better alternative at the time; discussed in the paper) and tested the CFG on both generation and recognition. The tests went reasonably well, and they helped fix a rule during the testing phase.
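To give an idea of what testing a CFG on generation and recognition involves, here is a sketch with a deliberately tiny, hypothetical grammar fragment; it is not the grammar from the paper, and the simple set-based top-down recogniser is our own stand-in, not what JFLAP does internally. Generation enumerates every string the grammar derives; recognition checks whether a given string can be derived.

```python
# A deliberately tiny, hypothetical CFG fragment for illustration only; it is
# NOT the grammar from the paper. Nonterminals map to lists of alternatives;
# any symbol that is not a key is a terminal (a morpheme string).
GRAMMAR = {
    "Verb": [["SC", "TA", "OC", "VR", "C", "A", "R", "FV"]],
    "SC": [["ba"], ["si"]],  # subject concords (class 2; 1st person plural)
    "TA": [["ya"], []],      # optional present-tense marker
    "OC": [["zi"], []],      # optional object concord (class 8)
    "VR": [["theng"]],       # verb root 'buy'
    "C": [["is"], []],       # optional causative
    "A": [["el"], []],       # optional applicative
    "R": [["an"], []],       # optional reciprocal
    "FV": [["a"]],           # final vowel
}


def generate(symbol: str = "Verb") -> set[str]:
    """Generation: enumerate every string the (non-recursive) grammar derives."""
    if symbol not in GRAMMAR:
        return {symbol}
    results: set[str] = set()
    for rhs in GRAMMAR[symbol]:
        partial = {""}
        for sym in rhs:
            partial = {p + s for p in partial for s in generate(sym)}
        results |= partial
    return results


def _match(symbol: str, text: str, pos: int) -> set[int]:
    """Return every position where a derivation of `symbol` starting at `pos` can end."""
    if symbol not in GRAMMAR:
        return {pos + len(symbol)} if text.startswith(symbol, pos) else set()
    ends: set[int] = set()
    for rhs in GRAMMAR[symbol]:
        positions = {pos}
        for sym in rhs:
            positions = {e for p in positions for e in _match(sym, text, p)}
        ends |= positions
    return ends


def recognise(text: str) -> bool:
    """Recognition: accept iff some derivation of Verb consumes the whole string."""
    return len(text) in _match("Verb", text, 0)
```

Here `recognise("bayazithengiselana")` is true and `recognise("yabazithenga")` is false. Like the CFG discussed below, this fragment does no phonological conditioning and knows nothing about verb types, so on a realistic lexicon it would overgenerate.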

Because the CFG doesn’t take into account phonological conditioning of the vowels, it generates strings that are not in the language. Such phonological conditioning is considered to be a post-processing step and was beyond the scope of elucidating and specifying the rules themselves. There are other causes of overgeneration that we did not get around to addressing, for various reasons: there are rules that go across the verb root, which are simple to represent in coding-style notation (see paper) but not so much in a CFG, and there are rules for different types of verbs, but there’s no available resource that lists which verb roots are intransitive, which are monosyllabic, and so on. We have started with scoping rules and solving issues for the latter, and do have a subset of phonological conditioning rules; so, to be continued… For now, though, we have completed at least one of the milestones.

Last, but not least, in case you wonder what the use of all this is, besides satisfying one’s linguistic curiosity and investigating and documenting an under-resourced language: natural language generation for intelligent user interfaces in localised software, spellcheckers, and grammar checkers, among others.

 

References

[1] Keet, C.M., Khumalo, L. Grammar rules for the isiZulu complex verb. Southern African Linguistics and Applied Language Studies, (in print). Submitted version (the rules are the same as in the final version)

Reblogging 2010: South African women on leadership in science, technology and innovation

From the “10 years of keetblog – reblogging: 2010”: while the post’s data are from 5 years ago, there’s still room for improvement. That said, it’s not nearly as bad as in some other countries, like the Netherlands (though the university near my home town improved from 1.6% to 5% women professors over the past 5 years). As for the places I worked post-PhD, the percentage of female academics with a full-time permanent contract: FUB-KRDB group: 0% (still now), UKZN-CS-Westville: 12.5% (me; 0% now), UCT-CS: 42%.

South African Women on leadership in science, technology and innovation; August 13, 2010

 

Today I participated in the Annual NACI symposium on the leadership roles of women in science, technology and innovation in Pretoria, organized by the National Advisory Council on Innovation (NACI); I report on it further below. As preparation for the symposium, I searched a bit to consult the latest statistics and to see if there are any ‘hot topics’ or ‘new approaches’ to improve the situation.

General statistics and their (limited) analyses

The Netherlands used to be at the bottom end of the country league tables on women professors (from my time as elected representative in the university council at Wageningen University, I remember a UN table from ’94 or ’95 where the Netherlands was third last of all countries). It has not improved much over the years. From Myklebust’s news item [1], I traced the statistics to the Monitor Women Professors 2009 [2] (carried out by SoFoKleS, the Dutch social fund for the knowledge sector): less than 12% of the full professors in the Netherlands are women, with the Universities of Leiden, Amsterdam, and Nijmegen leading the national league table and the testosterone bastion Eindhoven University of Technology closing the ranks with a mere 1.6% (2 out of 127 professors are women). With the baby boom generation having clogged the pipeline for a while, the average percentage increase has been about 0.5% a year: way too low to come even near the EU Lisbon Agreement Recommendation’s target of 25% by 2010, or even the Dutch target of 15%. But this large cohort will retire soon and, in the words of the report authors, that makes for a golden opportunity to move toward gender equality more quickly. The report also came up with a “Glass Ceiling Index” (GCI: the percentage of women in job category X-1 divided by the percentage of women in job category X) and, implicitly, an “elevator” index for men in academia. In addition to the hard data backing up the claim that the pipeline is leaking at all stages, they note that it varies greatly across disciplines (see Table 6.3 of the report): in science, the most severe blockage is from PhD to assistant professor; in Agriculture, Technology, Economics, and Social Sciences, it is the step from assistant to associate professor; and for Law, Language & Culture, and ‘miscellaneous’, the biggest hurdle is from associate to full professor. Of all GCIs, the highest (2.7) is in Technology in the promotion from assistant to associate professor, whereas there is almost parity at that stage in Language & Culture (GCI of 1.1, the lowest value anywhere in Table 6.3).
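Why 0.5% a year is “way too low” becomes clear with a back-of-the-envelope projection. This sketch rests on assumptions that are ours, not the report’s: the 0.5% is read as 0.5 percentage points per year, the trend is taken to be linear, and the starting share is rounded to 12%. The function name is hypothetical.

```python
# Back-of-the-envelope projection, assuming (our reading, not the report's
# model) a constant linear rise of 0.5 percentage points per year from a
# starting share of 12% women full professors.


def years_to_reach(start_pct: float, target_pct: float, rise_pp_per_year: float) -> float:
    """Years needed for a linear trend to go from start_pct to target_pct."""
    if rise_pp_per_year <= 0:
        raise ValueError("the trend must be rising to ever reach the target")
    return max(0.0, (target_pct - start_pct) / rise_pp_per_year)


dutch_target = years_to_reach(12, 15, 0.5)   # Dutch target of 15%
lisbon_target = years_to_reach(12, 25, 0.5)  # EU Lisbon target of 25%
print(dutch_target, lisbon_target)           # 6.0 26.0
```

Since the actual share was below 12%, these are lower bounds: at that pace, even the Dutch 15% target is some six years away, and the 25% Lisbon target, set for 2010, would take more than two decades.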

“When you’re left out of the club, you know it. When you’re in the club, you don’t see what the problem is.” Prof. Jacqui True, University of Auckland [4]

Elsewhere in ‘the West’, statistics can look better (see, e.g., the American Association of University Professors (AAUP) survey on women 2004-05), or are not great either (UK, see [3], but the numbers are a bit outdated). However, one can wonder about the meaning of such statistics. Take, for instance, the NYT article on a poll about paper rights vs. realities carried out by the Pew Research Center in 22 countries [4]: in France, virtually 100% paid lip service to being in favour of equal rights, yet 75% also said that men had a better life. It is only in Mexico (56%), Indonesia (55%) and Russia (52%) that the people surveyed said that women and men have achieved a comparable quality of life. But note that the latter statement is not the same as gender equality. And equal rights and opportunities by law do not automatically imply that the operational structures are non-discriminatory and an adequate reflection of the composition of society.

A table that has generated much attention and many questions over the years, but, as far as I know, no conclusive answers, is the one published in Science Magazine [5] (see figure below). Why are there relatively many more women physics professors in countries like Hungary, Portugal, the Philippines and Italy than in, say, Japan, the USA, the UK, and Germany? Recent guesses at the answer (see blog comments) are as varied as the anecdotes mentioned in the paper.

Physics professors in several countries (Source: 5).

Barinaga’s [5] collection of anecdotes about influential factors across cultures includes: a country’s level of economic development (longer-established science propagates the highly patriarchal society of previous centuries), the status of science there (e.g., low and ‘therefore’ open to women), class structure (pecking order: rich men, rich women, poor men, poor women vs. the gender structure rich men, poor men, rich women, poor women), the educational system (science and mathematics as compulsory subjects at school, all-girls schools), and the presence or absence of support systems for combining work and family life (integrated society and/or socialist vs. ‘Protestant work ethic’), but the anecdotes “cannot purport to support any particular conclusion or viewpoint”. It also notes that “Social attitudes and policies toward child care, flexible work schedules, and the role of men in families dramatically color women’s experiences in science”. More details on statistics of women in science in Latin America can be found in [6] and [7]; they look a lot better than those of Europe.

Barbie the computer engineer

Bonder, in her analysis for Latin America [7], has an interesting table (cuadro 4) on the changing landscape of efforts to improve the situation: data is one thing, but how to go about it, and which approaches, advertisements, and policies have been, can, or should be used to increase women’s participation in science and technology? Her list is certainly more enlightening than the lame “We need more TV shows with women forensic and other scientists. We need female doctor and scientist dolls.” (says Lotte Bailyn, a professor at MIT) or “Across the developed world, academia and industry are trying, together or individually, to lure women into technical professions with mentoring programs, science camps and child care.” [8], both of which only very partially address the issues described in [5]. Bonder notes shifts in approaches: from focusing only on women/girls to both sexes, from change in attitude to change in structure, from change of women (taking men as the norm) to change in power structures, from focusing on formal opportunities to changing the real opportunities in discriminatory structures, from making non-traditional role models visible to making the values, interests, and perspectives of women visible, and from the simplistic gender dimension to the broader articulation of gender with race, class, and ethnicity.

The NACI symposium

The organizers of the Annual NACI symposium on the leadership roles of women in science, technology and innovation provided several flyers and booklets with data about women and men in academia and industry, so let us start with those. Page 24 of Facing the facts: Women’s participation in Science, Engineering and Technology [9] shows the share of women by occupation: 19% of full professors, 30% of associate professors, 40% of senior lecturers, 51% of lecturers, and 56% of junior lecturers, with a race distribution of 19% African, 7% Coloured, 4% Indian, and 70% White. The high percentage of women’s participation (compared to, say, the Netherlands, as mentioned above) is somewhat overshadowed by the statistics on research output among South African women (p29, p31): female publishing scientists are just over 30%, and women contributed only 25% of all article outputs. That low percentage clearly has to do with the lopsided distribution of women at the lower end of the scale, with many junior lecturers who conduct much less research because they have a disproportionately heavy teaching load (a recurring topic during the breakout session). Concerning the distribution of grant holders in 2005, in the Natural & agricultural sciences, about 24% of the total grants (211 out of 872) were awarded to women, and in engineering & technology it is 11% (24 out of 209 grants) (p38). However, in Natural & agricultural sciences, women make up 19%, and in engineering and technology 3%, which, taken together with the grant percentages, shows that a disproportionately large share of grants went to women in recent years. This suggests that the women who do make it to the advanced research stage are at least as good as, if not better than, their male counterparts. Last year, women researchers (PIs) received more than half of the grants and more than half of the available funds (table in the ppt presentation of Maharaj, which will be made available online soon).
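The arithmetic behind this paragraph can be sketched as follows, together with the Glass Ceiling Index from the Dutch monitor discussed earlier applied to these South African ranks. This is an illustration under assumptions: applying the GCI here is our own cross-check (the booklet does not compute one), the rank order from junior lecturer up to full professor is ours, the 19% and 3% are read as women’s share of the respective fields, and the function names are hypothetical.

```python
# Illustrative arithmetic on the figures quoted from 'Facing the facts' [9].
# Applying the Dutch report's Glass Ceiling Index to these South African
# ranks is our own illustration, not something the booklet does.

# Percentage of women per academic rank, ordered from junior to senior.
women_pct = {
    "junior lecturer": 56,
    "lecturer": 51,
    "senior lecturer": 40,
    "associate professor": 30,
    "full professor": 19,
}


def glass_ceiling_indices(pct_by_rank: dict[str, float]) -> dict[str, float]:
    """GCI for rank X = % women in rank X-1 divided by % women in rank X."""
    ranks = list(pct_by_rank)  # dicts keep insertion order (junior -> senior)
    return {
        ranks[i]: pct_by_rank[ranks[i - 1]] / pct_by_rank[ranks[i]]
        for i in range(1, len(ranks))
    }


def representation_ratio(grants_women: int, grants_total: int, staff_pct: float) -> float:
    """Women's share of grants divided by their share of the field (>1: over-represented)."""
    return (100 * grants_women / grants_total) / staff_pct


gci = glass_ceiling_indices(women_pct)
nat_agri = representation_ratio(211, 872, 19)  # natural & agricultural sciences
eng_tech = representation_ratio(24, 209, 3)    # engineering & technology
```

With these numbers, the index rises at each step (about 1.1, 1.3, 1.3, and 1.6), so the steepest drop is into full professorship, and the grant ratios come out at roughly 1.3 and 3.8, which is the disproportion the paragraph refers to.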

Mrs Naledi Pandor, the Minister for Science and Technology, gave the opening speech of the event, which was a good and entertaining presentation. She talked about the lack of qualified PhD supervisors needed to open more PhD positions; more PhD positions are desired so as to move to the post-industrial, knowledge-based economy, which, in theory at least, should make it easier for women to participate than an industrial economy does. She also mentioned that one should not look at just the numbers, but instead at the institutional landscape, so as to increase opportunities for women. Last, she summarized the “principles and good practice guidelines for enhancing the participation of women in the SET sector”, which are threefold: (1) sectoral policy guidelines, such as gender mainstreaming, transparent recruiting policies, and health and safety at the workplace, (2) workplace guidelines, such as flexible working arrangements, remuneration equality, mentoring, and improving communication lines, and (3) re-entry into the Science, Engineering and Technology (SET) environment, such as catch-up courses, financing fellowships, and remaining in contact during a career break.

Dr. Thema, former director of international cooperation at the Department of Science and Technology, raised the issues of the excessive focus on administrative practicalities, the apartheid legacy, and frozen demographics, and noted that where there is no women’s empowerment, this is in violation of the constitution. My apologies if I have written her name and details wrongly: she was a last-minute replacement for Prof. Immaculada Garcia Fernández, department of computer science at the University of Malaga, Spain. Garcia Fernández did make her slides available; they focused on international perspectives on women’s leadership in STI. Among many points, she notes that the working conditions for researchers “should aim to provide… both women and men researchers to combine work and family, children and career” and “Particular attention should be paid, to flexible working hours, part-time working, tele-working and sabbatical leave, as well as to the necessary financial and administrative provisions governing such arrangements”. She poses the question “The choice between family and profession, is that a gender issue?”

Dr. Romilla Maharaj, executive director for human and institutional capacity development at the National Research Foundation, presented much data from the same booklet I mentioned in the first paragraph, but little qualitative analysis of this data (there is some qualitative information). She wants to move from the notion of “incentives” for women to “compensation”. The aim is to increase the number of PhDs five-fold by 2018 (currently the rate is about 1200 each year), which is not going to be easy (recall the comment by the Minister, above). Concerning policies targeted at women’s participation, they appear to be successful for white women only (in postdoc bursaries, white women even outnumber white men). In my opinion, this smells more of a class/race structure issue than a gender issue, as mentioned above and in [5]. Last, according to Maharaj, the focus should be on institutional improvements. However, during the break-out session in the afternoon, which she chaired, she seemed to be selectively deaf on this issue. The problem statement for the discussion was the low research output by women scientists compared to men, and how to resolve that. Many participants reiterated the lack of research time due to the disproportionately heavy teaching load (compared to men) and what is known as ‘death by committee’, and the disproportionate number of (junior) lecturers who are counted in the statistics as scientists but, in practice, do little or no research, thereby pulling down the overall statistics for women’s research output. Another participant wanted to see a further breakdown of the numbers by age group, as the suspicion was that it is old white men who produce most papers (who teach less, have more funds, supervise more PhD students etc.) (UPDATE 13-10-’10: I found some data that seems to support this).
In addition, someone pointed out that counting publications is one thing, but considering their impact (by citations) is another, for which no data was available, so a recommendation was made to investigate this further as well (and to set up a gender research institute, which apparently does not yet exist in South Africa). The pay-per-publication scheme implemented at some universities could thus backfire for women (who require the time and funds to do research in the first place so as to have at least a chance to publish good papers). Maharaj’s own summary of the break-out session was an “I see, you want more funds”, but that does not fully square with the institutional change she mentioned earlier, nor with the multi-faceted problems raised during the break-out session that did reveal institutional hurdles.

Prof. Catherine Odora Hoppers, DST/NRF South African Research Chair in Development Education (among many things), gave an excellent speech with thought-provoking statements (or: calling a spade a spade). She noted that going into SET means entering an arena of bad practice and intolerance; to fix that, one first has to understand how bad culture reproduces itself. The problem is not the access, she said, but the terms and conditions. In addition, and as several other speakers had already alluded to, she noted that one has to deal with the ghosts of the past. She put this in the wider context of the history of science and the value system it propagates (Francis Bacon; my one-line summary of the lengthy quote: science as a means to conquer nature so that man can master and control it), and the ethics of SET: SET has had, and still has, some dark outcomes, for which she used the examples of the atom bomb, gas chambers, how SET was abused by the white male belittling the native and has been used against the majority of people in South Africa, and climate change. She sees the need for a “broader SET”, meaning ethical and (in my shorthand notation) with social responsibility and sustainability as essential components. She is putting this into practice by stimulating transdisciplinary research in her research group, where, at least as a first step, people from different disciplines must be able to talk to and understand each other.

To me, as an outsider, it was very interesting to hear what the current state of affairs is regarding women in SET in South Africa. While there were complaints, there were also suggestions for solutions, and it was clear from the available data that some improvements have been made over the years, albeit only in certain pockets. More people registered for the symposium than there were places available, and with some 120 attendees from academia and industry at all stages of their respective career paths, it was a stimulating mix of input that I hope will further improve the situation on the ground.

References

[1] Jan Petter Myklebust. THE NETHERLANDS: Too few women are professors. University World News, 17 January 2010, Issue: 107.

[2] Marinel Gerritsen, Thea Verdonk, and Akke Visser. Monitor Women Professors 2009. SoFoKleS, September 2009.

[3] Helen Hague. 9.2% of professors are women. Times Higher Education, May 28, 1999.

[4] Victoria Shannon. Equal rights for women? Survey says: yes, but…. New York Times/International Herald Tribune—The female factor, June 30, 2010.

[5] Marcia Barinaga. Overview: Surprises Across the Cultural Divide. Compiled in: Comparisons across cultures. Women in science 1994. Science, 11 March 1994 263: 1467-1496 [DOI: 10.1126/science.8128232]

[6] Beverley A. Carlson. Mujeres en la estadística: la profesión habla. Red de Reestructuración y Competitividad, CEPAL – SERIE Desarrollo productivo, nr 89. Santiago de Chile, Noviembre 2000.

[7] Gloria Bonder. Mujer y Educación en América Latina: hacia la igualdad de oportunidades. Revista Iberoamericana de Educación, Número 6: Género y Educación, Septiembre – Diciembre 1994.

[8] Katrin Bennhold. Risk and Opportunity for Women in 21st Century. New York Times/International Herald Tribune—The female factor, March 5, 2010.

[9] Anon. Facing the facts: Women’s participation in Science, Engineering and Technology. National Advisory Council on Innovation, August 2009.

Quasi wordles of isiZulu online newspaper articles from this weekend

Every now and then, I get side-tracked from what I was (supposed to be) doing. This time, it was the result of a combination of preparing ICPC training problems, preparing a statistics tutorial for the postgraduate research methods course, and a conversation last week on an isiZulu corpus with Langa Khumalo from UKZN’s ULPDO (and my co-author on several papers on isiZulu CNLs). To make a long story short, I ended up sourcing some online news articles in isiZulu and writing a little python script to count the words and the top-k words of the news articles, to get a feel for what the most prevalent topics of the articles were.

 

Materials and data

10 Isolezwe articles, listed on the front page on August 8, 2015 (articles were from Aug 6 and 7; no updates over the long weekend)

10 News24 in isiZulu articles, listed on the front page on August 8, 2015 (articles were from Aug 8)

10 News24 in isiZulu articles, listed on the front page on August 9, 2015 (articles were from Aug 9, a Sunday, and Women’s Day in South Africa)

A simple script, basicCorpusStats.py, that one can already write after going through the first part of ThinkPython (in case you’re unfamiliar with python).

Note: ilanga doesn’t have articles online, and therefore was not included.

Note 2: for copyright issues, I probably cannot share the txt files online, but in case you’re interested, just ask me and I’ll email them.
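The post does not show basicCorpusStats.py itself, so what follows is only a minimal sketch of the kind of counting it describes: average words per article and the top-k words. It assumes plain-text articles, whitespace tokenisation, and a hypothetical NOISE set holding the tokens that the post says were removed from the News24 results (“–”, udaba, olunye); the function names are my own.

```python
import string
import sys
from collections import Counter
from pathlib import Path

# Tokens dropped before ranking: the stray en dash and the standard-text
# words named in the post (an assumed, extendable list).
NOISE = {"–", "udaba", "olunye"}
# ASCII punctuation plus curly quotes; the en dash is deliberately not
# included, so it survives tokenisation and is filtered via NOISE instead.
STRIP = string.punctuation + "“”‘’"


def tokenize(text):
    """Split on whitespace, trim surrounding punctuation, lower-case."""
    tokens = (t.strip(STRIP).lower() for t in text.split())
    return [t for t in tokens if t]


def corpus_stats(articles, k=20):
    """Return (average tokens per article, top-k (word, count) pairs).

    The average counts every token; the ranking skips NOISE tokens."""
    token_lists = [tokenize(a) for a in articles]
    avg = sum(len(toks) for toks in token_lists) / len(token_lists)
    counts = Counter(t for toks in token_lists for t in toks if t not in NOISE)
    return avg, counts.most_common(k)


if __name__ == "__main__":
    # Usage of this sketch: python corpus_stats_sketch.py a1.txt a2.txt ...
    texts = [Path(p).read_text(encoding="utf-8") for p in sys.argv[1:]]
    if texts:
        avg, top = corpus_stats(texts)
        print(f"{avg:.0f} words/article")
        for word, n in top:
            print(word, n)
```

Whitespace splitting keeps agglutinated forms such as lwamaphoyisa as separate words, which is exactly why the stem-level counts discussed further on differ from the top-20 lists.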

 

Some general stats

Isolezwe had, on average, 265 words/article, whereas News24 had about half of that (110 and 134 on Saturday and Sunday, respectively). The top-20 of each is listed at the end of this post (the raw results of News24 had “–” removed [bug], as well as udaba and olunye [standard-text noise from the articles]).

Comparing them on the August 8 offering, Isolezwe had people saying this, that, and the other (ukuthi ‘saying/to say’ had the highest frequency, 60) and then the police (amaphoyisa, n=27), whereas News24 had amaphoyisa as its most frequent word (n=19), then abasolwa (‘suspects’, n=11), which doesn’t even appear in Isolezwe’s top-20 most frequent words (though the stem –solwa appears 9 times there). The police are problematic in South Africa (they commit crimes and other dubious behaviour under investigation, e.g., Marikana), more police officers get killed than in many other countries (another one last week), and crime happens. But not on a public holiday, apparently: News24 had only one –phoyisa on Aug 9.

While I hoped to find a high incidence of women, as August 9 was Women’s Day, no word with the stem –fazi (‘woman’) appeared in the News24 mini-corpus of 1353 words of the 10 front page articles; instead, there was a lot of saying this, that, and the other (ukuthi had the highest frequency, 37), and little on suspects or blaming (–solwa, n=3).

 

On that quasi wordle

While ukuthi is the infinitive, there are a gazillion conjugations and affixes agglutinated to it, and how it all works is barely clear even to linguists, so I did not analyse that further. Amaphoyisa, on the other hand, as a noun (plural of ‘police’), has fewer variations. In the Isolezwe mini-corpus, –phoyis– (the root of ‘police’) appeared 47 times, including variants like lwamaphoyisa, ngamaphoyisa, and yiphoyisa, i.e., substantially more than the 27 occurrences of amaphoyisa. If I were to create a wordle, those variants would be missed unless one uses a stemmer, which doesn’t happen to be available[1], and I didn’t write one (I just used a regex on the txt files). By the same token, News24’s mentions of the police on August 8 go up to 28 with –phoyisa, with the blaming and suspects (–solwa, n=27) as a close second.
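The regex route can be sketched as follows: count every word that contains a given root and keep the per-variant breakdown. This is a minimal sketch, not the script used for the post, and the function name is hypothetical. A bare substring match is no stemmer: nothing stops the root from occurring inside an unrelated word, so the totals can over-count.

```python
import re
from collections import Counter


def root_counts(text, root):
    """Count words containing `root` (e.g. 'phoyis'), case-insensitively.

    Returns (total, Counter of the matching word forms), so that
    amaphoyisa, lwamaphoyisa, yiphoyisa, ... are tallied together."""
    # \w* on both sides extends the match to the whole surrounding word.
    pattern = re.compile(r"\w*" + re.escape(root) + r"\w*", re.IGNORECASE)
    variants = Counter(m.group(0).lower() for m in pattern.finditer(text))
    return sum(variants.values()), variants
```

For example, `root_counts(text, "mali")` returns the total for –mali together with a Counter whose `most_common()` lists forms such as kwemali and yimali with their frequencies.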

The lack of a stemmer also means missing out on all sorts of variations on imali (‘money’, n=11) in the Isolezwe articles, whereas its stem –mali pops up 29 times, due to, among others, kwemali (n=5), mali (n=3), yimali (y- functioning as copulative in that sentence, n=1), and ngezimali (n=1). Likewise for person/people (–ntu, n=17), whose occurrences are distributed among abantu (plural), umuntu (singular), and nabantu (‘and people’), among others.

Last, the second most frequently used word in News24 on August 9 was njengoba (‘as’, ‘whereas’, ‘since’), primarily due to the first article on the sports results of the matches played.

So, with all that background knowledge, Isolezwe’s wordle would be, in descending order (and in English for the readers of this blog): say, police, money, people. News24 on August 8: police, suspect/blame, say (two variations, n=9 each). News24 on August 9: say, as/since (and then some other adverbs).
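Turning stem-level counts into such a ranking is a small aggregation step. A minimal sketch, assuming a hand-made, hypothetical mapping from stems to English glosses and a list of already tokenised words; a token is credited to every stem it contains, which is crude (e.g., an unrelated word that happens to contain ntu would be counted as ‘people’) but mirrors the regex approach.

```python
from collections import Counter


def gloss_ranking(tokens, glosses):
    """Rank English glosses by how many tokens contain the matching stem.

    `glosses` maps a stem substring (e.g. 'phoyis') to a gloss ('police')."""
    totals = Counter()
    for token in tokens:
        for stem, gloss in glosses.items():
            if stem in token.lower():
                totals[gloss] += 1
    return totals.most_common()


# Hypothetical stem-to-gloss mapping for the stems discussed in the post.
GLOSSES = {"phoyis": "police", "mali": "money", "ntu": "people", "solwa": "suspect/blame"}
```

The result is a list of (gloss, count) pairs in descending order, i.e., the input one would feed to a word-cloud tool instead of raw word counts.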

 

In closing

This dabbling raised more problems and questions than it answered. But, for now, it’s at least a bit of a peek into the kitchen of news in a language that I don’t master as well as I want to and should. It wasn’t useful for either the ICPC problem setting or the stats tutorial, nor is a 5123-word corpus of much use, but it was fun with python at least, it satisfied a little of my curiosity, and perhaps it spurs someone to do all this properly, more systematically, and on a grander scale. For the isiZulu speakers: it’s surely still up to you to read whichever news outlet you prefer.

 

 

References

[1] Pretorius, L., Bosch, S.E. (2010). Finite-state morphology of the Nguni language cluster: modelling and implementation issues. In Yli-Jyrä, A., Kornai, A., Sakarovitch, J., Watson, B. (Eds.), Finite-State Methods and Natural Language Processing, 8th International Workshop, FSMNLP 2009. Lecture Notes in Computer Science, Vol. 6062, pp. 123–130.

[2] Spiegler, S., van der Spuy, A., Flach, P. A. (2010). Ukwabelana: an open-source morphological Zulu corpus. In Proceedings of the 23rd International Conference on Computational Linguistics (COLING’10), pp. 1020–1028. Association for Computational Linguistics, Beijing.

 

 

Top-20 words, with frequencies
Isolezwe, Aug 8 | News24, Aug 8 | News24, Aug 9
ukuthi 60 | amaphoyisa 19 | ukuthi 37
amaphoyisa 27 | abasolwa 11 | njengoba 13
uthe 17 | uthe 9 | ngemuva 12
ngoba 17 | ukuthi 9 | ngesikhathi 8
lokhu 16 | ubudala 9 | lo 8
kuthiwa 16 | njengoba 9 | uthe 7
kusho 16 | kusho 9 | uma 7
kodwa 12 | lo 8 | futhi 7
imali 11 | oneminyaka 7 | uzakwe 6
uma 10 | ngokuthakatha 7 | usnethemba 6
ngesikhathi 10 | ngokusho 7 | kodwa 6
yakhe 9 | omphakathi 6 | johannesburg 6
ukuba 9 | okhulumela 6 | yakhe 5
njengoba 9 | kanti 6 | united 5
nje 9 | endaweni 6 | ukuba 5
lo 9 | yohlobo 5 | ukomphela 5
khona 9 | ngesikhathi 5 | ufaku 5
abantu 9 | ngemuva 5 | ubudala 5
umphakathi 8 | le 5 | rhythms 5
umnuz 8 | imoto 5 | ngokubika 5

 

 

 

[1] There is some material on that (among others, [1,2]), but it’s mostly theoretical or very much proof-of-concept, rather than easily reusable tools like those for English, and the example rule in [1] isn’t right (it’s umfana, not umufana; the longer prefix with the extra –u– is used when the stem is one syllable, like –ntu -> umuntu).
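The rule stated in that footnote is simple enough to write down. A minimal sketch for noun class 1 only, assuming that counting vowels approximates counting syllables (isiZulu syllables typically end in a vowel); it ignores the other noun classes, subclass 1a, and vowel-initial stems, so it illustrates the rule rather than being a morphological analyser. The function name is hypothetical.

```python
VOWELS = set("aeiou")


def class1_noun(stem):
    """Prefix a noun stem with the class 1 augment+prefix (simplified).

    umu- before a one-syllable stem (-ntu -> umuntu), um- otherwise
    (-fana -> umfana). Syllables are approximated by counting vowels."""
    stem = stem.lstrip("-–")  # accept '-ntu' as well as 'ntu'
    syllables = sum(1 for ch in stem.lower() if ch in VOWELS)
    return ("umu" if syllables == 1 else "um") + stem
```

With this rule, `class1_noun("-fana")` gives umfana rather than the umufana of the example in [1].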

An orchestration of ontologies for linguistic knowledge

Starting from multilingual knowledge representation in ontologies, and with an eye on linguistic linked data and controlled natural languages, we had developed a basic ontology for the Bantu noun class system [1] to link with the lemon model [2]. The noun class system is akin to grammatical gender in, e.g., German and Italian, but somewhat different. It is based on the semantics of the nouns, and each Bantu language has some 12-23 noun classes. For instance, noun classes 1 and 2 are for singular and plural humans, 9 and 10 for animals (singular and plural, respectively), 11 for inanimates and long thin objects (e.g., a telephone cable), and class 14 has abstract nouns (e.g., beauty). Each class has its own augment or augment+prefix to be added to the stem. None of the other linguistic resources, such as ISOcat or the GOLD ontology, dealt with them, so neither did lemon, but we needed it. The first version of the ontology, which we introduced in [1], had its limitations, but it mostly did its job. Mostly, but not fully.

Lemon needs that morphology module, and then some more for the rules. The ontology did not fully cater for Bantu languages other than Chichewa and isiZulu: having been built with knowledge of those two only, it was more akin to a merged conceptual data model tailored to the two specific languages. Also, it wasn’t aligned to other models or ontologies, thus hampering interoperability and reuse. We didn’t have any competency questions or cool inferences either, because our scope then was just to annotate the names of the classes in an ontology. Hence, it was time for an improvement.

Among others, we don’t want just to annotate but, given that Bantu languages are under-resourced, also to see what we can add to derive implicit information, which could help with tagging terms. For instance:

  • if you know abantu is a plural and in noun class 2 and umuntu is the singular of it, then umuntu is in noun class 1, or
  • when it is declared that inja is in noun class 9, then so is its stem -ja (or vice versa), or
  • language-specific: which singular (plural) noun class goes with which plural (singular) noun class. While the majority neatly pair successive odd and even numbers (1-2, 3-4, 5-6 etc.), this is not always the case; e.g., in isiZulu, noun class 11 does not have noun class 12 as plural, but noun class 10 (which has its own augment and prefix).
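In the ontology, such inferences are left to an OWL reasoner. Purely to illustrate the first two rules, here is a plain-Python forward-chaining sketch over a hypothetical fact base. The pairing table is partial and illustrative; it encodes the third point by listing both 9 and 11 as singular partners of isiZulu class 10, so a class-10 plural alone does not fix the class of its singular. The stem rule is applied only to explicitly declared noun-stem pairs, since a stem such as -ntu occurs in both umuntu (class 1) and abantu (class 2).

```python
# Partial, illustrative isiZulu pairing: plural class -> possible singular
# classes. Class 10 is the plural of both 9 and 11, per the post.
PLURAL_TO_SINGULAR = {2: {1}, 4: {3}, 8: {7}, 10: {9, 11}}


def infer_classes(noun_class, singular_of, stem_of, pairs=PLURAL_TO_SINGULAR):
    """Forward-chain two rules until no new noun class is derived.

    R1: plural p is in class c, s is the singular of p, and c has exactly one
        singular partner in `pairs` -> s is in that partner class.
    R2: a noun and its declared stem share their noun class (both ways)."""
    classes = dict(noun_class)  # copy, so the input facts are not modified
    changed = True
    while changed:
        changed = False
        for plural, singular in singular_of.items():
            options = pairs.get(classes.get(plural), set())
            if len(options) == 1 and singular not in classes:
                classes[singular] = next(iter(options))
                changed = True
        for noun, stem in stem_of.items():
            for known, unknown in ((noun, stem), (stem, noun)):
                if known in classes and unknown not in classes:
                    classes[unknown] = classes[known]
                    changed = True
    return classes
```

With abantu in class 2 and umuntu declared as its singular, R1 derives class 1 for umuntu; with inja in class 9, R2 derives class 9 for -ja; a plural in class 10 leaves its singular undetermined.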

Then, besides the interoperability and reuse requirements, we needed to distinguish between language-specific axioms and those that hold across the language family. To solve all that, we developed a framework, reusing the pyramid structure idea from BioTop [3] and the so-called “double articulation principle” of DOGMA [4], where the language-specific axioms are at the level of DOGMA’s conceptual model, for they add specific constraints.
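The layering can be mimicked in a few lines, purely as an illustration of the idea and not of the actual OWL files: a family-level module declares the noun classes and their semantics (as listed earlier in this post), and a language-level module imports it and only adds constraints, such as isiZulu’s 11-10 pairing. All names are hypothetical; as with owl:imports, the language level extends rather than overrides the family level.

```python
# Family level (Bantu noun class system): classes and their semantics, as
# listed in the post; partial.
FAMILY_NCS = {
    1: "human, singular",
    2: "human, plural",
    9: "animal, singular",
    10: "animal, plural",
    11: "inanimates and long thin objects",
    14: "abstract nouns",
}

# Language level (isiZulu): (singular class, plural class) pairs; only the
# pairs mentioned in the post.
ISIZULU_PAIRS = {(1, 2), (9, 10), (11, 10)}


def build_language_module(family, pairs):
    """'Import' the family level and add language-specific pair axioms.

    Like owl:imports, this only adds knowledge; pairs that use a class the
    family level does not declare are rejected."""
    undeclared = {c for pair in pairs for c in pair if c not in family}
    if undeclared:
        raise ValueError(f"classes not declared at family level: {sorted(undeclared)}")
    return {"classes": dict(family), "pairs": frozenset(pairs)}


def plurals_of(module, singular):
    """Plural classes paired with `singular` in this language module."""
    return sorted(p for s, p in module["pairs"] if s == singular)


def singulars_of(module, plural):
    """Singular classes paired with `plural` in this language module."""
    return sorted(s for s, p in module["pairs"] if p == plural)
```

A Chichewa or isiXhosa module would reuse the same family level with its own set of pairs, which is the point of keeping the two levels apart.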

To make a long story short, the framework/orchestration applied to the linguistic knowledge of Bantu noun classes in general, and specific to some language, looks as follows:

framework applied to some linguistics ontologies (source: [5])


More details are described in the recently accepted paper “An orchestration framework for linguistic task ontologies” [5], to be presented at the 9th Metadata and Semantics Research Conference (MTSR’15), to be held from 9 to 11 September in Manchester, UK. My co-author Catherine Chavula will be attending MTSR’15 to present our paper, hoping/assuming that all those last-minute things, like the visa and the money actually being transferred to buy that plane ticket, will be sorted this month. (Odd ‘checks and balances’ that make life harder and more expensive for people outside of a visa-free zone and tied to a funding benefactor are a topic for some other time.)

The set of ontologies (in OWL) is available in NCS1.zip from my ontologies directory. It contains the goldModule (a module extracted from the GOLD ontology for general linguistics knowledge, which is aligned to the foundational ontology SUMO), the NCS ontology, and three language-specific axiomatizations for the noun classes, being Chichewa, isiXhosa, and isiZulu (more TBA). The same approach can be used for other linguistic features in other language groups or families; e.g., instead of the NCS, one could have knowledge represented about conjugation in the Romance languages (Italian, Spanish etc.), and then the more precise axiomatization (conceptual data model, if you will) for constraints unique to each language.

 

p.s.: Bantu languages is the term used in linguistics, so that’s why it’s used here. Elsewhere, they are also called African languages. They’re not synonymous, however: ‘African languages’ can designate any language spoken in Africa, including non-Bantu languages that may have a wholly different grammar; hence the distinction linguists make, to avoid misinterpretation.

 

References

[1] Chavula, C., Keet, C.M. Is Lemon Sufficient for Building Multilingual Ontologies for Bantu Languages? 11th OWL: Experiences and Directions Workshop (OWLED’14). Keet, C.M., Tamma, V. (Eds.). Riva del Garda, Italy, Oct 17-18, 2014. CEUR-WS vol. 1265, 61-72.

[2] McCrae, J., Aguado-de Cea, G., Buitelaar, P., Cimiano, P., Declerck, T., Gómez-Pérez, A., Gracia, J., Hollink, L., Montiel-Ponsoda, E., Spohr, D., Wunner, T.: Interchanging lexical resources on the Semantic Web. Language Resources & Evaluation, 2012, 46(4), 701-719

[3] Beißwanger, E., Schulz, S., Stenzhorn, H., Hahn, U.: BioTop: An upper domain ontology for the life sciences: A description of its current structure, contents and interfaces to OBO ontologies. Applied Ontology, 2008, 3(4), 205-212

[4] Jarrar, M., Meersman, R.: Ontology Engineering: The DOGMA Approach. In: Advances in Web Semantics I, LNCS, vol. 4891, pp. 7-34. Springer (2009)

[5] Chavula, C., Keet, C.M. An Orchestration Framework for Linguistic Task Ontologies. 9th Metadata and Semantics Research Conference (MTSR’15), Springer CCIS. 9-11 September, 2015, Manchester, UK. (in print)

Wikipedia + open access = not quite a revolution (not yet at least)

The title of the arxiv blog post sounded so catchy, and nudged my wishful thinking into granting it high truthlikeness: “Why Wikipedia + open access = revolution”, summarizing and expanding on arxiv.org/abs/1506.07608, titled “Amplifying the Impact of Open Access: Wikipedia and the Diffusion of Science” [1], with some quotes:

“The odds that an open access journal is referenced on the English Wikipedia are 47% higher compared to closed access journals,” say Teplitskiy and co.

Open access publishing has changed the way scientists communicate with each other but Teplitskiy and buddies have now shown that its influence is much more significant. “Our research suggests that open access policies have a tremendous impact on the diffusion of science to the broader general public through an intermediary like Wikipedia,” says Teplitskiy and co.

It means that open access publications are dramatically amplifying the way science diffuses through the world and ultimately changing the way we understand the universe around us.

I sooo want to believe. And, honestly, when I search for something and Wikipedia is the first hit and I do click, it does seem to give a decent introductory overview of something I know little about, so that I can make a better start on searching the real sources. I never bothered to look up my own areas of specialisation, other than when a co-author mentioned there was (did she put it there? I can’t recall) a reference to her tool in Wikipedia some time ago. But there’s that nagging comment on the technologyreview blog post saying the same thing, and adding that when s/he looked up his/her own field, s/he

“then realized that in my own field, my main reaction was to want to scream at the cherry picking of sources to promote some minor researcher.”

So, I looked up “ontology engineering” and “Ontologies”, which redirected to “Ontology (information science)” (‘information science’, tsk!)… and I kinda screamed. The next sections are, first, about the merits of the arxiv paper (outcome: their conclusions are rather exaggerated) and, second, I’ll use that ‘ontology (information science)’ entry to dig a bit deeper as a use case, using both the English entry and the entries in several other languages, as that’s what the arxiv paper covers as well. I’ll close with some thoughts on what to do about it.

 

On the arxiv paper’s data and results

There are several limitations to the paper; some are discussed by its authors, some are not. The arxiv paper does not distinguish between official ‘open access’ and scientific literature that is freely available online but whose final typeset version is behind a paywall. This is problematic when processing the computer science entries in Wikipedia to try to validate their hypothesis. In addition, they considered only journals and their open access policy, i.e., a journal-level rather than an article-level analysis, likewise for the problematic ISI impact factor, and only the 21000 journals listed in Scopus, which eventually amounted to the top 4721 journals (by ISI index), of which 335 are open access, to test Wikipedia content against. The open access list was taken from the Directory of Open Access Journals, ignoring the difference between ‘green’ and ‘gold’ open access and paywall-access from, say, Elsevier. Overall, this already does not bode well for extending the obtained conclusion to computer science entries and, hence, for the diffusion-of-knowledge claim.

The authors admit they may undercount references for the non-English entries, but those have few references anyway (Fig 1 in the arxiv paper), so it is largely an English-Wikipedia analysis after all; i.e., the conclusion does not straightforwardly extend to ‘diffusion of knowledge’ in the non-English-speaking world.

The statistical model is described on p19 of the pdf, and I don’t quite follow the rationale, with an elusive set of ‘journal characteristics’ and some estimated variables without detail. Maybe some stats person can shed some light on it.

Then there is the bubble figure in the technologyreview post, which is Fig 8 in the arxiv paper (reproduced in the screenshot below) and which “shows that across 50 [non-English] Wikipedias, there is an inverse relationship between the effects of accessibility and status on referencing”. Come again? It’s not like the regression line fits well. And why would the language entries, presumably independent of one another, be in a relation after all? Notwithstanding, according to the arxiv paper, the odds for a Serbian entry to have a reference to an open access journal are some 275% higher than to a paywalled one, whereas entries in Turkish cite higher impact factor journals some 200% more often. I haven’t found the details of that data, though, other than a back-of-the-envelope calculation when glancing over the figure: Serbian has about 1.5 for impact and 3.75 or so for open access, Turkish 3 and 1.3-ish. Of how many entries and how many citations for those languages? They state that “While the English Wikipedia references ~32,000 articles from top journals, the Slovak Wikipedia references only 108 and Volapuk references 0.”. Yet Volapuk still ends up with an open access odds ratio of 0.588 and an ln(impact factor) of 2.330 (Appendix A3), which is computed over the set of top-rated journals only; how is that possible when there are no references to those top journals? The number of counted journal citations is not given for each language, so a ‘statistically significant’ result may well be computed over a number that’s too low to do statistics with. Waray-Waray is a very small dot, and reading from Fig 1, it probably has no more than the 108 references of the Slovak entries.
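The percentages are odds ratios in disguise: ‘x% higher odds’ means (OR - 1) × 100, so an odds ratio of 3.75 gives 275%, and the 47% quoted earlier corresponds to an odds ratio of 1.47. To make the sample-size worry concrete, here is a sketch with made-up 2×2 counts (not data from the paper, whose regression model is more involved): the same odds ratio can come with a confidence interval that includes 1 or one that excludes it, depending on the counts, and with a zero cell the simple Wald interval used here is not even defined. The function names are my own.

```python
import math


def percent_higher(odds_ratio):
    """Express an odds ratio as 'x% higher odds': (OR - 1) * 100."""
    return (odds_ratio - 1) * 100


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and Wald confidence interval (95% for z=1.96) of a 2x2 table.

    a, b: referenced / not referenced counts, open access journals
    c, d: referenced / not referenced counts, closed access journals
    The Wald interval is undefined when any cell is zero."""
    if 0 in (a, b, c, d):
        raise ValueError("Wald interval undefined with a zero cell")
    ratio = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # standard error of ln(OR)
    low = math.exp(math.log(ratio) - z * se)
    high = math.exp(math.log(ratio) + z * se)
    return ratio, (low, high)


# Made-up counts with the same odds ratio (3.75) but different sample sizes.
small = odds_ratio_ci(6, 4, 2, 5)        # CI roughly 0.47 to 30: includes 1
large = odds_ratio_ci(120, 80, 40, 100)  # CI roughly 2.4 to 6.0: excludes 1
```

Both tables give exactly the same ‘275% higher’ headline; only the second would support a claim of a real effect, which is why per-language citation counts matter.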

All in all, there is some room for improvement on this paper and, in any case, for some toning down of the conclusions, not to mention technologyreview’s sensationalist blog title.


Fig 8 from Teplitskiy et al (2015)

Ontology (information science) Wikipedia entry, some issues

Rather than whining pettily that none of my papers are in the references, let me take a small sample of the myriad of issues.

Take the statement “There are studies on generalized techniques for merging ontologies,[12] but this area of research is still largely theoretical.” Uh? The reference is to an obscure ‘dynamic ontology repair’ project pdf from the University of Edinburgh, retrieved in 2012. We merged DMOP’s domain content with DOLCE in 2011, with tool support (Protégé, to be precise). owl:import was around and working at that time as well. Not to mention the very large body of papers on ontology alignment, the reference book by Shvaiko & Euzenat, and the Ontology Alignment Evaluation Initiative.

The list of ontology languages even includes SBVR and IDEF5 (not ontology languages), and, for good measure of scruffiness, a project (TOVE).

The obscure “Gellish” appears literally everywhere: it is an ontology, it is a language, it is an English dictionary (yes, the latter apparently also falls under ‘examples’ of ontologies. not), and it is even the one and only instantiation of a “hybrid ontology” combining a domain and an upper ontology. Yeah, right. Looking it up, Gellish is van Renssen’s PhD thesis of 2005, which has an underwhelming 2 citations according to Google Scholar (10 for the related journal paper), and there’s a follow-up 2nd edition of 2014 by the same author, published with lulu, with no citations. That does not belong in an introductory overview of ontologies in computing. Dublin Core as an example of an ontology? No (but it is a useful artefact for metadata annotations).

Under “criticisms”: aside from a Werner Ceusters statement from a commentary on someone, taken from his website (since when does that deserve to be upgraded to Wikipedia content?!?), there’s also “It’s also not clear how ontology fits with Schema on Read (NoSQL) databases.”. Ontologies with NoSQL? sigh.

“Further readings” would, I expect, have a fine set of core readings to get a more comprehensive overview of the field. While some relevant ones are there (e.g., the “what is an ontology?” paper by Oberle, Guarino, and Staab; “Ontology (Science)” by Smith, Gruber’s paper despite the flawed definition), numerous ones are the result of some authors’ self-promotion, like the one on bootstrapping biomedical ontologies, an ontology for user profiles, IE for disease intelligence—they’re not even close to ‘staple food’ for ontologies—and the 2001 OIL paper and Ontology Development 101 technical report are woefully out-dated. The “References” section is a mishmash of webpages, slides, and a few scientific papers most of which are not from mainstream ontology research venues.

And that’s just a sampling of the issues with the “Ontology (information science)” Wikipedia entry; the ontology engineering entry is worse. No wonder my students—having grown up with treating Wikipedia as gospel—get confused.

 

Ontologies entries in other languages

That much about the English language version of ‘ontology (information science)’. I happen to speak a few other languages as well, so I also checked most of those for their ‘ontology (information science)’ entry. For future reference as a stock-taking of today’s contents, I’ve pdf-printed them all (zipped). For starters, they all had ontologies at least categorised properly into ‘informatica’. +1.

The entry in Dutch is very short; one can quibble and nit-pick about term usage, and it is disappointing that there’s only one reference (in Dutch, so wouldn’t count in the arxiv analysis), but at least it’s not riddled with mistakes and inappropriate content.

The German one is quite elaborate and starts off reasonably well, but has some mistakes. Among others, there is the typical novice pitfall of confusing classes with instances [“Stadt als Instanz des Begriffs topologisches Element der Klasse Punkte”, i.e., ‘city as instance of the concept topological element of the class points’], and the sample ontology (which in itself is a good idea to add to an overview page) has lots of modelling issues, such as datatypes and mixing subclasses with properties (the Maler [painter] with region of origin Flämisch [Flemish]). Interestingly, ontology types for the English reader are foundational, domain, and hybrid, whereas the German reader has only lightweight and heavyweight ones. As for the references, there are some oddball ones, but the fair/good ones are in the majority, if incomplete, and perhaps a bit lopsided towards Barry Smith material.

The Italian entry is of similar length to the German entry, but, unfortunately, has some copy-and-paste from the English one when it comes to the list of languages and examples, so, a propagation of issues; the ‘example of applications’ does list another project, and there is no ‘criticisms’ section. The text has been written separately instead of being a translation of the English (idem ditto for the other entries, btw), and thus also contains some other information. For starters, removing most of the ‘Premesse’ (‘premises’) would be helpful (or elaborating on it in a criticism section; starting the topic with information warfare and terrorism? nah). The section after that (‘uso come glossario di base’, ‘use as a basic glossary’) is chaotic, reading like a competing author per paragraph, and riddled with problematic statements, such as that all computer programs are based on foundational ontologies (“Tutti i programmi per computer si basano su ontologie fondazionali,”) and that the scope of an ontology is to develop a database (“Lo scopo di un’ontologia computazionale […] [è] di creare una base di dati”). It does mention OntoClean. Italian readers will also be treated to a brief discussion of the debate on one or multiple ontologies (absent from the other entries). It has quite a different set of ‘external links’ compared to the other entries, and there are hardly any references. All in all, one leaves with a quite distinct impression of ontologies after reading the Italian one cf. the Dutch, German, and English ones.

Last, the Spanish entry is about as short as the Dutch one. There’s overlap in content with the Italian entry in the sense of near-literal translation (on the foundational ontology and that Murray-Rust guy on the ‘semantic and ontological war’ due to ‘competition between standards’), and it has a plug for MathWorld (?!).

So, if the entries on topics I’m an expert in are of such dubious quality (the German entry is, relatively, the best), then what does that imply for the other entries that superficially may seem potentially useful introductory overviews? By the same token, they probably are not. And the ontology topics are not even in an area with as much contention as topics in political science, history, etc. Go figure.

 

Now what?

Is this a bad thing? I can already see a response in the making along the lines of “well, it’s crowdsourced and everyone can contribute, we invite you to not just complain, but instead improve the entry; really, you’re welcome to do so”. Maybe I will. But first, two other questions have to be answered. The arxiv paper that got my rant started claimed that open access papers are good, and that they’re reworked into bites digestible for the interested layperson in Wikipedia to spread and diffuse knowledge in the world. The idea is nice, but the reality is different. Pretty much all the main papers on ontologies are freely available online even if not published ‘open access’ (computer science praxis, thank you), yet they are not the ones that appear in Wikipedia. Question 1: Why are those freely available main references on ontologies not referenced there already?

A concern of a different type is that several schools in South Africa have petitioned to get free Internet access to search Wikipedia as a source of information for their studies. Their main argument was that books don’t arrive, or arrive late, and there is no library in many schools, which is a common problem. They got zero-rated Wikipedia access from MTN. (I’ll let you mull over its effects on the quality of education they get from that.) Question 2: Can Wikipedia, with its current set-up, be made a truly authoritative resource that lives up to what the learners [and interested laypersons] need? If I were to update the Wikipedia pages today, a pesky editor or someone else could simply click to roll it back to the previous version, or funny references could slowly but steadily seep back in and sentences be cut and rephrased. Writing free textbooks, or at least extensive lecture notes, seems a better option, or a ‘synthesis lectures’ booklet endorsed by lots of people researching and using ontologies. What about a ‘this version is endorsed by …’ button for Wikipedia entries?

Any better ideas, or answers to those questions, perhaps? Free diffusion of digested high quality scientific knowledge really does sound very appealing…

References

[1] Teplitskiy, M., Lu, G., Duede, E. Amplifying the Impact of Open Access: Wikipedia and the Diffusion of Science. arxiv.org/abs/1506.07608

On the need for bottom-up language-specific terminology development

Peoples of several languages intellectualise their vocabulary so as to maintain their own language as medium of instruction (or: LoLT, language of teaching and learning), to conduct scientific discussions among peers and, in some cases, still publish research in their own language. Some languages I know of that do this are French, Spanish, German, and Italian; e.g., the English ‘set’ is conjunto (Sp.) and insieme (It.), and the Dutch for ‘garbage collection’ (in computing) is geheugensanering. I found out the hard way last month that my Italian scientific vocabulary was better than my Dutch one, as I never really practised the latter in my field of specialisation. I also noticed that, over the years that I have been globetrotting, quite a few Anglicisms in Dutch had been replaced with Dutch words, and some of those had been around for a while already (as an excuse: I studied a different discipline in the Netherlands). How do these new words come about? There are many ways of word creation, and then it depends on the country or language region how a word gets incorporated into the language. For instance, France uses a top-down approach with the Académie Française, and Spain has the Real Academia Española. The Netherlands has De Nederlandse Taalunie, which isn’t as autocratic, it seems; for instance, to follow suit with the French mot dièse for the Twitter ‘hashtag’, there was some consultation and online voting to come up with an agreeable Dutch term for hashtag. But how does that happen elsewhere?

We found out that there is a mode of practice for language-specific terminology development that happens in small ‘workshops’ of some 13-15 people, consisting mainly of terminologists and linguists, and 1-3 subject matter experts. There may be a consultative event with stakeholders, who are not necessarily subject matter experts. Shocking. The sheer arrogance of the former, who ‘magically’ grasp scientific concepts that typically take a while to understand, supposedly well enough to come up with a meaningful local-language word. But maybe, you say, I’m too arrogant in thinking that subject matter experts, such as myself, can come up with decent local-language terms. Maybe that’s partially true, but what may be more problematic is that only a few subject matter experts are involved, so there is an over-reliance on those mere few. Maybe, you say, that’s not a problem. We put that to the test for computing and computer literacy terminology development for isiZulu, and found out that it was: what comes out of the term harvesting and term preference depends on whom you ask. And then asking just a few people is a problem for a term’s uptake. (The students involved in the experiments did not even know there was a computer literacy term list from the South African Department of Arts and Culture, published in 2005, and booed away several of its terms.)

We tested this with three experiments. The first experiment was an experts-only workshop, with ‘experts’ being 4th-year computer science students who have isiZulu as home language, as there were no isiZulu-speaking MSc and PhD students, nor colleagues, in CS at the University of KwaZulu-Natal, where we did the experiment. The second experiment was an isiZulu-localised survey among undergraduate CS students to collect terms, where we hoped to see a difference between a survey in which they were given the entity with an English name and one in which they were given the entity as a picture. The third experiment was a survey where computer literacy students (1st-year science students) could vote for terms for which more than one isiZulu term had been proposed. The details of the set-up and the results have been published recently in the Alternation open-access journal article “Limitations of Regular Terminology Development Practices: The Case of isiZulu Computing Terminology” [1], in the special issue on “Re-envisioning African Higher Education: Alternative Paradigms, Emerging Trends and New Directions”, edited by Rubby Dhunpath, Nyna Amin and Thabo Msibi. It describes which isiZulu terms, from which source, are affected, ranging from a higher incidence of ‘zulufying’ English terms in the aforementioned list by the South African Department of Arts and Culture cf. the proposals by the experiments’ participants, to, e.g., expert consensus on inqolobane for database versus a preference for imininingo egciniwe by the computer literacy students (see the paper for more cases). Further, when the responses of all participants across the surveys are pooled and decided by majority vote, the terms proposed by the experts are snowed under. The latter is particularly troublesome in a country where computing is a designated critical skill (or: there aren’t nearly enough people with that skill).
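To illustrate why pooling votes can snow under the experts’ proposals, here is a minimal Python sketch of plurality voting over pooled ballots. The ballot counts are invented for illustration only and are not the study’s data; only the two candidate isiZulu terms for ‘database’ are taken from the paper as described above.

```python
from collections import Counter

# Invented ballots for the English term 'database', grouped by respondent type.
# Illustrative counts only; NOT the data from the study.
ballots = {
    "experts": ["inqolobane"] * 3,
    "students": ["imininingo egciniwe"] * 8 + ["inqolobane"] * 4,
}


def plurality(votes):
    """Return (term, count) for the most frequent term.
    Ties go to the term encountered first (Counter.most_common order)."""
    term, count = Counter(votes).most_common(1)[0]
    return term, count


def pooled_plurality(groups):
    """Pool all respondents into one electorate, ignoring group membership."""
    pooled = [vote for votes in groups.values() for vote in votes]
    return plurality(pooled)


for group, votes in ballots.items():
    print(group, plurality(votes))
# experts ('inqolobane', 3)
# students ('imininingo egciniwe', 8)

print("pooled", pooled_plurality(ballots))
# pooled ('imininingo egciniwe', 8)
```

The per-group winners differ, and once everyone is pooled the outcome is decided by whichever group is larger; with only a handful of experts, their preferred term loses regardless of its merit.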

A byproduct of the experiments is that we have collected what is, to date, the longest list of isiZulu computing terms, which has gone through a standardisation process in the meantime. The latter is mainly thanks to the tireless efforts of Khumbulani Mngadi of the ULPDO of UKZN, and the two expert CS honours students who volunteered in the process, Sibonelo Dlamini and Tanita Singano.

Our approach was already less exclusionary cf. the aforementioned traditional/standard way, but it also shows that broader participation is needed both to collect and to choose terms; or, in the words of the special issue editors [2]: a “democratization of the terminology development process” that “transcends the insularity and purism which characterises traditional laboratory approaches to development”. We are still working on-and-off to achieve this with crowdsourcing, and maybe we should start thinking of crowdfunding that crowdsourcing effort to speed up the whole thing and complete the commuterm project.

As a last note: in case you are interested in other contributions to “re-envisioning African higher education”: scan through the main page online, read the editorial [2] for main outcomes of each of the papers, and/or read the papers, on topics as diverse as postgrad supervision in isiZulu, teaching sexual and gender diversity to pre-service teachers, maths education, IKS in HE, and much more.

References

[1] Keet, C.M., Barbour, G. Limitations of Regular Terminology Development Practices: The Case of isiZulu Computing Terminology. Alternation, 2014, 12: 13-48.

[2] Dhunpath, R., Amin, N., Msibi, T. Editorial: Re-envisioning African Higher Education: Alternative Paradigms, Emerging Trends and New Directions. Alternation, 2014, 12: 1-12.

Pleasant SAARMSTE’15 in Maputo

The 23rd annual conference of the Southern African Association for Research in Mathematics, Science, and Technology Education, held in Maputo, Mozambique, concluded last Friday, after some 200 presentations in 8 parallel sessions by academics from about 18 countries (mostly from the SADC region, and some from the USA, UK, Norway, Japan, Turkey, and New Zealand). It was a stimulating event hosted by a welcoming community.

Most maths & science teaching research presentations were concerned with “what goes wrong, and why?” and “which interventions (hypothesised improvements), and do they work?”. I’ll describe a small sample of the presentations spread over the 3.5 days to illustrate this. For instance, Frikkie George from UWC looked into why teachers in secondary schools do, or do not, use computer-assisted learning in their teaching [1]. To look at the negative side (for one may want to use technology in the classroom and wonder why it is not happening all that much): this was mainly due to a lack of experience with the technology, of on-site support, and of availability of the technologies, and a lack of time to integrate it into the curriculum.

A recurring and emerging research theme on the problem side of things was the language of teaching and learning (“LoLT”, formerly known as ‘medium of instruction’), as many learners in the classroom in SADC countries are being taught in a language that is not their mother tongue (called ‘home language’ in South Africa). There were several presentations on this issue, and a whole symposium was dedicated to it. Kathija Adam from NMMU presented a useful literature review [2], which is part of an inter-institutional funded project that started last year, so the main solutions are yet to come (and I’ll leave it with this ‘cliffhanger’, as much more can be said about it, deserving its own blog post).

There was also the issue of “Indigenous Knowledge Systems (IKS) in the classroom and in science”, and I went to a few of those presentations. It is a touchy subject in this region of the world, and, to complicate matters, different presenters and attendees had quite different ideas and assumptions about it. From the ‘light’ version, e.g., IKS & weather by Alvin Riffel (also from UWC), along the lines of, say, “an evening false moonbow brings rain tomorrow”¹, which can then be used as an introduction to the scientific explanation of the phenomenon, relating everyday-life observations to science in the classroom [3]. To the ‘heavy’ and un(counter?)productive: a big, fat, loud-mouthed militant claiming that ‘everything is science, including the spirits’ and lambasting ‘and if you go for western science [cf. African], then you are one of those bad oppressive colonialists, racist!’, nipping in the bud any conversation about IKS and science (I’m not exaggerating). Another recurring theme was pedagogical content knowledge (PCK).

My own presentation was about an experiment in peer instruction that, in short, didn’t have the desired effect (increasing class attendance), but was useful in other ways nevertheless (read the 13-page paper for the details [4]). This work will be extended this year, partially thanks to a UCT Teaching With Technology grant to develop a better functioning software-based audience response system, and more concept tests.

Other than that, it was hot in Maputo, which is full of friendly people and good food and coffee. The SAARMSTE choir gave its best during the social dinner, which was also spiced up with some dancing. On Friday afternoon after the conference’s closing ceremony, I planned to finally go to the internet cafe to check emails, but the bus was for the excursion through Maputo only, so that plan was changed (the alternative was a 20-minute walk in the blistering sun at 2pm, getting burned, again). There may not be a whole lot of touristy places in the city, but it mattered not, as we had a good time together anyway. Also contributing to a great stay in Maputo was my choice to be frugal with accommodation, opting for Fatima’s Place backpackers rather than a fancy hotel (choices: expensive and even more expensive): unlike the conference participant who was lamenting a ‘dull 15-hour stay at the hotel until the conference’s next day’, I had great company in the backpackers’ lively common area in the (late) evening.

The next SAARMSTE, in early 2016, will be in Pretoria, a location not even close to being as appealing as Maputo, but a warm welcome is guaranteed by its participants (as it was also welcoming in Cape Town in 2013 when I attended the conference).

References

[1] George, F., Ogunniyi, M. Teacher’s perceptions on the use of ICT in a CAL environment to enhance the conception of scientific concepts. 23rd Annual Meeting of the Southern African Association for Research in Mathematics, Science, and Technology Education (SAARMSTE’15), 13-16 January 2015, Maputo, Mozambique.

[2] Adam, K., Africa, A., Woods, T., Johnson, S. Exploring issues related to language in multilingual South African Science classrooms: a literature review. 23rd Annual Meeting of the Southern African Association for Research in Mathematics, Science, and Technology Education (SAARMSTE’15), 13-16 January 2015, Maputo, Mozambique.

[3] Riffel, A.D. Examining the impact of dialogical argumentation on grade 9 learners’ beliefs about weather and indigenous knowledge. 23rd Annual Meeting of the Southern African Association for Research in Mathematics, Science, and Technology Education (SAARMSTE’15), Huillet, E. (Ed.), pp366-379. 13-16 January 2015, Maputo, Mozambique.

[4] Keet, C.M. An Experiment with Peer Instruction in Computer Science to Enhance Class Attendance. 23rd Annual Meeting of the Southern African Association for Research in Mathematics, Science, and Technology Education (SAARMSTE’15), Huillet, E. (Ed.), pp319-331. 13-16 January 2015, Maputo, Mozambique.

¹ The phrase “false moonbow” for a corona, a circular ‘rainbow’ around the moon, is one I just made up; the saying is similar to a reading of the sky we have in the Netherlands. On January 4 we saw an amazing one here, admiring it during a neighbourhood braai and wondering what it might mean. The next day, I made it to work through the heavy rain (in summer!) and looked it up to see what it meant and why… reality very much confirmed the theory, the whole day long.

Even more short reviews of books I’ve read in 2014

I’m not sure whether I’ll make it a permanent fixture for years to come, but, for now, here’s another set of book suggestions, following those on books on (South) Africa from 2011, some more and also more general reads in 2012, and even more fiction & non-fiction book suggestions from 2013. If nothing else, it’s actually a nice way for myself to recall the books’ contents and decide which ones are worth mentioning here, for better or worse. To summarise the books I’ve read in 2014 in a little animated gif:

(saved last year from daskapital.nl)

Let me start with fiction books this time, which include two books/authors suggested by blog readers. (Note: most book and author hyperlinks are to online bookstores and Wikipedia or similar, unless I could find their home page.)

Fiction

Stoner by John Williams (1965). This was a recommendation by an old friend (more precisely on the ‘old’: she’s about as young as I am, but we go way back to kindergarten), and the book was great. If you haven’t heard about it yet: it tells of the life of a professor who comes from a humble background and dies in relative anonymity, with the ups and downs of the life of an average ‘Joe Soap’ and without any heroic achievements (assuming that you don’t count becoming a professor as one). That may sound dull, perhaps, but it isn’t, not least because of the way it is narrated, which gives a certain beauty to the mundane. I’ll admit I have read it in its Dutch translation, even in dwarsligger format (which turned out to be a useful invention), as I couldn’t find the book in the shops here, but better in translated form than not having read it at all. There’s more information over at Wikipedia, in the NYT’s review, the Guardian’s review, and many other places.

Not a fairy tale by Shaida Kazie Ali (2010). The book is fairly short, but many things happen nevertheless in this fast-paced story of two sisters who grow up in Cape Town in a Muslim-Indian family. The sisters have very different characters, one demure, the other willful and more adventurous, and both life stories are told in short chapters that cover the main events in their lives, including several of the same events from each one’s vantage point. As the title says, it’s not a fairy tale, and certainly the events are not all happy ones. Notwithstanding its occasional grim undertones, it is, to me, told in a way that gives a fascinating ‘peek into the kitchen’ of how people live in this society across the decades. Sure, it is a work of fiction, but there are enough recognizable aspects to give the impression that it could have been pieced together from actual events in different lives. The story is interspersed with recipes (burfi, dhania chutney, coke float, falooda milkshake, masala tea, and more), which makes the book reminiscent of Como agua para chocolate. I haven’t tried them all, but if nothing else, now at least I know what a packet labelled ‘falooda’ is when I’m in the supermarket.

No time like the present, by Nadine Gordimer (2012). Not necessarily this particular book, but ‘well, anything by Gordimer’ was recommended. There were so few of Gordimer’s books in the shops here that I had to go abroad to encounter a selection, including this recent one. I should have read some online reviews of it first, rather than indulging in such an impulse buy, though. This book is so bad that I didn’t finish it, nor do I want to. While the storyline did sound interesting enough (about a ‘mixed race couple’ from the struggle times transitioning into present-day South Africa, and how they come to terms with trying to live normal lives), the English was so bad it’s unbelievable this made it through any editorial checks by the publisher. It’s replete with grammatically incoherent and incomplete sentences that make it just unreadable. (There are other reviews online that are less negative.)

The time machine, by HG Wells (1895). It is one of the earliest works of fiction about time travel, considering the possible time anomalies when travelling through time and pondering what a future society may be like from the viewpoint of the traveller. It’s one of those sweet little books that are short but have a lot of story in them. Anyone who likes this genre ought to read this book.

One thousand and one nights, by Hanan Al-Shaykh (2011). Yes, what you may expect from the title. The beginning and end are about how Scheherazade (Shahrazad) ended up telling stories to King Shahrayar all night, and the largest part of the book is devoted to stories within stories within yet more stories, weaving a complex web of tales from across the Arab empire so that the king would spare her for another day, wishing to know how the story ends. The stories are lovely and captivating, and I, too, kept on reading, indeed wanting to know how the stories end.

Karma Suture, by Rosamund Kendall (2008). Because I liked The Angina Monologues by the same author (earlier review), so much so that I have already read it a second time, and Karma Suture is also about medics in South Africa’s hospitals, I thought this one would be likable, too. The protagonist is a young medical doctor in a Cape Town hospital who has lost the will to do that work and needs to find her vibe. The story was a bit depressing, but maybe that’s what 20-something South African women go through.

God’s spy by Juan Gómez-Jurado (2007) (Espía de Dios; Spanish original). A ‘holiday book’ that’s fun, if that can be an appropriate adjective for a story about a serial killer murdering cardinals before the conclave after Pope John Paul II’s death. It has recognizable Italian scenes, the human interaction component is worked out reasonably well, it has the good twists and turns and suspense-building required for a crime novel, and a plot you won’t expect. (Also on Goodreads; it was a bestseller in Spain.)

Non-fiction

This year’s non-fiction selection is as short as in other years, but I have less to say about the books cf. last year.

David and Goliath: Underdogs, misfits, and the art of battling giants, by Malcolm Gladwell (2013). What to say: yay! another book by Gladwell, and, like the others I read by Gladwell (Outliers, The tipping point), this one is good, too. Gladwell takes a closer look at how seeming underdogs are victorious against formidable opponents. Also in this case, there’s more to it than meets the eye (or some stupid USA Hollywood movie storyline of ‘winning against the odds’), such as playing by different rules/strategies than the seemingly formidable opponent does. The book is divided into three parts, on the advantages of disadvantages, the theory of desirable difficulty, and the limits of power, and, as with the other books, explores various narratives and facts. One of those remarkable observations is that, for universities in the USA at least, a good student is better off at a good university than at a top university. This is partly for psychological reasons (it feels better to be at the top of an average/good class than the average mutt in a top class), and partly because the top of a class gets more attention for nice side activities, so that the good student at a good (vs top) university gets more useful learning opportunities than s/he would have gotten at a top university. Taking another example from education: a ‘big’ class at school (well, just some 30) is better than a small (15) one, for it gives more “allies in the adventures of learning”.

The dictator’s learning curve by William J. Dobson (2013), or: some suggestions for today’s anti-government activists. It’s mediocre, one of those books where the cover makes it sound more interesting than it is. The claimed thesis is that dictators have become more sophisticated in oppression by giving it a democratic veneer. This may be true, at least in part, and in the sense that there is a continuum from autocracy (tyranny, as Dobson labels it in the subtitle) to democracy. Highlighting that notion has some value. However, it’s written from a very USA-centric viewpoint, so essentially it’s just highbrow propaganda for dubious USA foreign policy, with its covert interventions against countries such as Russia, China, and Venezuela, and for ‘justifiably’ undercutting whatever plans they have through supporting opposition activists. Interwoven in the dictator’s learning curve storyline is his personal account of witnessing that, and how, activists across countries increasingly share strategy and tactics on how to foment dissent for another colour/flower revolution. I was expecting some depth about autocracy-democracy spiced up with pop-politics and events, but it did not live up to that expectation. A more academic, and less ideologically tainted, treatise on the autocracy-democracy continuum would have been a more useful way of spending my time. You may find the longer PS Mag review useful before/instead of buying the book.

Umkhonto weSizwe (pocket history) by Janet Cherry (2011). There are more voluminous books about this armed organisation of the struggle against Apartheid, but this booklet was a useful introduction to it. It describes the various ‘stages’ of MK, from the decision to take up arms to the decision, at the end, to lay them down, and the successes and challenges that were faced and sacrifices made as an organisation and by its members.

I’m still not finished reading Orientalism by Edward Said—some day, I will, and will write about it. If you want to know about it now already, then go to your favourite search engine and have a look at the many reviews and (academic and non-academic) analyses. Reading A dream deferred (another suggestion) is still in the planning.

VocabLift to learn some isiZulu, Shona, French, and English words

While I’ll be at EKAW’14 to network, present the stuff ontology, and support SUGOI, some of my students will hold the fort locally at the African Language Technologies Workshop (AFLaT’14) on 27-28 November in Cape Town. One of the two posters & demos I contributed to is about a cute tool that two 3rd-year students, Ntokozo Zwane and Sungunani Silubonde, designed and implemented as their capstone project for software engineering, which they called VocabLift (zip). The capstone groups’ task was to develop a tool that can help someone to learn vocabulary in a playful way, with some leeway to be creative in how to realise that.

The context is that everyone has to learn vocabulary over the years, from basic words in primary school to scientific terminology at university, and any time one is learning a new language. Besides memorizing ‘boring’ lists of words from a sheet of paper, there are more playful ways to do this, like the multi-player dictionary game and hangman, or the single-player memory cards game from the EuroTalk DVDs. There are indeed many word games online, e.g., for English, and for learning a foreign language on Duolingo, but there is less for multilingualism and the languages in Southern Africa. EuroTalk DVDs for Zulu, Shona, Swahili, Yoruba and a few other African languages do exist, true, but at a cost, and they are inflexible in a teaching setting. Enter VocabLift, which is interesting both technologically and for its chosen target languages: isiZulu and Shona, and English and French. Conceptually, it is based on natural-language-independent root questions that are mapped to the language of choice, so that another language can easily be added, and, unlike the usual ‘closed’ world of computer-based language games, a teacher can add words to the dictionary, making it in principle adaptable to the desired level of language learning.
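The idea of language-independent root questions can be sketched as a dictionary keyed by language-neutral concept IDs, with one word list per language. This is a hypothetical Python illustration, not VocabLift’s actual JavaFX/XML implementation; the class name `Lexicon`, concept IDs such as `colour.gray`, and the Dutch entry (added only to show how a new language slots in) are assumptions made for illustration.

```python
from __future__ import annotations

from typing import Optional


class Lexicon:
    """Hypothetical sketch: words stored per language, keyed by a
    language-neutral concept ID, so questions are language-independent."""

    def __init__(self) -> None:
        # language -> {concept_id -> word}
        self.words: dict[str, dict[str, str]] = {}

    def add_language(self, language: str) -> None:
        """Register a language with an (initially) empty dictionary."""
        self.words.setdefault(language, {})

    def add_word(self, language: str, concept_id: str, word: str) -> None:
        """Add or overwrite a word, as a teacher would in 'dictionary mode'."""
        self.add_language(language)
        self.words[language][concept_id] = word

    def question(self, concept_id: str, source: str,
                 target: str) -> Optional[tuple[str, str]]:
        """Return (prompt in source language, expected answer in target
        language), or None if either language lacks this concept."""
        prompt = self.words.get(source, {}).get(concept_id)
        answer = self.words.get(target, {}).get(concept_id)
        if prompt is None or answer is None:
            return None
        return prompt, answer

    @staticmethod
    def check(expected: str, given: str) -> bool:
        """Lenient answer check: ignore case and surrounding whitespace."""
        return expected.strip().casefold() == given.strip().casefold()


lex = Lexicon()
# English/French pairs for a few concepts from the screenshots.
for concept, en, fr in [("fruit.avocado", "avocado", "avocat"),
                        ("fruit.pineapple", "pineapple", "ananas"),
                        ("colour.green", "green", "vert"),
                        ("colour.gray", "gray", "gris")]:
    lex.add_word("English", concept, en)
    lex.add_word("French", concept, fr)

lex.add_word("English", "device.cellphone", "cellphone")  # a teacher adds a word
lex.add_word("Dutch", "colour.green", "groen")  # hypothetical extra language

prompt, answer = lex.question("colour.gray", "English", "French")
print(prompt, "->", answer)             # gray -> gris
print(Lexicon.check(answer, " Gris "))  # True
```

Because a question only refers to a concept ID, adding a language amounts to filling in one more word list; a question simply returns `None` until both the prompt and the answer language have a word for that concept.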

Currently, VocabLift has three games: the Picture Matcher, Vocab Trainer, and Word Tetris. In Picture Matcher, the user has to provide the name of the object in the picture, with the objective of improving memory and spelling in the chosen language; screenshots for avocado in isiZulu and pineapple in Shona are shown below.

Avocado in isiZulu, right before selecting ‘confirm word’

Pineapple in Shona, after I clicked ‘I don’t know’

Vocab Trainer tests the user’s ability to recall the word given in English in the target language; screenshots for green in isiZulu and gray in French are shown below.

Choosing the right word for ‘green’ in isiZulu (the answer also can be found further below in another screenshot)

Same story, and just to show it works for French, too.

The third game, Word Tetris, is included so that the user can learn to match the word to the picture. The user has to type the word associated with the picture before it falls below the bar; a screenshot is shown below (I lost points due to trying to make nice screenshots, really).

Halfway playing ‘word tetris’
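The Word Tetris mechanic (type the word before its picture falls below the bar) can be sketched as a simple tick-based loop. This is a hypothetical Python illustration, not the students’ code; the drop speed, the scoring of one point per cleared picture, and clearing the lowest matching picture first are all assumptions.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class FallingPicture:
    word: str    # expected answer in the chosen language
    height: int  # rows left above the bar; below the bar once negative


@dataclass
class WordTetris:
    speed: int = 1   # rows a picture drops per tick (assumption)
    score: int = 0
    missed: int = 0
    falling: list[FallingPicture] = field(default_factory=list)

    def spawn(self, word: str, height: int) -> None:
        """Drop a new picture whose name must be typed."""
        self.falling.append(FallingPicture(word, height))

    def type_word(self, typed: str) -> bool:
        """Clear the lowest picture whose word matches the typed text."""
        typed = typed.strip().casefold()
        matches = [p for p in self.falling if p.word.casefold() == typed]
        if not matches:
            return False
        lowest = min(matches, key=lambda p: p.height)
        self.falling.remove(lowest)
        self.score += 1
        return True

    def tick(self) -> None:
        """Move every picture down; those that fall below the bar are lost."""
        for p in self.falling:
            p.height -= self.speed
        self.missed += sum(1 for p in self.falling if p.height < 0)
        self.falling = [p for p in self.falling if p.height >= 0]


game = WordTetris()
game.spawn("ananas", height=2)
game.spawn("avocat", height=1)
game.tick()               # ananas at 1, avocat at 0 (on the bar)
game.type_word("Ananas")  # typed in time: cleared
game.tick()               # avocat drops to -1: missed
print(game.score, game.missed)  # 1 1
```

Clearing the lowest matching picture first is one reasonable choice when the same word is falling twice, since that is the one about to be lost.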

One needs to be logged in as administrator (admin 1234 will do the trick) to add words and to use the tool in ‘dictionary mode’, as illustrated in the next two screenshots.

Adding terms, having selected to add it to the isiZulu dictionary

Cellphone was added (and you also can find the answer to ‘green’, above)

VocabLift has been implemented using JavaFX and XML, making the tool platform-independent (click the VocabLift.jar file once the downloaded zip file is unzipped). I’ll readily admit that it is quite possible to add more features, adapt it to run on a mobile phone as well, or refine the HCI, and some educational technologies researcher may want to investigate whether this, or something like it, improves the learning outcome significantly. But it was a fun software engineering project over a timespan of a mere two months (with other parts of the course being taught), and words surely can be learnt (at least I have learnt some, and so did the students).

The AFLaT’14 poster and demo session runs from 15:30-16:30 on the 27th and will remain available during the breaks on the 28th as well, and Ntokozo and Sungunani will be happy to demo it for you and describe more details about it.