Some reflections on designing Abstract Wikipedia so far

Abstract Wikipedia aims to at least augment the current Wikipedia, if not be the next-generation Wikipedia. Besides the human-authored articles that take time to write and maintain, you could scale up article generation through automation, and do so for many more languages. And keep all that content up-to-date. And all that reliably, without hallucinations where algorithms make stuff up. How? Represent the data and information in a structured format, such as in an RDF triple store, JSON, or even a relational database or OWL, and generate text from suitably selected structured content. Put differently: multilingual natural language generation, at scale, and community-controlled. For the Abstract Wikipedia setting, the content would come from Wikidata and the code to compute it from Wikifunctions. Progress in creating the system isn’t going as fast as hoped for, and a few Google.org fellows wrote an initial critique of the plans and progress made, to which the Abstract Wikipedia team at WMF wrote a comprehensive reply. It was also commented on in a Signpost technology report, and a condensed non-technical summary has appeared in an Abstract Wikipedia updates letter. The question remains: is it feasible? If so, what is the best way to go about doing it; if not, why not, and then what?

A ‘pretty picture’ of a prospective Abstract Wikipedia architecture, at a very high level. Challenges lie in what’s going to be in that shiny yellow box in the centre and how that process should unfold, in the lexicographic data in Wikidata, and where the actual text generation will happen and how.

My name appears in some of those documents, as I’ve been volunteering in the NLG stream of the Abstract Wikipedia project in an overlapping timeframe, and I contributed to the template language, to the progress on the constructors (here and here), and to adding isiZulu lexemes to Wikidata, among others. The mentions are mostly in the context of challenges with Niger-Congo B (AKA ‘Bantu’) languages that are spoken across most of Sub-Saharan Africa. Are these languages really so special that they deserve a specific mention over all others? Yes and no. A “No” may apply, since there are many languages spoken in the world by many people that have ‘unexpected’ or ‘peculiar’ or ‘unique’ or ‘difficult to computationally process’ features, or that are in the same boat when it comes to their low-resource status and the challenges that entails. NCB languages, such as isiZulu, which I mainly focus on, are just one family among them. If I had moved to St. Lawrence Island in the Bering Strait, say, I could have given similar pushback, with the difference that there are many, many more millions of people speaking NCB languages than Yupik. Neither language is in the Indo-European language family. Language families exist for a reason: they have features really unlike those of other families. That’s where the “Yes” answer comes in. Those distinctive features, together with the low-resourcedness, pose challenges along four dimensions: theoretical, technical, people, and praxis. Let me briefly illustrate each in turn.

Theory – linguistic and computational

The theoretical challenges are mainly about the language and linguistics: the characteristic features these languages have and how much we know of them, which affects technical aspects down the road. For instance, we know that the noun class system is emblematic of NCB languages. To a novice or an outsider, it smells of the M/F/N gender of nouns like in French, Spanish, or German, but then with a few more of them. It isn’t quite like that once you get into the details of the 11-23 noun classes of an NCB language, and squeezing that into Wikidata is non-trivial, since here and there an n-ary relation is more appropriate for some aspects than approximating it by partially reifying binaries. The noun class of the noun governs a concordial agreement system that goes across a sentence rather than only its adjacent word; e.g., not only an adjective agreeing with the gender of a noun like in Romance languages (e.g., abuela vieja and abuelo viejo in Spanish), but agreement for each noun class, as well as conjugation of the verb by noun class and other aspects such as quantification over a noun (e.g., bonke abantu ‘all humans’ and zonke izinja ‘all dogs’). We know some of the rules, but not all of them, and only for some of the NCB languages. When I commenced with natural language generation for isiZulu in earnest in 2014, it wasn’t even clear how to pluralise nouns roughly, let alone exactly. We now know roughly how to pluralise nouns automatically. Figuring out the isiZulu verb present tense got us a paper as recently as 2017; the Context-Free Grammar we defined for it is not perfect yet, but it’s an improvement on the state of the art and we can use it in certain controlled settings of natural language generation.
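To make the noun class point a bit more tangible for readers who code, here’s a toy sketch of prefix-swapping pluralisation for a handful of noun classes; it is merely illustrative and glosses over the phonological conditioning and exceptions that the actual rules (and our published algorithms) have to deal with.

```python
# Toy illustration of noun-class-driven pluralisation in isiZulu (not our
# published algorithm): the plural is formed by swapping the class prefix,
# and the noun moves to the paired plural class. Phonological conditioning
# and exceptions are omitted for brevity.

# singular class -> (plural prefix, plural class)
PLURAL_RULES = {
    1: ("aba", 2),    # umuntu 'person'    -> abantu
    3: ("imi", 4),    # umuzi  'homestead' -> imizi
    5: ("ama", 6),    # igama  'name'      -> amagama
    7: ("izi", 8),    # isitsha 'dish'     -> izitsha
    9: ("izin", 10),  # inja   'dog'       -> izinja
}

def pluralise(stem: str, noun_class: int) -> tuple[str, int]:
    """Return the plural form and the plural noun class for a noun stem."""
    prefix, plural_class = PLURAL_RULES[noun_class]
    return prefix + stem, plural_class

print(pluralise("ntu", 1))   # ('abantu', 2)
print(pluralise("ja", 9))    # ('izinja', 10)
```

The noun class matters well beyond the noun itself: the (plural) class computed here is what the concords elsewhere in the sentence – on adjectives, verbs, quantifiers – have to agree with.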

My collaborator and I like such a phrase structure grammar (the CFG just mentioned). There are several types of grammars, however, and it’s anyone’s guess whether any of them is expressive and convenient enough to capture the grammars of the NCB languages. The alternative family is dependency grammars, with their subtypes and variants. To the best of my knowledge, nothing has been done with such grammars for any of the NCB languages. What I can assure you of, from ample experience, is that it is infeasible for people working on low- or medium-resourced languages to start writing up grammars for every pet grammar flavour of the day that rotating volunteers prefer.

IsiZulu and Kiswahili are probably the least low-resourced languages of the NCB language family, and yet there’s no abundance of grammar specifications. It’s not that it can’t be done at least in part; it’s just that most material, if available at all, is outdated and has never been tested on more than a handful of words or sentences, and thus is not computationally reliable off-the-shelf at present. And there are limited resources available to verify it. This is also the case for many other low-resourced languages. For Abstract Wikipedia to achieve its inclusivity aim, the system must have a way to deal with incremental development of grammar specifications without large upfront investments. One shouldn’t want to kill a mosquito with a sledgehammer by first having to scramble together the material and build the sledgehammer, because there are no instant resources for creating that sledgehammer. Let’s start with something feasible in the near term and build just enough equipment to get done what’s needed. Rolling up a newspaper page will do just fine to kill that mosquito. For instance, don’t demand that the grammar spec must be able to cover, say, all numbers in all possible constructions, but only one specific construction in a specific context. Say, for stating the age of a person provided they’re less than 100 years old, or for the numbers related to years, leaving centuries and millennia to be tackled later. Templates are good for specifying such constrained contexts of use; they assist with incremental grammar development and can offer near-instant, concrete feedback that a contribution shows results.
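As a minimal illustration of such a constrained context – in plain Python rather than the Abstract Wikipedia template language itself – consider a hypothetical ‘age in years’ template that only claims to work for ages under 100, so the grammar machinery it needs stays tiny and can be extended later:

```python
# A hypothetical, deliberately narrow template: it covers only ages under 100,
# so the only 'grammar rule' needed is singular/plural agreement on 'year'.
# Centuries, millennia, and other number constructions can be added later.

def age_in_years_en(name: str, age: int) -> str:
    if not 0 < age < 100:
        raise ValueError("this template only covers ages from 1 to 99")
    year_word = "year" if age == 1 else "years"
    return f"{name} is {age} {year_word} old."

print(age_in_years_en("Funeka", 25))   # Funeka is 25 years old.
```

The isiZulu counterpart of such a template needs the concordial agreement machinery sketched above, which is exactly why constraining the context keeps the required rule set small.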

Supporting a template-based approach doesn’t mean that I don’t understand that the sledgehammer may be better in theory – an all-purpose CFG or DG would be wonderful. It’s that I know enough of the facts on the ground to be aware that rolling up a newspaper page suffices for the case at hand and is feasible, unlike the sledgehammer. Let low-resource languages join the party. Devise a novel framework, method, and system that permits incremental development and graceful degradation in the realiser. A nice-to-have on top of that would be automated ‘transformers’ across types of grammars, so we won’t have to start all over again when the next grammar formalism flavour is trumpeted, if it must change at all.

Technical challenges

The theory relates to the second group of challenges, which are of a technical nature. There are lovely systems and frameworks that overconfidently claim to be ‘universal’. Grammars have been coded up for 40, nay 100, languages, so it must be good and universal, or so the assumption may go. We do want to reuse as much as possible—being resource-constrained and all—but then it never turns out to work off-the-shelf. From word-based spellcheckers like in OpenOffice that are useless for agglutinating languages, to the Universal Dependencies (UD) framework and accompanying tools that miss useful dependencies, are too coarse-grained at the word level and, until very recently, were artificially constrained to trees rather than DAGs, up to word-based natural language generation realisers: we have (had) to start mostly from scratch and devise new approaches.

So now we have a template language for Abstract Wikipedia (yay!) that can handle the sub-words (yay!), but then we get whacked like a mole on needing a fully functional Dependency Grammar (and initially UD and trees only) for parsing the template, which we don’t have. The UD framework has to be modified to work for NCB languages – none of those 100 is an NCB language – to allow arcs to be drawn on sub-word fragments, or, if only on the much less useful words, then to allow for more than one incoming arc. It also means we first have to adapt UD annotation tools to get clarity on the matter. And off we must go to do all that before we can sit at that table again? We’ll do a bit, enough for our own systems and for what we need for the use cases.

Sadly, Grammatical Framework is worse, despite there already being a draft partial resource grammar for isiZulu and even though it’s a framework of the CFG flavour of grammars. Unlike for UD, where reading an overview article suffices to get started, that won’t do for GF: there’s a two-week summer school you must attend and a book to read to get anything started at all. The start-up costs are too high for the vast majority of languages. And remember that the prospective system should be community-driven rather than the experts-only affair that GF is at present. Even if that route is taken, the grammar then becomes locked into the GF system, inaccessible for any reuse elsewhere, which is not a good incentive when the potential for reuse is important.

The Google.org fellows’ review proposed to adopt an extant NLG system and build on it, including possibly GF: if we could have done that, we would have done so, and I wouldn’t have received any funding for investigating an NLG system for Nguni languages. A long answer on why we couldn’t can be found in Zola Mahlaza’s PhD thesis on foundations for reusable and maintainable surface realisers for isiXhosa and isiZulu, and shorter answers regarding parts of the problems are described in papers emanating from my GeNi and MoreNL research projects. More can still be done to create a better realiser.

The other dimension of the technical aspects is the WMF software ecosystem as it stands at present. For a proof-of-concept to demonstrate the Abstract Wikipedia project’s potential, I don’t care whether that’s with Wikifunctions, with Scribunto, or a third-party system that can be (near-instantly) copied over onto Wikifunctions once it works as envisioned. Wikidata will need to be beefed up: on speed in SPARQL query answering, on reducing noise in its content, and on the lexemes, to cater for highly inflectional and agglutinating languages. It’s not realistic to make the community add all forms of the words, since there are too many and the interface requires too much clicking around and re-typing when entering lexicographic data manually. Either allow for external precomputation, a human-in-the-loop, and then a batch upload, or assume base forms and link them to a set of rules stored somewhere in order to compute the required form at runtime.
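To illustrate those two options, here’s a hypothetical sketch (the JSON layout is invented for illustration and is not Wikidata’s actual lexeme format): either precompute the forms externally so a human can check them before a batch upload, or store only the base form plus a pointer to a rule set and compute the needed form at runtime.

```python
import json

# Option 1 (layout invented for illustration; not Wikidata's lexeme JSON):
# precompute all forms externally, have a human check them, then batch-upload.
precomputed = {
    "lemma": "inja",   # 'dog', noun class 9
    "language": "zu",
    "forms": [
        {"representation": "inja",   "features": {"number": "singular", "noun_class": 9}},
        {"representation": "izinja", "features": {"number": "plural",   "noun_class": 10}},
    ],
}
print(json.dumps(precomputed, indent=2, ensure_ascii=False))

# Option 2: store only the base form plus a reference to rules stored elsewhere,
# and let the realiser compute the required form at runtime.
runtime_lexeme = {"lemma": "inja", "noun_class": 9, "rules": "zu-noun-morphology-v1"}
```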

People and society

The third aspect, people, consists of two components: NCB language speakers with their schedules and incentives and, for lack of a better term, colonial peculiarities or sexism, or both. Gender bias issues in content and culture on Wikipedia are amply investigated and documented. Within the context of Abstract Wikipedia, providing examples that are too detailed is awkward to do publicly, and anyhow the plural of anecdote is not data. What I experienced were mostly instances of recurring situations. Therefore, let me generalise some of it and formulate it partially as a reply and way forward, in arbitrary order.

First, the “I don’t know any isiZulu, but…” phrases: factless opinions about the language shouldn’t be deemed more valuable, worthy, and valid just because one works with a well-resourced language in another language family and is pushier or more abrasive. The research we carried out over the past many years really happened and was published in reputable venues. It may be tempting to (over)generalise to other languages once one speaks several languages, but it’s better to be safe than sorry.

Second, let me remind you that Wikis are intended to be edited by the community – and that includes me. I might just continue expressing my indignation at the repeated condescending comments implying that I shouldn’t be allowed to do so because some European white guy’s edits are unquestionably, naturally superior. As it turned out, it was questionable attitudes from certain people within the broader Abstract Wikipedia team, not the nice white guy who had been merely exploring adding a certain piece of non-trivial information. I went ahead and edited it eventually anyway, but it does make me wonder how often people from outside the typical Wiki contributor demographic are actively discouraged from adding content for made-up reasons.

Third, languages evolve and research does happen. The English of 100 years ago is not the same as the English spoken and written today, and the same holds for most other languages, including low-resourced languages. They’re not frozen in time just because there are fewer computational resources or because they’re too far away for you to see them change. Societies change and the languages change with them. No doubt the missionary did his best documenting a language 50-150 years ago, but just because it’s written in a book and he wrote it doesn’t mean that my or my colleagues’ recently published research, which included evaluations with sets of words or sentences, is less valid just because it’s our work and we’re not missionaries (or whatever other reason one invents for why long-gone missionaries’ work takes precedence over anyone else’s contributions).

Fourth, if an existing framework for Indo-European languages doesn’t work for NCB languages, it doesn’t imply we’re all too stupid to grasp that framework. We may not know it, but it’s more likely that we do and that the framework is too limited for the language (see also above), or that it’s too impractical for the lived reality of working with a low-resourced language. Regarding the latter, a [stop whining and] “become more active and just get yourself more resources” isn’t a helpful response, nor is not announcing open calls for WMF-funded projects.

As to human contributions to any component of Abstract Wikipedia, and to any wiki project more generally: it’s complex and deserves more unpacking – on incentives to contribute, perceptions of Wikipedia, sociolinguistics, the good plans we have that are derailed by things people in affluent countries wouldn’t think of that could interfere, and then there’s Moses and the mountain.

Practical hurdles

Last, there are practical hurdles that an internationally dominant or darling language does not have to put up with. An example is the unbelievable process of getting a language accepted in the WMF ecosystem as deserving to be one. I’m not muttering about being shoved aside for trying to promote an endangered language that doesn’t have an ISO 639-3 three-letter code and has only a mere handful of speakers left; even a language with an ISO 639-1 two-letter code and millions of speakers faces hurdles. Evidence has to be provided. Yes, there are millions of speakers, here’s the census data; yes, there are daily news items on national TV, and look here, the discussions on Facebook; yes, there are online newspapers with daily updates. It takes weeks if not months, if it happens at all. These are exclusionary practices. We should not have to waste limited time on countering the anti-nondominant-language ‘trolling’ – having to put in extra effort to pass an invisible and arbitrary bar – but, as a minimum, have each already ISO-recognised language be granted that status. True enough, this suggestion is not a perfect solution either, but at least it’s much more inclusive. Needless to say, this challenge too is not unique to NCB languages. And yes, various Phabricator tickets with language requests have been open for at least a year and a half.

In closing

The practicalities are just one more thing on top of all the rest that makes a fine idea – Abstract Wikipedia for all – smell of entrenching much more deeply the well-documented biased tendencies of Wikis. I tried, and try, to push back. The issues are complex, both theoretically and technically as well as regarding people and praxis. They hold for NCB languages as well as many others.

Abstract Wikipedia aims to build a multilingual Wikipedia, and the back-end technology that it requires may have been a rather big bite for the Wikimedia Foundation to chew on. If it is serious about inclusivity, it will have to allow the ‘many flowers’ on top of the constructors to generate the text, as well as gradual expansion of the natural language generation features during runtime, an expansion that will be paced differently according to each language’s resources, not unlike how each Wikipedia has its own pace of growth. From the one-step-at-a-time perspective, even basic sentences in a short paragraph for a Wikipedia article are an infinite improvement over no article at all. It also invites contributions more than creating a new article from scratch does. The bar for making Abstract Wikipedia successful does not necessarily need to be, say, ‘to surpass English articles’.

The mountain we’ll keep climbing, be it with or without the Abstract Wikipedia project. If Abstract Wikipedia is to become a reality and flourish for many languages soon, it needs to allow for molehills, anthills, dykes, dunes, and hills as well, with whatever flowers are available to set it up and make it grow.


ChatGPT, deep learning and the like do not make ontologies (and the rest of AI) obsolete

Countless articles have announced the death of symbolic AI, which includes, among others, ontology engineering, in favour of data-driven AI with deep learning, even more loudly so since large language model-based apps like ChatGPT have captured the public’s attention and imagination. There are those who don’t even realise there is more to AI than deep learning with neural networks. But there is; have a look at the ACM Computing Classification or scroll down to the screenshots at the end of this post if you’re unaware of that. With all the hype and narrow focus, doom and gloom is being predicted, with a new AI winter on the cards. But is it? It’s not like we all ditched mathematics at school when portable calculators became cheap gadgets, so why would we ditch the rest of AI now that there’s machine and deep learning, Large Language Models (LLMs), and an app that attracts attention? Let me touch upon a few examples to illustrate that ontologies have not become obsolete, nor will they.

How exactly do you think data integration is done? Maybe ChatGPT can tell you what’s involved, superficially, but it won’t actually do it for you. Consider, for instance, a paper published earlier this month on finding clusters of long Covid patient symptoms [Reese23], described in a press release: they obtained data on 20,532 relevant patients from 38 (!!) data partners, where the authors mapped the clinical findings taken from the electronic health records “to computable terms contained in the Human Phenotype Ontology (HPO), a standard framework for describing human traits … This allowed the researchers to analyze the data across the entire cohort.” (italics are mine). Here’s an illustration of the idea:

Diagram demonstrating how the Human Phenotype Ontology is used for semantic comparisons of electronic health record data to find long covid clusters. (Source: [Reese23] at https://www.thelancet.com/cms/attachment/d7cf87e1-556f-47c0-ae4b-9f5cd8c39b50/gr2.jpg)

Could reliable data integration possibly be done by LLMs? No, not even in the future. NLP with electronic health records is an option, true, but it won’t harmonise terminology for you, nor will it integrate different electronic health record systems.
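For the programmatically inclined, here is a minimal sketch of what such ontology-mediated harmonisation amounts to: free-text findings recorded differently by different data partners are mapped to shared HPO-style identifiers, after which cohort-wide counting and clustering becomes possible. The mapping table, patient records, and identifiers below are made up for illustration; the actual study’s pipeline is, of course, far more involved.

```python
from collections import Counter

# Different data partners record the same clinical finding with different strings;
# mapping them to shared ontology identifiers makes the data comparable.
# (Mappings and identifiers are illustrative only; consult the HPO for real terms.)
TERM_TO_HPO = {
    "tiredness": "HP:0012378",            # Fatigue
    "fatigue": "HP:0012378",
    "shortness of breath": "HP:0002094",  # Dyspnoea
    "dyspnoea": "HP:0002094",
    "persistent cough": "HP:0012735",     # Cough
}

patients = [
    {"id": "p1", "site": "A", "findings": ["tiredness", "persistent cough"]},
    {"id": "p2", "site": "B", "findings": ["fatigue", "dyspnoea"]},
    {"id": "p3", "site": "C", "findings": ["shortness of breath"]},
]

# Once everything is expressed in the same ontology's terms, one can count,
# compare, and cluster across the whole cohort, regardless of the source system.
counts = Counter(TERM_TO_HPO[f] for p in patients for f in p["findings"] if f in TERM_TO_HPO)
print(counts)   # Counter({'HP:0012378': 2, 'HP:0002094': 2, 'HP:0012735': 1})
```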

LLMs aren’t good at playing with data in the myriad of ways in which ontologies are used to power ‘intelligent’ applications. Take data that’s generated in the automation of scientific experiments, for instance: cell types in the brain need to be annotated and processed to try to find new cell types, and annotations with those new types are then added and used downstream in queries and further analysis [Tan23]. There is no new stuff in off-the-shelf LLMs, so they can’t help; ontologies can – and do. Ontologies are used and extended as needed to document the new ground truth, which won’t ever be replaced by LLMs, nor by the approximations that machine learning’s outputs are.

What about intelligent analysis of real-time data? Those LLMs won’t be of assistance there either. Take, e.g., energy-optimised building systems control: the system takes real-time data that is linked to an ontology and then it can automatically derive energy conservation measures for the building and its use [Pruvost22].

Much has been written on ChatGPT and education. It’s an application domain that permits no mistakes on the teaching side of it and, in fact, demands vetted quality. There are many tasks, from content presentation to assessment. ChatGPT can generate quiz questions, indeed, but only on general knowledge. It can generate a response as well, but whether that will be the correct answer is another matter altogether. We also need other types of educational questions besides MCQs, in many disciplines, on specific texts and textbooks with their particular vocabulary, and we need the answer computed for automated marking. Computing correct questions and answers can be done with ontologies and some basic automated reasoning services [Raboanary22]. One obtains precision with ontologies that cannot be had with probabilistic guessing. Or take the Foundational Model of Anatomy ontology as a concrete example, which is used to manage the topics in anatomy classes augmented with VR [Soergel22]. Ontologies can also be used as a method of teaching, in art history no less, to push students to dig into the details and be precise [Bertens22] – the opposite of the bland, handwavy, roughly-sort-of, non-committal, and fickle responses ChatGPT provides, at times, to open questions.
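As a toy illustration of that point – far simpler than what [Raboanary22] actually does – here’s a sketch of computing a question together with a verifiable answer from a couple of subclass assertions; the mini ‘ontology’ and function names are invented for the example.

```python
# Toy sketch: generate a true/false question whose answer is computed from
# the ontology's class hierarchy rather than guessed. The tiny 'ontology'
# below is invented for illustration.
subclass_of = {
    "Lion": "Mammal",
    "Mammal": "Animal",
    "Crocodile": "Reptile",
    "Reptile": "Animal",
}

def superclasses(cls: str) -> set[str]:
    """All (transitive) superclasses of a class: a very basic reasoning service."""
    result = set()
    while cls in subclass_of:
        cls = subclass_of[cls]
        result.add(cls)
    return result

def an(word: str) -> str:
    """A small grammar rule of its own: pick the right indefinite article."""
    return ("an " if word[0].lower() in "aeiou" else "a ") + word.lower()

def true_false_question(sub: str, sup: str) -> tuple[str, bool]:
    question = f"True or false: every {sub.lower()} is {an(sup)}."
    return question, sup in superclasses(sub)

print(true_false_question("Lion", "Animal"))    # (..., True)
print(true_false_question("Lion", "Reptile"))   # (..., False)
```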

These are just a few application examples that I lazily came across in the timespan of a mere 15 minutes (including selecting them) – one via the LinkedIn timeline, a GS search on “ontologies” limited to “since 2022” (17,300 results this morning) and clicking a few links that sounded appealing, and one I’m involved in.

This post is not a cry of desperation before sinking, but, rather, mainly one of annoyance. Technology blinkers of any kind are no good, and one had better have more than just a hammer in one’s toolbox. Not everything can be solved by LLMs and deep learning, and Knowledge Representation (& Reasoning) is not dead. It may have been elbowed to the side by the new kids on the block. I suspect that those in the ‘symbolic AI is obsolete’ camp simply aren’t aware – or would like to pretend not to be aware – of the many different AI-driven computing tasks that need to be solved and implemented. Tasks for which there are no humongous amounts of text or non-text data to grab and learn from. Tasks that are not tolerant to outputs that are noisy or plain wrong. Tasks that require current data, not stale stuff from a year ago or longer. Tasks where past data are not a good predictor for the future. Tasks in specialised domains. Tasks that are quirky to a locale. And so on. The NLP community has already recognised that LLMs’ outputs need fixing, which I was pleasantly surprised by when I attended EMNLP’22 in December (see my EMNLP’22 trip report for a few pointers).

Also, casting the net a little wider, our academic year is about to start, when students need to choose projects and courses, including, among others, another installment of ontology engineering, logic for AI, Computer Vision, and so on. Perhaps this might assist in choosing, and in reflecting that computing as a whole is not going to be obsolete either. ChatGPT and Copilot can probably pass our 1st-year practical assignments, but there’s so much more computing beyond that, which relies on students understanding the foundations and problem-solving methods. Why should the whole rest of AI, and even computing as a discipline, become obsolete the instant a tool can, at best, regurgitate the known coding solutions to common basic tasks? There are still mathematicians notwithstanding all the devices more powerful than a pocket calculator, and there are still linguists regardless of the free availability of Google Translate’s services; so why would software engineers not remain when there’s a code-completion tool for basic tasks?

Perhaps you still do not care about ontologies and knowledge representation & reasoning. That’s fine; everyone has their interests – just don’t mistake new interests for the obsolescence of established topics. In case you do want to know more about ontologies and ontology engineering: you may like to have a look at my award-winning open textbook, with exercises, tools, and slides.

p.s.: here are those screenshots on the ACM classification and AI, annotated:

References

[Bertens22] Bertens, L. M. F. Modeling the art historical canon. Arts and Humanities in Higher Education, 2022, 21(3), 240-262.

[Pruvost22] Pruvost, Hervé and Olaf Enge-Rosenblatt. Using Ontologies for Knowledge-Based Monitoring of Building Energy Systems. Computing in Civil Engineering 2021. American Society of Civil Engineers, 2022, pp762-770.

[Raboanary22] Raboanary, T., Wang, S., Keet, C.M. Generating Answerable Questions from Ontologies for Educational Exercises. 15th Metadata and Semantics Research Conference (MTSR’21). Garoufallou, E., Ovalle-Perandones, M-A., Vlachidis, A. (Eds.). 2022, Springer CCIS vol. 1537, 28-40.

[Reese23] Reese, J. et al. Generalisable long COVID subtypes: findings from the NIH N3C and RECOVER programmes. eBioMedicine, Volume 87, 104413, January 2023.

[Soergel22] Soergel, Dagobert, Olivia Helfer, Steven Lewis, Matthew Wysocki, David Mawer. Using Virtual Reality & Ontologies to Teach System Structure & Function: The Case of Introduction to Anatomy. 12th International Conference on the Future of Education 2022, 1 July 2022.

[Tan23] Tan, S.Z.K., Kir, H., Aevermann, B.D. et al. Brain Data Standards – A method for building data-driven cell-type ontologies. Scientific Data, 2023, 10, 50.

EMNLP’22 trip report: neuro-symbolic approaches in NLP are on the rise

The trip to the Empirical Methods in Natural Language Processing 2022 conference is certainly one I’ll remember. The conference had well over 1000 in-person attendees for what they could catch of the 6 tutorials and 24 workshops on Wednesday and Thursday, and then the 175 oral presentations, 654 posters, 3 keynotes and a panel session, and 10 Birds of a Feather sessions on Friday-Sunday, which was topped off with a welcome reception and a social dinner. The open-air dinner was on the one day in the year that it rains in the desert! More precisely on the venue: that was the ADNEC conference centre in Abu Dhabi, from 7 to 11 December.

With so many parallel sessions, it was not always easy to choose. Although I expected many presentations about just large language models (LLMs), which I’m not particularly interested in from a research perspective, it turned out to be very well possible to find a straight road through the parallel NLP sessions with research that had at least added an information-based or a knowledge-based approach to do NLP better. Ha! NLP needs structured data, information, and knowledge to mitigate the problems with hallucinations in natural language generation – elsewhere called “fluent bullshit” – that those LLMs suffer from, among other tasks. Adding a symbolic approach into the mix turned out to be a recurring theme at the conference. Some authors tried to hide a rule-based approach or were apologetic about it, so the topic is not that ‘hot’ just yet, but we’ll get there. In any case, it worked much better for my one-liner intro to state that I’m into ontologies and have been branching out to NLG than to say I’m into NLG for African languages. Most people I met had heard of ontologies or knowledge graphs, whereas African languages mostly drew a blank expression.

It was hard to choose what to attend, especially on the first day, but eventually I participated in part of the second workshop on Natural Language Generation, Evaluation, and Metrics (GEM’22), NLP for Positive Impact (NLP4PI’22), and Data Science with Human-in-the-Loop (DaSH’22), and walked into a few more poster sessions of other workshops. The main conference had 8 sessions in parallel in each timeslot; I chose the semantics one, ethics, NLG, commonsense reasoning, speech and robotics grounding, and the Birds of a Feather sessions on ethics and on code-switching. I’ve structured this post by topic rather than by type of session or actual session, however, in the following order: NLP with structured stuff, ethics, a basket with other presentations that were interesting, NLP for African languages, the two BoF sessions, and a few closing remarks. I did at least skim over the papers associated with the presentations referenced here, and so any errors in discussing the works are still mine. Logistically, the links to the papers in this post are a bit iffy: about 900 EMNLP + workshop papers were already on arXiv according to the organisers, and the 828 papers of the main conference are being ingested into the ACL Anthology, so their permanent URLs are not functional yet; hence my linking practice was inconsistent and may suffer link rot. Be that as it may, let’s get to the science.

The entrance of the conference venue, ADNEC in Abu Dhabi, at the end of the first workshop and tutorials day.

NLP with at least some structured data, information, or knowledge and/or reasoning

I’ve tried to structure this section, roughly going from little addition of structured stuff to more, and then from less to more inferencing.

The first poster session on the first day that I attended was the one of the NLP4PI workshop; it was supposed to last 1 hour, but after 2.5h it was still well-attended. I also passed by the adjacent machine translation session (WMT’22), which also paid off. There were several posters there that were of interest to my inclination toward knowledge engineering. Abhinav Lalwani presented a Findings paper on Logical Fallacy Detection in the NLP4PI’22 poster session, which was interesting both for the computer ethics that I have to teach and for their method: create a dataset of 2449 fallacies of 13 types that were taken from online educational resources, machine-learn templates from those sentences – that they call generating a “structure-aware model” – and then use those templates to find new ones in the wild, which in this case meant climate change claims [1]. Their dataset and code are available on GitHub. The one presented by Lifeng Han from the University of Manchester was part of WMT’22: their aim was to see whether a generic LLM would do better or worse than smaller in-domain language models enhanced with clinical terms extracted from biomedical literature and electronic health records and from class names of (unspecified in the paper) ontologies. The smaller models win, and terms or concepts may win depending on the metric used [2].

For the main conference, and unsurprisingly for a session called “semantics”, it wasn’t just about LLMs. The first paper was about Structured Knowledge Grounding, of which the tl;dr is that SQL tables and queries improve on the ‘state of the art’ of just GPT-3 [3]. The Reasoning Like Program Executors paper aims to fix nonsensical numerical output of LLMs by injecting small programs/code for sound numerical reasoning – among the reasoning types that LLMs are incapable of – and is successful at doing so [4]. And there’s a paper on using WordNet for sense retrieval in the context of word-in/vs-context use, and on discovering that the human evaluators were less biased than the language model [5].

The commonsense reasoning session also – inevitably, I might add – had papers that combined techniques. The first paper of the session looked into the effects of injecting external knowledge (Comet) to enhance question answering, which is generally positive, and more positive for smaller models [6]. I also have in my notes that they developed an ontology of knowledge types, and the paper text claims so too, but it is missing from the paper, unless they are referring to the 5 terms in its Table 6.

I also remember seeing a poster on using Abstract Meaning Representation. Yes, indeed, and there turned out to be a place for it: for text style transfer to convert a piece of text from one style into another. The text-to-AMR + AMR-to-text model T-STAR beat the state of the art with a 15% increase in content preservation without substantive loss of accuracy (3%) [7].

Moving on to rules and more or less reasoning: first, at the NLP4PI’22 poster session, there was a poster on “Towards Countering Essentialism through Social Bias Reasoning”, which was presented by Maarten Sap. They took a very interdisciplinary approach, mixing logic, psychology, and cognitive science to get the job done, and the whole system was entirely rules-based. The motivation was to find a way to assist content moderators by generating possible replies to counter prejudiced statements in online comments. They generated five types of replies and asked users which one they preferred. Types of sample generated replies include, among others, computing exceptions to the prejudice (e.g., an individual in the group who does not have that trait), attributing the trait also to other groups, and a generic statement on tolerance. Bland seemed to work best. I tried to find the paper for details, but was unsuccessful.

The DaSH’22 presentation about WaNLI concerned the creation of a dataset and pipeline to have crowdsourcing workers and AI “collaborate” in dataset creation, which had a few rules sprinkled into the mix [8]. It turns out that humans are better at revising and evaluating than at creating sentences from scratch, so the pipeline takes that into account. First, from a base set, it uses NLG to generate complement sentences, which are filtered and then reviewed and possibly revised by humans. Complement sentence generation (the AI part) involves taking sentences like “5% chance that the object will be defect free” + “95% that the object will have defects” to then generate (with GPT-3, in this case) the candidate sentence pairs “1% of the seats were vacant” + “99% of the seats were occupied”, using encoded versions of the principles of entailment and set complement, among the reasoning cases used.

Turning up the reasoning a notch, Sean Welleck of the University of Washington gave the keynote at GEM’22. His talk consisted of two parts: on unlearning bad behaviour of LLMs and then on an early attempt at a neuro-symbolic approach. The latter concerned connecting an LLM’s output to some logic-based reasoning. He chose Isabelle, of all reasoners, as a way to get it to check and verify the hallucinations (the nonsense) the LLMs spit out. I asked him why he chose a reasoner for an undecidable language, but the response was not a direct answer. It seemed that he liked the proof trace but was unaware of the undecidability issues. Maybe there’s a future for description logic reasoners here. Elsewhere, and hidden behind a paper title that mentions language models, lies the reality of ConCoRD relation detection for “boosting consistency of pre-trained language models” with a MAX-SAT solver in the toolbox [9].

Impression of the NLP4PI’22 poster session 2.5h into the 1h session timeslot.

There are (many?) more relevant presentations that I did not get around to attending, such as one on dynamic hierarchical reasoning that uses both an LM and a knowledge graph for their scope of question answering [10], a unified representation for graph query languages, GraphQ IR [11], one showing that RoBERTa, T5, and GPT-3 have problems especially with deductive reasoning involving negation [12], and PLOG table-to-logic to enhance table-to-text. Open the conference program handbook and search on things like “commonsense reasoning” or NLI – where the I isn’t an abbreviation of Interface but rather of Inference – and there’s even neural-symbolic inference for graph parsing. The compound term “knowledge graph” has 84 mentions and “reasoning” has 244 mentions. There are also four papers with “grammar induction”, two papers with CFGs, and one with a construction grammar.

It was a pleasant surprise not to be entirely swamped by the “stats/NN + automated metric” formula. I fancy thinking it’s an indication that the frontiers of NLP research have already grown out of that and are adding knowledge into the mix.

Ethics and computational social science

Of course, the previously-mentioned topic of trying to fix hallucinations and issues with reasoning and logical coherence of what the language models spit out implies researchers know there’s a problem that needs to be addressed. That is a general issue. Specific ones are unique in their own way; I’ll mention three. Inna Lin presented work on gendered mental health stigma and potential downstream issues with health chatbots that would rely on such language models [13]. For instance, women were more likely to be recommended to seek professional help and men to toughen up and get on with it. The GeoMLAMA dataset showed that not everything is as bad as one might suspect. The dataset was created to explore multilingual Pre-Trained Language Models on cultural commonsense knowledge, like what colour the bride’s dress typically is. The authors selected English, Chinese, Hindi, Persian, and Swahili. Evaluation showed that multilingual PLMs are not biased toward the USA, that the native language of a country may not be the best language to probe its knowledge (as the commonsense isn’t explicitly stated), and that a language may better probe knowledge about a non-native country than its native country [14]. The third paper is more about working on a mechanism to help NLP ethics: modelling information change in science communication. The scientist or the press release says one thing, which gets altered slightly in a popular science article, and then morphs into tweets and toots with yet another, different, message. More distortion occurs in the step from popsci article to tweet than from scientist to popsci article. What sort of distortion, or ‘not as faithful as one would like’? Notably, “Journalists tend to downplay the certainty and strength of findings from abstracts” and “limitations are more likely to be exaggerated and overstated” [15].

In contrast, Fatemehsadat Mireshghallah showed some general ethical issues with the very LLMs in her lively presentation. They are so large and have so many parameters that what they end up doing is more akin to memorising text and outputting that memorised text, rather than outputting de novo generated text [16]. She focussed on potential privacy issues, where such models may output sensitive personal data. It also applies to copyright infringement issues: if they return a chunk of already existing text, say, a paragraph from this blog, it would be copyright infringement, since I hold the copyright on it by default and I made it CC-BY-NC-SA, which those large LLMs do not adhere to, and they don’t credit me. Copilot is already facing a class action lawsuit for unfairly reusing open source code without having obtained permission. In both cases, there’s the question, or task, of removing pieces of text and retraining the model, or not, as well as how to know whether your text was used to create the model. I recall seeing something about that in the presentations and we had some lively discussions about it as well, leaning toward a remove & re-train and suspecting that’s not what’s happening now (except at IBM, apparently).

Last, but not least, on this theme: the keynote by Gary Marcus turned out to be a pre-recorded one. It was mostly a popsci talk (see also his recent writings here, among others) on the dangers of those large language models, with plenty of examples of problems with them that have been posted widely recently.

Noteworthy “other” topics

The ‘other’ category in ontologies may be dubious, but here it is not meant as such – I just didn’t have enough material or time to write more about them in this post, but they deserved a mention nonetheless.

The opening keynote of the EMNLP’22 conference by Neil Cohn was great. His main research is in visual languages, and those of comic books in particular. He raised some difficult-to-answer questions and topics. For instance, is language multimodal – vocal, body, graphic – and are gestures separate from, alongside, or part of language? Or take the idea of abstract cognitive principles as a basis for both visual and textual language, the hypothesis of “true universals” that should span across modalities, and the idea of “conceptual permeability”, on whether the framing in one modality of communication affects the others. He also talked about cross-cultural diversity in those structures of visual languages, of comic books at least. It almost deserves to be in the “NLP + symbolic” section above, for the grammar he showed and the attempt to add theory into the mix, rather than just more LLMs and automated evaluation scores.

The other DaSH paper that I enjoyed, after the aforementioned WaNLI, was the Cheater’s Bowl, where the authors tried to figure out how humans cheat in online quizzes [17]. Compared to automated open-domain question-answering, humans use fewer keywords more effectively, use more world knowledge to narrow searches, use dynamic refinement and abandonment of search chains, have multiple search chains, and do answer validation. Also during the workshop days, I somehow walked into a poster session of the BlackboxNLP’22 workshop on analysing and interpreting neural networks for NLP. Priyanka Sukumaran enthusiastically talked about her research on how LSTMs handle (grammatical) gender [18]. They wanted to know whereabouts in the LSTM a certain grammatical feature is dealt with; and they found out, at least for gender agreement in French. The ‘knowledge’ is encoded in just a few nodes, and the model does better on longer than on shorter sentences, since then it can use more other cues in the sentence, including gendered articles, to figure out the M/F needed for constructions like noun-adjective agreement. That is much like the way humans do it, but then, algorithms do not need to copy human cognitive processes.

NLP4PI’s keynote was given by Preslav Nakov, who recently moved to the Mohamed Bin Zayed University of AI. He gave an interesting talk about fake news and mis- and dis-information detection, and also differentiated it from propaganda detection, which, in turn, consists of emotion and logical fallacy detection. If I remember correctly, not with knowledge-based approaches either, but interesting nonetheless.

I had more papers marked for follow up, including on text generation evaluation [19], but this post is starting to become very long as it is already.

Papers with African languages, and Niger-Congo B (‘Bantu’) languages in particular

Last, but not least, something on African languages. There were a few papers. Some had it clearly in the title, others not at all, but they used at least one African language in their dataset. The list here is thus incomplete and merely reflects what I came across.

On the first day, as part of NLP4PI, there was also a poster on participatory translations of Oshiwambo, a language spoken in Namibia, which was presented by Jenalea Rajab from Wits and Millicent Ochieng from Microsoft Kenya, both with the Masakhane initiative; the associated paper seems to have been presented at the ICLR 2022 Workshop on AfricaNLP. Also within the Masakhane project is the progress on named entity recognition [20]. My UCT colleague Jan Buys also had papers with poster presentations, together with two of his students, Khalid Elmadani and Francois Meyer. One was part of WMT’22, on multilingual machine translation for African languages [21], and another was on sub-word segmentation for Nguni languages (EMNLP Findings) [22]. The authors of AfroLID report some 96% accuracy on the identification of a whopping 517 African languages, which sounds very impressive [23].

Birds of a Feather sessions

The BoF sessions seemed to be loosely organised discussions and exchanges of ideas about a specific topic. I tried out the Ethics and NLP one, organised by Fatemehsadat Mireshghallah, Luciana Benotti, and Patrick Blackburn, and the code-switching & multilinguality one, organised by Genta Winata, Marina Zhukova, and Sudipta Kar. Both sessions were very lively and constructive, and I can recommend going to at least one of them the next time you attend EMNLP, or organising something like that at a conference. The former had specific questions for discussion, such as on the reviewing process and on that required ethics paragraph; the latter had themes, including datasets and models for code-switching and metrics for evaluation. For ethics, there seems to be a direction to head toward, whereas NLP for code-switching seems to be still very much in its infancy.

Final remarks

As if all that wasn’t keeping me busy already, there were lots of interesting conversations, meeting people I hadn’t seen in many years – including Barbara Plank, who finished her undergraduate studies at FUB when I was a PhD student there (focussing on ontologies rather, which I still do), and likewise Luciana Benotti (who had started her European Masters at that time, also at FUB) – people with whom I had emailed before but not met due to the pandemic, and new introductions. There was a reception and an open-air social dinner; an evening off meeting an old flatmate from my first degree and a soccer watch party seeing Argentina win; and half a day off after the conference to bridge the wait for the bus to leave, which I used to visit the mosque (it doubles as a worthwhile tourist attraction), chat with other attendees hanging around for their evening travels, and start writing this post.

Will I go to another EMNLP? Perhaps. Attendance was most definitely very useful, I do have some relevant research outputs, and there’s cookie dough and buns in the oven, but I’d first need a few new bucketloads of funding to be able to pay for the very high registration cost that comes on top of the ever-increasing travel expenses. EMNLP’23 will be in Singapore.

References

[1] Zhijing Jin, Abhinav Lalwani, Tejas Vaidhya, Xiaoyu Shen, Yiwen Ding, Zhiheng Lyu, Mrinmaya Sachan, Rada Mihalcea, Bernhard Schölkopf. Logical Fallacy Detection. EMNLP’22 Findings.

[2] L. Han, G. Erofeev, I. Sorokina, S. Gladkoff, G. Nenadic. Examining Large Pre-Trained Language Models for Machine Translation: What You Don’t Know About It. 7th Conference on Machine Translation (WMT’22) at EMNLP’22.

[3] Tianbao Xie, Chen Henry Wu, Peng Shi, Ruiqi Zhong, Torsten Scholak, Michihiro Yasunaga, Chien-Sheng Wu, Ming Zhong, Pengcheng Yin, Sida I. Wang, Victor Zhong, Bailin Wang, Chengzu Li, Connor Boyle, Ansong Ni, Ziyu Yao, Dragomir Radev, Caiming Xiong, Lingpeng Kong, Rui Zhang, Noah A. Smith, Luke Zettlemoyer and Tao Yu. UnifiedSKG: Unifying and Multi-Tasking Structured Knowledge Grounding with Text-to-Text Language Models. EMNLP’22.

[4] Xinyu Pi, Qian Liu, Bei Chen, Morteza Ziyadi, Zeqi Lin, Qiang Fu, Yan Gao, Jian-Guang LOU and Weizhu Chen. Reasoning Like Program Executors. EMNLP’22

[5] Qianchu Liu, Diana McCarthy and Anna Korhonen. Measuring Context-Word Biases in Lexical Semantic Datasets. EMNLP’22

[6] Yash Kumar Lal, Niket Tandon, Tanvi Aggarwal, Horace Liu, Nathanael Chambers, Raymond Mooney and Niranjan Balasubramanian. Using Commonsense Knowledge to Answer Why-Questions. EMNLP’22

[7] Anubhav Jangra, Preksha Nema and Aravindan Raghuveer. T-STAR: Truthful Style Transfer using AMR Graph as Intermediate Representation. EMNLP’22

[8] A. Liu, S. Swayamdipta, N.A. Smith, Y. Choi. WaNLI: Worker and AI Collaboration for Natural Language Inference Dataset Creation. DaSH’22 at EMNLP 2022.

[9] Eric Mitchell, Joseph Noh, Siyan Li, Will Armstrong, Ananth Agarwal, Patrick Liu, Chelsea Finn and Christopher Manning. Enhancing Self-Consistency and Performance of Pre-Trained Language Models through Natural Language Inference. EMNLP’22

[10] Miao Zhang, Rufeng Dai, Ming Dong and Tingting He. DRLK: Dynamic Hierarchical Reasoning with Language Model and Knowledge Graph for Question Answering. EMNLP’22

[11] Lunyiu Nie, Shulin Cao, Jiaxin Shi, Jiuding Sun, Qi Tian, Lei Hou, Juanzi Li, Jidong Zhai. GraphQ IR: Unifying the Semantic Parsing of Graph Query Languages with One Intermediate Representation. EMNLP’22

[12] Soumya Sanyal, Zeyi Liao and Xiang Ren. RobustLR: A Diagnostic Benchmark for Evaluating Logical Robustness of Deductive Reasoners. EMNLP’22

[13] Inna Lin, Lucille Njoo, Anjalie Field, Ashish Sharma, Katharina Reinecke, Tim Althoff and Yulia Tsvetkov. Gendered Mental Health Stigma in Masked Language Models. EMNLP’22

[14] Da Yin, Hritik Bansal, Masoud Monajatipoor, Liunian Harold Li, Kai-Wei Chang. Geo-Diverse Commonsense Probing on Multilingual Pre-Trained Language Models. EMNLP’22

[15] Dustin Wright, Jiaxin Pei, David Jurgens, Isabelle Augenstein. Modeling Information Change in Science Communication with Semantically Matched Paraphrases. EMNLP’22

[16] Fatemehsadat Mireshghallah, Archit Uniyal, Tianhao Wang, David Evans and Taylor Berg-Kirkpatrick. An Empirical Analysis of Memorization in Fine-tuned Autoregressive Language Models. EMNLP’22

[17] Cheater’s Bowl: Human vs. Computer Search Strategies for Open-Domain QA. DaSH’22 at EMNLP2022.

[18] Priyanka Sukumaran, Conor Houghton,Nina Kazanina. Do LSTMs See Gender? Probing the Ability of LSTMs to Learn Abstract Syntactic Rules. BlackboxNLP’22 at EMNLP2022. 7-11 Dec 2022, Abu Dhabi, UAE. arXiv:2211.00153

[19] Ming Zhong, Yang Liu, Da Yin, Yuning Mao, Yizhu Jiao, Pengfei Liu, Chenguang Zhu, Heng Ji and Jiawei Han. Towards a Unified Multi-Dimensional Evaluator for Text Generation. EMNLP’22

[20] David Adelani, Graham Neubig, Sebastian Ruder, Shruti Rijhwani, Michael Beukman, Chester Palen-Michel, Constantine Lignos, Jesujoba Alabi, Shamsuddeen Muhammad, Peter Nabende, Cheikh M. Bamba Dione, Andiswa Bukula, Rooweither Mabuya, Bonaventure F. P. Dossou, Blessing Sibanda, Happy Buzaaba, Jonathan Mukiibi, Godson KALIPE, Derguene Mbaye, Amelia Taylor, Fatoumata Kabore, Chris Chinenye Emezue, Anuoluwapo Aremu, Perez Ogayo, Catherine Gitau, Edwin Munkoh-Buabeng, victoire Memdjokam Koagne, Allahsera Auguste Tapo, Tebogo Macucwa, Vukosi Marivate, MBONING TCHIAZE Elvis, Tajuddeen Gwadabe, Tosin Adewumi, Orevaoghene Ahia and Joyce Nakatumba-Nabende. MasakhaNER 2.0: Africa-centric Transfer Learning for Named Entity Recognition. EMNLP’22

[21] Khalid Elmadani, Francois Meyer and Jan Buys. University of Cape Town’s WMT22 System: Multilingual Machine Translation for Southern African Languages. WMT’22 at EMNLP’22.

[22] Francois Meyer and Jan Buys. Subword Segmental Language Modelling for Nguni Languages. Findings of EMNLP, 7-11 December 2022, Abu Dhabi, United Arab Emirates.

[23] Ife Adebara, AbdelRahim Elmadany, Muhammad Abdul-Mageed and Alcides Inciarte. AfroLID: A Neural Language Identification Tool for African Languages. EMNLP’22

“Grammar infused” templates for NLG

It’s hardly ever entirely one extreme or the other in natural language generation and controlled natural languages. Rarely can one get away with simplistic ‘just fill in the blanks’ templates that do not do any grammar or phonological processing to make the output better; our technical report about work done some 17 years ago was a case in point on the limitations thereof, if one still needs to be convinced [1]. But where does NLG start? I agree with Ehud Reiter that it isn’t about template versus NLG, but a case of levels of sophistication: the fill-in-the-blank templates definitely don’t count as NLG and full-fledged grammar-only systems definitely do, with anything in between a grey area. Adding word-level grammatical functions to templates makes them lean toward NLG, or even indeed be NLG if there are relatively many such rules, and dynamically creating nicely readable sentences with aggregation and connectives counts as NLG for sure, too.

With that in mind, we struggled with how to name the beasts we had created for generating sentences in isiZulu [2], a Niger-Congo B language: nearly every resultant word in the generated sentences required a number of grammar rules to make it render sufficiently well (i.e., at least grammatically acceptable and understandable). Since we didn’t have a proper grammar engine yet, but we knew they could never be fill-in-the-blank templates either, we dubbed them verbalisation patterns. Most systems (by number of systems) use either only templates or templates+grammar, so our implemented system [3] was in good company. It may sound like oldskool technology, but go ask Meta with their Galactica whether an ML/DL-based approach is great for generating sensible text that doesn’t hallucinate… and does it well for languages other than English.

That said, honestly, those first attempts we did for isiZulu were not ideal for reusability and maintainability – that was not the focus – and it opened up another can of worms: how do you link templates to (partial) grammar rules? With the ‘partial’ motivated by taking it one step at a time in grammar engine development, as a sort of agile engine development process that is relevant especially for languages that are not well-resourced.

We looked into this recently. There turn out to be three key mechanisms for linking templates to computational grammar rules: embedding (E), where grammar rules are mixed with the template specifications and therewith co-dependent, and compulsory (C) and partial (P) attachment, where the grammar rules have, or can have, an independent existence.

Attachment of grammar rules (that can be separated) vs embedding of grammar rules in a system (intertwined with templates) (Source: [6])

The difference between the latter two is subtle but important for use and reuse of grammar rules in the software system and for the NLG-ness of it: if each template must use at least one rule from the set of grammar rules and each rule is used somewhere, then the set of rules is compulsorily attached. Conversely, it is partially attached if there are templates in that system that don’t have any grammar rules attached. Whether it is partial because attachment isn’t needed (e.g., the natural language’s grammar is pretty basic) or because the system is on the fill-in-the-blank not-NLG end of the spectrum is a separate question, but for sure the compulsory one is more on the NLG side of things. Also, a system may use more than one of them in different places; e.g., EC, both embedding and compulsory attachment. This was introduced in [4] in 2019 and expanded upon in a journal article entitled Formalisation and classification of grammar and template-mediated techniques to model and ontology verbalisation [5] that was published in IJMSO, and even more detail can be found in Zola Mahlaza’s recently completed PhD thesis [6]. These papers have various examples, illustrations of how to categorise a system, and explanations of why one system was categorised in one way and not another. Here’s a table with several systems that combine templates and computational grammar rules and how they are categorised:

Source: [5]

We needed a short-hand name to refer to the cumbersome and wordy description of ‘combining templates with grammar rules in a [theoretical or implemented] system in some way’, which ended up being grammar-infused templates.
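To make the embedding-versus-attachment distinction a little more tangible in code: in the embedded variant below, the grammar rule is written inside the template itself, whereas in the attached variant the template only names a rule that lives in an independently maintained rule set and can be reused by other templates. This is a hypothetical, greatly simplified sketch, not taken from any of the systems in the table.

```python
# Embedding (E): the grammar rule is mixed into the template specification,
# so the two cannot exist or evolve independently.
def weather_template_embedded(city: str, temp: int) -> str:
    degree_word = "degree" if temp == 1 else "degrees"   # rule written inside the template
    return f"In {city} it is {temp} {degree_word}."

# Attachment (C/P): the rules live in an independent rule set; the template only
# refers to a rule by name, so the rule can be reused or swapped. If every template
# uses at least one rule and every rule is used, the attachment is compulsory (C);
# if some templates use no rules at all, it is partial (P).
GRAMMAR_RULES = {
    "en-number-agreement": lambda noun, n: noun if n == 1 else noun + "s",
}

def weather_template_attached(city: str, temp: int) -> str:
    degree_word = GRAMMAR_RULES["en-number-agreement"]("degree", temp)
    return f"In {city} it is {temp} {degree_word}."

print(weather_template_embedded("Cape Town", 1))    # In Cape Town it is 1 degree.
print(weather_template_attached("Cape Town", 25))   # In Cape Town it is 25 degrees.
```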

Why write about this now? Besides certain pandemic-induced priorities in 2021, the recently proposed template language for Abstract Wikipedia that I blogged about before may mix compulsory or partial attachment, but ought not to permit the messy embedding of grammar in a template. This may not have been clear in v1 of the proposal, but hopefully it is a little more so in the new version that was put online over the past few days. To make a long story short: besides a few notes at the start of its Section 3, there’s a generic description of an idea for a realisation algorithm. Its details don’t matter if you don’t intend to design a new realiser from scratch, and maybe not either if you want to link it to your existing system. The key take-away from that section is that that’s where the real grammar and phonological conditioning stuff happens, if it’s needed. For example, for the ‘age in years’ sub-template for isiZulu, recall that’s:

Year_zu(years):"{root:Lexeme(L686326)} {concord:RelativeConcord()}{Copula()}{concord_1<nummod:NounPrefix()}-{nummod:Cardinal(years)}"

The template language sets some boundaries for declaring such a template, but it is a realiser that has to interpret ‘keywords’, such as root, concord, and RelativeConcord, and do something with them so that the output ends up correct; in this case, from ‘year’ + ‘25’ as input data to iminyaka engama-25 as outputted text. That process might be done in line with Ariel Gutman’s realiser pipeline for Abstract Wikipedia and his proof-of-concept implementation with Scribunto, or with any other realiser architecture or system, such as Grammatical Framework, SimpleNLG, NinaiUdiron, or Zola’s Nguni Grammar Engine, among several options for multilingual text generation. It might sound silly to put templates on top of the heavy machinery of a grammar engine, but it will make it more accessible to the general public so that they can specify how sentences should be generated. And, hopefully, permit a rules-as-you-go approach as well.

It is then the realiser (including grammar) engine, with the partially or compulsorily attached computational grammar rules and other algorithms, that works with the template. For the example: when it sees root and that the lemma fetched is a noun (L686326 is unyaka ‘year’), it also fetches the value of the noun class (a grammatical feature stored with the noun), which we always need somewhere for isiZulu NLG. It then needs to figure out how to make a plural out of ‘year’, which it knows it must do thanks to the value fetched for years (i.e., 25, which is plural) and the nummod that links to the root by virtue of the design and the assumption that there’s a (dependency) grammar. Then, with concord:RelativeConcord, it will fetch the relative concord for that noun class, since concord also links to root. We have been able to do the concordial agreements and pluralising of nouns (and much more!) for isiZulu for several years already. The only hurdle is that that code would need to become interoperable with the template language specification, in that our realisers will have to be able to recognise and process those ‘keywords’ properly. Those words are part of an extensible set of words inspired by dependency grammars.
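As a rough illustration of that keyword recognition, here is a toy realiser fragment in Python that fills the slots of the Year_zu sub-template from a small registry of functions. The lookups are hard-coded for this one example (the plural iminyaka, the relative concord e- of class 4, the copula ng-, and the prefix ama- of class 6) and merely stand in for what a proper grammar engine would compute:

import re

CTX = {
    "lexemes": {"L686326": "iminyaka"},   # 'years', already pluralised, noun class 4
    "relative_concord": "e",              # relative concord for class 4
    "copula": "ng",
    "noun_prefix": "ama",                 # noun prefix of the numeral's class (6)
    "data": {"years": 25},
}

REGISTRY = {                              # hypothetical keyword registry
    "Lexeme":          lambda arg: CTX["lexemes"][arg],
    "RelativeConcord": lambda arg: CTX["relative_concord"],
    "Copula":          lambda arg: CTX["copula"],
    "NounPrefix":      lambda arg: CTX["noun_prefix"],
    "Cardinal":        lambda arg: str(CTX["data"][arg]),
}

def realise(template: str) -> str:
    def fill(m: re.Match) -> str:
        slot = m.group(1)                  # e.g. 'concord:RelativeConcord()'
        _, _, call = slot.rpartition(":")  # drop the dependency-style label
        name, _, rest = call.partition("(")
        return REGISTRY[name](rest.rstrip(")"))
    return re.sub(r"\{([^}]*)\}", fill, template)

year_zu = ("{root:Lexeme(L686326)} {concord:RelativeConcord()}{Copula()}"
           "{concord_1<nummod:NounPrefix()}-{nummod:Cardinal(years)}")
print(realise(year_zu))   # iminyaka engama-25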

How this is supposed to interact smoothly is still to be figured out. Part of that is touched upon in the section about instrumentalising the template language: you could, for instance, specify it as functions in Wikifunctions that are instantly editable, facilitating an add-rules-as-you-go approach. Or it can be done less flexibly, by mapping or transforming it to another template language or to the specification of an external realiser (since it’s the principle of attachment, not embedding, of computational grammar rules).

In closing, whether the term “grammar-infused templates” will stick remains to be seen, but combining templates with grammars in some way for NLG will have a solid future at least for as long as those ML/DL-based large language model systems keep hallucinating and don’t consider languages other than English, including the intended multilingual setting for Abstract Wikipedia.

References

[1] M. Jarrar, C.M. Keet, and P. Dongilli. Multilingual verbalization of ORM conceptual models and axiomatized ontologies. STARLab Technical Report, Vrije Universiteit Brussels, Belgium. February 2006.

[2] Keet, C.M., Khumalo, L. Toward a knowledge-to-text controlled natural language of isiZulu. Language Resources and Evaluation, 2017, 51:131-157. (accepted version free access)

[3] Keet, C.M., Xakaza, M., Khumalo, L. Verbalising OWL ontologies in isiZulu with Python. The Semantic Web: ESWC 2017 Satellite Events, Blomqvist, E et al. (eds.). Springer LNCS vol 10577, 59-64. Portoroz, Slovenia, May 28 – June 2, 2017.

[4] Mahlaza, Z., Keet, C.M. A classification of grammar-infused templates for ontology and model verbalisation. 13th Metadata and Semantics Research Conference (MTSR’19). E. Garoufallou et al. (Eds.). Springer vol. CCIS 1057, 64-76. 28-31 Oct 2019, Rome, Italy.

[5] Mahlaza, Z., Keet, C.M. Formalisation and classification of grammar and template-mediated techniques to model and ontology verbalisation. International Journal of Metadata, Semantics and Ontologies, 2020, 14(3): 249-262.

[6] Mahlaza, Z. Foundations for reusable and maintainable surface realisers for isiXhosa and isiZulu. PhD Thesis, Department of Computer Science, University of Cape Town, South Africa. 2022.

A review of NLG realizers and a new architecture

That last step in the process of generating text from some structured representation of data, information or knowledge is done by things called surface realizers. They take care of the ‘finishing touches’ – syntax, morphology, and orthography – to make good natural language sentences out of an ontology, conceptual data model, or Wikidata data, among many possible sources that can be used for declaring abstract representations. Besides theories, there are also many tools that try to get that working at least to some extent. Which ways, or system architectures, are available for generating the text? Which components do they all, or at least most of them, have? Where are the differences and how do they matter? Will they work for African languages? And if not, then what?

My soon-to-graduate PhD student Zola Mahlaza and I set out to answer these questions, and more, and the outcome is described in the article Surface realization architecture for low-resourced African languages that was recently accepted and is now in print with the ACM Transactions on Asian and Low-Resource Language Information Processing (ACM TALLIP) journal [1].

Zola examined 77 systems, which exhibited some 13 different principal architectures that could be classified into 6 distinct architecture categories. Purely by number of systems, manually coded and rule-based would be the most popular, but there are a few hybrid and data-driven systems as well. A consensus architecture for realisers there is not. And none exhibits most of the software maintainability characteristics, like modularity, reusability, and analysability, that we need for African languages (even more so than for better-resourced languages). ‘African’ is narrowed down further in the paper to the languages in the Niger-Congo B (‘Bantu’) family. One of the tricky things is that there’s a lot going on at the sub-word level with these languages, whereas practically all extant realisers operate at the word level.

Hence, the next step was to create a new surface realizer architecture that is suitable for low-resourced African languages and that is maintainable. Perhaps unsurprisingly, since the paper is in print, this new architecture compares favourably against the required features. The new architecture also has ‘bonus’ features, like being guided by an ontology with a template ontology [2] for verification and interoperability. All its components and the rationale for putting it together this way are described in Section 5 of the article and the maintainability claims are discussed in its Section 6.

Source: [1]

There’s also a brief illustration of how one can redesign a realiser into the proposed architecture. We redesigned the architecture of OWLSIZ for question generation in isiZulu [3] as a use case. The code of that redesign of OWLSIZ is available, i.e., it’s not merely a case of having drawn a different diagram: it was actually proof-of-concept tested that it can be done.

While I obviously know what’s going on in the article, if you’d like many more details than are described there, I suggest you consult Zola as the main author of the article or his (soon to be available online) PhD thesis [4] that devotes roughly a chapter to this topic.

References

[1] Mahlaza, Z., Keet, C.M. Surface realisation architecture for low-resourced African languages. ACM Transactions on Asian and Low-Resource Language Information Processing, (in print). DOI: 10.1145/3567594.

[2] Mahlaza, Z., Keet, C.M. ToCT: A task ontology to manage complex templates. FOIS’21 Ontology Showcase. Sanfilippo, E.M. et al. (Eds.). CEUR-WS vol. 2969. 9p.

[3] Mahlaza, Z., Keet, C.M.: OWLSIZ: An isiZulu CNL for structured knowledge validation. In: Proc. of WebNLG+ 2020. pp. 15–25. ACL, Dublin, Ireland (Virtual).

[4] Mahlaza, Z. Foundations for reusable and maintainable surface realisers for isiXhosa and isiZulu. PhD Thesis, Department of Computer Science, University of Cape Town, South Africa. 2022.

A proposal for a template language for Abstract Wikipedia

Natural language generation applications have been ‘mainstreaming’ behind the scenes for the last couple of years, from automatically generating text for images, to weather forecasts, summarising news articles, digital assistants that mechanically blurt out text based on the structured information they have, and many more. Google, Reuters, BBC, Facebook – they all do it. Wikipedia is working on it as well, principally within the scope of Abstract Wikipedia, to try to build a better multilingual Wikipedia [1] to reach more readers better. They all have some source of structured content – like data fetched from a database or spreadsheet, information from, say, a UML class diagram, or knowledge from some knowledge graph or ontology – and a specification as to what the structure of the sentence should be, typically with some grammar rules to at least prettify it, if not also being essential to generate a grammatically correct sentence [2]. That specification is written in templates that are then filled with content.

For instance, a simple rendering of a template may be “Each [C1] [R1] at least one [C2]” or “[I1] is an instance of [C1]”, where the things within the square brackets are variables standing in for content that will be fetched from the source, like a class, relationship, or individual. Linking these to a knowledge graph about universities, it may generate, e.g., “Each academic teaches at least one course” and “Joanne Soap is an instance of Academic”. To get the computer to do this, just “Each [C1] [R1] at least one [C2]” as a template won’t do: we need to tell it what the components are so that the program can process it to generate that (pseudo-)natural language sentence.
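As a minimal fill-in-the-blank illustration in Python, assuming the variable bindings have already been fetched from that knowledge graph (and ignoring all grammar processing):

template = "Each [C1] [R1] at least one [C2]"
bindings = {"C1": "academic", "R1": "teaches", "C2": "course"}

sentence = template
for var, value in bindings.items():
    sentence = sentence.replace(f"[{var}]", value)

print(sentence)   # Each academic teaches at least one course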

Many years ago, we did this for multiple languages and used XML to specify the templates for the key aspects of the content. The structured input consisted of conceptual data models in ORM in the DOGMA tool, which had that verbalisation component [3]. As an example, the template for verbalising a mandatory constraint was as follows:

<Constraint xsi:type="Mandatory">
 <Text> - [Mandatory] Each</Text>
 <Object index="0"/>
 <Text>must</Text>
 <Role index="0"/>
 <Text>at least one</Text>
 <Object index="1"/>
</Constraint>

Besides demarcating the sentence and indicating the constraint, there’s fixed text within the <Text> … </Text> tags and there’s the variable part: the <Object… declares that the name of the object type has to be fetched and the <Role… declares that the name of the relationship has to be fetched from the model (well, more precisely in this case: the reading label), which were elements declared in an XML Schema. With the same example as before, where Academic is in the object index “0” position and Course in the “1” position (see [3] for details), the software would then generate “ – [Mandatory] Each Academic must teaches at least one Course.”
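Here is a sketch in Python of how such an XML template could be processed; the two lookup dictionaries are hypothetical stand-ins for the DOGMA model queries, and the namespace declaration is added only so that the snippet parses on its own:

import xml.etree.ElementTree as ET

constraint_xml = """
<Constraint xsi:type="Mandatory" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
 <Text> - [Mandatory] Each</Text>
 <Object index="0"/>
 <Text>must</Text>
 <Role index="0"/>
 <Text>at least one</Text>
 <Object index="1"/>
</Constraint>"""

objects = {"0": "Academic", "1": "Course"}   # object type names from the model
roles = {"0": "teaches"}                     # reading label of the role from the model

parts = []
for elem in ET.fromstring(constraint_xml):
    if elem.tag == "Text":
        parts.append(elem.text.strip())
    elif elem.tag == "Object":
        parts.append(objects[elem.get("index")])
    elif elem.tag == "Role":
        parts.append(roles[elem.get("index")])

print(" ".join(parts) + ".")
# - [Mandatory] Each Academic must teaches at least one Course.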

This can be turned up several notches by adding grammatical features to it in order to handle, among others, gender for nouns in German, because they affect the rendering of the ‘each’ and ‘one’ in the sample sentence, not to mention the noun classes of isiZulu and many other languages [4], where even the verb conjugation depends on the noun class of the noun that plays the role of subject in the sentence. Or you could add sentence aggregation to combine two templates into one larger one to generate more flowy text, like a “Joanne Soap is an academic who teaches at least one course”. Or change the application scenario or the machinery for how to deal with the templates. For instance, instead of those variables in the template + code elsewhere that does the content fetching and any linguistic processing, we could put part of that in the template specification. Then there are no variables as such in the template, but functions. The template specification for that same constraint in an ORM diagram might then look like this:

ConstraintIsMandatory {
 “[Mandatory] Each ”
 FetchObjectType(0)
 “ must ”
 MakeInfinitive(FetchRole(0))
 “ at least one ”
 FetchObjectType(1)}
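To show what interpreting such a function-style specification could amount to, here is a minimal Python sketch in which the three functions are naive stand-ins: the fetch functions just read from hard-coded lists instead of querying the ORM model, and MakeInfinitive chops off the ending rather than doing proper morphology:

OBJECT_TYPES = ["Academic", "Course"]   # object types from the ORM diagram
ROLE_READINGS = ["teaches"]             # reading label of the role

def FetchObjectType(i: int) -> str:
    return OBJECT_TYPES[i]

def FetchRole(i: int) -> str:
    return ROLE_READINGS[i]

def MakeInfinitive(verb: str) -> str:
    # crude English-only heuristic: 'teaches' -> 'teach'
    return verb[:-2] if verb.endswith("es") else verb.rstrip("s")

def ConstraintIsMandatory() -> str:
    return ("[Mandatory] Each " + FetchObjectType(0) + " must "
            + MakeInfinitive(FetchRole(0)) + " at least one " + FetchObjectType(1))

print(ConstraintIsMandatory())
# [Mandatory] Each Academic must teach at least one Course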

If you want to go with newer technology than markup languages, you may prefer to specify it in JSON. If you’re excited about functional programming languages and see everything through the lens of functions, you can even turn the whole template specification into a bunch of only functions. Either way: there must be a specification of what those templates are permitted to look like, or: what elements can be used to make a valid specification of a template. This is so that the software will work properly: it should neither spit out garbage nor halt halfway before returning anything. What is permitted in a template language can be specified by means of a model, such as an XML Schema or a DTD, a JSON artefact, or even an ontology [5], a formal definition in some notation of choice, or by defining a grammar (be it a CFG or in BNF notation), and in any case with enough documentation to figure out what’s going on.

How might this look in the context of Abstract Wikipedia? For the natural language generation aspects and its first proposal for the realiser architecture, the structured content to be rendered in a natural language sentence is fetched from Wikidata, as is the lexicographic data, and the functions to do the various computations are to come from/go in Wikifunctions. They’re then combined with the templates in various stages in the realiser pipeline to generate those sentences. But there was still a gap as to what those templates in this context may look like. Ariel Gutman, a Google.org fellow working on Abstract Wikipedia, and I gave it a try, and that proposal for a template language for Abstract Wikipedia is now accessible online for comment, feedback, and, if you happen to speak a grammatically rich language, an option to provide difficult examples so that we can check whether the language is expressive enough.

The proposal is – as any other proposal for a software system – some combination of theoretical foundations, software infrastructure peculiarities, reasoned and arbitrary design decisions, compromises, and time constraints. Here’s a diagram of the key aspects of the syntax, i.e., with the elements, how they relate, and the constraints holding between them, in ORM notation:

An illustrative diagram with the key features of the template language in ORM notation.

There’s also a version in CFG notation, and there are a few examples, each of which shows what the template looks like for verbalising one piece of information (Malala Yousafzai’s age) in Swedish, French, Hebrew, and isiZulu. Swedish is the simplest one, as English or Dutch would be, so let’s begin there, with a Dutch rendering of it:

Persoon_leeftijd_nl(Entity,Age_in_years): “{Person(Entity) is 
  {Age_in_years} jaar.}”

Where the Person(Entity) fetches the name of the person (who is identified by an identifier) and the Age_in_years fetches the age. One may like to complicate matters and add a conditional statement, like having any age under 30 render that last part not just as jaar ‘years’ but as jaar jong ‘years young’, and as jaar oud ‘years old’ otherwise, but where that dividing line lies is a sensitive topic for some and I will let that rest. In any case, in Dutch, there’s no processing of the number itself to be able to render it in the sentence – 25 renders as 25 – but in other languages there is. For instance, in isiZulu. In that case, instead of simply fetching the number, we can put a function in the slot:

Person_AgeYr_zu(Entity,Age_in_years): “{subj:Person(Entity)} 
  {root:subjConcord()}na{Year(Age_in_years).}”

That Year(Age_in_years) is a function that is based on either another function or a sub-template. For instance, it can be defined as follows:

Year_zu(years):"{root:Lexeme(L686326)} 
  {concord:RelativeConcord()}{Copula()}{concord_1<nummod:NounPrefix()}-
  {nummod:Cardinal(years)}"

Where Lexeme(L686326) is the word for ‘year’ in isiZulu, unyaka, and for the rest, it first links the age rendering to the ‘year’ with the RelativeConcord() of that word, which practically fetches e- for the ‘years’ (iminyaka, noun class 4), then gets the copulative (ng in this case), and then the prefix for the noun class of the numeral’s noun. Malala is in her 20s, which is amashumi amabili ..  (noun class 6, which is computed via Cardinal(years)), and thus the function NounPrefix() will fetch ama-. So, for Malala’s age data, Year_zu(years) will return iminyaka engama-25. That then gets processed with the rest of the Person_AgeYr_zu template, such as adding a U to the name by subj:Person(Entity), and later steps in the pipeline that take care of things like phonological conditioning (-na- + i- = –ne-), to eventually output UMalala Yousafzai uneminyaka engama-25. In other words: such a template indeed can be specified with the proposed template syntax.
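Put as a tiny code sketch in Python, a toy for this one example rather than the proposed realiser, with the concords, the subject concord u-, and the single phonological rule hard-coded as just described:

def year_zu(years: int) -> str:
    # 'year' lexeme in the plural (iminyaka, class 4) + relative concord e- +
    # copula ng- + class-6 prefix ama- of the numeral, as in the Year_zu sub-template
    return "iminyaka " + "e" + "ng" + "ama" + f"-{years}"

def person_age_zu(name: str, years: int) -> str:
    subj = "U" + name                                # subj:Person(Entity) prefixes the name with U
    raw = subj + " " + "u" + "na" + year_zu(years)   # root:subjConcord() gives u-, then the fixed -na-
    return raw.replace("nai", "ne") + "."            # phonological conditioning: -na- + i- = -ne-

print(person_age_zu("Malala Yousafzai", 25))
# UMalala Yousafzai uneminyaka engama-25.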

There’s also a section in the proposal about how that template language then connects to the composition syntax so that it can be processed by the Wikifunctions Orchestrator component of the overall architecture. That helps hide a few complexities from the template declarations, but, yes, someone’s got to write those functions (or take them from existing grammar engines) that will take care of those more or less complicated processing steps. That’s a different problem to solve. You also could link it up with another realiser by means of a transformation to the input type it expects. For now, it’s the syntax of the declarative part for the templates.

If you have any questions or comments or suggestions on that proposal or interesting use cases to test with, please don’t hesitate to add something to the talk page of the proposal, leave a comment here, or contact either Ariel or me directly.

 

References

[1] Vrandečić, D. Building a multilingual Wikipedia. Communications of the ACM, 2021, 64(4), 38-41.

[2] Mahlaza, Z., Keet, C.M. Formalisation and classification of grammar and template-mediated techniques to model and ontology verbalisation. International Journal of Metadata, Semantics and Ontologies, 2020, 14(3): 249-262.

[3] M. Jarrar, C.M. Keet, and P. Dongilli. Multilingual verbalization of ORM conceptual models and axiomatized ontologies. STARLab Technical Report, Vrije Universiteit Brussels, Belgium. February 2006.

[4] Keet, C.M., Khumalo, L. Toward a knowledge-to-text controlled natural language of isiZulu. Language Resources and Evaluation, 2017, 51:131-157.

[5] Mahlaza, Z., Keet, C. M. ToCT: A Task Ontology to Manage Complex Templates. Proceedings of the Joint Ontology Workshops 2021, FOIS’21 Ontology Showcase. Sanfilippo, E.M. et al. (Eds.). CEUR-WS vol. 2969. 9p.

Progress on generating educational questions from ontologies

With increasing student numbers, but not correspondingly more funding for schools and universities, and the desire to automate certain tasks anyhow, there have been multiple efforts to generate and mark educational exercises automatically. There are a number of efforts for the relatively easy tasks, such as for learning a language, which range from the entry level with simple vocabulary exercises to advanced ones of automatically marking essays. I’ve dabbled in that area as well, mainly with 3rd-year capstone and 4th-year honours student projects [1]. Then there’s one notch up with fact recall and concept meaning recall questions, and further steps up, such as generating multiple-choice questions (MCQs) with not just obviously wrong distractors but good distractors to make the question harder. There’s quite a bit of work done on generating those MCQs in theory and in tooling, notably [2,3,4,5]. As a recent review [6] also notes, however, there are still quite a few gaps. Among others, about generalisability of theory and systems – can you plug any structured data or knowledge source into question templates – and about the types of questions. Most of the research on ‘not-so-hard to generate and mark’ questions has been done for MCQs, but there are multiple other types of questions that should also be doable to generate automatically, such as true/false, yes/no, and enumerations. For instance, with an axiom such as impala ⊑ ∃ livesOn.land in an ontology or knowledge graph, a suitable question generation system may then generate “Does an impala live on land?” or “True or false: An impala lives on land.”, among other options.
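Here is a minimal sketch in Python of that last idea, assuming the axiom has already been decomposed into a class name, a verbalised property, and a filler, and with a deliberately naive English conjugation; it is an illustration, not the algorithm of the paper:

def questions_from_axiom(subclass: str, verb_phrase: str, filler: str) -> list[str]:
    verb, _, rest = verb_phrase.partition(" ")
    third_person = f"{verb}s {rest}".strip()   # naive conjugation: 'live on' -> 'lives on'
    a = "an" if subclass[0].lower() in "aeiou" else "a"
    return [
        f"Does {a} {subclass} {verb_phrase} {filler}?",
        f"True or false: {a.capitalize()} {subclass} {third_person} {filler}.",
    ]

print(*questions_from_axiom("impala", "live on", "land"), sep="\n")
# Does an impala live on land?
# True or false: An impala lives on land.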

We set out to make a start with tackling those sorts of questions, for the type-level information from an ontology (cf. facts in the ABox or knowledge graph). The only work done there, when we started with it, was the slick and fancy Inquire Biology [5], but its tech was not available for inspection and use, so we had to start from scratch. In particular, we wanted to find a way to be able to plug any ontology into a system and generate those other, non-MCQ, types of educational questions (10 in total), where the questions generated are at least grammatically good and for which the answers also can be generated automatically, so that we get to automated marking as well.

Initial explorations started in 2019 with an honours project to develop some basics and a baseline, which was then expanded upon. Meanwhile, we have designed, developed, and evaluated some more, which was written up in the paper “Generating Answerable Questions from Ontologies for Educational Exercises” [7] that has been accepted for publication and presentation at the 15th international conference on metadata and semantics research (MTSR’21) that will be held online next week.

In short:

  • Different types of questions and the answer they have to provide put different prerequisites on the content of the ontology with certain types of axioms. We specified those for 10 types of educational questions.
  • Three strategies of question generation were devised: a ‘simple’ one that takes the vocabulary and axioms and plugs them into a template, one guided by some more semantics in the ontology (a foundational ontology), and one that didn’t really care about either but rather took a natural language approach. Variants were added to cater for differences in naming and other variations, amounting to 75 question templates in total.
  • The human evaluation with questions generated from three ontologies showed that while the semantics-based one was slightly better than the baseline, the NLP-based one gave the best results on syntactic and semantic correctness of the sentences (according to the human evaluators).
  • It was tested with several ontologies in different domains, and the generalisability looks promising.
Graphical Abstract (made by Toky Raboanary)

To be honest to those getting their hopes up: there are some issues that mean it will never make it to ‘100% fabulous!’ if one still wants to design a system that should be able to take any ontology as input. A main culprit is the naming of elements in the ontology, which varies widely across ontologies. There are several guidelines for how to name entities, such as using camel case or underscores, and those things easily can be coded into an algorithm, indeed, but developers don’t stick to them consistently, or there’s an ontology import that uses another naming convention, so that there likely will be a glitch in the generated sentences here or there. Or they name things within the context of the hierarchy where they put the class, but in the question it is out of that context and then looks weird or is even meaningless. I moaned about this before; e.g., ‘American’ as the name of the class that should have been named ‘American Pizza’ in the Pizza ontology. Or the word used for the name of the class can have different POS tags, such that it makes the generated sentence hard to read; e.g., ‘stuff’ as a noun or a verb.
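For the naming conventions that do behave, a small Python helper along these lines covers the camel case and underscore cases (the example names are made up for illustration); the third call shows where it cannot help, since the problem there is the missing context, not the tokenisation:

import re

def name_to_words(name: str) -> str:
    words = name.replace("_", " ")
    words = re.sub(r"(?<=[a-z])(?=[A-Z])", " ", words)   # split camelCase at the case change
    return words.lower()

print(name_to_words("AcademicStaffMember"))   # academic staff member
print(name_to_words("trail_making_test"))     # trail making test
print(name_to_words("American"))              # american, still ambiguous out of context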

Be this as it may, overall, promising results were obtained and are being extended (more to follow). Some details can be found in the (CRC of the) paper, and the algorithms and data are available from the GitHub repo. The first author of the paper, Toky Raboanary, recently made a short presentation video about the paper for the yearly Open Evening/Showcase, which was held virtually and whose page is still available online.

References

[1] Gilbert, N., Keet, C.M. Automating question generation and marking of language learning exercises for isiZulu. 6th International Workshop on Controlled Natural language (CNL’18). Davis, B., Keet, C.M., Wyner, A. (Eds.). IOS Press, FAIA vol. 304, 31-40. Co. Kildare, Ireland, 27-28 August 2018.

[2] Alsubait, T., Parsia, B., Sattler, U. Ontology-based multiple choice question generation. KI – Kuenstliche Intelligenz, 2016, 30(2), 183-188.

[3] Rodriguez Rocha, O., Faron Zucker, C. Automatic generation of quizzes from dbpedia according to educational standards. In: The Third Educational Knowledge Management Workshop. pp. 1035-1041 (2018), Lyon, France. April 23 – 27, 2018.

[4] Vega-Gorgojo, G. Clover Quiz: A trivia game powered by DBpedia. Semantic Web Journal, 2019, 10(4), 779-793.

[5] Chaudhri, V., Cheng, B., Overholtzer, A., Roschelle, J., Spaulding, A., Clark, P., Greaves, M., Gunning, D. Inquire biology: A textbook that answers questions. AI Magazine, 2013, 34(3), 55-72.

[6] Kurdi, G., Leo, J., Parsia, B., Sattler, U., Al-Emari, S. A systematic review of automatic question generation for educational purposes. Int. J. Artif. Intell. Edu, 2020, 30(1), 121-204.

[7] Raboanary, T., Wang, S., Keet, C.M. Generating Answerable Questions from Ontologies for Educational Exercises. 15th Metadata and Semantics Research Conference (MTSR’21). 29 Nov – 3 Dec, Madrid, Spain / online. Springer CCIS (in print).

NLG requirements for social robots in Sub-Saharan Africa

When the robots come rolling, or just trickling or seeping or slowly creeping, into daily life, I want them to be culturally aware, give contextually relevant responses, and to do that in a language that the user can understand and speak well. Currently, they don’t. Since I work and live in South Africa, what does all that mean for the Southern African context? Would social robot use case scenarios be different here than in the Global North, where most of the robot research and development is happening, and if so, how? What is meant by contextually relevant responses? Which language(s) should the robot communicate in?

The question of which languages is the easiest to answer: those spoken in this region, which are mainly those in the Niger-Congo B [NCB] (aka ‘Bantu’) family of languages, and then also Portuguese, French, Afrikaans, and English. I’ve been working on theory and tools for NCB languages, and isiZulu in particular (and some isiXhosa and Runyankore), whose research was mainly as part of the two NRF-funded projects GeNI and MoReNL. However, if we don’t know how that human-robot interaction occurs in which setting, we won’t know whether the algorithms designed so far can also be used for that, which may well be beyond the ontology verbalisation, a patient’s medicine prescription generation, weather forecasts, or language learning exercises that we roughly got covered for the controlled language and natural language generation aspects of it.

So then what about those use case scenarios and contextually relevant responses? Let me first give an example of the latter. A few years ago, in one of the social issues and professional practice lectures I was teaching, I brought in the Amazon Echo to illustrate precisely that, as well as privacy issues with Alexa and digital assistants (‘robot secretaries’) in general. Upon asking “What is the EFF?”, the whole class—some 300 students present at the time—was expecting that Alexa would respond with something like “The EFF is the economic freedom fighters, a political party in South Africa”. Instead, Alexa fetched the international/US-based answer and responded with “The EFF is the electronic frontier foundation”, an organisation that the class had never heard of and that doesn’t really do anything in South Africa (it does come up later in the module nonetheless, btw). There’s plenty of online content about the EFF as a political party, yet Alexa chose to ignore that and prioritise information from elsewhere. Go figure what happens with lots of other information that has a limited online presence and doesn’t score high in the search engine results because there are fewer queries about it. How to get the right answer in those cases is not my problem (area of expertise), but I take that as a solved black box and zoom in on the natural language aspects to automatically generate a sentence that has the answer taken from some structured data or knowledge.

The other aspect of this instance is that the interactions both during and after the lecture were not 1:1 interactions of students with their own version of Siri or Cortana and the like, but eager and curious students came in teams, so a 1:m interaction. While that particular class is relatively large and was already split into two sessions, larger classes are also not uncommon in several Sub-Saharan countries: for secondary school class sizes, the SADC average is 23.55 learners per class (the world average is 17), with the lowest in Botswana (13.8 learners) and the highest in Malawi with a whopping 72.3 learners in a class, on average. An educational robot could well be a useful way to get out of that catch-22 and, given resource constraints, end up in a deployment scenario with a robot per study group, and that in a multilingual setting that permits code switching (going back and forth between different languages). While human-robot interaction experts still will need to do some contextual inquiries and such to get to the bottom of the exact requirements and sentences, this variation in use is on top of the hitherto known possible ways of using educational robots.

Going beyond this sort of informal chatter, I tried to structure it a bit and narrowed it down to a requirements analysis for the natural language generation aspects. After some contextualisation, I principally used two main use cases to elucidate natural language generation requirements and assessed those against key advances in research and technologies for NCB languages. Very, very briefly, any system will need to i) combine data-to-text and knowledge-to-text, ii) generate many more different types of sentences, including sentences for both written and spoken language in the NCB languages, which are grammatically rich and often agglutinating, and iii) process non-trivial numbers, which is non-trivial to do for NCB languages because the surface realisation of the numbers depends on the noun class of the noun that is being counted. At present, no system out there can do all of that. A condensed version of the analysis was recently accepted as a paper entitled Natural Language Generation Requirements for Social Robots in Sub-Saharan Africa [1], for the IST-Africa’21 conference, and it will be presented there next week at the virtual event, in the ‘next generation computing’ session no less, on Wednesday the 12th of May.

Screen grab of the recording of the conference presentation (link to recording available upon request)

Probably none of you has ever heard of this conference. IST-Africa is a yearly IT conference in Africa that aims to foster North-South and South-South networking, promote the academia->industry and academia->policy bridge-creation and knowledge transfer pipelines, and build capacity for paper writing and presentation. The topics covered are distinctly of regional relevance and, according to its call for papers, the “Technical, Policy, Social Implications Papers must present analysis of early/final Research or Implementation Project Results, or business, government, or societal sector Case Study”.

Why should I even bother with an event like that? It’s good to sometimes reflect on the context and ponder about relevance of one’s research—after all, part of the university’s income (and thus my salary) and a large part of the research project funding I have received so far comes ultimately from the taxpayers. South African tax payers, to be more precise; not the taxpayers of the Global North. I can ‘advertise’, ahem, my research area and its progress to a regional audience. Also, I don’t expect that the average scientist in the Global North would care about HRI in Africa and even less so for NCB languages, but the analysis needed to be done and papers equate brownie points. Also, if everyone thinks to better not participate in something locally or regionally, it won’t ever become a vibrant network of research, applied research, and technology. I’ve attended the event once, in 2018 when we had a paper on error correction for isiZulu spellcheckers, and from my researcher viewpoint, it was useful for networking and ‘shopping’ for interesting problems that I may be able to solve, based on other participants’ case studies and inquiries.

Time will tell whether attending that event then and now this paper and online attendance will be time wasted or well spent. Unlike the papers on the isiZulu spellcheckers that reported research and concrete results that a tech company easily could take up (feel free to do so), this is a ‘fluffy’ paper, but exploring the use of robots in Africa was an interesting activity to do, I learned a few things along the way, it will save other interested people time in the analysis phase, and hopefully it also will generate some interest and discussion about what sort of robots we’d want and what they could or should be doing to assist, rather than replace, humans.

p.s.: if you still were to think that there are no robots in Africa and deem all this to be irrelevant: besides robots in the automotive and mining industries by, e.g., Robotic Innovations and Robotic Handling Systems, there are robots in education (also in Cape Town, by RD-9), robot butlers in hotels that serve quarantined people with mild COVID-19 in Johannesburg, they’re used for COVID-19 screening in Rwanda, and the Naledi personal banking app by Botlhale, to name but a few examples. Other tools are moving in that direction, such as, among others, Awezamed’s use of speech synthesis with (canned) text in isiZulu, isiXhosa and Afrikaans and there’s of course my research group where we look into knowledge-to-text text generation in African languages.

References

[1] Keet, C.M. Natural Language Generation Requirements for Social Robots in Sub-Saharan Africa. IST-Africa 2021, 10-14 May 2021, online. in print.

A set of competency questions and SPARQL-OWL queries, with analysis

As a good beginning of the new year, our Data in Brief article Dataset of Ontology Competency Questions to SPARQL-OWL Queries Translations [1] was accepted and came online this week, which accompanies our Journal of Web Semantics article Analysis of Ontology Competency Questions and their Formalisations in SPARQL-OWL [2] that was published in December 2019—with ‘our’ referring to my collaborators in Poznan, Dawid Wisniewski, Jedrzej Potoniec, and Agnieszka Lawrynowicz, and myself. The former article provides extensive detail of a dataset we created that was subsequently used for analysis that provided new insights that is described in the latter article.

The dataset

In short, we tried to find existing good TBox-level competency questions (CQs) for available ontologies and manually formulate (i.e., formalise the CQ in) SPARQL-OWL queries for each of the CQs over said ontologies. We ended up with 234 CQs for 5 ontologies, with 131 accompanying SPARQL-OWL queries. This constitutes the first gold standard pipeline for verifying an ontology’s requirements and it presents the systematic analyses of what is translatable from the CQs and what not, and when not, why not. This may assist in further research and tool development on CQs, automating CQ verification, assessing the main query language constructs and therewith language optimisation, among others. The dataset itself is indeed independently reusable for other experiments, and has been reused already [3].

The key insights

The first analysis we conducted on it, reported in [2], revealed several insights. First, a larger set of CQs (cf. earlier work) indeed did increase the number of CQ patterns. There are recurring patterns in the shape of the CQs when analysed linguistically; a popular one is What EC1 PC1 EC2?, obtained from CQs like “What data are collected for the trail making test?” (a Dem@care CQ). Observe that, yes, indeed, we did decouple the language layer from the formalisation layer rather than mixing the two; hence, the ECs (resp. PCs) are not necessarily classes (resp. object properties) in an ontology. The SPARQL-OWL queries were also analysed as to which parts of that query language are really used, and which are used most often (see table 7 of the paper).

Second, these characteristics are not the same across CQ sets by different authors of different ontologies in different subject domains, although some patterns do recur and are thus somehow ‘popular’ regardless. Third, the relation CQ (pattern or not) : SPARQL-OWL query (or its signature) is m:n, not 1:1. That is, a CQ may have multiple SPARQL-OWL queries or signatures, and a SPARQL-OWL query or signature may be put into a natural language question (CQ) in different ways. The latter sucks for any aim of automated verification, but unfortunately, there doesn’t seem to be an easy way around that: 1) there are different ways to say the same thing, and 2) the same knowledge can be represented in different ways and thereby lead to a different shape of the query. Some possible ways to mitigate either are being looked into, like specifying a controlled natural language for CQs [3] and modelling styles [4], so that one might be able to devise an algorithm to find and link or swap or choose one of them [5,6], but all that is still in the preliminary stages.

Meanwhile, there is that freely available dataset and the in-depth rigorous analysis, so that, hopefully, a solution may be found sooner rather than later.

 

References

[1] Potoniec, J., Wisniewski, D., Lawrynowicz, A., Keet, C.M. Dataset of Ontology Competency Questions to SPARQL-OWL Queries Translations. Data in Brief, 2020, in press.

[2] Wisniewski, D., Potoniec, J., Lawrynowicz, A., Keet, C.M. Analysis of Ontology Competency Questions and their Formalisations in SPARQL-OWL. Journal of Web Semantics, 2019, 59:100534.

[3] Keet, C.M., Mahlaza, Z., Antia, M.-J. CLaRO: a Controlled Language for Authoring Competency Questions. 13th Metadata and Semantics Research Conference (MTSR’19). 28-31 Oct 2019, Rome, Italy. Springer CCIS vol 1057, 3-15.

[4] Fillottrani, P.R., Keet, C.M. Dimensions Affecting Representation Styles in Ontologies. 1st Iberoamerican conference on Knowledge Graphs and Semantic Web (KGSWC’19). Springer CCIS vol 1029, 186-200. 24-28 June 2019, Villa Clara, Cuba. Paper at Springer

[5] Fillottrani, P.R., Keet, C.M. Patterns for Heterogeneous TBox Mappings to Bridge Different Modelling Decisions. 14th Extended Semantic Web Conference (ESWC’17). Springer LNCS vol 10249, 371-386. Portoroz, Slovenia, May 28 – June 2, 2017.

[6] Khan, Z.C., Keet, C.M. Automatically changing modules in modular ontology development and management. Annual Conference of the South African Institute of Computer Scientists and Information Technologists (SAICSIT’17). ACM Proceedings, 19:1-19:10. Thaba Nchu, South Africa. September 26-28, 2017.

Localising Protégé with Manchester syntax into your language of choice

Some people like a quasi natural language interface in ontology development tools, which is why Manchester Syntax was proposed [1]. A downside is that it locks the ontology developer into English, so that weird chimaeras are generated in the interface if the author prefers another language for the ontology, such as the “jirafa come only (oja or ramita)” mentioned in an earlier post, which was deemed unpleasant in an experiment a while ago [2]. Those who prefer the quasi natural language components will have to resort to localising the Manchester syntax and the tool’s interface.

This is precisely what two of my former students—Adam Kaliski and Casey O’Donnell—did during their mini-project in the ontology engineering course of 2017. A localisation in Afrikaans, as the case turned out to be. To make this publicly available, Michael Harrison brushed up the code a bit and tested that it also worked in the new version of Protégé. It turned out it wasn’t that easy to localise it to another language the way it was done, so one of my PhD students, Toky Raboanary, redesigned the whole thing. This was then tested with Spanish and found to be working. The remainder of the post informally describes some of its main aspects. If you don’t want to read all that but want to play with it right away: here are the jar files, open source code, and localisation instructions for if you want to create, say, a French or Dutch variant.

Some sensible constraints, some slightly contrived ones (and some bad ones), for the purpose of showing the localisation of the interface for the various keywords. The view in English is included in the screenshot to facilitate comparison.

The localisation functions as a plugin for Protégé as a ‘view’ component. It can be selected under “Windows – Views – Class views” and then Beskrywing for the Afrikaans and Descripción for Spanish, and dragged into the desired position; this is likewise for object properties.

Instead of burying the translations in the code, they are specified in a separate XML file, whose content is fetched during the rendering. Adding a new ‘simple’ (more about that later) language merely amounts to adding a new XML file with the translations of the Protégé labels and of the relevant Manchester syntax. Here are the ‘simple’ translations—i.e., where both the label and its translation are fixed strings—for Afrikaans for the relevant tool interface components:

 

Class Description (Label) | Klasbeskrywing (Label in Afrikaans)
Equivalent To | Dieselfde as
SubClass Of | Subklas van
General Class axioms | Algemene Klasaksiomas
SubClass Of (Anonymous Ancestor) | Subklas van (Naamlose Voorvader)
Disjoint With | Disjunkte van
Disjoint Union Of | Disjunkte Unie van

 

The second set of translations is for the Manchester syntax, so as to render that also in the target language. The relevant mappings for the Afrikaans class description keywords are listed in the table below, which contains the final choices made by the students who developed the original plugin. For instance, min and max could have been rendered as minimum and maksimum, but ten minste and by die meeste were deemed more readable despite being multi-word strings. Another interesting bit in the translation is negation, where there has to be a second ‘no’, since Afrikaans has double negation in this construction, so that it renders as nie <expression> nie. That final rendering is not grammatically perfect, but (hopefully) sufficiently clear:

An attempt at double negation with a fixed string

Manchester OWL Keyword | Afrikaans Manchester OWL Keyword or phrase
some | sommige
only | slegs
min | ten minste
max | by die meeste
exactly | precies
and | en
or | of
not | nie <expression> nie
SubClassOf | SubklasVan
EquivalentTo | DieselfdeAs
DisjointWith | DisjunkteVan

 

The people involved in the translations for the object properties view for Afrikaans are Toky, my colleague Tommie Meyer (also at UCT), and myself; snyding for ‘intersection’ sounds somewhat odd to me, but the really tough one to translate was ‘SuperProperty’. Of the four options that were considered (SuperEienskap, SuperVerwantskap, SuperRelasie, and SuperVerband), SuperVerwantskap was chosen, with Tommie having had the final vote; it is also a semantic translation rather than a literal one.

Screenshot of the object properties description, with comparison to the English

The Spanish version also has multi-word strings, but at least it does not do double negation. On the other hand, it has accents. To generate the Spanish version, my collaborator Pablo Fillottrani from the Universidad Nacional del Sur, Argentina, Toky, and I had a go at translating the terms. This was then implemented with the XML file. In case you do not want to dig into the XML file and do not want to install the plugin either, but would like a quick look at the translations, they are as follows for the class description view:

 

Class Description (Label) | Descripción (in Spanish)
Equivalent To | Equivalente a
SubClass Of | Subclase de
General Class axioms | Axiomas generales de clase
SubClass Of (Anonymous Ancestor) | Subclase de (Ancestro Anónimo)
Disjoint With | Disjunto con
Disjoint Union Of | Unión Disjunta de
Instances | Instancias

 

Manchester OWL Keyword | Spanish Manchester OWL Keyword
some | al menos uno
only | sólo
min | al mínimo
max | al máximo
and | y
or | o
not | no
exactly | exactamente
SubClassOf | SubclaseDe
EquivalentTo | EquivalenteA
DisjointWith | DisjuntoCon

And here’s a rendering of a real ontology, for geo linked data in Spanish, rather than African wildlife yet again:

screenshot of the plugin behaviour with someone else’s ontology in Spanish

One final comment remains, which has to do with the ‘simple’ mentioned above. The approach to localisation presented here works only with fixed strings, i.e., strings that do not have to change depending on the context in which they are used. It won’t work with, say, isiZulu—a highly agglutinating and inflectional language—because isiZulu doesn’t have fixed strings for the Manchester syntax keywords nor for some other labels. For instance, ‘at least one’ has seven variants for nouns in the singular, depending on the noun class of the noun of the OWL class it quantifies over; e.g., elilodwa for ‘at least one’ apple, and esisodwa for ‘at least one’ twig. Also, the conjugation of the verb for the object property depends on the noun class of the noun of the OWL class, but in this case for the one that plays the subject; e.g., it’s “eats” in English for both humans and elephants eating, say, fruit, so one string for the name of the object property, but that’s udla and idla, respectively, in isiZulu. This requires annotations of the classes with ontolex-lemon or a similar approach and a set of rules (which we have, btw) to determine what to do in which case, which requires on-the-fly modifications to Manchester syntax keywords and elements’ names or labels. And then there’s still phonological conditioning to account for. It surely can be done, but it is not as doable as with the ‘simple’ languages that have at least a disjunctive orthography and far fewer genders or noun classes for the nouns.
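To make that contrast concrete, here is a rough Python sketch with the two examples just mentioned: the Afrikaans side is one fixed string, whereas the isiZulu side needs a noun-class annotation per noun. The numeric class labels below are my own illustrative annotation, and in a plugin-style setup such information would have to come from lexeme annotations (e.g., via ontolex-lemon) plus the selection rules, not a hard-coded dictionary:

MIN_AF = "ten minste"                              # Afrikaans: one fixed string suffices
AT_LEAST_ONE_ZU = {5: "elilodwa", 7: "esisodwa"}   # isiZulu: one form per noun class (7 singular variants in all)
NOUN_CLASS = {"apple": 5, "twig": 7}               # noun class annotations a lexicon would provide

def at_least_one_zu(noun: str) -> str:
    return AT_LEAST_ONE_ZU[NOUN_CLASS[noun]]

print(at_least_one_zu("apple"))   # elilodwa
print(at_least_one_zu("twig"))    # esisodwa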

In closing, while there’s indeed more to translate in the Protégé interface in order to fully localise it, hopefully this already helps as-is, either for reading at least a whole axiom in one’s language or as a stepping stone to extend it further for the other terms in the Manchester syntax and the interface. Feel free to extend our open source code.

 

References

[1] Matthew Horridge, Nicholas Drummond, John Goodwin, Alan Rector, Robert Stevens and Hai Wang (2006). The Manchester OWL syntax. OWL: Experiences and Directions (OWLED’06), Athens, Georgia, USA, 10-11 Nov 2006, CEUR-WS vol 216.

[2] Keet, C.M. The use of foundational ontologies in ontology development: an empirical assessment. 8th Extended Semantic Web Conference (ESWC’11), G. Antoniou et al (Eds.), Heraklion, Crete, Greece, 29 May-2 June, 2011. Springer LNCS 6643, 321-335.

[3] Keet, C.M., Khumalo, L. Toward a knowledge-to-text controlled natural language of isiZulu. Language Resources and Evaluation, 2017, 51:131-157. accepted version