Systematic design of conceptual modelling languages

What would your ideal modelling language look like if you were to design one yourself? How would you go about defining your own language? The act of creating your own pet language can be viewed as a design process, and processes can be structured. It wasn’t the first thing we wanted to address when my collaborator Pablo Fillottrani and I were trying to design evidence-based conceptual data modelling languages. Yet. All those other conceptual modelling languages out there did not sprout from a tree; people designed them, albeit most often not in a systematic way. We wanted to design ours in a systematic, repeatable, and justified way.

More broadly, modelling is growing up as a field of specialisation, and some even claim it deserves to be its own discipline [CabotVallecillo22]. Surely someone must have thought of this notion of language design processes before? To a limited extent, yes. A few logicians have thought about procedures and have used one, or part thereof. Two notable examples are OWL and DOL, which both went through a requirements specification phase where goals were formulated, and then the language was designed. OWL was also assessed on usage, and the ‘lessons learned’ extracted from that fed one round of improvements, which resulted in OWL 2.

But what would a systematic procedure look like? Ulrich Frank devised a waterfall methodology for domain-specific languages [Frank13], which are a bit different from conceptual data modelling languages. Pablo and I modified it to make it work for designing ontology languages. Its details, with a focus on the additional ‘ontological analysis’ step, are described in our FOIS2020 paper [FillottraniKeet20], which I wrote a blogpost about before. The procedure also includes the option to iterate over the steps, there are optional steps, and there is that ontological analysis, where deciding on certain elements entails philosophical choices for one theory or another. We tweaked it further so that it would also work for conceptual data modelling language design, which was published in a journal article on the design of a set of evidence-based conceptual data modelling languages [FillottraniKeet21] in late 2021, but which I hadn’t gotten around to writing a blog post about yet. Let me summarise the steps visually in the figure below.

Overview of a procedure for conceptual modelling and ontology language design (coloured in from [FillottraniKeet21])

For marketing purposes, I probably should come up with an easily pronounceable name for the proposed procedure, like MeCModeL (Methodology for the Creation of Modelling Languages) or something. We’re open to suggestions. Be that as it may, let’s briefly summarise each step in the remainder of this post.

Step 1. Clarification of scope and purpose

We first need to clarify the scope, purpose, expected benefits, and possible long-term perspective, and consider the feasibility given the resources available. For instance, if you were to design a new conceptual data modelling language tailored to temporal model-based data access and to surpass UML class diagrams, it’s unlikely to work. For one, the Object Management Group has more resources, both in the short and in the long term, to promote and sustain UML. Second, reasoning over temporal constraints is computationally expensive, so it won’t scale to access large amounts of data. We’re halted in our tracks already. Let’s try this again. What about a new temporal UML that has a logic-based reconstruction for precision? Its purpose is to model more of the subject domain more precisely. The expected benefits would be better quality models, because they are more precise, and thus better quality applications. A long-term perspective does not apply, as it’s just a use case scenario here. Regarding feasibility, let’s assume we do have the competencies, people, and funding to develop the language and tool, and to carry out the evaluation.

Step 2. Analysis of general requirements

The ‘analysis of general requirements’ step can be divided into three parallel or sequential tasks: determining the requirements for modelling (and possibly the associated automated reasoning over the models), devising use case scenarios, and assigning priorities to each. An example of a requirement is the ability to represent change in the data and to keep track of it, such as the successive stages in signing computational legal contracts. Devising a list of requirements out of the blue is nontrivial, but there are a few libraries of possible requirements out there that can help with picking and choosing. For conceptual modelling languages, there is no such library yet, but we created a preliminary library of features for ontology languages that may be of use.

Use cases can vary widely, depending on the scope, purpose, and requirements of the language aimed for. For the modelling requirements, use cases can be described as the kind of things you want to be able to represent in the prospective language. For instance, that employee Jane as Product Manager may change her job in the company to Area Manager, or that she’s repeatedly assigned to a project for a specified duration. The former is an example of object migration and the latter of a ternary relationship or a binary one with an attribute. An end user stakeholder bringing up these examples may not know that, but as language designer, one would need to recognise the language feature(s) needed for it. Another type of use case may be about how a modeller would interact with the language and the prospective modelling tool.

Step 3. Analysis of specific requirements and ontological analysis

Here’s where the ontological commitments are made, even if you don’t want to make them or think you don’t. Even before looking at the temporal aspects, the fact that we committed to UML class diagrams already entails we committed to, among others, the so-called positionalist commitment for relations and a class-based approach (cf. first order predicate logic, where there are just ordered relations of arity ≥1), and we adhere to the most common take on representing temporality, where there are 3-dimensional objects and a separate temporal dimension is added whenever the entity needs it (the other option being 4-dimensionalism). Different views affect how time is included in the language. With the ‘add time to a-temporal’ choice, there are still more decisions to take, like whether time is linear and whether it consists of adjacent successive timepoints (chronons) or whether another point can always be squeezed in-between (dense time). Ontological differences they really are, even if you have hitherto chosen ‘intuitively’. There are more such ontological decisions, besides these obvious ones on time and relations, which are described in our FOIS2020 paper. In all but one paper about languages, such choices were left implicit, and time will tell whether they’ll be picked up for the design of new languages.
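
To make the linear and dense time option concrete, here is a minimal sketch of the standard axioms for a flow of time ⟨T, <⟩, with < the precedence relation over timepoints:

```latex
\forall x\, \neg(x < x) \qquad \text{(irreflexivity)}\\
\forall x, y, z\, (x < y \wedge y < z \rightarrow x < z) \qquad \text{(transitivity)}\\
\forall x, y\, (x \neq y \rightarrow x < y \vee y < x) \qquad \text{(linearity)}\\
\forall x, y\, (x < y \rightarrow \exists z\, (x < z \wedge z < y)) \qquad \text{(density)}
```

For chronons, one would swap the density axiom for a discreteness axiom stating that each timepoint that has a successor has an immediate successor.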

The other sub-step of step 3 has been very much to the fore if logic plays a role in the language design. Which elements are going to be in the language, what are they going to look like, how scalable does it have to be, and should it extend existing infrastructure or be something entirely separate from it? For our temporal UML, the answers may be that the atemporal elements are those from UML class diagrams, all the temporal stuff with their icons shall be carried over from the TREND conceptual data modelling language [KeetBerman17], and the underlying logic, DLRus, is not even remotely close to being scalable, so there is no existing tool infrastructure. Of course, someone else may make other decisions here.

Step 4. Language specification

Now we’re finally getting down to what from the outside may seem to be the only task: defining the language. There are two key ways of doing it: either define the syntax and the semantics, or make a metamodel for your language. The syntax can be informal-ish, like listing the permissible graphical elements and then a BNF grammar for how they can be used. We can do this more precisely for logics as well, like declaring that UML’s arrow for class subsumption is a ⇒ in our logic-based reconstruction rather than a →, as you wish. Once the syntax is settled, we need to give it meaning, or: define the semantics of the language. For instance, that a rectangle means that it’s a class that can have instances and a line between classes denotes a relationship. Or that that fancy arrow means that if C ⇒ D, then all instances of C are also instances of D in all possible worlds (that in the interpretation of C ⇒ D we have that C^I ⊆ D^I). Since logic is not everyone’s preference, metamodelling to define the language may be a way out; sometimes a language can be defined in its own language, sometimes not (e.g., ORM can be [Halpin04]). For our temporal UML example, we can use the conversions from EER to UML class diagrams (see, e.g., our FaCIL paper with the framework, implementation and the theory it uses), and then also reuse the extant logic-based reconstruction in the DLRus Description Logic.
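
To illustrate the metamodel route, here is a minimal sketch in Python of a fragment of what a metamodel for our hypothetical temporal UML could look like; the element names and the temporal flag are made up for illustration and not taken from any actual specification:

```python
from dataclasses import dataclass, field

# Toy metamodel fragment for the hypothetical temporal UML.
# Element names are illustrative only.

@dataclass(frozen=True)
class UmlClass:
    name: str
    temporal: bool = False  # True: instances may change class membership over time

@dataclass(frozen=True)
class Subsumption:
    sub: UmlClass  # every instance of sub ...
    sup: UmlClass  # ... is also an instance of sup (the fancy arrow)

@dataclass
class Model:
    classes: list = field(default_factory=list)
    subsumptions: list = field(default_factory=list)

    def is_wellformed(self) -> bool:
        # toy syntax rule: subsumptions may only relate declared classes
        return all(s.sub in self.classes and s.sup in self.classes
                   for s in self.subsumptions)
```

The semantics still has to be given separately, of course; the metamodel only constrains which models are syntactically well-formed.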

Once all that has been sorted, there’s still the glossary and documentation to write so that potential users and tool developers can figure out what you did. There’s neither a minimum nor a maximum page limit for it. The UML standard is over 700 pages long, DOL is 209 pages, and the CL Standard is 70 pages. Others hide their length by rendering the specification as a web page with figures and examples toggled; the OWL 2 functional style syntax in A4-sized MS Word amounts to 118 pages in 12-point Times New Roman, whereas the original syntax and semantics of the underlying logic SROIQ [HorrocksEtAl06], including the key algorithms, is just 11 pages, or about 20 when reformatted in 12-point single-column A4. And it may need to be revised due to potential infelicities surfacing in steps 5-7. For our temporal UML, there will be quite a number of pages.

Step 5. Design of notation for modeller

It may be argued that designing the notation is part of the language specification but, practically, different stakeholders want different things out of it, especially if your language is more like a programming language or a logic rather than diagrammatic. Depending on your intended audience, graphical or textual notations may be preferred. You’ll need to tweak that additional notation and evaluate it with a representative selection of prospective users on whether the models are easy to understand and to create. To the best of my knowledge, that never happened at the bedrock of any of the popular logics, be it first order predicate logic, Description Logics, or OWL, which may well be a reason why there are so many research papers on providing nicer renderings of them, sugar-coating them either diagrammatically, with a controlled natural language, or with a different syntax. OWL 2 even has five different official syntaxes. For our hypothetical temporal UML: since we’re transferring TREND, we may as well do so for the graphical notation and the controlled natural language for it.

Step 6. Development of modelling tool

Create a computer-processable format of it, i.e., a serialisation, which assumes 1) you want to have it implemented and a modelling tool for it and 2) it wasn’t already serialised in step 4. If you don’t want an implementation, this step can be skipped. Creating such a serialisation format, however, will help getting the language adopted more widely than by yourself alone (although it’s by no means a guarantee that it will be). There are also other reasons why you may want to create a computer-processable version of the new language, such as sending models to an automated reasoner or automatically checking that a model adheres to the language specification and highlighting syntax errors, or any other application scenario. Our fictitious temporal UML doesn’t have a computer-processable format, and neither does TREND to copy it from, but we ought to make one, because we do want a tool for both.
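
For a flavour of what such a serialisation could look like, a minimal sketch in Python that emits JSON, with entirely made-up element names for the hypothetical temporal UML:

```python
import json

# Hypothetical serialisation of a two-class temporal UML fragment;
# the language identifier, element names, and constraint type are invented.
model = {
    "language": "TemporalUML-0.1",
    "classes": [
        {"name": "ProductManager", "temporal": True},
        {"name": "AreaManager", "temporal": True},
    ],
    "constraints": [
        # the step-2 use case: employees may migrate between the two roles
        {"type": "objectMigration", "from": "ProductManager", "to": "AreaManager"},
    ],
}

print(json.dumps(model, indent=2))  # computer-processable, tool-ingestible
```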

Step 7. Evaluation and refinement

Evaluation involves defining and executing test cases to validate and verify the language. Remember those use cases from step 2 and the ontological requirements of step 3? They count as test cases: can they be modelled in the new language and does it have the selected features? If so, good; if not, you had better have a good reason for why not. If you don’t, then you’ll need to return to step 4 to improve the language. For our temporal UML, we’re all sorted, as both the object and relation migration constraints can be represented, as well as ternaries.
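
Such a test case can even be made executable against the serialisation sketched under step 6, in the same hedged, illustrative spirit:

```python
# Toy test case: the step-2 use case of Jane migrating from Product
# Manager to Area Manager must be expressible in the language.
model = {"constraints": [
    {"type": "objectMigration", "from": "ProductManager", "to": "AreaManager"},
]}

def test_object_migration_supported():
    constraint_types = {c["type"] for c in model["constraints"]}
    assert "objectMigration" in constraint_types, "feature missing: back to step 4"

test_object_migration_supported()  # passes silently if the feature is there
```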

Let’s optimistically assume it all went well with your design, and your language passes all those tests. The last task, at least for the first round, is to analyse the effect of usage in practice. Do users use it in the way intended? Are they under-using some language features and discovering they want another, now that they’re deploying it? Are there unexpected user groups with additional requirements that may be strategically savvy to satisfy? If the answers are a resounding ‘no’ to the second and third question in particular, you may rest on your laurels. If the answer is ‘yes’, you may need to cycle through the procedure again to incorporate updates and meet moving goalposts. There’s no shame in that. UML’s version 1.0 was released in 1997 and then came 1.1, 1.3, 1.4, 1.5, 2.0, 2.1, 2.1.1, 2.1.2, 2.2, 2.3, 2.4.1, 2.5, and 2.5.1. The UML 2.6 Revision Task Force faces an issue tracker of around 800 issues, five years after the 2.5.1 official release. They are not all issues with the UML class diagram language, but it does indicate things change. OWL had a first version in 2004 and then a revised one in 2008. ER evolved into EER; ORM into ORM2.

Regardless of whether your pet language is used by anyone other than yourself, it’s fun designing one, even if only because then you don’t have to abide by other people’s decisions on what features a modelling language should have, and if it turns out the same as an existing one, you’ll have a better understanding of why that language is the way it is. What the procedure does not include, but may help marketing your pet language, is how to name it. UML, ER, and ORM are not the liveliest acronyms and not easy to pronounce. Compare that to Mind Maps, which is a fine alliteration at least. OWL, for the Web Ontology Language, is easy to pronounce and it is nifty in that the owl-as-animal is associated with knowledge and OWL is a knowledge representation language, albeit that this explanation is a tad long for explaining a name. Some of the temporal ER languages have good names too, like TimER and TREND. With this last consideration of naming, we have stretched the language development procedure as far as it currently goes.

In closing

The overall process is, perhaps, not an exciting one, but it will get the job done and you’ll be able to justify what you did and why. Such an explanation beats an ‘I just liked it this way’. It also may keep language scope creep in check, or at least help you become cognisant of it, and you may have the answer ready when a user asks for a feature.

Our evidence-based conceptual data modelling languages introduced in [FillottraniKeet21] have clear design rationales and evidence to back them up. We initially didn’t like them much ourselves, for they are lean languages rather than the very expressive ones that we’d hoped for when we started out with the investigation, but they do have their advantages, such as run-time usage in applications including ontology-based data access, automated verification, and query compilation, and, last but not least, seamless interoperability among EER, UML class diagrams, and ORM2 [BraunEtAl23].

References

[BraunEtAl23] Braun, G., Fillottrani, P.R., Keet, C.M. A Framework for Interoperability Between Models with Hybrid Tools, Journal of Intelligent Information Systems, (in print since July 2022).

[CabotVallecillo22] Cabot, Jordi and Vallecillo, Antonio. Modeling should be an independent scientific discipline. Software and Systems Modeling, 2022, 22:2101–2107.

[Frank13] Frank, Ulrich. Domain-specific modeling languages – requirements analysis and design guidelines. In Reinhartz-Berger, I., Sturm, A., Clark, T., Bettin, J., and Cohen, S., editors, Domain Engineering: Product Lines, Conceptual Models, and Languages, pages 133–157. Springer, 2013.

[Halpin04] Halpin, T. A. Advanced Topics in Database Research, volume 3, chapter Comparing Metamodels for ER, ORM and UML Data Models, pages 23–44. Idea Publishing Group, Hershey PA, USA, 2004.

[HorrocksEtAl06] Horrocks, Ian, Kutz, Oliver, and Sattler, Ulrike. The even more irresistible SROIQ. Proceedings of KR-2006, AAAI, pages 457–467, 2006.

[FillottraniKeet20] Fillottrani, P.R., Keet, C.M. An analysis of commitments in ontology language design. 11th International Conference on Formal Ontology in Information Systems 2020 (FOIS’20). Brodaric, B. and Neuhaus, F. (Eds.). IOS Press, FAIA vol. 330, 46-60.

[FillottraniKeet21] Fillottrani, P.R., Keet, C.M. Evidence-based lean conceptual data modelling languages. Journal of Computer Science and Technology, 2021, 21(2): 93-111.

[KeetBerman17] Keet, C.M., Berman, S. Determining the preferred representation of temporal constraints in conceptual models. 36th International Conference on Conceptual Modeling (ER’17). Mayr, H.C., Guizzardi, G., Ma, H., Pastor, O. (Eds.). Springer LNCS vol. 10650, 437-450. 6-9 Nov 2017, Valencia, Spain.


Some reflections on designing Abstract Wikipedia so far

Abstract Wikipedia aims to at least augment the current Wikipedia, if not be the next-generation Wikipedia. Besides the human-authored articles that take their time to write and maintain, you could scale up article generation through automation and do so for many more languages. And keep all that content up-to-date. And all that reliably, without hallucinations where algorithms make stuff up. How? Represent the data and information in a structured format, such as in an RDF triple store, JSON, or even a relational database or OWL, and generate text from suitably selected structured content. Put differently: multilingual natural language generation, at scale, and community-controlled. For the Abstract Wikipedia setting, the content would come from Wikidata and the code to compute it from Wikifunctions. Progress in creating the system isn’t going as fast as hoped for, and a few Google.org fellows wrote an initial critique of the plans and progress made, to which the Abstract Wikipedia team at the WMF wrote a comprehensive reply. It was also commented on in a Signpost technology report, and a condensed non-technical summary has appeared in an Abstract Wikipedia updates letter. The question remains: is it feasible? If so, what is the best way to go about doing it; if not, why not, and then what?

A ‘pretty picture’ of a prospective Abstract Wikipedia architecture, at a very high level. Challenges lie in what’s going to be in that shiny yellow box in the centre and how that process should unfold, in the lexicographic data in Wikidata, and where the actual text generation will happen and how.

My name appears in some of those documents, as I’ve been volunteering in the NLG stream of the Abstract Wikipedia project in an overlapping timeframe, and I contributed to the template language, to the progress on the constructors (here and here), and to adding isiZulu lexemes to Wikidata, among others. The mentions are mostly in the context of challenges with Niger-Congo B (AKA ‘Bantu’) languages that are spoken across most of Sub-Saharan Africa. Are these languages really so special that they deserve a specific mention over all others? Yes and no. A “no” may apply since there are many languages spoken in the world by many people that have ‘unexpected’ or ‘peculiar’ or ‘unique’ or ‘difficult to computationally process’ features, or that are in the same boat when it comes to their low-resource status and the challenges that entails. NCB languages, such as isiZulu that I focus on mainly, are just one family of languages among them. If I were to have moved to St. Lawrence Island in the Bering Strait, say, I could have given similar pushback, with the difference that there are many, many more millions of people speaking NCB languages than Yupik. Neither language is in the Indo-European language family. Language families exist for a reason; they have features really unlike the others’. That’s where the “yes” answer comes in. The ‘yes’, together with the low-resourcedness, brings challenges along four dimensions: theoretical, technical, people, and praxis. Let me briefly illustrate each in turn.

Theory – linguistic and computational

The theoretical challenges are mainly about the language and linguistics, on the characteristic features these languages have and how much we know of them, affecting technical aspects down the road. For instance, we know that the noun class system is emblematic of NCB languages. To a novice or an outsider, it smells of the M/F/N gender of nouns like in French, Spanish, or German, but then with a few more of them. It isn’t quite like that in the details for the 11-23 noun classes in an NCB language, and squeezing that into Wikidata is non-trivial, since here and there an n-ary relation is more appropriate for some aspects than approximating it by partially reified binaries. The noun class of the noun governs a concordial agreement system that goes across a sentence rather than only its adjacent word; e.g., not only an adjective agreeing with the gender of a noun like in Romance languages (e.g., abuela vieja and abuelo viejo in Spanish), but, for each noun class, also conjugation of the verb by noun class and other aspects such as quantification over a noun (e.g., bonke abantu ‘all humans’ and zonke izinja ‘all dogs’). We know some of the rules, but not all of them, and only for some of the NCB languages. When I commenced with natural language generation for isiZulu in earnest in 2014, it wasn’t even clear how to pluralise nouns roughly, let alone exactly. We now know how to roughly pluralise nouns automatically. Figuring out the isiZulu verb present tense got us a paper as recently as 2017; the Context-Free Grammar we defined for it is not perfect yet, but it’s an improvement on the state of the art and we can use it in certain controlled settings of natural language generation.
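
To give an idea of how such concordial agreement cuts across the sentence, here is a minimal sketch in Python of the quantification example just mentioned; the two noun classes and concords are taken from the bonke/zonke example above, and everything else about a real isiZulu grammar engine is, of course, vastly simplified:

```python
# Quantifier concord in isiZulu: the prefix of 'all' (-onke) is
# governed by the noun class of the noun it quantifies over.
# Only two noun classes shown; a real system covers them all.
NOUN_CLASS = {"abantu": 2, "izinja": 10}    # 'humans', 'dogs'
QUANT_CONCORD = {2: "bo", 10: "zo"}         # noun class -> concord for -onke

def all_of(noun: str) -> str:
    """Generate '<concord>nke <noun>' for universal quantification."""
    concord = QUANT_CONCORD[NOUN_CLASS[noun]]
    return f"{concord}nke {noun}"

print(all_of("abantu"))  # bonke abantu, 'all humans'
print(all_of("izinja"))  # zonke izinja, 'all dogs'
```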

My collaborator and I like such a phrase structure grammar. There are several types of grammars, however, and it’s anyone’s guess whether any of them is expressive and convenient enough to capture the grammars of the NCB languages. The alternative family is that of dependency grammars, with their subtypes and variants. To the best of my knowledge, nothing has been done with such grammars and any of the NCB languages. What I can assure you from ample experience is that it is infeasible for people working on low- or medium-resourced languages to start writing up grammars for every pet grammar flavour of the day that rotating volunteers prefer.

IsiZulu and Kiswahili are probably the least low-resourced languages of the NCB language family, and yet there’s no abundance of grammar specifications. It’s not that it can’t be done at least in part; it’s just that most material, if available at all, is outdated and was never tested on more than a handful of words or sentences, and thus is not computationally reliable off the shelf at present. And there are limited resources available to verify it. This is also the case for many other low-resourced languages. For Abstract Wikipedia to achieve its inclusivity aim, the system must have a way to deal with incremental development of grammar specifications without large upfront investments. One shouldn’t have to kill a mosquito with a sledgehammer, first scrambling together the material to build that sledgehammer because there are no instant resources for it. Let’s start with something feasible in the near term and build just enough equipment to get done what’s needed; rolling up a newspaper page will do just fine to kill that mosquito. For instance, don’t demand that the grammar spec must be able to cover, say, all numbers in all possible constructions, but only one specific construction in a specific context. Say, for stating the age of a person provided they’re less than 100 years old, or the numbers related to years, not centuries or millennia, which will be tackled later. Templates are good for specifying such constrained contexts of use; they assist with incremental grammar development and can offer near-instant, concrete user feedback, showing the results of positive contributions.
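
A minimal sketch of such a constrained template, using the age example from the paragraph above (the function name and template syntax are made up for illustration; the actual Abstract Wikipedia template language is richer):

```python
# A template that only has to cover ages 1-99 in one construction,
# so the grammar behind it stays small and incrementally extensible.
def age_sentence(name: str, age: int) -> str:
    if not 0 < age < 100:
        raise ValueError("this template only covers ages 1-99")
    # one construction, one context: no need for a grammar of all
    # numbers in all constructions before anything works at all
    return f"{name} is {age} years old."

print(age_sentence("Jane", 42))  # Jane is 42 years old.
```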

Supporting a template-based approach doesn’t mean that I don’t understand that the sledgehammer may be better in theory – an all-purpose CFG or DG would be wonderful. It’s that I know enough of the facts on the ground to be aware that rolling up a newspaper page suffices for a case and is feasible, unlike the sledgehammer. Let low-resource languages join the party. Devise a novel framework, method, and system that permits incremental development and graceful degradation in the realiser. A nice-to-have on top of that would be automated ‘transformers’ across types of grammars, so we won’t have to start all over again when the next grammar formalism flavour comes along, if it must change at all.

Technical challenges

The theory relates to the second group of challenges, which are of a technical nature. There are lovely systems and frameworks that overconfidently claim to be ‘universal’. Grammars coded up for 40, nay 100, languages, so it must be good and universal, or so the assumption may go. We do want to reuse as much as possible – being resource-constrained and all – but then it never turns out to work off the shelf. From word-based spellcheckers like in OpenOffice that are useless for agglutinating languages, to the Universal Dependencies (UD) framework and accompanying tools that miss useful dependencies, are too coarse-grained at the word level and, up till very recently, were artificially constrained to trees rather than DAGs, up to word-based natural language generation realisers: we have (had) to start from scratch mostly and devise new approaches.

So now we have a template language for Abstract Wikipedia (yay!) that can handle the sub-words (yay!), but then we get whacked like a mole on needing a fully functional Dependency Grammar (and initially UD and trees only) for parsing the template, which we don’t have. The UD framework has to be modified to work for NCB languages – none of those 100 is an NCB language – to allow arcs to be drawn on sub-word fragments or, if on the much less useful words only, to allow more than one incoming arc. It also means we first have to adapt UD annotation tools to get clarity on the matter. And off we must go to do all that before we can sit at that table again? We’ll do a bit, enough for our own systems and for what we need for the use cases.

Sadly, Grammatical Framework is worse, despite there already being a draft partial resource grammar for isiZulu and even though it’s a framework of the CFG flavour of grammars. Unlike for UD, where reading an overview article suffices to get started, that won’t do for GF: there’s a two-week summer school you must attend and a book to read to get anything possibly started. The start-up costs are too high for the vast majority of languages. And remember that the prospective system should be community-driven rather than the experts-only affair that GF is at present. Even if that route is taken, then the grammar is locked into the GF system, inaccessible for any reuse elsewhere, which is not a good incentive when the potential for reuse is important. (UPDATE 20-4-’23: It turned out that GF does have a pg = print_grammar command that exports to a selected other format, as described halfway into the GF shell documentation here.)

The Google.org fellows’ review proposed to adopt an extant NLG system and build on it, possibly including GF: if we could have done that, we would have done so, and I wouldn’t have received any funding for investigating an NLG system for Nguni languages. A long answer on why we couldn’t can be found in Zola Mahlaza’s PhD thesis on foundations for reusable and maintainable surface realisers for isiXhosa and isiZulu, and shorter answers regarding parts of the problems are described in papers emanating from my GeNi and MoreNL research projects. More can be done still to create a better realiser.

The other dimension of the technical aspects is the WMF software ecosystem as it stands at present. For a proof-of-concept to demonstrate the Abstract Wikipedia project’s potential, I don’t care whether that’s with Wikifunctions, with Scribunto, or with a third-party system that can be (near-instantly) copied over onto Wikifunctions once it works as envisioned. Wikidata will need to be beefed up: on speed in SPARQL query answering, on reducing noise in its content, and on the lexemes, to cater for highly inflectional and agglutinating languages. It’s not realistic to make the community add all forms of the words, since there are too many and the interface requires too much clicking around and re-typing when entering lexicographic data manually. Either allow for external precomputation, a human in the loop, and then a batch upload, or assume base forms and link them to a set of rules stored somewhere in order to compute the required form at runtime.
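
A minimal sketch of that last option, with two isiZulu noun class pairs; actual pluralisation involves many more rules and exceptions than this prefix swap suggests:

```python
# Compute plural forms at runtime from base forms plus rules, rather
# than storing every inflected form as a separate lexeme entry.
# Only two noun class pairs shown, and real rules are more involved.
PLURAL_PREFIX = {"umu": "aba",   # class 1 -> class 2: umuntu -> abantu
                 "in": "izin"}   # class 9 -> class 10: inja -> izinja

def pluralise(noun: str) -> str:
    for singular, plural in PLURAL_PREFIX.items():
        if noun.startswith(singular):
            return plural + noun[len(singular):]
    raise ValueError(f"no pluralisation rule for {noun!r}")

print(pluralise("umuntu"))  # abantu, 'humans'
print(pluralise("inja"))    # izinja, 'dogs'
```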

People and society

The third aspect, people, consists of two components: NCB language speakers with their schedules and incentives and, for lack of a better term, colonial peculiarities or sexism, or both. Gender bias issues in content and culture on Wikipedia are amply investigated and documented. Within the context of Abstract Wikipedia, providing examples that are too detailed is awkward to do publicly, and anyhow the plural of anecdote is not data. What I experienced were mostly instances of recurring situations. Therefore, let me generalise some of it and formulate it partially as a reply and way forward, in arbitrary order.

First, the “I don’t know any isiZulu, but…” phrases: factless opinions about the language shouldn’t be deemed more valuable, worthy, and valid just because one works with a well-resourced language in another language family and is more pushy or abrasive. The research we carried out over the past many years really happened and was published in reputable venues. It may be tempting to (over)generalise to other languages once one speaks several, but it’s better to be safe than sorry.

Second, let me remind you that wikis are intended to be edited by the community – and that includes me. I might just continue expressing my indignation at the repeated condescending comments that I couldn’t be allowed to do so because some European white guy’s edits would be unquestionably, naturally superior. As it turned out, it was questionable attitudes from certain people within the broader Abstract Wikipedia team, not the nice white guy who had been merely exploring adding a certain piece of non-trivial information. I went ahead and edited it eventually anyway, but it does make me wonder how often people from outside the typical wiki contributor demographic are actively discouraged from adding content for made-up reasons.

Third, languages evolve and research does happen. The English from 100 years ago is not the same as the English spoken and written today, and that’s the same for most other languages, including low-resourced languages. They’re not frozen in time just because there are fewer computational resources or because they’re too far away for their changes to be seen. Societies change and the languages change with them. No doubt the missionary did his best documenting a language 50-150 years ago, but just because it’s written in a book and he wrote it doesn’t mean that my or my colleagues’ recently published research, which included evaluations with sets of words or sentences, is less valid just because it’s our work and we’re not missionaries (or whatever other reason one invents for why long-gone missionaries’ work takes precedence over anyone else’s contributions).

Fourth, if an existing framework for Indo-European languages doesn’t work for NCB languages, it doesn’t imply we’re all too stupid to grasp that framework. We may indeed not grasp it, but it’s more likely that we do and the framework is too limited for the language (see also above), or it’s too impractical for the lived reality of working with a low-resourced language. Regarding the latter, a [stop whining and] “become more active and just get yourself more resources” isn’t a helpful response, nor is not announcing open calls for WMF-funded projects.

As to human contributions to any component of Abstract Wikipedia, and any wiki project more generally: it’s complex and deserves more unpacking – on incentives to contribute, perceptions of Wikipedia, sociolinguistics, the good plans that get derailed by things that people in affluent countries wouldn’t think could interfere, and then there’s Moses and the mountain.

Practical hurdles

Last, there are practical hurdles that an internationally dominant or darling language does not have to put up with. An example is the unbelievable process of getting a language accepted into the WMF ecosystem as deserving to be one. I’m not muttering about being shoved aside for trying to promote an endangered language that doesn’t have an ISO 639 3-letter code and has only a mere handful of speakers left; even a language with an ISO 639 2-letter code and millions of speakers faces hurdles. Evidence has to be provided. Yes, there are millions of speakers, here’s the census data; yes, there are daily news items on national TV, and look, here are the discussions on Facebook; yes, there are online newspapers with daily updates. It takes weeks if not months, if it happens at all. These are exclusionary practices. We should not have to waste limited time on countering the anti-nondominant-language ‘trolling’ – having to put in extra effort to pass an invisible and arbitrary bar – but, as a minimum, have each ISO-recognised language be granted status as being one. True enough, this suggestion is also not a perfect solution, but at least it’s much more inclusive. Needless to say, this challenge too is not unique to NCB languages. And yes, various Phabricator tickets with language requests have been open for at least one and a half years.

In closing

The practicalities are just one more thing on top of all the rest that makes a fine idea, Abstract Wikipedia for all, smell of entrenching much more deeply the well-documented biassed tendencies of wikis. I tried, and try, to push back. The issues are complex, both theoretical and technical as well as of people and praxis. They hold for NCB languages as well as many others.

Abstract Wikipedia aims to build a multilingual Wikipedia, and the back-end technology that it requires may have been a rather big bite for the Wikimedia Foundation to chew on. If it is serious about the inclusivity, it will have to be the ‘many flowers’ approach on top of the constructors to generate the text, as well as gradual expansion of the natural language generation features during runtime, an expansion that will be paced differently according to each language’s resources, not unlike how each Wikipedia has its own pace of growth. From the one-step-at-a-time perspective, even basic sentences in a short paragraph for a Wikipedia article are an infinite improvement over no article at all, and they invite contributions more than creating a new article from scratch does. The bar for making Abstract Wikipedia successful does not necessarily need to be, say, ‘to surpass English articles’.

The mountain we’ll keep climbing, be it with or without the Abstract Wikipedia project. If Abstract Wikipedia is to become a reality and flourish for many languages soon, it needs to allow for molehills, anthills, dykes, dunes, and hills as well, and with whatever flowers available to set it up and make it grow.

ChatGPT, deep learning and the like do not make ontologies (and the rest of AI) obsolete

Countless articles have announced the death of symbolic AI, which includes, among others, ontology engineering, in favour of data-driven AI with deep learning, even more loudly so since large language model-based apps like ChatGPT captured the public’s attention and imagination. There are those who don’t even realise there is more to AI than deep learning with neural networks. But there is; have a look at the ACM Computing Classification or scroll down to the screenshots at the end of this post if you’re unaware of that. With all the hype and narrow focus, doom and gloom is being predicted, with a new AI winter on the cards. But is it? It’s not like we all ditched mathematics at school when portable calculators became cheap gadgets, so why would AI be different now, with machine and deep learning, Large Language Models (LLMs), and an app that attracts attention? Let me touch upon a few examples to illustrate that ontologies have not become obsolete, nor will they.

How exactly do you think data integration is done? Maybe ChatGPT can tell you what’s involved, superficially, but it won’t actually do it for you. Consider, for instance, a paper published earlier this month on finding clusters of long Covid patient symptoms [Reese23], described in a press release: they obtained data on 20,532 relevant patients from 38 (!!) data partners, where the authors mapped the clinical findings taken from the electronic health records “to computable terms contained in the Human Phenotype Ontology (HPO), a standard framework for describing human traits … This allowed the researchers to analyze the data across the entire cohort.” (italics are mine). Here’s an illustration of the idea:

Diagram demonstrating how the Human Phenotype Ontology is used for semantic comparisons of electronic health record data to find long covid clusters. (Source: [Reese23] at https://www.thelancet.com/cms/attachment/d7cf87e1-556f-47c0-ae4b-9f5cd8c39b50/gr2.jpg)

Could reliable data integration possibly be done by LLMs? No, not even in the future. NLP with electronic health records is an option, true, but it won’t harmonise terminology for you, nor will it integrate different electronic health record systems.
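
To make concrete what that ontology-mediated harmonisation buys you, a minimal sketch; the local term strings and mappings are invented here, and the HPO-style identifiers are illustrative placeholders rather than vetted codes:

```python
# Each data partner maps its local clinical terms to shared ontology
# identifiers; only then do records from different sites become comparable.
SITE_A_TERMS = {"tiredness": "HP:0000001"}           # illustrative ID
SITE_B_TERMS = {"chronic fatigue": "HP:0000001",
                "loss of smell": "HP:0000002"}       # illustrative ID

def harmonise(term_map: dict, record: list) -> set:
    """Translate a site's local record into shared ontology terms."""
    return {term_map[t] for t in record if t in term_map}

a = harmonise(SITE_A_TERMS, ["tiredness"])
b = harmonise(SITE_B_TERMS, ["chronic fatigue", "loss of smell"])
print(a & b)  # cross-site overlap is now computable: {'HP:0000001'}
```

The real pipeline in [Reese23] is far more sophisticated, using the HPO hierarchy for semantic comparison rather than exact identifier matches, but the principle of mapping heterogeneous sources onto one ontology is the same.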

LLMs aren’t good at playing with data in the myriad of ways where ontologies are used to power ‘intelligent’ applications. Take data that’s generated in the automation of scientific experiments, for instance: cell types in the brain need to be annotated and processed to try to find new cell types, and the annotations with those new types are then used downstream in queries and further analysis [Tan23]. There is no new stuff in off-the-shelf LLMs, so they can’t help; ontologies can – and do. Ontologies are used and extended as needed to document the new ground truth, which won’t ever be replaced by LLMs, nor by the approximations that machine learning’s outputs are.

What about intelligent analysis of real-time data? Those LLMs won’t be of assistance there either. Take, e.g., energy-optimised building systems control: the system takes real-time data that is linked to an ontology and then it can automatically derive energy conservation measures for the building and its use [Pruvost22].

Much has been written on ChatGPT and education. It’s an application domain that permits no mistakes on the teaching side of it and, in fact, demands vetted quality. There are many tasks, from content presentation to assessment. ChatGPT can generate quiz questions, indeed, but only on general knowledge. It can generate a response as well, but whether that will be a correct answer is another matter altogether. We also need other types of educational questions besides MCQs, in many disciplines, on specific texts and textbooks with their particular vocabulary, and we need the answer computed for automated marking. Computing correct questions and answers can be done with ontologies and some basic automated reasoning services [Raboanary22]. One obtains precision with ontologies that cannot be had with probabilistic guessing. Or take the Foundational Model of Anatomy ontology as a concrete example, which is used to manage the topics in anatomy classes augmented with VR [Soergel22]. Ontologies can also be used as a method of teaching, in art history no less, to push students to dig into the details and be precise [Bertens22] – the opposite of the bland, handwavy, roughly, sort of, non-committal, and fickle responses ChatGPT provides, at times, to open questions.
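
For a flavour of the question generation idea, a minimal sketch; this is the general principle, not the actual method of [Raboanary22], and the tiny ‘ontology’ with its animal examples is made up here:

```python
# Generate a true/false question from a subsumption axiom and compute
# its answer from the ontology, rather than guessing probabilistically.
AXIOMS = {("Lion", "Carnivore"), ("Giraffe", "Herbivore")}  # (sub, sup)

def tf_question(sub: str, sup: str):
    question = f"True or false: every {sub.lower()} is a {sup.lower()}."
    # 'reasoning' is a mere lookup in this toy; a real system asks an
    # automated reasoner, so entailed-but-unstated answers come out right too
    answer = (sub, sup) in AXIOMS
    return question, answer

print(tf_question("Lion", "Carnivore"))   # (..., True)
print(tf_question("Lion", "Herbivore"))   # (..., False)
```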

These are just a few application examples that I lazily came across in the timespan of a mere 15 minutes (including selecting them) – one via the LinkedIn timeline, some via a GS search on “ontologies” filtered to “since 2022” (17300 results this morning) and clicking a few links that sounded appealing, and one I’m involved in.

This post is not a cry of desperation before sinking but, rather, mainly one of annoyance. Technology blinkers of any kind are no good, and one had better have more than just a hammer in one’s toolbox. Not everything can be solved by LLMs and deep learning, and Knowledge Representation (& Reasoning) is not dead. It may have been elbowed to the side by the new kids on the block. I suspect that those in the ‘symbolic AI is obsolete’ camp simply aren’t aware – or would like to pretend not to be aware – of the many different AI-driven computing tasks that need to be solved and implemented. Tasks for which there are no humongous amounts of text or non-text data to grab and learn from. Tasks that are not tolerant to outputs that are noisy or plain wrong. Tasks that require current data, not stale stuff from over a year ago or longer. Tasks where past data are not a good predictor for the future. Tasks in specialised domains. Tasks that are quirky to a locale. And so on. The NLP community has already recognised that LLMs’ outputs need fixing, which I was pleasantly surprised by when I attended EMNLP’22 in December (see my EMNLP’22 trip report for a few pointers).

Also, casting the net a little wider, our academic year is about to start, when students need to choose projects and courses, including, among others, another installment of ontology engineering, logic for AI, computer vision, and so on. Perhaps this post might assist in choosing, and in reflecting that computing as a whole is not going to be obsolete either. ChatGPT and Copilot can probably pass our 1st-year practical assignments, but there’s so much more to computing beyond that, which relies on students understanding the foundations and problem-solving methods. Why should the whole rest of AI, and even computing as a discipline, become obsolete the instant a tool can, at best, regurgitate the known coding solutions to common basic tasks? There are still mathematicians notwithstanding all the devices more powerful than a pocket calculator, and there are linguists regardless of the free availability of Google Translate’s services; so why would software engineers not remain when there’s a code-completion tool for basic tasks?

Perhaps you still do not care about ontologies and knowledge representation & reasoning. That’s fine; everyone has their interests – just don’t confound new interests for obsolescence of established topics. In case you do want to know more about ontologies and ontology engineering: you may like to have a look at my award-winning open textbook, with exercises, tools, and slides.

p.s.: here are those screenshots on the ACM classification and AI, annotated:

References

[Bertens22] Bertens, L. M. F. Modeling the art historical canon. Arts and Humanities in Higher Education, 2022, 21(3), 240-262.

[Pruvost22] Pruvost, Hervé and Olaf Enge-Rosenblatt. Using Ontologies for Knowledge-Based Monitoring of Building Energy Systems. Computing in Civil Engineering 2021. American Society of Civil Engineers, 2022, pp. 762-770.

[Raboanary22] Raboanary, T., Wang, S., Keet, C.M. Generating Answerable Questions from Ontologies for Educational Exercises. 15th Metadata and Semantics Research Conference (MTSR’21). Garoufallou, E., Ovalle-Perandones, M-A., Vlachidis, A. (Eds.). 2022, Springer CCIS vol. 1537, 28-40.

[Reese23] Reese, J. et al. Generalisable long COVID subtypes: findings from the NIH N3C and RECOVER programmes. eBioMedicine, Volume 87, 104413, January 2023.

[Soergel22] Soergel, Dagobert, Olivia Helfer, Steven Lewis, Matthew Wysocki, David Mawer. Using Virtual Reality & Ontologies to Teach System Structure & Function: The Case of Introduction to Anatomy. 12th International Conference on the Future of Education 2022. 1 July 2022.

[Tan23] Tan, S.Z.K., Kir, H., Aevermann, B.D. et al. Brain Data Standards – A method for building data-driven cell-type ontologies. Scientific Data, 2023, 10, 50.

EMNLP’22 trip report: neuro-symbolic approaches in NLP are on the rise

The trip to the Empirical Methods in Natural Language Processing 2022 conference is certainly one I’ll remember. The conference had well over 1000 in-person attendees for what they could catch of the 6 tutorials and 24 workshops on Wednesday and Thursday, and then the 175 oral presentations, 654 posters, 3 keynotes, a panel session, and 10 Birds of a Feather sessions on Friday-Sunday, which was topped off with a welcome reception and a social dinner. The open-air dinner was on the one day in the year that it rains in the desert! More precisely on the venue: that was the ADNEC conference centre in Abu Dhabi, from 7 to 11 December.

With so many parallel sessions, it was not always easy to choose. Although I expected many presentations about just large language models (LLMs), which I’m not particularly interested in from a research perspective, it turned out to be very well possible to find a straight road through the parallel NLP sessions with research that had at least added an information-based or a knowledge-based approach to do NLP better. Ha! NLP needs structured data, information, and knowledge to mitigate, among other problems, the hallucinations in natural language generation – elsewhere called “fluent bullshit” – that those LLMs suffer from. Adding a symbolic approach into the mix turned out to be a recurring theme in the conference. Some authors tried to hide a rule-based approach or were apologetic about it, so ‘hot’ the topic is not just yet, but we’ll get there. In any case, it worked much better for my one-liner intro to state that I’m into ontologies and have been branching out to NLG than to say I’m into NLG for African languages. Most people I met had heard of ontologies or knowledge graphs, whereas African languages mostly drew a blank expression.

It was hard to choose what to attend, especially on the first day, but eventually I participated in part of the second workshop on Natural Language Generation, Evaluation, and Metrics (GEM’22), NLP for Positive Impact (NLP4PI’22), and Data Science with Human-in-the-Loop (DaSH’22), and walked into a few more poster sessions of other workshops. The main conference had eight sessions in parallel in each timeslot; I chose the semantics one, ethics, NLG, commonsense reasoning, speech and robotics grounding, and the Birds of a Feather sessions on ethics and on code-switching. I’ve structured this post by topic rather than by type of session or actual session, however, in the following order: NLP with structured stuff, ethics, a basket with other presentations that were interesting, NLP for African languages, the two BoF sessions, and a few closing remarks. I did at least skim over the papers associated with the presentations referenced here, and so any errors in discussing the works are still mine. Logistically, the links to the papers in this post are a bit iffy: about 900 EMNLP + workshops papers were already on arXiv according to the organisers, and 828 papers of the main conference are being ingested into the ACL Anthology, so its permanent URL is not functional yet; hence my linking practice was inconsistent and may suffer link rot. Be that as it may, let’s get to the science.

The entrance of the conference venue, ADNEC in Abu Dhabi, at the end of the first workshop and tutorials day.

NLP with at least some structured data, information, or knowledge and/or reasoning

I’ve tried to structure this section, roughly going from little addition of structured stuff to more, and then from less to more inferencing.

The first poster session I attended on the first day was that of the NLP4PI workshop; it was supposed to last 1 hour, but after 2.5 hours it was still well-attended. I also passed by the adjacent machine translation session (WMT’22), which also paid off. There were several posters there that were of interest to my inclination toward knowledge engineering. Abhinav Lalwani presented a Findings paper on Logical Fallacy Detection in the NLP4PI’22 poster session, which was interesting both for the computer ethics that I have to teach and for their method: create a dataset of 2449 fallacies of 13 types that were taken from online educational resources, machine-learn templates from those sentences – which they call generating a “structure-aware model” – and then use those templates to find new fallacies in the wild, which in this case meant climate change claims [1]. Their dataset and code are available on GitHub. The poster presented by Lifeng Han from the University of Manchester was part of WMT’22: their aim was to see whether a generic LLM would do better or worse than smaller in-domain language models enhanced with clinical terms extracted from biomedical literature and electronic health records and from class names of (unspecified in the paper) ontologies. The smaller models win, and terms or concepts may win depending on the metric used [2].

For the main conference, and unsurprisingly for a session called “semantics”, it wasn’t just about LLMs. The first paper was about Structured Knowledge Grounding, of which the tl;dr is that SQL tables and queries improve on the ‘state of the art’ of just GPT-3 [3]. The Reasoning Like Program Executors paper aims to fix the nonsensical numerical output of LLMs by injecting small programs/code for sound numerical reasoning – among the reasoning types that LLMs are incapable of – and is successful at doing so [4]. And there was a paper on using WordNet for sense retrieval in the context of word in/vs context use, and on discovering that the human evaluators were less biassed than the language model [5].

The commonsense reasoning session also – inevitably, I might add – had papers that combined techniques. The first paper of the session looked into the effects of injecting external knowledge (Comet) to enhance question answering, which is generally positive, and more positive for smaller models [6]. I also have in my notes that they developed an ontology of knowledge types, and the paper text claims so as well, but it is missing from the paper, unless they are referring to the five terms in its table 6.

I also remember seeing a poster on using Abstract Meaning Representation. Yes, indeed, and there turned out to be a place for it: text style transfer, i.e., converting a piece of text from one style into another. The text-to-AMR + AMR-to-text model T-STAR beat the state of the art with a 15% increase in content preservation without substantive loss of accuracy (3%) [7].

Moving on to rules and more or less reasoning: first, at the NLP4PI’22 poster session, there was a poster on “Towards Countering Essentialism through Social Bias Reasoning”, which was presented by Maarten Sap. They took a very interdisciplinary approach, mixing logic, psychology, and cognitive science to get the job done, and the whole system was entirely rules-based. The motivation was to find a way to assist content moderators by generating possible replies to counter prejudiced statements in online comments. They generated five types of replies and asked users which one they preferred. Types of generated replies include, among others, computing exceptions to the prejudice (e.g., an individual in the group who does not have that trait), attributing the trait also to other groups, and a generic statement on tolerance. Bland seemed to work best. I tried to find the paper for details, but was unsuccessful.

The DaSH’22 presentation about WaNLI concerned the creation of a dataset and pipeline to have crowdsourcing workers and AI “collaborate” in dataset creation, which had a few rules sprinkled into the mix [8]. It turns out that humans are better at revising and evaluating than at creating sentences from scratch, so the pipeline takes that into account. First, from a base set, it uses NLG to generate complement sentences, which are filtered and then reviewed and possibly revised by humans. Complement sentence generation (the AI part) involves taking sentence pairs like “5% chance that the object will be defect free” + “95% chance that the object will have defects” to then generate (with GPT-3, in this case) candidate sentence pairs such as “1% of the seats were vacant” + “99% of the seats were occupied”, using encoded versions of the principles of entailment and set complement, among the reasoning cases used.
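
The set-complement part of that generation step boils down to something like the following sketch (my simplification of the idea behind the example pairs, not WaNLI’s actual pipeline, which uses GPT-3 rather than string templates):

```python
# Generate a complementary sentence pair from a percentage statement,
# encoding the set-complement principle: p% are X <-> (100-p)% are not-X.
def complement_pair(entity: str, pct: int, prop: str, antonym: str):
    return (f"{pct}% of the {entity} were {prop}",
            f"{100 - pct}% of the {entity} were {antonym}")

print(complement_pair("seats", 1, "vacant", "occupied"))
# ('1% of the seats were vacant', '99% of the seats were occupied')
```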

Turning up the reasoning a notch, Sean Welleck of the University of Washington gave the keynote at GEM’22. His talk consisted of two parts: on unlearning bad behaviour of LLMs, and then an early attempt at a neuro-symbolic approach. The latter concerned connecting an LLM’s output to some logic reasoning. He chose Isabelle, of all reasoners, as a way to get it to check and verify the hallucinations (the nonsense) the LLMs spit out. I asked him why he chose a reasoner for an undecidable language, but the response was not a direct answer. It seemed that he liked the proof trace but was unaware of the undecidability issues. Maybe there’s a future for Description Logics reasoners here. Elsewhere, and hidden behind a paper title that mentions language models, lies the reality of the ConCoRD relation detection for “boosting consistency of pre-trained language models” with a MAX-SAT solver in the toolbox [9].

Impression of the NLP4PI’22 poster session 2.5h into the 1h session timeslot.

There are (many?) more relevant presentations that I did not get around to attending, such as on dynamic hierarchical reasoning that uses both a LM and a knowledge graph for their scope of question answering [10], a unified representation for graph query languages, GraphQ IR [11], on RoBERTa, T5, and GPT-3 having problems especially with deductive reasoning involving negation [12], and on PLOG table-to-logic to enhance table-to-text. Open the conference program handbook and search on things like “commonsense reasoning” or NLI, where the I isn’t an abbreviation of Interface but rather of Inference, and there’s even neural-symbolic inference for graph parsing. The compound term “knowledge graph” has 84 mentions and “reasoning” has 244 mentions. There are also four papers with “grammar induction”, two papers with CFGs, and one with a construction grammar.

It was a pleasant surprise not to be entirely swamped by the “stats/NN + automated metric” formula. I fancy thinking it’s an indication that the frontiers of NLP research have already grown out of that and are adding knowledge into the mix.

Ethics and computational social science

Of course, the previously-mentioned topic of trying to fix hallucinations and issues with reasoning and logical coherence of what the language models spit out implies researchers know there’s a problem that needs to be addressed. That is a general issue. Specific ones are unique in their own way; I’ll mention three. Inna Lin presented work on gendered mental health stigma and potential downstream issues with health chatbots that would rely on such language models [13]. For instance, women were more likely to be recommended to seek professional help and men to toughen up and get on with it. The GeoMLAMA dataset showed that not everything is as bad as one might suspect. The dataset was created to probe multilingual Pre-Trained Language Models on cultural commonsense knowledge, like which colour the dress of the bride typically is. The authors selected English, Chinese, Hindi, Persian, and Swahili. Evaluation showed that multilingual PLMs are not biassed toward the USA, that the native language of a country may not be the best language to probe its knowledge (as the commonsense isn’t explicitly stated), and that a language may better probe knowledge about a nonnative country than its native country [14]. The third paper is more about working on a mechanism to help NLP ethics: modelling information change in science communication. The scientist or the press release says one thing, which gets altered slightly in a popular science article, and then morphs into tweets and toots with yet another, different, message. More distortion occurs in the step from popsci article to tweet than from scientist to popsci article. What sort of distortion, or ‘not as faithful as one would like’? Notably, “journalists tend to downplay the certainty and strength of findings from abstracts” and “limitations are more likely to be exaggerated and overstated” [15].

In contrast, Fatemehsadat Mireshghallah showed some general ethical issues with the very LLMs in her lively presentation. They are so large and have so many parameters that what they end up doing is more akin to memorising text and outputting that memorised text, rather than generating text de novo [16]. She focussed on potential privacy issues, where such models may output sensitive personal data. It also applies to copyright infringement issues: if they return chunks of already existing text, say, a paragraph from this blog, it would be copyright infringement, since I hold the copyright on it by default and I licensed it CC-BY-NC-SA, which those large LLMs do not adhere to, nor do they credit me. Copilot is already facing a class action lawsuit for unfairly reusing open source code without having obtained permission. In both cases, there’s the question, or task, of removing pieces of text and retraining the model, or not, as well as how to know whether your text was used to create the model. I recall seeing something about that in the presentations and we had some lively discussions about it as well, leaning toward a remove & re-train and suspecting that’s not what’s happening now (except at IBM, apparently).

Last, but not least, on this theme: the keynote by Gary Marcus turned out to be a pre-recorded one. It was mostly a popsci talk (see also his recent writings here, among others) on the dangers of those large language models, with plenty of examples of problems with them that have been posted widely recently.

Noteworthy “other” topics

The ‘other’ category in ontologies may be dubious, but here it is not meant as such – I just didn’t have enough material or time to write more about them in this post, but they deserved a mention nonetheless.

The opening keynote of the EMNLP’22 conference by Neil Cohn was great. His main research is in visual languages, those in comic books in particular. He raised some difficult-to-answer questions and topics. For instance, is language multimodal – vocal, body, graphic – and are gestures separate from, alongside, or part of language? Or take the idea of abstract cognitive principles as a basis for both visual and textual language, the hypothesis of “true universals” that should span across modalities, and the idea of “conceptual permeability”, on whether the framing in one modality of communication affects the others. He also talked about cross-cultural diversity in those structures of visual languages, of comic books at least. It almost deserves to be in the “NLP + symbolic” section above, for the grammar he showed and the attempt to add theory into the mix, rather than just more LLMs and automated evaluation scores.

The other DaSH paper that I enjoyed, after the aforementioned Wanli, was the Cheater’s Bowl, where the authors tried to figure out how humans cheat in online quizzes [17]. Compared to automated open-domain question-answering, humans use fewer keywords more effectively, use more world knowledge to narrow searches, use dynamic refinement and abandonment of search chains, have multiple search chains, and do answer validation. Also in the workshop days setting, I somehow walked into a poster session of the BlackboxNLP’22 workshop on analysing and interpreting neural networks for NLP. Priyanka Sukumaran enthusiastically talked about her research on how LSTMs handle (grammatical) gender [18]. They wanted to know whereabouts in the LSTM a certain grammatical feature is dealt with; and they found out, at least for gender agreement in French. The ‘knowledge’ is encoded in just a few nodes, and the network does better on longer than on shorter sentences, since it then can use more of the other cues in the sentence, including gendered articles, to figure out the M/F needed for constructions like noun-adjective agreement. That is not unlike the way humans do it, but then, algorithms do not need to copy human cognitive processes.

NLP4PI’s keynote was given by Preslav Nakov, who recently moved to the Mohamed Bin Zayed University of AI. He gave an interesting talk about fake news and mis- and dis-information detection, and differentiated those from propaganda detection, which, in turn, consists of emotion and logical fallacy detection. If I remember correctly, not with knowledge-based approaches either, but interesting nonetheless.

I had more papers marked for follow-up, including on text generation evaluation [19], but this post is already becoming very long as it is.

Papers with African languages, and Niger-Congo B (‘Bantu’) languages in particular

Last, but not least, something on African languages. There were a few papers. Some had it clearly in the title; others not at all, but they used at least one African language in their dataset. The list here is thus incomplete and merely reflects what I came across.

On the first day, as part of NLP4PI, there was also a poster on participatory translations of Oshiwambo, a language spoken in Namibia, which was presented by Jenalea Rajab from Wits and Millicent Ochieng from Microsoft Kenya, both with the Masakhane initiative; the associated paper seems to have been presented at the ICLR 2022 Workshop on AfricaNLP. Also within the Masakhane project is the progress on named entity recognition [20]. My UCT colleague Jan Buys also had papers with poster presentations, together with two of his students, Khalid Elmadani and Francois Meyer. One was part of WMT’22, on multilingual machine translation for African languages [21], and another on sub-word segmentation for Nguni languages (EMNLP Findings) [22]. The authors of AfroLID report some 96% accuracy on identifying a whopping 517 African languages, which sounds very impressive [23].

Birds of a Feather sessions

The BoF sessions seemed to be loosely organised discussions and exchanges of ideas about a specific topic. I tried out the Ethics and NLP one, organised by Fatemehsadat Mireshghallah, Luciana Benotti, and Patrick Blackburn, and the code-switching & multilinguality one, organised by Genta Winata, Marina Zhukova, and Sudipta Kar. Both sessions were very lively and constructive, and I can recommend going to at least one of them the next time you attend EMNLP, or organising something like it at a conference. The former had specific questions for discussion, such as on the reviewing process and on that required ethics paragraph; the latter had themes, including datasets and models for code-switching and metrics for evaluation. For ethics, there seems to be a direction to head toward, whereas NLP for code-switching seems to be still very much in its infancy.

Final remarks

As if all that wasn’t keeping me busy already, there were lots of interesting conversations, meeting people I hadn’t seen in many years, including Barbara Plank, who finished her undergraduate studies at FUB when I was a PhD student there (focussing on ontologies rather, as I still do), and likewise for Luciana Benotti (who had started her European Masters at that time, also at FUB); people with whom I had emailed before but not met due to the pandemic; and new introductions. There was a reception and an open-air social dinner; an evening off meeting an old flatmate from my first degree and a soccer watch party seeing Argentina win; and half a day off after the conference to bridge the wait for the bus to leave, which time I used to visit the mosque (it doubles as a worthwhile tourist attraction), chat with other attendees hanging around for their evening travels, and start writing this post.

Will I go to another EMNLP? Perhaps. Attendance was most definitely very useful, some relevant research outputs I do have, and there’s cookie dough and buns in the oven, but I’d first need a few new bucketloads of funding to be able to pay for the very high registration cost that comes on top of the ever increasing travel expenses. EMNLP’23 will be in Singapore.

References

[1] Zhijing Jin, Abhinav Lalwani, Tejas Vaidhya, Xiaoyu Shen, Yiwen Ding, Zhiheng Lyu, Mrinmaya Sachan, Rada Mihalcea, Bernhard Schölkopf. Logical Fallacy Detection. EMNLP’22 Findings.

[2] L. Han, G. Erofeev, I. Sorokina, S. Gladkoff, G. Nenadic. Examining Large Pre-Trained Language Models for Machine Translation: What You Don’t Know About It. 7th Conference on Machine Translation at EMNLP’22.

[3] Tianbao Xie, Chen Henry Wu, Peng Shi, Ruiqi Zhong, Torsten Scholak, Michihiro Yasunaga, Chien-Sheng Wu, Ming Zhong, Pengcheng Yin, Sida I. Wang, Victor Zhong, Bailin Wang, Chengzu Li, Connor Boyle, Ansong Ni, Ziyu Yao, Dragomir Radev, Caiming Xiong, Lingpeng Kong, Rui Zhang, Noah A. Smith, Luke Zettlemoyer and Tao Yu. UnifiedSKG: Unifying and Multi-Tasking Structured Knowledge Grounding with Text-to-Text Language Models. EMNLP’22.

[4] Xinyu Pi, Qian Liu, Bei Chen, Morteza Ziyadi, Zeqi Lin, Qiang Fu, Yan Gao, Jian-Guang LOU and Weizhu Chen. Reasoning Like Program Executors. EMNLP’22

[5] Qianchu Liu, Diana McCarthy and Anna Korhonen. Measuring Context-Word Biases in Lexical Semantic Datasets. EMNLP’22

[6] Yash Kumar Lal, Niket Tandon, Tanvi Aggarwal, Horace Liu, Nathanael Chambers, Raymond Mooney and Niranjan Balasubramanian. Using Commonsense Knowledge to Answer Why-Questions. EMNLP’22

[7] Anubhav Jangra, Preksha Nema and Aravindan Raghuveer. T-STAR: Truthful Style Transfer using AMR Graph as Intermediate Representation. EMNLP’22

[8] A. Liu, S. Swayamdipta, N.A. Smith, Y. Choi. WANLI: Worker and AI Collaboration for Natural Language Inference Dataset Creation. DaSH’22 at EMNLP 2022.

[9] Eric Mitchell, Joseph Noh, Siyan Li, Will Armstrong, Ananth Agarwal, Patrick Liu, Chelsea Finn and Christopher Manning. Enhancing Self-Consistency and Performance of Pre-Trained Language Models through Natural Language Inference. EMNLP’22

[10] Miao Zhang, Rufeng Dai, Ming Dong and Tingting He. DRLK: Dynamic Hierarchical Reasoning with Language Model and Knowledge Graph for Question Answering. EMNLP’22

[11] Lunyiu Nie, Shulin Cao, Jiaxin Shi, Jiuding Sun, Qi Tian, Lei Hou, Juanzi Li, Jidong Zhai. GraphQ IR: Unifying the Semantic Parsing of Graph Query Languages with One Intermediate Representation. EMNLP’22

[12] Soumya Sanyal, Zeyi Liao and Xiang Ren. RobustLR: A Diagnostic Benchmark for Evaluating Logical Robustness of Deductive Reasoners. EMNLP’22

[13] Inna Lin, Lucille Njoo, Anjalie Field, Ashish Sharma, Katharina Reinecke, Tim Althoff and Yulia Tsvetkov. Gendered Mental Health Stigma in Masked Language Models. EMNLP’22

[14] Da Yin, Hritik Bansal, Masoud Monajatipoor, Liunian Harold Li, Kai-Wei Chang. Geo-Diverse Commonsense Probing on Multilingual Pre-Trained Language Models. EMNLP’22

[15] Dustin Wright, Jiaxin Pei, David Jurgens, Isabelle Augenstein. Modeling Information Change in Science Communication with Semantically Matched Paraphrases. EMNLP’22

[16] Fatemehsadat Mireshghallah, Archit Uniyal, Tianhao Wang, David Evans and Taylor Berg-Kirkpatrick. An Empirical Analysis of Memorization in Fine-tuned Autoregressive Language Models. EMNLP’22

[17] Cheater’s Bowl: Human vs. Computer Search Strategies for Open-Domain QA. DaSH’22 at EMNLP2022.

[18] Priyanka Sukumaran, Conor Houghton, Nina Kazanina. Do LSTMs See Gender? Probing the Ability of LSTMs to Learn Abstract Syntactic Rules. BlackboxNLP’22 at EMNLP 2022, 7-11 Dec 2022, Abu Dhabi, UAE. arXiv:2211.00153

[19] Ming Zhong, Yang Liu, Da Yin, Yuning Mao, Yizhu Jiao, Pengfei Liu, Chenguang Zhu, Heng Ji and Jiawei Han. Towards a Unified Multi-Dimensional Evaluator for Text Generation. EMNLP’22

[20] David Adelani, Graham Neubig, Sebastian Ruder, Shruti Rijhwani, Michael Beukman, Chester Palen-Michel, Constantine Lignos, Jesujoba Alabi, Shamsuddeen Muhammad, Peter Nabende, Cheikh M. Bamba Dione, Andiswa Bukula, Rooweither Mabuya, Bonaventure F. P. Dossou, Blessing Sibanda, Happy Buzaaba, Jonathan Mukiibi, Godson Kalipe, Derguene Mbaye, Amelia Taylor, Fatoumata Kabore, Chris Chinenye Emezue, Anuoluwapo Aremu, Perez Ogayo, Catherine Gitau, Edwin Munkoh-Buabeng, Victoire Memdjokam Koagne, Allahsera Auguste Tapo, Tebogo Macucwa, Vukosi Marivate, Mboning Tchiaze Elvis, Tajuddeen Gwadabe, Tosin Adewumi, Orevaoghene Ahia and Joyce Nakatumba-Nabende. MasakhaNER 2.0: Africa-centric Transfer Learning for Named Entity Recognition. EMNLP’22

[21] Khalid Elmadani, Francois Meyer and Jan Buys. University of Cape Town’s WMT22 System: Multilingual Machine Translation for Southern African Languages. WMT’22 at EMNLP’22.

[22] Francois Meyer and Jan Buys. Subword Segmental Language Modelling for Nguni Languages. Findings of EMNLP, 7-11 December 2022, Abu Dhabi, United Arab Emirates.

[23] Ife Adebara, AbdelRahim Elmadany, Muhammad Abdul-Mageed and Alcides Inciarte. AfroLID: A Neural Language Identification Tool for African Languages. EMNLP’22

“Grammar infused” templates for NLG

It’s hardly ever entirely one extreme or the other in natural language generation and controlled natural languages. Rarely can one get away with simplistic ‘just fill in the blanks’ templates that do not do any grammar or phonological processing to make the output better; our technical report about work done some 17 years ago is a case in point on the limitations thereof, if one still needs to be convinced [1]. But where does NLG start? I agree with Ehud Reiter that it isn’t about template versus NLG, but a case of levels of sophistication: the fill-in-the-blank templates definitely don’t count as NLG and full-fledged grammar-only systems definitely do, with anything in-between a grey area. Adding word-level grammatical functions to templates makes them lean toward NLG, or even indeed be NLG if there are relatively many such rules, and dynamically creating nicely readable sentences with aggregation and connectives counts as NLG for sure, too.
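To illustrate those levels of sophistication with a minimal sketch – the names and the naive English pluralisation rule are made up for illustration only:

# a fill-in-the-blank template: no grammar processing whatsoever
def fill_in_the_blank(name: str, n: int) -> str:
    return f"{name} wrote {n} books"   # 'Ada wrote 1 books' can happen

# a word-level grammatical function (naive English pluralisation)
def pluralise(noun: str, n: int) -> str:
    return noun if n == 1 else noun + "s"

# the same template, but with the noun slot mediated by the rule
def grammar_infused(name: str, n: int) -> str:
    return f"{name} wrote {n} {pluralise('book', n)}"

print(fill_in_the_blank("Ada", 1))  # Ada wrote 1 books -- ungrammatical
print(grammar_infused("Ada", 1))    # Ada wrote 1 book

Real languages need much more than such a single toy rule, which is where the spectrum toward full NLG comes in.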

With that in mind, we struggled with how to name the beasts we had created for generating sentences in isiZulu [2], a Niger-Congo B language: nearly every resultant word in the generated sentences required a number of grammar rules to make it render sufficiently well (i.e., at least grammatically acceptable and understandable). Since we didn’t have a proper grammar engine yet, but we knew they could never be fill-in-the-blank templates either, we dubbed them verbalisation patterns. Most systems (by number of systems) use either only templates or templates+grammar, so our implemented system [3] was in good company. It may sound like oldskool technology, but go ask Meta with their Galactica whether an ML/DL-based approach is great for generating sensible text that doesn’t hallucinate… and does it well for languages other than English.

That said, honestly, those first attempts we did for isiZulu were not ideal for reusability and maintainability – that was not the focus – and it opened up another can of worms: how do you link templates to (partial) grammar rules? With the ‘partial’ motivated by taking it one step at a time in grammar engine development, as a sort of agile engine development process that is relevant especially for languages that are not well-resourced.

We looked into this recently. There turn out to be three key mechanisms for linking templates to computational grammar rules: embedding (E), where grammar rules are mixed with the template specifications and are therewith co-dependent, and compulsory (C) and partial (P) attachment, where the grammar rules have, or can have, an independent existence.

Attachment of grammar rules (that can be separated) vs embedding of grammar rules in a system (intertwined with templates) (Source: [6])

The difference between the latter two is subtle but important for use and reuse of grammar rules in the software system and the NLG-ness of it: if each template must use at least one rule from the set of grammar rules and each rule is used somewhere, then the set of rules is compulsorily attached. Conversely, it is partially attached if there are templates in that system that don’t have any grammar rules attached. Whether it is partial because it’s not needed (e.g., the natural language’s grammar is pretty basic) or because the system is on the fill-in-the-blank not-NLG end of the spectrum, is a separate question, but for sure the compulsory one is more on the NLG side of things. Also, a system may use more than one of them in different places; e.g., EC, both embedding and compulsory attachment. This was introduced in [4] in 2019 and expanded upon in a journal article entitled Formalisation and classification of grammar and template-mediated techniques to model and ontology verbalisation [5] that was published in IJMSO, and even more detail can be found in Zola Mahlaza’s recently completed PhD thesis [6]. These papers have various examples, illustrations how to categorise a system, and why one system was categorised in one way and not another. Here’s a table with several systems that combine templates and computational grammar rules and how they are categorised:

Source: [5]
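In code, the difference may look as follows; this is a toy illustration with English pluralisation again, not an excerpt from any of the systems in the table:

# Embedding (E): the rule is written into the template itself, so the
# two are co-dependent and the rule can't be reused elsewhere.
def template_embedded(n: int) -> str:
    return f"{n} item" + ("" if n == 1 else "s")

# Attachment (C/P): the rules live in an independent, reusable set...
RULES = {"plural": lambda noun, n: noun if n == 1 else noun + "s"}

# ...that a template merely references.
def template_attached(n: int) -> str:
    return f"{n} {RULES['plural']('item', n)}"

# Compulsory (C): every template uses at least one rule from the set and
# every rule is used somewhere. Partial (P): some templates, like this
# one, use no rule at all.
def template_plain() -> str:
    return "a template without any grammar rule"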

We needed a short-hand name to refer to the cumbersome and wordy description of ‘combining templates with grammar rules in a [theoretical or implemented] system in some way’, which ended up being grammar-infused templates.

Why write about this now? Besides certain pandemic-induced priorities in 2021, the recently proposed template language for Abstract Wikipedia that I blogged about before may mix Compulsory or Partial attachment, but ought not to permit the messy embedding of grammar in a template. This may not have been clear in v1 of the proposal, but hopefully it is a little bit more so in this new version that was put online over the past few days. To make that long story short: besides a few notes at the start of its Section 3, there’s a generic description of an idea for a realisation algorithm. Its details don’t matter if you don’t intend to design a new realiser from scratch, and maybe not either if you want to link it to your existing system. The key take-away from that section is that that’s where the real grammar and phonological conditioning happens, if it’s needed. For example, for the ‘age in years’ sub-template for isiZulu, recall that’s:

Year_zu(years):"{root:Lexeme(L686326)} {concord:RelativeConcord()}{Copula()}{concord_1<nummod:NounPrefix()}-{nummod:Cardinal(years)}"

The template language sets some boundaries for declaring such a template, but it is a realiser that has to interpret ‘keywords’, such as root, concord, and RelativeConcord, and do something with them so that the output ends up correctly; in this case, from ‘year’ + ‘25’ as input data to iminyaka engama-25 as outputted text. That process might be done in line with Ariel Gutman’s realiser pipeline for Abstract Wikipedia and his proof-of-concept implementation with Scribunto, or any other realiser architecture or system, such as Grammatical Framework, SimpleNLG, NinaiUdiron, or Zola’s Nguni Grammar Engine, among several options for multilingual text generation. It might sound silly to put templates on top of the heavy machinery of a grammar engine, but it will make it more accessible to the general public so that they can specify how sentences should be generated. And, hopefully, it will permit a rules-as-you-go approach as well.

It is then the realiser (including grammar) engine, with the partially or compulsorily attached computational grammar rules and other algorithms, that works with the template. For the example, when it sees root and that the lemma fetched is a noun (L686326 is unyaka ‘year’), it also fetches the value of the noun class (a grammatical feature stored with the noun), which we always need somewhere for isiZulu NLG. It then needs to figure out how to make a plural out of ‘year’, which it knows it must do thanks to the years fetched for the instance (i.e., 25, which is plural) and the nummod that links to the root by virtue of the design and the assumption that there’s a (dependency) grammar. Then, with concord:RelativeConcord, it will fetch the relative concord for that noun class, since concord also links to root. We have been able to do the concordial agreements and pluralising of nouns (and much more!) for isiZulu for several years already. The only hurdle is that that code would need to become interoperable with the template language specification, in that our realisers will have to be able to recognise and process those ‘keywords’ properly. Those words are part of an extensible set of words inspired by dependency grammars.
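For a flavour of what a realiser might do with that template, here’s a heavily simplified sketch; the mini lexicon and the concord and prefix tables are hypothetical stand-ins, not our actual grammar engine code:

# Toy realiser for the 'age in years' template (much simplified).
LEXICON = {  # keyed on the lexeme id that appears in the template
    "L686326": {"singular": "unyaka", "plural": "iminyaka",
                "class_sg": 3, "class_pl": 4},
}
RELATIVE_CONCORD = {4: "e"}  # relative concord per noun class (fragment)
COPULA = "ng"
NOUN_PREFIX = {6: "ama"}     # noun prefix per noun class (fragment)

def year_zu(years: int) -> str:
    lex = LEXICON["L686326"]
    # {root:Lexeme(L686326)}: fetch the noun; pluralise since 25 is plural
    plural = years != 1
    root = lex["plural"] if plural else lex["singular"]
    nc = lex["class_pl"] if plural else lex["class_sg"]
    # {concord:RelativeConcord()}{Copula()}: agree with root's noun class
    rc = RELATIVE_CONCORD[nc] + COPULA
    # {concord_1<nummod:NounPrefix()}-{nummod:Cardinal(years)}: numbers
    # like 25 take class 6 agreement in this simplification
    return f"{root} {rc}{NOUN_PREFIX[6]}-{years}"

print(year_zu(25))  # iminyaka engama-25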

How this is supposed to interact smoothly is still to be figured out. Part of that is touched upon in the section about instrumentalising the template language: you could, for instance, specify it as functions in Wikifunctions that are instantly editable, facilitating an add-rules-as-you-go approach. Or it can be done less flexibly, by mapping or transforming it to another template language or to the specification of an external realiser (since it’s the principle of attachment, not embedding, of computational grammar rules).

In closing, whether the term “grammar-infused templates” will stick remains to be seen, but combining templates with grammars in some way for NLG will have a solid future, at least for as long as those ML/DL-based large language model systems keep hallucinating and don’t cater for languages other than English, including in the intended multilingual setting for Abstract Wikipedia.

References

[1] M. Jarrar, C.M. Keet, and P. Dongilli. Multilingual verbalization of ORM conceptual models and axiomatized ontologies. STARLab Technical Report, Vrije Universiteit Brussel, Belgium. February 2006.

[2] Keet, C.M., Khumalo, L. Toward a knowledge-to-text controlled natural language of isiZulu. Language Resources and Evaluation, 2017, 51:131-157. (accepted version free access)

[3] Keet, C.M. Xakaza, M., Khumalo, L. Verbalising OWL ontologies in isiZulu with Python. The Semantic Web: ESWC 2017 Satellite Events, Blomqvist, E et al. (eds.). Springer LNCS vol 10577, 59-64. Portoroz, Slovenia, May 28 – June 2, 2017.

[4] Mahlaza, Z., Keet, C.M. A classification of grammar-infused templates for ontology and model verbalisation. 13th Metadata and Semantics Research Conference (MTSR’19). E. Garoufallou et al. (Eds.). Springer vol. CCIS 1057, 64-76. 28-31 Oct 2019, Rome, Italy.

[5] Mahlaza, Z., Keet, C.M. Formalisation and classification of grammar and template-mediated techniques to model and ontology verbalisation. International Journal of Metadata, Semantics and Ontologies, 2020, 14(3): 249-262.

[6] Mahlaza, Z. Foundations for reusable and maintainable surface realisers for isiXhosa and isiZulu. PhD Thesis, Department of Computer Science, University of Cape Town, South Africa. 2022.

Good girls, bold girls – but not böse

That first sentence of a book, including non-fiction books, may set the tone for what’s to come. For my memoir, it’s a translation of Brave meisjes komen in de hemel, brutale overal: good girls go to heaven, bold ones go everywhere.

I had read a book with that title some 25 years ago. It was originally written by Ute Ehrhardt in 1994 and translated from German to Dutch and published a year later. For the memoir, I had translated the Dutch title of the book into English myself: brutale translates to ‘bold’ according to me, my dictionary (a Prisma Woordenboek hard copy), and an online dictionary. Bold means “(of a person, action, or idea) showing a willingness to take risks; confident and courageous”, according to the Oxford dictionary (and similarly here), and it’s in the same league as audacious, daring, brazen, and perky. It has a positive connotation.

What I, perhaps, ought to have done last year is to find out whether the book had also been translated into English and trust that translator. As it turned out, I’m glad I did not do so, which brings me to the more substantive part of the post. I wanted to see whether I could find the book in order to link to it in this post. I did. Interestingly, the word used in the English title was “bad” rather than ‘bold’, yet brutaal is not at all necessarily bad, nor is the book about women being bad. Surely something must have gotten warped in translation there?!

I took the hard copy from the bookshelf and checked the fine print: it listed the original German title as Gute Mädchen kommen in den Himmel, böse überall hin. Hm, böse is not good. It has 17 German-to-English translations and none is quite as flattering as bold, not at all. This leaves either bad translations to blame, or there was a semantic shift in the German-to-Dutch translation. Considering the former first, it appeared that the German-Dutch online dictionary did not offer nice Dutch words for böse either. Getting up from my chair again to consult my hard copy Prisma German-Dutch dictionary did not pay off either, except for one, maybe (ondeugend). It does not even list brutaal as a possible translation. Was the author, Dr Ehrhardt of the Baby Boomer generation, still so indoctrinated in the patriarchy and Christianity – Gute vs Das Böse – as to think that not being a smiling nice girl must mean being böse? The term did not hold back the Germans, by the way: it was the best-selling non-fiction book in Germany in 1995, my Dutch copy stated. Moreover, it turned out to be in second place overall since German book sales counting started 60 years ago, including having been a whopping 107 weeks in first place in the Spiegel bestseller list. What’s going on here? Would the Germans be that interested in ‘bad’ girls? Not quite. The second option applies, i.e., the semantic shift in the Dutch translation.

The book’s contents are not about bad, mean, or angry women at all, and the subtitle provides a further hint to that: waarom lief zijn vrouwen geen stap verder brengt ‘why being nice won’t get women even one step ahead’. Instead of being pliant, submissive, and self-sabotaging in several ways, and therewith having our voices ignored, contributions downplayed, and being passed over for jobs and promotions, it seeks to give women a kick in the backside to learn to stand their ground, and it provides suggestions on how to be heard and taken into account by avoiding the many pitfalls. Our generation of children of the Baby Boomers would do better at improving the world than those second-wave feminists before us had managed, and this book fitted right within the Zeitgeist. The 1990s were the girl power decade, where women took agency to become masters of their own destiny, or at least tried to. The New Woman – yes, capitalised in the book. Agent Dana Scully of the X-Files as the well-dressed scientist and sceptic investigator. Buffy the vampire slayer. Xena, Warrior Princess. The Spice Girls. Naomi Wolf’s Fire with Fire (which, by the way, wasn’t translated into Dutch). Reading through the book again now, it comes across as a somewhat dated use-case-packed manifesto about the pitfalls to avoid and how to be the architect of your own life. That’s not being bad, is it.

I suppose I have to thank the German-to-Dutch book translator Marten Hofstede for putting a fitting Dutch title to the content of the book. It piqued my interest in the bookstore at the train station, and I bought and read it in what must have been 1997. It resonated. To be honest, if the Dutch title had used any of the listed translations in the online dictionary – such as kwaad, verstoord, and nijdig – then I likely would not have bought the book. Having had to be evil or perpetually angry to go everywhere, anywhere, and upward would have been too steep a price to pay. Luckily, bold was indeed the right attribute. Perhaps for the generation after me, i.e., those who are now in their twenties, it’s not about being bold but simply about being, as a normal way of outlook and interaction in society. Of course a woman is entitled to live her own life, as any human being is.

A review of NLG realizers and a new architecture

That last step in the process of generating text from some structured representation of data, information or knowledge is done by things called surface realizers. They take care of the ‘finishing touches’ – syntax, morphology, and orthography – to make good natural language sentences out of an ontology, conceptual data model, or Wikidata data, among many possible sources that can be used for declaring abstract representations. Besides theories, there are also many tools that try to get that working at least to some extent. Which ways, or system architectures, are available for generating the text? Which components do they all, or at least most of them, have? Where are the differences and how do they matter? Will they work for African languages? And if not, then what?

My soon-to-graduate PhD student Zola Mahlaza and I set out to answer these questions, and more, and the outcome is described in the article Surface realization architecture for low-resourced African languages that was recently accepted and is now in print with the ACM Transactions on Asian and Low-Resource Language Information Processing (ACM TALLIP) journal [1].

Zola examined 77 systems, which exhibited some 13 different principal architectures that could be classified into 6 distinct architecture categories. Purely by number of systems, manually coded and rule-based would be the most popular, but there are a few hybrid and data-driven systems as well. A consensus architecture for realisers there is not. And none exhibits most of the software maintainability characteristics, like modularity, reusability, and analysability, that we need for African languages (even more so than for better-resourced languages). ‘African’ is narrowed down further in the paper to the Niger-Congo B (‘Bantu’) family of languages. One of the tricky things is that there’s a lot going on at the sub-word level with these languages, whereas practically all extant realisers operate at the word level.

Hence, the next step was to create a new surface realizer architecture that is suitable for low-resourced African languages and that is maintainable. Perhaps unsurprisingly, since the paper is in print, this new architecture compares favourably against the required features. The new architecture also has ‘bonus’ features, like being guided by an ontology with a template ontology [2] for verification and interoperability. All its components and the rationale for putting it together this way are described in Section 5 of the article and the maintainability claims are discussed in its Section 6.

Source: [1]

There’s also a brief illustration of how one can redesign a realiser into the proposed architecture. We redesigned the architecture of OWLSIZ for question generation in isiZulu [3] as a use case. The code of that redesign of OWLSIZ is available, i.e., it’s not merely a case of having drawn a different diagram, but it was actually proof-of-concept tested to show that it can be done.

While I obviously know what’s going on in the article, if you’d like to know many more details than are described there, I suggest you consult Zola as the main author of the article or his (soon to be available online) PhD thesis [4], which devotes roughly a chapter to this topic.

References

[1] Mahlaza, Z., Keet, C.M. Surface realisation architecture for low-resourced African languages. ACM Transactions on Asian and Low-Resource Language Information Processing, (in print). DOI: 10.1145/3567594.

[2] Mahlaza, Z., Keet, C.M. ToCT: A task ontology to manage complex templates. FOIS’21 Ontology Showcase. Sanfilippo, E.M. et al. (Eds.). CEUR-WS vol. 2969. 9p.

[3] Mahlaza, Z., Keet, C.M.: OWLSIZ: An isiZulu CNL for structured knowledge validation. In: Proc. of WebNLG+ 2020. pp. 15–25. ACL, Dublin, Ireland (Virtual).

[4] Mahlaza, Z. Foundations for reusable and maintainable surface realisers for isiXhosa and isiZulu. PhD Thesis, Department of Computer Science, University of Cape Town, South Africa. 2022.

Semantic interoperability of conceptual data modelling languages: FaCIL

Software systems aren’t getting any less complex to design, implement, and maintain, which applies to both the numerous diverse components and the myriad of people involved in the development processes. Even a straightforward configuration of a database back-end and an object-oriented front-end tool requires coordination among database analysts, programmers, HCI people, and increasing involvement of domain experts and stakeholders. They each may prefer, and have different competencies in, certain specific design mechanisms; e.g., one may want EER for the database design, UML diagrams for the front-end app, and perhaps structured natural language sentences with SBVR or ORM for expressing the business rules. This requires multi-modal modelling in a plurality of paradigms. This would then need to be supported by hybrid tools that offer interoperability among those modelling languages, since such heterogeneity won’t go away any time soon, or ever.

Example of possible interactions between the various developers of a software system and the models they may be using.

It is far from trivial to have these people work together whilst maintaining their preferred view of a unified system’s design, let alone doing all this design in one system. In fact, there’s no such tool that can seamlessly render such varied models across multiple modelling languages whilst preserving the semantics. At best, there’s either only theory that aims to do that, or only a subset of the respective languages’ features, or a subset of the required combinations. Well, more precisely, until our efforts. We set out to fill this gap in functionality, both in a theoretically sound way and implemented as proof-of-concept to demonstrate its feasibility. The latest progress was recently published in the paper entitled A framework for interoperability with hybrid tools in the Journal of Intelligent Information Systems [1], in collaboration with Germán Braun and Pablo Fillottrani.

First, we propose the Framework for semantiC Interoperability of conceptual data modelling Languages, FaCIL, which serves as the core orchestration mechanism for hybrid modelling tools, with relations between components and a workflow that uses them. At its centre, it has a metamodel that is used for the interchange between the various conceptual models represented in different languages, and it has sets of rules to and from the metamodel (and at the metamodel level) to ensure the semantics is preserved when transforming a model in one language into a model in a different language, and such that edits to one model automatically propagate correctly to the model in another language. In addition, thanks to the metamodel-based approach, logic-based reconstructions of the modelling languages also have become easier to manage, and so a path to automated reasoning is integrated in FaCIL as well.
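The gist of that can be sketched as follows; a deliberately crude toy with invented names, not FaCIL’s actual rule sets:

# Toy sketch of metamodel-mediated interoperability: an edit is applied
# to the shared metamodel once, and each language view is regenerated,
# instead of maintaining pairwise transformations between all languages.
kf_model = [
    {"kind": "ObjectType", "name": "Employee"},
    {"kind": "ObjectType", "name": "Project"},
    {"kind": "Relationship", "name": "worksOn"},
]

def to_uml(kf):  # rule set: metamodel -> UML Class Diagram vocabulary
    return [("Class" if e["kind"] == "ObjectType" else "Association",
             e["name"]) for e in kf]

def to_eer(kf):  # rule set: metamodel -> EER vocabulary
    return [("Entity type" if e["kind"] == "ObjectType" else "Relationship",
             e["name"]) for e in kf]

def add_object_type(kf, name):
    kf.append({"kind": "ObjectType", "name": name})  # one edit, one place
    return to_uml(kf), to_eer(kf)    # all views re-rendered from it

uml_view, eer_view = add_object_type(kf_model, "Department")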

This generic multi-modal modelling interoperability framework FaCIL was instantiated with a metamodel for UML Class Diagrams, EER, and ORM2 interoperability specifically [2] (introduced in 2015), called the KF metamodel [3] with its relevant rules (initial and implemented ones), an English controlled natural language, and a logic-based reconstruction into a fragment of OWL (orchestration graphically from the paper). This enables a range of different user interactions in the modelling process, of which an example of a possible workflow is shown in the following figure.

A sample workflow in the hybrid setting, showing interactions between visual conceptual data models (i.e., in their diagram version) and in their (pseudo-)natural language versions, with updates propagating to the others automatically. At the start (top), there’s a visual model in one’s preferred language from which a KF runtime model is generated. From there, it can go in various directions: verbalise, convert, or modify it. If the latter, then the KF runtime model is also updated and the changes are propagated to the other versions of the model, as often as needed. The elements in yellow/green/blue are thanks to FaCIL and the white ones are the usual tasks in the traditional one-off one-language modelling setting.

These theoretical foundations were implemented in the web-based crowd 2.0 tool (with source code). crowd 2.0 is the first hybrid tool of its kind, tying together all the pieces such that now, instead of partial or full manual model management of transformations and updates in multiple disparate tools, these tasks can be carried out automatically in one application, therewith also allowing diverse developers and stakeholders to work from a shared single system.

We also describe a use case scenario for it – on Covid-19, as pretty much all of the work for this paper was done during the worse-than-today’s stage of the pandemic – that has lots of screenshots from the tool in action, both in the paper (starting here, with details halfway in this section) and more online.

Besides evaluating the framework with an instantiation, a proof-of-concept implementation of that instantiation, and a use case, it was also assessed against the reference framework for conceptual data modelling of Delcambre and co-authors [4] and shown to meet those requirements. Finally, crowd 2.0’s features were assessed against five relevant tools, considering the key requirements for hybrid tools, and shown to compare favourably against them (see Table 2 in the paper).

Distinct advantages can be summed up as follows, from those 26 pages of the paper, where the, in my opinion, most useful ones are underlined here, and the most promising ones to solve another set of related problems with conceptual data modelling (in one fell swoop!) in italics:

  • One system for related tasks, including visual and text-based modelling in multiple modelling languages, automated transformations and update propagation between the models, as well as verification of the model on coherence and consistency.
  • Any visual and text-based conceptual model interaction with the logic has to be maintained only in one place rather than for each conceptual modelling and controlled natural language separately;
  • A controlled natural language can be specified on the KF metamodel elements so that it then can be applied throughout the models regardless of the visual language, therewith eliminating duplicate work of re-specifications for each modelling language and fragment thereof;
  • Any further model management, especially in the case of large models, such as abstraction and modularisation, can be specified either on the logic or on the KF metamodel in one place and propagate to other models accordingly, rather than re-inventing or reworking the algorithms for each language over and over again;
  • The modular design of the framework allows for extensions of each component, including more variants of visual languages, more controlled languages in your natural language of choice, or different logic-based reconstructions.

Of course, more can be done to make it even better, but it is a milestone of sorts: research into the theoretical foundations of this particular line of research had commenced 10 years ago with the DST/MINCyT-funded bi-lateral project on ontology-driven unification of conceptual data modelling languages. Back then, we fantasised that, with more theory, we might get something like this sometime in the future. And we did.

References

[1] Germán Braun, Pablo Fillottrani, and C Maria Keet. A framework for interoperability with hybrid tools. Journal of Intelligent Information Systems, in print since 29 July 2022.

[2] Keet, C. M., & Fillottrani, P. R. (2015). An ontology-driven unifying metamodel of UML Class Diagrams, EER, and ORM2. Data & Knowledge Engineering, 98, 30–53.

[3] Fillottrani, P.R., Keet, C.M. KF metamodel formalization. Technical Report, Arxiv.org http://arxiv.org/abs/1412.6545. Dec 19, 2014. 26p.

[4] Delcambre, L. M. L., Liddle, S. W., Pastor, O., & Storey, V. C. (2018). A reference framework for conceptual modeling. In: 37th International Conference on Conceptual Modeling (ER’18). LNCS. Springer, vol. 11157, 27–42.

English, Englishes – which one to use for writing?

Sometimes, the answer to the question in the post’s title is easy, if you’re writing in English: do whatever the style guide says. Don’t argue with the journal editor or typesetter about that sort of trivia (unless they’re very wrong). If it states American English spelling, do so; if British English, go for that. If you can’t distinguish your color from colour, modeling from modelling, and a faucet from a tap, use a spellchecker with one of the Englishes on offer—even OpenOffice Writer shows red wavy lines under ‘color’, ‘modeling’, and ‘faucet’ when it’s set to my default “English (South Africa)”. There are very many other places where you can write in English as much as you like or have time for, however, and then the blog post’s question becomes more relevant. How many Englishes or somehow accepted recognised variants of English exist, and where does it make a difference in writing such that you’ll have to, or are supposed to, choose?

That raises the question of how many variants of English count as one of the Englishes, which is tricky to answer, because it depends on what counts. Does a dialect count? Does it count when it’s sanctioned by a country, with official language status and a language body? Does it count when there are enough users? Or when there’s enough text to detect the substantive differences? What is the minimum number or type of differences, if any, and from which standard, before one may start to talk of different Englishes and a new spin-off X-English? People have been deliberating about such matters and trying to document differences, and have even come up with classification schemes. Englishes around the world, to be more precise, refer to localised or indigenised versions of English that are either those people’s first or institutionalised language, not just any variant or dialect. There’s an International Association for World Englishes (IAWE) and there are handbooks, textbooks, and scientific journals about it, and the 25th conference of the IAWE will take place next year.

In recent years there have been suggestions that English could break up into mutually unintelligible languages, much as Latin once did. Could such a break-up occur, or are we in need of a new appreciation of the nature of World English?

Tom McArthur, 1987, writing from “the mother country”, but not “the centre of gravity”, of English (pdf).

My expertise doesn’t go that far – I’m operating from the consumer side of these matters, standards-following, and trying not to make too many mistakes. It took me a while to figure out there was British English (BE) and American English (AE), and then it was a matter of looking up rules on spelling differences, like -ise vs. -ize and single vs. double l (e.g., traveling vs. travelling), checking comparative word lists, and other varied differences, like whether it’s ‘towards’ or ‘toward’, or 15:30, 15.30, 3.30pm or 3:30pm (or one of my colleagues’ p’s, like a 3.30p). Not to mention a plethora of online writing guides and the comprehensive Sense of Style book by Steven Pinker. Let’s explore the Englishes and Global English a little.

McArthur’s Englishes (source)

South African English (SAE) exists as one of the recognised Englishes, all the way into internationally reputable dictionaries. It is a bit of a mix of BE and AE, with some spices sprinkled into it. It tries to follow BE, but there are AE influences due to the media and, perhaps, anti-colonial sentiment. It’s soccer, not football, for instance, and the 3.30pm variant rather than a 24h clock. Well, I’m not sure it is officially, but practically it is so. It also has ‘weird’ words that everyone is convinced are native English of the BE variety, but aren’t, like timeously rather than timeous or timely – the most I could find was a Wiktionary entry on it claiming it to be Scottish and SAE, but not even the Dictionary of SAE (DSAE) has an entry for it. I’ve seen it so often in work emails over the years that I caved in and use it as well. There are at least a handful of SAE words that people in South Africa think are BE but aren’t, as any SA expat will be able to recall when they get quizzical looks overseas. Then there are hundreds of words that people know are SAE at least unofficially, which are mainly the loan words and adopted words from the 10 other languages spoken in SA – regional overlap causes mutual language influences in all directions. Bakkie, indaba, veld, lekker, dagga, and many more – I’ve blogged about that before. My OpenOffice SAE spellchecker doesn’t flag any of these words as typos.

Arguably, grammatical differences exist for SAE as well. In practice they sure do, but I’m not aware of anything officially endorsed. There is no ‘benevolent language dictator’ with card-carrying members of the lexicography and grammar police to endorse or reprimand. Indeed, there is the Pan South African Language Board (PanSALB), but its teeth and thunder don’t come close to the likes of the Académie Française or Real Academia Española. Regarding grammar, that previous post already mentioned the case of the preposition at the end of a sentence when it’s a separable part of the verb in Afrikaans, Dutch, and German (e.g., meenemen or mitnehmen ‘take with’). A concoction that still makes me wince each time I hear or read it is the ‘can be able to’. It’s either can + the verb for what you can do, or copula + able to + the verb for what you can do. It is, e.g., ‘I can carry out the experiment’ or ‘I’m able to carry out the experiment’, but not ‘I can be able to carry out the experiment’. I suspect it carries over from a verb form in Niger-Congo B languages, since I’ve heard it used by at least Tanzanians, Kenyans, and Malawians, and meanwhile I’ve occasionally seen it in texts written by English South African students.

If the notion of “Englishes” feels uncomfortable, then what about Global/World/International English? Is there one? For many a paper I review double-blind, i.e., where the author names and affiliations are hidden, I can’t tell unless the English is really bad. I’ve read enough to be able to spot Spanglish or Chinglish, but mostly I can’t tell, in that there’s a sort of bland scientific English – be it a pidgin English, or maybe multiple authors cancel out each other’s ways of making mistakes, or no-one really bothers to tear the vocabulary apart into their boxes because it’s secondary to the scientific content being communicated. No doubt investigative deliberations are ongoing about that too; if there aren’t, there ought to be.

Another scenario for ‘global English’ concerns how to write a newsletter to a global audience. For instance, if you were to visit a website with an intended audience in the USA, then it should be tolerable to read “this fall”, even though elsewhere it’s either autumn, spring, a rainy or a dry season. If it’s an article by the UN, say, then one may expect a different wording that is either not US-centric or, if the season matters, qualifies it, as in “Covid-19 cases are expected to rise during fall and winter in North America”. With the former wording, you can’t please everyone, due to different calendars with different month names and year ends and different seasons. The question also came up recently for a Wikimedia blog post on Abstract Wikipedia progress for its natural language generation component, in whose draft version I was involved sideways. My tendency was toward(s) a Global English, whereas one of my collaborators’ stance was that there’s an assumed rule of using the English of wherever the organisation’s headquarters is located. These choices were also confusing when I was writing the first draft of my memoir: it was published by a South African publisher, hence, SAE style guidelines, but the book is also distributed – and read! – internationally.

Without clear rules, there will always be people who complain about your English, be it either that you’re wrong or just not in the inner circle for sensing ‘the feeling of the language that only a native speaker can have’, that supposedly inherently unattainable fingerspitzengefühl for it. No clear rules isn’t good for developing spelling and grammar checkers either. In that regard, and that one only, perhaps I just might prefer a benevolent dictator. I don’t even care which of the Englishes (except for not the stupid stuff like spelling ‘light’ as ‘lite’, ffs). I also fancy the idea of banding together with other ‘nonfirst-language’ speakers of English to start devising and dictating rules, since the English speakers can’t seem to sort out their own language – at least not enough like the grammatically richer languages – and we’re in the overwhelming majority in numbers (about 1:3 apparently). One can dream.

As to the question in the title of the blog post: what I’ve written so far is not a clear answer for all cases, indeed, in particular when there is no editorial house style dictating it, but this lifting of the veil hopefully has made clear that attempting to answer the question means opening up that can of worms further. You could create your own style guide for your not-editor-policed writings. The more I read about it, though, the more complicated things turn out to be, so you’re warned in case you’d like to delve into this topic. Meanwhile, I’ll keep winging it on my blog with some version of a ‘global English’ and inadvertent typos and grammar missteps…

How does one do an ontological investigation?

It’s a question I’ve been asked several times. Students see ontology papers in venues such as FOIS, EKAW, KR, AAAI, Applied Ontology, or the FOUST workshops and it seems as if all that stuff just fell from the sky neatly into the paper, or that the authors perhaps played with mud and somehow got the paper’s contents to emerge neatly from it. Not quite. It’s just that none of the authors bothered to write a “methods and methodologies” or “procedure” section. That it’s not written doesn’t mean it didn’t happen.

To figure out how to go about doing such an ontological investigation, there are a few options available to you:

  • Read many such papers and try to distill commonalities with which one could reverse-engineer a possible process that could have led to those documented outcomes.
  • Guess the processes and do something, submit the manuscript, swallow the critical reviews and act upon those suggestions; repeat this process until it makes it through the review system. Then try again with another topic to see if you can do it now by yourself in fewer iterations.
  • Try to get a supervisor or a mentor who has published such papers and be their apprentice or protégé formally or informally.
  • Enrol in an applied ontology course, where they should be introducing you to the mores of the field, including the process of doing ontological investigations. Or take up a major/minor in philosophy.

Pursuing all options likely will get you the best results. In a time of publish-or-perish, shortcuts may be welcome since the ever greater pressures are less forgiving to learning things the hard way.

Every discipline has its own ways for how to investigate something. At a very high level, it still will look the same: you arrive at a question, a hypothesis, or a problem that no one has answered/falsified/solved before, you do your thing and obtain results, discuss them, and conclude. For ontology, what hopefully rolls out of such an investigation is what the nature of the entity under investigation is. For instance, what dispositions are, a new insight on the transitivity of parthood, the nature of the relation between portions of stuff, or what a particular domain entity (e.g., money, peace, pandemic) means.

I haven’t seen cookbook instructions for how to go about doing this for applied ontology. I did do most of the options listed above: I read (and still read) a lot of articles, conducted a number of such investigations myself and managed to get them published, and even did a (small) dissertation in applied philosophy (mentorships are hard to come by for women in academia, let alone the next stage of being someone’s protégé). I think it is possible to distill some procedure from all of that, for applied ontology at least. While it’s still only a rough outline, it may be of interest to put it out there to get feedback on it to see whether this can be collectively refined or extended.

With X the subject of investigation, which could be anything—a feature such as the colour of objects, the nature of a relation, the roles people fulfill, causality, stuff, collectives, events, money, secrets—the following steps will get you at least closer to an answer, if not finding the answer outright:

  1. (optional) Consult dictionaries and the like for what they say about X;
  2. Do a scientific literature review on X and, if needed when there’s little on X, also look up attendant topics for possible ideas;
  3. Criticise the related work for where they fall short and how, and narrow down the problem/question regarding X;
  4. Put forth your view on the matter, by building up the argument step by step; e.g., as follows:
    a. From informal explanation to a possible intermediate stage with sketching a solution (in ad hoc notation for illustration or by abusing ORM or UML class diagram notation) to a formal characterisation of X, or the aspect of X if the scope was narrowed down.
    b. From each piece of informal explanation, create the theory one axiom or definition at a time.
    Either of the two may involve proofs for logical consequences and will have some iterations of looking up more scientific literature to finalise an axiom or definition.
  5. (optional) Evaluate and implement.
  6. Discuss where it gave new insight, note any shortcomings, and mention new questions it may generate or problems it doesn’t solve yet, and conclude.

For step 3, and as compared to scientific literature I’ve read in other disciplines, the ontologists are a rather blunt, critical lot. The formalisation stage in step 4 is more flexible than indicated. For instance, you can choose your logic or make one up [1], but you do need at least something of that (more about that below). Few use tools, such as Isabelle, Prover9, and HeTS, to assist with the logic aspects, but I would recommend you do. Also within that grand step 4: philosophers typically would not use UML or ORM or the like, but exercise total freedom in drawing something, if there’s a drawing at all (and a good number would recoil at the very phrase ‘conceptual data modelling language’, but that’s for another time), and likewise for many a logician. Here are two sample sequences for that step 4:

A visualization of the ‘one definition or axiom at a time’ option (4b)

A visualization of the ‘iterating over a diagram first’ option (4a)
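To make option b of step 4 a little more concrete, here’s a generic, textbook-style illustration of mine (not taken from the cited papers): the informal statement ‘nothing is a proper part of itself’ becomes the first axiom, the informal transitivity claim the second, pausing at each step to check the logical consequences:

\forall x \, \neg\, properPartOf(x,x)
\forall x,y,z \, (properPartOf(x,y) \land properPartOf(y,z) \rightarrow properPartOf(x,z))

From these two, asymmetry already follows as a theorem, which one could let Prover9 or Isabelle confirm rather than eyeballing it.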

As an aside, the philosophical investigations are lonesome endeavours resulting in disproportionately more single-author articles and books. This is in stark contrast with ontologies, those artefacts in computing and IT: many of them are developed in teams or even in large consortia, ranging from a few modellers to hundreds of contributors. Possibly because there are more tasks and the scope often may be larger.

Is that all there is to it? Sort of, yes, but for different reasons, there may be different emphases on different components (and so it still may not get you through the publication process to tell the world about your awesome results). Different venues have different scopes, even if they use the same terminology in their respective CFPs. Venues such as KR and AAAI are very much logic-oriented, so there must be a formalisation, and proving interesting properties will substantially increase the (very small) chance of getting the paper accepted. Toning down the philosophical musings and deliberations is unlikely to be detrimental. For instance, our paper on essential vs immutable part-whole relations [2]. I wouldn’t expect the earlier papers, such as on social roles by Masolo et al. [3] or temporal mereology by Donnelly and Bittner [4], to be able to make it through in the KR/AAAI/IJCAI venues nowadays (none of the IJCAI’22 papers sound even remotely like an ontology paper). But feel free to try. IJCAI 2023 will be in Cape Town, in case that information would help to motivate trying.

Venues such as EKAW and KCAP like some theory, but there’s got to be some implementation, (plausible) use, and/or evaluation to it for it to have a chance to make it through the review process. For instance, my theory on relations was evaluated on a few ontologies [5] and the stuff paper had the ontology also in OWL, modelling guidance for use, and notes on interoperability [6]. All those topics, which reside in the “step 5” above, come at the ‘cost’ of less logic and less detailed philosophical deliberations—research time and a paper’s page limits do have hard boundaries.

Ontology papers at FOIS and the like prefer to see more emphasis on the theory and on what can be dragged in and used or adapted from advances in analytic philosophy, cognitive science, and attendant disciplines. Evaluation is not asked for as a separate item, but is assumed to be evident from the argumentation. I admit that I sometimes skip that as well when writing for such venues, e.g., in [7], but I typically do put some evaluation in there nonetheless (recall [1]). There also still tends to be the assumption that one can write axioms flawlessly and foresee all their consequences without the assistance of automated model finders and provers (a tiny example of such a consequence follows this paragraph). For instance, have a look at the FOIS 2020 best paper award winner on a theory of secrets [8], which went through the steps mentioned above via the 4b route; the one on the ontology of competition [9], which took the 4a route with OntoUML diagrams (with the logic implied by their use); and one more, on mereology, that first had other diagrams as part of the domain analysis and then moved to the formalisation with definitions and theorems and a version in CLIF [10]. That's not to say you shouldn't do an evaluation of sorts (of the variety: use cases, checking against requirements, proving consistency, etc.), but just that you may be able to get away with not doing so (provided your argumentation is good enough and there's enough novelty to it).
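
As that promised tiny example of an easily-missed entailment (a generic textbook fact, not tied to any of the papers above): a relation declared to be both transitive and irreflexive is thereby also asymmetric.

```latex
% Assumptions: transitivity and irreflexivity of R.
\forall x \forall y \forall z\, \bigl(R(x,y) \wedge R(y,z) \rightarrow R(x,z)\bigr)
\qquad \forall x\, \neg R(x,x)
% If R(a,b) and R(b,a) both held, transitivity would give R(a,a),
% contradicting irreflexivity; hence:
\forall x \forall y\, \bigl(R(x,y) \rightarrow \neg R(y,x)\bigr)
```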

Finally, note that this is a blog post and it was not easy to keep it short. There are many side alleys, and more explanations, illustrations, and details are quite possible. If you have comments on the high-level procedure, please don't hesitate to leave a comment on the blog or contact me directly!

References

[1] Fillottrani, P.R., Keet, C.M. An analysis of commitments in ontology language design. Proceedings of the 11th International Conference on Formal Ontology in Information Systems 2020 (FOIS'20). Brodaric, B., Neuhaus, F. (Eds.). IOS Press, FAIA vol. 330, 46-60.

[2] Artale, A., Guarino, N., Keet, C.M. Formalising temporal constraints on part-whole relations. Proceedings of the 11th International Conference on Principles of Knowledge Representation and Reasoning (KR'08). Brewka, G., Lang, J. (Eds.). AAAI Press, 673-683.

[3] Masolo, C., Vieu, L., Bottazzi, E., Catenacci, C., Ferrario, R., Gangemi, A., Guarino, N. Social roles and their descriptions. Proceedings of the 9th International Conference on Principles of Knowledge Representation and Reasoning (KR'04). AAAI Press, 267-277.

[4] Bittner, T., Donnelly, M. A temporal mereology for distinguishing between integral objects and portions of stuff. Proceedings of the 22nd AAAI Conference on Artificial Intelligence (AAAI'07). AAAI Press, 287-292.

[5] Keet, C.M. Detecting and revising flaws in OWL object property expressions. 18th International Conference on Knowledge Engineering and Knowledge Management (EKAW'12). ten Teije, A. et al. (Eds.). Springer, LNAI vol. 7603, 252-266.

[6] Keet, C.M. A core ontology of macroscopic stuff. 19th International Conference on Knowledge Engineering and Knowledge Management (EKAW'14). Janowicz, K. et al. (Eds.). Springer, LNAI vol. 8876, 209-224.

[7] Keet, C.M. The computer program as a functional whole. Proceedings of the 11th International Conference on Formal Ontology in Information Systems 2020 (FOIS'20). Brodaric, B., Neuhaus, F. (Eds.). IOS Press, FAIA vol. 330, 216-230.

[8] Ismail, H.O., Shafie, M. A commonsense theory of secrets. Proceedings of the 11th International Conference on Formal Ontology in Information Systems 2020 (FOIS'20). Brodaric, B., Neuhaus, F. (Eds.). IOS Press, FAIA vol. 330, 77-91.

[9] Sales, T.P., Porello, D., Guarino, N., Guizzardi, G., Mylopoulos, J. Ontological foundations of competition. Proceedings of the 10th International Conference on Formal Ontology in Information Systems 2018 (FOIS'18). Borgo, S., Hitzler, P., Kutz, O. (Eds.). IOS Press, FAIA vol. 306, 96-109.

[10] Grüninger, M., Chui, C., Ru, Y., Thai, J. A mereology for connected structures. Proceedings of the 11th International Conference on Formal Ontology in Information Systems 2020 (FOIS'20). Brodaric, B., Neuhaus, F. (Eds.). IOS Press, FAIA vol. 330, 171-185.