# On generating isiZulu sentences with part-whole relations

It all sounded so easy… We have a pretty good and stable idea about part-whole relations and their properties (see, e.g., [1]), we know how to ‘verbalise’/generate a natural language sentence from basic description logic axioms with object properties that use simple verbs [2], like $Professor \sqsubseteq \exists teaches.Course$ ‘each professor teaches at least one course’, and SNOMED CT is full of logically ‘simple’ axioms (it’s in OWL 2 EL, after all) and has lots of part-whole relations. So why not combine that? We did, but it took some more time than initially anticipated. The outcomes are described in the paper “On the verbalization patterns of part-whole relations in isiZulu”, which was recently accepted at the 9th International Natural Language Generation Conference (INLG’16) that will be held 6-8 September in Edinburgh, Scotland.

What it ended up to be, is that notions of ‘part’ in isiZulu are at times less precise and other times more precise compared to the taxonomy of part-whole relations. This interfered with devising the sentence generation patterns, it pushed the number of ‘elements’ to deal with in the language up to 13 constituents, and there was no way to avoid proper phonological conditioning. We already could handle quantitative, relative, and subject concords, the copulative, and conjunction, but what had to be added were, in particular, the possessive concord, locative affixes, a preposition (just the nga in this context), epenthetic, and the passive tense (with modified final vowel). As practically every element has to be ‘completed’ based on the context (notably the noun class), one can’t really speak of a template-based approach anymore, but a bunch of patterns and partial grammar engine instead. For instance, plain parthood, structural parthood, involvement, membership all have:

• (‘each whole has some part’) $QCall_{nc_{x,pl}}$ $W_{nc_{x,pl}}$ $SC_{nc_{x,pl}}-CONJ-P_{nc_y}$ $RC_{nc_y}-QC_{nc_y}-$dwa
• (‘each part is part of some whole’) $QCall_{nc_{x,pl}}$ $P_{nc_{x,pl}}$ $SC_{nc_{x,pl}}-COP-$ingxenye $PC_{\mbox{\em ingxenye}}-W_{nc_y}$ $RC_{nc_y}-QC _{nc_y}-$dwa

There are a couple of noteworthy things here. First, the whole-part relation does not have one single string, like a ‘has part’ in English, but it is composed of the subject concord (SC) for the noun class (nc) of the noun that play the role of the whole ( W ) together with the phonologically conditioned conjunction na- ‘and’ (the “SC-CONJ”, above) and glued onto the noun of the entity that play the role of the part (P). Thus, the surface realisation of what is conceptually ‘has part’ is dependent on both the noun class of the whole (as the SC is) and on the first letter of the name of the part (e.g., na-+i-=ne-). The ‘is part of’ reading direction is made up of ingxenye ‘part’, which is a noun that is preceded with the copula (COP) y– and together then amounts to ‘is part’. The ‘of’ of the ‘is part of’ is handled by the possessive concord (PC) of ingxenye, and with ingxenye being in noun class 9, the PC is ya-. This ya- is then made into one word together with the noun for the object that plays the role of the whole, taking into account vowel coalescence (e.g., ya-+u-=yo-). Let’s illustrate this with heart (inhliziyo, nc9) standing in a part-whole relation to human (umuntu, NC1), with the ‘has part’ and ‘is part of’ underlined:

• bonke abantu banenhliziyo eyodwa ‘All humans have as part at least one heart’
• The algorithm, in short, to get this sentence from, say $Human \sqsubseteq \exists hasPart.Heart$: 1) it looks up the noun class of umuntu (nc1); 2) it pluralises umuntu into abantu (nc2); 3) it looks up the quantitative concord for universal quantification (QCall) for nc2 (bonke); 4) it looks up the SC for nc2 (ba); 5) then it uses the phonological conditioning rules to add na- to the part inhliziyo, resulting in nenhliziyo and strings it together with the subject concord to banenhliziyo; 6) and finally it looks up the noun class of inhliziyo, which is nc9, and from that it looks up the relative concord (RC) for nc9 (e-) and the quantitative concord for existential quantification (QC) for nc9 (being yo-), and strings it together with –dwa to eyodwa.
• zonke izinhliziyo ziyingxenye yomuntu oyedwa ‘All hearts are part of at least one human’
• The algorithm, in short, to get this sentence from $Heart \sqsubseteq \exists isPartOf.Human$: 1) it looks up the noun class of inhliziyo (nc9); 2) it pluralises inhliziyo to izinhliziyo (nc10); 3) it looks up the QCall for nc10 (zonke); 4) it looks up the SC for nc10 (zi-), takes y- (the COP) and adds them to ingxenye to form ziyingxenye; 5) then it uses the phonological conditioning rules to add ya- to the whole umuntu, resulting in yomuntu; 6) and finally it looks up the noun class of umuntu, which is nc1, and from that the RC for nc10 (o-) and the QC for nc10 (being ye-), and strings it together with –dwa to oyedwa.

For subquantities, we end up with three variants: one for stuff-parts (as in ‘urine has part water’, still with ingxenye for ‘part’), one for portions of solid objects (as in ‘tissue sample is a subquantity of tissue’ or a slice of the cake) that uses umunxa instead of ingxenye, and one ‘spatial’ notion of portion, like that an operating theatre is a portion of a hospital, or the area of the kitchen where the kitchen utensils are is a portion of the kitchen, which uses isiqephu instead of ingxenye. Umunxa is in nc3, so the PC is wa- so that with, e.g., isbhedlela ‘hospital’ it becomes wesibhedlela ‘of the hospital’, and the COP is ng- instead of y-, because umunxa starts with an u. And yet again part-whole relations use locatives (like the containment type of part-whole relation). The paper has all those sentence generation patterns, examples for each, and explanations for them.

The meronymic part-whole relations participation and constitution have added aspects for the verb, such as generating the passive for ‘constituted of’: –akha is ‘to build’ for objects that are made/constituted of some matter in some structural sense, else –enza is used. They are both ‘irregular’ in the sense that it is uncommon that a verb stem starts with a vowel, so this means additional vowel processing (called hiatus resolution in this case) to put the SC together with the verb stem. Then, for instance za+akhiwe=zakhiwe but u+akhiwe=yakhiwe (see rules in paper).

Finally, this was not just a theoretical exercise, but it also has been implemented. I’ll readily admit that the Python code isn’t beautiful and can do with some refactoring, but it does the job. We gave it 42 test cases, of which 38 were answered correctly; the remaining errors were due to an ‘incomplete’ (and unresolvable case for any?) pluraliser and that we don’t know how to systematically encode when to pick akha and when enza, for that requires some more semantics of the nouns. Here is a screenshot with some examples:

The ‘wp’ ones are that a whole has some part, and the ‘pw’ ones that the part is part of the whole and, in terms of the type of axiom that each function verbalises, they are of the so-called ‘all some’ pattern.

The source code, additional files, and the (slightly annotated) test sentences are available from the GENI project’s website. If you want to test it with other nouns, please check whether the noun is already in nncPairs.txt; if not, you can add it, and then invoke the function again. (This remains this ‘clumsily’ until we make a softcopy of all isiZulu nouns with their noun classes. Without the noun class explicitly given, the automatic detection of the noun class is not, and cannot be, more than about 50%, but with noun class information, we can get it up to 90-100% correct in the pluralisation step of the sentence generation [4].)

References

[1] Keet, C.M., Artale, A. Representing and Reasoning over a Taxonomy of Part-Whole Relations. Applied Ontology, 2008, 3(1-2):91-110.

[2] Keet, C.M., Khumalo, L. Basics for a grammar engine to verbalize logical theories in isiZulu. 8th International Web Rule Symposium (RuleML’14), A. Bikakis et al. (Eds.). Springer LNCS vol. 8620, 216-225. August 18-20, 2014, Prague, Czech Republic.

[3] Keet, C.M., Khumalo, L. On the verbalization patterns of part-whole relations in isiZulu. 9th International Natural Language Generation conference (INLG’16), September 5-8, 2016, Edinburgh, UK. (in print)

[4] Byamugisha, J., Keet, C.M., Khumalo, L. Pluralising Nouns in isiZulu and Related Languages. 17th International Conference on Intelligent Text Processing and Computational Linguistics (CICLing’16), Springer LNCS. April 3-9, 2016, Konya, Turkey. (in print)

# A search engine, browser, and language bias mini-experiment

I’m in the midst of preparing for the “Social Issues and Professional Practice” block for a course and was pondering whether I should touch upon known search engine issues, like the filter bubble and search engine manipulation to nudge democratic elections, which could be interesting given that South Africa just had the local elections last week, with interesting results.

I don’t have the option to show the differences between ‘Google search when logged in’ versus ‘Google search when logged out’, nor for the Bing-Hotmail combination, so I played with other combinations: Google in isiZulu on Firefox (GiF), Google in English on Safari (GES), and Bing in English on Firefox (BEF). I did seven searches at the same time (Friday 12 August 2016, 17:18-17:32) on the same machine (a MacBookPro), using the eduroam on campus. Although this certainly will not pass a test of scientific rigour, it unequivocally shows that it deserves a solid experiment. The only thing I aimed to do was to see whether those things happen in South Africa too, not just in the faraway USA or India. They do.

Before giving the results, some basic preliminaries may be of use if you are not familiar with the topic. On HTTP, that the browser uses: in trying to GET information, your browser sends the server what operating system you are using (Mac, Linux, Windows, etc.), your browser information (e.g., Firefox, Safari, Chrome, etc.), and language settings (e.g., UK English, isiZulu, Italian). Safari is linked to the Mac, and thus Apple, and it is assumed that Apple users have more disposable income (are richer). Free and open source software users (e.g., Linux + Firefox) are assumed to be not rich or a leftie or liberal, or all of them. I don’t know if they categorise Apple + Firefox as an armchair socialist or a posh right-wing liberal😉.

Here goes the data, being the screenshots and the reading and interpretation of the links of the search results, with a bit of context in case you’re not in South Africa. The screens in the screenshots are in the order (from let to right) as GiF, GES, BEF.

EFF search

• Search term: EFF: GiF and BEF show EFF as political party (leftpopulist opposition party in South Africa) information and a link to EFF as the electronic frontier foundation, whereas the GES just shows EFF as political party in the context of news about the DA political party (capitalist, for the rich, mainly White voters). The GES difference may be explained by the Mac+Safari combination, and it makes one wonder whether and how this has had an effect on perceptions and voting behaviour. Bing had 10mln results, Google 46mln.

Jacob Zuma search

• Search term: Jacob Zuma (current president of South Africa): GiF and BEF show general results, GES with articles also about JZ to stay (by a DA supporter) and on that he won’t resign. Bing has 1.1mln results, Google 9.6mln.

• Search term: Nkandla (Zuma’s controversial lavish homestead

Nkandla Search

that was upgraded with taxpayers money): GiF has pictures and a fact about Nkandla, GES has a picture, fact, and a bit negative news, BEF: more on news and issues (that is: that JZ has to pay back the money). Bing has 700K results, Google 1.8mln.

• Search term: FeesMustFall (hashtag of 2015 on no university fee increases and free higher education): Google results has

FeesMustFall search

‘plain’-looking information, whereas Bing shows results with more information from the FMF perspective, it seems. Bing has 165K results, Google 451K.

• Search term: Fleming Rose (person with controversial ideas,

Fleming Rose search

recently disinvited by UCT to not give the academic freedom lecture): Google shows a little general information and several UCT opinion issues, BEF has information about Fleming Rose. Bing has 1.25mln results, Google about 500K—the only time that Bing’s number of results far outnumbers Google’s.

• Search term: Socialism: GiF has links to definitions, GES and BEF

socialism search

show a definition in their respective info boxes, which takes up most of the screen. Bing has 7.3mln results, GiF with 23.4mln, GES: 31mln—this is the first time there is a stark difference between the number of results in Google hits, with more for English and Safari.

• Law on cookies in south africa Search

Search term: Law on cookies in south africa: the results are similar throughout the three search results. Bing has 108mln results, GiF 3mln, and GES 2.2mln—a 1/3 difference in Google’s number of results in the other direction.

In interpreting the results, it has to be noted that Google, though typing in google.com, forced it to google.co.za, where as Bing stayed on bing.com. This might explain some ‘tailoring’ of GiF and GES to news that is topical in South Africa, which does not happen to the same extent on Bing. I suppose that for some search terms, one would like that, and for others, one would not; i.e., to have the option to choose to search for facts vs opinion pieces vs news, nationally or internationally, or whether you’d want to get an answer or get links to multiple answers. Neither Bing nor Google gives you a free choice on the matter: based on the data you provide involuntarily, they make assumptions as to whom you are and what they think that that kind of person would probably like to see in the search results. That three out of the seven searches on GES lean clearly to the political right is a cause of concern, as is the fewer amounts of facts in Google search results vs Bing’s. I also find it a bit odd that the selection of results is from such wide-ranging numbers of results.

Based on this small sampling, I obviously cannot draw hard conclusions, but it would be nice if we can get some money to get a student to investigate this systematically with more browsers and more languages. We now know that it happens, but how does it happen in South Africa, and might there be some effect because of it? Those questions remain unanswered. In the meantime, I’ll have to do with some anecdotes for the students in an upcoming lecture.

# Experimentally-motivated non-trivial intermodel links between conceptual models

I am well aware that some people prefer Agile and mash-ups and such to quickly, scuffily, put an app together, but putting a robust, efficient, lasting, application together does require a bit of planning—analysis and design in the software development process. For instance, it helps to formalise one’s business rules or requirements, or at least structure them with, say, SBVR or ORM, so as to check that the rules obtained from the various stakeholders do not contradict themselves cf. running into problems when testing down the line after having implemented it during a testing phase. Or analyse a bit upfront which classes are needed in the front-end application layer cf. perpetual re-coding to fix one’s mistakes (under the banner ‘refactoring’, as if naming the process gives it an air of respectability), and create, say, a UML diagram or two. Or generating a well-designed database based on an EER model.

Each of these three components can be done in isolation, but how to do this for complex system development where the object-oriented application layer hast to interact with the database back-end, all the while ensuring that the business rules are still adhered to? Or you had those components already, but they need to be integrated? One could link the code to tables in the implementation layer, on an ad hoc basis, and figure it out again and again for any combination of languages and systems. Another one is to do that at the conceptual modelling layer irrespective of the implementation language. The latter approach is reusable (cf. reinventing the mapping wheel time and again), and at a level of abstraction that is easier to cope with for more people, and even more so if the system is really large. So, we went after that option for the past few years and have just added another step to realising all this: how to link which elements in the different models for the system.

It is not difficult to imagine a tool where one can have several windows open, each with a model in some conceptual modelling language—many CASE tools already support modelling in different notations anyway. It is conceptually also fairly straightforward when in, say, the UML model there is a class ‘Employee’ and in the ER diagram there’s an ‘Employee’ entity type: it probably will work out to align these classes. Implementing just this is a bit of an arduous engineering task, but doable. In fact, there is such a tool for models represented in the same language, where the links can be subsumption, equivalence, or disjointness between classes or between relationships: ICOM [2]. But we need something like that to work across modelling languages as well, and for attributes, too. In the hand-waiving abstract sense, this may be intuitively trivial, but the gory details of the conceptual and syntax aspects are far from it. For instance, what should a modeller do if one model has ‘Address’ as an attribute and the other model has it represented as a class? Link the two despite being different types of constructs in the respective languages? Or that ever-recurring example of modelling marriage: a class ‘Marriage’ with (at least) two participants, or ‘Marriage’ as a recursive relationship (UML association) of a ‘Person’ class? What to do if a modeller in one model had chosen the former option and another modeller the latter? Can they be linked up somehow nonetheless, or would one have to waste a lot of time redesigning the other model?

Instead of analysing this for each case, we sought to find a generic solution to it; with we being: Zubeida Khan, Pablo Fillottrani, Karina Cenci, and I. The solution we propose will appear soon in the proceedings of the 20th Conference on Advances in DataBases and Information Systems (ADBIS’16) that will be held at the end of this month in Prague.

So, what did we do? First, we tried to narrow down the possible links between elements in the models: in theory, one might want to try to link anything to anything, but we already knew some model elements are incompatible, and we were hoping that others wouldn’t be needed yet other suspected to be needed, so that a particular useful subset could be the focus. To determine that, we analysed a set of ICOM projects created by students at the Universidad Nacionál del Sur (in Bahía Blanca), and we created model integration scenarios based on publicly available conceptual models of several subject domains, such as hospitals, airlines, and so on, including EER diagrams, UML class diagrams, and ORM models. An example of an integration scenario is shown in the figure below: two conceptual models about airline companies, with on the left the ER diagram and on the right the UML diagram.

One of the integration scenarios [1]

The solid purple links are straightforward 1:1 mappings; e.g., er:Airlines = uml:Airline. Long-dashed dashed lines represent ‘half links’ that are semantically very similar, such as er:Flight.Arr_time ≈ uml:Flight.arrival_time, where the idea of attribute is the same, but ER attributes don’t have a datatype specified whereas UML attributes do. The red short-dashed dashed lines require some transformation: e.g., er:Airplane.Type is an attribute yet uml:Aircraft is a class, and er:Airport.code is an identifier (with its mandatory 1:1 constraint, still no datatype) but uml:Airport.ID is just a simple attribute. Overall, we had 40 models with 33 schema matchings, with 25 links in the ICOM projects and 258 links in the integration scenarios. The detailed aggregates are described in the paper and the dataset is available for reuse (7MB). Unsurprisingly, there were more attribute links than class links (if a class can be linked, then typically also some of its attributes). There were 64 ‘half’ links and 48 transformation links, notably on the slightly compatible attributes, attributes vs. identifiers, attribute<->value type, and attribute<->class.

Armed with these insights from the experiment, a general intermodel link validation approach [3] that uses the unified metamodel [4], and which type of elements occur mostly in conceptual models with their logic-based profiles [5,6], we set out to define those half links and transformation links. While this could have been done with a logic of choice, we opted for a clear step toward implementability by exploiting the ATLAS Transformation Language (ATL) [7] to specify the transformations. As there’s always a source model and a target model in ATL, we constructed the mappings such that both models in question are the ‘source’ and both are mapped into a new, single, ‘target’ model that still adheres to the constraints imposed by the unifying metamodel. A graphical depiction of the idea is shown in the figure below; see paper for details of the mapping rules (they don’t look nice in a blog post).

Informal, graphical rendering of the rule Attribute<->Object Type output [1]

Someone into this matter might think, based on this high-level description, there’s nothing new here. However, there is, and the paper has an extensive related works section. For instance, there’s related work on Distributed Description Logics with bridge rules [8], but they don’t do attributes and the logics used for that doesn’t fit well with the features needed for conceptual modelling, so it cannot be adopted without substantial modifications. Triple Graph Grammars look quite interesting [9] for this sort of task, as does DOL [10], but either would require another year or two to figure it out (feel free to go ahead already). On the practical side, e.g., the Eclipse metamodel of the popular Eclipse Modeling Framework didn’t have enough in the metamodel for what needs to be included, both regarding types of entities and the constraints that would need to be enforced. And so on, such that by a process of elimination, we ended up with ATL.

It would be good to come up with those logic-based linking options and proofs of correctness of the transformation rules presented in the paper, but in the meantime, an architecture design of the new tool was laid out in [11], which is in the stage of implementation as I write this. For now, at least a step has been taken from the three years of mostly theory and some experimentation toward implementation of all that. To be continued J.

References

[1] Khan, Z.C., Keet, C.M., Fillottrani, P.R., Cenci, K.M. Experimentally motivated transformations for intermodel links between conceptual models. 20th Conference on Advances in Databases and Information Systems (ADBIS’16). Springer LNCS. August 28-31, Prague, Czech Republic. (in print)

[2] Fillottrani, P.R., Franconi, E., Tessaris, S. The ICOM 3.0 intelligent conceptual modelling tool and methodology. Semantic Web Journal, 2012, 3(3): 293-306.

[3] Fillottrani, P.R., Keet, C.M. Conceptual Model Interoperability: a Metamodel-driven Approach. 8th International Web Rule Symposium (RuleML’14), A. Bikakis et al. (Eds.). Springer Lecture Notes in Computer Science LNCS vol. 8620, 52-66. August 18-20, 2014, Prague, Czech Republic.

[4] Keet, C.M., Fillottrani, P.R. An ontology-driven unifying metamodel of UML Class Diagrams, EER, and ORM2. Data & Knowledge Engineering, 2015, 98:30-53.

[5] Keet, C.M., Fillottrani, P.R. An analysis and characterisation of publicly available conceptual models. 34th International Conference on Conceptual Modeling (ER’15). Johannesson, P., Lee, M.L. Liddle, S.W., Opdahl, A.L., Pastor López, O. (Eds.). Springer LNCS vol 9381, 585-593. 19-22 Oct, Stockholm, Sweden.

[6] Fillottrani, P.R., Keet, C.M. Evidence-based Languages for Conceptual Data Modelling Profiles. 19th Conference on Advances in Databases and Information Systems (ADBIS’15). Morzy et al. (Eds.). Springer LNCS vol. 9282, 215-229. Poitiers, France, Sept 8-11, 2015.

[7] Jouault, F. Allilaire, F. Bzivin, J. Kurtev, I. ATL: a model transformation tool. Science of Computer Programming, 2008, 72(12):31-39.

[8] Ghidini, C., Serafini, L., Tessaris, S., Complexity of reasoning with expressive ontology mappings. Formal ontology in Information Systems (FOIS’08). IOS Press, FAIA vol. 183, 151-163.

[9] Golas, U., Ehrig, H., Hermann, F. Formal specification of model transformations by triple graph grammars with application conditions. Electronic Communications of the ESSAT, 2011, 39: 26.

[10] Mossakowsi, T., Codescu, M., Lange, C. The distributed ontology, modeling and specification language. Proceedings of the Workshop on Modular Ontologies 2013 (WoMo’13). CEUR-WS vol 1081. Corunna, Spain, September 15, 2013.

[11] Fillottrani, P.R., Keet, C.M. A Design for Coordinated and Logics-mediated Conceptual Modelling. 29th International Workshop on Description Logics (DL’16). Peñaloza, R. and Lenzerini, M. (Eds.). CEUR-WS Vol. 1577. April 22-25, Cape Town, South Africa. (abstract)

# Bootstrapping a Runyankore CNL from an isiZulu one mostly works well

Earlier this week the 5th Workshop on Controlled Natural Language (CNL’16) was held in Aberdeen, Scotland, where I presented progress made on a Runyankore CNL [1], rather than my student, Joan Byamugisha, who did most of the work on it (she could not attend due to nasty immigration rules by the UK, not a funding issue).

“Runyankore?”, you might ask. It is one of the languages spoken in Uganda. As Runyankore is very under-resourced, any bootstrapping to take a ‘shortcut’ to develop language resources would be welcome. We have a CNL for isiZulu [2], but that is spoken in South Africa, which is a few thousand kilometres further south of Uganda, and it is in a different Guthrie zone of the—in linguistics still called—Bantu languages, so it was a bit of a gamble to see whether those results could be repurposed for Runynakore. They could, needing only minor changes.

What stayed the same were the variables, or: components to make up a grammatically correct sentence when generating a sentence within the context of OWL axioms (ALC, to be more precise). They are: the noun class of the name of the concept (each noun is assigned a noun class—there are 20 in Runyankore), the category of the concept (e.g., noun, adjective), whether the concept is atomic (named OWL class) or an OWL class expression, the quantifier used in the axiom, and the position of the concept in the axiom. The only two real differences were that for universal quantification the word for the quantifier is the same when in the singular (cf. isiZulu, where it changes for both singular or plural), and for disjointness there is only one word, ti ‘is not’ (cf. isiZulu’s negative subject concord + pronomial). Two very minor differences are that for existential quantification ‘at least one’, the ‘at least’ is in a different place in the sentence but the ‘one’ behaves exactly the same, and ‘all’ for universal quantification comes after the head noun rather than before (but is also still dependent on the noun class).

It goes without saying that the vocabulary is different, but that is a minor aspect compared to figuring out the surface realisation for an axiom. Where the bootstrapping thus came in handy was that that arduous step of investigating from scratch the natural language grammar involved in verbalising OWL axioms could be skipped and instead the ones for isiZulu could be reused. Yay. This makes it look very promising to port to other languages in the Bantu language family. (yes, I know, “one swallow does not a summer make” [some Dutch proverb], but one surely is justified to turn up one’s hope a notch regarding generalizability and transferability of results.)

Joan also conducted a user survey to ascertain which surface realisation was preferred among Runyankore speakers, implemented the algorithms, and devised a new one for the ‘hasX’ naming scheme of OWL object properties (like hasSymptom and hasChild). All these details, as well as the details of the Runyankore CNL and the bootstrapping, are described in the paper [1].

I cannot resist a final comment on all this. There are people who like to pull it down and trivialise natural language interfaces for African languages, on the grounds of “who cares about text in those kind of countries; we have to accommodate the illiteracy with pictures and icons and speech and such”. People are not as illiterate as is claimed here and there (including by still mentally colonised people from African countries)—if they were, then the likes of Google and Facebook and Microsoft would not invest in localising their interfaces in African languages. The term “illiterate” is used by those people to include also those who do not read/write in English (typically an/the official language of government), even though they can read and write in their local language. People who can read and write—whichever natural language it may be—are not illiterate, neither here in Africa nor anywhere else. English is not the yardstick of (il)literacy, and anyone who thinks it is should think again and reflect a bit on cultural imperialism for starters.

References

[1] Byamugisha, J., Keet, C.M., DeRenzi, B. Bootstrapping a Runyankore CNL from an isiZulu CNL. 5th Workshop on Controlled Natural Language (CNL’16), Springer LNAI vol. 9767, 25-36. 25-27 July 2016, Aberdeen, UK. Springer’s version

[2] Keet, C.M., Khumalo, L. Toward a knowledge-to-text controlled natural language of isiZulu. Language Resources and Evaluation, 2016. DOI: 10.1007/s10579-016-9340-0 (in print) accepted version

# Automatically finding the feasible object property

Late last month I wrote about the updated taxonomy of part-whole relations and claimed it wasn’t such a big deal during the modeling process to have that many relations to choose from. Here I’ll back up that claim. Primarily, it is thanks to the ‘Foundational Ontology and Reasoner enhanced axiomatiZAtion’ (FORZA) approach which includes the Guided ENtity reuse and class Expression geneRATOR (GENERATOR) method that was implemented in the OntoPartS-2 tool [1]. The general idea of the GENERATOR method is depicted in the figure below, which outlines two scenarios: one in which the experts perform the authoring of their domain ontology with the help of a foundational ontology, and the other one without a foundational ontology.

I think the pictures are clearer than the following text, but some prefer text, so here goes the explanation attempt. Let’s start with scenario A on the left-hand side of the figure: a modeller has a domain ontology and a foundational ontology and she wants to relate class two domain classes (indicated with C and D) and thus needs to select some object property. The first step is, indeed, selecting C and D (e.g., Human and Heart in an anatomy ontology); this is step (1) in the Figure.

Then (step 2) there are those long red arrows, which indicate that somehow there has to be a way to deal with the alignment of Human and of Heart to the relevant categories in the foundational ontology. This ‘somehow’ can be either of the following three options: (i) the domain ontology was already aligned to the foundational ontology, so that step (2) is executed automatically in the background and the modeler need not to worry, (ii) she manually carries out the alignment (assuming she knows the foundational ontology well enough), or, more likely, (iii) she chooses to be guided by a decision diagram that is specific to the selected foundational ontology. In case of option (ii) or (iii), she can choose to save it permanently or just use it for the duration of the application of the method. Step (3) is an automated process that moves up in the taxonomy to find the possible object properties. Here is where an automated reasoner comes into the equation, which can step-wise retrieve the parent class, en passant relying on taxonomic classification that offers the most up-to-date class hierarchy (i.e., including implicit subsumptions) and therewith avoiding spurious candidates. From a modeller’s viewpoint, one thus only has to select which classes to relate, and, optionally, align the ontology, so that the software will do the rest, as each time it finds a domain and range axiom of a relationship in which the parents of C and D participate, it is marked as a candidate property to be used in the class expression. Finally, the candidate object properties are returned to the user (step 4).

While the figure shows only one foundational ontology, one equally well can use a separate relation ontology, like PW or PWMT, which is just an implementation variant of scenario A: the relation ontology is also traversed upwards and on each iteration, the base ontology class is matched against relational ontology to find relations where the (parent of the) class is defined in a domain and range axiom, also until the top is reached before returning candidate relations.

The second scenario with a domain ontology only is a simplified version of option A, where the alignment step is omitted. In Figure-B above, GENERATOR would return object properties W and R as options to choose from, which, when used, would not generate an inconsistency (in this part of the ontology, at least). Without this guidance, a modeler could, erroneously, select, say, object property S, which, if the branches are disjoint, would result in an inconsistency, and if not declared disjoint, move class C from the left-hand branch to the one in the middle, which may be an undesirable deduction.

For the Heart and Human example, these entities are, in DOLCE terminology, physical objects, so that it will return structural parthood or plain parthood, if the PW ontology is used as well. If, on the other hand, say, Vase and Clay would have been the classes selected from the domain ontology, then a constitution relation would be proposed (be this with DOLCE, PW, or, say, GFO), for Vase is a physical object and Clay an amount of matter. Or with Limpopo and South Africa, a tangential proper parthood would be proposed, because they are both geographic entities.

The approach without the reasoner and without the foundational ontology decision diagram was tested with users, and showed that such a tool (OntoPartS) made the ontology authoring more efficient and accurate [2], and that aligning to DOLCE was the main hurdle for not seeing even more impressive differences. This is addressed with OntoPartS-2, so it ought to work better. What still remains to be done, admittedly, is that larger usability study with the updated version OntoPartS-2. In the meantime: if you use it, please let us know your opinion.

References

[1] Keet, C.M., Khan, M.T., Ghidini, C. Ontology Authoring with FORZA. 22nd ACM International Conference on Information and Knowledge Management (CIKM’13). ACM proceedings, pp569-578. Oct. 27 – Nov. 1, 2013, San Francisco, USA.

[2] Keet, C.M., Fernandez-Reyes, F.C., Morales-Gonzalez, A. Representing mereotopological relations in OWL ontologies with OntoPartS. 9th Extended Semantic Web Conference (ESWC’12), Simperl et al. (eds.), 27-31 May 2012, Heraklion, Crete, Greece. Springer, LNCS 7295, 240-254.

# A few refreshing feminist articles—to point out and fix bugs in the game

Most articles on gender issues and feminism regurgitate the same old story and arguments, or are reports on more data and experiments with similar results popping up. Some articles or blog posts do bring something relatively new to the table, or apply a feminist analysis to something else, or explain things in a novel way that resonates better in this day and age. Upfront, to those who think gender issues and feminism is mostly rubbish, please read the parable by John Scalzi about the computer game, which is set at the lowest difficulty setting in the Game of the Real World for the Straight White Male; then read ‘those feminazi articles’ as one of pointing out bugs in the code, and of suggesting bug fixes or of a slight rewriting of the game logic to level the playing field. So, here are a few links to some such articles that otherwise may be snowed-under by the online articles on women in STEM, IT, management etc.

The feminist appraisal of Dirty Dancing over at Jezebel’s blog, or, as another one puts it “It’s the feminist sleeper agent of chick flicks” (and some class issues); after reading this, you won’t see the movie the same anymore. (Yes, I did watch the movie again, and the points made in the articles are valid, which, honestly, had escaped me when I watched it in the 80s.)

The many shortcomings of (old) white men futurology, who have a rather limited set of imaginations (fantasies?) in prognosticating. Maybe people in that (non-STEM) discipline already know about the issues and limitations, but I’m in another field of research, so it was new to me. Obviously, if futurology is a science, then it should not make a difference whether men or women do it, but that’s another discussion.

The Super-exploitation of women by Marlene Dixon on capitalism and patriarchy in cahoots to keep women as their unpaid servants and labour-producers wives. I did search for more recent analyses, but they don’t compare in content and clarity to this one.

I did not manage to find again the recent fine rant on feminist issues in Africa that are, at least in part, different from ‘the [white middle-class] feminism in the West’, but these will do on scope as well: feminism here on the continent driven by African women who really do have lots of agency (e.g., all the way up to presidents/prime ministers and Nobel Peace Prize winners) and where certain types of ‘help’ from the outside is counterproductive for it enforces dependence. An example of a currently hot topic here (and, afaik, never was in Europe) is the need for free sanitary pads for girls whose family cannot afford them, so that they can keep going to school to learn rather than miss out on it for a few days each month.

Finally, a slightly crudely formulated article that discusses a whining “pick-up artist” who is “cockblocked by redistribution” in Denmark, a socialist-like and feminist-friendly country. Squeezed between the chatter are notes on flaws on evolutionary psychology and the criticism on feminism as an individual pursuit (e.g., ‘lean in’) versus as a collective goal. Even the pick-up artist eventually notes “we can’t fulfill basic human rights for all without viewing everyone as equal”.

# New OWL files for the (extended) taxonomy of part-whole relations

Once upon a time (surely >6 years ago) I made an OWL file of the taxonomy of part-whole relations [1], which contains several parthood relations and a few meronyic-only ones that in natural language are considered ‘part’ but are not so according to mereology (like participation, membership). Some of these relations were defined with a specific domain and range that was a DOLCE category (it could just as well have been, say, GFO). Looking at it recently, I noticed it was actually a bit scruffy (but I’ll leave it here nonetheless), and more has happened in this area over the years. So, it was time for an update on contents and on design.

For the record on how it’s done and to serve, perhaps, as a comparison exercise on modeling, here’s what I did. First of all, I started over, so as to properly type the relations to DOLCE categories, with the DOLCE IRIs rather than duplicated as DOLCE-category-with-my-IRI. As DOLCE is way too big and slows down reasoning, I made a module of DOLCE, called DOLCEmini, mainly by removing the irrelevant object properties, though re-adding the SOB, APO and NAPO that’s in D18 but not in DOLCE-lite from DLP3791. This reduced the file from DOLCE-lite’s 534 axioms, 37 classes, 70 OPs, in SHI to DOLCEmini’s 388 axioms, 40 classes, 43 OPs, also in SHI, and I changed the ontology IRI to where DOLCEmini will be put online.

Then I created a new ontology, PW.owl, imported DOLCEmini, added the taxonomy of part-whole relations from [1] right under owl:topObjectProperty, with domain and range axioms using the DOLCE categories as in the definitions, under part-whole. This was then extended with the respective inverses under whole-part, all the relevant proper part versions of them (with inverses), transitivity added for all (as the reasoner isn’t doing it [2]) annotations added, and then aligned to some DOLCE properties with equivalences. This makes it to 524 axioms and 79 object properties.

I deprecated subquantityOf (annotated with ‘deprecated’ and subsumed by a new property ‘deprecated’). Several new stuff relations and their inverses were added (such as portions), and annotated them. This made it to the PW ontology of 574 axioms (356 logical axioms) and 92 object properties (effectively, for part-whole relations: 92 – 40 from dolce – 3 for deprecated = 49).

As we made an extension with mereotopology [3] (and also that file wasn’t great, though did the job nevertheless [4]), but one that not everybody may want to put up with, yet a new file was created, PWMT. PWMT imports PW (and thus also DOLCEmini) and was extended with the main mereotopological relations from [3], and relevant annotations were added. I skipped property disjointness axioms, because they don’t go well with transitivity, which I assumed to be more important. This makes PWMT into one of 605 (380 logical) axioms and 103 object properties, with, effectively, for parts: 103 – 40 from dolce – 3 for deprecated – 1 connection = 59 object properties.

That’s a lot of part-whole relations, but fear not. The ‘Foundational Ontology and Reasoner enhanced axiomatiZAtion’ (FORZA) and its tool that incorporates with the Guided ENtity reuse and class Expression geneRATOR (GENERATOR) method [4] describes a usable approach how that can work out well and has a tool for the earlier version of the owl file. FORZA uses an optional decision diagram for the DOLCE categories as well as the automated reasoner so that it can select and propose to you those relations that, if used in an axiom, is guaranteed not to lead to an inconsistency that would be due to the object property hierarchy or its domain and range axioms. (I’ll write more about it in the next post.)

Ah well, even if the OWL files are not used, it was still a useful exercise in design, and at least I’ll have a sample case for next year’s ontology engineering course on ‘before’ and ‘after’ about questionable implementation and (relatively) good implementation without the need to resorting to criticizing other owl files… (hey, even the good and widely used ontologies have a bunch of pitfalls, whose amount is not statistically significantly different from ontologies made by novices [5]).

References

[1] Keet, C.M., Artale, A. Representing and Reasoning over a Taxonomy of Part-Whole Relations. Applied Ontology, 2008, 3(1-2):91-110.

[2] Keet, C.M. Detecting and Revising Flaws in OWL Object Property Expressions. 18th International Conference on Knowledge Engineering and Knowledge Management (EKAW’12), Oct 8-12, Galway, Ireland. Springer, LNAI 7603, 252-266.

[3] Keet, C.M., Fernandez-Reyes, F.C., Morales-Gonzalez, A. Representing mereotopological relations in OWL ontologies with OntoPartS. 9th Extended Semantic Web Conference (ESWC’12), Simperl et al. (eds.), 27-31 May 2012, Heraklion, Crete, Greece. Springer, LNCS 7295, 240-254.

[4] Keet, C.M., Khan, M.T., Ghidini, C. Ontology Authoring with FORZA. 22nd ACM International Conference on Information and Knowledge Management (CIKM’13). ACM proceedings, pp569-578. Oct. 27 – Nov. 1, 2013, San Francisco, USA.

[5] Keet, C.M., Suarez-Figueroa, M.C., Poveda-Villalon, M. Pitfalls in Ontologies and TIPS to Prevent Them. In: Knowledge Discovery, Knowledge Engineering and Knowledge Management: IC3K 2013 Selected papers. Fred, A., Dietz, J.L.G., Liu, K., Filipe, J. (Eds.). Springer, CCIS 454, pp. 115-131. 2015.