BFO decision diagram and alignment tool

How to align your domain ontology to a foundational ontology? It’s a well-known question, and one that I’ve looked into before as well. In some of that earlier work, we used DOLCE to align one’s ontology to. We devised the DOLCE decision diagram as part of the FORZA method to assist with the alignment process and implemented that in the MoKI ontology development tool [1]. MoKI is no more, but the theory and the algorithm’s design approach still stand. Instead of re-implementing it as a Protégé plugin and have it go defunct in a few years again (due to incompatible version upgrades, say), it sounded like more fun to design one for BFO and make a stand-alone tool out of it. And that design and the evaluation thereof is precisely what two of my ontology engineering course students—Chiadika Emeruem and Steve Wang—did for their mini-project of the course. That was then finalised and implemented in a tool for general use as part of the DOT4D project extension for my (award-winning) OE textbook afterward.

More precisely, as first part, there’s a diagram specifically for BFO – well, for one of its 2.0-ish versions in existence at least. Deciding on which version to use and what would be good questions was not as trivial as it may sound. While the questions seem to work (as evaluated with several ontologies), it might still be of use to set up an experiment to assess usability from a modeller’s viewpoint.

BFO ‘decision diagram’ to assist trying to align one’s class of a domain or core ontology to BFO (click to enlarge, or navigate to the user guide at https://bfo-classifier.github.io/)

Be this as it may, this decision diagram was incorporated into the tool that wraps around it with a nice interface with user guidance and feedback, and it has the option to load an ontology and save the alignment into the ontology (along with BFO). The decision tree itself is stored as a separate XML file so that it easily can be replaced with any update thereto, be it to reflect changes in question formulation or to adjust it to some later version of BFO. The stand-alone tool is a jar file that can be downloaded from the GitHub repo, and the repo also has the source code that may be used/adapted (i.e., has an open source licence). There’s also a user guide with explanations and screenshots. Here’s another screenshot of the tool in action:

Example of the BFO classifier in use, trying to align CODO’s ‘Disease’ to BFO, the trail of questions answered to get to ‘Disposition’, and the subsumption axiom that can be added to the ontology.

If you have any questions, please feel free to contact either of us.

References

[1] Keet, C.M., Khan, M.T., Ghidini, C. Ontology Authoring with FORZA. 22nd ACM International Conference on Information and Knowledge Management (CIKM’13). ACM proceedings, pp569-578. Oct. 27 – Nov. 1, 2013, San Francisco, USA.

Advertisement

Automatically simplifying an ontology with NOMSA

Ever wanted only to get the gist of the ontology rather than wading manually through thousands of axioms, or to extract only a section of an ontology for reuse? Then the NOMSA tool may provide the solution to your problem.

screenshot of NOMSA in action (deleting classes further than two levels down in the hierarchy in BFO)

There are quite a number of ways to create modules for a range of purposes [1]. We zoomed in on the notion of abstraction: how to remove all sorts of details and create a new ontology module of that. It’s a long-standing topic in computer science that returns every couple of years with another few tries. My first attempts date back to 2005 [2], which references modules & abstractions for conceptual models and logical theories to works published in the mid-1990s and, stretching the scope to granularity, to 1985, even. Those efforts, however, tend to halt at the theory stage or worked for one very specific scenario (e.g., clustering in ER diagrams). In this case, however, my former PhD student and now Senior Research at the CSIR, Zubeida Khan, went further and also devised the algorithms for five types of abstraction, implemented them for OWL ontologies, and evaluated them on various metrics.

The tool itself, NOMSA, was presented very briefly at the EKAW 2018 Posters & Demos session [3] and has supplementary material, such as the definitions and algorithms, a very short screencast and the source code. Five different ways of abstraction to generate ontology modules were implemented: i) removing participation constraints between classes (e.g., the ‘each X R at least one Y’ type of axioms), ii) removing vocabulary (e.g., remove all object properties to yield a bare taxonomy of classes), iii) keeping only a small number of levels in the hierarchy, iv) weightings based on how much some element is used (removing less-connected elements), and v) removing specific language profile features (e.g., qualified cardinality, object property characteristics).

In the meantime, we have added a categorisation of different ways of abstracting conceptual models and ontologies, a larger use case illustrating those five types of abstractions that were chosen for specification and implementation, and an evaluation to see how well the abstraction algorithms work on a set of published ontologies. It was all written up and polished in 2018. Then it took a while in the publication pipeline mixed with pandemic delays, but eventually it has emerged as a book chapter entitled Structuring abstraction to achieve ontology modularisation [4] in the book “Advanced Concepts, methods, and Applications in Semantic Computing” that was edited by Olawande Daramola and Thomas Moser, in January 2021.

Since I bought new video editing software for the ‘physically distanced learning’ that we’re in now at UCT, I decided to play a bit with the software’s features and record a more comprehensive screencast demo video. In the nearly 13 minutes, I illustrate NOMSA with four real ontologies, being the AWO tutorial ontology, BioTop top-domain ontology, BFO top-level ontology, and the Stuff core ontology. Here’s a screengrab from somewhere in the middle of the presentation, where I just automatically removed all 76 object properties from BioTop, with just one click of a button:

screengrab of the demo video

The embedded video (below) might keep it perhaps still readable with really good eyesight; else you can view it here in a separate tab.

The source code is available from Zubeida’s website (and I have a local copy as well). If you have any questions or suggestions, please feel free to contact either of us. Under the fair use clause, we also can share the book chapter that contains the details.

References

[1] Khan, Z.C., Keet, C.M. An empirically-based framework for ontology modularization. Applied Ontology, 2015, 10(3-4):171-195.

[2] Keet, C.M. Using abstractions to facilitate management of large ORM models and ontologies. International Workshop on Object-Role Modeling (ORM’05). Cyprus, 3-4 November 2005. In: OTM Workshops 2005. Halpin, T., Meersman, R. (eds.), LNCS 3762. Berlin: Springer-Verlag, 2005. pp603-612.

[3] Khan, Z.C., Keet, C.M. NOMSA: Automated modularisation for abstraction modules. Proceedings of the EKAW 2018 Posters and Demonstrations Session (EKAW’18). CEUR-WS vol. 2262, pp13-16. 12-16 Nov. 2018, Nancy, France.

[4] Khan, Z.C., Keet, C.M. Structuring abstraction to achieve ontology modularisation. Advanced Concepts, methods, and Applications in Semantic Computing. Daramola O, Moser T (Eds.). IGI Global. 2021, 296p. DOI: 10.4018/978-1-7998-6697-8.ch004

Localising Protégé with Manchester syntax into your language of choice

Some people like a quasi natural language interface in ontology development tools, which is why Manchester Syntax was proposed [1]. A downside is that it locks the ontology developer into English, so that weird chimaeras are generated in the interface if the author prefers another language for the ontology, such as, e.g., the “jirafa come only (oja or ramita)” mentioned in an earlier post and that was deemed unpleasant in an experiment a while ago [2]. Those who prefer the quasi natural language components will have to resort to localising Manchester syntax and the tool’s interface.

This is precisely what two of my former students—Adam Kaliski and Casey O’Donnell—did during their mini-project in the ontology engineering course of 2017. A localisation in Afrikaans, as the case turned out to be. To make this publicly available, Michael Harrison brushed up the code a bit and tested it worked also in the new version of Protégé. It turned out it wasn’t that easy to localise it to another language the way it was done, so one of my PhD students, Toky Raboanary, redesigned the whole thing. This was then tested with Spanish, and found to be working. The remainder of the post describes informally some main aspects of it. If you don’t want to read all that but want to play with it right away: here are the jar files, open source code, and localisation instructions for if you want to create, say, a French or Dutch variant.

Some sensible constraints, some slightly contrived ones (and some bad ones), for the purpose of showing the localisation of the interface for the various keywords. The view in English is included in the screenshot to facilitate comparison.

Some sensible constraints, some slightly contrived ones (and some bad ones), for the purpose of showing the localisation of the interface for the various keywords. The view in English is included in the screenshot to facilitate comparison.

The localisation functions as a plugin for Protégé as a ‘view’ component. It can be selected under “Windows – Views – Class views” and then Beskrywing for the Afrikaans and Descripción for Spanish, and dragged into the desired position; this is likewise for object properties.

Instead of burying the translations in the code, they are specified in a separate XML file, whose content is fetched during the rendering. Adding a new ‘simple’ (more about that later) language merely amounts to adding a new XML file with the translations of the Protégé labels and of the relevant Manchester syntax. Here are the ‘simple’ translations—i.e., where both are fixed strings—for Afrikaans for the relevant tool interface components:

 

Class Description

(Label)

Klasbeskrywing

(Label in Afrikaans)

Equivalent To Dieselfde as
SubClass Of Subklas van
General Class axioms Algemene Klasaksiomas
SubClass Of (Anonymous Ancestor) Subklas van (Naamlose Voorvader)
Disjoint With Disjunkte van
Disjoint Union Of Disjunkte Unie van

 

The second set of translations is for the Manchester syntax, so as to render that also in the target language. The relevant mappings for Afrikaans class description keywords are listed in the table below, which contain the final choices made by the students who developed the original plugin. For instance, min and max could have been rendered as minimum and maksimum, but the ten minste and by die meeste were deemed more readable despite being multi-word strings. Another interesting bit in the translation is negation, where there has to be a second ‘no’ since Afrikaans has double negation in this construction, so that it renders it as nie <expression> nie. That final rendering is not grammatically perfect, but (hopefully) sufficiently clear:

An attempt at double negation with a fixed string

An attempt at double negation with a fixed string

Manchester OWL Keyword Afrikaans Manchester OWL

Keyword or phrase

some sommige
only slegs
min ten minste
max by die meeste
exactly precies
and en
or of
not nie <expression> nie
SubClassOf SubklasVan
EquivalentTo DieselfdeAs
DisjointWith DisjunkteVan

 

The people involved in the translations for the object properties view for Afrikaans are Toky, my colleague Tommie Meyer (also at UCT), and myself; snyding for ‘intersection’ sounds somewhat odd to me, but the real tough one to translate was ‘SuperProperty’. Of the four options that were considered—SuperEienskap, SuperVerwantskap, SuperRelasie, and SuperVerband SuperVerwantskap was chosen with Tommie having had the final vote, which is also a semantic translation, not a literal translation.

Screenshot of the object properties description, with comparison to the English

The Spanish version also has multi-word strings, but at least does not do double negation. On the other hand, it has accents. To generate the Spanish version, myself, my collaborator Pablo Fillottrani from the Universidad Nacional del Sur, Argentina, and Toky had a go at it in translating the terms. This was then implemented with the XML file. In case you do not want to dig into the XML file and not install the plugin either, but have a quick look at the translations, they are as follows for the class description view:

 

Class Description

Label

Descripción

(in Spanish)

Equivalent To Equivalente a
SubClass Of Subclase de
General Class axioms Axiomas generales de clase
SubClass Of (Anonymous Ancestor) Subclase de (Ancestro Anónimo)
Disjoint With Disjunto con
Disjoint Union Of Unión Disjunta de
Instances Instancias

 

Manchester OWL Keyword Spanish Manchester OWL Keyword
some al menos uno
only sólo
min al mínimo
max al máximo
and y
or o
not no
exactly exactamente
SubClassOf SubclaseDe
EquivalentTo EquivalenteA
DisjointWith DisjuntoCon

And here’s a rendering of a real ontology, for geo linked data in Spanish, rather than African wildlife yet again:

screenshot of the plugin behaviour with someone else’s ontology in Spanish

One final comment remains, which has to do with the ‘simple’ mentioned above. The approach of localisation presented here works only with fixed strings, i.e., the strings do not have to change depending on the context where it is uses. It won’t work with, say, isiZulu—a highly agglutinating and inflectional language—because isiZulu doesn’t have fixed strings for the Manchester syntax keywords nor for some other labels. For instance, ‘at least one’ has seven variants for nouns in the singular, depending on the noun class of the noun of the OWL class it quantifies over; e.g., elilodwa for ‘at least one’ apple, and esisodwa for ‘at least one’ twig. Also, the conjugation of the verb for the object property depends on the noun class of the noun of the OWL class, but in this case for the one that plays the subject; e.g., it’s “eats” in English for both humans and elephants eating, say, fruit, so one string for the name of the object property, but that’s udla and idla, respectively, in isiZulu. This requires annotations of the classes with ontolex-lemon or a similar approach and a set of rules (which we have, btw) to determine what to do in which case, which requires on-the-fly modifications to Manchester syntax keywords and elements’ names or labels. And then there’s still phonological conditioning to account for. It surely can be done, but it is not as doable as with the ‘simple’ languages that have at least a disjunctive orthography and much less genders or noun classes for the nouns.

In closing, while there’s indeed more to translate in the Protégé interface in order to fully localise it, hopefully this already helps as-is either for reading at least a whole axiom in one’s language or as stepping stone to extend it further for the other terms in the Manchester syntax and the interface. Feel free to extend our open source code.

 

References

[1] Matthew Horridge, Nicholas Drummond, John Goodwin, Alan Rector, Robert Stevens and Hai Wang (2006). The Manchester OWL syntax. OWL: Experiences and Directions (OWLED’06), Athens, Georgia, USA, 10-11 Nov 2016, CEUR-WS vol 216.

[2] Keet, C.M. The use of foundational ontologies in ontology development: an empirical assessment. 8th Extended Semantic Web Conference (ESWC’11), G. Antoniou et al (Eds.), Heraklion, Crete, Greece, 29 May-2 June, 2011. Springer LNCS 6643, 321-335.

[3] Keet, C.M., Khumalo, L. Toward a knowledge-to-text controlled natural language of isiZulu. Language Resources and Evaluation, 2017, 51:131-157. accepted version

A controlled language for competency questions

The formulation of so-called competency questions (CQs) at the start of the development of an ontology or a similar artefact is a recurring exercise in various ontology development methodologies. For instance, “Which animals are the predators of impalas?” that an African Wildlife ontology should be able to answer and “What are the main parsers for compilers?” that a software ontology may be able to answer. Yet, going by the small number of publicly available CQs, it seems like that many developers skip that step in the process. And it turned out that for those who at least try, a considerable number of purported CQs, actually aren’t at all, are mis-formulated for even having a chance for it to work smoothly (for, say, automated formalisation), or are grammatically incorrect (a depressing 1/3 of the sentences in our test set, to be more precise). Also, there’s no software support in guiding a modeller to formulate CQs, nor to actually do something with it, such as converting it automatically into SPARQL; hence, it is disjointed from the actual artefact under development, which doesn’t help uptake.

In an attempt to narrow this gap, we have developed a controlled natural language (CNL) called CLaRO: a Competency question Language for specifying Requirements for an Ontology, model, or specification [1] for CQs for ‘TBoxes’ (type-level information and knowledge, not instances). Advantages of a CNL for CQs include that it should be easier—or at least less hard—to formalise a CQ into a query over the model and to formulate a CQ in the first place. CLaRO more specifically operates at the language layer, so it deals with noun and verb phrases, rather that the primitives of a representation language and the predetermined modeling style that comes with it. It is also the first one that has been evaluated on coverage, which turned out to be good and better than earlier works on templates for CQs. To add more to it, we also made a basic tool that offers assistive authoring to write CQs (screencast).

We got there by availing of a recently published dataset of 234 CQs that had been analysed linguistically into patterns. We analysed those patterns, and that outcome informed the design of CLaRO. Given the size, this first version pf CLaRO is template-based, with core CQs and several variants, totalling to 134 templates. CLaRO was evaluated with a random sample from the original 234 CQs, a newly created set of CQs scrambled together for related work, and half of the Pizza CQs, as well as evaluated against templates presented elsewhere [2,3]. The results are summarised in the paper and discussed in more detail in a related longer technical report [4]. Here’s the nice table with the aggregate data:

Aggregated results for coverage of the three test sets. The best values are highlighted in italics. (CLaRO results are for the complete set of 134 templates) (source: based on [1])

Given the encouraging results, we also created a proof of concept CQ authoring tool, which both can assist in the authoring of CQs and may contribute to get a better idea of requirements for such a tool. One can use autocomplete so that it proposes a template, and then fill in a selected template, or just ignore it and write a free-form CQ, hit enter, and save it to a text file; the file can also be opened and CQs deleted. Here are a few screenshots on selecting and adding a CQ in the tool:

We will be presenting CLaRO at the 13th International Conference on Metadata and Semantics Research (MTSR’19) in Rome at the end of October. If you have any questions, comments, or suggestions, please don’t hesitate to contact us. The CNL specification in csv and XML formats, the evaluation data, and the tool with the source code are available from the CLaRO Github repo.

 

References:

[1] Keet, C.M., Mahlaza, Z., Antia, M.-J. CLaRO: a Controlled Language for Authoring Competency Questions. 13th Metadata and Semantics Research Conference (MTSR’19). 28-31 Oct 2019, Rome, Italy. Springer CCIS. (in print)

[2] Ren, Y., Parvizi, A., Mellish, C., Pan, J.Z., van Deemter, K., Stevens, R.: Towards competency question-driven ontology authoring. In: Extended Semantic Web Conference (ESWC’14). LNCS, Springer (2014)

[3] Bezerra, C., Freitas, F., Santana, F.: Evaluating ontologies with competency questions. In: Proceedings of the 2013 IEEE/WIC/ACM International Joint Conferences on Web Intelligence (WI) and Intelligent Agent Technologies (IAT) – Volume 03. pp. 284-285. IEEE Computer Society, Washington, DC, USA (2013)

[4] Keet, C.M., Mahlaza, Z., Antia, M.-J. CLaRO: A data-driven CNL for specifying competency questions. University of Cape Town. Technical Report. 17 July 2019. https://arxiv.org/abs/1907.07378

DL notation plugin for Protégé 5.x

Once upon a time… the Protégé ontology development environment used Description Logic (DL) symbols and all was well—for some users at least. Then Manchester Syntax came along as the new kid on the block, using hearsay and opinion and some other authors’ preferences for an alternative rendering to the DL notation [1]. Subsequently, everyone who used Protégé was forced to deal with those new and untested keywords in the interface, like ‘some’ and ‘only’ and such, rather than the DL symbols. That had another unfortunate side-effect, being that it hampers internationalisation, for it jumbles up things rather awkwardly when your ontology vocabulary is not in English, like, say, “jirafa come only (oja or ramita)”. Even in the same English-as-first-language country, it turned out that under a controlled set-up, the DL axiom rendering in Protégé fared well in a fairly large sized experiment when compared to the Protégé interface with the sort of Manchester syntax with GUI [2], and also the OWL 2 RL rules rendering appear more positive in another (smaller) experiment [3]. Various HCI factors remain to be examined in more detail, though.

In the meantime, we didn’t fully reinstate the DL notation in Protégé in the way it was in Protégé v3.x from some 15 years ago, but with our new plugin, it will at least render the class expression in DL notation in the tool. This has the benefits that

  1. the modeller will receive immediate feedback during the authoring stage regarding a notation that may be more familiar to at least a knowledge engineer or expert modeller;
  2. it offers a natural language-independent rendering of the axioms with respect to the constructors, so that people may develop their ontology in their own language if they wish to do so, without being hampered by continuous code switching or the need for localisation; and
  3. it also may ease the transition from theory (logics) to implementation for ontology engineering novices.

Whether it needs to be integrated further among more components of the tabs and views in Protégé or other ODEs, is also a question for HCI experts to answer. The code for the DL plugin is open source, so you could extend it if you wish to do so.

The plugin itself is a jar file that can simply be dragged into the plugin folder of a Protégé installation (5.x); see the github repo for details. To illustrate it briefly, after dragging the jar file into the plugin folder, open Protégé, and add it as a view:

Then when you add some new axioms or load an ontology, select a class, and it will render all the axioms in DL notation, as shown in the following two screenshots form different ontologies:

For the sake of illustration, here’s the giraffe that eats only leaves or twigs, in the Spanish version of the African Wildlife Ontology:

The first version of the tool was developed by Michael Harrison and Larry Liu as part of their mini-project for the ontology engineering course in 2017, and it was brushed up for presentation beyond that just now by Michael Harrison (meanwhile an MSc student a CS@UCT), which was supported by a DOT4D grant to improve my textbook on ontology engineering and accompanying educational resources. We haven’t examined all possible ‘shapes’ that a class expression can take, but it definitely processes the commonly used features well. At the time of writing, we haven’t detected any errors.

p.s.: if you want your whole ontology exported at once in DL notation and to latex, for purposes of documentation generation, that is a different usage scenario and is already possible [4].

p.p.s.: if you want more DL notation, please let me know, and I’ll try to find more resources to make a v2 with more features.

References

[1] Matthew Horridge, Nicholas Drummond, John Goodwin, Alan Rector, Robert Stevens and Hai Wang (2006). The Manchester OWL syntax. OWL: Experiences and Directions (OWLED’06), Athens, Georgia, USA, 10-11 Nov 2016, CEUR-WS vol 216.

[2] E. Alharbi, J. Howse, G. Stapleton, A. Hamie and A. Touloumis. The efficacy of OWL and DL on user understanding of axioms and their entailments. The Semantic Web – ISWC 2017, C. d’Amato, M. Fernandez, V. Tamma, F. Lecue, P. Cudre-Mauroux, J. Sequeda, C. Lange and J. He (eds.). Springer 2017, pp20-36.

[3] M. K. Sarker, A. Krisnadhi, D. Carral and P. Hitzler, Rule-based OWL modeling with ROWLtab Protégé plugin. Proceedings of ESWC’17, E. Blomqvist, D. Maynard, A. Gangemi, R. Hoekstra, P. Hitzler and O. Hartig (eds.). Springer. 2017, pp 419-433.

[4] Cogan Shimizu, Pascal Hitzler, Matthew Horridge: Rendering OWL in Description Logic Syntax. ESWC (Satellite Events) 2017. Springer LNCS. pp109-113

Updated isiZulu spellchecker and new isiXhosa spellchecker

Noting that February is the month of language activism in South Africa and that 21 February is the International Mother Language Day (a United Nations event since 2000), let me add my proverbial two cents to that. Since the launch of the isiZulu spellchecker in November 2016, research and development has progressed quite a bit, so that we have released a new ‘version 2’ of the spellchecker. For those not in-the-know: isiZulu and isiXhosa are both among the 11 official languages of South Africa, with isiZulu the largest language in the country by first language speakers and isiXhosa is slated to make an international breakthrough, as it’s used in the Black Panther movie that was released this weekend. Anyhow, the main novelties of the updated spellchecker are:

  • first error correction algorithms for isiZulu;
  • improved error detection with a few basic rules, also for isiZulu;
  • new isiXhosa error detection and correction;

The source code is open source, and, due to various tool limitations beyond our control, it’s still a standalone jar file (zipped for download). Here’s a screenshot of the tool, where it checks a piece of text from a novel in isiZulu, illustrating that *khupels has a substitution error (khupela was the intended word):

Single word error *khupels that has a substitution error s for a in the intended word (khupela)

The error corrector can propose possible corrections for single-word errors that are either transpositions, substitutions, insertions, or deletions. So, for instance, *eybo, *yrbo, *yeebo, and *ybo, respectively, cf. the correctly spelled yebo ‘yes’. It doesn’t perform equally well on each type of typo yet, with the best results obtained for transpositions. As with the error detector, it relies on a data-driven approach, with, for error correction, a lot more statistics-based algorithms cf. the error detection-only algorithms. They are described in detail in Frida Mjaria’s 2017 CS honours project. Suggestion accuracy (i.e., that it at least can suggest something) is 95% and suggestion relevance (that it contains the intended word) made it to 61%, mainly due to weak results of corrections for insertion errors (they mess too much with the trigrams).

The error detection accuracy has been improved mainly through better handling of punctuation, thanks to Norman Pilusa’s programming efforts. This was done through a series of rules on top of the data-driven approach, for it is too hard to learn those from a large corpus. For instance, semi-colons, end-of-sentence periods, and numbers (written in isiZulu like, e.g., ngu-42 rather than just 42) are now mostly ignored rather than the words adjacent to it being detected as probably misspelt. It works better than spellchecker.net’s version, which is the only other available isiZulu spellchecker: on a random selection of actual pieces of text, our tool obtained 91.71% lexical recall for error detection, whereas the spellchecker.net’s version got to 82.66% on the same text. Put differently: spellchecker.net flagged about twice as many words as incorrect as ours did (so there wasn’t much point in comparing error corrections).

Finally, because all the algorithms are essentially language-independent (ok, there’s an underlying assumption of using them for highly agglutinative languages), we fed the algorithms a large isiXhosa corpus that is being developed as part of another project, and incorporated that into the spellchecker. There’s room for some fine-tuning especially for the corrector, but at least now there is one, thanks to Norman Pilusa’s software development contributions. That we thought we could get away with this approach is thanks to Nthabiseng Mashiane’s 2017 CS honours project, which showed that the results would be fairly good (>80% error detection) with more data. We also tried a rules-based approach for isiXhosa. It obtained better accuracies than the statistical language model of Nthabiseng, but only for those parts of speech covered by the rules, which is a subset of all types of words. If you’re interested in those rules, please check out Siseko Neti’s 2017 CS Honours project. To the best of my knowledge, it’s the first time those rules have been formally represented in a computer-usable format and they may be useful for other endeavours, such as morphological analysers.

A section of the isiXhosa Wikipedia entry about the UN (*ukuez should be ukuze, which is among the proposed words).

Further improvements are possible, which are being scoped for a v3 some time later. For instance, for the linguists and language scholars: what are the most common typos? What are the most commonly used words? If we had known that, it would have been an easy way to boost the performance. Can we find optimisations to substitutions, insertions, and deletions similar to the one for transpositions? Should some syntax rules be added for further optimisation? These are some of the outstanding questions. If you’re interested in that or related questions, or you would like to use the algorithms in your tool, please contact me.

Our ESWC17 demos: TDDonto2 and an OWL verbaliser for isiZulu

Besides the full paper on heterogeneous alignments for 14th Extended Semantic Web Conference (ESWC’17) that will take place next week in Portoroz, Slovenia, we also managed to squeeze out two demo papers. You may already know of TDDonto2 with Kieren Davies and Agnieszka Lawrynowicz, which was discussed in an earlier post that has been updated with a tutorial video. It now has a demo paper as well [1], which describes the rationale and a few scenarios. The other demo, with Musa Xakaza and Langa Khumalo, is new-new, but the regular reader might have seen it coming: we finally managed to link the verbalisation patterns for certain Description Logic axiom types [2,3] to those in OWL ontologies. The tool takes as input an ontology in isiZulu and the verbalisation algorithms, and out come the isiZulu sentences, be this in plain text for further processing or in a GUI for inspection by a domain expert [4]. There is a basic demo-screencast to show it’s all working.

The overall architecture may be of interest, for it deviates from most OWL verbalisers. It is shown in the following figure:

For instance, we use the Python-based OWL API Owlready, rather than a Java-based app, for Python is rather popular in NLP and the verbalisation algorithms may be used elsewhere as well. We made more such decisions with the aim to make whatever we did as multi-purpose usable as possible, like the list of nouns with noun classes (surprisingly, and annoyingly, there is no such readily available list yet, though isizulu.net probably will have it somewhere but inaccessible), verb roots, and exceptions in pluralisation. (Problems for integrating the verbaliser with, say, Protégé will be interesting to discuss during the demo session!)

The text-based output doesn’t look as nice as the GUI interface, so I will show here only the GUI interface, which is adorned with some annotations to illustrate that those verbalisation algorithms in the background are far from trivial templates:

For instance, while in English the universal quantification is always ‘Each’ or ‘All’ regardless the named class quantified over, in isiZulu it depends on the noun class of the noun that is the name of the OWL class. For instance, in the figure above, izingwe ‘leopards’ is in noun class 10, so the ‘Each/All’ is Zonke, amavazi ‘vases’ is in noun class 6, so ‘Each/All’ then becomes Onke, and abantu ‘people’/’humans’ is in noun class 2, making Bonke. There are 17 noun classes. They also determine the subject concords (SC, alike conjugation) for the verbs, with zi- for noun class 10, ­a- for noun class 6, and ba- for noun class 2, to name a few. How this all works is described in [2,3]. We’ve implemented all those algorithms and integrated the pluraliser [5] in it to make it work. The source files are available to check and play with already, you can do so and ask us during the ESWC17 demo session, and/or also have a look at the related outputs of the NRF-funded project Grammar Engine for Nguni natural language interfaces (GeNi).

 

References

[1] Davies, K. Keet, C.M., Lawrynowicz, A. TDDonto2: A Test-Driven Development Plugin for arbitrary TBox and ABox axioms. Extended Semantic Web Conference (ESWC’17), Springer LNCS. Portoroz, Slovenia, May 28 – June 2, 2017. (demo paper)

[2] Keet, C.M., Khumalo, L. Toward a knowledge-to-text controlled natural language of isiZulu. Language Resources and Evaluation, 2017, 51:131-157.

[3] Keet, C.M., Khumalo, L. On the verbalization patterns of part-whole relations in isiZulu. 9th International Natural Language Generation conference (INLG’16), 5-8 September, 2016, Edinburgh, UK. Association for Computational Linguistics, 174-183.

[4] Keet, C.M. Xakaza, M., Khumalo, L. Verbalising OWL ontologies in isiZulu with Python. 14th Extended Semantic Web Conference (ESWC’17). Springer LNCS. Portoroz, Slovenia, May 28 – June 2, 2017. (demo paper)

[5] Byamugisha, J., Keet, C.M., Khumalo, L. Pluralising Nouns in isiZulu and Related Languages. 17th International Conference on Intelligent Text Processing and Computational Linguistics (CICLing’16), Springer LNCS. April 3-9, 2016, Konya, Turkey.

Improved! TDDonto v2—more types of axioms supported and better feedback

Yes, the title almost sounds like a silly washing powder ad, but version 2 of TDDonto really does more than the TDDonto tool for Test-Driven Development of ontologies [1,2] that was introduced earlier this year. There are two principal novelties, largely thanks to Kieren Davies (also at UCT): more types of axioms are supported—arbitrary class expressions on both sides of the inclusion and ABox assertions—and differentiated test feedback beyond just pass/fail/unknown. TDDonto2 obviously still uses a test-first approach rather than test-last for ontology authoring, i.e., checking whether the axiom is already entailed in the ontology or would cause problems before actually adding it, saving yourself a lot of classification time overhead in the ontology authoring process. 

On the first item, TDDonto (or TawnyOwl or Scone) could not handle, e.g., Carnivore \sqcup Herbivore \sqsubseteq Animal or some domain restriction \forall eats.Animal \sqsubseteq Carnivore , or whether some individual is different/same from another. TDDonto2 can. This required a new set of algorithms, some nifty orchestration of several functions offered by an automated reasoned (of the DL/OWL variety), and extending the Protégé 5 functionality with parsing Manchester syntax keyword constructs for individuals as well (another 3600 lines of code). The Protégé 5 plugin works. Correctness of those algorithms has been proven, so you can rely on it just like you can with the test-last approach of add-axiom-and-then-run-the-reasoner (I’ll save you from those details).

On the second item (and also beyond the current TDD tools): now it can tell you not only just ‘pass’ (i.e., the axiom is entailed), but the ‘failed’ has been refined into the various possible cases: that adding the axiom to the ontology would cause the ontology to become inconsistent, or that it would cause a class to become unsatisfiable (incoherent), or it may be neither of the three (absent) so it would be ‘safe’ to add the axiom under test to the ontology (that is: at least not cause inconsistency, incoherence, or redundancy). Further, we’ve added ‘pre-real TDD unit test’ checks: if the ontology is already inconsistent, there’s no point in testing the axiom; if the ontology already has unsatisfiable classes, then one should fix that first; and if there is an entity in the test axiom that is not in the ontology, then it should be added first.

The remainder of the post mainly just shows off some of the functionality in text and with screenshots (UPDATE 13-3-2017: we now also have a screencast tutorial [177MB mov file]). Put the JAR file in the plugins directory, and then put it somewhere via Window – Views – Ontology views – TDDonto2. As toy ontology, I tend to end up with examples of the African Wildlife Ontology, which I use for exercises in my Ontology Engineering course, but as it is almost summer holiday here, I’ve conjured up a different example. That test ontology contains the following knowledge at the start:

ServiceObject \equiv Facility \sqcup Attraction

Pool \sqsubseteq Facility

Braai \sqsubseteq Facility

Pool \sqcap Braai \sqsubseteq \bot

Hotel \sqsubseteq Accommodation

BedAndBreakfast \sqsubseteq Accommodation

BedAndBreakfast \sqcap Hotel \sqsubseteq \bot

Facility \sqsubseteq \exists offeredBy.Accommodation

Hotel \sqsubseteq =1 offers.Pool

Hotel(LagoonBeach)

The first test is to see whether \exists offeredBy.Accommodation \sqsubseteq Facility , to show that TDDonto2 can handle class expressions on the left-hand side of the inclusion axiom. It can, and it is clearly absent from the toy ontology; see screenshot below, first line in the middle section. Likewise for the second and third test, where a typical novice ontology authoring mixup is made between ‘and’ and ‘or’, which different test results: one is absent, the other entailed.

poolbraaimissingThen some more fun: the pool braai. First of all, PoolBraai is not in our ontology, so TDDonto2 returns an error: it can be seen from Protégé’s handling (red dotted line below PoolBraai and red-lined text box in the screenshot above), and TDDonto2 will not let you add it to the set of tests (pop-up box, not shown). After adding it and testing “PoolBraai SubClassOf: Pool and Braai”, then if we were to add that axiom to the ontology, it will be incoherent (because Pool and Braai are disjoint):

poolbraaiwillbeincoherentDoing this nonetheless by selecting the axiom and adding it to our ontology by pressing the “Add selected to ontology”:

poolbraaiaddedand running all tests again by pressing the “Evaluate all” button (or select it and click “Evaluate selected”), the results look like this:

poolbraaifailedpreconditionThat is, we failed a precondition, because PoolBraai is unsatisfiable, so no tests are being executed until this is fixed. Did I make this up just to have a silly toy ontology example? No, the pool braai does exist, in South Africa at least: it is a stainless steel barbecue table-set that one can place in a small backyard pool. So, we remove PoolBraai \sqsubseteq Pool \sqcap Braai from the ontology and add PoolBraai \sqsubseteq Braai , so that we can do a few more tests.

Let’s assume we want to explore more about accommodations and their facilities, and add some knowledge about that (tests 5-7):

accosomefacilitiesFinally, let’s check something about any instances in the ontology. First, whether LagoonBeach is a hotel “LagoonBeach Type: Hotel”, which it is (with a view on Table Mountain), and whether it also could be a B&B, which it cannot be, because hotel and B&B are disjoint. Adding another individual to the ontology for the sake of example, SinCity (an owl:Thing), we want to know whether SinCity can be the same as LagoonBeach, or asserted as different (the last two test in the list): the tests return absent, i.e., they can be either, for nothing is known about SinCity.

instances

Now let’s remove a selection of the tests because they would cause problems in the ontology, and add the remaining five in one go:

5addedThis change requires one to classify the ontology, and subsequently you’re expected to run all the tests again to check that they are all entailed and do not cause some new problem, which they don’t:

testagain

And, finally, a few arbitrary ones that are ontologically a bit off, but they show that yes, something arbitrary both on the left-hand side and right-hand side of the inclusion (or equivalence) works (first test, below), disjointness still works (test 2) and now also with arbitrary class expressions (test 5), and the same/different individuals can take more than two arguments (tests 3 and 4).

extras

The source code and JAR file are freely available (GPL licence) to use, examine, or extend. A paper with the details has been submitted, so you’ll have to make do with just the tool for the moment. If you have any feedback on the tool, please let us know.

 

References

[1] Keet, C.M., Lawrynowicz, A. Test-Driven Development of Ontologies. 13th Extended Semantic Web Conference (ESWC’16). H. Sack et al. (Eds.). Springer LNCS vol. 9678, pp642-657. 29 May – 2 June, 2016, Crete, Greece.

[2] Lawrynowicz, A., Keet, C.M. The TDDonto Tool for Test-Driven Development of DL Knowledge bases. 29th International Workshop on Description Logics (DL’16). April 22-25, Cape Town, South Africa. CEUR WS vol. 1577.

Launch of the isiZulu spellchecker

launchspellchecker

Langa Khumalo, ULPDO director, giving the spellchecker demo, pointing out a detected spelling error in the text. On his left, Mpho Monareng, CEO of PanSALB.

Yesterday, the isiZulu spellchecker was launched at UKZN’s “Launch of the UKZN isiZulu Books and Human Language Technologies” event, which was also featured on 702 live radio, SABC 2 Morning Live, and e-news during the day. What we at UCT have to do with it is that both the theory and the spellchecker tool were developed in-house by members of the Department of Computer Science at UCT. The connection with UKZN’s University Language Planning & Development Office is that we used a section of their isiZulu National Corpus (INC) [1] to train the spellchecker with, and that they wanted a spellchecker (the latter came first).

The theory behind the spellchecker was described briefly in an earlier post and it has been presented at IST-Africa 2016 [2]. Basically, we don’t use a wordlist + rules-based approach as some experiments of 20 years ago did, nor a wordlist + a few rules of the now-defunct translate.org.za OpenOffice v3 plugin seven years ago, but a data-driven approach with a statistical language model that uses tri-grams. The section of the INC we used were novels and news items, so, including present-day isiZulu texts. At the time of the IST-Africa’16 paper, based on Balone Ndaba’s BSc CS honours project, the spell checking was very proof-of-concept, but it showed that it could be done and still achieve a good enough accuracy. We used that approach to create an enduser-usable isiZulu spellchecker, which saw the light of day thanks to our 3rd-year CS@UCT student Norman Pilusa, who both developed the front-end and optimised the backend so that it has an excellent performance.

Upon starting the platform-independent isiZulu_spellchecker.jar file, the English interface version looks like this:

zuspellopen

You can write text in the text box, or open a txt or docx file, which then is displayed in the textbox. Click “Run”. Now there are two options: you can choose to step-through the words that are detected as misspelled one at a time or “Show All” words that are detected as misspelled. Both are shown for some sample text in the screenshot below.

zuspellonessection

processing one error at a time

zuspellallsection

highlighting all words detected as very probably misspelled

Then it is up to you to choose what to do with it: correct it in the textbox, “Ignore once”, “Ignore all”, or “Add” the word to your (local) dictionary. If you have modified the text, you can save it with the changes made by clicking “Save correction”. You also can switch the interface from the default English to isiZulu by clicking “File – Use English”, and back to English via “iFayela – ulimi lesingisi”. You can download the isiZulu spellchecker from the ULPDO website and from the GitHub repository for those who want to get their hands on the source code.

To anticipate some possible questions you may have: incorporating it as a plugin to Microsoft word, OpenOffice/LibreOffice, and Mozilla Firefox was in the planning. The former is technologically ‘closed source’, however, and the latter two have a certain way of doing spellchecking that is not amenable to the data-driven approach with the trigrams. So, for now, it is a standalone tool. By design, it is desktop-based rather than for mobile phones, because according to the client (ULPDO@UKZN), they expect the first users to be professionals with admin documents and emails, journalists writing articles, and such, writing on PCs and laptops.

There was also a trade-off between a particular sort of error: the tool now flags more words as probably incorrect than it could have, yet it will detect (a subset of) capitalization, correctly, such as KwaZulu-Natal whilst flagging some of the deviant spellings that go around, as shown in the screenshot below.

zuspellkznThe customer preferred recognising such capitalisation.

Error correction sounds like an obvious feature as well, but that will require a bit more work, not just technologically, but also the underlying theory. It will probably be an honours project topic for next year.

In the grand scheme of things, the current v1 of the spellchecker is only a small step—yet, many such small steps in succession will get one far eventually.

The launch itself saw an impressive line-up of speeches and introductions: the keynote address was given by Dr Zweli Mkhize, UKZN Chancellor and member of the ANC NEC; Prof Ramesh Krishnamurthy, from Aston University UK, gave the opening address; Mpho Monareng, CEO of PanSALB gave an address and co-launched the human language technologies; UKZN’s VC Andre van Jaarsveld provided the official welcome; and two of UKZN’s DVCs, Prof Renuka Vithal and Prof Cheryl Potgieter, gave presentations. Besides our ‘5-minutes of fame’ with the isiZulu spellchecker, the event also launched the isiZulu National Corpus, the isiZulu Term Bank, the ZuluLex mobile-compatible application (Android and iPhone), and two isiZulu books on collected short stories and an English-isiZulu architecture glossary.

 

References

[1] Khumalo, L. Advances in developing corpora in African languages. Kuwala, 2015, 1(2): 21-30.

[2] Ndaba, B., Suleman, H., Keet, C.M., Khumalo, L. The Effects of a Corpus on isiZulu Spellcheckers based on N-grams. IST-Africa 2016. May 11-13, 2016, Durban, South Africa.

An exhaustive OWL species classifier

Students enrolled in my ontology engineering course have to do a “mini-project” on a particular topic, chosen from a list of topics, such as on ontology quality, verbalisations, or language features, and may be theoretical or software development-oriented. In terms of papers, the most impressive result was OntoPartS that resulted in an ESWC2012 paper with the two postgraduate students [1], but also quite some other useful results have come out of it over the past 7 years that I’m teaching it in one form or another. This year’s top project in terms of understanding the theory, creativity to do something with it that hasn’t been done before, and working software using Semantic Web technologies was the “OWL Classifier” by Aashiq Parker, Brian Mc George, and Muhummad Patel.

The OWL classifier classifies an OWL ontology in any of its ‘species’, which can be any of the 8 specified in the standard, i.e., the 3 OWL 1 ones and the 5 OWL 2 ones. It also gives information on the DL ‘alphabet soup’—which axioms use which language feature with which letter, and an explanation of the letters—and reports on which axioms are the ones that violate a particular species. An example is shown in the following screenshot, with an exercise ontology on phone points:

phonePoints

The students’ motivation to develop it was because they had to learn about DLs and the OWL species, but Protégé 4.x and 5.x don’t tell you the species and the interfaces have only a basic, generic, explanation for the DL expressivity. I concur. And is has gotten worse with Protégé 5.0: if an ontology is outside OWL 2 DL, it still says the ‘old’ DL expressivity plus an easy-to-overlook tiny red triangle in the top-right corner once the reasoner was invoked (using Hermit 1.3.8) or a cryptic “internal reasoner error” message (Pellet), whereas with Protégé 4.x you at least got a pop-up box complaining about the ‘non-simple role…’ issues. Compare that with the neat feedback like this:

t15and16

It is also very ‘sensitive’—more so than one would be with Protégé alone. Any remote ontology imports have to be available at the location specified with the IRI. Violations due to wrong datatype usage is a known issue with the OWL Reasoner Evaluation set of ontologies, and which we’ve bumped into with the TDD testing as well. The tool doesn’t accept the invalid ones (wrong datatypes—one can select any XML data type in Protégé, but the OWL standard doesn’t support them all). In addition, a language such as OWL 2 QL has further restrictions on types of datatypes. (It is also not trivial to figure out manually whether some ontology is suitable for OBDA or not.) So I tried one from the Ontop website’s examples, presumably in OWL 2 QL:

fishdelish

Strictly speaking, it isn’t in OWL 2 QL! The OWL 2 QL profile does have xsd:integer as datatype [2], not xsd:int, as, and I quote the standard, “the intersection of the value spaces of any set of these datatypes [including xsd:integer but not xsd:int, mk] is either empty or infinite, which is necessary to obtain the desired computational properties”. [UPDATE 24-6, thanks to Martin Rezk:] The main toolset for OWL 2 QL, Ontop, actually does support xsd:int and a few other datatypes beyond the standard (e.g.: also float and boolean). There is similar syntax fun to be had with the pizza ontology: the original one is indeed in OWL DL, but if you open the file in Protégé 5 and save it, it is not in OWL DL anymore but in OWL 2 DL, for the save operation snuck in an owl#NamedIndividual. Click on the thumbnails below to see the before-and-after in the OWL classifier. This is not an increase in expressiveness—both are in SHOIN—just syntax and tooling.

pizzaOldpizzaP5

 

 

 

 

 

The OWL Classifier can thus classify both OWL 1 and OWL 2 ontologies, which it does through a careful orchestration of two OWL APIs: v1.4.3 was the last one to support OWL 1 species checking, whereas for the OWL 2 ontologies, the latest version is used (v4.2.3). The jar file and the source code are freely available on github for anyone to use and to take further. Turning it into a Protégé plugin very likely will make at least next year’s ontology engineering students happy. Comments, questions, and suggestion are welcome!

 

References

[1] Keet, C.M., Fernandez-Reyes, F.C., Morales-Gonzalez, A. Representing mereotopological relations in OWL ontologies with OntoPartS. 9th Extended Semantic Web Conference (ESWC’12), Simperl et al. (eds.), 27-31 May 2012, Heraklion, Crete, Greece. Springer, LNCS 7295, 240-254.

[2] Boris Motik, Bernardo Cuenca Grau, Ian Horrocks, Zhe Wu, Achille Fokoue, Carsten Lutz, eds. OWL 2 Web Ontology Language: Profiles. W3C Recommendation, 11 December 2012 (2nd ed.).