CLaRO v2.0: A larger CNL for competency questions for ontologies

The avid blog reader with a good memory might remember we had developed a controlled natural language (CNL) in 2019 that we called CLaRO, a Competency question Language for specifying Requirements for an Ontology, model, or specification [1], for specifying requirements on the contents of the TBox (type-level) knowledge specifically. The paper won the best student paper award at the MTSR’19 conference.  Then COVID-19 came along.

Notwithstanding, we did take next steps and obtained some advances in the meantime, which resulted in a substantially extended CNL, called CLaRO v2 [2]. The paper describing how it came about has been accepted recently at the 7th Controlled Natural Language Workshop (CNL2020/21), which will be held on 8-9 September in Amsterdam, The Netherlands, in hybrid mode.

So, what is it about, being “new and improved!” compared to the first version? The first version was created in a bottom-up fashion based on a dataset of 234 competency questions [3] in a few domains only. It turned out alright with decent performance on coverage for unseen questions (88% overall) and very significantly outperforming the others, but there were some nagging doubts about the feasibility of bottom-up approaches to template development, which are essentially at the heart of every bottom-up approach: questions about representativeness and quality of the source data. We used more questions as basis to work from than others and had better coverage, but would coverage improve further then still with even more questions? Would it matter for coverage if the CQs were to come from more diverse subject domains? Also, upon manual inspection of the original CQs, it could be seen that some CQs from the dataset were ill-formed, which propagated through to the final set of templates of CLaRO. Would ‘cleaning’ the source data to presumably better quality templates improve coverage?

One of the PhD students I supervise, Mary-Jane Antia, set out to find answer to these questions. CQs were cleaned and vetted by a linguist, the templates recreated and compared and evaluated—this time automatically in a new testing pipeline. New CQs for ontologies were sourced by searching all over the place and finding some 70, to which we added 22 more variants by tweaking wording of existing CQs such that they still would be potentially answerable by an ontology. They were tested on the templates, which resulted in a lower than ideal percentage of coverage and so new templates were created from them, and yet again evaluated. The key results:

  • An increase from 88% for CLaRO v1 to 94.1% for CLaRO v2 coverage.
  • The new CLaRO v2 has 147 main templates and another 59 variants to cater for minor differences (e.g., singular/plural, redundant words), up from 93 and 41 in CLaRO.
  • Increasing the number of domains that the CQs were drawn from had a larger effect on the CQ coverage than cleaning the source data.
Screenshot of the CLaRO CQ editor tool.

All the data, including the new templates, are available on Github and the details are described in the paper [2]. The CLaRO tool that supports the authoring is in the process of being updated so as to incorporate the v2 templates (currently it is working with the v1 templates).

I will try to make it to Amsterdam where CNL’21 will take place, but travel restrictions aren’t cooperating with that plan just yet; else I’ll participate virtually. Mary-Jane will present the paper, and also for her, despite also having funding for the trip, it increasingly looks like a virtual presentation. On the bright side: at least there is a way to participate virtually.

References

[1] Keet, C.M., Mahlaza, Z., Antia, M.-J. CLaRO: a Controlled Language for Authoring Competency Questions. 13th Metadata and Semantics Research Conference (MTSR’19). 28-31 Oct 2019, Rome, Italy. Springer CCIS vol. 1075, 3-15.

[2]  Antia, M.-J., Keet, C.M. Assessing and Enhancing Bottom-up CNL Design for Competency Questions for Ontologies. 7th International Workshop on Controlled Natural language (CNL’21), 8-9 Sept. 2021, Amsterdam, the Netherlands. (in print)

[3] Potoniec, J., Wisniewski, D., Lawrynowicz, A., Keet, C.M. Dataset of Ontology Competency Questions to SPARQL-OWL Queries Translations. Data in Brief, 2020, 29: 105098.

Advertisement

More and better TDD for ontology authoring

Test-driven development (TDD) for ontology authoring [1] has received attention previously, including its accompanying tool TDDOnto [2] that was subsequently improved upon into the (also open source) TDDonto2 tool [3]. The TDDonto2 demo paper [3] did not contain the technical details about the new-and-improved algorithms and specification for TDD testing that we claimed it had. They are published just now in the International Journal on Artificial Intelligence Tools, as the article entitled More Effective Ontology Authoring with Test-Driven Development and the TDDonto2 tool [4]. The better algorithms cover more OWL language features than the original v1 of the theory and tool and it includes a specification for TDD testing such that there is not just pass/fail/absent as test result, but specific outcomes of the TDD test that are more informative, like that the ontology will become incoherent if that axiom were to be added. Given that model, the general flow for a simple standard case of a single TDD test (though more axioms can be tested at once) is as follows:

simplified view of the extended TDD process (source: adapted from [4])

The elements in the figure that are coloured light grey are the steps covered by the specification for TDD testing, algorithms, and TDDonto2 tool that is introduced in the paper.

The paper’s title clearly also hints to another contribution: using TDDonto2 for ontology authoring is significantly more effective. It was compared against the commonly used (and test-last) Protégé interface, which showed that the participants completed a larger part of the task in less time and with fewer mistakes. It also requires fewer interactions (clicking and typing) in the interface, which we reported on in an earlier (longer) tech report [5].

screenshot of the outcome of running the four tests on the sample ontology, in TDDonto2

As usual with research, more can be done. This is especially with respect to the white boxes in the figure above, i.e., the other aspects that would contribute toward a complete TDD methodology for ontology development. One step that we have been working on, is the idea of turning competency questions into axioms for TDD, which now is doable from CQ to SPARQL-OWL query [6] (more about that later), a CNL that may contribute to the authoring [7], and trying to figure out the modelling styles more precisely [8], since they hamper automation of these first steps in the process to get those axioms into the TDD plugin in a user-friendly way.

 

References

[1] Keet, C.M., Lawrynowicz, A. Test-Driven Development of Ontologies. 13th Extended Semantic Web Conference (ESWC’16). Springer LNCS vol. 9678, 642-657. 29 May – 2 June, 2016, Crete, Greece.

[2] Lawrynowicz, A., Keet, C.M. The TDDonto Tool for Test-Driven Development of DL Knowledge bases. 29th International Workshop on Description Logics (DL’16). April 22-25, Cape Town, South Africa. CEUR WS vol. 1577.

[3] Davies, K. Keet, C.M., Lawrynowicz, A. TDDonto2: A Test-Driven Development Plugin for arbitrary TBox and ABox axioms. The Semantic Web: ESWC 2017 Satellite Events, Blomqvist, E et al. (eds.). Springer LNCS vol 10577, 120-125. Portoroz, Slovenia, May 28 – June 2, 2017.

[4] Davies, K., Keet, C.M., Lawrynowicz, A. More Effective Ontology Authoring with Test-Driven Development and the TDDonto2 tool. International Journal on Artificial Intelligence Tools, 2019, 28(7): 1950023.

[5] Keet, C.M., Davies, K., Lawrynowicz, A. More Effective Ontology Authoring with Test-Driven Development. Technical Report 1812.06015. December 2018

[6] Wisniewski, D., Potoniec, J., Lawrynowicz, A., Keet, C.M. Analysis of Ontology Competency Questions and their Formalisations in SPARQL-OWL. Journal of Web Semantics. (in print)

[7] Keet, C.M., Mahlaza, Z., Antia, M.-J. CLaRO: a Controlled Language for Authoring Competency Questions. 13th Metadata and Semantics Research Conference (MTSR’19). 28-31 Oct 2019, Rome, Italy. Springer CCIS. (in print)

[8] Fillottrani, P.R., Keet, C.M.. Dimensions Affecting Representation Styles in Ontologies. 1st Iberoamerican conference on Knowledge Graphs and Semantic Web (KGSWC’19). Springer CCIS vol. 1029, 186-200. 23-30 June 2019, Villa Clara, Cuba.

Localising Protégé with Manchester syntax into your language of choice

Some people like a quasi natural language interface in ontology development tools, which is why Manchester Syntax was proposed [1]. A downside is that it locks the ontology developer into English, so that weird chimaeras are generated in the interface if the author prefers another language for the ontology, such as, e.g., the “jirafa come only (oja or ramita)” mentioned in an earlier post and that was deemed unpleasant in an experiment a while ago [2]. Those who prefer the quasi natural language components will have to resort to localising Manchester syntax and the tool’s interface.

This is precisely what two of my former students—Adam Kaliski and Casey O’Donnell—did during their mini-project in the ontology engineering course of 2017. A localisation in Afrikaans, as the case turned out to be. To make this publicly available, Michael Harrison brushed up the code a bit and tested it worked also in the new version of Protégé. It turned out it wasn’t that easy to localise it to another language the way it was done, so one of my PhD students, Toky Raboanary, redesigned the whole thing. This was then tested with Spanish, and found to be working. The remainder of the post describes informally some main aspects of it. If you don’t want to read all that but want to play with it right away: here are the jar files, open source code, and localisation instructions for if you want to create, say, a French or Dutch variant.

Some sensible constraints, some slightly contrived ones (and some bad ones), for the purpose of showing the localisation of the interface for the various keywords. The view in English is included in the screenshot to facilitate comparison.

Some sensible constraints, some slightly contrived ones (and some bad ones), for the purpose of showing the localisation of the interface for the various keywords. The view in English is included in the screenshot to facilitate comparison.

The localisation functions as a plugin for Protégé as a ‘view’ component. It can be selected under “Windows – Views – Class views” and then Beskrywing for the Afrikaans and Descripción for Spanish, and dragged into the desired position; this is likewise for object properties.

Instead of burying the translations in the code, they are specified in a separate XML file, whose content is fetched during the rendering. Adding a new ‘simple’ (more about that later) language merely amounts to adding a new XML file with the translations of the Protégé labels and of the relevant Manchester syntax. Here are the ‘simple’ translations—i.e., where both are fixed strings—for Afrikaans for the relevant tool interface components:

 

Class Description

(Label)

Klasbeskrywing

(Label in Afrikaans)

Equivalent To Dieselfde as
SubClass Of Subklas van
General Class axioms Algemene Klasaksiomas
SubClass Of (Anonymous Ancestor) Subklas van (Naamlose Voorvader)
Disjoint With Disjunkte van
Disjoint Union Of Disjunkte Unie van

 

The second set of translations is for the Manchester syntax, so as to render that also in the target language. The relevant mappings for Afrikaans class description keywords are listed in the table below, which contain the final choices made by the students who developed the original plugin. For instance, min and max could have been rendered as minimum and maksimum, but the ten minste and by die meeste were deemed more readable despite being multi-word strings. Another interesting bit in the translation is negation, where there has to be a second ‘no’ since Afrikaans has double negation in this construction, so that it renders it as nie <expression> nie. That final rendering is not grammatically perfect, but (hopefully) sufficiently clear:

An attempt at double negation with a fixed string

An attempt at double negation with a fixed string

Manchester OWL Keyword Afrikaans Manchester OWL

Keyword or phrase

some sommige
only slegs
min ten minste
max by die meeste
exactly precies
and en
or of
not nie <expression> nie
SubClassOf SubklasVan
EquivalentTo DieselfdeAs
DisjointWith DisjunkteVan

 

The people involved in the translations for the object properties view for Afrikaans are Toky, my colleague Tommie Meyer (also at UCT), and myself; snyding for ‘intersection’ sounds somewhat odd to me, but the real tough one to translate was ‘SuperProperty’. Of the four options that were considered—SuperEienskap, SuperVerwantskap, SuperRelasie, and SuperVerband SuperVerwantskap was chosen with Tommie having had the final vote, which is also a semantic translation, not a literal translation.

Screenshot of the object properties description, with comparison to the English

The Spanish version also has multi-word strings, but at least does not do double negation. On the other hand, it has accents. To generate the Spanish version, myself, my collaborator Pablo Fillottrani from the Universidad Nacional del Sur, Argentina, and Toky had a go at it in translating the terms. This was then implemented with the XML file. In case you do not want to dig into the XML file and not install the plugin either, but have a quick look at the translations, they are as follows for the class description view:

 

Class Description

Label

Descripción

(in Spanish)

Equivalent To Equivalente a
SubClass Of Subclase de
General Class axioms Axiomas generales de clase
SubClass Of (Anonymous Ancestor) Subclase de (Ancestro Anónimo)
Disjoint With Disjunto con
Disjoint Union Of Unión Disjunta de
Instances Instancias

 

Manchester OWL Keyword Spanish Manchester OWL Keyword
some al menos uno
only sólo
min al mínimo
max al máximo
and y
or o
not no
exactly exactamente
SubClassOf SubclaseDe
EquivalentTo EquivalenteA
DisjointWith DisjuntoCon

And here’s a rendering of a real ontology, for geo linked data in Spanish, rather than African wildlife yet again:

screenshot of the plugin behaviour with someone else’s ontology in Spanish

One final comment remains, which has to do with the ‘simple’ mentioned above. The approach of localisation presented here works only with fixed strings, i.e., the strings do not have to change depending on the context where it is uses. It won’t work with, say, isiZulu—a highly agglutinating and inflectional language—because isiZulu doesn’t have fixed strings for the Manchester syntax keywords nor for some other labels. For instance, ‘at least one’ has seven variants for nouns in the singular, depending on the noun class of the noun of the OWL class it quantifies over; e.g., elilodwa for ‘at least one’ apple, and esisodwa for ‘at least one’ twig. Also, the conjugation of the verb for the object property depends on the noun class of the noun of the OWL class, but in this case for the one that plays the subject; e.g., it’s “eats” in English for both humans and elephants eating, say, fruit, so one string for the name of the object property, but that’s udla and idla, respectively, in isiZulu. This requires annotations of the classes with ontolex-lemon or a similar approach and a set of rules (which we have, btw) to determine what to do in which case, which requires on-the-fly modifications to Manchester syntax keywords and elements’ names or labels. And then there’s still phonological conditioning to account for. It surely can be done, but it is not as doable as with the ‘simple’ languages that have at least a disjunctive orthography and much less genders or noun classes for the nouns.

In closing, while there’s indeed more to translate in the Protégé interface in order to fully localise it, hopefully this already helps as-is either for reading at least a whole axiom in one’s language or as stepping stone to extend it further for the other terms in the Manchester syntax and the interface. Feel free to extend our open source code.

 

References

[1] Matthew Horridge, Nicholas Drummond, John Goodwin, Alan Rector, Robert Stevens and Hai Wang (2006). The Manchester OWL syntax. OWL: Experiences and Directions (OWLED’06), Athens, Georgia, USA, 10-11 Nov 2016, CEUR-WS vol 216.

[2] Keet, C.M. The use of foundational ontologies in ontology development: an empirical assessment. 8th Extended Semantic Web Conference (ESWC’11), G. Antoniou et al (Eds.), Heraklion, Crete, Greece, 29 May-2 June, 2011. Springer LNCS 6643, 321-335.

[3] Keet, C.M., Khumalo, L. Toward a knowledge-to-text controlled natural language of isiZulu. Language Resources and Evaluation, 2017, 51:131-157. accepted version

DL notation plugin for Protégé 5.x

Once upon a time… the Protégé ontology development environment used Description Logic (DL) symbols and all was well—for some users at least. Then Manchester Syntax came along as the new kid on the block, using hearsay and opinion and some other authors’ preferences for an alternative rendering to the DL notation [1]. Subsequently, everyone who used Protégé was forced to deal with those new and untested keywords in the interface, like ‘some’ and ‘only’ and such, rather than the DL symbols. That had another unfortunate side-effect, being that it hampers internationalisation, for it jumbles up things rather awkwardly when your ontology vocabulary is not in English, like, say, “jirafa come only (oja or ramita)”. Even in the same English-as-first-language country, it turned out that under a controlled set-up, the DL axiom rendering in Protégé fared well in a fairly large sized experiment when compared to the Protégé interface with the sort of Manchester syntax with GUI [2], and also the OWL 2 RL rules rendering appear more positive in another (smaller) experiment [3]. Various HCI factors remain to be examined in more detail, though.

In the meantime, we didn’t fully reinstate the DL notation in Protégé in the way it was in Protégé v3.x from some 15 years ago, but with our new plugin, it will at least render the class expression in DL notation in the tool. This has the benefits that

  1. the modeller will receive immediate feedback during the authoring stage regarding a notation that may be more familiar to at least a knowledge engineer or expert modeller;
  2. it offers a natural language-independent rendering of the axioms with respect to the constructors, so that people may develop their ontology in their own language if they wish to do so, without being hampered by continuous code switching or the need for localisation; and
  3. it also may ease the transition from theory (logics) to implementation for ontology engineering novices.

Whether it needs to be integrated further among more components of the tabs and views in Protégé or other ODEs, is also a question for HCI experts to answer. The code for the DL plugin is open source, so you could extend it if you wish to do so.

The plugin itself is a jar file that can simply be dragged into the plugin folder of a Protégé installation (5.x); see the github repo for details. To illustrate it briefly, after dragging the jar file into the plugin folder, open Protégé, and add it as a view:

Then when you add some new axioms or load an ontology, select a class, and it will render all the axioms in DL notation, as shown in the following two screenshots form different ontologies:

For the sake of illustration, here’s the giraffe that eats only leaves or twigs, in the Spanish version of the African Wildlife Ontology:

The first version of the tool was developed by Michael Harrison and Larry Liu as part of their mini-project for the ontology engineering course in 2017, and it was brushed up for presentation beyond that just now by Michael Harrison (meanwhile an MSc student a CS@UCT), which was supported by a DOT4D grant to improve my textbook on ontology engineering and accompanying educational resources. We haven’t examined all possible ‘shapes’ that a class expression can take, but it definitely processes the commonly used features well. At the time of writing, we haven’t detected any errors.

p.s.: if you want your whole ontology exported at once in DL notation and to latex, for purposes of documentation generation, that is a different usage scenario and is already possible [4].

p.p.s.: if you want more DL notation, please let me know, and I’ll try to find more resources to make a v2 with more features.

References

[1] Matthew Horridge, Nicholas Drummond, John Goodwin, Alan Rector, Robert Stevens and Hai Wang (2006). The Manchester OWL syntax. OWL: Experiences and Directions (OWLED’06), Athens, Georgia, USA, 10-11 Nov 2016, CEUR-WS vol 216.

[2] E. Alharbi, J. Howse, G. Stapleton, A. Hamie and A. Touloumis. The efficacy of OWL and DL on user understanding of axioms and their entailments. The Semantic Web – ISWC 2017, C. d’Amato, M. Fernandez, V. Tamma, F. Lecue, P. Cudre-Mauroux, J. Sequeda, C. Lange and J. He (eds.). Springer 2017, pp20-36.

[3] M. K. Sarker, A. Krisnadhi, D. Carral and P. Hitzler, Rule-based OWL modeling with ROWLtab Protégé plugin. Proceedings of ESWC’17, E. Blomqvist, D. Maynard, A. Gangemi, R. Hoekstra, P. Hitzler and O. Hartig (eds.). Springer. 2017, pp 419-433.

[4] Cogan Shimizu, Pascal Hitzler, Matthew Horridge: Rendering OWL in Description Logic Syntax. ESWC (Satellite Events) 2017. Springer LNCS. pp109-113

An Ontology Engineering textbook

My first textbook “An Introduction to Ontology Engineering” (pdf) is just released as an open textbook. I have revised, updated, and extended my earlier lecture notes on ontology engineering, amounting to about 1/3 more new content cf. its predecessor. Its main aim is to provide an introductory overview of ontology engineering and its secondary aim is to provide hands-on experience in ontology development that illustrate the theory.

The contents and narrative is aimed at advanced undergraduate and postgraduate level in computing (e.g., as a semester-long course), and the book is structured accordingly. After an introductory chapter, there are three blocks:

  • Logic foundations for ontologies: languages (FOL, DLs, OWL species) and automated reasoning (principles and the basics of tableau);
  • Developing good ontologies with methods and methodologies, the top-down approach with foundational ontologies, and the bottom-up approach to extract as much useful content as possible from legacy material;
  • Advanced topics that has a selection of sub-topics: Ontology-Based Data Access, interactions between ontologies and natural languages, and advanced modelling with additional language features (fuzzy and temporal).

Each chapter has several review questions and exercises to explore one or more aspects of the theory, as well as descriptions of two assignments that require using several sub-topics at once. More information is available on the textbook’s page [also here] (including the links to the ontologies used in the exercises), or you can click here for the pdf (7MB).

Feedback is welcome, of course. Also, if you happen to use it in whole or in part for your course, I’d be grateful if you would let me know. Finally, if this textbook will be used half (or even a quarter) as much as the 2009/2010 blogposts have been visited (around 10K unique visitors since posting them), that would mean there are a lot of people learning about ontology engineering and then I’ll have achieved more than I hoped for.

UPDATE: meanwhile, it has been added to several open (text)book repositories, such as OpenUCT and the Open Textbook Archive, and it has been featured on unglue.it in the week of 13-8 (out of its 14K free ebooks).

The TDDonto tool to try out TDD for ontology authoring

Last month I wrote about Test-Driven Development for ontologies, which is described in more detail in the ESWC’16 paper I co-authored with Agnieszka Lawrynowicz [1]. That paper does not describe much about the actual tool implementing the tests, TDDonto, although we have it and used it for the performance evaluation. Some more detail on its design and more experimental results are described in the paper “The TDDonto Tool for Test-Driven Development of DL Knowledge Bases” [2] that has just been published in the proceedings of the 29th International Workshop on Description Logics, which will take place next weekend in Cape Town (22-25 April 2016).

What we couldn’t include there in [2] is multiple screenshots to show how it works, but a blog is a fine medium for that, so I’ll illustrate the tool with some examples in the remainder of the post. It’s an alpha version that works. No usability and HCI evaluations have been done, but at least it’s a Protégé plugin rather than command line :).

First, you need to download the plugin from Agnieszka’s ARISTOTELES project page and place the jar file in the plugins folder of Protégé 5.0. You can then go to the Protégé menu bar, select Windows – Views – Evaluation views – TDDOnto, and place it somewhere on the screen and start using it. For the examples here, I used the African Wildlife Ontology tutorial ontology (AWO v1) from my ontology engineering course.

Make sure to have selected an automated reasoner, and classify your ontology. Now, type a new test in the “New test” field at the top, e.g. carnivore DisjointWith: herbivore, click “Add test”, select the checkbox of the test to execute, and click the “Execute test”: the status will be returned, as shown in the screenshot below. In this case, the “OK” says that the disjointness is already asserted or entailed in the ontology.

cdisjh

Now let’s do a TDD test that is going to fail (you won’t know upfront, of course); e.g., testing whether impalas are herbivores:

impalaFail

The TDD test failed because the subsumption is neither asserted nor entailed in the ontology. One can then click “add to ontology”, which updates the ontology:

impalaAdd

Note that the reasoner has to be run again after a change in the ontology.

Lets do two more: testing whether lion is a carnivore and that flower is a plan part. The output of the tests is as follows:

lionflower

It returns “OK” for the lion, because it is entailed in the ontology: a carnivore is an entity that eats only animals or parts thereof, and lions eat only herbivore and eats some impala (which are animals). The other one, Flower SubClassOf: PlantParts fails as “undefined”, because Flower is not in the ontology.

Ontologies do not have only subsumption and disjointness axioms, so let’s assume that impalas eat leaves and we want check whether that is in the ontology, as well as whether lions eat animals:

lionImpalaEats

The former failed because there are no properties for the impala in the AWO v1, the latter passed, because a lion eats impala, and impala is an animal. Or: the TDDOnto tool indeed behaves as expected.

Currently, only a subset of all the specified tests have been implemented, due to some limitations of existing tools, but we’re working on implementing those as well.

If you have any feedback on TDDOnto, please don’t hesitate to tell us. I hope to be seeing you later in the week at DL’16, where I’ll be presenting the paper on Sunday afternoon (24th) and I also can give a live demo any time during the workshop (or afterwards, if you stay for KR’16).

 

References

[1] Keet, C.M., Lawrynowicz, A. Test-Driven Development of Ontologies. 13th Extended Semantic Web Conference (ESWC’16). Springer LNCS. 29 May – 2 June, 2016, Crete, Greece. (in print)

[2] Lawrynowicz, A., Keet, C.M. The TDDonto Tool for Test-Driven Development of DL Knowledge bases. 29th International Workshop on Description Logics (DL’16). April 22-25, Cape Town, South Africa. CEUR WS vol. 1577.

Ontology authoring with a Test-Driven Development approach

Ontology development has its processes and procedures—conducting a domain analysis, the implementation, maintenance, and so on—which have been developed since the mid 1990s. These high-level information systems-like methodologies don’t tell you what and how to add the knowledge to the ontology, however, i.e., the ontology authoring stage is a ‘just do it’, but not how. There are, perhaps surprisingly, few methods for how to do that; notably, FORZA uses domain and range constraints and the reasoner to propose a suitable part-whole relation [1] and advocatus diaboli zooms in on disjointness constraints among classes [2]. In a way, they all use what can be considered as tests on the ontology before adding an axiom. This smells of notions that are well-known in software engineering: unit tests, test-driven development (TDD), and Agile, with the latter two relying on different methodologies cf. the earlier ones (waterfall, iterative, and similar).

Some of those software engineering approaches have been adjusted and adopted for ontology engineering; e.g., the Agile-inspired OntoMaven that uses the standard reasoning services as tests [3], eXtreme Design with ODPs [4] that have been prepared previously, and Vrandecic and Gangemi’s early exploration of possibilities for unit tests [5]. Except for Warrender & Lord’s TDD tests for subsumption [6], they are all test-last approaches (design, author, test), rather than a test-first approach (test, author, test). The test-first approach is called test-driven development in software engineering [7], which has been ported to conceptual modelling recently as well [8]. TDD is a step up from a “add something and lets see what the reasoner says” stance, because one has to think and check upfront first before doing. (Competency questions can help with that, but they don’t say how to add the knowledge.) The question that arises, then, is how such a TDD approach would look like for ontology development. Some of the more detailed questions to be answered are (from [9]):

  • Given the TDD procedure for programming in software engineering, then what does that mean for ontologies during ontology authoring?
  • TDD uses mock objects for incomplete parts of the code, and mainly for methods; is there a parallel to it in ontology development, or can that aspect of TDD be ignored?
  • In what way, and where, (if at all) can this be integrated as a methodological step in existing ontology engineering methodologies?

We—Agnieszka Lawrynowicz from Poznan University of Technology in Poland and I—answer these questions in our paper that was recently accepted at the 13th Extended Semantic Web Conference (ESWC’16), to be held in May 2016 in Crete, Greece: Test-Driven Development of Ontologies [9]. In short: we specified tests for the OWL 2 DL language features and basic types of axioms one can add, implemented it as a Protégé plug-in, and tested it on performance with 67 ontologies (result: great). The tool and test data can be downloaded from Agnieszka’s ARISTOTELES project page.

Now slightly less brief than that one-liner. The tests—like for class subsumption, domain and range, a property chain—can be specified in two principal ways, called TBox tests and ABox tests. The TBox tests rely solely on knowledge represented in the TBox, whereas for the ABox tests, mock individuals are explicitly added for a particular TBox axiom. For instance, the simple existential quantification, as shown below, where the TBox query is denoted in SPARQL-OWL notation.

Teq

Teqprime

 

 

 

 

 

For the implementation, there is (1) a ‘wrapping’ component that includes creating the mock entities, checking the condition (line 2 in the TBox test example in the figure above, and line 4 in ABox TDD test), returning the TDD test result, and cleaning up afterward; and (2) the interaction with the ontology doing the actual testing, such as querying the ontology and class and instance classification. There are several options to realise component (2) of the TBox TDD tests. The query can be sent to, e.g., OWL-BGP [10] that uses SPARQL-OWL and Hermit to return the answer (line 1), but one also could use just the OWL API and send it to the automated reasoner, among other options.

Because OWL-BGP and the other options didn’t cater for the tests with OWL’s object properties, such as a property chain, so a full implementation would require extending current tools, we decided to first examine performance of the different options for (2) for those tests that could be implemented with current tools so as to get an idea of which approach would be the best to extend, rather than gambling on one, implement all, and go on with user testing. This TDD tool got the unimaginative name TDDonto and can be installed as a Protégé plugin. We tested the performance with 67 TONES ontology repository ontologies. The outcome of that is that the TBox-based SPARQL-OWL approach is faster than the ABox TDD tests (except for class disjointness; see figure below), and the OWL API + reasoner for the TBox TDD tests is again faster in general. These differences are bigger with larger ontologies (see paper for details).

Test computation times per test type (ABox versus TBox-based SPARQL-OWL) and per the kind of the tested axiom (source: [9]).

Test computation times per test type (ABox versus TBox-based SPARQL-OWL) and per the kind of the tested axiom (source: [9]).

Finally, can this TDD be simply ‘plugged in’ into one of the existing methodologies? No. As with TDD for software engineering, it has its own sequence of steps. An initial sketch is shown in the figure below. It outlines only the default scenario, where the knowledge to be added wasn’t there already and adding it doesn’t result in conflicts.

Sketch of a possible ontology lifecycle that focuses on TDD, and the steps of the TDD procedure summarised in key terms (source: [9]).

Sketch of a possible ontology lifecycle that focuses on TDD, and the steps of the TDD procedure summarised in key terms (source: [9]).

The “select scenario” has to do with what gets fed into the TDD tests, and therewith also where and how TDD could be used. We specified three of them: either (a) the knowledge engineer provides an axiom, (b) a domain expert fills in some template (e.g., the ‘all-some’ one) and that software generates the axiom for the domain ontology (e.g., Professor \sqsubseteq \exists teaches.Course ), or (c) someone wrote a competency question that is either manually or automatically converted into an axiom. The “refactoring” could include a step for removing a property from a subclass when it is added to its superclass. The “regression testing” considers previous tests and what to do when any conflicts may have arisen (which may need an interaction with step 5).

There is quite a bit of work yet to be done on TDD for ontologies, but at least there is now a first comprehensive basis to work from. Both Agnieszka and I plan to go to ESWC’16, so I hope to see you there. If you want more details or read the tests with a nicer layout than how it is presented in the ESWC16 paper, then have a look at the extended version [11] or contact us, or leave a comment below.

 

References

[1] Keet, C.M., Khan, M.T., Ghidini, C. Ontology Authoring with FORZA. 22nd ACM International Conference on Information and Knowledge Management (CIKM’13). ACM proceedings. Oct. 27 – Nov. 1, 2013, San Francisco, USA. pp569-578.

[2] Ferre, S. and Rudolph, S. (2012). Advocatus diaboli exploratory enrichment of ontologies with negative constraints. In ten Teije, A. et al., editors, 18th International Conference on Knowledge Engineering and Knowledge Management (EKAW’12), volume 7603 of LNAI, pages 42-56. Springer. Oct 8-12, Galway, Ireland

[3] Paschke, A., Schaefermeier, R. Aspect OntoMaven – aspect-oriented ontology development and configuration with OntoMaven. Tech. Rep. 1507.00212v1, Free University of Berlin (July 2015)

[4] Blomqvist, E., Sepour, A.S., Presutti, V. Ontology testing — methodology and tool. In: Proc. of EKAW’12. LNAI, vol. 7603, pp. 216-226. Springer (2012)

[5] Vrandecic, D., Gangemi, A. Unit tests for ontologies. In: OTM workshops 2006. LNCS, vol. 4278, pp. 1012-1020. Springer (2006)

[6] Warrender, J.D., Lord, P. How, What and Why to test an ontology. Technical Report 1505.04112, Newcastle University (2015), http://arxiv.org/abs/1505.04112

[7] Beck, K.: Test-Driven Development: by example. Addison-Wesley, Boston, MA (2004)

[8] Tort, A., Olive, A., Sancho, M.R. An approach to test-driven development of conceptual schemas. Data & Knowledge Engineering 70, 1088-1111 (2011)

[9] Keet, C.M., Lawrynowicz, A. Test-Driven Development of Ontologies. 13th Extended Semantic Web Conference (ESWC’16). Springer LNCS. 29 May – 2 June, 2016, Crete, Greece. (in print)

[10] Kollia, I., Glimm, B., Horrocks, I. SPARQL Query Answering over OWL Ontologies. In: Proc, of ESWC’11. LNCS, vol. 6643, pp. 382-396. Springer (2011)

[11] Keet, C.M., Lawrynowicz, A. Test-Driven Development of Ontologies (extended version). Technical Report, Arxiv.org http://arxiv.org/abs/1512.06211. Dec 19, 2015.

 

 

 

 

Some ontology authoring guidelines to prevent pitfalls: TIPS

We showed pervasiveness of pitfalls in ontologies ealier [1], and it is overdue to look at how to prevent them in a structured manner. From an academic viewpoint, preventing them is better, because it means you have a better grasp of ontology development. Following our KEOD’13 paper [1], we received a book chapter invitation from its organisers, and the Typical pItfall Prevention Scheme (TIPS) is described there. Here I include a ‘sneak preview’ selection of the 10 sets of guidelines (i.e., it is somewhat reworded and shortened for this blog post).

The TIPS are relevant in general, including also to the latest OWL 2. They are structured in an order of importance in the sense of how one typically goes about developing an ontology at the level of ontology authoring, and they embed an emphasis with respect to occurrence of the pitfall so that common pitfalls can be prevented first. The numbers in brackets refer to the type of pitfall, and is the same numbering as in the OOPS! pitfall catalogue and in [1].

T1: Class naming and identification (includes P1, P2, P7, C2, and C5): Synonymy and polysemy should be avoided in naming a class: 1) distinguish the concept/universal itself from the names it can have (the synonyms) and create just one class for it and add other names using rdfs:label annotations; 2) in case of polysemy (the same name has different meanings), try to disambiguate the term and refine the names. Concerning identifying classes, do not lump several together into one with an ‘and’ or ‘or’ (like a class TaskOrGoal or ShrubsAndBushes), but try to divide them into subclasses. Squeezing in modality (like ‘can’, ‘may’, ‘should’) in the name is readable for you, but has no effect on reasoning—if you want that, choose another language—and sometimes can be taken care of in a different way (like a canCook: the stove has the function or affordability to cook). Last, you should have a good URI indicating where the ontology will be published and a relevant name for the file.

T2: Class hierarchy (includes P3, P6, P17, and P21): A taxonomy is based on is-a relationships, meaning that classA is-a classB, if and only if every instance of A is also instance of B, and is-a is transitive. The is-a is present in the language already (subclassOf in OWL), so do not introduce it as an object property. Also, do not confuse is-a with instance-of: the latter is used for representing membership of an individual in a class (which also has a primitive in OWL). Consider the leaf classes of the hierarchy: are they are still classes (entities that can have instances) or individuals (entities that cannot be instantiated anymore)? If the latter, then convert them into instances. What you typically want to avoid are cycles in the hierarchy, as then some class down in the hierarchy—and all of them in between—ends up as equivalent to one of its superclasses. Also try to avoid adding some class named Unknown, Other or Miscellaneous in a class hierarchy just because the set of sibling classes defined is incomplete.

T3: Domain and range of a class (includes P11 and P18): When you add an object or data property, answer the question “What is the most general class in the ontology for which this property holds?” and declare that class as domain/range of the property.  If the answer happens to be multiple classes, then ensure you combine them with ‘or’, not a simple list of those classes (which amounts to the intersection), likewise if the answer is owl:Thing, then try to combine several subclasses instead of using the generic owl:Thing (can the property really relate anything to anything?). For the range of a data property, you should take the answer to the question “What would be the format of data (strings of characters, positive numbers, dates, floats, etc.) used to fill in this information?” (the most general one is literal).

T4: Equivalent relations (includes P12 and P27):

T5: Inverse relations (includes P5, P13, P25, and P26): For object properties that are declared inverses of each other, check that the domain class of one is the same class as the range of the other one, and vv. (for a single object property, consider T6).

T6: Object property characteristics (includes P28 and P29): Go through the object properties and check their characteristics, such as symmetry, functional, and transitivity. See also the SubProS reasoning service [2] to ensure to have ‘safe’ object property characteristics declared that will not have unexpected deductions Concerning reflexivity, be sure to distinguish between the case where a property holds for all objects in your ontology—if so, declare it reflexive—and when it counts only for a particular relation and instances of the participating classes—then use the Self construct.

T7: Intended formalization (includes P14, P15, P16, P19, C1, and C4): As mentioned in T3, a property’s domain or range can consist of more than one class, which is usually a union of the classes, not the intersection of them. For a property’s usage in an axiom, there are typically three cases: (i) if there is at least one such relation (quite common), then use SomeValuesFrom/some/\exists ; (ii)  ‘closing’ the relation, i.e., it doesn’t relate to anything else than the class(es) specified, then also add a AllValuesFrom/only/\forall ; (iii) stating there is no such relation in which the class on the left-hand side participates, you have to be precise at what you really want to say: to achieve the latter, put the negation before the quantifier, but when there is a relation that is just not with some particular class, then the negation goes in front of the class on the right-hand side. For instance, a vegetarian pizza does have ingredients but not meat (\neg\exists hasIngredient.Meat ), which is different from saying that it has as ingredients anything in the ontology—cucumber, beer, soft drink, marsh mellow, chocolate, …—that is not meat (\exists hasIngredient.\neg Meat ). Don’t create a ‘hack’ by introducing a class with negation in the name, alike a NotMeat, but use negation properly in the axiom. Finally, when you are convinced that all relevant properties for a class have been represented, convert it to a defined class (if not already done so), which gets you more deductions for free.

T8: Modelling aspects (includes P4, P23, and C3):

T9: Domain coverage and requirements (includes P9 and P10):

T10: Documentation and understandability (includes P8, P20, and P22): annotate!

I don’t know yet when the book with the selected papers from KEOD will be published, but I assume within the next few months. (date will be added here once I know).

References

[1] Keet, C.M., Suárez Figueroa, M.C., and Poveda-Villalón, M. (2013) The current landscape of pitfalls in ontologies. International Conference on Knowledge Engineering and Ontology Development (KEOD’13). 19-22 September, Vilamoura, Portugal.

[2] C. Maria Keet. Detecting and Revising Flaws in OWL Object Property Expressions. EKAW’12. Springer LNAI vol 7603, pp2 52-266.