DL notation plugin for Protégé 5.x

Once upon a time… the Protégé ontology development environment used Description Logic (DL) symbols and all was well—for some users at least. Then Manchester Syntax came along as the new kid on the block, using hearsay and opinion and some other authors’ preferences for an alternative rendering to the DL notation [1]. Subsequently, everyone who used Protégé was forced to deal with those new and untested keywords in the interface, like ‘some’ and ‘only’ and such, rather than the DL symbols. That had another unfortunate side-effect, being that it hampers internationalisation, for it jumbles up things rather awkwardly when your ontology vocabulary is not in English, like, say, “jirafa come only (oja or ramita)”. Even in the same English-as-first-language country, it turned out that under a controlled set-up, the DL axiom rendering in Protégé fared well in a fairly large sized experiment when compared to the Protégé interface with the sort of Manchester syntax with GUI [2], and also the OWL 2 RL rules rendering appear more positive in another (smaller) experiment [3]. Various HCI factors remain to be examined in more detail, though.

In the meantime, we didn’t fully reinstate the DL notation in Protégé in the way it was in Protégé v3.x from some 15 years ago, but with our new plugin, it will at least render the class expression in DL notation in the tool. This has the benefits that

1. the modeller will receive immediate feedback during the authoring stage regarding a notation that may be more familiar to at least a knowledge engineer or expert modeller;
2. it offers a natural language-independent rendering of the axioms with respect to the constructors, so that people may develop their ontology in their own language if they wish to do so, without being hampered by continuous code switching or the need for localisation; and
3. it also may ease the transition from theory (logics) to implementation for ontology engineering novices.

Whether it needs to be integrated further among more components of the tabs and views in Protégé or other ODEs, is also a question for HCI experts to answer. The code for the DL plugin is open source, so you could extend it if you wish to do so.

The plugin itself is a jar file that can simply be dragged into the plugin folder of a Protégé installation (5.x); see the github repo for details. To illustrate it briefly, after dragging the jar file into the plugin folder, open Protégé, and add it as a view:

Then when you add some new axioms or load an ontology, select a class, and it will render all the axioms in DL notation, as shown in the following two screenshots form different ontologies:

For the sake of illustration, here’s the giraffe that eats only leaves or twigs, in the Spanish version of the African Wildlife Ontology:

The first version of the tool was developed by Michael Harrison and Larry Liu as part of their mini-project for the ontology engineering course in 2017, and it was brushed up for presentation beyond that just now by Michael Harrison (meanwhile an MSc student a CS@UCT), which was supported by a DOT4D grant to improve my textbook on ontology engineering and accompanying educational resources. We haven’t examined all possible ‘shapes’ that a class expression can take, but it definitely processes the commonly used features well. At the time of writing, we haven’t detected any errors.

p.s.: if you want your whole ontology exported at once in DL notation and to latex, for purposes of documentation generation, that is a different usage scenario and is already possible [4].

p.p.s.: if you want more DL notation, please let me know, and I’ll try to find more resources to make a v2 with more features.

References

[1] Matthew Horridge, Nicholas Drummond, John Goodwin, Alan Rector, Robert Stevens and Hai Wang (2006). The Manchester OWL syntax. OWL: Experiences and Directions (OWLED’06), Athens, Georgia, USA, 10-11 Nov 2016, CEUR-WS vol 216.

[2] E. Alharbi, J. Howse, G. Stapleton, A. Hamie and A. Touloumis. The efficacy of OWL and DL on user understanding of axioms and their entailments. The Semantic Web – ISWC 2017, C. d’Amato, M. Fernandez, V. Tamma, F. Lecue, P. Cudre-Mauroux, J. Sequeda, C. Lange and J. He (eds.). Springer 2017, pp20-36.

[3] M. K. Sarker, A. Krisnadhi, D. Carral and P. Hitzler, Rule-based OWL modeling with ROWLtab Protégé plugin. Proceedings of ESWC’17, E. Blomqvist, D. Maynard, A. Gangemi, R. Hoekstra, P. Hitzler and O. Hartig (eds.). Springer. 2017, pp 419-433.

[4] Cogan Shimizu, Pascal Hitzler, Matthew Horridge: Rendering OWL in Description Logic Syntax. ESWC (Satellite Events) 2017. Springer LNCS. pp109-113

An Ontology Engineering textbook

My first textbook “An Introduction to Ontology Engineering” (pdf) is just released as an open textbook. I have revised, updated, and extended my earlier lecture notes on ontology engineering, amounting to about 1/3 more new content cf. its predecessor. Its main aim is to provide an introductory overview of ontology engineering and its secondary aim is to provide hands-on experience in ontology development that illustrate the theory.

The contents and narrative is aimed at advanced undergraduate and postgraduate level in computing (e.g., as a semester-long course), and the book is structured accordingly. After an introductory chapter, there are three blocks:

• Logic foundations for ontologies: languages (FOL, DLs, OWL species) and automated reasoning (principles and the basics of tableau);
• Developing good ontologies with methods and methodologies, the top-down approach with foundational ontologies, and the bottom-up approach to extract as much useful content as possible from legacy material;
• Advanced topics that has a selection of sub-topics: Ontology-Based Data Access, interactions between ontologies and natural languages, and advanced modelling with additional language features (fuzzy and temporal).

Each chapter has several review questions and exercises to explore one or more aspects of the theory, as well as descriptions of two assignments that require using several sub-topics at once. More information is available on the textbook’s page [also here] (including the links to the ontologies used in the exercises), or you can click here for the pdf (7MB).

Feedback is welcome, of course. Also, if you happen to use it in whole or in part for your course, I’d be grateful if you would let me know. Finally, if this textbook will be used half (or even a quarter) as much as the 2009/2010 blogposts have been visited (around 10K unique visitors since posting them), that would mean there are a lot of people learning about ontology engineering and then I’ll have achieved more than I hoped for.

UPDATE: meanwhile, it has been added to several open (text)book repositories, such as OpenUCT and the Open Textbook Archive, and it has been featured on unglue.it in the week of 13-8 (out of its 14K free ebooks).

First tractable encoding of ORM conceptual data models

For (relatively) many years I’ve been focusing on as-expressive-as-possible languages to represent information and knowledge, including the computationally impractical full first order logic, because one would/should want to be as precise as possible and required to represent the subject domain in an ontology and universe of discourse for the application in a conceptual data model. After all, one can always throw out the computationally unpleasant constructs later during the implementation stage, if the ontology or conceptual data model is intended for use at runtime, such as OBDA [1], test data generate for verification [2], and in the query compilation stage in RDBMSs [3]. The resulting slimmed theories/models may be different for different applications, but then at least the set of slimmed theories/models share their common understanding.

So, now I ventured in that area, not because there’s some logic x and conceptual modeling language y has to be forced into it, but it actually appears that many fancy construct/features are not used in publicly available conceptual data models anyway (see data set and xls with some analysis). The timing of the outcome of the analysis of the data set coincided with David Toman’s visit to UCT as part of his sabbatical and Pablo Fillottrani’s visit, who enjoyed the last exchange of our bi-lateral project on the unification of conceptual data modelling languages (project page). To sum up the issue we were looking at: the need for run-time usage of conceptual data models requires a tractable logic-based reconstruction of the conceptual models (i.e., in at most PTIME), which appeared to hardly exist or miss constructs important for conceptual models (regardless whether that was ORM, EER or UML Class Diagrams), or both.

The solution ended up to be a logic-based reconstruction for most of ORM2 using the $\mathcal{CFDI}_{nc}^{\forall -}$ Description Logic, which also happens to be the first tractable encoding of (most of) ORM/ORM2. With this logic, several features important for conceptual models (i.e., occur relatively often) do have their proper encoding in the logic, notably n-aries, complex identification constraints, and n-ary role subsumption. The, admittedly quite tedious, mapping

Low resolution and small version of our DL15 poster summarising the contributions.

captures over 96% of the constructs used in practice in the set of 33 ORM diagrams we analysed (see data set). Further, the results are easily transferable to EER and UML Class diagrams, with an even greater coverage. The results (and comparison with related works) are presented in our recently accepted paper at the 28th International Workshop on Description Logics (DL’15) that will take place form 7 to 11 June in Athens, Greece.

The list of accepted papers of DL’15 is available, listing 21 papers with long presentations, 16 papers with short presentation, and 26 papers with poster presentations. David will present our results in the poster session, as it’s probably of more relevance in the conceptual modelling community (and I’ll be marking exams then), and some other accepted papers cover more new ground, such as casting schema.org as a description logic, temporal query answering in EL, exact learning of ontologies, and more. The proceedings is will be online on CEUR-WS in the upcoming days as volume 1350. I’ve added a mini version of our poster on the right. I tried tikzposter, as they look really cool, but it doesn’t support figures (other than those made in latex), so I resorted to ppt (that doesn’t support math), wondering why these issues haven’t been solved by now.

References

[1] Calvanese, D., Keet, C.M., Nutt, W., Rodriguez-Muro, M., Stefanoni, G. Web-based Graphical Querying of Databases through an Ontology: the WONDER System. ACM Symposium on Applied Computing (ACM SAC’10), March 22-26 2010, Sierre, Switzerland. pp 1389-1396.

[2] Toman, D., Weddell, G.E.: Fundamentals of Physical Design and Query Compilation. Synthesis Lectures on Data Management, Morgan & Claypool  Publishers (2011)

[3] Smaragdakis, Y., Csallner, C., Subramanian, R.: Scalable satisfiability checking and test data generation from modeling diagrams. Automation in Software Engineering 16, 73–99 (2009)

[4] Fillottrani, P.R., Keet, C.M., Toman, D. Polynomial encoding of ORM conceptual models in $\mathcal{CFDI}_{nc}^{\forall -}$. 28th International Workshop on Description Logics (DL’15). CEUR-WS vol xx., 7-10 June 2015, Athens, Greece.

Forum for AI Research 2015, Cape Town

In 10 day’s time, the (CAIR-driven) Forum for Artificial Intelligence Research 2015 (FAIR’15) Workshop will be held at UCT in Cape Town, South Africa, from March 30 to April 2. There are still some spaces available; registration is free, but please register (for catering purposes). What will you get for this ‘bargain price’? A lot of food for the mind!

FAIR’15 follows the same format as the previous 7 editions that went under various acronyms since 2008 (among others, MOWS, MOSS, MAIS, FAIR), with a mini-course, a tutorial, and postgraduate student presentations. This edition has the following on offer.

Ulrike Sattler (University of Manchester, UK) will present a mini-course on automated reasoners in the mornings. She will go into the details of what really happens when you click that menu option “start reasoner” and Protégé’s “?” that explains the deductions, and what are the factors that influence the reasoner’s performance.

David Toman (University of Waterloo, Canada) will present a 2-hour tutorial on using knowledge representation and reasoning (logic) for query optimization in relational databases and ontology-based data access (i.e., advanced aspects of database systems implementation).

Further, there are several sessions with postgraduate student presentations. Among others, Catherine Chavula will talk about new results (cf. [1]) in multilingual ontologies, Zubeida Khan will talk about foundational ontology interchangeability (details in [2]), and (very recently MSc cum laude graduated!) Nasubo Ongoma will present her thesis on logic-based temporal conceptual data modeling (including material from [3]). Gavin Rens will talk about probabilistic belief change, Kody Moodley on defeasible reasoning for description logics, Henriette Harmse about scenario testing with OWL, and Nishal Morar on taxonomic classification.

Aurona Gerber will give an overview of Data Science at CSIR, and for some more variety in the programme, I’ll talk about the stuff ontology [4]. Check the programme for all titles of the presentations and the abstracts of the mini-course and tutorial.

An important aim of FAIR is the networking among people in Southern Africa, and share and discuss informally our research in (predominantly) KR&R and related areas—so if the above topics sound interesting, or made you curious, or you would like to meet a potential MSc/PhD supervisor, you’re welcome to join (note: some basic knowledge of logics will be needed to understand the talks, though). If you have any questions, please don’t hesitate to contact one of the organisers, Arina Britz and me.

References

[1] Chavula, C., Keet, C.M. Is Lemon Sufficient for Building Multilingual Ontologies for Bantu Languages? 11th OWL: Experiences and Directions Workshop (OWLED’14). Keet, C.M., Tamma, V. (Eds.). Riva del Garda, Italy, Oct 17-18, 2014. CEUR-WS vol. 1265, 61-72.

[2] Khan, Z.C., Keet, C.M. Feasibility of automated foundational ontology interchangeability. 19th International Conference on Knowledge Engineering and Knowledge Management (EKAW’14). K. Janowicz et al. (Eds.). 24-28 Nov, 2014, Linkoping, Sweden. Springer LNAI 8876, 225-237.

[3] Keet, C.M., Ongoma, E.A.N. Temporal Attributes: their Status and Subsumption. Asia-Pacific Conference on Conceptual Modelling (APCCM’15). Koehler, H., Saeki, M. (Eds.), Conferences in Research and Practice in Information Technology (CRPIT), Vol. 165. 27-30 January, 2015, Sydney, Australia.

[4] Keet, C.M. A core ontology of macroscopic stuff. 19th International Conference on Knowledge Engineering and Knowledge Management (EKAW’14). K. Janowicz et al. (Eds.). 24-28 Nov, 2014, Linkoping, Sweden. Springer LNAI vol. 8876, 209-224.

Ontology Engineering lecture notes for 2014 online

The lecture notes for the Ontology Engineering BSc honours in CS course are available online now. The file is updated compared to the COMP720 module (and those notes have been removed). The main changes consist of reordering the chapters in Block II and Block III, adding better or more explanations and examples in several sections, fixing typos, and updates to reflect advances made in the field. It again includes the DL primer written by Markus Kroetzsch, Ian Horrocks and Frantisek Simancik (saving me the time writing about that; thanks!).

As with the last three installments, the target audience is computer science students in their 4th year (honours), so the notes are of an introductory nature. It has three blocks after the introduction: logic foundations, ontology engineering, and advanced topics (the latter we will skip, as this is a shorter course). The logic foundations contain a recap of FOL and the notion of reasoning, the DL primer and the basics of automated reasoning with the Description Logics with ALC, the DL-based OWL species, and some practical automated reasoning. The ontology engineering block starts with methods and methodologies that give guidance how to commence actually developing an ontology, and how to avoid and fix issues. Subsequently, there are two chapters going into some detail of two ‘paths’ in the methodology, being top-down ontology development using foundational ontologies, and bottom-up ontology development to extract knowledge from other material, such as relational databases, thesauri, and natural language documents.

The advanced topics are optional this year, but I left them in the lecture notes, as they may pique your interest. Chapter 8 on Ontology-Based Data Access is a particular application scenario of ontologies that ‘spice up’ database applications. Chapter 9 touches upon a few sub-areas within ontologies: representing and reasoning with vagueness and uncertainty, extending the language to include also temporal knowledge, the use of ontologies to enhance conceptual data models, and a note on social aspects.

It is still an evolving document, and relative completeness of sections varies slightly, so it has to be seen in conjunction with the slides, lectures, and some additional documentation that will be made available on the course’s Vula site.

Suggestions and corrections are welcome! If you want to use a part of it in your own lectures and/or use the accompanying slides with it, please contact me.

Book chapter on conceptual data modeling for biology published

Just a quick note that my book chapter on “Ontology-driven formal conceptual data modeling for biological data analysis” finally has been published in the Biological Knowledge Discovery Handbook: Preprocessing, Mining and Postprocessing of Biological Data (edited by Mourad Elloumi and Albert Y. Zomaya). A summary of the chapter’s contents is described in an earlier blog post from little over two years ago, and I’ve put the preprint online.

The whole book is an impressive 1192 pages consisting of 48 chapters of about 25 pages each, which are grouped into three main sections. The first section, Biological data pre-processing, has four parts: biological data management, biological data modeling (which includes my chapter), biological feature extraction, and biological feature selection. The second section, biological data mining, has six parts: Regression Analysis of Biological Data, Biological Data Clustering, Biological Data Classification, Association Rules Learning from Biological Data, Text Mining and Application to Biological Data, and High-Performance Computing for Biological Data Mining. The third section, biological data post-processing, has only one part: biological knowledge integration and visualization. (check the detailed table of contents). Happy reading!

Ontologies and Knowledge bases lecture notes for 2013

The lecture notes for the ontologies and knowledge bases module (COMP720) for semester 2 in 2013 are online available now. I’ve updated them compared to last year’s installment (mentioned here): in addition to the regular changes, like updates to reflect the advances made in the past year in ontology engineering, better explanations in several sections, and more examples, it includes the DL primer by Markus Kroetzsch, Ian Horrocks and Frantisek Simancik (saving me the time writing about that; thanks!), more exercises, and answers to selected exercises.

As last year, the target audience is computer science students in their 4th year (honours), so the notes are of an introductory nature. It has three blocks: logic foundations, ontology engineering, and advanced topics. The logic foundations contain a recap of FOL, the DL primer and the basics of automated reasoning with the Description Logics with ALC, the DL-based OWL species, and some practical automated reasoning. The ontology engineering block starts with top-down ontology development using foundational ontologies, then bottom-up ontology development to extract knowledge from ‘legacy’ representations, and finally (perhaps too briefly), methods and methodologies. The advanced topics are balanced in two directions, where the first one certainly will be covered and the second one if time permits: ontology-based data access applications (i.e., an ontology-drive information system) and temporal ontologies.

It is essentially still an evolving document, and relative completeness of sections varies slightly. Suggestions and corrections are welcome! If you want to use a part of it in your own lectures and/or use the accompanying slides with it, please contact me.