DL notation plugin for Protégé 5.x

Once upon a time… the Protégé ontology development environment used Description Logic (DL) symbols and all was well, for some users at least. Then Manchester Syntax came along as the new kid on the block, using hearsay and opinion and some other authors’ preferences for an alternative rendering to the DL notation [1]. Subsequently, everyone who used Protégé was forced to deal with those new and untested keywords in the interface, like ‘some’ and ‘only’ and such, rather than the DL symbols. That had another unfortunate side-effect: it hampers internationalisation, for it jumbles up things rather awkwardly when your ontology vocabulary is not in English, like, say, “jirafa come only (oja or ramita)”. Even in the same English-as-first-language country, it turned out that under a controlled set-up, the DL axiom rendering in Protégé fared well in a fairly large experiment when compared to the Protégé interface with the sort of Manchester syntax with GUI [2], and the OWL 2 RL rules rendering also appears more positive in another (smaller) experiment [3]. Various HCI factors remain to be examined in more detail, though.

In the meantime, we didn’t fully reinstate the DL notation in Protégé in the way it was in Protégé v3.x from some 15 years ago, but with our new plugin, it will at least render the class expression in DL notation in the tool. This has the benefits that

  1. the modeller will receive immediate feedback during the authoring stage regarding a notation that may be more familiar to at least a knowledge engineer or expert modeller;
  2. it offers a natural language-independent rendering of the axioms with respect to the constructors, so that people may develop their ontology in their own language if they wish to do so, without being hampered by continuous code switching or the need for localisation; and
  3. it also may ease the transition from theory (logics) to implementation for ontology engineering novices.
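To make the difference between the two renderings concrete, here is a minimal sketch in Python of how Manchester Syntax keywords map onto DL symbols. It is not the plugin's code (the plugin is a Protégé jar); the function name `to_dl` and the regular expressions are assumptions made purely for illustration, and the sketch only handles the simple `property some|only filler` pattern plus `and`, `or`, and `not`.

```python
import re

# Keyword-to-symbol tables (illustrative subset of the Manchester keywords).
BOOLEAN_SYMBOLS = {"and": "⊓", "or": "⊔", "not": "¬"}
QUANTIFIER_SYMBOLS = {"some": "∃", "only": "∀"}


def to_dl(expression: str) -> str:
    """Rewrite a simple Manchester-style class expression into DL notation.

    Hypothetical helper for illustration; a real renderer would walk the
    parsed class expression rather than use regular expressions.
    """
    # "<property> some|only " becomes "∃property." or "∀property."
    expression = re.sub(
        r"(\w+)\s+(some|only)\s+",
        lambda m: f"{QUANTIFIER_SYMBOLS[m.group(2)]}{m.group(1)}.",
        expression,
    )
    # Replace the Boolean keywords; the vocabulary itself is left untouched.
    return re.sub(
        r"\b(and|or|not)\b",
        lambda m: BOOLEAN_SYMBOLS[m.group(1)],
        expression,
    )


print(to_dl("come only (oja or ramita)"))
```

Only the constructors change; the Spanish vocabulary stays as it is, which is the point of item 2 above.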

Whether it needs to be integrated further among more components of the tabs and views in Protégé or other ODEs, is also a question for HCI experts to answer. The code for the DL plugin is open source, so you could extend it if you wish to do so.

The plugin itself is a jar file that can simply be dragged into the plugin folder of a Protégé installation (5.x); see the github repo for details. To illustrate it briefly, after dragging the jar file into the plugin folder, open Protégé, and add it as a view:

Then when you add some new axioms or load an ontology, select a class, and it will render all the axioms in DL notation, as shown in the following two screenshots from different ontologies:

For the sake of illustration, here’s the giraffe that eats only leaves or twigs, in the Spanish version of the African Wildlife Ontology:
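Written out by hand, and reading the earlier “jirafa come only (oja or ramita)” example as a subclass axiom (an assumption on my part about how it is declared in the ontology), the DL rendering would be along these lines:

```latex
\[
  \mathit{jirafa} \sqsubseteq \forall \mathit{come}.(\mathit{oja} \sqcup \mathit{ramita})
\]
```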

The first version of the tool was developed by Michael Harrison and Larry Liu as part of their mini-project for the ontology engineering course in 2017, and it was brushed up for presentation beyond that just now by Michael Harrison (meanwhile an MSc student at CS@UCT), which was supported by a DOT4D grant to improve my textbook on ontology engineering and accompanying educational resources. We haven’t examined all possible ‘shapes’ that a class expression can take, but it definitely processes the commonly used features well. At the time of writing, we haven’t detected any errors.

p.s.: if you want your whole ontology exported at once in DL notation and to latex, for purposes of documentation generation, that is a different usage scenario and is already possible [4].

p.p.s.: if you want more DL notation, please let me know, and I’ll try to find more resources to make a v2 with more features.

References

[1] Matthew Horridge, Nicholas Drummond, John Goodwin, Alan Rector, Robert Stevens and Hai Wang (2006). The Manchester OWL syntax. OWL: Experiences and Directions (OWLED’06), Athens, Georgia, USA, 10-11 Nov 2006, CEUR-WS vol 216.

[2] E. Alharbi, J. Howse, G. Stapleton, A. Hamie and A. Touloumis. The efficacy of OWL and DL on user understanding of axioms and their entailments. The Semantic Web – ISWC 2017, C. d’Amato, M. Fernandez, V. Tamma, F. Lecue, P. Cudre-Mauroux, J. Sequeda, C. Lange and J. He (eds.). Springer 2017, pp20-36.

[3] M. K. Sarker, A. Krisnadhi, D. Carral and P. Hitzler, Rule-based OWL modeling with ROWLtab Protégé plugin. Proceedings of ESWC’17, E. Blomqvist, D. Maynard, A. Gangemi, R. Hoekstra, P. Hitzler and O. Hartig (eds.). Springer. 2017, pp 419-433.

[4] Cogan Shimizu, Pascal Hitzler, Matthew Horridge: Rendering OWL in Description Logic Syntax. ESWC (Satellite Events) 2017. Springer LNCS. pp109-113

An Ontology Engineering textbook

My first textbook “An Introduction to Ontology Engineering” (pdf) has just been released as an open textbook. I have revised, updated, and extended my earlier lecture notes on ontology engineering, amounting to about 1/3 more new content cf. its predecessor. Its main aim is to provide an introductory overview of ontology engineering and its secondary aim is to provide hands-on experience in ontology development that illustrates the theory.

The contents and narrative are aimed at advanced undergraduate and postgraduate level in computing (e.g., as a semester-long course), and the book is structured accordingly. After an introductory chapter, there are three blocks:

  • Logic foundations for ontologies: languages (FOL, DLs, OWL species) and automated reasoning (principles and the basics of tableau);
  • Developing good ontologies with methods and methodologies, the top-down approach with foundational ontologies, and the bottom-up approach to extract as much useful content as possible from legacy material;
  • Advanced topics, with a selection of sub-topics: Ontology-Based Data Access, interactions between ontologies and natural languages, and advanced modelling with additional language features (fuzzy and temporal).

Each chapter has several review questions and exercises to explore one or more aspects of the theory, as well as descriptions of two assignments that require using several sub-topics at once. More information is available on the textbook’s page [also here] (including the links to the ontologies used in the exercises), or you can click here for the pdf (7MB).

Feedback is welcome, of course. Also, if you happen to use it in whole or in part for your course, I’d be grateful if you would let me know. Finally, if this textbook will be used half (or even a quarter) as much as the 2009/2010 blogposts have been visited (around 10K unique visitors since posting them), that would mean there are a lot of people learning about ontology engineering and then I’ll have achieved more than I hoped for.

UPDATE: meanwhile, it has been added to several open (text)book repositories, such as OpenUCT and the Open Textbook Archive, and it has been featured on unglue.it in the week of 13-8 (out of its 14K free ebooks).

First tractable encoding of ORM conceptual data models

For (relatively) many years I’ve been focusing on as-expressive-as-possible languages to represent information and knowledge, including the computationally impractical full first order logic, because one would/should want to be as precise as possible, and as precise as required, when representing the subject domain in an ontology and the universe of discourse for the application in a conceptual data model. After all, one can always throw out the computationally unpleasant constructs later during the implementation stage, if the ontology or conceptual data model is intended for use at runtime, such as in OBDA [1], test data generation for verification [3], and the query compilation stage in RDBMSs [2]. The resulting slimmed theories/models may be different for different applications, but then at least the set of slimmed theories/models share their common understanding.

So, now I ventured in that area, not because there’s some logic x that conceptual modeling language y has to be forced into, but because it actually appears that many fancy constructs/features are not used in publicly available conceptual data models anyway (see data set and xls with some analysis). The timing of the outcome of the analysis of the data set coincided with David Toman’s visit to UCT as part of his sabbatical and Pablo Fillottrani’s visit, who enjoyed the last exchange of our bi-lateral project on the unification of conceptual data modelling languages (project page). To sum up the issue we were looking at: the need for run-time usage of conceptual data models requires a tractable logic-based reconstruction of the conceptual models (i.e., in at most PTIME), which appeared to hardly exist or miss constructs important for conceptual models (regardless of whether that was ORM, EER or UML Class Diagrams), or both.

The solution ended up to be a logic-based reconstruction for most of ORM2 using the \mathcal{CFDI}_{nc}^{\forall -} Description Logic, which also happens to be the first tractable encoding of (most of) ORM/ORM2. With this logic, several features important for conceptual models (i.e., occur relatively often) do have their proper encoding in the logic, notably n-aries, complex identification constraints, and n-ary role subsumption. The, admittedly quite tedious, mapping

Low resolution and small version of our DL15 poster summarising the contributions.

captures over 96% of the constructs used in practice in the set of 33 ORM diagrams we analysed (see data set). Further, the results are easily transferable to EER and UML Class diagrams, with an even greater coverage. The results (and comparison with related works) are presented in our recently accepted paper [4] at the 28th International Workshop on Description Logics (DL’15) that will take place from 7 to 11 June in Athens, Greece.

The list of accepted papers of DL’15 is available, listing 21 papers with long presentations, 16 papers with short presentations, and 26 papers with poster presentations. David will present our results in the poster session, as it’s probably of more relevance in the conceptual modelling community (and I’ll be marking exams then), and some other accepted papers cover more new ground, such as casting schema.org as a description logic, temporal query answering in EL, exact learning of ontologies, and more. The proceedings will be online on CEUR-WS in the upcoming days as volume 1350. I’ve added a mini version of our poster on the right. I tried tikzposter, as they look really cool, but it doesn’t support figures (other than those made in latex), so I resorted to ppt (that doesn’t support math), wondering why these issues haven’t been solved by now.

Anyway, more about this topic is in the pipeline that I soon hope to be able to give updates on.

References

[1] Calvanese, D., Keet, C.M., Nutt, W., Rodriguez-Muro, M., Stefanoni, G. Web-based Graphical Querying of Databases through an Ontology: the WONDER System. ACM Symposium on Applied Computing (ACM SAC’10), March 22-26 2010, Sierre, Switzerland. pp 1389-1396.

[2] Toman, D., Weddell, G.E.: Fundamentals of Physical Design and Query Compilation. Synthesis Lectures on Data Management, Morgan & Claypool  Publishers (2011)

[3] Smaragdakis, Y., Csallner, C., Subramanian, R.: Scalable satisfiability checking and test data generation from modeling diagrams. Automation in Software Engineering 16, 73–99 (2009)

[4] Fillottrani, P.R., Keet, C.M., Toman, D. Polynomial encoding of ORM conceptual models in \mathcal{CFDI}_{nc}^{\forall -} . 28th International Workshop on Description Logics (DL’15). CEUR-WS vol xx., 7-10 June 2015, Athens, Greece.

Forum for AI Research 2015, Cape Town

In 10 days’ time, the (CAIR-driven) Forum for Artificial Intelligence Research 2015 (FAIR’15) Workshop will be held at UCT in Cape Town, South Africa, from March 30 to April 2. There are still some spaces available; registration is free, but please register (for catering purposes). What will you get for this ‘bargain price’? A lot of food for the mind!

FAIR’15 follows the same format as the previous 7 editions that went under various acronyms since 2008 (among others, MOWS, MOSS, MAIS, FAIR), with a mini-course, a tutorial, and postgraduate student presentations. This edition has the following on offer.

Ulrike Sattler (University of Manchester, UK) will present a mini-course on automated reasoners in the mornings. She will go into the details of what really happens when you click that menu option “start reasoner” and Protégé’s “?” that explains the deductions, and what are the factors that influence the reasoner’s performance.

David Toman (University of Waterloo, Canada) will present a 2-hour tutorial on using knowledge representation and reasoning (logic) for query optimization in relational databases and ontology-based data access (i.e., advanced aspects of database systems implementation).

Further, there are several sessions with postgraduate student presentations. Among others, Catherine Chavula will talk about new results (cf. [1]) in multilingual ontologies, Zubeida Khan will talk about foundational ontology interchangeability (details in [2]), and (very recently MSc cum laude graduated!) Nasubo Ongoma will present her thesis on logic-based temporal conceptual data modeling (including material from [3]). Gavin Rens will talk about probabilistic belief change, Kody Moodley on defeasible reasoning for description logics, Henriette Harmse about scenario testing with OWL, and Nishal Morar on taxonomic classification.

Aurona Gerber will give an overview of Data Science at CSIR, and for some more variety in the programme, I’ll talk about the stuff ontology [4]. Check the programme for all titles of the presentations and the abstracts of the mini-course and tutorial.

An important aim of FAIR is the networking among people in Southern Africa, and to share and discuss informally our research in (predominantly) KR&R and related areas. So if the above topics sound interesting, or made you curious, or you would like to meet a potential MSc/PhD supervisor, you’re welcome to join (note: some basic knowledge of logics will be needed to understand the talks, though). If you have any questions, please don’t hesitate to contact one of the organisers, Arina Britz and me.

References

[1] Chavula, C., Keet, C.M. Is Lemon Sufficient for Building Multilingual Ontologies for Bantu Languages? 11th OWL: Experiences and Directions Workshop (OWLED’14). Keet, C.M., Tamma, V. (Eds.). Riva del Garda, Italy, Oct 17-18, 2014. CEUR-WS vol. 1265, 61-72.

[2] Khan, Z.C., Keet, C.M. Feasibility of automated foundational ontology interchangeability. 19th International Conference on Knowledge Engineering and Knowledge Management (EKAW’14). K. Janowicz et al. (Eds.). 24-28 Nov, 2014, Linkoping, Sweden. Springer LNAI 8876, 225-237.

[3] Keet, C.M., Ongoma, E.A.N. Temporal Attributes: their Status and Subsumption. Asia-Pacific Conference on Conceptual Modelling (APCCM’15). Koehler, H., Saeki, M. (Eds.), Conferences in Research and Practice in Information Technology (CRPIT), Vol. 165. 27-30 January, 2015, Sydney, Australia.

[4] Keet, C.M. A core ontology of macroscopic stuff. 19th International Conference on Knowledge Engineering and Knowledge Management (EKAW’14). K. Janowicz et al. (Eds.). 24-28 Nov, 2014, Linkoping, Sweden. Springer LNAI vol. 8876, 209-224.

Ontology Engineering lecture notes for 2014 online

The lecture notes for the Ontology Engineering BSc honours in CS course are available online now. The file is updated compared to the COMP720 module (and those notes have been removed). The main changes consist of reordering the chapters in Block II and Block III, adding better or more explanations and examples in several sections, fixing typos, and updates to reflect advances made in the field. It again includes the DL primer written by Markus Kroetzsch, Ian Horrocks and Frantisek Simancik (saving me the time writing about that; thanks!).

As with the last three installments, the target audience is computer science students in their 4th year (honours), so the notes are of an introductory nature. It has three blocks after the introduction: logic foundations, ontology engineering, and advanced topics (the latter we will skip, as this is a shorter course). The logic foundations contain a recap of FOL and the notion of reasoning, the DL primer and the basics of automated reasoning with the Description Logics with ALC, the DL-based OWL species, and some practical automated reasoning. The ontology engineering block starts with methods and methodologies that give guidance how to commence actually developing an ontology, and how to avoid and fix issues. Subsequently, there are two chapters going into some detail of two ‘paths’ in the methodology, being top-down ontology development using foundational ontologies, and bottom-up ontology development to extract knowledge from other material, such as relational databases, thesauri, and natural language documents.

The advanced topics are optional this year, but I left them in the lecture notes, as they may pique your interest. Chapter 8 on Ontology-Based Data Access is a particular application scenario of ontologies that ‘spice up’ database applications. Chapter 9 touches upon a few sub-areas within ontologies: representing and reasoning with vagueness and uncertainty, extending the language to include also temporal knowledge, the use of ontologies to enhance conceptual data models, and a note on social aspects.

It is still an evolving document, and relative completeness of sections varies slightly, so it has to be seen in conjunction with the slides, lectures, and some additional documentation that will be made available on the course’s Vula site.

Suggestions and corrections are welcome! If you want to use a part of it in your own lectures and/or use the accompanying slides with it, please contact me.

Book chapter on conceptual data modeling for biology published

Just a quick note that my book chapter on “Ontology-driven formal conceptual data modeling for biological data analysis” finally has been published in the Biological Knowledge Discovery Handbook: Preprocessing, Mining and Postprocessing of Biological Data (edited by Mourad Elloumi and Albert Y. Zomaya). A summary of the chapter’s contents is described in an earlier blog post from a little over two years ago, and I’ve put the preprint online.

The whole book is an impressive 1192 pages consisting of 48 chapters of about 25 pages each, which are grouped into three main sections. The first section, Biological data pre-processing, has four parts: biological data management, biological data modeling (which includes my chapter), biological feature extraction, and biological feature selection. The second section, biological data mining, has six parts: Regression Analysis of Biological Data, Biological Data Clustering, Biological Data Classification, Association Rules Learning from Biological Data, Text Mining and Application to Biological Data, and High-Performance Computing for Biological Data Mining. The third section, biological data post-processing, has only one part: biological knowledge integration and visualization. (check the detailed table of contents). Happy reading!

Ontologies and Knowledge bases lecture notes for 2013

The lecture notes for the ontologies and knowledge bases module (COMP720) for semester 2 in 2013 are available online now. I’ve updated them compared to last year’s installment (mentioned here): in addition to the regular changes, like updates to reflect the advances made in the past year in ontology engineering, better explanations in several sections, and more examples, it includes the DL primer by Markus Kroetzsch, Ian Horrocks and Frantisek Simancik (saving me the time writing about that; thanks!), more exercises, and answers to selected exercises.

As last year, the target audience is computer science students in their 4th year (honours), so the notes are of an introductory nature. It has three blocks: logic foundations, ontology engineering, and advanced topics. The logic foundations contain a recap of FOL, the DL primer and the basics of automated reasoning with the Description Logics with ALC, the DL-based OWL species, and some practical automated reasoning. The ontology engineering block starts with top-down ontology development using foundational ontologies, then bottom-up ontology development to extract knowledge from ‘legacy’ representations, and finally (perhaps too briefly), methods and methodologies. The advanced topics are balanced in two directions, where the first one certainly will be covered and the second one if time permits: ontology-based data access applications (i.e., an ontology-driven information system) and temporal ontologies.

It is essentially still an evolving document, and relative completeness of sections varies slightly. Suggestions and corrections are welcome! If you want to use a part of it in your own lectures and/or use the accompanying slides with it, please contact me.

Logical and ontological reasoning services?

The SubProS and ProChainS compatibility services for OWL ontologies to check for good and ‘safe’ OWL object property expressions [5] may be considered ontological reasoning services by some, but according to others, they are/ought to be plain logical reasoning services. I discussed this issue with Alessandro Artale back in 2007 when we came up with the RBox Compatibility service [1] (which, in the end, we called an ontological reasoning service), and it came up again during EKAW’12 and the Ontologies and Conceptual Modelling Workshop (OCM) in Pretoria in November. Moreover, in all three settings, the conversation was generalized to the following questions:

  1. Is there a difference between a logical and an ontological reasoning service (be that ‘onto’-logical or ‘extra’-logical)? If so,
    1. Why, and what, then, is an ontological reasoning service?
    2. Are there any that can serve at least as prototypical example of an ontological reasoning service?

There’s still no conclusive answer on either of the questions. So, I present here some data and arguments I had and that I’ve heard so far, and I invite you to have your say on the matter. I will first introduce a few notions, terms, tools, and implicit assumptions informally, then list the three positions and their arguments I am aware of.

Some aspects about standard, non-standard, and ontological reasoning services

Let me first introduce a few ideas informally. Within Description Logics and the Semantic Web, a distinction is made between so-called ‘standard’ and ‘non-standard’ reasoning services. The standard reasoning services, which most of the DL-based reasoners support, are subsumption reasoning, satisfiability, consistency of the knowledge base, instance checking, and instance retrieval (see, e.g., [2,3] for explanations). Non-standard reasoning services include, e.g., glass-box reasoning and computing the least common subsumer; they are typically designed with the aim to facilitate ontology development, and tend to have their own plugin or extension to an existing reasoner. What these standard and non-standard reasoners have in common is that they all focus only on the logical theory (in a subset of first order predicate logic).
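For readers who prefer symbols over the informal description, two of the standard services can be written down as follows, using the usual model-theoretic notation (a knowledge base K, an interpretation I, and concepts C and D; see [2,3] for the full definitions). The last line is the well-known reduction of subsumption to unsatisfiability, which holds for logics with full negation such as ALC.

```latex
\begin{align*}
\mathcal{K} \models C \sqsubseteq D
  &\iff C^{\mathcal{I}} \subseteq D^{\mathcal{I}}
        \text{ for every model } \mathcal{I} \text{ of } \mathcal{K} \\
C \text{ is satisfiable w.r.t. } \mathcal{K}
  &\iff C^{\mathcal{I}} \neq \emptyset
        \text{ for some model } \mathcal{I} \text{ of } \mathcal{K} \\
\mathcal{K} \models C \sqsubseteq D
  &\iff C \sqcap \neg D \text{ is unsatisfiable w.r.t. } \mathcal{K}
\end{align*}
```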

Take, on the other hand, OntoClean [4], which assigns meta-properties (such as rigidity and unity) to classes, and then, according to some rules involving those meta-properties, computes the class taxonomy. Those meta-properties are borrowed from Ontology in philosophy and the rules do not use the standard way of computing subsumption (where every instance of the subclass is also an instance of its super class and, thus, practically, the subclass has more features or has the same features but with more constrained values/ranges). Moreover, OntoClean helps to distinguish between alternative logical formalisations of some piece of knowledge so as to choose the one that is better with respect to the reality we want to represent; e.g., why it is better to have the class Apple that has as quality a color green, versus the option of a class GreenObject that has shape apple-shaped. This being the case, OntoClean may be considered an ontological reasoning service. My SubProS and ProChainS [5] put constraints on OWL object property expressions so as to have safe and good hierarchies of object properties and property chains, based on the same notion of class subsumption, but then applied to role inclusion axioms: the OWL object sub-property (relationship, DL role) must be more constrained than its super-property and the two reasoning services check if that holds. But some of the flawed object property expressions do not cause a logical inconsistency (merely an undesirable deduction), so one might argue that the compatibility services are ontological.
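To give a flavour of the kind of check meant by "more constrained than its super-property", here is a minimal sketch in Python. It is not the SubProS algorithm from [5]; it only illustrates one ingredient under the assumption that "more constrained" is read as: the domain and range of the sub-property must be subclasses of (or equal to) those of the super-property. The class names, property names, and the toy told hierarchy are all made up for the example.

```python
# Toy told class hierarchy, child -> parent (hypothetical example data).
SUBCLASS_OF = {
    "Heart": "Organ",
    "Organ": "PhysicalObject",
    "Human": "PhysicalObject",
}

# Declared (domain, range) per object property (hypothetical example data).
DOMAIN_RANGE = {
    "partOf": ("PhysicalObject", "PhysicalObject"),
    "organPartOf": ("Organ", "Human"),
    "heartPartOf": ("Heart", "Human"),
    "badPartOf": ("PhysicalObject", "Human"),
}

# Asserted sub-property axioms as (sub, super) pairs.
SUBPROPERTY_OF = [
    ("organPartOf", "partOf"),
    ("heartPartOf", "organPartOf"),
    ("badPartOf", "organPartOf"),
]


def is_subclass(sub, sup):
    """True if sub equals sup or sup is reachable by following parent links."""
    while sub is not None:
        if sub == sup:
            return True
        sub = SUBCLASS_OF.get(sub)
    return False


def find_flawed_subproperties():
    """Return (sub, super, position) triples where the sub-property is not
    more constrained than its super-property on domain or range."""
    flaws = []
    for sub, sup in SUBPROPERTY_OF:
        sub_domain, sub_range = DOMAIN_RANGE[sub]
        sup_domain, sup_range = DOMAIN_RANGE[sup]
        if not is_subclass(sub_domain, sup_domain):
            flaws.append((sub, sup, "domain"))
        if not is_subclass(sub_range, sup_range):
            flaws.append((sub, sup, "range"))
    return flaws


print(find_flawed_subproperties())
```

Note that the flagged axiom in this toy example does not make the theory inconsistent; a standard reasoner would simply draw deductions from it, which is exactly why one may call such a check ontological rather than logical.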

The arguments so far

The descriptions in the previous paragraph contain implicit assumptions about the logical vs ontological reasoning, which I will spell out here. They are a synthesis from mine as well as other people’s voiced opinions about it (the other people being, among others and in alphabetical order, Alessandro Artale, Arina Britz, Giovanni Casini, Enrico Franconi, Aldo Gangemi, Chiara Ghidini, Tommie Meyer, Valentina Presutti, and Michael Uschold). It goes without saying they are my renderings of the arguments, and sometimes I state the things a little more bluntly to make the point.

1. If it is not entailed by the (standard, DL/other logic) reasoning service, then it is something ontological.

Logic is not about the study of the truth, but about the relationship of the truth of one statement and that of another. Effectively, it doesn’t matter what terms you have in the theory’s vocabulary (be this simply A, B, C, etc. or an attempt to represent Apple, Banana, Citrus, etc. conformant to what those entities are in reality), as it uses truth assignments and the usual rules of inference. If you want some reasoning that helps making a distinction between a good and a bad formalisation of what you aim to represent (where both theories are consistent), then that’s not the logician’s business but instead is relegated to the domain of whatever it is that ontologists get excited about. A counter-argument raised to that was that the early logicians were, in fact, concerned with finding a way to formalize reality in the best way; hence, not only syntax and semantics of the logic language, but also the semantics/meaning of the subject domain. A practical counter-example is that both Glimm et al [6] and Welty [7] managed to ‘hack’ OntoClean into OWL and use standard DL reasoners for it to obtain the desired inferences, so, presumably, then even OntoClean cannot be considered an ontological reasoning service after all?

2. Something ‘meta’ like OntoClean can/might be considered really ontological, but SubProS and ProChainS are ‘extra-logical’ and can be embedded like the extra-logical understanding of class subsumption, so they are logical reasoning services (for it is the analogue to class subsumption but then for role inclusion axioms).

This argument has to do with the notion of ‘standard way’ versus ‘alternative approach’ to compute something and the idea of having borrowed something from Ontology recently versus from mathematics and Aristotle somewhat longer ago. (Note: the notion of subsumption in computing was still discussed in the 1980s, where the debate got settled in what is now considered the established understanding of class subsumption.) We simply can apply the underlying principles for class-subclass to one for relationships (/object properties/roles). DL/OWL reasoners and the standard view assume that the role box/object property expressions are correct and are merely used to compute the class taxonomy. But why should I assume the role box is fine, even when I know this is not always the case? And why do I have to put up with a classification of some class elsewhere in the taxonomy (or be inconsistent) when the real mistake is in the role box, not the class expression? Put differently, some distinction seems to have been drawn between ‘meta’ (second order?), ‘extra’ to indicate the assumptions built into the algorithms/procedures, and ‘other, regular’ like satisfiability checking that we have for all logical theories. Another argument raised was that the ‘meta’ stuff has to do with second order logics, for which there are no good (read: sound and complete) reasoners.

3. Essentially, everything is logical, and services like OntoClean, SubProS, ProChainS can be represented formally with some clearly, precisely, formally, defined inferencing rules, so then there is no ontological reasoning, but there are only logical reasoning services.

This argument made me think of the “logic is everywhere” mug I still have (a goodie from the ICCL 2005 summer school in Dresden). More seriously, though, this argument raises some old philosophical debates on whether everything can indeed be formalized, and it presupposes that any logic is fine and that computation doesn’t matter. Further, it conflates the distinction, if any, between plain logical entailment, the notion of undesirable deductions (e.g., that a CarChassis is-a Perdurant [some kind of a process]), and the modeling choices and preferences (recall the apple with a colour vs. green object that has an apple-shape). But maybe that conflation is fine and there is no real distinction (if so: why?).

In my paper [5] and in the two presentations of it, I had stressed that SubProS and ProChainS were ontological reasoning services, because before that, I had tried but failed to convince logicians of the Type-I position that there’s something useful to those compatibility services and that they ought to be computed (currently, they are mostly not computed by the standard reasoners). Type-II adherents were plentiful at EKAW’12 and some at the OCM workshop. I encountered the most vocal Type-III adherent (mathematician) at the OCM workshop. Then there were the indecisive ones and people who switched and/or became indecisive. At the moment of writing this, I still lean toward Type-II, but I’m open to better arguments.

References

[1] Keet, C.M., Artale, A.: Representing and reasoning over a taxonomy of part-whole relations. Applied Ontology, 2008, 3(1-2), 91–110.

[2] F. Baader, D. Calvanese, D. L. McGuinness, D. Nardi, and P. F. Patel-Schneider (Eds). The Description Logics Handbook. Cambridge University Press, 2009.

[3] Pascal Hitzler, Markus Kroetzsch, Sebastian Rudolph. Foundations of Semantic Web Technologies. Chapman & Hall/CRC, 2009.

[4] Guarino, N. and Welty, C. An Overview of OntoClean. In S. Staab, R. Studer (eds.), Handbook on Ontologies, Springer Verlag 2009, pp. 201-220.

[5] Keet, C.M. Detecting and Revising Flaws in OWL Object Property Expressions. Proc. of EKAW’12. Springer LNAI vol 7603, pp 252-266.

[6] Birte Glimm, Sebastian Rudolph, and Johanna Volker. Integrated metamodeling and diagnosis in OWL 2. In Peter F. Patel-Schneider, Yue Pan, Pascal Hitzler, Peter Mika, Lei Zhang, Jeff Z. Pan, Ian Horrocks, and Birte Glimm, editors, Proceedings of the 9th International Semantic Web Conference, volume 6496 of LNCS, pages 257-272. Springer, November 2010.

[7] Chris Welty. OntOWLclean: cleaning OWL ontologies with OWL. In B. Bennet and C. Fellbaum, editors, Proceedings of Formal Ontologies in Information Systems (FOIS’06), pages 347-359. IOS Press, 2006.

Lecture notes for the ontologies and knowledge bases course

The regular reader may recollect earlier posts about the ontology engineering courses I have taught at FUB, UH, UCI, Meraka, and UKZN. Each one had some sort of syllabus or series of blog posts with some introductory notes. I’ve put them together and extended them significantly now for the current installment of the Ontologies and Knowledge Bases Honours module (COMP718) at UKZN, and they are bound and printed into lecture notes for the enrolled students. These lecture notes are now online and I will add accompanying slides on the module’s webpage as we go along in the semester.

Given that the target audience is computer science students in their 4th year (honours), the notes are of an introductory nature. There are essentially three blocks: logic foundations, ontology engineering, and advanced topics. The logic foundations contain a recap of FOL, basics of Description Logics with ALC, all the DL-based OWL species, and some automated reasoning. The ontology engineering block covers top-down and bottom-up ontology development, and methods and methodologies, with top-down ontology development including mainly foundational ontologies and part-whole relations, and bottom-up the various approaches to extract knowledge from ‘legacy’ representations, such as from databases and thesauri. The advanced topics are balanced in two directions: one is toward ontology-based data access applications (i.e., an ontology-driven information system) and the other one has more theory with temporal ontologies.

Each chapter has a section with recommended/required reading and a set of exercises.

Unsurprisingly, the lecture notes have been written under time constraints and therefore the level of relative completeness of sections varies slightly. Suggestions and corrections are welcome!

Book chapter on conceptual data modelling for biological data

My invited book chapter, entitled “Ontology-driven formal conceptual data modeling for biological data analysis” [1], recently got accepted for publication in the Biological Knowledge Discovery Handbook: Preprocessing, Mining and Postprocessing of Biological Data, edited by Mourad Elloumi and Albert Y. Zomaya, and is scheduled for printing by Wiley early 2012.

All this started off with my BSc(Hons) in IT & Computing thesis back in 2003 and my first paper about the trials and tribulations of conceptual data modelling for bio-databases [2] (which is definitely not well-written, but has some valid points and has been cited a bit). In the meantime, much progress has been made on the topic, and I’ve learned, researched, and published a few things about it, too. So, what is the chapter about?

The main aspect is the ‘conceptual data modelling’ with EER, ORM, and UML Class Diagrams, i.e., concerning implementation-independent representations of the data to be managed for a specific application (hence, not ontologies for application-independence).

The adjective ‘formal’ is to point out that the conceptual modeling is not just about drawing boxes, roundtangles, and lines with some adornments, but there is a formal, logic-based, foundation. This is achieved with the formally defined CMcom conceptual data modeling language, which has the greatest common denominator between ORM, EER, and UML Class Diagrams. CMcom has, on the one hand, a mapping to the Description Logic language DLRifd and, on the other hand, mappings to the icons in the diagrammatic languages. The nice aspect of this is that, at least in theory and to some extent in practice as well, one can subject it to automated reasoning to check consistency of the classes, of the whole conceptual data model, and derive implicit constraints (an example) or use it in ontology-based data access (an example and some slides on ‘COMODA’ [COnceptual MOdel-based Data Access], tailored to ORM and the horizontal gene transfer database as example).
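To give a flavour of what ‘checking consistency of the classes’ means, here is a minimal sketch in Python. It is not CMcom, DLRifd, or any actual reasoner: it only handles is-a links and disjointness declarations, and the class names and the function names (ancestors, unsatisfiable_classes) are made up for illustration. A class is flagged as unsatisfiable when it inherits from two classes that the model declares disjoint.

```python
from itertools import combinations


def ancestors(cls, is_a):
    """Return cls together with every class reachable via is-a links."""
    seen, stack = set(), [cls]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(is_a.get(current, ()))
    return seen


def unsatisfiable_classes(is_a, disjoint):
    """Return the classes that inherit from two mutually disjoint classes."""
    disjoint_pairs = {frozenset(pair) for pair in disjoint}
    classes = set(is_a) | {p for parents in is_a.values() for p in parents}
    bad = set()
    for cls in classes:
        for a, b in combinations(ancestors(cls, is_a), 2):
            if frozenset((a, b)) in disjoint_pairs:
                bad.add(cls)
                break
    return bad


# Hypothetical toy model (an assumption for illustration, not from the chapter):
# Enzyme is modelled as a kind of Protein, and Protein is disjoint from RNA.
is_a = {
    "Enzyme": ["Protein"],
    "Ribozyme": ["RNA", "Enzyme"],
    "Protein": ["Molecule"],
    "RNA": ["Molecule"],
}
disjoint = [("Protein", "RNA")]

print(sorted(unsatisfiable_classes(is_a, disjoint)))  # ['Ribozyme']
```

In this toy model, Ribozyme cannot have any instances, which hints that the constraint ‘every Enzyme is a Protein’ is too strong. A real DL reasoner detects this kind of flaw, and many subtler ones involving cardinalities and relationships, over the full language.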

Then there is the ‘ontology-driven’ component: Ontology and ontologies can aid in conceptual data modeling by providing solutions to recurring modeling problems, an ontology can be used to generate several conceptual data models, and one can integrate (a section of) an ontology into a conceptual data model that is subsequently converted into data in database tables.

Last, but not least, it focuses on ‘biological data analysis’. A non-(biologist or bioinformatician) might be inclined to say that should not matter, but it does. Biological information is not as trivial as the typical database design toy examples like “Student is enrolled in Course”, but one has to dig deeper and figure out how to represent, e.g., catalysis, pathway information, the ecological niche. Moreover, it requires an answer to “which language features are ‘essential’ for the conceptual data modeling language?” and, if such a feature isn’t included yet, how do we get it in? Some of these important features are n-aries (n>2) and the temporal dimension. The paper includes a proposal for more precisely representing catalysis, informed by ontology (mainly thanks to making the distinction between the role and its bearer), and shows how certain temporal information can be captured, which is illustrated by enhancing the model for SARS viral infection, among other examples.

The paper is not online yet, but I did put together some slides for the presentation at MAIS’11 reported on earlier, which might serve as a sneak preview of the 25-page book chapter, or you can contact me for the CRC.

References

[1] Keet, C.M. Ontology-driven formal conceptual data modeling for biological data analysis. In: Biological Knowledge Discovery Handbook: Preprocessing, Mining and Postprocessing of Biological Data. Mourad Elloumi and Albert Y. Zomaya (Eds.). Wiley (in print).

[2] Keet, C.M. Biological data and conceptual modelling methods. Journal of Conceptual Modeling, Issue 29, October 2003. http://www.inconcept.com/jcm.