An Ontology Engineering textbook

My first textbook “An Introduction to Ontology Engineering” (pdf) has just been released as an open textbook. I have revised, updated, and extended my earlier lecture notes on ontology engineering, amounting to about 1/3 more new content cf. its predecessor. Its main aim is to provide an introductory overview of ontology engineering and its secondary aim is to provide hands-on experience in ontology development that illustrates the theory.

The contents and narrative are aimed at advanced undergraduate and postgraduate level in computing (e.g., as a semester-long course), and the book is structured accordingly. After an introductory chapter, there are three blocks:

  • Logic foundations for ontologies: languages (FOL, DLs, OWL species) and automated reasoning (principles and the basics of tableau);
  • Developing good ontologies with methods and methodologies, the top-down approach with foundational ontologies, and the bottom-up approach to extract as much useful content as possible from legacy material;
  • Advanced topics, with a selection of sub-topics: Ontology-Based Data Access, interactions between ontologies and natural languages, and advanced modelling with additional language features (fuzzy and temporal).

Each chapter has several review questions and exercises to explore one or more aspects of the theory, as well as descriptions of two assignments that require using several sub-topics at once. More information is available on the textbook’s page [also here] (including the links to the ontologies used in the exercises), or you can click here for the pdf (7MB).

Feedback is welcome, of course. Also, if you happen to use it in whole or in part for your course, I’d be grateful if you would let me know. Finally, if this textbook is used half (or even a quarter) as much as the 2009/2010 blogposts have been visited (around 10K unique visitors since posting them), that would mean there are a lot of people learning about ontology engineering and then I’ll have achieved more than I hoped for.

UPDATE: meanwhile, it has been added to several open (text)book repositories, such as OpenUCT and the Open Textbook Archive, and it has been featured on unglue.it in the week of 13-8 (out of its 14K free ebooks).

First tractable encoding of ORM conceptual data models

For (relatively) many years I’ve been focusing on as-expressive-as-possible languages to represent information and knowledge, including the computationally impractical full first order logic, because one would/should want to be as precise as possible, and as required, in representing the subject domain in an ontology and the universe of discourse for the application in a conceptual data model. After all, one can always throw out the computationally unpleasant constructs later during the implementation stage, if the ontology or conceptual data model is intended for use at runtime, such as in OBDA [1], test data generation for verification [3], and the query compilation stage in RDBMSs [2]. The resulting slimmed theories/models may be different for different applications, but then at least the set of slimmed theories/models share their common understanding.

So, now I have ventured into that area, not because there’s some logic x that some conceptual modeling language y has to be forced into, but because it actually appears that many fancy constructs/features are not used in publicly available conceptual data models anyway (see data set and xls with some analysis). The timing of the outcome of the analysis of the data set coincided with David Toman’s visit to UCT as part of his sabbatical and with Pablo Fillottrani’s visit, who enjoyed the last exchange of our bi-lateral project on the unification of conceptual data modelling languages (project page). To sum up the issue we were looking at: the need for run-time usage of conceptual data models requires a tractable logic-based reconstruction of the conceptual models (i.e., in at most PTIME), and such reconstructions appeared to hardly exist, or to miss constructs important for conceptual models (regardless of whether that was ORM, EER or UML Class Diagrams), or both.

The solution ended up being a logic-based reconstruction for most of ORM2 using the \mathcal{CFDI}_{nc}^{\forall -} Description Logic, which also happens to be the first tractable encoding of (most of) ORM/ORM2. With this logic, several features important for conceptual models (i.e., that occur relatively often) do have their proper encoding in the logic, notably n-aries, complex identification constraints, and n-ary role subsumption. The, admittedly quite tedious, mapping captures over 96% of the constructs used in practice in the set of 33 ORM diagrams we analysed (see data set). Further, the results are easily transferable to EER and UML Class diagrams, with an even greater coverage. The results (and comparison with related works) are presented in our recently accepted paper [4] at the 28th International Workshop on Description Logics (DL’15) that will take place from 7 to 11 June in Athens, Greece.

Low resolution and small version of our DL15 poster summarising the contributions.

The list of accepted papers of DL’15 is available, listing 21 papers with long presentations, 16 papers with short presentations, and 26 papers with poster presentations. David will present our results in the poster session, as it’s probably of more relevance in the conceptual modelling community (and I’ll be marking exams then), and some other accepted papers cover more new ground, such as casting schema.org as a description logic, temporal query answering in EL, exact learning of ontologies, and more. The proceedings will be online on CEUR-WS in the upcoming days as volume 1350. I’ve added a mini version of our poster on the right. I tried tikzposter, as its posters look really cool, but it doesn’t support figures (other than those made in latex), so I resorted to ppt (which doesn’t support math), wondering why these issues haven’t been solved by now.

Anyway, more about this topic is in the pipeline, and I hope to be able to give updates on it soon.

 

References

[1] Calvanese, D., Keet, C.M., Nutt, W., Rodriguez-Muro, M., Stefanoni, G. Web-based Graphical Querying of Databases through an Ontology: the WONDER System. ACM Symposium on Applied Computing (ACM SAC’10), March 22-26 2010, Sierre, Switzerland. pp 1389-1396.

[2] Toman, D., Weddell, G.E.: Fundamentals of Physical Design and Query Compilation. Synthesis Lectures on Data Management, Morgan & Claypool Publishers (2011)

[3] Smaragdakis, Y., Csallner, C., Subramanian, R.: Scalable satisfiability checking and test data generation from modeling diagrams. Automated Software Engineering 16, 73–99 (2009)

[4] Fillottrani, P.R., Keet, C.M., Toman, D. Polynomial encoding of ORM conceptual models in \mathcal{CFDI}_{nc}^{\forall -} . 28th International Workshop on Description Logics (DL’15). CEUR-WS vol xx., 7-10 June 2015, Athens, Greece.

Forum for AI Research 2015, Cape Town

In 10 days’ time, the (CAIR-driven) Forum for Artificial Intelligence Research 2015 (FAIR’15) Workshop will be held at UCT in Cape Town, South Africa, from March 30 to April 2. There are still some spaces available; registration is free, but please register (for catering purposes). What will you get for this ‘bargain price’? A lot of food for the mind!

FAIR’15 follows the same format as the previous 7 editions that went under various acronyms since 2008 (among others, MOWS, MOSS, MAIS, FAIR), with a mini-course, a tutorial, and postgraduate student presentations. This edition has the following on offer.

Ulrike Sattler (University of Manchester, UK) will present a mini-course on automated reasoners in the mornings. She will go into the details of what really happens when you click that menu option “start reasoner” and Protégé’s “?” that explains the deductions, and which factors influence the reasoner’s performance.

David Toman (University of Waterloo, Canada) will present a 2-hour tutorial on using knowledge representation and reasoning (logic) for query optimization in relational databases and ontology-based data access (i.e., advanced aspects of database systems implementation).

Further, there are several sessions with postgraduate student presentations. Among others, Catherine Chavula will talk about new results (cf. [1]) in multilingual ontologies, Zubeida Khan will talk about foundational ontology interchangeability (details in [2]), and Nasubo Ongoma (who very recently graduated with an MSc cum laude!) will present her thesis on logic-based temporal conceptual data modeling (including material from [3]). Gavin Rens will talk about probabilistic belief change, Kody Moodley on defeasible reasoning for description logics, Henriette Harmse about scenario testing with OWL, and Nishal Morar on taxonomic classification.

Aurona Gerber will give an overview of Data Science at CSIR, and for some more variety in the programme, I’ll talk about the stuff ontology [4]. Check the programme for all titles of the presentations and the abstracts of the mini-course and tutorial.

An important aim of FAIR is networking among people in Southern Africa, and sharing and informally discussing our research in (predominantly) KR&R and related areas. So, if the above topics sound interesting, or made you curious, or you would like to meet a potential MSc/PhD supervisor, you’re welcome to join (note: some basic knowledge of logics will be needed to understand the talks, though). If you have any questions, please don’t hesitate to contact one of the organisers, Arina Britz and me.

References

[1] Chavula, C., Keet, C.M. Is Lemon Sufficient for Building Multilingual Ontologies for Bantu Languages? 11th OWL: Experiences and Directions Workshop (OWLED’14). Keet, C.M., Tamma, V. (Eds.). Riva del Garda, Italy, Oct 17-18, 2014. CEUR-WS vol. 1265, 61-72.

[2] Khan, Z.C., Keet, C.M. Feasibility of automated foundational ontology interchangeability. 19th International Conference on Knowledge Engineering and Knowledge Management (EKAW’14). K. Janowicz et al. (Eds.). 24-28 Nov, 2014, Linköping, Sweden. Springer LNAI vol. 8876, 225-237.

[3] Keet, C.M., Ongoma, E.A.N. Temporal Attributes: their Status and Subsumption. Asia-Pacific Conference on Conceptual Modelling (APCCM’15). Koehler, H., Saeki, M. (Eds.), Conferences in Research and Practice in Information Technology (CRPIT), Vol. 165. 27-30 January, 2015, Sydney, Australia.

[4] Keet, C.M. A core ontology of macroscopic stuff. 19th International Conference on Knowledge Engineering and Knowledge Management (EKAW’14). K. Janowicz et al. (Eds.). 24-28 Nov, 2014, Linköping, Sweden. Springer LNAI vol. 8876, 209-224.

Ontology Engineering lecture notes for 2014 online

The lecture notes for the Ontology Engineering BSc honours in CS course are available online now. The file is updated compared to the COMP720 module (and those notes have been removed). The main changes consist of reordering the chapters in Block II and Block III, adding better or more explanations and examples in several sections, fixing typos, and updates to reflect advances made in the field. It again includes the DL primer written by Markus Kroetzsch, Ian Horrocks and Frantisek Simancik (saving me the time writing about that; thanks!).

As with the last three installments, the target audience is computer science students in their 4th year (honours), so the notes are of an introductory nature. They have three blocks after the introduction: logic foundations, ontology engineering, and advanced topics (the latter we will skip, as this is a shorter course). The logic foundations contain a recap of FOL and the notion of reasoning, the DL primer and the basics of automated reasoning with Description Logics using ALC, the DL-based OWL species, and some practical automated reasoning. The ontology engineering block starts with methods and methodologies that give guidance on how to commence actually developing an ontology, and how to avoid and fix issues. Subsequently, there are two chapters going into some detail of two ‘paths’ in the methodology, being top-down ontology development using foundational ontologies, and bottom-up ontology development to extract knowledge from other material, such as relational databases, thesauri, and natural language documents.

The advanced topics are optional this year, but I left them in the lecture notes, as they may pique your interest. Chapter 8 on Ontology-Based Data Access is a particular application scenario of ontologies that ‘spice up’ database applications. Chapter 9 touches upon a few sub-areas within ontologies: representing and reasoning with vagueness and uncertainty, extending the language to include also temporal knowledge, the use of ontologies to enhance conceptual data models, and a note on social aspects.

It is still an evolving document, and relative completeness of sections varies slightly, so it has to be seen in conjunction with the slides, lectures, and some additional documentation that will be made available on the course’s Vula site.

Suggestions and corrections are welcome! If you want to use a part of it in your own lectures and/or use the accompanying slides with it, please contact me.

Book chapter on conceptual data modeling for biology published

Just a quick note that my book chapter on “Ontology-driven formal conceptual data modeling for biological data analysis” has finally been published in the Biological Knowledge Discovery Handbook: Preprocessing, Mining and Postprocessing of Biological Data (edited by Mourad Elloumi and Albert Y. Zomaya). A summary of the chapter’s contents is given in an earlier blog post from a little over two years ago, and I’ve put the preprint online.

The whole book is an impressive 1192 pages consisting of 48 chapters of about 25 pages each, which are grouped into three main sections. The first section, Biological data pre-processing, has four parts: biological data management, biological data modeling (which includes my chapter), biological feature extraction, and biological feature selection. The second section, biological data mining, has six parts: Regression Analysis of Biological Data, Biological Data Clustering, Biological Data Classification, Association Rules Learning from Biological Data, Text Mining and Application to Biological Data, and High-Performance Computing for Biological Data Mining. The third section, biological data post-processing, has only one part: biological knowledge integration and visualization (check the detailed table of contents). Happy reading!

Ontologies and Knowledge bases lecture notes for 2013

The lecture notes for the ontologies and knowledge bases module (COMP720) for semester 2 in 2013 are available online now. I’ve updated them compared to last year’s installment (mentioned here): in addition to the regular changes, like updates to reflect the advances made in the past year in ontology engineering, better explanations in several sections, and more examples, it includes the DL primer by Markus Kroetzsch, Ian Horrocks and Frantisek Simancik (saving me the time writing about that; thanks!), more exercises, and answers to selected exercises.

As last year, the target audience is computer science students in their 4th year (honours), so the notes are of an introductory nature. They have three blocks: logic foundations, ontology engineering, and advanced topics. The logic foundations contain a recap of FOL, the DL primer and the basics of automated reasoning with Description Logics using ALC, the DL-based OWL species, and some practical automated reasoning. The ontology engineering block starts with top-down ontology development using foundational ontologies, then bottom-up ontology development to extract knowledge from ‘legacy’ representations, and finally (perhaps too briefly), methods and methodologies. The advanced topics are balanced in two directions, where the first one certainly will be covered and the second one if time permits: ontology-based data access applications (i.e., an ontology-driven information system) and temporal ontologies.

It is essentially still an evolving document, and relative completeness of sections varies slightly. Suggestions and corrections are welcome! If you want to use a part of it in your own lectures and/or use the accompanying slides with it, please contact me.

Logical and ontological reasoning services?

The SubProS and ProChainS compatibility services for OWL ontologies, which check for good and ‘safe’ OWL object property expressions [5], may be considered ontological reasoning services by some, but according to others, they are/ought to be plain logical reasoning services. I discussed this issue with Alessandro Artale back in 2007 when we came up with the RBox Compatibility service [1] (which, in the end, we called an ontological reasoning service), and it came up again during EKAW’12 and the Ontologies and Conceptual Modelling Workshop (OCM) in Pretoria in November. Moreover, in all three settings, the conversation was generalized to the following questions:

  1. Is there a difference between a logical and an ontological reasoning service (be that ‘onto’-logical or ‘extra’-logical)? If so,
    1. Why, and what, then, is an ontological reasoning service?
    2. Are there any that can serve at least as prototypical example of an ontological reasoning service?

There’s still no conclusive answer to any of these questions. So, I present here some data and arguments I had and that I’ve heard so far, and I invite you to have your say on the matter. I will first introduce a few notions, terms, tools, and implicit assumptions informally, then list the three positions and their arguments I am aware of.

Some aspects about standard, non-standard, and ontological reasoning services

Let me first introduce a few ideas informally. Within Description Logics and the Semantic Web, a distinction is made between so-called ‘standard’ and ‘non-standard’ reasoning services. The standard reasoning services, which most of the DL-based reasoners support, are subsumption reasoning, satisfiability, consistency of the knowledge base, instance checking, and instance retrieval (see, e.g., [2,3] for explanations). Non-standard reasoning services include, e.g., glass-box reasoning and computing the least common subsumer; they are typically designed with the aim to facilitate ontology development, and tend to have their own plugin or extension to an existing reasoner. What these standard and non-standard reasoning services have in common is that they all focus only on the logical theory (in a subset of first order predicate logic).
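To make three of the standard services a bit more concrete, here is a minimal sketch in Python of subsumption checking, instance checking, and instance retrieval. It is only a toy: a real DL reasoner derives subsumptions from the axioms, whereas this sketch assumes an already asserted (‘told’) taxonomy and merely takes its transitive closure. All class names, individuals, and function names are made up for illustration.

```python
# Toy illustration only: the taxonomy is asserted, not derived from axioms.
SUBCLASS_OF = {          # hypothetical class names: class -> direct superclasses
    "Apple": {"Fruit"},
    "Fruit": {"PlantPart"},
    "PlantPart": set(),
}
INSTANCE_OF = {"apple1": "Apple", "leaf1": "PlantPart"}  # hypothetical individuals


def superclasses(cls):
    """Return all classes that subsume cls, including cls itself."""
    seen, todo = {cls}, [cls]
    while todo:
        for parent in SUBCLASS_OF.get(todo.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                todo.append(parent)
    return seen


def is_subsumed_by(sub, sup):
    """Subsumption: is every instance of sub also an instance of sup?"""
    return sup in superclasses(sub)


def instance_check(individual, cls):
    """Instance checking: does the individual belong to cls?"""
    return is_subsumed_by(INSTANCE_OF[individual], cls)


def retrieve_instances(cls):
    """Instance retrieval: all known individuals that belong to cls."""
    return sorted(i for i in INSTANCE_OF if instance_check(i, cls))
```

For instance, `is_subsumed_by("Apple", "PlantPart")` holds through the intermediate class `Fruit`, while `retrieve_instances("Fruit")` returns only the individuals asserted under `Fruit` or one of its subclasses.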

Take, on the other hand, OntoClean [4], which assigns meta-properties (such as rigidity and unity) to classes, and then, according to some rules involving those meta-properties, computes the class taxonomy. Those meta-properties are borrowed from Ontology in philosophy and the rules do not use the standard way of computing subsumption (where every instance of the subclass is also an instance of its superclass and, thus, practically, the subclass has more features or has the same features but with more constrained values/ranges). Moreover, OntoClean helps to distinguish between alternative logical formalisations of some piece of knowledge so as to choose the one that is better with respect to the reality we want to represent; e.g., why it is better to have the class Apple that has as quality a color green, versus the option of a class GreenObject that has shape apple-shaped. This being the case, OntoClean may be considered an ontological reasoning service. My SubProS and ProChainS [5] put constraints on OWL object property expressions so as to have safe and good hierarchies of object properties and property chains, based on the same notion of class subsumption, but then applied to role inclusion axioms: the OWL object sub-property (relationship, DL role) must be more constrained than its super-property and the two reasoning services check if that holds. But some of the flawed object property expressions do not cause a logical inconsistency (merely an undesirable deduction), so one might argue that the compatibility services are ontological.
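The idea that a sub-property must be more constrained than its super-property can be sketched as follows. This is not the SubProS algorithm from [5]; it is a minimal sketch under the assumption that ‘more constrained’ is read as: the domain and range of the sub-property are each subsumed by (or equal to) those of the super-property. The class names, property names, and functions are hypothetical.

```python
# Hypothetical toy ontology: class -> direct superclasses.
SUBCLASS_OF = {"Student": {"Person"}, "Person": set(), "Course": set(), "Car": set()}

# Hypothetical object properties: name -> (domain, range).
PROPERTIES = {
    "involvedIn": ("Person", "Course"),
    "enrolledIn": ("Student", "Course"),
    "drives": ("Person", "Car"),
}


def is_subclass_or_equal(sub, sup):
    """True if sup is sub itself or one of its (transitive) superclasses."""
    seen, todo = {sub}, [sub]
    while todo:
        for parent in SUBCLASS_OF.get(todo.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                todo.append(parent)
    return sup in seen


def check_subproperty(sub_prop, super_prop):
    """Return a list of problems with 'sub_prop subPropertyOf super_prop'.

    An empty list means the declared domain and range are compatible.
    """
    sub_dom, sub_ran = PROPERTIES[sub_prop]
    sup_dom, sup_ran = PROPERTIES[super_prop]
    problems = []
    if not is_subclass_or_equal(sub_dom, sup_dom):
        problems.append(f"domain {sub_dom} of {sub_prop} is not subsumed by {sup_dom}")
    if not is_subclass_or_equal(sub_ran, sup_ran):
        problems.append(f"range {sub_ran} of {sub_prop} is not subsumed by {sup_ran}")
    return problems
```

In this toy setting, declaring `enrolledIn` a sub-property of `involvedIn` passes the check, whereas declaring `drives` a sub-property of `involvedIn` is flagged on its range. Note that a standard reasoner would not necessarily report the latter as an inconsistency; it could instead deduce something undesirable about the classes involved.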

The arguments so far

The descriptions in the previous paragraph contain implicit assumptions about the logical vs ontological reasoning, which I will spell out here. They are a synthesis from mine as well as other people’s voiced opinions about it (the other people being, among others and in alphabetical order, Alessandro Artale, Arina Britz, Giovanni Casini, Enrico Franconi, Aldo Gangemi, Chiara Ghidini, Tommie Meyer, Valentina Presutti, and Michael Uschold). It goes without saying they are my renderings of the arguments, and sometimes I state the things a little more bluntly to make the point.

1. If it is not entailed by the (standard, DL/other logic) reasoning service, then it is something ontological.

Logic is not about the study of the truth, but about the relationship of the truth of one statement and that of another. Effectively, it doesn’t matter what terms you have in the theory’s vocabulary (be this simply A, B, C, etc. or an attempt to represent Apple, Banana, Citrus, etc. conformant to what those entities are in reality), as it uses truth assignments and the usual rules of inference. If you want some reasoning that helps making a distinction between a good and a bad formalisation of what you aim to represent (where both theories are consistent), then that’s not the logician’s business but instead is relegated to the domain of whatever it is that ontologists get excited about. A counter-argument raised to that was that the early logicians were, in fact, concerned with finding a way to formalize reality in the best way; hence, not only syntax and semantics of the logic language, but also the semantics/meaning of the subject domain. A practical counter-example is that both Glimm et al [6] and Welty [7] managed to ‘hack’ OntoClean into OWL and use standard DL reasoners for it to obtain the desired inferences, so, presumably, then even OntoClean cannot be considered an ontological reasoning service after all?
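The point that the vocabulary does not matter to the logic can be illustrated with a small propositional sketch: entailment is checked by enumerating truth assignments, and the outcome is the same whether the atoms are called A and B or Apple and Banana. This is a toy for propositional logic only (not a DL reasoner), and the helper names are made up for illustration.

```python
from itertools import product


def entails(premises, conclusion, atoms):
    """True if every truth assignment satisfying all premises satisfies the conclusion."""
    for values in product([False, True], repeat=len(atoms)):
        assignment = dict(zip(atoms, values))
        if all(p(assignment) for p in premises) and not conclusion(assignment):
            return False  # found a counter-model
    return True


def atom(name):
    """Formula that is true exactly when the named atom is true."""
    return lambda assignment: assignment[name]


def implies(a, b):
    """Formula for 'a implies b' over two atom names."""
    return lambda assignment: (not assignment[a]) or assignment[b]


def modus_ponens_holds(x, y):
    """Check that {x, x -> y} entails y, for whatever names x and y are."""
    return entails([atom(x), implies(x, y)], atom(y), [x, y])
```

Both `modus_ponens_holds("A", "B")` and `modus_ponens_holds("Apple", "Banana")` give the same answer, since only the truth assignments play a role; whether Apple is a sensible thing to put in the theory is not something this check can say anything about.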

2. Something ‘meta’ like OntoClean can/might be considered really ontological, but SubProS and ProChainS are ‘extra-logical’ and can be embedded like the extra-logical understanding of class subsumption, so they are logical reasoning services (for it is the analogue to class subsumption but then for role inclusion axioms).

This argument has to do with the notion of ‘standard way’ versus ‘alternative approach’ to compute something and the idea of having borrowed something from Ontology recently versus from mathematics and Aristotle somewhat longer ago. (Note: the notion of subsumption in computing was still discussed in the 1980s, when the debate got settled in what is now considered the established understanding of class subsumption.) We can simply apply the underlying principles for class-subclass to relationships (/object properties/roles). DL/OWL reasoners and the standard view assume that the role box/object property expressions are correct and merely use them to compute the class taxonomy. But why should I assume the role box is fine, even when I know this is not always the case? And why do I have to put up with a classification of some class elsewhere in the taxonomy (or an inconsistency) when the real mistake is in the role box, not the class expression? Put differently, some distinction seems to have been drawn between ‘meta’ (second order?), ‘extra’ to indicate the assumptions built into the algorithms/procedures, and ‘other, regular’ like satisfiability checking that we have for all logical theories. Another argument raised was that the ‘meta’ stuff has to do with second order logics, for which there are no good (read: sound and complete) reasoners.

3. Essentially, everything is logical, and services like OntoClean, SubProS, ProChainS can be represented formally with some clearly, precisely, formally, defined inferencing rules, so then there is no ontological reasoning, but there are only logical reasoning services.

This argument made me think of the “logic is everywhere” mug I still have (a goodie from the ICCL 2005 summer school in Dresden). More seriously, though, this argument raises some old philosophical debates about whether everything can indeed be formalized, and it presupposes that any logic is fine and that computation doesn’t matter. Further, it conflates the distinction, if any, between plain logical entailment, the notion of undesirable deductions (e.g., that a CarChassis is-a Perdurant [some kind of a process]), and the modeling choices and preferences (recall the apple with a colour vs. green object that has an apple-shape). But maybe that conflation is fine and there is no real distinction (if so: why?).

In my paper [5] and in the two presentations of it, I had stressed that SubProS and ProChainS were ontological reasoning services, because before that, I had tried but failed to convince logicians of the Type-I position that there’s something useful to those compatibility services and that they ought to be computed (currently, they are mostly not computed by the standard reasoners). Type-II adherents were plentiful at EKAW’12 and some at the OCM workshop. I encountered the most vocal Type-III adherent (mathematician) at the OCM workshop. Then there were the indecisive ones and people who switched and/or became indecisive. At the moment of writing this, I still lean toward Type-II, but I’m open to better arguments.

References

[1] Keet, C.M., Artale, A.: Representing and reasoning over a taxonomy of part-whole relations. Applied Ontology, 2008, 3(1-2), 91–110.

[2] F. Baader, D. Calvanese, D. L. McGuinness, D. Nardi, and P. F. Patel-Schneider (Eds). The Description Logics Handbook. Cambridge University Press, 2009.

[3] Pascal Hitzler, Markus Kroetzsch, Sebastian Rudolph. Foundations of Semantic Web Technologies. Chapman & Hall/CRC, 2009.

[4] Guarino, N. and Welty, C. An Overview of OntoClean. In S. Staab, R. Studer (eds.), Handbook on Ontologies, Springer Verlag 2009, pp. 201-220.

[5] Keet, C.M. Detecting and Revising Flaws in OWL Object Property Expressions. Proc. of EKAW’12. Springer LNAI vol 7603, pp. 252-266.

[6] Birte Glimm, Sebastian Rudolph, and Johanna Volker. Integrated metamodeling and diagnosis in OWL 2. In Peter F. Patel-Schneider, Yue Pan, Pascal Hitzler, Peter Mika, Lei Zhang, Jeff Z. Pan, Ian Horrocks, and Birte Glimm, editors, Proceedings of the 9th International Semantic Web Conference, volume 6496 of LNCS, pages 257-272. Springer, November 2010.

[7] Chris Welty. OntOWLclean: cleaning OWL ontologies with OWL. In B. Bennet and C. Fellbaum, editors, Proceedings of Formal Ontologies in Information Systems (FOIS’06), pages 347-359. IOS Press, 2006.