An Ontology Engineering textbook

My first textbook, “An Introduction to Ontology Engineering” (pdf), has just been released as an open textbook. I have revised, updated, and extended my earlier lecture notes on ontology engineering, amounting to about one-third new content compared to its predecessor. Its main aim is to provide an introductory overview of ontology engineering; its secondary aim is to provide hands-on experience in ontology development that illustrates the theory.

The contents and narrative are aimed at the advanced undergraduate and postgraduate level in computing (e.g., as a semester-long course), and the book is structured accordingly. After an introductory chapter, there are three blocks:

  • Logic foundations for ontologies: languages (FOL, DLs, OWL species) and automated reasoning (principles and the basics of tableau);
  • Developing good ontologies with methods and methodologies, the top-down approach with foundational ontologies, and the bottom-up approach to extract as much useful content as possible from legacy material;
  • Advanced topics, with a selection of sub-topics: Ontology-Based Data Access, interactions between ontologies and natural languages, and advanced modelling with additional language features (fuzzy and temporal).

Each chapter has several review questions and exercises to explore one or more aspects of the theory, as well as descriptions of two assignments that require using several sub-topics at once. More information is available on the textbook’s page [also here] (including the links to the ontologies used in the exercises), or you can click here for the pdf (7MB).

Feedback is welcome, of course. Also, if you happen to use it in whole or in part for your course, I’d be grateful if you would let me know. Finally, if this textbook is used half (or even a quarter) as much as the 2009/2010 blog posts have been visited (around 10K unique visitors since posting them), that would mean there are a lot of people learning about ontology engineering, and I’ll have achieved more than I hoped for.

UPDATE: meanwhile, it has been added to several open (text)book repositories, such as OpenUCT and the Open Textbook Archive, and it was featured on unglue.it in the week of 13-8 (out of its 14K free ebooks).

Orchestrating 28 logical theories of mereo(topo)logy

Parts and wholes, again. This time it’s about the logic aspects of theories of parthood (cf. aligning different hierarchies of (part-whole) relations and making them compatible with foundational ontologies). I had intended to write this post before the Ninth International Conference on Knowledge Capture (K-CAP 2017), where the paper describing the new material was presented by my co-author, Oliver Kutz. Now, afterwards, I can add that “Orchestrating a Network of Mereo(topo)logical Theories” [1] even won the Best Paper Award. The novelties, in broad strokes, are that we figured out and structured a hitherto messy and confusing state of affairs, showed that one can do more than generally assumed, especially with a new logics orchestration framework, and proposed first steps toward conflict resolution to sort out the trade-offs between expressivity and logic limitations. Constructing a tweet-sized “tl;dr” version of the contents is not easy, and as I have as much space here on my blog as I like, it ended up being three paragraphs: scene-setting, solution, and a few examples to illustrate some of it.


Problems

As ontologists know, parthood is used widely in ontologies across most subject domains, such as biomedicine, geographic information systems, architecture, and so on. Ontology (the philosophical discipline) offers a parthood relation with a bunch of computationally unpleasant properties, structured in a plethora of mereological and mereotopological theories, such that it has become hard to see the forest for the trees. This is further complicated in practice because there are multiple logics of varying expressivity (supporting more or fewer language features), with the result that only certain fragments of the mereo(topo)logical theories can be represented. However, it is mostly unclear what can be used when; during the ontology authoring stage one may want to have all those features available so as to check correctness; and it is not easy to predict what will happen when one aligns ontologies with different fragments of mereo(topo)logy.


Solution

We solved these problems by specifying a structured network of theories formulated in multiple logics that are glued together by the various linking constructs of the Distributed Ontology, Model, and Specification Language (DOL). The ‘structured network of theories’ part concerns all the maximal expressible fragments of the KGEMT mereotopological theory and five of its best-recognised sub-theories (such as GEM and MT) in the seven Description Logics-based OWL species, first-order logic, and higher-order logic. The ‘glued together’ part refers to relating the resultant 28 theories within DOL (in Ontohub), which is a non-trivial (understatement, unfortunately) metalanguage that has the constructors for the glue, such as declaring that two theories/modules represented in different logics are to be merged, or extending a theory (ontology) with axioms that go beyond its language without messing up the original (expressivity-restricted) ontology, and more. Further, because the annoying thing about merging two ontologies/modules is that the merged ontology may be in a different language than the two originals, which is very hard to predict, we have a cute proof-of-concept tool that assists with steps toward resolving language feature conflicts by pinpointing profile violations.


Examples

The paper describes nine mechanisms with DOL and the mereotopological theories. Here I’ll start with a simple one: we have Minimal Topology (MT) partially represented in OWL 2 EL/QL in “theory8”, where the connection relation (C) is just reflexive (among other axioms; see the table in the paper for details). Now what if we add connection’s symmetry, which results in “theory4”? First, we do this without harming theory8, in DOL syntax (see also the ESSLLI’16 tutorial):

logic OWL2.QL
ontology theory4 =
theory8
then
ObjectProperty: C Characteristics: Symmetric %(t7)

What is the logic of theory4? Is it still in OWL, and if so, which species? The OWL Classifier shows the result:

[screenshot: OWL Classifier output for theory4]

Another case is that OWL does not let one define an object property: at best, one can add domain and range axioms and the occasional ‘characteristic’ (like the aforementioned symmetry), for allowing arbitrary full definitions pushes it out of the decidable fragment. One can add them, though, in a system that can handle first-order logic, such as the Heterogeneous Tool Set (Hets); for instance, where in OWL one can add “overlap” only as a primitive relation (a vocabulary element without definition), we can take such a theory and declare that definition:

logic CASL.FOL
ontology theory20 =
theory6_plus_antisym_and_WS
then %wdef
. forall x,y:Thing . O(x,y) <=> exists z:Thing (P(z,x) /\ P(z,y)) %(t21)
. forall x,y:Thing . EQ(x,y) <=> P(x,y) /\ P(y,x) %(t22)

As a last example, let me illustrate the notion of conflict resolution. Consider theory19—ground mereology, partially—which is within OWL 2 EL expressivity, and theory18—also ground mereology, partially—which is within OWL 2 DL expressivity. So they can’t be the same; the difference is that theory18 has parthood reflexive and transitive and proper parthood asymmetric and irreflexive, whereas theory19 has both parthood and proper parthood transitive. What happens if one aligns ontologies that contain these theories, say, O1 (with theory18) and O2 (with theory19)? The OWL Classifier provides easy pinpointing and tells you the profile: OWL 2 Full (or: first-order logic, or: beyond OWL 2 DL—top row) and why (bottom section):

Now, what can one do? The conflict resolution cannot be fully automated, because it depends on what the modeller wants or needs, but there’s enough data generated already and there are known trade-offs, so it is possible to describe the consequences (see also the sketch after this list):

  • Choose the O1 axioms (with irreflexivity and asymmetry on proper part of), which will make the ontology interoperable with other ontologies in OWL 2 DL, FOL or HOL.
  • Choose O2’s axioms (with transitivity on part of and proper part of), which will facilitate linking to ontologies in OWL 2 RL, 2 EL, 2 DL, FOL, and HOL.
  • Choose to keep both sets, which will result in an OWL 2 Full ontology that is undecidable, and which is then compatible only with FOL and HOL ontologies.
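
To make the clash concrete: a minimal sketch with the OWL API (assuming v4.x; the property IRI is invented for illustration, and this is not the paper’s tooling) that merges the proper-parthood axioms of the two theories and asks the OWL 2 DL profile checker where that lands:

import org.semanticweb.owlapi.apibinding.OWLManager;
import org.semanticweb.owlapi.model.*;
import org.semanticweb.owlapi.profiles.*;

public class MergeCheck {
    public static void main(String[] args) throws OWLOntologyCreationException {
        OWLOntologyManager man = OWLManager.createOWLOntologyManager();
        OWLDataFactory df = man.getOWLDataFactory();
        OWLOntology merged = man.createOntology();
        OWLObjectProperty ppo = df.getOWLObjectProperty(IRI.create("urn:ex#properPartOf"));
        // from theory18 (within OWL 2 DL): proper parthood asymmetric and irreflexive
        man.addAxiom(merged, df.getOWLAsymmetricObjectPropertyAxiom(ppo));
        man.addAxiom(merged, df.getOWLIrreflexiveObjectPropertyAxiom(ppo));
        // from theory19 (within OWL 2 EL): proper parthood transitive
        man.addAxiom(merged, df.getOWLTransitiveObjectPropertyAxiom(ppo));
        // transitivity makes the property non-simple, so the asymmetry and
        // irreflexivity axioms should now be flagged as OWL 2 DL violations
        OWLProfileReport report = new OWL2DLProfile().checkOntology(merged);
        System.out.println("In OWL 2 DL? " + report.isInProfile());
        report.getViolations().forEach(System.out::println);
    }
}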

As a serious final note: there’s still fun to be had on the logic side of things, with countermodels and sub-networks and such, and with refining the conflict resolution to assist ontology engineers better. (or: TBC)

As a less serious final note: the working title of early drafts of the paper was “DOLifying mereo(topo)logy”, but at some point we chickened out and let go of that frivolity.


References

[1] Keet, C.M., Kutz, O. Orchestrating a Network of Mereo(topo)logical Theories. Ninth International Conference on Knowledge Capture (K-CAP’17), Austin, Texas, USA, December 4-6, 2017. ACM Proceedings.

An exhaustive OWL species classifier

Students enrolled in my ontology engineering course have to do a “mini-project” on a particular topic, chosen from a list of topics such as ontology quality, verbalisations, or language features, which may be theoretical or software development-oriented. In terms of papers, the most impressive result was OntoPartS, which led to an ESWC 2012 paper with the two postgraduate students [1], but quite a few other useful results have come out of it over the past seven years that I have been teaching the course in one form or another. This year’s top project in terms of understanding the theory, creativity in doing something with it that hasn’t been done before, and working software using Semantic Web technologies was the “OWL Classifier” by Aashiq Parker, Brian Mc George, and Muhummad Patel.

The OWL Classifier classifies an OWL ontology into any of its ‘species’, which can be any of the 8 specified in the standards, i.e., the 3 OWL 1 ones and the 5 OWL 2 ones. It also gives information on the DL ‘alphabet soup’—which axioms use which language feature with which letter, with an explanation of the letters—and reports which axioms are the ones that violate a particular species. An example is shown in the following screenshot, with an exercise ontology on phone points:

[screenshot: phonePoints]

The students’ motivation to develop it was that they had to learn about DLs and the OWL species, but Protégé 4.x and 5.x don’t tell you the species, and the interfaces have only a basic, generic explanation for the DL expressivity. I concur. And it has gotten worse with Protégé 5.0: if an ontology is outside OWL 2 DL, it still shows the ‘old’ DL expressivity plus an easy-to-overlook tiny red triangle in the top-right corner once the reasoner is invoked (using HermiT 1.3.8), or a cryptic “internal reasoner error” message (Pellet), whereas with Protégé 4.x you at least got a pop-up box complaining about the ‘non-simple role…’ issues. Compare that with neat feedback like this:

[screenshot: t15and16]

It is also very ‘sensitive’—more so than Protégé alone. Any remote ontology imports have to be available at the location specified by the IRI. Violations due to wrong datatype usage are a known issue with the OWL Reasoner Evaluation set of ontologies, which we’ve bumped into with the TDD testing as well. The tool doesn’t accept the invalid ones (wrong datatypes—one can select any XML datatype in Protégé, but the OWL standard doesn’t support them all). In addition, a language such as OWL 2 QL has further restrictions on the permitted datatypes. (It is also not trivial to figure out manually whether some ontology is suitable for OBDA or not.) So I tried one from the Ontop website’s examples, presumably in OWL 2 QL:

[screenshot: fishdelish]

Strictly speaking, it isn’t in OWL 2 QL! The OWL 2 QL profile does have xsd:integer as a datatype [2], but not xsd:int, because, to quote the standard, “the intersection of the value spaces of any set of these datatypes [including xsd:integer but not xsd:int, mk] is either empty or infinite, which is necessary to obtain the desired computational properties”. [UPDATE 24-6, thanks to Martin Rezk:] The main toolset for OWL 2 QL, Ontop, actually does support xsd:int and a few other datatypes beyond the standard (e.g., also float and boolean). There is similar syntax fun to be had with the pizza ontology: the original one is indeed in OWL DL, but if you open the file in Protégé 5 and save it, it is no longer in OWL DL but in OWL 2 DL, for the save operation snuck in an owl#NamedIndividual. Click on the thumbnails below to see the before-and-after in the OWL Classifier. This is not an increase in expressiveness—both are in SHOIN—just syntax and tooling.

[screenshots: pizzaOld and pizzaP5]
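
Coming back to the datatype restriction in isolation: a small OWL API sketch (assuming v4.x; the data property is invented for illustration) that should reproduce the violation, since xsd:integer is in the OWL 2 QL datatype map but xsd:int is not:

import org.semanticweb.owlapi.apibinding.OWLManager;
import org.semanticweb.owlapi.model.*;
import org.semanticweb.owlapi.profiles.*;
import org.semanticweb.owlapi.vocab.OWL2Datatype;

public class QlDatatypeCheck {
    public static void main(String[] args) throws OWLOntologyCreationException {
        OWLOntologyManager man = OWLManager.createOWLOntologyManager();
        OWLDataFactory df = man.getOWLDataFactory();
        OWLOntology ont = man.createOntology();
        OWLDataProperty year = df.getOWLDataProperty(IRI.create("urn:ex#yearProduced"));
        // a range of xsd:int, which the OWL 2 QL datatype map does not include
        OWLDatatype xsdInt = df.getOWLDatatype(OWL2Datatype.XSD_INT.getIRI());
        man.addAxiom(ont, df.getOWLDataPropertyRangeAxiom(year, xsdInt));
        OWLProfileReport report = new OWL2QLProfile().checkOntology(ont);
        System.out.println("In OWL 2 QL? " + report.isInProfile()); // expected: false
        report.getViolations().forEach(System.out::println);
    }
}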

The OWL Classifier can thus classify both OWL 1 and OWL 2 ontologies, which it does through a careful orchestration of two OWL APIs: v1.4.3 was the last one to support OWL 1 species checking, whereas for the OWL 2 ontologies the latest version is used (v4.2.3). The jar file and the source code are freely available on GitHub for anyone to use and take further. Turning it into a Protégé plugin would very likely make at least next year’s ontology engineering students happy. Comments, questions, and suggestions are welcome!
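
For those who want to approximate part of what the tool does: the OWL 2 side of species checking boils down to running the OWL API’s profile checkers and collecting the violations. A minimal sketch (assuming OWL API 4.x; the OWL 1 species and the alphabet-soup reporting of the students’ tool are not covered here):

import org.semanticweb.owlapi.apibinding.OWLManager;
import org.semanticweb.owlapi.model.OWLOntology;
import org.semanticweb.owlapi.profiles.*;
import java.io.File;

public class SpeciesCheck {
    public static void main(String[] args) throws Exception {
        OWLOntology ont = OWLManager.createOWLOntologyManager()
                .loadOntologyFromOntologyDocument(new File(args[0]));
        OWLProfile[] profiles = { new OWL2Profile(), new OWL2DLProfile(),
                new OWL2ELProfile(), new OWL2QLProfile(), new OWL2RLProfile() };
        for (OWLProfile p : profiles) {
            OWLProfileReport report = p.checkOntology(ont);
            System.out.println(p.getName() + ": " + report.isInProfile());
            // the violating axioms give the axiom-level pinpointing
            report.getViolations().forEach(v -> System.out.println("  " + v));
        }
    }
}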


References

[1] Keet, C.M., Fernandez-Reyes, F.C., Morales-Gonzalez, A. Representing mereotopological relations in OWL ontologies with OntoPartS. 9th Extended Semantic Web Conference (ESWC’12), Simperl et al. (eds.), 27-31 May 2012, Heraklion, Crete, Greece. Springer, LNCS 7295, 240-254.

[2] Boris Motik, Bernardo Cuenca Grau, Ian Horrocks, Zhe Wu, Achille Fokoue, Carsten Lutz, eds. OWL 2 Web Ontology Language: Profiles. W3C Recommendation, 11 December 2012 (2nd ed.).

OBDA/I Example in the Digital Humanities: food in the Roman Empire

A new installment of the Ontology Engineering module is about to start for the computer science honours students who selected it, so, in preparation, I was looking around for new examples of what ontologies and Semantic Web technologies can do for you that are at least somewhat concrete. One of those examples has an accompanying paper that is about to be published (can it be more recent than that?), which is on the production and distribution of food in the Roman Empire [1]. Although perhaps not many people here in South Africa might care about what happened in the Mediterranean basin some 2000 years ago, it is a good showcase of what one perhaps also could do here with historical and archaeological information (e.g., an inter-university SA project on digital humanities started a few months ago, and several academics and students at UCT contribute to the Bleek and Lloyd Archive of |xam (San) cultural heritage, among others). And the paper is (relatively) very readable also for the non-expert.


So, what is it about? Food was stored in pots (more precisely: amphorae) that had engravings on them with text about who, what, where, etc., and a lot of that has been investigated, documented, and stored in multiple resources, such as databases. None of the resources covers all data points, but to advance research and understanding about it, and about food trading systems in general, the data has to be combined somehow and made easily accessible to the domain experts. That is, essentially, it is an instance of a data access and integration problem.

There are a couple of principal approaches to address that, usually done by an Extract-Transform-Load of each separate resource into one database or digital library, and then putting a web-based front-end on top of it. There are many shortcomings to that solution, such as having to repeat the ETL procedure upon updates in the source database, a single control point, and the typically canned (i.e., fixed) queries of the interface. A more recent approach, for which the technologies are finally maturing, is Ontology-Based Data Access (OBDA) and Ontology-Based Data Integration (OBDI). I say “finally” here, as I can still remember very well the predecessors we struggled with some 7-8 years ago [2,3] (informally here, here, and here), and “maturing”, as the software has become more stable, has more features, and some of the things we had to do manually back then have been automated. The general idea of OBDA/I applied to the Roman Empire food system is shown in the figure below.

OBDA in the EPnet system (Source: [1])

There are the data sources, which are federated (one ‘middle layer’, though still at the implementation level). The federated interface has mapping assertions to elements in the ontology. The user can then use the terms of the ontology (classes and their relations and attributes) to query the data, without having to know how the data is stored and without having to write page-long SQL queries. For instance, a query “retrieve inscriptions on amphorae found in the city of ‘Mainz’ containing the text ‘PNN’” would use just the terms in the ontology, say, Inscription, Amphora, City, found in, and inscribed on, plus any value constraint added (like the PNN), and the OBDA/I system takes care of the rest.

Interestingly, the authors of [1]—admittedly, three of them are former colleagues from Bolzano—used the same approach to setting up the ontology component as we did for [3]. While we will use the Protégé Ontology Development Environment in the OE module, it is not the best modelling tool for overcoming the knowledge acquisition bottleneck. The authors modelled together with the domain experts in the much more intuitive ORM language and the tool NORMA, and first represented whatever needed to be represented. This also included reuse of relevant related ontologies and non-ontology material, and modularising it for better knowledge management, thereby ameliorating cognitive overload. A subset of the resultant ontology was then translated into the Web Ontology Language OWL (more precisely: OWL 2 QL, a tractable profile of OWL 2 DL), which is what is actually used in the OBDA system. We did that manually back then; now this can be done automatically (yay!).

Skipping here over the OBDI part and considering it done, the main third step in setting up an OBDA system is to link the data to the elements in the ontology. This is done in the mapping layer, essentially with assertions of the form “TermInTheOntology <- SQLqueryOverTheSource”. Abstracting from the current syntax of the OBDA system and simplifying the query for readability (see the real one in the paper), an example would thus have the following make-up to retrieve all amphorae of the Dressel 1 type, named Dressel1Amphora in the ontology, from all the data sources in the system:

Dressel1Amphora <-
    SELECT ic.id
       FROM ic JOIN at ON at.carrier = ic.id
          WHERE at.type = 'DR1'

Or some such SQL query (typically larger than this one). This takes a bit of time to do, but it has to be done only once, for these mappings are stored in a separate mapping file.

The domain expert, then, when wanting to know about the Dressel 1 amphorae in the system, only has to ask ‘retrieve all Dressel1 amphorae’, rather than creating the SQL query, thus remaining oblivious to which tables and columns are involved in obtaining the answer, and to the fact that some data entry person at some point had mysteriously decided not to use ‘Dressel1’ but his own abbreviation ‘DR1’.

The actual ‘retrieve all Dressel1 amphorae’ is then a SPARQL query over the ontology, e.g.,

SELECT ?x WHERE {?x rdf:type :Dressel1Amphora.}

which is surely shorter and therefore easier for the domain expert to handle than the SQL one. The OBDA system (-ontop-) takes this query and reasons over the ontology to see whether the query can be answered directly without consulting the data, or else can be rewritten given the other knowledge in the ontology (it can; see example 5 in the paper). The outcome of that process then consults the relevant mappings, from which the whole SQL query is constructed; this is sent to the (federated) data source(s), which processes the query as any relational database management system does and returns the data to the user interface.


It is, perhaps, still unpleasant that domain experts have to put up with another query language, SPARQL, as the paper notes as well. Some efforts have gone into sorting out that ‘last mile’, such as using a (controlled) natural language to pose the query or reusing that original ORM diagram in some way, but more needs to be done. (We tried the latter in [3]; that proof-of-concept worked with a neutered version of ORM, and we have screenshots and videos to prove it, but while working on extensions and improvements, a new student uploaded buggy code onto the production server, so that online source doesn’t work anymore, and we didn’t roll back and reinstall an older version, with me having moved to South Africa and the original student-developer, Giorgio Stefanoni, away studying for his MSc.)


Note to OE students: This is by no means all there is to OBDA/I, but hopefully it has given you a bit of an idea. Read at least sections 1-3 of paper [1], and if you want to do an OBDA mini-project, then read also the rest of the paper and then Chapter 8 of the OE lecture notes, which discusses in a bit more detail the motivations for OBDA and the theory behind it.


References

[1] Calvanese, D., Liuzzo, P., Mosca, A., Remesal, J, Rezk, M., Rull, G. Ontology-Based Data Integration in EPNet: Production and Distribution of Food During the Roman Empire. Engineering Applications of Artificial Intelligence, 2016. To appear.

[2] Keet, C.M., Alberts, R., Gerber, A., Chimamiwa, G. Enhancing web portals with Ontology-Based Data Access: the case study of South Africa’s Accessibility Portal for people with disabilities. Fifth International Workshop OWL: Experiences and Directions (OWLED 2008), 26-27 Oct. 2008, Karlsruhe, Germany.

[3] Calvanese, D., Keet, C.M., Nutt, W., Rodriguez-Muro, M., Stefanoni, G. Web-based Graphical Querying of Databases through an Ontology: the WONDER System. ACM Symposium on Applied Computing (ACM SAC 2010), March 22-26 2010, Sierre, Switzerland. ACM Proceedings, pp1389-1396.

Updated ontology engineering lecture notes (2015)

It’s that time of the year again, in the southern hemisphere that is, when course preparations for the academic year are going full steam ahead. This year, too, I’ll be teaching a CS honours course on ontology engineering. To that end, the lecture notes have been updated, though not in a major way like last year. Some sections have been shuffled around, there are a few new exercises, Chris’s update suggestion from last year on the OBO-OWL mapping has been included, and a couple of typos and odd sentences have been fixed.

Practically, this installment will be a bit different from previous years, as it has an integrated small project on Semantic Wikis, funded by CILT and OpenUCT. Setting it up, maintaining it, and filling it with contents on ontology engineering topics will initially be done ‘in house’ by students enrolled in the course, and it will not be generally available on the Web, but if all goes well, it’ll be accessible to everyone some time in April this year, and possibly included in the OER Commons.

Semantic MediaWiki’s features are fairly basic, and there are a bunch of plugins and extensions I’ve seen listed, but I didn’t check whether they all work with the latest SMW. If you have a particular suggestion, please leave a comment or send me an email. One thing I’m still wondering about in particular, but haven’t found a solution to, is whether there’s a plugin that lets you see the (lightweight) ontology when adding content, so that it is easier to use terms from the ontology’s vocabulary in the text, rather than having to find and manually process whatever (near-)synonyms have been used throughout the pages (like one contributor using ‘upper ontology’, another ‘foundational ontology’, and a third ‘top-level ontology’), and that allows on-the-fly extensions of that ontology.

FAIR’14 and ‘modelling relationships’ tutorial

After a weekend of ‘loadshedding’ (one of those South African euphemisms), I’m posting a few notes on the Forum on Artificial Intelligence Research 2014 (FAIR’14), which took place from 3-5 Dec 2014 at Stellenbosch University, was organised by CAIR, and was co-located with the FASTAR/Espresso Workshop 2014, which, in turn, was co-located with PRASA, AFLaT, and RobMech 2014 in Cape Town. FAIR’14 consisted of a presentation by Sergei Obiedkov of the Higher School of Economics, Russia, a tutorial on modelling relationships in ontologies by me, and a course on computational social choice theory by Ulle Endriss of the ILLC, University of Amsterdam, The Netherlands.

While not quite relevant to my current research, except for the judgement aggregation at the end (for crowdsourcing), Ulle’s course was one of those events that made me think “[why didn’t/if only] I was exposed to this material before?!”, back when I had to make choices as to what to study and specialise in (though, admittedly, once I knew about the maths with game theory and applied that to peace negotiations in my MA (pdf), I still went on in CS with KR&R and ontologies). Ulle’s course combined socially relevant topics, such as the fair allocation of resources and voting systems, with solid, precise, logic- and maths-based representations and computation. Besides the engaging content, he’s also good at teaching it. The content and slides are a condensed version of his MSc course on social choice theory and are available online here, which also has links to related reading material.

I tried to condense some aspects of modelling relationships in ontologies into two hours. The tutorial started with some problems and questions, proceeded to touch upon the nature of relations and some details of the formal semantics, then common relationships (with some detail about mereotopology), and closed with some practical modelling guidance and reasoner performance when modelling things one way or another. It being a tutorial, and not all participants having Protégé installed, I resorted to a peer instruction audience response system to incorporate some questions about modelling relationships interactively. The slides are available online (though here, too, the text on the slides only partially reflects what I talked about).

Other than that, there’s always the social component. Despite the weird time-warp that Stellenbosch town constitutes, it was really nice to catch up with former colleagues, to see the progress of UKZN’s postgrads, to hear about the future of CAIR, and to find that it’s a small world, even when meeting people new to me. And the food & wine were delicious. The train travel back to Cape Town took a bit longer than the schedule said it would, but I recommend it nevertheless.

Results of the OWL feature popularity contest at OWLED 2014

One of the events on the OWLED 2014 programme–co-located with ISWC2014–was the OWL feature popularity contest, with the dual purpose of getting a feel for possible improvements to the OWL 2 standard and generating lively discussions (though the latter happened throughout the workshop already anyway). The PC co-chair, Valentina Tamma, and I had collected some questions ourselves, we had solicited suggestions for questions from the participants beforehand, and we used a ‘software-based clicker’ (audience response system) during the session so that participants could vote and see the results instantly. The remainder of this post contains the questions and the results. We left the questions open, so you can still vote by going to govote.at, filling in the number shown in the bottom-left of the screenshots, and trying to skew the outcome your way (voting is anonymous). I’ll check the results again in two weeks…

1. The first question referred back to discussions from around 2007 during the standardization process of OWL 2: several rather distinct features were discussed for OWL 2 that didn’t make it into the standard; do you (still) want any or all of them, if you ever did?

  • n-ary object properties, with n>2
  • constraints among different data properties, be this of the same object or different objects
  • unique name assumption
  • all of them!
  • I don’t really miss any of them

The results, below, show some preference for constraints among data properties, and, overall, a mild preference for having at least some of them rather than none.

Voting results of question 1

2. Is there any common pattern for which you would propose syntactic sugar?

  • Strict partial ordering
  • Disjoint transitive roles
  • Witnessed universal/closure: adding an existential to a universal restriction (Carral et al., OWLED’14)
  • Witnessed universal/closure: adding a universal to an existential restriction (raised in the bio-ontologies literature; see the sketch after this list)
  • Specific patterns; e.g., episodes
  • Nothing really
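
For readers who don’t know the pattern by name: the witnessed universal (or closure) combines a universal restriction with an existential one over the same property and filler, so that the universal is not satisfied merely vacuously. A minimal OWL API fragment (assuming v4.x; the pizza-style names are invented for illustration, and this is not anyone’s proposed syntactic sugar):

import org.semanticweb.owlapi.apibinding.OWLManager;
import org.semanticweb.owlapi.model.*;

public class WitnessedUniversal {
    public static void main(String[] args) {
        OWLDataFactory df = OWLManager.createOWLOntologyManager().getOWLDataFactory();
        OWLObjectProperty hasTopping = df.getOWLObjectProperty(IRI.create("urn:ex#hasTopping"));
        OWLClass topping = df.getOWLClass(IRI.create("urn:ex#Topping"));
        // 'all fillers are Topping, and there is at least one filler':
        // the existential witnesses the universal
        OWLClassExpression witnessed = df.getOWLObjectIntersectionOf(
                df.getOWLObjectAllValuesFrom(hasTopping, topping),
                df.getOWLObjectSomeValuesFrom(hasTopping, topping));
        System.out.println(witnessed);
    }
}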

The results, below, are a bit divided. Carral et al.’s paper, presented the day before, seems to have done some good convincing, given the three votes, and the strict partial ordering, i.e., a pattern for parthood, also received some votes, but about half of the respondents weren’t particularly interested in such things.

Voting results of question 2

3. Ignoring practicalities of (in)feasibility, which of the following sets of features would you most like to see OWL extended with?

  • Temporal
  • Fuzzy and Probabilistic
  • Rough sets
  • I’m not interested in any of these extensions

The results show that a temporal extension is the clear winner, which, practically, is unfortunately not going to be easy, because even minor temporal extensions cause dramatic jumps in complexity. Other suggestions for extensions made during the discussion were more on data properties (again) and a way to deal with measurement units.

Voting results of question 3

4. Which paradigm do you prefer in order to model / modify your ontologies in an ODE?

  • Controlled natural language
  • Diagram-based tool
  • Online collaborative tool
  • Dedicated ontology editor
  • Text editor
  • No preference
  • It depends on the task

The results are again in the figure below. The interesting aspect is, perhaps, that no one had no preference, and no one preferred a diagram-based tool. Mostly, it depends on the task, followed by some tool that caters for collaborative ontology development.

Voting results of question 4

5. There are four standardised optional syntaxes in OWL 2. If, due to time/resource constraints, tool compatibilities, etc., not all optional syntaxes could be accommodated in an “OWL 3.0”, which could be discontinued, according to you, if any?

  • OWL/XML
  • Functional style
  • Turtle
  • Manchester
  • They all should stay

The latter option, that they all should stay, was selected most among the participants, though not by a majority of voters, and I’m sure it would have ended up differently with more participants (based on discussions afterwards). Note: from this point on, the votes were shown ‘live’ as the responses came in, cf. the earlier hide-and-show.

Voting results of question 5

6. Turning the question phrasing around: which feature do you like less?

  • Property chains
  • Key
  • Transitivity
  • The restrictions limiting the interactions between the different property characteristics (thus preventing certain patterns)
  • They are all useful to a greater or lesser extent

Options B and D generated a lively debate, but the results show clearly that the participants who voted wanted to keep them all.

Voting results of question 6

7. Which of the following OP (object property) characteristics do you consider most important when developing an ontology?

  • reflexivity
  • irreflexivity
  • symmetry
  • asymmetry
  • antisymmetry
  • transitivity
  • acyclicity

This last question appeared to be a no-brainer among the choices, with transitivity the unanimous winner. It was raised whether functionality ought to have been included, which we intentionally had not done, for it is a different kind of constraint (cardinality/multiplicity) than the properties of properties. The results most likely would have looked quite different if we had.

Voting results of question 7

The results were supposed to be on the OWLED community page, but I have it from a reliable source (the general chair of OWLED’14, Bijan Parsia) that the software doesn’t seem to be very friendly and feature-rich, hence a quick post here. You can read Bijan’s live blogging of the presentations at OWLED there as well. The proceedings of the workshop are online as CEUR-WS vol. 1265.

Considering some stuff—scientifically

Yay, now I can say “I look into stuff” and actually be precise about what I have been working on (and get it published, too!), rather than just oversimplifying into vagaries about some of my research topics. The final title of the paper I settled on is not as funny as proposing a ‘pointless theory’ [1], though: it’s a Core Ontology of Macroscopic Stuff [2], which has been accepted at the 19th International Conference on Knowledge Engineering and Knowledge Management (EKAW’14).

The ‘stuff’, in philosophical terms, is those things that are typically indicated in natural language with mass nouns, being things you can’t count other than in quantities, like gold, water, whipping cream, agar, milk, and so on. The motivation to look into it was both practical and theoretical. For instance, suppose you work in the food industry and thus have to be concerned with traceability of ingredients, so you will have to know which (bulk) ingredients originate from where. Then, if something goes wrong, say, an E. coli infection in a product for consumption, it would be doable to find the source of the microbial contamination. Most people might not realise what happens in the production process; e.g., some quantity of milk comes from a dairy farm, and in the food processing plant, some components of a portion of the milk are separated into parts (whey separated from the cheese-in-the-making, fat for butter, and the remainder buttermilk). To talk about parts and portions of such stuffs requires one to know about those stuffs and how to model them, so there can be a computerised tracking system for swift responses.

On the theoretical side, philosophers were talking about hypothetical cases of sending molecules of mixtures to Venus and the Moon, which isn’t practically usable, in particular because it glossed over some important details, such as that milk is an emulsion and thus has a ‘minimum portion’, involving many molecules, for it to remain an emulsion. Foundational ontologies, which I like for their modelling guidance, didn’t come to the rescue either; e.g., DOLCE has Amount of Matter for stuffs but stops there, and BFO has none of it. Domain ontologies for food, but also in other areas such as ecology and biomedicine, each have their own way of modelling stuff, be this by source, usage, or whatever, making things incompatible because several criteria are used. So there was quite a gap. The core ontology of macroscopic stuff aims to bridge this gap.

This stuff ontology contains categories of stuff and is formalised in OWL. There are distinctions between pure stuff and mixtures, and differences among the mixtures, e.g., true solutions vs. colloids among the homogeneous mixtures, and solid heterogeneous mixtures vs. suspensions among the heterogeneous mixtures, each with a set of defining criteria. So, Milk is an Emulsion by its very essence, regardless of whether you want to assign it the role of beverage (EnvO ontology) or of animal-associated habitat (MEO ontology); Blood is a Sol (a type of colloid), and (table) Sugar a StructuredPureStuff. A basic alignment of the relations involved (granules, grains, and sub-stuffs, used in Cyc and BioTop, among others) is possible with the stuff ontology as well.

The ontology both refines the DOLCE and BFO foundational ontologies and resolves the main types of interoperability issues with stuffs in domain ontologies, thereby also contributing to better ontology quality. To make the ontology usable, modelling guidelines are provided, with examples of inferences, a decision diagram, the outline of a template, and illustrations of solving the principal interoperability issues among domain ontologies (scroll down to the last part of the paper). The decision diagram, which also gives an informal idea of what’s in the stuff ontology, is depicted below.

Decision diagram to select the principal kind of stuff (Source: [2])

You can access the stuff ontology on its own, as well as versions linked to DOLCE and BFO. I’ll be presenting it at EKAW in Sweden in late November.

p.s.: come to think of it, maybe I should have called it smugly “a real ontology of substance”… (substance being another term used for stuff/matter)

References

[1] Borgo S., Guarino N., and Masolo C.. A Pointless Theory of Space Based On Strong Connection and Congruence, in L. Carlucci Aiello, J. Doyle (eds.), in Proceedings of the Fifth International Conference on Principles of Knowledge Representation and Reasoning (KR’96), Morgan Kaufmann, Cambridge Massachusetts (USA), 5-8 November 1996, pp. 220-229.

[2] Keet, C.M. A Core Ontology of Macroscopic Stuff. 19th International Conference on Knowledge Engineering and Knowledge Management (EKAW’14). 24-28 Nov, 2014, Linkoping, Sweden. Springer LNAI. (accepted)

Dabbling into evaluating reasoners with the DMOP ontology

The Data Mining OPtimization ontology (DMOP) is a highly axiomatised ontology that uses almost all features of OWL 2 DL, and its domain entities are linked to DOLCE, using all four main ‘branches’ of DOLCE. Some details are described in last year’s OWLED’13 paper [1] and a blog post. We did observe ‘slow’ reasoner performance when classifying the ontology, however: between 10 and 20 minutes, varying across versions and machines. The Ontology Reasoner Evaluation workshop (ORE’14, part of the Vienna Summer of Logic) was a nice motivation to have a go at figuring out what’s going on, and some initial results are described briefly in the short, six-page paper [2], which is co-authored with Claudia d’Amato, Agnieszka Lawrynowicz, and Zubeida Khan.

Those results are definitely what can be called interesting, even though we’re still at the level of dabbling into it from a reasoner user-centric viewpoint and, notably, from a modeller-centric viewpoint. The latter is what made us pose questions like “what effect does using feature x have on the performance of the reasoner?”. No one knew, except for the informal feedback I received at DL 2010 on [3] that reasoning with datatypes slows things down, and likewise when the cardinalities are high. That’s not an issue with DMOP, though.

So, the first thing we did was determine a baseline on a good laptop—your average modeller doesn’t have an HPC cluster readily at hand—and in an Ontology Development Environment, from which the reasoner is typically accessed: some 9 minutes to classify the ontology (machine specs and further details in the paper).

The next steps were analysing one specific modelling construct (inverses) and determining what effect DOLCE has on the overall performance.

The reason we chose the representation of inverses is that in OWL 2 DL (cf. OWL DL), one can use ObjectInverseOf(OP) to use the inverse of an object property, instead of extending the ontology’s vocabulary and using InverseObjectProperties(OPE1 OPE2) to relate the property to its inverse. For instance, to use the inverse of the property addresses in an axiom, one used to have to introduce a new property, addressed by, declare it inverse to addresses, and then use that in the axiom, whereas in OWL 2 DL, one can use ObjectInverseOf(addresses) in the axiom directly (in Protégé, the syntax is inverse(addresses)). That change slashed the time to compute the class hierarchy by over a third (and by about half for the baseline). Why? We don’t know. Other features used in DMOP, such as punning and property chains, were harder to remove and are heavily used, so we didn’t test those.
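
In OWL API terms, the two representations look roughly as follows (a sketch, assuming OWL API 4.x; the IRIs are invented around the addresses example):

import org.semanticweb.owlapi.apibinding.OWLManager;
import org.semanticweb.owlapi.model.*;

public class Inverses {
    public static void main(String[] args) {
        OWLDataFactory df = OWLManager.createOWLOntologyManager().getOWLDataFactory();
        OWLObjectProperty addresses = df.getOWLObjectProperty(IRI.create("urn:ex#addresses"));
        // OWL 1 style: introduce a named inverse and relate the two properties
        OWLObjectProperty addressedBy = df.getOWLObjectProperty(IRI.create("urn:ex#addressedBy"));
        OWLAxiom named = df.getOWLInverseObjectPropertiesAxiom(addresses, addressedBy);
        // OWL 2 DL style: use the anonymous inverse directly in axioms,
        // without extending the vocabulary
        OWLObjectPropertyExpression anon = df.getOWLObjectInverseOf(addresses);
        System.out.println(named + "\n" + anon);
    }
}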

The other one, removing DOLCE, is a bit tricky. But to give away the end result upfront: that made it 10 times faster! The ‘tricky’ part has to do with the notion of ‘linking to a foundational ontology’ (deserving of its own blog post). For DMOP, we had not imported but merged, and we did not merge everything from DOLCE and its ExtendedDnS, but only what was deemed relevant: in numbers, 43 classes, 78 object properties, and 593 axioms. To make matters worse—from an evaluation viewpoint, that is—we had reused three DOLCE object properties heavily, so we kept those three properties in the evaluation file, as we suspected that removing them would have affected the deductions too much and interfered with the DOLCE-or-not question (one could also argue that those three properties can be considered an integral part of DMOP). So, it was not a simple case of ‘remove the import statement and run the reasoner again’, but of ‘remove almost everything with a DOLCE URI manually and then run the reasoner again’.

Because computation was so ‘slow’, we wondered whether cleverly modularising DMOP could be the way to go, in case someone wants to use only a part of it. We got as far as trying to modularise the ontology, which already was not trivial, because DMOP and DOLCE are both highly axiomatised and have few, if any, relatively isolated sections amenable to modularisation. Moreover, what it did show is that such automated modularisation (when it was possible) only affects the number of classes and the number of axioms, not the properties and individuals. So the generated modules are stuck with properties and individuals that are not used in, or not relevant for, that module. We did not fix that manually. Also, putting the modules back together did not return the original version we started out with: 225 of the 4584 axioms were missing.

If this wasn’t enough already, the DMOP with/without DOLCE test was performed with several reasoners, out of curiosity, and they gave different output. FaCT++ and MORe produced a “Reasoner Died” message. My ontology engineering students know that, according to DOLCE, death is an achievement, but I guess those reasoners’ developers would deem otherwise. Pellet and TrOWL inferred inconsistent classes; HermiT did not. Pellet’s hiccup had to do with datatypes and should not have occurred (see the paper for details). TrOWL fished a modelling issue out of all those 4584 axioms (see p5 of the paper), of the flavour described in [4] (thank you), but with the standard semantics of OWL—i.e., not caring at all about the real semantics of object property hierarchies—it should not have derived an inconsistent class.

Overall, it feels like we have opened up a can of worms, which is exciting.

References

[1] Keet, C.M., Lawrynowicz, A., d’Amato, C., Hilario, M. Modeling issues and choices in the Data Mining OPtimisation Ontology. 8th Workshop on OWL: Experiences and Directions (OWLED’13), 26-27 May 2013, Montpellier, France. CEUR-WS vol 1080.

[2] Keet, C.M., d’Amato, C., Khan, Z.C., Lawrynowicz, A. Exploring Reasoning with the DMOP Ontology. 3rd Workshop on Ontology Reasoner Evaluation (ORE’14). July 13, 2014, Vienna, Austria. CEUR-WS vol (accepted).

[3] Keet, C.M. On the feasibility of Description Logic knowledge bases with rough concepts and vague instances. 23rd International Workshop on Description Logics (DL’10), 4-7 May 2010, Waterloo, Canada.

[4] Keet, C. M. (2012). Detecting and revising flaws in OWL object property expressions. In Proc. of EKAW’12, volume 7603 of LNAI, pages 252–266. Springer.

Ontology Engineering lecture notes for 2014 online

The lecture notes for the Ontology Engineering BSc honours in CS course are now available online. The file is updated compared to the COMP720 module (and those notes have been removed). The main changes consist of reordering the chapters in Block II and Block III, adding better or more explanations and examples in several sections, fixing typos, and updates to reflect advances made in the field. It again includes the DL primer written by Markus Kroetzsch, Ian Horrocks, and Frantisek Simancik (saving me the time of writing about that; thanks!).

As with the last three installments, the target audience is computer science students in their 4th year (honours), so the notes are of an introductory nature. After the introduction, there are three blocks: logic foundations, ontology engineering, and advanced topics (the latter we will skip, as this is a shorter course). The logic foundations block contains a recap of FOL and the notion of reasoning, the DL primer, the basics of automated reasoning with Description Logics (focusing on ALC), the DL-based OWL species, and some practical automated reasoning. The ontology engineering block starts with methods and methodologies that give guidance on how to commence actually developing an ontology, and how to avoid and fix issues. Subsequently, there are two chapters going into some detail on two ‘paths’ of the methodology: top-down ontology development using foundational ontologies, and bottom-up ontology development to extract knowledge from other material, such as relational databases, thesauri, and natural language documents.

The advanced topics are optional this year, but I left them in the lecture notes, as they may pique your interest. Chapter 8, on Ontology-Based Data Access, covers a particular application scenario in which ontologies ‘spice up’ database applications. Chapter 9 touches upon a few sub-areas within ontologies: representing and reasoning with vagueness and uncertainty, extending the language to include temporal knowledge, the use of ontologies to enhance conceptual data models, and a note on social aspects.

It is still an evolving document, and relative completeness of sections varies slightly, so it has to be seen in conjunction with the slides, lectures, and some additional documentation that will be made available on the course’s Vula site.

Suggestions and corrections are welcome! If you want to use a part of it in your own lectures and/or use the accompanying slides with it, please contact me.