ChatGPT, deep learning and the like do not make ontologies (and the rest of AI) obsolete

Countless articles have announced the death of symbolic AI, which includes, among others, ontology engineering, in favour of data-driven AI with deep learning, even more loudly so since large language model-based apps like ChatGPT have captured the public’s attention and imagination. There are those who don’t even realise there is more to AI than deep learning with neural networks. But there is; have a look at the ACM Computing Classification or scroll down to the screenshots at the end of this post if you’re unaware of that. With all the hype and narrow focus, doom and gloom is being predicted with a new AI winter on the cards. But is it? It’s not like we all ditched mathematics at school when portable calculators became cheap gadgets, so why would AI now with machine and deep learning and Large Language Models (LLMs) and an app that attracts attention? Let me touch upon a few examples to illustrate that ontologies have not become obsolete, nor will they.

How exactly do you think data integration is done? Maybe ChatGPT can tell you what’s involved, superficially, but it won’t actually do it for you. Consider, for instance, a paper published earlier this month, on finding clusters of long Covid patient symptoms [Reese23], described in a press release:  they obtained data of 20,532 relevant patients from 38 (!!) data partners, where the authors mapped the clinical findings taken from the electronic health records “to computable terms contained in the Human Phenotype Ontology (HPO), a standard framework for describing human traits … This allowed the researchers to analyze the data across the entire cohort.” (italics are mine). Here’s an illustration of the idea:

Could reliable data integration possibly be done by LLMs? No, not even in the future. NLP with electronic health records is an option, true, but it won’t harmonise terminology for you, nor will it integrate different electronic health record systems.

LLMs aren’t good at playing with data in the myriad of ways where ontologies are used to power ‘intelligent’ applications. Data that’s generated in automation of scientific experiments, for instance, like that cell types in the brain need to be annotated and processed to try to find new cell types and then add annotations with those new types, which is used downstream in queries and further analysis [Tan23]. There is no new stuff in off-the-shelf LLMs, so they can’t help; ontologies can – and do. Ontologies are used and extended as needed to document the new ground truth, which won’t ever be replaced by LLMs, nor by the approximations that machine learning’s outputs are.

What about intelligent analysis of real-time data? Those LLMs won’t be of assistance there either. Take, e.g., energy-optimised building systems control: the system takes real-time data that is linked to an ontology and then it can automatically derive energy conservation measures for the building and its use [Pruvost22].

Much has been written on ChatGPT and education. It’s an application domain that permits for no mistakes on the teaching side of it and, in fact, demands for vetted quality. There are many tasks, from content presentation to assessment. ChatGPT can generate quiz questions, indeed, but only on general knowledge. It can generate a response as well, but whether that will be correct answer is another matter altogether. We also need other types of educational questions besides MCQs, in many disciplines, on specific texts and textbooks with its particular vocabulary, and have the answer computed for automated marking. Computing correct questions and answers can be done with ontologies and some basic automated reasoning services [Raboanary22]. One obtains precision with ontologies that cannot be had with probabilistic guessing. Or take the Foundational Model of Anatomy ontology as a concrete example, which is used to manage the topics in anatomy classes augmented with VR [Soergel22]. Ontologies can also be used as a method of teaching, in art history no less, to push students to dig into the details and be precise [Bertens22] – the opposite of bland, handwaivy, roughly, sort of, non-committal, and fickle responses ChatGPT provides, at times, to open questions.

They’re just a few application examples that I lazily came across in the timespan of a mere 15 minutes (including selecting them) – one via the LinkedIn timeline, a GS search on “ontologies” with a “since 2022” (17300 results this morning) and clicking a few links that sounded appealing, and one I’m involved in.

This post is not a cry of desperation before sinking, but, rather, mainly one of annoyance. Technology blinkers of any kind are no good and one better has more than just a hammer in one’s toolbox. Not everything can be solved by LLMs and deep learning, and Knowledge Representation (& Reasoning) is not dead. It may have been elbowed to the side by the new kids on the block. I suspect that those in the ‘symbolic AI is obsolete’ camp simply aren’t aware – or would like to pretend not to be aware – of the many different AI-driven computing tasks that need to be solved and implemented. Tasks for which there are no humongous amounts of text or non-text data to grab and learn from. Tasks that are not tolerant to outputs that are noisy or plain wrong. Tasks that require current data, not stale stuff from over a year old and longer ago. Tasks where past data are not a good predictor for the future. Tasks in specialised domains. Tasks that are quirky to a locale. And so on. The NLP community already has recognised LLM’s outputs need fixing, which I was pleasantly surprised with when I attended EMNLP’22 in December (see my EMNLP22 trip report for a few pointers).

Also, and casting the net a little wider, our academic year is about to start, where students need to choose projects and courses, including, among others, another installment of ontology engineering, of logic for AI, Computer Vision, and so on. Perhaps this might assist in choosing and in reflecting that computing as a whole is not going to be obsolete either. ChatGPT and CodePilot can probably pass our 1st-year practical assignments, but there’s so much more computing beyond that, that relies on students understanding the foundations and problem-solving methods. Why should the whole rest of AI, and even computing as a discipline, become obsolete the instant a tool can, at best, regurgitate the known coding solutions to common basic tasks. There are still mathematicians notwithstanding all the devices more powerful than a pocket calculator and there are linguists regardless the free availability of Google Translate’s services; so why would software engineers not remain when there’s a code-completion tool for basic tasks.

Perhaps you still do not care about ontologies and knowledge representation & reasoning. That’s fine; everyone has their interests – just don’t confound new interests for obsolescence of established topics. In case you do want to know more about ontologies and ontology engineering: you may like to have a look at my award-winning open textbook, with exercises, tools, and slides.

p.s.: here are those screenshots on the ACM classification and AI, annotated:

References

[Bertens22] Bertens, L. M. F. Modeling the art historical canon. Arts and Humanities in Higher Education, 2022, 21(3), 240-262.

[Pruvost22] Pruvost, Hervé and Olaf Enge-Rosenblatt. Using Ontologies for Knowledge-Based Monitoring of Building Energy Systems. Computing in Civil Engineering 2021. American Society of Civil Engineers, 2022, pp762-770.

[Roboanary22] Raboanary, T., Wang, S., Keet, C.M. Generating Answerable Questions from Ontologies for Educational Exercises. 15th Metadata and Semantics Research Conference (MTSR’21). Garoufallou, E., Ovalle-Perandones, M-A., Vlachidis, A (Eds.). 2022, Springer CCIS vol. 1537, 28-40.

[Reese23] Reese, J. et al. Generalisable long COVID subtypes: findings from the NIH N3C and RECOVER programmes. eBioMedicine, Volume 87, 104413, January 2023.

[Soergel22] Soergel, Dagobert, Olivia Helfer, Steven Lewis, Matthew Wysocki, David Mawer. Using Virtual Reality & Ontologies to Teach System Structure & Function: The Case of Introduction to Anatomy. 12th International conference on the Future of Education 2022. 2022/07/01

[Tan23] Tan, S.Z.K., Kir, H., Aevermann, B.D. et al. Brain Data Standards – A method for building data-driven cell-type ontologies. Scientific Data, 2023, 10, 50.

An Ontology Engineering textbook

My first textbook “An Introduction to Ontology Engineering” (pdf) is just released as an open textbook. I have revised, updated, and extended my earlier lecture notes on ontology engineering, amounting to about 1/3 more new content cf. its predecessor. Its main aim is to provide an introductory overview of ontology engineering and its secondary aim is to provide hands-on experience in ontology development that illustrate the theory.

The contents and narrative is aimed at advanced undergraduate and postgraduate level in computing (e.g., as a semester-long course), and the book is structured accordingly. After an introductory chapter, there are three blocks:

• Logic foundations for ontologies: languages (FOL, DLs, OWL species) and automated reasoning (principles and the basics of tableau);
• Developing good ontologies with methods and methodologies, the top-down approach with foundational ontologies, and the bottom-up approach to extract as much useful content as possible from legacy material;
• Advanced topics that has a selection of sub-topics: Ontology-Based Data Access, interactions between ontologies and natural languages, and advanced modelling with additional language features (fuzzy and temporal).

Each chapter has several review questions and exercises to explore one or more aspects of the theory, as well as descriptions of two assignments that require using several sub-topics at once. More information is available on the textbook’s page [also here] (including the links to the ontologies used in the exercises), or you can click here for the pdf (7MB).

Feedback is welcome, of course. Also, if you happen to use it in whole or in part for your course, I’d be grateful if you would let me know. Finally, if this textbook will be used half (or even a quarter) as much as the 2009/2010 blogposts have been visited (around 10K unique visitors since posting them), that would mean there are a lot of people learning about ontology engineering and then I’ll have achieved more than I hoped for.

UPDATE: meanwhile, it has been added to several open (text)book repositories, such as OpenUCT and the Open Textbook Archive, and it has been featured on unglue.it in the week of 13-8 (out of its 14K free ebooks).

Automatically finding the feasible object property

Late last month I wrote about the updated taxonomy of part-whole relations and claimed it wasn’t such a big deal during the modeling process to have that many relations to choose from. Here I’ll back up that claim. Primarily, it is thanks to the ‘Foundational Ontology and Reasoner enhanced axiomatiZAtion’ (FORZA) approach which includes the Guided ENtity reuse and class Expression geneRATOR (GENERATOR) method that was implemented in the OntoPartS-2 tool [1]. The general idea of the GENERATOR method is depicted in the figure below, which outlines two scenarios: one in which the experts perform the authoring of their domain ontology with the help of a foundational ontology, and the other one without a foundational ontology.

I think the pictures are clearer than the following text, but some prefer text, so here goes the explanation attempt. Let’s start with scenario A on the left-hand side of the figure: a modeller has a domain ontology and a foundational ontology and she wants to relate class two domain classes (indicated with C and D) and thus needs to select some object property. The first step is, indeed, selecting C and D (e.g., Human and Heart in an anatomy ontology); this is step (1) in the Figure.

Then (step 2) there are those long red arrows, which indicate that somehow there has to be a way to deal with the alignment of Human and of Heart to the relevant categories in the foundational ontology. This ‘somehow’ can be either of the following three options: (i) the domain ontology was already aligned to the foundational ontology, so that step (2) is executed automatically in the background and the modeler need not to worry, (ii) she manually carries out the alignment (assuming she knows the foundational ontology well enough), or, more likely, (iii) she chooses to be guided by a decision diagram that is specific to the selected foundational ontology. In case of option (ii) or (iii), she can choose to save it permanently or just use it for the duration of the application of the method. Step (3) is an automated process that moves up in the taxonomy to find the possible object properties. Here is where an automated reasoner comes into the equation, which can step-wise retrieve the parent class, en passant relying on taxonomic classification that offers the most up-to-date class hierarchy (i.e., including implicit subsumptions) and therewith avoiding spurious candidates. From a modeller’s viewpoint, one thus only has to select which classes to relate, and, optionally, align the ontology, so that the software will do the rest, as each time it finds a domain and range axiom of a relationship in which the parents of C and D participate, it is marked as a candidate property to be used in the class expression. Finally, the candidate object properties are returned to the user (step 4).

While the figure shows only one foundational ontology, one equally well can use a separate relation ontology, like PW or PWMT, which is just an implementation variant of scenario A: the relation ontology is also traversed upwards and on each iteration, the base ontology class is matched against relational ontology to find relations where the (parent of the) class is defined in a domain and range axiom, also until the top is reached before returning candidate relations.

The second scenario with a domain ontology only is a simplified version of option A, where the alignment step is omitted. In Figure-B above, GENERATOR would return object properties W and R as options to choose from, which, when used, would not generate an inconsistency (in this part of the ontology, at least). Without this guidance, a modeler could, erroneously, select, say, object property S, which, if the branches are disjoint, would result in an inconsistency, and if not declared disjoint, move class C from the left-hand branch to the one in the middle, which may be an undesirable deduction.

For the Heart and Human example, these entities are, in DOLCE terminology, physical objects, so that it will return structural parthood or plain parthood, if the PW ontology is used as well. If, on the other hand, say, Vase and Clay would have been the classes selected from the domain ontology, then a constitution relation would be proposed (be this with DOLCE, PW, or, say, GFO), for Vase is a physical object and Clay an amount of matter. Or with Limpopo and South Africa, a tangential proper parthood would be proposed, because they are both geographic entities.

The approach without the reasoner and without the foundational ontology decision diagram was tested with users, and showed that such a tool (OntoPartS) made the ontology authoring more efficient and accurate [2], and that aligning to DOLCE was the main hurdle for not seeing even more impressive differences. This is addressed with OntoPartS-2, so it ought to work better. What still remains to be done, admittedly, is that larger usability study with the updated version OntoPartS-2. In the meantime: if you use it, please let us know your opinion.

References

[1] Keet, C.M., Khan, M.T., Ghidini, C. Ontology Authoring with FORZA. 22nd ACM International Conference on Information and Knowledge Management (CIKM’13). ACM proceedings, pp569-578. Oct. 27 – Nov. 1, 2013, San Francisco, USA.

[2] Keet, C.M., Fernandez-Reyes, F.C., Morales-Gonzalez, A. Representing mereotopological relations in OWL ontologies with OntoPartS. 9th Extended Semantic Web Conference (ESWC’12), Simperl et al. (eds.), 27-31 May 2012, Heraklion, Crete, Greece. Springer, LNCS 7295, 240-254.

The TDDonto tool to try out TDD for ontology authoring

Last month I wrote about Test-Driven Development for ontologies, which is described in more detail in the ESWC’16 paper I co-authored with Agnieszka Lawrynowicz [1]. That paper does not describe much about the actual tool implementing the tests, TDDonto, although we have it and used it for the performance evaluation. Some more detail on its design and more experimental results are described in the paper “The TDDonto Tool for Test-Driven Development of DL Knowledge Bases” [2] that has just been published in the proceedings of the 29th International Workshop on Description Logics, which will take place next weekend in Cape Town (22-25 April 2016).

What we couldn’t include there in [2] is multiple screenshots to show how it works, but a blog is a fine medium for that, so I’ll illustrate the tool with some examples in the remainder of the post. It’s an alpha version that works. No usability and HCI evaluations have been done, but at least it’s a Protégé plugin rather than command line :).

First, you need to download the plugin from Agnieszka’s ARISTOTELES project page and place the jar file in the plugins folder of Protégé 5.0. You can then go to the Protégé menu bar, select Windows – Views – Evaluation views – TDDOnto, and place it somewhere on the screen and start using it. For the examples here, I used the African Wildlife Ontology tutorial ontology (AWO v1) from my ontology engineering course.

Make sure to have selected an automated reasoner, and classify your ontology. Now, type a new test in the “New test” field at the top, e.g. carnivore DisjointWith: herbivore, click “Add test”, select the checkbox of the test to execute, and click the “Execute test”: the status will be returned, as shown in the screenshot below. In this case, the “OK” says that the disjointness is already asserted or entailed in the ontology.

Now let’s do a TDD test that is going to fail (you won’t know upfront, of course); e.g., testing whether impalas are herbivores:

The TDD test failed because the subsumption is neither asserted nor entailed in the ontology. One can then click “add to ontology”, which updates the ontology:

Note that the reasoner has to be run again after a change in the ontology.

Lets do two more: testing whether lion is a carnivore and that flower is a plan part. The output of the tests is as follows:

It returns “OK” for the lion, because it is entailed in the ontology: a carnivore is an entity that eats only animals or parts thereof, and lions eat only herbivore and eats some impala (which are animals). The other one, Flower SubClassOf: PlantParts fails as “undefined”, because Flower is not in the ontology.

Ontologies do not have only subsumption and disjointness axioms, so let’s assume that impalas eat leaves and we want check whether that is in the ontology, as well as whether lions eat animals:

The former failed because there are no properties for the impala in the AWO v1, the latter passed, because a lion eats impala, and impala is an animal. Or: the TDDOnto tool indeed behaves as expected.

Currently, only a subset of all the specified tests have been implemented, due to some limitations of existing tools, but we’re working on implementing those as well.

If you have any feedback on TDDOnto, please don’t hesitate to tell us. I hope to be seeing you later in the week at DL’16, where I’ll be presenting the paper on Sunday afternoon (24th) and I also can give a live demo any time during the workshop (or afterwards, if you stay for KR’16).

References

[1] Keet, C.M., Lawrynowicz, A. Test-Driven Development of Ontologies. 13th Extended Semantic Web Conference (ESWC’16). Springer LNCS. 29 May – 2 June, 2016, Crete, Greece. (in print)

[2] Lawrynowicz, A., Keet, C.M. The TDDonto Tool for Test-Driven Development of DL Knowledge bases. 29th International Workshop on Description Logics (DL’16). April 22-25, Cape Town, South Africa. CEUR WS vol. 1577.

Reblogging 2012: Fixing flaws in OWL object property expressions

From the “10 years of keetblog – reblogging: 2012”: There are several 2012 papers I (co-)authored that I like and would have liked to reblog—whatever their citation counts may be. Two are on theoretical, methodological, and tooling advances in ontology engineering using foundational ontologies in various ways, in collaboration with Francis Fernandez and Annette Morales following a teaching and research visit to Cuba (ESWC’12 paper on part-whole relations), and a dedicated Honours student who graduated cum laude, Zubeida Khan (EKAW’12 paper on foundational ontology selection). The other one, reblogged here, is of a more fundamental nature—principles of role [object property] hierarchies in ontologies—and ended up winning best paper award at EKAW’12; an extended version has been published in JoDS in 2014. I’m still looking for a student to make a proof-of-concept implementation (in short, thus far: when some are interested, there’s no money, and when there’s money, there’s no interest).

———–

OWL 2 DL is a very expressive language and, thanks to ontology developers’ persistent requests, has many features for declaring complex object property expressions: object sub-properties, (inverse) functional, disjointness, equivalence, cardinality, (ir)reflexivity, (a)symmetry, transitivity, and role chaining. A downside of this is that with the more one can do, the higher is the chance that flaws in the representation are introduced; hence, an unexpected or undesired classification or inconsistency may actually be due to a mistake in the object property box, not a class axiom. While there are nifty automated reasoners and explanation tools that help with the modeling exercise, the standard reasoning services for OWL ontologies assume that the axioms in the ‘object property box’ are correct and according to the ontologist’s intention. This may not be the case. Take, for instance, the following thee examples, where either the assertion is not according to the intention of the modeller, or the consequence may be undesirable.

• Domain and range flaws; asserting hasParent $\sqsubseteq$ hasMother instead of hasMother $\sqsubseteq$ hasParent in accordance with their domain and range restrictions (i.e., a subsetting mistake—a more detailed example can be found in [1]), or declaring a domain or a range to be an intersection of disjoint classes;
• Property characteristics flaws: e.g., the family-tree.owl (when accessed on 12-3-2012) has hasGrandFather $\sqsubseteq$ hasAncestor and Trans(hasAncestor) so that transitivity unintentionally is passed down the property hierarchy, yet hasGrandFather is really intransitive (but that cannot be asserted in OWL);
• Property chain issues; for instance the chain hasPart $\circ$ hasParticipant $\sqsubseteq$ hasParticipant in the pharmacogenomics ontology [2] that forces the classes in class expressions using these properties—in casu, DrugTreatment and DrugGeneInteraction—to be either processes due to the domain of the hasParticipant object property, or they will be inconsistent.

Unfortunately, reasoner output and explanation features in ontology development environments do not point to the actual modelling flaw in the object property box. This is due to that implemented justification and explanation algorithms [3, 4, 5] consider logical deductions only and that class axioms and assertions about instances take precedence over what ‘ought to be’ concerning object property axioms, so that only instances and classes can move about in the taxonomy. This makes sense from a logic viewpoint, but it is not enough from an ontology quality viewpoint, as an object property inclusion axiom—being the property hierarchies, domain and range axioms to type the property, a property’s characteristics (reflexivity etc.), and property chains—may well be wrong, and this should be found as such, and corrections proposed.

So, we have to look at what type of mistakes can be made in object property expressions, how one can get the modeller to choose the ontologically correct options in the object property box so as to achieve a better quality ontology and, in case of flaws, how to guide the modeller to the root defect from the modeller’s viewpoint, and propose corrections. That is: the need to recognise the flaw, explain it, and to suggest revisions.

To this end, two non-standard reasoning services were defined [6], which has been accepted recently at the 18th International Conference on Knowledge Engineering and Knowledge Management (EKAW’12): SubProS and ProChainS. The former is an extension to the RBox Compatibility Service for object subproperties by [1] so that it now also handles the object property characteristics in addition to the subsetting-way of asserting object sub-properties and covers the OWL 2 DL features as a minimum. For the latter, a new ontological reasoning service is defined, which checks whether the chain’s properties are compatible by assessing the domain and range axioms of the participating object properties. Both compatibility services exhaustively check all permutations and therewith pinpoint to the root cause of the problem (if any) in the object property box. In addition, if a test fails, one or more proposals are made how best to revise the identified flaw (depending on the flaw, it may include the option to ignore the warning and accept the deduction). Put differently: SubProS and ProChainS can be considered so-called ontological reasoning services, because the ontology does not necessarily contain logical errors in some of the flaws detected, and these two services thus fall in the category of tools that focus on both logic and additional ontology quality criteria, by aiming toward ontological correctness in addition to just a satisfiable logical theory. (on this topic, see also the works on anti-patterns [7] and OntoClean [8]). Hence, it is different from other works on explanation and pinpointing mistakes that concern logical consequences only [3,4,5], and SubProS and ProChainS also propose revisions for the flaws.

SubProS and ProChainS were evaluated (manually) with several ontologies, including BioTop and the DMOP, which demonstrate that the proposed ontological reasoning services indeed did isolate flaws and could propose useful corrections, which have been incorporated in the latest revisions of the ontologies.

Theoretical details, the definition of the two services, as well as detailed evaluation and explanation going through the steps can be found in the EKAW’12 paper [6], which I’ll present some time between 8 and 12 October in Galway, Ireland. The next phase is to implement an efficient algorithm and make a user-friendly GUI that assists with revising the flaws.

References

[1] Keet, C.M., Artale, A.: Representing and reasoning over a taxonomy of part-whole relations. Applied Ontology 3(1-2) (2008) 91–110

[2] Dumontier, M., Villanueva-Rosales, N.: Modeling life science knowledge with OWL 1.1. In: Fourth International Workshop OWL: Experiences and Directions 2008 (OWLED 2008 DC). (2008) Washington, DC (metro), 1-2 April 2008

[3] Horridge, M., Parsia, B., Sattler, U.: Laconic and precise justifications in OWL. In: Proceedings of the 7th International Semantic Web Conference (ISWC 2008). Volume 5318 of LNCS., Springer (2008)

[4] Parsia, B., Sirin, E., Kalyanpur, A.: Debugging OWL ontologies. In: Proceedings of the World Wide Web Conference (WWW 2005). (2005) May 10-14, 2005, Chiba, Japan.

[5] Kalyanpur, A., Parsia, B., Sirin, E., Grau, B.: Repairing unsatisfiable concepts in OWL ontologies. In: Proceedings of ESWC’06. Springer LNCS (2006)

[6] Keet, C.M. Detecting and Revising Flaws in OWL Object Property Expressions. 18th International Conference on Knowledge Engineering and Knowledge Management (EKAW’12), Oct 8-12, Galway, Ireland. Springer, LNAI, 15p. (in press)

[7] Roussey, C., Corcho, O., Vilches-Blazquez, L.: A catalogue of OWL ontology antipatterns. In: Proceedings of K-CAP’09. (2009) 205–206

[8] Guarino, N., Welty, C.: An overview of OntoClean. In Staab, S., Studer, R., eds.: Handbook on ontologies. Springer Verlag (2004) 151–159

Reblogging 2010: Rough ontologies from an ontology engineering perspective

From the “10 years of keetblog – reblogging: 2010”: two solid papers on the feasibility of rough ontologies, presented at DL’10 (logic stuff) and EKAW’10 (ontology engineering aspects). Short answer: not really feasible from a computational viewpoint. Notwithstanding, I did give it a try afterwards with ‘rOWL’ (SAICSIT’11 paper) before giving up on it, and other people do try, too (see citations of the DL and EKAW papers).

—-

Somewhere buried in the blogpost about the DL’10 workshop, I mentioned the topic of my paper [1] at the 23rd International Description Logics Workshop (DL’10), which concerned the feasibility of rough DL knowledge bases. That paper was focussed on the theoretical assessment (result: there are serious theoretical hurdles for rough DL KBs) and had a rather short section where experimental results were crammed into the odd page (result: one can squeeze at least something out of the extant languages and tools, but more should be possible in the near future). More recently, my paper [2] submitted to the 17th International Conference on Knowledge Engineering and Knowledge Management (EKAW’10) got accepted, which focuses on the ontology engineering side of rough ontologies and therefore has a lot more information on how one can squeeze something out of the extant languages and tools; if that is not enough, there is also supplementary material that people can play with.

Ideally, they ought to go together in on paper to get a good overview at once, but there are page limits for conference papers and anyhow the last word has not been said about rough ontologies. For what it is worth, I have put the two together in the slides for the weekly KRDB Lunch Seminar that I will present tomorrow at, well, lunch hour in the seminar room on the first floor of the POS building.

References

[1] Keet, C. M. On the feasibility of Description Logic knowledge bases with rough concepts and vague instances. Proc. of DL’10, 4-7 May 2010, Waterloo, Canada. pp314-324.

[2] Keet, C. M. Ontology engineering with rough concepts and instances17th International Conference on Knowledge Engineering and Knowledge Management (EKAW’10). 11-15 October 2010, Lisbon, Portugal.  Springer LNCS.

Reblogging 2006: Figuring out requirements for automated reasoning services for formal bio-ontologies

From the “10 years of keetblog – reblogging: 2006”: a preliminary post that led to the OWLED 2007 paper I co-authored with Marco Roos and Scott Marshall, when I was still predominantly into bio-ontologies and biological databases. The paper received quite a few citations, and a good ‘harvest’ from both OWLED’07 and co-located DL’07 participants on how those requirements may be met (described here). The original post: Figuring out requirements for automated reasoning services for formal bio-ontologies, from Dec 27, 2006.

What does the user want? There is a whole sub-discipline on requirements engineering, where researchers look into methodologies how one best can extract the users’ desires for a software application and organize the requirements according to type and priority. But what to do when users – in this case biologists and (mostly non-formal) bio-ontologies developers – neither do know clearly themselves what they want nor what type of automated reasoning is already ‘on offer’. Here, I’m making a start by briefly listing informally some of the desires & usages that I came across in the literature, picked up from conversations and further probing to disambiguate the (for a logician) vague descriptions, or bumped into myself; they are summarised at the end of this blog entry and update (d.d. 5-5-’07) described more comprehensively in [0].

Feel free to add your wishes & demands; it may even be fed back into current research like [1] or be supported already after all. (An alternative approach is describing ‘scenarios’ from which one can try to extract the required reasoning tasks; if you want to, you can add those as well.)

I. A fairly obvious use of automated reasoners such as Racer, Pellet and FaCT++ with ontologies is to let the software find errors (inconsistencies) in the representation of the knowledge or reality. This is particularly useful to ensure no ‘contradictory’ information remains in the ontology when an ontology gets too big for one person to comprehend and multiple people update an ontology. Also, it tends to facilitate learning how to formally represent something. Hence, the usage is to support the ontology development process.

But this is just the beginning: having a formal ontology gives you other nice options, or at least that is the dangling carrot on front of the developer’s nose.

II. One demonstration of the advantages of having a formal ontology, thus not merely a promise, is the classification of protein phosphatases by Wolstencroft et al. [9], where also some modest results were obtained in discovering novel information about those phosphatases that was entailed in the extant information but hitherto unknown. Bandini and Mosca [2] pushed a closely related idea one step further in another direction. To constrain the search space of candidate rubber molecules for tire production, they defined the constraints (properties) all types of molecules for tires must satisfy in the TBox, treated each candidate molecule as an instance in the ABox, and performed model checking on the knowledgebase: each instance inconsistent w.r.t. the TBox was thrown out of the pool of candidate-molecules. Overall, the formal representation with model checking achieved a considerable reduction in resource usage of the system and reduced the amount of costly wet-lab research. Hence, the usages are classification and model checking.[i]

III. Whereas the former includes usage of particular instances for the reasoning scenarios, another on is to stay at the type level and, in particular, relations between the types (classes in the class hierarchy in Protégé). In short, some users want to discover new or missing relations. What type of relation is not always exactly clear, but I assume for now that any non-isA relation would do. For instance, Roos et al. [8] would like to do that for the subject domain of histones; with or without instance-level data. The former, using instance-level data, resembles the reverse engineering option in VisioModeler, which takes a physical database schema and the data stored in the tables and computes the likely entities, relations, and constraints at the level of a conceptual model (in casu, ORM). Mungall [7] wants to “Check for inconsistent or missing relationships” and “Automatically assign relationships for new terms”. How can one find what is not there but ought to be in the ontology? An automated reasoner is not an oracle. I will split up this topic into two aspects. First, one can derive relations among types, meaning that some ontology developer has declared several types, relations, other properties, but not everything. The reasoner then, takes the declared knowledge and can return relations that are logically implied by the formal ontology. From a user perspective, such a derived relation may be perceived as a ‘new’ or ‘missing’ relation – but it did not fall out of the sky because the relation was already entailed in the ontology (or: you did not realize you knew it already). Second, another notion of ‘missing relations’: e.g. there are 17 types of macrophages (types of cell) in the FMA, which must be part of, contained in, or located in something. If you query the FMA through OQAFMA, it gives as answer that the hepatic macrophage is part of the liver [5]. An informed user knows it cannot be the case that the other macrophages are not part of anything. Then, the ontology developer may want to fill this gap – adding the ‘missing’ relations – by further developing those cell-level sections of the ontology. Note that the reasoner will not second-guess you by asking “do you want more things there?”; it uses the Open World Assumption, i.e. that there always may be more than actually represented on the ontology (and absence of some piece of information is not negation of that piece). Thus, the requirements are to have some way of dealing with gaps’ in an ontology, to support computing derived relations entailed in a logical theory, and, third, deriving type-level relations based on instance-level data. The second one is already supported, the first one only with intervention by an informed user, and the third one might, to some extent.

Now three shorter points, either because there is even less material or there is too much to stuff it in this blog entry.

IV. A ‘this would be nice’ suggestion from Christina Hettne, among others, concerns the desire to compare pathways, which, in its simplest form, amounts to checking for sub-graph isomorphisms. More generally, one could – or should be able to – treat an ontology as a scientific theory [6] and compare competing explanations of some natural phenomenon (provided both are represented formally). Thus, we have a requirement for comparison of two ontologies, not with the aim of doing meaning negotiation and/or merging them, but where the discrepancies themselves are the fun part. This indicates that dealing with ‘errors’ that a reasoner spits out could use an upgrade toward user-friendliness.

V. Reasoning with parthood and parthood-like relations in bio-ontologies are on a par with importance of the subsumption relation. Donnelly [3] and Keet [4], among many, would like to use parthood and parthood-like relations for reasoning, covering more than transitivity alone. Generalizing a bit, we have another requirement: reasoning with properties (relations) and hierarchies of relations, focusing first on the part-whole relation. What reasoning services are required exactly, be it for parthood or any other relation, deserves an entry on its own.

VI. And whatnot? For instance, linking up different ontologies that each reside at their own level of granularity, yet have enabled to perform ‘granular cross-ontology queries’, or infer locations of diseases based on combining an anatomy ontology with a disease taxonomy, hence, reasoning over linked ontologies. This needs to be written down in more detail, and may be covered at least partially with point two in item III.

Summarizing, we have to following requirements for automated reasoning services, in random order w.r.t. importance:

• Support in the ontology development process;
• Classification;
• Model checking;
• Finding ‘gaps’ in the content of an ontology;
• Computing derived relations at the type level;
• Deriving type-level relations from instance-level data;
• Comparison of two ontologies ([logical] theories);
• Reasoning with a plethora of parthood and parthood-like relations;
• Using (including finding inconsistencies in) a hierarchy of relations in conjunction with the class hierarchy;

I doubt this is an exhaustive list, and expect to add more requirements & desires soon. They also have to be specified more precisely than explained briefly above and the solutions to meet these requirements need to be elaborated upon as well.

[0] Keet, C.M., Roos, M., Marshall, M.S. A survey of requirements for automated reasoning services for bio-ontologies in OWL. Third international Workshop OWL: Experiences and Directions (OWLED 2007), 6-7 June 2007, Innsbruck, Austria. CEUR-WS.

[1] European FP6 FET Project “Thinking ONtologiES (TONES)”. (UDATE 29-7-2015: URL defunct by now)

[2] Bandini, S., Mosca, A. Mereological knowledge representation for the chemical formulation. 2nd Workshop on Formal Ontologies Meets Industry 2006 (FOMI2006), 14-15 December 2006, Trento, Italy. pp55-69.

[3] Donnelly, M., Bittner, T. and Rosse, C. A Formal Theory for Spatial Representation and Reasoning in Biomedical Ontologies. Artificial Intelligence in Medicine, 2006, 36(1):1-27.

[4]
Keet, C.M. Part-whole relations in Object-Role Models. 2nd International Workshop on Object-Role Modelling (ORM 2006), Montpellier, France, Nov 2-3, 2006. In: OTM Workshops 2006. Meersman, R., Tari, Z., Herrero., P. et al. (Eds.), LNCS 4278. Berlin: Springer-Verlag, 2006. pp1116-1127.

[5]
Keet, C.M. Granular information retrieval from the Gene Ontology and from the Foundational Model of Anatomy with OQAFMA. KRDB Research Centre Technical Report KRDB06-1, Free University of Bozen-Bolzano, 6 April 2006. 19p.

[6] Keet, C.M.
Factors affecting ontology development in ecology. Data Integration in the Life Sciences 2005 (DILS2005), Ludaescher, B, Raschid, L. (eds.). San Diego, USA, 20-22 July 2005. Lecture Notes in Bioinformatics 3615, Springer Verlag, 2005. pp46-62.

[7] Mungall, C.J. Obol: integrating language and meaning in bio-ontologies. Comparative and Functional Genomics, 2004, 5(6-7):509-520. (UPDATE: link rot as well; a freely accessible veriosn is available at: http://berkeleybop.org/~cjm/obol/doc/Mungall_CFG_2004.pdf)

[8] Roos, M., Rauwerda, H., Marshall, M.S., Post, L., Inda, M., Henkel, C., Breit, T. Towards a virtual laboratory for integrative bioinformatics research. CSBio Reader: Extended abstracts of “CS & IT with/for Biology” Seminar Series 2005. Free University of Bozen-Bolzano, 2005. pp18-25.

[9]
Wolstencroft, K., Lord, P., Tabernero, L., Brass, A., Stevens, R. Using ontology reasoning to classify protein phosphatases [abstract]. 8th Annual Bio-Ontologies Meeting; 2005 24 June; Detroit, United States of America.

[i] Observe another aspect regarding model checking where the automated reasoner checks if the theory is satisfiable, or: given your ontology, if there is/can be a combination of instances such that all the declared knowledge in the ontology holds (is true), which is called a model’ (as well, like so many things). That an ontology is satisfiable does not imply it only has models as intended by you, i.e. there is a difference between ‘all models’ and ‘all intended models’. If an ontology is satisfiable it means that it is logically possible that each type can be instantiated without running into inconsistencies; it neither demonstrates that one can indeed find in reality the real-world versions of those represented entities nor if there is one-and-only-one model that actually matches exactly the data you may want to have linked to & checked against the ontology.

From data on conceptual models to optimal (logic) language profiles

There are manifold logic-based reconstructions of the main conceptual data modelling languages in a ‘gazillion’ of logics. The reasons for pursuing this line of work are good. In case you wonder, consider:

• Automated reasoning over a conceptual data model to improve their quality and avoid bugs; e.g., an empty database table due to an inconsistency in the model (unsatisfiable class). Instead of costly debugging, one can catch it upfront.
• Designing and executing queries with the model’s vocabulary cf. putting up with how the data is stored with its typically cryptic table and column names.
• Test data generation in automation of software engineering.
• Use it as ‘background knowledge’ during the query compilation stage (which helps optimizing it, so better performance querying a database).

Most of the research efforts on formalizing the conceptual data modelling languages have gone to capturing as much as possible of the modelling language, and therewith aiming to solve the first use case scenario. Runtime usage of conceptual models, i.e., use case scenarios 2-4 above, is receiving some attention, but it brings with it its own set of problems: which trade-offs are the best? That is, we know we can’t have both the modelling languages in their full glory formalised on some arbitrary (EXPTIME or undecidable) logic and have scalable runtime performance. But which subset to choose? There are papers where (logician) authors state something like ‘you don’t need keys in ER, so we ignore those’ or ‘let’s skip ternaries, as most relationships are binary anyway’ or ‘we sweep those pesky aggregation associations under the carpet’ or ‘hierarchies, disjointness and completeness are certainly important’. Who’s right? Or is neither one of them right?

So, we had all that data of the 101 UML, ER, EER, ORM, and ORM2 models analysed (see previous post and [1]). With that, we could construct evidence-based profiles based on the features that are actually used by modellers, rather than constructing profiles based on gut feeling or on one’s pet logic. We specified a core profile and one for each family of the conceptual data modelling languages under consideration (UML Class Diagrams, ER/EER, and ORM/ORM2). The details of the outcome can be found in our recently accepted paper “Evidence-based Languages for Conceptual Data Modelling Profiles” [1] that has been accepted at the 19th Conference on Advances in Databases and Information Systems (ADBIS’15), that will take place from September 8-12 in Poitiers, France. As with the other recent posts on conceptual data models, also this paper was co-authored with Pablo Fillottrani and is an output of our DST/MINCyT-funded bi-lateral project on the unification of conceptual data modelling languages (project overview).

To jump to the short answer: the core profile can be represented in $\mathcal{ALNI}$ (called $\mathcal{PL}_1$ in [3], with PTIME subsumption), whereas the modelling language-specific profiles do not match any of the very many currently existing Description Logic languages with known computational complexity.

Now how we got into that situation. There are some formalization options to consider first, which can affect the complexity of the logic. Notably, 1) whether to use inverses or qualified number restrictions, and 2) whether to go for DL role components for UML’s association ends/ORM’s roles/ER’s relationship components with a 1:1 mapping, or to ignore that and formalise the associations/fact types/relationships only (and how to handle that choice then). Extending a logic language with inverses tends to be less costly computationally cf. qualified number restrictions, so we chose the former. The latter is more complicated to handle regardless the choice, which is partially due to the fact that they are surface aspects of an underlying difference in ontological commitment as to what relations are—so-called standard view versus positionalist—and how it is represented in the models (see discussion in the paper). For the core profile, the dataset of conceptual models justified binaries + standard view representation. In addition to that, the core profile has classes, attributes, mandatory and arbitrary (unqualified) cardinality, class subsumption, and single identification. That set covers 87.57% of all the entities in the models in the dataset (91.88% of the UML models, 73.29% of the ORM models, and 94.64% of the ER/EER models). Note there’s no disjointness or completeness (there were too few of them to merit inclusion) and no role and relationship subsumption, so there isn’t much one can deduce automatically, which is a bit of a bummer.

The UML profile extends the core only slightly, yet it covers 99.44% of the elements in the UML diagrams of the dataset: add cardinality on attributes, attribute value constraints, subsumption for DL roles (UML associations), and aggregation (they are plain associations since UML v2.4.1). This makes a “$\mathcal{ALNHI}(D)$” DL that, as far as we know, hasn’t been investigated yet. That said, fiddling a bit by opting for unique name assumption and some constraints on cardinalities and role inclusion, it looks like $DL\mbox{-}Lite^{\mathcal{HN}}_{core}$ [4] may suffice, which is NLOGSPACE in subsumption and $AC^0$ in data complexity.

For ER/EER, we need to add to the core the following to make it to 99.06% coverage: composite and multivalued attribute (remodelled), weak entity type with its identification constraint, ternaries, associative entity types, and multi-attribute identification. With some squeezing and remodelling things a bit (see paper), $DL\mbox{-}Lite^{\mathcal{N}}_{core}$ [4] should do (also NLOGSPACE), though $\mathcal{DLR}_{ifd}$ [5] will make the formalisation better to follow (though that DL has too many features and is EXPTIME-complete).

Last, the ORM/ORM2 profile, which is the largest to achieve a high coverage (98.69% of the elements in the models in the data set): the core profile + subsumption on roles (DL role components) and fact types (DL roles), n-aries, disjointness on roles, nested object types, value constraints, disjunctive mandatory, internal and external uniqueness, external identifier (compound reference scheme). There’s really no way to avoid the roles, n-aries, and disjointness. There’s no exactly fitting DL for this cocktail of features, though $\mathcal{DLR}_{ifd}$ and $latex$\mathcal{CFDI}_{nc}^{\forall -} &s=-2\$ [6] approximate it; however, the former has too much constructs and the latter too few. That said, $\mathcal{DLR}_{ifd}$ is computationally not ‘well-behaved’, but with $\mathcal{CFDI}_{nc}^{\forall -}$ we still can capture over 96% of the elements in the ORM models of the dataset and it’s PTIME (yup, tractable) [7].

The discussion section of the paper answers the research questions we posed at the beginning of the investigation and reflects on not only missing features, but also ‘useless’ ones. Perhaps we won’t make a lot of friends discussing ‘useless’ features, especially when some authors investigated exactly those features. Anyway, here it goes. Really, nominal are certainly not needed (and computationally costly to boot). We can only guess why there were so few disjointness and completeness constraints in the data set, and even when they were present, they were in the few models we got from textbooks (see data set for sources of the models); true, there weren’t a lot of class hierarchies, but still. The other thing that was a bit of a disappointment was that the relational properties weren’t used a lot. Looking at the relationships in the models, there were certainly opportunities for transitivity and more irreflexivity declarations. One of our current conjectures is that they have limited implementation support, so maybe modellers don’t see the point of adding such constraints; another could be that an ‘average modeller’ (whatever that means) doesn’t quite understand all the 11 that are available in ORM2.

Overall, while a bit disappointing for the use case scenario of reasoning over conceptual data models for inconsistency management, the results are actually very promising for runtime usage of conceptual data models. Maybe that of itself will generate more interest from industry in doing that analysis step before implementing a database or software application: instead of developing a conceptual data model “just for documentation and dust-gathering”, you’ll have one that also will add more, new, better advanced features to your application.

References

[1] Keet, C.M., Fillottrani, P.R. An analysis and characterisation of publicly available conceptual models. 34th International Conference on Conceptual Modeling (ER’15). Springer LNCS. 19-22 Oct, Stockholm, Sweden. (in print)

[2] Fillottrani, P.R., Keet, C.M. Evidence-based Languages for Conceptual Data Modelling Profiles. 19th Conference on Advances in Databases and Information Systems (ADBIS’15). Springer LNCS. Poitiers, France, Sept 8-11, 2015. (in print)

[3] Donini, F., Lenzerini, M., Nardi, D., Nutt, W. Tractable concept languages. In: Proc. of IJCAI’91. vol. 91, pp. 458-463. 1991.

[4] Artale, A., Calvanese, D., Kontchakov, R., Zakharyaschev, M. The DL-Lite family and relations. Journal of Artificial Intelligence Research, 2009, 36:1-69.

[5] Calvanese, D., De Giacomo, G., Lenzerini, M. Identification constraints and functional dependencies in Description Logics. In: Proc. of IJCAI’01, pp155-160, Morgan Kaufmann.

[6] Toman, D., Weddell, G. On adding inverse features to the Description Logic $\mathcal{CFDI}_{nc}^{\forall}$. In: Proc. of PRICAI 2014, pp587-599.

[7] Fillottrani, P.R., Keet, C.M., Toman, D. Polynomial encoding of ORM conceptual models in $\mathcal{CFDI}_{nc}^{\forall -}$. 28th International Workshop on Description Logics (DL’15). Calvanese, D., Konev, B. (Eds.), CEUR-WS vol. 1350, pp401-414. 7-10 June 2015, Athens, Greece.

Forum for AI Research 2015, Cape Town

In 10 day’s time, the (CAIR-driven) Forum for Artificial Intelligence Research 2015 (FAIR’15) Workshop will be held at UCT in Cape Town, South Africa, from March 30 to April 2. There are still some spaces available; registration is free, but please register (for catering purposes). What will you get for this ‘bargain price’? A lot of food for the mind!

FAIR’15 follows the same format as the previous 7 editions that went under various acronyms since 2008 (among others, MOWS, MOSS, MAIS, FAIR), with a mini-course, a tutorial, and postgraduate student presentations. This edition has the following on offer.

Ulrike Sattler (University of Manchester, UK) will present a mini-course on automated reasoners in the mornings. She will go into the details of what really happens when you click that menu option “start reasoner” and Protégé’s “?” that explains the deductions, and what are the factors that influence the reasoner’s performance.

David Toman (University of Waterloo, Canada) will present a 2-hour tutorial on using knowledge representation and reasoning (logic) for query optimization in relational databases and ontology-based data access (i.e., advanced aspects of database systems implementation).

Further, there are several sessions with postgraduate student presentations. Among others, Catherine Chavula will talk about new results (cf. [1]) in multilingual ontologies, Zubeida Khan will talk about foundational ontology interchangeability (details in [2]), and (very recently MSc cum laude graduated!) Nasubo Ongoma will present her thesis on logic-based temporal conceptual data modeling (including material from [3]). Gavin Rens will talk about probabilistic belief change, Kody Moodley on defeasible reasoning for description logics, Henriette Harmse about scenario testing with OWL, and Nishal Morar on taxonomic classification.

Aurona Gerber will give an overview of Data Science at CSIR, and for some more variety in the programme, I’ll talk about the stuff ontology [4]. Check the programme for all titles of the presentations and the abstracts of the mini-course and tutorial.

An important aim of FAIR is the networking among people in Southern Africa, and share and discuss informally our research in (predominantly) KR&R and related areas—so if the above topics sound interesting, or made you curious, or you would like to meet a potential MSc/PhD supervisor, you’re welcome to join (note: some basic knowledge of logics will be needed to understand the talks, though). If you have any questions, please don’t hesitate to contact one of the organisers, Arina Britz and me.

References

[1] Chavula, C., Keet, C.M. Is Lemon Sufficient for Building Multilingual Ontologies for Bantu Languages? 11th OWL: Experiences and Directions Workshop (OWLED’14). Keet, C.M., Tamma, V. (Eds.). Riva del Garda, Italy, Oct 17-18, 2014. CEUR-WS vol. 1265, 61-72.

[2] Khan, Z.C., Keet, C.M. Feasibility of automated foundational ontology interchangeability. 19th International Conference on Knowledge Engineering and Knowledge Management (EKAW’14). K. Janowicz et al. (Eds.). 24-28 Nov, 2014, Linkoping, Sweden. Springer LNAI 8876, 225-237.

[3] Keet, C.M., Ongoma, E.A.N. Temporal Attributes: their Status and Subsumption. Asia-Pacific Conference on Conceptual Modelling (APCCM’15). Koehler, H., Saeki, M. (Eds.), Conferences in Research and Practice in Information Technology (CRPIT), Vol. 165. 27-30 January, 2015, Sydney, Australia.

[4] Keet, C.M. A core ontology of macroscopic stuff. 19th International Conference on Knowledge Engineering and Knowledge Management (EKAW’14). K. Janowicz et al. (Eds.). 24-28 Nov, 2014, Linkoping, Sweden. Springer LNAI vol. 8876, 209-224.

Dabbling into evaluating reasoners with the DMOP ontology

The Data Mining OPtimization ontology (DMOP) is a highly axiomatised ontology that uses almost all features of OWL 2 DL and the domain entities are linked to DOLCE, using all four main ‘branches’ of DOLCE. Some details are described in last year’s OWLED’13 paper [1] and a blog post. We did observe ‘slow’ reasoner performance to classify the ontology, however, like, between 10 and 20 minutes, varying across versions and machines. The Ontology Reasoner Evaluation (ORE’14) workshop (part of the Vienna Summer of Logic) was a nice motivation to have a look at trying to figure out what’s going on, and some initial results are described briefly in the 6 pages-short paper [2], which is co-authored with Claudia d’Amato, Agnieszka Lawrynowicz, and Zubeida Khan.

Those results are definitely what can be called interesting, even though we’re still at the level of dabbling into it from a reasoner user-centric viewpoint, and notably, from a modeller-centric viewpoint. The latter is what made us pose questions like “what effect does using feature x have on performance of the reasoner?”. No one knew, except for the informal feedback back I received at DL 2010 on [3] that reasoning with data types slows down things, and likewise when the cardinalities are high. That’s not an issue with DMOP, though.

So, the first thing we did was determining a baseline on a good laptop—your average modeller doesn’t have a HPC cluster readily at hand—and in an Ontology Development Environment, where the reasoner is typically accessed from. Some 9 minutes to classify the ontology (machine specs and further details in the paper).

The second steps were the analysis of one specific modeling construct (inverses), and what effect DOLCE has on the overall performance.

The reason why we chose representation of inverses is because in OWL 2 DL (cf. OLW DL), one can use the objectInverseOf(OP) to use the inverse of an object property instead of extending the ontology’s vocabulary and using InverseObjectProperties(OPE1 OPE2) to relate the property and its inverse. For instance, to use the inverse the property addresses in an axiom, one used to have to introduce a new property, addressed by, declare it inverse to addresses, and then use that in the axiom, whereas in OWL 2 DL, one can use ObjectInverseOf(addresses) in the axiom (in Protégé, the syntax is inverse(addresses)). That slashed computing the class hierarchy by at least over a third (and about half for the baseline). Why? We don’t know. Other features used in DMOP, such as punning and property chains, were harder to remove and are heavily used, so we didn’t test those.

The other one, removing DOLCE, is a bit tricky. But to give away the end results upfront: that made it 10 times faster! The ‘tricky’ part has to do with the notion of ‘linking to a foundational ontology’ (deserving of its own blog post). For DMOP, we had not imported but merged, and we did not merge everything from DOLCE and its ExtendedDnS, but only what was deemed relevant, being, in numbers, 43 classes, 78 object properties and 593 axioms. To make matters worse—from an evaluation viewpoint, that is—is that we reused heavily three DOLCE object properties, so we kept those three DOLCE properties in the evaluation file, as we suspected it otherwise would have affected the deductions too much and interfere with the DOLCE-or-not question (one also could argue that those three properties can be considered an integral part of DMOP). So, it was not a simple case of ‘remove the import statement and run the reasoner again’, but a ‘remove almost everything with a DOLCE URI manually and then run the reasoner again’.

Because computation was so ‘slow’, we wondered whether maybe cleverly modularizing DMOP could be the way to go, in case someone wants to use only a part of DMOP. We got as far as trying to modularize the ontology, which already was not trivial because DMOP and DOCLE are both highly axiomatised and with few, if any, relatively isolated sections amenable to modularization. Moreover, what it did show, is that such automated modularization (when it was possible) only affects the number of class and number of axioms, not the properties and individuals. So, the generated modules are stuck with properties and individuals that are not used in, or not relevant for, that module. We did not fix that manually. Also, putting back together the modules did not return it to the original version we started out with, missing 225 axioms out of the 4584.

If this wasn’t enough already, the DMOP with/without DOLCE test was performed with several reasoners, out of curiosity, and they gave different output. FaCT++ and MORe had a “Reasoner Died” message. My ontology engineering students know that, according to DOLCE, death is an achievement, but I guess that its reasoners’ developers would deem otherwise. Pellet and TrOWL inferred inconsistent classes; HermiT did not. Pellet’s hiccup had to do with datatypes and should not have occurred (see paper for details). TrOWL fished out a modeling issue from all of those 4584 axioms (see p5 of the paper), of the flavour as described in [4] (thank you), but with the standard semantics of OWL—i.e., not caring at all about the real semantics of object property hierarchies—it should not have derived an inconsistent class.

Overall, it feels like having opened up a can of worms, which is exciting.

References

[1] Keet, C.M., Lawrynowicz, A., d’Amato, C., Hilario, M. Modeling issues and choices in the Data Mining OPtimisation Ontology. 8th Workshop on OWL: Experiences and Directions (OWLED’13), 26-27 May 2013, Montpellier, France. CEUR-WS vol 1080.

[2] Keet, C.M., d’Amato, C., Khan, Z.C., Lawrynowicz, A. Exploring Reasoning with the DMOP Ontology. 3rd Workshop on Ontology Reasoner Evaluation (ORE’14). July 13, 2014, Vienna, Austria. CEUR-WS vol (accepted).

[3] Keet, C.M. On the feasibility of Description Logic knowledge bases with rough concepts and vague instances. 23rd International Workshop on Description Logics (DL’10), 4-7 May 2010, Waterloo, Canada.

[4] Keet, C. M. (2012). Detecting and revising flaws in OWL object property expressions. In Proc. of EKAW’12, volume 7603 of LNAI, pages 252–266. Springer.