Some of you may have read about the 9 types of scenarios we identified for the kind of reasoning services bio-ontologists want or are trying to implement, along with the occasional successful application (see also this post and [1]). From that survey, it was clear that some services are already possible, some are in the pipeline, some are known under another term in the logics area, and some are nice problems to solve… This was presented last week at the third OWL: Experiences and Directions workshop (OWLED 2007) in Innsbruck. As hoped for, there was useful feedback from the participants, both there and at the 20th Description Logics workshop (DL 2007) afterwards in Bressanone: they had “precisely the solution you’re looking for”. Well, not exactly, but close. There are more partial solutions, both theory and downloadable applications, than Marco, Scott and I were aware of when writing the article; there is even more on offer that you probably never thought you needed but is actually quite useful; and there are new technologies that demonstrate better performance. In the remainder of this post I pull together those suggestions and related papers.
(As an aside, to avoid some potential confusion regarding terminology across disciplines: here, a knowledge base is a TBox [ontology of types in some description logics language] together with an ABox [instances].)
Finding gaps & knowledge discovery. There are several different strands of approaches in addition to those mentioned in the article [1]. First, you can ‘fill gaps’, that is, complete a knowledge base, using a novel method that avails of Formal Concept Analysis [2,3]. Basically, the FCA approach & downloadable SWOOP-plugin (I’ve seen it working on Baris Sertkaya’s pc) lets you select several terms (/DL-concept, universal) from the taxonomy, and then it checks whether the knowledge base contains instances for any of the intersections of those terms. If there are none, it will ask you questions that you answer by clicking yes or no; if you answer no, then you must provide the system with a counter example. For instance, at some point in the iteration it asks “are endopeptidases also toxins?”, you answer “no”, and you add tetanospasmin as counter example because it is a zinc-endopeptidase that is produced by Clostridium tetani for normal cellular function, and so forth. Eventually you’re done going through the questions and have a new subtype added automatically to the taxonomy; thus, one for which instances do exist in the intersection of the selected concepts, an intersection that apparently is not simply an intersection but amounts to (the representation of) a universal. Of course, if there are instances in the knowledge base that already satisfy your postulated new concept/universal, you are done in one step. Conversely, if you keep on adding counter-examples until the end, your hypothesis is falsified.
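To make the two checks in that loop a bit more concrete, here is a minimal Python sketch. It is not the plugin's algorithm or API: the toy ABox, the instance names protein_1 to protein_3, and both helper functions are made up for illustration, and it only looks up asserted types without any reasoning.

```python
# Toy ABox: each (made-up) instance maps to the concepts it is asserted to belong to.
# Purely illustrative data; a real knowledge base would be queried through a reasoner.
abox = {
    "protein_1": {"Endopeptidase", "Toxin"},
    "protein_2": {"Endopeptidase"},
    "protein_3": {"Toxin"},
}


def instances_of(abox, concepts):
    """Return the instances that belong to every concept in `concepts`."""
    wanted = set(concepts)
    return {name for name, types in abox.items() if wanted <= types}


def counterexamples(abox, premise, conclusion):
    """Return instances that satisfy `premise` but not all of `conclusion`.

    An empty result means the data does not refute 'premise implies conclusion',
    which is the point where a domain expert would be asked to confirm or refute it.
    """
    required = set(conclusion)
    return {
        name
        for name in instances_of(abox, premise)
        if not required <= abox[name]
    }


# Is there any instance in the intersection of the selected concepts?
in_intersection = instances_of(abox, ["Endopeptidase", "Toxin"])

# Does the data already refute "are endopeptidases also toxins?"
refuting = counterexamples(abox, ["Endopeptidase"], ["Toxin"])
```

With this toy data, the intersection is non-empty and protein_2 already refutes the implication, so no question would need to go to the expert.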
Second, there are non-standard reasoning services that may be of use, which use techniques involving the “least common subsumer” (Anni-Yasmin Turhan pointed me to an introduction [4] and has more results on this topic) and “consequence finding” [5], although they can be viewed as part of ontology development as well. With the least common subsumer service for description logics knowledge bases, you can start from several typical examples as instances, which are subsequently automatically generalized into a concept description by the system. So, if you have a bunch of instances and cannot detect yourself what the common features are, you can find out automatically with the least common subsumer service. Consequence finding focuses on “prime implicates” in a sub-language of OWL (for very simple ontologies), which helps evaluate the impact of adding a piece of new information to the ontology (e.g., for testing a hypothesis about a concept or relation).
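As a rough illustration of the generalization idea, consider a drastically simplified setting in which each example is described only by a conjunction of atomic concept names and there is no TBox. Under that assumption, the least common subsumer is simply the conjunction of the names shared by all descriptions. The sketch below shows just that toy case; the algorithms discussed in [4] handle far richer languages, and the function name and example data are my own.

```python
from functools import reduce


def least_common_subsumer(descriptions):
    """Toy lcs for conjunctions of atomic concept names (no TBox, no roles).

    Each description is an iterable of concept names read as their conjunction.
    The result is the set of names common to all descriptions; an empty set
    corresponds to the top concept.
    """
    sets = [frozenset(d) for d in descriptions]
    if not sets:
        raise ValueError("need at least one description")
    return reduce(frozenset.intersection, sets)


# Made-up descriptions of three example instances.
examples = [
    {"Protein", "Enzyme", "Secreted"},
    {"Protein", "Enzyme"},
    {"Protein", "Enzyme", "MembraneBound"},
]
common = least_common_subsumer(examples)
```

Here `common` contains Protein and Enzyme, i.e., the most specific description in this toy language that still covers all three examples.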
Third, Ralf Möller demonstrated with RacerPorter [6] (using the query language nRQL) the first query pattern of section 2.3 in [1]. For instance, suppose we have a medical ontology that contains Patient, LactoseIntolerance, and Nausea as types and the relation (property) hasDigestiveDiscomfort between Patient and LactoseIntolerance, and another relation HasSymptom. Now, imagine this to be integrated with electronic health record data, and a medical doctor is hypothesizing that all (or at least one) patient(s) recorded in the hospital information system that have the digestive discomfort lactose intolerance also have the symptom of being nauseous. One can pose this type of query (among others!) with RacerPorter.
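The shape of that hypothesis check can be sketched in a few lines of Python. This is not nRQL and does not involve RacerPorter; it is a plain lookup over made-up records (patient_1 to patient_3) without any reasoning, and the helper name is hypothetical. It only shows the two variants of the question: does it hold for at least one patient, and does it hold for all of them?

```python
# Made-up health-record data using the relation names from the example above.
records = {
    "patient_1": {"hasDigestiveDiscomfort": {"LactoseIntolerance"}, "HasSymptom": {"Nausea"}},
    "patient_2": {"hasDigestiveDiscomfort": {"LactoseIntolerance"}, "HasSymptom": set()},
    "patient_3": {"hasDigestiveDiscomfort": set(), "HasSymptom": {"Nausea"}},
}


def patients_with(records, relation, value):
    """Return the patients whose `relation` includes `value`."""
    return {p for p, rels in records.items() if value in rels.get(relation, set())}


intolerant = patients_with(records, "hasDigestiveDiscomfort", "LactoseIntolerance")
nauseous = patients_with(records, "HasSymptom", "Nausea")

# "At least one" lactose-intolerant patient is also nauseous.
holds_for_some = bool(intolerant & nauseous)

# "All" lactose-intolerant patients are also nauseous (and there is at least one).
holds_for_all = bool(intolerant) and intolerant <= nauseous
```

With this data the hypothesis holds for some patients but not for all, because patient_2 is lactose intolerant without a recorded symptom of nausea.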
Ontology development. This comprises any reasoning service that helps with “debugging” ontologies. Kalyanpur’s thesis on debugging OWL-DL ontologies is exactly about this topic. He and his colleagues at IBM are working on the SHER reasoner for SHIN (a very expressive proper sublanguage of OWL-DL) and very large data sets. Unlike the current widely used reasoners such as Racer, Pellet, and FaCT++, this one can isolate the “source error” without bothering to highlight all umpteen consequence-errors as well, thereby saving the ontology developer the proverbial search for the needle in the haystack. A brief outline was provided by Liebig et al. about algorithms for explaining subsumption and patching unwanted non-subsumption for languages less expressive than OWL-DL [7], and Barinskis and Barzdins presented a poster on visualization of results of satisfiability checking (i.e., testing if each concept can be instantiated) [8]. Hence, these features provide explanations of the results obtained from reasoning over the ontology.
There are undoubtedly more possibilities than described here, including ones that come close to the requirements and would need only a little tweaking to match them, but due to the limited time of the presentation I could only highlight a few reasoning scenarios. For instance, interesting usable results on “spatioterminological reasoning” (RCC & OWL) have been presented [9], developed at the Swiss federal institute for forest, snow and landscape research, so as to process queries that combine thematic aspects with (qualitative) spatial aspects, e.g., to process information on endangered butterflies ‘in Birmensdorf and neighbouring villages’.
Last, a note on reasoning and performance. More reasoning services are always welcome, but some tests can take a long time. One cannot avoid trade-offs between ontology languages [10]: using a more expressive language for your ontology comes at the cost of performance of reasoning and querying. However, there are optimizations and better technologies already at the prototype stage and available for use. For instance, there is HermiT [11] with a more efficient algorithm; its developers tested it on the OBO ontologies and have hard test data on the improvements (e.g., it can classify GALEN). The only one in the OBO repository that caused problems was the FMA-lite, and they’re looking into it. A different approach is taken by the developers of the QuOnto plugin for Protégé [12]. It builds on the ‘metrics’ function in Protégé, the QuOnto reasoner for the less expressive ontology language DL-Lite F (see [10] for a comparison of features), and the observation that most bio-ontologies do not use all language features anyway: the plugin checks the actual language you’re using, and if it is a ‘simple’ language (compared to OWL-DL), then it will delegate the reasoning to the faster reasoner. In that case, reasoning is polynomial in time (and logspace w.r.t. data complexity), compared to non-deterministic exponential in time for OWL-DL.
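The delegation idea can be sketched as a simple set comparison: collect the language constructs an ontology actually uses, and pick the lightweight reasoner only if all of them fall within the simple language. The Python below is a hedged illustration, not the QuOnto plugin's code; the construct labels, the contents of SIMPLE_CONSTRUCTS, and the function name are placeholders I made up, and a real check would inspect the parsed axioms against the actual language definition.

```python
# Placeholder labels for constructs permitted in the 'simple' language.
# This set is an assumption for illustration, not the definition of any DL-Lite variant.
SIMPLE_CONSTRUCTS = frozenset(
    {"atomic_subclass", "disjointness", "domain", "range", "functional_role"}
)


def choose_reasoner(used_constructs, simple_constructs=SIMPLE_CONSTRUCTS):
    """Pick a reasoner family from the constructs an ontology actually uses.

    Returns a pair: the chosen family ("lightweight" or "expressive") and the
    set of constructs that fall outside the simple language (empty if none).
    """
    outside = set(used_constructs) - set(simple_constructs)
    if outside:
        return "expressive", outside
    return "lightweight", set()


# Two made-up ontologies, summarized by the constructs they use.
taxonomy_like = {"atomic_subclass", "domain"}
richer = {"atomic_subclass", "transitive_role"}

choice_simple = choose_reasoner(taxonomy_like)
choice_rich = choose_reasoner(richer)
```

Returning the offending constructs alongside the choice is a design convenience: it tells the ontology developer exactly which features stand between the ontology and the faster reasoner.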
At some point, making a repository with all available reasoning services and small examples will probably be very useful.
Besides the reasoning stuff, you may be interested to know that a W3C working group will be set up in the upcoming months, with the aim of standardizing “OWL 1.1”. The whole set of OWL 1.1 features is not set in stone yet; hence, you can take your chance and participate in order to get a new ontology language that will be closer to what you want/need.
[1] Keet, C.M., Roos, M., Marshall, M.S. A survey of requirements for automated reasoning services for bio-ontologies in OWL. Third international Workshop OWL: Experiences and Directions (OWLED 2007), 6-7 June 2007, Innsbruck, Austria.
[2] Franz Baader, Bernhard Ganter, Ulrike Sattler and Baris Sertkaya. Completing Description Logic Knowledge Bases using Formal Concept Analysis. Third international Workshop OWL: Experiences and Directions (OWLED 2007), 6-7 June 2007, Innsbruck, Austria.
[3] Franz Baader, Bernhard Ganter, Ulrike Sattler, and Baris Sertkaya. Completing Description Logic Knowledge Bases using Formal Concept Analysis. Proceedings of the Twentieth International Joint Conference on Artificial Intelligence (IJCAI-07). AAAI Press, 2007.
[4] Sebastian Brandt, Anni-Yasmin Turhan. Using Non-standard Inferences in Description Logics – what does it buy me? Proceedings of the KI Workshop on Applications of Description Logics 2001 (KIDLWS2001), Vienna, Austria.
[5] Meghyn Bienvenu. Consequence finding in ALC. 20th International Workshop on Description Logics (DL2007), 8-10 June 2007, Brixen-Bressanone, Italy.
[6] Michael Wessel and Ralf Möller. Design Principles and Realization Techniques for User Friendly, Interactive, and Scalable Ontology Browsing and Inspection Tools. Third international Workshop OWL: Experiences and Directions (OWLED 2007), 6-7 June 2007, Innsbruck, Austria.
[7] Thorsten Liebig, Stephan Scheele and Julian Lambertz. Explaining subsumption and patching non-subsumption with tableaux methods. 20th International Workshop on Description Logics (DL2007), 8-10 June 2007, Brixen-Bressanone, Italy.
[8] Martins Barinskis and Guntis Barzdins. The minimal finite model visualization as an ontology debugging tool. 20th International Workshop on Description Logics (DL2007), 8-10 June 2007, Brixen-Bressanone, Italy.
[9] Rolf Grütter and Bettina Bauer-Messmer. Combining OWL with RCC for Spatioterminological Reasoning on Environmental Data. Third international Workshop OWL: Experiences and Directions (OWLED 2007), 6-7 June 2007, Innsbruck, Austria.
[10] Keet, C.M., Rodriguez, M. Toward using biomedical ontologies: trade-offs between ontology languages. AAAI 2007 Workshop Semantic eScience (SeS 2007). 23 July 2007, Vancouver.
[11] Boris Motik, Rob Shearer and Ian Horrocks. A hypertableau calculus for SHIQ. 20th International Workshop on Description Logics (DL2007), 8-10 June 2007, Brixen-Bressanone, Italy.
[12] Manuel Dioturni and Maurizio Iacovella. Tools for the QuOnto system – conversion between OWL and DL-Lite with Protégé-OWL plugin. 20th International Workshop on Description Logics (DL2007), 8-10 June 2007, Brixen-Bressanone, Italy.
Thanks for this. Helpful to know what is happening. Did any of the reasoners offer the ability to turn off/on processing of specific language features? For example, I have an application that only needs owl:sameAs + RDFS inference and I’d like to specify this at runtime.
No. It was one of the conversation topics in the context of OWL 1.1 “full” and/versus OWL 1.1 “tractable fragments” (see http://www.webont.org/owl/1.1/tractable.html). As I understood it, the intention is to have the theoretical & implementation work for switching between more and less expressive languages proceed in parallel with the process of standardizing OWL 1.1, and it may become part of the OWL 1.1 specification. But the W3C is soliciting member opinions about that at the moment (see also the owl-dev archive at http://www.nabble.com/w3.org—public-owl-dev-f11611.html).
Now with e.g. the QuOnto plugin for Protégé, it is a passive check that does not mess with the ontology itself. The “simplifying” of an ontology could be done with brute force deletion of the axioms that have those language constructs you don’t want to consider (would you prefer that?) or through elaborate but automated transformations so as to keep as much of the domain semantics as possible. The latter will take more time to realize, but I think is a better solution; the former is easy to implement.
I don’t think there are many reasoner-developers who know that there are users who want such a fine-grained toggling feature. Some were even surprised I had mentioned things in that direction, because for quite a few years biomedical ontologists have been reiterating the request for more expressiveness rather than less, let alone asking for some control over the language used in order to get better performance.