The DiDOn method to develop bio-ontologies from semi-structured life science diagrams

It is well-known among (bio-)ontology developers that ontology development is a resource-consuming task (see [1] for data backing up this claim). Several approaches and tools do exist that speed up the time-consuming efforts of bottom-up ontology development, most notably natural language processing and database reverse engineering. They are generic, and the technologies have been proposed from a computing angle; they are therefore noisy and/or contain many heuristics to make them fit for bio-ontology development. Yet the most obvious source from a domain expert perspective is unexplored: the abundant diagrams in the sciences that function as existing/‘legacy’ knowledge representations of the subject domain. So, how can one use them to develop domain ontologies?

The new DiDOn procedure—from Diagram to Domain Ontology—can speed up and simplify bio-ontology development by exploiting the knowledge represented in such semi-structured bio-diagrams. It does this by means of extracting explicit and implicit knowledge, preserving most of the subject domain semantics, and making formalisation decisions explicit, so that the process is done in a clear, traceable, and reproducible way.

DiDOn is a detailed, micro-level procedure to formalise those diagrams in a logic of choice; it provides migration paths into OBO, SKOS, OWL, and arbitrary FOL, as well as guidelines on which axioms have to be added to the bio-ontology, and how. It also uses a foundational ontology so as to obtain more precise and interoperable subject domain semantics than would otherwise have been possible with syntactic transformations alone. (Choosing an appropriate foundational ontology is a separate topic and can be done with, e.g., ONSET.)

The paper describing the rationale and details, Transforming semi-structured life science diagrams into meaningful domain ontologies with DiDOn [2], has just been accepted at the Journal of Biomedical Informatics. They require a graphical abstract, so here it goes:

DiDOn consists of two principal steps: (1) formalising the ‘icon vocabulary’ of a bio-drawing tool, which then functions as a seed ontology, and (2) populating the seed ontology by processing the actual diagrams. The algorithm in the second step is informed by the formalisation decisions taken in the first step. Such decisions include, among others, the representation language and how to represent the diagram’s n-aries (with n≥2), such as choosing between representing an n-ary as a relationship or reifying it as a class.
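To make the two steps more concrete, here is a minimal sketch in plain Python of how they could interact, under an assumed toy diagram format; the dictionary layout and names such as ICON_VOCABULARY and reify are my own illustrative choices, not DiDOn’s or Pathway Studio’s actual vocabulary.

```python
# A minimal sketch of DiDOn's two steps, assuming a toy diagram format;
# all names here are illustrative, not DiDOn's actual API.

# Step 1: formalise the drawing tool's icon vocabulary as a seed ontology.
ICON_VOCABULARY = {
    "protein_icon": "Protein",               # icon type -> class
    "small_molecule_icon": "SmallMolecule",  # icon type -> class
    "regulation_arrow": "regulates",         # icon type -> relation
}

def build_seed_ontology(icon_vocabulary):
    """Each icon type becomes a class or a relation, subsumed by a category
    of the chosen foundational ontology (a formalisation decision)."""
    seed = {"classes": {}, "relations": {}, "axioms": []}
    for icon, term in icon_vocabulary.items():
        if icon.endswith("_arrow"):
            seed["relations"][term] = {"source_icon": icon}
        else:
            seed["classes"][term] = {"subclass_of": "dolce:Particular"}
    return seed

# Step 2: populate the seed ontology by processing the actual diagrams,
# guided by the formalisation decisions taken in step 1.
def populate(seed, diagrams):
    for diagram in diagrams:
        for node in diagram["nodes"]:
            # each labelled node becomes a subclass of its icon's class
            seed["classes"][node["label"]] = {
                "subclass_of": ICON_VOCABULARY[node["icon"]]}
        for edge in diagram["edges"]:
            rel = ICON_VOCABULARY[edge["icon"]]
            if len(edge["args"]) > 2:   # decision: reify n-aries with n > 2
                reify(seed, rel, edge["args"])
            else:
                seed["axioms"].append((rel, *edge["args"]))
    return seed

def reify(seed, rel, args):
    """Represent an n-ary as a class plus n binary participation relations."""
    cls = rel.capitalize() + "Relation"
    seed["classes"][cls] = {"subclass_of": "dolce:Particular"}
    for i, arg in enumerate(args):
        seed["axioms"].append((f"has_participant_{i}", cls, arg))

seed = populate(build_seed_ontology(ICON_VOCABULARY), [{
    "nodes": [{"icon": "protein_icon", "label": "Pde7a"}],
    "edges": [{"icon": "regulation_arrow", "args": ["Pde7a", "cAMP"]}],
}])
```

The point of the sketch is merely the division of labour: the decisions encoded in step 1 (here: classes subsumed by a foundational ontology category, n-aries reified) are what the population algorithm in step 2 consults.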

In addition to the presentation of DiDOn, the paper contains a detailed application of it with Pathway Studio as a case study.

The neatly formatted paper is behind a paywall for those with no or limited access to Elsevier’s journals, but the accepted manuscript is openly accessible from my home page.

References

[1] Simperl, E., Mochol, M., Bürger, T. Achieving maturity: the state of practice in ontology engineering in 2009. International Journal of Computer Science and Applications, 2010, 7(1):45-65.

[2] Keet, C.M. Transforming semi-structured life science diagrams into meaningful domain ontologies with DiDOn. Journal of Biomedical Informatics, in press. DOI: http://dx.doi.org/10.1016/j.jbi.2012.01.004


Notes on SAKT’10: Bolzano colloquium on logic for temporal databases

On 16 and 17 December, the KRDB Research Centre at FUB hosted the successful SAKT’10 Symposium on Advances in KRDB Technologies, which had as its special theme logics for temporal databases and was organized by Alessandro Artale from the KRDB. As the title suggests, the presentations and discussions offered various approaches to address the interaction between temporal logics and temporal databases, which I’ll try to summarise here from the notes I took. Note/caution: there are no proceedings to cross-check against, nor formal apparatus at hand to illustrate some aspects more precisely, and my note-taking had its ups and downs; so, if you want to know more about the topics, take a look at the attendees’ respective homepages and publications.

Database-oriented talks

David Toman from the University of Waterloo gave two talks, one about querying temporal databases with temporal SQL and one about data streams and temporal databases. A stream database can be seen as an append-only temporal database where holistic/bounded synopses (views) have to be made of the glut of past data to be able to manage such ever-growing databases effectively; this can be done in a policy-driven or a query-driven way, the latter being a variant of data expiration.

Carlo Combi from the University of Verona and Pietro Sala from the University of Udine gave a joint presentation about temporal functional dependencies in databases, with some rather challenging examples taken from the medical domain. It requires a DBMS to deal somehow with fixed intervals as well as moving windows, such as querying whether a patient indeed received the right doses at the stipulated times in her chemotherapy treatment (i.e., constraints about the past) and enforcing constraints about treatments that ought to hold in the future, such as (roughly) “a patient should be administered the same quantity of medicine x every two weeks from the start of the treatment”, for which both a point-based and an interval-based approach were presented. How this is to be represented in a temporal conceptual model is another topic.
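For illustration, one plausible rendering of that future-oriented constraint in a metric temporal logic could be (my own simplified formulation, not the formalism presented in the talk):

$$\Box\big(\mathit{administer}(p,x,q) \rightarrow \Diamond_{=14\,\mathrm{days}}\,\mathit{administer}(p,x,q)\big)$$

i.e., whenever patient $p$ receives quantity $q$ of medicine $x$, the same quantity is administered again exactly two weeks later; a faithful formalisation would, at least, also have to bound the duration of the treatment.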

Jef Wijsen from Mons University considered temporal patterns (word problems) motivated by certain query answering, and pondered what constitutes a temporal conjunctive query. Mark Reynolds’ work at the University of Western Australia was about reasoning with time stamps and metric temporal logic over the real numbers. Paolo Terenziani from the University of Torino reported on his ongoing work towards a unified data model for temporal databases, taking into account both valid time and transaction time, the atelic/telic distinction (roughly: point-based vs. interval-based), temporal determinacy/indeterminacy, and some other issues (“now”, granularities).

Logic talks

Roughly in between ‘just the logic’ and ‘just databases’ was Alessandro Artale’s presentation about temporal conceptual data modelling with simplified ER and UML Class Diagrams extended with a few temporal operators—those temporal operators that are in TDL-Lite. Vlad Ryzhikov from KRDB presented Marco Gario’s project work on implementing a reasoner over temporal ER, which was realized through transformations from temporal ER to TDL-Lite, to TFOL over Z, to LTL over Z, to LTL over N and CTL, the latter two handled with the NuSMV tool. It worked with bounded model checking, but for symbolic model checking there were scalability problems even for very small temporal ER models; so there is room for improvement. There was some immediate feedback from Viktor Schuppan of Fondazione Bruno Kessler, where NuSMV was developed, who, in turn, talked in his own presentation about a comparison of LTL satisfiability solvers. There are several such solvers that use different techniques, and it appeared that different tools are better at solving different problems, but there is not one that scores best throughout. Michel Ludwig from the University of Liverpool presented his PhD thesis work on TSPASS, a monodic temporal logic prover that builds upon SPASS 3.0.

Roman Kontchakov from Birkbeck College gave a brief overview of how to develop decidable temporal languages, augmented by his joint presenter, Vlad, who talked about TDL-Lite over the natural numbers. Being logicians, they invented their own list of what such a decidable temporal language ‘needs’; it would have been nicer if they had actually developed a decidable language that demonstrably contains what is needed from a modelling perspective, such as handling essential and immutable parts and relation migration, or demonstrating that one can represent a Relation Ontology relation with a time component, such as the transformation-of relation.

Ian Pratt-Hartmann from the University of Manchester focused on the trials and tribulations of interval temporal logics, where he argued that one should not go the way of restricting the Allen relations but instead restrict quantification to the event-guarded case so as to maintain decidability of the language.

Discussion sessions

Aside from the questions and discussions accompanying each presentation, there were two dedicated discussion sessions: one introduced by Vlad on the temporal ER reasoning and chaired by Alessandro Artale, and the second introduced and moderated by Franz Baader from Dresden University. In addition to the majority of KRDB members and the aforementioned presenters, the other invited attendees—Carsten Lutz from Bremen University, Angelo Montanari from the University of Udine, and Frank Wolter and Boris Konev from the University of Liverpool—also participated.

The first discussion session had as its topic the interaction between logic and temporal databases, or: we have all those fancy temporal operators in the logic languages, but how can/do/should/may they translate to the database setting? Take, for instance, an evolution constraint “each employee eventually will be promoted to become a manager”, which seems to ask for a notion of possible/potential satisfiability. Or, when we have branching in the future, in which way can there be (certain) query answering that, given a particular state of the database, checks whether all branches (possible worlds) lead to some state x, or at least some of them do, versus whether such a state cannot be reached anymore in any possible future state of the database? Can one view the main temporal operators in the logics as database updates in DBMSs? Should one use the CWA for reasoning/querying about the past and the OWA for the future? Also, when we represent something in our ontology, and it is a realist ontology, then there is only a single path in the past, but we may revise our understanding and representation of what actually happened; what if that new branch in the past leads to an inconsistency in the system, and how can we keep track of such things (including how valid time and transaction time, whilst being distinct, still interact in data management)? Unsurprisingly, the discussion officially closed only after its scheduled end, and continued informally afterwards.
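For instance, the promotion constraint could be written in a first-order temporal logic as (my simplification, glossing over the subtleties raised in the discussion):

$$\Box\,\forall e\,\big(\mathit{Employee}(e) \rightarrow \Diamond\,\mathit{Manager}(e)\big)$$

which, over a database that records only a finite past, can at best be potentially satisfied rather than verified—hence the appeal to possible/potential satisfiability.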

The discussion session on the second day looked more at applications of temporal logics, such as time-stamped fact bases (ABoxes, if you wish) and how they may interact with (a)temporal TBoxes: whether DL knowledge bases are suitable for this, or whether it will be a part of a larger system with some temporal pre-processing before the data enter the fact base. For the biomed-oriented reader: Franz Baader motivated it with an interesting medical informatics example about the management of hospital data in conjunction with knowledge represented in SNOMED CT, having to manage quantitative and qualitative patient data about, e.g., hypertension, patient histories, and projections into the future about the likely development of a particular disease, and handling rules, such as (simplified here) “if the measurements of properties x, y, and z are above 1, 2, and 3 at least three times in the past hour in patient1, then classify patient1 as having disease A”.
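By way of illustration, such a rule could be approximated in a past-oriented metric temporal logic extended with a counting operator, in some hypothetical notation of mine (the talk did not fix a syntax):

$$\#_{[0,\,1\mathrm{h}]}\big(x>1 \wedge y>2 \wedge z>3\big) \geq 3 \;\rightarrow\; \mathit{hasDisease}(\mathit{patient1}, \mathit{DiseaseA})$$

where $\#_{[0,\,1\mathrm{h}]}\varphi$ counts the time points within the past hour at which $\varphi$ held.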

Overall, it was a stimulating mix of talks and a good ambience for cross-fertilization of topics, problems, and solutions. There is still plenty of research to be done.

Ontological realism, methodologies, and mud slinging: a few notes on the AO trilogy

In July, at the start of the MOWS’10 course on ontology engineering, I pointed to more background literature about the debate on ontology as reality representation: its principal references, the new comprehensive assessment of its problems by Gary Merrill [1], and the note from the Applied Ontology journal editors that Barry Smith and Werner Ceusters were writing a comprehensive rebuttal, to which Merrill would respond in turn. They’re out now [2,3], and also freely available through the dedicated AO page.

On a cursory glance, seeing some juicy sentences, Smith and Ceusters’ 50-page reply [2] seemed like a good pastime for the gray, rainy, and cold Sunday afternoon last week, and an occasion to ponder if and how I would incorporate it in an updated version of the ontology engineering course. It, however, contains many harsh statements, with the main message that they are doing a great thing with their so-called “realist methodology” and that Merrill’s critique is irrelevant. Merrill’s 30-page response to that [3], which I finished reading recently, argues that Smith and Ceusters’ clarification made matters worse, thereby confirming it is a misdirection.

So, what to make of all that? If I were a VIP in ontology engineering, I would ask the AO editors to write a proper reply to the Smith and Ceusters (BS & WC) paper. But I am not; hence, I will mention a few aspects on my blog only (which might do me more harm than good, but I hope not). I will start with a note on realism, then the usage of the term “application ontologies”, and finally the claims about BS & WC’s “realist methodology”, which is not a methodology.

Notes on realism

On the realism dimension of the debate, I have not much to say. I subscribe to what Merrill formulates as the “Empiricist Doctrine” [1], which states that “the terms of science… are to be taken to refer to the actually existing entities in the real world”, especially when it comes to ontologies for the life sciences and (bio)medicine. If you want an ontology of deities, fairies, or other story characters, that’s fine, too—just do not put them in a bio-ontology. What I had understood from the conversations, presentations, and papers of BS & WC is that if you accept the “Empiricist Doctrine”, then you must also go along with universals (as opposed to concepts). Merrill calls the latter component the “Universalist Doctrine”, where “the terms of science… are to be understood as referring directly to universals”, which is one of many metaphysical stances [1]. I do not know if I subscribe to universals, and I do not care much about that. Although I did some philosophy of science and philosophy of nature a while ago and have read up on other subjects in philosophy in the past few years, I am not a philosopher by training and do not know all the intricacies of all the alternatives around (but maybe I should).

Another reason for my misunderstanding—or: conflating the two doctrines—is that the descriptions and definitions in the BS & WC papers are not consistent throughout (elaborated on by Merrill [1,3]). For instance, in [4] ontology is taken as reality representation, but in [2] it is the representation of reality as described by science, i.e., as scientists understand it, or in other words: the representation of the theories. Thus, the things in the ontology are terms that do not have a ‘direct link’ to the actual entities, but go through the scientists’ minds with their conceptualizations of reality. This is quite a difference from [4]. Make of it what you like.

Last, the ‘funny’ thing is that when you adopt the Empiricist Doctrine, it does not matter whether you use BFO, DOLCE, GFO, or whichever foundational ontology for practical ontology development. The current formalisations of BFO, DOLCE, and the others do not state in their formalisation that the categories [unary predicates] denote either universals or concepts. Clearly, the communication of the informal intentions would be different if the top (owl:Thing or similar) in the ontology were called Universal or Concept, but in BFO it is called Entity and in DOLCE it is Particular. Thus, de facto, neither one commits to one philosophical doctrine or another in its top-level categorization and formalisation.

What are “application ontologies”?

Smith and Ceusters in [4] make a distinction between reference ontologies and application ontologies, the former intended to represent “settled science” and the latter that part of science that is in flux. This rather difficult-to-maintain distinction is discussed at length in [3]. What I wish to add, and which was only mentioned in passing in [3], is that the notion of ‘application ontology’ is used quite differently elsewhere in the ontologies enterprise: it refers to OWL- or DL-formalised conceptual data models modelled in one of the common conceptual data modelling languages (UML, EER, ORM)—that is, not real ontologies. The discussion about the difference between an ontology and a conceptual data model is beyond the current scope, but it is important to note that the same term means something different in pretty much all other literature about ontologies. Perhaps BS & WC have not read that literature, given that they happily attack computer science, knowledge engineering, and conceptual modelling (section 3.1 in [2]) with ‘justifications’ such as that Wüster-the-businessman over at ISO is a telling example of knowledge engineering and conceptual modelling (he is not), and that it was the training in cognitive psychology we all got as computer scientists (we did not) that makes us confused and stick to concepts instead of buying into the universalist doctrine. Such statements are not helpful.

Either way, application ontology as a formal conceptual data model is definitely a more tenable definition [setting aside whether one agrees with it] than application ontology as the non-settled science, given that there is no crisp boundary between settled and non-settled science. And as if the vague distinction were not enough to complicate the debate already: concepts are allowed to appear in BS & WC’s application ontologies.

About “methodologies”

Smith and Ceusters propose their “realist methodology” in section 1 of [2], but a methodology it is not—at least, not in the sense in which I, and (m)any other people in CS & IT, use the term. What BS & WC put forward is a set of principles. It does not say what to do, how, and when. And there is no empirical validation that the resultant ontologies are better (validation sensu a proper scientific experiment with subjects with/without using the ‘methodology’, measurable quality criteria, statistically significant results, etc.).

An example of a fairly straightforward methodology for ontology development is METHONTOLOGY (among others, [5]), and a more recent one for collaborative distributed ontology development is the NeOn Methodology [6]. The latter has a nice, fairly comprehensive overview picture of the interactions between the different steps (see Fig. 1, below) that are described further in [6] (and one aspect of this, the dependencies between the different steps, is examined in [7]). In my lectures, I like to be impartial and include a variety of options to sensitise ontology developers to the plethora of options (see, e.g., Sections 3 and 4 of the MOWS’10 course, which is an updated version of SemWebTech lectures 3+4, where the what comes before the how, outlined in SemWebTech lecture 5: Methods and Methodologies), but a set of principles that is labeled “methodology” is not something that fits in a real methodology section (though it may well fit in another module).

How can BS & WC even dare to propose a methodology for ontology development while disregarding all literature on ontology development (except for the OntoClean method)? If their methodology is so superior, then give me evidence of why and how it is better than all the methodologies that have been proposed over the past 15 years or so. Spoon-feed me about the shortcomings of those procedures; that is, not a lecture about realist vs. anti-realist, conceptualist, and what have you, but why I should not buy into collecting non-ontological resources, looking at ontology design patterns, providing intermediate steps for the formalization, and so forth.

Whilst reading section 1 of [2], I have been trying to extract a methodology—that is, reading it with a positive attitude to try to make something of it—but could find little, and what I did extract from it is not enough for practical ontology development and maintenance. As an example, let us take the step of “non-ontological resource reuse” for the chosen subject domain. In an ontology engineering methodology, this includes options, such as assessing chosen sources (relevant thesauri, databases, natural language text), and methods for each option, i.e., the how-to for reusing the non-ontological resources, such as manual database reverse engineering steps vs. semi-automated tools (in, say, VisioModeler, or the Protégé plugin Lubyte developed [8]), data mining and clustering, the different methods to extract terms from text, etc. From [2], e.g. section 1.13, I gather that the only way to execute this step of “non-ontological resource reuse” is for domain experts to manually read the scientific literature and manually add the knowledge to the ontology. No help from, say, KEGG, AGROVOC, ICD10, or ontologies that were already developed by other groups—all that should be ignored—let alone automating anything to find, say, candidate terms with NLP tools. That surely must be a joke (or oversight, or sheer ignorance) and does not reflect what happens during the development of OBO ontologies. Or take, e.g., METHONTOLOGY’s or MoKi’s stage of intermediate representations between the domain expert’s informal representation and its formalisation in a suitable logic language, such as pseudo-natural language, diagrams as syntactic sugar for the underlying logic, and the Protégé and OBO-Edit ODEs: are they to be ignored, too? Of course not; well, I presume that that is not the intention of BS & WC’s “methodology”.

They may have enjoyed writing a trashing of 20 years of knowledge engineering and conceptual data modelling whose outputs apparently can be ignored, but there surely is room to learn a thing or two about them. After reading up on the related work on methodologies, they can make a real attempt at developing a methodology that satisfies the set of principles, be that by developing a methodology from scratch or by integrating it into (or extending) existing methodologies. Until then, what is presented in section 1 of [2] will not—cannot—be added to a ‘methods and methodologies’ module in an ontology engineering course.

P.S.: Other views

A different online debate about realism in ontology engineering can be read over at Phil Lord’s blog (The Status quo farewell tour on realism, Why not?, and Why realism is wrong) and his paper together with Robert Stevens at PLoS ONE [9], versus David Sutherland’s Realism, Really? and Yes, really in favour of the realist approach for practical ontology development. Then there is the OBO-Foundry discussion list, and, e.g., a paper in FOIS’10 by Michel Dumontier and Robert Hoehndorf [10], and undoubtedly more papers about the issues raised in the AO trilogy will follow.

References

[1] Gary H. Merrill. Ontological realism: Methodology or misdirection? Applied Ontology, 5 (2010) 79–108.

[2] Barry Smith and Werner Ceusters. Ontological realism: A methodology for coordinated evolution of scientific ontologies. Applied Ontology, 5 (2010) 139–188.

[3] Gary H. Merrill. Realism and reference ontologies: Considerations, reflections and problems. Applied Ontology, 5 (2010) 189–221.

[4] Barry Smith. Beyond Concepts, or: Ontology as Reality Representation. Achille Varzi and Laure Vieu (eds.), Formal Ontology and Information Systems. Proceedings of the Third International Conference (FOIS 2004), Amsterdam: IOS Press, 2004, 73-84.

[5] Corcho, O., Fernandez-Lopez, M. and Gomez-Perez, A. (2003). Methodologies, tools and languages for building ontologies. Where is their meeting point? Data & Knowledge Engineering, 46(1): 41-64.

[6] Mari Carmen Suarez-Figueroa, Guadalupe Aguado de Cea, Carlos Buil, Klaas Dellschaft, Mariano Fernandez-Lopez, Andres Garcia, Asuncion Gomez-Perez, German Herrero, Elena Montiel-Ponsoda, Marta Sabou, Boris Villazon-Terrazas, and Zheng Yufei. NeOn Methodology for Building Contextualized Ontology Networks. NeOn Deliverable D5.4.1. 2008.

[7] Keet, C.M. Dependencies between Ontology Design Parameters. International Journal of Metadata, Semantics and Ontologies, 2010, 5(4): 265-284.

[8] Lina Lubyte. Techniques and Tools for the Design of Ontologies for Data Access. PhD Thesis, Free University of Bozen-Bolzano, KRDB Dissertation Series DS-2010-02, 2010.

[9] Lord, P. & Stevens, R. Adding a little reality to building ontologies for biology. PLoS One, 2010, 5(9), e12258. DOI: 10.1371/journal.pone.0012258.

[10] Dumontier, M. & Hoehndorf, R. Realism for scientific ontologies. In: Proceedings of the Sixth International Conference on Formal Ontology in Information Systems (FOIS 2010), 387–399. Amsterdam: IOS Press.

Fig 1. Graphical depiction of the different steps in ontology development, where each step has its methods and interactions with other steps (taken from [6]).

A strike against the ‘realism-based approach’ to ontology development

The ontology engineering course starting this Monday at the Knowledge Representation and Reasoning group at Meraka commences with the question What is an ontology? In addition to assessing definitions, it touches upon long-standing disagreements concerning whether ontologies are about representing reality, our conceptualization of entities in reality, or some conceptualization that does not necessarily subscribe to the existence of reality. The “representation of reality school” is advocated in ontology engineering most prominently by Barry Smith and colleagues and their foundational ontology BFO; the “conceptualization of entities in reality school” by various people and research groups, such as the LOA headed by Nicola Guarino and their DOLCE foundational ontology; whereas the “conceptualization regardless of reality school” can be (but not necessarily is) encountered in organisations developing, e.g., medical ontologies that do not subscribe to evidence-based medicine to decide what goes in the ontology and how (but instead base it on, say, the outcome of power plays between big pharma and health insurance companies).

Due to the limited time and scope of this and previous courses on ontology engineering I taught, I mention[ed] only succinctly that those differences exist (e.g., pp10-11 of the UH slides), and briefly illustrate some aspects of the debate and their possible consequences for practical ontology engineering. This information is largely based on a few papers and the consequences extracted from them, the examples they describe and that I encountered, and the discussions that took place at the various meetings, workshops, conferences, and summer schools in which I participated. But there was no nice, accessible paper that describes the debate—or even part of it—more precisely and is readable also by ontologists who are not philosophers. Until last week, that is. The Applied Ontology journal published a paper by Gary Merrill, entitled Ontological realism: Methodology or misdirection? [1], that critically assesses the ontological realism advocated by Barry Smith and his colleague Werner Ceusters. Considering its relevance to ontology engineering, the article has been made freely available, and in the announcement of the journal issue, its editors in chief (Nicola Guarino and Mark Musen) mentioned that Smith and Ceusters are busy preparing a response to Merrill’s paper, which will be published in a subsequent issue of Applied Ontology. Merrill, in turn, promised to respond to this rebuttal.

But for now, there are 30 pages of assessment of the merits of, and problems with, the philosophical underpinnings of the “realism-based approach” that is used in particular in the realm of ontology engineering within the OBO Foundry project and its large set of ontologies, BFO, and the Relation Ontology. The abstract gives an idea of the answer to the question in the paper’s title:

… The conclusion reached is that while Smith’s and Ceusters’ criticisms of prior practice in the treatment of ontologies and terminologies in medical informatics are often both perceptive and well founded, and while at least some of their own proposals demonstrate obvious merit and promise, none of this either follows from or requires the brand of realism that they propose.

The paper’s contents back this up with analysis, arguments, examples, and bolder statements than the abstract suggests.
For anyone involved in ontology development and interested in the debate—even if you think you’re tired of it—I recommend reading the paper, and at least following how the debate unfolds with responses and rebuttals.

My opinion? Well, I have one, of course, but this post is an addendum to the general course page of MOWS’10, hence I try to refrain from adding too much bias to the course material.

UPDATE (27-7-2010): On whales and apples, and on ontology and reality: you might enjoy also “Moby Dick: an exercise in ontology”, written by Lorne A. Smith.

References

[1] Gary H. Merrill. Ontological realism: Methodology or misdirection? Applied Ontology, 5 (2010) 79–108.

72010 SemWebTech lecture 9: Successes and challenges for ontologies in the life sciences

To be able to talk about successes and challenges of SWT for health care and life sciences (or any other subject domain), we first need to establish when something can be deemed a success, when a challenge, and when an outright failure. Such measures can be devised in an absolute sense (compare technology x with an SWT one: does it outperform on measure y?) and in a relative sense (to whom is technology x deemed successful?). Given these considerations, we shall take a closer look at several attempts: two successes and a few challenges in representation and reasoning. What were the problems and how were they solved, and what are the open problems and can they be resolved, respectively?

As success stories we take the experiments by Wolstencroft and coauthors about classifying protein phosphatases [1] and by Calvanese et al. on graphical, web-based, ontology-based data access applied to horizontal gene transfer data [2]. They each focus on different ontology languages and reasoning services to solve different problems. What they have in common is that there is an interaction between the ontology and the instances (and that each took a considerable amount of work by people with different specialties): the former focuses on classifying instances and the latter on querying instances. In addition, modest results of biological significance have been obtained with the classification of the protein phosphatases, whereas with the ontology-based data analysis we are tantalizingly close.

The challenges for SWT in general, and for HCLS in particular, are quite diverse: some concern the SWT proper, while others are considered by its designers—and the W3C core activities on standardization—to be outside their responsibility, but still need to be addressed. Currently, for the software aspects, the onus is on software developers and industry to pick up the proof-of-concept and working-prototype tools that have come out of academia and to bring them to the industry-grade quality that widespread adoption of SWT requires. Although this aspect should not be ignored, we shall focus on the language and reasoning limitations during the lecture.

In addition to the language and corresponding reasoning limitations that passed in review in the lectures on OWL, there are language “limitations” discussed and illustrated at length in various papers, with the most recent take in [3], where it may well be that the extensions presented in lectures 6 and 7 (parts, time, uncertainty, and vagueness) can ameliorate or perhaps even solve the problem. Some of the issues outlined by Schulz and coauthors are ‘mere’ modelling pitfalls, whereas others are real challenges that can be approximated to a greater or lesser extent. We shall look at several representation issues that go beyond the earlier examples of SNOMED CT’s “brain concussion without loss of consciousness”; e.g., how would you represent in an ontology that in most but not all cases hepatitis has fever as a symptom, how would you formalize the defined concept “Drug abuse prevention”, and (provided you are convinced it should be represented in an ontology) that the world-wide prevalence of diabetes mellitus is 2.8%?
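To see why the hepatitis example is tricky, consider the obvious DL axiom one might be tempted to write (a sketch of the pitfall, not a proposed solution):

$$\mathit{Hepatitis} \sqsubseteq \exists\,\mathit{hasSymptom}.\mathit{Fever}$$

This asserts that every instance of hepatitis has fever among its symptoms, which overstates ‘most but not all cases’; standard OWL has no construct for such defeasible or statistical generalisations, which is precisely where the extensions for uncertainty and vagueness from lectures 6 and 7 come in.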

Concerning challenges for automated reasoning, we shall look at two of the nine identified required reasoning scenarios [4]: “model checking (violation)” and “finding gaps in an ontology and discovering new relations”, thereby reiterating that it is the life scientists’ high-level, goal-driven approach and desire to use OWL ontologies with reasoning services to, ultimately, discover novel information about nature. You might find it of interest to read about the feedback received from the SWT developers upon presenting [4] here: some requirements have been met in the meantime, and new useful reasoning services were presented.

References

[1] Wolstencroft, K., Stevens, R., Haarslev, V. Applying OWL reasoning to genomic data. In: Semantic Web: revolutionizing knowledge discovery in the life sciences, Baker, C.J.O., Cheung, H. (eds), Springer: New York, 2007, 225-248.

[2] Calvanese, D., Keet, C.M., Nutt, W., Rodriguez-Muro, M., Stefanoni, G. Web-based Graphical Querying of Databases through an Ontology: the WONDER System. ACM Symposium on Applied Computing (ACM SAC’10), March 22-26 2010, Sierre, Switzerland.

[3] Stefan Schulz, Holger Stenzhorn, Martin Boeker and Barry Smith. Strengths and Limitations of Formal Ontologies in the Biomedical Domain. Electronic Journal of Communication, Information and Innovation in Health (Special Issue on Ontologies, Semantic Web and Health), 2009.

[4] Keet, C.M., Roos, M. and Marshall, M.S. A survey of requirements for automated reasoning services for bio-ontologies in OWL. Third international Workshop OWL: Experiences and Directions (OWLED 2007), 6-7 June 2007, Innsbruck, Austria. CEUR-WS Vol-258.

[5] Ruttenberg A, Clark T, Bug W, Samwald M, Bodenreider O, Chen H, Doherty D, Forsberg K, Gao Y, Kashyap V, Kinoshita J, Luciano J, Scott Marshall M, Ogbuji C, Rees J, Stephens S, Wong GT, Wu E, Zaccagnini D, Hongsermeier T, Neumann E, Herman I, Cheung KH. Advancing translational research with the Semantic Web. BMC Bioinformatics, 8, 2007.

p.s.: the first part of the lecture on 21-12 will be devoted to the remaining part of last week’s lecture; that is, a few discussion questions about [5] that are mentioned in the slides of the previous lecture.

Note: references 1 and 3 are mandatory reading, 2 and 4 are recommended, and 5 was mandatory for the previous lecture.

Lecture notes: lecture 9 – Successes and challenges for ontologies

Course website