72010 SemWebTech lecture 9: Successes and challenges for ontologies in the life sciences

To be able to talk about successes and challenges of SWT for health care and life sciences (or any other subject domain), we first need to establish when something can be deemed a success, when it is a challenge, and when it is an outright failure. Such measures can be devised in an absolute sense (compare technology x with an SWT one: does it outperform on measure y?) and relative (to whom is technology x deemed successful?) Given these considerations, we shall take a closer look at several attempts, being two successes and a few challenges in representation and reasoning. What were the problems and how did they solve it, and what are the problems and can that be resolved, respectively?

As success stories we take the experiments by Wolstencroft and coauthors about classifying protein phosphatases [1] and Calvanese et al for graphical, web-based, ontology-based data access applied to horizontal gene transfer data [2]. They each focus on different ontology languages and reasoning services to solve different problems. What they have in common is that there is an interaction between the ontology and instances (and that it was a considerable amount of work by people with different specialties): the former focuses on classifying instances and the latter on querying instances. In addition, modest results of biological significance have been obtained with the classification of the protein phosphatases, whereas with the ontology-based data analysis we are tantalizingly close.

The challenges for SWT in general and for HCLS in particular are quite diverse, of which some concern the SWT proper and others are by its designers—and W3C core activities on standardization—considered outside their responsibility but still need to be done. Currently, for the software aspects, the onus is put on the software developers and industry to pick up on the proof-of-concept and working-prototype tools that have come out of academia and to transform them into the industry-grade quality that a widespread adoption of SWT requires. Although this aspect should not be ignored, we shall focus on the language and reasoning limitations during the lecture.

In addition to the language and corresponding reasoning limitations that passed the revue in the lectures on OWL, there are language “limitations” discussed and illustrated at length in various papers, with the most recent take [3], where it might well be that the extensions presented in lecture 6 and 7 (parts, time, uncertainty, and vagueness) can ameliorate or perhaps even solve the problem. Some of the issues outlined by Schultz and coauthors are ‘mere’ modelling pitfalls, whereas others are real challenges that can be approximated to a greater or lesser extent. We shall look at several representation issues that go beyond the earlier examples of SNOMED CT’s “brain concussion without loss of consciousness”; e.g. how would you represent in an ontology that in most but not all cases hepatitis has as symptom fever, or how would you formalize the defined concept “Drug abuse prevention”, and (provided you are convinced it should be represented in an ontology) that the world-wide prevalence of diabetes mellitus is 2.8%?

Concerning challenges for automated reasoning, we shall look at two of the nine identified required reasoning scenarios [4], being the “model checking (violation)” and “finding gaps in an ontology and discovering new relations”, thereby reiterating that it is the life scientists’ high-level goal-driven approach and desire to use OWL ontologies with reasoning services to, ultimately, discover novel information about nature. You might find it of interest to read about the feedback received from the SWT developers upon presenting [4] here: some requirements are met in the meantime and new useful reasoning services were presented.

References

[1] Wolstencroft, K., Stevens, R., Haarslev, V. Applying OWL reasoning to genomic data. In: Semantic Web: revolutionizing knowledge discovery in the life sciences, Baker, C.J.O., Cheung, H. (eds), Springer: New York, 2007, 225-248.

[2] Calvanese, D., Keet, C.M., Nutt, W., Rodriguez-Muro, M., Stefanoni, G. Web-based Graphical Querying of Databases through an Ontology: the WONDER System. ACM Symposium on Applied Computing (ACM SAC’10), March 22-26 2010, Sierre, Switzerland.

[3] Stefan Schulz, Holger Stenzhorn, Martin Boekers and Barry Smith. Strengths and Limitations of Formal Ontologies in the Biomedical Domain. Electronic Journal of Communication, Information and Innovation in Health (Special Issue on Ontologies, Semantic Web and Health), 2009.

[4] Keet, C.M., Roos, M. and Marshall, M.S. A survey of requirements for automated reasoning services for bio-ontologies in OWL. Third international Workshop OWL: Experiences and Directions (OWLED 2007), 6-7 June 2007, Innsbruck, Austria. CEUR-WS Vol-258.

[5] Ruttenberg A, Clark T, Bug W, Samwald M, Bodenreider O, Chen H, Doherty D, Forsberg K, Gao Y, Kashyap V, Kinoshita J, Luciano J, Scott Marshall M, Ogbuji C, Rees J, Stephens S, Wong GT, Elizabeth Wu, Zaccagnini D, Hongsermeier T, Neumann E, Herman I, Cheung KH. Advancing translational research with the Semantic Web, BMC Bioinformatics, 8, 2007.

p.s.: the first part of the lecture on 21-12 will be devoted to the remaining part of last week’s lecture; that is, a few discussion questions about [5] that are mentioned in the slides of the previous lecture.

Note: references 1 and 3 are mandatory reading, 2 and 4 recommended to read, and 5 was mandatory for the previous lecture.

Lecture notes: lecture 9 – Successes and challenges for ontologies

Course website

Leave a comment

This site uses Akismet to reduce spam. Learn how your comment data is processed.