Ontological realism, methodologies, and mud slinging: a few notes on the AO trilogy

In July, at the start of the MOWS’10 course on ontology engineering, I pointed to more background literature on the debate about ontology as reality representation, its principal references, the new comprehensive assessment of its problems by Gary Merrill [1], and I included the note from the Applied Ontology journal editors that Barry Smith and Werner Ceusters were writing a comprehensive rebuttal, to which Merrill would respond in turn. They are out now [2,3], and also freely available through the dedicated AO page.

At a cursory glance, having spotted some juicy sentences, Smith and Ceusters’ 50-page reply [2] seemed like a good pastime for the gray, rainy, and cold Sunday afternoon last week, and an occasion to ponder if and how I would incorporate it in an updated version of the ontology engineering course. It contains, however, many harsh statements, with the main message that they are doing a great thing with their so-called “realist methodology” and that Merrill’s critique is irrelevant. Merrill’s 30-page response [3], which I finished reading recently, argues that Smith and Ceusters’ clarifications made matters worse, thereby confirming that it is a misdirection.

So, what to make of all that? If I were a VIP in ontology engineering, I would ask the AO editors for the opportunity to write a proper reply to the Smith and Ceusters (BS & WC) paper. But I am not; hence, I will mention a few aspects on my blog only (which might do me more harm than good, but I hope not). I will start with a note on realism, then the usage of the term “application ontologies”, and finally the claims about BS & WC’s “realist methodology” that is not a methodology.

Notes on realism

On the realism dimension of the debate, I do not have much to say. I subscribe to what Merrill formulates as the “Empiricist Doctrine” [1], which states that “the terms of science… are to be taken to refer to the actually existing entities in the real world”, especially when it comes to ontologies for the life sciences and (bio)medicine. If you want an ontology of deities, fairies, or other story characters, that is fine, too—just do not put them in a bio-ontology. What I had understood from the conversations, presentations, and papers of BS & WC is that if you accept the “Empiricist Doctrine”, then you must also go along with universals (as opposed to concepts). Merrill calls the latter component the “Universalist Doctrine”, where “the terms of science… are to be understood as referring directly to universals”, which is one of many metaphysical stances [1]. I do not know whether I subscribe to universals, and I do not care that much about it. Although I did some philosophy of science and philosophy of nature a while ago and have read up on other subjects in philosophy in the past few years, I am not a philosopher by training and do not know all the intricacies of all the alternatives around (but maybe I should).

Another reason for my misunderstanding—or: conflating the two doctrines—is that the descriptions and definitions in the BS & WC papers are not consistent throughout (elaborated on by Merrill [1,3]). For instance, in [4], an ontology is taken as a representation of reality, but in [2] it is a representation of reality as described by science, i.e., as scientists understand it, or, in other words, a representation of the theories. Thus, the things in the ontology are terms that do not have a ‘direct link’ to the actual entities, but go through the scientists’ minds with their conceptualizations of reality. This is quite a difference from [4]. Make of it what you like.

Last, the ‘funny’ thing is that when you adopt the Empiricist Doctrine, it does not matter whether you use BFO, DOLCE, GFO, or whichever foundational ontology for practical ontology development. The current formalisations of BFO, DOLCE, and the others do not state anywhere that the categories [unary predicates] denote either universals or concepts. Clearly, the communication of the informal intentions would be different if the top (owl:Thing or similar) in the ontology were called Universal or Concept, but in BFO it is called Entity and in DOLCE it is Particular. Thus, de facto, neither one commits to one philosophical doctrine or another in its top-level categorization and formalisation.
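The point can be made concrete with the axiom pattern itself. A minimal first-order sketch (my own illustration with a made-up example class, not taken from the actual BFO or DOLCE axiomatisations): the taxonomic axioms use the top-level category as a plain unary predicate,

```latex
% Subsumption axioms with the top-level category as an ordinary
% unary predicate; the pattern is the same in any foundational ontology:
\forall x\, \big( \mathit{Cell}(x) \rightarrow \mathit{MaterialEntity}(x) \big) \\
\forall x\, \big( \mathit{MaterialEntity}(x) \rightarrow \mathit{Entity}(x) \big)
```

and uniformly renaming Entity to Universal or to Concept yields exactly the same deductions, so the formalisation itself carries no commitment either way.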

What are “application ontologies”?

Smith and Ceusters in [4] make a distinction between reference ontologies and application ontologies, the former intended to represent “settled science” and the latter that part of science that is in flux. This rather difficult-to-maintain distinction is discussed at length in [3]. What I wish to add, and what was only mentioned in passing in [3], is that the notion of ‘application ontology’ is used quite differently elsewhere in the ontologies enterprise: it refers to OWL- or DL-formalised conceptual data models modeled in one of the common conceptual data modelling languages (UML, EER, ORM), not to real ontologies. The discussion about the difference between an ontology and a conceptual data model is beyond the current scope, but it is important to note that the same term means something different in pretty much all other literature about ontologies. Perhaps BS & WC have not read that literature, given that they happily attack computer science, knowledge engineering, and conceptual modelling (section 3.1 in [2]) with ‘justifications’ such as that Wüster-the-businessman over at ISO is a telling example of knowledge engineering and conceptual modelling (he is not), and that it was the training in cognitive psychology we all got as computer scientists (we did not) that makes us confused and stick to concepts instead of buying into the universalist doctrine. Such statements are not helpful.

Either way, application ontology as a formal conceptual data model is definitely a more tenable definition [setting aside whether one agrees with it] than application ontology as an ontology of non-settled science, if only because there is no crisp boundary between settled and non-settled science. And as if the vague distinction were not enough to complicate the debate already: concepts are allowed to appear in BS & WC’s application ontologies.

About “methodologies”

Smith and Ceusters propose their “realist methodology” in section 1 of [2], but a methodology it is not—at least, not in the sense in which I, and (m)any other people in CS & IT, use the term. What BS & WC put forward is a set of principles. It does not say what to do, how, and when. And there is no empirical validation that the resultant ontologies are better (validation sensu a proper scientific experiment with subjects developing ontologies with and without the ‘methodology’, measurable quality criteria, statistically significant results, etc.).

An example of a fairly straightforward methodology for ontology development is METHONTOLOGY (among others [5]), and a more recent one for collaborative distributed ontology development is the NeOn Methodology [6]. The latter has a nice, fairly comprehensive overview picture of the different steps (see Fig. 1, below) that are described further in [6] (and an aspect of this are the interactions between the different steps [7]). In my lectures, I like to be impartial and include a variety of options to sensitise ontology developers to the plethora of options (see, e.g., Sections 3 and 4 of the MOWS’10 course, which is an updated version of SemWebTech lectures 3+4, where the what comes before the how, outlined in SemWebTech lecture 5: Methods and Methodologies), but a set of principles that is labeled “methodology” is not something that fits in a real methodology section (though it may well fit in another module).

How can BS & WC even dare to propose a methodology for ontology development while disregarding all the literature on ontology development (except for the OntoClean method)? If their methodology is so superior, then give me evidence why and how it is better than all the methodologies that have been proposed over the past 15 years or so. Spoon-feed me the shortcomings of those procedures; that is, not a lecture about realists vs. anti-realists, conceptualists, and what have you, but why I should not buy into collecting non-ontological resources, looking at ontology design patterns, providing intermediate steps for the formalization, and so forth.

Whilst reading section 1 of [2], I have been trying to extract a methodology—that is, reading it with a positive attitude to try to make something of it—but could find little, and what I did extract from it is not enough for practical ontology development and maintenance. As an example, let us take the step of “non-ontological resource reuse” for the chosen subject domain. In an ontology engineering methodology, this includes options, such as assessing candidate sources (relevant thesauri, databases, natural language text), and methods for each option, i.e., how to reuse the non-ontological resources: the manual database reverse engineering steps vs. semi-automated tools (in, say, VisioModeler, or the Protégé plugin Lubyte developed [8]), data mining and clustering, the different methods to extract terms from text, etc. From [2], e.g., section 1.13, I gather that the only way to execute this step of “non-ontological resource reuse” is for domain experts to manually read the scientific literature and manually add the knowledge to the ontology. No help from, say, KEGG, AGROVOC, ICD10, or ontologies that were already developed by other groups—all that should be ignored—let alone automating anything to find, say, candidate terms automatically with NLP tools. That surely must be a joke (or an oversight, or sheer ignorance) and does not reflect what happens during the development of OBO ontologies. Or take, e.g., METHONTOLOGY’s or MoKi’s stage of intermediate representations between the domain expert’s informal representation and its formalisation in a suitable logic language, such as pseudo-natural language, diagrams as syntactic sugar for the underlying logic, or the Protégé and OBO-Edit ODEs: are they to be ignored, too? Of course not; well, I presume that that is not the intention of BS & WC’s “methodology”.
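To give an impression of how little it takes to go beyond purely manual reading: even the crudest NLP—a frequency count of recurring tokens in domain text, for a curator to vet—already yields candidate terms. A minimal sketch in plain Python (the toy corpus and stop-word list are made up for illustration; a real pipeline would use POS tagging and multi-word term extraction):

```python
import re
from collections import Counter

# Tiny, illustrative stop-word list; real tools ship far larger ones.
STOP = {"the", "a", "an", "of", "in", "and", "or", "is", "are", "to", "that"}

def candidate_terms(text, min_freq=2):
    """Naive candidate-term extraction: lowercase word tokens,
    drop stop words, keep those occurring at least min_freq times."""
    tokens = re.findall(r"[a-zA-Z][a-zA-Z-]+", text.lower())
    counts = Counter(t for t in tokens if t not in STOP)
    return [term for term, n in counts.most_common() if n >= min_freq]

# Toy 'domain corpus', purely for illustration.
corpus = ("The epithelial cell differentiates into a secretory cell. "
          "Each secretory cell is part of the epithelial tissue.")
print(candidate_terms(corpus))  # ['cell', 'epithelial', 'secretory']
```

The output is only a list of candidates to be checked by the domain expert, which is exactly the division of labour a methodology should spell out.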

They may have enjoyed writing a trashing of 20 years of knowledge engineering and conceptual data modelling whose outputs apparently can be ignored, but there surely is room to learn a thing or two about those fields. After reading up on the related work on methodologies, they could make a real attempt at developing a methodology that satisfies the set of principles, be that by developing a methodology from scratch or by integrating the principles into (or extending) existing methodologies. Until then, what is presented in section 1 of [2] will not—cannot—be added to a ‘methods and methodologies’ module in an ontology engineering course.

P.S.: Other views

A different online debate about realism in ontology engineering can be read over at Phil Lord’s blog (The Status quo farewell tour on realism, Why not?, and Why realism is wrong) and his paper together with Robert Stevens at PLoS ONE [9], versus David Sutherland’s Realism, Really? and Yes, really in favour of the realist approach for practical ontology development. Then there is the OBO-Foundry discussion list, and, e.g., a paper in FOIS’10 by Michel Dumontier and Robert Hoehndorf [10], and undoubtedly more papers about the issues raised in the AO trilogy will follow.

References

[1] Gary H. Merrill. Ontological realism: Methodology or misdirection? Applied Ontology, 5 (2010) 79–108.

[2] Barry Smith and Werner Ceusters. Ontological realism: A methodology for coordinated evolution of scientific ontologies. Applied Ontology, 5 (2010) 139–188.

[3] Gary H. Merrill. Realism and reference ontologies: Considerations, reflections and problems. Applied Ontology, 5 (2010) 189–221.

[4] Barry Smith. Beyond Concepts, or: Ontology as Reality Representation. Achille Varzi and Laure Vieu (eds.), Formal Ontology and Information Systems. Proceedings of the Third International Conference (FOIS 2004), Amsterdam: IOS Press, 2004, 73-84.

[5] Corcho, O., Fernandez-Lopez, M. and Gomez-Perez, A. (2003). Methodologies, tools and languages for building ontologies. Where is their meeting point? Data & Knowledge Engineering 46(1): 41-64.

[6] Mari Carmen Suarez-Figueroa, Guadalupe Aguado de Cea, Carlos Buil, Klaas Dellschaft, Mariano Fernandez-Lopez, Andres Garcia, Asuncion Gomez-Perez, German Herrero, Elena Montiel-Ponsoda, Marta Sabou, Boris Villazon-Terrazas, and Zheng Yufei. NeOn Methodology for Building Contextualized Ontology Networks. NeOn Deliverable D5.4.1. 2008.

[7] Keet, C.M. Dependencies between Ontology Design Parameters. International Journal of Metadata, Semantics and Ontologies, 2010, 5(4): 265-284.

[8] Lina Lubyte. Techniques and Tools for the Design of Ontologies for Data Access. PhD Thesis, Free University of Bozen-Bolzano, KRDB Dissertation Series DS-2010-02, 2010.

[9] Lord, P. & Stevens, R. Adding a little reality to building ontologies for biology. PLoS One, 2010, 5(9), e12258. DOI: 10.1371/journal.pone.0012258.

[10] Dumontier, M. & Hoehndorf, R. Realism for scientific ontologies. In: Formal Ontology in Information Systems: Proceedings of the Sixth International Conference (FOIS 2010), 387–399. Amsterdam: IOS Press.

Fig 1. Graphical depiction of the different steps in ontology development, where each step has its methods and interactions with other steps (taken from [6]).


SemWebTech lecture 8: SWT for HCLS background and data integration

After the ontology languages and the general aspects of ontology engineering, we now delve into one specific application area: SWT for health care and life sciences. Its frontrunners in bioinformatics were adopters of some of the Semantic Web ideas even before Berners-Lee, Hendler, and Lassila wrote their Scientific American paper in 2001, even though they did not formulate their needs and intentions in the same terminology: they did want shared, controlled vocabularies with the same syntax, to facilitate data integration—or at least interoperability—across Web-accessible databases; a common space for identifiers; a dynamic, changing system; the means to organize and query incomplete biological knowledge; and, albeit not stated explicitly, it all still needed to be highly scalable [1].
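That “common space for identifiers” is the crux of the integration story: once two data sources annotate their records with the same controlled identifiers, joining them reduces to a lookup. A minimal sketch in Python (the GO-style identifiers and entries below are made up purely for illustration, not actual GO content):

```python
# Two toy 'databases' that both annotate records with shared,
# GO-style identifiers (IDs and entries are illustrative only).
gene_db = {
    "GO:1234567": {"gene": "geneA", "organism": "H. sapiens"},
    "GO:7654321": {"gene": "geneB", "organism": "D. rerio"},
}
term_db = {
    "GO:1234567": "toy process one",
    "GO:7654321": "toy process two",
}

def integrate(genes, terms):
    """Join two annotation sources on their shared identifiers."""
    return {go_id: {**rec, "term": terms[go_id]}
            for go_id, rec in genes.items() if go_id in terms}

merged = integrate(gene_db, term_db)
print(merged["GO:1234567"]["term"])  # the shared ID resolves the label
```

Without the shared identifier space, the same join would require fragile string matching on term labels, which is precisely the interoperability problem the controlled vocabularies were meant to remove.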

Bioinformaticians and domain experts in genomics had already organized themselves in the Gene Ontology Consortium, which was set up officially in 1998 to realize a solution for these requirements. The results exceeded anyone’s expectations, for a range of reasons. Many tools for the Gene Ontology (GO) and its common KR format, .obo, have been developed, and other research groups adopted the approach to develop controlled vocabularies, either by extending the GO, e.g., with rice traits, or by adding their own subject domain, such as zebrafish anatomy and mouse developmental stages. This proliferation, as well as the OWL development and standardization process that was going on at about the same time, pushed the goal posts further: new expectations were put on the GO and its siblings and on their tools, and the proliferation had become a bit too unwieldy to keep a good overview of what was going on and how those ontologies would fit together. Put differently, some people noticed the inferencing possibilities to be gained from moving from obo to OWL, and others thought that some coordination among all those obo bio-ontologies would be advantageous, given that post-hoc integration of ontologies of related and overlapping subject domains is not easy. Thus came into being the OBO Foundry to address such issues, proposing a methodology for the coordinated evolution of ontologies to support biomedical data integration [2].
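Part of why the .obo format spread so quickly is its simplicity: a term is a plain tag-value stanza, parseable in a handful of lines. A sketch (the parser is my own minimal illustration, handling only a single [Term] stanza; the stanza shown follows the GO style):

```python
def parse_obo_stanza(stanza):
    """Parse one [Term] stanza of 'tag: value' lines into a dict;
    repeatable tags such as is_a are collected into lists."""
    term, repeatable = {}, {"is_a", "synonym", "xref"}
    for line in stanza.strip().splitlines():
        if line.startswith("[") or ":" not in line:
            continue  # skip the stanza header and blank/odd lines
        tag, value = line.split(":", 1)
        tag, value = tag.strip(), value.strip()
        if tag in repeatable:
            term.setdefault(tag, []).append(value)
        else:
            term[tag] = value
    return term

stanza = """[Term]
id: GO:0000001
name: mitochondrion inheritance
is_a: GO:0048308 ! organelle inheritance
"""
print(parse_obo_stanza(stanza)["name"])  # mitochondrion inheritance
```

The contrast with OWL is then easy to see: the stanza records a bare is_a link and nothing more, whereas an OWL rendering of the same term can carry the axioms that make automated reasoning worthwhile.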

People in related disciplines, such as ecology, have taken on board the experiences of these very early adopters, and decided instead to jump on board after the OWL standardization. They, however, were not motivated by data(base) integration alone. Referring to Madin et al.’s paper [3] again, I highlight three points they made: “terminological ambiguity slows scientific progress, leads to redundant research efforts, and ultimately impedes advances towards a unified foundation for ecological science”, i.e., an identification of some serious problems they have in ecological research; “Formal ontologies provide a mechanism to address the drawbacks of terminological ambiguity in ecology”, i.e., what they expect ontologies will solve for them (disambiguation); and “and fill an important gap in the management of ecological data by facilitating powerful data discovery based on rigorously defined, scientifically meaningful terms”, i.e., the purpose for which they want to use ontologies and any associated computation (discovery). That is, ontologies not as a tool—one of many possible—in the engineering/infrastructure sense, but as a required part of a method in the scientific investigation that aims to discover new information and knowledge about nature (i.e., in answering the who, what, where, when, and how things are the way they are in nature).

What has all this to do with actual Semantic Web technologies? On the one hand, there are multiple data integration approaches and tools that have been, and are being, tried out by domain experts, bioinformaticians, and interdisciplinary-minded computer scientists [4], and, on the other hand, there are the W3C Semantic Web standards XML, RDF(S), SPARQL, and OWL. Some use these standards to achieve data integration, some do not. Since this is a Semantic Web course, we shall take a look at two efforts that (try to) do, which came forth from the activities of the W3C’s Health Care and Life Sciences Interest Group. More precisely, we take a closer look at a paper written about three years ago [5] that reports on a case study of getting those Semantic Web technologies to work in order to achieve data integration and a range of other things. There is also a more recent paper from the HCLS IG [6], where they aimed not only at linking data but also at querying distributed data, using a mixture of RDF triple stores and SKOS. Both papers reveal their understanding of the purposes of SWT and, moreover, what their goals are, their experimentation with various technologies to achieve them, and where there is still some work to do. There are notable achievements described in these, and related, papers, but the sought-after “killer app” is yet to be announced.

The lecture will cover a ‘historical’ overview and what more recent ontology adopters focus on, the very basics of the data integration approaches that motivated the development of ontologies, and an analysis of some technological issues and challenges mentioned in [5] concerning Semantic Web (or not) technologies.

References:

[1] The Gene Ontology Consortium. Gene ontology: tool for the unification of biology. Nature Genetics, May 2000;25(1):25-9.

[2] Barry Smith, Michael Ashburner, Cornelius Rosse, Jonathan Bard, William Bug, Werner Ceusters, Louis J. Goldberg, Karen Eilbeck, Amelia Ireland, Christopher J Mungall, The OBI Consortium, Neocles Leontis, Philippe Rocca-Serra, Alan Ruttenberg, Susanna-Assunta Sansone, Richard H Scheuermann, Nigam Shah, Patricia L. Whetzel, Suzanna Lewis. The OBO Foundry: coordinated evolution of ontologies to support biomedical data integration. Nature Biotechnology 25, 1251-1255 (2007).

[3] Joshua S. Madin, Shawn Bowers, Mark P. Schildhauer and Matthew B. Jones. (2008). Advancing ecological research with ontologies. Trends in Ecology & Evolution, 23(3): 159-168.

[4] Erhard Rahm. Data Integration in Bioinformatics and Life Sciences. EDBT Summer School, Bolzano, Sep. 2007.

[5] Ruttenberg A, Clark T, Bug W, Samwald M, Bodenreider O, Chen H, Doherty D, Forsberg K, Gao Y, Kashyap V, Kinoshita J, Luciano J, Scott Marshall M, Ogbuji C, Rees J, Stephens S, Wong GT, Elizabeth Wu, Zaccagnini D, Hongsermeier T, Neumann E, Herman I, Cheung KH. Advancing translational research with the Semantic Web, BMC Bioinformatics, 8, 2007.

[6] Kei-Hoi Cheung, H Robert Frost, M Scott Marshall, Eric Prud’hommeaux, Matthias Samwald, Jun Zhao, and Adrian Paschke. A journey to Semantic Web query federation in the life sciences. BMC Bioinformatics 2009, 10(Suppl 10):S10

Note: references 1, 2, and (5 or 6) are mandatory reading, and 3 and 4 are recommended to read.

Lecture notes: lecture 8 – SWLS background and data integration

Course website