Bottom-up ontology development using bio-diagrams

Development of (bio-)ontologies takes up a lot of resources, especially when conducted manually. This is a well-known hurdle to overcome, and various strategies and tools for bottom-up ontology development have been proposed from a computing angle, such as the reverse engineering of databases and, most prominently in the bio-ontologies area, natural language processing (NLP) (e.g. [1,2] and a review by [3]). Both, however, generate a rather crude, noisy, and simple ontology that requires substantial manual intervention to clean up and to add ‘missing’ knowledge. Nevertheless, NLP provides at least a set of terms one can start with instead of starting with an empty screen and adding everything de novo. There is, however, a way to have your cake and eat it too: exploiting the plentiful diagrams in the life sciences.

Diagrams are very important in biology, and from early on in their education, students are taught to read and draw them. There is even a rule of thumb that one should be able to understand an article by reading the abstract, conclusions, and diagrams alone. Diagrams also summarise the accompanying text, and can even tell more than what is explained in the text. So much for the biology side. They can be useful from the computing angle as well. They are at least semi-structured (compared to natural language), with conventions about depicting lipid bi-layers, DNA, sequences of interactions by means of arrows, and so forth, and over the years more and more drawing applications have been developed. The nice thing (still for computing) is that those tools have an ‘alphabet’ (a legend) with permissible icons and colours and rules for how they can be used in the diagrams. There are many diagrams that represent our understanding of biological reality.

Now, imagine that those diagrams can be transferred into an ontology in one fell swoop, and subsequently used for whatever purpose ontologies are being used (such as annotation, consistency checking, and finding implicit knowledge). And because those diagrams are more structured than natural language, we can obtain a richer ontology than with NLP alone—with less effort.


Recognizing that there is much to be gained in improving bottom-up bio-ontology development by availing of such diagrams (already observed in [4]) is one thing; how to go about doing this in the most effective way, not for just one diagram tool but for any of them, is another. I aim to tackle this problem in the paper “Bottom-up ontology development reusing semi-structured life sciences diagrams” [5], which was recently accepted for the AFRICON’11 Special Session on Robotics and AI in Africa. This 6-page paper is a very condensed version of its 12-page draft, so not everything could be included. Nevertheless, it does give the basics of the method to formalize bio-diagrams in an ontology and a use case to demonstrate it.

The approach consists of a four-stage process: (i) choosing the appropriate language (OBO, SKOS, OWL, and arbitrary FOL are considered), (ii) inclusion of a foundational ontology (DOLCE, BFO, RO, etc.), (iii) formalizing the icons of the diagram tool’s ‘legend’ (e.g., ‘enzyme’), and (iv) devising an algorithm that mines the actual diagrams to populate the TBox, so that the individual components (e.g., ‘protease’) end up in the right position in the ontology. The main details are described in the paper.
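To make stages (iii) and (iv) a bit more concrete, here is a minimal sketch in Python. It is not the algorithm from the paper: the `LEGEND` mapping, the category names, and the `Node` and `populate_tbox` names are all made up for illustration. It only shows the general idea that a legend icon is formalised once, and each diagram component then lands under the class of the icon it was drawn with.

```python
from dataclasses import dataclass

# Stage (iii), illustrative only: each legend icon is formalised once as a
# class placed under a foundational-ontology category. The category names
# are placeholders, not the mappings chosen in the paper.
LEGEND = {
    "enzyme": "MaterialEntity",
    "small_molecule": "MaterialEntity",
    "cell_process": "Process",
}


@dataclass(frozen=True)
class Node:
    label: str  # text on the diagram element, e.g. "protease"
    icon: str   # legend icon used to draw it, e.g. "enzyme"


def populate_tbox(nodes):
    """Stage (iv), sketched: return (subclass, superclass) pairs plus leftovers."""
    # Legend icons become classes under their foundational category.
    axioms = {(icon, category) for icon, category in LEGEND.items()}
    unmapped = []
    for node in nodes:
        if node.icon in LEGEND:
            # A diagram component becomes a subclass of its icon's class.
            axioms.add((node.label, node.icon))
        else:
            # Icons missing from the legend are left for manual curation.
            unmapped.append(node)
    return axioms, unmapped


if __name__ == "__main__":
    diagram = [Node("protease", "enzyme"), Node("apoptosis", "cell_process")]
    axioms, unmapped = populate_tbox(diagram)
    for sub, sup in sorted(axioms):
        print(f"{sub} is-a {sup}")
```

Anything drawn with an icon that is not in the legend is returned separately rather than guessed at, which keeps the manual clean-up step explicit.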

Thus, this bottom-up method does not merely formalise ‘legacy’ information, but also takes into account subject domain semantics, which can be represented better by using a foundational ontology during the principal transformation of the diagram’s vocabulary. In addition to the more precise, formal representation of the subject domain semantics, the use of a foundational ontology also increases interoperability.

The guidelines are demonstrated with a transformation of the Pathway Studio [6] diagrams into an OWLized (OWL 2 DL) bio-ontology with BFO and RO.
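As a rough illustration of what ‘OWLized’ output could look like, the sketch below serialises a handful of class and relation assertions into OWL 2 functional-style syntax. It is an assumption-laden toy, not the transformation from the paper: the function name `to_owl_functional`, the example IRI, and the class and property names are hypothetical placeholders rather than actual BFO, RO, or Pathway Studio identifiers, and labels are assumed to be valid local names already.

```python
def to_owl_functional(ontology_iri, subclass_pairs, relation_triples):
    """Serialise axioms as OWL 2 functional-style syntax (illustrative sketch).

    subclass_pairs:   iterable of (subclass, superclass) names
    relation_triples: iterable of (class, property, class) names, each read as
                      'every <class> is related by <property> to some <class>'
    """
    subclass_pairs = sorted(set(subclass_pairs))
    relation_triples = sorted(set(relation_triples))

    # Collect every class and property name that needs a declaration.
    classes = {c for pair in subclass_pairs for c in pair}
    classes |= {c for s, _, o in relation_triples for c in (s, o)}
    properties = {p for _, p, _ in relation_triples}

    lines = [f"Prefix(:=<{ontology_iri}#>)", f"Ontology(<{ontology_iri}>"]
    lines += [f"  Declaration(Class(:{c}))" for c in sorted(classes)]
    lines += [f"  Declaration(ObjectProperty(:{p}))" for p in sorted(properties)]
    lines += [f"  SubClassOf(:{sub} :{sup})" for sub, sup in subclass_pairs]
    # An arrow in the diagram is rendered here as an existential restriction.
    lines += [
        f"  SubClassOf(:{s} ObjectSomeValuesFrom(:{p} :{o}))"
        for s, p, o in relation_triples
    ]
    lines.append(")")
    return "\n".join(lines)


if __name__ == "__main__":
    print(
        to_owl_functional(
            "http://example.org/bio-diagram",  # placeholder IRI
            [("protease", "enzyme"), ("proteolysis", "cell_process")],
            [("proteolysis", "has_participant", "protease")],
        )
    )
```

Rendering each arrow as an existential restriction is one design choice among several; whether that reading is appropriate depends on what the diagram tool’s legend says the arrow means.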

As an aside (from my perspective), it may be of interest to note that such formalized diagrams can then also be deployed as an intermediate representation of the knowledge, which can facilitate understanding and communication between logicians and domain experts. And, for the financially challenged: it can bring the information modelled in such diagrams, which are often locked in expensive hardcopy textbooks and pay-per-view scientific articles, into the open access domain for free use and reuse.


[1] Alexopoulou D, Wachter T, Pickersgill L, Eyre C, Schroeder M. Terminologies for text-mining: an experiment in the lipoprotein metabolism domain. BMC Bioinformatics 2008;9(Suppl 4).

[2] Coulet A, Shah NH, Garten Y, Musen M, Altman RB. Using text to build semantic networks for pharmacogenomics. Journal of Biomedical Informatics 2010;43(6):1009-19.

[3] Liu K, Hogan WR, Crowley RS. Natural language processing methods and systems for biomedical ontology learning. Journal of Biomedical Informatics 2011;44(1):163-79.

[4] Keet CM. Factors affecting ontology development in ecology. In: Ludaescher B, Raschid L, editors. Data Integration in the Life Sciences 2005 (DILS2005); vol. 3615 of LNBI. Springer Verlag; 2005, p. 46-62. San Diego, USA, 20-22 July 2005.

[5] Keet CM. Bottom-up ontology development reusing semi-structured life sciences diagrams. AFRICON’11 — Special Session on Robotics and Artificial Intelligence in Africa, Livingstone, Zambia 13-15 September, 2011. IEEE (to appear).

[6] Nikitin A, Egorov S, Daraselia N, Mazo I. Pathway studio—the analysis and navigation of molecular networks. Bioinformatics 2003;19(16):2155-2157.


7 responses to “Bottom-up ontology development using bio-diagrams”

    • Ontologies can be usefully applied to solve a range of problems, such as database and application integration, coordination among web services, and improving the quality of conceptual data models. I would not call that “our savior”.

      • you must become the poster girl for ontologies … and get more people into your cult … you must spread the word and influence the young ones ..

    • Hola Rey,
      I’m not sure I understand your question. If you are asking whether a foundational ontology (as artifact) is one that contains the most general entities, such as endurant, process, participation, and part-of, then the answer is yes (they need not necessarily be only those concerning human activity though, and equally well can, and should be able to, be applied to nature).
      Cordiales saludos,

      • Hi Maria:
        No, I mean, if somebody constructs an ontology (from scratch), containing the description of human activity in most general terms. For example, in any human activity there is at least one subject, an object, the external conditions where activity is taking place, etc. And there are general object relations: Subject acts on Object, Subject acts in External Conditions, etc.


      • Hi Rey,

        that may be a “top-domain” ontology, which has generic entities [*] for a specific domain, or a “domain” ontology, which has more detailed entities that are often specific to that subject domain and which may very well (and is expected to) reuse some of the generic entities of a foundational ontology. For instance, a top-domain ontology in cell biology has general things like Protein and Environment (which do not appear in a foundational ontology), and a domain ontology has entities like 3-chlorobenzoate.

        The boundaries of what goes in which ontology are not hard ones though–at least, I have not come across unambiguous and widely agreed-upon theoretical and operational criteria other than the clear distinctions between a foundational and a domain ontology (ok, and “application ontology” that is actually a conceptual data model).

        Kind regards,

        [*] entities being n-ary predicates where n>=1; called classes and object properties in OWL, and elsewhere universals and relationships, etc.
