Lecture notes for the ontologies and knowledge bases course

The regular reader may recollect earlier posts about the ontology engineering courses I have taught at FUB, UH, UCI, Meraka, and UKZN. Each one had some sort of syllabus or series of blog posts with some introductory notes. I’ve put them together and extended them significantly now for the current installment of the Ontologies and Knowledge Bases Honours module (COMP718) at UKZN, and they are bound and printed into lecture notes for the enrolled students. These lecture notes are now online and I will add accompanying slides on the module’s webpage as we go along in the semester.

Given that the target audience is computer science students in their 4th year (honours), the notes are of an introductory nature. There are essentially three blocks: logic foundations, ontology engineering, and advanced topics. The logic foundations contain a recap of FOL, basics of Description Logics with ALC, all the DL-based OWL species, and some automated reasoning. The ontology engineering block covers top-down and bottom-up ontology development, and methods and methodologies, with top-down ontology development including mainly foundational ontologies and part-whole relations, and bottom-up the various approaches to extract knowledge from ‘legacy’ representations, such as from databases and thesauri. The advanced topics are balanced in two directions: one is toward ontology-based data access applications (i.e., an ontology-drive information system) and the other one has more theory with temporal ontologies.

Each chapter has a section with recommended/required reading and a set of exercises.

Unsurprisingly, the lecture notes have been written under time constraints and therefore the level of relative completeness of sections varies slightly. Suggestions and corrections are welcome!

The DiDOn method to develop bio-ontologies from semi-structured life science diagrams

It is well-known among (bio-)ontology developers that ontology development is a resource-consuming task (see [1] for data backing up this claim). Several approaches and tools do exists that speed up the time-consuming efforts of bottom-up ontology development, most notably natural language processing and database reverse engineering. They are generic and the technologies have been proposed from a computing angle, and are therefore noisy and/or contain many heuristics to make them fit for bio-ontology development. Yet, the most obvious one from a domain expert perspective is unexplored: the abundant diagrams in the sciences that function as existing/’legacy’ knowledge representation of the subject domain. So, how can one use them to develop domain ontologies?

The new DiDOn procedure—from Diagram to Domain Ontology—can speed up and simplify bio-ontology development by exploiting the knowledge represented in such semi-structured bio-diagrams. It does this by means of extracting explicit and implicit knowledge, preserving most of the subject domain semantics, and making formalisation decisions explicit, so that the process is done in a clear, traceable, and reproducible way.

DiDOn is a detailed, micro-level, procedure to formalise those diagrams in a logic of choice; it provides migration paths into OBO, SKOS, OWL and some arbitrary FOL, and guidelines which axioms, and how, have to be added to the bio-ontology. It also uses a foundational ontology so as to obtain more precise and interoperable subject domain semantics than otherwise would have been possible with syntactic transformations alone. (Choosing an appropriate foundational ontology is a separate topic and can be done wit, e.g., ONSET.)

The paper describing the rationale and details, Transforming semi-structured life science diagrams into meaningful domain ontologies with DiDOn [2], has just been accepted at the Journal of Biomedical Informatics. They require a graphical abstract, so here it goes:

DiDOn consists of two principal steps: (1) formalising the ‘icon vocabulary’ of a bio-drawing tool, which then functions as a seed ontology, and (2) populating the seed ontology by processing the actual diagrams. The algorithm in the second step is informed by the formalisation decisions taken in the first step. Such decisions include, among others, the representation language and how to represent the diagram’s n-aries (with n≥2, such as choosing between n-aries as relationship or reified as classes).

In addition to the presentation of DiDOn, the paper contains a detailed application of it with Pathway Studio as case study.

The neatly formatted paper is behind a paywall for those with no or limited access to Elsevier’s journals, but the accepted manuscript is openly accessible from my home page.

References

[1] Simperl, E., Mochol, M., Bürger, T. Achieving maturity: the state of practice in ontology engineering in 2009. International Journal of Computer Science and Applications, 2010, 7(1):45-65.

[2] Keet, C.M. Transforming semi-structured life science diagrams into meaningful domain ontologies with DiDOn. Journal of Biomedical Informatics. In print. DOI: http://dx.doi.org/10.1016/j.jbi.2012.01.004