72010 SemWebTech lectures 3+4: Ontology Engineering Top-down and Bottom-up

Ontology languages were introduced in the previous two lectures, but one may well ask what an ontology actually is (and whether, perhaps, the two topics should have been covered in reverse order). There is no unanimously agreed-upon definition of what an ontology is, and the definitions proposed within computer science have changed over the past 20 years (see, e.g., [1]). For better or worse, the current tendency is to treat an ontology as equivalent to a logical theory: even a thesaurus formalized in OWL then ends up as a simple ‘ontology’ (e.g., the NCI thesaurus as a cancer ontology), and a conceptual data model originally in EER or UML becomes an ‘application ontology’ by virtue of being formalized in OWL. The salient merits of the various definitions will be reviewed at the beginning of the lecture on the 23rd of November. As an initial indication of ‘things that have to do with an ontology’, I include the Ontology Summit’s “Dimension map”, which its authors intend as a “Template for discourse” about ontologies and for which a brief and a longer explanation are available.

The “Dimension map” of ontologies, made by the attendees of the Ontology Summit 2007 (intended as a “Template for discourse”)

The main focus of lectures 3 and 4, however, is ontology development: having a language is one thing, and in the lab on 24 November you will practice with software-supported ontology development environments, but what to represent, and how, is quite another. Where do you start? How can you avoid reinventing the wheel? What can guide you to make the process easier to carry out successfully? How can you make the best of ‘legacy’ material? There are two principal approaches, so-called top-down and bottom-up ontology development, which are the topics of the lectures on 23 and 24 November.

Top-down ontology development

The basic starting point for top-down ontology development is to think of, and decide about, core principles. For instance, do you commit to a 3D view with objects persisting in time or to a perdurantist one with space-time worms, are you concerned with (in OWL terminology) classes or individuals, and is your ontology intended to be descriptive or prescriptive (see, e.g., [2,3])? Practically, different answers to such questions end up as different foundational ontologies, and even with the same answers the resulting ontologies may differ. Foundational ontologies provide a high-level categorization of the kinds of things you will model, such as process and non-agentive-physical-object, and of what ‘attributes’ are and how to represent them (e.g., as qualities or as some kind of dependent continuant or trope).

There are several such foundational ontologies, such as DOLCE, BFO, GFO, the natural language focused GUM, and SUMO. Within the WonderWeb project, the participants realized it might not be feasible to have one single foundational ontology that pleases everybody; hence, the idea was to have a library of foundational ontologies with appropriate mappings between them, so that each modeller can choose his or her pet ontology and the system sorts out the rest regarding the interoperability of ontologies that use different foundational ontologies. The basis for this has been laid with the WonderWeb deliverable D18 [3], but an implementation is yet to be done. One of the hurdles to realizing this is that people who are interested in foundational ontologies tend to start out formalizing the basic categories in a logic of their convenience (which is not OWL). For instance, DOLCE (the Descriptive Ontology for Linguistic and Cognitive Engineering) has a paper-based formalisation in first order predicate logic, which was subsequently trimmed down into lite and ultralite OWL versions. BFO (the Basic Formal Ontology) does too, and there is also a version of it in Isabelle syntax, although that version covers the mereological basis only.

In the meantime, leaner OWL versions of DOLCE and BFO have been made available, which are intended to be used for the development of ontologies in one’s domain of interest. These files can be found on their respective websites at the LOA and IFOMIS. To make it somewhat easier to read them at leisure, compare the two foundational ontologies, and find any correspondences between them, I have exported the DOLCE-Lite and BFO 1.1 OWL versions in a Description Logics representation and a Manchester syntax rendering (generated with the Protégé ontology development tool). Whereas DOLCE-Lite is encoded in SHI, BFO is simpler (in ALC); that is, neither one uses all OWL-DL capabilities of SHOIN(D). Another difference is that BFO in OWL is only a bare taxonomy (extensions do exist though), whereas DOLCE-Lite makes heavy use of object properties. More aspects of both foundational ontologies will be addressed in the lecture.

A different approach to the reuse of principal notions is to use ontology design patterns (ODPs), which are inspired by the idea of software design patterns. Basically, ODPs provide mini-ontologies with formalised knowledge for how to go about modelling reusable pieces in an ontology (in OWL-DL), e.g., an n-ary relation or a relation between data type values, so that one can do that consistently throughout the ontology development and across ontologies. ODPs for specific subject domains are called content ODPs, such as the ‘sales and purchase order contracts’ pattern or the ‘agent role’ pattern to represent agents, the roles they play, and the relations between them; there is even an attempt to consistently represent the classification scheme invented by Linnaeus with an ODP.
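To give a feel for what an n-ary relation pattern amounts to, here is a minimal sketch in plain Python (no OWL library involved). It assumes the common approach of reifying the relation as an individual of its own, with one binary property per participant. The function name `reify_nary`, the class `Purchase`, and the properties `hasBuyer`, `hasSeller`, and `hasObject` are hypothetical and are not taken from any particular ODP catalogue.

```python
from itertools import count

_ids = count(1)  # source of fresh identifiers for the reified relation nodes


def reify_nary(relation_class, participants):
    """Turn one n-ary fact into binary triples via an intermediate individual.

    relation_class: name of the class standing in for the n-ary relation.
    participants: dict mapping a (hypothetical) role property to its filler.
    """
    node = f"_:{relation_class}{next(_ids)}"
    triples = [(node, "rdf:type", relation_class)]
    # One binary triple per participant; sorted only to get a stable order.
    for role, filler in sorted(participants.items()):
        triples.append((node, role, filler))
    return triples


# Illustrative data only: a ternary 'purchase' between buyer, seller and object.
triples = reify_nary(
    "Purchase",
    {"hasBuyer": "ex:John", "hasSeller": "ex:BookShop", "hasObject": "ex:Book42"},
)
for triple in triples:
    print(triple)
```

The point of applying such a pattern is consistency: every n-ary relation in the ontology gets the same shape, so that queries and mappings can rely on it.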

There are several different types of ODPs, which are summarized in the following figure.

Taxonomy of ODPs

During the lecture, the main aspects of DOLCE, BFO, and the ODPs will be elaborated on.

Bottom-up ontology development

Bottom-up ontology development starts from the other end of the spectrum, although the process may still be informed by foundational ontologies. Principally, one can distinguish between (i) transforming information or knowledge represented in one logic into an OWL species, (ii) transforming somewhat structured information into an OWL species, and (iii) starting at the base. Practically, this means starting from some ‘legacy’ (i.e., non-SemWeb) material, such as, but not limited to:

  • Databases
  • Conceptual models (ER, UML)
  • Frame-based systems
  • OBO format
  • Thesauri
  • Biological models
  • Excel sheets
  • Tagging, folksonomies
  • Output of text mining, machine learning, clustering

The following figure gives an idea of how far one has to ‘travel’ from the legacy representation to a ‘SemWeb compliant’ one (and, correspondingly, how much more effort it takes to get there).

Less and more structured things, with corresponding lower to higher ontological precision of the subject domain represented with the language

Given the limited time available, we shall not discuss all variants. Instead, we shall focus first on taking databases as source material. Some rather informal points about reverse engineering from databases to ontologies will be structured briefly, to subsequently take a formal turn with [5]. Imperfect transformations from other languages, such as the common OBO format [6] and a pure frames-based approach [7], are available as well; the cited papers also describe the challenges of creating them. While the latter two do serve a user base, their overall impact on widespread bottom-up development is likely to be smaller than what could be unlocked by leveraging the knowledge in existing (relational) databases. One may be led to assume the same holds for text processing (NLP) as a starting point for semi-automated ontology development, but the results have not been very encouraging yet (this will be discussed in lecture 10).
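As a rough illustration of the informal side of reverse engineering a database, the sketch below maps a toy relational schema to axiom-like strings: each table becomes a class, each foreign key an object property, and each remaining column a data property. This is a naive rule of thumb assumed for illustration, not the extraction procedure of [5]; the schema, the function name `schema_to_axioms`, and the one-line Manchester-like notation are all hypothetical.

```python
def schema_to_axioms(tables):
    """Map a toy relational schema to axiom strings (Manchester-like, one line each).

    tables: dict of table name -> {"columns": {column: sql_type},
                                   "foreign_keys": {column: referenced_table}}
    """
    # Assumed, deliberately small mapping from SQL types to XSD datatypes.
    xsd = {"INTEGER": "xsd:integer", "VARCHAR": "xsd:string", "DATE": "xsd:date"}
    axioms = []
    for table, spec in sorted(tables.items()):
        axioms.append(f"Class: {table}")
        fks = spec.get("foreign_keys", {})
        for col, sql_type in sorted(spec["columns"].items()):
            if col in fks:
                # A foreign key relates two tables, hence two classes.
                axioms.append(
                    f"ObjectProperty: {col} Domain: {table} Range: {fks[col]}"
                )
            else:
                datatype = xsd.get(sql_type, "rdfs:Literal")
                axioms.append(
                    f"DataProperty: {table}_{col} Domain: {table} Range: {datatype}"
                )
    return axioms


# Illustrative schema only.
schema = {
    "Employee": {
        "columns": {"id": "INTEGER", "name": "VARCHAR", "works_in": "INTEGER"},
        "foreign_keys": {"works_in": "Department"},
    },
    "Department": {"columns": {"id": "INTEGER", "label": "VARCHAR"}},
}
axioms = schema_to_axioms(schema)
print("\n".join(axioms))
```

Even this toy version shows why the transformation is imperfect: a table may encode a relationship rather than a class, and constraints hidden in application code never make it into the schema.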

Two examples that, in principle at least, can have a large impact on domain ontology development will be described during the lecture. The first takes biological models (or any other structured graphical representation) as basis [8], which amounts to formalizing the graphical vocabulary in textbooks and drawing tools. The second, rather more cumbersome, one is sorting out thesauri [9,10], which faces problems such as what to do with their basic notions (e.g., “RT: related term”) in a more expressive OWL ontology. Both examples have abundant similar instances in science, medicine, industry, and government, and more automation would undoubtedly be a welcome addition to ease the effort of realizing the Semantic Web.
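The thesaurus problem can be made concrete with a small sketch. It assumes the usual thesaurus relations BT (broader term), NT (narrower term), and RT (related term), turns BT/NT into candidate subclass axioms, and sets RT aside for manual review because it has no direct counterpart in OWL. The function name `triage_thesaurus` and the sample entries are hypothetical, and the BT/NT candidates still need checking by a modeller, since ‘broader’ may also hide part-whole or instance-of relations.

```python
def triage_thesaurus(entries):
    """Split thesaurus statements into candidate axioms and items for review.

    entries: iterable of (term, relation, other_term) with relation in
             {"BT", "NT", "RT"}.
    """
    candidates, review = [], []
    for term, relation, other in entries:
        if relation == "BT":
            # 'term' has broader term 'other': candidate subclass axiom.
            candidates.append((term, "SubClassOf", other))
        elif relation == "NT":
            # 'term' has narrower term 'other': same candidate, reversed.
            candidates.append((other, "SubClassOf", term))
        else:
            # RT (and anything unknown) needs a modelling decision by a human.
            review.append((term, relation, other))
    return candidates, review


# Illustrative entries only.
entries = [
    ("maize", "BT", "cereals"),
    ("cereals", "NT", "wheat"),
    ("maize", "RT", "maize flour"),
]
candidates, review = triage_thesaurus(entries)
print(candidates)
print(review)
```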

References

[1] Guarino, N. Formal Ontology in Information Systems. Proceedings of FOIS’98, Trento, Italy, June 6-8, 1998. IOS Press, Amsterdam, pp. 3-15.

[2] Barry Smith. Beyond Concepts, or: Ontology as Reality Representation. Achille Varzi and Laure Vieu (eds.), Formal Ontology and Information Systems. Proceedings of the Third International Conference (FOIS 2004), Amsterdam: IOS Press, 2004, 73-84.

[3] Masolo, C., Borgo, S., Gangemi, A., Guarino, N., Oltramari, A. WonderWeb Deliverable D18–Ontology library. WonderWeb. 2003.

[4] Presutti, V., Gangemi, A., David, S., de Cea, G. A., Suárez-Figueroa, M. C., Montiel-Ponsoda, E., Poveda, M. A library of ontology design patterns: reusable solutions for collaborative design of networked ontologies. NeOn deliverable D2.5.1, Institute of Cognitive Sciences and Technologies (CNR). 2008.

[5] L. Lubyte, S. Tessaris. Automatic Extraction of Ontologies Wrapping Relational Data Sources. In Proc. of the 20th International Conference on Database and Expert Systems Applications (DEXA 2009). To appear.

[6] Christine Golbreich and Ian Horrocks. The OBO to OWL mapping, GO to OWL 1.1! In Proc. of the Third OWL Experiences and Directions Workshop, number 258 in CEUR (http://ceur-ws.org/), 2007. See also the wiki page on oboInOwl.

[7] Zhang S, Bodenreider O, Golbreich C. Experience in reasoning with the Foundational Model of Anatomy in OWL-DL. In: Pacific Symposium on Biocomputing 2006, Altman RB, Dunker AK, Hunter L, Murray TA, Klein TE, (Eds.). World Scientific, 2006, 200-211.

[8] Keet, C.M. Factors affecting ontology development in ecology. Data Integration in the Life Sciences 2005 (DILS’05), Ludaescher, B, Raschid, L. (eds.). San Diego, USA, 20-22 July 2005. Lecture Notes in Bioinformatics LNBI 3615, Springer Verlag, 2005. pp. 46-62.

[9] Dagobert Soergel, Boris Lauser, Anita Liang, Frehiwot Fisseha, Johannes Keizer and Stephen Katz. Reengineering thesauri for new applications: the AGROVOC example. Journal of Digital Information 4(4) (2004)

[10] Maria Angela Biasiotti, Meritxell Fernández-Barrera. Enriching Thesauri with Ontological Information: Eurovoc Thesaurus and DALOS Domain Ontology of Consumer law. Proceedings of the Third Workshop on Legal Ontologies and Artificial Intelligence Techniques (LOAIT 2009). Barcelona, Spain, June 8, 2009.

Note: references 1, 2, 5, 8 are mandatory reading, 3 and 4 are strongly recommended to read at least in part, and 6, 7, 9, and 10 are optional.

Lecture notes: lecture 3 – Top-down and lecture 4 – Bottom-up

Course webpage
