72010 SemWebTech lecture 6: Parts and temporal aspects

The previous three lectures covered the core topics in ontology engineering. There are many ontology engineering topics that zoom in on one specific aspect of the whole endeavour, such as modularization, the semantic desktop, ontology integration, combining data mining and clustering with ontologies, and controlled natural language interfaces to OWL. In the next two lectures on Dec 1 and Dec 14, we will look at three such advanced topics in modelling and language and tool development, being the (ever recurring) issues with part-whole relations, temporalizations and its workarounds, and languages and tools for dealing with vagueness and uncertainty.

Part-whole relations

On the one hand, there is a SemWeb best practices document about part-whole relations and further confusion by OWL developers [1, 2] that was mentioned in a previous lecture.  On the other hand, part-whole relations are deemed essential by the most active adopters of ontologies—i.e., bio- and medical scientist—while its full potential is yet to be discovered by, among others, manufacturing. A few obvious examples are how to represent plant or animal anatomy, geographic information data, and components of devices. And then the need to reason over it. When we can deduce which part of the device is broken, then only that part has to be replaced instead of the whole it is part of (saving a company money). One may want to deduce that when I have an injury in my ankle, I have an injury in my limb, but not deduce that if you have an amputation of your toe, you also have an amputation of your foot that the toe is (well, was) part of. If a toddler swallowed a Lego brick, it is spatially contained in his stomach, but one does not deduce it is structurally part of his stomach (normally it will leave the body unchanged through the usual channel). This toddler-with-lego-brick gives a clue why, from an ontological perspective, equation 23 in [2] is incorrect.

To shed light on part-whole relations and sort out such modelling problems, we will look first at mereology (the Ontology take on part-whole relations), and to a lesser extent meronymy (from linguistics), and subsequently structure the different terms that are perceived to have something to do with part-whole relations into a taxonomy of part-whole relations [3]. This, in turn, is to be put to use, be it with manual or software-supported guidelines to choose the most appropriate part-whole relation for the problem, and subsequently to make sure that is indeed represented correctly in an ontology. The latter can be done by availing of the so-called RBox Reasoning Service [3]. All this will not solve each modelling problem of part-whole relations, but at least provide you with a sound basis.

Temporal knowledge representation and reasoning

Compared to part-whole relations, there are fewer loud and vocal requests for including a temporal dimension in OWL, even though it is needed. For instance, you can check the annotations in the OWL files of BFO and DOLCE (or, more conveniently, search for “time” in the pdf) where they mention temporality that cannot be represented in OWL, or SNOMED CT’s concepts like “Biopsy, planned” and “Concussion with loss of consciousness for less than one hour” where the loss of consciousness still can be before or after the concussion, or a business rule alike ‘RentalCar must be returned before Deposit is reimbursed’ or the symptom HairLoss during the treatment Chemotherapy, and Butterfly is a transformation of Caterpillar.

Unfortunately, there is no single (computational) solution to address all these examples at once. Thus far, it is a bit of a patchwork, with, among many aspects, the Allen’s interval algebra (qualitative temporal relations, such as before, during, etc.), Linear Temporal Logics (LTL), and Computational Tree Logics (CTL, with branching time), and a W3C Working draft of a time ontology.

If one assumes that recent advances in temporal Description Logics may have the highest chance of making it into a temporal OWL (tOWL)—although there are no proof-of-concept temporal DL modelling tools or reasoners yet—then the following is ‘on offer’. A very expressive (undecidable) DL language is DLRus (with the until and since operators), which already has been used for temporal conceptual data modelling [4] and for representing essential and immutable parts and wholes [5]. A much simpler language is TDL-Lite [6], which is a member of the DL-Lite family of DL languages of which one is the basis for OWL 2 QL; but these first results are theoretical, hence no “lite tOWL” yet. It is already known that EL++ (the basis for OWL 2 EL) does not keep the nice computational properties when extended with LTL, and results with EL++ with CTL are not out yet. If you are really interested in the topic, you may want to have a look at a recent survey [7] or take a broader scope with any of the four chapters in [8] (that cover temporal KR&R, situation calculus, event calculus, and temporal action logics), and several people with the KRDB Research Centre work on temporal knowledge representation & reasoning.  Depending on the remaining time during the lecture, more or less about time and temporal ontologies will pass the revue.

References

[1] I. Horrocks, O. Kutz, and U. Sattler. The Even More Irresistible SROIQ. In Proc. of the 10th International Conference of Knowledge Representation and Reasoning (KR-2006), Lake District UK, 2006.

[2] B. Cuenca Grau, I. Horrocks, B. Motik, B. Parsia, P. Patel-Schneider, and U. Sattler. OWL 2: The next step for OWL. Journal of Web Semantics: Science, Services and Agents on the World Wide Web, 6(4):309-322, 2008

[3] Keet, C.M. and Artale, A. Representing and Reasoning over a Taxonomy of Part-Whole Relations. Applied Ontology, IOS Press, 2008, 3(1-2): 91-110.

[4] Alessandro Artale, Christine Parent, and Stefano Spaccapietra. Evolving objects in temporal information systems. Annals of Mathematics and Artificial Intelligence (AMAI), 50:5-38, 2007, Springer.

[5] Artale, A., Guarino, N., and Keet, C.M. Formalising temporal constraints on part-whole relations. 11th International Conference on Principles of Knowledge Representation and Reasoning (KR’08). Gerhard Brewka, Jerome Lang (Eds.) AAAI Press, pp 673-683. Sydney, Australia, September 16-19, 2008

[6] Alessandro Artale, Roman Kontchakov, Carsten Lutz, Frank Wolter and Michael Zakharyaschev. Temporalising Tractable Description Logics. Proc. of the 14th International Symposium on Temporal Representation and Reasoning (TIME-07), Alicante, June 2007.

[7] Carsten Lutz, Frank Wolter, and Michael Zakharyaschev.  Temporal Description Logics: A Survey. In  Proceedings of the Fifteenth International Symposium on Temporal Representation and Reasoning. IEEE Computer Society Press, 2008.

[8] Frank van Harmelen, Vladimir Lifschitz and Bruce Porter (Eds.). Handbook of Knowledge Representation. Elsevier, 2008, 1034p. (also available from the uni library)

Note: reference 3 is mandatory reading, 4 optional reading, 2 was mandatory and 1 recommended for an earlier lecture, and 5-8 are optional.

Lecture notes: lecture 6 – Parts and temporal issues

Course webpage

Advertisements

72010 SemWebTech lecture 5: Methods and Methodologies

The previous two lectures have given you a basic idea about the two principal approaches for starting developing an ontology—top-down and bottom-up—but they do not constitute an encompassing methodology to develop ontologies. In fact, there is no proper, up-to-date comprehensive methodology for ontology development like there is for conceptual model development (e.g., [1]) or ‘waterfall’ versus ‘agile’ software development methodologies. There are many methods and, among others, the W3C’s Semantic Web best practices, though, which to a greater or lesser extent can form part of a comprehensive ontology development methodology.

As a first step towards methodologies that gives a general scope, we will look at a range of parameters that affect ontology development in one way or another [2]. There are four influential factors to enhance the efficiency and effectiveness of developing ontologies, which have to do with the purpose(s) of the ontology; what to reuse from existing ontologies and ontology-like artifacts and how to reuse them; the types of approaches for bottom-up ontology development from other legacy sources; and the interaction with the choice of representation language and reasoning services.

Second, methods that helps the ontologist in certain tasks of the ontology engineering process include, but are not limited to, assisting the modelling itself, how to integrate ontologies, and supporting software tools. We will take a closer look at OntoClean [3] that contributes to modelling taxonomies. One might ask oneself: who cares, after all we have the reasoner to classify our taxonomy anyway, right? Indeed, but that works only if you have declared many properties for the classes, which is not always the case, and the reasoner sorts out the logical issues, but not the ontological issues. OntoClean uses several notions from philosophy, such as rigidity, identity criteria, and unity [4, 5] to provide modelling guidelines. For instance, that anti-rigid properties cannot subsume rigid properties; e.g., if we have, say, both Student and Person in our ontology, the former is subsumed by the latter. The lecture will go into some detail of OntoClean.

If, on the other hand, you do have a rich ontology and not mostly a bare taxonomy, ‘debugging’ by availing of an automated reasoner is useful in particular with larger ontologies and ontologies represented in an expressive ontology language. Such ‘debugging’ goes under terms like glass box reasoning [6], justification [7], explanation [8], and pinpointing errors. While they are useful topics, we will spend comparatively little time on it, because it requires some more knowledge of Description Logics and its (mostly tableaux-based) reasoning algorithms that will be introduced only in the 2nd semester (mainly intended for the EMCL students). Those techniques use the automated reasoner to at least locate modelling errors and explain in the most succinct way why this is so, instead of just returning a bunch of inconsistent classes; proposing possible fixes is yet a step further (one such reasoning service will be presented in lecture 6 on Dec. 1).

Aside from parameters, methods, and tools, there are only few methodologies, which are even coarse-grained: they do not (yet) contain all the permutations at each step, i.e. what and how to do each step, given the recent developments. A comparatively comprehensive one is Methontology [10], which has been applied to various subject domains (e.g., chemicals, legal domain [9,11]) since its development in the late 1990s. While some practicalities are superseded with new [12] and even newer languages and tools, some of the core aspects still hold. The five main steps are: specification, conceptualization (with intermediate representations, such as in text or diagrams, like with ORM [1] and pursued by the modelling wiki MOKI that was developed during the APOSDLE project for work-integrated learning), formalization, implementation, and maintenance. Then there are various supporting tasks, such as documentation and version control.

Last, but not least, there are many tools around that help you with one method or another. WebODE aims to support Methontology, the NeOn toolkit aims to support distributed development of ontologies, RacerPlus for sophisticated querying, Protégé-PROMPT for ontology integration (there are many other plug-ins for Protégé), SWOOGLE to search across ontologies, OntoClean with Protégé, and so on and so forth. For much longer listings of tools, see the list of semantic web development tools, the plethora of ontology reasoners and editors, and range of semantic wiki projects engines and features for collaborative ontology development. Finding the right tool to solve the problem at hand (if it exists) is a skill of its own and it is a necessary one to find a feasible solution to the problem at hand. From a technologies viewpoint, the more you know about the goals, features, strengths, and weaknesses of available tools (and have the creativity to develop new ones, if needed), the higher the likelihood you bring a potential solution of a problem to successful completion.

References

[1] Halpin, T., Morgan, T.: Information modeling and relational databases. 2nd edn. Morgan Kaufmann (2008)

[2] Keet, C.M. Ontology design parameters for aligning agri-informatics with the Semantic Web. 3rd International Conference on Metadata and Semantics (MTSR’09) — Special Track on Agriculture, Food & Environment, Oct 1-2 2009 Milan, Italy. F. Sartori, M.A. Sicilia, and N. Manouselis (Eds.), Springer CCIS 46, 239-244.

[3] Guarino, N. and Welty, C. An Overview of OntoClean. in S. Staab, R. Studer (eds.), Handbook on Ontologies, Springer Verlag 2004, pp. 151-172

[4] Guarino, N., Welty, C.: A formal ontology of properties. In: Dieng, R., Corby, O. (eds.) EKAW 2000. LNAI, vol. 1937, pp. 97–112. Springer, Heidelberg (2000)

[5] Guarino, N., Welty, C.: Identity, unity, and individuality: towards a formal toolkit for ontological analysis. In: Proc. of ECAI 2000. IOS Press, Amsterdam (2000)

[6] Parsia, B., Sirin, E., Kalyanpur, A. Debugging OWL ontologies. World Wide Web Conference (WWW 2005). May 10-14, 2005, Chiba, Japan.

[7] M. Horridge, B. Parsia, and U. Sattler. Laconic and Precise Justifications in OWL. In Proc. of the 7th International Semantic Web Conference (ISWC 2008), Vol. 5318 of LNCS, Springer, 2008.

[8] Alexander Borgida, Diego Calvanese, and Mariano Rodriguez-Muro. Explanation in the DL-Lite family of description logics. In Proc. of the 7th Int. Conf. on Ontologies, DataBases, and Applications of Semantics (ODBASE 2008), LNCS vol 5332, 1440-1457. Springer, 2008.

[9] Fernandez, M.; Gomez-Perez, A. Pazos, A.; Pazos, J. Building a Chemical Ontology using METHONTOLOGY and the Ontology Design Environment. IEEE Expert: Special Issue on Uses of Ontologies, January/February 1999, 37-46.

[10] Gomez-Perez, A.; Fernandez-Lopez, M.; Corcho, O. Ontological Engineering. Springer Verlag London Ltd. 2004.

[11] Oscar Corcho, Mariano Fernández-López, Asunción Gómez-Pérez, Angel López-Cima. Building legal ontologies with METHONTOLOGY and WebODE. Law and the Semantic Web 2005. Springer LNAI 3369, 142-157.

[12] Corcho, O., Fernandez-Lopez, M. and Gomez-Perez, A. (2003). Methodologies, tools and languages for building ontologies. Where is their meeting point?. Data & Knowledge Engineering 46(1): 41-64.

Note: references 2, 3, and 9 are mandatory reading, 6, 7, and 10 recommended, and 1, 4, 5, 8, 11, and 12 are optional.

Lecture notes: lecture 5 – Methodologies

Course webpage

An official response on the dirty war index

A while a go I posted an informal review of the dirty war index (DWI) and its tightly related problem of biased databases, which was in response to Hicks and Spagat’s paper about the DWI [1] in PLoS Medicine. In the meantime, I have written that in an article-style format and have structured the arguments into one narrative; this just got published [2] in the Fall issue of the Peace & Conflict Review journal.

Like PLoS, PCR is an open access journal, so the short article is freely available online in html and pdf format that is coordinated and provided by the UN-mandated University for Peace. When you download the pdf, you get the rest of the Fall issue with it, which has a special feature with an analysis of Obama’s Prague speech on moral responsibility of the United States, an assessment of development models in conflict settings, and much more.

References

[1] Hicks MH-R, Spagat M (2008) The Dirty War Index: A public health and human rights tool for examining and monitoring armed conflict outcomes. PLoS Med 5(12): e243. doi:10.1371/journal.pmed.0050243.

[2] Keet, C.M. (2009). Dirty wars, databases, and indices. Peace & Conflict Review, 4(1):75-78.

72010 SemWebTech lectures 3+4: Ontology Engineering Top-down and Bottom-up

Ontology languages were introduced in the previous two lectures, but one can ask oneself what then an ontology is (and if, perhaps, the two topics should have been done in reverse order). There is no unanimously agreed-upon definition what an ontology is and proposed definitions within computer science have changed over the past 20 years (see e.g., [1] and here). For better or worse, currently, the tendency is toward it being equivalent to a logical theory—even formalizing a thesaurus in OWL then ends up as a simple ‘ontology’ (e.g., the NCI thesaurus as cancer ontology) and a conceptual data model originally in EER or UML becomes an ‘application ontology’ by virtue of being formalized in OWL. Salient aspects of the merits of one definition and other will pass the revue at the beginning of the lecture on the 23rd of November. For an initial indication of ‘things that have to do with an ontology’, I include the Ontology Summit’s “Dimension map” that is by its authors intended as a “Template for discourse” about ontologies, which has a brief and longer explanation.

The “Dimension map” of ontologies, made by the attendees of the Ontology Summit 2007 (intended as a “Template for discourse”)

The main focus of lectures 3 and 4, however, will be devoted to ontology development: having a language is one thing and in the lab of 24-11 you will practice with software-supported ontology development environments, but what to represent, and how, is quite another. Where do you start? How can you avoid reinventing the wheel? What things can guide you to make the process easier to carry out successfully? How can you make the best of ‘legacy’ material? There are two principal approaches, being the so-called top-down and bottom-up ontology development, which will be the topic of the lectures on 23 and 24 November.

Top-down ontology development

The basic starting point for top-down ontology development is to think of, and decide about, core principles. For instance, do you commit to a 3D view with objects persisting in time or a perdurantist one with space-time worms, are you concerned with (in OWL terminology) classes or individuals, is your ontology intended to be descriptive or prescriptive (see, e.g. [2,3])? Practically, the different answers to such questions end up as different foundational ontologies—even with the same answers they may be different. Foundational ontologies provide a high-level categorization about the kinds of things you will model, such as process, non-agentive-physical-object, and (what are and) how to represent ‘attributes’ (e.g., as qualities or some kind of dependent continuant or trope.).

There are several such foundational ontologies, such as DOLCE, BFO, GFO, natural language focused GUM, and SUMO. Within the Wonderweb project, the participants realized it might not be feasible to have one singe foundational ontology that pleases everybody; hence, the idea was to have a library of foundational ontologies with appropriate mappings between them so that each modeller can chose his or her pet ontology and the system will sort out the rest regarding the interoperability of ontologies that use different foundational ontologies. The basis for this has been laid with the Wonderweb deliverable D18 [3], but an implementation is yet to be done. One of the hurdles to realize this, is that people who tend to be interested in foundational ontologies start out formalizing the basic categories in a logic of their convenience (which is not OWL). For instance, DOLCE—the Descriptive Ontology for Linguistic and Cognitive Engineering—has a paper-based formalisation in a first order predicate logic, and subsequent trimming down in lite and ultralite OWL versions. BFO—the Basic Formal Ontology—too, as well as a version of it in Isabelle syntax, but this version focuses on the mereological basis only.

In the meantime, leaner OWL versions of DOLCE and BFO have been made available, which are intended to be used for development of ontologies in one’s domain of interest. These files can be found on their respective websites at the LOA and IFOMIS. To read them leisurely and make a comparison—and finding any correspondence—of the two foundational ontologies somewhat easier, I have exported the DOLCE-lite and BFO 1.1 OWL versions in a Description Logics representation and Manchester syntax rendering (generated with the Protégé ontology development tool). Whereas DOLCE-Lite is encoded in SHI, BFO is simpler (in ALC); that is, neither one uses all OWL-DL capabilities of SHOIN(D). Another difference is that BFO-in-owl is only a bare taxonomy (extensions do exist though), whereas DOLCE-Lite makes heavy use of object properties. More aspects of both foundational ontologies will be addressed in the lecture.

A different approach to the reuse of principal notions, is to use ontology design patterns (ODPs), which is inspired by the idea of software design patterns. Basically, ODPs provide mini-ontologies with formalised knowledge for how to go about modelling reusable pieces, e.g. an -ary relation or a relation between data type values, in an ontology (in OWL-DL), so that one can do that consistently throughout the ontology development and across ontologies. ODPs for specific subject domains are called content ODPs, such as the ‘sales and purchase order contracts’ or the ‘agent role’ to represent agents, the roles they play, and the relations between them, and even an attempt to consistently represent the classification scheme invented by Linnaeus with an ODP.

There are several different types of ODPs, which are summarized in the following figure (click to enlarge).

Taxonomy of ODPs

During the lecture, the main aspects of DOLCE, BFO, and the ODPs will be elaborated on.

Bottom-up ontology development

Bottom-up ontology development starts from the other end of the spectrum, where it may be that the process is at least informed by foundational ontologies. Principally, one can distinguish between (i) transforming information or knowledge represented in one logic into an OWL species, (ii) transforming somewhat structured information into an OWL species, (iii) starting at the base. Practically, this means starting from some ‘legacy’ (i.e., not-SemWeb) material, such as, but not limited to:

  • Databases
  • Conceptual models (ER, UML)
  • Frame-based systems
  • OBO format
  • Thesauri
  • Biological models
  • Excel sheets
  • Tagging, folksonomies
  • Output of text mining, machine learning, clustering

The following figure gives an idea as to how far one has to ‘travel’ from the legacy representation to a ‘SemWeb compliant’ one (and, correspondingly, put more effort in to realize it).

Less and more structured things, with corresponding lower to higher ontological precision of the subject domain represented with the language

Given the limited time available, we shall not discuss all variants. Instead, we shall focus first on taking databases as source material. Some rather informal points about reverse engineering from databases to ontologies will be structured briefly, to subsequently take a formal turn with [5]. Imperfect transformations from other languages, such as the common OBO format [6] and a pure frames-based approach [7], are available as well, which also describe the challenges to create them. While the latter two do serve a user base, their overall impact on widespread bottom-up development is very likely to be less than the potential that might possibly be unlocked with leveraging knowledge of existing (relational) databases. One may be led to assume this holds, too, for text processing (NLP) as starting point for semi-automated ontology development, but the results have not been very encouraging yet (it will be discussed in lecture 10).

Two examples that, by basic idea at least, can have a large impact on domain ontology development will be described during the lecture: taking biological models (or any other structured graphical representation) as basis [8]—which amounts to formalizing the graphical vocabulary in textbooks and drawing tools—and the rather more cumbersome one of sorting out thesauri [9,10], which faces problems such as what to do with its basic notions (e.g., “RT: related term”) in a more expressive OWL ontology. Both examples have abundant similar instances in science, medicine, industry, and government, and, undoubtedly, some more automation to realize it would be a welcome addition to ease the efforts to realize the Semantic Web.

References

[1] Guarino, N. Formal Ontology in Information Systems. Proceedings of FOIS’98, Trento, Italy, June 6-8, 1998. IOS Press, Amsterdam, pp. 3-15.

[2] Barry Smith. Beyond Concepts, or: Ontology as Reality Representation. Achille Varzi and Laure Vieu (eds.), Formal Ontology and Information Systems. Proceedings of the Third International Conference (FOIS 2004), Amsterdam: IOS Press, 2004, 73-84.

[3] Masolo, C., Borgo, S., Gangemi, A., Guarino, N., Oltramari, A. WonderWeb Deliverable D18–Ontology library. WonderWeb. 2003.

[4] Presutti, V., Gangemi, A., David, S., de Cea, G. A., Surez-Figueroa, M. C., Montiel-Ponsoda, E., Poveda, M. A library of ontology design patterns: reusable solutions for collaborative design of networked ontologies. NeOn deliverable D2.5.1, Institute of Cognitive Sciences and Technologies (CNR). 2008.

[5] L. Lubyte, S. Tessaris. Automatic Extraction of Ontologies Wrapping Relational Data Sources. In Proc. of the 20th International Conference on Database and Expert Systems Applications (DEXA 2009). To appear.

[6] Christine Golbreich and Ian Horrocks. The OBO to OWL mapping, GO to OWL 1.1! In Proc. of the Third OWL Experiences and Directions Workshop, number 258 in CEUR (http://ceur-ws.org/), 2007. See also with wiki page on oboInOwl

[7] Zhang S, Bodenreider O, Golbreich C. Experience in reasoning with the Foundational Model of Anatomy in OWL-DL. In: Pacific Symposium on Biocomputing 2006, Altman RB, Dunker AK, Hunter L, Murray TA, Klein TE, (Eds.). World Scientific, 2006, 200-211.

[8] Keet, C.M. Factors affecting ontology development in ecology. Data Integration in the Life Sciences 2005 (DILS’05), Ludaescher, B, Raschid, L. (eds.). San Diego, USA, 20-22 July 2005. Lecture Notes in Bioinformatics LNBI 3615, Springer Verlag, 2005. pp46-62.

[9] Dagobert Soergel, Boris Lauser, Anita Liang, Frehiwot Fisseha, Johannes Keizer and Stephen Katz. Reengineering thesauri for new applications: the AGROVOC example. Journal of Digital Information 4(4) (2004)

[10] Maria Angela Biasiotti, Meritxell Fernández-Barrera. Enriching Thesauri with Ontological Information: Eurovoc Thesaurus and DALOS Domain Ontology of Consumer law. Proceedings of the Third Workshop on Legal Ontologies and Artificial Intelligence Techniques (LOAIT 2009). Barcelona, Spain, June 8, 2009.

Note: references 1, 2, 5, 8 are mandatory reading, 3 and 4 are strongly recommended to read at least in part, and 6, 7, 9, and 10 are optional.

Lecture notes: lecture 3 – Top-down and lecture 4 – Bottom-up

Course webpage

72010 SemWebTech lecture 1+2: the Web Ontology Languages

The previous lectures by Jos de Bruijn covered various W3C standards of Semantic Web languages, which we shall continue with this week. More precisely, we will look at the Web Ontology Languages—not just one standard like previous years, but two: the successor of OWL, OWL 2, has become a W3C recommendation about 2 weeks ago and the tools you will use in the lab already can cope with OWL 2 (to a greater or lesser extent).

To put the languages in their right location, you can have a look at the common layer cake picture on the right (for the beginner in ‘modelling knowledge’, there is also an OWL primer):

sw-stack

The Semantic Web 'layer cake'

RDFS has some problems to function as ontology language—expressive limitations, syntactic and semantic issues—which is what OWL aimed to address so as to provide a comprehensive ontology language for the Semantic Web. In addition to the RDFS issues, several predecessors to OWL had a considerable influence on the final product, most notably SHOE, DAML-ONT, OIL, and DAML+OIL, and, more generally, the fruits of 20 years of research and prototyping of automated reasoners by the Description Logics (DL) community. What makes OWL a Semantic Web language compared to the DL languages, is that OWL uses URI references as names (like used in RDF, e.g., http://www.w3.org/2002/07/owl#Thing), OWL gathers information into ontologies stored as documents written in RDF/XML including things like owl:imports, and it adds RDF data types and XML schema data types for the ranges of data properties (attributes).

In addition, there was a document of requirements and objectives that such an ontology langue for the Semantic Web should meet, which we shall address during the lecture. The “making of an ontology language” article [1] gives a general historical view and summary of OWL with its three species (OWL lite, OWL-DL, and OWL full) and the details of the standard can be found here, which will be summarized in the lecture. It does not, however, give you a clear definition of what an ontology is (other than a DL knowledge base)—this is a topic we shall address later in the course—but instead gives some directions on they purposes it can be used for.

Over the past 5 years, OWL has been used across subject domains, but in particular in the heath care and life sciences disciplines, perhaps also thanks to the W3C interest group dedicated to this topic (the HCLS IG). Experimentation with the standard, however, revealed expected as well as unexpected shortcomings in addition to the ideas mentioned in the “Future extensions” section of [1] so that a successor to OWL was deemed to be of value. Work towards a standardization of an OWL 2 took shape after the OWL Experiences and Directions workshop in 2007 and a final draft was ready by late 2008. On October 27 2009 it has become the official OWL 2 W3C recommendation.

What does this OWL 2 consists of, and what does it fix? This will be the topic of the second lecture on 17 November. The language aims to address (to a greater or lesser extent) the issues described in section 2 of [2], which is not quite a super- or subset of [1]’s ideas for possible extensions. For instance, the consideration to perhaps cater for the Unique Name Assumption did not make it into OWL 2, despite that it has quite an effect on the complexity of a language [3].

First, you can see the ‘orchestration’ of the various aspects of OWL 2 in the figure below. The top section indicates several syntaxes that can be used to serialize the ontology, with RDF/XML being required and the other four optional. Then there are mappings between an OWL ontology and RDF graph in the middle, and the lower half depicts that there is both a direct semantics (“OWL 2 DL”) and an RDF-based one (“OWL 2 full”).

OWL2-structure2-800

The structure of OWL 2

Second, the OWL 2 DL species is based on SROIQ(D) [4], which is more expressive than the base language of OWL-DL (SHOIN(D)) and therewith meeting some of the ontology modellers’ requests, such as a few more properties of properties (the second role chaining example in [2] is wrong, though, which will be discussed in lecture 6) and qualified number restrictions. Then there is cleaner support for annotations, debatable (from an ontological perspective, that is) punning for metamodelling, and a deceptive ‘key’ functionality that is not a key in the common and traditional sense of keys in conceptual models and databases. Also, it irons out some difficulties that tool implementers had with the syntaxes of OWL and makes importing ontologies more transparent. To name but a few things.

Third, in addition to the OWL 2 DL version, there are three OWL 2 profiles, which are, strictly speaking, sub-languages of (syntactic restrictions on) OWL 2 DL so as to cater for different purposes of ontology usage in applications. This choice has its consequences that very well can, but may not necessarily, turn out to be a positive one; we shall dwell on this point in lecture 5 about ontology engineering methodologies. The three profiles are OWL 2 EL that is based on the EL++ language [5], OWL 2 QL that is based on the DL-LiteR language [6], and OWL 2 RL that is inspired by Description Logic Programs and pD [7], which are intended for use with large relatively simple type-level ontologies, querying large amounts of instances through the ontology, and ontologies with rules and data in the form of RDF triples, respectively. Like with OWL 2 DL, each of these languages has automated reasoners tailored to the language so as to achieve the best performance for the application scenario. Indirectly, the notion of the profiles and automated reasoners says you cannot have it all together in one language and expect to have good performance with your ontology and ontology-driven information system. Such is life with the limitations of computers (why this is so, is taught in the theory of computing course), but one can achieve quite impressive results with the languages and its tools that are practically not really doable with paper-based manual efforts or OWL 2 DL.

During this second lecture, we discuss the OWL 2 features and profiles more comprehensively and provide context to the rationale how and why things ended up the way they did.

References

[1] Ian Horrocks, Peter F. Patel-Schneider, and Frank van Harmelen. From SHIQ and RDF to OWL: The making of a web ontology language. Journal of Web Semantics, 1(1):7, 2003.

[2] B. Cuenca Grau, I. Horrocks, B. Motik, B. Parsia, P. Patel-Schneider, and U. Sattler. OWL 2: The next step for OWL. Journal of Web Semantics: Science, Services and Agents on the World Wide Web, 6(4):309-322, 2008. (or click here)

[3] Alessandro Artale, Diego Calvanese, Roman Kontchakov, and Michael Zakharyaschev. DL-Lite without the unique name assumption. In Proc. of the 22nd Int. Workshop on Description Logic (DL 2009), volume 477 of CEUR Electronic Workshop Proceedings, http://ceur-ws.org/, 2009.

[4] I. Horrocks, O. Kutz, and U. Sattler. The Even More Irresistible SROIQ. In Proc. of the 10th International Conference of Knowledge Representation and Reasoning (KR-2006), Lake District UK, 2006.

[5] Franz Baader, Sebastian Brandt, and Carsten Lutz . Pushing the EL Envelope. In Proc. of the 19th Joint Int. Conf. on Artificial Intelligence (IJCAI 2005), 2005.

[6] Diego Calvanese, Giuseppe de Giacomo, Domenico Lembo, Maurizio Lenzerini, Riccardo Rosati. Tractable Reasoning and Efficient Query Answering in Description Logics: The DL-Lite Family. J. of Automated Reasoning 39(3):385-429, 2007.

[7] Benjamin N. Grosof, Ian Horrocks, Raphael Volz, and Stefan Decker. Description Logic Programs: Combining Logic Programs with Description Logic. in Proc. of the 12th Int. World Wide Web Conference (WWW 2003), Budapest, Hungary, 2003. pp. 48-57.

Note: references 1-2 are mandatory reading, 3-7 are optional.

Lecture notes: lecture 1 – OWL, lecture 2 – OWL 2

Course webpage

Semantic Web Technologies course part 2: start of an experiment

At the start of the Semantic Web Technologies (SWT) course, Jos de Bruijn has introduced you to Web 2.0, among other things, which we are going to put to use. In addition to the fairly common static SWT course syllabus page and the Moodle course page (which is not being used interactively although it has this functionality, once logged in), I will use my blog to add some general lecture comments. Perhaps they may not turn out to be as detailed and well-visited as the extensive lecture notes of Fields Medal winner Terry Tao’s lecture notes, but I will give it a try, also because Semantic Web course blogs are few and far between (see also a list of Semantic Weblogs).

All posts written for this course will have the category “72010-SemWebTech”, so if you do not wish to be informed about other things I write about, you can subscribe to that category only (information on subscribing can be found here).

You are welcome to add questions and comments, including links to other sources you come across and think may be relevant for the lecture topic, add links to other Semantic Web blogs or blog posts you come across, and so forth.

Outline of the course

The second part of the SWT course consists of three blocks.

The first block (next week) is a ‘leftover’ of the first part where we will go through yet another language—well, set of languages—of the layer cake: OWL.

As any cake, it is to be eaten. Put differently: the various, relatively isolated, aspects of the languages you have learned in the first part are to be put together to make the technologies work so as to realize the Semantic Web. The second and third block aim to provide more theory, integrative approaches, several applications, and issues applying Semantic Web Technologies to get you on the way to do that.

The second block looks into ontology engineering: having, and knowing about, several ontology languages is one thing, developing an ontology is quite a different story. We will go though various extant approaches—paradigms, formalisms, tools, methods, and methodologies—and three ‘gaps’ where much remains to be done to bring the research outcomes into the mainstream implementation stage of SWT, being parts, temporal aspects, and managing uncertainty & vagueness.

The third block is application-oriented, where we look into the Semantic Web for the Life Sciences. Researchers and practitioners in Life Sciences and Health Care are, to date, the most active and prolific adopters of SWT—by a large margin. Why do they use SWT, to achieve which goals? What do they do with it, and how are successful solutions that use SWT realised? The five lectures dedicated to this topic will give you a general overview and a few implementations in detail.

Last, but not least

There is only so much (little) material one can cram into a course, and you are advised to read as much as possible. At the moment there are only few books that cover the course topics, which are not textbooks. The disadvantage is that you will have to read quite broadly and critically; the advantages are that you will become acquainted with some of the most recent research in the area and be at the forefront of SWT developments.

—————

p.s.: In case you are not enrolled at FUB and thus have missed the first part, you can download last year’s slides of the 4 ECTS-course version—what is now mostly the first part of this new 8 ECTS credit course—here.

p.p.s: Slides will be made available online after the lecture.

Computational, and other, problems for genomics of emerging infectious diseases

PLoS has published a cross-journal special collection on Genomics of Emerging Infectious Diseases last week. Perhaps unsurprisingly, I had a look at the article about limitations of and challenges for the computational resources by Berglund, Nystedt, and Andersson [1]. It reads quite like the one about computational problems for metagenomics I wrote about earlier (here and here), but they have a somewhat curious request in the closing section of the paper.

Concerning the overall contents of the paper and its similarity with the computational aspects of metagenomics, the computational aspects of complete genome assembly is still not quite sorted out fully, and in particular the need “for better ways to integrate data from diverse sources, including shotgun sequencing, paired-end sequencing, [and more]…” and a quality scoring standard. The other major one is the recurring topic is the hard work of annotation to give meaning to the data. Then there are the requests for better, and 3D, visualizations of the data and the cross-granular analysis of data along the omics trail and at different scales.

Limitations and challenges that are more specific to this subject domain are the classification and risk assessments for the emergence of novel infectious strains and risk prediction software for disease outbreaks. In addition, they put a higher importance put on the request for supporting tools to figure out the evolutionary aspects of the sequences and how the pieces of DNA have recombined, including how and from where they have horizontally transferred.

In the closing section, the authors reiterate that

To achieve these goals, investments in user-friendly software and improved visualization tools, along with excellent expertise in computational biology, will be of utmost importance.

I fancy the thought that our WONDER system for the HGT-DB for, in particular, graphical querying meets the first requirement—at least as proof of concept that one can construct an arbitrary query graphically using an ontology and all that in a web browser. Having said that, I am also aware of the authors’ complaint that

Currently, the slow transition from a scientific in-house program to the distribution of a stable and efficient software package is a major bottleneck in scientific knowledge sharing, preventing efficient progress in all areas of computational biology. Efforts to design, share, and improve software must receive increased funding, practical support, and, not the least, scientific impact.

Yes, like most bioinformatics tools and proof-of-concept and prototype software that comes from academia, it is not industry-grade software. To get the latter, companies have to do some more ‘shopping’ around and investment into it, i.e., monitor the latest engineering developments that demonstrate working theory presented at conferences, take up the ones they deem interesting, and transform it into a stable tool. We—be it here at FUB or almost any other university—do not have an army of experienced programmers, not only because we do not have the financial resources to pay programmers (cf. researchers with whom more scientific brownie-points can be scored) but, moreover, a computing department is not a cheap software house. The authors’ demand for more funding for software development to program cuter and more stable software would kill computing and engineering research at the faculty if the extra funding would not be real extra funding on top of existing budgets. The reality these days, is that many universities face cuts in funding. Go figure where that leaves the authors’ request. The complaint may have been more appropriate and effective when the authors would have voiced it in an industry journal.

The last part of the quote, receiving increased scientific impact, seems to me a difficult one. Descriptions of proof-of-concept and prototype software to experimentally validate the implementability of a theory can find a home in a scientific publication outlet, but a paper that tells the reader the authors have made a tool more stable is not reporting on research results and it does not bring us any new knowledge, does not answer a research question, does not solve an hitherto unsolved problem, does not confirm/refute a hypothesis. Why should—“must” in the authors’ words—improved, more usable and more stable, software receive scientific impact? Stable and freely available tools have an impact on doing science and some tasks would be nigh on undoable without them, but this does not imply such tools are part and parcel of the scientific discovery. One does not include in the “scientific impact” the Petri dish vendor, PCR machine developers, or Oracle 10g development team either. There are different activities with different scopes, goals, outcomes, and reward mechanisms; and that’s fine. Proposing to offer companies some fairly difficult to determine scientific-impact-brownie-points may not be the most effective way to motivate them to develop tools for science—getting across the possibility to make profit in the medium- to long term and to do something of societal relevance may well be a better motivator.

References

[1] Berglund EC, Nystedt B, Andersson SGE (2009) Computational Resources in Infectious Disease: Limitations and Challenges. PLoS Comput Biol 5(10): e1000481. doi:10.1371/journal.pcbi.1000481