Tools to access data through an ontology

Linking data to ontologies and, as a natural next step, Ontology-Based Data Access (OBDA), i.e., using an ontology to mediate access to data, such as querying the data through an ontology, is one of those requirements from the field that took some time to address theoretically, but now the first working prototypes are available. In particular, there is now a combination of an OBDA-enabled reasoner (DIG-Mastro) at the back-end and an OBDA-plugin for Protégé as editor front-end, together in one coherent solution. The tools and latest accompanying papers [1,2,3] are in print and accessible online. For those of you who will go to OWLED 2008 or IIMAS’08 this week, you can see it working during the demo sessions, where they take the LUBM benchmark ontology together with the La Sapienza university database (27 tables and 250000 tuples), and two scenarios are worked out with the MiFID ontology and customer business processes.

The remainder of this post is a bit of a marketing exercise about it: the DIG-Mastro and OBDA-plugin for Protégé were developed in a collaboration between members of the KRDB group here at UniBz and the Romans from the DIS at “La Sapienza” university.

On the motivation side, the advantages of OBDA are that the ontology provides a semantic view of the application domain (as opposed to the gory details of the data), constraints expressed in the ontology can fix some incompleteness that tends to be present in especially legacy databases, and, in principle, it can provide the single-view for multiple databases underneath.

The engineers among you are probably well aware that an OWL-DL/OWL 1.1 type-level ontology in Protégé does not scale well if one wants to reason over it, let alone link it to data to do, e.g., automated instance classification. In order to allow for a scalable system, the DL-Lite family of Description Logic languages [4] was developed. Of this family, DL-LiteA is used for the implementation (DL-LiteA is LogSpace in data complexity, i.e., just as efficient as relational databases). The language’s features, i.e., what kind of things you can model, are described in [3,4] and summarized and compared with other ontology languages in table 1 of [5]; it covers almost the same as can be modelled with standard UML class diagrams and ER. In contrast to the more expressive DL-based ontology languages and accompanying reasoners, DIG-Mastro actually can deal with unions of conjunctive queries (UCQ) over large data sources and still have efficient reasoning.
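To give a feel for what answering a query "through" the ontology means, here is a minimal sketch in Python. It assumes a toy ontology with only atomic subclass axioms (far less than what DL-LiteA offers), and the class names and data are hypothetical. A query for one class is rewritten into a union of queries over that class and all its subclasses, which is then evaluated over the plain data; this illustrates the intuition only and is not the algorithm implemented in DIG-Mastro.

```python
from collections import defaultdict

# Hypothetical toy ontology: (subclass, superclass) pairs.
SUBCLASS_AXIOMS = [
    ("Professor", "Teacher"),
    ("Lecturer", "Teacher"),
    ("Teacher", "Person"),
    ("Student", "Person"),
]

# Hypothetical explicit data: class name -> asserted individuals.
DATA = {"Professor": {"ann"}, "Student": {"bob"}, "Lecturer": {"carl"}}


def rewrite_atomic_query(cls, axioms):
    """Return the classes whose instances answer a query for `cls`."""
    children = defaultdict(set)
    for sub, sup in axioms:
        children[sup].add(sub)
    result, todo = set(), [cls]
    while todo:
        current = todo.pop()
        if current not in result:
            result.add(current)
            todo.extend(children[current])
    return sorted(result)


def answer(cls, axioms, data):
    """Evaluate the union of the rewritten queries over the explicit data."""
    union = rewrite_atomic_query(cls, axioms)
    return sorted({ind for c in union for ind in data.get(c, ())})
```

Asking for `Person` with `answer("Person", SUBCLASS_AXIOMS, DATA)` returns all three individuals, although none of them is stored as a `Person`: the ontology fills in what the data leaves implicit.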

A far from trivial issue is the question of how to link the data to the ontologies; the theoretical details can be found in [3]. The mappings do not look very nice for complex mappings (see fig.7 in [1] compared to the readable mappings in fig.1 in [2]), but the OBDA-plugin makes it a lot easier to make them—automation of this procedure is in the pipeline [6,7]—and once the GLAV mappings are defined, you can simply reuse them as often as you want. In short, the plugin allows you to describe the data sources, the mappings, send these descriptions to an OBDA-enabled reasoner, issue OBDA-specific queries, and view the results in the GUI. And yes, I’ve seen it working.
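As a rough illustration of what such a mapping does, the following Python sketch uses the standard `sqlite3` module. The tables, the ontology predicates, and the `MAPPINGS` dictionary are all made up for the example; real GLAV mappings as in [3] relate an SQL query to a conjunctive query over the ontology, whereas this simplification maps one SQL query to one predicate and says nothing about the plugin's actual mapping syntax.

```python
import sqlite3

# Hypothetical mappings: ontology predicate -> SQL query over the source.
# The string concatenation builds object identifiers from database values.
MAPPINGS = {
    "Student": "SELECT 'person/' || id FROM enrolment",
    "Professor": "SELECT 'person/' || id FROM staff WHERE role = 'prof'",
    "teaches": "SELECT 'person/' || staff_id, 'course/' || course FROM teaching",
}


def build_source():
    """Create a small in-memory database standing in for the legacy source."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE enrolment (id INTEGER, name TEXT);
        CREATE TABLE staff (id INTEGER, name TEXT, role TEXT);
        CREATE TABLE teaching (staff_id INTEGER, course TEXT);
        INSERT INTO enrolment VALUES (1, 'Bob');
        INSERT INTO staff VALUES (2, 'Ann', 'prof');
        INSERT INTO staff VALUES (3, 'Carl', 'admin');
        INSERT INTO teaching VALUES (2, 'logic');
        """
    )
    return conn


def instances(conn, predicate):
    """Unfold an ontology predicate into its SQL query and fetch the tuples."""
    return sorted(conn.execute(MAPPINGS[predicate]).fetchall())
```

Once the mappings are written down, a query over `Professor` or `teaches` never has to mention the `staff` or `teaching` tables again, which is the reuse mentioned above.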

Here are two screenshots of part of the GUI in Protégé (copied from [2]), where the first shows RDBMS-to-Ontology mappings, and the second one a UCQ issued to the DIG-Mastro with query and results manageable through the OBDA-plugin.

RDBMS-to-Ontology mapping

SPARQL UCQ

What else do you want? 🙂

[1] Mariano Rodriguez-Muro, Lina Lubyte, and Diego Calvanese. Realizing Ontology Based Data Access: A plug-in for Protégé. In Proc. of the Workshop on Information Integration Methods, Architectures, and Systems (IIMAS 2008), 2008. Cancun, Mexico.

[2] Antonella Poggi, Mariano Rodriguez-Muro, and Marco Ruzzi. Ontology-based database access with DIG-Mastro and the OBDA Plugin for Protégé (Demo). In Proceeding of the Workshop OWLED 2008. Washington DC, USA, 1-2 April 2008.

[3] Antonella Poggi, Domenico Lembo, Diego Calvanese, Giuseppe De Giacomo, Maurizio Lenzerini, and Riccardo Rosati. Linking Ontologies to Data. Journal on Data Semantics. X: 133-173, 2008.

[4] Diego Calvanese, Giuseppe De Giacomo, Domenico Lembo, Maurizio Lenzerini, and Riccardo Rosati. Tractable reasoning and efficient query answering in description logics: The DL-Lite family. Journal of Automated Reasoning, 2007, 39(3):385-429.

[5] C. Maria Keet and Mariano Rodriguez. Toward using biomedical ontologies: trade-offs between ontology languages. AAAI 2007 Workshop Semantic eScience (SeS 2007). 23 July 2007, Vancouver, Canada.

[6] Lina Lubyte and Sergio Tessaris. Extracting ontologies from relational databases. Proceedings of the 20th International workshop on Description Logics (DL’07). Bressanone, Italy. CEUR-WS Vol-250, 387-394.

[7] L. Lubyte, S. Tessaris. Supporting the Design of Ontologies for Data Access. In Proc. of the 21st International Workshop on Description Logics (DL 2008). To appear.

Representing the difference between mandatory and essential parts and wholes

As mentioned earlier, there is more in the pipeline about part-whole relations than only the taxonomy of types of part-whole relations and the RBox Compatibility service [1]. There are a lot of issues in representing parts, wholes, and part-whole relations—in particular in bio(medical) ontologies and conceptual data models. One of them is the distinction between the plain mandatory constraint on the participation of the part (whole) in the part-whole relation and the stronger notion of essential part (whole). Informally, they deal with representing that “the part must be part of some whole” versus “the part must be part of the same whole”. A classical example is the difference between how your heart is part of your body versus how your brain is part of your body: your heart is replaceable and as long as you have some heart in your body you’ll be fine (well, continue to exist), whereas this is different for your brain[1]. This, again, is different from parts that a whole normally has (or is supposed to have), such as two eyes and two kidneys in case of a human: without the eyes, you still can live healthily without medical intervention, whereas without the kidneys, you will die if there’s no possibility for regular dialysis—hence, there is somehow a difference in modality on the participation of the parts and wholes in the part-whole relation.

To represent this sort of difference, one can resort to adding existence and necessity [2], but also assess it along the temporal dimension. To say that a part is essential to a whole, then throughout its entire lifetime, the whole has exactly that part related through only that part-whole relation. This does not say anything about the part, though: that part might well have existed before the whole or continue to exist after the whole ceased to exist as a whole. Vice versa, if a whole is essential to the part, then that part cannot survive as is without that whole it is part of. Of course, this can be combined so that the part and the whole are mutually essential.
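The temporal reading of the two notions can be sketched in a few lines of Python. This is a toy check over explicit time points with hypothetical identifiers, assuming that the part-of facts given concern one kind of part only (hearts, or brains); it is not the DLRus formalization of [4]. Mandatory means the whole has some part at every moment of its lifetime, essential means it has the very same part at every moment.

```python
def parts_at_each_time(whole, lifetime, partof):
    """For every time point in the whole's lifetime, collect its parts.

    `partof` maps a time point to a set of (part, whole) pairs.
    """
    return [
        {p for (p, w) in partof.get(t, set()) if w == whole}
        for t in lifetime
    ]


def is_mandatory(whole, lifetime, partof):
    """At every time point the whole has some part, possibly a different one."""
    return all(parts_at_each_time(whole, lifetime, partof))


def is_essential(whole, lifetime, partof):
    """At every time point the whole has the very same part."""
    snapshots = parts_at_each_time(whole, lifetime, partof)
    return bool(snapshots) and bool(set.intersection(*snapshots))


# Hypothetical data: the heart is replaced at time 2, the brain never is.
LIFETIME = range(3)
HEART_OF = {0: {("h1", "me")}, 1: {("h1", "me")}, 2: {("h2", "me")}}
BRAIN_OF = {0: {("b1", "me")}, 1: {("b1", "me")}, 2: {("b1", "me")}}
```

With this data the heart passes `is_mandatory` but fails `is_essential`, while the brain passes both, matching the heart-versus-brain example.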

To represent this talk about “before”, “after”, and “during” in the setting of essential parts and wholes, one can add time t to the predicates, add an ordering over time points (chronons) t1, …, tn, and construct long formalizations to represent precisely the temporal constraints over the objects participating in the part-whole relation as well as over the part-whole relation itself. With an eye on potential for implementation, however, we chose to take the well-studied Description Logic language DLRus and its corresponding ERvt temporal conceptual data modeling language (see [3] for the latest comprehensive treatment of both) so as to capture succinctly the set of constraints for mandatory and essential parts and wholes. A rather dense, DL-readership-oriented paper that presents this solution has just been accepted for DL’08 [4], which I’ll try to render in a brief digest-format in the following paragraph and give a few realistic examples afterward.

DLRus is an expressive temporal description logic with the Until and Since operators and can capture most features of the common conceptual data modeling languages, such as n-ary relations, cardinality restrictions, sub-relations, disjointness, covering, etc. ERvt is, roughly, EER with extra constructs for the time aspects, and for each ERvt conceptual model there is an equi-satisfiable DLRus knowledge base.

In [3] you will find an explanation of the notion of status classes (well-known in temporal information systems), where some instance o can be member of Scheduled-C, Active-C, Suspended-C, or Disabled-C, with Active-C denoting the usual class C in a conceptual model (or call it concept C in DL terminology, universal C in an OBO Foundry ontology, whichever). There is a range of implications to ensure correct behaviour of the status classes, such as that if an object is member of Suspended-C then it first must have been member of C. If we entertain ourselves with a particular instance o1 of the Papilionoidea, then when o1 is member of Caterpillar, we might as well make o1 also member of the Scheduled-Butterfly class and of the Disabled-Egg class (whether it is interesting to do so is another topic). We can do the same for relations; i.e., in [4] we extend ERvt by introducing the notion of status relations (from §3 onwards, including an informal description). Applying that to the partof relation, we get Scheduled-partof, Active-partof, Suspended-partof, and Disabled-partof. For the axioms that deal with essential participation, we first have that the partof relation cannot be suspended, and subsequently add axioms to say whether the lifetime of the part (or whole) starts before that of the whole (or part) or at the same time, and whether the part (whole) finishes at the same time as the whole (part) or can outlive it. Thus, there are eight combinations of the possible constraints, which are drawn in an illustrative figure as well (Fig.3): four for essential parts and four for essential wholes (theorems 1 and 2). That’s it.
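For those who prefer to see the status idea as a small state machine, here is a sketch in Python. The transition table is my simplified assumption of how the four statuses may follow one another (scheduled, then active, possibly suspended and reactivated, finally disabled for good); the actual axioms are those in [3] and [4]. The same check works for a status class and for a status relation such as partof, where essential participation additionally rules out the suspended status.

```python
# Assumed transition rules, a simplification of the axioms in [3].
ALLOWED_NEXT = {
    "Scheduled": {"Scheduled", "Active"},
    "Active": {"Active", "Suspended", "Disabled"},
    "Suspended": {"Suspended", "Active", "Disabled"},
    "Disabled": {"Disabled"},
}


def is_legal_history(history):
    """Check a sequence of statuses, one per consecutive time point."""
    if not history:
        return True
    # Assumption: an object or tuple enters as scheduled or active, so that
    # suspended and disabled are always preceded by an active phase.
    if history[0] not in {"Scheduled", "Active"}:
        return False
    return all(
        nxt in ALLOWED_NEXT[cur] for cur, nxt in zip(history, history[1:])
    )


def is_essential_history(history):
    """Essential participation: a legal history that is never suspended."""
    return is_legal_history(history) and "Suspended" not in history
```

For instance, the history `["Scheduled", "Active", "Suspended", "Active", "Disabled"]` is legal for an ordinary partof tuple but is rejected by `is_essential_history`.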

With this addition of status relations, we can represent a lot more than only the distinction between mandatory and essential parts and wholes, and for quite realistic information, actually. For instance, we would like to say in a medical ontology or conceptual data model intended for development of a transplant database that all transplanted hearts must have been part of some other human. Put differently, and at the instance-level for illustrative purpose, such a constraint would enforce that if a heart h1 as member of Heart is partof p2 that is member of class Human and this partof is member of Active-partof, then there must be a human p1 that is member of Disabled-Human (i.e., p1 has died, assuming that a person cannot live without having a heart) and there must be a relational instance (tuple) of partof that relates h1 and p1 that is member of Disabled-partof. For kidney transplants, we can amend this to say that p1 is member of either Human or Disabled-Human (one could have donated just one kidney). For planning purposes, we can have donors in the transplant database whose organs are scheduled to become part of another human, i.e., the parts and wholes are both in their respective active classes, but a partof relation is member of Scheduled-partof relating the organ to a prospective recipient. Further, if we relax the standard essential part (whole) to less restrictive cases so that the objects and relations may become suspended some time during their lifespan, we can keep track of, say, some car engine e1 at the car mechanic who has removed it from the car c1 for maintenance purposes, but this e1 surely is supposed to be reinstalled in that car c1. And so forth.
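The transplant constraint can be checked at the instance level with a few lines of Python. This is only a sketch over hypothetical data structures (a set of status-tagged partof tuples and a dictionary with each human's status), not something taken from [4]; the flag `donor_may_live` is my name for the kidney variant.

```python
def transplant_ok(organ, partof, human_status, donor_may_live=False):
    """Check that a transplanted organ has a proper donor on record.

    partof: set of (organ, human, status) triples, where status is the
            status of that partof tuple ("Active", "Disabled", ...).
    human_status: dict mapping each human to "Active" or "Disabled".
    donor_may_live: True for organs such as kidneys, where the donor may
                    still be an active member of Human.
    """
    allowed = {"Disabled", "Active"} if donor_may_live else {"Disabled"}
    recipients = {
        h for (o, h, s) in partof if o == organ and s == "Active"
    }
    donors = {
        h
        for (o, h, s) in partof
        if o == organ and s == "Disabled" and human_status.get(h) in allowed
    }
    # Every recipient needs a donor who is a different human.
    return all(donors - {r} for r in recipients)


# Hypothetical transplant database contents.
PARTOF = {
    ("h1", "p2", "Active"), ("h1", "p1", "Disabled"),
    ("k1", "p4", "Active"), ("k1", "p3", "Disabled"),
}
HUMAN_STATUS = {"p1": "Disabled", "p2": "Active", "p3": "Active", "p4": "Active"}
```

Here heart h1 satisfies the strict constraint because donor p1 is in Disabled-Human, whereas kidney k1 only passes with `donor_may_live=True`, since its donor p3 is still alive.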

Now, before running off to go forth and play with, e.g., the temporalised relations in the RO [5]: some of those (like derivation), as well as other options, have already been addressed in [3] under the heading of so-called “evolution constraints”. And a caveat is that the full DLRus is undecidable[2], but there’s ongoing work on temporalising the well-behaved, computationally nice DL-Lite, and some subsets of DLRus are in ExpTime (see the last section of [3] for a summary).

[1] Keet, C.M., Artale, A. Representing and Reasoning over a Taxonomy of Part-Whole Relations. Applied Ontology, in print.
[2] Guizzardi, G. Ontological foundations for structural conceptual models. PhD Thesis, Telematica Institute, Twente University, Enschede, the Netherlands. 2005.
[3] Artale, A., Parent, C., Spaccapietra, S. Evolving objects in temporal information systems. Annals of Mathematics and Artificial Intelligence (AMAI), 2007, 50(1-2), 5-38.
[4] Artale, A., Keet, C.M. Essential and mandatory part-whole relations in conceptual data models. 21st International Workshop on Description Logics (DL’08), 13-16 May 2008, Dresden, Germany.
[5] Smith, B., Ceusters, W., Klagges, B., Koehler, J., Kumar, A., Lomax, J., Mungall, C., Neuhaus, F., Rector, A.L., Rosse, C. Relations in biomedical ontologies. Genome Biology, 2005, 6:R46.


[1] Other subtopics, such as optional parts, amount of parts, or parts that a whole should have are not further considered in [4].

[2] Who cares? At least now we know what we need to represent the distinction between mandatory and essential parts and wholes… as well as several other cases with part-whole relations.

About insurmountable simplicities

Some readers might think I’m heading towards a write-up about the seemingly insurmountable simplicities of the PhD programme, but I still think that doing a PhD amounts to coping with surmountable difficulties. The ‘insurmountable simplicities’ is part of the title of a popular philosophy book, which has been translated into eight languages and whose full title in English is “Insurmountable simplicities: thirty-nine philosophical conundrums” by Achille Varzi and Roberto Casati. I just finished reading the 39 short stories and dialogues spread over 129 pages, and I can highly recommend it to anyone. It is written in a way that is easily accessible to the general public, yet the stories cover a wide range of philosophical puzzles that make you both laugh and, moreover, think. Else, if you have no life but occasionally have to socialize and do not know of anything else to talk about than to bore your conversation partner with your thesis topic/work, then any of the 39 stories will do to get a conversation going. I will summarize and comment on some of them below; “Zombie Inc.”, “Partial Amnesia”, “Person transplant” and “My ice cream, your ice cream” are available online for free as appetizers.

The dialogue “Person transplant” has a man walking into a transplant clinic asking for a new brain. As donor, he can make his brain available to anyone interested and it costs him $10k, but as receiver requesting a new brain he can get $10k from the clinic… or so begins the dialogue. Put differently: brain donor versus body receiver; i.e., is your brain you with a disposable body, or is your body you and your brain just like any organ that, at least in theory, could be transplanted like your heart, kidneys and so forth? And, by the way, is it really an either-or case? Staying for a bit with dialogues about medicine, there are some complications with the placebo effect, where a customer in a drug store asks for a placebo against his headache. After all, it has been shown that it works, so one might well ask for a little starch pill, which, of course, defeats the purpose. So, how to administer a placebo that is both effective and ethically correct (as the pharmacist cannot give a non-medicine knowing that something better is readily available)?

In “The traveler’s pictionary”, a word may be worth a thousand pictures. Instead of going on holiday with a dictionary, the travel agent offers the traveller to Siberia a pictionary, so that she can point to the pictures instead of messing with Russian vocabulary and grammar. The pictionary has only pictures of things that can be depicted, such as for ‘buying’ (not uncontroversial) and a picture for ‘bicycle’, but can things like ‘wisdom’ or ‘inflation’ be drawn, or the negation of doing something? Moreover, and where the recurring personage “the meddler” chimes in, “a picture is itself something that requires an interpretation. And if a picture requires an interpretation, bringing it to mind can hardly help” (with a nudge to Wittgenstein). A practical example that many a biologist/bioinformatician has come across, is the derogatory term “[useless/informal/underspecified] cartoon” that computer scientists and software engineers regularly use for the very clear and explanatory colourful diagrams in biology textbooks; but then, they haven’t gotten the training in how to read such figures…

Prisoner K.J., the director of the penitentiary, the medical officer, and the Smiths are involved in a correspondence about the fact that K.J. can recall neither the crimes he is convicted of nor the date of imprisonment due to irreversible amnesia (“Partial amnesia”). Should he be informed about it? He found out and considers himself responsible for the act he cannot remember. But given that he cannot remember it, does that affect his personal identity and, if so, is he then really responsible for those crimes? The most interesting bit comes at the end though, with a note from the state legal office. The story does take for granted that one knows the main principle of putting people in jail as punishment for having committed a crime (deny the right of free movement, reflect on the crime, learn from it so that recidivism does not occur upon release). This obviously does not include revoking the right to vote, nor the effect that the George Jung character in the movie Blow described cynically about his first experience serving time in jail: that he went in with a bachelor’s in marijuana and got out with a PhD in cocaine. Those are just a few ‘collateral effects’ of prison systems in several countries, but I’ll leave that for another post sometime because it has little to do with philosophy (or has it?).

Last, there is also a section with entertaining logic, such as “Interesting!”, although, of course, not everything can be interesting, for—in the case of the dialogue in the bookstore about intrinsically interesting books—“if all books are interesting, and if being interesting requires some original feature, then relative to the property of being interesting, all books would appear to be uninteresting. Which is to say: boring.” Casati and Varzi’s book is far from boring and contains many other stories covering, among others, causality, paradoxes of time and space, the notion of choice, and chance, which are narrated in settings ranging from birthdays for entering the museum for free, reducing majority voting to one person, playing lotto in reverse, to useless project proposals.

p.s.: Varzi’s publication page has the links for all the languages that ‘insurmountable simplicities’ is translated into.