Linking data to ontologies and, as a natural next step, Ontology-Based Data Access (OBDA), i.e., using an ontology to mediate access to data, such as querying the data through an ontology, is one of those requirements from the field that took some time to address theoretically, but now the first working prototypes are available. In particular, there is now a combination of an OBDA-enabled reasoner (DIG-Mastro) at the back-end and an OBDA-plugin for Protégé as editor front-end, together forming one coherent solution. The tools and the latest accompanying papers [1,2,3] are in print and accessible online. For those of you who will go to OWLED 2008 or IIMAS’08 this week, you can see it working during the demo sessions, where they take the LUBM benchmark ontology together with the La Sapienza university database (27 tables and 250000 tuples), and two scenarios are worked out with the MiFID ontology and customer business processes.
The remainder of this post is a bit of a marketing exercise about it: the DIG-Mastro and OBDA-plugin for Protégé were developed in a collaboration between members of the KRDB group here at UniBz and the Romans from the DIS at “La Sapienza” university.
On the motivation side, the advantages of OBDA are that the ontology provides a semantic view of the application domain (as opposed to the gory details of the data), that constraints expressed in the ontology can compensate for some of the incompleteness that tends to be present, especially in legacy databases, and that, in principle, it can provide a single view over multiple databases underneath.
The engineers among you are probably well aware that an OWL-DL/OWL 1.1 type-level ontology in Protégé does not scale well if one wants to reason over it, let alone if one also links it to data to do, e.g., automated instance classification. In order to allow for a scalable system, the DL-Lite family of Description Logic languages [4] was developed. Of this family, DL-LiteA is used for the implementation (DL-LiteA is in LogSpace in data complexity, i.e., in the same class as query evaluation in relational databases). The language’s features, i.e. what kind of things you can model, are described in [3,4], and they are summarized and compared with other ontology languages in table 1 of [5]; roughly, you can model almost the same as with standard UML class diagrams and ER. In contrast to the more expressive DL-based ontology languages and accompanying reasoners, DIG-Mastro actually can deal with unions of conjunctive queries (UCQ) over large data sources and still have efficient reasoning.
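To give a feel for why a UCQ comes out of this, here is a minimal sketch of the simplest kind of rewriting: a query over one concept is expanded into a union of queries over all concepts that the ontology says are subsumed by it. The axioms, individuals, and function names are made up for illustration, and this is not the algorithm DIG-Mastro implements; proper DL-Lite rewriting also has to handle roles, existential restrictions, and so on [3,4].

```python
# Hypothetical toy TBox: each pair (sub, sup) reads "sub is subsumed by sup".
SUBCLASS_AXIOMS = [
    ("Student", "Person"),
    ("Professor", "Person"),
    ("PhDStudent", "Student"),
]

# Hypothetical data: note that nobody is stored explicitly as a Person.
ABOX = {
    "Student": {"anna"},
    "Professor": {"bruno"},
    "PhDStudent": {"carla"},
}


def rewrite_atomic_query(concept, axioms):
    """Rewrite q(x) <- concept(x) into a union of atomic queries.

    Returns the list of concepts whose instances are all answers to the
    original query, found by following the subclass axioms backwards.
    """
    found = [concept]
    frontier = [concept]
    while frontier:
        current = frontier.pop()
        for sub, sup in axioms:
            if sup == current and sub not in found:
                found.append(sub)
                frontier.append(sub)
    return found


def answer(concept, axioms, abox):
    """Evaluate each disjunct of the rewritten query over the plain data."""
    answers = set()
    for disjunct in rewrite_atomic_query(concept, axioms):
        answers |= abox.get(disjunct, set())
    return answers


if __name__ == "__main__":
    print(rewrite_atomic_query("Person", SUBCLASS_AXIOMS))
    print(sorted(answer("Person", SUBCLASS_AXIOMS, ABOX)))
```

The point of the toy example is that the reasoning happens on the query, not on the data: the data is only touched by simple lookups, which is what a relational database is good at, and the ontology fills in the "missing" Person facts.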
A far from trivial issue is the question of how to link the data to the ontologies; the theoretical details can be found in [3]. Complex mappings do not look very nice (see fig.7 in [1] compared to the readable mappings in fig.1 in [2]), but the OBDA-plugin makes it a lot easier to make them (automation of this procedure is in the pipeline [6,7]), and once the GLAV mappings are defined, you can simply reuse them as often as you want. In short, the plugin allows you to describe the data sources and the mappings, send these descriptions to an OBDA-enabled reasoner, issue OBDA-specific queries, and view the results in the GUI. And yes, I’ve seen it working.
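For those who have not seen such mappings before, the basic idea can be sketched as follows: each mapping pairs an SQL query over the source with an ontology predicate, plus a template that turns the returned values into object identifiers. Everything in the sketch is an assumption for illustration (the tables, the mapping format, the identifier templates), it is not the syntax used by the OBDA-plugin, and it only shows the simple case where an SQL query maps to a single atom rather than full GLAV mappings.

```python
import sqlite3

# Hypothetical mappings: an SQL query over the source, the ontology predicate
# it populates, and one identifier template per argument of that predicate.
MAPPINGS = [
    {
        "sql": "SELECT id FROM staff WHERE role = 'prof'",
        "predicate": "Professor",
        "args": ["person/{0}"],
    },
    {
        "sql": "SELECT id, course FROM teaching",
        "predicate": "teaches",
        "args": ["person/{0}", "course/{1}"],
    },
]


def build_demo_db():
    """Create a small in-memory database standing in for a legacy source."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE staff (id INTEGER, role TEXT)")
    conn.execute("CREATE TABLE teaching (id INTEGER, course TEXT)")
    conn.executemany("INSERT INTO staff VALUES (?, ?)", [(1, "prof"), (2, "admin")])
    conn.executemany("INSERT INTO teaching VALUES (?, ?)", [(1, "logic")])
    return conn


def virtual_facts(conn, mappings):
    """Run each mapping's SQL query and build the ontology-level facts."""
    facts = set()
    for mapping in mappings:
        for row in conn.execute(mapping["sql"]):
            terms = tuple(template.format(*row) for template in mapping["args"])
            facts.add((mapping["predicate"], terms))
    return facts


if __name__ == "__main__":
    for fact in sorted(virtual_facts(build_demo_db(), MAPPINGS)):
        print(fact)
```

Note that the sketch materialises the facts only to keep it readable; as far as I understand [3], the whole point there is that the facts stay virtual and queries over the ontology are translated into SQL over the sources instead.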
Here are two screenshots of part of the GUI in Protégé (copied from [2]), where the first shows RDBMS-to-Ontology mappings, and the second one a UCQ issued to the DIG-Mastro with query and results manageable through the OBDA-plugin.
What else do you want? 🙂
[1] Mariano Rodriguez-Muro, Lina Lubyte, and Diego Calvanese. Realizing Ontology Based Data Access: A plug-in for Protégé. In Proc. of the Workshop on Information Integration Methods, Architectures, and Systems (IIMAS 2008), 2008. Cancun, Mexico.
[2] Antonella Poggi, Mariano Rodriguez-Muro, and Marco Ruzzi. Ontology-based database access with DIG-Mastro and the OBDA Plugin for Protégé (Demo). In Proceedings of the Workshop OWLED 2008. Washington DC, USA, 1-2 April 2008.
[3] Antonella Poggi, Domenico Lembo, Diego Calvanese, Giuseppe De Giacomo, Maurizio Lenzerini, and Riccardo Rosati. Linking Ontologies to Data. Journal on Data Semantics. X: 133-173, 2008.
[4] Diego Calvanese, Giuseppe De Giacomo, Domenico Lembo, Maurizio Lenzerini, and Riccardo Rosati. Tractable reasoning and efficient query answering in description logics: The DL-Lite family. Journal of Automated Reasoning, 2007, 39(3):385-429.
[5] C. Maria Keet and Mariano Rodriguez. Toward using biomedical ontologies: trade-offs between ontology languages. AAAI 2007 Workshop Semantic eScience (SeS 2007). 23 July 2007, Vancouver, Canada.
[6] Lina Lubyte and Sergio Tessaris. Extracting ontologies from relational databases. Proceedings of the 20th International workshop on Description Logics (DL’07). Bressanone, Italy. CEUR-WS Vol-250, 387-394.
[7] Lina Lubyte and Sergio Tessaris. Supporting the Design of Ontologies for Data Access. In Proc. of the 21st International Workshop on Description Logics (DL 2008). To appear.