Did you ever not want to bother knowing how the data is stored in a database, but simply want to know what kind of things are stored in the database at, say, the conceptual or ontological layer of knowledge? And did you ever not want to bother writing queries in SQL or SPARQL, but have a graphical point-and-click interface with which you can compose a query using that what layer of knowledge and that the system generates automatically the SQL/SPARQL query for you, in the correct syntax? And all that not with a downloaded desktop application but in a Web browser?
Our domain experts in genetics as well as in healthcare informatics, at least, wanted that. We have designed and implemented it now [1], which we have enthusiastically named Web ONtology mediateD Extraction of Relational data (WONDER). Moreover, we have a working system for the use case about the 4GB horizontal gene transfer database [2] and its corresponding ‘application ontology’. (pdf)
Subscribers to this blog might remember I mentioned a that we were working towards this goal, using Ontology-Based Data Access tools to access a database through an ontology and learning from (and elaborating on) its preliminary case studies [3]. In short, we added a usability extension to the OBDA implementations so that not only savvy Semantic Web engineers can use it, but also—actually, moreover—that the domain experts who want to get information from their database(s) can do so. By building upon the OBDA framework [4], we can avail of its solid formal foundations; that is, WONDER is not merely a software application, but there is a logic-based representation behind both the graphics in the ontology browser and the query pane.
In addition, WONDER is scalable because the ontology language (roughly: OWL 2 QL) is ‘simple’. Yes, we had to drop a few things from the original ORM conceptual model, but they have—at least for our case study—no effect on querying the data. The ‘difficult’ constraints are (and generally: should be anyway) implemented in the database, so there will be no instances violating the constraints we had to drop. Trade-offs, indeed, but now one can use an ontology to access a large database over the Web and retrieve the results quickly.
For instance, take the query “For the Firmicutes, retrieve the organisms and their genes that have a GCtotal contents higher than 60”, which is for various reasons not possible through the current web interface of the source database.
Fig.1 shows the ontology pane with three relevant elements selected. (click on the figures to enlarge)
Fig.2 shows the constrained adder, where I’m adding that the GCValue has to be > 60.
Fig.3 shows the query ready for execution: the attributes with a green border are those that will appear in the query answer (I could have selected all, if I wanted to). In the menu bar on the right you can see I have customized the names of the attributes, so that the columns in the results pane will have a query-relevant name in your preferred language (not necessary to do), as well as the automatically generated query.
Fig.4 shows a section of the results of the first page and Fig.5 of the second page; the “Family” column that has all the Firmicutes (out of about 500 organisms in the database) gives you the whole section of the species tree, because that is how the taxonomy information is stored in the database (refining the database is a separate topic). Alternatively, I could have selected the organism Name from the ontology browser (see Fig.1), de-selected the taxonomic classification in the query pane, and included the Name of the organism in the query answer to have the species name only but not all the taxonomic information; in this case, I wanted to have all that taxonomy information. The genes are the relevant selection (made with the other constraints) out of about the 2 million genes that are stored in the database.
There is also a constraint manager for the AND, OR, NOT and nesting. For instance, for the query “Give me the names of the organisms of which the abbreviation starts with a b, but not being a Bacillus, and the prediction and KEGG code of those organisms’ genes that are putatively either horizontally transferred or highly expressed” (Fig.6), we have the constraint manager as shown in Fig.7.
You can also save and load queries when you’re logged in, and download the results set in any case.
For those who want to play with it: feel free to drop me a line and I will send you the URL. (The reason for not linking the URL here is that the current URL is still for the beta version, whereas the operational one is expected to have a more stable URL soon.)
Last, but not least, the “we” I used in the previous sentences is not some ‘standard writing in plural’, but several people were involved in various ways to realize the WONDER system. In alphabetical order, they are: Diego Calvanese, Marijke Keet, Werner Nutt, Mariano Rodriguez-Muro, and Giorgio Stefanoni, all at FUB. I also want to thank our domain experts of the case study (with whom we’re writing a bio-oriented paper): Santi Garcia-Vallvé (with the Evolutionary Genomics Group, ‘Rovira i Virgilli’ University, Tarragona, Spain) and Mark van Passel (with the Laboratory for Microbiology, Wageningen University and Research Centre, the Netherlands).
References
[1] Calvanese, D., Keet, C.M., Nutt, W., Rodriguez-Muro, M., Stefanoni, G. Web-based Graphical Querying of Databases through an Ontology: the WONDER System. ACM Symposium on Applied Computing (ACM SAC’10), March 22-26 2010, Sierre, Switzerland.
[2] Garcia-Vallve, S, Guzman, E., Montero, MA. and Romeu, A. 2003. HGT-DB: a database of putative horizontally transferred genes in prokaryotic complete genomes. Nucleic Acids Research 31: 187-189.
[3] R. Alberts, D. Calvanese, G. De Giacomo, A. Gerber, M. Horridge, A. Kaplunova, C. M. Keet, D. Lembo, M. Lenzerini, M. Milicic, R. Moeller, M. Rodríguez-Muro, R. Rosati, U. Sattler, B. Suntisrivaraporn, G. Stefanoni, A.-Y. Turhan, S. Wandelt, M. Wessel. Analysis of Test Results on Usage Scenarios. Deliverable TONES-D27 v1.0, Oct. 10 2008.
[4] Diego Calvanese, Giuseppe De Giacomo, Domenico Lembo, Maurizio Lenzerini, Antonella Poggi, Mariano Rodriguez-Muro, and Riccardo Rosati. Ontologies and databases: The DL-Lite approach. In Sergio Tessaris and Enrico Franconi, editors, Semantic Technologies for Informations Systems – 5th Int. Reasoning Web Summer School (RW 2009), volume 5689 of Lecture Notes in Computer Science, pages 255-356. Springer, 2009.