Live from ISWC 2008 in Karlsruhe

It is already the last day of ISWC’08, which featured some really good papers, lively comments from the attendees during the sessions, and ample opportunity for networking. I will discuss the keynote speeches first, then mention a few research papers, and close with some general remarks.

Ramesh Jain gave a good keynote speech on semantic multimedia searches, or rather: the lack thereof, and how to bridge the semantic gap between mere images and the meaning we attribute to them, so that we can find the right multimedia in the sea of images, video, etc. He proposes to do so with what he denoted the “Event Web”, since multimedia items are ‘snapshots’ of larger events that give context, and meaning, to those items. In addition to extant ontologies, such as LSCOM, he is developing an ontology for events so as to better annotate the items and, consequently, obtain better search results. John Giannandrea’s keynote on Freebase, on the other hand, can indeed be summarized by the Babbage quote he gave: “errors using inadequate data are much less than those using no data at all”. Obviously, the wisdom of the crowds and domain expert input for building knowledge bases is a laudable idea that has achieved remarkable successes toward the proverbial “80%”; but it is the remaining “20%” that is the hard part of taking it from a ‘web 2’ version to a ‘web 3’ version: semantic searches (cf. string matching) that retrieve the right set of answers instead of a sea of links, software agents that collaborate to plan your trip based on your requirements, and so forth. To take an entertaining example from another knowledge base, SNOMED CT, which has been adopted in several countries: while Stefan Schulz and I were searching for suspended concepts and relations (suspended sensu [1]), we came across a congenital absence of one tooth that is a subtype (is a in SNOMED CT) of congenital absence of mouth, of jaw, and of alimentary tract… never mind that acquired absence is a body structure, or the concoction previous known suicide attempt, which throws temporal, epistemic, and intentional notions together into one concept.
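To see why one odd axiom wreaks such havoc: is a is transitive, so a single misplaced subsumption propagates all the way up the hierarchy. A minimal sketch in Python, assuming for illustration a hypothetical chain of direct is-a axioms (the actual SNOMED CT content is far larger and structured differently):

```python
# Hypothetical chain of direct is-a axioms, for illustration only
direct_isa = {
    "congenital absence of one tooth": ["congenital absence of mouth"],
    "congenital absence of mouth": ["congenital absence of jaw"],
    "congenital absence of jaw": ["congenital absence of alimentary tract"],
}

def superclasses(concept):
    """All concepts entailed as superclasses via the transitive closure of is-a."""
    seen = set()
    stack = [concept]
    while stack:
        for parent in direct_isa.get(stack.pop(), []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen

# missing one tooth entails missing the entire alimentary tract:
print("congenital absence of alimentary tract"
      in superclasses("congenital absence of one tooth"))  # True
```

One wrong direct axiom low in the taxonomy thus surfaces as an absurd entailment several levels up, which is exactly why such concoctions matter.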
The third keynote speech was by Stefan Decker from DERI, rather provokingly entitled “how to save the Semantic Web?”. Based on an analysis of the successes of physics, he identified five points: (i) appealing unified message, (ii) credibility, (iii) concerted lobbying efforts, (iv) potentially transformational power, and (v) doable agenda for successes. His answers for AI in general and the Semantic Web in particular are, respectively: yes-?-yes-yes-no and no-?-yes-yes-yes. In addition, his vision for the Semantic Web is to aim for a network of knowledge and collaborative problem solving, remembering that the Semantic Web is, ultimately, for humans. However, as part of the latter point he dismissed (well, ridiculed, in a not so entertaining way) the required theoretical foundations, which annoyed quite a few people in the audience. During the break afterwards, one of them put forward that it is precisely because of its theoretical foundations that physics continues to do well. After all, building tools on quicksand, as opposed to on solid foundations, is not sustainable in the long run. Surely the human and engineering components should, will, and gradually already do receive more attention, as the topics of the papers attest, be it here or at ESWC and in emerging workshops about them; e.g., there was a session on user interfaces and one on semantic social networks. On the other hand, is the “semantic desktop” that Decker proposes really a sexy, “appealing unified message”? Surely we can, and do, do more, be it to facilitate bioscientists in their research from an end-user perspective, to streamline public administration, or to open up and enhance e-learning, to name just three sub-areas.

Of the presented papers, several were more detailed or improved versions of earlier work, such as the one about testing with the probabilistic reasoner Pronto using P-\mathcal{SHIQ}(D) (see here), approximating RCC in OWL [4], and details about how IBM managed to make SHER, its scalable reasoner for expressive ontologies (represented in the \mathcal{SHIN} DL language), work [3]; earlier work on SHER had been presented and discussed at OWLED’07, and this year Anatomylens was introduced as a real application. SHER achieves scalability via summarization of the ABox and filtering. The RCC & OWL paper [4] seeks to solve the problem of performing spatio-thematic queries by approximating RCC8 (the full-blown version cannot be fully represented in OWL) and using that approximation for consistency checking w.r.t. assertions in the ABox.
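The summarization idea can be caricatured in a few lines of Python: individuals asserted to instantiate the same concepts are merged into a single summary individual, and reasoning is done over the (much smaller) summary ABox. This is a toy sketch with hypothetical data; the actual SHER algorithm also summarizes role assertions and refines the summary by filtering when a clash is found:

```python
from collections import defaultdict

# Hypothetical ABox: individual -> frozenset of asserted concept names
abox = {
    "p1": frozenset({"Patient", "Male"}),
    "p2": frozenset({"Patient", "Male"}),
    "p3": frozenset({"Patient", "Female"}),
}

def summarize(abox):
    """Merge all individuals with identical concept sets into one summary node."""
    groups = defaultdict(list)
    for individual, concepts in abox.items():
        groups[concepts].append(individual)
    # one representative per concept set, remembering whom it stands for
    return {
        "s%d" % i: (set(concepts), sorted(members))
        for i, (concepts, members) in enumerate(
            sorted(groups.items(), key=lambda kv: sorted(kv[0]))
        )
    }

summary = summarize(abox)
print(len(abox), "individuals collapse to", len(summary), "summary nodes")
```

With millions of individuals but only thousands of distinct concept combinations, the summary is orders of magnitude smaller than the original ABox, which is where the scalability comes from.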

Putting data types in an ontology is problematic from a formal ontology (and, eventually, database and ontology interoperation) perspective, but many developers seem to want to have them (treating an ontology as if it were a formal conceptual data model), and better than currently possible in OWL. For those who want more of it: your requests have been heard. With data types in OWL 2 you will be able to state, e.g., \geq_5 \land \leq_{10}, and to name data ranges; moreover, the XSD numeric data types are redefined, rdf:text and date/time are added, and there will be a data type checker [5].
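A plain-Python caricature of such a facet-restricted data range, here the integer range from 5 to 10 mentioned above (this is not the OWL API; the function name and signature are made up for illustration):

```python
def in_data_range(value, min_inclusive=None, max_inclusive=None, datatype=int):
    """Check membership in a facet-restricted data range,
    e.g. xsd:integer with minInclusive 5 and maxInclusive 10."""
    if not isinstance(value, datatype):
        return False  # wrong datatype, so outside the range
    if min_inclusive is not None and value < min_inclusive:
        return False
    if max_inclusive is not None and value > max_inclusive:
        return False
    return True

print(in_data_range(7, min_inclusive=5, max_inclusive=10))    # True
print(in_data_range(11, min_inclusive=5, max_inclusive=10))   # False
print(in_data_range("7", min_inclusive=5, max_inclusive=10))  # False: not an integer
```

The conjunction of facets is what makes the combination expressible at last: each facet narrows the value space of the base datatype, and a value belongs to the range only if it satisfies all of them.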
A nice feature that even I have used during development of the ADOLENA ontology is the semantic explanation of deductions (originally in SWOOP, and later also in, e.g., Protégé 4, where after classifying one clicks on the “?” that appears next to the inconsistent and inferred classes). More precisely, Matthew Horridge presented the work on laconic and precise justifications [2], which has been nominated for the best paper award. Their work enhances how explanations are computed and what information the justifications need to contain, so as to give only the minimal information required for a repair; put differently: it minimizes the haystack in which to find the needle to fix your ontology.
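The core notion of a justification, a minimal subset of the axioms that suffices for the entailment, can be illustrated by brute force over toy propositional implications (hypothetical axioms; laconic justifications go further by also trimming the superfluous parts within each axiom):

```python
from itertools import combinations

def entails(axioms, fact, assumed=("A",)):
    """Forward-chain over implication axioms (premise, conclusion),
    starting from the assumed facts."""
    known = set(assumed)
    changed = True
    while changed:
        changed = False
        for premise, conclusion in axioms:
            if premise in known and conclusion not in known:
                known.add(conclusion)
                changed = True
    return fact in known

def justifications(axioms, fact):
    """All minimal subsets of the axioms that still entail the fact."""
    found = []
    for size in range(1, len(axioms) + 1):
        for subset in combinations(axioms, size):
            if entails(subset, fact) and \
               not any(set(j) <= set(subset) for j in found):
                found.append(subset)
    return found

axioms = [("A", "B"), ("B", "C"), ("A", "C"), ("C", "D")]
print(justifications(axioms, "C"))
# two independent minimal explanations; the irrelevant ("C","D") never appears
```

Even this toy shows why the feature matters: one entailment can have several justifications, and presenting only the minimal ones spares the modeller from wading through axioms that play no role in the problem.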

Several presentations are, or will be, made available on Video lectures.

Last, some indication of how far the semantic web still has to go, with just a tiny practical example: the conference site called for tagging blog posts with iswc2008 or ISWC 2008, and if you click their link to do a Google blog search you are supposed to get a long list. But you do not. In fact, their predefined Google blog search searches on iswc2008 or “iswc 2008”, which does not work when neither appears in the text: I had used ISWC’08 in two posts, and that particular permutation of semantically the same thing was not in the predefined search term. Even after changing it on 28-10-2008, it still had not been recognized. A non-blog web search does return lots of hits. Not that I want to insist on having my two seconds of fame on the ISWC website as one of the results, but something like that simply should work by now, or ought to… I will add both their desired tags this time, and let’s see what happens. UPDATE: the tagging worked, so it seems there are just few bloggers who bother with manual tagging…

Overall, it was an entertaining and very interesting conference, with—from a research perspective—both encouraging results and plenty of topics for further research.

[1] Artale, A., Guarino, N., and Keet, C.M. Formalising temporal constraints on part-whole relations. 11th International Conference on Principles of Knowledge Representation and Reasoning (KR’08). Gerhard Brewka, Jerome Lang (Eds.) AAAI Press. Sydney, Australia, September 16-19, 2008.
[2] M. Horridge, B. Parsia, U. Sattler. Laconic and Precise Justifications in OWL. Proc. of ISWC’08, 28-30 Oct. 2008, Karlsruhe, Germany.
[3] Julian Dolby, Achille Fokoue, Aditya Kalyanpur, Li Ma, Edith Schonberg, Kavitha Srinivas, and Xingzhi Sun. Scalable Conjunctive Query Evaluation Over Large and Expressive Knowledge Bases. Proc. of ISWC’08, 28-30 Oct. 2008, Karlsruhe, Germany.
[4] Rolf Grütter, Thomas Scharrenbach, and Bettina Bauer-Messmer. Improving an RCC-Derived Geospatial Approximation by OWL Axioms. Proc. of ISWC’08, 28-30 Oct. 2008, Karlsruhe, Germany.
[5] Boris Motik and Ian Horrocks. OWL datatypes: design and implementation. Proc. of ISWC’08, 28-30 Oct. 2008, Karlsruhe, Germany.

OWLED’08 in brief

Unlike the reports of last year’s OWLED and DL, this one about OWLED’08 in Karlsruhe (co-located with ISWC 2008) will be a bit shorter, because few papers were online before the workshop, so there was little opportunity to prepare properly, and internet access was limited during the first day. However, all OWLED’08 papers are online now here, freely available to read at your leisure.

There were two user experiences sessions, one on OWL tools, one on reasoners (including our paper on OBDA for a web portal using the Protégé plugin and the DIG-QuOnto reasoner), one on quantities and infrastructure, and one on OWL extensions. In addition, there was a ‘break out session’ and a panel discussion.

The notion of extensions for “database-esque features” featured prominently among the wide range of topics; for instance, the so-called easy keys [1] that will be added to the OWL 2 spec, and dealing with integrity constraints in the light of instance data [2,3]. An orthogonal issue, well, an official Semantic Web challenge, is the “billion triples dataset”, which is like a greatly scaled-up LUBM benchmark. Other application-motivated issues that came up concerned modeling and reasoning with data types, in particular quantities and units of measurement, probabilistic information, and calculations.
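The database intuition behind such keys carries over directly: if two named individuals of a keyed class agree on all key properties, they are inferred to denote the same object. A toy sketch in Python with hypothetical data (OWL 2’s HasKey restricts this inference to named individuals, which is what makes the keys “easy”):

```python
def same_individuals(instances, key_props):
    """Pairs of individuals inferred equal because they agree
    on all properties of the declared key."""
    inferred = []
    names = sorted(instances)
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if all(instances[a].get(p) == instances[b].get(p)
                   for p in key_props):
                inferred.append((a, b))
    return inferred

# hypothetical data: the key for Person is (passportNo,)
people = {
    "alice": {"passportNo": "X123", "age": 30},
    "al":    {"passportNo": "X123", "age": 31},
    "bob":   {"passportNo": "Y456", "age": 30},
}
print(same_individuals(people, ["passportNo"]))  # [('al', 'alice')]
```

Note that, unlike a database key, the conclusion is not a constraint violation but an inference: al and alice are deduced to be the same individual.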
The presentation of the paper about probabilistic modelling with P-\mathcal{SHIQ}(D) [4] generated quite a bit of discussion, both during the question session and at the social dinner. The last word has not been said about it yet, perhaps also because of the example with the breast cancer ontology, representing relative risk factors and probabilities for developing cancer in general, and that in conjunction with instance data. The paper’s intention is to give a non-technical, user-oriented introduction so that people can start experimenting with it.

The 1.5hr panel discussion focused on the question: what are the criteria for determining when OWL has succeeded? Or, from a negative viewpoint: what does/can/could make it fail? Given the diversity of the panelists and attendees, and the differences in their goals, many opinions were voiced on both questions. What is yours?

[1] Bijan Parsia, Ulrike Sattler, and Thomas Schneider. Easy Keys for OWL. Proc. of the Fifth OWL: Experiences and Directions 2008 (OWLED’08), 26-27 October 2008, Karlsruhe, Germany.
[2] Evren Sirin, Michael Smith and Evan Wallace. Opening, Closing Worlds – On Integrity Constraints. Proc. of the Fifth OWL: Experiences and Directions 2008 (OWLED’08), 26-27 October 2008, Karlsruhe, Germany.
[3] Jiao Tao, Li Ding, Jie Bao and Deborah McGuinness. Characterizing and Detecting Integrity Issues in OWL Instance Data. Proc. of the Fifth OWL: Experiences and Directions 2008 (OWLED’08), 26-27 October 2008, Karlsruhe, Germany.
[4] Pavel Klinov and Bijan Parsia. Probabilistic Modeling and OWL: A User Oriented Introduction to P-SHIQ(D). Proc. of the Fifth OWL: Experiences and Directions 2008 (OWLED’08), 26-27 October 2008, Karlsruhe, Germany.

Case studies of ontology interoperation with OBDA and OBDI

A while ago I wrote about tools to access data through an ontology, which drew both several positive comments and big question marks as to what I was actually talking about. As is not uncommon with DL literature, the theory papers about Ontology-Based Data Access (OBDA) and Ontology-Based Data Integration (OBDI) are deemed not particularly readable by people who focus more on Semantic Web technologies, and the two demo papers were small examples to show that the software to realise OBDA/OBDI works as intended. So, lo and behold, we now have two case studies! And I mean the real stuff, done with operational information systems.

One deals with linking an ontology to the database of a content management system (the National Accessibility Portal for people with disabilities) and doing funky queries, thereby significantly enhancing the web portal’s search capabilities [1]. The other deals with integrating five different data sources through an ontology, so as to provide the system engineers of the SELEX-SI company with one coherent view of the information and greatly simplify querying the diverse data sources [2]. The content of [1] focuses on the user and usage perspectives, whereas [2] has less about that but offers a 2.5-page summary of the system’s principles and functionality instead; hence, the two papers highlight different scenarios and different details of ontology interoperation, thereby complementing each other.

The first paper goes into some detail on the methodology: developing an experimental domain ontology (ADOLENA, made in Protégé), efforts to shape it in a way that is more easily usable for OBDA (best to stay within a particular “OWL 2 profile”, well, DL-LiteA), using the OBDA Plugin for Protégé to map the classes and properties in the ontology to SQL queries over the database, and then running SPARQL queries through the query interface. The SPARQL queries can be over the ontology itself, plain queries over the database, or a combination of reasoning over the ontology and the database. Concerning the latter, when declared with care, you can indeed have a SPARQL query using concepts and roles of the ontology for which there is no data in the database and still retrieve the correct answer; page 8 in [1] explains this for the query to retrieve “all devices that assist with upper limb mobility”, with the SPARQL equivalent and query answer in a colourful screenshot. At the back-end it is the DIG-QuOnto server, an extension of the original QuOnto reasoner, that does the query unfolding and rewriting; that is, it is a full-fledged reasoner, tailor-made for ontologies that have to deal with lots of data.
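The two-stage query answering that DIG-QuOnto performs, first rewriting the query using the ontology and then unfolding it through the mappings into SQL, can be caricatured as follows. All class names, mappings, and the one-level rewriting are hypothetical simplifications; DL-Lite rewriting also handles existentials, role hierarchies, and so on:

```python
# Hypothetical TBox fragment: subclass -> superclass
subclass_of = {
    "Wheelchair": "MobilityDevice",
    "WalkingFrame": "MobilityDevice",
}

# Hypothetical mappings: ontology class -> SQL query over the source database
mappings = {
    "Wheelchair":   "SELECT id FROM device WHERE type = 'wheelchair'",
    "WalkingFrame": "SELECT id FROM device WHERE type = 'walking_frame'",
}

def rewrite(cls):
    """Rewriting step: expand a query atom to the class plus its subclasses,
    so the ontology's knowledge is compiled into the query."""
    result = [cls]
    for sub, sup in subclass_of.items():
        if sup == cls:
            result.append(sub)
    return result

def unfold(cls):
    """Unfolding step: translate the rewritten query into a union of
    SQL queries via the mappings; unmapped classes contribute nothing."""
    return " UNION ".join(mappings[c] for c in rewrite(cls) if c in mappings)

print(unfold("MobilityDevice"))
```

This is also why a query over a concept with no data of its own (MobilityDevice here) still returns the correct answers: the rewriting pushes the query down to the mapped subclasses before any SQL is executed.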

An extension of OBDA is OBDI: one first needs the access so as to integrate. In this case study by the Romans at uni “La Sapienza” and engineers at SELEX-SI [2], they faced the all-too-familiar problem where the data (in this case, Configuration and Data Management (C&DM) data) was fragmented over various autonomous sources, managed by different systems under heterogeneous data models, and integrated manually. They developed an ontology for the C&DM subject domain, mapped it to a federated “source schema” (i.e., one schema for dealing with data sources that varied from RDBMS to Excel to HTML), and demonstrate several queries with the MASTRO-I solution, which also uses DL-LiteA and its sophisticated reasoner; they access the server directly through the console so as to exploit all its features to the full (GUI improvements are in the pipeline by now). One of the major advantages of this OBDI is that you have to declare the mappings only once and can then avail of the comparatively much simpler SPARQL queries, thereby saving both the domain experts and the system engineers a lot of time.

Just in case the experiments have whetted your appetite, you would like to know more about them, and you want to discuss them in person (or perhaps attend the ISWC’08 tutorial that will give an overview of the OBDA/I tools, among other topics): the authors of both papers, as well as the theory and tool developers, will be at ISWC 2008 and the co-located OWLED’08 in Karlsruhe (26-30 Oct.).

[1] Keet, C.M., Alberts, R., Gerber, A., Chimamiwa, G. Enhancing web portals with Ontology-Based Data Access: the case study of South Africa’s Accessibility Portal for people with disabilities. Fifth International Workshop OWL: Experiences and Directions (OWLED’08). 26-27 Oct. 2008, Karlsruhe, Germany.
[2] Amoroso, A., Esposito, G., Lembo, D., Urbano, P., Vertucci, R. Ontology-based data integration with MASTRO-I for configuration and data management at SELEX Sistemi Integrati. Proc. of SEBD’08, 2008, 81-92.

Improving science blogging

As a brief diversion from report-writing to meet deadlines, and setting aside for a moment the discussion on science blogging by journalists vs. blogging by scientists, I had a quick look at the PLoS Biology paper on Advancing Science through Conversations: Bridging the Gap between Blogs and the Academy [1]. After the usual introductory things, they set out to

propose a roadmap for turning blogs into institutional educational tools and present examples of successful collaborations that can serve as a model for such efforts. We offer suggestions for improving upon the traditionally used blog platform to make it more palatable to institutional hosts and more trustworthy to readers; creating mechanisms for institutions to provide appropriate (but not stifling) oversight to blogs and to facilitate high-quality interactions between blogs, institutions, and readers; and incorporating blogs into meta-conversations within and between institutions.

For instance, as done by Stanford (and several others mentioned in the article), the university or research institute could host a blogging site that aggregates its blogging scientists, to lend some trustworthiness to the blogs and, perhaps, to be a showcase to the wider world that the ‘ivory towers’ do care about the public and that the institute wants to add a new mode of communication with it. The variation by MIT is broader in the types of content, e.g. with editorial and tech review, and takes more of a top-down approach (even though the scientists who are blogging show more of a bottom-up process). Our uni just went on facebook; would that be a step in the right direction (and add, say, LinkedIn for the alumni)?

Then there are issues of ‘blog review’, moderation, and rankings, and how one could approach those, as well as categories of posts, such as discussing peer-reviewed published papers as with research blogging; though I think one could also have other categories, such as Ben Good’s experiment of putting out a draft for comment before submitting, or dedicating the blog (or a section thereof) to some course the scientist is teaching.

More points and suggestions are raised in the article (see in particular the last section, after figure 2), to which I might return after the deadline.

UPDATE: one of the article’s authors, Nick Anthis, has already written a blog post synthesising the various comments from other bloggers. Admittedly, I lag behind the mainstream of blogging and perhaps should have spent the time writing that paragraph in the report instead of browsing articles and blogs…

[1] Batts SA, Anthis NJ, Smith TC (2008) Advancing Science through Conversations: Bridging the Gap between Blogs and the Academy. PLoS Biol 6(9): e240.