What are the ‘core’ SWLS references, if any?

I have been asked to give a guest lecture for the “Semantic Web Technologies” course for our students of the European Masters in Computational Logic. The idea is to demonstrate that “yes, there actually are people using the SW technologies that are being taught in the course” and what they do with (unto) those technologies[i].

With the aim of giving the students a taste of the Semantic Web for the Life Sciences, I made a list of topics (including reasoning, of course) and a list of hot-from-the-press references to demonstrate what is happening at the forefront of the field. But, alas, that turned out not to be the intention. Reasoning is covered in the KR course (the standard classical reasoning services, that is, not the breadth of what a good few bio-info researchers would like to see [1]), and, well, the references were supposed to be well-established, much-cited overview articles about the use of SW technologies in the life sciences. Uhm… Well-established, even though it is all quite recent and largely about visions of what one may (might) be able to do with the technologies? Technical overview articles from the engineers and users, but not a single use case reporting on the efforts to overcome the many hurdles to actually get it to work? And how many citations does one need for ‘much’-cited? A Google Scholar query on “semantic web life sciences” returned a 2004 article on experiences with SW services [2] with 45 citations and a 2005 one on the Semantic Grid [3] with 69 citations, both covering the same UK grid project; they seem to be the highest-cited articles with these keywords. But they are not “the” well-established overview article on SW technologies in the life sciences. I don’t think there is one, but you may have better ideas, as in my daily work I do not focus on how to apply those technologies (or: smell the article-need in case you are short of work to do).

In the end, the ([theoretical] computer science!) students have to read the May 2007 BMC Bioinformatics overview article by Ruttenberg et al. [4] on the activities within the scope of the W3C HCLS SIG, and they will have to form arguments on, say, the (non?)sense of converting an RDBMS into an RDF triple store, whether the bioinformatician is better off with SPARQL than with the well-established-but-not-SW-tech SQL, trade-offs between ontology languages, how to deal with legacy data when moving toward the SW, and data integration & SW technologies. I will add the students’ more interesting “verdicts” after next week’s lecture.
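To make the RDBMS-to-triple-store question a bit more concrete, here is a minimal sketch in Python (not taken from [4]) of a naive “direct mapping”. Each row of a relational table becomes a subject, and each non-key column becomes a predicate. The `gene` table, its toy rows, the `example.org` namespace, and the helpers `table_to_triples` and `match` are all hypothetical. A real conversion would use agreed-upon vocabularies and a proper triple store queried with SPARQL, not a Python list.

```python
import sqlite3

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
BASE = "http://example.org/"  # hypothetical namespace, for illustration only


def table_to_triples(conn, table, key):
    """Naive direct mapping: one subject IRI per row, one triple per non-null, non-key column."""
    # The table name is interpolated directly, so only use this with trusted names.
    cur = conn.execute(f"SELECT * FROM {table}")
    columns = [d[0] for d in cur.description]
    triples = []
    for row in cur:
        record = dict(zip(columns, row))
        subject = f"{BASE}{table}/{record[key]}"
        triples.append((subject, RDF_TYPE, f"{BASE}{table.capitalize()}"))
        for col, value in record.items():
            if col != key and value is not None:
                triples.append((subject, f"{BASE}{table}#{col}", value))
    return triples


def match(triples, s=None, p=None, o=None):
    """Match one triple pattern; None plays the role of a variable."""
    return [t for t in triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]


# Toy relational data (hypothetical).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE gene (id INTEGER PRIMARY KEY, symbol TEXT, chromosome TEXT)")
conn.executemany("INSERT INTO gene VALUES (?, ?, ?)",
                 [(1, "BRCA1", "17"), (2, "TP53", "17"), (3, "CFTR", "7")])

# SQL: symbols of the genes on chromosome 17.
sql = [r[0] for r in conn.execute(
    "SELECT symbol FROM gene WHERE chromosome = '17' ORDER BY symbol")]

# The same question as two triple patterns joined on the shared subject,
# i.e. roughly { ?g ex:chromosome "17" . ?g ex:symbol ?sym } in SPARQL.
triples = table_to_triples(conn, "gene", "id")
subjects = [s for s, _, _ in match(triples, p=f"{BASE}gene#chromosome", o="17")]
rdf = sorted(o for s in subjects
             for _, _, o in match(triples, s=s, p=f"{BASE}gene#symbol"))

print(sql, rdf)  # ['BRCA1', 'TP53'] ['BRCA1', 'TP53']
```

The answer is the same. Where SQL needs one `WHERE` clause on one row, the triple version needs a join on the subject. Whether the conversion pays off therefore depends on what one gains elsewhere, for example when integrating with other sources.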

Update 23 May 2007: the lecture slides are online.

Nine out of the ten students were of the view that dealing with legacy data and converting it to a Semantic Web ‘compliant’ technology surely does not belong to Semantic Web Technologies, even though the BioRDF group in the HCLSIG has exactly this goal.

It was not easy for them to list all the pros and cons of RDF and RDBMSs and, moreover, what to do with them, so eventually someone suggested: “can’t we just use the best of both together?”. Hmmm. On RDF, SPARQL etc., I gave the example of the interval join on genome data with the HistOn ontology, which was met with shock, horror: knowing the theory is one thing, but wrestling with that query in SeRQL and SPARQL is something else.
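For readers who have not seen such a query: the core of an interval join is the overlap test between two genomic regions. The Python sketch below is not the HistOn query itself. The `Feature` type, the feature names, and the coordinates are hypothetical. It assumes closed coordinates `[start, end]` and shows the overlap condition and a naive join. In SPARQL, assuming each feature’s chromosome, start, and end are stored as separate triples, the same condition has to be spelled out as a `FILTER` over variables bound by several triple patterns per feature. That is part of why the query gets unwieldy.

```python
from collections import defaultdict
from typing import NamedTuple


class Feature(NamedTuple):
    """A genomic region on one chromosome, with closed coordinates [start, end]."""
    name: str
    chrom: str
    start: int
    end: int


def overlaps(a: Feature, b: Feature) -> bool:
    # Two closed intervals on the same chromosome overlap iff each one starts
    # no later than the other one ends. Assuming start/end are separate triples,
    # the SPARQL counterpart would be roughly
    #   FILTER (?start1 <= ?end2 && ?start2 <= ?end1)
    # plus a shared ?chrom variable in the triple patterns.
    return a.chrom == b.chrom and a.start <= b.end and b.start <= a.end


def interval_join(left: list[Feature], right: list[Feature]) -> list[tuple[str, str]]:
    """Naive interval join: index `right` by chromosome, then test every candidate pair."""
    by_chrom: dict[str, list[Feature]] = defaultdict(list)
    for b in right:
        by_chrom[b.chrom].append(b)
    return [(a.name, b.name)
            for a in left
            for b in by_chrom.get(a.chrom, [])
            if overlaps(a, b)]


# Hypothetical toy data.
marks = [Feature("mark1", "chr1", 100, 200),
         Feature("mark2", "chr1", 500, 600),
         Feature("mark3", "chr2", 100, 200)]
genes = [Feature("geneA", "chr1", 150, 450),
         Feature("geneB", "chr1", 601, 700),
         Feature("geneC", "chr2", 50, 99)]

print(interval_join(marks, genes))  # [('mark1', 'geneA')]
```

The nested loop is quadratic per chromosome. A sort-and-sweep over the start coordinates is the usual way to speed it up, but the overlap test itself stays the same.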

They drew a (relative) blank on the “methodology, tools, and strategies” of the ontology task force; hopefully they will have a look at sources such as the W3C Best Practices to get a basic idea (but then, that topic was not part of the course) and the pizza ontology tutorial.

Interesting was a response to the claim that “RDFS and OWL offer some relief to the burden of understanding data schemas” [4], in the form of a question: UML, ER, ORM, and conceptual graphs are well-established graphical and formal conceptual data modelling languages, so is there something wrong with using those? One student ventured that, well, he had read a few chapters of Guizzardi’s book [5], and that a reason to go for OWL is that at least it has formal semantics, unlike the ambiguous UML. Not that this had much to do with “relieving the burden of understanding data schemas” for the user, and there is at least one formalisation of UML class diagrams [6] and several of ER, ORM, and CGs. There are differences between conceptual data models and ontologies, in particular regarding the knowledge that is represented, but much less so regarding the formal languages. I’m not convinced that RDFS and OWL offer more relief for understanding data schemas than UML, ER, ORM, or CG/NGs do, but maybe one of you has arguments in favour that I haven’t thought of.
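As an illustration that UML class diagrams can indeed be given a formal semantics, here is a small fragment written in first-order logic in the style of the translation in [6]. The fragment contains a generalisation, an association, and a minimum multiplicity of 1. The class and association names (Gene, ProteinCodingGene, Protein, encodes) are hypothetical and chosen only for the example.

```latex
\begin{align*}
&\forall x\, \big(\mathit{ProteinCodingGene}(x) \rightarrow \mathit{Gene}(x)\big)
  && \text{generalisation}\\
&\forall x \forall y\, \big(\mathit{encodes}(x,y) \rightarrow \mathit{ProteinCodingGene}(x) \land \mathit{Protein}(y)\big)
  && \text{association typing}\\
&\forall x\, \big(\mathit{ProteinCodingGene}(x) \rightarrow \exists y\, \mathit{encodes}(x,y)\big)
  && \text{multiplicity } 1..*
\end{align*}
```

Each of these axioms also has a direct OWL counterpart: a subclass axiom, domain and range axioms, and an existential restriction. This rather supports the point that the formal languages are not where the main difference lies.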

Last, temporal concepts and constraints came up, as they currently do on the HCLSIG mailing list.


[1] Keet, C.M., Roos, M., Marshall, M.S. A survey of requirements for automated reasoning services for bio-ontologies in OWL. Third international Workshop OWL: Experiences and Directions (OWLED 2007), 6-7 June 2007, Innsbruck, Austria.
[2] Lord, P., Bechhofer, S., Wilkinson, M.D., Schiltz, G., Gessler, D., Hull, D., Goble, C., Stein, L. Applying Semantic Web Services to Bioinformatics: Experiences Gained, Lessons Learnt. ISWC 2004, LNCS 3298, pp. 350-364.
[3] De Roure, D., Jennings, N.R., Shadbolt, N.R. The semantic grid: past, present, and future. Proceedings of the IEEE, 2005, 93(3): 669-681.
[4] Ruttenberg, A. et al. Advancing translational research with the Semantic Web. BMC Bioinformatics, 2007, 8(Suppl 3):S2.
[5] Guizzardi, G. Ontological foundations for structural conceptual models. PhD Thesis, Telematica Instituut / University of Twente, Enschede, the Netherlands, 2005. TI/FRS/015 (ISBN 90-75176-81-3, ISSN 1388-1795; No. 015).
[6] Berardi, D., Calvanese, D., De Giacomo, G. Reasoning on UML class diagrams. Artificial Intelligence, 2005, 168(1-2):70-118.


[i] The topics covered in the course are: RDF, SPARQL, OWL, Rules, F-Logic, and SW services. (Topics such as XML and Description Logics are covered in other courses.)