### Ontologies and conceptual modelling workshop in Pretoria

A first attempt was made in South Africa to bring together researchers and students who are interested in, and work on, ontologies, conceptual data modelling, and the interaction between the two, in the form of an interactive Workshop on Ontologies and Conceptual Modelling, held on 15-16 Nov 2012 in Tshwane/Pretoria as part of the Forum on AI Research (FAIR’12) activities. The participants came from the University of KwaZulu-Natal, the University of South Africa, Fondazione Bruno Kessler, and different research units of CSIR-Meraka (where the workshop was organized and held). The remainder of this post contains a brief summary of the ongoing and recently completed research that was presented at the workshop.

The focus of the first day of the workshop was principally on the modeling itself, modeling features, and some prospects for reasoning with the represented information and knowledge. I had the honour to open the sessions with a talk on the paper that recently won the best paper award at EKAW’12, “Detecting and Revising Flaws in OWL Object Property Expressions” [1]. This was followed by Zubeida Khan’s talk on our EKAW’12 paper about ONSET: Automated Foundational Ontology Selection and Explanation [2], which she extended with a brief overview of her nearly completed MSc thesis on an open ontology repository for foundational ontologies. Tahir Khan, a visiting PhD student (at UKZN) from Fondazione Bruno Kessler in Trento, gave the third talk within the scope of ontology engineering research. The main part of Tahir’s presentation was an overview of his template-based approach for ontology construction, which aims to involve domain experts more effectively in the modeling process of domain ontology development [3]. He rounded this off with a brief overview of one component of this approach: selecting the right DOLCE category when adding a new class to the ontology, and integrating OntoPartS for selecting the appropriate part-whole relation [4] into the template-based approach and its implementation in the MoKi ontology development environment.

There were three talks about the representation of, and reasoning over, defeasible knowledge. Informally, defeasible information representation concerns the ability to represent (and, later, reason over) ‘typical’ or ‘usual’ cases that do have exceptions; e.g., a human heart is typically positioned on the left, but in people with situs inversus it is positioned on the right-hand side of the chest; or policy rules such as that, normally, users have access to, say, documents of type x, but black-listed users should be denied access. Giovanni Casini presented recent results on extending the ORM2 conceptual data modeling language with the ability to represent such defeasible information [5], which will also be presented at the Australasian Ontology Workshop in early December. Tommie Meyer focused on reasoning with it in a Description Logics context ([6] is somewhat related to the talk), whereas Ivan Varzinczak looked at the propositional case with defeasible modalities [7], which will be presented at the TARK’13 conference.
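To make the ‘exceptions’ part a bit more concrete, here is a toy Python sketch of the access-policy example, assuming that a more specific rule simply overrides a more general one. It is not the ORM2 extension, nor the DL or modal semantics discussed in the talks; the rule format and the function name `may_access` are hypothetical, and the sketch only illustrates the non-monotonic flavour: adding a fact can withdraw a conclusion that was drawn earlier.

```python
# Toy defeasible rule base: each rule is (condition on the known facts, conclusion, priority).
# Assumption: a higher priority marks a more specific rule, which overrides more general ones.
RULES = [
    (lambda facts: "user" in facts, True, 0),          # normally, users have access to documents of type x
    (lambda facts: "blacklisted" in facts, False, 1),  # exception: black-listed users are denied access
]


def may_access(facts):
    """Return the access decision of the most specific applicable rule, or None if no rule applies."""
    applicable = [(priority, conclusion) for condition, conclusion, priority in RULES if condition(facts)]
    if not applicable:
        return None
    _, conclusion = max(applicable, key=lambda pair: pair[0])
    return conclusion


print(may_access({"user"}))                 # True: the default applies
print(may_access({"user", "blacklisted"}))  # False: the extra fact retracts the default conclusion
```

Classical (monotonic) logic cannot behave like this: there, adding a premise never removes a conclusion.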

Arina Britz and I also presented very fresh results that are still at the submission stage. Arina gave a presentation about semantic similarities and ‘forgetting’ in propositional logical theories (joint work with Ivan Varzinczak), and I presented a unifying metamodel for UML class diagrams v2.4.1, EER, and ORM2 (joint work with Pablo Fillottrani).

Deshen Moodley gave an overview of the HeAL lab at UKZN and outlined some results from his students Ryan Crichton (MSc) and Ntsako Maphophe (BSc(honours)). Ryan designed an architecture for software interoperability of health information systems in low-resource settings [8]. Ntsako has developed a web-based ontology development and browsing tool for lightweight ontologies stored in a relational database, tailored to the use case of a lightweight ontology of software artifacts. Ken Halland presented and discussed his experiences with teaching a distance-learning-based honours-level ontology engineering module at UNISA.

Overall, it was a stimulating and interactive workshop that hopefully can, and will, be repeated next year with an even broader participation than this year’s 16 participants.

### A successful EKAW’12 conference

Having returned four days ago from the 18th International Conference on Knowledge Engineering and Knowledge Management (EKAW’12), held in a sunny (!) and beautiful Galway from 8-12 October, I have not yet managed to read all the papers I marked to read, but I don’t want to postpone the usual conference blogpost too much. So here it goes.

The main reasons why ‘successful’ is in the title of this post are that there were several interesting papers; I was (co-)author of two full papers (acceptance rate 15%), one of which won the best paper award; I received useful feedback on the contents of the papers; it was productive for meeting up, conversing about our research, and networking; and it was held in Galway. The remainder of this post briefly outlines some of that; the proceedings are published by Springer in LNAI, and most presentations have been uploaded to YouTube by now.

There were three keynotes. Martin Hepp talked about the difference between ontologies and (more lightweight) web ontologies. Michael Uschold reflected on building the Enterprise Ontology and the lessons learned. Lee Harland provided a lot of information about “practical semantics” for the pharmaceutical industry to improve the drug discovery process with, among others, flexible data integration, the new W3C draft of the provenance data model, and a quantitative data ontology in the Open PHACTS project.

There were several sessions spread over three whole days, grouped by the following topics: knowledge extraction and enrichment, natural language processing, linked data, ontology engineering and evaluation, social and cognitive aspects of knowledge representation, applications of knowledge engineering, and in-use papers.

Unsurprisingly, I’ll zoom in a bit on the ontology engineering contributions. There were several papers on improving the quality of an ontology. María Poveda-Villalón presented the OntOlogy Pitfall Scanner (OOPS!), a tool that implements the current catalogue of 29 pitfalls [1], which may concern logical consistency, modeling, or human understanding. Given an ontology, OOPS! evaluates it against those pitfalls and reports possible instances, which can then be corrected; e.g., a property defined to be the inverse of itself, intersection and union swapped in an expression, or missing disjointness axioms. Concerning the latter, Sebastien Ferré’s Advocatus Diaboli (or: “pew! pew!”) may come in handy as well [2]: it lets one explore the ontology, find “absurd” conjuncts, and add an axiom to exclude them. Put differently, the aim of the Possible World Explorer is to reduce the number of possible worlds admitted by the ontology and thereby approximate the intended models better. My own contribution, Detecting and Revising Flaws in OWL Object Property Expressions [3], which won the best paper award, considers flaws in object property expressions and what good and safe role boxes/object property expressions are, defines two tests to check for them in an ontology, and provides proposals for how to correct the mistakes (there’s an informal introduction in a previous blog post). In addition to these research contributions on finding and fixing flaws, there was also an in-use paper on the topic, though applied to SKOS vocabularies [4], which won the best in-use paper award. It combines guidelines and constraints for SKOS in a new tool, Skosify, and evaluates 14 SKOS vocabularies and thesauri in some detail, thereby improving those artifacts.
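To give an idea of what such automated checks amount to, here is a minimal Python sketch of two of the pitfalls mentioned: a property declared to be the inverse of itself (which, in OWL, just makes it symmetric), and sibling classes without a disjointness axiom. It works on a hypothetical, simplified list-of-pairs representation of the axioms rather than on a real OWL file, the function names are made up, and it is not how OOPS! is implemented. The sibling check only yields candidates, since some sibling classes legitimately overlap.

```python
from itertools import combinations


def self_inverse_properties(inverse_axioms):
    """Flag properties declared to be the inverse of themselves.

    inverse_axioms: iterable of (p, q) pairs, each standing for InverseObjectProperties(p, q)
    (a hypothetical, simplified input format).
    """
    return sorted({p for p, q in inverse_axioms if p == q})


def siblings_without_disjointness(subclass_axioms, disjoint_axioms):
    """Return pairs of sibling classes (same direct superclass) that lack a disjointness axiom.

    subclass_axioms: (sub, super) pairs; disjoint_axioms: (a, b) pairs.
    The result is a list of candidates for review, not a list of definite errors.
    """
    children = {}
    for sub, sup in subclass_axioms:
        children.setdefault(sup, set()).add(sub)
    declared = {frozenset(pair) for pair in disjoint_axioms}
    candidates = set()
    for kids in children.values():
        for a, b in combinations(sorted(kids), 2):
            if frozenset((a, b)) not in declared:
                candidates.add((a, b))
    return sorted(candidates)


print(self_inverse_properties([("hasPart", "partOf"), ("adjacentTo", "adjacentTo")]))
# ['adjacentTo']
print(siblings_without_disjointness(
    [("Cat", "Animal"), ("Dog", "Animal"), ("Cow", "Animal")], [("Dog", "Cat")]))
# [('Cat', 'Cow'), ('Cow', 'Dog')]
```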

From a modelling/ontology viewpoint, the paper about derived roles [5] was really interesting: although I had thought about the basic temporal dimension of roles before, I had not done so in as much detail as Mizoguchi and co-authors. For instance, how should one represent ‘murderer’ or ‘examinee’? There is such a thing as an “original role” as we commonly know it, but also a “derived role”, where the meaning of the original role is slightly altered based on the context of that role; e.g., an examinee is an examinee not only whilst writing the exam, but also whilst studying for it, and once one is a murderer during the act of killing, one remains ‘a murderer’ for the remainder of one’s life (though, obviously, not permanently stuck in an act of killing). These derived roles have further, more detailed specifications, which are summarized in the paper.

Another aspect of foundational ontologies is using them in domain ontology development, and the step prior to that: figuring out which foundational ontology is the ‘best’ one for your project. I co-authored a paper about that with my MSc student Zubeida Khan, ONSET: Automated Foundational Ontology Selection and Explanation [6], which she presented and which also featured at the demo session, where colleagues suggested more nice features. As mentioned in earlier blogposts (e.g., here), features of foundational ontologies were analysed, as were criteria for selecting a foundational ontology and the needs of existing ontology development projects, and these were used to design a tool, ONSET, that automates the selection of a foundational ontology and provides an explanation of the computed selection. Riichiro Mizoguchi (of the YAMATO foundational ontology), who also attended the conference, has in the meantime provided the values of the criteria for YAMATO (thank you!), and you will see an updated ONSET very soon.

Some tools have been evaluated more rigorously than others, and there are a myriad of evaluation approaches. One that stands out for having used the System Usability Scale, and a funny video during the presentation, is the evaluation of the Live OWL Documentation Environment (LODE), which automatically generates documentation of your ontology in one HTML page [7]. One that stands out for its interesting results is the paper about the effect of software-supported collaboration features in the ontology development environment [8]. Marco Rospocher presented the user evaluation of the MoKi modeling wiki with and without its collaboration features, and of their effect on ontology development: the collaborative ontology development went better with such features.
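For those unfamiliar with the System Usability Scale: participants rate ten fixed statements on a 1-5 scale; the odd-numbered (positively worded) items contribute the score minus 1, the even-numbered (negatively worded) items contribute 5 minus the score, and the sum is multiplied by 2.5 to give a score between 0 and 100. A minimal Python sketch of that standard scoring follows; the responses are made up and are not LODE’s evaluation data.

```python
def sus_score(responses):
    """Compute the System Usability Scale score (0-100) for one participant's ten answers (1-5)."""
    if len(responses) != 10 or any(r not in range(1, 6) for r in responses):
        raise ValueError("SUS needs exactly ten answers on a 1-5 scale")
    total = 0
    for item, r in enumerate(responses, start=1):
        # Odd items are positively worded, even items negatively worded.
        total += (r - 1) if item % 2 == 1 else (5 - r)
    return total * 2.5


print(sus_score([4, 2, 4, 2, 4, 2, 4, 2, 4, 2]))  # 75.0 (hypothetical answers)
```

A study’s overall SUS result is then usually the mean of the participants’ individual scores.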

More papers deserve attention here (and I may add them later once I have read them), as do other people who attended: it was really pleasant to meet several of them again, and to meet some in person for the first time after having read their papers over the years (among others, and in alphabetical order: Claudia d’Amato, Matthieu d’Aquin, Aldo Gangemi, Chiara Ghidini, Patrick Lambrix, Riichiro Mizoguchi, Marco Rospocher, Mari Carmen Suárez-Figueroa, and Michael Uschold). To my pleasant surprise, there appear to be ontology enthusiasts in Senegal as well (Gaoussou Camara presented a poster about the use of the infectious diseases ontology).

The next EKAW conference in 2014 will be held in Sweden and I’m looking forward to participating again.

### Some ideas about what the Semantic Web will look like in 2022

Research into realizing a vision of the Semantic Web has been ongoing for a little over 10 years, and a call has gone out to ponder, daydream, fantasize, or think wishfully or fearfully about “What will the Semantic Web look like 10 years from now?” (SW2022). A selection of the many ideas will be presented on November 11, 2012, at the SW2022 workshop, held in conjunction with the 11th International Semantic Web Conference (ISWC’12) in Boston, USA.

For the curious: all SW2022 papers that will be presented are online on the SW2022 page (scroll down to about half-way on the web page for the programme). I picked out a few that I will summarise and comment on below; my selection is based on topic and/or author(s) and/or curious title, and I am a co-author of one of the papers.

Abraham Bernstein will present the first main paper [1], on the “global brain Semantic Web”, where the Internet is going to serve as the analogue of a brain’s neurons. The ‘global brain’ is used as a metaphor (or revamped old-fashioned AI?) for “distributed interleaved human-machine computation”, or, in fancier, more marketable terms, “collective intelligence” and “social computing”. In short: put the human in the Semantic Web, both as a knowledge provider and as an educated user. Bernstein zooms in on the need to be able to manage the “motivational diversity, cognitive diversity, and error diversity” with respect to the possibility of realizing this global brain Semantic Web. Alessandro Oltramari’s vision for a cognitive Semantic Web [2] is quite similar to Bernstein’s, where the Semantic Web is tuned to the individual user and “it will be an emergent social network of human and artificial cognitive agents interacting in a hybrid environment, where the distinction between physical and virtual will be superseded by the very nature of the entities populating it, namely knowledge objects and knowledge agents” [2]. Compared to these, our vision of interoperability is somewhat more humble.

Oliver Kutz will present our paper [3] about interoperability among ontologies, to be realized with the Distributed Ontology Language (DOL) that is currently in the process of standardisation at ISO (scheduled to be finalized by 2015). DOL is a metalanguage for distributed ontologies that may be represented in different ontology languages (some of the technical details can be found in a recent paper that won the best paper award at FOIS’12 [4], and a few examples are described in [5]). Overall, then, it would be nice if, by 2022, we have solved the interoperability issues not only among data, but also among the ‘models’ (ontologies, service descriptions, etc.) and, especially, their logic-based representation languages. Realistic usage scenarios with DOL include seamlessly linking knowledge that is represented partially in OWL 2 DL and partially in Common Logic; leaving an OBO ontology as it is, yet declaring more semantics (e.g., cardinality constraints, property chains) ‘around’ it in a more expressive language for those who need it; and advanced features for modularization. Clearly, all this will need some tool support. Initial tools do exist (Hets for reasoning over heterogeneous ontologies, and the Ontohub ontology repository), but more can and will have to be done to realize full interoperability.

The paper on the Semantic Web needs (vision?) for cultural heritage [6] offers nothing I did not already know. South Africa has its own programme in that area, albeit called “indigenous knowledge management” rather than “cultural heritage”, and we did our own requirements analysis some time ago already [7, 8]. Our list of requirements matches the one by Vavliakis et al., and we also have a technology maturity analysis, a set of OWL requirements, and actual use cases from the domain experts and users of the Department of Science and Technology’s National Recordal System project for indigenous knowledge management (about which I blogged before). That the topic will also receive attention at SW2022 hopefully increases the chance that those requirements will be investigated further, solved, and realized, which, in turn, will improve the software developed here, and, ultimately, people will benefit from it all.

Mutharaju [9] emphasizes the need for connectivity, personalization and abstraction. Regarding the latter, he notes that “There would be a need to provide multiple (and higher) levels of abstractions and facilitate drill-down mechanisms.” Yey! Maybe my work on granularity (among others, [10]) will find its way into implementations after all. Also, Mutharaju thinks that the Semantic Web may be of use for the benefit of the environment (e.g., calculating better traffic flow, using sensor data, etc.).

A short paper scheduled for the panel session is entitled “The rise of the verb” [11], which I found a curious title: verbs are taken into account already, where a verb’s ontological foundation is, in the Semantic Web context, represented as an object property in OWL or reified under, say, DOLCE’s Perdurant. Given its contents, a more suitable title could have been “action in the Semantic Web”: the paper’s introduction suggests adding something executable to the Semantic Web by means of JavaScript, but with the instruction specified at the knowledge level. Heiko Paulheim and Jeff Pan also want language extensions, in particular so as to be able to handle imprecision/uncertainty [12].

Vander Sande and co-authors present a rather bleak vision of the Semantic Web [13], in that it could endanger humanity. They spend the full 6 pages highlighting the myriad of dangers and the possible misuses of Semantic Web technologies. Among others: ‘semantic spam’ instead of the dumb variety we have gotten used to, where spammers take advantage of the Linked Open Data cloud and otherwise linked social network data to make the spam look more believable; polluting the LOD cloud through link spoofing; identity theft and provenance manipulation; and the Web of Things for autonomous computerized weaponry. One could also have added a follow-through of the saying that ‘knowledge is power’, where better and scaled-up knowledge management facilitates obtaining more power (and power corrupts, and absolute power corrupts absolutely). All this, in turn, goes back to the philosophical issues regarding responsibility in research, engineering, and technology, and whether some field is inherently bad, neutral, or good, or whether the bad pops up only in some application scenarios where the technologies could possibly be used. For the Semantic Web, I think it is only the latter, but you may try to convince me otherwise.

Although I won’t be attending, it’s appreciated that the papers are online already, and I can imagine there will be some lively discussions at the SW2022 workshop.

### A few notes on a successful ESWC’12 and OWLED’12

Slightly later than near-realtime due to flight delays, here are a few notes on the 9th Extended Semantic Web Conference ESWC’12 and OWL: Experiences and Directions OWLED’12, which I attended about two weeks ago in Crete, Greece.

ESWC’12

ESWC’12 was as selective as previous years, with, on average, a 25% acceptance rate. The proceedings are published by Springer; where applicable, I’ve linked the freely available versions in the references below. There’s also metadata and a list of award winners.

Main background picture of the ESWC’12 conference, with Cretan hills

Keynotes

I assume that, like last year, the keynotes will be put on the video lectures website; for now, you’ll have to make do with a brief impression through my lenses.

Alon Halevy, head of structured data at Google, gave his keynote the morning after the social dinner (but the conference hall was full nevertheless). He entertains the perspective of Knowledge Representation and the Semantic Web as being “databases on steroids”. The talk’s topics were Google Fusion Tables, with lightweight semantics, which are intended as “data management for the 99%”, and WebTables, a search for data tables on the Web, with the goal of having an easy-to-use database system that is integrated with the Web. The work on web tables was akin to a very large-scale attempt at bottom-up lightweight conceptual data model and ontology development. They crawled the Web for raw tables (14 billion), of which an estimated 154 million can pass for real relations (relations from the database viewpoint, i.e., with structured data, rather than an HTML table used for page layout), which then ended up as 2.5 million schemas as recovered table/relation semantics. And then there’s Halevy’s enthusiasm about coffee.

Aleksander Kolcz from Twitter went over a few problems they are trying to solve at Twitter, such as tweet relevance, who to follow, content recommendation, language, anti-spam, and user interest modeling. As a small tidbit of data: there are 140 million users, 340 million tweets/day, and 2.3 billion search queries/day (i.e., about 26K/sec.). Apparently, when one has enough (i.e., very large amounts of) data, simple models work “remarkably well” and ensembles of classifiers perform better in accuracy.

Abraham Bernstein’s keynote was about getting our act together in the Semantic Web research area and promoting the “garbage can theory” that was introduced by Cohen, March and Olsen in 1972: i.e., some ideas, theories, and tools are ‘thrown away’ into the garbage, where they can meet others and combine, so that something beautiful can come of it after all (this is my simplistic, shorthand version of it).

Unfortunately, I missed the pre-conference keynote by Julius van der Laar because OWLED was still ongoing. I’ve heard it was a good and interesting one about the (sneaky) social media strategies the Obama campaign used in the 2008 presidential election.

Papers

Several tracks ran in parallel, so I could attend only some of the sessions: ontologies, reasoning, semantic data management, digital libraries and cultural heritage, and in-use. The following pointers are based on my attendance of the presentations and partial reading of the papers.

Ontologies track. Yves Raimond from the BBC presented a query-driven evaluation framework for ontologies, defining ‘good’ with respect to the task and data, and applied it to the music ontology (online slides), noting some room for improvement. The paper also has a neat brief overview of techniques for ontology evaluation [1]. I presented the paper co-authored with Francis Fernandez and Annette Morales on mereotopology and the OntoPartS tool that helps modellers to represent part-whole relations [2], which I introduced in an earlier post. OntoPartS was also presented at the demo session [3], where it generated quite some interest among logicians and practitioners alike. Besides my ‘toy ontology’ examples to demonstrate the tool’s functionality, Martin Hepp had brought his GoodRelations ontology for e-commerce, which I thus used instead to illustrate adding part-whole relations to a real ontology. The demo session ended officially at 9pm, but it was after 10pm before I packed up my tablet.

Semantic data management track. Craig Knoblock and co-authors developed a system to link data to ontologies and preserve the linking in a so-called (logic-based) “source model” that is computed semi-automatically, taking as input the data, an ontology, some learned semantic types, and a refinement step by the user in a nice GUI [4]. This was evaluated with a set of bio-informatics resources, such as UniProt. The presentation by Lorena Etcheverry was a bit long on the intro, but the idea was nice: enhancing OLAP analysis with ‘good enough’ temporary cubes generated from web sources, the introduction of a new vocabulary, Open Cubes, for the specification and publication of multidimensional cubes on the Semantic Web (which, unfortunately, the authors still have not shared online), and an algorithm for creating the SPARQL 1.1 query for roll-up [5].
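Roll-up is the OLAP operation that aggregates a measure from a finer level of a dimension to a coarser one, e.g., from cities to countries. In SPARQL 1.1 this is typically expressed with `GROUP BY` and an aggregate such as `SUM`. The Python sketch below merely mimics that aggregation step on an in-memory list; the city-to-country hierarchy, the sales figures, and the function name are hypothetical, and it is not the authors’ query-generation algorithm.

```python
from collections import defaultdict

# Hypothetical dimension hierarchy: city (finer level) -> country (coarser level).
CITY_TO_COUNTRY = {"Montevideo": "Uruguay", "Salto": "Uruguay", "Rosario": "Argentina"}


def roll_up(observations, parent_of):
    """Sum a measure from a finer dimension level to its parent level.

    observations: iterable of (member, value) pairs at the finer level, e.g. (city, sales).
    parent_of: mapping from each finer-level member to its coarser-level member.
    Roughly the shape of: SELECT ?country (SUM(?v) AS ?total) WHERE { ... } GROUP BY ?country
    """
    totals = defaultdict(float)
    for member, value in observations:
        totals[parent_of[member]] += value
    return dict(totals)


print(roll_up([("Montevideo", 10), ("Salto", 5), ("Rosario", 7)], CITY_TO_COUNTRY))
# {'Uruguay': 15.0, 'Argentina': 7.0}
```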

In-use track. Michel Dumontier demonstrated an extension to the HyQue hypothesis formulator and evaluator, using rule sets in the SPARQL Inferencing Notation (SPIN) so that users can trace their hypothesis evaluation [6]. Stefan Scheglmann presented a paper on their efforts to provide “programming access” to ontologies, with an accompanying tool, OntoMDE, a model-driven engineering toolkit (which, however, does not seem to be available online, although a link was shown in the presentation, and I jotted down something on Eclipse plugins) [7]. StorySpace was put in the Digital Libraries and cultural heritage track, but could just as well have been in in-use: it is an environment for constructing and navigating stories, plots, and narratives, guided by the newly introduced curate ontology [8]. We’ll have to look at all that in more detail in the context of our IKMS development [9].

OWLED’12

The proceedings of OWLED’12 are available on CEUR-WS. Over 30 papers were submitted, so the workshop turned out to be somewhat more selective than in previous years. There were 18 paper presentations, a keynote, and two tutorials. The following is, again, a selection (mainly due to my time constraints for reading the papers and typing up something).

Mariano Rodriguez presented the ontopQuest system [10] for Ontology-Based Data Access, providing SPARQL query answering with OWL 2 QL/RDFS entailments. It works in the so-called “classic ABox mode” with an internal relational database and in “virtual ABox mode”, and, unlike, say, QuOnto, it embeds most of the TBox semantics into the database by availing of a (also recently developed) semantic indexing technique. (Hopefully that’ll help my ontologies & knowledge bases students answer the OBDA questions better next time; they ought to have read at least David Toman’s slides on the principal approaches to realize OBDA before the test.) Staying with reasoning, Dmitry Tsarkov presented the idea of using metareasoning that takes into account both the features of current reasoners and modularisation, to come up with the ‘best’ reasoning strategy to answer a query over only that part of the ontology that is relevant for the query [11].
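Roughly, the idea behind a semantic index is to number the classes in the hierarchy so that all subclasses of a class fall within a contiguous range of numbers, and to store the ABox assertions with those numbers, so that retrieving all instances of a class, including the inferred ones, becomes a single range query on the database. The Python sketch below illustrates that general idea only for a tree-shaped hierarchy; the class names, function names, and the tree restriction are simplifying assumptions, and it is not Quest’s actual implementation.

```python
def build_semantic_index(children, root):
    """Number classes in depth-first pre-order; each class gets the interval of numbers in its subtree.

    Simplifying assumption: the class hierarchy is a tree. With multiple inheritance,
    a class may need several intervals, which this sketch does not handle.
    """
    index, interval = {}, {}
    counter = 0

    def visit(cls):
        nonlocal counter
        counter += 1
        index[cls] = counter
        for child in children.get(cls, []):
            visit(child)
        # After visiting all descendants, counter is the largest number in cls's subtree.
        interval[cls] = (index[cls], counter)

    visit(root)
    return index, interval


def instances_of(cls, abox_rows, interval):
    """abox_rows: (individual, class number) rows, as in a single relational table.

    Retrieving all instances of cls, including those asserted only for its subclasses,
    becomes one range check, i.e. SQL along the lines of WHERE idx BETWEEN lo AND hi.
    """
    lo, hi = interval[cls]
    return sorted(ind for ind, idx in abox_rows if lo <= idx <= hi)


children = {"Thing": ["Person", "Organisation"], "Person": ["Student", "Professor"]}
index, interval = build_semantic_index(children, "Thing")
rows = [("ann", index["Student"]), ("bob", index["Professor"]), ("uct", index["Organisation"])]
print(instances_of("Person", rows, interval))  # ['ann', 'bob']
```

The appeal of this design is that the subsumption reasoning is done once, when numbering the TBox, instead of being compiled into a union of queries for every subclass each time a query is asked.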

An extension to the OWLGrEd tool for modeling OWL ontologies through a UML-like interface was presented: the developers have added a ‘splitter’ that enables a user to decide which axioms to close (using OWL + Integrity Constraints), then sends the serialization to the reasoner and displays the inferences [12]. Pity that it works only with the commercial RDF database Stardog by Clark & Parsia. Bijan Parsia presented, among other things, a paper on automatically generating analogy questions, which are widely used in multiple choice questions, and somehow determining their difficulty. The automated generation was facilitated by an ontology, and the initial results are promising [13]. I presented the paper on OWL requirements for indigenous knowledge management systems [9], about which I blogged earlier, because one of my co-authors, Ronell Alberts, was already presenting a paper based on her recently completed MSc thesis [14].

One of the tutorials was about modularity, which was presented by Chiara del Vescovo and Dmitry Tsarkov from Manchester University (see their modularity website for more info). The tutorial presented an overview of where modularity is useful, and how. Some of the reasons to modularise are to facilitate the explanation services, to perform incremental reasoning, semantic diff, and hotspot detection (= splitting an ontology into the simple and the complex part). That is, it presented a viewpoint on modularity as possible solution for the issues of (and the need for) scalability and performance of automated reasoning. Modularity and modularization during modeling and to reduce the so-called cognitive overload—i.e., involving some, or even driven by, subject domain semantics—was here (and is in most other DL-oriented outlets) apparently entirely outside the scope, which is a missed opportunity (more about that another time).

Typical tourist picture of the conference hotel (the view from my room wasn’t that great, but with the busy schedule, that didn’t matter anyway)

Aside from the stimulating papers and keynotes, and the ensuing conversations with fellow researchers, it was great to meet people again and to meet new people, and we had a lot of fun socialising. Now back to work, so as to have a shot at next year’s installment of ESWC in Montpellier, France (which is close to a village where I used to go on holiday for some 8 years, many years ago).

### Notes on AFRICON’11, MAIS’11, and SAICSIT’11

It has been a busy month of conferencing. First was AFRICON’11 in Livingstone, Zambia (13-15 Sept.) and its special session on Robotics and AI in Africa, where I presented a paper on bottom-up ontology development of bio-ontologies [1]. Then came the Masters AI Spring School (MAIS’11) hosted by UKZN (26-30 Sept.), of which I was the main organizer and where I gave a presentation on ontology-driven formal conceptual data modeling for biological data analysis. And I just returned from the South African Institute of Computer Scientists and Information Technologists Annual Research Conference (SAICSIT’11) in Cape Town (3-5 Oct.), where I presented two papers (also blogged about before: on rough subsumption reasoning [2] and keys in UML class diagrams [3]). The remainder of this post contains a quick recap of each.

AFRICON’11

I think back on this conference with mixed emotions: the logistics were quite lousy and very expensive, but I made several new connections and it was good to be informed about who’s working on what in Africa. Overall, and going by the sessions I attended, it gave me the impression of a workshop-level event rather than ‘the’ major conference on the continent it is claimed to be. Looking through my notes now, some of the noteworthy items are Dietmar Dietrich’s keynote on the question whether IT is to/can be a major contributor to solving the energy challenge (Green IT seems to be the new up-and-coming hot topic in research and engineering). The Robotics and AI sessions and the dialog session I attended had several showcases of robots: Tracey Booysen from UCT presented how to build the cheapest swarm robot thus far (60 USD) [4], and Alexander Ferrein described the experiment of high school students preparing for and participating in Robocup Junior [5]. Other topics were as diverse as smart carpets, water quality monitoring with live sensors made from algae, the role (if any) for robotics in sustainable development, and ubiquitous healthcare with mobile phones.

Slightly off-topic: the walk at the top of the Victoria Falls was doable even for me and the microlight flight over the falls was great.

MAIS’11

MAIS followed three previous yearly winter/spring schools (MOWS’08, MOSS’09, MOWS’10), though this time it was held in Durban instead of Pretoria and the scope was broader than ontologies.

Alessandro Artale, from my former employer the Free University of Bozen-Bolzano, gave his Formal Methods course to participants from UKZN, UNISA, CSIR-Meraka, and UP in the mornings, augmenting the theoretical aspects with practicals with NuSMV in the labs, and he closed with recent results on formal temporal conceptual data modelling with light-weight temporal DLs.

The afternoons were filled with tutorials and research presentations. Nelishia Pillay from UKZN gave a well-prepared tutorial on hyper heuristics and Sergio Tessaris, also from FUB, gave a tutorial on SAT and efficient Boolean reasoning (online abstracts). The research presentations by students and researchers covered topics such as formal conceptual data modeling, non-monotonic reasoning, event processing of video, ICT for the sugar cane supply chain, belief revision, foundational ontologies, optimization, and digital forensics. We were short on time with all sessions and continued the discussions during the breaks. Hopefully the ongoing research activities and new ideas the participants were exposed to and exchanged with each other will lead to fruitful collaboration.

Local and International participants of MAIS’11 (photo by Phumelele Mavaneni)

UPDATE (17-10): Phumelele Mavaneni, intern journalist from the UKZN Online e-newspaper, wrote an article about MAIS’11 (vol 5, issue 39), and on the right is a group photo with some of the participants.

SAICSIT’11

It was my first time attending SAICSIT, and it left a positive impression, regarding both the papers presented and the people who attended. The event was quite selective, with a 33% acceptance rate for full papers and 20% for short papers. The venue had a good ambience for meeting again the few people I had met before and for getting acquainted with fellow CS & IT researchers and the system in South Africa.

Now it’s back to the regular activities of, mainly, teaching theory of computation and researching so as to have some results for the upcoming submission deadline season (and also to submit that journal paper).

### A few notes on ESWC2011 in Heraklion

It’s the end of an interesting and enjoyable ESWC’11 conference in Heraklion, Crete. Compared to other conferences, there were many keynote speeches (not all of them that much on the Semantic Web, but interesting nevertheless), and, as usual, there were parallel sessions with (unfortunately) many co-scheduled presentations I would have liked to attend. Here follow a few notes on them (which I might update once I have travelled back to SA, as this is written rather hastily before departure).

Keynotes

Jim Hendler’s talk was entitled “Why the Semantic Web will never work”, with the quotation marks. Quite a few people have uttered that sentence, but, in Hendler’s review of the past 10 years, we have actually achieved more in some areas than initially anticipated and more than pessimists thought feasible. For instance, “the semantic web will never scale”: it does, according to Hendler, as demonstrated, e.g., by participants in the billion triple challenge and the growing LOD data cloud. Or “folksonomies will win” (as opposed to, at least, structured vocabularies): wrong again, mainly because folksonomies do not achieve their goal without “social context” and they lack the crucial aspect of links between entities. However, these achievements are principally in the bottom part of the Semantic Web layer cake, and Hendler claims that the “ontology story is still confused”, although OWL is to a large degree “succeeding as a KR standard”. Key challenges for Hendler include relating linked data to ontologies, the equivalent of a database calculus for linked data, and providing a means for evaluating reasoning with incomplete and possibly inconsistent data. UPDATE (13-6): Hendler’s slides are on slideshare.

Lars Backstrom, data scientist at Facebook, gave a keynote about analyzing FB data and working toward ranking and filtering news feeds by turning this into a classification problem using a set of properties (localization, relation to actor, and others). Interestingly, Backstrom emphasized that FB is moving toward more structured data, which makes it easier to manage and analyse with the algorithms they are developing. Whether that is a good thing is a separate discussion, especially regarding privacy issues, which were the topic of Abe Hsuan’s talk (clearly, this holds not only for FB but for the web in general). According to Hsuan, “Privacy cannot exist on a lawless Semantic Web”. It gave rise to several after-talk discussions among the attendees, and the last word on how to deal with all this has not been said yet. In this context, the interested reader may want to have a look at episode 3 of The virtual revolution documentary about non-free services on the Web, the TED talk on The filter bubble, or the less recent Database nation book.

Andraz Tori, CTO of Zemanta, gave a keynote describing some background of the ‘writing help’ that WordPress has recently started to offer, including how they try to avoid wrong usage of it and how they clean up the data. As you may have guessed, I have not used that feature yet when writing my blog posts (and do not see the need for it from my perspective). Prasad Kantamneni from Yahoo! gave an interactive keynote on HCI applied to the effects of different web interfaces for their search engines, and the consequences for revenue, which was lively and interesting. Seemingly ‘silly little things’ like putting the keyword in boldface in the search results make a big difference in how a user scans through the results (more efficiently), as does auto-completion, which in the end makes you read more of the results page.

Last, but most certainly not least, Chris Welty gave the conference dinner keynote, which was entertaining. He described some hurdles they had overcome in building ‘Watson’, a sophisticated question answering engine that finds answers to trivia/general knowledge questions for the Jeopardy! game and that, in the end, consistently outperformed the human experts at it. The talk was filled with amusing mistakes they encountered during the development of Watson, and what it took to fix them. The key message was that one cannot go in a linear fashion from natural language to knowledge management, but one has to use an integration of various technologies to make a successful ‘intelligent’ tool.

Sessions and other things

Normally I write a dense section on the papers presented in the sessions here, but due to the very busy conference schedule and the shortage of free online papers before the conference, I did not get around to reading all the papers that I would have liked to (and I don’t cite papers I have not read, still roughly following my approach to conference blogging). The paper on removing redundancy in ontologies presented by Jens Wissmann [1] was quite interesting, in particular for its creative reuse of computing justifications to remove ‘redundant’ axioms, i.e., those that can be derived anyway from other knowledge represented in the ontology. This was computationally costly, so they also developed another algorithm with better performance; details and experimental results can be found in the paper. My own paper [2] on the experiment of using foundational ontologies in ontology engineering was well received and generated quite some interest, such as in the quality of the foundational ontologies themselves and in how the results presented could translate to the attendees’ own domain ontology scenarios. I may add something on epistemic queries, computing generalizations, matching 4K ontologies in one year, and cross-lingual ontology mappings (provided I find the time to do so in the upcoming days).
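To make the notion of a ‘redundant’ axiom concrete, here is a minimal sketch for the simplest possible case, assuming a toy ontology that contains only atomic subsumptions A ⊑ B. It does not compute justifications as in the paper; it swaps in a plain entailment check (reachability in the subsumption graph) and drops an axiom only if the axioms kept so far still entail it. The class names are made up for illustration.

```python
def entails(axioms, sub, sup):
    """True if sub ⊑ sup follows from atomic subsumptions (sub, sup) by transitivity."""
    seen, frontier = {sub}, [sub]
    while frontier:
        current = frontier.pop()
        if current == sup:
            return True
        for a, b in axioms:
            if a == current and b not in seen:
                seen.add(b)
                frontier.append(b)
    return False


def remove_redundancy(axioms):
    """Greedily drop axioms that the other *kept* axioms still entail.

    Checking against the kept axioms (not all originals) ensures an axiom is never
    removed on the strength of another axiom that has already been removed."""
    kept = list(axioms)
    for axiom in list(axioms):
        rest = [other for other in kept if other != axiom]
        if entails(rest, *axiom):
            kept = rest
    return kept


toy = [
    ("Sausage", "FoodProduct"),
    ("FoodProduct", "EnvironmentalMaterial"),
    ("Sausage", "EnvironmentalMaterial"),  # derivable from the two axioms above
]
print(remove_redundancy(toy))
# [('Sausage', 'FoodProduct'), ('FoodProduct', 'EnvironmentalMaterial')]
```

The greedy order matters: with cyclic axioms, two axioms can each be derivable via the other, and removing both would lose knowledge.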

The panel session about e- and open- Government was a bit meager and can be summarized as: Linked Open Data (LOD) is good and catching on well but the integration problems still exist, and we need (at least) structured controlled vocabularies to fix it.

I will close with an announcement that Alexander Garcia-Castro brought to my attention: there will be an “Ontologies come of Age in the Semantic Web” workshop co-located with ISWC’11.

References

### Notes on SAKT’10: Bolzano colloquium on logic for temporal databases

On 16 and 17 December, the KRDB Research Centre at FUB hosted the successful SAKT’10 Symposium in Advances in KRDB Technologies. Organized by Alessandro Artale from the KRDB, it had logics for temporal databases as its special theme. As the title suggests, the presentations and discussions offered various approaches to the interaction between temporal logics and temporal databases, which I’ll try to summarise here from the notes I took. Note/caution: there are no proceedings to cross-check or from which to take some of the formal apparatus to illustrate some aspects more precisely, and my note-taking had its ups and downs; so, if you want to know more about the topics, then take a look at the attendees’ respective homepages and publications.

Database-oriented talks

David Toman from the University of Waterloo gave two talks, one about querying temporal databases with temporal SQL and one about data streams and temporal databases. A stream database can be seen as an append-only temporal database for which holistic/bounded synopses (views) of the glut of past data have to be made in order to manage such an ever-growing database effectively; this can be done in a policy-driven or a query-driven way, the latter being a variant of data expiration.
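As an illustration of query-driven expiration, the following minimal sketch assumes a single registered query over a numeric stream, ‘the maximum value in the last `window` time units’, with non-decreasing timestamps. The class name and the query are hypothetical, not Toman’s formalism. Only tuples that can still contribute to the answer are kept: a tuple expires when it leaves the window or when a later tuple with a value at least as large arrives, so the synopsis never holds more than the tuples of the current window, and usually far fewer.

```python
from collections import deque


class SlidingMaxSynopsis:
    """Bounded synopsis for one registered query: max value over the last `window` units."""

    def __init__(self, window):
        self.window = window
        self.buf = deque()  # (timestamp, value) pairs with strictly decreasing values

    def append(self, ts, value):
        # Append-only stream: timestamps are assumed to be non-decreasing.
        # Older tuples with a value <= the new one can never be the answer again.
        while self.buf and self.buf[-1][1] <= value:
            self.buf.pop()
        self.buf.append((ts, value))
        self._expire(ts)

    def _expire(self, now):
        # Tuples with ts <= now - window have fallen out of the window.
        while self.buf and self.buf[0][0] <= now - self.window:
            self.buf.popleft()

    def answer(self):
        return self.buf[0][1] if self.buf else None


synopsis = SlidingMaxSynopsis(window=10)
for ts, value in [(1, 5), (3, 7), (4, 2), (12, 1)]:
    synopsis.append(ts, value)
print(synopsis.answer())  # 7
```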

Carlo Combi from the University of Verona and Pietro Sala from the University of Udine gave a joint presentation about temporal functional dependencies in databases, with some rather challenging examples taken from the medical domain. A DBMS has to deal somehow with fixed intervals as well as moving windows, such as querying whether a patient indeed received the right doses at the stipulated times in her chemotherapy treatment (i.e., constraints about the past) and ensuring that constraints about treatments that ought to hold in the future are satisfied, such as (roughly) “a patient should be administered the same quantity of medicine x every two weeks from the start of the treatment”; both a point-based and an interval-based approach were presented. How this is to be represented in a temporal conceptual model is another topic.
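To make the chemotherapy example a bit more tangible, here is a minimal sketch of a point-based reading of the constraint, assuming a hypothetical record format that maps each administration date to the quantity given. It is not the formalism presented in the talk; an interval-based variant would, e.g., allow a tolerance window around each due date instead of an exact day.

```python
from datetime import date, timedelta


def same_dose_every_two_weeks(administrations, start, until):
    """Point-based check of 'the same quantity of medicine x every two weeks from the start'.

    administrations: dict mapping date -> quantity given that day (hypothetical format).
    Returns (ok, violations), where violations are due dates with a missing or different dose.
    """
    if start not in administrations:
        return False, [start]
    expected = administrations[start]
    violations = []
    due = start
    while due <= until:
        if administrations.get(due) != expected:
            violations.append(due)
        due += timedelta(weeks=2)
    return not violations, violations


start = date(2010, 1, 4)
record = {start: 50, start + timedelta(weeks=2): 50, start + timedelta(weeks=4): 40}
print(same_dose_every_two_weeks(record, start, until=start + timedelta(weeks=4)))
# (False, [datetime.date(2010, 2, 1)])
```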

Jef Wijsen from Mons University considered temporal patterns (word problems) motivated by certain query answering and pondered what constitutes a temporal conjunctive query. Mark Reynolds’ work at the University of Western Australia was about reasoning with time stamps and metric temporal logic over the real numbers. Paolo Terenziani from the University of Torino reported on his ongoing work towards a unified data model for temporal databases that takes into account both valid time and transaction time, the atelic/telic distinction (roughly: point-based and interval-based), temporal determinacy/indeterminacy, and some other issues (“now”, granularities).
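For readers less familiar with the distinction between valid time and transaction time, here is a minimal, generic bitemporal sketch; it is not Terenziani’s unified model and ignores telicity, indeterminacy, and granularities. Each fact carries when it holds in the modelled world and when the database recorded it, and `date.max` is a crude stand-in for an open-ended “now”. The fact and dates are hypothetical.

```python
from dataclasses import dataclass
from datetime import date

UNTIL_CHANGED = date.max  # crude stand-in for an open-ended "now"


@dataclass(frozen=True)
class BitemporalFact:
    fact: str
    valid_from: date     # valid time: when the fact holds in the modelled world
    valid_to: date       # exclusive
    recorded_from: date  # transaction time: when the database started to store it
    recorded_to: date = UNTIL_CHANGED  # exclusive


def holds(facts, fact, at_valid, as_of):
    """Did the database, as it stood on `as_of`, say that `fact` held on `at_valid`?"""
    return any(
        f.fact == fact
        and f.valid_from <= at_valid < f.valid_to
        and f.recorded_from <= as_of < f.recorded_to
        for f in facts
    )


history = [
    # Recorded on 10 Jan: Ann is a manager from 1 Jan onwards ...
    BitemporalFact("Ann is manager", date(2010, 1, 1), UNTIL_CHANGED,
                   date(2010, 1, 10), date(2010, 6, 1)),
    # ... corrected on 1 Jun: she was a manager only until 1 Mar.
    BitemporalFact("Ann is manager", date(2010, 1, 1), date(2010, 3, 1), date(2010, 6, 1)),
]
print(holds(history, "Ann is manager", date(2010, 4, 1), as_of=date(2010, 5, 1)))  # True
print(holds(history, "Ann is manager", date(2010, 4, 1), as_of=date(2010, 7, 1)))  # False
```

The correction does not overwrite the old belief; it closes its transaction time, so one can still ask what the database claimed before 1 June.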

Logic talks

Roughly in-between ‘just the logic’ and ‘just databases’ was Alessandro Artale’s presentation about temporal conceptual data modelling with simplified ER and UML Class Diagrams extended with a few temporal operators, namely those that are in TDL-Lite. Vlad Ryzhikov from KRDB presented Marco Gario’s project work on implementing a reasoner over temporal ER, which was realized through a chain of transformations: from temporal ER to TDL-Lite, to TFOL over Z, to LTL over Z, to LTL over N and CTL, the latter two handled with the NuSMV tool. It worked with bounded model checking, but symbolic model checking ran into scalability problems even for very small temporal ER models, so there is room for improvement. There was some immediate feedback from Viktor Schuppan from Fondazione Bruno Kessler, where NuSMV was developed, who in his own presentation compared LTL satisfiability solvers. There are several such solvers that use different techniques, and it appeared that different tools are better at solving different problems, but no single one scores best throughout. Michel Ludwig from the University of Liverpool presented his PhD thesis work on TSPASS, a monodic temporal logic prover that builds upon SPASS 3.0.
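NuSMV encodes bounded model checking as a SAT problem; the sketch below instead enumerates paths explicitly, just to show what is being searched for. Assuming a small, hypothetical transition system, a counterexample to the LTL property ‘F p’ (eventually p) is an infinite path that never visits a p-state, which on a finite system can be written as a lasso of bounded length. Not finding such a lasso up to bound k does not prove the property, which is one reason why one also wants the complete, symbolic approach.

```python
def bmc_eventually(initial, successors, p, k):
    """Bounded search for a counterexample to the LTL property 'F p' (eventually p).

    A violating infinite path is written as a lasso: non-p states s0..sj (j <= k)
    where sj has a successor already on the path. Returns that lasso (ending in the
    repeated state) or None if none exists within bound k. Paths into dead-end states
    are ignored, since LTL is evaluated over infinite paths.
    """
    def extend(path):
        for nxt in successors.get(path[-1], ()):
            if nxt in path:  # loop closes; every state on the path is a non-p state
                return path + [nxt]
            if len(path) <= k and not p(nxt):
                lasso = extend(path + [nxt])
                if lasso:
                    return lasso
        return None

    for s in initial:
        if not p(s):
            lasso = extend([s])
            if lasso:
                return lasso
    return None


def is_done(state):
    return state == "done"


# Hypothetical three-state system: can a job stay unfinished forever?
succ = {"idle": ["busy"], "busy": ["idle", "done"], "done": ["done"]}
print(bmc_eventually(["idle"], succ, is_done, k=0))  # None
print(bmc_eventually(["idle"], succ, is_done, k=1))  # ['idle', 'busy', 'idle']
```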

Roman Kontchakov from Birkbeck College gave a brief overview of how to develop decidable temporal languages, complemented by his co-presenter, Vlad, who talked about TDL-Lite over the natural numbers. Being logicians, they drew up their own list of what such a decidable temporal language ‘needs’; it would have been nicer if they had actually developed a decidable language that demonstrably contains what is needed from a modelling perspective, such as handling essential and immutable parts and relation migration, or demonstrating that one can represent some Relation Ontology relation with a time component, such as the transformation-of relation.

Ian Pratt-Hartmann from the University of Manchester focused on the trials and tribulations of interval temporal logics, where he argued that one should not go the way of restricting the Allen relations but instead restrict quantification to the event-guarded case so as to maintain decidability of the language.
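For readers less familiar with the Allen relations: between two intervals with known endpoints, exactly one of thirteen relations holds. The sketch below only classifies concrete intervals, given as (start, end) pairs with start < end; it says nothing about the interval temporal logics, and their decidability, that the talk was about.

```python
def allen_relation(a, b):
    """Return the Allen relation between intervals a = (start, end) and b, with start < end."""
    (s1, e1), (s2, e2) = a, b
    if e1 < s2:
        return "before"
    if e1 == s2:
        return "meets"
    # From here on, a ends after b starts.
    if s1 == s2 and e1 == e2:
        return "equals"
    if s1 == s2:
        return "starts" if e1 < e2 else "started-by"
    if e1 == e2:
        return "finishes" if s1 > s2 else "finished-by"
    if s1 < s2:
        return "overlaps" if e1 < e2 else "contains"
    # s1 > s2: a starts inside or after b.
    if e2 < s1:
        return "after"
    if e2 == s1:
        return "met-by"
    return "during" if e1 < e2 else "overlapped-by"


print(allen_relation((1, 5), (3, 8)))  # overlaps
print(allen_relation((3, 4), (1, 8)))  # during
```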

Discussion sessions

Aside from the questions and discussions with each presentation there were two dedicated discussion sessions, one introduced by Vlad on the temporal ER reasoning and chaired by Alessandro Artale, and the second one introduced and moderated by Franz Baader from Dresden University. In addition to the majority of KRDB members and the aforementioned presenters, the other invited attendees—Carsten Lutz from Bremen University, Angelo Montanari from the University of Udine, and Frank Wolter and Boris Konev from the University of Liverpool—also participated.

The first discussion session had as topic the interaction between logic and temporal databases, or: we have all those fancy temporal operators in the logic languages, but how can/do/should/may they translate to the database setting? Take, for instance, an evolution constraint “each employee eventually will be promoted to become a manager”, which seems to ask for the notion of possible/potential satisfiability. Or, when the future branches, how can there be (certain) query answering that, given a particular state of the database, checks whether all branches (possible worlds) lead to some state x, or at least some of them do, versus whether such a state cannot be reached anymore in any possible future state of the database? Can one view the main temporal operators in the logics as database updates in DBMSs? Should one use the CWA for reasoning/querying about the past and the OWA for the future? Also, when we represent something in our ontology and it is a realist ontology, then there is only a single path in the past, but we may revise our understanding and representation of what actually happened; what if that new branch in the past leads to an inconsistency in the system, and how can we keep track of such things (including how valid time and transaction time, whilst being distinct, still interact in data management)? Unsurprisingly, the discussion was officially closed only after its scheduled end, and it continued informally afterwards.
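The ‘all branches versus some branches’ question corresponds to the CTL operators AF and EF. Here is a minimal sketch, assuming the possible future database states form a small finite transition system in which every state has at least one successor; the career states for the promotion example are hypothetical. A state outside EF is one from which the target ‘cannot be reached anymore’.

```python
def _least_fixpoint(states, successors, target, quantifier):
    """Grow the target set backwards until nothing changes (least fixpoint)."""
    reached = set(target)
    changed = True
    while changed:
        changed = False
        for s in states:
            if s not in reached and quantifier(t in reached for t in successors[s]):
                reached.add(s)
                changed = True
    return reached


def ef(states, successors, target):
    """States from which SOME future branch eventually reaches a target state."""
    return _least_fixpoint(states, successors, target, any)


def af(states, successors, target):
    """States from which EVERY future branch eventually reaches a target state.
    Assumes no dead ends: all() over an empty successor list would be vacuously True."""
    return _least_fixpoint(states, successors, target, all)


career = {
    "clerk": ["senior_clerk", "left_company"],
    "senior_clerk": ["senior_clerk", "manager"],
    "manager": ["manager"],
    "left_company": ["left_company"],
}
print(sorted(ef(career, career, {"manager"})))  # ['clerk', 'manager', 'senior_clerk']
print(sorted(af(career, career, {"manager"})))  # ['manager']
```

So the clerk’s promotion is possible but not certain, while someone who has left the company can no longer be promoted.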

The discussion session on the second day looked more at applications of temporal logics, such as time-stamped fact bases (ABoxes, if you wish) and how they may interact with (a)temporal TBoxes: whether DL knowledge bases may be suitable for this, or whether it will be part of a larger system with some temporal pre-processing before the data is entered into the fact base. For the biomed-oriented reader: Franz Baader motivated it with an interesting medical informatics example about the management of hospital data in conjunction with knowledge represented in SNOMED CT, which involves managing quantitative and qualitative patient data about, e.g., hypertension, patient histories, and projections into the future about the likely development of a particular disease, and handling rules, such as (simplified here) “if the measurements of properties x, y, and z are above 1, 2, and 3 at least three times in the past hour in patient1, then classify patient1 as having disease A”.
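The simplified rule can be read in more than one way; the minimal sketch below takes one reading (count the time points in the past hour at which all three measurements exceed their thresholds) and assumes a hypothetical format of time-stamped readings per patient. It is the kind of temporal pre-processing that might happen before a classification is asserted into the fact base, not a proposal from the discussion itself.

```python
from datetime import datetime, timedelta

THRESHOLDS = {"x": 1.0, "y": 2.0, "z": 3.0}  # the simplified thresholds from the rule


def classify_disease_a(measurements, now, min_hits=3, window=timedelta(hours=1)):
    """measurements: list of (timestamp, {"x": .., "y": .., "z": ..}) for one patient.

    Counts time points in (now - window, now] at which all three measurements
    exceed their thresholds; a missing measurement never counts as exceeding."""
    hits = sum(
        1
        for ts, values in measurements
        if now - window < ts <= now
        and all(values.get(name, float("-inf")) > limit for name, limit in THRESHOLDS.items())
    )
    return hits >= min_hits


now = datetime(2010, 12, 17, 12, 0)
high = {"x": 1.5, "y": 2.5, "z": 3.5}
readings = [
    (datetime(2010, 12, 17, 10, 30), high),  # outside the past hour
    (datetime(2010, 12, 17, 11, 10), high),
    (datetime(2010, 12, 17, 11, 30), high),
    (datetime(2010, 12, 17, 11, 50), {"x": 1.5, "y": 1.0, "z": 4.0}),  # y too low
]
print(classify_disease_a(readings, now))  # False: only two qualifying time points
```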

Overall, it was a stimulating mix of talks and a good ambience for cross-fertilization of topics, problems, and solutions. There is still plenty of research to be done.

### Recap of the sixth workshop on Fact-Oriented Modelling: ORM’10

The sixth workshop on Fact-Oriented/Object-Role Modelling (ORM’10) in Hersonissou, Crete, Greece, co-located with the OTM conference, just came to a close after a long session on metamodelling to achieve a standard exchange format for the different ORM tools that are in use and under development (such as NORMA, DocTool, and CaseTalk). The other sessions during these three days were filled with paper presentations and several tool demos, reflecting not only the mixed audience from academia and industry, but also the versatility of fact-oriented modelling. I will illustrate some of that in the remainder of the post. (Note: ORM is a conceptual data modelling language that has a formal foundation, a graphical notation for drawing the diagrams, and a textual interface that verbalizes the domain knowledge so as to facilitate communication with, and validation by, the domain experts.)

An overview of a novel mapping of ORM2 to DatalogLB was presented by Terry Halpin from LogicBlox and INTI International University [1]. The choice of such a mapping was motivated by Datalog’s support for rules, so as to also have a formal foundation and an implemented solution for the (derivation) rules one can define in an ORM conceptual data model in the NORMA tool.
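The appeal of Datalog here is that a recursive derivation rule, e.g., a derived fact type ancestorOf defined in terms of a base fact type parentOf, has a direct, well-understood bottom-up semantics. A minimal sketch of semi-naive evaluation in plain Python follows; it is neither DatalogLB syntax nor how LogicBlox evaluates rules, and the fact types and names are hypothetical.

```python
def ancestor_of(parent_of):
    """Semi-naive bottom-up evaluation of the recursive Datalog rules
        ancestorOf(X, Y) <- parentOf(X, Y).
        ancestorOf(X, Z) <- parentOf(X, Y), ancestorOf(Y, Z).
    parent_of is a set of (X, Y) pairs; returns the set of derived ancestorOf facts.
    """
    derived = set(parent_of)
    delta = set(parent_of)  # facts that are new since the previous round
    while delta:
        # Only join with the new facts: older combinations were already tried.
        new = {(x, z) for (x, y) in parent_of for (y2, z) in delta if y == y2}
        delta = new - derived
        derived |= delta
    return derived


parents = {("ann", "bob"), ("bob", "cy"), ("cy", "dee")}
print(sorted(ancestor_of(parents)))
```

Evaluation stops once a round derives no new facts, which is guaranteed because only finitely many pairs can be formed from the constants in the data.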

Staying with formalisms (but of a different kind and scope), Fazat Nur Azizah from the Bandung Institute of Technology proposed a grammar to specify modelling patterns so that actual patterns can be reused for different conceptual data models, akin to software design patterns, but for the FCO-IM flavour of fact-oriented conceptual data modelling [2].

At the other end of the spectrum were two papers that proposed and assessed the use and benefits of ORM in the setting of understanding natural language text documents. Ron McFadyen from the University of Winnipeg introduced document literacy and ORM [3]. Peter Bollen from Maastricht University showed how ORM can improve the completeness and maintenance of specifications like the Business Process Model and Notation [4], which is analogous to the WSML-documentation-in-ORM [5], thereby strengthening the case that one can indeed be both more precise and more communicative with one’s specification if it is accompanied by a representation in ORM.

There was a session on Master Data Management (MDM), with presentations by Baba Piprani from MetaGlobal Systems and Patricia Schiefelbein from Boston Scientific. However, I got a bit sidetracked when Baba Piprani cited an interesting quote called the “Helsinki principle”, being

Any meaningful exchange of utterances depends upon the prior existence of an agreed set of semantic and syntactic rules. The recipients of the utterances must use only these rules to interpret the received utterances, if it is to mean the same as that which was meant by the utterer. (ISO TR9007)

whereas I associate the term “Helsinki principle” with a wholly different story, being the right to self-determination described in the Helsinki Accords on security and cooperation in Europe. Be that as it may, proper MDM does contribute to solving semantic mismatches.

Finally, there was a session on extensions. Tony Morgan from INTI International University had a go at folding and zooming [6], presenting an alternative approach to abstraction for large ORM diagrams (that is, alternative to [7,8] and the many other proposals outside ORM); it introduced new notation, the code-folding idea known from source code editors but then applied to ORM diagrams, and a lightweight algorithm. Yan Tang from STARLab at the Free University of Brussels elaborated on the interaction between semantic decision tables and DOGMA [9] (DOGMA is an approach and tool that reuses ORM notation for ontology engineering). Last, but not least, I presented the paper by Alessandro Artale and myself about the basic constraints for relation migration [10], about which I wrote in an earlier blog post.

To wrap up: the workgroup on the common exchange format for fact-oriented modelling tools, chaired by Serge Valera from the European Space Agency, will continue its work toward standardization; the slides of the presentations will be made available on the ORM Foundation website in the coming days; and otherwise it is onward to the 7th ORM workshop next year, somewhere in the Mediterranean.

### From the Description Logics Workshop 2010, Waterloo

The 23rd International Workshop on Description Logics was held from 4-7 May at the University of Waterloo, in Canada. The full proceedings are online as one large pdf and as individual files for each paper, comprising the papers of the 29 oral presentations (including mine) and 14 posters. Unsurprisingly, the following brief report covers only a selection of the very latest research outcomes in the DL arena that were presented over the past 3 days.

Keynotes

Ian Horrocks’ keynote was about his quest for the “holy grail” and the lessons learned along the way. He started his research with the problems of the GRAIL language and the too slow classification of the GALEN terminology. With much persistence and the desire to solve the problems, his FaCT reasoner eventually brought the classification of GALEN core down from 24 hours to 400 seconds. The next steps were to extend the language and introduce optimizations to improve the performance (whereby careful study of typical inputs was crucial for successful optimization), in an ongoing virtuous spiral. Moving on in the time line, the Semantic Web is, according to Horrocks, something of a “grand challenge” and “killer app” for DLs. Closing the presentation: OWL 2 DL finally contains all the features that GRAIL has (in particular role chaining), but the reasoners were still unable to classify GALEN (until Kazakov’s recent approach with consequence-driven reasoning reduced it to < 10 seconds). So, while most of the papers that Horrocks wrote are not particularly written for (nor, according to them, particularly readable by) bio- and biomedical ontologists, they might find it nice to know that the base motivation comes from trying to solve the problems they brought in.
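Consequence-driven reasoning saturates a set of derived subsumptions instead of building models. A minimal sketch in that spirit uses the well-known completion rules for the lightweight DL EL on normalised axioms; it is not Kazakov’s calculus, it omits role chains (which GALEN needs), and the tuple encoding and the medical class names are hypothetical, not taken from GALEN.

```python
def classify_el(names, axioms):
    """Completion-rule classification for normalised EL axioms (no role chains).

    Axioms are tuples of one of four hypothetical shapes:
      ("sub", A, B)          A ⊑ B
      ("and", A1, A2, B)     A1 ⊓ A2 ⊑ B
      ("some_rhs", A, r, B)  A ⊑ ∃r.B
      ("some_lhs", r, A, B)  ∃r.A ⊑ B
    All class names, including fillers, must be in `names`.
    Returns S with S[A] = set of named subsumers of A (plus "TOP").
    """
    S = {a: {a, "TOP"} for a in names}
    R = {}  # role -> set of (A, B) pairs meaning A ⊑ ∃r.B has been derived
    changed = True
    while changed:
        changed = False
        for a in names:
            for ax in axioms:
                new = set()
                if ax[0] == "sub" and ax[1] in S[a]:
                    new = {ax[2]}
                elif ax[0] == "and" and ax[1] in S[a] and ax[2] in S[a]:
                    new = {ax[3]}
                elif ax[0] == "some_rhs" and ax[1] in S[a]:
                    pairs = R.setdefault(ax[2], set())
                    if (a, ax[3]) not in pairs:
                        pairs.add((a, ax[3]))
                        changed = True
                elif ax[0] == "some_lhs":
                    _, role, filler, sup = ax
                    if any(x == a and filler in S[b] for x, b in R.get(role, ())):
                        new = {sup}
                if not new <= S[a]:
                    S[a] |= new
                    changed = True
    return S


names = ["Appendicitis", "Inflammation", "Appendix", "IntestinePart",
         "LocatedInIntestine", "IntestinalInflammation"]
axioms = [
    ("sub", "Appendicitis", "Inflammation"),
    ("some_rhs", "Appendicitis", "hasLocation", "Appendix"),
    ("sub", "Appendix", "IntestinePart"),
    ("some_lhs", "hasLocation", "IntestinePart", "LocatedInIntestine"),
    ("and", "Inflammation", "LocatedInIntestine", "IntestinalInflammation"),
]
print(sorted(classify_el(names, axioms)["Appendicitis"]))
# ['Appendicitis', 'Inflammation', 'IntestinalInflammation', 'LocatedInIntestine', 'TOP']
```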

The keynote by Phokion Kolaitis was purely database-oriented and focused on schema mappings in the context of database integration (comprising the data federation and translation approaches) and schema evolution, a line of research originally motivated by the experiences obtained with the CLIO project. During the talk, the emphasis was on the composition and inverse operators and, for the former, on the consequences of chaining different kinds of mappings (e.g., GAV + GAV, GAV + GLAV).
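As a small illustration of the GAV + GAV case: composing two GAV mappings yields again a GAV mapping, obtained by unfolding the intermediate relation with its definition, whereas composing GLAV mappings in general needs a more expressive language (second-order tgds). The sketch below assumes hypothetical schemas R, S, T and represents each mapping as a function from a database instance (a dict of sets of tuples) to an instance.

```python
def m12(source):
    """GAV mapping M12: S(x, y) <- R(x, z), R(z, y)."""
    R = source["R"]
    return {"S": {(x, y) for (x, z) in R for (z2, y) in R if z == z2}}


def m23(middle):
    """GAV mapping M23: T(x) <- S(x, x)."""
    return {"T": {(x,) for (x, y) in middle["S"] if x == y}}


def m13(source):
    """Composition of M12 and M23, obtained by unfolding S(x, x) with M12's definition:
    T(x) <- R(x, z), R(z, x). The result is again a GAV mapping."""
    R = source["R"]
    return {"T": {(x,) for (x, z) in R for (z2, x2) in R if z == z2 and x == x2}}


instance = {"R": {(1, 2), (2, 1), (2, 3)}}
print(m23(m12(instance)) == m13(instance))  # True: both yield {(1,), (2,)}
```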
Unfortunately, I missed the keynote by Roberto Sebastiani due to the fuzzy notion of “nearby within walking distance” between the accommodation and the conference venue on the rather large and spacious campus.

Papers

The papers were grouped into sessions about theory, extensions, ontology, reasoning, EL, systems, querying, DL-Lite, OWL, and modules.

Extensions included, among others, the complexity of temporal description logics in relation to temporal conceptual modelling and tractable reasoning (i.e., temporal extensions to the DL-Lite family, which is the basis for the OWL 2 QL profile) [1], presented by Alessandro Artale. Other extensions, such as fuzzy, rough, and probabilistic ones, came up in other sessions. For instance, using a probabilistic DL (that is, the option to represent defaults) for repairing TBoxes, presented by Thomas Scharrenbach [2], the approximate least common subsumer [3] by Anni-Yasmin Turhan, and my paper in the ontologies section. My paper was about the feasibility of DL knowledge bases with rough concepts or vague instances [4] (yes, or and not and, because there are both theoretical and practical limitations to having rough DL knowledge bases in their full glory, even when we take into account only the basic aspects of rough sets). The upside is that several research lines on DL languages and tools on the interaction between ontologies and data (and the interest shown by reasoner developers, such as Volker Haarslev of RacerPro, in the experimentation), as well as other avenues, such as semantic scientific workflows, will be very useful to improve the situation, so that the combination of ontologies and data can be used better for hypothesis testing to advance science at a faster pace.
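For readers unfamiliar with rough sets, here is a minimal sketch of the Pawlak lower and upper approximations that a rough concept is built from, assuming made-up patient data. It shows why rough concepts are tied to the data: the approximations depend entirely on the instances and on the attributes chosen to tell them apart.

```python
from collections import defaultdict


def rough_approximations(objects, attributes, target):
    """Pawlak lower and upper approximations of the set `target` of object ids.

    objects: dict mapping id -> dict of attribute values. Objects with equal values
    on `attributes` are indiscernible and end up in the same equivalence class.
    """
    classes = defaultdict(set)
    for obj, values in objects.items():
        classes[tuple(values[a] for a in attributes)].add(obj)
    lower, upper = set(), set()
    for eq_class in classes.values():
        if eq_class <= target:  # certainly in the target concept
            lower |= eq_class
        if eq_class & target:   # possibly in the target concept
            upper |= eq_class
    return lower, upper


patients = {
    "p1": {"fever": True, "cough": True},
    "p2": {"fever": True, "cough": True},
    "p3": {"fever": False, "cough": False},
    "p4": {"fever": True, "cough": False},
}
has_flu = {"p1", "p4"}  # hypothetical ground truth
lower, upper = rough_approximations(patients, ["fever", "cough"], has_flu)
print(sorted(lower), sorted(upper))  # ['p4'] ['p1', 'p2', 'p4']
```

The boundary region, upper minus lower ({p1, p2} here), is exactly where the chosen attributes cannot decide membership.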

Mariano Rodriguez presented a new case study of Ontology-Based Data Access in industry [5], which considers additional features of the system, such as dealing with incompleteness of the data and integrity constraints, and addressing performance issues by assessing the query structure better. Performance optimization was also a motivation for the query answering for expressive DLs by creating “islands” in the ABox [6] presented by Ralf Moeller, and for developing a scalable reasoner for OWL 2 EL and RL using Java and database technologies (MySQL), called OREL [7], presented by Sebastian Rudolph.

Two papers dealt with (ultimately) helping the modeller figure out why an ontology is inconsistent. One paper dealt with the complexity of pinpointing (which is not great, as many a modeller who used Protégé 4.0-alpha will have noticed) in the tractable DL-Lite [8], presented by Rafael Peñaloza, and the other one, presented by Matthew Horridge, was about masking the “irrelevant” parts of a justification so as to keep the explanation as short as possible [9]. Another requested feature is dealing with updates of the ontology, for which several strategies are possible; one such approach for DL-Lite ontologies [10] was presented by Dmitriy Zheleznyakov. Modularization and the extraction of sections of an ontology are also well-known requests, and Chiara del Vescovo and Thomas Schneider jointly presented an empirical study of how well the algorithms work: fully automated modularization does not look good from a practical perspective, and computing only some modules will be more feasible [11]. This is still fine, I think, because full modularization is generally not what modellers are after anyway; they would only want to have one or a few subsections extracted from the larger ontology. (In addition, one could use granularity to modularise a large ontology, instead of relying solely on the syntactic features of the ontology.)
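To see why pinpointing gets expensive, here is a minimal brute-force sketch that computes all justifications (minimal entailing subsets of axioms) for a subsumption in a toy TBox of atomic subsumptions only; the axioms are hypothetical. It tries exponentially many subsets, which is precisely what the algorithms discussed above try to avoid; explaining an inconsistency works analogously, with ⊥ as the superclass.

```python
from itertools import combinations


def entails(axioms, sub, sup):
    """True if sub ⊑ sup follows from atomic subsumptions (sub, sup) by transitivity."""
    seen, frontier = {sub}, [sub]
    while frontier:
        current = frontier.pop()
        if current == sup:
            return True
        for a, b in axioms:
            if a == current and b not in seen:
                seen.add(b)
                frontier.append(b)
    return False


def justifications(axioms, sub, sup):
    """All minimal subsets of `axioms` that entail sub ⊑ sup (brute force, exponential).

    Subsets are tried by increasing size, so an entailing subset that contains no
    earlier justification is itself minimal."""
    found = []
    for size in range(1, len(axioms) + 1):
        for subset in combinations(axioms, size):
            if not any(set(j) <= set(subset) for j in found) and entails(subset, sub, sup):
                found.append(subset)
    return found


tbox = [("A", "B"), ("B", "D"), ("A", "C"), ("C", "D"), ("A", "E")]
print(justifications(tbox, "A", "D"))
# [(('A', 'B'), ('B', 'D')), (('A', 'C'), ('C', 'D'))]
```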

That’s it for this year’s DL workshop. DL’11 will be held in Barcelona (colocated with IJCAI’11).

### Progress on the EnvO at the Dagstuhl workshop

Over the course of the 4.5 days packed together in the beautiful and pleasant ambience of Schloss Dagstuhl, the fourth Environment Ontology workshop has been productive, and a properly referenceable paper outlining details and decisions will follow. Here I will limit myself to mentioning some of the outcomes and issues that came up.

Group photo of most of the participants at the EnvO Workshop at Dagstuhl

After presentations by all attendees, a long list of discussion themes was drawn up, which we managed to discuss and agree upon to a large extent. The preliminary notes and keywords are jotted down and put on the EnvO wiki dedicated to the workshop.

Focussing first on the content topics, which took up the lion’s share of the workshop’s time, significant advances have been made in two main areas. First, we have sorted out the Food branch in the ontology, which has been moved, as Food product, under Environmental material and then Anthropogenic environmental material, and the kind and order of the differentiae have been settled, using food source and processing method as the major axes. Second, the Biome branch will be refined in two directions: (i) the ecosystems at different scales, and the removal of the species-centred notion of habitat to better reflect the notion of environment, and (ii) work toward including the aspect of the n-dimensional hypervolume of an environment, i.e., both the conditions/parameters/variables and the characterization of a particular type of environment using such conditions, analogous to the hypervolumes of an ecological niche, so that EnvO can be used better for the annotation and analysis of environmental data. Other content-related topics concerned GPS coordinates, hydrographic features, and the commitment to BFO and the RO for top-level categories and relations. You can browse through the preliminary changes in the envo-edit version of the ontology, which is a working version that changes daily (i.e., not an officially released one).
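To make the hypervolume idea a bit more concrete, here is a minimal sketch under a strong simplifying assumption: an environment type is characterised by one range per environmental variable, i.e., an axis-aligned box, and a sample is an instance of that environment type if its measured values fall within every range. Real niche hypervolumes need not be boxes, and the class, variable names, and numbers are illustrative only, not EnvO definitions.

```python
from dataclasses import dataclass, field


@dataclass
class EnvironmentType:
    """Hypothetical environment type characterised by an axis-aligned box: one
    (low, high) range, inclusive, per environmental variable."""
    name: str
    ranges: dict = field(default_factory=dict)

    def contains(self, sample):
        """True if the sample has an in-range value for every variable of the type."""
        return all(
            var in sample and low <= sample[var] <= high
            for var, (low, high) in self.ranges.items()
        )


# Illustrative numbers only; not EnvO definitions.
saline_lake = EnvironmentType(
    "saline lake water", {"salinity_psu": (50, 350), "temperature_c": (-5, 50)}
)
print(saline_lake.contains({"salinity_psu": 120, "temperature_c": 25, "ph": 8.1}))  # True
print(saline_lake.contains({"salinity_psu": 35, "temperature_c": 18}))  # False
```

Annotating a data point then amounts to checking which environment types’ boxes contain it; extra variables in the sample, such as pH here, are simply ignored.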

There was some discussion (insufficient, I think), and there were recurring comments and suggestions, on how to represent the knowledge in the ontology and, with that, on the ontology language and modelling guidelines. Some favour bare single-inheritance trees for appealing philosophical motivations. The first problematic case, however, was brought forward by David Mark, who had compelling arguments for multiple inheritance with his example of how to represent Wadi, and soon more followed with terms such as Smoked sausage (having as parents the source and the processing method) and many more in the food branch. Some others preferred lattices or a more common knowledge representation language; both are ways to handle more neatly the properties/qualities with respect to the usage of properties and the property inheritance by sub-universals from their parent. Currently, the EnvO is represented in OBO, and the modelling does not follow the KR approach of declaring properties of some universal (/concept/class) and availing of property inheritance, so one ends up having to make multiple trees and then add ‘cross-products’ between the trees. Hence, and using intuitive labels merely for human readability here, Smoked sausage either will have two parents, amounting to (in the end, where the branching started) $\forall x (SmokedSausage(x) \equiv AnimalFoodProduct(x) \land ProcessingMethod(x))$, which is ontologically incorrect because a smoked sausage is not a way of processing, or, if done with a ‘cross-product’ and a new relation ($hasQuality$), the resulting computation will have something like $\forall x (SmokedSausage(x) \equiv Sausage(x) \land \exists y (hasQuality(x,y) \land Smoking(y)))$ instead of having declared directly in the ontology proper, say, $\forall x (SmokedSausage(x) \equiv Sausage(x) \land \exists y (hasProcessingMethod(x,y) \land Smoking(y)))$.
The latter option has the advantages that it makes it easier to add, say, Fermented smoked sausage or Cooked smoked sausage as a sausage that has the two properties of being [fermented/cooked] and being smoked, and that one can avail of automated reasoners to classify the taxonomy. Either way, the details are being worked on. The ontology language, and the choice for one or the other, whichever it may be, ought not to get in the way of developing an ontology; but generally it does, both regarding the underlying commitment that the language adheres to and any implicit or explicit workaround in the modelling stage that to some extent makes up for a language’s limitations.
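To illustrate the classification advantage, here is a minimal sketch, assuming class definitions that are just conjunctions of named classes and existential restrictions with atomic fillers; the relation and class names are illustrative. With such definitions, the new sausages end up under Smoked sausage automatically, without anyone asserting that subclass link. Real OWL reasoners are far more general; this structural check is only adequate for this tiny language, e.g., it knows nothing of a hierarchy among processing methods.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Definition:
    """A class defined as a conjunction of named classes and existential
    restrictions (property, filler), with atomic fillers only."""
    parents: frozenset
    restrictions: frozenset = frozenset()


def subsumes(general, specific):
    """Structural subsumption: `specific` ⊑ `general` iff it has every conjunct of `general`.
    Sound and complete only for this tiny language without further TBox axioms."""
    return general.parents <= specific.parents and general.restrictions <= specific.restrictions


def classify(definitions):
    """For each defined class, list the other defined classes that subsume it."""
    return {
        name: sorted(other for other, d in definitions.items()
                     if other != name and subsumes(d, spec))
        for name, spec in definitions.items()
    }


smoked = ("hasProcessingMethod", "Smoking")
defs = {
    "SmokedSausage": Definition(frozenset({"Sausage"}), frozenset({smoked})),
    "FermentedSmokedSausage": Definition(
        frozenset({"Sausage"}), frozenset({smoked, ("hasProcessingMethod", "Fermenting")})),
    "CookedSmokedSausage": Definition(
        frozenset({"Sausage"}), frozenset({smoked, ("hasProcessingMethod", "Cooking")})),
}
print(classify(defs))
```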

On a lighter note, we had an excursion to Trier together with the cognitive robotics people (from a parallel seminar at Dagstuhl) on Wednesday afternoon. Starting from the UNESCO World Heritage monument Porta Nigra and the nearby birthplace of Karl Marx, we had a guided tour through the city centre with its mixture of architectural styles and rich history, which was all the more pleasant with the spring-like weather. Afterwards, we relaxed at the wine tasting event at a nearby winery, where the owners provided information about the 6 different Rieslings we tried.

Extension to the Aula Palatina (Constantine's Basilica) in Trier

Section of the Porta Nigra, Trier