Posts Tagged ‘Semantic Web’

Release of the (beta version of the) foundational ontology library ROMULUS

With the increase in ontology development and networked ontologies, both good ontology development and ontology matching for ontology linking and integration are becoming more pressing issues. Many contributions have been proposed in these areas. One of the ideas to tackle both (supposedly in one fell swoop) is the use of a foundational ontology. A foundational ontology aims to (i) serve as a building block in ontology development by providing the developer with guidance on how to model the entities in a domain, and (ii) serve as a common top-level when integrating different domain ontologies, so that one can identify which entities are equivalent according to their classification in the foundational ontology. Over the years, several foundational ontologies have been developed, such as DOLCE, BFO, GFO, SUMO, and YAMATO, which have been used in domain ontology development. The problem that has now arisen is how to link domain ontologies that are mapped to different foundational ontologies.

To be able to do this in a structured fashion, the foundational ontologies have to be matched somehow, ideally with some software support. As early as 2003, this issue was already foreseen and the idea of a “WonderWeb Foundational Ontologies Library” (WFOL) was proposed, so that, in the ideal case, different domain ontologies can commit to different but systematically related (modules of) foundational ontologies [1]. However, the WFOL remained just an idea because it was not clear how to align those foundational ontologies and, at the time that deliverable was written, most foundational ontologies were still under active development, OWL was yet to be standardised, and there was scant stable software infrastructure. Within the Semantic Web setting, solving the implementation issues is now within reach yet not realised, and the alignment of the foundational ontologies is still to be carried out systematically (beyond the few partial comparisons in the literature).

We’re trying to solve these theoretical and practical shortcomings through the creation of the first such online library of machine-processable, aligned and merged, foundational ontologies: the Repository of Ontologies for MULtiple USes, ROMULUS. This version contains alignments, mappings, and merged ontologies for DOLCE, BFO, and GFO and some modularized versions thereof, as a start. It also has a section on logical inconsistencies; i.e., entities that were aligned manually and/or automatically and seemed to refer to the same thing (e.g., a mathematical set, a temporal region) but actually turned out not to (at least from a logical viewpoint) due to other ‘interfering’ axioms in the ontologies. What one should do with those is a separate issue, but at least it is now clear where the matching problems really are, down to the nitty-gritty entity level.
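The difference between an alignment that can be asserted as an equivalence and one that runs into ‘interfering’ axioms can be pictured with a toy check. The following is only a minimal sketch under strong assumptions: it handles atomic subclass and disjointness axioms only, the function names are made up for illustration, and the class names and axioms are hypothetical (they are not taken from DOLCE, BFO, or GFO, nor from ROMULUS). A real check would use an OWL reasoner over the merged ontologies.

```python
from collections import defaultdict

def superclasses(cls, subclass_of):
    """Return cls plus every class reachable from it via subclass edges."""
    seen, stack = {cls}, [cls]
    while stack:
        current = stack.pop()
        for parent in subclass_of[current]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen

def equivalence_is_coherent(a, b, subclass_axioms, disjoint_axioms):
    """Check whether adding 'a equivalent to b' keeps the classes satisfiable
    in this toy setting (atomic subclass and disjointness axioms only)."""
    subclass_of = defaultdict(set)
    for child, parent in subclass_axioms:
        subclass_of[child].add(parent)
    # An equivalence is encoded as subclass edges in both directions.
    subclass_of[a].add(b)
    subclass_of[b].add(a)
    ancestors = superclasses(a, subclass_of)
    # If two ancestors are declared disjoint, the class cannot have instances.
    return not any(x in ancestors and y in ancestors for x, y in disjoint_axioms)

# Hypothetical axioms of two ontologies o1 and o2 after merging.
subclass_axioms = [
    ("o1:Set", "o1:Abstract"),
    ("o2:Set", "o2:Concrete"),
    ("o1:Abstract", "top:Entity"),
    ("o2:Concrete", "top:Entity"),
    ("o1:Region", "o1:Abstract"),
    ("o2:Region", "top:Entity"),
]
# Assumed to hold in the merged ontology (hypothetical).
disjoint_axioms = [("o1:Abstract", "o2:Concrete")]

set_ok = equivalence_is_coherent("o1:Set", "o2:Set", subclass_axioms, disjoint_axioms)
region_ok = equivalence_is_coherent("o1:Region", "o2:Region", subclass_axioms, disjoint_axioms)
print(set_ok, region_ok)  # False True
```

In this made-up example the two ‘Set’ classes look alike by name, yet equating them forces one class under two disjoint parents, so the alignment cannot become a mapping; the two ‘Region’ classes can be equated without such a clash.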

We performed a small experiment on the evaluation of the mappings (thanks to participants from DERI, Net2 funds, and Aidan Hogan), and we would like to have more feedback on the alignments and mappings. It is one thing that we, or some alignment tool, aligned two entities, another that asserting an equivalence ends up logically consistent (hence mapped) or inconsistent, and yet another what you, especially the ontology engineers among you, think of the alignments. You can participate in the evaluation: you will get a small set of alignments at a time, and for each one you decide whether you agree, partially agree, or disagree with it, are unsure about it, or skip it if you have no clue.

Finally, ROMULUS also has a range of other features, such as ontology selection, a high-level comparison, browsing the ontology through WebProtégé, a verbalization of the axioms, and metadata. It is the first online library of machine-processable, modularised, aligned, and merged foundational ontologies around. A poster/demo paper [2] was accepted at the Seventh International Conference on Knowledge Capture (K-CAP’13), and papers describing details are submitted and in the pipeline. In the meantime, if you have comments and/or suggestions, feel free to contact Zubeida or me.

References

[1] Masolo, C., Borgo, S., Gangemi, A., Guarino, N., Oltramari, A. Ontology library. WonderWeb Deliverable D18 (ver. 1.0, 31-12-2003). (2003) http://wonderweb.semanticweb.org.

[2] Khan, Z., Keet, C.M. Toward semantic interoperability with aligned foundational ontologies in ROMULUS. Seventh International Conference on Knowledge Capture (K-CAP’13), ACM proceedings. 23-26 June 2013, Banff, Canada. (accepted as poster & demo with short paper)

Some ideas about what the Semantic Web will look like in 2022

Research into realizing a vision of the Semantic Web has been ongoing for a little over 10 years, and a call has gone out to ponder, daydream, fantasize, think wishfully or with fear about “What will the Semantic Web look like 10 years from now?” (SW2022). A selection of the many ideas will be presented on November 11, 2012, at the SW2022 workshop, held in conjunction with the 11th International Semantic Web Conference (ISWC’12) in Boston, USA.

For the curious: all SW2022 papers that will be presented are online on the SW2022 page (scroll down to about half-way on the web page for the programme). I picked out a few that I will summarise and comment on below; my selection is based on topic and/or author(s) and/or curious title, and I am a co-author of one of the papers.

Abraham Bernstein will present the first main paper [1], on the “global brain Semantic Web”, where the Internet is going to serve as the analogue to a brain’s neurons. The ‘global brain’ is used as a metaphor (or revamped old-fashioned AI?) for “distributed interleaved human-machine computation”, or, in fancier, more marketable, terms, now also called “collective intelligence” and “social computing”. In short: put the human in the Semantic Web, both as part of the knowledge provider and as educated user. Bernstein zooms in on the need to be able to manage the “motivational diversity, cognitive diversity, and error diversity” with respect to the possibility of realizing this global brain Semantic Web. Alessandro Oltramari’s vision for a cognitive Semantic Web [2] is quite similar to Bernstein’s one, where the semantic web is tuned to the individual user and “it will be an emergent social network of human and artificial cognitive agents interacting in a hybrid environment, where the distinction between physical and virtual will be superseded by the very nature of the entities populating it, namely knowledge objects and knowledge agents” [2]. Compared to these, our vision of interoperability is somewhat more humble.

Oliver Kutz will present our paper [3] about interoperability among ontologies, to be realized with the Distributed Ontology Language (DOL) that is currently in the process of standardisation at ISO (scheduled to be finalized by 2015). DOL is a metalanguage for distributed ontologies that may be represented in different ontology languages (some of the technical details can be found in a recent paper that won the best paper award at FOIS’12 [4] and a few examples are described in [5]). Overall then, it would be nice if, by 2022, we have solved the interoperability issues not only among data, but also among the ‘models’ (ontologies, service descriptions, etc.) and, especially, their logic-based representation languages. Examples are being able to seamlessly link knowledge that is represented partially in OWL 2 DL and partially in Common Logic, leaving an OBO ontology as it is yet declaring more semantics (e.g., cardinality constraints, property chains) ‘around’ it in a more expressive language for those who need it, and using advanced features for modularization; these are all realistic usage scenarios with DOL. Clearly, all this will need some tool support. Initial tools do exist (Hets for reasoning over heterogeneous ontologies and the Ontohub ontology repository), but more can and will have to be done to realize full interoperability.

The paper on the Semantic Web needs (vision?) for cultural heritage [6] offers nothing I did not already know. South Africa has its own programme in that area (albeit called “indigenous knowledge management”, not “cultural heritage”) and we did our own requirements analysis some time ago already [7, 8]. Our list of requirements matches the one by Vavliakis et al., and we have a technology maturity analysis, a set of OWL requirements, and actual use cases from the domain experts and users of the Department of Science & Technology’s National Recordal System project for indigenous knowledge management (about which I blogged before). That the topics will receive attention also at SW2022 hopefully increases the chance that those requirements will be investigated further, solved, and realized, which, in turn, will improve the software developed here and, ultimately, the people will benefit from it all.

Mutharaju [9] emphasizes the need for connectivity, personalization and abstraction. Regarding the latter, he notes that “There would be a need to provide multiple (and higher) levels of abstractions and facilitate drill-down mechanisms.” Yay! Maybe my work on granularity (among others, [10]) will find its way into implementations after all. Also, Mutharaju thinks that the Semantic Web may be of use for the benefit of the environment (e.g., calculating better traffic flow, using sensor data, etc.).

A short paper scheduled for the panel session is entitled “The rise of the verb” [11], which I found a curious title: verbs are taken into account already, where a verb’s ontological foundation is, in the Semantic Web context, represented as an object property in OWL or reified under, say, DOLCE’s Perdurant. Considering the contents of the paper, a more suitable title could have been “action in the Semantic Web”: the paper’s introduction suggests adding something executable to the Semantic Web by means of JavaScript, but where the instruction is specified at the knowledge level. Heiko Paulheim and Jeff Pan also argue in favour of language extensions, in particular so as to be able to handle imprecision and uncertainty [12].

Vander Sande and co-authors present a rather bleak vision of the Semantic Web [13], in that it could endanger humanity. They spend the full 6 pages on highlighting the myriad of dangers and the possible misuses of Semantic Web technologies. Among others: ‘semantic spam’ instead of the dumb variety we have gotten used to, where spammers take advantage of the Linked Open Data cloud and otherwise linked social network data to make the spam look more believable; polluting the LOD cloud through link spoofing; identity theft and provenance manipulation; and the Web of Things for autonomous computerized weaponry. One also could have added a follow-through of the saying that ‘knowledge is power’, where better and scaled-up knowledge management facilitates obtaining more power (and power corrupts, and absolute power corrupts absolutely). All this, in turn, goes back to the philosophical issues regarding responsibility in research, engineering, and technology and whether some field is inherently bad, neutral, or good, or whether the bad pops up only with some application scenarios where the technologies could possibly be used. For the Semantic Web, I think it is only the latter, but you may try to convince me otherwise.

Although I won’t be attending, it’s appreciated that the papers are online already, and I can imagine there will be some lively discussions at the SW2022 workshop.

References

[1] Abraham Bernstein. The Global Brain Semantic Web – Interleaving Human-Machine Knowledge and Computation. SW2022, Boston, Nov 11, 2012.

[2] Alessandro Oltramari. Enabling the cognitive Semantic Web. SW2022, Boston, Nov 11, 2012.

[3] Oliver Kutz, Christoph Lange, Till Mossakowski, C. Maria Keet, Fabian Neuhaus, Michael Grüninger. The Babel of Semantic Web tongues – in search of the Rosetta Stone of interoperability. SW2022, Boston, Nov 11, 2012.

[4] Till Mossakowski, Christoph Lange, Oliver Kutz. Three Semantics for the Core of the Distributed Ontology Language. In Michael Grüninger (Ed.), FOIS 2012: 7th International Conference on Formal Ontology in Information Systems, Graz, Austria.

[5] Christoph Lange, Till Mossakowski, Oliver Kutz, Christian Galinski, Michael Grüninger, Daniel Couto Vale. The Distributed Ontology Language (DOL): Use Cases, Syntax, and Extensibility, Terminology and Knowledge Engineering Conference (TKE’12). Madrid, Spain.

[6] Konstantinos N. Vavliakis, Georgios Th. Karagiannis and Pericles A. Mitkas. Semantic Web in Cultural heritage after 2020. SW2022, Boston, Nov 11, 2012.

[7] Thomas Fogwill, Ronell Alberts, C. Maria Keet. The potential for use of semantic web technologies in IK management systems. IST-Africa Conference 2012. May 9-11, Dar es Salaam, Tanzania.

[8] Ronell Alberts, Thomas Fogwill, C. Maria Keet. Several Required OWL Features for Indigenous Knowledge Management Systems. 7th Workshop on OWL: Experiences and Directions (OWLED 2012). 27-28 May, Heraklion, Crete, Greece. CEUR-WS Vol-849. 12p.

[9] Raghava Mutharaju. How I would like Semantic Web to be, for my children. SW2022, Boston, Nov 11, 2012.

[10] C. Maria Keet. A formal theory of granularity. PhD Thesis, KRDB Research Centre, Faculty of Computer Science, Free University of Bozen-Bolzano, Italy. 2008.

[11] Paul Groth. The rise of the verb. SW2022, Boston, Nov 11, 2012.

[12] Heiko Paulheim and Jeff Z. Pan. Why the Semantic Web should become more imprecise. SW2022, Boston, Nov 11, 2012.

[13] Miel Vander Sande, Sam Coppens, Davy Van Deursen, Erik Mannens and Rik Van De Walle. The terminator’s origins or how the Semantic Web could endanger humanity. SW2022, Boston, Nov 11, 2012.

A few notes on a successful ESWC’12 and OWLED’12

Slightly later than near-realtime due to flight delays, here are a few notes on the 9th Extended Semantic Web Conference ESWC’12 and OWL: Experiences and Directions OWLED’12, which I attended about two weeks ago in Crete, Greece.

ESWC’12

ESWC’12 was as selective as previous years, with, on average, a 25% acceptance rate. The proceedings are published by Springer; where applicable, I’ve linked the freely available versions in the references below. There’s also metadata and a list of award winners.

Main background picture of the ESWC’12 conference, with Cretan hills

Keynotes

I assume that, like last year, the keynotes will be put on the video lectures website; for now, you’ll have to make do with a brief impression through my lenses.

Alon Halevy, head of structured data at Google, gave his keynote the morning after the social dinner (but the conference hall was full nevertheless). He entertains the perspective of Knowledge Representation and the Semantic Web as being “databases on steroids”. The talk’s topics were Google Fusion Tables with lightweight semantics, intended as “data management for the 99%”, and WebTables, which was about a search for data tables on the Web, with the goal of having an easy-to-use database system that is integrated with the Web. The work on WebTables was akin to a very large-scale attempt at bottom-up lightweight conceptual data model and ontology development. They crawled the Web for raw tables (14 billion), of which an estimated 154 million can pass for real relations (relations from the database viewpoint, with structured data, not an HTML table used for the layout of a page), which then ended up as 2.5 million schemas as recovered table/relation semantics. And then there’s Halevy’s enthusiasm about coffee.

Aleksander Kolcz from Twitter went over a few problems they are trying to solve at Twitter, such as tweet relevance, who to follow, content recommendation, language, anti-spam, and user interest modeling. As a small tidbit of data: there are 140 million users, 340 million tweets/day, and 2.3 billion search queries/day (i.e., about 26K/sec.). Apparently, when one has enough, i.e., very large amounts, of data, simple models work “remarkably well” and ensembles of classifiers perform better in accuracy.

Abraham Bernstein’s keynote was about getting our act together in the semantic web research area and promoting the “garbage can theory” that was introduced by Cohen, March and Olsen in 1972: some ideas, theories, and tools are ‘thrown away’ into the garbage, where they can meet others and combine, so that something beautiful can come of it after all (this is my simplistic, shorthand version of it).

Unfortunately I missed the pre-conference keynote by Julius van der Laar because OWLED was still ongoing. From what I’ve heard, it was a good and interesting one about the (sneaky) social media strategies that the Obama campaign used in the previous presidential elections in 2008.

Papers

There were several tracks that ran in parallel, hence attendance was necessarily limited due to those logistic constraints. I’ve attended the ontologies, reasoning, semantic data management, digital libraries and cultural heritage, and in use sessions. The following pointers are based on my attendance of the presentations and partial reading of the papers.

Ontologies track. Yves Raimond from the BBC presented a query-driven evaluation framework for ontologies, defining their way of ‘good’ with respect to the task and data, and applied it to the music ontology (online slides), noting some room for improvements. The paper also has a neat brief overview of techniques for ontology evaluation [1]. I presented the paper co-authored with Francis Fernandez and Annette Morales on mereotopology and the OntoPartS tool that helps modellers to represent part-whole relations [2], which I introduced in an earlier post. OntoPartS was also presented at the demo session [3], which generated quite some interest among logicians and practitioners alike. Besides my ‘toy ontology’ examples to demonstrate the tool’s functionality, Martin Hepp had brought his GoodRelations ontology for e-commerce, which I thus used instead to illustrate adding part-whole relations to a real ontology. The demo session ended officially at 9pm, but it was after 10pm before I packed up my tablet.

Semantic data management track. Craig Knoblock and co-authors developed a system to link data to ontologies and preserve the linking in a so-called (logic-based) “source model” that is computed semi-automatically by taking as input the data, an ontology, some learned semantic types, and a refinement step by the user in a nice GUI [4]. This was evaluated with a set of bio-informatics resources, such as UniProt. The presentation by Lorena Etcheverry was a bit long on the intro, but the idea is nice: enhancing OLAP analysis with ‘good enough’ temporary cubes generated from web sources, the introduction of a new vocabulary, Open Cubes, for the specification and publication of multidimensional cubes on the Semantic Web (which, unfortunately, the authors still have not shared online), and an algorithm for creating the SPARQL 1.1 query for rollup [5].

In use track. Michel Dumontier demonstrated an extension to the HyQue hypothesis formulator and evaluator, with rule sets written in the SPARQL Inferencing Notation (SPIN) so that users can trace their hypothesis evaluation [6]. Stefan Scheglmann presented a paper on their efforts to provide “programming access” to ontologies, with an accompanying tool, OntoMDE, a model-driven engineering toolkit (which, however, does not seem to be available online, although a link was shown in the presentation, and I jotted down something on Eclipse plugins) [7]. StorySpace was put in the Digital Libraries and cultural heritage track, but could just as well have been in in-use: it is an environment for constructing and navigating stories, plots, and narratives, guided by the newly introduced curate ontology [8]. We’ll have to look at all that in more detail in the context of our IKMS development [9].

OWLED’12

The proceedings of OWLED’12 are available on CEUR-WS. Over 30 papers were submitted, so the workshop ended up being somewhat selective compared to previous years. The programme had 18 paper presentations, a keynote, and two tutorials. The following is, again, a selection of that (mainly due to my time constraints reading the papers and typing up something).

Mariano Rodriguez presented the ontopQuest system [10] for Ontology-Based Data Access, providing SPARQL query answering with OWL 2 QL/RDFS entailments.  It works with the so-called “classic ABox mode” with an internal relational database and in “virtual ABox mode”, and, unlike, say, QuOnto, it embeds most of the TBox semantics into the database by availing of a (also recently developed) semantic indexing technique. (Hopefully that’ll help my ontologies & knowledge bases students to answer the OBDA questions better next time, who ought to have read at least David Toman’s slides on the principal approaches to realize OBDA before the test.) Staying with reasoning, Dmitry Tsarkov presented the idea of using metareasoning that takes into account both the features of current reasoners and modularisation to come up with the ‘best’ reasoning strategy to answer a query over only that part of the ontology that is relevant for the query [11].
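One way to picture what it can mean to embed TBox semantics into the database is to number the classes of a hierarchy so that every class covers a contiguous range of indexes; a query for all instances of a class, including those of its subclasses, then becomes a simple range condition. The sketch below illustrates that general idea only. It is not claimed to be how the system in [10] implements it, it assumes a tree-shaped hierarchy, and all names (`build_index`, `instances_of`, the classes and individuals) are hypothetical.

```python
def build_index(children, root):
    """Number classes depth-first so that each class covers a contiguous interval
    containing its own index and the indexes of all its descendants."""
    intervals = {}
    counter = 0

    def visit(cls):
        nonlocal counter
        start = counter
        counter += 1
        for child in children.get(cls, []):
            visit(child)
        intervals[cls] = (start, counter - 1)

    visit(root)
    return intervals

def instances_of(cls, intervals, abox_rows):
    """Range filter standing in for a SQL 'WHERE idx BETWEEN lo AND hi'."""
    lo, hi = intervals[cls]
    return sorted(name for name, idx in abox_rows if lo <= idx <= hi)

# Hypothetical TBox: a tree-shaped class hierarchy.
children = {
    "Agent": ["Person", "Organisation"],
    "Person": ["Student", "Lecturer"],
}
intervals = build_index(children, "Agent")

# Hypothetical ABox: each individual is stored with the index of its asserted class.
asserted = [("ann", "Student"), ("bob", "Lecturer"), ("acme", "Organisation")]
abox_rows = [(name, intervals[cls][0]) for name, cls in asserted]

persons = instances_of("Person", intervals, abox_rows)
agents = instances_of("Agent", intervals, abox_rows)
print(persons)  # ['ann', 'bob']
print(agents)   # ['acme', 'ann', 'bob']
```

Nobody asserted that ann or bob is a Person, yet both are returned for the Person query because the subclass knowledge sits in the numbering rather than in a query rewriting step.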

An extension to the OWLGrEd tool for modeling OWL ontologies through a UML-like interface was presented: the developers have added a ‘splitter’ to enable a user to decide which axioms to close (using the OWL + Integrity Constraints), then to send the serialization to the reasoner and display the inferences [12]. Pity that it works only with the commercial RDF database Stardog by Clark & Parsia. Bijan Parsia  presented—among other things—a paper on automatically generating analogy questions, which are widely used in multiple choice questions, and determining somehow their difficulty. The automated generation was facilitated by an ontology, and the initial results are promising [13]. I presented the paper on OWL requirements for indigenous knowledge management systems [9], about which I blogged earlier, as one of my co-authors, Ronell Alberts, was already presenting a paper based on her recently completed MSc thesis [14].

One of the tutorials was about modularity, which was presented by Chiara del Vescovo and Dmitry Tsarkov from Manchester University (see their modularity website for more info). The tutorial presented an overview of where modularity is useful, and how. Some of the reasons to modularise are to facilitate the explanation services, to perform incremental reasoning, semantic diff, and hotspot detection (= splitting an ontology into the simple and the complex part). That is, it presented a viewpoint on modularity as possible solution for the issues of (and the need for) scalability and performance of automated reasoning. Modularity and modularization during modeling and to reduce the so-called cognitive overload—i.e., involving some, or even driven by, subject domain semantics—was here (and is in most other DL-oriented outlets) apparently entirely outside the scope, which is a missed opportunity (more about that another time).

Typical tourist picture of the conference hotel (the view from my room wasn’t that great, but with the busy schedule, that didn’t matter anyway)

Aside from the stimulating papers and keynotes, and ensuing conversations with fellow researchers, it was great to meet people again and meet new people, and we had a lot of fun socialising. Now back to work so as to have a shot at next year’s installment of ESWC in Montpellier, France (which is close to a village where I used to go on holidays for some 8 years, many years ago).

References

[1] Raimond, Y., Sandler, M. Evaluation of the music ontology framework. ESWC’12, Springer LNCS vol 7295, 255-269.

[2] Keet, C.M., Fernandez-Reyes, F.C., Morales-Gonzalez, A. Representing mereotopological relations in OWL ontologies with OntoPartS. In: Proceedings of the 9th Extended Semantic Web Conference (ESWC’12), 29-31 May 2012, Heraklion, Crete, Greece. Springer, LNCS 7295, 240-254.

[3] Morales-Gonzalez, A., Fernandez-Reyes, F.C., Keet, C.M. OntoPartS: a tool to select part-whole relations in OWL ontologies. 9th Extended Semantic Web Conference (ESWC’12), 29-31 May 2012, Heraklion, Crete, Greece. Demo with paper.

[4] Knoblock et al. Semi-automatically mapping structured sources into the semantic web. ESWC’12, Springer LNCS vol 7295, 375-390

[5] Etcheverry, L., Vaisman, A. A. Enhancing OLAP analysis with web cubes. ESWC’12, Springer LNCS vol 7295, 467-483.

[6] Callahan, A., Dumontier, M. Evaluating scientific hypotheses using the SPARQL inferencing notation. ESWC’12, Springer LNCS vol 7295, 647-658.

[7] Scheglmann, S., Scherp, A., Staab, S. Declarative Representation of Programming Access to Ontologies. ESWC’12, Springer LNCS vol 7295, 659-673.

[8] Mulholland, P., Wolff, A., Collins, T. Curate and StorySpace: An ontology and Web-based environment for describing curatorial narrative. ESWC’12, Springer LNCS vol 7295, 748-762.

[9] Alberts, R., Fogwill, T., Keet, C.M. Several Required OWL Features for Indigenous Knowledge Management Systems. 7th Workshop on OWL: Experiences and Directions (OWLED 2012).  Klinov, P. and Horridge, M. (Eds.). 27-28 May, Heraklion, Crete, Greece. CEUR-WS Vol. 849.

[10] Rodriguez-Muro, M., Calvanese, D. Quest, an OWL 2 QL reasoner for ontology-based data access.  OWLED’12. CEUR-WS Vol. 849.

[11] Dmitry Tsarkov and Ignazio Palmisano, Divide et Impera: Metareasoning for Large Ontologies. OWLED’12. CEUR-WS Vol. 849.

[12] Kārlis Čerāns, Guntis Barzdins, Renārs Liepiņš, Jūlija Ovčiņnikova, Sergejs Rikačovs and Arturs Sprogis, Graphical Schema Editing for Stardog OWL/RDF Databases using OWLGrEd/S. OWLED’12. CEUR-WS Vol. 849.

[13] Tahani Alsubait, Bijan Parsia and Uli Sattler, Mining Ontologies for Analogy Questions: A Similarity-based Approach. OWLED’12. CEUR-WS Vol. 849.

[14] Ronell Alberts and Enrico Franconi, An integrated method using conceptual modelling to generate an ontology-based query mechanism. OWLED’12. CEUR-WS Vol. 849.

A couple of OWL requirements for using ontologies in Indigenous Knowledge Management Systems

Knowledge about, say, long established agricultural practices, culinary customs and typical dishes (and their ingredient evolution over the centuries), medicinal plants and so on falls under the term indigenous knowledge in South Africa, cultural heritage in Europe (that I wrote about earlier), and traditional knowledge in other countries. Whichever term you prefer, it’s that kind of knowledge that is at risk of being lost due to changes in society. There is consensus to preserve it somehow (and possibly make some money from it along the way). Given that there’s lots of it (hence, lots of data, information, and knowledge that has to be managed), computing and IT enter the picture.

For South Africa, this is managed through the large-scale project from the Department of Science & Technology’s NIKSO office that aims at building a “national recordal system” and an IT infrastructure (IKMS) to both store and access the indigenous knowledge. Setting up such a system consists of some typical software development themes (following consultation with stakeholders), such as the need for handling varied data formats (documents, images, audio), integration of the existing disparate databases and other IT resources in SA into the IKMS, availability of the information in all 11 official languages, the need for a citizen portal, and so on.

Some of the requirements smelled very much like a possible nice use case for Semantic Web technologies so as to implement a really state-of-the-art infrastructure with enhanced capabilities compared to standard applications. Ronell Alberts, Thomas Fogwill and I assessed that when I was visiting CSIR-Meraka in August and September 2010 as one of the secondments from the EU FP7 Net2 Project. The assessment of possibilities of using semantic web technologies, including the assessment of maturity for off-the-shelf usage, was accepted at IST-Africa recently [1]. We focused on enhanced querying, semantic browsing, question answering, multilingual information access, knowledge generation, classification of information, formalisation of scientific knowledge & discovery, and knowledge-based data integration.

This we took a step further by zooming in on the ontologies-part of semantic web technologies for four of the usage scenarios, the selection of which was based on their potential for impact and maturity and inclusion into the IKMS. These are: ontology based querying and browsing; a natural language independent ontology for multilingual data access; support for collaborative knowledge generation; and the formalisation of IK for scientific discovery. More precisely, we investigated the requirements for ontology languages to meet the IKMS needs and how well they are met, if at all. A paper describing the details was just accepted for OWLED’12 [2].

In short: some of the required OWL features include representation of vagueness, mereotopology, modularisation, and extended support for internationalization (i.e., multilingualism) and annotation for collaborative ontology development. Thus, the first three put new requirements on the expressiveness of the OWL language itself, and the latter two formulate requirements akin to ‘usability’ extension for OWL. To motivate it all, we first describe each topic, provide real examples, and a few references to current research and tools, which is then followed by the OWL requirements taking into account the examples and generalizing from them; details can be found in the paper.

Hopefully there will be an extensive and useful response at OWLED’12, like the feedback we received at OWLED’07 and DL’07 on the requirements on automated reasoning for bio-ontologies [3]. Obviously, if you have a solution that we have overlooked to one or more of the gaps, please leave a comment or send me an email.

References

[1] Fogwill, T., Alberts, R., Keet, C.M. The potential for use of semantic web technologies in IK management systems. IST-Africa Conference 2012. May 9-11, Dar es Salaam, Tanzania.

[2] Alberts, R., Fogwill, T., Keet, C.M. Several Required OWL Features for Indigenous Knowledge Management Systems. 7th Workshop on OWL: Experiences and Directions (OWLED 2012). 27-28 May, Heraklion, Crete, Greece. CEUR-WS Vol-xxx. 12p.

[3] Keet, C.M., Roos, M., Marshall, M.S. A survey of requirements for automated reasoning services for bio-ontologies in OWL. Third international Workshop OWL: Experiences and Directions (OWLED 2007), 6-7 June 2007, Innsbruck, Austria. CEUR-WS Vol-258. 10p. This was described informally in an earlier post.

Part-whole relations, mereotopology and the OntoPartS tool

Part-whole relations are considered essential in knowledge representation and reasoning and, more practically, in ontology development and conceptual data modelling, especially in the subject domains of biology, medicine, geographic information systems, and manufacturing. In contrast to Ontology, which sticks to one type of part-of, the modellers and subject domain experts have come up with a plethora of part-whole relations, some of which are considered real parthood relations and others only meronymic (or: due to imprecise natural language use). For instance, the Foundational Model of Anatomy has 8 basic locative part-whole relations [1], GALEN has come up with 26 part-whole relations [2], and in cognitive science and conceptual data modelling, it hovers around 6 types [3,4]. They have been structured in a taxonomy of part-whole relations that makes a distinction between mereology and meronomy, transitivity and in- or non-transitivity, and the domain and range of the relationship [5], and some initial usage guidelines were proposed in [6].

But that’s not enough for the complex subject domains and the demands on representing, and reasoning over, the ontologies. This holds in particular when one has to represent that some things are contained in or located in something else. For instance, how Paris and France relate is somehow different from how the euro coin and your wallet relate to each other (the latter being an example of spatial containment, but not of structural parthood), whereas in other cases the spatial containment of regions of space and the structural parthood of the objects occupying those regions do coincide, e.g., your heart in your body. Or consider representing that Alto Adige/Südtirol is a border province of Italy (bordering Austria), where we have to handle both the notion of administrative entities and connecting geographical regions. That is, handling regions and ‘things’ that occupy those regions (mereotopology).

Being more precise about how the things relate provides nice inferences. Take, e.g., NTPLI as ‘non-tangential proper located in’—a part is located in the whole but not at the boundary of it—and EnclosedCountry \equiv Country \sqcap \exists NTPLI.Country , with the following instances in our knowledge base NTPLI(Lesotho, South Africa) , Country(Lesotho) , and Country(South Africa) , then it deduces correctly that EnclosedCountry(Lesotho) , whereas with a mere ‘part-of’, we would not have been able to obtain this result.
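
To make the inference above concrete, here is a minimal sketch in Python that evaluates the EnclosedCountry definition over the three asserted facts. It is only a closed-world toy check, not a Description Logic reasoner, and the function name `enclosed_countries` is an assumption made for illustration.

```python
# Toy knowledge base with the assertions from the text.
country = {"Lesotho", "South Africa"}        # Country(x)
ntpli = {("Lesotho", "South Africa")}        # NTPLI(x, y)


def enclosed_countries(country, ntpli):
    """Return every x with Country(x) and NTPLI(x, y) for some Country(y).

    Hypothetical helper: it mirrors the right-hand side of the definition
    EnclosedCountry == Country AND (EXISTS NTPLI.Country).
    """
    return {x for (x, y) in ntpli if x in country and y in country}


print(enclosed_countries(country, ntpli))
```

With only a generic part-of assertion there would be no NTPLI pair to match, so the set would stay empty, which is the point made in the text.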

Besides these examples, there are actual system requirements for, among others, annotating and querying multimedia documents and cartographic maps. An example is annotating a photo of a beach where the area of the photo that depicts the sand touches the area that depicts the seawater, so that, together with the knowledge that Varadero is a tangential proper part of Cuba, the semantically enhanced system can infer possible locations where the photo has been taken or, vice versa, propose that the photo may depict a beach scene.

But how to cater for such things?

Let me summarise the three main basic problems that have to be resolved first:

  1. There is a lack of oversight of the plethora of part-whole relations, which include real parthood (mereology), parts with their locations (mereotopology), and other part-whole relations (from meronymy);
  2. The challenge to figure out which one to use when;
  3. The underspecified representation and reasoning consequences when one has to put up with less expressive languages for which technological infrastructure exists.

We propose to solve that in the following way, which is described in detail in [7] that recently got accepted at the 9th Extended Semantic Web Conference (ESWC’12).

The short answer for the reader who is not interested in all the theory, design, and evaluation, but just wants to model quickly: the OntoPartS tool guides you to choose the most appropriate relation and saves the selection into your OWL file.

Now for a slightly longer answer. First, we extend the taxonomy of part-whole relations of [5] with the novel addition of a taxonomy of formally defined mereotopological relations, which is driven by the KGEMT mereotopological theory of Varzi [8], resulting in a taxonomy of 23 part-whole relations (mereological, mereotopological, and meronymic ones), therewith ensuring a solid ontological and logic-based foundation.

Second, some things have to be simplified from the KGEMT theory to make it implementable in OWL, and we describe the design rationale and trade-offs so that OntoPartS can load OWL/OWL2-formalised ontologies, and, if desired, modify the OWL file with the chosen relation. Which OWL species is best suited obviously depends on your individual requirements, but from a representation & reasoning and mereotopology viewpoint, OWL 2 DL and OWL 2 RL seem to fit better than the other ones. (Note: there are papers on DL and representing spatial relations and on DL and parthood, and alternative representation choices are discussed in the paper, yet, as far as we are aware, none deals with mereotopological relations in OWL or, more generally, in DL.)

Third, there is the ‘how to select’ from the 23 relations. To enable a quick selection of the appropriate relation, we avail of a simplified OWL-ized DOLCE ontology—well, just the taxonomy of categories—for the domain and range restrictions imposed on the part-whole relations and with that, we can let the user take shortcuts compared to a lengthy decision procedure. In this way, we reduced the selection procedure to 0-4 options based on just 2-3 inputs. All of this has been structured neatly in implementation-independent activity diagrams, and subsequently has been implemented; see also the demos, the tool, and the OWL version of the taxonomy of the 23 relations.
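
The idea of pruning the options through domain and range restrictions can be sketched as follows. This is a hedged illustration only: the tiny category tree, the four relations, and their domain/range pairs are hypothetical stand-ins chosen for the example, not the actual DOLCE taxonomy or the 23-relation table used by OntoPartS.

```python
# Hypothetical, heavily simplified category taxonomy (child -> parent).
PARENT = {
    "Endurant": "Particular",
    "Perdurant": "Particular",
    "PhysicalObject": "Endurant",
    "AmountOfMatter": "Endurant",
    "Process": "Perdurant",
}

# Hypothetical relation table: (relation name, domain, range).
RELATIONS = [
    ("structural_part_of", "PhysicalObject", "PhysicalObject"),
    ("involved_in", "Endurant", "Perdurant"),
    ("sub_quantity_of", "AmountOfMatter", "AmountOfMatter"),
    ("part_of", "Particular", "Particular"),
]


def is_a(category, ancestor):
    """True if `category` equals `ancestor` or lies below it in PARENT."""
    while category is not None:
        if category == ancestor:
            return True
        category = PARENT.get(category)
    return False


def candidate_relations(part_category, whole_category):
    """Keep only relations whose domain and range admit the two inputs."""
    return [
        name
        for name, domain, range_ in RELATIONS
        if is_a(part_category, domain) and is_a(whole_category, range_)
    ]


print(candidate_relations("PhysicalObject", "PhysicalObject"))
print(candidate_relations("PhysicalObject", "Process"))
```

Two inputs (the categories of the part and of the whole) are enough to cut the list down to a handful of candidates, which is the shortcut described above.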

Last, we have tested OntoPartS with modellers in controlled experiments and it was shown to improve efficiency and accuracy in modeling of part-whole relations.

As mentioned, further details can be found in [7], Representing mereotopological relations in OWL ontologies with OntoPartS, which I co-authored with Francis Fernández-Reyes, with the Instituto Superior Politécnico “José Antonio Echeverría” (CUJAE), and Annette Morales-González, with the Advanced Technologies Application Center (CENATAV), both located in Cuba (the example on semantic annotation of multimedia with spatial relations comes straight from the image processing research being done at CENATAV). A tidbit of non-scientific information: the first version of the OntoPartS tool was developed as part of the mini-project that Francis, Annette (and Alexis, who is into fish fulltime now) had chosen to carry out for the ontology engineering course I taught at the University of Havana in 2010 (mentioned earlier here and here). For the paper, we added some more theory, minor refinements to the tool, and a user evaluation with several CUJAE and UKZN students and a few FUB colleagues (thanks again for their cooperation and interest). We’ve started work on additional features, so if you have any particular request, drop me a line.

References

  1. Mejino, J.L.V., Agoncillo, A.V., Rickard, K.L., Rosse, C.: Representing complexity in part-whole relationships within the foundational model of anatomy. In: Proc. of the AMIA Fall Symposium. pp. 450–454 (2003)
  2. http://www.opengalen.org/tutorials/crm/tutorial9.html up to http://www.opengalen.org/tutorials/crm/tutorial16.html/.
  3. Winston, M., Chaffin, R., Herrmann, D.: A taxonomy of part-whole relations. Cognitive Science 11(4), 417–444 (1987)
  4. Odell, J.: Advanced Object-Oriented Analysis & Design using UML. Cambridge: Cambridge University Press (1998)
  5. Keet, C.M., Artale, A.: Representing and reasoning over a taxonomy of part-whole relations. Applied Ontology 3(1-2), 91–110 (2008)
  6. Keet, C.M.: Part-whole relations in object-role models. In: Proc. of ORM’06, OTM Workshops 2006. LNCS, vol. 4278, pp. 1116–1127. Springer (2006)
  7. Keet, C.M., Fernández Reyes, F.C., Morales-González, A.: Representing mereotopological relations in OWL ontologies with OntoPartS. In Simperl, et al., eds.: Proc. of ESWC’12. LNCS, Springer (2012) 27-31 May 2012, Heraklion, Greece.
  8. Varzi, A.: Handbook of Spatial Logics, chap. Spatial reasoning and ontology: parts, wholes, and locations, pp. 945–1038. Berlin Heidelberg: Springer Verlag (2007)

Lecture notes for the ontologies and knowledge bases course

The regular reader may recollect earlier posts about the ontology engineering courses I have taught at FUB, UH, UCI, Meraka, and UKZN. Each one had some sort of syllabus or series of blog posts with some introductory notes. I’ve put them together and extended them significantly now for the current installment of the Ontologies and Knowledge Bases Honours module (COMP718) at UKZN, and they are bound and printed into lecture notes for the enrolled students. These lecture notes are now online and I will add accompanying slides on the module’s webpage as we go along in the semester.

Given that the target audience is computer science students in their 4th year (honours), the notes are of an introductory nature. There are essentially three blocks: logic foundations, ontology engineering, and advanced topics. The logic foundations contain a recap of FOL, basics of Description Logics with ALC, all the DL-based OWL species, and some automated reasoning. The ontology engineering block covers top-down and bottom-up ontology development, and methods and methodologies, with top-down ontology development including mainly foundational ontologies and part-whole relations, and bottom-up the various approaches to extract knowledge from ‘legacy’ representations, such as from databases and thesauri. The advanced topics are balanced in two directions: one is toward ontology-based data access applications (i.e., an ontology-driven information system) and the other one has more theory with temporal ontologies.

Each chapter has a section with recommended/required reading and a set of exercises.

Unsurprisingly, the lecture notes have been written under time constraints and therefore the level of relative completeness of sections varies slightly. Suggestions and corrections are welcome!

The DiDOn method to develop bio-ontologies from semi-structured life science diagrams

It is well-known among (bio-)ontology developers that ontology development is a resource-consuming task (see [1] for data backing up this claim). Several approaches and tools do exist that speed up the time-consuming efforts of bottom-up ontology development, most notably natural language processing and database reverse engineering. They are generic and the technologies have been proposed from a computing angle, and are therefore noisy and/or contain many heuristics to make them fit for bio-ontology development. Yet, the most obvious one from a domain expert perspective is unexplored: the abundant diagrams in the sciences that function as existing/‘legacy’ knowledge representation of the subject domain. So, how can one use them to develop domain ontologies?

The new DiDOn procedure—from Diagram to Domain Ontology—can speed up and simplify bio-ontology development by exploiting the knowledge represented in such semi-structured bio-diagrams. It does this by means of extracting explicit and implicit knowledge, preserving most of the subject domain semantics, and making formalisation decisions explicit, so that the process is done in a clear, traceable, and reproducible way.

DiDOn is a detailed, micro-level procedure to formalise those diagrams in a logic of choice; it provides migration paths into OBO, SKOS, OWL and some arbitrary FOL, and guidelines on which axioms have to be added to the bio-ontology, and how. It also uses a foundational ontology so as to obtain more precise and interoperable subject domain semantics than otherwise would have been possible with syntactic transformations alone. (Choosing an appropriate foundational ontology is a separate topic and can be done with, e.g., ONSET.)

The paper describing the rationale and details, Transforming semi-structured life science diagrams into meaningful domain ontologies with DiDOn [2], has just been accepted at the Journal of Biomedical Informatics. They require a graphical abstract, so here it goes:

DiDOn consists of two principal steps: (1) formalising the ‘icon vocabulary’ of a bio-drawing tool, which then functions as a seed ontology, and (2) populating the seed ontology by processing the actual diagrams. The algorithm in the second step is informed by the formalisation decisions taken in the first step. Such decisions include, among others, the representation language and how to represent the diagram’s n-aries (with n≥2, such as choosing between n-aries as relationship or reified as classes).
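
As a small illustration of the second kind of decision, the sketch below contrasts keeping an n-ary as a single tuple with reifying it as a class whose instance is linked to each participant by a binary relation. It is not part of DiDOn itself; the relation name `Binding`, the role names, and the helper `reify` are hypothetical.

```python
# Option 1: the n-ary kept as one relationship (here a ternary tuple).
binding_as_relationship = ("Binding", "insulin", "INSR", "cell membrane")


def reify(class_name, roles, instance_id):
    """Option 2: reify the n-ary as a class.

    Returns triples: one typing triple for the new instance, plus one
    binary role assertion per participant. All names are illustrative.
    """
    triples = [(instance_id, "type", class_name)]
    triples += [(instance_id, role, filler) for role, filler in roles.items()]
    return triples


triples = reify(
    "Binding",
    {"hasLigand": "insulin", "hasReceptor": "INSR", "hasLocation": "cell membrane"},
    "binding1",
)
for triple in triples:
    print(triple)
```

The reified form fits languages that only offer binary relations, such as OWL, at the cost of introducing an extra class and instance per n-ary.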

In addition to the presentation of DiDOn, the paper contains a detailed application of it with Pathway Studio as case study.

The neatly formatted paper is behind a paywall for those with no or limited access to Elsevier’s journals, but the accepted manuscript is openly accessible from my home page.

References

[1] Simperl, E., Mochol, M., Bürger, T. Achieving maturity: the state of practice in ontology engineering in 2009. International Journal of Computer Science and Applications, 2010, 7(1):45-65.

[2] Keet, C.M. Transforming semi-structured life science diagrams into meaningful domain ontologies with DiDOn. Journal of Biomedical Informatics. In print. DOI: http://dx.doi.org/10.1016/j.jbi.2012.01.004

The rough ontology language rOWL and basic rough subsumption reasoning

Following the feasibility assessments on marrying Rough Sets with Description Logic languages last year [1,2], which I blogged about before, I looked into ‘squeezing’ into OWL 2 DL the very basic aspects of rough sets. The resulting language is called rOWL, which is described in a paper [3] accepted at SAICSIT’11, the South African CS and IT conference (which thus also gives me the opportunity to meet the SA research community in CS and IT).

DLs are not just about investigating decidable languages, but, perhaps more importantly, also about reasoning over the logical theories.  The obvious addition to the basic crisp automated reasoning services is to add the roughness component, somehow. There are various ways to do that. Crisp subsumption (and definite and possible satisfiability) of rough concepts have been defined by Jiang and co-authors [4], and there was a presentation at DL 2011 about paraconsistent rough DL [5]. I have added the notion of rough subsumption.

There are two principal cases to consider (the “\wr ” before the OWL class name denotes it is a rough class):

  • If \wr C \sqsubseteq \wr D is asserted in the ontology, what can be said about the subsumption relations among their respective approximations?
  • Given a subsumption between any of the lower and upper approximations of C and D, then can one deduce \wr C \sqsubseteq \wr D ?
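
For readers less familiar with rough sets, the following sketch computes the lower and upper approximation of a set from the equivalence classes of indistinguishable objects, which is the standard set-level construction. It is not the rOWL semantics or the reasoning procedure of the paper; the objects, the single "colour" property, and the function name `approximations` are made up for illustration.

```python
from collections import defaultdict


def approximations(universe, features, target):
    """Return (lower, upper) approximation of `target`.

    Objects with identical feature values are indistinguishable and fall
    into the same equivalence class. The lower approximation collects the
    classes fully inside `target`; the upper one collects the classes that
    overlap `target` at all.
    """
    classes = defaultdict(set)
    for obj in universe:
        classes[features[obj]].add(obj)
    lower, upper = set(), set()
    for block in classes.values():
        if block <= target:
            lower |= block
        if block & target:
            upper |= block
    return lower, upper


universe = ["a", "b", "c", "d"]
features = {"a": ("red",), "b": ("red",), "c": ("green",), "d": ("blue",)}

C = {"a", "c"}          # rough: 'a' cannot be told apart from 'b'
D = {"a", "c", "d"}     # a superset of C

lower_C, upper_C = approximations(universe, features, C)
lower_D, upper_D = approximations(universe, features, D)
print(lower_C, upper_C)
print(lower_D, upper_D)
```

In this toy data C is contained in D, and both approximations of C are contained in the corresponding approximations of D. Whether and how such set-level observations carry over to subsumption between rough concepts at the TBox and ABox level is what the two cases above ask.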

Addressing this raises questions: because being rough or not depends entirely on the chosen properties for C together with the available data, should these two cases be solved only at the TBox level or necessarily include the ABox for it to make sense? And should that be under the assumption of standard instantiation and instance checking, or in the presence of a novel DL notion of rough instantiation and rough instance checking?

These questions are answered in the second part of the paper Rough Subsumption Reasoning with rOWL [3]. In an attempt to make the proofs more readable and because the presence of instances is intuitively tied to the matter, the proofs are done by counterexample, which is relatively ‘easy’ to grasp. But maybe I should have obfuscated it with another proof technique to make the results look more profound.

Last, but not least: just in case you thought there is little motivation to bother with rough ontologies: the hypothesis testing and experimentation described in [2] still holds, and a small example is added to [3].

The succinct paper abstract is as follows:

There are various recent efforts to broaden applications of ontologies with vague knowledge, motivated in particular by applications of bio(medical)-ontologies, as well as to enhance rough set information systems with a knowledge representation layer by giving more attention to the intension of a rough set. This requires not only representation of vague knowledge but, moreover, reasoning over it to make it interesting for both ontology engineering and rough set information systems. We propose a minor extension to OWL 2 DL, called rOWL, and define the novel notions of rough subsumption reasoning and classification for rough concepts and their approximations.

I’ll continue looking into the topic, and more is in the pipeline w.r.t. the logic aspects of rough ontologies (in collaboration with Arina Britz).

References

[1] C. M. Keet. On the feasibility of description logic knowledge bases with rough concepts and vague instances. Proceedings of the 23rd International Workshop on Description Logics (DL’10), CEUR-WS, pages 314-324, 2010. 4-7 May 2010, Waterloo, Canada.

[2] C. M. Keet. Ontology engineering with rough concepts and instances. P. Cimiano and H. Pinto, editors, 17th International Conference on Knowledge Engineering and Knowledge Management (EKAW’10), volume 6317 of LNCS, pages 507-517. Springer, 2010. 11-15 October 2010, Lisbon, Portugal.

[3] C.M. Keet. Rough Subsumption Reasoning with rOWL. SAICSIT Annual Research Conference 2011 (SAICSIT’11), Cape Town, South Africa, October 3-5, 2011. ACM Conference Proceedings. (accepted).

[4] Y. Jiang, J. Wang, S. Tang, and B. Xiao. Reasoning with rough description logics: An approximate concepts approach. Information Sciences, 179:600-612, 2009.

[5] H. Viana, J. Alcantara, and A.T. Martins. Paraconsistent rough description logic. Proceedings of the 24th International Workshop on Description Logics (DL’11), 2011. Barcelona, Spain, July 13-16, 2011.

A few notes on ESWC2011 in Heraklion

It’s the end of an interesting and enjoyable ESWC’11 conference in Heraklion, Crete. Compared to other conferences, there were many keynote speeches (and not all of them that much on the Semantic Web, but interesting nevertheless), and, as usual, there were parallel sessions with (unfortunately) many co-scheduled presentations I would have liked to attend. Here follow a few notes on them (which I might update once I have travelled back to SA, as this is written rather hastily before departure).

Keynotes

Jim Hendler’s talk was entitled “Why the Semantic Web will never work”—with the quotation marks. There have been quite a few people uttering that sentence, but, in Hendler’s review of the past 10 years, we actually have achieved more in some areas than initially anticipated and more than pessimists thought was feasible. For instance, “the semantic web will never scale”: it does, according to Hendler, as demonstrated, e.g., by participants in the billion triple challenge and the growing LOD data cloud. Or the “folksonomies will win” (as opposed to, at least, structured vocabularies): wrong again, mainly because it does not achieve its goal without “social context” and it lacks the crucial aspect of links between entities. However, these achievements are principally in the bottom part of the Semantic Web layer cake and Hendler claims that the “ontology story is still confused”, although OWL is to a large degree “succeeding as a KR standard”. Key challenges for Hendler include: relating linked data to ontologies, the equivalent of a database calculus for linked data, and the need for providing a means for evaluating reasoning with incomplete and possibly inconsistent data. UPDATE (13-6): Hendler’s slides are on slideshare.

Lars Backstrom, data scientist at Facebook, gave a keynote about analyzing FB data and working toward ranking and filtering news feeds by turning it into a classification problem using a set of properties (localization, relation to actor, and others). Interestingly, Backstrom emphasized that FB is moving toward more structured data, which makes it easier to manage and analyse with the algorithms they are developing. Whether that is a good thing or not is a separate discussion, especially regarding privacy issues, which is what the talk of Abe Hsuan was about (clearly, this does not hold only for FB but for the web in general). According to Hsuan, “Privacy cannot exist on a lawless Semantic Web”. It was good for several after-talk discussions among the attendees, and the last word on how to deal with all this has not been said yet. In this context, someone may want to have a look at episode 3 of The virtual revolution documentary about non-free services on the Web, the TED-talk on The filter bubble, or the less recent Database nation book.

Andraz Tori, CTO of Zemanta, gave a keynote describing some background of the ‘writing help’ that WordPress has recently started to offer, whilst trying to avoid wrong usage of it and cleaning up the data. As you may have guessed, I have not used that feature yet when writing my blog posts (and do not see the need for it from my perspective). Prasad Kantamneni from Yahoo! gave an interactive keynote on HCI applied to the effects of different web interfaces for their search engines, and the consequences for revenue, which was lively and interesting. Seemingly ‘silly little things’ like putting the keyword in boldface in the search results make a big difference to how a user scans through the results (more efficiently), likewise auto-completion, which in the end makes you read more of the results page.

Last, but most certainly not least, Chris Welty gave the conference dinner keynote, which was entertaining. He described some hurdles they had overcome in building ‘Watson’, a sophisticated question answering engine that finds answers to trivia/general knowledge quizzes for the Jeopardy! game that, in the end, did consistently outperform the national human experts on it. The talk was filled with entertaining mistakes they encountered during the development of Watson, and what it required to fix them. The key message was that one cannot go in a linear fashion from natural language to knowledge management, but one has to use an integration of various technologies to make a successful ‘intelligent’ tool.

Sessions and other things

Normally I have a dense section on the papers presented in the sessions here, but due to the very busy conference schedule and the shortage of free online papers before the conference, I did not get around to reading all the papers that I would have liked (and I don’t cite papers I have not read, still roughly following my approach to conference blogging). The one on removing redundancy in ontologies presented by Jens Wissmann [1] was quite interesting, in particular for its creative reuse of computing justifications to remove ‘redundant’ axioms, i.e., those which can be derived from other knowledge represented in the ontology anyway. This was computationally costly, so they also developed another algorithm with better performance; details and experimental results can be found in the paper. My own paper [2] on the experiment of the use of foundational ontologies in ontology engineering was well-received, and generated quite some interest, such as on the quality of the foundational ontologies themselves and how the results presented could translate to their particular domain ontology scenario. I may add something on epistemic queries, computing generalizations, matching 4K ontologies in one year, and cross-lingual ontology mappings (provided I find the time to do so in the upcoming days).
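
To illustrate what ‘redundant’ means here, the sketch below drops subclass axioms that are already derivable from the remaining ones by transitivity. This is only a toy for plain subclass hierarchies and is not the justification-based algorithm of [1]; the function names and the example axioms are assumptions.

```python
def entails(axioms, sub, sup):
    """True if `sub` is a (transitive) subclass of `sup` given `axioms`,
    where each axiom is a pair (subclass, superclass)."""
    stack, seen = [sub], set()
    while stack:
        node = stack.pop()
        if node == sup:
            return True
        if node in seen:
            continue
        seen.add(node)
        stack.extend(b for a, b in axioms if a == node)
    return False


def remove_redundant(axioms):
    """Drop each axiom that still follows from the axioms kept so far."""
    kept = list(axioms)
    for axiom in list(axioms):
        rest = [a for a in kept if a != axiom]
        if entails(rest, *axiom):
            kept = rest
    return kept


axioms = [("Cat", "Mammal"), ("Mammal", "Animal"), ("Cat", "Animal")]
print(remove_redundant(axioms))
```

Each check re-derives an entailment from the rest of the axioms, which hints at why the general problem over expressive OWL ontologies is computationally costly.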

The panel session about e- and open- Government was a bit meager and can be summarized as: Linked Open Data (LOD) is good and catching on well but the integration problems still exist, and we need (at least) structured controlled vocabularies to fix it.

I will close with an announcement that Alexander Garcia-Castro brought to my attention: there will be an “Ontologies come of Age in the Semantic Web” workshop co-located with ISWC’11.

References

[1] Stephan Grimm and Jens Wissmann. Elimination of redundancy in ontologies. In: Proceedings of the 8th Extended Semantic Web Conference (ESWC’11). Heraklion, Crete, Greece, 29 May – 2 June 2011. Springer LNCS 6643, 260-274.

[2] Keet, C.M. The use of foundational ontologies in ontology development: an empirical assessment. In: Proceedings of the 8th Extended Semantic Web Conference (ESWC’11). Heraklion, Crete, Greece, 29 May – 2 June 2011. Springer LNCS 6643, 321-335.

Outcome of the empirical assessment on the use of foundational ontologies in ontology development

In an earlier post, I described briefly an experiment I had carried out with 52 (novice) ontology developers who had developed 18 ontologies, 1/3 of whom had used a foundational ontology voluntarily, and whose ontologies were better than those of the developers who did not use a foundational ontology in domain ontology development. It being the first empirical experiment on this matter, the slightly shorter version of the tech report mentioned in that earlier blog post has been accepted as a full paper at the 8th Extended Semantic Web Conference (ESWC’11).

The informal summary with some details was already introduced in the earlier post, so I will include only the abstract of the paper The use of foundational ontologies in ontology development: an empirical assessment here:

There is an assumption that ontology developers will use a top-down approach by using a foundational ontology, because it purportedly speeds up ontology development and improves quality and interoperability of the domain ontology. Informal assessment of these assumptions reveals ambiguous results that are not only open to different interpretations but also such that foundational ontology usage is not foreseen in most methodologies. Therefore, we investigated these assumptions in a controlled experiment. After a lecture about DOLCE, BFO, and part-whole relations, one-third chose to start domain ontology development with an OWLized foundational ontology. On average, those who commenced with a foundational ontology added more new classes and class axioms, and significantly less object properties than those who started from scratch. No ontology contained errors regarding part-of vs. is-a.

The comprehensive results show that the ‘cost’ incurred spending time getting acquainted with a foundational ontology compared to starting from scratch was more than made up for in size, understandability, and interoperability already within the limited time frame of the experiment.

The last word has not been said about it, though. E.g., is 1/3 few or a lot? It remains unclear why the participants preferred reusing DOLCE over BFO, and what the outcome would be if much larger ontologies, such as Cyc or SUMO, were also added to the options in a controlled experiment. Also, it may be interesting to see similar experiments with other lecturers and other types of participants, such as non-computing domain experts with experience in modeling, or with a longer time period than used for this experiment. Further, only preliminary suggestions were made on how one may want to include the use of foundational ontologies in ontology development methodologies. This should be done both at the level of the high-level steps in the development process (none includes anything about that now) and in methods for the actual modeling, where only OntoSpec makes a first attempt in that direction.

References

[1] Keet, C.M. The use of foundational ontologies in ontology development: an empirical assessment. 8th Extended Semantic Web Conference (ESWC’11). Heraklion, Crete, Greece, 29 May – 2 June 2011. Springer LNCS (in print).
