Posts Tagged ‘ontology development’

Quantitative results on pitfalls in ontologies

The number of ontologies in the world is becoming large, and both that number and the size of the ontologies are increasing rapidly. Such ontologies are only to some extent developed by ontologists/knowledge engineers and increasingly also by ‘novice modellers’ and domain experts. Does one do better than the other? When is an ontology ‘better’? What is the prevalence of pitfalls (potential errors or problems) in those ontologies? Which aspects are pitfalls?

I can go on with such general questions one would want to see answered, for the answers can help in designing better guidelines, methods, and methodologies, and therewith improve both the teaching of ontology engineering and cracking the nut of what the ‘quality’ of an ontology really entails. To be sure, work has been done in this direction, such as [1,2] for general modeling issues and authoring suggestions, methods [3,4], and structured cataloguing of “antipatterns” [5] and “pitfalls” [6,7]. The pitfall catalogue included 29 types of pitfalls (and growing), of which 21 are implemented in the OntOlogy Pitfall Scanner! (OOPS!). The nice thing about having such an automated pitfall detection tool for OWL ontologies is that it offers the opportunity to obtain quantitative results on the assumed quality of many ontologies (or, at least, on the presence of pitfalls that can be detected automatically), which has been noted to be lacking [8].

As you may have guessed already, we did exactly that (the ‘we’ being Mari Carmen Suárez Figueroa, María Poveda-Villalón, and I) with 406 OWL ontologies, and we report the results in a recently accepted paper [9] at the 5th International Conference on Knowledge Engineering and Ontology Development (KEOD’13). We did not seek the answer to everything, but narrowed it down to the following questions and hypotheses (copied from the paper):

  A. What is the prevalence of each of those pitfalls in existing ontologies?
  B. To what extent do the pitfalls say something about quality of an ontology?

Question B is refined into two sub-questions:

  1. Which anomalies that appear in OWL ontologies are the most common?
  2. Are the ontologies developed by experienced developers and/or well-known or mature ontologies ‘better’ in some modelling quality sense than the ontologies developed by novices? This is refined into the following hypotheses:
    i. The prevalence and average of pitfalls is significantly higher in ontologies developed by novices compared to ontologies deemed established/mature.
    ii. The kind of pitfalls observed in novices’ ontologies differs significantly from those in well-known or mature ontologies.
    iii. The statistics on observed pitfalls of a random set of ontologies is closer to those of novices’ ontologies than the well-known or mature ones.
    iv. There exists a positive correlation between the detected pitfalls and the size or number of particular elements of the ontology.
    v. There exists a positive correlation between the detected pitfalls and the DL fragment of the OWL ontology.

The set of 406 ontologies we used in trying to answer these questions consists of three subsets: 362 ontologies that were already scanned by OOPS! in the year until Oct 2012 (it was available online, so this can be considered a set of ‘random’ ontologies), 23 ontologies made by novices (students enrolled in an ontology engineering course), and 21 well-known ontologies that we assumed to be relatively mature (developed by ontologists, used in applications, etc.), such as DOLCE, GFO, and GoodRelations. The ‘novices’ and ‘mature’ ontologies were scanned by OOPS! as well and also evaluated manually.

To make a long story short, I’ll go straight to the outcome (the materials & methods, data, statistical and qualitative analysis can be found in the paper and its supplementary material [9]). First, all 21 types of pitfalls that OOPS! scans for have been detected in the full set of 406 ontologies. The most common ones detected in the ontologies are the absence of annotations, declaring object properties but not their domain and range classes, and issues with inverses. To a lesser extent, there are also issues with unconnected ontology elements and declaring a definition that is recursive. Second, with respect to the five hypotheses: the results falsify hypotheses (i), (ii), and (v), partially validate (iv), and validate (iii), where, regarding (iv), for novices, the number of pitfalls/ontology does relate to the size and complexity of the ontology. Or, to put it bluntly: there are no striking differences between the sets of ‘novices’, ‘random’, and ‘mature’ ontologies, therewith providing a general landscape of pitfalls in ontologies.
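To give a flavour of what automatically detecting one of the most common pitfalls amounts to, here is a minimal sketch in Python. It is not how OOPS! is implemented; the dictionary layout, the example properties, and the function name `missing_domain_or_range` are all assumptions made up for illustration.

```python
# Toy representation of object property declarations.
# Hypothetical data, not taken from any of the 406 scanned ontologies.
object_properties = {
    "hasPart": {"domain": "Whole", "range": "Part"},
    "locatedIn": {"domain": "PhysicalObject", "range": None},
    "teaches": {"domain": None, "range": None},
}


def missing_domain_or_range(properties):
    """Return the names of properties that lack a domain, a range, or both."""
    flagged = []
    for name, declaration in sorted(properties.items()):
        if declaration.get("domain") is None or declaration.get("range") is None:
            flagged.append(name)
    return flagged


if __name__ == "__main__":
    print(missing_domain_or_range(object_properties))
```

A real scanner would read these declarations from the OWL file rather than from a hand-written dictionary, and would have to decide what to do with domains and ranges that are inherited from super-properties.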

This wasn’t quite what we had expected. We analysed the ‘novices’ and ‘mature’ ontologies in detail, and could find a few more candidate pitfalls as well as a few false positives (which, when taken into account, have an equalizing effect). Then, one has to ask whether the assumptions were valid. This we discuss in some detail in section 4 of the paper, and we could come up with several pros and cons. Based on the varied reviewers’ comments, I presume you’ll have your own opinion about it as well. Either way, the pitfall catalogue is being extended and the community may need to come up with a better way of defining what ‘maturity’ and ‘ontology quality’ means and how that relates to the presence of pitfalls in an ontology.

While it is certainly interesting and useful (imho) to have insight into the presence of pitfalls, the data and analysis also show that we need more quantitative data on the notion of ontology quality to better figure out what’s going on. That can then inform the development of guidelines and methods for ontology development, so that ontology development can be pushed from an art further into the realm of solid engineering underpinned with science.

References

[1] Noy, N. and McGuinness, D. (2001). Ontology development 101: A guide to creating your first ontology. TR KSL-01-05, Stanford Knowledge Systems Laboratory.

[2] Rector, A. et al. (2004). OWL pizzas: Practical experience of teaching OWL-DL: Common errors & common patterns. In Proc. of EKAW’04, volume 3257 of LNCS, pages 63–81. Springer.

[3] Keet, C. M. (2012). Detecting and revising flaws in OWL object property expressions. In Proc. of EKAW’12, volume 7603 of LNAI, pages 252–266. Springer.

[4] Guarino, N. and Welty, C. (2009). An overview of OntoClean. In Staab, S. and Studer, R., editors, Handbook on Ontologies, pages 201–220. Springer, 2nd edition.

[5] Roussey, C., Corcho, O., and Vilches-Blázquez, L. (2009). A catalogue of OWL ontology antipatterns. In Proc. of K-CAP’09, pages 205–206.

[6] Poveda, M., Suárez-Figueroa, M.C., and Gómez-Pérez, A. (2010). Common pitfalls in ontology development. In Current Topics in Artificial Intelligence, CAEPIA 2009 Selected Papers, volume 5988 of LNAI, pages 91–100. Springer.

[7] Poveda-Villalón, M., Suárez-Figueroa, M. C., and Gómez-Pérez, A. (2012). Validating ontologies with OOPS! In Proc. of EKAW’12, volume 7603 of LNAI, pages 267–281. Springer.

[8] Vrandecic, D. (2009). Ontology evaluation. In Staab, S. and Studer, R., editors, Handbook on Ontologies, pages 293–313. Springer, 2nd edition.

[9] Keet, C.M., Suárez Figueroa, M.C., and Poveda-Villalón, M. (2013) The current landscape of pitfalls in ontologies. International Conference on Knowledge Engineering and Ontology Development (KEOD’13). 19-22 September, Vilamoura, Portugal.

 

Modelling issues and choices in the development of the Data Mining OPtimization ontology

The Data Mining OPtimization ontology (DMOP) is a sizeable ontology with about 600 classes, over 1000 subclass axioms, more than 100 object properties, 40 object sub-property axioms and about 10 property chains, and thus uses several SROIQ/OWL 2 DL features. The ontology contains detailed knowledge about data mining tasks, algorithms, hypotheses (mined models or patterns), workflows, and data with its characteristics. Such detailed knowledge is required to meet its high-level aim: to support informed decision-making in the knowledge discovery process. While the ontology can be used as a reference by data miners, its primary purpose (at least, the main motivation why it was developed) is automation of algorithm and model selection that relies heavily on semantic meta-mining [1], i.e., ontology-based meta-analysis where data mining experiments are conducted, annotated, mined, and analysed, and from which patterns about data mining performance are extracted. Unlike other data mining ontologies, DMOP helps propose not just any set of valid workflows, but optimal workflows, thanks to all this detailed knowledge about data mining. (DMOP was developed in the EU FP7 e-lico project and is used in such a system that proposes relatively optimal workflows.)

DMOP’s development was no trivial exercise, however, and several modeling problems popped up that required use of OWL 2 DL features and started to stretch the recent performance improvements of the automated reasoners. A summary of the ontology and a description, discussion, and solution of those issues (or: the choices we made for version 5.3 of the ontology) are given in our OWLED’13 paper Modeling issues and choices in the Data Mining OPtimization Ontology [2], which was co-authored with Agnieszka Lawrynowicz (from uni of Poznan, who will present the paper at OWLED’13), Claudia d’Amato (uni of Bari), and Melanie Hilario (uni of Geneva, Axone, and e-lico coordinator).

The main issues we describe in the paper are about meta-modelling and punning, property chains, aligning DMOP to a foundational ontology, and qualities and attributes (and data properties). The meta-modelling topic arose primarily because of the ontological status of Algorithm: is it a class or an instance, and what are the consequences of modeling it either way? Generally, one would consider an algorithm to be an instance, and it can have zero or more implementations that are also instances. In addition, it can take types of inputs (data mining data sets) and outputs (data mining hypotheses), but one cannot assert an axiom that involves both an instance and a class other than instantiation (which is not applicable for an algorithm’s input and output).  In the end, we settled for OWL 2’s punning feature (for details and arguments, refer to the paper).
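To make the punning idea a bit more concrete: in OWL 2 DL the same IRI may be declared both as a class and as an individual, and the two are treated as separate ‘views’ on that name. The following Python sketch only illustrates that bookkeeping; the IRIs with the `ex:` prefix and the function `punned_iris` are hypothetical and do not come from DMOP.

```python
# Declarations as (IRI, entity kind) pairs.
# The IRIs are illustrative assumptions, not DMOP's actual vocabulary.
declarations = {
    ("ex:AlgorithmX", "Class"),
    ("ex:AlgorithmX", "NamedIndividual"),  # same IRI, second view: punning
    ("ex:DataSet", "Class"),
}


def punned_iris(decls):
    """Return the IRIs that are declared both as a class and as an individual."""
    kinds_by_iri = {}
    for iri, kind in decls:
        kinds_by_iri.setdefault(iri, set()).add(kind)
    return sorted(
        iri
        for iri, kinds in kinds_by_iri.items()
        if {"Class", "NamedIndividual"} <= kinds
    )


if __name__ == "__main__":
    print(punned_iris(declarations))
```

With such a punned name, the individual view can take part in property assertions, while the class view can still be used in class expressions.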

There is a brief section about property chains, the issues they raised, and how these were resolved. A detailed description of how this was done, as well as a generalization of and theoretical foundation for it, was given in my EKAW’12 paper [3] (there’s an informal introduction in an earlier blog post). There were chains that caused undesirable deductions, which are resolved in v5.3 of DMOP using the tests described in [3]. The chains themselves do not exceed the use of three object properties, i.e., two on the left-hand side of the inclusion, yet some nifty desirable inferences can be made now.
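For readers who have not used property chains: a chain with two properties on the left-hand side says that if x is related to y by the first property and y to z by the second, then x is related to z by the property on the right-hand side. A minimal sketch of that inference in Python, with made-up property and individual names (they are assumptions, not DMOP’s chains):

```python
def apply_chain(triples, first, second, implied):
    """Infer (x, implied, z) whenever (x, first, y) and (y, second, z) hold.

    Only the newly inferred triples are returned.
    """
    inferred = set()
    for (x, p, y) in triples:
        if p != first:
            continue
        for (y2, q, z) in triples:
            if q == second and y2 == y:
                inferred.add((x, implied, z))
    return inferred - set(triples)


# Hypothetical assertions: a workflow has a step, and that step executes an algorithm.
triples = {
    ("workflow1", "hasStep", "operation1"),
    ("operation1", "executes", "algorithmA"),
}

if __name__ == "__main__":
    # Chain: hasStep o executes -> usesAlgorithm
    print(apply_chain(triples, "hasStep", "executes", "usesAlgorithm"))
```

An undesirable deduction arises when, for instance, the domain or range of the property on the right-hand side does not fit the classes that the chained properties relate, which is the kind of thing the tests in [3] are meant to catch.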

Linking DMOP to a foundational ontology does introduce several modelling issues besides the linking of DMOP classes and properties to the categories and relationships in the chosen foundational ontology. These include whether to import or to extend the foundational ontology (normally: import); whether the whole foundational ontology should be imported or only a relevant section of it (i.e., the need for module extraction); how to harmonize any expressiveness issues (e.g., the foundational ontology may be too expressive for the purpose of the domain ontology); and what to do with any possible differences in ‘modeling philosophies’ between the two ontologies (e.g., data properties). We ended up importing DOLCE-lite. Linking the data mining classes to DOLCE categories was performed manually, where most of them (like algorithm, software, strategy, task, and optimization problem) were asserted as subclasses of dolce:non-physical-endurant, and their characteristics and parameters are subclasses of dolce:abstract-quality.

A tricky representation issue concerns the ‘attributes’ of entities, such as that each FeatureExtractionAlgorithm has a transformation function that is either linear or non-linear. I’m skipping the arguments here in the blog post (it deserves its own one, and see also the paper), and I jump to the choices we made. Instead of using OWL’s data properties, we went for the ‘foundational ontology way’ of dealing with attributes, where an attribute is not a binary relation between a class and a data type, but an entity itself (subsumed by dolce:quality) that, in turn, is related to a space dolce:region. That is where DOLCE stops, but we needed the data types, so we added a data property hasDataValue from dolce:region to the data type anyType. A section of the ontology is depicted graphically in the next figure.

A section of DMOP with a partial representation of DMOP’s ‘attributes’ (Source: [2]).

For instance, a ModelingAlgorithm has as quality exactly one LearningPolicy (so, LearningPolicy is a subclass of dolce:quality), this LearningPolicy has as quale exactly one abstract region Eager-Lazy, and that Eager-Lazy has as data value at most one anyType data type to record the value of the learning policy of a modeling algorithm. Although this is more cumbersome than with data properties, it makes the ontology much more reusable for a broader set of application scenarios. This comprehensive approach required quite some modeling effort: there are more than 40 DMOP classes made subclass of dolce:abstract-region, and Characteristic (with its 94 subclasses) and Parameter (with 42 subclasses) are subclasses of dolce:abstract-quality, and most are used in class expressions.
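The extra indirection is perhaps easiest to see when looking up a value: one has to go from the entity to its quality, from the quality to its region, and only then to the data value. A minimal sketch in Python, where the individual names (`algo1`, `policy1`, `region1`), the value `"eager"`, and the property labels used as dictionary keys are assumptions for illustration only:

```python
# Assertions at the instance level, following the pattern
# entity -> quality -> region -> data value.
# All individual names and the value are hypothetical.
facts = {
    ("algo1", "has-quality"): "policy1",     # a ModelingAlgorithm and its LearningPolicy
    ("policy1", "has-quale"): "region1",     # the LearningPolicy and its Eager-Lazy region
    ("region1", "hasDataValue"): "eager",    # the region and its recorded value
}


def attribute_value(assertions, entity):
    """Follow quality, then region, then data value; None if any link is missing."""
    quality = assertions.get((entity, "has-quality"))
    region = assertions.get((quality, "has-quale"))
    return assertions.get((region, "hasDataValue"))


if __name__ == "__main__":
    print(attribute_value(facts, "algo1"))
```

With a plain data property this would be a single lookup; the three-step route is the price paid for treating the attribute as an entity in its own right.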

A few other choices are briefly mentioned in the paper.

Eventually, these and future improvements to DMOP are expected to pay off in the quality of the meta-miner so that it will compute better optimal workflows.

References

[1] Hilario, M., Nguyen, P., Do, H., Woznica, A., Kalousis, A. Ontology-based meta-mining of knowledge discovery workflows. In: Meta-Learning in Computational Intelligence. Volume 358 of Studies in Computational Intelligence. Springer (2011) 273–315.

[2] Keet, C.M., Lawrynowicz, A., d’Amato, C., Hilario, M. Modeling issues and choices in the Data Mining OPtimisation Ontology. 8th Workshop on OWL: Experiences and Directions (OWLED’13), 26-27 May 2013, Montpellier, France. CEUR-WS vol xx (to appear).

[3] Keet, C.M. Detecting and Revising Flaws in OWL Object Property Expressions. Proc. of EKAW’12. Springer LNAI vol 7603, pp. 252-266.

Release of the (beta version of the) foundational ontology library ROMULUS

With the increase in ontology development and networked ontologies, both good ontology development and ontology matching for ontology linking and integration are becoming more pressing issues. Many contributions have been proposed in these areas. One of the ideas to tackle both, supposedly in one fell swoop, is the use of a foundational ontology. A foundational ontology aims to (i) serve as a building block in ontology development by providing the developer with guidance on how to model the entities in a domain, and (ii) serve as a common top-level when integrating different domain ontologies, so that one can identify which entities are equivalent according to their classification in the foundational ontology. Over the years, several foundational ontologies have been developed, such as DOLCE, BFO, GFO, SUMO, and YAMATO, which have been used in domain ontology development. The problem that has arisen now is: how to link domain ontologies that are mapped to different foundational ontologies?

To be able to do this in a structured fashion, the foundational ontologies have to be matched somehow, and ideally there should be some software support for this. As early as 2003, this issue was foreseen already and the idea of a “WonderWeb Foundational Ontologies Library” (WFOL) was proposed, so that, in the ideal case, different domain ontologies can commit to different but systematically related (modules of) foundational ontologies [1]. However, the WFOL remained just an idea because it was not clear how to align those foundational ontologies and, at the time of writing, most foundational ontologies were still under active development, OWL was yet to be standardised, and there was scant stable software infrastructure. Within the Semantic Web setting, the solvability of the implementation issues is within reach yet not realised, but their alignment is still to be carried out systematically (beyond the few partial comparisons in the literature).

We’re trying to solve these theoretical and practical shortcomings through the creation of the first such online library of machine-processable, aligned and merged, foundational ontologies: the Repository of Ontologies for MULtiple USes ROMULUS. This version contains alignments, mappings, and merged ontologies for DOLCE, BFO, and GFO and some modularized versions thereof, as a start. It also has a section on logical inconsistencies; i.e., entities that were aligned manually and/or automatically and seemed to refer to the same thing—e.g., a mathematical set, a temporal region—actually turned out not to be (at least from a logical viewpoint) due to other ‘interfering’ axioms in the ontologies. What one should be doing with those, is a separate issue, but at least it is now clear where the matching problems really are down to the nitty-gritty entity-level.
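To illustrate how an alignment that looks fine can still fail as a mapping, consider a toy check in Python: asserting that two classes are equivalent makes each inherit the other’s superclasses, so if a superclass of one is disjoint with a superclass of the other, the equivalence cannot hold without making the classes unsatisfiable. This is a deliberately naive sketch and not how ROMULUS or a DL reasoner works; the class names with the `o1:` and `o2:` prefixes and both helper functions are hypothetical.

```python
def ancestors(cls, parents):
    """Return cls together with all its (transitive) superclasses."""
    seen, stack = set(), [cls]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, []))
    return seen


def equivalence_is_unsatisfiable(a, b, parents, disjoint):
    """True if some superclass of a is declared disjoint with some superclass of b."""
    up_a, up_b = ancestors(a, parents), ancestors(b, parents)
    return any(frozenset((x, y)) in disjoint for x in up_a for y in up_b)


# Hypothetical merged hierarchy of two ontologies o1 and o2.
parents = {
    "o1:TemporalRegion": ["o1:Abstract"],
    "o2:TimeInterval": ["o2:Occurrent"],
}
# Hypothetical disjointness that follows from other axioms in the merged ontology.
disjoint = {frozenset(("o1:Abstract", "o2:Occurrent"))}

if __name__ == "__main__":
    print(equivalence_is_unsatisfiable("o1:TemporalRegion", "o2:TimeInterval", parents, disjoint))
```

In practice one would add the candidate equivalence axiom to the merged ontology and let an automated reasoner check it, since interfering axioms can be far less direct than a single disjointness declaration.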

We performed a small experiment on the evaluation of the mappings (thanks to participants from DERI, Net2 funds, and Aidan Hogan), and we would like to have more feedback on the alignments and mappings. It is one thing that we, or some alignment tool, aligned two entities, another that asserting an equivalence ends up logically consistent (hence mapped) or inconsistent, and yet another what you think of the alignments, especially the ontology engineers. You can participate in the evaluation: you will get a small set of a few alignments at a time, and then you decide whether you agree, partially agree, or disagree with it, are unsure about it, or skip it if you have no clue.

Finally, ROMULUS also has a range of other features, such as ontology selection, a high-level comparison, browsing the ontology through WebProtégé, a verbalization of the axioms, and metadata. It is the first online library of machine-processable, modularised, aligned, and merged foundational ontologies around. A poster/demo paper [2] was accepted at the Seventh International Conference on Knowledge Capture (K-CAP’13), and papers describing details are submitted and in the pipeline. In the meantime, if you have comments and/or suggestions, feel free to contact Zubeida or me.

References

[1] Masolo, C., Borgo, S., Gangemi, A., Guarino, N., Oltramari, A. Ontology library. WonderWeb Deliverable D18 (ver. 1.0, 31-12-2003). (2003) http://wonderweb.semanticweb.org.

[2] Khan, Z., Keet, C.M. Toward semantic interoperability with aligned foundational ontologies in ROMULUS. Seventh International Conference on Knowledge Capture (K-CAP’13), ACM proceedings. 23-26 June 2013, Banff, Canada. (accepted as poster & demo with short paper)

A new version of ONSET and more technical details are now available

After the first release of the foundational ONtology Selection and Explanation Tool ONSET half a year ago, we—Zubeida Khan and I—continued its development by adding SUMO, conducting a user evaluation, and we wrote a paper about it, which was recently accepted [1] at the 18th International Conference on Knowledge Engineering and Knowledge Management (EKAW’12).

There are theoretical and practical reasons why using a foundational ontology improves the quality and interoperability of the domain ontology, be this by means of reusing DOLCE, BFO, GFO, SUMO, YAMATO, or another one, in part or in whole (see, e.g., [2,3] for some motivations). But domain ontology developers, and those who are potentially interested in using a foundational ontology in particular, do ask: which one of them would be best to use for the task at hand? That is not an easy question to answer, and hitherto it required a developer to pore over all the documentation, weigh the pros and cons for the scenario, make an informed decision, know exactly why, and be able to communicate that. This bottleneck has been solved with the ONSET tool. Or, at least: we claim it has, and the user evaluation supports this claim.

In short, ONSET, the foundational ONtology Selection and Explanation Tool helps the domain ontology developer in this task. Upon answering one or more questions and, optionally, adding any scaling to indicate some criteria are more important to you than others, it computes the most suitable foundational ontology for that scenario and explains why this is so, including reporting any conflicting answers (if applicable). The questions themselves are divided into five different categories—Ontology, representation language, software engineering properties, applications, and subject domain—and there are “explain” buttons to clarify terms that may not be immediately clear to the domain ontology developer. (There are a few screenshots at the end of this post.)
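As a rough idea of what ‘answering questions, optionally with scaling per category’ could boil down to, here is a small weighted-scoring sketch in Python. It is not ONSET’s actual algorithm (that one is described in the paper); the function `rank`, the placeholder ontologies `FO-A` and `FO-B`, and the criteria `c1` to `c3` are assumptions for illustration, and no claim is made about which criteria any real foundational ontology satisfies.

```python
def rank(answers, support, scaling=None):
    """Rank ontologies by the (scaled) number of requested criteria they satisfy.

    answers: list of (category, criterion) pairs the user asked for.
    support: dict mapping an ontology name to the set of criteria it satisfies.
    scaling: optional dict mapping a category to its weight (default weight 1).
    """
    scaling = scaling or {}
    scores = {}
    for ontology, satisfied in support.items():
        scores[ontology] = sum(
            scaling.get(category, 1)
            for category, criterion in answers
            if criterion in satisfied
        )
    # Highest score first; ties are broken alphabetically for a stable result.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical scenario with placeholder ontologies and criteria.
answers = [("ontology", "c1"), ("language", "c2"), ("domain", "c3")]
support = {"FO-A": {"c1", "c2"}, "FO-B": {"c3"}}

if __name__ == "__main__":
    print(rank(answers, support))
    print(rank(answers, support, scaling={"domain": 3}))
```

The second call shows why scaling matters: once the subject domain category counts three times as much, the ontology that satisfies only the domain criterion comes out on top. Explaining the outcome and reporting conflicting answers, which ONSET also does, is left out of this sketch.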

Behind the scenes is a detailed comparison of the features of DOLCE, BFO, GFO, and SUMO, and an efficient algorithm. The latter and the main interesting aspects of the former are included in the paper; the complete set of criteria is available in a file on the ONSET webpage. You can play with ONSET using your real or a fictitious ontology development scenario after downloading the jar file. If you don’t have a scenario and can’t come up with one: try one of the scenarios we used for the user evaluation (also online). The user evaluation consisted of 5 scenarios/problems that the 18 participants had to solve; half of them used ONSET and half of them did not. On average, the ‘accuracy’ (computed from selecting the appropriate foundational ontology and explaining why) was 3 times higher for those who used ONSET compared to those who did not. The ONSET users also did it slightly faster.

Thus, ONSET greatly facilitates selecting a foundational ontology. However, I concede that from the Ontology (philosophy) viewpoint, the real research component is, perhaps, only beginning. Among others, what is the real effect of the differences between those foundational ontologies for ontology development, if any? Is one category of criteria, or individual criterion, always deemed more important than others? Is there one or more ‘typical’ combination of criteria, and if so, is there a single particular foundational ontology suitable, and if not, where/why are the current ones insufficient? In the case of conflicts, which criteria do they typically involve? ONSET clearly can be a useful aid in investigating these questions, but answering them is left to future work. Either way, ONSET contributes to taking a scientific approach to comparing and using a foundational ontology in ontology development, and provides the hard arguments why.

We’d be happy to hear your feedback on ONSET, be this on the tool itself or when you have used it for a domain ontology development project. Also, the tool is very easy to extend thanks to the way it is programmed, so if you have your own pet foundational ontology that is not yet included in the tool, you may like to provide us with the values for the criteria so that we can include it.

Here are a few screenshots: of the start page, questions and an explanation, other questions, and the result (of a fictitious example):

Startpage of ONSET, where you select inclusion of additional questions that don’t make any difference right now, and where you can apply scaling to the five categories.

Section of the questions about ontological commitments and a pop-up screen once the related “Explain” button is clicked.

Another tab with questions. In this case, the user selected “yes” to modularity, upon which the tool expanded the question so that a way of modularisation can be selected.

Section of the results tab, after having clicked “calculate results” (in this case, of a fictitious scenario). Conflicting results, if any, will be shown here as well, and upon scrolling down, relevant literature is shown.

References

[1] Khan, Z., Keet, C.M. ONSET: Automated Foundational Ontology Selection and Explanation. 18th International Conference on Knowledge Engineering and Knowledge Management (EKAW’12). Oct 8-12, Galway, Ireland. Springer, LNAI, 15p. (accepted)

[2] Keet, C.M. The use of foundational ontologies in ontology development: an empirical assessment. 8th Extended Semantic Web Conference (ESWC’11), G. Antoniou et al (Eds.), Heraklion, Crete, Greece, 29 May-2 June, 2011. Springer, Lecture Notes in Computer Science LNCS 6643, 321-335.

[3] Borgo, S., Lesmo, L. The attractiveness of foundational ontologies in industry. In: Proc. of FOMI’08, Amsterdam, The Netherlands, IOS Press (2008), 1-9.

A few notes on a successful ESWC’12 and OWLED’12

Slightly later than near-realtime due to flight delays, here are a few notes on the 9th Extended Semantic Web Conference ESWC’12 and OWL: Experiences and Directions OWLED’12, which I attended about two weeks ago in Crete, Greece.

ESWC’12

ESWC’12 was as selective as previous years, with, on average, a 25% acceptance rate. The proceedings are published by Springer; where applicable, I’ve linked the freely available versions in the references below. There’s also metadata and a list of award winners.

Main background picture of the ESWC’12 conference, with Cretan hills

Keynotes

I assume that, like last year, the keynotes will be put on the video lectures website; for now, you’ll have to make do with a brief impression through my lenses.

Alon Halevy, head of structured data at Google, gave his keynote the morning after the social dinner (but the conference hall was full nevertheless). He entertains the perspective of Knowledge Representation and the Semantic Web as being “databases on steroids”. The talk’s topics were Google fusion tables with lightweight semantics that are intended as a “data management for the 99%”, and Webtables, which was about a search for data tables on the Web, with the goal of having an easy to use database system that is integrated with the web. The work on web tables was akin to a very large-scale attempt at bottom-up lightweight conceptual data model and ontology development. They crawled the Web for raw tables (14 billion), of which an estimated 154 million can pass for real relations (relations from the database viewpoint, with structured data, not using an html table for the layout of a page), which then ended up as 2.5 million schemas as recovered table/relation semantics. And then there’s Halevy’s enthusiasm about coffee.

Aleksander Kolcz from Twitter went over a few problems they are trying to solve at Twitter, such as tweet relevance, who to follow, content recommendation, language, anti-spam, and user interest modeling. As a small tidbit of data: there are 140 million users, 340 million tweets/day, and 2.3 billion search queries/day (i.e., 26K/sec.). Apparently, when one has enough, i.e., very large amounts, of data, simple models work “remarkably well” and ensembles of classifiers perform better in accuracy.

Abraham Bernstein’s keynote was about getting our act together in the semantic web research area and promoting the “garbage can theory” that was introduced by Cohen, March and Olsen in 1973: or, some ideas, theories, and tools are ‘thrown away’ into the garbage, where they can meet others, and combine so that something beautiful can come of it after all (this is my simplistic, shorthand version of it).

Unfortunately I missed the pre-conference keynote by Julius van der Laar because OWLED was still ongoing. I’ve heard it was a good/interesting one about what (sneaky) social media strategies the Obama campaign used in the previous presidential elections in 2008.

Papers

There were several tracks that ran in parallel, hence attendance was necessarily limited due to those logistic constraints. I’ve attended the ontologies, reasoning, semantic data management, digital libraries and cultural heritage, and in use sessions. The following pointers are based on my attendance of the presentations and partial reading of the papers.

Ontologies track. Yves Raimond from the BBC presented a query-driven evaluation framework for ontologies, defining their way of ‘good’ with respect to the task and data, and applied it to the music ontology (online slides), noting some room for improvements. The paper also has a neat brief overview of techniques for ontology evaluation [1]. I presented the paper co-authored with Francis Fernandez and Annette Morales on mereotopology and the OntoPartS tool that helps modellers to represent part-whole relations [2], which I introduced in an earlier post. OntoPartS was also presented at the demo session [3], which generated quite some interest among logicians and practitioners alike. Besides my ‘toy ontology’ examples to demonstrate the tool’s functionality, Martin Hepp had brought his GoodRelations ontology for e-commerce, which I thus used instead to illustrate adding part-whole relations to a real ontology. The demo session ended officially at 9pm, but it was after 10pm before I packed up my tablet.

Semantic data management track. Craig Knoblock and co-authors developed a system to link data to ontologies and preserve the linking in a so-called (logic-based) “source model” that is computed semi-automatically by taking as input the data, an ontology, some learned semantic types, and a refinement step by the user in a nice GUI [4]. This was evaluated with a set of bio-informatics resources, such as UniProt. The presentation by Lorena Etcheverry was a bit long on the intro, but the idea is nice: enhancing OLAP analysis with ‘good enough’ temporary cubes generated from web sources, the introduction of a new vocabulary, Open Cubes, for the specification and publication of multidimensional cubes on the Semantic Web (which, unfortunately, the authors still have not shared online), and an algorithm for creating the SPARQL 1.1 query for rollup [5].

In use track. Michel Dumontier demonstrated an extension to the HyQue hypothesis formulator and evaluator, with rule sets in the SPARQL Inferencing Notation (SPIN), so that users can trace their hypothesis evaluation [6]. Stefan Scheglmann presented a paper on their efforts to provide “programming access” to ontologies and an accompanying tool, OntoMDE, a model-driven engineering toolkit (which, however, does not seem to be available online, although a link was shown in the presentation, and I jotted down something on Eclipse plugins) [7]. StorySpace was put in the Digital Libraries and cultural heritage track, but could just as well have been in in-use: it is an environment for constructing and navigating stories, plots, and narratives, guided by the newly introduced curate ontology [8]. We’ll have to look at all that in more detail in the context of our IKMS development [9].

OWLED’12

The proceedings of OWLED’12 are available on CEUR-WS. Over 30 papers were submitted, so the workshop ended up being somewhat selective compared to previous years. The programme had 18 paper presentations, a keynote, and two tutorials. The following is, again, a selection of that (mainly due to my time constraints reading the papers and typing up something).

Mariano Rodriguez presented the ontopQuest system [10] for Ontology-Based Data Access, providing SPARQL query answering with OWL 2 QL/RDFS entailments. It works in the so-called “classic ABox mode” with an internal relational database and in “virtual ABox mode”, and, unlike, say, QuOnto, it embeds most of the TBox semantics into the database by availing of an (also recently developed) semantic indexing technique. (Hopefully that’ll help my ontologies & knowledge bases students to answer the OBDA questions better next time, who ought to have read at least David Toman’s slides on the principal approaches to realize OBDA before the test.) Staying with reasoning, Dmitry Tsarkov presented the idea of using metareasoning that takes into account both the features of current reasoners and modularisation to come up with the ‘best’ reasoning strategy to answer a query over only that part of the ontology that is relevant for the query [11].

An extension to the OWLGrEd tool for modeling OWL ontologies through a UML-like interface was presented: the developers have added a ‘splitter’ to enable a user to decide which axioms to close (using the OWL + Integrity Constraints), then to send the serialization to the reasoner and display the inferences [12]. Pity that it works only with the commercial RDF database Stardog by Clark & Parsia. Bijan Parsia presented, among other things, a paper on automatically generating analogy questions, which are widely used in multiple choice questions, and somehow determining their difficulty. The automated generation was facilitated by an ontology, and the initial results are promising [13]. I presented the paper on OWL requirements for indigenous knowledge management systems [9], about which I blogged earlier, as one of my co-authors, Ronell Alberts, was already presenting a paper based on her recently completed MSc thesis [14].

One of the tutorials was about modularity, which was presented by Chiara del Vescovo and Dmitry Tsarkov from Manchester University (see their modularity website for more info). The tutorial presented an overview of where modularity is useful, and how. Some of the reasons to modularise are to facilitate the explanation services, to perform incremental reasoning, semantic diff, and hotspot detection (= splitting an ontology into the simple and the complex part). That is, it presented a viewpoint on modularity as a possible solution for the issues of (and the need for) scalability and performance of automated reasoning. Modularity and modularization during modeling and to reduce the so-called cognitive overload, i.e., involving some, or even driven by, subject domain semantics, was here (and is in most other DL-oriented outlets) apparently entirely outside the scope, which is a missed opportunity (more about that another time).

Typical tourist picture of the conference hotel (the view from my room wasn’t that great, but with the busy schedule, that didn’t matter anyway)

Aside from the stimulating papers and keynotes, and ensuing conversations with fellow researchers, it was great to meet people again and meet new people, and we had a lot of fun socialising. Now back to work so as to have a shot at next year’s installment of ESWC in Montpellier, France (which is close to a village where I used to go on holidays for some 8 years, many years ago).

References

[1] Raimond, Y., Sandler, M. Evaluation of the music ontology framework. ESWC’12, Springer LNCS vol 7295, 255-269.

[2] Keet, C.M., Fernandez-Reyes, F.C., Morales-Gonzalez, A. Representing mereotopological relations in OWL ontologies with OntoPartS. In: Proceedings of the 9th Extended Semantic Web Conference (ESWC’12), 29-31 May 2012, Heraklion, Crete, Greece. Springer, LNCS 7295, 240-254.

[3] Morales-Gonzalez, A., Fernandez-Reyes, F.C., Keet, C.M. OntoPartS: a tool to select part-whole relations in OWL ontologies. 9th Extended Semantic Web Conference (ESWC’12), 29-31 May 2012, Heraklion, Crete, Greece. Demo with paper.

[4] Knoblock et al. Semi-automatically mapping structured sources into the semantic web. ESWC’12, Springer LNCS vol 7295, 375-390

[5] Etcheverry, L., Vaisman, A. A. Enhancing OLAP analysis with web cubes. ESWC’12, Springer LNCS vol 7295, 467-483.

[6] Callahan, A., Dumontier, M. Evaluating scientific hypotheses using the SPARQL inferencing notation. ESWC’12, Springer LNCS vol 7295, 647-658.

[7] Scheglmann, S., Scherp, A., Staab, S. Declarative Representation of Programming Access to Ontologies. ESWC’12, Springer LNCS vol 7295, 659-673.

[8] Mulholland, P., Wolff, A., and Collins, T. Curate and StorySpace: An ontology and Web-based environment for describing curatorial narrative. ESWC’12, Springer LNCS vol 7295, 748-762.

[9] Alberts, R., Fogwill, T., Keet, C.M. Several Required OWL Features for Indigenous Knowledge Management Systems. 7th Workshop on OWL: Experiences and Directions (OWLED 2012).  Klinov, P. and Horridge, M. (Eds.). 27-28 May, Heraklion, Crete, Greece. CEUR-WS Vol. 849.

[10] Rodriguez-Muro, M., Calvanese, D. Quest, an OWL 2 QL reasoner for ontology-based data access.  OWLED’12. CEUR-WS Vol. 849.

[11] Dmitry Tsarkov and Ignazio Palmisano, Divide et Impera: Metareasoning for Large Ontologies. OWLED’12. CEUR-WS Vol. 849.

[12] Kārlis Čerāns, Guntis Barzdins, Renārs Liepiņš, Jūlija Ovčiņnikova, Sergejs Rikačovs and Arturs Sprogis, Graphical Schema Editing for Stardog OWL/RDF Databases using OWLGrEd/S. OWLED’12. CEUR-WS Vol. 849.

[13] Tahani Alsubait, Bijan Parsia and Uli Sattler, Mining Ontologies for Analogy Questions: A Similarity-based Approach. OWLED’12. CEUR-WS Vol. 849.

[14] Ronell Alberts and Enrico Franconi, An integrated method using conceptual modelling to generate an ontology-based query mechanism. OWLED’12. CEUR-WS Vol. 849.

Lecture notes for the ontologies and knowledge bases course

The regular reader may recollect earlier posts about the ontology engineering courses I have taught at FUB, UH, UCI, Meraka, and UKZN. Each one had some sort of syllabus or series of blog posts with some introductory notes. I’ve put them together and extended them significantly now for the current installment of the Ontologies and Knowledge Bases Honours module (COMP718) at UKZN, and they are bound and printed into lecture notes for the enrolled students. These lecture notes are now online and I will add accompanying slides on the module’s webpage as we go along in the semester.

Given that the target audience is computer science students in their 4th year (honours), the notes are of an introductory nature. There are essentially three blocks: logic foundations, ontology engineering, and advanced topics. The logic foundations contain a recap of FOL, basics of Description Logics with ALC, all the DL-based OWL species, and some automated reasoning. The ontology engineering block covers top-down and bottom-up ontology development, and methods and methodologies, with top-down ontology development including mainly foundational ontologies and part-whole relations, and bottom-up the various approaches to extract knowledge from ‘legacy’ representations, such as from databases and thesauri. The advanced topics are balanced in two directions: one is toward ontology-based data access applications (i.e., an ontology-driven information system) and the other one has more theory with temporal ontologies.

Each chapter has a section with recommended/required reading and a set of exercises.

Unsurprisingly, the lecture notes have been written under time constraints and therefore the level of relative completeness of sections varies slightly. Suggestions and corrections are welcome!

The DiDOn method to develop bio-ontologies from semi-structured life science diagrams

It is well-known among (bio-)ontology developers that ontology development is a resource-consuming task (see [1] for data backing up this claim). Several approaches and tools exist that speed up the time-consuming efforts of bottom-up ontology development, most notably natural language processing and database reverse engineering. They are generic and the technologies have been proposed from a computing angle, and are therefore noisy and/or contain many heuristics to make them fit for bio-ontology development. Yet, the most obvious one from a domain expert perspective is unexplored: the abundant diagrams in the sciences that function as existing/‘legacy’ knowledge representation of the subject domain. So, how can one use them to develop domain ontologies?

The new DiDOn procedure—from Diagram to Domain Ontology—can speed up and simplify bio-ontology development by exploiting the knowledge represented in such semi-structured bio-diagrams. It does this by means of extracting explicit and implicit knowledge, preserving most of the subject domain semantics, and making formalisation decisions explicit, so that the process is done in a clear, traceable, and reproducible way.

DiDOn is a detailed, micro-level, procedure to formalise those diagrams in a logic of choice; it provides migration paths into OBO, SKOS, OWL and some arbitrary FOL, and guidelines on which axioms have to be added to the bio-ontology, and how. It also uses a foundational ontology so as to obtain more precise and interoperable subject domain semantics than otherwise would have been possible with syntactic transformations alone. (Choosing an appropriate foundational ontology is a separate topic and can be done with, e.g., ONSET.)

The paper describing the rationale and details, Transforming semi-structured life science diagrams into meaningful domain ontologies with DiDOn [2], has just been accepted at the Journal of Biomedical Informatics. They require a graphical abstract, so here it goes:

DiDOn consists of two principal steps: (1) formalising the ‘icon vocabulary’ of a bio-drawing tool, which then functions as a seed ontology, and (2) populating the seed ontology by processing the actual diagrams. The algorithm in the second step is informed by the formalisation decisions taken in the first step. Such decisions include, among others, the representation language and how to represent the diagram’s n-aries (with n≥2, such as choosing between n-aries as relationship or reified as classes).
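One of the formalisation decisions mentioned above, representing an n-ary as a reified class, can be sketched as follows. This is only an illustration of the general reification pattern, not the procedure from the paper: the function name, the `has...` naming convention for the role properties, and the example relation are all assumptions made up for this sketch.

```python
def reify(relation, participants):
    """Turn an n-ary relation over classes into a reified class.

    participants: list of (role_name, class_name) pairs, one per argument.
    Returns axioms as (reified_class, role_property, filler_class) triples,
    each read as: reified_class SubClassOf role_property some filler_class.
    """
    if len(participants) < 2:
        raise ValueError("an n-ary relation needs at least two participants")
    return [(relation, f"has{role}", cls) for role, cls in participants]


# Hypothetical ternary relation taken from an imaginary diagram legend.
axioms = reify("Catalysis", [
    ("Catalyst", "Enzyme"),
    ("Substrate", "SmallMolecule"),
    ("Product", "SmallMolecule"),
])
for axiom in axioms:
    print(axiom)
```

The alternative decision, keeping the n-ary as a relationship, would need a language that supports relations of arity higher than two, which is one reason why the choice of representation language and the treatment of n-aries have to be made together.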

In addition to the presentation of DiDOn, the paper contains a detailed application of it with Pathway Studio as case study.

The neatly formatted paper is behind a paywall for those with no or limited access to Elsevier’s journals, but the accepted manuscript is openly accessible from my home page.

References

[1] Simperl, E., Mochol, M., Bürger, T. Achieving maturity: the state of practice in ontology engineering in 2009. International Journal of Computer Science and Applications, 2010, 7(1):45-65.

[2] Keet, C.M. Transforming semi-structured life science diagrams into meaningful domain ontologies with DiDOn. Journal of Biomedical Informatics. In print. DOI: http://dx.doi.org/10.1016/j.jbi.2012.01.004

First release of the foundational ONtology SElection Tool ONSET

It is well-known that there are theoretical and practical reasons why using a foundational ontology (such as DOLCE, BFO, GFO, or SUMO) improves the quality and interoperability of the domain ontology, which recently also has been shown experimentally. However, it is also known that when one desires to use one, it is difficult to choose which one should be used, and why. Reading all the documentation, becoming familiar with the philosophical underpinnings, looking up what other ontology developers did in similar situations, and so on, is a time-consuming task. This bottleneck has now been solved with ONSET.

ONSET, the foundational ONtology SElection Tool, does the hard work for you (download jar file). You answer one or more questions, and it will compute a suggestion based on the answers and your priorities, and it explains why the particular foundational ontology was selected. As usability is important, several “explain” buttons were added, in particular in the “ontology commitments” category. To increase a user’s confidence, ONSET not only selects a foundational ontology for you, but also explains why by relating it back to the answers the user chose, and it displays any requests that were not met by the selected ontology. The rather basic main page of ONSET contains an example and links to the various versions of the three ontologies.
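To illustrate the general idea of computing a suggestion from answers and priorities, and reporting which requests were and were not met, here is a minimal sketch in Python. It is not ONSET's actual algorithm: the weighted-match scoring, the function name, the criteria, and the placeholder ontologies "FO-A" and "FO-B" with their feature values are all hypothetical.

```python
def suggest(ontologies, answers, weights):
    """Return (best_name, scores, explanation) for a weighted match.

    ontologies: name -> {criterion: value}
    answers:    criterion -> value the user asked for
    weights:    criterion -> importance (defaults to 1 when absent)
    """
    scores = {}
    report = {}
    for name, features in ontologies.items():
        met = [c for c, v in answers.items() if features.get(c) == v]
        unmet = [c for c in answers if c not in met]
        scores[name] = sum(weights.get(c, 1) for c in met)
        report[name] = {"met": met, "unmet": unmet}
    # Iterating over sorted names makes ties resolve alphabetically.
    best = max(sorted(scores), key=scores.get)
    return best, scores, report[best]


# Placeholder ontologies and feature values, invented for this sketch.
ontologies = {
    "FO-A": {"language": "OWL", "modular": True, "abstract_entities": False},
    "FO-B": {"language": "OWL", "modular": False, "abstract_entities": True},
}
answers = {"language": "OWL", "modular": True, "abstract_entities": True}
weights = {"abstract_entities": 3}  # the user's priority

best, scores, explanation = suggest(ontologies, answers, weights)
print(best, scores, explanation)
```

Returning the met and unmet criteria alongside the winner is what makes the suggestion explainable: the user can see that the selected option satisfies the high-priority request while missing a lower-priority one.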

Zubeida Khan, a recently graduated (cum laude) BSc honours student I supervised, did most of the work to realise ONSET. She went painstakingly through some 50 publications to extract the features of the ontologies, by considering the ‘selling points’ from the side of the foundational ontology developers, assessing what motivates domain ontology developers of ongoing and completed ontology development projects to choose one over the other, and examining independent characteristics (such as the language in which it is available, modularity). A list was compiled consisting of foundational ontology parameters, and the values were filled in for each ontology (in the current version, they are BFO, DOLCE, and GFO). These values were subsequently verified by the respective foundational ontology developers. Zubeida then implemented it in ONSET (download jar file), following good software design practices and taking into account extensibility of the tool.

While ONSET makes it a lot easier for a domain ontology developer to select a foundational ontology, from the Ontology (philosophy) side of things, it, perhaps, raises more questions than it answers (which deserve attention, but not in this blog post).

Feedback is welcome!

Bottom-up ontology development using bio-diagrams

Development of (bio-)ontologies takes up a lot of resources, especially when conducted manually. This is a well-known hurdle to overcome, and various strategies and tools for bottom-up ontology development have been proposed from a computing angle, such as the reverse engineering of databases and, most prominently in the bio-ontologies area, natural language processing (NLP) (e.g. [1,2] and a review by [3]). Both, however, generate a rather crude, noisy, and simple ontology that requires substantial manual intervention to clean up and to add ‘missing’ knowledge. Nevertheless, NLP provides at least a set of terms one can start with instead of starting with an empty screen and adding everything de novo. There is, however, a way to have your cake and eat it too: exploiting the plentiful diagrams in the life sciences.

Diagrams are very important in biology, and from early on in their education, students are taught to read and draw them. There is even a rule of thumb that one should be able to understand an article by reading the abstract, conclusions, and diagrams alone. Diagrams also summarise the accompanying text, or can even tell more than what is explained in the text. That much from the biology side. They can be useful from the computing angle as well. They are at least semi-structured (compared to natural language), with conventions about depicting lipid bi-layers, DNA, sequences of interactions by means of arrows, and so forth, and over the years more and more drawing applications have been developed. The nice thing (still for computing) is that those tools have an ‘alphabet’, or legend, with permissible icons and colours and how they can be used in the diagrams. There are many diagrams that represent our understanding of biological reality.

Now, imagine that those diagrams can be transferred into an ontology in one fell swoop, and subsequently used for whatever purpose ontologies are being used (such as annotation, consistency checking, and finding implicit knowledge). And because those diagrams are more structured than natural language, we can obtain a richer ontology than with NLP alone—with less effort.

How?

One thing is recognizing there’s much to be gained in improving bottom-up bio-ontology development by availing of such diagrams (already observed in [4]), another thing is how to go about doing this in the most effective way: not for just one diagram tool, but for any one. This problem I aim to tackle in the paper “Bottom-up ontology development reusing semi-structured life sciences diagrams” [5], which was recently accepted for the AFRICON’11 Special Session on Robotics and AI in Africa. This 6-page paper is a very condensed version of its 12-page draft, so not everything could be included. Nevertheless, it does give the basics of the method to formalize bio-diagrams in an ontology and a use case to demonstrate it.

The approach consists of a four-stage process: (i) choosing the appropriate language (OBO, SKOS, OWL, and arbitrary FOL are considered), (ii) inclusion of a foundational ontology (DOLCE, BFO, RO etc.), (iii) formalizing the icons of the diagram tool’s ‘legend’ (e.g., ‘enzyme’), and (iv) devising an algorithm that mines the actual diagrams to populate the TBox, so that the individual components (e.g., ‘protease’) end up in the right position in the ontology. The main details are described in the paper.
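Stages (iii) and (iv) can be pictured with a small sketch in Python, under strong simplifying assumptions: the formalised legend is reduced to a mapping from icons to seed classes, each diagram is a list of (label, icon) pairs, and "the right position" simply means a subclass of the seed class for that icon. The function name, icon names, and data are hypothetical; the algorithm in the paper handles considerably more than this.

```python
def populate(seed, diagrams):
    """Add each diagram node as a subclass of the seed class for its icon.

    seed:     icon name -> seed class (from formalising the tool's legend)
    diagrams: iterable of diagrams, each a list of (label, icon) pairs
    Returns (sorted list of (subclass, superclass) axioms, unmapped labels).
    """
    axioms = set()  # a set, so a component seen in several diagrams is added once
    unmapped = []
    for diagram in diagrams:
        for label, icon in diagram:
            if icon in seed:
                axioms.add((label, seed[icon]))
            else:
                unmapped.append(label)  # left for manual inspection
    return sorted(axioms), unmapped


# Hypothetical legend and two toy diagrams.
seed = {"enzyme_icon": "Enzyme", "small_molecule_icon": "SmallMolecule"}
diagrams = [
    [("protease", "enzyme_icon"), ("ATP", "small_molecule_icon")],
    [("protease", "enzyme_icon"), ("mystery", "unknown_icon")],
]
axioms, unmapped = populate(seed, diagrams)
print(axioms)
print(unmapped)
```

Because the seed classes would themselves sit under categories of the chosen foundational ontology, every mined component inherits that positioning without further work per diagram.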

Thus, this bottom-up method is not one of only formalising ‘legacy’ information, but also takes into account subject domain semantics that can be represented better by using a foundational ontology during the principal transformation of the diagram’s vocabulary. In addition to the more precise, formal, representation of the subject domain semantics, the use of a foundational ontology also increases interoperability.

The guidelines are demonstrated with a transformation of the Pathway Studio [6] diagrams into an OWLized (OWL 2 DL) bio-ontology with BFO and RO.

As an aside (from my perspective), it may be of interest to note that such formalized diagrams then can be deployed also as intermediate representation of the knowledge, which can facilitate understanding and communication between logicians and domain experts. And, for the financially challenged: it can bring the information modelled in such diagrams, which are often locked in expensive hardcopy textbooks and pay-per-view scientific articles, into the open access domain for free use and reuse.

References

[1] Alexopoulou D, Wachter T, Pickersgill L, Eyre C, Schroeder M. Terminologies for text-mining: an experiment in the lipoprotein metabolism domain. BMC Bioinformatics 2008;9(Suppl 4).

[2] Coulet A, Shah NH, Garten Y, Musen M, Altman RB. Using text to build semantic networks for pharmacogenomics. Journal of Biomedical Informatics 2010;43(6):1009-19.

[3] Liu K, Hogan WR, Crowley RS. Natural language processing methods and systems for biomedical ontology learning. Journal of Biomedical Informatics 2011;44(1):163-79.

[4] Keet CM. Factors affecting ontology development in ecology. In: Ludaescher B, Raschid L, editors. Data Integration in the Life Sciences 2005 (DILS2005); vol. 3615 of LNBI. Springer Verlag; 2005, p. 46-62. San Diego, USA, 20-22 July 2005.

[5] Keet CM. Bottom-up ontology development reusing semi-structured life sciences diagrams. AFRICON’11 — Special Session on Robotics and Artificial Intelligence in Africa, Livingstone, Zambia 13-15 September, 2011. IEEE (to appear).

[6] Nikitin A, Egorov S, Daraselia N, Mazo I. Pathway studio—the analysis and navigation of molecular networks. Bioinformatics 2003;19(16):2155-2157.

Outcome of the empirical assessment on the use of foundational ontologies in ontology development

In an earlier post, I described briefly an experiment I had carried out with 52 (novice) ontology developers who had developed 18 ontologies, 1/3 of whom had used a foundational ontology voluntarily, and whose ontologies were better than those of the developers who did not use a foundational ontology in domain ontology development. It being the first empirical experiment on this matter, the slightly shorter version of the tech report mentioned in that earlier blog post has been accepted as a full paper at the 8th Extended Semantic Web Conference (ESWC’11).

The informal summary with some details was already introduced in the earlier post, so I will include here only the abstract of the paper The use of foundational ontologies in ontology development: an empirical assessment [1]:

There is an assumption that ontology developers will use a top-down approach by using a foundational ontology, because it purportedly speeds up ontology development and improves quality and interoperability of the domain ontology. Informal assessment of these assumptions reveals ambiguous results that are not only open to different interpretations but also such that foundational ontology usage is not foreseen in most methodologies. Therefore, we investigated these assumptions in a controlled experiment. After a lecture about DOLCE, BFO, and part-whole relations, one-third chose to start domain ontology development with an OWLized foundational ontology. On average, those who commenced with a foundational ontology added more new classes and class axioms, and significantly less object properties than those who started from scratch. No ontology contained errors regarding part-of vs. is-a.

The comprehensive results show that the ‘cost’ incurred spending time getting acquainted with a foundational ontology compared to starting from scratch was more than made up for in size, understandability, and interoperability already within the limited time frame of the experiment.

The last word has not been said about it, though. E.g., is 1/3 few or a lot? It remains unclear why the participants preferred reusing DOLCE over BFO, and what the outcome would be if much larger ontologies, such as Cyc or SUMO, were also added to the options in a controlled experiment. Also, it may be interesting to see similar experiments with other lecturers and other types of participants, such as with non-computing domain experts with experience in modeling, or a longer time period than used for this experiment. Further, only preliminary suggestions were made on how one may want to include the use of foundational ontologies in ontology development, which should be done both at the high-level steps in the development process (none includes something about that now) and in methods for the actual modeling, where only OntoSpec makes a first attempt in that direction.

References

[1] Keet, C.M. The use of foundational ontologies in ontology development: an empirical assessment. 8th Extended Semantic Web Conference (ESWC’11). Heraklion, Crete, Greece, 29 May – 2 June 2011. Springer LNCS (in print).
