Modelling issues and choices in the development of the Data Mining OPtimization ontology

The Data Mining OPtimization ontology (DMOP) is a sizeable ontology with about 600 classes, over 1000 subclass axioms, more than 100 object properties, 40 object sub-property axioms, and about 10 property chains, and thus uses several SROIQ/OWL 2 DL features. The ontology contains detailed knowledge about data mining tasks, algorithms, hypotheses (mined models or patterns), workflows, and data with its characteristics. Such detailed knowledge is required to meet its high-level aim: to support informed decision-making in the knowledge discovery process. While the ontology can be used as a reference by data miners, its primary purpose, or at least the main motivation for developing it, is the automation of algorithm and model selection, which relies heavily on semantic meta-mining [1]: ontology-based meta-analysis where data mining experiments are conducted, annotated, mined, and analysed, and patterns about data mining performance are extracted from them. Unlike other data mining ontologies, DMOP helps to propose not just any set of valid workflows, but optimal workflows, thanks to all this detailed knowledge about data mining. (DMOP was developed in the EU FP7 e-lico project and is used in precisely such a system that proposes (relatively) optimal workflows.)

DMOP’s development was no trivial exercise, however, and several modeling problems popped up that required OWL 2 DL features and started to push the limits of the automated reasoners, their recent performance improvements notwithstanding. A summary of the ontology and a description, discussion, and resolution of those issues, that is, the choices we made for version 5.3 of the ontology, can be found in our OWLED’13 paper Modeling issues and choices in the Data Mining OPtimization Ontology [2], which was co-authored with Agnieszka Lawrynowicz (uni of Poznan, who will present the paper at OWLED’13), Claudia d’Amato (uni of Bari), and Melanie Hilario (uni of Geneva, Axone, and e-lico coordinator).

The main issues we describe in the paper concern meta-modelling and punning, property chains, aligning DMOP to a foundational ontology, and qualities and attributes (and data properties). The meta-modelling topic arose primarily because of the ontological status of Algorithm: is it a class or an instance, and what are the consequences of modelling it either way? Generally, one would consider an algorithm to be an instance, and it can have zero or more implementations that are also instances. In addition, it takes types of inputs (data mining data sets) and produces types of outputs (data mining hypotheses), but one cannot assert any axiom other than instantiation that involves both an instance and a class, and instantiation is not applicable for an algorithm’s input and output. In the end, we settled on OWL 2’s punning feature (for details and arguments, refer to the paper).
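As a rough sketch of what the punning amounts to in a serialisation (the class, property, and individual names below are illustrative, not DMOP’s actual IRIs): the same name is declared both as a class, so that class-level axioms about its inputs and outputs can be stated, and as an individual, so that it can be related to its implementations.

    @prefix :     <http://www.example.org/dmop-sketch#> .
    @prefix owl:  <http://www.w3.org/2002/07/owl#> .
    @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

    :specifiesInputClass a owl:ObjectProperty .
    :implements          a owl:ObjectProperty .

    # The same IRI is declared both as a class and as a named individual (punning).
    :C4-5Algorithm a owl:Class ;
        rdfs:subClassOf :ClassificationModelingAlgorithm ,
            # class-level axiom about the kind of input the algorithm takes
            [ a owl:Restriction ;
              owl:onProperty :specifiesInputClass ;
              owl:allValuesFrom :LabelledDataSet ] .

    :C4-5Algorithm a owl:NamedIndividual .

    # instance-level assertion: an implementation (also an individual) of the algorithm
    :WekaJ48 a owl:NamedIndividual ;
        :implements :C4-5Algorithm .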

There is a brief section about property chains, the issues they caused, and how these were resolved. A detailed description of how this was done, as well as a generalisation of and theoretical foundation for it, can be found in my EKAW’12 paper [3] (there’s an informal introduction in an earlier blog post). There were chains that caused undesirable deductions, which are resolved in v5.3 of DMOP using the tests described in [3]. The chains themselves do not involve more than three object properties, i.e., at most two on the left-hand side of the inclusion, yet some nifty desirable inferences can now be made.
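For a flavour of the kind of axiom involved, here is an illustrative chain of that general shape (with made-up names, not necessarily one of DMOP’s actual chains): if an operator implements an algorithm and that algorithm addresses a task, then the operator addresses that task too.

    @prefix :    <http://www.example.org/dmop-sketch#> .
    @prefix owl: <http://www.w3.org/2002/07/owl#> .

    :implements a owl:ObjectProperty .
    :addresses  a owl:ObjectProperty .

    # implements o addresses -> addresses (two properties on the left-hand side)
    :addresses owl:propertyChainAxiom ( :implements :addresses ) .

    # With the assertions  :WekaJ48 :implements :C4-5Algorithm  and
    # :C4-5Algorithm :addresses :ClassificationTask1 , a reasoner will
    # derive  :WekaJ48 :addresses :ClassificationTask1 .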

Linking DMOP to a foundational ontology introduces several modelling issues besides the actual linking of DMOP classes and properties to the categories and relationships in the chosen foundational ontology. These include whether to import or to extend the foundational ontology (normally: import); whether the whole foundational ontology should be imported or only a relevant section of it (i.e., the need for module extraction); how to harmonise any expressiveness issues (e.g., the foundational ontology may be too expressive for the purpose of the domain ontology); and what to do with any differences in ‘modelling philosophy’ between the two ontologies (e.g., data properties). We ended up importing DOLCE-Lite. Linking the data mining classes to DOLCE categories was performed manually, where most of them (like algorithm, software, strategy, task, and optimization problem) were asserted as subclasses of dolce:non-physical-endurant, and their characteristics and parameters as subclasses of dolce:abstract-quality.
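For illustration, such manual alignments are plain subsumption axioms against the imported DOLCE-Lite vocabulary; a simplified fragment along those lines (with an example ontology IRI and the commonly used DOLCE-Lite namespace) could look as follows.

    @prefix :      <http://www.example.org/dmop-sketch#> .
    @prefix dolce: <http://www.loa-cnr.it/ontologies/DOLCE-Lite.owl#> .
    @prefix owl:   <http://www.w3.org/2002/07/owl#> .
    @prefix rdfs:  <http://www.w3.org/2000/01/rdf-schema#> .

    # import the foundational ontology rather than copying its axioms
    <http://www.example.org/dmop-sketch> a owl:Ontology ;
        owl:imports <http://www.loa-cnr.it/ontologies/DOLCE-Lite.owl> .

    # data mining entities as non-physical endurants
    :Algorithm           rdfs:subClassOf dolce:non-physical-endurant .
    :Task                rdfs:subClassOf dolce:non-physical-endurant .
    :OptimizationProblem rdfs:subClassOf dolce:non-physical-endurant .

    # their characteristics and parameters as abstract qualities
    :Characteristic rdfs:subClassOf dolce:abstract-quality .
    :Parameter      rdfs:subClassOf dolce:abstract-quality .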

A tricky representation issue concerns the ‘attributes’ of entities, such as that each FeatureExtractionAlgorithm has a transformation function that is either linear or non-linear. I’m skipping the arguments here in the blog post (it deserves its own one; see also the paper) and jump straight to the choices we made. Instead of using OWL’s data properties, we went for the ‘foundational ontology way’ of dealing with attributes, where an attribute is not a binary relation between a class and a data type, but an entity in its own right (subsumed by dolce:quality) that, in turn, is related to a value space, a dolce:region. That is where DOLCE stops, but we needed the data types, so we added a data property hasDataValue from dolce:region to the data type anyType. A section of the ontology is depicted graphically in the next figure.


A section of DMOP with a partial representation of DMOP’s ‘attributes’ (Source: [2]).

For instance, a ModelingAlgorithm has as quality exactly one LearningPolicy (so LearningPolicy is a subclass of dolce:quality), this LearningPolicy has as quale exactly one abstract region Eager-Lazy, and that Eager-Lazy has as data value at most one anyType data type to record the value of the learning policy of a modeling algorithm. Although this is more cumbersome than using data properties, it makes the ontology much more reusable for a broader set of application scenarios. This comprehensive approach required quite some modelling effort: more than 40 DMOP classes were made subclasses of dolce:abstract-region, Characteristic (with its 94 subclasses) and Parameter (with 42 subclasses) are subclasses of dolce:abstract-quality, and most of them are used in class expressions.
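Spelled out in a serialisation, the pattern of this example looks roughly as follows. This is a sketch only: it uses the property names as they are paraphrased above (has-quality, has-quale) and leaves the exact datatype range of hasDataValue aside, so DMOP’s actual axioms may differ in detail.

    @prefix :      <http://www.example.org/dmop-sketch#> .
    @prefix dolce: <http://www.loa-cnr.it/ontologies/DOLCE-Lite.owl#> .
    @prefix owl:   <http://www.w3.org/2002/07/owl#> .
    @prefix rdfs:  <http://www.w3.org/2000/01/rdf-schema#> .
    @prefix xsd:   <http://www.w3.org/2001/XMLSchema#> .

    # the 'attribute' is a quality, its value space an abstract region
    :LearningPolicy rdfs:subClassOf dolce:quality .
    :Eager-Lazy     rdfs:subClassOf dolce:abstract-region .

    # ModelingAlgorithm has-quality exactly 1 LearningPolicy
    :ModelingAlgorithm rdfs:subClassOf
        [ a owl:Restriction ;
          owl:onProperty dolce:has-quality ;
          owl:onClass :LearningPolicy ;
          owl:qualifiedCardinality "1"^^xsd:nonNegativeInteger ] .

    # LearningPolicy has-quale exactly 1 Eager-Lazy
    :LearningPolicy rdfs:subClassOf
        [ a owl:Restriction ;
          owl:onProperty dolce:has-quale ;
          owl:onClass :Eager-Lazy ;
          owl:qualifiedCardinality "1"^^xsd:nonNegativeInteger ] .

    # Eager-Lazy hasDataValue at most 1 data value
    :hasDataValue a owl:DatatypeProperty ;
        rdfs:domain dolce:region .

    :Eager-Lazy rdfs:subClassOf
        [ a owl:Restriction ;
          owl:onProperty :hasDataValue ;
          owl:maxCardinality "1"^^xsd:nonNegativeInteger ] .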

A few other choices are briefly mentioned in the paper.

Eventually, these and future improvements to DMOP are expected to pay off in the quality of the meta-miner, so that it will propose better, closer-to-optimal workflows.

References

[1] Hilario, M., Nguyen, P., Do, H., Woznica, A., Kalousis, A. Ontology-based meta-mining of knowledge discovery workflows. In: Meta-Learning in Computational Intelligence. Volume 358 of Studies in Computational Intelligence. Springer (2011) 273–315.

[2] Keet, C.M., Lawrynowicz, A., d’Amato, C., Hilario, M. Modeling issues and choices in the Data Mining OPtimisation Ontology. 8th Workshop on OWL: Experiences and Directions (OWLED’13), 26-27 May 2013, Montpellier, France. CEUR-WS vol xx (to appear).

[3] Keet, C.M. Detecting and Revising Flaws in OWL Object Property Expressions. Proc. of EKAW’12. Springer LNAI vol 7603, pp. 252-266.

Release of the (beta version of the) foundational ontology library ROMULUS

With the increase in ontology development and networked ontologies, both good ontology development and ontology matching for ontology linking and integration are becoming more pressing issues. Many contributions have been proposed in these areas. One of the ideas to tackle both, supposedly in one fell swoop, is the use of a foundational ontology. A foundational ontology aims to (i) serve as a building block in ontology development by providing the developer with guidance on how to model the entities in a domain, and (ii) serve as a common top level when integrating different domain ontologies, so that one can identify which entities are equivalent according to their classification in the foundational ontology. Over the years, several foundational ontologies have been developed, such as DOLCE, BFO, GFO, SUMO, and YAMATO, and they have been used in domain ontology development. The problem that has now arisen is how to link domain ontologies that are mapped to different foundational ontologies.

To be able to do this in a structured fashion, the foundational ontologies have to be matched somehow, ideally with some software support. As early as 2003 this issue was foreseen, and the idea of a “WonderWeb Foundational Ontologies Library” (WFOL) was proposed, so that, in the ideal case, different domain ontologies can commit to different but systematically related (modules of) foundational ontologies [1]. However, the WFOL remained just an idea, because it was not clear how to align those foundational ontologies and, at the time, most foundational ontologies were still under active development, OWL was yet to be standardised, and there was scant stable software infrastructure. Within the current Semantic Web setting, solving the implementation issues is within reach, yet the alignment of the foundational ontologies still has to be carried out systematically (beyond the few partial comparisons in the literature).

We’re trying to address these theoretical and practical shortcomings through the creation of the first such online library of machine-processable, aligned and merged foundational ontologies: the Repository of Ontologies for MULtiple USes, ROMULUS. This version contains alignments, mappings, and merged ontologies for DOLCE, BFO, and GFO, and some modularized versions thereof, as a start. It also has a section on logical inconsistencies, i.e., entities that were aligned manually and/or automatically and seemed to refer to the same thing (e.g., a mathematical set or a temporal region), but that turned out not to be the same (at least from a logical viewpoint) due to other ‘interfering’ axioms in the ontologies. What one should do with those is a separate issue, but at least it is now clear, down to the level of the individual entities, where the matching problems really are.
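To illustrate how such a logical inconsistency can arise in principle, here is a toy sketch with made-up names (not the actual axioms of DOLCE, BFO, or GFO): two classes that look alike are asserted to be equivalent, but a disjointness axiom elsewhere then renders them unsatisfiable.

    @prefix a:    <http://www.example.org/foundational-A#> .
    @prefix b:    <http://www.example.org/foundational-B#> .
    @prefix owl:  <http://www.w3.org/2002/07/owl#> .
    @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

    # axioms already present in, or mapped between, the two foundational ontologies
    a:Set             rdfs:subClassOf     a:AbstractEntity .
    b:Set             rdfs:subClassOf     b:DependentEntity .
    a:AbstractEntity  owl:equivalentClass b:AbstractEntity .   # an accepted mapping
    b:DependentEntity owl:disjointWith    b:AbstractEntity .   # an 'interfering' axiom in B

    # the candidate alignment of the two 'set' classes
    a:Set owl:equivalentClass b:Set .

    # A reasoner now finds a:Set and b:Set unsatisfiable: any instance would be both
    # a b:AbstractEntity (via a:Set) and a b:DependentEntity, yet these are disjoint.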

We performed a small experiment on the evaluation of the mappings (thanks to participants from DERI, Net2 funds, and Aidan Hogan), and we would like to have more feedback on the alignments and mappings. It is one thing that we, or some alignment tool, aligned two entities, another whether asserting an equivalence ends up logically consistent (hence mapped) or inconsistent, and yet another what you, especially the ontology engineers among you, think of the alignments. You can participate in the evaluation: you will get a small set of alignments at a time, and for each one you decide whether you agree, partially agree, or disagree with it, are unsure about it, or skip it if you have no clue.

Finally, ROMULUS also has a range of other features, such as ontology selection, a high-level comparison, browsing the ontologies through WebProtégé, a verbalization of the axioms, and metadata. It is the first online library of machine-processable, modularised, aligned, and merged foundational ontologies around. A poster/demo paper [2] was accepted at the Seventh International Conference on Knowledge Capture (K-CAP’13), and papers describing the details have been submitted or are in the pipeline. In the meantime, if you have comments and/or suggestions, feel free to contact Zubeida or me.

References

[1] Masolo, C., Borgo, S., Gangemi, A., Guarino, N., Oltramari, A. Ontology library. WonderWeb Deliverable D18 (ver. 1.0, 31-12-2003). (2003) http://wonderweb.semanticweb.org.

[2] Khan, Z., Keet, C.M. Toward semantic interoperability with aligned foundational ontologies in ROMULUS. Seventh International Conference on Knowledge Capture (K-CAP’13), ACM proceedings. 23-26 June 2013, Banff, Canada. (accepted as poster & demo with short paper)