I’ve written here before about bits and pieces of the Data Mining OPtimization ontology (DMOP) (modeling issues, reasoner performance), but never about the whole setting. I’m happy to say that there now is something on that: the Journal of Web Semantics paper about DMOP is in press, and the in-press version is online, waiting in the queue to be assigned a volume [1]. The ontology itself (v5.4) is freely accessible and downloadable in several formats from the dmo foundry.
The paper can be considered the new so-called ‘reference paper’ of the ontology: it describes the rationale, the non-trivial design choices, content, and its use. The abstract sums it up nicely:
The Data Mining OPtimization Ontology (DMOP) has been developed to support informed decision-making at various choice points of the data mining process. The ontology can be used by data miners and deployed in ontology-driven information systems. The primary purpose for which DMOP has been developed is the automation of algorithm and model selection through semantic meta-mining that makes use of an ontology-based meta-analysis of complete data mining processes in view of extracting patterns associated with mining performance.
To this end, DMOP contains detailed descriptions of data mining tasks (e.g., learning, feature selection), data, algorithms, hypotheses such as mined models or patterns, and workflows. A development methodology was used for DMOP, including items such as competency questions and foundational ontology reuse. Several non-trivial modeling problems were encountered and due to the complexity of the data mining details, the ontology requires the use of the OWL 2 DL profile.
DMOP was successfully evaluated for semantic meta-mining and used in constructing the Intelligent Discovery Assistant, deployed at the popular data mining environment RapidMiner.
As two more teasers to lift the veil a bit, the architecture of various related components is shown in the first figure below, and how it is integrated in the RapidMiner Intelligent Discovery Assistant is shown in the second figure.
Unfortunately, the paper is behind Elsevier’s paywall, but we’re free to distribute it to individuals. (If only Elsevier’s copyright question had come some 1.5 months later, this would not have been the case: things have improved and there are addenda and whatnot, so that apparently it could have been put in an institutional repository. But, alas, better next time.) More precisely, the ‘we’ are Agnieszka Lawrynowicz (shared first author), Claudia d’Amato, Alexandros Kalousis, Phong Nguyen, Raul Palma, Robert Stevens, Melanie Hilario, and I.
If you use DMOP, experiment with it, or would like to contribute to its further development, please let us know.
References
[1] Keet, C.M., Lawrynowicz, A., d’Amato, C., Kalousis, A., Nguyen, P., Palma, R., Stevens, R., Hilario, M. The Data Mining OPtimization ontology. Web Semantics: Science, Services and Agents on the World Wide Web. in press. http://dx.doi.org/10.1016/j.websem.2015.01.001