A useful abstract relational model and SQL path queries

Whilst visiting David Toman at the University of Waterloo during my sabbatical earlier this year, one of the topics we looked into was their experiments on whether SQLP—SQL with path queries, extended from [1]—would be better than plain SQL with respect to the time it takes to understand queries and the correctness in writing them. It turned out (in a user evaluation) that SQLP is faster whilst maintaining accuracy. The really interesting aspect in all this from my perspective, however, was the so-called Abstract Relational Model (ARM), or: the modelling side of things rather than making the querying easier, as the latter is made easier thanks to the ARM. In simple terms, the ARM [1] is like the relational model, but with identifiers, which makes those path queries doable and mostly more succinct, and one can partition the relations into class-relationship-like models (approaching the look-and-feel of a conceptual model) or lump stuff together into relational-model-like models, as preferred. Interestingly, it turns out that the queries remain exactly the same regardless of whether one makes the ARM look more relational-like or ontology-like, which is called “invariance under vertical partitioning” in the paper [2]. Given all these nice things, there’s now also an algorithm to go from the usual relational model to an ARM schema, so that even with legacy resources, it’s possible to bump them up to this newer technology with more features and ease of use.
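To get a feel for what such a path query buys you, here is a minimal sketch, using plain SQL over an illustrative two-table schema (the table names and the approximated SQLP syntax are my own, not the paper’s):

```python
# Plain SQL needs an explicit join to follow a reference from one relation
# to another; an ARM/SQLP-style path expression would follow the identifier
# directly. Illustrative schema and data, not from [1] or [2].
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE department (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE employee   (id INTEGER PRIMARY KEY, name TEXT,
                             dept_id INTEGER REFERENCES department(id));
    INSERT INTO department VALUES (1, 'Sales');
    INSERT INTO employee   VALUES (10, 'Alice', 1);
""")

# Plain SQL: the join must be spelled out in full.
row = conn.execute("""
    SELECT d.name
    FROM employee e JOIN department d ON e.dept_id = d.id
    WHERE e.name = 'Alice'
""").fetchone()
print(row[0])  # Sales

# The SQLP idea, approximately: because the ARM has identifiers, one could
# instead write a path expression along the lines of
#     SELECT e.dept.name FROM employee e WHERE e.name = 'Alice'
# and the join above is implied by the path e.dept.name.
```

The join boilerplate is exactly what grows with query size in plain SQL, which is where the understandability and correctness gains in the user evaluation plausibly come from.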

Our paper [2] that describes these details (invariance, RM-to-ARM, the evaluation), entitled “The Utility of the Abstract Relational Model and Attribute Paths in SQL”, is being published as part of the proceedings of the 21st International Conference on Knowledge Engineering and Knowledge Management (EKAW’18), which will be held in Nancy, France, in about two weeks.

This sort of Conceptual Model(like)-based Data Access (CoMoDA, if you will) may sound a bit like Ontology-Based Data Access (OBDA). Yes and no. Roughly, yes on the conceptual querying sort of thing (there’s still room for quite some hair splitting there, though); no regarding the ontology part. The ARM doesn’t pretend to be an ontology, but it easily has a reconstruction in a Description Logic language [3] (with n-aries! and identifiers!). SQLP is much more expressive than the unions of conjunctive queries one can pose in a typical OBDA setting, however, for it is full SQL + those path queries. So, both the theory and the technology are different from the typical OBDA setting. Now, don’t think I’m defecting on the research topics—I still have a whole chapter on OBDA in my textbook—but it’s interesting to learn about and play with alternative approaches toward solving (at a high level) the same problem of trying to make querying for information easier and faster.

 

References

[1] Borgida, A., Toman, D., Weddell, G.E. On referring expressions in information systems derived from conceptual modelling. Proc. of ER’16. Springer LNCS, vol. 9974, 183-197.

[2] Ma, W., Keet, C.M., Olford, W., Toman, D., Weddell, G. The Utility of the Abstract Relational Model and Attribute Paths in SQL. 21st International Conference on Knowledge Engineering and Knowledge Management (EKAW’18). Springer LNAI. (in print). 12-16 Nov. 2018, Nancy, France.

[3] Jacques, J.S., Toman, D., Weddell, G.E. Object-relational queries over CFDInc knowledge bases: OBDA for the SQL-Literate. Proc. of IJCAI’16. 1258-1264 (2016)


An Ontology Engineering textbook

My first textbook, “An Introduction to Ontology Engineering” (pdf), has just been released as an open textbook. I have revised, updated, and extended my earlier lecture notes on ontology engineering, amounting to about 1/3 new content compared to its predecessor. Its main aim is to provide an introductory overview of ontology engineering; its secondary aim is to provide hands-on experience in ontology development that illustrates the theory.

The contents and narrative are aimed at advanced undergraduate and postgraduate level in computing (e.g., as a semester-long course), and the book is structured accordingly. After an introductory chapter, there are three blocks:

  • Logic foundations for ontologies: languages (FOL, DLs, OWL species) and automated reasoning (principles and the basics of tableau);
  • Developing good ontologies with methods and methodologies, the top-down approach with foundational ontologies, and the bottom-up approach to extract as much useful content as possible from legacy material;
  • Advanced topics, with a selection of sub-topics: Ontology-Based Data Access, interactions between ontologies and natural languages, and advanced modelling with additional language features (fuzzy and temporal).

Each chapter has several review questions and exercises to explore one or more aspects of the theory, as well as descriptions of two assignments that require using several sub-topics at once. More information is available on the textbook’s page [also here] (including the links to the ontologies used in the exercises), or you can click here for the pdf (7MB).

Feedback is welcome, of course. Also, if you happen to use it in whole or in part for your course, I’d be grateful if you would let me know. Finally, if this textbook gets used half (or even a quarter) as much as the 2009/2010 blog posts have been visited (around 10K unique visitors since posting them), that would mean there are a lot of people learning about ontology engineering, and then I’ll have achieved more than I hoped for.

UPDATE: meanwhile, it has been added to several open (text)book repositories, such as OpenUCT and the Open Textbook Archive, and it has been featured on unglue.it in the week of 13-8 (out of its 14K free ebooks).

Automatically finding the feasible object property

Late last month I wrote about the updated taxonomy of part-whole relations and claimed it wasn’t such a big deal during the modelling process to have that many relations to choose from. Here I’ll back up that claim. Primarily, it is thanks to the ‘Foundational Ontology and Reasoner enhanced axiomatiZAtion’ (FORZA) approach, which includes the Guided ENtity reuse and class Expression geneRATOR (GENERATOR) method that was implemented in the OntoPartS-2 tool [1]. The general idea of the GENERATOR method is depicted in the figure below, which outlines two scenarios: one in which the experts author their domain ontology with the help of a foundational ontology, and one without a foundational ontology.

[Figure: the GENERATOR method, scenarios A (with a foundational ontology) and B (without)]

I think the pictures are clearer than the following text, but some prefer text, so here goes the explanation attempt. Let’s start with scenario A on the left-hand side of the figure: a modeller has a domain ontology and a foundational ontology and she wants to relate two domain classes (indicated with C and D) and thus needs to select some object property. The first step is, indeed, selecting C and D (e.g., Human and Heart in an anatomy ontology); this is step (1) in the figure.

Then (step 2) there are those long red arrows, which indicate that somehow there has to be a way to deal with the alignment of Human and of Heart to the relevant categories in the foundational ontology. This ‘somehow’ can be either of the following three options: (i) the domain ontology was already aligned to the foundational ontology, so that step (2) is executed automatically in the background and the modeller need not worry, (ii) she manually carries out the alignment (assuming she knows the foundational ontology well enough), or, more likely, (iii) she chooses to be guided by a decision diagram that is specific to the selected foundational ontology. In case of option (ii) or (iii), she can choose to save the alignment permanently or just use it for the duration of the application of the method. Step (3) is an automated process that moves up in the taxonomy to find the possible object properties. Here is where an automated reasoner comes into the equation: it can stepwise retrieve the parent class, en passant relying on taxonomic classification that offers the most up-to-date class hierarchy (i.e., including implicit subsumptions) and therewith avoiding spurious candidates. From a modeller’s viewpoint, one thus only has to select which classes to relate and, optionally, align the ontology; the software does the rest: each time it finds a domain and range axiom of a relationship in which the parents of C and D participate, that relationship is marked as a candidate property to be used in the class expression. Finally, the candidate object properties are returned to the user (step 4).
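The upward traversal of step (3) can be sketched in a few lines. This is a toy in-memory version, not the actual OntoPartS-2 implementation: the hierarchy, property names, and domain/range pairs below are illustrative, and a real implementation would query an OWL reasoner rather than a dictionary.

```python
# Toy sketch of the GENERATOR candidate search (step 3): walk up the
# parents of the selected classes C and D and collect every object
# property whose domain and range axioms cover those (ancestor) classes.
parent = {  # child -> parent; the hierarchy is illustrative
    'Human': 'PhysicalObject', 'Heart': 'PhysicalObject',
    'PhysicalObject': 'Endurant', 'Endurant': 'Thing',
}
# property -> (domain, range), standing in for the relation ontology's axioms
properties = {
    'structuralPart': ('PhysicalObject', 'PhysicalObject'),
    'constitutes':    ('AmountOfMatter', 'PhysicalObject'),
}

def ancestors(c):
    """The class itself plus all its parents up to the root."""
    while c is not None:
        yield c
        c = parent.get(c)

def candidate_properties(c, d):
    """Properties whose domain/range cover an ancestor of c resp. d."""
    cs, ds = set(ancestors(c)), set(ancestors(d))
    return [p for p, (dom, rng) in properties.items()
            if dom in cs and rng in ds]

print(candidate_properties('Human', 'Heart'))  # ['structuralPart']
```

Note how `constitutes` is filtered out: neither Human nor any of its ancestors is an AmountOfMatter, which is exactly the kind of spurious candidate the traversal avoids.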

While the figure shows only one foundational ontology, one can equally well use a separate relation ontology, like PW or PWMT, which is just an implementation variant of scenario A: the relation ontology is also traversed upwards and, on each iteration, the base ontology class is matched against the relation ontology to find relations where the (parent of the) class is defined in a domain and range axiom, iterating until the top is reached before returning the candidate relations.

The second scenario, with a domain ontology only, is a simplified version of scenario A, where the alignment step is omitted. In Figure B above, GENERATOR would return object properties W and R as options to choose from, which, when used, would not generate an inconsistency (in this part of the ontology, at least). Without this guidance, a modeller could erroneously select, say, object property S, which, if the branches are disjoint, would result in an inconsistency, and, if they are not declared disjoint, would move class C from the left-hand branch to the one in the middle, which may be an undesirable deduction.

For the Heart and Human example, these entities are, in DOLCE terminology, physical objects, so that it will return structural parthood or plain parthood, if the PW ontology is used as well. If, on the other hand, say, Vase and Clay would have been the classes selected from the domain ontology, then a constitution relation would be proposed (be this with DOLCE, PW, or, say, GFO), for Vase is a physical object and Clay an amount of matter. Or with Limpopo and South Africa, a tangential proper parthood would be proposed, because they are both geographic entities.

The approach without the reasoner and without the foundational ontology decision diagram was tested with users; the results showed that such a tool (OntoPartS) made ontology authoring more efficient and accurate [2], and that aligning to DOLCE was the main hurdle standing in the way of even more impressive differences. This is addressed in OntoPartS-2, so it ought to work better. What still remains to be done, admittedly, is a larger usability study with the updated version, OntoPartS-2. In the meantime: if you use it, please let us know your opinion.

 

References

[1] Keet, C.M., Khan, M.T., Ghidini, C. Ontology Authoring with FORZA. 22nd ACM International Conference on Information and Knowledge Management (CIKM’13). ACM proceedings, pp569-578. Oct. 27 – Nov. 1, 2013, San Francisco, USA.

[2] Keet, C.M., Fernandez-Reyes, F.C., Morales-Gonzalez, A. Representing mereotopological relations in OWL ontologies with OntoPartS. 9th Extended Semantic Web Conference (ESWC’12), Simperl et al. (eds.), 27-31 May 2012, Heraklion, Crete, Greece. Springer, LNCS 7295, 240-254.

Conference notes from EKAW 2014

Yet another successful International Conference on Knowledge Engineering and Knowledge Management (EKAW’14), in Linköping, Sweden, has just concluded. It was packed with three keynotes, long and short presentations, a posters and demos session, and related workshops and a PhD symposium. Big thanks to Patrick Lambrix for the excellent local organisation, and to Stefan Schlobach and Krzysztof Janowicz for putting an interesting programme together! The remainder of this post touches upon some highlights.

Invited talks

The first keynote was by Pascal Hitzler, who talked about ontology design patterns (ODPs) for large-scale data interchange and discovery. He emphasised the need for principled use of ODPs, including the development of a theory of patterns concerning generic vs specific modelling patterns, developing pattern languages and tools, and understanding and formalising relationships between patterns. It sort of set the tone, and ODPs were a recurring item of the conference. Oscar Corcho gave a reflective and very entertaining keynote on ontology engineering (slides on slideshare). Not to mention the language and tool wars (DL and Protégé won), and: are you an alpha (philosopher—one term a day), beta, gamma, delta, or epsilon (schema.org contributor), or a ‘savage’ in the brave little world of knowledge management? He identified five deadlocks in communicating the message to ‘the masses’ (ontology reuse, inferences, lightweight vs heavyweight, tooling, multilingualism) and four recommendations, the missing one being on what to do with multilingualism. A lively discussion followed, and references to some of the aspects raised kept returning throughout the conference and probably will afterwards as well. The third keynote was by philosopher Arianna Betti, who was basically putting forward the question of what we can give her to help her in the digital humanities with tracking scientific ideas, as described in humanities texts, over time—toward a computational history of ideas. The view from outside was, in a way, describing some requirements for us, and it generated some brainstorming afterwards, as it does not seem infeasible to do. A brief handout with some more precise ideas on where models would fit is available via her Twitter account (direct link).

Papers

Unlike in my PhD student years, when I typically tried to read at least a third of the papers before going to the conference, I’ve gotten into the habit of selecting papers to read based on the titles and presentations. I haven’t yet read the ones I’m mentioning now, but they seem worth mentioning anyway (obviously subject to my bias and interests, daily intake capacity, and the time constraints of writing this the evening before a departure in the very early morning).

Several people at UCT are looking into crowdsourcing, and there were two papers about that: one using pay-as-you-go alignments [1] and one on a Protégé plugin linked to CrowdFlower for ontology development that, despite the CrowdFlower costs, ended up being cheaper than a few manual experts [2]. Somewhat related to that is Klink-UM for extracting hierarchical and similarity relationships based on user feedback [3], and, while we’re at it with relationships, there’s a paper on finding (improving) the semantics of relations, namely DBpedia’s wikiPageWikiLink relations [4], as well as one on how object properties are used in ontologies [5]. The latter discovered that object properties are used quite differently when using ODPs vs not using them: the former more often reuses a property and constrains it in an axiom, whereas the latter uses more subtyping and domain and range axioms, and the latter appears to be computationally more efficient (so there are some interesting trade-offs to look into). Other considerations in modelling included further work on anti-patterns, with results from real knowledge base development [6]. Related to my own talk about the stuff ontology was the paper on supply chains and traceability of datasets [7], which we possibly can combine in some way. The paper on clinical guidelines [8] will be passed on to one of my students, who is trying to build one tailored to a low-resource setting with less-skilled health workers, and we probably will also follow up on the study-question generation paper [9], which used a knowledge base and template questions to generate natural language questions that the system itself can also answer, therewith automating to some extent the student’s interactive learning. The latter also won the best demo award. The best paper award went to the paper on adaptive knowledge propagation in web ontologies [10].

The other activities

A conference would not be complete without some social event(s). There was even an extra social event the first evening: ice hockey, which was fun, not only because it was the first time I watched such a game in a stadium, but also because there’s a lot of action and it never gets dull, and to top it off, the Linköping team won. Really impressive was the ‘movie’ at Norrköping’s Visualiseringscenter, being the “cosmos 3D” interactive show narrated live by the centre’s director, Prof. Anders Ynnerman. We were treated to a trip through space—navigating from the ISS to the outer boundary of the universe—that was all based on current data and scientific evidence. This was followed by a walk-and-play-around in the rest of the centre, and a tasty dinner where Patrick made a fun story out of the talking frog joke. As per usual, it was also a great opportunity to meet colleagues again, discuss, and plan follow-up research, as well as to meet new people and finally meet in person others whom I only knew by their papers. The next EKAW will be in 2016 in Bologna, Italy (statistically less cold and dark than here, though the lights have their charm).

References

(note: in time, people will have their papers on their home pages; for now, most links are to the Springer version)

[1] I.F. Cruz, F. Loprete, M. Palmonari, C. Stroe and A. Taheri. Pay-As-You-Go Multi-user Feedback Model for Ontology Matching. EKAW’14. Springer LNAI 8876, 80-96.

[2] F. Hanika, G. Wohlgenannt and M. Sabou. The uComp Protégé Plugin: Crowdsourcing Enabled Ontology Engineering. EKAW’14. Springer LNAI 8876, 181-196.

[3] F. Osborne and E. Motta. Inferring Semantic Relations by User Feedback. EKAW’14. Springer LNAI 8876, 339-355.

[4] V. Presutti, S. Consoli, A.G. Nuzzolese, D.R. Recupero, A. Gangemi, I. Bannour and H. Zargayouna. Uncovering the Semantics of Wikipedia Pagelinks. EKAW’14. Springer LNAI 8876, 413-428.

[5] K. Hammar. Ontology Design Pattern Property Specialisation Strategies. EKAW’14. Springer LNAI 8876, 165-180

[6] V.K. Chaudhri, R. Katragadda, J. Shrager and M. Wessel. Inconsistency Monitoring in a Large Scientific Knowledge Base. EKAW’14. Springer LNAI 8876, 66-79

[7] M. Solanki and C. Brewster. A Knowledge Driven Approach towards the Validation of Externally Acquired Traceability Datasets in Supply Chain Business Processes. EKAW’14. Springer LNAI 8876, 503-518.

[8] V. Zamborlini, R. Hoekstra, M. da Silveira, C. Pruski, A. ten Teije and F. van Harmelen. A Conceptual Model for Detecting Interactions among Medical Recommendations in Clinical Guidelines: A Case-Study on Multimorbidity. EKAW’14. Springer LNAI 8876, 591-606.

[9] V.K. Chaudhri, P.E. Clark, A. Overholtzer and A. Spaulding. Question Generation from a Knowledge Base. EKAW’14. Springer LNAI 8876, 54-65

[10] P. Minervini, C. d’Amato, N. Fanizzi and F. Esposito. Adaptive Knowledge Propagation in Web Ontologies. EKAW’14. Springer LNAI 8876, 304-319.

Considering some stuff—scientifically

Yay, now I can say “I look into stuff” and actually be precise about what I have been working on (and get it published, too!), rather than just oversimplifying into vagaries about some of my research topics. The final title of the paper I settled on is not as funny as proposing a ‘pointless theory’ [1], though: it’s a Core Ontology of Macroscopic Stuff [2], which has been accepted at the 19th International Conference on Knowledge Engineering and Knowledge Management (EKAW’14).

‘Stuffs’, in philosophical terms, are those things that in natural language are typically indicated with mass nouns, being those things you can’t count other than in quantities, like gold, water, whipping cream, agar, milk, and so on. The motivation to look into them was both practical and theoretical. For instance, if you are working in the food industry you have to be concerned with traceability of ingredients, so you will have to know which (bulk) ingredients originate from where. Then, if something goes wrong—say, an E. coli infection in a product for consumption—it would be doable to find the source of the microbial contamination. Most people might not realise what happens in the production process; e.g., some quantity of milk comes from a dairy farm, and in the food processing plant some components of a portion of the milk are separated into parts (whey separated from the cheese-in-the-making, fat for butter, and the remainder buttermilk). To talk about parts and portions of such stuffs requires one to know about those stuffs and how to model them, so that there can be some computerised tracking system for swift responses.

On the theoretical side, philosophers were talking about hypothetical cases of sending molecules of mixtures to Venus and the Moon, which isn’t practically usable, in particular because it glosses over some important details, like the fact that milk is an emulsion and thus has a ‘minimum portion’, involving many molecules, for it to remain an emulsion. Foundational ontologies, which I like for their modelling guidance, didn’t come to the rescue either; e.g., DOLCE has Amount of Matter for stuffs but stops there, and BFO has none of it. Domain ontologies, for food but also in other areas such as ecology and biomedicine, each have their own way of modelling stuff, be this by source, usage, or whatever, making things incompatible because several criteria are used. So, there was quite a gap, which the core ontology of macroscopic stuff aims to bridge.

This stuff ontology contains categories of stuff and is formalised in OWL. There are distinctions between pure stuff and mixtures, and differences among the mixtures; e.g., true solutions vs. colloids among the homogeneous mixtures, and solid heterogeneous mixtures vs. suspensions among the heterogeneous mixtures, each with a set of defining criteria. So, Milk is an Emulsion by its very essence, regardless of whether you want to assign it the role of a beverage (EnvO ontology) or an animal-associated habitat (MEO ontology), Blood is a Sol (a type of colloid), and (table) Sugar a StructuredPureStuff. A basic alignment of the relations involved, regarding granules, grains, and sub-stuffs (used in Cyc and BioTop, among others), is possible with the stuff ontology as well.

The ontology both refines the DOLCE and BFO foundational ontologies and it resolves the main type of interoperability issues with stuffs in domain ontologies, thereby also contributing to better ontology quality. To make the ontology usable, modelling guidelines are provided, with examples of inferences, a decision diagram, outline of a template, and illustrations solving the principal interoperability issues among domain ontologies (scroll down to the last part of the paper). The decision diagram, which also gives an informal idea of what’s in the stuff ontology, is depicted below.

Decision diagram to select the principal kind of stuff (Source: [2])

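To give a flavour of how such a decision diagram works, here is a minimal sketch as code, assuming a much-simplified version of the diagram in [2]: the questions, their order, and the Boolean encoding are illustrative, not the paper’s full set of criteria.

```python
# Illustrative, heavily simplified version of the decision diagram:
# answer a few yes/no questions in order to arrive at a kind of stuff.
def kind_of_stuff(one_kind_of_molecule, homogeneous=False,
                  particles_dissolved=False, solid=False):
    """Classify a stuff by a few of the decision diagram's questions."""
    if one_kind_of_molecule:
        return 'PureStuff'          # e.g., gold, distilled water
    if homogeneous:
        # homogeneous mixtures: true solutions vs. colloids
        return 'Solution' if particles_dissolved else 'Colloid'
    # heterogeneous mixtures: solid mixtures vs. suspensions
    return 'SolidHeterogeneousMixture' if solid else 'Suspension'

print(kind_of_stuff(True))                    # PureStuff
print(kind_of_stuff(False, homogeneous=True)) # Colloid (e.g., milk, blood)
```

The real diagram has more branches and criteria (e.g., distinguishing emulsions and sols among the colloids), but the shape is the same: a fixed sequence of questions whose answers pin down exactly one category.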

You can access the stuff ontology on its own, as well as versions linked to DOLCE and BFO. I’ll be presenting it in Sweden at EKAW in late November.

p.s.: come to think of it, maybe I should have called it smugly “a real ontology of substance”… (substance being another term used for stuff/matter)

References

[1] Borgo S., Guarino N., and Masolo C.. A Pointless Theory of Space Based On Strong Connection and Congruence, in L. Carlucci Aiello, J. Doyle (eds.), in Proceedings of the Fifth International Conference on Principles of Knowledge Representation and Reasoning (KR’96), Morgan Kaufmann, Cambridge Massachusetts (USA), 5-8 November 1996, pp. 220-229.

[2] Keet, C.M. A Core Ontology of Macroscopic Stuff. 19th International Conference on Knowledge Engineering and Knowledge Management (EKAW’14). 24-28 Nov, 2014, Linköping, Sweden. Springer LNAI. (accepted)