Archive for the ‘Reasoning’ Category

Logical and ontological reasoning services?

The SubProS and ProChainS compatibility services for OWL ontologies to check for good and ‘safe’ OWL object property expression [5] may be considered ontological reasoning services by some, but according others, they are/ought to be plain logical reasoning services. I discussed this issue with Alessandro Artale back in 2007 when we came up with the RBox Compatibility service [1]—which, in the end, we called an ontological reasoning service—and it came up again during EKAW’12 and the Ontologies and Conceptual Modelling Workshop (OCM) in Pretoria in November. Moreover, in all three settings, the conversation was generalized to the following questions:

  1. Is there a difference between a logical and an ontological reasoning service (be that ‘onto’-logical or ‘extra’-logical)? If so,
    1. Why, and what, then, is an ontological reasoning service?
    2. Are there any that can serve at least as prototypical example of an ontological reasoning service?

There’s still no conclusive answer on either of the questions. So, I present here some data and arguments I had and that I’ve heard so far, and I invite you to have your say on the matter. I will first introduce a few notions, terms, tools, and implicit assumptions informally, then list the three positions and their arguments I am aware of.

Some aspects about standard, non-standard, and ontological reasoning services

Let me first introduce a few ideas informally. Within Description Logics and the Semantic Web, a distinction is made between so-called ‘standard’ and ‘non-standard’ reasoning services. The standard reasoning services—which most of the DL-based reasoners support—are subsumption reasoning, satisfiability, consistency of the knowledge base, instance checking, and instance retrieval (see, e.g., [2,3] for explanations). Non-standard reasoning services include, e.g., glass-box reasoning and computing the least common subsumer, they are typically designed with the aim to facilitate ontology development, and tend to have their own plugin or extension to an existing reasoner. What these standard and non-standard reasoners have in common, is that they all focus on the (subset of first order predicate logic) logical theory only.

Take, on the other hand, OntoClean [4], which assigns meta-properties (such as rigidity and unity) to classes, and then, according to some rules involving those meta-properties, computes the class taxonomy. Those meta-properties are borrowed from Ontology in philosophy and the rules do not use the standard way of computing subsumption (where every instance of the subclass is also an instance of its super class and, thus, practically, the subclass has more or features or has the same features but with more constrained values/ranges). Moreover, OntoClean helps to distinguish between alternative logical formalisations of some piece of knowledge so as to choose the one that is better with respect to the reality we want to represent; e.g., why it is better to have the class Apple that has as quality a color green, versus the option of a class GreenObject that has shape apple-shaped. This being the case, OntoClean may be considered an ontological reasoning service. My SubProS and ProChainS [5] put constraints on OWL object property expressions so as to have safe and good hierarchies of object properties and property chains, based on the same notion of class subsumption, but then applied to role inclusion axioms: the OWL object sub-property (relationship, DL role) must be more constrained than its super-property and the two reasoning services check if that holds. But some of the flawed object property expressions do not cause a logical inconsistency (merely an undesirable deduction), so one might argue that the compatibility services are ontological.

The arguments so far

The descriptions in the previous paragraph contain implicit assumptions about the logical vs ontological reasoning, which I will spell out here. They are a synthesis from mine as well as other people’s voiced opinions about it (the other people being, among others and in alphabetical order, Alessandro Artale, Arina Britz, Giovanni Casini, Enrico Franconi, Aldo Gangemi, Chiara Ghidini, Tommie Meyer, Valentina Presutti, and Michael Uschold). It goes without saying they are my renderings of the arguments, and sometimes I state the things a little more bluntly to make the point.

1. If it is not entailed by the (standard, DL/other logic) reasoning service, then it is something ontological.

Logic is not about the study of the truth, but about the relationship of the truth of one statement and that of another. Effectively, it doesn’t matter what terms you have in the theory’s vocabulary—be this simply A, B, C, etc. or an attempt to represent Apple, Banana, Citrus, etc. conformant to what those entities are in reality—as it uses truth assignments and the usual rules of inference. If you want some reasoning that helps making a distinction between a good and a bad formalisation of what you aim to represent (where both theories are consistent), then that’s not the logician’s business but instead is relegated to the domain of whatever it is that ontologists get excited about. A counter-argument raised to that was that the early logicians were, in fact, concerned with finding a way to formalize reality in the best way; hence, not only syntax and semantics of the logic language, but also the semantics/meaning of the subject domain. A practical counter-example is that both Glimm et al [6] and Welty [7] managed to ‘hack’ OntoClean into OWL and use standard DL reasoners for it to obtain de desired inferences, so, presumably, then even OntoClean cannot be considered an ontological reasoning service after all?

2. Something ‘meta’ like OntoClean can/might be considered really ontological, but SubProS and ProChainS are ‘extra-logical’ and can be embedded like the extra-logical understanding of class subsumption, so they are logical reasoning services (for it is the analogue to class subsumption but then for role inclusion axioms).

This argument has to do with the notion of ‘standard way’ versus ‘alternative approach’ to compute something and the idea of having borrowed something from Ontology recently versus from mathematics and Aristotle somewhat longer ago. (note: the notion of subsumption in computing was still discussed in the 1980s, where the debate got settled in what is now considered the established understanding of class subsumption.) We simply can apply the underlying principles for class-subclass to one for relationships (/object properties/roles). DL/OWL reasoners and the standard view assume that the role box/object property expressions are correct and merely used to compute the class taxonomy only. But why should I assume the role box is fine, even when I know this is not always the case? And why do I have to put up with a classification of some class elsewhere in the taxonomy (or be inconsistent) when the real mistake is in the role box, not the class expression? Differently, some distinction seems to have been drawn between ‘meta’ (second order?), ‘extra’ to indicate the assumptions built into the algorithms/procedures, and ‘other, regular’ like satisfiability checking that we have for all logical theories. Another argument raised was that the ‘meta’ stuff has to do with second order logics, for which there are no good (read: sound and complete) reasoners.

3. Essentially, everything is logical, and services like OntoClean, SubProS, ProChainS can be represented formally with some clearly, precisely, formally, defined inferencing rules, so then there is no ontological reasoning, but there are only logical reasoning services.

This argument made me think of the “logic is everywhere” mug I still have (a goodie from the ICCL 2005 summer school in Dresden). More seriously, though, this argument raises some old philosophical debates whether everything can indeed be formalized, and provided any logic is fine and computation doesn’t matter. Further, it conflates the distinction, if any, between plain logical entailment, the notion of undesirable deductions (e.g., that a CarChassis is-a Perdurant [some kind of a process]), and the modeling choices and preferences (recall the apple with a colour vs. green object that has an apple-shape). But maybe that conflation is fine and there is no real distinction (if so: why?).

In my paper [5] and in the two presentations of it, I had stressed that SubProS and ProChainS were ontological reasoning services, because before that, I had tried but failed to convince logicians of the Type-I position that there’s something useful to those compatibility services and that they ought to be computed (currently, they are mostly not computed by the standard reasoners). Type-II adherents were plentiful at EKAW’12 and some at the OCM workshop. I encountered the most vocal Type-III adherent (mathematician) at the OCM workshop. Then there were the indecisive ones and people who switched and/or became indecisive. At the moment of writing this, I still lean toward Type-II, but I’m open to better arguments.

References

[1] Keet, C.M., Artale, A.: Representing and reasoning over a taxonomy of part-whole relations. Applied Ontology, 2008, 3(1-2), 91–110.

[2] F. Baader, D. Calvanese, D. L. McGuinness, D. Nardi, and P. F. Patel-Schneider (Eds). The Description Logics Handbook. Cambridge University Press, 2009.

[3] Pascal Hitzler, Markus Kroetzsch, Sebastian Rudolph. Foundations of Semantic Web Technologies. Chapman & Hall/CRC, 2009,

[4] Guarino, N. and Welty, C. An Overview of OntoClean. In S. Staab, R. Studer (eds.), Handbook on Ontologies, Springer Verlag 2009, pp. 201-220.

[5] Keet, C.M. Detecting and Revising Flaws in OWL Object Property Expressions. Proc. of EKAW’12. Springer LNAI vol 7603, pp2 52-266.

[6] Birte Glimm, Sebastian Rudolph, and Johanna Volker. Integrated metamodeling and diagnosis in OWL 2. In Peter F. Patel-Schneider, Yue Pan, Pascal Hitzler, Peter Mika, Lei Zhang, Jeff Z. Pan, Ian Horrocks, and Birte Glimm, editors, Proceedings of the 9th International Semantic Web Conference, volume 6496 of LNCS, pages 257-272. Springer, November 2010.

[7] Chris Welty. OntOWLclean: cleaning OWL ontologies with OWL. In B. Bennet and C. Fellbaum, editors, Proceedings of Formal Ontologies in Information Systems (FOIS’06), pages 347-359. IOS Press, 2006.

Fixing flaws in OWL object property expressions

OWL 2 DL is a very expressive language and, thanks to ontology developers’ persistent requests, has many features for declaring complex object property expressions: object sub-properties, (inverse) functional, disjointness, equivalence, cardinality, (ir)reflexivity, (a)symmetry, transitivity, and role chaining. A downside of this is that with the more one can do, the higher is the chance that flaws in the representation are introduced; hence, an unexpected or undesired classification or inconsistency may actually be due to a mistake in the object property box, not a class axiom. While there are nifty automated reasoners and explanation tools that help with the modeling exercise, the standard reasoning services for OWL ontologies assume that the axioms in the ‘object property box’ are correct and according to the ontologist’s intention. This may not be the case. Take, for instance, the following thee examples, where either the assertion is not according to the intention of the modeller, or the consequence may be undesirable.

  • Domain and range flaws; asserting hasParent \sqsubseteq hasMother instead of hasMother \sqsubseteq hasParent in accordance with their domain and range restrictions (i.e., a subsetting mistake—a more detailed example can be found in [1]), or declaring a domain or a range to be an intersection of disjoint classes;
  • Property characteristics flaws: e.g., the family-tree.owl (when accessed on 12-3-2012) has hasGrandFather \sqsubseteq  hasAncestor and Trans(hasAncestor) so that transitivity unintentionally is passed down the property hierarchy, yet hasGrandFather is really intransitive (but that cannot be asserted in OWL);
  • Property chain issues; for instance the chain hasPart \circ  hasParticipant \sqsubseteq  hasParticipant in the pharmacogenomics ontology [2] that forces the classes in class expressions using these properties—in casu, DrugTreatment and DrugGeneInteraction—to be either processes due to the domain of the hasParticipant object property, or they will be inconsistent.

Unfortunately, reasoner output and explanation features in ontology development environments do not point to the actual modelling flaw in the object property box. This is due to that implemented justification and explanation algorithms [3, 4, 5] consider logical deductions only and that class axioms and assertions about instances take precedence over what ‘ought to be’ concerning object property axioms, so that only instances and classes can move about in the taxonomy. This makes sense from a logic viewpoint, but it is not enough from an ontology quality viewpoint, as an object property inclusion axiom—being the property hierarchies, domain and range axioms to type the property, a property’s characteristics (reflexivity etc.), and property chains—may well be wrong, and this should be found as such, and corrections proposed.

So, we have to look at what type of mistakes can be made in object property expressions, how one can get the modeller to choose the ontologically correct options in the object property box so as to achieve a better quality ontology and, in case of flaws, how to guide the modeller to the root defect from the modeller’s viewpoint, and propose corrections. That is: the need to recognise the flaw, explain it, and to suggest revisions.

To this end, two non-standard reasoning services were defined [6], which has been accepted recently at the 18th International Conference on Knowledge Engineering and Knowledge Management (EKAW’12): SubProS and ProChainS. The former is an extension to the RBox Compatibility Service for object subproperties by [1] so that it now also handles the object property characteristics in addition to the subsetting-way of asserting object sub-properties and covers the OWL 2 DL features as a minimum. For the latter, a new ontological reasoning service is defined, which checks whether the chain’s properties are compatible by assessing the domain and range axioms of the participating object properties. Both compatibility services exhaustively check all permutations and therewith pinpoint to the root cause of the problem (if any) in the object property box. In addition, if a test fails, one or more proposals are made how best to revise the identified flaw (depending on the flaw, it may include the option to ignore the warning and accept the deduction). Put differently: SubProS and ProChainS can be considered so-called ontological reasoning services, because the ontology does not necessarily contain logical errors in some of the flaws detected, and these two services thus fall in the category of tools that focus on both logic and additional ontology quality criteria, by aiming toward ontological correctness in addition to just a satisfiable logical theory. (on this topic, see also the works on anti-patterns [7] and OntoClean [8]). Hence, it is different from other works on explanation and pinpointing mistakes that concern logical consequences only [3,4,5], and SubProS and ProChainS also propose revisions for the flaws.

SubProS and ProChainS were evaluated (manually) with several ontologies, including BioTop and the DMOP, which demonstrate that the proposed ontological reasoning services indeed did isolate flaws and could propose useful corrections, which have been incorporated in the latest revisions of the ontologies.

Theoretical details, the definition of the two services, as well as detailed evaluation and explanation going through the steps can be found in the EKAW’12 paper [6], which I’ll present some time between 8 and 12 October in Galway, Ireland. The next phase is to implement an efficient algorithm and make a user-friendly GUI that assists with revising the flaws.

References

[1] Keet, C.M., Artale, A.: Representing and reasoning over a taxonomy of part-whole relations. Applied Ontology 3(1-2) (2008) 91–110

[2] Dumontier, M., Villanueva-Rosales, N.: Modeling life science knowledge with OWL 1.1. In: Fourth International Workshop OWL: Experiences and Directions 2008 (OWLED 2008 DC). (2008) Washington, DC (metro), 1-2 April 2008

[3] Horridge, M., Parsia, B., Sattler, U.: Laconic and precise justifications in OWL. In: Proceedings of the 7th International Semantic Web Conference (ISWC 2008). Volume 5318 of LNCS., Springer (2008)

[4] Parsia, B., Sirin, E., Kalyanpur, A.: Debugging OWL ontologies. In: Proceedings of the World Wide Web Conference (WWW 2005). (2005) May 10-14, 2005, Chiba, Japan.

[5] Kalyanpur, A., Parsia, B., Sirin, E., Grau, B.: Repairing unsatisfiable concepts in OWL ontologies. In: Proceedings of ESWC’06. Springer LNCS (2006)

[6] Keet, C.M. Detecting and Revising Flaws in OWL Object Property Expressions. 18th International Conference on Knowledge Engineering and Knowledge Management (EKAW’12), Oct 8-12, Galway, Ireland. Springer, LNAI, 15p. (in press)

[7] Roussey, C., Corcho, O., Vilches-Blazquez, L.: A catalogue of OWL ontology antipatterns. In: Proceedings of K-CAP’09. (2009) 205–206

[8] Guarino, N., Welty, C.: An overview of OntoClean. In Staab, S., Studer, R., eds.: Handbook on ontologies. Springer Verlag (2004) 151–159

The rough ontology language rOWL and basic rough subsumption reasoning

Following the feasibility assessments on marrying Rough Sets with Description Logic languages last year [1,2], which I blogged about before, I looked into ‘squeezing’ into OWL 2 DL the very basic aspects of rough sets. The resulting language is called, rOWL, which is described in a paper [3] accepted at SAICSIT’11—the South African CS and IT conference (which thus also gives me the opportunity to meet the SA research community in CS and IT).

DLs are not just about investigating decidable languages, but, perhaps more importantly, also about reasoning over the logical theories.  The obvious addition to the basic crisp automated reasoning services is to add the roughness component, somehow. There are various ways to do that. Crisp subsumption (and definite and possible satisfiability) of rough concepts have been defined by Jiang and co-authors [4], and there was a presentation at DL 2011 about paraconsistent rough DL [5]. I have added the notion of rough subsumption.

There are two principal cases to consider (the “\wr ” before the OWL class name denotes it is a rough class):

  • If \wr C \sqsubseteq \wr D is asserted in the ontology, what can be said about the subsumption relations among their respective approximations?
  • Given a subsumption between any of the lower and upper approximations of C and D, then can one deduce \wr C \sqsubseteq \wr D ?

Addressing this raises questions: because being rough or not depends entirely on the chosen properties for C together with the available data, should these two cases be solved only at the TBox level or necessarily include the ABox for it to make sense? And should that be under the assumption of standard instantiation and instance checking, or in the presence of a novel DL notion of rough instantiation and rough instance checking?

These questions are answered in the second part of the paper Rough Subsumption Reasoning with rOWL [3]. In an attempt to make the proofs more readable and because the presence of instances is intuitively tied to the matter, the proofs are done by counterexample, which is relatively ‘easy’ to grasp. But maybe I should have obfuscated it with another proof technique to make the results look more profound.

Last, but not least: just in case you thought there is little motivation to bother with rough ontologies: the hypothesis testing and experimentation described in [2] still holds, and a small example is added to [3].

The succinct paper abstract is as follows:

There are various recent efforts to broaden applications of ontologies with vague knowledge, motivated in particular by applications of bio(medical)-ontologies, as well as to enhance rough set information systems with a knowledge representation layer by giving more attention to the intension of a rough set. This requires not only representation of vague knowledge but, moreover, reasoning over it to make it interesting for both ontology engineering and rough set information systems. We propose a minor extension to OWL 2 DL, called rOWL, and define the novel notions of rough subsumption reasoning and classification for rough concepts and their approximations.

I’ll continue looking into the topic, and more is in the pipeline w.r.t. the logic aspects of rough ontologies (in collaboration with Arina Britz).

References

[1] C. M. Keet. On the feasibility of description logic knowledge bases with rough concepts and vague instances. Proceedings of the 23rd International Workshop on Description Logics (DL’10), CEUR-WS, pages 314-324, 2010. 4-7 May 2010, Waterloo, Canada.

[2] C. M. Keet. Ontology engineering with rough concepts and instances. P. Cimiano and H. Pinto, editors, 17th International Conference on Knowledge Engineering and Knowledge Management (EKAW’10), volume 6317 of LNCS, pages 507-517. Springer, 2010. 11-15 October 2010, Lisbon, Portugal.

[3] C.M. Keet. Rough Subsumption Reasoning with rOWL. SAICSIT Annual Research Conference 2011 (SAICSIT’11), Cape Town, South Africa, October 3-5, 2011. ACM Conference Proceedings. (accepted).

[4] Y. Jiang, J. Wang, S. Tang, and B. Xiao. Reasoning with rough description logics: An approximate concepts approach. Information Sciences, 179:600-612, 2009.

[5] H. Viana, J. Alcantara, and A.T. Martins. Paraconsistent rough description logic. Proceedings of the 24th International Workshop on Description Logics (DL’11), 2011. Barcelona, Spain, July 13-16, 2011.

Nontransitive vs. intransitive direct part-whole relations in OWL

Confusing is-a with part-of is known to be a common mistake by novice ontology developers. Each time I taught the ontology engineering course, I had included a session of 1-2 hours to explain some basic aspects of part-whole relations and, lo and behold, none of the participants made that mistake in the labs or mini-projects! One awkward thing did pop-up there and at other occasions, though, which had to do with modelling direct parthood that does not go well at the moment, to say the least, for a plethora of reasons. Inclusion of direct parthood is not without philosophical quarrels, and the more I think of it, the more I dislike the relation, but somehow the issue appears often in the context of part-whole relations in ontologies. The observed underlying modelling issue—representing intransitivity versus nontransitivity—holds for any OWL object property anyway, so I will proceed with the general case with an example about giraffes.

Preliminaries

First of all, to clarify terms in the post’s title: INtransitive means that for all x, y, z, if Rxy and Ryz then Rxz does not hold; formally \forall x, y, z (R(x,y) \land R(y,z) \rightarrow \neg R(x,z) and an option to state this in a Description Logic is to use role chaining: R \circ R \sqsubseteq \neg R NONtransitive means that we cannot say either way if the property is transitive or intransitive, i.e., in some cases is may be transitive but not in other occasions. Direct parthood is to be understood as follows: if some part x is a direct part of a y, then there is no other object z such that x is a part of z and z is a part of y; formally, \forall x,y (dpo(x, y) \equiv \neg \exists z (partof(x,z) \land partof(z,y))) . If direct parthood is in- or non-transitive is beside the point at this stage, so let us look now at what happens with it in an OWL ontology when one tries to model it one way or another.

The OWL ontology and the reasoner

Given that I used the African Wildlife Ontology as a tutorial ontology earlier and the theme appeals to people, I will use it again here. Depending on what we do with the direct parthood relation in the ontology, Giraffe is, or is not, classified automatically as a subclass of Herbivore. Herbivore is a defined class, equivalent to, in Protégé 4.1 notation, (eats only plant) or (eats only (is-part-of some plant)), and Giraffe is a subclass of both Animal and eats only (leaf or Twig). Leaves are part of a twig, twigs of a branch, and branches of a tree that in turn is a subclass of plant. The is-part-of is, correctly according to mereology, included in the ontology as being transitive. Instead of all the is-part-of and is-proper-part-of between plant parts and plants in the AfricanWildlifeOntology1.owl, we model them using direct-part. AfricanWildlifeOntology4a.owl has direct-part as sister object property to is-part-of, AfricanWildlifeOntology4b.owl has it as sub-object property of is-part-of, and neither ontology has any “characteristics” (relational properties) checked for direct-part. Before running the reasoner to classify the taxonomy, what do you think will happen with our Giraffe in both cases?

In AfricanWildlifeOntology4a.owl, Giraffe is still a mere direct subclass of Animal, whereas with AfricanWildlifeOntology4b.owl, we do obtain the (desired) deduction that Giraffe is a Herbivore. That is, we obtain different results depending on where we put the uncharacterized direct-part object property in the RBox. Why is this so?

By not clicking the checkbox “transitive”, an object property is non­-transitive, but not in-transitive. In fact, we cannot represent explicitly that an object property is intransitive in OWL (see OWL guide and related documents). If we put the object property at the top level (or, as in Protégé 4.1, as immediate subproperty of topObjectProperty), then we obtain the behaviour as if the property were intransitive (and therefore Giraffe is not classified as a subclass of Herbivore). However, the direct-part property is really nontransitive in the ontology. When direct-part is put as subproperty of is-part-of, then it inherits the transitivity characteristic from is-part-of and therefore Giraffe is classified as a Herbivore (because now leaf and Twig are part of plant thanks to the transitivity).

Obviously, it holds for any OWL/OWL2 object property that one cannot assert intransitivity explicitly, that an object property’s characteristics are inherited to its subproperties, and this kind of behaviour of nontransitive object properties depends on where you place it in the RBox—whether you like it or not.

How to go forward?

Direct parthood is called isComponentOf in the componency ontology design pattern and is a subproperty of isPartOf. Its inverse is called haspart_directly in the W3C best practices document on Simple Part-Whole relations [1], and is a subproperty of the transitive haspart. The componency.owl notes that isComponentOf is “hasPart relation without transitivity”, the ODP page’s “intent” of the pattern is that it is intended to “represent (non-transitively) that objects either are proper parts of other objects, or have proper parts”, and the W3C best Practices note that, unlike mereological parthood, it is “not transitive”. Hence, if you include either one in your OWL ontology, you will not obtain the intended behaviour. Therefore, I do not recommend using either suggestion.

Setting aside the W3C’s best practices motivation for inclusion of haspart_directly—easier querying for immediate parts, but for the ontology purist this ought not to be the motivation for its inclusion—it is worth digging a little deeper into the semantics of the direct parthood. Maybe a modeller actually wants to represent collections with their members, like each Fleet has as direct parts more than one Ship, or constitution of objects, like clay is directly part of some vase? In both cases, however, we deal with meronymic part-whole relations, not mereological ones (see [2] and references therein); hence, they should not be subsumed by the mereological part-of relation anyway. They can be modelled as sister properties of the part-of relation and have the intended nontransitive behaviour as in, e.g., the pwrelations.owl ontology with a taxonomy of part-whole relations (that can be imported into the wildlife ontology).

Alternatively, there is always the option to choose a sufficiently expressive non-OWL language to represent the direct parthood and the rest of the subject domain and use one of the many first/second order theorem provers.

References

[1] Alan Rector and Chris Welty. Simple Part-Whole relations in OWL ontologies. W3C Editor’s draft, 11 August 2005.

[2] C. Maria Keet and Alessandro Artale. Representing and Reasoning over a Taxonomy of Part-Whole Relations. Applied Ontology, 2008, 3(1-2): 91-110.

Automating approximations in DL-Lite ontologies

As the avid keet blog reader or attendee to one of my ontology engineering courses may remember, I politely aired my frustration when one has an OWL 2 DL ontology that needs to be ‘slimmed’ to a DL-Lite (roughly: OWL 2 QL) one to make it useable for Ontology-Based Data Access (OBDA)—already since the experiment with the ontology/OBDA for disabilities [1]. This is a difficult and time-consuming exercise to do manually, especially when one has to go back and forward between the slimmed and expressive version of the ontology. Back in 2008, the difficulties were due both to a flaky Protégé 4.0-alpha and a mere syntactic approximation. Finally, things have improved and a preliminary semantic approximation is available [2] (and recently presented at AIMSA’10), which was developed by my colleagues at the KRDB Research centre.

Well, ok, only some aspects of the sound and complete approximations are addressed (more precisely: chains of existential role restrictions) and for DL-LiteA only, but they have been implemented already. The implementations are available in three forms: a Java API, a command line application suitable for batch approximations, and as a plug-in for Protégé 4.0. Note though, that the approximation algorithm is exponential, so with a large ontology it might take some time to simplify the expressive ontology. I did not test this myself yet, however, so if you have any comments or suggestions, please contact the authors of [2] directly. More is in the pipeline, and I am looking forward to more of such results—sure, this is with some self-interest: it will ease not only transparent, coordinated ontology management and development of ontology-driven information systems, but also facilitate implementation scenarios for rough ontologies [3].

References

[1] Keet, C.M., Alberts, R., Gerber, A., Chimamiwa, G. Enhancing web portals with Ontology-Based Data Access: the case study of South Africa’s Accessibility Portal for people with disabilities. Fifth International Workshop OWL: Experiences and Directions (OWLED’08). 26-27 Oct. 2008, Karlsruhe, Germany.

[2] Elena Botoeva, Diego Calvanese, and Mariano Rodriguez-Muro. Expressive Approximations in DL-Lite Ontologies. Proc. of the 14th Int. Conf. on Artificial Intelligence: Methodology, Systems, Applications (AIMSA’10). Sept 8-10, 2010, Varna, Bulgaria.

[3] Keet, C.M. Ontology engineering with rough concepts and instances17th International Conference on Knowledge Engineering and Knowledge Management (EKAW’10). 11-15 October 2010, Lisbon, Portugal. Springer LNAI 6317, 507-517.

Rough ontologies from an ontology engineering perspective

Somewhere buried in the blogpost about the DL’10 workshop, I mentioned the topic of my paper [1] at the 23rd International Description Logics Workshop (DL’10), which concerned the feasibility of rough DL knowledge bases. That paper was focussed on the theoretical assessment (result: there are serious theoretical hurdles for rough DL KBs) and had a rather short section where experimental results were crammed into the odd page (result: one can squeeze at least something out of the extant languages and tools, but more should be possible in the near future). More recently, my paper [2] submitted to the 17th International Conference on Knowledge Engineering and Knowledge Management (EKAW’10) got accepted, which focuses on the ontology engineering side of rough ontologies and therefore has a lot more information on how one can squeeze something out of the extant languages and tools; if that is not enough, there is also supplementary material that people can play with.

Ideally, they ought to go together in on paper to get a good overview at once, but there are page limits for conference papers and anyhow the last word has not been said about rough ontologies. For what it is worth, I have put the two together in the slides for the weekly KRDB Lunch Seminar that I will present tomorrow at, well, lunch hour in the seminar room on the first floor of the POS building.

References

[1] Keet, C. M. On the feasibility of Description Logic knowledge bases with rough concepts and vague instances. Proc. of DL’10, 4-7 May 2010, Waterloo, Canada. pp314-324.

[2] Keet, C. M. Ontology engineering with rough concepts and instances17th International Conference on Knowledge Engineering and Knowledge Management (EKAW’10). 11-15 October 2010, Lisbon, Portugal.  Springer LNCS.

From the Description Logics Workshop 2010, Waterloo

The 23rd International Workshop on Description Logics was held from 4-7 May at the University of Waterloo, in Canada. The full proceedings are online as one large pdf and as individual files for each paper, which contain the papers of the 29 oral presentations (including mine) and 14 posters. Unsurprisingly, the following brief report contains only a selection of the very latest research outcomes in the DL arena that passed the revue in the past 3 days.

Keynotes

Ian Horrocks’ keynote was about his quest to search for the “holy grail” and the lessons learned along the way. That is, he started his research with the problems of the GRAIL language and the too slow classification of the GALEN terminology. With much persistence and desire to solve the problems, eventually his FaCT reasoner managed to get the classification of GALEN core down from 24 hours to 400 seconds. The next steps were to extend the language and introduce optimizations to improve the performance (whereby careful study of typical inputs were crucial for successful optimization)—in an ongoing virtuous spiral. Moving on in the time line, the Semantic Web is, according to Horrocks, alike a “grand challenge” and “killer app” for DLs. Closing the presentation, OWL 2 DL finally contains all the features that GRAIL has (in particular role chaining), but the reasoners were still unable to classify GALEN (until Kazakov’s recent approach with consequence-driven reasoning that reduced it to < 10 seconds). So, while most papers that Horrocks wrote are not particularly written for (nor particularly readable according to) bio- and biomedical ontologists, they might find it nice to know that the base motivation comes from trying to solve the problems they brought in.

The keynote by Phokion Kolaitis was purely database-oriented and focused on schema mappings in the context of database integration (comprising the data federation and translation approaches) and schema evolution, which concerned a line of research originally motivated by the experiences obtained with the CLIO project. During the talk, the emphasis was on the composition and inverse operators and for the former the consequences of chaining different kinds of mappings (e.g., GAV + GAV, GAV + GLAV).
Unfortunately, I missed the keynote by Roberto Sebastiani due to the fuzzy notion of “nearby within walking distance” between the accommodation and the conference venue on the rather large and spacious campus.

Papers

The papers were grouped into sessions about theory, extensions, ontology, reasoning, EL, systems, querying, DL-Lite, OWL, and modules.

Extensions included, among others, complexity of temporal description logics in relation to temporal conceptual modelling and tractable reasoning (i.e., temporal extensions to the DL-Lite family that are the basis for the OWL 2 QL profile) [1], presented by Alessandro Artale. Other extensions, such as fuzzy, rough, and probabilistic, passed the revue in other sessions. For instance, using a probabilistic DL (that is, the option to represent defaults) for repairing TBoxes that was presented by Thomas Scharrenbach [2], approximate least common subsumer [3] by Anni-Yasmin Turham, and my paper in the ontologies section. My paper was about the feasibility of DL knowledge bases with rough concept or vague instances [4]—yes, or and not and, because there are both theoretical and practical limitations to have rough DL knowledge bases in their full glory even when we take into account only the basic aspects of rough sets. The upside is that several research lines on DL languages & tools on the interaction between ontologies and data (and the interest shown by reasoner developers, such as Volker Haarslev of RacerPro, in the experimentation) as well as other avenues, such as semantic scientific workflows, will be very useful to improve the situation so that the combination of ontologies and data can be used better for hypothesis testing to advance science at a faster pace.

Mariano Rodriguez presented a new case study of Ontology-Based Data Access in industry [5], which considers additional features of the system, such as dealing with incompleteness of the data and integrity constraints, and addressing performance issues by assessing the query structure better. Performance optimization was also a motivation for the query answering for expressive DLs by creating “islands” in the ABox [6] presented by Ralf Moeller, and for developing a scalable reasoner for OWL 2 EL and RL using Java and database technologies (MySQL), called OREL [7], presented by Sebastian Rudolph.

Two papers dealt with the topic of (ultimately) helping the modeller to figure out in the case when there is an inconsistency, why this is so. One paper dealt with the complexity of pinpointing (which is not great, as many a modeller who used Protégé 4.0-alpha) in the tractable DL-Lite [8], which was presented by Rafael Peñaloza, and the other one (presented by Matthew Horridge) was about masking the “irrelevant” parts of the justification so as to keep the explanation as short as possible [9]. Another requested feature is dealing with updates of the ontology, for which several strategies are possible, and one such approach for DL-lite ontologies [10] was presented by Dmitriy Zheleznyakov. Also modularization and extraction of sections of an ontology is a well-known request, and an empirical study was presented jointly by Chiara del Vescovo and Thomas Schneider discussing how well the algorithms work: full automated modularization does not look good from a practical perspective, and computing only some modules will be more feasible [11]. This is still fine, I think, because, generally, full modularization is not what the modelers are after anyway, but they only would want to have one or a few subsections extracted from the larger ontology. (In addition, one could use granularity to modularise a large ontology aside from letting one be guided solely by the syntactical features of the ontology.)

That’s it for this year’s DL workshop. DL’11 will be held in Barcelona (colocated with IJCAI’11).

References

[1] Alessandro Artale, Roman Kontchakov, Vladislav Ryzhikov and Michael Zakharyaschev. Temporal Conceptual Modelling with DL-Lite. Proc. of DL’10, 4-7 May 2010, Waterloo, Canada. pp9-19.
[2] Thomas Scharrenbach, Rolf Grütter, Bettina Waldvogel and Abraham Bernstein. Structure preserving TBox repair using defaults. Proc. of DL’10, 4-7 May 2010, Waterloo, Canada. pp384-395.
[3] Anni-Yasmin Turhan and Rafael Penaloza. Role-depth Bounded Least Common Subsumers by Completion for EL- and prob-EL-TBoxes. Proc. of DL’10, 4-7 May 2010, Waterloo, Canada. pp255-266.
[4] C. Maria Keet. On the feasibility of Description Logic knowledge bases with rough concepts and vague instances. Proc. of DL’10, 4-7 May 2010, Waterloo, Canada. pp314-324.
[5] Domenico Fabio Savo, Domenico Lembo, Maurizio Lenzerini, Antonella Poggi, Mariano Rodriguez-Muro, Vittorio Romagnoli, Marco Ruzzi and Gabriele Stella. Mastro at Work: Experiences on Ontology-Based Data Access. Proc. of DL’10, 4-7 May 2010, Waterloo, Canada. pp20-31.
[6] Sebastian Wandelt and Ralf Moeller. Distributed Island-based Query Answering for Expressive Ontologies. Proc. of DL’10, 4-7 May 2010, Waterloo, Canada. pp185-196.
[7] Markus Krotzsch, Anees Mehdi and Sebastian Rudolph. Orel: Database-Driven Reasoning for OWL 2 Profiles. Proc. of DL’10, 4-7 May 2010, Waterloo, Canada. pp114-124.
[8] Rafael Peñaloza and Baris Sertkaya. Complexity of Axiom Pinpointing in the DL-Lite Family. Proc. of DL’10, 4-7 May 2010, Waterloo, Canada. pp173-184.
[9] Matthew Horridge, Bijan Parsia and Ulrike Sattler. Justification Masking in OWL. Proc. of DL’10, 4-7 May 2010, Waterloo, Canada. pp32-42.
[10] Dmitriy Zheleznyakov, Diego Calvanese, Evgeny Kharlamov and Werner Nutt. Updating TBoxes in DL-Lite. Proc. of DL’10, 4-7 May 2010, Waterloo, Canada. pp102-113.
[11] Chiara Del Vescovo, Bijan Parsia, Ulrike Sattler and Thomas Schneider. The modular structure of an ontology: an empirical study. Proc. of DL’10, 4-7 May 2010, Waterloo, Canada. pp232-243.

Object-Role Modeling and Description Logics for conceptual modelling

Object-Role Modeling (ORM) is a so-called “true” conceptual modelling language in the sense that it is independent of the application scenario and it has been mapped into both UML class diagrams and ER [1]. That is, ORM and its successor ORM2 can be used in the conceptual analysis stage for database development, application software development, requirements engineering, business rules, and other areas [1-5]. If we can reason over such ORM conceptual data models, then we can guarantee the model (i.e., first order logic theory) is satisfiable and consistent, so that the corresponding application based on it definitely does behave correctly with respect to its specification (I summarised a more comprehensive argumentation and examples earlier). And, well, from a push-side: it widens the scope of possible scenarios where to use automated reasoners.

Various strategies and technologies are being developed to reason over conceptual data models to meet the same or slightly different requirements and aims. An important first distinction is between the assumption that modellers should be allowed to keep total freedom to model what they deem necessary to represent and subsequently put constraints on which parts can be used for reasoning or accept slow performance versus the assumption that it is better to constrain the language a priori to a subset of first order logic so as to achieve better performance and a guarantee that the reasoner terminates. The former approach is taken by Queralt and Teniente [6] using a dependency graph of the constraints in a UML Class Diagram + OCL and by first order logic (FOL) theorem provers. The latter approach is taken by [7-15] who experiment with different techniques. For instance, Smaragdakis et al and Kaneiwa et al [7-8] use special purpose reasoners for ORM and UML Class Diagrams, Cabot et al and Cadoli et al [9-10] encode a subset of UML class diagrams as a Constraint Satisfaction Problem, and [11-16] use a Description Logic (DL) framework for UML Class Diagrams, ER, EER, and ORM.

Perhaps not surprisingly, I also took the DL approach on this topic, on which I started working in 2006. I had put the principal version of the correspondence between ORM and the DL language DLRifd online on arXiv in February 2007 and got the discussion of the fundamental transformation problems published at DL’07 [15]. Admittedly, that technical report won’t ever win the beauty prize for its layout or concern for readability. In the meantime, I have corrected the typos, improved on the readability, proved correctness of encoding, and updated the related research with the recent works. On the latter, it also contains a discussion of a later, similar, attempt by others and the many errors in it. On the bright side: addressing those errors helps explaining the languages and trade-offs better (there are advantages to using a DL language to represent an ORM diagram, but also disadvantages). This new version (0702089v2), entitled “Mapping the Object-Role Modeling language ORM2 into Description Logic language DLRifd” [17] is now also online at arXiv.

As appetizer, here’s the abstract:

In recent years, several efforts have been made to enhance conceptual data modelling with automated reasoning to improve the model’s quality and derive implicit information. One approach to achieve this in implementations, is to constrain the language. Advances in Description Logics can help choosing the right language to have greatest expressiveness yet to remain within the decidable fragment of first order logic to realise a workable implementation with good performance using DL reasoners. The best fit DL language appears to be the ExpTime-complete DLRifd. To illustrate trade-offs and highlight features of the modelling languages, we present a precise transformation of the mappable features of the very expressive (undecidable) ORM/ORM2 conceptual data modelling languages to exactly DLRifd. Although not all ORM2 features can be mapped, this is an interesting fragment because it has been shown that DLRifd can also encode UML Class Diagrams and EER, and therefore can foster interoperation between conceptual data models and research into ontological aspects of the modelling languages.

And well, for those of you who might be disappointed that not all ORM features can be mapped: computers have their limitations and people have a limited amount of time and patience. To achieve ‘scalability’ of reasoning over initially large theories represented in a very expressive language, modularisation of the conceptual models and ontologies is one of the lines of research. But it is a separate topic and not quite close to implementation just yet.

References

[1] Halpin, T.: Information Modeling and Relational Databases. San Francisco: Morgan Kaufmann Publishers (2001)

[2] Balsters, H., Carver, A., Halpin, T., Morgan, T.: Modeling dynamic rules in ORM. In: OTM Workshops 2006. Proc. of ORM’06. Volume 4278 of LNCS., Springer (2006) 1201-1210

[3] Evans, K.: Requirements engineering with ORM. In: OTM Workshops 2005. Proc. of ORM’05. Volume 3762 of LNCS., Springer (2005) 646-655

[4] Halpin, T., Morgan, T.: Information modeling and relational databases. 2nd edn. Morgan Kaufmann (2008)

[5] Pepels, B., Plasmeijer, R.: Generating applications from object role models. In: OTM Workshops 2005. Proc. of ORM’05. Volume 3762 of LNCS., Springer (2005) 656-665

[6] Queralt, A., Teniente, E.: Decidable reasoning in UML schemas with constraints. In: Proc. of CAiSE’08. Volume 5074 of LNCS., Springer (2008) 281-295

[7] Smaragdakis, Y., Csallner, C., Subramanian, R.: Scalable automatic test data generation from modeling diagrams. In: Proc. of ASE’07. (2007) 4-13

[8] Kaneiwa, K., Satoh, K.: Consistency checking algorithms for restricted UML class diagrams. In: Proc. of FoIKS ’06, Springer Verlag (2006)

[9] Cabot, J., Clariso, R., Riera, D.: Verification of UML/OCL class diagrams using constraint programming. In: Proc. of MoDeVVA 2008. (2008)

[10] Cadoli, M., Calvanese, D., De Giacomo, G., Mancini, T.: Finite model reasoning on UML class diagrams via constraint programming. In: Proc. of AI*IA 2007. Volume 4733 of LNAI., Springer (2007) 36-47

[11] Calvanese, D., De Giacomo, G., Lenzerini, M.: On the decidability of query containment under constraints. In: Proc. of PODS’98. (1998) 149-158

[12] Artale, A., Calvanese, D., Kontchakov, R., Ryzhikov, V., Zakharyaschev, M.: Reasoning over extended ER models. In: Proc. of ER’07. Volume 4801 of LNCS., Springer (2007) 277-292

[13] Jarrar, M.: Towards automated reasoning on ORM schemes–mapping ORM into the DLRidf Description Logic. In: ER’07. Volume 4801 of LNCS. (2007) 181-197

[14] Franconi, E., Ng, G.: The ICOM tool for intelligent conceptual modelling. In: Proc. of KRDB’00. (2000) Berlin, Germany, 2000.

[15] Keet, C.M.: Prospects for and issues with mapping the Object-Role Modeling language into DLRifd. In: Proc. of DL’07. Volume 250 of CEUR-WS. (2007) 331-338

[16] Berardi, D., Calvanese, D., De Giacomo, G.: Reasoning on UML class diagrams. Artificial Intelligence 168(1-2) (2005) 70-118

[17] Keet, C.M. Mapping the Object-Role Modeling language ORM2 into Description Logic language DLRifd. KRDB Research Centre, Free University of Bozen-Bolzano, Italy. 22 April 2009. arXiv:cs.LO/0702089v2.

Working towards WONDER Data

Duncan mentioned in a comment on his recent SciFoo invitation his “Google and the Semantic, Satanic, Romantic Web” post where he describes and summarises an encounter between the pro-Semantic Web Tim Berners-Lee and the ‘anti’-Semantic Web (or should I say realistic?) Peter Norvig, Director of Research at Google. I quote a relevant section here, with some changes in emphases:

Norvig: People are stupid: […] this is the world, imperfect and messy and we just have to deal with it. These same people can’t be expected to use the Resource Description Framework (RDF) and the Web Ontology Language (OWL), which are much more complicated and considerably less fool-proof. (Perhaps you could call this the dumb-antic web?!)

Berners-Lee: replied that a large part of the semantic web can be populated by taking existing relational databases and mapping them into RDF/OWL. The structured data is already there, it just needs web-izing in a mashup-friendly format. (What I like to call the romantic web: people will publish their data freely on the web this way, especially in e-science for example. This will allow sharing and re-use in unexpected ways.)

While Duncan looks at the openness of data, here I want to put the focus on the part in bold face: that you can reuse relational databases just like that and map them into RDF/OWL. Positively described, that is a romantic assumption; negatively, described, it is rather naïve and more painful to realise than it sounds. Well, if the database developers would have remained truthful to what they had learned during college, then it might have worked out to some extent at least. So let us for a moment ignore the issues of data duplication, violations of integrity constraints, hacks, outdated imports from other databases to fill a boutique database, outdated conceptual data models (if there was one), and what have you. Then it is still not trivial.

To do this ‘easy mapping’, one has to start over with the data analysis and add some new requirements analysis while one is at it. First, some data in the database (DB)—mathematically instances—actually are thought of by the users as concepts/universals/classes. For instance, the GO terms in the GO database are assumed to be representing universals and used to annotate instances in other tables of some database, and let us not go into the ontological status of species (as instances in the database of the NCBI taxonomy). Second, each tuple is assumed to denote an instance and, by virtue of key definitions, to be unique in that table, but such a tuple has values in each cell of the participating columns; however, those values are not objects that the OWL ABox is assuming to be dealing with (this is known under the term impedance mismatch). So, when we have divided the data in the DB into instances-but-actually-concepts-that-should-become-OWL-classes and real-instances-that-should-become-OWL-instances, we need to convert the real instances of the DB to objects in the ontology, where some function has to be used to convert (combinations of) values into objects proper.

For one experiment we are working on here at FUB, we have the HGT-DB with about 1.7 million genes of about 500 bacteria, and all sorts of data about each one of them (tables with 15-20 columns, some with instance data, some with type-level information like the function of the gene product). Try to load this data into Protégé’s ABox. Obviously, we do not; more about that further below.

What, you may ask, about reusing the physical DB schema and, if present, the conceptual data model (in ER, EER, UML, ORM, …)? A good question that more people have asked, i.e., lots of research has been done in that area, primarily under the banner of reverse engineering and extracting ‘ontologies’ from such schemas where it was noted that extra ‘ontology enrichment’ steps were necessary (see e.g. Lina Lubyte’s work). A fundamental problem with such reverse engineering is that, assuming there was a fully normalised conceptual data model, oftentimes denormalization steps have been carried out to flatten the database structure and improve performance, which, if simply reverse engineered, ends up in the ‘ontology’ as a class with umpteen attributes (one for each column). This is not nice and the automated reasoning one can do with it is minimal, if at all. Put differently: if we stick to a flat, subject domain semantics-poor, structure, then why bother with the automated reasoning machinery of the Semantic Web?

To mitigate this, one can redo the normalization steps to try to get some structure back into the conceptual view of the data or perhaps add a section of another ontology to brighten up the ‘ontology’ into an ontology (or, if you are lucky and there was a conceptual data model, to use that one). We did that for the HGT-DB’s conceptual model, manually; an early diagram and its import into Protégé are included in the appendix of [1].

In any case, having more structure in the ‘ontology’ than in the DB, one ends up defining multiple views in the DB, i.e., external ABox, where a part of a table has the instances of an OWL class. (How to do this in the OWL-ABox, I do not know—we have databases that are too large to squeeze into the knowledge base). In turn, this requires a mechanism to link persistently an OWL class to a SQL or SPARQL query over the DB. (One can argue if this DB should be the legacy relational database or an RDF-ized version of it; I ignore that debate for now.)

After doing all that, one has contributed the proverbial ‘2 cents’ that has cost you ‘blood, sweat and tears’ (maybe the latter is just Dutch idiom) to populating the Semantic Web.

But what can one really do with it?

The least one can do is to make querying the database easier so that users do not have to learn yet another query language. Earlier technologies in that direction were called query-by-diagram and conceptual queries, and a newer term for the same idea is called Ontology-Based Data Access (OBDA) that uses Semantic Web Technologies. Then one can add reasoner-facilitated query completion to guarantee that the user asks something that the system can answer (e.g. [2]). Having the reasoner anyway, one might as well use it for more sophisticated queries that are not easily, or not at all, possible with traditional database systems. One of them is using terms in the query for which there is no data in the database, of which several examples were described in a case study [3] (and summarised). For the HGT-DB, these are queries involving the species taxonomy and gene product functions.

Another useful addition with respect to the ‘legacy’ (well, currently operational) HGT-DB is that our domain experts, upon having seen the conceptual view of the database, came up with all sorts of other sample queries they were thinking of but where the knowledge was not yet explicitly represented in the ontology even though one can retrieve the data from the database. For instance, adjacent or nearby genes that are horizontally transferred, or clusters of such genes that are permitted to have a gap between them consisting of non-coding DNA or of a non-horizontally transferred gene. Put differently, one can do a sophisticated analysis of one’s data and unlock new information from the database by using the ontology-based approach. In our enthusiasm, we have called the experiment Web ONtology mediateD Extraction of Relational data (WONDER) for Web-based ObdA with the Hgt-db (WOAH!). We have the tools such as QuOnto for scalable reasoning and the OBDA Plugin for Protégé for management of the mappings between an OWL class, the SQL query over the database, and the transformation function (skolemization) from values to objects. The last step to make it all Web-usable—from a technical point of view, that is—is the Web-based ontology browser and graphical query builder. This interface is well down in the pipeline with a first working version sent out for review by our domain experts. One of them thought that it looked a bit simplistic; so perhaps we achieved more than we bargained for where the AI & engineering behind it did its work well—from a user-perspective, at least.

More automation of all those steps to get it working, however, will be a welcome addition from the engineering side. Until then, Norvig’s down to earth comment is closer to reality than Berners-Lee’s vision.

[1] R. Alberts, D. Calvanese, G. De Giacomo, A. Gerber, M. Horridge, A. Kaplunova, C. M. Keet, D. Lembo, M. Lenzerini, M. Milicic, R. Moeller, M. Rodríguez-Muro, R. Rosati, U. Sattler, B. Suntisrivaraporn, G. Stefanoni, A.-Y. Turhan, S. Wandelt, M. Wessel. Analysis of Test Results on Usage Scenarios. Deliverable TONES-D27 v1.0, Oct. 10 2008.

[2] Paolo Dongilli, Enrico Franconi (2006). An Intelligent Query Interface with Natural Language Support. FLAIRS Conference 2006: 658-663.

[3] Keet, C.M., Alberts, R., Gerber, A., Chimamiwa, G. Enhancing web portals with Ontology-Based Data Access: the case study of South Africa’s Accessibility Portal for people with disabilities. Fifth International Workshop OWL: Experiences and Directions (OWLED’08 ). 26-27 Oct. 2008, Karlsruhe, Germany.

Brief review of the Handbook of Knowledge Representation

The new Handbook of Knowledge Representation edited by Frank van Harmelen, Vladimir Lifschitz and Bruce Porter [1] is an important addition to the body of reference and survey literature. The 25 chapters cover the main areas in Knowledge Representation (KR), ranging from basic KR, such as SAT solvers, Description Logics, Constraint Programming, and Belief Revision, to specific core domains of knowledge, such as Spatial and Temporal KR & R, and Nonmonotonic Reasoning, to shorter ‘application’ chapters that touch upon the Semantic Web, Question Answering, Cognitive Robotics, and Automated Planning.

Each chapter roughly follows the approach of charting the motivation and problems the research area attempts to solve, the major developments in the area over the past 25 years, important achievements in the research, and where there is still work to do. In a way, each chapter is a structured ‘annotated bibliography’—many chapters have about 150-250 references each—that serve as an introduction and a high-level overview. This is useful, for instance, if your specific interests are not covered in a university course but have a thesis student and you would want him to work on that topic, then the appropriate chapter will be informative for the student not only to get an idea about it but also to have an entry point as to which further principal background literature to read; or you are a researcher writing a paper and do not want to put a Wikipedia URL in the references (yes, I’ve seen papers where authors had done that) but a proper reference; or you are, say, well-versed in DL-based reasoners, but come across a paper where one based on constraint programming is proposed and you want to have a quick reference to check what CP is about without ploughing through the handbook on constraint programming. Comparatively with the other topics, anyone interested in ‘something about time’ will be satisfied with the four chapters on temporal KR & R, situation calculus, event calculus, and temporal action logics. Clearly, the chapters in the handbook on KR are not substitutes for the corresponding “handbook on [topic-x]” books, but they do provide a good introduction and overview.

Some chapters are denser in providing a detailed overview than others (e.g., qualitative spatial reasoning vs. CP, respectively), however, and yet other chapters provide a predominantly text-based overview whereas others do include formalisms with precise definitions, other axioms, and theorems (Qualitative Modelling, Physical Reasoning, and Knowledge Engineering vs. most others, respectively). That most chapters do include some logic comes as no surprise for the KR researcher but may be for the novice or a searching ontology engineer. For the latter group, and logic-sceptics in general, there is a juicy section in chapter 1, “General Methods in Knowledge Representation and Reasoning”, called “Suitability of Logic for Knowledge Representation” that takes on the principal anti-logicist arguments and the about 6-page long rebuttal of each complaint. Another section that can be good for heated debates is Guus Schreiber’s (too) brief comment on the difference between “Ontologies and Data Models” (chapter 25), which easily can fill a few pages instead of the now less than half a page used for arguing there is a distinction between the two.

Although I warmly recommend the handbook as addition to the library, there are also a few shortcomings. One may have to do with the space limitations (even though the book is already over 1000 pages), whereas the other one might be due to the characteristics of research in KR & R itself (to some extent at least). They overlap with the kind of shortcomings Erik Sandewall has mentioned in his review of the handbook. Several topics that are grouped under KR are not, or very minimally, dealt with in the book (e.g., uncertainty and ontologies, respectively) or in a fragmented, isolated, way across chapters what perhaps should have been consolidated into a separate chapter (i.e., abstraction, but also ontologies). In addition, within the chapters, it may well occur that some subtopics are perceived to be missing from the overview or mentioned too briefly in passing (e.g., mereology and DL-Lite for scalable reasoning), but this also depends on one’s background. On the other hand, the chapters on Qualitative Modelling and Physical Reasoning could have been merged into one chapter.

The other point concerns the lack of elaboration on real life success stories as significant contribution of that topic that a KR novice or a specialised researcher venturing in another sub-topic may be looking for. However, the handbook charts the research progress in the respective fields, not the knowledge transfer from KR research output to the engineering areas where the theory is put to the test and implementations are tried out. It is a philosophical debate if doing science in KR should include testing one’s theories. To give an idea about this discrepancy, part III of the handbook is called “Knowledge Representation in Applications” (emphasis added), which contains a chapter, among five others, on “The Semantic Web: Webizing Knowledge Representation”. From a user perspective, including software engineers and the most active domain expert adopters (in biology and medicine), the Semantic Web is still largely a vision, but not yet a success story of applications—people experiment with implementations, but the fact that there are people willing to give it a try does not imply it is a success from their point of view. Put differently, it says more about the point of view of KR&R that it is already categorised under applications. True, as the editors note, one needs to build upon advances achieved in the base areas surveyed in parts I and II, but is it really ‘merely’ ‘applying’, or does the required linking of the different KR topics in these application areas bring about new questions and perhaps even solutions to the base KR topics? The six chapters in part III differ in the answer to this question—as in any healthy research field: there are many accomplishments, but much remains to be done.

[1] Frank van Harmelen, Vladimir Lifschitz and Bruce Porter (Eds.). Handbook of Knowledge Representation. Elsevier, 2008, 1034p. ISBN-13: 978-0-444-52211-5 / ISBN-10: 0-444-52211-5.

Follow

Get every new post delivered to your Inbox.

Join 26 other followers

%d bloggers like this: