First tractable encoding of ORM conceptual data models

For (relatively) many years I’ve been focusing on as-expressive-as-possible languages to represent information and knowledge, including the computationally impractical full first order logic, because one would/should want to be as precise as possible and required to represent the subject domain in an ontology and universe of discourse for the application in a conceptual data model. After all, one can always throw out the computationally unpleasant constructs later during the implementation stage, if the ontology or conceptual data model is intended for use at runtime, such as OBDA [1], test data generate for verification [2], and in the query compilation stage in RDBMSs [3]. The resulting slimmed theories/models may be different for different applications, but then at least the set of slimmed theories/models share their common understanding.

So, now I ventured in that area, not because there’s some logic x and conceptual modeling language y has to be forced into it, but it actually appears that many fancy construct/features are not used in publicly available conceptual data models anyway (see data set and xls with some analysis). The timing of the outcome of the analysis of the data set coincided with David Toman’s visit to UCT as part of his sabbatical and Pablo Fillottrani’s visit, who enjoyed the last exchange of our bi-lateral project on the unification of conceptual data modelling languages (project page). To sum up the issue we were looking at: the need for run-time usage of conceptual data models requires a tractable logic-based reconstruction of the conceptual models (i.e., in at most PTIME), which appeared to hardly exist or miss constructs important for conceptual models (regardless whether that was ORM, EER or UML Class Diagrams), or both.

The solution ended up to be a logic-based reconstruction for most of ORM2 using the \mathcal{CFDI}_{nc}^{\forall -} Description Logic, which also happens to be the first tractable encoding of (most of) ORM/ORM2. With this logic, several features important for conceptual models (i.e., occur relatively often) do have their proper encoding in the logic, notably n-aries, complex identification constraints, and n-ary role subsumption. The, admittedly quite tedious, mapping

Low resolution and small version of our DL15 poster summarising the contributions.

Low resolution and small version of our DL15 poster summarising the contributions.

captures over 96% of the constructs used in practice in the set of 33 ORM diagrams we analysed (see data set). Further, the results are easily transferable to EER and UML Class diagrams, with an even greater coverage. The results (and comparison with related works) are presented in our recently accepted paper at the 28th International Workshop on Description Logics (DL’15) that will take place form 7 to 11 June in Athens, Greece.

The list of accepted papers of DL’15 is available, listing 21 papers with long presentations, 16 papers with short presentation, and 26 papers with poster presentations. David will present our results in the poster session, as it’s probably of more relevance in the conceptual modelling community (and I’ll be marking exams then), and some other accepted papers cover more new ground, such as casting schema.org as a description logic, temporal query answering in EL, exact learning of ontologies, and more. The proceedings is will be online on CEUR-WS in the upcoming days as volume 1350. I’ve added a mini version of our poster on the right. I tried tikzposter, as they look really cool, but it doesn’t support figures (other than those made in latex), so I resorted to ppt (that doesn’t support math), wondering why these issues haven’t been solved by now.

Anyway, more about this topic is in the pipeline that I soon hope to be able to give updates on.

 

References

[1] Calvanese, D., Keet, C.M., Nutt, W., Rodriguez-Muro, M., Stefanoni, G. Web-based Graphical Querying of Databases through an Ontology: the WONDER System. ACM Symposium on Applied Computing (ACM SAC’10), March 22-26 2010, Sierre, Switzerland. pp 1389-1396.

[2] Toman, D., Weddell, G.E.: Fundamentals of Physical Design and Query Compilation. Synthesis Lectures on Data Management, Morgan & Claypool  Publishers (2011)

[3] Smaragdakis, Y., Csallner, C., Subramanian, R.: Scalable satisfiability checking and test data generation from modeling diagrams. Automation in Software Engineering 16, 73–99 (2009)

[4] Fillottrani, P.R., Keet, C.M., Toman, D. Polynomial encoding of ORM conceptual models in \mathcal{CFDI}_{nc}^{\forall -} . 28th International Workshop on Description Logics (DL’15). CEUR-WS vol xx., 7-10 June 2015, Athens, Greece.

Advertisement

Formalization of the unifying metamodel of UML, EER, and ORM

Last year Pablo Fillottrani and I introduced an ontology-driven unifying metamodel of the static, structural, entities of UML Class Diagrams (v2.4.1), ER, EER, ORM, and ORM2 in [1,2], which was informally motivated and described here. This now also includes the constraints and we have formalised it in First Order Predicate Logic to put some precision to the UML Class Diagram fragments and their associated textual constraints, which is described in the technical report of the metamodel formalization [3]. Besides having such precision for the sake of it, it is also useful for automated checking of inter-model assertions and computing model transformations, which we illustrated in our RuleML’14 paper earlier this year [4] (related blog post).

The ‘bummer’ of the formalization is that it probably requires an undecidable language, due to having formulae with five variables, counting quantifiers, and ternary predicates (see section 2.11 of the tech report for details). To facilitate various possible uses nevertheless, we therefore also made a slightly simpler OWL version of it (the modelling decisions are described in Section 3 of the technical report). Having that OWL version, it was easy to also generate a verbalisation of the OWL version of the metamodel (thanks to SWAT NL Tools) so as to facilitate reading of the ontology by the casually interested reader and the very interested one who doesn’t like logic.

Although our DST/MINCyT-funded South Africa-Argentina scientific collaboration project (entitled Ontology-driven unification of conceptual data modelling languages) is officially in its last few months by now, more results are in the pipeline, which I hope to report on shortly.

References

[1] Keet, C.M., Fillottrani, P.R. Toward an ontology-driven unifying metamodel for UML Class Diagrams, EER, and ORM2. 32nd International Conference on Conceptual Modeling (ER’13). 11-13 November, 2013, Hong Kong. Springer LNCS vol 8217, 313-326.

[2] Keet, C.M., Fillottrani, P.R. Structural entities of an ontology-driven unifying metamodel for UML, EER, and ORM2. 3rd International Conference on Model & Data Engineering (MEDI’13). September 25-27, 2013, Amantea, Calabria, Italy. Springer LNCS (in print).

[3] Fillottrani, P.R., Keet, C.M. KF metamodel formalization. Technical report, Arxiv.org http://arxiv.org/abs/1412.6545. Dec 19, 2014, 26p.

[4] Fillottrani, P.R., Keet, C.M. Conceptual Model Interoperability: a Metamodel-driven Approach. 8th International Web Rule Symposium (RuleML’14), A. Bikakis et al. (Eds.). Springer LNCS 8620, 52-66. August 18-20, 2014, Prague, Czech Republic.

Towards a metamodel for conceptual data modeling languages

There are several conceptual data modelling languages one can use to develop a conceptual data model that should capture the subject domain of the application area in an implementation-independent way. Complex software development may need to leverage the strengths of each language yet have the need for interoperability between the software components; e.g., an application layer object-oriented software design in a UML Class diagram that needs to be able to talk to the EER diagram for a relational database. Or one is at a state where there are already several conceptual data models for different applications, but they need to be integrated (or at least made compatible). For various reasons, each of these models may well be represented in a different language, such as in UML, EER, ORM, MADS, Telos etc. Superficially, these languages all seem quite similar, even though they are known to be distinct in ‘a few’ features, such as that UML Class Diagrams typically have methods, but EER does not.

To adequately deal with such scenarios, we need not a comparison of language features, but a unification to foster interoperability. However, no unifying framework exists that respects all of their language features. In addition, one may wonder about questions such as: where are the real commonalities ontologically, what is fundamentally different, and what is the same in underlying idea or meaning but only looks different on the surface? We—Pablo Fillottrani (with the Universidad Nacional Del Sur in Argentina) and I—aim to fill this gap.

As a first step, we designed a common, ontology-driven, metamodel[1] of the static, structural, components of ER, EER, UML v2.4.1, ORM, and ORM2, in such a way that each language is strictly a fragment of the encompassing metamodel. In the meantime, we also have developed the metamodel for the constraints, but for now, the results of the metamodel for the static, structural, components have been accepted at the 32nd International Conference on Conceptual Modeling (ER’13) [1] and 3rd International Conference on Model & Data Engineering (MEDI’13) [2]. There is no repetition among the papers; it merely has been split up into two papers because of page limitations of conference proceedings and the amount of results we had.

The ER’13 paper [1] presents an overview on the core entities and constraints, and an analysis on roles and relationships, their interaction with predicates, and attributes and value types (which we refine with the notion of dimensional attribute). The MEDI’13 paper [2] focuses more on all the structural components, and covers a discussion on classes/concepts, subsumption, aggregation, and nested entity types.

(Warning: spoiler alert…) Perhaps surprisingly, the intersection of all the features in the selected languages is rather small: role, relationship (including subsumption), and object type. The attributions—attributes, value types—are represented differently, but they aim to represent the same underlying idea of attributive properties, and several implicit aspects, such as dimensional attribute and its reusability and relationship versus predicate, have been made explicit. Regarding constraints, only disjointness, completeness, mandatory, object cardinality, and the subset constraint appear in the three language families. The two overview figures in the paper have the classes colour-coded to give an easier overview on how many of the elements are shared across languages, and the appendix contains a table/list of terminology across UML, EER and ORM2, like that UML’s “association” and EER’s “relationship” denote the same kind of thing.

This also received attention in the UKZN e-news letter here, which combined the announcement of the ER’13 paper with my participation in the Dagstuhl seminar on Reasoning over Conceptual Schemas last May, the DST/NRF funded South Africa – Argentina bi-lateral project on the unification of conceptual modelling languages, and Pablo’s visit to UKZN the previous two weeks.

References

[1] Keet, C.M., Fillottrani, P.R. Toward an ontology-driven unifying metamodel for UML Class Diagrams, EER, and ORM2. 32nd International Conference on Conceptual Modeling (ER’13). 11-13 November, 2013, Hong Kong. Springer LNCS (in print).

[2] Keet, C.M., Fillottrani, P.R. Structural entities of an ontology-driven unifying metamodel for UML, EER, and ORM2. 3rd International Conference on Model & Data Engineering (MEDI’13). September 25-27, 2013, Amantea, Calabria, Italy. Springer LNCS (in print).


[1] the term “metamodel” is used here in the common usage of the term in the literature, but, more precisely, we have a conceptual model that represents the entities and the constraints on their usage in a conceptual data model; e.g., it states that each relationship (association) is composed of at least two roles (association ends), that a nested entity type objectifies one relationship, that a multi-valued attribute is an attribute, and so on.

Book chapter on conceptual data modelling for biological data

My invited book chapter, entitled “Ontology-driven formal conceptual data modeling for biological data analysis” [1], recently got accepted for publication in the Biological Knowledge Discovery Handbook: Preprocessing, Mining and Postprocessing of Biological Data, edited by Mourad Elloumi and Albert Y. Zomaya, and is scheduled for printing by Wiley early 2012.

All this started off with my BSc(Hons) in IT & Computing thesis back in 2003 and my first paper about the trials and tribulations of conceptual data modelling for bio-databases [2] (which is definitely not well-written, but has some valid points and has been cited a bit). In the meantime, much progress has been made on the topic, and I’ve learned, researched, and published a few things about it, too. So, what is the chapter about?

The main aspect is the ‘conceptual data modelling’ with EER, ORM, and UML Class Diagrams, i.e., concerning implementation-independent representations of the data to be managed for a specific application (hence, not ontologies for application-independence).

The adjective ‘formal’ is to point out that the conceptual modeling is not just about drawing boxes, roundtangles, and lines with some adornments, but there is a formal, logic-based, foundation. This is achieved with the formally defined CMcom conceptual data modeling language, which has the greatest common denominator between ORM, EER, and UML Class Diagrams. CMcom has, on the one hand, a mapping the Description Logic language DLRifd and, on the other hand, mappings to the icons in the diagrammatic languages. The nice aspect of this it that, at least in theory and to some extent in practice as well, one can subject it to automated reasoning to check consistency of the classes, of the whole conceptual data model, and derive implicit constraints (an example) or use it in ontology-based data access (an example and some slides on ‘COMODA’ [COnceptual MOdel-based Data Access], tailored to ORM and the horizontal gene transfer database as example).

Then there is the ‘ontology-driven’ component: Ontology and ontologies can aid in conceptual data modeling by providing solution to recurring modeling problems, an ontology can be used to generate several conceptual data models, and one can integrate (a section of) an ontology into a conceptual data model that is subsequently converted into data in database tables.

Last, but not least, it focuses on ‘biological data analysis’. A non-(biologist or bioinformatician) might be inclined to say that should not matter, but it does. Biological information is not as trivial as the typical database design toy examples like “Student is enrolled in Course”, but one has to dig deeper and figure out how to represent, e.g., catalysis, pathway information, the ecological niche. Moreover, it requires an answer to ‘which language features are ‘essential’ for the conceptual data modeling language?’ and if it isn’t included yet, how do we get it in? Some of such important features are n-aries (n>2) and the temporal dimension. The paper includes a proposal for more precisely representing catalysis, informed by ontology (mainly thanks to making the distinction between the role and its bearer), and shows how certain temporal information can be captured, which is illustrated by enhancing the model for SARS viral infection, among other examples.

The paper is not online yet, but I did put together some slides for the presentation at MAIS’11 reported on earlier, which might serve as a sneak preview of the 25-page book chapter, or you can contact me for the CRC.

References

[1] Keet, C.M. Ontology-driven formal conceptual data modeling for biological data analysis. In: Biological Knowledge Discovery Handbook: Preprocessing, Mining and Postprocessing of Biological Data. Mourad Elloumi and Albert Y. Zomaya (Eds.). Wiley (in print).

[2] Keet, C.M. Biological data and conceptual modelling methods. Journal of Conceptual Modeling, Issue 29, October 2003. http://www.inconcept.com/jcm.

Identification and keys in UML Class Diagrams

ER, its extended version EER, ORM, and it extended version ORM2 have several options for identification with keys/reference schemes. UML Class Diagrams, on the other hand, have internal, system-generated, identifiers, with a little-known and underspecified option for user-defined identifiers inspired by ER’s keys, whose description is buried in the standard on pp290-293 [1]. Although UML’s ease of system-generated identifiers relieves the burden of detailed conceptual analysis by the modeller, it is exactly making implicit subject domain semantics explicit that is crucial in the analysis stage; or: less analysis during the modelling stage stores up more problems down the road in terms of software bugs and interoperability. A uniform, or at least a structured and unambiguous approach, can reduce or even avoid inconsistencies in a conceptual data model, especially concerning taxonomies, achieve interoperability through less resource-consuming information integration, and if the identification mechanisms are the same across conceptual data modeling language, then unification among them comes a step closer. The present lack of harmonisation in handling identification hampers all this.

But which identification mechanism(s) could, or should, be specified more precisely for UML, and how does it rhyme with progress about identity in Ontology? How to incorporate it in UML to foster consistent usage? For instance, should the procedure to find and represent identity, or at least good keys, be a step in the modelling methodology, also be enforced in the CASE tool, or be part of the metamodel and/or in the conceptual modelling language itself?

The distinct implicit assumptions and explicit formalisations of extant identification schemes are elucidated in [2] for ER, EER, ORM, and ORM2. As it appears, not even ER/EER and ORM/ORM2 agree fully on keys and reference schemes, neither from a formal perspective (though the difference is minimal), nor from a methodological or CASE tool perspective. Both, however, do aim at strong or weak identification of entities with so-called natural keys (/semantic identifiers) in particular.

At the other end of the spectrum, Ontology does have something on offer, but the philosophers are only interested in identity of entities. Moreover, it is not just that they don’t agree on the details, but there is a tendency to admit it is nigh on inapplicable (see [3] for a good introduction). Watered-down versions of the notion of identity in philosophy that have been proposed in AI, are to used either only the necessary or only the sufficient conditions, which are at least somewhat applicable, be it as a basis for OntoClean [4], or in Description Logic languages (including OWL 2) with their primitive and defined classes. The details of the definitions, explanations, and pros and cons are presented in a ‘digested’ format in the first part of [2].

This analysis was subsequently used to increase the ontological foundations of UML in the second part of [2] to introduce two language enhancements for UML, being formally defined simple and compound identifiers and the notion of defined class, which also have a corresponding extension of UML’s metamodel. The proposed extensions focus on practical usability in conceptual data modelling, informed by ontology, and are approximations to the qualitative, relative, and synchronic identity, and to the notion of equivalent class (defined concept) in OWL (DL).

For those of you who do not care about the ‘unnecessary philosophizing’ (as one reviewer had put it) and justifications, there is a short (4-page) version [5] with the formal definitions and the UML extensions, which has been accepted at the SAICSIT’11 conference in Cape Town, South Africa. The longer version that provides explanation as to why the proposed extensions are the way they are is available as a SoCS technical report [2].

References

[1] Object Management Group. Ontology definition metamodel v1.0. Technical Report formal/2009-05-01, Object Management Group, 2009.

[2] Keet, C.M. Enhancing identification mechanisms in UML class diagrams with meaningful keys (extended version). Technical Report SoCS11-1, School of Computer Science, University of KwaZulu-Natal, South Africa. August 8, 2011.

[3] H. Noonan. Identity. In E. N. Zalta, editor, The Stanford Encyclopedia of Philosophy. Fall 2008 edition, 2008.

[4] N. Guarino and C. Welty. Identity, unity, and individuality: towards a formal toolkit for ontological analysis. In W. Horn, editor, Proc. of ECAI’00. IOS Press, Amsterdam, 2000.

[5] Keet, C.M. Enhancing Identification Mechanisms in UML Class Diagrams with Meaningful Keys. SAICSIT Annual Research Conference 2011 (SAICSIT’11), Cape Town, October 3-5, 2011. ACM Conference Proceedings (in print).

Every American is a NamedPizza

Or: verbalizing OWL ontologies still doesn’t really work well.

Ever since we got the multi-lingual verbalization of ORM conceptual data models (restricted FOL theories) working in late 2005 [1]—well: the implementation worked in the DOGMA tool, but the understandability of the output depended on the natural language—I have been following on and off the progress on solutions to the problem. It would be really nice if it all had worked by now, because it is a way for non-logician domain experts to validate the knowledge represented in the ontology and verbalization has been shown to be very useful for domain experts (mainly enterprise) validating (business) knowledge represented in the ORM conceptual data modeling language. (Check out the NORMA tool for the latest fancy implementation, well ahead of OWL verbalization in English Controlled Natural Language).

Some of my students worked on it as an elective ‘mini-project’ topic of the ontology engineering courses I have taught [SWT at FUB, UH, UCI, UKZN]. They have tried to implement it for OWL into Italian and Spanish natural language using a template-based approach with some additional mini-grammar-engine to improve the output, or in English as a competitor to the Manchester syntax. All of them invariable run, to a greater or lesser extent, into the problems discussed in [1], especially when it comes to non-English languages, as English is grammatically challenged. Now, I do not intend to offend people who have English as first language, but English does not have features like gendered articles (just ‘the’ instead of ‘el’ and ‘la’, in Spanish), declensions (still ‘the’ instead of ‘der’ ‘des’, ‘dem’, ‘den’ depending on the proposition, in German), conjunction depending on the nouns (just ‘and’ instead of ‘na’, ‘ne’, ‘no’ that is glued onto the second noun depending on the first letter of that noun, in isiZulu), or subclauses where the verb tense changes by virtue of being in a subclause (in Italian). To sort out such basic matters to generate an understandable pseudo-natural language sentence, a considerable amount of grammar rules and a dictionary have to be added to a template-based approach to make it work.

But let us limit ourselves to English for the moment. Then it is still not trivial. There is a paper comparing the different OWL verbalizers [2], such as Rabbit (ROO) and ACE, which considers issues like how to map, e.g., an AllValuesFrom to “Each…”, “Every…” etc. This is an orthogonal issue to the multi-lingual aspects, and I don’t know how that affects the user’s understanding of the sentences.

I had another look at ACE, as ACE also has a web-interface that accepts OWL/XML files (i.e., OWL 2). I tried it out with the Pizza tutorial ontology, and it generated many intelligible sentences. However, there were also phrases like (i) “Everything that is hasTopping by a Mushroom is something that is a MozzarellaTopping or that is a MushroomTopping or that is a TomatoTopping.”, the (ii) “Every American is a NamedPizza” mentioned in the title of this post, and then there are things like  (iii) “Every DomainConcept that is America or that is England or that is France or that is Germany or that is Italy is a Country”. Example (iii) is not a problem of the verbalizer, but merely an instance of GIGO and the ontology should be corrected.

Examples (i) and (ii) exhibit other problems, though. Regarding (ii), I have noticed that when (novice) ontologists use an ontology development tool, it is a not uncommon practice to not name the entity fully, probably because it is easy for a human reader to fill in the rest from the context; in casu, American is not an adjective to people, but relates to pizza. A more precise name could have avoided such issues (AmericanPizza), or a new solution to ‘context’ can be devised. The weird “is hasTopping by” is due, I think, to the lexicalization of OWL’s ObjectPropertyRange in ACE, which takes the object property, assumes that to be in the infinitive and then puts it in the past participle form (see the Web-ACE page, section 4). So, if the Pizza Ontology developers had chosen not hasTopping but, say, the verb ‘top’, ACE would have changed it into ‘is topped by’. In idea the rule makes sense, but it can be thwarted by the names used in the ontology.

Fliedl and co-authors [3] are trying to resolve just such issues. They propose a rigid naming convention to make it easier to verbalize the ontology. I do not think it is a good proposal, because it is ‘blaming’ the ontologists for failing natural language generation (NLG) systems, and syntactic sugar (verbalization) should not be the guiding principle when adding knowledge to the ontology. Besides, it is not that difficult to add another rule or two to cater for variations, which is probably what will be needed in the near future anyway once ontology reuse and partial imports become more commonplace in ontology engineering.

Power and Third [4] readily admit that verbalizing OWL is “dubious in theory”, but they provide data that it may be “feasible in practice”. The basis of their conclusion lies in the data analysis of about 200 ontologies, which show that the ‘problematic’ cases seldom arise. For instance, OWL’s SubClassOf takes two class expressions, but in praxis it is only used in the format of SubClassOf(C CE) or SubClassOf(C C), idem regarding EquivalentClasses—I think that is probably due to Protégé’s interface—which makes the verbalization easier. They did not actually build a verbalizer, though, but the tables on page 1011 can be of use what to focus on first; e.g., out of the 633,791 axioms, there were only 12 SubDataPropertyOf assertions, whereas SubClassOf(Class,Class) appeared 297,293 times (46.9% of the total) and SubClassOf(Class,ObjectSomeValuesFrom(ObjectProperty,Class)) 158,519 times (25.0%). Why this distribution is the way it is, is another topic.

Going back to the multi-lingual dimension, there is a general problem with OWL ontologies, which is, from a theoretical perspective, addressed more elegantly with OBO ontologies. In OBO, each class has an identifier and the name is just a label. So one could, in principle, amend this by adding labels for each natural language; e.g., have a class “PIZZA:12345” in the ontology with associated labels “tomato @en”, “pomodoro @it”, “utamatisi @zulu” and so forth, and when verbalizing it in one of those languages, the system picks the right label, compared to the present cumbersome and error-prone way of developing and maintaining an OWL file for each language. Admitted, this has its limitations for terms and verbs that do not have a neat 1:1 translation, but a fully lexicalized ontology should be able to solve this (though does not do so yet).

It is very well possible that I have missed some recent paper that addresses the issues but that I have not come across. At some point in time, we’ll probably will (have to) develop an isiZulu verbalization system, so anyone who has/knows of references that point to (partial) solutions is most welcome to add them in the comments section of the post.

References

[1] M. Jarrar, C.M. Keet, and P. Dongilli. Multilingual verbalization of ORM conceptual models and axiomatized ontologies. STARLab Technical Report, Vrije Universiteit Brussels, Belgium. February 2006.

[2] R. Schwitter, K. Kaljurand, A. Cregan, C. Dolbear, G. Hart. A comparison of three controlled natural languages for OWL 1.1. Proc. of OWLED 2008 DC. Washington, DC, USA, 1-2 April 2008.

[3] Fliedl, G., Kop, C., Vöhringer, J. Guideline based evaluation and verbalization of OWL class and property labels. Data & Knowledge Engineering, 2010, 69: 331-342.

[4] Power, R., Third, A. Expressing OWL axioms by English sentences: dubious in theory, feasible in practice. Coling 2010: Poster Volume, pages 1006–1013,

Beijing, August 2010.

Constraints for migrating relations over time

Migrating objects in a database is nothing new—employees become managers, MSc students PhD students and so forth—and this has been investigated and implemented widely in temporal databases. But imagine a scenario for an airline company’s passenger RDBMS and a passenger who books a flight, hence we have a relation ⟨John, AZ123⟩ ∈ Booking with John ∈ Passenger and AZ123 ∈ Flight, which is normally followed by the events that John also checks in and boards the plane afterward, i.e., ⟨John, AZ123⟩ ∈ CheckIn and then ⟨John, AZ123⟩ ∈ Boarding. While the booking relation holds even after the tuple extended to the check-in relation, i.e., ⟨John, AZ123⟩ is a member of both Booking and CheckIn relations, this is not the case for the step from check-in to boarding which causes the tuple ⟨John, AZ123⟩ to be moved from one table to another in the operational database. In addition, for any tuple that is member of the Boarding relation, we know that it must have been a member of CheckIn relation sometime earlier. Clearly, airline companies implement some code to keep track of such changes, but how to represent this in a conceptual data model?

Or take a simple change in relation between two objects can be caused by the fact that a is structurally a part of b but a gets loose so after that a becomes spatially contained in b; e.g., a component in a medical device breaks loose due to wear and tear. Then it would be nice if a fault detection system can send such a message back to control, compared to the imprecise “there’s something wrong over there”.

A related issue is keeping track of the status of the same relation. Take, for instance, the issues with subquantities with, say, a bottle of wine and pouring a subquantity of the wine into a wineglass so that this subquantity in the glass used to be—but not is—a subquantity of the wine in the bottle and one wants to maintain traceability of quantities over time. This is important especially in the food industry for food safety in the food processing chain, hence, the data management has to be able to deal with such cases adequately and transparently.

Perhaps surprisingly, there is no conceptual data modelling language that lets you model the business knowledge where relations migrate during the conceptual analysis stage.  By relation migration, I mean the change of membership of a tuple from one relation to another. Thus, relation migration is distinct from state transition diagrams that concern states of single objects, from activity diagrams that concern processes but do not explicitly consider the participating entities, and from interaction diagrams for modelling use cases. Here we focus explicitly on the migration of facts/tuples/relation instances and the corresponding temporal behaviour of fact types/relations. So, in analogy to object migration, there is a usefulness of a similar set of constraints for relations that can be called relation migration.

Alessandro Artale and I specified the basic constraints that hold for relation migration, which recently got accepted for the ORM’10 workshop co-located with the OTM conference. The paper’s [1] abstract is as follows:

Representing and reasoning over evolving objects has been investigated widely. Less attention has been devoted to the similar notion of relation migration, i.e., how tuples of a relation (ORM facts) can evolve along time. We identify different ways how a relation can change over time and give a logic-based semantics to the notion of relation migration to capture its behaviour. We also introduce the notion of lifespan of a relation and clarify the interactions between object migration and relation migration. Its use in graphical conceptual data modelling is illustrated with a minor extension to ORM2 so as to more easily communicate such constraints with domain experts.

We distinguish between evolution constraints—specifying how elements of a relation can possibly migrate to another relation—persistence constraints—specifying persistent states for a relation—and quantitative evolution constraints—specifying the exact amount of time for the relation migration to happen. In addition, one has to consider the lifespan of relations. Together, they result in 15 axioms for the evolution and persistence constraints, and 3 propositions concerning the logical implications with respect to subsumption and relation migration, relation migration vs. lifespan, and objects vs. a relation’s lifespan.

Concerning the interaction object and relation migration, we found two types, one where an object migration forces a relation to migrate, and one where a relation migration forces an object migration. For instance, take a company where, at some point in time, each employee will be promoted within the same department he or she works for (for simplicity: the employee works for exactly one department) and such that demotion does not occur. This means an object migration of type PEX (Persistent Extension) between Employee and Manager (see figure, below). This forces a relation migration of type RPEX between worksFor and manages in order to maintain consistency of the conceptual data model (see figure below).

Example of an object migration (dashed purple arrow) that forces a relation migration (dashed green arrow)

The last word has not been said yet about incorporating rigidity fully in this framework, nor tractable reasoning with relation migration, but at least the foundational aspects of relation migration have been identified and characterized formally, which already can be added to conceptual data modelling languages, as illustrated for the Object-Role Modeling language in [1].

References

[1] Keet, C.M. and Artale, A. A basic characterization of relation migration. International Workshop on Fact-Oriented Modeling (ORM’10), Crete, Greece, October 27-29, 2010. Meersman, R., Herrero, P. (Eds.), OTM Workshops, Springer, Lecture Notes in Computer Science LNCS. (to appear)

Object-Role Modeling and Description Logics for conceptual modelling

Object-Role Modeling (ORM) is a so-called “true” conceptual modelling language in the sense that it is independent of the application scenario and it has been mapped into both UML class diagrams and ER [1]. That is, ORM and its successor ORM2 can be used in the conceptual analysis stage for database development, application software development, requirements engineering, business rules, and other areas [1-5]. If we can reason over such ORM conceptual data models, then we can guarantee the model (i.e., first order logic theory) is satisfiable and consistent, so that the corresponding application based on it definitely does behave correctly with respect to its specification (I summarised a more comprehensive argumentation and examples earlier). And, well, from a push-side: it widens the scope of possible scenarios where to use automated reasoners.

Various strategies and technologies are being developed to reason over conceptual data models to meet the same or slightly different requirements and aims. An important first distinction is between the assumption that modellers should be allowed to keep total freedom to model what they deem necessary to represent and subsequently put constraints on which parts can be used for reasoning or accept slow performance versus the assumption that it is better to constrain the language a priori to a subset of first order logic so as to achieve better performance and a guarantee that the reasoner terminates. The former approach is taken by Queralt and Teniente [6] using a dependency graph of the constraints in a UML Class Diagram + OCL and by first order logic (FOL) theorem provers. The latter approach is taken by [7-15] who experiment with different techniques. For instance, Smaragdakis et al and Kaneiwa et al [7-8] use special purpose reasoners for ORM and UML Class Diagrams, Cabot et al and Cadoli et al [9-10] encode a subset of UML class diagrams as a Constraint Satisfaction Problem, and [11-16] use a Description Logic (DL) framework for UML Class Diagrams, ER, EER, and ORM.

Perhaps not surprisingly, I also took the DL approach on this topic, on which I started working in 2006. I had put the principal version of the correspondence between ORM and the DL language DLRifd online on arXiv in February 2007 and got the discussion of the fundamental transformation problems published at DL’07 [15]. Admittedly, that technical report won’t ever win the beauty prize for its layout or concern for readability. In the meantime, I have corrected the typos, improved on the readability, proved correctness of encoding, and updated the related research with the recent works. On the latter, it also contains a discussion of a later, similar, attempt by others and the many errors in it. On the bright side: addressing those errors helps explaining the languages and trade-offs better (there are advantages to using a DL language to represent an ORM diagram, but also disadvantages). This new version (0702089v2), entitled “Mapping the Object-Role Modeling language ORM2 into Description Logic language DLRifd” [17] is now also online at arXiv.

As appetizer, here’s the abstract:

In recent years, several efforts have been made to enhance conceptual data modelling with automated reasoning to improve the model’s quality and derive implicit information. One approach to achieve this in implementations, is to constrain the language. Advances in Description Logics can help choosing the right language to have greatest expressiveness yet to remain within the decidable fragment of first order logic to realise a workable implementation with good performance using DL reasoners. The best fit DL language appears to be the ExpTime-complete DLRifd. To illustrate trade-offs and highlight features of the modelling languages, we present a precise transformation of the mappable features of the very expressive (undecidable) ORM/ORM2 conceptual data modelling languages to exactly DLRifd. Although not all ORM2 features can be mapped, this is an interesting fragment because it has been shown that DLRifd can also encode UML Class Diagrams and EER, and therefore can foster interoperation between conceptual data models and research into ontological aspects of the modelling languages.

And well, for those of you who might be disappointed that not all ORM features can be mapped: computers have their limitations and people have a limited amount of time and patience. To achieve ‘scalability’ of reasoning over initially large theories represented in a very expressive language, modularisation of the conceptual models and ontologies is one of the lines of research. But it is a separate topic and not quite close to implementation just yet.

References

[1] Halpin, T.: Information Modeling and Relational Databases. San Francisco: Morgan Kaufmann Publishers (2001)

[2] Balsters, H., Carver, A., Halpin, T., Morgan, T.: Modeling dynamic rules in ORM. In: OTM Workshops 2006. Proc. of ORM’06. Volume 4278 of LNCS., Springer (2006) 1201-1210

[3] Evans, K.: Requirements engineering with ORM. In: OTM Workshops 2005. Proc. of ORM’05. Volume 3762 of LNCS., Springer (2005) 646-655

[4] Halpin, T., Morgan, T.: Information modeling and relational databases. 2nd edn. Morgan Kaufmann (2008)

[5] Pepels, B., Plasmeijer, R.: Generating applications from object role models. In: OTM Workshops 2005. Proc. of ORM’05. Volume 3762 of LNCS., Springer (2005) 656-665

[6] Queralt, A., Teniente, E.: Decidable reasoning in UML schemas with constraints. In: Proc. of CAiSE’08. Volume 5074 of LNCS., Springer (2008) 281-295

[7] Smaragdakis, Y., Csallner, C., Subramanian, R.: Scalable automatic test data generation from modeling diagrams. In: Proc. of ASE’07. (2007) 4-13

[8] Kaneiwa, K., Satoh, K.: Consistency checking algorithms for restricted UML class diagrams. In: Proc. of FoIKS ’06, Springer Verlag (2006)

[9] Cabot, J., Clariso, R., Riera, D.: Verification of UML/OCL class diagrams using constraint programming. In: Proc. of MoDeVVA 2008. (2008)

[10] Cadoli, M., Calvanese, D., De Giacomo, G., Mancini, T.: Finite model reasoning on UML class diagrams via constraint programming. In: Proc. of AI*IA 2007. Volume 4733 of LNAI., Springer (2007) 36-47

[11] Calvanese, D., De Giacomo, G., Lenzerini, M.: On the decidability of query containment under constraints. In: Proc. of PODS’98. (1998) 149-158

[12] Artale, A., Calvanese, D., Kontchakov, R., Ryzhikov, V., Zakharyaschev, M.: Reasoning over extended ER models. In: Proc. of ER’07. Volume 4801 of LNCS., Springer (2007) 277-292

[13] Jarrar, M.: Towards automated reasoning on ORM schemes–mapping ORM into the DLRidf Description Logic. In: ER’07. Volume 4801 of LNCS. (2007) 181-197

[14] Franconi, E., Ng, G.: The ICOM tool for intelligent conceptual modelling. In: Proc. of KRDB’00. (2000) Berlin, Germany, 2000.

[15] Keet, C.M.: Prospects for and issues with mapping the Object-Role Modeling language into DLRifd. In: Proc. of DL’07. Volume 250 of CEUR-WS. (2007) 331-338

[16] Berardi, D., Calvanese, D., De Giacomo, G.: Reasoning on UML class diagrams. Artificial Intelligence 168(1-2) (2005) 70-118

[17] Keet, C.M. Mapping the Object-Role Modeling language ORM2 into Description Logic language DLRifd. KRDB Research Centre, Free University of Bozen-Bolzano, Italy. 22 April 2009. arXiv:cs.LO/0702089v2.