First tractable encoding of ORM conceptual data models

For (relatively) many years I’ve been focusing on as-expressive-as-possible languages to represent information and knowledge, including the computationally impractical full first order logic, because one would/should want to be as precise as possible and required to represent the subject domain in an ontology and universe of discourse for the application in a conceptual data model. After all, one can always throw out the computationally unpleasant constructs later during the implementation stage, if the ontology or conceptual data model is intended for use at runtime, such as OBDA [1], test data generate for verification [2], and in the query compilation stage in RDBMSs [3]. The resulting slimmed theories/models may be different for different applications, but then at least the set of slimmed theories/models share their common understanding.

So, now I ventured in that area, not because there’s some logic x and conceptual modeling language y has to be forced into it, but it actually appears that many fancy construct/features are not used in publicly available conceptual data models anyway (see data set and xls with some analysis). The timing of the outcome of the analysis of the data set coincided with David Toman’s visit to UCT as part of his sabbatical and Pablo Fillottrani’s visit, who enjoyed the last exchange of our bi-lateral project on the unification of conceptual data modelling languages (project page). To sum up the issue we were looking at: the need for run-time usage of conceptual data models requires a tractable logic-based reconstruction of the conceptual models (i.e., in at most PTIME), which appeared to hardly exist or miss constructs important for conceptual models (regardless whether that was ORM, EER or UML Class Diagrams), or both.

The solution ended up to be a logic-based reconstruction for most of ORM2 using the \mathcal{CFDI}_{nc}^{\forall -} Description Logic, which also happens to be the first tractable encoding of (most of) ORM/ORM2. With this logic, several features important for conceptual models (i.e., occur relatively often) do have their proper encoding in the logic, notably n-aries, complex identification constraints, and n-ary role subsumption. The, admittedly quite tedious, mapping

Low resolution and small version of our DL15 poster summarising the contributions.

Low resolution and small version of our DL15 poster summarising the contributions.

captures over 96% of the constructs used in practice in the set of 33 ORM diagrams we analysed (see data set). Further, the results are easily transferable to EER and UML Class diagrams, with an even greater coverage. The results (and comparison with related works) are presented in our recently accepted paper at the 28th International Workshop on Description Logics (DL’15) that will take place form 7 to 11 June in Athens, Greece.

The list of accepted papers of DL’15 is available, listing 21 papers with long presentations, 16 papers with short presentation, and 26 papers with poster presentations. David will present our results in the poster session, as it’s probably of more relevance in the conceptual modelling community (and I’ll be marking exams then), and some other accepted papers cover more new ground, such as casting schema.org as a description logic, temporal query answering in EL, exact learning of ontologies, and more. The proceedings is will be online on CEUR-WS in the upcoming days as volume 1350. I’ve added a mini version of our poster on the right. I tried tikzposter, as they look really cool, but it doesn’t support figures (other than those made in latex), so I resorted to ppt (that doesn’t support math), wondering why these issues haven’t been solved by now.

Anyway, more about this topic is in the pipeline that I soon hope to be able to give updates on.

 

References

[1] Calvanese, D., Keet, C.M., Nutt, W., Rodriguez-Muro, M., Stefanoni, G. Web-based Graphical Querying of Databases through an Ontology: the WONDER System. ACM Symposium on Applied Computing (ACM SAC’10), March 22-26 2010, Sierre, Switzerland. pp 1389-1396.

[2] Toman, D., Weddell, G.E.: Fundamentals of Physical Design and Query Compilation. Synthesis Lectures on Data Management, Morgan & Claypool  Publishers (2011)

[3] Smaragdakis, Y., Csallner, C., Subramanian, R.: Scalable satisfiability checking and test data generation from modeling diagrams. Automation in Software Engineering 16, 73–99 (2009)

[4] Fillottrani, P.R., Keet, C.M., Toman, D. Polynomial encoding of ORM conceptual models in \mathcal{CFDI}_{nc}^{\forall -} . 28th International Workshop on Description Logics (DL’15). CEUR-WS vol xx., 7-10 June 2015, Athens, Greece.

Advertisements

Every American is a NamedPizza

Or: verbalizing OWL ontologies still doesn’t really work well.

Ever since we got the multi-lingual verbalization of ORM conceptual data models (restricted FOL theories) working in late 2005 [1]—well: the implementation worked in the DOGMA tool, but the understandability of the output depended on the natural language—I have been following on and off the progress on solutions to the problem. It would be really nice if it all had worked by now, because it is a way for non-logician domain experts to validate the knowledge represented in the ontology and verbalization has been shown to be very useful for domain experts (mainly enterprise) validating (business) knowledge represented in the ORM conceptual data modeling language. (Check out the NORMA tool for the latest fancy implementation, well ahead of OWL verbalization in English Controlled Natural Language).

Some of my students worked on it as an elective ‘mini-project’ topic of the ontology engineering courses I have taught [SWT at FUB, UH, UCI, UKZN]. They have tried to implement it for OWL into Italian and Spanish natural language using a template-based approach with some additional mini-grammar-engine to improve the output, or in English as a competitor to the Manchester syntax. All of them invariable run, to a greater or lesser extent, into the problems discussed in [1], especially when it comes to non-English languages, as English is grammatically challenged. Now, I do not intend to offend people who have English as first language, but English does not have features like gendered articles (just ‘the’ instead of ‘el’ and ‘la’, in Spanish), declensions (still ‘the’ instead of ‘der’ ‘des’, ‘dem’, ‘den’ depending on the proposition, in German), conjunction depending on the nouns (just ‘and’ instead of ‘na’, ‘ne’, ‘no’ that is glued onto the second noun depending on the first letter of that noun, in isiZulu), or subclauses where the verb tense changes by virtue of being in a subclause (in Italian). To sort out such basic matters to generate an understandable pseudo-natural language sentence, a considerable amount of grammar rules and a dictionary have to be added to a template-based approach to make it work.

But let us limit ourselves to English for the moment. Then it is still not trivial. There is a paper comparing the different OWL verbalizers [2], such as Rabbit (ROO) and ACE, which considers issues like how to map, e.g., an AllValuesFrom to “Each…”, “Every…” etc. This is an orthogonal issue to the multi-lingual aspects, and I don’t know how that affects the user’s understanding of the sentences.

I had another look at ACE, as ACE also has a web-interface that accepts OWL/XML files (i.e., OWL 2). I tried it out with the Pizza tutorial ontology, and it generated many intelligible sentences. However, there were also phrases like (i) “Everything that is hasTopping by a Mushroom is something that is a MozzarellaTopping or that is a MushroomTopping or that is a TomatoTopping.”, the (ii) “Every American is a NamedPizza” mentioned in the title of this post, and then there are things like  (iii) “Every DomainConcept that is America or that is England or that is France or that is Germany or that is Italy is a Country”. Example (iii) is not a problem of the verbalizer, but merely an instance of GIGO and the ontology should be corrected.

Examples (i) and (ii) exhibit other problems, though. Regarding (ii), I have noticed that when (novice) ontologists use an ontology development tool, it is a not uncommon practice to not name the entity fully, probably because it is easy for a human reader to fill in the rest from the context; in casu, American is not an adjective to people, but relates to pizza. A more precise name could have avoided such issues (AmericanPizza), or a new solution to ‘context’ can be devised. The weird “is hasTopping by” is due, I think, to the lexicalization of OWL’s ObjectPropertyRange in ACE, which takes the object property, assumes that to be in the infinitive and then puts it in the past participle form (see the Web-ACE page, section 4). So, if the Pizza Ontology developers had chosen not hasTopping but, say, the verb ‘top’, ACE would have changed it into ‘is topped by’. In idea the rule makes sense, but it can be thwarted by the names used in the ontology.

Fliedl and co-authors [3] are trying to resolve just such issues. They propose a rigid naming convention to make it easier to verbalize the ontology. I do not think it is a good proposal, because it is ‘blaming’ the ontologists for failing natural language generation (NLG) systems, and syntactic sugar (verbalization) should not be the guiding principle when adding knowledge to the ontology. Besides, it is not that difficult to add another rule or two to cater for variations, which is probably what will be needed in the near future anyway once ontology reuse and partial imports become more commonplace in ontology engineering.

Power and Third [4] readily admit that verbalizing OWL is “dubious in theory”, but they provide data that it may be “feasible in practice”. The basis of their conclusion lies in the data analysis of about 200 ontologies, which show that the ‘problematic’ cases seldom arise. For instance, OWL’s SubClassOf takes two class expressions, but in praxis it is only used in the format of SubClassOf(C CE) or SubClassOf(C C), idem regarding EquivalentClasses—I think that is probably due to Protégé’s interface—which makes the verbalization easier. They did not actually build a verbalizer, though, but the tables on page 1011 can be of use what to focus on first; e.g., out of the 633,791 axioms, there were only 12 SubDataPropertyOf assertions, whereas SubClassOf(Class,Class) appeared 297,293 times (46.9% of the total) and SubClassOf(Class,ObjectSomeValuesFrom(ObjectProperty,Class)) 158,519 times (25.0%). Why this distribution is the way it is, is another topic.

Going back to the multi-lingual dimension, there is a general problem with OWL ontologies, which is, from a theoretical perspective, addressed more elegantly with OBO ontologies. In OBO, each class has an identifier and the name is just a label. So one could, in principle, amend this by adding labels for each natural language; e.g., have a class “PIZZA:12345” in the ontology with associated labels “tomato @en”, “pomodoro @it”, “utamatisi @zulu” and so forth, and when verbalizing it in one of those languages, the system picks the right label, compared to the present cumbersome and error-prone way of developing and maintaining an OWL file for each language. Admitted, this has its limitations for terms and verbs that do not have a neat 1:1 translation, but a fully lexicalized ontology should be able to solve this (though does not do so yet).

It is very well possible that I have missed some recent paper that addresses the issues but that I have not come across. At some point in time, we’ll probably will (have to) develop an isiZulu verbalization system, so anyone who has/knows of references that point to (partial) solutions is most welcome to add them in the comments section of the post.

References

[1] M. Jarrar, C.M. Keet, and P. Dongilli. Multilingual verbalization of ORM conceptual models and axiomatized ontologies. STARLab Technical Report, Vrije Universiteit Brussels, Belgium. February 2006.

[2] R. Schwitter, K. Kaljurand, A. Cregan, C. Dolbear, G. Hart. A comparison of three controlled natural languages for OWL 1.1. Proc. of OWLED 2008 DC. Washington, DC, USA, 1-2 April 2008.

[3] Fliedl, G., Kop, C., Vöhringer, J. Guideline based evaluation and verbalization of OWL class and property labels. Data & Knowledge Engineering, 2010, 69: 331-342.

[4] Power, R., Third, A. Expressing OWL axioms by English sentences: dubious in theory, feasible in practice. Coling 2010: Poster Volume, pages 1006–1013,

Beijing, August 2010.

Recap of the sixth workshop on Fact-Oriented Modelling: ORM’10

The sixth workshop on Fact-Oriented/Object-Role Modelling (ORM’10) in Hersonissou, Crete, Greece, and co-located with the OTM conference just came to a close after a long session on metamodelling to achieve a standard exchange format for the different ORM tools that are in use and under development (such as NORMA, DocTool, and CaseTalk). The other sessions during these three days were filled with paper presentations and several tool demos, reflecting not only the mixed audience of academia and industry, but also the versatility of fact-oriented modelling. I will illustrate some of that in the remainder of the post. (Note: ORM is a conceptual data modelling language that enjoys a formal foundation, and a graphical interface to draw the diagrams and a textual interface to verbalize the domain knowledge so as to facilitate communication with, and validation by, the domain experts.)

An overview of a novel mapping of ORM2 to DatalogLB was presented by Terry Halpin from LogicBlox and INTI International University [1]. The choice for such a mapping was motivated by the support for rules in Datalog so as to also have a formal foundation and implemented solution for the (derivation) rules one can define in an ORM conceptual data model in the NORMA tool.

Staying with formalisms (but of a different kind and scope), Fazat Nur Azizah from the Bandung Institute of Technology proposed a grammar to specify modelling patterns so that actual patterns can be reused for different conceptual data models—alike software design patterns, but then for the FCO-IM flavour of fact-oriented conceptual data modelling [2].

At the other end of the spectrum were two papers that proposed and assessed the use and benefits of ORM in the setting of understanding natural language text documents. Ron McFadyen from the University of Winnipeg introduced document literacy and ORM [3]. Peter Bollen from Maastricht University showed how ORM can improve the completeness and maintenance of specifications like the Business Process Model and Notation [4], which is in analogy with the WSML-documentation-in-ORM [5] and thereby thus strengthening the case that one indeed can be both more precise and communicative with one’s specification if accompanied by a representation in ORM.

There was a session on Master Data Management (MDM), presented by Baba Piprani from MetaGlobal Systems and Patricia Schiefelbein from Boston Scientific. However, I got a bit sidetracked when Baba Piprani had an interesting quote called the “Helsinki principle”, being

Any meaningful exchange of utterances depends upon the prior existence of an agreed set of semantic and syntactic rules. The recipients of the utterances must use only these rules to interpret the received utterances, if it is to mean the same as that which was meant by the utterer. (ISO TR9007)

whereas I was associating the term “Helsinki principle” with a wholly different story, being the right to self-determination described in the Helsinki accords on security and cooperation in Europe. Now, it happens to be the case that proper MDM contributes to solving semantic mismatches.

Last, there was a session on extensions. Tony Morgan from INTI International University [6] had a go at folding and zooming, presenting an alternative approach to abstraction for large ORM diagram (that is, alternative to [7,8] and the many other proposals outside ORM); it introduced new notations, the code-folding idea for but then for ORM diagrams, and a lightweight algorithm. Yan Tang from STARLab at the Free University of Brussels elaborated on the interaction between semantic decision tables and DOGMA [9] (DOGMA is an approach and tool that reuses ORM notation for ontology engineering). Last, but not least, I presented the paper by Alessandro Artale and myself about the basic constraints for relation migration [10], about which I wrote in an earlier blog post.

To wrap up, the workgroup on the common exchange format for fact-oriented modelling tools—chaired by Serge Valera from the European Space Agency—will continue their work toward standardization, the slides of the presentations will be made available on the ORM Foundation website in these days, and else it is on heading towards the 7th ORM workshop next year somewhere in the Mediterranean.

References

(Unfortunately, at the time of writing, most of the papers are still in the proceedings behind Springer’s paywall)

[1] Terry Halpin, Matthew Curland, Kurt Stirewalt, Navin Viswanath, Matthew McGill, and Steven Beck. Mapping ORM to Datalog: An Overview.
 International Workshop on Fact-Oriented Modeling (ORM’10), Hersonissou, Greece, October 27-29, 2010. Meersman, R., Herrero, P. (Eds.), OTM Workshops, Springer, LNCS 6428, 504-513.

[2] Fazat Nur Azizah, Guido P. Bakema, Benhard Sitohang, and Oerip S. Santoso. Information Grammar for Patterns (IGP) for Pattern Language of Data Model Patterns Based on Fully Communication Oriented Information Modeling (FCO-IM). International Workshop on Fact-Oriented Modeling (ORM’10), Hersonissou, Greece, October 27-29, 2010. Meersman, R., Herrero, P. (Eds.), OTM Workshops, Springer, LNCS 6428, 522-531.

[3] Ron McFadyen and Susan Birdwise. Literacy and Data Modeling. International Workshop on Fact-Oriented Modeling (ORM’10), Hersonissou, Greece, October 27-29, 2010. Meersman, R., Herrero, P. (Eds.), OTM Workshops, Springer,
LNCS 6428, p. 532-540.

[4] Peter Bollen. A Fact-Based Meta Model for Standardization Documents. International Workshop on Fact-Oriented Modeling (ORM’10), Hersonissou, Greece, October 27-29, 2010. Meersman, R., Herrero, P. (Eds.), OTM Workshops, Springer,
LNCS 6428, p. 464-473.

[5] Tziviskou, C. and Keet, C.M. A Meta-Model for Ontologies with ORM2. Third International Workshop on Object-Role Modelling (ORM’07), Algarve, Portugal, Nov 28-30, 2007. Meersman, R., Tari, Z., Herrero., P. et al. (Eds.), Springer, LNCS 4805, 624-633.

[6] Tony Morgan. A Proposal for Folding in ORM Diagrams. International Workshop on Fact-Oriented Modeling (ORM’10), Hersonissou, Greece, October 27-29, 2010. Meersman, R., Herrero, P. (Eds.), OTM Workshops, Springer, LNCS 6428, 474-483.

[7] Keet, C.M. Using abstractions to facilitate management of large ORM models and ontologies. International Workshop on Object-Role Modeling (ORM’05). Cyprus, 3-4 November 2005. In: OTM Workshops 2005. Halpin, T., Meersman, R. (eds.), Lecture Notes in Computer Science LNCS 3762. Berlin: Springer-Verlag, 2005. pp603-612.

[8] Campbell, L.J., Halpin, T.A. and Proper, H.A.: Conceptual Schemas with Abstractions: Making flat conceptual schemas more comprehensible. Data & Knowledge Engineering (1996) 20(1): 39-85

[9] Yan Tang. Towards Using Semantic Decision Tables for Organizing Data Semantics. International Workshop on Fact-Oriented Modeling (ORM’10), Hersonissou, Greece, October 27-29, 2010. Meersman, R., Herrero, P. (Eds.), OTM Workshops, Springer, LNCS 6428, 494-503.

[10] Keet, C.M. and Artale, A. A basic characterization of relation migration. International Workshop on Fact-Oriented Modeling (ORM’10), Hersonissou, Greece, October 27-29, 2010. Meersman, R., Herrero, P. (Eds.), OTM Workshops, Springer, LNCS. 6428, 484-493.

Constraints for migrating relations over time

Migrating objects in a database is nothing new—employees become managers, MSc students PhD students and so forth—and this has been investigated and implemented widely in temporal databases. But imagine a scenario for an airline company’s passenger RDBMS and a passenger who books a flight, hence we have a relation ⟨John, AZ123⟩ ∈ Booking with John ∈ Passenger and AZ123 ∈ Flight, which is normally followed by the events that John also checks in and boards the plane afterward, i.e., ⟨John, AZ123⟩ ∈ CheckIn and then ⟨John, AZ123⟩ ∈ Boarding. While the booking relation holds even after the tuple extended to the check-in relation, i.e., ⟨John, AZ123⟩ is a member of both Booking and CheckIn relations, this is not the case for the step from check-in to boarding which causes the tuple ⟨John, AZ123⟩ to be moved from one table to another in the operational database. In addition, for any tuple that is member of the Boarding relation, we know that it must have been a member of CheckIn relation sometime earlier. Clearly, airline companies implement some code to keep track of such changes, but how to represent this in a conceptual data model?

Or take a simple change in relation between two objects can be caused by the fact that a is structurally a part of b but a gets loose so after that a becomes spatially contained in b; e.g., a component in a medical device breaks loose due to wear and tear. Then it would be nice if a fault detection system can send such a message back to control, compared to the imprecise “there’s something wrong over there”.

A related issue is keeping track of the status of the same relation. Take, for instance, the issues with subquantities with, say, a bottle of wine and pouring a subquantity of the wine into a wineglass so that this subquantity in the glass used to be—but not is—a subquantity of the wine in the bottle and one wants to maintain traceability of quantities over time. This is important especially in the food industry for food safety in the food processing chain, hence, the data management has to be able to deal with such cases adequately and transparently.

Perhaps surprisingly, there is no conceptual data modelling language that lets you model the business knowledge where relations migrate during the conceptual analysis stage.  By relation migration, I mean the change of membership of a tuple from one relation to another. Thus, relation migration is distinct from state transition diagrams that concern states of single objects, from activity diagrams that concern processes but do not explicitly consider the participating entities, and from interaction diagrams for modelling use cases. Here we focus explicitly on the migration of facts/tuples/relation instances and the corresponding temporal behaviour of fact types/relations. So, in analogy to object migration, there is a usefulness of a similar set of constraints for relations that can be called relation migration.

Alessandro Artale and I specified the basic constraints that hold for relation migration, which recently got accepted for the ORM’10 workshop co-located with the OTM conference. The paper’s [1] abstract is as follows:

Representing and reasoning over evolving objects has been investigated widely. Less attention has been devoted to the similar notion of relation migration, i.e., how tuples of a relation (ORM facts) can evolve along time. We identify different ways how a relation can change over time and give a logic-based semantics to the notion of relation migration to capture its behaviour. We also introduce the notion of lifespan of a relation and clarify the interactions between object migration and relation migration. Its use in graphical conceptual data modelling is illustrated with a minor extension to ORM2 so as to more easily communicate such constraints with domain experts.

We distinguish between evolution constraints—specifying how elements of a relation can possibly migrate to another relation—persistence constraints—specifying persistent states for a relation—and quantitative evolution constraints—specifying the exact amount of time for the relation migration to happen. In addition, one has to consider the lifespan of relations. Together, they result in 15 axioms for the evolution and persistence constraints, and 3 propositions concerning the logical implications with respect to subsumption and relation migration, relation migration vs. lifespan, and objects vs. a relation’s lifespan.

Concerning the interaction object and relation migration, we found two types, one where an object migration forces a relation to migrate, and one where a relation migration forces an object migration. For instance, take a company where, at some point in time, each employee will be promoted within the same department he or she works for (for simplicity: the employee works for exactly one department) and such that demotion does not occur. This means an object migration of type PEX (Persistent Extension) between Employee and Manager (see figure, below). This forces a relation migration of type RPEX between worksFor and manages in order to maintain consistency of the conceptual data model (see figure below).

Example of an object migration (dashed purple arrow) that forces a relation migration (dashed green arrow)

The last word has not been said yet about incorporating rigidity fully in this framework, nor tractable reasoning with relation migration, but at least the foundational aspects of relation migration have been identified and characterized formally, which already can be added to conceptual data modelling languages, as illustrated for the Object-Role Modeling language in [1].

References

[1] Keet, C.M. and Artale, A. A basic characterization of relation migration. International Workshop on Fact-Oriented Modeling (ORM’10), Crete, Greece, October 27-29, 2010. Meersman, R., Herrero, P. (Eds.), OTM Workshops, Springer, Lecture Notes in Computer Science LNCS. (to appear)

Object-Role Modeling and Description Logics for conceptual modelling

Object-Role Modeling (ORM) is a so-called “true” conceptual modelling language in the sense that it is independent of the application scenario and it has been mapped into both UML class diagrams and ER [1]. That is, ORM and its successor ORM2 can be used in the conceptual analysis stage for database development, application software development, requirements engineering, business rules, and other areas [1-5]. If we can reason over such ORM conceptual data models, then we can guarantee the model (i.e., first order logic theory) is satisfiable and consistent, so that the corresponding application based on it definitely does behave correctly with respect to its specification (I summarised a more comprehensive argumentation and examples earlier). And, well, from a push-side: it widens the scope of possible scenarios where to use automated reasoners.

Various strategies and technologies are being developed to reason over conceptual data models to meet the same or slightly different requirements and aims. An important first distinction is between the assumption that modellers should be allowed to keep total freedom to model what they deem necessary to represent and subsequently put constraints on which parts can be used for reasoning or accept slow performance versus the assumption that it is better to constrain the language a priori to a subset of first order logic so as to achieve better performance and a guarantee that the reasoner terminates. The former approach is taken by Queralt and Teniente [6] using a dependency graph of the constraints in a UML Class Diagram + OCL and by first order logic (FOL) theorem provers. The latter approach is taken by [7-15] who experiment with different techniques. For instance, Smaragdakis et al and Kaneiwa et al [7-8] use special purpose reasoners for ORM and UML Class Diagrams, Cabot et al and Cadoli et al [9-10] encode a subset of UML class diagrams as a Constraint Satisfaction Problem, and [11-16] use a Description Logic (DL) framework for UML Class Diagrams, ER, EER, and ORM.

Perhaps not surprisingly, I also took the DL approach on this topic, on which I started working in 2006. I had put the principal version of the correspondence between ORM and the DL language DLRifd online on arXiv in February 2007 and got the discussion of the fundamental transformation problems published at DL’07 [15]. Admittedly, that technical report won’t ever win the beauty prize for its layout or concern for readability. In the meantime, I have corrected the typos, improved on the readability, proved correctness of encoding, and updated the related research with the recent works. On the latter, it also contains a discussion of a later, similar, attempt by others and the many errors in it. On the bright side: addressing those errors helps explaining the languages and trade-offs better (there are advantages to using a DL language to represent an ORM diagram, but also disadvantages). This new version (0702089v2), entitled “Mapping the Object-Role Modeling language ORM2 into Description Logic language DLRifd” [17] is now also online at arXiv.

As appetizer, here’s the abstract:

In recent years, several efforts have been made to enhance conceptual data modelling with automated reasoning to improve the model’s quality and derive implicit information. One approach to achieve this in implementations, is to constrain the language. Advances in Description Logics can help choosing the right language to have greatest expressiveness yet to remain within the decidable fragment of first order logic to realise a workable implementation with good performance using DL reasoners. The best fit DL language appears to be the ExpTime-complete DLRifd. To illustrate trade-offs and highlight features of the modelling languages, we present a precise transformation of the mappable features of the very expressive (undecidable) ORM/ORM2 conceptual data modelling languages to exactly DLRifd. Although not all ORM2 features can be mapped, this is an interesting fragment because it has been shown that DLRifd can also encode UML Class Diagrams and EER, and therefore can foster interoperation between conceptual data models and research into ontological aspects of the modelling languages.

And well, for those of you who might be disappointed that not all ORM features can be mapped: computers have their limitations and people have a limited amount of time and patience. To achieve ‘scalability’ of reasoning over initially large theories represented in a very expressive language, modularisation of the conceptual models and ontologies is one of the lines of research. But it is a separate topic and not quite close to implementation just yet.

References

[1] Halpin, T.: Information Modeling and Relational Databases. San Francisco: Morgan Kaufmann Publishers (2001)

[2] Balsters, H., Carver, A., Halpin, T., Morgan, T.: Modeling dynamic rules in ORM. In: OTM Workshops 2006. Proc. of ORM’06. Volume 4278 of LNCS., Springer (2006) 1201-1210

[3] Evans, K.: Requirements engineering with ORM. In: OTM Workshops 2005. Proc. of ORM’05. Volume 3762 of LNCS., Springer (2005) 646-655

[4] Halpin, T., Morgan, T.: Information modeling and relational databases. 2nd edn. Morgan Kaufmann (2008)

[5] Pepels, B., Plasmeijer, R.: Generating applications from object role models. In: OTM Workshops 2005. Proc. of ORM’05. Volume 3762 of LNCS., Springer (2005) 656-665

[6] Queralt, A., Teniente, E.: Decidable reasoning in UML schemas with constraints. In: Proc. of CAiSE’08. Volume 5074 of LNCS., Springer (2008) 281-295

[7] Smaragdakis, Y., Csallner, C., Subramanian, R.: Scalable automatic test data generation from modeling diagrams. In: Proc. of ASE’07. (2007) 4-13

[8] Kaneiwa, K., Satoh, K.: Consistency checking algorithms for restricted UML class diagrams. In: Proc. of FoIKS ’06, Springer Verlag (2006)

[9] Cabot, J., Clariso, R., Riera, D.: Verification of UML/OCL class diagrams using constraint programming. In: Proc. of MoDeVVA 2008. (2008)

[10] Cadoli, M., Calvanese, D., De Giacomo, G., Mancini, T.: Finite model reasoning on UML class diagrams via constraint programming. In: Proc. of AI*IA 2007. Volume 4733 of LNAI., Springer (2007) 36-47

[11] Calvanese, D., De Giacomo, G., Lenzerini, M.: On the decidability of query containment under constraints. In: Proc. of PODS’98. (1998) 149-158

[12] Artale, A., Calvanese, D., Kontchakov, R., Ryzhikov, V., Zakharyaschev, M.: Reasoning over extended ER models. In: Proc. of ER’07. Volume 4801 of LNCS., Springer (2007) 277-292

[13] Jarrar, M.: Towards automated reasoning on ORM schemes–mapping ORM into the DLRidf Description Logic. In: ER’07. Volume 4801 of LNCS. (2007) 181-197

[14] Franconi, E., Ng, G.: The ICOM tool for intelligent conceptual modelling. In: Proc. of KRDB’00. (2000) Berlin, Germany, 2000.

[15] Keet, C.M.: Prospects for and issues with mapping the Object-Role Modeling language into DLRifd. In: Proc. of DL’07. Volume 250 of CEUR-WS. (2007) 331-338

[16] Berardi, D., Calvanese, D., De Giacomo, G.: Reasoning on UML class diagrams. Artificial Intelligence 168(1-2) (2005) 70-118

[17] Keet, C.M. Mapping the Object-Role Modeling language ORM2 into Description Logic language DLRifd. KRDB Research Centre, Free University of Bozen-Bolzano, Italy. 22 April 2009. arXiv:cs.LO/0702089v2.

New book on innovations in information systems modeling

To give my bias upfront: the book that contains my first book chapter is released today, in Innovations in Information Systems Modeling: Methods and Best Practices (part of the Advances in Database Research Book Series), which is edited by Terry Halpin, John Krogstie, and Erik Proper. To lazily copy the short description, the book has as scope (see title information sheet):

Modeling is used across a number of tasks in connection to information systems, but it is rare to see and easily compare all the uses of diagrammatical models as knowledge representation in one place, highlighting both commonalities and differences between different kinds of modeling.

Innovations in Information Systems Modeling: Methods and Best Practices provides up-to-date coverage of central topics in information systems modeling and architectures by leading researchers in the field. With chapters presented by top researchers from countries around the globe, this book provides a truly international perspective on the latest developments in information systems modeling, methods, and best practices.

The book has 15 chapters divided into four sections, being (I) language issues and improvements, (II) modelling approaches, (III) frameworks, architectures, and applications, and (IV) selected readings, containing altogether 15 chapters. The book chapters, whose abstracts are online here, range from refinements on subtyping, representing part-whole relations, and adapting ORM for representing application ontologies, to methodologies for enterprise and active knowledge modelling, to an ontological framework for method engineering and designing web information systems. The selected readings sections deal with, among others, a formal agent based approach for the modeling and verification of intelligent information systems and metamodelling in relation to software quality.

The chapter that I co-authored with Alessandro Artale is called “Essential, Mandatory, and Shared Parts in Conceptual Data Models” [1], which zooms in on formally representing the life cycle semantics of part-whole relations in conceptual data models such as those represented with ER, ORM and UML. We do this by using the temporal modality and some new fancy extensions to ERvt—a temporal EER based on the description logic language DLRus—to cover things such as essential parts, temporally suspended relations, and shareability options such as sequentially versus concurrently being part of some whole. To aid the modeler in applying it during the conceptual analysis stage, we also provide a set of closed questions and decision diagrams to find the appropriate life cycle.

A disadvantage of publishing with IGI is that they don’t accept latex files, but the poor lad from the typesetting office was patient and did his best to make something presentable out of it in MS Word (ok, I wasted quite some time on it, too). I don’t have a soft copy of the final layout version, but if you would like to have a latex-ed preprint, feel free to drop me an email. Alternatively, to gain access to all the chapters: the early-bird price (until Feb. 1, 2009) knocks off $15 of the full price of the hardcover.

[1] Alessandro Artale, and C. Maria Keet. Essential, Mandatory, and Shared Parts in Conceptual Data Models (chapter 2). In: Innovations in Information Systems Modeling: Methods and Best Practices, Terry Halpin, John Krogstie, and Erik Proper (Eds.). IGI Global, 2008, pp 17-52. ISBN: 978-1-60566-278-7

A note on improving the quality of conceptual data models with a reasoner

Moving back to work-related topics, let us have a look at quality of conceptual data models; for the ontologies-person: a conceptual data model is, roughly, a so-called “application ontology”, with data types, a relatively close resemblance to the database or application it was developed for, and generally without the heavy logical apparatus behind it. Some of the well-known conceptual data modeling languages are UML, ER/EER, and ORM/ORM2.

What exactly a “good” and a “bad” conceptual model is, is not clearly specified, but experienced modellers know when they see one. However, there are few experienced modellers compared to the databases and application software around, they are not flawless (no one is), and when the conceptual model becomes large, errors creep in anyway due to the so-called “cognitive overload”. Much effort has gone into improving the methodology of, what is called in information systems development, the conceptual analysis stage of the whole software development process as well as the, mostly graphical, conceptual modelling languages; both topics seem to be tremendous sources of turf wars. In addition, the contribution that computers and specialised software can make beyond the standard CASE tools—that, at best, can validate the conceptual model (i.e., that the model is syntactically correct, but not semantically)—is barely known, and largely an ignored aspect of the whole modeling process.

Now, I will try to talk together three seemingly, but not quite, independent events leading to the real point. (note to the reader: they give a context, but could be skipped)

First, a few researchers in conceptual data modeling have taking notice of what is happening in the ontologies arena and, not surprisingly, taken up the idea recently that, perhaps, something like that could well be used to improve conceptual data models, too [1-3]. The basic idea to reason over conceptual data models, however, was conceived at least as far back as the early ‘90s, when it was both intended to improve the quality of the models and for schema-based database integration, although it has not entertained wide-spread user-adoption (nor those reported in refs [1-3], for that matter). See the works by the DIS group of Lenzerini at “la Sapienza” University in Rome (Calvanese, Di Giacomo, Lenzerini, Nardi, and cs.).

Second, having had to visit the KSG at Meraka for 6 weeks as part of the Italy-South Africa cooperation on “Conceptual modeling and intelligent query formulation” with as aim to find some common interest to work on within the project’s scope (which was obviously not the PsyOps ontology development to ultimately streamline torture), the notion of ‘intelligent conceptual modelling’ came up again (see also the extended EMMSAD’08 presentation).

Third, last year a distinguished ex-Microsoft Visio senior programmer, Matthew Curland, visited us to explain the machinery behind the NORMA CASE tool (a free MS Visio plugin, available from sourceforge), which automatically generates a range of types application code (C#, SQL, etc.) based on an ORM2 conceptual data model. He, as well as other modellers, however, did not see an advantage to enhancing the quality of conceptual models by using reasoners compared to a validator that both NORMA and its predecessor, VisioModeler, already have. Admitted, we did not have many clear examples readily at hand back then.

Given these three events, and a recent ORM Foundation forum digression on solving problems vs. inventing them, I’ve tried to put my layout skills, preference for figures, and sense of colour-coding to, hopefully, good use to unambiguously demonstrate the differences between mere validation of a conceptual model and, among others, satisfiability checking. The automated reasoning over the conceptual data model fishes out semantic errors and, equally useful, derives additional constraints that were not explicitly modelled in the conceptual model (well, missed by the modeler). The examples were done with the reasoner-enhanced modeling tool, icom [5], and compared to the NORMA CASE tool.

Considering the demonstrated differences in the pdf we can go back to the notion of quality of conceptual data models: clearly one that is (i) consistent and (ii) as inclusive with the constraints as necessary is a better one. Regarding the former, timely detecting inconsistent, unsatisfiable, classes prevents the error(s) from propagating down to the implementation, where it otherwise results in, e.g., a class that is never instantiated or a table that remains empty, which is normally not the intention. Regarding the latter, having implicit constraints explicit at the modeling stage can ensure their correct implementation in the software or undesirable consequences can be fixed before implementation as opposed to find out during testing or operation and having to back-track the issue.

A reasoner, be it special purpose one as in [2,3] or DL-based [1,4], thus, does contribute to the goal of improving the quality of the conceptual model and, hence, the software. Or, to rephrase it in terms of solving problems: there is a lot of buggy software, and good conceptual modeling is a well-known, comparatively cheap, way of preventing such problems compared to the costly and time-consuming bug-fixing and laborious maintenance. The reasoner-enhanced conceptual modelling, then, is another feature in the conceptual modeller’s ‘toolbox’ to prevent such problems that hitherto still fell through the cracks with traditional conceptual modeling.

But someone’s interests may not lie in obtaining good conceptual data models—after all, testers and programmers want to keep their job, and so do consultants for database and application software reengineering, or researchers who focus on how to deal with inconsistent databases, methods for elaborate maintenance strategies, and whatnot. There are other advantages, though, than just good conceptual data models or application ontologies, such as relegating the graphical syntax to being “syntactic sugar” by unifying the modeling languages (see [5] and references therein), which then enables HCI researchers to have a look at what would be the best set of icons for graphical modeling or natural language experts to enhance the textual interfaces for conceptual modelling to make the modelling a more fruitful process for the modeller and domain expert alike, or to accommodate ardent supporters of, say, UML to constructively collaborate with modellers who fancy ORM in a way so that they all can keep their preferred diagrams yet work on one common conceptual data model. But more about that another time.

—————-

[1] M. Balaban, A. Maraee, A UML-based method for deciding finite satisfiability in description logics, in: F. Baader, C. Lutz, B. Motik (eds.), Proceedings of the 21st International Workshop on Description Logics (DL’08), vol. 353 of CEUR-WS, 2008, Dresden, Germany, May 13-16, 2008.

[2] K. Kaneiwa, K. Satoh, Consistency checking algorithms for restricted UML class diagrams, in: Proceedings of FoIKS ’06, Springer Verlag, 2006.

[3] Y. Smaragdakis, C. Csallner, R. Subramanian, Scalable automatic test data generations from modeling diagrams, in: Proceedings of the 22nd IEEE/ACM International Conference on Automated Software Engineering (ASE’07), 2007, Nov. 5-9, Atlanta, Georgia, USA

[4] E. Franconi, G. Ng, The ICOM tool for intelligent conceptual modelling, in: 7th Workshop on Knowledge Representation meets Databases (KRDB’00), 2000, Berlin, Germany, 2000.

[5] C.M. Keet, Unifying industry-grade class-based conceptual data modeling languages with CMcom. 21st International Workshop on Description Logics (DL’08), 13-16 May 2008, Dresden, Germany. CEUR-WS, Vol-353.