A note on improving the quality of conceptual data models with a reasoner

Moving back to work-related topics, let us have a look at the quality of conceptual data models. For the ontologies-person: a conceptual data model is, roughly, a so-called “application ontology”, with data types, a relatively close resemblance to the database or application it was developed for, and generally without the heavy logical apparatus behind it. Some of the well-known conceptual data modeling languages are UML, ER/EER, and ORM/ORM2.

What exactly makes a conceptual model “good” or “bad” is not clearly specified, but experienced modellers know one when they see one. However, there are few experienced modellers compared to the number of databases and software applications around, they are not flawless (no one is), and when the conceptual model becomes large, errors creep in anyway due to so-called “cognitive overload”. Much effort has gone into improving the methodology of what information systems development calls the conceptual analysis stage of the software development process, as well as the (mostly graphical) conceptual modelling languages; both topics seem to be tremendous sources of turf wars. In addition, the contribution that computers and specialised software can make beyond the standard CASE tools is barely known and a largely ignored aspect of the whole modeling process. Those tools can, at best, validate the conceptual model, i.e., check that the model is syntactically correct, but not that it is semantically correct.

Now, I will try to tie together three seemingly, but not quite, independent events leading to the real point. (Note to the reader: they give context, but can be skipped.)

First, a few researchers in conceptual data modeling have taken notice of what is happening in the ontologies arena and, not surprisingly, have recently taken up the idea that something like that could well be used to improve conceptual data models, too [1-3]. The basic idea of reasoning over conceptual data models, however, was conceived at least as far back as the early ‘90s, when it was intended both to improve the quality of the models and to support schema-based database integration, although it has not enjoyed widespread user adoption (nor have the approaches reported in refs [1-3], for that matter). See the works by the DIS group of Lenzerini at “la Sapienza” University in Rome (Calvanese, De Giacomo, Lenzerini, Nardi, and colleagues).

Second, I had to visit the KSG at Meraka for 6 weeks as part of the Italy-South Africa cooperation on “Conceptual modeling and intelligent query formulation”, with the aim of finding some common interest to work on within the project’s scope (which was obviously not the PsyOps ontology development to ultimately streamline torture). There, the notion of ‘intelligent conceptual modelling’ came up again (see also the extended EMMSAD’08 presentation).

Third, last year a distinguished ex-Microsoft Visio senior programmer, Matthew Curland, visited us to explain the machinery behind the NORMA CASE tool (a free MS Visual Studio plugin, available from sourceforge), which automatically generates a range of types of application code (C#, SQL, etc.) based on an ORM2 conceptual data model. He, as well as other modellers, however, did not see an advantage in enhancing the quality of conceptual models with reasoners compared to the validator that both NORMA and its predecessor, VisioModeler, already have. Admittedly, we did not have many clear examples readily at hand back then.

Given these three events, and a recent ORM Foundation forum digression on solving problems vs. inventing them, I’ve tried to put my layout skills, preference for figures, and sense of colour-coding to, hopefully, good use to demonstrate unambiguously the differences between mere validation of a conceptual model and, among others, satisfiability checking. Automated reasoning over the conceptual data model fishes out semantic errors and, equally useful, derives additional constraints that were not explicitly modelled in the conceptual model (well, that were missed by the modeller). The examples were done with the reasoner-enhanced modeling tool icom [4] and compared to the NORMA CASE tool.
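To make the contrast between validation and satisfiability checking concrete, here is a minimal Python sketch. It is not how icom or NORMA work internally; the toy model, the class names, and the functions `is_valid`, `ancestors`, and `unsatisfiable_classes` are all hypothetical and invented for illustration. The model has only subclass and disjointness constraints. The validator merely checks that every constraint refers to a declared class, whereas the satisfiability check flags any class that ends up under two classes declared disjoint.

```python
from itertools import combinations

# Hypothetical toy model: all names and constraints are invented for illustration.
classes = {"Person", "Employee", "Student", "WorkingStudent", "Company"}
subclass_of = {  # child -> set of direct parents
    "Employee": {"Person"},
    "Student": {"Person"},
    "WorkingStudent": {"Employee", "Student"},
}
disjoint = {frozenset({"Employee", "Student"}), frozenset({"Person", "Company"})}


def is_valid(classes, subclass_of, disjoint):
    """Syntactic check only: every constraint mentions declared classes."""
    mentioned = set(subclass_of)
    for parents in subclass_of.values():
        mentioned |= parents
    for pair in disjoint:
        mentioned |= pair
    return mentioned <= classes


def ancestors(cls, subclass_of):
    """The class itself plus everything reachable via subclass links."""
    seen, stack = set(), [cls]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(subclass_of.get(current, ()))
    return seen


def unsatisfiable_classes(classes, subclass_of, disjoint):
    """Classes that can never have instances because they fall under two disjoint classes."""
    result = set()
    for cls in classes:
        supers = ancestors(cls, subclass_of)
        if any(frozenset(pair) in disjoint for pair in combinations(supers, 2)):
            result.add(cls)
    return result


print(is_valid(classes, subclass_of, disjoint))                       # True
print(sorted(unsatisfiable_classes(classes, subclass_of, disjoint)))  # ['WorkingStudent']
```

The model passes validation, yet `WorkingStudent` can never have an instance. A real reasoner covers far more than this one pattern (cardinalities, relationships, and so on); the sketch only shows the kind of error a purely syntactic check lets through.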

Considering the differences demonstrated in the pdf, we can go back to the notion of quality of conceptual data models: clearly, a model that is (i) consistent and (ii) as inclusive with the constraints as necessary is a better one. Regarding the former, detecting inconsistent (unsatisfiable) classes in time prevents the errors from propagating down to the implementation, where they would otherwise result in, e.g., a class that is never instantiated or a table that remains empty, which is normally not the intention. Regarding the latter, making implicit constraints explicit at the modeling stage can ensure their correct implementation in the software, and undesirable consequences can be fixed before implementation, as opposed to finding out during testing or operation and having to back-track the issue.
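As a rough illustration of what "making implicit constraints explicit" can mean, the following Python sketch derives two kinds of consequences from a toy model: subclass links that follow by transitivity, and disjointness that subclasses inherit from their superclasses. The model and the function names (`superclasses`, `implied_subclass_links`, `implied_disjointness`) are hypothetical; an actual reasoner derives many more kinds of constraints than these two.

```python
# Hypothetical toy model, invented for illustration.
subclass_of = {"Manager": {"Employee"}, "Employee": {"Person"}}  # child -> direct parents
disjoint = {frozenset({"Person", "Company"})}


def superclasses(cls, subclass_of):
    """All strict superclasses of cls, direct or inherited."""
    found, stack = set(), list(subclass_of.get(cls, ()))
    while stack:
        current = stack.pop()
        if current not in found:
            found.add(current)
            stack.extend(subclass_of.get(current, ()))
    return found


def implied_subclass_links(subclass_of):
    """Subclass links that follow by transitivity but were not modelled explicitly."""
    return {
        (child, ancestor)
        for child in subclass_of
        for ancestor in superclasses(child, subclass_of) - subclass_of[child]
    }


def implied_disjointness(subclass_of, disjoint):
    """Disjointness that classes inherit from explicitly disjoint superclasses."""
    all_classes = set(subclass_of)
    for parents in subclass_of.values():
        all_classes |= parents
    for pair in disjoint:
        all_classes |= pair
    derived = set()
    for a in all_classes:
        for b in all_classes:
            if a == b or frozenset({a, b}) in disjoint:
                continue  # skip self-pairs and constraints already stated
            up_a = superclasses(a, subclass_of) | {a}
            up_b = superclasses(b, subclass_of) | {b}
            if any(frozenset({x, y}) in disjoint for x in up_a for y in up_b):
                derived.add(frozenset({a, b}))
    return derived


print(implied_subclass_links(subclass_of))
print(implied_disjointness(subclass_of, disjoint))
```

Here the modeller stated only that `Person` and `Company` are disjoint; the sketch surfaces that `Manager` is also a `Person`, and that neither `Employee` nor `Manager` can be a `Company`. Seeing such consequences at modeling time is what lets the modeller confirm or correct them before they reach the implementation.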

A reasoner, be it a special-purpose one as in [2,3] or a DL-based one [1,4], thus does contribute to the goal of improving the quality of the conceptual model and, hence, the software. Or, to rephrase it in terms of solving problems: there is a lot of buggy software, and good conceptual modeling is a well-known, comparatively cheap way of preventing such problems, compared to costly and time-consuming bug-fixing and laborious maintenance. Reasoner-enhanced conceptual modelling, then, is another feature in the conceptual modeller’s ‘toolbox’ to prevent the problems that hitherto still fell through the cracks with traditional conceptual modeling.

But someone’s interests may not lie in obtaining good conceptual data models; after all, testers and programmers want to keep their jobs, and so do consultants for database and application software reengineering, or researchers who focus on how to deal with inconsistent databases, methods for elaborate maintenance strategies, and whatnot. There are other advantages, though, than just good conceptual data models or application ontologies. One is relegating the graphical syntax to “syntactic sugar” by unifying the modeling languages (see [5] and references therein). This, in turn, enables HCI researchers to look into what would be the best set of icons for graphical modeling, and natural language experts to enhance the textual interfaces for conceptual modelling, so as to make modelling a more fruitful process for the modeller and domain expert alike. It also lets ardent supporters of, say, UML collaborate constructively with modellers who fancy ORM, so that they can all keep their preferred diagrams yet work on one common conceptual data model. But more about that another time.

—————-

[1] M. Balaban, A. Maraee, A UML-based method for deciding finite satisfiability in description logics, in: F. Baader, C. Lutz, B. Motik (eds.), Proceedings of the 21st International Workshop on Description Logics (DL’08), vol. 353 of CEUR-WS, Dresden, Germany, May 13-16, 2008.

[2] K. Kaneiwa, K. Satoh, Consistency checking algorithms for restricted UML class diagrams, in: Proceedings of FoIKS ’06, Springer Verlag, 2006.

[3] Y. Smaragdakis, C. Csallner, R. Subramanian, Scalable automatic test data generation from modeling diagrams, in: Proceedings of the 22nd IEEE/ACM International Conference on Automated Software Engineering (ASE’07), Atlanta, Georgia, USA, Nov. 5-9, 2007.

[4] E. Franconi, G. Ng, The ICOM tool for intelligent conceptual modelling, in: 7th Workshop on Knowledge Representation meets Databases (KRDB’00), Berlin, Germany, 2000.

[5] C.M. Keet, Unifying industry-grade class-based conceptual data modeling languages with CMcom. 21st International Workshop on Description Logics (DL’08), 13-16 May 2008, Dresden, Germany. CEUR-WS, Vol-353.