A review on logics for conceptual data modelling

Pablo and I thought we could write the review quickly. We probably could have done so for a superficial review, describing the popular logics, formalisation decisions, and reasoning services for conceptual data models. Those were also the easiest sections to write, but reviewing some 30 years of research on only that theme was heading toward a ‘boring’ read. If the lingering draft review could have spoken to us last year, it would have begged to be fed and nurtured… and we listened, or, rather, we decided to put in some extra work.

There’s much more to the endeavour than a first glance would suggest, and so we started digging deeper to add more flavour and content. Clarifying the three main strands on logics for conceptual data modelling, for instance. Spelling out what the key dimensions are where one has to make choices when formalising a conceptual data model, just in case anyone else wants to give it a try, too. Elucidating distinctions between the two approaches to formalising the models, being rule-based and mapping-based, and where and how exactly that affects the whole thing.

A conceptual model describing the characteristics of the two main approaches used for creating logic-based reconstructions of conceptual data models: Mapping-based and rule-based. (See paper for details)
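To give a flavour of what such a logic-based reconstruction amounts to, here is an illustrative example of my own (not taken from the paper): a UML class diagram or EER fragment stating that each Employee works for exactly one Company could be reconstructed in a Description Logic roughly as

    \mathsf{Employee} \sqsubseteq \exists \mathsf{worksFor}.\mathsf{Company} \;\sqcap\; {\leq}1\,\mathsf{worksFor}.\mathsf{Company}

How one gets from the diagram to such axioms, what is preserved along the way, and what one can then do with them is where the formalisation decisions reviewed in the paper come in.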

Specifically, along the way in the paper, we try to answer four questions:

  • Q1: What are the tasks and challenges in that formalisation?
  • Q2: Which logics are popular for which (sub-)aim?
  • Q3: What are the known benefits of a logic-based reconstruction in terms of the outcome and in terms of reasoning services that one may use once a CDM is formalised?
  • Q4: What are some of the outstanding problems in logic-based conceptual data modelling?

Is there anything still to do on this topic, one may wonder, considering that it has been around since the 1990s? Few, if any, will care about just another formalisation and you’re unlikely to get that published no matter how much effort it took. Yet, Question 4 could indeed be answered and the answer is far from a ‘no’.

We need more evidence-based research, more tools with more features, and conceptual modelling methodologies that incorporate the automated reasoner. There’s some work to do to integrate better with closely related areas, or at least to offer lessons learnt and have results re-purposed, such as with ontology-based data access and with ShEx and SHACL for graphs. One could also use the logic foundations to explore new applications in contexts other than modelling that need such rigour, such as automated generation and maintenance of conceptual data models, multilingual models and related tasks with controlled natural languages or summarization (text generation from models), test data generation, and query optimization, among others.

More details of all this can be found in the (open access) paper:  

Pablo R. Fillottrani and C. Maria Keet. Logics for Conceptual Data Modelling: A Review. In Special Issue on Trends in Graph Data and Knowledge – Part 2. Transactions on Graph Data and Knowledge (TGDK), Volume 2, Issue 1, pp. 4:1-4:30, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024) https://doi.org/10.4230/TGDK.2.1.4

An illustration of an “ERDP” to create an EER diagram: the dance school database

How to develop a conceptual data model, such as an EER diagram, UML Class Diagram, or ORM model? Besides dropping icons here and there on an empty canvas, a few strategies exist for approaching it systematically, or at least in an assisted way, be it for ‘small data’ or for ‘big data’. One that I found useful to experiment with when I started out many years ago with the ‘small data’ cases was the Conceptual Schema Design Procedure (CSDP) for ORM, as summarised in Table 1 below. It is outlined in the whitepaper about Object-Role Modeling and its details span a few hundred pages in Terry Halpin’s books [Halpin01], further extended in his later works. Extended Entity-Relationship modelling is more popular than Object-Role Modeling, however, and yet there’s no such CSDP for it. The elements don’t have the same name and the list of possible constraints to take into account is not the same in both families of languages either [KeetFillottrani15]. So, I amended the procedure to make it work for EER.

Table 1. CSDP as summarised by Halpin in the white paper about Object-Role Modeling.

Step  Description
1     Transform familiar information examples into elementary facts, and apply quality checks
2     Draw the fact types, and apply a population check
3     Check for entity types that should be combined, and note any arithmetic derivations
4     Add uniqueness constraints, and check arity of fact types
5     Add mandatory role constraints, and check for logical derivations
6     Add value, set comparison and subtyping constraints
7     Add other constraints and perform final checks

Unsurprisingly, yes, it is feasible to rework the CSDP for ORM to also be of use for designing EER diagrams, in an “ERDP”, ER Design Procedure, if you will. A basic first version is described in Chapter 4 of my new book that is currently in print with Springer [Keet23] (and available for pre-order from multiple online retailers already). I padded the CSDP-like procedure a bit on both ends. There’s an optional preceding ‘step 0’ to explore the domain to prepare for a client meeting. Steps 1-7 are summarised in Table 2: listing the sample facts, drawing the core elements, and then adding constraints: cardinality, mandatory/optional participation, value, disjointness and completeness (a toy sketch of steps 1 and 2 follows after Table 2). Step 7 mostly amounts to adding nothing more, since EER has fewer constraints than ORM. Later steps may include quality improvements and various additions that some, but not all, EER variants have.

Table 2. Revised basic CSDP for EER diagrams.

Step  Description
0     Universe of discourse (subject domain) exploration
1     Transform familiar or provided sample examples into elementary facts, and apply quality checks
2     Draw the entity types, relationships, and attributes
3     Check for entity types that should be combined or generalised
4     Add cardinality constraints, and check arity of fact types
5     Add mandatory/optional constraints
6     Add value constraints and subtyping constraints
7     Add any other constraints of the EER variant used and perform final checks
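To give a concrete flavour of steps 1 and 2, here’s a minimal Python sketch of recording elementary facts and teasing out candidate entity types, relationships, and attributes from them. The facts and names (Student, DanceClass, and so on) are made up for illustration only, not taken from the book’s example, and a real CASE tool obviously does much more.

    # Step 1: elementary facts as (subject, predicate, object) triples.
    facts = [
        ("Student", "is enrolled in", "DanceClass"),   # thing-to-thing
        ("DanceClass", "is taught by", "Instructor"),  # thing-to-thing
        ("Student", "has", "name"),                    # thing-to-value
        ("DanceClass", "has", "start date"),           # thing-to-value
    ]
    value_like = {"name", "start date"}   # objects that look like values, not things

    # Step 2: a first pass at the diagram's core elements: entity types from the
    # 'things', relationships from thing-to-thing facts, attributes from thing-to-value facts.
    entity_types, relationships, attributes = set(), [], []
    for subj, pred, obj in facts:
        entity_types.add(subj)
        if obj in value_like:
            attributes.append((subj, obj))
        else:
            entity_types.add(obj)
            relationships.append((subj, pred, obj))

    print("Entity types: ", sorted(entity_types))
    print("Relationships:", relationships)
    print("Attributes:   ", attributes)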

The book’s chapter on conceptual data models also includes an example, of a size that fits neatly with the page budget and the rest of the content. As bonus material, I have now made a longer example available on this page, which is about developing an EER diagram for a database to manage data for a dance school.

Picture of our group dancing the “Ball de pastors del pirineo”.

I did go through a ‘step 0’ to explore the subject domain and refresh my knowledge of dance schools, which was facilitated by having been a member of several dance schools over the years. The example then goes through the 7-step procedure, which takes us from devising elementary facts, in a step-wise fashion with intermediate partial models, to the final diagram in Information Engineering notation, as shown in the following image:

Figure 1. The final EER diagram at the end of “step 6” of the procedure.

The dance school model description also hints at what lies beyond step 7, such as automated reasoning and ontology-driven aspects (not included in this basic version), and the page has a few notes on notations. I used IE notation because I really like the visuals of the crow’s feet for cardinality, but there’s a snag: some textbooks use Chen’s or a ‘Chen-like’ notation instead. Therefore, I added those variants near the end of the page.

Are the resulting models any better with such a basic procedure than without? I don’t know; it has never been tested. We have around 450 students who will have to learn EER in the first semester of their second year in computer science, so there may be plenty of participants for an experiment to make the conclusions more convincing. If you’re interested in teaming up for the research to find out, feel free to email me. 

References

[Halpin01] Halpin, T. Information Modeling and Relational Databases. San Francisco: Morgan Kaufmann Publishers. 2001.

[KeetFillottrani15] Keet, C.M., Fillottrani, P.R. An ontology-driven unifying metamodel of UML Class Diagrams, EER, and ORM2. Data & Knowledge Engineering, 2015, 98:30-53.

[Keet23] Keet, C.M. The What and How of Modelling Information and Knowledge: From Mind Maps to Ontologies. Springer, in press. ISBN-10: 3031396944; ISBN-13: 978-3031396946.

Systematic design of conceptual modelling languages

What would your ideal modelling language look like if you were to design one yourself? How would you go about defining your own language? The act of creating your own pet language can be viewed as a design process, and processes can be structured. It wasn’t the first thing we wanted to address when my collaborator Pablo Fillottrani and I were trying to design evidence-based conceptual data modelling languages. Yet. All those other conceptual modelling languages out there did not sprout from a tree; people designed them, albeit most often not in a systematic way. We wanted to design ours in a systematic, repeatable, and justified way.

More broadly, modelling is growing up as a field of specialisation, and is even claimed by some to deserve being a discipline of its own [CabotVallecillo22]. Surely someone must have thought of this notion of language design processes before? To a limited extent, yes. There are a few logicians who have thought about procedures and have used a procedure or part thereof. Two notable examples are OWL and DOL, both of which went through a requirements specification phase in which goals were formulated, after which the language was designed. OWL was also assessed on usage and a ‘lessons learned’ was extracted from it to add one round of improvements, which resulted in OWL 2.

But what would a systematic procedure look like? Ulrich Frank devised a waterfall methodology for domain-specific languages [Frank13], which are a bit different from conceptual data modelling languages. Pablo and I modified that to make it work for designing ontology languages. Its details, focussing on the additional ‘ontological analysis’ step, are described in our FOIS2020 paper [FillottraniKeet20], and I wrote a blog post about that before. It also includes the option to iterate over the steps, there are optional steps, and there is that ontological analysis where deciding on certain elements entails philosophical choices for one theory or another. We tweaked it further so that it also would work for conceptual data modelling language design, which was published in a journal article on the design of a set of evidence-based conceptual data modelling languages [FillottraniKeet21] in late 2021, but which I hadn’t gotten around to writing a blog post about yet. Let me summarise the steps visually in the figure below.

Overview of a procedure for conceptual modelling and ontology language design (coloured in from [FillottraniKeet21])

For marketing purposes, I probably should come up with an easily pronounceable name for the proposed procedure, like MeCModeL (Methodology for the Creation of Modelling Languages) or something; we’re open to suggestions. Be that as it may, let’s briefly summarise each step in the remainder of this post.

Step 1. Clarification of scope and purpose

We first need to clarify the scope, purpose, expected benefits, and possible long-term perspective, and consider the feasibility given the resources available. For instance, if you were to want to design a new conceptual data modelling language tailored to temporal model-based data access, and surpass UML class diagrams, it’s unlikely to work. For one, the Object Management Group has more resources, both in the short and in the long term, to promote and sustain UML. Second, reasoning over temporal constraints is computationally expensive, so it won’t scale to access large amounts of data. We’re halted in our tracks already. Let’s try this again. What about a new temporal UML that has a logic-based reconstruction for precision? Its purpose is to model more of the subject domain more precisely. The expected benefits would be better quality models, because they are more precise, and thus better quality applications. A long-term perspective does not apply, as it’s just a use case scenario here. Regarding feasibility, let’s assume we do have the competencies, people, and funding to develop the language and tool, and to carry out the evaluation.

Step 2. Analysis of general requirements

The analysis of general requirements can be divided into three parallel or sequential tasks: determining the requirements for modelling (and possibly the associated automated reasoning over the models), devising use case scenarios, and assigning priorities to each. An example of a requirement is the ability to represent change in the data and to keep track of it, such as the successive stages in signing computational legal contracts. Devising a list of requirements out of the blue is nontrivial, but there are a few libraries of possible requirements out there that can help with picking and choosing. For conceptual modelling languages there is no such library yet, but we did create a preliminary library of features for ontology languages that may be of use.

Use cases can vary widely, depending on the scope, purpose, and requirements of the language aimed for. For the modelling requirements, use cases describe the kind of things you want to be able to represent in the prospective language. For instance, that employee Jane as Product Manager may change her job in the company to Area Manager, or that she’s repeatedly assigned to a project for a specified duration. The former is an example of object migration and the latter of a ternary relationship or a binary with an attribute. An end user stakeholder bringing up these examples may not know that, but as language designer, one would need to recognise the language feature(s) needed for it. Another type of use case may be about how a modeller would interact with the language and the prospective modelling tool.
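As a minimal sketch of how the outputs of this step could be kept track of, with the object migration and ternary examples from above; the structure and names below are just an assumption for illustration, not a prescribed format.

    # Step 2 bookkeeping: requirements with priorities, and use cases that trace to them.
    requirements = {
        "R1": {"text": "represent object migration (e.g., Product Manager to Area Manager)",
               "priority": "high"},
        "R2": {"text": "represent ternary relationships (e.g., assigned to a project for a duration)",
               "priority": "medium"},
    }
    use_cases = [
        {"id": "UC1", "text": "record that Jane changes job within the company", "needs": ["R1"]},
        {"id": "UC2", "text": "record Jane's repeated project assignments with their duration", "needs": ["R2"]},
    ]

    # Sanity check: every use case must trace back to known requirements.
    for uc in use_cases:
        unknown = [r for r in uc["needs"] if r not in requirements]
        assert not unknown, f"{uc['id']} refers to unknown requirement(s): {unknown}"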

Step 3. Analysis of specific requirements and ontological analysis

Here’s where the ontological commitments are made, even if you don’t want to or think you don’t do so. Even before looking at the temporal aspects, the fact that we committed to UML class diagrams already entails we committed to, among others, the so-called positionalist commitment of relations and a class-based approach (cf. first order predicate logic, where there are just ordered relations of arity ≥1), and we adhere to the most common take on representing temporality, where there are 3-dimensional objects and a separate temporal dimension is added whenever the entity needs it (the other option being 4-dimensionalism). Different views affect how time is included in the language. With the ‘add time to a-temporal’ choice, there are still more decisions to take, like whether time is linear and whether it consists of adjacent successive timepoints (chronons) or whether another point can always be squeezed in between (dense time). Ontological differences they really are, even if you chose ‘intuitively’ hitherto. There are more such ontological decisions, besides these obvious ones on time and relations, which are described in our FOIS2020 paper. In all but one paper about languages, such choices were left implicit, and time will tell whether they’ll be picked up for the design of new languages.
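To make the positionalist point concrete, here is the textbook contrast in first order logic notation (my own illustration, not an excerpt from the FOIS2020 paper): the standard view uses an ordered binary predicate, whereas the positionalist view reifies the relationship and attaches the participants via named argument places, which is what UML’s association ends and ORM’s roles amount to.

    % standard (ordered-argument) view
    \mathit{worksFor}(\mathit{jane}, \mathit{acme})

    % positionalist view: the relationship is an entity with named positions
    \exists r\, \big(\mathit{WorksFor}(r) \wedge \mathit{employee}(r, \mathit{jane}) \wedge \mathit{employer}(r, \mathit{acme})\big)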

The other sub-step of step 3 has been very much to the fore if logic plays a role in the language design. Which elements are going to be in the language, what are they going to look like, how scalable does it have to be, and should it extend existing infrastructure or be something entirely separate from it? For our temporal UML, the answers may be that the atemporal elements are those from UML class diagrams, all the temporal stuff with their icons shall be carried over from the TREND conceptual data modelling language [KeetBerman17], and the underlying logic, DLRus, is not even remotely close to being scalable, so there is no existing tool infrastructure. Of course, someone else may make other decisions here.

Step 4. Language specification

Now we’re finally getting down to what from the outside may seem to be the only task: defining the language. There are two key ways of doing it: either define the syntax and the semantics, or make a metamodel for your language. The syntax can be informal-ish, like listing the permissible graphical elements and then a BNF grammar for how they can be used. We can do this more precisely for logics too, like stating that UML’s arrow for class subsumption is a ⇒ in our logic-based reconstruction rather than a →, as you wish. Once the syntax is settled, we need to give it meaning, or: define the semantics of the language. For instance, that a rectangle means that it’s a class that can have instances and a line between classes denotes a relationship. Or that that fancy arrow means that if C ⇒ D, then all instances of C are also instances of D in all possible worlds (that in the interpretation of C ⇒ D we have that C^I ⊆ D^I). Since logic is not everyone’s preference, metamodelling to define the language may be a way out; sometimes a language can be defined in its own language, sometimes not (e.g., ORM can be [Halpin04]). For our temporal UML example, we can use the conversions from EER to UML class diagrams (see, e.g., our FaCIL paper with the framework, implementation and the theory it uses), and then also reuse the extant logic-based reconstruction in the DLRus Description Logic.
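Spelled out a little more formally for the subsumption example, with the usual model-theoretic semantics and, for the temporal variant, a DLRus-style interpretation over a linear flow of time (a sketch for illustration; see the respective papers for the exact definitions):

    % atemporal: for every interpretation I = (Delta^I, .^I)
    C \Rightarrow D \text{ holds iff } C^{\mathcal{I}} \subseteq D^{\mathcal{I}}

    % temporal: interpretations are indexed by time points t of a linear flow of time
    \mathcal{I} = (\Delta^{\mathcal{I}}, \{\cdot^{\mathcal{I}(t)}\}_{t \in \mathbb{Z}}), \qquad
    C \Rightarrow D \text{ holds iff } C^{\mathcal{I}(t)} \subseteq D^{\mathcal{I}(t)} \text{ for all } t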

Once all that has been sorted, there’s still the glossary and documentation to write so that potential users and tool developers can figure out what you did. There’s neither a minimum nor a maximum page limit for it. The UML standard is over 700 pages long, DOL is 209 pages, and the CL Standard is 70 pages. Others hide their length by rendering the specification as a web page with figures and examples that can be toggled; the OWL 2 functional style syntax in A4-sized MS Word amounts to 118 pages in 12-point Times New Roman font, whereas the original syntax and semantics of the underlying logic SROIQ [HorrocksEtAl06], including the key algorithms, is just 11 pages, or about 20 when reformatted in 12-point single-column A4. And it may need to be revised due to potential infelicities uncovered in steps 5-7. For our temporal UML, there will be quite a number of pages.

Step 5. Design of notation for modeller

It may be argued that designing the notation is part of the language specification, but, practically, different stakeholders want different things out of it, especially if your language is more like a programming language or a logic rather than diagrammatic. Depending on your intended audience, graphical or textual notations may be preferred. You’ll need to tweak that additional notation and evaluate it with a representative selection of prospective users on whether the models are easy to understand and to create. To the best of my knowledge, that never happened at the bedrock of any of the popular logics, be it first order predicate logic, Description Logics, or OWL, which may well be a reason why there are so many research papers on providing nicer renderings of them, sugar-coating it either diagrammatically, with a controlled natural language, or a different syntax. OWL 2 has 5 different official syntaxes, even. For our hypothetical temporal UML: since we’re transferring TREND, we may as well do so for the graphical notation and the controlled natural language for it.

Step 6. Development of modelling tool

Create a computer-processable format of it, i.e., a serialisation, which assumes 1) you want to have it implemented and a modelling tool for it and 2) it wasn’t already serialised in step 4. If you don’t want an implementation, this step can be skipped. Creating such a serialisation format, however, will help getting the language adopted more widely than by yourself (although it’s by no means a guarantee that it will be). There are also other reasons why you may want to create a computer-processable version of the new language, such as sending it to an automated reasoner, or automatically checking that a model adheres to the language specification and highlighting syntax errors, or any other application scenario. Our fictitious temporal UML doesn’t have a computer-processable format and neither does TREND to copy it from, but we ought to create one, because we do want a tool for both.
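As an illustration of what such a computer-processable format might look like, here’s a small Python sketch that serialises a toy fragment of the fictitious temporal UML to JSON and reads it back. The JSON structure and the 'temporal' flag are entirely hypothetical; they are not an existing serialisation of TREND or of any UML tool.

    import json

    # A toy in-memory model: one atemporal class, one temporal class, one subsumption.
    model = {
        "language": "TemporalUML-sketch",   # hypothetical language identifier
        "classes": [
            {"name": "Employee", "temporal": False,
             "attributes": [{"name": "empID", "datatype": "string"}]},
            {"name": "Manager", "temporal": True, "attributes": []},  # instances may migrate over time
        ],
        "subsumptions": [{"sub": "Manager", "super": "Employee"}],
    }

    # Serialise so that a modelling tool or a reasoner bridge could read it back in.
    serialised = json.dumps(model, indent=2)
    assert json.loads(serialised) == model   # round-trip check
    print(serialised)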

Step 7. Evaluation and refinement

Evaluation involves defining and executing test cases to validate and verify the language. Remember those use cases from step 2 and the ontological requirements of step 3? They count as test cases: can that be modelled in the new language and does it have the selected features? If so, good; if not, you’d better have a good reason for why not. If you don’t, then you’ll need to return to step 4 to improve the language. For our temporal UML, we’re all sorted, as both the object and relation migration constraints can be represented, as well as ternaries.
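In code, such a feature-coverage test could be as simple as the following sketch (the feature names are invented for the running example; a real test suite would, of course, also try to express the actual use case models):

    # Features the (hypothetical) temporal UML specification claims to support,
    # versus the features demanded by the use cases and ontological requirements.
    specified_features = {"class", "binary association", "ternary association",
                          "attribute", "object migration", "relation migration"}
    required_features = {"object migration", "relation migration", "ternary association"}

    missing = required_features - specified_features
    if missing:
        print("Back to step 4; the language cannot express:", sorted(missing))
    else:
        print("All feature-coverage test cases pass.")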

Let’s optimistically assume it all went well with your design, and your language passes all those tests. The last task, at least for the first round, is to analyse the effect of usage in practice. Do users use it in the way intended? Are they under-using some language features and discovering they want another, now that they’re deploying it? Are there unexpected user groups with additional requirements that may be strategically savvy to satisfy? If the answers are a resounding ‘no’ to the second and third question in particular, you may rest on your laurels. If the answer is ‘yes’, you may need to cycle through the procedure again to incorporate updates and meet moving goalposts. There’s no shame in that. UML’s version 1.0 was released in 1997 and then came 1.1, 1.3, 1.4, 1.5, 2.0, 2.1, 2.1.1, 2.1.2, 2.2, 2.3, 2.4.1, 2.5, and 2.5.1. The UML 2.6 Revision Task Force faces an issue tracker of around 800 issues, five years after the 2.5.1 official release. They are not all issues with the UML class diagram language, but it does indicate things change. OWL had a first version in 2004 and then a revised one in 2008. ER evolved into EER; ORM into ORM2.

Regardless of whether your pet language is used by anyone other than yourself, it’s fun designing one, even if only because then you don’t have to abide by other people’s decisions on what features a modelling language should have, and if it turns out the same as an existing one, you’ll have a better understanding of why that language is the way it is. What the procedure does not include, but may help marketing your pet language, is how to name it. UML, ER, and ORM are not the liveliest acronyms and not easy to pronounce. Compare that to Mind Maps, which is a fine alliteration at least. OWL, for the web ontology language, is easy to pronounce and it is nifty in that owl-as-animal is associated with knowledge, and OWL is a knowledge representation language, albeit that this explanation is a tad long for explaining a name. Some of the temporal ER languages have good names too, like TimER and TREND. With this last naming consideration, we have stretched the current language development procedure as far as it will go.

In closing

The overall process is, perhaps, not an exciting one, but it will get the job done and you’ll be able to justify what you did and why. Such an explanation beats an ‘I just liked it this way’. It also may keep language scope creep in check, or at least help you become cognizant of it, and you may have an answer ready for a user asking for a feature.

Our evidence-based conceptual data modelling languages introduced in [FillottraniKeet21] have clear design rationales and evidence to back them up. We initially didn’t like them much ourselves, for they are lean languages rather than the very expressive ones that we’d hoped for when we started out with the investigation, but they do have their advantages, such as run-time usage in applications including ontology-based data access, automated verification, query compilation, and, last but not least, seamless interoperability among EER, UML class diagrams and ORM2 [BraunEtAl23].

References

[BraunEtAl23] Braun, G., Fillottrani, P.R., Keet, C.M. A Framework for Interoperability Between Models with Hybrid Tools, Journal of Intelligent Information Systems, (in print since July 2022).

[CabotVallecillo22] Cabot, Jordi and Vallecillo, Antonio. Modeling should be an independent scientific discipline. Software and Systems Modeling, 2022, 22:2101–2107.

[Frank13] Frank, Ulrich. Domain-specific modeling languages – requirements analysis and design guidelines. In Reinhartz-Berger, I.; Sturm, A.; Clark, T.; Bettin, J., and Cohen, S., editors, Domain Engineering: Product Lines, Conceptual Models, and Languages, pages 133–157. Springer, 2013.

[Halpin04] Halpin, T. A. Advanced Topics in Database Research, volume 3, chapter Comparing Metamodels for ER, ORM and UML Data Models, pages 23–44. Idea Publishing Group, Hershey PA, USA, 2004.

[HorrocksEtAl06] Horrocks, Ian, Kutz, Oliver, and Sattler, Ulrike. The even more irresistible SROIQ. Proceedings of KR-2006, AAAI, pages 457–467, 2006.

[FillottraniKeet20] Fillottrani, P.R., Keet, C.M. An analysis of commitments in ontology language design. 11th International Conference on Formal Ontology in Information Systems 2020 (FOIS’20). Brodaric, B and Neuhaus, F. (Eds.). IOS Press, FAIA vol. 330, 46-60.

[FillottraniKeet21] Fillottrani, P.R., Keet, C.M. Evidence-based lean conceptual data modelling languages. Journal of Computer Science and Technology, 2021, 21(2): 93-111.

[KeetBerman17] Keet, C.M., Berman, S. Determining the preferred representation of temporal constraints in conceptual models. 36th International Conference on Conceptual Modeling (ER’17). Mayr, H.C., Guizzardi, G., Ma, H. Pastor. O. (Eds.). Springer LNCS vol. 10650, 437-450. 6-9 Nov 2017, Valencia, Spain.

Semantic interoperability of conceptual data modelling languages: FaCIL

Software systems aren’t getting any less complex to design, implement, and maintain, which applies to both the numerous diverse components and the myriad of people involved in the development processes. Even a straightforward configuration of a database back-end and an object-oriented front-end tool requires coordination among database analysts, programmers, HCI people, and increasing involvement of domain experts and stakeholders. They each may prefer, and have different competencies in, certain specific design mechanisms; e.g., one may want EER for the database design, UML diagrams for the front-end app, and perhaps structured natural language sentences with SBVR or ORM for expressing the business rules. This requires multi-modal modelling in a plurality of paradigms. This would then need to be supported by hybrid tools that offer interoperability among those modelling languages, since such heterogeneity won’t go away any time soon, or ever.

Example of possible interactions between the various developers of a software system and the models they may be using.

It is far from trivial to have these people work together whilst maintaining their preferred view of a unified system’s design, let alone doing all this design in one system. In fact, there’s no tool that can seamlessly render such varied models across multiple modelling languages whilst preserving the semantics. At best, there’s either only theory that aims to do that, or only a subset of the respective languages’ features, or a subset of the required combinations. Well, more precisely: there wasn’t, until our efforts. We set out to fill this gap in functionality, both in a theoretically sound way and implemented as a proof-of-concept to demonstrate its feasibility. The latest progress was recently published in the paper entitled A framework for interoperability with hybrid tools in the Journal of Intelligent Information Systems [1], in collaboration with Germán Braun and Pablo Fillottrani.

First, we propose the Framework for semantiC Interoperability of conceptual data modelling Languages, FaCIL, which serves as the core orchestration mechanism for hybrid modelling tools, with relations between the components and a workflow that uses them. At its centre, it has a metamodel that is used for the interchange between the various conceptual models represented in different languages, and it has sets of rules to and from the metamodel (and at the metamodel level) to ensure the semantics is preserved when transforming a model in one language into a model in a different language, and such that edits to one model automatically propagate correctly to the model in another language. In addition, thanks to the metamodel-based approach, logic-based reconstructions of the modelling languages have also become easier to manage, and so a path to automated reasoning is integrated in FaCIL as well.
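A schematic sketch in Python of that central idea, rules to and from a shared metamodel so that an edit is applied once and regenerated elsewhere; the vocabulary and mappings below are invented and greatly simplified, not the actual FaCIL rules or the KF metamodel:

    # Language-specific names for the 'same kind of thing', via the shared metamodel.
    TO_METAMODEL = {"eer": {"entity type": "ObjectType"},
                    "uml": {"class": "ObjectType"}}
    FROM_METAMODEL = {lang: {mm: kind for kind, mm in mapping.items()}
                      for lang, mapping in TO_METAMODEL.items()}

    def to_metamodel(language, kind, name):
        """Map a language-specific element to its metamodel counterpart."""
        return {"kind": TO_METAMODEL[language][kind], "name": name}

    def from_metamodel(language, mm_element):
        """Regenerate the language-specific element from the metamodel element."""
        return {"kind": FROM_METAMODEL[language][mm_element["kind"]],
                "name": mm_element["name"]}

    # An entity type 'Student' added in the EER diagram propagates, via the
    # metamodel, to a class 'Student' in the UML class diagram.
    mm = to_metamodel("eer", "entity type", "Student")
    print(from_metamodel("uml", mm))   # {'kind': 'class', 'name': 'Student'}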

This generic multi-modal modelling interoperability framework FaCIL was instantiated with a metamodel for UML Class Diagrams, EER, and ORM2 interoperability specifically [2] (introduced in 2015), called the KF metamodel [3], with its relevant rules (initial and implemented ones), an English controlled natural language, and a logic-based reconstruction into a fragment of OWL (see the paper for a graphical rendering of the orchestration). This enables a range of different user interactions in the modelling process, of which an example of a possible workflow is shown in the following figure.

A sample workflow in the hybrid setting, showing interactions between visual conceptual data models (i.e., in their diagram version) and in their (pseudo-)natural language versions, with updates propagating to the others automatically. At the start (top), there’s a visual model in one’s preferred language from which a KF runtime model is generated. From there, it can go in various directions: verbalise, convert, or modify it. If the latter, then the KF runtime model is also updated and the changes are propagated to the other versions of the model, as often as needed. The elements in yellow/green/blue are thanks to FaCIL and the white ones are the usual tasks in the traditional one-off one-language modelling setting.

These theoretical foundations were implemented in the web-based crowd 2.0 tool (with source code). crowd 2.0 is the first hybrid tool of its kind, tying together all the pieces such that now, instead of partial or full manual model management of transformations and updates in multiple disparate tools, these tasks can be carried out automatically in one application, therewith also allowing diverse developers and stakeholders to work from a shared single system.

We also describe a use case scenario for it – on Covid-19, as pretty much all of the work for this paper was done during the worse-than-today’s stage of the pandemic – that has lots of screenshots from the tool in action, both in the paper (starting here, with details halfway in this section) and more online.

Besides evaluating the framework with an instantiation, a proof-of-concept implementation of that instantiation, and a use case, it was also assessed against the reference framework for conceptual data modelling of Delcambre and co-authors [4] and shown to meet those requirements. Finally, crowd 2.0’s features were assessed against five relevant tools, considering the key requirements for hybrid tools, and shown to compare favourably against them (see Table 2 in the paper).

Distinct advantages can be summed up as follows, from those 26 pages of the paper, where the, in my opinion, most useful ones are underlined here, and the most promising ones to solve another set of related problems with conceptual data modelling (in one fell swoop!) in italics:

  • One system for related tasks, including visual and text-based modelling in multiple modelling languages, automated transformations and update propagation between the models, as well as verification of the model on coherence and consistency.
  • Any visual and text-based conceptual model interaction with the logic has to be maintained only in one place rather than for each conceptual modelling and controlled natural language separately;
  • A controlled natural language can be specified on the KF metamodel elements so that it can then be applied throughout the models regardless of the visual language, therewith eliminating the duplicate work of re-specification for each modelling language and fragment thereof;
  • Any further model management, especially in the case of large models, such as abstraction and modularisation, can be specified either on the logic or on the KF metamodel in one place and propagate to other models accordingly, rather than re-inventing or reworking the algorithms for each language over and over again;
  • The modular design of the framework allows for extensions of each component, including more variants of visual languages, more controlled languages in your natural language of choice, or different logic-based reconstructions.

Of course, more can be done to make it even better, but it is a milestone of sorts: research into the theoretical foundations of this particular line of research had commenced 10 years ago with the DST/MINCyT-funded bi-lateral project on ontology-driven unification of conceptual data modelling languages. Back then, we fantasised that, with more theory, we might get something like this sometime in the future. And we did.

References

[1] Germán Braun, Pablo Fillottrani, and C Maria Keet. A framework for interoperability with hybrid tools. Journal of Intelligent Information Systems, in print since 29 July 2022.

[2] Keet, C. M., & Fillottrani, P. R. (2015). An ontology-driven unifying metamodel of UML Class Diagrams, EER, and ORM2. Data & Knowledge Engineering, 98, 30–53.

[3] Fillottrani, P.R., Keet, C.M. KF metamodel formalization. Technical Report, Arxiv.org http://arxiv.org/abs/1412.6545. Dec 19, 2014. 26p.

[4] Delcambre, L. M. L., Liddle, S. W., Pastor, O., & Storey, V. C. (2018). A reference framework for conceptual modeling. In: 37th International Conference on Conceptual Modeling (ER’18). LNCS. Springer, vol. 11157, 27–42.

Experimentally-motivated non-trivial intermodel links between conceptual models

I am well aware that some people prefer Agile and mash-ups and such to quickly, scruffily, put an app together, but putting a robust, efficient, lasting application together does require a bit of planning—analysis and design in the software development process. For instance, it helps to formalise one’s business rules or requirements, or at least structure them with, say, SBVR or ORM, so as to check that the rules obtained from the various stakeholders do not contradict each other, cf. running into problems in a testing phase down the line after having implemented them. Or analyse a bit upfront which classes are needed in the front-end application layer, cf. perpetual re-coding to fix one’s mistakes (under the banner ‘refactoring’, as if naming the process gives it an air of respectability), and create, say, a UML diagram or two. Or generate a well-designed database based on an EER model.

Each of these three components can be done in isolation, but how to do this for complex system development where the object-oriented application layer has to interact with the database back-end, all the while ensuring that the business rules are still adhered to? Or you had those components already, but they need to be integrated? One option is to link the code to tables in the implementation layer, on an ad hoc basis, and figure it out again and again for any combination of languages and systems. Another is to do that at the conceptual modelling layer, irrespective of the implementation language. The latter approach is reusable (cf. reinventing the mapping wheel time and again), and at a level of abstraction that is easier to cope with for more people, and even more so if the system is really large. So, we went after that option for the past few years and have just added another step to realising all this: how to link which elements in the different models for the system.

It is not difficult to imagine a tool where one can have several windows open, each with a model in some conceptual modelling language—many CASE tools already support modelling in different notations anyway. It is conceptually also fairly straightforward when in, say, the UML model there is a class ‘Employee’ and in the ER diagram there’s an ‘Employee’ entity type: it probably will work out to align these two. Implementing just this is a bit of an arduous engineering task, but doable. In fact, there is such a tool for models represented in the same language, where the links can be subsumption, equivalence, or disjointness between classes or between relationships: ICOM [2]. But we need something like that to work across modelling languages as well, and for attributes, too. In the hand-waving abstract sense, this may be intuitively trivial, but the gory details of the conceptual and syntax aspects are far from it. For instance, what should a modeller do if one model has ‘Address’ as an attribute and the other model has it represented as a class? Link the two despite being different types of constructs in the respective languages? Or that ever-recurring example of modelling marriage: a class ‘Marriage’ with (at least) two participants, or ‘Marriage’ as a recursive relationship (UML association) of a ‘Person’ class? What to do if a modeller in one model had chosen the former option and another modeller the latter? Can they be linked up somehow nonetheless, or would one have to waste a lot of time redesigning the other model?

Instead of analysing this for each case, we sought a generic solution to it; with ‘we’ being: Zubeida Khan, Pablo Fillottrani, Karina Cenci, and I. The solution we propose will appear soon in the proceedings of the 20th Conference on Advances in Databases and Information Systems (ADBIS’16) that will be held at the end of this month in Prague.

So, what did we do? First, we tried to narrow down the possible links between elements in the models: in theory, one might want to try to link anything to anything, but we already knew some model elements are incompatible, and we were hoping that others wouldn’t be needed whilst yet others were suspected to be needed, so that a particularly useful subset could be the focus. To determine that, we analysed a set of ICOM projects created by students at the Universidad Nacional del Sur (in Bahía Blanca), and we created model integration scenarios based on publicly available conceptual models of several subject domains, such as hospitals, airlines, and so on, including EER diagrams, UML class diagrams, and ORM models. An example of an integration scenario is shown in the figure below: two conceptual models about airline companies, with the ER diagram on the left and the UML diagram on the right.

One of the integration scenarios [1]

The solid purple links are straightforward 1:1 mappings; e.g., er:Airlines = uml:Airline. Long-dashed lines represent ‘half links’ that are semantically very similar, such as er:Flight.Arr_time ≈ uml:Flight.arrival_time, where the idea of attribute is the same, but ER attributes don’t have a datatype specified whereas UML attributes do. The red short-dashed lines require some transformation: e.g., er:Airplane.Type is an attribute yet uml:Aircraft is a class, and er:Airport.code is an identifier (with its mandatory 1:1 constraint, still no datatype) but uml:Airport.ID is just a simple attribute. Overall, we had 40 models with 33 schema matchings, with 25 links in the ICOM projects and 258 links in the integration scenarios. The detailed aggregates are described in the paper and the dataset is available for reuse (7MB). Unsurprisingly, there were more attribute links than class links (if a class can be linked, then typically also some of its attributes). There were 64 ‘half’ links and 48 transformation links, notably on the slightly compatible attributes, attributes vs. identifiers, attribute<->value type, and attribute<->class.

Armed with these insights from the experiment, a general intermodel link validation approach [3] that uses the unified metamodel [4], and knowledge of which types of element occur most in conceptual models with their logic-based profiles [5,6], we set out to define those half links and transformation links. While this could have been done with a logic of choice, we opted for a clear step toward implementability by exploiting the ATLAS Transformation Language (ATL) [7] to specify the transformations. As there’s always a source model and a target model in ATL, we constructed the mappings such that both models in question are the ‘source’ and both are mapped into a new, single, ‘target’ model that still adheres to the constraints imposed by the unifying metamodel. A graphical depiction of the idea is shown in the figure below; see the paper for the details of the mapping rules (they don’t look nice in a blog post).

Informal, graphical rendering of the rule Attribute<->Object Type output [1]
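To give a flavour of what such a transformation amounts to, here is a small Python sketch of the attribute-to-class case from the airline scenario, where er:Airplane.Type is linked to uml:Aircraft by producing, in the single target model, a class plus a relationship to its former ‘owner’. This is only a schematic rendering of the idea; the actual rules in the paper are written in ATL against the unifying metamodel.

    def attribute_to_class(owner, attribute, linked_class):
        """Link an attribute of 'owner' in one model to a class in the other model
        by promoting it in the merged target model (schematic, not the ATL rule)."""
        return {
            "object_types": [owner, linked_class],
            "relationships": [{"from": owner, "to": linked_class,
                               "name": f"has{linked_class}"}],   # invented naming convention
            "provenance": {"er": f"{owner}.{attribute}", "uml": linked_class},
        }

    target = attribute_to_class("Airplane", "Type", "Aircraft")
    print(target["relationships"][0])   # {'from': 'Airplane', 'to': 'Aircraft', 'name': 'hasAircraft'}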

Someone into this matter might think, based on this high-level description, that there’s nothing new here. However, there is, and the paper has an extensive related works section. For instance, there’s related work on Distributed Description Logics with bridge rules [8], but they don’t do attributes and the logics used there don’t fit well with the features needed for conceptual modelling, so it cannot be adopted without substantial modifications. Triple Graph Grammars look quite interesting [9] for this sort of task, as does DOL [10], but either would require another year or two to figure it out (feel free to go ahead already). On the practical side, e.g., the metamodel of the popular Eclipse Modeling Framework didn’t have enough in it for what needs to be included, both regarding types of entities and the constraints that would need to be enforced. And so on, such that, by a process of elimination, we ended up with ATL.

It would be good to come up with those logic-based linking options and proofs of correctness of the transformation rules presented in the paper, but in the meantime, an architecture design of the new tool was laid out in [11], which is in the stage of implementation as I write this. For now, at least a step has been taken from the three years of mostly theory and some experimentation toward implementation of all that. To be continued 🙂

 

References

[1] Khan, Z.C., Keet, C.M., Fillottrani, P.R., Cenci, K.M. Experimentally motivated transformations for intermodel links between conceptual models. 20th Conference on Advances in Databases and Information Systems (ADBIS’16). Springer LNCS. August 28-31, Prague, Czech Republic. (in print)

[2] Fillottrani, P.R., Franconi, E., Tessaris, S. The ICOM 3.0 intelligent conceptual modelling tool and methodology. Semantic Web Journal, 2012, 3(3): 293-306.

[3] Fillottrani, P.R., Keet, C.M. Conceptual Model Interoperability: a Metamodel-driven Approach. 8th International Web Rule Symposium (RuleML’14), A. Bikakis et al. (Eds.). Springer Lecture Notes in Computer Science LNCS vol. 8620, 52-66. August 18-20, 2014, Prague, Czech Republic.

[4] Keet, C.M., Fillottrani, P.R. An ontology-driven unifying metamodel of UML Class Diagrams, EER, and ORM2. Data & Knowledge Engineering, 2015, 98:30-53.

[5] Keet, C.M., Fillottrani, P.R. An analysis and characterisation of publicly available conceptual models. 34th International Conference on Conceptual Modeling (ER’15). Johannesson, P., Lee, M.L. Liddle, S.W., Opdahl, A.L., Pastor López, O. (Eds.). Springer LNCS vol 9381, 585-593. 19-22 Oct, Stockholm, Sweden.

[6] Fillottrani, P.R., Keet, C.M. Evidence-based Languages for Conceptual Data Modelling Profiles. 19th Conference on Advances in Databases and Information Systems (ADBIS’15). Morzy et al. (Eds.). Springer LNCS vol. 9282, 215-229. Poitiers, France, Sept 8-11, 2015.

[7] Jouault, F., Allilaire, F., Bézivin, J., Kurtev, I. ATL: a model transformation tool. Science of Computer Programming, 2008, 72(1-2):31-39.

[8] Ghidini, C., Serafini, L., Tessaris, S., Complexity of reasoning with expressive ontology mappings. Formal ontology in Information Systems (FOIS’08). IOS Press, FAIA vol. 183, 151-163.

[9] Golas, U., Ehrig, H., Hermann, F. Formal specification of model transformations by triple graph grammars with application conditions. Electronic Communications of the EASST, 2011, 39: 26.

[10] Mossakowski, T., Codescu, M., Lange, C. The distributed ontology, modeling and specification language. Proceedings of the Workshop on Modular Ontologies 2013 (WoMo’13). CEUR-WS vol 1081. Corunna, Spain, September 15, 2013.

[11] Fillottrani, P.R., Keet, C.M. A Design for Coordinated and Logics-mediated Conceptual Modelling. 29th International Workshop on Description Logics (DL’16). Peñaloza, R. and Lenzerini, M. (Eds.). CEUR-WS Vol. 1577. April 22-25, Cape Town, South Africa. (abstract)

Fruitful ADBIS’15 in Poitiers

The 19th Conference on Advances in Databases and Information Systems (ADBIS’15) just finished yesterday. It was an enjoyable and well-organised conference in the lovely town of Poitiers, France. Thanks to the general chair, Ladjel Bellatreche, and the participants I had the pleasure to meet up with, listen to, and receive feedback from. The remainder of this post mainly recaps the keynotes and some of the presentations.

 

Keynotes

The conference featured two keynotes, one by Serge Abiteboul and one by Jens Dittrich, both distinguished scientists in databases. Abiteboul presented the multi-year project on Webdamlog that ended up as a ‘personal information management system’, which is a simple term that hides the complexity that happens behind the scenes. (PIMS is informally explained here). It breaks with the paradigm of centralised text (e.g., Facebook) in favour of distributed knowledge. To achieve that, one has to analyse what’s happening and construct the knowledge from that, exchange knowledge, and reason and infer knowledge. This requires distributed reasoning, exchanging facts and rules, and taking care of access control. It is being realised with a datalog-style language that can also handle a non-local knowledge base. That is, there’s both solid theory and an implementation (going by the presentation; I haven’t had time to check it out).

The main part of the cool keynote talk by Dittrich was on ‘the case for small data management’. From the who-wants-to-be-a-millionaire style popquiz question asking us to guess the typical size of a web database, it appeared to be only in the MBs (which most of us overestimated), which sort of explains why MySQL [which doesn’t scale well] is used rather widely. This results in a mismatch between problem size and tools. Another popquiz question answer: 100MB of RDF can just as well be handled efficiently by Python, apparently. Interesting factoids, and ones that have/should have as consequence that we should perhaps be looking more into ‘small data’. He presented his work on PDbF as an example of that small data management. Very briefly, and based on my scribbles from the talk: it’s an enhanced PDF where you can also access the raw data behind the graphs in the paper (it is embedded in it, with an OLAP engine for posing the same and other queries), it has an HTML rendering so you can hover over the graphs, and there’s some more visualisation. If there’s software associated with the paper, it can go into the whole thing as well. Overall, that makes the data dynamic, manageable, traceable (from figure back to raw data), and re-analysable. The last part of his talk was on his experiences with the flipped classroom (more here; in German), but that was not nearly as fun as his analysis and criticism of the “big data” hype. I can’t recall exactly his plain English terms for the “four V’s”, but ‘lots of crappy XML data that changes’ is what remained of it in my memory bank (it was similar to the first 5 minutes of another keynote talk he gave).

 

Sessions

Sure, despite the notes on big data, there were presentations in the sessions that could be categorised under ‘big data’. Among others, Ajantha Dahanayake presented a paper on a proposal for requirements engineering for big data [1]. Big data people tend to assume it is just there already for them to play with. But how did it get there, and how does one collect good data? The presentation outlined a scenario-based backwards analysis, so that one can reduce unnecessary or garbage data collection. Dahanayake also has a tool for it. Besides the requirements analysis for big data, there’s also querying the data and the desire to optimize it so as to keep having fast responses despite its large size. A solution to that was presented by Reuben Ndindi, whose paper also won the best paper award of the conference [2] (for the Malawians at CS@UCT: yes, the Reuben you know). It was scheduled in the very last session on Friday and my note-taking had ground to a halt. If my memory serves me well, they make a metric database out of a regular database, compute the distances between the values, and evaluate the query on that, so as to obtain a good approximation of the true answer. There’s both a theoretical foundation and an experimental validation of the approach. In the end, it’s faster.

Data and schema evolution research is alive and well, as are time series and temporal aspects. Due to parallel sessions and my time constraints writing this post, I’ll mention only two on evolution; one because it was a very good talk, the other because of the results of the experiments. Kai Herrmann presented the CoDEL language for database evolution [3]. A database and the application that uses it change (e.g., adding an attribute, splitting a table), which requires quite lengthy scripts with lots of SQL statements to execute. CoDEL does it with fewer statements, and the language has the good quality of being relationally complete [3]. Lesley Wevers approached the problem from a more practical angle and restricted it to online databases. For instance, Wikipedia does make updates to their database schema, but they wouldn’t want to have Wikipedia go offline for that duration. How long does it take for which operation, in which RDBMS, and will it only slow down during the schema update, or block any use of the database entirely? The results obtained with MySQL, PostgreSQL and Oracle are a bit of a mixed bag [4]. It generated a lively debate during the presentation regarding the test set-up, what one would have expected the results to be, and the duration of blocking. There’s some work to do there yet.

The presentation of the paper I co-authored with Pablo Fillottrani [5] (informally described here) was scheduled for that dreaded 9am slot the morning after the social dinner. Notwithstanding, quite a few participants did show up, and they showed interest. The questions and comments had to do with earlier work we used as input (the metamodel), qualifying the quality of the conceptual models, and that all too familiar sense of disappointment that so few language features were used widely in publicly available conceptual models (the silver lining of excellent prospects for runtime usage of conceptual models notwithstanding). Why this is so, I don’t know, though I have my guesses.

 

And the other things that make conferences useful and fun to go to

In short: Networking, meeting up again with colleagues not seen for a while (ranging from a few months [Robert Wrembel] to some 8 years [Nadeem Iftikhar] and in between [a.o., Martin Rezk, Bernhard Thalheim]), meeting new people, exchanging ideas, and the social events.

2008 was the last time I’d been in France, for EMMSAD’08, where, looking back now, I coincidentally presented a paper also on conceptual modelling languages and logic [6], but one that looked at comprehensive feature coverage and comparing languages rather than unifying them. It was good to be back in France, and it was nice to realise my understanding and speaking skills in French aren’t as rusty as I thought they were. The travels from South Africa are rather long, but definitely worthwhile. And it gives me time to write blog posts while killing time at the airport.

 

References

(note: most papers don’t show up at Google scholar yet, hence, no links; they are on the Springer website, though)

[1] Noufa Al-Najran and Ajantha Dahanayake. A Requirements Specification Framework for Big Data Collection and Capture. ADBIS’15. Morzy et al. (Eds.). Springer LNCS vol. 9282.

[2] Boris Cule, Floris Geerts and Reuben Ndindi. Space-bounded query approximation. ADBIS’15. Morzy et al. (Eds.). Springer LNCS vol. 9282, 397-414.

[3] Kai Herrmann, Hannes Voigt, Andreas Behrend and Wolfgang Lehner. CoDEL – A Relationally Complete Language for Database Evolution. ADBIS’15. Morzy et al. (Eds.). Springer LNCS vol. 9282, 63-76.

[4] Lesley Wevers, Matthijs Hofstra, Menno Tammens, Marieke Huisman and Maurice van Keulen. Analysis of the Blocking Behaviour of Schema Transformations in Relational Database Systems. ADBIS’15. Morzy et al. (Eds.). Springer LNCS vol. 9282, 169-183.

[5] Pablo R. Fillottrani and C. Maria Keet. Evidence-based Languages for Conceptual Data Modelling Profiles. ADBIS’15. Morzy et al. (Eds.). Springer LNCS vol. 9282, 215-229.

[6] C. Maria Keet. A formal comparison of conceptual data modeling languages. EMMSAD’08. CEUR-WS Vol-337, 25-39.

The ontology-driven unifying metamodel of UML class diagrams, ER, EER, ORM, and ORM2

Metamodelling of conceptual data modelling languages is nothing new, and one may wonder why one would need yet another one. But you do, if you want to develop complex systems or integrate various legacy sources (which South Africa is going to invest more money in) and automate at least some parts of it. For instance: you want to link up the business rules modelled in ORM, the EER diagram of the database, and the UML class diagram that was developed for the application layer. Are the, say, Student entity types across the models really the same kind of thing? And UML’s attribute StudentID vs. the one in the EER diagram? Or EER’s EmployeesDependent weak entity type with the ORM business rule that states that “each dependent of an employee is identified by EmployeeID and the Dependent’s Name”?
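Such an inter-model assertion can be written down quite simply once the vocabularies are kept apart; for instance, asserting that the UML class and the EER entity type denote the same thing could, schematically, look as follows (an illustration of the idea, not the notation used in the paper):

    \forall x\, \big(\mathit{Student}_{\mathrm{UML}}(x) \leftrightarrow \mathit{Student}_{\mathrm{EER}}(x)\big)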

Ascertaining the correctness of such inter-model assertions in different languages does not require a comparison and contrast of their differences, but a way to harmonise or unify them. Some such metamodels already exist, but they take subsets of the languages, whereas all those features do appear in actual models [1] (described here informally). Our metamodel, in contrast, aims to capture all constructs of the aforementioned languages and the constraints that hold between them, and to generalise in an ontology-driven way so that the integrated metamodel subsumes the structural, static elements of them (i.e., the integrated metamodel has them as fragments). Besides some updates to the earlier metamodel fragment presented in [2,3], the current version [4,5] also includes the metamodel fragment of their constraints (though it omits temporal aspects and derived constraints). The metamodel and its explanation can be found in the paper An ontology-driven unifying metamodel of UML Class Diagrams, EER, and ORM2 [4] that I co-authored with Pablo Fillottrani, and which was recently accepted in Data & Knowledge Engineering.

Methodologically, the unifying metamodel presented in An ontology-driven unifying metamodel of UML Class Diagrams, EER, and ORM2 [4] is ontological rather than formal (cf. all other known works). By that ‘ontology-driven approach’ is meant the use of insights from Ontology (philosophy) and ontologies (in computing) to enhance the quality of a conceptual data model and obtain that ‘glue stuff’ to unify the metamodels of the languages. The DKE paper describes all that, such as: the nature of the UML association/ORM fact type (different wording, same ontological commitment), attributes with and without data types, the plethora of identification constraints (weak entity types, reference modes, etc.), where one can reuse an ‘attribute’, if at all, and more. The main benefit of this approach is being able to cope with the larger number of elements that are present in those languages, and it shows that, in the details, the overlap in features across the languages is rather small: 4 among the set of 23 types of relationship, role, and entity type are essentially the same across the languages (see figure below), as are 6 of the 49 types of constraints. The metamodel is stable for the modelling languages covered. It is represented in UML for ease of communication, but, as mentioned earlier, it has also been formalised in the meantime [5].

Types of elements in the languages; black-shaded: entity is present in all three language families (UML, EER, ORM); dark grey: in two of the three; light grey: in one; white-filled: in none, but we added the more general entities to ‘glue’ things together. (Source: [4])

Metamodel fragment with some constraints among some of the entities. (Source [4])
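To give an idea of how such a metamodel fragment can be put to work in software, here is a minimal sketch in Python. The entity names (Object type, Role, Relationship), the ‘each Relationship has at least two Roles’ constraint, and the rolePlaying participation follow the metamodel as described in the paper; the encoding itself, attribute names included, is a hypothetical illustration rather than the formalisation of [4,5].

```python
from dataclasses import dataclass, field
from typing import List

# A tiny, illustrative fragment of the unifying metamodel: the entity names
# follow [4]; this particular encoding is hypothetical.

@dataclass
class ObjectType:
    name: str                 # covers UML class, (E)ER entity type, ORM entity type


@dataclass
class Role:
    name: str
    played_by: ObjectType     # rolePlaying: each Role has a participating Object type


@dataclass
class Relationship:
    name: str                 # covers UML association, ER relationship, ORM fact type
    roles: List[Role] = field(default_factory=list)

    def check(self) -> None:
        # the metamodel requires each Relationship to have at least two Roles
        if len(self.roles) < 2:
            raise ValueError(f"Relationship '{self.name}' needs at least two roles")


# A UML association, an EER relationship, and an ORM fact type would all be
# classified as instances of Relationship in the metamodel.
enrolled = Relationship("enrolled in", [
    Role("student role", ObjectType("Student")),
    Role("course role", ObjectType("Course")),
])
enrolled.check()
```

A real artefact would, of course, have to cover all 23 types of element and 49 types of constraint mentioned above, which is what the formalisation in [5] does on paper.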

The DKE paper also puts it in a broader context with examples, model analyses using the harmonised terminology, and a use case scenario that demonstrates the usefulness of the metamodel for inter-model assertions.

While the 24-page paper is rather comprehensive, research results wouldn’t live up to their name if they didn’t uncover new questions. Some of them have been, and are being, answered in the meantime, such as its use for classifying models and comparing their characteristics [1,6] (blogged about here and here) and a rule-based approach to validating inter-model assertions [7] (informally here). Although the 3-year funded project on the Ontology-driven unification of conceptual data modelling languages (which surely contributed to realising this paper) just finished officially, we’re not done yet; more is in the pipeline. To be continued…

 

References

[1] Keet, C.M., Fillottrani, P.R. An analysis and characterisation of publicly available conceptual models. 34th International Conference on Conceptual Modeling (ER’15). Springer LNCS. 19-22 Oct, Stockholm, Sweden. (in press)

[2] Keet, C.M., Fillottrani, P.R. Toward an ontology-driven unifying metamodel for UML Class Diagrams, EER, and ORM2. 32nd International Conference on Conceptual Modeling (ER’13). W. Ng, V.C. Storey, and J. Trujillo (Eds.). Springer LNCS 8217, 313-326. 11-13 November, 2013, Hong Kong.

[3] Keet, C.M., Fillottrani, P.R. Structural entities of an ontology-driven unifying metamodel for UML, EER, and ORM2. 3rd International Conference on Model & Data Engineering (MEDI’13). A. Cuzzocrea and S. Maabout (Eds.) September 25-27, 2013, Amantea, Calabria, Italy. Springer LNCS 8216, 188-199.

[4] Keet, C.M., Fillottrani, P.R. An ontology-driven unifying metamodel of UML Class Diagrams, EER, and ORM2. Data & Knowledge Engineering. 2015. DOI: 10.1016/j.datak.2015.07.004. (in press)

[5] Fillottrani, P.R., Keet, C.M. KF metamodel Formalization. Technical Report, Arxiv.org http://arxiv.org/abs/1412.6545. Dec 19, 2014. 26p.

[6] Fillottrani, P.R., Keet, C.M. Evidence-based Languages for Conceptual Data Modelling Profiles. 19th Conference on Advances in Databases and Information Systems (ADBIS’15). Springer LNCS. Poitiers, France, Sept 8-11, 2015. (in press)

[7] Fillottrani, P.R., Keet, C.M. Conceptual Model Interoperability: a Metamodel-driven Approach. 8th International Web Rule Symposium (RuleML’14), A. Bikakis et al. (Eds.). Springer LNCS 8620, 52-66. August 18-20, 2014, Prague, Czech Republic.

Formalization of the unifying metamodel of UML, EER, and ORM

Last year Pablo Fillottrani and I introduced an ontology-driven unifying metamodel of the static, structural entities of UML Class Diagrams (v2.4.1), ER, EER, ORM, and ORM2 in [1,2], which was informally motivated and described here. This now also includes the constraints, and we have formalised it in first-order predicate logic to add precision to the UML Class Diagram fragments of the metamodel and their associated textual constraints, which is described in the technical report of the metamodel formalization [3]. Besides having such precision for the sake of it, it is also useful for automated checking of inter-model assertions and for computing model transformations, which we illustrated in our RuleML’14 paper earlier this year [4] (related blog post).

The ‘bummer’ of the formalization is that it probably requires an undecidable language, due to having formulae with five variables, counting quantifiers, and ternary predicates (see Section 2.11 of the tech report for details). To facilitate various possible uses nevertheless, we also made a slightly simpler OWL version of it (the modelling decisions are described in Section 3 of the technical report). Having that OWL version, it was easy to also generate a verbalisation of it (thanks to SWAT NL Tools), so as to facilitate reading of the ontology by the casually interested reader and by the very interested one who doesn’t like logic.
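For a flavour of the kind of first-order axioms involved, the constraint that each Relationship has at least two Roles could be written with a counting quantifier roughly as follows; this is an illustrative paraphrase with made-up predicate names, not a quote from the tech report [3], which has the actual axioms:

```latex
\forall x \, \big( \mathrm{Relationship}(x) \rightarrow \exists^{\geq 2} y \, ( \mathrm{has}(x,y) \wedge \mathrm{Role}(y) ) \big)
```

It is the combination of such counting quantifiers with ternary predicates and formulae that need five variables that, as noted above, probably pushes the full formalisation beyond decidability, and thus motivated the simpler OWL version.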

Although our DST/MINCyT-funded South Africa-Argentina scientific collaboration project (entitled Ontology-driven unification of conceptual data modelling languages) is officially in its last few months by now, more results are in the pipeline, which I hope to report on shortly.

References

[1] Keet, C.M., Fillottrani, P.R. Toward an ontology-driven unifying metamodel for UML Class Diagrams, EER, and ORM2. 32nd International Conference on Conceptual Modeling (ER’13). 11-13 November, 2013, Hong Kong. Springer LNCS vol 8217, 313-326.

[2] Keet, C.M., Fillottrani, P.R. Structural entities of an ontology-driven unifying metamodel for UML, EER, and ORM2. 3rd International Conference on Model & Data Engineering (MEDI’13). September 25-27, 2013, Amantea, Calabria, Italy. Springer LNCS 8216, 188-199.

[3] Fillottrani, P.R., Keet, C.M. KF metamodel formalization. Technical report, Arxiv.org http://arxiv.org/abs/1412.6545. Dec 19, 2014, 26p.

[4] Fillottrani, P.R., Keet, C.M. Conceptual Model Interoperability: a Metamodel-driven Approach. 8th International Web Rule Symposium (RuleML’14), A. Bikakis et al. (Eds.). Springer LNCS 8620, 52-66. August 18-20, 2014, Prague, Czech Republic.

A metamodel-driven approach for linking conceptual data models

Interoperability among applications and components of large, complex software is still a bit of a nightmare and a with-your-hands-in-the-mud scenario that no-one looks forward to. People look forward to already having linked them up, so that they can pose queries across departmental and institutional boundaries, or even across the different data sets within a research unit, to advance their data analysis and discover new things.

Sure, ontologies can help with that, but you have to develop one if none is available, and sometimes it’s not even necessary. For instance, you have an ER diagram for the database and a UML model for the business layer. How do you link up those two?

Superficially, this looks easy: an ER entity type matches up with a UML class, and an ER relationship with a UML association. The devil is in the details, however. To name just a few examples: how are you supposed to match a UML qualified association, an ER weak entity type, or an ORM join-subset constraint to any of the others?

Within the South Africa – Argentina bilateral collaboration project (scope), we set out to solve such things. Although we first planned to ‘simply’ formalize the most common conceptual data modelling languages (the ER, UML, and ORM families), we quickly found out we needed not just an ‘inventory’ of terms used in each language matched to one in the other languages, but also under what conditions these entities can be used; hence, we needed a proper metamodel. This we published at ER’13 and MEDI’13 last year [1,2], and blogged about at the time. In the meantime, we not only have finalized the metamodel for the constraints, but also formalized the metamodel, and a journal article describing all this is close to being submitted.

But a metamodel alone doesn’t link up the conceptual data models. To achieve that, we, Pablo Fillottrani and I, devised a metamodel-driven approach for conceptual model interoperability, which uses a formalised metamodel with a set of modular rules to mediate the linking and transformation of elements in the conceptual models represented in different languages. This also simplifies the verification of inter-model assertions and model conversion. Its description has recently been accepted as a full paper at the 8th International Web Rule Symposium 2014 (RuleML’14) [3], which I’ll present in Prague on 18 August.

To be able to assert a link between two entities in different models and evaluate automatically (or at least: systematically) whether it is a valid assertion and what it entails, you have to know i) what type of entities they are, ii) whether they are the same, and if not, whether one can be transformed into the other for that particular selection. So, to be able to have those valid inter-model assertions, an approach is needed for transforming one or more elements of a model in one language into another. The general idea of that is shown in the following picture, and explained briefly afterward.

Overview of the approach to transform a model represented in language A to one in language B, illustrated with some sample data from UML to ORM2 (Fig 1 in [3])

We have three input items (top of the figure, with the ovals), then a set of algorithms and rules (on the right), and two outputs (bottom, green). The conceptual model is provided by the user, the formalized metamodel is available and a selection of it is included in the RuleML’14 paper [3], and the “vocabulary containing a terminology comparison” was published in ER’13 [1]. Our RuleML paper [3] zooms in on those rules for the algorithms, availing of the formalized metamodel and vocabulary. To give a taste of that (more below): the algorithm has to know that a UML class in the diagram can be mapped 1:1 to an ORM entity type, and that there is some rule or set of rules to transform a UML attribute into an ORM value type.
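A minimal sketch of that look-up step, in Python, under the assumption of a toy excerpt of the vocabulary and of the classification of correspondences into 1:1 mappings, transformations, and approximations; the entries shown are a hypothetical fragment for illustration only, since the full vocabulary is in [1] and the actual rules in [3]:

```python
# Hypothetical excerpt of the 'vocabulary' input: language-specific terms
# classified against the metamodel (full comparison in [1]; rules in [3]).
VOCABULARY = {
    ("UML", "class"):        "Object type",
    ("ORM", "entity type"):  "Object type",
    ("UML", "association"):  "Relationship",
    ("ORM", "fact type"):    "Relationship",
    ("UML", "attribute"):    "Attribute",
    ("ORM", "value type"):   "Value type",
}

# Kind of correspondence between source and target, mediated by the metamodel:
# a 1:1 mapping, a multi-step transformation, or an approximation (user input needed).
CORRESPONDENCE = {
    ("Object type", "Object type"):   "1:1 mapping",
    ("Relationship", "Relationship"): "1:1 mapping",
    ("Attribute", "Value type"):      "transformation",
}

def classify(src_lang, src_term, tgt_lang, tgt_term):
    """Look up how a source element may be carried over into the target language."""
    src_mm = VOCABULARY[(src_lang, src_term)]
    tgt_mm = VOCABULARY[(tgt_lang, tgt_term)]
    return CORRESPONDENCE.get((src_mm, tgt_mm), "approximation or non-mappable")

print(classify("UML", "class", "ORM", "entity type"))     # -> 1:1 mapping
print(classify("UML", "attribute", "ORM", "value type"))  # -> transformation
```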

This can also be used for the inter-model assertions, albeit in a slightly modified way overall, which is depicted below. Here we use not only the formalised metamodel and the algorithms, but also which entities have 1:1 mappings, which are equivalent but need several steps (called transformations), and which ones can only be approximated (requiring user input), and it can be run in both directions from one fragment to the other (one direction is chosen arbitrarily).

Overview for checking the inter-model assertions, and some sample data, checking whether the UML Flower is the same as the ORM Flower (Fig. 2 in [3]).

The rules themselves do not map directly from an entity in one model to an entity in another model, as that would become too messy, would not really scale, and would involve lots of repetition. We use the more efficient way of declaring rules for mapping a conceptual data model entity into its corresponding entity in the metamodel, doing any mapping, transformation, or approximation there in the metamodel, and then mapping it into the matching entity in the other conceptual data model. The rules for the main entities are described in the paper: those for object types, relationships, roles, attributes, and value types, and how one can use those to build up more complex ones for the validation of inter-model assertions.
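The design choice can be sketched as follows, assuming placeholder rule bodies: the rule labels R1 and 2R are those used in the worked example below and in [3], but their content here is made up purely for illustration.

```python
# Placeholder sketch of the metamodel-mediated rules; R1 and 2R are the rule
# labels used in the example below and in [3], their bodies here are invented.

def rule_R1(uml_association):
    """UML association -> metamodel Relationship (placeholder)."""
    return {"metamodel_entity": "Relationship", "name": uml_association["name"]}

def rule_2R(mm_relationship):
    """Metamodel Relationship -> ORM fact type (placeholder)."""
    return {"orm_entity": "fact type", "name": mm_relationship["name"]}

def uml_association_to_orm_fact_type(uml_association):
    # Two hops via the metamodel instead of a direct UML-to-ORM rule: each
    # language needs rules to and from the metamodel only, rather than a
    # separate rule set for every pair of languages.
    return rule_2R(rule_R1(uml_association))

print(uml_association_to_orm_fact_type({"name": "enrolled in"}))
# {'orm_entity': 'fact type', 'name': 'enrolled in'}
```

Once a rule against the metamodel has been declared, it can be reused for any other language whose corresponding entity maps to the same metamodel entity, which is where the gain in scalability and the reduction of repetition come from.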

This metamodel-mediated approach to the mappings is one nice advantage of having a metamodel, although one possibly could have gotten away with just an ‘inventory’ of the entities, without all the extra effort of a full metamodel. The metamodel pays off in particular when actually validating mappings: it can drive the validation of mappings and the generation of model transformations thanks to the constraints declared in the metamodel. How this works is illustrated in the following example, which shows how the “process mapping assertions using the transformation algorithms” step in the centre of Fig. 2, above, plays out.

Example. Take i) two models, let’s call them Ma and Mb, ii) an inter-model assertion, e.g., between a UML association Ra and an ORM fact type Rb, iii) the look-up list with the mappings, transformations, approximations, and the non-mappable elements, and iv) the formalised metamodel. Then the model elements of Ma and Mb are classified in terms of the metamodel, so that the mapping validation process can start. Let us illustrate that for some Ra to Rb (or vice versa) mapping of two relationships.

  1. From the vocabulary table, we see that UML association and ORM fact type both correspond to Relationship in the metamodel and enjoy a 1:1 mapping. The rulesets to start with are R1, from UML into the metamodel, and 2R, from the metamodel to ORM’s fact type (see the rules in the paper).
  2. The R1 and 2R rules refer to Role and Object type in the metamodel. Now things become interesting. The metamodel has it that each Relationship has at least two Roles, which there are, and each one causes the role rules to be evaluated: Ro1 maps each of Ra’s two association ends into the metamodel’s Role, and 2Ro maps them onto ORM’s roles (‘2Ro’ etc. are the abbreviations of the rules; see the paper [3] for details).
  3. The metamodel asserts that Role must participate in the rolePlaying relationship and thus that it has a participating Object type (possibly a subtype thereof) and, optionally, a Cardinality constraint. Luckily, they have 1:1 mappings.
  4. This, in turn, causes the rules for classes to be evaluated. From the classes, we see in the metamodel that each Object type must have at least one Identification constraint that involves one or more attributes or value types (which one it is has been determined by the original classification). This also then has to be mapped, using the rules specified.

This whole sequence was set in motion thanks to the mandatory constraints in the metamodel, having gone from Relationship to Role to Object type to Single identification (which, in turn, consults Attribute and Datatype for the UML-to-ORM example here). The ‘chain reaction’ becomes longer with more elaborate participating entities, such as a Nested object type.
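A minimal sketch of that ‘chain reaction’, assuming a drastically simplified encoding of the metamodel’s mandatory participation constraints and a simple work list: validating one mapped element pulls in every entity it is required to have, until nothing reachable is left unchecked. The constraint table below is illustrative only and omits, among other things, the optional Cardinality constraint.

```python
# Illustrative only: mandatory participations of the metamodel, reduced to
# 'checking an instance of X requires checking instances of Y as well'.
MANDATORY = {
    "Relationship": ["Role", "Role"],               # each Relationship has >= 2 Roles
    "Role": ["Object type"],                        # rolePlaying: a Role is played by an Object type
    "Object type": ["Identification constraint"],   # each Object type needs an identifier
    "Identification constraint": [],
}

def validation_cascade(start="Relationship"):
    """Return the metamodel entities visited when validating one inter-model assertion."""
    to_check, visited = [start], []
    while to_check:
        entity = to_check.pop(0)
        visited.append(entity)
        to_check.extend(MANDATORY.get(entity, []))
    return visited

# Checking a Relationship mapping drags in its Roles, their Object types, and
# those Object types' identification constraints, mirroring steps 1-4 above.
print(validation_cascade())
```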

Overall, the whole orchestration is no trivial matter, requiring all those inputs, and it won’t be implemented in one codefest on a single rainy Sunday afternoon. Nevertheless, the prospect for semantically good (correct) inter-model assertions across conceptual data modeling languages and automated validation thereof is now certainly a step closer to becoming a reality.

References

[1] Keet, C.M., Fillottrani, P.R. Toward an ontology-driven unifying metamodel for UML Class Diagrams, EER, and ORM2. 32nd International Conference on Conceptual Modeling (ER’13). W. Ng, V.C. Storey, and J. Trujillo (Eds.). Springer LNCS 8217, 313-326. 11-13 November, 2013, Hong Kong.

[2] Keet, C.M., Fillottrani, P.R. Structural entities of an ontology-driven unifying metamodel for UML, EER, and ORM2. 3rd International Conference on Model & Data Engineering (MEDI’13). A. Cuzzocrea and S. Maabout (Eds.) September 25-27, 2013, Amantea, Calabria, Italy. Springer LNCS 8216, 188-199.

[3] Fillottrani, P.R., Keet, C.M. Conceptual Model Interoperability: a Metamodel-driven Approach. 8th International Web Rule Symposium (RuleML’14), A. Bikakis et al. (Eds.). Springer LNCS 8620, 52-66. August 18-20, 2014, Prague, Czech Republic.

Book chapter on conceptual data modeling for biology published

Just a quick note that my book chapter on “Ontology-driven formal conceptual data modeling for biological data analysis” finally has been published in the Biological Knowledge Discovery Handbook: Preprocessing, Mining and Postprocessing of Biological Data (edited by Mourad Elloumi and Albert Y. Zomaya). A summary of the chapter’s contents is described in an earlier blog post from little over two years ago, and I’ve put the preprint online.

The whole book is an impressive 1192 pages, consisting of 48 chapters of about 25 pages each, which are grouped into three main sections. The first section, Biological Data Pre-processing, has four parts: biological data management, biological data modeling (which includes my chapter), biological feature extraction, and biological feature selection. The second section, Biological Data Mining, has six parts: regression analysis of biological data, biological data clustering, biological data classification, association rules learning from biological data, text mining and application to biological data, and high-performance computing for biological data mining. The third section, Biological Data Post-processing, has only one part: biological knowledge integration and visualization (check the detailed table of contents). Happy reading!