# Semantic interoperability of conceptual data modelling languages: FaCIL

Software systems aren’t getting any less complex to design, implement, and maintain, which applies to both the numerous diverse components and the myriad of people involved in the development processes. Even a straightforward configuration of a data­base back-end and an object-oriented front-end tool requires coordination among database analysts, programmers, HCI people, and increasing involvement of domain experts and stakeholders. They each may prefer, and have different competencies in, certain specific design mechanisms; e.g., one may want EER for the database design, UML diagrams for the front-end app, and perhaps structured natural language sentences with SBVR or ORM for expressing the business rules. This requires multi-modal modelling in a plurality of paradigms. This would then need to be supported by hybrid tools that offer interoperability among those modelling languages, since such heterogeneity won’t go away any time soon, or ever.

It is far from trivial to have these people work together whilst maintaining their preferred view of a unified system’s design, let alone doing all this design in one system. In fact, there’s no such tool that can seamlessly render such varied models across multiple modelling languages whilst preserving the semantics. At best, there’s either only theory that aims to do that, or only a subset of the respective languages’ features, or a subset of the required combinations. Well, more precisely, until our efforts. We set out to fill this gap in functionality, both in a theoretically sound way and implemented as proof-of-concept to demonstrate its feasibility. The latest progress was recently published in the paper entitled A framework for interoperability with hybrid tools in the Journal of Intelligent Information Systems [1], in collaboration with Germán Braun and Pablo Fillottrani.

First, we propose the Framework for semantiC Interoperability of conceptual data modelling Languages, FaCIL, which serves as the core orchestration mechanism for hybrid modelling tools with relations between components and a workflow that uses them. At its centre, it has a metamodel that is used for the interchange between the various conceptual models represented in different languages and it has sets of rules to and from the metamodel (and at the metamodel level) to ensure the semantics is preserved when transforming a model in one language into a model in a different language and such that edits to one model automatically propagate correctly to the model in another language. In addition, thanks to the metamodel-based approach, logic-based reconstructions of the modelling languages also have become easier to manage, and so a path to automated reasoning is integrated in FaCIL as well.

This generic multi-modal modelling interoperability framework FaCIL was instantiated with a metamodel for UML Class Diagrams, EER, and ORM2 interoperability specifically [2] (introduced in 2015), called the KF metamodel [3] with its relevant rules (initial and implemented ones), an English controlled natural language, and a logic-based reconstruction into a fragment of OWL (orchestration graphically from the paper). This enables a range of different user interactions in the modelling process, of which an example of a possible workflow is shown in the following figure.

These theoretical foundations were implemented in the web-based crowd 2.0 tool (with source code). crowd 2.0 is the first hybrid tool of its kind, tying together all the pieces such that now, instead of partial or full manual model management of transformations and updates in multiple disparate tools, these tasks can be carried out automatically in one application and therewith also allow diverse developers and stakeholders to work from a shared single system.

We also describe a use case scenario for it – on Covid-19, as pretty much all of the work for this paper was done during the worse-than-today’s stage of the pandemic – that has lots of screenshots from the tool in action, both in the paper (starting here, with details halfway in this section) and more online.

Besides evaluating the framework with an instantiation, a proof-of-concept implementation of that instantiation, and a use case, it was also assessed against the reference framework for conceptual data modelling of Delcambre and co-authors [4] and shown to meet those requirements. Finally, crowd 2.0’s features were assessed against five relevant tools, considering the key requirements for hybrid tools, and shown to compare favourable against them (see Table 2 in the paper).

Distinct advantages can be summed up as follows, from those 26 pages of the paper, where the, in my opinion, most useful ones are underlined here, and the most promising ones to solve another set of related problems with conceptual data modelling (in one fell swoop!) in italics:

• One system for related tasks, including visual and text-based modelling in multiple modelling languages, automated transformations and update propagation between the models, as well as verification of the model on coherence and consistency.
• Any visual and text-based conceptual model interaction with the logic has to be maintained only in one place rather than for each conceptual modelling and controlled natural language separately;
• A controlled natural language can be specified on the KF metamodel elements so that it then can be applied throughout the models regardless the visual language and therewith eliminating duplicate work of re-specifications for each modelling language and fragment thereof;
• Any further model management, especially in the case of large models, such as abstraction and modularisation, can be specified either on the logic or on the KF metamodel in one place and propagate to other models accordingly, rather than re-inventing or reworking the algorithms for each language over and over again;
• The modular design of the framework allows for extensions of each component, including more variants of visual languages, more controlled languages in your natural language of choice, or different logic-based reconstructions.

Of course, more can be done to make it even better, but it is a milestone of sorts: research into the  theoretical foundations of this particular line or research had commenced 10 years ago with the DST/MINCyT-funded bi-lateral project on ontology-driven unification of conceptual data modelling languages. Back then, we fantasised that, with more theory, we might get something like this sometime in the future. And we did.

References

[1] Germán Braun, Pablo Fillottrani, and C Maria Keet. A framework for interoperability with hybrid tools. Journal of Intelligent Information Systems, in print since 29 July 2022.

[2] Keet, C. M., & Fillottrani, P. R. (2015). An ontology-driven unifying metamodel of UML Class Diagrams, EER, and ORM2. Data & Knowledge Engineering, 98, 30–53.

[3] Fillottrani, P.R., Keet, C.M. KF metamodel formalization. Technical Report, Arxiv.org http://arxiv.org/abs/1412.6545. Dec 19, 2014. 26p.

[4] Delcambre, L. M. L., Liddle, S. W., Pastor, O., & Storey, V. C. (2018). A reference framework for conceptual modeling. In: 37th International Conference on Conceptual Modeling (ER’18). LNCS. Springer, vol. 11157, 27–42.

# The ontology-driven unifying metamodel of UML class diagrams, ER, EER, ORM, and ORM2

Metamodelling of conceptual data modelling languages is nothing new, and one may wonder why one would need yet another one. But you do, if you want to develop complex systems or integrate various legacy sources (which South Africa is going to invest more money in) and automate at least some parts of it. For instance: you want to link up the business rules modelled in ORM, the EER diagram of the database, and the UML class diagram that was developed for the application layer. Are the, say, Student entity types across the models really the same kind of thing? And UML’s attribute StudentID vs. the one in the EER diagram? Or EER’s EmployeesDependent weak entity type with the ORM business rule that states that “each dependent of an employee is identified by EmployeeID an the Dependent’s Name?

Ascertaining the correctness of such inter-model assertions in different languages does not require a comparison and contrast of their differences, but a way to harmonise or unify them. Some such models already exist, but they take subsets of the languages, whereas all those features do appear in actual models [1] (described here informally). Our metamodel, in contrast, aims to capture all constructs of the aforementioned languages and the constraints that hold between them, and generalize in an ontology-driven way so that the integrated metamodel subsumes the structural, static elements of them (i.e., the integrated metamodel has as them as fragments). Besides some updates to the earlier metamodel fragment presented in [2,3], the current version [4,5] also includes the metamodel fragment of their constraints (though omits temporal aspects and derived constraints). The metamodel and its explanation can be found in the paper in An ontology-driven unifying metamodel of UML Class Diagrams, EER, and ORM2 [4] that I co-authored with Pablo Fillottrani, and which was recently accepted in Data & Knowledge Engineering.

Methodologically, the unifying metamodel presented in An ontology-driven unifying metamodel of UML Class Diagrams, EER, and ORM2 [4], is ontological rather than formal (cf. all other known works). On that ‘ontology-driven approach’, here is meant the use of insights from Ontology (philosophy) and ontologies (in computing) to enhance the quality of a conceptual data model and obtain that ‘glue stuff’ to unify the metamodels of the languages. The DKE paper describes all that, such as: on the nature of the UML association/ORM fact type (different wording, same ontological commitment), attributes with and without data types, the plethora of identification constraints (weak entity types, reference modes, etc.), where can one reuse an ‘attribute’ if at all, and more. The main benefit of this approach is being able to cope with the larger amount of elements that are present in those languages, and it shows that, in the details, the overlap in features across the languages is rather small: 4 among the set of 23 types of relationship, role, and entity type are essentially the same across the languages (see figure below), and 6 of the 49 types of constraints. The metamodel is stable for the modelling languages covered. It is represented in UML for ease of communication, but, as mentioned earlier, it also has been formalised in the meantime [5].

Types of elements in the languages; black-shaded: entity is present in all three language families (UML, EER, ORM); dark grey: on two of the three; light grey: in one; while-filled: in none, but we added the more general entities to ‘glue’ things together. (Source: [4])

Metamodel fragment with some constraints among some of the entities. (Source [4])

The DKE paper also puts it in a broader context with examples, model analyses using the harmonised terminology, and a use case scenario that demonstrates the usefulness of the metamodel for inter-model assertions.

While the 24-page paper is rather comprehensive, research results wouldn’t live up to it if it didn’t uncover new questions. Some of them have been, and are being, answered in the meantime, such as its use for classifying models and comparing their characteristics [1,6] (blogged about here and here) and a rule-based approach to validating inter-model assertions [7] (informally here). Although the 3-year funded project on the Ontology-driven unification of conceptual data modelling languages—which surely contributed to realising this paper—just finished officially, we’re not done yet, or: more is in the pipeline. To be continued…

References

[1] Keet, C.M., Fillottrani, P.R. An analysis and characterisation of publicly available conceptual models. 34th International Conference on Conceptual Modeling (ER’15). Springer LNCS. 19-22 Oct, Stockholm, Sweden. (in press)

[2] Keet, C.M., Fillottrani, P.R. Toward an ontology-driven unifying metamodel for UML Class Diagrams, EER, and ORM2. 32nd International Conference on Conceptual Modeling (ER’13). W. Ng, V.C. Storey, and J. Trujillo (Eds.). Springer LNCS 8217, 313-326. 11-13 November, 2013, Hong Kong.

[3] Keet, C.M., Fillottrani, P.R. Structural entities of an ontology-driven unifying metamodel for UML, EER, and ORM2. 3rd International Conference on Model & Data Engineering (MEDI’13). A. Cuzzocrea and S. Maabout (Eds.) September 25-27, 2013, Amantea, Calabria, Italy. Springer LNCS 8216, 188-199.

[4] Keet, C.M., Fillottrani, P.R. An ontology-driven unifying metamodel of UML Class Diagrams, EER, and ORM2. Data & Knowledge Engineering. 2015. DOI: 10.1016/j.datak.2015.07.004. (in press)

[5] Fillottrani, P.R., Keet, C.M. KF metamodel Formalization. Technical Report, Arxiv.org http://arxiv.org/abs/1412.6545. Dec 19, 2014. 26p.

[6] Fillottrani, P.R., Keet, C.M. Evidence-based Languages for Conceptual Data Modelling Profiles. 19th Conference on Advances in Databases and Information Systems (ADBIS’15). Springer LNCS. Poitiers, France, Sept 8-11, 2015. (in press)

[7] Fillottrani, P.R., Keet, C.M. Conceptual Model Interoperability: a Metamodel-driven Approach. 8th International Web Rule Symposium (RuleML’14), A. Bikakis et al. (Eds.). Springer LNCS 8620, 52-66. August 18-20, 2014, Prague, Czech Republic.

# What’s in a publicly available conceptual data model?

We know the answer. And it’s not great from the viewpoint of usage of fancy language features and automated reasoning, which you may have guessed from an earlier post on a tractable encoding of ORM models (see also [1]). How we got there is summarised in the paper [2] that will be presented at the 34th Conference on Conceptual Modeling (ER’15) 19-22 in October in Sweden. An even shorter digest of it follows in the remainder of this post.

I collected 35 models in UML class diagram notation, 35 in ER or EER, and 35 in ORM or ORM2 from various resources, such as online diagrams and repositories (such as GenMyModel), scientific papers (ER conferences, ORM workshops and the like), and some textbooks; see dataset. Each element in the diagram was classified in terms of the unifying metamodel developed earlier with my co-author Pablo Fillottrani [3,4], so that we could examine and compare the contents across the aforementioned three modelling language ‘families’. This data was analysed (incidence, percentages, averages, etc. of the language features), and augmented with further assessments of ratios, like class:relationship, class:attribute and so on. All that had as scope to falsify the following hypotheses:

A: When more features are available in a language, they are used in the models.

B: Following the “80-20 principle”, about 80% of the entities present in the models are from the set of entities that appear in all three language families.

C: Given the different (initial) purposes of UML class diagrams, (E)ER, and ORM, models in each language still have a different characteristic ‘profile’.

To make a long story really short: Hypothesis A is validated, Hypothesis B is falsified, and Hypothesis C validated. Relaxing the constraints for hypothesis B in the sense of including a known transformation between attribute and value type, then it does make it to 87.5%. What was nice to see was the emerging informal profiles of how a typical diagram looks like in a language family, in that ER and EER and ORM and ORM are much more relationship-oriented than UML Class Diagrams, and that there are more hierarchies in the latter. Intuitively, we already kind of knew that, but it’s always better to have that backed up by data.

There are lots of other noteworthy things, like the large amount of aggregations in UML (yay) and weak entity types in ER and EER, and the—from an automated reasoning viewpoint—‘bummer’ of so few disjointness and completeness assertions in all of the models. More of that can be found in the paper (including a top-5 and a bottom-5 for each family), and you may want attend the ER conference so we can discuss more about it in person.

References

[1] Fillottrani, P.R., Keet, C.M., Toman, D. Polynomial encoding of ORM conceptual models in CFDInc∀-. 28th International Workshop on Description Logics (DL’15). Calvanese, D., and Konev, B. (Eds.), CEUR-WS vol. 1350, pp401-414. 7-10 June 2015, Athens, Greece.

[2] Keet, C.M., Fillottrani, P.R. An analysis and characterisation of publicly available conceptual models. 34th International Conference on Conceptual Modeling (ER’15). Springer LNCS. 19-22 Oct, Stockholm, Sweden. (accepted)

[3] Keet, C.M., Fillottrani, P.R. Toward an ontology-driven unifying metamodel for UML Class Diagrams, EER, and ORM2. 32nd International Conference on Conceptual Modeling (ER’13). W. Ng, V.C. Storey, and J. Trujillo (Eds.). Springer LNCS 8217, 313-326. 11-13 November, 2013, Hong Kong.

[4] Fillottrani, P.R., Keet, C.M. KF metamodel formalization. Technical Report, Arxiv.org 1412.6545. Dec 19, 2014. 26p.

# First tractable encoding of ORM conceptual data models

For (relatively) many years I’ve been focusing on as-expressive-as-possible languages to represent information and knowledge, including the computationally impractical full first order logic, because one would/should want to be as precise as possible and required to represent the subject domain in an ontology and universe of discourse for the application in a conceptual data model. After all, one can always throw out the computationally unpleasant constructs later during the implementation stage, if the ontology or conceptual data model is intended for use at runtime, such as OBDA [1], test data generate for verification [2], and in the query compilation stage in RDBMSs [3]. The resulting slimmed theories/models may be different for different applications, but then at least the set of slimmed theories/models share their common understanding.

So, now I ventured in that area, not because there’s some logic x and conceptual modeling language y has to be forced into it, but it actually appears that many fancy construct/features are not used in publicly available conceptual data models anyway (see data set and xls with some analysis). The timing of the outcome of the analysis of the data set coincided with David Toman’s visit to UCT as part of his sabbatical and Pablo Fillottrani’s visit, who enjoyed the last exchange of our bi-lateral project on the unification of conceptual data modelling languages (project page). To sum up the issue we were looking at: the need for run-time usage of conceptual data models requires a tractable logic-based reconstruction of the conceptual models (i.e., in at most PTIME), which appeared to hardly exist or miss constructs important for conceptual models (regardless whether that was ORM, EER or UML Class Diagrams), or both.

The solution ended up to be a logic-based reconstruction for most of ORM2 using the $\mathcal{CFDI}_{nc}^{\forall -}$ Description Logic, which also happens to be the first tractable encoding of (most of) ORM/ORM2. With this logic, several features important for conceptual models (i.e., occur relatively often) do have their proper encoding in the logic, notably n-aries, complex identification constraints, and n-ary role subsumption. The, admittedly quite tedious, mapping

Low resolution and small version of our DL15 poster summarising the contributions.

captures over 96% of the constructs used in practice in the set of 33 ORM diagrams we analysed (see data set). Further, the results are easily transferable to EER and UML Class diagrams, with an even greater coverage. The results (and comparison with related works) are presented in our recently accepted paper at the 28th International Workshop on Description Logics (DL’15) that will take place form 7 to 11 June in Athens, Greece.

The list of accepted papers of DL’15 is available, listing 21 papers with long presentations, 16 papers with short presentation, and 26 papers with poster presentations. David will present our results in the poster session, as it’s probably of more relevance in the conceptual modelling community (and I’ll be marking exams then), and some other accepted papers cover more new ground, such as casting schema.org as a description logic, temporal query answering in EL, exact learning of ontologies, and more. The proceedings is will be online on CEUR-WS in the upcoming days as volume 1350. I’ve added a mini version of our poster on the right. I tried tikzposter, as they look really cool, but it doesn’t support figures (other than those made in latex), so I resorted to ppt (that doesn’t support math), wondering why these issues haven’t been solved by now.

References

[1] Calvanese, D., Keet, C.M., Nutt, W., Rodriguez-Muro, M., Stefanoni, G. Web-based Graphical Querying of Databases through an Ontology: the WONDER System. ACM Symposium on Applied Computing (ACM SAC’10), March 22-26 2010, Sierre, Switzerland. pp 1389-1396.

[2] Toman, D., Weddell, G.E.: Fundamentals of Physical Design and Query Compilation. Synthesis Lectures on Data Management, Morgan & Claypool  Publishers (2011)

[3] Smaragdakis, Y., Csallner, C., Subramanian, R.: Scalable satisfiability checking and test data generation from modeling diagrams. Automation in Software Engineering 16, 73–99 (2009)

[4] Fillottrani, P.R., Keet, C.M., Toman, D. Polynomial encoding of ORM conceptual models in $\mathcal{CFDI}_{nc}^{\forall -}$. 28th International Workshop on Description Logics (DL’15). CEUR-WS vol xx., 7-10 June 2015, Athens, Greece.

# A metamodel-driven approach for linking conceptual data models

Interoperability among applications and components of large complex software is still a bit of a nightmare and a with-your-hands-in-the-mud scenario that no-one looks forward to—people look forward to already having linked them up, so they can pose queries across departmental and institutional boundaries, or even across the different data sets within a research unit to advance their data analysis and discover new things.

Sure, ontologies can help with that, but you have to develop one if none is available, and sometimes it’s not even necessary. For instance, you have an ER diagram for the database and a UML model for the business layer. How do you link up those two?

Superficially, this looks easy: an ER entity type matches up with a UML class, and an ER relationship with an UML association. The devil is in the details, however. To name just a few examples: how are you supposed to match a UML qualified association, an ER weak entity type, or an ORM join-subset constraint, to any of the others?

Within the South Africa – Argentina bilateral collaboration project (scope), we set out to solve such things. Although we first planned to ‘simply’ formalize the most common conceptual data modelling languages (ER, UML, and ORM families), we quickly found out we needed not just an ‘inventory’ of terms used in each language matched to one in the other languages, but also under what conditions these entities can be used, hence, we needed a proper metamodel. This we published last year at ER’13 and MEDI’13 [1,2], which I blogged about last year. In the meantime, we not only have finalized the metamodel for the constraints, but also formalized the metamodel, and a journal article describing all this is close to being submitted.

But a metamodel alone doesn’t link up the conceptual data models. To achieve that, we, Pablo Follottrani and I, devised a metamodel-driven approach for conceptual model interoperability, which uses a formalised metamodel with a set of modular rules to mediate the linking and transformation of elements in the conceptual models represented in different languages. This also simplifies the verification of inter-model assertions and model conversion. Its description has recently been accepted as a full paper at the 8th International Web Rule Symposium 2014 (RuleML’14) [3], which I’ll present in Prague on 18 August.

To be able to assert a link between two entities in different models and evaluate automatically (or at least: systematically) whether it is a valid assertion and what it entails, you have to know i) what type of entities they are, ii) whether they are the same, and if not, whether one can be transformed into the other for that particular selection. So, to be able to have those valid inter-model assertions, an approach is needed for transforming one or more elements of a model in one language into another. The general idea of that is shown in the following picture, and explained briefly afterward.

Overview of the approach to transform a model represented in language A to one in language B, illustrated with some sample data from UML to ORM2 (Fig 1 in [3])

We have three input items (top of the figure, with the ovals), then a set of algorithms and rules (on the right), and two outputs (bottom, green). The conceptual model is provided by the user, the formalized metamodel is available and a selection of it is included in the RuleML’14 paper [3], and the “vocabulary containing a terminology comparison” was published in ER’13 [1]. Our RuleML paper [3] zooms in on those rules for the algorithms, availing of the formalized metamodel and vocabulary. To give a taste of that (more below): the algorithm has to know that a UML class in the diagram can be mapped 1:1 to a ORM entity type, and that there is some rule or set of rules to transform a UML attribute to an ORM value type.

This is also can be used for the inter-model assertions, albeit in a slightly modified way overall, which is depicted below. Here we use not only the formalised metamodel and the algorithms, but also which entities have 1:1 mappings, which are equivalent but need several steps (called transformations), and which once can only be approximated (requiring user input), and it can be run in both directions from one fragment to the other (one direction is chosen arbitrarily).

Overview for checking the inter-model assertions, and some sample data, checking whether the UML Flower is the same as the ORM Flower (Fig. 2 in [3]).

The rules themselves are not directly from one entity in one model to another entity in another, as that would become too messy, isn’t really scalable, and would have lots of repetitions. We use the more efficient way of declaring rules for mapping a conceptual data model entity into its corresponding entity in the metamodel, do any mapping, transformation, or approximation there in the metamodel, and then map it into the matching entity in the other conceptual data model. The rules for the main entities are described in the paper: those for object types, relationships, roles, attributes, and value types, and how one can use those to build up more complex ones for validation of intermodal assertions.

This metamodel-mediated approach to the mappings is one nice advantage of having a metamodel, but one possibly could have gotten away with just having an ‘inventory’ of the entities, not all the extra effort with a full metamodel. But there are benefits to having that metamodel, in particular when actually validating mappings: it can drive the validation of mappings and the generation of model transformations thanks to the constraints declared in the metamodel. How this can work, is illustrated in the following example, showing one example of how the “process mapping assertions using the transformation algorithms” in the centre-part of Fig. 2, above, works out.

Example. Take i) two models, let’s call them Ma and Mb, ii) an inter-model assertion, e.g., a UML association Ra and an ORM fact type Rb, ii) the look-up list with the mappings, transformations, approximations, and the non-mappable elements, and iii) the formalised metamodel. Then the model elements of Ma and Mb are classified in terms of the metamodel, so that the mapping validation process can start. Let us illustrate that for some Ra to Rb (or vv.) mapping of two relationships.

1. For the vocabulary table, we see that UML association and ORM fact type correspond to Relationship in the metamodel, and enjoy a 1:1 mapping. The ruleset that will be commenced with are R1 from UML to the metamodel and 2R to ORM’s fact type (see rules in the paper).
2. The R1 and 2R rules refer to Role and Object type in the metamodel. Now things become interesting. The metamodel has represented that each Relationship has at least two Roles, which there are, and each one causes the role-rules to be evaluated, with Ro1 of Ra’s two association ends into the metamodel’s Role and 2Ro to ORM’s roles (‘2Ro’ etc. are the abbreviations of the rules; see paper [3] for details).
3. The metamodel asserts that Role must participate in the rolePlaying relationship and thus that it has a participating Object type (possibly a subtype thereof) and, optionally, a Cardinality constraint. Luckily, they have 1:1 mappings.
4. This, in turn causes the rules for classes to be evaluated. From the classes, we see in the metamodel that each Object type must have at least one Identification constraint that involves one or more attributes or value types (which one it is has, been determined by the original classification). This also then has to be mapped using the rules specified.

This whole sequence was set in motion thanks to the mandatory constraints in the metamodel, having gone Relationship to Role to Object type to Single identification (that, in turn, consults Attribute and Datatype for the UML to ORM example here). The ‘chain reaction’ becomes longer with more elaborate participating entities, such as a Nested object type.

Overall, the whole orchestration is no trivial matter, requiring all those inputs, and it won’t be implemented in one codefest on a single rainy Sunday afternoon. Nevertheless, the prospect for semantically good (correct) inter-model assertions across conceptual data modeling languages and automated validation thereof is now certainly a step closer to becoming a reality.

References

[1] Keet, C.M., Fillottrani, P.R. Toward an ontology-driven unifying metamodel for UML Class Diagrams, EER, and ORM2. 32nd International Conference on Conceptual Modeling (ER’13). W. Ng, V.C. Storey, and J. Trujillo (Eds.). Springer LNCS 8217, 313-326. 11-13 November, 2013, Hong Kong.

[2] Keet, C.M., Fillottrani, P.R. Structural entities of an ontology-driven unifying metamodel for UML, EER, and ORM2. 3rd International Conference on Model & Data Engineering (MEDI’13). A. Cuzzocrea and S. Maabout (Eds.) September 25-27, 2013, Amantea, Calabria, Italy. Springer LNCS 8216, 188-199.

[3] Fillottrani, P.R., Keet, C.M. Conceptual Model Interoperability: a Metamodel-driven Approach. 8th International Web Rule Symposium (RuleML’14), A. Bikakis et al. (Eds.). Springer LNCS 8620, 52-66. August 18-20, 2014, Prague, Czech Republic.

# CFP for WS on Logics and reasoning for conceptual models (LRCM’13)

From the ‘advertising department’ of promoting events I co-organise: here’s the Call for Papers for the LRCM’13 workshop.

================================================================
First Workshop on Logics and Reasoning for Conceptual Models (LRCM 2013)
14th of December 2013, Stellenbosch, South Africa
http://www.cair.za.net/LRCM2013/
co-located with the 19th International Conference on Logic for Programming,
Artificial Intelligence and Reasoning (LPAR-19), Stellenbosch, South Africa
==============================================================

There is an increase in complexity of information systems due to, among others, company mergers with information system integration, upscaling of scientific collaborations, e-government etc., which push the necessity for good quality information systems. An information system’s quality is largely determined in the conceptual modeling stage, and avoiding or fixing errors of the conceptual model saves resources during design, implementation, and maintenance. The size and high expressivity of conceptual models represented in languages such as EER, UML, and ORM require a logic-based approach in the representation of information and adoption of automated reasoning techniques to assist in the development of good quality conceptual models. The theory to achieve this is still in its infancy, however, with only a limited set of theories and tools that address subtopics in this area. This workshop aims at bringing together researchers working on the logic foundations of conceptual data modeling languages and the reasoning techniques that are being developed so as to discuss the latest results in the area.

**** Topics ****

Topics of interest include, but are not limited to:
– Logics for temporal and spatial conceptual models and BPM
– Deontic logics for SBVR
– Other logic-based extensions to standard conceptual modeling languages
– Unifying formalisms for conceptual schemas
– Decidable reasoning over conceptual models
– Dealing with finite and infinite satisfiability of a conceptual model
– Reasoning over UML state and behaviour diagrams
– Reasoning techniques for EER/UML/ORM
– Interaction between ontology languages and conceptual data modeling languages
– Tools for logic-based modeling and reasoning over conceptual models
– Experience reports on logic-based modelling and reasoning over conceptual models

To this end, we solicit mainly theoretical contributions with regular talks and implementation/system demonstrations and some modeling experience reports to facilitate cross-fertilization between theory and praxis. Selection of presentations is based on peer-review of submitted papers by at least 2 reviewers, with a separation between theory and implementation & experience-type of papers.

**** Submissions ****

We welcome submissions in LNCS style in the following two formats for oral presentation:
– Extended abstracts of maximum 2 pages;
– Research papers of maximum 10 pages.
Both can be submitted in pdf format via the EasyChair website at https://www.easychair.org/conferences/?conf=lrcm13

**** Important dates ****

Submission of papers/abstracts: 14 October 2013
Notification of acceptance:     14 November 2013
Workshop:                       14 December 2013

**** Organisation ****

Maria Keet, University of KwaZulu-Natal, South Africa, keet@ukzn.ac.za
Diego Calvanese, Free University of Bozen-Bolzano, Italy, calvanese@inf.unibz.it
Szymon Klarman, CAIR, UKZN / CSIR-Meraka Institute, South Africa, szymon.klarman@gmail.com
Arina Britz, CAIR, UKZN / CSIR-Meraka Institute, South Africa, abritz@csir.co.za

**** Programme Committee ****

Diego Calvanese, Free University of Bozen-Bolzano, Italy
Szymon Klarman, CAIR, UKZN / CSIR-Meraka Institute, South Africa
Maria Keet, University of KwaZulu-Natal, South Africa
Marco Montali, Free University of Bozen-Bolzano, Italy
Mira Balaban, Ben-Gurion University of the Negev, Israel
Meghyn Bienvenu, CNRS and Universite Paris-Sud, France
Terry Halpin, INTI International University, Malaysia
Anna Queralt, Barcelona Supercomputing Center, Spain
Vladislav Ryzhikov, Free University of Bozen-Bolzano, Italy
Till Mossakowski, University of Bremen, Germany
Alessandro Artale, Free University of Bozen-Bolzano, Italy
Giovanni Casini, CAIR, UKZN / CSIR-Meraka Institute, South Africa
Pablo Fillottrani, Universidad Nacional del Sur, Argentina
Chiara Ghidini, Fondazione Bruno Kessler, Italy
Roman Kontchakov, Birkbeck, University of London, United Kingdom
Oliver Kutz, University of Bremen, Germany
Tommie Meyer, CAIR, UKZN / CSIR-Meraka Institute, South Africa
David Toman, University of Waterloo, Canada

# Towards a metamodel for conceptual data modeling languages

There are several conceptual data modelling languages one can use to develop a conceptual data model that should capture the subject domain of the application area in an implementation-independent way. Complex software development may need to leverage the strengths of each language yet have the need for interoperability between the software components; e.g., an application layer object-oriented software design in a UML Class diagram that needs to be able to talk to the EER diagram for a relational database. Or one is at a state where there are already several conceptual data models for different applications, but they need to be integrated (or at least made compatible). For various reasons, each of these models may well be represented in a different language, such as in UML, EER, ORM, MADS, Telos etc. Superficially, these languages all seem quite similar, even though they are known to be distinct in ‘a few’ features, such as that UML Class Diagrams typically have methods, but EER does not.

To adequately deal with such scenarios, we need not a comparison of language features, but a unification to foster interoperability. However, no unifying framework exists that respects all of their language features. In addition, one may wonder about questions such as: where are the real commonalities ontologically, what is fundamentally different, and what is the same in underlying idea or meaning but only looks different on the surface? We—Pablo Fillottrani (with the Universidad Nacional Del Sur in Argentina) and I—aim to fill this gap.

As a first step, we designed a common, ontology-driven, metamodel[1] of the static, structural, components of ER, EER, UML v2.4.1, ORM, and ORM2, in such a way that each language is strictly a fragment of the encompassing metamodel. In the meantime, we also have developed the metamodel for the constraints, but for now, the results of the metamodel for the static, structural, components have been accepted at the 32nd International Conference on Conceptual Modeling (ER’13) [1] and 3rd International Conference on Model & Data Engineering (MEDI’13) [2]. There is no repetition among the papers; it merely has been split up into two papers because of page limitations of conference proceedings and the amount of results we had.

The ER’13 paper [1] presents an overview on the core entities and constraints, and an analysis on roles and relationships, their interaction with predicates, and attributes and value types (which we refine with the notion of dimensional attribute). The MEDI’13 paper [2] focuses more on all the structural components, and covers a discussion on classes/concepts, subsumption, aggregation, and nested entity types.

(Warning: spoiler alert…) Perhaps surprisingly, the intersection of all the features in the selected languages is rather small: role, relationship (including subsumption), and object type. The attributions—attributes, value types—are represented differently, but they aim to represent the same underlying idea of attributive properties, and several implicit aspects, such as dimensional attribute and its reusability and relationship versus predicate, have been made explicit. Regarding constraints, only disjointness, completeness, mandatory, object cardinality, and the subset constraint appear in the three language families. The two overview figures in the paper have the classes colour-coded to give an easier overview on how many of the elements are shared across languages, and the appendix contains a table/list of terminology across UML, EER and ORM2, like that UML’s “association” and EER’s “relationship” denote the same kind of thing.

This also received attention in the UKZN e-news letter here, which combined the announcement of the ER’13 paper with my participation in the Dagstuhl seminar on Reasoning over Conceptual Schemas last May, the DST/NRF funded South Africa – Argentina bi-lateral project on the unification of conceptual modelling languages, and Pablo’s visit to UKZN the previous two weeks.

References

[1] Keet, C.M., Fillottrani, P.R. Toward an ontology-driven unifying metamodel for UML Class Diagrams, EER, and ORM2. 32nd International Conference on Conceptual Modeling (ER’13). 11-13 November, 2013, Hong Kong. Springer LNCS (in print).

[2] Keet, C.M., Fillottrani, P.R. Structural entities of an ontology-driven unifying metamodel for UML, EER, and ORM2. 3rd International Conference on Model & Data Engineering (MEDI’13). September 25-27, 2013, Amantea, Calabria, Italy. Springer LNCS (in print).

[1] the term “metamodel” is used here in the common usage of the term in the literature, but, more precisely, we have a conceptual model that represents the entities and the constraints on their usage in a conceptual data model; e.g., it states that each relationship (association) is composed of at least two roles (association ends), that a nested entity type objectifies one relationship, that a multi-valued attribute is an attribute, and so on.

# Book chapter on conceptual data modelling for biological data

My invited book chapter, entitled “Ontology-driven formal conceptual data modeling for biological data analysis” [1], recently got accepted for publication in the Biological Knowledge Discovery Handbook: Preprocessing, Mining and Postprocessing of Biological Data, edited by Mourad Elloumi and Albert Y. Zomaya, and is scheduled for printing by Wiley early 2012.

All this started off with my BSc(Hons) in IT & Computing thesis back in 2003 and my first paper about the trials and tribulations of conceptual data modelling for bio-databases [2] (which is definitely not well-written, but has some valid points and has been cited a bit). In the meantime, much progress has been made on the topic, and I’ve learned, researched, and published a few things about it, too. So, what is the chapter about?

The main aspect is the ‘conceptual data modelling’ with EER, ORM, and UML Class Diagrams, i.e., concerning implementation-independent representations of the data to be managed for a specific application (hence, not ontologies for application-independence).

The adjective ‘formal’ is to point out that the conceptual modeling is not just about drawing boxes, roundtangles, and lines with some adornments, but there is a formal, logic-based, foundation. This is achieved with the formally defined CMcom conceptual data modeling language, which has the greatest common denominator between ORM, EER, and UML Class Diagrams. CMcom has, on the one hand, a mapping the Description Logic language DLRifd and, on the other hand, mappings to the icons in the diagrammatic languages. The nice aspect of this it that, at least in theory and to some extent in practice as well, one can subject it to automated reasoning to check consistency of the classes, of the whole conceptual data model, and derive implicit constraints (an example) or use it in ontology-based data access (an example and some slides on ‘COMODA’ [COnceptual MOdel-based Data Access], tailored to ORM and the horizontal gene transfer database as example).

Then there is the ‘ontology-driven’ component: Ontology and ontologies can aid in conceptual data modeling by providing solution to recurring modeling problems, an ontology can be used to generate several conceptual data models, and one can integrate (a section of) an ontology into a conceptual data model that is subsequently converted into data in database tables.

Last, but not least, it focuses on ‘biological data analysis’. A non-(biologist or bioinformatician) might be inclined to say that should not matter, but it does. Biological information is not as trivial as the typical database design toy examples like “Student is enrolled in Course”, but one has to dig deeper and figure out how to represent, e.g., catalysis, pathway information, the ecological niche. Moreover, it requires an answer to ‘which language features are ‘essential’ for the conceptual data modeling language?’ and if it isn’t included yet, how do we get it in? Some of such important features are n-aries (n>2) and the temporal dimension. The paper includes a proposal for more precisely representing catalysis, informed by ontology (mainly thanks to making the distinction between the role and its bearer), and shows how certain temporal information can be captured, which is illustrated by enhancing the model for SARS viral infection, among other examples.

The paper is not online yet, but I did put together some slides for the presentation at MAIS’11 reported on earlier, which might serve as a sneak preview of the 25-page book chapter, or you can contact me for the CRC.

References

[1] Keet, C.M. Ontology-driven formal conceptual data modeling for biological data analysis. In: Biological Knowledge Discovery Handbook: Preprocessing, Mining and Postprocessing of Biological Data. Mourad Elloumi and Albert Y. Zomaya (Eds.). Wiley (in print).

[2] Keet, C.M. Biological data and conceptual modelling methods. Journal of Conceptual Modeling, Issue 29, October 2003. http://www.inconcept.com/jcm.

# Identification and keys in UML Class Diagrams

ER, its extended version EER, ORM, and it extended version ORM2 have several options for identification with keys/reference schemes. UML Class Diagrams, on the other hand, have internal, system-generated, identifiers, with a little-known and underspecified option for user-defined identifiers inspired by ER’s keys, whose description is buried in the standard on pp290-293 [1]. Although UML’s ease of system-generated identifiers relieves the burden of detailed conceptual analysis by the modeller, it is exactly making implicit subject domain semantics explicit that is crucial in the analysis stage; or: less analysis during the modelling stage stores up more problems down the road in terms of software bugs and interoperability. A uniform, or at least a structured and unambiguous approach, can reduce or even avoid inconsistencies in a conceptual data model, especially concerning taxonomies, achieve interoperability through less resource-consuming information integration, and if the identification mechanisms are the same across conceptual data modeling language, then unification among them comes a step closer. The present lack of harmonisation in handling identification hampers all this.

But which identification mechanism(s) could, or should, be specified more precisely for UML, and how does it rhyme with progress about identity in Ontology? How to incorporate it in UML to foster consistent usage? For instance, should the procedure to find and represent identity, or at least good keys, be a step in the modelling methodology, also be enforced in the CASE tool, or be part of the metamodel and/or in the conceptual modelling language itself?

The distinct implicit assumptions and explicit formalisations of extant identification schemes are elucidated in [2] for ER, EER, ORM, and ORM2. As it appears, not even ER/EER and ORM/ORM2 agree fully on keys and reference schemes, neither from a formal perspective (though the difference is minimal), nor from a methodological or CASE tool perspective. Both, however, do aim at strong or weak identification of entities with so-called natural keys (/semantic identifiers) in particular.

At the other end of the spectrum, Ontology does have something on offer, but the philosophers are only interested in identity of entities. Moreover, it is not just that they don’t agree on the details, but there is a tendency to admit it is nigh on inapplicable (see [3] for a good introduction). Watered-down versions of the notion of identity in philosophy that have been proposed in AI, are to used either only the necessary or only the sufficient conditions, which are at least somewhat applicable, be it as a basis for OntoClean [4], or in Description Logic languages (including OWL 2) with their primitive and defined classes. The details of the definitions, explanations, and pros and cons are presented in a ‘digested’ format in the first part of [2].

This analysis was subsequently used to increase the ontological foundations of UML in the second part of [2] to introduce two language enhancements for UML, being formally defined simple and compound identifiers and the notion of defined class, which also have a corresponding extension of UML’s metamodel. The proposed extensions focus on practical usability in conceptual data modelling, informed by ontology, and are approximations to the qualitative, relative, and synchronic identity, and to the notion of equivalent class (defined concept) in OWL (DL).

For those of you who do not care about the ‘unnecessary philosophizing’ (as one reviewer had put it) and justifications, there is a short (4-page) version [5] with the formal definitions and the UML extensions, which has been accepted at the SAICSIT’11 conference in Cape Town, South Africa. The longer version that provides explanation as to why the proposed extensions are the way they are is available as a SoCS technical report [2].

References

[1] Object Management Group. Ontology definition metamodel v1.0. Technical Report formal/2009-05-01, Object Management Group, 2009.

[2] Keet, C.M. Enhancing identification mechanisms in UML class diagrams with meaningful keys (extended version). Technical Report SoCS11-1, School of Computer Science, University of KwaZulu-Natal, South Africa. August 8, 2011.

[3] H. Noonan. Identity. In E. N. Zalta, editor, The Stanford Encyclopedia of Philosophy. Fall 2008 edition, 2008.

[4] N. Guarino and C. Welty. Identity, unity, and individuality: towards a formal toolkit for ontological analysis. In W. Horn, editor, Proc. of ECAI’00. IOS Press, Amsterdam, 2000.

[5] Keet, C.M. Enhancing Identification Mechanisms in UML Class Diagrams with Meaningful Keys. SAICSIT Annual Research Conference 2011 (SAICSIT’11), Cape Town, October 3-5, 2011. ACM Conference Proceedings (in print).

# Recap of the sixth workshop on Fact-Oriented Modelling: ORM’10

The sixth workshop on Fact-Oriented/Object-Role Modelling (ORM’10) in Hersonissou, Crete, Greece, and co-located with the OTM conference just came to a close after a long session on metamodelling to achieve a standard exchange format for the different ORM tools that are in use and under development (such as NORMA, DocTool, and CaseTalk). The other sessions during these three days were filled with paper presentations and several tool demos, reflecting not only the mixed audience of academia and industry, but also the versatility of fact-oriented modelling. I will illustrate some of that in the remainder of the post. (Note: ORM is a conceptual data modelling language that enjoys a formal foundation, and a graphical interface to draw the diagrams and a textual interface to verbalize the domain knowledge so as to facilitate communication with, and validation by, the domain experts.)

An overview of a novel mapping of ORM2 to DatalogLB was presented by Terry Halpin from LogicBlox and INTI International University [1]. The choice for such a mapping was motivated by the support for rules in Datalog so as to also have a formal foundation and implemented solution for the (derivation) rules one can define in an ORM conceptual data model in the NORMA tool.

Staying with formalisms (but of a different kind and scope), Fazat Nur Azizah from the Bandung Institute of Technology proposed a grammar to specify modelling patterns so that actual patterns can be reused for different conceptual data models—alike software design patterns, but then for the FCO-IM flavour of fact-oriented conceptual data modelling [2].

At the other end of the spectrum were two papers that proposed and assessed the use and benefits of ORM in the setting of understanding natural language text documents. Ron McFadyen from the University of Winnipeg introduced document literacy and ORM [3]. Peter Bollen from Maastricht University showed how ORM can improve the completeness and maintenance of specifications like the Business Process Model and Notation [4], which is in analogy with the WSML-documentation-in-ORM [5] and thereby thus strengthening the case that one indeed can be both more precise and communicative with one’s specification if accompanied by a representation in ORM.

There was a session on Master Data Management (MDM), presented by Baba Piprani from MetaGlobal Systems and Patricia Schiefelbein from Boston Scientific. However, I got a bit sidetracked when Baba Piprani had an interesting quote called the “Helsinki principle”, being

Any meaningful exchange of utterances depends upon the prior existence of an agreed set of semantic and syntactic rules. The recipients of the utterances must use only these rules to interpret the received utterances, if it is to mean the same as that which was meant by the utterer. (ISO TR9007)

whereas I was associating the term “Helsinki principle” with a wholly different story, being the right to self-determination described in the Helsinki accords on security and cooperation in Europe. Now, it happens to be the case that proper MDM contributes to solving semantic mismatches.

Last, there was a session on extensions. Tony Morgan from INTI International University [6] had a go at folding and zooming, presenting an alternative approach to abstraction for large ORM diagram (that is, alternative to [7,8] and the many other proposals outside ORM); it introduced new notations, the code-folding idea for but then for ORM diagrams, and a lightweight algorithm. Yan Tang from STARLab at the Free University of Brussels elaborated on the interaction between semantic decision tables and DOGMA [9] (DOGMA is an approach and tool that reuses ORM notation for ontology engineering). Last, but not least, I presented the paper by Alessandro Artale and myself about the basic constraints for relation migration [10], about which I wrote in an earlier blog post.

To wrap up, the workgroup on the common exchange format for fact-oriented modelling tools—chaired by Serge Valera from the European Space Agency—will continue their work toward standardization, the slides of the presentations will be made available on the ORM Foundation website in these days, and else it is on heading towards the 7th ORM workshop next year somewhere in the Mediterranean.

References

(Unfortunately, at the time of writing, most of the papers are still in the proceedings behind Springer’s paywall)

[1] Terry Halpin, Matthew Curland, Kurt Stirewalt, Navin Viswanath, Matthew McGill, and Steven Beck. Mapping ORM to Datalog: An Overview.  International Workshop on Fact-Oriented Modeling (ORM’10), Hersonissou, Greece, October 27-29, 2010. Meersman, R., Herrero, P. (Eds.), OTM Workshops, Springer, LNCS 6428, 504-513.

[2] Fazat Nur Azizah, Guido P. Bakema, Benhard Sitohang, and Oerip S. Santoso. Information Grammar for Patterns (IGP) for Pattern Language of Data Model Patterns Based on Fully Communication Oriented Information Modeling (FCO-IM). International Workshop on Fact-Oriented Modeling (ORM’10), Hersonissou, Greece, October 27-29, 2010. Meersman, R., Herrero, P. (Eds.), OTM Workshops, Springer, LNCS 6428, 522-531.

[3] Ron McFadyen and Susan Birdwise. Literacy and Data Modeling. International Workshop on Fact-Oriented Modeling (ORM’10), Hersonissou, Greece, October 27-29, 2010. Meersman, R., Herrero, P. (Eds.), OTM Workshops, Springer, LNCS 6428, p. 532-540.

[4] Peter Bollen. A Fact-Based Meta Model for Standardization Documents. International Workshop on Fact-Oriented Modeling (ORM’10), Hersonissou, Greece, October 27-29, 2010. Meersman, R., Herrero, P. (Eds.), OTM Workshops, Springer, LNCS 6428, p. 464-473.

[5] Tziviskou, C. and Keet, C.M. A Meta-Model for Ontologies with ORM2. Third International Workshop on Object-Role Modelling (ORM’07), Algarve, Portugal, Nov 28-30, 2007. Meersman, R., Tari, Z., Herrero., P. et al. (Eds.), Springer, LNCS 4805, 624-633.

[6] Tony Morgan. A Proposal for Folding in ORM Diagrams. International Workshop on Fact-Oriented Modeling (ORM’10), Hersonissou, Greece, October 27-29, 2010. Meersman, R., Herrero, P. (Eds.), OTM Workshops, Springer, LNCS 6428, 474-483.

[7] Keet, C.M. Using abstractions to facilitate management of large ORM models and ontologies. International Workshop on Object-Role Modeling (ORM’05). Cyprus, 3-4 November 2005. In: OTM Workshops 2005. Halpin, T., Meersman, R. (eds.), Lecture Notes in Computer Science LNCS 3762. Berlin: Springer-Verlag, 2005. pp603-612.

[8] Campbell, L.J., Halpin, T.A. and Proper, H.A.: Conceptual Schemas with Abstractions: Making flat conceptual schemas more comprehensible. Data & Knowledge Engineering (1996) 20(1): 39-85

[9] Yan Tang. Towards Using Semantic Decision Tables for Organizing Data Semantics. International Workshop on Fact-Oriented Modeling (ORM’10), Hersonissou, Greece, October 27-29, 2010. Meersman, R., Herrero, P. (Eds.), OTM Workshops, Springer, LNCS 6428, 494-503.

[10] Keet, C.M. and Artale, A. A basic characterization of relation migration. International Workshop on Fact-Oriented Modeling (ORM’10), Hersonissou, Greece, October 27-29, 2010. Meersman, R., Herrero, P. (Eds.), OTM Workshops, Springer, LNCS. 6428, 484-493.