A metamodel-driven approach for linking conceptual data models

Interoperability among applications and components of large complex software is still a bit of a nightmare and a with-your-hands-in-the-mud scenario that no-one looks forward to—people look forward to already having linked them up, so they can pose queries across departmental and institutional boundaries, or even across the different data sets within a research unit to advance their data analysis and discover new things.

Sure, ontologies can help with that, but you have to develop one if none is available, and sometimes it’s not even necessary. For instance, you have an ER diagram for the database and a UML model for the business layer. How do you link up those two?

Superficially, this looks easy: an ER entity type matches up with a UML class, and an ER relationship with an UML association. The devil is in the details, however. To name just a few examples: how are you supposed to match a UML qualified association, an ER weak entity type, or an ORM join-subset constraint, to any of the others?

Within the South Africa – Argentina bilateral collaboration project (scope), we set out to solve such things. Although we first planned to ‘simply’ formalize the most common conceptual data modelling languages (ER, UML, and ORM families), we quickly found out we needed not just an ‘inventory’ of terms used in each language matched to one in the other languages, but also under what conditions these entities can be used, hence, we needed a proper metamodel. This we published last year at ER’13 and MEDI’13 [1,2], which I blogged about last year. In the meantime, we not only have finalized the metamodel for the constraints, but also formalized the metamodel, and a journal article describing all this is close to being submitted.

But a metamodel alone doesn’t link up the conceptual data models. To achieve that, we, Pablo Follottrani and I, devised a metamodel-driven approach for conceptual model interoperability, which uses a formalised metamodel with a set of modular rules to mediate the linking and transformation of elements in the conceptual models represented in different languages. This also simplifies the verification of inter-model assertions and model conversion. Its description has recently been accepted as a full paper at the 8th International Web Rule Symposium 2014 (RuleML’14) [3], which I’ll present in Prague on 18 August.

To be able to assert a link between two entities in different models and evaluate automatically (or at least: systematically) whether it is a valid assertion and what it entails, you have to know i) what type of entities they are, ii) whether they are the same, and if not, whether one can be transformed into the other for that particular selection. So, to be able to have those valid inter-model assertions, an approach is needed for transforming one or more elements of a model in one language into another. The general idea of that is shown in the following picture, and explained briefly afterward.

Overview of the approach to transform a model represented in language A to one in language B, illustrated with some sample data from UML to ORM2 (Fig 1 in [3])

Overview of the approach to transform a model represented in language A to one in language B, illustrated with some sample data from UML to ORM2 (Fig 1 in [3])

We have three input items (top of the figure, with the ovals), then a set of algorithms and rules (on the right), and two outputs (bottom, green). The conceptual model is provided by the user, the formalized metamodel is available and a selection of it is included in the RuleML’14 paper [3], and the “vocabulary containing a terminology comparison” was published in ER’13 [1]. Our RuleML paper [3] zooms in on those rules for the algorithms, availing of the formalized metamodel and vocabulary. To give a taste of that (more below): the algorithm has to know that a UML class in the diagram can be mapped 1:1 to a ORM entity type, and that there is some rule or set of rules to transform a UML attribute to an ORM value type.

This is also can be used for the inter-model assertions, albeit in a slightly modified way overall, which is depicted below. Here we use not only the formalised metamodel and the algorithms, but also which entities have 1:1 mappings, which are equivalent but need several steps (called transformations), and which once can only be approximated (requiring user input), and it can be run in both directions from one fragment to the other (one direction is chosen arbitrarily).

Overview for checking the inter-model assertions, and some sample data, checking whether the UML Flower is the same as the ORM Flower (Fig. 2 in [3]).

Overview for checking the inter-model assertions, and some sample data, checking whether the UML Flower is the same as the ORM Flower (Fig. 2 in [3]).

The rules themselves are not directly from one entity in one model to another entity in another, as that would become too messy, isn’t really scalable, and would have lots of repetitions. We use the more efficient way of declaring rules for mapping a conceptual data model entity into its corresponding entity in the metamodel, do any mapping, transformation, or approximation there in the metamodel, and then map it into the matching entity in the other conceptual data model. The rules for the main entities are described in the paper: those for object types, relationships, roles, attributes, and value types, and how one can use those to build up more complex ones for validation of intermodal assertions.

This metamodel-mediated approach to the mappings is one nice advantage of having a metamodel, but one possibly could have gotten away with just having an ‘inventory’ of the entities, not all the extra effort with a full metamodel. But there are benefits to having that metamodel, in particular when actually validating mappings: it can drive the validation of mappings and the generation of model transformations thanks to the constraints declared in the metamodel. How this can work, is illustrated in the following example, showing one example of how the “process mapping assertions using the transformation algorithms” in the centre-part of Fig. 2, above, works out.

Example. Take i) two models, let’s call them Ma and Mb, ii) an inter-model assertion, e.g., a UML association Ra and an ORM fact type Rb, ii) the look-up list with the mappings, transformations, approximations, and the non-mappable elements, and iii) the formalised metamodel. Then the model elements of Ma and Mb are classified in terms of the metamodel, so that the mapping validation process can start. Let us illustrate that for some Ra to Rb (or vv.) mapping of two relationships.

  1. For the vocabulary table, we see that UML association and ORM fact type correspond to Relationship in the metamodel, and enjoy a 1:1 mapping. The ruleset that will be commenced with are R1 from UML to the metamodel and 2R to ORM’s fact type (see rules in the paper).
  2. The R1 and 2R rules refer to Role and Object type in the metamodel. Now things become interesting. The metamodel has represented that each Relationship has at least two Roles, which there are, and each one causes the role-rules to be evaluated, with Ro1 of Ra’s two association ends into the metamodel’s Role and 2Ro to ORM’s roles (‘2Ro’ etc. are the abbreviations of the rules; see paper [3] for details).
  3. The metamodel asserts that Role must participate in the rolePlaying relationship and thus that it has a participating Object type (possibly a subtype thereof) and, optionally, a Cardinality constraint. Luckily, they have 1:1 mappings.
  4. This, in turn causes the rules for classes to be evaluated. From the classes, we see in the metamodel that each Object type must have at least one Identification constraint that involves one or more attributes or value types (which one it is has, been determined by the original classification). This also then has to be mapped using the rules specified.

This whole sequence was set in motion thanks to the mandatory constraints in the metamodel, having gone Relationship to Role to Object type to Single identification (that, in turn, consults Attribute and Datatype for the UML to ORM example here). The ‘chain reaction’ becomes longer with more elaborate participating entities, such as a Nested object type.

Overall, the whole orchestration is no trivial matter, requiring all those inputs, and it won’t be implemented in one codefest on a single rainy Sunday afternoon. Nevertheless, the prospect for semantically good (correct) inter-model assertions across conceptual data modeling languages and automated validation thereof is now certainly a step closer to becoming a reality.

References

[1] Keet, C.M., Fillottrani, P.R. Toward an ontology-driven unifying metamodel for UML Class Diagrams, EER, and ORM2. 32nd International Conference on Conceptual Modeling (ER’13). W. Ng, V.C. Storey, and J. Trujillo (Eds.). Springer LNCS 8217, 313-326. 11-13 November, 2013, Hong Kong.

[2] Keet, C.M., Fillottrani, P.R. Structural entities of an ontology-driven unifying metamodel for UML, EER, and ORM2. 3rd International Conference on Model & Data Engineering (MEDI’13). A. Cuzzocrea and S. Maabout (Eds.) September 25-27, 2013, Amantea, Calabria, Italy. Springer LNCS 8216, 188-199.

[3] Fillottrani, P.R., Keet, C.M. Conceptual Model Interoperability: a Metamodel-driven Approach. 8th International Web Rule Symposium (RuleML’14), A. Bikakis et al. (Eds.). Springer LNCS 8620, 52-66. August 18-20, 2014, Prague, Czech Republic.

Advertisements

More lively Theory of Computation lectures with peer instruction

Although I don’t teach theory of computation these days, I did encounter something that I’m pretty sure of that would have made the lectures more lively compared to my attempts (blogged about earlier here), and deserves some ‘air time’: peer instruction [1].

My database systems, computer literacy, and networks students already encountered it during some of my lectures, but to briefly recap the general idea (pioneered by Eric Mazur): you insert ‘quiz questions’ (i.e., concept tests) into the lectures, students vote on the multiple choice question, then there is a discussion among the students about the answers, and a re-vote. This sounds perhaps too playful for higher education, but it has been shown to improve course grades by 34-45% [1, 2], thanks to, among others, active engagement, focusing on understanding principles, and student discussion involving ‘teaching each other’.

Even if one thinks it can be suitable for some courses but not others (like tough computer science courses), there’s even peer instruction course material for something so abstract as theory of computation, pioneered by Cynthia Lee and available through peerinstruction4cs. And it works [2].

The peer instruction slides for theory of computation and several other courses can be downloaded after registration. This I did, both out of curiosity and for getting ideas how I might improve the quizzes for the networks course I’m teaching, for which no materials are available yet. Here, I’ll give two examples of such questions for theory of computation to give an indication that it is feasible.

The first one is about tracing through an NFA, and grasping the notion that if any trace through the NFA computation ends in a final state with all input read, it accepts the string. The question is shown in the figure below, which I rendered with the AutoSim automata simulator.

Peer instruction quiz question about NFAs.

Peer instruction quiz question about NFAs.

A and D can be excluded ‘easily’: A has the second trace as [accept] but it rejects, and D has the first trace [reject] but it accepts (which can be checked nicely with the AutoSim file loaded into the tool). The real meat is with options B and C: if one trace accepts and the other rejects, then does the NFA as a whole accept the string or not? Yes, it does; thus, the answer is B.

The other question is further into the course material, on understanding reductions. Let us have ATM and it reduces to MYSTERY_LANGUAGE. Which of the following is true (given the above statement is true):

  1. a) You can reduce to MYSTERY_LANG from ATM.
  2. b) MYSTERY_LANG is undecidable.
  3. c) A decider for MYSTERY_LANG (if it exists) could be used to decide ATM.
  4. d) All of the above

Answering this during the lecture requires more than consulting a ‘cheat sheet’ on reductions. A is a rephrasing of the statement, so that holds (though this option may not be very fair with students whose first language is not English). You will have to remember that ATM is undecidable, and, if visually oriented, the following figure:

Graphical depiction of reduction

Graphical depiction of reduction

 

So, if ATM is undecidable (the “P1” on the left), then so is MYSTERY_LANGUAGE (the “P2” on the right); for the aficionados: there’s a theorem for that in Chapter 9 of [3], being ‘if there is a reduction from P1 to P2, then if P1 is undecidable, then so is P2’. So B holds. Thus, the answer is probably D, but let’s look at C just to be sure of D. Because ATM is ‘in’ MYSTERY_LANGUAGE, then if we can decide MYSTERY_LANGUAGE, then so can we decide ATM—it just happens to be the case we can’t (but if we could, it would have been possible). Thus, indeed, D is the answer.

Sure, there is more to theory of computation than quizzes alone, but feasible it is, and on top of that, student perception of peer instruction is apparently positive [4]. But before making the post too long on seemingly playful matters in education, I’m going to have a look at how that other game—the second semi-final—will unfold to see how interesting the world cup final is going be.

References

[1] Crouch, C.H., Mazur, E. (2001). Peer Instruction: Ten years of experience and results. American Journal of Physics, 69, 970-977.

[2] Bailey Lee, C., Garcia, S., Porter, L. (2013). Can Peer Instruction Be Effective in Upper-Division Computer Science Courses?. ACM Transactions on Computing Education, 13(3): 12-22.

[3] Hopcroft, J.E., Motwani, R. & Ullman, J.D. (2007). Introduction to Automata Theory, Languages, and Computation, 3rd ed., Pearson Education.

[4] Simon, B., Esper, S., Porter, L., Cutts, Q. (2013). Student experience in a student-centered peer instruction classroom. Proceedings of the International Computing Education Research Conference (ICER’13), ACM Conference Proceedings. pp129-136.