Experimentally motivated non-trivial intermodel links between conceptual models

I am well aware that some people prefer Agile and mash-ups and such to quickly, scruffily, put an app together, but putting a robust, efficient, lasting application together does require a bit of planning—analysis and design in the software development process. For instance, it helps to formalise one’s business rules or requirements, or at least structure them with, say, SBVR or ORM, so as to check that the rules obtained from the various stakeholders do not contradict one another, cf. running into problems during the testing phase, after the system has already been implemented. Or to analyse a bit upfront which classes are needed in the front-end application layer, cf. perpetual re-coding to fix one’s mistakes (under the banner ‘refactoring’, as if naming the process gives it an air of respectability), and to create, say, a UML diagram or two. Or to generate a well-designed database based on an EER model.

Each of these three components can be done in isolation, but how does one do this for complex system development, where the object-oriented application layer has to interact with the database back-end, all the while ensuring that the business rules are still adhered to? Or you had those components already, but they need to be integrated? One could link the code to the tables at the implementation layer, on an ad hoc basis, and figure it out again and again for any combination of languages and systems. Another option is to do that at the conceptual modelling layer, irrespective of the implementation language. The latter approach is reusable (cf. reinventing the mapping wheel time and again) and at a level of abstraction that is easier to cope with for more people, even more so if the system is really large. So, that is the option we went after for the past few years, and we have just added another step to realising all this: how to link which elements in the different models for the system.

It is not difficult to imagine a tool where one can have several windows open, each with a model in some conceptual modelling language—many CASE tools already support modelling in different notations anyway. It is conceptually also fairly straightforward when, say, the UML model has a class ‘Employee’ and the ER diagram has an ‘Employee’ entity type: it probably will work out to align these two. Implementing just this is a bit of an arduous engineering task, but doable. In fact, there is such a tool for models represented in the same language, where the links can be subsumption, equivalence, or disjointness between classes or between relationships: ICOM [2]. But we need something like that to work across modelling languages as well, and for attributes, too. In the hand-waving abstract sense, this may be intuitively trivial, but the gory details of the conceptual and syntax aspects are far from it. For instance, what should a modeller do if one model has ‘Address’ as an attribute and the other model has it represented as a class? Link the two despite their being different types of constructs in the respective languages? Or take that ever-recurring example of modelling marriage: a class ‘Marriage’ with (at least) two participants, or ‘Marriage’ as a recursive relationship (UML association) of a ‘Person’ class? What to do if the modeller of one model had chosen the former option and the modeller of the other the latter? Can they be linked up somehow nonetheless, or would one have to waste a lot of time redesigning one of the models?
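To make these kinds of links concrete, here is a minimal sketch in Python (with hypothetical names; nothing from our actual tooling) of how such inter-model links could be recorded as data: each link stores the two elements, the construct each one is in its own language, and the kind of link, so that the ‘Address’ case above comes out as a link that needs a transformation rather than a plain 1:1 mapping.

    from dataclasses import dataclass

    @dataclass(frozen=True)
    class Element:
        model: str      # which model it comes from, e.g., 'EER-hospital'
        name: str       # e.g., 'Address'
        construct: str  # construct in that language: 'class', 'attribute', ...

    @dataclass(frozen=True)
    class Link:
        first: Element
        second: Element
        kind: str       # '1:1', or 'transformation' for construct mismatches

    # 'Address' as an EER attribute linked to 'Address' as a UML class:
    # different types of constructs, so a plain 1:1 mapping won't do.
    link = Link(
        Element('EER-hospital', 'Address', 'attribute'),
        Element('UML-hospital', 'Address', 'class'),
        kind='transformation',
    )
    print(link)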

Instead of analysing this for each case, we sought a generic solution to it; with ‘we’ being Zubeida Khan, Pablo Fillottrani, Karina Cenci, and myself. The solution we propose will appear soon in the proceedings of the 20th Conference on Advances in Databases and Information Systems (ADBIS’16), which will be held at the end of this month in Prague.

So, what did we do? First, we tried to narrow down the possible links between elements in the models: in theory, one might want to try to link anything to anything, but we already knew some model elements are incompatible, we were hoping that some others wouldn’t be needed, and yet others we suspected would be needed, so that a particularly useful subset could be the focus. To determine that, we analysed a set of ICOM projects created by students at the Universidad Nacional del Sur (in Bahía Blanca), and we created model integration scenarios based on publicly available conceptual models of several subject domains, such as hospitals, airlines, and so on, including EER diagrams, UML class diagrams, and ORM models. An example of an integration scenario is shown in the figure below: two conceptual models about airline companies, with the ER diagram on the left and the UML diagram on the right.

One of the integration scenarios [1]


The solid purple links are straightforward 1:1 mappings; e.g., er:Airlines = uml:Airline. Long-dashed lines represent ‘half links’ that are semantically very similar, such as er:Flight.Arr_time ≈ uml:Flight.arrival_time, where the idea of the attribute is the same, but ER attributes don’t have a datatype specified whereas UML attributes do. The red short-dashed lines require some transformation: e.g., er:Airplane.Type is an attribute yet uml:Aircraft is a class, and er:Airport.code is an identifier (with its mandatory 1:1 constraint, still no datatype) but uml:Airport.ID is just a simple attribute. Overall, we had 40 models with 33 schema matchings, with 25 links in the ICOM projects and 258 links in the integration scenarios. The detailed aggregates are described in the paper and the dataset is available for reuse (7MB). Unsurprisingly, there were more attribute links than class links (if a class can be linked, then typically so can some of its attributes). There were 64 ‘half’ links and 48 transformation links, notably on the slightly compatible attributes, attributes vs. identifiers, attribute<->value type, and attribute<->class.

Armed with these insights from the experiment, a general intermodel link validation approach [3] that uses the unified metamodel [4], and the knowledge of which types of element occur most in conceptual models with their logic-based profiles [5,6], we set out to define those half links and transformation links. While this could have been done with a logic of choice, we opted for a clear step toward implementability by exploiting the ATLAS Transformation Language (ATL) [7] to specify the transformations. As there’s always a source model and a target model in ATL, we constructed the mappings such that both models in question are the ‘source’ and both are mapped into a new, single, ‘target’ model that still adheres to the constraints imposed by the unifying metamodel. A graphical depiction of the idea is shown in the figure below; see the paper for the details of the mapping rules (they don’t look nice in a blog post).


Informal, graphical rendering of the rule Attribute<->Object Type output [1]
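To give a rough flavour of that two-sources-into-one-target pattern without reproducing the actual ATL rules (they are in the paper), here is a small sketch in Python over made-up element representations; the names and the default datatype are illustrative assumptions only.

    # Sketch of the two-sources-into-one-target idea (hypothetical names;
    # the actual rules are specified in ATL in the paper): elements of both
    # input models are mapped into one new model in terms of the metamodel.
    def transform(er_model: list[dict], uml_model: list[dict]) -> list[dict]:
        target = []
        for elem in er_model + uml_model:
            if elem['construct'] in ('entity type', 'class'):
                # 1:1 rule: both constructs correspond to Object type
                target.append({'construct': 'object type',
                               'name': elem['name']})
            elif elem['construct'] == 'attribute':
                # 'half link' rule: ER attributes carry no datatype, so a
                # default is assumed here (an illustrative choice only)
                target.append({'construct': 'attribute',
                               'name': elem['name'],
                               'datatype': elem.get('datatype', 'string')})
        return target

    merged = transform(
        [{'construct': 'entity type', 'name': 'Airlines'},
         {'construct': 'attribute', 'name': 'Arr_time'}],
        [{'construct': 'class', 'name': 'Airline'},
         {'construct': 'attribute', 'name': 'arrival_time',
          'datatype': 'Time'}],
    )
    print(merged)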

Someone into this matter might think, based on this high-level description, that there’s nothing new here. However, there is, and the paper has an extensive related works section. For instance, there’s related work on Distributed Description Logics with bridge rules [8], but they don’t do attributes and the logics used for that don’t fit well with the features needed for conceptual modelling, so it cannot be adopted without substantial modifications. Triple Graph Grammars look quite interesting [9] for this sort of task, as does DOL [10], but either would require another year or two to figure out (feel free to go ahead already). On the practical side, e.g., the metamodel of the popular Eclipse Modeling Framework didn’t have enough in it for what needs to be included, both regarding the types of entities and the constraints that would need to be enforced. And so on, such that, by a process of elimination, we ended up with ATL.

It would be good to come up with those logic-based linking options and proofs of correctness of the transformation rules presented in the paper, but in the meantime, an architecture design of the new tool was laid out in [11], which is in the implementation stage as I write this. For now, at least a step has been taken from the three years of mostly theory and some experimentation toward implementation of it all. To be continued :)

 

References

[1] Khan, Z.C., Keet, C.M., Fillottrani, P.R., Cenci, K.M. Experimentally motivated transformations for intermodel links between conceptual models. 20th Conference on Advances in Databases and Information Systems (ADBIS’16). Springer LNCS. August 28-31, Prague, Czech Republic. (in press)

[2] Fillottrani, P.R., Franconi, E., Tessaris, S. The ICOM 3.0 intelligent conceptual modelling tool and methodology. Semantic Web Journal, 2012, 3(3): 293-306.

[3] Fillottrani, P.R., Keet, C.M. Conceptual Model Interoperability: a Metamodel-driven Approach. 8th International Web Rule Symposium (RuleML’14), A. Bikakis et al. (Eds.). Springer Lecture Notes in Computer Science LNCS vol. 8620, 52-66. August 18-20, 2014, Prague, Czech Republic.

[4] Keet, C.M., Fillottrani, P.R. An ontology-driven unifying metamodel of UML Class Diagrams, EER, and ORM2. Data & Knowledge Engineering, 2015, 98:30-53.

[5] Keet, C.M., Fillottrani, P.R. An analysis and characterisation of publicly available conceptual models. 34th International Conference on Conceptual Modeling (ER’15). Johannesson, P., Lee, M.L. Liddle, S.W., Opdahl, A.L., Pastor López, O. (Eds.). Springer LNCS vol 9381, 585-593. 19-22 Oct, Stockholm, Sweden.

[6] Fillottrani, P.R., Keet, C.M. Evidence-based Languages for Conceptual Data Modelling Profiles. 19th Conference on Advances in Databases and Information Systems (ADBIS’15). Morzy et al. (Eds.). Springer LNCS vol. 9282, 215-229. Poitiers, France, Sept 8-11, 2015.

[7] Jouault, F., Allilaire, F., Bézivin, J., Kurtev, I. ATL: a model transformation tool. Science of Computer Programming, 2008, 72(1-2):31-39.

[8] Ghidini, C., Serafini, L., Tessaris, S. Complexity of reasoning with expressive ontology mappings. Formal Ontology in Information Systems (FOIS’08). IOS Press, FAIA vol. 183, 151-163.

[9] Golas, U., Ehrig, H., Hermann, F. Formal specification of model transformations by triple graph grammars with application conditions. Electronic Communications of the EASST, 2011, 39: 26.

[10] Mossakowski, T., Codescu, M., Lange, C. The distributed ontology, modeling and specification language. Proceedings of the Workshop on Modular Ontologies 2013 (WoMo’13). CEUR-WS vol. 1081. Corunna, Spain, September 15, 2013.

[11] Fillottrani, P.R., Keet, C.M. A Design for Coordinated and Logics-mediated Conceptual Modelling. 29th International Workshop on Description Logics (DL’16). Peñaloza, R. and Lenzerini, M. (Eds.). CEUR-WS Vol. 1577. April 22-25, Cape Town, South Africa. (abstract)


Reblogging 2009: Building bias into your database

From the “10 years of keetblog – reblogging: 2009”: The tl;dr of it: bad data management -> bad policy decisions, and how you can embed political preferences and prejudices in a conceptual data model.

While the post has a computing flavour to it, especially on the database design and a touch of ontologies, it is surely also of general interest, because it gives some insight into the management of data that is used for policy-making in and for conflict zones. A nicer version of this blog post and the one after that made it into a peer-reviewed article “Dirty wars, databases, and indices” in the Peace & Conflict Review journal (Fall 2009 issue) of the UN-mandated University for Peace in Costa Rica.

Building bias into your database; Jan 7, 2009

p.s.: while I intended to write a post on attending the ER’15 conference, the exciting times with the student protests in South Africa put that plan on the backburner for a few more days at least.

—–

For developing bio-ontologies, if one follows Barry Smith and c.s., then one is solely concerned with the representation of reality; moreover, it has been noted that ontologies can, or should, be seen as a representation of a scientific theory [1], or at least that they are an important part of doing science [2]. In that case, life is easy, not hard, for we have the established method of scientific inquiry to settle disputes (among others, by doing additional lab experiments to figure out more about reality). Domain and application ontologies, as well as conceptual data models for the enterprise universe of discourse, require, at times, a consensus-based approach, where some parts of the represented information are the outcome of negotiations and agreements among the stakeholders.

Going one step further on the sliding scale: for databases and application software for the humanities, and conflict databases in particular, one makes an ontology or conceptual data model conforming to one’s own (or the funding organisation’s) political convictions and with the desired conclusions in mind. Building data vaults seems to be the intended norm rather than the exception; hence, maintenance, usage, and data analysis beyond the developers’ limited intentions, let alone integration, are a nightmare.

In this post, I will outline some suggestions for building your own politicized representation—be it an ontology or conceptual data model—for armed conflict data, such as terrorist incidents, civil war, and inter-state war. In the next post, I will discuss a few examples of conflict data analysis, both regarding extant databases and the ‘dirty war index’ application built on top of them. A later post may deal with a solution to the problems, but for now, it would already be a great help not to adhere to the tips below.

Tips for biasing the representation

In random order, you could do any of the following to pollute the model and hamper data analysis so as to ensure your data is scientifically unreliable but suitable to serve your political agenda.

1. Have a fairly flat taxonomy of types of parties; in fact, just two subtypes suffice: US and THEM, although one could subtype the latter into ‘they’, ‘with them’, and ‘for them’. The analogue, with ‘we’, ‘with us’, and ‘for us’, is too risky for the potential contagion of responsibility for atrocities, and therefore not advisable to include; if you want to record any of it, then it is better to introduce types such as ‘unknown perpetrator’ or ‘not officially claimed event’ or ‘independent actor’.

2. Aggregate creatively. For instance, if some of the funding for your database comes from a building construction or civil engineering company, refine that section of target types, or include new target types only when you feel it is targeted sufficiently often by the opponent to warrant a whole new tuple or table from then onwards. Likewise, some funding agencies would like to see a more detailed breakdown of types of victims by types of violence, some don’t. Last, be careful with the typology of arms used, in particular when your country is producing them; a category like ‘DIY explosive device’ helps mask the producer.

3. Under-/over-represent geography. Play with granularity (by city/village, region, country, continent) and categorization criteria (state borders, language, former chiefdoms, parishes, and so forth); e.g., include (or not) notions such as ‘occupied territory’ (related to the actors) and ‘liberated region’ or ‘autonomous zone’, or have an area be categorized or named differently at the same time. Above all, make the modelling decisions in an inconsistent way, so that no single dimension can be analysed properly.

4. Make an a-temporal model and pretend not to change it, but (a) allow non-traceable object migration so that defecting parties who used to be with US (see point 1) can be safely re-categorised as THEM, and (b) refine the hierarchy over time anyway so as to generate time-inconsistency for target types (see point 2) and geography (see point 3), in order to thwart time series analyses and prevent the discovery of possible patterns (a small illustration of the combined effect of this and the previous tip follows after the list).

5. Have a minimal number of classes for bibliographic information, lest someone wants to verify the primary/secondary sources that report on numbers of casualties and discovers you only included media reports from the government-censored newspapers (or the proxy-funding agency, or the rebel radio station, or the guerrilla pamphlets).

6. Keep natural language definitions for key concepts in a separate file, if recorded at all. This allows for time-inconsistency in operational definitions, as well as for ignorance of the data entry clerks, so that each one can have his own ideas about where in the database the conflict data should go.

7. Minimize the use of database integrity constraints, hence, minimize representing constraints in the ontology to begin with, hence, use a very simple modelling language so you can blame the language for not representing the subject domain adequately.
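To see how quickly tips 3 and 4 ruin analysability, here is a small illustration in Python with entirely made-up data (all names and labels are hypothetical): the same area is relabelled over time and a party is silently re-categorised, so a naive per-region or per-party time series mixes incomparable rows without any error being raised.

    from collections import Counter

    # Made-up incident rows: (year, region label, perpetrator category).
    incidents = [
        (2007, 'Northern Province', 'THEM'),
        (2007, 'Northern Province', 'unknown perpetrator'),
        (2008, 'North Region', 'THEM'),    # same area, relabelled (tip 3)
        (2008, 'Occupied North', 'THEM'),  # and again, politically loaded
        (2009, 'North Region', 'US'),      # defectors silently migrated (tip 4)
    ]

    # A naive per-region time series treats the three labels as three regions,
    # so no region shows a trend even though all rows concern the same area;
    # the bias sits in the model, not (only) in the data.
    per_region = Counter((year, region) for year, region, _ in incidents)
    for (year, region), n in sorted(per_region.items()):
        print(year, region, n)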

I’m not saying all conflict databases use all of these tricks; but some use at least most of them, which ruins the credibility of those databases whose analysts actually did try to avoid these pitfalls (assuming there are such databases, that is). Optimism makes me want to believe that the developers simply did not think of all those issues when designing the database. However, there is a tendency that each conflict researcher compiles his own data set and that each database is built from scratch.

For the current scope, I will set aside the problems with data collection and how to arrive at guesstimated, semi-reliable approximations of deaths, severe injuries, rape, torture victims, and so forth (see e.g. [3] and appendix B of [4]). Inherent problems with data collection are one thing and difficult to fix; bad modelling and dubious or partial data analysis are a whole different thing, and doable to fix. I elaborate on the latter claim in the next post.

References

[1] Barry Smith. Ontology (Science). In: C. Eschenbach and M. Gruninger (eds.), Formal Ontology in Information Systems. Proceedings of FOIS 2008. preprint

[2] Keet, C.M. Factors affecting ontology development in ecology. Data Integration in the Life Sciences 2005 (DILS’05), Ludaescher, B, Raschid, L. (eds.). San Diego, USA, 20-22 July 2005. Lecture Notes in Bioinformatics LNBI 3615, Springer Verlag, 2005. pp46-62.

[3] Taback, N. (2008). The Dirty War Index: Statistical issues, feasibility, and interpretation. PLoS Med 5(12): e248. doi:10.1371/journal.pmed.0050248.

[4] Weinstein, Jeremy M. (2007). Inside rebellion—the politics of insurgent violence. Cambridge University Press. 402p.

Fruitful ADBIS’15 in Poitiers

The 19th Conference on Advances in Databases and Information Systems (ADBIS’15) just finished yesterday. It was an enjoyable and well-organised conference in the lovely town of Poitiers, France. Thanks to the general chair, Ladjel Bellatreche, and the participants I had the pleasure to meet up with, listen to, and receive feedback from. The remainder of this post mainly recaps the keynotes and some of the presentations.

 

Keynotes

The conference featured two keynotes, one by Serge Abiteboul and one by Jens Dittrich, both distinguished scientists in databases. Abiteboul presented the multi-year project on Webdamlog that ended up as a ‘personal information management system’ (PIMS), which is a simple term that hides the complexity that happens behind the scenes. (PIMS is informally explained here.) It breaks with the paradigm of centralised text (e.g., Facebook) in favour of distributed knowledge. To achieve that, one has to analyse what’s happening and construct the knowledge from that, exchange knowledge, and reason and infer knowledge. This requires distributed reasoning, exchanging facts and rules, and taking care of access control. It is being realised with a datalog-style language that can also handle a non-local knowledge base. That is, there’s both solid theory and implementation (going by the presentation; I haven’t had time to check it out).

The main part of the cool keynote talk by Dittrich was on ‘the case for small data management’. From the who-wants-to-be-a-millionaire-style popquiz question asking us to guess the typical size of a web database, it appeared to be only in the MBs (which most of us overestimated), which sort of explains why MySQL [which doesn’t scale well] is used rather widely. This results in a mismatch between problem size and tools. Another popquiz answer: a 100MB RDF dataset can just as well be handled efficiently by Python, apparently. Interesting factoids, and one that has/should have as a consequence that we should perhaps be looking more into ‘small data’. He presented his work on PDbF as an example of such small data management. Very briefly, and based on my scribbles from the talk: it’s an enhanced PDF where you can also access the raw data behind the graphs in the paper (it is embedded in it, with an OLAP engine for posing the same and other queries), it has an HTML rendering so you can hover over the graphs, and some more visualisation. If there’s software associated with the paper, it can go into the whole thing as well. Overall, that makes the data dynamic, manageable, traceable (from figure back to raw data), and re-analysable. The last part of his talk was on his experiences with the flipped classroom (more here; in German), but that was not nearly as fun as his analysis and criticism of the “big data” hype. I can’t recall exactly his plain-English terms for the “four Vs”, but the ‘lots of crappy XML data that changes’ remained in my memory bank (it was similar to the first 5 minutes of another keynote talk he gave).

 

Sessions

Sure, despite the notes on big data, there were presentations in the sessions that could be categorised under ‘big data’. Among others, Ajantha Dahanayake presented a paper on a proposal for requirements engineering for big data [1]. Big data people tend to assume it is just there already for them to play with. But how did it get there, and how does one collect good data? The presentation outlined a scenario-based backwards analysis, so that one can reduce unnecessary or garbage data collection. Dahanayake also has a tool for it. Besides the requirements analysis for big data, there’s also querying the data and the desire to optimize it so as to keep having fast responses despite its large size. A solution to that was presented by Reuben Ndindi, whose paper also won the best paper award of the conference [2] (for the Malawians at CS@UCT: yes, the Reuben you know). It was scheduled in the very last session on Friday and my note-taking had ground to a halt. If my memory serves me well, they make a metric database out of a regular database, compute the distances between the values, and evaluate the query on that, so as to obtain a good approximation of the true answer. There’s both a theoretical foundation and an experimental validation of the approach. In the end, it’s faster.

Data and schema evolution research is alive and well, as are time series and temporal aspects. Due to parallel sessions and my time constraints writing this post, I’ll mention only two on the evolution: one because it was a very good talk, the other because of the results of the experiments. Kai Herrmann presented the CoDEL language for database evolution [3]. A database and the application that uses it change (e.g., adding an attribute, splitting a table), which requires quite lengthy scripts with lots of SQL statements to execute. CoDEL does it with fewer statements, and the language has the good quality of being relationally complete [3]. Lesley Wevers approached the problem from a more practical angle and restricted it to online databases. For instance, Wikipedia does make updates to their database schema, but they wouldn’t want to have Wikipedia go offline for that duration. How long does it take for which operation, in which RDBMS, and will it only slow down during the schema update, or block any use of the database entirely? The results obtained with MySQL, PostgreSQL, and Oracle are a bit of a mixed bag [4]. It generated a lively debate during the presentation regarding the test set-up, what one would have expected the results to be, and the duration of blocking. There’s some work to do there yet.
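For a feel of why such evolution scripts get lengthy: below is a minimal illustration using Python’s built-in sqlite3 (plain SQL, nothing to do with CoDEL itself, and the table is made up), where even a simple step such as splitting off a column into its own table already takes a handful of coordinated statements.

    import sqlite3

    con = sqlite3.connect(':memory:')
    con.executescript("""
    CREATE TABLE Employee(id INTEGER PRIMARY KEY, name TEXT, city TEXT);
    INSERT INTO Employee VALUES (1, 'Alice', 'Poitiers');

    -- split off the address into its own table, portably: copy the data,
    -- recreate the narrower table, move the rows, swap the names
    CREATE TABLE Address(emp_id INTEGER REFERENCES Employee(id), city TEXT);
    INSERT INTO Address SELECT id, city FROM Employee;
    CREATE TABLE Employee_new(id INTEGER PRIMARY KEY, name TEXT);
    INSERT INTO Employee_new SELECT id, name FROM Employee;
    DROP TABLE Employee;
    ALTER TABLE Employee_new RENAME TO Employee;
    """)
    print(con.execute('SELECT * FROM Employee').fetchall())
    print(con.execute('SELECT * FROM Address').fetchall())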

The presentation of the paper I co-authored with Pablo Fillottrani [5] (informally described here) was scheduled for that dreaded 9am slot the morning after the social dinner. Notwithstanding, quite a few participants did show up, and they showed interest. The questions and comments had to do with earlier work we used as input (the metamodel), with qualifying the quality of the conceptual models, and with that all too familiar sense of disappointment that so few language features were used widely in publicly available conceptual models (the silver lining of excellent prospects for runtime usage of conceptual models notwithstanding). Why this is so, I don’t know, though I have my guesses.

 

And the other things that make a conference useful and fun to go to

In short: Networking, meeting up again with colleagues not seen for a while (ranging from a few months [Robert Wrembel] to some 8 years [Nadeem Iftikhar] and in between [a.o., Martin Rezk, Bernhard Thalheim]), meeting new people, exchanging ideas, and the social events.

2008 was the last time I’d been in France, for EMMSAD’08, where, looking back now, I coincidentally presented a paper also on conceptual modelling languages and logic [6], but one that looked at comprehensive feature coverage and comparing languages rather than unifying them. It was good to be back in France, and it was nice to realise my understanding and speaking skills in French aren’t as rusty as I thought they were. The travels from South Africa are rather long, but definitely worthwhile. And it gives me time to write blog posts, killing time at the airport.

 

References

(note: most papers don’t show up at Google Scholar yet, hence, no links; they are on the Springer website, though)

[1] Noufa Al-Najran and Ajantha Dahanayake. A Requirements Specification Framework for Big Data Collection and Capture. ADBIS’15. Morzy et al. (Eds.). Springer LNCS vol. 9282.

[2] Boris Cule, Floris Geerts and Reuben Ndindi. Space-bounded query approximation. ADBIS’15. Morzy et al. (Eds.). Springer LNCS vol. 9282, 397-414.

[3] Kai Herrmann, Hannes Voigt, Andreas Behrend and Wolfgang Lehner. CoDEL – A Relationally Complete Language for Database Evolution. ADBIS’15. Morzy et al. (Eds.). Springer LNCS vol. 9282, 63-76.

[4] Lesley Wevers, Matthijs Hofstra, Menno Tammens, Marieke Huisman and Maurice van Keulen. Analysis of the Blocking Behaviour of Schema Transformations in Relational Database Systems. ADBIS’15. Morzy et al. (Eds.). Springer LNCS vol. 9282, 169-183.

[5] Pablo R. Fillottrani and C. Maria Keet. Evidence-based Languages for Conceptual Data Modelling Profiles. ADBIS’15. Morzy et al. (Eds.). Springer LNCS vol. 9282, 215-229.

[6] C. Maria Keet. A formal comparison of conceptual data modeling languages. EMMSAD’08. CEUR-WS Vol-337, 25-39.

The ontology-driven unifying metamodel of UML class diagrams, ER, EER, ORM, and ORM2

Metamodelling of conceptual data modelling languages is nothing new, and one may wonder why one would need yet another one. But you do, if you want to develop complex systems or integrate various legacy sources (which South Africa is going to invest more money in) and automate at least some parts of it. For instance: you want to link up the business rules modelled in ORM, the EER diagram of the database, and the UML class diagram that was developed for the application layer. Are the, say, Student entity types across the models really the same kind of thing? And UML’s attribute StudentID vs. the one in the EER diagram? Or EER’s EmployeesDependent weak entity type with the ORM business rule that states that “each dependent of an employee is identified by EmployeeID and the Dependent’s Name”?

Ascertaining the correctness of such inter-model assertions in different languages does not require a comparison and contrast of their differences, but a way to harmonise or unify them. Some such metamodels already exist, but they take subsets of the languages, whereas all those features do appear in actual models [1] (described here informally). Our metamodel, in contrast, aims to capture all constructs of the aforementioned languages and the constraints that hold between them, and to generalize in an ontology-driven way, so that the integrated metamodel subsumes the structural, static elements of them (i.e., the integrated metamodel has them as fragments). Besides some updates to the earlier metamodel fragment presented in [2,3], the current version [4,5] also includes the metamodel fragment of their constraints (though it omits temporal aspects and derived constraints). The metamodel and its explanation can be found in the paper An ontology-driven unifying metamodel of UML Class Diagrams, EER, and ORM2 [4], which I co-authored with Pablo Fillottrani, and which was recently accepted in Data & Knowledge Engineering.

Methodologically, the unifying metamodel presented in the paper is ontological rather than formal (cf. all other known works). By ‘ontology-driven approach’ is meant here the use of insights from Ontology (philosophy) and ontologies (in computing) to enhance the quality of a conceptual data model and obtain that ‘glue stuff’ to unify the metamodels of the languages. The DKE paper describes all that, such as: the nature of the UML association/ORM fact type (different wording, same ontological commitment), attributes with and without data types, the plethora of identification constraints (weak entity types, reference modes, etc.), where one can reuse an ‘attribute’, if at all, and more. The main benefit of this approach is being able to cope with the larger number of elements that are present in those languages, and it shows that, in the details, the overlap in features across the languages is rather small: 4 of the 23 types of relationship, role, and entity type are essentially the same across the languages (see figure below), and 6 of the 49 types of constraints. The metamodel is stable for the modelling languages covered. It is represented in UML for ease of communication, but, as mentioned earlier, it has also been formalised in the meantime [5].


Types of elements in the languages; black-shaded: the entity is present in all three language families (UML, EER, ORM); dark grey: in two of the three; light grey: in one; white-filled: in none, but we added the more general entities to ‘glue’ things together. (Source: [4])

Metamodel fragment with some constraints among some of the entities. (Source [4])


The DKE paper also puts it in a broader context with examples, model analyses using the harmonised terminology, and a use case scenario that demonstrates the usefulness of the metamodel for inter-model assertions.

While the 24-page paper is rather comprehensive, research results wouldn’t live up to their name if they didn’t also uncover new questions. Some of them have been, and are being, answered in the meantime, such as its use for classifying models and comparing their characteristics [1,6] (blogged about here and here) and a rule-based approach to validating inter-model assertions [7] (informally described here). Although the 3-year funded project on the Ontology-driven unification of conceptual data modelling languages—which surely contributed to realising this paper—has just officially finished, we’re not done yet; or: more is in the pipeline. To be continued…

 

References

[1] Keet, C.M., Fillottrani, P.R. An analysis and characterisation of publicly available conceptual models. 34th International Conference on Conceptual Modeling (ER’15). Springer LNCS. 19-22 Oct, Stockholm, Sweden. (in press)

[2] Keet, C.M., Fillottrani, P.R. Toward an ontology-driven unifying metamodel for UML Class Diagrams, EER, and ORM2. 32nd International Conference on Conceptual Modeling (ER’13). W. Ng, V.C. Storey, and J. Trujillo (Eds.). Springer LNCS 8217, 313-326. 11-13 November, 2013, Hong Kong.

[3] Keet, C.M., Fillottrani, P.R. Structural entities of an ontology-driven unifying metamodel for UML, EER, and ORM2. 3rd International Conference on Model & Data Engineering (MEDI’13). A. Cuzzocrea and S. Maabout (Eds.) September 25-27, 2013, Amantea, Calabria, Italy. Springer LNCS 8216, 188-199.

[4] Keet, C.M., Fillottrani, P.R. An ontology-driven unifying metamodel of UML Class Diagrams, EER, and ORM2. Data & Knowledge Engineering. 2015. DOI: 10.1016/j.datak.2015.07.004. (in press)

[5] Fillottrani, P.R., Keet, C.M. KF metamodel Formalization. Technical Report, Arxiv.org http://arxiv.org/abs/1412.6545. Dec 19, 2014. 26p.

[6] Fillottrani, P.R., Keet, C.M. Evidence-based Languages for Conceptual Data Modelling Profiles. 19th Conference on Advances in Databases and Information Systems (ADBIS’15). Springer LNCS. Poitiers, France, Sept 8-11, 2015. (in press)

[7] Fillottrani, P.R., Keet, C.M. Conceptual Model Interoperability: a Metamodel-driven Approach. 8th International Web Rule Symposium (RuleML’14), A. Bikakis et al. (Eds.). Springer LNCS 8620, 52-66. August 18-20, 2014, Prague, Czech Republic.

FAIR’14 and ‘modelling relationships’ tutorial

After a weekend of ‘loadshedding’ (one of those South African euphemisms) I’m posting a few notes on the Forum on Artificial Intelligence Research 2014 (FAIR’14) that took place from 3-5 Dec 2014 at Stellenbosch University, which was organised by CAIR and co-located with the FASTAR/Espresso Workshop 2014, which, in turn, was co-located with PRASA, AFLaT, and RobMech 2014 in Cape Town. FAIR’14 consisted of a presentation by Sergei Obiedkov of the Higher School of Economics, Russia, a tutorial on modelling relationships in ontologies by me, and a course on computational social choice theory by Ulle Endriss from the ILLC, University of Amsterdam, The Netherlands.

While not quite relevant to my current research, except for the judgement aggregation at the end (for crowdsourcing), Ulle’s course was one of those events that made me think “[why wasn’t/if only] I was exposed to this material before?!” when I had to make choices as to what to study and specialise in (though, admittedly, once knowing about the math with game theory and applying that to peace negotiations in my MA thesis (pdf), I still went on in CS with KR&R and ontologies). Ulle’s course combined socially relevant topics, such as the fair allocation of resources and voting systems, with solid, precise, logic- and math-based representations and computation. Besides the engaging content, he’s also good at teaching it. The content and slides are a condensed version of his MSc course on social choice theory and are available online here, which also has links to related reading material.

I tried to condense some aspects of modelling relationships in ontologies into two hours. It started with some problems and questions, proceeded to touch upon the nature of relations and some detail of the formal semantics, then common relationships (with some detail about mereotopology), and closed with some practical modelling guidance and reasoner performance when modelling it one way or another. As it was a tutorial, and not all participants had Protégé installed, I resorted to a peer instruction audience response system to incorporate interactively some questions about modelling relationships. The slides are available online (though also here the text on the slides only partially reflects what I talked about).

Other than that, there’s always the social component. Despite the weird time-warp that Stellenbosch town constitutes, it was really nice to catch up with former colleagues and to see the progress of postgrads of UKZN, to hear about the future of CAIR, and that it’s a small world even when meeting people new to me. And the food & wine was delicious. The train travel back to Cape Town took a bit longer than the schedule said it ought to be, but I recommend it nevertheless.

A metamodel-driven approach for linking conceptual data models

Interoperability among applications and components of large complex software is still a bit of a nightmare and a with-your-hands-in-the-mud scenario that no-one looks forward to—people look forward to already having linked them up, so they can pose queries across departmental and institutional boundaries, or even across the different data sets within a research unit to advance their data analysis and discover new things.

Sure, ontologies can help with that, but you have to develop one if none is available, and sometimes it’s not even necessary. For instance, you have an ER diagram for the database and a UML model for the business layer. How do you link up those two?

Superficially, this looks easy: an ER entity type matches up with a UML class, and an ER relationship with a UML association. The devil is in the details, however. To name just a few examples: how are you supposed to match a UML qualified association, an ER weak entity type, or an ORM join-subset constraint, to any of the others?

Within the South Africa – Argentina bilateral collaboration project (scope), we set out to solve such things. Although we had first planned to ‘simply’ formalize the most common conceptual data modelling languages (the ER, UML, and ORM families), we quickly found out we needed not just an ‘inventory’ of terms used in each language matched to those in the other languages, but also under what conditions these entities can be used; hence, we needed a proper metamodel. This we published last year at ER’13 and MEDI’13 [1,2], which I blogged about at the time. In the meantime, we have not only finalized the metamodel for the constraints, but also formalized the metamodel, and a journal article describing all this is close to being submitted.

But a metamodel alone doesn’t link up the conceptual data models. To achieve that, we, Pablo Fillottrani and I, devised a metamodel-driven approach for conceptual model interoperability, which uses a formalised metamodel with a set of modular rules to mediate the linking and transformation of elements in the conceptual models represented in different languages. This also simplifies the verification of inter-model assertions and model conversion. Its description has recently been accepted as a full paper at the 8th International Web Rule Symposium 2014 (RuleML’14) [3], which I’ll present in Prague on 18 August.

To be able to assert a link between two entities in different models and evaluate automatically (or at least: systematically) whether it is a valid assertion and what it entails, you have to know i) what type of entities they are, and ii) whether they are the same, and if not, whether one can be transformed into the other for that particular selection. So, to be able to have those valid inter-model assertions, an approach is needed for transforming one or more elements of a model in one language into another. The general idea of that is shown in the following picture, and explained briefly afterward.

Overview of the approach to transform a model represented in language A to one in language B, illustrated with some sample data from UML to ORM2 (Fig 1 in [3])


We have three input items (top of the figure, with the ovals), then a set of algorithms and rules (on the right), and two outputs (bottom, green). The conceptual model is provided by the user, the formalized metamodel is available and a selection of it is included in the RuleML’14 paper [3], and the “vocabulary containing a terminology comparison” was published in ER’13 [1]. Our RuleML paper [3] zooms in on the rules for those algorithms, availing of the formalized metamodel and vocabulary. To give a taste of that (more below): the algorithm has to know that a UML class in the diagram can be mapped 1:1 to an ORM entity type, and that there is some rule or set of rules to transform a UML attribute into an ORM value type.
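For a rough idea of what such a terminology lookup amounts to, here is a sketch in Python with a few made-up entries (not the actual vocabulary table of [1]): each language-specific term is classified by its metamodel entity, and pairs of metamodel entities are marked as 1:1 mappable or as needing a transformation.

    # Sketch of the terminology lookup (hypothetical entries): terms are
    # classified by their entity in the metamodel, and pairs of metamodel
    # entities carry the kind of mapping between them.
    METAMODEL_ENTITY = {
        ('UML', 'class'): 'Object type',
        ('ORM', 'entity type'): 'Object type',
        ('UML', 'attribute'): 'Attribute',
        ('ORM', 'value type'): 'Value type',
    }

    MAPPING_KIND = {
        ('Object type', 'Object type'): '1:1',
        ('Attribute', 'Value type'): 'transformation',
    }

    def link_kind(lang1, term1, lang2, term2):
        e1 = METAMODEL_ENTITY[(lang1, term1)]
        e2 = METAMODEL_ENTITY[(lang2, term2)]
        return MAPPING_KIND.get((e1, e2)) or MAPPING_KIND.get((e2, e1))

    print(link_kind('UML', 'class', 'ORM', 'entity type'))     # 1:1
    print(link_kind('UML', 'attribute', 'ORM', 'value type'))  # transformation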

This can also be used for the inter-model assertions, albeit in a slightly modified way overall, which is depicted below. Here we use not only the formalised metamodel and the algorithms, but also which entities have 1:1 mappings, which are equivalent but need several steps (called transformations), and which ones can only be approximated (requiring user input); and it can be run in both directions, from one fragment to the other (one direction is chosen arbitrarily).

Overview for checking the inter-model assertions, and some sample data, checking whether the UML Flower is the same as the ORM Flower (Fig. 2 in [3]).


The rules themselves do not go directly from an entity in one model to an entity in another, as that would become too messy, isn’t really scalable, and would involve lots of repetition. We use the more efficient way of declaring rules for mapping a conceptual data model entity into its corresponding entity in the metamodel, doing any mapping, transformation, or approximation there in the metamodel, and then mapping it into the matching entity in the other conceptual data model. The rules for the main entities are described in the paper: those for object types, relationships, roles, attributes, and value types, and how one can use those to build up more complex ones for the validation of inter-model assertions.

This metamodel-mediated approach to the mappings is one nice advantage of having a metamodel, but one possibly could have gotten away with just having an ‘inventory’ of the entities, without all the extra effort of a full metamodel. But there are benefits to having that metamodel, in particular when actually validating mappings: it can drive the validation of mappings and the generation of model transformations thanks to the constraints declared in the metamodel. How this works is illustrated in the following example, showing how the “process mapping assertions using the transformation algorithms” step in the centre of Fig. 2, above, works out.

Example. Take i) two models, let’s call them Ma and Mb, ii) an inter-model assertion, e.g., between a UML association Ra and an ORM fact type Rb, iii) the look-up list with the mappings, transformations, approximations, and the non-mappable elements, and iv) the formalised metamodel. Then the model elements of Ma and Mb are classified in terms of the metamodel, so that the mapping validation process can start. Let us illustrate that for some Ra to Rb (or v.v.) mapping of two relationships.

  1. From the vocabulary table, we see that a UML association and an ORM fact type both correspond to Relationship in the metamodel, and enjoy a 1:1 mapping. The rules that will be commenced with are R1 from UML into the metamodel and 2R from the metamodel to ORM’s fact type (see the rules in the paper).
  2. The R1 and 2R rules refer to Role and Object type in the metamodel. Now things become interesting. The metamodel has represented that each Relationship has at least two Roles, which there are, and each one causes the role-rules to be evaluated, with Ro1 mapping Ra’s two association ends into the metamodel’s Role and 2Ro into ORM’s roles (‘2Ro’ etc. are the abbreviations of the rules; see the paper [3] for details).
  3. The metamodel asserts that Role must participate in the rolePlaying relationship and thus that it has a participating Object type (possibly a subtype thereof) and, optionally, a Cardinality constraint. Luckily, they have 1:1 mappings.
  4. This, in turn, causes the rules for classes to be evaluated. From the classes, we see in the metamodel that each Object type must have at least one Identification constraint that involves one or more attributes or value types (which one it is has been determined by the original classification). This also then has to be mapped using the rules specified.

This whole sequence was set in motion thanks to the mandatory constraints in the metamodel, having gone from Relationship to Role to Object type to Single identification (which, in turn, consults Attribute and Datatype for the UML to ORM example here). The ‘chain reaction’ becomes longer with more elaborate participating entities, such as a Nested object type.
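In code-ish terms, it is those mandatory participations that drive the recursion. The following is a minimal sketch in Python of that chain reaction (the representation is a hypothetical simplification, nothing like the actual rule engine, and the rule names beyond R1/2R and Ro1/2Ro are made up):

    # Mandatory participations in the metamodel (simplified): classifying an
    # element drags along the entities it must participate with, so validating
    # one assertion recursively triggers the rules for all of them.
    MANDATORY = {
        'Relationship': ['Role', 'Role'],   # each relationship has >= 2 roles
        'Role': ['Object type'],            # via the rolePlaying relationship
        'Object type': ['Identification constraint'],
        'Identification constraint': [],    # consults Attribute etc., omitted
    }

    # Rule name pairs per metamodel entity: (into-metamodel, metamodel-to-ORM).
    # R1/2R and Ro1/2Ro are from the paper; C1/2C and Id1/2Id are made up here.
    RULES = {
        'Relationship': ('R1', '2R'),
        'Role': ('Ro1', '2Ro'),
        'Object type': ('C1', '2C'),
        'Identification constraint': ('Id1', '2Id'),
    }

    def validate(entity, depth=0):
        print('  ' * depth + f'{entity}: apply rules {RULES[entity]}')
        for dependent in MANDATORY[entity]:
            validate(dependent, depth + 1)

    validate('Relationship')  # the UML association to ORM fact type example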

Overall, the whole orchestration is no trivial matter, requiring all those inputs, and it won’t be implemented in one codefest on a single rainy Sunday afternoon. Nevertheless, the prospect of semantically good (correct) inter-model assertions across conceptual data modelling languages, and automated validation thereof, is now certainly a step closer to becoming a reality.

References

[1] Keet, C.M., Fillottrani, P.R. Toward an ontology-driven unifying metamodel for UML Class Diagrams, EER, and ORM2. 32nd International Conference on Conceptual Modeling (ER’13). W. Ng, V.C. Storey, and J. Trujillo (Eds.). Springer LNCS 8217, 313-326. 11-13 November, 2013, Hong Kong.

[2] Keet, C.M., Fillottrani, P.R. Structural entities of an ontology-driven unifying metamodel for UML, EER, and ORM2. 3rd International Conference on Model & Data Engineering (MEDI’13). A. Cuzzocrea and S. Maabout (Eds.) September 25-27, 2013, Amantea, Calabria, Italy. Springer LNCS 8216, 188-199.

[3] Fillottrani, P.R., Keet, C.M. Conceptual Model Interoperability: a Metamodel-driven Approach. 8th International Web Rule Symposium (RuleML’14), A. Bikakis et al. (Eds.). Springer LNCS 8620, 52-66. August 18-20, 2014, Prague, Czech Republic.

Book chapter on conceptual data modeling for biology published

Just a quick note that my book chapter on “Ontology-driven formal conceptual data modeling for biological data analysis” finally has been published in the Biological Knowledge Discovery Handbook: Preprocessing, Mining and Postprocessing of Biological Data (edited by Mourad Elloumi and Albert Y. Zomaya). A summary of the chapter’s contents is described in an earlier blog post from a little over two years ago, and I’ve put the preprint online.

The whole book is an impressive 1192 pages, consisting of 48 chapters of about 25 pages each, which are grouped into three main sections. The first section, biological data pre-processing, has four parts: biological data management, biological data modeling (which includes my chapter), biological feature extraction, and biological feature selection. The second section, biological data mining, has six parts: regression analysis of biological data, biological data clustering, biological data classification, association rules learning from biological data, text mining and application to biological data, and high-performance computing for biological data mining. The third section, biological data post-processing, has only one part: biological knowledge integration and visualization. (Check the detailed table of contents.) Happy reading!

Notes on a successful ER 2013 conference

Unlike two other conferences earlier this year, the 32nd International Conference on Conceptual Modeling (ER’13) in Hong Kong, held 11-13 Nov, was a success: good presentations, inspiring discussions, new ideas, follow-ups, and an enjoyable crowd. As a bonus, the paper Pablo Fillottrani and I wrote on metamodelling [1] was nominated for the best paper award. I’ve posted about our paper earlier, so I will highlight some of the other papers.

There were two sessions on ontology-driven conceptual modelling, of which one ran concurrently with the one on reasoning over conceptual data models. It was a tough choice, but in the end I attended both ontology-based conceptual modelling sessions. Skimming and reading through the three reasoning papers from John Mylopoulos and co-authors afterwards: they covered reasoning with decision-theoretic goals, reasoning with business plans, and automated reasoning for regulatory compliance, as in law, for answering questions such as ‘given situation S, what are alternative ways to comply with law L?’ [2]. Regarding the latter, there are models of the law represented in the Nomos 2 modelling language, which were formalized and sent to the automated reasoner, being the off-the-shelf Datalog-based reasoner DLV. It was demonstrated that it is actually feasible to do this, taking scalability into account. These are encouraging results for automated reasoning with such conceptual models.

The ontology-based modeling papers were varied. There were some fundamental results on a first extension of the UFO foundational ontology for conceptual data modeling of events [3], presented by Giancarlo Guizzardi, which has been used successfully in other projects, and our ontology-driven metamodelling, also using philosophy directly (notably, the positionalism of relations and quality properties) [1]. A ‘merger’ of ontology, information systems, and linked data was presented by Chiara Renso, who talked about the Baquara ontology to help the conceptual analysis of movement of people talking about some entity at a certain location [4], which won the best paper award. A use case of improving a conceptual data model using UFO was presented by Oscar Pastor [5], using an earlier developed conceptual model of the human genome. Not that I agree with Gene being a “collective”, but, overall, it gives a clear example of how a model may be enhanced and indeed lays bare underlying assumptions and understanding that are missed in ‘plain’ conceptual modelling.

Besides ontology-driven conceptual modeling, there were four papers on the fundamentals of conceptual modeling. One of the topics was conceptual modeling and concepts [6], presented by Chris Partridge. To its credit, the paper refines some notions of concepts I wasn’t aware of, but I have grown a bit tired of the concept vs. universal debate due to its intense discussions in ontology engineering (see links to the debates and references here). Roman Lukyanenko proposed a new way of conceptual modeling: instead of top-down, go bottom-up and gather the classes and attributes from the crowd using citizen science and free-form annotations without any form of curation [7]. It’s on the other end of the spectrum compared to standard conceptual data modeling, and a bit too loose to my liking, especially because of the lack of curation of the proposed terms, but a hybrid certainly may be useful. Not in this session, but somewhat related, was Tilmann Zäschke’s presentation about optimizing conceptual data models using the actual database [8]. They proposed a method and patterns for updating the conceptual data model based on the usage of the database (including path navigation), using DBLP as a case study.

There were two sessions on business process modeling, two sessions on applications, one on network modeling, security, and data semantics, a demo session, and several keynotes, workshops, and panels that partially overlapped with other sessions, for which I don’t have the time to write up the notes here. I did go to the panel on “open models”, or: why is there open source software, but hardly any open source conceptual models? I plan to get back to this question in a later post.

The food was good, and so were the generous reception and social dinner (eating some sort of a sweet bean soup for dessert was a bit curious, though), and it was great to meet again with people I’ve met before and to finally meet in person several people whose papers I had read and cited over the years, including Brian Henderson-Sellers, Veda Storey, Sudha Ram, Antoni Olivé, and Peter Chen. Even though ER’14 is in the USA next year (Atlanta), I may give it a try anyway.

References

(note: most of the links point to the version at Springer; search again later or ask the authors for a free copy. In fact, it smells as if this is due to a collaboration between Google Scholar and Springer: when I search for my own paper, whose CRC has been online since the blog post about it in August, GS pretends it does not exist either, idem for Zäschke’s paper.)

[1] Keet, C.M., Fillottrani, P.R. Toward an ontology-driven unifying metamodel for UML Class Diagrams, EER, and ORM2. 32nd International Conference on Conceptual Modeling (ER’13). 11-13 November, 2013, Hong Kong. Springer LNCS vol 8217, 313-326.

[2] Siena, A., Ingolfo, S., Perini, A., Susi, A., Mylopoulos, J. Automated reasoning for regulatory compliance. ER’13, Springer LNCS vol 8217, 47-60.

[3] Guizzardi, G., Wagner, G., de Almeida Falbo, R., Guizzardi, R.S.S., Almeida, J.P.A. Towards ontological foundations for the conceptual modeling of events. ER’13, Springer LNCS vol 8217, 327-341.

[4] Fileto, R., Kruger, M., Pelekis, N., Theodoridis, Y., Renso, C. Baquara: a holistic ontological framework for movement analysis using linked data. ER’13, Springer LNCS vol 8217, 342-355.

[5] Martinez Ferrandis, A.M., Pastor Lopez, O., Guizzardi, G. Applying the principles of an ontology-based approach. ER’13, Springer LNCS vol 8217, 471-478.

[6] Partridge, C., Gonzalez-Perez, C., Henderson-Sellers, B. Are conceptual models concept models? ER’13, Springer LNCS vol 8217, 96-105.

[7] Lukyanenko, R., Parsons, J. Is traditional conceptual modeling becoming obsolete? ER’13, Springer LNCS vol 8217, 61-73.

[8] Zäschke, T., Leone, S., Gmunder, T., Norrie, M.C. Optimizing conceptual data models through profiling in object databases. ER’13, Springer LNCS vol 8217, 284-297.

Mixed experiences with conferences and traveling

I just made it back to South Africa from the Knowledge Engineering and Ontology Development (KEOD’13) conference in Vilamoura, Portugal, and the subsequent Model & Data Engineering (MEDI’13) conference in Amantea, Calabria, Italy. I had two papers at each conference (briefly described here, here, and here, in previous posts) and I suppose I should count myself lucky to have (barely) sufficient research funds to make such a trip.

If I had the choice again, but with the foreknowledge of what was in store, I’d skip KEOD’13, regardless of the fact that the Portuguese airline TAP made a bit of a mess of my travel and, jointly with Alitalia, ‘lost’ my baggage for a while (and they were unresponsive upon inquiry, and still are). Analysing the twice re-labeled baggage tags once the baggage arrived two days later, though, it appeared to be Alitalia who had made me wait the extra day, who did not give updates on the status of the luggage either, and whose lost & found at Lamezia Terme airport could not be bothered to bring it to the hotel, though they ought to have done so. Meanwhile, I’d done a bit of guilt-free shopping in Amantea so as to wear something other than the same clothes I traveled in; and the clothes even fit!

For the remainder of the post: if you want to read about me complaining about a conference, then simply read on; if you want to read a more typical conference report, then scroll down to the MEDI’13 section, further below.

KEOD’13

KEOD appeared to be a lucrative business for the organizers rather than a full-fledged conference. It started with an increase in the registration fees after submission, amounting to 525 euro without lunch (another 70 euro) or the social dinner (another 75 euro), and another 290 euro for any additional paper. But we were already sucked into it and didn’t want to go through the whole process of resubmitting one of the papers elsewhere, so I ended up forking out 815 euro just for the registration (with a bad Rand-to-Euro exchange rate to boot). The internet connection was close to zero bits per hour, so that wasn’t taking up the lion’s share of the conference registration either. The ‘welcome reception’—included in the price—consisted of two jugs of ice tea, two jugs of juice, and three bottles of water, with a few snacks—even the alcohol-free welcome receptions in the USA do better than that. Now, I know that top conferences such as KCAP, CIKM, and the like do tend to be pricey, but KEOD is not a top conference, despite the claimed statistics in the proceedings that there was an acceptance rate for the joint IC3K conferences of 10% for long papers and, including regular papers, about 35% overall (the paper with my MSc student, Zubeida Khan [1], was a full/long paper of 12 pages double-column and the other one [2] a regular 8 pages).

The titles of the papers certainly sounded interesting, but most of the presentations did not live up to the promise, multiple sessions had at least one paper where the author had not shown up, and there were only about 20-25 KEOD attendees, which is a generous rough headcount of the attendees in the plenary sessions (quite a few who did show up did not stay for the whole conference). Or: at best, it was more like an expensive and not well-organised workshop-level event. The best paper award went to a groupie of one of the two organizers.

Of course, it’s your call whether to submit to next year’s installment, and my opinion is just that: an opinion. A few people who had attended a previous installment told me the papers were then “a bit of a mixed bag”, so perhaps this year was a temporary dip rather than indicative of a general trend. Quality aside, though, it is still too expensive.

On the bright side, Tahir Khan, with whom I have a CIKM’13 paper jointly with Chiara Ghidini [3] (the topic of a forthcoming post), was attending as well, and the three of us (Zubeida, Tahir, and I) have set out some nice tasks for research and a prospective new paper (assuming we’ll get interesting results, that is).

MEDI’13

Regarding MEDI’13, organized by Alfredo Cuzzocrea and Sofian Maabout: there was a last-minute wobble that got ironed out, and the local organization, the quality of the papers and presentations, and the atmosphere at the conference were all substantially better. It was a really enjoyable event.

The conferences organized in Italy that I have attended over the years (among others, AI*IA’07, AI*IA’09, and DL’07) were always good with the food and drinks, and MEDI was no exception; they even had additional entertainment, with folkloristic music and dance at the welcome reception and the (huge) social dinner. And now I know how tartufo is really supposed to taste (sorry Bolzanini, but none of the restaurants and gelaterie up there come even close).

So, let me mention some of the presentations and papers that piqued my interest—what I normally write about when writing a blog post about a conference.

The first keynote was given by Gottfried Vossen from the University of Münster on “Model engineering as a social activity: the case of business processes”, which focused on the role of models in computer science, the move from modeling to model engineering (a more structured and rigorous activity than modelling alone), the Horus method for modelling, and closed with ideas about inserting crowdsourcing into the process. A side-note in the presentation, but worth linking here anyway: there is now even algorithm engineering. The second keynote was given by Dimitrios Gunopulos from the University of Athens about “Analyzing massive streaming heterogeneous data: towards a new computing model for computational sustainability”. This talk was about document retrieval, e.g., of newspaper articles: first a baseline of the occurrences of certain keywords over time is established, then ‘bursts’ of keyword incidences in smaller intervals are detected, and the documents in such bursts get ranked higher in the retrieval of relevant documents.
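Since the burst idea is stated only compactly, here is a toy rendering of it in Python (entirely my own illustration, with invented data and an invented threshold; it is not the algorithm from the talk): a keyword’s frequency in a small window of days is compared against its long-run baseline rate, and windows exceeding the baseline by some factor count as bursts.

```python
from collections import Counter

def burst_windows(timestamps, window=7, factor=4.0):
    """Return the windows (of `window` days) in which the keyword occurs
    at >= `factor` times its long-run baseline rate."""
    if not timestamps:
        return []
    horizon = max(timestamps) + 1
    baseline = len(timestamps) / horizon        # average occurrences per day
    counts = Counter(t // window for t in timestamps)
    return sorted(w for w, c in counts.items()
                  if c / window >= factor * baseline)

# Example: a keyword that is quiet for months and then spikes in one week.
mentions = [3, 40, 90, 150, 151, 152, 153, 154, 155, 200]
print(burst_windows(mentions))   # -> [21, 22]: the spike around days 150-155
```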

What I’ll certainly read in more detail over the upcoming days is the persistent metamodelling system paper [4], which enhances the OntoDB/OntoQL system, as our papers were about metamodels [5] and the model repository ROMULUS [6]. Likewise, I’m biased toward reading the paper presented by Zdenek Rybola [7], for it deals with ontology-driven conceptual data modeling in UML using OntoUML, aiming to carry the ontology-driven conceptual model through to improved implementations. Further, [8] proposes to use Time Petri Nets (TPNs) to verify temporal coherence in a SMIL (the W3C’s synchronized multimedia integration language) multimedia document presentation, through a transformation of the SMIL document into a TPN.
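To give a flavour of the kind of temporal coherence being verified in [8], here is a deliberately simplified sketch (mine, in plain Python; the paper’s actual approach is the transformation into Time Petri Nets, which this does not attempt): a <seq> plays its children one after the other and a <par> plays them concurrently, so the required duration of each container can be computed bottom-up, and a container whose declared duration cuts its content short can be flagged.

```python
def duration(node, problems, path="root"):
    """Compute a node's required duration bottom-up; record conflicts."""
    kind = node["kind"]
    if kind == "clip":
        needed = node["dur"]
    elif kind == "seq":      # children play one after the other
        needed = sum(duration(c, problems, path + "/seq") for c in node["children"])
    else:                    # "par": children play concurrently
        needed = max(duration(c, problems, path + "/par") for c in node["children"])
    if kind != "clip" and "dur" in node and node["dur"] < needed:
        problems.append(f"{path}: declared dur {node['dur']}s < required {needed}s")
    return node.get("dur", needed)

# A <par> cut off at 10s while its <seq> child needs 7s + 6s = 13s:
doc = {"kind": "par", "dur": 10, "children": [
    {"kind": "clip", "dur": 8},
    {"kind": "seq", "children": [{"kind": "clip", "dur": 7},
                                 {"kind": "clip", "dur": 6}]}]}
problems = []
duration(doc, problems)
print(problems)   # -> ['root: declared dur 10s < required 13s']
```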

I was not the only one traveling to MEDI’13 all the way from South Africa: Kobamelo Moremedi, an MSc student at UNISA, presented his work on various possible diagrammatic notations for Z [9]. Staying with South Africa, and highlighting material presented at the conference with a more immediate link to societal use (cf. foundational ontologies, ETL, architectural documentation, database integrity and the like, where the chain toward practical relevance is a bit longer): Andrea Nucita presented a Markov chain-based model for individual patients and an agent-based model for the interaction among individuals, so as to predict HIV/AIDS epidemiological trends and simulate various epidemiological scenarios [10]. They used actual clinical data to test their model, and it showed that, as one would expect, expanding access to testing and therapy will steer the evolution of the epidemic toward limiting the spread of HIV/AIDS, therewith also corroborating other works on the same topic.
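The Markov chain part of that idea is easy to illustrate with a toy example (states and transition probabilities are invented here and bear no relation to the clinical data or the actual model in [10]): a population distribution over patient states evolves by repeated multiplication with the transition matrix, and tweaking, say, the testing rate gives the different scenarios.

```python
import numpy as np

# States and transition probabilities invented purely for illustration.
states = ["susceptible", "infected, undiagnosed", "diagnosed", "on therapy"]
P = np.array([
    [0.97, 0.03, 0.00, 0.00],   # susceptible
    [0.00, 0.80, 0.20, 0.00],   # infected but not yet tested
    [0.00, 0.00, 0.40, 0.60],   # diagnosed, awaiting therapy
    [0.00, 0.00, 0.00, 1.00],   # on therapy (absorbing, for simplicity)
])

dist = np.array([0.95, 0.05, 0.00, 0.00])   # initial population distribution
for year in range(1, 6):
    dist = dist @ P                          # one step of the chain
    print(year, dist.round(3))
# Increasing the testing rate (the 0.20 entry) moves mass to 'on therapy'
# faster -- the kind of scenario comparison the paper runs on real data.
```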

In closing

Overall, it was quite a mixed experience on a relatively long trip. I haven’t written negatively about a conference before—if it wasn’t great, then I didn’t write about it, although not having written about an event does not imply it wasn’t great (e.g., KCAP’13 got cancelled)—but KEOD really was a disappointment in multiple respects. I don’t know whether it’s in the same league as WORLDCOMP, IASTED, IARIA, and the like, as I’ve never attended those, but going by reviews of those events, if the KEOD organisers don’t get their act together, they will end up down there in the same league. As for MEDI: if I have enough research money and suitable material, I may well submit to next year’s installment in Cyprus or a later one, as the quality, relevance, and overall experience were better than I had expected.

References

[1] Khan, Z.C., Keet, C.M. Addressing issues in foundational ontology mediation. Fifth International Conference on Knowledge Engineering and Ontology Development (KEOD’13). 19-22 September, 2013, Vilamoura, Portugal.

[2] Keet, C.M., Suárez Figueroa, M.C., Poveda-Villalón, M. The current landscape of pitfalls in ontologies. Fifth International Conference on Knowledge Engineering and Ontology Development (KEOD’13). 19-22 September, 2013, Vilamoura, Portugal.

[3] Keet, C.M., Khan, M.T., Ghidini, C. Ontology Authoring with FORZA. ACM International Conference on Information and Knowledge Management (CIKM’13), San Francisco, USA, Oct 27-Nov 1, 2013.

[4] Bazhar, Y., Ouhammou, Y., Aït-Ameur, Y., Grolleau, E., Jean, S. Persistent Meta-Modeling Systems as Heterogeneous Model Repositories. 3rd International Conference on Model & Data Engineering (MEDI’13). September 25-27, 2013, Amantea, Calabria, Italy. Springer LNCS 8216, 25-37.

[5] Keet, C.M., Fillottrani, P.R. Structural entities of an ontology-driven unifying metamodel for UML, EER, and ORM2. 3rd International Conference on Model & Data Engineering (MEDI’13). September 25-27, 2013, Amantea, Calabria, Italy. Springer LNCS 8216, 188-199.

[6] Khan, Z.C., Keet, C.M. The foundational ontology library ROMULUS. 3rd International Conference on Model & Data Engineering (MEDI’13). September 25-27, 2013, Amantea, Calabria, Italy. Springer LNCS 8216, 200-211.

[7] Pergl, R., Sales, T.P., Rybola, Z. Towards OntoUML for Software Engineering: From Domain Ontology to Implementation Model. 3rd International Conference on Model & Data Engineering (MEDI’13). September 25-27, 2013, Amantea, Calabria, Italy. Springer LNCS 8216, 249-263.

[8] Ghomari, A., Belheziel, N., Mekahlia, F.-Z., Djeraba, C. Towards a Formal Approach for Verifying Temporal Coherence in a SMIL Document Presentation. 3rd International Conference on Model & Data Engineering (MEDI’13). September 25-27, 2013, Amantea, Calabria, Italy. Springer LNCS 8216, 132-146.

[9] Moremedi, K., van der Poll, J.A. Transforming Formal Specification Constructs into Diagrammatic Notations. 3rd International Conference on Model & Data Engineering (MEDI’13). September 25-27, 2013, Amantea, Calabria, Italy. Springer LNCS 8216, 212-224.

[10] Nucita, A., Bernava, G.M., Giglio, P., Peroni, M., Bartolo, M., Orlando, S., Marazzi, M.C., Palombi, L. A Markov Chain Based Model to Predict HIV/AIDS Epidemiological Trends. 3rd International Conference on Model & Data Engineering (MEDI’13). September 25-27, 2013, Amantea, Calabria, Italy. Springer LNCS 8216, 225-236.

Towards a metamodel for conceptual data modeling languages

There are several conceptual data modelling languages one can use to develop a conceptual data model that captures the subject domain of an application in an implementation-independent way. Complex software development may need to leverage the strengths of each language, yet still requires interoperability between the software components; e.g., an object-oriented design of the application layer in a UML Class Diagram needs to be able to talk to the EER diagram for the relational database. Or one may be at a stage where there are already several conceptual data models for different applications, but they need to be integrated (or at least made compatible). For various reasons, each of these models may well be represented in a different language, such as UML, EER, ORM, MADS, or Telos. Superficially, these languages all seem quite similar, even though they are known to be distinct in ‘a few’ features, such as that UML Class Diagrams typically have methods, but EER does not.

To adequately deal with such scenarios, we need not a comparison of language features but a unification of them, so as to foster interoperability. However, no unifying framework exists that respects all of their language features. In addition, one may wonder: where are the real commonalities ontologically, what is fundamentally different, and what is the same in underlying idea or meaning but merely looks different on the surface? We—Pablo Fillottrani (Universidad Nacional del Sur, Argentina) and I—aim to fill this gap.

As a first step, we designed a common, ontology-driven metamodel[1] of the static, structural components of ER, EER, UML v2.4.1, ORM, and ORM2, in such a way that each language is strictly a fragment of the encompassing metamodel. In the meantime, we have also developed the metamodel for the constraints, but for now the results on the static, structural components have been accepted at the 32nd International Conference on Conceptual Modeling (ER’13) [1] and the 3rd International Conference on Model & Data Engineering (MEDI’13) [2]. There is no repetition between the papers; the work merely has been split in two because of the page limits of conference proceedings and the amount of results we had.

The ER’13 paper [1] presents an overview of the core entities and constraints, and an analysis of roles and relationships, their interaction with predicates, and attributes and value types (which we refine with the notion of dimensional attribute). The MEDI’13 paper [2] covers all the structural components in more detail, including a discussion of classes/concepts, subsumption, aggregation, and nested entity types.

(Warning: spoiler alert…) Perhaps surprisingly, the intersection of all the features in the selected languages is rather small: role, relationship (including subsumption), and object type. The attributions (attributes, value types) are represented differently, but they capture the same underlying idea of attributive properties, and several implicit aspects, such as the dimensional attribute and its reusability, and relationship versus predicate, have been made explicit. Regarding constraints, only disjointness, completeness, mandatory, object cardinality, and the subset constraint appear in all three language families. The two overview figures in the paper have the classes colour-coded to give an easier overview of how many of the elements are shared across the languages, and the appendix contains a table of terminology across UML, EER, and ORM2, such as that UML’s “association” and EER’s “relationship” denote the same kind of thing.
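To make that small intersection concrete, here is a back-of-the-envelope rendering in Python (the per-language feature lists are abbreviated and partly my own assumptions; the papers have the full, colour-coded, picture):

```python
# Toy rendering of the feature overlap; lists abbreviated and partly assumed.
features = {
    "UML":  {"object type", "role", "relationship", "subsumption",
             "object cardinality", "subset constraint", "attribute", "method"},
    "EER":  {"object type", "role", "relationship", "subsumption",
             "object cardinality", "subset constraint", "attribute",
             "weak entity type"},
    "ORM2": {"object type", "role", "relationship", "subsumption",
             "object cardinality", "subset constraint", "value type",
             "ring constraint"},
}
core = set.intersection(*features.values())
print(sorted(core))
# -> ['object cardinality', 'object type', 'relationship', 'role',
#     'subset constraint', 'subsumption']
```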

This also received attention in the UKZN e-newsletter here, which combined the announcement of the ER’13 paper with my participation in the Dagstuhl seminar on Reasoning over Conceptual Schemas last May, the DST/NRF-funded South Africa – Argentina bilateral project on the unification of conceptual modelling languages, and Pablo’s visit to UKZN during the previous two weeks.

References

[1] Keet, C.M., Fillottrani, P.R. Toward an ontology-driven unifying metamodel for UML Class Diagrams, EER, and ORM2. 32nd International Conference on Conceptual Modeling (ER’13). 11-13 November, 2013, Hong Kong. Springer LNCS (in press).

[2] Keet, C.M., Fillottrani, P.R. Structural entities of an ontology-driven unifying metamodel for UML, EER, and ORM2. 3rd International Conference on Model & Data Engineering (MEDI’13). September 25-27, 2013, Amantea, Calabria, Italy. Springer LNCS (in press).


[1] The term “metamodel” is used here in its common usage in the literature but, more precisely, what we have is a conceptual model that represents the entities of conceptual data models and the constraints on their usage; e.g., it states that each relationship (association) is composed of at least two roles (association ends), that a nested entity type objectifies one relationship, that a multi-valued attribute is an attribute, and so on.
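For what it’s worth, these example constraints translate almost directly into code; a minimal sketch (all class names are mine, purely for illustration):

```python
# Minimal encoding of the footnote's example constraints; names are mine.

class ObjectType:
    def __init__(self, name):
        self.name = name

class Relationship:
    def __init__(self, name, roles):
        if len(roles) < 2:                  # a relationship has >= 2 roles
            raise ValueError(f"{name}: a relationship needs at least two roles")
        self.name, self.roles = name, roles

class NestedEntityType(ObjectType):
    """Objectifies one relationship."""
    def __init__(self, name, objectifies):
        super().__init__(name)
        self.objectifies = objectifies      # a single Relationship

class Attribute:
    def __init__(self, name):
        self.name = name

class MultiValuedAttribute(Attribute):      # a multi-valued attribute IS an attribute
    pass

student = ObjectType("Student")
enrolled = Relationship("enrolled in", roles=["student", "degree programme"])
enrolment = NestedEntityType("Enrolment", objectifies=enrolled)
# Relationship("broken", roles=["lonely role"])  # would raise ValueError
```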