# First tractable encoding of ORM conceptual data models

For (relatively) many years I’ve been focusing on as-expressive-as-possible languages to represent information and knowledge, including the computationally impractical full first order logic, because one would/should want to be as precise as possible and required to represent the subject domain in an ontology and universe of discourse for the application in a conceptual data model. After all, one can always throw out the computationally unpleasant constructs later during the implementation stage, if the ontology or conceptual data model is intended for use at runtime, such as OBDA [1], test data generate for verification [2], and in the query compilation stage in RDBMSs [3]. The resulting slimmed theories/models may be different for different applications, but then at least the set of slimmed theories/models share their common understanding.

So, now I ventured in that area, not because there’s some logic x and conceptual modeling language y has to be forced into it, but it actually appears that many fancy construct/features are not used in publicly available conceptual data models anyway (see data set and xls with some analysis). The timing of the outcome of the analysis of the data set coincided with David Toman’s visit to UCT as part of his sabbatical and Pablo Fillottrani’s visit, who enjoyed the last exchange of our bi-lateral project on the unification of conceptual data modelling languages (project page). To sum up the issue we were looking at: the need for run-time usage of conceptual data models requires a tractable logic-based reconstruction of the conceptual models (i.e., in at most PTIME), which appeared to hardly exist or miss constructs important for conceptual models (regardless whether that was ORM, EER or UML Class Diagrams), or both.

The solution ended up to be a logic-based reconstruction for most of ORM2 using the $\mathcal{CFDI}_{nc}^{\forall -}$ Description Logic, which also happens to be the first tractable encoding of (most of) ORM/ORM2. With this logic, several features important for conceptual models (i.e., occur relatively often) do have their proper encoding in the logic, notably n-aries, complex identification constraints, and n-ary role subsumption. The, admittedly quite tedious, mapping

Low resolution and small version of our DL15 poster summarising the contributions.

captures over 96% of the constructs used in practice in the set of 33 ORM diagrams we analysed (see data set). Further, the results are easily transferable to EER and UML Class diagrams, with an even greater coverage. The results (and comparison with related works) are presented in our recently accepted paper at the 28th International Workshop on Description Logics (DL’15) that will take place form 7 to 11 June in Athens, Greece.

The list of accepted papers of DL’15 is available, listing 21 papers with long presentations, 16 papers with short presentation, and 26 papers with poster presentations. David will present our results in the poster session, as it’s probably of more relevance in the conceptual modelling community (and I’ll be marking exams then), and some other accepted papers cover more new ground, such as casting schema.org as a description logic, temporal query answering in EL, exact learning of ontologies, and more. The proceedings is will be online on CEUR-WS in the upcoming days as volume 1350. I’ve added a mini version of our poster on the right. I tried tikzposter, as they look really cool, but it doesn’t support figures (other than those made in latex), so I resorted to ppt (that doesn’t support math), wondering why these issues haven’t been solved by now.

References

[1] Calvanese, D., Keet, C.M., Nutt, W., Rodriguez-Muro, M., Stefanoni, G. Web-based Graphical Querying of Databases through an Ontology: the WONDER System. ACM Symposium on Applied Computing (ACM SAC’10), March 22-26 2010, Sierre, Switzerland. pp 1389-1396.

[2] Toman, D., Weddell, G.E.: Fundamentals of Physical Design and Query Compilation. Synthesis Lectures on Data Management, Morgan & Claypool  Publishers (2011)

[3] Smaragdakis, Y., Csallner, C., Subramanian, R.: Scalable satisfiability checking and test data generation from modeling diagrams. Automation in Software Engineering 16, 73–99 (2009)

[4] Fillottrani, P.R., Keet, C.M., Toman, D. Polynomial encoding of ORM conceptual models in $\mathcal{CFDI}_{nc}^{\forall -}$. 28th International Workshop on Description Logics (DL’15). CEUR-WS vol xx., 7-10 June 2015, Athens, Greece.

# Three CS problem-solving strategies exercise sets

The preparations for the ACM ICPC World Finals 2015 in Morocco are in full swing, and so is the training for by with the 125 or so 3-person teams selected of over 30000 contestants who participated in the various selection rounds across the world. While a lot comes down to “practice, practice, practice” all the hard problems you can find, and learning more algorithms and maths, there’s also work to do on refining one’s team strategy and problem solving skills. For the latter, Steven and Felix Halim’s Competitive Programming book comes in handy (as coach at least). Instead of going back-to-back through the book and do the problems listed in the designated problems solving paradigm categories, I turned it around, and selected problems of which the students had to figure out which problem solving paradigm was the right one (Greedy, DP, ad hoc, etc., and more detailed, such as geometry, shortest path, etc.). I’ve made three sets of increasing difficulty:

1. easy’, where most of them can be solved with one problem solving paradigm (also useful for those who are preparing for the Standard Bank IT Challenge ‘heats’ on May 16);
2. medium’, where most of them can be solved with two problem solving paradigms combined;
3. hard’, where most of the problems have appeared in a World Final.

The (direction of the) solutions are on the last page of each pdf, including the UVa Online Judge number, so you can implement and test your solution as well.

I did change some of the problem descriptions for various reasons:

• Localization of the problem description to South Africa: among others, ‘Durban prawns’ (big fat cockroaches are endemic there), ‘nuts for nuts’ (we do have squirrels on campus), ‘shopping for operas’ (Amazon gave up delivering in SA, because so much packets were lost), and some characters got different names.
• Changed story line: ‘charming canines’ (disagreeable storyline in original), and some now have female characters (cf. mostly male or none).
• To make the title an alliteration, like the other titles in the problem set: ‘colliding catamarans’ and ‘cult caps’.

Happy solving :)

# Journal paper on Data Mining OPtimization Ontology

I’ve been writing here about bits and pieces of the Data Mining OPtimization ontology (DMOP) before (modeling issues, reasoner performance), but there never was something about the whole setting. I’m happy to say that now there is, for the Semantic Web Journal paper about DMOP is in print now and its in-press version is online, waiting in the queue to be assigned a volume [1]. The ontology itself (v5.4) is freely accessible and downloadable in several formats from the dmo foundry.

The paper can be considered the new so-called ‘reference paper’ of the ontology: it describes the rationale, the non-trivial design choices, content, and its use. The abstract sums it up nicely:

The Data Mining OPtimization Ontology (DMOP) has been developed to support informed decision-making at various choice points of the data mining process. The ontology can be used by data miners and deployed in ontology-driven information systems. The primary purpose for which DMOP has been developed is the automation of algorithm and model selection through semantic meta-mining that makes use of an ontology-based meta-analysis of complete data mining processes in view of extracting patterns associated with mining performance.

To this end, DMOP contains detailed descriptions of data mining tasks (e.g., learning, feature selection), data, algorithms, hypotheses such as mined models or patterns, and workflows. A development methodology was used for DMOP, including items such as competency questions and foundational ontology reuse. Several non-trivial modeling problems were encountered and due to the complexity of the data mining details, the ontology requires the use of the OWL 2 DL profile.

DMOP was successfully evaluated for semantic meta-mining and used in constructing the Intelligent Discovery Assistant, deployed at the popular data mining environment RapidMiner.

As two more teasers to lift the veil a bit, the architecture of various related components is shown in the first figure below, and how it is integrated in the RapidMiner Intellgent Discovery Assistant is shown in the second figure.

(source: [1])

(source: [1])

Unfortunately, the paper is behind Elsevier’s paywall, but we’re free to distribute to individuals. (If only the copyright stuff question from Elsevier would have come some 1.5 month later, this would not have been the case—things have improved and there are addenda and whatnot so that apparently it could have been put in an institutional repository. But, alas, better next time.) More precisely, the ‘we’ are Agniezska Lawrynowicz (shared first author), Claudia d’Amato, Alexandros Kalousis, Phong Nguyen, Raul Palma, Robert Stevens, and Melanie Hilario, and I.

If you use DMOP, experiment with it, or would like to contribute to its further development, please let us know.

References

[1] Keet, C.M., Lawrynowicz, A., d’Amato, C., Kalousis, A., Nguyen, P., Palma, R., Stevens, R., Hilario, M. The Data Mining OPtimization ontology. Web Semantics: Science, Services and Agents on the World Wide Web. in press. http://dx.doi.org/10.1016/j.websem.2015.01.001

# Forum for AI Research 2015, Cape Town

In 10 day’s time, the (CAIR-driven) Forum for Artificial Intelligence Research 2015 (FAIR’15) Workshop will be held at UCT in Cape Town, South Africa, from March 30 to April 2. There are still some spaces available; registration is free, but please register (for catering purposes). What will you get for this ‘bargain price’? A lot of food for the mind!

FAIR’15 follows the same format as the previous 7 editions that went under various acronyms since 2008 (among others, MOWS, MOSS, MAIS, FAIR), with a mini-course, a tutorial, and postgraduate student presentations. This edition has the following on offer.

Ulrike Sattler (University of Manchester, UK) will present a mini-course on automated reasoners in the mornings. She will go into the details of what really happens when you click that menu option “start reasoner” and Protégé’s “?” that explains the deductions, and what are the factors that influence the reasoner’s performance.

David Toman (University of Waterloo, Canada) will present a 2-hour tutorial on using knowledge representation and reasoning (logic) for query optimization in relational databases and ontology-based data access (i.e., advanced aspects of database systems implementation).

Further, there are several sessions with postgraduate student presentations. Among others, Catherine Chavula will talk about new results (cf. [1]) in multilingual ontologies, Zubeida Khan will talk about foundational ontology interchangeability (details in [2]), and (very recently MSc cum laude graduated!) Nasubo Ongoma will present her thesis on logic-based temporal conceptual data modeling (including material from [3]). Gavin Rens will talk about probabilistic belief change, Kody Moodley on defeasible reasoning for description logics, Henriette Harmse about scenario testing with OWL, and Nishal Morar on taxonomic classification.

Aurona Gerber will give an overview of Data Science at CSIR, and for some more variety in the programme, I’ll talk about the stuff ontology [4]. Check the programme for all titles of the presentations and the abstracts of the mini-course and tutorial.

An important aim of FAIR is the networking among people in Southern Africa, and share and discuss informally our research in (predominantly) KR&R and related areas—so if the above topics sound interesting, or made you curious, or you would like to meet a potential MSc/PhD supervisor, you’re welcome to join (note: some basic knowledge of logics will be needed to understand the talks, though). If you have any questions, please don’t hesitate to contact one of the organisers, Arina Britz and me.

References

[1] Chavula, C., Keet, C.M. Is Lemon Sufficient for Building Multilingual Ontologies for Bantu Languages? 11th OWL: Experiences and Directions Workshop (OWLED’14). Keet, C.M., Tamma, V. (Eds.). Riva del Garda, Italy, Oct 17-18, 2014. CEUR-WS vol. 1265, 61-72.

[2] Khan, Z.C., Keet, C.M. Feasibility of automated foundational ontology interchangeability. 19th International Conference on Knowledge Engineering and Knowledge Management (EKAW’14). K. Janowicz et al. (Eds.). 24-28 Nov, 2014, Linkoping, Sweden. Springer LNAI 8876, 225-237.

[3] Keet, C.M., Ongoma, E.A.N. Temporal Attributes: their Status and Subsumption. Asia-Pacific Conference on Conceptual Modelling (APCCM’15). Koehler, H., Saeki, M. (Eds.), Conferences in Research and Practice in Information Technology (CRPIT), Vol. 165. 27-30 January, 2015, Sydney, Australia.

[4] Keet, C.M. A core ontology of macroscopic stuff. 19th International Conference on Knowledge Engineering and Knowledge Management (EKAW’14). K. Janowicz et al. (Eds.). 24-28 Nov, 2014, Linkoping, Sweden. Springer LNAI vol. 8876, 209-224.

# Dancing Algorithms

Yes, it appears that the two can go together. Not in that the algorithms are dancing, but one can do a dance with a choreography such that it demonstrates an algorithm. Zoltan Katai and Laszlo Toth from Romania came up with the idea of this intercultural computer science education [1], with a theoretical motivation traced all the way back to Montessori. It has nothing to do with the scope of my earlier post on folk dancing and cultural heritage preservation, yet at the same time, it contributes to it: watching the videos of the dances immerses you in the folk music, rhythm, the traditional clothes of the region, and some typical steps and and movements used in their dances.

The context, in short: learning to program is not easy for most students—as our almost 900 first-year students are starting to experience from next week onwards—and especially understanding the workings of algorithms. Katai and Toth’s approach is to involve ‘playing out’ the algorithm with people, not by clumsily walking around, but using folk dance and music to make students understand and remember it more easily. They took several sorting algorithms to demonstrate the idea, and tested it on their students, demonstrating that it improved understanding significantly [1].

Perhaps because of my bias toward the dancing, I didn’t take note of the algorithm being danced-out when I watched it the first time, or perhaps it is useful to have read the core steps of the algorithm before watching anyway. You choose: watch the video of the selection sort algorithm—given a list, repeatedly select the smallest remaining element and move that to the ‘sorted’ section of the list—with a Gypsy (Roma) folk dance, or read further below what selection sort is. (note: the video goes on double speed in the middle, for it gets a bit repetitive.)

So, what was happening in the dance? We have one ‘comparer’ (x, for short) and one ‘compared with’ (y). The left-most dancer, with number 3 (our first value of x), starts to dance to the front, calls on the second one in line, being the guy with 0 (our first y), he swirls her back in the line in the spot he came from and stays at the front (0 being the new value of x), and calls on the next, the lady with the 1 (the new value of y), who gets back in the line; and so on till the last one (with number 6). Dancer 0 does a solo act and goes to the first spot: he’s now the first one in the ‘sorted’ part, and we have completed one iteration. Starting with the second main iteration: now number 3 is again at the front of the unsorted, and she dances again to the front (so the value of our x is 3 at this point), calling the second one in the unsorted list, who has number 1, so the lady with number 3 goes back in the unsorted again, and the dancer with 1 continues through the remainder of the list, has her solo, and joins the guy with the 0 in the sorted part, having completed the second main iteration. And so on until about 6:20 minutes into the video clip when the list is sorted and the dancers do a little closing act.

A bit more structured, the following is happening in the choreography of the dance in the video:

1. Divide the list into a ‘sorted’ part (initially empty) and an ‘unsorted’ part (initially the list you want to sort)

2. Do for as long as there’s more than one item in the ‘unsorted’ (find the smallest item in that list):

1. select first element of the unsorted list (with some value, that we refer to with x)

2. if we’re not at the end of the ‘unsorted’ list, then get the next element of the ‘unsorted’ list (with some value, let’s call that one y)

1. if x < y, then y is put back in the same spot in the unsorted list, and we return to the start of step (b) to get the next item to test x against (being the one after the one we just tested)

2. else (i.e., x > y), then the value of x takes y‘s spot in the unsorted list, we assign the value of y to x, and we return to the start of step (b) to get the next element from the unsorted list

3. else (i.e., we’re at the end of the list and thus x is the lowest value), place (the value of) x in the next available spot in the ‘sorted’ part. Then go back to the start of step 2.

3. Place the last item from ‘unsorted’ at the end of the ‘sorted’.

4. Done (i.e., there’s nothing more in ‘unsorted’ to sort)

The algorithm itself is less cumbersome by not having those “let’s pick out one and come to the front” steps, but direct comparisons. I did not plan to include here, but do after all for it makes a nice sequence of artsy → informal analysis → semi-precise structure → specification the computer can work with. (Never mind that that was not the order things came about). Using our CSC1015F course material (2012 samples) that teaches Python, one of the possible sample code snippets is as follows:

def selection_sort ( values ):
"""Sort values using selection sort algorithm."""
# iterate over outer positions in list
for outer in range (len (values)):
# assume first value is minimum
minimum = outer
# compare minimum to rest of list and update
for inner in range (outer+1, len (values)):
if values[inner] < values[minimum]:
minimum = inner
# swap minimum with outer position
temp = values[minimum]
values[minimum] = values[outer]
values[outer] = temp
return values

This is not the only way of achieving it, btw, and the 1017F lecture and lab lecture files have another version of achieving the same (I just thought that this one may be more readable by non-CS readers of the post).

Other danced sorting algorithms are insertion sort (video) with Romanian dance, and shell-sort (video) and bubble sort (video) on Hungarian dances, and more information is also available from the Algo-rythmics website. This isn’t new, and dances on many other tunes can be viewed on youtube, e.g., various bubble-sort dances. Reconstructing the bubble-sort algorithm from the dance below (5min, no fastforward) is an exercise left to the reader… likewise making dances on other algorithms (the ICPC’14 solution to the baggage problem seems like a fun candidate line dance-like), and the same dances but with other folk dances and music.

References

[1] Zoltan Katai and Laszlo Toth. Technologically and artistically enhanced multi-sensory computer programming education. Teaching and Teacher Education, 2010, 26(2): 244-251.

# On the need for bottom-up language-specific terminology development

Peoples of several languages intellectualise their vocabulary so as to maintain their own language as medium of instruction (or: LoLT, language of teaching and learning), to conduct scientific discussions among peers and, in some cases, still, publish research in their own language. Some languages I know of who do this are French, Spanish, German, and Italian; e.g., the English ‘set’ is conjunto (Sp.) and insieme (It.), and the Dutch for ‘garbage collection’ (in computing) is geheugensanering. I found out the hard way last month that my Italian scientific vocabulary was better than my Dutch one, never really having practiced the latter in my field of specialisation and I noticed that over the years that I have been globetrotting, quite a few Anglicisms in Dutch had been replaced with Dutch words and some were there for a while already (as excuse: I studied a different discipline in the Netherlands). How do these new words come about? There are many ways of word creation, and then it depends on the country or language region how it gets incorporated in the language. For instance, French uses a top-down approach with the Académie Française and Spain has the Real Academia Española. The Netherlands has De Nederlandse Taalunie that isn’t as autocratic, it seems; for instance, to follow suit with the French mot dièse for the twitter ‘hashtag’, there was some consultation and online voting (sound file) to come up with an agreeable Dutch term for hashtag. But how does that happen elsewhere?

We found out that there is a mode of practice for language-specific terminology development that happens in small ‘workshops’ of some 13-15 people, constituting mainly of terminologists and linguists, and 1-3 subject matter experts. There may be a consultative event with stakeholders, who are not necessarily with subject matter experts. Shocking. The sheer arrogance of the former, who ‘magically’ grasp the concepts that typically take a while to understand when it comes to science, but they supposedly nevertheless understand it well enough to come up with a meaningful local-language word. But maybe, you say, I’m too arrogant in thinking subject matter experts, such as myself, can come up with decent local-language terms. Maybe that’s partially true, but what may be more problematic, is that only a few subject matter experts are involved, so there is an over-reliance on those mere few. Maybe, you say, that’s not a problem. We put that to the test for a computing and computer literacy terminology development for isiZulu, and found out it was: it depends on who you ask what comes out of the term harvesting and term preference. And then asking just a few people is a problem for a term’s uptake. (The students involved in the experiments did not even know there was a computer literacy term list from the South African Department of Arts and Culture, published in 2005, and boo-ed away several of the terms.)

The way we tested it, was with three experiments. The first experiment was an experts-only workshop, with ‘experts’ being 4th-year computer science students who have isiZulu as home language, as there were no isiZulu-speaking MSc and PhD students, nor colleagues, in CS at the University of KwaZulu-Natal, where we did the experiment. The second experiment was an isiZulu-localised survey among undergraduate CS students to collect terms, where we hoped to see a difference between a survey where they were given the entity with an English name and the entity as a picture. The third experiment was a survey where computer literacy students (1st-year science students) could vote for terms for which there was more than one isiZulu term proposed. The details of the set-up and the results have been published recently in the Alternation open-access journal article “Limitations of Regular Terminology Development Practices: The Case of isiZulu Computing Terminology”, in the special issue on “Re-envisioning African Higher Education: Alternative Paradigms, Emerging Trends and New Directions”, edited by Rubby Dhunpath, Nyna Amin and Thabo Msibi. It describes which isiZulu terms from where are affected, ranging from a higher incidence of ‘zulufying’ English terms in aforementioned list by the South African Department of Arts and Culture cf. the proposals by the experiments’ participants, and, e.g., expert consensus for inqolobane for database, versus a preference for imininingo egciniwe by the computer literacy students (see paper for more cases). Further, when all respondents across the survey are aggregated and go for majority voting, the proposed terms by the experts are snowed under. The latter is particularly troublesome in a country where computing is a designated critical skill (or: there aren’t nearly enough of them).

A byproduct of the experiments was that we have collected the, to date, longest list of isiZulu computing terms, which have gone through a standardisation process in the meantime. The latter is mainly thanks to the tireless efforts of Khumbulani Mngadi of the ULPDO of UKZN, and the two expert CS honours students who volunteered in the process, Sibonelo Dlamini and Tanita Singano.

Our approach was already less exclusionary cf. the aforementioned traditional/standard way, but it also shows that broader participation is needed both to collect and to choose terms; or, in the words of the special issue editors [2]: a “democratization of the terminology development process” that “transcends the insularity and purism which characterises traditional laboratory approaches to development”. We are still working on-and-off to achieve this with crowdsourcing, and maybe we should start thinking of crowdfunding that crowdsourcing effort to speed up the whole thing and complete the commuterm project.

As a last note: in case you are interested in other contributions to “re-envisioning African higher education”: scan through the main page online, read the editorial [2] for main outcomes of each of the papers, and/or read the papers, on topics as diverse as postgrad supervision in isiZulu, teaching sexual and gender diversity to pre-service teachers, maths education, IKS in HE, and much more.

References

[1] Keet, C.M., Barbour, G. Limitations of Regular Terminology Development practices: the case of the isiZulu Computing Terminology. Alternation, 2014, 12: 13-48.

[2] Dhunpath, R., Amin, N. Msibi, T. Editorial: Re-envisioning African and Higher Education: Alternative Paradigms, Emerging Trends and New Directions. Alternation, 2014, 12: 1-12.

# Updated ontology engineering lecture notes (2015)

It’s that time of the year again, in the southern hemisphere that is, where course preparations for the academic year are going on full steam ahead. Also this year, I’ll be teaching a CS honours course on ontology engineering. To that end, the lecture notes have been updated, though not in a major way like last year. Some sections have been shuffled around, there are a few new exercises, Chris’s update suggestion from last year on the OBO-OWL mapping has been included, and a couple of typos and odd sentences have been fixed.

Practically, this installment will be a bit different from previous years, as it has integrated a small project on Semantic Wikis, funded by CILT and OpenUCT. Set up, maintenance, and filling it with contents on ontology engineering topics will initially be done ‘in house’ by students enrolled in the course and not be generally available on the Web, but if all goes well, it’ll be accessible to everyone some time in April this year, and possibly included in the OER Commons.

Semantic MediaWiki’s features are fairly basic and there are a bunch of plugins and extensions I’ve seen listed, but I didn’t check whether they all worked with the latest SMW. If you have a particular suggestion, please leave a comment or send me an email. One thing I’m still wondering about particularly, but haven’t found a solution to, is whether there’s a plugin that lets you see the (lightweight) ontology when adding contents, so that it makes it easier to use terms in the text from the ontology’s vocabulary rather than find an having to process manually whatever (near)synonyms have been used throughout the pages (like, one contributor using ‘upper ontology’, another ‘foundational ontology’ and the third ‘top-level ontology’), and allow on-the-fly extensions of that ontology.