Commuterm presented at the Teaching and Learning in Higher Education Conference 2013

While I was attending MEDI’13 in Calabria, my colleague Graham Barbour, supported by Rosanne Els, presented the first results of our COMMUTERM project on crowdsourcing an isiZulu scientific terminology at the Teaching and Learning in Higher Education Conference (TLHEC’13), held from 25 to 27 September in Pinetown, South Africa. In contrast to another presentation in the same session, we showed that it is possible to come up with isiZulu terms in the sciences, even in computer science, a fairly young discipline whose English terms are themselves largely artificial creations.

Our first results are of a rather foundational nature. The presentation started with the general setting, i.e., the need for rapid terminology development, followed by a literature review on terminology development, on isiZulu terminology, and on the context of our proposed solution, crowdsourcing, and its general landscape.

Next, we briefly described the results of our baseline experiment, against which we will evaluate the effectiveness of crowdsourcing as a method for rapid terminology development: the typical, time-consuming workshop approach. Fifteen students participated in the 2-hour workshop, which resulted in a list of 37 terms in programming and networking. These include terms such as ‘algorithm’ (indlela yokwenza), ‘overriding’ (ukushintsha ufuzo), and ‘formal parameter list’ (amalungu ohlelo ahlelekile), which go well beyond the few computer literacy terms in the 2005 list compiled under the auspices of the Department of Arts and Culture (DAC) of South Africa. Interestingly, the new terms are typically semantic translations, not simply a ‘zulufication’ of English terms.

Group photo of the participants and RAs of the Workshop session on Sept 4, 2013

A more detailed analysis of the new terms is yet to be carried out. Some noteworthy initial observations are that only 5 terms overlap with the DAC list, and even for those, the DAC terms differ from the ones proposed in the workshop. Time will tell which term is better (vote for it in the game!). An interesting case is ‘database’, for which inqolobane was proposed in our workshop experiment, whereas the DAC list gives ulwazi olugciniwe, ulwazi olulondoloziwe, and imininingo egciniwe as isiZulu equivalents. The first two DAC terms cannot be right from a computing viewpoint, however, because ulwazi means ‘knowledge’, and a database and a knowledge base are two different things. The end user may not care about that distinction, but we do.

In the last part of the presentation, Graham demoed the tool. It was still in alpha stage at the time, but it at least conveyed the general idea of the game. The slides of the presentation have some more information and, since a live demo cannot be included in a slide deck, they contain a few screenshots of the tool. We’ll go live after the user testing, so stay tuned!

Mixed experiences with conferences and traveling

I just made it back to South Africa from the Knowledge Engineering and Ontology Development (KEOD’13) conference in Vilamoura, Portugal, and the subsequent Model & Data Engineering (MEDI’13) conference in Amantea, Calabria, Italy. I had two papers at each conference (described briefly in previous posts), and I suppose I should count myself lucky to have (barely) sufficient research funds for such a trip.

If I could choose again, knowing what was in store, I’d skip KEOD’13, and that is regardless of the fact that the Portuguese airline TAP made a bit of a mess of my travel and, jointly with Alitalia, ‘lost’ my baggage for a while (they were unresponsive to inquiries then, and still are). Judging from the twice-relabeled baggage tags once the bag arrived two days later, though, it was Alitalia that made me wait the extra day; they did not update the status of the luggage either, and their lost & found at Lamezia Terme airport could not be bothered to bring it to the hotel, though they ought to have done so. Meanwhile, I’d done a bit of guilt-free shopping in Amantea so as to wear something other than the clothes I traveled in; and the clothes even fit!

For the remainder of the post: if you want to read me complaining about a conference, simply read on; if you want a more typical conference report, scroll down to the MEDI’13 section.

KEOD’13

KEOD appeared to be a lucrative business for the organizers rather than a full-fledged conference. It started with an increase in the registration fee after submission, to 525 euro excluding lunch (another 70 euro) and the social dinner (another 75 euro), plus 290 euro for each additional paper. But we were already sucked in and didn’t want to go through the whole process of resubmitting one of the papers elsewhere, so I ended up forking out 815 euro for the registration alone (with a bad Rand-to-Euro exchange rate to boot). The internet connection was close to zero bits per hour, so that wasn’t eating up the lion’s share of the registration fee either. The ‘welcome reception’, included in the price, consisted of two jugs of iced tea, two jugs of juice, and three bottles of water, with a few snacks; even the alcohol-free welcome receptions in the USA do better than that. Now, I know that top conferences such as KCAP, CIKM and the like tend to be pricey, but KEOD is not a top conference, despite the statistics claimed in the proceedings: an acceptance rate for long papers of 10% across the joint IC3K conferences, and about 35% overall when regular papers are included (the paper with my MSc student Zubeida Khan [1] was a full/long paper of 12 double-column pages and the other one [2] a regular paper of 8 pages).

The paper titles certainly sounded interesting, but most of the presentations did not live up to that promise. Multiple sessions had at least one paper whose author did not show up, and there were only about 20-25 KEOD attendees, which is a generous estimate based on a rough headcount at the plenary sessions; quite a few of those who did show up did not stay for the whole conference. At best, it was an expensive and poorly organised workshop-level event. The best paper award went to a groupie of one of the two organizers.

Of course, it’s your call whether to submit to next year’s installment, and this is just my opinion. According to a few people who had attended a previous installment, the papers then were “a bit of a mixed bag”, so maybe this year was only a temporary dip, not indicative of a general trend. Regardless of the rather mediocre quality, it’s still too expensive.

On the bright side, Tahir Khan, with whom I have a CIKM’13 paper jointly with Chiara Ghidini [3] (the topic of a future post), was attending as well, and the three of us (Zubeida, Tahir, and I) have set out some nice research tasks and a new prospective paper (assuming we get interesting results, that is).

MEDI’13

As for MEDI’13, organized by Alfredo Cuzzocrea and Sofian Maabout, there was a last-minute wobble that got ironed out, and the local organization, the quality of the papers and presentations, and the atmosphere at the conference were substantially better. It was a really enjoyable event.

The conferences in Italy that I have attended over the years (among others, AI*IA’07, AI*IA’09, and DL’07) were always good with the food and drinks. MEDI was no exception, and it even had additional entertainment with folk music and dance at the welcome reception and the (huge) social dinner. And now I know how tartufo is really supposed to taste (sorry Bolzanini, but none of the restaurants and gelaterie up there even come close).

So, let me mention some of the presentations and papers that piqued my interest, which is what I normally write about in a blog post about a conference.

The first keynote was given by Gottfried Vossen from the University of Münster on “Model engineering as a social activity: the case of business processes”. It focused on the role of models in computer science, the move from modeling to model engineering (a more structured and rigorous activity than just modelling), and the Horus method for modelling, and it closed with ideas about inserting crowdsourcing into the process. A side note in the presentation, but worth mentioning here anyway, is that there is now even such a thing as algorithm engineering. The second keynote was given by Dimitrios Gunopulos from the University of Athens on “Analyzing massive streaming heterogeneous data: towards a new computing model for computational sustainability”. This talk was about retrieving documents, such as newspaper articles: first establish a baseline of the occurrences of certain keywords over time, then detect ‘bursts’ of those keywords in certain shorter intervals, and rank the documents from those intervals higher when retrieving relevant documents.
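
To make the burst idea from the second keynote concrete, here is a minimal sketch under my own simplifying assumptions: the baseline is just the mean number of matching documents per interval, a ‘burst’ is any interval exceeding twice that baseline, and the function names and toy headlines are hypothetical. It illustrates the general idea only and is not the method presented in the talk.

```python
from statistics import mean


def keyword_counts(docs_by_interval, keyword):
    """Number of documents per interval whose words include `keyword`."""
    return [
        sum(keyword in doc.lower().split() for doc in docs)
        for docs in docs_by_interval
    ]


def burst_intervals(counts, factor=2.0):
    """Indices of intervals whose count exceeds `factor` times the baseline.

    Assumption: the baseline is simply the mean count over all intervals.
    """
    baseline = mean(counts)
    return {i for i, c in enumerate(counts) if c > factor * baseline}


def rank_documents(docs_by_interval, keyword, factor=2.0):
    """(interval, document) pairs mentioning `keyword`, burst intervals first."""
    bursts = burst_intervals(keyword_counts(docs_by_interval, keyword), factor)
    hits = [
        (i, doc)
        for i, docs in enumerate(docs_by_interval)
        for doc in docs
        if keyword in doc.lower().split()
    ]
    # False sorts before True, so documents from burst intervals come first;
    # sorted() is stable, so chronological order is kept within each group.
    return sorted(hits, key=lambda hit: hit[0] not in bursts)


# Hypothetical toy data: one list of made-up headlines per time interval.
docs = [
    ["Flood warning issued", "Market report"],
    ["Sports news"],
    ["Flood in the valley", "Flood relief", "Flood levels rise", "Weather"],
    ["Flood insurance costs"],
]
```

With these toy headlines, interval 2 has three ‘flood’ documents against a baseline of 1.25, so its documents are ranked first. A real streaming system would more likely use a rolling baseline than a mean over all intervals; the global mean only keeps the sketch short.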

What I will certainly read in more detail over the coming days is the paper on persistent meta-modeling systems [4], which enhances the OntoDB/OntoQL system, as our papers were about metamodels [5] and the foundational ontology library ROMULUS [6]. Likewise, I’m biased toward reading the paper presented by Zdenek Rybola [7], for it deals with ontology-driven conceptual data modeling with UML using OntoUML, aiming to carry the ontology-driven conceptual model through to better implementations. Further, [8] proposes to use Time Petri Nets (TPN) to verify temporal coherence in the presentation of a SMIL (Synchronized Multimedia Integration Language, from the W3C) multimedia document by transforming the SMIL document into a TPN.

I was not the only one traveling to MEDI’13 all the way from South Africa: Kobamelo Moremedi, an MSc student at UNISA, presented his work on various possible diagrammatic notations for Z [9]. Staying with South Africa, one paper had a more immediate link to societal use than most of the material presented at the conference (cf. foundational ontologies, ETL, architectural documentation, database integrity and the like, where the chain toward practical relevance is a bit longer): Andrea Nucita presented a Markov chain-based model for individual patients and an agent-based model for the interaction among individuals, so as to predict HIV/AIDS epidemiological trends and simulate various epidemiological scenarios [10]. They tested their model with actual clinical data, and it showed that, as one would expect, expanding access to testing and therapy will steer the evolution of the epidemic toward limiting the spread of HIV/AIDS, thereby also corroborating other works on the same topic.

In closing

Overall, it was quite a mixed experience on a relatively long trip. I haven’t written negatively about a conference before: if one wasn’t great, I simply didn’t write about it (although not having written about an event does not imply it wasn’t great; e.g., KCAP’13 got cancelled). But KEOD really was a disappointment in multiple respects. I don’t know whether it is in the same league as WORLDCOMP, IASTED, IARIA and the like, as I have never attended those, but going by reviews of those events, if the KEOD organisers don’t get their act together and improve, KEOD will end up in that league. As for MEDI, if I have enough research money and suitable material, I may submit again for next year’s installment in Cyprus or thereafter, as the quality, relevance, and experience were better than I thought they would be.

References

[1] Khan, Z.C., Keet, C.M. (2013). Addressing issues in foundational ontology mediation. 5th International Conference on Knowledge Engineering and Ontology Development (KEOD’13). September 19-22, 2013, Vilamoura, Portugal.

[2] Keet, C.M., Suárez Figueroa, M.C., Poveda-Villalón, M. (2013). The current landscape of pitfalls in ontologies. 5th International Conference on Knowledge Engineering and Ontology Development (KEOD’13). September 19-22, 2013, Vilamoura, Portugal.

[3] Keet, C.M., Khan, M.T., Ghidini, C. (2013). Ontology Authoring with FORZA. ACM International Conference on Information and Knowledge Management (CIKM’13). October 27-November 1, 2013, San Francisco, USA.

[4] Bazhar, Y., Ouhammou, Y., Aït-Ameur, Y., Grolleau, E., Jean, S. (2013). Persistent Meta-Modeling Systems as Heterogeneous Model Repositories. 3rd International Conference on Model & Data Engineering (MEDI’13). September 25-27, 2013, Amantea, Calabria, Italy. Springer LNCS 8216, 25-37.

[5] Keet, C.M., Fillottrani, P.R. (2013). Structural entities of an ontology-driven unifying metamodel for UML, EER, and ORM2. 3rd International Conference on Model & Data Engineering (MEDI’13). September 25-27, 2013, Amantea, Calabria, Italy. Springer LNCS 8216, 188-199.

[6] Khan, Z.C., Keet, C.M. (2013). The foundational ontology library ROMULUS. 3rd International Conference on Model & Data Engineering (MEDI’13). September 25-27, 2013, Amantea, Calabria, Italy. Springer LNCS 8216, 200-211.

[7] Pergl, R., Prince Sales, T., Rybola, Z. (2013). Towards OntoUML for Software Engineering: From Domain Ontology to Implementation Model. 3rd International Conference on Model & Data Engineering (MEDI’13). September 25-27, 2013, Amantea, Calabria, Italy. Springer LNCS 8216, 249-263.

[8] Ghomari, A., Belheziel, N., Mekahlia, F.-Z., Djeraba, C. (2013). Towards a Formal Approach for Verifying Temporal Coherence in a SMIL Document Presentation. 3rd International Conference on Model & Data Engineering (MEDI’13). September 25-27, 2013, Amantea, Calabria, Italy. Springer LNCS 8216, 132-146.

[9] Moremedi, K., van der Poll, J.A. (2013). Transforming Formal Specification Constructs into Diagrammatic Notations. 3rd International Conference on Model & Data Engineering (MEDI’13). September 25-27, 2013, Amantea, Calabria, Italy. Springer LNCS 8216, 212-224.

[10] Nucita, A., Bernava, G.M., Giglio, P., Peroni, M., Bartolo, M., Orlando, S., Marazzi, M.C., Palombi, L. (2013). A Markov Chain Based Model to Predict HIV/AIDS Epidemiological Trends. 3rd International Conference on Model & Data Engineering (MEDI’13). September 25-27, 2013, Amantea, Calabria, Italy. Springer LNCS 8216, 225-236.

CFP for the Workshop on Logics and Reasoning for Conceptual Models (LRCM’13)

From the ‘advertising department’ of promoting events I co-organise: here’s the Call for Papers for the LRCM’13 workshop.

================================================================
First Workshop on Logics and Reasoning for Conceptual Models (LRCM 2013)
14th of December 2013, Stellenbosch, South Africa
http://www.cair.za.net/LRCM2013/
co-located with the 19th International Conference on Logic for Programming,
Artificial Intelligence and Reasoning (LPAR-19), Stellenbosch, South Africa
==============================================================

Information systems are becoming more complex due to, among others, company mergers that require information system integration, the upscaling of scientific collaborations, and e-government, which makes good-quality information systems all the more necessary. An information system’s quality is largely determined in the conceptual modeling stage, and avoiding or fixing errors in the conceptual model saves resources during design, implementation, and maintenance. The size and high expressivity of conceptual models represented in languages such as EER, UML, and ORM require a logic-based approach to the representation of information and the adoption of automated reasoning techniques to assist in the development of good-quality conceptual models. The theory to achieve this is still in its infancy, however, with only a limited set of theories and tools that address subtopics in this area. This workshop aims to bring together researchers working on the logic foundations of conceptual data modeling languages and on the reasoning techniques being developed for them, so as to discuss the latest results in the area.

**** Topics ****

Topics of interest include, but are not limited to:
- Logics for temporal and spatial conceptual models and BPM
- Deontic logics for SBVR
- Other logic-based extensions to standard conceptual modeling languages
- Unifying formalisms for conceptual schemas
- Decidable reasoning over conceptual models
- Dealing with finite and infinite satisfiability of a conceptual model
- Reasoning over UML state and behaviour diagrams
- Reasoning techniques for EER/UML/ORM
- Interaction between ontology languages and conceptual data modeling languages
- Tools for logic-based modeling and reasoning over conceptual models
- Experience reports on logic-based modeling and reasoning over conceptual models

To this end, we mainly solicit theoretical contributions for regular talks, as well as implementation/system demonstrations and some modeling experience reports, to facilitate cross-fertilization between theory and practice. Presentations are selected based on peer review of the submitted papers by at least 2 reviewers, with theory papers handled separately from implementation and experience papers.

**** Submissions ****

We welcome submissions in LNCS style in the following two formats for oral presentation:
- Extended abstracts of maximum 2 pages;
- Research papers of maximum 10 pages.
Both can be submitted in pdf format via the EasyChair website at https://www.easychair.org/conferences/?conf=lrcm13

**** Important dates ****

Submission of papers/abstracts: 14 October 2013
Notification of acceptance:     14 November 2013
Camera-ready copies:            2 December 2013
Workshop:                       14 December 2013

**** Organisation ****

Maria Keet, University of KwaZulu-Natal, South Africa, keet@ukzn.ac.za
Diego Calvanese, Free University of Bozen-Bolzano, Italy, calvanese@inf.unibz.it
Szymon Klarman, CAIR, UKZN / CSIR-Meraka Institute, South Africa, szymon.klarman@gmail.com
Arina Britz, CAIR, UKZN / CSIR-Meraka Institute, South Africa, abritz@csir.co.za

**** Programme Committee ****

Diego Calvanese, Free University of Bozen-Bolzano, Italy
Szymon Klarman, CAIR, UKZN / CSIR-Meraka Institute, South Africa
Maria Keet, University of KwaZulu-Natal, South Africa
Marco Montali, Free University of Bozen-Bolzano, Italy
Mira Balaban, Ben-Gurion University of the Negev, Israel
Meghyn Bienvenu, CNRS and Université Paris-Sud, France
Terry Halpin, INTI International University, Malaysia
Anna Queralt, Barcelona Supercomputing Center, Spain
Vladislav Ryzhikov, Free University of Bozen-Bolzano, Italy
Till Mossakowski, University of Bremen, Germany
Alessandro Artale, Free University of Bozen-Bolzano, Italy
Giovanni Casini, CAIR, UKZN / CSIR-Meraka Institute, South Africa
Pablo Fillottrani, Universidad Nacional del Sur, Argentina
Chiara Ghidini, Fondazione Bruno Kessler, Italy
Roman Kontchakov, Birkbeck, University of London, United Kingdom
Oliver Kutz, University of Bremen, Germany
Tommie Meyer, CAIR, UKZN / CSIR-Meraka Institute, South Africa
David Toman, University of Waterloo, Canada

Towards a metamodel for conceptual data modeling languages

There are several conceptual data modelling languages one can use to develop a conceptual data model that should capture the subject domain of the application area in an implementation-independent way. Complex software development may need to leverage the strengths of each language, yet still require interoperability between the software components; e.g., an application-layer object-oriented software design in a UML Class Diagram that needs to be able to talk to the EER diagram for a relational database. Alternatively, there may already be several conceptual data models for different applications that need to be integrated (or at least made compatible). For various reasons, each of these models may well be represented in a different language, such as UML, EER, ORM, MADS, Telos etc. Superficially, these languages all seem quite similar, even though they are known to differ in ‘a few’ features, such as that UML Class Diagrams typically have methods, but EER does not.

To adequately deal with such scenarios, we need not a comparison of language features but a unification that fosters interoperability. However, no unifying framework exists that respects all of their language features. In addition, one may wonder: where are the real commonalities ontologically, what is fundamentally different, and what is the same in underlying idea or meaning but only looks different on the surface? We (Pablo Fillottrani, with the Universidad Nacional del Sur in Argentina, and I) aim to fill this gap.

As a first step, we designed a common, ontology-driven metamodel[1] of the static, structural components of ER, EER, UML v2.4.1, ORM, and ORM2, in such a way that each language is strictly a fragment of the encompassing metamodel. We have since also developed the metamodel for the constraints, but for now, the results for the static, structural components have been accepted at the 32nd International Conference on Conceptual Modeling (ER’13) [1] and the 3rd International Conference on Model & Data Engineering (MEDI’13) [2]. The two papers do not repeat each other; the material was split across two papers merely because of the page limits of conference proceedings and the amount of results we had.

The ER’13 paper [1] presents an overview of the core entities and constraints, and an analysis of roles and relationships, their interaction with predicates, and attributes and value types (which we refine with the notion of dimensional attribute). The MEDI’13 paper [2] focuses more on all the structural components, and covers classes/concepts, subsumption, aggregation, and nested entity types.

(Warning: spoiler alert…) Perhaps surprisingly, the intersection of all the features in the selected languages is rather small: role, relationship (including subsumption), and object type. The attributions (attributes and value types) are represented differently, but they aim to represent the same underlying idea of attributive properties, and several implicit aspects, such as the dimensional attribute and its reusability, and relationship versus predicate, have been made explicit. Regarding constraints, only disjointness, completeness, mandatory, object cardinality, and the subset constraint appear in all three language families. The two overview figures in the paper have the classes colour-coded to show more easily how many of the elements are shared across languages, and the appendix contains a table of terminology across UML, EER and ORM2, e.g., that UML’s “association” and EER’s “relationship” denote the same kind of thing.

This also received attention in the UKZN e-newsletter, which combined the announcement of the ER’13 paper with my participation in the Dagstuhl seminar on Reasoning over Conceptual Schemas last May, the DST/NRF-funded South Africa – Argentina bilateral project on the unification of conceptual modelling languages, and Pablo’s visit to UKZN over the previous two weeks.

References

[1] Keet, C.M., Fillottrani, P.R. Toward an ontology-driven unifying metamodel for UML Class Diagrams, EER, and ORM2. 32nd International Conference on Conceptual Modeling (ER’13). 11-13 November, 2013, Hong Kong. Springer LNCS (in print).

[2] Keet, C.M., Fillottrani, P.R. Structural entities of an ontology-driven unifying metamodel for UML, EER, and ORM2. 3rd International Conference on Model & Data Engineering (MEDI’13). September 25-27, 2013, Amantea, Calabria, Italy. Springer LNCS (in print).


[1] The term “metamodel” is used here in its common sense in the literature, but, more precisely, we have a conceptual model that represents the entities and the constraints on their usage in a conceptual data model; e.g., it states that each relationship (association) is composed of at least two roles (association ends), that a nested entity type objectifies one relationship, that a multi-valued attribute is an attribute, and so on.
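
To make those example constraints a little more tangible, here is a minimal, hypothetical Python encoding of three of them. The class and field names are my own illustration, not the metamodel from the papers (which is a conceptual model, not code), and the constraints are only checked when an object is constructed.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Role:
    """A role (UML: association end)."""
    name: str


@dataclass
class Relationship:
    """A relationship (UML: association), composed of at least two roles."""
    name: str
    roles: List[Role]

    def __post_init__(self):
        if len(self.roles) < 2:
            raise ValueError(f"relationship {self.name!r} needs at least two roles")


@dataclass
class NestedEntityType:
    """A nested entity type objectifies exactly one relationship."""
    name: str
    objectifies: Relationship

    def __post_init__(self):
        if not isinstance(self.objectifies, Relationship):
            raise TypeError("a nested entity type must objectify a Relationship")


@dataclass
class Attribute:
    name: str


@dataclass
class MultiValuedAttribute(Attribute):
    """Every multi-valued attribute is an attribute: modelled as a subclass."""
    max_values: Optional[int] = None  # None: no upper bound given


# Usage (hypothetical example): a binary relationship and its objectification.
enrols = Relationship("enrols", [Role("student"), Role("course")])
enrolment = NestedEntityType("Enrolment", enrols)
```

Using a subclass for ‘a multi-valued attribute is an attribute’ and a single typed field for ‘objectifies one relationship’ lets the language enforce part of the constraints; only the ‘at least two roles’ cardinality needs an explicit check.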

A few notes and tips for forming new words

Recently, the COMMUTERM project was accepted, in which we will use crowdsourcing to develop an isiZulu terminology, first for computer science and then for another discipline, to test the genericity of the approach and the tools. One of the components is that new words will have to be invented: while there is an isiZulu word for the computer mouse (igundane), there is none so far for, say, ‘computational complexity of an algorithm’, or even ‘algorithm’ (though there’s a tentative candidate for the latter).

So, how would you go about inventing them? In a conversation about that, using the less daunting example of the spreadsheet ‘table’, I asked whether the isiZulu word for table, itafula, could be reused. The answer was not just a “no” (a physical table is a very different kind of thing, so you can’t use the same word[1], and likewise in several other languages); it also came with the addition, in a tone of embarrassment, that there weren’t that many isiZulu words and “even itafula originates from Afrikaans”. I countered that loan words, modification, and adoption are the norm rather than the exception in many languages (well, at least in the ones I know of), and gave a few sketchy examples.

What I hope to achieve here is to structure that somewhat, with examples and ‘types’ of adoptions, in an accessible way. As input, I use my memory of a seminar on that very topic that I attended in 2006 (the Language and Communication Technologies colloquia at the Free University of Bozen-Bolzano, where the computer science faculty operates in a trilingual mode) and the languages I have learned over the years. If you know of better sources, I’d be grateful if you let me know, which, in turn, may improve the outcome of the COMMUTERM project. I will first describe the stages of adoption of a new word, and then describe and illustrate the patterns I know of that have been used to invent new ones.

Stages of adoption

There are different stages of adoption of a new word in a language, resulting from proximity to, or overlap with, other language regions and, these days, globalization. I am not talking about intentional usage of a foreign word, such as reading “spitting chewing gum on the street is Verboten” in an English text: English has a word for verboten (‘forbidden’), but the German word is used to convey a sense of ‘really strictly forbidden’. Instead, I am considering primarily the first of three cases: 1) language X has some word abcd whereas language Y does not have a word for the entity but wants or needs one, 2) a speaker of language X does not speak Y well, makes a Y-ification of abcd, and that somehow creeps into the language[2], 3) there is an existing word for abcd in Y, but for some reason (whichever it may be), abcd is used anyway[3].

The first stage is just plain borrowing of abcd from X by Y; for instance, guerilla (Spanish, Sp.), polder (Dutch, Ned.), or niche (French, Fr.) in English. Sometimes it remains at this stage, i.e., the loan word is adopted as one’s own as is, be it with the original meaning or not. Regarding the latter, you might find the following example mildly entertaining. In the Netherlands, we colloquially used the word ‘floppy’ as short-hand for ‘floppy disk’, but the ‘stiffy’ of ‘stiffy disk’ never really made it: there are translations of ‘stiffy’ into Dutch, but none that fits well, and we have the metric system, so inches were not an option either. While the two co-existed, we nevertheless had to distinguish them somehow, which ended up as a grote floppy [disk/diskette] (‘big floppy’) versus a kleine floppy [disk/diskette] (‘small floppy’), or a zachte floppy [disk/diskette] (‘soft floppy’) versus a harde floppy [disk/diskette] (‘hard floppy’, which is even in the Urban Dictionary). After the real floppy disk disappeared, the stiffy disk became a plain floppy (Ned., plural: floppies), used as a noun without adjective, or a plain diskette (Ned.). Writing this now, it sounds like lunacy, but it made perfect sense back then and everybody understood what you meant by these terms.

The second stage is adaptation of the word, and this also may be the final stage. Adaptation leaves the word largely intact, but modifies it a little according to rules of word formation or grammar of Y. For instance, the verb ‘to browse’ is modified in Dutch with Dutch verb rules: the verb is now browsen and jij (you) browst, wij (we) browsen, etc., and the German (Ger.) Apfelstrudel is strudel di mele in Italian (It.) where Strudel is untranslatable and Apfel is mele.
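
The regular Dutch present-tense pattern that browsen now follows can be sketched as below. This is a deliberately simplified illustration of my own (the function name is hypothetical): it ignores Dutch spelling rules such as vowel doubling (maken gives maak) and v/z becoming f/s, and it ignores inversion (brows jij?), so it only holds for stems like brows- or download-.

```python
def present_tense(infinitive: str) -> dict:
    """Simplified Dutch present tense of a regular verb ending in -en.

    Deliberately ignores spelling rules (e.g. vowel doubling in maken -> maak,
    or v/z -> f/s at the end of the stem) and word-order inversion.
    """
    if not infinitive.endswith("en"):
        raise ValueError("expected an infinitive ending in -en")
    stem = infinitive[:-2]
    # 2nd/3rd person singular add -t, unless the stem already ends in t.
    singular_2_3 = stem if stem.endswith("t") else stem + "t"
    return {
        "ik": stem,
        "jij": singular_2_3,
        "hij": singular_2_3,
        "wij": infinitive,
        "jullie": infinitive,
        "zij (plural)": infinitive,
    }


forms = present_tense("browsen")
```

The point of the sketch is that the loan word keeps its English stem and only the Dutch endings are added, which is exactly what makes this adaptation rather than plain borrowing.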

The third stage, if it occurs, is complete adoption after adaptation, or the invention of a new word. For instance, the English (En.) ‘to educate’ has its origin in Latin (Lat.) educare, ‘democracy’ in the Greek demos kratos, and ‘cookie’ in koekje (Ned.). The direct import ‘taxi’ originates from Greek (supposedly, all words with an x in them are from Greek), and ‘contradiction in terms’ is a 1-to-1 translation of contradictio in terminis (Lat.). There are very many such words in English that have their origin in other languages, and there are plenty of etymological dictionaries you may like to check (e.g., Word Origins’ list with stories, and Etymonline with a very brief note for each entry).

Different regions may, for one reason or another, stay in one stage or another with some word. For instance, in the USA, ‘kindergarten’ is a common term, whereas elsewhere ‘pre-school’ is used. I won’t consider why this happens here, only what happens. What I have observed is that cultures differ in how fanatic they are about their vocabulary, if at all. For instance, the Académie française is in charge of imposing, in a top-down fashion, French words for what would otherwise be loan words (e.g., the recent mot-dièse for ‘hashtag’), the Flemish are generally more inventive than the Dutch (e.g., helikopter (Ned.) vs. wentelwiek (Be.) for ‘helicopter’), and speakers of Italian, Spanish, and German typically come up with their own words. However, for computing terms this is not always the case: compare besturingssysteem (Ned., new word) with sistema operativo (It., direct translation).

Types of changes

This is my attempt at structuring the ways in which adaptations and new words are created. I gleaned a bit from [1], which notably motivated me to add the distinction between ‘there is abcd in language X, now find one in Y’ and totally ab initio word creation, in the sense of ‘we created this new thingy as the first thingy in the world, now name it’. Within the COMMUTERM project, we mostly face the former, although some ideas on how speakers of other languages deal with the latter may help with the former if there is no feasible translation and you have to go back to the drawing board of word creation. I’ll go through them in the following order: more or less a translation, Y-ify a noun, Y-ify a verb, and word formation.

1-to-1 translation.

Direct translation of abcd in X to an existing word in Y, i.e., both languages create the new word, or reuse an existing word for another meaning, in exactly the same way. Examples:

  • (in computing) mouse (En.) – igundane (Zu.) – muis (Ned.) – topo (It.)
  • (in computing) memory (En.) – geheugen (Ned.) – memoria (It.)
  • email (En.) – correo electrónico (Sp.)
  • database (En.) – base de datos (Sp.)
  • ontology (En.) – ontologie (Ned., Ger.) – ontologia (It., Sp.), although, in this case, English has taken it from philosophy, which has taken it from Latin.

There are many more terms, also in computer science, that you (well, the English speakers) may think are English but that have their root in another language, from which English borrowed or fully adopted them. To back this up, in case you were thinking everything comes from English, check out the etymology of, e.g., data (from datum (Lat.)), algorithm (after the Persian mathematician Al-Khwarizmi), to compute (from Latin), and printer, a noun formed from print (from Old French preinte, which, in turn, comes from premere (Lat.)).

Almost a 1-to-1 translation.

It looks like a 1-to-1 translation of existing words, but there is a slight semantic difference: a nitpicking refinement seems to have occurred in the search for a translation, which may indicate a slight difference in underlying meaning, or perhaps the difference was unavoidable because no suitable equivalent was available in Y. Examples:

  • (in computing) operating system (En.) – besturingssysteem (Ned.), where besturings- is literally the ‘steering’ of the system, not the ‘operating’ (the German Betriebssystem, by contrast, stays close to ‘operating’).
  • (in computing) keyboard (En.) – toetsenbord (Ned.), i.e., literally, the keysboard, for there are multiple keys on a keyboard, not just one.
  • (in computing) save (En.) – opslaan/bewaren (Ned.) – speichern (Ger.), which mean ‘to store’ in Dutch and German, not ‘to save’.
  • (in computing) file (En.) – documento (It.), literally ‘document’.

Prompted by some offline comments I received, I’ll rephrase the last point differently (perhaps too bluntly): if you cannot find an exact 1-to-1 translation but only some sort of approximation, then do not worry about it and do not put down your own language, as there are very many such cases in other languages. If you do not believe that, I can lend you a few of my bi-directional dictionaries to check: they are all inconsistent.

Partial translations.

Partial translations, I suspect, are due to compound forms in which the component words were introduced at different times, or in which only one of the components has a readily available equivalent in Y. Examples:

  • Email address (En.) – indirizzo email (It.) – ikheli le-e-mail (Zu.)

Y-ify a noun from X.

This can be done in two ways: 1) typically, change the beginning or ending of a noun to conform to the word forms/gender/alphabet of Y, or 2) change the plural to adhere to the grammar for plurals of Y. One could perhaps count the article used with it as a third way. Examples:

  • Radio (En.) – iRadio (Zu.), i.e., Zulufy a foreign word by putting an i- in front of the noun.
  • Computer (En.) – computadora (Sp.)
  • Between English and the Romance languages, such as Italian and Spanish, there are quasi-rules as well: nouns ending in -ción (Sp.) and -zione (It.) often end in -tion in English (e.g., educa-), and -(a)dor/-(a)dora (Sp.) in -ter or -tor (e.g., investiga-).
  • Niche (En.) – nicchio (It., masculine) / nicchia (It., feminine). The nicchio ‘recess in the wall’ travelled to France, and back to Italy came the new concept of ‘niche of a species’, for which the original term was modified into nicchia (It.) to denote the conceptual distinction, i.e., a gender change. English took niche (Fr.) for both.
  • Preparations, arrangements (En.) are amalungiselelo (Zu.), but software settings, which are similar in idea to arrangements but not the same, are isilungiselelo (Zu.), i.e., the noun class has changed (from ama- to isi-).

On the other hand, I noticed that violating certain rules results in grumbling. The isiZulu interface of Google has idrayivu for ‘drive’, and although the i- follows the same rule as in the first item above, the few people I asked were not happy with it, because the word contains an r, and isiZulu does not have the r in its alphabet.
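The first noun rule and the complaint about idrayivu can be mimicked with two tiny functions. This is a hypothetical sketch, not a model of isiZulu morphology: it assumes that prefixing i- is the whole adaptation step and, following the observation above, it checks only for the letter r; noun-class choice, vowel endings, and everything else speakers weigh up are ignored. The function names are made up for illustration.

```python
# Hypothetical sketch: prefix i- to a borrowed noun and flag letters that,
# per the observation in the text, speakers may object to.
UNWELCOME_LETTERS = {"r"}  # assumption: only r is checked


def zulufy_noun(noun: str) -> str:
    """Prefix i- to a (possibly already adapted) foreign noun."""
    return "i" + noun


def unwelcome_letters(word: str) -> list:
    """Return the unwelcome letters found in word, in order of first occurrence."""
    found = []
    for ch in word.lower():
        if ch in UNWELCOME_LETTERS and ch not in found:
            found.append(ch)
    return found


for noun in ["Radio", "drayivu"]:
    word = zulufy_noun(noun)
    print(word, unwelcome_letters(word))
```

Both words come out flagged, since both contain an r; a check like this can only point at a potential problem, and whether speakers actually grumble is up to them.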

Y-ify a verb from X.

This takes more grammar to explain than the nouns did, because quite a few languages have a more structured verb grammar than English. Let me first give an example of the plain grammar rule, in the present tense, for ‘to speak’ in Spanish and isiZulu in the following table (omitting the formal you).

 

Person         | Spanish  | ending (root habl-) | isiZulu    | prefix (root -khuluma)
Infinitive     | hablar   | -ar                 | ukukhuluma | uku-
I              | hablo    | -o                  | ngikhuluma | ngi-
You (singular) | hablas   | -as                 | ukhuluma   | u-
He/she/it      | habla    | -a                  | ukhuluma   | u-
We             | hablamos | -amos               | sikhuluma  | si-
You (plural)   | habláis  | -áis                | nikhuluma  | ni-
They           | hablan   | -an                 | bakhuluma  | ba-

For instance, take the English verb ‘to program’ (some application), which is programar in Spanish: ‘we program’ in Spanish is programamos, which combines the root, obtained by removing the -ar from the infinitive, with the ending that indicates ‘we’, i.e., -amos. The progressive form is composed of the auxiliary verb estar (its root est- + -amos for ‘we’) together with the root + -ando for the gerund, so ‘we are programming’ in Spanish is estamos programando. Hypothetically, if ukuprogram were the verb for ‘to program’ in isiZulu, then ‘we program’ would be siprogram (it is not, though; see below).
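The present-tense pattern described above is mechanical enough to sketch in a few lines of Python. This is a minimal illustration under loud assumptions: it only handles regular Spanish -ar verbs (with the standard forms habláis and hablan for the plural persons) and isiZulu infinitives whose root starts with a consonant, so irregular verbs and the vowel changes that occur with vowel-initial roots (as in ukwakha, ‘to build’) are out of scope. The table and function names are made up for this sketch.

```python
# Hypothetical tables for the present tense (formal "you" omitted, as in the text).
SPANISH_AR_ENDINGS = {
    "I": "o", "you (sg)": "as", "he/she/it": "a",
    "we": "amos", "you (pl)": "áis", "they": "an",
}
ZULU_SUBJECT_PREFIXES = {
    "I": "ngi", "you (sg)": "u", "he/she/it": "u",
    "we": "si", "you (pl)": "ni", "they": "ba",
}


def conjugate_spanish_ar(infinitive: str, person: str) -> str:
    """Regular -ar verb: strip -ar to get the root, then append the person ending."""
    if not infinitive.endswith("ar"):
        raise ValueError("this sketch only handles regular -ar verbs")
    return infinitive[:-2] + SPANISH_AR_ENDINGS[person]


def conjugate_zulu(infinitive: str, person: str) -> str:
    """Strip the infinitive prefix uku- to get the root, then prepend the subject prefix."""
    if not infinitive.startswith("uku"):
        raise ValueError("this sketch only handles infinitives starting with uku-")
    return ZULU_SUBJECT_PREFIXES[person] + infinitive[len("uku"):]


print(conjugate_spanish_ar("programar", "we"))  # programamos
print(conjugate_zulu("ukukhuluma", "we"))       # sikhuluma
print(conjugate_zulu("ukuprogram", "we"))       # siprogram (hypothetical verb)
```

The last call reproduces the hypothetical siprogram from the text; the rule happily produces it, which is exactly why the rule alone does not tell you whether speakers would accept the word.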

Other examples of y-ifications/x-ifications (i.e., be this from X to Y or from Y to X) are copiare (It.), copiar (Sp.), kopiëren (Ned.), to copy (En.), and studiare (It.), estudiar (Sp.), studeren (Ned.), to study (En.), where the Italian -are and the Dutch -en play the same role as the Spanish -ar and the isiZulu uku- above.

New terms for essentially different conceptualizations.

These are not direct translations or near-translations, but conceptually quite different terms (even though, loosely, they are treated as translations of each other). I include them as a separate option because here the aim is not a translation at all: the term is intentionally different.

  • IT: Information Technology (En.) – EDV: Elektronische Datenverarbeitung (Ger.), which, literally translated, is ‘electronic data processing’.
  • Computer Science (En.) – informatica (Ned., It.) – informática (Sp.) – Informatik (Ger.): literally, the science of computers (which it is not) versus the science of information (much closer to it).

New words, using a language’s features.

Germanic languages have the fun of putting words together to create a new word with a new meaning. Arabic and the Nguni languages are much more semantics-oriented, in that the underlying idea of a stem can be reused for conceptually related entities. Examples (I looked up most of them in the dictionary):

  • -fund- (Zu.): something to do with studying/learning. ukufunda: to learn, read. umfundisi (high tones): teacher; umfundisi (low tones): preacher. imfundiso: teaching/doctrine. ulwazi lemfundo: education (note: the dictionary says imfundo: knowledge, but the English->isiZulu section gives ulwazi, which I have heard before, ukwazi, and imfundiso for ‘knowledge’; this is an example of just one of the myriad inconsistencies in bi-directional dictionaries).
  • -sebenza- (Zu.): something to do with working. ukusebenza: to work. umsebenzi: the work/job. abasebenzi: workers. alisebenzi: broken (not working). insebenzo: wages (the fruit of one’s labour). uhlelokusebenza: software.

For English, a list of principles for word creation exists already, which I summarise here (with international examples added) to give you an idea, as they transfer over to several other languages as well.

Real compounding: joining words to make a new one, as in toothbrush and tablecloth. This is a very common feature of Germanic languages, and one of the more entertaining examples is Eisenbahnknotenpunkthinundherschieber (Ger.), which used to be an actual job title[4]. Uhlelokusebenza (Zu., ‘software’) looks a lot like real compounding as well, based on -hlelo + -sebenza: the grammar/arrangement is working, or some similar translation of the word components, which, to be honest, is a fabulous term compared to ‘software’ (En.). An approximation of compounding is putting a hyphen between the words, as in ‘smoke-free’ (which is just one word in Dutch: rookvrij). UPDATE (29-8-2013): I just discovered there is an entire article on compounding in isiZulu [2].
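The decomposition of the job title in footnote [4] can be mimicked mechanically with a small dictionary-based splitter. A minimal sketch, assuming a hypothetical mini-lexicon built only from the glosses in that footnote: it tries the longest known prefix first and backtracks when the rest cannot be split. It ignores German linking elements, such as the -s- in Betriebssystem, so it illustrates the idea of compounding and is not a real word splitter.

```python
from typing import Dict, List, Optional

# Hypothetical mini-lexicon, built from the glosses in footnote [4].
LEXICON = {
    "eisen": "iron", "bahn": "road", "eisenbahn": "railway",
    "knoten": "knot", "punkt": "point", "knotenpunkt": "junction",
    "hin": "to", "und": "and", "her": "fro", "schieber": "pusher",
}


def split_compound(word: str, lexicon: Dict[str, str]) -> Optional[List[str]]:
    """Split word into lexicon entries, trying the longest prefix first and
    backtracking if the remainder cannot be split; None if no split exists."""
    word = word.lower()
    if not word:
        return []
    for end in range(len(word), 0, -1):
        head = word[:end]
        if head in lexicon:
            rest = split_compound(word[end:], lexicon)
            if rest is not None:
                return [head] + rest
    return None


parts = split_compound("Eisenbahnknotenpunkthinundherschieber", LEXICON)
print(" + ".join(f"{p} ({LEXICON[p]})" for p in parts))
```

Because longer entries are tried first, eisenbahn and knotenpunkt come out as units rather than as eisen + bahn and knoten + punkt.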

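Conversion: changing ‘class’ in the sense of making a noun out of a verb or vice versa without adding anything, which is very common in English; e.g., to print -> a print, to push -> a push (printer and pusher, in contrast, add the suffix -er, i.e., they are formed by affixation).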

Affixation: adding a prefix or a suffix; e.g., making a noun out of an adjective by adding -ness (happiness), an adjective out of a noun by adding -al (regional), and an adjective out of a verb by adding -able (drinkable). The same holds for other languages, with their own affixes; e.g., -bar (Ger.), -baar (Ned.), and -bile (It.) do the same in those languages as -able does in English, and likewise -heid (Ned.) and -heit (Ger.) work like the English -ness.

Other: clipping a longer word into a shorter version (flu for influenza), blending words together (smog, from smoke+fog).

Closing remarks

Ukwakhuhlelo means programming (noun), where -hlelo is the root for ‘grammar’/‘arrangement’ (noun class prefixes u- and izin-) and -ukwakh- relates to ‘to build’, i.e., it is based on compounding to form a new word. What can be modified to create a term for the verb ‘to program’? Following the basics of verb-ifying a noun by putting uku- in front of it, I would make a verb from the noun as ukukwakhuhlelo, but maybe you are more creative, like Thokozani Nene, the inventor of isikhahlamezi (‘fax’), which sounds a lot nicer to the ear than one of the other translations the dictionary provides: ifeksi. Isikhahlamezi is an example of the kind of word creation where, as [1] notes, the purpose was not to create transparent output (recoverable from its origins, for there are none in this case), but to create a term with certain desired features that match word characteristics of the language, such as the number of vowels and syllables.

As a last note on terms, given the readership of this blog, and having mentioned knowledge (ulwazi) before, which is easy to memorise, here are the terms for ‘logic’; the first is easy to remember, but the other two require some practice to pronounce and remember: ilojiki; ukwazi ukuqonda nokuhlazulula ngohlelo izindaba; ukuhlela ngokulandelanisa.

Either way, I hope the range of options has given you some ideas for borrowing, adapting, and creating new words, which can give you a head start in the crowdsourcing game that we aim to launch late September/early October.

References

[1] Ronneberger-Siebold, E. On useful darkness: loss and destruction of transparency by linguistic change, borrowing, and word creation. Yearbook of Morphology 1999. Booij, G.E., Marle, J. (Eds.). Springer. 2001. V, pp97-120.

[2] Buthelezi, T.M. Exploring the Role of Conceptual Blending in Developing the Extension of Terminology in isiZulu Language. Alternation, 2008, 15(2):181-200.

Thanks to Charmaine and Nokubonga for the lively conversation about and suggestions for some of the isiZulu terms.


[1] Double-checking the spelling of itafula in the dictionary now, I noticed there is an entry “amathebula (arith. tables)” in the Scholar’s Zulu dictionary; what about that for the spreadsheet tables?

[2] An example of the latter may be the expression die Treppe herunter schendieren (going down the stairs), where schendieren is a germanification of the Italian scendere (thanks to my former colleague Andrea at FUB who mentioned this example).

[3] E.g., the Afrikaans word braai is used by English home-language speakers in South Africa, even though elsewhere it is called a barbecue.

[4] Eisen: iron; Bahn: road; Eisenbahn: railway; Knoten: knots; Punkt: point; Knotenpunkt: crossroad or spaghetti junction; Eisenbahnknotenpunkt: railway point where the train can change tracks; hin: to; und: and; her: fro; Schieber: pusher. An Eisenbahnknotenpunkthinundherschieber is the guy who manually pushes the lever backward and forward so that the train moves onto the right railway track. In the late 1920s/early 1930s, it was the longest German word in use (which my grandfather happened to learn in the few years he went to school in Germany, before the family moved back to the Netherlands prior to WW II).

Ontologies and Knowledge bases lecture notes for 2013

The lecture notes for the ontologies and knowledge bases module (COMP720) for semester 2 in 2013 are now available online. I’ve updated them compared to last year’s installment (mentioned here): in addition to the regular changes, like updates to reflect the advances made in ontology engineering in the past year, better explanations in several sections, and more examples, they include the DL primer by Markus Kroetzsch, Ian Horrocks and Frantisek Simancik (saving me the time of writing about that; thanks!), more exercises, and answers to selected exercises.

As last year, the target audience is computer science students in their 4th year (honours), so the notes are of an introductory nature. They have three blocks: logic foundations, ontology engineering, and advanced topics. The logic foundations block contains a recap of FOL, the DL primer, the basics of automated reasoning with the Description Logic ALC, the DL-based OWL species, and some practical automated reasoning. The ontology engineering block starts with top-down ontology development using foundational ontologies, then bottom-up ontology development to extract knowledge from ‘legacy’ representations, and finally (perhaps too briefly) methods and methodologies. The advanced topics go in two directions, the first of which will certainly be covered and the second only if time permits: ontology-based data access applications (i.e., an ontology-driven information system) and temporal ontologies.

It is essentially still an evolving document, and relative completeness of sections varies slightly. Suggestions and corrections are welcome! If you want to use a part of it in your own lectures and/or use the accompanying slides with it, please contact me.

KCAP13 poster on aligning and mapping foundational ontologies

I announced in an earlier post the realisation of the Repository of Ontologies for MULtiple USes (ROMULUS) foundational ontology library as part of Zubeida’s MSc thesis, as well as that a very brief overview describing it was accepted as a poster/demo paper [1] at the 7th International Conference on Knowledge Capture (KCAP’13), which will take place next week in Banff, Canada. The ‘sneak preview’ of the poster in jpeg format is included below. To stay in style, it has roughly the same colour scheme as the ontology library.

[Image: KCAP’13 ROMULUS poster]

The poster’s content is slightly updated compared to the contents of the 2-page poster/demo paper: it has more detail on the results obtained with the automated alignments. One reason for that is the limited space of the KCAP paper; another is that a more comprehensive evaluation has been carried out in the meantime. We report on those results in a paper [2] recently accepted at the 5th International Conference on Knowledge Engineering and Ontology Development (KEOD’13). The results of the tools aren’t great when compared to the ‘gold standard’ of manual alignments and mappings, but there are some interesting differences due to (and thanks to) the differences in the algorithms that the tools use. Mere string matching generates false positives (e.g., site vs. situoid) and misses ‘semantic [near-]synonyms’ (e.g., perdurant vs. occurrent), and a high reliance on structural similarity causes a tool to miss alignments (compare, e.g., the first subclasses in GFO with those in DOLCE). One feature that surely helps to weed out false positives is a cross-check of whether an alignment would be logically consistent or not, as LogMap does. That is also what Zubeida did with the complete set of alignments between DOLCE, BFO, and GFO, aided by HermiT and Protégé’s explanation feature.
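The weakness of mere string matching can be illustrated with a toy matcher. A minimal sketch, assuming a plain character-based similarity (the ratio from Python’s difflib.SequenceMatcher) and an arbitrary, hypothetical threshold of 0.5; the tools evaluated in the paper use their own, more sophisticated algorithms, so this is not how any of them works.

```python
from difflib import SequenceMatcher

THRESHOLD = 0.5  # hypothetical cut-off chosen for illustration only


def string_similarity(a: str, b: str) -> float:
    """Case-insensitive similarity in [0, 1], based on difflib's matching blocks."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def propose_alignments(pairs, threshold=THRESHOLD):
    """Keep only the class-name pairs whose string similarity reaches the threshold."""
    return [(a, b) for a, b in pairs if string_similarity(a, b) >= threshold]


candidates = [
    ("site", "situoid"),         # lexically similar, conceptually different
    ("perdurant", "occurrent"),  # near-synonyms, lexically dissimilar
]
for a, b in candidates:
    print(f"{a} ~ {b}: {string_similarity(a, b):.2f}")
print(propose_alignments(candidates))
```

The matcher keeps the lexically similar pair and drops the near-synonyms, i.e., exactly the wrong way round, which is why semantic checks on top of string matching matter.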

The KEOD paper describes those ‘trials and tribulations’: there are many equivalence alignments that do not map due to a logical inconsistency. They have been analysed for their root cause (mainly: disjointness axioms between higher-level classes) and, where possible, solutions are proposed, such as subsumption instead of equivalence or making them sibling classes. Two such examples of alignments that do not map are shown graphically in the poster: a faltering temporal region that apparently means something different in each of the ontologies, and necessary-for, which does not map to generic-dependent due to conflicting domain/range axioms. The full list of alignments, mappings, and logical inconsistencies is now not only browsable on ROMULUS, as announced in the KCAP demo paper, but also searchable.

Having said that, it is probably worth repeating the caution made in the paper and the previous blog post: what should be done with the inconsistencies is a separate issue, but at least now it is known in detail where the matching problems really are, so that we can go to the next level. And some mappings are possible, so some foundational ontology interchangeability is possible (at least from a practical engineering viewpoint).

References

[1] Khan, Z.C., Keet, C.M. Toward semantic interoperability with aligned foundational ontologies in ROMULUS. Seventh International Conference on Knowledge Capture (K-CAP’13), ACM proceedings. 23-26 June 2013, Banff, Canada. (poster & demo)

[2] Khan, Z.C., Keet, C.M. Addressing issues in foundational ontology mediation. Fifth International Conference on Knowledge Engineering and Ontology Development (KEOD’13). 19-22 September 2013, Vilamoura, Portugal.
