Identification and keys in UML Class Diagrams

ER, its extended version EER, ORM, and its extended version ORM2 have several options for identification with keys/reference schemes. UML Class Diagrams, on the other hand, have internal, system-generated identifiers, with a little-known and underspecified option for user-defined identifiers inspired by ER’s keys, whose description is buried in the standard on pp. 290-293 [1]. Although UML’s reliance on system-generated identifiers relieves the modeller of the burden of detailed conceptual analysis, it is exactly making implicit subject domain semantics explicit that is crucial in the analysis stage; or: less analysis during the modelling stage stores up more problems down the road in terms of software bugs and interoperability. A uniform, or at least structured and unambiguous, approach can reduce or even avoid inconsistencies in a conceptual data model, especially concerning taxonomies, and can achieve interoperability through less resource-consuming information integration; moreover, if the identification mechanisms are the same across conceptual data modelling languages, then unification among them comes a step closer. The present lack of harmonisation in handling identification hampers all this.

But which identification mechanism(s) could, or should, be specified more precisely for UML, and how do they relate to advances on identity in Ontology? How can they be incorporated in UML so as to foster consistent usage? For instance, should the procedure to find and represent identity, or at least good keys, be a step in the modelling methodology, be enforced in the CASE tool, or be part of the metamodel and/or the conceptual modelling language itself?

The distinct implicit assumptions and explicit formalisations of extant identification schemes are elucidated in [2] for ER, EER, ORM, and ORM2. As it appears, not even ER/EER and ORM/ORM2 agree fully on keys and reference schemes, neither from a formal perspective (though the difference is minimal), nor from a methodological or CASE tool perspective. Both, however, do aim at strong or weak identification of entities with so-called natural keys (/semantic identifiers) in particular.

At the other end of the spectrum, Ontology does have something on offer, but the philosophers are interested only in the identity of entities. Moreover, it is not just that they do not agree on the details; there is a tendency to admit it is nigh on inapplicable (see [3] for a good introduction). Watered-down versions of the philosophical notion of identity that have been proposed in AI use either only the necessary or only the sufficient conditions, which are at least somewhat applicable, be it as a basis for OntoClean [4] or in Description Logic languages (including OWL 2) with their primitive and defined classes. The details of the definitions, explanations, and pros and cons are presented in a ‘digested’ format in the first part of [2].
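To make the primitive vs. defined class distinction concrete, here is a standard textbook-style DL example (my own illustration, not taken from [2]):

```latex
% Primitive class: only a necessary condition is asserted
Mother \sqsubseteq Woman
% Defined class: necessary AND sufficient conditions
% (in OWL 2: an EquivalentClasses axiom)
Mother \equiv Woman \sqcap \exists hasChild.Person
```

With only the first axiom, a reasoner can never classify an individual as a Mother; with the equivalence it can, which is exactly the extra inferential power that defined classes bring.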

This analysis was subsequently used to strengthen the ontological foundations of UML in the second part of [2], which introduces two language enhancements for UML, namely formally defined simple and compound identifiers and the notion of a defined class, together with a corresponding extension of UML’s metamodel. The proposed extensions focus on practical usability in conceptual data modelling, informed by ontology, and are approximations of qualitative, relative, and synchronic identity, and of the notion of an equivalent class (defined concept) in OWL (DL).
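As a rough indication of what such identifiers amount to (my own rendering of the standard key semantics; see [2] for the actual definitions), a simple identifier is a mandatory, unique attribute, and a compound identifier generalises this to a jointly unique tuple of attributes:

```latex
% a is a simple identifier for class C iff it is total and injective:
\forall x \in C \;\exists! v \; (a(x) = v)              % mandatory, functional
\forall x, y \in C \; (a(x) = a(y) \rightarrow x = y)   % unique (injective)
% A compound identifier \langle a_1,\ldots,a_n \rangle requires joint uniqueness:
\forall x, y \in C \; \Big( \bigwedge_{i=1}^{n} a_i(x) = a_i(y) \;\rightarrow\; x = y \Big)
```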

For those of you who do not care about the ‘unnecessary philosophizing’ (as one reviewer put it) and the justifications, there is a short (4-page) version [5] with the formal definitions and the UML extensions, which has been accepted at the SAICSIT’11 conference in Cape Town, South Africa. The longer version, which explains why the proposed extensions are the way they are, is available as a SoCS technical report [2].

References

[1] Object Management Group. Ontology definition metamodel v1.0. Technical Report formal/2009-05-01, Object Management Group, 2009.

[2] C.M. Keet. Enhancing identification mechanisms in UML class diagrams with meaningful keys (extended version). Technical Report SoCS11-1, School of Computer Science, University of KwaZulu-Natal, South Africa, August 8, 2011.

[3] H. Noonan. Identity. In E. N. Zalta, editor, The Stanford Encyclopedia of Philosophy. Fall 2008 edition, 2008.

[4] N. Guarino and C. Welty. Identity, unity, and individuality: towards a formal toolkit for ontological analysis. In W. Horn, editor, Proc. of ECAI’00. IOS Press, Amsterdam, 2000.

[5] C.M. Keet. Enhancing Identification Mechanisms in UML Class Diagrams with Meaningful Keys. SAICSIT Annual Research Conference 2011 (SAICSIT’11), Cape Town, South Africa, October 3-5, 2011. ACM Conference Proceedings (in print).

TAR article on Google in Africa

The cover story of The Africa Report magazine was “Is Google good for Africa?” [1] (the online page provides only an introduction to the longer article in the print/paid edition). Google is investing in Africa, both regarding connectivity and content: if there’s no content then there’s no need to go online, and if there’s no or a very slow connection, then there won’t be enough people online to make online presence profitable. In the words of Nelson Mattos, Google’s VP for EMEA: “Our business model works only when you have enough advertisements and lots of users online, and that’s the environment we are trying to create in Africa” (p24). Gemma Ware notes that “by investing now into Africa’s internet ecosystem, Google hopes to hardwire it with tools that will make people click through its websites”, and, as she aptly puts it: they have raised the flag first.

(Picture from WhiteAfrican’s blog post on “What should Google do in Africa?” [2])

On average, there is one web domain for every 94 people in the world, but for Africa this is 1 in 10,000. Somewhere buried on pp. 24 and 26 of the TAR article, two reasons are given: many people have no credit card to buy space online, and a ‘.[country]’ domain costs more than a ‘.com’ domain. There’s no lack of creativity, though (e.g., the Ushahidi platform co-founded by the new head of Google’s Africa policy, Ory Okolloh, and much more).

Measured as percentages of Google hits around the world, the USA tops the chart with 31%, followed by India with 8%, China with 4.2%, the UK with 3%, Italy with 2.3%, Germany and Brazil with 2.9% each, Russia with 2.8%, France and Spain with 2% each, and, at the lower end of the chart, South Africa with 0.7%, Algeria and Nigeria with 0.6%, and Sweden with 0.5%. The other African countries are not mentioned and have a lighter colour in the diagram than the lowest given value of 0.5%. These data should have been normalised by population size, but they give a rough idea nevertheless.

40% of the Google searches in Africa are through mobile internet—including mine outside the office (unlike in Italy [well, Bolzano], here in South Africa they actually do sell functioning USB/Internet keys and SIM cards to foreigners). They estimated that there were about 14 million users in Africa in 2010 (the Facebook numbers on p26 total to about 28 million), which they expect to grow to 800 million by 2015. Now that’s what you can call a growth market.

There’s no Google data centre in Africa yet, but there are caches at several ISPs, which brings to mind the filter bubble. One can ponder whether a cache and a bubble are better than practicing one’s patience. What you might not have considered, however, is that there are apparently (i.e., so I was told, but I did not check it) Internet access packages that charge lower rates for browsing national Web content and higher rates for international content, where the data has to travel through the new fibre optic cable. So the caching isn’t necessarily a bad idea.

On content generation, Google has been holding “mapping parties” to add content to Google MapMaker, which also pleased its participants because, as quoted in the article, they didn’t like seeing a blank spot as if there were nothing there, even though in reality there clearly are roads, villages, communities, and businesses. There are funded projects to digitize Nelson Mandela’s documentary archives, crowd sourcing to generate content, Google Technology User Groups, helping businesses to create websites, and many other activities. In short, according to Google’s Senegal representative Tidjane Deme: “What Google is doing in Africa is very sexy”.

One of the ‘snapshots’ in the article mentions that Google now supports 31 African languages. I had a look at http://www.google.co.za, which has localized interfaces for 5 of the 9 official African languages in South Africa (isiZulu, Sesotho, isiXhosa, Setswana, Northern Sotho). As I have only rudimentary knowledge of isiZulu, I had a look at that one to see how the localization has been done. Aside from the direct translations, such as izithombe for images and usesho for search, there are new concoctions. Apparently there is little IT and computing vocabulary in isiZulu, so new words have to be made up, or the meanings of existing ones stretched liberally. For instance, logout has become phuma ngemvume (out/exit from authorization/permission) and clicking on izigcawu (literally: open air meeting places) navigates to the Google groups page, which are sort of understandable. This is different for izilungiselelo (noun class 8 or 10?), which brings you to Settings in the interface. There is no such word in the dictionary, although the stem –lungiselelo (noun class 6) translates as preparations/arrangements; my dictionary translates ‘setting’ (noun) into ukubeka (verb, which back-translates as put/place, install; bilingual dictionaries are inconsistent, I know). It’s not just that Google is “hardwir[ing] [Africa] with tools”; they are also ‘soft-wiring’ it by unilaterally inventing vocabulary, it seems, which reeks of cultural imperialism.

Admittedly, I have not (yet) seen much IT for African languages, other than spell checkers for all 11 official languages in South Africa that work with OpenOffice and Mozilla, a nice online isiZulu-English dictionary with conjugation, and Laurette Pretorius’ research in computational linguistics; the first was heavily funded from outside and the second is a hobby project by the German isiZulu enthusiast Carsten Gaebler. Nevertheless, it would have been nice if there were some coordinated, participatory effort.

Writes the article’s author, Gemma Ware: “as Google’s influence grows, Africa’s techies are aware of the urgency to stake their own territorial claim”. This awareness has yet to be transformed into more action by more people. Overall, my impression is that ICT (and the shortage of ICT professionals) has already generated a buzz of excitement, where people see plenty of possibilities, which makes it a stimulating environment down here.

References

[1] Gemma Ware. Is Google good for Africa?. The Africa Report, No 32, July 2011, pp20-26.

[2] Erik Hersman (WhiteAfrican). What Should Google do in Africa? June 28, 2011.

p.s.: The article does not really answer the question whether Google is good for Africa, and I didn’t either in the blog post; that’s a topic for a later date when I know more about what’s going on here.

The rough ontology language rOWL and basic rough subsumption reasoning

Following the feasibility assessments on marrying Rough Sets with Description Logic languages last year [1,2], which I blogged about before, I looked into ‘squeezing’ the very basic aspects of rough sets into OWL 2 DL. The resulting language, called rOWL, is described in a paper [3] accepted at SAICSIT’11—the South African CS and IT conference (which thus also gives me the opportunity to meet the SA research community in CS and IT).

DLs are not just about investigating decidable languages but, perhaps more importantly, also about reasoning over the logical theories. The obvious addition to the basic crisp automated reasoning services is to add the roughness component, somehow. There are various ways to do that. Crisp subsumption (and definite and possible satisfiability) of rough concepts has been defined by Jiang and co-authors [4], and there was a presentation at DL 2011 about paraconsistent rough DL [5]. I have added the notion of rough subsumption.

There are two principal cases to consider (the “\wr” before an OWL class name denotes that it is a rough class):

  • If \wr C \sqsubseteq \wr D is asserted in the ontology, what can be said about the subsumption relations among their respective approximations?
  • Given a subsumption between any of the lower and upper approximations of C and D, then can one deduce \wr C \sqsubseteq \wr D ?
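For orientation, recall the standard rough-set approximations in their usual DL rendering with an indiscernibility (equivalence) relation R; this follows the rough DL literature (e.g., [4]) and is not necessarily rOWL’s exact syntax:

```latex
% Lower approximation: objects whose R-equivalence class lies wholly within C
\underline{C} \equiv \forall R.C
% Upper approximation: objects whose R-equivalence class overlaps with C
\overline{C} \equiv \exists R.C
% Since R is reflexive, the crisp containments hold:
\underline{C} \sqsubseteq C \sqsubseteq \overline{C}
```

The two cases above then amount to relating \wr C \sqsubseteq \wr D to subsumptions among \underline{C}, \overline{C}, \underline{D}, and \overline{D}.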

Addressing this raises questions: because being rough or not depends entirely on the chosen properties for C together with the available data, should these two cases be solved only at the TBox level or necessarily include the ABox for it to make sense? And should that be under the assumption of standard instantiation and instance checking, or in the presence of a novel DL notion of rough instantiation and rough instance checking?

These questions are answered in the second part of the paper Rough Subsumption Reasoning with rOWL [3]. In an attempt to make the proofs more readable and because the presence of instances is intuitively tied to the matter, the proofs are done by counterexample, which is relatively ‘easy’ to grasp. But maybe I should have obfuscated it with another proof technique to make the results look more profound.

Last, but not least: just in case you thought there is little motivation to bother with rough ontologies, the hypothesis testing and experimentation described in [2] still hold, and a small example is added to [3].

The succinct paper abstract is as follows:

There are various recent efforts to broaden applications of ontologies with vague knowledge, motivated in particular by applications of bio(medical)-ontologies, as well as to enhance rough set information systems with a knowledge representation layer by giving more attention to the intension of a rough set. This requires not only representation of vague knowledge but, moreover, reasoning over it to make it interesting for both ontology engineering and rough set information systems. We propose a minor extension to OWL 2 DL, called rOWL, and define the novel notions of rough subsumption reasoning and classification for rough concepts and their approximations.

I’ll continue looking into the topic, and more is in the pipeline w.r.t. the logic aspects of rough ontologies (in collaboration with Arina Britz).

References

[1] C. M. Keet. On the feasibility of description logic knowledge bases with rough concepts and vague instances. Proceedings of the 23rd International Workshop on Description Logics (DL’10), CEUR-WS, pages 314-324, 2010. 4-7 May 2010, Waterloo, Canada.

[2] C. M. Keet. Ontology engineering with rough concepts and instances. P. Cimiano and H. Pinto, editors, 17th International Conference on Knowledge Engineering and Knowledge Management (EKAW’10), volume 6317 of LNCS, pages 507-517. Springer, 2010. 11-15 October 2010, Lisbon, Portugal.

[3] C.M. Keet. Rough Subsumption Reasoning with rOWL. SAICSIT Annual Research Conference 2011 (SAICSIT’11), Cape Town, South Africa, October 3-5, 2011. ACM Conference Proceedings. (accepted).

[4] Y. Jiang, J. Wang, S. Tang, and B. Xiao. Reasoning with rough description logics: An approximate concepts approach. Information Sciences, 179:600-612, 2009.

[5] H. Viana, J. Alcantara, and A.T. Martins. Paraconsistent rough description logic. Proceedings of the 24th International Workshop on Description Logics (DL’11), 2011. Barcelona, Spain, July 13-16, 2011.