## 8 years of keetblog

The 8-year anniversary swooshed by a few days ago, but it is really only complete today, as the first blog post with real content was published on April 18, 2006, about solving sudokus with constraint programming.

The top-post among the 186 posts (>9000 visits to that page alone) is still the introduction for two lectures on top-down and bottom-up ontology development that I wrote in November 2009 as part of the Semantic Web technologies MSc course at the Free University of Bolzano; anyone wishing to read an updated version: have a look at the 2014 lecture notes (its ‘Block II’). The post most commented on is about academia.edu, and then on my wish for a semantic search of insects.

The more ‘trivia’/fun ones—still having to do with science—are, I think, about the complexity of coffee and culinary evolution, but I may be biased (my first degree up to MSc was in food science). For some reason, there were more visitors reading about failing to recognize your own incompetence and some sneakiness of academia.edu than about food (and many other topics). Ah, well. A full list sorted by year is available on the list of blog posts page.

The frequency of posting is somewhat less than a few years ago and, consequently, the visits went down from about 1500/month during its heydays [well, years] to about 1000/month now, but that’s still not bad at all for a ‘dull’ blog, and I would like to thank you again, and even more so the fans (subscribers) and those of you who have taken the effort to like a post or to leave comments both online and offline! I hope it’s been an interesting read, or else enjoyable procrastination.

## Ontology Engineering lecture notes for 2014 online

The lecture notes for the Ontology Engineering BSc honours in CS course are available online now. The file is updated compared to the COMP720 module (and those notes have been removed). The main changes consist of reordering the chapters in Block II and Block III, adding better or more explanations and examples in several sections, fixing typos, and updates to reflect advances made in the field. It again includes the DL primer written by Markus Kroetzsch, Ian Horrocks and Frantisek Simancik (saving me the time writing about that; thanks!).

As with the last three installments, the target audience is computer science students in their 4th year (honours), so the notes are of an introductory nature. They have three blocks after the introduction: logic foundations, ontology engineering, and advanced topics (the latter we will skip, as this is a shorter course). The logic foundations contain a recap of FOL and the notion of reasoning, the DL primer and the basics of automated reasoning with the Description Logics with ALC, the DL-based OWL species, and some practical automated reasoning. The ontology engineering block starts with methods and methodologies that give guidance on how to commence actually developing an ontology, and how to avoid and fix issues. Subsequently, there are two chapters going into some detail of two ‘paths’ in the methodology, being top-down ontology development using foundational ontologies, and bottom-up ontology development to extract knowledge from other material, such as relational databases, thesauri, and natural language documents.

The advanced topics are optional this year, but I left them in the lecture notes, as they may pique your interest. Chapter 8 on Ontology-Based Data Access is a particular application scenario of ontologies that ‘spice up’ database applications. Chapter 9 touches upon a few sub-areas within ontologies: representing and reasoning with vagueness and uncertainty, extending the language to include also temporal knowledge, the use of ontologies to enhance conceptual data models, and a note on social aspects.

It is still an evolving document, and relative completeness of sections varies slightly, so it has to be seen in conjunction with the slides, lectures, and some additional documentation that will be made available on the course’s Vula site.

Suggestions and corrections are welcome! If you want to use a part of it in your own lectures and/or use the accompanying slides with it, please contact me.

## More book suggestions (2013)

Given that I’ve written posts the past two years about books I’ve read during the previous year and that I think are worthwhile to read (here and here), I’m adding a new list for 2013, divided into fiction and non-fiction, and again a selection only. They are not always the newest releases but worth the read anyway.

Fiction

The book of the dead by Kgebetli Moele (2009), which has won the South African Literary Award. The cover does not say anything about the story, and maybe I should not either. Moele’s book is a gripping read, and with a twist in the second part of the book (so: spoiler alert!). The first part is about Khutso, a boy growing up in a town in South Africa; it is “the book of the living”. Then he gets infected with HIV, and “the book of the dead” starts. Writing shifts from third-person to first person, and from the vantage point of the virus that wants to replicate and spread to sustain its existence, as if it has a mind of its own (read an excerpt from the second part). All does not end well.

Zen and the art of motorcycle maintenance by Robert Pirsig is a ‘modern classic’ that this year celebrates its 40th anniversary. It is semi-autobiographical and the story exposes some philosophical ideas and the tensions between the sciences and the arts, partially explained through drawing parallels with motorcycles and motorcycle maintenance. A minor storyline is about a road trip of father and son, and there is an unspoken undercurrent about inhumane psychiatric treatments (electroshocks in particular) of people deemed mentally ill. It is an interesting read for the complexity of the narrative and the multiple layers of the overall story, i.e., in literary terms it is impressive, but I guess it is called ‘a classic’ more for the right timing of the release of the book and the zeitgeist of that era, and it therefore may resonate less with younger people these days. There are many websites discussing the contents, and it has its own wikipedia entry.

The girl with the dragon tattoo by Stieg Larsson (2008). I know, the movie is there for those who do not want to read the tome. I have not seen it, but the book is great; I recently got the second installment and can’t wait to start reading it. It is beautiful in the way it portrays Swedish society and the interactions between people. The tired male journalist, the troubled female hacker, and a whole cast of characters for the ‘whodunnit’.

Other books I read and would recommend: The songs of distant earth by Arthur C Clarke and De dolende prins [the lost prince] by Bridget Wood.

Non-fiction

Outliers by Malcolm Gladwell (2008). I bought this book because I liked the tipping point (mentioned last year). It is just as easily readable, and this time Gladwell takes a closer look at the data behind “outliers”, those very successful people, and comes to the conclusion there are rather mundane reasons for it. These range from top sports people who typically happen to have their date of birth close to the yearly cut-off point, which makes a big difference among small children by giving them a physical advantage, after which it’s just more time spent training in the advanced training programmes, to being at the right time in the right place, a lot (‘10000 hours’) of practice, and the observation that “no one, not even a genius, ever makes it alone” (regardless of what the self-made-man stories from the USA are trying to convince you of).

Other books include Nice girls don’t get the corner office by Lois Frankel, but if you’d have to choose, then I’d rather recommend the Delusions of gender I mentioned last year. The non-fiction books in the 2012 list would, in my opinion, also be a better choice than Critical mass by Philip Ball: the mundane physics information at the start was too long, so I made it only partially through the book and put it back on the shelf before getting to its actual thesis.

And yes, like last year, I’ve read some ‘pulp’, and re-read the hunger games trilogy (in one weekend!), but I’ll leave that for what it is (or maybe another time). If you have any suggestions for ‘must read’, feel free to leave a note. There are some access limitations here, though, because it is not always the most recent books that are in the bookshops. I live near a library now, and will visit it soon, hoping I can finally follow up on a reader’s previous suggestion to read the books by Nadine Gordimer.

## Preliminary list of isiZulu terms for computing and computer literacy

As part of the COMMUTERM project, we played around with isiZulu terminology development using “the” “traditional” way of terminology development (frankly, having read up on it, I don’t think there is an established methodology), which was interesting in itself already.

We have gathered relevant computing and computing literacy terms from extant resources, conducted a workshop with relative experts (the typical way of doing it), executed two online surveys through an isiZulu-localised version of Limesurvey, and completed a voting experiment among computer literacy students. The results and analysis have been written up for a paper, but this will take some time to see the light of day (if it is accepted, that is). In the meantime, we do not want to ‘sit’ on the list that we have compiled: so far, there are 233 isiZulu terms from 8 resources for 146 entities. At the time of writing, this is the largest list of entities with isiZulu terms for the domain of computing and computer literacy.

The list is available in table format, sorted alphabetically by English term and sorted alphabetically by isiZulu term. Except for a few (very) glaring mistakes/typos, the list has not been curated in any way, so you have to use your own judgment. In fact, I don’t care which terms you’d prefer: I’m facilitating, not dictating.

Besides leaving a comment on this post or sending me an email if you have updates you’d like to share, there are other ways to share your knowledge of isiZulu computing and computer literacy terminology with the COMMUTERM project and/or the world, among others:

• Contributing to the Limesurvey localization for isiZulu, so that not only the text in two existing surveys will be entirely in isiZulu, but also any survey and the back-end admin. Members of the African Languages department at UKZN are especially interested in this so that they will be able to use it for their research.
• The computer literacy surveys are still open (100% isiZulu interface), so you can still choose to do either this one or that one (but not both).
• Participate in the crowdsourcing game ([link TBA]), which will be launched in February, given that it is still summer holidays for the students at present.

## Book chapter on conceptual data modeling for biology published

Just a quick note that my book chapter on “Ontology-driven formal conceptual data modeling for biological data analysis” has finally been published in the Biological Knowledge Discovery Handbook: Preprocessing, Mining and Postprocessing of Biological Data (edited by Mourad Elloumi and Albert Y. Zomaya). A summary of the chapter’s contents is given in an earlier blog post from a little over two years ago, and I’ve put the preprint online.

The whole book is an impressive 1192 pages consisting of 48 chapters of about 25 pages each, which are grouped into three main sections. The first section, Biological data pre-processing, has four parts: biological data management, biological data modeling (which includes my chapter), biological feature extraction, and biological feature selection. The second section, biological data mining, has six parts: Regression Analysis of Biological Data, Biological Data Clustering, Biological Data Classification, Association Rules Learning from Biological Data, Text Mining and Application to Biological Data, and High-Performance Computing for Biological Data Mining. The third section, biological data post-processing, has only one part: biological knowledge integration and visualization. (check the detailed table of contents). Happy reading!

## Some ontology authoring guidelines to prevent pitfalls: TIPS

We showed the pervasiveness of pitfalls in ontologies earlier [1], and it is overdue to look at how to prevent them in a structured manner. From an academic viewpoint, preventing them is better, because it means you have a better grasp of ontology development. Following our KEOD’13 paper [1], we received a book chapter invitation from its organisers, and the Typical pItfall Prevention Scheme (TIPS) is described there. Here I include a ‘sneak preview’ selection of the 10 sets of guidelines (i.e., it is somewhat reworded and shortened for this blog post).

The TIPS are relevant in general, including for the latest OWL 2. They are structured in an order of importance in the sense of how one typically goes about developing an ontology at the level of ontology authoring, and they embed an emphasis with respect to occurrence of the pitfall so that common pitfalls can be prevented first. The codes in brackets refer to the types of pitfall, and follow the same numbering as in the OOPS! pitfall catalogue and in [1].

T1: Class naming and identification (includes P1, P2, P7, C2, and C5): Synonymy and polysemy should be avoided in naming a class: 1) distinguish the concept/universal itself from the names it can have (the synonyms) and create just one class for it and add other names using rdfs:label annotations; 2) in case of polysemy (the same name has different meanings), try to disambiguate the term and refine the names. Concerning identifying classes, do not lump several together into one with an ‘and’ or ‘or’ (like a class TaskOrGoal or ShrubsAndBushes), but try to divide them into subclasses. Squeezing in modality (like ‘can’, ‘may’, ‘should’) in the name is readable for you, but has no effect on reasoning (if you want that, choose another language) and sometimes can be taken care of in a different way (like a canCook: the stove has the function or affordance to cook). Last, you should have a good URI indicating where the ontology will be published and a relevant name for the file.
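The ‘and’/‘or’ lumping in T1 can be screened for with a simple name check. The following is only a rough heuristic sketch, assuming CamelCase class names like the two examples above; the function name `looks_lumped` and the regular expression are illustrative assumptions, not part of TIPS or of the OOPS! catalogue, and a hit still needs human judgment.

```python
import re

# Heuristic: a capitalised 'And' or 'Or' wedged between two CamelCase words.
# This is an illustrative assumption about naming style, not a rule from TIPS.
LUMPED = re.compile(r"[a-z0-9](And|Or)[A-Z]")


def looks_lumped(class_name: str) -> bool:
    """Return True if the name looks like two classes joined by 'And'/'Or'."""
    return LUMPED.search(class_name) is not None


for name in ["TaskOrGoal", "ShrubsAndBushes", "Android", "OrganicFood"]:
    print(name, looks_lumped(name))
```

Names such as Android or OrganicFood are not flagged, because the pattern requires a lowercase letter before and an uppercase letter after the conjunction.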

T2: Class hierarchy (includes P3, P6, P17, and P21): A taxonomy is based on is-a relationships, meaning that classA is-a classB if and only if every instance of A is also an instance of B, and is-a is transitive. The is-a is present in the language already (subclassOf in OWL), so do not introduce it as an object property. Also, do not confuse is-a with instance-of: the latter is used for representing membership of an individual in a class (which also has a primitive in OWL). Consider the leaf classes of the hierarchy: are they still classes (entities that can have instances) or individuals (entities that cannot be instantiated anymore)? If the latter, then convert them into instances. What you typically want to avoid are cycles in the hierarchy, as then some class down in the hierarchy (and all of them in between) ends up as equivalent to one of its superclasses. Also try to avoid adding some class named Unknown, Other or Miscellaneous in a class hierarchy just because the set of sibling classes defined is incomplete.
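To illustrate the point about cycles in T2, here is a minimal sketch that looks for a cycle among asserted subclass links. It assumes the hierarchy has been exported to a plain dictionary from class name to its direct superclasses; the pizza class names and the function `find_subclass_cycle` are made up for the example. A proper OWL reasoner would instead report the classes on the cycle as equivalent.

```python
from typing import Dict, List, Optional


def find_subclass_cycle(subclass_of: Dict[str, List[str]]) -> Optional[List[str]]:
    """Return one cycle of asserted subclass links, or None if there is none."""
    visiting = set()   # classes on the current depth-first path
    done = set()       # classes whose superclasses were fully explored
    path: List[str] = []

    def visit(cls: str) -> Optional[List[str]]:
        visiting.add(cls)
        path.append(cls)
        for parent in subclass_of.get(cls, []):
            if parent in visiting:
                # Back edge: the cycle runs from 'parent' to 'cls' and back.
                return path[path.index(parent):] + [parent]
            if parent not in done:
                cycle = visit(parent)
                if cycle:
                    return cycle
        path.pop()
        visiting.discard(cls)
        done.add(cls)
        return None

    for cls in list(subclass_of):
        if cls not in done:
            cycle = visit(cls)
            if cycle:
                return cycle
    return None


# Hypothetical hierarchy with a modelling mistake in the last entry.
hierarchy = {
    "Margherita": ["VegetarianPizza"],
    "VegetarianPizza": ["Pizza"],
    "Pizza": ["Food"],
    "Food": ["Margherita"],
}
print(find_subclass_cycle(hierarchy))
```

Every class on the returned path would be inferred equivalent to the others, which is rarely what was intended.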

T3: Domain and range of a class (includes P11 and P18): When you add an object or data property, answer the question “What is the most general class in the ontology for which this property holds?” and declare that class as domain/range of the property. If the answer happens to be multiple classes, then ensure you combine them with ‘or’, not a simple list of those classes (which amounts to the intersection). Likewise, if the answer is owl:Thing, then try to combine several subclasses instead of using the generic owl:Thing (can the property really relate anything to anything?). For the range of a data property, you should take the answer to the question “What would be the format of data (strings of characters, positive numbers, dates, floats, etc.) used to fill in this information?” (the most general one is literal).
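The difference between a list of domain classes and their union in T3 can be made concrete with plain sets. This is a toy sketch under a simplifying assumption: classes are treated as fixed sets of individuals, whereas in OWL a domain axiom lets the reasoner infer the subject's type. The class contents and the function `possible_subjects` are hypothetical.

```python
from functools import reduce

# Hypothetical toy extensions of two classes.
pizza = {"margherita", "capricciosa"}
pasta = {"lasagne", "carbonara"}


def possible_subjects(domain_axioms):
    """All domain axioms hold at once, so subjects lie in their intersection."""
    return reduce(set.intersection, domain_axioms)


# Two separate domain axioms (a 'simple list'): Pizza, Pasta.
listed = possible_subjects([pizza, pasta])
# One domain axiom with the union: Pizza or Pasta.
combined = possible_subjects([pizza | pasta])

print(listed)    # nothing is both a pizza and a pasta here
print(combined)  # every pizza and every pasta qualifies
```

With the listed version, any individual that uses the property would have to be both a pizza and a pasta, which is usually not the intention.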

T4: Equivalent relations (includes P12 and P27):

T5: Inverse relations (includes P5, P13, P25, and P26): For object properties that are declared inverses of each other, check that the domain class of one is the same class as the range of the other one, and vv. (for a single object property, consider T6).
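The check in T5 is mechanical enough to sketch in a few lines. The example assumes each object property has a single named class as domain and as range; the `ObjectProperty` record, the function `inverse_mismatches`, and the property names are illustrative assumptions only.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ObjectProperty:
    name: str
    domain: str
    range_: str  # trailing underscore avoids shadowing the built-in 'range'


def inverse_mismatches(p: ObjectProperty, q: ObjectProperty) -> List[str]:
    """List the ways in which p and q fail the domain/range check for inverses."""
    problems = []
    if p.domain != q.range_:
        problems.append(f"domain of {p.name} ({p.domain}) != range of {q.name} ({q.range_})")
    if p.range_ != q.domain:
        problems.append(f"range of {p.name} ({p.range_}) != domain of {q.name} ({q.domain})")
    return problems


# Hypothetical pair declared as inverses of each other.
has_ingredient = ObjectProperty("hasIngredient", "Pizza", "Ingredient")
is_ingredient_of = ObjectProperty("isIngredientOf", "Ingredient", "Food")

for problem in inverse_mismatches(has_ingredient, is_ingredient_of):
    print(problem)
```

Here the range of isIngredientOf (Food) is broader than the domain of hasIngredient (Pizza), so one of the two declarations should be revised.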

T6: Object property characteristics (includes P28 and P29): Go through the object properties and check their characteristics, such as symmetry, functionality, and transitivity. See also the SubProS reasoning service [2] to ensure that the object property characteristics declared are ‘safe’ and will not lead to unexpected deductions. Concerning reflexivity, be sure to distinguish between the case where a property holds for all objects in your ontology (if so, declare it reflexive) and the case where it holds only for a particular relation and instances of the participating classes (then use the Self construct).

T7: Intended formalization (includes P14, P15, P16, P19, C1, and C4): As mentioned in T3, a property’s domain or range can consist of more than one class, which is usually a union of the classes, not the intersection of them. For a property’s usage in an axiom, there are typically three cases: (i) if there is at least one such relation (quite common), then use SomeValuesFrom/some/$\exists$; (ii) when ‘closing’ the relation, i.e., it doesn’t relate to anything else than the class(es) specified, then also add an AllValuesFrom/only/$\forall$; (iii) when stating that there is no such relation in which the class on the left-hand side participates, you have to be precise about what you really want to say: to achieve that, put the negation before the quantifier, but when there is a relation that is just not with some particular class, then the negation goes in front of the class on the right-hand side. For instance, a vegetarian pizza does have ingredients but not meat ($\neg\exists hasIngredient.Meat$), which is different from saying that it has as ingredients anything in the ontology (cucumber, beer, soft drink, marshmallow, chocolate, …) that is not meat ($\exists hasIngredient.\neg Meat$). Don’t create a ‘hack’ by introducing a class with negation in the name, like a NotMeat, but use negation properly in the axiom. Finally, when you are convinced that all relevant properties for a class have been represented, convert it to a defined class (if not already done so), which gets you more deductions for free.
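The two placements of negation in T7 pick out different sets of individuals, which a small set-based sketch can show. It evaluates both expressions over one tiny, made-up interpretation (closed, finite, and purely illustrative); the individuals and the helper functions `some` and `complement` are assumptions for the example, not an OWL reasoner.

```python
# A tiny hypothetical interpretation: individuals, one class, one property.
domain = {"pizza1", "pizza2", "pizza3", "tomato", "ham"}
meat = {"ham"}
has_ingredient = {
    ("pizza1", "tomato"),
    ("pizza2", "tomato"),
    ("pizza2", "ham"),
    ("pizza3", "ham"),
}


def some(role, filler):
    """Existential restriction: individuals with at least one role-successor in filler."""
    return {x for (x, y) in role if y in filler}


def complement(cls):
    """Negation: everything in the domain that is not in cls."""
    return domain - cls


no_meat = complement(some(has_ingredient, meat))         # not (some hasIngredient Meat)
some_non_meat = some(has_ingredient, complement(meat))   # some hasIngredient (not Meat)

print(sorted(no_meat))
print(sorted(some_non_meat))
```

In this interpretation pizza2, which has both tomato and ham, satisfies the second expression but not the first, so only the first captures ‘no meat ingredient’. Note that the first also holds for individuals without any ingredients at all, which is why the vegetarian pizza additionally needs to have ingredients.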

T8: Modelling aspects (includes P4, P23, and C3):

T9: Domain coverage and requirements (includes P9 and P10):

T10: Documentation and understandability (includes P8, P20, and P22): annotate!

I don’t know yet when the book with the selected papers from KEOD will be published, but I assume within the next few months. (date will be added here once I know).

References

[1] Keet, C.M., Suárez Figueroa, M.C., and Poveda-Villalón, M. (2013) The current landscape of pitfalls in ontologies. International Conference on Knowledge Engineering and Ontology Development (KEOD’13). 19-22 September, Vilamoura, Portugal.

[2] Keet, C.M. Detecting and Revising Flaws in OWL Object Property Expressions. EKAW’12. Springer LNAI vol 7603, 252-266.

## Notes on a successful ER 2013 conference

Unlike two other conferences earlier this year, the 32nd International Conference on Conceptual Modeling (ER’13) in Hong Kong, held 11-13 Nov, was a success: good presentations, inspiring discussions, new ideas, follow-ups, and an enjoyable crowd. As a bonus, the paper Pablo Fillottrani and I wrote on metamodelling [1] was nominated for best paper award. I’ve posted about our paper earlier, so I will highlight some of the other papers.

There were two sessions on ontology-driven conceptual modelling, of which one ran concurrently with the session on reasoning over conceptual data models. It was a tough choice, but in the end I attended both ontology-based conceptual modelling sessions. The three reasoning papers from John Mylopoulos and co-authors, which I skimmed and read through, covered reasoning with decision-theoretic goals, reasoning with business plans, and automated reasoning for regulatory compliance, like in law and for answering questions such as ‘given situation S, what are alternative ways to comply with law L?’ [2]. Regarding the latter, there are models of the law represented in the Nomos 2 modelling language, which were formalized and sent to the automated reasoner, being the off-the-shelf Datalog-based reasoner DLV. It was demonstrated that it is actually feasible to do this, taking into account scalability. These are encouraging results for automated reasoning with such conceptual models.

The ontology-based modeling papers were varied. There were some fundamental results on a first extension of the UFO foundational ontology for conceptual data modeling of events [3], presented by Giancarlo Guizzardi, that has been used successfully in other projects, and our ontology-driven metamodelling, also using philosophy directly (notably, the positionalism of relations and quality properties) [1]. A ‘merger’ of ontology, information systems, and linked data was presented by Chiara Renso, who talked about the Baquara ontology to help conceptual analysis of movement of people talking about some entity at a certain location [4], which won the best paper award. A use case of improving a conceptual data model using UFO was presented by Oscar Pastor [5], using an earlier developed conceptual model of the human genome. Not that I agree with Gene being a “collective”, but, overall, it gives a clear example of how a model may be enhanced and indeed lays bare underlying assumptions and understanding that are missed in ‘plain’ conceptual modelling.

Besides ontology-driven conceptual modeling, there were four papers on fundamentals of conceptual modeling. One of the topics was about conceptual modeling and concepts [6], presented by Chris Partridge. To its credit, the paper refines some notions of concepts I wasn’t aware of, but I have grown a bit tired of the concept vs universal debate due to its intense discussions in ontology engineering (see links to debates and reference here). Roman Lukyanenko proposed a new way for conceptual modeling: instead of top-down, go bottom-up and gather the classes and attributes from the crowd using citizen science and free-form annotations without any form of curation [7]. It’s on the other end of the spectrum compared to standard conceptual data modeling, which is a bit too loose to my liking especially because of the lack of curation of proposed terms, but a hybrid certainly may be useful.  Not in this session, but somewhat related, was Tilmann Zäschke’s presentation about optimizing conceptual data models using the actual database [8]. They proposed a method and patterns for updating the conceptual data model based on usage of the database (including path navigation), using DBLP as a case study.

There were two sessions on business process modeling and two sessions on applications, one on network modeling, security, data semantics, and a demo session, as well as several keynotes, workshops, and panels that partially overlapped with other sessions, for which I don’t have the time to write up notes here. I did go to the panel on “open models”, or: why is there open source software, but hardly any open source conceptual models? I plan to get back to this question in a later post.

The food was good, and so were the generous reception and social dinner (eating some sort of a sweet bean soup for dessert was a bit curious, though), and it was great to meet again with people I’ve met before and to finally meet in person several people whose papers I had only read and cited over the years, including Brian Henderson-Sellers, Veda Storey, Sudha Ram, Antoni Olivé, and Peter Chen. Even though ER’14 is in the USA next year (Atlanta), I may give it a try anyway.

References

(note: most of the links point to the version at Springer; search again later or ask the authors for a free copy. In fact, it smells as if this is due to a collaboration between Google Scholar and Springer: when I search for my own paper, whose CRC is online since the blog post about it in August, GS pretends it does not exist either, idem for Zäschke’s paper.)

[1] Keet, C.M., Fillottrani, P.R. Toward an ontology-driven unifying metamodel for UML Class Diagrams, EER, and ORM2. 32nd International Conference on Conceptual Modeling (ER’13). 11-13 November, 2013, Hong Kong. Springer LNCS vol 8217, 313-326.

[2] Siena, A., Ingolfo, A., Perini, A., Susi, A., Mylopoulos, J. Automated reasoning for regulatory compliance. ER’13, Springer LNCS vol 8217, 47-60.

[3] Guizzardi, G., Wagner, G., de Almeida Falbo, R., Guizzardi, R.S.S., Almeida, J.P.A. Towards ontological foundations for the conceptual modeling of events. ER’13, Springer LNCS vol 8217, 327-341.

[4] Fileto, R., Kruger, M., Pelekis, N., Theodoridis, Y., Renso, C. Baquara: a holistic ontological framework for movement analysis using linked data. ER’13, Springer LNCS vol 8217, 342-355.

[5] Martinez Ferrandis, A.M., Pastor Lopez, O., Guizzardi, G. Applying the principles of an ontology-based approach. ER’13, Springer LNCS vol 8217, 471-478.

[6] Partridge, C., Gonzalez-Perez, C., Henderson-Sellers, B. Are conceptual models concept models? ER’13, Springer LNCS vol 8217, 96-105.

[7] Lukyanenko, R., Parsons, J. Is traditional conceptual modeling becoming obsolete? ER’13, Springer LNCS vol 8217, 61-73.

[8] Zäschke, T., Leone, S., Gmunder, T., Norrie, M.C. Optimizing conceptual data models through profiling in object databases. ER’13, Springer LNCS vol 8217, 284-297.