Dabbling into evaluating reasoners with the DMOP ontology

The Data Mining OPtimization ontology (DMOP) is a highly axiomatised ontology that uses almost all features of OWL 2 DL, and its domain entities are linked to DOLCE, using all four main ‘branches’ of DOLCE. Some details are described in last year’s OWLED’13 paper [1] and a blog post. We did observe ‘slow’ reasoner performance when classifying the ontology, however: between 10 and 20 minutes, varying across versions and machines. The Ontology Reasoner Evaluation (ORE’14) workshop (part of the Vienna Summer of Logic) was a nice motivation to try to figure out what’s going on, and some initial results are described briefly in the short six-page paper [2], which is co-authored with Claudia d’Amato, Agnieszka Lawrynowicz, and Zubeida Khan.

Those results are definitely what can be called interesting, even though we’re still at the level of dabbling into it from a reasoner user-centric viewpoint, and notably, from a modeller-centric viewpoint. The latter is what made us pose questions like “what effect does using feature x have on the performance of the reasoner?”. No one knew, except for the informal feedback I received at DL 2010 on [3] that reasoning with data types slows things down, and likewise when the cardinalities are high. That’s not an issue with DMOP, though.

So, the first thing we did was to determine a baseline on a good laptop—your average modeller doesn’t have an HPC cluster readily at hand—and in an Ontology Development Environment, where the reasoner is typically accessed from. Some 9 minutes to classify the ontology (machine specs and further details in the paper).

The next steps were to analyse one specific modelling construct (inverses) and to determine what effect DOLCE has on the overall performance.

The reason why we chose the representation of inverses is because in OWL 2 DL (cf. OWL DL), one can use objectInverseOf(OP) to use the inverse of an object property instead of extending the ontology’s vocabulary and using InverseObjectProperties(OPE1 OPE2) to relate the property and its inverse. For instance, to use the inverse of the property addresses in an axiom, one used to have to introduce a new property, addressed by, declare it inverse to addresses, and then use that in the axiom, whereas in OWL 2 DL, one can use ObjectInverseOf(addresses) in the axiom (in Protégé, the syntax is inverse(addresses)). That slashed the time to compute the class hierarchy by over a third (and by about half for the baseline). Why? We don’t know. Other features used in DMOP, such as punning and property chains, were harder to remove and are heavily used, so we didn’t test those.
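As a sketch of the difference in OWL functional-style syntax (the class names Task and Method are merely illustrative here, not necessarily DMOP’s exact vocabulary): the old way requires a named inverse in the vocabulary, the OWL 2 DL way uses the anonymous inverse directly.

```
# OWL DL: extend the vocabulary with a named inverse
Declaration(ObjectProperty(:addresses))
Declaration(ObjectProperty(:addressedBy))
InverseObjectProperties(:addresses :addressedBy)
SubClassOf(:Task ObjectSomeValuesFrom(:addressedBy :Method))

# OWL 2 DL: no new property needed, use the anonymous inverse
SubClassOf(:Task ObjectSomeValuesFrom(ObjectInverseOf(:addresses) :Method))
```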

The other one, removing DOLCE, is a bit tricky. But to give away the end result upfront: that made it 10 times faster! The ‘tricky’ part has to do with the notion of ‘linking to a foundational ontology’ (deserving of its own blog post). For DMOP, we had not imported but merged, and we did not merge everything from DOLCE and its ExtendedDnS, but only what was deemed relevant; in numbers: 43 classes, 78 object properties, and 593 axioms. To make matters worse—from an evaluation viewpoint, that is—we had reused three DOLCE object properties heavily, so we kept those three DOLCE properties in the evaluation file, as we suspected that removing them would have affected the deductions too much and interfered with the DOLCE-or-not question (one also could argue that those three properties can be considered an integral part of DMOP). So, it was not a simple case of ‘remove the import statement and run the reasoner again’, but a ‘remove almost everything with a DOLCE URI manually and then run the reasoner again’.

Because computation was so ‘slow’, we wondered whether cleverly modularizing DMOP could be the way to go, in case someone wants to use only a part of DMOP. We got as far as trying to modularize the ontology, which already was not trivial because DMOP and DOLCE are both highly axiomatised and have few, if any, relatively isolated sections amenable to modularization. Moreover, what it did show is that such automated modularization (when it was possible) only affects the number of classes and axioms, not the properties and individuals. So, the generated modules are stuck with properties and individuals that are not used in, or not relevant for, that module. We did not fix that manually. Also, putting the modules back together did not return the original version we started out with, missing 225 axioms out of the 4584.

If this wasn’t enough already, the DMOP with/without DOLCE test was performed with several reasoners, out of curiosity, and they gave different outputs. FaCT++ and MORe had a “Reasoner Died” message. My ontology engineering students know that, according to DOLCE, death is an achievement, but I guess that the reasoners’ developers would deem otherwise. Pellet and TrOWL inferred inconsistent classes; HermiT did not. Pellet’s hiccup had to do with datatypes and should not have occurred (see the paper for details). TrOWL fished out a modelling issue from all of those 4584 axioms (see p5 of the paper), of the flavour described in [4] (thank you), but with the standard semantics of OWL—i.e., not caring at all about the real semantics of object property hierarchies—it should not have derived an inconsistent class.

Overall, it feels like having opened up a can of worms, which is exciting.

References

[1] Keet, C.M., Lawrynowicz, A., d’Amato, C., Hilario, M. Modeling issues and choices in the Data Mining OPtimisation Ontology. 8th Workshop on OWL: Experiences and Directions (OWLED’13), 26-27 May 2013, Montpellier, France. CEUR-WS vol 1080.

[2] Keet, C.M., d’Amato, C., Khan, Z.C., Lawrynowicz, A. Exploring Reasoning with the DMOP Ontology. 3rd Workshop on Ontology Reasoner Evaluation (ORE’14). July 13, 2014, Vienna, Austria. CEUR-WS vol (accepted).

[3] Keet, C.M. On the feasibility of Description Logic knowledge bases with rough concepts and vague instances. 23rd International Workshop on Description Logics (DL’10), 4-7 May 2010, Waterloo, Canada.

[4] Keet, C. M. (2012). Detecting and revising flaws in OWL object property expressions. In Proc. of EKAW’12, volume 7603 of LNAI, pages 252–266. Springer.

8 years of keetblog

The 8-year anniversary swooshed by a few days ago but, actually, it is only really complete today, as the first blog post with real content was published on April 18, 2006, about solving sudokus with constraint programming.

The top post among the 186 posts (>9000 visits to that page alone) is still the introduction for two lectures on top-down and bottom-up ontology development that I wrote in November 2009 as part of the Semantic Web technologies MSc course at the Free University of Bolzano; anyone wishing to read an updated version: have a look at the 2014 lecture notes (its ‘Block II’). The post most commented on is the one about academia.edu, followed by the one on my wish for a semantic search of insects.

The more ‘trivia’/fun ones—still having to do with science—are, I think, about the complexity of coffee and culinary evolution, but I may be biased (my first degree up to MSc was in food science). For some reason, there were more visitors reading about failing to recognize your own incompetence and some sneakiness of academia.edu than about food (and many other topics). Ah, well. A full list sorted by year is available on the list of blog posts page.

The frequency of posting is somewhat less than a few years ago and, consequently, the visits went down from about 1500/month during its heydays [well, years] to about 1000/month now, but that’s still not bad at all for a ‘dull’ blog, and I would like to thank you again, and even more so the fans (subscribers) and those of you who have taken the effort to like a post or to leave comments both online and offline! I hope it’s been an interesting read, or else enjoyable procrastination.

Ontology Engineering lecture notes for 2014 online

The lecture notes for the Ontology Engineering BSc honours in CS course are available online now. The file is updated compared to the COMP720 module (and those notes have been removed). The main changes consist of reordering the chapters in Block II and Block III, adding better or more explanations and examples in several sections, fixing typos, and updates to reflect advances made in the field. It again includes the DL primer written by Markus Kroetzsch, Ian Horrocks and Frantisek Simancik (saving me the time of writing about that; thanks!).

As with the last three installments, the target audience is computer science students in their 4th year (honours), so the notes are of an introductory nature. The notes have three blocks after the introduction: logic foundations, ontology engineering, and advanced topics (the latter we will skip, as this is a shorter course). The logic foundations contain a recap of FOL and the notion of reasoning, the DL primer, the basics of automated reasoning with the Description Logic ALC, the DL-based OWL species, and some practical automated reasoning. The ontology engineering block starts with methods and methodologies that give guidance on how to commence actually developing an ontology, and how to avoid and fix issues. Subsequently, there are two chapters going into some detail of two ‘paths’ in the methodology, being top-down ontology development using foundational ontologies, and bottom-up ontology development to extract knowledge from other material, such as relational databases, thesauri, and natural language documents.

The advanced topics are optional this year, but I left them in the lecture notes, as they may pique your interest. Chapter 8 on Ontology-Based Data Access is a particular application scenario of ontologies that ‘spice up’ database applications. Chapter 9 touches upon a few sub-areas within ontologies: representing and reasoning with vagueness and uncertainty, extending the language to include also temporal knowledge, the use of ontologies to enhance conceptual data models, and a note on social aspects.

It is still an evolving document, and relative completeness of sections varies slightly, so it has to be seen in conjunction with the slides, lectures, and some additional documentation that will be made available on the course’s Vula site.

Suggestions and corrections are welcome! If you want to use a part of it in your own lectures and/or use the accompanying slides with it, please contact me.

More book suggestions (2013)

Given that I’ve written a post in each of the past two years about books I read during the previous year and that I think are worthwhile to read (here and here), I’m adding a new list for 2013, divided into fiction and non-fiction, and again a selection only. They are not always the newest releases, but worth the read anyway.

Fiction

The book of the dead by Kgebetli Moele (2009), which has won the South African Literary Award. The cover does not say anything about the story, and maybe I should not either. Moele’s book is a gripping read, with a twist in the second part of the book (so: spoiler alert!). The first part is about Khutso, a boy growing up in a town in South Africa; it is “the book of the living”. Then he gets infected with HIV, and “the book of the dead” starts. The writing shifts from the third person to the first person, to the vantage point of the virus that wants to replicate and spread to sustain its existence, as if it has a mind of its own (read an excerpt from the second part). All does not end well.

Zen and the art of motorcycle maintenance by Robert Pirsig is a ‘modern classic’ that this year celebrates its 40th anniversary. It is semi-autobiographical, and the story exposes some philosophical ideas and the tensions between the sciences and the arts, partially explained through drawing parallels with motorcycles and motorcycle maintenance. A minor storyline is about a road trip of father and son, and there is an unspoken undercurrent about inhumane psychiatric treatments (electroshocks in particular) of people deemed mentally ill. It is an interesting read for the complexity of the narrative and the multiple layers of the overall story, i.e., as literature it is impressive, but I guess it is called ‘a classic’ more for the right timing of the release of the book and the zeitgeist of that era, and it therefore may resonate less with younger people these days. There are many websites discussing the contents, and it has its own wikipedia entry.

The girl with the dragon tattoo by Stieg Larsson (2008). I know, the movie is there for those who do not want to read the tome. I have not seen it, but the book is great; I recently got the second installment and can’t wait to start reading it. It is beautiful in the way it portrays Swedish society and the interactions between people. The tired male journalist, the troubled female hacker, and a whole cast of characters for the ‘whodunnit’.

Other books I read and would recommend: The songs of distant earth by Arthur C Clarke and De dolende prins [the lost prince] by Bridget Wood.

Non-fiction

Outliers by Malcolm Gladwell (2008). I bought this book because I liked the tipping point (mentioned last year). It is just as easily readable, and this time Gladwell takes a closer look at the data behind “outliers”, those very successful people, and comes to the conclusion there are rather mundane reasons for it. From top sports people who typically happen to have their date of birth close to the yearly cut-off point, which makes a big difference among small children, giving them a physical advantage, and then it’s just more time spent training in the advanced training programmes. To being at the right time in the right place, and a lot (‘10000 hours’) of practice and that “no one, not even a genius, ever makes it alone” (regardless of what the self-made-man stories from the USA are trying to convince you of).

The Abu Ghraib Effect by Stephen F. Eisenman (2007). I had a half-baked draft blog post about this book, trying to have it coincide with the 10 year ‘anniversary’ of the invasion by the USA into Iraq, but ran short of time to complete it. This is a condensed version of that draft. Eisenman critically examines the horrific photos taken of the torture at the Abu Ghraib prison in 2003 by US Military officers, by analyzing their composition, content, and message and comparing them to a selection of (what is deemed) art over the past 2500 years originating in, mainly, Europe. He finds that the ‘central theme’ depicted in those photos can be traced all the way back from Hellenic times to this day, with just a brief glimmer of hope in the late nineteenth and early twentieth century and a few individuals deviating from the central theme. The ‘central theme’ is the Pathosformel: an entente between torturer and victim, of passionate suffering, representing “the body as something willingly alienated by the victim (even to the point of death) for the sake of the pleasure and aggrandizement of the oppressor” (p16). Or, in plain terms: the artworks depict subjects (gleefully and at times with a sexual undertone) undergoing and accepting their suffering and even the need for torture, the necessity of suffering for the betterment of the ruling classes, for the victors of war, imperial culture, for (the) god(s), including fascist and racist depictions, with the perpetrators somehow being in their ‘right’. In contrast to the very Christian accept-your-suffering, there are artworks that deviate from this by showing the unhappy suffering, conveying that it is not the natural order of things, and are, as such, political statements against torture.
Examples Eisenman uses to illustrate the difference between the two are Picasso’s Guernica, Goya’s Third of May 1808, the custody of a criminal does not call for torture, and the captivity is as barbarous as the crime (links to the pictures). Compare this to Laokoon and his sons (depicting him being tortured to death by being ripped apart by snakes, according to the story), Michelangelo’s The dying slave (if it weren’t for the title, one would think he’s about to start his own foreplay), Sodoma’s St. Sebastien (who seems to be delightedly looking upwards to heaven whilst having spears rammed in his body), and the many more artworks analysed in the book on the pathos formula. While Eisenman repeats that the Abu Ghraib photos are surely not art, his thesis is that the widespread internalization of the pathos formula made it acceptable to the victimizers in Iraq to perpetrate the acts of torture and take the pictures (upon instigation and sanctioning by higher command in the US Military), and that there was not really an outcry over it. Sure, the pictures have gone around the globe and people expressed their disgust, but, so far as Eisenman could document (the book was written in 2007), the only ones convicted for the crimes are a few military officers, sentenced to a few years in prison. The rest go on with apparent impunity, with people in ‘the West’ going about their business, and probably most of you reading this perhaps had even forgotten about it, as if they were mere stills of a Hollywood movie. Eisenman draws parallels with the TV series 24 and the James Bond movie Goldfinger, the latter based on his reading of Umberto Eco’s analysis of Fleming, where love is transformed into hatred and tenderness into ferocity (Eisenman quoting Eco, p94). From a theoretical standpoint, the “afterword” is equally, if not more, important to read. Overall, the thin book is full of information to ponder about.

Other books include Nice girls don’t get the corner office by Lois Frankel, but if you had to choose, then I’d rather recommend the Delusions of gender I mentioned last year, and the non-fiction books in the 2012 list would be a better choice, in my opinion, than Critical mass by Philip Ball as well (the mundane physics information at the start was too long, so I made it only partially through the book and put it back on the shelf before I would have gotten to the actual thesis of the book).

And yes, like last year, I’ve read some ‘pulp’, and re-read the hunger games trilogy (in one weekend!), but I’ll leave that for what it is (or maybe another time). If you have any suggestions for ‘must read’, feel free to leave a note. There are some access limitations here, though, because it is not always the most recent books that are in the bookshops. I live near a library now, and will visit it soon, hoping I can finally follow up on a reader’s previous suggestion to read the books by Nadine Gordimer.

Preliminary list of isiZulu terms for computing and computer literacy

As part of the COMMUTERM project, we played around with isiZulu terminology development using “the” “traditional” way of terminology development (frankly, having read up on it, I don’t think there is an established methodology), which was interesting in itself already.

We have gathered relevant computing and computer literacy terms from extant resources, conducted a workshop with relative experts (the typical way of doing it), executed two online surveys through an isiZulu-localised version of Limesurvey, and completed a voting experiment among computer literacy students. The results and analysis have been written up in a paper, but this will take some time to see the light of day (if it is accepted, that is). In the meantime, we do not want to ‘sit’ on the list that we have compiled: so far, there are 233 isiZulu terms from 8 resources for 146 entities. At the time of writing, this is the largest list of entities with isiZulu terms for the domain of computing and computer literacy.

The list is available in table format, sorted alphabetically by English term and sorted alphabetically by isiZulu term. Except for a few (very) glaring mistakes/typos, the list has not been curated in any way, so you have to use your own judgment. In fact, I don’t care which terms you’d prefer—I’m facilitating, not dictating.

Besides that you can leave a comment to this post or send me an email if you have updates you’d like to share, there are other ways to share your knowledge of isiZulu computing and computer literacy terminology with the COMMUTERM project and/or the world, being, among others:

  • Contributing to the Limesurvey localization for isiZulu, so that not only the text in two existing surveys will be entirely in isiZulu, but also any survey and the back-end admin. Members of the African Languages department at UKZN are especially interested in this so that they will be able to use it for their research.
  • The computer literacy surveys are still open (100% isiZulu interface), so you can still choose to do either this one or that one (but not both).
  • Participate in the crowdsourcing game ([link TBA]), which will be launched in February, given that it is still summer holidays for the students at present.

Book chapter on conceptual data modeling for biology published

Just a quick note that my book chapter on “Ontology-driven formal conceptual data modeling for biological data analysis” has finally been published in the Biological Knowledge Discovery Handbook: Preprocessing, Mining and Postprocessing of Biological Data (edited by Mourad Elloumi and Albert Y. Zomaya). A summary of the chapter’s contents is described in an earlier blog post from a little over two years ago, and I’ve put the preprint online.

The whole book is an impressive 1192 pages, consisting of 48 chapters of about 25 pages each, which are grouped into three main sections. The first section, Biological Data Pre-processing, has four parts: biological data management, biological data modeling (which includes my chapter), biological feature extraction, and biological feature selection. The second section, Biological Data Mining, has six parts: regression analysis of biological data, biological data clustering, biological data classification, association rules learning from biological data, text mining and application to biological data, and high-performance computing for biological data mining. The third section, Biological Data Post-processing, has only one part: biological knowledge integration and visualization (check the detailed table of contents). Happy reading!

Some ontology authoring guidelines to prevent pitfalls: TIPS

We showed the pervasiveness of pitfalls in ontologies earlier [1], and it is overdue to look at how to prevent them in a structured manner. From an academic viewpoint, preventing them is better, because it means you have a better grasp of ontology development. Following our KEOD’13 paper [1], we received a book chapter invitation from its organisers, and the Typical pItfall Prevention Scheme (TIPS) is described there. Here I include a ‘sneak preview’ selection of the 10 sets of guidelines (i.e., it is somewhat reworded and shortened for this blog post).

The TIPS are relevant in general, including also to the latest OWL 2. They are structured in order of importance, in the sense of how one typically goes about developing an ontology at the level of ontology authoring, and they emphasise the more frequently occurring pitfalls so that the common ones can be prevented first. The numbers in brackets refer to the type of pitfall, following the same numbering as in the OOPS! pitfall catalogue and in [1].

T1: Class naming and identification (includes P1, P2, P7, C2, and C5): Synonymy and polysemy should be avoided in naming a class: 1) distinguish the concept/universal itself from the names it can have (the synonyms) and create just one class for it, adding the other names using rdfs:label annotations; 2) in case of polysemy (the same name has different meanings), try to disambiguate the term and refine the names. Concerning identifying classes, do not lump several together into one with an ‘and’ or ‘or’ (like a class TaskOrGoal or ShrubsAndBushes), but try to divide them into subclasses. Squeezing modality (like ‘can’, ‘may’, ‘should’) into the name is readable for you, but has no effect on reasoning—if you want that, choose another language—and sometimes it can be taken care of in a different way (like a canCook: the stove has the function or affordance to cook). Last, you should have a good URI indicating where the ontology will be published and a relevant name for the file.
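For instance (with an illustrative class name, not one prescribed by the guideline), rather than creating both an Attribute and a Feature class for the same notion, keep one class and record the synonym as an annotation:

```
Declaration(Class(:Attribute))
AnnotationAssertion(rdfs:label :Attribute "attribute"@en)
AnnotationAssertion(rdfs:label :Attribute "feature"@en)
```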

T2: Class hierarchy (includes P3, P6, P17, and P21): A taxonomy is based on is-a relationships, meaning that class A is-a class B if and only if every instance of A is also an instance of B, and is-a is transitive. The is-a is present in the language already (subclassOf in OWL), so do not introduce it as an object property. Also, do not confuse is-a with instance-of: the latter is used for representing membership of an individual in a class (which also has a primitive in OWL). Consider the leaf classes of the hierarchy: are they still classes (entities that can have instances) or individuals (entities that cannot be instantiated anymore)? If the latter, then convert them into instances. What you typically want to avoid are cycles in the hierarchy, as then some class down in the hierarchy—and all of them in between—ends up equivalent to one of its superclasses. Also try to avoid adding some class named Unknown, Other or Miscellaneous in a class hierarchy just because the set of sibling classes defined is incomplete.
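A minimal sketch of the is-a versus instance-of distinction (the wine-flavoured names are merely illustrative), using the primitives OWL already provides rather than a hand-crafted ‘is-a’ object property:

```
# is-a: every instance of Shiraz is also an instance of Wine
SubClassOf(:Shiraz :Wine)

# instance-of: membership of an individual in a class
ClassAssertion(:Shiraz :bottle25)
```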

T3: Domain and range of a property (includes P11 and P18): When you add an object or data property, answer the question “What is the most general class in the ontology for which this property holds?” and declare that class as the domain/range of the property. If the answer happens to be multiple classes, then ensure you combine them with ‘or’, not as a simple list of those classes (which amounts to the intersection); likewise, if the answer is owl:Thing, then try to combine several subclasses instead of using the generic owl:Thing (can the property really relate anything to anything?). For the range of a data property, take the answer to the question “What would be the format of the data (strings of characters, positive numbers, dates, floats, etc.) used to fill in this information?” (the most general one is literal).
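For example (property and class names are illustrative): if a property can relate both algorithms and workflows to tasks, declare the domain as their union in one axiom, since two separate domain axioms would be interpreted as the intersection of the two classes.

```
# intended: the domain is Algorithm OR Workflow, so use the union
ObjectPropertyDomain(:solves ObjectUnionOf(:Algorithm :Workflow))
ObjectPropertyRange(:solves :Task)

# a data property range, as specific as the data permits
DataPropertyRange(:hasModelSize xsd:nonNegativeInteger)
```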

T4: Equivalent relations (includes P12 and P27):

T5: Inverse relations (includes P5, P13, P25, and P26): For object properties that are declared inverses of each other, check that the domain class of one is the same class as the range of the other one, and vice versa (for a single object property, consider T6).

T6: Object property characteristics (includes P28 and P29): Go through the object properties and check their characteristics, such as symmetry, functionality, and transitivity. See also the SubProS reasoning service [2] to ensure you have declared ‘safe’ object property characteristics that will not lead to unexpected deductions. Concerning reflexivity, be sure to distinguish between the case where a property holds for all objects in your ontology—if so, declare it reflexive—and when it holds only for a particular relation and instances of the participating classes—then use the Self construct.
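The reflexivity distinction can be sketched as follows (with illustrative vocabulary): global reflexivity as a property characteristic versus local reflexivity through the Self construct.

```
# every object is connected to itself: global reflexivity
ReflexiveObjectProperty(:connectedTo)

# only self-regulating processes regulate themselves: local reflexivity
SubClassOf(:SelfRegulatingProcess ObjectHasSelf(:regulates))
```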

T7: Intended formalization (includes P14, P15, P16, P19, C1, and C4): As mentioned in T3, a property’s domain or range can consist of more than one class, which is usually a union of the classes, not their intersection. For a property’s usage in an axiom, there are typically three cases: (i) if there is at least one such relation (quite common), then use SomeValuesFrom/some/∃; (ii) ‘closing’ the relation, i.e., it doesn’t relate to anything else than the class(es) specified, then also add an AllValuesFrom/only/∀; (iii) stating there is no such relation in which the class on the left-hand side participates requires you to be precise about what you really want to say: to state there is no such relation at all, put the negation before the quantifier, but when there is a relation that is just not with some particular class, then the negation goes in front of the class on the right-hand side. For instance, a vegetarian pizza does have ingredients but no meat (¬∃hasIngredient.Meat), which is different from saying that it has as an ingredient anything in the ontology—cucumber, beer, soft drink, marshmallow, chocolate, …—that is not meat (∃hasIngredient.¬Meat). Don’t create a ‘hack’ by introducing a class with negation in the name, like a NotMeat, but use negation properly in the axiom. Finally, when you are convinced that all relevant properties for a class have been represented, convert it to a defined class (if not done so already), which gets you more deductions for free.
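The pizza example in OWL functional-style syntax (class and property names are illustrative, in the spirit of the well-known pizza tutorial ontology), showing how the position of the negation changes the meaning, and the defined class:

```
# no meat ingredient at all: negation before the quantifier,
# and a defined (equivalent) class for extra deductions
EquivalentClasses(:VegetarianPizza
  ObjectIntersectionOf(:Pizza
    ObjectComplementOf(ObjectSomeValuesFrom(:hasIngredient :Meat))))

# the weaker reading: some ingredient that merely is not meat
SubClassOf(:PizzaWithSomeNonMeatIngredient
  ObjectSomeValuesFrom(:hasIngredient ObjectComplementOf(:Meat)))
```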

T8: Modelling aspects (includes P4, P23, and C3):

T9: Domain coverage and requirements (includes P9 and P10):

T10: Documentation and understandability (includes P8, P20, and P22): annotate!

I don’t know yet when the book with the selected papers from KEOD will be published, but I assume within the next few months. (date will be added here once I know).

References

[1] Keet, C.M., Suárez Figueroa, M.C., and Poveda-Villalón, M. (2013) The current landscape of pitfalls in ontologies. International Conference on Knowledge Engineering and Ontology Development (KEOD’13). 19-22 September, Vilamoura, Portugal.

[2] C. Maria Keet. Detecting and Revising Flaws in OWL Object Property Expressions. EKAW’12. Springer LNAI vol 7603, pp 252-266.
