Every American is a NamedPizza

Or: verbalizing OWL ontologies still doesn’t really work well.

Ever since we got the multi-lingual verbalization of ORM conceptual data models (restricted FOL theories) working in late 2005 [1] (well: the implementation worked in the DOGMA tool, but the understandability of the output depended on the natural language), I have been following, on and off, the progress on solutions to the problem. It would be really nice if it all had worked by now, because verbalization is a way for non-logician domain experts to validate the knowledge represented in the ontology, and it has been shown to be very useful for domain experts (mainly in enterprise settings) validating (business) knowledge represented in the ORM conceptual data modeling language. (Check out the NORMA tool for the latest fancy implementation, well ahead of OWL verbalization in English Controlled Natural Language.)

Some of my students worked on it as an elective ‘mini-project’ topic of the ontology engineering courses I have taught [SWT at FUB, UH, UCI, UKZN]. They have tried to implement it for OWL into Italian and Spanish natural language using a template-based approach with an additional mini-grammar-engine to improve the output, or in English as a competitor to the Manchester syntax. All of them invariably run, to a greater or lesser extent, into the problems discussed in [1], especially when it comes to non-English languages, as English is grammatically challenged. Now, I do not intend to offend people who have English as first language, but English does not have features like gendered articles (just ‘the’ instead of ‘el’ and ‘la’, in Spanish), declensions (still ‘the’ instead of ‘der’, ‘des’, ‘dem’, ‘den’ depending on the preposition, in German), conjunction depending on the nouns (just ‘and’ instead of ‘na’, ‘ne’, ‘no’ that is glued onto the second noun depending on the first letter of that noun, in isiZulu), or subclauses where the verb tense changes by virtue of being in a subclause (in Italian). To sort out such basic matters and generate an understandable pseudo-natural language sentence, a considerable number of grammar rules and a dictionary have to be added to a template-based approach to make it work.
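To make this concrete, here is a minimal Python sketch of the kind of language-specific rules that have to sit next to the templates. Everything in it is an assumption for illustration: the tiny gender lexicon, the function names, the example nouns, and the simplified vowel-coalescence table for isiZulu (real grammars have many more rules and exceptions). It is not taken from [1] or from any of the student projects.

```python
# Hypothetical mini-lexicon: Spanish needs the gender of each noun
# before the definite article can be chosen.
SPANISH_GENDER = {"pizza": "f", "queso": "m"}


def spanish_definite(noun: str) -> str:
    """Return the noun with 'el' or 'la'; raises KeyError if the noun is unknown."""
    article = {"m": "el", "f": "la"}[SPANISH_GENDER[noun]]
    return f"{article} {noun}"


# Simplified isiZulu rule (assumption): 'na' merges with the initial
# vowel of the second noun and is written as one word with it.
ZULU_COALESCENCE = {"a": "na", "i": "ne", "u": "no", "o": "no"}


def zulu_and(first: str, second: str) -> str:
    """Join two nouns with the conjunction glued onto the second noun."""
    prefix = ZULU_COALESCENCE.get(second[0].lower())
    if prefix is None:
        # Fallback for input the table does not cover.
        return f"{first} na {second}"
    return f"{first} {prefix}{second[1:]}"


def english_and(first: str, second: str) -> str:
    """English needs no lexicon and no morphology for the same job."""
    return f"{first} and {second}"


print(spanish_definite("pizza"))        # la pizza
print(zulu_and("inja", "ikati"))        # inja nekati
print(english_and("dog", "cat"))        # dog and cat
```

Even this toy version shows the asymmetry: the English function is a one-liner, whereas the other two already need a dictionary or a morphological rule before the first sentence can be generated.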

But let us limit ourselves to English for the moment. Then it is still not trivial. There is a paper comparing the different OWL verbalizers [2], such as Rabbit (ROO) and ACE, which considers issues like how to map, e.g., an AllValuesFrom to “Each…”, “Every…” etc. This is an orthogonal issue to the multi-lingual aspects, and I don’t know how that affects the user’s understanding of the sentences.

I had another look at ACE, as ACE also has a web-interface that accepts OWL/XML files (i.e., OWL 2). I tried it out with the Pizza tutorial ontology, and it generated many intelligible sentences. However, there were also phrases like (i) “Everything that is hasTopping by a Mushroom is something that is a MozzarellaTopping or that is a MushroomTopping or that is a TomatoTopping.”, the (ii) “Every American is a NamedPizza” mentioned in the title of this post, and then there are things like (iii) “Every DomainConcept that is America or that is England or that is France or that is Germany or that is Italy is a Country”. Example (iii) is not a problem of the verbalizer, but merely an instance of garbage in, garbage out (GIGO), and the ontology should be corrected.

Examples (i) and (ii) exhibit other problems, though. Regarding (ii), I have noticed that when (novice) ontologists use an ontology development tool, it is not uncommon for them to leave an entity name incomplete, probably because it is easy for a human reader to fill in the rest from the context; in this case, American is not an adjective applied to people, but relates to pizza. A more precise name could have avoided such issues (AmericanPizza), or a new solution to ‘context’ can be devised. The weird “is hasTopping by” is due, I think, to the lexicalization of OWL’s ObjectPropertyRange in ACE, which takes the object property, assumes that to be in the infinitive and then puts it in the past participle form (see the Web-ACE page, section 4). So, if the Pizza Ontology developers had chosen not hasTopping but, say, the verb ‘top’, ACE would have changed it into ‘is topped by’. In principle the rule makes sense, but it can be thwarted by the names used in the ontology.
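The effect of that rule can be imitated in a few lines of Python. This is only a sketch of the behaviour as described above, not ACE’s actual code: the `PARTICIPLES` lexicon and the function name `passive_phrase` are hypothetical, and I assume that a name which is not a known verb is simply passed through unchanged.

```python
# Hypothetical lexicon mapping infinitives to their past participles.
PARTICIPLES = {"top": "topped", "compose": "composed"}


def passive_phrase(property_name: str) -> str:
    """Build 'is <participle> by'; names missing from the lexicon are used verbatim."""
    participle = PARTICIPLES.get(property_name, property_name)
    return f"is {participle} by"


print(passive_phrase("top"))         # is topped by
print(passive_phrase("hasTopping"))  # is hasTopping by
```

The rule itself is fine for ‘top’; it is the assumption that every object property name is a bare transitive verb that breaks down on a name like hasTopping.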

Fliedl and co-authors [3] are trying to resolve just such issues. They propose a rigid naming convention to make it easier to verbalize the ontology. I do not think it is a good proposal, because it ‘blames’ the ontologists for the failings of natural language generation (NLG) systems, and syntactic sugar (verbalization) should not be the guiding principle when adding knowledge to the ontology. Besides, it is not that difficult to add another rule or two to cater for variations, which is probably what will be needed in the near future anyway once ontology reuse and partial imports become more commonplace in ontology engineering.

Power and Third [4] readily admit that verbalizing OWL is “dubious in theory”, but they provide data that it may be “feasible in practice”. The basis of their conclusion lies in the data analysis of about 200 ontologies, which shows that the ‘problematic’ cases seldom arise. For instance, OWL’s SubClassOf takes two class expressions, but in practice it is only used in the format of SubClassOf(C CE) or SubClassOf(C C), and likewise for EquivalentClasses (I think that is probably due to Protégé’s interface), which makes the verbalization easier. They did not actually build a verbalizer, but the tables on page 1011 can help decide what to focus on first; e.g., out of the 633,791 axioms, there were only 12 SubDataPropertyOf assertions, whereas SubClassOf(Class,Class) appeared 297,293 times (46.9% of the total) and SubClassOf(Class,ObjectSomeValuesFrom(ObjectProperty,Class)) 158,519 times (25.0%). Why this distribution is the way it is, is another topic.
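As a quick check on what ‘focus on first’ means in numbers, the figures quoted above can be turned into shares of the total. The snippet below uses only those three counts and the total from [4]; the variable and function names are mine.

```python
TOTAL_AXIOMS = 633_791

# Counts quoted above from the tables in [4].
counts = {
    "SubClassOf(Class,Class)": 297_293,
    "SubClassOf(Class,ObjectSomeValuesFrom(ObjectProperty,Class))": 158_519,
    "SubDataPropertyOf": 12,
}


def share(count: int, total: int = TOTAL_AXIOMS) -> float:
    """Percentage of all axioms, rounded to one decimal."""
    return round(100 * count / total, 1)


# Rank patterns by frequency to see which templates to write first.
ranked = sorted(counts, key=counts.get, reverse=True)
for pattern in ranked:
    print(f"{share(counts[pattern]):5.1f}%  {pattern}")

# Coverage obtained by handling only the two most frequent patterns.
top_two = sum(counts[p] for p in ranked[:2])
print(f"{share(top_two):5.1f}%  two most frequent patterns together")
```

So templates for just the two most frequent patterns would already cover about 71.9% of the axioms in that corpus, while SubDataPropertyOf rounds to 0.0%.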

Going back to the multi-lingual dimension, there is a general problem with OWL ontologies, which is, from a theoretical perspective, addressed more elegantly with OBO ontologies. In OBO, each class has an identifier and the name is just a label. So one could, in principle, amend this by adding labels for each natural language; e.g., have a class “PIZZA:12345” in the ontology with associated labels “tomato @en”, “pomodoro @it”, “utamatisi @zulu” and so forth, and when verbalizing it in one of those languages, the system picks the right label. Compare that to the present cumbersome and error-prone way of developing and maintaining an OWL file for each language. Admittedly, this has its limitations for terms and verbs that do not have a neat 1:1 translation, but a fully lexicalized ontology should be able to solve this (though it does not do so yet).
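In code, the idea amounts to a lookup keyed on identifier and language. The following Python sketch is an illustration only: the `LABELS` store, the function name, and the fallback-to-English behaviour are assumptions, and I use the ISO 639-1 code `zu` for isiZulu rather than the informal “@zulu” above.

```python
# Hypothetical label store: one identifier, one label per language tag.
LABELS = {
    "PIZZA:12345": {"en": "tomato", "it": "pomodoro", "zu": "utamatisi"},
}


def label_for(identifier: str, lang: str, fallback: str = "en") -> str:
    """Pick the label for the requested language, else the fallback, else the raw id."""
    labels = LABELS.get(identifier, {})
    return labels.get(lang, labels.get(fallback, identifier))


print(label_for("PIZZA:12345", "it"))  # pomodoro
print(label_for("PIZZA:12345", "es"))  # tomato (no Spanish label, falls back to English)
```

The logical theory stays in one file; only the label table grows when a language is added. What the lookup cannot do, of course, is handle the terms without a neat 1:1 translation.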

It is quite possible that I have missed some recent paper that addresses these issues. At some point, we will probably (have to) develop an isiZulu verbalization system, so anyone who has or knows of references that point to (partial) solutions is most welcome to add them in the comments section of the post.

References

[1] M. Jarrar, C.M. Keet, and P. Dongilli. Multilingual verbalization of ORM conceptual models and axiomatized ontologies. STARLab Technical Report, Vrije Universiteit Brussels, Belgium. February 2006.

[2] R. Schwitter, K. Kaljurand, A. Cregan, C. Dolbear, G. Hart. A comparison of three controlled natural languages for OWL 1.1. Proc. of OWLED 2008 DC. Washington, DC, USA, 1-2 April 2008.

[3] Fliedl, G., Kop, C., Vöhringer, J. Guideline based evaluation and verbalization of OWL class and property labels. Data & Knowledge Engineering, 2010, 69: 331-342.

[4] Power, R., Third, A. Expressing OWL axioms by English sentences: dubious in theory, feasible in practice. Coling 2010: Poster Volume, pages 1006–1013, Beijing, August 2010.

6 responses to “Every American is a NamedPizza”

  1. Some comments.

    To be absolutely precise, the paper [2] compares three languages: ACE, Rabbit, SOS, and discusses how these languages can be used to provide an alternative syntax for OWL. The paper does not really describe a tool (verbalizer) that converts from a standard OWL syntax into one of these languages. I’m aware of only one such tool, the OWL->ACE converter (http://attempto.ifi.uzh.ch/site/docs/owl_to_ace.html), that you also link to in your post.

    The OWL->ACE tool does not try to guess which naming convention the user followed when naming the entities of the ontology. Instead it assumes a certain naming convention. In short, names should be English words with standard orthography (e.g. CamelCase is only suitable for proper names such as ‘easyJet’), they don’t refer to the meta-level (such as ‘DomainConcept’ or ‘ValuePartition’), class names are singular common nouns, names of individuals are proper names, and names of object properties are transitive verbs. In case the input ontology contains annotation properties that map entity IRIs to literals then these literals are used to build sentences (and one can use the OBO style of numeric entity names), but in any case the input to the verbalizer is expected to be English words of the categories common noun, proper name, transitive verb, singular, plural, and past participle. The verbalizer does not check if these conventions are followed and does not reject any input on the grounds of incompatible naming.

    I think the naming convention that OWL->ACE expects is sensible, although one problem is that it doesn’t allow nouns (‘topping’, ‘spiciness’, ‘father’) as object property names, even though such relational nouns are often used in natural language to talk about relations.

    I don’t understand what you mean by “syntactic sugar (verbalization) should not be the guiding principle when adding knowledge to the ontology”. At least in the context of paper [2], conventions and languages are proposed that offer an alternative and more user-friendly OWL syntax (not a “syntactic sugar”), and a natural language verbalization of an ontology which increases the understandability _should_ be a guiding principle. For example, if the naming of classes inconsistently uses singular and plural forms (I’m assuming that such inconsistency is bad in any naming framework), then a natural language verbalization highlights it much better than the standard graphical class-tree-view.

    Regarding paper [4], I think the collected ontologies reflect mostly the current ontology editors which are able to present only structurally simple axioms in a user-friendly way, and focus on the frame rather than the axiom. If e.g. the rule editor of Protege 4 decides one day to save into OWL instead of SWRL (if possible) then we would see an increase in the number of “SubClassOf(CE, C)” patterns. Also, rather than be happy that 98% of the axioms pose no major difficulty for verbalization, we should focus on verbalizing the remaining structurally rich axioms (e.g. the ones that contain interaction of negation and property restrictions) for which graphical methods and Manchester Syntax are hard to follow, but which the users would still want to express.

    • Dear Kaarel,

      Thank you for taking the time to contribute your extensive comments.
      On the whole, as I mentioned, I think that ACE sets a high standard regarding verbalization that, indeed, has not been surpassed by any other verbalization tool for OWL (NORMA generates nice sentences for ORM though). Notwithstanding that, improvements are possible, which, as shown with the examples, appear to have more to do with handling (or not) different extant naming conventions and what is put in the ontology than with the technology for the verbalization itself, and the former two are basically out of your control.

      > “and a natural language verbalization of an ontology which increases the understandability _should_ be a guiding principle.”
      This is where we differ, and perhaps explains why you did not understand my sentence about it. The principal purpose of the ontology is the representation of the knowledge (our understanding of reality), not verbalizing the knowledge. Things like verbalization and graphical representations are ‘sugar’ on top of the logical theory so as to make it palatable for domain experts, and I am convinced those two things (the logics and the sugar coating) should be treated separately (ideally managed within one system, but as distinct features). Ontology can be informed by natural language and diagrams, but should not be constrained by it.

      As for your last paragraph: yes, and I agree.

      Kind regards,
      Maria

  2. One comment that refers to the contents of your last paragraph exemplified by the very last sentence

    > Ontology can be informed by natural language and diagrams, but
    > should not be constrained by it.

    with which I strongly disagree. Ontologies are created for domain experts who communicate their domain knowledge in natural language. Thus the knowledge that is OWL’ified originates in natural language and should adhere to the jargon of the domain and to the constructs of the respective natural language. As the pizza example shows, the authors of OWL ontologies often do not seem to be bound to these constraints, which makes subsequent paraphrasing awkward or even ridiculous, as your examples more than clearly show. We advise the users of ACE to start writing ontologies first in ACE and then use Kaarel’s ACE -> OWL translator to OWL’ify them. This eliminates any problems with the later OWL -> ACE translation. Of course, problems occur when users import ontologies that were authored in OWL without any respect for the English syntax.

    Regards.

    — nef

  3. Dear Norbert,
    I am aware there are strong opinions about it, coming both from the ontologist community and from domain experts; the argument is a different one for each. Regarding the former, I take a shortcut on the argument with the example about mereological and meronymic part-whole relations that I assume you are aware of (which was my main motivation for writing that sentence).
    Regarding the latter, it perhaps comes to you as a surprise that in disciplines like cell biology, there is a tremendous amount of knowledge represented in the figures (that, as a biologist, one learns to read and see more in than an untrained person does), principally captured there and merely approximated in the text (not infrequently only partially). Non-life scientists tend to differ on this point, still claiming it is natural language. I’ve never seen a conclusive point other than that people agree to disagree about it (though I think you will agree with me that there are people who are more visually oriented than linguistically, and vice versa).

    Sure, a naming scheme can help avoid verbalization issues and simplify implementations of the verbalization as well as formalization. But the main problem I see with that is that for it to work properly, the whole ontology development world has to use that single naming scheme and one single verbalization system adjusted to that particular naming scheme. That is probably too rigid, hence unrealistic to expect, in the open area of ontologies.
    I don’t have a solution to it though, other than implementing the rather sub-optimal option of doing it the hard way by incorporating different verbalization patterns/schemes/rules so that the system can figure out which one to use when for which axiom.

    regards,
    Maria

  4. The weird translation of “is hasTopping by” also raises the question whether it is useful to have a relation “hasTopping” at all. “hasProperPart” would do the job equally well (as long as we don’t want to combine cardinality restrictions with transitive properties, but for this one could introduce a non-transitive, but non-pizza-specific relation such as hasComponent).

    • Indeed it does, Stefan. But “hasProperPart” won’t fix it either: we would obtain a verbalization “is hasProperPart by”, which does not sound nice either, and, on top of that, introduce the problem of transitivity with cardinality restrictions for the formal representation of the ontology. ACE, and any other verbalization system I am aware of, wants a verb as label for an object property, like “top” or “composes”, not a verb appended with something else (be this “-Topping”, “-ProperPart”, or “-Component”).

      One nice feature of the verbalizations (though normally not the intention) is that they help highlight ‘peculiar’ axioms in an ontology that can be improved upon from a viewpoint of ontology. So, I’m still looking for a solution.
