Roles: how are they used in modelling?

How are roles, also called association ends, used in conceptual data models? It is a question that I pondered some five years ago, with the aim of improving their use for more precise modelling and adding more ontological principles to it. VerbNet seemed to fit right in there to contribute to, if not be the, solution: a knowledge base about verbs and verb classes, the kind of things that participate in the action represented by the verb and the roles they play in it. But I got stuck. It didn’t add up as I thought it would, and then sabbatical time was up, and other work took over. I dusted it off over half a year ago to give it a try to ‘un-stuck’ it, since the topic resurfaced as part of the ‘abstract representation’ for the Abstract Wikipedia project. After additional analysis, I got to a better understanding of the problem [Keet23], but no concrete usable solution so far. Yet, the new insights that the analysis resulted in were already enough for the work to be accepted at the 13th International Conference on Formal Ontology in Information Systems (FOIS2023) that will be held in July in Sherbrooke, Canada and in September online.

Before I can even start to clarify it informally in this post, we first need to disambiguate which roles I’m referring to. The term ‘role’ is used for many things, including the social roles people play (e.g., student), roles (/positions/argument places) as part of a relation in an ontology of relations, the roles as components of fact types in Object-Role Modeling, roles that are mostly binary relationships in Description Logics, or roles in linguistics, like the agent and undergoer roles in verbs and subject and object roles in a sentence. The element under investigation in the paper is the role as part of a relation(ship), committing to positionalism as a (to be refined) theory for the ontology of relations [Fine00,Leo08,Orilia11]. In conceptual data modelling, they are called roles, association ends, or relationship components; in linguistics, from what it seemed initially at least, semantic or thematic roles. Both fields deem roles particularly useful for a wide range of reasons (regardless of whether philosophers like that extra piece of fundamental furniture of the universe), such as using them in constraint declarations, adding annotations, or improving parsing of text.

Here’s a plausible-looking small ORM diagram about animals living in geographic regions on the left-hand side of the image:

(source: [Keet23])

It’s neat and tidy and assists with disentangling key elements like the reading labels, roles with role names, and a name for the relation that shows up in the behind-the-scenes formalisation. The role name [inhabitant] sounds like a social role and [location] sounds like a thematic role; conversely, [location] is definitely not a social role and [inhabitant] is surely not a thematic role. Still, the names are perfectly fine for an ORM diagram. Ontologically, and with ontology-driven conceptual data modelling in mind, it’s a bit murky.

Naming the roles may not happen very often in either ORM, UML class diagrams, or EER, but maybe something can be learned from it nonetheless and be used for Abstract Wikipedia’s abstract representation. Likewise, we may be able to pick up something from VerbNet’s thematic roles [Palmer17] and conjure up a fine cocktail for a solution that is ontology-informed, or else contribute to the further development of an ontology of relations. This led to two key questions to start tackling the issue:

  1. How are roles used in conceptual data models? Are they named and if so, how, and do they map usefully into semantic or thematic roles as specified in linguistic resources or ontologies? Can this modus operandi be copied over to Abstract Wikipedia’s abstract representation?
  2. To what extent do those verb classes with their roles and fillers in the authoritative linguistic resource VerbNet adhere to ontological principles? Can that be improved upon further, using basic modelling guidance from ontology development and without the need for major theoretical ‘overhead’ for an end user writing Abstract Wikipedia’s constructors?

Instead of a purely theoretical analysis, I updated that 5-year-old VerbNet hands-in-the-mud data-based analysis with the updates that VerbNet had made (neither their set of roles nor their verb class characterisations remained static over the years), and I revisited the manual analysis of roles in a corpus of 101 conceptual data models from 2018 that extended an earlier data analysis reported in [KeetF15]. In those conceptual data models, only about half of the roles are named in the UML class diagrams, where it is actually mandatory to do so. That still scores much better than the EER and ORM diagrams, where less than 10% of the roles in the models in the corpus are named. When roles are named, they mostly are of the type of so-called ‘deep’, or subject domain-specific, roles, which also may be called ontological roles (even if a particular role name may not be the most suitable ontologically). They are roles with names in the examined UML, EER, and ORM models such as [participant], [member], [work (for)], [manage], [parent], [client], [physician], and [upperValue]. Ideally, there would be some modelling guidance and quality control for them eventually.

For VerbNet, and with an eye on the possibility that relationships might be defined by their roles and participants, I coded up their XML-based specification of 5 selected verb classes and their subclasses in a test ontology and ran the reasoner over it. A few equivalences were deduced, such as Deprive-10.6.2 and Cheat-10.6.1; or: at least the information in VerbNet is not enough to distinguish verb classes that way. But the deductions may help evaluate those modelling decisions of VerbNet to seek areas for refinement of the representation. Also, 36 subsumptions were deduced, even across major verb classes, such as Fire-10.10 being deduced to be a subclass of Hire-13.5.3, which also point to possible areas for improvement.
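To give a feel for why such equivalences and subsumptions pop up, here is a minimal sketch in Python rather than in an ontology with a reasoner. It assumes, as a simplification, that a verb class is defined by nothing more than a set of (role, filler) pairs. The class names and the role fillers below are hypothetical toy data, not VerbNet’s actual specifications, and a real DL reasoner does considerably more (e.g., it also takes the hierarchy of fillers into account).

```python
from itertools import combinations, permutations

# Hypothetical toy data: each verb class is defined only by a set of
# (role, filler) pairs. These are NOT VerbNet's actual specifications.
VERB_CLASSES = {
    "class_a": frozenset({("Agent", "Animate"), ("Theme", "Concrete")}),
    "class_b": frozenset({("Agent", "Animate"), ("Theme", "Concrete")}),
    "class_c": frozenset({("Agent", "Animate"), ("Theme", "Concrete"),
                          ("Source", "Location")}),
}


def equivalent_pairs(classes):
    """Pairs of classes whose definitions are identical."""
    return sorted(
        (a, b)
        for a, b in combinations(sorted(classes), 2)
        if classes[a] == classes[b]
    )


def subsumptions(classes):
    """(sub, sup) pairs: sub has all constraints of sup plus at least one more."""
    return sorted(
        (sub, sup)
        for sub, sup in permutations(classes, 2)
        if classes[sup] < classes[sub]  # strict subset of constraints
    )


if __name__ == "__main__":
    print(equivalent_pairs(VERB_CLASSES))
    print(subsumptions(VERB_CLASSES))
```

If two classes are indistinguishable on their roles and fillers alone, they collapse into one; if one class merely adds a constraint, it ends up as a subclass, whether or not that was intended.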

Digging deeper, it was clear that there are a few infelicities in both VerbNet’s thematic role hierarchy and in the specification of the role players; the paper motivates improvements on both. For the hierarchy, I remodelled the multiple inheritances to a single inheritance hierarchy, but I kept the ontologically awkward terms to keep backward compatibility (there’s room for improvement there). Redesigned, it looks like this:

(source: [Keet23]) (and please consult their respective descriptions before jumping to conclusions–these are VerbNet’s terms still.)

For the role players, I separated ontological categories from the grammatical features of the words we use to describe them and used DOLCE categories to indicate the category of the entity that can participate in the relations referred to by that verb class (also this can be refined further).

Consider, for instance, the verb class of knead. VerbNet has it that the agent role in knead can be played by ‘Animate or Machine’, with as example that a human can be kneading the dough or the bread machine can do that (and cats knead, too). Animacy is a semantic feature (in linguistics) and therewith a grammatical feature, whereas machine is a physical object in the real world. But it ought to be a union of kinds that are at the same level of analysis, not a mix of ontology and linguistics. So, then either we’d have, e.g., ‘Physical object’ in the sense of a foundational ontology such as DOLCE, comprising both the animals who knead and the kneading machines, or ‘Animate or Inanimate’ as linguistic constraints on the role players of agent in knead. They each deserve their own framework to deal with it: the relations with their ontological roles and participants on the one hand, and the verbs with their linguistic roles and features of words on the other. For conceptual modelling and an ontology of relations, one would be more interested in the former; for natural language generation, the latter will also be useful.
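The separation for knead can be sketched as two distinct kinds of constraint on the same role. This is only an illustration: the class names, the `mixes_levels` helper, and the two small vocabularies are my assumptions for the sake of the example, not structures from VerbNet, DOLCE, or the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OntologicalConstraint:
    """Which category of entity may play the role (foundational ontology level)."""
    role: str
    categories: frozenset


@dataclass(frozen=True)
class LinguisticConstraint:
    """Which features the word filling the role may have (linguistic level)."""
    role: str
    features: frozenset


# The two alternatives for the agent role of knead, kept in separate frameworks.
KNEAD_AGENT_ONTOLOGICAL = OntologicalConstraint("Agent", frozenset({"Physical object"}))
KNEAD_AGENT_LINGUISTIC = LinguisticConstraint("Agent", frozenset({"Animate", "Inanimate"}))

# Illustrative vocabularies only (assumption): a real check would consult
# the ontology and the linguistic resource themselves.
ONTOLOGICAL_TERMS = frozenset({"Physical object", "Machine"})
LINGUISTIC_TERMS = frozenset({"Animate", "Inanimate"})


def mixes_levels(fillers):
    """True if a set of fillers draws on both vocabularies at once."""
    fillers = frozenset(fillers)
    return bool(fillers & ONTOLOGICAL_TERMS) and bool(fillers & LINGUISTIC_TERMS)


if __name__ == "__main__":
    print(mixes_levels({"Animate", "Machine"}))  # the mixed 'Animate or Machine' filler
    print(mixes_levels(KNEAD_AGENT_LINGUISTIC.features))
```

A check like `mixes_levels` would flag ‘Animate or Machine’ but neither of the two cleaned-up alternatives.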

To squeeze the related work and all the analysis into a mere 15 pages was not easy and some details have been left out for readability; there’s more in the supplementary material as well. Does that contain any concrete new frameworks rolling out of all this? Not yet, but, with the conceptual muddles cleared up, this should be doable to specify as a next step. Or: TBC…

I’ll present the paper as part of the FOIS2023 online sessions in September, but I will still attend FOIS2023 thanks to joining ISAO 2023 as a facilitator (and to present another paper at FOIS2023), so if you have any questions or comments, please feel free to email or, even better: let’s meet up while I’m there next month!

References

[Fine00] Fine K. Neutral Relations. The Philosophical Review. 2000;109(1):1-33.

[Keet23] Keet, C.M. An analysis of positionalism’s roles in use. 13th International Conference on Formal Ontology in Information Systems 2023 (FOIS’23). IOS Press, FAIA vol. xxx, xx-xx. 18-20 July Sherbrooke, Canada / Sept online. (in print)

[KeetF15] Keet, C.M., Fillottrani, P.R. An analysis and characterisation of publicly available conceptual models. 34th International Conference on Conceptual Modeling (ER’15). Johannesson, P., Lee, M.L. Liddle, S.W., Opdahl, A.L., Pastor Lopez, O. (Eds.). Springer LNCS vol 9381, 585-593. 19-22 Oct, Stockholm, Sweden.

[Leo08] Leo J. Modeling relations. Journal of Philosophical Logic. 2008;37:353-85.

[Orilia11] Orilia F. Relational Order and Onto-Thematic Roles. Metaphysica. 2011;12:1-18.

[Palmer17] Palmer M, Bonial C, Hwang JD. VerbNet: Capturing English verb behavior, meaning and usage. In: Chipman SEF, ed. The Oxford Handbook of Cognitive Science. OUP. 2017. pp315-336.

Relations with roles / verbalising object properties in isiZulu

The narratives can be very different for the paper “A model for verbalising relations with roles in multiple languages”, which was recently accepted at the 20th International Conference on Knowledge Engineering and Knowledge Management (EKAW’16), for the paper makes a nice smoothie of the three ingredients of language, logic, and ontology. The natural language part zooms in on isiZulu as use case (possibly losing some ontologist or logician readers), then there are the logics about mapping the Description Logic DLR’s role components with OWL (losing the possible interest of the natural language researchers), and a bit of philosophy (and losing most people…). It solves some thorny issues when trying to verbalise complicated verbs that we need for knowledge-to-text natural language generation in isiZulu and some other languages (e.g., German). And it solves the matching of logic-based representations popularised in mainly UML and ORM (which typically use a logic in the DLR family of Description Logic languages) with the more commonly used OWL. The latter is even implemented as a Protégé plugin.

Let me start with some use cases that cause problems that need to be solved. It is well-known that natural language renderings of ontologies facilitate communication with domain experts who are expected to model and validate the represented knowledge. This is doable for English, with ACE in the lead, but it isn’t for grammatically richer languages. There, there are complications, such as conjugation of verbs, an article that may be dependent on the preposition, or a preposition that may modify the noun. For instance, works for, made by, located in, and is part of are quite common names for object properties in ontologies. They all have a dependent preposition; however, there are different verb tenses, and the last one has a copulative and noun rather than just a verb. All that goes into the object property’s name in an ‘English-based ontology’ and does not really have to be processed further in ontology verbalisation other than beautification. Not so in multiple other languages. For instance, the ‘in’ of located in ends up as affixes to the noun representing the object that the other object is located in. Like, imvilophu ‘envelope’ and emvilophini ‘in the envelope’ (the locative shows in the e- at the start and the -ini at the end). Even something straightforward like a property eats can end up having to be conjugated differently depending on who’s eating: when a human eats, it is udla in isiZulu, but for, say, a dog, it is idla (the modification is in the first letter), which is driven by the system of noun classes, of which there are 17 in isiZulu. Many more examples illustrating different issues are described in the paper. To make a long story short, there are gradations in complicating effects, from no effect where a preposition can be squeezed in with the verb in naming an object property, to phonological conditioning, to modifying the article of the noun, to modifying the noun. A ‘3rd pers. sg.’ may thus be context-dependent, and notions of prepositions may modify the verb or the noun or the article of the noun, or both.
For a setting other than English ontologies (e.g., Greek, German, Lithuanian), a preposition may belong neither to the verb nor to the noun, but instead to the role that the object plays in the relation described by the verb in the sentence. For instance, one obtains yomuntu, rather than the basic noun umuntu, if it plays the role of the whole in a part-whole relation like in ‘heart is part of a human’ (inhliziyo iyingxenye yomuntu).
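The udla versus idla example above can be sketched as a lookup on the noun class of the subject. This is a minimal illustration only: it covers just two of the noun classes, the noun inja ‘dog’ and the class numbers are filled in by me rather than taken from the paper, and a real verbaliser has to deal with all the classes and the phonological conditioning mentioned above.

```python
# Subject concords for just two isiZulu noun classes (illustrative subset).
SUBJECT_CONCORD = {
    1: "u",  # e.g., umuntu 'human'
    9: "i",  # e.g., inja 'dog'
}

# Tiny hypothetical lexicon mapping a noun to its noun class.
NOUN_CLASS = {
    "umuntu": 1,
    "inja": 9,
}


def conjugate(subject_noun, verb_stem):
    """Prefix the verb stem with the subject concord of the noun's class."""
    noun_class = NOUN_CLASS[subject_noun]
    return SUBJECT_CONCORD[noun_class] + verb_stem


if __name__ == "__main__":
    print(conjugate("umuntu", "dla"))  # udla
    print(conjugate("inja", "dla"))    # idla
```

The point is that the string for ‘eats’ cannot be stored with the object property once and for all; it has to be computed from the noun that plays the subject.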

The question then becomes how to handle a representation that also has to include roles. Including roles is quite common in conceptual data modelling languages and in the DLR family of DL languages, a stance that is known in ontology as positionalism [2]. Bumping up the role to an element in the representation language (thus, in addition to the relationship) enables one to attach information to it, like whether there is a (deep) preposition associated with it, the tense, or the case. Such role-based annotations can then be used to generate the right element, like einen Betrieb ‘some company’ to adjust the article for the case it goes with in German, or ya+umuntu=yomuntu ‘of a human’, modifying the noun in the object position in the sentence.
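As a rough sketch of what such a role-based annotation buys you, consider the ya+umuntu=yomuntu case. The `Role` class and its field names below are hypothetical and not the model from the paper; the prefix is simply hard-coded as ‘ya’ here, whereas it would really have to be selected to agree with the noun for the part; and only the vowel coalescence with a is handled.

```python
from dataclasses import dataclass

# Vowel coalescence when a prefix ending in 'a' meets a noun-initial vowel.
COALESCENCE = {
    ("a", "a"): "a",
    ("a", "i"): "e",
    ("a", "u"): "o",
}


@dataclass(frozen=True)
class Role:
    """A role as a first-class element, carrying linguistic annotations."""
    name: str
    relationship: str
    prefix: str = ""  # hypothetical annotation: element to attach to the noun


def attach(prefix, noun):
    """Attach the prefix to the noun, merging the vowels where they meet."""
    if not prefix:
        return noun
    merged = COALESCENCE.get((prefix[-1], noun[0]))
    if merged is None:
        return prefix + noun
    return prefix[:-1] + merged + noun[1:]


def render_player(role, noun):
    """Render the noun as it appears when it plays the given role."""
    return attach(role.prefix, noun)


if __name__ == "__main__":
    whole = Role(name="whole", relationship="part-whole", prefix="ya")
    print(render_player(whole, "umuntu"))  # yomuntu
```

Because the annotation sits on the role rather than on the verb or the noun, the same noun umuntu is left untouched when it plays a role without such an annotation.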

To get this working properly, with a solid theoretical foundation, we reused a part of the conceptual modelling languages’ metamodel [3] to create a language model for such annotations, in particular regarding the attributes of the classes in the metamodel. On its own, however, it is rather isolated and not immediately useful for the ontologies that we set out to verbalise. To this end, it links to the ‘OWL way of representing relations’ (ontologically: the so-called standard view), and we separate out the logic-based representation from the readings that one can generate with the structured representation of the knowledge. All in all, the simplified high-level model looks like the picture below.

Simplified diagram in UML Class Diagram notation of the main components (see paper for attributes), linking a section of the metamodel (orange; positionalist commitment) to predicates (green; standard view) and their verbalisation (yellow). (Source: [1])

That much for the conceptual part; more details are described in the paper.

Just a fluffy colourful diagram isn’t enough for a solid implementation, however. To this end, we mapped one of the logics that adhere to positionalism to one of the standard view, being DLR [4] and OWL, respectively. It equally well could have been done for other pairs of languages (e.g., with Common Logic), but these two are more popular in terms of theory and tools.

Having the conceptual and logical foundations in place, we did implement it to see whether it actually can be done and to check whether the theory was sufficient. The Protégé plugin is called iMPALA (it could be an abbreviation for ‘Model for Positionalism And Language Annotation’); it both writes all the non-OWL annotations in a separate XML file and takes care of the renderings in Protégé. It works; yay. Specifically, it handles the interaction between the OWL file, the positionalist elements, and the annotations/attributes, plus the additional feature that one can add new linguistic annotation properties, so as to cater for extensibility. Here are a few screenshots:

OWL’s arbeitetFuer ‘works for’ is linked to the relationship arbeiten.

The prey role in the axiom of the impala being eaten by the ibhubesi.

 Annotations of the prey role itself, which is a role in the relationship ukudla.

We did test it a bit, from just the regular feature testing to the African Wildlife ontology that was translated into isiZulu (spoken in South Africa) and a people and pets ontology in ciShona (spoken in Zimbabwe). These details are available in the online supplementary material.

The next step is to tie it all together, being the verbalisation patterns for isiZulu [5,6] and the OWL ontologies to generate full sentences, correctly. This is set to happen soon (provided all the protests don’t mess up the planning too much). If you want to know more details that are not, or not clearly, in the paper, then please have a look at the project page of A Grammar engine for Nguni natural language interfaces (GeNi), or come visit EKAW16 that will be held from 21-23 November in Bologna, Italy, where I will present the paper.

 

References

[1] Keet, C.M., Chirema, T. A model for verbalising relations with roles in multiple languages. 20th International Conference on Knowledge Engineering and Knowledge Management (EKAW’16). Springer LNAI, 19-23 November 2016, Bologna, Italy. (in print)

[2] Leo, J. Modeling relations. Journal of Philosophical Logic, 2008, 37:353-385.

[3] Keet, C.M., Fillottrani, P.R. An ontology-driven unifying metamodel of UML Class Diagrams, EER, and ORM2. Data & Knowledge Engineering, 2015, 98:30-53.

[4] Calvanese, D., De Giacomo, G. The Description Logics Handbook: Theory, Implementation and Applications, chap. Expressive description logics, pp. 178-218. Cambridge University Press (2003).

[5] Keet, C.M., Khumalo, L. Toward a knowledge-to-text controlled natural language of isiZulu. Language Resources and Evaluation, 2016, in print.

[6] Keet, C.M., Khumalo, L. On the verbalization patterns of part-whole relations in isiZulu. Proceedings of the 9th International Natural Language Generation conference 2016 (INLG’16), Edinburgh, Scotland, Sept 2016. ACL, 174-183.