Toward a framework for resolving conflicts in ontologies (with COVID-19 examples)

Among the many tasks involved in developing an ontologies, are deciding what part of the subject domain to include, and how. This may involve selecting a foundational ontology, reuse of related domain ontologies, and more detailed decisions for ontology authoring for specific axioms and design patterns. A recent example of reuse is that of the Infectious Diseases Ontology for schistosomiasis knowledge [1], but even before reuse, one may have to assess differences among ontologies, as Haendel et al did for disease ontologies [2]. Put differently, even before throwing alignment tools at them or selecting one with an import statement and hope for the best, issues may arise. For instance, two relevant domain ontologies may have been aligned to different foundational ontologies, a partOf relation could be set to be transitive in one ontology but is also used in a qualified cardinality constraint in the other (so then one cannot use an OWL 2 DL reasoner anymore when the ontologies are combined), something like Infection may be represented as a class in one ontology but as a property infectedby in another, or the ontologies differ on the science, like whether Virus is an organism or an inanimate object.

What to do then?

Upfront, it helps to be cognizant of the different types of conflict that may arise, and understand what their causes are. Then one would want to be able to find those automatically. And, most importantly, get some assistance in how to resolve them; if possible, also even preventing conflicts from happening in the first place. This is what Rolf Grütter, from the Swiss Federal Research Institute WSL, and I have been working since he visited UCT last year. The first results have been accepted for the International Conference on Biomedical Ontologies (ICBO) 2020, which are described in a paper entitled “Towards a Framework for Meaning Negotiation and Conflict Resolution in Ontology Authoring” [3]. A sample scenario of the process is illustrated informally in the following figure.

Summary of a sample scenario of detecting and resolving conflicts, illustrated with an ontology reuse scenario where Onto2 will be imported into Onto1. (source: [3])

The paper first defines and illustrates the notions of meaning negotiation and conflict resolution and summarises their main causes, to then go into some detail of the various categories of conflicts and ways how to resolve them. The detection and resolution is assisted by the notion of a conflict set, which is a data structure that stores the details for further processing.

It was tested with a use case of an epizootic disease outbreak in the Lemanic Arc in Switzerland in 2006, due to H5N1 (avian influenza): an administrative ontology had to be merged with one about the epidemiology for infected birds and surveillance zones. With that use case in place already well before the spread of SARS-CoV-2 that caused the current pandemic, it was a small step to add a few examples to the paper about COVID-19. This was made possible thanks to recently developed relevant ontologies that were made available, including for COVID-19 specifically. Let’s highlight the examples here, also so that I can write a bit more about it than the terse text in the paper, since there are no page limits for a blog post.

Example 1: OWL profile violations

Medical terminologies tend to veer toward being represented in an ontology language that is less or equal to OWL 2 EL: this permits scalability, compatibility with typical OBO Foundry ontologies, as well as fitting with the popular SNOMED CT. As one may expect, there have been efforts in ontology development with content relevant for the current pandemic; e.g., the Coronavirus Infectious Disease Ontology (CIDO) [4]. The CIDO is not in OWL 2 EL, however: it has a class expressions with a universal quantifier (ObjectAllValuesFrom) on the right-hand side; specifically (in DL notation): ‘Yale New Haven Hospital SARS-CoV-2 assay’ \sqsubseteq \forall ‘EUA-authorized use at’.’FDA EUA-authorized organization’ or, in the Protégé interface:

(codes: CIDO_0000020, CIDO_0000024, and CIDO_0000031, respectively). It also imported many ontologies and either used them to cause some profile violations or the violations came with them, such as by having used the union operator (‘or’) in the following axiom for therapeutic vaccine function (VO_0000562):

How did I find that? Most certainly NOT by manually browsing through the more than 70000 axioms of the CIDO (including imports) to find the needle in the haystack. Instead, I burned the proverbial haystack to easily get the needles. In this case, the burning was done with the OWL Classifier, which automatically computes which axioms violate any of the OWL species, and lists them accordingly. Here are two examples, illustrating an OWL 2 EL violation (that aforementioned universal quantification) and an OWL 2 QL violation (a property chain with entities from BFO and RO); you can do likewise for OWL 2 RL violations.

Following the scenario with the assumption that the CIDO would have to stay in the OWL 2 EL profile, then it is easy to find the conflicting axioms and act accordingly, i.e., remove them. (It also indicates something did not go well with importing the NDF-RT.owl into the cido-base.owl, but that as an aside for this example.)

Example 2: Modelling issues: same idea, different elements

Let’s take the CIDO again and now also the COviD Ontology for cases and patient information (CODO), which have some overlapping and complementary information, so perhaps could be merged. A not unimportant thing is the test for SARS-CoV-2 and its outcome. CODO has a ‘laboratory test finding’ \equiv {positive, pending, negative}, i.e., the possible outcomes of the test are individuals made into a class using the ObjectOneOf constructor. Consulting CIDO for the test outcomes, it has a class ‘COVID-19 diagnosis’ with three subclasses: Negative, Positive, and Presumptive positive. Aside from the inexact matches of the test status that won’t simplify data integration efforts, this is an example of class vs. instance modeling of what is ontologically the same thing. Resolving this in any merging attempt means that either

  1. the CODO has to change and bump up the test results from individuals to classes, or
  2. the CIDO has to change the subclasses to individuals in the ABox, or
  3. take an ‘outside option’ and represent it in yet a different way where both the CODO and the CIDO have to modify the ontology (e.g., take a conceptual data modeling approach by making the test outcome an attribute with a few possible values).

The paper provides an attempt to systematize such type of conflicts toward a library of common types of conflict, so that it should become easier to find them, and offers steps toward a proper framework to manage all that, which assisted with devising generic approaches to resolution of conflicts. We already have done more to realize all that (which could not all be squeezed into the 12 pages), but more is still to be done, so stay tuned.

Since COVID-19 is still doing the rounds and the international borders of South Africa are still closed (with a lockdown for some 5 months already), I can’t end the blog post with the usual ‘I hope to see you at ICBO 2020 in Bolzano in September’—well, not in the common sense understanding at least. Hopefully next year then.

 

References

[1] Cisse PA, Camara G, Dembele JM, Lo M. An Ontological Model for the Annotation of Infectious Disease Simulation Models. In: Bassioni G, Kebe CMF, Gueye A, Ndiaye A, editors. Innovations and Interdisciplinary Solutions for Underserved Areas. Springer LNICST, vol. 296, 82–91. 2019.

[2] Haendel MA, McMurry JA, Relevo R, Mungall CJ, Robinson PN, Chute CG. A Census of Disease Ontologies. Annual Review of Biomedical Data Science, 2018, 1:305–331.

[3] Grütter R, Keet CM. Towards a Framework for Meaning Negotiation and Conflict Resolution in Ontology Authoring. 11th International Conference on Biomedical Ontologies (ICBO’20), 16-19 Sept 2020, Bolzano, Italy. CEUR-WS (in print).

[4] He Y, Yu H, Ong E, Wang Y, Liu Y, Huffman A, Huang H, Beverley J, Hur J, Yang X, Chen L, Omenn GS, Athey B, Smith B. CIDO, a community-based ontology for coronavirus disease knowledge and data integration, sharing, and analysis. Scientific Data, 2020, 7:181.

Advertisement

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

This site uses Akismet to reduce spam. Learn how your comment data is processed.