Bias in ontologies?

Bias in models in the area of Machine Learning and Deep Learning are well known. They feature in the news regularly with catchy headlines and there are longer, more in-depth, reports as well, such as the Excavating AI by Crawford and Paglen and the book Weapons of Math Destruction by O’Neil (with many positive reviews). What about other types of ‘models’, like those that are not built in a data-driven bottom-up way from datasets that happen to lie around for the taking, but that are built by humans? Within Artificial Intelligence still, there are, notably, ontologies. I searched for papers about bias in ontologies, but could find only one vision paper with an anecdote for knowledge graphs [1], one attempt toward a framework but looking at FOAF only [2], which is stretching it a little for what passes as an ontology, and then stretching it even further, there’s an old one of mine on bias in relation to conceptual data models for databases [3].

We simply don’t have bias in ontologies? That sounds a bit optimistic since it’s pervasive elsewhere, and at least worthy of examination whether there is such notion as bias in ontologies and if so, what the sources of that may be. And, if one wants to dig deeper, since Ontology: what is bias anyhow? The popular media is much more liberal in the use of the term ‘bias’ than scientific literature and I’m not going to answer that last question here now. What I did do, is try to identify sources of bias in the context of ontologies and I took a relevant selection of Dimara et al’s list of 154 biases [4] (just like only a subset is relevant to their scope) to see whether they would apply to a set of existing ontologies in roughly the same domain.

The outcome of that exploratory analysis [5], in short, is: yes, there is such notion as bias in ontologies as well. First, I’ve identified 8 types of sources, described them, and illustrated them with hand-picked examples from extant ontologies. Second, I examined the three COVID-19 ontologies (CIDO, CODO, COVoc) on possible bias, and they exhibited different subsets indeed.

The sources can be philosophical, by purpose (commonly known as encoding bias), and ‘subject domain’ source, such as scientific theory, granularity, linguistic, social-cultural, political or religious, and economic motivations, and they may be explicit choices or implicit.

Table 1. Summary of typical possible biases in ontologies grouped by source, with an indication whether such biases would be explicit choices or whether they may creep in unintentionally and lead to implicit bias. (Source: [5])

An example of an economic motivation is to (try to) categorise some disorder as a type of disease: there latter gets more resources for medicines, research, treatments and is more costly for insurers who’s rather keep it out of the terminology altogether. Or modifying the properties of a disease or disorder in the classification in the medical ontology so that more people will be categorised as having the disorder even when they don’t. It has happened (see paper for details). Terrorism ontologies can provide ample material for political views to creep in.

Besides the hand-picked examples, I did assess the three COVID-19 ontologies in more detail. Not because I wanted to pick on them—I actually think it’s laudable they tried in trying times—but because they were developed in the same timeframe by three different groups in relative isolation from each other. I looked at both the sources, which can be argued to be present and identified some from a selection of Dimara et al’s list, such as the “mere exposure/familiarity” bias and “false consensus” bias (see table below). How they are present, is also described in that same paper, entitled “An exploration into cognitive bias in ontologies”, which has recently been accepted at the workshop on Cognition And OntologieS V (CAOS’21), which is part of the Joint Ontology Workshops Episode VII at the Bolzano Summer of Knowledge.

Table 2. Tentative presence of bias in the three COVID-19 ontologies, by cognitive bias; see paper for details.

Will it matter for automated reasoning when the ontologies are deployed in various information systems? For reasoning over the TBox only, perhaps not so much, or, at least, any inconsistencies that it would have caused should have been detected and discussed during the ontology development stage, rather.

Will it matter for, say, annotating data or literature etc? Some of it yes, for sure. For instance, COVoc has only ‘male’ in the vocabulary, not female (in line with a well-known issue in evidence-based medicine), so when it is used for the “scientific literature triage” they want to, then it’s going to be even harder to retrieve COVID-19 research papers in relation to women specifically. Similarly, when ontologies are used with data, such as for ontology-based data access, bias may have negative effects. Take as example CIDO’s optimism bias, where a ‘COVID-19 experimental drug in a clinical trial’ is a subclass of ‘COVID-19 drug’, and this ontology would be used for OBDA and data integration, as illustrated in the following use case scenario with actual data from the ClinicalTrials database and the FDA approved drugs database:

Figure 1. OBDI scenario with the CIDO, two database, and a query over the system that returns a logically correct but undesirable result due to some optimism that an experimental substance is already a drug.

The data together with the OBDA-enabled reasoner will return ‘hydroxychloroquine’, which is incorrect and the error is due to the biased and erroneous class subsumption declared in the ontology, not the data source itself.

Some peculiarities of content in an ontology may not be due to an underlying bias, but merely a case of ‘ran out of time’ rather than an act of omission due to a bias, for instance. Or it may not be an honest mistake due to bias but a mistake because of some other reason, such as due to having clicked erroneously on a wrong button in the tool’s interface, say, or having misunderstood the modelling language’s features. Disentangling the notion of bias from attendant ontology quality issues is one of the possible avenues of future work. One also can have a go at those lists and mini-taxonomies of cognitive biases and make a better or more comprehensive one, or to try to harmonise the multitude of definitions of what bias is exactly. Methods and supporting software may also assist ontology developers more concretely further down the line. Or: there seems to be enough to do yet.

Lastly, I still hope that I’ll be allowed to present the paper in person at the CAOS workshop, but it’s increasingly looking less and less likely, as our third wave doesn’t seem to want to quiet down and Italy is putting up more hurdles. If not, I’ll try to make a fancy video presentation.


[1] K. Janowicz, B. Yan, B. Regalia, R. Zhu, G. Mai, Debiasing knowledge graphs: Why female presidents are not like female popes, in: M. van Erp, M. Atre, V. Lopez, K. Srinivas, C. Fortuna (Eds.), Proceeding of ISWC 2018 Posters & Demonstrations, Industry and Blue Sky Ideas Tracks, volume 2180 of CEUR-WS, 2017.

[2] D. L. Gomes, T. H. Bragato Barros, The bias in ontologies: An analysis of the FOAF ontology, in: M. Lykke, T. Svarre, M. Skov, D. Martínez-Ávila (Eds.), Proceedings of the Sixteenth International ISKO Conference, Ergon-Verlag, 2020, pp. 236 – 244.

[3] Keet, C.M. Dirty wars, databases, and indices. Peace & Conflict Review, 2009, 4(1):75-78.

[4] E. Dimara, S. Franconeri, C. Plaisant, A. Bezerianos, P. Dragicevic, A task-based taxonomy of cognitive biases for information visualization, IEEE Transactions on Visualization and Computer Graphics 26 (2020) 1413–1432.

[5] Keet, C.M. An exploration into cognitive bias in ontologies. Cognition And OntologieS (CAOS’21), part of JOWO’21, part of BoSK’21. 13-16 September 2021, Bolzano, Italy. (in print)