What is a pandemic, ontologically?

At some point in time, this COVID-19 pandemic will be over. Each time that thought crossed my mind, there was that little homunculus in my head whispering: but do you know the criteria for when it can be declared ‘over’? I tried to push that idea away by deferring it to a ‘whenever the WHO says it’s over’, but the thought kept nagging. Surely there would be a clear set of criteria lying on the shelf awaiting to be ticked off? Now, with the omicron peak well past us here in South Africa, and with comparatively little harm done in that fourth wave, there’s more talk publicly of perhaps having that end in sight – and thus also needing to know what the decisive factors are for calling it an end.

Then there are the anti-vaxxers. I know a few of them as well. One raged on with the argument that ‘they’ (the baddies in the governments in multiple countries) count the death toll entirely unfairly: “flu deaths count per season in a year, but for covid they keep adding up to the same counter from 2020 to make the death toll look much worse!! Trying to exaggerate the severity!” My response? Duh, well, yes they do count from early 2020, because a pandemic is one event and you count per event! Since the COVID-19 pandemic is a pandemic that is an event, we count from the start until the end – whenever that end is. It hadn’t even crossed my mind that someone wouldn’t count per event but, rather, wanted to chop up an event to pretend it would be smaller than it actually is.

So I did a little digging after all. What is the definition of a pandemic? What are its characteristics? Ontologically, what is that notion of ‘pandemic’, be it according to the analytic philosophers, ontologists, or modellers, or how it may be aligned to some of the foundational ontologies used in ontology engineering? From that, we then should be able to determine when all this COVID-19 has become a ‘is not a pandemic’ (whatever it may be classified into after the pandemic is over).

I could not find any works from the philosophers and theory-focussed ontologists that would have done the work for me already. (If there is and I missed it, please let me know.) Then, to start: what about definitions? There are some, like the recently updated one from dictionary.com where they tried to explain it from a language perspective, and lots of debate and misunderstandings in the debate about defining and describing a pandemic [1]. The WHO has descriptions, but not a clear definition, and pandemic phases. Formulations of definitions elsewhere vary slightly as well, except for the lowest common denominator: it’s a large epidemic.

Ontologically, that is an entirely unsatisfying answer. What is ‘large’? Some, like the CDC of the USA qualified it somewhat: it’s spread over the world or at least multiple regions and continents, and in those areas, it usually affects many people. The Australian Department of Health adds ‘new disease’ to it. Now we’re starting to get somewhere with inclusion of key properties of a pandemic. Kelly [2] adds another criterion to it, albeit focussed on influenza: besides worldwide/very wide area and  affecting a large number of people, “almost simultaneous transmission takes place worldwide” and thus for a part of the world, there is an out-of-season influenza virus transmission.

Image credits: Miroslava Chrienova, taken from this page.

The best resource of all from an ontologists’ perspective, is a very clear, well-written, perspective article written by Morens, Folkers and Fauci – yes, that Fauci from the CDC – in the Journal of Infectious Diseases that, in their lack of wisdom, keeps the article paywalled (it somehow made it onto the webarchive with free access here anyhow). They’re experts and they trawled the literature to, if not define a pandemic, then at least describe it through trying to list the characteristics and the merits, or demerits, thereof. They are, in short, and with my annotation on what sort of attribute (/feature/characteristic, as loosely used term for now) it is:

  1. Wide geographic extension; as aforementioned. That’s a scale or ‘fuzzy’ (imprecise in some way) feature, i.e., without a crisp cut-off point when ‘wide’ starts or ends.
  2. Disease movement, i.e., there’s some transmission going on from place to place and that can be traced. That’s a yes/no characteristic.
  3. High attack rates and explosiveness, i.e., lots of people affected in a short timespan. There’s no clear cut-off point on how fast the disease has to spread for counting as ‘fast spreading’, so a scale or fuzzy feature.
  4. Minimal population immunity; while immunity is a “relative concept” (i.e., you have it to a degree), it’s a clear notion for a population when that exists or not; e.g., it certainly wasn’t there when SARS-CoV-2 started spreading. It is agnostic about how that population immunity is obtained. This may sound like a yes/no feature, perhaps, but is fuzzy, because practically we may not know and there’s for sure a grey area thanks to possible cross-immunity (natural or vaccine-induced) and due to the extent of immune-evasion of the infectious agent.
  5. Novelty; the term speaks for itself, and clearly is a yes/no feature as well. It seems to me like ‘novel’ implies ‘minimal population immunity’, but that may not be the case.
  6. Infectiousness; it’s got to be infectious, and so excluding non-infectious things, like obesity and smoking. Clear yes/no.
  7. Contagiousness; this may be from person to person or through some other medium (like water for cholera). Perhaps as an attribute with categorical values; e.g., human-to-human, human-animal intermediary (e.g., fleas, rats), and human-environment (notably: water).
  8. Severity; while the authors note that it’s not typically included, historically, the term ‘pandemic’ has been applied more often for diseases that are severe or with high fatality rates (e.g., HIV/AIDS) than for milder ones. Fuzzy concept for which a scale could be used.

And, at the end of their conclusions, “In summary, simply defining a pandemic as a large epidemic may make ultimate sense in terms of comprehensibility and consistency. We also suggest that use of the term is best reserved for infectious diseases that share many of the same epidemiologic features discussed above” (p1020), largely for simplifying it to the public, but where scientists and public health officials would maintain their more precise consensus understanding of the complex scientific concept.

Those imprecise/fuzzy properties and lack of clarity of cut-off points bug the epidemiologists, because they lead to different outcomes of their prediction models. From my ontologist viewpoint, however, we’re getting somewhere with these properties: SARS-CoV-2, at least early in 2020 when the pandemic was declared, ticked all those eight boxes and so any reasoner would classify the disease it causes, COVID-19, as a pandemic. Now, in early 2022 with/after the omicron variant of concern? Of those eight properties, numbers 4 and 8 much less so, and number 5 is the million-dollar-question two years into the pandemic. Either way, considering all those properties of a pandemic that have passed the revue here so far, calling an end to the pandemic is not as trivial is it initially may have sounded like. WHO’s “post pandemic period” phase refers to “levels seen for seasonal influenza in most countries with adequate surveillance”. That is a clear specification operationally.

Ontologically, if we were to take these eight properties at face value, the next question then is: are all eight of them combined the necessary and sufficient conditions, or are some of them ‘more essential’ for calling it a pandemic, and the other ones would then be optional features? Etymologically, the pan in pandemic means ‘all’, so then as long as it rages across the world, it would remain a pandemic?

Now that things get ontologically more interesting, the ontological status. Informally, an epidemic is an occurrence (read: instance/individual entity) of an infectious disease at a particular time (read: an unspecified duration of time, not an instant) and that affects some community (be that a community of humans, chicken, or whatever other organisms that live in a community), and pandemic, as a minimum, extends the region that it affects and amount of organisms infected, and then some of those other features listed above.

A pandemic is in the same subject domain as an infectious disease, and so we can consult the OBO Foundry and see what they did, or first start with just the main BFO categories for a general sense of what it would align to. With our BFO Classifier, I get as far as process:

As to the last (optional) question: could one argue that a pandemic is a collection of disjoint part-processes? Not if the part-processes all have to be instances of different types of processes. The other loose end is that BFO’s processes need not have an end, but pandemics do. For now, what’s the most relevant is that the pandemic is distinctly in the occurrent branch of BFO, and occurrents have temporal parts.

Digging further into the OBO Foundry, they indeed did quite some work on infectious diseases and COVID-19 already [4], and following the trail from their Figure 1 (see below): disposition is a realizable entity is a specifically dependent continuant is a continuant; infectious disease course is a disease course is a process is an occurrent; and “realizable entity comes to be realized in the course of the process”.

Source: Figure 1 of [4].

In that approach, COVID-19 is the infectious disease being realised in the pandemic we’re in at the moment, with multiple infectious disease courses in humans and a few other animals. But where does that leave us with pandemic? Inspecting the Infectious Disease Ontology (IDO) since the article does not give a definition, infectious disease epidemic and infectious disease pandemic are siblings of infectious disease course, where disease course is described as “Totality of all processes through which a given disease instance is realized.” (presumably the totality of all processes in one human where there’s an instance of, say, COVID-19). Infectious disease pandemic is an atomic class with no properties or formal definitions, but there’s an annotation with a definition. Nice try; won’t work.

What’s the problem? There are three. The first, and key, problem is that pandemic is stated to be a collection of epidemics, but i) collections of individual things (collectives, aggregates) are categorically different kind of entities than individual things, and ii) epidemic and pandemic are not categorically different things. Not just that, there’s a fiat boundary (along a continuum, really) between an epidemic evolving into becoming a pandemic and then subsiding into separate epidemics. A comparatively minor, or at least secondary, issue is how to determine the boundary of one epidemic from another to be able to construct a collective, since, more fundamentally: what are the respective identities of those co-occurring epidemics? One can’t get collections of things we can’t quite identify. For instance, is it one epidemic in two places that it jumped to, or do they count as two then, and what when two separate ones touch and presumably merge to become one large one? The third issue, and also minor for the current scope, is the definition for epidemic in the ontology’s annotation field, talking of “statistically significant increase in the infectious disease incidence” as determiner, but actually it’s based on a threshold.

Let’s try DOLCE as foundational ontology and see what we get there. With the DOLCE Decision Diagram [5], pandemic ends up as: Is [pandemic] something that is happening or occurring? Yes (perdurant – alike BFO’s occurrent). Are you able to be present or participate in [a pandemic]? Yes (event). Is [a pandemic] atomic, i.e., has no subdivisions of it and has a definite end point? No (accomplishment). Not the greatest word choice to say that a pandemic is an accomplishment – almost right up there with the DOLCE developers’ example that death is an achievement – but it sure is an accomplishment from the perspective of the infectious agent. The nice thing of dolce:accomplishment over  bfo:process is that it entails there’s a limited duration to it (DOLCE also has process that also can go on and on and on).

The last question in both decision diagrams made me pause. The instances of COVID-19 going around could possibly be going around after the pandemic is over, uninterrupted in the sense that there is no time interval where no-one is infected with SARS-CoV-2, or it could be interrupted with later flare-ups if it’s still SARS-CoV-2 and not substantially different, but the latter is a grey area (is it a flare-up or a COVID-2xxx?). The latter is not our problem now. The former would not be in contradiction with pandemic as accomplishment, because COVID-19-the-pandemic and COVID-19-the-disease are two different things. (How those two relate can be a separate story.)

To recap, we have pandemic as an occurrent/perdurant entity unfolding in time and, depending on one’s foundational ontology, something along the line of accomplishment. For an epidemic to be classified as a pandemic, there are a varying number of features that aren’t all crisp and for which the fuzzy boundaries haven’t been set.

To sketch this diagrammatically (hence, informally), it would look something like this:

where the clocks and the DEX and DEV arrows are borrowed from the TREND temporal conceptual data modelling language [6]: Epidemic and Pandemic are temporal entities, DEX (+dashed arrow) verbalised is “An epidemic may also become a pandemic” and DEV (+solid arrow): “Each pandemic must evolve to epidemic ceasing to be a pandemic” (hiding the logic at the back-end).

It isn’t a full answer as to what a pandemic is ontologically – hence, the title of the blog post still has that question mark – but we can already clear up the two issues from the introduction of this post, as follows.


We already saw that with any definition, description, and list of properties proposed, there is no unambiguous and certain definite endpoint to a pandemic that can be deterministically computed. Well, other than the extremes of either 100% population immunity or the affected species is extinct such that there is no single instance of a disease course (in casu, of COVID-19) either way. Several measured values of the scales for the fuzzy variables will go down and immunity increase (further) as the pandemic unfolds, and then the pandemic phase is over eventually. Since there are no thresholds defined, there likely will be people who are forever disagreeing on when it can be called over. That is inherent in the current state of defining what a pandemic is. Perhaps it now also makes you appreciate the somewhat weak operational statement of the WHO post-pandemic period phase – specifying anything better is fraught with difficulties to date and unlikely to ever make everybody happy.

There’s that flawed argument of the anti-vaxxer to deal with still. Flu epidemics last about 10 weeks, on average [7]. They happen in the winter and in the  northern hemisphere that may cross a New Year (although I can’t remember that has ever happened in all the years I’ve lived in Europe). And yet, they also count per epidemic and not per calendar year. School years run from September to July, which provides a different sort of year, and the flu epidemics there are typically reported as ‘flu season 2014/2015’, indicating just that. Because those epidemics are short-lived, you typical get only one of those in a year, and in-season only.

Contrast this with COVID-19: it’s been going round and round and round since late December 2019, with waves and lulls for all countries, regions, and continents, but never did it stop for a season in whole regions or continents. Most countries come close to a stop during a lull at some point between the waves; for South Africa, according to worldometers, the lowest 7-day moving average since the first wave in 2020 was 265 recorded infections per day, on 7 November 2021. Any out-of-season waves? Oh yes – beta came along in summer last year and it was awful; at least for this year’s summer we got a relatively harmless omicron. And it’s not just South Africa that has been having out-of-season spikes. Point is, the COVID-19 pandemic ‘accomplishment’ wasn’t over within the year – neither a calendar year nor a northern hemisphere school year – and so we keep counting with the same counter for as long as the event takes until the pandemic as event is over. There’s no nefarious plot of evil controlling scaremongering governments, just a ‘demic that takes a while longer than we’ve been used to until 2019.

In closing, it is, perhaps, not the last word on the ontological status of pandemic, but I hope the walkthrough provided a little bit of clarity in the meantime already.


[1] Doshi, P. The elusive definition of pandemic influenza. Bulletin of the World Health Organization,  2011, 89:532–538

[2] Kelly, H. The classical definition of a pandemic is not elusive. Bulletin of the World Health Organization, 2011, 89 (‎7)‎, 540 – 541.

[3] Morens, DM, Folkers, GK, Fauci, AS. What Is a Pandemic? The Journal of Infectious Diseases, 2009, 200(7): 1018-1021.

[4] Babcock, S., Beverley, J., Cowell, L.G. et al. The Infectious Disease Ontology in the age of COVID-19. Journal of Biomedical Semantics, 2021, 12, 13.

[5] Keet, C.M., Khan, M.T., Ghidini, C. Ontology Authoring with FORZA. 22nd International Conference on Information and Knowledge Management (CIKM’13). ACM proceedings, pp569-578. 2013.

[6] Keet, C.M., Berman, S. Determining the preferred representation of temporal constraints in conceptual models. 36th International Conference on Conceptual Modeling (ER’17). Springer LNCS 10650, 437-450. 6-9 Nov 2017, Valencia, Spain.

[7] Fleming DM, Zambon M, Bartelds AI, de Jong JC. The duration and magnitude of influenza epidemics: a study of surveillance data from sentinel general practices in England, Wales and the Netherlands. European Journal of Epidemiology, 1999, 15(5):467-73.


Conference report: SWAT4HCLS 2022

The things one can do when on sabbatical! For this week, it’s mainly attending the 13th Semantic Web Applications and tools for Health Care and Life Science (SWAT4HCLS) conference and even having some time to write a conference report again. (The last lost tagged with conference report was FOIS2018, at the end of my previous sabbatical.) The conference consisted of a tutorial day, two conference days with several keynotes and invited talks, paper presentations and poster sessions, and the last day a ‘hackathon’/unconference. This clearly has grown over the years from the early days of the event series (one day, workshop, life science).

A photo of the city where it was supposed to take place: Leiden (NL) (Source: here)

It’s been a while since I looked in more detail into the life sciences and healthcare semantics-driven software ecosystems. The problems are largely the same, or more complex, with more technologies and standards to choose from that promise that this time it will be solved once and for all but where practitioners know it isn’t that easy. And lots of tooling for SARS-CoV-2 and COVID-19, of course. I’ll summarise and comment on a few presentations in the remainder of this post.


The first keynote speaker was Karin Verspoor from RMIT in Melbourne, Australia, who focussed her talk on their COVID-SEE tool [1], a Scientific Evidence Explorer for COVID-19 information that relies on advanced NLP and some semantics to help finding information, notably taking open questions where the sentence is analysed by PICO (population, intervention, comparator, outcome) or part thereof, and using UMLS and MetaMap to help find more connections. In contrast to a well-known domain with well-known terminology to formulate very specific queries over academic literature, that was (and still is) not so for COVID-19. Their “NLP+” approach helped to get better search results.

The second keynote was by Martina Summer-Kutmon from Maastricht University, the Netherlands, who focussed on metabolic pathways and computation and is involved in WikiPathways. With pretty pictures, like the COVID-19 Disease map that culminated from a lot of effort by many research communities with lots of online data resources [2]; see also the WikiPathways one for covid, where the work had commenced in February 2020 already. She also came to the idea that there’s a lot of semantics embedded in the varied pathway diagrams. They collected 64643 diagrams from the literature of the past 25 years, analysed them with ML, OCR, and manual curation, and managed to find gaps between information in those diagrams and the databases [3]. It reminded me of my own observations and work on that with DiDOn, on how to get information from such diagrams into an ontology automatically [4]. There’s clearly still lots more work to do, but substantive advances surely have been made over the past 10 years since I looked into it.

Then there were Mirjam van Reisen from Leiden UMC, the Netherlands, and Francisca Oladipo from the Federal University of Lokoja, Nigeria, who presented the VODAN-Africa project that tries to get Africa to buy into FAIR data, especially for COVID-19 health monitoring within this particular project, but also more generally to try to get Africans to share data fairly. Their software architecture with tooling is open source. Apart from, perhaps, South Africa, the disease burden picture for, and due to, COVID-19, is not at all clear in Africa, but ideally would be. Let me illustrate this: the world-wide trackers say there are some 3.5mln infections and 90000+ COVID-19 deaths in South Africa to date, and from far away, you might take this at face value. But we know from SA’s data at the SAMRC that deaths are about three times as much; that only about 10% of the COVID-19-positives are detected by the diagnostics tests—the rest doesn’t get tested [asymptomatic, the hassle, cost, etc.]; and that about 70-80% of the population already had it at least once (that amounts to about 45mln infected, not the 3.5mln recorded), among other things that have been pieced together from multiple credible sources. There are lots of issues with ‘sharing’ data for free with The North, but then not getting the know-how with algorithms and outcomes etc back (a key search term for that debate has become digital colonialism), so there’s some increased hesitancy. The VODAN project tries to contribute to addressing the underlying issues, starting with FAIR and the GDPR as basis.

The last keynote at the end of the conference was by Amit Shet, with the University of South Carolina, USA, whose talk focussed on how to get to augmented personalised health care systems, with as one of the cases being asthma. Big Data augmented with Smart Data, mainly, combining multiple techniques. Ontologies, knowledge graphs, sensor data, clinical data, machine learning, Bayesian networks, chatbots and so on—you name it, somewhere it’s used in the systems.


Reporting on the papers isn’t as easy and reliable as it used to be. Once upon a time, the papers were available online beforehand, so I could come prepared. Now it was a case of ‘rock up and listen’ and there’s no access to the papers yet to look up more details to check my notes and pad them. I’m assuming the papers will be online accessible soon (CEUR-WS again presumably). So, aside from our own paper, described further below, all of the following is based on notes, presentation screenshots, and any Q&A on Discord.

Ruduan Plug elaborated on the FAIR & GDPR and querying over integrated data within that above-mentioned VODAN-Africa project [5]. He also noted that South Africa’s PoPIA is stricter than the GDPR. I’m suspecting that is due to the cross-border restrictions on the flow of data that the GDPR won’t have. (PoPIA is based on the GDPR principles, btw).

Deepak Sharma talked about FHIR with RDF and JSON-LD and ShEx and validation, which also related to the tutorial from the preceding day. The threesome Mercedes Arguello-Casteleiro, Chloe Henson, and Nava Maroto presented a comparison of MetaMap vs BERT in the context of covid [6], which I have to leave here with a cliff-hanger, because I didn’t manage to make a note of which one won because I had to go to a meeting that we were already starting later because of my conference attendance. My bet would be on the semantics (those deep learning models probably need more reliable data than there is available to date).

Besides papers related to scientific research into all things covid, another recurring topic was FAIR data—whether it’s findable, accessible, interoperable, and reusable. Fuqi Xu  and collaborators assessed 11 features for FAIR vocabularies in practice, and how to use them properly. Some noteworthy observations were that comparing a FAIR level makes more sense before-and-after changing a single resource compared to pitting different vocabularies against each other, “FAIR enough” can be enough (cf. demanding 100% compliance) [7], and a FAIR vocabulary does not imply that it is also a good quality vocabulary. Arriving at the topic of quality, César Bernabé presented an analysis on the use of foundational ontologies in bioinformatics by means of a systematic literature mapping. It showed that they’re used in a range of activities of ontology engineering, there’s not enough empirical analysis of the pros and cons of using one, and, for the numbers game: 33 of the ontologies described in the selected literature used BFO, 16 DOLCE, 7 GFO, and 1 SUMO [8]. What to do next with these insights remains to be seen.

Last, but not least—to try to keep the blog post at a sort of just about readable length—our paper, among the 15 that were accepted. Frances Gillis-Webber, a PhD student I supervise, did most of the work surveying OWL Ontologies in BioPortal on whether, and if so how, they take into account the notion of multilingualism in some way. TL;DR: they barely do [9]. Even when they do, it’s just with labels rather than any of the language models, be they the ontolex-lemon from the W3C community group or another, and if so, mainly French and German.

Source: [9]

Does it matter? It depends on what your aims are. We use mainly the motivation of ontology verbalisation and electronic health records with SNOMED CT and patient discharge note generation, which ideally also would happen for ‘non-English’. Another use case scenario, indicated by one of the participants, Marco Roos, was that the bio-ontologies—not just health care ones—could use it as well, especially in the case of rare diseases, where the patients are more involved and up-to-date with the science, and thus where science communication plays a larger role. One could argue the same way for the science about SARS-CoV-2 and COVID-19, and thus that also the related bio-ontologies can do with coordinated multilingualism so that it may assist in better communication with the public. There are lots of opportunities for follow-up work here as well.


There were also posters where we could hang out in gathertown, and more data and ontologies for a range of topics, such as protein sequences, patient data, pharmacovigilance, food and agriculture, bioschemas, and more covid stuff (like Wikidata on COVID-19, to name yet one more such resource). Put differently: the science can’t do without the semantic-driven tools, from sharing data, to searching data, to integrating data, and analysis to develop the theory figuring out all its workings.

The conference was supposed to be mainly in person, but then on 18 Dec, the Dutch government threw a curveball and imposed a relatively hard lockdown prohibiting all in-person events effective until, would you believe, 14 Jan—one day after the end of the event. This caused extra work with last-minute changes to the local organisation, but in the end it all worked out online. Hereby thanks to the organising committee to make it work under the difficult circumstances!


[1] Verspoor K. et al. Brief Description of COVID-SEE: The Scientific Evidence Explorer for COVID-19 Related Research. In: Hiemstra D., Moens MF., Mothe J., Perego R., Potthast M., Sebastiani F. (eds). Advances in Information Retrieval. ECIR 2021. Springer LNCS, vol 12657, 559-564.

[2] Ostaszewski M. et al. COVID19 Disease Map, a computational knowledge repository of virus–host interaction mechanisms. Molecular Systems Biology, 2021, 17:e10387.

[3] Hanspers, K., Riutta, A., Summer-Kutmon, M. et al. Pathway information extracted from 25 years of pathway figures. Genome Biology, 2020, 21,273.

[4] Keet, C.M. Transforming semi-structured life science diagrams into meaningful domain ontologies with DiDOn. Journal of Biomedical Informatics, 2012, 45(3): 482-494. DOI: dx.doi.org/10.1016/j.jbi.2012.01.004.

[5] Ruduan Plug, Yan Liang, Mariam Basajja, Aliya Aktau, Putu Jati, Samson Amare, Getu Taye, Mouhamad Mpezamihigo, Francisca Oladipo and Mirjam van Reisen: FAIR and GDPR Compliant Population Health Data Generation, Processing and Analytics. SWAT4HCLS 2022. online/Leiden, the Netherlands, 10-13 January 2022.

[6] Mercedes Arguello-Casteleiro, Chloe Henson, Nava Maroto, Saihong Li, Julio Des-Diz, Maria Jesus Fernandez-Prieto, Simon Peters, Timothy Furmston, Carlos Sevillano-Torrado, Diego Maseda-Fernandez, Manoj Kulshrestha, John Keane, Robert Stevens and Chris Wroe, MetaMap versus BERT models with explainable active learning: ontology-based experiments with prior knowledge for COVID-19. SWAT4HCLS 2022. online/Leiden, the Netherlands, 10-13 January 2022.

[7] Fuqi Xu, Nick Juty, Carole Goble, Simon Jupp, Helen Parkinson and Mélanie Courtot, Features of a FAIR vocabulary. SWAT4HCLS 2022. online/Leiden, the Netherlands, 10-13 January 2022.

[8] César Bernabé, Núria Queralt-Rosinach, Vitor Souza, Luiz Santos, Annika Jacobsen, Barend Mons and Marco Roos, The use of Foundational Ontologies in Bioinformatics. SWAT4HCLS 2022. online/Leiden, the Netherlands, 10-13 January 2022.

[9] Frances Gillis-Webber and C. Maria Keet, A Survey of Multilingual OWL Ontologies in BioPortal. SWAT4HCLS 2022. online/Leiden, the Netherlands, 10-13 January 2022.

Trying to categorise popular science books

Some time last year, a colleague asked about good examples of popular science books, in order to read and thereby to get inspiration on how to write books at that level, or at least for first-year students at a university. I’ve read (and briefly reviewed) ‘quite a few’ across multiple disciplines and proposed to him a few of them that I enjoyed reading. One aspect that bubbled up at the time, is that not all popsci books are of the same quality and, zooming in on this post’s topic: not all popsci books are of the same level, or, likely, do not have the same target audience.

I’d say they range from targeting advanced interested laypersons to entertaining laypersons. The former entails that you’d be better off having covered the topic at school and an undergrad course or two will help as well for making it an enjoyable read, and be fully awake, not tired, when reading it. For the latter category at the other end of the spectrum: having completed little more than primary school will do fine and no prior subject domain knowledge is required, at all, and it’s good material for the beach; brain candy.

Either way you’ll learn something from any popsci book, even if it’s too little for the time spent reading the book or too much to remember it all. But some of them are much more dense than others. Compare cramming the essence of a few scientific papers in a book’s page to drawing out one scientific paper into a whole chapter. Then there’s humor—or the lack thereof—and lighthearted anecdotes (or not) to spice up the content to a greater or lesser extent. The author writing about fungi recounting eating magic mushrooms, say, or an economist being just as much of a sucker for summer sales in the shops as just about anyone. And, of course, there’s readability (more about that shortly in another post).

Putting all that in the mix, my groupings are as follows, with a selection of positive exemplars that I also enjoyed reading.

There are more popsci books of which I thought they were interesting to read, but I didn’t want to turn it into a laundry list. Also, it seemed that books on politics and society and philosophy and such seem to be deserving their own discussion on categorisation, but that’s for another time. I also intentionally excluded computer science, information systems, and IT books, because I may be differently biassed to those books compared to the out-of-my-own-current-specialisation books listed above. For instance, Dataclysm by Cristian Rudder on Data Science mainly with OKCupid data (reviewed earlier) was of the ‘entertainment’ level to me, but probably isn’t so for the general audience.

Perhaps it is also of use to contrast them to ‘bad’ examples—well, not bad, but I think they did not succeed well in their aim. Two of them are Critical mass by Phillip Ball (physics, social networks), because it was too wordy and drawn out and dull, and This is your brain on music by Daniel Levitin (neuroscience, music), which was really interesting, but very, very, dense. Looking up their scores on goodreads, those readers converge to that view for your brain on music as well (still a good 3.87 our of 5, from nearly 60000 ratings and well over 1500 reviews), as well as for the critical mass one (3.88 from some 1300 ratings and about 100 reviews). Compare that to a 4.39 for the award-wining Entangled life, 4.35 of Why we sleep, and 4.18 for Mama’s last hug. To be fair, not all books listed above have a rating above 4.

Be this as it may, I still recommend all of those listed in the four categories, and hopefully the sort of rough categorisation I added will assist in choosing a book among the very many vying for your attention and time.

Pushing the envelope categorising popsci books

Regarding book categories more generally, romance novels have subgenres, as does science fiction, so why not the non-fiction popsci books? Currently, they’re mostly either just listed (e.g., here or the new releases) or grouped by discipline, but not according to, say, their level of difficulty, humor, whether it mixes science with politics, self-help, or philosophy, or some other quality dimension of the book along which they possibly could be assessed.

As example that the latter might work for assigning attributes to the books: Why we sleep is 100% science but a reader can distill some ideas to practice with as self-help for sleeping better, whereas When: the scientific secrets of perfect timing is, contrary to what the title suggests, largely just self-help. Delusions of gender and Inside rebellion can, or, rather, should have some policy implications, and Why we sleep possibly as well (even if only to make school not start so early in the morning), whereas the sort of content of Elephants on acid already did (ethics review boards for scientific experiments, notably). And if you were not convinced of the presence of animal cognition, then Mama’s last hug may induce some philosophical reflecting, and then have a knock-on effect on policies. Then there are some books that I can’t see having either a direct or indirect effect on policy, such as Gastrophysics and Entangled life.

Let’s play a little more with that idea. What about vignettes composed of something like the followings shown in the table below?

Then a small section of the back cover of Entangled life would look like this, with the note that the humor is probably inbetween the ‘yes’ and ‘some’ (I laughed harder with the book on drunkenness).

Mama’s last hug would then have something like:

And Why we sleep as follows (though I can’t recall for sure now whether it was ‘some’ or ‘no laughing matter’ and a friend has borrowed the book):

A real-life example of a categorisation box on a product; coffee suitable for moka pots, according to House of Coffees.

Of course, these are just mock-ups to demonstrate the idea visually and to try out whether it is even doable to classify the books. They are. There very well may be better icons than these scruffy ‘take a cc or public domain one and fiddle with it in MS Paint’ or a mixed mode approach, like on the packs of coffee (see image on the right).

Moreover: would you have created the same categorisation for the three examples? What (other) properties of popular science books could useful? Also, and perhaps before going down that route: would something like that possibly be useful according to you or someone you know who reads popular science books? You may leave your comments below, on my facebook page, or write an email, or we can meet in person some day.

p.s.: this is not a serious post on the ontology of popular science books — it is summer vacation time here and I used to write book reviews in the first week of the year and this is sort of related.