Riffling through readability metrics

I was interviewed recently about my ontology engineering textbook, after it won the 2021 UCT Open Textbook Award. The interviewer initially assumed it was a textbook for undergraduate students because it has the word ‘Introduction’ in the title. Not quite. Soon thereafter, one of the 3rd-year computer science students who arrived early in class congratulated me on the award and laughed that that was an introduction at a different level altogether. It is, by design, but largely so with respect to the topics covered: it does not assume the reader knows anything about ontologies—hence, the ‘introduction’—but it does take for granted that the reader knows some of the basics of computer science or software engineering. For instance, there’s no explanation of what a database is, or a conceptual data model, or object-oriented software.

In addition, and getting to this post’s topic, I had tried to make the textbook readable, and certainly more accessible than the scientific papers and handbooks that were the only alternatives before this textbook saw the light of day. I think it is readable, and I have received feedback to that effect as well. Admittedly, though, the notion of assessing readability only came to the fore in the editing process of my memoir, for it is aimed at a broader audience than the textbook. This raised a nagging question. What is it that makes some text readable?

It’s one of those easy questions that just do not have a simple answer. The quickest answer is “use a readability metric standardised by grade level” for a home language/mother tongue speaker. Scratching that surface lays bare the next question: which parameters have to be taken into account, and in what way, so as to come up with a score for the estimated grade level? Even the brief overview on the Wikipedia page on readability already lists 11 measurable parameters, and there are different ways to measure them and to possibly combine them as well. The same page lists 8 popular metrics and 4 advanced ones. That’s just for English. For instance, the Flesch reading ease is calculated as

206.835 – 1.015 * (total number of words / total number of sentences) – 84.6 * (total number of syllables / total number of words)

to result in rough bands of reading ease. For instance, 90-100 is manageable for an 11-year old, 60-70 counts as ‘plain English’, and anything below 30, down to 0 (and possibly even negative), marks very to extremely difficult English texts for professionals and graduate students. See also the figure on the right.

A rough categorisation of various texts for adults according to their respective Flesch reading ease scores. Source: https://blog.cathy-moore.com/2017/07/how-to-get-everyone-to-write-like-ernest-hemingway/.
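To make the formula concrete, here is a minimal sketch in Python that computes the Flesch reading ease from raw text. The syllable counter is a naive vowel-group heuristic of my own devising (real implementations use pronunciation dictionaries or better rules), so the scores are approximations only:

```python
import re

def count_syllables(word: str) -> int:
    """Naive syllable estimate: count vowel groups, with a crude
    correction for a trailing silent 'e'. An approximation only."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith(("le", "ee")) and count > 1:
        count -= 1
    return max(count, 1)

def flesch_reading_ease(text: str) -> float:
    """206.835 - 1.015*(words/sentences) - 84.6*(syllables/words)."""
    sentences = max(len(re.findall(r"[.!?]+", text)), 1)
    words = re.findall(r"[A-Za-z']+", text)
    if not words:
        return 0.0
    syllables = sum(count_syllables(w) for w in words)
    return (206.835
            - 1.015 * (len(words) / sentences)
            - 84.6 * (syllables / len(words)))
```

With this sketch, flesch_reading_ease("The cat sat on the mat.") comes out at about 116: six monosyllabic words in one sentence, i.e., off the top of the easy end of the scale.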

The Gunning fog index has fewer fantastically tweaked multipliers:

Grade level = 0.4 * (average sentence length + percentage of Hard Words)

but there’s a wonderful Hard Words variable. What is that supposed to mean exactly? The readability page says that they are those words with two or more syllables, but the Gunning fog index page says three or more syllables (excluding proper nouns, familiar jargon, or compound words, and not counting common suffixes either).
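Taking the three-or-more-syllables reading, a companion sketch to the Flesch one above (reusing its count_syllables helper) could look as follows; note that it skips the exclusions for proper nouns, familiar jargon, compounds, and common suffixes, which is a simplification:

```python
def gunning_fog(text: str) -> float:
    """Gunning fog: 0.4 * (average sentence length + percentage of
    hard words), with 'hard' taken as three or more syllables and
    without the proper-noun/jargon/suffix exclusions."""
    sentences = max(len(re.findall(r"[.!?]+", text)), 1)
    words = re.findall(r"[A-Za-z']+", text)
    if not words:
        return 0.0
    hard = sum(1 for w in words if count_syllables(w) >= 3)
    return 0.4 * (len(words) / sentences + 100 * hard / len(words))
```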

Either way, the popular metrics are all easy to measure computationally without human intervention. Parameters such as fatigue, speed of perception, or background knowledge are not. Proxies for reading speed surely will be available by now somewhere; e.g., in the form of algorithms that analyse page-turning in eBook readers and a visitor’s scrolling behaviour when reading a long article (the system likely knows that you probably won’t finish reading this post).

I don’t know why I never thought about all that before writing the textbook and why none of the writing guidelines I have looked up over the years had mentioned it. The most I did for readability, especially when I was writing my PhD thesis, was the “read aloud test” that was proposed in one of those writing guidelines: read your text aloud, and if you can’t, then something is wrong with the sentence. I used the Acrobat built-in screen reader for that as a first pass. If the text-to-speech algorithm stumbled over it, then it was time to reconsider the phrasing. I would then read it aloud myself and decide whether the Acrobat algorithm had to be improved upon or my sentence had to be revised.

How does the ontology engineering textbook fare? Are my blog posts any more readable? How much worse are the scientific papers? Is it true that the English in science articles is a sort of pidgin English, whereas in other fields, notably the humanities, the erudition and wordsmithery shine through in the readability metrics scores? I have no good answers now, but it would be easy to compute with a fine dataset of texts and the Python py-readability-metrics module for some quick ‘n dirty checks, or to adapt some other open source code for batch processing (e.g., from here, among multiple options). Maybe later; there are some other kinks to straighten first.
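For the quick ‘n dirty route, a batch check could look something like the sketch below. It assumes the py-readability-metrics interface as I recall it from its documentation (a Readability class with flesch() and gunning_fog() methods, which needs NLTK’s ‘punkt’ tokeniser and at least 100 words per text) and a hypothetical texts/ folder of plain-text files; adjust both to your own setup:

```python
from pathlib import Path
from readability import Readability  # pip install py-readability-metrics

# 'texts/' is a placeholder folder of plain-text files to compare
# (textbook chapters, blog posts, papers); point it at your own dataset.
for path in sorted(Path("texts").glob("*.txt")):
    text = path.read_text(encoding="utf-8")
    r = Readability(text)  # the module wants at least 100 words
    print(f"{path.name}: Flesch {r.flesch().score:.2f}, "
          f"Gunning Fog {r.gunning_fog().score:.2f}")
```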

Notably, one can game the system based on some of those key parameters. Besides sentence length—around 20 words is fine, I was told a long while ago—there are the number of syllables of the words and the vocabulary that are taken into account. More monosyllabic words in shorter sentences with fewer distinct words will come out as more easily readable, according to the metric that is.

But ‘easier’ or ‘better’ lies in the eyes of the beholder: the text may become such confetti as to be awful to read due to its lack of flow and coherence. Really. It is as I say. Don’t you think? It’s the way I see it. What say you? The “Really. … you?” sequence has a Flesch reading ease of 90.38 and a Gunning Fog index of 1.44, as the number of years of formal education you would have needed to easily understand it. The “Notably, … and coherence” stretch before it in this paragraph has a Flesch reading ease of 50.52 and a Gunning Fog index of 13.82.
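That gap is easy to reproduce with the hand-rolled sketches from earlier in this post. The naive syllable counter means the exact numbers will differ from the 90.38 and 50.52 above, but the contrast between the two snippets shows up all the same:

```python
choppy = ("Really. It is as I say. Don't you think? "
          "It's the way I see it. What say you?")
flowing = ("Notably, one can game the system based on some of those key "
           "parameters, besides sentence length, the number of syllables "
           "of the words, and the vocabulary that are taken into account.")

# Print both scores for each snippet to see the confetti effect.
for label, snippet in [("choppy", choppy), ("flowing", flowing)]:
    print(f"{label}: Flesch {flesch_reading_ease(snippet):.2f}, "
          f"Gunning Fog {gunning_fog(snippet):.2f}")
```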

Based on random sampling from my textbook, at least one of the paragraphs (p34, ‘purposes’) got a Flesch reading ease of 9.29 and a Gunning Fog index of 22.73, while other parts are around 30 and some are even in the 50-70 region for reading ease.

The illustration out of the way, let’s look at limitations. First, not all polysyllabic words are difficult and not all monosyllabic words are simple; e.g., the common, and therewith easy, ‘education’ and ‘interesting’ vs. the semi-obscure ‘nub’, ‘sloop’, ‘gry’, and ‘squick’ (more here). The longest monosyllabic words, such as ‘scraunched’ and ‘strengthed’, aren’t exactly easy to read either.

Plenty of other languages have predominantly polysyllabic words with lots of syllables, such as Dutch or German, where new words can be formed by putting existing ones together. The Dutch word meervoudigepersoonlijkheidsstoornis (‘multiple personality disorder’) puts together into one concept meervoudige and persoonlijkheid and stoornis. Agglutinating languages, such as isiZulu, not only compose long words, but have so many meaningful pieces that a single word may well be a whole sentence in a disjunctive language. For instance, the 10-syllabic word that one of my former students used to make the point: titukakimureeterahoganu ‘we have never ever brought it to him’. You get used to long words, and there’s no reason why English speakers would be inherently incapable of handling that. Intelligence does not depend on one’s mother tongue. Perhaps, if one is used to a disjunctive orthography, one may have become lazy. Any use of the aforementioned readability metrics for ‘non-English’ clearly will have to be revised to tailor them to the language.

Then there’s the foreign language background that interferes with reading ease. Many a supposedly ‘difficult‘ word in English comes from French, Italian, Latin, or Greek; e.g., oxymoron (Gr), camaraderie (Fr), quotidian (It), and obfuscate (La). For instance, we use oxymoron in Dutch as well, so there’s no ‘difficulty’ to it for a Dutch person; or take maalstroom, which is pronounced nearly the same as ‘maelstrom’, demagoog for ‘demagogue’ (also Greek origins, similar pronunciation), and algorithme for ‘algorithm’ (Persian origins, not an Anglicism), while recalcitrant is even spelled the same. The foreigner trying to speak or write English may not be erudite, but just winging it and hoping that the ‘copy and adapt’ works out. Conversely, supposedly ‘simpler’ words may not be: ‘wayward’ is a synonym for recalcitrant and, with only two syllables, it will make the readability score better. It would make the text less readable to Dutch, Spanish, Italian, and similar readers trying to read English, however, because there’s no connection with a familiar-looking word. About 80% of English words are borrowed from other languages.

Be that as it may, maybe I should reassess my textbook on the metric; maybe not. What does the algorithm know about computer science terminology anyhow? “Ontology Engineering is a specialisation in knowledge representation and reasoning.” has a Flesch reading ease of -31.73 and a Gunning Fog index of 20.00; a tough game it would be to get that back to a reading ease of 50.

It did affect a number of sentences in my memoir. I don’t expect Joe and Joanne Soap to be interested, but teenagers who are shopping around for a university degree programme might be, and so might professionals, students, and academics with a little spare time to relax and read. In other words: a reading ease of around 40-60. Some long sentences could indeed be split up without losing content, coherence, and flow.

There were others where the simplification didn’t feel like an improvement. For instance, compare “according to my opinion” with “the way I saw it”: the former flows smoothly whereas the latter sounds like a nagging firing-off. The latter for sure improves the readability score with all those monosyllabic words. The copy editor changed the former into the latter. It still bugs me. Why? After some further pondering beyond just blaming the grating staccato of a sequence of monosyllabic words, perhaps it is because an opinion generally is (though need not be) formed after considering the facts and analysing them, whereas seeing something in some way may (but definitely need not) be based on facts and analysis. That is, on closer inspection, they’re not equivalent phrases, not at all. Nuances can be, and were, lost with shorter sentences and simpler words. One’s voice, too. So there’s that. Overall, though, I hope the balance leans toward more readable, to get the message across better to more readers.

Lastly, there seems to be plenty of scope for more research on readability metrics—ones that can be computed, that is. While there are several applications for other well-resourced languages, including easy web apps, such as for Spanish and German and even for Dutch, there are very many languages spoken around the globe that do not have such metrics and nice algorithms yet. But even the readability metrics for English could be tweaked; for instance, to tailor them to a genre or a discipline. Then it would be easier to determine whether a book is, say, an easy-reading popular science book for the holidays on the beach or one that requires some or even a lot of effort. For computer science, one could take the Gunning fog index and adjust the Hard Words variable to exclude common jargon that is detrimental to the score, like ‘encapsulation’ and ‘representation’ (both 5 syllables); biochemistry would need that too, given the long names for chemical compounds. One could also add a penalty for too many successive monosyllabic words, as in the sketch below. There will be more options to tweak the formulae and test them, but such additional digging is something for another time.
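As a rough illustration of those two tweaks, here is a sketch that builds on the gunning_fog code from earlier in this post. The jargon list and the penalty of 0.5 per run of five or more monosyllabic words are made-up illustrative values, not calibrated against anything:

```python
# Illustrative, uncalibrated jargon whitelist; extend per discipline.
CS_JARGON = {"encapsulation", "representation", "specialisation", "ontology"}

def adjusted_gunning_fog(text: str, jargon: set[str] = CS_JARGON) -> float:
    """Gunning fog variant: jargon words do not count as 'hard', and
    each run of five or more monosyllabic words adds a small penalty."""
    sentences = max(len(re.findall(r"[.!?]+", text)), 1)
    words = re.findall(r"[A-Za-z']+", text)
    if not words:
        return 0.0
    hard = sum(1 for w in words
               if count_syllables(w) >= 3 and w.lower() not in jargon)
    score = 0.4 * (len(words) / sentences + 100 * hard / len(words))
    run = 0
    for w in words:
        run = run + 1 if count_syllables(w) == 1 else 0
        if run == 5:          # fires once per qualifying run
            score += 0.5      # made-up penalty weight, not calibrated
    return score
```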

As to my question in the introductory paragraph of this post, “What is it that makes some text readable?”: if you’ve made it all the way here reading this post, we’re all a bit wiser on readability, but a short and simple answer I still don’t have. It’s a long story with ifs and buts, and the last word is yet to be said about it.

As a bonus, here are a few hints to make something more readable, according to the readability calculator of the web-based editor tool of The Conversation:

Screenshot I took halfway through working on an article for The Conversation.

p.s.: The ‘science of reading‘ adds more to it, to the point that you wonder how there even can be metrics. But their scope is broader.

p.p.s.: The first full draft of this post had a reading ease of 52.37 and a Gunning Fog of 11.78, and the final one 54.37 and 11.18, respectively, which is fine by me. Length is probably more of an issue.
