Good girls, bold girls – but not böse

The first sentence of a book, non-fiction included, may set the tone for what’s to come. For my memoir, it’s a translation of Brave meisjes komen in de hemel, brutale overal: good girls go to heaven, bold ones go everywhere.

I had read a book with that title some 25 years ago. It was originally written by Ute Ehrhardt in 1994 and translated from German to Dutch and published a year later. For the memoir, I had translated the Dutch title of the book into English myself: the brutale translates to ‘bold’ according to me, my dictionary (a Prisma Woordenboek hard copy), and an online dictionary. Bold means “(of a person, action, or idea) showing a willingness to take risks; confident and courageous.” according to the Oxford dictionary (and similarly here) and it’s in the same league as audacious, daring, brazen, and perky. It has a positive connotation.

What I, perhaps, ought to have done last year, is to find out whether the book also had been translated into English and trust that translator. As it turned out, I’m glad I did not do so, which brings me to the more substantive part of the post. I wanted to see whether I could find the book in order to link it in this post. I did. Interestingly, the word used in the English title was “bad” rather than ‘bold’, yet brutaal is not at all necessarily bad, nor is the book about women being bad. Surely something must have gotten warped in translation there?!

I took the hard copy from the bookshelf and checked the fine print: it listed the original German title as Gute Mädchen kommen in den Himmel, böse überall hin. Hm, böse is not good. It has 17 German-to-English translations and none is quite as flattering as bold, not at all. Either bad translations are to blame, or there was a semantic shift in the German-to-Dutch translation. Considering the former first, it appeared that the German-Dutch online dictionary did not offer nice Dutch words for böse either. Getting up from my chair again to consult my hard copy Prisma German-Dutch dictionary did not pay off either, except for one, maybe (ondeugend). It does not even list brutaal as a possible translation. Was the author, Dr Ehrhardt of the Baby Boomer generation, still so indoctrinated in the patriarchy and Christianity – Gute vs Das Böse – as to think that not being a smiling nice girl must mean being böse? The term did not hold back the Germans, by the way: it was the best-selling non-fiction book in Germany in 1995, my Dutch copy stated. Moreover, it turned out to be in second place overall since German book sales counting started 60 years ago, including a whopping 107 weeks in first place on the Spiegel bestseller list. What’s going on here? Would the Germans be that interested in ‘bad’ girls? Not quite. The second option applies, i.e., the semantic shift in the Dutch translation.

The book’s content is not about bad, mean, or angry women at all and the subtitle provides a further hint to that: waarom lief zijn vrouwen geen stap verder brengt ‘why being nice won’t get women even one step ahead’. Instead of us being pliant, submissive, and self-sabotaging in several ways, and therewith having our voices ignored, our contributions downplayed, and being passed over for jobs and promotions, the book seeks to give women a kick in the backside in order to learn to stand their ground, and it provides suggestions to be heard and taken into account by avoiding the many pitfalls. Our generation of children of the Baby Boomers would improve the world better than those second wave feminists tried to do, and this book fitted right within the Zeitgeist. It was the girl power decade in the 1990s, where women took agency to become master of their own destiny, or at least tried to. The New Woman – yes, capitalised in the book. Agent Dana Scully of the X Files as the well-dressed scientist and sceptic investigator. Buffy the vampire slayer. Xena, Warrior Princess. The Spice Girls. Naomi Wolf’s Fire with Fire (that, by the way, wasn’t translated into Dutch). Reading through the book again now, it comes across as a somewhat dated use-case-packed manifesto about the pitfalls to avoid and how to be the architect of your own life. That’s not being bad, is it?

I suppose I have to thank the German-to-Dutch book translator Marten Hofstede for putting a fitting Dutch title to the content of the book. It piqued my interest in the bookstore at the train station, and I bought and read it in what must have been 1997. It resonated. To be honest, if the Dutch title had used any of the listed translations in the online dictionary – such as kwaad, verstoord, and nijdig – then I likely would not have bought the book. Having to be evil or perpetually angry to go everywhere, anywhere and upward would have been too steep a price to pay. Luckily, bold was indeed the right attribute. Perhaps for the generation after me, i.e., those who are now in their twenties, it’s not about being bold but about being, as a normal way of outlook and interaction in society. Of course a woman is entitled to live her own life, as any human being is.


English, Englishes – which one to use for writing?

Sometimes, the answer to the question in the post’s title is easy, if you’re writing in English: do whatever the style guide says. Don’t argue with the journal editor or typesetter about that sort of trivia (unless they’re very wrong). If it states American English spelling, do so; if British English, go for that. If you can’t distinguish your color from colour, modeling from modelling, and a faucet from a tap, use a spellchecker with one of the Englishes on offer—even OpenOffice Writer shows red wavy lines under ‘color’, ‘modeling’, and ‘faucet’ when it’s set to my default “English (South Africa)”. There are very many other places where you can write in English as much as you like or have time for, however, and then the blog post’s question becomes more relevant. How many Englishes or somehow accepted recognised variants of English exist, and where does it make a difference in writing such that you’ll have to, or are supposed to, choose?

It begs the question of how many variants of English count as one of the Englishes, which is tricky to answer, because it depends on what counts. Does a dialect count? Does it count when it’s sanctioned by a country when it has an official language status and a language body? Does it count when there are enough users? Or when there’s enough text to detect the substantive differences? What are the minimum number or type of differences, if any, and from which standard, before one may start to talk of different Englishes and a new spin-off X-English? People have been deliberating about such matters and trying to document differences and even have come up with classification schemes. Englishes around the world, to be more precise, refer to localised or indigenised versions of English that are either those people’s first or institutionalised language, not just any variant or dialect. There’s an International Association for World Englishes (IAWE) and there are handbooks, textbooks, and scientific journals about it, and the 25th conference of the IAWE will take place next year.

In recent years there have been suggestions that English could break up into mutually unintelligible languages, much as Latin once did. Could such a break-up occur, or are we in need of a new appreciation of the nature of World English?

Tom McArthur, 1987, writing from “the mother country”, but not “the centre of gravity”, of English (pdf).

My expertise doesn’t go that far – I’m operating from the consumer-side of these matters, standards-following, and trying not to make too many mistakes. It took me a while to figure out there was British English (BE) and American English (AE) and then it was a matter of looking up rules on spelling differences, like -ise vs. -ize and single vs. double l (e.g., traveling vs. travelling), checking comparative word lists, and other varied differences, like whether it’s ‘towards’ or ‘toward’ or 15:30, 15.30, 3.30pm or 3:30pm (or one of my colleagues p’s, like a 3.30p). Not to mention a plethora of online writing guides and the comprehensive book The Sense of Style by Steven Pinker. Let’s explore the Englishes and Global English a little.

McArthur’s Englishes (source)

South African English (SAE) exists as one of the recognised Englishes, all the way into internationally reputable dictionaries. It is a bit of a mix of BE and AE, with some spices sprinkled into it. It tries to follow BE but there are AE influences due to the media and, perhaps, anti-colonial sentiment. It’s soccer, not football, for instance, and the 3.30pm variant rather than a 24h clock. Well, I’m not sure it is officially, but practically it is so. It also has ‘weird’ words that everyone is convinced are native English of the BE variety, but aren’t, like timeously rather than timeous or timely – the most I could find was a Wiktionary entry on it claiming it to be Scottish and SAE, but not even the Dictionary of SAE (DSAE) has an entry for it. I’ve seen it so often in work emails over the years that I caved in and use it as well. There are at least a handful of SAE words that people in South Africa think are BE but aren’t, as any SA expat will be able to recall when they get quizzical looks overseas. Then there are hundreds of words that people know are SAE at least unofficially, which are mainly the loan words and adopted words from the 10 other languages spoken in SA – regional overlap causes mutual language influences in all directions. Bakkie, indaba, veld, lekker, dagga, and many more – I’ve blogged about that before. My OpenOffice SAE spellchecker doesn’t flag any of these words as typos.

Arguably, also grammatical differences for SAE exist. In practice they sure do, but I’m not aware of anything officially endorsed. There is no ‘benevolent language dictator’ with card-carrying members of the lexicography and grammar police to endorse or reprimand. Indeed there is the Pan-South African Language Board (PANSALB), but its teeth and thunder don’t come close to the likes of the Académie Française or Real Academia Española. Regarding grammar, that previous post already mentioned the case of the preposition at the end of a sentence when it’s a separable part of the verb in Afrikaans, Dutch, and German (e.g., meenemen or mitnehmen ‘take with’). A concoction that still makes me wince each time I hear or read it, is the ‘can be able to’. It’s either can + verb what you can, or copula + able to + verb what you can do. It is, e.g., ‘I can carry out the experiment’ or ‘I’m able to carry out the experiment’, but not ‘I can be able to carry out the experiment’. I suspect it carries over from a verb form in Niger-Congo B languages since I’ve heard it used also by at least Tanzanians, Kenyans, and Malawians, and meanwhile I’ve occasionally seen it also in texts written by English South African students.

If the notion of “Englishes” feels uncomfortable, then what about Global/World/International English? Is there one? For many a paper I review double-blind, i.e., where the author names and affiliations are hidden, I can’t tell unless the English is really bad. I’ve read enough to be able to spot Spanglish or Chinglish, but mostly I can’t tell, in that there’s a sort of bland scientific English – be it a pidgin English, or maybe multiple authors cancel out ways of making mistakes, or no-one really bothers to tear the vocabulary apart into their boxes because it’s secondary to the scientific content being communicated. No doubt investigative deliberations are ongoing about that too; if there aren’t, there ought to be.

Another scenario for ‘global English’ concerns how to write a newsletter to a global audience. For instance, if you were to visit a website with an intended audience in the USA, then it should be tolerable to read “this fall”, even though elsewhere it’s either autumn, spring, a rainy or a dry season. If it’s an article by the UN, say, then one may expect a different wording that is either not US-centric or, if the season matters, qualifies it, as in “Covid-19 cases are expected to rise during fall and winter in North America”. With the former wording, you can’t please everyone, due to different calendars with different month names and year ends and different seasons. The question also came up recently for a Wikimedia blog post on Abstract Wikipedia progress for its natural language generation component, where I was involved sideways in a draft version. My tendency was toward(s) a Global English, whereas one of my collaborators’ stance was that they assumed a rule that it should be the English of wherever the organisation’s headquarters is located. These choices were also confusing when I was writing the first draft of my memoir: it was published by a South African publisher, hence, SAE style guidelines, but the book is also distributed – and read! – internationally.

Without clear rules, there will always be people who complain about your English, be it either that you’re wrong or just not in the inner circle for sensing ‘the feeling of the language that only a native speaker can have’, that supposedly inherently unattainable fingerspitzengefühl for it. No clear rules isn’t good for developing spelling and grammar checkers either. In that regard, and that one only, perhaps I just might prefer a benevolent dictator. I don’t even care which of the Englishes (except for not the stupid stuff like spelling ‘light’ as ‘lite’, ffs). I also fancy the idea of banding together with other ‘nonfirst-language’ speakers of English to start devising and dictating rules, since the English speakers can’t seem to sort out their own language – at least not enough like the grammatically richer languages – and we’re in the overwhelming majority in numbers (about 1:3 apparently). One can dream.

As to the question in the title of the blog post: what I’ve written so far is not a clear answer for all cases, indeed, in particular when there is no editorial house style dictating it, but this lifting of the veil hopefully has made clear that attempting to answer the question means opening up that can of worms further. You could create your own style guide for your not-editor-policed writings. The more I read about it, though, the more complicated things turn out to be, so you’re warned in case you’d like to delve into this topic. Meanwhile, I’ll keep winging it on my blog with some version of a ‘global English’ and inadvertent typos and grammar missteps…

Riffling through readability metrics

I was interviewed recently about my ontology engineering textbook, following having won the 2021 UCT Open Textbook Award for it. The interviewer assumed initially it was a textbook for undergraduate students because it has the word ‘Introduction’ in the title. Not quite. Soon thereafter, one of the 3rd-year computer science students who arrived early in class congratulated me on the award and laughed that that was an introduction at a different level altogether. It is, by design, but largely so with respect to the topics covered: it does not assume the reader knows anything about ontologies—hence, the ‘introduction’—but it does take for granted that the reader knows some of the basics in computer science or software engineering. For instance, there’s no explanation on what a database is, or a conceptual data model, or object-oriented software.

In addition, and getting to this post’s topic, I had tried to make the textbook readable, and at least definitely more accessible than the scientific papers and handbooks that were the only alternatives before this textbook saw the light of day. I think it is readable and I also have received feedback that the book was easily readable. Admittedly, though, the notion of assessing readability only came to the fore in the editing process of my memoir, for it is aimed at a broader audience than the textbook. This raised a nagging question. What is it that makes some text readable?

It’s one of those easy questions that just do not have a simple answer. The quickest answer is “use a readability metric standardised by grade level” for a home language/mother tongue speaker. Scratching that surface, it lays bare the next question: what parameters have to be taken into account in what way so as to come up with a score for the estimated grade level? Even the brief overview on the Wikipedia page on readability already lists 11 measurable parameters, and there are different ways to measure them and to possibly combine them as well. The same page lists 8 popular metrics and 4 advanced ones. That’s just for English. For instance, the Flesch reading ease is calculated as

206.835 - 1.015 * (total number of words / total number of sentences) - 84.6 * (total number of syllables / total number of words)

to result in rough bands of reading ease. For instance, 90-100 for an 11-year-old, 60-70 as ‘plain English’, up to anything <30 down to 0 (and possibly even negative) for very to extremely difficult English texts and for professionals and graduate students. See also the figure.

Figure: A rough categorisation of various texts for adults according to their respective Flesch reading ease scores.
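As a minimal sketch of the Flesch reading ease formula given above: the first function applies the formula to counts you supply yourself, and the second estimates those counts from raw text. The syllable counter (one syllable per run of vowels) and the sentence splitter are my own crude assumptions for illustration, not what any particular readability tool does, so scores will differ from those of published calculators.

```python
import re


def flesch_reading_ease(n_words: int, n_sentences: int, n_syllables: int) -> float:
    """Flesch reading ease from raw counts, using the formula quoted in the post."""
    if n_words <= 0 or n_sentences <= 0:
        raise ValueError("need at least one word and one sentence")
    return 206.835 - 1.015 * (n_words / n_sentences) - 84.6 * (n_syllables / n_words)


def count_syllables(word: str) -> int:
    """Crude heuristic (an assumption): one syllable per run of consecutive vowels."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def flesch_from_text(text: str) -> float:
    """Estimate the score from text with naive sentence and word splitting."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return flesch_reading_ease(len(words), len(sentences), syllables)


# Illustrative counts only: 100 words, 5 sentences, 150 syllables.
print(round(flesch_reading_ease(100, 5, 150), 1))  # 59.6
print(round(flesch_from_text("The cat sat. The dog ran."), 2))  # 119.19
```

The second example shows that very short sentences of monosyllabic words can score above 100; the formula has no built-in upper bound.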

The Gunning fog index has fewer fantastically tweaked multipliers:

Grade level = 0.4 * (average sentence length + percentage of Hard Words)

but there’s a wonderful Hard Words variable. What is that supposed to mean exactly? The readability page says that they are those words with two or more syllables, but the Gunning fog index page says three or more syllables (excluding proper nouns, familiar jargon, or compound words, and not counting common suffixes either).
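To see how much the definition of Hard Words matters, here is a minimal sketch of the Gunning fog formula with the syllable threshold as a parameter. The vowel-run syllable counter is a crude assumption of mine, and the sketch ignores the exclusions for proper nouns, jargon, compound words, and suffixes, so treat the numbers as indicative only. The sample text is the staccato passage this post uses further down when discussing how to game the metrics.

```python
import re


def count_syllables(word: str) -> int:
    """Crude heuristic (an assumption): one syllable per run of consecutive vowels."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def gunning_fog(text: str, hard_word_syllables: int = 3) -> float:
    """Grade level = 0.4 * (average sentence length + percentage of hard words)."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("need at least one word and one sentence")
    hard = [w for w in words if count_syllables(w) >= hard_word_syllables]
    average_sentence_length = len(words) / len(sentences)
    percentage_hard = 100 * len(hard) / len(words)
    return 0.4 * (average_sentence_length + percentage_hard)


passage = "Really. It is as I say. Don't you think? It's the way I see it. What say you?"
print(round(gunning_fog(passage, hard_word_syllables=3), 2))  # 1.44
print(round(gunning_fog(passage, hard_word_syllables=2), 2))  # 3.66
```

With the three-syllable threshold there are no hard words among the 18 words in 5 sentences, so the index is just 0.4 * 3.6; with the two-syllable threshold, ‘Really’ alone more than doubles it.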

Either way, the popular metrics are all easy to measure computationally without human intervention. Parameters such as fatigue or speed of perception or background knowledge are not. Proxies for reading speed surely will be available by now somewhere; e.g., in the form of algorithms that analyse page-turning in eBook readers and a visitor’s behaviour scrolling webpages when reading a long article (the system likely knows that you probably won’t finish reading this post).

I don’t know why I never thought about all that before writing the textbook and why none of the writing guidelines I have looked up over the years had mentioned it. The most I did for readability, especially when I was writing my PhD thesis, was the “read aloud test” that was proposed in one of those writing guidelines: read your text aloud, and if you can’t, then something is wrong with the sentence. I used the Acrobat built-in screen reader for that as a first pass. If the text-to-speech algorithm stumbled over it, then it was time to reconsider the phrasing. I would then read it aloud myself and decide whether the Acrobat algorithm had to be improved upon or my sentence had to be revised.

How does the ontology engineering textbook fare? Are my blog posts any more readable? How much worse are the scientific papers? Is it true that the English in articles in science is a sort of pidgin English whereas in other fields, notably humanities, the erudition and wordsmithery shine through in the readability metrics scores? I have no good answers now, but it would be easy to compute with a fine dataset of texts and the Python py-readability-metrics module for some quick ‘n dirty checks or to adapt some other open source code for batch processing (e.g., from here, among multiple options). Maybe later; there are some other kinks to straighten first.

Notably, one can game the system based on some of those key parameters. Besides sentence length (around 20 words is fine, I was told a long while ago), the number of syllables of the words and the vocabulary are taken into account. More monosyllabic words in shorter sentences with fewer types will come out as more easily readable, according to the metric that is.

But ‘easier’ or ‘better’ lies in the eyes of the beholder: it may be such confetti so as to have become awful to read due to its lack of flow and coherence. Really. It is as I say. Don’t you think? It’s the way I see it. What say you? The “Really. … you?” passage has a Flesch reading ease of 90.38 and a Gunning Fog index of 1.44, the latter being the number of years of formal education you would have needed to easily understand it. The “Notably, … and coherence” text before it has a Flesch reading ease of 50.52 and a Gunning Fog index of 13.82.

Based on random sampling from my textbook, at least one of the paragraphs (p34, ‘purposes’) got a Flesch reading ease of 9.29 and a Gunning Fog index of 22.73, while other parts are around 30 and some are even in the 50-70 region for reading ease.

The illustration out of the way, let’s look at limitations. First, not all polysyllabic words are difficult and not all monosyllabic words are simple; e.g., the common, and therewith easy, ‘education’ and ‘interesting’ vs. the semi-obscure ‘nub’, ‘sloop’, ‘gry’, and ‘squick’ (more here). The longest monosyllabic words, such as ‘scraunched’ and ‘strengthed’, aren’t exactly easy to read either.

Plenty of other languages have predominantly polysyllabic words with lots of syllables, such as Dutch or German where new words can be formed by putting existing ones together. The Dutch word meervoudigepersoonlijkheidsstoornis puts together into one concept meervoudige and persoonlijkheid and stoornis (‘multiple personality disorder’). Agglutinating languages, such as isiZulu, not only compose long words, but have so many meaningful pieces that a single word may well be a whole sentence in a disjunctive language. For instance, the 10-syllabic word that one of my former students used to make the point: titukakimureeterahoganu ‘we have never ever brought it to him’. You get used to long words and there’s no reason why English speakers would be inherently incapable of handling that. Intelligence does not depend on one’s mother tongue. Perhaps, if one is used to a disjunctive orthography, one may have become lazy. Any use of the aforementioned readability metrics for ‘non-English’ clearly will have to be revised to tailor it to a language.

Then there’s foreign language background that interferes with reading ease. Many a supposedly ‘difficult’ word in English comes from French, Italian, Latin, or Greek; e.g., oxymoron (Gr), camaraderie (Fr), quotidian (It), and obfuscate (La). For instance, we use oxymoron in Dutch as well, so there’s no ‘difficulty’ to it for a Dutch person, or take maalstroom that is pronounced nearly the same as ‘maelstrom’ and demagoog for ‘demagogue’ (also Greek origins, similar pronunciation) and algorithme for ‘algorithm’ (Persian origins, not an Anglicism), and recalcitrant is even spelled the same. The foreigner trying to speak or write English may not be erudite, but just winging it and hoping that the ‘copy and adapt’ works out. Conversely, supposedly ‘simpler’ words may not be: ‘wayward’ is a synonym for recalcitrant and, with only two syllables, it will make the readability score better. It would make the text less readable to at least Dutch, Spanish, Italian and so on readers who are trying to read English text, however, because there’s no connection with a familiar-looking word. About 80% of English words are borrowed from other languages.

Be that as it may, maybe I should reassess my textbook on the metric; maybe not. What does the algorithm know about computer science terminology anyhow? “Ontology Engineering is a specialisation in knowledge representation and reasoning.” has a Flesch reading ease of -31.73 and a Gunning Fog index of 20.00; a tough game it would be to get that back to a reading ease of 50.
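Those two scores can be reproduced by hand, which shows how heavily a handful of long terms weighs on a short sentence. The syllable counts are my own tally (4, 4, 1, 1, 5, 1, 2, 5, 1, 3 for the ten words in order), and I am assuming the calculator does not treat ‘reasoning’ as a Hard Word because the common suffix -ing is not counted, which leaves four Hard Words: ontology, engineering, specialisation, and representation.

```latex
\begin{align*}
\text{words} &= 10, \qquad \text{sentences} = 1, \qquad
\text{syllables} = 4+4+1+1+5+1+2+5+1+3 = 27 \\
\text{Flesch reading ease} &= 206.835 - 1.015 \cdot \frac{10}{1} - 84.6 \cdot \frac{27}{10}
  = 206.835 - 10.15 - 228.42 = -31.735 \\
\text{Gunning Fog} &= 0.4 \cdot \left( \frac{10}{1} + 100 \cdot \frac{4}{10} \right)
  = 0.4 \cdot 50 = 20.00
\end{align*}
```

The syllable term alone subtracts more than the constant 206.835 adds, so no amount of sentence shortening would bring this sentence near a reading ease of 50; only replacing the terminology would.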

It did affect a number of sentences in my memoir book. I don’t expect Joe and Joanne Soap to be interested, but teenagers who are shopping around for a university degree programme might, and then professionals, students, and academics with a little spare time to relax and read, too. In other words: a reading ease of around 40-60. Some long sentences could indeed be split up without losing content, coherence, and flow.

There were others where the simplification didn’t feel like an improvement. For instance, compare “according to my opinion” with “the way I saw it”: the former flows smoothly whereas the latter sounds like a nagging firing off. The latter for sure improves the readability score with all those monosyllabic words. The copy editor changed the former into the latter. It still bugs me. Why? After some further pondering beyond just blaming the grating staccato of a sequence of monosyllabic words, perhaps it is because an opinion generally is (though need not be) formed after considering the facts and analysing them, whereas seeing something in some way may (but definitely need not) be based on facts and analysis. That is, on closer inspection, they’re not equivalent phrases, not at all. Nuances can be, and were, lost with shorter sentences and simpler words. One’s voice, too. So there’s that. Overall, though, I hope the balance leans toward more readable, to get the message across better to more readers.

Lastly, there seems to be plenty of scope for more research on readability metrics, at least on those that can be computed. While there are several applications for other well-resourced languages, including easy web apps, such as for Spanish and German and even for Dutch, there are very many languages spoken around the globe that do not have such metrics and nice algorithms yet. But even the readability metrics for English could be tweaked. For instance, to tailor it to a genre or a discipline. Then it would be easier to determine if a book is, say, an easy-reading popular science book for the holidays on the beach or one that requires some or even a lot of effort. For computer science, one could take Gunning Fog and adjust the Hard Words variable to exclude common jargon that is detrimental to the score, like ‘encapsulation’ and ‘representation’ (both 5 syllables); biochemistry would need that too, given the long names for chemical compounds. And one could add a penalty for too many successive monosyllabic words. There will be more options to tweak the formulae and test them, but such additional digging is something for another time.
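A rough sketch of what such a tweak could look like, purely as an illustration: a Gunning Fog variant that skips a discipline-specific jargon list when counting Hard Words and adds a penalty for long runs of monosyllabic words. The function name, the run threshold of 6, and the penalty of 0.5 grade levels per run are all hypothetical values I picked for the example, not validated ones, and the syllable counter is the same crude vowel-run heuristic one would want to replace with something better.

```python
import re


def count_syllables(word: str) -> int:
    """Crude heuristic (an assumption): one syllable per run of consecutive vowels."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def adjusted_fog(text, jargon=frozenset(), max_mono_run=6, run_penalty=0.5):
    """Hypothetical Gunning Fog variant: jargon exclusion plus a staccato penalty."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("need at least one word and one sentence")
    syllables = [count_syllables(w) for w in words]

    # Hard Words: three or more syllables, unless listed as familiar jargon.
    hard = sum(1 for w, s in zip(words, syllables) if s >= 3 and w.lower() not in jargon)
    base = 0.4 * (len(words) / len(sentences) + 100 * hard / len(words))

    # Count each run of monosyllabic words longer than max_mono_run once.
    # Runs are allowed to span sentence boundaries in this sketch.
    long_runs, run = 0, 0
    for s in syllables:
        run = run + 1 if s == 1 else 0
        if run == max_mono_run + 1:
            long_runs += 1
    return base + run_penalty * long_runs


cs_jargon = frozenset({"encapsulation", "representation"})
print(round(adjusted_fog("Encapsulation aids representation."), 2))  # 27.87
print(round(adjusted_fog("Encapsulation aids representation.", jargon=cs_jargon), 2))  # 1.2
staccato = "Really. It is as I say. Don't you think? It's the way I see it. What say you?"
print(round(adjusted_fog(staccato), 2))  # 1.94
```

Whether a flat penalty per run is the right shape, or whether it should grow with the length of the run, is exactly the kind of thing that would need testing against readers.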

As to my question in the introductory paragraph of this post, “What is it that makes some text readable?”: if you’ve made it all the way here reading this post, we’re all a bit wiser on readability, but a short and simple answer I still don’t have. It’s a long story with ifs and buts, and the last word is yet to be said about it.

As a bonus, here are a few hints to make something more readable, according to the readability calculator of the web-based editor tool of The Conversation:

Screenshot I took halfway through working on an article for The Conversation.

p.s.: The ‘science of reading’ adds more to it, to the point you wonder how there even can be metrics. But its scope is broader.

pp.s.: The first full draft of this post had a reading ease of 52.37 and a Gunning Fog of 11.78, and the final one 54.37 and 11.18, respectively, which is fine by me. Length is probably more of an issue.