Good girls, bold girls – but not böse

That first sentence of a book, including non-fiction books, may set the tone for what’s to come. For my memoir, it’s a translation of Brave meisjes komen in de hemel, brutale overal: good girls go to heaven, bold ones go everywhere.

I had read a book with that title some 25 years ago. It was originally written by Ute Ehrhardt in 1994 and translated from German to Dutch and published a year later. For the memoir, I had translated the Dutch title of the book into English myself: the brutale translates to ‘bold’ according to me, my dictionary (a Prisma Woordenboek hard copy), and an online dictionary. Bold means “(of a person, action, or idea) showing a willingness to take risks; confident and courageous.” according to the Oxford dictionary (and similarly here) and it’s in the same league as audacious, daring, brazen, and perky. It has a positive connotation.

What I, perhaps, ought to have done last year, is to find out whether the book also had been translated into English and trust that translator. As it turned out, I’m glad I did not do so, which brings me to the more substantive part of the post. I wanted to see whether I could find the book in order to link it in this post. I did. Interestingly, the word used in the English title was “bad” rather than ‘bold’, yet brutaal is not at all necessarily bad, nor is the book about women being bad. Surely something must have gotten warped in translation there?!

I took the hard copy from the bookshelf and checked the fine print: it listed the original German title as Gute Mädchen kommen in den Himmel, böse überall hin. Hm, böse is not good. It has 17 German-to-English translations and none is quite as flattering as bold, not at all. This leaves either bad translations to blame or a semantic shift in the German-to-Dutch translation. Considering the former first, it appeared that the German-Dutch online dictionary did not offer nice Dutch words for böse either. Getting up from my chair again to consult my hard copy Prisma German-Dutch dictionary did not pay off either, except for one, maybe (ondeugend). It does not even list brutaal as a possible translation. Was the author, Dr Ehrhardt of the Baby Boomer generation, still so indoctrinated in the patriarchy and Christianity – Gute vs Das Böse – as to think that not being a smiling nice girl must mean being böse? The term did not hold back the Germans, by the way: it was the best-selling non-fiction book in Germany in 1995, my Dutch copy stated. Moreover, it turned out to be in second place overall since German book sales counting started 60 years ago, including having been a whopping 107 weeks in first place in the Spiegel bestseller list. What’s going on here? Would the Germans be that interested in ‘bad’ girls? Not quite. The second option applies, i.e., the semantic shift for the Dutch translation.

The book’s content is not about bad, mean, or angry women at all and the subtitle provides a further hint to that: waarom lief zijn vrouwen geen stap verder brengt ‘why being nice won’t get women even one step ahead’. Instead of being pliant, submissive, and self-sabotaging in several ways, and therewith having our voices ignored, contributions downplayed, and being passed over for jobs and promotions, it seeks to give women a kick in the backside in order to learn to stand their ground and it provides suggestions to be heard and taken into account by avoiding the many pitfalls. Our generation of children of the Baby Boomers would improve the world better than those second wave feminists tried to do, and this book fitted right within the Zeitgeist. It was the girl power decade in the 1990s, where women took agency to become master of their own destiny, or at least tried to. The New Woman – yes, capitalised in the book. Agent Dana Scully of the X Files as the well-dressed scientist and sceptic investigator. Buffy the vampire slayer. Xena, Warrior Princess. The Spice Girls. Naomi Wolf’s Fire with Fire (that, by the way, wasn’t translated into Dutch). Reading through the book again now, it comes across as a somewhat dated use-case-packed manifesto about the pitfalls to avoid and how to be the architect of your own life. That’s not being bad, is it?

I suppose I have to thank the German-to-Dutch book translator Marten Hofstede for putting a fitting Dutch title to the content of the book. It piqued my interest in the bookstore at the train station, and I bought and read it in what must have been 1997. It resonated. To be honest, if the Dutch title had used any of the listed translations in the online dictionary – such as kwaad, verstoord, and nijdig – then I likely would not have bought the book. Having had to be evil or perpetually angry to go everywhere, anywhere and upward would have been too steep a price to pay. Luckily, bold was indeed the right attribute. Perhaps for the generation after me, i.e., who are now in their twenties, it’s not about being bold but about being, as a normal way of outlook and interaction in society. Of course a woman is entitled to live her own life, as any human being is.

English, Englishes – which one to use for writing?

Sometimes, the answer to the question in the post’s title is easy, if you’re writing in English: do whatever the style guide says. Don’t argue with the journal editor or typesetter about that sort of trivia (unless they’re very wrong). If it states American English spelling, do so; if British English, go for that. If you can’t distinguish your color from colour, modeling from modelling, and a faucet from a tap, use a spellchecker with one of the Englishes on offer—even OpenOffice Writer shows red wavy lines under ‘color’, ‘modeling’, and ‘faucet’ when it’s set to my default “English (South Africa)”. There are very many other places where you can write in English as much as you like or have time for, however, and then the blog post’s question becomes more relevant. How many Englishes or somehow accepted recognised variants of English exist, and where does it make a difference in writing such that you’ll have to, or are supposed to, choose?

It begs the question of how many variants of English count as one of the Englishes, which is tricky to answer, because it depends on what counts. Does a dialect count? Does it count when it’s sanctioned by a country when it has an official language status and a language body? Does it count when there are enough users? Or when there’s enough text to detect the substantive differences? What are the minimum number or type of differences, if any, and from which standard, before one may start to talk of different Englishes and a new spin-off X-English? People have been deliberating about such matters and trying to document differences and even have come up with classification schemes. Englishes around the world, to be more precise, refer to localised or indigenised versions of English that are either those people’s first or institutionalised language, not just any variant or dialect. There’s an International Association for World Englishes (IAWE) and there are handbooks, textbooks, and scientific journals about it, and the 25th conference of the IAWE will take place next year.

In recent years there have been suggestions that English could break up into mutually unintelligible languages, much as Latin once did. Could such a break-up occur, or are we in need of a new appreciation of the nature of World English?

Tom McArthur, 1987, writing from “the mother country”, but not “the centre of gravity”, of English (pdf).

My expertise doesn’t go that far – I’m operating from the consumer-side of these matters, standards-following, and trying to not make too many mistakes. It took me a while to figure out there was British English (BE) and American English (AE) and then it was a matter of looking up rules on spelling differences, like -ise vs. -ize and single vs. double l (e.g., traveling vs. travelling), checking comparative word lists, and other varied differences, like whether it’s ‘towards’ or ‘toward’ or 15:30, 15.30, 3.30pm or 3:30pm (or one of my colleagues’ p’s, like a 3.30p). Not to mention a plethora of online writing guides and Steven Pinker’s comprehensive book The Sense of Style. Let’s explore the Englishes and Global English a little.

McArthur’s Englishes (source)

South African English (SAE) exists as one of the recognised Englishes, all the way into internationally reputable dictionaries. It is a bit of a mix of BE and AE, with some spices sprinkled into it. It tries to follow BE but there are AE influences due to the media and, perhaps, anti-colonial sentiment. It’s soccer, not football, for instance, and the 3.30pm variant rather than a 24h clock. Well, I’m not sure it is officially, but practically it is so. It also has ‘weird’ words that everyone is convinced are native English of the BE variety, but aren’t, like timeously rather than timeous or timely – the most I could find was a Wiktionary entry on it claiming it to be Scottish and SAE, but not even the Dictionary of SAE (DSAE) has an entry for it. I’ve seen it so often in work emails over the years that I caved in and use it as well. There are at least a handful of SAE words that people in South Africa think are BE but aren’t, as any SA expat will be able to recall when they get quizzical looks overseas. Then there are hundreds of words that people know are SAE at least unofficially, which are mainly the loan words and adopted words from the 10 other languages spoken in SA – regional overlap causes mutual language influences in all directions. Bakkie, indaba, veld, lekker, dagga, and many more – I’ve blogged about that before. My OpenOffice SAE spellchecker doesn’t flag any of these words as typos.

Arguably, also grammatical differences for SAE exist. In practice they sure do, but I’m not aware of anything officially endorsed. There is no ‘benevolent language dictator’ with card-carrying members of the lexicography and grammar police to endorse or reprimand. Indeed there is the Pan-South African Language Board (PANSALB), but its teeth and thunder don’t come close to the likes of the Académie Française or Real Academia Española. Regarding grammar, that previous post already mentioned the case of the preposition at the end of a sentence when it’s a separable part of the verb in Afrikaans, Dutch, and German (e.g., meenemen or mitnehmen ‘take with’). A concoction that still makes me wince each time I hear or read it, is the ‘can be able to’. It’s either can + verb what you can, or copula + able to + verb what you can do. It is, e.g., ‘I can carry out the experiment’ or ‘I’m able to carry out the experiment’, but not ‘I can be able to carry out the experiment’. I suspect it carries over from a verb form in Niger-Congo B languages since I’ve heard it used also by at least Tanzanians, Kenyans, and Malawians, and meanwhile I’ve occasionally seen it also in texts written by English South African students.

If the notion of “Englishes” feels uncomfortable, then what about Global/World/International English? Is there one? For many a paper I review double-blind, i.e., where the author names and affiliations are hidden, I can’t tell unless the English is really bad. I’ve read enough to be able to spot Spanglish or Chinglish, but mostly I can’t tell, in that there’s a sort of bland scientific English – be it a pidgin English, or maybe multiple authors cancel out ways of making mistakes, or no-one really bothers to tear the vocabulary apart into their boxes because it’s secondary to the scientific content being communicated. No doubt investigative deliberations are ongoing about that too; if there aren’t, there ought to be.

Another scenario for ‘global English’ concerns how to write a newsletter to a global audience. For instance, if you were to visit a website with an intended audience in the USA, then it should be tolerable to read “this fall”, even though elsewhere it’s either autumn, spring, a rainy or a dry season. If it’s an article by the UN, say, then one may expect a different wording that is either not US-centric or, if the season matters, qualified, as in “Covid-19 cases are expected to rise during fall and winter in North America”. With the former wording, you can’t please everyone, due to different calendars with different month names and year ends and different seasons. The question also came up recently for a Wikimedia blog post on Abstract Wikipedia progress for its natural language generation component, where I was involved sideways in a draft version. My tendency was toward(s) a Global English, whereas one of my collaborators’ stance was that they assumed a rule that it should be the English of wherever the organisation’s headquarters is located. These choices were also confusing when I was writing the first draft of my memoir: it was published by a South African publisher, hence, SAE style guidelines, but the book is also distributed – and read! – internationally.

Without clear rules, there will always be people who complain about your English, be it either that you’re wrong or just not in the inner circle for sensing ‘the feeling of the language that only a native speaker can have’, that supposedly inherently unattainable fingerspitzengefühl for it. No clear rules isn’t good for developing spelling and grammar checkers either. In that regard, and that one only, perhaps I just might prefer a benevolent dictator. I don’t even care which of the Englishes (except for not the stupid stuff like spelling ‘light’ as ‘lite’, ffs). I also fancy the idea of banding together with other ‘nonfirst-language’ speakers of English to start devising and dictating rules, since the English speakers can’t seem to sort out their own language – at least not enough like the grammatically richer languages – and we’re in the overwhelming majority in numbers (about 1:3 apparently). One can dream.

As to the question in the title of the blog post: what I’ve written so far is not a clear answer for all cases, indeed, in particular when there is no editorial house style dictating it, but this lifting of the veil hopefully has made clear that attempting to answer the question means opening up that can of worms further. You could create your own style guide for your not-editor-policed writings. The more I read about it, though, the more complicated things turn out to be, so you’re warned in case you’d like to delve into this topic. Meanwhile, I’ll keep winging it on my blog with some version of a ‘global English’ and inadvertent typos and grammar missteps…

Riffling through readability metrics

I was interviewed recently about my ontology engineering textbook, following having won the 2021 UCT Open Textbook Award for it. The interviewer assumed initially it was a textbook for undergraduate students because it has the word ‘Introduction’ in the title. Not quite. Soon thereafter, one of the 3rd-year computer science students who arrived early in class congratulated me on the award and laughed that that was an introduction at a different level altogether. It is, by design, but largely so with respect to the topics covered: it does not assume the reader knows anything about ontologies—hence, the ‘introduction’—but it does take for granted that the reader knows some of the basics in computer science or software engineering. For instance, there’s no explanation on what a database is, or a conceptual data model, or object-oriented software.

In addition, and getting to this post’s topic, I had tried to make the textbook readable, and at least definitely more accessible than scientific papers and handbooks that were the only alternatives before this textbook saw the light of day. I think it is readable and I also have received feedback that the book was easily readable. Admittedly, though, the notion of assessing readability only came to the fore in the editing process of my memoir, for it is aimed at a broader audience than the textbook. This raised a nagging question. What is it that makes some text readable?

It’s one of those easy questions that just do not have a simple answer. The quickest answer is “use a readability metric standardised by grade level” for a home language/mother tongue speaker. Scratching that surface, it lays bare the next question: what parameters have to be taken into account in what way so as to come up with a score for the estimated grade level? Even the brief overview on the Wikipedia page on readability already lists 11 measurable parameters, and there are different ways to measure them and to possibly combine them as well. The same page lists 8 popular metrics and 4 advanced ones. That’s just for English. For instance, the Flesch reading ease is calculated as

206.835 – 1.015 * (total number of words / total number of sentences) – 84.6 * (total number of syllables / total number of words)

to result in rough bands of reading ease. For instance, 90-100 for an 11-year-old, 60-70 as ‘plain English’, up to anything <30 down to 0 (and possibly even negative) for very to extremely difficult English texts and for professionals and graduate students. See also the figure on the right.

A rough categorisation of various texts for adults according to their respective Flesch reading ease scores. Source: https://blog.cathy-moore.com/2017/07/how-to-get-everyone-to-write-like-ernest-hemingway/.
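To make the Flesch reading ease formula concrete, here is a minimal Python sketch. The sentence splitter, the word pattern, and especially the vowel-group syllable counter are my own rough assumptions, not part of the metric's definition; a tool with a proper syllable dictionary will give somewhat different scores.

```python
import re

def count_syllables(word: str) -> int:
    """Rough estimate (assumption): one syllable per group of consecutive vowels."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    # Treat a trailing 'e' as silent ('state' -> 1), but keep '-le' and '-ee' endings.
    if word.endswith("e") and not word.endswith(("le", "ee")) and count > 1:
        count -= 1
    return max(count, 1)

def flesch_reading_ease(text: str) -> float:
    """206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)."""
    # Naive splitting: a sentence ends at one or more of . ! ?
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    # Words are runs of letters, optionally with inner apostrophes (e.g. "don't").
    words = re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)*", text)
    if not sentences or not words:
        raise ValueError("text needs at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    return 206.835 - 1.015 * (len(words) / len(sentences)) - 84.6 * (syllables / len(words))

if __name__ == "__main__":
    print(round(flesch_reading_ease("The cat sat."), 2))
```

Short sentences of monosyllabic words land at the top of the scale, and a single long sentence full of polysyllabic terms can drop below zero.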

The Gunning fog index has fewer fantastically tweaked multipliers:

Grade level = 0.4 * (average sentence length + percentage of Hard Words)

but there’s a wonderful Hard Words variable. What is that supposed to mean exactly? The readability page says that they are those words with two or more syllables, but the Gunning fog index page says three or more syllables (excluding proper nouns, familiar jargon, or compound words, and not counting common suffixes either).
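The Gunning fog formula can be sketched in the same way. Because the two pages disagree on what a Hard Word is, the sketch below exposes the syllable threshold as a parameter (`hard_word_syllables`, a name I made up), defaulting to three. It deliberately ignores the exclusions for proper nouns, familiar jargon, compound words, and common suffixes, and it relies on an assumed vowel-group syllable estimate, so treat the numbers as indicative only.

```python
import re

def count_syllables(word: str) -> int:
    """Rough estimate (assumption): one syllable per group of consecutive vowels."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    # Treat a trailing 'e' as silent, but keep '-le' and '-ee' endings.
    if word.endswith("e") and not word.endswith(("le", "ee")) and count > 1:
        count -= 1
    return max(count, 1)

def gunning_fog(text: str, hard_word_syllables: int = 3) -> float:
    """Grade level = 0.4 * (average sentence length + percentage of Hard Words)."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)*", text)
    if not sentences or not words:
        raise ValueError("text needs at least one sentence and one word")
    # Simplification: every word at or above the threshold counts as hard.
    hard = [w for w in words if count_syllables(w) >= hard_word_syllables]
    average_sentence_length = len(words) / len(sentences)
    percentage_hard = 100 * len(hard) / len(words)
    return 0.4 * (average_sentence_length + percentage_hard)

if __name__ == "__main__":
    sample = "Ontology Engineering is a specialisation in knowledge representation and reasoning."
    print(gunning_fog(sample), gunning_fog(sample, hard_word_syllables=2))
```

Running it with both thresholds on the same text shows how much the estimated grade level depends on that one definition.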

Either way, the popular metrics are all easy to measure computationally without human intervention. Parameters such as fatigue or speed of perception or background knowledge are not. Proxies for reading speed surely will be available by now somewhere; e.g., in the form of algorithms that analyse page-turning in eBook readers and a visitor’s behaviour scrolling webpages when reading a long article (the system likely knows that you probably won’t finish reading this post).

I don’t know why I never thought about all that before writing the textbook and why none of the writing guidelines I have looked up over the years had mentioned it. The most I did for readability, especially when I was writing my PhD thesis, was the “read aloud test” that was proposed in one of those writing guidelines: read your text aloud, and if you can’t, then something is wrong with the sentence. I used the Acrobat built-in screen reader for that as a first pass. If the text-to-speech algorithm stumbled over it, then it was time to reconsider the phrasing. I would then read it aloud myself and decide whether the Acrobat algorithm had to be improved upon or my sentence had to be revised.

How does the ontology engineering textbook fare? Are my blog posts any more readable? How much worse are the scientific papers? Is it true that the English in articles in science is a sort of pidgin English whereas in other fields, notably humanities, the erudition and wordsmithery shine through in the readability metrics scores? I have no good answers now, but it would be easy to compute with a fine dataset of texts and the Python py-readability-metrics module for some quick ‘n dirty checks or to adapt some other open source code for batch processing (e.g., from here, among multiple options). Maybe later; there are some other kinks to straighten first.

Notably, one can game the system based on some of those key parameters. Besides sentence length – around 20 words is fine, I was told a long while ago – there are the number of syllables of the words and the vocabulary that are taken into account. More monosyllabic words in shorter sentences with fewer types will come out as more easily readable, according to the metric that is.

But ‘easier’ or ‘better’ lies in the eyes of the beholder: it may be such confetti so as to have become awful to read due to its lack of flow and coherence. Really. It is as I say. Don’t you think? It’s the way I see it. What say you? The “Really. … you?” has a Flesch reading ease of 90.38 and a Gunning Fog index of 1.44 as the number of years of formal education you would have needed to easily understand that. The “Notably, … and coherence” before it in this paragraph has a Flesch reading ease of 50.52 and a Gunning Fog index of 13.82.

Based on random sampling from my textbook, at least one of the paragraphs (p34, ‘purposes’) got a Flesch reading ease of 9.29 and a Gunning Fog index of 22.73, while other parts are around 30 and some are even in the 50-70 region for reading ease.

The illustration out of the way, let’s look at limitations. First, not all polysyllabic words are difficult and not all monosyllabic words are simple; e.g., the common, and therewith easy, ‘education’ and ‘interesting’ vs. the semi-obscure ‘nub’, ‘sloop’, ‘gry’, and ‘squick’ (more here). The longest monosyllabic words, such as ‘scraunched’ and ‘strengthed’, aren’t exactly easy to read either.

Plenty of other languages have predominantly polysyllabic words with lots of syllables, such as Dutch or German where new words can be formed by putting existing ones together. The Dutch word meervoudigepersoonlijkheidsstoornis puts together into one concept meervoudige and persoonlijkheid and stoornis (‘multiple personality disorder’). Agglutinating languages, such as isiZulu, not only compose long words, but have so many meaningful pieces that a single word may well be a whole sentence in a disjunctive language. For instance, the 10-syllabic word that one of my former students used to make the point: titukakimureeterahoganu ‘we have never ever brought it to him’. You get used to long words and there’s no reason why English speakers would be inherently incapable of handling that. Intelligence does not depend on one’s mother tongue. Perhaps, if one is used to a disjunctive orthography, one may have become lazy. Any use of the aforementioned readability metrics for ‘non-English’ clearly will have to be revised to tailor it to a language.

Then there’s foreign language background that interferes with reading ease. Many a supposedly ‘difficult’ word in English comes from French, Italian, Latin, or Greek; e.g., oxymoron (Gr), camaraderie (Fr), quotidian (It), and obfuscate (La). For instance, we use oxymoron in Dutch as well, so there’s no ‘difficulty’ to it for a Dutch person, or take maalstroom that is pronounced nearly the same as ‘maelstrom’ and demagoog for ‘demagogue’ (also Greek origins, similar pronunciation) and algorithme for ‘algorithm’ (Persian origins, not an Anglicism), and recalcitrant is even spelled the same. The foreigner trying to speak or write English may not be erudite, but just winging it and hoping that the ‘copy and adapt’ works out. Conversely, supposedly ‘simpler’ words may not be: ‘wayward’ is a synonym for recalcitrant and with only two syllables, it will make the readability score better. It would make it less readable to at least Dutch, Spanish, Italian and so on readers who are trying to read English text, however, because there’s no connection with a familiar-looking word. About 80% of English words are borrowed from other languages.

Be that as it may, maybe I should reassess my textbook on the metric; maybe not. What does the algorithm know about computer science terminology anyhow? “Ontology Engineering is a specialisation in knowledge representation and reasoning.” has a Flesch reading ease of -31.73 and a Gunning Fog index of 20.00; a tough game it would be to get that back to a reading ease of 50.

It did affect a number of sentences in my memoir book. I don’t expect Joe and Joanne Soap to be interested, but teenagers who are shopping around for a university degree programme might, and then professionals, students, and academics with a little spare time to relax and read, too. In other words: a reading ease of around 40-60. Some long sentences could indeed be split up without losing content, coherence, and flow.

There were others where the simplification didn’t feel like an improvement. For instance, compare “according to my opinion” with “the way I saw it”: the former flows smoothly whereas the latter sounds like a nagging firing off. The latter for sure improves the readability score with all those monosyllabic words. The copy editor changed the former into the latter. It still bugs me. Why? After some further pondering beyond just blaming the grating staccato of a sequence of monosyllabic words, perhaps it is because an opinion generally is (though need not be) formed after considering the facts and analysing them, whereas seeing something in some way may (but definitely need not) be based on facts and analysis. That is, on closer inspection, they’re not equivalent phrases, not at all. Nuances can be, and were, lost with shorter sentences and simpler words. One’s voice, too. So there’s that. Overall, though, I hope the balance leans toward more readable, to get the message across better to more readers.

Lastly, there seems to be plenty of scope for more research on readability metrics – ones that can be computed, that is. While there are several applications for other well-resourced languages, including easy web apps, such as for Spanish and German and even for Dutch, there are very many languages spoken around the globe that do not have such metrics and nice algorithms yet. But even the readability metrics for English could be tweaked. For instance, to tailor it to a genre or a discipline. Then it would be easier to determine if a book is, say, an easy-reading popular science book for the holidays on the beach or one that requires some or even a lot of effort. For computer science, one could take Gunning Fog and adjust the Hard Words variable to exclude common jargon that is detrimental to the score, like ‘encapsulation’ and ‘representation’ (both 5 syllables); biochemistry would need that too, given the long names for chemical compounds. And to add a penalty for too many successive monosyllabic words. There will be more options to tweak the formulae and test them, but such additional digging is something for another time.
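As a thought experiment only, the two tweaks suggested above (excluding discipline jargon from the Hard Words count, and penalising long runs of monosyllabic words) could look roughly like the Python sketch below. Everything in it is hypothetical: the function name `adjusted_fog`, the `jargon` set, and the `run_length` and `run_penalty` values are illustrative guesses that have not been tested or calibrated against any corpus, and the syllable counter is the same crude vowel-group assumption as before.

```python
import re

def count_syllables(word: str) -> int:
    """Rough estimate (assumption): one syllable per group of consecutive vowels."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith(("le", "ee")) and count > 1:
        count -= 1
    return max(count, 1)

def adjusted_fog(text, jargon=frozenset(), run_length=6, run_penalty=0.5):
    """Gunning-fog-like score with two hypothetical tweaks:
    jargon words never count as hard, and each run of at least
    `run_length` successive monosyllabic words adds `run_penalty`."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)*", text)
    if not sentences or not words:
        raise ValueError("text needs at least one sentence and one word")
    hard = [w for w in words
            if count_syllables(w) >= 3 and w.lower() not in jargon]
    # Count runs of successive monosyllabic words (runs may span sentences here).
    runs, current = 0, 0
    for w in words:
        if count_syllables(w) == 1:
            current += 1
        else:
            if current >= run_length:
                runs += 1
            current = 0
    if current >= run_length:
        runs += 1
    base = 0.4 * (len(words) / len(sentences) + 100 * len(hard) / len(words))
    return base + run_penalty * runs

if __name__ == "__main__":
    cs_jargon = {"encapsulation", "representation"}  # illustrative list only
    print(adjusted_fog("Encapsulation hides state.", jargon=cs_jargon))
```

Whether a flat penalty per run is the right shape, or whether it should grow with the length of the run, is exactly the sort of thing that would need testing against texts that readers have actually rated.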

As to my question in the introductory paragraph of this post, “What is it that makes some text readable?”: if you’ve made it all the way here reading this post, we’re all a bit wiser on readability, but a short and simple answer I still don’t have. It’s a long story with ifs and buts, and the last word is yet to be said about it.

As a bonus, here are a few hints to make something more readable, according to the readability calculator of the web-based editor tool of The Conversation:

Screenshot I took about halfway through working on an article for The Conversation.

p.s.: The ‘science of reading‘ adds more to it, to the point you wonder how there even can be metrics. But, their scope is broader.

pp.s.: The first full draft of this post had a reading ease of 52.37 and a Gunning Fog of 11.78, and the final one 54.37 and 11.18, respectively, which is fine by me. Length is probably more of an issue.

A handful of memoirs and autobiographies for computer science

Since I published my second book, that memoir on a scenic route into computer science, several people have asked me “why?” and “what makes yours stand out from the crowd?”. The answer to the latter is easy: there is no crowd. (The brief answer to ‘why’ is mentioned in the Introduction chapter). Let me elaborate a little.

In the early stage of writing the book, I dutifully did do my market research to answer the typical starter questions like: What books in your genre or on your topic are already out there? How crowded is the field? Will your prospective book be just another one on that pile? Will it stand out as different? And if so, is that an interesting difference to at least some readership segment so that it will have potential to be sold beyond a close circle of friends and family? So, I searched and searched and searched, in late 2020 and again twice in 2021, and even now when writing this post. Memoirs by female computer scientists, by male computer scientists, whatever gender computer scientist in academia. Autobiographies as well then. I stretched the search criteria further, into the not-in-their-own-words biographies of computer science professors.

Collage made with the respective covers or first page of the memoir and autobiography books listed and linked here.

If you take your time searching for those books, you should be able to find the following four books and booklets of the memoir or autobiography variety, by computer science professors, on computing, computing milieux, or computer science:

  • James Morris’ memoir that was published in the same week as mine was in late 2021. It covers his 60-year career in computer science and, according to the book’s tweet-size blurb “is a search for intelligence across multiple facets of the human condition—religion and science, evolution, and innovation”.
  • The early years of academic computing professional memoir by Kenneth King made available in 2014 (free pdf).
  • The unpublished memoir by Ray Miller, on 50 years in computing (1953-1993), available online from the IEEE Computer Society as part of its computer history museum.
  • Maurice Wilkes’ hardcopy autobiography from 1985 that is, consequently, hard to access.

That’s all. Four retired (and some meanwhile deceased) computer science professors telling their tale, three of which cover only the early days of computing.

Collage made with the covers or first page of the quite related memoir and autobiography books listed and linked here.

There are a few very recent memoirs by professors that were in print or announced to go in print soon, on attendant topics, notably:

What there are lots of, are books about, and occasionally by, ‘celebrity’ people in IT and computing who made it in industry these days, such as Bill Gates, Steve Jobs, Elon Musk, Satya Nadella, and Sheryl Sandberg, and famous people in computing history, such as Ada Lovelace, Grace Hopper, George Boole, and Alan Turing (also about, not by). And there are short and long memoirs about tech by journalists and writers and by engineers and programmers who write, such as on Linux in Australia (here) or 10 years in Silicon Valley (here). There are also a few professional memoir essays and articles by computer science professors, such as about the development of the network time protocol by David Mills (here).

The people ‘out there’ – outside of the ivory tower of academia – do have lots of assumptions about computer science professors. When I mention to them that, yes, I’m one of those, at UCT even, a not uncommon reaction is an involuntary reflex of apprehension. The eyes move to a corner of the eye socket, the head turns a little and moves back, and the upper body follows, even if only slightly. I notice. But what do you really know about us? Nothing, really.

Even among academics in computer science, we have only sketchy information about our colleagues’ respective backgrounds. Yes there are the privileged ones, who had early access to computers, tinkered with them in their spare time, got their pizza delivered, participated in programming contests and so on. But there are others who made it. Who escaped persecution in Eastern Europe during the Cold War and had to find their way in a different country, whose first interaction with a computer was only at university, or who grew up in some hamlet with limited electricity and potable water. Who came from a broken home, or who had to leave family and friends to get that elusive job in the scarce academic job market many kilometers away, or whose relations stranded due to the two-body problem (partner who is also an academic, but in a different city or country). Who made it against the odds. And there are those who defected from physics, or who took a stroll out of philosophy to never return, or who still flip-flop with chemistry, to name but a few, and who thus have at least two specialisations under their belt. Those who know about more stuff than just computing.

That’s just about an academic’s background. What do you know of our daily activities? Nothing really, either. Assumptions abound; there are about as many memes and jokes about our jobs as assumptions. And movies, TV series, and novels don’t necessarily depict it accurately either.

But us, in our own words? The memoir and autobiography books literally can be counted on one hand. I can assure you it’s not because we have no life and have nothing to say. We do. For instance, it takes about 10-30 years before the theories and techniques we investigate will mature enough to seep into the wider society. Impactful, cool, and fun things happen along the way. Those ‘infoboxes’ from Google when it returns the search results? The theory and techniques behind it date back to the late 1990s with ontologies and I was a part of that. Toy drones? There was one to play with at the European Conference on Artificial Intelligence 2006 (ECAI’06) that I attended, when the first small toy drones needed to be equipped with ‘intelligent’ processing of sensor data. The drone demo area was suitably demarcated with red-white coloured tape, for neither the engineers nor the organisers, nor us as attendees, were convinced it was safe to make it fly around without causing trouble.

Screengrab of “Dr Fill” in action in last year’s crossword puzzle contest: Video: https://www.youtube.com/watch?v=aIjD-sIDCeE

The demo session at ECAI’06 also had a crossword puzzle contest with WebCrow: researchers against an algorithm that trawled the Web for answers. The 25 of us onsite participants – perhaps the first ever to participate in such a contest – sat on uncomfortable plastic chairs in cinema style in a section of a large hall in the conference venue at Riva del Garda in Italy. Onlookers marveled that the event really took place, and were unsure about which horse to bet on. The algorithm won, but we had fun. Last year’s news that an algorithmic solver beat expert human puzzlers seems a bit late and old news. I can very well imagine what those human participants must have felt.

Maybe you don’t care about computer science professors or about early days of new theories and techniques and how they came about. We all have our interests and time is limited. That’s fine; I don’t read all books either. But, if you were to ever wonder about the human in the computer science academic, there are, for now, those four books listed above, mine, and the other three books that are quite nearby in scope. Happy reading!

Trying to categorise popular science books

Some time last year, a colleague asked about good examples of popular science books, in order to read and thereby to get inspiration on how to write books at that level, or at least for first-year students at a university. I’ve read (and briefly reviewed) ‘quite a few’ across multiple disciplines and proposed to him a few of them that I enjoyed reading. One aspect that bubbled up at the time, is that not all popsci books are of the same quality and, zooming in on this post’s topic: not all popsci books are of the same level, or, likely, do not have the same target audience.

I’d say they range from targeting advanced interested laypersons to entertaining laypersons. The former entails that you’d be better off having covered the topic at school and an undergrad course or two will help as well for making it an enjoyable read, and be fully awake, not tired, when reading it. For the latter category at the other end of the spectrum: having completed little more than primary school will do fine and no prior subject domain knowledge is required, at all, and it’s good material for the beach; brain candy.

Either way you’ll learn something from any popsci book, even if it’s too little for the time spent reading the book or too much to remember it all. But some of them are much more dense than others. Compare cramming the essence of a few scientific papers in a book’s page to drawing out one scientific paper into a whole chapter. Then there’s humor—or the lack thereof—and lighthearted anecdotes (or not) to spice up the content to a greater or lesser extent. The author writing about fungi recounting eating magic mushrooms, say, or an economist being just as much of a sucker for summer sales in the shops as just about anyone. And, of course, there’s readability (more about that shortly in another post).

Putting all that in the mix, my groupings are as follows, with a selection of positive exemplars that I also enjoyed reading.

There are more popsci books that I thought were interesting to read, but I didn’t want to turn it into a laundry list. Also, books on politics and society and philosophy and such seem to deserve their own discussion on categorisation, but that’s for another time. I also intentionally excluded computer science, information systems, and IT books, because I may be differently biassed to those books compared to the out-of-my-own-current-specialisation books listed above. For instance, Dataclysm by Christian Rudder on Data Science mainly with OKCupid data (reviewed earlier) was of the ‘entertainment’ level to me, but probably isn’t so for the general audience.

Perhaps it is also of use to contrast them to ‘bad’ examples – well, not bad, but I think they did not succeed well in their aim. Two of them are Critical mass by Philip Ball (physics, social networks), because it was too wordy and drawn out and dull, and This is your brain on music by Daniel Levitin (neuroscience, music), which was really interesting, but very, very, dense. Looking up their scores on goodreads, those readers converge to that view for your brain on music as well (still a good 3.87 out of 5, from nearly 60000 ratings and well over 1500 reviews), as well as for the critical mass one (3.88 from some 1300 ratings and about 100 reviews). Compare that to a 4.39 for the award-winning Entangled life, 4.35 for Why we sleep, and 4.18 for Mama’s last hug. To be fair, not all books listed above have a rating above 4.

Be this as it may, I still recommend all of those listed in the four categories, and hopefully the sort of rough categorisation I added will assist in choosing a book among the very many vying for your attention and time.

Pushing the envelope categorising popsci books

Regarding book categories more generally, romance novels have subgenres, as does science fiction, so why not the non-fiction popsci books? Currently, they’re mostly either just listed (e.g., here or the new releases) or grouped by discipline, but not according to, say, their level of difficulty, humor, whether it mixes science with politics, self-help, or philosophy, or some other quality dimension of the book along which they possibly could be assessed.

As an example that the latter might work for assigning attributes to the books: Why we sleep is 100% science but a reader can distill some ideas to practice with as self-help for sleeping better, whereas When: the scientific secrets of perfect timing is, contrary to what the title suggests, largely just self-help. Delusions of gender and Inside rebellion can, or, rather, should have some policy implications, and Why we sleep possibly as well (even if only to make school not start so early in the morning), whereas the sort of content of Elephants on acid already did (ethics review boards for scientific experiments, notably). And if you were not convinced of the presence of animal cognition, then Mama’s last hug may induce some philosophical reflecting, and then have a knock-on effect on policies. Then there are some books that I can’t see having either a direct or indirect effect on policy, such as Gastrophysics and Entangled life.

Let’s play a little more with that idea. What about vignettes composed of something like the following, shown in the table below?

Then a small section of the back cover of Entangled life would look like this, with the note that the humor is probably in between the ‘yes’ and ‘some’ (I laughed harder with the book on drunkenness).

Mama’s last hug would then have something like:

And Why we sleep as follows (though I can’t recall for sure now whether it was ‘some’ or ‘no laughing matter’ and a friend has borrowed the book):

A real-life example of a categorisation box on a product; coffee suitable for moka pots, according to House of Coffees.

Of course, these are just mock-ups to demonstrate the idea visually and to try out whether it is even doable to classify the books. It is. There very well may be better icons than these scruffy ‘take a cc or public domain one and fiddle with it in MS Paint’ ones, or a mixed-mode approach could work, like on the packs of coffee (see image on the right).

Moreover: would you have created the same categorisation for the three examples? What (other) properties of popular science books could be useful? Also, and perhaps before going down that route: would something like that possibly be useful according to you or someone you know who reads popular science books? You may leave your comments below, on my facebook page, or write an email, or we can meet in person some day.

p.s.: this is not a serious post on the ontology of popular science books — it is summer vacation time here and I used to write book reviews in the first week of the year and this is sort of related.

A brief reflection on maintaining a blog for 15 years (going on 16)

Fifteen years is a long time in IT, yet blogging software is still around and working—the same WordPress I started my blog with, even. At the time, in 2006, when WordPress was still only offering blogging functionality, they had the air of being respectable and at least somewhat serious compared to blogspot (redirects to Blogger now) that hosted a larger share of the informal and whimsical blogs. Blogs are not nearly as popular now as they used to be, there seems to be a move to huddle together to take a ride on a branded bandwagon, like Medium and Substack, and all of the blog-providing companies have diversified the services they offer for blogging. WordPress now markets itself as website builder, rather than blogging, software.

One might even be tempted to argue that blogs are (nearly) obsolete, with TikTok and the like having come along over the years. Not so, claims a blogger here, as do some 10 more bloggers here, and blogs are even a necessity according to another, who does provide a list of links to data to back it up. (Just maybe don’t try making a living from it – there are plenty of people who like to read, but writing doesn’t pay well.)

Some data for this blog, then. It has 325 published posts, there are around 400-600 visitors per month in recent years (depending on the season and posting frequency), there are people still signed up to receive updates (78), some even like some of the posts, and some of them are shared on Twitter and other social media. The most visited post of all time got over 21000 visits and counting (since 2011) and the most visited post in the past year (after the home page) still had a fine 355 visitors and is on my research and teaching topic (see also the occasionally updated vox populi). So, obsolete it is not. Admittedly, the latter post had its heydays in 2010-2012 with about 2500 visits/year and the former saw its best of times in 2014-2015 (4425 and 4948 visits in each of those years alone, respectively). The best visited post of the mere 10 posts I wrote in 2021 is on bias in ontologies, having attracted the attention of 119 visitors. Summarizing this blog’s stats trends: numbers are down compared to 5-10 years ago, indeed, but insignificant it is not and multiple posts have staying power.

Heatmap of monthly views to this blog over time.

I also can reveal that there’s no clear correlation between the time-to-write and number-of-visits variables, nor between either of them and the post’s topic, and not with post length either. With more time, there would have been more, and more polished, posts. There’s plenty to write about, not only the long overdue posts for published papers that came out at an extra-busy time and therefore have slipped through writing about, but also other interesting research that’s going on and deserves that extra bit of attention, some more book reviews, teaching updates and so on. There’s no shortage of topics to write about, which therewith turned out to be an unfounded worry from 15 years ago.

Will I go on for another 15 years? Perhaps, perhaps not. I’m still fence-sitting, from the very first post in 2006 that summed up the reasons for starting a blog to this day, to give it a try nonetheless and see when and where it will end.

Why still fence-sitting? I still don’t know whether it’s beneficial or harmful to one’s career, and if beneficial, whether the time put into writing those posts could have been used better for obtaining more benefit from those alternative activities than from the blog post writing. What I do know, is that, among others, it has helped me to learn to write better, it made me take notes during conferences in order to write conference reports and therewith engage more productively with a conference, structure ideas and thoughts, and pitch papers. Also, the background searches for fact-checking, adding links, and trying to find pictures made me stumble into interesting detours as well. Some of the posts took a long time to write, but at least they were enjoyable pastimes or worktimes.

Uhm, so, the benefit is to (just?) me? I do hope the posts have been worthwhile to the readers. But, it brings into vision the question that’s well-known to aspiring writers: should I write for myself or for my readers? The answer depends on whom you consult: blog for yourself, says the blogger from paradise, write for another, imaginary, reader persona, says the novelist, and go for bothsideism for the best results according to the writer’s guide. I write for myself, and brush it up in an attempt to increase a post’s appeal. The brushing up mainly concerns the choice of words, phrases, and paragraphs and the ordering thereof, and the images to brighten up some of the otherwise text-only posts (like this one).

After so many years and posts, I ought to be able to say something more profound. It’s really just that, though: the joy of writing the posts, the hope it makes a difference to readers and to what I’ve written about, and the slight worry it may not be the best thing to do for advancing my career.

Be this as it may, over the past few days, I’ve added a bit more structure to the blog to assist readers finding the topics they may be interested in. The key different categories are now also accessible from the ‘Menu’, being work-related topics (research and papers, software, and teaching), posts on writing and publishing, and there are a few posts that belong to neither, which still can be found on the complete list of posts. Happy reading!

p.s.: in case you wondered: yes, I intended to do a reflection when the blog turned a nice round 15 in late March, were it not for that blurry extension to 2020 and lots of extra teaching and teaching admin duties in 2021. The summer break has started now and there’s not much of a chance to properly go on holiday, and writing also counts as leisure activity, so there the opportunity was, just about three months shy of the blog turning 16. (In case the post’s title vaguely rings a bell: yes, there’s that cheesy song from one of the top-5 movie musicals of all time [according to imdb], depicting a happy moment with promise of staying together before Rolfe makes some more bad decisions, but that’s 16 going on 17.)

Some explorations into book publishing logistics

Writing a book is only one part of the whole process of publishing a book. There’s the actual thing that eventually needs to get out into the wide world. Hard copy? E-book? Print-on-demand? All three or a subset only? Taking a step back: where are you as author located, where are the publisher and the printer, and where is the prospective audience? Is the prospective readership IT savvy enough for e-books to even consider that option? Is the book’s content suitable for reading on devices with a gazillion different screen sizes? Here’s a brief digest from after my analysis paralysis of the too many options where none has it all – not ever, it seems.

I’ve written about book publishing logistics and choices for my open textbook, but that is, well, a textbook. My new book, No Taming of the Enthusiast, is of a different genre and aimed at a broader audience. Also, I’m a little wiser on the practicalities of hard copy publishing. For instance, it took nearly 1.5 months for the College Publications-published textbook to arrive in Cape Town, having travelled all the way from Europe where the publisher and printer are located. Admittedly, these days aren’t the best days for international cargo, but such a delivery time is a bit too long for the average book buyer. I’ve tried buying books with other overseas retailers and book sellers over the past few years—same story. On top of that, in South Africa, you then have to go to the post office to pick up the parcel and pay a picking-up-the-parcel fee (or whatever the fee is for), on top of the book’s cost and shipping fee. And it may get stuck in Customs limbo. This is not a good strategy if I want to reach South African readers. Also, it would be cool to get at least some books all the way onto the shelves of local book stores.

A local publisher then? That would be good for contributing my bit to stimulating the local economy as well. It has the hard copy logistics problem in reverse at least in part, however: how to get the books from so far down south to other places in the world where buyers may be located. Since the memoir is expected to have an international audience as well, some international distribution is a must. This requirement still gives three options: a multinational hard copy publisher that distributes to main cities with various shipping delays, print-on-demand (soft copy distributed, printed locally wherever it is bought), or e-book.

Let’s take the e-books detour for a short while. There is a low percentage of uptake of e-books – some 20% at best – and lively subjective opinions on why people don’t like ebooks. I prefer hard copies as well, but tolerate soft copies for work. Both are useful for different types of use: a hard copy for serious reading and a soft copy for skimming and searching so as to save oneself endless flicking to look up something. It’s happening the same with my textbook as well, to some extent at least: people pay for it to have it nicely printed and bound even though they can do that with the pdf themselves or just read the pdf. For other genres, some are better in print in any case, such as colourful cookbooks, but others should tolerate e-readers quite well, such as fiction when it’s just plain text.

In deciding whether to go for an e-book, I did explore usability and readability of e-books for non-work books to form my own opinion on it. I really tried. I jumped into the rabbit hole of e-reader software with their pros and cons, and settled on Calibre eventually as best fit. I read a fixed-size e-book in its entirety and it was fine, but there was a glitch in that it did not quite adjust to the screen size of the device easily and navigating pages was awkward; I didn’t try to search. I also bought two e-book novels from smashwords (epub format) and tested one for cross-device usability and readability. Regarding the ‘across devices’: I think I deserve to share and read e-books on all my devices when I duly paid for the copyrighted books. And, lo and behold, I indeed could do so across unconnected devices through emailing myself on different email addresses. The flip side of that is that it means that once any epub is downloaded by one buyer (separately, not into e-books software), it’s basically a free-for-all. There are also epub to pdf converters. The hurdles to do so may be enough of a deterrent for an average reader, but it’s not even a real challenge for anyone in IT or computing.

After the tech tests, I’ve read through the first few pages of one of the two epub e-books – and abandoned it since. Although the epub file resized well, and I suppose that’s a pat on the back for the software developers, it renders ugly on the dual laptop/tablet and smartphone I checked it with. It offers not nearly the same neat affordances of a physical book. For the time being, I’ll buy an e-book only if there’s no option to buy a hard copy and I really, really, want to read it. Else to just let it slide – there are plenty of interesting books that are accessible and my reading time is limited.

Spoiler alert on how the logistics ended up eventually 🙂

So, now what for my new book? There is no perfect solution. I don’t want to be an author of something I would not want to read (the e-book), but it can be set up if there’s enough demand for it. Then, for the hard copies route, if you’re not already a best-selling author or a VIP who dabbles in writing, it’s not possible to get it both published ‘fast’ – in, say, at most 6 months cf. the usual 1.5-2 years with a traditional publisher – and have it distributed ‘globally’. Even if you are quite the hotshot writer, you have to be rather patient and contend with limited reach.

Then what about me, as humble award-winning textbook writer who wrote a memoir as well, and who can be patient but generally isn’t for long? First, I still prefer hard copies first and foremost nonetheless. Second, there’s the decision to either favour local or global in the logistics. Eventually, I decided to favour local and found a willing South African publisher, Porcupine Press, to publish it under their imprint and then went for the print-on-demand for elsewhere. PoD will take a few days lead time for an outside-South-Africa buyer, but that’s little compared to international shipping times and costs.

How to do the PoD? A reader/buyer need not worry and simply will be able to buy it from the main online retailers later in the upcoming week, with the exact timing depending on how often they run their batch update scripts and how much manual post-processing they do.

From the publishing and distribution side: it turns out someone has thought about all that already. More precisely, IngramSpark has set up an international network of local distributors that has a wider reach than, notably, KDP for the Kindle, if that floats your boat (there are multiple comparisons of the two on many more parameters, e.g., here and here). You load the softcopy files onto their system and then they push it into some 40000 outlets, including the main international ones like Amazon and multiple national ones (e.g., Adlibris in Sweden, Agapea in Spain). Anyway, that’s how it works in theory. Let’s see how that works in practice. The ‘loading onto the system’ stage started last week and should be all done some time this upcoming week. Please let me know if it doesn’t work out; we’ll figure something out.

Meanwhile for people in South Africa who can’t wait for the book store distribution that likely will take another few weeks to cover the Joburg/Pretoria and Cape Town book shops (and possibly on the shelf only in January): 1) it’s on its way for distribution through the usual sites, such as TakeALot and Loot, over the upcoming days (plus some days that they’ll take to update their online shop); 2) you’ll be able to buy it from the Porcupine Press website once they’ve updated their site when the currently-in-transit books arrive there in Gauteng; 3) for those of you in Cape Town, which is also where the company that did the actual printing is located (did I already mention logistics matter?): I received some copies for distribution on Thursday and I will bring copies to the book launch next weekend. If the impending ‘family meeting’ is going to mess up the launch plans due to an unpleasant, more impractical adjusted lockdown level, or you simply can’t wait: you may contact me directly as well.

Version 1.5 of the textbook on ontology engineering is available now

“Extended and Improved!” could some advertisement say of the new v1.5 of “An introduction to ontology engineering” that I made available online today. It’s not that v1 was no good, but there were a few loose ends and I received funding from the digital open textbooks for development (DOT4D) project to turn the ‘mere pdf’ into a proper “textbook package” whilst meeting the DOT4D interests of, principally, student involvement, multilingualism, local relevance, and universal access. The remainder of this post briefly describes the changes to the pdf and the rest of it.

The main changes to the book itself

With respect to contents in the pdf itself, the main differences with version 1 are:

  • a new chapter on modularisation, which is based on a part of the PhD thesis of my former student and meanwhile Senior Researcher at the CSIR, Dr. Zubeida Khan (Dawood).
  • more content in Chapter 9 on natural language & ontologies.
  • A new OntoClean tutorial (as Appendix A of the book, introduced last year), co-authored with Zola Mahlaza, which is integrated with Protégé and the OWL reasoner, rather than only paper-based.
  • There are about 10% more exercises and sample answers.
  • A bunch of typos and grammatical infelicities have been corrected and some figures were updated just in case (as the copyright stuff of those were unclear).

Other tweaks have been made in other sections to reflect these changes, and some of the wording here and there was reformulated to try to avoid some unintended parsing of it.

The “package” beyond a ‘mere’ pdf file

Since most textbooks, in computer science at least, are not just hardcopy textbooks or pdf-file-only entities, the OE textbook is not just that either. While some material for the exercises in v1 was already available on the textbook website, this has been extended substantially over the past year. The main additions are:

There are further extras that are not easily included in a book, yet possibly useful to have access to, such as a list of ontology verbalisers with references that Zola Mahlaza compiled and an errata page for v1.

Overall, I hope it will be of some (more) use than v1. If you have any questions or comments, please don’t hesitate to contact me. (Now with v1.5 there are fewer loose ends than with v1, yet there’s always more that can be done [in theory at least].)

p.s.: yes, there’s a new front cover, so as to make it easier to distinguish. It’s also a photo I took in South Africa, but this time standing on top of Table Mountain.

Computer ethics (SIPP) notes relevant to South Africa

Social issues and Professional Practice in IT & Computing (formerly known as ‘computer ethics’ in our curriculum) increased in prominence in curriculum guidelines in recent years. Also, there is an increase in popular and scientific literature on computer ethics especially since Big Data, the popularisation of Artificial Intelligence, and now the 4th Industrial Revolution. Most of the articles and books are focussed on ethical and social issues where SIPP is taught mostly, being in ‘the West’.

It is taught elsewhere as well. For instance, since the early 2000s, the Computer Science Department at the University of Cape Town has taught it as part of a Masters in IT conversion course and as a block in a first-year computer science course. While initial material and lecture notes were reused from one of those universities in ‘the West’, over time, attempts have been made to localise it to some extent at least. For instance, South Africa has its own version of the EU’s GDPR (the POPI Act), there is a South African IT organisation (IITPSA) with its code of conduct, and the country is the textbook case that illustrates the concept of leapfrogging with its wireless network (and perhaps also with the digital divide). In addition, some ‘aspects’ look different from a country that is classified as an emerging economy than from a high-income country; e.g., patent protection and Silicon Valley’s data collection vs. potentially stifling emerging local tech companies and digital colonialism, respectively.

Updating lecture notes takes time, and so it is typically a multi-author effort carried out every few years, as it is in this case. What differs from the previous main update is that, in line with teaching and with the times, the lecture notes are now publicly available for free on UCT’s “Open Educational Resources” site. It is with some hesitation, as it clearly does not have the quality of a textbook and we know of certain limitations that I would have liked to address. Yet, I hope that it may be of some use already nonetheless, be it for people in the region or from ‘outside’ looking in.

I have contributed some sections as well, partially because I think it’s an interesting theme and partially because I have to teach it. I would have liked to add more, but time was running out (i.e., it’s a balancing act with other commitments, like research, teaching, and admin). With more time, the privacy chapter would have been updated better (e.g., also touching upon privacy in the context of the common practice of mobile phone sharing), emerging concepts would have been better integrated (e.g., digital colonialism, surveillance capitalism), some of the separate exercises could have been integrated, and so on and so forth. Alas, maybe a next time. (To any of my students reading this: some of these aspects are already integrated in the slides that are used in the CSC1016S lectures, which are running ahead in content compared to the written notes, and that is examinable content as well.)

Some experiences on making a textbook available

I did make available a textbook on ontology engineering for free in July 2018. Meanwhile, I’ve had several “why did you do this and not a proper publisher??!?” I had tried to answer that already in the textbook’s FAQ. Turns out that that short answer may be a bit too short after all. So, here follows a bit more about that.

The main question I tried to answer in the book’s FAQ was “Would it not have been better with a ‘proper publisher’?” and the answer to that was:

Probably. The layout would have looked better, for sure. There are several reasons why it isn’t. First and foremost, I think knowledge should be free, open, and shared. I also have benefited from material that has been made openly available, and I think it is fair to continue contributing to such sharing. Also, my current employer pays me sufficient to live from and I don’t think it would sell thousands of copies (needed for making a decent amount of money from a textbook), so setting up such a barrier of high costs for its use does not seem like a good idea. A minor consideration is that it would have taken much more time to publish, both due to the logistics and the additional reviewing (previous multi-author general textbook efforts led to nothing due to conflicting interests and lack of time, so I unlikely would ever satisfy all reviewers, if they would get around reading it), yet I need the book for the next OE installment I will teach soon.

Ontology Engineering (OE) is listed as an elective in the ACM curriculum guidelines. Yet, it’s suited best for advanced undergrad/postgrad level because of the prerequisites (like knowing the basics of databases and conceptual modeling). This means there won’t be big 800-students size classes all over the world lining up for OE. I guess it would not go beyond some 500-1000/year throughout the world (50 classes of 10-20 computer science students), and surely not all classes would use the textbook. Let’s say, optimistically, that 100 students/year would be asked to use the book.
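For those who like to see the back-of-the-envelope estimate spelled out, here is a minimal sketch in Python. The numbers are the rough guesses from the paragraph above, not measured data, and the variable and function names are made up for illustration.

```python
# Rough guesses taken from the paragraph above; names are illustrative only.
CLASSES_WORLDWIDE = 50          # guessed number of OE classes per year
STUDENTS_PER_CLASS = (10, 20)   # guessed class size range
OPTIMISTIC_USERS = 100          # students/year asked to use the book

def yearly_students(classes, class_size_range):
    """Return the (low, high) estimate of OE students per year."""
    low, high = class_size_range
    return classes * low, classes * high

low_total, high_total = yearly_students(CLASSES_WORLDWIDE, STUDENTS_PER_CLASS)

# Share of all OE students that the optimistic 100 users would represent.
share_low = OPTIMISTIC_USERS / high_total
share_high = OPTIMISTIC_USERS / low_total

print(f"Estimated OE students per year: {low_total}-{high_total}")
print(f"Optimistic readership share: {share_low:.0%}-{share_high:.0%}")
```

In other words, even the optimistic scenario covers only a fraction of an already small market.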

With that low volume in mind, I did look up the cost of similar books in the same and similar fields with the ‘regular’ academic publishers. It doesn’t look enticing for either the author or the student. For instance this one from Springer and that one from IGI Global are all still >100 euro. for. the. eBook., and they’re the cheap ones (not counting the 100-page ‘silver bullet’ book). Handbooks and similar on ontologies, e.g., this and that one are offered for >200 euro (eBook). Admitted there’s the odd topical book that’s cheaper and in the 50-70 euro range here and there (still just the eBook) or again >100 as well, for a, to me, inexplicable reason (not page numbers) for other books (like these and those). There’s an option to publish a textbook with Springer in open access format, but that would cost me a lot of money, and UCT only has a fund for OA journal papers, not books (nor for conference papers, btw).

IOS Press does not fare much better. For instance, a softcover version in the Studies on the Semantic Web series, which is their cheapest range, would be about 70 euro due to the number of pages, which is over R1100, and so again above budget for most students in South Africa, where the going rate is that a book would need to be below about R600 for students to buy it. A plain eBook or softcover from IOS Press not in that series goes for about 100 euro again, i.e., around R1700 depending on the exchange rate, which is about three times the maximum acceptable price for a textbook.

The MIT Press BFO eBook is only R425 on takealot, yet considering other MIT Press textbooks there, with the size of the OE book, it would then be around R600-700. Oxford University Press and its Cambridge counterpart (which, unlike MIT Press, I had checked out when deciding) are more expensive and again approaching 80-100 euro.

One that made me digress for a bit of exploration was Macmillan HE, which had an “Ada Lovelace day 2018” listing books by female authors, but a logics for CS book was again at some 83 euros, although the softer area of knowledge management for information systems got a book down to 50 euros, and something more popular, like a book on linguistics published by its subsidiary “Red Globe Press”, was down to even ‘just’ 35 euros. Trying to understand it more, Macmillan HE’s “about us” revealed that “Macmillan International Higher Education is a division of Macmillan Education and part of the Springer Nature Group, publishers of Nature and Scientific American.” and it turns out Macmillan publishes through Red Globe Press. Or: it’s all the same company, with different profit margins, and mostly those profit margins are too high to result in affordable textbooks, whichever subsidiary construction is used.

So, I had given up on the ‘proper publisher route’ on financial grounds, given that:

  • Any ontology engineering (OE) book will not sell large numbers of copies, so it will be expensive due to relatively low sales volume and I still will not make a substantial amount from royalties anyway.
  • Most of the money spent when buying a textbook from an established publisher goes to the coffers of the publisher (production costs etc + about 30-40% pure profit [more info]). Also, scholarships ought not to be indirect subsidy schemes for large-profit-margin publishers.
  • Most publishers would charge an amount of money for the book that would render the book too expensive for my own students. It’s bad enough when that happens with other textbooks when there’s no alternative, but here I do have direct and easy-to-realise agency to avoid such a situation.

Of course, there’s still the ‘knowledge should be free’ etc. argument, but this was to show that even if one were not to have that viewpoint, it’s still not a smart move to publish the textbook with the well-known academic publishers, even more so if the topic isn’t in the core undergraduate computer science curriculum.

Interestingly, after ‘publishing’ it on my website and listing it on OpenUCT and the Open Textbook Archive (I’m certainly not the only one who has done a market analysis or has certain political convictions), one colleague pointed me to the non-profit College Publications that aims to “break the monopoly that commercial publishers have” and another colleague pointed me to UCT Press. I had contacted both, and the former responded. In the meantime, the book has been published by CP and is now also listed on Amazon for just $18 (about 16 euro) or some R250 for the paperback version, whilst the original pdf file is still freely available. In other words, you pay for the production costs of the paperback, which has a slightly nicer layout and in which the errata I knew of at the time have been corrected.

I have noticed that some people don’t take informal self-publishing seriously (ranking it even below the so-called ‘vanity publishers’ like Lulu), notwithstanding the archives that cater for it, the financial take on the matter, the knowledge sharing argument, and the ‘textbooks for development’ in emerging economies angle of it. So, I guess no brownie points from them then and, on top of that, my publication record did, and does, take a hit. Yet, writing a book, as an activity, is a nice and rewarding change from just churning out more and more papers like a paper production machine, and I hope it will contribute to keeping the OE research area alive and lead to better ontologies in ontology-driven information systems. The textbook got its first two citations already, the feedback is mostly very positive, readers have shared it elsewhere (Reddit, unglue.it, Open Libra, Ebooks directory, and other platforms), and I recently got some funding from the DOT4D project to improve the resources further (for things like another chapter, new exercises, some tools development to illuminate the theory, a proofreading contest, updating the slides for sharing, and such). So, overall, if I had to make the choice again now, I’d still do it the way I did. Also, I hope more textbook authors will start seeing self-publishing, or else non-profit publishing, as a good option. Last, the notion of open textbooks is gaining momentum, so you could even become a trendsetter and be fashionable 😉