Some experiences on making a textbook available

I did make available a textbook on ontology engineering for free in July 2018. Meanwhile, I’ve had several “why did you do this and not a proper publisher??!?” I had tried to answer that already in the textbook’s FAQ. Turns out that that short answer may be a bit too short after all. So, here follows a bit more about that.

The main question I tried to answer in the book’s FAQ was “Would it not have been better with a ‘proper publisher’?” and the answer to that was:

Probably. The layout would have looked better, for sure. There are several reasons why it isn’t. First and foremost, I think knowledge should be free, open, and shared. I also have benefited from material that has been made openly available, and I think it is fair to continue contributing to such sharing. Also, my current employer pays me sufficient to live from and I don’t think it would sell thousands of copies (needed for making a decent amount of money from a textbook), so setting up such a barrier of high costs for its use does not seem like a good idea. A minor consideration is that it would have taken much more time to publish, both due to the logistics and the additional reviewing (previous multi-author general textbook efforts led to nothing due to conflicting interests and lack of time, so I unlikely would ever satisfy all reviewers, if they would get around reading it), yet I need the book for the next OE installment I will teach soon.

Ontology Engineering (OE) is listed as an elective in the ACM curriculum guidelines. Yet, it’s suited best for advanced undergrad/postgrad level because of the prerequisites (like knowing the basics of databases and conceptual modeling). This means there won’t be big 800-students size classes all over the world lining up for OE. I guess it would not go beyond some 500-1000/year throughout the world (50 classes of 10-20 computer science students), and surely not all classes would use the textbook. Let’s say, optimistically, that 100 students/year would be asked to use the book.

With that low volume in mind, I did look up the cost of similar books in the same and similar fields with the ‘regular’ academic publishers. It doesn’t look enticing for either the author or the student. For instance this one from Springer and that one from IGI Global are all still >100 euro. for. the. eBook., and they’re the cheap ones (not counting the 100-page ‘silver bullet’ book). Handbooks and similar on ontologies, e.g., this and that one are offered for >200 euro (eBook). Admitted there’s the odd topical book that’s cheaper and in the 50-70 euro range here and there (still just the eBook) or again >100 as well, for a, to me, inexplicable reason (not page numbers) for other books (like these and those). There’s an option to publish a textbook with Springer in open access format, but that would cost me a lot of money, and UCT only has a fund for OA journal papers, not books (nor for conference papers, btw).

IOS press does not fare much better. For instance, a softcover version in the studies on semantic web series, which is their cheapest range, would be about 70 euro due to number of pages, which is over R1100, and so again above budget for most students in South Africa, where the going rate is that a book would need to be below about R600 for students to buy it. A plain eBook or softcover IOS Press not in that series goes for about 100 euro again, i.e., around R1700 depending on the exchange rate—about three times the maximum acceptable price for a textbook.

The MIT press BFO eBook is only R425 on takealot, yet considering other MIT press textbooks there, with the size of the OE book, it then would be around the R600-700. Oxford University Press and its Cambridge counterpart—that, unlike MIT press, I had checked out when deciding—are more expensive and again approaching 80-100 euro.

One that made me digress for a bit of exploration was Macmillan HE, which had an “Ada Lovelace day 2018” listing books by female authors, but a logics for CS book was again at some 83 euros, although the softer area of knowledge management for information systems got a book down to 50 euros, and something more popular, like a book on linguistics published by its subsidiary “Red Globe Press”, was down to even ‘just’ 35 euros. Trying to understand it more, Macmillan HE’s “about us” revealed that “Macmillan International Higher Education is a division of Macmillan Education and part of the Springer Nature Group, publishers of Nature and Scientific American.” and it turns out Macmillan publishes through Red Globe Press. Or: it’s all the same company, with different profit margins, and mostly those profit margins are too high to result in affordable textbooks, whichever subsidiary construction is used.

So, I had given up on the ‘proper publisher route’ on financial grounds, given that:

  • Any ontology engineering (OE) book will not sell large amounts of copies, so it will be expensive due to relatively low sales volume and I still will not make a substantial amount from royalties anyway.
  • Most of the money spent when buying a textbook from an established publisher goes to the coffers of the publisher (production costs etc + about 30-40% pure profit [more info]). Also, scholarships ought not to be indirect subsidy schemes for large-profit-margin publishers.
  • Most publishers would charge an amount of money for the book that would render the book too expensive for my own students. It’s bad enough when that happens with other textbooks when there’s no alternative, but here I do have direct and easy-to-realise agency to avoid such a situation.

Of course, there’s still the ‘knowledge should be free’ etc. argument, but this was to show that even if one were not to have that viewpoint, it’s still not a smart move to publish the textbook with the well-known academic publishers, even more so if the topic isn’t in the core undergraduate computer science curriculum.

Interestingly, after ‘publishing’ it on my website and listing it on OpenUCT and the Open Textbook Archive—I’m certainly not the only one who had done a market analysis or has certain political convictions—one colleague pointed me to the non-profit College Publications that aims to “break the monopoly that commercial publishers have” and another colleague pointed me to UCT press. I had contacted both, and the former responded. In the meantime, the book has been published by CP and is now also listed on Amazon for just $18 (about 16 euro) or some R250 for the paperback version—whilst the original pdf file is still freely available—or: you pay for production costs of the paperback, which has a slightly nicer layout and the errata I knew of at the time have been corrected.

I have noticed that some people don’t take the informal self publishing seriously—even below the so-called ‘vanity publishers’ like Lulu—notwithstanding the archives to cater for it, the financial take on the matter, the knowledge sharing argument, and the ‘textbooks for development’ in emerging economies angle of it. So, I guess no brownie points from them then and, on top of that, my publication record did, and does, take a hit. Yet, writing a book, as an activity, is a nice and rewarding change from just churning out more and more papers like a paper production machine, and I hope it will contribute to keeping the OE research area alive and lead to better ontologies in ontology-driven information systems. The textbook got its first two citations already, the feedback is mostly very positive, readers have shared it elsewhere (reddit,, Open Libra, Ebooks directory, and other platforms), and I recently got some funding from the DOT4D project to improve the resources further (for things like another chapter, new exercises, some tools development to illuminate the theory, a proofreading contest, updating the slides for sharing, and such). So, overall, if I had to make the choice again now, I’d still do it again the way I did. Also, I hope more textbook authors will start seeing self-publishing, or else non-profit, as a good option. Last, the notion of open textbooks is gaining momentum, so you even could become a trendsetter and be fashionable 😉

Linked data as the future of scientific publishing

The (not really a) ‘pipe dream’ paper about “Strategic reading, ontologies, and the future of scientific publishing” came out in August in Science [1], but does not seem to have picked up a lot of blog-attention other than copy-and-paste-the-abstract (except for Dempsey‘s take on it), which is curious. Renear and Palmer envision a bright new world with online scientific papers where the text has click-through terms to records in other databases and web pages to have your network of linked data; click on a protein name in the paper and automatically browse to Uniprot, sort of. And it is even the users who want that; i.e., it is not some covert push from Semantic Web techies. But then, in this case the ‘users’ are in informatics & information/library science (the enabling-users?), which does not imply that the endusers—say, biochemists, geneticists, etc.—want all that for better management of their literature (or they want that but do not yet realise that that is what they want).

But let us assume those endusers want the linked data for their literature (after all, it was a molecular ecologists who sent me the article—thanks Mark!). Or, to use some ‘fancier’ terms from the article: the zapping scientists want (need?) ontology-supported strategic reading to work efficiently and effectively with the large amounts of papers and supporting data being published. “Scientists are scanning more and reading less”, so then the linked data would (should?) help them in this superficial foraging of data, information, and knowledge to find the useful needle in the haystack—or so goes Renear and Palmer’s vision.

However, from a theoretical and engineering point of view, this can already be done. Not just that, it has been shown that some things work: there is iHOP and Textpresso, as the authors point out, but also GoPubMed, and SHER with PubMed, which begs the question: are those tools not good enough (and if something is missing, what?) or is it about convincing people and funding agencies? If the latter, then what does the paper do in Science in the “review” section??

If one reads further on in the paper, some peculiar remarks are being made, but not the one I would have expected. That the “natural language prose of scientific articles provides too much valuable nuance and context to be treated only as data” is a known problem that keeps many a (computational) linguist busy for his/her lifetime. But they go on saying also that “Traditional approaches to evaluating information systems, such as precision, recall, and satisfaction measures, offer limited guidance for further development of strategic reading technologies”, yet alternative evaluation methods are not presented. That “research on information behaviour and the use of ontologies is also needed” may be true form an outsider’s perspective: usages are known among ontologists but perhaps a review of all the ways of actual usage may be useful. Further down in the same section (p832), the authors claim, out of the blue and without any argumentation, that “the development of ontology languages with additional expressive power is needed”. What additional expressive power is needed for document and data navigation when they talk of the desire to exploit better the “terminological annotations”? The preceding text in the same paragraph only mentions XML, so they do not seem to have a clue about ontology languages, let alone their expressiveness, at all (OWL is mentioned only in passing on p830) and they manage to mention it in the same breath as so-called service-oriented architectures with a reference to another Science paper. Frankly, I think that papers like this are bound to cause more harm (or, at best, indifference) than good.

One thing I was wondering about, but that is not covered in the paper, is the following: who decides which term goes to which URI? There are different databases for, say, proteins, and the one who will be selected (by whom?) in the scientific publishing arena will become the de facto standard. A possibility to ameliorate this is to create a specific interface so that when a scientists clicks on a term, a drop-down box appears with something like “do you wish to retrieve more information from source x, y, or z?” Nevertheless, one easily ends up with a certain bias and powerful “gatekeepers”, and perhaps, with a similar attitude as toward DBLP/PubMed (“if it’s listed in there it counts, if it is not, then it doesn’t” regardless the content of the indexed papers, and favouring older established, and well-connected, outlets above newer and/or interdisciplinary ones).

Anyway, if Semantic Web researchers need some reference padding in the context of “it is not a push but a pull, hear hear, look here at the Science-reference, and I even read Science”, it’ll do the job, even though the contents of the paper is less than mediocre for the outlet.

[1] Allen H. Renear and Carole L. Palmer. Strategic Reading, Ontologies, and the Future of Scientific Publishing. Science 325 (5942), 828. [DOI: 10.1126/science.1157784]