Notes on the ontology of International Standard Book Numbers

Preamble. Draft notes about experiences and peculiar findings on ISBNs were gathering dust, and some people got curious: how could something as mundane as ISBNs, a popular topic for EER model design exercises, not be straightforward? It was not worth the effort of writing a scientific paper, but cute enough for a blog post on ontology and modelling in everyday life. (everyday ontology #1)

~~~

A glitch occurred with the ISBN of my memoir at a late stage in the publication process when that one memoir by that one publisher needed two ISBNs, one for the printed version and one for the print-on-demand version. The printed version can be re-printed by popular demand, with the same ISBN, but that’s different from printing more print-on-demand copies with that other ISBN. It made me explore the international standard of book numbers. ISBN basics and a dose of knowledge of database theory and techniques were the key to resolving that glitch in the publication process of the memoir. But questions had surfaced as to what the ISBN really means, questions that needed answers.

A quick search revealed the part that is easy to grasp, and the rule to adhere to: each format – hardcover, paperback, e-book – needs its own ISBN. A database needs to be able to distinguish which of the three has been sold in the shop in order to keep track of the inventory and for online retailers to send you the format you bought. The ‘same’ book published by a different publisher also requires its own ISBN, for each of these formats, because the right amount of money has to be sent to the right publisher. Different editions need their own ISBN as well, as they tend to differ, for instance by an extra preface or postscript. So far so good from the conceptual data modelling and retailer’s perspectives.

Ontologically, one might not be so happy. One might have assumed that an ISBN helps to identify a book, but clearly it only provides identification at best, not identity. What is ‘the book’? What makes a book a book is its content, as the thing we refer to when someone says “Yes, I bought the book; the cheaper paperback, not the hardcover though”. In that sense ‘the book’ excludes at least the table of contents and any index, since page numbers will vary across formats, and front matter and cover image and text may differ. Then there’s a meaning of ‘the book’ as the physical manifestation, be it on the bookshelf or stored on disk in e-book format. Let us follow the direction of formats rather than the alley of an information artefact ontology. Curiously, for e-books, each of those formats also needs its own ISBN, like the EPUB and PDF and Kindle versions.

What does that say about the identity and identification of the object? The reasoning by the ISBN organisation is as follows:

“if a specific device or software is required to read the e-book or different usage constraints that control user functionality are offered (e.g. copy, print, lend etc.) then each separate version will be a distinct product. Each distinct product that is available must be identified by its own ISBN as it is a separate publication. Thus, a separate publication is normally defined by a combination of product form features or details and usage constraints.” (emphasis added)

So, a book number is a number for a unique product that is a publication; and since a publication need not be a book, a non-book artefact may have a book number. Also, the (PDF file for the) print-on-demand version requires a different ISBN from the PDF file for the regular batch printing of the same book, and is thus deemed to be a separate product. It may be the same book that is being printed from the PDF file, but nonetheless the softcover hardcopies are somehow different, counting as two publications instead of one.

For my no taming of the enthusiast book, the only difference was the change in ISBN number (circular issue) and the cut-off points of the width of the side of the cover, due to the different paper type in overseas printers assumed by the international distributor. For my modelling book, there are, at least, the ISBN-10 3031396944, ISBN-13 softcover 978-3-031-39694-6, ISBN-13 e-book 978-3-031-39695-3, a digital object identifier (DOI) 10.1007/978-3-031-39695-3, and an Amazon Standard Identification Number (ASIN) for the Kindle version (B0CDP5KXT7). Declare sameAs in the knowledge graph or LOD cloud at your own peril. Yet, I doubt that the Department of Higher Education’s publication bean counters will count my modelling book as at least two, if not five, publications – with each publication earning a subsidy for the university.
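
As an aside on the identifiers listed above, here is a minimal Python sketch of the standard check-digit arithmetic (weights 10 down to 2, modulo 11, for ISBN-10; alternating weights 1 and 3, modulo 10, for ISBN-13). The function names are mine and purely illustrative. Under these rules the ISBN-10 and the softcover ISBN-13 turn out to be the same number in two notations, whereas the e-book ISBN is a genuinely different number.

```python
def isbn10_check_digit(first9: str) -> str:
    """Check character for the first nine digits of an ISBN-10."""
    # Weights run from 10 down to 2; the check value makes the total divisible by 11.
    total = sum(int(d) * w for d, w in zip(first9, range(10, 1, -1)))
    check = (11 - total % 11) % 11
    return "X" if check == 10 else str(check)


def isbn13_check_digit(first12: str) -> str:
    """Check digit for the first twelve digits of an ISBN-13."""
    # Weights alternate 1, 3, 1, 3, ...; the check digit makes the total divisible by 10.
    total = sum(int(d) * (1 if i % 2 == 0 else 3) for i, d in enumerate(first12))
    return str((10 - total % 10) % 10)


def isbn10_to_isbn13(isbn10: str) -> str:
    """Prefix 978, drop the old check character, recompute the check digit."""
    digits = isbn10.replace("-", "")
    core = "978" + digits[:9]
    return core + isbn13_check_digit(core)


print(isbn10_check_digit("303139694"))      # check character of the ISBN-10
print(isbn10_to_isbn13("3031396944"))       # softcover ISBN-13 without hyphens
print(isbn13_check_digit("978303139695"))   # check digit of the e-book ISBN-13
```

So of the five identifiers, only the ISBN-10 and the softcover ISBN-13 can safely be treated as one and the same by construction.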

Section of an image found at https://kitaboo.com/online-ebook-publishing-5-easy-steps/ and copyrighted by “yanlev – Fotolia” or “Federico Caputo”

The content in each book is exactly the same otherwise. If either of the publishers were to create a PDF that disabled printing, it should obtain a separate ISBN, according to the ISBN description. If there’s software that determines 0, 1, 5, or whichever number of sequential or concurrent lending for 30 minutes, a day, two weeks or whichever amount of time, each variant would get a different ISBN. Access restrictions by country, be it due to censorship or just so, likewise. The ‘just so’ indeed does exist. My most recent experience was trying to get my hands on Ten Planets by Yuri Herrera: the sci-fi short story collection is not on Apple Books at all according to the app and Amazon didn’t make the Kindle version available to people physically in South Africa, yet Rakuten’s Kobo was eager to sell it to me and made me a happy reader. Restrictive and, at times, opaque ‘digital rights management’ variants supposedly all require separate ISBNs. That’s absurd.

What’s going on? Book-loving people not grasping software or ontology? Or, given that ISBNs have to be bought, is it a money-making exercise to find more ways to collect money from the mostly poor and underpaid writers and struggling publishers? What is the criterion for “distinct”? Really only differences in basic “functionality”? Sure, an ontology-enhanced digital and interactive Inquire Biology is functionally different from the original printed textbook, but that’s thanks to its text mark-up, context-sensitive questions and answers, semantic browsing and search.

In contrast, a lending constraint, say, is not a functionality or feature of the book intrinsically. If desired, such an accidental feature added to an e-book should be a constraint managed by the software rather than something that requires a new book identifier. Digital Rights Management (DRM) technology adds a wrapper to the e-book for each variation of access control, including number of users and devices, time, and so on. Each variant is a newly wrapped e-book file and, apparently, needs its own ISBN.

There are countably infinitely many ways to declare usage constraints. One could create a unique DRM-ed version for each person in the world, multiplied by the number of devices we each can use it on, multiplied by region locations, etc. And that for each book published. About 4 million new books are released each year, including traditional (500K to 1mln), self-published (some 1.7mln), and other forms of publishing. With a more realistic 10-25 ISBNs per book, the numbers would run out only well beyond the lifetime of the current global civilization, if they were classless numbers.
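
A back-of-the-envelope check of that claim, as a minimal Python sketch. It takes the figures quoted above at face value (4 million new books a year, 10 to 25 ISBNs per book) and assumes that 'classless' means every 13-digit string counts as a usable number; the names are illustrative only.

```python
NEW_BOOKS_PER_YEAR = 4_000_000   # figure quoted in the text
CLASSLESS_NUMBERS = 10 ** 13     # assumption: every 13-digit string is usable


def years_until_exhausted(isbns_per_book: int) -> int:
    """Whole years until the classless number space is used up."""
    return CLASSLESS_NUMBERS // (NEW_BOOKS_PER_YEAR * isbns_per_book)


for per_book in (10, 25):
    print(per_book, "ISBNs per book:", years_until_exhausted(per_book), "years")
```

That works out to somewhere between a hundred thousand and a quarter of a million years under these assumptions.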

But that’s not how it works. The 13 digits of ISBN-13 are class-based, so it won’t accommodate ten trillion (9 999 999 999 999) books. A bar code is associated with it, and the first three digits are allocated to the books: 978 and, for the sake of argument, let’s take all of 979 as well. That reduces the amount to 2 times 9 999 999 999, or about 20 billion. Between one and five digits are allocated to the group (language), four to the publisher, at least three for the title, and the last one is for the checksum. So, we obtain at most 2 billion books with ISBN-13. Practically, and assuming no-one will take the ISBN organisation to task over their “product form features”, it’s probably enough, albeit trivial to ruin for anyone with enough money to buy ISBNs.
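
The reduction above can be retraced in a few lines of Python. This is only a sketch of the arithmetic in the paragraph, with illustrative names: it fixes the three-digit prefix, then sets aside the check digit, and ignores how the remaining nine digits are split over group, publisher, and title.

```python
def isbn13_capacity(prefixes=("978", "979")) -> dict:
    """Count 13-digit numbers at each step of the reduction described in the text."""
    total_digits = 13
    free_after_prefix = total_digits - 3          # prefix digits are fixed
    free_after_check = free_after_prefix - 1      # check digit is computed, not chosen
    return {
        "all_13_digit_strings": 10 ** total_digits,
        "with_book_prefix": len(prefixes) * 10 ** free_after_prefix,
        "without_check_digit": len(prefixes) * 10 ** free_after_check,
    }


for step, count in isbn13_capacity().items():
    print(f"{step}: {count:,}")
```

The last figure is an upper bound only: the class-based allocation to groups and publishers leaves many of those numbers unassignable in practice.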

Either way, ontologically, it’s clear what causes the conceptual mess. Different book numbers for intrinsic and extrinsic features of what makes a book a book, fine; different numbers for countably infinite combinations of accidental padded-on usage features for the same book, not. A DRM wrapper can be hacked, cracked, and circumvented and then it violates the numbering scheme. The EPUB format can be converted into PDF and into MS Word and so on, violating the app-specific identification principle. Convertible formats shouldn’t require distinct ISBNs just because different default software applications may be needed to open them. An e-book standard for interoperability would negate the application file format issue, which was precisely the point of the EPUB open standard.

The plethora of external accidental arbitrary constraints/features added to an e-book belong to a different category of features from those intrinsic to the book. They would need a book usage number or a DRM wrapper number for identification, not a book number, for it is an identification for a file on top of the actual e-book, not a new publication of the book itself by any definition of what a book is. In its own way, Amazon’s ASIN for Kindle editions of e-books does just that. But it’s a missed opportunity for the current ISBN standard.
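
To make the proposed separation concrete, here is a minimal Python sketch of how a data model could keep the two kinds of identifier apart. Everything in it is hypothetical: the class names, the attributes, and the `W-…` wrapper numbers are invented for illustration and are not part of the ISBN standard or of any existing scheme.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Publication:
    """Identified by an ISBN: varies by publisher, edition, and product form."""
    isbn: str
    publisher: str
    edition: int
    product_form: str            # e.g. "softcover", "e-book"


@dataclass(frozen=True)
class UsageWrapper:
    """Hypothetical 'DRM wrapper number': usage constraints layered on a publication."""
    wrapper_id: str              # invented identifier, not an ISBN
    isbn: str                    # the publication being wrapped
    region: Optional[str] = None
    max_concurrent_loans: Optional[int] = None
    printing_allowed: bool = True


def distinct_isbns(wrappers: list) -> set:
    """However many wrappers exist, the set of publications does not grow."""
    return {w.isbn for w in wrappers}


ebook = Publication("978-3-031-39695-3", "publisher name", 1, "e-book")
wrappers = [
    UsageWrapper("W-0001", ebook.isbn, region="ZA"),
    UsageWrapper("W-0002", ebook.isbn, max_concurrent_loans=5),
    UsageWrapper("W-0003", ebook.isbn, printing_allowed=False),
]
print(len(wrappers), "wrappers,", len(distinct_isbns(wrappers)), "ISBN")
```

The design choice is simply that accidental usage features hang off the publication through a reference, so adding a region lock or a lending limit creates a new wrapper record rather than a new book number.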

How this affects a typical ‘library loan’ modelling homework exercise or an information artefact ontology is left as an exercise to the reader… Edraw’s examples would need to be updated and the IAO extended, for instance. Alternatively, the ISBN organisation could revise what merits a distinct ISBN.