Blog categories and non-hierarchies

Mildly silly, but thanks to my procrastination efforts I found out that when I add or change categories in the “manage” “categories” section of the WordPress dashboard, I can create a hierarchy of categories by selecting a parent for each category, as opposed to having just a simple long list (which is what appears when one adds new categories while writing a new post). Great! Except that now that I know I can make a hierarchy, there is the urge to do so.

The resulting mess is only partially visible in the menu on the left. Apart from labeling issues, the main problem is that the feature doesn’t allow multiple inheritance. Two points on that. First, good ontological practice says you should not have multiple inheritance in a taxonomy anyway, and there are indeed good arguments from philosophy against it, such as avoiding the mixing of classification desiderata (see also the OntoClean methodology). Single inheritance is computationally more convenient as well, for instance for recursive queries ‘up’ in the hierarchy: which branch should the software choose if there are multiple parents? But, second, here I have no intention of making a proper mini-ontology; I merely want to order the topics addressed in some pieces of text. And that is where things go wrong.
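To make the point about querying ‘up’ the hierarchy concrete, here is a minimal sketch in Python. The category names, the two dictionaries, and the functions `path_to_root` and `all_ancestors` are my own illustrative assumptions and have nothing to do with how WordPress stores categories. With one parent per category the walk upward is a single unambiguous chain; with several parents there is no single branch to pick, so the query has to collect a set of ancestors instead.

```python
from collections import deque
from typing import Dict, List, Optional, Set

# Hypothetical single-inheritance tree: each category has at most one parent.
SINGLE_PARENT: Dict[str, Optional[str]] = {
    "research": None,
    "bio-ontologies": "research",
    "DL": "research",
    "automated reasoning": "DL",
}

# Hypothetical multiple-inheritance variant: a category may have several parents.
MULTI_PARENT: Dict[str, List[str]] = {
    "research": [],
    "bio-ontologies": ["research"],
    "DL": ["research"],
    "automated reasoning": ["DL", "bio-ontologies"],
}


def path_to_root(category: str, parent: Dict[str, Optional[str]]) -> List[str]:
    """Follow the single parent link upward; assumes the tree has no cycles."""
    path: List[str] = []
    current = parent.get(category)
    while current is not None:
        path.append(current)
        current = parent.get(current)
    return path


def all_ancestors(category: str, parents: Dict[str, List[str]]) -> Set[str]:
    """Breadth-first walk over every parent link; the result is a set, not a path."""
    found: Set[str] = set()
    queue = deque(parents.get(category, []))
    while queue:
        current = queue.popleft()
        if current in found:
            continue  # already reached via another branch
        found.add(current)
        queue.extend(parents.get(current, []))
    return found


print(path_to_root("automated reasoning", SINGLE_PARENT))
print(sorted(all_ancestors("automated reasoning", MULTI_PARENT)))
```

The single-parent version returns an ordered path, whereas the multi-parent version can only return an unordered set, which is one way of seeing why single inheritance is the more convenient case.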

I am aware that there are research efforts on enhancing information (document) retrieval with ‘ontologies’, but a tree for classifying documents is different from an ontology of universals in biological reality. Here I take a brief look at classifying documents into different categories, as opposed to classifying universals. Most, if not all, documents (blog posts included) cover several topics; hence, they either need to be categorized under multiple categories, or the software should offer a feature to tag parts of the text to enable a more fine-grained categorization than just the whole document. Moreover, there are implicit contextual issues. For instance, a category “automated reasoning” would fit under both, say, bio-ontologies and DL, but the two would contain different posts about reasoning, each adjusted to the intended reader group. Indicating such contextual information in the category name would require long names or odd abbreviations, which are then less likely to string-match with other tags across blogs, and that in turn will hamper any data mining efforts for finding patterns in blogging. No doubt NLP researchers have come up with some solutions that I’m not aware of; please propagate them to industry.

Then there is another issue with the ‘tree structure’ one can create: there does not seem to be any notion of transitivity. Presumably the absence of proper set-theoretic and taxonomic notions is just an engineering glitch: if a blog post x is categorized in B, which is a subtype of A, then x should automatically also be a blog post categorized under A. But it is not, even though this is one of the few properties of relations that is easy to implement and computationally not costly compared to, say, parthood relations that adhere to at least Ground Mereology (generally, adding transitivity for roles in Description Logics will push the language into at least ExpTime, like DLRmu, DLRreg, OWL-DL, OWL 1.1). Thus, it is still a flat list of categories, but the graphical display looks as if there is some structure among the categories… So I have been wasting my time thinking about how to make the tree.
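The behaviour I would expect can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the category names, the `PARENT` and `POSTS` dictionaries, and the functions `effective_categories` and `posts_under` are all hypothetical and are not WordPress functionality. The idea is simply that a post filed under B is also counted under every ancestor of B.

```python
from typing import Dict, Optional, Set

# Hypothetical category tree, child -> parent (None marks a top-level category).
PARENT: Dict[str, Optional[str]] = {
    "ontologies": None,                      # plays the role of A
    "bio-ontologies": "ontologies",          # plays the role of B, a subtype of A
    "automated reasoning": "bio-ontologies",
}

# Hypothetical posts with the categories assigned to them by hand.
POSTS: Dict[str, Set[str]] = {
    "post-x": {"bio-ontologies"},
    "post-y": {"ontologies"},
    "post-z": {"automated reasoning"},
}


def effective_categories(assigned: Set[str],
                         parent: Dict[str, Optional[str]]) -> Set[str]:
    """Return the assigned categories plus all their ancestors."""
    result: Set[str] = set()
    for category in assigned:
        current: Optional[str] = category
        # Stop at the root, or early if this chain was already covered.
        while current is not None and current not in result:
            result.add(current)
            current = parent.get(current)
    return result


def posts_under(category: str,
                posts: Dict[str, Set[str]],
                parent: Dict[str, Optional[str]]) -> Set[str]:
    """All posts filed under the category directly or via a subcategory."""
    return {
        post for post, assigned in posts.items()
        if category in effective_categories(assigned, parent)
    }


print(sorted(posts_under("ontologies", POSTS, PARENT)))
print(sorted(posts_under("bio-ontologies", POSTS, PARENT)))
```

Each lookup costs at most the depth of the tree per assigned category, which is why I call the feature computationally cheap for a plain category tree.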

So, although it may still look nice with the feigned sub-categories, please don’t attack me for having made dubious ontological choices: it’s just a graphical display and blog post categorisation, not a formal ontology representing natural kinds in reality!