Via a great list of novel data visualisation techniques and interfaces, "Data visualisation: modern approaches" (and hence a list of ways of structuring and analysing data), I stumbled upon htmlgraph. This tool analyses the usage of HTML tags on each page of a website to give an idea of how the site is structured and what kind of site it is. For instance, the difference between Yahoo (a bit messy, with old-fashioned tables) and Google (clean and simple) is very striking. The developer has many more annotated examples and has made htmlgraph available as a Java applet, so I could not resist testing it to see what came out for my different types of sites.
I have checked three fundamentally different sites: my home page (which I assume to be typical for a research-oriented one), a Joomla CMS I administer (PwoB), and this blog. The colour coding is as follows:
blue: for links (the A tag)
red: for tables (TABLE, TR and TD tags)
green: for the DIV tag
violet: for images (the IMG tag)
yellow: for forms (FORM, INPUT, TEXTAREA, SELECT and OPTION tags)
orange: for line breaks, paragraphs and blockquotes (BR, P, and BLOCKQUOTE tags)
black: the HTML tag, the root node
gray: all other tags
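To make the legend concrete, here is a minimal sketch in Python of how one might tally the tags of a single page by colour category. This is not how htmlgraph itself is implemented; the names `TAG_COLOURS`, `ColourCounter` and `colour_counts` are made up for illustration, and only the tag-to-colour assignment comes from the list above.

```python
from collections import Counter
from html.parser import HTMLParser

# Tag-to-colour legend as listed above; the dictionary itself is an
# illustrative assumption, not part of htmlgraph.
TAG_COLOURS = {
    "a": "blue",
    "table": "red", "tr": "red", "td": "red",
    "div": "green",
    "img": "violet",
    "form": "yellow", "input": "yellow", "textarea": "yellow",
    "select": "yellow", "option": "yellow",
    "br": "orange", "p": "orange", "blockquote": "orange",
    "html": "black",
}


class ColourCounter(HTMLParser):
    """Counts opening tags per colour category (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag names in lower case;
        # anything not in the legend falls into "gray".
        self.counts[TAG_COLOURS.get(tag, "gray")] += 1


def colour_counts(html_text):
    """Return a Counter mapping colour name to number of tags."""
    parser = ColourCounter()
    parser.feed(html_text)
    parser.close()
    return parser.counts
```

Feeding it the source of a page gives a rough idea of which colours would dominate in that page's graph, for example whether it leans on tables (red) or on divs (green).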
First, as expected, my home page shows the pattern of a ‘typical’ content-based site centred around old-fashioned HTML coding, with a central table (research page and publications) and some side topics (the IT remnants, the MSc thesis, and a link to the personal pages).
This is in stark contrast to my blog, which seems like an amalgamation of random things. It looks like each page has its own little flower, so even if I had stuck to one topic throughout all posts, it still would look like this.
Content management systems, on the other hand, have their own characteristic structure: the following picture shows the HTML structure of the NGO Professors without Borders website, which runs Joomla. I also tested the website of the EU FP6 FET project TONES, which runs Joomla too, and it looks very much the same as that of Professors without Borders; i.e., the structure of Joomla sites is highly similar regardless of the topic or content of the site.
Last, comparing the colours of the nodes across the three types of sites, it is immediately clear that the blog is heavy in links (blue), that Joomla and the ‘researcher homepage’ (ok, perhaps just mine) are heavy in old-fashioned tables (red), and that the prefab generic structures of Joomla and WordPress use quite a lot of divs (green) as well. It remains to ponder all those “other” (gray) HTML tags in the WordPress blog, which I surely have not added to my blog posts but which are in there somewhere anyway, presumably doing something.