Journal paper on granular perspectives

My paper “The granular perspective as semantically enriched granulation hierarchy” [1] has been accepted in the International Journal of Granular Computing, Rough Sets and Intelligent Systems. It is an invited extended version of the GrC’09 paper.

Although the paper is rather logic-based and full of definitions, propositions, and proofs, the underlying idea is quite straightforward. One granulates data and information in multiple ways to generate (granulation) hierarchies that have levels containing more or less detail about one’s domain of interest. Such hierarchies can be as simple as a partonomy of, say, all parts of the human structural anatomy in medicine (cell, organ, body, etc.) or administrative boundaries in GIS (city, province, etc.). However, what the characteristics of such hierarchies are and what consequences they have for levels of granularity is left implicit throughout the literature on granularity. Being imprecise about them can easily yield nonsensical hierarchies that use different criteria to granulate information at different levels of detail, which is undesirable in general and for software implementations in particular, whereas having a way to declare that knowledge explicitly gives new opportunities for computation: for instance, querying at one’s desired level of detail instead of perpetual, time-consuming browsing, or simplifying the generation of partial views of an ontology suited to the context of the domain expert.

The paper describes a way to represent that hitherto implicit information in a structured and consistent manner throughout, and justifies why it makes sense to include exactly those properties of the hierarchies. Such a ‘dressed up’ hierarchy I call a granular perspective. Granular perspectives can be uniquely identified, and hence distinguished, by formally representing their semantics using the granulation criterion (by what attributes or properties do you divide things up) and the type of granularity (how do you divide them) used for granulation. For instance, the criterion could be human structural anatomy (cf., say, functional anatomy, or by processes in the human body), and the mechanism could be one specific type of relation between the entities in the different levels, e.g., parthood (cf., say, subsumption).
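To make the identification idea concrete, here is a minimal sketch in Python. It is not taken from the paper, which is purely theoretical; the class names `GranularPerspective` and `PerspectiveRegistry`, the string labels, and the level lists are all hypothetical and chosen only for illustration. The sketch assumes that a perspective is identified by the pair of granulation criterion and type of granularity, and that a registry refuses a second perspective with the same pair.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class GranularPerspective:
    """Hypothetical container for one granulation hierarchy."""
    criterion: str           # by what property things are divided up
    granulation_type: str    # how they are divided, e.g. "parthood"
    levels: Tuple[str, ...]  # level names, ordered from coarse to fine

    @property
    def identity(self) -> Tuple[str, str]:
        # Assumption: criterion plus type of granularity identify a perspective.
        return (self.criterion, self.granulation_type)


class PerspectiveRegistry:
    """Keeps perspectives apart by their (criterion, type) identity."""

    def __init__(self) -> None:
        self._perspectives: Dict[Tuple[str, str], GranularPerspective] = {}

    def add(self, perspective: GranularPerspective) -> None:
        if perspective.identity in self._perspectives:
            raise ValueError(f"duplicate perspective: {perspective.identity}")
        self._perspectives[perspective.identity] = perspective

    def get(self, criterion: str, granulation_type: str) -> GranularPerspective:
        return self._perspectives[(criterion, granulation_type)]


registry = PerspectiveRegistry()
registry.add(GranularPerspective(
    "human structural anatomy", "parthood", ("body", "organ", "cell")))
registry.add(GranularPerspective(
    "administrative boundaries", "parthood", ("province", "city")))
```

With this in place, `registry.get("human structural anatomy", "parthood").levels` returns the anatomy levels, and adding a second perspective with the same criterion and type raises a `ValueError`, which is one simple way to prevent mixing criteria within a single hierarchy.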

Those perspectives can be connected to each other consistently, either by a simple relation or by mereological relations, thereby facilitating cross-granular querying and other reasoning scenarios. (None of that is implemented, though; it’s good to [try to] have the theory sorted out first.)

There are some copyright restrictions on the paper, which is in print at the moment, so if you want to have a copy, feel free to contact me. An earlier, non-self-standing, version with more ontological analysis can be found in two sections of Chapter 3 of my PhD thesis [2]. To my pleasant surprise, Vogt has independently applied my theory of granularity, including the perspectives and the links between them, in an informal way to biological material entities [3]. Some readers of this blog might find that a more motivating place to start reading than the technical details and the, admittedly, fairly simple examples I used to illustrate the theory in the thesis. Several illustrations from the eco/GIS domain are described elsewhere [4].

References

[1] Keet, C.M. (2011). The granular perspective as semantically enriched granulation hierarchy. Int. J. Granular Computing, Rough Sets and Intelligent Systems, (in print).

[2] Keet, C.M. A formal theory of granularity. PhD Thesis, KRDB Research Centre, Faculty of Computer Science, Free University of Bozen-Bolzano, Italy. 2008.

[3] Vogt, L. Spatio-structural granularity of biological material entities. BMC Bioinformatics, 2010, 11:289.

[4] Keet, C.M. Structuring GIS information with types of granularity: a case study. VI International Conference on Geomatics (Geomatics’09), 10-12 February 2009, Havana, Cuba.

Related works: when do you read ‘too much’?

Even after so many theses and research activities, and so much reading and reviewing of papers, I still sometimes wonder when reading and discussing related work is really enough. Obviously, a review paper requires many references anyway, and one reaches a point where one has moved from referencing really relevant papers to reference padding, which is the time to stop adding unless there is an explicit request to add many references and no page limit. That is fine. It is more problematic with (i) conference paper page limits and (ii) deciding how many related works one should read before writing a paper (lest one keeps on reading for the rest of one’s life).

I have seen many a paper where the wheel is reinvented, a relevant sub-topic omitted, or a ‘need-to-cite’ reference absent. There seem to be quite a few authors who do not read widely first and instead just go ahead writing a paper, and at times even get it accepted (maybe because the reviewers read few papers as well). Should one expect authors to search better and read those relevant papers, or accept that authors are only human and resource-constrained? Would semantically enabled scientific literature solve this problem, be it with linked data or some other technology, like GoPubMed?

Then there are seniors in academia, some of whom I have heard saying (once even proudly!) that they do not read papers anymore, as if they are above that kind of mundane grunt work. Perhaps when one reaches a certain age and maturity, one indeed finds there is nothing new in a paper; however, if that is really the case, then the field is not progressing but just recycling old ideas and offering more of the same results. Or the novelties compared to the ‘old’ results are in the fine details, which indicates high (over?) specialization of the field.

Either way, I am not such a senior VIP, so I waste/spend quite some time reading papers to try to avoid duplication. To the best of my knowledge, I have not duplicated work after all and have filled some true gaps in the body of scientific knowledge; but then, one ought not to forget the problem of being incompetent and unaware of it.

Interestingly, the number of references in a paper has increased, with, depending on the discipline, iirc some 5-10 citations per article more now than 50 years ago [1]. One can guess at the many reasons, and purely scientifically motivated ones are not the only arguments going around to try to explain these data. There are relatively many more publications (and science is cumulative), and there are salami slicing practices, so that each paper contains a least publishable unit instead of almost bursting with novelties (so that more papers have to be cited). In addition, there also appears to be an increase in teamwork [2], with each author adding his/her references to the paper; mutual friends and community citation patterns exist; and generous citing begets citations of one’s own paper [3]. More citations can also boost one’s h-index and the ISI impact factor of a journal, which, with the increase in the desire to ‘quantify’ science output, may be a factor as well.

However, there is a difference between more citations per paper and potentially reading ‘too much’. The cited works are not always read [4], although this may be a relative invariant and not something from recent times (as an aside: does anyone have data to see if relatively more ‘write-only’ papers are produced now compared to, say, 50 years ago?). If it is indeed an invariant, perhaps the increase in citations may be a proxy for an increase in the number of papers read after all.

For the time being, I’ll just keep on reading (and improving the management of my references) and, shortly, the CS honours students here at UKZN will have to do so too, intensified with seminars on research methods, literature research, and scientific communication. The citations and references in this post are a clear example of what would not be acceptable for their honours project.

‘References’

[1] Anonymous. Evaluating the usage and impact of E-Journals in the UK. CIBER Working paper 2, 12 Nov 2008. There should have been a citation here to whoever wrote that; iirc, it was in an editorial in one of the Nature journals. Anyway, this other analysis by CIBER, with a lot more data, corroborates it, noting an increase from 3% in history to 161% in chemistry.

[2] Stefan Wuchty, Benjamin F. Jones, and Brian Uzzi. The Increasing Dominance of Teams in Production of Knowledge. Sciencexpress, 12 April 2007. www.sciencexpress.org.

[3] Zoë Corbyn. An easy way to boost a paper’s citations. Nature News, August 13, 2010.

[4] Multiple Authors. Did you actually read everything in your th***s bibliography. The PhD Forums. Aug 6, 2010—Aug 10, 2010. (there are several online polls, and comments on online fora)