PLoS has published a cross-journal special collection on Genomics of Emerging Infectious Diseases last week. Perhaps unsurprisingly, I had a look at the article about limitations of and challenges for the computational resources by Berglund, Nystedt, and Andersson . It reads quite like the one about computational problems for metagenomics I wrote about earlier (here and here), but they have a somewhat curious request in the closing section of the paper.
Concerning the overall contents of the paper and its similarity with the computational aspects of metagenomics, the computational aspects of complete genome assembly is still not quite sorted out fully, and in particular the need “for better ways to integrate data from diverse sources, including shotgun sequencing, paired-end sequencing, [and more]…” and a quality scoring standard. The other major one is the recurring topic is the hard work of annotation to give meaning to the data. Then there are the requests for better, and 3D, visualizations of the data and the cross-granular analysis of data along the omics trail and at different scales.
Limitations and challenges that are more specific to this subject domain are the classification and risk assessments for the emergence of novel infectious strains and risk prediction software for disease outbreaks. In addition, they put a higher importance put on the request for supporting tools to figure out the evolutionary aspects of the sequences and how the pieces of DNA have recombined, including how and from where they have horizontally transferred.
In the closing section, the authors reiterate that
To achieve these goals, investments in user-friendly software and improved visualization tools, along with excellent expertise in computational biology, will be of utmost importance.
I fancy the thought that our WONDER system for the HGT-DB for, in particular, graphical querying meets the first requirement—at least as proof of concept that one can construct an arbitrary query graphically using an ontology and all that in a web browser. Having said that, I am also aware of the authors’ complaint that
Currently, the slow transition from a scientific in-house program to the distribution of a stable and efficient software package is a major bottleneck in scientific knowledge sharing, preventing efficient progress in all areas of computational biology. Efforts to design, share, and improve software must receive increased funding, practical support, and, not the least, scientific impact.
Yes, like most bioinformatics tools and proof-of-concept and prototype software that comes from academia, it is not industry-grade software. To get the latter, companies have to do some more ‘shopping’ around and investment into it, i.e., monitor the latest engineering developments that demonstrate working theory presented at conferences, take up the ones they deem interesting, and transform it into a stable tool. We—be it here at FUB or almost any other university—do not have an army of experienced programmers, not only because we do not have the financial resources to pay programmers (cf. researchers with whom more scientific brownie-points can be scored) but, moreover, a computing department is not a cheap software house. The authors’ demand for more funding for software development to program cuter and more stable software would kill computing and engineering research at the faculty if the extra funding would not be real extra funding on top of existing budgets. The reality these days, is that many universities face cuts in funding. Go figure where that leaves the authors’ request. The complaint may have been more appropriate and effective when the authors would have voiced it in an industry journal.
The last part of the quote, receiving increased scientific impact, seems to me a difficult one. Descriptions of proof-of-concept and prototype software to experimentally validate the implementability of a theory can find a home in a scientific publication outlet, but a paper that tells the reader the authors have made a tool more stable is not reporting on research results and it does not bring us any new knowledge, does not answer a research question, does not solve an hitherto unsolved problem, does not confirm/refute a hypothesis. Why should—“must” in the authors’ words—improved, more usable and more stable, software receive scientific impact? Stable and freely available tools have an impact on doing science and some tasks would be nigh on undoable without them, but this does not imply such tools are part and parcel of the scientific discovery. One does not include in the “scientific impact” the Petri dish vendor, PCR machine developers, or Oracle 10g development team either. There are different activities with different scopes, goals, outcomes, and reward mechanisms; and that’s fine. Proposing to offer companies some fairly difficult to determine scientific-impact-brownie-points may not be the most effective way to motivate them to develop tools for science—getting across the possibility to make profit in the medium- to long term and to do something of societal relevance may well be a better motivator.
 Berglund EC, Nystedt B, Andersson SGE (2009) Computational Resources in Infectious Disease: Limitations and Challenges. PLoS Comput Biol 5(10): e1000481. doi:10.1371/journal.pcbi.1000481