Bioinformatics developers adopt open source

23 Jun 2004 00:00 IST

Updated On 23 Jun 2004 20:56 IST

BANGALORE: Java is going places: from desktops to enterprises to mobile applications, and now to the field of bioinformatics. According to Dr. Matthew Pocock, bioinformatics developers have a complex task at hand: reverse engineering the human genetic makeup. Day in and day out, large research labs generate several hundred gigabytes of data that need to be analyzed. The genetic makeup itself is a text of about three billion characters, in a language that is only passably understood even by molecular biologists.

To address this growing need, researchers at Great Britain's Sanger Institute for genetic study developed BioJava, an open-source project that uses the cross-platform, network-aware power of Java. BioJava is dedicated to providing a Java framework for processing biological data. It includes objects for manipulating sequences, file parsers, DAS client and server support, access to BioSQL and Ensembl databases, and powerful analysis and statistical routines, including a dynamic programming toolkit.

BioJava has a Java-based developer's toolkit with over 1,200 classes and interfaces for manipulating genomic sequences, file parsing, CORBA interoperability and more. The kit is already being used by research and pharmaceutical centres in over 85 countries.

Genesis

Matthew Pocock and Thomas Down, Ph.D. students at the Sanger Institute in Cambridge, started working on a solution that could address this computational need in the field of bioinformatics. Pocock, an experienced Perl and C++ programmer, could not achieve the desired results with either language because of the huge data volumes and concerns over portability and security. At the time, the Java 2 platform was being ported to the various systems in use at the centre, and the cross-platform compatibility it offered made Java the obvious choice.

Up and about

The first set of about 100 BioJava classes was officially released in the fall of 2000. Owing to the open-source nature of the project, developers could extend the implementations. The first version of the software was hosted on a machine in someone's bedroom, but the facility is now up and running in the labs of the Open Bioinformatics Foundation (OBF). The software is currently available under the GNU Lesser General Public License (LGPL), under which developers can modify code and fix bugs.

Since its inception the project has grown tremendously. Pocock said that there are currently about 1,264 public classes and interfaces, over 200,000 lines of code, and about 14 people regularly contributing to its development. Both Pocock and Down take an active interest in maintaining and enhancing the code.

The Java edge!

Talking about the benefits of Java, Pocock says that cross-platform compatibility has been a major hit with the project. Scalability is another major advantage that Java has brought to BioJava: genetic sequences can vary in length from 8 characters to 3 GB, and the same API has to handle both extremes. Another important aspect of the software is its ability to seamlessly handle disparate data types and formats. The in-memory representation in BioJava is format-agnostic, enabling developers to dump data in any format; this is achieved to a very great extent by using JDBC.
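To make the scalability point concrete, here is a minimal sketch of how one interface can serve both an 8-character sequence and a 3-billion-character one. It is an illustration only: `SeqView`, `InMemorySeq`, `LazySeq` and `SeqStats` are hypothetical names invented for this example and are not taken from the BioJava API, and the lazy source is a stand-in for a file- or database-backed store.

```java
import java.util.function.LongFunction;

/** Hypothetical read-only view of a sequence (not a BioJava type). */
interface SeqView {
    long length();
    char symbolAt(long index); // 0-based position
}

/** Short sequence held entirely in memory. */
final class InMemorySeq implements SeqView {
    private final String symbols;

    InMemorySeq(String symbols) { this.symbols = symbols; }

    public long length() { return symbols.length(); }

    public char symbolAt(long index) { return symbols.charAt((int) index); }
}

/** Very long sequence whose symbols are fetched on demand from a source. */
final class LazySeq implements SeqView {
    private final long length;
    private final LongFunction<Character> source; // assumed stand-in for disk or database access

    LazySeq(long length, LongFunction<Character> source) {
        this.length = length;
        this.source = source;
    }

    public long length() { return length; }

    public char symbolAt(long index) {
        if (index < 0 || index >= length) {
            throw new IndexOutOfBoundsException("index " + index);
        }
        return source.apply(index);
    }
}

/** Analysis code written once against the interface. */
final class SeqStats {
    /** Counts G and C symbols in the half-open window [from, to). */
    static long gcCount(SeqView seq, long from, long to) {
        long count = 0;
        for (long i = from; i < to; i++) {
            char c = Character.toUpperCase(seq.symbolAt(i));
            if (c == 'G' || c == 'C') {
                count++;
            }
        }
        return count;
    }
}

final class SeqDemo {
    public static void main(String[] args) {
        SeqView tiny = new InMemorySeq("GATTACAG");
        // Synthetic 3-billion-symbol sequence that repeats "ACGT"; nothing is stored in memory.
        SeqView huge = new LazySeq(3_000_000_000L, i -> "ACGT".charAt((int) (i % 4)));

        System.out.println(SeqStats.gcCount(tiny, 0, tiny.length()));
        System.out.println(SeqStats.gcCount(huge, huge.length() - 1000, huge.length()));
    }
}
```

The design choice worth noting is that positions are `long` rather than `int`, so the same method signature can address sequences beyond the roughly 2 billion element limit of a Java array or `String`.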

BioJava 1.3, built on Java 1.2, is the current official release. BioJava 2.0, built on Java 1.5, is slated for release by late 2004 or early 2005. According to Pocock, Java 1.5 lets the team re-use much more utility code while keeping safety intact.

More to come...

According to Pocock, this is just the beginning of the bioinformatics realm, and "the more we learn in bioinformatics, the more we discover there is to learn." Proteomics (the study of protein structure and function) is another emerging field that may eventually dwarf genomics in terms of data and computational analysis needs. Then there are "in silico" computational simulations of living systems, beginning at the cellular level, then, theoretically, moving up to the organ level and eventually to the level of entire organisms. Such simulations offer the promise of predictive models, where the effects of a potential new drug can be tested computationally prior to any actual animal or human testing. With so many discoveries happening day in and day out, opportunities for developers in the field of bioinformatics seem endless.
