#BioGo

##Installation

    $ go get github.com/kortschak/BioGo/...

##Overview

BioGo is a bioinformatics library for the Go language. It is a work in progress.

##The Purpose of BioGo

BioGo stems from the need to address the size and structure of modern genomic and metagenomic data sets. These properties enforce requirements on the libraries and languages used for analysis:

  • speed - data sets are large
  • concurrency - problems are often embarrassingly parallelisable
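
As a rough illustration of the concurrency point, the sketch below computes GC content for several sequences, one goroutine per sequence. It uses only the Go standard library; `gcContent` and `gcAll` are hypothetical names chosen for this example and are not part of BioGo.

```go
package main

import (
	"fmt"
	"sync"
)

// gcContent returns the fraction of G and C bases in seq.
// An empty sequence is reported as 0.
func gcContent(seq string) float64 {
	if len(seq) == 0 {
		return 0
	}
	n := 0
	for i := 0; i < len(seq); i++ {
		switch seq[i] {
		case 'G', 'C', 'g', 'c':
			n++
		}
	}
	return float64(n) / float64(len(seq))
}

// gcAll computes the GC content of every sequence concurrently.
// Each goroutine writes only to its own slot of out, so no locking is needed.
func gcAll(seqs []string) []float64 {
	out := make([]float64, len(seqs))
	var wg sync.WaitGroup
	for i, s := range seqs {
		wg.Add(1)
		go func(i int, s string) {
			defer wg.Done()
			out[i] = gcContent(s)
		}(i, s)
	}
	wg.Wait()
	return out
}

func main() {
	fmt.Println(gcAll([]string{"GGCC", "ATAT", "ACGT"})) // prints [1 0 0.5]
}
```

Because the per-sequence work is independent, the only coordination required is waiting for all goroutines to finish. For very large inputs a bounded worker pool would likely be preferable to one goroutine per sequence.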

Beyond the computational burden of massive data sets, modern genomics increasingly requires complex pipelines to resolve questions in a tightening problem space, and new algorithms that allow novel approaches to interesting questions. These issues suggest the need for simplicity of syntax to facilitate:

  • ease of coding
  • checking for correctness in development and particularly in peer review

Related to the second issue is the reluctance of some researchers to release code because of quality concerns.

The issue of code release is the first of the principles formalised in the Science Code Manifesto.

Code  All source code written specifically to process data for a published
      paper must be available to the reviewers and readers of the paper.

A language with a simple, yet expressive, syntax should facilitate development of higher quality code and thus help reduce this barrier to research code release.

##Yet Another Bioinformatics Library

It seems that nearly every language has its own bioinformatics library, some of which are very mature, for example BioPerl and BioPython. Why add another one?

The different libraries excel in different fields: acting as scripting glue for applications in a pipeline (much of [1, 2, 3]), interacting with external hosts [1, 2, 4, 5], wrapping lower-level, high-performance languages with more user-friendly syntax [1, 2, 3, 4], or providing bioinformatics functions for high-performance languages [5, 6].

The intended niche for BioGo lies between the scripting libraries and the high-performance language libraries: easy to use for both small and large projects, with reasonable performance on computationally intensive tasks.

The intent is to reduce the level of investment required to develop new research software for computationally intensive tasks.

  1. BioPerl
    http://genome.cshlp.org/content/12/10/1611.full
    http://www.springerlink.com/content/pp72033m171568p2

  2. BioPython
    http://bioinformatics.oxfordjournals.org/content/25/11/1422

  3. BioRuby
    http://bioinformatics.oxfordjournals.org/content/26/20/2617

  4. PyCogent
    http://genomebiology.com/2007/8/8/R171

  5. BioJava
    http://bioinformatics.oxfordjournals.org/content/24/18/2096

  6. SeqAn
    http://www.biomedcentral.com/1471-2105/9/11

##Library Structure and Coding Style

The BioGo library structure is influenced both by the structure of BioPerl and the Go core libraries.

The coding style is increasingly aligning itself with the style of the Go core library (I hope), although the use of 'self' as the receiver variable follows the BioPerl and BioPython coding styles. While this complicates refactoring, I currently feel that it provides a more informative description of the underlying intent of the code. The alignment with the BioPerl and BioPython styles is also intended to ease adoption by bioinformatics researchers, many of whom use these libraries.
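
To make the receiver naming concrete, here is a minimal sketch contrasting the two conventions. The `Seq` type and its methods are hypothetical, invented for this example; they do not reproduce BioGo's actual types.

```go
package main

import "fmt"

// Seq is a hypothetical sequence type used only for illustration.
type Seq struct {
	ID   string
	Data []byte
}

// Len names its receiver 'self', the convention described above,
// mirroring the BioPerl and BioPython styles.
func (self *Seq) Len() int {
	return len(self.Data)
}

// String uses the short, type-derived receiver name that the Go core
// library conventionally prefers. Both forms compile identically.
func (s *Seq) String() string {
	return fmt.Sprintf("%s (%d residues)", s.ID, s.Len())
}

func main() {
	seq := &Seq{ID: "example", Data: []byte("ACGT")}
	fmt.Println(seq) // prints example (4 residues)
}
```

The receiver name is purely a matter of style: Go treats it as an ordinary parameter name, so the choice affects readability and tooling warnings, not behaviour.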

##Quality Scores

Quality scores are supported for all sequence types, including protein. Both Phred and Solexa scores can be read from files; however, quality scores are represented internally as Phred, so converting Solexa scores loses precision. A Solexa quality score type is provided for use where this would be a problem.
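
The precision loss can be seen from the standard definitions of the two scales: Phred is Q = -10·log10(p) and Solexa is Q = -10·log10(p/(1-p)), where p is the error probability. The sketch below converts between them and rounds the Phred value to an integer, as integer-valued score storage would. The function names are hypothetical and this is not BioGo's implementation; how BioGo stores scores internally is an assumption made here for illustration.

```go
package main

import (
	"fmt"
	"math"
)

// SolexaToPhred converts a Solexa-scaled score to the Phred scale.
// Derived from Phred Q = -10*log10(p) and Solexa Q = -10*log10(p/(1-p)).
func SolexaToPhred(qs float64) float64 {
	return 10 * math.Log10(math.Pow(10, qs/10)+1)
}

// PhredToSolexa is the inverse conversion.
// It returns -Inf for a Phred score of 0 (error probability 1).
func PhredToSolexa(qp float64) float64 {
	return 10 * math.Log10(math.Pow(10, qp/10)-1)
}

func main() {
	for _, qs := range []float64{-5, 0, 10, 40} {
		exact := SolexaToPhred(qs)
		stored := math.Round(exact) // assumed integer storage
		back := PhredToSolexa(stored)
		fmt.Printf("solexa %6.2f -> phred %8.4f -> stored %3.0f -> solexa %6.2f\n",
			qs, exact, stored, back)
	}
}
```

The two scales nearly coincide at high quality and diverge at low quality, so rounding matters most for poor-quality bases, where a round trip through an integer Phred value may not recover the original Solexa score.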

##Copyright and License

Copyright ©2011-2012 Dan Kortschak <dan.kortschak@adelaide.edu.au> except where otherwise noted.

This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program. If not, see http://www.gnu.org/licenses/.
