Knapsack Problem Variant: Build Optimal Length Fasta Sequences from Ligands

Given, say, 150 different FASTA sequences, we need to find an optimal way to bin them so that each bin totals around 2 kb.

Thus, we find ourselves with a variant of the Knapsack problem. The code for this tool was heavily borrowed from https://github.com/handcraftsman/GeneticGo's "samples/ukp/rosetta.go" file.

Normally in the unbounded knapsack problem (UKP), you're trying to maximize the total value packed into a single bin, given some constraints like weight/volume and some motivators like "value".

We abuse the heck out of this (in order to write less new code) by writing a fitness function that:

treats "count" (how many copies of each item go into the single knapsack in UKP) as the bin number
biases heavily towards low bin numbers (to force multiple items into the same bin)
biases against bins whose total length is far from our target size (analogous to the weight/volume constraint)
biases heavily against solutions that do not use all of the available resources (i.e. sequences)

Usage

$ go run bin_fasta.go -lengthTable test.tsv -targetLength=2000 > out.tsv

This reads the features defined in test.tsv, with their lengths in column 2, and tries to group them so that each group totals approximately 2 kb. See -help for options.

hexylena/bin-fasta

Folders and files

Latest commit

History

Repository files navigation

Knapsack Problem Variant: Build Optimal Length Fasta Sequences from Ligands

Usage

About

Resources

Stars

Watchers

Forks

Languages