vtphan/multigenome-sim

Tools used to generate (simulate) multiple genomes and reads.

==========================================================================
eval_alignment.py
	This program evaluates aligned reads.
	A read can be aligned to multiple locations.

usage: eval_alignment.py [-h] [-g G] reads alignment

Evaluate aligned reads.

positional arguments:
  reads       file containing all reads.
  alignment   file containing aligned reads.

optional arguments:
  -h, --help  show this help message and exit
  -g G        Maximum allowable gap distance between the positions of a read
              and its aligned read. Default=20


Test data: reads.txt, alignment0.txt, alignment1.txt
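
The role of the -g gap distance can be illustrated with a small sketch. This is
not the code in eval_alignment.py. It assumes reads and alignments are already
loaded as dicts mapping a read id to a list of positions, and it assumes a read
counts as correct when any aligned position lies within the gap of any true
position. The function names and the dict layout are hypothetical.

```python
def is_correctly_aligned(true_positions, aligned_positions, max_gap=20):
    """True if any aligned position is within max_gap of any true position."""
    return any(
        abs(a - t) <= max_gap
        for a in aligned_positions
        for t in true_positions
    )


def alignment_accuracy(reads, alignment, max_gap=20):
    """Fraction of aligned reads that have at least one acceptable position.

    reads, alignment: hypothetical dicts {read_id: [positions]}.
    """
    aligned = [rid for rid in alignment if rid in reads]
    if not aligned:
        return 0.0
    correct = sum(
        is_correctly_aligned(reads[rid], alignment[rid], max_gap)
        for rid in aligned
    )
    return correct / len(aligned)
```

A larger gap tolerates alignments that are shifted by indels; a gap of 0
accepts only exact position matches.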

==========================================================================
verify_reads.go

Usage: go run verify_reads.go -s genome.fasta -r reads.txt

	genome.fasta is the genome from which the reads (reads.txt) were generated
	(no mutations, only sequencing errors).
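
One way to picture this kind of check, as a sketch only: compare each read with
the genome at its recorded positions and allow a small number of mismatches for
sequencing errors. This is not the logic of verify_reads.go. The function names,
the idea that each read carries a list of positions, and the 5% mismatch budget
are all assumptions made for illustration.

```python
def mismatches(read, genome, pos):
    """Count mismatches between read and genome[pos:pos+len(read)].

    Returns None if the read does not fit inside the genome at pos.
    """
    window = genome[pos:pos + len(read)]
    if pos < 0 or len(window) != len(read):
        return None
    return sum(1 for r, g in zip(read, window) if r != g)


def read_is_consistent(read, genome, positions, max_error_frac=0.05):
    """True if the read matches the genome at every recorded position,
    allowing up to max_error_frac * len(read) mismatches (assumed budget)."""
    budget = max_error_frac * len(read)
    for pos in positions:
        m = mismatches(read, genome, pos)
        if m is None or m > budget:
            return False
    return True
```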

==========================================================================
generate_reads.go
	This program generates reads given a sequence and a sequencing error rate.
	A read can occur at multiple locations.
	Every read contains at least one A, C, T, or G.  No read consists only of N's.
	
Usage:
(1) generate index
	go run generate_reads.go -s sequence_file
(2) generate actual reads
	go run generate_reads.go -s sequence_file -c C -l M -e E

Help:
go run generate_reads.go --help
  -c=2: Coverage
  -debug=false: Turn on debug mode.
  -e=0.01: Error rate.
  -l=100: Read length.
  -s="": Specify a file containing the sequence.

Test data: reference.fasta
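
The idea of read generation described above can be sketched as follows. This is
an illustration in Python, not a port of generate_reads.go. It assumes the
number of reads is derived as coverage * sequence length / read length, that
errors are substitutions applied independently per base, and that windows
containing only N's are skipped. All function names are hypothetical.

```python
import random


def num_reads(seq_len, coverage=2, read_len=100):
    """Reads needed so that total read bases ~= coverage * seq_len (assumed)."""
    return (coverage * seq_len) // read_len


def add_errors(read, error_rate, rng):
    """Substitute each A/C/G/T with a different base with prob. error_rate."""
    out = []
    for b in read:
        if b in "ACGT" and rng.random() < error_rate:
            out.append(rng.choice([x for x in "ACGT" if x != b]))
        else:
            out.append(b)
    return "".join(out)


def generate_reads(sequence, coverage=2, read_len=100, error_rate=0.01, seed=0):
    """Return a list of (position, read) pairs sampled uniformly at random."""
    if len(sequence) < read_len or not any(b in "ACGT" for b in sequence):
        raise ValueError("sequence too short or contains no A, C, G, T")
    rng = random.Random(seed)
    target = num_reads(len(sequence), coverage, read_len)
    reads = []
    while len(reads) < target:
        pos = rng.randrange(len(sequence) - read_len + 1)
        fragment = sequence[pos:pos + read_len]
        if not any(b in "ACGT" for b in fragment):
            continue  # skip reads made only of N's
        reads.append((pos, add_errors(fragment, error_rate, rng)))
    return reads
```

Because positions are sampled independently, the same read sequence can be
produced from several locations in a repetitive sequence.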

==========================================================================
generate_genomes.py
	Given a genome, this program generates other genomes with SNPs (including indels).


Usage: generate_genomes.py [-h] [-m MUTATION_RATE] [-p FIRST_PROB]
                           [-i INDEL_FRAC] [-ie INDEL_EXT] [-n N] [--debug]
                           file_name

Generate multiple genomes based on a reference genome.

positional arguments:
  file_name             genome_file.fasta

optional arguments:
  -h, --help            show this help message and exit
  -m MUTATION_RATE, --mutation_rate MUTATION_RATE
                        Base mutation rate (default 0.001)
  -p FIRST_PROB, --first_prob FIRST_PROB
                        Probability of first base in a SNP profile (default
                        0.75)
  -i INDEL_FRAC, --indel_frac INDEL_FRAC
                        Indel fraction of SNP (default 0.1111)
  -ie INDEL_EXT, --indel_ext INDEL_EXT
                        Indel extension probability (default 0.3)
  -n N                  Number of genomes (default 10)
  --debug               Turn on debug mode (default False)


Test data: reference.fasta
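
How the mutation rate, indel fraction, and indel extension probability might
interact can be sketched as below. This is an assumption-laden illustration, not
the algorithm in generate_genomes.py: each base mutates with probability
mutation_rate, a mutation is an indel with probability indel_frac, an indel
grows by one base with probability indel_ext, and insertions and deletions are
taken to be equally likely. The -p (first_prob) option is not modelled here.
All function names are hypothetical.

```python
import random


def indel_length(ext_prob, rng):
    """Length starts at 1 and extends by one base with probability ext_prob."""
    length = 1
    while rng.random() < ext_prob:
        length += 1
    return length


def mutate(reference, mutation_rate=0.001, indel_frac=0.1111,
           indel_ext=0.3, seed=0):
    """Return a mutated copy of reference (substitutions and indels)."""
    rng = random.Random(seed)
    out = []
    i = 0
    while i < len(reference):
        base = reference[i]
        if base not in "ACGT" or rng.random() >= mutation_rate:
            out.append(base)  # no mutation at this position
            i += 1
        elif rng.random() >= indel_frac:
            # substitution: replace with a different base
            out.append(rng.choice([b for b in "ACGT" if b != base]))
            i += 1
        else:
            n = indel_length(indel_ext, rng)
            if rng.random() < 0.5:
                # insertion (assumed 50%): keep base, add n random bases
                out.append(base)
                out.extend(rng.choice("ACGT") for _ in range(n))
                i += 1
            else:
                i += n  # deletion: skip n reference bases
    return "".join(out)


def generate_genomes(reference, n=10, **kwargs):
    """Generate n mutated genomes, one seed per genome."""
    return [mutate(reference, seed=k, **kwargs) for k in range(n)]
```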

==========================================================================
