MinorAlleleCatcher
Minor allele catcher filters out reads with attributes that could contribute to false base calls. The reads that pass filtering are then used to estimate the frequency of particular SNPs in a pooled sequencing library composed of multiple independent samples.
Taylor, S. M. et al. Absence of putative Plasmodium falciparum artemisinin resistance mutations in sub-Saharan Africa: A molecular epidemiologic study. J. Infect. Dis. 211:680-8 (2015).
Introduction



Usage
Requirements:
Note: version numbers for dependencies are the ones on which the program was built. Older or newer versions may work.
    numpy 1.8.2
    scipy 0.13.3
    pysam 0.7.4 (Samtools 0.1.18)
Usage:
    Usage: minor_allele_catcher [options] <ref.fasta> <in.sorted.bam>
    If no default given, either false or 0
    Options:
      -q/--qual          minimum read qual [default: 10]
      -m/--mapq          minimum map qual [default: 10]
      -a/--aln_score     minimum alignment score [default: None]
      -e/--ends          minimum distance from read ends
      -d/--depth         max depth for pysam.pileup [default: 100000]
      -r/--rmdup         remove duplicates
      -n/--nearq         minimum distance near snp of passing quality
      -s/--size          minimum read size
      -b/--bothstrands   filter if strand bias [default: False]
      -c/--cumulative    count filter cumulatively [default: False]
      -i/--indel         skip indel reads [default: False]
      --refrelative      compare snps to ref rather than major allele [default: False]
      -p/--primersize    size of primer
      -l/--lower         minimum alignment range
      -u/--upper         maximum alignment range
      -o/--out           output filename
      -h/--help          help screen
Note: alignment score is a scoring calculation produced by bowtie2.
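For orientation, bowtie2 reports this score in the optional AS:i field of each SAM record. The sketch below reads that field from a raw SAM line and compares it to a minimum. It is only an illustration, not the program's own code: the function names are hypothetical, and treating a read with no AS:i field as failing the filter is an assumption.

```python
def alignment_score(sam_line):
    """Return the integer AS:i value from one SAM record line, or None if absent."""
    fields = sam_line.rstrip("\n").split("\t")
    # The first 11 SAM columns are mandatory; optional TAG:TYPE:VALUE fields follow.
    for field in fields[11:]:
        if field.startswith("AS:i:"):
            return int(field[len("AS:i:"):])
    return None


def passes_alignment_score(sam_line, min_score):
    """True if the record carries an AS:i value of at least min_score.

    Assumption: a record without an AS:i field is treated as failing.
    """
    score = alignment_score(sam_line)
    return score is not None and score >= min_score


# A made-up SAM record used only to exercise the functions above.
EXAMPLE_RECORD = "\t".join([
    "read1", "0", "chr1", "100", "42", "50M", "*", "0", "0",
    "A" * 50, "I" * 50, "AS:i:95", "XN:i:0",
])
```

With the made-up record above, passes_alignment_score(EXAMPLE_RECORD, 81) checks the read against the threshold passed to -a in the example command.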
In Taylor et al. (2015), the following options were used:
python minor_allele_catcher.py -q 34 -m 10 -e 10 -s 200 -a 81 -b input.sorted.bam
Output
The SNP loci are printed to standard output, one per line, with the frequency and filter results in tab-delimited columns. Per-locus counts of filtered reads are also included in the tab-delimited output. By default, these counts are non-cumulative, and read filters are applied in this order: indel, map quality, read size, alignment score, read quality, neighboring quality, read ends, optical duplicates.
Column legend:
    chrom -- chromosome
    pos -- 0-based reference position
    ref_allele -- reference allele
    major_allele -- major allele (can be different from reference allele)
    minor_allele -- minor allele
    filtered_depth -- depth of reads at base position after filtration
    minor_allele_freq -- frequency of minor allele
    major_f -- number of reads with major allele on forward strand
    major_r -- number of reads with major allele on reverse strand
    minor_f -- number of reads with minor allele on forward strand
    minor_r -- number of reads with minor allele on reverse strand
    unfiltered_depth -- depth of reads before filtration of reads
    mapq_f -- number of reads filtered by map quality
    read_size_f -- number of reads filtered by read size
    readq_f -- number of reads filtered by read quality
    neighbor_f -- number of reads filtered by neighboring loci read quality
    read_ends_f -- number of reads filtered for being at read ends
    indel_f -- number of reads filtered for having indels (insertions/deletions)
    bias_f -- number of reads filtered for having a strand bias
    as_f -- number of reads filtered for alignment score
    dup_f -- number of reads filtered for being optical duplicates
    total_median_depth -- deprecated
    minor_median_depth -- deprecated
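The tab-delimited output can be loaded with a few lines of Python. The sketch below is a minimal example under two assumptions that the text above does not confirm: that the columns appear in the same order as the legend, and that a header row, if one is present, starts with "chrom". The function names are hypothetical.

```python
import csv
import io

# Column names in legend order (assumed to match the order in the output).
COLUMNS = [
    "chrom", "pos", "ref_allele", "major_allele", "minor_allele",
    "filtered_depth", "minor_allele_freq",
    "major_f", "major_r", "minor_f", "minor_r",
    "unfiltered_depth", "mapq_f", "read_size_f", "readq_f",
    "neighbor_f", "read_ends_f", "indel_f", "bias_f", "as_f", "dup_f",
    "total_median_depth", "minor_median_depth",
]


def read_snps(handle):
    """Parse tab-delimited output into a list of dicts keyed by column name."""
    rows = []
    for fields in csv.reader(handle, delimiter="\t"):
        # Skip blank lines and a header row, if the output has one (assumption).
        if not fields or fields[0] == "chrom":
            continue
        rows.append(dict(zip(COLUMNS, fields)))
    return rows


def minor_on_both_strands(row):
    """True if the minor allele was seen on both forward and reverse reads."""
    return int(row["minor_f"]) > 0 and int(row["minor_r"]) > 0


def one_based_pos(row):
    """Convert the 0-based pos column to a 1-based coordinate."""
    return int(row["pos"]) + 1
```

For example, after redirecting the program's standard output to a file, read_snps(open("snps.tsv")) returns one dict per locus (the file name is a placeholder). Values are kept as strings and converted on use, since the format of the deprecated columns is not documented here.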