07 February 2013

A tool to compare the BAMs

Following that thread on Biostar, I've created a tool that compares two or more BAMs. This java program uses the Picard and berkeleydb-JE libraries and is available at: http://code.google.com/p/jvarkit/wiki/CompareBams.

Download & install

see http://code.google.com/p/jvarkit/wiki/CompareBams.

Example

The following Makefile align the same pair of FASTQs with 5 different parameters for bwa aln -O (gap_open_penalty)

FASTQ1=001.fastq.gz
FASTQ2=002.fastq.gz
REF=/path/to/human_g1k_v37.fasta
BWA=bwa
SAMTOOLS=samtools
ALL_BAMS= 

define SAI
$(1)_1.sai : ${FASTQ1}  $(REF)
        $(BWA) aln  $(2) -t 2  -f $$@ $(REF) $$<
$(1)_2.sai :${FASTQ2}  $(REF)
        $(BWA) aln  $(2) -t 2  -f $$@ $(REF) $$<

endef

define ALN

ALL_BAMS+= $(1).bam 

$(eval $(call SAI,$(1),$(2)))

$(1).bam: $(1)_1.sai $(1)_2.sai
        $(BWA) sampe $(3)  ${REF} \
                $(1)_1.sai $(1)_2.sai  \
                $(FASTQ1) $(FASTQ2) |\
        $(SAMTOOLS) view -S -b -o $$@ -T $(REF) - 

endef

.PHONY:all

all: diff.gz

$(eval $(foreach GAP, 8 9 10 11 12 , $(call ALN,GAP$(GAP), -O $(GAP) , )))


diff.gz: $(ALL_BAMS)
        mkdir -p tmp.bdb
        java -jar  /commun/data/packages/jvarkit/comparebams.jar -d tmp.bdb $^ | gzip --best > $@
execute:
$ make

(...)
java -jar  /path/to/jvarkit/comparebams.jar -d tmp.bdb GAP8.bam GAP9.bam GAP10.bam GAP11.bam GAP12.bam | gzip --best > diff.gz
Feb 07, 2013 2:09:57 PM fr.inserm.umr1087.jvarkit.tools.cmpbams.CompareBams run
#INFO: in GAP8.bam count:1
Feb 07, 2013 2:10:30 PM fr.inserm.umr1087.jvarkit.tools.cmpbams.CompareBams run
INFO: in GAP9.bam count:1
(...)

The output is a tab delimited file containing

  • the name of the read
  • the comparison of each bam vs the others EQ=equals, NE=not-equals
  • the positions of the reads in each bam
$ gunzip -c diff.gz | egrep -E  '(#|NE)' | head


#READ-Name      GAP8.bam GAP9.bam|GAP8.bam GAP10.bam|GAP8.bam GAP11.bam|GAP8.bam GAP12.bam|GAP9.bam GAP10.bam|GAP9.bam GAP11.bam|GAP9.bam GAP12.bam|GAP10.bam GAP11.bam|GAP10.bam GAP12.bam|GAP11.bam GAP12.bam GAP8.bam        GAP9.bam        GAP10.bam       GAP11.bam       GAP12.bam
M00491:10:000000000-A27BP:1:1101:10029:10672    EQ|NE|NE|NE|NE|NE|NE|EQ|EQ|EQ   6:123892013,6:123892006 6:123892013,6:123892006 6:123892005,6:123892013 6:123892005,6:123892013 6:123892005,6:123892013
M00491:10:000000000-A27BP:1:1101:10265:10054    EQ|EQ|NE|NE|EQ|NE|NE|NE|NE|NE   19:49671437,19:49671412 19:49671437,19:49671412 19:49671437,19:49671412 19:49671435     19:49671412,19:49671435
M00491:10:000000000-A27BP:1:1101:10904:12333    EQ|NE|NE|NE|NE|NE|NE|EQ|EQ|EQ   10:88681151,10:88681156 10:88681151,10:88681156 10:88681156,10:88681150 10:88681156,10:88681150 10:88681156,10:88681150
M00491:10:000000000-A27BP:1:1101:11211:13492    EQ|NE|NE|NE|NE|NE|NE|EQ|EQ|EQ   8:52321469      8:52321469      8:52321470      8:52321470      8:52321470
M00491:10:000000000-A27BP:1:1101:11298:18283    NE|NE|NE|NE|EQ|EQ|EQ|EQ|EQ|EQ   6:126071103,6:126071110 6:126071104,6:126071110 6:126071104,6:126071110 6:126071104,6:126071110 6:126071104,6:126071110
M00491:10:000000000-A27BP:1:1101:11381:15675    EQ|NE|NE|NE|NE|NE|NE|EQ|EQ|EQ   1:156106900,1:156106905 1:156106900,1:156106905 1:156106905,1:156106899 1:156106905,1:156106899 1:156106905,1:156106899
M00491:10:000000000-A27BP:1:1101:12189:14088    EQ|NE|NE|EQ|NE|NE|EQ|EQ|NE|NE   15:22015803     15:22015803     15:21009140     15:21009140     15:22015803
M00491:10:000000000-A27BP:1:1101:12382:11193    EQ|NE|NE|NE|NE|NE|NE|EQ|EQ|EQ   4:111542263,4:111542256 4:111542263,4:111542256 4:111542263,4:111542254 4:111542263,4:111542254 4:111542263,4:111542254
M00491:10:000000000-A27BP:1:1101:12998:24492    EQ|NE|NE|NE|NE|NE|NE|EQ|EQ|EQ   2:179433503,2:179433496 2:179433503,2:179433496 2:179433503,2:179433497 2:179433503,2:179433497 2:179433503,2:179433497
(....)

That's it
Pierre

No comments: