This article aims to help me remember the methods I use for mapping statistics.

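1. Cleaned reads number

samtools view -c -F 0x900 aligned_reads.bam

-F 0x900 skips secondary (0x100) and supplementary (0x800) alignments, so each read is counted once; a plain samtools view -c counts every alignment record.

cleaned reads bases = cleaned reads number * read length (assuming all reads have the same length)

2. Mapped reads number

samtools view -c -F 0x904 aligned_reads.bam

-F 0x904 additionally skips unmapped reads (0x4).

unmapped reads number = cleaned reads number - mapped reads number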
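The subtraction above, together with the mapping rate usually reported next to it, can be wrapped in a small shell function. This is a minimal sketch: the name mapping_summary and its output format are my own choices, not a samtools feature. It takes the two counts as arguments, so the arithmetic itself runs without samtools.

```shell
# Given the cleaned (total) read count and the mapped read count, print the
# unmapped count and the mapping rate as a percentage.
# mapping_summary is a hypothetical helper, not part of samtools.
mapping_summary() {
  awk -v total="$1" -v mapped="$2" 'BEGIN {
    rate = (total > 0) ? 100 * mapped / total : 0   # avoid division by zero
    # %.0f instead of %d so very large counts are not truncated by some awks
    printf "unmapped=%.0f mapping_rate=%.2f%%\n", total - mapped, rate
  }'
}

# Usage (assumes samtools is on PATH):
# total=$(samtools view -c -F 0x900 aligned_reads.bam)    # primary records
# mapped=$(samtools view -c -F 0x904 aligned_reads.bam)   # primary, mapped
# mapping_summary "$total" "$mapped"
```

For example, 1000000 cleaned reads with 987654 mapped gives unmapped=12346 mapping_rate=98.77%.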

3. Unmapped reads number

samtools view -f4 -c aligned_reads.bam

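4. Sequenced exon/gene number

samtools bedcov exon_region.bed aligned_reads.bam

Use gene_region.bed instead to work at gene level. The BAM must be coordinate-sorted and indexed. bedcov prints each BED region followed by the sum of per-base read depths over that region; regions with a non-zero sum are the sequenced exons/genes.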
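To turn bedcov output into a number of sequenced exons or genes, count the regions whose last column is above a threshold. A minimal sketch: count_covered_regions and its optional minimum-depth-sum argument (default 0) are hypothetical, and it assumes bedcov was run on a single BAM, so the last column holds that BAM's depth sum.

```shell
# Count regions in samtools bedcov output (read on stdin) whose summed
# per-base depth, in the last column, is greater than a threshold.
# count_covered_regions is a hypothetical helper; threshold defaults to 0.
count_covered_regions() {
  awk -F '\t' -v min="${1:-0}" '$NF > min { n++ } END { print n + 0 }'
}

# Usage (assumes samtools is on PATH and a sorted, indexed BAM):
# samtools bedcov exon_region.bed aligned_reads.bam | count_covered_regions
```

Raising the threshold (for example count_covered_regions 100) is one way to ignore regions touched by only a handful of bases; what threshold makes sense depends on region length and sequencing depth.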

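5. Read depth

Updated on Jan 13, 2020

samtools depth aligned_reads.bam | awk '{sum+=$3} END {print "Average =", sum/NR}'

By default samtools depth omits zero-depth positions, so this is the average over covered positions only. Use samtools depth -a to include zero-depth positions (-aa also includes reference sequences with no reads).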
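The awk step can be kept as a reusable function that also handles empty input, where sum/NR would divide by zero. A minimal sketch; mean_depth is a hypothetical name, and its result depends on whether zero-depth rows are present in its input.

```shell
# Average the depth column (column 3) of samtools depth output read on stdin.
# Prints 0.00 for empty input instead of dividing by zero.
# mean_depth is a hypothetical helper, not part of samtools.
mean_depth() {
  awk '{ sum += $3 } END { if (NR > 0) printf "%.2f\n", sum / NR; else print "0.00" }'
}

# Usage (assumes samtools is on PATH; -a keeps zero-depth positions):
# samtools depth -a aligned_reads.bam | mean_depth
```

With depths 10, 0 and 5 the function prints 5.00; if the zero-depth row is missing, the same data gives 7.50, which is why the choice of -a matters.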

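6. BAM tags

NM, MD, AS and BC are defined in the SAM tags specification; tags starting with X are reserved for local use, and the X* meanings below are the ones BWA uses.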

Tag Meaning
NM Edit distance
MD Mismatching positions/bases
AS Alignment score
BC Barcode sequence
X0 Number of best hits
X1 Number of suboptimal hits found by BWA
XN Number of ambiguous bases in the reference
XM Number of mismatches in the alignment
XO Number of gap opens
XG Number of gap extensions
XT Type: Unique/Repeat/N/Mate-sw
XA Alternative hits; format: (chr,pos,CIGAR,NM;)*
XS Suboptimal alignment score
XF Support from forward/reverse alignment
XE Number of supporting seeds
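To read one of these tags from alignments, scan the optional fields (column 12 onward) of each SAM record, which have the form TAG:TYPE:VALUE. A minimal sketch, assuming SAM text on stdin as produced by samtools view; sam_tag is a hypothetical helper that prints the read name and the tag value, or NA when the tag is absent.

```shell
# Print "QNAME<TAB>VALUE" for one optional tag (e.g. NM, AS, XA) of every
# SAM record on stdin; VALUE is NA when the record lacks the tag.
# sam_tag is a hypothetical helper, not part of samtools.
sam_tag() {
  awk -F '\t' -v tag="$1" '
    /^@/ { next }                              # skip SAM header lines
    {
      value = "NA"
      for (i = 12; i <= NF; i++) {             # optional fields start at 12
        if (substr($i, 1, 3) == (tag ":")) {   # field looks like TAG:TYPE:VALUE
          value = substr($i, 6)                # drop the "TAG:TYPE:" prefix
          break
        }
      }
      print $1 "\t" value
    }
  '
}

# Usage (assumes samtools is on PATH):
# samtools view aligned_reads.bam | sam_tag NM
```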

Updated on Apr 16, 2020

7. Counting the number of bases in a FASTQ file

zcat data.clean.fq.gz | paste - - - - | cut -f 2 | tr -d '\n' | wc -c 
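The same count can be done in a single awk pass that also reports the number of reads, so the cleaned bases can be measured directly instead of estimated as reads * read length. A minimal sketch, assuming uncompressed FASTQ on stdin with exactly four lines per record (the same assumption paste - - - - makes); fastq_stats is a hypothetical name.

```shell
# Count reads and bases in uncompressed FASTQ text read on stdin.
# Assumes every record is exactly four lines: header, sequence, "+", quality.
# fastq_stats is a hypothetical helper.
fastq_stats() {
  awk '
    NR % 4 == 2 {               # line 2 of each record is the sequence
      sub(/\r$/, "")            # ignore a trailing carriage return (CRLF files)
      reads++
      bases += length($0)
    }
    END { printf "reads=%.0f bases=%.0f\n", reads, bases }
  '
}

# Usage:
# zcat data.clean.fq.gz | fastq_stats
```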
