BAM/FASTQ file mapping statistics
This article is a personal reminder of the commands I use to collect mapping statistics.
1.cleaned reads number
samtools view -c aligned_reads.bam
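cleaned reads bases = cleaned reads number * read length (assumes all reads have the same length)
Note: samtools view -c counts every alignment record, so secondary and supplementary alignments inflate the number; add -F 0x900 to count primary records only.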
2.mapped reads number
samtools view -F 0x04 -c aligned_reads.bam
unmapped reads number = cleaned reads number - mapped reads number
3.unmapped reads number
samtools view -f 0x04 -c aligned_reads.bam
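The three counts above can be cross-checked and turned into a mapping rate. The following is a minimal sketch, not part of the original notes: `mapping_summary` is a hypothetical helper name, and it assumes the total and mapped counts come from the `samtools view -c` commands shown above.

```shell
# mapping_summary TOTAL MAPPED
# Prints the unmapped count and the mapping rate (percent, two decimals).
# Hypothetical helper: the name and output format are illustrative only.
mapping_summary() {
    total=$1
    mapped=$2
    awk -v t="$total" -v m="$mapped" 'BEGIN {
        # Guard against an empty BAM so we never divide by zero
        if (t <= 0) { print "no reads"; exit 1 }
        printf "unmapped=%d rate=%.2f%%\n", t - m, 100 * m / t
    }'
}

# Typical use, with counts taken from the samtools commands above:
# total=$(samtools view -c aligned_reads.bam)
# mapped=$(samtools view -c -F 0x04 aligned_reads.bam)
# mapping_summary "$total" "$mapped"
```

The unmapped count printed here should match the output of the `-f` 4 command; if it does not, the two inputs probably came from different files.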
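4.sequenced exon/gene number
samtools bedcov exon_region.bed aligned_reads.bam
samtools bedcov gene_region.bed aligned_reads.bam
bedcov prints each BED region followed by the sum of its per-base depths; a region with a non-zero sum was sequenced.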
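To get a single number out of the bedcov output, the regions with a non-zero value can be counted with awk. This is a sketch under assumptions: `covered_regions` is a hypothetical helper name, and it assumes bedcov was run with default options so that the last column is the summed per-base depth.

```shell
# covered_regions: reads bedcov-style lines on stdin and reports how many
# regions have a non-zero summed depth in the last column.
# Assumption: default bedcov output, where the last column is the depth sum.
covered_regions() {
    awk '{ total++; if ($NF > 0) covered++ }
         END { printf "%d of %d regions covered\n", covered, total }'
}

# Typical use:
# samtools bedcov exon_region.bed aligned_reads.bam | covered_regions
```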
5.read depth
Updated on Jan 13, 2020
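samtools depth aligned_reads.bam | awk '{sum+=$3} END { print "Average = ",sum/NR}'
By default samtools depth skips positions with zero coverage, so this is the mean over covered positions only; add -a to include every position.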
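The awk one-liner fails with a division by zero when the depth output is empty (for example, a BAM with no mapped reads). A slightly more defensive sketch follows; `mean_depth` is a hypothetical helper name, and it assumes the three-column `chrom pos depth` format that samtools depth prints for a single BAM.

```shell
# mean_depth: reads "chrom<TAB>pos<TAB>depth" lines on stdin and prints the
# mean of column 3, or "NA" when there is no input at all.
mean_depth() {
    awk '{ sum += $3 }
         END {
             if (NR == 0) { print "Average = NA"; exit }
             printf "Average = %.2f\n", sum / NR
         }'
}

# Typical use:
# samtools depth aligned_reads.bam | mean_depth
```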
6.BAM tags
Tag | Meaning |
---|---|
NM | Edit distance |
MD | Mismatching positions/bases |
AS | Alignment score |
BC | Barcode sequence |
X0 | Number of best hits |
X1 | Number of suboptimal hits found by BWA |
XN | Number of ambiguous bases in the reference |
XM | Number of mismatches in the alignment |
XO | Number of gap opens |
XG | Number of gap extensions |
XT | Type: Unique/Repeat/N/Mate-sw |
XA | Alternative hits; format: (chr,pos,CIGAR,NM;)* |
XS | Suboptimal alignment score |
XF | Support from forward/reverse alignment |
XE | Number of supporting seeds |
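In SAM text these tags are optional fields that start at column 12 and have the form TAG:TYPE:VALUE (for example `NM:i:2`). As an illustration, the value of one tag can be pulled out with awk; `sam_tag` is a hypothetical helper name, and the sketch assumes tab-separated SAM records on stdin, such as the output of `samtools view`.

```shell
# sam_tag TAG: reads SAM records on stdin and prints the value of TAG for
# each record that carries it. Records without the tag print nothing.
sam_tag() {
    awk -v tag="$1" -F '\t' '
        /^@/ { next }                          # skip header lines
        {
            for (i = 12; i <= NF; i++) {       # optional fields start at column 12
                if (substr($i, 1, 3) == tag ":") {
                    print substr($i, 6)        # drop the 5-character "TAG:TYPE:" prefix
                    break
                }
            }
        }'
}

# Typical use:
# samtools view aligned_reads.bam | sam_tag NM
```

Piping the result through `sort -n | uniq -c` gives a quick histogram of, say, edit distances.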
Updated on Apr 16, 2020
7.number of bases in a FASTQ file
zcat data.clean.fq.gz | paste - - - - | cut -f 2 | tr -d '\n' | wc -c
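An awk variant of the same count reports the number of reads as well and avoids building one long line per record. This is a sketch under the same assumption as the pipeline above, namely that every FASTQ record is exactly four lines; `fastq_stats` is a hypothetical helper name, and `gzip -dc` is used in place of `zcat` only because it behaves the same on Linux and macOS.

```shell
# fastq_stats FILE.fq.gz: prints the read count and the total base count.
# Assumption: 4-line FASTQ records (sequences are not wrapped).
fastq_stats() {
    gzip -dc "$1" | awk 'NR % 4 == 2 { reads++; bases += length($0) }
        # %.0f keeps large totals exact in awk builds whose %d is 32-bit
        END { printf "reads=%.0f bases=%.0f\n", reads, bases }'
}

# Typical use:
# fastq_stats data.clean.fq.gz
```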