BAM/FASTQ file mapping statistics
This article is meant to help me remember how to compute mapping statistics.
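1. Number of clean reads
samtools view -c -F 0x900 aligned_reads.bam
-F 0x900 skips secondary and supplementary alignments, so each read is counted once (for paired-end data each mate counts as one read). Plain samtools view -c counts alignment records, which overcounts reads when multi-mapping or chimeric alignments are present.
Clean bases = number of clean reads * read length (this assumes every read has the same length, i.e. no trimming)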
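The flag filtering and base counting can be sketched in Python. This is a minimal sketch, assuming SAM text (for example the output of samtools view -h) as input; the function name `count_primary_reads` and the toy records are hypothetical. Summing the actual SEQ lengths gives an exact base count even when reads were trimmed to different lengths.

```python
def count_primary_reads(sam_lines):
    """Return (read_count, base_count) for primary records in SAM text.

    Secondary (0x100) and supplementary (0x800) records are skipped so each
    read is counted once. Header lines starting with '@' are ignored.
    """
    reads = 0
    bases = 0
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])
        if flag & 0x900:  # secondary or supplementary alignment
            continue
        reads += 1
        seq = fields[9]
        bases += 0 if seq == "*" else len(seq)  # '*' means SEQ not stored
    return reads, bases


# Hypothetical toy records: one mapped read, one unmapped read,
# and a secondary alignment of r1 that must not be counted again.
example = [
    "@HD\tVN:1.6",
    "r1\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII",
    "r2\t4\t*\t0\t0\t*\t*\t0\t0\tACG\tIII",
    "r1\t256\tchr2\t500\t0\t4M\t*\t0\t0\t*\t*",
]
print(count_primary_reads(example))  # (2, 7)
```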
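2. Number of mapped reads
samtools view -c -F 0x904 aligned_reads.bam
0x4 marks unmapped reads; also excluding 0x900 (secondary/supplementary) counts each mapped read only once.
Number of unmapped reads = number of clean reads - number of mapped reads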
3. Number of unmapped reads
samtools view -f4 -c aligned_reads.bam
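The mapped/unmapped split comes down to two FLAG bits. A minimal sketch, assuming you already have the FLAG values of all records; `mapping_counts` and the toy flags are hypothetical. Because mapped and unmapped partition the primary records, the unmapped count always equals the primary total minus the mapped count.

```python
def mapping_counts(flags):
    """Return (mapped, unmapped) counts for primary records, given SAM FLAG values.

    Records with 0x100 (secondary) or 0x800 (supplementary) set are skipped,
    so each read is counted once; 0x4 marks a read as unmapped.
    """
    primary = [flag for flag in flags if not (flag & 0x900)]
    unmapped = sum(1 for flag in primary if flag & 0x4)
    mapped = len(primary) - unmapped
    return mapped, unmapped


# Toy FLAG values: a mapped pair (99/147), an unmapped pair (77/141),
# a mapped and an unmapped single-end read (0/4), plus one secondary (256)
# and one supplementary (2048) record that are not counted.
flags = [99, 147, 77, 141, 0, 4, 256, 2048]
mapped, unmapped = mapping_counts(flags)
print(mapped, unmapped)  # 3 3
```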
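4. Number of sequenced exons/genes
samtools bedcov exon_region.bed aligned_reads.bam | awk '$NF > 0' | wc -l
Use gene_region.bed instead for genes. bedcov appends the summed per-base depth of each region as the last column, so regions with a non-zero sum are the ones covered by at least one read.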
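A sketch of post-processing bedcov output, assuming a single BAM (so the summed depth is the last column) and 0-based, half-open BED coordinates in columns 2 and 3. `summarise_bedcov` and the toy lines are hypothetical; dividing the depth sum by the region length also gives each region's mean depth.

```python
def summarise_bedcov(lines):
    """Count covered regions in `samtools bedcov` output for one BAM.

    Assumes the last column is the summed per-base depth and columns 2-3
    are the BED start/end, so region length is end - start.
    """
    covered = 0
    mean_depths = {}
    for line in lines:
        if not line.strip():
            continue
        fields = line.split()
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        depth_sum = int(fields[-1])
        if depth_sum > 0:
            covered += 1
        mean_depths[f"{chrom}:{start}-{end}"] = depth_sum / (end - start)
    return covered, mean_depths


# Hypothetical bedcov lines: chrom, start, end, summed depth.
example = ["chr1\t100\t200\t3000", "chr1\t300\t350\t0", "chr2\t10\t20\t50"]
covered, means = summarise_bedcov(example)
print(covered)  # 2
print(means)    # {'chr1:100-200': 30.0, 'chr1:300-350': 0.0, 'chr2:10-20': 5.0}
```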
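5. Read depth
Updated on Jan 13, 2020
samtools depth aligned_reads.bam | awk '{sum+=$3} END { print "Average = ", sum/NR }'
By default samtools depth omits zero-depth positions, so this is the average over covered positions only. Add -a to include zero-depth positions (-aa also includes reference sequences with no reads at all).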
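The difference between the two averages can be sketched as follows, assuming samtools depth output with chrom, position and depth columns; `average_depth` and its `total_positions` parameter are hypothetical. Positions missing from the default output have depth 0, so dividing by the full target length instead of the line count gives the all-positions average.

```python
def average_depth(depth_lines, total_positions=None):
    """Mean depth from `samtools depth` output (chrom, pos, depth columns).

    Without total_positions this divides by the number of lines, like the
    awk one-liner (covered positions only). With total_positions, missing
    positions are treated as depth 0.
    """
    total = 0
    n = 0
    for line in depth_lines:
        if not line.strip():
            continue
        total += int(line.split()[2])
        n += 1
    denominator = total_positions if total_positions else n
    return total / denominator if denominator else 0.0


# Hypothetical depth output: 3 covered positions out of a 6 bp target.
example = ["chr1\t1\t10", "chr1\t2\t20", "chr1\t5\t30"]
print(average_depth(example))     # 20.0
print(average_depth(example, 6))  # 10.0
```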
6. BAM tags (NM, MD, AS and BC are standard SAM tags; the X tags below follow the BWA definitions)
| Tag | Meaning |
|---|---|
| NM | Edit distance |
| MD | Mismatching positions/bases |
| AS | Alignment score |
| BC | Barcode sequence |
| X0 | Number of best hits |
| X1 | Number of suboptimal hits found by BWA |
| XN | Number of ambiguous bases in the reference |
| XM | Number of mismatches in the alignment |
| XO | Number of gap opens |
| XG | Number of gap extensions |
| XT | Type: Unique/Repeat/N/Mate-sw |
| XA | Alternative hits; format: (chr,pos,CIGAR,NM;)* |
| XS | Suboptimal alignment score |
| XF | Support from forward/reverse alignment |
| XE | Number of supporting seeds |
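Optional tags are stored after the 11 mandatory SAM columns as TAG:TYPE:VALUE. A minimal sketch of reading them, plus splitting the XA string using the (chr,pos,CIGAR,NM;)* format from the table; `parse_optional_tags`, `parse_xa` and the toy record are hypothetical. In the XA string the sign of pos indicates the strand of the alternative hit.

```python
def parse_optional_tags(fields):
    """Parse SAM optional fields (TAG:TYPE:VALUE) into a dict.

    Integer ('i') and float ('f') values are converted; everything else,
    including 'Z' strings and 'B' arrays, is kept as text.
    """
    tags = {}
    for field in fields:
        tag, typ, value = field.split(":", 2)
        if typ == "i":
            tags[tag] = int(value)
        elif typ == "f":
            tags[tag] = float(value)
        else:
            tags[tag] = value
    return tags


def parse_xa(xa):
    """Split an XA string, format (chr,pos,CIGAR,NM;)*, into tuples."""
    hits = []
    for item in xa.split(";"):
        if not item:
            continue  # the string ends with ';'
        chrom, pos, cigar, nm = item.rsplit(",", 3)
        strand = "-" if pos.startswith("-") else "+"
        hits.append((chrom, strand, abs(int(pos)), cigar, int(nm)))
    return hits


# Hypothetical record; optional fields start at column 12 (index 11).
line = ("r1\t0\tchr1\t100\t37\t4M\t*\t0\t0\tACGT\tIIII\t"
        "NM:i:1\tMD:Z:2A1\tAS:i:2\tXA:Z:chr2,+500,4M,1;chr3,-900,4M,2;")
tags = parse_optional_tags(line.split("\t")[11:])
print(tags["NM"], tags["MD"])  # 1 2A1
print(parse_xa(tags["XA"]))    # [('chr2', '+', 500, '4M', 1), ('chr3', '-', 900, '4M', 2)]
```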
Updated on Apr 16, 2020
7. Counting the number of bases in a FASTQ file
zcat data.clean.fq.gz | paste - - - - | cut -f 2 | tr -d '\n' | wc -c
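The same count in Python, as a sketch assuming strict 4-line FASTQ records (header, sequence, '+', quality), which is also what paste - - - - relies on. `count_fastq_bases` is a hypothetical helper; it reads gzipped or plain files and also returns the read count.

```python
import gzip


def count_fastq_bases(path):
    """Return (read_count, base_count) for a FASTQ file, gzipped or plain.

    Assumes strict 4-line records, so every 4th line starting at the
    second one is a sequence line.
    """
    opener = gzip.open if path.endswith(".gz") else open
    reads = 0
    bases = 0
    with opener(path, "rt") as handle:
        for i, line in enumerate(handle):
            if i % 4 == 1:  # sequence line of each record
                reads += 1
                bases += len(line.rstrip("\r\n"))
    return reads, bases


# Usage: print(count_fastq_bases("data.clean.fq.gz"))
```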