There are many fold-change calculation formulars. Since I read the paper A comparison of fold-change and the t-statistic for microarray data analysis, I think I need to summaries each mathods and make sure I understand them.

Because fold-change is widely used in RNA research. The background of the fold-change calculation in this post is based on how find the differentially-expressed genes in a experiment.

The original method

The standard definition of the fold-change is

\[foldchange = \frac{control}{treatment}\]

Note: control and treatment are the raw expression levels.

An simple method

\[foldchange = control - treatment\]

In some papers, I found they use deviation of control and treatment as the fold-change result.

If the difference of control and treatment data are not to large, I think we should use the simple method.

Some differentially expressed genes have large differences (B-A) but small ratios (A/B), this is another point why using simple methold instead of the original method.

log fold-changes

If the difference or ratios of control and treatment is dynamic between genes, we need to scale the range of fold-change result.

Here we need the log fold-changes.

\[foldchange = log_2(\frac{control}{treatment})= log_2(control)-log_2(treatment)\]

delta-delta-Ct method

\[foldchange = \frac{2^{treatment}}{2^{control}} = 2^{treatment - control}\]

This method is used in qPCR experiment.

DCt: Target Ct - Housekeeping Ct

DDCt: Sample DCt - Calibrator DCt (Calibrator is your group of comparison)

Fold calculus: 2^-DDCt

For more detail, please see:

If we need the direction of the fold-change trend, we can use the sign function.

\[foldchange = SIGN(treatment - control)*2^{\left|treatment - control\right|}\]

Bookmarks is a ProductHunt like blog, all titles would link to other website.

This site is used as my bookmarks, collection of interesting things on website.

When I post a article about one bookmark on my blog, the bookmark here will be removed.


This article aim to help me to remember mapping statistics method.

1.cleaned reads number

samtools view -c aligned_reads.bam

cleaned reads base = cleaned reads number * reads length

2.mapped reads number

samtools view -F 0x04 -c aligned_reads.bam

count of unmapped reads number = cleaned reads number - mapped reads number

3.unmapped reads number

samtools view -f4 -c aligned_reads.bam

4.Sequenced exon/gene number

samtools bedcov exon_region.bed/gene_region.bed aligned_reads.bam 

5.read depth

Updated on Jan 13, 2020

samtools depth  *bamfile*  |  awk '{sum+=$3} END { print "Average = ",sum/NR}'

6.bam tags

Tag Meaning
NM Edit distance
MD Mismatching positions/bases
AS Alignment score
BC Barcode sequence
X0 Number of best hits
X1 Number of suboptimal hits found by BWA
XN Number of ambiguous bases in the reference
XM Number of mismatches in the alignment
XO Number of gap opens
XG Number of gap extentions
XT Type: Unique/Repeat/N/Mate-sw
XA Alternative hits; format: (chr,pos,CIGAR,NM;)*
XS Suboptimal alignment score
XF Support from forward/reverse alignment
XE Number of supporting seeds

Updated on Apr 16, 2020

7. Counting Number Of Bases In A Fastq File

zcat data.clean.fq.gz | paste - - - - | cut -f 2 | tr -d '\n' | wc -c 

Reference


Venn diagrams are similar to Euler diagrams. However, a Venn diagram for n component sets must contain all 2n hypothetically possible zones that correspond to some combination of inclusion or exclusion in each of the component sets. Euler diagrams contain only the actually possible zones in a given context. I often draw a Venn diagram with same zone size, however Euler diagram do not need same zone size. All Venn diagrams are Euler diagrams, but not all Euler diagrams are Venn diagrams.

In some R packages, they only provide us to draw Venn diagram with 2~5 subsets. It is difficult to find a symmetric Venn diagram over 5 subset. For more information of Venn diagram please see wiki

Drew Skau gave us a great explain of Venn and Euler Diagrams.

Last year, Alexander Lex and Nils Gehlenborg create a new set-based data visualization technique UpSet and published their Points of view: Sets and intersections. UpSet is a new way to visualize sets intersection, compare to Euler diagram, it can provides all intersection set info (more than 6 subsets) in one figure. Although it isn’t as beautiful as Euler diagram, I think it provide the more effective way to represent intersection set.

Here is the set visualization techniques we often used in published paper.

img


rgl_package_problem

Maybe you have met the problem above when apply sciClone package with a server without libGLU (X server).

For me, our lab use a cluster can not connect to outside network and without X-server. When apply sciClone package, it would got a error message like,

.onLoad filed in loadNamespace() for 'rgl'  
libGLU.so.1 can not open shared object file 

The fast way to fix this problem is, if you do not need sc.plot3d() function, just remove it, then you can run other subfunction in this package successfully.

If you do not know how to remove the sc.plot3d() function, you can download a modified sciClone package from my repository or install the modified package using install_git()

install_github("sciclone",username="yulijia",ref = "master")