HotNet2 is a Python based software to identify subnetworks of gene interaction network with mutation infomation.
I read the Algorithm last month and got a basic knowledge of random walk.
Random walk with restart
For gene(protein) interaction network, random walk starts from a protein
g and at each time step moves to one of the neighbors with the probability ().
The walk can also restarts from
g with probability . This process is defined by a transition matrix .
is the number of neighbors (the degree) of protein in the interaction network.
represent the probability with which the walk starting at is forced to restart from .
The random walk will reaches a stationary discribution described by the vector
When , we can get
where is the vector with a 1 in position and 0 is in the remaining positions.
This part is called diffusion matrix .
Note that is the column of .
To calculate the diffusion matrix, we need know the value of . In HotNet2, they chose to balance the amount of heat that diffuses from a protein to its immediate neighbors and to the rest of the network.
There is another parameter , it is the edge wight parameter. It’s used to make sure the HotNet2 will not find large subnetworks using random data.
For more detail of these parameters, please see the supplementary of this paper.
sample size problem
I ask a question about sample size on HotNet google group.
I read the HotNet2 paper and find the size of samples used in this paper is very large. If my sample size is small ( for example, 10 samples or 20 samples ), could I use hotnet2 to do the pathway analysis? Is there any baseline of sample size ?
As I understand, you can use hotnet2 with whatever sample size you have. For example if you take a look on this analysis here: https://cs.brown.edu/research/pubs/theses/ugrad/2014/jain.pdf you can see that they used p-values as input, thus hotnet itself does not depend on the sample size. This means, that you have to take into account sample size caused biases at the p-value calculation. Therefore, if your sample size is low, you might want to consider more robust ways to calculate p-values, e.g. some rank based approaches (for example Rank Product).
Answer by Akos Tenyi