# HotNet2

HotNet2 is Python-based software for identifying subnetworks of a gene interaction network using mutation information.

I read through the algorithm last month and gained a basic understanding of the random walk it uses.

## Random walk with restart

### concept

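For a gene (protein) interaction network, a random walk starts from a protein $g$. At each time step, it moves from the current protein $j$ to one of its neighbors, each neighbor being chosen with probability $1/\deg(j)$.

The walk can also restart from $g$ with probability $\beta$. This process is defined by a transition matrix $W$ with entries

$$
W_{ij} =
\begin{cases}
\dfrac{1}{\deg(j)} & \text{if proteins } i \text{ and } j \text{ interact,}\\
0 & \text{otherwise,}
\end{cases}
$$

where $\deg(j)$ is the number of neighbors (the degree) of protein $j$ in the interaction network.

The restart probability $\beta$ is the probability with which the walk starting at $g$ is forced to restart from $g$ at each step.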
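A minimal NumPy sketch of the transition matrix for a small toy network, where `W[i, j] = 1/deg(j)` when proteins `i` and `j` interact. The edge list and the helper names `transition_matrix` and `rwr_step` are hypothetical, not part of the HotNet2 code base. Column `j` holds the probabilities of stepping from protein `j` to each neighbor, so every column sums to 1; one step of the walk with restart mixes a move along `W` (weight `1 - beta`) with a jump back to `g` (weight `beta`).

```python
import numpy as np


def transition_matrix(n, edges):
    """Build W with W[i, j] = 1/deg(j) if proteins i and j interact, else 0.

    Assumes an undirected network without isolated proteins (deg(j) > 0).
    """
    A = np.zeros((n, n))
    for i, j in edges:
        A[i, j] = A[j, i] = 1.0  # interactions are symmetric
    deg = A.sum(axis=0)          # deg[j] = number of neighbors of protein j
    return A / deg               # broadcasting divides column j by deg[j]


def rwr_step(W, p, g, beta):
    """Advance a probability distribution p over proteins by one step.

    With probability 1 - beta the walker moves to a uniformly chosen
    neighbor; with probability beta it restarts from protein g.
    """
    e_g = np.zeros(len(p))
    e_g[g] = 1.0
    return (1 - beta) * W @ p + beta * e_g


# Hypothetical toy network: 0-1, 1-2, 2-3, 1-3
edges = [(0, 1), (1, 2), (2, 3), (1, 3)]
W = transition_matrix(4, edges)
print(W[:, 1])      # protein 1 has 3 neighbors: 0, 2 and 3
p = np.zeros(4)
p[0] = 1.0          # the walk starts at g = 0
print(rwr_step(W, p, g=0, beta=0.4))
```

Starting from `g = 0`, whose only neighbor is protein 1, a single step leaves 0.4 of the probability at protein 0 (restart) and 0.6 at protein 1.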

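The random walk reaches a stationary distribution described by the vector $\mathbf{s}_g$, which satisfies

$$
\mathbf{s}_g = (1-\beta)\, W \mathbf{s}_g + \beta\, \mathbf{e}_g .
$$

Solving for $\mathbf{s}_g$ (the matrix $I - (1-\beta)W$ is invertible whenever $\beta > 0$), we get

$$
\mathbf{s}_g = \beta \left(I - (1-\beta)\, W\right)^{-1} \mathbf{e}_g ,
$$

where $\mathbf{e}_g$ is the vector with a 1 in position $g$ and 0 in the remaining positions, and $W$ is the transition matrix.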
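To see that the walk really settles into this distribution, the sketch below repeatedly applies one step of the walk with restart, `s <- (1 - beta) W s + beta e_g`, and compares the result with a direct linear solve of the closed form. The toy network and the function names are hypothetical. Because `W` is column-stochastic, each iteration shrinks the remaining error by a factor of `1 - beta`, so the loop converges for any `beta > 0`.

```python
import numpy as np


def transition_matrix(n, edges):
    """W[i, j] = 1/deg(j) if proteins i and j interact, else 0."""
    A = np.zeros((n, n))
    for i, j in edges:
        A[i, j] = A[j, i] = 1.0
    return A / A.sum(axis=0)


def stationary_by_iteration(W, g, beta, tol=1e-12, max_iter=10_000):
    """Iterate s <- (1 - beta) W s + beta e_g until it stops changing."""
    n = W.shape[0]
    e_g = np.zeros(n)
    e_g[g] = 1.0
    s = e_g.copy()
    for _ in range(max_iter):
        s_next = (1 - beta) * W @ s + beta * e_g
        if np.abs(s_next - s).sum() < tol:
            return s_next
        s = s_next
    raise RuntimeError("random walk did not converge")


def stationary_closed_form(W, g, beta):
    """Solve (I - (1 - beta) W) s = beta e_g directly."""
    n = W.shape[0]
    e_g = np.zeros(n)
    e_g[g] = 1.0
    return np.linalg.solve(np.eye(n) - (1 - beta) * W, beta * e_g)


# Hypothetical toy network: 0-1, 1-2, 2-3, 1-3, 3-4
edges = [(0, 1), (1, 2), (2, 3), (1, 3), (3, 4)]
W = transition_matrix(5, edges)
s_iter = stationary_by_iteration(W, g=0, beta=0.4)
s_exact = stationary_closed_form(W, g=0, beta=0.4)
print(np.round(s_iter, 4))
print(np.allclose(s_iter, s_exact))
```

The stationary vector sums to 1 and puts the most probability on the restart protein `g`, decaying with network distance from it.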

The matrix $F = \beta\left(I - (1-\beta)\, W\right)^{-1}$ is called the diffusion matrix.

Note that $\mathbf{s}_g$ is the $g$-th column of $F$.
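A short sketch of computing the whole diffusion matrix `F = beta * (I - (1 - beta) W)^{-1}` at once for a hypothetical toy adjacency matrix, and checking that multiplying `F` by the indicator vector `e_g` simply picks out column `g`. The helper name `diffusion_matrix` is illustrative. Solving against the identity is used instead of an explicit inverse; both give the same matrix.

```python
import numpy as np


def diffusion_matrix(A, beta):
    """F = beta * (I - (1 - beta) W)^{-1}, with W the column-normalized adjacency."""
    W = A / A.sum(axis=0)                  # W[i, j] = 1/deg(j) for neighbors
    n = A.shape[0]
    # Solving against the identity yields the same matrix as inverting it
    return beta * np.linalg.solve(np.eye(n) - (1 - beta) * W, np.eye(n))


A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 1],
              [0, 1, 0, 1],
              [0, 1, 1, 0]], dtype=float)  # toy network: 0-1, 1-2, 2-3, 1-3
F = diffusion_matrix(A, beta=0.4)

g = 2
e_g = np.zeros(4)
e_g[g] = 1.0
s_g = F @ e_g                              # stationary distribution of a walk restarting at g
print(np.allclose(s_g, F[:, g]))           # s_g is the g-th column of F
print(F.sum(axis=0))                       # every column is a probability distribution
```

Precomputing `F` once lets every source protein's stationary distribution be read off as a column instead of re-solving the system per protein.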

### parameter

To calculate the diffusion matrix, we need to know the value of $\beta$. In HotNet2, the authors chose $\beta$ to balance the amount of heat that diffuses from a protein to its immediate neighbors against the amount that diffuses to the rest of the network.
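The balancing idea can be sketched as a scan over candidate `beta` values. The criterion here is an assumption for illustration, not necessarily the exact procedure in the HotNet2 supplement: for each source protein, take the share of its heat that lands on its immediate neighbors and the share that lands everywhere else (heat kept at the source counts in neither), take medians over all proteins, and pick the `beta` where the two medians are closest. `heat_split` and `choose_beta` are hypothetical names, and the ring network is a made-up example.

```python
import numpy as np


def diffusion_matrix(A, beta):
    """F = beta * (I - (1 - beta) W)^{-1} for a column-normalized adjacency W."""
    W = A / A.sum(axis=0)
    n = A.shape[0]
    return beta * np.linalg.solve(np.eye(n) - (1 - beta) * W, np.eye(n))


def heat_split(A, beta):
    """Median share of a protein's heat ending on its neighbors vs. the rest.

    Heat retained at the source protein itself is counted in neither share.
    """
    F = diffusion_matrix(A, beta)
    near, far = [], []
    for g in range(A.shape[0]):
        col = F[:, g]                 # heat distribution when g is the source
        nbr = A[:, g] > 0             # immediate neighbors of g
        rest = ~nbr
        rest[g] = False               # exclude g itself
        near.append(col[nbr].sum())
        far.append(col[rest].sum())
    return float(np.median(near)), float(np.median(far))


def choose_beta(A, candidates):
    """Pick the candidate beta whose neighbor and rest-of-network shares are closest."""
    def imbalance(beta):
        near, far = heat_split(A, beta)
        return abs(near - far)
    return min(candidates, key=imbalance)


# Hypothetical toy network: a ring of 10 proteins plus two chords
n = 10
A = np.zeros((n, n))
for i in range(n):
    A[i, (i + 1) % n] = A[(i + 1) % n, i] = 1.0
A[0, 5] = A[5, 0] = 1.0
A[2, 7] = A[7, 2] = 1.0

candidates = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]
for beta in candidates:
    print(beta, heat_split(A, beta))
print("chosen beta:", choose_beta(A, candidates))
```

Small `beta` lets heat travel far (large rest-of-network share); large `beta` keeps it near the source, which is the trade-off the scan makes visible.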

There is another parameter, $\delta$, the edge weight parameter: edges whose weight falls below $\delta$ are removed before subnetworks are extracted. It is chosen to make sure that HotNet2 does not find large subnetworks in random data.
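A sketch of how an edge-weight threshold like $\delta$ prunes the network. It assumes a HotNet2-style construction in which the directed edge from protein `j` to protein `i` carries the exchanged heat `E[i, j] = F[i, j] * heat[j]` for a per-gene heat vector, and in which the reported subnetworks are the strongly connected components of the edges that survive. The heat values are made up, `subnetworks` is a hypothetical helper, and the search for $\delta$ on random data is not shown.

```python
import numpy as np


def diffusion_matrix(A, beta):
    """F = beta * (I - (1 - beta) W)^{-1} for a column-normalized adjacency W."""
    W = A / A.sum(axis=0)
    n = A.shape[0]
    return beta * np.linalg.solve(np.eye(n) - (1 - beta) * W, np.eye(n))


def subnetworks(F, heat, delta):
    """Strongly connected components after dropping edges lighter than delta."""
    E = F * heat                       # E[i, j] = F[i, j] * heat[j]
    n = len(heat)
    R = E >= delta                     # keep edge j -> i only if E[i, j] >= delta
    np.fill_diagonal(R, True)          # every protein reaches itself
    # Warshall's transitive closure: afterwards R[i, j] means "j reaches i"
    for k in range(n):
        R = R | (R[:, [k]] & R[[k], :])
    mutual = R & R.T                   # i and j reach each other
    components, seen = [], set()
    for i in range(n):
        if i not in seen:
            comp = [int(j) for j in np.flatnonzero(mutual[i])]
            seen.update(comp)
            components.append(comp)
    return components


# Hypothetical network: two triangles (0-1-2 and 3-4-5) joined by edge 2-3
A = np.zeros((6, 6))
for i, j in [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]:
    A[i, j] = A[j, i] = 1.0
heat = np.array([3.0, 2.5, 2.0, 0.1, 0.1, 0.1])   # made-up heat scores
F = diffusion_matrix(A, beta=0.4)
for delta in (0.0, 0.1, 0.5):
    print(delta, subnetworks(F, heat, delta))
```

With $\delta = 0$ every edge survives and the whole network is one component; raising $\delta$ splits it into smaller pieces, which is why a suitably large $\delta$ keeps random data from producing large subnetworks.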

For more details on these parameters, please see the supplementary information of the HotNet2 paper.

## sample size problem

I asked a question about sample size on the HotNet Google Group:

> I read the HotNet2 paper and found that the sample sizes used in it are very large. If my sample size is small (for example, 10 or 20 samples), can I use HotNet2 for pathway analysis? Is there a baseline sample size?

> As I understand it, you can use HotNet2 with whatever sample size you have. For example, if you take a look at this analysis: https://cs.brown.edu/research/pubs/theses/ugrad/2014/jain.pdf, you can see that they used p-values as input; thus HotNet itself does not depend on the sample size. This means that you have to take sample-size-induced biases into account when calculating the p-values. Therefore, if your sample size is low, you might want to consider more robust ways to calculate p-values, e.g. rank-based approaches such as Rank Product.

Answer by Akos Tenyi
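The rank-based suggestion can be illustrated with a minimal Rank Product sketch: within each sample, genes are ranked by fold change (rank 1 = largest), a gene's rank product is the geometric mean of its ranks, and a p-value is estimated by shuffling each sample's values independently. The simulated data, the helper names, the pooled permutation p-value, and the conversion to $-\log_{10} p$ as a per-gene score are all assumptions for illustration; established implementations (e.g. the RankProd Bioconductor package) differ in details.

```python
import numpy as np


def rank_product(fold_changes):
    """Geometric mean of per-sample ranks; rank 1 = largest value in a sample.

    fold_changes: array of shape (n_genes, n_samples), e.g. log fold changes.
    """
    # argsort of argsort turns values into 0-based ranks within each column
    ranks = np.argsort(np.argsort(-fold_changes, axis=0), axis=0) + 1
    return np.exp(np.log(ranks).mean(axis=1))


def rank_product_pvalues(fold_changes, n_perm=1000, seed=0):
    """Permutation p-values: share of null rank products <= the observed one."""
    rng = np.random.default_rng(seed)
    observed = rank_product(fold_changes)
    n_genes, n_samples = fold_changes.shape
    null = np.empty((n_perm, n_genes))
    for b in range(n_perm):
        # Shuffle each sample independently to break gene-wise consistency
        shuffled = np.column_stack(
            [rng.permutation(fold_changes[:, s]) for s in range(n_samples)]
        )
        null[b] = rank_product(shuffled)
    pooled = np.sort(null.ravel())
    # Number of null values <= observed; +1 keeps p-values away from zero
    counts = np.searchsorted(pooled, observed, side="right")
    return (counts + 1) / (pooled.size + 1)


# Simulated data: 50 genes x 6 samples; gene 0 is the largest in every sample
rng = np.random.default_rng(1)
fc = rng.normal(size=(50, 6))
fc[0] = 10.0
p = rank_product_pvalues(fc)
scores = -np.log10(p)   # one possible per-gene score derived from p-values
print(p[:5])
print(scores[:5])
```

Because the statistic only uses within-sample ranks, it stays usable with a handful of samples; the resulting per-gene scores are what would then be handed to HotNet2.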