# HotNet2

HotNet2 is Python-based software that identifies subnetworks of a gene interaction network using mutation information.

I read about the algorithm last month and picked up a basic understanding of random walks.

## Random walk with restart

### Concept

In a gene (protein) interaction network, the random walk starts from a protein \(g_{i}\). At each time step it moves to one of the current protein's neighbors with probability \(1-\beta\) (\(0 \leq \beta \leq 1\)), or restarts from \(g_{i}\) with probability \(\beta\). This process is defined by a transition matrix \(W\).

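The entries of \(W\) are \(W_{ij}=1/deg(j)\) if \(g_{i}\) interacts with \(g_{j}\) and 0 otherwise, where \(deg(j)\) is the number of neighbors (the degree) of protein \(g_{j}\) in the interaction network.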

\(\beta\) represents the probability with which the walk starting at \(g_{i}\) is forced to restart from \(g_{i}\).
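To check my understanding of \(W\), here is a minimal NumPy sketch that builds it from an adjacency matrix. It assumes the column-normalized form \(W_{ij}=1/deg(j)\) for interacting proteins; the function name `transition_matrix` and the four-protein toy network are made up for illustration and are not taken from the HotNet2 code.

```python
import numpy as np

def transition_matrix(adjacency):
    """Column-normalize an adjacency matrix: W[i, j] = 1 / deg(j) if i and j interact."""
    A = np.asarray(adjacency, dtype=float)
    degrees = A.sum(axis=0)  # deg(j) is the sum of column j
    if np.any(degrees == 0):
        raise ValueError("every protein needs at least one neighbor")
    return A / degrees  # broadcasting divides column j by deg(j)

# Hypothetical toy network: g1 interacts with g0, g2 and g3 (a small star)
A = np.array([
    [0, 1, 0, 0],
    [1, 0, 1, 1],
    [0, 1, 0, 0],
    [0, 1, 0, 0],
])
W = transition_matrix(A)
print(W)
```

Each column of `W` sums to 1, so column `j` can be read as the probabilities of stepping from \(g_{j}\) to each of its neighbors.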

The random walk reaches a stationary distribution described by the vector \(\vec{s}_{i}\). One step of the walk updates the distribution as

\[\vec{s}^{\prime}_{i}=(1-\beta)W\vec{s}_{i}+\beta\vec{e}_{i}\]

where \(\vec{e}_{i}\) is the vector with a 1 in position \(i\) and 0 in the remaining positions. When \(\vec{s}^{\prime}_{i}=\vec{s}_{i}\), we can rearrange to \((I-(1-\beta)W)\vec{s}_{i}=\beta\vec{e}_{i}\) and get

\[\vec{s}_{i}=\beta(I-(1-\beta)W)^{-1}\vec{e}_{i}\]

The part \(\beta(I-(1-\beta)W)^{-1}\) is called the diffusion matrix \(F\).

\[F=\beta(I-(1-\beta)W)^{-1}\]

Note that \(\vec{s}_{i}\) is the \(i^{th}\) column of \(F\).
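The closed form for \(F\) can be checked numerically. The sketch below computes \(F\) with a dense matrix inverse and compares column \(i\) against the result of simply iterating the walk with restart (moving with probability \(1-\beta\), restarting at \(g_{i}\) with probability \(\beta\)). The toy network, the value `beta = 0.4`, and the helper name `diffusion_matrix` are my own assumptions for illustration; a dense inverse is only reasonable for small networks like this one.

```python
import numpy as np

def diffusion_matrix(W, beta):
    """F = beta * (I - (1 - beta) * W)^{-1}, using a dense inverse (small networks only)."""
    n = W.shape[0]
    return beta * np.linalg.inv(np.eye(n) - (1.0 - beta) * W)

# Hypothetical toy network: g1 interacts with g0, g2 and g3
A = np.array([
    [0, 1, 0, 0],
    [1, 0, 1, 1],
    [0, 1, 0, 0],
    [0, 1, 0, 0],
], dtype=float)
W = A / A.sum(axis=0)  # W[i, j] = 1 / deg(j) for interacting proteins

beta = 0.4  # arbitrary value chosen for this example
F = diffusion_matrix(W, beta)

# Iterate the walk started at g_0 until it settles
i = 0
e = np.zeros(W.shape[0])
e[i] = 1.0
s = e.copy()
for _ in range(200):
    s = (1.0 - beta) * (W @ s) + beta * e

print(np.allclose(s, F[:, i]))  # the iterated walk should match column i of F
```

Because every column of `W` sums to 1, every column of `F` also sums to 1, so each column is a probability distribution over the proteins.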

### Parameters

To calculate the diffusion matrix, we need to know the value of \(\beta\). In HotNet2, the authors chose \(\beta\) to balance the amount of heat that diffuses from a protein to its immediate neighbors against the amount that diffuses to the rest of the network.
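To get a feel for this balance, the sketch below splits the heat from one protein into three parts: what stays on the protein itself, what lands on its immediate neighbors, and what reaches the rest of the network. The seven-protein path network and the three \(\beta\) values are arbitrary choices of mine for illustration; this is not how HotNet2 itself selects \(\beta\).

```python
import numpy as np

def diffusion_matrix(W, beta):
    """F = beta * (I - (1 - beta) * W)^{-1}, using a dense inverse (small networks only)."""
    n = W.shape[0]
    return beta * np.linalg.inv(np.eye(n) - (1.0 - beta) * W)

# Hypothetical toy network: a path g0 - g1 - ... - g6
n = 7
A = np.zeros((n, n))
for k in range(n - 1):
    A[k, k + 1] = A[k + 1, k] = 1.0
W = A / A.sum(axis=0)  # W[i, j] = 1 / deg(j) for interacting proteins

source = 3          # protein in the middle of the path
neighbors = [2, 4]  # its immediate neighbors
shares = {}
for beta in (0.1, 0.5, 0.9):
    heat = diffusion_matrix(W, beta)[:, source]  # column for the source protein
    at_source = heat[source]
    at_neighbors = heat[neighbors].sum()
    elsewhere = 1.0 - at_source - at_neighbors
    shares[beta] = (at_source, at_neighbors, elsewhere)
    print(f"beta={beta}: source={at_source:.3f}, "
          f"neighbors={at_neighbors:.3f}, rest={elsewhere:.3f}")
```

In this toy network a larger \(\beta\) keeps more heat on the starting protein, while a smaller \(\beta\) lets more of it reach proteins beyond the immediate neighbors.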

There is another parameter, \(\delta\), the edge weight parameter. It's used to make sure that HotNet2 does not find large subnetworks in random data.

For more detail on these parameters, please see the supplementary material of the HotNet2 paper.

## Sample size problem

I asked a question about sample size on the HotNet Google group:

I read the HotNet2 paper and found that the sample size used in the paper is very large. If my sample size is small (for example, 10 or 20 samples), can I use HotNet2 to do the pathway analysis? Is there a baseline sample size?

As I understand, you can use hotnet2 with whatever sample size you have. For example if you take a look on this analysis here: https://cs.brown.edu/research/pubs/theses/ugrad/2014/jain.pdf you can see that they used p-values as input, thus hotnet itself does not depend on the sample size. This means, that you have to take into account sample size caused biases at the p-value calculation. Therefore, if your sample size is low, you might want to consider more robust ways to calculate p-values, e.g. some rank based approaches (for example Rank Product).

Answer by Akos Tenyi