Brief Note on Information Gain Definition
After being confused for
quite some time, I've realized that information gain and mutual information
are essentially the same thing; the only wrinkle is that information gain is
written in a form that looks asymmetric, even though its value is symmetric
in the two variables.
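Writing the conditional entropy in terms of the joint entropy makes the symmetry explicit:
\begin{eqnarray}
IG(x \vert y) &=& H(x) - H(x \vert y) \\
&=& H(x) + H(y) - H(x,y) \\
&=& H(y) - H(y \vert x) \\
&=& IG(y \vert x)
\end{eqnarray}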
That mutual information is an intuitive measure of correlation can be seen by
noting that for a bivariate normal distribution, the mutual information
between the two variables is:
\begin{eqnarray}
I(x,y) &=& IG(x \vert y) \\
&=& H(x)-H(x \vert y) \\
&=& \frac{1}{2}\log\left(2\pi e \sigma_x^2\right)-\frac{1}{2}\log\left(2\pi e \sigma_x^2 (1-\rho^2)\right) \\
&=& -\frac{1}{2}\log(1-\rho^2)
\end{eqnarray}
where $\rho$ is the Pearson correlation coefficient; the result depends only
on $\rho^2$, which varies between zero and one, and diverges as
$\vert \rho \vert \to 1$. So, when choosing variables to split on in a
decision tree, or designing an experiment that probes some underlying model
space, we want pairs $x,y$ with $\vert \rho \vert \to 1$, in order to yield
as much information as possible.
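
As a quick numerical sanity check, here is a minimal sketch (assuming NumPy and scikit-learn are available; `mutual_info_regression` is scikit-learn's k-nearest-neighbor mutual information estimator) that samples a correlated bivariate normal and compares the estimate against the closed form:

```python
import numpy as np
from sklearn.feature_selection import mutual_info_regression

rng = np.random.default_rng(0)
rho, n = 0.9, 20_000

# Sample n points from a standard bivariate normal with correlation rho.
cov = np.array([[1.0, rho], [rho, 1.0]])
x, y = rng.multivariate_normal(mean=[0.0, 0.0], cov=cov, size=n).T

# Closed-form mutual information for the bivariate normal, in nats.
closed_form = -0.5 * np.log(1.0 - rho**2)

# Nonparametric k-NN estimate from the samples (also in nats).
estimate = mutual_info_regression(x.reshape(-1, 1), y, random_state=0)[0]

print(f"closed form: {closed_form:.3f} nats, k-NN estimate: {estimate:.3f} nats")
```

With $\rho = 0.9$ the closed form gives $-\frac{1}{2}\log(0.19) \approx 0.83$ nats, and the k-NN estimate should agree to within sampling error.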
