Note On "Information Gain" and the Fisher Information

08 Oct 2015

From my last post, [http://rspeare.blogspot.com/2015/08/the-fisher-matrix-and-volume-collapse_31.html](http://rspeare.blogspot.com/2015/08/the-fisher-matrix-and-volume-collapse_31.html), I was talking about "volume" collapse in parameter space due to some data, $\vec{x}$. I'd like to relate this to information gain, which can be defined pretty simply as:

\begin{eqnarray}
H[p(\vec{\theta})] - H[p(\vec{\theta} \vert \vec{x})] &=& IG(\vec{\theta} \vert \vec{x})
\end{eqnarray}

Now, using Bayes' rule we can rewrite the posterior in the second term above (dropping the evidence, which does not depend on $\vec{\theta}$):

\begin{eqnarray}
H[p(\vec{\theta})] - H[\mathcal{L}(\vec{x} \vert \vec{\theta}) p(\vec{\theta})] &=& IG(\vec{\theta} \vert \vec{x})
\end{eqnarray}

And using the addition property of entropy to split the product, the prior terms cancel and we can write:

\begin{eqnarray}
IG(\vec{\theta} \vert \vec{x}) &=& - H[\mathcal{L}(\vec{x} \vert \vec{\theta})]
\end{eqnarray}

But with the Fisher information matrix,

\begin{eqnarray}
\mathbf{F}_{ij} &=& \left\langle \frac{-\partial^2 \log \mathcal{L}(\vec{x} \vert \vec{\theta})}{\partial \theta_i \partial \theta_j} \right\rangle
\end{eqnarray}

we can estimate the covariance of the likelihood function in parameter space -- it is approximately $\mathbf{F}^{-1}$ -- and therefore its entropy, if we use the Laplace approximation and treat the likelihood as a Gaussian in parameter space:

\begin{eqnarray}
H[\mathcal{L}] &=& \frac{d}{2}\log(2\pi e) + \frac{1}{2}\log \left( \vert \mathbf{F} \vert^{-1} \right)
\end{eqnarray}

This means that our information gain on $\vec{\theta}$ given an experiment $\vec{x}$ goes like the logarithm of the determinant of the Fisher matrix:

\begin{eqnarray}
IG(\vec{\theta} \vert \vec{x} ) &\sim& \frac{1}{2}\log \left( \vert \mathbf{F} \vert \right)
\end{eqnarray}

And so we now see intuitively why this is **called** the Fisher information. Our "volume" collapse on the variables of interest $\vec{\theta}$, given our experiment, is:

\begin{eqnarray}
e^{IG(\vec{\theta} \vert \vec{x})} & \sim & \vert \mathbf{F} \vert^{1/2}
\end{eqnarray}
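To make that last relation concrete, here is a minimal numerical sketch, assuming a single Gaussian mean parameter $\theta$ with known noise scale $\sigma$, $n$ i.i.d. samples (so $F = n/\sigma^2$), and a broad Gaussian prior. The helper `gaussian_entropy`, the prior width, and the sample sizes are my own choices for illustration, not anything from the post above.

```python
import numpy as np

def gaussian_entropy(cov):
    """Differential entropy of a d-dimensional Gaussian with covariance cov."""
    d = cov.shape[0]
    return 0.5 * d * np.log(2 * np.pi * np.e) + 0.5 * np.log(np.linalg.det(cov))

sigma = 2.0                        # assumed known noise scale
prior_cov = np.array([[10.0**2]])  # broad Gaussian prior on theta

for n in (10, 100, 1000):
    F = np.array([[n / sigma**2]])               # Fisher matrix for the mean of a Gaussian
    H_prior = gaussian_entropy(prior_cov)
    H_like = gaussian_entropy(np.linalg.inv(F))  # Laplace approximation: covariance F^{-1}
    IG = H_prior - H_like                        # information gain in nats
    print(n, round(IG, 3), round(0.5 * np.log(np.linalg.det(F)), 3))
```

The two printed columns differ only by a constant set by the prior width, so the information gain tracks $\frac{1}{2}\log \vert \mathbf{F} \vert$: doubling the data (doubling $F$) buys the same fixed increment of about $\frac{1}{2}\log 2$ nats, regardless of how much you already knew.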