Note On "Information Gain" and the Fisher Information
From last post,
[http://rspeare.blogspot.com/2015/08/the-fisher-matrix-and-volume-collapse_31.html](http://rspeare.blogspot.com/2015/08/the-fisher-matrix-and-volume-collapse_31.html),
I was talking about "volume" collapse in parameter space due to some data,
$\vec{x}$. I'd like to relate this to information gain, which can be defined
pretty simply as:
\begin{eqnarray}
H[p(\vec{\theta})] - H[p(\vec{\theta} \vert \vec{x})] &=&
IG(\vec{\theta} \vert \vec{x})
\end{eqnarray}
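Just to make that concrete, here's a quick numeric sketch in Python (a toy conjugate-Gaussian model of my own choosing, not anything from the last post): data $x_i \sim N(\mu, \sigma^2)$ with $\sigma$ known and a Gaussian prior on $\mu$, so both entropies have closed forms.

```python
# A minimal sketch of the information-gain definition above, for a toy
# conjugate-Gaussian model (the model, prior scale, and sample size are
# illustrative assumptions): x_i ~ N(mu, sigma^2) with sigma known, and a
# prior mu ~ N(0, tau^2).
import numpy as np

def gaussian_entropy(var):
    """Differential entropy (in nats) of a 1-D Gaussian with variance `var`."""
    return 0.5 * np.log(2.0 * np.pi * np.e * var)

sigma, tau, n = 1.0, 5.0, 50            # noise scale, prior scale, sample size

# Conjugate update: the posterior variance shrinks relative to the prior's.
post_var = 1.0 / (1.0 / tau**2 + n / sigma**2)

IG = gaussian_entropy(tau**2) - gaussian_entropy(post_var)
print(f"information gain: {IG:.3f} nats")
```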
Now, using Bayes' rule, and dropping the evidence $p(\vec{x})$ since it doesn't
depend on $\vec{\theta}$, we can rewrite the second term above:
\begin{eqnarray}
H[p(\vec{\theta})] - H[\mathcal{L}(\vec{x} \vert \vec{\theta})
p(\vec{\theta})] &=& IG(\vec{\theta} \vert \vec{x})
\end{eqnarray}
And using the addition property of entropy, $H[\mathcal{L}\,p] = H[\mathcal{L}] + H[p]$, the prior entropy cancels and we can write:
\begin{eqnarray}
IG(\vec{\theta} \vert \vec{x}) &=& - H[\mathcal{L}(\vec{x} \vert
\vec{\theta})]
\end{eqnarray}
But recall the Fisher information matrix,
\begin{eqnarray}
\mathbf{F}_{ij} &=& \left\langle -\frac{\partial^2 \log
\mathcal{L}(\vec{x} \vert \vec{\theta})}{\partial \theta_i \partial
\theta_j} \right\rangle
\end{eqnarray}
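If you don't have that expectation in closed form, you can estimate it by brute force. Here's a small sketch that averages the negative Hessian of the log-likelihood over simulated datasets, for a toy two-parameter Gaussian model $(\mu, \log\sigma)$ -- again, the model and numbers are just illustrative assumptions.

```python
# A sketch of the expectation above: average the negative Hessian of the
# log-likelihood over simulated datasets. The two-parameter Gaussian model
# (mu, log sigma) and all the numbers are illustrative assumptions.
import numpy as np

def log_like(theta, x):
    mu, log_sigma = theta
    return -0.5 * np.sum((x - mu) ** 2) * np.exp(-2.0 * log_sigma) - x.size * log_sigma

def neg_hessian(theta, x, eps=1e-4):
    """Negative Hessian of log_like at theta, by central finite differences."""
    d = len(theta)
    H = np.zeros((d, d))
    for i in range(d):
        for j in range(d):
            def f(di, dj):
                t = np.array(theta, dtype=float)
                t[i] += di
                t[j] += dj
                return log_like(t, x)
            H[i, j] = (f(eps, eps) - f(eps, -eps) - f(-eps, eps) + f(-eps, -eps)) / (4.0 * eps**2)
    return -H

rng = np.random.default_rng(1)
theta_true = np.array([0.0, 0.0])        # mu = 0, log sigma = 0 (sigma = 1)
n, n_sims = 100, 200
F = np.mean(
    [neg_hessian(theta_true, rng.normal(0.0, 1.0, n)) for _ in range(n_sims)],
    axis=0,
)
print(F)  # analytic answer for this model: diag(n / sigma^2, 2 n) = diag(100, 200)
```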
With the Fisher matrix in hand, we can estimate the covariance of the likelihood
function, and therefore its entropy, if we use the Laplace approximation and
treat the likelihood as a Gaussian in parameter space:
\begin{eqnarray}
H[\mathcal{L}] &=& \frac{d}{2}\log(2\pi e) + \frac{1}{2}\log \left( \vert
\mathbf{F} \vert^{-1} \right)
\end{eqnarray}
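In code, that entropy is a one-liner given $\mathbf{F}$; for the toy Gaussian-mean model above (known $\sigma$, so $F = n/\sigma^2$) the likelihood really is Gaussian in the parameter and the Laplace approximation is exact.

```python
# A sketch of the Laplace-approximation entropy above: the entropy of a
# Gaussian with covariance F^{-1}. The toy Fisher matrix F = n / sigma^2
# (Gaussian mean with known noise) is an illustrative assumption.
import numpy as np

def laplace_entropy(F):
    """d/2 log(2 pi e) + 1/2 log |F^{-1}| for a (d x d) Fisher matrix F."""
    F = np.atleast_2d(F)
    d = F.shape[0]
    _, logdet_F = np.linalg.slogdet(F)
    return 0.5 * d * np.log(2.0 * np.pi * np.e) - 0.5 * logdet_F

sigma, n = 1.0, 50
print(laplace_entropy(n / sigma**2))     # equals 0.5 * log(2 pi e sigma^2 / n)
```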
This means that, up to additive constants, our information gain on $\vec{\theta}$
from an experiment $\vec{x}$ grows with the logarithm of the determinant of the
Fisher matrix:
\begin{eqnarray}
IG(\vec{\theta} \vert \vec{x}) &\sim & \log \left( \vert
\mathbf{F} \vert \right)
\end{eqnarray}
And so, we now see intuitively why this is \textbf{called} the Fisher
information. Our ``volume'' collapse on the variables of interest
$\vec{\theta}$, given our experiment, is:
\begin{eqnarray}
e^{IG(\vec{\theta} \vert \vec{x})} & \sim & \vert
\mathbf{F} \vert^{1/2}
\end{eqnarray}
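And a quick sanity check of that scaling, using the same toy Gaussian-mean model with a deliberately broad prior so the likelihood dominates: the exponentiated information gain tracks $\vert \mathbf{F} \vert^{1/2}$ up to the prior volume.

```python
# A sanity check of the volume-collapse statement above, with the toy
# Gaussian-mean model and a broad prior (all numbers are illustrative).
import numpy as np

sigma, tau = 1.0, 50.0                   # broad prior: tau >> posterior width
for n in (10, 100, 1000):
    F = n / sigma**2                     # Fisher information for the mean
    post_var = 1.0 / (1.0 / tau**2 + F)
    IG = 0.5 * np.log(tau**2 / post_var)          # prior minus posterior entropy
    print(n, np.exp(IG) / (tau * np.sqrt(F)))     # ratio tends to 1
```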