Equivalence of Shannon Entropy and Logarithm of Posterior Partition Function (under Laplace Approximation)
Now, the integral of the un-normalized posterior (the likelihood times the
prior) over parameter space is called the evidence:
\begin{eqnarray}
P(D) = Z &=& \int d\vec{\theta} \mathcal{L}(D \vert
\vec{\theta})P(\vec{\theta})
\end{eqnarray}
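As a concrete toy example (hypothetical, just for illustration): for a single datum $d$ with a Gaussian likelihood $\mathcal{N}(d \vert \theta, \sigma^2)$ and a Gaussian prior $\mathcal{N}(\theta \vert 0, s^2)$, this integral can be done in closed form:
\begin{eqnarray}
Z &=& \int d\theta \, \mathcal{N}(d \vert \theta, \sigma^2)\,\mathcal{N}(\theta \vert 0, s^2) \;=\; \mathcal{N}(d \vert 0, \sigma^2 + s^2)
\end{eqnarray}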
If we approximate the un-normalized posterior as a Gaussian -- which I've
recently learned from Christopher Bishop is called the Laplace approximation
-- by expanding its logarithm to second order about the MAP point, then we get:
\begin{eqnarray}
P(D \vert \vec{\theta})P(\vec{\theta}) &\approx & P(D \vert
\vec{\theta}_{\mathrm{MAP}})P(\vec{\theta}_{\mathrm{MAP}})\mathrm{exp}\left[-\frac{1}{2}(\vec{\theta}-\vec{\theta}_{\mathrm{MAP}})_i\left(F_{ij}+\Sigma_{ij}^{-1}\right)(\vec{\theta}-\vec{\theta}_{\mathrm{MAP}})_j
\right]
\end{eqnarray}
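To spell out where this comes from (a sketch, assuming, as the $\Sigma_{ij}^{-1}$ term suggests, a Gaussian prior with covariance $\Sigma_{ij}$, and writing $F_{ij}$ for the negative Hessian of the log-likelihood at $\vec{\theta}_{\mathrm{MAP}}$): expand the log of the un-normalized posterior to second order about its maximum, where the first-derivative term vanishes,
\begin{eqnarray}
\log\left[ P(D \vert \vec{\theta})P(\vec{\theta}) \right] &\approx&
\log\left[ P(D \vert \vec{\theta}_{\mathrm{MAP}})P(\vec{\theta}_{\mathrm{MAP}}) \right]
-\frac{1}{2}(\vec{\theta}-\vec{\theta}_{\mathrm{MAP}})_i\left(F_{ij}+\Sigma_{ij}^{-1}\right)(\vec{\theta}-\vec{\theta}_{\mathrm{MAP}})_j
\end{eqnarray}
and then exponentiate.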
We already know what the integral of this Gaussian form over parameter space
will be, so the log evidence is:
\begin{eqnarray}
\log Z &\approx& \log\left[ P(D \vert \vec{\theta}_{\mathrm{MAP}})P(\vec{\theta}_{\mathrm{MAP}}) \right]
+\frac{D}{2}\log(2\pi)-\frac{1}{2}\log \left( \vert
F_{ij}+\Sigma_{ij}^{-1} \vert\right)
\end{eqnarray}
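This is just the standard multivariate Gaussian integral, with $A_{ij}$ standing in for $F_{ij}+\Sigma_{ij}^{-1}$:
\begin{eqnarray}
\int d^D x \, \mathrm{exp}\left[ -\frac{1}{2}x_i A_{ij} x_j \right] &=& (2\pi)^{D/2}\, \vert A_{ij} \vert^{-1/2}
\end{eqnarray}
Its logarithm supplies the $\frac{D}{2}\log(2\pi)$ and $-\frac{1}{2}\log\vert F_{ij}+\Sigma_{ij}^{-1}\vert$ terms; the prefactor supplies the $\log\left[ P(D \vert \vec{\theta}_{\mathrm{MAP}})P(\vec{\theta}_{\mathrm{MAP}}) \right]$ term.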
Comparing this with our measured entropy of the posterior, we see we're off by
just a constant:
\begin{eqnarray}
H\left[ P(\vec{\theta} \vert D) \right] &=&
\frac{D}{2}\log(2\pi)-\frac{1}{2}\log \left( \vert F_{ij}+\Sigma_{ij}^{-1}
\vert\right) + \frac{D}{2}
\end{eqnarray}
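For reference, this is just the differential entropy of a $D$-dimensional Gaussian with covariance $C_{ij} = \left(F_{ij}+\Sigma_{ij}^{-1}\right)^{-1}$:
\begin{eqnarray}
H &=& \frac{1}{2}\log\left( \vert 2\pi e\, C \vert \right) \;=\; \frac{D}{2}\log(2\pi) + \frac{1}{2}\log\left( \vert C \vert \right) + \frac{D}{2}
\end{eqnarray}
with $\frac{1}{2}\log \vert C \vert = -\frac{1}{2}\log\vert F_{ij}+\Sigma_{ij}^{-1}\vert$, which is where the extra $\frac{D}{2}$ comes from.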
This is another example of why a statistical mechanics interpretation of $Z$,
the normalization of our posterior, is right on point. Its logarithm -- up to
an additive constant, which can be thrown away -- is equal to the entropy of
our distribution, which is a common point of wisdom in statistical mechanics.
So, in conclusion: under the Laplace approximation, writing our posterior as a
Gaussian by expanding in the exponent and collecting the first cumulant
$\vec{\theta}_{\mathrm{MAP}}$ and second cumulant $\left(F_{ij}+\Sigma_{ij}^{-1}\right)^{-1}$, we get:
\begin{eqnarray}
\log Z &=& H\left[P(\vec{\theta} \vert D )\right] + \mathrm{const}
\end{eqnarray}
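To close, here's a quick numerical sanity check (a sketch, not part of the derivation above; the model, the numbers, and names like `log_unnorm_post` are invented for illustration). It uses a 1-D conjugate Gaussian model, where the posterior is exactly Gaussian and the Laplace approximation is exact, and compares $\log Z$ from quadrature against $H$ plus the constant identified above:

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

# Toy 1-D conjugate model (hypothetical numbers, for illustration only):
# likelihood d_n ~ N(theta, sigma^2), prior theta ~ N(0, s^2).
rng = np.random.default_rng(0)
sigma, s = 1.0, 2.0
data = rng.normal(0.5, sigma, size=20)

def log_unnorm_post(theta):
    """log[ L(D|theta) P(theta) ], the un-normalized log posterior."""
    return norm.logpdf(data, loc=theta, scale=sigma).sum() + norm.logpdf(theta, loc=0.0, scale=s)

# Exact Gaussian posterior: precision = N/sigma^2 + 1/s^2, i.e. F + Sigma^{-1} here.
precision = len(data) / sigma**2 + 1.0 / s**2
theta_map = (data.sum() / sigma**2) / precision

# Evidence Z by quadrature, factoring out the peak value for numerical stability.
log_peak = log_unnorm_post(theta_map)
integral, _ = quad(lambda t: np.exp(log_unnorm_post(t) - log_peak), -10, 10, points=[theta_map])
logZ = log_peak + np.log(integral)

# Shannon (differential) entropy of the Gaussian posterior, with D = 1.
H = 0.5 * np.log(2 * np.pi) - 0.5 * np.log(precision) + 0.5

# The additive constant from the derivation: log[L(D|theta_MAP) P(theta_MAP)] - D/2.
const = log_peak - 0.5

print(logZ, H + const)  # the two numbers should agree to quadrature accuracy
```

The two printed numbers agreeing is the statement $\log Z = H\left[P(\vec{\theta} \vert D)\right] + \mathrm{const}$ in miniature, with the constant being $\log\left[ P(D \vert \vec{\theta}_{\mathrm{MAP}})P(\vec{\theta}_{\mathrm{MAP}}) \right] - \frac{D}{2}$.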