Equivalence of Shannon Entropy and Logarithm of the Posterior Partition Function (under the Laplace Approximation)

01 Sep 2015

Now, the integral of the likelihood times the prior -- the un-normalized posterior -- over parameter space is called the evidence: \begin{eqnarray} P(D) = Z &=& \int d\vec{\theta}\, \mathcal{L}(D \vert \vec{\theta})P(\vec{\theta}) \end{eqnarray}

If we approximate the un-normalized posterior as a Gaussian -- which I've recently learned from Christopher Bishop is called the Laplace approximation -- by expanding its logarithm to second order about the mode, then we get: \begin{eqnarray} \mathcal{L}(D \vert \vec{\theta})P(\vec{\theta}) &\approx & \mathcal{L}(D \vert \vec{\theta}_{\mathrm{MAP}})P(\vec{\theta}_{\mathrm{MAP}})\,\mathrm{exp}\left[-\frac{1}{2}(\vec{\theta}-\vec{\theta}_{\mathrm{MAP}})_i\left(F_{ij}+\Sigma_{ij}^{-1}\right)(\vec{\theta}-\vec{\theta}_{\mathrm{MAP}})_j \right] \end{eqnarray} Here $F_{ij}$ is the curvature (Hessian) of the negative log-likelihood at the mode and $\Sigma_{ij}^{-1}$ is the inverse prior covariance, so $F_{ij}+\Sigma_{ij}^{-1}$ is the precision -- the inverse covariance -- of the approximating Gaussian.

We already know what the integral of this Gaussian over parameter space will be: the standard multivariate Gaussian integral gives $(2\pi)^{D/2}\,\vert F_{ij}+\Sigma_{ij}^{-1} \vert^{-1/2}$ times the peak value, where $D$ here denotes the dimensionality of parameter space. So: \begin{eqnarray} \log Z &\approx& \log\left[\mathcal{L}(D \vert \vec{\theta}_{\mathrm{MAP}})P(\vec{\theta}_{\mathrm{MAP}})\right]+\frac{D}{2}\log(2\pi)-\frac{1}{2}\log \left( \vert F_{ij}+\Sigma_{ij}^{-1} \vert\right) \end{eqnarray}

Comparing this with our measured entropy of the posterior -- which, for a $D$-dimensional Gaussian with precision $F_{ij}+\Sigma_{ij}^{-1}$, is \begin{eqnarray} H\left[ P(\vec{\theta} \vert D) \right] &=& \frac{D}{2}\log(2\pi)-\frac{1}{2}\log \left( \vert F_{ij}+\Sigma_{ij}^{-1} \vert\right) + \frac{D}{2} \end{eqnarray} -- we see we're off by just a constant, namely $\log\left[\mathcal{L}(D \vert \vec{\theta}_{\mathrm{MAP}})P(\vec{\theta}_{\mathrm{MAP}})\right]-\frac{D}{2}$, which does not depend on $\vec{\theta}$.

This is another example of why a statistical mechanics interpretation of $Z$, the normalization of the posterior, is right on point. Its logarithm -- up to an additive constant, which can be thrown away -- is equal to the entropy of our distribution, a common point of wisdom in statistical mechanics.

So, in conclusion: under the Laplace approximation, writing our posterior as a Gaussian by expanding its logarithm about the mode $\vec{\theta}_{\mathrm{MAP}}$, with curvature $F_{ij}+\Sigma_{ij}^{-1}$, we get \begin{eqnarray} \log Z &=& H\left[P(\vec{\theta} \vert D )\right] + \mathrm{const} \end{eqnarray}
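As a quick sanity check, here is a small numerical sketch of the claim. It is not from the original derivation: the two-parameter toy model, the matrices `F` and `Sigma`, the mode `theta_map`, and the peak value `log_peak` are all illustrative assumptions. It builds the Gaussian-approximated un-normalized posterior, computes $\log Z$ and the entropy both from the closed-form expressions above and by brute-force integration on a grid, and checks that the two quantities differ by the constant $\log\left[\mathcal{L}(D \vert \vec{\theta}_{\mathrm{MAP}})P(\vec{\theta}_{\mathrm{MAP}})\right]-\frac{D}{2}$.

```python
import numpy as np

# Toy two-parameter setup (all numbers are illustrative assumptions).
D = 2
F = np.array([[3.0, 0.5], [0.5, 2.0]])      # likelihood curvature at the mode
Sigma = np.array([[1.0, 0.2], [0.2, 1.5]])  # prior covariance
A = F + np.linalg.inv(Sigma)                # posterior precision F + Sigma^{-1}
theta_map = np.array([0.3, -0.7])           # posterior mode
log_peak = -1.2                             # log[L(D|theta_MAP) P(theta_MAP)], arbitrary

# Closed-form Laplace results.
_, logdetA = np.linalg.slogdet(A)
logZ = log_peak + 0.5 * D * np.log(2 * np.pi) - 0.5 * logdetA
H = 0.5 * D * np.log(2 * np.pi) - 0.5 * logdetA + 0.5 * D

# Brute-force check on a grid: integrate the un-normalized posterior for Z,
# then compute the normalized posterior's entropy -sum p log p.
x = np.linspace(-5, 5, 801)
X, Y = np.meshgrid(x, x, indexing="ij")
d = np.stack([X - theta_map[0], Y - theta_map[1]], axis=-1)
quad = np.einsum("...i,ij,...j->...", d, A, d)
unnorm = np.exp(log_peak - 0.5 * quad)      # Gaussian-approximated L * prior
dx = x[1] - x[0]
Z_num = unnorm.sum() * dx**2
p = unnorm / Z_num
H_num = -(p * np.log(np.where(p > 0, p, 1.0))).sum() * dx**2

print(f"log Z  analytic {logZ:.4f}   numeric {np.log(Z_num):.4f}")
print(f"H      analytic {H:.4f}   numeric {H_num:.4f}")
print(f"log Z - H = {logZ - H:.4f}   vs   log_peak - D/2 = {log_peak - 0.5 * D:.4f}")
```

With a grid this fine the numeric values should track the analytic ones closely, and the gap $\log Z - H$ reproduces the $\vec{\theta}$-independent constant identified above.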