Conditioning Multinormals
In the following, all equals signs will really mean "proportional to," since
we're going to work with probability densities and, for the time being, don't
want to worry about normalization.
Let $\mathbf{x}$ and $\mathbf{y}$ be random vectors drawn from the following
multinormal:
\begin{eqnarray}
P(\mathbf{x},\mathbf{y}) &\sim& \mathrm{N}\left( \left[\begin{array}{c} \mu_x \\
\mu_y \end{array}\right], \left[\begin{array}{cc} A & C \\ C^T & B
\end{array}\right]^{-1} \right)\\
&\sim& e^{-\frac{1}{2} \left( \left[\begin{array}{cc} x-\mu_x & y - \mu_y
\end{array}\right] \cdot \left[\begin{array}{cc} A & C \\ C^T & B
\end{array}\right]\cdot \left[\begin{array}{c} x-\mu_x \\ y-\mu_y
\end{array}\right] \right)}
\end{eqnarray}
The matrix above, whose inverse is the covariance matrix, is a **block
matrix**: roughly speaking, the off-diagonal block $C$ couples $\mathbf{x}$ to
$\mathbf{y}$, while $A$ and $B$ control the correlations within $\mathbf{x}$
and within $\mathbf{y}$, respectively. We need not have $\mathbf{x}$ and
$\mathbf{y}$ be of the same dimension, so $C$ need not even be square.
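To make this concrete, here's a small numpy sketch that assembles the joint
distribution from hypothetical precision blocks and draws samples from it. The
symbols `A`, `B`, `C`, `mu_x`, `mu_y` mirror the ones above, but the numbers,
the dimensions (a 2-dimensional $\mathbf{x}$ and a 3-dimensional $\mathbf{y}$),
and names like `x_samples` are made up purely for illustration; we'll reuse
them in the later sketches.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical precision blocks: x is 2-dimensional, y is 3-dimensional.
A = np.array([[2.0, 0.3],
              [0.3, 1.5]])                  # "auto" block for x
B = np.array([[1.8, 0.2, 0.0],
              [0.2, 2.2, 0.4],
              [0.0, 0.4, 1.6]])             # "auto" block for y
C = np.array([[0.5, -0.2, 0.1],
              [0.0,  0.3, -0.4]])           # cross block coupling x to y, shape (2, 3)

mu_x = np.array([1.0, -1.0])
mu_y = np.array([0.0, 2.0, 0.5])
mu = np.concatenate([mu_x, mu_y])

# The matrix written above is the inverse covariance (precision); invert it
# to get the covariance of the joint multinormal.
precision = np.block([[A, C], [C.T, B]])
cov = np.linalg.inv(precision)

# Draw joint samples (x, y) ~ N(mu, precision^{-1}).
samples = rng.multivariate_normal(mu, cov, size=100_000)
x_samples, y_samples = samples[:, :2], samples[:, 2:]
```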
To find the inverse of a block matrix like the one above, we write it as
\begin{equation}
\left[\begin{array}{cc} P & Q \\ R & S \end{array}\right]
\end{equation}
and require:
\begin{eqnarray}
\left[\begin{array}{cc} P & Q \\ R & S
\end{array}\right]\left[\begin{array}{cc} X_1 & X_2 \\ X_3 & X_4
\end{array}\right] &=& \left[\begin{array}{cc} I & 0 \\ 0 & I
\end{array}\right]
\end{eqnarray}
Which results in the following linear equations:
\begin{eqnarray}
PX_1+QX_3 &=& I \\
PX_2 &=& - Q X_4 \\
RX_1 &=& - S X_3 \\
RX_2+SX_4 &=& I
\end{eqnarray}
with the solutions:
\begin{eqnarray}
X_1 &=& P^{-1}+P^{-1}QMRP^{-1}\\
X_2 &=& -P^{-1}QM\\
X_3 &=& -MRP^{-1} \\
X_4 &=& M \\
M &=& (S-RP^{-1}Q)^{-1}
\end{eqnarray}
or, equivalently,
\begin{eqnarray}
X_1 &=& N\\
X_2 &=& -NQS^{-1}\\
X_3 &=& -S^{-1}RN \\
X_4 &=& S^{-1}+S^{-1}RNQS^{-1} \\
N &=& (P-QS^{-1}R)^{-1}
\end{eqnarray}
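These two forms are easy to verify numerically. Here's a throwaway numpy
check with arbitrary, randomly generated blocks; the names `P`, `Q`, `R`, `S`,
`M`, `N` just mirror the symbols above:

```python
import numpy as np

rng = np.random.default_rng(1)

# Arbitrary blocks: P and S square (of different sizes), Q and R rectangular.
P = rng.normal(size=(3, 3)) + 3 * np.eye(3)
S = rng.normal(size=(2, 2)) + 3 * np.eye(2)
Q = rng.normal(size=(3, 2))
R = rng.normal(size=(2, 3))

Pinv, Sinv = np.linalg.inv(P), np.linalg.inv(S)
M = np.linalg.inv(S - R @ Pinv @ Q)
N = np.linalg.inv(P - Q @ Sinv @ R)

# First solution set (in terms of M) and second solution set (in terms of N).
inv1 = np.block([[Pinv + Pinv @ Q @ M @ R @ Pinv, -Pinv @ Q @ M],
                 [-M @ R @ Pinv,                   M]])
inv2 = np.block([[N,             -N @ Q @ Sinv],
                 [-Sinv @ R @ N,  Sinv + Sinv @ R @ N @ Q @ Sinv]])

full = np.block([[P, Q], [R, S]])
assert np.allclose(inv1 @ full, np.eye(5))
assert np.allclose(inv2 @ full, np.eye(5))
```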
Now, let's say we are interested in the conditional multinormal
\begin{eqnarray}
P(\mathbf{x} \vert \mathbf{y}) &\sim& ?
\end{eqnarray}
We expect the ? above to be another Gaussian, which will be uniquely determined
by its mean and variance -- the first two cumulants. Let's write down the
conditional expectation value using Bayes' Theorem:
\begin{eqnarray}
\langle\mathbf{x} \vert \mathbf{y}\rangle &=& \int \mathbf{x}\, P(\mathbf{x}\vert
\mathbf{y})\, d\mathbf{x} \\
&=& \int \mathbf{x}\, \frac{P(\mathbf{x},\mathbf{y})}{P(\mathbf{y})}\,
d\mathbf{x}
\end{eqnarray}
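A crude way to see this conditioning numerically: reusing the hypothetical
`x_samples`, `y_samples` from the sketch above, keep only the draws whose
$\mathbf{y}$ lands in a small window around a chosen point `y0` and average
the surviving $\mathbf{x}$'s. (The windowing is just an illustrative Monte
Carlo approximation, not part of the derivation.)

```python
# Approximate E[x | y ~ y0] by keeping samples whose y falls within a small
# box around y0 and averaging their x's (reuses x_samples, y_samples above).
y0 = np.array([0.3, 1.8, 0.7])
window = 0.25
mask = np.all(np.abs(y_samples - y0) < window, axis=1)
print("samples kept:", mask.sum())
print("Monte Carlo  E[x | y ~ y0]:", x_samples[mask].mean(axis=0))
```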
And now, to expand things to higher order in $\mathbf{x}$, we can create a
conditional moment generating function:
\begin{eqnarray}
\langle e^{\mathbf{\lambda}\cdot(\mathbf{x} \vert \mathbf{y})} \rangle &=& \int d\mathbf{x}
\left(e^{\mathbf{\lambda}\cdot (\mathbf{x} \vert \mathbf{y})}\right)\ \
\frac{e^{-\frac{1}{2} \left( \left[\begin{array}{cc} x-\mu_x & y-\mu_y
\end{array}\right] \cdot \left[\begin{array}{cc} A & C \\ C^T & B
\end{array}\right]\cdot \left[\begin{array}{c} x-\mu_x \\ y-\mu_y
\end{array}\right] \right)} }{e^{-\frac{1}{2} \left( \left[\begin{array}{cc}
x-\mu_x & y-\mu_y \end{array}\right] \cdot \left[\begin{array}{cc} 0 &
0 \\ 0 & B \end{array}\right]\cdot \left[\begin{array}{c} x-\mu_x \\
y-\mu_y \end{array}\right] \right)}}
\end{eqnarray}
Notice that the denominator depends only on $\mathbf{y}$, so up to the
normalization we're ignoring it plays the role of the probability density on
$\mathbf{y}$. Let's write $\mathbf{x}^\prime = \mathbf{x}-\mathbf{\mu}_x$ and
$\mathbf{y}^\prime = \mathbf{y}-\mathbf{\mu}_y$ to simplify things. Let's also
drop the conditional notation $\vert \mathbf{y}$ in the exponent to clean things
up and make the conditioning implicit:
\begin{eqnarray}
\langle e^{\mathbf{\lambda}\cdot(\mathbf{x} \vert \mathbf{y})} \rangle &=& \int d\mathbf{x}
\left(e^{\mathbf{\lambda}\cdot (\mathbf{x^\prime}+\mathbf{\mu}_x )}\right)\ \
e^{-\frac{1}{2} \left( \left[\begin{array}{cc} x^\prime & y^\prime
\end{array}\right] \cdot \left[\begin{array}{cc} A & C \\ C^T & 0
\end{array}\right]\cdot \left[\begin{array}{c} x^\prime \\ y^\prime
\end{array}\right] \right)} \\
\langle e^{\mathbf{\lambda}\cdot(\mathbf{x} \vert \mathbf{y})} \rangle &=& \int d\mathbf{x}
\left(e^{\mathbf{\lambda}\cdot (\mathbf{x^\prime}+\mathbf{\mu}_x )}\right)\ \
e^{-\frac{1}{2} \left( \left[\begin{array}{cc} x^\prime & y^\prime
\end{array}\right] \cdot \left[\begin{array}{c} Ax^\prime+Cy^\prime \\
C^Tx^\prime \end{array}\right] \right)} \\
\langle e^{\mathbf{\lambda}\cdot(\mathbf{x} \vert \mathbf{y})} \rangle &=& e^{\lambda \cdot
\mathbf{\mu}_x }\int d\mathbf{x} \left(e^{\mathbf{\lambda}\cdot
\mathbf{x^\prime}}\right)\ \ e^{-\frac{1}{2} x^\prime_i
A_{ij}x^\prime_j - C_{ij} x^\prime_i y^\prime_j}
\end{eqnarray}
Now we can simplify things a little bit, by writing $ik_i =
\lambda_i-C_{ij}y^\prime_j$
\begin{eqnarray}
\langle e^{\mathbf{\lambda}\cdot(\mathbf{x} \vert \mathbf{y})} \rangle &=& e^{\lambda \cdot
\mathbf{\mu}_x }\int d\mathbf{x}
\left(e^{(\lambda_i-C_{ij}y^\prime_j)x^\prime_i}\right)\ \
e^{-\frac{1}{2}\left( x^\prime_i A_{ij}x^\prime_j \right)}\\
\langle e^{\mathbf{\lambda}\cdot(\mathbf{x} \vert \mathbf{y})} \rangle &=& e^{\lambda \cdot
\mathbf{\mu}_x }\int d\mathbf{x} e^{ik_i x^\prime_i}\ \ e^{-\frac{1}{2}\left(
x^\prime_i A_{ij}x^\prime_j \right)}
\end{eqnarray}
And now we have a multidimensional Fourier transform of a Gaussian (which is
another Gaussian):
\begin{eqnarray}
\langle e^{\mathbf{\lambda}\cdot(\mathbf{x} \vert \mathbf{y})} \rangle &=& e^{\lambda \cdot
\mathbf{\mu}_x }e^{-\frac{1}{2}k_i A^{-1}_{ij}k_j}
\end{eqnarray}
Substituting back $ik_i = \lambda_i-C_{il}y^\prime_l$, the factor of $i$ flips
the sign of the quadratic, since $k_ik_j =
-(\lambda_i-C_{il}y^\prime_l)(\lambda_j-C_{jm}y^\prime_m)$:
\begin{eqnarray}
\langle e^{\mathbf{\lambda}\cdot(\mathbf{x} \vert \mathbf{y})} \rangle &=& e^{\lambda \cdot
\mathbf{\mu}_x
+\frac{1}{2}(\lambda_i-C_{il}y^\prime_l)A^{-1}_{ij}(\lambda_j-C_{jm}y^\prime_m)}
\end{eqnarray}
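If it's not obvious that the sign works out this way, the one-dimensional
version of the integral is easy to check numerically; here `a` stands in for
$A$ and `b` for $\lambda - C y^\prime$, both with arbitrary made-up values:

```python
import numpy as np
from scipy.integrate import quad

a, b = 1.7, 0.8   # arbitrary: a plays the role of A, b the role of lambda - C y'

# Integrate e^{b x - a x^2 / 2} over the real line and compare with the
# closed form sqrt(2 pi / a) * e^{+ b^2 / (2 a)}.
numerical, _ = quad(lambda x: np.exp(b * x - 0.5 * a * x**2), -np.inf, np.inf)
closed_form = np.sqrt(2 * np.pi / a) * np.exp(0.5 * b**2 / a)
assert np.isclose(numerical, closed_form)
```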
FOILing out and collecting like terms in $\lambda$, we've got:
\begin{eqnarray}
\langle e^{\mathbf{\lambda}\cdot(\mathbf{x} \vert \mathbf{y})} \rangle &=& e^{\lambda_i
\left[\mathbf{\mu}^{(x)}_i - A^{-1}_{ij}
C_{jm}(y_m-\mu^{(y)}_m)\right]+\left[\frac{1}{2}\lambda_i
A^{-1}_{ij}\lambda_j\right] \ \ + \ \
\frac{1}{2}(y_l-\mu^{(y)}_l)C_{il}A^{-1}_{ij}C_{jm}(y_m-\mu^{(y)}_m)}
\end{eqnarray}
And now we can see that both the conditional mean and the conditional variance
differ from their unconditioned counterparts. Reading off the first two
cumulants by differentiating the logarithm of the generating function:
\begin{eqnarray}
\left.\frac{\partial}{\partial \lambda_i}\log\langle e^{\mathbf{\lambda}\cdot(\mathbf{x} \vert
\mathbf{y})} \rangle\right\vert_{\mathbf{\lambda}=0} &=& \langle x_i \rangle
= \mathbf{\mu}^{(x)}_i - A^{-1}_{ij} C_{jm}(y_m-\mu^{(y)}_m) \\
\left.\frac{\partial^2}{\partial \lambda_i \partial\lambda_j}\log\langle e^{\mathbf{\lambda}\cdot(\mathbf{x} \vert
\mathbf{y})} \rangle\right\vert_{\mathbf{\lambda}=0} &=& \langle x_i x_j
\rangle - \langle x_i \rangle\langle x_j \rangle = A^{-1}_{ij}
\end{eqnarray}
The generating function also carries an overall factor that depends only on
$\mathbf{y}$,
\begin{eqnarray}
f(\mathbf{y}) &=&
\mathrm{Exp}\left[\frac{1}{2}(y_l-\mu^{(y)}_l)C_{il}A^{-1}_{ij}C_{jm}(y_m-\mu^{(y)}_m)\right]
\end{eqnarray}
but since it is independent of $\mathbf{\lambda}$ it is just part of the
normalization we've been throwing out all along, and it drops out of the
cumulants once we take the logarithm.
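Continuing the made-up numpy example from before (reusing the hypothetical
`A`, `C`, `mu_x`, `mu_y`, and the window center `y0` from the earlier
sketches), these formulas can be compared directly with the crude Monte Carlo
estimate:

```python
# Conditional mean and covariance of x given y = y0, from the precision blocks.
Ainv = np.linalg.inv(A)
cond_mean = mu_x - Ainv @ C @ (y0 - mu_y)
cond_cov = Ainv

print("formula      E[x | y = y0]:", cond_mean)   # compare with the Monte Carlo value above
print("conditional covariance of x given y:\n", cond_cov)
```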
So now, if we write the conditional probability density function
$P(\mathbf{x} \vert \mathbf{y})$, we have:
\begin{eqnarray}
P(\mathbf{x} \vert \mathbf{y}) &=& \mathrm{N}\left[ \mathbf{\mu}^{(x)}_i -
A^{-1}_{ij} C_{jm}(y_m-\mu^{(y)}_m),\ A^{-1}_{ij} \right]
\end{eqnarray}
If we'd rather have everything in terms of the blocks of the covariance matrix
\begin{equation}
\Sigma = \left[\begin{array}{cc} A & C \\ C^T & B \end{array}\right]^{-1} =
\left[\begin{array}{cc} \Sigma_{xx} & \Sigma_{xy} \\ \Sigma_{yx} & \Sigma_{yy}
\end{array}\right]
\end{equation}
then the block-inverse formulas above give $A^{-1} =
\Sigma_{xx}-\Sigma_{xy}\Sigma_{yy}^{-1}\Sigma_{yx}$ and $A^{-1}C =
-\Sigma_{xy}\Sigma_{yy}^{-1}$, so that:
\begin{eqnarray}
P(\mathbf{x} \vert \mathbf{y}) &=& \mathrm{N}\left[ \mathbf{\mu}_x +
\Sigma_{xy}\Sigma_{yy}^{-1}(\mathbf{y}-\mathbf{\mu}_y),\
\Sigma_{xx}-\Sigma_{xy}\Sigma_{yy}^{-1}\Sigma_{yx} \right]
\end{eqnarray}
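And as a last sanity check (still with the hypothetical numbers from the
sketches above), we can confirm that the precision-block and covariance-block
forms agree, and that the result really is the ratio
$P(\mathbf{x},\mathbf{y})/P(\mathbf{y})$, using scipy's multivariate normal
density:

```python
from scipy.stats import multivariate_normal

# Covariance blocks of Sigma = [[A, C], [C^T, B]]^{-1} (x is 2-d, y is 3-d).
S_xx, S_xy = cov[:2, :2], cov[:2, 2:]
S_yx, S_yy = cov[2:, :2], cov[2:, 2:]

# The precision-block and covariance-block forms of the answer agree.
assert np.allclose(cond_mean, mu_x + S_xy @ np.linalg.inv(S_yy) @ (y0 - mu_y))
assert np.allclose(cond_cov, S_xx - S_xy @ np.linalg.inv(S_yy) @ S_yx)

# P(x | y0) should equal P(x, y0) / P(y0) at any test point x.
joint = multivariate_normal(mean=mu, cov=cov)
marg_y = multivariate_normal(mean=mu_y, cov=S_yy)
cond = multivariate_normal(mean=cond_mean, cov=cond_cov)
for x_test in (np.array([0.5, -0.5]), np.array([1.5, 0.0])):
    ratio = joint.pdf(np.concatenate([x_test, y0])) / marg_y.pdf(y0)
    assert np.isclose(ratio, cond.pdf(x_test))
```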