Multi-Dimensional Central Limit Theorem
Let us begin with the
definition of a normalized probability distribution:
\begin{eqnarray}
\int_{-\infty}^{\infty} P(x) dx &=& 1
\end{eqnarray}
and a moment,
\begin{eqnarray}
\int_{-\infty}^{\infty} x^n P(x) dx &=& m_n
\end{eqnarray}
Notice that $m_1$ is the `mean', or average, or `center of mass' of the
probability distribution. The second moment, $m_2$, is intimately related to
the variance, or width, of our distribution $P(x)$.
We can define the moment generating function as
\begin{eqnarray}
\int e^{kx} P(x) dx &=& M(k)
\end{eqnarray}
where now, Taylor expanding our exponential integrand, we see that the $n^{\rm
th}$ derivative of $M(k)$ with respect to $k$, evaluated at $k=0$, gives the
respective moments:
\begin{eqnarray}
M(k) &=& \int e^{kx} P(x) dx\\
&=& \int \Sigma_n \frac{(kx)^n}{n!} P(x) dx \\
\frac{\partial^n M}{\partial k^n}\vert_{k=0} &=& \int x^n P(x) dx \\
\frac{\partial^n M}{\partial k^n}\vert_{k=0} &=& m_n
\end{eqnarray}
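As a quick numerical sanity check of this derivative property -- just a sketch,
assuming Python with numpy and an exponential test distribution whose moments
are $m_n = n!$ -- we can estimate $M(k)$ by Monte Carlo and take finite
differences at $k=0$:
\begin{verbatim}
import numpy as np

# Sketch: check that derivatives of M(k) = E[e^{kx}] at k = 0 give the moments,
# using an exponential(1) test distribution, for which m_n = n!.
rng = np.random.default_rng(0)
x = rng.exponential(scale=1.0, size=1_000_000)

def M(k):
    # Monte Carlo estimate of the moment generating function E[e^{kx}]
    return np.mean(np.exp(k * x))

h = 1e-3
m1 = (M(h) - M(-h)) / (2 * h)             # first derivative at k = 0
m2 = (M(h) - 2 * M(0.0) + M(-h)) / h**2   # second derivative at k = 0

print(m1, np.mean(x))       # both close to 1
print(m2, np.mean(x**2))    # both close to 2
\end{verbatim}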
We can also define the moment generating function as the Laplace transform of
$P(x)$, which leaves us with an added sign convention in the definition of our
moments,
\begin{eqnarray}
M^\prime (k) &=& \int e^{-kx}P(x) dx \\
(-1)^n \frac{\partial^n M^\prime}{\partial k^n}\vert_{k=0} &=& m_n
\end{eqnarray}
Now we are in a position to define the Fourier transform of our probability
distribution, called the characteristic function,
\begin{eqnarray}
M^\prime(ik) = \phi(k) &=& \int e^{-ikx}P(x) dx
\end{eqnarray}
We can simply think of $\phi(k)$ as the Fourier transform of $P(x)$, and note
that $\phi(0) =1$ always -- as required by our normalization condition above.
Writing the inverse Fourier transform, we can now write $P(x)$ in terms of
$k$-space components,
\begin{eqnarray}
P(x) &=& \frac{1}{2\pi}\int e^{ikx} \phi(k) dk \\
&=& \frac{1}{2\pi}\int e^{ikx} e^{\log\left(\phi(k)\right)}dk
\end{eqnarray}
where I have used the $\frac{1}{2 \pi}$ for the inverse Fourier normalization
convention (not the same as the QM convention!).
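As a small numerical illustration of this transform pair -- a sketch under the
conventions above (forward $e^{-ikx}$, inverse carrying the $1/2\pi$), with a
unit-Gaussian test density and grids that are arbitrary choices -- we can
compute $\phi(k)$ by brute-force quadrature and invert it:
\begin{verbatim}
import numpy as np

# Sketch: phi(k) = int e^{-ikx} P(x) dx for a unit Gaussian, by brute-force
# quadrature, then invert with (1/2pi) int e^{+ikx} phi(k) dk at a test point.
dx = 0.01
x = np.arange(-10.0, 10.0, dx)
P = np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)

dk = 0.01
k = np.arange(-10.0, 10.0, dk)
phi = (np.exp(-1j * np.outer(k, x)) * P).sum(axis=1) * dx

print(abs(phi[np.argmin(abs(k))]))    # ~1.0: phi(0) = 1 from normalization

x0 = 0.5
P_rec = ((np.exp(1j * k * x0) * phi).sum() * dk).real / (2 * np.pi)
print(P_rec, np.exp(-x0**2 / 2) / np.sqrt(2 * np.pi))   # both ~0.352
\end{verbatim}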
I would now like to define cumulants, which are the corresponding derivatives
of the function $\psi(k)=\log\left(\phi(k)\right)$ seen in the exponential
argument of the above equation. First, notice that under the Fourier
transform, multiplication in $x$-space is differentiation in $k$-space, i.e.
\begin{eqnarray}
\mathcal{F}\left[ xP(x) \right] &=& i\frac{\partial \phi}{\partial
k}
\end{eqnarray}
and so we find that the various moments can be defined by the partial
derivatives of the characteristic function (not much different from what we
did before with the moment generating function $M^\prime$, up to a simple
change of variable):
\begin{eqnarray}
m_n &=& (i)^n \frac{\partial^n \phi}{\partial k^n}\vert_{k=0}
\end{eqnarray}
Using our $\psi(k)$ function from before, we can now define the cumulants
\begin{eqnarray}
c_n &=& (i)^n \frac{\partial^n \psi}{\partial k^n}\vert_{k=0}
\end{eqnarray}
which can be constructed out of our initial moments,
\begin{eqnarray}
c_1 &=& m_1 \\
c_2 &=& m_2-m_1^2\\
c_3 &=& m_3-3m_2m_1+2m_1^3\\
c_4 &=& m_4-4m_3m_1-3m_2^2+12m_2m_1^2-6m_1^4
\end{eqnarray}
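As a check of these moment-to-cumulant relations -- a minimal sketch assuming
an exponential distribution with rate 1, whose cumulants are $c_n = (n-1)!$ --
we can estimate the raw moments by Monte Carlo and assemble the cumulants:
\begin{verbatim}
import numpy as np

# Sketch: build c_1..c_4 from raw moments and compare with the known cumulants
# of an exponential(1) distribution, c_n = (n-1)!, i.e. 1, 1, 2, 6.
rng = np.random.default_rng(1)
x = rng.exponential(scale=1.0, size=2_000_000)
m1, m2, m3, m4 = (np.mean(x**n) for n in (1, 2, 3, 4))

c1 = m1
c2 = m2 - m1**2
c3 = m3 - 3*m2*m1 + 2*m1**3
c4 = m4 - 4*m3*m1 - 3*m2**2 + 12*m2*m1**2 - 6*m1**4

print(c1, c2, c3, c4)    # approximately 1, 1, 2, 6
\end{verbatim}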
Notice that the first cumulant is our standard mean, or average, and that the
second cumulant is our definition of variance, or the expectation value of the
squared difference from the mean (i.e. $c_1=\bar{x}$ and $c_2
=\sigma^2=E[(x-\bar{x})^2]$). These cumulants are very good descriptors of a
statistical distribution because they add under convolution. Let's take a
closer look.
Under the Fourier transform, convolution in $x$-space is multiplication in
$k$-space -- see the convolution theorem -- and so cumulant-generating
functions, and therefore cumulants, add under convolution. In probability
theory, when one adds two statistically independent variables $x$ and $y$ in
order to create a new variable $z=x+y$, the probability distribution that
describes $z$ is the convolution of the two former probability distributions:
$P(z) = \left(P_x \star P_y\right)(z)$. Let's see what such an addition would
do in Fourier space,
\begin{eqnarray}
P(z) &=& \left(P_x \star P_y\right)(z) \\
&=& \mathcal{F}^{-1}\left[ \phi_x(k)\phi_y(k) \right]\\
&=& \frac{1}{2\pi}\int e^{i k z} \phi_x(k)\phi_y(k) dk \\
&=& \frac{1}{2\pi}\int e^{i k z} e^{\log\left(\phi_x(k)\right)}e^{\log
\left(\phi_y(k)\right)}dk \\
&=& \frac{1}{2\pi}\int e^{i k z} e^{\psi_x(k)+\psi_y(k)}dk
\end{eqnarray}
Expanding both cumulant-generating functions as a Taylor series, and
collecting like powers of $k$, we find
\begin{eqnarray}
P(z) &=& \frac{1}{2\pi}\int e^{i k z}
e^{-ic_{1x}k-c_{2x}k^2/2+ic_{3x}\frac{k^3}{3!}+\dots}e^{-ic_{1y}k-c_{2y}k^2/2+ic_{3y}\frac{k^3}{3!}+\dots}dk
\\
&=& \frac{1}{2\pi}\int e^{i k z}
e^{-i(c_{1x}+c_{1y})k-(c_{2x}+c_{2y})k^2/2+i(c_{3x}+c_{3y})\frac{k^3}{3!}+\dots}dk
\end{eqnarray}
Notice that $c_{1z}= c_{1x}+c_{1y}$ and $c_{2z}= c_{2x}+c_{2y}$, or the mean
is the sum of the two former means, and the variance is the sum of the two
former variances. Very cool!
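A quick simulation of this cumulant addition -- a sketch with two arbitrarily
chosen, differently shaped distributions:
\begin{verbatim}
import numpy as np

# Sketch: z = x + y for independent x ~ Uniform(0,1) and y ~ Exponential(1);
# the mean and variance of z are the sums of those of x and y.
rng = np.random.default_rng(2)
x = rng.uniform(0, 1, size=1_000_000)
y = rng.exponential(1.0, size=1_000_000)
z = x + y

print(z.mean(), x.mean() + y.mean())   # both ~1.5
print(z.var(),  x.var() + y.var())     # both ~1/12 + 1 ~ 1.083
\end{verbatim}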
Under what are called independent and identically distributed (iid) processes,
such as random walks, we have a new variable $z=\Sigma_i x_i$, which is a
superposition of independent variables that are described by the exact same
probability density function. In this case, our expectation value -- or first
moment -- for the variable $z$ becomes $m_{1z}=Nm_{1x}=N\bar{x}$, where $N$ is
the number of `steps' or `trials' taken. Similarly, we find that the variance
of the sum of random variables is $\sigma^2_z = N \sigma^2_x$; that is, the
variance grows as $N$, and the standard deviation as $\sqrt{N}$.
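A small random-walk sketch of this scaling (the $\pm 1$ step distribution and
the values of $N$ are just assumptions):
\begin{verbatim}
import numpy as np

# Sketch: z is a sum of N iid +/-1 steps; its variance grows like N,
# so the standard deviation (the width of the walk) grows like sqrt(N).
rng = np.random.default_rng(3)
for N in (10, 100, 1000):
    steps = rng.choice([-1, 1], size=(10_000, N))
    z = steps.sum(axis=1)
    print(N, z.var())   # close to N, since each +/-1 step has variance 1
\end{verbatim}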
The central limit theorem depends intimately upon this addition of cumulants
under convolution, and uses the inverse properties of expansion and dilation
under the Fourier transform to truncate the Taylor expansion of our effective
cumulant-generating functions. Let's take a look at an `iid' process, or
$P(z)$, where $z=\Sigma_i x_i$.
\begin{eqnarray}
P(z) &=& \frac{1}{2\pi} \int e^{ikz}\Pi_{i=1}^N \phi_i(k) dk \\
P(z) &=& \frac{1}{2\pi} \int e^{ikz}e^{\Sigma_i \psi_i(k)} dk \\
P(z) &=& \frac{1}{2\pi} \int e^{ikz}e^{-i\left(\Sigma_i c_{1i}\right)k}
e^{-\left(\Sigma_i c_{2i}\right)\frac{k^2}{2}}\cdots dk
\end{eqnarray}
As we convolve more and more probability density functions, our variances will
add, and we will end up with an extremely wide density function. In Fourier
space, this corresponds to an extremely narrow characteristic function, which
allows us to neglect all but the smallest $k$-values, where the higher-order
terms in $\psi(k)$ are negligible. Thus, we can write
\begin{eqnarray}
\psi(k) &=& -ic_1 k -
c_2\frac{k^2}{2}+ic_3\frac{k^3}{3!}-c_4\frac{k^4}{4!}+\dots
\end{eqnarray}
as
\begin{eqnarray}
\psi(k) &\approx & -ic_1 k - c_2\frac{k^2}{2}
\end{eqnarray}
Taking this into account -- and noting that the $c_1$ and $c_2$ are in fact
sums of former means and variances under convolution -- we can now write
$P(z)$ as,
\begin{eqnarray}
P(z) &=& \frac{1}{2 \pi} \int e^{ikz}e^{-ic_1 k}e^{-c_2
\frac{k^2}{2}}dk
\end{eqnarray}
Setting $z^\prime = z-c_1$, or centering about the mean, we can absorb the
first exponential and write
\begin{eqnarray}
P(z^\prime) &=& \frac{1}{2 \pi} \int e^{ikz^\prime}e^{-c_2
\frac{k^2}{2}}dk
\end{eqnarray}
This is the Fourier transform of a Gaussian, which is itself a Gaussian,
\begin{eqnarray}
P(z^\prime) &=& \frac{1}{\sqrt{2 \pi c_2}}e^{\frac{-z^{\prime 2}}{2c_2}}\\
P(z) &=& \frac{1}{\sqrt{2 \pi
\sigma^2}}e^{\frac{-(z-c_1)^2}{2\sigma^2}}\\
P(z) &=& \frac{1}{\sqrt{2 \pi
\sigma^2}}e^{\frac{-(z-\bar{z})^2}{2\sigma^2}}
\end{eqnarray}
Whew! Notice that our mean and variance in this final equation are simply the
sums of the former means and variances of the independent variables $x_i$ that
created $z$ ($z = \Sigma_i x_{i}$; $c_1 = \Sigma_i c_{1i}$; $c_2 =
\Sigma_i c_{2i}$). The central limit theorem does not depend on each of these
variables -- perhaps $N$ of them -- being identical, or described by the same
probability density function; they need only be independent.
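As a numerical sketch of this statement, we can sum independent but not
identically distributed variables (here an arbitrary mix of uniforms and
exponentials) and compare the result to the Gaussian built from the summed
cumulants:
\begin{verbatim}
import numpy as np

# Sketch: z is a sum of independent, non-identical variables
# (25 uniform(0,1) terms and 25 exponential(1) terms); compare the histogram
# of z with the Gaussian whose mean and variance are the summed cumulants.
rng = np.random.default_rng(4)
samples = 200_000
z = (rng.uniform(0, 1, size=(samples, 25)).sum(axis=1)
     + rng.exponential(1.0, size=(samples, 25)).sum(axis=1))

c1 = 25 * 0.5 + 25 * 1.0          # sum of means: 37.5
c2 = 25 * (1/12) + 25 * 1.0       # sum of variances: ~27.08

edges = np.linspace(c1 - 4*np.sqrt(c2), c1 + 4*np.sqrt(c2), 41)
hist, _ = np.histogram(z, bins=edges, density=True)
centers = 0.5 * (edges[1:] + edges[:-1])
gauss = np.exp(-(centers - c1)**2 / (2*c2)) / np.sqrt(2*np.pi*c2)

print(np.max(np.abs(hist - gauss)))   # small compared to the peak, ~0.077
\end{verbatim}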
Now, the extension to multiple dimensions is not too bad: we simply deal with
random vectors rather than random scalars. Let's represent these vectors with
boldface, and label each independent vector with a superscript. We can now write
\begin{eqnarray}
\mathbf{z} &=& \Sigma_j \mathbf{x^j}
\end{eqnarray}
where the superscript denotes the $j^{\rm th}$ independent random vector.
Similarly, our $k$-space is now multidimensional, and so we write the forward
Fourier transform of our probability density (with repeated component indices
summed) as
\begin{eqnarray}
\phi(\mathbf{k}) &=& \int e^{-i\mathbf{k}\cdot
\mathbf{x}}P(\mathbf{x})d^d\mathbf{x} \\
\phi(\mathbf{k}) &=& \int
e^{-i\mathbf{k}_i\mathbf{x}_i}P(\mathbf{x})d^d\mathbf{x}
\end{eqnarray}
The cumulants and the moments are now rank-$n$ tensors, as seen in the following
\begin{eqnarray}
\frac{\partial^n \phi}{\partial \mathbf{k}_i \partial \mathbf{k}_j \cdots
\partial \mathbf{k}_\gamma}\vert_{\mathbf{k}=0} &=& (-i)^n
\mathbf{m}_{ij\cdots \gamma} \\
\frac{\partial^n \psi}{\partial \mathbf{k}_i \partial \mathbf{k}_j \cdots
\partial \mathbf{k}_\gamma}\vert_{\mathbf{k}=0} &=& (-i)^n
\mathbf{c}_{ij\cdots \gamma}
\end{eqnarray}
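Concretely, the rank-1 cumulant is just the mean vector and the rank-2 cumulant
$\mathbf{c}_{ij}$ is the covariance matrix; a minimal numerical sketch with an
arbitrary two-dimensional sample:
\begin{verbatim}
import numpy as np

# Sketch: in d dimensions, the rank-1 cumulant is the mean vector and the
# rank-2 cumulant c_ij is the covariance matrix E[(x_i - <x_i>)(x_j - <x_j>)].
rng = np.random.default_rng(5)
x = rng.multivariate_normal(mean=[1.0, -2.0],
                            cov=[[2.0, 0.3], [0.3, 0.5]], size=500_000)

c_i = x.mean(axis=0)                # rank-1 cumulant: mean vector
c_ij = np.cov(x, rowvar=False)      # rank-2 cumulant: covariance matrix
print(c_i)     # ~ [ 1, -2]
print(c_ij)    # ~ [[2.0, 0.3], [0.3, 0.5]]
\end{verbatim}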
We now use the same approximation scheme as before, convolving an absurd
number of multi-dimensional probability density functions $P(\mathbf{x}^j)$,
which yields a product in Fourier space -- and thus an addition of
cumulant-generating functions
\begin{eqnarray}
P(\mathbf{z}) &=& \frac{1}{(2\pi)^d}\int e^{i\mathbf{k}\cdot
\mathbf{z}}e^{\Sigma_j \psi_j(\mathbf{k})}d^d\mathbf{k}\\
P(\mathbf{z}) &=& \frac{1}{(2\pi)^d}\int e^{i\mathbf{k}\cdot
\mathbf{z}}e^{-i \mathbf{c}_i \mathbf{k}_i}e^{-\mathbf{c}_{ij}\mathbf{k}_i
\mathbf{k}_j/2}e^{i \mathbf{c}_{ijk} \mathbf{k}_i \mathbf{k}_j
\mathbf{k}_k/3!}\cdots d^d\mathbf{k}
\end{eqnarray}
Truncating our Taylor expansion and centering about the multidimensional mean,
$\mathbf{z}_i^\prime = \mathbf{z}_i - \mathbf{c}_i $,
\begin{eqnarray}
P(\mathbf{z}^\prime) &=& \frac{1}{(2\pi)^d}\int e^{i\mathbf{k}\cdot
\mathbf{z}^\prime}e^{-\mathbf{c}_{ij}\mathbf{k}_i
\mathbf{k}_j/2}d^d\mathbf{k}
\end{eqnarray}
We can rescale coordinates by writing
\begin{eqnarray}
\mathbf{w}_i &=& \left(\sqrt{\mathbf{c}}\right)_{ij}\mathbf{k}_j
\end{eqnarray}
where the square root of a matrix can be written in terms of its
diagonalization by an orthogonal matrix $\mathbf{S}$ and the diagonal matrix of
eigenvalues $\mathbf{\Lambda}_{ij} = \delta_{ij}\lambda_i$,
\begin{eqnarray}
\mathbf{c} &=& \mathbf{S}^{-1} \mathbf{\Lambda}\mathbf{S} \\
\mathbf{c}_{ij} &=& \left(\mathbf{S}^{-1}\right)_{il} \lambda_l \delta_{lm}
\mathbf{S}_{mj} \\
\left(\sqrt{\mathbf{c}}\right)_{ij} &=& \left(\mathbf{S}^{-1}\right)_{il}
\sqrt{\lambda_l} \delta_{lm} \mathbf{S}_{mj}
\end{eqnarray}
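A short sketch of this square-root construction (for a symmetric,
positive-definite covariance matrix the diagonalizing matrix is orthogonal, so
$\mathbf{S}^{-1} = \mathbf{S}^T$; the example matrix is arbitrary):
\begin{verbatim}
import numpy as np

# Sketch: square root of a symmetric positive-definite covariance matrix c
# via its eigendecomposition; also check |sqrt(c)| = |c|^{1/2}.
c = np.array([[2.0, 0.3],
              [0.3, 0.5]])

lam, S = np.linalg.eigh(c)                 # eigenvalues, eigenvectors (columns)
sqrt_c = S @ np.diag(np.sqrt(lam)) @ S.T   # S Lambda^{1/2} S^T

print(np.allclose(sqrt_c @ sqrt_c, c))             # True
print(np.isclose(np.linalg.det(sqrt_c),
                 np.sqrt(np.linalg.det(c))))       # True
\end{verbatim}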
The determinant of this square-root matrix is the product of the square roots
of the eigenvalues, $\Pi_j \sqrt{\lambda_j} = \vert \mathbf{c} \vert^{1/2}$,
which is also the factor by which the volume element dilates under this
rescaling; so we can now write our infinitesimal volume element in
$d$-dimensional $\mathbf{k}$ space using this determinant,
\begin{eqnarray}
d^d \mathbf{w} &=& \vert \mathbf{c} \vert^{1/2}
d^d\mathbf{k}
\end{eqnarray}
Now rewriting our integral, we have a multi-dimensional transform of a
Gaussian, which is another Gaussian
\begin{eqnarray}
P(\mathbf{z}^\prime) &=& \frac{1}{(2\pi)^d \vert \mathbf{c} \vert^{1/2}}
\int e^{i
\mathbf{w}_i\left(\mathbf{c}^{-1/2}\right)_{ij}
\mathbf{z}_j^\prime}e^{-\frac{\mathbf{w}_i
\mathbf{w}_i}{2}}d^d\mathbf{w}
\end{eqnarray}
Whew! Performing the remaining Gaussian integral gives a factor of
$(2\pi)^{d/2}$ and an exponent
$-\frac{1}{2}\left(\mathbf{c}^{-1/2}\mathbf{z}^\prime\right)_i
\left(\mathbf{c}^{-1/2}\mathbf{z}^\prime\right)_i$; those two factors of the
inverse square root combine into a single inverse covariance matrix on the
bottom.
\begin{eqnarray}
P(\mathbf{z}^\prime) &=& \frac{1}{(2\pi)^{d/2} \vert \mathbf{c}
\vert^{1/2}}
\mathrm{exp}\left(-\frac{1}{2}\mathbf{z}^\prime_i
\left(\mathbf{c}^{-1}\right)_{ij}\mathbf{z}^\prime_j\right)
\end{eqnarray}
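Finally, a closing numerical sketch of the multi-dimensional statement: summing
many iid two-dimensional vectors with correlated, non-Gaussian components, and
comparing the empirical density near the mean with the limiting Gaussian above
(all the specific choices below -- the exponential components, $N$, the box
size -- are just assumptions):
\begin{verbatim}
import numpy as np

# Sketch: z is a sum of N iid 2-d vectors with correlated, non-Gaussian
# (exponential) components; near its mean, P(z) approaches the multivariate
# Gaussian (2 pi)^{-d/2} |c|^{-1/2} exp(-z'_i (c^{-1})_{ij} z'_j / 2).
rng = np.random.default_rng(6)
N, samples = 50, 100_000
u = rng.exponential(1.0, size=(samples, N))
v = rng.exponential(1.0, size=(samples, N))
z = np.stack([u.sum(axis=1), (u + v).sum(axis=1)], axis=-1)

c_i = N * np.array([1.0, 2.0])              # summed step means
c_ij = N * np.array([[1.0, 1.0],            # summed step covariances
                     [1.0, 2.0]])

# limiting Gaussian density evaluated at the mean (z' = 0)
p_gauss = 1.0 / (2 * np.pi * np.sqrt(np.linalg.det(c_ij)))

# empirical density: fraction of samples in a small box around the mean
half = 2.0
inside = np.all(np.abs(z - c_i) < half, axis=1)
p_emp = inside.mean() / (2 * half) ** 2

print(p_gauss, p_emp)   # both ~ 1/(2*pi*50) ~ 0.0032
\end{verbatim}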