Marginal distributions of the multivariate normal distribution

 Marginal distributions of a multivariate normal distribution are also normal distributions. Let's prove this.



See also: Multivariate normal distribution [Wikipedia]

The density function of a multivariate normal distribution is given as

\[ f(\mathbf{x}) = \frac{1}{\sqrt{(2\pi)^n|\Sigma|}}\exp\left[-\frac{1}{2}(\mathbf{x}-\boldsymbol{\mu})^{\top}\Sigma^{-1}(\mathbf{x}-\boldsymbol{\mu})\right] \]

where \(\mathbf{x}\in\mathbb{R}^n\) is the random vector, \(\boldsymbol{\mu}\) is the mean vector, and \(\Sigma\) is the covariance matrix. By the change of variables \(\mathbf{x} - \boldsymbol{\mu} \mapsto \mathbf{x}\), we may assume the mean is zero without loss of generality. So, in the following, we only consider

\[ f(\mathbf{x}) = \frac{1}{\sqrt{(2\pi)^n|\Sigma|}}\exp\left[-\frac{1}{2}\mathbf{x}^{\top}\Sigma^{-1}\mathbf{x}\right]. \]
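As a quick numerical sanity check of this formula, here is a sketch in NumPy (assuming NumPy is available; the helper name `mvn_density` is ours, not a library function). For a diagonal covariance the joint density must factor into a product of univariate normal densities:

```python
import numpy as np

def mvn_density(x, sigma):
    """Zero-mean multivariate normal density, computed from the formula above."""
    n = len(x)
    norm = np.sqrt((2 * np.pi) ** n * np.linalg.det(sigma))
    quad = x @ np.linalg.inv(sigma) @ x
    return np.exp(-0.5 * quad) / norm

# For a diagonal covariance, the components are independent, so the joint
# density should equal the product of 1-D normal densities.
x = np.array([0.3, -1.2, 0.7])
variances = np.array([1.0, 2.0, 0.5])
sigma = np.diag(variances)

joint = mvn_density(x, sigma)
product = np.prod(np.exp(-0.5 * x**2 / variances) / np.sqrt(2 * np.pi * variances))
assert np.isclose(joint, product)
```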

We need the following theorems from linear algebra.

Theorem 1

Let \(A\) be an \(n\times n\) regular (i.e., invertible) matrix, \(D\) an \(m\times m\) regular matrix, \(B\) an \(n\times m\) matrix, and \(C\) an \(m\times n\) matrix, and suppose that the Schur complement \(S = D - CA^{-1}B\) is regular. Then,

\[ \begin{bmatrix} A & B \\ C & D \end{bmatrix}^{-1} = \begin{bmatrix} A^{-1} + A^{-1}BS^{-1}CA^{-1} & -A^{-1}BS^{-1}\\ -S^{-1}CA^{-1} & S^{-1} \end{bmatrix} \tag{eq:1} \]

Remark. Perhaps it is easier to memorize if we write in the following way:

\[ \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix}^{-1} = \begin{bmatrix} A_{11}^{-1} + A_{11}^{-1}A_{12}S_{22}^{-1}A_{21}A_{11}^{-1} & -A_{11}^{-1}A_{12}S_{22}^{-1}\\ -S_{22}^{-1}A_{21}A_{11}^{-1} & S_{22}^{-1} \end{bmatrix} \]

with \(S_{22} = A_{22} - A_{21}A_{11}^{-1}A_{12}\). □

Proof. Let's define the following matrices:

\[ \begin{align*} F &= \begin{bmatrix} A & B\\ C & D \end{bmatrix},\\ W &= \begin{bmatrix} I_n & O_{n,m}\\ -CA^{-1} & I_m \end{bmatrix},\\ U &= \begin{bmatrix} I_n & -BS^{-1}\\ O_{m,n} & I_m \end{bmatrix}. \end{align*} \]

Here, \(I_n\) is the \(n\times n\) identity matrix and \(O_{n,m}\) is the \(n\times m\) zero matrix. Then,

\[ \begin{align*} G & = WF = \begin{bmatrix} A & B\\ O_{m,n} & S \end{bmatrix},\\ H &= UG = UWF = \begin{bmatrix} A & O_{n,m}\\ O_{m,n} & S \end{bmatrix}. \end{align*} \]

Therefore, 

\[ H^{-1} = F^{-1}W^{-1}U^{-1}. \]

Hence,

\[F^{-1} = H^{-1}UW.\]

But

\[ H^{-1}= \begin{bmatrix} A^{-1} & O_{n,m}\\ O_{m,n} & S^{-1} \end{bmatrix}. \]

A direct computation gives

\[ UW = \begin{bmatrix} I_n + BS^{-1}CA^{-1} & -BS^{-1}\\ -CA^{-1} & I_m \end{bmatrix}, \]

so that

\[ F^{-1} = H^{-1}UW = \begin{bmatrix} A^{-1} + A^{-1}BS^{-1}CA^{-1} & -A^{-1}BS^{-1}\\ -S^{-1}CA^{-1} & S^{-1} \end{bmatrix}, \]

which is (eq:1). ■
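The block-inversion formula (eq:1) is easy to verify numerically. A minimal sketch with NumPy, assuming NumPy is available (the diagonal shift by \(3I\) is just a convenient way to keep \(A\) and \(S\) well conditioned):

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 3, 2

# Random blocks; the shifted identity keeps A and S comfortably invertible.
A = rng.standard_normal((n, n)) + 3 * np.eye(n)
B = rng.standard_normal((n, m))
C = rng.standard_normal((m, n))
D = rng.standard_normal((m, m)) + 3 * np.eye(m)

Ainv = np.linalg.inv(A)
S = D - C @ Ainv @ B          # Schur complement of A
Sinv = np.linalg.inv(S)

F = np.block([[A, B], [C, D]])
F_inv_formula = np.block([
    [Ainv + Ainv @ B @ Sinv @ C @ Ainv, -Ainv @ B @ Sinv],
    [-Sinv @ C @ Ainv,                   Sinv],
])
assert np.allclose(np.linalg.inv(F), F_inv_formula)
```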

Theorem 2

With the same setting as in Theorem 1, we have the following determinant identity:

\[ \begin{align} \begin{vmatrix} A & B\\ C & D \end{vmatrix} &= |A|\cdot |D - CA^{-1}B|\end{align} \]

Proof.   We use the same notation as in the proof of Theorem 1.

Noting \(G = WF\) and \(|W| = 1\), we have \(|F| = |G|\). But \(G\) is block upper triangular, so

\[ |F| = |G| = |A|\cdot|S| = |A|\cdot|D - CA^{-1}B|. \]

■
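Theorem 2 can be checked numerically in the same spirit (a sketch assuming NumPy; note that only \(A\) needs to be invertible here, so \(D\) is left unshifted):

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 3, 2
A = rng.standard_normal((n, n)) + 3 * np.eye(n)   # keep A invertible
B = rng.standard_normal((n, m))
C = rng.standard_normal((m, n))
D = rng.standard_normal((m, m))

F = np.block([[A, B], [C, D]])
S = D - C @ np.linalg.inv(A) @ B   # Schur complement of A

# |F| should equal |A| * |S|.
assert np.isclose(np.linalg.det(F), np.linalg.det(A) * np.linalg.det(S))
```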

Now, let's go back to our problem. First, split the variable \(\mathbf{x}\) into two blocks:

\[ \mathbf{x} = \begin{bmatrix} \mathbf{x}_1\\ \mathbf{x}_2 \end{bmatrix} \]

where \(\mathbf{x}_1\in \mathbb{R}^p\) and \(\mathbf{x}_2 \in \mathbb{R}^q\) with \(p + q = n\). We consider integrating \(\mathbf{x}_2\) out to get the marginal distribution

\[ f(\mathbf{x}_1) = \int_{\mathbb{R}^q} f(\mathbf{x}_1, \mathbf{x}_2)\,d\mathbf{x}_2. \]

We also split the covariance matrix into blocks as

\[ \Sigma = \begin{bmatrix} \Sigma_{11} & \Sigma_{12}\\ \Sigma_{21} & \Sigma_{22} \end{bmatrix} \]

where \(\Sigma_{11}\) is \(p\times p\), \(\Sigma_{22}\) is \(q\times q\), and \(\Sigma_{12}\) is \(p\times q\) with \(\Sigma_{21} = \Sigma_{12}^{\top}\).

By Theorem 1, we have

\[\Sigma^{-1} = \begin{bmatrix} \Sigma_{11}^{-1} + \Sigma_{11}^{-1}\Sigma_{12}S^{-1}\Sigma_{12}^{\top}\Sigma_{11}^{-1} & - \Sigma_{11}^{-1}\Sigma_{12}S^{-1}\\ -S^{-1}\Sigma_{12}^{\top}\Sigma_{11}^{-1} & S^{-1} \end{bmatrix}  \]

where \(S = \Sigma_{22} - \Sigma_{12}^{\top}\Sigma_{11}^{-1}\Sigma_{12}\). Thus,

\[ \begin{align*} \mathbf{x}^{\top}\Sigma^{-1}\mathbf{x} &= \mathbf{x}_1^\top(\Sigma_{11}^{-1} + \Sigma_{11}^{-1}\Sigma_{12}S^{-1}\Sigma_{12}^{\top}\Sigma_{11}^{-1})\mathbf{x}_{1} -2\mathbf{x}_1^{\top}\Sigma_{11}^{-1}\Sigma_{12}S^{-1}\mathbf{x}_2 + \mathbf{x}_2^{\top}S^{-1}\mathbf{x}_2\\ &= \mathbf{x}_1^\top\Sigma_{11}^{-1}\mathbf{x}_1\\ & ~~ + \mathbf{x}_1^\top\Sigma_{11}^{-1}\Sigma_{12}S^{-1}\Sigma_{12}^{\top}\Sigma_{11}^{-1}\mathbf{x}_1 - \mathbf{x}_1^{\top}\Sigma_{11}^{-1}\Sigma_{12}S^{-1}\mathbf{x}_2\\ & ~~ -\mathbf{x}_2^{\top}S^{-1}\Sigma_{12}^{\top}\Sigma_{11}^{-1}\mathbf{x}_1 + \mathbf{x}_2^{\top}S^{-1}\mathbf{x}_2\\ &= \mathbf{x}_1^\top\Sigma_{11}^{-1}\mathbf{x}_1\\ & ~~ -\mathbf{x}_1^\top\Sigma_{11}^{-1}\Sigma_{12}S^{-1}(\mathbf{x}_2 - \Sigma_{12}^{\top}\Sigma_{11}^{-1}\mathbf{x}_1)\\ & ~~ + \mathbf{x}_2^{\top}S^{-1}(\mathbf{x}_2 - \Sigma_{12}^{\top}\Sigma_{11}^{-1}\mathbf{x}_1)\\ &=\mathbf{x}_1^\top\Sigma_{11}^{-1}\mathbf{x}_1 +(\mathbf{x}_2 - \Sigma_{12}^{\top}\Sigma_{11}^{-1}\mathbf{x}_1)^{\top}S^{-1}(\mathbf{x}_2 - \Sigma_{12}^{\top}\Sigma_{11}^{-1}\mathbf{x}_1) \end{align*} \]
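This decomposition of the quadratic form is the heart of the argument, so it is worth a numerical spot check. A sketch assuming NumPy (the shift by \(nI\) just makes the random \(\Sigma\) positive definite):

```python
import numpy as np

rng = np.random.default_rng(2)
p, q = 2, 3
n = p + q

# Build a random symmetric positive definite covariance matrix.
M = rng.standard_normal((n, n))
Sigma = M @ M.T + n * np.eye(n)

S11 = Sigma[:p, :p]
S12 = Sigma[:p, p:]
S = Sigma[p:, p:] - S12.T @ np.linalg.inv(S11) @ S12   # Schur complement

x = rng.standard_normal(n)
x1, x2 = x[:p], x[p:]
v = x2 - S12.T @ np.linalg.inv(S11) @ x1   # the shifted variable

lhs = x @ np.linalg.inv(Sigma) @ x
rhs = x1 @ np.linalg.inv(S11) @ x1 + v @ np.linalg.inv(S) @ v
assert np.isclose(lhs, rhs)
```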

By Theorem 2, 

\[|\Sigma|=|\Sigma_{11}|\cdot|S|\]

Thus,

\[ \begin{align} \frac{1}{\sqrt{(2\pi)^n|\Sigma|}}\int_{\mathbb{R}^q}\exp\left(-\frac{1}{2}\mathbf{x}^{\top}\Sigma^{-1}\mathbf{x}\right)\,d\mathbf{x}_2 &= \frac{\sqrt{(2\pi)^q|S|}}{\sqrt{(2\pi)^n|\Sigma|}}\exp\left(-\frac{1}{2}\mathbf{x}_1^\top\Sigma_{11}^{-1}\mathbf{x}_1\right)\\ &= \frac{1}{\sqrt{(2\pi)^p|\Sigma_{11}|}} \exp\left(-\frac{1}{2}\mathbf{x}_1^\top\Sigma_{11}^{-1}\mathbf{x}_1\right). \end{align} \]

This is the density of the multivariate normal distribution of \(\mathbf{x}_1\) with covariance matrix \(\Sigma_{11}\), which is obtained from \(\Sigma\) by deleting the rows and columns corresponding to \(\mathbf{x}_2\). Therefore, marginal distributions of a multivariate normal distribution are normal distributions.
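Finally, the whole result can be confirmed numerically for a small case. The sketch below (assuming NumPy; the grid bounds and step are arbitrary choices) takes \(n = 2\), \(p = q = 1\), integrates \(\mathbf{x}_2\) out by a Riemann sum, and compares the result with the claimed 1-D normal density with variance \(\Sigma_{11}\):

```python
import numpy as np

Sigma = np.array([[2.0, 0.8],
                  [0.8, 1.5]])
Sigma_inv = np.linalg.inv(Sigma)
norm = np.sqrt((2 * np.pi) ** 2 * np.linalg.det(Sigma))

x1 = 0.4
t = np.linspace(-15, 15, 20001)          # integration grid for x2
dt = t[1] - t[0]
quad = (Sigma_inv[0, 0] * x1**2 + 2 * Sigma_inv[0, 1] * x1 * t
        + Sigma_inv[1, 1] * t**2)
marginal_numeric = np.sum(np.exp(-0.5 * quad) / norm) * dt

# The claimed closed form: a 1-D normal density with variance Sigma[0, 0].
marginal_formula = np.exp(-0.5 * x1**2 / Sigma[0, 0]) / np.sqrt(2 * np.pi * Sigma[0, 0])
assert np.isclose(marginal_numeric, marginal_formula, rtol=1e-6)
```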
