Marginal distributions of the multivariate normal distribution
Marginal distributions of a multivariate normal distribution are also normal distributions. Let's prove this.
See also: Multivariate normal distribution [Wikipedia]
The density function of a multivariate normal distribution is given as
\[ f(\mathbf{x}) = \frac{1}{\sqrt{(2\pi)^n|\Sigma|}}\exp\left[-\frac{1}{2}(\mathbf{x}-\boldsymbol{\mu})^{\top}\Sigma^{-1}(\mathbf{x}-\boldsymbol{\mu})\right] \]
where \(\mathbf{x}\in\mathbb{R}^n\) is the random vector, \(\boldsymbol{\mu}\) is the mean vector, and \(\Sigma\) is the covariance matrix. By the change of variables \(\mathbf{x} - \boldsymbol{\mu} \mapsto \mathbf{x}\), we may assume the mean is zero without loss of generality. So, in the following, we only consider
\[ f(\mathbf{x}) = \frac{1}{\sqrt{(2\pi)^n|\Sigma|}}\exp\left[-\frac{1}{2}\mathbf{x}^{\top}\Sigma^{-1}\mathbf{x}\right]. \]
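As a quick numerical sanity check (not part of the proof), we can code this density directly from the formula and verify that, for a diagonal \(\Sigma\), it factors into a product of one-dimensional normal densities. The function name `mvn_density` is just for illustration; only NumPy is assumed.

```python
import numpy as np

def mvn_density(x, sigma):
    """Zero-mean multivariate normal density, computed directly from
    f(x) = exp(-x^T Sigma^{-1} x / 2) / sqrt((2 pi)^n |Sigma|)."""
    n = len(x)
    quad = x @ np.linalg.solve(sigma, x)  # x^T Sigma^{-1} x
    norm = np.sqrt((2 * np.pi) ** n * np.linalg.det(sigma))
    return np.exp(-0.5 * quad) / norm

# For diagonal Sigma the coordinates are independent, so the joint
# density must equal the product of 1D normal densities.
x = np.array([0.3, -1.2])
sigma = np.diag([2.0, 0.5])
expected = np.prod(np.exp(-0.5 * x**2 / np.diag(sigma))
                   / np.sqrt(2 * np.pi * np.diag(sigma)))
print(np.isclose(mvn_density(x, sigma), expected))  # True
```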
We need the following theorems from linear algebra.
Theorem 1
Let \(A\) be an \(n\times n\) regular matrix, \(D\) an \(m\times m\) regular matrix, \(B\) an \(n\times m\) matrix, and \(C\) an \(m\times n\) matrix, and suppose \(S = D - CA^{-1}B\) is regular. Then,
\[ \begin{bmatrix} A & B \\ C & D \end{bmatrix}^{-1} = \begin{bmatrix} A^{-1} + A^{-1}BS^{-1}CA^{-1} & -A^{-1}BS^{-1}\\ -S^{-1}CA^{-1} & S^{-1} \end{bmatrix} \tag{eq:1} \]
Remark. Perhaps it is easier to memorize when written in the following way:
\[ \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix}^{-1} = \begin{bmatrix} A_{11}^{-1} + A_{11}^{-1}A_{12}S_{22}^{-1}A_{21}A_{11}^{-1} & -A_{11}^{-1}A_{12}S_{22}^{-1}\\ -S_{22}^{-1}A_{21}A_{11}^{-1} & S_{22}^{-1} \end{bmatrix} \]
with \(S_{22} = A_{22} - A_{21}A_{11}^{-1}A_{12}\). □
Proof. Let's define the following matrices:
\[ \begin{align*} F &= \begin{bmatrix} A & B\\ C & D \end{bmatrix},\\ W &= \begin{bmatrix} I_n & O_{n,m}\\ -CA^{-1} & I_m \end{bmatrix},\\ U &= \begin{bmatrix} I_n & -BS^{-1}\\ O_{m,n} & I_m \end{bmatrix}. \end{align*} \]
Here, \(I_n\) is the \(n\times n\) identity matrix and \(O_{n,m}\) is the \(n\times m\) zero matrix. Then,
\[ \begin{align*} G & = WF = \begin{bmatrix} A & B\\ O_{m,n} & S \end{bmatrix},\\ H &= UG = UWF = \begin{bmatrix} A & O_{n,m}\\ O_{m,n} & S \end{bmatrix}. \end{align*} \]
Therefore,
\[ H^{-1} = F^{-1}W^{-1}U^{-1}. \]
Hence,
\[F^{-1} = H^{-1}UW.\]
But
\[ H^{-1}= \begin{bmatrix} A^{-1} & O_{n,m}\\ O_{m,n} & S^{-1} \end{bmatrix}. \]
Thus, we obtain (eq:1). ■
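A numerical sanity check of Theorem 1 (assuming NumPy; the diagonal shifts below are only there to make the random blocks safely invertible): build the block inverse from the right-hand side of (eq:1) and compare it against a direct inversion of the full matrix.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 3, 2
A = rng.normal(size=(n, n)) + n * np.eye(n)  # shift makes A invertible
B = rng.normal(size=(n, m))
C = rng.normal(size=(m, n))
D = rng.normal(size=(m, m)) + m * np.eye(m)

Ai = np.linalg.inv(A)
S = D - C @ Ai @ B            # Schur complement of A
Si = np.linalg.inv(S)

# Right-hand side of (eq:1), assembled block by block.
block_inv = np.block([[Ai + Ai @ B @ Si @ C @ Ai, -Ai @ B @ Si],
                      [-Si @ C @ Ai,              Si]])
F = np.block([[A, B], [C, D]])
print(np.allclose(block_inv, np.linalg.inv(F)))  # True
```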
Theorem 2
With the same setting as Theorem 1, we have the determinant identity
\[ \begin{align} \begin{vmatrix} A & B\\ C & D \end{vmatrix} &= |A|\cdot |D - CA^{-1}B|\end{align} \]
Proof. We use the same notation as in the proof of Theorem 1.
Noting \(G = WF\) and \(|W| = 1\), we have \(|F| = |G|\). But \(G\) is block triangular, so
\[ |G| = |A|\cdot|S| = |A|\cdot|D - CA^{-1}B|. \]
■
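The determinant identity can be checked numerically in the same way (again assuming NumPy, with a random test matrix whose blocks are shifted to keep \(A\) invertible):

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 3, 2
A = rng.normal(size=(n, n)) + n * np.eye(n)  # shift makes A invertible
B = rng.normal(size=(n, m))
C = rng.normal(size=(m, n))
D = rng.normal(size=(m, m))

F = np.block([[A, B], [C, D]])
S = D - C @ np.linalg.solve(A, B)  # Schur complement D - C A^{-1} B

# |F| should equal |A| * |S| by Theorem 2.
print(np.isclose(np.linalg.det(F), np.linalg.det(A) * np.linalg.det(S)))  # True
```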
Now, let's go back to our problem. First, split the variable \(\mathbf{x}\) into two blocks:
\[ \mathbf{x} = \begin{bmatrix} \mathbf{x}_1\\ \mathbf{x}_2 \end{bmatrix} \]
where \(\mathbf{x}_1\in \mathbb{R}^p\) and \(\mathbf{x}_2 \in \mathbb{R}^q\) with \(p + q = n\). We consider integrating \(\mathbf{x}_2\) out to get the marginal distribution
\[ f(\mathbf{x}_1) = \int_{\mathbb{R}^q} f(\mathbf{x}_1, \mathbf{x}_2)\,d\mathbf{x}_2. \]
We also split the covariance matrix into blocks as
\[ \Sigma = \begin{bmatrix} \Sigma_{11} & \Sigma_{12}\\ \Sigma_{21} & \Sigma_{22} \end{bmatrix} \]
where \(\Sigma_{11}\) is \(p\times p\), \(\Sigma_{22}\) is \(q\times q\), and \(\Sigma_{12} = \Sigma_{21}^{\top}\) is \(p\times q\).
By Theorem 1, we have
\[\Sigma^{-1} = \begin{bmatrix} \Sigma_{11}^{-1} + \Sigma_{11}^{-1}\Sigma_{12}S^{-1}\Sigma_{12}^{\top}\Sigma_{11}^{-1} & - \Sigma_{11}^{-1}\Sigma_{12}S^{-1}\\ -S^{-1}\Sigma_{12}^{\top}\Sigma_{11}^{-1} & S^{-1} \end{bmatrix} \]
where \(S = \Sigma_{22} - \Sigma_{12}^{\top}\Sigma_{11}^{-1}\Sigma_{12}\). Thus,
\[ \begin{align*} \mathbf{x}^{\top}\Sigma^{-1}\mathbf{x} &= \mathbf{x}_1^\top(\Sigma_{11}^{-1} + \Sigma_{11}^{-1}\Sigma_{12}S^{-1}\Sigma_{12}^{\top}\Sigma_{11}^{-1})\mathbf{x}_{1} -2\mathbf{x}_1^{\top}\Sigma_{11}^{-1}\Sigma_{12}S^{-1}\mathbf{x}_2 + \mathbf{x}_2^{\top}S^{-1}\mathbf{x}_2\\ &= \mathbf{x}_1^\top\Sigma_{11}^{-1}\mathbf{x}_1\\ & ~~ + \mathbf{x}_1^\top\Sigma_{11}^{-1}\Sigma_{12}S^{-1}\Sigma_{12}^{\top}\Sigma_{11}^{-1}\mathbf{x}_1 - \mathbf{x}_1^{\top}\Sigma_{11}^{-1}\Sigma_{12}S^{-1}\mathbf{x}_2\\ & ~~ -\mathbf{x}_2^{\top}S^{-1}\Sigma_{12}^{\top}\Sigma_{11}^{-1}\mathbf{x}_1 + \mathbf{x}_2^{\top}S^{-1}\mathbf{x}_2\\ &= \mathbf{x}_1^\top\Sigma_{11}^{-1}\mathbf{x}_1\\ & ~~ -\mathbf{x}_1^\top\Sigma_{11}^{-1}\Sigma_{12}S^{-1}(\mathbf{x}_2 - \Sigma_{12}^{\top}\Sigma_{11}^{-1}\mathbf{x}_1)\\ & ~~ + \mathbf{x}_2^{\top}S^{-1}(\mathbf{x}_2 - \Sigma_{12}^{\top}\Sigma_{11}^{-1}\mathbf{x}_1)\\ &=\mathbf{x}_1^\top\Sigma_{11}^{-1}\mathbf{x}_1 +(\mathbf{x}_2 - \Sigma_{12}^{\top}\Sigma_{11}^{-1}\mathbf{x}_1)^{\top}S^{-1}(\mathbf{x}_2 - \Sigma_{12}^{\top}\Sigma_{11}^{-1}\mathbf{x}_1) \end{align*} \]
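This completing-the-square identity is the heart of the argument, so it is worth checking numerically: with a random positive-definite \(\Sigma\) and a random \(\mathbf{x}\), the quadratic form \(\mathbf{x}^{\top}\Sigma^{-1}\mathbf{x}\) should equal the sum of the two terms in the last line above (a sketch assuming NumPy):

```python
import numpy as np

rng = np.random.default_rng(2)
p, q = 2, 3
n = p + q
M = rng.normal(size=(n, n))
Sigma = M @ M.T + n * np.eye(n)  # symmetric positive definite

S11 = Sigma[:p, :p]
S12 = Sigma[:p, p:]
S22 = Sigma[p:, p:]
S = S22 - S12.T @ np.linalg.solve(S11, S12)  # Schur complement

x = rng.normal(size=n)
x1, x2 = x[:p], x[p:]

lhs = x @ np.linalg.solve(Sigma, x)              # x^T Sigma^{-1} x
y = x2 - S12.T @ np.linalg.solve(S11, x1)        # x2 - Sigma_12^T Sigma_11^{-1} x1
rhs = x1 @ np.linalg.solve(S11, x1) + y @ np.linalg.solve(S, y)
print(np.isclose(lhs, rhs))  # True
```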
By Theorem 2,
\[|\Sigma|=|\Sigma_{11}|\cdot|S|\]
Thus,
\[ \begin{align} \frac{1}{\sqrt{(2\pi)^n|\Sigma|}}\int_{\mathbb{R}^q}\exp\left(-\frac{1}{2}\mathbf{x}^{\top}\Sigma^{-1}\mathbf{x}\right)\,d\mathbf{x}_2 &= \frac{\sqrt{(2\pi)^q|S|}}{\sqrt{(2\pi)^n|\Sigma|}}\exp\left(-\frac{1}{2}\mathbf{x}_1^\top\Sigma_{11}^{-1}\mathbf{x}_1\right)\\ &= \frac{1}{\sqrt{(2\pi)^p|\Sigma_{11}|}} \exp\left(-\frac{1}{2}\mathbf{x}_1^\top\Sigma_{11}^{-1}\mathbf{x}_1\right). \end{align} \]
This is the density of a multivariate normal distribution of \(\mathbf{x}_1\) whose covariance matrix \(\Sigma_{11}\) is obtained from \(\Sigma\) by deleting the rows and columns corresponding to \(\mathbf{x}_2\). Therefore, the marginal distributions of a multivariate normal distribution are again normal distributions.
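Finally, the conclusion itself can be checked by simulation (a rough Monte Carlo sketch assuming NumPy; the sample size and tolerance are arbitrary): draw samples from \(N(\mathbf{0}, \Sigma)\), keep only the first \(p\) coordinates, and confirm that their empirical covariance is close to \(\Sigma_{11}\).

```python
import numpy as np

rng = np.random.default_rng(3)
p, q = 2, 2
n = p + q
M = rng.normal(size=(n, n))
Sigma = M @ M.T + n * np.eye(n)  # symmetric positive definite

# Sample from N(0, Sigma) and discard the x2 coordinates.
samples = rng.multivariate_normal(np.zeros(n), Sigma, size=200_000)
emp_cov = np.cov(samples[:, :p], rowvar=False)  # empirical covariance of x1

# The marginal covariance should match the top-left block Sigma_11.
print(np.allclose(emp_cov, Sigma[:p, :p], atol=0.2))  # True
```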