1  The Multivariate Normal Linear Model

1.1 Setting

Definition 1.1 (Multivariate Linear Model) Let \(Y:n \times p\) be a random matrix of \(n\) uncorrelated \(p\)-variate observations with common covariance \(\Sigma\). The Multivariate Linear Model (MLM) is expressed as follows:

\[E[Y] = XB\]

\[\text{Cov}(Y) = I_n \otimes \Sigma\]

where \(X \in \mathbb{R}^{n \times q}\) is the design matrix and \(B \in \mathbb{R}^{q \times p}\) is the coefficient matrix.

1.2 Joint PDF of a Multivariate Normal Sample

Definition 1.2  

  • Let \(X_1, X_2, \dots, X_n \overset{iid}{\sim} N_p(\mu, \Sigma)\), where \(\Sigma > 0\) means that \(\Sigma\) is positive definite.
  • Let \(\bar{X} = \frac{1}{n} \sum_i X_i, \quad S = \sum_i (X_i - \bar{X})(X_i - \bar{X})', \quad T = \sum_i X_i X_i'\)
  • The joint pdf of \(X_1, X_2, \dots, X_n\), with \(\bar{x}\), \(s\), \(t\) denoting the observed values of \(\bar{X}\), \(S\), \(T\), is:

\[\begin{aligned} f(x_1, x_2, \dots, x_n) &= \prod_{i=1}^n (2\pi)^{-p/2} |\Sigma|^{-1/2} \exp \left\{ -\frac{1}{2} (x_i - \mu)' \Sigma^{-1} (x_i - \mu) \right\} \\ &= (2\pi)^{-np/2} |\Sigma|^{-n/2} \exp \left\{ -\frac{n}{2} (\bar{x} - \mu)' \Sigma^{-1} (\bar{x} - \mu) - \frac{1}{2} \text{tr}(\Sigma^{-1}s) \right\} \\ &= (2\pi)^{-np/2} |\Sigma|^{-n/2} \exp \left\{ -\frac{n}{2} \mu' \Sigma^{-1} \mu + n\bar{x}' \Sigma^{-1} \mu - \frac{1}{2} \text{tr}(\Sigma^{-1}t) \right\} \end{aligned}\]

  • \((\bar{X}, S)\) and \((\bar{X}, T)\) are equivalent representations of the minimal sufficient statistic for \((\mu, \Sigma)\).
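The three forms of the joint density can be compared numerically. The following is a minimal sketch in Python with NumPy; the sample size, dimension, parameter values and variable names are illustrative assumptions, not part of the notes.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 6, 3                                  # illustrative sizes (assumed)
mu = np.array([1.0, -0.5, 2.0])              # assumed mean vector
A = rng.normal(size=(p, p))
Sigma = A @ A.T + p * np.eye(p)              # an arbitrary positive definite covariance
x = rng.multivariate_normal(mu, Sigma, size=n)   # rows are x_1, ..., x_n

Sinv = np.linalg.inv(Sigma)
_, logdet = np.linalg.slogdet(Sigma)
const = -0.5 * n * p * np.log(2 * np.pi) - 0.5 * n * logdet

xbar = x.mean(axis=0)
s = (x - xbar).T @ (x - xbar)                # centered sums of squares and products
t = x.T @ x                                  # raw sums of squares and products

# Form 1: sum of the n individual log-densities
d = x - mu
form1 = const - 0.5 * np.einsum("ij,jk,ik->", d, Sinv, d)
# Form 2: written through (xbar, s)
form2 = (const - 0.5 * n * (xbar - mu) @ Sinv @ (xbar - mu)
         - 0.5 * np.trace(Sinv @ s))
# Form 3: written through (xbar, t)
form3 = (const - 0.5 * n * mu @ Sinv @ mu + n * xbar @ Sinv @ mu
         - 0.5 * np.trace(Sinv @ t))

print(form1, form2, form3)                   # the three log-densities should agree
```

Because the data enter forms 2 and 3 only through \((\bar{x}, s)\) or \((\bar{x}, t)\), agreement of the three values illustrates the sufficiency statement.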

1.3 Chi-square Distribution

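Lemma 1.1 Let \(Z \sim N_n(0, I_n)\). Then \[X_n^2 = Z'Z \sim \text{Gamma}(n/2, 1/2) \equiv \chi_n^2,\] where the Gamma distribution is parametrized by shape \(n/2\) and rate \(1/2\). That is, \(X_n^2\) follows a Chi-square distribution with \(n\) degrees of freedom, denoted \(X_n^2 \sim \chi^2(n)\).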

Corollary 1.1 If \(Y \sim N_n(\mu, \Sigma)\) with \(\Sigma > 0\), then \[(Y - \mu)' \Sigma^{-1} (Y - \mu) \sim \chi^2(n)\]

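Without rescaling, the quadratic form is in general no longer Chi-square. Write \(Y - \mu = \Sigma^{1/2} Z\) with \(Z \sim N_n(0, I_n)\), and let \(\Sigma = \Gamma D_\lambda \Gamma'\) be the spectral decomposition of \(\Sigma\), with \(\Gamma\) orthogonal and \(D_\lambda = \text{diag}(\lambda_1, \dots, \lambda_n)\). Then:

\[(Y - \mu)'(Y - \mu) = Z'\Sigma Z = Z' \Gamma D_\lambda \Gamma' Z = (\Gamma' Z)' D_\lambda (\Gamma' Z) = \sum_{i=1}^n \lambda_i U_i^2\]

where \(U = \Gamma' Z \sim N_n(0, I_n)\), so the quadratic form is a weighted sum of independent \(\chi^2(1)\) variables, with weights given by the eigenvalues of \(\Sigma\).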
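The chain of equalities above is an algebraic identity and can be checked for a single draw. This is a sketch under stated assumptions: the covariance is an arbitrary positive definite matrix, \(Y - \mu\) is generated as \(\Sigma^{1/2} Z\) with the symmetric square root, and all variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4                                        # dimension of Y (assumed)
A = rng.normal(size=(n, n))
Sigma = A @ A.T + np.eye(n)                  # assumed positive definite covariance

lam, Gamma = np.linalg.eigh(Sigma)           # Sigma = Gamma diag(lam) Gamma'
Sigma_half = Gamma @ np.diag(np.sqrt(lam)) @ Gamma.T   # symmetric square root

z = rng.normal(size=n)                       # one draw of Z ~ N_n(0, I_n)
y_centered = Sigma_half @ z                  # Y - mu = Sigma^{1/2} Z
u = Gamma.T @ z                              # U = Gamma' Z

unscaled = y_centered @ y_centered           # (Y - mu)'(Y - mu)
weighted = np.sum(lam * u**2)                # sum_i lambda_i U_i^2
rescaled = y_centered @ np.linalg.solve(Sigma, y_centered)  # (Y - mu)' Sigma^{-1} (Y - mu)

print(unscaled, weighted)                    # should agree
print(rescaled, z @ z)                       # rescaled form reduces to Z'Z
```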

Lemma 1.2 (Non-central Chi-square Distribution) Let \(Z \sim N_n(\mu, I_n)\) with \(\mu \in \mathbb{R}^n\). Then \[X^2 = Z'Z \sim \chi^2(n, \delta), \quad \delta = \|\mu\|^2 = \mu'\mu,\] the non-central Chi-square distribution with \(n\) degrees of freedom and non-centrality parameter \(\delta\).

Corollary 1.2 (Moments of Non-central \(\chi^2\)) If \(X \sim \chi^2(n, \delta)\), then \[\begin{aligned} E(X) &= n + \delta \\ Var(X) &= 2n + 4\delta \end{aligned}\]
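A small Monte Carlo experiment can make these moments concrete. The sketch below uses only the Python standard library; the mean vector, the seed and the number of draws are arbitrary assumptions chosen for illustration.

```python
import random
import statistics

random.seed(0)
mu = [1.0, 2.0, 0.0]                         # assumed mean vector, so n = 3
n = len(mu)
delta = sum(m * m for m in mu)               # non-centrality parameter mu'mu = 5

def draw():
    """One draw of Z'Z with Z ~ N_n(mu, I_n)."""
    return sum(random.gauss(m, 1.0) ** 2 for m in mu)

draws = [draw() for _ in range(20000)]
sample_mean = statistics.fmean(draws)
sample_var = statistics.variance(draws)

print(sample_mean, n + delta)                # sample mean vs. n + delta
print(sample_var, 2 * n + 4 * delta)         # sample variance vs. 2n + 4 delta
```

The sample moments should be close to, but not exactly equal to, the theoretical values, since they are subject to simulation error.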

Lemma 1.3 (Poisson Mixture Representation) Let \(K \sim \text{Poisson}(\delta/2)\) and \(X \mid K = k \sim \chi^2(n + 2k)\). Then, marginally, \[X \sim \chi^2(n, \delta)\]
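The mixture representation suggests a two-stage sampler: draw \(K\), then draw a central Chi-square with \(n + 2K\) degrees of freedom. The sketch below compares such draws with NumPy's own non-central Chi-square sampler; the values of \(n\), \(\delta\) and the number of draws are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
n, delta, N = 3, 5.0, 200_000                # illustrative values (assumed)

k = rng.poisson(delta / 2, size=N)           # K ~ Poisson(delta / 2)
x_mix = rng.chisquare(n + 2 * k)             # X | K = k ~ chi^2(n + 2k)
x_direct = rng.noncentral_chisquare(n, delta, size=N)   # direct draws for comparison

print(x_mix.mean(), x_direct.mean(), n + delta)
print(x_mix.var(), x_direct.var(), 2 * n + 4 * delta)
print(np.median(x_mix), np.median(x_direct))
```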

1.4 Wishart Distribution

Definition 1.3 (Wishart Distribution) Let \(X_1, X_2, \dots, X_n \overset{iid}{\sim} N_p(0, \Sigma)\), or equivalently consider a matrix \(X : n \times p\) with independent \(N_p(0, \Sigma)\) rows:

\[S = X'X = \sum_{i} X_i X_i' \sim W_p(n, \Sigma)\]

  • \(S\) is said to follow a Wishart distribution with degrees of freedom \(n\) and scale matrix \(\Sigma\).
  • \(S : p \times p\) is a random, symmetric, positive semi-definite (psd) matrix with \(E(S) = n\Sigma\).
  • When \(p = 1, \Sigma = \sigma^2\), then \(W_1(n, \sigma^2) \overset{d}{=} \sigma^2 \chi^2(n)\).
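The definition translates directly into a simulation: generate an \(n \times p\) normal data matrix and form \(X'X\). The sketch below repeats this many times to compare the average of \(S\) with \(n\Sigma\); the scale matrix, sizes and names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)
n, p, reps = 10, 3, 20000                    # illustrative sizes (assumed)
Sigma = np.array([[2.0, 0.5, 0.0],
                  [0.5, 1.0, 0.3],
                  [0.0, 0.3, 1.5]])          # assumed positive definite scale matrix

# reps independent data matrices, each with n rows drawn from N_p(0, Sigma)
X = rng.multivariate_normal(np.zeros(p), Sigma, size=(reps, n))  # shape (reps, n, p)
S = np.einsum("rni,rnj->rij", X, X)          # S = X'X for every replicate

S_mean = S.mean(axis=0)
print(S_mean)                                # should be close to n * Sigma
print(n * Sigma)
min_eig = np.linalg.eigvalsh(S[0]).min()     # smallest eigenvalue of one draw of S
```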

Lemma 1.4 (Preservation under Linear Transformation) Let \(S \sim W_p(n, \Sigma)\). For a matrix \(A : q \times p\), \[ASA' \sim W_q(n, A\Sigma A')\]

Lemma 1.5 (Additivity of Wishart Distributions) Let \(S_1 \sim W_p(n_1, \Sigma)\) independent of \(S_2 \sim W_p(n_2, \Sigma)\), then \[S_1 + S_2 \sim W_p(n_1 + n_2, \Sigma)\]
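Both lemmas follow from the representation \(S = X'X\), and the underlying matrix identities can be checked directly. In this sketch, \(\Sigma = LL'\) for an arbitrary matrix \(L\); all sizes and names are assumptions.

```python
import numpy as np

rng = np.random.default_rng(4)
p, q, n1, n2 = 3, 2, 5, 7                    # illustrative sizes (assumed)
L = rng.normal(size=(p, p))                  # Sigma = L L' (assumed)
X1 = rng.normal(size=(n1, p)) @ L.T          # rows ~ N_p(0, L L')
X2 = rng.normal(size=(n2, p)) @ L.T
S1, S2 = X1.T @ X1, X2.T @ X2

# Lemma 1.4: A S A' is the cross-product matrix of the transformed rows A X_i,
# and A X_i ~ N_q(0, A Sigma A')
A = rng.normal(size=(q, p))
transformed = A @ S1 @ A.T
from_rows = (X1 @ A.T).T @ (X1 @ A.T)

# Lemma 1.5: S1 + S2 is the cross-product matrix of the stacked (n1 + n2) x p sample
X_stacked = np.vstack([X1, X2])
summed = S1 + S2
from_stack = X_stacked.T @ X_stacked
```

The code verifies only the algebra; the distributional claims then follow from Definition 1.3 applied to the transformed or stacked rows.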

1.4.1 Wishart Density and Singularity

Proposition 1.1 (Rank of a Random Matrix) Let \(X : n \times p\) be a random matrix whose distribution is absolutely continuous with respect to the Lebesgue measure on \(\mathbb{R}^{n \times p}\), i.e. it has a pdf. If \(n \geq p\), then: \[\text{Pr}[\text{rank}(X) = p] = 1, \quad \text{or equivalently} \quad \text{Pr}[S = X'X > 0] = 1\]

Lemma 1.6 (Non-singularity \(\equiv\) Positive Definiteness) Let \(S = X'X\), with \(X_1, X_2, \dots, X_n \overset{iid}{\sim} N_p(0, \Sigma)\). \(S > 0\) with probability \(1\) iff \(\Sigma > 0\) and \(n \geq p\).

Definition 1.4 (Non-Singular Wishart Density) Let \(W : p \times p\) be a symmetric random matrix such that \(\text{Pr}[W > 0] = 1\), and let \(\Sigma > 0\). If \(m \geq p\), then \(W\) follows a non-singular Wishart distribution \(W_p(m, \Sigma)\) with \(m\) degrees of freedom and scale matrix \(\Sigma\) if the joint density of the \(p(p+1)/2\) distinct elements of \(W\) is:

\[f(w_{11}, w_{12}, \dots, w_{pp}) = c |W|^{\frac{m-p-1}{2}} \exp \left\{ -\frac{1}{2} \text{tr}(\Sigma^{-1} W) \right\}\]

where the normalizing constant \(c\) is given by: \[c^{-1} = 2^{mp/2} |\Sigma|^{m/2} \Gamma_p \left( \frac{m}{2} \right)\]

and the multivariate gamma function is: \[\Gamma_p \left( \frac{m}{2} \right) = \pi^{\frac{p(p-1)}{4}} \prod_{j=1}^p \Gamma \{ (m + 1 - j)/2 \}\]
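The density can be evaluated on the log scale to avoid overflow in the gamma functions and determinants. The functions below are a hypothetical sketch, not a library API; `log_multigamma` and `wishart_logpdf` are names introduced here. As a sanity check, for \(p = 1\) the result should match the density of \(\sigma^2 \chi^2(m)\).

```python
import math
import numpy as np

def log_multigamma(a, p):
    """log Gamma_p(a) = p(p-1)/4 * log(pi) + sum_{j=1}^p log Gamma(a + (1 - j)/2)."""
    return (p * (p - 1) / 4 * math.log(math.pi)
            + sum(math.lgamma(a + (1 - j) / 2) for j in range(1, p + 1)))

def wishart_logpdf(W, m, Sigma):
    """Log of the non-singular Wishart density with m degrees of freedom."""
    W = np.atleast_2d(W)
    Sigma = np.atleast_2d(Sigma)
    p = W.shape[0]
    _, logdet_W = np.linalg.slogdet(W)
    _, logdet_Sigma = np.linalg.slogdet(Sigma)
    log_c = (-(m * p / 2) * math.log(2) - (m / 2) * logdet_Sigma
             - log_multigamma(m / 2, p))
    return (log_c + (m - p - 1) / 2 * logdet_W
            - 0.5 * np.trace(np.linalg.solve(Sigma, W)))

# p = 1 check: W_1(m, sigma2) is sigma2 * chi^2(m), a Gamma(m/2, scale = 2 * sigma2)
m, sigma2, w = 5, 2.0, 3.7                   # assumed test values
log_gamma_density = ((m / 2 - 1) * math.log(w) - w / (2 * sigma2)
                     - (m / 2) * math.log(2 * sigma2) - math.lgamma(m / 2))
print(wishart_logpdf(w, m, sigma2), log_gamma_density)   # should agree
```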

1.5 Kronecker Product

Definition 1.5 (Kronecker Product) The Kronecker product of a matrix \(A : p \times q\) and \(B : m \times n\) is the \((pm) \times (qn)\) matrix:

\[A \otimes B = \begin{pmatrix} a_{11}B & \cdots & a_{1q}B \\ \vdots & \ddots & \vdots \\ a_{p1}B & \cdots & a_{pq}B \end{pmatrix}\]

Remark 1.1 (The “Circle-Cross” Confusion). Sometimes you may see \(\widetilde{\otimes}\). For \(A : p \times q\) and \(B : m \times n\), this is used to represent: \[A \widetilde{\otimes} B = \begin{pmatrix} Ab_{11} & \cdots & Ab_{1n} \\ \vdots & \ddots & \vdots \\ Ab_{m1} & \cdots & Ab_{mn} \end{pmatrix}\] This equals \(B \otimes A\) and is in general NOT the same as the Kronecker product \(A \otimes B\).
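The block structure in Definition 1.5 is easy to reproduce by hand. The helper `kron` below is a hypothetical plain-Python sketch operating on lists of rows; the small matrices are arbitrary examples. It also shows that the order of the factors matters, and that the array of blocks \(A b_{ij}\) from Remark 1.1 coincides with \(B \otimes A\).

```python
def kron(A, B):
    """Kronecker product of two matrices given as lists of rows."""
    m, n = len(B), len(B[0])
    # Row (i, r) of the result is [a_i1 * B[r], a_i2 * B[r], ...]
    return [[a_ij * B[r][c] for a_ij in row_a for c in range(n)]
            for row_a in A for r in range(m)]

A = [[1, 2],
     [3, 4]]
B = [[0, 1],
     [1, 0]]

AB = kron(A, B)   # blocks a_ij * B
BA = kron(B, A)   # blocks b_ij * A, i.e. the layout in Remark 1.1

for row in AB:
    print(row)
# [0, 1, 0, 2]
# [1, 0, 2, 0]
# [0, 3, 0, 4]
# [3, 0, 4, 0]
```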

1.5.1 Key Property for Vectorization

Example 1.1 (Covariance of Vectorized Sample) If \(X_1, X_2, \dots, X_n \overset{iid}{\sim} N_p(0, \Sigma)\), then the \(np\)-vector \(\text{Vec}(X) = (X_1', X_2', \dots, X_n')'\), which stacks the \(n\) observations on top of one another, has covariance:

\[\text{Cov}(\text{Vec}(X)) = I_n \otimes \Sigma = \begin{pmatrix} \Sigma & & 0 \\ & \ddots & \\ 0 & & \Sigma \end{pmatrix}\]

1.6 Matrix Normal Distribution

Definition 1.6 (Matrix Normal Distribution) A random matrix \(X : n \times p\) follows the matrix Normal distribution with mean \(M : n \times p\), column-covariance \(\Sigma : p \times p\), and row-covariance \(\Omega : n \times n\), written \(X \sim MN(M, \Omega, \Sigma)\), iff:

\[\text{Vec}(X) = (X_1, X_2, \dots, X_n)' \sim N_{np}(\text{Vec}(M), \Omega \otimes \Sigma)\]

where \(X_i\) is the \(i\)-th row of \(X\), a \(1 \times p\) vector. For \(\Omega > 0\) and \(\Sigma > 0\), the joint pdf of \(X\) is: \[f(x) = \frac{\exp \left\{ -\frac{1}{2} \text{tr}[\Sigma^{-1}(x - M)' \Omega^{-1}(x - M)] \right\}}{(2\pi)^{np/2} |\Omega|^{p/2} |\Sigma|^{n/2}}\]
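The trace form of the density and the \(np\)-variate normal density of the row-stacked vector should give the same value. The sketch below checks this at an arbitrary point; the sizes, parameter matrices and names are assumptions. NumPy's default row-major `ravel` stacks the rows, which matches the \(\Omega \otimes \Sigma\) ordering.

```python
import numpy as np

rng = np.random.default_rng(5)
n, p = 3, 2                                  # illustrative sizes (assumed)
M = rng.normal(size=(n, p))
L_omega = rng.normal(size=(n, n))
Omega = L_omega @ L_omega.T + np.eye(n)      # assumed row covariance, positive definite
L_sigma = rng.normal(size=(p, p))
Sigma = L_sigma @ L_sigma.T + np.eye(p)      # assumed column covariance, positive definite
x = rng.normal(size=(n, p))                  # point at which the density is evaluated

# Matrix form: trace expression with |Omega|^{p/2} |Sigma|^{n/2}
R = x - M
quad_matrix = np.trace(np.linalg.solve(Sigma, R.T) @ np.linalg.solve(Omega, R))
logpdf_matrix = (-0.5 * n * p * np.log(2 * np.pi)
                 - 0.5 * p * np.linalg.slogdet(Omega)[1]
                 - 0.5 * n * np.linalg.slogdet(Sigma)[1]
                 - 0.5 * quad_matrix)

# Vector form: N_{np}(Vec(M), Omega kron Sigma) for the row-stacked vector
v = R.ravel()                                # stacks the rows of x - M
K = np.kron(Omega, Sigma)
logpdf_vec = (-0.5 * n * p * np.log(2 * np.pi)
              - 0.5 * np.linalg.slogdet(K)[1]
              - 0.5 * v @ np.linalg.solve(K, v))

print(logpdf_matrix, logpdf_vec)             # should agree
```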

Lemma 1.7 (Multivariate Normal Linear Model) If we assume \(Y_1, Y_2, \dots, Y_n\) are normally distributed, the multivariate normal linear model may be expressed as:

\[Y \sim MN(XB, I_n, \Sigma)\]

Covariance Structure: \[\text{Cov}(Y) = I_n \otimes \Sigma\]

We assume that the design matrix \(X \in \mathbb{R}^{n \times q}\) is of full column rank \(q < n\), so that \(X'X\) is non-singular and \(B\) is identifiable.

Lemma 1.8 (Existence and Form of MLE) The MLE \((\hat{B}, \hat{\Sigma})\), with \(\hat{\Sigma} > 0\), exists with probability 1 iff \(n - q \geq p\), and is given by:

\[\begin{aligned} \hat{B} &= (X'X)^{-1} X'Y \\ \hat{\Sigma} &= \frac{1}{n} Y' ( I_n - X(X'X)^{-1} X' ) Y = \frac{1}{n} Y' Q Y \end{aligned}\]

where \(Q = I_n - X(X'X)^{-1} X'\) is the projection matrix onto the orthogonal complement of the column space of \(X\).
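The closed-form estimators can be computed in a few lines. The following sketch simulates one data set from the model and evaluates \(\hat{B}\), \(Q\) and \(\hat{\Sigma}\); the design, the true \(B\) and \(\Sigma\), and all names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(6)
n, q, p = 50, 3, 2                           # illustrative sizes (assumed), n - q >= p
X = np.column_stack([np.ones(n), rng.normal(size=(n, q - 1))])  # design with intercept
B = np.array([[1.0, -1.0],
              [0.5, 2.0],
              [0.0, 1.0]])                   # assumed true coefficients (q x p)
Sigma = np.array([[1.0, 0.4],
                  [0.4, 2.0]])               # assumed true covariance (p x p)

E = rng.multivariate_normal(np.zeros(p), Sigma, size=n)   # rows ~ N_p(0, Sigma)
Y = X @ B + E

# B_hat = (X'X)^{-1} X'Y, computed by solving the normal equations
B_hat = np.linalg.solve(X.T @ X, X.T @ Y)
# Q = I_n - X (X'X)^{-1} X', the projection onto the orthogonal complement of col(X)
Q = np.eye(n) - X @ np.linalg.solve(X.T @ X, X.T)
Sigma_hat = Y.T @ Q @ Y / n

# Equivalent form through the residual matrix Y - X B_hat
resid = Y - X @ B_hat
Sigma_hat_resid = resid.T @ resid / n

print(B_hat)
print(Sigma_hat)
```

Solving the normal equations with `np.linalg.solve` avoids forming \((X'X)^{-1}\) explicitly, which is a common numerical preference rather than something the lemma requires.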

1.7 Joint Sampling Distribution of MLEs

Theorem 1.1 (Joint Sampling Distribution of \((\hat{B}, \hat{\Sigma})\)) Let \(Y \sim MN(XB, I_n, \Sigma)\), so that \(Y' \sim MN(B'X', \Sigma, I_n)\). Consider the joint transformation of the data by the design matrix \(X\) and the projection matrix \(Q = I_n - X(X'X)^{-1}X'\). Since \(X'Q = 0\) and \(Q'Q = Q\):

\[\begin{aligned} Y' \begin{bmatrix} X & Q \end{bmatrix} &\sim MN\left( B'X' \begin{bmatrix} X & Q \end{bmatrix}, \Sigma, \begin{bmatrix} X' \\ Q \end{bmatrix} \begin{bmatrix} X & Q \end{bmatrix} \right) \\ &\sim MN\left( \begin{bmatrix} B'X'X & 0 \end{bmatrix}, \Sigma, \begin{bmatrix} X'X & 0 \\ 0 & Q \end{bmatrix} \right) \end{aligned}\]

Remark 1.2 (Marginal Distributions).

  • \(Y'X \sim MN(B'X'X, \Sigma, X'X)\)
  • \(Y'Q \sim MN(0, \Sigma, Q)\)

Independence Property: \[Y'X \perp\!\!\!\perp Y'Q\]
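The independence rests on the block-diagonal form of \(\begin{bmatrix} X & Q \end{bmatrix}' \begin{bmatrix} X & Q \end{bmatrix}\), which can be verified numerically. A minimal sketch, assuming an arbitrary full-rank design; sizes and names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(9)
n, q = 7, 3                                  # illustrative sizes (assumed)
X = rng.normal(size=(n, q))                  # assumed full-rank design
Q = np.eye(n) - X @ np.linalg.solve(X.T @ X, X.T)

G = np.hstack([X, Q])                        # the n x (q + n) matrix [X  Q]
gram = G.T @ G                               # [X Q]'[X Q]
block_diag = np.block([[X.T @ X, np.zeros((q, n))],
                       [np.zeros((n, q)), Q]])

trace_Q = np.trace(Q)                        # equals rank(Q) for a projection
print(np.abs(gram - block_diag).max())       # should be numerically zero
print(trace_Q, n - q)
```

The zero off-diagonal blocks mean \(Y'X\) and \(Y'Q\) are uncorrelated, and joint normality turns this into independence.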

Theorem 1.2 (Sampling Distribution of \((\hat{B}, \hat{\Sigma})\)) In the Multivariate Normal Linear Model (MNLM), where \(Y \sim MN(XB, I_n, \Sigma)\) and \(Q = I_n - X(X'X)^{-1}X'\), the estimators have the following sampling properties:

  1. Distribution of \(\hat{B}\): \[\hat{B} = (X'X)^{-1}X'Y \sim MN(B, (X'X)^{-1}, \Sigma)\]

  2. Distribution of \(\hat{\Sigma}\): \[n\hat{\Sigma} = Y'QY \sim W_p(n - q, \Sigma)\]

  3. Independence: \[\hat{B} \perp\!\!\!\perp \hat{\Sigma}\]
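The three statements can be illustrated by simulation: repeat the experiment many times with a fixed design and compare the empirical moments of \(\hat{B}\) and \(n\hat{\Sigma}\) with the theoretical ones. This is a sketch under assumed values of \(X\), \(B\), \(\Sigma\) and sizes, and a near-zero correlation is only a necessary consequence of independence, not a proof of it.

```python
import numpy as np

rng = np.random.default_rng(7)
n, q, p, reps = 12, 3, 2, 20000              # illustrative sizes (assumed)
X = np.column_stack([np.ones(n), rng.normal(size=(n, q - 1))])
B = np.array([[1.0, -1.0],
              [0.5, 2.0],
              [0.0, 1.0]])                   # assumed true coefficients
Sigma = np.array([[1.0, 0.4],
                  [0.4, 2.0]])               # assumed true covariance

XtX_inv = np.linalg.inv(X.T @ X)
H = XtX_inv @ X.T                            # B_hat = H Y
Q = np.eye(n) - X @ H

E = rng.multivariate_normal(np.zeros(p), Sigma, size=(reps, n))  # (reps, n, p)
Y = X @ B + E                                # one data matrix per replicate
B_hats = H @ Y                               # (reps, q, p)
nSigma_hats = np.einsum("rni,nm,rmj->rij", Y, Q, Y)   # Y'QY per replicate

mean_B = B_hats.mean(axis=0)                 # compare with B
mean_nSigma = nSigma_hats.mean(axis=0)       # compare with (n - q) * Sigma
cov_vecB = np.cov(B_hats.reshape(reps, -1), rowvar=False)   # row-stacked Vec(B_hat)
cov_theory = np.kron(XtX_inv, Sigma)         # (X'X)^{-1} kron Sigma
cross_corr = np.corrcoef(B_hats[:, 0, 0], nSigma_hats[:, 0, 0])[0, 1]

print(mean_nSigma)
print((n - q) * Sigma)
print(cross_corr)                            # should be near 0
```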

1.8 Sampling Distribution of \(\hat{B}\)

Theorem 1.3 (Distribution and Properties of the Least Squares Estimator) In the Multivariate Normal Linear Model (MNLM), the Maximum Likelihood Estimator (MLE) for the coefficient matrix \(B\) is:

\[\hat{B} = (X'X)^{-1}X'Y \sim MN(B, (X'X)^{-1}, \Sigma)\]

where \(\hat{B}\) is a \(q \times p\) matrix.

Proof (Expected Value (Unbiasedness)). Using \(E[Y] = XB\): \[\begin{aligned} E[\hat{B}] &= (X'X)^{-1}X' E[Y] \\ &= (X'X)^{-1}X'XB = B \end{aligned}\]

1.8.1 Linear Transformations of Matrix Normal Variables

Based on the sampling distribution of \(\hat{B}\), we can derive properties for linear combinations of the coefficients. In this subsection, let \(B\) denote a generic random matrix with \(B \sim MN(M, \Omega, \Sigma)\) and dimensions \(B: q \times p\), \(M: q \times p\), \(\Omega: q \times q\), and \(\Sigma: p \times p\).

Lemma 1.9 (Left and Right Linear Transformations)  

  1. Left Transformation (Row-wise): For a constant matrix \(A: m \times q\), the product \(AB\) remains Matrix Normal: \[AB \sim MN(AM, A\Omega A', \Sigma)\] (Note: This is often used when testing linear hypotheses about treatment effects across different groups.)

  2. Right Transformation (Column-wise): For a constant matrix \(C: p \times m\), the product \(BC\) remains Matrix Normal: \[BC \sim MN(MC, \Omega, C'\Sigma C)\]
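Both covariance formulas can be checked through the vectorized form. For the row-stacked vector, \(\text{Vec}(AB) = (A \otimes I_p)\text{Vec}(B)\) and \(\text{Vec}(BC) = (I_q \otimes C')\text{Vec}(B)\), so the transformed covariances follow from \(\text{Cov}(\text{Vec}(B)) = \Omega \otimes \Sigma\). The sketch below verifies this numerically; the sizes and matrices are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(10)
q, p, m = 3, 2, 4                            # illustrative sizes (assumed)
L_omega = rng.normal(size=(q, q))
Omega = L_omega @ L_omega.T + np.eye(q)      # assumed row covariance
L_sigma = rng.normal(size=(p, p))
Sigma = L_sigma @ L_sigma.T + np.eye(p)      # assumed column covariance
A = rng.normal(size=(m, q))                  # left transformation
C = rng.normal(size=(p, m))                  # right transformation

cov_B = np.kron(Omega, Sigma)                # Cov of the row-stacked Vec(B)

T_left = np.kron(A, np.eye(p))               # Vec(AB) = (A kron I_p) Vec(B)
T_right = np.kron(np.eye(q), C.T)            # Vec(BC) = (I_q kron C') Vec(B)

cov_AB = T_left @ cov_B @ T_left.T
cov_BC = T_right @ cov_B @ T_right.T

print(np.allclose(cov_AB, np.kron(A @ Omega @ A.T, Sigma)))   # A Omega A' kron Sigma
print(np.allclose(cov_BC, np.kron(Omega, C.T @ Sigma @ C)))   # Omega kron C' Sigma C
```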