2  Inference for Multivariate Normal Linear Models

2.1 Quadratic Forms and Projections

Lemma 2.1 (Projection Matrix Quadratic Form) Let \(X \sim N_n(\mu, \sigma^2 I_n)\), and \(P\) be a projection matrix of \(\text{rank } m\). Then \[(X - \mu)' P (X - \mu) \sim \sigma^2 \chi^2(m)\]

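Proof. If \(P\) is a projection matrix, then there exists an orthogonal matrix \(\Gamma : n \times n\), s.t. \[P = \Gamma \begin{pmatrix} I_m & 0 \\ 0 & 0 \end{pmatrix} \Gamma'; \quad m \leq n.\] Let \(Z = \Gamma'(X - \mu) \sim N_n(0, \sigma^2 I_n)\). Then \[(X - \mu)' P (X - \mu) = Z' \begin{pmatrix} I_m & 0 \\ 0 & 0 \end{pmatrix} Z = \sum_{i=1}^m Z_i^2 \sim \sigma^2 \chi^2(m).\]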
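The decomposition used in the proof can be checked numerically. The following is a minimal sketch, assuming a projection built from an arbitrary full-rank matrix `Z` (a hypothetical name, not from the notes) and \(\sigma = 1\); it only illustrates that the quadratic form depends on the \(m\) rotated coordinates with unit eigenvalue.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 6, 2

# Build a rank-m orthogonal projection P = Z (Z'Z)^{-1} Z' from an arbitrary n x m matrix Z.
Z = rng.standard_normal((n, m))
P = Z @ np.linalg.solve(Z.T @ Z, Z.T)

# Spectral decomposition P = Gamma diag(eigvals) Gamma'; the eigenvalues are 0 or 1.
eigvals, Gamma = np.linalg.eigh(P)

# Quadratic form for one draw of X - mu (sigma = 1 here).
d = rng.standard_normal(n)
z = Gamma.T @ d                      # rotated coordinates
qf_direct = d @ P @ d
qf_rotated = np.sum(eigvals * z**2)  # only the m unit-eigenvalue coordinates contribute
```

Both `qf_direct` and `qf_rotated` evaluate the same quadratic form, and the trace of `P` equals its rank \(m\).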

Now consider the multivariate linear model \(\underbrace{Y}_{n \times p} = \underbrace{X}_{n \times q} \underbrace{B}_{q \times p} + \underbrace{U}_{n \times p}\), and assume \(\text{rank}(X) = q \leq n\) and \(U \sim MN(0, I_n, \Sigma)\).

We consider general linear hypotheses of the form: \(H_0 : \underbrace{C}_{g \times q} \underbrace{B}_{q \times p} \underbrace{M}_{p \times r} = \underbrace{D}_{g \times r}\) with

  • \(\text{rank}(C) = g \leq q\)

  • \(\text{rank}(M) = r \leq p\)

Example 2.1 (Global effect of a predictor)  

  • \(H_0 : B_{j,\cdot} = 0\)
    • Let \(C = e'_j = (0, \dots, 0, 1, 0, \dots, 0) \in \mathbb{R}^{1 \times q}\), with the 1 in position \(j\)
    • Note: \(e'_j B = B_{j,\cdot}\)
    • Extension to multiple predictors

Example 2.2 (Equality of effects across outcomes)  

  • \(H_0 : B_{j1} = B_{j2} = \cdots = B_{jp}\)
    • Let \(C = e'_j\) (variable selector)
    • Let \(M = \begin{pmatrix} I_{p-1} \\ -\mathbf{1}'_{p-1} \end{pmatrix}\)
    • This results in the system: \(\begin{bmatrix} B_{j1} - B_{jp} \\ B_{j2} - B_{jp} \\ \vdots \end{bmatrix} = 0\)
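As a small illustration of Examples 2.1 and 2.2, the sketch below builds \(C\) and \(M\) for a toy coefficient matrix. The sizes `q`, `p`, `j` and the entries of `B` are made up for the illustration.

```python
import numpy as np

q, p, j = 4, 3, 2          # hypothetical sizes; j is the (1-based) predictor of interest
B = np.arange(1.0, q * p + 1).reshape(q, p)   # toy coefficient matrix

# Example 2.1: C = e_j' selects row j of B.
C = np.zeros((1, q))
C[0, j - 1] = 1.0

# Example 2.2: M = [I_{p-1}; -1'] forms contrasts against the last outcome.
M = np.vstack([np.eye(p - 1), -np.ones((1, p - 1))])

row_j = C @ B              # equals B[j-1, :]
contrasts = C @ B @ M      # equals (B_j1 - B_jp, ..., B_j(p-1) - B_jp)
```

Here `contrasts` is zero exactly when all entries of row \(j\) of \(B\) are equal.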

Example 2.3 (MANOVA)  

  • \(H_0 : \mu_1 = \mu_2 = \cdots = \mu_k\)

Example 2.4 (Linear Combinations)  

  • \(H_0 : a'B = 0\) or \(H_0 : a'Bm = 0\)

2.2 The Likelihood Ratio Test

Now we move into a distributional setting. Let \(X_1, X_2, \dots, X_n\) be a random sample from a distribution \(f(x \mid \theta)\), where \(\theta \in \Theta \subseteq \mathbb{R}^k\). Consider testing hypotheses of the form

\[H_0 : \theta \in \Theta_0, \quad H_1 : \theta \in \Theta_1\]

Definition 2.1 (Likelihood Ratio Statistic) The likelihood ratio statistic for testing \(H_0\) against \(H_1\) is defined as \[\lambda(x) = \frac{\max_{\theta \in \Theta_0} f(x_1, \dots, x_n \mid \theta)}{\max_{\theta \in \Theta_1} f(x_1, \dots, x_n \mid \theta)}\]

Definition 2.2 (Likelihood Ratio Test (LRT)) The likelihood ratio test (LRT) of size \(\alpha\) has rejection region \(R = \{x : \lambda(x) < c\}\), where \(c\) is determined so that \(\sup_{\theta \in \Theta_0} \Pr_\theta(X \in R) = \alpha\).

If \(\Theta_0 \subset \mathbb{R}^q\), \(\Theta_1 \subset \mathbb{R}^k\), and \(\Theta_0 \subset \Theta_1\), Wilks (1938) showed that, for any \(\theta \in \Theta_0\),

\[-2 \log \lambda(x) \xrightarrow{L} \chi^2(v = k - q)\]

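Theorem 2.1 (Wilks’ Theorem (Intuition)) The relationship between the log-likelihood ratio and the asymptotic \(\chi^2\) distribution can be understood via a quadratic form. For a simple null \(H_0 : \theta = \theta_0\), a second-order Taylor expansion of the log-likelihood \(\ell\) around the MLE \(\hat{\theta}\) gives

\[-2 \log \lambda(x) = 2(\ell(\hat{\theta}) - \ell(\theta_0)) \approx (\hat{\theta} - \theta_0)' \mathcal{I}_n(\theta_0) (\hat{\theta} - \theta_0),\]

where \(\mathcal{I}_n(\theta_0)\) is the Fisher information of the sample. Since \(\hat{\theta}\) is approximately \(N_k(\theta_0, \mathcal{I}_n(\theta_0)^{-1})\) under \(H_0\), this leads to the asymptotic result:

\[-2 \log \lambda(x) \xrightarrow{L} \chi^2(k)\]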

Theorem 2.2 (Theorem) For any positive definite matrix \(A \in \mathbb{R}^{p \times p}\) (\(A > 0\)), the function \[f(\Sigma) = |\Sigma|^{-n/2} \exp \left\{ -\frac{1}{2} \text{tr} \, \Sigma^{-1} A \right\}\] is maximized over \(\Sigma > 0\) by \(\hat{\Sigma} = \frac{1}{n} A\), and \(f(\hat{\Sigma}) = |\frac{A}{n}|^{-n/2} e^{-np/2}\).
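A quick numerical sanity check of this maximizer can be sketched as follows. The matrix `A` is generated from synthetic data, and the helper `log_f` is a hypothetical name for \(\log f(\Sigma)\); the check compares \(\hat{\Sigma} = A/n\) against a handful of perturbed positive definite alternatives and is not a proof.

```python
import numpy as np

def log_f(Sigma, A, n):
    """log f(Sigma) = -(n/2) log|Sigma| - (1/2) tr(Sigma^{-1} A)."""
    _, logdet = np.linalg.slogdet(Sigma)
    return -0.5 * n * logdet - 0.5 * np.trace(np.linalg.solve(Sigma, A))

rng = np.random.default_rng(1)
n, p = 20, 3
G = rng.standard_normal((n, p))
A = G.T @ G                          # positive definite with probability one
Sigma_hat = A / n

best = log_f(Sigma_hat, A, n)
# Closed form from the theorem: log f(Sigma_hat) = -(n/2) log|A/n| - np/2.
closed_form = -0.5 * n * np.linalg.slogdet(Sigma_hat)[1] - 0.5 * n * p

# Evaluate log f at a few positive definite perturbations of Sigma_hat.
others = []
for _ in range(5):
    R = 0.5 * rng.standard_normal((p, p))
    Sigma_alt = Sigma_hat + R @ R.T  # adding a PSD matrix keeps it positive definite
    others.append(log_f(Sigma_alt, A, n))
```

Every value in `others` should fall below `best`, and `best` should agree with `closed_form`.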

Definition 2.3 (Profile likelihood (MNLM)) For the multivariate normal linear model, the likelihood evaluated at \(\hat{B} = (X'X)^{-1}X'Y\) is given by:

\[L(\hat{B}, \Sigma) = (2\pi)^{-np/2} |\Sigma|^{-n/2} \exp \left\{ -\frac{1}{2} \text{tr} \, \Sigma^{-1} (Y' Q Y) \right\}, \quad Q = I_n - X(X'X)^{-1}X'\]

Applying Theorem 2.2 with \(A = Y'QY\) gives \(\hat{\Sigma} = \frac{1}{n} Y'QY\) and

\[L(\hat{B}, \hat{\Sigma}) = (2\pi)^{-np/2} |\hat{\Sigma}|^{-n/2} e^{-np/2}\]

For the general linear hypotheses, consider:

\[H_0 : CB = D, \quad H_1 : H_0^c.\]

The maximized likelihoods under the two hypotheses are given by:

  • \(\max_{\Theta_0} f(y \mid B, \Sigma) = (2\pi)^{-np/2} |\hat{\Sigma}_0|^{-n/2} \exp\{-np/2\}\)
  • \(\max_{\Theta_1} f(y \mid B, \Sigma) = (2\pi)^{-np/2} |\hat{\Sigma}|^{-n/2} \exp\{-np/2\}\)

The LR statistic is \(\lambda(y) = \left\{ \frac{|\hat{\Sigma}_0|}{|\hat{\Sigma}|} \right\}^{-n/2} = \left[ \frac{|\hat{\Sigma}|}{|\hat{\Sigma}_0|} \right]^{n/2}\)

This can be rewritten into a more convenient form:

\[\lambda^{2/n} = \frac{|\hat{\Sigma}|}{|\hat{\Sigma}_0|} = \frac{|(Y - X\hat{B})'(Y - X\hat{B})|}{|(Y - X\hat{B}_0)'(Y - X\hat{B}_0)|} = \frac{|E|}{|E_0|}\]

We can estimate \(\hat{B}_0\) as

\[\begin{aligned} \hat{B}_0 &= \arg \min_{CB=D} \text{tr} \{(Y - XB)'(Y - XB)\} \\ &= \arg \min_{B, \Lambda} \text{tr} \{(Y - XB)'(Y - XB) + 2\Lambda(CB - D)\} \end{aligned}\]

This gives us two normal equations:

  1. \[X'X\hat{B}_0 - X'Y + C'\Lambda' = 0\]

  2. \[ C\hat{B}_0 = D\]

When these sets of equations are solved, the solution becomes \(\hat{B}_0 = \hat{B} - (X'X)^{-1} C' [C(X'X)^{-1} C']^{-1} (C\hat{B} - D)\)
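The closed-form solution can be checked against both normal equations. This is a minimal sketch on synthetic data, with a made-up constraint matrix `C` and `D = 0`; `Lambda_t` stands for \(\Lambda'\) and is recovered from the first normal equation.

```python
import numpy as np

rng = np.random.default_rng(2)
n, q, p, g = 30, 4, 2, 2
X = rng.standard_normal((n, q))
Y = rng.standard_normal((n, p))
C = np.array([[1.0, -1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0, -1.0]])     # hypothetical g x q constraint matrix
D = np.zeros((g, p))

XtX_inv = np.linalg.inv(X.T @ X)
B_hat = XtX_inv @ X.T @ Y                 # unrestricted estimator
K = C @ XtX_inv @ C.T                     # g x g, invertible since rank(C) = g

# Restricted estimator: B_hat minus a correction that enforces C B = D.
B0_hat = B_hat - XtX_inv @ C.T @ np.linalg.solve(K, C @ B_hat - D)

# Lagrange multiplier (transposed): Lambda' = K^{-1} (C B_hat - D).
Lambda_t = np.linalg.solve(K, C @ B_hat - D)
```

With these definitions, `C @ B0_hat` reproduces `D` and `X.T @ X @ B0_hat - X.T @ Y + C.T @ Lambda_t` vanishes up to rounding.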

2.3 Wilks’ U or Lambda Test

Lemma 2.2 (Under \(H_0 : CB = D\), the following holds)  

  1. \(E \sim W_p(n - q, \Sigma)\)
  2. \(H = E_0 - E \sim W_p(g, \Sigma)\)
  3. \(E\) indep of \(H\)

Consider \(\lambda^{2/n} = \frac{|E|}{|E_0|} = \frac{|E|}{|E + H|} = |I + E^{-1}H|^{-1}\)

Under \(H_0 : CB = D\), we have \(\Lambda = \frac{|E|}{|E + H|} \overset{H_0}{\sim} U(p, n - q, g)\)

  • \(\Lambda\) takes values in \((0, 1]\)

  • If \(H_0\) is true, \(H\) is small relative to \(E\), so \(\Lambda\) is close to 1

  • We reject \(H_0\) if \(\Lambda\) is small

Proposition 2.1 \(H = (C\hat{B} - D)' [C(X'X)^{-1}C']^{-1} (C\hat{B} - D)\)

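Proof 2.1. Write \(Y - X\hat{B}_0 = (Y - X\hat{B}) + X(\hat{B} - \hat{B}_0)\). Since \(X'(Y - X\hat{B}) = 0\), the cross terms vanish and \[E_0 = E + (\hat{B} - \hat{B}_0)' X'X (\hat{B} - \hat{B}_0).\] Substituting \(\hat{B} - \hat{B}_0 = (X'X)^{-1} C' [C(X'X)^{-1} C']^{-1} (C\hat{B} - D)\) gives \[H = E_0 - E = (C\hat{B} - D)' [C(X'X)^{-1}C']^{-1} (C\hat{B} - D).\]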
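The identity in Proposition 2.1 can also be verified numerically. The sketch below uses synthetic data and a hypothetical single contrast `C`; it computes \(E_0 - E\) from the two fitted models and compares it with the closed-form expression.

```python
import numpy as np

rng = np.random.default_rng(3)
n, q, p = 25, 3, 2
X = rng.standard_normal((n, q))
Y = rng.standard_normal((n, p))
C = np.array([[1.0, 0.0, -1.0]])          # hypothetical 1 x q contrast
D = np.zeros((1, p))

XtX_inv = np.linalg.inv(X.T @ X)
B_hat = XtX_inv @ X.T @ Y
K = C @ XtX_inv @ C.T
gap = C @ B_hat - D                       # departure of the unrestricted fit from H0
B0_hat = B_hat - XtX_inv @ C.T @ np.linalg.solve(K, gap)

E = (Y - X @ B_hat).T @ (Y - X @ B_hat)       # unrestricted residual SSP matrix
E0 = (Y - X @ B0_hat).T @ (Y - X @ B0_hat)    # restricted residual SSP matrix
H_diff = E0 - E
H_formula = gap.T @ np.linalg.solve(K, gap)
```

The two matrices `H_diff` and `H_formula` agree up to floating-point error.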


Now let \(A \sim W_p(m, I)\) be independent of \(B \sim W_p(n, I)\), with \(m \geq p\) so that \(A\) is invertible with probability one. We say that

\[\Lambda = \frac{|A|}{|A + B|} = |I + A^{-1}B|^{-1} \sim \Lambda(p, m, n)\]

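Lemma 2.3 (Lemma) The \(\Lambda\) distribution is invariant under changes of the scale parameters of \(A\) and \(B\): if \(A \sim W_p(m, \Sigma)\) and \(B \sim W_p(n, \Sigma)\) share the same \(\Sigma\), then \(\Lambda\) still follows \(\Lambda(p, m, n)\).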

Theorem 2.3 (Lambda Product) \[\Lambda(p, m, n) \sim \prod_{i=1}^n u_i\]

with \(u_1, u_2, \dots, u_n\) mutually independent, and

\[u_i \sim Beta((m + i - p)/2, p/2)\]

Corollary 2.1 (Corollary:) If \(p = 1\), the lambda distribution reduces to an \(F\) ratio. With \(g = \text{rank}(C)\) hypothesis degrees of freedom,

\[\Lambda = \left[ 1 + \frac{g}{(n - q)} F_{g,(n - q)} \right]^{-1}\]
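For a single outcome the relation between \(\Lambda\) and the usual \(F\) ratio is purely algebraic, which the following sketch illustrates on synthetic data. Here `g` denotes the rank of the hypothetical constraint matrix `C` (the hypothesis degrees of freedom) and `n - q` the residual degrees of freedom.

```python
import numpy as np

rng = np.random.default_rng(4)
n, q, g = 40, 3, 2
X = np.column_stack([np.ones(n), rng.standard_normal((n, q - 1))])
y = rng.standard_normal((n, 1))               # single outcome, p = 1
C = np.array([[0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0]])               # tests that both slopes are zero

XtX_inv = np.linalg.inv(X.T @ X)
b_hat = XtX_inv @ X.T @ y
E = ((y - X @ b_hat).T @ (y - X @ b_hat)).item()            # residual sum of squares
Cb = C @ b_hat
H = (Cb.T @ np.linalg.solve(C @ XtX_inv @ C.T, Cb)).item()  # hypothesis sum of squares

wilks = E / (E + H)
F = (H / g) / (E / (n - q))                   # F ratio on (g, n - q) degrees of freedom
wilks_from_F = 1.0 / (1.0 + g / (n - q) * F)
```

The values `wilks` and `wilks_from_F` coincide, so rejecting for small \(\Lambda\) is the same as rejecting for large \(F\).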

Example 2.5 (MANOVA) Consider \(Y = XB + U\), with

\[X = \begin{pmatrix} \mathbf{1}_{n_1} & 0 & \cdots & 0 \\ 0 & \mathbf{1}_{n_2} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \mathbf{1}_{n_q} \end{pmatrix}, \text{ and } C = \begin{pmatrix} 1 & 0 & \cdots & 0 & -1 \\ 0 & 1 & \cdots & 0 & -1 \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & \cdots & 1 & -1 \end{pmatrix}\]

where \(\mathbf{1}_{n_k}\) denotes a column of ones for the \(n_k\) observations in group \(k\), \(X : n \times q\), and \(C : (q - 1) \times q\)

Testing equality of the \(q\) group mean vectors (each of dimension \(p\)) requires testing \(H_0 : CB = 0\)

This leads to the following quantities:

  • (MLE): \(\hat{B} = (X'X)^{-1}X'Y\)

  • (Unrestricted Errors) \(E = Y'QY\)

  • (Residual Error Differences) \(H = E_0 - E = (C\hat{B})'(C(X'X)^{-1}C')^{-1}(C\hat{B})\)

    • Note: \(D=0\), \(C\) is contrast matrix
  • (Wilks’ Test) \(\Lambda = \frac{|E|}{|E+H|} = |I + E^{-1}H|^{-1} \sim_{H_0} U(p, n - q, q - 1)\)

    • We reject for \(\Lambda\) close to 0
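The quantities in this example can be computed directly. The following sketch uses made-up group sizes and synthetic outcomes, so the numerical value of \(\Lambda\) carries no meaning; it only shows how the design matrix, the contrast matrix, \(E\), \(H\) and \(\Lambda\) fit together.

```python
import numpy as np

rng = np.random.default_rng(5)
sizes = [8, 10, 12]                  # hypothetical group sizes
q, p = len(sizes), 2
n = sum(sizes)

# One-way design: column k is the indicator of membership in group k.
X = np.zeros((n, q))
start = 0
for k, n_k in enumerate(sizes):
    X[start:start + n_k, k] = 1.0
    start += n_k
Y = rng.standard_normal((n, p))

# Contrast matrix C = [I_{q-1}, -1]: each group mean minus the last group mean.
C = np.hstack([np.eye(q - 1), -np.ones((q - 1, 1))])

XtX_inv = np.linalg.inv(X.T @ X)
B_hat = XtX_inv @ X.T @ Y            # rows are the group mean vectors
Q = np.eye(n) - X @ XtX_inv @ X.T
E = Y.T @ Q @ Y                      # within-group SSP matrix
CB = C @ B_hat
H = CB.T @ np.linalg.solve(C @ XtX_inv @ C.T, CB)   # between-group SSP matrix

wilks = np.linalg.det(E) / np.linalg.det(E + H)
```

In this one-way layout \(E + H\) equals the total SSP matrix about the grand mean, which is the residual matrix of the model with a single common mean vector.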

Example 2.6 To handle a general \(M\) in \(H_0 : CBM = D\), simply consider the transformation \[Y_0 = YM = XBM + UM = X\theta + \Omega\]

with \(\Omega \sim MN(0, I_n, M'\Sigma M)\) and

\[H_0 : CBM = D \quad \equiv \quad H_0 : C\theta = D\]

Example 2.7 (Assessing Parallelism in \(q\) Subpopulations) Are boys and girls experiencing similar timing in pubertal growth?

  • In the case of gender comparison (\(q=2\)): \[H_{0j} : \mu_{1j} - \mu_{1(j-1)} = \mu_{2j} - \mu_{2(j-1)}; \quad \text{for } j = 2, \dots, p\]

  • In general, test \(H_0 : CBM = 0\) with: \[\underbrace{C}_{(q-1) \times q} = \begin{bmatrix} 1 & -1 & 0 & \dots & 0 \\ 0 & 1 & -1 & \dots & 0 \\ \vdots & \vdots & \ddots & \ddots & \vdots \\ 0 & 0 & \dots & 1 & -1 \end{bmatrix}; \quad \underbrace{M}_{p \times (p-1)} = \begin{bmatrix} 1 & 0 & 0 & \dots & 0 \\ -1 & 1 & 0 & \dots & 0 \\ 0 & -1 & 1 & \dots & 0 \\ \vdots & \vdots & \vdots & & \vdots \\ 0 & 0 & 0 & \dots & 1 \\ 0 & 0 & 0 & \dots & -1 \end{bmatrix}\]

Consider the transformation of \(Y\) by the matrix \(M\): \[Y_0 = YM\] where \(Y\) is \(n \times p\) and \(M\) is \(p \times (p-1)\).

Under the model: \[Y_0 = X\theta + \Omega, \quad \theta = BM\]

We test the null hypothesis: \[H_0 : C\theta = 0\]

Let \(H = E_0 - E\) be the difference in residual error matrices computed from \(Y_0\). The resulting Wilks’ \(\Lambda\) statistic is: \[\Lambda = \frac{|E|}{|E + H|} \overset{H_0}{\sim} \Lambda(p-1, n-q, q-1),\] which for the gender comparison (\(q = 2\)) is \(\Lambda(p-1, n-2, 1)\).
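To see what the parallelism hypothesis encodes, the sketch below builds successive-difference matrices for \(C\) and \(M\) and applies them to two toy mean-profile matrices. The helper name `successive_difference` and the profile values are assumptions made for illustration only.

```python
import numpy as np

def successive_difference(size):
    """(size-1) x size matrix whose i-th row is e_i - e_{i+1}."""
    return np.eye(size - 1, size) - np.eye(size - 1, size, k=1)

q, p = 2, 4
C = successive_difference(q)          # (q-1) x q, compares adjacent groups
M = successive_difference(p).T        # p x (p-1), differences adjacent time points

# Toy mean profiles: the second group is the first shifted by a constant, so they are parallel.
profile = np.array([1.0, 2.5, 3.0, 4.5])
B_parallel = np.vstack([profile, profile + 2.0])
B_crossing = np.vstack([profile, profile[::-1]])

parallel_check = C @ B_parallel @ M   # all zeros under parallelism
crossing_check = C @ B_crossing @ M   # nonzero when the profiles are not parallel
```

A constant vertical shift between groups is removed by \(M\), so only differences in shape survive in \(CBM\).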

2.4 Union-Intersection Principle and Intersection-Union Principle

2.4.1 Union-Intersection Principle

Consider \(H_0 : \theta \in \Theta_0 = \bigcap_{\gamma \in \Gamma} \Theta_\gamma\), where \(\Gamma\) is arbitrary. Suppose that individual tests are available for \(H_{0\gamma}\), with

\[H_{0\gamma} : \theta \in \Theta_\gamma, \quad \text{vs.} \quad H_{1\gamma} : \theta \in \Theta_\gamma^c\]

Lemma 2.4 (Lemma: Rejection Region for UI Test) If the rejection region for the test of \(H_{0\gamma}\) is \(\{x : T_\gamma(x) \in R_\gamma\}\), then the rejection region for the UI test is: \[\bigcup_{\gamma \in \Gamma} \{x : T_\gamma(x) \in R_\gamma\}\]

Corollary 2.2 (Corollary: Rejection Region for UI Test) If the rejection region for \(H_{0\gamma}\) is of the form \(\{x : T_\gamma(x) > c\}\), then: \[\bigcup_{\gamma \in \Gamma} \{x : T_\gamma(x) > c\} = \left\{ x : \sup_{\gamma \in \Gamma} T_\gamma(x) > c \right\}\]

(Note: rejecting any one \(H_{0\gamma}\) rejects the overall \(H_0\))

2.4.2 Intersection-Union Principle

  • Consider \[H_0 : \theta \in \Theta_0 = \bigcup_{\gamma \in \Gamma} \Theta_\gamma, \quad \Gamma \text{ arbitrary}\]

  • Suppose that individual tests are available for \(H_{0\gamma}\), with \[H_{0\gamma} : \theta \in \Theta_\gamma, \quad \text{vs.} \quad H_{1\gamma} : \theta \in \Theta_\gamma^c\]

Lemma 2.5 (Lemma: Rejection Region for IU Test) If the rejection region for the test of \(H_{0\gamma}\) is \(\{x : T_\gamma(x) \in R_\gamma\}\), then the rejection region for the IU test is: \[\bigcap_{\gamma \in \Gamma} \{x : T_\gamma(x) \in R_\gamma\}\]

Example 2.8 (Union-Intersection) Let \(X_1, \dots, X_n\) iid \(N(\mu, \sigma^2)\), and consider testing: \(H_0 : \mu = \mu_0 \quad \text{vs.} \quad H_1 : \mu \neq \mu_0\)

We can write the null hypothesis as an intersection: \(H_0 : \mu \in \{ \mu : \mu \leq \mu_0 \} \cap \{ \mu : \mu \geq \mu_0 \}\)

  • The LRT of \(H_{0L} : \mu \leq \mu_0\) versus \(H_{1L} : \mu > \mu_0\) rejects \(H_{0L}\) if: \[\frac{\bar{X} - \mu_0}{S/\sqrt{n}} \geq t_L\]

  • The LRT of \(H_{0U} : \mu \geq \mu_0\) versus \(H_{1U} : \mu < \mu_0\) rejects \(H_{0U}\) if: \[\frac{\bar{X} - \mu_0}{S/\sqrt{n}} \leq t_U\]

  • The UI Test rejects \(H_0\) if: \[\frac{\bar{X} - \mu_0}{S/\sqrt{n}} \geq t_L \quad \text{or} \quad \frac{\bar{X} - \mu_0}{S/\sqrt{n}} \leq t_U\]
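The construction above can be sketched in a few lines. The function name `ui_t_test` and the sample values are hypothetical, and the critical values are passed in by the caller (here the two-sided 5% point of the \(t\) distribution with 5 degrees of freedom, 2.571) rather than computed.

```python
import numpy as np

def ui_t_test(x, mu0, t_L, t_U):
    """Union-intersection test of H0: mu = mu0 built from two one-sided t tests.

    t_L and t_U are critical values supplied by the caller (e.g. from t tables).
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    t_stat = (x.mean() - mu0) / (x.std(ddof=1) / np.sqrt(n))
    reject_lower = t_stat >= t_L      # rejects H_0L: mu <= mu0
    reject_upper = t_stat <= t_U      # rejects H_0U: mu >= mu0
    # The UI test rejects H0 as soon as either component test rejects.
    return float(t_stat), bool(reject_lower or reject_upper)

x = [5.1, 4.9, 5.6, 5.8, 6.0, 5.4]    # made-up sample
t_stat, reject = ui_t_test(x, mu0=5.0, t_L=2.571, t_U=-2.571)
```

With symmetric critical values \(t_U = -t_L\), the union of the two one-sided regions is the familiar two-sided rule \(|t| \geq t_L\).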

2.4.3 UI Test for General Linear Hypotheses (Roy’s Maximum Root)

We can consider testing: \[H_0 : CB = D \quad \equiv \quad \bigcap_{\ell} CB\ell = D\ell, \quad \forall \ell \in \mathbb{R}^p\]

Each \(H_{0\ell} : CB\ell = D\ell\) is a univariate hypothesis with F ratio (where \(g = \text{rank}(C)\)):

\[F_{\ell} = \frac{\ell' H\ell / g}{\ell' E\ell / (n - q)}\]

This leads to the rejection region:

\[\bigcup_{\ell} \{y : F_{\ell} > k\} = \{y : \sup_{\ell} F_{\ell} > k\} = \{y : \phi_{\max} > \phi_{\alpha}\}\]

where \(\phi_{\max}\) is the largest eigenvalue of \(HE^{-1}\).

Remark 2.1. The matrix \(E^{-1}H\), with \(H = E_0 - E\), gives the full information we need:

\[\Lambda = |I + E^{-1}H|^{-1} = \prod_{i=1}^{p} \frac{1}{1 + \lambda_i}\]

where \(\lambda_1, \dots, \lambda_p\) are eigenvalues of \(E^{-1}H\).

In terms of the eigenvalues \(\lambda_1, \dots, \lambda_p\) of \(E^{-1}H\), we have the following test statistics:

  • Wilks’ Lambda: \(\prod_{i} (1 + \lambda_i)^{-1}\)

  • Pillai’s Trace: \(\sum_{i} \frac{\lambda_i}{1 + \lambda_i}\)

  • Hotelling-Lawley Trace: \(\sum_{i} \lambda_i\)

  • Roy’s Largest Root: \(\phi = \max \lambda_i\)
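All four statistics are simple functions of the same eigenvalues, as the sketch below illustrates. The function name `manova_statistics` and the toy matrices `E` and `H` are assumptions, chosen so that \(E^{-1}H\) has eigenvalues 1 and 0.25.

```python
import numpy as np

def manova_statistics(E, H):
    """Wilks, Pillai, Hotelling-Lawley and Roy statistics from the eigenvalues of E^{-1} H."""
    eig = np.linalg.eigvals(np.linalg.solve(E, H)).real
    eig = np.clip(eig, 0.0, None)      # guard against tiny negative round-off
    return {
        "wilks": float(np.prod(1.0 / (1.0 + eig))),
        "pillai": float(np.sum(eig / (1.0 + eig))),
        "hotelling_lawley": float(np.sum(eig)),
        "roy": float(np.max(eig)),
    }

# Toy SSP matrices chosen so that E^{-1} H has eigenvalues 1 and 0.25.
E = np.diag([2.0, 4.0])
H = np.diag([2.0, 1.0])
stats = manova_statistics(E, H)
```

Wilks' Lambda is small when the eigenvalues are large, whereas the other three statistics are large in that case, so the rejection directions differ.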

We can construct simultaneous confidence regions for linear combinations \(AB\) using significance levels from the null \(H_0 : AB = 0\).

  • Null Hypothesis: \[H_0 : \bigcap_{a,b} a'ABb = 0\]

  • Acceptance Region: \[\bigcap_{a,b} \{y : F_{a,b} \leq k\} = \{y : \sup_{a,b} F_{a,b} \leq k\} = \{y : \phi_{\max} \leq \phi_{\alpha}\}\]

  • Test Statistic: \[F_{a,b} = \frac{(a'A\hat{B}b)^2}{(a'A(X'X)^{-1}A'a)(b'Eb)}\]

  • Confidence Level Derivation: \[(1 - \alpha) = \Pr\{\phi_{\max} \leq \phi_{\alpha}\} = \Pr \left[ |a'A(\hat{B} - B)b| \leq \{\phi_{\alpha} (a'A(X'X)^{-1}A'a)(b'Eb)\}^{1/2}, \ \forall a,b \right]\]

  • Resulting \((1 - \alpha)\) simultaneous confidence intervals for \(a'ABb\): \[a'A\hat{B}b \pm \{\phi_{\alpha} (a'A(X'X)^{-1}A'a)(b'Eb)\}^{1/2}\]