2 Inference for Multivariate Normal Linear Models
2.1 Quadratic Forms and Projections
Lemma 2.1 (Projection Matrix Quadratic Form) Let \(X \sim N_n(\mu, \sigma^2 I_n)\), and \(P\) be a projection matrix of \(\text{rank } m\). Then \[(X - \mu)' P (X - \mu) \sim \sigma^2 \chi^2(m)\]
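Proof. If \(P\) is a (symmetric) projection matrix of rank \(m\), then there exists an orthogonal matrix \(\Gamma : n \times n\) such that \[P = \Gamma \begin{pmatrix} I_m & 0 \\ 0 & 0 \end{pmatrix} \Gamma'; \quad m \leq n.\] Let \(Z = \Gamma'(X - \mu) \sim N_n(0, \sigma^2 I_n)\). Then \((X - \mu)' P (X - \mu) = \sum_{i=1}^{m} Z_i^2 \sim \sigma^2 \chi^2(m)\).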
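The decomposition used in the proof can be checked numerically. The following is a minimal sketch in Python with NumPy; the dimensions and the random matrix `A` that generates the projection are illustrative assumptions, not part of the notes.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 6, 2  # illustrative dimensions (assumed, not from the notes)

# Orthogonal projection onto the column space of a random n x m matrix A.
A = rng.standard_normal((n, m))
P = A @ np.linalg.solve(A.T @ A, A.T)

# P is symmetric and idempotent, so its eigenvalues are 0 or 1.
eigvals, Gamma = np.linalg.eigh(P)  # P = Gamma diag(eigvals) Gamma'
unit = eigvals > 0.5                # eigenvalues numerically equal to 1
rank_P = int(np.sum(unit))

# For z = x - mu, the quadratic form z'Pz equals the sum of squares of the
# m coordinates of Gamma'z that correspond to the unit eigenvalues.
z = rng.standard_normal(n)
w = Gamma.T @ z
quad_form = z @ P @ z
sum_squares = np.sum(w[unit] ** 2)
```

Here `quad_form` and `sum_squares` agree up to round-off, which is the identity that turns the quadratic form into a sum of \(m\) squared independent normals.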
Now consider the multivariate linear model \(\underbrace{Y}_{n \times p} = \underbrace{X}_{n \times q} \underbrace{B}_{q \times p} + \underbrace{U}_{n \times p}\), and assume \(\text{rank}(X) = q \leq n\) and \(U \sim MN(0, I_n, \Sigma)\).
We consider general linear hypotheses of the form: \(H_0 : \underbrace{C}_{g \times q} \underbrace{B}_{q \times p} \underbrace{M}_{p \times r} = \underbrace{D}_{g \times r}\) with
\(\text{rank}(C) = g \leq q\)
\(\text{rank}(M) = r \leq p\)
Example 2.1 (Global effect of a predictor)
- \(H_0 : B_{j,\cdot} = 0\)
- Let \(C = e'_j = (0, \dots, 0, 1, 0, \dots, 0) \in \mathbb{R}^{1 \times q}\), with the 1 in position \(j\)
- Note: \(e'_j B = B_{j,\cdot}\)
- Extension to multiple predictors: stack the corresponding rows \(e'_j\) to form \(C\)
Example 2.2 (Equality of effects across outcomes)
- \(H_0 : B_{j1} = B_{j2} = \cdots = B_{jp}\)
- Let \(C = e'_j\) (variable selector)
- Let \(M = \begin{pmatrix} I_{p-1} \\ -\mathbf{1}'_{p-1} \end{pmatrix}\)
- This results in the system: \(\begin{bmatrix} B_{j1} - B_{jp} \\ B_{j2} - B_{jp} \\ \vdots \end{bmatrix} = 0\)
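To make the roles of \(C\) and \(M\) in Example 2.2 concrete, here is a small sketch in Python with NumPy. The sizes, the 0-based row index `j`, and the coefficient matrix `B` are made-up values for illustration only.

```python
import numpy as np

q, p, j = 3, 4, 1  # illustrative sizes; j is a 0-based row index (assumed)
B = np.arange(q * p, dtype=float).reshape(q, p)  # made-up coefficients

# C = e_j' selects row j of B.
C = np.zeros((1, q))
C[0, j] = 1.0

# M = (I_{p-1}; -1') contrasts each outcome with the last one; M is p x (p-1).
M = np.vstack([np.eye(p - 1), -np.ones((1, p - 1))])

# 1 x (p-1) vector with entries B[j, k] - B[j, p-1].
CBM = C @ B @ M
```

The hypothesis \(H_0 : CBM = 0\) holds exactly when every entry of `CBM` is zero, that is, when row \(j\) of \(B\) is constant across outcomes.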
Example 2.3 (MANOVA)
- \(H_0 : \mu_1 = \mu_2 = \cdots = \mu_k\)
Example 2.4 (Linear Combinations)
- \(H_0 : a'B = 0\) or \(H_0 : a'Bm = 0\)
2.2 The Likelihood Ratio Test
Now we move into a distributional setting. Let \(X_1, X_2, \dots, X_n\) be a random sample from a distribution \(f(x \mid \theta)\), where \(\theta \in \Theta \subset \mathbb{R}^k\). Consider testing hypotheses of the form
\[H_0 : \theta \in \Theta_0, \quad H_1 : \theta \in \Theta_1\]
Definition 2.1 (Likelihood Ratio Statistic) The likelihood ratio statistic for testing \(H_0\) against \(H_1\) is defined as \[\lambda(x) = \frac{\max_{\theta \in \Theta_0} f(x_1, \dots, x_n \mid \theta)}{\max_{\theta \in \Theta_1} f(x_1, \dots, x_n \mid \theta)}\]
Definition 2.2 (Likelihood Ratio Test (LRT)) The likelihood ratio test (LRT) of size \(\alpha\) has rejection region \(R = \{x : \lambda(x) < c\}\), where \(c\) is determined so that \(\sup_{\theta \in \Theta_0} [\Pr(x \in R)] = \alpha\)
If \(\Theta_0 \subset \mathbb{R}^q\), \(\Theta_1 \subset \mathbb{R}^k\), and \(\Theta_0 \subset \Theta_1\), Wilks (1938) showed that, for any \(\theta \in \Theta_0\),
\[-2 \log \lambda(x) \xrightarrow{L} \chi^2(v = k - q)\]
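Theorem 2.1 (Wilks’ Theorem (Intuition)) The relationship between the log-likelihood ratio and the asymptotic \(\chi^2\) distribution can be understood via a quadratic form. For a simple null \(\theta = \theta_0\), a second-order Taylor expansion of the log-likelihood \(\ell\) around the MLE \(\hat{\theta}\) gives
\[2(\ell(\hat{\theta}) - \ell(\theta_0)) \approx (\hat{\theta} - \theta_0)' \mathcal{I}(\theta_0) (\hat{\theta} - \theta_0)\]
where \(\mathcal{I}(\theta_0)\) is the Fisher information of the sample, so that \(\hat{\theta}\) is approximately \(N_k(\theta_0, \mathcal{I}(\theta_0)^{-1})\). This leads to the asymptotic result:
\[2(\ell(\hat{\theta}) - \ell(\theta_0)) \xrightarrow{L} \chi^2(k)\]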
Theorem 2.2 (Theorem) For any positive definite matrix \(A \in \mathbb{R}^{p \times p}\) (\(A > 0\)), the function \[f(\Sigma) = |\Sigma|^{-n/2} \exp \left\{ -\frac{1}{2} \text{tr} \, \Sigma^{-1} A \right\}\] is maximized over \(\Sigma > 0\) by \(\hat{\Sigma} = \frac{1}{n} A\), and \(f(\hat{\Sigma}) = |\frac{A}{n}|^{-n/2} e^{-np/2}\).
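The maximization result can be sanity-checked numerically. The sketch below (Python with NumPy) works on the log scale; the matrix `A`, the sizes, and the way competitor matrices are generated are assumptions made for illustration.

```python
import numpy as np

def log_f(Sigma, A, n):
    """log f(Sigma) = -(n/2) log|Sigma| - (1/2) tr(Sigma^{-1} A)."""
    _, logdet = np.linalg.slogdet(Sigma)
    return -0.5 * n * logdet - 0.5 * np.trace(np.linalg.solve(Sigma, A))

rng = np.random.default_rng(1)
n, p = 20, 3
Z = rng.standard_normal((n, p))
A = Z.T @ Z              # positive definite with probability one
Sigma_hat = A / n        # claimed maximizer

best = log_f(Sigma_hat, A, n)
# Closed form from the theorem on the log scale: -(n/2) log|A/n| - np/2
closed_form = -0.5 * n * np.linalg.slogdet(Sigma_hat)[1] - 0.5 * n * p

# Random positive definite competitors should never do better.
competitors = []
for _ in range(200):
    G = rng.standard_normal((p, p))
    S = Sigma_hat + 0.1 * (G @ G.T)
    competitors.append(log_f(S, A, n))
```

A random search like this cannot prove the theorem, but `max(competitors) <= best` is what the theorem predicts.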
Definition 2.3 (Profile likelihood (MNLM)) For the multivariate normal linear model, the likelihood evaluated at the MLE \(\hat{B} = (X'X)^{-1}X'Y\) is given by:
\[L(\hat{B}, \Sigma) = (2\pi)^{-np/2} |\Sigma|^{-n/2} \exp \left\{ -\frac{1}{2} \text{tr} \, \Sigma^{-1} (Y' Q Y) \right\}\]
where \(Q = I_n - X(X'X)^{-1}X'\). Maximizing over \(\Sigma\) gives \(\hat{\Sigma} = \frac{1}{n} Y'QY\) and
\[L(\hat{B}, \hat{\Sigma}) = (2\pi)^{-np/2} |\hat{\Sigma}|^{-n/2} e^{-np/2}\]
For the general linear hypotheses, consider:
\[H_0 : CB = D, \quad H_1 : H_0^c.\]
The maximized likelihoods under the two hypotheses are given by:
- \(\max_{\Theta_0} f(y \mid B, \Sigma) = (2\pi)^{-np/2} |\hat{\Sigma}_0|^{-n/2} \exp\{-np/2\}\)
- \(\max_{\Theta_1} f(y \mid B, \Sigma) = (2\pi)^{-np/2} |\hat{\Sigma}|^{-n/2} \exp\{-np/2\}\)
The LR statistic is \(\lambda(y) = \left\{ \frac{|\hat{\Sigma}_0|}{|\hat{\Sigma}|} \right\}^{-n/2} = \left[ \frac{|\hat{\Sigma}|}{|\hat{\Sigma}_0|} \right]^{n/2}\)
This can be rewritten into a more convenient form:
\[\lambda^{2/n} = \frac{|\hat{\Sigma}|}{|\hat{\Sigma}_0|} = \frac{|(Y - X\hat{B})'(Y - X\hat{B})|}{|(Y - X\hat{B}_0)'(Y - X\hat{B}_0)|} = \frac{|E|}{|E_0|}\]
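The restricted estimator \(\hat{B}_0\) solves a constrained least-squares problem, where \(\Lambda : p \times g\) is a matrix of Lagrange multipliers (not to be confused with the \(\Lambda\) statistic of the next section):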
\[\begin{aligned} \hat{B}_0 &= \arg \min_{CB=D} \text{tr} \{(Y - XB)'(Y - XB)\} \\ &= \arg \min_{B, \Lambda} \text{tr} \{(Y - XB)'(Y - XB) + 2\Lambda(CB - D)\} \end{aligned}\]
This gives us two normal equations:
\[X'X\hat{B}_0 - X'Y + C'\Lambda' = 0\]
\[ C\hat{B}_0 = D\]
Solving these equations gives \(\hat{B}_0 = \hat{B} - (X'X)^{-1} C' [C(X'X)^{-1} C']^{-1} (C\hat{B} - D)\)
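The closed form for \(\hat{B}_0\) can be verified against the two normal equations. This is a minimal sketch in Python with NumPy on simulated data; the sizes, the random `X`, `Y`, `C`, and the choice \(D = 0\) are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
n, q, p, g = 30, 4, 3, 2       # illustrative sizes (assumed)
X = rng.standard_normal((n, q))
Y = rng.standard_normal((n, p))
C = rng.standard_normal((g, q))
D = np.zeros((g, p))

XtX_inv = np.linalg.inv(X.T @ X)
B_hat = XtX_inv @ X.T @ Y                     # unrestricted estimator
K = C @ XtX_inv @ C.T                         # g x g, invertible when rank(C) = g

# Multiplier Lambda' (g x p) and restricted estimator from the closed form.
Lambda_t = np.linalg.solve(K, C @ B_hat - D)
B0_hat = B_hat - XtX_inv @ C.T @ Lambda_t

# First normal equation: X'X B0 - X'Y + C' Lambda' should vanish.
stationarity = X.T @ X @ B0_hat - X.T @ Y + C.T @ Lambda_t
# Second normal equation: C B0 - D should vanish.
constraint = C @ B0_hat - D
```

Both `stationarity` and `constraint` are zero up to round-off, so the closed form satisfies the system it was derived from.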
2.3 Wilks’ U or Lambda Test
Lemma 2.2 (Under \(H_0 : CB = D\), the following holds)
- \(E \sim W_p(n - q, \Sigma)\)
- \(H = E_0 - E \sim W_p(g, \Sigma)\)
- \(E\) indep of \(H\)
Consider \(\lambda^{2/n} = \frac{|E|}{|E_0|} = \frac{|E|}{|E + H|} = |I + E^{-1}H|^{-1}\)
Under \(H_0 : CB = D\), we have \(\Lambda = \frac{|E|}{|E + H|} \overset{H_0}{\sim} U(p, n - q, g)\)
\(\Lambda\) takes values in \((0, 1]\).
If \(H_0\) is true, \(H\) is small relative to \(E\), so \(\Lambda\) is close to 1.
We therefore reject \(H_0\) if \(\Lambda\) is small.
Proposition 2.1 \(H = (C\hat{B} - D)' [C(X'X)^{-1}C']^{-1} (C\hat{B} - D)\)
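Proof 2.1. Since \(X'(Y - X\hat{B}) = 0\), writing \(Y - X\hat{B}_0 = (Y - X\hat{B}) + X(\hat{B} - \hat{B}_0)\) gives \[E_0 = E + (\hat{B} - \hat{B}_0)' X'X (\hat{B} - \hat{B}_0).\] Substituting \(\hat{B} - \hat{B}_0 = (X'X)^{-1} C' [C(X'X)^{-1} C']^{-1} (C\hat{B} - D)\) yields \[H = E_0 - E = (C\hat{B} - D)' [C(X'X)^{-1}C']^{-1} (C\hat{B} - D).\]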
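The identity in Proposition 2.1 can also be checked numerically by computing \(E_0 - E\) directly from the two fits. The sketch below uses Python with NumPy; all data and sizes are simulated assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)
n, q, p, g = 25, 3, 2, 1       # illustrative sizes (assumed)
X = rng.standard_normal((n, q))
Y = rng.standard_normal((n, p))
C = rng.standard_normal((g, q))
D = rng.standard_normal((g, p))

XtX_inv = np.linalg.inv(X.T @ X)
B_hat = XtX_inv @ X.T @ Y
K = C @ XtX_inv @ C.T
R = C @ B_hat - D                                  # g x p discrepancy from H0
B0_hat = B_hat - XtX_inv @ C.T @ np.linalg.solve(K, R)

# Residual sums of squares and products under each fit.
E = (Y - X @ B_hat).T @ (Y - X @ B_hat)
E0 = (Y - X @ B0_hat).T @ (Y - X @ B0_hat)

H_diff = E0 - E                                    # definition of H
H_formula = R.T @ np.linalg.solve(K, R)            # Proposition 2.1
```

`H_diff` and `H_formula` coincide up to round-off.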
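Now let \(A \sim W_p(m, I)\) be independent of \(B \sim W_p(n, I)\), with \(m \geq p\) (here \(A\), \(B\), and \(n\) are generic symbols, unrelated to the regression notation above). We say that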
\[\Lambda = \frac{|A|}{|A + B|} = |I + A^{-1}B|^{-1} \sim \Lambda(p, m, n)\]
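Lemma 2.3 (Lemma) The \(\Lambda\) distribution is invariant under changes of the scale parameter of \(A\) and \(B\): if \(A \sim W_p(m, \Sigma)\) and \(B \sim W_p(n, \Sigma)\) are independent with a common \(\Sigma > 0\), then \(|A| / |A + B|\) still follows \(\Lambda(p, m, n)\).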
Theorem 2.3 (Lambda Product) \[\Lambda(p, m, n) \sim \prod_{i=1}^n u_i\]
with \(u_1, u_2, \dots, u_n\) mutually independent, and
\[u_i \sim Beta((m + i - p)/2, p/2)\]
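A Monte Carlo comparison gives some intuition for the product representation. The following sketch (Python with NumPy) simulates \(\Lambda\) directly from Wishart matrices and from the Beta product, and compares both sample means with the exact mean implied by the product. The parameter values, the number of replications, and the helper `wishart_identity` are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(4)
p, m, n = 2, 10, 3          # illustrative parameters with m >= p
reps = 20000

def wishart_identity(df, dim, rng):
    """Draw W ~ W_dim(df, I) as Z'Z with Z a df x dim standard normal matrix."""
    Z = rng.standard_normal((df, dim))
    return Z.T @ Z

# Direct simulation of |A| / |A + B|.
lam_direct = np.empty(reps)
for k in range(reps):
    A = wishart_identity(m, p, rng)
    B = wishart_identity(n, p, rng)
    lam_direct[k] = np.linalg.det(A) / np.linalg.det(A + B)

# Product of independent Beta((m + i - p)/2, p/2) variables, i = 1..n.
i = np.arange(1, n + 1)
a = (m + i - p) / 2.0
u = rng.beta(a, p / 2.0, size=(reps, n))
lam_product = u.prod(axis=1)

# Exact mean implied by the product representation: prod of a_i / (a_i + p/2).
exact_mean = np.prod(a / (a + p / 2.0))
```

Agreement of the two sample means with `exact_mean` is only a check on the first moment, not a proof of equality in distribution.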
Corollary 2.1 (Corollary:) If \(p = 1\), the lambda distribution reduces to an \(F\) ratio. With hypothesis degrees of freedom \(g\) and error degrees of freedom \(n - q\),
\[\Lambda = \left[ 1 + \frac{g}{(n - q)} F_{g,(n - q)} \right]^{-1}\]
Example 2.5 (MANOVA) Consider \(Y = XB + U\), with
\[X = \begin{pmatrix} \mathbf{1}_{n_1} & 0 & \cdots & 0 \\ 0 & \mathbf{1}_{n_2} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \mathbf{1}_{n_q} \end{pmatrix}, \text{ and } C = \begin{pmatrix} 1 & 0 & \cdots & 0 & -1 \\ 0 & 1 & \cdots & 0 & -1 \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & \cdots & 1 & -1 \end{pmatrix}\]
where \(\mathbf{1}_{n_j}\) denotes a column of \(n_j\) ones for group \(j\) (so \(n = \sum_j n_j\)), \(X : n \times q\), and \(C : (q - 1) \times q\)
Testing equality of the \(q\) group mean vectors (each of dimension \(p\)) amounts to testing \(H_0 : CB = 0\)
This leads to the following properties:
(MLE): \(\hat{B} = (X'X)^{-1}X'Y\)
(Unrestricted Errors) \(E = Y'QY\)
(Residual Error Differences) \(H = E_0 - E = (C\hat{B})'(C(X'X)^{-1}C')^{-1}(C\hat{B})\)
- Note: \(D=0\), \(C\) is contrast matrix
(Wilks’ Test) \(\Lambda = \frac{|E|}{|E+H|} = |I + E^{-1}H|^{-1} \sim_{H_0} U(p, n - q, q - 1)\)
- We reject for \(\Lambda\) close to 0
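The quantities in Example 2.5 can be computed directly for a small one-way layout. This is a minimal sketch in Python with NumPy; the group sizes and the simulated data (generated under \(H_0\)) are assumptions, and no critical value for \(\Lambda\) is computed.

```python
import numpy as np

rng = np.random.default_rng(5)
sizes = [8, 10, 12]            # hypothetical group sizes n_1, ..., n_q
q, p = len(sizes), 2
n = sum(sizes)

# Cell-means design: column j indicates membership in group j.
X = np.zeros((n, q))
start = 0
for j, nj in enumerate(sizes):
    X[start:start + nj, j] = 1.0
    start += nj

Y = rng.standard_normal((n, p))                       # data generated under H0
C = np.hstack([np.eye(q - 1), -np.ones((q - 1, 1))])  # (q-1) x q contrasts

XtX_inv = np.linalg.inv(X.T @ X)
B_hat = XtX_inv @ X.T @ Y                 # rows are the group mean vectors
Q = np.eye(n) - X @ XtX_inv @ X.T
E = Y.T @ Q @ Y                           # within-group SSP matrix
CB = C @ B_hat
H = CB.T @ np.linalg.solve(C @ XtX_inv @ C.T, CB)  # between-group SSP matrix

wilks = np.linalg.det(E) / np.linalg.det(E + H)

# In the one-way layout, E + H is the total SSP matrix about the grand mean.
Yc = Y - Y.mean(axis=0)
total_ssp = Yc.T @ Yc
```

The check `E + H == total_ssp` reflects that the restricted fit under equal means is the grand mean, so \(E_0\) is the total sum of squares and products.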
Example 2.6 The general hypothesis \(H_0 : CBM = D\) reduces to the case already treated through the transformation \[Y_0 = YM = XBM + UM = X\theta + \Omega\]
with \(\Omega \sim MN(0, I_n, M'\Sigma M)\) and
\[H_0 : CBM = D \quad \equiv \quad H_0 : C\theta = D\]
Example 2.7 (Assessing Parallelism in \(q\) Subpopulations) Are boys and girls experiencing similar timing in pubertal growth?
In the case of gender comparison (\(q=2\)): \[H_{0j} : \mu_{1j} - \mu_{1(j-1)} = \mu_{2j} - \mu_{2(j-1)}; \quad \text{for } j = 2, \dots, p\]
In general, test \(H_0 : CBM = 0\) with: \[\underbrace{C}_{(q-1) \times q} = \begin{bmatrix} 1 & -1 & 0 & \dots & 0 \\ 0 & 1 & -1 & \dots & 0 \\ \vdots & \vdots & \vdots & & \vdots \\ 0 & 0 & 0 & \dots & -1 \end{bmatrix}; \quad \underbrace{M}_{p \times (p-1)} = \begin{bmatrix} 1 & 0 & 0 & \dots & 0 \\ -1 & 1 & 0 & \dots & 0 \\ 0 & -1 & 1 & \dots & 0 \\ \vdots & \vdots & \vdots & & \vdots \\ 0 & 0 & 0 & \dots & -1 \end{bmatrix}\]
Consider the transformation of \(Y\) by the matrix \(M\): \[Y_0 = YM\] where \(Y\) is \(n \times p\) and \(M\) is \(p \times (p-1)\).
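Under the transformed model: \[Y_0 = X\theta + \Omega, \quad \theta = BM\]
We test the null hypothesis: \[H_0 : C\theta = 0\]
Let \(E\) and \(E_0\) be the unrestricted and restricted residual error matrices computed from \(Y_0\), and let \(H = E_0 - E\) be their difference. For the gender comparison (\(q = 2\)), the resulting Wilks’ \(\Lambda\) statistic is: \[\Lambda = \frac{|E|}{|E + H|} \overset{H_0}{\sim} \Lambda(p-1, n-2, 2-1)\]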
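The contrast matrices of Example 2.7 can be illustrated with two mean profiles. In this sketch (Python with NumPy) the helper name `successive_difference_matrix` and both coefficient matrices are made up for illustration; rows of `B` are the group mean profiles over \(p = 4\) occasions.

```python
import numpy as np

def successive_difference_matrix(p):
    """p x (p-1) matrix whose k-th column is e_k - e_{k+1}."""
    M = np.zeros((p, p - 1))
    for k in range(p - 1):
        M[k, k] = 1.0
        M[k + 1, k] = -1.0
    return M

p = 4
M = successive_difference_matrix(p)
C = np.array([[1.0, -1.0]])           # q = 2 groups

# Two made-up mean profiles that are parallel (they differ by a constant shift).
B_parallel = np.array([[1.0, 2.0, 4.0, 7.0],
                       [3.0, 4.0, 6.0, 9.0]])
# A non-parallel alternative: the second group grows faster at the last step.
B_nonparallel = np.array([[1.0, 2.0, 4.0, 7.0],
                          [3.0, 4.0, 6.0, 12.0]])

CBM_parallel = C @ B_parallel @ M
CBM_nonparallel = C @ B_nonparallel @ M
```

`CBM_parallel` is the zero vector, while `CBM_nonparallel` has a nonzero last entry, matching the interpretation of \(H_0 : CBM = 0\) as parallel profiles.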
2.4 Union-Intersection Principle and Intersection-Union Principle
2.4.1 Union-Intersection Principle
Consider \(H_0 : \theta \in \Theta_0 = \bigcap_{\gamma \in \Gamma} \Theta_\gamma\), where \(\Gamma\) is arbitrary. Suppose that individual tests are available for \(H_{0\gamma}\), with
\[H_{0\gamma} : \theta \in \Theta_\gamma, \quad \text{vs.} \quad H_{1\gamma} : \theta \in \Theta_\gamma^c\]
Lemma 2.4 (Lemma: Rejection Region for UI Test) If the rejection region for the test of \(H_{0\gamma}\) is \(\{x : T_\gamma(x) \in R_\gamma\}\), then the rejection region for the UI test is: \[\bigcup_{\gamma \in \Gamma} \{x : T_\gamma(x) \in R_\gamma\}\]
Corollary 2.2 (Corollary: Rejection Region for UI Test) If the rejection region for \(H_{0\gamma}\) is of the form \(\{x : T_\gamma(x) > c\}\), then: \[\bigcup_{\gamma \in \Gamma} \{x : T_\gamma(x) > c\} = \left\{ x : \sup_{\gamma \in \Gamma} T_\gamma(x) > c \right\}\]
(Note: the UI test rejects \(H_0\) as soon as any one component hypothesis \(H_{0\gamma}\) is rejected.)
2.4.2 Intersection-Union Principle
Consider \[H_0 : \theta \in \Theta_0 = \bigcup_{\gamma \in \Gamma} \Theta_\gamma, \quad \Gamma \text{ arbitrary}\]
Suppose that individual tests are available for \(H_{0\gamma}\), with \[H_{0\gamma} : \theta \in \Theta_\gamma, \quad \text{vs.} \quad H_{1\gamma} : \theta \in \Theta_\gamma^c\]
Lemma 2.5 (Lemma: Rejection Region for IU Test) If the rejection region for the test of \(H_{0\gamma}\) is \(\{x : T_\gamma(x) \in R_\gamma\}\), then the rejection region for the IU test is: \[\bigcap_{\gamma \in \Gamma} \{x : T_\gamma(x) \in R_\gamma\}\]
Example 2.8 (Union-Intersection) Let \(X_1, \dots, X_n\) iid \(N(\mu, \sigma^2)\), and consider testing: \(H_0 : \mu = \mu_0 \quad \text{vs.} \quad H_1 : \mu \neq \mu_0\)
We can write the null hypothesis as an intersection: \(H_0 : \mu \in \{ \mu : \mu \leq \mu_0 \} \cap \{ \mu : \mu \geq \mu_0 \}\)
The LRT of \(H_{0L} : \mu \leq \mu_0\) versus \(H_{1L} : \mu > \mu_0\) rejects \(H_{0L}\) if: \[\frac{\bar{X} - \mu_0}{S/\sqrt{n}} \geq t_L\]
The LRT of \(H_{0U} : \mu \geq \mu_0\) versus \(H_{1U} : \mu < \mu_0\) rejects \(H_{0U}\) if: \[\frac{\bar{X} - \mu_0}{S/\sqrt{n}} \leq t_U\]
The UI Test rejects \(H_0\) if: \[\frac{\bar{X} - \mu_0}{S/\sqrt{n}} \geq t_L \quad \text{or} \quad \frac{\bar{X} - \mu_0}{S/\sqrt{n}} \leq t_U\]
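The union of the two one-sided rejection regions can be written as a short function. This is a sketch in Python with NumPy; the function names, the sample, and the cutoffs are assumptions. The cutoffs are supplied by the caller, here \(\pm 2.776\), roughly the 0.975 quantile of a \(t\) distribution with 4 degrees of freedom.

```python
import numpy as np

def t_statistic(x, mu0):
    """One-sample t statistic (xbar - mu0) / (S / sqrt(n))."""
    x = np.asarray(x, dtype=float)
    return (x.mean() - mu0) / (x.std(ddof=1) / np.sqrt(x.size))

def ui_test(x, mu0, t_L, t_U):
    """Reject H0 if either one-sided test rejects (union of rejection regions)."""
    t = t_statistic(x, mu0)
    reject_lower_null = t >= t_L      # test of H0L: mu <= mu0
    reject_upper_null = t <= t_U      # test of H0U: mu >= mu0
    return bool(reject_lower_null or reject_upper_null)

x = [4.9, 5.1, 5.0, 5.2, 4.8]         # made-up sample with mean 5.0
decision_near = ui_test(x, mu0=5.0, t_L=2.776, t_U=-2.776)
decision_far = ui_test(x, mu0=4.0, t_L=2.776, t_U=-2.776)
```

With symmetric cutoffs \(t_L = -t_U\), the UI test coincides with the usual two-sided \(t\) test, which rejects when \(|t|\) exceeds the cutoff.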
2.4.3 UI Test for General Linear Hypotheses (Roy’s Maximum Root)
We can consider testing: \[H_0 : CB = D \quad \equiv \quad \bigcap_{\ell \in \mathbb{R}^p} \{CB\ell = D\ell\}\]
Each \(H_{0\ell} : CB\ell = D\ell\) is a univariate hypothesis with F ratio:
\[F_{\ell} = \frac{\ell' H\ell / g}{\ell' E\ell / (n - q)}\]
This leads to the rejection region:
\[\bigcup_{\ell} \{y : F_{\ell} > k\} = \{y : \sup_{\ell} F_{\ell} > k\} = \{y : \phi_{\max} > \phi_{\alpha}\}\]
where \(\phi_{\max}\) is the largest eigenvalue of \(HE^{-1}\).
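The step from the supremum to the largest eigenvalue can be spelled out as a short worked equation. It assumes \(E > 0\) and uses the substitution \(v = E^{1/2}\ell\), which is a notational choice made here for illustration. Since \(F_\ell\) is a fixed positive multiple of the ratio \(\ell' H \ell / \ell' E \ell\), it suffices to maximize that ratio:
\[\sup_{\ell \neq 0} \frac{\ell' H \ell}{\ell' E \ell} = \sup_{v \neq 0} \frac{v' E^{-1/2} H E^{-1/2} v}{v'v} = \lambda_{\max}(E^{-1/2} H E^{-1/2}) = \lambda_{\max}(E^{-1}H) = \lambda_{\max}(HE^{-1})\]
The last two equalities hold because \(E^{-1/2} H E^{-1/2}\), \(E^{-1}H\), and \(HE^{-1}\) are similar matrices and therefore share the same eigenvalues. The cutoff \(\phi_\alpha\) is then \(k\) rescaled by the same positive constant.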
Remark 2.1. \(E^{-1}H\) carries all the information we need, where \(H = E_0 - E\). For example,
\[\Lambda = |I + E^{-1}H|^{-1} = \prod_{i=1}^{p} \frac{1}{1 + \lambda_i}\]
where \(\lambda_1, \dots, \lambda_p\) are the eigenvalues of \(E^{-1}H\).
In terms of these eigenvalues, we have the following test statistics:
Wilks’ Lambda: \(\prod_{i} (1 + \lambda_i)^{-1}\)
Pillai’s Trace: \(\sum_{i} \frac{\lambda_i}{1 + \lambda_i}\)
Hotelling-Lawley Trace: \(\sum_{i} \lambda_i\)
Roy’s Largest Root: \(\phi = \max \lambda_i\)
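All four statistics are functions of the same eigenvalues, which the following sketch makes explicit. It is written in Python with NumPy; the function name `manova_statistics` and the toy matrices `E` and `H` are assumptions for illustration.

```python
import numpy as np

def manova_statistics(E, H):
    """Return the four classical statistics from the eigenvalues of E^{-1} H."""
    eig = np.linalg.eigvals(np.linalg.solve(E, H)).real
    eig = np.clip(eig, 0.0, None)      # guard against tiny negative round-off
    return {
        "wilks": float(np.prod(1.0 / (1.0 + eig))),
        "pillai": float(np.sum(eig / (1.0 + eig))),
        "hotelling_lawley": float(np.sum(eig)),
        "roy": float(np.max(eig)),
    }

# Toy example: E^{-1} H has eigenvalues 1 and 3.
E = np.eye(2)
H = np.diag([1.0, 3.0])
stats = manova_statistics(E, H)
```

For this toy input the eigenvalues are 1 and 3, so Wilks' Lambda is \(\frac{1}{2} \cdot \frac{1}{4} = 0.125\), Pillai's trace is \(0.5 + 0.75 = 1.25\), the Hotelling-Lawley trace is 4, and Roy's largest root is 3.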
We can construct simultaneous confidence intervals for linear combinations \(a'ABb\) by inverting the UI test of the null \(H_0 : AB = 0\).
Null Hypothesis: \[H_0 : \bigcap_{a,b} \{a'ABb = 0\}\]
Acceptance Region: \[\bigcap_{a,b} \{y : F_{a,b} \leq k\} = \{y : \sup_{a,b} F_{a,b} \leq k\} = \{y : \phi_{\max} \leq \phi_{\alpha}\}\]
Test Statistic: \[F_{a,b} = \frac{(a'A\hat{B}b)^2}{(a'A(X'X)^{-1}A'a)(b'Eb)}\]
Confidence Level Derivation: \[(1 - \alpha) = Pr\{\phi_{\max} \leq \phi_{\alpha}\} = Pr \left[ |a'A(\hat{B} - B)b| \leq \{\phi_{\alpha} (a'A(X'X)^{-1}A'a)(b'Eb)\}^{1/2}, \forall a,b \right]\]
Resulting simultaneous \((1 - \alpha)\) confidence intervals for \(a'ABb\): \[a'A\hat{B}b \pm \{\phi_{\alpha} (a'A(X'X)^{-1}A'a)(b'Eb)\}^{1/2}\]