4 Introduction to Longitudinal Data Analysis

4.1 Longitudinal Designs

Outcomes \(Y(t)\) are measured for the same experimental unit over an evaluation index \(t\). The measurement process is repeated across all units.

Definition 4.1 (Vector Notation) For unit \(i\), the vector of observations is: \[Y_i(t_i) = [y_i(t_{i1}), y_i(t_{i2}), \dots, y_i(t_{im_i})]'\] where \(t_i = (t_{i1}, \dots, t_{im_i})\).

Remark 4.1 (Nature of the Index \(t\)). While often interpreted as time, \(t\) can be generalized as a multi-dimensional index for:

Spatial Data: Measurements across geographic coordinates.
Clustered Outcomes: Repeated measures within villages, herds, or hospitals.

4.2 Modeling Strategies

4.3 Parametric Models for the Covariance Structure

Considering the design structure typical of longitudinal studies, some common parametric forms for the covariance matrix are usually sought to aid efficiency. Common modeling strategies build on Markovian assumptions about the process under observation. When dealing with repeated measures, the following structured forms are typically employed:

Compound Symmetry (CS): Assumes constant variance and constant covariance between any two observations within the same unit. \[\Sigma = \sigma^2 \begin{pmatrix} 1 & \rho & \dots & \rho \\ \rho & 1 & \dots & \rho \\ \vdots & \vdots & \ddots & \vdots \\ \rho & \rho & \dots & 1 \end{pmatrix}\]
Autoregressive (AR) Structures: Typically AR(1), where correlations decay as the distance between measurements increases. \[\text{Cov}(Y_{it}, Y_{it'}) = \sigma^2 \rho^{|t-t'|}\]
Markov Structures: Generalizations where the current state depends only on a finite number of previous states.

Definition 4.2 (Compound Symmetry) Let \(\Gamma\) be the correlation matrix:

\[\Gamma = \begin{pmatrix} 1 & \rho & \cdots & \rho \\ \rho & 1 & & \vdots \\ \vdots & & \ddots & \rho \\ \rho & \cdots & \rho & 1 \end{pmatrix}\]

The resulting covariance matrix \(\Sigma\) is then defined as:

\[\Sigma = \begin{pmatrix} \sigma_1^2 & \rho\sigma_1\sigma_2 & \cdots & \rho\sigma_1\sigma_p \\ \rho\sigma_1\sigma_2 & \sigma_2^2 & & \vdots \\ \vdots & & \ddots & \rho\sigma_p\sigma_{(p-1)} \\ \rho\sigma_1\sigma_p & \cdots & \rho\sigma_p\sigma_{(p-1)} & \sigma_p^2 \end{pmatrix}\]

Definition 4.3 (Autoregressive (AR1) Structure) For equally spaced data, the correlation matrix \(\Gamma\) follows an autoregressive structure of order 1, where the correlation between observations decays exponentially as the distance between time points increases:

\[\Gamma = \begin{pmatrix} 1 & \rho & \rho^2 & \rho^3 & \cdots & \rho^p \\ \rho & 1 & \rho & \rho^2 & & \rho^{p-1} \\ \rho^2 & \rho & 1 & \rho & & \vdots \\ \rho^3 & \rho^2 & \rho & 1 & & \vdots \\ \vdots & & & & \ddots & \rho \\ \rho^p & \rho^{p-1} & \cdots & \cdots & \rho & 1 \end{pmatrix}\]

Definition 4.4 (Markov Model (Unequally Spaced Data)) For unequally spaced data, we define the distance between time points \(j\) and \(k\) as \(d_{jk} = |t_j - t_k|\). The correlation matrix \(\Gamma\) generalizes the AR(1) structure by using these actual distances in the exponent:

\[\Gamma = \begin{pmatrix} 1 & \rho^{d_{12}} & \rho^{d_{13}} & \rho^{d_{14}} & \cdots & \rho^{d_{1p}} \\ \rho^{d_{12}} & 1 & \rho^{d_{23}} & \rho^{d_{24}} & & \rho^{d_{2p}} \\ \rho^{d_{13}} & \rho^{d_{23}} & 1 & \rho^{d_{34}} & & \vdots \\ \rho^{d_{14}} & \rho^{d_{24}} & \rho^{d_{34}} & 1 & & \vdots \\ \vdots & & & & \ddots & \rho^{d_{p(p-1)}} \\ \rho^{d_{1p}} & \rho^{d_{2p}} & \cdots & \cdots & \rho^{d_{p(p-1)}} & 1 \end{pmatrix}\]

4.4 Linear Mixed Effects

A natural way to think about longitudinal data involves the explicit modeling of subject-specific patterns of variation. We assume the response \(Y_{ij}\) follows a distribution within the exponential family:

\[Y_{ij} \mid \theta_{ij}, \alpha \sim p(\cdot), \quad \text{with } p \in \text{Exponential Family}\]

The density (or mass) function is given by:

\[p(y_{ij} \mid \theta_{ij}, \alpha) = \exp \left[ \frac{y_{ij}\theta_{ij} - b(\theta_{ij})}{a(\alpha)} + c(y_{ij}, \alpha) \right]\]

for subjects \(i = 1, 2, \dots, n\) and observations \(j = 1, 2, \dots, m_i\).

Define the conditional mean \(\mu_{ij} = E(Y_{ij} \mid \theta_{ij}, \alpha)\). We introduce a link function \(g(\cdot)\) such that:

\[g(\mu_{ij}) = x_{ij}'\beta + z_{ij}'b_i\]

where:

\(x_{ij} \in \mathbb{R}^q\) represents fixed effects covariates.
\(z_{ij} \in \mathbb{R}^k\) represents random effects covariates for subject \(i\).

We may want to impose a structure on \(b_i\), leading us to a random effects model.

Definition 4.5 (Random Effects Model) Building on the subject-specific patterns, we formally introduce unit-specific random effects \(b_i\), generally assuming they are independently distributed:

\[b_i \sim_{ind} N(0, D(\alpha))\]

The conditional mean and link function remain: \[g(\mu_{ij}) = x_{ij}'\beta + z_{ij}'b_i\]

To understand the population-level implications, we derive the marginal moments by integrating over the distribution of the random effects:

Marginal Mean: \[E(Y_{ij}) = E[E(Y_{ij} \mid b_i)] = E_b [g^{-1}(x_{ij}'\beta + z_{ij}'b_i)]\]
Marginal Variance: \[Var(Y_{ij}) = E[Var(Y_{ij} \mid b_i)] + Var[E(Y_{ij} \mid b_i)]\]
Marginal Covariance: \[Cov(Y_{ij}, Y_{ik}) = E[Cov(Y_{ij}, Y_{ik} \mid b_i)] + Cov[E(Y_{ij} \mid b_i), E(Y_{ik} \mid b_i)]\]

We note that the \(Y_{ij}\) is the marginal value. In this sense, our \(g\) becomes a function of the conditional expectation:

\[ g[E(Y_{ij}|b_i)] = g(\mu_{ij})\]

4.5 Poisson Random Effects Model

A specific application of the subject-specific longitudinal framework is the Poisson model for count data. We assume:

\[Y_{ij} \mid \beta, b_i \sim \text{Poisson}(\mu_{ij})\]

where we use the log link function:

\[g(\mu_{ij}) = \log \mu_{ij} = x_{ij}'\beta + b_i\]

\[Z_i = \text{1}_{mi}\] with independent and identically distributed random effects:

\[b_i \sim iid \text{ } N(0, \sigma^2)\]

Under the Poisson assumption, the conditional mean and variance are identical:

\[E(Y_{ij} \mid b_i) = Var(Y_{ij} \mid b_i) = \exp \{x_{ij}'\beta + b_i\}\]

To find the population-averaged (marginal) mean, we integrate out the random effect. Using the property of the moment-generating function for a normal distribution:

Example 4.1 (Poisson Marginal Mean) Our marginal mean becomes:

\[E(Y_{ij}) = E\{\exp[x_{ij}'\beta + b_i]\} = \exp\{x_{ij}'\beta + \sigma^2/2\}\]

\[e^{x_{ij}'\beta}E_{b_i}[e^{b_i}] = e^{x_{ij}'\beta} e^{\frac{\sigma^2}{2}}\]

Example 4.2 (Marginal variance) \[\begin{aligned} Var(Y_{ij}) &= E(\mu_{ij}) + Var(\mu_{ij}) \\ &= E(\mu_{ij}) \{1 + E(\mu_{ij})(e^{\sigma^2} - 1)\} \\ &= E(\mu_{ij}) \{1 + E(\mu_{ij}) c_{\sigma}\} \end{aligned}\]

Example 4.3 (Marginal covariance) \[\begin{aligned} Cov(Y_{ij}, Y_{ik}) &= Cov \left[ \exp\{x_{ij}'\beta + b_i\}, \exp\{x_{ik}'\beta + b_i\} \right] \\ &= \exp\{x_{ij}'\beta + x_{ik}'\beta\} e^{\sigma^2} (e^{\sigma^2} - 1) \\ &= E(Y_{ij})E(Y_{ik}) \, c_{\sigma} \end{aligned}\]

4.6 Likelihood Inference

In the Generalized Linear Mixed Models (GLMM) framework, likelihood inference focuses on the estimation of the fixed effects \(\beta\) and the variance components \(\alpha\).

Standard Maximum Likelihood (ML) theory applies to estimators derived from the marginal likelihood function, which is obtained by integrating out the random effects \(b_i\):

\[\mathcal{L}(\beta, \alpha) = \prod_{i} \int p(Y_i \mid b_i, \beta, \alpha) \, dp(b_i \mid \alpha)\]

Example 4.4 (Log-linear Poisson Regression Likelihood) For a log-linear Poisson regression model with random intercepts \(b_i \sim N(0, \sigma^2)\), the marginal likelihood function is constructed by integrating the product of the conditional Poisson densities and the normal density of the random effects:

\[\mathcal{L}(\beta, \sigma^2) = \prod_{i} (2\pi\sigma^2)^{-1/2} \int \prod_{j} \frac{e^{-\mu_{ij}} \mu_{ij}^{y_{ij}}}{y_{ij}!} (2\pi\sigma^2)^{-1/2} \exp \left\{ -\frac{b_i^2}{2\sigma^2} \right\} \, db_i\]

4.7 Linear Mixed Effects Models

If we can operate under Gaussian sampling, let \(Y_i = (Y_{i1}, \dots, Y_{im})'\). A Linear Mixed Effects Model assumes:

\[Y_i = X_i \beta + Z_i b_i + \epsilon_i\]

with

\(\epsilon_i : m \times 1 \sim N\{0, E(\alpha)\}\)

\(E[\alpha] = \sigma^2I_m\) where \(\alpha\) is any variance parameter.

\(b_i : k \times 1 \sim N\{0, D(\alpha)\}\)
\(\epsilon_i \perp b_i, \quad b_i \perp b_j\)

The model \(Y_i=X_i\beta + Z_i b_i + \epsilon_i, \quad b_i \sim N(0, D(\alpha))\) yields closed form marginals

Expectation: \[E(Y_i) = X_i \beta\]
Covariance (Within-subject): \[Cov(Y_i) = Z_i D(\alpha) Z_i' + E(\alpha) = V_i(\alpha)\]

where \(D(\alpha)\) and \(E(\alpha)\) are some matrices specified a priori.

Covariance (Between-subjects): \[Cov(Y_i, Y_j) = 0, \quad \text{for any } i \neq j\]

From the joint distribution of \(Y\):

\[p(Y \mid \beta, \alpha) = \prod_{i} \int p(Y_i \mid \beta, \alpha, b_i) dp(b_i \mid \alpha)\]

Using the fact that convolutions of Normals are Normal:

\[Y_i \mid \beta, \alpha \sim N(X_i \beta, V_i(\alpha))\]

Leading to the marginal log-likelihood:

\[\ell(\beta, \alpha) = -\frac{1}{2} \sum_{i} \log |V_i(\alpha)| - \frac{1}{2} \sum_{i} (y_i - X_i \beta)' V_i(\alpha)^{-1} (y_i - X_i \beta)\]

For the inference of the fixed effects in thsi model:

The marginal log-likelihood is: \[\ell(\beta, \alpha) = -\frac{1}{2} \sum_{i} \log |V_i(\alpha)| - \frac{1}{2} \sum_{i} (y_i - X_i \beta)' V_i(\alpha)^{-1} (y_i - X_i \beta)\]
The corresponding score equations for \(\beta\) are obtained by taking the partial derivative and setting it to zero: \[\frac{\partial \ell(\cdot)}{\partial \beta} = \sum_{i} X_i' V_i(\alpha)^{-1} (y_i - X_i \beta) := 0\]
This produces the familiar Generalized Least Squares (GLS) estimator of \(\beta\): \[\hat{\beta}(\alpha) = \left\{ \sum_{i} X_i' V_i(\alpha)^{-1} X_i \right\}^{-1} \left\{ \sum_{i} X_i' V_i(\alpha)^{-1} Y_i \right\}\]

Remark 4.2. The MLE for \(\alpha\) is usually not avaiulable in closed form. It is often useful to work wth the profile log-likelihood function:

\[\begin{aligned} \ell(\alpha) &= \max_{\beta} \ell(\beta, \alpha) \\ &= -\frac{1}{2} \sum_{i} \log |V_i(\alpha)| - \frac{1}{2} \sum_{i} \{y_i - X_i \hat{\beta}(\alpha)\}' V_i(\alpha)^{-1} \{y_i - X_i \hat{\beta}(\alpha)\} \end{aligned}\]

The asymptotics are given below:

The expected Information is block diagonal

\[I(\beta, \alpha) = \begin{bmatrix} I_{\beta\beta} & 0 \\ 0 & I_{\alpha\alpha} \end{bmatrix}\]

with

\[I_{\beta\beta} = -E \left[ \frac{\partial^2 \ell(\beta, \alpha)}{\partial \beta \, \partial \beta'} \right] = \sum_{i} X_i' V_i(\alpha)^{-1} X_i\]

For any consistent estimator \(\hat{\alpha}\) of \(\alpha\)

\[\left( \sum_{i} X_i' V_i(\hat{\alpha})^{-1} X_i \right)^{-1/2} (\hat{\beta}_n - \beta) \xrightarrow{d} N(0, I_q)\]

Testing based on LRT follows standard theory. Several corrections are aimed at improving finite sample operative characteristics.

Example 4.5 (Random Intercept Regression) Let \(Y_i = X_i \beta + 1_{m_i} b_i + \epsilon_i\), \(\epsilon_i \sim N(0, \sigma^2_\epsilon I_{m_i}) \perp b_i \sim N\{0, \sigma^2_0\}\)

Variance components: \(\alpha = (\sigma^2_\epsilon, \sigma^2_0)'\)

Marginal moments

\[\begin{aligned} E(Y_i) &= X_i \beta \\ Cov(Y_i) &= 1_{m_i} \sigma^2_0 1'_{m_i} + \sigma^2_\epsilon I_{m_i} \end{aligned}\]

Marginal moments

\[\begin{aligned} E(Y_i) &= X_i \beta \\ Cov(Y_i) &= [1_{m_i} \quad t_i] \begin{pmatrix} \sigma^2_0 & 0 \\ 0 & \sigma^2_1 \end{pmatrix} \begin{bmatrix} 1'_{m_i} \\ t'_i \end{bmatrix} + \sigma^2_\epsilon I_{m_i} \end{aligned}\]