Nonlinear latent variable models

ML inference in non-linear SEMs is complex: computationally intensive methods based on numerical integration are needed, and the results are sensitive to distributional assumptions.

In a recent paper: A two-stage estimation procedure for non-linear structural equation models by Klaus Kähler Holst & Esben Budtz-Jørgensen (https://doi.org/10.1093/biostatistics/kxy082), we consider two-stage estimators as a computationally simple alternative to MLE. Here both steps are based on linear models: first we predict the non-linear terms and then these are related to latent outcomes in the second step.

Introduction: the measurement error problem

Measurement error in covariates is a common source of bias in regression analyses. In a simple linear regression setting, let $Y$ denote the response variable, $\xi$ the true exposure, for which we only observe replicated proxy measurements $X_1,\dots,X_q$, and $Z$ an additional confounder:

$$Y=\mu+\beta\xi+\gamma Z+\delta, \qquad \xi=\alpha+\kappa Z+\zeta, \qquad X_j=\nu_j+\xi+\epsilon_j,\quad j=1,\dots,q,$$

where the error terms $\delta,\zeta,\epsilon_j$ are assumed to be independent of the true exposure $\xi$. Standard techniques, e.g. linear regression (MLE) of $Y$ on $X_j$ and $Z$, fail to provide consistent estimates. To see this, observe that

$$\widehat{\beta}_{\mathrm{MLE}}=\frac{\operatorname{Cov}(Y,X_j\mid Z)}{\operatorname{Var}(X_j\mid Z)}=\beta\,\frac{\operatorname{Var}(\xi)}{\operatorname{Var}(\xi)+\operatorname{Var}(\epsilon_j)}<\beta.$$
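The attenuation can be checked numerically. A minimal sketch in base R (all parameter values below are arbitrary illustrations, not taken from the paper):

```r
## Simulate the measurement error model and compare the naive estimate
## to the attenuated limit beta * Var(zeta) / (Var(zeta) + Var(eps)).
set.seed(1)
n  <- 1e5
Z  <- rnorm(n)
xi <- 0.5 + 0.8 * Z + rnorm(n)         # true exposure, Var(zeta) = 1
X1 <- xi + rnorm(n, sd = sqrt(2))      # proxy measurement, Var(eps) = 2
Y  <- 1 + 2 * xi + 0.5 * Z + rnorm(n)  # true beta = 2

beta_naive <- unname(coef(lm(Y ~ X1 + Z))["X1"])
c(naive = beta_naive, limit = 2 * 1 / (1 + 2))  # naive approx 2/3, far below 2
```
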

In the following, we consider generalizations of this problem to general latent variable models with non-linear effects of the exposure variable.

Statistical model

For subjects $i=1,\dots,n$, a latent response variable $\eta_i=(\eta_{i1},\dots,\eta_{ip})^t$ is non-linearly related to a latent predictor $\xi_i=(\xi_{i1},\dots,\xi_{iq})^t$ after adjustment for covariates $Z_i=(Z_{i1},\dots,Z_{ir})^t$:

$$\eta_i=\alpha+B\varphi(\xi_i)+\Gamma Z_i+\zeta_i,\tag{1}$$

where $\varphi\colon\mathbb{R}^q\to\mathbb{R}^l$ is a known measurable function such that $\varphi(\xi_i)=(\varphi_1(\xi_i),\dots,\varphi_l(\xi_i))^t$ has finite variance (see Figure 1). The target parameter of interest, $B\in\mathbb{R}^{p\times l}$, describes the non-linear relation between $\xi_i$ and $\eta_i$. Note that $\varphi$ may also depend on some of the covariates, thereby allowing the introduction of interaction terms.

The latent predictors (ξi) are related to each other and the covariates in a linear structural equation:

$$\xi_i=\widetilde{\alpha}+\widetilde{B}\xi_i+\widetilde{\Gamma}Z_i+\widetilde{\zeta}_i,\tag{2}$$

where the diagonal elements of $\widetilde{B}$ are zero.

Figure 1: Non-linear structural equation model (SEM).

The observed variables $X_i=(X_{i1},\dots,X_{ih})^t$ and $Y_i=(Y_{i1},\dots,Y_{im})^t$ are linked to the latent variables in the two measurement models

$$Y_i=\nu+\Lambda\eta_i+KZ_i+\varepsilon_i,\tag{3}$$
$$X_i=\widetilde{\nu}+\widetilde{\Lambda}\xi_i+\widetilde{K}Z_i+\widetilde{\varepsilon}_i,\tag{4}$$

where the error terms $\varepsilon_i$ and $\widetilde{\varepsilon}_i$ are assumed to be independent with mean zero and covariance matrices $\Omega$ and $\widetilde{\Omega}$, respectively. The parameters are collected into $\theta=(\theta_1^t,\theta_2^t)^t$, where $\theta_1=(\widetilde{\alpha},\widetilde{B},\widetilde{\Gamma},\widetilde{\nu},\widetilde{\Lambda},\widetilde{K},\widetilde{\Omega},\widetilde{\Psi})$ are the parameters of the linear SEM describing the conditional distribution of $X_i$ given $Z_i$. The remaining parameters are collected into $\theta_2$.

Two-stage estimator (2SSEM)

  1. The linear SEM given by equations (2) and (4) is fitted (e.g., by MLE) to $(X_i,Z_i),\ i=1,\dots,n$, to obtain a consistent estimate of $\theta_1$.
  2. The parameter $\theta_2$ is estimated via a linear SEM with measurement model given by equation (3) and structural model given by equation (1), where the non-linear term $\varphi(\xi_i)$ is replaced by the Empirical Bayes plug-in estimate $\varphi_n(\xi_i)=E_{\widehat{\theta}_1}[\varphi(\xi_i)\mid X_i,Z_i]$.
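For a single latent predictor with replicate measurements and a quadratic effect, the two stages can be mimicked in a few lines of base R. This is only an illustrative sketch with made-up parameter values (not the lava implementation): stage 1 is a moment-based fit of the normal posterior of $\xi_i$, and stage 2 plugs the conditional moments into a linear model.

```r
## Stage 1: normal posterior of xi given replicates X and covariate Z.
## Stage 2: linear model with plug-in predictions of xi and xi^2.
set.seed(2)
n <- 5e4; q <- 3
Z  <- rnorm(n)
xi <- 0.3 * Z + rnorm(n)                  # structural model, Var = 1
X  <- replicate(q, xi + rnorm(n))         # q replicate measurements
Y  <- 1 + xi - 0.5 * xi^2 + rnorm(n)      # quadratic outcome model

Xbar <- rowMeans(X)
s2e  <- mean(apply(X, 1, var))            # Var(eps_j), here close to 1
fit1 <- lm(Xbar ~ Z)                      # stage 1 (moment-based)
tau2 <- var(resid(fit1)) - s2e / q        # Var(xi | Z)
lam  <- tau2 / (tau2 + s2e / q)           # shrinkage factor
m    <- fitted(fit1) + lam * resid(fit1)  # E(xi | X, Z)
v    <- tau2 * (1 - lam)                  # Var(xi | X, Z)

fit2 <- lm(Y ~ m + I(m^2 + v))            # stage 2, E(xi^2 | X, Z) = m^2 + v
coef(fit2)                                # approx (1, 1, -0.5)
```

Note that naive use of $m^2$ alone (regression calibration) would bias the intercept; adding the conditional variance $v$ gives the correct plug-in prediction of $\xi_i^2$.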

Prediction of non-linear latent terms

For important classes of non-linear functions $\varphi$, closed-form expressions can be derived for $E[\varphi(\xi_i)\mid X_i,Z_i]$. Under Gaussian assumptions the conditional distribution of $\xi_i$ given $X_i$ and $Z_i$ is normal with mean and variance

$$m_{x,z}=E(\xi_i\mid X_i,Z_i)=\widetilde{\alpha}+\widetilde{\Gamma}Z_i+\Sigma_{X\xi}^t\Sigma_X^{-1}(X_i-\mu_X),$$
$$v=\operatorname{Var}(\xi_i\mid X_i,Z_i)=\widetilde{\Psi}-\Sigma_{X\xi}^t\Sigma_X^{-1}\Sigma_{X\xi},$$

where $\mu_X=\widetilde{\nu}+\widetilde{\Lambda}(I-\widetilde{B})^{-1}\widetilde{\alpha}+\widetilde{\Lambda}(I-\widetilde{B})^{-1}\widetilde{\Gamma}Z_i+\widetilde{K}Z_i$, $\Sigma_X=\widetilde{\Lambda}(I-\widetilde{B})^{-1}\widetilde{\Psi}(I-\widetilde{B})^{-t}\widetilde{\Lambda}^t+\widetilde{\Omega}$, and $\Sigma_{X\xi}=\widetilde{\Lambda}(I-\widetilde{B})^{-1}\widetilde{\Psi}(I-\widetilde{B})^{-t}$. Note that the conditional variance does not depend on $X_i$ or $Z_i$.

Polynomials

$$\eta_i=\alpha+\sum_{m=1}^{k}\beta_m\xi_i^m+\zeta_i,$$

Here $\varphi(\xi_i)=\xi_i^m$ $(m\in\mathbb{N})$, and the conditional moments are given by

$$E(\xi_i^m\mid X_i,Z_i)=\sum_{k=0}^{\lfloor m/2\rfloor} m_{x,z}^{m-2k}\,v^k\,\frac{m!}{2^k\,k!\,(m-2k)!}.$$
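The moment formula is easy to check against numerical integration. A base-R sketch (`moment_poly` is a hypothetical helper name, not part of lava):

```r
## Closed-form E(xi^m | X, Z) as a function of m_{x,z} and v.
moment_poly <- function(m, mxz, v) {
  k <- 0:(m %/% 2)
  sum(mxz^(m - 2 * k) * v^k * factorial(m) /
        (2^k * factorial(k) * factorial(m - 2 * k)))
}

## Check: third moment of a N(0.7, 1.3^2) distribution.
num <- integrate(function(x) x^3 * dnorm(x, 0.7, 1.3), -Inf, Inf)$value
c(closed_form = moment_poly(3, 0.7, 1.3^2), numerical = num)
```
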

Exponentials

For the exponential function $\varphi(\xi_i)=\exp(\xi_i)$, an expression can be obtained by noting that $\exp(\xi_i)$ follows a log-normal distribution with mean

$$E[\exp(\xi_i)\mid X_i,Z_i]=\exp\left(m_{x,z}+\tfrac{1}{2}v\right).$$

The conditional mean of functions of the form $\varphi(\xi_i)=\exp(\xi_i)^m=\exp(m\xi_i)$ is equally straightforward to calculate, as this variable again follows a log-normal distribution.
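A quick Monte Carlo sanity check of the log-normal mean formula (base R, illustrative values):

```r
## E[exp(xi) | X, Z] = exp(m_{x,z} + v/2) when xi | X, Z ~ N(m_{x,z}, v).
set.seed(3)
mxz <- 0.4; v <- 0.6
xi  <- rnorm(1e6, mean = mxz, sd = sqrt(v))
c(closed_form = exp(mxz + v / 2), monte_carlo = mean(exp(xi)))
```
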

Interactions

Product-interaction model

$$\eta_i=\alpha+\beta_1\xi_{1i}+\beta_2\xi_{2i}+\beta_{12}\xi_{1i}\xi_{2i}+\zeta_i,$$

and now $E(\xi_{1i}\xi_{2i}\mid X_i,Z_i)=\operatorname{Cov}(\xi_{1i},\xi_{2i}\mid X_i,Z_i)+E(\xi_{1i}\mid X_i,Z_i)\,E(\xi_{2i}\mid X_i,Z_i)$, where the terms on the right-hand side are directly available from the bivariate normal distribution of $(\xi_{1i},\xi_{2i})$ given $X_i,Z_i$. Regression calibration leads to the correct mean except that the intercept will be biased, as this method does not include the covariance term above.
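The covariance correction can be verified by simulation from a bivariate normal with illustrative posterior moments (base R):

```r
## E(xi1 * xi2) = Cov(xi1, xi2) + E(xi1) E(xi2) for a bivariate normal.
set.seed(4)
m1 <- 0.5; m2 <- -0.2; s1 <- 1; s2 <- 0.8; rho <- 0.3
z1  <- rnorm(1e6)
xi1 <- m1 + s1 * z1                                   # N(m1, s1^2)
xi2 <- m2 + s2 * (rho * z1 + sqrt(1 - rho^2) * rnorm(1e6))
c(closed_form = rho * s1 * s2 + m1 * m2, monte_carlo = mean(xi1 * xi2))
```
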

Splines

A natural cubic spline with $k$ knots $t_1<t_2<\dots<t_k$ is given by

$$\eta_i=\alpha+\beta_0\xi_i+\sum_{j=1}^{k-2}\beta_j f_j(\xi_i)+\zeta_i,$$

with
$$f_j(\xi_i)=g_j(\xi_i)-\frac{t_k-t_j}{t_k-t_{k-1}}\,g_{k-1}(\xi_i)+\frac{t_{k-1}-t_j}{t_k-t_{k-1}}\,g_k(\xi_i),\quad j=1,\dots,k-2,$$
and $g_j(\xi_i)=(\xi_i-t_j)^3\,1_{\{\xi_i>t_j\}}$, $j=1,\dots,k$. Thus, the predictions $E[f_j(\xi_i)\mid X_i,Z_i]$ are linear functions of $E[g_j(\xi_i)\mid X_i,Z_i]$, and

$$E[g_j(\xi_i)\mid X_i,Z_i]=\frac{s}{\sqrt{2\pi}}\left(2s^2+(m_{x,z}-t_j)^2\right)\exp\left(-\left[\frac{m_{x,z}-t_j}{s\sqrt{2}}\right]^2\right)+(m_{x,z}-t_j)\left[(m_{x,z}-t_j)^2+3s^2\right]p_{x,z,j},$$

where $s=\sqrt{v}$ and $p_{x,z,j}=P(\xi_i>t_j\mid X_i,Z_i)=1-\Phi\left(\frac{t_j-m_{x,z}}{s}\right)$.
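The truncated third-moment formula can also be checked against numerical integration (base R; `Eg` is a hypothetical helper name):

```r
## Closed-form E[g_j(xi) | X, Z] with g_j(xi) = (xi - t_j)^3 * 1{xi > t_j}
## for xi | X, Z ~ N(mxz, v).
Eg <- function(t, mxz, v) {
  s <- sqrt(v)
  p <- 1 - pnorm((t - mxz) / s)                 # p_{x,z,j}
  s / sqrt(2 * pi) * (2 * s^2 + (mxz - t)^2) *
    exp(-((mxz - t) / (s * sqrt(2)))^2) +
    (mxz - t) * ((mxz - t)^2 + 3 * s^2) * p
}

num <- integrate(function(x) (x - 0.5)^3 * dnorm(x, 1, 1.2),
                 lower = 0.5, upper = Inf)$value
c(closed_form = Eg(0.5, 1, 1.2^2), numerical = num)
```
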

Relaxing distributional assumptions

Let $G_i\sim\operatorname{multinom}(\pi)$ be a class indicator, $G_i\in\{1,\dots,K\}$, and $\xi_i$ the $q$-dimensional latent predictor from the mixture

$$\xi_i=\sum_{k=1}^{K}I(G_i=k)\,\xi_{ki}$$

where $\xi_{ki}=\widetilde{\alpha}_k+\widetilde{B}\xi_{ki}+\widetilde{\Gamma}Z_i+\widetilde{\zeta}_{ki}$ with $\widetilde{\zeta}_{ki}\sim\mathcal{N}(0,\widetilde{\Psi}_k)$. Assuming $G_i$ to be independent of $(\widetilde{\zeta}_{1i},\dots,\widetilde{\zeta}_{Ki})$ and $(\varepsilon_i,\widetilde{\varepsilon}_i)$,

$$\begin{aligned}E[\varphi(\xi_i)\mid X_i,Z_i]&=E\{E[\varphi(\xi_i)\mid X_i,Z_i,G_i]\mid X_i,Z_i\}\\&=\sum_{k=1}^{K}P(G_i=k\mid X_i,Z_i)\,E[\varphi(\xi_{ki})\mid X_i,Z_i,G_i=k]\\&=\sum_{k=1}^{K}P(G_i=k\mid X_i,Z_i)\,E[\varphi(\xi_{ki})\mid X_i,Z_i].\end{aligned}$$

The results can be extended to the case where $\pi$ depends on covariates. Under regularity conditions (Redner 1984) the MLE of the stage 1 mixture model is regular and asymptotically linear (RAL).
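For a scalar latent predictor and a single proxy $X=\xi+\varepsilon$, the mixture prediction reduces to a posterior-probability-weighted average of class-specific normal posterior means. A base-R sketch (`mixture_pred` and all values are hypothetical illustrations):

```r
## E(xi | X = x) under a K-class normal mixture prior for xi,
## with measurement X = xi + eps, Var(eps) = s2e.
mixture_pred <- function(x, prob, mu, tau2, s2e) {
  ## marginal density of X within class k is N(mu_k, tau2_k + s2e)
  w <- prob * dnorm(x, mu, sqrt(tau2 + s2e))  # prop. to P(G = k | X = x)
  w <- w / sum(w)
  lam <- tau2 / (tau2 + s2e)                  # class-specific shrinkage
  sum(w * (mu + lam * (x - mu)))              # weighted posterior means
}
mixture_pred(1.5, prob = c(0.6, 0.4), mu = c(-1, 2), tau2 = c(1, 1), s2e = 0.5)
```
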

Asymptotics

Theorem. Under a correctly specified model, 2SSEM yields consistent estimates of all parameters $\theta$ except the residual covariance, $\Psi$, of the latent variables in step 2.

Proof (intuitive version, via linear models and Berkson error):

Let $Y=\alpha+\beta X+\epsilon$ with Berkson error $X=W+U$ and $\operatorname{Cov}(W,U)=0$, and consider the regression $Y=\alpha+\beta W+\widetilde{\epsilon}$. Berkson error does not bias $\beta$, but the residual variance is too high. The theorem follows by noting that we have Berkson error in the second step, $\eta_i=\alpha+B\varphi(\xi_i)+\Gamma Z_i+\zeta_i$, since by iterated conditional means $\operatorname{Cov}\{E[\varphi(\xi_i)\mid X_i,Z_i],\,E[\varphi(\xi_i)\mid X_i,Z_i]-\varphi(\xi_i)\}=0$.
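The Berkson-error argument is easy to illustrate by simulation (base R, illustrative values): the slope is recovered, but the residual variance picks up the extra term $\beta^2\operatorname{Var}(U)$.

```r
## Berkson error: regress Y on the predicted exposure W instead of X.
set.seed(5)
n <- 1e5
W <- rnorm(n)                 # "predicted" exposure
U <- rnorm(n, sd = 0.7)       # Berkson error, independent of W
X <- W + U                    # true exposure
Y <- 1 + 2 * X + rnorm(n)     # true slope beta = 2, Var(eps) = 1
fit <- lm(Y ~ W)
c(slope  = unname(coef(fit)["W"]),   # approx 2 (unbiased)
  resvar = summary(fit)$sigma^2,     # approx 1 + 2^2 * 0.7^2 = 2.96, not 1
  target = 1 + 4 * 0.49)
```
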

Assume that we have i.i.d. observations $(Y_i,X_i,Z_i),\ i=1,\dots,n$, and restrict attention to the consistent parameter estimates, i.e., $\theta_2$ does not contain any of the parameters belonging to $\Psi$.

We will assume that the stage 1 model estimator is obtained as the solution to the following score equation:

$$U_1(\theta_1;X,Z)=\sum_{i=1}^{n}U_1(\theta_1;X_i,Z_i)=0,$$

and that

  1. The estimator of the stage 1 model is consistent, regular, and asymptotically linear.
  2. $U_2$ is twice continuously differentiable in a neighbourhood of the true (limiting) parameters $(\theta_{01}^t,\theta_{02}^t)^t$. Further, $n^{-1}\sum_{i=1}^{n}U_2(Y_i,X_i,Z_i;\theta_1,\theta_2)$ converges uniformly to $E[U_2(Y_i,X_i,Z_i;\theta_1,\theta_2)]$ in a neighbourhood of $(\theta_{01}^t,\theta_{02}^t)^t$,
  3. and, when evaluated at this point, $E[\nabla_{\theta_2}U_2]$ is positive definite.

This implies the following i.i.d. decomposition of the two-stage estimator

$$\begin{aligned}\sqrt{n}\,(\widehat{\theta}_2(\widehat{\theta}_1)-\theta_2)&=n^{-1/2}\sum_{i=1}^{n}\xi_2(Y_i,X_i,Z_i;\theta_2)\\&\quad+E[\nabla_{\theta_2}U_2(Y,X,Z;\theta_2,\theta_1)]^{-1}\,E[\nabla_{\theta_1}U_2(Y,X,Z;\theta_2,\theta_1)]\;n^{-1/2}\sum_{i=1}^{n}\xi_1(X_i,Z_i;\theta_1)+o_p(1)\\&=n^{-1/2}\sum_{i=1}^{n}\xi_3(Y_i,X_i,Z_i;\theta_2,\theta_1)+o_p(1),\end{aligned}$$

where $\xi_1$ and $\xi_2$ are the influence functions from the stage 1 and stage 2 models.

Implementation

The estimator is implemented in the lava R package. Loading the required packages:

  library(lava)
  library(magrittr)

Simulation

      f <- function(x) cos(1.25*x) + x - 0.25*x^2
      m <- lvm(x1+x2+x3 ~ eta1,
               y1+y2+y3 ~ eta2,
               eta1+eta2 ~ z,
               latent=~eta1+eta2)
      functional(m, eta2~eta1) <- f

      # Default: all parameter values are 1. Here we change the covariate effect on eta1
      d <- sim(m, n=200, seed=1, p=c('eta1~z'=-1))
      head(d)
          x1         x2         x3       eta1         y1         y2         y3
1 -1.3117148 -0.2758592  0.3891800 -0.6852610  0.6565382  2.8784121  0.1864112
2  1.6733480  3.1785780  3.3853595  1.4897047 -0.9867733  1.9512415  2.7624733
3  0.5661292  2.9883463  0.7987605  1.4017578  0.5694039 -1.2966555 -2.2827075
4  1.7946719 -0.1315167 -0.1914767  0.1993911  0.7221921  0.9447854 -1.3720646
5  0.3702222 -2.2445211 -0.3755076  0.0407144 -0.3144152  0.3546089  0.9828617
6 -2.8786355  0.4394945 -2.4338245 -2.0581671 -4.1310534 -5.6157543 -3.0456611
        eta2           z
1  1.7434470  0.34419403
2  0.8393097  0.01271984
3 -0.4258779 -0.87345013
4  0.7340538  0.34280028
5  0.2852132 -0.17738775
6 -3.9531055  0.92143325

Specification of stage 1

    m1 <- lvm(x1+x2+x3 ~ eta1, eta1 ~ z, latent = ~ eta1)

Specification of stage 2:

      m2 <- lvm() %>%
	  regression(y1+y2+y3 ~ eta2) %>%
	  regression(eta2 ~ z) %>%
	  latent(~ eta2) %>%
	  nonlinear(type="quadratic", eta2 ~ eta1)

Estimation:

(mm <- twostage(m1, m2, data=d))
                    Estimate Std. Error  Z-value   P-value
Measurements:
   y2~eta2           0.98628    0.04612 21.38536    <1e-12
   y3~eta2           0.97614    0.05203 18.76166    <1e-12
Regressions:
   eta2~z            1.08890    0.17735  6.13990 8.257e-10
   eta2~eta1_1       1.13932    0.15548  7.32770    <1e-12
   eta2~eta1_2      -0.38404    0.05770 -6.65582 2.817e-11
Intercepts:
   y2               -0.09581    0.11350 -0.84413    0.3986
   y3                0.01440    0.10849  0.13273    0.8944
   eta2              0.50414    0.17550  2.87264  0.004071
Residual Variances:
   y1                1.27777    0.18980  6.73206
   y2                1.02924    0.13986  7.35895
   y3                0.82589    0.14089  5.86181
   eta2              1.94918    0.26911  7.24305
  pf <- function(p) p["eta2"]+p["eta2~eta1_1"]*u + p["eta2~eta1_2"]*u^2
  plot(mm,f=pf,data=data.frame(u=seq(-2,2,length.out=100)),lwd=2)