Regression models for the relative risk
$ \newcommand{\pr}{\mathbb{P}}\newcommand{\E}{\mathbb{E}} $ Relative risks (and risk differences) are collapsible and generally considered easier to interpret than odds-ratios. In a recent publication Richardson et al (JASA, 2017) proposed a new regression model for a binary exposure which solves the computational problems that are associated with using for example binomial regression with a log-link function (or identify link for the risk difference) to obtain such parameter estimates.
Let \(Y\) be the binary response, \(A\) binary exposure, and \(V\) a vector of covariates, then the target parameter is
\begin{align*} &\mathrm{RR}(v) = \frac{\pr(Y=1\mid A=1, V=v)}{\pr(Y=1\mid A=0, V=v)}. \end{align*}
Let \(p_a(V) = \pr(Y \mid A=a, V), a\in\{0,1\}\), the idea is then to posit a linear model for \[ \theta(v) = \log \big(RR(v)\big) \] and a nuisance model for the odds-product \[ \phi(v) = \log\left(\frac{p_{0}(v)p_{1}(v)}{(1-p_{0}(v))(1-p_{1}(v))}\right) \] noting that these two parameters are variation independent which can be from the below L’Abbé plot. Similarly, a model can be constructed for the risk-difference on the following scale \[\theta(v) = \mathrm{arctanh} \big(RD(v)\big).\]
p0 <- seq(0,1,length.out=100)
p1 <- function(p0,op) 1/(1+(op*(1-p0)/p0)^-1)
plot(0, type="n", xlim=c(0,1), ylim=c(0,1),
xlab="P(Y=1|A=0)", ylab="P(Y=1|A=1)", main="Constant odds product")
for (op in exp(seq(-6,6,by=.25))) lines(p0,p1(p0,op), col="lightblue")
logrr <- 0.25; abline(a=0, b=exp(logrr), col=1, lty=2) # Add line \(\log(\mathrm{RR})=0.25\)
The model can be specified in
R with
lava
using the following syntax
library("lava")
m <- lvm(a ~ x+z,
logrr ~ 1,
nuisance ~ x + z)
m <- binomial.rr(m, y ~ a | logrr | nuisance)
## Alternative syntax:
## binomial.rr(m, response="y", exposure="a", target.model="logrr", nuisance.model="nuisance")
m
Binomial regression (exposure | relative-risk | odds-product) :
y ~ a | logrr | nuisance
Latent Variable Model
a ~ x+z binomial(logit)
logrr ~ 1 deterministic
nuisance ~ x+z deterministic
Exogenous variables:
x gaussian
z gaussian
We can then simulate from the model
d <- lava::sim(m, n=5e4, seed=1, p=c(logrr=1))
head(d)
a x z logrr nuisance y
1 1 -0.6264538 0.5258908 1 -0.10056299 1
2 0 0.1836433 -0.4875444 1 -0.30390103 0
3 0 -0.8356286 1.1382508 1 0.30262217 1
4 1 1.5952808 1.2151344 1 2.81041518 1
5 0 0.3295078 -0.4248307 1 -0.09532293 0
6 0 -0.8204684 -1.4508403 1 -2.27130867 0
A double-robust estimator can be constructed by considering both a
model for the exposure as a function of the auxillary covariates as
well as a model for the nuisance parameter. This estimator (and other
double-robust estimators) can be applied using a newly
developed library targeted
in R (and python)
library(targeted)
riskreg(y ~ a | 1 | x+z | x+z, data=d, type="rr")
riskreg(formula = y ~ a | 1 | x + z | x + z, data = d, type = "rr")
Estimate Std.Err 2.5% 97.5% P-value
(Intercept) 1.005 0.01327 0.9787 1.031 0
It is also possible to examine the interaction effect between the exposure and a covariate:
riskreg(y ~ a | x | x+z | x+z, data=d, type="rr")
riskreg(formula = y ~ a | x | x
z | x
z, data = d, type = "rr")
Estimate Std.Err 2.5% 97.5% P-value
(Intercept) 1.000487 0.01323 0.97456 1.02642 0.0000
x 0.001409 0.01527 -0.02853 0.03134 0.9265
The double-robustness is here illustrated with a misspecified nuissance model (odds-product)
summary(riskreg(y ~ a | 1 | 1 | x+z, data=d, type="rr"))
riskreg(formula = y ~ a | 1 | 1 | x+z, data = d, type = "rr")
Relative risk model
Response: y
Exposure: a
Estimate Std.Err 2.5% 97.5% P-value
log(RR):
(Intercept) 1.005273 0.01439 0.97707 1.03348 0.000e+00
log(OP):
(Intercept) 0.142626 0.02187 0.09976 0.18549 6.978e-11
logit(Pr):
(Intercept) 0.001136 0.01094 -0.02030 0.02257 9.173e-01
x 1.004365 0.01366 0.97758 1.03115 0.000e+00
z 1.004225 0.01372 0.97734 1.03111 0.000e+00
or a misspecified model for the exposure
summary(riskreg(y ~ a | 1 | x+z | 1, data=d, type="rr"))
riskreg(formula = y ~ a | 1 | x+z | 1, data = d, type = "rr")
Relative risk model
Response: y
Exposure: a
Estimate Std.Err 2.5% 97.5% P-value
log(RR):
(Intercept) 1.004493 0.013201 0.97862 1.03037 0.0000
log(OP):
(Intercept) 0.003391 0.021968 -0.03967 0.04645 0.8773
x 1.003925 0.023428 0.95801 1.04984 0.0000
z 1.007712 0.023480 0.96169 1.05373 0.0000
logit(Pr):
(Intercept) -0.002000 0.008944 -0.01953 0.01553 0.8231