influence functions

Influence Function Calculus

Influence functions (IFs), also known as influence curves or canonical gradients, are essential for characterizing regular and asymptotically linear estimators. They enable direct calculation of properties such as the asymptotic variance, and they facilitate the construction of new estimators through straightforward combinations and transformations.
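The defining property can be sketched in a few lines (standard notation; here φ denotes the influence function of the estimator):

```latex
\sqrt{n}\,\bigl(\hat\theta_n - \theta_0\bigr)
  = \frac{1}{\sqrt{n}} \sum_{i=1}^{n} \varphi(Z_i) + o_P(1),
  \qquad \operatorname{E}\bigl[\varphi(Z)\bigr] = 0,
```

so by the central limit theorem the estimator is asymptotically Gaussian with variance E[φ(Z)φ(Z)ᵀ], and a smooth transformation f(θ̂ₙ) is again regular and asymptotically linear with influence function ∇f(θ₀)φ by the delta method.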

September 2025 · Klaus Kähler Holst
targeted inference

Targeted Inference `targeted`

The `targeted` package implements various methods for targeted learning and semiparametric inference, including augmented inverse probability weighted (AIPW) estimators for missing data and causal inference (Bang and Robins (2005) <doi:10.1111/j.1541-0420.2005.00377.x>), variable importance and conditional average treatment effects (CATE) (van der Laan (2006) <doi:10.2202/1557-4679.1008>), estimators for risk differences and relative risks (Richardson et al. (2017) <doi:10.1080/01621459.2016.1192546>), and assumption-lean inference for generalized linear model parameters (Vansteelandt et al. (2022) <doi:10.1111/rssb.12504>).

September 2025 · Klaus Kähler Holst
lorenz attractor

ODEs with the targeted package using external pointers

Mathematical and statistical software often relies on sequential computations. Examples are likelihood evaluations, where it is typically necessary to loop over the rows of the data, or solving ordinary differential equations, where numerical approximations are based on looping over the evolving time. In high-level languages such as R or Python such calculations can be very slow unless the algorithms can be vectorized. Fortunately, it is straightforward to write the implementations in C/C++ and subsequently create an interface to R and Python (Rcpp and pybind11). ...

February 2021 · Klaus Kähler Holst
nonlinear sem

Nonlinear latent variable models

ML inference in non-linear SEMs is complex. Computationally intensive methods based on numerical integration are needed, and results are sensitive to distributional assumptions. In a recent paper, A two-stage estimation procedure for non-linear structural equation models by Klaus Kähler Holst & Esben Budtz-Jørgensen (https://doi.org/10.1093/biostatistics/kxy082), we consider two-stage estimators as a computationally simple alternative to MLE. Both steps are based on linear models: first we predict the non-linear terms, and then these are related to the latent outcomes in the second step. ...

June 2020 · Klaus Kähler Holst
solver class

A simple ODE Class

A small illustration of using the Armadillo C++ linear algebra library for solving an ordinary differential equation of the form X′(t) = F(t, X(t), U(t)). The abstract super class `Solver` defines the methods `solve` (for approximating the solution in user-defined time points) and `solveint` (for interpolating user-defined input functions on a finer grid). As an illustration a simple Runge–Kutta solver is derived in the class `RK4`. The first step is to define the ODE, here a simple one-dimensional ODE X′(t) = θ{U(t) − X(t)} with a single input U(t):

```cpp
rowvec dX(const rowvec &input,   // time (first element) and additional input variables
          const rowvec &x,       // state variables
          const rowvec &theta) { // parameters
  rowvec res = { theta(0)*theta(1)*(input(1)-x(0)) };
  return res;
}
```

The ODE may then be solved using the following syntax

```cpp
odesolver::RK4 MyODE(dX);
arma::mat res = MyODE.solve(input, init, theta);
```

with the step size defined implicitly by `input` (the first column is the time variable and the following columns the optional input variables) and the boundary conditions defined by `init`. ...

November 2019 · Klaus Kähler Holst
relative risk

Regression models for the relative risk

Relative risks (and risk differences) are collapsible and generally considered easier to interpret than odds ratios. In a recent publication, Richardson et al. (JASA, 2017) proposed a new regression model for a binary exposure which solves the computational problems associated with using, for example, binomial regression with a log link (or identity link for the risk difference) to obtain such parameter estimates. Let Y be the binary response, A the binary exposure, and V a vector of covariates; the target parameter is then RR(v) = P(Y=1 | A=1, V=v) / P(Y=1 | A=0, V=v). Letting p_a(v) = P(Y=1 | A=a, V=v), a ∈ {0,1}, the idea is to posit a linear model for θ(v) = log RR(v) and a nuisance model for the log odds-product ϕ(v) = log[ p0(v)p1(v) / ((1−p0(v))(1−p1(v))) ], noting that these two parameters are variation independent, as can be seen from the L'Abbé plot below. Similarly, a model can be constructed for the risk difference on the scale θ(v) = arctanh(RD(v)). ...

August 2019 · Klaus Kähler Holst