Package {multipleOutcomes}


Title: Joint Covariance and Treatment-Effect Tests for Multiple Outcomes
Version: 0.18.1
Description: Fits generalized linear models, Cox proportional-hazards models, log-rank tests, generalized estimating equations, mixed models with repeated measures, Kaplan-Meier curves, quantile differences, and hierarchical net-benefit (win-difference) and log win-ratio statistics jointly across multiple endpoints, and returns the full asymptotic covariance matrix linking them. Implements PATED (Prognostic Assisted Treatment Effect Detection), a randomized-trial method that exploits balanced prognostic covariates to tighten standard errors and increase statistical power without introducing bias.
URL: https://github.com/zhangh12/multipleOutcomes, https://zhangh12.github.io/multipleOutcomes/
BugReports: https://github.com/zhangh12/multipleOutcomes/issues
License: MIT + file LICENSE
Encoding: UTF-8
Imports: dplyr, ggplot2, ggpubr, mmrm (≥ 0.3.15), mvtnorm, rlang, sandwich, stringr, survival, tidyr, tidyselect
Suggests: asaur, coin, iBST, invGauss, JM, knitr, momentfit, numDeriv, pec, rmarkdown, testthat (≥ 3.0.0)
Depends: R (≥ 3.5)
LazyData: true
Config/roxygen2/version: 8.0.0
RoxygenNote: 7.3.3
NeedsCompilation: yes
Packaged: 2026-06-15 07:11:49 UTC; zhhan
Author: Han Zhang [aut, cre]
Maintainer: Han Zhang <zhangh.ustc@gmail.com>
Repository: CRAN
Date/Publication: 2026-06-15 07:40:02 UTC

ACTG 320 Clinical Trial Dataset

Description

actg dataset from Hosmer et al.

Format

A data frame

id

Identification Code

time

Time to AIDS diagnosis or death (days).

censor

Event indicator. 1 = AIDS defining diagnosis, 0 = Otherwise.

time_d

Time to death (days)

censor_d

Event indicator for death (only). 1 = Death, 0 = Otherwise.

tx

Treatment indicator. 1 = Treatment includes IDV, 0 = Control group.

txgrp

Treatment group indicator. 1 = ZDV + 3TC. 2 = ZDV + 3TC + IDV. 3 = d4T + 3TC. 4 = d4T + 3TC + IDV.

strat2

CD4 stratum at screening. 0 = CD4 <= 50. 1 = CD4 > 50.

sex

0 = Male. 1 = Female.

raceth

Race/Ethnicity. 1 = White Non-Hispanic. 2 = Black Non-Hispanic. 3 = Hispanic. 4 = Asian, Pacific Islander. 5 = American Indian, Alaskan Native. 6 = Other/unknown.

ivdrug

IV drug use history. 1 = Never. 2 = Currently. 3 = Previously.

hemophil

Hemophiliac. 1 = Yes. 0 = No.

karnof

Karnofsky Performance Scale. 100 = Normal; no complaint no evidence of disease. 90 = Normal activity possible; minor signs/symptoms of disease. 80 = Normal activity with effort; some signs/symptoms of disease. 70 = Cares for self; normal activity/active work not possible.

cd4

Baseline CD4 count (Cells/Milliliter).

priorzdv

Months of prior ZDV use (months).

age

Age at Enrollment (years).

Source

ftp://ftp.wiley.com/public/sci_tech_med/survival

References

Hosmer, D.W. and Lemeshow, S. and May, S. (2008) Applied Survival Analysis: Regression Modeling of Time to Event Data: Second Edition, John Wiley and Sons Inc., New York, NY

Examples

data(actg)


Extract Model Coefficients

Description

coef is a generic function.

Usage

## S3 method for class 'jointCovariance'
coef(object, model_index = NULL, ...)

Arguments

object

an object returned by jointCovariance().

model_index

NULL if displaying coefficients of all fitted models; otherwise, an integer indicating the fitted model.

...

for debugging only

Value

a vector of coefficient estimates


Creating Objects of Proportional Hazards Regression Model

Description

coxph_ is a wrapper function of survival::coxph to create an object to be passed into jointCovariance, the main function of this package through its argument .... The object defines how a proportional hazard model would be fitted.

Usage

coxph_(formula, data_index = 1)

Arguments

formula

see formula in survival::coxph.

data_index

integer. Index of the data frame in the data argument of jointCovariance to be used when fitting a proportional hazards model.

Details

Not all arguments of survival::coxph are supported in coxph_ due to the complexity in handling environment and scope, which is particularly difficult for arguments like weights, subset, etc.


Creating Objects of Generalized Estimation Equation Model

Description

gee_ is a wrapper function of gee::gee to create an object to be passed into jointCovariance, the main function of this package through its argument .... The object defines how a GEE model would be fitted.

This package does not import the package gee. Instead, codes of gee are modified and integrated to compute score and information matrix. Thus, users does not need to install the package gee to use this package.

Usage

gee_(formula, family, corstr, R = NULL, b = NULL, Mv = 1, data_index = 1)

Arguments

formula

see formula in gee::gee.

family

see family in gee::gee.

corstr

see corstr in gee::gee.

R

see R in gee::gee.

b

see b in gee::gee.

Mv

see Mv in gee::gee.

data_index

integer. Index of the data frame in the data argument of jointCovariance to be used when fitting a GEE model.

Details

Not all arguments of stats::gee are supported in gee_ due to the complexity in handling environment and scope, which is particularly difficult for arguments like subset, etc.


Creating Objects of Generalized Linear Models

Description

glm_ is a wrapper function of stats::glm to create an object to be passed into jointCovariance, the main function of this package through its argument .... The object defines how a GLM model would be fitted.

Usage

glm_(formula, family, data_index = 1)

Arguments

formula

see formula in stats::glm.

family

currently supports "gaussian" or "binomial". Other families are under testing.

data_index

integer. Index of the data frame in the data argument of jointCovariance to be used when fitting a generalized linear model.

Details

Not all arguments of stats::glm are supported in glm_ due to the complexity in handling environment and scope, which is particularly difficult for arguments like weights, subset, etc.


Rectal Indomethacin for Prevention of Post-ERCP Pancreatitis

Description

Patient-level data from a multi-center, randomized, double-blind, placebo-controlled 2-arm trial (n = 602) of rectal indomethacin (100 mg) versus placebo to prevent post-ERCP pancreatitis in high-risk patients, as reported by Elmunzer, Higgins, et al. (2012) in the New England Journal of Medicine.

This dataset was originally collected, cleaned, reformatted, and released for public teaching and research use by Dr. Peter D. R. Higgins in the medicaldata R package as indo_rct. The version shipped here is redistributed in support of the worked examples in this package. The variable definitions below follow medicaldata; note that a few columns are stored as numeric (rather than factor) in this copy. Users who need the authoritative copy or accompanying documentation should consult medicaldata.

Format

A data frame with 602 observations on the following 33 variables:

id

Subject identifier (numeric); leading digit indicates center. Range 1001–4003.

site

Study site (factor, 4 levels): 1_UM = University of Michigan, 2_IU = Indiana University, 3_UK = University of Kentucky, 4_Case = Case Western Reserve University.

age

Age in years (numeric), range 19–90.

risk

Risk score for post-ERCP pancreatitis (numeric), range 1–5.5.

gender

Sex (factor): 1_female, 2_male.

outcome

Primary outcome: post-ERCP pancreatitis (numeric, 1 = yes, 0 = no).

sod

Sphincter of Oddi dysfunction present (factor): 0_no, 1_yes.

pep

History of prior post-ERCP pancreatitis (factor): 0_no, 1_yes.

recpanc

History of recurrent pancreatitis (factor): 0_no, 1_yes.

psphinc

Pancreatic sphincterotomy performed (factor): 0_no, 1_yes.

precut

Sphincter pre-cut needed to enter papilla (factor): 0_no, 1_yes.

difcan

Cannulation of papilla was difficult (factor): 0_no, 1_yes.

pneudil

Pneumatic dilation of papilla performed (factor): 0_no, 1_yes.

amp

Ampullectomy performed (factor): 0_no, 1_yes.

paninj

Contrast injected into pancreas (factor): 0_no, 1_yes.

acinar

Pancreas appeared to have acinarization on imaging (factor): 0_no, 1_yes.

brush

Brushings taken from pancreatic duct (factor): 0_no, 1_yes.

asa81

Aspirin used at 81 mg per day (factor with 3 levels): 0_no, 1_yes, and a third level retained from the source coding.

asa325

Aspirin used at 325 mg per day (factor with 3 levels): 0_no, 1_yes, and a third level retained from the source coding.

asa

Aspirin used at any dose (factor with 3 levels): 0_no, 1_yes, and a third level retained from the source coding.

prophystent

Pancreatic duct stent placed per endoscopist judgment (factor): 0_no, 1_yes.

therastent

Pancreatic duct stent placed to treat narrowing (factor): 0_no, 1_yes.

pdstent

Pancreatic duct stent placed for any reason (factor): 0_no, 1_yes.

sodsom

Sphincter of Oddi manometry performed (factor): 0_no, 1_yes.

bsphinc

Biliary sphincterotomy performed (factor): 0_no, 1_yes.

bstent

Biliary stent placed to relieve obstruction (factor): 0_no, 1_yes.

chole

Choledocholithiasis present (factor): 0_no, 1_yes.

pbmal

Biliary duct or pancreatic malignancy found (factor): 0_no, 1_yes.

train

Trainee participated in ERCP (factor): 0_no, 1_yes.

status

Patient status (factor): 0_inpatient, 1_outpatient.

type

Sphincter of Oddi dysfunction type (factor): 0_no SOD, 1_type 1, 2_type 2, 3_type 3.

rx

Treatment assignment (numeric, 1 = indomethacin, 0 = placebo).

bleed

Reportable gastrointestinal bleeding (numeric, coded 1 = no, 2 = yes; NA when not assessed).

Source

Higgins, P. D. R. medicaldata: Data Package for Medical Datasets. R package, dataset indo_rct. https://CRAN.R-project.org/package=medicaldata

References

Elmunzer BJ, Higgins PDR, Saini SD, et al. A randomized trial of rectal indomethacin to prevent post-ERCP pancreatitis. New England Journal of Medicine 2012; 366(15):1414–1422. doi:10.1056/NEJMoa1111103

Examples

data(indo)


Fitting Regression Models for Multiple Outcomes and Returning the Matrix of Covariance

Description

jointCovariance can fit different types of models for multiple outcomes simultaneously and return model parameters and variance-covariance matrix for further analysis.

Usage

jointCovariance(..., data, nboot = 0, compute_cov = TRUE, seed = NULL)

Arguments

...

objects returned by glm_(), coxph_(), logrank_(), gee_(), mmrm_(), km_(), quantile_(), netbenefit_(), or winratio_().

data

a data frame if all models are fitted on the same dataset; otherwise a list of data frames for fitting models in .... Note that a dataset can be used to fit multiple models, thus, length(data) is unnecessary to be equal to the number of models in .... The row names in a data frame are not treated as subject IDs. Instead, all data frame should consist of a column pid as subject IDs. For any two records in different data frames that correspond to the same subject, their values in pid should be consistent. For data frames to be used by GEE, pid defines clusters. See id and the Details section of gee:gee.

nboot

non-zero integer if bootstrap is adopted. By default 0.

compute_cov

logic. If TRUE and nboot > 0, empirical covariance matrix is computed using bootstrap estimate and returned. Bootstrap estimate will be abandoned. If FALSE, bootstrap estimate will be returned and no empirical covariance matrix is computed.

seed

random seed when generate bootstrap data.

Value

It returns an object of class "jointCovariance", which is a list containing the following components:

coefficients an unnamed vector of coefficients of all fitted models. Use id_map for variable mapping.
mcov a unnamed matrix of covariance of coefficients. Use id_map for variable mapping.
id_map a list mapping the elements in coefficients and mcov to variable names.
n_shared_sample_sizes a matrix of shared sample sizes between datasets being used to fit the models.
call the matched call.

Examples

## More examples can be found in the vignettes.
library(survival)
library(mvtnorm)
library(tidyr)
genData <- function(seed = NULL){
  
  set.seed(seed)
  n <- 300
  sigma <- matrix(.7, 4, 4)
  diag(sigma) <- 1
  v <- rmvnorm(n, sigma = sigma)
  x1 <- v[, 1]
  x2 <- v[, 2]
  z1 <- (v[, 3] > 0) + 0
  z2 <- v[, 4]
  
  trt <- rbinom(n, 1, .5)
  
  bet <- c(-.3,.3)
  y <- -log(runif(n))/
    exp(-.3 * x1 + .3 * x2 + z1 * .5 - z2 * .3 + .1 * trt + rnorm(n))
  
  z1[sample.int(n, 50)] <- NA
  z2[sample.int(n, 50)] <- NA
  x1[sample.int(n, 50)] <- NA
  x2[sample.int(n, 50)] <- NA
  death <- ifelse(y > 2, 0, 1)
  y[y > 2] <- 2
  
  pid <- paste0('pid-', 1:n)
  ret <- data.frame(
    y = y, trt = trt, 
    z1 = z1, z2 = z2, 
    x1 = x1, x2 = x2, 
    death, pid)
  ret
}

dat1 <- genData()

## create a dataset with repeated measurements x
dat2 <- dat1 %>% pivot_longer(c(x1, x2), names_to='visit', values_to='x') %>% 
  dplyr::select(x, trt, visit, pid) %>% as.data.frame()

dat2$visit <- as.factor(dat2$visit)
dat2$pid <- as.factor(dat2$pid)

fit <- jointCovariance(
  coxph_(Surv(time = y, event = death) ~ trt, data_index = 1),
  logrank_(Surv(time = y, event=death) ~ trt, data_index = 1),
  glm_(z1 ~ trt, family = 'binomial', data_index = 1), 
  glm_(z2 ~ trt, family = 'gaussian', data_index = 1), 
  mmrm_(x ~ trt + us(visit | pid), reml = TRUE, data_index = 2), 
  gee_(x ~ trt, family = 'gaussian', corstr = 'independence', data_index = 2), 
  data = list(dat1, dat2))

fit


bfit <-jointCovariance(
  coxph_(Surv(time=y, event=death) ~ trt, data_index = 1),
  logrank_(Surv(time=y, event=death) ~ trt, data_index = 1),
  glm_(z1 ~ trt, family = 'binomial', data_index = 1),
  glm_(z2 ~ trt, family = 'gaussian', data_index = 1),
  mmrm_(x ~ trt + us(visit | pid), reml = TRUE, data_index = 2),
  gee_(x ~ trt, family = 'gaussian', corstr = 'independence', data_index = 2),
  data = list(dat1, dat2), nboot = 10)

summary(bfit)

## km_() and quantile_() require nboot > 0 because they have no
## closed-form score. compute_cov is forced to FALSE for km_().
## When all models share one dataset, `data_index` and `list(...)`
## can be omitted.
kfit <- jointCovariance(
  km_(Surv(time = y, event = death) ~ trt, conf_type = 'log',
      times = c(0.5, 1, 1.5)),
  glm_(z1 ~ trt, family = 'binomial'),
  data = dat1, nboot = 30, seed = 1)

qfit <- jointCovariance(
  quantile_(y ~ trt, probs = c(0.25, 0.5, 0.75)),
  glm_(z2 ~ trt, family = 'gaussian'),
  data = dat1, nboot = 30, seed = 1)



Creating Objects of Kaplan-Meier Curve

Description

km_ is a wrapper function creating an object of Kaplan-Meier curve to be passed into jointCovariance, the main function of this package through its argument .... The object defines how a Kaplan-Meier curve would be fitted.

Usage

km_(formula, times = NULL, conf_type, data_index = 1)

Arguments

formula

a formula created by survival::Surv().

times

numeric vector of time. Survival probabilities at times are computed (with g-transformation defined by conf_type).

conf_type

character. Type of confidence interval. It must be one of "log", "log-log", "plain", "logit", or "arcsin".

data_index

integer. Index of the data frame in the data argument of jointCovariance to be used when fitting a generalized linear model.

Details

Usually, g-transformation is applied to the survival probability S(t) to obtain pointwise confidence interval of a Kaplan-Meier curve. This can be achieved by specifying conf_type. For identity transformation, use conf_type = "plain".

This function can only be used with jointCovariance when the bootstrap method is used to estimate variance-covariance matrix of multiple outcome models.


Creating Objects of Logrank Test

Description

logrank_ is a wrapper function survival::coxph to create an object to be passed into jointCovariance, the main function of this package through its argument .... Logrank test is the score test under the proportional hazards regression model. The object defines how a logrank test would be computed.

Usage

logrank_(formula, ties = c("efron", "breslow", "exact"), data_index = 1)

Arguments

formula

see formula in survival::coxph.

ties

character string specifying the method for tie handling. One of "efron" (default), "breslow", or "exact". Passed through to survival::coxph.

data_index

integer. Index of the data frame in the data argument of jointCovariance to be used when computing testing statistic of logrank test.

Details

Not all arguments of survival::coxph are supported in logrank_ due to the complexity in handling environment and scope, which is particularly difficult for arguments like weights, subset, etc.


Creating Objects of Mixed Models for Repeated Measures

Description

mmrm_ is a wrapper function of mmrm::mmrm to create an object to be passed into jointCovariance, the main function of this package through its argument .... The object defines how a MMRM model would be fitted.

Usage

mmrm_(
  formula,
  covariance = NULL,
  reml = TRUE,
  control = mmrm::mmrm_control(...),
  ...,
  data_index = 1
)

Arguments

formula

see formula in mmrm::mmrm.

covariance

see covariance in mmrm::mmrm.

reml

see reml in mmrm::mmrm.

control

see control in mmrm::mmrm.

...

see ... in mmrm::mmrm.

data_index

integer. Index of the data frame in the data argument of jointCovariance to be used when fitting a GEE model.

Details

The argument weights of mmrm::mmrm is supported in mmrm_ due to the complexity in handling environment and scope.

Please always refer to help document of mmrm::mmrm before using mmrm_. For example, time variable and observation ID must be factor variables in some cases, otherwise error may be prompted. Users can call mmrm::mmrm using the same arguments being passed to mmrm_ to check validity.


Endpoint Constructors for netbenefit_()

Description

These helpers build the individual endpoint specifications that netbenefit_() consumes through its endpoints argument. Each call returns a small list describing one comparison rule; the order in which they appear in the endpoints list defines the hierarchical priority (first = highest).

Usage

nb_tte(
  time,
  event,
  direction = c("longer_better", "shorter_better"),
  margin = 0,
  censor_rule = c("informative", "ignore")
)

nb_continuous(
  value,
  direction = c("larger_better", "smaller_better"),
  margin = 0
)

nb_binary(value, direction = c("larger_better", "smaller_better"))

Arguments

time

character. Name of the time column in data.

event

character. Name of the event-indicator column in data (1 = event observed, 0 = censored).

direction

one of "longer_better" / "shorter_better" (nb_tte()) or "larger_better" / "smaller_better" (nb_continuous(), nb_binary()).

margin

non-negative numeric. Minimum directional gap required to declare a winner on this endpoint. Defaults to 0.

censor_rule

(nb_tte() only) one of "informative" / "ignore".

value

character. Name of the outcome column in data.

Value

An object of class c("nb_endpoint_<type>", "nb_endpoint") carrying the rule and its arguments.

Direction

For each endpoint type, direction says whether a larger value (or longer time, for TTE) counts as a win for the subject. With the default convention, a treatment subject "wins" the endpoint when its value, possibly minus a margin, exceeds the control subject's value. Flip the default for endpoints where smaller (or shorter) is better (e.g., pain scores; time to symptom relief).

Margin

margin is a non-negative clinically meaningful difference threshold. A pair only declares a winner when the directional gap strictly exceeds margin; otherwise the pair ties at this endpoint and falls through to the next one. nb_binary() does not expose margin because binary values admit only two outcomes.

Censoring

nb_tte() accepts a censor_rule argument. The default "informative" matches the conventional Pocock/Buyse rule: if one subject is censored and the other observed, a winner can only be declared in the direction the censoring is uninformative about (i.e., the censored time is already past the event time). "ignore" drops pairs where either subject is censored and treats them as ties at this level.

See Also

netbenefit_()


Creating Objects of Hierarchical Net-Benefit (Win-Difference) Statistic

Description

netbenefit_ creates an object to be passed into jointCovariance or pated through its ... argument. The object defines a hierarchical net-benefit (a.k.a. win-difference, proportion-in-favor) statistic across a list of endpoints, each declared by nb_tte(), nb_continuous(), or nb_binary().

Usage

netbenefit_(formula, endpoints, data_index = 1)

Arguments

formula

a two-sided R formula ⁠<label> ~ arm⁠. The LHS is used purely as a row label in pated() output; the RHS must contain exactly one variable, the column in data holding treatment-arm assignment.

endpoints

a non-empty list of endpoint specs built by nb_tte(), nb_continuous(), or nb_binary(). The order of the list encodes the hierarchical priority: endpoint 1 is checked first; if it ties, the pair falls through to endpoint 2; and so on.

data_index

integer. Index of the data frame in the data argument of jointCovariance to be used.

Details

The estimator is

\widehat\Delta = \frac{N_W - N_L}{N_W + N_L + N_T}

where N_W, N_L, and N_T are the numbers of treatment wins, losses, and overall ties across all n_C \times n_T pairs (control vs. treatment subject). The per-subject influence function is available in closed form, so both the asymptotic and the bootstrap paths of jointCovariance are supported.

The arm reference level is inferred from the arm column the same way model.matrix(~ arm) would: levels(arm)[1] for factor, the smaller value for numeric or logical, and the alphabetically first value for character. To override, convert the column to a factor with the desired level order before calling netbenefit_().

Value

An object of class c("jc_spec_netbenefit", "jc_spec").

See Also

nb_tte(), nb_continuous(), nb_binary(), jointCovariance(), pated().


OAK Phase III Trial of Atezolizumab vs Docetaxel in NSCLC

Description

Patient-level data (primary intention-to-treat population, n = 850) from OAK, an international, open-label, randomized Phase III trial (NCT02008227) comparing the anti-PD-L1 antibody atezolizumab (MPDL3280A, 1200 mg IV every 3 weeks) with docetaxel (75 mg/m^2 IV every 3 weeks) as second- or third-line therapy in patients with previously treated locally advanced or metastatic non-small-cell lung cancer (NSCLC). Patients were randomized 1:1 to the two arms.

The variables retained here are those released in Supplementary Table 8 of Gandara et al. (2018), Nature Medicine, who retrospectively assayed banked baseline plasma samples for blood-based tumor mutational burden (bTMB) and reported the biomarker-evaluable analyses on this primary ITT cohort. The shipped copy applies the following purely cosmetic transformations to the source: source variable names lower-cased; categorical values lower-cased; the string "." (the source's missing code) converted to NA; numeric-looking character columns coerced to numeric; and the event indicators inverted so that 1 = event, 0 = censored (the source uses the opposite convention). Categorical variables are stored as character vectors (not factors); convert to factor on demand with factor() if a modeling function requires it. No rows are dropped: the full primary ITT population is included.

Authoritative reference. This documentation is a convenience summary; users should consult the original publication and its supplementary materials (Gandara et al. 2018; Rittmeyer et al. 2017) for the definitive description of the trial design, eligibility criteria, endpoint definitions, and assay protocol.

Format

A data frame with 850 observations on 27 variables:

id

Anonymized patient identifier; integer 301–1150. (identifier)

arm

Randomized treatment assignment, character: "docetaxel" or "atezolizumab". (treatment assignment)

pfs

Progression-free survival, months (numeric). (post-randomization outcome)

pfs_event

PFS event indicator, integer: 1 = progression or death, 0 = censored. Inverted from the source's PFS.CNSR. (post-randomization outcome)

os

Overall survival, months (numeric). (post-randomization outcome)

os_event

OS event indicator, integer: 1 = death, 0 = censored. Inverted from the source's OS.CNSR. (post-randomization outcome)

bcor

Best confirmed overall response per investigator (RECIST 1.1), character: "CR", "PR", "SD", "PD", "NE"; NA when not assessable. (post-randomization outcome)

btmb

Blood-based tumor mutational burden: count of somatic base substitutions on a 394-gene, \sim1.1 Mb panel of cell-free DNA from the baseline plasma sample (numeric); NA when the assay was not run or failed QC. (baseline characteristic; predictive of atezolizumab benefit at btmb \ge 16; not strongly prognostic in itself)

msaf

Maximum somatic allele frequency in the baseline cfDNA sample (numeric, 0–1); a tumor-content / shedding surrogate. (baseline characteristic; weakly prognostic)

cfdna_ng

cfDNA mass entering NGS library construction (nanograms, numeric, capped at 100 ng). (baseline characteristic; not prognostic — assay input)

coverage

Median exon sequencing coverage (numeric). (baseline characteristic; not prognostic — assay quality)

qc

bTMB assay QC status, character: "pass", "fail"; NA when not run. (baseline characteristic; not prognostic — assay quality)

bep

Logical flag for the biomarker-evaluable population used in the bTMB analyses: coverage >= 800 & qc == "pass" & msaf >= 0.01 (n = 642 of 850). (baseline derived flag; not prognostic in itself)

age

Baseline age, years (numeric); range 33–85. (baseline characteristic; modestly prognostic)

sex

Sex, character: "F", "M". (baseline characteristic; weakly prognostic; correlates with smoking and bTMB)

race

Race, character: "white", "asian", "other". (baseline characteristic; not robustly prognostic in this setting)

histology

Tumor histology, character: "non-squamous", "squamous". (baseline characteristic; modestly prognostic)

ecog

ECOG performance status at baseline, integer (0 or 1). (baseline characteristic; strongly prognostic for PFS and OS)

prior_tx

Number of prior lines of systemic therapy, integer (1 or 2). (baseline characteristic; prognostic — patients with 2 prior lines tend to do worse)

smoking

Smoking history, character: "never", "previous", "current". (baseline characteristic; modestly prognostic and also a correlate of IO benefit through bTMB)

sld

Baseline sum of longest diameters of target lesions per RECIST 1.1, mm (numeric). (baseline characteristic; strongly prognostic — tumor burden)

metsites

Number of metastatic sites at enrollment, integer. (baseline characteristic; strongly prognostic — disease burden)

kras

KRAS mutation status from CRF, character: "negative", "positive"; NA when not tested. (baseline characteristic; weakly / inconsistently prognostic; possible predictive signal for IO)

egfr

EGFR mutation status from CRF, character: "negative", "positive"; NA when not tested. (baseline characteristic; weakly prognostic in IO-treated populations; strongly predictive of differential treatment effect — EGFR-positive patients typically derive less benefit from anti-PD-L1)

eml4

EML4-ALK rearrangement status from CRF, character: "negative", "positive"; NA when not tested. (baseline characteristic; analogous to egfr in interpretation — predictive of reduced IO benefit)

tc1ic1

PD-L1 IHC (Ventana SP142) dichotomized at the TC1IC1 cut-point, character: "low" = TC0 and IC0; "high" = TC1/2/3 or IC1/2/3; NA when unknown. (baseline characteristic; predictive of atezolizumab benefit; weakly prognostic)

tc3ic3

PD-L1 IHC dichotomized at the TC3IC3 cut-point, character: "low" = TC0/1/2 and IC0/1/2; "high" = TC3 or IC3; NA when unknown. (baseline characteristic; predictive of atezolizumab benefit; weakly prognostic)

Details

OAK's trial-level eligibility criteria (locally advanced or metastatic NSCLC after platinum failure, ECOG 0–1, measurable disease per RECIST 1.1, no prior immune-checkpoint therapy, adequate organ function, etc.) were applied before randomization, so the n = 850 primary ITT cohort is randomized 1:1. The bTMB assay was performed retrospectively on banked baseline plasma; all assay-derived variables therefore reflect quantities present before treatment exposure but were generated post-randomization in calendar time. EGFR, EML4-ALK, and PD-L1 IHC are recorded at screening and can be used to subset without compromising randomization. The published bTMB analyses (PFS and OS hazard ratios at the btmb >= 16 cut-point) used the subset bep & !(egfr %in% "positive") & !(eml4 %in% "positive") (n = 583). OAK published a primary ITT of 850 patients and a secondary ITT of 1225 patients; only the primary ITT is included here. Variant-level information (per-mutation chromosome, position, gene, protein change, etc.) was released alongside this table but is not included here.

Variable role labels

Each variable in the format section above carries a role tag in italic parentheses at the end of its description. The tags combine two dimensions:

Variables that are outcomes, identifiers, or treatment assignment carry a single role tag in place of the two-dimensional tag.

Pre- vs post-randomization variables

For users planning to apply PATED or any analysis that relies on baseline covariates being balanced across arms:

Pre-randomization (collected at screening; safe to condition on):

age, sex, race, histology, ecog, prior_tx, smoking, sld, metsites, kras, egfr, eml4, tc1ic1, tc3ic3.

Pre-treatment but derived post-randomization (measured on the baseline blood draw, before first dose; empirically balanced across arms per the source paper's Supp. Tables 4 and 5):

btmb, msaf, cfdna_ng, coverage, qc, bep.

Post-randomization outcomes (must not be used as covariates):

pfs, pfs_event, os, os_event, bcor.

Conditioning on the biomarker-evaluable population (bep == TRUE, n = 642) or further restricting to !(egfr %in% "positive") & !(eml4 %in% "positive") (the paper's primary bTMB analysis subgroup, n = 583) is a selection on pre-treatment variables and does not break randomization in expectation; Supp. Tables 4 and 5 of the source paper confirm empirical arm-balance (no nominally significant imbalance on age, race, ECOG, histology, prior lines, KRAS, or PD-L1 IHC; the btmb >= 16 subgroup is enriched for males and for smokers, but those are biomarker–covariate associations, not arm–covariate associations). Users should still verify arm-balance on their working subset (e.g. with tableone) before invoking methods that assume it.

Source

Built from Supplementary Table 8 (Excel file MOESM3) of: Gandara DR, Paul SM, Kowanetz M, Schleifman E, Zou W, Li Y, Rittmeyer A, Fehrenbacher L, Otto G, Malboeuf C, Lieber DS, Lipson D, Silterra J, Amler L, Riehl T, Cummings CA, Hegde PS, Sandler A, Ballinger M, Fabrizio D, Mok T, Shames DS. Blood-based tumor mutational burden as a predictor of clinical benefit in non-small-cell lung cancer patients treated with atezolizumab. Nature Medicine 2018; 24(9):1441–1448. doi:10.1038/s41591-018-0134-3. Supplementary file downloaded from https://www.nature.com/articles/s41591-018-0134-3. See data-raw/poplar_oak.R in the package source for the exact cleaning steps. Redistribution in this package is for non-commercial research and teaching use; users intending other use should verify the licensing terms of the Springer Nature supplementary materials. Always cite the original publication when using this dataset.

References

Rittmeyer A, Barlesi F, Waterkamp D, Park K, Ciardiello F, von Pawel J, Gadgeel SM, Hida T, Kowalski DM, Dols MC, Cortinovis DL, Leach J, Polikoff J, Barrios C, Kabbinavar F, Frontera OA, De Marinis F, Turna H, Lee JS, Ballinger M, Kowanetz M, He P, Chen DS, Sandler A, Gandara DR; OAK Study Group. Atezolizumab versus docetaxel in patients with previously treated non-small-cell lung cancer (OAK): a Phase 3, open-label, multicentre randomised controlled trial. Lancet 2017; 389(10066):255–265. doi:10.1016/S0140-6736(16)32517-X.

Examples

data(oak)
table(oak$arm, useNA = "ifany")
summary(oak$btmb)
# Primary bTMB analysis subset (paper's n = 583); %in% keeps NA rows
# (untested patients) since they are not confirmed positive.
oak_bep <- subset(oak, bep & !(egfr %in% "positive") & !(eml4 %in% "positive"))


Prognostic Variables Assisted Treatment Effect Detection

Description

pated is a wrapper function of jointCovariance for testing treatment effect in randomized clinical trials. It assumes that prognostic variables are fully randomized. This assumption can help enhancing statistical power of conventional approaches in detecting the treatment effect. Specifically, the sensitivity of the conventional models specified in ... are improved by pated.

Usage

pated(
  ...,
  data,
  nboot = 0,
  compute_cov = TRUE,
  seed = NULL,
  transform = "identity"
)

Arguments

...

model specifications built by glm_(), coxph_(), logrank_(), gee_(), mmrm_(), km_(), quantile_(), netbenefit_(), or winratio_(). The first specification is the primary outcome whose treatment effect is being tested; the rest are prognostic covariates used to tighten the SE.

data

either a single data frame (when all models are fitted on the same dataset) or a list of data frames (one entry per data_index). Each data frame must have a pid column carrying subject identifiers; records with the same pid across different data frames refer to the same subject.

nboot

non-zero integer if bootstrap is adopted. By default 0.

compute_cov

logic. If TRUE, empirical covariance matrix is computed using bootstrap estimate and returned. Bootstrap estimate will be abandoned. If FALSE, bootstrap estimate will be returned and no empirical covariance matrix is computed.

seed

random seed when generate bootstrap data.

transform

character. Now only supports "identity".

Value

a data frame of testing results.

Examples

## More examples can be found in the vignettes.
library(survival)
library(mvtnorm)

genData <- function(seed = NULL){
  set.seed(seed)
  n <- 300
  sigma <- matrix(c(1, .6, .6, 1), 2)
  x <- rmvnorm(n, sigma = sigma)
  z1 <- rbinom(n, 1, .6)
  z2 <- rnorm(n)
  trt <- rbinom(n, 1, .5)

  bet <- c(-.2, .2)
  y <- -.5 + x %*% bet + z1 * .3 - z2 * .1 + .1 * trt - .1 * rnorm(n)
  death <- rbinom(n, 1, .8)
  data.frame(
    y = as.numeric(y), trt = trt,
    z1 = z1, z2 = z2,
    x1 = x[, 1], x2 = x[, 2],
    death, pid = paste0('s-', seq_len(n))
  )
}

dat <- genData(seed = 31415926)

## `data_index` defaults to 1 in every spec constructor and a single
## data.frame is auto-wrapped into a list, so neither needs spelling out
## when all models are fitted on the same dataset.
fit <-
  pated(
    coxph_(Surv(time = y, event = death) ~ trt),
    glm_(z1 ~ trt, family = 'binomial'),
    glm_(z2 ~ trt, family = 'gaussian'),
    glm_(x1 ~ trt, family = 'gaussian'),
    glm_(x2 ~ trt, family = 'gaussian'),
    data = dat
  )

fit


Plot PATED Analysis Results

Description

Plot PATED Analysis Results

Usage

## S3 method for class 'pated'
plot(x, ...)

Arguments

x

an object returned from pated().

...

currently not supported.

Value

NULL


POPLAR Phase II Trial of Atezolizumab vs Docetaxel in NSCLC

Description

Patient-level data (full intention-to-treat population, n = 287) from POPLAR, a multi-center, open-label, randomized Phase II trial (NCT01903993) comparing the anti-PD-L1 antibody atezolizumab (MPDL3280A, 1200 mg IV every 3 weeks) with docetaxel (75 mg/m^2 IV every 3 weeks) as second- or third-line therapy in patients with previously treated locally advanced or metastatic non-small-cell lung cancer (NSCLC). Patients were randomized 1:1 to the two arms.

The variables retained here are those released in Supplementary Table 8 of Gandara et al. (2018), Nature Medicine, who retrospectively assayed banked baseline plasma samples for blood-based tumor mutational burden (bTMB). The shipped copy applies the following purely cosmetic transformations to the source: source variable names lower-cased; categorical values lower-cased; the string "." (the source's missing code) converted to NA; numeric-looking character columns coerced to numeric; and the event indicators inverted so that 1 = event, 0 = censored (the source uses the opposite convention). Categorical variables are stored as character vectors (not factors); convert to factor on demand with factor() if a modeling function requires it. No rows are dropped: the full ITT population is included.

Authoritative reference. This documentation is a convenience summary; users should consult the original publication and its supplementary materials (Gandara et al. 2018; Fehrenbacher et al. 2016) for the definitive description of the trial design, eligibility criteria, endpoint definitions, and assay protocol.

Format

A data frame with 287 observations on 25 variables:

id

Anonymized patient identifier; integer 1–287. (identifier)

arm

Randomized treatment assignment, character: "docetaxel" or "atezolizumab". (treatment assignment)

pfs

Progression-free survival, months (numeric). (post-randomization outcome)

pfs_event

PFS event indicator, integer: 1 = progression or death, 0 = censored. Inverted from the source's PFS.CNSR. (post-randomization outcome)

os

Overall survival, months (numeric). (post-randomization outcome)

os_event

OS event indicator, integer: 1 = death, 0 = censored. Inverted from the source's OS.CNSR. (post-randomization outcome)

bcor

Best confirmed overall response per investigator (RECIST 1.1), character: "CR", "PR", "SD", "PD", "NE"; NA when not assessable. (post-randomization outcome)

btmb

Blood-based tumor mutational burden: count of somatic base substitutions on a 394-gene, \sim1.1 Mb panel of cell-free DNA from the baseline plasma sample (numeric); NA when the assay was not run or failed QC. (baseline characteristic; predictive of atezolizumab benefit at btmb \ge 16; not strongly prognostic in itself)

msaf

Maximum somatic allele frequency in the baseline cfDNA sample (numeric, 0–1); a tumor-content / shedding surrogate. (baseline characteristic; weakly prognostic)

cfdna_ng

cfDNA mass entering NGS library construction (nanograms, numeric, capped at 100 ng). (baseline characteristic; not prognostic — assay input)

coverage

Median exon sequencing coverage (numeric). (baseline characteristic; not prognostic — assay quality)

qc

bTMB assay QC status, character: "pass", "fail"; NA when not run. (baseline characteristic; not prognostic — assay quality)

bep

Logical flag for the biomarker-evaluable population used in the bTMB analyses: coverage >= 800 & qc == "pass" & msaf >= 0.01 (n = 211 of 287). (baseline derived flag; not prognostic in itself)

age

Baseline age, years (numeric); range 36–84. (baseline characteristic; modestly prognostic)

sex

Sex, character: "F", "M". (baseline characteristic; weakly prognostic; correlates with smoking and bTMB)

race

Race, character: "white", "asian", "other". (baseline characteristic; not robustly prognostic in this setting)

histology

Tumor histology, character: "non-squamous", "squamous". (baseline characteristic; modestly prognostic)

ecog

ECOG performance status at baseline, integer (0 or 1). (baseline characteristic; strongly prognostic for PFS and OS)

prior_tx

Number of prior lines of systemic therapy, integer (1 or 2). (baseline characteristic; prognostic — patients with 2 prior lines tend to do worse)

smoking

Smoking history, character: "never", "previous", "current". (baseline characteristic; modestly prognostic and also a correlate of IO benefit through bTMB)

sld

Baseline sum of longest diameters of target lesions per RECIST 1.1, mm (numeric). (baseline characteristic; strongly prognostic — tumor burden)

metsites

Number of metastatic sites at enrollment, integer. (baseline characteristic; strongly prognostic — disease burden)

kras

KRAS mutation status from CRF, character: "negative", "positive"; NA when not tested. (baseline characteristic; weakly / inconsistently prognostic; possible predictive signal for IO)

egfr

EGFR mutation status from CRF, character: "negative", "positive"; NA when not tested. A single T790M patient is coded "positive". (baseline characteristic; weakly prognostic in IO-treated populations; strongly predictive of differential treatment effect — EGFR-positive patients typically derive less benefit from anti-PD-L1)

eml4

EML4-ALK rearrangement status from CRF, character: "negative", "positive"; NA when not tested. (baseline characteristic; analogous to egfr in interpretation — predictive of reduced IO benefit)

Details

POPLAR's trial-level eligibility criteria (locally advanced or metastatic NSCLC after platinum failure, ECOG 0–1, measurable disease, no prior anti-PD-L1/PD-1 therapy, adequate organ function, etc.) were applied before randomization, so the n = 287 ITT cohort is randomized 1:1. The bTMB assay was performed retrospectively on banked baseline plasma; all assay-derived variables therefore reflect quantities present before treatment exposure, but were generated post-randomization in calendar time. EGFR and EML4-ALK mutation status are CRF variables collected at screening and can be used to subset without compromising randomization. Variant-level information (per-mutation chromosome, position, gene, protein change, etc.) was released alongside this table but is not included here; consult the source supplementary file if needed.

Variable role labels

Each variable in the format section above carries a role tag in italic parentheses at the end of its description. The tags combine two dimensions:

Variables that are outcomes, identifiers, or treatment assignment carry a single role tag in place of the two-dimensional tag.

Pre- vs post-randomization variables

For users planning to apply PATED or any analysis that relies on baseline covariates being balanced across arms:

Pre-randomization (collected at screening; safe to condition on):

age, sex, race, histology, ecog, prior_tx, smoking, sld, metsites, kras, egfr, eml4.

Pre-treatment but derived post-randomization (measured on the baseline blood draw, before first dose; empirically balanced across arms per the source paper's Supp. Tables 3 and 5):

btmb, msaf, cfdna_ng, coverage, qc, bep.

Post-randomization outcomes (must not be used as covariates):

pfs, pfs_event, os, os_event, bcor.

Conditioning on the biomarker-evaluable population (bep == TRUE) or further restricting to !(egfr %in% "positive") & !(eml4 %in% "positive") is a selection on pre-treatment variables and does not break randomization in expectation; Supp. Tables 3 and 5 of the source paper confirm empirical arm-balance. Users should still verify arm-balance on their working subset (e.g. with tableone) before invoking methods that assume it.

Source

Built from Supplementary Table 8 (Excel file MOESM3) of: Gandara DR, Paul SM, Kowanetz M, Schleifman E, Zou W, Li Y, Rittmeyer A, Fehrenbacher L, Otto G, Malboeuf C, Lieber DS, Lipson D, Silterra J, Amler L, Riehl T, Cummings CA, Hegde PS, Sandler A, Ballinger M, Fabrizio D, Mok T, Shames DS. Blood-based tumor mutational burden as a predictor of clinical benefit in non-small-cell lung cancer patients treated with atezolizumab. Nature Medicine 2018; 24(9):1441–1448. doi:10.1038/s41591-018-0134-3. Supplementary file downloaded from https://www.nature.com/articles/s41591-018-0134-3. See data-raw/poplar_oak.R in the package source for the exact cleaning steps. Redistribution in this package is for non-commercial research and teaching use; users intending other use should verify the licensing terms of the Springer Nature supplementary materials. Always cite the original publication when using this dataset.

References

Fehrenbacher L, Spira A, Ballinger M, Kowanetz M, Vansteenkiste J, Mazieres J, Park K, Smith D, Artal-Cortes A, Lewanski C, Braiteh F, Waterkamp D, He P, Zou W, Chen DS, Yi J, Sandler A, Rittmeyer A; POPLAR Study Group. Atezolizumab versus docetaxel for patients with previously treated non-small-cell lung cancer (POPLAR): a multicentre, open-label, Phase 2 randomised controlled trial. Lancet 2016; 387(10030):1837–1846. doi:10.1016/S0140-6736(16)00587-0.

Examples

data(poplar)
table(poplar$arm, useNA = "ifany")
summary(poplar$btmb)
# Biomarker-evaluable subset used in the published bTMB analyses
poplar_bep <- subset(poplar, bep)


Title Summarize an Analysis of Multiple Outcomes.

Description

Summarize an analysis of multiple outcomes.

Usage

## S3 method for class 'summary.jointCovariance'
print(x, ...)

Arguments

x

an object returned by jointCovariance().

...

for debugging only.

Value

an invisible object.

Examples

## no example

Creating Objects of Group Quantile Differences

Description

quantile_ is a wrapper function creating an object that, for each requested probability, computes the difference between the two arms' sample quantiles of the outcome. The object is passed to jointCovariance or pated through .... Because the empirical-quantile estimator has no tractable closed-form score, this engine is available only through the bootstrap path (nboot > 0).

Usage

quantile_(formula, probs = c(0.25, 0.5, 0.75), data_index = 1)

Arguments

formula

a two-sided formula y ~ arm where the right-hand side is a binary grouping variable.

probs

numeric vector of probabilities in ⁠(0, 1)⁠. By default the first quartile, median, and third quartile.

data_index

integer. Index of the data frame in the data argument of jointCovariance to be used.


Generating Data for Simulation and Testing

Description

simulateMoData generates data for simulation and testing purposes.

Usage

simulateMoData(n = 500, hr = 0.8, seed = NULL)

Arguments

n

an integer for total sample size of a randomized control trial of two arms.

hr

hazard ratio of treatment.

seed

random seed. By default NULL for no seed being specified.


Object Summaries

Description

summary method for class jointCovariance.

Usage

## S3 method for class 'jointCovariance'
summary(object, model_index = NULL, ...)

Arguments

object

an object returned by jointCovariance().

model_index

NULL if displaying summary of all fitted models; otherwise, an integer indicating the fitted model.

...

for debugging only

Value

a list


Calculate Variance-Covariance Matrix for a Fitted Model Object

Description

Returns the variance-covariance matrix of the main parameters of fitted model objects. The "main" parameters of models correspond to those returned by coef.

Usage

## S3 method for class 'jointCovariance'
vcov(object, model_index = NULL, ...)

Arguments

object

an object returned by jointCovariance().

model_index

NULL if displaying covariance matrix of all fitted models; otherwise, an integer indicating the fitted model.

...

for debugging only

Value

a matrix of covariance of all estimates


Creating Objects of Hierarchical Log Win-Ratio Statistic

Description

winratio_ creates an object to be passed into jointCovariance or pated through its ... argument. The object defines a hierarchical win-ratio statistic across a list of endpoints, each declared by nb_tte(), nb_continuous(), or nb_binary().

Usage

winratio_(formula, endpoints, data_index = 1)

Arguments

formula

a two-sided R formula ⁠<label> ~ arm⁠. The LHS is used purely as a row label in pated() output; the RHS must contain exactly one variable, the column in data holding treatment-arm assignment.

endpoints

a non-empty list of endpoint specs built by nb_tte(), nb_continuous(), or nb_binary(). The order of the list encodes the hierarchical priority: endpoint 1 is checked first; if it ties, the pair falls through to endpoint 2; and so on.

data_index

integer. Index of the data frame in the data argument of jointCovariance to be used.

Details

The fitted coefficient is on the log scale:

\widehat\theta = \log(\widehat{WR}) = \log(\widehat\pi_W / \widehat\pi_L)

where \widehat\pi_W and \widehat\pi_L are the proportions of all control-treatment pairs where the treatment subject wins or loses, respectively. Tied pairs remain in the denominator of both proportions but do not contribute to either numerator.

The raw win ratio can be recovered by exponentiating the coefficient. The log scale is used because it gives the standard large-sample Wald representation and maps the null win ratio of 1 to 0.

The arm reference level is inferred from the arm column the same way model.matrix(~ arm) would: levels(arm)[1] for factor, the smaller value for numeric or logical, and the alphabetically first value for character. To override, convert the column to a factor with the desired level order before calling winratio_().

Value

An object of class c("jc_spec_winratio", "jc_spec").

See Also

nb_tte(), nb_continuous(), nb_binary(), jointCovariance(), pated().