Title: | A Causality-Informed Modeling Approach |
---|---|
Description: | A system for describing and manipulating the many models that are generated in causal inference and data analysis projects, as based on the causal theory and criteria of Austin Bradford Hill (1965) <doi:10.1177/003591576505800503>. This system includes the addition of formal attributes that modify base `R` objects, including terms and formulas, with a focus on variable roles in the "do-calculus" of modeling, as described in Pearl (2010) <doi:10.2202/1557-4679.1203>. For example, the definition of exposure, outcome, and interaction are implicit in the roles variables take in a formula. These premises allow for a more fluent modeling approach focusing on variable relationships, and assessing effect modification, as described by VanderWeele and Robins (2007) <doi:10.1097/EDE.0b013e318127181b>. The essential goal is to help contextualize formulas and models in causality-oriented workflows. |
Authors: | Anish S. Shah [aut, cre, cph] |
Maintainer: | Anish S. Shah <[email protected]> |
License: | MIT + file LICENSE |
Version: | 0.1.0.9000 |
Built: | 2024-10-25 04:02:35 UTC |
Source: | https://github.com/shah-in-boots/rmdl |
These related functions are intended to analyze a single data vector (e.g. a column from a dataset) and help predict its classification or other relevant attributes. They are simple yet opinionated convenience functions.
number_of_missing(x)
is_dichotomous(x)
x |
A vector of any of the atomic types (see [ |
The functions that are currently supported are:
number_of_missing()
returns the number of missing values in a vector
is_dichotomous()
returns TRUE if the vector is dichotomous, FALSE otherwise
Returns a single value determined by the individual functions
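The two checks described above can be approximated in base R. The sketch below is illustrative only: `count_missing()` and `looks_dichotomous()` are hypothetical stand-ins, and the rule "exactly two distinct non-missing values" is an assumption about what dichotomous means here, not the package's confirmed definition.

```r
# Hypothetical stand-ins written in base R; not the package's own code.
count_missing <- function(x) {
  # Number of NA entries in the vector
  sum(is.na(x))
}

looks_dichotomous <- function(x) {
  # Assumed rule: exactly two distinct values once missing values are dropped
  length(unique(x[!is.na(x)])) == 2L
}

x <- c(0, 1, 1, NA, 0)
count_missing(x)
looks_dichotomous(x)
looks_dichotomous(c("a", "b", "c"))
```

Each helper returns a single value, which matches the return description given for this family of functions.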
Describe attributes of a tm vector
describe(x, property)
x |
A vector |
property |
A character vector of the following attributes of a |
A list of term = property pairs, where the term is the name of the element (e.g. could be the ‘role’ of the term).
f <- .o(output) ~ .x(input) + .m(mediator) + random
t <- tm(f)
describe(t, "role")
dplyr for tm class
The filter() function extension subsets tm objects that satisfy set conditions. To be retained, the tm object must produce a value of TRUE for all conditions. Note that when a condition evaluates to NA, the row will be dropped, unlike base subsetting with [.
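The NA behavior noted above can be seen with a plain numeric vector in base R. This sketch does not use the tm method itself; `which()` is used only to mimic the "drop when the condition is NA" rule.

```r
x <- c(1, NA, 3)

# Base subsetting with `[` keeps a placeholder NA where the condition is NA
kept_na <- x[x > 2]

# Dropping positions where the condition is NA, as filter() is described to do
dropped_na <- x[which(x > 2)]

kept_na
dropped_na
```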
## S3 method for class 'tm'
filter(.data, ...)
.data |
A data frame, data frame extension (e.g. a tibble), or a lazy data frame (e.g. from dbplyr or dtplyr). See Methods, below, for more details. |
... |
< |
An object of the same type as .data. The output has the following properties:
tm objects are a subset of the input, but appear in the same order
Underlying data.frame columns are not modified
Underlying data.frame object's attributes are preserved
dplyr::filter() for examples of generic implementation
When using categorical interaction terms in a mdl_tbl object, estimates on interaction terms and their confidence intervals can be evaluated. The effect of interaction on the estimates is based on the levels of the interaction term. The estimates and intervals can be derived through the estimate_interaction() function. The approach is based on the method described by Figueiras et al. (1998).
estimate_interaction(object, exposure, interaction, conf_level = 0.95, ...)
object |
A |
exposure |
The exposure variable in the model |
interaction |
The interaction variable in the model |
conf_level |
The confidence level for the confidence interval |
... |
Arguments to be passed to or from other methods |
The estimate_interaction() function requires a mdl_tbl object that is a single row in length. Filtering the mdl_tbl should occur prior to passing it to this function. Additionally, this function assumes the interaction term is binary. If it is categorical, the current recommendation is to use dummy variables for the corresponding levels prior to modeling.
A data.frame with one row per level of the interaction term (for the presence or absence of the interaction term) and the following columns:
estimate: beta coefficient for the interaction effect based on level
conf_low: lower bound of confidence interval for the estimate
conf_high: higher bound of confidence interval for the estimate
p_value: p-value for the overall interaction effect across levels
nobs: number of observations within the interaction level
level: level of the interaction term
A. Figueiras, J. M. Domenech-Massons, and Carmen Cadarso, 'Regression models: calculating the confidence intervals of effects in the presence of interactions', Statistics in Medicine, 17, 2099-2105 (1998)
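The general idea of level-specific estimates with a binary interaction term can be sketched in base R. This is a minimal illustration, not the internals of estimate_interaction(): the simulated data, the variable names, and the use of a t quantile are all assumptions. The effect of the exposure at level 1 is the sum of the main-effect and interaction coefficients, and its variance includes the covariance between the two.

```r
# Simulated data; every name here is illustrative.
set.seed(1)
n <- 200
x <- rnorm(n)                 # exposure
z <- rbinom(n, 1, 0.5)        # binary interaction term
y <- 1 + 0.5 * x + 0.3 * z + 0.8 * x * z + rnorm(n)

fit <- lm(y ~ x * z)
b <- coef(fit)
V <- vcov(fit)
crit <- qt(0.975, df.residual(fit))   # 95% confidence level

# Exposure effect when z = 0, and when z = 1 (main effect + interaction)
est <- c(b[["x"]], b[["x"]] + b[["x:z"]])

# Var(b_x + b_xz) = Var(b_x) + Var(b_xz) + 2 * Cov(b_x, b_xz)
se <- sqrt(c(
  V["x", "x"],
  V["x", "x"] + V["x:z", "x:z"] + 2 * V["x", "x:z"]
))

effects <- data.frame(
  level = c(0, 1),
  estimate = est,
  conf_low = est - crit * se,
  conf_high = est + crit * se
)
effects
```

One way to check the arithmetic is to refit the model with the interaction term recoded as `1 - z`; the main-effect coefficient for `x` in that refit should equal the level-1 estimate above.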
This function defines a modified formula class that has been vectorized. The fmls serves as a set of instructions or a script for the formula and its tm. It expands upon the functionality of formulas, allowing for additional descriptions and relationships to exist between the tm.
fmls(
  x = unspecified(),
  pattern = c("direct", "sequential", "parallel", "fundamental"),
  ...
)
is_fmls(x)
key_terms(x)
x |
Objects of the following types can be used as inputs
|
pattern |
A
|
... |
Arguments to be passed to or from other methods |
This is not meant to supersede a stats::formula() object, but provide a series of relationships that can be helpful in causal modeling. All fmls can be converted to a traditional formula with ease. The base for this object is built on the tm() object.
An object of class fmls
The expansion pattern allows for instructions on how the covariates should be included in different formulas. Below, assuming that x1, x2, and x3 are covariates...
Direct:
Sequential:
Parallel:
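The formulas for each pattern did not survive in this text, so the following base R sketch shows one plausible reading of the pattern names and should be treated as an assumption rather than the package's definition: direct places all covariates in a single formula, sequential adds one covariate at a time, and parallel gives each covariate its own formula. It does not call fmls().

```r
outcome <- "y"
covariates <- c("x1", "x2", "x3")

# Helper (hypothetical) that builds `y ~ ...` from a character vector of terms
build <- function(terms) reformulate(terms, response = outcome)

# Assumed "direct": every covariate in one formula
direct <- list(build(covariates))

# Assumed "sequential": covariates accumulate one at a time
sequential <- lapply(seq_along(covariates), function(i) {
  build(covariates[seq_len(i)])
})

# Assumed "parallel": one formula per covariate
parallel <- lapply(covariates, build)

vapply(sequential, deparse, character(1))
```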
Specific roles the variable plays within the formula. These are of particular importance, as they serve as special terms that can affect how a formula is interpreted.
Role | Shortcut | Description |
outcome | .o(...) | outcome ~ exposure |
exposure | .x(...) | outcome ~ exposure |
predictor | .p(...) | outcome ~ exposure + predictor |
confounder | .c(...) | outcome + exposure ~ confounder |
mediator | .m(...) | outcome ← mediator ← exposure |
interaction | .i(...) | outcome ~ exposure * interaction |
strata | .s(...) | outcome ~ exposure / strata |
group | .g(...) | outcome ~ exposure + group |
unknown | - | not yet assigned |
Formulas can be condensed by applying their specific role to individual terms as a function/wrapper. For example, y ~ .x(x1) + x2 + x3. This would signify that x1 has the specific role of an exposure.
Grouped variables are slightly different in that they are placed together in a hierarchy or tier. To indicate the group and the tier, the shortcut can have an integer following the .g. If no number is given, then it is assumed they are all on the same tier. Ex: y ~ x1 + .g1(x2) + .g1(x3)
Warning: Only a single shortcut can be applied to a variable within a formula directly.
For a single argument, e.g. for the tm.formula() method, such as to identify variable X as an exposure, a formula should be given with the term of interest on the LHS, and the description or instruction on the RHS. This would look like role = X ~ "exposure".
For the arguments that would be dispatched for objects that are plural, e.g. containing multiple terms, each formula() should be placed within a list(). For example, the role argument would be written:
role = list(X ~ "exposure", M ~ "mediator", C ~ "confounder")
Further implementation details can be seen in the implementation of labeled_formulas_to_named_list().
Tools for working with formula-like objects
lhs(x, ...)
rhs(x, ...)

## S3 method for class 'formula'
rhs(x, ...)

## S3 method for class 'formula'
lhs(x, ...)
x |
A formula-like object |
... |
Arguments to be passed to or from other methods |
A character describing part of a formula or fmls object
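For orientation, the two sides of an ordinary formula can be pulled apart in base R as shown below. This is only a sketch of the concept using `all.vars()`; whether lhs() and rhs() return variable names in exactly this form is not stated here and is an assumption.

```r
f <- y1 + y2 ~ x1 + x2

# A two-sided formula is a call: f[[1]] is `~`, f[[2]] the LHS, f[[3]] the RHS
lhs_vars <- all.vars(f[[2]])
rhs_vars <- all.vars(f[[3]])

lhs_vars
rhs_vars
```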
Takes a list of formulas, or a similar construct, and returns a named list. The convention is to read from left to right: the name or position of the term is on the LHS, and the output label or target instruction is on the RHS.
If no label is desired, then the LHS can be left empty, such as ~ x.
labeled_formulas_to_named_list(x)
x |
An argument that may represent a formula to label variables, or can
be converted to one. This includes, |
A named list with the index as a character representing the term or variable of interest, and the value at that position as a character representing the label value.
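The left-to-right convention can be illustrated with a small base R helper. `labels_to_list()` is a hypothetical function written for this sketch; it is not the package's implementation, and its handling of one-sided formulas (an empty name) is an assumption.

```r
# Hypothetical helper: turn `term ~ "label"` formulas into a named list
labels_to_list <- function(x) {
  if (inherits(x, "formula")) x <- list(x)

  # The RHS is always the last element of the formula call
  out <- lapply(x, function(f) {
    rhs <- f[[length(f)]]
    if (is.character(rhs)) rhs else deparse(rhs)
  })

  # The LHS (if present) becomes the name; one-sided formulas get ""
  names(out) <- vapply(x, function(f) {
    if (length(f) == 3L) deparse(f[[2]]) else ""
  }, character(1))

  out
}

roles <- labels_to_list(list(X ~ "exposure", M ~ "mediator", C ~ "confounder"))
roles
```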
The model_table() or mdl_tbl() function creates a mdl_tbl object that is composed of either fmls objects or mdl objects, which are thin/informative wrappers for generic formulas and hypothesis-based models. The mdl_tbl is a data frame of model information, such as model fit, parameter estimates, and summary statistics about a model, or a formula if it has not yet been fit.
mdl_tbl(..., data = NULL)
model_table(..., data = NULL)
is_model_table(x)
... |
Named or unnamed |
data |
A |
x |
A |
The table itself allows for ease of organization of model information and has three additional, major components (stored as scalar attributes).
A formula matrix that describes the terms used in each model, and how they are combined.
A term table that describes the terms and their properties and/or labels.
A list of datasets used for the analyses that can help support additional diagnostic testing.
We go into further detail in the sections below.
A mdl_tbl object, which is essentially a data.frame with additional information on the relevant data, terms, and formulas used to generate the models.
These functions are used to help manage the mdl_tbl object. They allow for specific manipulation of the internal components, and are intended to generally extend the functionality of the object.
attach_data(): Attaches a dataset to a mdl_tbl object
flatten_models(): Flattens a mdl_tbl object down to its specific parameters
attach_data(x, data, ...)
flatten_models(x, exponentiate = FALSE, which = NULL, ...)
x |
A |
data |
A |
... |
Arguments to be passed to or from other methods |
exponentiate |
A |
which |
A |
When using attach_data(), this returns a modified version of the mdl_tbl object with the dataset attached. When using the flatten_models() function, this returns a simplified data.frame of the original model table that contains the model-level and parameter-level statistics.
When models are built, the matrix of data is often available within the raw model object. However, when handling many models, this can be expensive in terms of memory and space. By attaching datasets independently so that they persist regardless of the underlying models, and by knowing which models used which datasets, it becomes easy to back-transform information.
A mdl_tbl object can be flattened to its specific parameters, their estimates, and model-level summary statistics. This function additionally helps by allowing for exponentiation of estimates when deemed appropriate. The user can specify which models to exponentiate by name. This heavily relies on the broom::tidy() functionality.
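What exponentiating estimates means in practice can be sketched for a single logistic model. To keep the example free of extra dependencies it uses base R and Wald intervals from `confint.default()` rather than broom::tidy(), so the column names and interval type here are assumptions and may differ from what flatten_models() returns.

```r
# Logistic regression on a built-in dataset
fit <- glm(am ~ wt, data = mtcars, family = binomial())

est <- coef(summary(fit))     # matrix with Estimate, Std. Error, ...
ci <- confint.default(fit)    # Wald intervals on the log-odds scale

# Exponentiating moves estimates and bounds to the odds-ratio scale
flat <- data.frame(
  term = rownames(est),
  estimate = exp(est[, "Estimate"]),
  conf_low = exp(ci[, 1]),
  conf_high = exp(ci[, 2]),
  row.names = NULL
)
flat
```

Exponentiation is appropriate for models with a log or logit link (odds ratios, hazard ratios) and would not be meaningful for an ordinary linear model, which is why the choice is left to the user.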
mdl(x = unspecified(), ...)

## S3 method for class 'character'
mdl(
  x,
  formulas,
  parameter_estimates = data.frame(),
  summary_info = list(),
  data_name,
  strata_variable = NA_character_,
  strata_level = NA_character_,
  ...
)

## S3 method for class 'lm'
mdl(
  x = unspecified(),
  formulas = fmls(),
  data_name = character(),
  strata_variable = character(),
  strata_level = character(),
  ...
)

## S3 method for class 'glm'
mdl(
  x = unspecified(),
  formulas = fmls(),
  data_name = character(),
  strata_variable = character(),
  strata_level = character(),
  ...
)

## S3 method for class 'coxph'
mdl(
  x = unspecified(),
  formulas = fmls(),
  data_name = character(),
  strata_variable = character(),
  strata_level = character(),
  ...
)

## Default S3 method:
mdl(x, ...)

model(x = unspecified(), ...)
x |
Model object or representation |
... |
Arguments to be passed to or from other methods |
formulas |
Formula(s) given as either an |
parameter_estimates |
A
|
summary_info |
A
|
data_name |
String representing name of dataset that was used |
strata_variable |
String of a term that served as a stratifying variable |
strata_level |
Value of the level of the term specified by
|
An object of the mdl class, which is essentially an equal-length list of parameters that describe a single model. It retains the original formula call and the related roles in the formula.
The family of apply_*_pattern() functions is used to expand fmls by specified patterns. These functions are not intended to be used directly but as internal functions. They have been exposed to allow for potential user-defined use cases.
apply_pattern(x, pattern)
apply_fundamental_pattern(x)
apply_direct_pattern(x)
apply_sequential_pattern(x)
apply_parallel_pattern(x)
apply_rolling_interaction_pattern(x)
x |
A |
pattern |
A character string that specifies the pattern to use |
Currently supported patterns are: fundamental, direct, sequential, parallel.
Returns a tbl_df object that has special column names and rows. Each row is essentially a precursor to a new formula.
These columns and rows must be present to be used with the fmls() function, and generally are the expected result of the specified pattern. They will undergo further internal modification prior to being turned into a fmls object, but this is a developer consideration. If developing a pattern, please use this guide to ensure that the output is compatible with the fmls() function.
outcome: a single term that is the expected outcome variable
exposure: a single term that is the expected exposure variable, which may not be present in every row
covariate_*: the covariates expand based on the number that are present (e.g. "covariate_1", "covariate_2", etc)
tm(x = unspecified(), ...)

## S3 method for class 'character'
tm(
  x,
  role = character(),
  side = character(),
  label = character(),
  group = integer(),
  type = character(),
  distribution = character(),
  description = character(),
  transformation = character(),
  ...
)

## S3 method for class 'formula'
tm(
  x,
  role = formula(),
  label = formula(),
  group = formula(),
  type = formula(),
  distribution = formula(),
  description = formula(),
  transformation = formula(),
  ...
)

## S3 method for class 'fmls'
tm(x, ...)

## S3 method for class 'tm'
tm(x, ...)

## Default S3 method:
tm(x = unspecified(), ...)

is_tm(x)
x |
An object that can be coerced to a |
... |
Arguments to be passed to or from other methods |
role |
Specific roles the variable plays within the formula. These are of particular importance, as they serve as special terms that can affect how a formula is interpreted. Please see the Roles section below for further details. The options for roles are as below:
|
side |
Which side of a formula should the term be on. Options are
|
label |
Display-quality label describing the variable |
group |
Grouping variable name for modeling or placing terms together.
An integer value is given to identify which group the term will be in. The
hierarchy will be |
type |
Type of variable, either categorical (qualitative) or continuous (quantitative) |
distribution |
How the variable itself is more specifically subcategorized, e.g. ordinal, continuous, dichotomous, etc |
description |
Option for further descriptions or definitions needed for the tm, potentially part of a data dictionary |
transformation |
Modification of the term to be applied when combining with data |
A vectorized term object that allows for additional information to be carried with the variable name.
This is not meant to replace traditional stats::terms(), but to supplement it using additional information that is more informative for causal modeling.
A tm object, which is a series of individual terms with corresponding attributes, including the role, formula side, label, grouping, and other related features.
Specific roles the variable plays within the formula. These are of particular importance, as they serve as special terms that can affect how a formula is interpreted.
Role | Shortcut | Description |
outcome | .o(...) | outcome ~ exposure |
exposure | .x(...) | outcome ~ exposure |
predictor | .p(...) | outcome ~ exposure + predictor |
confounder | .c(...) | outcome + exposure ~ confounder |
mediator | .m(...) | outcome ← mediator ← exposure |
interaction | .i(...) | outcome ~ exposure * interaction |
strata | .s(...) | outcome ~ exposure / strata |
group | .g(...) | outcome ~ exposure + group |
unknown | - | not yet assigned |
Formulas can be condensed by applying their specific role to individual terms as a function/wrapper. For example, y ~ .x(x1) + x2 + x3. This would signify that x1 has the specific role of an exposure.
Grouped variables are slightly different in that they are placed together in a hierarchy or tier. To indicate the group and the tier, the shortcut can have an integer following the .g. If no number is given, then it is assumed they are all on the same tier. Ex: y ~ x1 + .g1(x2) + .g1(x3)
Warning: Only a single shortcut can be applied to a variable within a formula directly.
For a single argument, e.g. for the tm.formula() method, such as to identify variable X as an exposure, a formula should be given with the term of interest on the LHS, and the description or instruction on the RHS. This would look like role = X ~ "exposure".
For the arguments that would be dispatched for objects that are plural, e.g. containing multiple terms, each formula() should be placed within a list(). For example, the role argument would be written:
role = list(X ~ "exposure", M ~ "mediator", C ~ "confounder")
Further implementation details can be seen in the implementation of labeled_formulas_to_named_list().
tm objects
This updates properties or attributes of a tm vector. This only updates objects that already exist.
## S3 method for class 'tm'
update(object, ...)
object |
A |
... |
A series of |
A tm object with updated attributes