| Title: | Computation of Generalized Linear Models with Misclassified Covariates Using Side Information |
|---|---|
| Description: | Estimates models that extend the standard GLM to take misclassification into account. The models require side information from a secondary data set on the misclassification process, i.e. some sort of misclassification probabilities conditional on some common covariates. A detailed description of the algorithm can be found in Dlugosz, Mammen and Wilke (2015) <https://ftp.zew.de/pub/zew-docs/dp/dp15043.pdf>. |
| Authors: | Stephan Dlugosz [aut, cre] |
| Maintainer: | Stephan Dlugosz <[email protected]> |
| License: | GPL-3 |
| Version: | 0.3.6 |
| Built: | 2026-05-31 08:30:41 UTC |
| Source: | https://github.com/cran/misclassGLM |
Obtain bootstrapped standard errors for misclassGLM fits.
boot.misclassGLM(ret, Y, X, Pmodel, PX, boot.fraction = 1, repetitions = 1000)
ret |
a fitted object of class inheriting from 'misclassGLM'. |
Y |
a vector of integers or numerics. This is the dependent variable. |
X |
a matrix containing the independent variables. |
Pmodel |
a fitted model (e.g. of class 'glm' or 'mlogit') used to implicitly produce variations of the predicted probabilities of the true values. (Usually conditional on the observed misclassified values and additional covariates.) |
PX |
covariates matrix suitable for predicting probabilities from |
boot.fraction |
fraction of sample to be used for estimating the bootstrapped standard errors, for speedup. |
repetitions |
number of bootstrap samples to be drawn. |
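The roles of boot.fraction and repetitions can be illustrated with a generic nonparametric bootstrap. The following is a minimal sketch in base R using lm, not the package's implementation; the helper name boot_se and the way the resample size is derived from boot.fraction are assumptions made for illustration only.

```r
# Hypothetical helper: nonparametric bootstrap standard errors for lm coefficients.
boot_se <- function(y, x, boot.fraction = 1, repetitions = 200) {
  n <- length(y)
  m <- max(2L, floor(boot.fraction * n))            # resample size (assumption)
  draws <- replicate(repetitions, {
    idx <- sample.int(n, size = m, replace = TRUE)  # draw rows with replacement
    coef(lm(y[idx] ~ x[idx]))
  })
  apply(draws, 1, sd)                               # spread of estimates across resamples
}

set.seed(42)
x <- rnorm(500)
y <- 1 + 2 * x + rnorm(500)
se <- boot_se(y, x, boot.fraction = 0.5, repetitions = 200)
se
```

In this sketch a boot.fraction below 1 shortens each refit, and the resulting standard errors then refer to the smaller resample size; whether and how boot.misclassGLM rescales them is not stated in this manual.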
Obtain bootstrapped standard errors for misclassMlogit fits.
boot.misclassMlogit(ret, Y, X, Pmodel, PX, boot.fraction = 1, repetitions = 1000)
ret |
a fitted object of class inheriting from 'misclassMlogit'. |
Y |
a matrix of 0s and 1s, indicating the target class. This is the dependent variable. |
X |
a matrix containing the independent variables. |
Pmodel |
a fitted model (e.g. of class 'glm' or 'mlogit') used to implicitly produce variations of the predicted probabilities of the true values. (Usually conditional on the observed misclassified values and additional covariates.) |
PX |
covariates matrix suitable for predicting probabilities from |
boot.fraction |
fraction of sample to be used for estimating the bootstrapped standard errors, for speedup. |
repetitions |
number of bootstrap samples to be drawn. |
Obtain marginal effects for misclassGLM fits.
mfx.misclassGLM(w, x.mean = TRUE, rev.dum = TRUE, digits = 3, ...)
w |
a fitted object of class inheriting from 'misclassGLM'. |
x.mean |
logical, if true computes marginal effects at mean, otherwise average marginal effects. |
rev.dum |
logical, if true, computes differential effects for switch from 0 to 1. |
digits |
number of digits to be presented in output. |
... |
further arguments passed to or from other functions. |
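The difference between the two settings of x.mean, and the discrete 0-to-1 effect that rev.dum refers to, can be shown on an ordinary logit model. This is a sketch with base R's glm under the textbook definitions of these quantities; it does not call mfx.misclassGLM, and all variable names are made up for the illustration.

```r
set.seed(1)
n <- 1000
x <- rnorm(n)
d <- rbinom(n, 1, 0.5)                         # dummy regressor
y <- rbinom(n, 1, plogis(-0.5 + x + 0.8 * d))
fit <- glm(y ~ x + d, family = binomial("logit"))
b <- coef(fit)

# Marginal effect of x evaluated at the mean of the regressors
eta_mean <- b[1] + b["x"] * mean(x) + b["d"] * mean(d)
mfx_at_mean <- unname(dlogis(eta_mean) * b["x"])

# Average marginal effect of x: average the derivative over all observations
eta_i <- predict(fit, type = "link")
mfx_average <- unname(mean(dlogis(eta_i)) * b["x"])

# Discrete effect of switching the dummy d from 0 to 1, at mean x
eff_dummy <- unname(plogis(b[1] + b["x"] * mean(x) + b["d"]) -
                    plogis(b[1] + b["x"] * mean(x)))

c(at_mean = mfx_at_mean, average = mfx_average, dummy = eff_dummy)
```

For a dummy regressor the derivative is only an approximation, which is why a difference in predicted probabilities is computed instead.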
Obtain marginal effects for misclassMlogit fits.
mfx.misclassMlogit(w, x.mean = TRUE, rev.dum = TRUE, outcome = 2, baseoutcome = 1, digits = 3, ...)
w |
a fitted object of class inheriting from 'misclassMlogit'. |
x.mean |
logical, if true computes marginal effects at mean, otherwise average marginal effects. |
rev.dum |
logical, if true, computes differential effects for switch from 0 to 1. |
outcome |
outcome for which the marginal effect (ME) should be computed. |
baseoutcome |
base outcome, e.g. reference class of the model. |
digits |
number of digits to be presented in output. |
... |
further arguments passed to or from other functions. |
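For a multinomial logit, the marginal effect of a continuous regressor on the probability of outcome j has the textbook form P_j * (b_j - sum_k P_k * b_k), with the coefficients of the base outcome fixed at zero. The sketch below checks this formula against a finite difference; the coefficients are hypothetical, and the code is not taken from mfx.misclassMlogit.

```r
# Class probabilities of a multinomial logit with eta_j = a_j + b_j * x;
# the base outcome has a_j = b_j = 0.
mlogit_prob <- function(x, a, b) {
  eta <- a + b * x
  exp(eta) / sum(exp(eta))
}

a <- c(0, 0.5, -0.2)    # hypothetical intercepts, outcome 1 is the base outcome
b <- c(0, 1.0, 2.0)     # hypothetical slopes for x
x0 <- 0.3               # evaluation point, e.g. the mean of x

p <- mlogit_prob(x0, a, b)
me_analytic <- p * (b - sum(p * b))   # dP_j/dx for every outcome j

# Central finite-difference check of the analytic formula
h <- 1e-6
me_numeric <- (mlogit_prob(x0 + h, a, b) - mlogit_prob(x0 - h, a, b)) / (2 * h)

round(cbind(analytic = me_analytic, numeric = me_numeric), 4)
```

The effects sum to zero across outcomes because the probabilities always sum to one.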
misclassGLM computes an estimator for a GLM with a misclassified covariate using additional side information on the misclassification process.
misclassGLM(Y, X, setM, P, na.action = na.omit, family = gaussian(link = "identity"), control = list(), par = NULL, x = FALSE, robust = FALSE)
Y |
a vector of integers or numerics. This is the dependent variable. |
X |
a matrix containing the independent variables. |
setM |
(optional) matrix, rows containing potential patterns for a misclassified (latent) covariate M in any coding for a categorical independent variable, e.g. dummy coding (default: Identity). |
P |
probabilities corresponding to each of the potential patterns, conditional on the other covariates denoted in x. |
na.action |
how to treat NAs |
family |
a description of the error distribution and link function to be used in the model.
This can be a character string naming a family function, a family function or the result
of a call to a family function. (See |
control |
options for the optimization procedure (see |
par |
(optional) starting parameter vector |
x |
logical, add covariates matrix to result? |
robust |
logical, if true the computed asymptotic standard errors are replaced by their robust counterparts. |
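One way to read the roles of setM and P is as a finite mixture: each row of setM is a candidate value of the latent covariate, and P weights the likelihood contribution of that candidate for each observation. The following base-R sketch fits a Gaussian model this way with optim. It is an illustration under that assumption, not the package's algorithm (see the cited paper for that); the data-generating process, the column alignment of P with the rows of setM, and the function negloglik are all hypothetical.

```r
set.seed(123)
n <- 5000
x <- rnorm(n)                       # correctly measured covariate
z <- rnorm(n)                       # covariate that drives the side information
p1 <- plogis(2 * z)                 # P(true M = 1 | z), treated as known here
m_true <- rbinom(n, 1, p1)          # latent covariate, not used in estimation
y <- 0 + 1 * x - 2 * m_true + rnorm(n)

setM <- matrix(c(0, 1), nrow = 2)   # one row per potential pattern of M
P <- cbind(M0 = 1 - p1, M1 = p1)    # column k belongs to row k of setM (assumption)

# Mean negative log-likelihood of the mixture over the patterns in setM
negloglik <- function(par) {
  sigma <- exp(par[4])              # log-parametrised to keep sigma positive
  lik <- numeric(n)
  for (k in seq_len(nrow(setM))) {
    mu <- par[1] + par[2] * x + par[3] * setM[k, 1]
    lik <- lik + P[, k] * dnorm(y, mean = mu, sd = sigma)
  }
  -mean(log(lik))
}

fit <- optim(c(0, 0, 0, 0), negloglik, method = "BFGS")
est <- setNames(fit$par[1:3], c("const", "X", "M"))
round(est, 2)
```

The true covariate m_true never enters the estimation; only y, x, and the probabilities in P are used, which mirrors the information the function's arguments provide.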
## simulate data
data <- simulate_GLM_dataset()

## estimate model without misclassification error
summary(lm(Y ~ X + M2, data))

## estimate model with misclassification error
summary(lm(Y ~ X + M, data))

## estimate misclassification probabilities
Pmodel <- glm(M2 ~ M + X, data = data, family = binomial("logit"))
summary(Pmodel)

## construct a-posteriori probabilities from Pmodel
P <- predict(Pmodel, newdata = data, type = "response")
P <- cbind(1 - P, P)
dimnames(P)[[2]] <- c("M0", "M1") ## speaking names

## estimate misclassGLM
est <- misclassGLM(Y = data$Y,
                   X = as.matrix(data[, 2, drop = FALSE]),
                   setM = matrix(c(0, 1), nrow = 2),
                   P = P)
summary(est)

## and bootstrapping the results from dataset
## Not run:
summary(boot.misclassGLM(est,
                         Y = data$Y,
                         X = data.matrix(data[, 2, drop = FALSE]),
                         Pmodel = Pmodel,
                         PX = data,
                         repetitions = 100))
## End(Not run)
misclassMlogit computes an estimator for a multinomial logit model with a misclassified covariate using additional side information on the misclassification process.
misclassMlogit(Y, X, setM, P, na.action = na.omit, control = list(), par = NULL, baseoutcome = NULL, x = FALSE)
Y |
a matrix of 0s and 1s, indicating the target class. This is the dependent variable. |
X |
a matrix containing the independent variables |
setM |
matrix, rows containing potential patterns for a misclassified (latent) covariate M in any coding for a categorical independent variable, e.g. dummy coding. |
P |
probabilities corresponding to each of the potential patterns, conditional on the other covariates denoted in x. |
na.action |
how to treat NAs |
control |
options for the optimization procedure (see |
par |
(optional) starting parameter vector |
baseoutcome |
reference outcome class |
x |
logical, add covariates matrix to result? |
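Since Y must be a matrix of 0s and 1s with one column per outcome class, integer class labels have to be converted first. The sketch below does this with matrix indexing in base R; the label vector y_class is a made-up example.

```r
y_class <- c(3L, 1L, 2L, 3L, 1L)             # hypothetical class labels in 1..K
K <- max(y_class)
Y <- matrix(0, nrow = length(y_class), ncol = K)
Y[cbind(seq_along(y_class), y_class)] <- 1   # set exactly one cell per row
Y
```

Each row of the result contains a single 1 in the column of the observed class.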
## simulate data
data <- simulate_mlogit_dataset()

## estimate model without misclassification error
library(mlogit)
data2 <- mlogit.data(data, varying = NULL, choice = "Y", shape = "wide")
summary(mlogit(Y ~ 1 | X + M2, data2, reflevel = "3"))

## estimate model with misclassification error
summary(mlogit(Y ~ 1 | X + M, data2, reflevel = "3"))

## estimate misclassification probabilities
Pmodel <- glm(M2 ~ M + X, data = data, family = binomial("logit"))
summary(Pmodel)

## construct a-posteriori probabilities from Pmodel
P <- predict(Pmodel, newdata = data, type = "response")
P <- cbind(1 - P, P)
dimnames(P)[[2]] <- c("M0", "M1") ## speaking names

## estimate misclassMlogit
## Y must be an indicator matrix with one column per outcome class
Yneu <- matrix(rep.int(0, nrow(data) * 3), ncol = 3)
for (i in seq_len(nrow(data))) Yneu[i, data$Y[i]] <- 1
est <- misclassMlogit(Y = Yneu,
                      X = as.matrix(data[, 2, drop = FALSE]),
                      setM = matrix(c(0, 1), nrow = 2),
                      P = P)
summary(est)

## and bootstrapping the results from dataset
## Not run:
summary(boot.misclassMlogit(est,
                            Y = Yneu,
                            X = data.matrix(data[, 2, drop = FALSE]),
                            Pmodel = Pmodel,
                            PX = data,
                            repetitions = 100))
## End(Not run)
Obtain predictions for misclassGLM fits.
## S3 method for class 'misclassGLM'
predict(object, X, P = NULL, type = c("link", "response"), na.action = na.pass, ...)
object |
a fitted object of class inheriting from 'misclassGLM'. |
X |
matrix of fixed covariates |
P |
a-posteriori probabilities for the true values of the misclassified variable. If provided, the conditional expectation on X,P is computed, otherwise a set of marginal predictions is provided, one for each alternative. |
type |
the type of prediction required. The default is on the scale of the linear predictors; the alternative "response" is on the scale of the response variable. Thus for a default binomial model the default predictions are of log-odds (probabilities on logit scale) and type = "response" gives the predicted probabilities. The value of this argument can be abbreviated. |
na.action |
function determining what should be done with missing values in |
... |
additional arguments (not used at the moment) |
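The two values of type, and the distinction between marginal predictions and a conditional expectation given P, can be sketched in base R. The first part uses an ordinary glm; the second part assumes that the conditional expectation is the P-weighted average of the per-alternative predictions, which is an interpretation of the argument description rather than documented behaviour. All numbers are made up.

```r
set.seed(7)
x <- rnorm(200)
y <- rbinom(200, 1, plogis(0.3 + x))
fit <- glm(y ~ x, family = binomial("logit"))

eta <- predict(fit, type = "link")        # log-odds
prob <- predict(fit, type = "response")   # probabilities
all.equal(unname(plogis(eta)), unname(prob))

# Hypothetical marginal predictions, one column per alternative of M,
# combined with a-posteriori probabilities P (rows sum to 1).
pred_by_alt <- cbind(M0 = c(0.2, 0.5), M1 = c(0.6, 0.9))
P <- cbind(M0 = c(0.75, 0.1), M1 = c(0.25, 0.9))
cond_exp <- rowSums(P * pred_by_alt)      # weighted average per observation
cond_exp
```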
Obtain predictions for misclassMlogit fits.
## S3 method for class 'misclassMlogit'
predict(object, X, P = NULL, type = c("link", "response"), na.action = na.pass, ...)
object |
a fitted object of class inheriting from 'misclassMlogit'. |
X |
matrix of fixed covariates. |
P |
a-posteriori probabilities for the true values of the misclassified variable. If provided, the conditional expectation on X,P is computed, otherwise a set of marginal predictions is provided, one for each alternative. |
type |
the type of prediction required. The default is on the scale of the linear predictors; the alternative "response" is on the scale of the response variable. Thus for a default binomial model the default predictions are of log-odds (probabilities on logit scale) and type = "response" gives the predicted probabilities. The value of this argument can be abbreviated. |
na.action |
function determining what should be done with missing values in |
... |
additional arguments (not used at the moment) |
Simulates a data set for use with misclassGLM, containing
- one continuous variable X drawn from a Gaussian distribution,
- a binary or trinary variable M with misclassification (M2),
- a dependent variable either with added Gaussian noise or drawn from a logit distribution.
simulate_GLM_dataset(n = 50000, const = 0, alpha = 1, beta = -2, beta2 = NULL, logit = FALSE)
n |
number of observations |
const |
constant |
alpha |
parameter for X |
beta |
parameter for M(1) |
beta2 |
parameter for M2, if NULL, M is a binary covariate, otherwise a three-valued categorical |
logit |
logical, if true logit regression, otherwise Gaussian regression |
This can be used to demonstrate the abilities of misclassGLM. For an example
see misclassGLM.
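The kind of data described here, and the bias that misclassification causes, can be reproduced with a few lines of base R. This is a hypothetical data-generating process written for illustration; it is not the code of simulate_GLM_dataset, and the column names M_true and M_obs as well as the 20% error rate are assumptions.

```r
set.seed(2024)
n <- 1000
X <- rnorm(n)                                # continuous Gaussian covariate
M_true <- rbinom(n, 1, 0.5)                  # binary covariate without error
flip <- rbinom(n, 1, 0.2) == 1               # assumed 20% misclassification rate
M_obs <- ifelse(flip, 1 - M_true, M_true)    # misclassified version
Y <- 0 + 1 * X - 2 * M_true + rnorm(n)       # const = 0, alpha = 1, beta = -2
sim <- data.frame(Y, X, M_true, M_obs)

b_true <- coef(lm(Y ~ X + M_true, sim))["M_true"]   # uses the error-free covariate
b_obs  <- coef(lm(Y ~ X + M_obs, sim))["M_obs"]     # uses the misclassified covariate
c(b_true, b_obs)
```

Regressing on the misclassified covariate pulls its coefficient toward zero, which is the bias the estimators in this package are meant to address.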
Simulates a data set for use with misclassMlogit, containing
- one continuous variable X drawn from a Gaussian distribution,
- a binary or trinary variable M with misclassification (M2),
- a dependent variable drawn from a multinomial distribution dependent on X and M.
simulate_mlogit_dataset(n = 1000, const = c(0, 0), alpha = c(1, 2), beta = -2 * c(1, 2), beta2 = NULL)
n |
number of observations |
const |
constants |
alpha |
parameters for X |
beta |
parameters for M(1) |
beta2 |
parameters for M2, if NULL, M is a binary covariate, otherwise a three-valued categorical. |
This can be used to demonstrate the abilities of misclassMlogit. For an example
see misclassMlogit.