Title: | Computation of Generalized Linear Models with Misclassified Covariates Using Side Information |
---|---|
Description: | Estimates models that extend the standard GLM to take misclassification into account. The models require side information from a secondary data set on the misclassification process, i.e. some sort of misclassification probabilities conditional on some common covariates. A detailed description of the algorithm can be found in Dlugosz, Mammen and Wilke (2015) <https://www.zew.de/publikationen/generalised-partially-linear-regression-with-misclassified-data-and-an-application-to-labour-market-transitions>. |
Authors: | Stephan Dlugosz |
Maintainer: | Stephan Dlugosz <[email protected]> |
License: | GPL-3 |
Version: | 0.3.5 |
Built: | 2024-11-23 04:01:38 UTC |
Source: | https://github.com/cran/misclassGLM |
boot.misclassGLM
Obtain bootstrapped standard errors.
boot.misclassGLM(ret, Y, X, Pmodel, PX, boot.fraction = 1, repetitions = 1000)
ret |
a fitted object of class inheriting from 'misclassGLM'. |
Y |
a vector of integers or numerics. This is the dependent variable. |
X |
a matrix containing the independent variables. |
Pmodel |
a fitted model (e.g. of class 'glm' or 'mlogit') used to generate variations of the predicted probabilities of the true values (usually conditional on the observed misclassified values and additional covariates). |
PX |
covariates matrix suitable for predicting probabilities from |
boot.fraction |
fraction of sample to be used for estimating the bootstrapped standard errors, for speedup. |
repetitions |
number of bootstrap samples to be drawn. |
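The roles of boot.fraction and repetitions can be illustrated with a minimal base-R sketch of a nonparametric bootstrap. It uses an ordinary lm() fit rather than a misclassGLM fit, and the helper boot_se() is hypothetical (not part of the package). The sketch does not rescale the standard errors when boot.fraction is below 1; how the package treats that case is not stated here.

```r
set.seed(1)
n <- 500
x <- rnorm(n)
y <- 1 + 2 * x + rnorm(n)

## hypothetical helper: bootstrap standard errors of lm() coefficients
boot_se <- function(y, x, boot.fraction = 1, repetitions = 200) {
  m <- ceiling(boot.fraction * length(y))      # size of each bootstrap sample
  draws <- replicate(repetitions, {
    idx <- sample.int(length(y), size = m, replace = TRUE)
    coef(lm(y[idx] ~ x[idx]))                  # refit on the resampled rows
  })
  apply(draws, 1, sd)                          # spread of each coefficient
}

se <- boot_se(y, x, boot.fraction = 0.5, repetitions = 200)
se
```

A smaller boot.fraction makes each refit cheaper, which is the speedup mentioned in the argument description.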
boot.misclassMlogit
Obtain bootstrapped standard errors.
boot.misclassMlogit( ret, Y, X, Pmodel, PX, boot.fraction = 1, repetitions = 1000 )
ret |
a fitted object of class inheriting from 'misclassMlogit'. |
Y |
a matrix of 0s and 1s, indicating the target class. This is the dependent variable. |
X |
a matrix containing the independent variables. |
Pmodel |
a fitted model (e.g. of class 'glm' or 'mlogit') used to generate variations of the predicted probabilities of the true values (usually conditional on the observed misclassified values and additional covariates). |
PX |
covariates matrix suitable for predicting probabilities from |
boot.fraction |
fraction of sample to be used for estimating the bootstrapped standard errors, for speedup. |
repetitions |
number of bootstrap samples to be drawn. |
mfx.misclassGLM
Obtain marginal effects.
mfx.misclassGLM(w, x.mean = TRUE, rev.dum = TRUE, digits = 3, ...)
w |
a fitted object of class inheriting from 'misclassGLM'. |
x.mean |
logical, if true computes marginal effects at mean, otherwise average marginal effects. |
rev.dum |
logical, if true, computes differential effects for switch from 0 to 1. |
digits |
number of digits to be presented in output. |
... |
further arguments passed to or from other functions. |
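The distinction between marginal effects at the mean (x.mean = TRUE), average marginal effects (x.mean = FALSE), and the differential effect for a dummy switching from 0 to 1 (rev.dum = TRUE) can be sketched on a plain logit glm() fit. This is an illustration in base R under the usual textbook definitions, not the package's own computation; all variable names are made up for the example.

```r
set.seed(2)
n <- 1000
x <- rnorm(n)
d <- rbinom(n, 1, 0.4)                       # dummy covariate
y <- rbinom(n, 1, plogis(-0.5 + 1.0 * x + 0.8 * d))

fit  <- glm(y ~ x + d, family = binomial("logit"))
b    <- coef(fit)
Xmat <- model.matrix(fit)

## marginal effect of x evaluated at the covariate means
eta_mean <- sum(colMeans(Xmat) * b)
mem_x <- dlogis(eta_mean) * b[["x"]]

## average marginal effect of x: average the derivative over all rows
eta_i <- drop(Xmat %*% b)
ame_x <- mean(dlogis(eta_i)) * b[["x"]]

## differential effect of d: switch from 0 to 1, other covariates at their means
X0 <- colMeans(Xmat); X0["d"] <- 0
X1 <- X0;             X1["d"] <- 1
diff_d <- plogis(sum(X1 * b)) - plogis(sum(X0 * b))

c(at_mean = mem_x, average = ame_x, dummy_switch = diff_d)
```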
mfx.misclassMlogit
Obtain marginal effects.
mfx.misclassMlogit( w, x.mean = TRUE, rev.dum = TRUE, outcome = 2, baseoutcome = 1, digits = 3, ... )
w |
a fitted object of class inheriting from 'misclassMlogit'. |
x.mean |
logical, if true computes marginal effects at mean, otherwise average marginal effects. |
rev.dum |
logical, if true, computes differential effects for switch from 0 to 1. |
outcome |
outcome for which the marginal effects (ME) should be computed. |
baseoutcome |
base outcome, i.e. the reference class of the model. |
digits |
number of digits to be presented in output. |
... |
further arguments passed to or from other functions. |
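For a multinomial logit, the meaning of outcome and baseoutcome can be sketched as follows. The coefficient values and the helper probs() are hypothetical; the base outcome is assumed to have its linear predictor normalised to zero, which is the standard convention but is not confirmed here for the package.

```r
## hypothetical coefficients: one row per non-base outcome (2 and 3);
## the base outcome 1 is assumed to have all coefficients equal to 0
beta <- rbind("2" = c(const =  0.2, x = 1.0, d = -0.5),
              "3" = c(const = -0.1, x = 2.0, d = -1.0))

## outcome probabilities for one covariate row (softmax over linear predictors)
probs <- function(xrow, beta) {
  eta <- c("1" = 0, drop(beta %*% xrow))
  exp(eta) / sum(exp(eta))
}

x_bar <- c(const = 1, x = 0.3, d = 0.4)      # evaluation point, e.g. covariate means
p  <- probs(x_bar, beta)
bx <- c("1" = 0, beta[, "x"])

## marginal effect of x on every outcome probability:
## dP_j/dx = P_j * (b_j - sum_k P_k * b_k)
me_x <- p * (bx - sum(p * bx))
me_x["2"]                                    # effect on outcome 2

## differential effect of the dummy d switching from 0 to 1
diff_d <- probs(replace(x_bar, "d", 1), beta) - probs(replace(x_bar, "d", 0), beta)
diff_d["2"]
```

Because the outcome probabilities sum to one, the effects across all outcomes sum to zero, which is a convenient sanity check.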
misclassGLM
Computes the estimator for a GLM with a misclassified covariate, using additional side information on the misclassification process.
misclassGLM( Y, X, setM, P, na.action = na.omit, family = gaussian(link = "identity"), control = list(), par = NULL, x = FALSE, robust = FALSE )
Y |
a vector of integers or numerics. This is the dependent variable. |
X |
a matrix containing the independent variables. |
setM |
(optional) matrix whose rows contain the potential patterns of a misclassified (latent) covariate M, in any coding for a categorical independent variable, e.g. dummy coding (default: Identity). |
P |
probabilities corresponding to each of the potential patterns, conditional on the other covariates denoted in x. |
na.action |
how to treat NAs |
family |
a description of the error distribution and link function to be used in the model.
This can be a character string naming a family function, a family function or the result
of a call to a family function. (See |
control |
options for the optimization procedure (see |
par |
(optional) starting parameter vector |
x |
logical, add covariates matrix to result? |
robust |
logical, if true the computed asymptotic standard errors are replaced by their robust counterparts. |
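How setM and P could enter the estimation can be sketched with a mixture likelihood: each observation contributes the density of Y under every pattern in setM, weighted by the corresponding column of P. This is an assumption about the general approach, written for the Gaussian identity-link case in base R; it is not the package's implementation, and the simulated error rate of 0.2 is chosen only for the example.

```r
set.seed(3)
n <- 2000
x      <- rnorm(n)
m_true <- rbinom(n, 1, 0.5)                       # latent true covariate
flip   <- rbinom(n, 1, 0.2) == 1                  # 20% misclassification (assumed)
m_obs  <- ifelse(flip, 1 - m_true, m_true)        # observed, misclassified version
y      <- 0.5 + 1 * x - 2 * m_true + rnorm(n)

setM <- matrix(c(0, 1), nrow = 2)                 # patterns of the latent covariate
p1   <- ifelse(m_obs == 1, 0.8, 0.2)              # P(true = 1 | observed) in this design
P    <- cbind(M0 = 1 - p1, M1 = p1)               # one column per row of setM

## negative log-likelihood of the mixture over the latent patterns
negloglik <- function(par) {
  sigma <- exp(par[4])
  lik <- numeric(n)
  for (k in seq_len(nrow(setM))) {
    mu  <- par[1] + par[2] * x + par[3] * setM[k, 1]
    lik <- lik + P[, k] * dnorm(y, mean = mu, sd = sigma)
  }
  -sum(log(lik))
}

## start from the naive regression on the misclassified covariate
naive <- lm(y ~ x + m_obs)
start <- c(coef(naive), log(summary(naive)$sigma))
fit   <- optim(start, negloglik, method = "BFGS")
est   <- setNames(fit$par, c("const", "x", "M", "log_sigma"))

rbind(naive = coef(naive), mixture = est[1:3])
```

In this simulated design the naive coefficient on the misclassified covariate is attenuated towards zero, while the mixture estimate should lie closer to the value used in the simulation.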
## simulate data
data <- simulate_GLM_dataset()

## estimate model without misclassification error
summary(lm(Y ~ X + M2, data))

## estimate model with misclassification error
summary(lm(Y ~ X + M, data))

## estimate misclassification probabilities
Pmodel <- glm(M2 ~ M + X, data = data, family = binomial("logit"))
summary(Pmodel)

## construct a-posteriori probabilities from Pmodel
P <- predict(Pmodel, newdata = data, type = "response")
P <- cbind(1 - P, P)
dimnames(P)[[2]] <- c("M0", "M1") ## speaking names

## estimate misclassGLM
est <- misclassGLM(Y = data$Y,
                   X = as.matrix(data[, 2, drop = FALSE]),
                   setM = matrix(c(0, 1), nrow = 2),
                   P = P)
summary(est)

## and bootstrapping the results from dataset
## Not run:
summary(boot.misclassGLM(est,
                         Y = data$Y,
                         X = data.matrix(data[, 2, drop = FALSE]),
                         Pmodel = Pmodel,
                         PX = data,
                         repetitions = 100))
## End(Not run)
misclassMlogit
Computes the estimator for a GLM with a misclassified covariate, using additional side information on the misclassification process.
misclassMlogit( Y, X, setM, P, na.action = na.omit, control = list(), par = NULL, baseoutcome = NULL, x = FALSE )
Y |
a matrix of 0s and 1s, indicating the target class. This is the dependent variable. |
X |
a matrix containing the independent variables |
setM |
matrix whose rows contain the potential patterns of a misclassified (latent) covariate M, in any coding for a categorical independent variable, e.g. dummy coding. |
P |
probabilities corresponding to each of the potential patterns, conditional on the other covariates denoted in x. |
na.action |
how to treat NAs |
control |
options for the optimization procedure (see |
par |
(optional) starting parameter vector |
baseoutcome |
reference outcome class |
x |
logical, add covariates matrix to result? |
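Since Y must be a matrix of 0s and 1s with one column per class, a vector of class labels has to be converted first. A short base-R sketch, assuming the labels are integers 1, 2, 3 (the vector y below is made up for the example):

```r
y <- c(1, 3, 2, 3, 1)                      # class labels coded 1..3

## pick the matching row of a 3 x 3 identity matrix for every label
Y <- diag(3)[y, , drop = FALSE]

## the same result with an explicit loop
Yloop <- matrix(0, nrow = length(y), ncol = 3)
for (i in seq_along(y)) Yloop[i, y[i]] <- 1

Y
```

Each row of Y contains exactly one 1, in the column of the observed class.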
## simulate data
data <- simulate_mlogit_dataset()

## estimate model without misclassification error
library(mlogit)
data2 <- mlogit.data(data, varying = NULL, choice = "Y", shape = "wide")
summary(mlogit(Y ~ 1 | X + M2, data2, reflevel = "3"))

## estimate model with misclassification error
summary(mlogit(Y ~ 1 | X + M, data2, reflevel = "3"))

## estimate misclassification probabilities
Pmodel <- glm(M2 ~ M + X, data = data, family = binomial("logit"))
summary(Pmodel)

## construct a-posteriori probabilities from Pmodel
P <- predict(Pmodel, newdata = data, type = "response")
P <- cbind(1 - P, P)
dimnames(P)[[2]] <- c("M0", "M1") ## speaking names

## estimate misclassGLM
Yneu <- matrix(rep.int(0, nrow(data) * 3), ncol = 3)
for (i in 1:nrow(data)) Yneu[i, data$Y[i]] <- 1
est <- misclassMlogit(Y = Yneu,
                      X = as.matrix(data[, 2, drop = FALSE]),
                      setM = matrix(c(0, 1), nrow = 2),
                      P = P)
summary(est)

## and bootstrapping the results from dataset
## Not run:
summary(boot.misclassMlogit(est,
                            Y = Yneu,
                            X = data.matrix(data[, 2, drop = FALSE]),
                            Pmodel = Pmodel,
                            PX = data,
                            repetitions = 100))
## End(Not run)
predict.misclassGLM
Obtains predictions.
## S3 method for class 'misclassGLM'
predict(object, X, P = NULL, type = c("link", "response"), na.action = na.pass, ...)
object |
a fitted object of class inheriting from 'misclassGLM'. |
X |
matrix of fixed covariates |
P |
a-posteriori probabilities for the true values of the misclassified variable. If provided, the conditional expectation on X,P is computed, otherwise a set of marginal predictions is provided, one for each alternative. |
type |
the type of prediction required. The default is on the scale of the linear predictors; the alternative "response" is on the scale of the response variable. Thus for a default binomial model the default predictions are of log-odds (probabilities on logit scale) and type = "response" gives the predicted probabilities. The value of this argument can be abbreviated. |
na.action |
function determining what should be done with missing values in |
... |
additional arguments (not used at the moment) |
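The difference between marginal predictions (one per alternative of the latent covariate) and the conditional expectation given X and P can be sketched in base R. The coefficients below are hypothetical, a binomial logit model is assumed, and the weighting is done on the response scale; whether the package weights on the link or the response scale is not stated here.

```r
## hypothetical fitted coefficients of a logit model with a binary latent covariate M
beta <- c(const = -0.5, x = 1.0, m = -2.0)
X    <- cbind(x = c(-1, 0, 1.5))               # fixed covariates, three observations
setM <- matrix(c(0, 1), nrow = 2)              # alternatives of the latent covariate

## marginal predictions on the link scale: one column per alternative
eta <- sapply(seq_len(nrow(setM)), function(k)
  beta[["const"]] + X[, "x"] * beta[["x"]] + setM[k, 1] * beta[["m"]])
colnames(eta) <- c("M0", "M1")

## the same predictions on the response scale (probabilities for a logit model)
resp <- plogis(eta)

## a-posteriori probabilities of the alternatives, one row per observation
P <- cbind(M0 = c(0.9, 0.5, 0.2), M1 = c(0.1, 0.5, 0.8))

## conditional expectation given X and P: probability-weighted average
cond <- rowSums(P * resp)

cbind(resp, conditional = cond)
```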
predict.misclassMlogit
Obtains predictions.
## S3 method for class 'misclassMlogit'
predict(object, X, P = NULL, type = c("link", "response"), na.action = na.pass, ...)
object |
a fitted object of class inheriting from 'misclassMlogit'. |
X |
matrix of fixed covariates. |
P |
a-posteriori probabilities for the true values of the misclassified variable. If provided, the conditional expectation on X,P is computed, otherwise a set of marginal predictions is provided, one for each alternative. |
type |
the type of prediction required. The default is on the scale of the linear predictors; the alternative "response" is on the scale of the response variable. Thus for a default binomial model the default predictions are of log-odds (probabilities on logit scale) and type = "response" gives the predicted probabilities. The value of this argument can be abbreviated. |
na.action |
function determining what should be done with missing values in |
... |
additional arguments (not used at the moment) |
simulate_GLM_dataset
Simulates a data set with
- one continuous variable X drawn from a Gaussian distribution,
- a binary or trinary variable M with misclassification (M2),
- a dependent variable either with added Gaussian noise or drawn from a logit distribution.
simulate_GLM_dataset( n = 50000, const = 0, alpha = 1, beta = -2, beta2 = NULL, logit = FALSE )
n |
number of observations |
const |
constant |
alpha |
parameter for X |
beta |
parameter for M(1) |
beta2 |
parameter for M2, if NULL, M is a binary covariate, otherwise a three-valued categorical |
logit |
logical, if true logit regression, otherwise Gaussian regression |
This can be used to demonstrate the abilities of misclassGLM. For an example, see misclassGLM.
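The kind of data described above can be imitated with a few lines of base R. The function simulate_sketch() and its err argument are hypothetical and only mimic the description (Gaussian X, a binary covariate with a misclassified copy, Gaussian noise); the package's own simulation code may differ in its details.

```r
## hypothetical stand-in for a data set with a misclassified binary covariate
simulate_sketch <- function(n = 1000, const = 0, alpha = 1, beta = -2, err = 0.2) {
  X      <- rnorm(n)                              # continuous covariate
  M_true <- rbinom(n, 1, 0.5)                     # correctly measured covariate
  flip   <- rbinom(n, 1, err) == 1                # which rows are misclassified
  M_obs  <- ifelse(flip, 1 - M_true, M_true)      # misclassified version
  Y      <- const + alpha * X + beta * M_true + rnorm(n)
  data.frame(Y, X, M_true, M_obs)
}

set.seed(4)
d <- simulate_sketch()

b_true <- coef(lm(Y ~ X + M_true, d))             # regression on the true covariate
b_obs  <- coef(lm(Y ~ X + M_obs, d))              # regression on the misclassified one
rbind(true = b_true, observed = unname(b_obs))
```

Comparing the two rows shows the attenuation that misclassification causes in a naive regression.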
simulate_mlogit_dataset
Simulates a data set with
- one continuous variable X drawn from a Gaussian distribution,
- a binary or trinary variable M with misclassification (M2),
- a dependent variable drawn from a multinomial distribution dependent on X and M.
simulate_mlogit_dataset( n = 1000, const = c(0, 0), alpha = c(1, 2), beta = -2 * c(1, 2), beta2 = NULL )
n |
number of observations |
const |
constants |
alpha |
parameters for X |
beta |
parameters for M(1) |
beta2 |
parameters for M2, if NULL, M is a binary covariate, otherwise a three-valued categorical. |
This can be used to demonstrate the abilities of misclassMlogit. For an example, see misclassMlogit.
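Drawing a dependent variable from a multinomial distribution that depends on X and M can be sketched in base R as follows. The parameter values reuse the defaults shown in the usage line; treating outcome 1 as the reference class and drawing M with probability 0.5 are assumptions made for this illustration only.

```r
set.seed(5)
n     <- 1000
const <- c(0, 0)
alpha <- c(1, 2)
beta  <- -2 * c(1, 2)

X      <- rnorm(n)
M_true <- rbinom(n, 1, 0.5)

## linear predictors: first column is the reference class (assumed), fixed at 0
eta <- cbind(0,
             const[1] + alpha[1] * X + beta[1] * M_true,
             const[2] + alpha[2] * X + beta[2] * M_true)

## multinomial logit probabilities, one row per observation
prob <- exp(eta) / rowSums(exp(eta))

## draw one outcome in 1..3 per observation
Y <- apply(prob, 1, function(p) sample.int(3, size = 1, prob = p))
table(Y)
```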