Title: | Imprecise Imputation for Statistical Matching |
---|---|
Description: | Imputing blockwise missing data by imprecise imputation, featuring a domain-based, a variable-wise, and a case-wise strategy. Furthermore, the estimation of lower and upper bounds for unconditional and conditional probabilities based on the obtained imprecise data is implemented. Additionally, two utility functions are supplied: one to check whether variables in a data set contain set-valued observations, and another to merge two already imprecisely imputed data sets. The method is described in a technical report by Endres, Fink and Augustin (2018, <doi:10.5282/ubm/epub.42423>). |
Authors: | Paul Fink [aut, cre], Eva Endres [aut], Melissa Schmoll [ctb] |
Maintainer: | Paul Fink |
License: | GPL-2 \| GPL-3 |
Version: | 0.3.1 |
Built: | 2024-11-12 03:09:24 UTC |
Source: | https://github.com/cran/impimp |
Check whether the variables of a data frame contain imprecise observations
checkImprecision(data)
data |
a data.frame to apply the check to. |
A named logical vector of length ncol(data), where TRUE indicates that "|" is present in the values; "|" is used to indicate imprecise observations.
This check is only reliable for data inheriting class "impimp". If data does not inherit class "impimp", the check is still attempted, but the user is additionally notified with a warning.
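The check itself can be pictured as a column-wise search for the "|" separator. The following is a minimal base-R sketch, not the package's implementation; `has_imprecision` is a hypothetical name, and unlike checkImprecision it does not inspect the class of its input or emit a warning.

```r
# Hypothetical helper (not the package's checkImprecision()): flags columns
# whose values contain the separator that marks a set-valued observation.
has_imprecision <- function(data, sep = "|") {
  vapply(data,
         function(col) any(grepl(sep, as.character(col), fixed = TRUE)),
         logical(1))
}

df <- data.frame(x1 = c("1", "0"), z1 = c("0|1", "1"))
has_imprecision(df)  # x1: FALSE, z1: TRUE
```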
A <- data.frame(x1 = c(1,0), x2 = c(0,0), y1 = c(1,0), y2 = c(2,2))
B <- data.frame(x1 = c(1,1,0), x2 = c(0,0,0), z1 = c(0,1,1), z2 = c(0,1,2))
AimpB <- impimp(A, B, method = "variable_wise")
BimpA <- impimp(B, A, method = "variable_wise")
AB <- rbindimpimp(AimpB, BimpA)
checkImprecision(AB)
data(iris)
checkImprecision(iris) # emits a warning
Generating a tuple representation of a data.frame with imprecise observations
generateTupelData(data, constraints = NULL)
data |
a data.frame object, with potentially imprecise entries; see 'Note'. |
constraints |
a list of so-called logical constraints or fixed zeros. Each element must be an object of class "impimp_event". |
By specifying constraints, one can exclude combinations of imputed values that are deemed impossible, so-called ‘logical constraints’ or ‘fixed zeros’.
A list of length NROW(data) of data.frames, one for each observation in the original data.frame. Each such data.frame contains the precise observations that are compatible with its imprecise representation.
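For a single observation, the expansion into compatible precise observations, including the removal of combinations excluded by constraints, can be sketched in base R. This assumes that set-valued entries are strings joined by "|" and represents a constraint as a plain named list of values rather than as an "impimp_event" object; `expand_row` is a hypothetical helper, not generateTupelData().

```r
# Hypothetical helper, not the package's generateTupelData().
# Expands one (possibly imprecise) row into all compatible precise tuples
# and drops tuples that fall inside any constraint.
expand_row <- function(row, constraints = list(), sep = "|") {
  # Split every cell into its set of possible values
  sets <- lapply(row, function(v) strsplit(as.character(v), sep, fixed = TRUE)[[1]])
  tuples <- expand.grid(sets, stringsAsFactors = FALSE)
  # A constraint (named list of values) matches a tuple if the tuple
  # agrees with it on every listed variable
  for (con in constraints) {
    hit <- Reduce(`&`, Map(function(var, val) tuples[[var]] %in% as.character(val),
                           names(con), con))
    tuples <- tuples[!hit, , drop = FALSE]
  }
  rownames(tuples) <- NULL
  tuples
}

row <- data.frame(y1 = "0|1", z1 = "0|1", stringsAsFactors = FALSE)
expand_row(row)                                           # 4 precise tuples
expand_row(row, constraints = list(list(y1 = 0, z1 = 0))) # 3 tuples remain
```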
No sanity check is performed on whether data actually contains imprecise observations, or whether it uses the notation for imprecision employed throughout the impimp package. A warning is triggered if data is not of class "impimp".
impimp, impimp_event for specifying the constraints
A <- data.frame(x1 = c(1,0), x2 = c(0,0), y1 = c(1,0), y2 = c(2,2))
B <- data.frame(x1 = c(1,1,0), x2 = c(0,0,0), z1 = c(0,1,1), z2 = c(0,1,2))
AimpB <- impimp(A, B, method = "domain")
## no constraints
generateTupelData(AimpB)
## (y1,z1) = (0,0) as constraint
generateTupelData(AimpB, list(impimp_event(y1 = 0, z1 = 0)))
data(iris)
generateTupelData(iris) # emits a warning
Estimate the probability of some events based on data obtained by imprecise imputation
impest(data, event, constraints = NULL)
data |
a data.frame obtained as the result of an imprecise imputation, e.g. by a call to impimp. |
event |
a list of objects of class "impimp_event". |
constraints |
a list of so-called logical constraints or fixed zeros. Each element must be an object of class "impimp_event". |
event should be a list of objects of class "impimp_event", where the set union of the impimp_events is the actual event of interest.
By specifying constraints, one can exclude combinations of imputed values that are deemed impossible, so-called ‘logical constraints’ or ‘fixed zeros’.
constraints should be a list of objects of class "impimp_event".
An object of class "impimp_event" is obtained as the result of a call to impimp_event.
For both event and constraints, overlaps among the events generated by the individual impimp_events have no side effects, apart from a potential decrease in performance.
A numeric vector of length 2, where the first component contains the lower and the second component the upper probability of the event of interest.
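How such bounds can arise from set-valued data is sketched below in base R, assuming the usual belief/plausibility reading: an observation contributes to the lower probability if all of its compatible precise tuples lie in the event, and to the upper probability if at least one does. This is an illustration under that assumption, not the package's code; `in_event` and `lower_upper` are hypothetical helpers, events are plain named lists instead of "impimp_event" objects, and the input mimics the list of data.frames returned by generateTupelData().

```r
# Hypothetical helpers, not the package's impest().
# TRUE for each row of `tuples` that lies in the union of the Cartesian
# products in `event` (a list of named lists of admissible values).
in_event <- function(tuples, event) {
  hits <- lapply(event, function(e)
    Reduce(`&`, Map(function(var, val) tuples[[var]] %in% as.character(val),
                    names(e), e)))
  Reduce(`|`, hits)
}

# Lower bound: share of observations certainly in the event;
# upper bound: share of observations possibly in the event.
lower_upper <- function(tuple_list, event) {
  inside <- lapply(tuple_list, in_event, event = event)
  c(lower = mean(vapply(inside, all, logical(1))),
    upper = mean(vapply(inside, any, logical(1))))
}

# Three observations, each given as its compatible precise tuples
obs <- list(
  expand.grid(z1 = "1", z2 = "0", stringsAsFactors = FALSE),
  expand.grid(z1 = c("0", "1"), z2 = "1", stringsAsFactors = FALSE),
  expand.grid(z1 = "0", z2 = c("0", "1", "2"), stringsAsFactors = FALSE)
)
# Event (Z1, Z2) in {(1,0), (0,1), (1,1)}, written as a union of two products
ev <- list(list(z1 = 1, z2 = 0), list(z1 = c(0, 1), z2 = 1))
lower_upper(obs, ev)  # lower = 2/3, upper = 1
```

The third observation is only possibly in the event, which is exactly what separates the two bounds.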
Endres, E., Fink, P. and Augustin, T. (2018), Imprecise Imputation: A Nonparametric Micro Approach Reflecting the Natural Uncertainty of Statistical Matching with Categorical Data, Department of Statistics (LMU Munich): Technical Reports, No. 214.
impimp, impimp_event for specifying constraints and events; impestcond for the estimation of conditional probabilities
A <- data.frame(x1 = c(1,0), x2 = c(0,0), y1 = c(1,0), y2 = c(2,2))
B <- data.frame(x1 = c(1,1,0), x2 = c(0,0,0), z1 = c(0,1,1), z2 = c(0,1,2))
AimpB <- impimp(A, B, method = "variable_wise")
BimpA <- impimp(B, A, method = "variable_wise")
AB <- rbindimpimp(AimpB, BimpA)
## P(Z1=1, Z2=0)
myevent1 <- list(impimp_event(z1 = 1, z2 = 0))
impest(AB, event = myevent1)
## P[(Z1,Z2) in {(1,0),(0,1),(1,1)}]
myevent2 <- list(impimp_event(z1 = 1, z2 = 0), impimp_event(z1 = c(0,1), z2 = 1))
impest(AB, event = myevent2)
Estimate conditional probability of some events based on data obtained by imprecise imputation
impestcond(data, event, condition, constraints = NULL)
data |
a data.frame obtained as the result of an imprecise imputation, e.g. by a call to impimp. |
event |
a list of objects of class "impimp_event". |
condition |
a list of objects of class "impimp_event". |
constraints |
a list of so-called logical constraints or fixed zeros. Each element must be an object of class "impimp_event". |
event and condition should each be a list of objects of class "impimp_event", where within each list the set union of the impimp_events is the actual event of interest or the conditioning event, respectively.
By specifying constraints, one can exclude combinations of imputed values that are deemed impossible, so-called ‘logical constraints’ or ‘fixed zeros’.
constraints should be a list of objects of class "impimp_event".
An object of class "impimp_event" is obtained as the result of a call to impimp_event.
For event, condition and constraints, overlaps among the events generated by the individual impimp_events have no side effects, apart from a potential decrease in performance.
A numeric vector of length 2, where the first component contains the lower and the second component the upper conditional probability of the event of interest.
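A base-R sketch of conditional bounds is given below. It assumes the conditioning rule that Dubois and Prade (1992) call focusing (also known as the Fagin-Halpern rule), lower = Bel(A and B) / (Bel(A and B) + Pl(not A and B)) and upper = Pl(A and B) / (Pl(A and B) + Bel(not A and B)); whether impestcond uses exactly this rule is an assumption here. `in_event` and `cond_lower_upper` are hypothetical helpers working on lists of compatible precise tuples.

```r
# Hypothetical helpers, not the package's impestcond().
# Assumed rule (focusing / Fagin-Halpern conditioning):
#   lower = Bel(A & B) / (Bel(A & B) + Pl(!A & B))
#   upper = Pl(A & B)  / (Pl(A & B)  + Bel(!A & B))
in_event <- function(tuples, event) {
  hits <- lapply(event, function(e)
    Reduce(`&`, Map(function(var, val) tuples[[var]] %in% as.character(val),
                    names(e), e)))
  Reduce(`|`, hits)
}

cond_lower_upper <- function(tuple_list, event, condition) {
  # Bel: share of observations whose tuples ALL satisfy f;
  # Pl: share of observations with AT LEAST ONE tuple satisfying f
  bel <- function(f) mean(vapply(tuple_list, function(t) all(f(t)), logical(1)))
  pl  <- function(f) mean(vapply(tuple_list, function(t) any(f(t)), logical(1)))
  a_and_b     <- function(t) in_event(t, event) & in_event(t, condition)
  not_a_and_b <- function(t) (!in_event(t, event)) & in_event(t, condition)
  # Note: a denominator of 0 (condition impossible) yields NaN here
  c(lower = bel(a_and_b) / (bel(a_and_b) + pl(not_a_and_b)),
    upper = pl(a_and_b) / (pl(a_and_b) + bel(not_a_and_b)))
}

obs <- list(
  expand.grid(x1 = "1", z1 = "1", stringsAsFactors = FALSE),
  expand.grid(x1 = "1", z1 = c("0", "1"), stringsAsFactors = FALSE),
  expand.grid(x1 = "0", z1 = "1", stringsAsFactors = FALSE),
  expand.grid(x1 = c("0", "1"), z1 = "0", stringsAsFactors = FALSE)
)
# P(Z1 = 1 | X1 = 1)
cond_lower_upper(obs, event = list(list(z1 = 1)), condition = list(list(x1 = 1)))
# lower = 1/3, upper = 1
```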
Dubois, D. and Prade, H. (1992), Evidence, knowledge, and belief functions, International Journal of Approximate Reasoning 6(3), 295–319.
impimp, impimp_event for specifying constraints and events; impest for the estimation of unconditional probabilities
A <- data.frame(x1 = c(1,0), x2 = c(0,0), y1 = c(1,0), y2 = c(2,2))
B <- data.frame(x1 = c(1,1,0), x2 = c(0,0,0), z1 = c(0,1,1), z2 = c(0,1,2))
AimpB <- impimp(A, B, method = "domain")
BimpA <- impimp(B, A, method = "domain")
AB <- rbindimpimp(AimpB, BimpA)
myevent <- list(impimp_event(z1 = 1, z2 = 0), impimp_event(z1 = c(0,1), z2 = 1))
cond <- list(impimp_event(x1 = 1))
impestcond(AB, event = myevent, condition = cond)
constr <- list(impimp_event(y1 = 0, z1 = 0))
impestcond(AB, event = myevent, condition = cond, constraints = constr)
Impute a data frame imprecisely
impimp(recipient, donor, method = c("variable_wise", "case_wise", "domain"),
       matchvars = NULL, vardomains = NULL)
## S3 method for class 'impimp'
print(x, ...)
is.impimp(z)
recipient |
a data.frame acting as recipient; see details. |
donor |
a data.frame acting as donor; see details. |
method |
a character string of length 1 giving the desired imputation method. The following values are possible (see Details for an explanation): "variable_wise", "case_wise", "domain". |
matchvars |
a character vector containing the variable names
to be used as matching variables. If |
vardomains |
a named list containing the possible values of
all variables in |
x |
object of class 'impimp' |
... |
further arguments passed down to
|
z |
object to test for class "impimp". |
As is usual in statistical matching, the data.frames recipient and donor are assumed to contain an overlapping set of variables.
The missing values in recipient are substituted with observed values in donor for the approaches based on donation classes, and otherwise with the set of all possible values of the variable in question.
For method = "domain", a missing value of a variable in recipient is imputed by the set of all possible values of that variable.
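Domain-based imputation needs no donor rows at all, as a minimal base-R sketch makes visible. It assumes set-valued entries are stored as values joined by "|"; `impute_domain` and its arguments are hypothetical, with the domain supplied explicitly in the spirit of vardomains.

```r
# Hypothetical helper, not the package's impimp().
# Every recipient row receives the full domain of each variable it lacks.
impute_domain <- function(recipient, missing_vars, vardomains, sep = "|") {
  out <- recipient
  for (v in missing_vars) {
    out[[v]] <- paste(vardomains[[v]], collapse = sep)  # recycled to all rows
  }
  out
}

A <- data.frame(x1 = c(1, 0), y1 = c(1, 0))
impute_domain(A, "z1", list(z1 = 0:2))  # z1 is "0|1|2" in both rows
```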
The other methods are based on donation classes, which are formed from the matching variables whose names are provided by matchvars. These variables need to be present in both recipient and donor:
For method = "variable_wise", a missing value of a variable in recipient is imputed by the set of all observed values of that variable in donor.
For method = "case_wise", the variables only present in donor are represented as tuples. A missing tuple in recipient is then imputed by the set of all observed tuples in donor.
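The difference between the two donation-class methods can be sketched in base R on the same toy data as in the examples. This sketch assumes that the observed values are collected within the donation class of each recipient row, that "|" separates set elements, and it uses "," as a tuple separator purely as a placeholder (the package's actual separator is whatever options("impimp.varssep") holds). `impute_donation` is a hypothetical helper, and empty donation classes are not handled.

```r
# Hypothetical helper, not the package's impimp().
impute_donation <- function(recipient, donor, matchvars, donor_vars,
                            case_wise = FALSE, obssep = "|", varssep = ",") {
  out <- recipient
  # Donation class of row i: donor rows agreeing with it on all matchvars
  classes <- lapply(seq_len(nrow(recipient)), function(i) {
    same <- Reduce(`&`, lapply(matchvars,
                               function(m) donor[[m]] == recipient[[m]][i]))
    donor[same, donor_vars, drop = FALSE]
  })
  if (case_wise) {
    # One tupled column holding the set of observed tuples
    tupvar <- paste(donor_vars, collapse = varssep)
    out[[tupvar]] <- vapply(classes, function(cl) {
      tup <- do.call(paste, c(unname(as.list(cl)), sep = varssep))
      paste(unique(tup), collapse = obssep)
    }, character(1))
  } else {
    # One column per variable holding the set of observed values
    for (v in donor_vars) {
      out[[v]] <- vapply(classes, function(cl)
        paste(unique(cl[[v]]), collapse = obssep), character(1))
    }
  }
  out
}

A <- data.frame(x1 = c(1,0), x2 = c(0,0), y1 = c(1,0), y2 = c(2,2))
B <- data.frame(x1 = c(1,1,0), x2 = c(0,0,0), z1 = c(0,1,1), z2 = c(0,1,2))
impute_donation(A, B, c("x1", "x2"), c("z1", "z2"))                    # variable-wise
impute_donation(A, B, c("x1", "x2"), c("z1", "z2"), case_wise = TRUE)  # case-wise
```

For the first recipient row, the variable-wise result ("0|1" for both z1 and z2) admits four (z1, z2) combinations, while the case-wise result ("0,0|1,1") keeps only the two tuples actually observed together in the donation class.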
The data.frame resulting from an imprecise imputation of donor into recipient. It is also of class "impimp" and stores the imputation method in its attribute "impmethod", the names of the variables of the resulting object containing imputed values in the attribute "imputedvarnames", and the list of (guessed) levels of each underlying variable in "varlevels".
The variable names and observations in recipient and donor must not contain characters that are reserved for internal purposes. The characters used internally are stored in the options options("impimp.obssep") and options("impimp.varssep"). The former is used to separate the values of a set-valued observation, while the latter is used for a concise tuple representation.
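A pre-imputation check for reserved characters might look as follows. The separators are passed in explicitly; in practice they would be read with getOption("impimp.obssep") and getOption("impimp.varssep") once the package is loaded. `uses_reserved` is a hypothetical helper, and its "|" default only reflects the separator mentioned for checkImprecision.

```r
# Hypothetical helper: TRUE if any variable name or value contains one of
# the reserved separator strings.
uses_reserved <- function(data, seps = "|") {
  txt <- c(names(data), unlist(lapply(data, as.character), use.names = FALSE))
  any(vapply(seps, function(s) any(grepl(s, txt, fixed = TRUE)), logical(1)))
}

uses_reserved(data.frame(x1 = c("a", "b|c")))  # TRUE
uses_reserved(data.frame(x1 = c("a", "b")))    # FALSE
```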
This method does not require that all variables in recipient and donor are factor variables; however, the imputation methods coerce them to factor, so purely numerical variables will eventually be treated as factors. It does assume (and test) that there are no missing values in the matching variables.
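The two assumptions above, coercion to factor and no missing values in the matching variables, can be mirrored by a small base-R preprocessing step. `prepare_vars` is a hypothetical helper for illustration; it does not reproduce the package's internal checks or its error messages.

```r
# Hypothetical helper: refuse missing values in the matching variables,
# then coerce every column to factor.
prepare_vars <- function(data, matchvars) {
  has_na <- vapply(data[matchvars], anyNA, logical(1))
  if (any(has_na))
    stop("missing values in matching variables: ",
         paste(matchvars[has_na], collapse = ", "))
  data[] <- lapply(data, as.factor)  # numeric columns become factors
  data
}

p <- prepare_vars(data.frame(x1 = c(1, 0), y1 = c(2, 2)), matchvars = "x1")
levels(p$x1)  # "0" "1"
```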
Endres, E., Fink, P. and Augustin, T. (2018), Imprecise Imputation: A Nonparametric Micro Approach Reflecting the Natural Uncertainty of Statistical Matching with Categorical Data, Department of Statistics (LMU Munich): Technical Reports, No. 214. URL https://epub.ub.uni-muenchen.de/42423/.
impest and impestcond for the estimation of probabilities; rbindimpimp for joining two impimp objects
A <- data.frame(x1 = c(1,0), x2 = c(0,0), y1 = c(1,0), y2 = c(2,2))
B <- data.frame(x1 = c(1,1,0), x2 = c(0,0,0), z1 = c(0,1,1), z2 = c(0,1,2))
impimp(A, B, method = "variable_wise")
## Specifically setting the possible levels of 'z1'
impimp(A, B, method = "domain", vardomains = list(z1 = c(0:5)))
Helper function to generate a set of events as a Cartesian product.
impimp_event(..., isEventList = FALSE)
is.impimp_event(x)
... |
these arguments are of the form |
isEventList |
logical; if |
x |
object to test for class "impimp_event". |
An object of class "impimp_event", i.e. a list of lists, where each sublist contains one point in the Cartesian product spanned by the input values and variables.
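The list-of-lists structure can be sketched in base R with expand.grid, one sublist per point of the Cartesian product. `cartesian_event` is hypothetical: it ignores isEventList and returns a plain list without the "impimp_event" class, so it only illustrates the shape of the result.

```r
# Hypothetical helper, not the package's impimp_event().
cartesian_event <- function(...) {
  grid <- expand.grid(list(...), stringsAsFactors = FALSE)
  # One named sublist per row of the grid, i.e. per point of the product
  lapply(seq_len(nrow(grid)), function(i) lapply(grid, `[`, i))
}

ev <- cartesian_event(x1 = c(1:3, 6), x2 = c(5, 8))
length(ev)  # 8 points: 4 values of x1 times 2 values of x2
ev[[1]]     # list(x1 = 1, x2 = 5)
```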
There is no plausibility check on whether the supplied variable names are actually contained in the data.frame for which the resulting impimp_event object is later used.
## underlying data set: x1: 1:6, x2: 1:10
## subspace, requiring: x1 == 1 & ((x2 == 1) | (x2 == 2))
impimp_event(x1 = 1, x2 = c(1,2))
## subspace containing all points within the Cartesian
## product of (x1 =) {1,2,3,6} x {5,8} (= x2)
# via ... argument
impimp_event(x1 = c(1:3,6), x2 = c(5,8))
# via an event list (isEventList = TRUE)
impimp_event(list(x1 = c(1:3,6), x2 = c(5,8)), isEventList = TRUE)
Combine two objects of class "impimp" like rbind would do with data frames.
rbindimpimp(x, y)
x, y |
objects of class "impimp". |
The resulting object is constructed in a way that minimizes the creation of 'tupled' variables: only those variables are joined as tuples that are actually necessary to keep the concise, data-frame-like representation of impimp objects.
The attributes "impmethod" and "varlevels" contain the set union of those of x and y, on a global and a per-underlying-variable basis, respectively.
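The per-variable set union described for "varlevels" can be sketched in base R. `union_levels` is a hypothetical helper; it only illustrates the attribute merge, not how rbindimpimp decides which variables to join as tuples.

```r
# Hypothetical helper: merge two named lists of levels by taking, for each
# variable present in either list, the set union of its levels.
union_levels <- function(lx, ly) {
  vars <- union(names(lx), names(ly))
  setNames(lapply(vars, function(v) union(lx[[v]], ly[[v]])), vars)
}

lx <- list(x1 = c("0", "1"), z1 = c("0", "1"))
ly <- list(x1 = c("0", "1"), y1 = c("0", "1", "2"))
union_levels(lx, ly)  # x1: "0","1"; z1: "0","1"; y1: "0","1","2"
```

The global "impmethod" attribute would follow the same idea with a single union() call on the two method strings.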
An object of class "impimp", inheriting the impimp-specific attributes of x and y.
A <- data.frame(x1 = c(1,0), x2 = c(0,0), y1 = c(1,0), y2 = c(2,2))
B <- data.frame(x1 = c(1,1,0), x2 = c(0,0,0), z1 = c(0,1,1), z2 = c(0,1,2))
impA <- impimp(A, B, method = "case_wise")
impB <- impimp(B, A, method = "case_wise")
rbindimpimp(impA, impB)