| Title: | Imprecise Imputation for Statistical Matching |
|---|---|
| Description: | Imputing blockwise missing data by imprecise imputation, featuring a domain-based, variable-wise, and case-wise strategy. Furthermore, the estimation of lower and upper bounds for unconditional and conditional probabilities based on the obtained imprecise data is implemented. Additionally, two utility functions are supplied: one to check whether variables in a data set contain set-valued observations; and another to merge two already imprecisely imputed data. The method is described in a technical report by Endres, Fink and Augustin (2018, <doi:10.5282/ubm/epub.42423>). |
| Authors: | Paul Fink [aut, cre], Eva Endres [aut], Melissa Schmoll [ctb] |
| Maintainer: | Paul Fink <[email protected]> |
| License: | GPL-2 \| GPL-3 |
| Version: | 0.3.1 |
| Built: | 2026-05-27 09:18:23 UTC |
| Source: | https://github.com/cran/impimp |
Check whether the variables of a data frame contain imprecise observations
```r
checkImprecision(data)
```
| data | data.frame to apply the check to. |
|---|---|
A named logical vector of length ncol(data),
where TRUE indicates that "|" is present in the
values of the corresponding variable; "|" is the character used to mark an imprecise observation.
This check is only reliable for data inheriting
class "impimp". If data does not inherit class
"impimp", the check is still attempted, but the
user is additionally notified with a warning.
```r
library(impimp)

A <- data.frame(x1 = c(1,0), x2 = c(0,0), y1 = c(1,0), y2 = c(2,2))
B <- data.frame(x1 = c(1,1,0), x2 = c(0,0,0), z1 = c(0,1,1), z2 = c(0,1,2))
AimpB <- impimp(A, B, method = "variable_wise")
BimpA <- impimp(B, A, method = "variable_wise")
AB <- rbindimpimp(AimpB, BimpA)
checkImprecision(AB)

data(iris)
checkImprecision(iris) # emits a warning
```
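To illustrate the kind of check described above, the following base R sketch flags the columns whose values contain the separator. The helper `has_obssep` and the toy data are hypothetical and not part of impimp; the sketch assumes that "|" is the separator and makes no claim about how checkImprecision is implemented.

```r
# Hypothetical helper, not part of impimp: flags columns whose values contain
# the separator that marks a set-valued observation (assumed to be "|").
has_obssep <- function(data, sep = "|") {
  vapply(data,
         function(col) any(grepl(sep, as.character(col), fixed = TRUE)),
         logical(1))
}

# Toy data: 'z1' holds one set-valued entry, 'x1' holds only precise values
toy <- data.frame(x1 = c("1", "0"), z1 = c("0|1", "1"),
                  stringsAsFactors = FALSE)
res <- has_obssep(toy)
print(res)
```

Matching with `fixed = TRUE` matters here, since "|" would otherwise be read as the alternation operator of a regular expression.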
Generating a tuple representation of a data.frame with imprecise observations
```r
generateTupelData(data, constraints = NULL)
```
| data | a data.frame object, with potentially imprecise entries; see 'Note'. |
|---|---|
| constraints | a list of so-called logical constraints or fixed zeros. Each element must be an object of class "impimp_event". |
By specifying constraints one can exclude combinations of
imputed values which are deemed impossible, so-called
‘logical constraints’ or ‘fixed zeros’.
A list of length NROW(data) containing one data.frame
per observation in the original data.frame.
Each such data.frame contains the precise observations that are compatible with the imprecise representation of that observation.
No sanity check is performed on whether data actually
contains imprecise observations or follows the notation for
imprecision used throughout the impimp package. A warning is
triggered if it is not of class "impimp".
impimp, impimp_event for
specifying the constraints
```r
library(impimp)

A <- data.frame(x1 = c(1,0), x2 = c(0,0), y1 = c(1,0), y2 = c(2,2))
B <- data.frame(x1 = c(1,1,0), x2 = c(0,0,0), z1 = c(0,1,1), z2 = c(0,1,2))
AimpB <- impimp(A, B, method = "domain")

## no constraints
generateTupelData(AimpB)

## (y1,z1) = (0,0) as constraint
generateTupelData(AimpB, list(impimp_event(y1 = 0, z1 = 0)))

data(iris)
generateTupelData(iris) # emits a warning
```
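The idea of listing the precise observations compatible with one imprecise observation can be sketched in base R. The helper `expand_row`, the toy row and the chosen fixed zero are hypothetical; the sketch assumes "|" separates the values of a set and is not meant to reproduce the output format of generateTupelData.

```r
# Hypothetical helper, not part of impimp: expands one imprecise row into all
# precise rows compatible with it, assuming "|" separates the values of a set.
expand_row <- function(row, sep = "|") {
  values <- lapply(row, function(v) {
    strsplit(as.character(v), sep, fixed = TRUE)[[1]]
  })
  expand.grid(values, stringsAsFactors = FALSE)
}

# One toy observation with two set-valued entries
imprecise <- data.frame(x1 = "1", y1 = "1", z1 = "0|1", z2 = "0|1|2",
                        stringsAsFactors = FALSE)
tuples <- expand_row(imprecise)

# Treat (y1, z1) = (1, 0) as an illustrative fixed zero and drop it
allowed <- tuples[!(tuples$y1 == "1" & tuples$z1 == "0"), ]
print(allowed)
```

Here the unconstrained expansion has 2 x 3 = 6 rows, and the fixed zero removes the three rows with z1 equal to 0.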
Estimate the probability of some events based on data obtained by imprecise imputation
```r
impest(data, event, constraints = NULL)
```
| data | a data.frame obtained as the result of an imprecise imputation, e.g. by a call to |
|---|---|
| event | a list of objects of class "impimp_event" |
| constraints | a list of so-called logical constraints or fixed zeros. Each element must be an object of class "impimp_event". |
event should be a list of objects of class
"impimp_event", where the set union of the impimp_events is the
actual event of interest.
By specifying constraints one can exclude combinations of
imputed values which are deemed impossible, so-called
‘logical constraints’ or ‘fixed zeros’.
constraints should be a list of objects of class
"impimp_event".
An object of class "impimp_event" is obtained as the result
of a call to impimp_event.
For both event and constraints, overlap among the events
generated by the individual impimp_events
has no side effects other than a potential decrease
in performance.
A numeric vector of length 2, where the first component contains the lower and the second component the upper probability of the event of interest.
Endres, E., Fink, P. and Augustin, T. (2018), Imprecise Imputation: A Nonparametric Micro Approach Reflecting the Natural Uncertainty of Statistical Matching with Categorical Data, Department of Statistics (LMU Munich): Technical Reports, No. 214
impimp, impimp_event for
specifying constraints and events; impestcond for
the estimation of conditional probabilities
```r
library(impimp)

A <- data.frame(x1 = c(1,0), x2 = c(0,0), y1 = c(1,0), y2 = c(2,2))
B <- data.frame(x1 = c(1,1,0), x2 = c(0,0,0), z1 = c(0,1,1), z2 = c(0,1,2))
AimpB <- impimp(A, B, method = "variable_wise")
BimpA <- impimp(B, A, method = "variable_wise")
AB <- rbindimpimp(AimpB, BimpA)

## P(Z1=1, Z2=0)
myevent1 <- list(impimp_event(z1 = 1, z2 = 0))
impest(AB, event = myevent1)

## P[(Z1,Z2) in {(1,0),(0,1),(1,1)}]
myevent2 <- list(impimp_event(z1 = 1, z2 = 0),
                 impimp_event(z1 = c(0,1), z2 = 1))
impest(AB, event = myevent2)
```
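How lower and upper probabilities can arise from set-valued data may be sketched in base R. The sketch assumes that the lower bound counts the observations that certainly belong to the event and the upper bound those that possibly belong to it; the toy data are hypothetical, and whether impest computes exactly these quantities is not stated here.

```r
# Each element is the set of precise values still compatible with one
# observation (toy data, not produced by impimp).
obs <- list("0", c("0", "1"), "1", c("1", "2"), "2")
event <- c("0", "1")  # event of interest: the value is 0 or 1

# Lower bound: share of observations whose whole set lies inside the event
lower <- mean(vapply(obs, function(s) all(s %in% event), logical(1)))

# Upper bound: share of observations whose set overlaps the event
upper <- mean(vapply(obs, function(s) any(s %in% event), logical(1)))

bounds <- c(lower = lower, upper = upper)
print(bounds)
```

The fourth observation, {1, 2}, is what separates the two bounds: it may or may not lie in the event, so it counts towards the upper bound only.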
Estimate conditional probability of some events based on data obtained by imprecise imputation
```r
impestcond(data, event, condition, constraints = NULL)
```
| data | a data.frame obtained as the result of an imprecise imputation, e.g. by a call to |
|---|---|
| event | a list of objects of class "impimp_event" |
| condition | a list of objects of class "impimp_event" |
| constraints | a list of so-called logical constraints or fixed zeros. Each element must be an object of class "impimp_event". |
event and condition should each be a list of objects
of class "impimp_event", where within each list the set union
of the impimp_events is the actual event of interest or conditioning
event, respectively.
By specifying constraints one can exclude combinations of
imputed values which are deemed impossible, so-called
‘logical constraints’ or ‘fixed zeros’.
constraints should be a list of objects of class
"impimp_event".
An object of class "impimp_event" is obtained as the result
of a call to impimp_event.
For event, condition and constraints, overlap among
the events generated by the
individual impimp_events has no side effects other than
a potential decrease in performance.
A numeric vector of length 2, where the first component contains the lower and the second component the upper conditional probability of the event of interest.
Dubois, D. and Prade, H. (1992), Evidence, knowledge, and belief functions, International Journal of Approximate Reasoning 6(3), 295–319.
impimp, impimp_event for
specifying constraints and events; impest for
the estimation of unconditional probabilities
```r
library(impimp)

A <- data.frame(x1 = c(1,0), x2 = c(0,0), y1 = c(1,0), y2 = c(2,2))
B <- data.frame(x1 = c(1,1,0), x2 = c(0,0,0), z1 = c(0,1,1), z2 = c(0,1,2))
AimpB <- impimp(A, B, method = "domain")
BimpA <- impimp(B, A, method = "domain")
AB <- rbindimpimp(AimpB, BimpA)

myevent <- list(impimp_event(z1 = 1, z2 = 0),
                impimp_event(z1 = c(0,1), z2 = 1))
cond <- list(impimp_event(x1 = 1))
impestcond(AB, event = myevent, condition = cond)

constr <- list(impimp_event(y1 = 0, z1 = 0))
impestcond(AB, event = myevent, condition = cond, constraints = constr)
```
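As a rough illustration of conditional bounds on set-valued data, the base R sketch below applies one conditioning rule known from the belief function literature: lower = Bel(A and B) / (Bel(A and B) + Pl(not A and B)), upper = Pl(A and B) / (Pl(A and B) + Bel(not A and B)). It is an assumption that this rule is relevant here; the toy data, the tuple encoding "x,z" and the helpers `bel` and `pl` are hypothetical, and the sketch does not claim to reproduce impestcond.

```r
# Toy set-valued observations over tuples "x,z" (not produced by impimp)
obs <- list("1,0", c("1,0", "1,1"), "1,1", "0,1", c("0,0", "0,1"))

# Share of observations certainly (bel) or possibly (pl) inside a set of tuples
bel <- function(set) mean(vapply(obs, function(s) all(s %in% set), logical(1)))
pl  <- function(set) mean(vapply(obs, function(s) any(s %in% set), logical(1)))

a_and_b    <- "1,1"  # event z = 1 together with condition x = 1
nota_and_b <- "1,0"  # complement of the event together with the condition

lower <- bel(a_and_b) / (bel(a_and_b) + pl(nota_and_b))
upper <- pl(a_and_b) / (pl(a_and_b) + bel(nota_and_b))
print(c(lower = lower, upper = upper))
```

In this toy example three observations have x = 1; z = 1 holds certainly for one of them and possibly for two, which gives the bounds 1/3 and 2/3.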
Impute a data frame imprecisely
```r
impimp(recipient, donor, method = c("variable_wise", "case_wise", "domain"),
       matchvars = NULL, vardomains = NULL)

## S3 method for class 'impimp'
print(x, ...)

is.impimp(z)
```
| recipient | a data.frame acting as recipient; see details. |
|---|---|
| donor | a data.frame acting as donor; see details. |
| method | 1-character string of the desired imputation method. The following values are possible, see details for an explanation: "variable_wise", "case_wise", "domain" |
| matchvars | a character vector containing the variable names to be used as matching variables. If |
| vardomains | a named list containing the possible values of all variables in |
| x | object of class 'impimp' |
| ... | further arguments passed down to |
| z | object to test for class |
As usual in the context of statistical matching, the data.frames
recipient and donor are assumed to contain an
overlapping set of variables.
The missing values in recipient are substituted with
observed values in donor for approaches based on donation
classes and otherwise with the set of all possible values for
the variable in question.
For method = "domain" a missing value of a variable in
recipient is imputed by the set of all possible values
of that variable.
The other methods are based on donation classes which are formed
based on the matching variables whose names are provided by
matchvars. They need to be present in both recipient
and donor:
For method = "variable_wise" a missing value of a variable
in recipient is imputed by the set of all observed values
of that variable in donor.
For method = "case_wise" the variables only present in
donor are represented as tuples. A missing tuple in
recipient is then imputed by the set of all observed
tuples in donor.
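The difference between the "domain" and "variable_wise" strategies described above can be sketched in base R. The helper `as_set` and the toy data are hypothetical; the sketch assumes "|" as the separator, a single matching variable, and, for the domain case, that the possible values are those observed in donor. It is not the package's implementation.

```r
# Toy data; 'x1' is the single matching variable, 'z1' is only observed in donor
recipient <- data.frame(x1 = c(1, 0))
donor <- data.frame(x1 = c(1, 1, 0), z1 = c(0, 1, 1))

# Hypothetical helper: collapse a set of values into one string, using "|"
as_set <- function(values) paste(sort(unique(values)), collapse = "|")

# "domain"-style: every recipient row gets the full set of possible values
# (taken here to be the values observed in donor, which is an assumption)
z1_domain <- rep(as_set(donor$z1), nrow(recipient))

# "variable_wise"-style: only the values observed among donor rows that agree
# with the recipient row on the matching variable (its donation class)
z1_variable_wise <- vapply(
  recipient$x1,
  function(v) as_set(donor$z1[donor$x1 == v]),
  character(1)
)

print(data.frame(recipient, z1_domain, z1_variable_wise))
```

In this sketch an empty donation class would yield an empty string; how the package handles that case is not covered here.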
The data.frame resulting from an imprecise imputation
of donor into recipient.
It is also of class "impimp" and stores the imputation
method in its attribute "impmethod", the names of the
variables of the resulting object containing imputed values
in the attribute "imputedvarnames", as well as the
list of (guessed) levels of each underlying variable in
"varlevels".
The variable names and observations in recipient and
donor must not contain characters that are reserved for
internal purposes.
The characters actually used internally are stored in
options("impimp.obssep") and
options("impimp.varssep"). The former is used to separate
the values of a set-valued observation, while the latter is used
for a concise tuple representation.
This method does not require that all variables in recipient
and donor are factor variables. However,
the imputation methods apply coercion to factor, so purely
numerical variables will eventually be treated as factors.
It does assume (and tests) that there are no missing
values present in the matching variables.
Endres, E., Fink, P. and Augustin, T. (2018), Imprecise Imputation: A Nonparametric Micro Approach Reflecting the Natural Uncertainty of Statistical Matching with Categorical Data, Department of Statistics (LMU Munich): Technical Reports, No. 214. URL https://epub.ub.uni-muenchen.de/42423/.
impest and impestcond for the estimation
of probabilities; rbindimpimp for
joining two impimp objects
```r
library(impimp)

A <- data.frame(x1 = c(1,0), x2 = c(0,0), y1 = c(1,0), y2 = c(2,2))
B <- data.frame(x1 = c(1,1,0), x2 = c(0,0,0), z1 = c(0,1,1), z2 = c(0,1,2))
impimp(A, B, method = "variable_wise")

## Specifically setting the possible levels of 'z1'
impimp(A, B, method = "domain", vardomains = list(z1 = c(0:5)))
```
Helper function to generate a set of events as a Cartesian product.
```r
impimp_event(..., isEventList = FALSE)

is.impimp_event(x)
```
| ... | these arguments are of the form |
|---|---|
| isEventList | logical; if |
| x | object to test for class |
An object of class "impimp_event" as a list of lists,
where each sublist contains one point in the Cartesian product
spanned by the input values and variables.
There is no plausibility check on whether the supplied varnames are actually contained in the data.frame with which the resulting impimp_event object is later used.
```r
library(impimp)

## underlying data set: x1: 1:6, x2: 1:10
## subspace, requiring: x1 == 1 & ((x2 == 1) | (x2 == 2))
impimp_event(x1 = 1, x2 = c(1,2))

## subspace containing all points within the Cartesian
## product of (x1 =) {1,2,3,6} x {5,8} (= x2)
# via ... argument
impimp_event(x1 = c(1:3,6), x2 = c(5,8))
# via EVENTLIST
impimp_event(list(x1 = c(1:3,6), x2 = c(5,8)), isEventList = TRUE)
```
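The Cartesian product spanned by the values of several variables can be enumerated in base R with `expand.grid`. The sketch below uses the values {1,2,3,6} for x1 and {5,8} for x2; the names `spec`, `grid` and `points` are hypothetical, and the resulting list of lists only mirrors the structure described for the return value without being guaranteed to match the output of impimp_event.

```r
# Values per variable: x1 in {1,2,3,6}, x2 in {5,8}
spec <- list(x1 = c(1:3, 6), x2 = c(5, 8))

# All points of the Cartesian product, one row per point
grid <- expand.grid(spec)

# The same points as a list of lists, one sublist per point
points <- lapply(seq_len(nrow(grid)), function(i) {
  as.list(grid[i, , drop = FALSE])
})

print(length(points))
print(points[[1]])
```

With four values for x1 and two for x2, the product contains 4 x 2 = 8 points.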
Combine two objects of class "impimp",
like rbind would do with data frames.
```r
rbindimpimp(x, y)
```
| x, y | objects of class "impimp" |
|---|---|
The resulting object is constructed so as to minimize the creation of 'tupled' variables. Variables are joined as tuples only where this is actually necessary to keep the concise, data-frame-like representation of impimp objects.
The attributes "impmethod" and "varlevels" contain
the set union of those of x and y on a global and
per underlying variable basis, respectively.
An object of class "impimp", inheriting the
attributes specific to impimp objects from x and y.
```r
library(impimp)

A <- data.frame(x1 = c(1,0), x2 = c(0,0), y1 = c(1,0), y2 = c(2,2))
B <- data.frame(x1 = c(1,1,0), x2 = c(0,0,0), z1 = c(0,1,1), z2 = c(0,1,2))
impA <- impimp(A, B, method = "case_wise")
impB <- impimp(B, A, method = "case_wise")
rbindimpimp(impA, impB)
```