A mechanism for excluding outliers during data cleaning. Create exploratory
plots, identify the rows of the dataset to be treated as outliers, and then
pass those rows to this function to flag them for exclusion. The rows are
flagged rather than deleted, so the control file needs a corresponding IGNORE
statement; see the argument descriptions for more details.
exclude_rows(d, dexcl, exclude_col = "EXCL")
d: A data.frame containing the full NONMEM dataset. Should contain a column
for identifying excluded rows, named by the exclude_col argument.
dexcl: A smaller data.frame consisting of the rows to be ignored. Need not
contain all columns of d, but each of its columns should be present in d.
exclude_col: Character (default = "EXCL"). Name of a binary exclude column
in d. This should be accompanied by an IGNORE=(EXCL.GT.0) statement in $DATA.
A modified version of d with exclude_col set to 1 for rows coinciding with
dexcl.
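To make the matching rule concrete, here is a minimal base R sketch of the idea. It is not NMproject's implementation: flag_rows is a hypothetical helper, and it assumes that "coinciding" means equal values in every column shared by d and dexcl, apart from the flag column itself. The toy data are invented for illustration.

```r
# Hypothetical helper, not part of NMproject: flag the rows of `d` that match
# a row of `dexcl` on every shared column (assumed matching rule).
flag_rows <- function(d, dexcl, exclude_col = "EXCL") {
  # columns used for matching: shared by both, excluding the flag column
  keys <- setdiff(intersect(names(d), names(dexcl)), exclude_col)
  stopifnot(length(keys) > 0)
  # collapse the key columns into one string per row so rows can be compared
  make_key <- function(x) do.call(paste, c(unname(x[keys]), sep = "\r"))
  d[[exclude_col]][make_key(d) %in% make_key(dexcl)] <- 1
  d
}

# toy dataset (illustrative values only)
d <- data.frame(ID = c(1, 1, 2, 2), TIME = c(0, 1, 0, 1),
                DV = c(0.5, 1.2, 0.4, 9.9), EXCL = 0)
dexcl <- d[d$DV > 5, c("ID", "TIME", "DV")]  # the suspected outlier
d2 <- flag_rows(d, dexcl)
d2$EXCL
```

Only the last row is flagged; all other rows keep EXCL equal to 0 and no rows are dropped from the data.frame.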
library(NMproject)
# create example object m1 from package demo files
exdir <- system.file("extdata", "examples", "theopp", package = "NMproject")
m1 <- new_nm(run_id = "m1",
             based_on = file.path(exdir, "Models", "ADVAN2.mod"),
             data_path = file.path(exdir, "SourceData", "THEOPP.csv"))
d <- input_data(m1)
d$EXCL <- 0 ## start with no rows excluded
## use with dplyr
dexcl <- d %>%
  dplyr::filter(ID == 6, TIME > 3) %>%
  dplyr::select(ID, TIME, DV, EXCL)
dexcl ## view rows to be excluded
#>   ID  TIME   DV EXCL
#> 1  6  3.57 5.53    0
#> 2  6  5.00 4.94    0
#> 3  6  7.00 4.02    0
#> 4  6  9.22 3.46    0
#> 5  6 12.10 2.78    0
#> 6  6 23.85 0.92    0
d <- d %>% exclude_rows(dexcl)
d %>% dplyr::filter(ID %in% 6)
#>    ID AMT  TIME   DV WT EXCL
#> 1   6   4  0.00   NA 80    0
#> 2   6  NA  0.00 0.00 NA    0
#> 3   6  NA  0.27 1.29 NA    0
#> 4   6  NA  0.58 3.08 NA    0
#> 5   6  NA  1.15 6.44 NA    0
#> 6   6  NA  2.03 6.32 NA    0
#> 7   6  NA  3.57 5.53 NA    1
#> 8   6  NA  5.00 4.94 NA    1
#> 9   6  NA  7.00 4.02 NA    1
#> 10  6  NA  9.22 3.46 NA    1
#> 11  6  NA 12.10 2.78 NA    1
#> 12  6  NA 23.85 0.92 NA    1
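
Because the rows are flagged rather than removed, it can help to preview in R what an IGNORE=(EXCL.GT.0) statement would leave for NONMEM. The sketch below uses only base R on a three-row stand-in for the flagged data (values copied from the ID 6 output above); d_kept is an illustrative name, not a package object.

```r
# stand-in for a flagged dataset: three of the ID 6 rows shown above
d <- data.frame(ID = 6,
                TIME = c(0.27, 3.57, 5.00),
                DV = c(1.29, 5.53, 4.94),
                EXCL = c(0, 1, 1))

# rows that would survive IGNORE=(EXCL.GT.0): those with EXCL not above 0
d_kept <- d[!(d$EXCL > 0), ]

nrow(d)       # rows in the dataset file
nrow(d_kept)  # rows that would remain after the IGNORE filter
```

Comparing the two counts against the number of rows in dexcl is a quick check that the intended rows, and only those, were flagged.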