The convoke Function
Here is an example of two function interfaces that compute the same thing, and how the provided DSL simplifies unifying them.
The two functions take different input data types, return their results differently, and name the trimming argument differently; one requires a column argument (with no default), while the other works only on vectors.
# assumes the tidyverse and rlang packages are attached
library(tidyverse)
library(rlang)

basemeancalc <- function(listdata, trimmed = 0.0, na.rm = T) {
  mean(listdata, trim = trimmed, na.rm = na.rm)
}
tidymeancalc <- function(tibbledata, column, trimend = 0.0, trimstart = 0.0) {
  column <- as_string(ensym(column))
  colvector <- tibbledata[[column]] %>%
    sort() %>%
    na.omit()
  nelem <- length(colvector)
  colvector <-
    # parentheses are needed here: `:` binds tighter than `+` and `-`
    colvector[(floor(nelem * trimstart) + 1):(nelem - floor(nelem * trimend))]
  return(mean(colvector))
}
mydf <- tribble(
  ~x, ~y,
  1, 5,
  4, 3,
  6, 2,
  17, 4,
  8, 12,
  14, 16,
  21, 72,
  19, 32,
  10, 15,
  NA, NA
)
tidymeancalc(mydf, x, trimend = 0.1, trimstart = 0.1)
#> [1] 11.11111
basemeancalc(mydf$x, trimmed = 0.1)
#> [1] 11.11111
mean(mydf$x, trim = 0.1, na.rm = T)
#> [1] 11.11111
Three vastly different APIs. Now, for unification:
(mymean <- convoke(
  list(tibble, column, trim),
  basemeancalc(listdata = tibble[[column]], trimmed = trim),
  tidymeancalc(
    tibbledata = tibble, column = !!column, trimend = trim, trimstart = trim
  )
))
#> convoke function
#> interfaces: basemeancalc(), tidymeancalc()
#> args: tibble, column, trim, interface = basemeancalc, interface.args
map(
  set_names(c("basemeancalc", "tidymeancalc")),
  ~ mymean(mydf, "x", 0.1, interface = ., basemeancalc.na.rm = T)
)
#> $basemeancalc
#> [1] 11.11111
#>
#> $tidymeancalc
#> [1] 11.11111
One can also add additional specifications later:
mymean <- mymean + ~ mean(x = tibble[[column]], trimmed = trim)
mymean(mydf, "x", 0.1, interface = "mean", mean.na.rm = T)
#> [1] 11.11111
Nevertheless, the ultimate goal is to specify a DSL by which package authors can convert their functions to a unified interface by writing only some general rules (in a text file bundled with the package, for instance). This would allow authors to separate conforming to a unified interface from writing their packages however they see best. Hard-coded values, of course, cannot be changed or passed down, and would still require upstream fixes.
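As a purely hypothetical sketch of what such a bundled rules file might contain (the syntax below is invented for illustration and is not implemented anywhere), the author would only map each function's argument names onto the unified ones:

```text
# hypothetical interface-rules file; every name and keyword here is illustrative
unified:      mymean(data, column, trim)
basemeancalc: listdata   <- data[[column]], trimmed   <- trim
tidymeancalc: tibbledata <- data,           column    <- !!column,
              trimstart  <- trim,           trimend   <- trim
```

A loader could then build the same convoke object shown above from these declarations, without the author writing any glue code.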
The %->% pipem Operator
This operator facilitates chaining instructions without explicitly writing the magrittr pipe. Additionally, it allows intermediate values to be kept and used later down the chain of instructions. For instance, consider the case where we want to compute abs((vec^2)^3) + sum(vec^2) + 1 sequentially, instead of using a single function:
c(-4, 9, -3, 12) %->% {
  (function(x) {
    x^2
  })()
  vec2 <- .
  (function(x) {
    abs(x^3) + sum(vec2) + 1
  })()
}
#> [1] 4347 531692 980 2986235
Another convenience of this operator is that one can simply comment out the remaining lines for debugging purposes, without having to write an extra identity() call, for instance.
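A sketch of that debugging pattern, using the chain above (this assumes the chain then yields the value of the last remaining step; the exact behavior depends on the operator's implementation):

```r
# comment out the tail of the chain to inspect the intermediate
# squared values, without rewriting the chain itself
c(-4, 9, -3, 12) %->% {
  (function(x) {
    x^2
  })()
  # vec2 <- .
  # (function(x) {
  #   abs(x^3) + sum(vec2) + 1
  # })()
}
```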
Note that since the operator treats symbols specially, functions taking a single argument should be written in the form func() and not just func. This is a best practice with the original magrittr pipe as well, in any case.
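For comparison, base R's native pipe enforces the same call form outright, while magrittr merely tolerates the bare symbol:

```r
# magrittr's pipe happens to accept a bare function name:
#   c(4, 9) %>% sqrt     # works, same as c(4, 9) %>% sqrt()
# base R's native pipe rejects it:
#   c(4, 9) |> sqrt      # syntax error: RHS must be a function call
c(4, 9) |> sqrt()        # the recommended call form; returns c(2, 3)
```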
The %to% Operator
This operator addresses a different issue that arises quite often in model building: needing multiple results from the same object. Example:
modelfit <- lm(y ~ x, data = mydf)
(res <- summary(modelfit) %to% {
  rsquared ~ .$r.squared
  adjrsquared ~ .$adj.r.squared
  residuals ~ .$residuals
  pval ~ coef(.)[, "Pr(>|t|)"]
  termsof ~ .$terms
  callof ~ .$call
})
#> # A tibble: 1 × 6
#> rsquared adjrsquared residuals pval termsof callof
#> <dbl> <dbl> <I<list>> <I<list>> <I<list>> <I<list>>
#> 1 0.512 0.443 <dbl [9]> <dbl [2]> <terms> <language>
For instance, we can now inspect the p-values:
res$pval
#> [[1]]
#> (Intercept) x
#> 0.51285227 0.03011035
Comparing this with the result from broom::glance, one can see that the above approach is much more flexible.
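To make the comparison concrete (assuming the broom package is installed): glance() returns a single row of predefined scalar summaries, so list-valued components of the fit are not part of its output, whereas the %to% block above selects arbitrary pieces of the summary object:

```r
library(broom)

# one row of fixed scalar columns such as r.squared, adj.r.squared,
# p.value, AIC, BIC, ...; list-valued pieces like the residuals,
# terms, or call cannot be requested from glance()
glance(modelfit)
```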