Preprocess the Inputs to a Spark ML Routine
Preprocess / normalize the inputs typically passed to a Spark ML routine.
ml_prepare_response_features_intercept(x = NULL, response, features,
intercept, envir = parent.frame(),
categorical.transformations = new.env(parent = emptyenv()),
ml.options = ml_options())
ml_prepare_features(x, features, envir = parent.frame(),
ml.options = ml_options())
Arguments
x  An object coercible to a Spark DataFrame (typically, a tbl_spark).
response  The name of the response vector (as a length-one character
vector), or a formula giving a symbolic description of the model to be
fitted. When response is a formula, it is used in preference to the other
parameters to set the response, features, and intercept parameters (if
available).
features  The names of the features (terms) to use for the model fit.
intercept  Boolean; should the model be fit with an intercept term? 
envir  The R environment in which the response, features, and intercept bindings should be mutated.
categorical.transformations  An R environment used to record which categorical variables were binarized by this procedure. Categorical variables included in the model formula will be transformed into binary (dummy) variables, and the generated mappings will be stored in this environment.
ml.options  Optional arguments, used to affect the model generated. See ml_options() for details.

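The dummy-variable mapping recorded in categorical.transformations can be illustrated with base R alone. The sketch below uses model.matrix() to show what binarizing a categorical column means; it mirrors the idea, not sparklyr's actual internals:

```r
# Illustration with base R only: expand a categorical (factor) column
# into 0/1 indicator columns, as required by many ML model fits.
# This mirrors the idea behind categorical.transformations, not its
# actual implementation in sparklyr.
df <- data.frame(cyl = factor(c("4", "6", "8", "6")))

# With an intercept present, one level ("4") is dropped as the baseline.
mm <- model.matrix(~ cyl, data = df)
colnames(mm)  # "(Intercept)" "cyl6" "cyl8"
```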
Details
Preprocessing of these inputs typically involves:

1. Handling the case where response is itself a formula describing the model to be fitted, thereby extracting the names of the response and features to be used,

2. Splitting categorical features into dummy variables (so they can easily be accommodated and specified in the underlying Spark ML model fit),

3. Mutating the associated variables in the specified environment.
Please take heed of the last point: while this behavior is useful in practice, it will be very surprising if you are not expecting it.
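The formula-handling step can be sketched with base R's terms() machinery. This is only an illustration of what extraction from a formula yields, not sparklyr's actual code:

```r
# Base-R sketch of extracting response, features, and intercept from a
# model formula, analogous to what ml_prepare_response_features_intercept()
# does for the Spark ML wrappers. (Illustration only.)
f <- mpg ~ cyl + disp + wt
t <- terms(f)

response  <- all.vars(f)[[1]]           # "mpg"
features  <- attr(t, "term.labels")     # c("cyl", "disp", "wt")
intercept <- attr(t, "intercept") == 1  # TRUE (formulas include one by default)
```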
Examples
# NOT RUN {
# note that ml_prepare_features, by default, mutates the 'features'
# binding in the same environment in which the function was called
local({
  ml_prepare_features(features = ~ x1 + x2 + x3)
  print(features) # c("x1", "x2", "x3")
})
# }