ml_create_dummy_variables(x, input, reference = NULL, levels = NULL, labels = NULL, envir = new.env(parent = emptyenv()))
labels: A mapping from values in the input column to column names to be assigned to the associated dummy variable.
Given a column in a Spark DataFrame, generate a new Spark DataFrame containing dummy variable columns.
The dummy variables are generated by a mechanism similar to
model.matrix, where categorical variables are expanded into a
set of binary (dummy) variables. These dummy variables can be used for
regression on categorical variables within the various regression routines.
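
The expansion described above can be seen locally with base R's model.matrix, without any Spark connection. This is only a minimal sketch of the analogous base R behaviour; the data frame `df` and its `color` column are made up for illustration, and the comparison to the `reference` argument is an assumption rather than something the text above confirms.

```r
# Base R analogue of dummy-variable expansion (no Spark involved).
# `df` and `color` are illustrative names, not part of sparklyr.
df <- data.frame(color = factor(c("red", "green", "blue", "green")))

# With an intercept, the first factor level ("blue") acts as the
# reference level and gets no indicator column of its own.
with_reference <- model.matrix(~ color, data = df)

# Dropping the intercept yields one indicator column per level.
all_levels <- model.matrix(~ color - 1, data = df)

print(colnames(with_reference))
print(colnames(all_levels))
```

Omitting one level in the first call avoids a column that is fully determined by the others; the `reference` argument in the signature presumably serves a similar purpose.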
The envir argument can be used as a mechanism for returning
optional information. Currently, the following pieces are returned:
|The set of unique values discovered within the input column.|
|The column names generated.|
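
The pattern of returning optional information through an environment can be sketched in plain R. The helper `make_dummies`, the `is_` column prefix, and the key names `levels` and `columns` below are hypothetical and chosen only for illustration; this is not sparklyr's implementation.

```r
# Hypothetical helper (not part of sparklyr): builds 0/1 indicator
# columns for a local vector and records auxiliary details in `envir`.
make_dummies <- function(values, envir = new.env(parent = emptyenv())) {
  lv <- sort(unique(values))
  cols <- paste0("is_", lv)
  # One integer indicator column per unique value.
  indicators <- lapply(lv, function(l) as.integer(values == l))
  out <- as.data.frame(setNames(indicators, cols))
  # Environments are modified in place, so the caller can read these
  # entries after the function returns.
  assign("levels", lv, envir = envir)
  assign("columns", cols, envir = envir)
  out
}

info <- new.env(parent = emptyenv())
dummies <- make_dummies(c("a", "b", "a", "c"), envir = info)
print(get("levels", envir = info))
print(get("columns", envir = info))
```

Because R environments have reference semantics, the function can keep its normal return value (the data) while still handing back extra metadata to callers that ask for it.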
If the envir argument is supplied, the names of any dummy variables
generated will be included, under the