Feature Transformation -- OneHotEncoder

One-hot encoding maps a column of label indices to a column of binary vectors, each with at most a single one-value. This encoding allows algorithms that expect continuous features, such as Logistic Regression, to use categorical features. It is typically used together with ft_string_indexer(), which indexes the column first.

ft_one_hot_encoder(x, input.col, output.col, drop.last = TRUE, ...)
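
The following is a minimal usage sketch rather than an official example. It assumes a local Spark installation reachable through spark_connect(master = "local"), uses the built-in iris data purely for illustration, and relies on the argument names shown in the signature above. The column names species_idx and species_vec are made up for this sketch.

```r
library(sparklyr)
library(dplyr)

# Assumption: Spark is installed locally (for instance via spark_install()).
sc <- spark_connect(master = "local")

# Copy a small illustrative dataset into Spark.
iris_tbl <- copy_to(sc, iris, "iris", overwrite = TRUE)

encoded <- iris_tbl %>%
  # Index the string column first, as the description recommends.
  ft_string_indexer(input.col = "Species", output.col = "species_idx") %>%
  # Then encode the label indices as binary vectors.
  ft_one_hot_encoder(input.col = "species_idx", output.col = "species_vec")

# Record the resulting column names before closing the connection.
encoded_cols <- colnames(encoded)
print(encoded_cols)

spark_disconnect(sc)
```

The indexing step comes first because the encoder expects a column of label indices, not the original strings.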

Arguments

x

An object (usually a spark_tbl) coercible to a Spark DataFrame.

input.col

The name of the input column(s).

output.col

The name of the output column.

drop.last

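Boolean; whether to drop the last category. When TRUE, the last category is encoded as an all-zero vector, so each output vector has one fewer element than there are categories.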

...

Optional arguments; currently unused.
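
To make the effect of drop.last concrete, here is a small base R sketch that mimics the encoding on a plain vector of zero-based label indices. The helper one_hot() is hypothetical and is not part of sparklyr or Spark; it only illustrates the shape of the output, and it returns a dense matrix whereas Spark produces vector columns.

```r
# Hypothetical helper (not a sparklyr function): encode zero-based label
# indices as rows of a binary matrix.
one_hot <- function(index, drop.last = TRUE) {
  n_categories <- max(index) + 1
  # With drop.last = TRUE the last category gets no column of its own.
  width <- if (drop.last) n_categories - 1 else n_categories
  encoded <- matrix(0, nrow = length(index), ncol = width)
  for (i in seq_along(index)) {
    position <- index[i] + 1  # shift to R's one-based indexing
    if (position <= width) encoded[i, position] <- 1
  }
  encoded
}

index <- c(0, 1, 2, 1)

# Two columns; the row for index 2 (the last category) is all zeros.
print(one_hot(index, drop.last = TRUE))

# Three columns; every row contains exactly one 1.
print(one_hot(index, drop.last = FALSE))
```

Dropping the last category avoids a redundant column: when every row sums to one, any single column can be recovered from the others.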

See also

See http://spark.apache.org/docs/latest/ml-features for more information on the set of transformations available for DataFrame columns in Spark.

Other feature transformation routines: ft_binarizer, ft_bucketizer, ft_count_vectorizer, ft_discrete_cosine_transform, ft_elementwise_product, ft_index_to_string, ft_quantile_discretizer, ft_regex_tokenizer, ft_stop_words_remover, ft_string_indexer, ft_tokenizer, ft_vector_assembler, sdf_mutate