Spark ML -- Principal Components Analysis

Perform principal components analysis on a Spark DataFrame.

Usage

ml_pca(x, features = tbl_vars(x), k = length(features),
  ml.options = ml_options(), ...)



Arguments

x
An object coercible to a Spark DataFrame (typically, a tbl_spark).

features
The columns to use in the principal components analysis. Defaults to all columns in x.

k
The number of principal components. Defaults to the number of columns in features.

ml.options
Optional arguments, used to affect the model generated. See ml_options for more details.

...
Optional arguments. The data argument can be used to specify the data to be used when x is a formula; this allows calls of the form ml_linear_regression(y ~ x, data = tbl), and is especially useful in conjunction with do.
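
Examples

The arguments described above can be put together as in the following minimal sketch. It is an illustration under stated assumptions rather than an official example: it assumes a local Spark installation reachable through spark_connect(master = "local"), uses the built-in mtcars data set, and the names pca_features, n_components and run_mtcars_pca are hypothetical helpers introduced only for this example. The Spark calls are wrapped in a function so that nothing connects to Spark until it is called.

```r
# Hypothetical helper values for this sketch (not part of sparklyr).
# The column names come from the built-in mtcars data set.
pca_features <- c("mpg", "disp", "hp", "wt")
n_components <- 2L  # keep fewer components than features

# Copy mtcars to Spark and fit a PCA on the chosen columns.
# `sc` is assumed to be an open connection returned by spark_connect().
run_mtcars_pca <- function(sc) {
  mtcars_tbl <- sparklyr::sdf_copy_to(sc, mtcars, "mtcars", overwrite = TRUE)
  sparklyr::ml_pca(mtcars_tbl, features = pca_features, k = n_components)
}

# Usage (assumes a local Spark installation):
# sc <- sparklyr::spark_connect(master = "local")
# pca_model <- run_mtcars_pca(sc)
# print(pca_model)
# sparklyr::spark_disconnect(sc)
```

Passing features and k explicitly restricts the analysis to the named columns and to the first two components; omitting both falls back to the defaults shown in the usage, that is, every column of x and as many components as there are features.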

See also

Other Spark ML routines: ml_als_factorization, ml_decision_tree, ml_generalized_linear_regression, ml_gradient_boosted_trees, ml_kmeans, ml_lda, ml_linear_regression, ml_logistic_regression, ml_multilayer_perceptron, ml_naive_bayes, ml_one_vs_rest, ml_random_forest, ml_survival_regression