Spark ML - Evaluators

A set of functions to calculate performance metrics for prediction models. See also the Spark ML documentation: https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.ml.evaluation.package

ml_binary_classification_evaluator(x, label_col = "label",
  raw_prediction_col = "rawPrediction", metric_name = "areaUnderROC",
  uid = random_string("binary_classification_evaluator_"), ...)

ml_binary_classification_eval(x, label_col = "label",
  prediction_col = "prediction", metric_name = "areaUnderROC")

ml_multiclass_classification_evaluator(x, label_col = "label",
  prediction_col = "prediction", metric_name = "f1",
  uid = random_string("multiclass_classification_evaluator_"), ...)

ml_classification_eval(x, label_col = "label",
  prediction_col = "prediction", metric_name = "f1")

ml_regression_evaluator(x, label_col = "label",
  prediction_col = "prediction", metric_name = "rmse",
  uid = random_string("regression_evaluator_"), ...)

Arguments

x

A spark_connection object or a tbl_spark containing label and prediction columns. The latter should be the output of sdf_predict.

label_col

Name of the column that contains the true labels or values.

raw_prediction_col

Name of the column that contains the raw prediction (confidence) scores, for example the scored probability of a success.

metric_name

The performance metric. See Details.

uid

A character string used to uniquely identify the ML evaluator.

...

Optional arguments; currently unused.

prediction_col

Name of the column that contains the predicted label or value, not the scored probability. The column should be of type Double.
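The distinction between raw_prediction_col and prediction_col matters because areaUnderROC is a ranking metric: it is the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one, with ties counted as half. The base R sketch below computes it with the rank-sum (Mann-Whitney) identity on small in-memory vectors. auc_roc() is a hypothetical helper for illustration only; it is not part of sparklyr and is not how Spark computes the metric internally.

```r
# Hypothetical helper (not a sparklyr function): area under the ROC curve
# computed from scores via the rank-sum (Mann-Whitney U) identity.
auc_roc <- function(label, score) {
  stopifnot(length(label) == length(score), all(label %in% c(0, 1)))
  n_pos <- sum(label == 1)
  n_neg <- sum(label == 0)
  # rank() averages tied ranks, so a tied positive/negative pair counts as 1/2
  r <- rank(score)
  (sum(r[label == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

label <- c(0, 0, 1, 1, 0, 1)
score <- c(0.10, 0.40, 0.35, 0.80, 0.20, 0.90)

# Using the scores: 8 of the 9 positive/negative pairs are ordered correctly
auc_roc(label, score)  # 8/9, about 0.889

# Thresholding the scores into hard 0/1 predictions throws the ranking away,
# and the same data now scores lower
auc_roc(label, as.numeric(score > 0.5))  # 7.5/9, about 0.833
```

This is why the binary evaluator is pointed at a score column rather than at the column of predicted labels.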

Value

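The calculated performance metric when x is a tbl_spark. When x is a spark_connection, an evaluator object that can be passed to other ML functions, such as ml_cross_validator().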

Details

The following metrics are supported:

  • Binary Classification: areaUnderROC (default) or areaUnderPR (not available in Spark 2.X).

  • Multiclass Classification: f1 (default), precision, recall, weightedPrecision, weightedRecall, or accuracy. For Spark 2.X, the supported metrics are f1 (default), weightedPrecision, weightedRecall, or accuracy.

  • Regression: rmse (root mean squared error, default), mse (mean squared error), r2 (coefficient of determination), or mae (mean absolute error).
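To make the metric names above concrete, the base R sketch below restates their definitions on small in-memory vectors. It assumes (our reading of Spark's MulticlassMetrics and RegressionMetrics, not something this page states) that the weighted multiclass metrics average per-class values weighted by each class's share of the true labels, that f1 is the weighted F1 score, and that a per-class precision or F1 with an empty denominator counts as 0. multiclass_metrics() and regression_metrics() are hypothetical helpers, not sparklyr functions; in sparklyr the actual computation runs inside Spark.

```r
# Hypothetical base R restatements of the metric definitions; not sparklyr
# functions. Assumes numeric class labels (0, 1, 2, ...) as Spark produces.

multiclass_metrics <- function(label, prediction) {
  classes <- sort(unique(label))
  # Each class is weighted by its share of the true labels
  weight <- sapply(classes, function(k) mean(label == k))
  # Matrix with one column per class and rows precision, recall, f1
  per_class <- sapply(classes, function(k) {
    tp <- sum(prediction == k & label == k)
    fp <- sum(prediction == k & label != k)
    fn <- sum(prediction != k & label == k)
    precision <- if (tp + fp == 0) 0 else tp / (tp + fp)
    recall <- if (tp + fn == 0) 0 else tp / (tp + fn)
    f1 <- if (precision + recall == 0) 0 else 2 * precision * recall / (precision + recall)
    c(precision = precision, recall = recall, f1 = f1)
  })
  list(
    accuracy = mean(prediction == label),
    weightedPrecision = sum(weight * per_class["precision", ]),
    weightedRecall = sum(weight * per_class["recall", ]),
    f1 = sum(weight * per_class["f1", ])
  )
}

regression_metrics <- function(label, prediction) {
  err <- label - prediction
  mse <- mean(err^2)
  list(
    rmse = sqrt(mse),
    mse = mse,
    # Coefficient of determination: 1 - residual SS / total SS
    r2 = 1 - sum(err^2) / sum((label - mean(label))^2),
    mae = mean(abs(err))
  )
}

# accuracy = 2/3, weightedPrecision = 13/18, weightedRecall = 2/3, f1 = 59/90
str(multiclass_metrics(label = c(0, 0, 1, 1, 2, 2), prediction = c(0, 1, 1, 1, 2, 0)))

# mse = 0.375, rmse = sqrt(0.375), mae = 0.5, r2 = 1 - 1.5 / 29.1875
str(regression_metrics(label = c(3, -0.5, 2, 7), prediction = c(2.5, 0, 2, 8)))
```

Under these definitions weightedRecall always equals accuracy: weighting each class's recall by its share of the labels reduces to the overall fraction of correct predictions.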

ml_binary_classification_eval() is an alias for ml_binary_classification_evaluator() for backwards compatibility. ml_classification_eval() is an alias for ml_multiclass_classification_evaluator() for backwards compatibility.