Spark ML -- Model Persistence
Save/load Spark ML objects
ml_save(x, path, overwrite = FALSE, ...)

# S3 method for ml_model
ml_save(x, path, overwrite = FALSE, type = c("pipeline_model", "pipeline"), ...)

ml_load(sc, path)
x: A ML object, which could be a ml_pipeline, ml_pipeline_model, or ml_model object.

path: The path where the object is to be serialized/deserialized.

overwrite: Whether to overwrite the existing path, defaults to FALSE.

...: Optional arguments; currently unused.

type: Whether to save the pipeline model or the pipeline.

sc: A Spark connection.
ml_save() serializes a Spark object into a format that can be read back into sparklyr or by the Scala or PySpark APIs. When called on ml_model objects, i.e. those that were created via the tbl_spark-formula signature, the associated pipeline model is serialized. In other words, the saved model contains both the data processing (RFormulaModel) stage and the machine learning stage.
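As a hedged sketch of the above (the connection, dataset, and output paths are illustrative, not prescribed by this page), saving a formula-fitted model might look like:

```r
library(sparklyr)

# Illustrative local connection; any Spark connection works.
sc <- spark_connect(master = "local")
mtcars_tbl <- copy_to(sc, mtcars)

# Fitting via the tbl_spark-formula signature returns an ml_model,
# whose underlying pipeline model includes the RFormulaModel stage.
model <- ml_linear_regression(mtcars_tbl, mpg ~ wt + cyl)

# Saves the associated pipeline model: both the data processing
# (RFormulaModel) stage and the fitted machine learning stage.
ml_save(model, "path/to/model", overwrite = TRUE)

# Alternatively, save only the (unfitted) pipeline:
ml_save(model, "path/to/pipeline", overwrite = TRUE, type = "pipeline")
```

Because the saved artifact is a standard Spark PipelineModel, it can also be loaded from Scala or PySpark.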
ml_load() reads a saved Spark object into sparklyr. It calls the correct Scala load method based on parsing the saved metadata. Note that a PipelineModel object saved from a sparklyr ml_save() call will be read back in as an ml_pipeline_model object, rather than the ml_model object it was created from.
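A minimal sketch of the round trip (the path is illustrative and assumes a model previously written with ml_save()):

```r
library(sparklyr)

sc <- spark_connect(master = "local")

# Reads the saved PipelineModel back in as an ml_pipeline_model,
# not as the original ml_model wrapper.
reloaded <- ml_load(sc, "path/to/model")

# Since the RFormulaModel stage was saved with the model, predictions
# are produced by transforming raw data with the pipeline model.
preds <- ml_transform(reloaded, copy_to(sc, mtcars))
```

This is why the distinction matters in practice: the reloaded object is used through pipeline functions such as ml_transform() rather than through the ml_model interface.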