Write a Spark DataFrame to a Parquet file

Serialize a Spark DataFrame to the Parquet format.

spark_write_parquet(x, path, mode = NULL, options = list(),
  partition_by = NULL, ...)

Arguments

x

A Spark DataFrame or dplyr operation

path

The path to the file. Needs to be accessible from the cluster. Supports the "hdfs://", "s3a://" and "file://" protocols.

mode

A character element. Specifies the behavior when data or table already exists. Supported values include: 'error', 'append', 'overwrite' and 'ignore'. Note that 'overwrite' replaces the existing data and can therefore also change the column structure.

For more details, see http://spark.apache.org/docs/latest/sql-programming-guide.html#save-modes for your version of Spark.

options

A list of strings with additional options. See http://spark.apache.org/docs/latest/sql-programming-guide.html#configuration.

partition_by

A character vector. Partitions the output by the given columns on the file system.

...

Optional arguments; currently unused.
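
The arguments above can be combined as in the following minimal sketch. It assumes a local Spark installation is available (for example one installed with spark_install()); the table name "mtcars", the output location under tempdir(), and the partition column cyl are illustrative choices, not requirements of the function.

```r
library(sparklyr)
library(dplyr)

# Assumption: Spark is installed locally and can be started in local mode.
sc <- spark_connect(master = "local")

# Copy a small built-in data set into Spark for illustration.
mtcars_tbl <- copy_to(sc, mtcars, "mtcars", overwrite = TRUE)

# Hypothetical output location; on a real cluster this would be a path
# reachable from all nodes, such as an "hdfs://" or "s3a://" location.
out_path <- file.path(tempdir(), "mtcars_parquet")

# Write the data, replacing any existing output and partitioning by 'cyl'.
spark_write_parquet(
  mtcars_tbl,
  path = out_path,
  mode = "overwrite",
  partition_by = "cyl"
)

# Read the data back to confirm the round trip.
mtcars_back <- spark_read_parquet(sc, name = "mtcars_back", path = out_path)

spark_disconnect(sc)
```

With partition_by set, Spark writes one subdirectory per distinct value of the partition column (for example cyl=4) under the output path, so path ends up naming a directory of Parquet files.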

See also