Read JSON Stream

R/stream_data.R

stream_read_json

Description

Reads a JSON stream as a Spark dataframe stream.

Usage

stream_read_json(sc, path, name = NULL, columns = NULL, options = list(), ...) 

Arguments

Arguments Description
sc A spark_connection.
path The path to the file. Needs to be accessible from the cluster. Supports the "hdfs://", "s3a://" and "file://" protocols.
name The name to assign to the newly generated stream.
columns A vector of column names or a named vector of column types. If specified, the elements can be "binary" for BinaryType, "boolean" for BooleanType, "byte" for ByteType, "integer" for IntegerType, "integer64" for LongType, "double" for DoubleType, "character" for StringType, "timestamp" for TimestampType and "date" for DateType.
options A list of strings with additional options.
... Optional arguments; currently unused.

Examples

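library(sparklyr)
# Connect to a local Spark instance.
sc <- spark_connect(master = "local")
# Write a sample JSON file into the directory the stream will read from.
dir.create("json-in")
jsonlite::write_json(list(a = c(1, 2), b = c(10, 20)), "json-in/data.json")
json_path <- file.path("file://", getwd(), "json-in")
# Read the directory as a stream and write the results back out as JSON.
stream <- stream_read_json(sc, json_path) %>% stream_write_json("json-out")
stream_stop(stream)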
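
The columns and options arguments are not exercised by the example above, so the following is a minimal sketch of how they might be supplied. The helper read_typed_json_stream(), the directory names, and the sample records are hypothetical and exist only for illustration. The maxFilesPerTrigger option is assumed to be forwarded to Spark's file stream source, and the input is written as one JSON object per line on the assumption that Spark's JSON reader is left in its default line-delimited mode.

```r
# Named vector of column types, using the type names from the Arguments table.
column_types <- c(id = "integer", value = "double")

# Additional reader options, given as a list of strings.
# Assumption: "maxFilesPerTrigger" is forwarded to Spark's file stream source.
reader_options <- list(maxFilesPerTrigger = "1")

# Hypothetical helper (not part of sparklyr): writes two sample records,
# then starts a stream that reads them with the explicit column types above.
read_typed_json_stream <- function(sc,
                                   in_dir = "json-typed-in",
                                   out_dir = "json-typed-out") {
  dir.create(in_dir, showWarnings = FALSE)

  # One JSON object per line (assumed default layout for Spark's JSON reader).
  writeLines(
    c('{"id": 1, "value": 10.5}', '{"id": 2, "value": 20.25}'),
    file.path(in_dir, "records.json")
  )

  # The input path must be accessible from the cluster; a local file URI here.
  in_path <- paste0("file://", file.path(getwd(), in_dir))

  input <- sparklyr::stream_read_json(
    sc,
    in_path,
    columns = column_types,
    options = reader_options
  )

  # Returns the running stream so the caller can stop it later.
  sparklyr::stream_write_json(input, out_dir)
}

# Usage, assuming a local Spark installation is available:
# sc <- sparklyr::spark_connect(master = "local")
# stream <- read_typed_json_stream(sc)
# sparklyr::stream_stop(stream)
# sparklyr::spark_disconnect(sc)
```

Supplying columns as a named vector gives the reader an explicit set of column types instead of relying on the default behaviour when columns = NULL. The Spark calls are wrapped in a function so the block can be sourced without starting a Spark session.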

See Also

Other Spark stream serialization: stream_read_csv(), stream_read_delta(), stream_read_kafka(), stream_read_orc(), stream_read_parquet(), stream_read_socket(), stream_read_text(), stream_write_console(), stream_write_csv(), stream_write_delta(), stream_write_json(), stream_write_kafka(), stream_write_memory(), stream_write_orc(), stream_write_parquet(), stream_write_text()