Connecting to Spark

Functions for installing Spark components and managing connections to Spark.

spark_config

Read Spark Configuration

spark_connect spark_connection_is_open spark_disconnect spark_disconnect_all

Manage Spark Connections

spark_install spark_uninstall spark_install_dir spark_install_tar spark_installed_versions spark_available_versions

Download and install various versions of Spark

spark_log

View Entries in the Spark Log

spark_web

Open the Spark web interface
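A minimal sketch of a connection lifecycle, assuming sparklyr is installed and a local Spark distribution is already available (for example, one installed earlier with `spark_install()`). `master = "local"` runs Spark on the current machine rather than on a cluster.

```r
library(sparklyr)

# Assumption: a local Spark installation already exists (e.g. from spark_install()).
sc <- spark_connect(master = "local")

# Query the open connection before closing it.
version_used <- spark_version(sc)
was_open <- spark_connection_is_open(sc)
print(version_used)

# Close the connection and shut down the local Spark process it started.
spark_disconnect(sc)
```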

Reading and Writing Data

Functions for reading and writing Spark DataFrames.

spark_read_csv

Read a CSV file into a Spark DataFrame

spark_read_json

Read a JSON file into a Spark DataFrame

spark_read_parquet

Read a Parquet file into a Spark DataFrame

spark_read_table

Read a Spark table into a Spark DataFrame

spark_write_csv

Write a Spark DataFrame to a CSV file

spark_write_json

Write a Spark DataFrame to a JSON file

spark_write_parquet

Write a Spark DataFrame to a Parquet file

spark_write_table

Write a Spark DataFrame into a Spark table
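The read and write functions pair up naturally. The sketch below is a hypothetical CSV round trip under the same assumption of a local Spark installation. The table names `cars` and `cars_back` and the temporary output path are illustrative choices only. Spark writes a directory of part files rather than a single `.csv` file, and the whole directory can be read back.

```r
library(sparklyr)

sc <- spark_connect(master = "local")

# Copy a local R data frame into Spark under the (illustrative) name "cars".
cars_tbl <- sdf_copy_to(sc, mtcars, name = "cars")

# Spark writes a directory of part files at this path, not a single file.
out_dir <- file.path(tempdir(), "cars_csv")
spark_write_csv(cars_tbl, out_dir)

# Read the directory back as a new Spark DataFrame registered as "cars_back".
cars_back <- spark_read_csv(sc, name = "cars_back", path = out_dir)
back_cols <- colnames(cars_back)
print(back_cols)

spark_disconnect(sc)
```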

Spark Tables

Functions for manipulating Spark Tables.

tbl_cache

Cache a Spark Table

tbl_uncache

Uncache a Spark Table

Spark DataFrames

Functions for manipulating Spark DataFrames.

sdf_copy_to sdf_import

Copy an Object into Spark

sdf_mutate sdf_mutate_

Mutate a Spark DataFrame

sdf_partition

Partition a Spark DataFrame

sdf_predict

Model Predictions with Spark DataFrames

sdf_read_column

Read a Column from a Spark DataFrame

sdf_register

Register a Spark DataFrame

sdf_sample

Randomly Sample Rows from a Spark DataFrame

sdf_sort

Sort a Spark DataFrame

sdf_with_unique_id

Add a Unique ID Column to a Spark DataFrame
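As an illustration of the `sdf_*` helpers, the sketch below splits a copied table into training and test sets and draws a random sample. It assumes a local Spark installation, and the 75/25 weights and seed are arbitrary values chosen for the example. Row counts are pulled back into R with dplyr's `tally()`. The exact sizes vary with the random split, but a partition always covers every row.

```r
library(sparklyr)
library(dplyr)

sc <- spark_connect(master = "local")
cars_tbl <- sdf_copy_to(sc, mtcars, name = "cars")

# Randomly split rows roughly 75/25; the seed makes the split repeatable.
parts <- sdf_partition(cars_tbl, training = 0.75, test = 0.25, seed = 42)

# Draw an approximate 50% sample without replacement.
half <- sdf_sample(cars_tbl, fraction = 0.5, replacement = FALSE, seed = 42)

# Collect row counts into R; tally() produces a single column named n.
n_train <- collect(tally(parts$training))$n
n_test <- collect(tally(parts$test))$n
n_half <- collect(tally(half))$n

spark_disconnect(sc)
```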

Machine Learning Algorithms

Functions for invoking machine learning algorithms.

ml_als_factorization

Spark ML -- Alternating Least Squares (ALS) Matrix Factorization

ml_decision_tree

Spark ML -- Decision Trees

ml_generalized_linear_regression

Spark ML -- Generalized Linear Regression

ml_gradient_boosted_trees

Spark ML -- Gradient-Boosted Trees

ml_kmeans

Spark ML -- K-Means Clustering

ml_lda

Spark ML -- Latent Dirichlet Allocation

ml_linear_regression

Spark ML -- Linear Regression

ml_logistic_regression

Spark ML -- Logistic Regression

ml_multilayer_perceptron

Spark ML -- Multilayer Perceptron

ml_naive_bayes

Spark ML -- Naive Bayes

ml_one_vs_rest

Spark ML -- One-vs-Rest

ml_pca

Spark ML -- Principal Components Analysis

ml_random_forest

Spark ML -- Random Forests

ml_survival_regression

Spark ML -- Survival Regression
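The `ml_*` functions take a Spark DataFrame and fit a model inside Spark. The sketch below fits a linear regression of `mpg` on `wt` and `cyl`, assuming a local Spark installation and the `response`/`features` argument style. The coefficient vector is expected to include the intercept, but the exact shape of the returned model object may differ between sparklyr versions.

```r
library(sparklyr)

sc <- spark_connect(master = "local")
cars_tbl <- sdf_copy_to(sc, mtcars, name = "cars")

# Fit mpg ~ wt + cyl inside Spark.
fit <- ml_linear_regression(cars_tbl, response = "mpg", features = c("wt", "cyl"))

# Named coefficient vector: the intercept plus one entry per feature.
coefs <- fit$coefficients
print(coefs)

spark_disconnect(sc)
```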

Machine Learning Transformers

Functions for transforming features in Spark DataFrames.

ft_binarizer

Feature Transformation -- Binarizer

ft_bucketizer

Feature Transformation -- Bucketizer

ft_discrete_cosine_transform

Feature Transformation -- Discrete Cosine Transform (DCT)

ft_elementwise_product

Feature Transformation -- ElementwiseProduct

ft_index_to_string

Feature Transformation -- IndexToString

ft_one_hot_encoder

Feature Transformation -- OneHotEncoder

ft_quantile_discretizer

Feature Transformation -- QuantileDiscretizer

ft_sql_transformer

Feature Transformation -- SQLTransformer

ft_string_indexer

Feature Transformation -- StringIndexer

ft_vector_assembler

Feature Transformation -- VectorAssembler

Machine Learning Utilities

Functions for interacting with Spark ML model fits.

ml_binary_classification_eval

Spark ML -- Binary Classification Evaluator

ml_classification_eval

Spark ML -- Classification Evaluator

ml_tree_feature_importance

Spark ML -- Feature Importance for Tree Models

ml_load ml_save

Save / Load a Spark ML Model Fit

Machine Learning Extensions

Functions for creating custom wrappers around other Spark ML algorithms.

ml_create_dummy_variables

Create Dummy Variables

ml_model

Create an ML Model Object

ml_options

Options for Spark ML Routines

ml_prepare_dataframe

Prepare a Spark DataFrame for Spark ML Routines

ml_prepare_response_features_intercept ml_prepare_inputs ml_prepare_features

Pre-process the Inputs to a Spark ML Routine

Extensions API

Functions for creating extensions to the sparklyr package.

compile_package_jars

Compile Scala sources into a Java Archive (jar)

connection_config

Read configuration values for a connection

find_scalac

Discover the Scala Compiler

spark_context java_context hive_context spark_session

Access the Spark API

invoke invoke_static invoke_new

Invoke a Method on a JVM Object

register_extension registered_extensions

Register a Package that Implements a Spark Extension

spark_compilation_spec

Define a Spark Compilation Specification

spark_default_compilation_spec

Default Compilation Specification for Spark Extensions

spark_connection

Retrieve the Spark Connection Associated with an R Object

spark_dataframe

Retrieve a Spark DataFrame

spark_dependency

Define a Spark dependency

spark_jobj

Retrieve a Spark JVM Object Reference

spark_version

Get the Spark Version Associated with a Spark Connection
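The invoke family is the lowest-level entry point of the Extensions API. The sketch below uses only standard JDK classes (`java.math.BigInteger`, `java.lang.Math`) so it needs no extension package, and it assumes a local Spark installation. It creates a JVM object, calls an instance method on it, calls a static method, and queries the SparkContext reference returned by `spark_context()`.

```r
library(sparklyr)

sc <- spark_connect(master = "local")

# Construct a java.math.BigInteger on the JVM, then call an instance method on it.
big <- invoke_new(sc, "java.math.BigInteger", "1000000000")
as_long <- invoke(big, "longValue")

# Call a static method: java.lang.Math.hypot(3, 4).
hyp <- invoke_static(sc, "java.lang.Math", "hypot", 3, 4)

# The SparkContext is itself a JVM object reference that can be invoked.
sc_version <- invoke(spark_context(sc), "version")
print(sc_version)

spark_disconnect(sc)
```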

Livy

Functions for connecting to Spark through Livy (experimental).

livy_config

Create a Spark Configuration for Livy

livy_service_start livy_service_stop

Start and Stop Livy