Function Reference version 0.4

Connecting to Spark

Functions for installing Spark components and managing connections to Spark.

spark_config

Read Spark Configuration

spark_connect

Connect to Spark

spark_disconnect

Disconnect from Spark

spark_install

Download and install various versions of Spark

spark_log

Retrieve entries from the Spark log

spark_web

Open the Spark web interface
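
As a rough illustration of how the connection functions fit together, the sketch below installs a local Spark distribution, opens a connection, reads a few log entries, and disconnects. The version string "1.6.2" and the local master are assumptions chosen for the example, not requirements of the package.

```r
library(sparklyr)

# One-time setup: download a local Spark distribution.
# The version string is only an example; pick one that spark_install() offers.
spark_install(version = "1.6.2")

# Read the default configuration and open a local connection with it.
config <- spark_config()
sc <- spark_connect(master = "local", config = config)

# Look at the most recent log entries for this connection.
spark_log(sc, n = 10)

# Release the connection when finished.
spark_disconnect(sc)
```

When working interactively, spark_web(sc) can be called before disconnecting to open the Spark web interface in a browser.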

Reading and Writing Data

Functions for reading and writing Spark DataFrames.

spark_read_csv

Read a CSV file into a Spark DataFrame

spark_read_json

Read a JSON file into a Spark DataFrame

spark_read_parquet

Read a Parquet file into a Spark DataFrame

spark_write_csv

Write a Spark DataFrame to a CSV file

spark_write_json

Write a Spark DataFrame to a JSON file

spark_write_parquet

Write a Spark DataFrame to a Parquet file

sdf-saveload

Save / Load a Spark DataFrame
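
A minimal sketch of a read/write round trip, assuming a local connection and temporary file paths created only for this example: a CSV file is read into Spark, written back out as Parquet, and read again.

```r
library(sparklyr)
library(dplyr)

sc <- spark_connect(master = "local")

# Create a small CSV file on disk so there is something to read.
csv_path <- tempfile(fileext = ".csv")
write.csv(mtcars, csv_path, row.names = FALSE)

# Read the CSV into a Spark DataFrame registered under the name "cars".
cars_tbl <- spark_read_csv(sc, name = "cars", path = csv_path, header = TRUE)

# Write the same data as Parquet, then read it back under a new name.
parquet_path <- tempfile()
spark_write_parquet(cars_tbl, parquet_path)
cars_parquet <- spark_read_parquet(sc, name = "cars_parquet", path = parquet_path)

# Bring the result back into R to inspect it.
cars_local <- collect(cars_parquet)

spark_disconnect(sc)
```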

dplyr Interface

Functions implementing a dplyr backend for Spark DataFrames.

copy_to

Copy a local R data frame to Spark

tbl_cache

Load a table into memory

tbl_uncache

Unload a table from memory
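
The following sketch shows one way these functions might be combined with ordinary dplyr verbs, assuming a local connection. The table name "iris" is an arbitrary choice for the example; note that copy_to replaces the dots in column names with underscores.

```r
library(sparklyr)
library(dplyr)

sc <- spark_connect(master = "local")

# Copy a local R data frame to Spark; Petal.Length becomes Petal_Length.
iris_tbl <- copy_to(sc, iris, "iris", overwrite = TRUE)

# Explicitly load the table into memory.
tbl_cache(sc, "iris")

# Standard dplyr verbs are translated to Spark SQL and run in Spark.
species_means <- iris_tbl %>%
  group_by(Species) %>%
  summarise(mean_petal_length = mean(Petal_Length)) %>%
  collect()

# Unload the table from memory once it is no longer needed.
tbl_uncache(sc, "iris")

spark_disconnect(sc)
```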

Spark DataFrames

Functions for manipulating Spark DataFrames.

na.replace

Replace Missing Values in Objects

sdf_copy_to

Copy an Object into Spark

sdf_mutate

Mutate a Spark DataFrame

sdf_partition

Partition a Spark DataFrame

sdf_predict

Model Predictions with Spark DataFrames

sdf_read_column

Read a Column from a Spark DataFrame

sdf_register

Register a Spark DataFrame

sdf_sample

Randomly Sample Rows from a Spark DataFrame

sdf_sort

Sort a Spark DataFrame

sdf_with_unique_id

Add a Unique ID Column to a Spark DataFrame
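
A short sketch of a few of the functions above, assuming a local connection and the built-in mtcars data set. The partition names, weights, and seed are illustrative choices only.

```r
library(sparklyr)

sc <- spark_connect(master = "local")
mtcars_tbl <- sdf_copy_to(sc, mtcars, "mtcars", overwrite = TRUE)

# Split the data into named partitions by weight.
partitions <- sdf_partition(mtcars_tbl, training = 0.75, test = 0.25, seed = 1099)

# Sort by a column and add a unique identifier column named "id".
sorted_tbl <- sdf_sort(mtcars_tbl, "mpg")
with_id_tbl <- sdf_with_unique_id(sorted_tbl, id = "id")

# Pull a single column back into R as a vector.
mpg_values <- sdf_read_column(mtcars_tbl, "mpg")

spark_disconnect(sc)
```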

Machine Learning Algorithms

Functions for invoking machine learning algorithms.

ml_als_factorization

Spark ML -- Alternating Least Squares (ALS) Matrix Factorization

ml_decision_tree

Spark ML -- Decision Trees

ml_generalized_linear_regression

Spark ML -- Generalized Linear Regression

ml_gradient_boosted_trees

Spark ML -- Gradient-Boosted Trees

ml_kmeans

Spark ML -- K-Means Clustering

ml_lda

Spark ML -- Latent Dirichlet Allocation

ml_linear_regression

Spark ML -- Linear Regression

ml_logistic_regression

Spark ML -- Logistic Regression

ml_multilayer_perceptron

Spark ML -- Multilayer Perceptron

ml_naive_bayes

Spark ML -- Naive Bayes

ml_one_vs_rest

Spark ML -- One vs Rest

ml_pca

Spark ML -- Principal Components Analysis

ml_random_forest

Spark ML -- Random Forests

ml_survival_regression

Spark ML -- Survival Regression
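
As an illustration of the general calling pattern, the sketch below fits a linear regression on a training partition and scores a test partition with sdf_predict. It assumes a local connection and uses the response and features arguments as understood for this version of the package; the chosen columns are arbitrary.

```r
library(sparklyr)
library(dplyr)

sc <- spark_connect(master = "local")
mtcars_tbl <- copy_to(sc, mtcars, "mtcars", overwrite = TRUE)

# Hold out part of the data for evaluation.
partitions <- sdf_partition(mtcars_tbl, training = 0.75, test = 0.25, seed = 1099)

# Fit a linear model predicting mpg from weight and cylinder count.
fit <- partitions$training %>%
  ml_linear_regression(response = "mpg", features = c("wt", "cyl"))

summary(fit)

# Score the held-out rows and bring the predictions back into R.
predictions <- sdf_predict(fit, partitions$test) %>%
  collect()

spark_disconnect(sc)
```

The other ml_ functions are expected to follow a similar shape: a Spark DataFrame as the first argument, followed by the response and feature columns where the algorithm requires them.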

Machine Learning Transformers

Functions for transforming features in Spark DataFrames.

ft_binarizer

Feature Transformation -- Binarizer

ft_bucketizer

Feature Transformation -- Bucketizer

ft_discrete_cosine_transform

Feature Transformation -- Discrete Cosine Transform (DCT)

ft_elementwise_product

Feature Transformation -- ElementwiseProduct

ft_index_to_string

Feature Transformation -- IndexToString

ft_one_hot_encoder

Feature Transformation -- OneHotEncoder

ft_quantile_discretizer

Feature Transformation -- QuantileDiscretizer

ft_sql_transformer

Feature Transformation -- SQLTransformer

ft_string_indexer

Feature Transformation -- StringIndexer

ft_vector_assembler

Feature Transformation -- VectorAssembler
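
A brief sketch of two transformers applied in sequence, assuming a local connection and the dotted argument names (input.col, output.col) understood to be used by this version of the package. The threshold and split points are arbitrary values for the example.

```r
library(sparklyr)
library(dplyr)

sc <- spark_connect(master = "local")
mtcars_tbl <- copy_to(sc, mtcars, "mtcars", overwrite = TRUE)

features_tbl <- mtcars_tbl %>%
  # Flag rows whose mpg exceeds the threshold.
  ft_binarizer(input.col = "mpg", output.col = "high_mpg", threshold = 20) %>%
  # Map cylinder counts into buckets defined by the split points.
  ft_bucketizer(input.col = "cyl", output.col = "cyl_bucket",
                splits = c(0, 5, 7, 10))

# Bring the transformed columns back into R to inspect them.
features_local <- features_tbl %>%
  select(mpg, high_mpg, cyl, cyl_bucket) %>%
  collect()

spark_disconnect(sc)
```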

Machine Learning Utilities

Functions for interacting with Spark ML model fits.

ml_binary_classification_eval

Spark ML -- Binary Classification Evaluator

ml_classification_eval

Spark ML -- Classification Evaluator

ml_tree_feature_importance

Spark ML -- Feature Importance for Tree Models

ml_saveload

Save / Load a Spark ML Model Fit

Machine Learning Extensions

Functions for creating custom wrappers to other Spark ML algorithms.

ensure

Enforce Specific Structure for R Objects

ml_create_dummy_variables

Create Dummy Variables

ml_model

Create an ML Model Object

ml_options

Provide Options for Spark ML Routines

ml_prepare_dataframe

Prepare a Spark DataFrame for Spark ML Routines

ml_prepare_response_features_intercept

Pre-process the Inputs to a Spark ML Routine

Extensions API

Functions for creating extensions to the sparklyr package.

compile_package_jars

Compile Scala sources into a Java Archive (jar)

connection_config

Read configuration values for a connection

find_scalac

Discover the Scala Compiler

hive_context

Get the HiveContext associated with a connection

invoke

Execute a method on a remote Java object

java_context

Get the JavaSparkContext associated with a connection

register_extension

Register a package that implements a Spark extension

spark_compilation_spec

Define a Spark Compilation Specification

spark_default_compilation_spec

Default Compilation Specification for Spark Extensions

spark_connection

Get the spark_connection associated with an object

spark_context

Get the SparkContext associated with a connection

spark_dataframe

Get the Spark DataFrame associated with an object

spark_dependency

Define a Spark dependency

spark_jobj

Get the spark_jobj associated with an object

spark_session

Get the Spark Session associated with a connection

spark_version

Version of Spark for a connection
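
To show how the lower-level functions might be combined, the sketch below uses spark_context and invoke to call methods on the underlying SparkContext. The helper name count_lines and the temporary file are assumptions made for this example; textFile and count are methods of Spark's own SparkContext and RDD classes.

```r
library(sparklyr)

sc <- spark_connect(master = "local")

# Hypothetical helper: count the lines of a text file by calling
# SparkContext.textFile() and then RDD.count() on the result.
count_lines <- function(sc, file) {
  spark_context(sc) %>%
    invoke("textFile", file, 1L) %>%
    invoke("count")
}

# Create a small text file to count.
text_path <- tempfile(fileext = ".txt")
writeLines(c("first", "second", "third"), text_path)

n_lines <- count_lines(sc, text_path)

# Query the version of Spark behind the connection.
version <- spark_version(sc)

spark_disconnect(sc)
```

Each invoke call returns either a reference to a remote Java object, which can be piped into further invoke calls, or a value converted to an R type when the result is a simple one such as a number or string.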