Function Reference

Spark Operations

spark_config

Read Spark Configuration

spark_connect spark_connection_is_open spark_disconnect spark_disconnect_all

Manage Spark Connections

spark_install_find spark_install spark_uninstall spark_install_dir spark_install_tar spark_installed_versions spark_available_versions

Find, install, and manage Spark installations by version

spark_log

View Entries in the Spark Log

spark_web

Open the Spark web interface
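
As a rough illustration of how the connection functions in this group fit together, the sketch below opens a local connection, checks it, reads a few log lines, and closes it. It assumes a local Spark installation is already present (spark_install can download one); the master value "local" and the variable names are illustrative choices, not requirements.

```r
library(sparklyr)

# Assumption: Spark is installed locally; "local" runs Spark on this machine.
sc <- spark_connect(master = "local")

# Record whether the connection is usable before doing any work.
is_open <- spark_connection_is_open(sc)

# Look at the last few entries of the Spark log for this connection.
spark_log(sc, n = 10)

# Close the connection when finished.
spark_disconnect(sc)
```

spark_disconnect_all can be used instead of spark_disconnect when several connections may be open.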

Spark Data

spark_read_csv

Read a CSV file into a Spark DataFrame

spark_read_jdbc

Read from JDBC connection into a Spark DataFrame.

spark_read_json

Read a JSON file into a Spark DataFrame

spark_read_parquet

Read a Parquet file into a Spark DataFrame

spark_read_source

Read from a generic source into a Spark DataFrame.

spark_read_table

Reads from a Spark Table into a Spark DataFrame.

spark_write_csv

Write a Spark DataFrame to a CSV file

spark_write_jdbc

Writes a Spark DataFrame into a JDBC table

spark_write_json

Write a Spark DataFrame to a JSON file

spark_write_parquet

Write a Spark DataFrame to a Parquet file

spark_write_source

Writes a Spark DataFrame into a generic source

spark_write_table

Writes a Spark DataFrame into a Spark table
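
The readers and writers above follow a common pattern: readers take a connection, a table name, and a path, while writers take a Spark DataFrame and a path. The following is a minimal sketch of a CSV to Parquet round trip, assuming a local connection; the table names and temporary paths are made up for the example.

```r
library(sparklyr)

sc <- spark_connect(master = "local")

# Write a small R data frame to a temporary CSV so the example is self-contained.
csv_path <- tempfile(fileext = ".csv")
write.csv(mtcars, csv_path, row.names = FALSE)

# Read the CSV into a Spark DataFrame registered as "mtcars_csv" (hypothetical name).
cars_tbl <- spark_read_csv(sc, name = "mtcars_csv", path = csv_path,
                           header = TRUE, infer_schema = TRUE)

# Write the same data as Parquet, then read it back under another name.
parquet_path <- tempfile()
spark_write_parquet(cars_tbl, path = parquet_path, mode = "overwrite")
cars_parquet <- spark_read_parquet(sc, name = "mtcars_parquet", path = parquet_path)

# Bring the result back into R to count rows before disconnecting.
n_rows <- nrow(dplyr::collect(cars_parquet))

spark_disconnect(sc)
```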

Spark Tables

src_databases

Show database list

tbl_cache

Cache a Spark Table

tbl_change_db

Use specific database

tbl_uncache

Uncache a Spark Table
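
The table helpers operate on table names rather than on DataFrame objects. A small sketch, assuming a local connection and a table registered under the hypothetical name "iris_tbl":

```r
library(sparklyr)

sc <- spark_connect(master = "local")

# Register a table so there is something to cache.
iris_tbl <- sdf_copy_to(sc, iris, name = "iris_tbl", overwrite = TRUE)

# List the available databases and make sure "default" is the active one.
databases <- src_databases(sc)
tbl_change_db(sc, "default")

# Load the table into memory, then release it again.
tbl_cache(sc, "iris_tbl")
tbl_uncache(sc, "iris_tbl")

spark_disconnect(sc)
```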

Spark DataFrames

sdf_along

Create a DataFrame along an Object

sdf_bind_rows sdf_bind_cols

Bind multiple Spark DataFrames by row or by column

sdf_broadcast

Broadcast hint

sdf_checkpoint

Checkpoint a Spark DataFrame

sdf_coalesce

Coalesces a Spark DataFrame

sdf_copy_to sdf_import

Copy an Object into Spark

sdf_len

Create DataFrame for Length

sdf_mutate sdf_mutate_

Mutate a Spark DataFrame

sdf_num_partitions

Gets number of partitions of a Spark DataFrame

sdf_partition

Partition a Spark DataFrame

sdf_pivot

Pivot a Spark DataFrame

sdf_predict sdf_transform sdf_fit sdf_fit_and_transform

Spark ML -- Transform, fit, and predict methods (sdf_ interface)

sdf_read_column

Read a Column from a Spark DataFrame

sdf_register

Register a Spark DataFrame

sdf_repartition

Repartition a Spark DataFrame

sdf_residuals

Model Residuals

sdf_sample

Randomly Sample Rows from a Spark DataFrame

sdf_separate_column

Separate a Vector Column into Scalar Columns

sdf_seq

Create DataFrame for Range

sdf_sort

Sort a Spark DataFrame

sdf_with_unique_id

Add a Unique ID Column to a Spark DataFrame
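
To show how a few of the sdf_ functions combine, the sketch below copies an R data frame into Spark, splits it by weight, and inspects the partition count. It assumes a local connection; the split names, weights, and seed are arbitrary, and sdf_partition is used only because it appears in this index.

```r
library(sparklyr)

sc <- spark_connect(master = "local")

# Copy a local data frame into Spark under a hypothetical table name.
iris_tbl <- sdf_copy_to(sc, iris, name = "iris_tbl", overwrite = TRUE)

# Split the rows into two Spark DataFrames by weight; the seed makes it repeatable.
splits <- sdf_partition(iris_tbl, training = 0.7, test = 0.3, seed = 42)
split_names <- names(splits)

# Each element of the list is itself a Spark DataFrame.
n_parts <- sdf_num_partitions(splits$training)

spark_disconnect(sc)
```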

Spark Machine Learning

ml_als ml_recommend ml_als_factorization

Spark ML -- ALS

ml_decision_tree_classifier ml_decision_tree ml_decision_tree_regressor

Spark ML -- Decision Trees

ml_generalized_linear_regression

Spark ML -- Generalized Linear Regression

ml_gbt_classifier ml_gradient_boosted_trees ml_gbt_regressor

Spark ML -- Gradient Boosted Trees

ml_kmeans

Spark ML -- K-Means Clustering

ml_lda

Spark ML -- Latent Dirichlet Allocation

ml_linear_regression

Spark ML -- Linear Regression

ml_logistic_regression

Spark ML -- Logistic Regression

ml_model_data

Extracts data associated with a Spark ML model

ml_multilayer_perceptron_classifier ml_multilayer_perceptron

Spark ML -- Multilayer Perceptron

ml_naive_bayes

Spark ML -- Naive-Bayes

ml_one_vs_rest

Spark ML -- OneVsRest

ft_pca ml_pca

Feature Transformation -- PCA (Estimator)

ml_random_forest_classifier ml_random_forest ml_random_forest_regressor

Spark ML -- Random Forest

ml_aft_survival_regression ml_survival_regression

Spark ML -- Survival Regression
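
Most of the ml_ functions above accept a Spark DataFrame and an R formula. The sketch below fits a linear regression this way, assuming a local connection; the choice of mtcars and of the predictors is purely illustrative.

```r
library(sparklyr)

sc <- spark_connect(master = "local")

mtcars_tbl <- sdf_copy_to(sc, mtcars, name = "mtcars_tbl", overwrite = TRUE)

# Fit mpg as a linear function of weight and cylinder count.
fit <- ml_linear_regression(mtcars_tbl, mpg ~ wt + cyl)

# The fitted model carries its coefficients as a named numeric vector.
coefs <- fit$coefficients

spark_disconnect(sc)
```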

Spark Feature Transformers

ft_binarizer

Feature Transformation -- Binarizer (Transformer)

ft_bucketizer

Feature Transformation -- Bucketizer (Transformer)

ft_count_vectorizer

Feature Transformation -- CountVectorizer (Estimator)

ft_dct ft_discrete_cosine_transform

Feature Transformation -- Discrete Cosine Transform (DCT) (Transformer)

ft_elementwise_product

Feature Transformation -- ElementwiseProduct (Transformer)

ft_index_to_string

Feature Transformation -- IndexToString (Transformer)

ft_one_hot_encoder

Feature Transformation -- OneHotEncoder (Transformer)

ft_quantile_discretizer

Feature Transformation -- QuantileDiscretizer (Estimator)

ft_sql_transformer ft_dplyr_transformer

Feature Transformation -- SQLTransformer

ft_string_indexer

Feature Transformation -- StringIndexer (Estimator)

ft_vector_assembler

Feature Transformation -- VectorAssembler (Transformer)

ft_tokenizer

Feature Transformation -- Tokenizer (Transformer)

ft_regex_tokenizer

Feature Transformation -- RegexTokenizer (Transformer)
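
Applied directly to a Spark DataFrame, a feature transformer returns a new DataFrame with the output column appended. A minimal sketch with ft_binarizer follows. It assumes a local connection and the input_col/output_col argument names used by recent sparklyr releases (older releases spelled these differently); the threshold of 20 is arbitrary.

```r
library(sparklyr)

sc <- spark_connect(master = "local")

mtcars_tbl <- sdf_copy_to(sc, mtcars, name = "mtcars_tbl", overwrite = TRUE)

# Flag rows whose mpg exceeds the threshold: 1 above it, 0 otherwise.
binarized <- ft_binarizer(mtcars_tbl, input_col = "mpg",
                          output_col = "high_mpg", threshold = 20)

# Collect into R to inspect the new column.
result <- dplyr::collect(binarized)

spark_disconnect(sc)
```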

Spark Machine Learning Utilities

ml_binary_classification_evaluator ml_binary_classification_eval ml_multiclass_classification_evaluator ml_classification_eval ml_regression_evaluator

Spark ML -- Evaluators

ml_tree_feature_importance

Spark ML -- Feature Importance for Tree Models

Extensions

compile_package_jars

Compile Scala sources into a Java Archive (jar)

connection_config

Read configuration values for a connection

download_scalac

Downloads default Scala Compilers

find_scalac

Discover the Scala Compiler

spark_context java_context hive_context spark_session

Access the Spark API

hive_context_config

Runtime configuration interface for Hive

invoke invoke_static invoke_new

Invoke a Method on a JVM Object

register_extension registered_extensions

Register a Package that Implements a Spark Extension

spark_compilation_spec

Define a Spark Compilation Specification

spark_default_compilation_spec

Default Compilation Specification for Spark Extensions

spark_connection

Retrieve the Spark Connection Associated with an R Object

spark_context_config

Runtime configuration interface for Spark.

spark_dataframe

Retrieve a Spark DataFrame

spark_dependency

Define a Spark dependency

spark_home_set

Set the SPARK_HOME environment variable

spark_jobj

Retrieve a Spark JVM Object Reference

spark_version

Get the Spark Version Associated with a Spark Connection
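
The extension functions expose the underlying JVM objects. As a sketch under the assumption of a local connection, the example below calls a method on the SparkContext with invoke and a static Java method with invoke_static; the specific methods called are chosen only for illustration.

```r
library(sparklyr)

sc <- spark_connect(master = "local")

# spark_context() returns a reference to the JVM SparkContext object.
ctx <- spark_context(sc)

# Call the instance method SparkContext.version() on that object.
version_string <- invoke(ctx, "version")

# Call the static method java.lang.Math.hypot(3, 4).
hyp <- invoke_static(sc, "java.lang.Math", "hypot", 3, 4)

spark_disconnect(sc)
```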

Distributed Computing

spark_apply

Apply an R Function in Spark
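
spark_apply runs an R function over each partition of a Spark DataFrame, so R must be available wherever the partitions are processed. The sketch below assumes a local connection, where that is trivially true, and multiplies a small generated column by ten.

```r
library(sparklyr)

sc <- spark_connect(master = "local")

# sdf_len(sc, 5) creates a one-column Spark DataFrame holding 1 through 5.
numbers <- sdf_len(sc, 5)

# The function receives each partition as an R data frame and returns a data frame.
scaled <- spark_apply(numbers, function(df) df * 10)

# Collect into R to look at the values.
scaled_local <- dplyr::collect(scaled)

spark_disconnect(sc)
```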