Function Reference

Spark Operations

spark_config()

Read Spark Configuration

spark_connect() spark_connection_is_open() spark_disconnect() spark_disconnect_all() spark_submit()

Manage Spark Connections

spark_install_find() spark_install() spark_uninstall() spark_install_dir() spark_install_tar() spark_installed_versions() spark_available_versions()

Find, install, and uninstall Spark versions

spark_log()

View Entries in the Spark Log

spark_web()

Open the Spark web interface

Spark Data

spark_read_csv()

Read a CSV file into a Spark DataFrame

spark_read_jdbc()

Read from a JDBC connection into a Spark DataFrame

spark_read_json()

Read a JSON file into a Spark DataFrame

spark_read_parquet()

Read a Parquet file into a Spark DataFrame

spark_read_source()

Read from a generic source into a Spark DataFrame

spark_read_table()

Read from a Spark table into a Spark DataFrame

spark_write_csv()

Write a Spark DataFrame to a CSV file

spark_write_jdbc()

Write a Spark DataFrame into a JDBC table

spark_write_json()

Write a Spark DataFrame to a JSON file

spark_write_parquet()

Write a Spark DataFrame to a Parquet file

spark_write_source()

Write a Spark DataFrame into a generic source

spark_write_table()

Write a Spark DataFrame into a Spark table

Spark Tables

src_databases()

Show database list

tbl_cache()

Cache a Spark Table

tbl_change_db()

Use specific database

tbl_uncache()

Uncache a Spark Table

Spark DataFrames

sdf_along()

Create a DataFrame along an Object

sdf_bind_rows() sdf_bind_cols()

Bind multiple Spark DataFrames by row or by column

sdf_broadcast()

Broadcast hint

sdf_checkpoint()

Checkpoint a Spark DataFrame

sdf_coalesce()

Coalesce a Spark DataFrame

sdf_copy_to() sdf_import()

Copy an Object into Spark

sdf_len()

Create a DataFrame of a Given Length

sdf_mutate() sdf_mutate_()

Mutate a Spark DataFrame

sdf_num_partitions()

Get the number of partitions of a Spark DataFrame

sdf_partition()

Partition a Spark DataFrame

sdf_pivot()

Pivot a Spark DataFrame

sdf_predict() sdf_transform() sdf_fit() sdf_fit_and_transform()

Spark ML -- Transform, fit, and predict methods (sdf_ interface)

sdf_read_column()

Read a Column from a Spark DataFrame

sdf_register()

Register a Spark DataFrame

sdf_repartition()

Repartition a Spark DataFrame

sdf_residuals()

Model Residuals

sdf_sample()

Randomly Sample Rows from a Spark DataFrame

sdf_separate_column()

Separate a Vector Column into Scalar Columns

sdf_seq()

Create a DataFrame for a Range

sdf_sort()

Sort a Spark DataFrame

sdf_with_unique_id()

Add a Unique ID Column to a Spark DataFrame

Spark Machine Learning

ml_als() ml_recommend() ml_als_factorization()

Spark ML -- ALS

ml_decision_tree_classifier() ml_decision_tree() ml_decision_tree_regressor()

Spark ML -- Decision Trees

ml_generalized_linear_regression()

Spark ML -- Generalized Linear Regression

ml_gbt_classifier() ml_gradient_boosted_trees() ml_gbt_regressor()

Spark ML -- Gradient Boosted Trees

ml_kmeans() ml_compute_cost()

Spark ML -- K-Means Clustering

ml_lda() ml_describe_topics() ml_log_likelihood() ml_log_perplexity() ml_topics_matrix()

Spark ML -- Latent Dirichlet Allocation

ml_linear_regression()

Spark ML -- Linear Regression

ml_logistic_regression()

Spark ML -- Logistic Regression

ml_model_data()

Extracts data associated with a Spark ML model

ml_multilayer_perceptron_classifier() ml_multilayer_perceptron()

Spark ML -- Multilayer Perceptron

ml_naive_bayes()

Spark ML -- Naive Bayes

ml_one_vs_rest()

Spark ML -- OneVsRest

ft_pca() ml_pca()

Feature Transformation -- PCA (Estimator)

ml_random_forest_classifier() ml_random_forest() ml_random_forest_regressor()

Spark ML -- Random Forest

ml_aft_survival_regression() ml_survival_regression()

Spark ML -- Survival Regression

Spark Feature Transformers

ft_binarizer()

Feature Transformation -- Binarizer (Transformer)

ft_bucketizer()

Feature Transformation -- Bucketizer (Transformer)

ft_count_vectorizer() ml_vocabulary()

Feature Transformation -- CountVectorizer (Estimator)

ft_dct() ft_discrete_cosine_transform()

Feature Transformation -- Discrete Cosine Transform (DCT) (Transformer)

ft_elementwise_product()

Feature Transformation -- ElementwiseProduct (Transformer)

ft_index_to_string()

Feature Transformation -- IndexToString (Transformer)

ft_one_hot_encoder()

Feature Transformation -- OneHotEncoder (Transformer)

ft_quantile_discretizer()

Feature Transformation -- QuantileDiscretizer (Estimator)

ft_sql_transformer() ft_dplyr_transformer()

Feature Transformation -- SQLTransformer

ft_string_indexer() ml_labels() ft_string_indexer_model()

Feature Transformation -- StringIndexer (Estimator)

ft_vector_assembler()

Feature Transformation -- VectorAssembler (Transformer)

ft_tokenizer()

Feature Transformation -- Tokenizer (Transformer)

ft_regex_tokenizer()

Feature Transformation -- RegexTokenizer (Transformer)

Spark Machine Learning Utilities

ml_binary_classification_evaluator() ml_binary_classification_eval() ml_multiclass_classification_evaluator() ml_classification_eval() ml_regression_evaluator()

Spark ML -- Evaluators

ml_feature_importances() ml_tree_feature_importance()

Spark ML -- Feature Importance for Tree Models

Streaming

stream_find()

Find Stream

stream_generate_test()

Generate Test Stream

stream_id()

Spark Stream's Identifier

stream_name()

Spark Stream's Name

stream_read_csv()

Read CSV Stream

stream_read_jdbc()

Read JDBC Stream

stream_read_json()

Read JSON Stream

stream_read_kafka()

Read Kafka Stream

stream_read_orc()

Read ORC Stream

stream_read_parquet()

Read Parquet Stream

stream_read_text()

Read Text Stream

stream_render()

Render Stream

stream_stats()

Stream Statistics

stream_stop()

Stop a Spark Stream

stream_trigger_continuous()

Spark Stream Continuous Trigger

stream_trigger_interval()

Spark Stream Interval Trigger

stream_view()

View Stream

stream_watermark()

Watermark Stream

stream_write_csv()

Write CSV Stream

stream_write_jdbc()

Write JDBC Stream

stream_write_json()

Write JSON Stream

stream_write_kafka()

Write Kafka Stream

stream_write_memory()

Write Memory Stream

stream_write_orc()

Write ORC Stream

stream_write_parquet()

Write Parquet Stream

stream_write_text()

Write Text Stream

Extensions

compile_package_jars()

Compile Scala sources into a Java Archive (jar)

connection_config()

Read configuration values for a connection

download_scalac()

Download the Default Scala Compilers

find_scalac()

Discover the Scala Compiler

spark_context() java_context() hive_context() spark_session()

Access the Spark API

hive_context_config()

Runtime configuration interface for Hive

invoke() invoke_static() invoke_new()

Invoke a Method on a JVM Object

register_extension() registered_extensions()

Register a Package that Implements a Spark Extension

spark_compilation_spec()

Define a Spark Compilation Specification

spark_default_compilation_spec()

Default Compilation Specification for Spark Extensions

spark_connection()

Retrieve the Spark Connection Associated with an R Object

spark_context_config()

Runtime configuration interface for the Spark Context

spark_dataframe()

Retrieve a Spark DataFrame

spark_dependency()

Define a Spark dependency

spark_home_set()

Set the SPARK_HOME environment variable

spark_jobj()

Retrieve a Spark JVM Object Reference

spark_version()

Get the Spark Version Associated with a Spark Connection

Distributed Computing

spark_apply()

Apply an R Function in Spark