The RStudio team would like to share with you some test deployment environments you can use to start your Spark journey.

Disclaimer: These articles are meant as guides only. RStudio is not responsible for any issues or charges incurred by following them.

YARN Client

Amazon’s EMR

This example demonstrates a complete workflow using Hadoop and Hive with Amazon Elastic MapReduce (EMR). We access our data with a Spark cluster, understand our data using sparklyr, and then communicate our insights via a flexdashboard.

Cloudera Express

This example demonstrates a complete workflow using Hadoop and Hive with Cloudera (CDH). In addition to the workflow, we show these useful web tools: Cloudera Manager, HUE, and the Spark UI.

Standalone

Amazon’s EC2

You can create a Spark cluster without Hadoop by using Spark standalone mode. This example shows you how to set up a standalone cluster in Amazon EC2.

Performance and Tuning

Understanding Spark Caching

Using a reproducible example, we review the main configuration settings, commands, and command arguments that help you get the best out of Spark's memory management options.

sparklyr is an RStudio project. © 2016 RStudio, Inc.