The RStudio team would like to share with you some test deployment environments you can use to start your Spark journey.
Disclaimer: Please note that these articles are meant as guides only. RStudio is not responsible for any issues or charges incurred by following them.
This example demonstrates a complete workflow using Hadoop and Hive with Amazon Elastic MapReduce (EMR). We access our data with a Spark cluster, understand our data using sparklyr, and then communicate our insights via a flexdashboard.
This example demonstrates a complete workflow using Hadoop and Hive with Cloudera (CDH). In addition to the workflow, we show these useful web tools: Cloudera Manager, HUE, and the Spark UI.
You can create a Spark cluster without Hadoop by using Spark standalone mode. This example shows you how to set up a standalone cluster in Amazon EC2.
Performance and Tuning
Understanding Spark Caching
Using a reproducible example, we review the main configuration settings, commands, and command arguments that can help you get the most out of Spark's memory management options.