Using RStudio Workbench inside of Qubole

Overview

Qubole users can request access to RStudio Server Pro. This allows users to use sparklyr to interact directly with Spark from within the Qubole cluster.

Advantages and limitations

Advantages:

  • Ability for users to connect sparklyr directly to Spark within Qubole
  • Provides a high-bandwidth connection between R and the Spark JVM processes because they are running on the same machine
  • Can load data from the cluster directly into an R session since RStudio Workbench is installed within the Qubole cluster
  • A unique, persistent home directory for each user

Limitations:

  • Persistent packages must be managed using Qubole Environments, not directly from within RStudio
  • RStudio Workbench installed within a Qubole cluster will be limited to the compute resources and lifecycle of that particular Spark cluster
  • Non-Spark jobs will use CPU and RAM resources within the Qubole cluster

Access RStudio Workbench

RStudio Workbench can be accessed from the cluster resources menu:

Use sparklyr

Use the following R code to establish a connection from sparklyr to the Qubole cluster:

library(sparklyr)
sc <- spark_connect(method = "qubole")

Additional information

For more information on using RStudio Workbench inside of Qubole, refer to the Qubole documentation.