Using sparklyr with Qubole
Best practices for working with Qubole
- Manage packages via Qubole Environments - Packages installed via install.packages() are not available on cluster restart. Packages managed through Qubole Environments are persistent.
- Restrict workloads to interactive analysis - Only perform workloads related to exploratory or interactive analysis with Spark, then write the results to a database, file system, or cloud storage for more efficient retrieval in apps, reports, and APIs.
- Load and query results efficiently - Because of the nature of Spark computations and the associated overhead, Shiny apps that use Spark on the backend tend to have performance and runtime issues; consider reading the results from a database, file system, or cloud storage instead.
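The last two practices can be sketched as a two-step workflow: aggregate in Spark once, persist the small result, and have the app read only that result. This is a minimal illustration and not Qubole-specific configuration. The local master, the mtcars data, and the file name mpg_by_cyl.rds are assumptions chosen to keep the sketch self-contained; on a Qubole cluster the connection settings come from the cluster itself, and the output would normally go to a database or cloud storage.

```r
library(sparklyr)
library(dplyr)

# Assumption: a local Spark master, used only so the sketch runs anywhere
# Spark is installed. Replace with the settings for your cluster.
sc <- spark_connect(master = "local")

# Stand-in data; in practice this would be a table already in the cluster.
mtcars_tbl <- copy_to(sc, mtcars, "mtcars_spark", overwrite = TRUE)

# The aggregation runs in Spark; only the summarised rows are collected.
summary_df <- mtcars_tbl %>%
  group_by(cyl) %>%
  summarise(mean_mpg = mean(mpg, na.rm = TRUE), n = n()) %>%
  collect()

# Persist the small result (hypothetical file name) and release the cluster.
saveRDS(summary_df, "mpg_by_cyl.rds")
spark_disconnect(sc)

# Later, in a Shiny app, report, or API: no Spark session is needed.
results <- readRDS("mpg_by_cyl.rds")
```

Because the app only calls readRDS(), its start-up time and responsiveness no longer depend on a Spark session being available.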
Using RStudio Server Pro with Qubole
The Qubole platform includes RStudio Server Pro. More details about how to request RStudio Server Pro and access it from within a Qubole cluster are available from Qubole.
View steps for running RStudio Server Pro inside Qubole
Using RStudio Connect with Qubole
The best configuration for working with Qubole and RStudio Connect is to install RStudio Connect outside of the Qubole cluster and connect to Qubole remotely. This is accomplished using the Qubole ODBC Driver.
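A remote connection of this kind is commonly made from R with the DBI and odbc packages. The sketch below assumes the Qubole ODBC Driver is already installed on the RStudio Connect server and registered under a data source name; the DSN "Qubole", the helper query_qubole(), and the table name in the usage line are hypothetical and should be replaced with your own.

```r
# Hypothetical helper: run one SQL statement over an ODBC data source.
# Assumption: a DSN named "Qubole" is configured for the Qubole ODBC Driver.
query_qubole <- function(sql, dsn = "Qubole") {
  con <- DBI::dbConnect(odbc::odbc(), dsn = dsn)
  # Close the connection even if the query fails.
  on.exit(DBI::dbDisconnect(con), add = TRUE)
  DBI::dbGetQuery(con, sql)
}

# Usage (hypothetical table name):
# mpg_by_cyl <- query_qubole("SELECT * FROM mpg_by_cyl")
```

Opening and closing the connection per call keeps the sketch simple; content that issues many queries may prefer to hold one connection open for its lifetime.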