Using an ODBC connection with Databricks

Overview

This configuration details how to connect to Databricks using an ODBC connection. With this setup, R can connect to Databricks using the odbc and DBI R packages. This type of configuration is the recommended approach for connecting to Databricks from RStudio Connect and can also be used from RStudio Workbench.

Advantages and limitations

Advantages:

  • ODBC connections tend to be more stable than Spark connections. This is especially beneficial for content published to RStudio Connect.
  • If code is developed using a Spark connection and sparklyr, it is easy to swap out the connection type for an ODBC connection and the remaining code will still run.
  • The Spark ODBC driver provided by Databricks was benchmarked against a native Spark connection and the performance of the two is very comparable.

Limitations: - Not all Spark features and functions are available through an ODBC connection.

Driver configuration

Create a DSN for Databricks.

[Databricks-Spark]
Driver=Simba
Server=<server-hostname>
HOST=<server-hostname>
PORT=<port>
SparkServerType=3
Schema=default
ThriftTransport=2
SSL=1
AuthMech=3
UID=token
PWD=<personal-access-token>
HTTPPath=<http-path>

Connect to Databricks

The connection can be tested from the command line using isql -v Databricks-Spark where Databricks-Spark is the DSN name for the connection. If that connects successfully, then the following code can be used to create a connection from an R session:

library(DBI)
library(odbc)

con <- dbConnect(odbc(), "Databricks-Spark")

Additional information

For more information about ODBC connections from R, please visit db.rstudio.com.