Perform linear regression on a Spark DataFrame.

ml_linear_regression(x, response, features, intercept = TRUE, alpha = 0, lambda = 0, iter.max = 100L, ml.options = ml_options(), ...)

- x
- An object coercible to a Spark DataFrame (typically, a `tbl_spark`).
- response
- The name of the response vector (as a length-one character vector), or a formula giving a symbolic description of the model to be fitted. When `response` is a formula, it is used in preference to other parameters to set the `response`, `features`, and `intercept` parameters (if available). Currently, only simple linear combinations of existing parameters are supported; e.g. `response ~ feature1 + feature2 + ...`. The intercept term can be omitted by using `- 1` in the model fit.
- features
- The names of the features (terms) to use for the model fit.
- intercept
- Boolean; should the model be fit with an intercept term?
- alpha, lambda
- Parameters controlling loss function penalization (e.g. for lasso, elastic net, and ridge regression). See **Details** for more information.
- iter.max
- The maximum number of iterations to use.
- ml.options
- Optional arguments, used to affect the model generated. See `ml_options` for more details.
- ...
- Optional arguments. The `data` argument can be used to specify the data to be used when `x` is a formula; this allows calls of the form `ml_linear_regression(y ~ x, data = tbl)`, and is especially useful in conjunction with `do`.
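
The calling conventions described by the arguments can be sketched as follows. This is a minimal illustration, not a canonical workflow: it assumes a local Spark installation that `spark_connect(master = "local")` can reach, and the `mtcars` data set and the object names (`sc`, `mtcars_tbl`, `fit_*`) are chosen purely for the example.

```r
library(sparklyr)
library(dplyr)

# Assumption: Spark is installed locally (for instance via spark_install()).
sc <- spark_connect(master = "local")

# Copy a small built-in data set into Spark to obtain a tbl_spark.
mtcars_tbl <- copy_to(sc, mtcars, "mtcars", overwrite = TRUE)

# Name-based interface: response and features given as character vectors.
fit_names <- ml_linear_regression(
  mtcars_tbl,
  response = "mpg",
  features = c("wt", "cyl")
)

# Formula interface: the formula sets response, features, and intercept.
fit_formula <- ml_linear_regression(mtcars_tbl, mpg ~ wt + cyl)

# Same model without an intercept term, using `- 1` in the formula.
fit_no_intercept <- ml_linear_regression(mtcars_tbl, mpg ~ wt + cyl - 1)

# Formula first, with the data supplied through the `data` argument.
fit_data <- ml_linear_regression(mpg ~ wt + cyl, data = mtcars_tbl)

# Penalized fit: alpha mixes the L1 and L2 penalties, lambda scales them.
# The values 0.5 and 0.1 are arbitrary illustrations, not recommendations.
fit_penalized <- ml_linear_regression(
  mtcars_tbl, mpg ~ wt + cyl,
  alpha = 0.5, lambda = 0.1
)

print(fit_formula)

spark_disconnect(sc)
```

The formula form is usually the more compact choice; the name-based form is convenient when the feature names are computed programmatically.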

Spark implements both \(L1\) and \(L2\) regularization in linear regression models. See the preamble of the Spark classification and regression documentation (http://spark.apache.org/docs/latest/ml-classification-regression.html) for more details on how the loss function is parameterized.

In particular, with `alpha` set to 1, the parameterization is equivalent to a lasso model (https://en.wikipedia.org/wiki/Lasso_(statistics)); if `alpha` is set to 0, the parameterization is equivalent to a ridge regression model (https://en.wikipedia.org/wiki/Tikhonov_regularization).
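
For reference, the elastic net objective given in the Spark documentation can be written out as below. This assumes that `alpha` corresponds to Spark's elastic net mixing parameter and `lambda` to its regularization parameter; the symbols \(w\) (coefficient vector), \(x_i\), \(y_i\) (the \(i\)-th feature vector and response), and \(n\) (number of observations) are notation introduced here, and the intercept is left out for brevity.

\[
\min_{w} \; \frac{1}{2n} \sum_{i=1}^{n} \left( w^\top x_i - y_i \right)^2
+ \lambda \left( \alpha \, \|w\|_1 + \frac{1 - \alpha}{2} \, \|w\|_2^2 \right),
\qquad \alpha \in [0, 1], \; \lambda \ge 0
\]

Values of `alpha` strictly between 0 and 1 mix the two penalties (elastic net), and with the default `lambda = 0` the penalty term vanishes, leaving ordinary least squares.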

Other Spark ML routines: `ml_als_factorization`, `ml_decision_tree`, `ml_generalized_linear_regression`, `ml_gradient_boosted_trees`, `ml_kmeans`, `ml_lda`, `ml_logistic_regression`, `ml_multilayer_perceptron`, `ml_naive_bayes`, `ml_one_vs_rest`, `ml_pca`, `ml_random_forest`, `ml_survival_regression`