# Spark ML -- Generalized Linear Regression

## Usage

`ml_generalized_linear_regression(x, response, features, intercept = TRUE, family = gaussian(link = "identity"), iter.max = 100L, ml.options = ml_options(), ...)`

## Arguments

- x
- An object coercible to a Spark DataFrame (typically, a `tbl_spark`).
- response
- The name of the response vector (as a length-one character vector), or a formula giving a symbolic description of the model to be fitted. When `response` is a formula, it is used in preference to other parameters to set the `response`, `features`, and `intercept` parameters (if available). Currently, only simple linear combinations of existing parameters are supported; e.g. `response ~ feature1 + feature2 + ...`. The intercept term can be omitted by using `- 1` in the formula.
- features
- The names of the features (terms) to use for the model fit.
- intercept
- Boolean; should the model be fit with an intercept term?
- family
- The family / link function to use; analogous to those normally passed in to calls to R's own `glm`.
- iter.max
- The maximum number of iterations to use.
- ml.options
- Optional arguments, used to affect the model generated. See `ml_options` for more details.
- ...
- Optional arguments; currently unused.
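
The `response` and `family` arguments above both accept ordinary R objects, so their contents can be inspected in base R before anything is sent to Spark. The sketch below shows which pieces a simple additive formula carries (response, features, intercept) and which fields a family object exposes. The helper `split_formula` is hypothetical and is not part of sparklyr; it only illustrates the mapping described above, not how the package parses formulas internally.

```r
# Hypothetical helper (not part of sparklyr): split a simple additive formula
# into the pieces that `response`, `features`, and `intercept` would carry.
split_formula <- function(f) {
  tt <- terms(f)  # stats::terms() records term labels and the intercept flag
  list(
    response  = deparse(f[[2]]),            # left-hand side of the formula
    features  = attr(tt, "term.labels"),    # right-hand side terms
    intercept = attr(tt, "intercept") == 1  # FALSE when the formula has `- 1`
  )
}

with_intercept <- split_formula(mpg ~ wt + cyl)
no_intercept   <- split_formula(mpg ~ wt + cyl - 1)

# Family objects are the same ones passed to R's own glm(); they carry the
# family name and the link name as character strings.
default_family <- gaussian(link = "identity")
logit_family   <- binomial(link = "logit")
family_names   <- c(default_family$family, logit_family$family)
link_names     <- c(default_family$link, logit_family$link)
```

Here `with_intercept` and `no_intercept` differ only in their `intercept` element, which mirrors the effect of setting `intercept = FALSE` when `response` and `features` are given as character vectors.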

## Description

Perform generalized linear regression on a Spark DataFrame.

## Details

In contrast to `ml_linear_regression()` and `ml_logistic_regression()`, this routine does not allow you to tweak the loss function (e.g. for elastic net regression); however, the model fits it returns are generally richer in the information they provide for assessing the quality of fit.
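
A minimal usage sketch based on the signature in the Usage section. It assumes that sparklyr and dplyr are installed, that `sc` is an open connection (for example from `spark_connect(master = "local")`), and that `copy_to()` accepts the arguments shown in the sparklyr version this page documents. The wrapper name `fit_mtcars_glm` and the choice of `mtcars` columns are illustrative only.

```r
# Illustrative wrapper (hypothetical name): fit a Gaussian GLM on mtcars.
# `sc` is assumed to be an open sparklyr connection supplied by the caller.
fit_mtcars_glm <- function(sc) {
  # Copy a small built-in data set into Spark as a tbl_spark.
  mtcars_tbl <- dplyr::copy_to(sc, mtcars, "mtcars", overwrite = TRUE)

  # Response and features are given by name. Per the Arguments section, a
  # formula such as mpg ~ wt + cyl could be passed as `response` instead.
  sparklyr::ml_generalized_linear_regression(
    mtcars_tbl,
    response  = "mpg",
    features  = c("wt", "cyl"),
    intercept = TRUE,
    family    = gaussian(link = "identity"),
    iter.max  = 100L
  )
}
```

Given a connection `sc`, `summary(fit_mtcars_glm(sc))` would be the natural way to look at the fit diagnostics mentioned above. The exact contents of that summary depend on the sparklyr and Spark versions in use.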

## See also

Other Spark ML routines:

`ml_als_factorization`, `ml_decision_tree`, `ml_gradient_boosted_trees`, `ml_kmeans`, `ml_lda`, `ml_linear_regression`, `ml_logistic_regression`, `ml_multilayer_perceptron`, `ml_naive_bayes`, `ml_one_vs_rest`, `ml_pca`, `ml_random_forest`, `ml_survival_regression`