Spark ML: Gradient-Boosted Tree
Perform regression or classification using gradient-boosted trees.
ml_gradient_boosted_trees(x, response, features, impurity = c("auto", "gini",
"entropy", "variance"), loss.type = c("auto", "logistic", "squared",
"absolute"), max.bins = 32L, max.depth = 5L, num.trees = 20L,
min.info.gain = 0, min.rows = 1L, learn.rate = 0.1, sample.rate = 1,
type = c("auto", "regression", "classification"), thresholds = NULL,
seed = NULL, checkpoint.interval = 10L, cache.node.ids = FALSE,
max.memory = 256L, ml.options = ml_options(), ...)
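A minimal usage sketch of the signature above. It assumes the sparklyr version documented here (with dot-separated argument names) and a local Spark installation; the `mtcars` data set, the chosen columns, and the seed value are illustrative choices, not part of the original documentation.

```r
library(sparklyr)

# Assumption: a local Spark installation is available to sparklyr.
sc <- spark_connect(master = "local")

# Copy a small built-in R data set to Spark (illustrative choice).
mtcars_tbl <- copy_to(sc, mtcars, "mtcars", overwrite = TRUE)

# Fit a regression model predicting mpg from three numeric columns.
# The values for num.trees, max.depth, and learn.rate repeat the
# defaults from the signature; they are spelled out only for clarity.
fit <- ml_gradient_boosted_trees(
  mtcars_tbl,
  response = "mpg",
  features = c("wt", "cyl", "hp"),
  type = "regression",
  num.trees = 20L,
  max.depth = 5L,
  learn.rate = 0.1,
  seed = 42L
)

spark_disconnect(sc)
```

Setting `type` explicitly avoids relying on the "auto" detection of the model type.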
Arguments
x  An object coercible to a Spark DataFrame (typically, a

response  The name of the response vector (as a length-one character
vector), or a formula, giving a symbolic description of the model to be
fitted. When 
features  The name of features (terms) to use for the model fit. 
impurity  Criterion used for the information gain calculation. One of 'auto', 'gini', 'entropy', or 'variance'. 'auto' defaults to 'gini' for classification and 'variance' for regression.
loss.type  Loss function which the algorithm tries to minimize. Defaults to 
max.bins  The maximum number of bins used for discretizing continuous features and for choosing how to split on features at each node. More bins give higher granularity. 
max.depth  Maximum depth of the tree (>= 0); that is, the maximum number of nodes separating any leaves from the root of the tree. 
num.trees  Number of trees to train (>= 1). Defaults to 20.
min.info.gain  Minimum information gain for a split to be considered at a tree node. Should be >= 0. Defaults to 0.
min.rows  Minimum number of instances each child must have after split. 
learn.rate  The learning rate or step size. Defaults to 0.1.
sample.rate  Fraction of the training data used for learning each decision tree. Defaults to 1.0.
type  The type of model to fit. 
thresholds  Thresholds in multiclass classification to adjust the probability of predicting each class. The vector must have length equal to the number of classes, with values > 0, except that at most one value may be 0. The class with the largest value of p/t is predicted, where p is the original probability of that class and t is the class's threshold.
seed  Seed for random numbers. 
checkpoint.interval  Set checkpoint interval (>= 1) or disable checkpointing (-1). For example, 10 means that the cache is checkpointed every 10 iterations. Defaults to 10.
cache.node.ids  If 
max.memory  Maximum memory in MB allocated to histogram aggregation. If too small, then 1 node will be split per iteration, and its aggregates may exceed this size. Defaults to 256. 
ml.options  Optional arguments, used to affect the model generated. See

...  Optional arguments. The 
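The p/t rule described for the thresholds argument can be illustrated in plain R, without Spark. This is only a sketch of the arithmetic; the class names, probabilities, and thresholds below are hypothetical values chosen for illustration.

```r
# Hypothetical class probabilities (p) and per-class thresholds (t).
p <- c(a = 0.5, b = 0.3, c = 0.2)
t <- c(a = 1.0, b = 0.5, c = 0.5)

# Scale each probability by its class threshold.
scaled <- p / t

# The predicted class is the one with the largest p/t.
predicted <- names(which.max(scaled))
print(predicted)
```

Class "a" has the highest raw probability, but the lower threshold for class "b" raises its scaled value (0.3 / 0.5 = 0.6) above that of "a" (0.5 / 1.0 = 0.5), so "b" is predicted.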
See also
Other Spark ML routines: ml_als_factorization, ml_decision_tree, ml_generalized_linear_regression, ml_kmeans, ml_lda, ml_linear_regression, ml_logistic_regression, ml_multilayer_perceptron, ml_naive_bayes, ml_one_vs_rest, ml_pca, ml_random_forest, ml_survival_regression