Gradient boosting is based on combining many weak learners and deriving the final decision from the resulting “strong” learner. Consider the least-squares regression setting, where the goal is to teach the model to approximate and predict values using a function $F(x)$, by minimizing the MSE $\frac{1}{n}\sum_{i}\left(F(x_i) - y_i\right)^2$ over the training examples $(x_i, y_i)$. The algorithm proceeds over iterations. At each iteration $m$, the model is in some imperfect state $F_m$. In order to improve this model, we add some new estimator $h_m(x)$,
which yields the new predictor $F_{m+1}(x) = F_m(x) + h_m(x)$.
Gradient boosting will fit the new predictor $h_m$ to the residual $r_i = y_i - F_m(x_i)$; for the squared-error loss, this residual is exactly the negative gradient of $\tfrac{1}{2}\left(y_i - F_m(x_i)\right)^2$ with respect to $F_m(x_i)$, which is what makes the procedure a gradient descent in function space.
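To make the iteration concrete, here is a minimal sketch in pure NumPy, assuming depth-1 regression stumps as the weak learners and a shrinkage factor on each update; the helper names (`fit_stump`, `gradient_boost`) are illustrative, not from any particular library.

```python
import numpy as np

def fit_stump(x, r):
    """Find the single-split threshold on a 1-D feature that best fits
    the residual r in the least-squares sense."""
    best = None
    for thr in np.unique(x):
        left = x <= thr
        if left.all() or not left.any():
            continue  # split must leave both sides non-empty
        lv, rv = r[left].mean(), r[~left].mean()
        sse = ((r[left] - lv) ** 2).sum() + ((r[~left] - rv) ** 2).sum()
        if best is None or sse < best[0]:
            best = (sse, thr, lv, rv)
    return best[1:]  # (threshold, left value, right value)

def predict_stump(stump, x):
    thr, lv, rv = stump
    return np.where(x <= thr, lv, rv)

def gradient_boost(x, y, n_rounds=50, lr=0.1):
    """Least-squares gradient boosting: each weak learner h_m is fitted
    to the current residual y - F_m(x), then added to the ensemble."""
    f0 = y.mean()                      # F_0: constant initial prediction
    F = np.full_like(y, f0, dtype=float)
    stumps = []
    for _ in range(n_rounds):
        r = y - F                      # residual = negative gradient of 1/2 * MSE
        stump = fit_stump(x, r)        # h_m approximates the residual
        F = F + lr * predict_stump(stump, x)  # F_{m+1} = F_m + lr * h_m
        stumps.append(stump)
    return f0, stumps
```

On a noisy nonlinear target, the training MSE of the boosted ensemble drops well below that of the constant baseline $F_0$, illustrating how each added stump chips away at the remaining residual.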