2 Machine learning

Machine learning (ML) is a collection of methods that learn patterns from data and use those patterns to make predictions or classifications. In practice, ML is most useful when (1) the relationship between predictors and outcome is complex, (2) there are many predictors and potential interactions, or (3) prediction performance is the primary goal.

This chapter focuses on a practical, end-to-end workflow using the caret package ecosystem in R. The emphasis is on reproducible steps that recur in real work:
1) load data and inspect structure,
2) split data into training and test sets,
3) perform preprocessing (imputation, encoding, normalization),
4) explore features visually,
5) train models and tune hyperparameters,
6) evaluate performance using a confusion matrix and cross-validation, and
7) compare/ensemble multiple models.

We will use the Pima Indians Diabetes dataset (PimaIndiansDiabetes) as a standard binary classification example. The outcome variable is diabetes, and the predictors are clinical measurements. The goal is to predict whether a subject has diabetes.
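
As a rough preview, the core of the workflow above can be sketched in a few lines. This is a minimal illustration and not the chapter's definitive pipeline. It assumes the caret and mlbench packages are installed and that the dataset is loaded from mlbench. The 80/20 split, the seed, the 5-fold cross-validation, and the choice of a logistic regression (`method = "glm"`) are arbitrary choices made for illustration. Visual exploration, imputation, hyperparameter tuning, and model comparison are left out here.

```r
# Assumption: caret and mlbench are installed; the dataset comes from mlbench.
library(caret)
library(mlbench)

# 1) Load the data and inspect its structure
data(PimaIndiansDiabetes)
str(PimaIndiansDiabetes)

# 2) Stratified train/test split on the outcome
#    (p = 0.8 and the seed are illustrative choices)
set.seed(123)
train_idx <- createDataPartition(PimaIndiansDiabetes$diabetes,
                                 p = 0.8, list = FALSE)
train_set <- PimaIndiansDiabetes[train_idx, ]
test_set  <- PimaIndiansDiabetes[-train_idx, ]

# 3) and 5) Preprocess and train in one call; centering and scaling are
#    estimated within each resampling iteration rather than on the full data
ctrl <- trainControl(method = "cv", number = 5)
fit <- train(
  diabetes ~ .,
  data       = train_set,
  method     = "glm",                  # logistic regression for a two-class outcome
  preProcess = c("center", "scale"),
  trControl  = ctrl
)

# 6) Evaluate on the held-out test set with a confusion matrix
pred <- predict(fit, newdata = test_set)
cm   <- confusionMatrix(pred, test_set$diabetes, positive = "pos")
print(cm)
```

Passing `preProcess` to `train()` rather than transforming the whole dataset up front keeps the test set from influencing the scaling parameters, which is one reason the split comes before preprocessing in the list of steps.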

