CART
- Builds a binary decision tree where internal nodes split on feature thresholds and leaves give final predictions.
- Used for both classification and regression tasks in predictive modeling and data mining.
- Helps identify key feature thresholds and splits in complex, non-linear data.
Definition
CART, or Classification and Regression Trees, is a decision tree–based machine learning algorithm that creates a binary tree structure: each internal node represents a decision based on a specific feature or attribute, and each leaf node represents a final prediction or outcome.
Explanation
CART constructs a binary tree by repeatedly splitting the dataset on feature-based decisions (thresholds or conditions) to partition the data into subsets that are more homogeneous with respect to the target. Internal nodes correspond to those decisions, and the terminal (leaf) nodes correspond to the model’s predictions. The algorithm identifies key thresholds and splits in the data to separate outcomes for prediction.
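The split-selection step described above can be sketched in a few lines. This is a minimal, from-scratch illustration of how CART scores candidate thresholds on one numeric feature using Gini impurity (one common splitting criterion for classification trees); the data and feature are invented for the example.

```python
def gini(labels):
    """Gini impurity of a list of class labels (0.0 for a pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

def best_split(values, labels):
    """Return the threshold on one numeric feature that minimizes the
    weighted Gini impurity of the two resulting partitions."""
    n = len(values)
    best_threshold, best_score = None, float("inf")
    for t in sorted(set(values)):
        left = [y for x, y in zip(values, labels) if x <= t]
        right = [y for x, y in zip(values, labels) if x > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best_score:
            best_threshold, best_score = t, score
    return best_threshold, best_score

# Toy data: feature = credit score, label = 1 if the borrower defaulted.
scores = [550, 580, 620, 700, 720, 750]
defaults = [1, 1, 1, 0, 0, 0]
threshold, impurity = best_split(scores, defaults)
print(threshold, impurity)  # → 620 0.0 (the split separates the classes cleanly)
```

A full CART implementation applies this search to every feature, picks the best split overall, and recurses on each partition until a stopping condition (purity, depth, or minimum leaf size) is met.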
Examples
Credit risk assessment
A bank may use CART to predict whether a potential borrower will default on a loan. Input features such as credit score, income, debt-to-income ratio, and loan amount are used to create a decision tree that identifies key thresholds and splits. For instance, the algorithm may determine that borrowers with a credit score below 600 are more likely to default, or that borrowers with a debt-to-income ratio above 40% are also at higher risk. The final leaf nodes represent the predicted default outcome for each borrower.
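A sketch of this example using scikit-learn's CART-based `DecisionTreeClassifier`. All feature values and labels below are invented purely for illustration:

```python
# Illustrative credit-risk sketch; numbers are synthetic, not real loan data.
from sklearn.tree import DecisionTreeClassifier

# Features: [credit_score, debt_to_income_pct]; label: 1 = default, 0 = repaid.
X = [[560, 45], [590, 50], [610, 42], [680, 30], [720, 25], [750, 20]]
y = [1, 1, 1, 0, 0, 0]

# Fit a shallow binary tree; each internal node learns a feature threshold.
tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)

# Leaves carry the final default / no-default prediction for new borrowers.
print(tree.predict([[580, 48], [730, 22]]))
```

Here the first applicant (low score, high debt-to-income) lands in a "likely default" leaf and the second in a "likely repay" leaf, mirroring the thresholds described above.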
Customer churn prediction (telecommunications)
CART can predict customer churn using features like customer tenure, average monthly bill, and call volume to build a decision tree that highlights factors contributing to churn. For example, the algorithm may determine that customers with a tenure of less than one year are more likely to churn, or that customers with a high average monthly bill are more likely to churn. The leaf nodes represent the predicted churn outcome for each customer.
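Because CART learns explicit thresholds, the fitted tree can be printed as human-readable rules. A sketch using scikit-learn's `export_text` on synthetic churn data (feature names and values are made up for the example):

```python
# Inspecting the thresholds a CART tree learned; data is synthetic.
from sklearn.tree import DecisionTreeClassifier, export_text

# Features: [tenure_months, avg_monthly_bill]; label: 1 = churned.
X = [[3, 80], [6, 95], [10, 70], [24, 60], [36, 55], [48, 40]]
y = [1, 1, 1, 0, 0, 0]

tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)

# export_text renders the learned split thresholds as if/else rules.
rules = export_text(tree, feature_names=["tenure_months", "avg_monthly_bill"])
print(rules)
```

The printed rules show exactly which feature and cut-off the tree chose (e.g. a tenure threshold separating short-tenure churners from long-tenure customers), which is the interpretability benefit highlighted in this example.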
Use cases
- Predictive modeling
- Data mining
- Credit risk assessment (example above)
- Customer churn prediction (example above)
Notes or pitfalls
- Particularly useful when the data is complex and non-linear, and when relationships between features and outcomes are not well understood.
- CART produces a binary tree and identifies explicit feature thresholds and splits, which makes the resulting model easy to interpret.
- Unconstrained trees can overfit the training data; limiting tree depth or pruning the tree is a common remedy.
Related terms
- Decision tree
- Classification
- Regression