Innovative Decision Trees: Machine Learning Inspired by Nature
Introduction to Evolutionary Decision Trees
As our understanding of biology has significantly advanced, it has become a pivotal source of inspiration for engineers aiming to tackle complex challenges and create innovative solutions.
A notable instance of this is the Japanese high-speed train, the Shinkansen, which travels at speeds over 300 km/h. During its development, engineers faced a substantial obstacle: as the train entered a tunnel at speed, it compressed the air ahead of it, producing a loud boom at the tunnel exit that disturbed nearby residents.
To counter this problem, they looked to an unexpected model: the Kingfisher. This bird’s streamlined beak allows it to dive into water with minimal splash.
By mimicking this bird's design, engineers not only resolved the noise issue but also decreased the train's energy consumption by 15% and increased its speed by 10%.
Drawing inspiration from biological principles can also be applied in machine learning, particularly through the concept of Evolutionary Decision Trees.
Article Outline
In this discussion, I will delve into the concept of Evolutionary Decision Trees. These classifiers utilize evolutionary algorithms that draw on principles of biological evolution to create more robust and effective decision trees.
By the end of this article, you will understand:
- What a Decision Tree is.
- How Evolutionary Algorithms can be used to construct a Decision Tree.
- How Evolutionary Decision Trees compare to other classifiers in terms of performance.
Dataset Overview
To illustrate the concepts covered in this article, I will refer to a dataset derived from an airline passenger satisfaction survey. More details about this dataset can be found here.
The primary aim is to predict the probability that a customer will be satisfied with the airline’s services. Such insights can be essential for a company's decision-making, allowing them to identify which aspects of their services require improvement and the urgency of these enhancements.
Let’s begin by exploring the fundamentals of decision trees.
1. Understanding Decision Trees
Decision Trees are classifiers that utilize a flowchart-like structure to make classifications based on decision rules derived from data features.
Below is an illustration of a Decision Tree trained on the airline passenger satisfaction survey using the Scikit Learn Decision Tree module.
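To make this concrete, below is a minimal sketch of how such a tree can be trained and drawn with scikit-learn. The file name and column names (`airline_satisfaction.csv`, `satisfaction`) are assumptions; adapt them to your copy of the dataset.

```python
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, plot_tree

# Hypothetical file and column names; adapt them to the actual survey file.
df = pd.read_csv("airline_satisfaction.csv")
y = (df["satisfaction"] == "satisfied").astype(int)
X = pd.get_dummies(df.drop(columns=["satisfaction"]))

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# A shallow tree keeps the resulting flowchart readable.
clf = DecisionTreeClassifier(max_depth=3, random_state=42)
clf.fit(X_train, y_train)
print(f"Test accuracy: {clf.score(X_test, y_test):.3f}")

plot_tree(clf, feature_names=list(X.columns), filled=True)
plt.show()
```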
This decision tree reveals that for business travelers, the primary factor contributing to customer satisfaction is the ease of online boarding; an efficient online boarding process enhances customer satisfaction. It also emphasizes the significance of inflight service quality, particularly Wi-Fi.
Decision trees have become popular for classification tasks due to their numerous benefits:
- They are intuitive and easy to interpret, mirroring human reasoning.
- They can handle both numerical and categorical data.
- Their hierarchical structure allows for effective use of the available variables.
The algorithms that construct decision trees typically employ a greedy top-down recursive partitioning strategy.
The root node represents the source dataset, which is divided into subsets (child nodes) based on specific criteria. This process continues until each node's subset contains only instances of a single class, or further division does not enhance predictive accuracy.
The criteria used to determine the best way to create splits and tests at nodes can vary between algorithms. Common metrics include Information Gain (or Entropy) and Gini Impurity. These metrics measure impurity, achieving a value of 0 when all samples in a node belong to a single class, and reaching maximum values with a uniform class distribution.
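Both metrics fit in a few lines of NumPy; the toy label lists below are purely illustrative:

```python
import numpy as np

def gini_impurity(labels):
    """Gini impurity: 0 for a pure node, maximal for a uniform class mix."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def entropy(labels):
    """Shannon entropy in bits: 0 for a pure node, log2(k) for k uniform classes."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

print(gini_impurity(["sat", "sat", "sat"]))             # 0.0 (pure node)
print(gini_impurity(["sat", "unsat", "sat", "unsat"]))  # 0.5 (uniform mix)
print(entropy(["sat", "unsat"]))                        # 1.0 bit (uniform mix)
```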
However, this greedy approach has two significant drawbacks:
1. It may lead to suboptimal solutions.
2. It can create overly complex trees that fail to generalize well, resulting in overfitting.
Several techniques have been proposed to mitigate these issues:
- Pruning: the decision tree is first grown fully, then reduced in size by removing insignificant nodes or subtrees.
- Ensemble trees: multiple trees are built and the final classification is determined by a specific rule, often a voting scheme, though this sacrifices the interpretability of a single decision tree.
Consequently, exploring alternative methods for generating model trees is crucial. In this context, Evolutionary Algorithms (EAs) have gained notable attention.
EAs conduct a comprehensive global search in the solution space rather than a localized one, allowing them to better address attribute interactions compared to greedy methods.
Let’s examine how they operate.
2. Utilizing Evolutionary Algorithms for Decision Trees
Evolutionary Algorithms are search heuristics that emulate the mechanisms of natural biological evolution.
In this framework, each "individual" in a population signifies a potential solution to a specific problem. Each individual is assessed based on a fitness value, which measures its effectiveness as a solution. The initial population is usually generated randomly and evolves toward more optimal areas of the search space.
During each generation, a selection process ensures that individuals with higher fitness values have a greater likelihood of reproducing.
Moreover, the population undergoes transformations through genetic-inspired operations such as:
- Crossover: combining information from two individuals to create offspring.
- Mutation: applying small random alterations to individuals.
This iterative process continues until a stopping criterion is met. The fittest individual is selected as the optimal solution.
EA-based decision trees offer a compelling alternative to conventional methods for the following reasons:
- They are randomized search techniques, allowing them to escape the local optima that the greedy strategy can get trapped in.
- They preserve the comprehensibility of decision trees, unlike ensemble methods.
- They can optimize multiple objectives at once by consolidating them into the fitness value.
2.1. Initial Population
In Evolutionary Decision Trees, individuals are represented as trees, with the initial population consisting of randomly generated trees.
Random trees can be created as follows (a minimal sketch follows this list):
- Starting from a root node with two child nodes, the algorithm decides, with a predefined probability p, whether to split each child further or to turn it into a terminal node (leaf).
- If a node is split, an attribute and a split value are selected at random.
- If a node becomes a terminal node (leaf), a random class label is assigned to it.
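Here is one way to implement this procedure. The `Node` class and the assumption that features are scaled to [0, 1] are simplifications of mine, not a fixed part of the method:

```python
import random

class Node:
    """A binary decision-tree node; it is a leaf when `label` is set."""
    def __init__(self, feature=None, threshold=None, left=None, right=None, label=None):
        self.feature, self.threshold = feature, threshold
        self.left, self.right, self.label = left, right, label

def random_tree(n_features, classes, p_leaf=0.3, depth=0, max_depth=5):
    # With probability p_leaf (or once max_depth is reached),
    # stop and emit a leaf carrying a random class label.
    if depth >= max_depth or random.random() < p_leaf:
        return Node(label=random.choice(classes))
    # Otherwise split on a random attribute at a random threshold,
    # assuming features have been scaled to [0, 1].
    return Node(
        feature=random.randrange(n_features),
        threshold=random.random(),
        left=random_tree(n_features, classes, p_leaf, depth + 1, max_depth),
        right=random_tree(n_features, classes, p_leaf, depth + 1, max_depth),
    )

# The initial population is simply a list of random trees.
population = [random_tree(n_features=10, classes=[0, 1]) for _ in range(50)]
```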
2.2. Fitness Function
The objective of a classifier is to achieve optimal predictive accuracy for unseen data. Decision Tree classifiers must also manage the final tree's size, as a smaller tree may lead to underfitting, while a more complex tree could result in overfitting.
Thus, a fitness function can be defined to balance these two criteria:

Fitness = α₁·f₁ + α₂·f₂

Where:
- f₁ represents accuracy on the training set,
- f₂ penalizes the depth of the tree,
- α₁ and α₂ are weighting parameters to be determined.
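A possible implementation, reusing the `Node` class from the previous sketch; the weights `alpha1` and `alpha2` are illustrative values, not tuned ones:

```python
def predict(node, x):
    """Route a sample down the tree until it reaches a leaf label."""
    while node.label is None:
        node = node.left if x[node.feature] <= node.threshold else node.right
    return node.label

def tree_depth(node):
    if node.label is not None:
        return 0
    return 1 + max(tree_depth(node.left), tree_depth(node.right))

def fitness(tree, X, y, alpha1=1.0, alpha2=0.05):
    """Fitness = alpha1 * f1 + alpha2 * f2, with f1 the training accuracy
    and f2 = -depth so that deeper trees are penalized."""
    accuracy = sum(predict(tree, x) == label for x, label in zip(X, y)) / len(y)
    return alpha1 * accuracy + alpha2 * (-tree_depth(tree))
```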
2.3. Selection Methods
Several methods exist for selecting the parents that will generate the next generation:
- Fitness-proportionate selection (roulette-wheel selection): each individual's fitness relative to the population total determines its probability of being selected.
- Tournament selection: parents are chosen by picking the fittest individual from a randomly drawn group.
- Elitism: the top-performing individuals are carried over directly to the next generation, preserving the most successful traits.
An individual may be chosen multiple times, allowing for greater genetic dissemination.
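Tournament selection, for instance, fits in a few lines; `population` and the per-individual fitness list are assumed to come from the earlier sketches:

```python
import random

def tournament_select(population, fitnesses, k=3):
    """Return the fittest of k randomly drawn contestants."""
    contestants = random.sample(range(len(population)), k)
    winner = max(contestants, key=lambda i: fitnesses[i])
    return population[winner]

# Example usage, given a population of trees and a labelled training set:
# fits = [fitness(t, X, y) for t in population]
# parents = [tournament_select(population, fits) for _ in range(len(population))]
```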
2.4. Crossover Mechanism
Crossover offspring are produced by merging pairs of parents from the current generation. Two individuals are selected as parents, and a random node is chosen in both trees. The new individual (offspring) is formed by swapping the sub-tree of the first parent with that of the second.
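A sketch of this sub-tree swap, building on the earlier `Node` class. Both parents are deep-copied first so the current generation is left intact:

```python
import copy
import random

def all_nodes(node):
    """Collect every node of the tree in pre-order."""
    if node.label is not None:
        return [node]
    return [node] + all_nodes(node.left) + all_nodes(node.right)

def crossover(parent_a, parent_b):
    """Copy parent A, then graft a random sub-tree of parent B
    onto a randomly chosen node of the copy."""
    child = copy.deepcopy(parent_a)
    target = random.choice(all_nodes(child))
    donor = copy.deepcopy(random.choice(all_nodes(parent_b)))
    target.__dict__.update(donor.__dict__)  # replace the node in place
    return child
```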
2.5. Mutation Process
Mutations involve small random modifications made to individuals in the population, essential for maintaining genetic diversity and enabling the genetic algorithm to explore a broader solution space.
In the context of Decision Trees, this can entail randomly altering an attribute and split value of a randomly selected node.
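A minimal sketch, reusing `all_nodes` from the crossover snippet:

```python
import random

def mutate(tree, n_features):
    """Re-draw the attribute and split value of one random internal node."""
    internal = [n for n in all_nodes(tree) if n.label is None]
    if internal:  # the tree may consist of a single leaf
        node = random.choice(internal)
        node.feature = random.randrange(n_features)
        node.threshold = random.random()
    return tree
```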
2.6. Stopping Criteria
The algorithm is deemed to have converged when the fitness of the best individual within the population no longer improves over a specified number of generations. To manage computation time during slow convergence, a maximum generation count is established beforehand.
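Putting the pieces together, a possible main loop looks like this. It reuses the helpers sketched above, and the hyperparameter defaults are illustrative, not recommendations:

```python
import copy
import random

def evolve(X, y, n_features, classes, pop_size=50, max_generations=200,
           patience=20, p_crossover=0.6, p_mutation=0.4):
    """Evolve trees until the best fitness stalls for `patience` generations."""
    population = [random_tree(n_features, classes) for _ in range(pop_size)]
    best, best_fit, stale = None, float("-inf"), 0
    for _ in range(max_generations):
        fits = [fitness(t, X, y) for t in population]
        gen_best = max(fits)
        if gen_best > best_fit:
            best, best_fit, stale = population[fits.index(gen_best)], gen_best, 0
        else:
            stale += 1
            if stale >= patience:  # converged: no improvement for `patience` generations
                break
        # Elitism: the best individual survives unchanged.
        next_gen = [copy.deepcopy(best)]
        while len(next_gen) < pop_size:
            if random.random() < p_crossover:
                child = crossover(tournament_select(population, fits),
                                  tournament_select(population, fits))
            else:
                child = copy.deepcopy(tournament_select(population, fits))
            if random.random() < p_mutation:
                mutate(child, n_features)
            next_gen.append(child)
        population = next_gen
    return best
```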
3. Performance of Evolutionary Decision Trees Compared to Other Classifiers
While Evolutionary Decision Trees present an intriguing approach, how do they stack up against traditional Machine Learning algorithms?
3.1. Experimental Analysis
To gauge the approach's effectiveness, I implemented the model and trained it on the dataset from the airline passenger satisfaction survey.
The objective was to identify factors contributing to high customer satisfaction levels. In this context, a straightforward yet robust model is essential for elucidating customer satisfaction pathways.
#### Dataset Characteristics
The dataset is substantial, comprising over 100,000 entries.
- Customer data: gender, age, loyalty status, travel type (personal or business), flight class (business, economy, economy plus), and flight distance.
- Satisfaction ratings: inflight Wi-Fi, convenience of departure/arrival times, ease of online booking, gate location, food and beverage quality, online boarding, seat comfort, inflight entertainment, onboard service, legroom, baggage handling, check-in service, cleanliness, and overall inflight service.
The target variable denotes customer satisfaction, classified as either 'satisfied' or 'neutral/dissatisfied'.
#### Methodology Overview
Here’s a brief overview of the steps undertaken (a sketch of the baseline comparison follows the list):
1. Data pre-processing: categorical variables were converted into indicator variables, and the dataset was split into training and testing subsets.
2. Modeling and evaluation: each model was trained on the training subset and evaluated on the test subset.
3. Model performance comparison: the Evolutionary Decision Tree (EDT) approach was compared exclusively with tree-based models, namely Decision Tree (DT) and Random Forest (RF), with a maximum tree depth of 3 as a constraint. The population size for EDT and the number of estimators for RF were both set to 10 to allow a consistent comparison within a reasonable computational budget.
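As an illustration of the two scikit-learn baselines under these constraints (reusing the train/test split from the first sketch; the EDT is evaluated the same way):

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier

models = {
    "DT": DecisionTreeClassifier(max_depth=3, random_state=42),
    "RF": RandomForestClassifier(n_estimators=10, max_depth=3, random_state=42),
}
for name, model in models.items():
    model.fit(X_train, y_train)
    print(f"{name}: test accuracy = {model.score(X_test, y_test):.3f}")
```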
#### Results
Here are the results:
In this context, the performance of EDT is comparable to that of the other two Machine Learning algorithms.
However, the EDT model differentiates itself by producing:
- A single decision tree that can be visualized, unlike the RF model, which aggregates multiple decision trees.
- A more robust decision tree than the simpler DT model's, since it is the best tree found across the whole population.
Below is a visual representation of the best decision tree from the EDT population, with the algorithm trained to a maximum depth of 2.
3.2. Broader Experimental Validation of the EDT Approach
The previous experiments alone do not sufficiently evaluate the performance and reliability of Evolutionary Decision Trees in contrast to other Machine Learning algorithms.
Since only one dataset was used, the experiment cannot capture all the factors at play, such as the number of classes in the target variable, the nature of the features, and the number of observations.
In a study, the authors compared the performance of an EDT approach with various Machine Learning methods using real-world datasets from UCI.
Let’s review the findings of that study.
#### Dataset Summary
The table below presents a brief overview of the datasets employed:
The datasets exhibit significant variation in terms of observation counts, feature counts, and target classes. The first dataset is particularly challenging due to its high class count and limited observations.
#### Methodology Recap
Here’s a summary of the methodology the authors adopted to evaluate the EDT model against more traditional Machine Learning algorithms:
- The EDT model was trained with 500 generations, a population size of 400, and crossover/mutation probabilities of 0.6/0.4, using stochastic uniform selection with elitism.
- Model performance was assessed using the 5x2 cross-validation method, illustrated in the sketch below.
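For reference, 5x2 cross-validation is simply 2-fold cross-validation repeated five times with fresh shuffles. A scikit-learn sketch, using a plain decision tree as a stand-in model:

```python
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier

# 2 folds, 5 repeats = the 5x2 cross-validation scheme.
cv = RepeatedStratifiedKFold(n_splits=2, n_repeats=5, random_state=0)
scores = cross_val_score(DecisionTreeClassifier(max_depth=3), X, y, cv=cv)
print(f"mean accuracy: {scores.mean():.3f} (+/- {scores.std():.3f})")
```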
#### Results Summary
As noted by the authors, two key conclusions can be drawn from the results:
1. Tree-based algorithms generally outperform the other Machine Learning algorithms across most datasets. This can be attributed to decision trees' inherent ability to identify significant features, making them well suited to datasets where the relationships between targets and features are complex.
2. The results on the abalone dataset are notably poor due to its 28 classes and limited observations (only 210). However, the EDT model achieves the highest accuracy on it, demonstrating its capability to handle challenging datasets and to mitigate overfitting.
It's important to note that the EDT results were achieved with default parameters; tuning these parameters could yield even better performance.
References
[1] R. Barros et al., A Survey of Evolutionary Algorithms for Decision Tree Induction, 2011
[2] D. Jankowski et al., Evolutionary Algorithm for Decision Tree Induction, 2016
[3] S. Cha, Constructing Binary Decision Trees using Genetic Algorithms, 2008
[4] D. Carvalho et al., A Hybrid Decision Tree/Genetic Algorithm Method for Data Mining, 2003
[5] Wikipedia, Traveling salesman problem
[6] Wikipedia, Genetic Algorithm