
# Basic understanding of Decision Trees

The decision tree algorithm is a powerful and popular predictive machine learning technique. It is a type of supervised learning algorithm, used for both classification and regression, which is why it is also known as CART (Classification and Regression Tree).

A decision tree typically starts with a single node, which branches into possible outcomes. Each of those outcomes leads to additional nodes, which branch off into further possibilities. The result looks like an upside-down tree: the stem is the root node, the branches are the different conditional attributes, and the leaves are the final results.

The advantages of the CART algorithm are:

• Simple to understand, interpret and visualize.

• Can handle both numerical and categorical data.

• Can handle multi-output problems.

• Non-linear relationships between parameters don't affect its performance.

• Requires relatively little data preparation effort from the user.

The differences between classification trees and regression trees:

• Classification trees are used when the dependent variable is categorical, i.e., for problems with categorical outcomes such as 'Yes'/'No', 'True'/'False', or '1'/'0'. Regression trees are used when the dependent variable is continuous, i.e., for problems where a continuous value such as a product price or a profit needs to be predicted.

• A classification tree determines a set of logical if-then conditions to classify observations. A regression tree is used when the target variable is numerical or continuous in nature.

• In classification trees, the value returned by a terminal node is the mode of the training observations that reach it. In regression trees, it is the mean (average) response.
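The contrast above (mode at classification leaves, mean at regression leaves) can be sketched with scikit-learn on made-up data; all values below are invented purely for illustration:

```python
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

# One numeric feature, e.g. a customer's age (made-up values)
X = [[25], [35], [45], [52], [23], [43], [52], [48]]

# Classification: categorical target ('Yes'/'No' encoded as 1/0)
y_class = [0, 0, 1, 1, 0, 1, 1, 1]
clf = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y_class)
print(clf.predict([[30], [50]]))   # each leaf outputs the mode (majority class)

# Regression: continuous target, e.g. a price
y_reg = [12.0, 14.5, 30.2, 33.1, 11.8, 29.5, 34.0, 31.7]
reg = DecisionTreeRegressor(max_depth=2, random_state=0).fit(X, y_reg)
print(reg.predict([[30], [50]]))   # each leaf outputs the mean response
```

The same feature matrix drives both trees; only the target type (and therefore the leaf value) differs.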

Let's understand what classification is:

• Classification is a technique for dividing a dataset into different groups or categories by assigning a label to each data point.

• That means we take the data, analyze it, and assign each data point to a labelled group based on a specific condition.

• We classify data so that we can analyze it and predict which category new data belongs to. We train the machine using a classification algorithm so that, whenever new data is given to it, it can predict the group the data falls into.

• For example, we may want a machine to predict whether certain documents are genuine or fake, whether information is valid or invalid, whether a transaction is fraudulent or not, etc.

What are the different types of classification?

There are multiple ways to train the machine, and we can choose any of the following types for predictive analysis:

• Decision tree

• Random forest

• Naive Bayes

• Logistic regression

Below are the two main reasons for using decision trees:

1. Decision trees resemble the way humans think when making a decision, which makes them easy to understand.

2. The tree-like structure makes the logic behind the model easy to follow, so it is easy to interpret why the classifier made a particular decision.

Decision Tree Terminologies

• Root node: It is the start of a decision tree. It represents the entire dataset, which further gets divided into two or more homogeneous sets.

• Leaf node: The final output nodes; the tree cannot be split any further after a leaf node.

• Splitting: It is the process of dividing the decision node/root node into sub-nodes according to the given conditions.

• Branch/Sub-tree: A sub-tree formed by splitting a node of the tree.

• Pruning: It is the process of removing the unwanted branches from the tree. It is the opposite of Splitting.

• Parent/Child node: A node that splits into sub-nodes is the parent of those sub-nodes, and the sub-nodes are its children.
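As a small sketch, these terms can be inspected on a fitted scikit-learn tree via its `tree_` attribute; the iris dataset is used here purely as a stand-in:

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
clf = DecisionTreeClassifier(max_depth=2, random_state=0).fit(iris.data, iris.target)

t = clf.tree_
print("total nodes:", t.node_count)                 # root + internal + leaf nodes
print("root splits on:", iris.feature_names[t.feature[0]])

# A node is a leaf when it has no children (children are marked -1)
leaves = [i for i in range(t.node_count) if t.children_left[i] == -1]
print("leaf nodes:", leaves)
```

Node 0 is always the root; every split creates a parent/child relationship, and the nodes with no children are the leaves.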

First, Visualize the Decision Tree

• Decide which question to ask, and when.

• Manually prepare the decision tree by looking at the dataset. Which attribute should you pick first?

• The answer is to determine the attribute that best classifies the training data.

But how do we choose the best attribute? In other words, how does the tree decide where to split, and how does it choose its root node? The following measures help decide where a tree should split, i.e., which attribute is best as the root node:

Entropy, Information gain and Gini index.

Let’s understand these terminologies one by one:

1. Entropy: Entropy measures the impurity of a split, i.e., the randomness in the information being processed. The higher the entropy, the harder it is to draw conclusions from that information.

Let's understand what impurity means:

Suppose there is a basket full of oranges and a bowl containing labels that all read 'orange'. If we pick one fruit from the basket and one label from the bowl, the probability that the label matches the fruit is 1, so the impurity is zero. But what if there are 8 different fruits in the basket and 8 different labels in the bowl? The probability of matching a randomly picked fruit with a randomly picked label is now less than 1, since any combination is possible. In this case the impurity is non-zero. I hope the concept of impurity is clear.

Now, let's understand what entropy is:

As stated earlier, entropy is a measure of impurity. When the probability is 0 or 1, the data is completely pure, and in that case the entropy is 0. When the probability is 0.5, the entropy is at its maximum. This follows from the entropy formula:

Entropy(S) = - Σ p(i) * log2(p(i)), summed over the classes i, where p(i) is the proportion of class i in the sample S.

2. Information gain: Constructing a decision tree is all about finding the attribute that returns the highest information gain. That means we select as a node the attribute that gives the highest information gain. Information gain is the decrease in entropy after the dataset is split on the basis of an attribute.
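The behavior of entropy at probabilities 0, 1, and 0.5 can be verified with a few lines of Python (a minimal sketch for the binary, two-class case):

```python
import math

def entropy(p):
    """Binary entropy for a class probability p."""
    if p in (0.0, 1.0):        # a pure node carries no uncertainty
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

print(entropy(0.0))   # pure node: entropy is 0
print(entropy(1.0))   # pure node: entropy is 0
print(entropy(0.5))   # maximum impurity: entropy is 1
```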

• Information gain (IG) measures the reduction in entropy and decides which attribute should be selected as a decision node. The formula for IG, where S is the sample of the total collection, is:

• IG = Entropy(S) - [(weighted average) x Entropy(each feature)]
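The formula above can be sketched in Python for a binary split; the labels are hypothetical, chosen so the split separates the classes perfectly:

```python
import math
from collections import Counter

def H(labels):
    """Entropy of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(parent, left, right):
    """Entropy of the parent minus the weighted entropy of the children."""
    n = len(parent)
    weighted = (len(left) / n) * H(left) + (len(right) / n) * H(right)
    return H(parent) - weighted

# Hypothetical labels: a split that separates the classes perfectly
parent = [1, 1, 1, 0, 0, 0]
print(information_gain(parent, [1, 1, 1], [0, 0, 0]))
```

A perfect split leaves both children pure (entropy 0), so the gain equals the parent's entire entropy; a useless split leaves the children as mixed as the parent, so the gain is 0.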

3. Gini index: The Gini index is another measure of impurity (or purity) used in building decision trees. It is the default criterion of scikit-learn's DecisionTreeClassifier.

• The formula for the Gini index is: Gini = 1 - Σ p(i)², summed over the classes i. Computing the Gini index is faster than computing entropy because it involves no logarithmic calculations. Entropy ranges from 0 to 1 (for two classes), while the Gini index ranges from 0 to 0.5.
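The Gini formula above is short enough to sketch directly, confirming the 0-to-0.5 range for two classes:

```python
def gini(probs):
    """Gini index for a list of class probabilities."""
    return 1 - sum(p ** 2 for p in probs)

print(gini([1.0, 0.0]))    # pure node: Gini is 0
print(gini([0.5, 0.5]))    # maximum for two classes: Gini is 0.5
```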

Python code implementation of Decision Tree Classifier

Let's see a decision tree implementation where we use both entropy and Gini as the criterion. We will use the Titanic data from Kaggle to build the model with DecisionTreeClassifier; here `data_final` is the preprocessed Titanic dataframe.

```python
from sklearn import tree
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Forming features (X) and target (y)
X = data_final.drop('Survived', axis='columns')
y = data_final['Survived']

# Splitting the data into training data and testing data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=123)

# Check the shape of X_train and X_test
print("Training data dimension : ", X_train.shape)
print("Testing data dimension : ", X_test.shape)

# Instantiate DecisionTreeClassifier with criterion Gini index
dtc_gini = tree.DecisionTreeClassifier(criterion='gini', max_depth=3, random_state=0)

# Fit the model
dtc_gini.fit(X_train, y_train)

score_gini = dtc_gini.score(X_train, y_train)
print('Training Accuracy score with criterion Gini index: {0:0.2f} %'.format(score_gini * 100.0))

# Predict the test set results with criterion Gini index
y_pred_gini = dtc_gini.predict(X_test)

# Evaluate prediction accuracy with criterion Gini index
accuracy_gini = accuracy_score(y_test, y_pred_gini)
print("Testing Accuracy score with criterion Gini index: %.2f %%" % (accuracy_gini * 100.0))

# Instantiate DecisionTreeClassifier with criterion entropy
dtc_entropy = tree.DecisionTreeClassifier(criterion='entropy', max_depth=3, random_state=0)

# Fit the model
dtc_entropy.fit(X_train, y_train)

score_entropy = dtc_entropy.score(X_train, y_train)
print('Training Accuracy score with criterion Entropy: {:.2f} %'.format(score_entropy * 100.0))

# Predict the test set results with criterion entropy
y_pred_entropy = dtc_entropy.predict(X_test)

# Evaluate prediction accuracy with criterion entropy
accuracy_entropy = accuracy_score(y_test, y_pred_entropy)
print('Testing Accuracy score with criterion Entropy: {0:0.2f} %'.format(accuracy_entropy * 100.0))
```

If you wish to try this example and execute it yourself, please use the following link to open the Kaggle notebook.
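A fitted tree can also be printed as text to see the splits and leaves directly. Since the preprocessed Titanic dataframe `data_final` is not reproduced in this post, the sketch below uses the iris dataset as a stand-in:

```python
from sklearn.datasets import load_iris
from sklearn import tree

iris = load_iris()
clf = tree.DecisionTreeClassifier(criterion='gini', max_depth=3, random_state=0)
clf.fit(iris.data, iris.target)

# export_text prints each split condition and the class at every leaf
print(tree.export_text(clf, feature_names=list(iris.feature_names)))
```

The same call works on the Titanic classifiers above by passing `dtc_gini` and the Titanic feature names instead.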

Hope you enjoyed learning about the decision tree algorithm.