At almost every moment in life we have to find solutions to our problems. A single problem often has more than one solution, and we must decide which one fits best. The decision tree is one of the most prominent algorithms for this kind of choice: it is simple to implement and easy to interpret. It is a supervised learning algorithm used to solve both classification and regression problems (CART, Classification and Regression Trees). It works in two steps, learning and classification. In the learning step, the model is built from the training data. In the classification step, the model is tested on data whose responses are known.

A decision tree is a flow-chart-like graph in which each internal node denotes a test on an attribute value, each branch represents an outcome of that test, and each leaf node represents a class. Each internal node divides its records into two or more branches.

The structure above is a simple representation of a decision tree.
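The same flow-chart idea can be sketched as a small data structure. This is only an illustration: the Node class, the classify function and the weather/umbrella tree are hypothetical and do not come from the example used later in this article.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Node:
    """Internal node (tests an attribute) or leaf (holds a class label)."""
    attribute: Optional[str] = None   # attribute tested at an internal node
    branches: Dict[str, "Node"] = field(default_factory=dict)  # outcome -> child
    label: Optional[str] = None       # class label, set only on leaf nodes


def classify(node: Node, record: Dict[str, str]) -> str:
    """Follow the branch matching each test outcome until a leaf is reached."""
    while node.label is None:
        node = node.branches[record[node.attribute]]
    return node.label


# Hypothetical tree, for illustration only.
tree = Node(
    attribute="weather",
    branches={
        "sunny": Node(label="play outside"),
        "rainy": Node(
            attribute="has_umbrella",
            branches={
                "yes": Node(label="play outside"),
                "no": Node(label="stay in"),
            },
        ),
    },
)

print(classify(tree, {"weather": "rainy", "has_umbrella": "no"}))  # stay in
```

Each internal node stores the attribute it tests, each dictionary key is a branch (a test outcome), and a node with a label is a leaf.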


An attribute selection measure is a heuristic for selecting the splitting criterion. The attribute with the best score is chosen as the splitting attribute. There are three popular attribute selection measures: information gain, Gini index and gain ratio.

Information Gain:

This measure is based on the values of an attribute. Let node N represent, or hold, the records of the partition D. The attribute with the highest information gain is chosen as the splitting attribute for node N. This attribute minimizes the information needed to classify the records in the resulting partitions and reflects the least randomness, or impurity, in those partitions.




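Entropy(D) = - ∑ pi * log2(pi)   (Formula 1)

Weighted average entropy = (|D1|/|D|) * Entropy(D1) + (|D2|/|D|) * Entropy(D2)   (Formula 2)

Information gain = Entropy(D) - Weighted average entropy   (Formula 3)

pi is the probability that a record in D (the parent node) belongs to class Ci.

D1, D2 represent the sets of records in the child nodes.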
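Entropy, the weighted average entropy of the child nodes and information gain (Formulas 1 to 3 in the worked example further down) can be sketched in a few lines of Python. The function names and the sample labels are assumptions made for this illustration.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Formula 1: -sum(p_i * log2(p_i)) over the classes present in labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def weighted_entropy(children):
    """Formula 2: child entropies weighted by each child's share of the records."""
    total = sum(len(child) for child in children)
    return sum(len(child) / total * entropy(child) for child in children)


def information_gain(parent, children):
    """Formula 3: parent entropy minus the weighted child entropy."""
    return entropy(parent) - weighted_entropy(children)


# Made-up labels: a split that separates the two classes perfectly.
parent = ["yes", "yes", "no", "no"]
children = [["yes", "yes"], ["no", "no"]]
print(information_gain(parent, children))  # 1.0
```

A split that leaves the class mix unchanged in every child would give a gain of 0, so a higher gain means purer child nodes.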


Gini Index:

The Gini index, or Gini impurity, measures the probability that a randomly chosen element is classified wrongly when it is labeled at random according to the class distribution in the node. If all elements belong to a single class, the node is called "pure". The attribute that maximizes the reduction in impurity (or has the minimum Gini index) is selected as the splitting attribute.



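Gini(D) = 1 - ∑ pi^2

Gini index of the split = (|D1|/|D|) * Gini(D1) + (|D2|/|D|) * Gini(D2)

pi is the probability that a record in D (the parent node) belongs to class Ci.

D1, D2 represent the sets of records in the child nodes.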
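As a rough sketch, the Gini impurity of a node and the weighted Gini index of a split into child nodes D1, D2 can be computed as follows. The function names and the sample labels are illustrative assumptions.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 1 - sum(p_i ** 2); 0 means the node is pure."""
    total = len(labels)
    return 1.0 - sum((n / total) ** 2 for n in Counter(labels).values())


def gini_split(children):
    """Weighted Gini index of the child nodes D1, D2, ... produced by a split."""
    total = sum(len(child) for child in children)
    return sum(len(child) / total * gini(child) for child in children)


print(gini(["a", "a", "a"]))                 # 0.0 (pure node)
print(gini(["a", "b"]))                      # 0.5
print(gini_split([["a", "a"], ["a", "b"]]))  # 0.25
```

The split with the smallest weighted Gini index is the one that leaves the child nodes purest.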


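Gain Ratio:

Information gain is biased toward tests with many outcomes: it prefers attributes that have a large number of distinct values. Gain ratio corrects this bias by dividing the information gain by the split information, which is the entropy of the partition sizes themselves. The attribute with the maximum gain ratio is selected as the splitting attribute. However, if the split information approaches 0, the ratio becomes unstable.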
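A minimal sketch of gain ratio, assuming the usual C4.5-style definition (information gain divided by split information). The function names, the guard for a zero split information and the sample data are assumptions made for this illustration.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """-sum(p_i * log2(p_i)) over the classes present in labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def gain_ratio(parent, children):
    """Information gain divided by split information."""
    total = len(parent)
    gain = entropy(parent) - sum(len(c) / total * entropy(c) for c in children)
    # Split information: entropy of the partition sizes, ignoring class labels.
    split_info = -sum(len(c) / total * log2(len(c) / total) for c in children if c)
    if split_info == 0:  # only one non-empty branch, so the ratio is undefined
        return 0.0
    return gain / split_info


parent = ["yes", "yes", "no", "no"]
two_way = [["yes", "yes"], ["no", "no"]]    # two pure branches
four_way = [["yes"], ["yes"], ["no"], ["no"]]  # one branch per record

print(gain_ratio(parent, two_way))   # 1.0
print(gain_ratio(parent, four_way))  # 0.5
```

Both splits have an information gain of 1.0, but the four-way split is penalized because its split information is larger.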

The following illustration analyzes sleep duration, which is crucial for children's growth.

1. SLEEPING HOURS is specified as GOOD (sleep > 8 hours) or BAD (sleep < 8 hours or disturbed sleep).

2. PLAY is specified as INDOOR or OUTDOOR.

3. FOOD is specified as LESS (unhealthy or small amount) or NORMAL (healthy).

4. NIGHT BATH is specified as YES or NO.

Let us calculate the information gain and Gini index for each attribute.

In the table above, SLEEPING HOURS is the parent node.

Sleeping Hours - Parent Node

P(good) = 3/4 = 0.75, P(bad) = 1/4 = 0.25

Using Formula 1, entropy of the parent node = - ∑ P(value) * log2(P(value)) = -(0.75 * log2(0.75) + 0.25 * log2(0.25)) ≈ 0.81




The first internal node chosen is FOOD.

Less (B, G)   Normal (G, G)

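Using Formula 1, we find the entropy of Food(less) on the left side and the entropy of Food(normal) on the right side. Food(less) holds one BAD and one GOOD record, so its entropy is 1. Food(normal) holds two GOOD records, so its entropy is 0.

Using Formula 2, we calculate the weighted average entropy:

Weighted average = (2/4 * 1 + 2/4 * 0) = 0.5

Using Formula 3:

Information gain = entropy(parent) - weighted average entropy(food) = 0.81 - 0.5 = 0.31

Information Gain(Food) = 0.31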
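The FOOD calculation can be checked with a short script. Carrying full precision through, rather than rounding the parent entropy to 0.8, gives a parent entropy of about 0.811 and a gain of about 0.311. The helper name entropy and its probability-argument signature are assumptions made for this sketch.

```python
from math import log2


def entropy(*probabilities):
    """Formula 1, skipping zero probabilities (0 * log2(0) is taken as 0)."""
    return -sum(p * log2(p) for p in probabilities if p > 0)


parent = entropy(3 / 4, 1 / 4)   # Sleeping hours: 3 GOOD, 1 BAD
less = entropy(1 / 2, 1 / 2)     # Food = Less: B, G
normal = entropy(1.0)            # Food = Normal: G, G (pure)

weighted = 2 / 4 * less + 2 / 4 * normal   # Formula 2
gain_food = parent - weighted              # Formula 3

print(round(parent, 3), round(weighted, 3), round(gain_food, 3))  # 0.811 0.5 0.311
```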

The next internal node is NIGHT BATH.


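Using Formula 1, we find the entropy of Night bath(yes) on the left side and the entropy of Night bath(no) on the right side. Night bath(yes) holds three of the four records (two GOOD and one BAD, since the data contain a single BAD record and this branch is not pure), so its entropy is -(2/3 * log2(2/3) + 1/3 * log2(1/3)) ≈ 0.92. Night bath(no) holds one record, so its entropy is 0.

Using Formula 2, we calculate the weighted average entropy for the Night bath node:

Weighted average = (3/4 * 0.92 + 1/4 * 0) ≈ 0.69

Using Formula 3:

Information gain = entropy(parent) - weighted average entropy(bath) = 0.81 - 0.69 = 0.12

Information Gain(Night bath) = 0.12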
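To compare the two candidate splits side by side, the sketch below recomputes both gains at full precision and picks the larger one. The FOOD branches come from the lists given earlier. The NIGHT BATH branches are an assumption inferred from the weights 3/4 and 1/4 and from the single BAD record in the data (Yes: G, G, B and No: G). The function and variable names are also illustrative.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Formula 1 applied to a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(parent, children):
    """Formula 3: parent entropy minus the weighted child entropy (Formula 2)."""
    total = len(parent)
    return entropy(parent) - sum(len(c) / total * entropy(c) for c in children)


parent = ["G", "G", "G", "B"]  # Sleeping hours: 3 GOOD, 1 BAD
splits = {
    "FOOD": [["B", "G"], ["G", "G"]],        # Less, Normal
    # Assumed record lists: Yes = G, G, B and No = G (inferred, not given).
    "NIGHT BATH": [["G", "G", "B"], ["G"]],  # Yes, No
}

gains = {name: information_gain(parent, kids) for name, kids in splits.items()}
best = max(gains, key=gains.get)

for name, gain in gains.items():
    print(f"{name}: {gain:.3f}")  # FOOD: 0.311, NIGHT BATH: 0.123
print("split on:", best)          # split on: FOOD
```

Under these assumptions FOOD has the higher information gain, so of these two attributes it would be preferred as the splitting attribute.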