Decision Tree - Height of Child

Decision trees are a type of supervised machine learning model that predicts the value of a target variable from several input features. Each branch of the decision tree represents a possible outcome.

We can build a decision tree with the help of two attribute-selection measures:

  1. Information Gain

  2. Gini Index


Here, we will try to build a decision tree with the help of these two measures: Information Gain and the Gini Index.


We are going to see which feature predicts the Height of a Child most accurately:


We will take three features, namely Family History, Diet, and Physical Activities. Height of Child will be the parent node.

  1. First, we will use Information Gain. The formula for computing Information Gain is:

Formula 1: Information Gain = Entropy(parent) - Weighted average of the entropy of the children

Entropy is the amount of randomness, i.e. how unpredictable the values are. First we compute the entropy of the parent node (Height of Child). The formula for Entropy is:

Formula 2: Entropy = - Σ p(i) * log2(p(i)), summed over every class i (here: tall and short)

Then we compute the weighted average of the entropy of the children:

Formula 3: Weighted average = Σ (samples in child / samples in parent) * Entropy(child)
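
All of the calculations below can be reproduced in a few lines of Python. The following is a minimal sketch of formulas 1-3; the helper names (entropy, weighted_entropy, information_gain) are our own choices for illustration, not from any particular library:

```python
import math

def entropy(labels):
    """Formula 2: Entropy = -sum over classes of p(i) * log2(p(i))."""
    n = len(labels)
    probs = [labels.count(c) / n for c in set(labels)]
    return -sum(p * math.log2(p) for p in probs if p > 0)

def weighted_entropy(children):
    """Formula 3: weighted average of the children's entropies."""
    total = sum(len(child) for child in children)
    return sum(len(child) / total * entropy(child) for child in children)

def information_gain(parent, children):
    """Formula 1: Entropy(parent) minus the weighted entropy of the children."""
    return entropy(parent) - weighted_entropy(children)
```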

p(short) = fraction of short children in the chart = 1/4 = 0.25

p(tall) = fraction of tall children in the chart = 3/4 = 0.75


Therefore, using formula 2:

Entropy(parent) = - {0.25 * log2(0.25) + 0.75 * log2(0.75)}

= - {-0.5 + (-0.31)}

= 0.81
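
We can check this with the entropy helper sketched above:

```python
parent = ["T", "S", "T", "T"]     # TSTT: 3 tall, 1 short
print(round(entropy(parent), 2))  # 0.81
```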


  • Now we will calculate the Information Gain for our first feature, Family History:


Here the parent node is TSTT; splitting on Family History gives the child nodes TST and T:


First we calculate the entropy of the left child node, where p(tall) = 2/3 = 0.67 and p(short) = 1/3 = 0.33.

So, Entropy(left side: TST) = - {0.67 * log2(0.67) + 0.33 * log2(0.33)}

= 0.92 (using formula 2)


Entropy of the right child node: p(tall) is 1 and p(short) is 0.

So, Entropy(right side: T) = - {1 * log2(1) + 0}

= 0 (using formula 2)


Weighted average, according to formula 3 above:

= (3/4) * 0.92 + (1/4) * 0

= 0.69


Information Gain for Family History, according to formula 1 above, is:

= 0.81 - 0.69

= 0.12
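
The same number falls out of the information_gain helper from the sketch above (continuing the same session):

```python
children = [["T", "S", "T"], ["T"]]  # Family History split: TST | T
print(round(information_gain(parent, children), 2))  # 0.12
```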



  • Information Gain for our second feature, Diet:

Splitting on Diet gives the child nodes TTT and S:

Entropy of the left child node: p(tall) is 1 and p(short) is 0.

Entropy(TTT) = - {1 * log2(1) + 0}

= 0 (using formula 2)


Entropy of the right child node: p(tall) is 0 and p(short) is 1.

Entropy(S) = - {0 + 1 * log2(1)}

= 0 (using formula 2)


Weighted average = (3/4) * 0 + (1/4) * 0

= 0 (using formula 3)


Information Gain for Diet = 0.81 - 0

= 0.81 (using formula 1)
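
Again, checking with the helper from the sketch above:

```python
children = [["T", "T", "T"], ["S"]]  # Diet split: TTT | S
print(round(information_gain(parent, children), 2))  # 0.81
```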


  • Information Gain for the third feature, Physical Activity:

Splitting on Physical Activity gives the child nodes TT and TS:


Entropy of the left child node: p(tall) is 1 and p(short) is 0.

Entropy(TT) = - {1 * log2(1) + 0}

= 0 (using formula 2)


Entropy of the right child node: p(tall) = 1/2 = 0.5 and p(short) = 1/2 = 0.5.

Entropy(TS) = - {0.5 * log2(0.5) + 0.5 * log2(0.5)}

= 1 (using formula 2)


Weighted average = (2/4) * 0 + (2/4) * 1

= 0.5 (using formula 3)



Information Gain for Physical Activities = 0.81 - 0.5

= 0.31 (using formula 1)
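
And the corresponding check:

```python
children = [["T", "T"], ["T", "S"]]  # Physical Activity split: TT | TS
print(round(information_gain(parent, children), 2))  # 0.31
```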



Information Gain(Family History) = 0.12

Information Gain(Diet) = 0.81

Information Gain(Physical Activities) = 0.31

Diet gives the highest Information Gain, so it is the best feature to split on at the root of the tree.
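
As a final sanity check, a library implementation reaches the same conclusion. Below is a minimal sketch using scikit-learn's DecisionTreeClassifier with the entropy criterion; the 0/1 encoding of the features is our own assumption, chosen only so that each column reproduces the splits used above:

```python
from sklearn.tree import DecisionTreeClassifier

# Columns: Family History, Diet, Physical Activity (1 = favourable).
# Rows are encoded so each feature reproduces the splits worked out above.
X = [
    [1, 1, 1],  # tall
    [1, 0, 0],  # short
    [1, 1, 1],  # tall
    [0, 1, 0],  # tall
]
y = ["T", "S", "T", "T"]

clf = DecisionTreeClassifier(criterion="entropy").fit(X, y)
print(clf.tree_.feature[0])  # 1 -> the Diet column is chosen for the root split
```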