How Is Information Gain Calculated?

This post explores the mathematics behind information gain. We’ll start with the basic intuition behind information gain, then explain why it is calculated the way it is.

What is information gain?

Information gain is a measure frequently used in decision trees to determine which variable to split the input dataset on at each step in the tree. Before we formally define this measure we need to first understand the concept of entropy. Entropy measures the amount of information or uncertainty in a variable’s possible values.

How to calculate entropy

Entropy of a random variable X is given by the following formula:

H(X) = -Σ_i [p(x_i) * log₂(p(x_i))]

Here, x_i is the i-th possible value of X, and p(x_i) is the probability that X takes that value.
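As a minimal sketch of the formula in base R, the function below estimates entropy from the observed frequencies of a vector. The name entropy_bits is a hypothetical helper chosen for illustration, not a function from any package.

```r
# Shannon entropy (in bits) of a vector of observed values.
# `entropy_bits` is a hypothetical helper name used only for this illustration.
entropy_bits <- function(x) {
  p <- prop.table(table(x))  # empirical probability of each distinct value
  p <- p[p > 0]              # drop unused factor levels to avoid 0 * log2(0)
  -sum(p * log2(p))          # -sum_i p(x_i) * log2(p(x_i))
}

# A variable with two equally likely values carries exactly 1 bit.
coin <- c("heads", "tails", "heads", "tails")
entropy_bits(coin)  # 1
```

Probabilities here are estimated from counts, which is also how the worked example later in this post treats the fruit dataset.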

Why is it calculated this way?

First, let’s build some intuition behind the entropy formula. The formula has several useful properties. For example, it’s always non-negative. Also, the information content of a single value, -log₂(p), is monotonically decreasing in that value’s probability, p. In other words, the amount of information conveyed by an event (or value) of X decreases as the probability of that event increases.

That may sound a little abstract at first, so let’s consider a specific example. Suppose you’re predicting how much revenue a movie will make. One of your predictors is a binary indicator: 1 if the record refers to a movie, and 0 otherwise. Well, that predictor is useless! Mathematically speaking, it’s useless because every record in the dataset is a movie, so there’s a 100% probability of that event (i.e. the record being a movie) occurring. This means that the variable provides no real information about the data. The closer a variable gets to having a single possible value, the less information it gives you.
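The sketch below illustrates both points in base R: the information content of a single value shrinks as its probability grows, and a constant predictor such as the movie indicator has zero entropy. The names surprisal_bits and is_movie, and the ten-record sample, are made up for illustration.

```r
# Information content (in bits) of a single event with probability p.
# `surprisal_bits` is a hypothetical helper name.
surprisal_bits <- function(p) -log2(p)

p <- c(0.1, 0.25, 0.5, 0.9, 1)
surprisal_bits(p)  # decreases as p increases and reaches 0 at p = 1

# A constant predictor: every one of ten made-up records is a movie.
is_movie <- rep(1, 10)
probs <- prop.table(table(is_movie))  # a single value with probability 1
constant_entropy <- -sum(probs * log2(probs))
constant_entropy   # 0
```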

Why the log₂? Technically, entropy can be calculated using a logarithm of a different base (e.g. natural log). However, it’s common to use base 2 because this returns a result in terms of bits. In this way, entropy can be thought of as the average number of bits needed to encode a value for a specific variable.
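To see that the choice of base only changes the unit, the short sketch below computes the entropy of an arbitrary three-valued distribution in base 2 (bits) and with the natural log (often called nats), then converts one to the other. The probabilities are just an example.

```r
# An arbitrary example distribution over three values.
p <- c(3, 2, 1) / 6

h_bits <- -sum(p * log2(p))  # base 2: result in bits
h_nats <- -sum(p * log(p))   # natural log: result in nats

# Changing the base rescales the result by a constant factor, log(2).
h_bits
h_nats
h_nats / log(2)  # converts nats back to bits, matching h_bits
```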

Case Example

Information gain in the context of decision trees is the reduction in the entropy of the label when splitting on variable X. Let’s do an example to make this clear. In the mini-dataset below, the label we’re trying to predict is the type of fruit. The prediction is based on the size, color, and shape variables.

Fruit	Size	Color	Shape
Watermelon	Big	Green	Round
Apple	Medium	Red	Round
Banana	Medium	Yellow	Thin
Grape	Small	Green	Round
Grapefruit	Medium	Yellow	Round
Lemon	Small	Yellow	Round

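Suppose we want to calculate the information gained if we select the color variable. 3 out of the 6 records are yellow, 2 are green, and 1 is red. Proportionally, the probability of a yellow fruit is 3 / 6 = 0.5; 2 / 6 = 0.333… for green, and 1 / 6 = 0.1666… for red. Because every fruit appears exactly once in this dataset, knowing the fruit fully determines its color, so the information gain from splitting on color equals the entropy of the color variable itself. Using the formula from above, we can calculate it like this: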

-([3/6 * log₂(3/6)] + [2/6 * log₂(2/6)] + [1/6 * log₂(1/6)])

= 1.459148

Likewise, we want to get the information gain for the size variable. In this case, 3 / 6 of the fruits are medium-sized, 2 / 6 are small, and 1 / 6 is big. These are the same proportions as for color, so the result is identical:

-([3/6 * log₂(3/6)] + [2/6 * log₂(2/6)] + [1/6 * log₂(1/6)])

= 1.459148

Lastly, we have the shape variable. Here, 5 / 6 of the fruits are round and 1 / 6 is thin.

-([5/6 * log₂(5/6)] + [1/6 * log₂(1/6)])

= 0.650022
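The hand calculations above can be checked with a short base R sketch that computes information gain as the entropy of the label minus the conditional entropy of the label given each variable. The helper names (entropy_bits, cond_entropy_bits, info_gain_bits) are hypothetical and written only for this illustration.

```r
fruit <- data.frame(
  fruit = c("Watermelon", "Apple", "Banana", "Grape", "Grapefruit", "Lemon"),
  size  = c("Big", "Medium", "Medium", "Small", "Medium", "Small"),
  color = c("Green", "Red", "Yellow", "Green", "Yellow", "Yellow"),
  shape = c("Round", "Round", "Thin", "Round", "Round", "Round")
)

# Entropy in bits, estimated from observed frequencies.
entropy_bits <- function(x) {
  p <- prop.table(table(x))
  p <- p[p > 0]
  -sum(p * log2(p))
}

# Conditional entropy H(y | x): the entropy of y within each group of x,
# weighted by the share of records that fall in that group.
cond_entropy_bits <- function(y, x) {
  groups <- split(y, x)
  weights <- lengths(groups) / length(y)
  sum(weights * vapply(groups, entropy_bits, numeric(1)))
}

# Information gain: reduction in the label's entropy after splitting on x.
info_gain_bits <- function(y, x) entropy_bits(y) - cond_entropy_bits(y, x)

gains <- sapply(fruit[c("size", "color", "shape")],
                function(x) info_gain_bits(fruit$fruit, x))
round(gains, 6)  # size 1.459148, color 1.459148, shape 0.650022
```

Size and color tie, and shape is far behind, which matches the numbers worked out by hand. The gains equal the entropies of the variables themselves only because each fruit label is unique in this dataset; with repeated labels the two quantities would generally differ.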

How to calculate information gain in R

So, how do we calculate information gain in R? Thankfully, this is fairly simple to do using the FSelector package:

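# FSelector provides information.gain(); install it first if needed
library(FSelector)

# rebuild the fruit dataset from the example above
info <- data.frame(fruits = c("watermelon", "apple", "banana", "grape", "grapefruit", "lemon"))
info$sizes <- c("big", "medium", "medium", "small", "medium", "small")
info$colors <- c("green", "red", "yellow", "green", "yellow", "yellow")
info$shapes <- c("round", "round", "thin", "round", "round", "round")

# get information gain results
# formula(info) treats the first column as the label: fruits ~ sizes + colors + shapes
information.gain(formula(info), info)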
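One thing to watch when comparing this output with the hand calculations: my understanding of the FSelector documentation is that information.gain computes H(class) + H(attribute) - H(class, attribute) and uses the natural log by default, so the numbers it reports may be in nats rather than bits. Treat that as an assumption and check it against your installed version. The base R sketch below applies the same identity in both units so you can compare; entropy_of and mutual_info are hypothetical helper names.

```r
info <- data.frame(
  fruits = c("watermelon", "apple", "banana", "grape", "grapefruit", "lemon"),
  sizes  = c("big", "medium", "medium", "small", "medium", "small"),
  colors = c("green", "red", "yellow", "green", "yellow", "yellow"),
  shapes = c("round", "round", "thin", "round", "round", "round")
)

# Entropy with a configurable log function (log2 for bits, log for nats).
entropy_of <- function(x, log_fun = log2) {
  p <- prop.table(table(x))
  p <- p[p > 0]
  -sum(p * log_fun(p))
}

# H(y) + H(x) - H(y, x); the joint entropy is taken over pasted value pairs.
mutual_info <- function(y, x, log_fun = log2) {
  joint <- paste(y, x, sep = "|")
  entropy_of(y, log_fun) + entropy_of(x, log_fun) - entropy_of(joint, log_fun)
}

attrs <- info[c("sizes", "colors", "shapes")]
ig_bits <- sapply(attrs, function(x) mutual_info(info$fruits, x, log2))
ig_nats <- sapply(attrs, function(x) mutual_info(info$fruits, x, log))

ig_bits           # matches the hand calculations above
ig_nats           # the same quantities expressed in nats
ig_nats / log(2)  # converting nats back to bits
```

If the package output looks smaller than the values computed by hand, dividing by log(2) is a quick way to check whether the difference is only a matter of units.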

Conclusion

That’s all for now! Information gain is just one of many possible feature importance methods, and I’ll have more articles in the future to explore other possibilities. If you liked this post, please follow my blog on Twitter!

Visit TheAutomatic.net to get additional insight on this topic: http://theautomatic.net/2020/02/18/how-is-information-gain-calculated/

Disclosure: Interactive Brokers

Information posted on IBKR Campus that is provided by third-parties does NOT constitute a recommendation that you should contract for the services of that third party. Third-party participants who contribute to IBKR Campus are independent of Interactive Brokers and Interactive Brokers does not make any representations or warranties concerning the services offered, their past or future performance, or the accuracy of the information provided by the third party. Past performance is no guarantee of future results.

This material is from TheAutomatic.net and is being posted with its permission. The views expressed in this material are solely those of the author and/or TheAutomatic.net and Interactive Brokers is not endorsing or recommending any investment or trading discussed in the material. This material is not and should not be construed as an offer to buy or sell any security. It should not be construed as research or investment advice or a recommendation to buy, sell or hold any security or commodity. This material does not and is not intended to take into account the particular financial conditions, investment objectives or requirements of individual customers. Before acting on this material, you should consider whether it is suitable for your particular circumstances and, as necessary, seek professional advice.


Posted February 19, 2021
