Shannon's Entropy Formula

Reference for Shannon entropy H(X) = -Σ p(x) log₂ p(x), measuring information in bits.
Covers data compression, cryptography, and feature selection.

The Formula

H = -Σ p(x) × log₂(p(x))

Shannon's entropy measures the average amount of information (in bits) per symbol in a message. Higher entropy means more unpredictability and more bits needed to encode the data.
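
To compute the formula directly, here is a minimal Python sketch; the helper name shannon_entropy is illustrative, not part of any standard library.

    import math

    def shannon_entropy(probabilities):
        # H = -Σ p(x) log2 p(x); outcomes with p = 0 contribute nothing.
        return -sum(p * math.log2(p) for p in probabilities if p > 0)

    print(shannon_entropy([0.5, 0.5]))       # fair coin -> 1.0 bit
    print(shannon_entropy([0.7, 0.2, 0.1]))  # skewed source -> ~1.157 bits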

Variables

Symbol | Meaning
H | Entropy (measured in bits when using log base 2)
p(x) | Probability of each possible symbol or outcome
Σ | Sum over all possible symbols
log₂ | Logarithm base 2

Example 1

Find the entropy of a fair coin flip

Two outcomes: Heads (p = 0.5), Tails (p = 0.5)

H = -(0.5 × log₂(0.5) + 0.5 × log₂(0.5))

H = -(0.5 × (-1) + 0.5 × (-1))

H = 1 bit (maximum entropy for two outcomes)
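
The 1-bit result is the peak for a two-outcome source. A quick Python sketch of a biased coin (the probabilities chosen are illustrative) shows entropy falling off on either side of p = 0.5:

    import math

    for p in (0.1, 0.3, 0.5, 0.7, 0.9):
        h = -(p * math.log2(p) + (1 - p) * math.log2(1 - p))
        print(f"p(heads) = {p}: H = {h:.3f} bits")  # peaks at 1.000 for p = 0.5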

Example 2

A source emits A (70%), B (20%), C (10%). Find the entropy.

H = -(0.7 × log₂(0.7) + 0.2 × log₂(0.2) + 0.1 × log₂(0.1))

H = -(0.7 × (-0.515) + 0.2 × (-2.322) + 0.1 × (-3.322))

H = -(−0.360 − 0.464 − 0.332)

H ≈ 1.157 bits per symbol
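
A short Python sketch reproduces this example term by term; the symbol names come from the example above:

    import math

    probs = {"A": 0.7, "B": 0.2, "C": 0.1}
    terms = {s: -p * math.log2(p) for s, p in probs.items()}
    for s, t in terms.items():
        print(f"-p({s}) log2 p({s}) = {t:.3f}")               # 0.360, 0.464, 0.332
    print(f"H = {sum(terms.values()):.3f} bits per symbol")   # 1.157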

When to Use It

Use Shannon's entropy when:

  • Measuring the information content of a data source
  • Designing efficient data compression algorithms
  • Evaluating the randomness or predictability of data
  • Building decision trees in machine learning (information gain; see the sketch after this list)
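
As an example of information gain, here is a minimal Python sketch on a made-up set of labels; the dataset and the candidate split are hypothetical, and only the entropy arithmetic follows the formula above:

    import math
    from collections import Counter

    def entropy(labels):
        # Shannon entropy (in bits) of the class labels in a node.
        n = len(labels)
        return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

    parent = ["yes", "yes", "yes", "no", "no", "no", "no", "no"]
    left   = ["yes", "yes", "yes", "no"]   # one branch of a candidate split
    right  = ["no", "no", "no", "no"]      # the other branch

    gain = entropy(parent) - (len(left) / len(parent) * entropy(left)
                              + len(right) / len(parent) * entropy(right))
    print(f"information gain = {gain:.3f} bits")  # about 0.549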

Key Notes

  • Formula: H = −Σ p(x) log₂ p(x), summed over all possible outcomes x. The log base 2 gives entropy in bits; the natural log gives nats, and log base 10 gives hartleys (dits).
  • Maximum entropy means maximum uncertainty: Entropy is maximized when all outcomes are equally likely (uniform distribution). A fair coin (H = 1 bit) has more entropy than a biased coin.
  • Zero entropy means certainty: If one outcome has probability 1 and all others 0, entropy is 0 bits — there is no uncertainty at all.
  • Foundation of data compression: Shannon's source coding theorem shows that no lossless compression scheme can, on average, encode a message in fewer bits than its entropy rate. Lossless formats such as ZIP approach this limit; lossy formats like MP3 and JPEG reach smaller sizes only by discarding information.
  • Used in machine learning: Decision trees use information gain (reduction in entropy) to choose which feature to split on. Cross-entropy is the standard loss function for classification models (see the sketch after this list).
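
As a sketch of cross-entropy, here is the loss for a single example with a one-hot true label; the predicted probabilities are made up, and the result is in bits (ML frameworks usually use the natural log and report nats):

    import math

    true_dist = [1.0, 0.0, 0.0]   # the correct class is the first one
    predicted = [0.7, 0.2, 0.1]   # model's predicted probabilities

    loss = -sum(t * math.log2(p) for t, p in zip(true_dist, predicted) if t > 0)
    print(f"cross-entropy = {loss:.3f} bits")  # -log2(0.7) ≈ 0.515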
