Hamming Distance Calculator

Calculate Hamming distance between two equal-length strings: count of differing positions.
Works for binary, hex, ASCII, and DNA sequences.

Hamming Distance

Hamming distance is one of the simplest and most useful metrics in information theory. It counts the number of positions where two equal-length strings differ. That single number drives error-correcting codes, DNA sequence comparison, fuzzy string matching, and a lot more.

The definition:

d(x, y) = number of positions where xᵢ ≠ yᵢ

The strings must be the same length. If they are not, Hamming distance is undefined and you need a different metric (Levenshtein distance) that handles insertions and deletions.

Worked example, binary:

String 1: 1011101
String 2: 1001001

Compare position by position:

  • pos 1: 1=1 ✓
  • pos 2: 0=0 ✓
  • pos 3: 1≠0 ✗
  • pos 4: 1=1 ✓
  • pos 5: 1≠0 ✗
  • pos 6: 0=0 ✓
  • pos 7: 1=1 ✓

Hamming distance = 2 (two differing positions)

Worked example, text: “karolin” vs “kathrin”

k=k ✓, a=a ✓, r≠t ✗, o≠h ✗, l≠r ✗, i=i ✓, n=n ✓

Hamming distance = 3 (three differing characters)
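The position-by-position comparison used in both worked examples can be sketched in a few lines of Python. This is a minimal illustration rather than the calculator's actual implementation, and the function name hamming_distance is made up for the example.

```python
def hamming_distance(a: str, b: str) -> int:
    """Count the positions at which two equal-length strings differ."""
    if len(a) != len(b):
        # Hamming distance is undefined for strings of different lengths.
        raise ValueError("strings must have the same length")
    # Each mismatching pair contributes True (counted as 1) to the sum.
    return sum(ch_a != ch_b for ch_a, ch_b in zip(a, b))


print(hamming_distance("1011101", "1001001"))  # 2
print(hamming_distance("karolin", "kathrin"))  # 3
```

Raising an error on unequal lengths, instead of silently comparing only the overlapping part, keeps the function faithful to the definition.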

The XOR trick for binary: For two binary numbers x and y, the Hamming distance equals the number of 1-bits in (x XOR y). That count is called the popcount, or Hamming weight, of the XOR. Most modern CPUs have a dedicated population-count instruction (POPCNT on x86, VCNT in ARM NEON, CNT on AArch64), so the Hamming distance between two registers costs one XOR plus one popcount, a few clock cycles in total. This is one reason bit-level comparisons, such as matching hash fingerprints or binary signatures, are so cheap.
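As a rough sketch of the XOR trick, the snippet below treats the two strings from the binary worked example as integers. The function name hamming_distance_bits is hypothetical, and counting the "1" characters in bin() output stands in for a hardware popcount.

```python
def hamming_distance_bits(x: int, y: int) -> int:
    """Hamming distance between two non-negative integers viewed as bit strings."""
    if x < 0 or y < 0:
        raise ValueError("expected non-negative integers")
    diff = x ^ y                 # 1-bits mark the positions where x and y differ
    return bin(diff).count("1")  # popcount (Hamming weight) of the XOR


x = 0b1011101
y = 0b1001001
print(bin(x ^ y))                   # 0b10100
print(hamming_distance_bits(x, y))  # 2
```

On Python 3.10 and later, (x ^ y).bit_count() computes the same count without building a string.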

Richard Hamming and the 1947 punch card incident: Richard Hamming worked at Bell Labs in the 1940s. The story goes that on a Friday afternoon he submitted a punch card job and went home for the weekend. He came back on Monday to find that the machine had hit an error partway through and simply abandoned the job. Frustrated that the computer could detect errors but not correct them, he developed Hamming codes (published in 1950), which can detect and correct single-bit errors in stored or transmitted data. His framework of codewords separated by a minimum distance underlies modern error-correcting codes, including those used in ECC RAM, storage systems, and deep-space communication.

Minimum distance and code design: The minimum Hamming distance of a code (the smallest d between any two distinct valid codewords) determines its error-handling power:

  • To detect up to t errors, minimum distance must be at least t + 1
  • To correct up to t errors, minimum distance must be at least 2t + 1

A code with minimum distance 3 can detect 2 errors or correct 1. A distance-5 code can correct 2 errors. The related Hamming bound (also called the sphere-packing bound) turns this around: it caps how many codewords a code of a given length can contain while still correcting t errors.
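To make the link between minimum distance and error handling concrete, the sketch below enumerates all 16 codewords of a Hamming(7,4) code and measures their minimum distance by brute force. The bit layout (p1 p2 d1 p3 d2 d3 d4) is the common textbook arrangement, assumed here for illustration, and all function names are hypothetical.

```python
from itertools import combinations, product


def hamming_distance(a, b):
    """Number of positions where two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def minimum_distance(codewords):
    """Smallest Hamming distance between any two distinct codewords."""
    return min(hamming_distance(a, b) for a, b in combinations(codewords, 2))


def max_detectable(d_min):
    """Detecting t errors needs d_min >= t + 1."""
    return d_min - 1


def max_correctable(d_min):
    """Correcting t errors needs d_min >= 2t + 1."""
    return (d_min - 1) // 2


def hamming74_encode(d1, d2, d3, d4):
    """Encode 4 data bits with 3 parity bits (assumed textbook layout)."""
    p1 = d1 ^ d2 ^ d4
    p2 = d1 ^ d3 ^ d4
    p3 = d2 ^ d3 ^ d4
    return (p1, p2, d1, p3, d2, d3, d4)


codebook = [hamming74_encode(*bits) for bits in product((0, 1), repeat=4)]
d_min = minimum_distance(codebook)

print(d_min)                   # 3
print(max_detectable(d_min))   # 2
print(max_correctable(d_min))  # 1
```

Brute-force pairwise comparison is fine for 16 codewords; real code design relies on algebraic structure rather than enumeration.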

Real applications:

  • ECC RAM uses Hamming-style codes to correct single-bit memory errors transparently. Server hardware almost always has it; consumer hardware usually does not.
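  • CD audio uses cross-interleaved Reed-Solomon coding, a different family of block codes built on the same minimum-distance idea, to correct burst errors from scratches.
  • Deep-space probes (Voyager, Mars rovers) use more powerful schemes such as convolutional codes and turbo codes. These are not Hamming codes, but they grow out of the same coding-theory foundations, and they let spacecraft communicate reliably across millions to billions of kilometers. Turbo codes in particular operate close to the Shannon limit.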
  • DNA sequence analysis uses Hamming distance to measure mutation count between two aligned sequences of equal length. For variable-length alignment, biologists use Levenshtein or more sophisticated alignment algorithms (Needleman-Wunsch, Smith-Waterman).
  • Fuzzy string matching in databases relies on related tools: PostgreSQL’s levenshtein() (from the fuzzystrmatch extension) computes edit distance, and MySQL’s SOUNDEX() compares phonetic codes. Pure Hamming distance is faster but requires equal-length strings.
  • Hash function testing compares digests bit by bit. The Hamming distance between the SHA-256 digests of two different inputs should be around 128 bits (half of the 256 total), which serves as a sanity check that the hash function behaves randomly.
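The hash sanity check from the last bullet can be tried with Python's standard hashlib module. This is only an illustrative sketch: the helper name digest_hamming_distance and the two sample messages are arbitrary choices, and a single pair of inputs is an anecdote, not a statistical test.

```python
import hashlib


def digest_hamming_distance(msg_a: bytes, msg_b: bytes) -> int:
    """Number of differing bits between the SHA-256 digests of two messages."""
    digest_a = hashlib.sha256(msg_a).digest()
    digest_b = hashlib.sha256(msg_b).digest()
    # XOR the digests as 256-bit integers, then count the 1-bits.
    diff = int.from_bytes(digest_a, "big") ^ int.from_bytes(digest_b, "big")
    return bin(diff).count("1")


# The two inputs differ only in their final character.
d = digest_hamming_distance(b"hello world", b"hello worle")
print(d, "of 256 bits differ")
```

For a well-behaved hash the printed count should land near 128. Averaging over many input pairs would give a more meaningful check.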

Hamming distance vs Levenshtein distance:

| Property | Hamming | Levenshtein |
|---|---|---|
| Operations | Substitution only | Insertion, deletion, substitution |
| String length | Must be equal | Can differ |
| Time complexity | O(n) | O(n × m) |
| Use case | Fixed-length codewords, aligned DNA sequences | Spell check, version diffs |
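The difference between the two metrics shows up clearly when one string is a shifted copy of the other. The sketch below pairs a substitution-only Hamming count with a standard dynamic-programming Levenshtein distance; both function names are illustrative, and the Levenshtein routine is a plain textbook version, not any particular library's implementation.

```python
def hamming(a: str, b: str) -> int:
    """Substitution-only distance; requires equal lengths. O(n)."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs equal-length strings")
    return sum(x != y for x, y in zip(a, b))


def levenshtein(a: str, b: str) -> int:
    """Edit distance via row-by-row dynamic programming. O(len(a) * len(b))."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ch_a in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, ch_b in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,                   # delete ch_a
                curr[j - 1] + 1,               # insert ch_b
                prev[j - 1] + (ch_a != ch_b),  # substitute (free on a match)
            ))
        prev = curr
    return prev[-1]


print(hamming("abcdef", "bcdefa"))       # 6
print(levenshtein("abcdef", "bcdefa"))   # 2
print(levenshtein("kitten", "sitting"))  # 3
```

Shifting "abcdef" by one character makes every position mismatch under Hamming, while Levenshtein sees just one deletion and one insertion.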

Beyond binary: The formula extends to any alphabet. Comparing DNA sequences (alphabet ACGT) uses Hamming distance directly. Comparing color values (RGB triples) at fixed positions also fits. Anywhere you have two equal-length sequences and want to know “how different are they” with a single number, Hamming distance is the right starting tool.
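Because the definition only needs an equality test per position, one generic function covers any alphabet. The sketch below assumes each RGB triple counts as a single symbol, so a pixel that differs in any channel counts as one mismatch. That is one possible reading of the color example, and the sample data is made up.

```python
from typing import Sequence


def hamming_distance(a: Sequence, b: Sequence) -> int:
    """Count differing positions in two equal-length sequences of any symbols."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


# DNA: alphabet {A, C, G, T}
print(hamming_distance("GATTACA", "GACTATA"))  # 2

# Pixels: each (R, G, B) triple is treated as one symbol
row_a = [(255, 0, 0), (0, 255, 0), (0, 0, 255)]
row_b = [(255, 0, 0), (0, 250, 0), (0, 0, 255)]
print(hamming_distance(row_a, row_b))  # 1
```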

