The SOUNDEX Code
The SOUNDEX code was devised by Margaret K. Odell and Robert
C. Russell [US Patents 1261167 (1918) and 1435663 (1922)].
The purpose of the SOUNDEX system is to cluster together names that have similar sounds.
(The algorithm is case-insensitive, so we can assume all letters are uppercase, or lowercase if you prefer.)
Here is the SOUNDEX algorithm as reported by Donald Knuth [The Art of Computer Programming, Volume 3]:
- Retain the first letter of the name, and drop all occurrences of
a, e, h, i, o, u, w, y in other positions.
- Assign the following numbers to the remaining letters after the first:
- b, f, p, v -> 1
- c, g, j, k, q, s, x, z -> 2
- d, t -> 3
- l -> 4
- m, n -> 5
- r -> 6
- If two or more letters with the same code were adjacent in the original
name (before step 1), omit all but the first.
- Convert to the form "letter, digit, digit, digit" by adding trailing zeros
(if there are fewer than three digits), or by dropping rightmost digits (if
there are more than three).
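The four steps above can be sketched in Python as follows. This is a minimal sketch, not Knuth's own code. It assumes the input contains at least one letter and silently ignores non-letters such as hyphens or apostrophes (a choice the rules above do not address). It applies step 3 exactly as stated, so only letters that are literally adjacent in the original name count as duplicates. The names `CODES` and `soundex` are illustrative.

```python
# Step 2 table: each coded letter maps to its digit. Letters absent from
# the table (a, e, h, i, o, u, w, y) have no code and are dropped.
CODES = {
    letter: digit
    for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                           ("l", "4"), ("mn", "5"), ("r", "6"))
    for letter in letters
}


def soundex(name: str) -> str:
    """Return the SOUNDEX code of `name` following the four steps above.

    Assumption: non-letter characters are ignored, and the name must
    contain at least one letter.
    """
    letters = [ch for ch in name.lower() if ch.isalpha()]
    if not letters:
        raise ValueError("name must contain at least one letter")

    digits = []
    # Walk each letter after the first together with its original neighbor.
    for prev, cur in zip(letters, letters[1:]):
        code = CODES.get(cur)
        if code is None:
            continue  # Step 1: drop uncoded letters after the first.
        if code == CODES.get(prev):
            continue  # Step 3: same code as its neighbor in the original name.
        digits.append(code)

    # Step 1 keeps the first letter; step 4 pads or truncates to three digits.
    return letters[0].upper() + "".join(digits)[:3].ljust(3, "0")
```

For example, `soundex("Lloyd")` returns `"L300"`: the second l shares code 4 with the retained first letter, so step 3 omits it, and only the d contributes a digit.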
Here are examples of the SOUNDEX algorithm applied to pairs of names (also from Knuth):
Euler, Ellery -> E460
Gauss, Ghosh -> G200
Hilbert, Heilbronn -> H416
Knuth, Kant -> K530
Lloyd, Ladd -> L300
Lukasiewicz, Lissajous -> L222
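Since the purpose of SOUNDEX is to cluster similar-sounding names, the sketch below groups the example names above by their codes. To stay self-contained it re-implements the algorithm compactly with `itertools.groupby`, which merges runs of consecutive letters sharing a code (step 3); the first run holds the retained first letter, so it is skipped. The helper names `_DIGIT`, `cluster`, and `knuth_examples` are hypothetical, and the name is again assumed to contain at least one letter.

```python
from collections import defaultdict
from itertools import groupby

# Step 2 table; letters not listed map to "" (no code).
_DIGIT = {ch: d
          for letters, d in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                             ("l", "4"), ("mn", "5"), ("r", "6"))
          for ch in letters}


def soundex(name: str) -> str:
    """Compact SOUNDEX: collapse adjacent same-code letters, then drop uncoded ones."""
    letters = [ch for ch in name.lower() if ch.isalpha()]
    # One entry per run of consecutive letters with the same code.
    runs = [code for code, _ in groupby(letters, key=lambda ch: _DIGIT.get(ch, ""))]
    # Skip the run containing the first letter; "" runs vanish in the join.
    digits = "".join(runs[1:])
    return letters[0].upper() + digits[:3].ljust(3, "0")


def cluster(names):
    """Group names that share a SOUNDEX code, preserving input order."""
    groups = defaultdict(list)
    for name in names:
        groups[soundex(name)].append(name)
    return dict(groups)


knuth_examples = ["Euler", "Ellery", "Gauss", "Ghosh", "Hilbert", "Heilbronn",
                  "Knuth", "Kant", "Lloyd", "Ladd", "Lukasiewicz", "Lissajous"]
for code, members in cluster(knuth_examples).items():
    print(code, members)
```

Each pair from Knuth's table lands in the same group, giving six clusters of two names each.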
Here are some sites with information on the SOUNDEX code: