Prior Page     Next Page     This Chapter    Prior Chapter








BUZ Hash

This last hash function is also named after its inventor; it will generate up to 232 different hash values, but it requires more Pascal initialization code. In most other languages the initialization can be done statically in the program; so no initialization call, or activity is necessary.

In this hash like pkphash the noise added to the argument comes from a randomized table. In pkphash it was a permuted array of characters, like the key in a character substitution cipher. For buzhash it is a set of 32-bit `random' aliases for each character. These aliases are made so that for every bit position exactly one half of the aliases have a one and the other half have a zero. These aliases are stored in the table buztbl, which is created in this format by the initialization process. Hashing is done by exclusive-oring together the random aliases for each character in the key. For each character each exclusive-or has exactly a 50% chance of inverting every bit. To make sure that repeated characters will add noise into a different spot in the developing hash value, h, it is rotated one bit for each character inserted.

This means that if keys commonly repeat after exactly 32 characters the input set will not hash well. When keys commonly repeat at exactly 32 characters, one can correct this buzhash deficiency by either just hashing the first 32 characters, or by adding a common character, like blank, at the beginning of every key (thus the repeat will happen after 33 characters). The chance of running into this problem and having it affect you is really small, so small it is worth forgetting. Another more general solution is to switch alias tables after every thirty-two characters to inject additional information. But keys longer than thirty-two characters are so rare, that to build this into the algorithm would slow every hash down for such a small proportion of hash keys that it is not worth considering.

TYPE byte    = 0..255;
TYPE rowbit  = ARRAY [byte] OF 0..1;
TYPE rowlong = ARRAY [byte] OF longint;



VAR
buztbl:   rowlong;    { Global variables for hash function }
buzhinit: longint;

PROCEDURE buzinit( i1, i2: longint );

{ i1, i2 are the seeds for the random number generator }

VAR
btab: rowbit;
i, j: byte;

BEGIN

    tauset( i1, i2 );

    FOR i := 0 TO 255 DO
    BEGIN
        btab[i]   := BAND(i,$1);
        buztbl[i] := 0;
    END;


    FOR i := 0 TO 31 DO
    BEGIN
        Permute ( btab, 256 );
        FOR j := 0 TO 255 DO
            buztbl[j] := BOR( BSL( buztbl[j], 1 ), btab[j] );
    END;


    FOR i := 0 TO 31 DO
       btab[i] := BAND(i,$1);

    Permute ( btab, 32 );

    buzhinit := 0;
    FOR i := 0 TO 31 DO
        buzhinit := BOR( BSL( buzhinit, 1 ), btab[i] );

END;


FUNCTION buzhash( key: STRING; b: integer ): longint;

{ key is the string to be hashed;
   b   is the number of hash values: [0,b-1] }

VAR
    n, h, i: longint;
BEGIN

    n := LENGTH( key );
    h := buzhinit;

    FOR i := 1 TO n DO
        h := BXOR( BROTL( h, 1), buztbl[ ORD(key[i]) ] );

    h := BAND( h, $7fffffff );  /* get rid of random sign bit */
    IF ( b > 0 )
    THEN buzhash := h MOD b
    ELSE buzhash := h;

END;








Prior Page     Next Page     This Chapter    Prior Chapter


Copyright © 1995 Robert Uzgalis. All Rights Reserved.
Contact: buz@cs.aukuni.ac.nz