3. Evaluation of Java Hashing

Java is strongly typed and as such it includes built-in hash functions which hash every base type and return a signed integer. By including the hash function as a part of the language API, Java took the responsibility for hashing away from the user. This puts much stronger requirements on the Java specification to define properties of the hash-function that programmers can use to do what they want.

Following is an evaluation of the current Java hash functions. There are three levels of criticism: language design, specification, and implementation. The design is implicit in the specification and implementation. I have used the December 1995 version of the Java specification. and I have used the Sun Java 1.0.1 Java development kit as ported to Linux as the basis of implementation criticism. But most of the criticism is directed not at the implementation but at the design and coincidentally the specification. The specification has many problems, and it is still evolving.

Criticism 1: Bad Distribution of Values.

The current Java implementation of the string hash function does not distribute hash values well. Here is an example of hash values from some short strings.

User	\|	Java
Supplied	\|	Hash
String	\|	Value

"a"	\|	97
"b"	\|	98
"c"	\|	99
	\|
"0"	\|	48
"1"	\|	49
"2"	\|	50
"3"	\|	51

These hash values should represent a random distribution drawn from the integers. However instead they reflect a small, tight, highly organized, portion of the integers. In particular the hash values should not have any numeric relationship with one another, nor should they reflect the numeric relationship of the original keys. These hash values represent a violation of both the first and second criteria for library hash functions.

Using a better hash function, like the one in the appendix to this paper, one might get something like the following:

User	\|	Full 64-Bit	\|	Modulo
Supplied	\|	Hash Value	\|	1,000,000,000
String	\|	in Hex	\|	Hash Value

"a"	\|	d3e726f5 fe5354f9	\|	782547517
"b"	\|	10efc854 cd7daf94	\|	6048257
"c"	\|	f5522d6c 652bc971	\|	11797343
	\|		\|
"0"	\|	3f9f5eba 50a2b859	\|	406446579
"1"	\|	9adab006 8f8301e4	\|	172457
"2"	\|	fb154dd9 2ecfaf4b	\|	390662591
"3"	\|	4e22c1f5 b342cd09	\|	900541156

Criticism 2: Too Small a Hash Range.

Java uses 32-bit integers as the hash value range. Conceptually hash values from any class should be uniformly distributed across the largest integer domain available in the language. In Java this is long, which is specified as 64 bits.

Criticism 3: No Hash Value Operations.

No guidelines in Java are provided for creating hashing values from other hash values as a part of an abstract type. If I create a new type how do I build a hash value out of the hash values of its components? Clearly the components that should go into a hash value are the components which determine if one value of a type is equal to another value of the same type. But no guidance is given as how to combine hash values.

Criticism 4: No Defined Hash Algorithm in the Specifications.

One last issue needs to be addressed. Java specifies a computation more precisely than most languages so that across a network with various implementations the same results are obtained. For hashing this means that the Java language specification or the application interface specification should define a particular hashing method. However no hash method is specified in any part of the language specification. This means that the programmer can not depend on any behavior at all -- this is clearly inadequate for programming anything.