There are better and worse hash functions. Strong hash functions make it extremely unlikely that two different documents share a hash value. Furthermore, hash functions used for cryptography must be one-way -- that is, given a hash code, you should not be able to create a document with that hash code. A strong one-way hash function must meet several related criteria. These criteria include:
Determinism
The same document always has the same hash code. The hash code does not depend on the time it's calculated, a random number, or anything other than the sequence of bytes in the document. Without this requirement, the same document could have different hash codes at different times, indicating that documents had changed when in fact they hadn't.
Uniform distribution
Given any sample of the documents you wish to track, all hash codes are equally likely. For instance, if the hash code is a 64-bitlogn, even and odd numbers should be equally likely.
Impossible to reverse engineer
There should be no means easier than brute force to produce a document that matches a certain hash code. For instance, if I know the hash code is 9'423'456'789, I should NOT be able to then create a file that happens to have that exact hash code.
No collisions
It should be difficult to find two documents that share a hash code. You cannot easily find two documents with the same hash code, regardless of what that hash code is. The previous criterion means that you can't change a document to match a hash code. This criterion says you can't change two documents to match each other.
Sensitive dependence on initial conditions
Small changes in a document produce large changes in its hash code. Without this requirement, somebody attempting to create a document with a given hash code could modify the document a little at a time until the hash code matched, much as you might adjust the hot and cold water faucets gradually until the water reaches a desired temperature. A hash function should act more like a faucet that can scald or freeze you after the tiniest nudge.
Randomness
The hash code does not say anything about the document it represents. The one-way hash function is not even partially invertible. For instance, knowing that the hash code is even should not suggest that the document being hashed contains an even number of bytes. Nor should it suggest that the document being hashed is 60% more likely to contain an even number of bytes than an odd number. While one-way hash functions need to be reproducible -- that is, the sam e document always has the same hash code -- they should otherwise be completely random. It is extremely hard, perhaps impossible, to prove that any function meets this criterion. Nonetherless, stronger functions come closer than weaker functions; and years of experience among cryptoghraphers allow them to make reasonable guesses about what are and are not strong hash functions, even if their hunches cant't be proved to a mathematical certainty.
The proper design of one-way hash functions is a well-studied field. It's easy to create weak one-way hash functions. However, it is much harder to create truly strong, reliable, one-way hash functions. Nonexperts tend to make nonobvious but serious mistakes when implementing hash functions. Therefore, this is a task that's best left to the experts. Fortunately, the Java core API contains some hash functions designed by experts that the rest of us can use without earning a PhD in applied mathematics first.
The hash codes used by the java.util.Hashtable class and returned by the hashCode() method of any Java object are only intended to be used as IDs for elements of a hash table, not as cryptographically strong digests. These sorts of hash codes have different requirements for utility. Most of the time, they only need to meet the first two of the six criteria, and in practice they often don't meet even those. The hashCode() method is a hash function but not necesserily a one-way hash function.
Виртуальная песочница (тм)
Monday, September 21, 2009
Requirements for Hash Functions
Subscribe to:
Post Comments (Atom)
No comments:
Post a Comment