








An in-depth exploration of separate chaining, a popular collision resolution technique used in hash tables: the concept, an example of its implementation, and thoughts on its efficiency. It also discusses quadratic probing as an alternative collision resolution method.
Aim for constant-time (i.e., O(1)) find, insert, and delete.
A hash table is an array of some fixed size (but growable, as we’ll see).
Spring 2010
CSE332: Data Abstractions
[Figure: hashing pipeline. Labels: int, table-index, collision?, collision resolution; the stages are split between the client and the hash table library.]
In terms of a Dictionary ADT for just insert, find, and delete, hash tables and balanced trees are just different data structures:
- Hash tables: O(1) on average (assuming few collisions)
- Balanced trees: O(log n) worst-case

Constant-time is better, right?
- Yes, but you need “hashing to behave” (few collisions)
- Yes, but findMin, findMax, predecessor, and successor go from O(log n) to O(n)
Collision: when two keys map to the same location in the hash table.
We try to avoid collisions, but the number of keys exceeds the table size.
So hash tables should support collision resolution.
Chaining: all keys that map to the same table location are kept in a list (a.k.a. a “chain” or “bucket”).
As easy as it sounds.
Example: insert 10, 22, 107, 12, 42 with mod hashing and TableSize = 10
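The chaining example can be sketched in Java roughly as follows. This is a minimal illustration, not the course's implementation: the class name `ChainedIntTable` and its methods are hypothetical, keys are plain ints, and new keys are assumed to go at the front of their chain.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

// Minimal separate-chaining set of int keys (illustrative sketch only).
class ChainedIntTable {
    private final List<LinkedList<Integer>> buckets = new ArrayList<>();

    ChainedIntTable(int tableSize) {
        for (int i = 0; i < tableSize; i++) {
            buckets.add(new LinkedList<>());
        }
    }

    // "mod hashing": the bucket index is key mod TableSize.
    private int index(int key) {
        return Math.floorMod(key, buckets.size());
    }

    // Adds the key at the front of its chain unless it is already present.
    void insert(int key) {
        LinkedList<Integer> chain = buckets.get(index(key));
        if (!chain.contains(key)) {
            chain.addFirst(key);
        }
    }

    // Walks the chain and compares each stored key against the argument.
    boolean find(int key) {
        return buckets.get(index(key)).contains(key);
    }

    // Removes the key from its chain; returns false if it was not there.
    boolean delete(int key) {
        return buckets.get(index(key)).remove(Integer.valueOf(key));
    }

    // Copy of the chain at index i, front of the chain first.
    List<Integer> chainAt(int i) {
        return new ArrayList<>(buckets.get(i));
    }
}
```

Assuming a table of size 10, inserting 10, 22, 107, 12, 42 in that order leaves 10 alone in bucket 0, 107 alone in bucket 7, and 22, 12, and 42 sharing bucket 2.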
[Figure: successive snapshots of the table as 10, 22, and 107 are inserted into their chains.]
The load factor, λ, of a hash table is λ = N / TableSize, where N is the number of elements stored.
Under chaining, the average number of elements per bucket is λ.
So if some inserts are followed by random finds, then on average:
- Each unsuccessful find compares against λ items
- Each successful find compares against λ / 2 items
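As a small check of the load-factor claim, the sketch below computes λ and the average chain length for a set of distinct int keys under mod hashing. The class and method names are hypothetical and only illustrate the arithmetic.

```java
class LoadFactorDemo {
    // Load factor: number of stored elements divided by the number of buckets.
    static double loadFactor(int numElements, int tableSize) {
        return (double) numElements / tableSize;
    }

    // Average chain length under chaining with mod hashing.
    // Assumes the keys are distinct, so each key adds one chain entry.
    static double averageChainLength(int[] keys, int tableSize) {
        int[] chainLength = new int[tableSize];
        for (int key : keys) {
            chainLength[Math.floorMod(key, tableSize)]++;
        }
        int total = 0;
        for (int len : chainLength) {
            total += len;
        }
        return (double) total / tableSize;
    }
}
```

An unsuccessful find in a uniformly chosen bucket walks that bucket's whole chain, which is why its average cost matches the average chain length.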
Another simple idea: if h(key) is already full,
- try (h(key) + 1) % TableSize. If full,
- try (h(key) + 2) % TableSize. If full,
- try (h(key) + 3) % TableSize. If full…
Example: insert 38, 19, 8, 109, 10
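A minimal Java sketch of this "try the next slot" idea follows. It assumes int keys, h(key) = key mod TableSize, and no deletions (deleting from a probed table needs extra care that is not shown). The class name `LinearProbingTable` is hypothetical.

```java
// Open-addressing table of int keys using "try the next slot" probing.
class LinearProbingTable {
    private final Integer[] slots; // null marks an empty slot

    LinearProbingTable(int tableSize) {
        slots = new Integer[tableSize];
    }

    // Inserts the key and returns the index where it landed,
    // or -1 if every slot was probed and none was free.
    int insert(int key) {
        int home = Math.floorMod(key, slots.length);
        for (int i = 0; i < slots.length; i++) {
            int idx = (home + i) % slots.length; // i-th probe
            if (slots[idx] == null || slots[idx] == key) {
                slots[idx] = key;
                return idx;
            }
        }
        return -1;
    }

    // Follows the same probe sequence; an empty slot means the key is absent.
    boolean find(int key) {
        int home = Math.floorMod(key, slots.length);
        for (int i = 0; i < slots.length; i++) {
            int idx = (home + i) % slots.length;
            if (slots[idx] == null) {
                return false;
            }
            if (slots[idx] == key) {
                return true;
            }
        }
        return false;
    }
}
```

Assuming a table of size 10, inserting 38, 19, 8, 109, 10 in that order places them at indices 8, 9, 0, 1, and 2: the last three keys each collide and slide forward to the next free slot.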
Linear probing is one example of open addressing.
In general, open addressing means resolving collisions by trying a sequence of other positions in the table. Trying the next spot is called probing.
- Our i-th probe was (h(key) + i) % TableSize
- In general, have some probe function f and use (h(key) + f(i)) % TableSize
Open addressing does poorly with a high load factor λ.
Linear-probing performance degrades rapidly as the table gets full
- (The formula assumes a “large table”, but the point remains)
By comparison, chaining performance is linear in λ and has no trouble with λ > 1
We can avoid primary clustering by changing the probe function.
A common technique is quadratic probing:
- f(i) = i^2
- 0th probe: h(key) % TableSize
- 1st probe: (h(key) + 1) % TableSize
- 2nd probe: (h(key) + 4) % TableSize
- 3rd probe: (h(key) + 9) % TableSize
- …
- i-th probe: (h(key) + i^2) % TableSize
Intuition: probes quickly “leave the neighborhood”
[Example slides: quadratic probing with TableSize = 7. Insert: 76]
Uh-oh: for all n, ((n*n) + 5) % 7 is 0, 2, 5, or 6.
- This follows from (n^2 + 5) % 7 = ((n-7)^2 + 5) % 7
- In fact, for all c and k, (n^2 + c) % k = ((n-k)^2 + c) % k

The bad news: after TableSize quadratic probes, we will just cycle through the same indices.
The good news:
- Assertion #1: If T = TableSize is prime and λ < ½, then quadratic probing will find an empty slot in at most T/2 probes
- Assertion #2: For prime T and 0 ≤ i,j ≤ T/2 where i ≠ j, (h(key) + i^2) % T ≠ (h(key) + j^2) % T
So: if you keep λ < ½, no need to detect cycles
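The cycling behaviour and the distinct-probes property can be checked by brute force. The sketch below is an illustration only: the class and method names are hypothetical, and the home index 5 used in the note after the code is an assumed value.

```java
import java.util.LinkedHashSet;
import java.util.Set;

class QuadraticProbeDemo {
    // Indices visited by probes 0..numProbes-1, in first-visit order.
    // Assumes home >= 0 and numProbes small enough that i * i does not overflow.
    static Set<Integer> visited(int home, int tableSize, int numProbes) {
        Set<Integer> seen = new LinkedHashSet<>();
        for (int i = 0; i < numProbes; i++) {
            seen.add((home + i * i) % tableSize);
        }
        return seen;
    }

    // True if probes 0..tableSize/2 all land on different indices.
    static boolean firstHalfDistinct(int home, int tableSize) {
        int count = tableSize / 2 + 1;
        return visited(home, tableSize, count).size() == count;
    }
}
```

With TableSize = 7 and a home index of 5, fifty probes only ever reach indices 5, 6, 2, and 0, yet the first four probes are all different, which is what the λ < ½ guarantee relies on. With a non-prime size such as 8, the early probes already repeat.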
Quadratic probing does not suffer from primary clustering: no problem with keys initially hashing to the same neighborhood.
- But it’s no help if keys initially hash to the same index
  - Called secondary clustering
- Can avoid secondary clustering with a probe function that depends on the key: double hashing…
Idea: given two good hash functions h and g, it is very unlikely that for some key, h(key) == g(key).
So make the probe function f(i) = i*g(key).
Probe sequence:
- 0th probe: h(key) % TableSize
- 1st probe: (h(key) + g(key)) % TableSize
- 2nd probe: (h(key) + 2*g(key)) % TableSize
- 3rd probe: (h(key) + 3*g(key)) % TableSize
- …
- i-th probe: (h(key) + i*g(key)) % TableSize
Detail: make sure g(key) can’t be 0
Intuition: since each probe is “jumping” by g(key) each time, we “leave the neighborhood” and “go different places from other initial collisions”.
- But we could still have a problem like in quadratic probing where we are not “safe” (infinite loop despite room in table)
  - It is known that this cannot happen in at least one case:
    - h(key) = key % p
    - g(key) = q - (key % q)
    - 2 < q < p
    - p and q are prime
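The "safe" case can be sketched in Java as below, with h(key) = key % p and g(key) = q - (key % q). The class name `DoubleHashingTable` is hypothetical, the caller is assumed to pass primes with 2 < q < p (the sketch does not check this), and deletion is left out.

```java
// Open-addressing table of int keys using double hashing.
class DoubleHashingTable {
    private final Integer[] slots; // null marks an empty slot
    private final int p;           // table size, assumed prime
    private final int q;           // second modulus, assumed prime with 2 < q < p

    DoubleHashingTable(int p, int q) {
        this.p = p;
        this.q = q;
        this.slots = new Integer[p];
    }

    int h(int key) {
        return Math.floorMod(key, p);
    }

    // Always in the range 1..q, so the step between probes is never 0.
    int g(int key) {
        return q - Math.floorMod(key, q);
    }

    // Inserts the key and returns its index, or -1 if p probes found no free slot.
    int insert(int key) {
        for (int i = 0; i < p; i++) {
            int idx = (h(key) + i * g(key)) % p; // i-th probe
            if (slots[idx] == null || slots[idx] == key) {
                slots[idx] = key;
                return idx;
            }
        }
        return -1;
    }
}
```

Because the step g(key) lies between 1 and q, it is smaller than the prime p and shares no factor with it, so p consecutive probes visit every slot once. For instance, with p = 7 and q = 5, the keys 76 and 48 both start at index 6, but 48 then jumps by g(48) = 2 and lands at index 1.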
Assume “uniform hashing”
- Means the probability of g(key1) % p == g(key2) % p is 1/p
Non-trivial facts we won’t prove: average # of probes given λ (in the limit as TableSize → ∞)
- Unsuccessful search: 1 / (1 - λ)
- Successful search: (1/λ) ln(1 / (1 - λ))
Bottom line: unsuccessful bad (but not as bad as linear probing), but successful is not nearly as bad
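To get a feel for the numbers, the sketch below evaluates the standard textbook estimates for uniform hashing: 1 / (1 - λ) probes for an unsuccessful search and (1/λ) ln(1 / (1 - λ)) for a successful one. These formulas are assumed here as the usual large-table results, and the class and method names are hypothetical.

```java
class UniformHashingEstimates {
    // Expected probes for an unsuccessful search: 1 / (1 - lambda).
    // Assumes 0 <= lambda < 1.
    static double unsuccessful(double lambda) {
        return 1.0 / (1.0 - lambda);
    }

    // Expected probes for a successful search: (1 / lambda) * ln(1 / (1 - lambda)).
    // Assumes 0 < lambda < 1.
    static double successful(double lambda) {
        return Math.log(1.0 / (1.0 - lambda)) / lambda;
    }
}
```

At λ = 0.5 the estimates are 2 probes for an unsuccessful search and about 1.39 for a successful one. At λ = 0.9 the unsuccessful estimate grows to about 10 while the successful one stays near 2.6.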
Haven’t emphasized enough: for a find or a delete of an item of type E, we hash E, but then as we go through the chain or keep probing, we have to compare each item we see to E.
So a hash table needs a hash function and a comparator
- In Project 2, you’ll use two function objects
- The Java standard library uses a more OO approach where each object has an equals method and a hashCode method:

    class Object {
        boolean equals(Object o) { … }
        int hashCode() { … }
        …
    }
The Java library (and your project hash table) make a very important assumption that clients must satisfy…
- OO way of saying it: if a.equals(b), then we must require a.hashCode() == b.hashCode()
- Function object way of saying it: if c.compare(a,b) == 0, then we must require h.hash(a) == h.hash(b)
Why is this essential?
Lots of Java libraries use hash tables, perhaps without your knowledge.
So: if you ever override equals, you need to override hashCode also, in a consistent way.
    class PolarPoint {
        // Assumed tolerance: the name and value are illustrative.
        static final double EPSILON = 1e-4;
        double r;
        double theta;

        void addToAngle(double theta2) { theta += theta2; }
        …
        public boolean equals(Object otherObject) {
            if (this == otherObject) return true;
            if (otherObject == null) return false;
            // Compare against otherObject here: "other" is not declared yet.
            if (getClass() != otherObject.getClass()) return false;
            PolarPoint other = (PolarPoint) otherObject;
            double angleDiff = (theta - other.theta) % (2 * Math.PI);
            double rDiff = r - other.r;
            return Math.abs(angleDiff) < EPSILON
                && Math.abs(rDiff) < EPSILON;
        }
        // wrong: must override hashCode!
        // Think about using a hash table holding points
    }
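The following sketch shows why the missing hashCode matters when points are stored in a hash table. To stay short it uses hypothetical classes with exact integer coordinates rather than the tolerance-based comparison of PolarPoint; `BrokenPoint`, `FixedPoint`, and `HashCodeContractDemo` are illustrative names, not part of any library.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

// equals is overridden, but hashCode is inherited from Object: inconsistent.
class BrokenPoint {
    final int x, y;

    BrokenPoint(int x, int y) { this.x = x; this.y = y; }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (o == null || getClass() != o.getClass()) return false;
        BrokenPoint other = (BrokenPoint) o;
        return x == other.x && y == other.y;
    }
}

// equals and hashCode are overridden together, from the same fields.
class FixedPoint {
    final int x, y;

    FixedPoint(int x, int y) { this.x = x; this.y = y; }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (o == null || getClass() != o.getClass()) return false;
        FixedPoint other = (FixedPoint) o;
        return x == other.x && y == other.y;
    }

    @Override
    public int hashCode() {
        return Objects.hash(x, y);
    }
}

class HashCodeContractDemo {
    // Looks up an equal-but-distinct FixedPoint in a HashSet.
    static boolean fixedPointIsFound() {
        Set<FixedPoint> points = new HashSet<>();
        points.add(new FixedPoint(1, 2));
        return points.contains(new FixedPoint(1, 2));
    }

    // Same lookup with BrokenPoint; the two objects will usually hash differently.
    static boolean brokenPointIsFound() {
        Set<BrokenPoint> points = new HashSet<>();
        points.add(new BrokenPoint(1, 2));
        return points.contains(new BrokenPoint(1, 2));
    }
}
```

With `FixedPoint`, equal points always hash alike, so the lookup succeeds. With `BrokenPoint`, the two equal objects will almost always get different inherited hash codes, so the set typically looks in the wrong place and reports the point as missing even though equals would return true.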
Comparison functions have rules too, for:
- all our dictionaries
- sorting (next major topic)
In short, comparison must impose a consistent, total ordering. For all a, b, and c:
- If compare(a,b) < 0, then compare(b,a) > 0
- If compare(a,b) == 0, then compare(b,a) == 0
- If compare(a,b) < 0 and compare(b,c) < 0, then compare(a,c) < 0
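These rules can be spot-checked by brute force over a handful of sample values. The helper below is a hypothetical sketch, not a library facility: it uses only `java.util.Comparator`, and passing the check on some samples does not prove a comparator correct in general.

```java
import java.util.Comparator;

class ComparatorRules {
    // Checks the ordering rules for every combination drawn from the samples.
    static <T> boolean isConsistent(Comparator<T> cmp, T[] samples) {
        for (T a : samples) {
            for (T b : samples) {
                int ab = Integer.signum(cmp.compare(a, b));
                int ba = Integer.signum(cmp.compare(b, a));
                // Swapping the arguments must flip the sign (and keep 0 as 0).
                if (ab != -ba) {
                    return false;
                }
                for (T c : samples) {
                    // Transitivity: a < b and b < c must imply a < c.
                    if (ab < 0 && cmp.compare(b, c) < 0 && cmp.compare(a, c) >= 0) {
                        return false;
                    }
                }
            }
        }
        return true;
    }
}
```

The natural ordering on integers passes this check. A comparator such as `(a, b) -> a <= b ? -1 : 1` fails it, because comparing a value with itself returns -1 in both directions.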
The hash table is one of the most important data structures
- Supports only find, insert, and delete efficiently
Important to use a good hash function.
Important to keep the hash table at a good size.
Side-comment: hash functions have uses beyond hash tables
- Examples: cryptography, check-sums