Separate Chaining in Hash Tables: Understanding the Technique, Study notes of Data Structures and Algorithms

An in-depth exploration of Separate Chaining, a popular collision resolution technique used in hash tables: the concept, its implementation through an example, and thoughts on its efficiency. It also discusses Quadratic Probing as an alternative collision resolution method.

Typology: Study notes (2021/2022), uploaded on 09/27/2022 by nath

CSE332: Data Abstractions
Lecture 11: Hash Tables
Dan Grossman
Spring 2010
Hash Tables: Review
• Aim for constant-time (i.e., O(1)) find, insert, and delete
  – "On average" under some reasonable assumptions
• A hash table is an array of some fixed size
  – But growable as we'll see

[Diagram: the client passes a key of type E to the hash table library; the hash function maps it to an int table-index in the range 0 to TableSize – 1; on a collision, collision resolution decides where the entry goes in the hash table.]
Hash Tables: A Different ADT?
• In terms of a Dictionary ADT for just insert, find, delete, hash tables and balanced trees are just different data structures
  – Hash tables: O(1) on average (assuming few collisions)
  – Balanced trees: O(log n) worst-case
• Constant-time is better, right?
  – Yes, but you need "hashing to behave" (collisions)
  – Yes, but findMin, findMax, predecessor, and successor go from O(log n) to O(n)
  – Why your textbook considers this to be a different ADT
• Not so important to argue over the definitions
Collision resolution
Collision: When two keys map to the same location in the hash table
• We try to avoid it, but the number of keys can exceed the table size
• So hash tables should support collision resolution
• Ideas?

Separate Chaining
Chaining: All keys that map to the same table location are kept in a list (a.k.a. a "chain" or "bucket")
As easy as it sounds
Example: insert 10, 22, 107, 12, 42 with mod hashing and TableSize = 10

Separate Chaining (example, continued)
Stepping through the inserts with mod-10 hashing:
• after 10: bucket 0 holds 10
• after 22: bucket 2 holds 22
• after 107: bucket 7 holds 107
• after 12 and 42: both also hash to 2, so bucket 2's chain grows to 42 → 12 → 22
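The example above can be sketched in code. This is a minimal illustrative chaining table for int keys (class and method names are made up, not from the slides), using mod hashing with a fixed TableSize of 10 as in the example:

```java
import java.util.LinkedList;

// Minimal sketch of a separate-chaining hash table for int keys,
// mirroring the slide's example: mod hashing, TableSize = 10.
class ChainingTable {
    static final int TABLE_SIZE = 10;            // fixed size, as in the example
    LinkedList<Integer>[] buckets;

    @SuppressWarnings("unchecked")
    ChainingTable() {
        buckets = new LinkedList[TABLE_SIZE];
        for (int i = 0; i < TABLE_SIZE; i++)
            buckets[i] = new LinkedList<>();     // every bucket starts as an empty chain
    }

    void insert(int key) {
        int index = key % TABLE_SIZE;            // mod hashing
        buckets[index].addFirst(key);            // new keys go at the front of the chain
    }

    boolean find(int key) {
        int index = key % TABLE_SIZE;
        return buckets[index].contains(key);     // walk the chain at that bucket
    }

    public static void main(String[] args) {
        ChainingTable t = new ChainingTable();
        for (int k : new int[]{10, 22, 107, 12, 42}) t.insert(k);
        // 22, 12, 42 all hash to bucket 2 and share one chain
        System.out.println(t.buckets[2]);        // [42, 12, 22]
        System.out.println(t.find(107));         // true
    }
}
```

Note that insert never fails: a bucket's chain can grow without bound, which is why chaining tolerates more keys than table slots.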

More rigorous chaining analysis
Definition: The load factor, λ, of a hash table is

  λ = N / TableSize    (N = number of elements)

Under chaining, the average number of elements per bucket is λ
So if some inserts are followed by random finds, then on average:
• Each unsuccessful find compares against λ items
• Each successful find compares against λ / 2 items
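A quick numeric check of the definition (the table size and element count here are made up for illustration):

```java
// Worked check of the load-factor definition: λ = N / TableSize.
// A table of size 10 holding 5 elements has λ = 0.5, so a random
// unsuccessful find scans ~0.5 items and a successful one ~0.25.
class LoadFactorDemo {
    static double loadFactor(int n, int tableSize) {
        return (double) n / tableSize;     // λ = N / TableSize
    }

    public static void main(String[] args) {
        double lambda = loadFactor(5, 10);
        System.out.println(lambda);        // 0.5 = average chain length
        System.out.println(lambda / 2);    // 0.25 = avg compares, successful find
    }
}
```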

Alternative: Use empty space in the table
• Another simple idea: If h(key) is already full,
  – try (h(key) + 1) % TableSize. If full,
  – try (h(key) + 2) % TableSize. If full,
  – try (h(key) + 3) % TableSize. If full…
• Example: insert 38, 19, 8, 109, 10


Open addressing
This is one example of open addressing
In general, open addressing means resolving collisions by trying a sequence of other positions in the table. Trying the next spot is called probing
• Our i-th probe was (h(key) + i) % TableSize
  – This is called linear probing
• In general have some probe function f and use (h(key) + f(i)) % TableSize
Open addressing does poorly with high load factor λ
• So want larger tables
• Too many probes means no more O(1)
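The linear-probing insert described above can be sketched as follows. The class is illustrative, and the table size of 10 for the slide's example (insert 38, 19, 8, 109, 10) is an assumption, since the size did not survive in this copy:

```java
// Sketch of linear-probing insertion (open addressing) with mod hashing.
// Assumes the table is never completely full when insert is called.
class LinearProbing {
    Integer[] table;                             // null means "empty slot"

    LinearProbing(int size) { table = new Integer[size]; }

    void insert(int key) {
        int h = key % table.length;
        for (int i = 0; i < table.length; i++) {
            int index = (h + i) % table.length;  // i-th probe: (h(key) + i) % TableSize
            if (table[index] == null) {
                table[index] = key;
                return;
            }
        }
        throw new IllegalStateException("table full");
    }

    public static void main(String[] args) {
        LinearProbing t = new LinearProbing(10);
        for (int k : new int[]{38, 19, 8, 109, 10}) t.insert(k);
        // 38→8, 19→9; 8 finds 8 and 9 taken, lands at 0;
        // 109 probes 9, 0, then lands at 1; 10 probes 0, 1, then lands at 2
        System.out.println(java.util.Arrays.toString(t.table));
    }
}
```

The run shows primary clustering directly: once slots 8, 9, 0 fill up, every later key hashing near them slides along the same run.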

In a chart
• Linear-probing performance degrades rapidly as the table gets full
  – (Formula assumes "large table" but the point remains)
• By comparison, chaining performance is linear in λ and has no trouble with λ > 1

Quadratic probing
• We can avoid primary clustering by changing the probe function
  – A common technique is quadratic probing: f(i) = i²
• So probe sequence is:
  – 0th probe: h(key) % TableSize
  – 1st probe: (h(key) + 1) % TableSize
  – 2nd probe: (h(key) + 4) % TableSize
  – 3rd probe: (h(key) + 9) % TableSize
  – …
  – i-th probe: (h(key) + i²) % TableSize
• Intuition: Probes quickly "leave the neighborhood"

Quadratic Probing Example
TableSize = 10; insert: 89, 18, 49, 58, 79
[Slides 27–32 step through these insertions one at a time.]
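The example can be replayed in code. This sketch (names are illustrative) inserts the five keys with the quadratic probe function from the previous slide:

```java
// Sketch of quadratic-probing insertion: i-th probe is (h(key) + i*i) % TableSize.
// Replays the slide's example: TableSize = 10, insert 89, 18, 49, 58, 79.
class QuadraticProbing {
    Integer[] table;

    QuadraticProbing(int size) { table = new Integer[size]; }

    void insert(int key) {
        int h = key % table.length;
        for (int i = 0; i < table.length; i++) {
            int index = (h + i * i) % table.length;  // quadratic probe sequence
            if (table[index] == null) { table[index] = key; return; }
        }
        throw new IllegalStateException("no empty slot found");
    }

    public static void main(String[] args) {
        QuadraticProbing t = new QuadraticProbing(10);
        for (int k : new int[]{89, 18, 49, 58, 79}) t.insert(k);
        // 89→9, 18→8; 49 probes 9 then 9+1=10→0; 58 probes 8, 9, then 8+4=12→2;
        // 79 probes 9, 0, then 9+4=13→3
        System.out.println(java.util.Arrays.toString(t.table));
    }
}
```

Note how 58 and 79 "leave the neighborhood" of the crowded slots 8–9 after only two probes, instead of sliding along a run as linear probing would.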

Another Quadratic Probing Example
TableSize = 7; insert: 76, …
[Slides 37–39 step through the insertions.]

Uh-oh: For all n, (n*n) % 7 is 0, 1, 2, or 4
• Excel shows it takes "at least" 50 probes and a pattern emerges
• Proof uses induction and (n²) % 7 = ((n − 7)²) % 7
• In fact, for all c and k: (n² + c) % k = ((n − k)² + c) % k
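The "uh-oh" fact is easy to verify by brute force. This small check (an illustrative sketch, not from the slides) collects every residue that a square can take mod 7:

```java
// Brute-force check that (n*n) % 7 only ever hits a few residues, which is
// why quadratic probing in a size-7 table can loop forever over the same slots.
class SquaresMod7 {
    static java.util.Set<Integer> residues(int mod, int upTo) {
        java.util.Set<Integer> seen = new java.util.TreeSet<>();
        for (int n = 0; n < upTo; n++)
            seen.add((n * n) % mod);           // squares mod m repeat with period m
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(residues(7, 1000)); // [0, 1, 2, 4]
    }
}
```

So a key whose home slot collides can only ever probe four of the seven slots; the other three are unreachable no matter how long we probe.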

From bad news to good news
• The bad news: After TableSize quadratic probes, we will just cycle through the same indices
• The good news:
  – Assertion #1: If T = TableSize is prime and λ < ½, then quadratic probing will find an empty slot in at most T/2 probes
  – Assertion #2: For prime T and 0 ≤ i, j ≤ T/2 where i ≠ j, (h(key) + i²) % T ≠ (h(key) + j²) % T
  – Assertion #3: Assertion #2 is the "key fact" for proving Assertion #1
• So: If you keep λ < ½, no need to detect cycles
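Assertion #2 can also be checked by brute force for small primes. This sketch (the primes tested are chosen arbitrarily) confirms that the first T/2 + 1 quadratic probes are all distinct when T is prime, and shows a composite counterexample:

```java
// Brute-force check of Assertion #2: for prime T, the probes
// (h + i*i) % T for i = 0 .. T/2 never repeat, so quadratic probing
// visits that many distinct slots before any cycling can start.
class QuadraticDistinctProbes {
    static boolean firstHalfProbesDistinct(int T, int h) {
        java.util.Set<Integer> seen = new java.util.HashSet<>();
        for (int i = 0; i <= T / 2; i++)
            if (!seen.add((h + i * i) % T)) return false;  // duplicate probe found
        return true;
    }

    public static void main(String[] args) {
        for (int T : new int[]{7, 11, 13, 101})            // a few primes
            for (int h = 0; h < T; h++)
                if (!firstHalfProbesDistinct(T, h))
                    throw new AssertionError("cycle for T=" + T);
        System.out.println("distinct for all tested primes");
        System.out.println(firstHalfProbesDistinct(16, 0)); // false: 16 is not prime
    }
}
```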

Clustering reconsidered
• Quadratic probing does not suffer from primary clustering: no problem with keys initially hashing to the same neighborhood
• But it's no help if keys initially hash to the same index
  – Called secondary clustering
• Can avoid secondary clustering with a probe function that depends on the key: double hashing…

Double hashing
Idea:
• Given two good hash functions h and g, it is very unlikely that for some key, h(key) == g(key)
• So make the probe function f(i) = i*g(key)
Probe sequence:
• 0th probe: h(key) % TableSize
• 1st probe: (h(key) + g(key)) % TableSize
• 2nd probe: (h(key) + 2*g(key)) % TableSize
• 3rd probe: (h(key) + 3*g(key)) % TableSize
• …
• i-th probe: (h(key) + i*g(key)) % TableSize
Detail: Make sure g(key) can't be 0
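A sketch of this probe sequence, using the concrete hash pair the next slide recommends (h(key) = key % p, g(key) = q − (key % q)); the values p = 11 and q = 7 are chosen here only for illustration:

```java
// Sketch of double hashing: the i-th probe is (h(key) + i*g(key)) % TableSize.
// Uses the prime pair p = 11, q = 7 (illustrative values, 2 < q < p).
class DoubleHashing {
    static final int P = 11, Q = 7;
    Integer[] table = new Integer[P];

    static int h(int key) { return key % P; }
    static int g(int key) { return Q - (key % Q); }    // ranges over 1..Q, never 0

    void insert(int key) {
        for (int i = 0; i < P; i++) {
            int index = (h(key) + i * g(key)) % P;     // double-hashing probe sequence
            if (table[index] == null) { table[index] = key; return; }
        }
        throw new IllegalStateException("table full");
    }

    public static void main(String[] args) {
        DoubleHashing t = new DoubleHashing();
        for (int k : new int[]{5, 16, 27}) t.insert(k); // all three hash to slot 5
        // 5 lands at 5; 16 jumps by g(16)=5 to slot 10; 27 jumps by g(27)=1 to slot 6
        System.out.println(java.util.Arrays.toString(t.table));
    }
}
```

The three colliding keys scatter to different slots because each gets its own step size g(key), which is exactly how double hashing avoids secondary clustering.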

Double-hashing analysis
• Intuition: Since each probe is "jumping" by g(key) each time, we "leave the neighborhood" and "go different places from other initial collisions"
• But we could still have a problem like in quadratic probing where we are not "safe" (infinite loop despite room in table)
  – It is known that this cannot happen in at least one case:
    • h(key) = key % p
    • g(key) = q − (key % q)
    • 2 < q < p
    • p and q are prime

More double-hashing facts
• Assume "uniform hashing"
  – Means probability of g(key1) % p == g(key2) % p is 1/p
• Non-trivial facts we won't prove: average # of probes given λ (in the limit as TableSize → ∞):
  – Unsuccessful search (intuitive): 1 / (1 − λ)
  – Successful search (less intuitive): (1/λ) log_e( 1 / (1 − λ) )
• Bottom line: unsuccessful is bad (but not as bad as linear probing), but successful is not nearly as bad

Hashing and comparing
• Haven't emphasized enough: for a find or a delete of an item of type E, we hash E, but then as we go through the chain or keep probing, we have to compare each item we see to E
• So a hash table needs a hash function and a comparator
  – In Project 2, you'll use two function objects
  – The Java standard library uses a more OO approach where each object has an equals method and a hashCode method:

    class Object {
      boolean equals(Object o);
      int hashCode();
      …
    }

Equal objects must hash the same
• The Java library (and your project hash table) make a very important assumption that clients must satisfy…
• OO way of saying it: If a.equals(b), then we must require a.hashCode() == b.hashCode()
• Function-object way of saying it: If c.compare(a,b) == 0, then we must require h.hash(a) == h.hash(b)
• Why is this essential?

Java bottom line
• Lots of Java libraries use hash tables, perhaps without your knowledge
• So: If you ever override equals, you need to override hashCode also in a consistent way
  – See the CoreJava book, Chapter 5 for other "gotchas" with equals

Bad Example

    class PolarPoint {
      double r = 0.0;
      double theta = 0.0;
      void addToAngle(double theta2) { theta += theta2; }
      …
      boolean equals(Object otherObject) {
        if(this == otherObject) return true;
        if(otherObject == null) return false;
        if(getClass() != otherObject.getClass()) return false;
        PolarPoint other = (PolarPoint)otherObject;
        double angleDiff = (theta - other.theta) % (2*Math.PI);
        double rDiff = r - other.r;
        return Math.abs(angleDiff) < EPSILON &&
               Math.abs(rDiff) < EPSILON;   // tolerance constant elided in this copy
      }
      // wrong: must override hashCode!
      // Think about using a hash table holding points
    }
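For contrast, here is a made-up minimal class (not from the slides) that does satisfy the contract: equals and hashCode are overridden together, so HashMap lookups behave. Omitting hashCode, as PolarPoint does, would usually send equal keys to different buckets and make get return null:

```java
import java.util.HashMap;
import java.util.Objects;

// Illustrative class whose equals and hashCode are consistent:
// equal FixedName objects always produce the same hash code.
class FixedName {
    final String s;
    FixedName(String s) { this.s = s; }

    @Override public boolean equals(Object o) {
        return o instanceof FixedName && ((FixedName) o).s.equals(s);
    }
    @Override public int hashCode() {
        return Objects.hash(s);   // equal objects now hash the same
    }

    public static void main(String[] args) {
        HashMap<FixedName, Integer> m = new HashMap<>();
        m.put(new FixedName("alice"), 1);
        // A *different* but equal key still finds the entry, because hashCode
        // steers the lookup to the right bucket before equals confirms the match.
        System.out.println(m.get(new FixedName("alice")));  // 1
    }
}
```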

By the way: comparison has rules too
We didn't emphasize some important "rules" about comparison functions for:
• all our dictionaries
• sorting (next major topic)
In short, comparison must impose a consistent, total ordering: for all a, b, and c:
• If compare(a,b) < 0, then compare(b,a) > 0
• If compare(a,b) == 0, then compare(b,a) == 0
• If compare(a,b) < 0 and compare(b,c) < 0, then compare(a,c) < 0

Final word on hashing
• The hash table is one of the most important data structures
  – Supports only find, insert, and delete efficiently
• Important to use a good hash function
• Important to keep the hash table at a good size
• Side-comment: hash functions have uses beyond hash tables
  – Examples: cryptography, check-sums