Separate Chaining in Hash Tables: Understanding the Technique, Study notes of Data Structures and Algorithms

An in-depth exploration of Separate Chaining, a popular collision resolution technique used in hash tables: the concept, its implementation through an example, and thoughts on its efficiency. It also discusses Quadratic Probing as an alternative collision resolution method.

Typology: Study notes (2021/2022), uploaded on 09/27/2022 by nath

CSE332: Data Abstractions
Lecture 11: Hash Tables
Dan Grossman
Spring 2010
Hash Tables: Review
• Aim for constant-time (i.e., O(1)) find, insert, and delete
  – "On average" under some reasonable assumptions
• A hash table is an array of some fixed size
  – But growable as we'll see

[Diagram: the client passes a key of type E to the hash table library; the hash function maps it to an int table-index in the range 0 to TableSize – 1; on a collision, collision resolution decides where the entry goes in the hash table.]
Hash Tables: A Different ADT?
• In terms of a Dictionary ADT for just insert, find, delete, hash tables and balanced trees are just different data structures
  – Hash tables: O(1) on average (assuming few collisions)
  – Balanced trees: O(log n) worst-case
• Constant-time is better, right?
  – Yes, but you need "hashing to behave" (collisions)
  – Yes, but findMin, findMax, predecessor, and successor go from O(log n) to O(n)
  – Why your textbook considers this to be a different ADT
• Not so important to argue over the definitions
Collision resolution
Collision: When two keys map to the same location in the hash table
• We try to avoid it, but the number of keys can exceed the table size
• So hash tables should support collision resolution
• Ideas?

Separate Chaining
Chaining: All keys that map to the same table location are kept in a list (a.k.a. a "chain" or "bucket")
As easy as it sounds
Example: insert 10, 22, 107, 12, 42 with mod hashing and TableSize = 10

Separate Chaining (example, continued)
Stepping through the inserts with mod-10 hashing:
• after 10: bucket 0 holds 10
• after 22: bucket 2 holds 22
• after 107: bucket 7 holds 107
• after 12 and 42: both also hash to 2, so bucket 2's chain grows to 42 → 12 → 22
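The example above can be sketched in code. This is a minimal illustrative chaining table for int keys (class and method names are made up, not from the slides), using mod hashing with a fixed TableSize of 10 as in the example:

```java
import java.util.LinkedList;

// Minimal sketch of a separate-chaining hash table for int keys,
// mirroring the slide's example: mod hashing, TableSize = 10.
class ChainingTable {
    static final int TABLE_SIZE = 10;            // fixed size, as in the example
    LinkedList<Integer>[] buckets;

    @SuppressWarnings("unchecked")
    ChainingTable() {
        buckets = new LinkedList[TABLE_SIZE];
        for (int i = 0; i < TABLE_SIZE; i++)
            buckets[i] = new LinkedList<>();     // every bucket starts as an empty chain
    }

    void insert(int key) {
        int index = key % TABLE_SIZE;            // mod hashing
        buckets[index].addFirst(key);            // new keys go at the front of the chain
    }

    boolean find(int key) {
        int index = key % TABLE_SIZE;
        return buckets[index].contains(key);     // walk the chain at that bucket
    }

    public static void main(String[] args) {
        ChainingTable t = new ChainingTable();
        for (int k : new int[]{10, 22, 107, 12, 42}) t.insert(k);
        // 22, 12, 42 all hash to bucket 2 and share one chain
        System.out.println(t.buckets[2]);        // [42, 12, 22]
        System.out.println(t.find(107));         // true
    }
}
```

Note that insert never fails: a bucket's chain can grow without bound, which is why chaining tolerates more keys than table slots.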

More rigorous chaining analysis
Definition: The load factor, λ, of a hash table is

  λ = N / TableSize    (N = number of elements)

Under chaining, the average number of elements per bucket is λ
So if some inserts are followed by random finds, then on average:
• Each unsuccessful find compares against λ items
• Each successful find compares against λ / 2 items
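A quick numeric check of the definition (the table size and element count here are made up for illustration):

```java
// Worked check of the load-factor definition: λ = N / TableSize.
// A table of size 10 holding 5 elements has λ = 0.5, so a random
// unsuccessful find scans ~0.5 items and a successful one ~0.25.
class LoadFactorDemo {
    static double loadFactor(int n, int tableSize) {
        return (double) n / tableSize;     // λ = N / TableSize
    }

    public static void main(String[] args) {
        double lambda = loadFactor(5, 10);
        System.out.println(lambda);        // 0.5 = average chain length
        System.out.println(lambda / 2);    // 0.25 = avg compares, successful find
    }
}
```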

Alternative: Use empty space in the table
• Another simple idea: If h(key) is already full,
  – try (h(key) + 1) % TableSize. If full,
  – try (h(key) + 2) % TableSize. If full,
  – try (h(key) + 3) % TableSize. If full…
• Example: insert 38, 19, 8, 109, 10


Open addressing
This is one example of open addressing
In general, open addressing means resolving collisions by trying a sequence of other positions in the table. Trying the next spot is called probing
• Our i-th probe was (h(key) + i) % TableSize
  – This is called linear probing
• In general have some probe function f and use (h(key) + f(i)) % TableSize
Open addressing does poorly with high load factor λ
• So want larger tables
• Too many probes means no more O(1)
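The linear-probing insert described above can be sketched as follows. The class is illustrative, and the table size of 10 for the slide's example (insert 38, 19, 8, 109, 10) is an assumption, since the size did not survive in this copy:

```java
// Sketch of linear-probing insertion (open addressing) with mod hashing.
// Assumes the table is never completely full when insert is called.
class LinearProbing {
    Integer[] table;                             // null means "empty slot"

    LinearProbing(int size) { table = new Integer[size]; }

    void insert(int key) {
        int h = key % table.length;
        for (int i = 0; i < table.length; i++) {
            int index = (h + i) % table.length;  // i-th probe: (h(key) + i) % TableSize
            if (table[index] == null) {
                table[index] = key;
                return;
            }
        }
        throw new IllegalStateException("table full");
    }

    public static void main(String[] args) {
        LinearProbing t = new LinearProbing(10);
        for (int k : new int[]{38, 19, 8, 109, 10}) t.insert(k);
        // 38→8, 19→9; 8 finds 8 and 9 taken, lands at 0;
        // 109 probes 9, 0, then lands at 1; 10 probes 0, 1, then lands at 2
        System.out.println(java.util.Arrays.toString(t.table));
    }
}
```

The run shows primary clustering directly: once slots 8, 9, 0 fill up, every later key hashing near them slides along the same run.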

In a chart
• Linear-probing performance degrades rapidly as the table gets full
  – (Formula assumes "large table" but the point remains)
• By comparison, chaining performance is linear in λ and has no trouble with λ > 1

Quadratic probing
• We can avoid primary clustering by changing the probe function
  – A common technique is quadratic probing: f(i) = i²
• So probe sequence is:
  – 0th probe: h(key) % TableSize
  – 1st probe: (h(key) + 1) % TableSize
  – 2nd probe: (h(key) + 4) % TableSize
  – 3rd probe: (h(key) + 9) % TableSize
  – …
  – i-th probe: (h(key) + i²) % TableSize
• Intuition: Probes quickly "leave the neighborhood"

Quadratic Probing Example
TableSize = 10; insert: 89, 18, 49, 58, 79
[Slides 27–32 step through these insertions one at a time.]
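The example can be replayed in code. This sketch (names are illustrative) inserts the five keys with the quadratic probe function from the previous slide:

```java
// Sketch of quadratic-probing insertion: i-th probe is (h(key) + i*i) % TableSize.
// Replays the slide's example: TableSize = 10, insert 89, 18, 49, 58, 79.
class QuadraticProbing {
    Integer[] table;

    QuadraticProbing(int size) { table = new Integer[size]; }

    void insert(int key) {
        int h = key % table.length;
        for (int i = 0; i < table.length; i++) {
            int index = (h + i * i) % table.length;  // quadratic probe sequence
            if (table[index] == null) { table[index] = key; return; }
        }
        throw new IllegalStateException("no empty slot found");
    }

    public static void main(String[] args) {
        QuadraticProbing t = new QuadraticProbing(10);
        for (int k : new int[]{89, 18, 49, 58, 79}) t.insert(k);
        // 89→9, 18→8; 49 probes 9 then 9+1=10→0; 58 probes 8, 9, then 8+4=12→2;
        // 79 probes 9, 0, then 9+4=13→3
        System.out.println(java.util.Arrays.toString(t.table));
    }
}
```

Note how 58 and 79 "leave the neighborhood" of the crowded slots 8–9 after only two probes, instead of sliding along a run as linear probing would.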

Another Quadratic Probing Example
TableSize = 7; insert: 76, …
[Slides 37–39 step through the insertions.]

Uh-oh: For all n, (n*n) % 7 is 0, 1, 2, or 4
• Excel shows it takes "at least" 50 probes and a pattern emerges
• Proof uses induction and (n²) % 7 = ((n − 7)²) % 7
• In fact, for all c and k: (n² + c) % k = ((n − k)² + c) % k
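The "uh-oh" fact is easy to verify by brute force. This small check (an illustrative sketch, not from the slides) collects every residue that a square can take mod 7:

```java
// Brute-force check that (n*n) % 7 only ever hits a few residues, which is
// why quadratic probing in a size-7 table can loop forever over the same slots.
class SquaresMod7 {
    static java.util.Set<Integer> residues(int mod, int upTo) {
        java.util.Set<Integer> seen = new java.util.TreeSet<>();
        for (int n = 0; n < upTo; n++)
            seen.add((n * n) % mod);           // squares mod m repeat with period m
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(residues(7, 1000)); // [0, 1, 2, 4]
    }
}
```

So a key whose home slot collides can only ever probe four of the seven slots; the other three are unreachable no matter how long we probe.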

From bad news to good news
• The bad news: After TableSize quadratic probes, we will just cycle through the same indices
• The good news:
  – Assertion #1: If T = TableSize is prime and λ < ½, then quadratic probing will find an empty slot in at most T/2 probes
  – Assertion #2: For prime T and 0 ≤ i, j ≤ T/2 where i ≠ j, (h(key) + i²) % T ≠ (h(key) + j²) % T
  – Assertion #3: Assertion #2 is the "key fact" for proving Assertion #1
• So: If you keep λ < ½, no need to detect cycles
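Assertion #2 can also be checked by brute force for small primes. This sketch (the primes tested are chosen arbitrarily) confirms that the first T/2 + 1 quadratic probes are all distinct when T is prime, and shows a composite counterexample:

```java
// Brute-force check of Assertion #2: for prime T, the probes
// (h + i*i) % T for i = 0 .. T/2 never repeat, so quadratic probing
// visits that many distinct slots before any cycling can start.
class QuadraticDistinctProbes {
    static boolean firstHalfProbesDistinct(int T, int h) {
        java.util.Set<Integer> seen = new java.util.HashSet<>();
        for (int i = 0; i <= T / 2; i++)
            if (!seen.add((h + i * i) % T)) return false;  // duplicate probe found
        return true;
    }

    public static void main(String[] args) {
        for (int T : new int[]{7, 11, 13, 101})            // a few primes
            for (int h = 0; h < T; h++)
                if (!firstHalfProbesDistinct(T, h))
                    throw new AssertionError("cycle for T=" + T);
        System.out.println("distinct for all tested primes");
        System.out.println(firstHalfProbesDistinct(16, 0)); // false: 16 is not prime
    }
}
```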

Clustering reconsidered
• Quadratic probing does not suffer from primary clustering: no problem with keys initially hashing to the same neighborhood
• But it's no help if keys initially hash to the same index
  – Called secondary clustering
• Can avoid secondary clustering with a probe function that depends on the key: double hashing…

Double hashing
Idea:
• Given two good hash functions h and g, it is very unlikely that for some key, h(key) == g(key)
• So make the probe function f(i) = i*g(key)
Probe sequence:
• 0th probe: h(key) % TableSize
• 1st probe: (h(key) + g(key)) % TableSize
• 2nd probe: (h(key) + 2*g(key)) % TableSize
• 3rd probe: (h(key) + 3*g(key)) % TableSize
• …
• i-th probe: (h(key) + i*g(key)) % TableSize
Detail: Make sure g(key) can't be 0
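A sketch of this probe sequence, using the concrete hash pair the next slide recommends (h(key) = key % p, g(key) = q − (key % q)); the values p = 11 and q = 7 are chosen here only for illustration:

```java
// Sketch of double hashing: the i-th probe is (h(key) + i*g(key)) % TableSize.
// Uses the prime pair p = 11, q = 7 (illustrative values, 2 < q < p).
class DoubleHashing {
    static final int P = 11, Q = 7;
    Integer[] table = new Integer[P];

    static int h(int key) { return key % P; }
    static int g(int key) { return Q - (key % Q); }    // ranges over 1..Q, never 0

    void insert(int key) {
        for (int i = 0; i < P; i++) {
            int index = (h(key) + i * g(key)) % P;     // double-hashing probe sequence
            if (table[index] == null) { table[index] = key; return; }
        }
        throw new IllegalStateException("table full");
    }

    public static void main(String[] args) {
        DoubleHashing t = new DoubleHashing();
        for (int k : new int[]{5, 16, 27}) t.insert(k); // all three hash to slot 5
        // 5 lands at 5; 16 jumps by g(16)=5 to slot 10; 27 jumps by g(27)=1 to slot 6
        System.out.println(java.util.Arrays.toString(t.table));
    }
}
```

The three colliding keys scatter to different slots because each gets its own step size g(key), which is exactly how double hashing avoids secondary clustering.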

Double-hashing analysis
• Intuition: Since each probe is "jumping" by g(key) each time, we "leave the neighborhood" and "go different places from other initial collisions"
• But we could still have a problem like in quadratic probing where we are not "safe" (infinite loop despite room in table)
  – It is known that this cannot happen in at least one case:
    • h(key) = key % p
    • g(key) = q − (key % q)
    • 2 < q < p
    • p and q are prime

More double-hashing facts
• Assume "uniform hashing"
  – Means probability of g(key1) % p == g(key2) % p is 1/p
• Non-trivial facts we won't prove: average # of probes given λ (in the limit as TableSize → ∞):
  – Unsuccessful search (intuitive): 1 / (1 − λ)
  – Successful search (less intuitive): (1/λ) log_e( 1 / (1 − λ) )
• Bottom line: unsuccessful is bad (but not as bad as linear probing), but successful is not nearly as bad

Hashing and comparing
• Haven't emphasized enough: for a find or a delete of an item of type E, we hash E, but then as we go through the chain or keep probing, we have to compare each item we see to E
• So a hash table needs a hash function and a comparator
  – In Project 2, you'll use two function objects
  – The Java standard library uses a more OO approach where each object has an equals method and a hashCode method:

    class Object {
      boolean equals(Object o);
      int hashCode();
      …
    }

Equal objects must hash the same
• The Java library (and your project hash table) make a very important assumption that clients must satisfy…
• OO way of saying it: If a.equals(b), then we must require a.hashCode() == b.hashCode()
• Function-object way of saying it: If c.compare(a,b) == 0, then we must require h.hash(a) == h.hash(b)
• Why is this essential?

Java bottom line
• Lots of Java libraries use hash tables, perhaps without your knowledge
• So: If you ever override equals, you need to override hashCode also in a consistent way
  – See the CoreJava book, Chapter 5 for other "gotchas" with equals

Bad Example

    class PolarPoint {
      double r = 0.0;
      double theta = 0.0;
      void addToAngle(double theta2) { theta += theta2; }
      …
      boolean equals(Object otherObject) {
        if(this == otherObject) return true;
        if(otherObject == null) return false;
        if(getClass() != otherObject.getClass()) return false;
        PolarPoint other = (PolarPoint)otherObject;
        double angleDiff = (theta - other.theta) % (2*Math.PI);
        double rDiff = r - other.r;
        return Math.abs(angleDiff) < EPSILON &&
               Math.abs(rDiff) < EPSILON;   // tolerance constant elided in this copy
      }
      // wrong: must override hashCode!
      // Think about using a hash table holding points
    }
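For contrast, here is a made-up minimal class (not from the slides) that does satisfy the contract: equals and hashCode are overridden together, so HashMap lookups behave. Omitting hashCode, as PolarPoint does, would usually send equal keys to different buckets and make get return null:

```java
import java.util.HashMap;
import java.util.Objects;

// Illustrative class whose equals and hashCode are consistent:
// equal FixedName objects always produce the same hash code.
class FixedName {
    final String s;
    FixedName(String s) { this.s = s; }

    @Override public boolean equals(Object o) {
        return o instanceof FixedName && ((FixedName) o).s.equals(s);
    }
    @Override public int hashCode() {
        return Objects.hash(s);   // equal objects now hash the same
    }

    public static void main(String[] args) {
        HashMap<FixedName, Integer> m = new HashMap<>();
        m.put(new FixedName("alice"), 1);
        // A *different* but equal key still finds the entry, because hashCode
        // steers the lookup to the right bucket before equals confirms the match.
        System.out.println(m.get(new FixedName("alice")));  // 1
    }
}
```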

By the way: comparison has rules too
We didn't emphasize some important "rules" about comparison functions for:
• all our dictionaries
• sorting (next major topic)
In short, comparison must impose a consistent, total ordering: for all a, b, and c:
• If compare(a,b) < 0, then compare(b,a) > 0
• If compare(a,b) == 0, then compare(b,a) == 0
• If compare(a,b) < 0 and compare(b,c) < 0, then compare(a,c) < 0

Final word on hashing
• The hash table is one of the most important data structures
  – Supports only find, insert, and delete efficiently
• Important to use a good hash function
• Important to keep the hash table at a good size
• Side-comment: hash functions have uses beyond hash tables
  – Examples: cryptography, check-sums