Computer Architecture: Pipelined Implementation (Lecture Slides)

Topics: Implementing Stalling, Pipeline Register Modes, Data Forwarding, Bypass Paths, Forwarding Priority, Implementing Forwarding, Handling Mispredictions

Randal E. Bryant

Carnegie Mellon University

CS:APP2e

CS:APP Chapter 4

Computer Architecture: Pipelined Implementation, Part I

http://csapp.cs.cmu.edu

Overview

General Principles of Pipelining
 Goal
 Difficulties

Creating a Pipelined Y86 Processor
 Rearranging SEQ
 Inserting pipeline registers
 Problems with data and control hazards

Computational Example

System

Computation requires total of 300 picoseconds

Additional 20 picoseconds to save result in register

Must have clock cycle of at least 320 ps

[Figure: combinational logic (300 ps) feeding a register (20 ps), both driven by the clock. Delay = 320 ps, Throughput = 3.12 GIPS]
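As a quick check of these numbers, the delay and throughput of the unpipelined system can be computed directly. This is a small illustrative sketch, not part of the original slides; GIPS here means billions of instructions per second.

```python
# Unpipelined system: 300 ps of combinational logic plus 20 ps to latch the result.
logic_ps = 300
reg_ps = 20

cycle_ps = logic_ps + reg_ps          # minimum clock period: 320 ps
latency_ps = cycle_ps                 # one instruction finishes per clock cycle
throughput_gips = 1e3 / cycle_ps      # (1 instruction / 320 ps) expressed in GIPS

print(latency_ps, throughput_gips)    # 320  3.125  (reported as 3.12 GIPS)
```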

3-Way Pipelined Version

System

Divide combinational logic into 3 blocks of 100 ps each

Can begin new operation as soon as previous one passes through stage A.

 Begin new operation every 120 ps

Overall latency increases

 360 ps from start to finish

[Figure: three pipeline stages, Comb. logic A, B, and C (100 ps each), each followed by a 20 ps register, all driven by the clock. Delay = 360 ps, Throughput = 8.33 GIPS]
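The same arithmetic for the pipelined version shows the tradeoff the slide describes: throughput improves while latency gets slightly worse. Again, this is only an illustrative sketch.

```python
# 3-way pipelined version: the 300 ps of logic is split into three 100 ps stages,
# each followed by a 20 ps pipeline register.
stage_ps = 100
reg_ps = 20

cycle_ps = stage_ps + reg_ps          # 120 ps: a new operation can start every cycle
latency_ps = 3 * cycle_ps             # 360 ps from start to finish (up from 320 ps)
throughput_gips = 1e3 / cycle_ps      # ~8.33 GIPS (up from ~3.12 GIPS)

print(latency_ps, round(throughput_gips, 2))   # 360  8.33
```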

Operating a Pipeline

[Figure: pipeline timing diagram. OP1, OP2, and OP3 each pass through stages A, B, and C, with a new operation entering the pipeline every 120 ps; successive snapshots show the three-stage datapath (100 ps logic + 20 ps register per stage) at different points in the clock cycle.]
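The staggered operation in the figure can be tabulated: a new operation enters stage A every 120 ps while earlier operations move on to B and C. The sketch below is our own illustration of that schedule, not from the slides.

```python
# Cycle-by-cycle schedule for three operations in a 3-stage pipeline (120 ps cycle).
STAGES = ["A", "B", "C"]
CYCLE_PS = 120
NUM_OPS = 3

for cycle in range(NUM_OPS + len(STAGES) - 1):
    busy = []
    for op in range(NUM_OPS):
        stage = cycle - op                 # each operation starts one cycle after the previous one
        if 0 <= stage < len(STAGES):
            busy.append(f"OP{op + 1}:{STAGES[stage]}")
    print(f"{cycle * CYCLE_PS:4d} ps   " + "   ".join(busy))

# Output:
#    0 ps   OP1:A
#  120 ps   OP1:B   OP2:A
#  240 ps   OP1:C   OP2:B   OP3:A
#  360 ps   OP2:C   OP3:B
#  480 ps   OP3:C
```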

Limitations: Nonuniform Delays

Throughput limited by slowest stage

Other stages sit idle for much of the time

Challenging to partition system into balanced stages

[Figure: three-stage pipeline with unbalanced stages, Comb. logic A (50 ps), B (150 ps), and C (100 ps), each followed by a 20 ps register. Delay = 510 ps, Throughput = 5.88 GIPS; the timing diagram shows OP1-OP3 flowing through stages A, B, C.]
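With unbalanced stages, the clock period must accommodate the slowest stage plus the register delay, so the fast stages simply wait. A small sketch (illustrative only) reproduces the numbers in the figure.

```python
# The clock period is dictated by the slowest stage (150 ps) plus the 20 ps register.
stage_ps = [50, 150, 100]
reg_ps = 20

cycle_ps = max(stage_ps) + reg_ps            # 170 ps
latency_ps = cycle_ps * len(stage_ps)        # 510 ps: even the 50 ps stage is given 170 ps
throughput_gips = 1e3 / cycle_ps             # ~5.88 GIPS

print(cycle_ps, latency_ps, round(throughput_gips, 2))   # 170  510  5.88
```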

Data Dependencies

System

Each operation depends on result from preceding one

[Figure: unpipelined system in which the register's output feeds back into the combinational logic; the timing diagram shows OP1, OP2, and OP3 executing strictly one after another, each waiting for the previous result.]

Data Hazards

Result does not feed back around in time for next operation

Pipelining has changed behavior of system

[Figure: three-stage pipeline (Comb. logic A, B, C with pipeline registers) whose final result feeds back to the input; the timing diagram shows OP1-OP4 overlapping in stages A, B, and C, so a later operation starts before the result it depends on has been written.]
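The timing makes the hazard concrete: OP2 reads its operands when it enters stage A, one cycle behind OP1, but OP1's result is not written until OP1 leaves stage C. A minimal sketch of that comparison, assuming the 120 ps, 3-stage pipeline above:

```python
CYCLE_PS = 120
NUM_STAGES = 3

op1_result_written_ps = NUM_STAGES * CYCLE_PS   # 360 ps: OP1's result reaches the register
op2_reads_inputs_ps = 1 * CYCLE_PS              # 120 ps: OP2 enters stage A right behind OP1

if op2_reads_inputs_ps < op1_result_written_ps:
    gap = op1_result_written_ps - op2_reads_inputs_ps
    print(f"data hazard: OP2 needs OP1's result {gap} ps before it is available")
```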

SEQ Hardware

 Stages occur in sequence
 One operation in process at a time

SEQ+ Hardware

 Still sequential implementation
 Reorder PC stage to put it at the beginning

PC Stage

 Task is to select PC for current instruction
 Based on results computed by previous instruction

Processor State

 PC is no longer stored in a register
 But the PC can be determined from other stored information
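One way to picture this: the previous instruction has already computed everything needed to know where control goes next, so the current PC can be selected from that state instead of being read from a dedicated PC register. The sketch below is our own illustration of that idea; the field names follow the slide vocabulary (valC, valP, valM), but the selection code itself is an assumption, not the book's HCL.

```python
def select_pc(prev):
    """Choose the current PC from state left behind by the previous instruction."""
    if prev["icode"] == "call":
        return prev["valC"]                  # call: target address encoded in the instruction
    if prev["icode"] == "jXX" and prev["taken"]:
        return prev["valC"]                  # taken jump: branch target
    if prev["icode"] == "ret":
        return prev["valM"]                  # ret: return address read from memory
    return prev["valP"]                      # everything else: next sequential instruction

prev = {"icode": "jXX", "taken": False, "valC": 0x100, "valM": 0, "valP": 0x020}
print(hex(select_pc(prev)))   # 0x20: fall through, since the jump was not taken
```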

Pipeline Stages

Fetch
 Select current PC
 Read instruction
 Compute incremented PC

Decode
 Read program registers

Execute
 Operate ALU

Memory
 Read or write data memory

Write Back
 Update register file
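To make the division of labor concrete, the sketch below models each stage as a function that transforms an instruction's in-flight state, using an addl-style register-register instruction. This is our own simplification for illustration, not the book's hardware description; the state fields and register names are assumptions.

```python
regfile = {"%eax": 3, "%ebx": 4}

def fetch(s):
    # Select current PC, read the instruction, compute incremented PC (valP)
    s["valP"] = s["pc"] + s["ilen"]
    return s

def decode(s):
    # Read program registers
    s["valA"], s["valB"] = regfile[s["rA"]], regfile[s["rB"]]
    return s

def execute(s):
    # Operate ALU (addition shown)
    s["valE"] = s["valA"] + s["valB"]
    return s

def memory(s):
    # Read or write data memory (nothing to do for this instruction)
    return s

def write_back(s):
    # Update register file
    regfile[s["rB"]] = s["valE"]
    return s

state = {"pc": 0, "ilen": 2, "rA": "%eax", "rB": "%ebx"}
for stage in (fetch, decode, execute, memory, write_back):
    state = stage(state)
print(regfile["%ebx"])   # 7
```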

PIPE- Hardware

Pipeline registers hold intermediate values from instruction execution

Forward (Upward) Paths
 Values passed from one stage to next
 Cannot jump past stages
  e.g., valC passes through decode

Feedback Paths
 Predicted PC: guess value of next PC
 Branch information: jump taken/not-taken, fall-through or target address
 Return point: read from memory
 Register updates: to register file write ports

Predicting the PC

Start fetch of new instruction after current one has completed fetch stage

 Not enough time to reliably determine next instruction

Guess which instruction will follow

 Recover if prediction was incorrect
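A simple prediction rule consistent with this design is to guess the encoded target for calls and jumps (predicting conditional jumps as taken) and the next sequential address otherwise; the sketch below is our own illustration of that rule, not the book's hardware description.

```python
def predict_pc(icode, valC, valP):
    """Guess the next PC before the current instruction has finished."""
    if icode in ("call", "jXX"):
        return valC     # predict the call/jump target (conditional jumps predicted taken)
    # ret is not predicted here; return addresses are handled separately
    return valP         # otherwise predict the next sequential instruction

# Example: a conditional jump at address 0x20 (5-byte encoding) with target 0x100.
guess = predict_pc("jXX", valC=0x100, valP=0x25)
print(hex(guess))   # 0x100 -- if the jump is not taken, fetching must restart at 0x25
```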