Studio Setup

Follow the steps from Studio 4A! The assignment link is link

Names

Open questions.md and list the names of everyone in your group.

Studio 7A

If you didn’t complete the implementation of xor last week (Studio 7A), complete it first. (Just the testbench part)

Micro Architectures & Execution

Consider the following segment of code for the next several parts:

    mv t0, a0        # t0 = start
    add t1, a1, a1   # t1 = 2x a1
    add t1, t1, t1   # t1 = 4x a1  (sll a1, a1, 2  would also work)
    add t1, a0, t1   # t1 = end (inclusive)

    mv a2, zero    # Init sum
loop:
    lw t2, 0(t0)
    add a2, a2, t2
    addi t0, t0, 4  # Move to next
    blt t0, t1, loop

Single-Cycle CPU

Formula: Come up with a formula for the time taken in terms of the array size ($n$ words) assuming this is running on a single-cycle RISC-V CPU operating at 6MHz (i.e., Homework 7A’s processor).

Specific Instance: How long will it take if $n$ is 100?

Multi-Cycle CPU

Recall the multi-cycle model’s state machine:

Multicycle State Machine

Formula: Come up with a formula for the time taken in terms of the array size ($n$ words) assuming this is running on this multi-cycle RISC-V CPU operating at 24MHz?

Specific Instance: How long will it take if $n$ is 100?

5-stage Pipeline

One of the variations of the RISC-V 5-stage pipeline supports data hazards via forwarding and/or stalls:

RISC-V 5 Stage Pipeline

Explain what is meant by a “Data Hazard” and how this CPU design resolves them (forwarding and stalls and and example of when each is needed).

What data hazards exist in this program and how can each be resolved (forwarding? Or stalls? How many stalls?)?

Formula: Come up with an approximate formula for the time taken in terms of the array size ($n$ words) assuming this is running on this 5-stage pipeline RISC-V CPU (support for data hazards via forwarding and stalls) operating at 24MHz (Assume it has a perfect branch predictor — there are no control hazards, but account for the data hazards)?

Specific Instance: How long will it take if $n$ is 100?

More Complex Execution

Assume instead you were asked to analyze this updated version of the code:

    mv t0, a0        # t0 = start
    add t1, a1, a1   # t1 = 2x a1
    add t1, t1, t1   # t1 = 4x a1  (sll a1, a1, 2  would also work)
    add t1, a0, t1   # t1 = end (inclusive)

    # use first value as min and max
    lw a0, 0(t0)   # Init min
    mv a1, a0      # Init max
    mv a2, zero    # Init sum
loop:
    lw t2, 0(t0)
    bge t2, a0, updateSum
    mv a0, t2  # Update min
updateSum:
    add a2, a2, t2
    addi t0, t0, 4  # Move to next
    blt t0, t1, loop

Could the approach(es) you’ve taken so far always give an exact answer without additional information (FYI: the answer is “no”)? If not, explain how you could provide an upper and lower bound on the execution time.

Summary

Compare and contrast the micro architectures. Does it seem reasonable for the multi-cycle and pipelined implementations to have a higher clock rates than single-cycle implementations (higher in general, not necessarily the exact differences shown in the examples)? In general, how would you expect a pipelined implementation’s performance to compare to the single-cycle version?

Submission / End-of-class: Commit And Push

1. First, be sure to commit and push files to GitHub (as shown in studio)

1.1

Source Selection

1.2

Commit Message

Failure to type in a commit message will cause VSCode to open a window to enter the message (in the editor area) and the Source Control pane will appear to be stuck (a waiting animation) until you type in a message and close the message pane.

1.3

Commit and Push

2. Then go to GitHub.com and confirm the updates are on GitHub

End of Studio: Stop the Codespace

Be sure to “stop” your Codespace. You have approximately 60 hours of Codespace time per month. Codespaces often run for ~!5 minutes extra if tabs are just closed.

Codespace