컴퓨터 구조

[컴퓨터 구조] 챕터 4리뷰

klee9 2024. 11. 15. 16:30

MIPS - simplified datapath

외우지 말고 이해하라고 하신다. 중간고사가 끝나고 보니까 너무 잘 이해된다. 

 

  • R-format
    • ex) add rd, rs, rt
      1. 4 is added to the PC to fetch the next instruction.
      2. Read instruction stored in the PC address from the instruction memory.
      3. rs, rt, rd are set as read reg1, read reg2, and write reg, respectively.
      4. The first 6 bits(=opcode) are passed into the ALU.
      5. The addition result goes back to the register memory as write data and is written to rd

 

  • Load
    • ex) lw rt, C(rs)
      1. 4 is added to the PC to fetch the next instruction.
      2. Read instruction stored in the PC address from the instruction memory.
      3. rs, rt are set as read and write registers, respectively.
      4. 16-bit constant C is sign-extended to 32 bits.
      5. The value stored in the address rs + C is used as read data and is written to rt.

 

  • Branch
    • ex) beq rs, rt, F
      1. 4 is added to the PC
      2. Read instruction stored in the PC address from the instruction memory.
      3. rs and rt are set as read registers for comparison.
      4. Branch address is sign-extended to 32 bits and multiplied by 4 (shift left 2) for PC relative addressing.
      5. If rs == rt, the branch address becomes the new PC value, if not, PC + 4.

 

  • Jump (unconditional)
    • ex) j F
      1. read instruction stored in the PC address from the instruction memory.
      2. 26-bit address is multiplied by 4 (shift left 2) and the top 4 bits of (PC + 4) is concatenated to the front of the shifted value (= jump address)
      3. the value of PC changes to the jump address.

Pipelining

  • Improves performance by increasing throughput (latency may decrease)
  • Ideal speedup == # of stages

  • Consists of 5 stages
    1. IF: fetch instruction from memory
    2. ID: decode instruction & read registers
    3. EX: execute operation or calculate address
    4. MEM: access memory operand
    5. WB: write the result back to the register.

  • Examples of needed steps
    • lw: IF - ID - EX - MEM - WB 
    • sw: IF - ID - EX - MEM
    • R-format: IF - ID - EX - WB
  • Pipeline performance
    • single cycle -> total time for N instructions = 800N ps (800 ps for each stage)
    • pipelined -> total time for N instructions = 800 + 200N ps
    • if N -> inf, then speedup = 4 (not 5 as we're wasting some time)

 

 

  • Structure hazard
    • A required resource is busy -> solved by using multiple memories
  • Data hazard
    • Need to wait for previous instruction to complete read/write
    • ex) add s0, t0, t1
            sub t2, s0, t3

  • Forwarding
    • This can solve the issue above.
    • Uses the ALU result immediately after it's computed.
    • One extra connection is needed in the datapath as pipelining is done in one circuit.

 

  • Load-use data hazard
    • Cannot be solved by forwarding. (We can't go back in time and fetch the old value)
    • MUST stall. However, this can be avoided to some extent by reordering the code (code rescheduling; done by the compiler)
    • ex) lw s0, 20(t1)
          sub t2, s0, t3
     

 

  • Control hazard
    • Fetching next instruction depends on branch outcome
    • Must stall until outcome is determined 
    • To avoid stall, predict branch outcome (MIPS always branches)
    • Predicting an average of 0.5 cycles.

  • Pipeline Registers
    • Used for holding information produced in the previous cycle
     

 

  • Single clock cycle diagram (for lw B, C(A))

 

  • Corrected datapath for load

 

  • Multi-cycle pipelining diagram example

 

  • Simplified pipelined "control"
    • Control signals are the same, but the timing of when to use control.
    • Control signals are passed down the pipeline.

 

  • Pipelined datapath (on final exam)

 

 

  • Data hazards in ALU operations

 

  • Detecting when to forward
    • Data hazard when previous / previous previous instruction's write register = current / previous instruction's read register
    • Hazard only exists if the forwarding instruction is writing to a register.
    • EX Hazard (current vs previous ins)
      • if (EX/MEM.RegWrite && EX/MEM.RegisterRd != 0 && EX/MEM.RegisterRd = ID/EX.RegisterRs) -> ForwardA = 10
      • if (EX/MEM.RegWrite && EX/MEM.RegisterRd != 0 && EX/MEM.RegisterRd = ID/EX.RegisterRt) -> ForwardB = 10
    • MEM Hazard (current vs previous ins) -> only use this if EX hazard condition is false when there's multiple hazards
      • if (MEM/WB.RegWrite && MEM/WB.RegisterRd != 0 && NOT(EX/MEM.RegWrite && EX/MEM.RegisterRd != 0 && EX/MEM.RegisterRd = ID/EX.RegisterRs) && MEM/WB.RegisterRd = ID/EX.RegisterRs) -> ForwardA = 01
      • if (MEM/WB.RegWrite && MEM/WB.RegisterRd != 0 && NOT(EX/MEM.RegWrite && EX/MEM.RegisterRd != 0 && EX/MEM.RegisterRd = ID/EX.RegisterRt) && MEM/WB.RegisterRd = ID/EX.RegisterRt) -> ForwardB = 01
    • Load-use hazard detection
      • Check when "use" instruction is decoded in ID (= Load is in EX)
      • Load-use hazard when:
        • ID/EX.MemRead AND ((ID/EX.RegisterRt = IF/ID.RegisterRs) || (ID/EX.RegisterRt = IF/ID.RegisterRt)
      • Stall and insert bubble if detected
    • How to stall the pipeline -> 2 steps
      1. Force control values in ID/EX register to 0, so that EX, MEM, WB do nop (no-oper)
      2. Prevent the update of PC, IF/ID register so that ID, IF are run again and the rest move forward normally.

 

 

  • Datapath with forwarding and hazard detection

 

  • Stalls and performance
    • Stalls reduce performance but are required for correct results.
    • Compilers can rearrage code to avoid hazards / stalls.
  • Instruction-level parallelism (ILP)
    • Pipelining != parallel, but true to some extent.
    • To increase ILP:
      1. Deeper pipeline: shorter clock cycle
      2. Multiple issues: multiple datapaths
      3. Loop unrolling: copy and paste the same code instead of looping. It's actually faster

'컴퓨터 구조' 카테고리의 다른 글

[컴퓨터 구조] 챕터5 리뷰  (3) 2024.12.20
[컴퓨터 구조] 챕터1 리뷰  (3) 2024.10.25