MIPS - simplified datapath
The professor says to understand this rather than memorize it. Now that the midterm is over, it all makes much more sense.
- R-format
- ex) add rd, rs, rt
- 4 is added to the PC to fetch the next instruction.
- Read instruction stored in the PC address from the instruction memory.
- rs, rt, rd are set as read reg1, read reg2, and write reg, respectively.
- The opcode (the first 6 bits) goes to the control unit; for R-format instructions, the funct field (the last 6 bits) then selects the ALU operation.
- The addition result goes back to the register file as write data and is written to rd.
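To make the field layout concrete, here is a minimal sketch that splits a 32-bit R-format word into its fields. The function name `decode_r_format` is made up for illustration; the bit positions follow the standard MIPS R-format layout, and the example encodes `add $s0, $t0, $t1`.

```python
def decode_r_format(word: int) -> dict:
    """Split a 32-bit R-format instruction word into its six fields."""
    return {
        "opcode": (word >> 26) & 0x3F,  # bits 31-26
        "rs":     (word >> 21) & 0x1F,  # bits 25-21, read register 1
        "rt":     (word >> 16) & 0x1F,  # bits 20-16, read register 2
        "rd":     (word >> 11) & 0x1F,  # bits 15-11, write register
        "shamt":  (word >> 6) & 0x1F,   # bits 10-6, shift amount
        "funct":  word & 0x3F,          # bits 5-0, selects the ALU operation
    }


# add $s0, $t0, $t1: opcode 0, rs = $t0 (8), rt = $t1 (9), rd = $s0 (16), funct 0x20
word = (0 << 26) | (8 << 21) | (9 << 16) | (16 << 11) | (0 << 6) | 0x20
fields = decode_r_format(word)
print(hex(word), fields)
```

The three register fields are exactly what gets wired to read reg1, read reg2, and write reg in the datapath.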
- Load
- ex) lw rt, C(rs)
- 4 is added to the PC to fetch the next instruction.
- Read instruction stored in the PC address from the instruction memory.
- rs, rt are set as read and write registers, respectively.
- 16-bit constant C is sign-extended to 32 bits.
- The ALU computes the address rs + C; the value stored at that address in data memory is the read data, which is written to rt.
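The load steps above can be sketched in a few lines. This is only an illustrative model: the register file is a plain list, the data memory is a dict keyed by address, and the names `sign_extend16` and `lw` are hypothetical helpers, not part of any real simulator.

```python
def sign_extend16(imm: int) -> int:
    """Interpret the low 16 bits of imm as a signed two's-complement value."""
    imm &= 0xFFFF
    return imm - 0x10000 if imm & 0x8000 else imm


def lw(regs: list, memory: dict, rt: int, rs: int, c: int) -> int:
    """Model lw rt, C(rs): compute rs + sign-extended C, then load into rt."""
    address = (regs[rs] + sign_extend16(c)) & 0xFFFFFFFF  # ALU adds base and offset
    regs[rt] = memory[address]                            # read data is written to rt
    return address


regs = [0] * 32
regs[9] = 0x1000               # $t1 holds the base address
memory = {0x0FFC: 42}          # one word of data memory
addr = lw(regs, memory, rt=16, rs=9, c=0xFFFC)  # 0xFFFC sign-extends to -4
print(hex(addr), regs[16])
```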
- Branch
- ex) beq rs, rt, F
- 4 is added to the PC
- Read instruction stored in the PC address from the instruction memory.
- rs and rt are set as read registers for comparison.
- The 16-bit branch offset is sign-extended to 32 bits and multiplied by 4 (shift left 2) for PC-relative addressing.
- If rs == rt, the new PC is (PC + 4) + the shifted offset (the branch target); if not, it is PC + 4.
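As a quick check of the PC-relative arithmetic, here is a small sketch of how the next PC is chosen for beq. The function names are invented for this example; the key point is that the shifted offset is added to PC + 4, not to PC.

```python
def sign_extend16(imm: int) -> int:
    """Interpret the low 16 bits of imm as a signed two's-complement value."""
    imm &= 0xFFFF
    return imm - 0x10000 if imm & 0x8000 else imm


def next_pc_beq(pc: int, rs_val: int, rt_val: int, offset16: int) -> int:
    """Return the PC after a beq located at address pc."""
    pc_plus_4 = (pc + 4) & 0xFFFFFFFF
    offset = sign_extend16(offset16) << 2          # word offset -> byte offset
    target = (pc_plus_4 + offset) & 0xFFFFFFFF     # branch target address
    return target if rs_val == rt_val else pc_plus_4


print(hex(next_pc_beq(0x1000, 5, 5, 3)))       # taken: 0x1004 + 12
print(hex(next_pc_beq(0x1000, 5, 6, 3)))       # not taken: falls through to PC + 4
print(hex(next_pc_beq(0x1000, 7, 7, 0xFFFF)))  # taken, offset -1: branches to itself
```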
- Jump (unconditional)
- ex) j F
- Read instruction stored in the PC address from the instruction memory.
- The 26-bit address is multiplied by 4 (shift left 2), and the top 4 bits of (PC + 4) are concatenated to the front of the shifted value (= jump address).
- The value of PC changes to the jump address.
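The jump address construction is just bit manipulation, so a short sketch may help. `jump_address` is a hypothetical helper name; the masks follow from the field widths described above (4 bits from PC + 4, 26 bits from the instruction, 2 zero bits).

```python
def jump_address(pc: int, target26: int) -> int:
    """Build the 32-bit jump address for a j instruction located at pc."""
    pc_plus_4 = (pc + 4) & 0xFFFFFFFF
    upper4 = pc_plus_4 & 0xF0000000            # top 4 bits of PC + 4
    shifted = (target26 & 0x03FFFFFF) << 2     # 26-bit field times 4 -> 28 bits
    return upper4 | shifted


addr = jump_address(0x40000008, 0x0100000)
print(hex(addr))
```

Because the top 4 bits come from PC + 4, a single j can only reach targets inside the current 256 MB region.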
Pipelining
- Improves performance by increasing throughput (the latency of an individual instruction does not decrease)
- Ideal speedup == # of stages
- Consists of 5 stages
- IF: fetch instruction from memory
- ID: decode instruction & read registers
- EX: execute operation or calculate address
- MEM: access memory operand
- WB: write the result back to the register.
- Examples of needed steps
- lw: IF - ID - EX - MEM - WB
- sw: IF - ID - EX - MEM
- R-format: IF - ID - EX - WB
- Pipeline performance
- single cycle -> total time for N instructions = 800N ps (800 ps per instruction)
- pipelined -> total time for N instructions = 800 + 200N ps (200 ps per stage)
- if N -> inf, then speedup = 4 (not 5, because the stages are not perfectly balanced, so some time is wasted in the shorter stages)
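The two formulas above can be checked numerically. This sketch assumes the numbers implied by the notes (800 ps per single-cycle instruction, 5 stages clocked at 200 ps each); the function names are made up for illustration.

```python
def single_cycle_time(n: int, instr_ps: int = 800) -> int:
    """Total time for n instructions on the single-cycle datapath."""
    return instr_ps * n


def pipelined_time(n: int, stage_ps: int = 200, stages: int = 5) -> int:
    """First instruction takes `stages` cycles, each later one finishes 1 cycle after."""
    return stage_ps * (stages + n - 1)   # = 800 + 200 * n for the defaults


def speedup(n: int) -> float:
    return single_cycle_time(n) / pipelined_time(n)


for n in (1, 10, 1_000, 1_000_000):
    print(n, pipelined_time(n), speedup(n))
```

For a single instruction the pipeline is actually slower (1000 ps vs. 800 ps); the ratio only approaches 4 as N grows.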
- Structural hazard
- A required resource is busy -> solved by using multiple memories (separate instruction and data memories)
- Data hazard
- An instruction needs a value that a previous instruction has not finished writing yet, so it must wait for that read/write to complete
- ex) add s0, t0, t1
sub t2, s0, t3
- Forwarding
- This can solve the issue above.
- Uses the ALU result immediately after it's computed.
- Extra connections are needed in the datapath to route the result from a later stage back to the ALU input.
- Load-use data hazard
- Cannot be solved by forwarding alone. (The loaded value is not available until after MEM, and we can't forward backward in time.)
- MUST stall for one cycle. However, this can be avoided to some extent by reordering the code (code rescheduling; done by the compiler)
- ex) lw s0, 20(t1)
sub t2, s0, t3
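To see why rescheduling helps, here is a rough sketch that counts load-use stalls in a straight-line sequence, assuming a 5-stage pipeline with forwarding, where a stall occurs only when the instruction right after a lw reads the loaded register. The tuple format `(op, dest, sources)` and the example program are hypothetical, and the model ignores every other kind of hazard.

```python
def count_load_use_stalls(program: list) -> int:
    """Count adjacent (lw, consumer) pairs; each costs one stall cycle."""
    stalls = 0
    for prev, curr in zip(program, program[1:]):
        prev_op, prev_dest, _ = prev
        _, _, curr_sources = curr
        if prev_op == "lw" and prev_dest in curr_sources:
            stalls += 1
    return stalls


original = [
    ("lw",  "t1", ("t0",)),
    ("lw",  "t2", ("t0",)),
    ("add", "t3", ("t1", "t2")),   # uses t2 right after its load -> stall
    ("sw",  None, ("t3", "t0")),
    ("lw",  "t4", ("t0",)),
    ("add", "t5", ("t1", "t4")),   # uses t4 right after its load -> stall
    ("sw",  None, ("t5", "t0")),
]

# Same instructions, with the third load moved up so no load feeds its neighbor.
reordered = [original[0], original[1], original[4],
             original[2], original[3], original[5], original[6]]

print(count_load_use_stalls(original), count_load_use_stalls(reordered))
```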
- Control hazard
- Fetching next instruction depends on branch outcome
- Must stall until outcome is determined
- To avoid stalling, predict the branch outcome (the simple MIPS pipeline predicts that branches are not taken)
- With prediction, the average stall is about 0.5 cycles.
- Pipeline Registers
- Used for holding information produced in the previous cycle
- Single clock cycle diagram (for lw B, C(A))
- Corrected datapath for load
- Multi-cycle pipelining diagram example
- Simplified pipelined "control"
- The control signals are the same as before; what changes is the timing of when each one is used.
- Control signals are passed down the pipeline.
- Pipelined datapath (on final exam)
- Data hazards in ALU operations
- Detecting when to forward
- A data hazard occurs when the write register of the previous instruction (in EX/MEM) or of the one before it (in MEM/WB) equals a read register of the current instruction (in ID/EX).
- The hazard only exists if the forwarding instruction is actually writing to a register (and that register is not $zero).
- EX hazard (current vs. previous instruction)
- if (EX/MEM.RegWrite && EX/MEM.RegisterRd != 0 && EX/MEM.RegisterRd = ID/EX.RegisterRs) -> ForwardA = 10
- if (EX/MEM.RegWrite && EX/MEM.RegisterRd != 0 && EX/MEM.RegisterRd = ID/EX.RegisterRt) -> ForwardB = 10
- MEM hazard (current vs. the instruction before the previous one) -> when both hazards occur, only use this if the EX hazard condition is false, so that the most recent result is forwarded
- if (MEM/WB.RegWrite && MEM/WB.RegisterRd != 0 && NOT(EX/MEM.RegWrite && EX/MEM.RegisterRd != 0 && EX/MEM.RegisterRd = ID/EX.RegisterRs) && MEM/WB.RegisterRd = ID/EX.RegisterRs) -> ForwardA = 01
- if (MEM/WB.RegWrite && MEM/WB.RegisterRd != 0 && NOT(EX/MEM.RegWrite && EX/MEM.RegisterRd != 0 && EX/MEM.RegisterRd = ID/EX.RegisterRt) && MEM/WB.RegisterRd = ID/EX.RegisterRt) -> ForwardB = 01
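The four conditions above collapse into one small function applied once per ALU operand. This is a sketch of the selection logic only: `forward_select` and its parameter names are invented here, and pipeline register fields are passed in as plain integers and booleans.

```python
def forward_select(src: int,
                   ex_mem_regwrite: bool, ex_mem_rd: int,
                   mem_wb_regwrite: bool, mem_wb_rd: int) -> int:
    """Return the 2-bit forwarding mux select for one ALU operand.

    src is ID/EX.RegisterRs (for ForwardA) or ID/EX.RegisterRt (for ForwardB).
    """
    # EX hazard: the previous instruction's ALU result is the newest value.
    if ex_mem_regwrite and ex_mem_rd != 0 and ex_mem_rd == src:
        return 0b10
    # MEM hazard: only reached when the EX hazard condition is false.
    if mem_wb_regwrite and mem_wb_rd != 0 and mem_wb_rd == src:
        return 0b01
    return 0b00  # no hazard: use the value read from the register file


# add s0, t0, t1 followed by sub t2, s0, t3 (s0 = 16, t3 = 11)
forward_a = forward_select(16, True, 16, False, 0)  # rs = s0 matches EX/MEM.Rd
forward_b = forward_select(11, True, 16, False, 0)  # rt = t3 has no match
print(format(forward_a, "02b"), format(forward_b, "02b"))
```

Checking the EX hazard first is what implements the NOT(...) term in the MEM hazard conditions.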
- Load-use hazard detection
- Check when "use" instruction is decoded in ID (= Load is in EX)
- Load-use hazard when:
- ID/EX.MemRead AND ((ID/EX.RegisterRt = IF/ID.RegisterRs) || (ID/EX.RegisterRt = IF/ID.RegisterRt))
- Stall and insert bubble if detected
- How to stall the pipeline -> 2 steps
- Force control values in ID/EX register to 0, so that EX, MEM, WB do a nop (no-operation)
- Prevent the update of the PC and the IF/ID register so that the instructions in IF and ID are repeated while the rest move forward normally.
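The detection condition and the two stall steps can be sketched together. The function names and the dictionary keys (`PCWrite`, `IF/IDWrite`, `zero_ID/EX_control`) are illustrative labels for this example, not signals taken from the notes.

```python
def load_use_hazard(id_ex_memread: bool, id_ex_rt: int,
                    if_id_rs: int, if_id_rt: int) -> bool:
    """True when the load in EX writes a register the instruction in ID reads."""
    return bool(id_ex_memread and
                (id_ex_rt == if_id_rs or id_ex_rt == if_id_rt))


def stall_actions(hazard: bool) -> dict:
    """Map the hazard flag to the two stall steps."""
    return {
        "PCWrite": not hazard,          # hold the PC so IF fetches the same instruction
        "IF/IDWrite": not hazard,       # hold IF/ID so ID decodes the same instruction
        "zero_ID/EX_control": hazard,   # insert a bubble: EX, MEM, WB do a nop
    }


# lw s0, 20(t1) in EX (rt = s0 = 16); sub t2, s0, t3 in ID (rs = 16, rt = 11)
hazard = load_use_hazard(True, 16, 16, 11)
print(hazard, stall_actions(hazard))
```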
- Datapath with forwarding and hazard detection
- Stalls and performance
- Stalls reduce performance but are required for correct results.
- Compilers can rearrange code to avoid hazards / stalls.
- Instruction-level parallelism (ILP)
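- Pipelining is not fully parallel execution, but it is parallel to some extent: several instructions are in flight at once, each in a different stage.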
- To increase ILP:
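- Deeper pipeline: less work per stage, so a shorter clock cycle
- Multiple issue: multiple datapaths, so several instructions can start per cycle
- Loop unrolling: copy and paste the loop body several times instead of looping each time. It's actually faster, because there is less branch overhead and more room to reschedule instructions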
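To show what the unrolling transformation looks like, here is a sketch of a sum loop unrolled by a factor of 4. The function names are hypothetical, and in Python this only illustrates the shape of the rewrite; the speed benefit described above applies to compiled machine code, where each loop iteration carries branch and counter overhead.

```python
def sum_rolled(values: list) -> int:
    """One add per iteration: one loop test and one index update per element."""
    total = 0
    for i in range(len(values)):
        total += values[i]
    return total


def sum_unrolled_by_4(values: list) -> int:
    """Four adds per iteration: one loop test and one index update per 4 elements."""
    total = 0
    n = len(values)
    i = 0
    while i + 4 <= n:
        total += values[i]
        total += values[i + 1]
        total += values[i + 2]
        total += values[i + 3]
        i += 4
    while i < n:          # clean-up loop for the leftover 0-3 elements
        total += values[i]
        i += 1
    return total


data = list(range(10))
print(sum_rolled(data), sum_unrolled_by_4(data))
```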