Even though IBM disdains IA-64’s EPIC approach, it appears to be stealing a page from Intel’s playbook. In the same way that Intel usurped RISC principles to implement its x86 CISC architecture in P6, IBM plans to expropriate VLIW principles to implement its RISC architecture in Power4.
IBM only vaguely described the mechanism, but apparently in the early stages of the pipeline, the Power4 CPU groups instructions into VLIW-like bundles. These bundles are dispatched to issue queues, where individual instructions are held until their dependencies are resolved and then issued to the execution units. The pipeline beyond the issue stage is noninterlocked; so, once issued, nothing stops an instruction from completing, but all instructions in a bundle must complete before the bundle is retired.
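Since IBM described the mechanism only vaguely, the following is a toy model of the flow outlined above, not a description of the real hardware. The bundle size, the per-instruction latencies, the unlimited issue width, and the in-order retirement of bundles are all assumptions made for illustration; the names `Instr`, `form_bundles`, and `simulate` are hypothetical.

```python
from dataclasses import dataclass

BUNDLE_SIZE = 4  # assumed; the article does not state the real group size


@dataclass
class Instr:
    name: str
    deps: tuple = ()      # names of earlier instructions this one depends on
    latency: int = 1      # assumed cycles from issue to completion
    done_at: int = None   # cycle in which the result becomes available


def form_bundles(instrs, size=BUNDLE_SIZE):
    """Group instructions, in program order, into fixed-size bundles."""
    return [instrs[i:i + size] for i in range(0, len(instrs), size)]


def simulate(instrs, size=BUNDLE_SIZE):
    """Return the cycle in which each bundle retires, in program order.

    Dependencies must name earlier instructions, otherwise the loop
    below never terminates.
    """
    bundles = form_bundles(instrs, size)
    by_name = {ins.name: ins for ins in instrs}
    waiting = list(instrs)  # stand-in for the issue queues
    cycle = 0
    while waiting:
        cycle += 1
        still_waiting = []
        for ins in waiting:
            # An instruction issues once every producer finished in an
            # earlier cycle.
            ready = all(
                by_name[d].done_at is not None and by_name[d].done_at < cycle
                for d in ins.deps
            )
            if ready:
                # Noninterlocked pipeline: once issued, nothing stalls it.
                ins.done_at = cycle + ins.latency - 1
            else:
                still_waiting.append(ins)
        waiting = still_waiting

    retire_cycles = []
    previous = 0
    for bundle in bundles:
        # A bundle retires only after all of its instructions complete
        # (and, by assumption here, not before the preceding bundle).
        previous = max(previous, max(ins.done_at for ins in bundle))
        retire_cycles.append(previous)
    return retire_cycles
```

In this model, only one retirement record per bundle is needed, while the individual instructions inside a bundle are free to issue and complete in any order that their dependencies allow.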
Unlike conventional superscalar implementations that track individual instructions from dispatch through completion, the Power4 CPU tracks bundles only. According to IBM, this mechanism, along with data-flow sequencing through the noninterlocked pipelines, dramatically simplified the Power4 implementation, cutting the percentage of control logic in half compared with that of the four-issue Power3 design. This brought the control complexity of Power4 more in line with that of a VLIW machine while preserving the advantages of dynamic scheduling.
IBM said that the out-of-order-completion resources in the Power4 CPU are deep enough to hide the full latency of an L2 cache hit, which is probably 8–10 cycles. Also, to a greater extent than on any previous Power or PowerPC processor, Power4 will exploit the architecturally specified weak-storage-ordering model to reorder memory transactions and hide memory latency.
Each Power4 CPU implements the same ISA as IBM’s current RS/6000 and AS/400 systems and is also fully PowerPC compatible. IBM did, however, make some improvements that will be invisible to programs. The company is finally acknowledging that some of the complex instructions retained from the original 1990 POWER definition may not have been such great ideas. These instructions hinder the ability to run dynamically scheduled wide-issue processors at high frequency.
Convinced, however, that instruction-set stability is critical to its customer base, IBM didn’t take the radical step of expunging these instructions from the ISA. Instead, it has introduced instruction-set layering into Power4.
In this strategy, the hardware is optimized for the simple instructions, making no frequency compromises for complex ones. Slightly complex instructions, such as the base-register-update form of loads and stores, are cracked into two simple instructions by the instruction decoders. Moderately complex instructions, such as the string ops, are executed by a simple non-branching microcode engine. The most complex instructions, such as the old POWER instructions that were removed in PowerPC, trap to software emulation routines.
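The four layers can be pictured as a decode-time classification. The sketch below is illustrative only: the lookup tables are small hand-picked samples rather than IBM's actual decode tables, the cracking shown for `lwzu` is one plausible split rather than a documented one, and the names `Tier`, `classify`, and `crack_update_load` are hypothetical.

```python
from enum import Enum


class Tier(Enum):
    SIMPLE = "executed directly by the hardware"
    CRACKED = "split into two simple instructions by the decoder"
    MICROCODED = "run by the non-branching microcode engine"
    EMULATED = "trapped to a software emulation routine"


# Sample entries only (assumed for illustration).
CRACKED_OPS = {"lwzu"}            # load word with base-register update
MICROCODED_OPS = {"lswi", "stswi"}  # string load/store
EMULATED_OPS = {"abs", "doz"}     # old POWER ops dropped from PowerPC


def classify(mnemonic):
    """Return the layer that would handle the given mnemonic."""
    if mnemonic in CRACKED_OPS:
        return Tier.CRACKED
    if mnemonic in MICROCODED_OPS:
        return Tier.MICROCODED
    if mnemonic in EMULATED_OPS:
        return Tier.EMULATED
    return Tier.SIMPLE  # everything else is treated as a simple instruction


def crack_update_load(rd, ra, disp):
    """Split 'lwzu rd, disp(ra)' into a plain load plus an address update.

    The load uses the old base register; the add then writes the new
    effective address back into that register.
    """
    return [("lwz", rd, disp, ra), ("addi", ra, ra, disp)]
```

A compiler aware of the layering would, in this picture, prefer sequences whose mnemonics all classify as `Tier.SIMPLE`, which is where the speedup for new binaries would come from.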
In this way, existing binaries run unmodified, but new binaries created by compilers aware of the layering may run faster by exploiting the faster alternatives.