The Need for Speed: Embedded Processor Forum in San Jose
Introduction
Just a few years ago, embedded processors were merely hand-me-downs from the PC world: obsolete PC processors recycled for embedded use. That picture has changed almost completely. Today, more and more embedded processors are designed specifically for the embedded market.
This became very apparent at the recent Embedded Processor Forum in San Jose, CA. The informative event was organized by Cahners MicroDesign Resources, which also runs the Microprocessor Forum. In the category ‘High-Performance Processors for Embedded Systems’, only one of the six presenting companies showed a variant of a desktop CPU design.
The trends in embedded processor design point clearly toward new CPU architectures, chip multiprocessing (CMP, multiple CPU cores on a single die), and high performance, with clock speeds of up to 1 GHz! High performance is essential for embedded applications with an almost insatiable appetite for it: home video-game consoles, advanced set-top boxes, HDTVs, network routers, switches, gateways, and broadband modems, to name just a few.
New Architectures
Some embedded processors borrow technology from PC processors to gain performance: superpipelining for high clock speeds and superscalar pipelines for parallel execution. Others use technologies not yet seen in PC processors: CMP, integrated DSPs, embedded DRAM, VLIW, 64-bit RISC, and specialized data-path architectures. On top of that, embedded processors offer more options and can be customized. With desktop and server CPUs, the vendors pretty much decide what you need and get; with some embedded processors, customers can choose cache sizes, pick function units, add on-chip memory (SRAM, embedded DRAM), and configure I/O components. Licensable cores also enable a high degree of integration.
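Why superpipelining raises the clock rate can be seen with a common first-order timing model: the logic delay of one instruction is split across N pipeline stages, and every stage also pays a fixed latch overhead. The sketch below assumes that simplified model; the function name `clock_mhz` and the 10 ns logic delay and 0.2 ns overhead are hypothetical values chosen for illustration, not figures from any processor shown at the Forum.

```python
def clock_mhz(logic_delay_ns: float, stages: int, latch_overhead_ns: float) -> float:
    """Clock frequency (MHz) under a simple pipeline timing model (an assumption).

    The total logic delay is divided evenly over `stages`; each stage also
    pays a fixed latch/register overhead, so cycle = delay / stages + overhead.
    """
    if stages < 1:
        raise ValueError("a pipeline needs at least one stage")
    cycle_ns = logic_delay_ns / stages + latch_overhead_ns
    return 1000.0 / cycle_ns  # 1000 divided by a period in ns gives MHz


if __name__ == "__main__":
    # Hypothetical inputs: 10 ns of logic per instruction, 0.2 ns per-stage overhead.
    for n in (2, 5, 10, 20):
        print(f"{n:2d} stages -> {clock_mhz(10.0, n, 0.2):7.1f} MHz")
```

With these assumed numbers, two stages give about 192 MHz while 20 stages give about 1429 MHz, but going from 10 to 20 stages yields only about 1.7 times the clock, because the fixed overhead starts to dominate. The model also ignores the larger branch penalties that deeper pipelines bring.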
Cahners MicroDesign Resources did a good job of inviting a diverse group of companies to present at the Forum. One interesting example was the MIPS64 20K family of processor and core designs from MIPS Technologies. The R20K processor and the 20Kc core were originally conceived for servers and workstations but were then redesigned for high-performance embedded applications. MIPS added extensions that accelerate 3D geometry processing for digital entertainment applications. The chip has six integer units, a SIMD FPU, and two 32 KB, 4-way set-associative on-chip caches. At 500 MHz it delivers 30 million polygons/s (1200 MIPS, 2.4 GFLOPS), with an estimated power dissipation of 0.09 mW/MHz/mm².
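Dividing the quoted rates by the 500 MHz clock shows how much work the 20K completes per cycle, and the power-density figure becomes an absolute power estimate once a die area is known. The article gives no area, so the 10 mm² used below is a hypothetical placeholder; the constant and function names are likewise illustrative.

```python
# Figures quoted for the MIPS64 20K at 500 MHz.
CLOCK_MHZ = 500.0
POLYGONS_PER_S = 30e6        # 30 million polygons/s
MIPS = 1200.0                # million instructions per second
GFLOPS = 2.4                 # billion floating-point operations per second
POWER_DENSITY = 0.09         # mW per MHz per mm^2 (estimated)


def per_cycle(ops_per_s: float, clock_mhz: float = CLOCK_MHZ) -> float:
    """Average operations completed per clock cycle."""
    return ops_per_s / (clock_mhz * 1e6)


def power_mw(clock_mhz: float, area_mm2: float, density: float = POWER_DENSITY) -> float:
    """Power in mW: density (mW/MHz/mm^2) x clock (MHz) x area (mm^2)."""
    return density * clock_mhz * area_mm2


instr_per_cycle = per_cycle(MIPS * 1e6)             # instructions per cycle
flops_per_cycle = per_cycle(GFLOPS * 1e9)           # FLOPs per cycle
cycles_per_polygon = 1.0 / per_cycle(POLYGONS_PER_S)

if __name__ == "__main__":
    print(f"instructions/cycle: {instr_per_cycle:.1f}")
    print(f"FLOPs/cycle:        {flops_per_cycle:.1f}")
    print(f"cycles/polygon:     {cycles_per_polygon:.1f}")
    # HYPOTHETICAL die area: the article does not state one.
    print(f"power at 10 mm^2:   {power_mw(CLOCK_MHZ, 10.0):.0f} mW")
```

The quoted figures work out to 2.4 instructions and 4.8 floating-point operations per cycle, or roughly 16.7 cycles per polygon. Sustaining well over one floating-point operation per cycle is the kind of throughput a SIMD FPU is meant to provide.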
Breaking the 1 GHz Barrier
Startup SiByte, founded by former Digital Semiconductor engineers, combines high performance with low power consumption. Its SB-1 is an all-new core based on the MIPS64 instruction-set architecture (ISA) and features quad-issue, in-order superscalar pipelines. It achieves more than 2000 MIPS at 1 GHz while consuming only 2.5 W. That is not exactly low power in the traditional embedded sense, which is closer to the milliwatt range, but a performance/power ratio of more than 800 MIPS per watt is impressive.
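The SB-1 figures can be reduced to a few efficiency metrics. Because the article says "more than 2000 MIPS", the throughput-based values below are lower bounds; the variable names are illustrative.

```python
CLOCK_HZ = 1e9        # 1 GHz
MIPS = 2000.0         # quoted as "more than 2000 MIPS", so a lower bound
POWER_W = 2.5         # quoted power consumption
ISSUE_WIDTH = 4       # quad-issue pipelines

instr_per_s = MIPS * 1e6
ipc = instr_per_s / CLOCK_HZ                 # average instructions per cycle
slot_utilization = ipc / ISSUE_WIDTH         # fraction of issue slots filled
mips_per_watt = MIPS / POWER_W
nanojoules_per_instr = POWER_W / instr_per_s * 1e9

if __name__ == "__main__":
    print(f"IPC:              {ipc:.2f} of a peak {ISSUE_WIDTH}")
    print(f"issue slots used: {slot_utilization:.0%}")
    print(f"MIPS per watt:    {mips_per_watt:.0f}")
    print(f"energy/instr:     {nanojoules_per_instr:.2f} nJ")
```

The core averages about two instructions per cycle, half its peak issue rate; the empty slots reflect dependencies, branches, and cache misses that an in-order pipeline cannot work around. At 2.5 W that corresponds to roughly 1.25 nJ per instruction.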
The only company leveraging a desktop design was IBM. With the PowerPC 750CX, the company moves a Mac processor into the embedded market, although most PowerPC chips are sold into the embedded market anyway, so even that desktop ancestry is ambiguous. IBM’s main objective was cost reduction. It integrated a 256 KB L2 cache on-chip to reduce the pin count and to enable a smaller die. In addition, the backside bus and some optional 60x bus signals were removed, bringing the pin count down to 256. A new PBGA package offers better thermal dissipation and noise resistance. The chip delivers 1275 MIPS at 550 MHz and is manufactured in IBM’s 0.18 µm copper process technology.
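One rough way to compare the chips is to normalize the vendor MIPS ratings by clock frequency. The sketch below uses only figures quoted in this article; vendors may not measure MIPS the same way, so the ranking is approximate at best. The dictionary layout and function name are illustrative.

```python
# (MIPS, clock in MHz) as quoted in this article.
CHIPS = {
    "MIPS64 20K": (1200, 500),
    "SiByte SB-1": (2000, 1000),
    "IBM PowerPC 750CX": (1275, 550),
}


def mips_per_mhz(mips: float, mhz: float) -> float:
    """Average instructions per clock implied by a MIPS rating."""
    return mips / mhz


if __name__ == "__main__":
    ranked = sorted(CHIPS.items(), key=lambda kv: mips_per_mhz(*kv[1]), reverse=True)
    for name, (mips, mhz) in ranked:
        print(f"{name:18s} {mips_per_mhz(mips, mhz):.2f} MIPS/MHz")
```

By this measure the 750CX, at about 2.32 instructions per clock, sits between the 20K (2.4) and the SB-1 (2.0); the SB-1 makes up the difference with its much higher clock.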
Silicon Magic presented a rather unusual design. The DVine (DRAM Vector Engine) SM2700 processor introduces a scalable CMP architecture with on-chip DRAM. In its first standard implementation, the chip integrates six RISC processors and 4 MB of DRAM on a single die. Embedding the DRAM enables wide 128-bit data paths, and the memory subsystem runs at chip speed. In addition, less buffering is required to process streaming data, and memory sizes are not restricted to standard configurations. The chip is fast enough for real-time MPEG-2 encoding.
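The benefit of a 128-bit path running at chip speed is easy to quantify: peak bandwidth is the width in bytes times the clock. The article gives no clock frequency for the SM2700, so the 100 MHz below is a hypothetical value. The video format (720 x 480 pixels, 8-bit 4:2:0 sampling, 30 frames/s) is likewise an assumed, typical MPEG-2 input, used to show how many frames 4 MB of on-chip DRAM could hold.

```python
DATA_PATH_BITS = 128              # quoted data-path width
DRAM_BYTES = 4 * 1024 * 1024      # 4 MB, assuming binary megabytes


def peak_bandwidth(clock_hz: float, width_bits: int = DATA_PATH_BITS) -> float:
    """Peak bytes per second for one data path of `width_bits` at `clock_hz`."""
    return (width_bits // 8) * clock_hz


def frame_bytes(width: int = 720, height: int = 480, bits_per_pixel: int = 12) -> int:
    """Bytes per uncompressed frame; 12 bits/pixel corresponds to 8-bit 4:2:0."""
    return width * height * bits_per_pixel // 8


if __name__ == "__main__":
    HYPOTHETICAL_CLOCK_HZ = 100e6   # assumption: not stated in the article
    FPS = 30                        # assumed frame rate

    frame = frame_bytes()
    print(f"frame size:          {frame} bytes")
    print(f"frames in 4 MB:      {DRAM_BYTES // frame}")
    print(f"raw input rate:      {frame * FPS / 1e6:.1f} MB/s")
    print(f"peak path bandwidth: {peak_bandwidth(HYPOTHETICAL_CLOCK_HZ) / 1e9:.1f} GB/s")
```

Under these assumptions, 4 MB holds eight full frames, and a single 128-bit path offers roughly 100 times the raw input rate. That headroom matters because motion estimation typically reads reference-frame data many times per encoded frame.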
The Jazz processor from Improv Systems is a configurable VLIW data-path architecture designed for very high performance. The sample chip has five processors, each with multiple ALUs and a multiply-accumulate (MAC) unit. The shallow two-stage pipelines in the CPUs limit the clock frequency, but wide parallelism lets the chip execute 3 to 6 BOPS (billion operations per second) at only 100 MHz, which works out to 30 to 60 operations per clock cycle across the chip. Interestingly, the CPUs are not connected by a data bus. Instead, Improv uses multiple point-to-point shared memories, called memory interface units. Scheduling is done ahead of time by the compiler. There is one global bus on the chip, the Qbus, which carries a task address and a 16-bit datum.
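Compile-time scheduling is what lets a VLIW machine issue many operations per cycle without run-time dependency-checking hardware. The sketch below is a generic greedy list scheduler, a textbook technique and not Improv's actual compiler. It assumes every operation takes one cycle and that any operation can use any issue slot; the operation names are hypothetical.

```python
def schedule(ops, deps, width):
    """Pack operations into VLIW bundles with greedy list scheduling.

    ops:   operation names in priority order
    deps:  dict mapping an operation to the operations whose results it needs
    width: issue slots per bundle (operations per long instruction word)

    Assumes unit latency: a result is usable in the bundle after it is produced.
    """
    done = set()
    remaining = list(ops)
    bundles = []
    while remaining:
        bundle = []
        for op in remaining:
            if len(bundle) == width:
                break
            # Ready only if every producer finished in an earlier bundle.
            if all(p in done for p in deps.get(op, ())):
                bundle.append(op)
        if not bundle:
            raise ValueError("cyclic or unsatisfiable dependencies")
        bundles.append(bundle)
        done.update(bundle)
        remaining = [op for op in remaining if op not in bundle]
    return bundles


if __name__ == "__main__":
    # Hypothetical dataflow for z = (a*b + c*d) + e.
    ops = ["mul1", "mul2", "load_e", "add1", "add2"]
    deps = {"add1": ["mul1", "mul2"], "add2": ["add1", "load_e"]}
    print(schedule(ops, deps, width=2))
    # [['mul1', 'mul2'], ['load_e', 'add1'], ['add2']]
```

Widening the bundle to four slots still takes three bundles, because the dependency chain from the multiplies through `add1` to `add2` sets the minimum. A VLIW design therefore depends on the compiler finding enough independent work, such as the many parallel multiply-accumulates in signal-processing loops.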
The presentations at the Embedded Processor Forum made it clear that more interesting developments are on the way in the world of embedded systems. While the PC processor market consolidates, the embedded processor market seems to be proliferating, and the ever-growing diversity of embedded applications is both a continuous challenge and an opportunity for embedded processor companies to come up with new designs and solutions.