Introduction
3Dlabs took the unusual step of pulling forward the announcement of its next generation graphics architecture, the P10 visual processing unit (VPU), a product we didn’t expect to be talking about for another two weeks. While this architecture is going to find its way initially into the Oxygen line of workstation graphics cards from 3Dlabs, it also heralds some of what we can hope to see coming from Creative Labs this Christmas.
I have to say that there wasn’t as much information behind the P10 launch as you would normally expect with a major chip announcement, and this is an announcement that is two years overdue from 3Dlabs, but, as I said, it was extremely hurried. Nevertheless, it’s interesting to see the direction 3Dlabs is taking and to reflect on the influence of Longhorn on the P10’s specs. The next killer app for 3D may be the operating system, which I think is a significant development. It also means that we can predict what to expect in next-generation products from other graphics chip vendors, and do geek gossip over coffee.
Therefore, I surmise, 3Dlabs has done us all a big service by rushing out its announcement. By making the P10 public, the company is giving us a glimpse into the issues that are going to drive 3D graphics hardware architectures across almost all of the graphics industry in the coming months, and it’s damn good stuff.
Some of the key points of damn goodness came up at WinHEC 2002 this year, and the P10’s feature list reflects the direction Microsoft was giving hardware developers at that conference. Things like:
Full programmability – While Nvidia and ATi have mature programmable graphics products on the market, it’s worth noting that they still retain some level of support for the old fixed function pipeline with some form of integrated T&L circuitry. The next step for the graphics chip industry is to move to a fully programmable pipeline and to remove those pesky transistors for fixed function graphics. Graphics is going to need all the silicon real estate it can muster, but every chip will use those extra transistors differently.
Multi-tasking Graphics – Microsoft’s next generation operating system, Longhorn, is pushing the industry to create graphics processors that will offload almost all of the typical functions of managing windowed displays. This means that every window on your desktop becomes a 3D texture, whether it is running a game, a digital video, or an Office application. With all of Longhorn’s open apps, videos, and games running in multiple windows, Microsoft is working on determining how much graphics hardware it should ask for as a minimum to keep its OS humming. The graphics processor becomes a true partner processor for the CPU, but the question is, how low will Microsoft keep the bar on graphics performance and features? Will Microsoft open up the PC and graphics markets by demanding a significantly higher level of 3D graphics performance for base-level Longhorn systems than what we are seeing today, or will it try to hedge its bets by staying a generation or two behind the curve?
Bye bye VGA – We have to say bye bye to VGA, and the sooner the better. VGA is the last of the big legacy items remaining on the PC. It makes ISA look nimble and hip. With no VGA, graphics processors get to ditch the lowest common denominator.
Just in case you are unfamiliar with the nuances of the programmable 3D graphics pipeline, I suggest you give Tom’s excellent review of the GeForce3’s technologies a look:
High-Tech And Vertex Juggling – NVIDIA’s New GeForce3 GPU
The above article is a great place to get a good grounding on where the programmable 3D graphics pipe got its big start in the mainstream. And Tom does a good job of explaining terminology and how pixels flow through the pipeline. I could have cut and pasted the stuff, but I believe that’s illegal.
P10 – Looking Forward to Longhorn
First of all, the P10 has broken ground on the term “Visual Processing Architecture.” That’s a first right there. 3Dlabs claims the architecture has been in development for the past two years, has some pending patents, and will be shipping in products coming out in the third quarter of this year. THG was briefed by 3Dlabs, but again, it’s been a very rushed launch, forced by the upcoming acquisition by Creative. That’s a shame because it really has been a long time coming. This is a major new architecture for 3Dlabs.
It is the concepts and ideas that surround the P10 that hold the most interest, not to mention the fact that this architecture is supposed to be at the heart of Creative’s graphics push this Christmas.
I don’t think this argument will last long, but the P10 directly targets competing GPUs by emphasizing that it is a ground-up, fully programmable architecture. This is a dig at the GeForce4 and Radeon 8500, which still have elements of past T&L engines incorporated into their designs. But they also have to support those good ol’ DX7 games. 3Dlabs hasn’t really been troubled by DirectX support in recent memory, having focused very much on the high-end OpenGL (OGL) market.
So, it is not surprising that the first products that will use this architecture will be the workstation targeted Oxygen boards from 3Dlabs.
It won’t be until Creative Labs has fully acquired 3Dlabs and is ready to announce its P10 boards for Christmas 2002 that we will know how the P10 is going to impact the mainstream desktop and the gamer, although 3Dlabs is convinced that the Creative P10 boards will be competitive with the Nvidia and ATi products on the market at that time. Knowing Creative’s sales muscle and reach, a Creative graphics board only needs to be competitive, not necessarily better, to be a viable alternative to the two-horse race we have right now.
However, there are some concerns. Creative has tried repeatedly to establish a strong foothold in the graphics business and has been pulled in and out of the market, particularly in North America. 3Dlabs has been aiming to find a way into the mainstream with its technology for a number of years and has repeatedly fallen short of delivering a competitive product. Can this marriage work?
So, while the P10 is a very interesting, and most likely effective, architecture for 3Dlabs’ workstation customers, that doesn’t guarantee it automatic entry into the consumer marketplace, even with Creative.
On the other hand, 3Dlabs has an enormous amount of 3D expertise. It was the first 3D graphics chip company in the PC market to make some real money, for a while there. The company also has an extensive portfolio of 3D IP (intellectual property) rights. Nvidia got its share by virtue of acquisitions from 3dfx and SGI. Via has a bucket load from S3. Intel has its set from both SGI and Real3D. So, 3Dlabs has the potential to play with the big boys, and with Creative’s backing, it also has the resources. Or, we could say that Creative finally has the technology it craved in order to compete with proprietary 3D graphics products of its own.
The 3D Pipeline
The 3D pipeline for the P10 brings 3Dlabs up to date with the rest of the industry, and has some room to grow in support of OGL 2.0 and DX9, showing the obligatory 4-pixel pipeline. However, 3Dlabs has stuffed a lot more processing power, and hence, more operations, into the pipe. The function of the additional arrays of parallel SIMD processors in the pipeline is two-fold. On the one hand, they offer developers a set of functions that they haven’t had hitherto. On the other hand, they can be used to reduce the number of different passes made on a pixel by offering more operations to create the texture effects required. On a superficial level, the P10 architecture is packed for power and effects.
The 3D pipeline for the P10 shows the obligatory 4-pixel pipeline, but it packs more functions to massage those pixels. The P10 VPU boasts over 200 SIMD processors for all its geometry, texture and pixel processing stages.
With Longhorn, the expectation is that dozens of apps may be fighting for resources on the CPU and graphics processor. Intel is addressing this issue with HyperThreading, and 3Dlabs is using its command processor to handle multiple requests coming into the P10. The command processor scans multiple command buffers and finds work to do. This is the kind of multi-threading that 3Dlabs believes is going to be putting pressure on the graphics processor in future versions of Windows. Here, we see where the P10’s processing power might be best suited in the present architecture. Instead of just using the array of processors to increase the effects available, you can also scale up as the load on the VPU grows.
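To make the idea of scanning multiple command buffers a little more concrete, here is a minimal, purely illustrative sketch in Python. The class and method names are invented for this article; nothing here reflects 3Dlabs’ actual hardware or driver interfaces. The point is simply that a command processor can pull work from several per-application queues so that no single window hogs the chip:

```python
# Hypothetical sketch of a command processor draining several per-application
# command buffers in round-robin fashion. Names and structure are illustrative.
from collections import deque

class CommandProcessor:
    def __init__(self):
        self.buffers = {}          # one command queue per application/context

    def submit(self, app, command):
        self.buffers.setdefault(app, deque()).append(command)

    def scan_and_execute(self):
        # Scan every buffer; if a context has pending work, pull one command
        # so no single window can starve the others.
        for app, queue in self.buffers.items():
            if queue:
                command = queue.popleft()
                print(f"[{app}] executing {command}")

cp = CommandProcessor()
cp.submit("game", "draw_frame")
cp.submit("video_player", "blit_frame")
cp.submit("spreadsheet", "redraw_window")
cp.scan_and_execute()
```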
The command processor allocates resources to multiple requests coming into the graphics processor, as is expected to be the case with Longhorn.
With DX9, Microsoft is pretty much making it obligatory for graphics processors to support any window as a possible surface. Everything from the rendering of video to wide lines will have to be tessellated and rendered through DX9. Even images are to be represented as a texture, rather than as a bitmap.
Texture processing isn’t going to be just about 3D graphics in Longhorn. It will apply to all aspects of window management, and even things like how paths and widened lines are displayed.
As a result of the expectations put on the texture processing engine, filtering and anti-aliasing demands on the graphics processor will also increase. In this case, 3Dlabs has its bases covered.
- Textures can be arbitrary sizes, which is a requirement for DX9.
- P10 claims it can do eight simultaneous textures compared to ATi’s six, and Nvidia’s four.
- Filtering is programmable beyond anisotropic, bi-cubic and other hardwired filtering modes. This is increasingly going to be important in future versions of Windows, where the graphics hardware is going to have to support imaging applications that require higher order filtering and specialized support.
- 3Dlabs claims its virtual memory architecture can offer up to 16GB of addressable texture storage.
- The P10 has a 256-bit DDR memory interface with a bandwidth of 20GB/s (a quick back-of-the-envelope check on that figure follows this list).
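For the curious, the 20GB/s number hangs together arithmetically. The memory clock in the snippet below is inferred from the math, not a figure 3Dlabs has announced:

```python
# Back-of-the-envelope check on the quoted 20 GB/s figure. The memory clock
# is inferred from the bus width and bandwidth, not an announced spec.
bus_width_bits = 256
bus_width_bytes = bus_width_bits // 8          # 32 bytes per transfer
bandwidth_bytes_per_s = 20e9                   # 20 GB/s as quoted

transfers_per_s = bandwidth_bytes_per_s / bus_width_bytes    # ~625 million/s
ddr_clock_mhz = transfers_per_s / 2 / 1e6                    # DDR: two transfers per clock

print(f"{transfers_per_s/1e6:.0f} MT/s -> roughly a {ddr_clock_mhz:.0f} MHz DDR memory clock")
```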
Is This Stuff All That New? History Repeats Itself
With this top-down view of the P10 VPU, you can also get a sense of the historical perspective we now have on the past developments in workstation graphics that have gotten us here. The P10 VPU, and probably most of what is yet to come in support of OGL 2.0 and DX9, owes much to developments that took place ten years ago.
In the early 90s, Silicon Graphics (SGI) and Evans & Sutherland (E&S) dominated the 3D graphics industry. They put out, respectively, the RealityEngine and Freedom series of graphics subsystems. Both architectures brought high levels of parallelism to pixel processing using multi-plane graphics subsystems. One difference with the past is that what the P10 integrates onto one chip today was a set of boards and ASICs back then.
The RealityEngine had eight geometry engines, and up to 320 pixel processors, among other things. I think it came on three big circuit boards. The Freedom series used a DSP (digital signal processor) farm where each DSP was a separate processing unit that could work on its own set of vertices. Each processed vertex was then passed on to a set of parallel pixel processors. The results were composited for delivery as a final image.
There are a couple of sources of information on these old boys that you might like to explore for more depth, but most are dated:
Technical Overview of RealityEngine in Visual Simulation has enough of a hardware overview to show the parallels with what is going on today.
In addition, around the mid-90s, researchers at the University of North Carolina delivered a new chip architecture called PixelFlow. PixelFlow used a number of processors in a parallel array to process pixels from various subdivisions of the screen. Call it tiling.
Tiling was, and is, a difficult process to program for. Software has to sort the graphics primitives falling into each portion of the screen, assign them to one set of pipes, and composite the final image at the end of the pipe. A rough sketch of that sorting step follows.
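Here is a small, purely illustrative Python sketch of that binning step: every primitive has to be sorted into the screen tiles it touches before any tile can be rendered independently. The tile size, names and data are invented and have nothing to do with PixelFlow’s real design:

```python
# Hypothetical sketch of binning primitives into screen tiles before each tile
# is rendered and the results composited. Purely illustrative.
TILE = 32  # tile size in pixels (an arbitrary choice for this example)

def tiles_touched(bbox, screen_w, screen_h):
    """Yield (tx, ty) tile coordinates overlapped by a primitive's bounding box."""
    x0, y0, x1, y1 = bbox
    for ty in range(max(0, y0 // TILE), min(screen_h - 1, y1) // TILE + 1):
        for tx in range(max(0, x0 // TILE), min(screen_w - 1, x1) // TILE + 1):
            yield tx, ty

def bin_primitives(primitives, screen_w, screen_h):
    """Build a per-tile list of primitives; each tile can then be rendered on its own pipe."""
    bins = {}
    for prim_id, bbox in primitives:
        for tile in tiles_touched(bbox, screen_w, screen_h):
            bins.setdefault(tile, []).append(prim_id)
    return bins

# Two triangles described only by their screen-space bounding boxes.
prims = [("tri0", (10, 10, 40, 40)), ("tri1", (100, 20, 200, 90))]
for tile, ids in sorted(bin_primitives(prims, 640, 480).items()):
    print(tile, ids)
```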
There’s an old Byte article that explains the architecture simply and succinctly:
PixelFlow: Scalable Image Processing
These technologies, for their time, were revolutionary in so far as they brought very high-end 3D image processing capabilities to the workstation market. Sure, these workstations cost tens of thousands of dollars, but they owed their existence to research on graphics displays in the early 80s, in the areas of military and flight simulators. When the GeForce3 came out, it brought some of the principles that we first saw in the workstation market of the early 90s into play on gaming desktops of this decade.
Another interesting aspect of the historical perspective is that SGI and E&S competed with proprietary systems, although SGI stole the show by developing an API to help developers use its hardware, an API that would eventually become OpenGL. Depending on who you believe, Nvidia used its relationship with Microsoft on Xbox to help define DX8, which actually laid the groundwork for programmable architectures.
Now, today’s announcement of the P10 is an evolutionary step that puts the spotlight on OGL 2.0. The P10 doesn’t want to have to worry about spending transistors on a T&L engine and DX7. So, we now have this drive on the part of 3Dlabs to make sure that the P10 is the first hardware to root for OGL 2.0.
So, what does Creative do in all this? If Creative really gets behind the P10 and swings back into graphics, will the company push OGL 2.0 to offset the influence of ATi and Nvidia on DirectX? Or, will Creative use OGL 2.0 to create its own developer support base? Bear in mind that most games are going to be targeted at DirectX 8.1 and Xbox-level features at the base level. Surely, Creative is going to want some differentiation, and it already has a strong core of game developer support and influence from the audio side of things. It’s worth thinking about.
Managing Memory
3Dlabs also claims that the P10 distinguishes itself from competitors like the GeForce4 and Radeon 8500 in two areas of memory management. First, the chip connects to DDR memory over a 256-bit interface, as opposed to the 128-bit interfaces on existing parts. This gives the P10 a memory bandwidth of 20 GB/s. However, frame buffer storage is still limited to 256MB, so 3Dlabs offers a virtual memory management architecture.
3Dlabs claims that developers find managing memory to be the most challenging aspect of the 3D graphics pipeline. The company believes its virtual memory architecture gives the P10 an advantage in this regard.
Moving textures in and out of physical memory is one of the bottlenecks for developers using existing graphics processors. The application has to load a texture into memory, check the memory to see where the texture is, and, if it needs a new texture, flush the memory and load the new one, and so on.
3Dlabs claims that the P10 will have a logical address range of 16GB split between whatever is onboard memory (limit is still 256MB), and system memory. So, in theory, using a P10, a developer can load GBytes of textures into memory, not have to worry about where they are physically, and just call them directly as needed, reducing some of the overhead of existing architectures.
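A toy before-and-after sketch of the difference 3Dlabs is describing might look like the following. The class and method names are made up for illustration; this is not the P10’s driver API, just a picture of who ends up doing the residency bookkeeping:

```python
# Hypothetical contrast between manual texture management and a virtual
# memory scheme. Names, sizes and policies are invented for illustration.

# Today: the application babysits a small physical pool itself.
class ManualTexturePool:
    def __init__(self, budget_mb=256):
        self.budget, self.resident = budget_mb, {}

    def use(self, name, size_mb):
        if name not in self.resident:
            while sum(self.resident.values()) + size_mb > self.budget:
                evicted = next(iter(self.resident))   # flush something to make room
                del self.resident[evicted]
            self.resident[name] = size_mb             # re-upload over the bus
        return f"bound {name}"

# The P10 pitch: the app refers to textures by logical address and lets the
# chip and driver page them between local and system memory behind the scenes.
class VirtualTexturePool:
    def use(self, name, size_mb):
        return f"bound {name}"    # residency is no longer the application's problem

pool = ManualTexturePool()
for tex in [("terrain", 200), ("clouds", 100), ("terrain", 200)]:
    print(pool.use(*tex))         # note the churn as textures evict each other
```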
It’s certainly a feature that is going to be essential in the workstation environment where some applications, such as visualization and simulations, use large data sets and require a great deal of memory for storage. For example, terrain generators for a flight simulator will require memory to load very large texture maps. How feasible it will be to expect game developers to do the same for a P10 is debatable.
Game developers will want to support the largest number of graphics cards, and will design their applications to adapt to using cards that will probably have 128MB frame buffers onboard, and no virtual memory management. They may also not like the idea of relinquishing their textures to host memory, choosing to keep them local and under the control of their own engines.
However, down the line, as Longhorn appears, graphics cards are going to face a memory wall. If you have a number of high resolution windows open on your desktop, and each is double buffered, z-buffered, and alpha blended, you will pretty much use up 256MB of frame buffer storage after a few windows. Maybe not realistically, but very possibly. So, it’ll be interesting to see how the need for ever bigger buffers gets addressed in the future.
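For a rough feel of the numbers, here is a worked example. The window size and buffer formats below are assumptions I have picked for illustration, not anything Microsoft has specified for Longhorn:

```python
# Rough worked example of the "memory wall" point. Window size and buffer
# formats are illustrative assumptions, not Longhorn requirements.
width, height = 1600, 1200
bytes_per_pixel = 4                              # 32-bit color, including an alpha channel

color = width * height * bytes_per_pixel * 2     # double buffered
depth = width * height * 4                       # 32-bit Z buffer
per_window_mb = (color + depth) / (1024 ** 2)

frame_buffer_mb = 256
print(f"~{per_window_mb:.1f} MB per window -> about "
      f"{frame_buffer_mb // per_window_mb:.0f} such windows before a 256MB board is full")
```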
10-bit DACs – Color Precision Takes Center Stage
With the increasing interest in digital photography and the emphasis on digital imaging, the addition of a 10-bit DAC for the P10 architecture bears some scrutiny.
The DAC (digital-to-analog converter) in a typical graphics subsystem converts digital pixel data in the frame buffer into an analog voltage that, depending on its value, determines the brightness of a color. Most graphics cards use an 8-bit DAC for each primary color, which offers 256 levels, or 2 to the power of 8 output signal levels. So, for an 8-bit DAC on the red component of an RGB signal, you have 256 brightness levels available to you, from fully off all the way to fully saturated red, with intermediate shades in between.
A 10-bit DAC will handle 2 to the power 10 or 1024 levels of color. Obviously, this means that you now have finer gradations of color between the white and black levels, which results in less banding of colors. Increasingly, Microsoft is pushing for even higher levels of precision from graphics vendors to meet the exacting demands of digital photography, high quality scanning, and image processing.
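The arithmetic is simple enough to show in a couple of lines:

```python
# Output levels per color channel as a function of DAC width.
for bits in (8, 10):
    levels = 2 ** bits
    step = 1.0 / (levels - 1)           # smallest brightness step, with full scale = 1.0
    print(f"{bits}-bit DAC: {levels} levels, smallest step ~{step:.5f} of full scale")
```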
The impact on gaming is less clear, although higher levels of color precision do translate into better and sharper textures, too. Suffice it to say, color precision, while not easy to distinguish, is an important element of the evolution of next generation graphics processors. The Windows display is going to get sharper, and brighter.
Conclusions
Getting excited about technology announcements is not always a good idea. You could be setting yourself up for disappointment. On the other hand, this is interesting stuff. Neil Trevett, 3Dlabs’ VP of Marketing, is still the best 3D evangelist in the hardware business, so the company’s message always seems to be strong and consistent. Even the little information we have on the P10 points to a product that hits the mark on all the requirements for Microsoft certification, and that’s an absolute must to get a product into the OEM channel.
In truth, 3Dlabs as a company hasn’t been strong and consistent, but the graphics industry goes in cycles. It always has. No one stays on top forever, and someone always manages to come along and leapfrog to the top. Nvidia and ATi for now, and maybe there’s a chance for others. It was this time last year that I was bemoaning the weakness of ATi’s response to Nvidia. Maybe Creative Labs now has a legitimate shot with this new architecture as the foundation.
We won’t know until we see and test the actual product. I would hazard a guess and say that even then it will be too early to tell. Creative hasn’t exactly been good at following through when it comes to graphics. Toe in, and then toe out.
One thing is for sure, Microsoft’s Longhorn is probably the single greatest opportunity for the graphics industry since it moved into the 3D age ten years ago. However, a lot is going to depend on how low Microsoft sets the bar. If Longhorn pushes the requirements of the graphics subsystem, and lives up to its promise, then it’s open season. Longhorn needs to demand a GeForce3 and Radeon 8500 minimum base level of performance and features. Ideally, it would require a fully compatible DX9 card. Not just a card that has the drivers, but one that implements the feature set in hardware instead of pushing everything back to software.
The operating system has to be the next killer app because we can’t rely on games – developers are going to be on DX8.1 for a while and playing on their consoles for a lot longer. PC gaming isn’t dead, but is it really a driving force?
Even the extraordinary power of existing GPUs doesn’t seem to be pulling the market as much as expected. Watch the skies. Big things are coming. It ain’t over till the fat lady is FSAA’d. Let’s see what everyone else has to offer. Next Gen 3D is on, baby!