Introduction
It has been a long time since we’ve been able to write about something really new from Matrox. In the past years, Matrox has been gradually retreating to the background in the 3D area, concentrating its efforts more on 2D display quality. For today’s gaming requirements, there are hardly any Matrox cards that can really be used.
So it’s all the more surprising that the Canadian company is now launching a new GPU called “Parhelia-512,” which is supposed to be superior to the latest chips from NVIDIA and ATI. Matrox has packed a huge palette of 2D and 3D features into this 80-million-transistor chip, which is built on a 0.15-micron process. But let’s go through these things one at a time.
As is generally known, Matrox is not a chip maker that merely supplies chips to other board manufacturers; it has traditionally sold its chips on its own cards. Parhelia cards will follow this tradition and will be produced under the Matrox brand only. Last year, ATI partly dropped this very strategy and followed NVIDIA’s example of selling chips and designs instead of its own cards, at least for ATI’s OEM customers. Matrox, however, is wary of taking this step and is instead concentrating on establishing a niche for itself along the lines of “quality sells.”
Therefore, the Parhelia can only be understood as a high-end solution that targets gaming enthusiasts and 2D professionals. Value or low-end solutions, as provided by NVIDIA and ATI, are not planned. The least expensive Parhelia card variant with 128 MB should cost about $400, which is quite steep for a graphics card. Nevertheless, Parhelia has several things to offer that you won’t find anywhere else in such a concentrated form.
By the way, the name for the final boards has not been determined. Rumor has it that the cards will be called G1000, but this has not been officially confirmed by Matrox. The origin of the name “Parhelia” is quite interesting in itself – it’s actually a term for a natural phenomenon in which light from the sun is dispersed by ice crystals in the atmosphere.
Parhelia image from Francis Hindle.
What you then see is a ring of light around the sun, along with two smaller suns to the left and right. Here, Matrox alludes to a trinity of attributes, namely quality, performance and features.
The new features in Parhelia can be roughly divided into two broad categories: 3D features on the one hand, and 2D and board features on the other.
Overview
Matrox describes Parhelia as the first 512-bit GPU (Graphics Processing Unit) in the world, which accounts for the “512” in the chip’s full name. It comes with a 256-bit wide memory bus which, thanks to its double data rate capability, can transfer 512 bits of data per clock. This may sound like just a number, but in practice it is rather amazing stuff: it increases the pin count of the chip enormously and makes the PCB of the graphics card a lot more complex too. There’s a good reason why neither NVIDIA nor ATI has tried this yet. Right now, Parhelia demo boards have no fewer than 8 layers! At a 300 MHz memory clock (600 MHz DDR), Parhelia has a whopping memory bandwidth of roughly 20 GB per second. By comparison, NVIDIA’s top product, the GeForce 4 Ti4600, reaches just 10.4 GB/s. However, with its Lightspeed Memory Architecture II, NVIDIA has provided its graphics cards with a very effective Crossbar Memory Controller, lossless Z-compression and further optimizations.
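The bandwidth figures quoted above follow from simple arithmetic on bus width and memory clock. The short Python sketch below reproduces them; the function name is our own, and the 325 MHz memory clock for the GeForce 4 Ti4600 is an assumption chosen to match the 10.4 GB/s figure, since the text only quotes the result.

```python
def peak_bandwidth_gb_s(bus_width_bits, memory_clock_mhz, transfers_per_clock=2):
    """Peak memory bandwidth in GB/s, with 1 GB taken as 10**9 bytes."""
    bytes_per_transfer = bus_width_bits / 8
    # DDR memory moves data twice per clock, hence transfers_per_clock=2.
    transfers_per_second = memory_clock_mhz * 1_000_000 * transfers_per_clock
    return bytes_per_transfer * transfers_per_second / 1_000_000_000


# Parhelia-512: 256-bit DDR bus at 300 MHz, as quoted in the text.
parhelia = peak_bandwidth_gb_s(256, 300)

# GeForce 4 Ti4600: 128-bit DDR bus. The 325 MHz clock is an assumption
# picked to match the 10.4 GB/s quoted in the text.
geforce4 = peak_bandwidth_gb_s(128, 325)

print(f"Parhelia-512:     {parhelia:.1f} GB/s")
print(f"GeForce 4 Ti4600: {geforce4:.1f} GB/s")
```

The raw calculation gives 19.2 GB/s for Parhelia, so the 20 GB/s figure is a rounded value. Either way, the peak bandwidth is not quite double that of the GeForce 4 Ti4600.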
Parhelia also has an intelligent memory controller with several independent sub-controllers, which optimize the access of the intensity, depth, fragment and texture buffers. Furthermore, Matrox has implemented a fast Z Clear function. Various internal caches are designed to prevent bottlenecks in dataflow. Parhelia’s units for depth, fragment AA, pixel, texture and display are each connected separately to the 512-bit memory controller array, and they also have a special function that optimizes memory throughput. The requests from these units are handled by the memory controller array. The array contains several independent controllers that simultaneously process various data.
Matrox has tried to make the data throughput in the chip very efficient. For example, the texture fetcher and the cache are supposed to ensure maximum efficiency when reading texels from the textures in single-pass quad texturing and anisotropic filtering. Here, the texture fetcher calculates where a texel should come from, how many texels should be used and when, so that bottlenecks can be avoided. In addition, the memory is accessed as efficiently as possible during the process. The data are then stored in the texture cache and stand ready for the quad-texturing units.
Other interesting optimizations are related to the depth acceleration and the depth cache, which optimize access to the Z-buffer data. They also contain circuits for fast Z-clear functions, as well as sophisticated logic that combines the Z-reads and Z-writes so that they can be processed in burst access. As for the new AGP 8x interface, it remains to be seen whether this will offer a performance advantage in the future.
With regard to 3D features, the Parhelia-512 is positioned between DirectX 8.1 and the next version, DirectX 9. The quad vertex shader (v2.0) corresponds to DirectX 9, whereas the pixel shader still follows DirectX 8.1 version 1.3, as used by NVIDIA in its GeForce 4. At first glance, this is an odd combination, since vertex shaders can be emulated by the driver, while pixel shaders can only be emulated with an extreme loss of speed. Matrox argues that this is exactly why a modern vertex shader unit is worth it in practice. Game developers can use vertex shaders via emulation, without fearing that the functions won’t work for end users. Therefore, it can be expected that the newest vertex shader versions will always be used in games. With pixel shaders, however, developers have to be more careful, because the latest versions of these functions run on only a few cards. Pixel shaders from v1.3 and up will be the most widespread in games, and in the future, they will become something of a standard. However, it wouldn’t hurt for Parhelia-512 to have a pixel shader unit based on DirectX 9, either.
3D Features
Feature | Matrox Parhelia-512 | NVIDIA GF4 Ti 4600 | ATI Radeon 8500 |
Chip Technology | 512-bit | 256-bit | 256-bit |
Manuf. Process | 0.15 micron | 0.15 micron | 0.15 micron |
# of Transistors | 80 million | 63 million | 60 million |
Memory Bus | 256-bit DDR | 128-bit DDR | 128-bit DDR |
Memory Bandwidth | 20 GB/s | 10.4 GB/s | 8.8 GB/s |
AGP Bus | 1x/2x/4x/8x | 1x/2x/4x | 1x/2x/4x |
Vertex Shaders | 4 | 2 | 1 |
Pixel Pipelines | 4 | 4 | 4 |
Texture Stages/Pipe | 4 | 2 | 2 |
PS Stages/Pipe | 5 | 2 | 2 |
Texture Shader Stages | 36 | 16 | 16 |
Vertex S. Version | 2.0 | 1.1 | 1.1 |
Pixel S. Version | 1.3 | 1.3 | 1.4 |
DirectX Generation | 8.0 / 9.0 | 8.0 | 8.1 |
FSAA Modes | Fragment / SuperSampling | MultiSampling | SuperSampling |
Z-Data Compression | – | Yes | Yes |
Max Displays | 3 | 2 | 2 |
Internal Ramdacs | 2 | 2 | 2 |
External Ramdacs | 1 | – | – |
Max Dual Resolution | 2048×1536 @ 32bpp | 1600×1200 @ 32bpp | 1600×1200 @ 32bpp |
Max Triple Resolution | 3840×1024 @ 32bpp | – | – |
Bits per Color Channel | 10 | 8 | 8 |
Quad Vertex Shader Array
Parhelia has a vertex shader unit whose T&L subsystem consists of four individual shaders. Combined with a 512-unit instruction cache, 256 constant registers and optimized control, the Parhelia-512 reaches a very high vertex throughput.
The vertex shaders correspond to Microsoft’s DirectX 9 (v2.0). The hardware displacement mapping engine is connected directly to the vertex shader unit, via the vertex fetcher and cache. Refer to the diagram here.
Hardware Displacement Mapping
Displacement Mapping is one of the new features that Matrox has introduced to DirectX 9. It allows real 3D geometry to be created based on special grayscale textures, which are also known as “displacement maps”.
The hardware generates Z-values from the grayscale, or rather, displacement values found in the displacement map. Displacement maps are not new and have been used for a long time. Much of the information about earth’s terrain provided by satellites is available in the form of displacement maps.
Parhelia is the first mainstream GPU that can make these calculations completely in the hardware. The unit writes, via the vertex fetcher and cache, directly to the Parhelia’s vertex shader array.
In addition, the number of triangles in the mesh can be increased as desired. Matrox calls this “adaptive tessellation.” From a relatively flat 3D object with few triangles, the hardware generates a high-polygon model and changes the Z-data of the points according to the displacement map.
The number of triangles in the scene is therefore increased significantly. In order to combat this, Parhelia carries out a dynamic LOD calculation, and as a result, objects that are further away are rendered with fewer triangles. Because only a few pixels are needed for rendering distant objects anyway, it would be a waste of processing power to render such objects with the full number of polygons.
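To make the idea more concrete, here is a minimal Python sketch of displacement mapping with a distance-based level of detail. It is not Matrox’s algorithm: the function names, the halving rule for the LOD and the flat unit-square patch are all assumptions made purely for illustration.

```python
def lod_level(distance, max_level=4, near=10.0):
    """Hypothetical LOD rule: drop one tessellation level each time the
    distance doubles beyond `near`."""
    level, limit = max_level, near
    while distance > limit and level > 0:
        level -= 1
        limit *= 2
    return level


def sample_bilinear(height_map, u, v):
    """Bilinearly sample a 2D list of grayscale values at u, v in [0, 1]."""
    rows, cols = len(height_map), len(height_map[0])
    x, y = u * (cols - 1), v * (rows - 1)
    x0, y0 = int(x), int(y)
    x1, y1 = min(x0 + 1, cols - 1), min(y0 + 1, rows - 1)
    fx, fy = x - x0, y - y0
    top = height_map[y0][x0] * (1 - fx) + height_map[y0][x1] * fx
    bottom = height_map[y1][x0] * (1 - fx) + height_map[y1][x1] * fx
    return top * (1 - fy) + bottom * fy


def displace_patch(height_map, level, scale=1.0):
    """Tessellate a flat unit square into 2**level cells per side and take
    each vertex's Z value from the displacement map (0..255 grayscale)."""
    n = 2 ** level
    vertices = []
    for j in range(n + 1):
        for i in range(n + 1):
            u, v = i / n, j / n
            z = sample_bilinear(height_map, u, v) / 255.0 * scale
            vertices.append((u, v, z))
    triangle_count = 2 * n * n  # two triangles per grid cell
    return vertices, triangle_count


# A tiny 2x2 displacement map: a ramp from black (left) to white (right).
ramp = [[0, 255], [0, 255]]
for distance in (5, 25, 100):
    level = lod_level(distance)
    _, triangles = displace_patch(ramp, level)
    print(f"distance {distance}: level {level}, {triangles} triangles")
```

The same patch ends up with hundreds of triangles when it is close to the viewer and only a couple when it is far away, which is the trade-off the dynamic LOD calculation is meant to achieve.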
In addition to rendering terrains, there are other uses for displacement mapping. Figures can also be generated, for example. The game developer creates and animates a simple, rough polygon figure and then specifies the final form for the model in a displacement map. This is a simple method to create different models, which can then be mapped with textures and pixel shader effects, just like any other 3D model.
Matrox shows some examples for this, which, however, do not show a whole lot of detail in the facial areas. In games that include scenes with masses of characters, though, the object-oriented approach (i.e., model generation via displacement maps) is a good idea.
Displacement mapping can also be used for bump mapping, as shown in the example above. But keep in mind that this shouldn’t be mistaken for the traditional Dot3 bump mapping, because in this case, real 3D geometry is generated.
36-Stage Shader Array, Including 64 Super Sample Texture Filtering
The Parhelia-512 chip offers a lot of power with regard to pixel calculation as well. It has four pixel pipelines, each of which contains four texture units. This means that four textures can be applied per pixel and per clock (quad texturing).
NVIDIA’s GeForce 3 and 4 series also have four pipelines, but only two texture units per pipeline. Therefore, when it comes to multi-texturing, Parhelia is significantly faster than a GeForce 4 Ti, at least on paper.
Furthermore, the texture units in each pipeline are connected to a 5-stage DirectX 8.1 pixel shader (v1.3). By comparison, GeForce 4 and Radeon 8500 have only two pixel shader stages per pipeline, so Matrox provides a total of 36 shader stages (four pipelines with four texture stages and five pixel shader stages each), as opposed to 16 from its competitors.
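The totals of 36 and 16 stages can be reproduced from the per-pipeline figures in the comparison table. The Python sketch below shows one reading of that arithmetic (texture stages plus pixel shader stages, multiplied by the number of pipelines); this interpretation is our assumption, and the dictionary layout and function names are hypothetical.

```python
# Per-pipeline figures taken from the comparison table in this article.
chips = {
    "Matrox Parhelia-512": {"pipelines": 4, "texture_stages": 4, "ps_stages": 5},
    "NVIDIA GF4 Ti 4600": {"pipelines": 4, "texture_stages": 2, "ps_stages": 2},
    "ATI Radeon 8500": {"pipelines": 4, "texture_stages": 2, "ps_stages": 2},
}


def total_shader_stages(chip):
    """Texture stages plus pixel shader stages, summed over all pipelines."""
    return chip["pipelines"] * (chip["texture_stages"] + chip["ps_stages"])


def textures_per_clock(chip):
    """Textures that can be applied per clock across all pipelines."""
    return chip["pipelines"] * chip["texture_stages"]


for name, chip in chips.items():
    print(f"{name}: {total_shader_stages(chip)} stages, "
          f"{textures_per_clock(chip)} textures per clock")
```

Under this reading, Parhelia also applies 16 textures per clock against 8 for the other two chips, which is where the on-paper multi-texturing advantage comes from.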
Through this extensive array, Parhelia also achieves a very good performance when anisotropic filtering is used, where up to 64 texture samples per clock are allocated to the texture units. Here, the loss in performance with texture filtering is significantly less than with the competitors.
With dual-texture pixels and anisotropic filtering, Matrox also promises performance similar to what the competitors achieve with trilinear filtering.
16x Fragment Anti-Aliasing
Anti-aliasing has become an established technology for improving the image quality in 3D games. It eliminates the jagged edges effect that occurs on 3D objects in a scene. What causes this annoying effect is low screen resolution. Even at 1280 x 1024, not enough pixels are available to display a slanted line smoothly on the monitor.
Anti-aliasing combats this effect by decreasing the intensity of neighboring pixels in the line. However, this also causes the line to become less clear. By increasing the resolution of the monitor, this effect is in turn lessened. In practice, a smoother edge significantly increases the subjective perception of resolution.
In action scenes, the jagged edges lead to an annoying flickering effect, since the 3D perspective (and therefore the angle between the edge of the object and the observer) continually changes. Anti-aliasing also mitigates this effect.
Currently, Multisample Full Scene Anti-aliasing (FSAA) is the established technology. It has been used by NVIDIA since the GeForce 3 and was significantly optimized in the GeForce 4. Supersampling belongs to the previous generation and is used by Radeon and GeForce 2, for example. To put it simply, an image is rendered at double or quadruple resolution, then filtered and resampled to the final resolution. On the monitor, the color of a pixel is set to the average of four of the original pixels (with 2x FSAA, this results in double the image size). This allows the hard edges to be smoothed out, as described above.
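The resampling step of supersampling can be sketched in a few lines of Python. This is only an illustration of the principle described above, using a plain box filter on a grayscale image; the function name and the sample image are our own, and real hardware may filter differently.

```python
def downsample_2x2(image):
    """Average each 2x2 block of a grayscale image that was rendered at
    double width and double height, producing the final image."""
    result = []
    for y in range(0, len(image), 2):
        row = []
        for x in range(0, len(image[0]), 2):
            total = (image[y][x] + image[y][x + 1]
                     + image[y + 1][x] + image[y + 1][x + 1])
            row.append(total / 4)
        result.append(row)
    return result


# A hard diagonal edge rendered at 4x4 (white = 255, black = 0).
high_res = [
    [255, 255, 255, 255],
    [0, 255, 255, 255],
    [0, 0, 255, 255],
    [0, 0, 0, 255],
]

for row in downsample_2x2(high_res):
    print(row)
```

The pixels that straddle the edge come out as an intermediate gray, while pixels entirely on one side keep their original value. Note that every final pixel costs four rendered pixels, whether it lies on an edge or not.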
The disadvantage of this technique is that it generates a huge amount of data within a very short period of time, all of which needs to be processed. Here, the limited bandwidth severely restricts performance. Multisampling is an optimized version of supersampling, in which several steps are made more efficient. With supersampling, four texture samples are needed to determine the final color value of a pixel (based on four original pixels). By comparison, multisampling requires only one texture sample, which is then used four times in sequence. This makes multisampling significantly more efficient and faster. With the GeForce 4, NVIDIA came up with further optimizations to boost performance. The loss in speed with 2x FSAA on a GeForce 4 Ti card is extremely small. NVIDIA even provides the inexpensive MX version with an extra hardware unit for FSAA.
16x Fragment Anti-Aliasing, Continued
So much for the technology to date. With the Parhelia-512, Matrox takes an entirely different route, namely, fragment anti-aliasing. This technique gets to the heart of the problem. After all, only the edges need to be smoothed out, and these account for only 5% to 20% of the pixels in a scene. With this in mind, supersampling and multisampling are extremely inefficient, since the majority of pixels don’t actually need to be re-calculated.
Parhelia examines each pixel based on the edges of the triangles and evaluates whether or not it is “fragmented,” i.e., whether another object is next to, or overlaps, the object in question. This lets the chip recognize whether a pixel is completely covered, completely exposed or only partially covered at an edge (fragmented). In the first two cases, the pixels are written directly to memory. All fragmented pixels are assigned to a fragment buffer, where their colors are re-calculated with 16x subpixel precision. Finally, these pixels are merged with the rest of the scene that has already been stored in memory.
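The following Python sketch illustrates the principle with a single straight object edge. It is not Matrox’s implementation: the corner test used for classification, the 4x4 subsample grid standing in for “16x” and all names are assumptions made for the sake of a small, runnable example.

```python
def object_edge(x, y):
    """Hypothetical object: covers everything below the line y = 0.5 * x + 1
    (screen Y grows downward)."""
    return y > 0.5 * x + 1


def classify(px, py, inside):
    """Classify a pixel by testing its four corners against the object."""
    corners = [inside(px + dx, py + dy) for dx in (0, 1) for dy in (0, 1)]
    if all(corners):
        return "covered"
    if not any(corners):
        return "exposed"
    return "fragmented"


def coverage_16x(px, py, inside):
    """Fraction of a pixel covered, measured with a 4x4 grid of subsamples."""
    hits = 0
    for sy in range(4):
        for sx in range(4):
            if inside(px + (sx + 0.5) / 4, py + (sy + 0.5) / 4):
                hits += 1
    return hits / 16


def render(width, height, inside, foreground=255, background=0):
    """Write covered and exposed pixels directly; send only fragmented
    pixels through the fragment buffer for 16-sample evaluation."""
    frame = [[background] * width for _ in range(height)]
    fragment_buffer = {}
    for py in range(height):
        for px in range(width):
            kind = classify(px, py, inside)
            if kind == "covered":
                frame[py][px] = foreground
            elif kind == "fragmented":
                fragment_buffer[(px, py)] = coverage_16x(px, py, inside)
    # Merge the re-calculated edge pixels back into the stored frame.
    for (px, py), coverage in fragment_buffer.items():
        frame[py][px] = round(background + (foreground - background) * coverage)
    return frame, fragment_buffer


frame, fragment_buffer = render(8, 8, object_edge)
share = len(fragment_buffer) / (8 * 8)
print(f"{len(fragment_buffer)} of 64 pixels needed fragment processing ({share:.0%})")
```

Only the pixels along the edge receive the expensive 16-sample treatment; everything else is written once. In a real scene with large triangles, the share of fragmented pixels would be much lower than in this tiny 8x8 example.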
Matrox claims that this process allows for an image quality that corresponds to 16x FSAA quality, while minimizing the loss in performance at the same time. However, there are cases where fragment anti-aliasing doesn’t work, for example, in scenes where the stencil buffer is used. Here, the user can only resort to very slow supersampling, even if the high memory bandwidth of 256-bit DDR is supposed to mitigate this undesirable effect somewhat. Unfortunately, Matrox does not offer a list of compatible games at the moment. Stencil buffers are often used to calculate shadows in games.
It’s also interesting to compare quality. For example, Parhelia’s fragment anti-aliasing (FAA) does not recognize lines that are rendered in the form of textures. Supersampling and multisampling, by comparison, smooth out these edges as well. On the other hand, role-playing games benefit from FAA, since menus aren’t overly softened as they are with multisampling FSAA and supersampling FSAA.
Surround Gaming
With this feature, Matrox offers well-to-do gamers an entirely new gaming experience where one perspective is rendered across three monitors.
Here, one’s perception is extended through a sort of panorama effect. Surround gaming is already available with many games today, for example, all games based on the Q3 engine (RTCW, Jedi Knight, Voyager, Soldier of Fortune II, etc.). Even Microsoft’s Flight Simulator can be played in this mode. Basically, any game that has variable FOV (Field of View) settings should be adaptable to surround gaming.
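How far the field of view has to be widened for three monitors can be estimated with a little trigonometry. The Python sketch below assumes that the vertical field of view stays constant while the horizontal one grows with the aspect ratio; this is a common convention, not something Matrox has specified, and the function name is our own.

```python
import math


def widen_fov(base_fov_deg, base_aspect, new_aspect):
    """Horizontal FOV for a wider display, keeping the vertical FOV fixed."""
    half_angle = math.radians(base_fov_deg) / 2
    scaled = math.tan(half_angle) * new_aspect / base_aspect
    return math.degrees(2 * math.atan(scaled))


# One 1280x1024 monitor with a 90 degree FOV, extended to 3840x1024.
single = 1280 / 1024
triple = 3840 / 1024
print(f"Triple-monitor FOV: {widen_fov(90, single, triple):.1f} degrees")
```

With these assumptions, a game set to 90 degrees on a single monitor would need a horizontal FOV of roughly 143 degrees to fill all three screens without stretching the image.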
In any case, Matrox is convinced that surround gaming will give well-off customers the ultimate gaming experience. According to Matrox, many users already work with two monitors, so purchasing a third monitor wouldn’t make that big a difference. After all, in the days of 3dfx Voodoo2, lots of gaming freaks went out and bought a second card for SLI operation.
There’s no question that surround gaming is a special experience, but not many users will be able to afford it, or be willing to spend the money on it.
2D and Board Features
The exact structure of the new Matrox graphics cards is still unknown. What’s sure is that Matrox will provide the cards with a third Ramdac integrated on the board. The photo below shows how a special adapter allows up to three monitors to be connected to the card.
Matrox calls this mode “Triple Head Desktop.” Dual-head mode, traditional with Matrox, is also possible. The maximum analog resolution in dual-head mode is 2048×1536@32bpp. With digital displays, resolution is up to 1920×1200@32bpp. The maximum resolution in triple head mode is 3840×1024@32bpp.
Ultra-Sharp Display
Matrox provides Parhelia boards with new circuits that, along with the Ramdacs and the high-quality filters, are supposed to set new standards in VGA signal quality.
Blurriness at high resolutions and the ghosting effects that appear with a few of the GeForce 4 Ti cards should not occur with the new Matrox cards. Even the TV-out is supposed to benefit from these circuits.
In its own whitepaper, Matrox makes direct comparisons to a GeForce 4 (PNY Verto GF 4600) and a Radeon 8500. However, because we’re unable to verify the measurements at this time, we will not take benchmark results into consideration here.
10-bit GigaColor Technology
With regard to 24-bit True Color display, the latest graphics cards limit themselves to 16,777,216 simultaneously displayed colors (256³). For each of the red, green and blue values in the VGA signal, 8 bits are available. A green tone can therefore be displayed in 256 different intensities. By comparison, Parhelia is capable of displaying each channel in 10-bit, which means that 1,073,741,824 colors can be displayed simultaneously. This is 64 times more than the standard cards. Matrox calls this mode “GigaColor”.
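The color counts above are straightforward powers of two, as this short Python check shows. The function name is ours and serves only as an illustration.

```python
def displayable_colors(bits_per_channel, channels=3):
    """Number of distinct colors with the given precision per channel."""
    return (2 ** bits_per_channel) ** channels


true_color = displayable_colors(8)    # 256 levels each for red, green, blue
giga_color = displayable_colors(10)   # 1024 levels per channel

print(f"8-bit per channel:  {true_color:,} colors")
print(f"10-bit per channel: {giga_color:,} colors")
print(f"Ratio: {giga_color // true_color}x")
```

Each channel gains two bits, so every channel has four times as many levels, and three channels together give 4 × 4 × 4 = 64 times as many colors.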
Anyone who thinks their own graphics card can do the same, merely because the desktop settings offer a 32-bit color depth as well as 24-bit, is gravely mistaken. The additional 8 bits are not used for color display, but can contain alpha information instead.
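The difference becomes clearer when you look at how a pixel is packed into 32 bits. The Python sketch below shows the usual 8-8-8-8 layout next to a 10-10-10-2 layout. Note that the 10-10-10-2 layout, with only two bits left for alpha, is our assumption about how 10 bits per channel could fit into a 32-bit pixel; it is not taken from Matrox’s documentation, and the function names are hypothetical.

```python
def pack_argb8888(alpha, red, green, blue):
    """Conventional 32-bit pixel: 8 bits each for alpha, red, green, blue."""
    for value in (alpha, red, green, blue):
        assert 0 <= value <= 255
    return (alpha << 24) | (red << 16) | (green << 8) | blue


def pack_a2rgb10(alpha, red, green, blue):
    """Assumed 10-bit layout: 2 bits of alpha, 10 bits per color channel."""
    assert 0 <= alpha <= 3
    for value in (red, green, blue):
        assert 0 <= value <= 1023
    return (alpha << 30) | (red << 20) | (green << 10) | blue


# Both layouts use exactly 32 bits; only the split between alpha and
# color precision differs.
print(hex(pack_argb8888(255, 0, 255, 0)))   # full-intensity green, 8-bit
print(hex(pack_a2rgb10(3, 0, 1023, 0)))     # full-intensity green, 10-bit
```

In both cases the pixel occupies 32 bits of memory, which is why a “32-bit” desktop setting by itself says nothing about how many bits are actually spent on color.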
This feature is particularly useful for graphic designers. Thanks to a special Matrox plug-in for Adobe Photoshop, images with more than 8 bits of color information per channel (for instance, from high-quality scanners or digital cameras) can be read with 10 bits per channel. Some games can also be displayed with better color depth already, and without loss in performance. In the future, the GigaColor option should also be included directly in games. For this purpose, Parhelia allows 10-bit source textures to be processed.
DVD playback profits from GigaColor as well. Video filtering and scaling take place in the higher color depth, and even the TV-out should benefit from this.
Unfortunately, these claims cannot be verified, due to the lack of samples. The usefulness of this for normal users is questionable. In any case, the fact is that high-end workstations have already been working in higher color depth for a long time. Some professionals and enthusiasts are therefore certain to be pleased by GigaColor.
Glyph Anti-Aliasing for Fonts
Windows 2000 and XP allow fonts to be smoothed, but, depending on the application, this happens at the cost of 2D performance – up to 30%. Even though 2D performance is generally more than sufficient these days, a few manufacturers are offering hardware-driven anti-aliasing for fonts with their graphics cards.
These solutions, however, do not take gamma calculations into consideration. Parhelia relies fully on the hardware to take care of anti-aliased fonts, so there’s no loss of performance, and variable gamma calculation is still enabled.
Drivers & Co
Matrox announces support for the standard versions of Microsoft Windows, but what they really mean is that Windows 2000 and XP will be supported first. Support for Windows 9x/Me will be introduced later, as well as support for Linux.
The Matrox PowerDesk lets you set the various 2D and 3D features for the Parhelia-512 chips.
Conclusion
For the moment, Matrox can claim the crown for technological leadership, but the situation is sure to change soon. In the next few weeks, new product launches from NVIDIA and ATI are already expected, which should put Matrox’s advantages in a different light.
We are pleased to see this new competitor, but our enthusiasm is somewhat dampened by the fact that up to now, working samples of the hardware have only been available for viewing at individual presentations. There’s no card in sight, much less final specifications, names for the cards, prices and benchmarks. These won’t be available until next month. Until then, the press is supposed to content itself with an extensive collection of detailed whitepapers. Of course, none of the claims made in these documents can be confirmed, due to the lack of hardware. Therefore, we strongly recommend a healthy dose of skepticism when you look at the new features, because otherwise, you’ll hardly find any weaknesses or disadvantages in the Parhelia-512.
Similar to ATI with its R8500 last year, Matrox wants to bask in the limelight first, tempting consumers with attractive new technologies. Real hardware tests, which could possibly reveal weaknesses, would only put a damper on the party.
With their bold concept of targeting 2D professionals as well as gaming enthusiasts with the same product, Matrox is clearly going for a niche market. It remains to be seen if this strategy works, because in the past, Matrox lost the loyalty of many a gamer with the poor 3D performance offered by the G450 and G550 cards. Another aspect that’s not to their advantage is that the pixel shaders are limited to v1.3, or DirectX 8.1. In addition, in the area of graphics cards, most of the money to be made is in the OEM sector, and this is already firmly in the hands of Radeon 7500/7000 and GeForce 2 Ti/GeForce 4 MX. Matrox has not planned anything for this market segment.
One can only speculate as to why only whitepapers have been released and why test samples of the cards are being held back. Is this an attempt to get an early grip on the market, before the new ATI and NVIDIA chips grab up all the attention? Or are there still too many problems with the drivers and hardware? At the moment, Matrox is hiding the answers behind rosy marketing statements. We can only give you the real answers in June, when the first test samples are made available. The question of whether fragment anti-aliasing is worth it in practice also cannot be evaluated in any detail until that time.
We can only hope that in reality, everything is as rosy as Matrox claims it to be, because competition in the market has never done any harm. Welcome back, Matrox!