Introduction
Today (September 15, 1998) was the official start of Intel’s Developer Forum in Palm Springs, California. Craig Barrett opened it with keynote #1, giving us his view of the future of computing, followed by Albert Yu’s keynote about future Intel CPUs. Please follow the links for more information about those keynotes.
The most interesting announcement for me came right after those two speakers, when Ticky Thakkar gave the first details about KNI (Katmai New Instructions) and other enhancements that the upcoming Katmai/Tanner architecture will bring us.
You certainly know already about the 70 new instructions of the upcoming Katmai processor, which are mainly designed to enhance 3D gaming. Those new instructions offer SIMD (Single Instruction Multiple Data) operations on single precision floating point values, one of the most important things for 3D game computing. AMD has already realized this idea with its 3DNow! technology, but today’s announcement made clear that Intel takes a different and more sophisticated approach, which differs significantly from AMD’s way of implementing SIMD-FP. There’s also a bit more to the Katmai architecture than just SIMD-FP, as you will see in the following list:
Katmai/Tanner Architecture
1. SIMD-FP
- Introduction of 8 new CPU registers, each 128 bit = 4 x 32 bit wide and holding packed single precision values. This enables the computation of 4 single precision FP variables at the same time, which amounts to up to 2 GFLOPS of peak floating point performance. It is not yet clear if Katmai will have one or two SIMD-FP pipelines. It is possible that the peak performance of two SIMD-FP pipelines will add up to an overall result of up to 4 GFLOPS.
AMD’s K6-2 is equipped with 8 registers of only 64 bit = 2 x 32 bit.
- Introduction of a new separate processor state or mode to take advantage of those registers, which is the first new Intel processor mode since the 386 mode that was introduced more than 10 years ago.
This new processor mode will require an extension to the operating system. Intel already has a patch for Windows 98 available, and Windows NT 5 will support this new mode by default.
- The new processor state will enable concurrent use of either SIMD-FP and MMX or SIMD-FP and IA-FP double precision floating point code. You may remember that it was and still is impossible to use MMX and normal IA-FP at the same time, since both are using the same registers. This problem does not occur with Katmai’s new SIMD-FP.
The K6-2 is not able to use its 3DNow! and normal double precision FP unit in parallel.
- Introduction of new load/store, basic arithmetic, square root, logic comparison, negation, masking, ‘swizzle’ and conversion instructions on those new SIMD-FP registers, known as ‘KNI’. It is not yet known how many of the 70 new instructions will be used for the SIMD-FP. The different SIMD-FP instructions and specific algorithms will determine which operation will be faster on a Katmai or a K6-2.
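To make the idea of a packed 128 bit register a bit more concrete, here is a minimal sketch in plain C. It only models the concept of four single precision values being processed as one unit. It is not KNI code, since Intel has not published the instruction syntax yet, and the names `packed4`, `packed4_add` and `add_arrays` are made up for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of one 128 bit SIMD-FP register:
   four 32 bit single precision values side by side. */
typedef struct {
    float lane[4];
} packed4;

/* Packed add: conceptually ONE instruction that adds all four lanes.
   The loop here only emulates that in portable C. */
packed4 packed4_add(packed4 a, packed4 b)
{
    packed4 r;
    for (size_t i = 0; i < 4; i++) {
        r.lane[i] = a.lane[i] + b.lane[i];
    }
    return r;
}

/* Add two float arrays four elements per step, the way a packed
   load / add / store sequence would walk through them.
   Assumption for simplicity: n is a multiple of 4. */
void add_arrays(const float *a, const float *b, float *out, size_t n)
{
    assert(n % 4 == 0);
    for (size_t i = 0; i < n; i += 4) {
        packed4 x, y, sum;
        for (size_t k = 0; k < 4; k++) {   /* emulated packed load */
            x.lane[k] = a[i + k];
            y.lane[k] = b[i + k];
        }
        sum = packed4_add(x, y);           /* one packed operation */
        for (size_t k = 0; k < 4; k++) {   /* emulated packed store */
            out[i + k] = sum.lane[k];
        }
    }
}
```

With a 64 bit register as in 3DNow!, the same model would carry only two lanes, so the outer loop would need twice as many steps for the same array.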
Those features will improve 3D gaming significantly, but processing of audio (e.g. speech recognition, surround sound, AC3), physical models and imaging are also supposed to benefit from KNI by a large amount. Andreas Stiller, the famous CPU expert of c’t-Magazine, disagrees with the estimation of a well known 3D analyst of Micro Design Resources, who thinks that Katmai’s architecture is pretty much the same as AMD’s 3DNow!. Andreas Stiller agrees with me that the new Katmai SIMD-FP architecture differs significantly from AMD’s SIMD-FP implementation. Neither the new processor mode nor the 8 additional 128 bit registers are found in 3DNow!.
AMD is using 8 64 bit registers and can thus only do two single precision FP operations per pipeline at the same time, whereas Katmai can do four. The K6-2 has two pipelines, so Katmai will only have a x2 advantage if it’s equipped with two SIMD-FP pipelines as well. We should still not forget, however, that Katmai’s underlying SIMD-FP unit is based on the architecture of the normal P6 FP unit, which is a lot faster than the normal FP unit of the K6. Hence it wouldn’t be surprising if Katmai’s SIMD-FP instructions turn out to be faster than the ones of the K6-2. We can expect that AMD will include Katmai’s SIMD-FP in the upcoming K7 processor.
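The peak numbers above are simple arithmetic: operations per clock are lanes per register times the number of pipelines, and the peak rate is that figure times the clock speed. The sketch below spells this out. Please note that the 500 MHz clock is my assumption, back-calculated from the quoted 2 GFLOPS with four lanes and one pipeline. Intel did not give a clock speed, and the function names are invented for this example.

```c
#include <assert.h>

/* Single precision operations finished per clock cycle:
   lanes per register times number of SIMD-FP pipelines. */
int flops_per_clock(int lanes, int pipelines)
{
    return lanes * pipelines;
}

/* Peak rate in GFLOPS = clock (MHz) * operations per clock / 1000.
   This is a theoretical ceiling, not a benchmark result. */
double peak_gflops(double clock_mhz, int lanes, int pipelines)
{
    assert(clock_mhz > 0.0 && lanes > 0 && pipelines > 0);
    return clock_mhz * flops_per_clock(lanes, pipelines) / 1000.0;
}
```

Under that assumption, `peak_gflops(500.0, 4, 1)` gives 2.0 and `peak_gflops(500.0, 4, 2)` gives 4.0. Per clock, a one-pipeline Katmai (4 x 1) would only match the K6-2 (2 x 2), while a two-pipeline Katmai (4 x 2) would double it.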
Katmai/Tanner Architecture, Continued
2. Introduction of new Instructions for MMX
Intel added several new instructions to the MMX instruction set that will improve conditional flow situations.
3. New Memory Streaming Architecture
This new feature is supposed to enhance the P6 bus, by ‘hiding’ memory latency effects on CPU performance, via a new prefetch mechanism. It can generate multiple outstanding (pipelined) requests without stalling the CPU pipeline, and it avoids cache pollution for write-once data in combination with the well known P6 write combining feature. This can improve MPEG2 encoding and decoding as well as operating systems, server and workstation applications. Intel expects performance improvements of 5 to 20% due to the new memory-streaming feature.
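The idea behind prefetching can be sketched in a few lines of C: while the CPU works on the current data, it already requests data it will need a bit later, so the memory latency overlaps with useful work. The example uses `__builtin_prefetch`, which is a GCC/Clang compiler builtin and not one of Katmai’s new instructions. The look-ahead distance of 16 elements is an arbitrary assumption, and the write-combining side of the feature is not shown here.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed look-ahead in elements; the best value depends on
   the CPU, the bus and the memory and has to be measured. */
#define PREFETCH_DISTANCE 16

/* Sum an array while hinting that data further ahead will be
   needed soon. The prefetch is only a hint: it never changes
   the result, it can only change how long the loop takes. */
float sum_with_prefetch(const float *data, size_t n)
{
    assert(data != NULL || n == 0);
    float total = 0.0f;
    for (size_t i = 0; i < n; i++) {
        if (i + PREFETCH_DISTANCE < n) {
            /* Arguments: address, 0 = prefetch for reading,
               0 = data is used once and need not stay in cache. */
            __builtin_prefetch(&data[i + PREFETCH_DISTANCE], 0, 0);
        }
        total += data[i];
    }
    return total;
}
```

Asking for data that is only used once to stay out of the cache is the same motivation as the ‘cache pollution’ point above: streaming data should not evict data that will be reused.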
I guess we can expect a dramatic improvement in 3D gaming performance once Katmai has been introduced, at least similar to what we saw when the K6 was enhanced with 3DNow!. 3D game developers told me a long time ago how much faster they think Katmai is compared to the K6-2.
AGP Texturing Becoming More Important
This brings me right to the next interesting topic. It seems as if the star of the AGP-incapable 3Dfx products Voodoo2 and Voodoo Banshee is due to fall within the next 3-6 months. AGP texturing is finally becoming really important, and the first applications that use a large amount of high resolution textures will hit the shelves pretty soon.
Gabe Newell, Managing Director of Valve L.L.C., the game developer that will bring us the highly impressive 3D action game and Quake Arena competitor ‘Half-Life’, gave his opinion on future 3D texture demands. He made clear that the biggest problem with 3D accelerators is texturing, whilst he is satisfied with the polygon and fill rates of the latest 3D accelerators. Half-Life will use about 40 MB of textures, even though the amount of textures, and thus the level of detail, has already been reduced so that the game can run on cards with 3Dfx chipsets. It is certainly not in our interest that the visual detail of a 3D game is reduced only because 3Dfx can’t supply an AGP implementation. S3’s high texture Q2 level ‘newS3’ and the accompanying demo ‘mon2.dm2’ have already shown that AGP can indeed become an issue, and NVIDIA’s RIVA TNT is the current leader in AGP performance. Half-Life has the potential to become a very successful 3D game, and it wouldn’t surprise me if Quake Arena also used a much larger amount of high resolution textures than Quake 2 does right now. This could have a serious impact on the sales of the current 3Dfx chipsets, which lack any kind of AGP 1x or 2x support.
Albert Yu on Intel’s Multiplier Locking Issue
I had the chance to speak to Intel’s Senior Vice President and head of the microprocessor products group, Albert Yu. He told me again that the multiplier locking used on the old and new Celeron processors is a way of fighting CPU remarking. Intel thinks that the number of people who overclock is so small that it doesn’t see any reason to care about them. I am not quite sure how large the number of overclockers really is, but I would guess that it’s a lot larger than what Intel expects. Thus I am asking for the support of all websites that are in any way associated with overclocking, to inform Intel in a nice and polite way about how many overclockers are really out there. It is important to show Intel that there is a large group of overclockers who should not be ignored. However, this has to be done in a decent way, proving that overclockers aren’t some mentally deranged idiots who haven’t got anything better to do than putting their systems at risk. I will try to arrange some kind of petition as soon as I am back home from IDF. We all know that there are other ways to prevent remarking without restricting the chance for overclocking.
More to come…
Intel’s Developer Forum runs from September 15-17, 1998 in Palm Springs, California, USA. Each day starts with a keynote presented by a high-ranking Intel representative, closely followed by 9 different parallel teaching tracks that you can choose from. Intel invited only a fairly small number of press people, as the IDF is mainly held for ISVs and IHVs as well as OEMs. The IDF was arranged to torture the attendees by providing an all day view of the swimming pool, the one and only place where none of the training sessions is taking place. For all the people who don’t know Palm Springs… it’s HOT here.