Introduction
1998 is the year of Intel processor announcements. After the Pentium II with 100 MHz FSB and the Intel Celeron CPU, June 29 was the day the Pentium II Xeon was announced, Intel’s new high end solution for the workstation and server market. It is now more than a year since Intel started replacing the Pentium Pro processor with the Pentium II CPU. The main differences between the two were the speed and location of the second level cache as well as the package. The Pentium Pro was never released in a version faster than 200 MHz, simply because its special L2 cache was not really designed for higher speeds. In the Pentium Pro the L2 cache was a separate silicon chip, placed in the same package as the CPU core. This L2 cache ran at CPU clock speed, so its 200 MHz limit capped the CPU core speed at 200 MHz as well. The Pentium II, however, was the first Intel CPU delivered in a cartridge, hosting a printed circuit board with the CPU core as well as two or four L2 cache chips. The L2 cache in the Pentium II runs at only half the CPU clock speed, so a 200 MHz L2 cache speed was reached only recently, when the Pentium II 400 was released earlier this year.
Whilst the Pentium Pro was replaced by the Pentium II in the high end desktop market pretty quickly, the server market still required the Pentium Pro. There are a few reasons for that. The main problem with using a Pentium II in a server system was its initial restriction to 512 MB of main memory. Until quite some time after the release of the Pentium II 333 at the beginning of this year, Pentium II CPUs would crash as soon as you supplied them with more than 512 MB RAM. Only now can the latest 333 MHz versions and the versions at 350 and 400 MHz run with as much memory as the Pentium Pro can use, 4 GB. Powerful servers can easily exceed 512 MB, and this barrier was one of the big complaints the server industry had about the Pentium II. The other reason why the Pentium Pro was still very attractive for servers was its second level cache. In servers which run at a very high load, the L2 cache that’s dedicated to each of Intel’s sixth generation CPUs has a major impact on performance. A larger L2 cache as well as a faster L2 cache are of great benefit here. The Pentium Pro runs the L2 cache at core clock speed, thus faster than any Pentium II up to 350 MHz, and the Pentium Pro was available with 512 kB as well as 1 MB L2 cache, whilst the Pentium II is only available with 512 kB L2 cache. Last but not least, the Pentium Pro could run in dual as well as quad CPU configurations, which high end servers may well require, whilst the Pentium II can only run in dual mode; quad or higher is not possible. Now you may understand why the Pentium Pro processor was still on Intel’s road map until now, since it was the only feasible high end server solution they provided.
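The cache speed comparison above is simple arithmetic, and a few lines of Python make it concrete. This is only an illustrative sketch: the function name and constants are made up for the example, and the ratios (full speed for the Pentium Pro, half speed for the Pentium II) are the ones quoted in the text.

```python
# Illustrative sketch: L2 cache clock derived from core clock and cache ratio.
# The ratios are taken from the article; the names are hypothetical.

PENTIUM_PRO_RATIO = 1.0  # L2 cache runs at core clock
PENTIUM_II_RATIO = 0.5   # L2 cache runs at half the core clock


def l2_clock_mhz(core_mhz: float, ratio: float) -> float:
    """Return the L2 cache clock for a given core clock and cache ratio."""
    return core_mhz * ratio


ppro_l2 = l2_clock_mhz(200, PENTIUM_PRO_RATIO)

for core in (333, 350, 400):
    pii_l2 = l2_clock_mhz(core, PENTIUM_II_RATIO)
    verdict = "slower than" if pii_l2 < ppro_l2 else "on par with"
    print(f"Pentium II {core}: L2 at {pii_l2:.1f} MHz, "
          f"{verdict} the Pentium Pro 200 L2 at {ppro_l2:.0f} MHz")
```

Only the Pentium II 400 reaches the 200 MHz L2 clock that the Pentium Pro 200 had all along.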
Now things are going to change, and Intel plans to change the situation not only in the server market place, but also in the workstation area. Workstations are required by people and companies who do computer aided design of e.g. cars, houses or bridges, and by 3D modelers such as 3D game developers. The computer animations in movies require high end workstation performance too, so e.g. ILM is using high end workstations. Then there are a lot of companies which do evaluations of huge data bases, many of them in the financial field, e.g. at Wall Street. Those companies require workstation performance, and so far it was common practice to go with one of the proprietary solutions from Sun, DEC, SGI or Hewlett-Packard. The workstation community is a very conservative one, so once a company had bought a product from one of those vendors, it normally stayed with that kind of system. Now we are talking big bucks here. Workstation prices are not comparable with those of even high end desktop systems. $20,000 up to $60,000 and more are completely common. Add another $10,000 to more than $100,000 for the software and you know what I am talking about.
Intel’s Pentium II Xeon processor is supposed to attack this segment. The Xeon is supposed to deliver workstation performance at a much lower price. You can run x86 workstation software and you can even run your normal office software on the same system too. Thus Intel wants to get big time into the server as well as the workstation market, and the Xeon is supposed to make exactly that possible.
The Architecture of the Pentium II Xeon
To put it simply, I could say that the new architecture of Xeon is not that much of a big deal. The CPU core is still the well known ‘Deschutes’ core, used in the Pentium II as well as in the Celeron processor. The big trick is the Xeon’s L2 cache. Whilst Celeron has to live completely without L2 cache until September and whilst the Pentium II offers only 512 kB L2 cache running at half the core clock speed, the Xeon is supplied with 512 kB, 1 MB and soon 2 MB L2 cache running at the same clock as the Deschutes core. It sounds simple, but it took a lot of work to design L2 cache chips which would run at 400 MHz and above. Intel is producing these chips itself, unlike the L2 cache chips found in the Pentium II. The cache chips are called ‘CSRAM’ for ‘custom’ static RAM, where custom only means that Intel cannot buy it from an SRAM manufacturer. The chips come in sizes of 512 kB each, so that a Xeon with 512 kB includes one, the 1 MB version two, and the 2 MB version will include four of those chips. These L2 cache chips are in a package that looks identical to the Pentium II core package, which should show you how thermally sensitive those chips are. Cooling is required for the L2 cache chips, which is why the cartridge of the Pentium II Xeon became so huge.
A comparison between the Pentium II and the Pentium II Xeon shows how huge the Xeon cartridge really is.
So the faster L2 cache is the main reason why the Pentium II Xeon will outperform the Pentium II under certain conditions. However, there are quite a few more special things about Xeon. First of all it brings back all the features of the good old Pentium Pro, including support for quad CPU systems and even 8 CPUs in one system in combination with the 450NX and a special cluster controller. The cacheable memory limit is not just 4 GB; the Xeon can address and cache up to 64 GB of memory by using a 36-bit memory address bus and the new PSE36 mode. The Xeon is of course running at 100 MHz front side bus, enabling a higher memory bandwidth of up to 800 MB/s peak. ECC, error checking and correction, is definitely required in servers and often in workstations too, so the Xeon offers ECC for main memory via the 440GX and 450NX chipsets as well as using ECC L2 cache RAM. For manageability the Xeon has a thermal sensor, responsible for keeping it at a safe temperature, as already known from the Pentium Pro. Completely new and very pleasing is the PIROM in the Xeon, standing for ‘Processor Information ROM’, which includes “robust addressing headers to allow for flexible programming and forward compatibility, core and L2 cache electrical specifications, processor part and S-spec numbers, and a unique electronic signature”, so that counterfeiting of the Xeon is hopefully impossible now, though this may make overclocking impossible too. However, overclocking is light years from common practice in servers or workstations anyhow. System vendors can program a special EEPROM chip within the Xeon cartridge as well, called the ‘scratch EEPROM’. This EEPROM is supposed to host information like “system specifications, inventory and service tracking, installation defaults, environment monitoring, and usage data. Its contents can be write-protected by the system, as well.”
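The 4 GB, 64 GB and 800 MB/s figures follow directly from the address and bus widths. The sketch below reproduces them; note that the 64-bit data bus width and the nominal 66 MHz clock of the older front side bus are assumptions added for the example and are not stated in the text, and the function names are hypothetical.

```python
# Illustrative sketch of the address space and bus bandwidth arithmetic.
# Assumptions not stated in the article: a 64-bit wide data bus and a
# nominal 66 MHz clock for the older front side bus.

GIB = 2 ** 30


def addressable_bytes(address_bits: int) -> int:
    """Number of bytes reachable with the given number of address bits."""
    return 2 ** address_bits


def peak_bandwidth_mb_s(bus_mhz: float, bus_width_bits: int) -> float:
    """Theoretical peak: one transfer per clock, bus_width_bits / 8 bytes each."""
    return bus_mhz * bus_width_bits / 8


print(addressable_bytes(32) // GIB)   # 4  (GB with 32 address bits)
print(addressable_bytes(36) // GIB)   # 64 (GB with 36 address bits)
print(peak_bandwidth_mb_s(100, 64))   # 800.0 (MB/s at 100 MHz FSB)
print(peak_bandwidth_mb_s(66, 64))    # 528.0 (MB/s at nominal 66 MHz FSB)
```

These are theoretical peaks; sustained memory throughput in a real system is lower.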
Having a look inside the Xeon w/512 kB L2 cache can show you the different components:
At the top you can see the ‘CSRAM’ L2 cache chip, the bottom shows the well known Deschutes CPU core. The two chips on the right are the thermal sensor and the ‘scratch EEPROM’ underneath.
The little chip on the left is the ‘PIROM’ chip, containing information about the Xeon chip.
Bugs of the Pentium II Xeon
Prior to the release on Monday June 29 there was a lot of press about potential system failures of Pentium II Xeon quad servers based on the 450NX chipset. Intel has so far reported 37 ‘errata’ in the latest specification update. Two of the three latest bugs are already fixed with a new microcode update, the code that’s loaded into a small ‘programming’ area on the Xeon chip, known from the Pentium Pro and Pentium II processor. You should still have a look at the specification update to see what kind of errata were found so far.
The New Chipsets for the Pentium II Xeon
Intel did not only release the Xeon CPU, but had to provide two new chipsets to go with it.
The 440GX AGPset is targeted at the workstation market segment. It is not too different from the 440BX chipset, supporting 2 GB of memory instead of only 1 GB as in the case of the 440BX. The 440GX will be compatible with the Pentium II Xeon as well as with the Pentium II.
The 450NX PCIset is for Xeon servers. A server doesn’t require AGP, so that this chipset doesn’t include an advanced graphics port. It supports up to 8 GB of RAM, using Address Bit Permuting and four way interleaving for a high bandwidth. One of the really special features is that it’s offering either four 32 bit PCI buses, two 64 bit PCI buses or one 64 bit and two 32 bit PCI buses. The chipset can run in quad CPU configuration on its own, with a special cluster controller it can host even 8 Xeon CPUs.
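To see what the three PCI layouts have in common, here is a rough sketch that adds up the bus widths and theoretical peak bandwidth of each option. The 33 MHz PCI clock is an assumption (the text does not state the clock), and the names are made up for the example.

```python
# Illustrative sketch: the three 450NX PCI bus layouts named in the text.
# Assumption: every bus runs at the standard 33 MHz PCI clock.

PCI_CLOCK_MHZ = 33  # assumed, not stated in the article

CONFIGS = {
    "four 32-bit buses": [32, 32, 32, 32],
    "two 64-bit buses": [64, 64],
    "one 64-bit and two 32-bit buses": [64, 32, 32],
}


def peak_mb_s(width_bits: int, clock_mhz: float = PCI_CLOCK_MHZ) -> float:
    """Theoretical peak bandwidth of a single PCI bus in MB/s."""
    return width_bits * clock_mhz / 8


def summarize(widths: list) -> dict:
    """Bus count, total width and aggregate theoretical peak for one layout."""
    return {
        "buses": len(widths),
        "total_width_bits": sum(widths),
        "aggregate_peak_mb_s": sum(peak_mb_s(w) for w in widths),
    }


for name, widths in CONFIGS.items():
    print(name, summarize(widths))
```

Under that assumption all three layouts add up to the same 128 bits of total bus width, so the choice is between more independent buses and wider ones, not between more or less aggregate bandwidth.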
Pentium II Xeon Motherboards
The only motherboard manufacturer shipping Xeon motherboards right now is Intel. I ran my tests on Intel’s ‘Marlinspike’ MS440GX workstation dual Xeon motherboard, based on the 440GX chipset. For servers Intel is offering two bare bone solutions, including the chassis as well as the motherboard. You can choose between the AD450NX quad CPU killer server platform or the more modest SC450NX quad CPU server platform, both based on the 450NX chipset.
Performance Expectations, a Word on Multi-Processing and Multi-Threading
What shall we expect from a Pentium II Xeon? Will it blow away all that we knew so far?
The answer is yes and no. In the first place it’s got to be a ‘no’, since the Pentium II Xeon won’t blow away the Pentium II at all on the usual platforms where the Pentium II has been used so far. The CPU core of the Xeon is identical to the core of the Pentium II as well as the Celeron, so any difference comes only from the different L2 cache. In single processing and single threaded applications the performance gain of a full speed L2 cache over the half speed L2 cache of the Pentium II is minimal. Thus we will hardly see any difference in ANY application under Windows 95 or Windows 98. These two toy OSes are a simple ‘no-go’ for the Pentium II Xeon, because neither of them supports multi-processing, so a second CPU simply sits unused. It requires a multi-processing 32 bit OS like Windows NT or UNIX to take the slightest advantage of Xeon. The full speed L2 cache can show its muscles only when both CPUs are working hard, so even in a multi-processing or multi-threading environment where the CPUs are idle for 30% or half of the time, the difference between Pentium II and Pentium II Xeon will be very small.
That is why it takes well programmed multi-threaded workstation software to show the benefit of the Xeon in workstation environments. The performance gain through the second CPU can be between 30 and 95%, but this ‘can be’ is valid for both the Xeon and the Pentium II. Only heavy traffic on the memory bus will give the full speed L2 cache of the Xeon an advantage, so that a dual Xeon system can be between 3 and 25% faster than a dual Pentium II system.
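One way to read the 30 to 95% range is through Amdahl’s law. This is a textbook model brought in for illustration, not something the measurements here are based on, and it ignores bus and cache contention entirely. The sketch below inverts the law to ask what share of a program’s work has to run in parallel to reach a given gain on two CPUs; the function names are hypothetical.

```python
# Illustrative sketch using Amdahl's law (an assumed model, not the
# article's method): speedup = 1 / ((1 - p) + p / n), where p is the
# fraction of the work that runs in parallel and n the number of CPUs.


def amdahl_speedup(parallel_fraction: float, cpus: int) -> float:
    """Speedup over one CPU for a given parallel fraction."""
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / cpus)


def parallel_fraction_for_gain(gain: float, cpus: int = 2) -> float:
    """Invert Amdahl's law: parallel fraction needed for a fractional gain."""
    speedup = 1.0 + gain
    return (1.0 - 1.0 / speedup) / (1.0 - 1.0 / cpus)


for gain in (0.30, 0.95):
    p = parallel_fraction_for_gain(gain)
    print(f"{gain:.0%} gain on 2 CPUs needs about {p:.0%} parallel work")
```

Under this model a 30% gain corresponds to a program that is a bit less than half parallel, while a 95% gain needs nearly all of the work to be spread across both CPUs, which is why only well programmed multi-threaded software gets close to the upper end.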
The story looks a lot different in server systems. Here the Pentium II cannot compete because it’s not up to running in quad configurations anyhow. A heavily loaded server always has full traffic on the memory and I/O buses. Here the Xeon shows its power. It scales almost linearly with the number of CPUs used, and whilst a Pentium II or Pentium Pro server shows a serious drop in TPS (transactions per second), the Xeon can still go ahead and kick some serious butt.
If you want to understand more about multi-processing and multi-threading, please have a look at this excellent paper from Intel.
Benchmark Results
I am very unhappy to let you know that I am still working on a server benchmarking setup and I’m also still waiting for several pieces of multi-threaded benchmarking software. Thus I had to come up with some ideas of my own on how to benchmark the Xeon CPU properly without this software. I will include server as well as workstation software benchmarks in the near future.
The test system consisted of the following components:
- Two Intel Pentium II Xeon CPUs
- Intel Marlinspike MS440GX Slot 2 Motherboard
- 128 MB Advantage Memory Corporation PC100 SDRAM
- Adaptec 2940UW SCSI Host Adapter
- IBM DGVS 09U SCSI HDD
- Diamond Fire GL 4000 PCI professional OpenGL graphics card for NT benchmarks
- Diamond next generation professional pre-release OpenGL AGP graphics card (name and specs under non-disclosure) for NT benchmarks
- Diamond Fire GL 1000 Pro AGP graphics card for Windows 95 benchmarks
- Diamond Monster 3D 2 12 MB add-on 3D Card for Windows 95 Gaming Benchmarks
Winstone is a single threaded benchmark and thus doesn’t show much of a difference between Pentium II and Pentium II Xeon. Viewperf is a single threaded OpenGL benchmark, which shows the same results as well.
Benchmark Results – Continued
The same with the new pre-release Diamond OpenGL power card, blowing away the results of the highly respected Fire GL 4000. The Xeon can show a bit of its strength here, but 1-2 % wouldn’t be any reason for buying it over a Pentium II.
3D Studio Max is a well programmed multi-threaded application, which can benefit tremendously from a dual CPU system. However it doesn’t push the memory bus to the edge, so that the difference between the two CPUs is very little again.
Trying to push the multi-tasking I let Winzip compress a 1.2 GB file with the highest compression whilst running High End Winstone 98. You can see that neither the Pentium II nor the Pentium II Xeon are impressed by the background file compression much. Try this on a single CPU system and you won’t be able to run Winstone at all.
Benchmark Results – Continued
This one is really nasty. Running not only the Winzip file compression but also an AVI-video in the background whilst benchmarking with High End Winstone really pushes the multi-processing abilities of the CPUs to their limit. Here you can see a 16% advantage of Xeon over Pentium II. Xeon shows its superiority when there’s really heavy traffic on the buses.
Running Xeon under Windows 95 is almost an insult, the difference is minimal.
Quake 2 also doesn’t benefit of course.
Conclusion
While trying to find a sensible benchmark that can show the advantage of Xeon over Pentium II I came across a lot of surprises. Running Quake 2 under NT on a dual Pentium II or Xeon system shows that you can run a file compression whilst playing Quake 2 online without ANY performance impact. Viewperf was also completely unimpressed by running in parallel with Winzip or the AVI-video; the scores stayed the same. I am running my own system with a dual PII 400 under NT and to the question why dual I always answer ‘because I can record a CD and play Quake2 at the same time’. It is true, the main reason for a dual CPU system should be the usage of multi-threaded software. However, you can benefit from it even with single threaded software, in case you have the strange habit of running several power hungry applications at the same time. Don’t forget to put enough RAM into the system though!
Unfortunately I cannot offer you any server or workstation benchmarks yet. The setup for a decent server benchmark is dauntingly big and I’m still waiting for the workstation software. I personally look forward to the Xeon benchmarking and evaluation by c’t Magazine that will be available by this weekend. Andreas Stiller is still ‘THE MAN’ when it comes to CPU evaluations. I hope that I can at least put up a link to an article at the c’t website. PC Professional’s gem Kai Schmerer has the chance to use their big server lab, so I’m looking forward to his evaluation as well. For now we have to go with Intel’s own performance evaluations, but from what I see and hear the results are the same as those seen outside of Intel.