The First Palomino: AMD Releases Mobile Athlon 4
Introduction
It has been more than 11 months since AMD launched its first Athlon processor for SocketA, code-named ‘Thunderbird’. Thunderbird’s on-die second-level cache made it possible to move from the SlotA processor cartridge of the first (0.25-micron K7) and second-generation (0.18-micron K7) Athlon processors, which relied on external L2-cache modules, to the smaller and easier-to-handle socket version that we are used to today. The Athlon with the ‘Thunderbird’ core was an immediate success, and it has lost none of its appeal to the majority of people who want the best and most honest performance for their money.
However, Intel has not been asleep in the last 12 months either. While Intel’s Pentium III was finally forced to give up the competition against the better-performing Athlon, November 2000 saw the release of Intel’s new-generation Pentium 4 processor, which was and still is aimed at beating AMD’s bestseller. Right now, Pentium 4 has serious difficulty outperforming AMD’s Athlon in today’s software, but Intel is pushing, trying to win customers with Pentium 4 at high clock speeds and relatively low prices. New software that is optimized for Pentium 4 is already showing the respectable potential of Intel’s current flagship CPU. Times are getting harder for ‘Thunderbird’.
It is about time for AMD to pull a new ace out of its sleeve and stay ahead of Intel as well as it can. The new ‘Palomino’ core is supposed to keep Pentium 4 at arm’s length for the next six months.
The First Palomino – For Notebooks Only!
AMD doesn’t consider the time ripe to release its next Athlon version for desktop computers just yet. It is much more important for AMD to finally claim its share of the constantly growing mobile market segment. Previous mobile AMD processors based on the K6 architecture were only good for ‘value’ notebooks, because they could match neither the performance nor the low power consumption of Intel’s mobile processors. Only recently did AMD start shipping mobile Duron processors, which are based on the same core as AMD’s value desktop processors. While those CPUs certainly provide sufficient performance, their power requirements are still rather high, which keeps them out of the prestigious high-end notebooks that are so important for establishing a brand name in the mobile field.
The new ‘Mobile Athlon 4’, based on the ‘Palomino’ core, is supposed to bring AMD the same success in the mobile arena that it has enjoyed in the desktop market for quite some time. The new mobile Palomino has all that it takes. Excellent performance, a reasonable thermal envelope and low power consumption should be enough to guarantee its success. Gone are the days when the vast majority of notebooks were equipped with Intel processors only.
Palomino For Everybody
There is no reason to despair if you are a die-hard desktop user who doesn’t care much for those expensive little mobile computers. In only a few weeks AMD will release the workstation Palominos for SMP operation, and a few months later we’ll finally be blessed with the normal desktop Palomino. Each of those processors will carry the name ‘Athlon 4’, making it easy to differentiate between the ‘Thunderbird’ core (just ‘Athlon’) and the ‘Palomino’ core (‘Athlon 4’). The differences between the Athlon 4 versions will lie mainly in validation; there won’t be any architectural differences. Mobile Athlon 4 needs to be validated for safe operation at low power consumption, Workstation Athlon 4 will have to operate reliably in multi-processor configurations, and Desktop Athlon 4 has the lowest requirements, since it will operate with plenty of power and in single-CPU configuration only. You can imagine that Mobile Athlon 4 will reach the lowest clock speeds, Workstation Athlon 4 will reach higher clocks and Desktop Athlon 4 will be the fastest of the bunch.
Why Athlon 4?
At first, Palomino’s new name ‘Athlon 4’ makes you wonder if you somehow missed Athlon 2 and Athlon 3, but you can imagine that AMD has to have some sensible justification for the jump from ‘Athlon’ to ‘Athlon 4’. Technically, the Palomino core is indeed AMD’s fourth officially released Athlon version.
The first Athlon came with a core manufactured in a 0.25-micron process, which therefore had quite a large die, even though it had to rely on external L2-cache modules.
Athlon No. 2 quietly replaced the 0.25-micron die with a 0.18-micron counterpart in the first half of 2000. It still came in the SlotA cartridge, because it required external L2-cache modules as well.
Thunderbird was AMD’s third Athlon core version. It marked the step from SlotA to SocketA, because it came with integrated second-level cache, while still being manufactured in a 0.18-micron process. ‘Thunderbird’, or ‘Athlon No. 3’, has so far been the most successful AMD processor of all time.
Palomino, or ‘Athlon 4’, has a completely redesigned die, although it certainly doesn’t mark a radical change in terms of features or performance, as you will see. The die size was slightly increased over ‘Thunderbird’, while the shape of the die changed quite a bit.
It is obvious that the above explanation for Palomino’s new name doesn’t exactly tell the whole story. The name ‘Athlon 4’ is clearly aimed at its direct competitor from Intel, the ‘Pentium 4’. The new name makes it much easier for AMD’s PR department to make customers understand that ‘Palomino’ is a product just as valuable as Intel’s flagship processor. It’s not just ‘Pentium 4’ vs. ‘Athlon’, but vs. ‘Athlon FOUR’.
The Technology Behind ‘Athlon 4’
There has been a lot of discussion about AMD’s upcoming ‘Palomino’-core in the past months. The only information concerning the ‘Athlon 4’ design was that it would basically provide a mere reduction of power consumption and thus heat production. Obviously the expectations for improved performance of ‘Palomino’ were rather low. Today AMD finally lifted the curtain and revealed a much more attractive picture of the new ‘Athlon 4’ processor. While ‘Palomino’ won’t be able to leave ‘Thunderbird’ too far behind, it certainly comes with a significant enough performance edge over its predecessor. Let’s have a look at all the new features of ‘Athlon 4’.
20% Less Power Requirement and Heat Production
Athlon 4’s die is now pretty much square and therefore of significantly different shape compared to the previous ‘Thunderbird’ Athlon processors. This alone shows that Athlon 4 is a complete redesign. The number of transistors was increased by half a million, from Thunderbird’s 37 million to Palomino’s 37.5 million. At 128 mm², Palomino’s die is only slightly larger than Thunderbird’s 120 mm². The new chip design reduced the power hunger of Athlon by 20%, with the logical effect of a significant reduction in heat generation as well. While desktop and workstation Palominos will probably run at a core voltage of only 1.5 V, the voltage requirement of the Mobile Athlon 4 released today will be even lower. We will discuss this in more detail further down in the article.
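To put the quoted figures into proportion, here is a small sketch that works out the relative changes. It uses only the transistor counts, die areas and the 20% power claim stated above; the helper name `percent_change` is just an illustration.

```python
def percent_change(old: float, new: float) -> float:
    """Relative change from old to new, in percent."""
    return (new - old) / old * 100.0

# Figures quoted in the article (millions of transistors, die area in mm^2).
THUNDERBIRD = {"transistors_m": 37.0, "die_mm2": 120.0}
PALOMINO = {"transistors_m": 37.5, "die_mm2": 128.0}

transistor_growth = percent_change(THUNDERBIRD["transistors_m"], PALOMINO["transistors_m"])
area_growth = percent_change(THUNDERBIRD["die_mm2"], PALOMINO["die_mm2"])

# Power relative to Thunderbird if AMD's 20% reduction claim holds.
relative_power = 1.0 - 0.20

print(f"transistors: +{transistor_growth:.2f}%")
print(f"die area:    +{area_growth:.2f}%")
print(f"power:       {relative_power:.2f}x Thunderbird")
```

In other words, the die grew noticeably more than the transistor count did, which fits the picture of a re-laid-out chip rather than one with a lot of added logic.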
SSE For Athlon 4 – 3DNow! Professional
In February 1999 Intel launched Pentium III with the ‘Katmai’ core. The only significant improvement over its predecessor ‘Pentium II’ was the addition of 70 new instructions that Intel called ‘SSE’ = ‘Streaming SIMD Extensions’. These extensions were supposed to accelerate multimedia and 3D applications, but SSE also included the ‘streaming’ instructions, among them several data pre-fetch operations.
AMD’s counterpart was first the ‘3DNow!’ of K6-2 and K6-3 and later the ‘Enhanced 3DNow!’ of Athlon. These AMD-specific SIMD instructions proved pretty much as powerful as Intel’s SSE, but AMD continued to have a rather hard time getting software developers to write code with them. Due to Intel’s influence in the business and the big success of Pentium III, SSE support in current software is much more common than 3DNow! support.
18 of the ‘Enhanced 3DNow!’ instructions have long been identical to Intel’s SSE instructions. Now AMD has added the remaining 52 instructions as well as the status bit that is probed by software that wants to know whether the system processor supports SSE. This means that Athlon 4 basically ‘understands’ all SSE code and is therefore able to take advantage of software that was SSE-optimized. The fact that AMD is finally following Intel’s lead can be seen as the first step towards AMD’s future ‘Hammer’ line of processors, which are supposed to come with a full ‘SSE2’ implementation. SSE2 is the set of 144 SIMD instructions, including double-precision floating-point operations, introduced by Intel’s Pentium 4 processor last year.
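The ‘status bit’ mentioned above is one of the standard feature flags that the CPUID instruction (function 1) returns in the EDX register: bit 23 for MMX, bit 25 for SSE and bit 26 for SSE2. The sketch below only decodes such a register value. Reading CPUID itself needs native code, and the two sample values are made up for illustration, not dumps from real processors.

```python
# Bit positions in EDX as returned by CPUID function 1.
MMX_BIT = 23
SSE_BIT = 25
SSE2_BIT = 26

def has_feature(edx: int, bit: int) -> bool:
    """Return True if the given feature bit is set in the EDX value."""
    return bool((edx >> bit) & 1)

def describe(edx: int) -> dict:
    """Decode the three SIMD-related flags from an EDX value."""
    return {
        "mmx": has_feature(edx, MMX_BIT),
        "sse": has_feature(edx, SSE_BIT),
        "sse2": has_feature(edx, SSE2_BIT),
    }

# Hypothetical register values for illustration only.
edx_without_sse = 1 << MMX_BIT
edx_with_sse = (1 << MMX_BIT) | (1 << SSE_BIT)

print(describe(edx_without_sse))
print(describe(edx_with_sse))
```

Software that checks only this flag will take its SSE code path on any processor that sets it, which is why adding the bit matters as much as adding the instructions.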
Don’t expect any major performance advantage from Athlon 4’s SSE capabilities. Firstly, we don’t know how well those instructions were implemented and how many clock cycles they actually require. Secondly, we are still waiting for proper proof that SSE is particularly beneficial at all. I think that AMD is simply willing to finally leave 3DNow! behind and focus purely on implementing Intel’s multimedia extensions.
Data Pre-Fetching Plus TLB-Improvements
What does pre-fetch mean? Well, just imagine you want to fix something on the engine of your car. You open the bonnet and look inside. First you want to remove the cover of the air filter, so you go to your toolbox and fetch a tool for this operation. The next thing you need is a screwdriver. You go back to your toolbox and fetch it. You can see what I am getting at. You are losing a lot of time if you walk all the way to your toolbox for each tool you require. Pre-fetching would save a lot of time. It simply means that you estimate what tools you may actually need and you carry all of them over to your car, so that you can grab them as soon as you need them. It’s messier, but it can save quite a bit of time.
The data pre-fetching done by microprocessors is very similar. A certain algorithm (either in software or in hardware) estimates which data you will need next and ‘pre-fetches’ it while the execution pipeline is processing the current data. Once the next data is required it has already been loaded and can be processed without any delay.
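As a toy illustration of the idea, the sketch below counts how often a processor would have to wait for memory, with and without a simple ‘fetch the next block too’ rule. It is a deliberately simplified model under stated assumptions: the cache never evicts anything, a pre-fetch always arrives in time, and the next-block rule is a generic textbook scheme, not AMD’s actual algorithm.

```python
def count_misses(accesses, prefetch_next=False):
    """Count demand misses for a sequence of memory block numbers.

    The cache is modelled as an unbounded set. With prefetch_next=True,
    every access also loads the following block ahead of time.
    """
    cache = set()
    misses = 0
    for block in accesses:
        if block not in cache:
            misses += 1          # the pipeline has to wait for this block
            cache.add(block)
        if prefetch_next:
            cache.add(block + 1)  # speculatively fetch the neighbour
    return misses

sequential = list(range(16))   # streaming through memory
scattered = [0, 5, 2, 9]       # no neighbouring blocks

print(count_misses(sequential))                      # 16
print(count_misses(sequential, prefetch_next=True))  # 1
print(count_misses(scattered, prefetch_next=True))   # 4
```

The streaming case benefits enormously, while the scattered case gains nothing, which matches the toolbox analogy: carrying tools over in advance only helps if you guessed the right ones.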
Athlon 4 is now able to do auto-pre-fetching as well. Besides the 6 SSE pre-fetch instructions, Athlon 4 is equipped with an automatic hardware data pre-fetch feature like the one found in Pentium 4. Besides the data pre-fetching, AMD made a few improvements to the TLBs of Athlon 4. The L1-cache data TLB was increased from 24 to 32 entries and the L1/L2-cache TLBs were made ‘exclusive’, meaning that the physical address data is now found in only one of the caches and not in both of them. AMD also removed a serializing algorithm of the TLBs that had a bad impact on the performance of very large database software. The data pre-fetch feature plus the TLB changes are the main reasons for Palomino’s improved performance over Thunderbird.
Thermal Diode = Finally Thermal Protection?
It might only look like an eensy-weensy little feature, but it could have been of crucial importance to the many of us who unintentionally fried their Athlon/Duron processors due to failing heat sink fans or badly attached heat sinks. Athlon 4 is finally equipped with a thermal diode!
The pins for THERMDA (anode) and THERMDC (cathode) are S7 and U7. According to the above specs, the motherboard can read the actual core temperature and take appropriate measures to save the life of the CPU (or avoid a notebook meltdown). What I miss, however, is an automatic shutdown of the Athlon 4 at a certain temperature, without any external electronics. Intel processors simply don’t operate when they are too hot, regardless of whether the motherboard cares about it or not.
PowerNow!
Specifically for notebook operation, Palomino (or ‘Athlon 4’ if you prefer) was equipped with AMD’s counterpart to Intel’s ‘SpeedStep’ technology to save battery power. The idea is pretty simple. Once the notebook runs on battery power the CPU is throttled down, providing less performance but also requiring less power, which prolongs battery life.
AMD is proud to state that PowerNow! is more advanced than Intel’s SpeedStep. While SpeedStep only provides two operating modes, one at full speed and one at reduced processor clock, PowerNow! comes with a third mode that automatically detects the performance required by your software and then adjusts the processor clock speed to find the best compromise between performance and battery life. To realize this automatic mode, Athlon 4 is able to switch between 32 different voltage/core clock stages, where the highest setting is the normal processor clock and the lowest setting is less than 500 MHz at 1.2 V.
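How such an automatic mode might choose a stage can be sketched roughly as follows. AMD has not published the selection policy or the full table of 32 stages here, so the four operating points, the `PState` type and the ‘slowest stage that still covers the demand’ rule are all assumptions for illustration. The power estimate uses the common rule of thumb that dynamic power scales with voltage squared times clock frequency.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class PState:
    mhz: int
    volts: float

# Hypothetical subset of operating points. Only the 1.2 V low end and the
# 1.4 V nominal voltage are taken from the article; the rest is invented.
STATES = [PState(500, 1.2), PState(700, 1.25), PState(850, 1.35), PState(1000, 1.4)]

def pick_state(demand_mhz: float, states=STATES) -> PState:
    """Return the slowest state that still covers the demanded clock."""
    for state in sorted(states, key=lambda s: s.mhz):
        if state.mhz >= demand_mhz:
            return state
    # Demand exceeds every state: fall back to the fastest one.
    return max(states, key=lambda s: s.mhz)

def relative_power(state: PState, reference: PState) -> float:
    """Dynamic power relative to a reference state (P ~ V^2 * f)."""
    return (state.volts ** 2 * state.mhz) / (reference.volts ** 2 * reference.mhz)

full_speed = STATES[-1]
for demand in (300, 800, 2000):
    chosen = pick_state(demand)
    print(demand, chosen, f"{relative_power(chosen, full_speed):.2f}x power")
```

Under these assumed numbers the lowest stage draws a little over a third of the full-speed dynamic power, which shows why lowering voltage together with clock is worth far more than lowering the clock alone.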
The usual voltage of Mobile Athlon 4 is 1.4 V. At this voltage the different Mobile Athlon 4 versions have the following power requirements:
Form Factor And Compatibility
Mobile Athlon 4 will come in two different form factors. One of the two is the well-known SocketA form factor, as shown in the pictures above. AMD claims that Mobile Athlon 4 cannot be operated in a normal SocketA motherboard, although Mobile Athlon 4 for SocketA has the same mechanical dimensions and pin count as previous SocketA processors.
Supposedly some of the pins have ‘different assignments’. I can NOT verify this. What I found, however, is that Athlon 4 now uses a few pins that were marked as ‘not connected’ for Thunderbird Athlons.
Two of those pins are the above-mentioned pins for the thermal diode. Besides those, there are five pins for Mobile Athlon 4’s clock multiplier SVID[0..4] and one pin that is called CPU_PRESENCE# (AK6).
Those pins might be an issue when sticking Mobile Athlon 4 into a normal SocketA motherboard, but they might just as well turn out not to be a problem.
The other, probably preferred, form factor for Mobile Athlon 4 is an ultra-low profile, small footprint Organic Ball Grid Array (OBGA) package, optimized for thin and light notebooks.
Unfortunately I couldn’t find any pictures of this package, but I am sure that AMD will soon provide some.
Overclocking
It may be a bit early, but it’s always nice to discuss at least the current prospects of overclocking Athlon 4. Basically, Athlon 4 comes with the already known bridges that determine the voltage and multiplier. There are quite a few more bridges on Athlon 4 than on Thunderbird, however. It is certainly not going to take long until the meanings of those bridges are revealed to us.
Due to the PowerNow!-feature of Athlon 4, there might be a new and much easier way to overclock Athlon 4. The power-saving feature allows the adjustment of the core clock on the fly. It might well be the case that the multiplier can be altered to higher levels than the specified clock speed. Maybe all it takes is just a hack of the PowerNow!-software.
The multiplier settings of Athlon 4 have changed a bit, as they now allow multiplier settings of up to 18x. Here are the new specs:
New Mobile Duron
Today also sees the launch of a new Mobile Duron processor, which is not yet based on the ‘Morgan’ core, but on a modified ‘Palomino’ core. ‘Morgan’ will soon be ready to replace this interim solution for Mobile Duron. The new Duron basically resembles Athlon 4, but with only 64 kB of on-die L2 cache. This means that this processor also benefits from the auto data pre-fetching, TLB changes, SSE and PowerNow! implementations of Athlon 4. If you buy a Duron notebook, make sure it comes with the new Duron and not with the old model that started shipping a few months ago.
Chipsets And Pricing
AMD chose two notebook chipsets for Mobile Athlon 4 and Mobile Duron: ALi’s MAGiK1 and VIA’s KT133A. The MAGiK1 is able to run with PC100/133 SDRAM as well as PC1600/2100 DDR SDRAM, but so far we haven’t been too convinced of its performance in desktop systems. VIA’s KT133A is a good performer, but it does not come with DDR SDRAM support.
AMD’s mobile processors Athlon 4 and Duron will operate on the power-saving 100 MHz (200 MHz DDR) system bus and not yet at 133/266 MHz. AMD offers its new mobile processors at the following clock speeds and prices (1K units, OEM prices):
Processor | Price |
Mobile Athlon 4 1 GHz | $425 |
Mobile Athlon 4 950 MHz | $350 |
Mobile Athlon 4 900 MHz | $270 |
Mobile Athlon 4 850 MHz | $240 |
Mobile Duron 850 MHz | $197 |
Mobile Duron 800 MHz | $170 |
Those prices make AMD’s new mobile processors significantly more attractive than the more expensive mobile Intel processors.
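A quick sketch makes the table easier to compare. It derives each model’s clock multiplier from the 100 MHz system bus (core clock = bus clock × multiplier) and works out the price per MHz. The clock speeds and prices are the ones listed above; the function names are only illustrative.

```python
BUS_MHZ = 100  # system bus clock quoted in the article

# (model, core clock in MHz, 1K-unit OEM price in US dollars)
PRICES = [
    ("Mobile Athlon 4", 1000, 425),
    ("Mobile Athlon 4", 950, 350),
    ("Mobile Athlon 4", 900, 270),
    ("Mobile Athlon 4", 850, 240),
    ("Mobile Duron", 850, 197),
    ("Mobile Duron", 800, 170),
]

def multiplier(core_mhz: float, bus_mhz: float = BUS_MHZ) -> float:
    """Clock multiplier needed to reach core_mhz on the given bus clock."""
    return core_mhz / bus_mhz

def dollars_per_mhz(price: float, core_mhz: float) -> float:
    """Price divided by core clock, a crude value-for-money measure."""
    return price / core_mhz

for name, mhz, price in PRICES:
    print(f"{name} {mhz} MHz: {multiplier(mhz):.1f}x, "
          f"${dollars_per_mhz(price, mhz):.3f}/MHz")
```

On a 100 MHz bus the launch models use multipliers between 8x and 10x, so the 18x ceiling mentioned in the overclocking section would correspond to 1.8 GHz.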
Performance
AMD has not yet supplied any tester with Mobile Athlon 4 samples and corresponding notebook test beds. This leaves us unable to provide you with any in-house benchmark results. However, AMD has run a significant number of benchmarks, which I would like to mention quickly. Basically, Athlon 4 is about 3-16% faster than ‘Thunderbird’, depending on the application. This performance advantage is mainly due to the new pre-fetching feature. Quake 3 Arena, for example, runs 6% faster on Athlon 4 than on Athlon at the same core clock. Mobile Athlon 4 outperforms Mobile Pentium III at the same clock speed. Mobile Duron easily beats Mobile Celeron and even reaches Pentium III scores. The power-saving benchmarks show that PowerNow! extends battery life by 30%. I repeat that those results were provided by AMD and were not generated by us. However, we will post Athlon 4 benchmarks as soon as possible.
Conclusion
Today we have seen the launch of the first version of AMD’s highly anticipated and much-discussed ‘Palomino’ core. Maybe some of you are not too impressed by a performance increase of 3-16%, but I suggest waiting until the actual Desktop Athlon 4 is launched. We are expecting clock speeds around 1.6 GHz, and at that speed Palomino should well be able to compete against Pentium 4 at 2 GHz. Wait and see.
Let’s remember, though, that what AMD released today are processors for notebooks. When you look at the specs and consider the possible performance, you will see that AMD’s Mobile Athlon 4 launch is a very significant event. I would say that the ingredients are almost perfect for AMD to quickly gain a respectable share of the mobile market. The processor comes in two form factors, it provides good performance, it is not a power consumption hazard, it doesn’t get too hot and the pricing is well chosen too. While Mobile Athlon 4 will attack Intel’s Mobile Pentium III from above, Mobile Duron will make its life harder from below. We can certainly kiss Mobile Celeron goodbye very soon. Intel has already brought the mobile Tualatin processor forward in its roadmap, which shows that Intel is well aware of the threat.
If there’s one problem that AMD could encounter, it is the chipset issue. Neither ALi nor VIA has a particularly clean record when it comes to reliability and stability. OEMs don’t accept any glitches in their notebooks. Let’s hope that the Taiwanese chipset makers won’t be a stumbling block for AMD’s success in the mobile market.