2016 is shaping up to be the year of the GPU, with major advancements coming from all corners and at all prices. To recap, Nvidia has their supercomputing Tesla P100 with the GP100 GPU—not something you can currently buy as an individual graphics card, though that’s coming later this year, but it’s a monster of a chip for those that need it (read: not gamers for the time being). The GP100 represents the pinnacle of GPUs, with an estimated price per Tesla module of $12,000 (give or take). Dropping down from the stratosphere, the new GTX 1080 looks affordable by comparison, sporting a GP104 chip and delivering better-than-Titan X performance at a suggested price of $600-$700—too bad it remains out of stock (or severely overpriced) at most places. Not surprisingly, the GTX 1080’s affordable sibling, the GTX 1070, is in a similar state: it’s an awesome card that’s currently difficult to buy, particularly at anything close to the $380-$450 suggested price range (though Newegg apparently has a few cards listed as being in stock as I write this, so supply may be improving a bit).
During all of this Nvidia love-fest of new GPUs, there has been one question waiting in the wings: What about AMD’s Polaris GPUs? AMD officially announced the card’s name, the Radeon RX 480, and some core details a month ago, the biggest news being the price: $200 for the 4GB model, and $239 for the 8GB card. (AMD increased the suggested price on the 8GB model from $229 at the last minute, though we’ll have to see what happens with street pricing.) Today, the other shoe drops and we can finally reveal performance for AMD’s Radeon RX 480 8GB; what we can’t do is tell you whether or not you’ll actually be able to buy one, but AMD and their partners couldn’t possibly do any worse than Nvidia at keeping the new cards in stock. Yeah, that’s a pretty low bar to clear….
RX 480: the return of the reference model
Unlike Nvidia’s new GTX 10-series graphics cards where the reference model has been redubbed the ‘Founders Edition,’ AMD is sticking with reference designs for the RX 480 review samples. This is good in the sense that it means we’re not seeing a higher target price on AMD-produced cards, but it does carry the usual caveats of a reference design. Specifically, power delivery, cooling, and overclocking may be more limited than on custom AIB (add-in board) partner cards. We should see plenty of these reference cards for sale at launch, but over time the market will likely transition to custom models.
The card we received is the higher-end 8GB model, which means the GDDR5 memory is clocked at 8 GT/s; we’re told the base model 4GB cards will be clocked at 7 GT/s, though partners are free to use higher memory clocks if they want. How much that will hurt performance remains to be seen, and overclocking could bring those cards back to 8+ GT/s. For most games, having more than 4GB of VRAM isn’t critical, particularly at the settings the RX 480 is designed to handle, but there are at least a few games that can benefit from additional VRAM.
Polaris architecture and new features
RX 480 Core Specs
Transistors: 5.7 billion
Die size: 232 mm^2
Process: GloFo 14nm FinFET
Compute units: 36
Stream processors: 2304
Texture units: 144
ROPs: 32
Base clock: 1120MHz
Boost clock: 1266MHz
GFLOPS (boost): 5834
Memory bus: 256-bit
L2 cache: 2048KB
Memory: 4GB/8GB GDDR5
GDDR5 speed: 7000+/8000+ MT/s
Bandwidth: 224/256GB/s
TDP: 150W
Getting into the particulars of the GPU, the Polaris 10 at the heart of the RX 480 uses Samsung’s 14nm FinFET technology, licensed to GlobalFoundries. This is a massive reduction in feature size from the previous generation’s 28nm planar transistors, though it’s not clear how much of a difference there is between Samsung’s 14nm FinFET and TSMC’s 16nm FinFET. By the numbers it would look like a 12.5 percent shrink from 16nm to 14nm, but AMD’s own Joe Macri stated last year at an RTG conference that we shouldn’t “get too hung up” on the numbers, as the differences may not be all that significant. Whatever that means in practice, what we can say is that 14nm FinFET has resulted in a big shrink in die size.
AMD’s Hawaii architecture (R9 290/290X/390/390X) is a great comparison point, as performance is going to be pretty similar to the RX 480. Hawaii sports 6.2 billion transistors in a 438 mm^2 package, with peak performance (in the 390X) of 5914 GFLOPS. Polaris 10 in contrast is a bit more than half the size of Hawaii, with slightly fewer transistors and performance of up to 5834 GFLOPS. Power has also dropped from a 275W TDP on the R9 390X to just 150W on the RX 480, so Polaris 10 appears to be about twice as efficient as the previous generation—AMD quotes a 1.9X increase in performance per Watt for RX 480 vs. R9 290, and up to 2.8X perf per Watt for RX 480 vs. R9 270X.
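For anyone who wants to check the math, here is a rough Python sketch of where those GFLOPS and efficiency figures come from. It assumes peak FP32 throughput is stream processors times two operations (one fused multiply-add) per clock. The RX 480 inputs come from the spec list above; the R9 390X shader count and clock (2816 at 1050MHz) are assumptions taken from AMD's public specs rather than from this review. TDP is only a proxy for real power draw, so treat the ratio as a ballpark.

```python
def peak_gflops(stream_processors: int, clock_mhz: float) -> float:
    """Peak FP32 throughput, assuming one fused multiply-add
    (2 floating-point operations) per stream processor per clock."""
    return stream_processors * 2 * clock_mhz / 1000.0


def perf_per_watt(gflops: float, tdp_watts: float) -> float:
    """Theoretical GFLOPS per watt of rated TDP (not measured power)."""
    return gflops / tdp_watts


# RX 480: 2304 stream processors at the 1266MHz boost clock, 150W TDP.
rx480 = peak_gflops(2304, 1266)
# R9 390X: shader count and clock are assumed from public specs, 275W TDP.
r9_390x = peak_gflops(2816, 1050)

rx480_eff = perf_per_watt(rx480, 150)
r9_390x_eff = perf_per_watt(r9_390x, 275)
efficiency_ratio = rx480_eff / r9_390x_eff
die_area_ratio = 232 / 438  # Polaris 10 vs. Hawaii, in mm^2

print(f"RX 480:  {rx480:.0f} GFLOPS, {rx480_eff:.1f} GFLOPS/W")
print(f"R9 390X: {r9_390x:.0f} GFLOPS, {r9_390x_eff:.1f} GFLOPS/W")
print(f"Efficiency ratio: {efficiency_ratio:.2f}x")
print(f"Die area ratio:   {die_area_ratio:.2f}")
```

On paper that works out to roughly 1.8X the theoretical throughput per Watt, which is in the same neighborhood as AMD's quoted 1.9X versus the R9 290.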
On the architecture front, AMD says the new RX series CUs (Compute Units) are 15% faster than before (using Hawaii / R9 290 as a comparison point). The performance improvements come thanks to several new additions. First, AMD has doubled the L2 cache size from 1MB to 2MB, and they’ve tuned the cache behavior. Specifically, the L2 cache supports client cache request grouping and improved cache and memory access efficiency. Next, instruction prefetch is improved, increasing overall efficiency by reducing pipeline stalls. Third, the per wave instruction buffer size has been increased, which helps improve single-threaded (or per-thread) performance. And finally, much like Nvidia’s Pascal, native support for FP16 and INT16 operations has been added, presumably offering double the performance of FP32/INT32 at reduced precision. FP16 has been used in quite a few SoC graphics solutions as a cost effective alternative to FP32, and it also has useful applications in computer vision and machine learning.
All of the above changes take place within the CU, but there are other architectural changes as well. One of the big ones is an enhanced geometry engine, with a new primitive discard accelerator. This helps to cull (remove) triangles early in the pipeline, focusing specifically on those that will have zero area (pixels) in the final output. AMD notes that this helps a lot when using MSAA, in particular at higher levels (4xMSAA and 8xMSAA). AMD provided some data showing performance with and without the primitive discard accelerator running, using TessMark, and they show a 200-350 percent improvement in tessellation performance. The geometry engine also has a new index cache that helps with small instanced geometry (think of all the small units in Ashes of the Singularity as an example).
Finally, AMD has added improved lossless delta color compression (DCC) to their fourth generation GCN architecture. Much like Nvidia did with Pascal, AMD is looking to improve effective bandwidth without having to include things like a 384-bit or 512-bit memory bus. Polaris 10 will use a 256-bit bus, which is a significant change compared to Hawaii, even with the faster memory speeds. R9 390 has 384GB/s of bandwidth while the RX 480 ‘only’ has 256GB/s…except the second gen GCN in Hawaii didn’t have any sort of DCC, and while Fiji did have some DCC, it wasn’t really needed—512GB/s basically made it superfluous. With their new 2/4/8:1 compression ratios in Polaris 10, AMD says it effectively boosts usable bandwidth by around 35 percent. That means roughly 346GB/s effective bandwidth, and suddenly the bandwidth gap between Hawaii and Polaris 10 is not so big.
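The bandwidth numbers are simple arithmetic, sketched here in Python. Raw bandwidth is bus width in bytes times the per-pin data rate. The 6 GT/s rate for the R9 390 is an assumption that matches the 384GB/s quoted above on a 512-bit bus, and the 35 percent DCC uplift is AMD's own estimate, which will vary with the workload.

```python
def bandwidth_gbs(bus_width_bits: int, data_rate_gtps: float) -> float:
    """Raw memory bandwidth in GB/s: bytes per transfer times transfers per second."""
    return bus_width_bits / 8 * data_rate_gtps


rx480_8gb = bandwidth_gbs(256, 8.0)  # 8GB model at 8 GT/s
rx480_4gb = bandwidth_gbs(256, 7.0)  # 4GB model at 7 GT/s
r9_390 = bandwidth_gbs(512, 6.0)     # assumed 6 GT/s on Hawaii's 512-bit bus

# AMD's claimed average gain from delta color compression (an estimate).
DCC_GAIN = 0.35
rx480_effective = rx480_8gb * (1 + DCC_GAIN)

print(f"RX 480 8GB raw:       {rx480_8gb:.0f} GB/s")
print(f"RX 480 4GB raw:       {rx480_4gb:.0f} GB/s")
print(f"R9 390 raw:           {r9_390:.0f} GB/s")
print(f"RX 480 8GB effective: {rx480_effective:.1f} GB/s")
```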
Combined, the above have the effect of greatly improving bandwidth utilization: more L2 cache, DCC, and primitive discard all reduce bandwidth needs. Dropping from a 512-bit bus down to a 256-bit bus also reduces power requirements, which ends up going back into improving performance elsewhere. From a high level, then, it’s pretty clear that AMD is aiming to beat GTX 970 and come close to R9 390 (or beat R9 290 if you prefer), all at substantially lower price and power targets. And we can safely say that they succeed at these goals, but we still need to look at real-world performance.
Before we get to the benchmarks, there are a few other updates worthy of mention. First, Polaris 10 and 11 bring DisplayPort 1.4-HDR/HDMI 2.0b to the party. All of the new line of graphics cards from AMD and Nvidia include DP1.3/1.4, along with HDR, and while there aren’t any displays that use DP1.4 yet, they’re coming. This won’t matter for existing displays, but it opens the door for things like 4K 120Hz displays, or 5K 60Hz over a single cable; we expect to see some of these displays by the end of the year. Combined with variable refresh rate technologies like G-Sync and FreeSync, we’re very excited to see the new displays. Just don’t be surprised when the HDR models cost an arm and a leg at launch; they’ll come down in price over time.
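To see why the newer DisplayPort link matters for those display modes, here is a back-of-the-envelope Python sketch. It assumes the HBR3 link rate used by DP1.3/1.4 (8.1 Gbps per lane across four lanes, with 8b/10b encoding overhead), and it ignores blanking intervals and compression, so real-world requirements will be somewhat higher than these raw pixel rates.

```python
HBR3_LANE_GBPS = 8.1          # assumed per-lane HBR3 link rate
LANES = 4
ENCODING_EFFICIENCY = 8 / 10  # 8b/10b line coding


def link_payload_gbps() -> float:
    """Usable link bandwidth after encoding overhead."""
    return HBR3_LANE_GBPS * LANES * ENCODING_EFFICIENCY


def pixel_rate_gbps(width: int, height: int, refresh_hz: int,
                    bits_per_pixel: int = 24) -> float:
    """Raw pixel data rate, ignoring blanking intervals."""
    return width * height * refresh_hz * bits_per_pixel / 1e9


payload = link_payload_gbps()
uhd_120 = pixel_rate_gbps(3840, 2160, 120)          # 4K at 120Hz, 8-bit color
five_k_60 = pixel_rate_gbps(5120, 2880, 60)         # 5K at 60Hz, 8-bit color
uhd_120_10bit = pixel_rate_gbps(3840, 2160, 120, 30)  # 4K at 120Hz, 10-bit color

for name, rate in [("4K120 8-bit", uhd_120),
                   ("5K60 8-bit", five_k_60),
                   ("4K120 10-bit", uhd_120_10bit)]:
    verdict = "fits" if rate <= payload else "does not fit uncompressed"
    print(f"{name}: {rate:.2f} Gbps vs {payload:.2f} Gbps link -> {verdict}")
```

Under these assumptions the 8-bit modes squeeze into a single cable, while 10-bit color at 4K 120Hz would need compression or chroma subsampling.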
Last but not least, for media center duties as well as streaming (via Twitch as an example), the video decoding/encoding block on Polaris is the most potent solution AMD has ever offered. It can handle up to 4K60 Main-10 HEVC decode, 4K VP9, 4K30 MJPEG, and 4K120 H.264 decoding. That’s pretty impressive, but AMD’s not done yet. On the encoding side, the hardware supports H.264 at 1080p120, 1440p60, and 4K30, and for HEVC (H.265) Polaris can do 1080p240, 1440p120, and 4K60. Polaris also supports 2-pass encoding, which can help improve quality without increasing the bitrate, though that won’t be useful for live streaming.
The need for speed
For testing, we’ve used our usual collection of hardware, an overclocked 4.2GHz i7-5930K running on the X99 platform. While that might seem excessive, we’ve done some other tests and found that once you’re running an overclocked Core i5 or above, performance in games is basically a wash—everything from an overclocked i5-6600K to the i7-6950X falls within a few percent. So while our test platform is a high-end build intended to eliminate other potential bottlenecks as much as possible, you can get similar performance from a mainstream Core i5 build.
It’s also important to talk about drivers for a moment. We’ve tested the RX 480 with AMD’s latest Crimson 16.6.2 launch drivers, and we noticed a few relatively large gains compared to previous generation AMD cards—specifically, the R9 390—in at least two games. Whether the gains are from drivers or game patches isn’t clear, but we did take the time to retest the R9 390 using the same 16.6.2 drivers. For Nvidia, we’ve tested the GTX 1070 and 1080 with the latest 368.39 drivers, but many of the older cards were tested previously using different drivers on older versions of the games. Obviously, that doesn’t apply to recently added titles like Hitman (2016) or Doom, but there may be differences in some cases. We’re working to retest, but time as always is a problem; there’s a certain margin of error involved here, but it’s typically only a few percent and doesn’t generally affect our opinion of any of the cards involved.
Something else we need to point out is our use of reference model cards, as much as possible. AMD’s new RX 480 along with their R9 Fury X and Nano are reference models sporting ‘stock’ clocks, while many retail cards may come with slight overclocks and different cooling; the R9 Fury (Asus Strix), R9 380X (Sapphire), and R9 380 (Sapphire) are all slightly overclocked. On the Nvidia side, the GTX 1080 and GTX 1070 are ‘Founders Edition’ cards, which is basically the same thing as a reference model. The GTX Titan X, 980 Ti, and 980 are also reference cards, while the GTX 970 (Zotac), GTX 960 (EVGA), and GTX 950 (Asus) are custom models that are slightly overclocked. We’ll discuss overclocking of the RX 480 below, but in most cases you can get about 10 percent higher performance than stock on AMD via overclocking, and 15-25 percent higher than stock performance on Nvidia via overclocking. YMMV, naturally.
Here’s the high level performance overview of all the graphics cards we currently have for testing. The RX 480 ends up in the middle of the pack, but it’s also priced lower than most of the other cards, and many of the cards have dropped in price by 15 percent or more thanks to the anticipation and launch of newer models. Of course the above chart doesn’t help much if you have an older GPU, so if you want additional context, the GTX 760, GTX 670, and HD 7950 Boost (R9 280) all fall roughly between the GTX 950 and 960, closer to the 960 in most cases. The HD 7850 (R7 265), GTX 660, and GTX 570 are generally about 25-30 percent slower than a GTX 950. Basically, as a rough estimate you can subtract about 10 points from the model number per generation, or 100 points on the older HD series of AMD chips, though features and other aspects of the hardware definitely come into play.
Looking at the big picture, then, just as the core specifications would indicate, the RX 480 ends up being very comparable to the R9 390. If that doesn’t strike you as impressive, keep in mind the significantly lower price and power requirements—and as we’ll see when we look at the individual games, there are at least a few instances where the architectural improvements of Polaris put the newcomer on top. There are also cases where the sheer brute force of the R9 390 wins out by a decent margin, mostly thanks to the higher memory bandwidth. The RX 480 doesn’t compete in the same league as the high-end cards like the GTX 1070, but that was never the point. It’s priced as a mainstream card, surpassing the GTX 970 by a moderate amount and bringing higher performance to lower price brackets.
Our individual gaming charts are in the following gallery, with limited commentary on each. We’re including all the modern cards that we’ve tested, to give you the complete view of the market. We understand the RX 480 at $200-$240 is in a completely different market from the GTX 1080 at $600-$700, and we’re reviewing it as such, but that doesn’t mean people aren’t interested in seeing the other cards.
Just to highlight a few key points before we move on, AMD’s RX 480—and GCN in general—do really well in the only two DX12 benchmarks we’ve included, Ashes of the Singularity and Hitman (2016). Ashes performance in particular jumped by over 20 percent for AMD recently, but it’s not clear if that was driver updates or game optimizations. The only difficulty with trying to draw conclusions from our DX12 results is that both the DX12 games are also AMD Gaming Evolved releases, meaning they’re more likely to receive heavy tuning for AMD hardware. The same thing happens with Nvidia’s The Way It’s Meant To Be Played (TWIMTBP) program, and we include a selection from both camps in our testing for just that reason. Long-term, AMD’s GCN may prove to be a superior solution for DX12 and Vulkan, but it’s far too early to make such a bold statement.
Without getting too bogged down in the minutiae, the RX 480 ends up competing well against the other cards, and more importantly it brings performance that used to cost $300 or more down into the $200 price range. This isn’t the sort of card where we’d recommend ditching a GTX 970 or R9 390, but if you’re using an older GPU like a GTX 760 or R9 765 (or anything older/slower than those cards), you could potentially double your performance or more. You could have done that last year, of course, if you were willing to spend the money, but people who flinch at a $300+ GPU don’t usually find $200 cards as off-putting.
Speeding up time (aka overclocking)
One of the things we were really curious about with the latest generation of GPUs is overclocking potential. We’ve seen Nvidia’s GTX 1080 and 1070, and frankly the overclocks weren’t quite as high as on previous Nvidia GPUs—15 percent rather than 20-25 percent overclocks. The shrink in process technology may be part of the problem, or maybe everyone is pushing their stock clocks just a bit higher—the latter seems more likely, as both AMD and Nvidia have talked about working to ensure they don’t leave a lot of untapped performance on the table.
For overclocking the RX 480, at present the only way to do so is through AMD’s new WattMan utility, part of the AMD Crimson driver settings. I have to be honest here and say that I found WattMan to be a bit of a pain in the butt initially, and my attempts to dial in stable overclocks resulted in more crashes and hard locks of my test system than I’ve experienced in months. Yuck! I must have hard-locked over twenty times, and even seemingly minor tweaks could mean the difference between completing a benchmark with decent performance and crashing to the desktop—and every crash to the desktop ultimately required a reboot before I could continue testing properly. After much trial and error, here’s where I ended:
If that looks and sounds bad, well, it sort of is. I had hoped that my days of 5-10 percent overclocks on AMD hardware might change with Polaris, but instead I had to ultimately settle for a meager six percent increase in core clocks. 6.5 and even 7.5 percent could pass certain tests, but Hitman and Ashes of the Singularity in particular wouldn’t work at anything more than six percent. On the other hand, the memory was able to max out the WattMan slider and seemed to run fine at the +250MHz setting, giving a final speed of 9 GT/s. If you’re thinking lower voltages might help (1150mV is the maximum allowed by WattMan for our card, so I couldn’t go higher), 1125mV failed to complete most of our benchmarks, and 1100mV would hard-lock almost immediately, so the added voltage is definitely needed (at least on our sample, as well as a sample from one of our UK cohorts).
But don’t go drawing too many conclusions, as the reference RX 480 is a pretty conservative design. It has a single 6-pin PEG connector, which means 150W power while remaining in spec, and the blower fan works well enough at stock but requires a big bump in RPMs to keep overclocks stable. And that bump in fan speeds means substantially more noise. Custom cards with 8-pin adapters, better VRMs, and better cooling will almost certainly improve the situation, though I’d temper expectations and not plan on much better than 10-15 percent core overclocks.
The good news is that overclocking isn’t just about that six percent increase in core clocks. The power targeting by default means that you’ll see clocks drop down from the maximum 1266MHz and fall closer to 1120MHz. The overclocked settings shown above help keep the card at higher boost clocks, so overall performance gains end up being more than six percent. The combination of higher power limits, improved core clocks, and higher memory speed ends up yielding an 11 percent overall increase in performance.
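For reference, the clock arithmetic behind those settings can be sketched in a few lines of Python. The constant names are made up for illustration. The one assumption is the conversion from WattMan's memory slider to data rate: going by the numbers above, +250MHz takes the memory from 8 GT/s to 9 GT/s, so each 250MHz is treated as 1 GT/s. This only shows the headroom on paper; the 11 percent figure is what we measured, and nothing here predicts it.

```python
STOCK_BOOST_MHZ = 1266
CORE_OC = 0.06           # stable core overclock on our sample
STOCK_MEM_GTPS = 8.0
MEM_OFFSET_MHZ = 250     # maximum WattMan memory offset on our card
MHZ_PER_GTPS = 250       # inferred: +250MHz moved 8 GT/s to 9 GT/s
BUS_WIDTH_BITS = 256

oc_core_mhz = STOCK_BOOST_MHZ * (1 + CORE_OC)
oc_mem_gtps = STOCK_MEM_GTPS + MEM_OFFSET_MHZ / MHZ_PER_GTPS
mem_gain = oc_mem_gtps / STOCK_MEM_GTPS - 1
oc_bandwidth_gbs = BUS_WIDTH_BITS / 8 * oc_mem_gtps

print(f"Core clock:  {STOCK_BOOST_MHZ}MHz -> {oc_core_mhz:.0f}MHz (+{CORE_OC:.1%})")
print(f"Memory rate: {STOCK_MEM_GTPS:.0f} GT/s -> {oc_mem_gtps:.0f} GT/s (+{mem_gain:.1%})")
print(f"Bandwidth:   {oc_bandwidth_gbs:.0f} GB/s")
```

The memory ends up with more headroom (12.5 percent) than the core (six percent), which fits with the overall gain landing above the core overclock alone.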
Hopefully WattMan can be further improved, and it would be great to see some form of auto-overclocking software that does a better job at dialing in appropriate values. The brute force approach I resorted to isn’t the most elegant, and the frequent crashes plus the inability to save OC profiles in WattMan are unfortunate. WattMan feels more like a beta utility right now, so hopefully driver updates will improve the functionality. High fan speeds resulting in a lot of noise are equally unpleasant, and at least for the reference design, I’d be inclined to bump up VRAM clocks by 150MHz, boost the power slider to max, and not do too much more.
A great card, if not a flagship
The messaging behind AMD’s RX 480 ends up coming off a bit mixed. For one, AMD maintains that the R9 Fury X is their fastest graphics card, and it is—but it’s still haunted by the 4GB HBM compromise required to launch last year, not to mention substantially higher power requirements. The RX 480 ends up being the young rookie that everyone likes to watch, but we’re waiting to see the post-graduate version decked out in Vega 10 attire. If you’re looking to join the Radeon Rebellion, you don’t already have an R9 290 or faster card, and you can’t spend more than $200, then RX 480 4GB is a great card at $200, and still a good card for the 8GB $240 model. But when AMD talks about bringing VR to the masses with a $200 card compared to their previous $300, and you still need to factor in the price of a $600 or $800 Rift or Vive? Somehow I don’t think people hoping to get into VR kits are going to be lining up for this one. It doesn’t help that we still haven’t seen any ‘killer’ VR games.
So forget VR, forget the revolution, and instead look at what AMD has delivered. This is a great new mainstream card, bringing maximum quality 1080p gaming back down to the $200 price point, and in many games even 1440p ultra is viable. We’ve had this level of performance for at least a year and a half, but it used to be $500 and then $300, and now it’s even more affordable. More important I think is the idea that you can convert just about any moderate PC into a true gaming PC by spending $200, and come next year when Project Scorpio launches, you’ll still be equal to the latest consoles in terms of performance.
As I did with the GTX 1070, there’s another way of looking at things: bang for the buck or performance per pound. Nvidia’s latest GPUs have been selling like hotcakes—that or else the supply has been seriously constrained, or maybe a bit of both. It’s entirely possible that the same thing will happen with RX 480, because if you were moderately interested in an R9 390 at roughly $300 earlier this year, getting nearly the same thing (minus 125W of power draw) for $50-$100 less will certainly push you over the tipping point. As such, street prices might be higher than MSRP for a bit, but we’ll have to wait and see; I’ve used the best readily available street pricing I’ve been able to find for the cards, and more specifically I’ve plugged in $620 for the GTX 1080, $400 for the GTX 1070, and $240 for the RX 480 8GB.
It’s not too surprising to see the RX 480 jump to the top of the bang for the buck chart, and it’s basically there on the UK chart as well, tying with the R9 380. Both charts are going to be volatile, since prices are in constant flux, which means this is really just a snapshot in time, but they illustrate quite well how there’s a sweet spot in terms of maximizing value. High-end cards may perform best, but the added cost often makes them a worse value overall. Of course we could also factor in the price of the rest of the system, which would change the overall picture quite a bit—if you’re spending $1000 on a PC for gaming purposes, it’s far more beneficial to get a faster GPU to maximize FPS per dollar, but I digress.
The main concern with these charts is that there’s a big gap in AMD’s portfolio; Nvidia rules the middle of the chart, and particularly the higher performance market. The Fury X/Fury/Nano are simply outclassed by the GTX 1070/1080, as well as the previous generation 980 Ti. If you don’t want to spend more than $250, AMD is an easy recommendation, but if the RX 480 isn’t fast enough for your intended use, Nvidia is the only sensible choice.
There’s still that whole DX12/Vulkan topic, however. I know earlier in the gaming discussion I noted that all of the DX12 games that heavily favor AMD are also AMD branded releases, but we have yet to see any DX12 or Vulkan releases where Nvidia comes out on top. It may be premature to come to a definitive conclusion, but the evidence so far points to GCN and AMD being either better suited to DX12, or easier to extract performance from using DX12, or both. Ashes and Hitman have started a pattern, and Forza, Gears of War, and Quantum Break haven’t radically altered things. The worst we’ve seen of DX12 so far is Rise of the Tomb Raider, where no one actually benefitted from the API switch. If DX12 continues to gain momentum over the coming year (which isn’t a given by any means), the RX 480 could end up being the best $200 graphics card we’ve ever seen, bar none. By then we’ll probably have something even better from AMD in the form of Vega, but that’s a story for later this year.
And we’re still not finished with new GPUs. If you want a high-end offering, the 1070/1080 can be had with a bit of patience. RX 480 nabs the $200-$240 market and scores a decisive win. But what about cards priced below the RX 480? Well, we can officially show some of the specs for the RX 470 and RX 460 now, but we don’t have pricing information yet, other than AMD’s statement that Polaris will cover the $100-$300 market. RX 460 looks to be roughly half the performance of the RX 480, with an equally low 75W TDP. The RX 470 may be more interesting, as it keeps the 256-bit memory bus and will likely come in around the $150 mark. And if that’s not enough to keep you interested, GTX 1060 photos are starting to crop up from Hong Kong. That’s honestly sooner than I expected, considering the GTX 960 lagged behind the 980/970 launch by four months, but perhaps Nvidia is feeling some mainstream pressure.
Polaris is also AMD’s first real chance at getting back into the high-end gaming notebook market in a while. Ever since the HD 7970M, AMD’s mobile GPUs have basically played second fiddle to whatever Nvidia was dishing up, and rebranding Pitcairn as the 8970M and then the R9 M290X (with a minor adjustment to clock speeds) didn’t help. Polaris 10 on the other hand could provide some much-needed competition to the GTX 980M and company, though a mobile GTX 1070/1080 is likely still out of reach.
Whatever your budget, the new generation of graphics cards is ready to take your money, delivering better performance and/or lower power at every price point. But I’d be lying if I didn’t say both Nvidia and AMD are playing it a bit safe right now, shrinking and tweaking existing designs without doing a complete overhaul. The architectures are better, but it’s a lot like Intel’s tick-tock cadence for CPU updates, except that in the world of GPU updates it’s easier to extract more performance by increasing core counts, giving us 25-30 percent performance increases each generation. Thank goodness for that, as I always need more fps (and I’m still looking forward to 120Hz 4K displays)!
For those that don’t have pockets deep enough to keep up with the bleeding edge, AMD’s Polaris 10 architecture aims for the sweet spot and ends up being mighty tasty. It may not be the filet mignon of graphics cards, but you can enjoy several good RX 480 steaks for the price of a GTX 1080 filet. Besides, rib-eyes taste better than filet mignon.