
Vapor gate, including a failure of the entire quality management system? Background on the cooler problem of the AMD Radeon RX 7900 XTX (MBA)

Before I give more numbers on the Radeon RX 7900 XTX and its cooler problem, and say a few words about the supply chain and its failure in quality management, we have to talk today about quality management in general, and about quality assurance and quality control in particular. Why? Because it helps to understand and classify the complex interrelationships: AMD itself is not a manufacturer, but has its cards manufactured. And this contract manufacturer, in turn, relies on other suppliers, who again have third-party and partner companies working for them. The causal chain therefore seems almost endless, and if even one link breaks, then good night.

However, this is exactly what seems to have happened with the current problem with AMD’s own MBA (Made by AMD) cards. In the end, one can only feel sorry for Scott Herkelman: giving a live interview at a time when not all details could be clarified with certainty strikes me as explosive and downright reckless. He could really only lose here, at least as far as the statements made are concerned. That one uses the reach of a media outlet to deflect and/or calm things down is a given. After all, that is part of PR’s job. But one should be very careful about where the boundary between reality and wishful thinking runs. After all, everything is verifiable, even if unfortunately often only much later. And the customers are not slow on the uptake.

Scott Herkelman interviewed by PC World (from the linked YT video)

Important preliminary remark

To be fair, I have to state that some of my conclusions are based on information from third parties (involved parties), but this information seems completely plausible in light of the usual business processes. I am also familiar with the processes at the manufacturers involved, because there are no real secrets there either. What exactly went wrong must now be worked out and clarified by quality management as quickly and completely as possible. That is neither my task nor that of any third parties or outsiders. I have also deliberately simplified the following processes and considerations on quality management so that the average reader does not fall asleep, bored, over their morning coffee.

A quality management system is global and complex: it includes all customer-supplier relationships and acts as a permanent, intensive interplay between them. And when we speak of quality here, the concept of quality used stands for the unity of actual condition and quality requirement. The quality demanded is therefore not the maximum that is feasible, but the correct realization of the quality requirements defined for a cooler. Errors can always happen, and there is no such thing as 100% error-free production. But one must always strive to approach this unattainable ideal with a so-called zero-defect strategy.

Fact check

Let’s therefore get to the fact check first, before our quality management investigation. The first surprising statement in the interview is the claim that replacement cards are widely available. Since I am involved in an RMA process myself, I will quote from a letter from AMD support to the customer, which dates from the same period and is therefore current. While the first letter (from the beginning of January 2023) still spoke of around two weeks for the replacement already applied for, this estimate has now turned out to be untenable. I therefore quote from the letter now available to me, which came as a response to the exchange already agreed with AMD:

We understand that you want a replacement for your RX 7900 XTX. It is important to know that at the moment we are not able to replace your card as we do not have stock available in our warehouses. We can start the process as soon as the inventory is replenished, but right now we don’t have an estimated date for replenishment.
If you prefer a refund instead, we can process that refund immediately, and we will provide you with a return label so you can send the card back to our warehouse. (Translated from the German AMD support letter)

 

This could have been communicated properly in the interview; indeed, it should have been. That PR contradicts support here is really no longer comprehensible. In addition, Scott Herkelman speaks in the interview of only a small number of affected cards, which can hardly be true. While various board partners (AIBs) report that 9 to 11 percent of cards have by now been returned or submitted for RMA, the information from system integrators (SIs) and manufacturers of complete PCs is much more alarming. They now have stripped-down or not fully assembled PCs sitting in storage or on standby, at rates of well over 10%. This may also be because SIs test in a much more targeted way than many buyers of individual cards, who lack the knowledge. Selling a clock loss due to thermal throttling as a feature is downright absurd, because the memory, whose operating temperature was specified far below 110 °C, is being fried here as well. You really have to blame the interviewer for not following up on some points. Unfortunately, that is softball treatment à la carte and was guaranteed to have been agreed upon in advance.

Apart from the fact that some SIs have already run out of storage space (we are talking about unit counts in the upper three-digit range in each case), this unfortunately plays right into the hands of the green competitor. I wouldn’t even be surprised if NVIDIA were to push a simple swap to a GeForce RTX 4080 with special discount promotions. At least I know that PCs have already shipped with cards switched to NVIDIA products. That is bitter, to say the least, for market penetration in the SI sector, which is already difficult terrain for AMD. If I were NVIDIA, Jensen would easily pay for that out of his leather wallet. After the adapter gate, which was presented with great media attention at the November event, now the vapor gate as revenge? Karma is a b1tch.

Basic problem: AMD is not a manufacturer, but quality management is a serious science

I always find it difficult (for understandable reasons) to name exact sources, because that would burn them faster than you can say pug. It is also a matter of decency. But today we can reveal a few things that are certainly not common knowledge. It starts with the chain of the entire contract manufacturing, which AMD entrusts to an experienced manufacturer (OEM and ODM) like PC Partner from Dongguan. But this manufacturer is really only the final manufacturer, which in turn commissions third and fourth parties to supply further components in a long chain.

What does it actually take to make a graphics card, or to do the final assembly? First of all, you commission board manufacturers to deliver the bare PCBs to the manufacturer, you buy all the components, sometimes even having the tape reels for the SMD components loaded by third parties, and you buy a complete cooler. Before I go into this cooler and the other entanglements in the delivery process, let’s first take a rough look at quality management so that we can also grasp the fundamental problem.

The first problem, despite ISO 9000:2015, is the exact demarcation of the three areas of quality management (QM), quality assurance (QA) and quality control (QC), which cannot be drawn all that precisely. This is because the supply chains and distributed responsibilities blur the definitions and create fluid transitions that make exact demarcation difficult to impossible. Furthermore, each manufacturer can introduce its very own definitions and transitions, or adapt these details to the exact circumstances, conditions and requirements of the respective end product. In any case, it is essential that the higher-level OEM, together with the client, defines and ultimately enforces the meaning, definition and delimitation of QM, QA and QC.

While quality management is in the hands of AMD and, to a large extent, the contract manufacturer, third-party providers and suppliers must fit in seamlessly. In the case of the RX 7900 XTX this is definitely a rocky road, as the cooler is supplied by Cooler Master. However, Cooler Master is also only a “man in the middle”, because the cooling assembly including the vapor chamber is in turn made by Asia Vital Components Co., Ltd. (AVC for short). So while QM is up to AMD and the contract manufacturer, Cooler Master has to fall in line and ensure QA on its part. The actual manufacturer, AVC, must in turn subordinate itself to the higher-level QA (and thus also to the general QM) and ensure QC on site during production. Let’s take a quick look at how QA and QC are directly related:

QA should already have been applied to the bare cooler. Responsibility has to be assigned to the Cooler Master product manager in question, even if AVC is the actual culprit. QA at AVC has to be controlled by the client; that’s just the way it is. In detail, this means that QA always takes a proactive approach: a defect should be prevented by process or product design before it even occurs. QC in the production process and final inspection, on the other hand, takes a purely reactive approach and focuses on the detection and verification of defects that have already occurred.

QA takes a process-oriented approach and focuses on preventing quality problems by implementing processes and methods, while QC takes a product-oriented approach and focuses on identifying quality problems in incoming goods (more on this on the next page), within the manufacturing process, or during final inspection and release (also more on this in a moment). QA’s quality systems include methods and procedures for meeting quality standards and requirements. The quality control systems measure, test and verify before and during the production process. I don’t want to go any further than that here.

In summary, while QC refers to the actual product and QA to the processes required for it, QM considers the entire value chain and also includes all planning aspects. Quality management also includes the requirements of ISO 9001 and all management objectives and strategies. QM therefore evaluates these requirements holistically and then breaks them down to the individual units and processes, which includes quality requirements. None of the 3 aspects may be neglected or excluded. Without QA and QC, QM would be completely powerless, as in this case.

So we see that a defective chamber could (and should) have been detected simply by testing samples in the finished product from each batch (series), which at AVC generally comprises 10,000 units. And for those who still remember: AMD, PC Partner, Cooler Master and AVC have learned nothing from mistakes already made. Does anyone still remember the dreadful pump whine of AMD’s Fury X? It took a German editor to find the trapped air bubbles, while the entire Asia connection fumbled in the dark for weeks. Sometimes it was the PWM controller, sometimes the stupid customer with a wrong installation. All nonsense. Fun fact: AVC later even supplied the defective pumps to Gigabyte for their Waterforce, which I then got to uncover. Well, if the product managers for AMD and NVIDIA cards don’t talk to each other, this is what comes out. The financial controller at AVC must have been pleased with the profitable disposal of partially defective pumps.
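To get a feel for what per-batch sampling could achieve, here is a minimal Python sketch. It assumes a batch of 10,000 units (the figure mentioned above), a hypothetical share of defective chambers, and a random sample tested in the finished product. The function name and the example numbers (10% defective, 5 to 50 samples) are illustrative assumptions, not the actual test plan of AVC or Cooler Master.

```python
from math import comb


def detection_probability(batch_size: int, defective: int, sample_size: int) -> float:
    """Probability that a random sample drawn without replacement
    contains at least one defective unit (hypergeometric model)."""
    if not 0 <= defective <= batch_size:
        raise ValueError("defective must be between 0 and batch_size")
    if not 0 <= sample_size <= batch_size:
        raise ValueError("sample_size must be between 0 and batch_size")
    # Chance that every sampled unit comes from the good part of the batch.
    p_all_good = comb(batch_size - defective, sample_size) / comb(batch_size, sample_size)
    return 1.0 - p_all_good


if __name__ == "__main__":
    # Hypothetical numbers: 10,000 units per batch, 10% defective chambers.
    for n in (5, 20, 50):
        p = detection_probability(10_000, 1_000, n)
        print(f"sample of {n:>2}: {p:.1%} chance of catching at least one defect")
```

Under these assumptions, even a few dozen tested units per batch would very likely expose a defect rate of this size. The harder cases are batches in which only a handful of chambers are bad.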

A quiet pump for Gigabyte’s GTX 980 Ti Xtreme Gaming Waterforce | Retro 5 years ago

When nothing matches

At exactly this point, however, I have one more piece of insider information, and it makes meaningful QA hardly possible. Well, we’re not on Tinder here, but “match” means a lot in QA, too. I already wrote that the serial number of a graphics card can be used to infer the batch of the cooler, whose QR code provides the necessary information. However, anyone who believes that this will also allow them to locate the batch (series) of the vapor chamber may now be disappointed.

For this to be possible at all, the batches of the vapor chamber and the finished cooler would have to be congruent, i.e. “matched”. Only if a batch of coolers is continuously equipped with only one continuous batch of these vapor chambers can one speak of a traceable process at all. However, according to my information, this is not the case. If this is true, then partially defective vapor chambers can spread wildly over many batches of coolers because different batches were mixed in the factory. However, anyone who dares to do something like this also has the damned duty to thoroughly test and check all components until the end. The word QA contains “Assurance”, and that’s what matters.
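The difference between matched and mixed batches can be illustrated with a small Python sketch. All batch IDs and the mapping itself are invented for illustration; the real factory records are not known here, and the function name is a hypothetical helper.

```python
from typing import Dict, Set


def affected_cooler_batches(
    cooler_to_chambers: Dict[str, Set[str]],
    bad_chamber_batches: Set[str],
) -> Set[str]:
    """Return every cooler batch that contains at least one vapor chamber
    from a batch known to be defective."""
    return {
        cooler
        for cooler, chambers in cooler_to_chambers.items()
        if chambers & bad_chamber_batches
    }


# Hypothetical data: batch IDs are invented for illustration only.
# Matched: each cooler batch uses exactly one chamber batch.
matched = {"C1": {"V1"}, "C2": {"V2"}, "C3": {"V3"}}
# Mixed: chamber batches were spread across several cooler batches.
mixed = {"C1": {"V1", "V2"}, "C2": {"V2", "V3"}, "C3": {"V1", "V2", "V3"}}

print(sorted(affected_cooler_batches(matched, {"V2"})))  # ['C2']
print(sorted(affected_cooler_batches(mixed, {"V2"})))    # ['C1', 'C2', 'C3']
```

With matched batches, one bad chamber batch points to exactly one cooler batch. With mixed batches, the same defect touches every cooler batch, and even this lookup only works if the mixing was logged in the first place.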

I wish AMD (and in the end especially Cooler Master) lots of fun and fair weather in trying to quantify this with any validity. That’s why it’s so difficult: you just don’t know. So in the end, it will only be possible to act reactively and replace cards. If the circumstances I mentioned are really true, then a targeted recall would be technically and logistically impossible. You can read on the next page why PC Partner also failed miserably; there we will talk about the QM at the manufacturer in detail, because I have already been to a PC Partner factory and know what I am writing about.

 

378 replies


Derfnam

Old-timer

7,517 comments 2,029 likes

Morning all :)
Maybe I'm mentally still in bed, but I'm stuck on the part where you write:
"Apart from the fact that some SIs have already run out of storage space (we are also talking about systems in the upper three-digit range in each case), (...)"
Four-digit, surely? The card alone already costs a thousand and something.

Andy197

Veteran

196 comments 96 likes

If AMD really can't go by serial number, things are going to get pretty uncomfortable for them :/
I'm curious how long this topic will keep buzzing around the forums.

Igor Wallossek

10,193 comments 18,807 likes

Unit counts, not price :D

konkretor

Veteran

297 comments 300 likes

The question is whether enough replacement parts can be delivered in such a short time. Are the replacement parts free of defects, too? Can the increased demand be met, now that quantity X more is needed?

I suspect they want to avoid announcing a delivery stop.

Andy197

Veteran

196 comments 96 likes

Well, it has already been said that nobody knows when the warehouses will be full again for exchanges :D I guess the cooler isn't stockpiled in abundance anywhere either. It's surely calculated so that there are pretty much exactly as many coolers as GPUs "rolling off the line".

Igor Wallossek

10,193 comments 18,807 likes

There has long been one ;)

And quantity X cannot be put into numbers, since the chambers were obviously installed unmatched. It's like with COVID: every undetected spreader turns larger groups into at-risk patients. Anything is possible, nothing is certain. A tricky situation, but everyone involved has only themselves to blame. If not even trivial quality control loops take effect, then this is a homemade problem that was bound to happen.

I'm not even a business graduate (too dry and boring for me), but everyone who picks up a tool in order to sell something later (or to be able to evaluate it fairly) really should know a few basics. As an editor, you should also have dragged your ass through various factories in order to be able to test products truly objectively later. You simply have to recognize (and acknowledge) realities.

7 likes

Derfnam

Old-timer

7,517 comments 2,029 likes

Too early, like I said. Thanks for clearing that up. I wonder if there will be a Q&A on the QA ^^?
From whomever.

Inxession

Member

51 comments 43 likes

An absolutely awful situation for everyone involved.

However, I find AMD's handling of the problem understandable.
And considerably more "open" than NVIDIA has ever been in communicating any problems (the 970 / the new connector / etc.).

It is true, though, that it takes time for something like this to get going, and that until then every word really ought to be held back.

Well... a real pity.
AMD of all companies didn't deserve something like this.

cunhell

Old-timer

549 comments 503 likes

That's surely all correct. But if even one of the partners claims to have done A without actually having carried it out, then a single participant is enough to reduce quality management to absurdity. Especially in China, with its zero-COVID strategy and its effects on companies, there is a temptation to turn a blind eye now and then when business is at stake. With fixed delivery dates and too few people and too little time, things may not be done quite so carefully.

Cunhell

RazielNoir

Veteran

337 comments 111 likes

That's not only the case in China. As long as things go well and nobody complains or makes a big fuss about it, companies in the rest of the world also often enough try to do only what makes money. And when in doubt, they "turn a blind eye" to everything else.

1 like

RX Vega_1975

Old-timer

575 comments 75 likes

@Igor Wallossek
Roughly how many cards were actually sold, then, if there are supposed to be that many batches and one batch comprises 10,000 units?
That would really have to be a good 100,000 cards, XT and XTX combined, or far more?

Igor Wallossek

10,193 comments 18,807 likes

I think more by now. Besides, it is not certain that all 10,000 chambers in a batch are equally affected; rather, a batch contains more or fewer defect-free units, and the defective ones are damaged to varying degrees. That can be 10,000, 5,000 or just 10. Or maybe 8,000 slightly defective ones that nobody notices. Nobody knows, because QA failed completely.

3 likes

Legalev

Member

46 comments 50 likes

That kind of thing happens everywhere.
I work in the building materials trade, and the things I sometimes see in how projects are awarded and who then carries them out...
It can always be done cheaper, and then a crew from Romania does the job and all the drywall in a new building has to be completely ripped out again, with damages of almost 100K.
So, everyday life.

4 likes

LurkingInShadows

Old-timer

1,348 comments 551 likes

Unmatched, without logging all the components? Whoever came up with that really deserves to be shot.

It doesn't have to be completely congruent; for example, nobody is going to throw away a working VC just because the fins on the cooler with the identical number are bent and it gets scrapped. BUT it has to be logged, or be evident from the log, that from that point on the numbers differ by one.

RX Vega_1975

Old-timer

575 comments 75 likes

So sales of the XTX are being halted completely, as described, and this has already been the case since December 20!
Or is there still hope for new cards if the problem can be completely solved, or has that ship already sailed, and in a few months nobody will want to buy the MBA 7900 XTX cards anymore?

Oberst

Veteran

337 comments 131 likes

And the communication was anything but good, too. Now AMD apparently does pay for return shipping in the case of a refund after all; that was initially communicated differently. Or did I misunderstand the letter you quoted?
I also find this interesting: on Geizhals, MBA XTX cards from PowerColor, Asus and Sapphire are listed as immediately available. Are the potentially defective cards still being sold, or are those already new ones with a defect-free VC? It would be utterly stupid to keep selling potentially defective cards.

FfFCMAD

Old-timer

670 comments 174 likes

AMD of all companies did deserve something like this.

2 likes

Megaone

Old-timer

1,746 comments 1,645 likes

First of all, thanks for the superbly researched article. Hardly anyone but you can pull that off. At this point, greetings in advance to everyone who will once again try to generate clicks with your know-how and your connections:

"Hello, you amateurs, this is what good journalism looks like!"

Sorry, but that had to be said.

For me, the key point in your report is the following sentences.

"Because there are now also stripped-down and not completely assembled PCs sitting in storage or on standby, at rates of well over 10%. This may also be because SIs test in a much more targeted way than many buyers of individual cards (due to lack of knowledge)."

For me, that will be exactly the point by which I measure AMD's seriousness. Will they try to keep the RMA rate low at the customers' expense, or will they look for ways to reach every affected customer?

3 likes


About the author

Igor Wallossek

Editor-in-chief and name-giver of igor'sLAB as the content successor of Tom's Hardware Germany, whose license was returned in June 2019 in order to better meet the qualitative demands of web content and challenges of new media such as YouTube with its own channel.

Computer nerd since 1983, audio freak since 1979 and pretty much open to anything with a plug or battery for over 50 years.
