Introduction
Ever since my new Ryzen system arrived, I have had problems with random crashes. It seems to be a memory timing problem, at least that’s what Google suggests. I bought a 16GB Corsair Vengeance memory kit, consisting of 2x8GB modules, on a colleague’s suggestion. She’s using the same ASUS board with that memory without problems, but her box isn’t running 24/7. Hers is a Win10 gaming PC.
Take 1
First thing to do: update the BIOS, or UEFI, as it’s called nowadays. Of course that didn’t help. The changelog wasn’t very helpful either (“improving stability”). Thanks, ASUS!
So I ventured out and did some research on overclocking and memory timing. Turns out that Intel invented XMP (Extreme Memory Profile). Basically, it’s timing data stored on the memory module, read and used by the BIOS/UEFI. AMD didn’t want to pay the license fee, so they called their version AMP. ASUS, the mainboard manufacturer, called its implementation of AMP D.O.C.P. (Direct Over Clock Profile).
Full of hope, I turned on D.O.C.P. It set the timing data to the suggested values from this document. It didn’t help. Guess when I bought the new gear:
The crashes have absolutely nothing to do with CPU load or temperature, quite the opposite. Mostly, they happen during the night, after backup 🙁
Take 2
“What the hell”, I figured, and bought another memory kit. This time it was a 16GB G.Skill Flare X for only € 197,46. After that the box ran for a whopping 5 days without a crash, yay! It still crashed, though. This time I didn’t have X running and saw a kernel message that PID 1 (systemd) was stuck: “Soft lockup: CPU#3 stuck for 23s” (or similar). As always, I could only recover the box with a hardware reset.
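Since I only quoted that kernel message from memory, here is a minimal sketch of how one could fish such lines out of a saved kernel log. It assumes the message looks roughly like “soft lockup - CPU#3 stuck for 23s! [systemd:1]”; the exact wording varies between kernel versions, and the function name find_soft_lockups is made up for this example.

```python
import re

# Assumed message shape (varies by kernel version):
#   "... soft lockup - CPU#3 stuck for 23s! [systemd:1]"
# The trailing "[task:pid]" part is treated as optional.
LOCKUP_RE = re.compile(
    r"soft lockup\W+CPU#(?P<cpu>\d+) stuck for (?P<secs>\d+)s"
    r"(?:!?\s*\[(?P<task>[^\]]+):(?P<pid>\d+)\])?",
    re.IGNORECASE,
)


def find_soft_lockups(log_text):
    """Return (cpu, seconds, task, pid) for every soft-lockup line found.

    task and pid are None when the line carries no "[task:pid]" suffix.
    """
    hits = []
    for line in log_text.splitlines():
        m = LOCKUP_RE.search(line)
        if m:
            pid = int(m.group("pid")) if m.group("pid") else None
            hits.append((int(m.group("cpu")), int(m.group("secs")), m.group("task"), pid))
    return hits
```

If the journal is persistent, the text to scan could come from something like `journalctl -k -b -1` (kernel messages of the previous boot). After a hard lockup the last lines may never have reached the disk, so an empty result proves nothing.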
Take 2.5
So today (2017/11/24) I updated to the latest BIOS/UEFI (PRIME-X370-PRO-ASUS-3203.CAP) and down-clocked the memory to 2199 or 2133 MHz; I don’t remember the exact number. Let’s see how that turns out!
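Instead of relying on my memory for that number, the speed can be read back from the running system. The following is a rough sketch that parses the text output of `dmidecode --type memory` (needs root). The field names it looks for (“Speed”, “Configured Clock Speed”, “Configured Memory Speed”) and the units are assumptions based on common dmidecode versions, and the sample text is purely illustrative, not taken from my box.

```python
import re

# Assumed field names: "Speed" plus "Configured Clock Speed" (older dmidecode)
# or "Configured Memory Speed" (newer); units are "MHz" or "MT/s".
# Empty slots report "Unknown" instead of a number and are skipped.
SPEED_RE = re.compile(
    r"^\s*(?P<field>Speed|Configured (?:Clock|Memory) Speed):"
    r"\s*(?P<value>\d+)\s*(?:MHz|MT/s)",
    re.MULTILINE,
)


def memory_speeds(dmidecode_text):
    """Return (field_name, speed) pairs in the order they appear."""
    return [
        (m.group("field"), int(m.group("value")))
        for m in SPEED_RE.finditer(dmidecode_text)
    ]


# Illustrative sample only, not captured from the machine in this post.
SAMPLE = """\
Memory Device
\tSize: 8192 MB
\tSpeed: 2133 MHz
\tConfigured Clock Speed: 2133 MHz
"""

if __name__ == "__main__":
    for field, value in memory_speeds(SAMPLE):
        print(f"{field}: {value}")
```

The “Configured” value is the one that matters here: it reflects what the BIOS/UEFI actually set, while plain “Speed” is what the module reports as its rating.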
[To be (dis)-continued…]