{"id":379,"date":"2017-11-30T12:09:42","date_gmt":"2017-11-30T11:09:42","guid":{"rendered":"https:\/\/tollana.d-tor.org\/notes-to-self\/?p=379"},"modified":"2017-11-30T12:10:15","modified_gmt":"2017-11-30T11:10:15","slug":"ryzen-part-3","status":"publish","type":"post","link":"https:\/\/tollana.d-tor.org\/notes-to-self\/?p=379","title":{"rendered":"Ryzen, Part 3"},"content":{"rendered":"<h3>Take 3<\/h3>\n<p>Of course neither the new memory nor the BIOS update fixed the random crashes, but fortunately I&#8217;ve got another hint: Maybe it&#8217;s a Linux-specific problem, and not one with memory timing. This <a href=\"https:\/\/bugs.launchpad.net\/ubuntu\/+source\/linux\/+bug\/1690085\">post<\/a> suggests that it&#8217;s a problem with C-States and RCU. Adding the following kernel command line parameters should help:<\/p>\n<pre>processor.max_cstate=1\nrcu_nocbs=0-11<\/pre>\n<p>As always, nothing is as simple as it seems \ud83d\ude41<\/p>\n<h3>The C-States<\/h3>\n<p>For max_cstate to take effect, I had to actually enable Global C-State-Control in the UEFI-thingy of my mobo! The default was Auto, which in turn defaulted to Disabled. After enabling it, dmesg reported this:<\/p>\n<pre>ACPI: ACPI: processor limited to max C-state 1<\/pre>\n<p>Before that, there was no mention of C-states in the kernel log, so I doubt that it has any impact, but one should never give up hope!<\/p>\n<h3>The RCU-Thingy<\/h3>\n<p>What it is (<a href=\"https:\/\/lwn.net\/Articles\/262464\/\">quoting Paul E. McKenney on LWN<\/a>):<\/p>\n<pre>Read-copy update (RCU) is a synchronization mechanism that was added to the Linux kernel in October of 2002. RCU achieves scalability improvements by allowing reads to occur concurrently with updates.<\/pre>\n<p>It&#8217;s much more likely to be the cause of the problem since I once saw something like this after a reboot on the console:<\/p>\n<pre class=\"bz_comment_text\">NMI watchdog: BUG: soft lockup - CPU#12 stuck for 23s! 
[DOM Worker:1364]<\/pre>\n<p>The process was &#8220;systemd&#8221; instead of &#8220;DOM Worker&#8221;, but that shouldn&#8217;t matter. Anyway, the parameter rcu_nocbs=0-11 has no effect if the kernel config option CONFIG_RCU_NOCB_CPU is not set. Guess what, in the stock Arch Linux kernel it&#8217;s unset, lucky me!<\/p>\n<p>So I ventured out to compile a custom kernel for Arch. That turned out to be disappointingly easy! The <a href=\"https:\/\/wiki.archlinux.org\/index.php\/Kernels\/Arch_Build_System\">available documentation<\/a> just works &#8482;! Four hints, though:<\/p>\n<ol>\n<li>Uncomment &#8220;make menuconfig&#8221; in the PKGBUILD<\/li>\n<li>Edit \/etc\/makepkg.conf and set MAKEFLAGS to &#8220;-j&lt;no-processors+1&gt;&#8221; to get parallel builds<\/li>\n<li>If you have an NVIDIA graphics card and use the proprietary driver, keep in mind that you have to rebuild that one, too. Once again, that was ridiculously easy. Just install nvidia-dkms before updating the kernel with pacman -U and it will be built automagically when you install the new kernel! Detailed instructions are <a href=\"https:\/\/wiki.archlinux.org\/index.php\/Dynamic_Kernel_Module_Support\">here<\/a>.<\/li>\n<li>Don&#8217;t forget to update your grub-config with grub-mkconfig before rebooting!<\/li>\n<\/ol>\n<pre>Offload RCU callbacks from CPUs: 0-11.<\/pre>\n<p>should be in dmesg after a reboot, otherwise it didn&#8217;t work. I&#8217;m waiting with bated breath to see how it turns out!<\/p>\n<p>Previous parts of my adventure: <a href=\"https:\/\/tollana.d-tor.org\/notes-to-self\/?p=317\">Part 1<\/a>, <a href=\"https:\/\/tollana.d-tor.org\/notes-to-self\/?p=370\">Part 2<\/a>.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Take 3 Of course neither the new memory nor the BIOS update fixed the random crashes, but fortunately I&#8217;ve got another hint: Maybe it&#8217;s a Linux-specific problem, and not one with memory timing. This post suggests that it&#8217;s a problem with C-States and RCU. 
Adding the following kernel command line parameters should help: processor.max_cstate=1 rcu_nocbs=0-11 &hellip; <a href=\"https:\/\/tollana.d-tor.org\/notes-to-self\/?p=379\" class=\"more-link\">Continue reading <span class=\"screen-reader-text\">Ryzen, Part 3<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[19,79,77],"tags":[],"class_list":["post-379","post","type-post","status-publish","format-standard","hentry","category-arch-linux","category-hardware","category-linux"],"_links":{"self":[{"href":"https:\/\/tollana.d-tor.org\/notes-to-self\/index.php?rest_route=\/wp\/v2\/posts\/379","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/tollana.d-tor.org\/notes-to-self\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/tollana.d-tor.org\/notes-to-self\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/tollana.d-tor.org\/notes-to-self\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/tollana.d-tor.org\/notes-to-self\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=379"}],"version-history":[{"count":4,"href":"https:\/\/tollana.d-tor.org\/notes-to-self\/index.php?rest_route=\/wp\/v2\/posts\/379\/revisions"}],"predecessor-version":[{"id":384,"href":"https:\/\/tollana.d-tor.org\/notes-to-self\/index.php?rest_route=\/wp\/v2\/posts\/379\/revisions\/384"}],"wp:attachment":[{"href":"https:\/\/tollana.d-tor.org\/notes-to-self\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=379"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/tollana.d-tor.org\/notes-to-self\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=379"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/tollana.d-tor.org\/notes-to-self\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=379"}],"curies":[{"name":"wp","href"
:"https:\/\/api.w.org\/{rel}","templated":true}]}}