Tuesday, October 12, 2021

Tuning Microsoft Windows for CPU Cryptocurrency Mining

An earlier post covered setup and basic configuration of a Monero crypto miner on Microsoft Windows.

The PC had 16GB of RAM and rarely used more than 9GB. In the Performance tab of Task Manager, the Memory view shows the amount of RAM in use and the amount compressed. RAM compression uses a small (and typically unnoticeable) amount of CPU to compress RAM pages which are not being used. A compressed page can't be used by a process until it is uncompressed, but the idea is that compressing RAM may avoid swapping pages to disk in a low-memory situation. Certainly reading swapped pages from disk should be considerably slower than using CPU to uncompress RAM pages which are already in memory.

For a machine which is not using all of its RAM, this is a solution looking for a problem: if the machine is not swapping, then there is no swapping slowdown to avoid. Compressing RAM when there is no RAM shortage slightly increases CPU use. While the small additional CPU use may not be noticeable on many workloads, this speculative RAM optimization can be noticeable and detrimental when the CPU is already near 100% usage, as is typical when mining a cryptocurrency on CPUs.


With the machine not in danger of swapping RAM pages to disk, let's turn off RAM compression. In the worst case, if processes need more RAM than the installed 16GB, some pages will be swapped to disk, as has been standard for decades.

In this example with the Monero miner and a web browser and other applications, Task Manager shows the machine had 5.2GB available and 1.6GB compressed of 15.9GB total RAM. Even if the compressed RAM is uncompressed at triple the compressed size, everything should still fit into the real 15.9GB of RAM.
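The arithmetic behind that claim can be sketched in a few lines of Python. This is only an illustration: the triple-size expansion is the assumption made above, the 1.6GB figure is read as the space the compressed pages currently occupy, and the function name is a hypothetical helper, not part of any tool.

```python
# Figures read from Task Manager in the example above, in GB.
total_gb = 15.9
available_gb = 5.2
compressed_gb = 1.6

# Assumption from the text: compressed pages expand to triple their size.
assumed_ratio = 3.0


def headroom_after_decompression(available, compressed, ratio):
    """Return the GB still available if every compressed page were stored uncompressed."""
    expanded = compressed * ratio          # size of the pages once uncompressed
    extra_needed = expanded - compressed   # additional RAM they would occupy
    return available - extra_needed


headroom = headroom_after_decompression(available_gb, compressed_gb, assumed_ratio)
print(f"Headroom after decompression: {headroom:.1f} GB of {total_gb} GB")
```

A positive result means everything still fits in physical RAM under that assumption. With these numbers about 2GB remains available.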


Let's look at the current RAM compression setting. Right-click on the Start Button and choose "Windows PowerShell (Admin)". Then in PowerShell run this command: Get-MMAgent

This shows "MemoryCompression" is set to True. This confirms what can be seen in the Task Manager, that some amount of the memory was "Compressed". Get some statistics with: Get-Process -Name "Memory Compression"

Disable RAM compression with: Disable-MMAgent -MemoryCompression

Reboot so the change will take effect. After reboot, use PowerShell to check that the setting is False with: Get-MMAgent

Run your typical processes, and also run the Monero crypto miner. Look at Task Manager. In this example, Task Manager shows about 7GB of RAM available without using compression. Note that "(Compressed)" is zero.


Presumably the Microsoft Windows 10 developers put considerable effort into optimizing the RAM compression code. In general use, we should expect turning off the speculative RAM compression to be basically unnoticeable from a user interface standpoint. When a machine is pushed to its limits, such as nearly 100% CPU usage, the difference may become measurable, so on a CPU-bound process like crypto mining we might be able to see a performance change. I was surprised to find a consistent increase in Monero mining performance (measured in hashes per second) of over 15%. The peak hash rate increased by over 20%.

While this dramatic increase was unexpected, it shows the dangers of premature optimization ("let's do RAM compression just because it might be good"). It also shows the benefit of benchmarking before and after a change. The "huge pages" mining feature was enabled both before and after the change, so it does not account for the difference.


October 17 update on Monero miner tuning: forcing the miner to avoid CPU core 0 makes a small difference. The machine in this example has four CPU cores. The miner consistently starts three threads, which leaves one of the cores unused. The unused core was consistently core 3, and cores 0 - 2 were consistently pegged at 100% CPU usage. On a hunch that hardware interrupts were being serviced by core 0, I used Task Manager to set the processor affinity to cores 1 - 3. This had the effect of moving the miner off core 0.

Task Manager shows a process named "System interrupts". This is an indicator of interrupt activity rather than a real process. In a properly functioning system, interrupts should show nearly zero CPU usage. If a hardware driver is misbehaving, this entry can show noticeably higher CPU usage. Even when the CPU usage is low, it is possible that each interrupt briefly takes the miner code off the core, and may also disturb the miner's data in the CPU cache.


The Performance Monitor application allows viewing the percentage of interrupt time. Right-click and choose "Add counters", then look under "Processor Information" to find "% Interrupt Time". The graph shows the overall interrupt counter plus one counter for each of the four CPU cores. The scale on the graph is increased from 1 to 100 to highlight the differences. CPU core 0 (colored red) frequently spikes. The overall interrupt counter (colored blue) also spikes, though it is tempered by the other CPUs. The lines for CPUs 1 - 3 are flatlined against the bottom of the graph. This shows that interrupts are being serviced by CPU 0. Therefore, removing the mining process from CPU 0 should provide smoother process execution, which should show up as better program throughput.


Looking at the hashes per second of the Monero miner process, the high-water mark increased about 6%, and over a day the average hashes per second increased about 3%. For a mining process, a 3% increase in performance at no hardware cost is a nice improvement. A more precise performance measurement would need a dedicated machine running without other programs and measured for several days.

The following line has been added to the command file to set the processor affinity when the miner is started:
start /BELOWNORMAL /affinity E /b E:\tech\crypto\monero\GUIs\xmrig-6.15.1-msvc-win64\xmrig-6.15.1\xmrig.exe -o pool.supportxmr.com:443 -u 44YkNzrdJnmAqwoTRMPV2e4fihv3AfZ5vMmrtaWCGQaxKuXp5eiZeRREzhPi9tVqGuA3VBRriNwKa2QxmZmRGmHVTkUSTMc -k --tls -p runDemoA

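While affinity "E" works for this four-core system, the right value depends on your CPU core count along with available hyper-threads. The affinity value is a binary number with one bit per logical CPU, where the rightmost bit is CPU 0 and a 1 means the process may run on that CPU. "E" is hexadecimal for binary 1110, which allows CPUs 1, 2 and 3 and excludes CPU 0. Please view this post to determine the binary number for your machine. Then view this website to convert the binary number to hexadecimal to use with the "affinity" flag of the "start" command.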

Affinity "E" will put threads on CPU cores 1, 2 and 3, so use this for a process with three threads. With only one or two threads, the threads will typically shift between the allowed cores over time, which can be seen in the Task Manager CPU view. Affinity "A" (binary 1010) would suit two threads on CPU cores 1 and 3. Please verify the number of threads in the program, and match that to specific CPU cores by using the correct hexadecimal value, which you can find via the prior links.
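As an alternative to converting by hand, the mask can be computed with a short Python sketch. The helper names below are hypothetical and exist only for this illustration. Core numbers are zero-based, as shown in Task Manager.

```python
def affinity_mask(cores):
    """Return the hexadecimal affinity mask for an iterable of zero-based core numbers."""
    mask = 0
    for core in cores:
        if core < 0:
            raise ValueError("core numbers must be non-negative")
        mask |= 1 << core  # setting bit N allows the process to run on CPU N
    return format(mask, "X")


def cores_from_mask(hex_mask):
    """Return the sorted list of core numbers enabled in a hexadecimal mask."""
    mask = int(hex_mask, 16)
    return [bit for bit in range(mask.bit_length()) if (mask >> bit) & 1]


print(affinity_mask([1, 2, 3]))  # E
print(affinity_mask([1, 3]))     # A
print(cores_from_mask("E"))      # [1, 2, 3]
```

The resulting string is what goes after the "/affinity" flag of the "start" command. Running cores_from_mask on a value before using it is a quick way to confirm that core 0 is excluded.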


Tuning update for CPU hyper-threading:

Hyper-threading allows a CPU core to support, typically, two separate threads. The idea is that usually threads do not use the maximum available CPU on a core (not completely CPU bound) because program code often waits for network or disk or user interaction. During that time of waiting, the other thread can run on the CPU core. If the CPU core is already 100% consumed by a thread then adding another thread does not increase throughput.

BIOS settings can often turn off CPU hyper-threading, though the option may not be available in some laptop BIOS versions. Comparing runs with hyper-threading switched on and off in the BIOS, the miner ran about 2.4% better with hyper-threading off. Throughput went from 826 hashes per second to 846 hashes per second. This was tested on two separate machines of the same make and model, with similar performance improvements on both machines.
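The percentage follows directly from the two hash rates quoted above. Here is a minimal Python sketch of the before-and-after comparison; the function and variable names are just for illustration.

```python
def percent_change(before, after):
    """Return the relative change from before to after, as a percentage."""
    return (after - before) / before * 100.0


# Hash rates from the hyper-threading test above, in hashes per second.
hyperthreading_on = 826
hyperthreading_off = 846

gain = percent_change(hyperthreading_on, hyperthreading_off)
print(f"Improvement with hyper-threading off: {gain:.2f}%")
```

The same function works for any of the before-and-after benchmarks in this post, as long as both measurements are taken under the same workload.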


This page has instructions for using an account with Admin privilege to give a Windows user the "Lock pages in memory" right, which the miner needs for its huge pages feature. To summarize, run the Local Security Policy editor, choose Local Policies, choose User Rights Assignment, double-click Lock Pages In Memory, and add the user.




