ARM Linux Server
Important Notes
- The welcome message is in the file /etc/motd, and ASCII art looks great in there.
- OS choices
  - Debian
  - Fedora
  - Arch
  - Ubuntu (may even damage firmware)
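As a small illustration of the /etc/motd note above, the sketch below writes a banner to a preview file first, so the art can be checked before it is copied into place. The MOTD_FILE variable, the preview path, and the banner text are made up for this example.

```shell
#!/bin/sh
# Sketch: build a login banner in a scratch file before installing it.
# MOTD_FILE and the banner text are placeholders, not part of any standard.
MOTD_FILE="${MOTD_FILE:-${TMPDIR:-/tmp}/motd.preview}"

# Quoted 'EOF' keeps the shell from expanding anything inside the banner.
cat > "$MOTD_FILE" <<'EOF'
+--------------------------------+
|  Welcome to the ARM server     |
+--------------------------------+
EOF

# Show the result; once it looks right, install it with:
#   sudo cp "$MOTD_FILE" /etc/motd
cat "$MOTD_FILE"
```

Keeping the banner in a scratch file means a typo in the art never reaches the real login screen.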
Useful Software
Desktop Environments
- Xfce is one of the best desktop environments and is easy to configure.
  - To install it (on Debian or Ubuntu): sudo apt install xfce4
  - To start it: run startxfce4 from a TTY.
Fixes
SLURM GPU segregation fix
Slurm uses NVML to detect the number of GPUs and to keep them separated between running jobs, but this API was not working in my case.
- Find the library location: sudo find / -name "libnvidia-ml.so". In my case this was /usr/local/cuda-12.2/targets/sbsa-linux/lib/stubs/libnvidia-ml.so
- Copy this to /usr/lib
- Add the header location to /etc/profile. In my case it was
  export C_INCLUDE_PATH="/usr/local/cuda-12.2/targets/sbsa-linux/include:$C_INCLUDE_PATH"
  export CPLUS_INCLUDE_PATH="/usr/local/cuda-12.2/targets/sbsa-linux/include:$CPLUS_INCLUDE_PATH"
- By default /usr/lib should already be in the ldconfig search path; if it is not, one can add it in /etc/ld.so.conf.d. Then run sudo ldconfig
- Here is also some C code to check NVML. Build and run it with gcc nvml.c -lnvidia-ml; ./a.out. If it prints the correct number of GPUs, I guess everything should be configured.
#include <stdio.h>
#include <nvml.h>

int main() {
    nvmlReturn_t result;
    unsigned int device_count = 0;

    // Initialize NVML library
    result = nvmlInit();
    if (NVML_SUCCESS != result) {
        printf("Failed to initialize NVML: %s\n", nvmlErrorString(result));
        return 1;
    }

    // Get the number of GPUs
    result = nvmlDeviceGetCount(&device_count);
    if (NVML_SUCCESS != result) {
        printf("Failed to get device count: %s\n", nvmlErrorString(result));
        nvmlShutdown();
        return 1;
    }
    printf("Number of GPUs detected: %u\n", device_count);

    // Clean up NVML library
    result = nvmlShutdown();
    if (NVML_SUCCESS != result) {
        printf("Failed to shutdown NVML: %s\n", nvmlErrorString(result));
        return 1;
    }
    return 0;
}
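Before compiling the test program above, it can help to confirm that the dynamic linker actually sees the library. The sketch below only inspects the output of ldconfig -p. The function name has_nvml and the variable nvml_status are made up for this example, and a positive result says nothing about whether Slurm itself will pick the library up.

```shell
#!/bin/sh
# Sketch: check whether libnvidia-ml shows up in the linker cache.
# has_nvml is a hypothetical helper; it reads a cache listing on stdin.
has_nvml() {
    grep -q 'libnvidia-ml\.so'
}

# ldconfig -p prints the libraries currently in the cache.
# It may live in /sbin, which is not always on a normal user's PATH.
if { ldconfig -p 2>/dev/null || /sbin/ldconfig -p 2>/dev/null; } | has_nvml; then
    nvml_status="found"
else
    nvml_status="missing"
fi
echo "libnvidia-ml in linker cache: $nvml_status"
```

If this reports missing after the copy step, rerunning sudo ldconfig is the first thing to try.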
Nvidia kernel stuff
- To rebuild the kernel modules, simply run sudo akmods --force
- Logs should be in /var/cache/akmods/nvidia/{Kernel Version}
- Check the kernel version with uname -srm
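To tie the log location and the kernel version together, the sketch below looks for akmods logs that mention the running kernel. The exact log file naming is an assumption here, so it simply lists anything in the directory whose name contains the kernel release. LOG_DIR is a variable introduced for this example.

```shell
#!/bin/sh
# Sketch: list akmods logs that match the running kernel.
# LOG_DIR follows the location given above; file naming inside it is assumed.
LOG_DIR="${LOG_DIR:-/var/cache/akmods/nvidia}"
kver="$(uname -r)"   # kernel release only, i.e. the middle field of uname -srm

echo "Running kernel: $kver"
# "|| true" keeps the script going when nothing matches.
matches="$(ls "$LOG_DIR" 2>/dev/null | grep -F -- "$kver" || true)"
if [ -n "$matches" ]; then
    echo "$matches"
else
    echo "No logs for $kver found under $LOG_DIR"
fi
```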
Bad ideas
- Never repoint /usr/bin/gcc from the default gcc-14 (or whatever your version is) to /usr/bin/gcc-13. The same goes for g++. It breaks core system components, including the akmod build of the Nvidia drivers.
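If an older compiler is needed for one project, a safer route is usually to select it per build rather than touching /usr/bin/gcc. The sketch below assumes versioned binaries named gcc-13 and g++-13 are installed alongside the default; adjust the names to what your distribution actually ships.

```shell
#!/bin/sh
# Sketch: choose an older compiler for this shell session only.
# gcc-13 / g++-13 are assumed names; /usr/bin/gcc is left untouched.
export CC=gcc-13
export CXX=g++-13

echo "CC=$CC CXX=$CXX"

# Typical ways to pass the choice on to a build:
#   make CC="$CC" CXX="$CXX"
#   cmake -DCMAKE_C_COMPILER="$CC" -DCMAKE_CXX_COMPILER="$CXX" ..
```

Because the variables only live in the current shell, system tools such as akmods should keep using the default compiler.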