Debian

Notes about my coexistence with Debian.

JupyterLab

Jupyter lab setup:

This is the systemd unit. I had to set Group to www-data so that the Nextcloud data (a symlink to /mnt/data/nextcloud_data/<YOUR USER>/files) would show up:

# /etc/systemd/system/jupyterlab.service
[Unit]
Description=JupyterLab Service

[Service]
Type=simple
PIDFile=/run/jupyter.pid
ExecStart=/home/<YOUR USER>/.venv/bin/jupyter lab --config=/home/<YOUR USER>/.jupyter/jupyter_lab_config.py
User=<YOUR USER>
Group=www-data
WorkingDirectory=/home/<YOUR USER>/
Restart=always
RestartSec=10

[Install]
# sudo systemctl set-default multi-user.target
WantedBy=multi-user.target

NVIDIA setup for TensorFlow

I'm running this version on a headless server:

Distributor ID: Debian
Description:    Debian GNU/Linux 12 (bookworm)
Release:        12
Codename:       bookworm

Notes:

There is also an official guide: https://docs.nvidia.com/cuda/cuda-installation-guide-linux/

After installing those drivers I could no longer start KDE, but selecting the X11 session (instead of Wayland) on the login screen worked fine.

Some commands:

echo $WAYLAND_DISPLAY
lspci | grep -i nvidia
lspci -nn | grep -Ei "3d|display|vga"
# install stuff
sudo apt update
sudo apt upgrade
sudo apt install -y linux-headers-amd64
sudo apt install -y nvidia-driver firmware-misc-nonfree
sudo apt install -y nvidia-cuda-dev nvidia-cuda-toolkit
sudo apt install -y systemd-timesyncd
sudo apt install -y python3.11 python3.11-venv python-is-python3
sudo apt install -y bash-completion
# reboot now

# For the errors about cuDNN, cuBLAS, etc. that show up when loading TensorFlow.
sudo apt-get -y install nvidia-cudnn

To install TensorFlow:

python -m venv .venv
source .venv/bin/activate
pip install --upgrade pip
pip install 'tensorflow[and-cuda]'
pip install tensorrt

To check that the GPU works:

import tensorflow as tf
tf.config.list_physical_devices('GPU')

Example output:

Python 3.11.2 (main, Aug 26 2024, 07:20:54) [GCC 12.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as tf
2024-12-02 18:37:01.143947: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:485] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-12-02 18:37:01.157205: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:8454] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-12-02 18:37:01.161470: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1452] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-12-02 18:37:01.171408: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-12-02 18:37:01.987068: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
>>> tf.config.list_physical_devices('GPU')
[PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]
WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
I0000 00:00:1733175436.331024  111931 cuda_executor.cc:1015] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
I0000 00:00:1733175436.367486  111931 cuda_executor.cc:1015] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
I0000 00:00:1733175436.367676  111931 cuda_executor.cc:1015] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
>>> print("Num GPUs Available: ", len(tf.config.list_physical_devices('GPU')))
Num GPUs Available:  1
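The interactive check above can also be wrapped in a small function, which is handy from a notebook or a script. This is only a sketch: the helper name `list_gpus` is made up for these notes, and it returns `None` when TensorFlow is not importable instead of raising.

```python
def list_gpus():
    """Return the names of the GPUs TensorFlow can see.

    Returns None if TensorFlow is not installed in the current environment
    (hypothetical helper for these notes, not part of TensorFlow).
    """
    try:
        import tensorflow as tf
    except ImportError:
        return None
    # Same call as in the interactive session; each PhysicalDevice has a .name.
    return [device.name for device in tf.config.list_physical_devices("GPU")]


gpus = list_gpus()
if gpus is None:
    print("TensorFlow is not installed in this environment")
else:
    print("Num GPUs Available: ", len(gpus))
```

An empty list means TensorFlow loaded but found no usable GPU, which usually points back at the driver or CUDA library setup.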

To monitor GPU memory usage, you can use nvidia-smi:

watch -n 1 nvidia-smi
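To get the same numbers from Python (for example inside a notebook), nvidia-smi can print them as CSV with its `--query-gpu` and `--format` options. The sketch below assumes every GPU reports numeric memory values in MiB; the function names `parse_memory_csv` and `gpu_memory` are made up for these notes.

```python
import subprocess

# Ask nvidia-smi for used and total memory only, as bare CSV numbers in MiB.
QUERY = [
    "nvidia-smi",
    "--query-gpu=memory.used,memory.total",
    "--format=csv,noheader,nounits",
]


def parse_memory_csv(text):
    """Parse 'used, total' lines into a list of (used, total) tuples, one per GPU."""
    rows = []
    for line in text.strip().splitlines():
        used, total = (int(field.strip()) for field in line.split(","))
        rows.append((used, total))
    return rows


def gpu_memory():
    """Run nvidia-smi once and return [(used_mib, total_mib), ...]."""
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_memory_csv(result.stdout)
```

Calling `gpu_memory()` on the server returns one tuple per GPU. The parsing is kept separate from the subprocess call so it can be tried on a pasted line of output without a GPU present.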

Nextcloud

Setup notes for Nextcloud on Debian.

Data

Made an fstab entry for the data:

# /etc/fstab entry for the data HDD
UUID=34870b78-32e4-4beb-96c4-0d1881d3661f   /mnt/data   ext4    users,noatime,nofail 0 0 
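As a reminder of what each column in that entry means, here is a small sketch that splits an fstab line into its six fields. The names `FstabEntry` and `parse_fstab_line` are invented for illustration. For the options used above: `users` lets any user mount and unmount the filesystem, `noatime` skips access-time updates, and `nofail` lets boot continue if the disk is missing.

```python
from typing import List, NamedTuple, Optional


class FstabEntry(NamedTuple):
    spec: str            # device, here given as UUID=...
    mount_point: str     # where the filesystem is mounted
    fstype: str          # filesystem type, e.g. ext4
    options: List[str]   # comma-separated mount options, split into a list
    dump: int            # fifth field (0 = not dumped)
    passno: int          # sixth field, fsck order (0 = never checked at boot)


def parse_fstab_line(line: str) -> Optional[FstabEntry]:
    """Split one fstab line into fields; return None for blanks and comments."""
    line = line.strip()
    if not line or line.startswith("#"):
        return None
    fields = line.split()
    spec, mount_point, fstype, options = fields[:4]
    # The last two fields are optional and default to 0.
    dump = int(fields[4]) if len(fields) > 4 else 0
    passno = int(fields[5]) if len(fields) > 5 else 0
    return FstabEntry(spec, mount_point, fstype, options.split(","), dump, passno)


entry = parse_fstab_line(
    "UUID=34870b78-32e4-4beb-96c4-0d1881d3661f   /mnt/data   ext4    users,noatime,nofail 0 0 "
)
print(entry.mount_point, entry.options)
```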

Setup with docker-compose

docker-compose worked after a bit of fiddling with the file: docker-compose.yml.

Auto-start at boot: https://stackoverflow.com/a/53569049

cd ~/Software
# The "-d" (detached) flag is key for it to start at boot.
sudo docker-compose -f docker-compose.yml up -d
sudo systemctl enable docker

Add the user to the www-data group so that it can access the Nextcloud contents.

sudo usermod -aG www-data YOUR_USER
ln -s /mnt/data/nextcloud_data/YOUR_USER/files ~/Nextcloud
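To confirm that the link resolves and that the current process can actually read it (for example from a JupyterLab kernel), a quick check along these lines may help. It is a sketch only: `describe_link` is a made-up helper. Keep in mind that group changes made with usermod normally apply only to new login sessions, so a shell or service started earlier will still lack the www-data group.

```python
import os


def describe_link(link_path):
    """Report whether a path is a symlink, where it points, and if it is readable."""
    link_path = os.path.expanduser(link_path)
    return {
        "is_symlink": os.path.islink(link_path),
        # Fully resolved target of the link.
        "target": os.path.realpath(link_path),
        # Listing a directory needs both read and execute permission.
        "readable": os.access(link_path, os.R_OK | os.X_OK),
    }


print(describe_link("~/Nextcloud"))
```

If `readable` comes back `False` while the target looks right, the process is most likely running without the www-data group.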

Creating WebDAV mounts on the Linux command line

This isn't actually needed, because the Nextcloud data of user <YOUR USER> is already available directly at ~/Nextcloud, so the WebDAV mount is unnecessary.

Tutorial: https://docs.nextcloud.com/server/20/user_manual/en/files/access_webdav.html#creating-webdav-mounts-on-the-linux-command-line

Install davfs2 with sudo apt install davfs2.

Answer yes when it asks whether to set the "SUID bit" so that unprivileged users can mount WebDAV resources.

I skipped the part about restarting Jupyter because I was using it to do this... Afterwards, run systemctl restart jupyterlab.service.

sudo usermod -aG davfs2 <YOUR USER>

Setup with AIO

This didn't work because I couldn't figure out how to use it without HTTPS, nor how to configure it with HTTPS using only the IP (without a domain).

I also tried making it use grinch.local, but opening https://grinch.local didn't work either, because the setup tried to obtain a certificate from some external source.

sudo mkdir -p /mnt/data/nextcloud_data
sudo docker run \
    --init \
    --sig-proxy=false \
    --name nextcloud-aio-mastercontainer \
    --restart always \
    --publish 80:80 \
    --publish 8080:8080 \
    --publish 8443:8443 \
    --volume nextcloud_aio_mastercontainer:/mnt/docker-aio-config \
    --volume /var/run/docker.sock:/var/run/docker.sock:ro \
    --env NEXTCLOUD_DATADIR="/mnt/data/nextcloud_data" \
    --env SKIP_DOMAIN_VALIDATION=true \
    nextcloud/all-in-one:latest
# Teardown: stop the AIO containers, then remove leftover containers, network, volumes and data.
sudo docker stop nextcloud-aio-mastercontainer
sudo docker stop nextcloud-aio-domaincheck
sudo docker stop nextcloud-aio-apache nextcloud-aio-notify-push
sudo docker stop nextcloud-aio-nextcloud nextcloud-aio-imaginary nextcloud-aio-redis nextcloud-aio-database nextcloud-aio-talk nextcloud-aio-collabora
sudo docker ps --format '{{.Names}}'
sudo docker ps --filter "status=exited"
sudo docker container prune
sudo docker network rm nextcloud-aio
sudo docker volume ls --filter "dangling=true"
sudo docker volume prune
sudo rm -rf /mnt/data/nextcloud_data
sudo docker volume ls --format '{{.Name}}'