A watchdog device, assisted by a watchdog daemon, monitors the server to ensure it is active and healthy. Every 30 seconds (the interval is adjustable), the daemon checks whether everything is functioning correctly. If it is, nothing happens; if not, the watchdog device can take a configured action. In my case, I usually have the device perform a hard reboot of the server to restore it to a reliable state.
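The feed-or-reset idea described above can be sketched in a few lines of shell. This is only an illustration, not the code the real daemon runs: the function name `feed_watchdog` and its four parameters are hypothetical, and it is meant to be tried against a scratch file, since on a Proxmox node the real device is normally already held by the system's own watchdog daemon. It relies on two documented behaviours of Linux watchdog devices: any write resets the countdown, and writing `V` just before closing asks the driver to disarm the timer (where the driver supports this "magic close").

```shell
# Hypothetical helper, for illustration only.
# Usage: feed_watchdog DEVICE INTERVAL_SECONDS HEALTH_CHECK_COMMAND MAX_ROUNDS
# Example against a scratch file: feed_watchdog /tmp/fake-watchdog 30 true 5
feed_watchdog() {
    device="$1"
    interval="$2"
    health_check="$3"
    max_rounds="$4"

    # On a real watchdog device, opening it arms the hardware timer.
    exec 3> "$device"

    status=0
    round=0
    while [ "$round" -lt "$max_rounds" ]; do
        if "$health_check"; then
            # Healthy: any write resets the countdown.
            printf '.' >&3
        else
            # Unhealthy: stop feeding so the timer can expire and reset the machine.
            status=1
            break
        fi
        sleep "$interval"
        round=$((round + 1))
    done

    # Clean exit only: request a disarm via the "magic close" character.
    if [ "$status" -eq 0 ]; then
        printf 'V' >&3
    fi
    exec 3>&-
    return "$status"
}
```

The key design point is the failure branch: the loop does nothing active when the check fails. It simply stops writing, and the reset is left to the hardware, which is what makes the mechanism work even when the operating system itself is stuck.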
Enable the Intel TCO watchdog timer by running:
echo "WATCHDOG_MODULE=iTCO_wdt" | sudo tee -a /etc/default/pve-ha-manager
Reboot your Proxmox server.
reboot now
You can verify which timer is being used with wdctl.
It will output something similar to:
Device: /dev/watchdog0
Identity: iTCO_wdt [version 0]
Timeout: 10 seconds
Timeleft: 10 seconds
You can test the watchdog timer using:
echo c > /proc/sysrq-trigger
Run this as root; it deliberately crashes the kernel. The system should hang and then reboot once the watchdog timeout expires.
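Once the server is back up, it can be worth confirming that the expected driver is still the active one. The snippet below is a small sketch of that check: `watchdog_driver` is a hypothetical helper name, and it assumes the `Identity:` line has the same shape as in the sample wdctl output shown earlier.

```shell
# Hypothetical helper: read wdctl-style output on stdin and print
# the driver name taken from the "Identity:" line.
watchdog_driver() {
    awk -F': *' '$1 == "Identity" { split($2, parts, " "); print parts[1] }'
}

# Typical use on the server (not executed here):
#   wdctl | watchdog_driver
#   [ "$(wdctl | watchdog_driver)" = "iTCO_wdt" ] && echo "iTCO watchdog active"
```

Reading from stdin keeps the helper easy to try against saved output before running it on a live system.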