Commit graph

5 commits

Author SHA1 Message Date
064a9a05dc monitoring: send alert emails for failed services
this idea is based on
 https://utcc.utoronto.ca/~cks/space/blog/linux/SystemdTimersMailNotes
and the Arch Wiki section linked from it,
 https://wiki.archlinux.org/title/Systemd/Timers#MAILTO
but uses a top-level systemd override to send such alerts for all
service units on parsons, not just timers. Tested by sending SIGKILL to
monit a couple of times & receiving the emails.

We might now get two emails for some failing units, or possibly even
three: one from this override, one if the service unit failure makes the
is-system-running check fail, and one if monit also notices the service
not running. On the other hand, we now also get emails if monit fails.
2025-02-01 17:00:03 +01:00
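
The email count discussed in this commit message can be sketched as follows. This is a minimal Python illustration, not part of the deployed configuration: the function name, its parameters, and the source labels are hypothetical, and it assumes that a dead monit cannot report anything itself.

```python
def alert_sources(monit_alive=True, monit_watches_service=False):
    """Return the mechanisms that would email about one failed service unit.

    Hypothetical model of the setup described in the commit message.
    """
    # The top-level systemd override fires for every failed service unit.
    sources = ["systemd override"]
    if monit_alive:
        # A failed unit makes monit's is-system-running check fail as well.
        sources.append("monit: is-system-running")
        if monit_watches_service:
            # monit additionally notices that the watched service is gone.
            sources.append("monit: service not running")
    return sources
```

Under these assumptions an ordinary failing unit yields two emails, a unit that monit also watches yields three, and a failure of monit itself yields only the systemd one.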
cabc8706a3 hotfix: set monit onlyoffice (re)start to config 2024-06-19 21:01:03 +02:00
efadc5ada9 monit: increase delay for deployed-commit-on-main
there's little point in having it alert while people are working on the
config & test-deploying things; it's meant to remind us later, in case we
forget to commit the result.
2024-05-08 14:33:14 +02:00
8c3d3bf6db monitoring: warn if no deploy for 10 days
this is not entirely accurate — the lastModified attribute of a flake's
self-input gives the date of the last commit, not the last deploy. But I
figure it's close enough and less obscure to check than reading in the
last date via nix-env.

inspired by: we did no server updates for two weeks.
2024-05-02 22:33:47 +02:00
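
The age comparison behind this check can be sketched as below. This is a hedged Python illustration only, since the actual check lives in the monit and Nix configuration: `deploy_is_stale` and its constants are hypothetical names, and the sketch assumes the flake's `lastModified` value is passed in as a Unix timestamp in seconds.

```python
import time

MAX_AGE_DAYS = 10            # threshold named in the commit subject
SECONDS_PER_DAY = 24 * 60 * 60


def deploy_is_stale(last_modified, now=None):
    """Return True if the last commit is more than MAX_AGE_DAYS old.

    last_modified: Unix timestamp (seconds) of the flake's last commit,
    used as an approximation of the last deploy date.
    """
    if now is None:
        now = time.time()
    return now - last_modified > MAX_AGE_DAYS * SECONDS_PER_DAY
```

As the commit message notes, this measures the age of the last commit rather than the last deploy, so a redeploy without a new commit would not reset the warning.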
d20acbfe58 monit: a couple new checks
move the monit config out of mail.nix, and add two checks:
 - has any systemd unit failed?
 - is the currently deployed commit the tip of the main branch of
   haccfiles?
2024-04-07 16:30:57 +02:00