ZFS e-mail
I do not have a proper monitoring system like Prometheus & Grafana or Nagios. Instead, I use a bash script on a cron job that collects zpool status from each machine and sends me a daily status email. I opted for a daily email rather than one that only fires on failure, because I like the daily assurance that the script ran: if I don’t get the email, something is wrong. I check it each morning for pool state, pool status, and any errors.
#!/bin/bash
hal_status=$(ssh [email protected] "zpool status")
gamehenge_status=$(zpool status)
debianbackup_status=$(ssh [email protected] "sudo zpool status")
m4mini_status=$(ssh [email protected] "sudo zpool status")
LOG=zfs_log.txt
cat > $LOG << DMZ
=== daily zpool status ===
=== $(date '+%A, %B %d, %Y %H:%M:%S') ===
=== Hal (samizdat) ===
$hal_status
=== gaia (gamehenge) ===
$gamehenge_status
=== debian-backup (catalyst & magellan) ===
$debianbackup_status
=== mac mini (abraxas) ===
$m4mini_status
DMZ
mail -s "Daily ZFS Status - $(date +%Y-%m-%d)" [email protected] < $LOG
Quite simple, does the job.
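The morning read-through (pool state, status, and errors) could also be condensed into one line per pool. Below is a minimal sketch, assuming the pool:, state:, and errors: labels that zpool status normally prints. The function name summarize_zpool_status is hypothetical and not part of the original script.

```bash
# Hypothetical helper (not in the original script): condense `zpool status`
# text read on stdin into one line per pool with its state and errors line.
# Exits 1 if any pool is not ONLINE or reports anything other than
# "No known data errors"; exits 0 otherwise.
summarize_zpool_status() {
    awk '
        # "  pool: tank" starts a new pool block
        $1 == "pool:"   { pool = $2 }
        # " state: ONLINE" records the pool state, in order of appearance
        $1 == "state:"  { state[pool] = $2; order[++n] = pool }
        # "errors: No known data errors": keep everything after the label
        $1 == "errors:" { sub(/^[ \t]*errors:[ \t]*/, ""); errs[pool] = $0 }
        END {
            bad = 0
            for (i = 1; i <= n; i++) {
                p = order[i]
                printf "%s: state=%s errors=%s\n", p, state[p], errs[p]
                if (state[p] != "ONLINE" || errs[p] != "No known data errors") bad = 1
            }
            exit bad
        }'
}
```

For example, ssh user@host sudo zpool status | summarize_zpool_status would produce a one-line-per-pool summary for the top of the email. There is also a built-in shortcut: zpool status -x shows only pools that have problems and prints "all pools are healthy" otherwise.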
I’m going to admit I’m not much of a programmer. The precise syntax and the layers of digital abstraction keep me from writing code fluidly; instead, it’s an arduous, one-line-at-a-time process for me. But I wanted to add more features to this simple script, so I did what any modern amateur programmer would do and asked Claude. This is what he came up with.
#!/bin/bash
# ── connection targets ──────────────────────────────────────────────────────
HAL="[email protected]"
DEBIAN="[email protected]"
M4MINI="[email protected]"
LOG="/tmp/zfs_daily_$(date +%Y%m%d).log"
SNAP_COUNT=5 # how many recent snapshots to show per dataset
# ── helper: fetch recent snapshots ─────────────────────────────────────────
# Takes a command prefix (e.g. "ssh user@host sudo") and formats the newest N
# snapshots per dataset as a compact, readable table. -S sorts by creation
# time descending (newest first); lowercase -s would sort oldest first, and
# the awk filter below would then keep the N oldest snapshots instead.
get_snapshots() {
    local cmd_prefix="$1"
    $cmd_prefix zfs list -t snapshot -o name,creation,used -S creation 2>/dev/null \
        | awk -v n="$SNAP_COUNT" '
            NR==1 { next }            # skip header
            {
                # extract dataset name (everything before the @)
                split($1, parts, "@")
                ds = parts[1]
                counts[ds]++
                if (counts[ds] <= n) {
                    print
                }
            }
        ' \
        | column -t
}
# ── helper: pool health summary ────────────────────────────────────────────
# Pulls a one-line-per-pool summary (health, size, allocated, free,
# capacity, fragmentation) so you get a quick ONLINE/DEGRADED/FAULTED
# flag at the top before the verbose status block.
get_pool_health() {
    local cmd_prefix="$1"
    $cmd_prefix zpool list -o name,health,size,alloc,free,cap,frag 2>/dev/null \
        | column -t
}
# ── gather data ─────────────────────────────────────────────────────────────
hal_status=$(ssh $HAL "zpool status")
hal_health=$(get_pool_health "ssh $HAL")
hal_snaps=$(get_snapshots "ssh $HAL")
gamehenge_status=$(sudo zpool status)
gamehenge_health=$(get_pool_health "sudo")
gamehenge_snaps=$(get_snapshots "sudo")
debian_status=$(ssh $DEBIAN "sudo zpool status")
debian_health=$(get_pool_health "ssh $DEBIAN sudo")
debian_snaps=$(get_snapshots "ssh $DEBIAN sudo")
m4mini_status=$(ssh $M4MINI "sudo zpool status")
m4mini_health=$(get_pool_health "ssh $M4MINI sudo")
m4mini_snaps=$(get_snapshots "ssh $M4MINI sudo")
# ── detect any non-ONLINE pools across all hosts ────────────────────────────
# Useful for putting a warning banner at the top of the email subject/body.
# Any line that is neither a header nor an ONLINE pool trips the warning,
# including the blank line left behind when a host fails to answer.
ALL_HEALTH="$hal_health
$gamehenge_health
$debian_health
$m4mini_health"
ALERT=""
if echo "$ALL_HEALTH" | grep -qvE "(ONLINE|NAME|health)"; then
    ALERT="⚠ WARNING: DEGRADED OR FAULTED POOL DETECTED — "
fi
# ── build the log ───────────────────────────────────────────────────────────
cat > "$LOG" << DMZ
=== daily zfs status ===
=== ${ALERT}$(date '+%A, %B %d, %Y %H:%M:%S') ===
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
HAL (samizdat)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
--- pool health ---
$hal_health
--- pool status ---
$hal_status
--- last $SNAP_COUNT snapshots per dataset ---
$hal_snaps
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
GAIA (gamehenge)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
--- pool health ---
$gamehenge_health
--- pool status ---
$gamehenge_status
--- last $SNAP_COUNT snapshots per dataset ---
$gamehenge_snaps
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
DEBIAN-BACKUP (catalyst & magellan)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
--- pool health ---
$debian_health
--- pool status ---
$debian_status
--- last $SNAP_COUNT snapshots per dataset ---
$debian_snaps
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
MAC MINI (abraxas)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
--- pool health ---
$m4mini_health
--- pool status ---
$m4mini_status
--- last $SNAP_COUNT snapshots per dataset ---
$m4mini_snaps
DMZ
# ── ship it ─────────────────────────────────────────────────────────────────
mail -s "${ALERT}Daily ZFS Status - $(date +%Y-%m-%d)" [email protected] < "$LOG"
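The get_snapshots section above still leaves it to the reader to eyeball the dates. A small sketch that flags stale datasets instead is below. It assumes zfs list -H -p prints tab-separated output with the creation property as Unix epoch seconds. The name stale_datasets and the age threshold are hypothetical choices, not part of the script above.

```bash
# Hypothetical helper: read `zfs list -H -p -t snapshot -o name,creation`
# output on stdin (tab-separated: dataset@snap, creation in epoch seconds)
# and print every dataset whose newest snapshot is older than MAX_AGE_HOURS.
# NOW (epoch seconds) defaults to the current time; passing it makes the
# check testable.
stale_datasets() {
    local max_age_hours="$1"
    local now="${2:-$(date +%s)}"
    awk -F '\t' -v max="$((max_age_hours * 3600))" -v now="$now" '
        NF < 2 { next }                     # ignore blank or malformed lines
        {
            split($1, parts, "@")           # parts[1] = dataset name
            ds = parts[1]
            t = $2 + 0                      # force numeric comparison
            if (!(ds in newest) || t > newest[ds]) newest[ds] = t
        }
        END {
            for (ds in newest)
                if (now - newest[ds] > max)
                    printf "%s last snapshot %.1f hours ago\n", ds, (now - newest[ds]) / 3600
        }' | sort
}
```

Usage might look like ssh user@host sudo zfs list -H -p -t snapshot -o name,creation | stale_datasets 26, where 26 hours assumes a daily snapshot or replication schedule with a little slack. An empty result means every dataset has a recent snapshot, which could go straight into the ALERT banner.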
There was some fine-tuning to be done, of course. It was showing the raw zpool disk capacity rather than the total usable capacity of each pool, but that was an easy fix. I also ended up setting the snapshot count variable to 1, so each dataset lists a single snapshot, which makes it easy to tell they’re all up to date and replicating. It feels bad to miss an opportunity to learn, but the result still uses my base script; it just adds features I would have scratched my head over implementing and likely left half finished.
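The post doesn’t show the capacity fix itself, so here is one possible approach, sketched as an assumption. zpool list reports raw pool size (on raidz vdevs that includes parity space), while zfs list on each pool’s root dataset reports used and available space as the filesystem sees it, and their sum is roughly the usable capacity. The helper name usable_capacity is hypothetical.

```bash
# Hypothetical helper: turn `zfs list -H -p -d 0 -o name,used,avail` output
# (tab-separated, exact byte counts) into a per-pool usable-capacity table.
# USABLE is approximated as used + avail on the pool's root dataset.
usable_capacity() {
    awk -F '\t' '
        BEGIN {
            tib = 1024 ^ 4                  # bytes per TiB
            printf "%-12s %10s %10s %10s %5s\n", "POOL", "USED", "AVAIL", "USABLE", "CAP"
        }
        NF >= 3 {
            used = $2 + 0; avail = $3 + 0; total = used + avail
            pct = (total > 0) ? int(used * 100 / total + 0.5) : 0
            printf "%-12s %9.2fT %9.2fT %9.2fT %4d%%\n", $1, used / tib, avail / tib, total / tib, pct
        }'
}
```

In the script this could replace the size columns from zpool list, for example: ssh user@host sudo zfs list -H -p -d 0 -o name,used,avail | usable_capacity. Doing the arithmetic in awk rather than with a tool like numfmt keeps it portable to the Mac mini, which ships BSD userland tools rather than GNU coreutils.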