IPMI troubleshooting
This page covers IPMI detection and collection in Crucible: how it works, why "Not detected" doesn't always mean "broken", how to use glassmkr-crucible doctor ipmi to self-diagnose, and what to expect across BMC vendors.
New in Crucible 0.9.4 (2026-05-13). The glassmkr-crucible doctor ipmi subcommand below ships with this release. Capability detection now also re-runs every hour, so installing ipmitool after the agent started is picked up automatically without a service restart. See the 2026-05-13 changelog for the full set of 0.9.4 changes. To upgrade: sudo npm install -g @glassmkr/crucible@latest && sudo systemctl restart glassmkr-crucible.
How Crucible detects IPMI
Detection is capability-based, not vendor-allowlist. Crucible doesn't look at your BMC vendor string and decide whether to support you; it asks "can I actually talk to the BMC?" and uses the answer.
The probe chain at agent start, and on every re-check:
- Device-node check. stat /dev/ipmi0 (also /dev/ipmi/0 and /dev/ipmidev/0). Permission errors here surface as permission_denied.
- ipmitool binary check. ipmitool -V. Missing binary surfaces as no_ipmitool_binary.
- Fast path. If both the device node and the binary are present, Crucible records the capability as available and stops probing.
- Sensor-probe fallback. Only used when the binary exists but the device node does not. Runs ipmitool sensor and inspects stderr. A "could not open device" message surfaces as no_bmc_device; other errors surface as execution_failed.
The result is a structured detection.reason field on every snapshot, with one of four values: no_ipmitool_binary, permission_denied, no_bmc_device, or execution_failed. The dashboard surfaces this reason under "IPMI: Not detected" so you know which fix to apply.
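The probe chain above can be sketched as a small classifier. This is an illustration only, not Crucible's source: the function name, its arguments, and the "available" return value are hypothetical, and the behaviour when the fallback probe succeeds is an assumption the text above does not confirm.

```shell
# Hypothetical sketch of the probe chain; not Crucible's actual code.
# classify_ipmi_probe DEV_STATE BIN_STATE [SENSOR_STDERR]
#   DEV_STATE:     present | missing | denied  (outcome of the device-node stat)
#   BIN_STATE:     present | missing           (outcome of `ipmitool -V`)
#   SENSOR_STDERR: stderr of `ipmitool sensor`, only consulted in the fallback
classify_ipmi_probe() {
    dev_state=$1
    bin_state=$2
    sensor_stderr=${3:-}

    # Device-node check: a permission error is reported directly.
    if [ "$dev_state" = "denied" ]; then
        echo "permission_denied"; return
    fi
    # Binary check: without ipmitool nothing else can be probed.
    if [ "$bin_state" = "missing" ]; then
        echo "no_ipmitool_binary"; return
    fi
    # Fast path: device node and binary are both present.
    if [ "$dev_state" = "present" ]; then
        echo "available"; return
    fi
    # Sensor-probe fallback: binary present, device node missing.
    case $sensor_stderr in
        *[Cc]"ould not open device"*) echo "no_bmc_device" ;;
        "") echo "available" ;;          # assumption: a clean probe counts as available
        *)  echo "execution_failed" ;;
    esac
}
```

For example, `classify_ipmi_probe missing present "$(ipmitool sensor 2>&1 >/dev/null)"` would classify the fallback case on a host without a device node.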
Crucible 0.9.4 and later re-runs detection every 12 collection cycles (one hour at the default 5-minute interval). If you install ipmitool or load the kernel modules after the agent started, the next hour's re-check picks the change up automatically. No restart required.
Detection vs collection: they can disagree, by design
It is normal for the dashboard to report "IPMI: Not detected" on a host where some hardware metrics still show up. This is not a bug: detection and collection use different data sources.
The header IPMI verdict reflects Crucible's BMC probe. The dashboard's CPU temperature, fan, and ECC blocks can also be populated from non-BMC sources:
- CPU temperature often comes from hwmon (kernel-side, no BMC needed) or lm-sensors.
- ECC counters can come from kernel EDAC (/sys/devices/system/edac/mc/mc*/{ce,ue}_count) on systems where the BIOS exposes them, completely separate from the BMC.
- SMART, RAID, network, disk usage are kernel-side and don't depend on IPMI at all.
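To see whether the ECC numbers on a host come from EDAC rather than the BMC, the sysfs counters named above can be read by hand. The helper below is a minimal sketch; its name and the choice to print null when no counter files exist are assumptions made for illustration, not Crucible behaviour.

```shell
# sum_edac_counts KIND [ROOT]
#   KIND: ce (correctable) or ue (uncorrectable)
#   ROOT: EDAC sysfs directory; defaults to the path named in the text
# Prints the summed count, or "null" when no counter file is readable.
sum_edac_counts() {
    kind=$1
    root=${2:-/sys/devices/system/edac/mc}
    total=0
    found=0
    for f in "$root"/mc*/"${kind}_count"; do
        # An unmatched glob stays literal, so skip anything unreadable.
        [ -r "$f" ] || continue
        n=$(cat "$f")
        total=$((total + ${n:-0}))
        found=$((found + 1))
    done
    if [ "$found" -eq 0 ]; then
        echo "null"      # no EDAC data at all, which is different from 0 errors
    else
        echo "$total"
    fi
}
```

Running `sum_edac_counts ce` and `sum_edac_counts ue` on a host with EDAC exposed prints the totals across all memory controllers; a null result means the kernel exposes no EDAC counters there.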
Before Crucible 0.9.4, the agent emitted stub ecc_errors: {correctable: 0, uncorrectable: 0} whenever it couldn't probe IPMI. The dashboard rendered those stubs the same way it would render real "0 errors observed" measurements, which produced a confusing "Not detected (header) + ECC: 0 / 0 (body)" combination. Crucible 0.9.4 emits null instead of stub zeros when detection fails, and the dashboard now renders that as no signal (BMC not probed).
Self-diagnose with glassmkr-crucible doctor ipmi
The doctor subcommand (Crucible 0.9.4+) runs the same probes the agent uses and prints actionable guidance for each failure mode. It is read-only; it does not modify system state.
sudo glassmkr-crucible doctor ipmi
The available case looks like:
IPMI capability check:
  Result: [OK] IPMI detected via ipmitool_in_band
  ipmitool: 1.8.19
Crucible will collect:
  - Sensor readings (temperature, fan, voltage, power)
  - SEL events (recent + cumulative ECC counters)
  - PSU redundancy state (per-PSU + aggregate)
Failure cases print the matching detection.reason plus a fix recipe.
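When the doctor output is consumed by a script (for example in a fleet check), the verdict and reason can be pulled out of the text. The sketch below assumes only what this page states: the output contains [OK] or [FAIL], and failure output includes one of the four detection.reason values. The helper names are hypothetical, and the command's exit code is deliberately not relied on because it is not documented here.

```shell
# doctor_verdict: reads doctor output on stdin, prints ok | fail | unknown.
doctor_verdict() {
    out=$(cat)
    case $out in
        *"[OK]"*)   echo ok ;;
        *"[FAIL]"*) echo fail ;;
        *)          echo unknown ;;
    esac
}

# doctor_reason: reads doctor output on stdin, prints the first
# detection.reason value it finds (nothing if none is present).
doctor_reason() {
    grep -oE 'no_ipmitool_binary|permission_denied|no_bmc_device|execution_failed' | head -n 1
}
```

Usage: `sudo glassmkr-crucible doctor ipmi 2>&1 | doctor_verdict`.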
no_ipmitool_binary
Meaning: the /dev/ipmi0 device exists, but ipmitool is not installed.
Fix: install the package:
- Debian / Ubuntu: sudo apt install ipmitool
- RHEL / Rocky / Alma: sudo dnf install ipmitool
- Arch: sudo pacman -S ipmitool
- Alpine: sudo apk add ipmitool
No restart needed. The next collection cycle (within ~5 minutes) sees the binary, and the next hourly re-check flips the snapshot's detection.available to true. The dashboard updates on the following ingest.
permission_denied
Meaning: Crucible cannot open /dev/ipmi0. The device node is mode 0600 owned by root.
Fix: ensure the systemd unit runs as root:
systemctl cat glassmkr-crucible | grep '^User='
The default unit has no User= directive, which means root. If you customised the unit to run as a non-root user, either revert to root, or add the user to the right group and adjust the udev rule on /dev/ipmi0 (group ownership varies by distribution).
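The grep above prints nothing when the unit runs as root, which is easy to misread. A small helper can make the result explicit. This is a sketch with a hypothetical function name; it relies on the systemd rule that a later User= assignment (for example from a drop-in, which systemctl cat prints after the main unit) overrides an earlier one.

```shell
# effective_unit_user: reads `systemctl cat` output on stdin and prints the
# user the service runs as. No User= line, or an empty one, means root.
effective_unit_user() {
    user=$(sed -n 's/^User=//p' | tail -n 1)
    echo "${user:-root}"
}
```

Usage: `systemctl cat glassmkr-crucible | effective_unit_user`. If the answer is not root, `ls -l /dev/ipmi0` shows the mode and ownership that user has to satisfy.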
no_bmc_device
Meaning: ipmitool is installed and runs, but the kernel has no IPMI device node and the in-band ipmitool probe could not open one. Usually the kernel modules aren't loaded.
Fix:
sudo modprobe -a ipmi_si ipmi_devintf ipmi_msghandler
ls -l /dev/ipmi0 # should appear after the modules load
If /dev/ipmi0 still does not appear, the host may genuinely not have a BMC. This is common on consumer hardware, Raspberry Pi, laptops, and virtual machines without IPMI passthrough. In that case set collection.ipmi: false in /etc/glassmkr/collector.yaml to silence the snapshot field; the dashboard will stop trying to render IPMI for this host.
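Before concluding that the host has no BMC, it can help to confirm which of the three modules named above are actually loaded. The helper below is a minimal sketch with a hypothetical name; it only parses lsmod output. Drivers compiled into the kernel never appear in lsmod, so a module reported as missing here is a hint, not proof.

```shell
# missing_ipmi_modules: reads `lsmod` output on stdin and prints the IPMI
# modules that are not listed, one per line (no output means all three loaded).
missing_ipmi_modules() {
    # Column 1 holds the module name; line 1 is the header.
    loaded=$(awk 'NR > 1 { print $1 }')
    for m in ipmi_msghandler ipmi_devintf ipmi_si; do
        echo "$loaded" | grep -qx "$m" || echo "$m"
    done
}
```

Usage: `lsmod | missing_ipmi_modules`.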
execution_failed
Meaning: ipmitool ran, but the call returned an error other than "could not open device". The BMC is reachable in some sense but not responding the way Crucible expected.
Fix: reproduce by hand and read the error:
sudo ipmitool mc info
Common causes:
- The BMC is in a degraded state and dropped the request. Retry; if it persists, escalate via the support path below.
- The in-band interface (KCS or SSIF) is busy. Sustained busy state usually means firmware is mid-task; wait a few minutes and retry.
- The installed ipmitool is too old for the BMC's IPMI 2.0 dialect. Upgrade ipmitool via the distribution package manager.
Do not run sudo ipmitool mc reset cold without first confirming with your hardware vendor. Some BMCs do not recover cleanly from a cold reset and hang past the operation, which on a remote machine is much worse than the original failure.
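The "wait and retry" advice for a busy or degraded BMC can be scripted with a plain retry loop. The helper below is a generic sketch (the name retry_cmd and the attempt and delay values in the usage line are illustrative choices, not Crucible settings). It only re-runs a read-only command and never issues a reset.

```shell
# retry_cmd ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it succeeds or ATTEMPTS runs have failed.
retry_cmd() {
    attempts=$1
    delay=$2
    shift 2
    i=1
    while :; do
        "$@" && return 0                       # success: stop retrying
        [ "$i" -ge "$attempts" ] && return 1   # out of attempts: give up
        i=$((i + 1))
        sleep "$delay"
    done
}
```

Usage: `retry_cmd 3 120 sudo ipmitool mc info` tries three times, two minutes apart.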
Per-vendor notes
Crucible's detection is capability-based, so any BMC that responds to standard IPMI 2.0 commands works. These notes are vendor-specific quirks observed on real hardware, not detection-gating rules.
Supermicro
Usually trouble-free. The BMC reports vendor strings cleanly via ipmitool mc info (Manufacturer Name: Supermicro or Super Micro Computer Inc.). PSU sensors typically appear as PS1 Status / PS2 Status with the discrete-state bitmask in the Reading column.
Gigabyte
The BMC sometimes reports Manufacturer Name: Unknown (0x3C0A) in ipmitool mc info output, even though the IANA manufacturer ID (15370) resolves to Gigabyte. This is a Gigabyte BMC firmware quirk; Crucible doesn't gate detection on the manufacturer string so no customer action is needed. PSU sensors typically appear as PS1_Status with an underscore separator.
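The hex value in the Unknown (0x3C0A) string and the decimal IANA ID are the same number, which is quick to confirm in a shell. The function name below is hypothetical; printf accepting a 0x-prefixed argument for %d is standard behaviour.

```shell
# iana_id_from_hex HEX: convert a manufacturer ID such as 0x3C0A to decimal.
iana_id_from_hex() {
    printf '%d\n' "$1"
}

iana_id_from_hex 0x3C0A   # prints 15370
```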
ASUS
Validated on RS700-E10-RS4U. Detection works correctly when ipmitool is installed; this vendor's most common issue is that Linux distributions occasionally ship without ipmitool by default, which surfaces as no_ipmitool_binary in the doctor output. Install via the per-distro command above.
ASRockRack
DMI sys_vendor may read "To Be Filled By O.E.M." on some boards (a known firmware default), but the BMC itself reports vendor cleanly via ipmitool mc info (Manufacturer Name: ASRock Rack Incorporation). PSU sensors appear as PSU1 Status / PSU2 Status.
Dell PowerEdge (iDRAC)
In-band IPMI through iDRAC works without an iDRAC Enterprise licence. The licence gates out-of-band IPMI over LAN, not the in-band KCS path Crucible uses. PSU sensors appear as PS1 Status / PS2 Status, and iDRAC also exposes an aggregate PS Redundancy sensor which Crucible reads for whole-pair redundancy state.
Dell iDRAC compatibility has not been validated on real hardware in our validation fleet. If you hit a detection or collection issue specific to iDRAC, please file a support request with the output of sudo ipmitool mc info and sudo glassmkr-crucible doctor ipmi.
HP ProLiant (iLO)
In-band IPMI via KCS usually works without an iLO Advanced licence. The licence gates out-of-band iLO features, not in-band IPMI. Some older iLO firmware revisions require ipmitool 1.8.18 or later for IPMI 2.0 compatibility.
HP iLO compatibility has not been validated on real hardware in our validation fleet. Same support-request convention as Dell above.
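To check an installed ipmitool against the 1.8.18 minimum mentioned above, a version comparison along these lines may help. It is a sketch that assumes GNU sort (for the -V version ordering); the function name is hypothetical.

```shell
# version_ge A B: succeeds when dotted version A is greater than or equal to B.
# Assumes a sort that supports -V (GNU coreutils).
version_ge() {
    # The smaller of the two versions sorts first; A >= B when that is B.
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}
```

Usage, assuming ipmitool -V prints the version as its last word: `version_ge "$(ipmitool -V | awk '{ print $NF }')" 1.8.18 && echo "new enough"`.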
A note on PSU monitoring
Crucible 0.9.4 closed two PSU monitoring bugs that affected most BMC vendors silently. The isPsuSensor filter was previously narrow and missed Supermicro / Gigabyte / ASRockRack naming conventions; the per-PSU rule path also only matched text strings like "fail" / "absent" and the short status codes cr / nr, missing every BMC that reports discrete states as IPMI spec hex bitmasks in the Reading column. Both are now fixed and unit-tested.
If you have a multi-PSU box that previously showed two healthy PSUs in the dashboard but one was actually failed or unplugged, that is the bug shape. Crucible 0.9.4 catches it.
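The two fixes can be illustrated with a sketch. It is not Crucible's isPsuSensor implementation: the pattern only covers the naming conventions listed on this page, and the bitmask decoding assumes that bit N of the reading corresponds to offset N of the IPMI Power Supply sensor type (offset 0 presence detected, offset 1 failure detected, offset 3 input lost). Real BMCs may lay the bits out differently, so treat the mapping as an assumption to verify per vendor.

```shell
# is_psu_sensor NAME: succeeds for names like "PS1 Status", "PS1_Status",
# and "PSU1 Status" (the conventions noted in the per-vendor section).
is_psu_sensor() {
    printf '%s\n' "$1" | grep -Eqi '^PSU?[0-9]+[ _]Status$'
}

# psu_state HEX: classify a discrete reading such as 0x1 or 0x3.
# Assumed bit layout: 0x01 presence, 0x02 failure, 0x08 input lost.
psu_state() {
    mask=$(( $1 ))
    if [ $(( mask & 0x02 )) -ne 0 ]; then
        echo failed
    elif [ $(( mask & 0x08 )) -ne 0 ]; then
        echo input_lost
    elif [ $(( mask & 0x01 )) -ne 0 ]; then
        echo ok
    else
        echo absent
    fi
}
```

Under this assumed layout, a text-only rule that looks for "fail" or "absent" never fires on a reading of 0x3, while the bitmask check classifies it as failed.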
When to file a support request
Email [email protected] in any of these cases:
- Your BMC vendor is not in the supported list above, and detection works (the doctor output shows [OK]) but a specific collection path (sensors, SEL, PSU) returns unexpected values.
- Detection fails (doctor output shows [FAIL]) but sudo ipmitool mc info works fine when you run it interactively.
- The doctor subcommand returns execution_failed with an error message not covered in the section above.
Attach:
- The doctor output: sudo glassmkr-crucible doctor ipmi 2>&1
- A successful raw probe: sudo ipmitool mc info 2>&1
- One hour of agent logs: sudo journalctl -u glassmkr-crucible --since "1 hour ago" --no-pager > crucible.log
- Your server ID from the dashboard.
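The three command outputs can be gathered in one step. The function below is a convenience sketch, not a shipped tool: its name, the output file names, and the SUDO override are all hypothetical. Each command's stderr is captured alongside stdout so that a failing probe still leaves its error text in the bundle.

```shell
# collect_support_bundle OUTDIR
# Writes doctor.txt, mc-info.txt and crucible.log into OUTDIR.
# Set SUDO to an empty string when already running as root.
collect_support_bundle() {
    outdir=$1
    mkdir -p "$outdir" || return 1
    # A failing command must not stop the remaining captures.
    ${SUDO-sudo} glassmkr-crucible doctor ipmi > "$outdir/doctor.txt" 2>&1 || true
    ${SUDO-sudo} ipmitool mc info > "$outdir/mc-info.txt" 2>&1 || true
    ${SUDO-sudo} journalctl -u glassmkr-crucible --since "1 hour ago" --no-pager \
        > "$outdir/crucible.log" 2>&1 || true
    ls "$outdir"
}
```

Usage: `collect_support_bundle ./crucible-support`, then attach the resulting files together with your server ID.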