Caisey Blog

Mac admins · May 20, 2026

Why launchd details matter for Mac remote troubleshooting agents

Mac remote troubleshooting agents need proper launchd configuration for reliable startup, bootstrap context, and service recovery. Learn what MSPs should verify.
macOS · launchd · endpoint agents · MSP · service recovery

Mac endpoints don't reboot often—until they do. And when a remote troubleshooting agent fails to come back after a system update, a kernel panic recovery, or a simple restart, the gap in coverage isn't just inconvenient. For an MSP managing dozens or hundreds of Macs, it's a ticket that shouldn't exist and a technician pulled away from real work.

The difference between an agent that survives these events and one that quietly disappears often comes down to launchd specifics that are easy to overlook during initial packaging. Caisey's Mac runtime is designed around these realities, but the principles apply whether you're evaluating a tool or maintaining your own.

The bootstrap context problem

launchd manages jobs in several bootstrap domains on macOS: a system domain, a per-user domain, and a domain for each GUI login session. A remote agent installed at the user level, scoped to a single login session, dies when that user logs out or the session ends. For MSP use cases, that's almost always the wrong choice. The agent needs to run in the system bootstrap domain, which persists across user sessions and remains active even when no one is logged in.

Getting this wrong means your "installed" endpoint becomes unreachable the moment the primary user restarts and doesn't immediately log back in. Worse, the failure is silent. The management console still shows the machine as enrolled, but there's no runtime to coordinate with. Caisey handles this by targeting the system domain directly, with the installer verifying bootstrap context before reporting success.
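As a rough illustration of the domain distinction, the sketch below classifies a job by where its plist is installed and builds the service target string that launchctl uses (system/<label> or gui/<uid>/<label>). The label com.example.agent, the function names, and the scope names it returns are hypothetical, chosen only for this example.

```python
from pathlib import PurePosixPath
from typing import Optional


def scope_for_plist(plist_path: str) -> str:
    """Classify a launchd job by the directory its plist is installed in."""
    parent = PurePosixPath(plist_path).parent
    if parent.parts[:2] == ("/", "System"):
        return "apple-reserved"      # /System/Library is not for third-party jobs
    if parent == PurePosixPath("/Library/LaunchDaemons"):
        return "system"              # runs even when no user is logged in
    if parent == PurePosixPath("/Library/LaunchAgents"):
        return "every-gui-session"   # loaded into each user's login session
    if parent.parts[-2:] == ("Library", "LaunchAgents"):
        return "single-user"         # e.g. ~/Library/LaunchAgents
    return "unknown"


def service_target(label: str, uid: Optional[int] = None) -> str:
    """Build a launchctl service target for the system or a GUI domain."""
    return f"system/{label}" if uid is None else f"gui/{uid}/{label}"
```

An installer check along these lines would refuse to report success unless scope_for_plist returns "system" for the agent's plist.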

Startup type and keep-alive semantics

A launchd plist has two jobs: start the service and keep it running. The KeepAlive key seems straightforward, but its behavior varies significantly based on configuration. A boolean true restarts the service unconditionally, which can mask real crashes by respawning too aggressively. A dictionary with SuccessfulExit set to false is more surgical, restarting only after a non-zero exit or a crash, but it will not catch every failure mode you care about: a hung process that never exits, or an error path that still exits with status 0, looks healthy to launchd.

For remote troubleshooting agents, the right choice depends on what "failure" means. A clean exit from a self-update process shouldn't trigger immediate respawn. A crash from a network subsystem failure probably should. Caisey's launchd configuration uses conditional keep-alive tied to exit codes, with explicit handling for update-in-progress states to avoid restart loops during version transitions.
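To make the restart semantics concrete, here is a deliberately simplified model of how a boolean KeepAlive and a SuccessfulExit dictionary respond to an exit status. It is a sketch for reasoning about scenarios, not a description of launchd internals or of Caisey's configuration; the exit-code constants are a hypothetical convention, and other KeepAlive dictionary keys are not modeled.

```python
from typing import Union

KeepAlive = Union[bool, dict]

# Hypothetical exit-code convention for an agent (assumption for this sketch).
EXIT_UPDATE_HANDOFF = 0    # clean exit; the updater relaunches the new version
EXIT_NETWORK_FAILURE = 70  # any non-zero status counts as an unsuccessful exit


def should_restart(keep_alive: KeepAlive, exit_status: int) -> bool:
    """Simplified model: would the job be respawned after this exit?"""
    if isinstance(keep_alive, bool):
        return keep_alive  # true restarts unconditionally, false never does
    if "SuccessfulExit" in keep_alive:
        exited_cleanly = exit_status == 0
        # SuccessfulExit=True restarts after clean exits, False after failures.
        return exited_cleanly == keep_alive["SuccessfulExit"]
    raise ValueError("only boolean KeepAlive and SuccessfulExit are modeled")
```

With {"SuccessfulExit": False}, the update handoff (status 0) is not respawned while the network failure (status 70) is. That only works if something else, such as the updater, starts the new version afterward.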

Throttle intervals and crash mitigation

launchd rate-limits the respawning of a service that exits too quickly. By default a job is not spawned more than once every 10 seconds, so an agent that crashes right after launch has each restart pushed out by the remainder of that interval. Without careful plist configuration, a transient startup failure, say a network race during boot, can leave the agent cycling through throttled restarts, and unreachable, for as long as the condition persists.

The ThrottleInterval key controls this directly. Setting it explicitly, rather than accepting defaults, lets you balance recovery speed against system protection. More importantly, pairing it with StandardOutPath and StandardErrorPath gives you somewhere to look when the agent doesn't behave. Caisey's installer creates these paths predictably, so technicians checking a machine locally know exactly where to find startup logs without hunting through unified logging filters.
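A minimal sketch of a daemon plist that sets these keys explicitly, generated with Python's standard plistlib module. The label, program path, log directory, and the 30-second throttle value are placeholders picked for illustration, not recommendations and not Caisey's actual settings.

```python
import plistlib
from typing import Sequence


def build_daemon_plist(label: str, program_args: Sequence[str],
                       log_dir: str, throttle_seconds: int = 30) -> bytes:
    """Return XML plist bytes for a launchd daemon with explicit recovery keys."""
    job = {
        "Label": label,
        "ProgramArguments": list(program_args),
        "RunAtLoad": True,
        # Restart only after a non-zero exit or crash.
        "KeepAlive": {"SuccessfulExit": False},
        # Minimum number of seconds between respawns.
        "ThrottleInterval": throttle_seconds,
        # Predictable log locations for local troubleshooting.
        "StandardOutPath": f"{log_dir}/{label}.out.log",
        "StandardErrorPath": f"{log_dir}/{label}.err.log",
    }
    return plistlib.dumps(job, fmt=plistlib.FMT_XML)
```

For example, build_daemon_plist("com.example.agent", ["/usr/local/bin/agent", "--daemon"], "/var/log/example-agent") produces bytes that an installer could write to /Library/LaunchDaemons/com.example.agent.plist.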

Service recovery after system updates

macOS updates sometimes reset permissions or move paths in ways that break existing launchd jobs. The agent binary may still exist, but its code signature requirements or its TCC privacy grants no longer match what the system expects after a point release.

A robust installer doesn't just drop files; it verifies that the loaded job matches the current system state. Caisey's Mac runtime includes a bootstrap check on startup that validates its own launchd registration and re-registers if the job is missing or modified. This self-healing step catches the common case where an update clears user-installed launch agents but preserves system-level daemons. The exception is a daemon whose plist was stored in a location that macOS now treats differently, which can be lost along with the agents.
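One way such a self-check could look, as a sketch under stated assumptions rather than Caisey's implementation: compare the on-disk plist with the expected job definition, ask launchctl whether the job is loaded in the system domain, and bootstrap it again if not. The function name, return strings, and injectable runner are hypothetical. The sketch assumes that launchctl print exits non-zero when the service is not found, and re-registering requires root.

```python
import plistlib
import subprocess
from typing import Callable, Sequence
from xml.parsers.expat import ExpatError

Runner = Callable[[Sequence[str]], int]  # runs a command, returns exit status


def run_command(argv: Sequence[str]) -> int:
    """Default runner: execute the command and return its exit status."""
    return subprocess.run(list(argv), capture_output=True).returncode


def ensure_registered(label: str, plist_path: str, expected: dict,
                      run: Runner = run_command) -> str:
    """Validate a system-domain job and re-register it if it is not loaded."""
    try:
        with open(plist_path, "rb") as fh:
            on_disk = plistlib.load(fh)
    except (OSError, plistlib.InvalidFileException, ExpatError):
        return "plist-unreadable"
    if on_disk != expected:
        return "plist-modified"  # leave repair to the installer
    if run(["launchctl", "print", f"system/{label}"]) == 0:
        return "ok"
    status = run(["launchctl", "bootstrap", "system", plist_path])
    return "re-registered" if status == 0 else "re-register-failed"
```

Passing the runner as a parameter keeps the check testable on machines without launchctl and makes each command the agent issues easy to audit.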

What to verify in your own deployments

If you're packaging or maintaining a Mac remote agent, these checks catch the launchd issues that produce silent failures:

  • Confirm the job loads in the system bootstrap domain, not user or session scope
  • Verify KeepAlive behavior against your actual exit scenarios, not just documentation
  • Set explicit ThrottleInterval values and log paths
  • Test behavior across logout, restart, and update sequences—not just initial install
  • Include a startup self-check that validates launchd registration

The last point is particularly important. An agent that can inspect and repair its own launchd context turns a potential site visit into a non-event. For distributed Mac fleets, that's the difference between manageable scale and constant firefighting.
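The static items in the checklist above can be approximated with a small lint pass over a parsed plist. This is a sketch with hypothetical names; it checks only what is visible in the file and cannot replace testing across logout, restart, and update sequences.

```python
from typing import Any, Dict, List


def lint_daemon_job(plist_path: str, job: Dict[str, Any]) -> List[str]:
    """Return human-readable problems found in a daemon's launchd plist."""
    problems: List[str] = []
    if not plist_path.startswith("/Library/LaunchDaemons/"):
        problems.append("plist is not in /Library/LaunchDaemons (system domain)")
    if "KeepAlive" not in job:
        problems.append("KeepAlive is not set")
    if "ThrottleInterval" not in job:
        problems.append("ThrottleInterval is not set explicitly")
    for key in ("StandardOutPath", "StandardErrorPath"):
        if not job.get(key):
            problems.append(f"{key} is not set")
    return problems
```

An empty list means the file passes these static checks; each entry otherwise maps back to one of the checklist items.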

Why this matters for remote troubleshooting specifically

Screen sharing tools don't typically face these constraints—they're invoked on demand by a logged-in user. But headless troubleshooting agents run continuously, coordinate through cloud control planes, and need to be present before a problem occurs. Their launchd configuration is infrastructure, not convenience.

Caisey's approach treats the Mac runtime as a system service with operational requirements: it must start reliably, report its state clearly, and recover without manual intervention. The launchd details aren't implementation trivia—they're the mechanism that makes remote troubleshooting practical at scale across Mac endpoints that may go weeks between direct admin contact.