Designing scheduled jobs

Most of the autonomous system runs as scheduled jobs (crons). Good ones are quiet, cheap, and trustworthy; bad ones spam you, drift, or fail silently. The patterns here are what separate the two.


Silent-on-no-op (the most important rule)

A scheduled job that runs every few hours should produce nothing when there’s nothing to report.

  • If it emits “nothing to do” every tick, you’ll train yourself to ignore it — and miss the tick that mattered.
  • Design the job so empty output = no notification. A job that’s silent for a week and then speaks is one you’ll actually read.
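A minimal sketch of the silent-on-no-op rule, assuming a job is split into a hypothetical `collect` step (returns this tick’s findings, empty when there is nothing to do) and a hypothetical `notify` step (chat, email, whatever you use). Neither name comes from Steward; they only make the shape concrete.

```python
from typing import Callable, Iterable, Optional


def run_quietly(
    collect: Callable[[], Iterable[str]],
    notify: Callable[[str], None],
) -> Optional[str]:
    """Run one tick of a job and notify only when there is something to report.

    `collect` and `notify` are hypothetical hooks used for illustration.
    """
    findings = list(collect())
    if not findings:
        # Nothing to report: send no message at all, not a "nothing to do" ping.
        return None
    message = f"{len(findings)} item(s) need attention:\n" + "\n".join(
        f"- {f}" for f in findings
    )
    notify(message)
    return message
```

The key design choice is that the empty case returns before `notify` is ever reached, so "no output" and "no notification" are the same code path rather than two things you have to keep in sync.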

Deterministic where possible (no LLM in the loop)

The most robust scheduled jobs are plain scripts with no model:

  • no prompt-injection surface (nothing reads untrusted content into a model),
  • no nondeterminism (same input → same output),
  • no token cost, no latency.

Push every job toward this shape. Reserve the LLM for jobs that genuinely need judgment (summarize a feed, draft a digest, reason about content), and for those, the security spine’s injection guardrails are mandatory.
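"Same input → same output" is easy to break by accident, for example when an API returns items in a different order each call. One way to keep a plain-script job deterministic, sketched here under the assumption that findings are small dicts with hypothetical `kind` and `id` fields, is to sort before rendering and serialize with sorted keys. A content hash of the result then turns "did anything change since last run?" into a string comparison.

```python
import hashlib
import json
from typing import Dict, List


def build_report(items: List[Dict]) -> str:
    """Render findings as byte-identical text for the same set of inputs.

    Assumes each item has hypothetical "kind" and "id" fields to sort by.
    """
    ordered = sorted(items, key=lambda it: (it["kind"], it["id"]))
    # sort_keys makes dict key order irrelevant to the output as well.
    return json.dumps(ordered, sort_keys=True, indent=2)


def report_digest(report: str) -> str:
    # A stable digest lets the job skip notifying when nothing changed.
    return hashlib.sha256(report.encode("utf-8")).hexdigest()
```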

Incremental, not full-rescan

A steady-state job should process only what’s new or changed since its last run, not rescan the entire back catalog every tick. Record what you’ve handled (a state file, a label, a ledger) and skip it next time. Do the one-time backfill as a manual or initial run, so the recurring job stays cheap.
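A sketch of the state-file variant, assuming items have stable string IDs and that `fetch_ids` and `handle` are hypothetical hooks for your source and your per-item work. The ledger is written even if a handler raises, so progress made before the failure is not redone, and it is written to a temporary file then renamed so a crash mid-write cannot leave a truncated ledger.

```python
import json
from pathlib import Path
from typing import Callable, Iterable, List, Set


def process_new(
    state_path: Path,
    fetch_ids: Callable[[], Iterable[str]],
    handle: Callable[[str], None],
) -> List[str]:
    """Handle only IDs not recorded by a previous run, then record them.

    `fetch_ids` and `handle` are hypothetical hooks used for illustration.
    """
    seen: Set[str] = set()
    if state_path.exists():
        seen = set(json.loads(state_path.read_text()))
    handled: List[str] = []
    try:
        for item_id in fetch_ids():
            if item_id in seen:
                continue  # already processed on an earlier tick
            handle(item_id)
            seen.add(item_id)
            handled.append(item_id)
    finally:
        # Persist progress even on error; write-then-rename avoids a torn file.
        tmp = state_path.with_name(state_path.name + ".tmp")
        tmp.write_text(json.dumps(sorted(seen)))
        tmp.replace(state_path)
    return handled
```

The first call against an empty ledger behaves like a backfill; every later call only touches the delta.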

Idempotent

Running the job twice changes nothing the first run didn’t already do. This makes retries, overlaps, and manual re-runs safe. Reconcile to a desired state rather than blindly appending.
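"Reconcile rather than append" can be illustrated with a status comment on an issue. Blindly appending posts a new comment every run; the sketch below instead finds this job’s own comment by a marker and updates it in place. The in-memory list stands in for a real comment thread, and `MARKER` is a hypothetical convention, not part of any tracker’s API.

```python
from typing import List

# Hypothetical marker that identifies comments written by this job.
MARKER = "<!-- steward-status -->"


def upsert_status_comment(comments: List[str], body: str) -> str:
    """Ensure exactly one marked comment exists with `body`.

    Returns "created", "updated", or "unchanged". A second run with the
    same body changes nothing, which is what makes retries safe.
    """
    text = f"{MARKER}\n{body}"
    for i, existing in enumerate(comments):
        if existing.startswith(MARKER):
            if existing == text:
                return "unchanged"
            comments[i] = text
            return "updated"
    comments.append(text)
    return "created"
```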

The heartbeat exception

One job type should always speak: a liveness watchdog that audits the whole fleet’s health and reports daily even when all-green. Here the absence of the green report is the alarm: if the heartbeat goes quiet, something has broken the watchdog itself (or the scheduler running it), which no silent-on-no-op job would ever tell you. Use this sparingly (one fleet-health job), not for every cron.
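A sketch of the message such a watchdog might build, assuming each job records a last-success timestamp somewhere the watchdog can read (how is left open here) and that each job has a hypothetical maximum allowed age. Unlike the other jobs, this function always returns a message, green or red, and the caller always sends it.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict


def fleet_report(
    last_success: Dict[str, datetime],
    max_age: Dict[str, timedelta],
    now: datetime,
) -> str:
    """Build the daily fleet-health message. Never returns an empty result.

    `last_success` and `max_age` are hypothetical inputs for illustration.
    """
    stale = []
    for job, age_limit in sorted(max_age.items()):
        ts = last_success.get(job)
        if ts is None:
            stale.append(f"{job}: never succeeded")
        elif now - ts > age_limit:
            stale.append(f"{job}: last success {now - ts} ago")
    if not stale:
        return f"fleet green: {len(max_age)} job(s) healthy"
    return "fleet RED:\n" + "\n".join(stale)
```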

Coordination & durability

  • Concurrency: if multiple jobs (or a job and a live session) can touch the same state, guard it with a lock so they don’t collide. A simple file lock with a “someone else owns this item, skip it” exit is usually enough.
  • Orphaned work: a job can do its work, commit locally, then die before pushing. Check for unpushed state at the top of the next run and flush it. Don’t assume a local commit reached the remote.
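  • Script location & secrets: a job that needs a secret should call a secret-isolating helper, so the token never enters the job’s context. Keep helper scripts at paths the scheduler can resolve; cron, for example, runs jobs with a minimal environment and PATH, so absolute paths are the safe choice.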
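The lock and the orphaned-work check above can be sketched as follows, assuming a POSIX host (for `fcntl.flock`) and a git checkout with an upstream branch configured. The one-lock-file-per-item layout and the function names are hypothetical. A non-blocking `flock` gives the "someone else owns this item, skip it" exit, and because the kernel releases the lock when the file is closed, a job that crashes cannot leave it stuck.

```python
import fcntl
import os
import subprocess
from contextlib import contextmanager
from typing import Iterator


@contextmanager
def item_lock(lock_path: str) -> Iterator[bool]:
    """Yield True if this process now owns the lock, False if someone else does.

    POSIX-only. The lock is released when the descriptor is closed,
    including when the process dies.
    """
    fd = os.open(lock_path, os.O_CREAT | os.O_RDWR, 0o644)
    try:
        try:
            fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
            acquired = True
        except BlockingIOError:
            acquired = False  # another job or live session owns this item
        yield acquired
    finally:
        os.close(fd)


def unpushed_commits(repo: str) -> int:
    """Count local commits that are not yet on the upstream branch."""
    out = subprocess.run(
        ["git", "-C", repo, "rev-list", "--count", "@{upstream}..HEAD"],
        capture_output=True, text=True, check=True,
    )
    return int(out.stdout.strip())


def flush_orphaned_work(repo: str) -> None:
    # Run at the top of each tick: a previous run may have committed, then died.
    if unpushed_commits(repo) > 0:
        subprocess.run(["git", "-C", repo, "push"], check=True)
```

Typical use is `with item_lock(path) as owned:` followed by an early return when `owned` is False.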

Failure surfacing

  • A job that exits non-zero or times out should alert, not fail silently. Silent-on-no-op is for success with nothing to do — never for errors.
  • Pair long autonomous jobs with a completion or error notification so a broken job can’t hide.
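A sketch of a wrapper that keeps the two rules apart: errors and timeouts always alert, while a clean exit with empty output stays silent. It assumes a hypothetical convention that a job prints its report to stdout and prints nothing when there is nothing to do; `alert` and `notify` are hypothetical delivery hooks.

```python
import subprocess
from typing import Callable, List, Optional


def run_job(
    cmd: List[str],
    timeout_s: float,
    alert: Callable[[str], None],
    notify: Callable[[str], None],
) -> Optional[str]:
    """Run a job command; alert on failure or timeout, stay silent on empty success.

    Returns "timeout", "error", "report", or None (success, nothing to say).
    """
    try:
        proc = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising, so nothing is left running.
        alert(f"{cmd[0]} timed out after {timeout_s}s")
        return "timeout"
    if proc.returncode != 0:
        # Include the tail of stderr so the alert is actionable on its own.
        alert(f"{cmd[0]} exited {proc.returncode}: {proc.stderr.strip()[-500:]}")
        return "error"
    if proc.stdout.strip():
        notify(proc.stdout.strip())
        return "report"
    return None
```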

A good scheduled job, summarized

Deterministic if it can be · incremental · idempotent · silent when there’s nothing to do · loud when it errors · guarded against collisions and lost work · and, if it writes to a public surface, paired with a watchdog.


Related: the watchdog pattern · the autonomy ladder.



This site documents Steward — an operating model for AI-assisted project maintenance. MIT licensed.