It has never been easier to automate an operations task, and never been easier to automate a small mistake into a large one. The interesting work is no longer writing the automation — it is deciding what to hand over, and engineering it so a wrong move is cheap to undo.
Automate the toil, not the judgement
The best candidates for automation are tasks that are frequent, well understood, and tedious: rotating credentials, scaling a fleet, patching a known CVE, reconciling config drift. The worst candidates are the judgement calls, the moments that need context a model does not have. We automate the toil and keep humans in the decisions that carry real, irreversible consequences.
Every action needs an undo
Automation that can only move forward is a liability dressed as efficiency. Before an automated action ships, we ask how it is reversed and how fast. A change with a one-command rollback can run unattended; one with no clean path back stays gated behind a human until it earns more trust.
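One way to make that rule concrete is to refuse to run any action that does not ship with its own undo. The sketch below is illustrative only: `Action`, `requires_human_approval`, and `run` are hypothetical names, not part of any particular tool, and "one-command rollback" is modelled as a single callable.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Action:
    """A hypothetical automated change paired with its undo."""
    name: str
    apply: Callable[[], None]
    # None means there is no clean path back (assumption of this sketch).
    rollback: Optional[Callable[[], None]] = None


def requires_human_approval(action: Action) -> bool:
    """Gate any action that cannot be reversed with a single call."""
    return action.rollback is None


def run(action: Action, approved_by_human: bool = False) -> str:
    """Apply the action, refusing if it is gated and nobody approved it."""
    if requires_human_approval(action) and not approved_by_human:
        return "blocked"
    try:
        action.apply()
    except Exception:
        # Reversible actions are undone on failure; others re-raise.
        if action.rollback is not None:
            action.rollback()
            return "rolled_back"
        raise
    return "applied"
```

An action with a `rollback` runs unattended and undoes itself on failure; one without stays `"blocked"` until a person passes `approved_by_human=True`.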
Observability is the prerequisite, not the reward
You cannot safely automate what you cannot see. Before we let a system act on its own, three things have to be true:
- The signals that justify the action are measured, not assumed
- Every automated action is logged with the reasoning behind it
- An alert fires when the automation itself misbehaves, not just when the system it manages does
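The three conditions above can be checked mechanically before a system is allowed to act alone. This is a minimal sketch under stated assumptions: the `Readiness` class, its field names, and the signal names in the usage note are all hypothetical, chosen only to mirror the list.

```python
from dataclasses import dataclass, field


@dataclass
class Readiness:
    """Hypothetical pre-flight record for one piece of automation."""
    required_signals: set[str] = field(default_factory=set)
    measured_signals: set[str] = field(default_factory=set)
    logs_reasoning: bool = False
    alerts_on_itself: bool = False

    def gaps(self) -> list[str]:
        """Return every unmet precondition; an empty list means ready."""
        missing = sorted(self.required_signals - self.measured_signals)
        problems = [f"signal not measured: {name}" for name in missing]
        if not self.logs_reasoning:
            problems.append("actions are not logged with their reasoning")
        if not self.alerts_on_itself:
            problems.append("no alert covers the automation itself")
        return problems

    def may_act_alone(self) -> bool:
        return not self.gaps()
```

For example, `Readiness(required_signals={"cpu", "queue_depth"}, measured_signals={"cpu"})` reports three gaps: one unmeasured signal, missing reasoning logs, and no alert on the automation.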
Start where the blast radius is small
New automation — and especially anything with a model in the loop — proves itself on low-stakes work first. It runs in suggestion mode, then in supervised mode, then unattended, and only widens its scope once the evidence is boring. Trust is earned in production, in small increments, exactly the way we treat a migration wave.
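The progression from suggestion to supervised to unattended can be sketched as a small state machine that advances one step at a time. Everything here is an assumption for illustration: the `Mode` names, the `next_mode` function, and especially the definition of "boring" as 50 clean runs with zero incidents, which a real team would set for itself.

```python
from enum import IntEnum


class Mode(IntEnum):
    SUGGEST = 0      # proposes actions, a human executes them
    SUPERVISED = 1   # executes actions while a human watches
    UNATTENDED = 2   # executes actions on its own


def next_mode(current: Mode, clean_runs: int, incidents: int,
              min_clean_runs: int = 50) -> Mode:
    """Advance at most one step, and only on uneventful evidence.

    min_clean_runs is a hypothetical threshold, not a recommendation.
    """
    if incidents > 0:
        return current  # any incident means the evidence is not boring yet
    if clean_runs >= min_clean_runs and current < Mode.UNATTENDED:
        return Mode(current + 1)
    return current
```

Advancing one step per evaluation keeps each widening of scope small, so a mistaken promotion costs one stage rather than the whole ladder.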
Automation should let your best engineers spend their attention where it matters. That only happens if the automation is trustworthy enough to ignore — which is a higher bar than making it work.
Begin a conversation about the systems you depend on.