Current State
200 Servers, 10 Engineers
Team Composition:
4 Senior SREs ($180K fully-loaded each)
4 Mid-level DevOps ($150K fully-loaded each)
2 Junior engineers ($120K fully-loaded each)
Total: 10 engineers, $1.56M/year
Server-to-engineer ratio:
200 servers ÷ 10 engineers = 20:1
Industry benchmark: 50:1, or about 4 engineers for 200 servers (you're overstaffed for this size)
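The cost and ratio figures above can be reproduced with a short calculation. This is a minimal sketch: the headcounts, salaries, server count, and benchmark are copied from the list above, while the `TEAM` table and all variable names are illustrative assumptions.

```python
# Headcount and fully-loaded annual cost per role, from the figures above.
TEAM = {
    "Senior SRE": (4, 180_000),
    "Mid-level DevOps": (4, 150_000),
    "Junior engineer": (2, 120_000),
}
SERVERS = 200
BENCHMARK_RATIO = 50  # servers per engineer, the benchmark quoted above

headcount = sum(count for count, _ in TEAM.values())
annual_cost = sum(count * cost for count, cost in TEAM.values())
servers_per_engineer = SERVERS / headcount

# Engineers needed if the team ran at the benchmark ratio.
benchmark_headcount = SERVERS / BENCHMARK_RATIO

print(f"{headcount} engineers, ${annual_cost / 1e6:.2f}M/year")
print(f"{servers_per_engineer:.0f}:1 actual vs {BENCHMARK_RATIO}:1 benchmark")
print(f"Headcount at benchmark ratio: {benchmark_headcount:.0f}")
```

Keeping the roles in one table makes it easy to rerun the numbers after a hire or a change in fleet size.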
Time Allocation:
Time allocation (typical):
70% firefighting (incident response, manual fixes)
20% planned maintenance (patches, upgrades)
10% strategic projects (new features, optimization)
Operational capacity:
Incidents per month: 240 (24 per engineer)
Time per incident: 45 minutes average
Monthly firefighting: 180 hours team-wide (18 hours per engineer)
Actual capacity: Fully utilized, no headroom
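The link between incident volume and firefighting hours is simple arithmetic and worth checking explicitly. The sketch below assumes the 240 incidents per month are a team-wide count and are spread evenly across all 10 engineers; the even split and the variable names are assumptions, not something the figures above state.

```python
INCIDENTS_PER_MONTH = 240   # team-wide count, from the figures above
MINUTES_PER_INCIDENT = 45   # average, from the figures above
ENGINEERS = 10

# Total firefighting time for the whole team, in hours.
team_hours = INCIDENTS_PER_MONTH * MINUTES_PER_INCIDENT / 60

# Per-engineer load, assuming incidents are spread evenly (an assumption).
incidents_per_engineer = INCIDENTS_PER_MONTH / ENGINEERS
hours_per_engineer = team_hours / ENGINEERS

print(f"Team firefighting: {team_hours:.0f} hours/month")
print(f"Per engineer: {incidents_per_engineer:.0f} incidents, "
      f"{hours_per_engineer:.0f} hours/month")
```

In practice the load is rarely even: whoever is on call absorbs most of it, so individual months can look much worse than the average.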
What This Looks Like Daily:
Typical week for an SRE:
Monday: 3 incidents (disk full, pod restart, connection reset) = 2.25 hours
Tuesday: 4 incidents = 3 hours
Wednesday: 2 incidents + planned patch maintenance = 4.5 hours
Thursday: 3 incidents + on-call prep = 3 hours
Friday: 2 incidents + incident review meeting = 2.5 hours
Total: 14 incidents/week = 10.5 hours firefighting
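The weekly total can be tallied from the daily incident counts. This is a small sketch using the counts listed above and the 45-minute average; the dictionary and variable names are illustrative.

```python
MINUTES_PER_INCIDENT = 45  # average, from the figures above

# Incidents handled each day in the example week above.
incidents_by_day = {
    "Monday": 3,
    "Tuesday": 4,
    "Wednesday": 2,
    "Thursday": 3,
    "Friday": 2,
}

weekly_incidents = sum(incidents_by_day.values())
weekly_firefighting_hours = weekly_incidents * MINUTES_PER_INCIDENT / 60

print(f"{weekly_incidents} incidents/week = "
      f"{weekly_firefighting_hours} hours firefighting")
```

This counts incident time only. The daily hour figures for Wednesday through Friday also include maintenance, on-call prep, and the review meeting, so they add up to more than the firefighting total.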
Monthly:
60 incidents × 45 min = 45 hours firefighting
30 hours planned maintenance
15 hours meetings (stand-ups, planning, retros)
70 hours strategic work (if lucky)
Reality: 160 hours/month - 75 hours toil (45 firefighting + 30 planned maintenance) = 85 hours for everything else
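The monthly budget can be laid out as a small table to see where the 160 hours go. This is a sketch under the assumptions that "toil" means firefighting plus planned maintenance and that a working month is 160 hours; the names are illustrative.

```python
WORK_HOURS_PER_MONTH = 160
MINUTES_PER_INCIDENT = 45
INCIDENTS_PER_MONTH = 60  # per SRE, from the monthly breakdown above

# Monthly hours per activity for one SRE.
budget = {
    "firefighting": INCIDENTS_PER_MONTH * MINUTES_PER_INCIDENT / 60,
    "planned maintenance": 30,
    "meetings": 15,
    "strategic work": 70,
}

# Toil is taken here to mean firefighting plus planned maintenance.
toil = budget["firefighting"] + budget["planned maintenance"]
remaining = WORK_HOURS_PER_MONTH - toil

for activity, hours in budget.items():
    share = hours / WORK_HOURS_PER_MONTH
    print(f"{activity:<20} {hours:5.1f} h  {share:6.1%}")
print(f"Toil: {toil:.0f} h, left for everything else: {remaining:.0f} h")
```

The 85 hours that remain still have to cover meetings, so the time available for strategic work is at most the 70 hours listed, and less in a bad month.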