kubernetes
Kubernetes OOMKill Auto-Recovery
Detect out-of-memory kills, analyze resource usage, increase limits, restart pod
Intermediate
Steps: 5
PLAYBOOK LIBRARY
50+ tested YAML playbooks for Kubernetes, databases, storage, network, and security incidents. Download, customize, deploy. No vendor lock-in: these are yours to keep.
50+
Production-ready playbooks
100%
Open YAML format (no lock-in)
FREE
Download with email
50 results
kubernetes
Identify crash-looping pods, analyze logs, adjust resource limits or restart with backoff
Intermediate
Steps: 4
kubernetes
Gracefully evict pods from node before maintenance window, verify workload migration
Advanced
Steps: 7
kubernetes
Adjust HPA thresholds based on load patterns, prevent thrashing
Intermediate
Steps: 3
kubernetes
Update ConfigMap, trigger rolling restart of dependent deployments
Beginner
Steps: 3
kubernetes
Detect near-full PVCs, request expansion, verify filesystem resize
Advanced
Steps: 5
kubernetes
Detect expiring certs, request renewal from cert-manager, verify
Intermediate
Steps: 4
kubernetes
Restart StatefulSet pods in order (0→1→2), verify readiness between restarts
Advanced
Steps: 6
kubernetes
Detect pod communication failures, analyze network policies, suggest fixes
Advanced
Steps: 4
kubernetes
Detect image pull failures, retry with auth refresh, fallback to previous version
Intermediate
Steps: 3
kubernetes
Detect service with zero endpoints, restart backing pods, verify connectivity
Beginner
Steps: 4
kubernetes
Detect quota exhaustion, analyze usage patterns, request quota increase
Intermediate
Steps: 3
databases
Detect exhausted connection pool, gracefully reset connections, verify throughput
Beginner
Steps: 3
databases
Identify queries exceeding threshold, kill safely, notify query owner
Intermediate
Steps: 4
databases
Detect replication lag, pause writes if needed, verify catch-up, resume
Advanced
Steps: 6
databases
Identify deadlocked transactions, kill blocking queries, retry if safe
Advanced
Steps: 5
databases
Detect bloat, trigger manual vacuum, adjust autovacuum settings
Intermediate
Steps: 4
databases
Detect bloated or corrupted indexes, rebuild concurrently, verify performance
Advanced
Steps: 5
databases
Detect stale cache, flush query cache, verify fresh results
Beginner
Steps: 2
databases
Post-failover: verify replication, check data consistency, update app configs
Advanced
Steps: 7
storage
Clear temp files >7 days, rotate logs, verify disk space >20% free
Beginner
Steps: 3
storage
Rotate large log files, compress archives, maintain retention policy
Beginner
Steps: 2
storage
Identify directories with excessive small files, clean up, verify inode availability
Intermediate
Steps: 4
storage
Detect stale NFS mount, unmount safely, remount with verification
Intermediate
Steps: 5
storage
Request EBS volume resize, extend filesystem, verify new size
Intermediate
Steps: 4
storage
Identify old objects, archive to Glacier or delete, verify cost reduction
Beginner
Steps: 3
storage
Create EBS/GCP disk snapshot before risky operations, verify completion
Beginner
Steps: 2
storage
Check RAID status, rebuild degraded arrays, notify on disk failure
Advanced
Steps: 6
storage
Verify recent backup completion, test restore, alert on failures
Intermediate
Steps: 4
storage
Identify over-quota users, clean temp files, notify users, enforce limits
Intermediate
Steps: 5
network
Clear DNS cache, verify resolution, update /etc/hosts if needed
Beginner
Steps: 2
network
Detect failed health checks, restart unhealthy backends, re-register
Intermediate
Steps: 4
network
Detect blocked traffic, add temporary allow rule, verify connectivity, schedule review
Advanced
Steps: 5
network
Run traceroute, identify bottleneck, suggest routing changes
Intermediate
Steps: 3
network
Detect connection exhaustion, increase ulimit, tune kernel parameters
Intermediate
Steps: 3
network
Detect NAT gateway failure, update route tables to backup gateway
Advanced
Steps: 4
network
Detect VPN tunnel down, restart IPsec/OpenVPN, verify connectivity
Intermediate
Steps: 3
security
Detect expiring certificates (30 days), request renewal, install, reload services
Intermediate
Steps: 4
security
Rotate API keys/passwords, update configs, restart affected services
Advanced
Steps: 6
security
Detect excessive failed logins, block IP temporarily, notify security team
Intermediate
Steps: 3
security
Run ClamAV scan on uploads directory, quarantine threats, notify admins
Intermediate
Steps: 4
security
Generate new SSH keys, update authorized_keys, revoke old keys, verify access
Advanced
Steps: 5
security
Identify overly permissive rules (0.0.0.0/0), suggest restrictions, update if approved
Advanced
Steps: 4
custom
Example: Adjust rate limits based on traffic patterns
Advanced
Steps: Variable
custom
Example: Orchestrate multi-step deployment with rollback
Advanced
Steps: 12+
custom
Example: Detect and stop idle EC2/GCP instances during off-hours
Intermediate
Steps: 5
custom
Example: Restart failed Airflow/data pipeline tasks
Advanced
Steps: Variable
custom
Example: Automatically provision new tenant infrastructure
Advanced
Steps: 10+
custom
Example: Run security/compliance scans, generate report
Advanced
Steps: Variable
custom
Blank YAML template with examples and documentation
All levels
Steps: Your choice
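To make the catalog entries more concrete, here is a rough Python sketch of the "detect" and "increase limits" steps that an OOMKill recovery playbook describes. It is not taken from any playbook in the library. It assumes the pod is available as a dict shaped like the Kubernetes Pod JSON (`status.containerStatuses[].lastState.terminated.reason` and `spec.containers[].resources.limits.memory`). The function names, the 1.5x growth factor, and the 4Gi cap are hypothetical choices made for illustration.

```python
import math

# Binary suffixes for Kubernetes memory quantities (only a subset is handled).
_UNITS = {"Ki": 1024, "Mi": 1024**2, "Gi": 1024**3}


def parse_memory(quantity: str) -> int:
    """Convert a quantity such as '512Mi' to bytes; bare integers are bytes."""
    for suffix, factor in _UNITS.items():
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * factor
    return int(quantity)


def format_mebibytes(num_bytes: int) -> str:
    """Round up to whole MiB and render as a Kubernetes quantity string."""
    return f"{math.ceil(num_bytes / _UNITS['Mi'])}Mi"


def oom_killed_containers(pod: dict) -> list[str]:
    """Names of containers whose last termination reason was OOMKilled."""
    names = []
    for status in pod.get("status", {}).get("containerStatuses", []):
        terminated = status.get("lastState", {}).get("terminated") or {}
        if terminated.get("reason") == "OOMKilled":
            names.append(status["name"])
    return names


def suggest_limits(pod: dict, growth: float = 1.5, cap: str = "4Gi") -> dict[str, str]:
    """Map each OOM-killed container to a larger memory limit, capped at `cap`.

    `growth` and `cap` are illustrative defaults, not recommendations.
    """
    killed = set(oom_killed_containers(pod))
    cap_bytes = parse_memory(cap)
    suggestions = {}
    for container in pod.get("spec", {}).get("containers", []):
        if container["name"] not in killed:
            continue
        current = container.get("resources", {}).get("limits", {}).get("memory")
        if current is None:
            continue  # no memory limit set, so there is nothing to raise
        new_bytes = min(int(parse_memory(current) * growth), cap_bytes)
        suggestions[container["name"]] = format_mebibytes(new_bytes)
    return suggestions


# Hypothetical pod: "app" was OOM-killed, "sidecar" was not.
example_pod = {
    "spec": {
        "containers": [
            {"name": "app", "resources": {"limits": {"memory": "512Mi"}}},
            {"name": "sidecar", "resources": {"limits": {"memory": "128Mi"}}},
        ]
    },
    "status": {
        "containerStatuses": [
            {"name": "app", "lastState": {"terminated": {"reason": "OOMKilled"}}},
            {"name": "sidecar", "lastState": {}},
        ]
    },
}

print(suggest_limits(example_pod))  # {'app': '768Mi'}
```

Capping the increase keeps a leaking container from being granted ever more memory on each kill. A real playbook would also need to patch the owning workload and confirm the restart, which this sketch leaves out.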
Pre-built bundles for common use cases. Download all at once.
12 PLAYBOOKS
Complete set of K8s remediation playbooks for autonomous Kubernetes management.
8 PLAYBOOKS
Production-tested PostgreSQL and MySQL playbooks for reliability and recovery.
6 PLAYBOOKS
Automated security playbooks for certificate management, secret rotation, and compliance operations.
Write your own playbook in 10-30 minutes using our template.
Every playbook is available on GitHub under MIT license. Fork, customize, contribute back. No lock-in, ever.
Yes. All playbooks are free to download and use forever.
No. They are standard YAML files usable with existing tools.
Absolutely. Every playbook is editable YAML with no vendor lock-in.
We add 2-4 new playbooks monthly based on user requests.
Start with free playbooks for any automation tool. When you're ready, activate autonomous execution and compliance logging with SentienGuard Free.
8,743 infrastructure engineers have downloaded these playbooks. Join them.