```
# Brittle rule example
if metric == "disk_usage" AND value > 85%:
    execute playbook "disk_cleanup"
```

RAG Intelligence
Semantic Playbook Matching Beats Brittle If/Then Rules
1536-dimension vector embeddings match incidents to remediation strategies based on context, not keyword matching. Confidence scoring determines autonomous execution vs human approval. <165ms selection latency from detection to playbook dispatch.
Why If/Then Logic Breaks at Scale
Traditional monitoring uses rules engines. They optimize for speed (simple if/then checks) but sacrifice accuracy. RAG inverts this: slightly slower selection (165ms vs 10ms) but dramatically higher accuracy (94% vs 60% correct playbook).
3:00 AM - Disk usage 86% on log-aggregator-03
3:00 AM - Rule matches: disk_usage > 85%
3:00 AM - Executes: disk_cleanup_temp_files playbook
3:02 AM - Playbook deletes /tmp (empty, no space freed)
3:02 AM - Disk still 86% (real cause: log rotation failed)
3:02 AM - Alert re-fires
3:02 AM - Rule matches again, executes same playbook
3:04 AM - Infinite loop until human intervenes

Context-blind matching
Rule: If disk >85%, clean temp files
Reality: Production database vs dev server vs log aggregator
Problem: Same threshold, different root causes, wrong fix
Keyword dependency
Rule: Matches "disk_usage" exactly
Reality: Misses "filesystem_full", "storage_capacity", "volume_usage"
Problem: Synonyms break matching
Maintenance nightmare
Rule: 500 servers × 20 metrics = 10,000 potential rules
Reality: Every new service = new rules. Every threshold change = rule update.
Problem: Rules multiply faster than humans can maintain
No learning
Rule: Executes the same playbook forever
Reality: Never learns: "This playbook failed 5 times on this host type"
Problem: Repeats failures, no improvement
Binary decisions
Rule: Match or no match (0% or 100% confidence)
Reality: Incidents have nuance
Problem: Can't express "probably this playbook, but verify first"
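The keyword-dependency and binary-decision failures above fit in a two-line sketch: the rule fires only on an exact metric name, and it returns nothing between "match" and "no match".

```python
def rule_match(metric: str, value: float) -> bool:
    """Brittle rule: exact keyword plus fixed threshold, binary outcome."""
    return metric == "disk_usage" and value > 85.0

print(rule_match("disk_usage", 91.4))       # True
print(rule_match("filesystem_full", 91.4))  # False: a synonym breaks the match
```

There is no way to express "probably this playbook, but verify first" with a boolean.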
Retrieval-Augmented Generation Pipeline
Four stages from raw incident data to playbook selection. Worst-case pipeline latency: 50ms + 100ms + 15ms + 10ms = 175ms; typical selections finish under the 165ms target.
Incident Embedding
Latency: <50ms
- Incident data converted to a natural language description
- Passed to OpenAI embedding model (text-embedding-3-large)
- Output: 1536-dimension vector representing incident semantics

```json
{
  "host": "prod-db-03.us-east-1",
  "metric": "disk_usage",
  "value": 91.4,
  "baseline": 68.2,
  "deviation": 4.8,
  "environment": "production",
  "service": "postgresql",
  "time": "2026-02-10T14:35:42Z"
}
```

"Production PostgreSQL database server prod-db-03 in us-east-1 experiencing disk usage anomaly: 91.4% observed, 68.2% expected, 4.8 standard deviations above baseline at 2:35 PM on Tuesday."

```
[0.023, -0.891, 0.445, ..., 0.129]  // 1536 numbers
```

Semantic Search
Latency: <100ms
- Incident vector compared to all playbook vectors in the library
- Cosine similarity calculated (measures the angle between vectors)
- Top 5 most similar playbooks retrieved
- Library: 50+ pre-built + unlimited custom playbooks

```
similarity = cosine(incident_vector, playbook_vector)
           = dot_product(A, B) / (magnitude(A) * magnitude(B))
           = 0.94  // Higher = more similar
```

1. disk_cleanup_prod_db (similarity: 0.94)
2. disk_cleanup_general (similarity: 0.87)
3. log_rotation_postgres (similarity: 0.82)
4. database_vacuum (similarity: 0.76)
5. filesystem_expansion (similarity: 0.71)

Context Filtering
Latency: <15ms
- Top 5 candidates filtered by metadata constraints
- Host type, environment, time-of-day, historical success
- Incompatible playbooks removed before scoring

```yaml
# Playbook metadata
playbook: disk_cleanup_prod_db
constraints:
  host_pattern: "*.db.*"        # Must match database servers
  service: "postgresql"         # Must be PostgreSQL
  environment: ["production"]   # Production only
```

Filtering Results
✓ disk_cleanup_prod_db: host=prod-db-03, service=postgresql, env=production
✓ disk_cleanup_general: no constraints (universal playbook)
✓ log_rotation_postgres: matches postgresql service
✗ database_vacuum: removed (constraint: only run during maintenance windows)
✗ filesystem_expansion: removed (constraint: requires approval, cloud provider API)
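The constraint filter can be sketched with Python's `fnmatch` for glob-style host patterns. Note this is an illustrative sketch: the platform's actual pattern syntax (e.g. `*.db.*`) may differ from `fnmatch` semantics, so the example uses a hypothetical `prod-db-*` pattern.

```python
from fnmatch import fnmatch

def passes_constraints(incident: dict, meta: dict) -> bool:
    """Drop candidates whose metadata constraints the incident violates.
    Sketch only: the real engine also checks time-of-day windows and
    approval requirements before scoring."""
    if meta.get("service") and incident["service"] != meta["service"]:
        return False
    if meta.get("environment") and incident["environment"] not in meta["environment"]:
        return False
    if meta.get("host_pattern") and not fnmatch(incident["host"], meta["host_pattern"]):
        return False
    return True

incident = {"host": "prod-db-03.us-east-1", "service": "postgresql",
            "environment": "production"}
print(passes_constraints(incident, {"host_pattern": "prod-db-*",
                                    "service": "postgresql",
                                    "environment": ["production"]}))  # True
print(passes_constraints(incident, {"service": "redis"}))             # False
```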
Historical Success Rates
disk_cleanup_prod_db
46/47 runs successful · avg 87s
disk_cleanup_general
178/203 runs successful · avg 62s
log_rotation_postgres
35/38 runs successful · avg 45s
Confidence Scoring
Latency: <10ms
- Final playbook selected based on weighted score
- Confidence determines: autonomous, approval-required, or escalate

```
confidence = (0.6 × semantic_similarity) +
             (0.3 × historical_success_rate) +
             (0.1 × recency_boost)

disk_cleanup_prod_db:
  = (0.6 × 0.94) + (0.3 × 0.979) + (0.1 × 1.0)
  = 0.564 + 0.294 + 0.100
  = 0.958  // 95.8% confidence
```

Confidence Thresholds
- >0.90: Execute autonomously (no approval needed)
- 0.70–0.90: Approval required (Slack notification, human confirms)
- <0.70: Escalate to human (no playbook match confident enough)

Final Selection
disk_cleanup_prod_db
Confidence: 0.958 (95.8%)
Action: Execute autonomously (>0.90 threshold)
Estimated duration: 87 seconds (historical average)
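The weighted formula and the 0.90/0.70 thresholds above translate directly into code. The exact boundary handling (inclusive vs exclusive) is an assumption.

```python
def confidence(similarity: float, success_rate: float, recency: float) -> float:
    """Weighted confidence score from the formula above."""
    return 0.6 * similarity + 0.3 * success_rate + 0.1 * recency

def route(score: float) -> str:
    """Map a confidence score to an execution mode (thresholds from this page)."""
    if score > 0.90:
        return "autonomous"
    if score >= 0.70:
        return "approval_required"
    return "escalate"

score = confidence(0.94, 0.979, 1.0)
print(round(score, 3), route(score))  # 0.958 autonomous
```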
Real Incident → Playbook Selection
Walk through a real incident matching flow: from raw metrics to autonomous remediation with full audit trail.
Incident Details
```json
{
  "incident_id": "inc_2026_02_10_1435",
  "timestamp": "2026-02-10T14:35:42.124Z",
  "host": "prod-db-03.us-east-1",
  "environment": "production",
  "service": "postgresql",
  "metric": "disk_usage",
  "current_value": 91.4,
  "baseline": 68.2,
  "deviation": 4.8,
  "severity": "critical"
}
```

Natural Language
"Production PostgreSQL database prod-db-03 in us-east-1 experiencing critical disk usage: 91.4% current vs 68.2% baseline, 4.8 standard deviations above normal."
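The JSON-to-prose conversion can be sketched as a template over the incident fields above. The wording is illustrative; the real converter may phrase things differently.

```python
def describe_incident(i: dict) -> str:
    """Render a structured incident record as prose suitable for embedding.
    Field names follow the incident JSON above."""
    return (
        f"{i['environment'].capitalize()} {i['service']} server {i['host']} "
        f"experiencing {i['severity']} {i['metric'].replace('_', ' ')}: "
        f"{i['current_value']}% current vs {i['baseline']}% baseline, "
        f"{i['deviation']} standard deviations above normal."
    )

incident = {
    "host": "prod-db-03.us-east-1", "environment": "production",
    "service": "postgresql", "metric": "disk_usage",
    "current_value": 91.4, "baseline": 68.2,
    "deviation": 4.8, "severity": "critical",
}
print(describe_incident(incident))
```

The resulting string is what gets sent to the embedding model; richer prose (service, environment, deviation) is what lets cosine similarity capture context that keyword rules miss.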
Top 3 Playbook Candidates

1. disk_cleanup_prod_db
- Semantic similarity: 0.94
- Historical success: 97.9% (46/47 runs)
- Constraints: ✓ host=*.db.*, service=postgresql, env=production
- Last run: 24 hours ago (successful)
- Avg duration: 87 seconds
- Final confidence: 0.958

2. disk_cleanup_general
- Semantic similarity: 0.87
- Historical success: 87.7% (178/203 runs)
- Constraints: ✓ No constraints (universal)
- Last run: 5 hours ago (successful)
- Avg duration: 62 seconds
- Final confidence: 0.854

3. log_rotation_postgres
- Semantic similarity: 0.82
- Historical success: 92.1% (35/38 runs)
- Constraints: ✓ service=postgresql
- Last run: 3 days ago (successful)
- Avg duration: 45 seconds
- Final confidence: 0.821
```yaml
name: disk_cleanup_prod_db
version: 1.4.2
steps:
  - name: clear_temp_files
    command: "find /tmp -type f -mtime +7 -delete"
  - name: rotate_logs
    command: "logrotate -f /etc/logrotate.conf"
  - name: verify_space_freed
    health_check: "disk_usage < 80%"
```

Selection Decision
Selected: disk_cleanup_prod_db
Reason: Highest confidence (0.958 > 0.90 threshold)
Action: Execute autonomously
Notification: Informational Slack message (not approval request)
Outcome
Execution time: 87 seconds
Disk usage: 91.4% → 72.1%
Health verification: PASS
Status: Resolved autonomously
Confidence Improves Over Time
New playbooks start conservative (human oversight) and earn autonomy through proven success. This prevents “AI running wild” while allowing automation to scale as confidence builds.
Confidence Score Over Time
First Deployment
Escalate to human
Total runs: 0
Success: N/A
Confidence: 68.4%
(0.6 × 0.89) + (0.3 × 0.50 assumed) + (0.1 × 0.0)
Result: Human reviews, approves manually, playbook succeeds
Building Confidence
Approval required
Total runs: 3
Success: 100% (3/3)
Confidence: 88.4%
(0.6 × 0.89) + (0.3 × 1.00) + (0.1 × 0.5)
Result: Slack notification, human approves, playbook succeeds
Approaching Autonomy
Still approval-required
Total runs: 12
Success: 91.7% (11/12)
Confidence: 88.9%
(0.6 × 0.89) + (0.3 × 0.917) + (0.1 × 0.8)
Result: Human approves 12 times, all successes
Autonomous
Execute autonomously
Total runs: 47
Success: 97.9% (46/47)
Confidence: 92.8%
(0.6 × 0.89) + (0.3 × 0.979) + (0.1 × 1.0)
Result: No human approval needed, runs automatically
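The four stages above all come from the same weighted formula, holding semantic similarity at 0.89 and plugging in each stage's success rate and recency boost (the 0.50 assumed prior and the recency values are taken from the worked numbers above):

```python
def confidence(similarity: float, success_rate: float, recency_boost: float) -> float:
    """Weighted confidence score used throughout this page."""
    return 0.6 * similarity + 0.3 * success_rate + 0.1 * recency_boost

stages = [
    ("first deployment",     0.50,    0.0),  # assumed 50% prior, no recency
    ("building confidence",  1.00,    0.5),  # 3/3 successes
    ("approaching autonomy", 11 / 12, 0.8),  # 11/12 successes
    ("autonomous",           46 / 47, 1.0),  # 46/47 successes
]
for name, rate, recency in stages:
    print(name, round(confidence(0.89, rate, recency), 3))
# Reproduces 0.684, 0.884, 0.889, 0.928 from the stages above.
```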
How Playbooks Are Stored and Matched
Every playbook is a YAML file with execution steps, metadata for RAG matching, and historical performance stats that update after each run.
```yaml
# Playbook YAML file
name: disk_cleanup_prod_db
version: 1.4.2
description: |
  Clear disk space on production database servers by removing
  temporary files older than 7 days and rotating logs. Targets
  PostgreSQL servers experiencing disk usage >85%.

# Metadata for RAG matching
metadata:
  tags: ["disk", "cleanup", "database", "postgresql", "storage"]
  host_pattern: "*.db.*"
  service: "postgresql"
  environment: ["production"]
  severity: ["warning", "critical"]

# Vector embedding (computed at playbook creation)
embedding: [0.023, -0.891, 0.445, ..., 0.129]  # 1536 dimensions

# Historical performance (updated after each run)
performance:
  total_runs: 47
  successful: 46
  failed: 1
  success_rate: 0.979
  avg_duration_seconds: 87
  last_run: "2026-02-09T03:12:45Z"
  last_result: "success"

# Execution steps
steps:
  - name: clear_temp_files
    action: ssh_command
    command: "find /tmp -type f -mtime +7 -delete"
    timeout: 60s
  - name: rotate_logs
    action: ssh_command
    command: "logrotate -f /etc/logrotate.conf"
    timeout: 60s
  - name: verify_space_freed
    action: health_check
    metric: disk_usage
    threshold: "< 80%"
    retry: 3
    retry_delay: 10s
```

Library Organization
```
Playbook Library (vector database)
├── Pre-built Playbooks (50+)
│   ├── disk_cleanup_linux
│   ├── disk_cleanup_prod_db
│   ├── memory_restart_service
│   ├── k8s_pod_restart
│   ├── postgres_connection_reset
│   └── ssl_cert_renewal
├── Custom Playbooks (unlimited)
│   ├── custom_app_restart
│   └── custom_cache_clear
└── Embeddings Index
    ├── Incident vectors → Playbook vectors
    ├── Similarity search <100ms
    └── Supports 10,000+ playbooks
```
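In miniature, the similarity search over the embeddings index is nearest-neighbour lookup by cosine similarity. This brute-force sketch uses toy 3-dimensional vectors; production systems use an HNSW index over the full 1536-dimensional embeddings to keep search under 100ms at 10,000+ playbooks.

```python
from math import sqrt

def cosine(a, b):
    """Cosine similarity: dot product over the product of magnitudes."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b)))

def top_k(incident_vec, library, k=5):
    """Brute-force top-k search; a real deployment delegates this to an
    HNSW-backed vector database (Pinecone / Weaviate / Qdrant)."""
    scored = [(name, cosine(incident_vec, vec)) for name, vec in library.items()]
    return sorted(scored, key=lambda p: p[1], reverse=True)[:k]

# Toy vectors, purely illustrative (real embeddings are 1536-dimensional).
library = {
    "disk_cleanup_prod_db": [0.9, 0.1, 0.0],
    "ssl_cert_renewal":     [0.0, 0.2, 0.9],
}
print(top_k([1.0, 0.0, 0.0], library, k=2))
```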
Why RAG Outperforms Traditional Rules
Roughly 150ms of extra selection latency (165ms vs 10ms) to avoid executing the wrong playbook is an excellent trade.
| Dimension | Rules Engine | RAG Intelligence | Winner |
|---|---|---|---|
| Matching Method | Exact keyword match | Semantic similarity | RAG |
| Context Awareness | None (if metric=="X") | Full (host, service, time, history) | RAG |
| Synonyms | Fail (must match exactly) | Handle automatically | RAG |
| Maintenance | Manual (update rules per change) | Automatic (learns from embeddings) | RAG |
| New Playbooks | Write new rules for each | Auto-indexed, immediately searchable | RAG |
| Confidence | Binary (match or no match) | Scored (0.0–1.0 confidence) | RAG |
| Learning | Static (never improves) | Dynamic (confidence increases) | RAG |
| Selection Speed | Faster (10ms if/then) | Slower (165ms embedding + search) | Rules |
| Selection Accuracy | Lower (60–70% correct) | Higher (90–95% correct) | RAG |
| False Positives | High (wrong playbook executed) | Low (low confidence = escalate) | RAG |
| Scalability | Poor (rules multiply) | Excellent (vector search scales) | RAG |
Outcome comparison over 100 disk incidents: Rules Engine vs RAG Intelligence (accuracy and false-positive rates summarized in the table above).
RAG Pipeline Components
End-to-end architecture from incident detection to playbook dispatch. Every component optimized for production latency and reliability.
Incident Detection
- Receives structured incident data from anomaly detection engine
- Includes host, metric, value, baseline, deviation, environment, service
Natural Language Conversion
- Converts structured JSON to human-readable incident description
- Captures full semantic context for accurate embedding
OpenAI Embedding Model (text-embedding-3-large)
Latency: <50ms
- Input: Natural language text
- Output: 1536-dimension vector
- Cost: ~$0.0001 per incident
- Alternative: Self-hosted (sentence-transformers) for air-gapped environments

Vector Database (Pinecone / Weaviate / Qdrant)
Latency: <100ms
- Index type: HNSW (Hierarchical Navigable Small World)
- Distance metric: Cosine similarity
- Returns: Top 5 most similar playbooks
- Persistence: Disk-backed (survives restarts)

Context Filtering Engine
Latency: <15ms
- Host pattern matching, environment restrictions
- Historical success rate comparison
- Time-of-day constraints
- Caching: Recent incidents cached for 5 minutes

Confidence Scoring
Latency: <10ms
- Formula: 0.6×similarity + 0.3×success_rate + 0.1×recency
- Thresholds configurable per organization
- Override: Admins can force autonomous/approval per playbook

Selected Playbook Dispatched
Total latency: <175ms
Playbook name, confidence score, execution mode, and estimated duration sent to the execution orchestrator.
Beyond Basic Matching
Multi-Metric Correlation
Problem: Single metric anomaly might not warrant playbook execution.
Solution: RAG considers multiple related metrics. If disk_usage is 91% but inode_usage, write_iops, and read_latency are all normal, confidence drops and approval is required instead of autonomous execution.
Disk high but performance normal → likely expected behavior → lower confidence (0.82), require approval
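One way to sketch that dampening is to scale the base confidence by how many correlated metrics corroborate the anomaly. The 0.85 floor and linear scaling here are illustrative assumptions, not the product's published formula; they are chosen so an uncorroborated incident drops below the 0.90 autonomous threshold.

```python
def dampened_confidence(base: float, anomalous_related: int, total_related: int) -> float:
    """Scale confidence by the fraction of related metrics that are also
    anomalous (illustrative sketch; factor values are assumptions)."""
    corroboration = anomalous_related / total_related
    return base * (0.85 + 0.15 * corroboration)

# disk_usage at 91% but inode_usage, write_iops, read_latency all normal:
print(round(dampened_confidence(0.958, 0, 3), 3))  # 0.814: below 0.90, so approval required
```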
Time-of-Day Awareness
Problem: Same metric at different times can mean different root causes.
Solution: Embedding includes time context. CPU 85% at 2 PM matches traffic_surge_scaling (scale horizontally). CPU 85% at 3 AM matches background_job_throttle (slow down batch processing).
Same metric, different playbooks based on time of day
Incident Clustering
Problem: 5 servers with same issue = 1 root cause, not 5 separate incidents.
Solution: RAG clusters similar incidents. When prod-db-01, 02, and 03 all show disk 90%+ within 2 minutes, it identifies the shared root cause (e.g., shared NFS mount full) and executes one cluster-wide playbook.
3 servers alerted → cluster detected → 1 playbook execution, not 3
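A simplified clustering sketch: group same-metric incidents that fire within a two-minute window of each other. The real clusterer presumably also compares embedding similarity and host topology before declaring a shared root cause.

```python
from datetime import datetime, timedelta

def cluster_incidents(incidents, window=timedelta(minutes=2)):
    """Group same-metric incidents arriving within `window` of the previous
    one (simplified sketch of incident clustering)."""
    clusters = []
    for inc in sorted(incidents, key=lambda i: i["time"]):
        last = clusters[-1][-1] if clusters else None
        if last and inc["metric"] == last["metric"] and inc["time"] - last["time"] <= window:
            clusters[-1].append(inc)
        else:
            clusters.append([inc])
    return clusters

t0 = datetime(2026, 2, 10, 14, 35)
alerts = [
    {"host": "prod-db-01", "metric": "disk_usage", "time": t0},
    {"host": "prod-db-02", "metric": "disk_usage", "time": t0 + timedelta(seconds=40)},
    {"host": "prod-db-03", "metric": "disk_usage", "time": t0 + timedelta(seconds=95)},
]
print(len(cluster_incidents(alerts)))  # 1 cluster -> one playbook execution, not three
```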
Negative Matching
Problem: Some playbooks should NEVER run on certain hosts or during certain windows.
Solution: Exclusion rules filter out dangerous playbooks before scoring. aggressive_cache_clear never runs on production, never during backup windows (00:00–06:00), and never during major outages (>5 concurrent incidents).
Excluded playbooks removed before confidence scoring
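The exclusion check runs before confidence scoring. The rule shapes below (environment blocklist, blackout hours, concurrent-incident cap) are illustrative assumptions derived from the description above, not the platform's actual schema.

```python
def is_excluded(playbook: str, env: str, hour: int, concurrent_incidents: int,
                exclusions: dict) -> bool:
    """Return True if an exclusion rule bars this playbook from running
    (sketch; rule shapes are assumptions)."""
    rules = exclusions.get(playbook, {})
    if env in rules.get("never_environments", []):
        return True
    start, end = rules.get("blackout_hours", (None, None))
    if start is not None and start <= hour < end:
        return True
    if concurrent_incidents > rules.get("max_concurrent_incidents", float("inf")):
        return True
    return False

exclusions = {"aggressive_cache_clear": {
    "never_environments": ["production"],
    "blackout_hours": (0, 6),        # backup window 00:00-06:00
    "max_concurrent_incidents": 5,   # skip during major outages
}}
print(is_excluded("aggressive_cache_clear", "production", 14, 1, exclusions))  # True
print(is_excluded("aggressive_cache_clear", "staging", 14, 1, exclusions))     # False
```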
Common Questions
Q: What happens when no playbook matches an incident?
The confidence score falls below the 0.70 threshold. SentienGuard escalates to a human via Slack, email, or PagerDuty. You investigate manually, then create a new playbook for future occurrences. After 3–5 successful manual resolutions, RAG has enough confidence to start running it autonomously.

Q: Can I trigger a playbook manually?
Yes. Admins can manually trigger any playbook from the dashboard. This is useful for testing new playbooks or handling edge cases. Manual triggers are still logged in the immutable audit trail.

Q: How do I write playbooks that match well?
Write a detailed description in the YAML metadata. RAG embeds the description, so natural language clarity matters more than keywords. Good: "Clear disk space on PostgreSQL production databases by removing temporary files older than 7 days and rotating application logs. Use when disk usage exceeds 85% and database performance is unaffected." Bad: "Cleans disk".

Q: How is RAG different from traditional ML?
Traditional ML detects anomalies (what's broken). RAG selects remediation (how to fix it). Different problems, complementary solutions. SentienGuard uses both: statistical ML for anomaly detection, RAG for playbook selection.

Q: Can RAG invent new remediation steps?
No. RAG only selects from your existing playbook library. It cannot invent new remediation steps. This is "Retrieval-Augmented" Generation: retrieval constrains output to real playbooks. If no playbook matches (confidence <0.70), RAG escalates rather than guessing.

Q: What does the AI cost?
Approximately $0.0001 per incident (negligible). For 10,000 incidents/month that's about $1/month in embedding cost. The platform cost ($4/node) includes AI service fees, so no surprise bills.
Try RAG Intelligence
1. Deploy SentienGuard agents on 3 nodes (free tier).
2. Import the pre-built playbook library (50+ playbooks, already embedded).
3. Trigger a test incident (fill a disk to 90%).
4. Watch RAG select a playbook in <165ms.
5. Review the confidence score and execution results.
Free tier: 3 nodes, unlimited playbooks, full audit logs, no credit card required.