Product Overview

From Detection to Resolution Without Waking Humans

Lightweight agents detect anomalies, a RAG engine selects playbooks, autonomous execution fixes incidents, and immutable logs prove it happened. A four-stage pipeline with <3 minutes total latency and zero manual intervention.

50 MB · Agent binary size · <100 MB RAM resident

165ms · Playbook selection latency · RAG semantic search

<60s · Typical execution time · Routine infrastructure fixes

0 · Inbound ports required · Outbound HTTPS only

How It Works: End-to-End Architecture

Four components: agents in your infrastructure, control plane for intelligence, execution orchestrator for playbooks, immutable storage for audit trail.

Your Infrastructure (AWS / GCP / Azure / On-Prem)

Server 1, Server 2, ... Server N: one Agent each (50 MB · <100 MB RAM)

Metrics every 30s (batched): CPU, memory, disk, network, process count, service health

Outbound HTTPS (443) · TLS 1.3 · Cert Pinning

SentienGuard Control Plane (Managed SaaS)

Metrics Ingestion (<200ms)

Time-series database
30-second intervals
Metric normalization

Anomaly Detection (<100ms)

Dynamic baselines (7-day rolling)
Statistical analysis (σ deviation)
Time-of-day pattern matching

RAG Engine (<165ms)

Incident → 1536-dim vector
Semantic playbook search
Confidence scoring (0.0-1.0)
Context: host, env, time-of-day

Execution Orchestrator (10-90s)

Ed25519 playbook signing
Command dispatch to agent
Health verification per step
Automatic rollback on failure

Audit Storage

AWS S3 + Object Lock (WORM)
SHA-256 hash-chained entries
Immutable — cannot be modified
2-year retention (7-year configurable)

Playbook execution results (commands, outputs, timestamps)

Agent executes via SSH

Bash commands & scripts
File operations (cleanup, rotation)
Service management (restart, reload)
Database connection management

Agent executes via kubectl

Pod restarts & eviction
Horizontal scaling (replica count)
Rolling rollbacks
Node drain & cordon

End-to-End Flow: Detection to Resolution

Metrics Collection (30-second intervals)

Agent collects: CPU usage per core, memory (used/available/swap), disk usage per filesystem, network (bytes in/out, packet loss), process count, service health checks
Agent batches metrics, sends via HTTPS to control plane
Latency: <200ms from collection to ingestion

Anomaly Detection (real-time statistical analysis)

Control plane maintains 7-day rolling baseline per metric per host
Calculates: mean, standard deviation, time-of-day patterns for every metric
Detects: Deviations >2σ from expected (configurable threshold per metric)
Example: Disk usage 91% when baseline is 68% ± 5% = 4.6σ deviation → anomaly
Latency: <100ms from metric arrival to anomaly detection
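
The deviation check described above can be sketched in a few lines. This is a minimal illustration, not SentienGuard's implementation: the function names are hypothetical, and it assumes the baseline mean and standard deviation have already been computed from the rolling window.

```python
from statistics import mean, stdev

def baseline_from_samples(samples):
    """Mean and sample standard deviation over a window of readings (e.g. 7 days)."""
    return mean(samples), stdev(samples)

def sigma_deviation(value, baseline_mean, baseline_std):
    """How many standard deviations `value` sits from the baseline mean."""
    if baseline_std <= 0:
        raise ValueError("baseline_std must be positive")
    return abs(value - baseline_mean) / baseline_std

def is_anomaly(value, baseline_mean, baseline_std, threshold_sigma=2.0):
    """Flag readings beyond the threshold (2 sigma by default, configurable per metric)."""
    return sigma_deviation(value, baseline_mean, baseline_std) > threshold_sigma

# Worked example from the text: 91% disk usage against a 68% ± 5% baseline.
deviation = sigma_deviation(91.0, 68.0, 5.0)
print(f"{deviation:.1f} sigma, anomaly={is_anomaly(91.0, 68.0, 5.0)}")
```

A time-of-day-aware version would keep one baseline per hour-of-week bucket and pick the bucket matching the reading's timestamp before calling `is_anomaly`.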

Playbook Selection (RAG semantic search)

Incident converted to vector embedding (1536 dimensions) capturing metric type, host context, environment, time-of-day
Semantic search across playbook library (50+ pre-built playbooks)
Context matching: host type (VM, container, bare metal), environment (prod/staging), time-of-day, historical success rate for similar incidents
Confidence scoring: >0.90 autonomous execution, 0.70-0.90 requires human approval via Slack, <0.70 escalates to on-call with full context
Latency: <165ms from anomaly to playbook selection
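
The three confidence bands map naturally onto a small routing function. The sketch below is an assumption about how the boundaries are handled: the text does not say which band a score of exactly 0.90 or 0.70 falls into, so here both are treated as approval-gated. The function and return values are hypothetical names.

```python
def route_incident(confidence: float) -> str:
    """Map a playbook confidence score (0.0-1.0) to a handling mode."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0.0 and 1.0")
    if confidence > 0.90:
        return "autonomous"   # execute without human involvement
    if confidence >= 0.70:
        return "approval"     # ask for human approval (e.g. via Slack)
    return "escalate"         # hand off to on-call with full context

for score in (0.95, 0.82, 0.55):
    print(score, route_incident(score))
```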

Execution (autonomous or approval-gated)

Control plane signs playbook with Ed25519 cryptographic signature
Agent verifies signature and timestamp freshness (<5 minutes) before execution
Agent executes steps via SSH, kubectl, or cloud provider APIs on the target host
Health verification after each step confirms the action had the desired effect
Automatic rollback reverses all changes if any verification step fails
Latency: 10-90 seconds depending on playbook complexity
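
The execute, verify, rollback loop can be pictured as follows. This is a simplified sketch under stated assumptions: `Step` and `run_playbook` are hypothetical names, each step is assumed to carry its own undo action, and error handling for steps that raise exceptions is left out.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Step:
    name: str
    apply: Callable[[], None]    # performs the change
    verify: Callable[[], bool]   # health check after the change
    undo: Callable[[], None]     # reverses the change

def run_playbook(steps: List[Step]) -> bool:
    """Run steps in order; on a failed verification, undo everything applied so far."""
    applied: List[Step] = []
    for step in steps:
        step.apply()
        applied.append(step)
        if not step.verify():
            # Roll back in reverse order, including the step that just failed.
            for done in reversed(applied):
                done.undo()
            return False
    return True

# Demo with in-memory actions; the second step fails its health check.
log: List[str] = []
steps = [
    Step("rotate-logs", lambda: log.append("apply rotate-logs"),
         lambda: True, lambda: log.append("undo rotate-logs")),
    Step("restart-service", lambda: log.append("apply restart-service"),
         lambda: False, lambda: log.append("undo restart-service")),
]
succeeded = run_playbook(steps)
print(succeeded, log)
```

Undoing in reverse order matters when later steps depend on earlier ones, for example a service restart that relies on a config file written one step before.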

Audit Logging (immutable storage)

Every action logged: command text, full stdout/stderr output, nanosecond timestamp, exit code, RBAC authorizer identity
Stored in AWS S3 with Object Lock (Write Once, Read Many) — cannot be modified or deleted
Hash-chained entries: each record contains SHA-256 hash of the previous record, creating tamper-evident chain
Retention: 2 years default (hot storage), configurable to 7 years (cold storage) for regulated industries
Export formats: JSON for API and SIEM integration, CSV for spreadsheets, formatted PDF reports for auditor handoff
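
Hash chaining is easy to demonstrate in isolation. The sketch below is illustrative only: the entry layout, the all-zero genesis value, and the JSON canonicalization are assumptions, not SentienGuard's actual record format, and the example commands are placeholders.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed placeholder "previous hash" for the first entry

def entry_hash(entry: dict) -> str:
    """SHA-256 over a canonical JSON encoding of the entry."""
    canonical = json.dumps(entry, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()

def append_entry(chain: list, record: dict) -> dict:
    """Append a record that embeds the hash of the previous entry."""
    prev_hash = entry_hash(chain[-1]) if chain else GENESIS
    entry = {"record": record, "prev_hash": prev_hash}
    chain.append(entry)
    return entry

def verify_chain(chain: list) -> bool:
    """Recompute every link; any edited entry breaks the link that follows it."""
    expected = GENESIS
    for entry in chain:
        if entry["prev_hash"] != expected:
            return False
        expected = entry_hash(entry)
    return True

chain: list = []
append_entry(chain, {"command": "df -h", "exit_code": 0})
append_entry(chain, {"command": "systemctl restart app", "exit_code": 0})
append_entry(chain, {"command": "systemctl is-active app", "exit_code": 0})
print(verify_chain(chain))
```

A chain on its own only detects edits to entries that have a successor. Detecting a change to the newest entry requires the latest hash to be anchored somewhere the writer cannot alter, which is the role write-once storage plays here.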

Total Pipeline Latency: Detection → Resolution

Fastest (disk cleanup): 87 seconds

Typical (service restart with health checks): 2-3 minutes

Complex (database failover with verification): 5-8 minutes

Manual (human-dependent): 30 min - 4 hours

Two Ways to Deploy: Agent-Based or Direct API

Agent-Based

Recommended — 95% of deployments

Lightweight binary (50 MB) installed on each server. Collects metrics, executes playbooks locally, reports results to control plane. Full autonomous resolution capability with zero inbound attack surface.

Supported Platforms

Linux: Ubuntu 20.04+, CentOS 7+, Debian 10+, RHEL 8+
Architectures: x86_64, ARM64
Container: Kubernetes (Helm chart), Docker (container runtime)
Cloud: AWS EC2/EKS, GCP Compute/GKE, Azure VMs/AKS
On-premises: Bare metal, VMware, Proxmox

Installation

Linux

curl -sSL https://get.sentienguard.com/install | bash

Kubernetes (Helm)

helm repo add sentienguard https://charts.sentienguard.com
helm install sentienguard sentienguard/agent \
  --set apiKey="$SENTIENGUARD_API_KEY"

Resource Usage

Binary size: 50 MB

Memory: <100 MB resident (RSS)

CPU: <0.5% steady-state, 2% peak during playbook execution

Disk: 200 MB (binary + logs + cache)

Network: <100 KB/s outbound (batched metrics every 30s)

What Agent Collects

Infrastructure metrics: CPU, memory, disk, network (via eBPF + system APIs)
Process metrics: count, resource usage per process, open file descriptors
Kubernetes metrics: pod status, node health, events (via kubectl API)
Service health: HTTP endpoints, TCP ports, systemd unit status

What Agent Executes

SSH commands: bash scripts, file operations, service management
Kubernetes operations: kubectl (pod restart, scale, rollback, drain)
Cloud provider APIs: AWS CLI, gcloud, az CLI for cloud-native operations
Database queries: PostgreSQL, MySQL connection pool management

Security Model

Outbound-only: Agent initiates HTTPS (443) to control plane, never listens
No inbound ports: Zero attack surface from network scanning or exploitation
TLS 1.3: Certificate pinning prevents man-in-the-middle attacks
Cryptographic verification: Playbooks signed by control plane, agent verifies before execution
Non-root execution: Runs as dedicated service account with minimal privileges (configurable)

Advantages

Full playbook execution capability (not just metrics collection)
Works in air-gapped environments with cached playbooks (Enterprise tier)
Local execution means faster remediation (<60s typical round-trip)
Offline resilience: agent caches playbooks, executes during network outage

Direct API

Specialized Use Cases

Send metrics directly to SentienGuard API endpoint (/v1/incidents). No agent installation required. Metrics-only mode with AI-powered playbook recommendations. Ideal for serverless architectures or gradual evaluation before full agent deployment.

Supported Platforms

AWS Lambda (serverless functions)
Cloud Run, Cloud Functions (GCP serverless)
Azure Functions (serverless)
Edge computing (Cloudflare Workers, Lambda@Edge)
Custom applications (any HTTP client)

API Call Example

curl -X POST https://api.sentienguard.com/v1/incidents \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "host": "lambda-payment-processor",
    "metric": "invocation_errors",
    "value": 15.2,
    "threshold": 5.0,
    "environment": "production"
  }'

What You Send

Metric name (string): "disk_usage", "error_rate", "latency_p99"
Current value (number): 91.4, 12.5, 850
Threshold (number): 85.0, 5.0, 500
Host identifier (string): unique identifier for the metric source
Environment (string): "production", "staging", "dev"
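
The same request can be built from application code. The sketch below uses only the Python standard library and the five fields listed above; the helper name is hypothetical, the API key is a placeholder, and the response format is not documented here, so the example stops at constructing the request.

```python
import json
import urllib.request

API_URL = "https://api.sentienguard.com/v1/incidents"  # endpoint from the curl example

def build_incident_request(api_key, host, metric, value, threshold, environment):
    """Build (but do not send) a POST request carrying one metric reading."""
    if not isinstance(value, (int, float)) or not isinstance(threshold, (int, float)):
        raise TypeError("value and threshold must be numbers")
    body = json.dumps({
        "host": host,
        "metric": metric,
        "value": value,
        "threshold": threshold,
        "environment": environment,
    }).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )

req = build_incident_request(
    "example-key", "lambda-payment-processor",
    "invocation_errors", 15.2, 5.0, "production",
)
# To send it: urllib.request.urlopen(req, timeout=10)
print(req.get_method(), req.full_url)
```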

What SentienGuard Does

Anomaly detection: compare value against dynamic baseline for that metric
Playbook selection: RAG matches incident to remediation strategy
Notification: Slack/email alert with recommended playbook and confidence score
Audit logging: record incident, recommendation, and outcome

Limitations

No autonomous execution (cannot run playbooks without an installed agent)
Metrics-only mode (detection and recommendation without remediation)
Must trigger remediation manually or via webhook callback

When to Use

Serverless architectures (no persistent servers for agent installation)
Custom monitoring systems (already collecting metrics, want AI playbook recommendations)
Alerting enrichment (add RAG-powered recommendations to existing alerting pipeline)
Gradual evaluation (start with metrics-only, add agents to production later)

Advantages

Zero infrastructure overhead (no agent binary to manage or update)
Works with serverless and ephemeral compute environments
Simple HTTP integration from any language or platform
Gradual adoption path: start with metrics, add agents when ready

Feature	Agent-Based	Direct API
Installation	Binary or Helm chart	HTTP POST to endpoint
Metrics Collection	Automatic (30s intervals)	Manual (you send)
Playbook Execution	Autonomous	Manual only
Latency to Resolution	<90s autonomous	Hours (human-dependent)
Supported Platforms	Linux, Kubernetes	Any HTTP client
Resource Usage	<100 MB RAM, <0.5% CPU	Zero (no agent)
Air-Gapped Support	Yes (Enterprise)	No (requires internet)
Audit Logging	Complete trail	Detection only
Cost	$4/node/month	$4/incident/month
Best For	Production infrastructure	Serverless, evaluation

95% of deployments use the agent-based model for full autonomous resolution. Direct API is for serverless architectures or gradual evaluation. Start with agents for production workloads.

Zero Inbound Attack Surface

Outbound-Only Communication

Principle

Agents never listen on network ports. All communication is initiated outbound, from the agent to the control plane. No inbound connections are accepted.

Implementation

Agent connects to: control.sentienguard.com:443
Protocol: HTTPS (TLS 1.3)
Direction: Outbound only (agent → control plane)
Firewall rules: Allow outbound 443, deny all inbound
NAT-friendly: Works behind corporate firewalls and HTTP proxies

What This Prevents

Inbound exploitation (no listening ports to attack)
Lateral movement (compromised agent cannot accept commands from attacker)
Port scanning (no services exposed to network)

Comparison

Datadog agent: Listens on port 8125 (StatsD), 8126 (APM traces)

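Prometheus: Pull model; each monitored host exposes an HTTP metrics endpoint for scraping (node_exporter defaults to port 9100; the Prometheus server itself listens on 9090)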

SentienGuard: Zero listening ports, outbound-only

Certificate Pinning

Principle

Agent trusts only SentienGuard's pinned TLS certificate. A man-in-the-middle attempt fails even if the attacker holds a valid CA-signed certificate, because any certificate that does not match the pin is rejected.

Implementation

SentienGuard CA certificate hash embedded in agent binary at compile time
Agent verifies server certificate matches pinned hash on every connection
Connection refused if certificate does not match (no fallback to CA trust)
Certificate rotation: new agent version required (controlled deployment)

What This Prevents

Man-in-the-middle attacks (attacker cannot impersonate control plane)
Rogue control plane (agent refuses connection to unauthorized servers)
Certificate authority compromise (pinning bypasses entire CA trust chain)

Certificate Pinning Verification (Pseudocode)

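// Fingerprint pinned in the agent binary at compile time
expectedHash := "sha256:a3f8b9c2d1e4..."
// Fingerprint of the certificate the server presented
actualHash := "sha256:" + hex(sha256(serverCertificate))

if actualHash != expectedHash {
    // No fallback to CA trust: refuse the connection
    return error("Certificate pinning failed")
}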
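
For readers who want something runnable, the pseudocode translates to Python roughly as below. This is a sketch, not the agent's code: the function names are hypothetical, the certificate bytes are a stand-in, and whether the real pin covers the leaf certificate, the CA certificate, or the public key alone is not specified here.

```python
import hashlib
import hmac

def fingerprint(cert_der: bytes) -> str:
    """SHA-256 fingerprint of a DER-encoded certificate, in 'sha256:<hex>' form."""
    return "sha256:" + hashlib.sha256(cert_der).hexdigest()

def check_pin(cert_der: bytes, pinned: str) -> None:
    """Raise if the presented certificate does not match the pinned fingerprint."""
    actual = fingerprint(cert_der)
    # compare_digest avoids leaking how many leading characters matched.
    if not hmac.compare_digest(actual, pinned):
        raise ConnectionError("Certificate pinning failed")

# Stand-in bytes; with a live TLS socket the DER bytes would come from
# ssl.SSLSocket.getpeercert(binary_form=True).
trusted_cert = b"example DER bytes of the trusted certificate"
pinned = fingerprint(trusted_cert)

check_pin(trusted_cert, pinned)          # passes silently
try:
    check_pin(b"some other certificate", pinned)
except ConnectionError as exc:
    print(exc)
```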

Cryptographic Playbook Signing

Principle

Every playbook is signed by the control plane with its private key. The agent verifies the signature before execution, which prevents unauthorized command injection.

Implementation

Control plane signs playbook with Ed25519 private key
Signature covers: playbook YAML, timestamp, incident ID, target host
Agent verifies signature with public key (embedded in agent binary)
Execution proceeds only if signature is valid AND timestamp is fresh (<5 minutes)

What This Prevents

Unauthorized playbook injection (attacker cannot forge Ed25519 signature)
Replay attacks (timestamp freshness check rejects stale playbooks)
Playbook tampering (any modification invalidates the signature)

Signed Playbook Payload

{
  "playbook": "disk_cleanup_prod_db",
  "version": "1.4.2",
  "incident_id": "inc_2026_02_10_1435",
  "target_host": "prod-db-03.us-east-1",
  "timestamp": "2026-02-10T14:35:43.891Z",
  "signature": "ed25519:a8f3b2c1d9e4..."
}

Agent Verification Process

1. Extract signature from payload
2. Verify signature using control plane public key
3. Check timestamp (must be within 5 minutes of current time)
4. Verify target host matches agent's hostname
5. If all checks pass: execute playbook
6. If any check fails: reject, log failed authorization attempt
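
The verification steps above can be sketched as a single function. Several things here are assumptions: the Python standard library has no Ed25519 support, so the signature check is passed in as a callable (`stub_verifier` is a placeholder, not real cryptography); the canonical encoding of the signed fields is invented for illustration; and all function names are hypothetical.

```python
import json
from datetime import datetime, timedelta, timezone
from typing import Callable, Tuple

MAX_AGE = timedelta(minutes=5)  # freshness window from the text

def signed_bytes(payload: dict) -> bytes:
    """Assumed canonical form: every field except the signature, as sorted JSON."""
    unsigned = {k: v for k, v in payload.items() if k != "signature"}
    return json.dumps(unsigned, sort_keys=True, separators=(",", ":")).encode("utf-8")

def authorize(payload: dict, hostname: str, now: datetime,
              verify_signature: Callable[[bytes, str], bool]) -> Tuple[bool, str]:
    signature = payload.get("signature")                      # step 1
    if not signature:
        return False, "missing signature"
    if not verify_signature(signed_bytes(payload), signature):  # step 2
        return False, "bad signature"
    issued = datetime.fromisoformat(payload["timestamp"].replace("Z", "+00:00"))
    if abs(now - issued) > MAX_AGE:                           # step 3
        return False, "stale timestamp"
    if payload["target_host"] != hostname:                    # step 4
        return False, "host mismatch"
    return True, "ok"                                         # step 5

def stub_verifier(message: bytes, signature: str) -> bool:
    """Placeholder only: a real agent would perform Ed25519 verification here."""
    return signature.startswith("ed25519:")

payload = {
    "playbook": "disk_cleanup_prod_db",
    "version": "1.4.2",
    "incident_id": "inc_2026_02_10_1435",
    "target_host": "prod-db-03.us-east-1",
    "timestamp": "2026-02-10T14:35:43.891Z",
    "signature": "ed25519:a8f3b2c1d9e4...",
}
now = datetime(2026, 2, 10, 14, 36, 0, tzinfo=timezone.utc)
print(authorize(payload, "prod-db-03.us-east-1", now, stub_verifier))
```

A caller would log the returned reason on any rejection, matching step 6.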

Attack Surface Analysis

Traditional Monitoring Agent

Listening ports (StatsD, HTTP metrics endpoint)
Accepts inbound connections from any source
Trusts CA-signed certificates (MITM vulnerable)
Executes commands from any authenticated source

SentienGuard Agent

Zero listening ports (outbound-only)
Refuses all inbound connections
Certificate pinning (MITM impossible)
Cryptographically signed playbooks only

Result: 90% reduction in attack surface compared to traditional monitoring agents.

Performance & Capacity

Agent Performance

Latency

Metric collection: <50ms per batch

Metric transmission: <200ms to control plane

Heartbeat interval: 30 seconds

Playbook download: <100ms

Playbook execution: 10-90 seconds (workload-dependent)

Throughput

Metrics per second: 100+ per agent

Concurrent playbooks: 1 per agent (serialized for safety)

Max agents per org: Unlimited

Max playbooks per agent: 100 active

Resource Limits

CPU limit: 2% (configurable via systemd/Kubernetes)

Memory limit: 200 MB (OOM protection)

Disk usage: 500 MB max (log rotation)

Network bandwidth: 1 Mbps peak

Control Plane Performance

Latency

Anomaly detection: <100ms from metric ingestion

RAG playbook selection: <165ms semantic search

Playbook signing: <10ms

Total pipeline: <500ms (detection → dispatch)

Throughput

Metrics ingested: 1M+ per second

Incidents processed: 10K+ per second

Playbook executions: 1K+ concurrent

API requests: 100K+ per minute

Availability

Uptime SLA: 99.9% (Enterprise: 99.99%)

Multi-region: US-East, US-West, EU-West (Enterprise)

Failover: Automatic within 30 seconds

Data replication: 3× redundancy

Storage Performance

Audit Logs

Write latency: <50ms to S3

Read latency: <200ms from S3 (hot storage)

Retention: 2 years hot, 7 years cold (configurable)

Size per entry: 2-10 KB (average 5 KB)

Compression: gzip (70% reduction)

Capacity (1,000-node deployment)

Logs per day: ~50,000 entries

Storage per day: 250 MB uncompressed, 75 MB compressed

Annual storage: ~27 GB (compressed)

Cost: ~$0.50/month S3 Standard (2-year retention)
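
The storage figures follow from the per-entry numbers. The short calculation below reproduces them, assuming decimal units (1 MB = 1,000 KB) and the stated averages; the variable names are illustrative.

```python
ENTRIES_PER_DAY = 50_000   # 1,000-node deployment, from the text
AVG_ENTRY_KB = 5           # average entry size
GZIP_REDUCTION = 0.70      # fraction of size removed by compression

raw_mb_per_day = ENTRIES_PER_DAY * AVG_ENTRY_KB / 1000
compressed_mb_per_day = raw_mb_per_day * (1 - GZIP_REDUCTION)
annual_gb = compressed_mb_per_day * 365 / 1000

print(f"raw: {raw_mb_per_day:.0f} MB/day")
print(f"compressed: {compressed_mb_per_day:.0f} MB/day")
print(f"annual (compressed): {annual_gb:.1f} GB")
```

The annual figure matches the compressed daily volume, so sizing for uncompressed retention would need roughly 3.3 times as much space.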

Playbook Library

Capacity

Pre-built playbooks: 50+ included

Custom playbooks: Unlimited

Playbook size: <100 KB typical (YAML)

Embedding dimensions: 1536 per playbook

Vector DB size: ~10 MB per 1,000 playbooks

Performance

Semantic search: <100ms for 10,000 playbooks

Confidence scoring: <50ms

Context filtering: <15ms

Total selection: <165ms
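
Semantic search over embeddings is, at its core, a nearest-neighbour lookup. The toy example below uses 3-dimensional vectors in place of 1536-dimensional ones and ranks by cosine similarity. The similarity metric, the vector values, and the function names are all assumptions for illustration; a production vector database would typically use an approximate index instead of a full scan.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def rank_playbooks(incident_vec, library, top_k=3):
    """Score every playbook embedding against the incident, best match first."""
    scored = [(name, cosine_similarity(incident_vec, vec))
              for name, vec in library.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]

# Made-up embeddings for three playbooks named elsewhere on this page.
library = {
    "disk_cleanup_linux": [0.9, 0.1, 0.0],
    "postgres_connection_reset": [0.1, 0.9, 0.2],
    "ssl_cert_renewal": [0.0, 0.2, 0.9],
}
incident = [0.8, 0.2, 0.1]   # stand-in for an embedded "disk usage high" incident

best_name, best_score = rank_playbooks(incident, library, top_k=1)[0]
print(best_name, round(best_score, 2))
```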

Six Core Components

Semantic Playbook Matching

1536-dimension vector embeddings match incidents to remediation strategies using retrieval-augmented generation. Context matching evaluates host type, environment, time-of-day, and historical success rates. Confidence scoring determines autonomous execution (>0.90), human approval (0.70-0.90), or escalation (<0.70). The system gets smarter over time as successful resolutions reinforce playbook confidence scores and failed attempts get flagged for review and refinement.

Lightweight, Outbound-Only

50 MB binary with <100 MB RAM resident footprint. Zero inbound ports opened on your infrastructure. Certificate pinning prevents man-in-the-middle attacks. Cryptographic playbook signing ensures only authorized remediation executes. Non-root service account with minimal privileges. Deploys in 2 minutes via one-liner or Helm chart. Works behind corporate firewalls and NAT gateways without configuration.

Dynamic Baselines, Not Static Thresholds

7-day rolling average with time-of-day patterns captures Monday morning traffic spikes and Friday evening lulls. Statistical deviation detection triggers on >2σ deviations from expected behavior. Adapts to infrastructure growth, seasonal patterns, and deployment cadences automatically. New deployments recalibrate baselines within 48 hours. No manual threshold tuning needed. High-signal, low-noise anomaly detection that catches real problems and ignores expected fluctuations.

Execute, Verify, Rollback

Idempotent playbooks execute via SSH, kubectl, or cloud provider APIs. Every step includes health verification to confirm the action had the desired effect before proceeding. If any verification step fails, automatic rollback reverses all changes made during the current execution. Complete stdout/stderr captured for every command. Cryptographically signed audit trail records exactly what was run, when, by which agent, on which host. Typical execution under 60 seconds for routine infrastructure fixes.

Immutable Compliance Evidence

S3 Object Lock (Write Once, Read Many) prevents modification or deletion of audit records. SHA-256 hash-chained entries create tamper-evident chain that auditors can independently verify. 2-year default retention, configurable to 7 years for regulated industries. Each entry captures: Who, RBAC Authorizer, What, When, Where, and Result. Satisfies HIPAA §164.312(b), SOC 2 CC6.1/CC7.2, PCI-DSS Requirement 10, ISO 27001 A.12.4. Export as JSON, CSV, or formatted PDF.

Unified Infrastructure Dashboard

Real-time health monitor showing fleet status across all environments. Incident timeline with full execution history and audit trail. Playbook library with search, import, and custom YAML editor. User management with RBAC roles: Observer (view only), Remediation Authority (approve and execute), Admin (full control). Multi-tenant architecture for MSPs managing multiple client environments with strict isolation. API access for programmatic integration with existing tooling.

Works With Your Existing Stack

Monitoring Sources

Datadog

Import monitors as playbook triggers

Prometheus

AlertManager webhook integration

CloudWatch

SNS to SentienGuard API endpoint

Grafana

Webhook notification channel

Custom metrics

HTTP POST to /v1/incidents API

Execution Targets

SSH

Linux servers, bash commands, file operations

Kubernetes

kubectl via API (pod restart, scale, rollback, drain)

AWS

CLI, boto3, CloudFormation stack operations

GCP

gcloud, Cloud SDK for Compute, GKE, Cloud SQL

Azure

az CLI, ARM templates for VMs, AKS, SQL

Notification Channels

Slack

Approval gates, incident summaries, resolution reports

Email

SMTP, SendGrid, AWS SES integration

PagerDuty

Escalation on autonomous failure or low confidence

Webhooks

Custom HTTP callbacks for any integration

SMS

Twilio for critical alerts and escalation

Storage & Logging

AWS S3

Primary audit log storage with Object Lock (WORM)

Elasticsearch

Optional log shipping for search and analysis

Splunk

SIEM integration for security event correlation

Datadog Logs

Forward audit logs if keeping Datadog for dashboards

Sumo Logic

Log aggregation and compliance reporting

How Teams Deploy SentienGuard

Replace Datadog Entirely

Setup

Datadog current cost: $18K/month (500 nodes, $15/host + metrics + APM)
Deploy SentienGuard agents on all 500 nodes ($4/node = $2K/month)
Import Datadog monitors as SentienGuard playbook triggers
Self-host Grafana for dashboards ($0) or use Grafana Cloud ($1.5K/month)

Timeline

Month 1

Run both in parallel (validation). Prove 87% autonomous resolution on live incidents. Team reviews every auto-resolved incident to build confidence.

Month 2

Shift alerting to SentienGuard as primary responder. Datadog becomes read-only dashboards. Cancel Datadog alerting and APM tiers.

Month 3

Cancel Datadog entirely. Deploy Grafana for any dashboard needs. Full autonomous resolution operational.

Month 4+

Optimized state. Engineering team fully reclaimed for product work. On-call pages reduced 87%.

Result

Cost: $18K/month → $2K/month (89% reduction)

MTTR: 4 hours → 90 seconds (99% improvement)

Savings: $192K/year

Hybrid (Keep Datadog Dashboards)

Setup

Datadog current cost: $18K/month (500 nodes, full suite)
Deploy SentienGuard for autonomous remediation ($2K/month)
Downgrade Datadog to infrastructure metrics only (no alerting, no APM, no log management)
SentienGuard handles all incident detection and resolution

Timeline

Month 1-2

Run both systems in parallel. SentienGuard shadow-resolves incidents while Datadog remains primary. Compare resolution times and accuracy.

Month 3

Cancel Datadog alerting, APM, and log management tiers. Retain infrastructure metrics for dashboards. Route all incident response through SentienGuard.

Month 4+

Steady state. Datadog provides read-only dashboards at reduced tier ($4K/month). SentienGuard handles all autonomous resolution ($2K/month).

Result

Cost: $18K/month → $6K/month (67% reduction)

MTTR: 4 hours → 90 seconds (autonomous)

Savings: $144K/year
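
The cost figures in the two migration scenarios above can be checked with a few lines of arithmetic. The helper name is hypothetical; the inputs are the monthly totals quoted in each scenario.

```python
def monthly_cost_change(before: float, after: float):
    """Return (monthly saving, percent reduction, annual saving)."""
    saved = before - after
    return saved, saved / before * 100, saved * 12

agent_cost = 500 * 4                              # 500 nodes at $4/node/month
full = monthly_cost_change(18_000, agent_cost)    # replace Datadog entirely
hybrid = monthly_cost_change(18_000, 6_000)       # keep Datadog dashboards

for label, (saved, pct, annual) in (("full", full), ("hybrid", hybrid)):
    print(f"{label}: ${saved:,.0f}/month saved, {pct:.0f}% reduction, ${annual:,.0f}/year")
```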

Greenfield (No Existing Monitoring)

Setup

No Datadog, New Relic, or Prometheus subscription to migrate from
Deploy SentienGuard agents on all production nodes ($4/node/month)
Use included 50+ playbook library for common infrastructure incidents
Add Grafana for dashboards (optional, $0 self-hosted or $1.5K/month cloud)

Timeline

Week 1

Deploy agents across fleet. Import standard playbook library. Agents begin collecting metrics and building baselines immediately.

Week 2-4

Tune baselines as 7-day rolling average establishes patterns. Add custom playbooks for application-specific scenarios. Start with approval mode, transition to autonomous.

Month 2+

Full autonomous operations. Baselines calibrated. Custom playbooks tested and deployed. On-call team focused on strategic work, not firefighting.

Result

Cost: $2K/month SentienGuard + $0 Grafana = $2K/month total

MTTR: <90 seconds from day 1

Savings: No legacy monitoring bills to compare against

Deploy Your First Agent in 2 Minutes

Free for 3 nodes. No credit card. First autonomous resolution in 8 minutes.

Install Agent

Linux

curl -sSL https://get.sentienguard.com/install | bash

Kubernetes (Helm)

helm repo add sentienguard \
  https://charts.sentienguard.com
helm install sentienguard \
  sentienguard/agent \
  --set apiKey="$SENTIENGUARD_API_KEY"

Docker

docker run -d \
  --name sentienguard-agent \
  -e SENTIENGUARD_API_KEY="$API_KEY" \
  -v /var/run/docker.sock:/var/run/docker.sock \
  sentienguard/agent:latest

Time: 2 minutes

Verify Connection

Check agent status

# Check agent status
systemctl status sentienguard-agent

# View logs
tail -f /var/log/sentienguard/agent.log

Expected Output

[INFO] Connected to control plane
[INFO] Heartbeat sent (30s interval)
[INFO] Baseline learning started
[INFO] 6 metrics streaming

Time: 30 seconds

Import Playbooks

Via Dashboard

Navigate to: Playbooks → Import
Select: disk_cleanup, memory_restart, k8s_pod_restart
Click: Import All

Via CLI

sentienguard playbook import \
  disk_cleanup_linux
sentienguard playbook import \
  postgres_connection_reset
sentienguard playbook import \
  ssl_cert_renewal

Time: 5 minutes

Total: 8 minutes to first autonomous resolution.

Ready to See It in Action?

Deploy free on 3 nodes. Trigger a test incident. Watch autonomous resolution.
Review the audit log. All in under 10 minutes.

Free tier: 3 nodes, unlimited playbooks, full audit logs, no credit card required. Upgrade anytime to scale beyond 3 nodes.