SentienGuard

Product Overview

From Detection to Resolution Without Waking Humans

Lightweight agents detect anomalies, a RAG engine selects playbooks, autonomous execution fixes incidents, and immutable logs prove it happened. A four-stage pipeline, typically under 3 minutes end to end, with zero manual intervention.

50 MB: Agent binary size (<100 MB RAM resident)
165ms: Playbook selection latency (RAG semantic search)
<60s: Typical execution time (routine infrastructure fixes)
0: Inbound ports required (outbound HTTPS only)

How It Works: End-to-End Architecture

Four components: agents in your infrastructure, control plane for intelligence, execution orchestrator for playbooks, immutable storage for audit trail.

Your Infrastructure (AWS / GCP / Azure / On-Prem)

  • Server 1: Agent (50 MB · <100 MB RAM)
  • Server 2: Agent (50 MB · <100 MB RAM)
  • Server N: Agent (50 MB · <100 MB RAM)

Metrics every 30s (batched)

CPU, memory, disk, network, process count, service health

Outbound HTTPS (443) · TLS 1.3 · Cert Pinning

SentienGuard Control Plane (Managed SaaS)

1. Metrics Ingestion (<200ms)
  • Time-series database
  • 30-second intervals
  • Metric normalization
2. Anomaly Detection (<100ms)
  • Dynamic baselines (7-day rolling)
  • Statistical analysis (σ deviation)
  • Time-of-day pattern matching
3. RAG Engine (<165ms)
  • Incident → 1536-dim vector
  • Semantic playbook search
  • Confidence scoring (0.0-1.0)
  • Context: host, env, time-of-day
4. Execution Orchestrator (10-90s)
  • Ed25519 playbook signing
  • Command dispatch to agent
  • Health verification per step
  • Automatic rollback on failure
5. Audit Storage
  • AWS S3 + Object Lock (WORM)
  • SHA-256 hash-chained entries
  • Immutable: cannot be modified
  • 2-year retention (7-year configurable)

Playbook execution results (commands, outputs, timestamps)

Agent executes via SSH
  • Bash commands & scripts
  • File operations (cleanup, rotation)
  • Service management (restart, reload)
  • Database connection management
Agent executes via kubectl
  • Pod restarts & eviction
  • Horizontal scaling (replica count)
  • Rolling rollbacks
  • Node drain & cordon

End-to-End Flow: Detection to Resolution

1. Metrics Collection (30-second intervals)

  • Agent collects: CPU usage per core, memory (used/available/swap), disk usage per filesystem, network (bytes in/out, packet loss), process count, service health checks
  • Agent batches metrics, sends via HTTPS to control plane
  • Latency: <200ms from collection to ingestion
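The collect-and-batch step above can be sketched with the Python standard library. This is only an illustration: the envelope fields (`host`, `interval_s`, `samples`) and the function names are hypothetical, not SentienGuard's actual wire format, and the real agent is described as using eBPF and system APIs rather than these calls.

```python
import json
import os
import shutil
import time


def collect_sample(path="/"):
    """Collect one sample of a few host metrics using only the standard library."""
    disk = shutil.disk_usage(path)
    sample = {
        "timestamp": time.time(),
        "disk_used_pct": round(100.0 * disk.used / disk.total, 1),
        "cpu_count": os.cpu_count(),
    }
    # Load average is not available on every platform, so treat it as optional.
    try:
        sample["load_1m"] = os.getloadavg()[0]
    except (AttributeError, OSError):
        pass
    return sample


def build_batch(host, samples):
    """Wrap samples in a JSON envelope (hypothetical format, for illustration)."""
    return json.dumps({"host": host, "interval_s": 30, "samples": samples})


batch = build_batch("server-1", [collect_sample()])
```

In a real agent the batch would then be sent over outbound HTTPS; that step is omitted here because the ingestion endpoint is not documented on this page.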
2. Anomaly Detection (real-time statistical analysis)

  • Control plane maintains 7-day rolling baseline per metric per host
  • Calculates: mean, standard deviation, time-of-day patterns for every metric
  • Detects: Deviations >2σ from expected (configurable threshold per metric)
  • Example: Disk usage 91% when baseline is 68% ± 5% = 4.6σ deviation → anomaly
  • Latency: <100ms from metric arrival to anomaly detection
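The σ-deviation check above reduces to a z-score against the baseline. A minimal sketch, assuming the baseline is summarized by a mean and a standard deviation; the function names are illustrative, and time-of-day patterns are left out for brevity.

```python
from statistics import mean, pstdev


def baseline(samples):
    """Summarize historical samples as (mean, standard deviation)."""
    return mean(samples), pstdev(samples)


def sigma_deviation(value, baseline_mean, baseline_std):
    """How many standard deviations `value` sits from the baseline mean."""
    if baseline_std == 0:
        return 0.0 if value == baseline_mean else float("inf")
    return abs(value - baseline_mean) / baseline_std


def is_anomaly(value, baseline_mean, baseline_std, threshold_sigma=2.0):
    """Flag values beyond the configurable threshold (2 sigma by default)."""
    return sigma_deviation(value, baseline_mean, baseline_std) > threshold_sigma


# The example from the text: 91% disk usage against a 68% ± 5% baseline.
z = sigma_deviation(91.0, 68.0, 5.0)   # (91 - 68) / 5 = 4.6
flagged = is_anomaly(91.0, 68.0, 5.0)  # True, since 4.6 > 2
```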
3. Playbook Selection (RAG semantic search)

  • Incident converted to vector embedding (1536 dimensions) capturing metric type, host context, environment, time-of-day
  • Semantic search across playbook library (50+ pre-built playbooks)
  • Context matching: host type (VM, container, bare metal), environment (prod/staging), time-of-day, historical success rate for similar incidents
  • Confidence scoring: >0.90 autonomous execution, 0.70-0.90 requires human approval via Slack, <0.70 escalates to on-call with full context
  • Latency: <165ms from anomaly to playbook selection
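As a rough illustration of the selection and routing logic above, the sketch below ranks playbooks by cosine similarity and then applies the three confidence bands. Treating cosine similarity as the confidence score is an assumption made for the example, and the 3-dimensional vectors stand in for the 1536-dimension embeddings; the playbook names are placeholders.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def select_playbook(incident_vec, library):
    """Return (name, score) for the playbook most similar to the incident."""
    scored = ((name, cosine(incident_vec, vec)) for name, vec in library.items())
    return max(scored, key=lambda pair: pair[1])


def route(confidence):
    """Map a confidence score to the three handling modes described above."""
    if confidence > 0.90:
        return "autonomous"
    if confidence >= 0.70:
        return "approval"
    return "escalate"


# Toy library with made-up vectors.
library = {
    "disk_cleanup": [0.9, 0.1, 0.0],
    "service_restart": [0.1, 0.9, 0.2],
}
name, score = select_playbook([0.8, 0.2, 0.0], library)
decision = route(score)
```

Note the boundary handling: a score of exactly 0.90 falls into the approval band, since only scores above 0.90 run autonomously.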
4. Execution (autonomous or approval-gated)

  • Control plane signs playbook with Ed25519 cryptographic signature
  • Agent verifies signature and timestamp freshness (<5 minutes) before execution
  • Agent executes steps via SSH, kubectl, or cloud provider APIs on the target host
  • Health verification after each step confirms the action had the desired effect
  • Automatic rollback reverses all changes if any verification step fails
  • Latency: 10-90 seconds depending on playbook complexity
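The execute, verify, rollback loop above can be sketched as follows. This is a simplified model, not the orchestrator's implementation: the `Step` type and its callables are hypothetical, and the "infrastructure" is an in-memory dictionary.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Step:
    name: str
    apply: Callable[[], None]   # perform the change
    verify: Callable[[], bool]  # health check after the change
    undo: Callable[[], None]    # reverse the change


def run_playbook(steps: List[Step]) -> bool:
    """Apply steps in order; on a failed check, undo applied steps in reverse."""
    applied = []
    for step in steps:
        step.apply()
        applied.append(step)
        if not step.verify():
            for done in reversed(applied):
                done.undo()
            return False
    return True


# Demo: the second step's health check fails, so both steps are rolled back.
state = {"cache": "full", "service": "running"}
steps = [
    Step("clear cache",
         lambda: state.update(cache="empty"),
         lambda: state["cache"] == "empty",
         lambda: state.update(cache="full")),
    Step("restart service",
         lambda: state.update(service="restarting"),
         lambda: state["service"] == "running",
         lambda: state.update(service="running")),
]
ok = run_playbook(steps)
```

Undoing in reverse order matters when later steps depend on earlier ones; it is the same ordering a stack of compensating actions would give.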
5. Audit Logging (immutable storage)

  • Every action logged: command text, full stdout/stderr output, nanosecond timestamp, exit code, RBAC authorizer identity
  • Stored in AWS S3 with Object Lock (Write Once, Read Many) — cannot be modified or deleted
  • Hash-chained entries: each record contains SHA-256 hash of the previous record, creating tamper-evident chain
  • Retention: 2 years default (hot storage), configurable to 7 years (cold storage) for regulated industries
  • Export formats: JSON for API and SIEM integration, CSV for spreadsheets, formatted PDF reports for auditor handoff
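To make the hash-chaining idea concrete, here is a minimal sketch in which each record stores the SHA-256 hash of the previous record. The record fields, the all-zero genesis value, and the JSON canonicalization are assumptions for illustration; the actual log schema is not specified here.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first record


def record_hash(record):
    """SHA-256 over a canonical JSON encoding of the record."""
    encoded = json.dumps(record, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


def append_entry(chain, entry):
    """Append an entry that embeds the hash of the previous record."""
    prev = record_hash(chain[-1]) if chain else GENESIS
    chain.append({"prev_hash": prev, **entry})


def verify_chain(chain):
    """Recompute every link; any edit to an earlier record breaks a later link."""
    prev = GENESIS
    for record in chain:
        if record["prev_hash"] != prev:
            return False
        prev = record_hash(record)
    return True


chain = []
append_entry(chain, {"command": "df -h", "exit_code": 0})
append_entry(chain, {"command": "journalctl --vacuum-size=500M", "exit_code": 0})
append_entry(chain, {"command": "df -h", "exit_code": 0})
intact = verify_chain(chain)

chain[0]["command"] = "rm -rf /tmp/evidence"  # simulate tampering
tampered_ok = verify_chain(chain)
```

A chain on its own cannot detect a change to the final record, which is one reason to pair it with write-once storage such as Object Lock.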

Total Pipeline Latency: Detection → Resolution

Fastest (disk cleanup): 87 seconds
Typical (service restart with health checks): 2-3 minutes
Complex (database failover with verification): 5-8 minutes
Manual (human-dependent): 30 min - 4 hours

Two Ways to Deploy: Agent-Based or Direct API

Agent-Based

Recommended — 95% of deployments

Lightweight binary (50 MB) installed on each server. Collects metrics, executes playbooks locally, reports results to control plane. Full autonomous resolution capability with zero inbound attack surface.

Supported Platforms

  • Linux: Ubuntu 20.04+, CentOS 7+, Debian 10+, RHEL 8+
  • Architectures: x86_64, ARM64
  • Container: Kubernetes (Helm chart), Docker (container runtime)
  • Cloud: AWS EC2/EKS, GCP Compute/GKE, Azure VMs/AKS
  • On-premises: Bare metal, VMware, Proxmox

Installation

Linux

curl -sSL https://get.sentienguard.com/install | bash

Kubernetes (Helm)

helm repo add sentienguard https://charts.sentienguard.com
helm install sentienguard sentienguard/agent \
  --set apiKey="$SENTIENGUARD_API_KEY"

Resource Usage

Binary size: 50 MB
Memory: <100 MB resident (RSS)
CPU: <0.5% steady-state, 2% peak during playbook execution
Disk: 200 MB (binary + logs + cache)
Network: <100 KB/s outbound (batched metrics every 30s)

What Agent Collects

  • Infrastructure metrics: CPU, memory, disk, network (via eBPF + system APIs)
  • Process metrics: count, resource usage per process, open file descriptors
  • Kubernetes metrics: pod status, node health, events (via the Kubernetes API)
  • Service health: HTTP endpoints, TCP ports, systemd unit status

What Agent Executes

  • SSH commands: bash scripts, file operations, service management
  • Kubernetes operations: kubectl (pod restart, scale, rollback, drain)
  • Cloud provider APIs: AWS CLI, gcloud, az CLI for cloud-native operations
  • Database queries: PostgreSQL, MySQL connection pool management

Security Model

  • Outbound-only: Agent initiates HTTPS (443) to control plane, never listens
  • No inbound ports: Zero attack surface from network scanning or exploitation
  • TLS 1.3: Certificate pinning prevents man-in-the-middle attacks
  • Cryptographic verification: Playbooks signed by control plane, agent verifies before execution
  • Non-root execution: Runs as dedicated service account with minimal privileges (configurable)

Advantages

  • Full playbook execution capability (not just metrics collection)
  • Works in air-gapped environments with cached playbooks (Enterprise tier)
  • Local execution means faster remediation (<60s typical round-trip)
  • Offline resilience: agent caches playbooks, executes during network outage

Direct API

Specialized Use Cases

Send metrics directly to SentienGuard API endpoint (/v1/incidents). No agent installation required. Metrics-only mode with AI-powered playbook recommendations. Ideal for serverless architectures or gradual evaluation before full agent deployment.

Supported Platforms

  • AWS Lambda (serverless functions)
  • Cloud Run, Cloud Functions (GCP serverless)
  • Azure Functions (serverless)
  • Edge computing (Cloudflare Workers, Lambda@Edge)
  • Custom applications (any HTTP client)

API Call Example

curl -X POST https://api.sentienguard.com/v1/incidents \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "host": "lambda-payment-processor",
    "metric": "invocation_errors",
    "value": 15.2,
    "threshold": 5.0,
    "environment": "production"
  }'

What You Send

  • Metric name (string): "disk_usage", "error_rate", "latency_p99"
  • Current value (number): 91.4, 12.5, 850
  • Threshold (number): 85.0, 5.0, 500
  • Host identifier (string): unique identifier for the metric source
  • Environment (string): "production", "staging", "dev"
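The same request as the curl example can be built from Python's standard library. The sketch below only constructs the request; the helper name is illustrative, the fallback key is a placeholder, and the response format is not documented here, so sending and parsing are left out.

```python
import json
import os
import urllib.request

API_URL = "https://api.sentienguard.com/v1/incidents"


def build_incident_request(host, metric, value, threshold, environment, api_key):
    """Build (but do not send) a POST carrying the five documented fields."""
    body = json.dumps({
        "host": host,
        "metric": metric,
        "value": value,
        "threshold": threshold,
        "environment": environment,
    }).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


req = build_incident_request(
    "lambda-payment-processor", "invocation_errors", 15.2, 5.0, "production",
    os.environ.get("API_KEY", "placeholder-key"),
)
# To send it: urllib.request.urlopen(req, timeout=10)
```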

What SentienGuard Does

  • Anomaly detection: compare value against dynamic baseline for that metric
  • Playbook selection: RAG matches incident to remediation strategy
  • Notification: Slack/email alert with recommended playbook and confidence score
  • Audit logging: record incident, recommendation, and outcome

Limitations

  • No autonomous execution (cannot run playbooks without an installed agent)
  • Metrics-only mode (detection and recommendation without remediation)
  • Must trigger remediation manually or via webhook callback

When to Use

  • Serverless architectures (no persistent servers for agent installation)
  • Custom monitoring systems (already collecting metrics, want AI playbook recommendations)
  • Alerting enrichment (add RAG-powered recommendations to existing alerting pipeline)
  • Gradual evaluation (start with metrics-only, add agents to production later)

Advantages

  • Zero infrastructure overhead (no agent binary to manage or update)
  • Works with serverless and ephemeral compute environments
  • Simple HTTP integration from any language or platform
  • Gradual adoption path: start with metrics, add agents when ready

Feature | Agent-Based | Direct API
Installation | Binary or Helm chart | HTTP POST to endpoint
Metrics Collection | Automatic (30s intervals) | Manual (you send)
Playbook Execution | Autonomous | Manual only
Latency to Resolution | <90s autonomous | Hours (human-dependent)
Supported Platforms | Linux, Kubernetes | Any HTTP client
Resource Usage | <100 MB RAM, <0.5% CPU | Zero (no agent)
Air-Gapped Support | Yes (Enterprise) | No (requires internet)
Audit Logging | Complete trail | Detection only
Cost | $4/node/month | $4/incident/month
Best For | Production infrastructure | Serverless, evaluation

95% of deployments use agent-based model for full autonomous resolution. Direct API is for serverless architectures or gradual evaluation. Start with agents for production workloads.

Zero Inbound Attack Surface

Outbound-Only Communication

Principle

Agents never listen on network ports. All communication is initiated outbound, from agent to control plane, and no inbound connections are accepted.

Implementation

  • Agent connects to: control.sentienguard.com:443
  • Protocol: HTTPS (TLS 1.3)
  • Direction: Outbound only (agent → control plane)
  • Firewall rules: Allow outbound 443, deny all inbound
  • NAT-friendly: Works behind corporate firewalls and HTTP proxies

What This Prevents

  • Inbound exploitation (no listening ports to attack)
  • Lateral movement (compromised agent cannot accept commands from attacker)
  • Port scanning (no services exposed to network)

Comparison

Datadog agent: Listens on port 8125 (StatsD), 8126 (APM traces)
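Prometheus: Pull model; each monitored host exposes an HTTP metrics endpoint for scraping (e.g., node_exporter on port 9100; the Prometheus server itself listens on 9090)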
SentienGuard: Zero listening ports, outbound-only

Certificate Pinning

Principle

The agent trusts only SentienGuard's pinned TLS certificate. A man-in-the-middle attempt fails even if the attacker holds a valid CA-signed certificate.

Implementation

  • SentienGuard CA certificate hash embedded in agent binary at compile time
  • Agent verifies server certificate matches pinned hash on every connection
  • Connection refused if certificate does not match (no fallback to CA trust)
  • Certificate rotation: new agent version required (controlled deployment)

What This Prevents

  • Man-in-the-middle attacks (attacker cannot impersonate control plane)
  • Rogue control plane (agent refuses connection to unauthorized servers)
  • Certificate authority compromise (pinning bypasses entire CA trust chain)

Certificate Pinning Verification (Pseudocode)

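// Pinned digest embedded in the agent binary at compile time (truncated here)
expectedHash := "sha256:a3f8b9c2d1e4..."
// Hex-encoded SHA-256 of the certificate presented by the server
actualHash := "sha256:" + hex(sha256(serverCertificate))

// No fallback to CA trust: any mismatch refuses the connection
if actualHash != expectedHash {
    return error("Certificate pinning failed")
}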
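A runnable counterpart to the pseudocode, as a sketch: it fingerprints DER-encoded certificate bytes and compares them with the pinned value in constant time. The function names and the stand-in certificate bytes are illustrative; in a real client the bytes could come from `ssl.SSLSocket.getpeercert(binary_form=True)`.

```python
import hashlib
import hmac


def fingerprint(der_bytes: bytes) -> str:
    """SHA-256 fingerprint of a DER-encoded certificate, with a scheme prefix."""
    return "sha256:" + hashlib.sha256(der_bytes).hexdigest()


def pin_matches(der_bytes: bytes, expected: str) -> bool:
    """Compare against the pinned fingerprint using a constant-time comparison."""
    return hmac.compare_digest(fingerprint(der_bytes), expected)


# Stand-in bytes; a real check would hash the certificate the server presents.
trusted_cert = b"stand-in for DER certificate bytes"
pinned = fingerprint(trusted_cert)

accepted = pin_matches(trusted_cert, pinned)
rejected = pin_matches(b"some other certificate", pinned)
```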

Cryptographic Playbook Signing

Principle

Every playbook is signed by the control plane with its private key. The agent verifies the signature before execution, which prevents unauthorized command injection.

Implementation

  • Control plane signs playbook with Ed25519 private key
  • Signature covers: playbook YAML, timestamp, incident ID, target host
  • Agent verifies signature with public key (embedded in agent binary)
  • Execution proceeds only if signature is valid AND timestamp is fresh (<5 minutes)

What This Prevents

  • Unauthorized playbook injection (attacker cannot forge Ed25519 signature)
  • Replay attacks (timestamp freshness check rejects stale playbooks)
  • Playbook tampering (any modification invalidates the signature)

Signed Playbook Payload

{
  "playbook": "disk_cleanup_prod_db",
  "version": "1.4.2",
  "incident_id": "inc_2026_02_10_1435",
  "target_host": "prod-db-03.us-east-1",
  "timestamp": "2026-02-10T14:35:43.891Z",
  "signature": "ed25519:a8f3b2c1d9e4..."
}

Agent Verification Process

  1. Extract signature from payload
  2. Verify signature using control plane public key
  3. Check timestamp (must be within 5 minutes of current time)
  4. Verify target host matches agent's hostname
  5. If all checks pass: execute playbook
  6. If any check fails: reject, log failed authorization attempt
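The verification steps above can be sketched as a single function. One important caveat: Python's standard library has no Ed25519 support, so the demo below swaps in HMAC-SHA256 with a shared secret purely as a stand-in signature scheme. A real agent would verify an Ed25519 signature with the embedded public key. The canonicalization (sorted JSON of every field except `signature`) is also an assumption.

```python
import hashlib
import hmac
import json
from datetime import datetime, timedelta, timezone

MAX_AGE = timedelta(minutes=5)


def canonical_bytes(payload):
    """Bytes covered by the signature: every field except the signature itself."""
    unsigned = {k: v for k, v in payload.items() if k != "signature"}
    return json.dumps(unsigned, sort_keys=True).encode("utf-8")


def verify_payload(payload, verify_sig, hostname, now):
    """Run the checks in order; return (ok, reason)."""
    signature = payload.get("signature", "")                # step 1
    if not verify_sig(canonical_bytes(payload), signature):  # step 2
        return False, "bad signature"
    ts = datetime.fromisoformat(payload["timestamp"].replace("Z", "+00:00"))
    if abs(now - ts) > MAX_AGE:                              # step 3
        return False, "stale timestamp"
    if payload["target_host"] != hostname:                   # step 4
        return False, "wrong host"
    return True, "ok"                                        # step 5


# Stand-in signer (NOT Ed25519): HMAC-SHA256 with a demo-only shared secret.
SECRET = b"demo-only-secret"


def demo_sign(message):
    return "hmac-sha256:" + hmac.new(SECRET, message, hashlib.sha256).hexdigest()


def demo_verify(message, signature):
    return hmac.compare_digest(demo_sign(message), signature)


payload = {
    "playbook": "disk_cleanup_prod_db",
    "version": "1.4.2",
    "incident_id": "inc_2026_02_10_1435",
    "target_host": "prod-db-03.us-east-1",
    "timestamp": "2026-02-10T14:35:43.891Z",
}
payload["signature"] = demo_sign(canonical_bytes(payload))

now = datetime(2026, 2, 10, 14, 37, tzinfo=timezone.utc)
result = verify_payload(payload, demo_verify, "prod-db-03.us-east-1", now)
```

Checking the signature before parsing the timestamp means untrusted fields are never interpreted until the payload is known to be authentic.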

Attack Surface Analysis

Traditional Monitoring Agent

  • Listening ports (StatsD, HTTP metrics endpoint)
  • Accepts inbound connections from any source
  • Trusts CA-signed certificates (MITM vulnerable)
  • Executes commands from any authenticated source

SentienGuard Agent

  • Zero listening ports (outbound-only)
  • Refuses all inbound connections
  • Certificate pinning (MITM impossible)
  • Cryptographically signed playbooks only

Result: 90% reduction in attack surface compared to traditional monitoring agents.

Performance & Capacity

Agent Performance

Latency

Metric collection: <50ms per batch
Metric transmission: <200ms to control plane
Heartbeat interval: 30 seconds
Playbook download: <100ms
Playbook execution: 10-90 seconds (workload-dependent)

Throughput

Metrics per second: 100+ per agent
Concurrent playbooks: 1 per agent (serialized for safety)
Max agents per org: Unlimited
Max playbooks per agent: 100 active

Resource Limits

CPU limit: 2% (configurable via systemd/Kubernetes)
Memory limit: 200 MB (OOM protection)
Disk usage: 500 MB max (log rotation)
Network bandwidth: 1 Mbps peak

Control Plane Performance

Latency

Anomaly detection: <100ms from metric ingestion
RAG playbook selection: <165ms semantic search
Playbook signing: <10ms
Total pipeline: <500ms (detection → dispatch)

Throughput

Metrics ingested: 1M+ per second
Incidents processed: 10K+ per second
Playbook executions: 1K+ concurrent
API requests: 100K+ per minute

Availability

Uptime SLA: 99.9% (Enterprise: 99.99%)
Multi-region: US-East, US-West, EU-West (Enterprise)
Failover: Automatic within 30 seconds
Data replication: 3× redundancy

Storage Performance

Audit Logs

Write latency: <50ms to S3
Read latency: <200ms from S3 (hot storage)
Retention: 2 years hot, 7 years cold (configurable)
Size per entry: 2-10 KB (average 5 KB)
Compression: gzip (70% reduction)

Capacity (1,000-node deployment)

Logs per day: ~50,000 entries
Storage per day: 250 MB uncompressed, 75 MB compressed
Annual storage: 27 GB
Cost: ~$0.50/month S3 Standard (2-year retention)
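The storage figures above follow from the per-entry numbers. A quick check, assuming decimal units (1 MB = 1,000 KB) and the stated 5 KB average entry with 70% gzip reduction; the variable names are illustrative.

```python
entries_per_day = 50_000
avg_entry_kb = 5
compression_reduction = 0.70  # gzip removes about 70% of the bytes

raw_mb_per_day = entries_per_day * avg_entry_kb / 1000            # 250 MB
compressed_mb_per_day = raw_mb_per_day * (1 - compression_reduction)  # about 75 MB
annual_gb = compressed_mb_per_day * 365 / 1000                    # about 27 GB

print(f"{raw_mb_per_day:.0f} MB/day raw, "
      f"{compressed_mb_per_day:.0f} MB/day compressed, "
      f"{annual_gb:.1f} GB/year")
```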

Playbook Library

Capacity

Pre-built playbooks: 50+ included
Custom playbooks: Unlimited
Playbook size: <100 KB typical (YAML)
Embedding dimensions: 1536 per playbook
Vector DB size: ~10 MB per 1,000 playbooks

Performance

Semantic search: <100ms for 10,000 playbooks
Confidence scoring: <50ms
Context filtering: <15ms
Total selection: <165ms

Six Core Components


Semantic Playbook Matching

1536-dimension vector embeddings match incidents to remediation strategies using retrieval-augmented generation. Context matching evaluates host type, environment, time-of-day, and historical success rates. Confidence scoring determines autonomous execution (>0.90), human approval (0.70-0.90), or escalation (<0.70). The system gets smarter over time as successful resolutions reinforce playbook confidence scores and failed attempts get flagged for review and refinement.


Lightweight, Outbound-Only

50 MB binary with <100 MB RAM resident footprint. Zero inbound ports opened on your infrastructure. Certificate pinning prevents man-in-the-middle attacks. Cryptographic playbook signing ensures only authorized remediation executes. Non-root service account with minimal privileges. Deploys in 2 minutes via one-liner or Helm chart. Works behind corporate firewalls and NAT gateways without configuration.


Dynamic Baselines, Not Static Thresholds

7-day rolling average with time-of-day patterns captures Monday morning traffic spikes and Friday evening lulls. Statistical deviation detection triggers on >2σ deviations from expected behavior. Adapts to infrastructure growth, seasonal patterns, and deployment cadences automatically. New deployments recalibrate baselines within 48 hours. No manual threshold tuning needed. High-signal, low-noise anomaly detection that catches real problems and ignores expected fluctuations.


Execute, Verify, Rollback

Idempotent playbooks execute via SSH, kubectl, or cloud provider APIs. Every step includes health verification to confirm the action had the desired effect before proceeding. If any verification step fails, automatic rollback reverses all changes made during the current execution. Complete stdout/stderr captured for every command. Cryptographically signed audit trail records exactly what was run, when, by which agent, on which host. Typical execution under 60 seconds for routine infrastructure fixes.


Immutable Compliance Evidence

S3 Object Lock (Write Once, Read Many) prevents modification or deletion of audit records. SHA-256 hash-chained entries create tamper-evident chain that auditors can independently verify. 2-year default retention, configurable to 7 years for regulated industries. Each entry captures: Who, RBAC Authorizer, What, When, Where, and Result. Satisfies HIPAA §164.312(b), SOC 2 CC6.1/CC7.2, PCI-DSS Requirement 10, ISO 27001 A.12.4. Export as JSON, CSV, or formatted PDF.


Unified Infrastructure Dashboard

Real-time health monitor showing fleet status across all environments. Incident timeline with full execution history and audit trail. Playbook library with search, import, and custom YAML editor. User management with RBAC roles: Observer (view only), Remediation Authority (approve and execute), Admin (full control). Multi-tenant architecture for MSPs managing multiple client environments with strict isolation. API access for programmatic integration with existing tooling.


Works With Your Existing Stack

Monitoring Sources

Datadog

Import monitors as playbook triggers

Prometheus

AlertManager webhook integration

CloudWatch

SNS to SentienGuard API endpoint

Grafana

Webhook notification channel

Custom metrics

HTTP POST to /v1/incidents API

Execution Targets

SSH

Linux servers, bash commands, file operations

Kubernetes

kubectl via API (pod restart, scale, rollback, drain)

AWS

CLI, boto3, CloudFormation stack operations

GCP

gcloud, Cloud SDK for Compute, GKE, Cloud SQL

Azure

az CLI, ARM templates for VMs, AKS, SQL

Notification Channels

Slack

Approval gates, incident summaries, resolution reports

Email

SMTP, SendGrid, AWS SES integration

PagerDuty

Escalation on autonomous failure or low confidence

Webhooks

Custom HTTP callbacks for any integration

SMS

Twilio for critical alerts and escalation

Storage & Logging

AWS S3

Primary audit log storage with Object Lock (WORM)

Elasticsearch

Optional log shipping for search and analysis

Splunk

SIEM integration for security event correlation

Datadog Logs

Forward audit logs if keeping Datadog for dashboards

Sumo Logic

Log aggregation and compliance reporting

How Teams Deploy SentienGuard

Replace Datadog Entirely

Setup

  • Datadog current cost: $18K/month (500 nodes, $15/host + metrics + APM)
  • Deploy SentienGuard agents on all 500 nodes ($4/node = $2K/month)
  • Import Datadog monitors as SentienGuard playbook triggers
  • Self-host Grafana for dashboards ($0) or use Grafana Cloud ($1.5K/month)

Timeline

Month 1

Run both in parallel (validation). Prove 87% autonomous resolution on live incidents. Team reviews every auto-resolved incident to build confidence.

Month 2

Shift alerting to SentienGuard as primary responder. Datadog becomes read-only dashboards. Cancel Datadog alerting and APM tiers.

Month 3

Cancel Datadog entirely. Deploy Grafana for any dashboard needs. Full autonomous resolution operational.

Month 4+

Optimized state. Engineering team fully reclaimed for product work. On-call pages reduced 87%.

Result

Cost: $18K/month → $2K/month (89% reduction)

MTTR: 4 hours → 90 seconds (over 99% improvement)

Savings: $192K/year

Hybrid (Keep Datadog Dashboards)

Setup

  • Datadog current cost: $18K/month (500 nodes, full suite)
  • Deploy SentienGuard for autonomous remediation ($2K/month)
  • Downgrade Datadog to infrastructure metrics only (no alerting, no APM, no log management)
  • SentienGuard handles all incident detection and resolution

Timeline

Month 1-2

Run both systems in parallel. SentienGuard shadow-resolves incidents while Datadog remains primary. Compare resolution times and accuracy.

Month 3

Cancel Datadog alerting, APM, and log management tiers. Retain infrastructure metrics for dashboards. Route all incident response through SentienGuard.

Month 4+

Steady state. Datadog provides read-only dashboards at reduced tier ($4K/month). SentienGuard handles all autonomous resolution ($2K/month).

Result

Cost: $18K/month → $6K/month (67% reduction)

MTTR: 4 hours → 90 seconds (autonomous)

Savings: $144K/year
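The cost figures in the two migration scenarios above can be reproduced with a few lines of arithmetic. The helper name is illustrative; the inputs are the monthly costs quoted in each scenario.

```python
def monthly_savings(before, after):
    """Percentage reduction and annualized savings between two monthly costs."""
    saved = before - after
    return {
        "reduction_pct": round(100 * saved / before),
        "annual_savings": saved * 12,
    }


# Full replacement: $18K/month down to $2K/month.
replace = monthly_savings(18_000, 2_000)

# Hybrid: $4K/month reduced-tier Datadog plus $2K/month SentienGuard.
hybrid = monthly_savings(18_000, 4_000 + 2_000)

print(replace)  # {'reduction_pct': 89, 'annual_savings': 192000}
print(hybrid)   # {'reduction_pct': 67, 'annual_savings': 144000}
```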

Greenfield (No Existing Monitoring)

Setup

  • No Datadog, New Relic, or Prometheus subscription to migrate from
  • Deploy SentienGuard agents on all production nodes ($4/node/month)
  • Use included 50+ playbook library for common infrastructure incidents
  • Add Grafana for dashboards (optional, $0 self-hosted or $1.5K/month cloud)

Timeline

Week 1

Deploy agents across fleet. Import standard playbook library. Agents begin collecting metrics and building baselines immediately.

Week 2-4

Tune baselines as 7-day rolling average establishes patterns. Add custom playbooks for application-specific scenarios. Start with approval mode, transition to autonomous.

Month 2+

Full autonomous operations. Baselines calibrated. Custom playbooks tested and deployed. On-call team focused on strategic work, not firefighting.

Result

Cost: $2K/month SentienGuard + $0 Grafana = $2K/month total

MTTR: <90 seconds from day 1

Savings: No legacy monitoring bills to compare against

Deploy Your First Agent in 2 Minutes

Free for 3 nodes. No credit card. First autonomous resolution in 8 minutes.

1

Install Agent

Linux

curl -sSL https://get.sentienguard.com/install | bash

Kubernetes (Helm)

helm repo add sentienguard \
  https://charts.sentienguard.com
helm install sentienguard \
  sentienguard/agent \
  --set apiKey="$SENTIENGUARD_API_KEY"

Docker

docker run -d \
  --name sentienguard-agent \
  -e SENTIENGUARD_API_KEY="$API_KEY" \
  -v /var/run/docker.sock:/var/run/docker.sock \
  sentienguard/agent:latest

Time: 2 minutes

2

Verify Connection


# Check agent status
systemctl status sentienguard-agent

# View logs
tail -f /var/log/sentienguard/agent.log

Expected Output

[INFO] Connected to control plane
[INFO] Heartbeat sent (30s interval)
[INFO] Baseline learning started
[INFO] 6 metrics streaming

Time: 30 seconds

3

Import Playbooks

Via Dashboard

  • Navigate to: Playbooks → Import
  • Select: disk_cleanup, memory_restart, k8s_pod_restart
  • Click: Import All

Via CLI

sentienguard playbook import \
  disk_cleanup_linux
sentienguard playbook import \
  postgres_connection_reset
sentienguard playbook import \
  ssl_cert_renewal

Time: 5 minutes

Total: 8 minutes to first autonomous resolution.

Ready to See It in Action?

Deploy free on 3 nodes. Trigger a test incident. Watch autonomous resolution.
Review the audit log. All in under 10 minutes.

Free tier: 3 nodes, unlimited playbooks, full audit logs, no credit card required. Upgrade anytime to scale beyond 3 nodes.