Traditional monitoring agent:
Listens on port 8125 (StatsD metrics)
Listens on port 8126 (APM traces)
Listens on port 9090 (HTTP metrics endpoint)
Attack surface: 3 inbound ports
SentienGuard agent:
Listens on: NOTHING
Initiates outbound to: control.sentienguard.com:443
Attack surface: 0 inbound ports

Agent Architecture
Lightweight. Outbound-Only. Zero Inbound Attack Surface.
50 MB agent binary, <100 MB resident memory, <0.5% CPU steady-state. Outbound HTTPS only—no listening ports, no inbound connections, no VPN dependency. Deploys in 2 minutes via curl, Helm, or Docker.
Four Principles That Define Agent Architecture
Outbound-Only Communication
Why: Inbound ports = attack surface. Every listening port is a potential entry point for exploitation.
Implementation
- Agent initiates all connections to control plane
- Protocol: HTTPS on port 443 (outbound)
- No inbound ports opened on host
- No listening services exposed
- Firewall-friendly (works through NAT, proxies, corporate firewalls)
What This Prevents
- Port scanning attacks (no services to discover)
- Remote code execution via exposed endpoints
- Lateral movement (compromised agent can't accept commands from attacker)
- Network-based exploits targeting agent services
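The outbound-only pattern can be sketched in Go (the language the agent is compiled from). This is an illustrative sketch: the `/v1/heartbeat` path and `newHeartbeatRequest` helper are assumptions, not documented API.

```go
package main

import (
	"bytes"
	"fmt"
	"net/http"
)

// newHeartbeatRequest builds the outbound request the agent sends.
// Direction is the point: the agent is always the HTTP client, never a
// server. Nothing in the agent calls net.Listen or http.ListenAndServe,
// so there is no inbound port to scan or exploit.
// The /v1/heartbeat path is illustrative, not a documented endpoint.
func newHeartbeatRequest(controlPlane string, payload []byte) (*http.Request, error) {
	return http.NewRequest(http.MethodPost, controlPlane+"/v1/heartbeat", bytes.NewReader(payload))
}

func main() {
	req, err := newHeartbeatRequest("https://control.sentienguard.com", []byte(`{"cpu":12.4}`))
	if err != nil {
		panic(err)
	}
	fmt.Println(req.Method, req.URL.String()) // POST https://control.sentienguard.com/v1/heartbeat
}
```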
Minimal Footprint
Why: Production servers have constrained resources. Agent overhead must be negligible.
Implementation
- Binary size: 50 MB (statically compiled Go)
- Resident memory: <100 MB RSS (steady-state)
- CPU usage: <0.5% average, 2% peak during playbook execution
- Disk usage: 200 MB (binary + logs + cache)
- Network bandwidth: ~2 KB/s average outbound (metrics batched every 30s, ~50 KB per batch)
Stateless Operation
Why: With operational state stored centrally, an agent restart means zero data loss.
Implementation
- Metrics sent to control plane immediately (not buffered locally)
- Playbook execution state reported in real-time
- Configuration fetched from control plane on startup
- Historical data stored in control plane time-series DB
Idempotent Execution
Why: Network failures, timeouts, and retries must not cause duplicate actions.
Implementation
- Playbook validation rejects non-idempotent operations
- State checks before execution ("if disk already <80%, skip cleanup")
- Conditional steps ("only if service not running")
- Health verification after execution (confirms desired state reached)
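A minimal Go sketch of the state-check principle, using the 80% disk threshold from the example above (`shouldRunCleanup` and the sample values are illustrative):

```go
package main

import "fmt"

// shouldRunCleanup sketches "state checks before execution": the step
// fires only when observed state differs from desired state, so
// re-running it after a network retry is a no-op.
// The 80% threshold mirrors the "if disk already <80%, skip cleanup"
// example; diskUsagePercent would come from the metrics collector.
func shouldRunCleanup(diskUsagePercent float64) bool {
	const threshold = 80.0
	return diskUsagePercent >= threshold
}

func main() {
	fmt.Println(shouldRunCleanup(91.4)) // true: incident, run cleanup
	fmt.Println(shouldRunCleanup(72.1)) // false: already healthy, skip
}
```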
/opt/sentienguard/
├── bin/
│ └── sentienguard-agent # 50 MB
├── config/
│ └── agent.yaml # 2 KB
├── logs/
│ └── agent.log # 10-50 MB (rotated)
└── cache/
└── playbooks/ # 10-20 MB

| Agent | Binary Size | Memory (RSS) | CPU (avg) |
|---|---|---|---|
| Datadog | 200 MB | 200-300 MB | 1-2% |
| New Relic | 150 MB | 150-250 MB | 1-3% |
| Prometheus Node Exporter | 20 MB | 50-80 MB | 0.3-0.5% |
| SentienGuard | 50 MB | <100 MB | <0.5% |
# Agent crashes or restarts
systemctl restart sentienguard-agent
# What happens:
# 1. Agent reconnects to control plane (<5 seconds)
# 2. Fetches current configuration
# 3. Resumes metric collection from current state
# 4. No data loss (metrics already sent)
# 5. No incident interruption (state on control plane)

# Idempotent playbook step
- name: clear_temp_files
command: "find /tmp -type f -mtime +7 -delete"
# First run: Deletes 100 files
# Second run: Deletes 0 files (already gone)
# Third run: Deletes 0 files
# Result: Same outcome regardless of runs

# BAD - not idempotent
- name: increment_counter
command: "echo $(( $(cat /var/counter 2>/dev/null || echo 0) + 1 )) > /var/counter"
# First run: counter = 1
# Second run: counter = 2 (wrong!)
# Problem: Side effects accumulate

Three Ways to Deploy Agents
Choose the deployment model that fits your infrastructure. All options deliver the same agent binary with identical capabilities.
Supported Platforms
# One-line install
curl -sSL https://get.sentienguard.com/install | bash
# What this does:
# 1. Downloads agent binary (50 MB, GPG-signed)
# 2. Verifies GPG signature
# 3. Installs to /opt/sentienguard/
# 4. Creates systemd service
# 5. Starts agent, enables auto-start on boot
# Verify installation
systemctl status sentienguard-agent
# Expected: "active (running)"
# View logs
tail -f /var/log/sentienguard/agent.log
# Expected: "Connected to control plane, heartbeat every 30s"

# Download binary and its detached signature manually
wget https://releases.sentienguard.com/agent/v1.4.2/sentienguard-agent-linux-amd64
wget https://releases.sentienguard.com/agent/v1.4.2/sentienguard-agent-linux-amd64.sig
# Verify signature
gpg --verify sentienguard-agent-linux-amd64.sig
# Install
sudo cp sentienguard-agent-linux-amd64 /opt/sentienguard/bin/sentienguard-agent
sudo chmod +x /opt/sentienguard/bin/sentienguard-agent
# Configure
sudo tee /opt/sentienguard/config/agent.yaml <<EOF
api_key: YOUR_API_KEY
control_plane: https://control.sentienguard.com
environment: production
EOF
# Create systemd service
sudo tee /etc/systemd/system/sentienguard-agent.service <<EOF
[Unit]
Description=SentienGuard Agent
After=network.target
[Service]
Type=simple
User=sentienguard
ExecStart=/opt/sentienguard/bin/sentienguard-agent --config /opt/sentienguard/config/agent.yaml
Restart=always
RestartSec=10
[Install]
WantedBy=multi-user.target
EOF
# Start service
sudo systemctl daemon-reload
sudo systemctl enable sentienguard-agent
sudo systemctl start sentienguard-agent

2 minutes
Install time
200 MB
Disk space
<100 MB
Memory
<0.5%
CPU
Metrics Collected Every 30 Seconds
Agents collect infrastructure, process, Kubernetes, and application metrics using kernel-level instrumentation with negligible performance overhead.
Infrastructure Metrics
CPU
- Usage per core (user, system, idle, iowait)
- Load average (1min, 5min, 15min)
- Context switches per second
- CPU throttling events (cgroups)
Memory
- Total, used, available, free
- Swap usage
- Memory pressure (PSI)
- Cache and buffer usage
Disk
- Usage per filesystem (%, bytes)
- Inode usage (%, count)
- Read/write IOPS
- Read/write throughput (MB/s)
- I/O wait time
Network
- Bytes in/out per interface
- Packets in/out
- Packet loss rate
- Network errors (dropped, collisions)
- Connection count (TCP, UDP)
eBPF (extended Berkeley Packet Filter) for kernel-level metrics. /proc filesystem for process metrics. sysfs for system metrics. Negligible overhead (collection happens in kernel space).
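The /proc interface is plain text, so collection is simple parsing. Below is a sketch of parsing a /proc/meminfo-style snapshot: the field names are real kernel fields, but `parseMemInfo` is illustrative, not the agent's actual collector.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseMemInfo extracts fields from /proc/meminfo-style text, the
// interface the agent reads for memory metrics. Values are in kB, as
// the kernel reports them. (This sketch parses a string; a live agent
// would read /proc/meminfo from the filesystem.)
func parseMemInfo(text string) map[string]int64 {
	out := make(map[string]int64)
	for _, line := range strings.Split(text, "\n") {
		fields := strings.Fields(line)
		if len(fields) < 2 {
			continue
		}
		key := strings.TrimSuffix(fields[0], ":")
		if v, err := strconv.ParseInt(fields[1], 10, 64); err == nil {
			out[key] = v
		}
	}
	return out
}

func main() {
	sample := "MemTotal:       16384000 kB\nMemAvailable:    5242880 kB"
	m := parseMemInfo(sample)
	fmt.Println(m["MemTotal"], m["MemAvailable"]) // 16384000 5242880
}
```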
Process Metrics
Per-Process Data
- Process count (total, running, sleeping, zombie)
- CPU usage per process
- Memory (RSS, VSZ) per process
- Open file descriptors per process
- Process state (running, sleeping, zombie, stopped)
Service Health
- systemd service status (active, inactive, failed)
- Service restart count
- Service uptime
- Service memory usage
Kubernetes Metrics
If Applicable

Pod-Level
- Pod status (Running, Pending, Failed, CrashLoopBackOff)
- Container restarts
- Resource usage (CPU, memory per container)
- Pod events (OOMKilled, Evicted, etc.)
Node-Level
- Node status (Ready, NotReady, SchedulingDisabled)
- Allocatable resources vs capacity
- Pod count per node
- Node conditions (DiskPressure, MemoryPressure, PIDPressure)
Deployment-Level
- Desired vs available replicas
- Rollout status
- Deployment events
Kubernetes API via ServiceAccount credentials. Metrics from kubelet (node-level). Events from API server.
Application Context
Optional

OpenTelemetry Integration
- Request rate
- Error rate
- Latency percentiles (p50, p95, p99)
- Active requests
Database Connections
- Active connection count (PostgreSQL/MySQL)
- Query performance monitoring
HTTP Endpoints
- Health check endpoint monitoring
- Response status codes
{
"process": "postgresql",
"pid": 1234,
"cpu_percent": 12.4,
"memory_rss_mb": 2048,
"memory_percent": 12.8,
"open_files": 147,
"state": "running",
"uptime_seconds": 864000
}

What Agents DON'T Collect
- × Application logs (only infrastructure events)
- × User data or PII
- × Application source code
- × Environment variables with secrets
- × Custom business metrics (unless explicitly configured)
Defense-in-Depth Security Architecture
Five layers of security from network to secrets. Each layer independently prevents a class of attacks, so compromise of one layer doesn't compromise the system.
Outbound-Only Communication
Agent → Control Plane: Outbound HTTPS (443), TLS 1.3, certificate pinning. No inbound connections accepted.
What This Prevents
- Network-based attacks (no listening ports)
- Reverse shells (agent can't accept inbound connections)
- Port scanning (no services to discover)
TLS 1.3 with Certificate Pinning
SentienGuard CA certificate hash embedded in agent binary at compile time. Connection refused if mismatch—no fallback to CA trust chain.
What This Prevents
- Man-in-the-middle attacks
- Certificate authority compromise
- Rogue control plane impersonation
Cryptographic Playbook Signing
Ed25519 signatures on every playbook. Agent verifies signature, checks timestamp freshness (<5 min), and confirms target host before execution.
What This Prevents
- Unauthorized playbook injection
- Replay attacks (timestamp freshness check)
- Playbook tampering
- Wrong-host execution
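The signature check can be demonstrated with Go's standard crypto/ed25519 package. This is a generic sketch of sign-then-verify, not SentienGuard's actual verification code; the timestamp and target-host checks are covered separately in verifyPlaybook below.

```go
package main

import (
	"crypto/ed25519"
	"crypto/rand"
	"fmt"
)

// signAndVerify sketches the signing model: the control plane signs the
// playbook bytes with its private key, and the agent verifies with the
// public key embedded in its binary. A single flipped byte invalidates
// the signature, which is what blocks playbook tampering.
func signAndVerify(payload []byte) (authentic, tamperedOK bool) {
	pub, priv, err := ed25519.GenerateKey(rand.Reader)
	if err != nil {
		panic(err)
	}
	sig := ed25519.Sign(priv, payload)
	authentic = ed25519.Verify(pub, payload, sig)

	tampered := append([]byte(nil), payload...)
	tampered[0] ^= 0xFF // attacker flips one byte
	tamperedOK = ed25519.Verify(pub, tampered, sig)
	return authentic, tamperedOK
}

func main() {
	ok, bad := signAndVerify([]byte(`{"playbook":"disk_cleanup_prod_db"}`))
	fmt.Println(ok, bad) // true false
}
```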
Non-Root Execution
Agent runs as dedicated sentienguard user. Sudo with explicit allow-list for commands requiring root.
What This Prevents
- Privilege escalation
- System-wide damage (limited to sentienguard user permissions)
- Lateral movement (can't modify other services)
Secret Management
API keys: file permissions (0600). SSH keys: AWS Secrets Manager / Azure Key Vault / HashiCorp Vault with just-in-time retrieval. Cloud credentials: IAM roles, no static keys.
What This Prevents
- Secrets in version control
- Secrets in logs (automatic redaction)
- Long-lived credentials (automatic rotation)
# Agent needs ONLY outbound HTTPS
iptables -A OUTPUT -p tcp --dport 443 -j ACCEPT
# Allow replies to agent-initiated connections (required for outbound HTTPS to work)
iptables -A INPUT -m conntrack --ctstate ESTABLISHED,RELATED -j ACCEPT
iptables -A INPUT -j DROP   # Deny all new inbound
# Note: also allow DNS (port 53) if control.sentienguard.com must resolve
iptables -A OUTPUT -j DROP  # Deny all other outbound

// Pseudocode: Agent TLS configuration
tlsConfig := &tls.Config{
MinVersion: tls.VersionTLS13,
InsecureSkipVerify: false,
VerifyPeerCertificate: func(rawCerts [][]byte,
verifiedChains [][]*x509.Certificate) error {
// Verify certificate matches pinned hash
expectedHash := "sha256:a3f8b9c2d1e4..."
actualHash := sha256(rawCerts[0])
if actualHash != expectedHash {
return errors.New("certificate pinning failed")
}
return nil
},
}

{
"playbook": "disk_cleanup_prod_db",
"version": "1.4.2",
"target_host": "prod-db-03",
"timestamp": "2026-02-10T14:35:43Z",
"steps": ["..."],
"signature": "ed25519:a8f3b2c1d9e4f5a6..."
}

func verifyPlaybook(payload Playbook) error {
// 1. Extract signature
signature := payload.Signature
// 2. Verify with control plane public key
publicKey := loadEmbeddedPublicKey()
valid := ed25519.Verify(publicKey, payload.Bytes(), signature)
if !valid {
return errors.New("invalid signature")
}
// 3. Check timestamp freshness (<5 minutes)
age := time.Since(payload.Timestamp)
if age > 5*time.Minute {
return errors.New("playbook too old, possible replay attack")
}
// 4. Verify target host matches
if payload.TargetHost != agent.Hostname {
return errors.New("playbook not intended for this host")
}
return nil // All checks passed
}

# Agent runs as dedicated user (not root)
useradd -r -s /bin/false sentienguard
chown -R sentienguard:sentienguard /opt/sentienguard/
# Systemd service runs as sentienguard user
[Service]
User=sentienguard
Group=sentienguard

# /etc/sudoers.d/sentienguard
# Default deny first: in sudoers, later entries override earlier ones,
# so the deny rule must precede the allow-list
sentienguard ALL=(ALL) !ALL
# Allow specific commands only
sentienguard ALL=(ALL) NOPASSWD: /usr/bin/systemctl restart *
sentienguard ALL=(ALL) NOPASSWD: /usr/bin/find /tmp -type f -mtime +7 -delete
sentienguard ALL=(ALL) NOPASSWD: /usr/sbin/logrotate -f /etc/logrotate.conf

- name: restart_database
action: ssh_command
command: |
  # Retrieve the secret just-in-time; never written to disk or config
  PASSWORD=$(aws secretsmanager get-secret-value \
    --secret-id prod-db-password \
    --query SecretString \
    --output text)
  sudo systemctl restart postgresql
  # Verify the database accepts connections before reporting success
  PGPASSWORD="$PASSWORD" psql -h localhost -U postgres -c "SELECT 1" >/dev/null
secrets:
  - aws_secret: prod-db-password # Retrieved just-in-time

From Install to Updates
Five-stage lifecycle covering installation, steady-state operation, incident response, offline resilience, and automatic updates.
Installation
2 minutes
Download GPG-signed binary, install to /opt/sentienguard/, create systemd service, connect to control plane, register host, receive configuration, begin metric collection.
Normal Operation
30s heartbeat
Every 30 seconds: collect metrics (CPU, memory, disk, network, processes), batch and send to control plane via HTTPS POST, check for pending playbook executions. CPU: <0.5%, Memory: ~80 MB, Network: ~50 KB per heartbeat.
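The batch-and-flush heartbeat cycle can be sketched as follows. `metricBatch` and the sample format are illustrative; in the real agent a 30-second ticker would drive the flush and the HTTPS POST (omitted here).

```go
package main

import "fmt"

// metricBatch sketches the heartbeat cycle: samples accumulate in
// memory between heartbeats, and every tick the batch is flushed in a
// single HTTPS POST. A time.Ticker firing every 30 seconds would call
// flush in the real agent.
type metricBatch struct {
	samples []string
}

func (b *metricBatch) add(sample string) {
	b.samples = append(b.samples, sample)
}

// flush returns the pending samples and resets the batch, so each
// heartbeat sends every sample exactly once.
func (b *metricBatch) flush() []string {
	out := b.samples
	b.samples = nil
	return out
}

func main() {
	var b metricBatch
	b.add("cpu=12.4")
	b.add("mem=68.2")
	b.add("disk=72.1")
	fmt.Println(len(b.flush())) // 3 samples sent this heartbeat
	fmt.Println(len(b.flush())) // 0: nothing pending until next tick
}
```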
Incident Response
10-90 seconds
Control plane detects anomaly, selects and signs playbook, sends to agent. Agent verifies Ed25519 signature, checks timestamp freshness, executes steps sequentially, captures stdout/stderr, verifies health, reports results.
Offline Resilience
Up to 24 hours
If control plane unreachable: continue collecting metrics locally (cached 5 min), execute cached playbooks if incidents detected, queue audit logs, retry connection every 30 seconds. Max 24h offline before pausing execution.
Updates
~5s downtime
Weekly release cycle: control plane notifies agents, agent downloads new binary, verifies GPG signature, replaces binary, restarts service, reconnects. Rollback available if new version breaks.
curl -sSL https://get.sentienguard.com/install | bash
# Behind the scenes:
# 1. Download agent binary from releases.sentienguard.com
# 2. Verify GPG signature (pub key embedded in install script)
# 3. Copy binary to /opt/sentienguard/bin/
# 4. Generate default config
# 5. Create systemd service
# 6. Start service, enable auto-start on boot

Control plane down → Disk fills → Agent detects anomaly
→ Checks cache for disk_cleanup playbook
→ Found (executed 2 days ago, cached)
→ Executes from cache
→ Incident resolved
→ Queues audit log for upload when online

[2026-02-10 14:35:12] INFO: Heartbeat sent (30s interval)
[2026-02-10 14:35:12] INFO: Metrics: cpu=12.4%, mem=68.2%, disk=72.1%
[2026-02-10 14:35:12] INFO: No pending playbooks
[2026-02-10 14:35:42] INFO: Heartbeat sent (30s interval)
[2026-02-10 14:35:42] INFO: Anomaly detected: disk_usage=91.4% (4.8σ)
[2026-02-10 14:35:43] INFO: Playbook received: disk_cleanup_prod_db
[2026-02-10 14:35:43] INFO: Signature verified, executing playbook
[2026-02-10 14:37:09] INFO: Playbook completed successfully (87s)
[2026-02-10 14:37:09] INFO: Health verification: disk_usage=72.1% (PASS)

# agent.yaml
updates:
automatic: true
schedule: "weekly" # Check every Sunday 2 AM
window: "02:00-06:00" # Only update during window

# agent.yaml
updates:
automatic: false
# Update manually:
$ sentienguard-agent update
$ systemctl restart sentienguard-agent
# Rollback if needed:
$ sentienguard-agent rollback
$ systemctl restart sentienguard-agent

Execution Isolation (Stage 3)
Concurrency
One playbook at a time (serialized)
Queuing
New requests queued if one running
Timeout
5 minutes max (configurable per playbook)
Failure
Automatic rollback if health check fails
Deploy Anywhere: AWS, GCP, Azure, On-Prem
Same agent binary, same capabilities, every environment. Cloud-native credential integration for each provider.
Supported Services
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"ec2:DescribeInstances",
"ec2:DescribeTags",
"cloudwatch:PutMetricData",
"secretsmanager:GetSecretValue"
],
"Resource": "*"
}
]
}

#!/bin/bash
# Install agent on EC2 instance launch
curl -sSL https://get.sentienguard.com/install | bash
echo "api_key: $SENTIENGUARD_API_KEY" >> /opt/sentienguard/config/agent.yaml
systemctl start sentienguard-agent

Monitoring the Monitors
Who monitors the monitoring agent? Agent health dashboard tracks status, version distribution, and resource usage across your entire fleet.
Agent Status (per host)
Status
Online / Offline
Last heartbeat
Timestamp
Version
Installed
Uptime
Duration
Resources
CPU, Memory, Disk
┌─────────────────────────────────────────────────────────┐
│ Agent Health (500 nodes) │
├─────────────────────────────────────────────────────────┤
│ ✅ Online: 498 │
│ ⚠️ Offline: 2 │
│ - prod-db-12 (offline 15min, heartbeat timeout) │
│ - staging-web-03 (offline 2h, host unreachable) │
├─────────────────────────────────────────────────────────┤
│ Agent Version Distribution: │
│ v1.4.2: 487 nodes (97%) │
│ v1.4.1: 11 nodes (2%) [Update available] │
│ v1.3.9: 2 nodes (1%) [Critical update needed] │
├─────────────────────────────────────────────────────────┤
│ Resource Usage (avg across all agents): │
│ CPU: 0.4% Memory: 82 MB Network: 48 KB/batch │
└─────────────────────────────────────────────────────────┘

Troubleshooting Commands
# Check service status
systemctl status sentienguard-agent
# Check network connectivity
curl -v https://control.sentienguard.com/health
# Check logs
tail -f /var/log/sentienguard/agent.log
# Test API key
sentienguard-agent test-connection

# Check what agent is doing
strace -p $(pgrep sentienguard-agent)
# Check if playbook running
ps aux | grep sentienguard
# View recent playbook executions
sentienguard-agent playbook-history

# Restart agent
systemctl restart sentienguard-agent
# If still offline, check control plane connectivity
ping control.sentienguard.com
telnet control.sentienguard.com 443
# Check firewall rules
iptables -L -n | grep 443

Common Questions
Does the agent require root?
No. Agent runs as dedicated sentienguard user (non-root). Some playbooks require root commands (service restarts, disk operations)—use sudo with explicit allow-list for these commands only. See deployment docs for sudo configuration.
What happens if the agent crashes?
Systemd automatically restarts the agent (RestartSec=10s). Agent reconnects to control plane, fetches current config, resumes metric collection. No data loss—metrics already sent to control plane before crash. Operational state stored in control plane, not agent.
Can we keep all data on-premises?
Yes, with Enterprise tier. Deploy on-premises control plane in your data center. Agents communicate with internal control plane (not cloud). All data stays within your network. Contact sales for air-gapped deployment architecture.
How much network bandwidth does the agent use?
~2 KB/s outbound average. Metrics batched every 30 seconds (~50 KB per batch). Playbook downloads negligible (10-50 KB per playbook). Total: 150-250 MB/day per agent. For 500 agents: 75-125 GB/day outbound from your infrastructure.
Does the agent work behind an HTTP proxy?
Yes. Set HTTP_PROXY and HTTPS_PROXY environment variables, then restart the agent. Agent respects standard proxy environment variables.
Can specific hosts be excluded from a playbook?
Yes. Playbook metadata includes exclusions (e.g., host_pattern: "*.prod.*" to never run on production). Or disable via dashboard: Playbooks → disk_cleanup → Disable on prod-db-03.
What if a playbook makes things worse?
Every playbook includes rollback steps. If health verification fails, agent automatically reverts changes. Example: Playbook restarts wrong service → health check fails → rollback restarts original service. Complete audit trail shows what happened for post-incident review.
Deploy Your First Agent
Install agent on Linux server in 2 minutes. Watch metrics flow to dashboard. Import playbook library. Trigger test incident. See autonomous resolution.
curl -sSL https://get.sentienguard.com/install | bash

Free tier: 3 agents, unlimited playbooks, full audit logs, no credit card.