Client A data:
- Audit logs: s3://sentienguard-logs/client-a/
- Metrics: timeseries-db/client-a/
- Playbooks: namespace=client-a
Client B data:
- Audit logs: s3://sentienguard-logs/client-b/
- Metrics: timeseries-db/client-b/
- Playbooks: namespace=client-b
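The per-client layout above follows one pattern: every storage location is derived from the client identifier. A minimal sketch of that idea follows. The helper names `tenant_locations` and `belongs_to` and the identifier rule are illustrative assumptions, not SentienGuard's actual implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TenantLocations:
    audit_logs: str
    metrics: str
    playbook_namespace: str


def tenant_locations(client_id: str) -> TenantLocations:
    """Derive a client's storage locations from its identifier."""
    # Assumed rule: identifiers are alphanumerics and hyphens only, so an
    # identifier can never contain "/" or ".." and escape its own prefix.
    if not client_id or not all(c.isalnum() or c == "-" for c in client_id):
        raise ValueError(f"invalid client id: {client_id!r}")
    return TenantLocations(
        audit_logs=f"s3://sentienguard-logs/{client_id}/",
        metrics=f"timeseries-db/{client_id}/",
        playbook_namespace=client_id,
    )


def belongs_to(client_id: str, path: str) -> bool:
    """True if a log or metrics path sits under this client's prefixes."""
    loc = tenant_locations(client_id)
    return path.startswith(loc.audit_logs) or path.startswith(loc.metrics)
```

Because each prefix ends in `/`, a path under `client-ab/` does not match the prefix for `client-a/`.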
Result: Client A cannot see Client B's data (database-level isolation)
Command Center
Single Dashboard. Every Incident. Every Fix. Complete Visibility.
Unified interface for infrastructure health monitoring, incident timelines, playbook management, and compliance reporting. Role-based views for executives, engineers, and auditors. Real-time updates, historical analysis, and audit trail access—all in one place.
Four Views for Four Audiences
Command Center has four primary views, each optimized for different user needs. Executives see business impact. Engineers see real-time incidents. Compliance officers see audit evidence. Administrators configure the platform.
Audience: CTOs, VPs Engineering, Executives
MTTR (30d): 92s (↓ 12% vs last month)
Autonomous Resolution: 87% (↑ 3% · industry avg: 0%)
Engineering Time Saved: 427 hrs (= 2.7 FTE / month)
Cost Avoidance / mo: $34,160 (vs. manual resolution)
Charts: MTTR Trend (10 weeks), Incident Volume (Weekly)
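The FTE and cost figures on this view can be reproduced from the hours saved. The page does not state the inputs, so the sketch below assumes 160 working hours per FTE-month and an $80 hourly rate. Both values are assumptions, chosen only because they match the numbers shown.

```python
HOURS_SAVED = 427          # from the dashboard
HOURS_PER_FTE_MONTH = 160  # assumption: 40 h/week x 4 weeks
HOURLY_RATE_USD = 80       # assumption: implied by $34,160 / 427 h


def fte_equivalent(hours: float, hours_per_fte: float = HOURS_PER_FTE_MONTH) -> float:
    """Engineering hours expressed as full-time equivalents, one decimal."""
    return round(hours / hours_per_fte, 1)


def cost_avoidance(hours: float, rate: float = HOURLY_RATE_USD) -> float:
    """Hours saved priced at a flat hourly rate."""
    return hours * rate


print(fte_equivalent(HOURS_SAVED))  # 2.7
print(cost_avoidance(HOURS_SAVED))  # 34160
```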
What Executives Do Here
- Review monthly/quarterly trends (not individual incidents)
- Share metrics with board/investors
- Justify platform ROI with saved engineering hours
- Export executive summary reports (PDF, 1-pager)
Frequency: Weekly check-in (5 minutes)
Single Dashboard, 150+ Clients
MSPs manage dozens or hundreds of clients. SentienGuard eliminates context switching with one multi-tenant dashboard—client filtering, data isolation, per-client reporting, and role-based access across your entire portfolio.
Managing 150 clients = 150 different dashboards:
Total Clients: 150 (3,600 hosts)
Incidents (24h): 47 (41 auto · 6 manual)
Portfolio Uptime: 99.8% (148 nominal)
Clients Requiring Attention
Client: Acme Corp · 24 hosts · 12 incidents (7d) · 78s avg MTTR
Client Isolation (Security)
# MSP engineer assigned to specific clients
users:
  - email: engineer1@msp.com
    role: Remediation Authority
    clients: [client-a, client-b, client-c]  # Can only access these 3
  - email: engineer2@msp.com
    role: Remediation Authority
    clients: [client-d, client-e]  # Can only access these 2
  - email: manager@msp.com
    role: Administrator
    clients: all  # Can access all 150 clients
Engineer1 logs in:
- Dashboard shows only: Client A, Client B, Client C
- Cannot see: Client D, Client E, ... Client Z
- Cannot switch to unauthorized clients (dropdown filtered)
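The client scoping described here amounts to a per-user allow-list check. A minimal sketch follows, assuming the same check backs both the filtered dropdown and the handling of direct URL requests. The `User` type and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Union

ALL = "all"  # mirrors `clients: all` in the user config


@dataclass(frozen=True)
class User:
    email: str
    role: str
    clients: Union[str, FrozenSet[str]]  # "all" or an explicit set of client ids


def can_access(user: User, client_id: str) -> bool:
    """True if the user is assigned to the client or has portfolio-wide access."""
    return user.clients == ALL or client_id in user.clients


def visible_clients(user: User, portfolio: List[str]) -> List[str]:
    """Clients to list in the dashboard dropdown, in portfolio order."""
    return [c for c in portfolio if can_access(user, c)]


def status_for(user: User, client_id: str) -> int:
    """HTTP status for a direct request to a client's pages."""
    return 200 if can_access(user, client_id) else 403
```

In a real deployment the 403 branch would also write the audit-log entry. That step is left out here because the page does not describe the log format.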
Engineer1 attempts to access Client D URL directly:
- Result: 403 Forbidden
- Audit log: Unauthorized access attempt recorded
MSP Reporting
Per-Client Report — Acme Corp
January 2026
Aggregate MSP Report
January 2026
Find Any Incident in Seconds
Full-text search across all log fields, advanced multi-filter combinations, saved filter sets for recurring queries, and a visual timeline for time-based investigation.
47 results matching all filters
Saved Filter Sets
Production failures (last 24h)
env:production · result:failed · 24h
Manual approvals (this week)
actor:user · action:approval · 7d
HIPAA systems (Q4 2025)
tag:hipaa · Oct–Dec 2025
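A saved filter set like the ones above is a list of `field:value` terms plus a time window. The sketch below shows one way such an expression could be parsed and applied. It assumes incidents are dictionaries with a timezone-aware `time` key, and it handles only relative windows such as `24h` or `7d`, not calendar ranges like `Oct–Dec 2025`. The function names and record shape are hypothetical.

```python
import re
from datetime import datetime, timedelta, timezone

WINDOW = re.compile(r"^(\d+)([hd])$")  # e.g. "24h" or "7d"


def parse_filter(expr: str):
    """Split 'env:production · result:failed · 24h' into field terms and a window."""
    fields, window = {}, None
    for token in (t.strip() for t in expr.split("·")):
        match = WINDOW.match(token)
        if match:
            amount, unit = int(match.group(1)), match.group(2)
            window = timedelta(hours=amount) if unit == "h" else timedelta(days=amount)
        elif ":" in token:
            key, value = token.split(":", 1)
            fields[key.strip()] = value.strip()
        else:
            raise ValueError(f"unsupported filter term: {token!r}")
    return fields, window


def apply_filter(incidents, expr: str, now: datetime):
    """Keep incidents that match every field term and fall inside the window."""
    fields, window = parse_filter(expr)
    cutoff = now - window if window else None
    return [
        inc for inc in incidents
        if all(inc.get(k) == v for k, v in fields.items())
        and (cutoff is None or inc["time"] >= cutoff)
    ]
```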
Visual Incident Timeline — Feb 10, 2026
14:00 · prod-api-07 · k8s_pod_restart · Resolved (23s)
15:00 · prod-db-03 · disk_cleanup · Resolved (87s)
15:30 · prod-db-05 · postgres_reset · Awaiting approval
16:30 · staging-web-02 · ssl_cert_renewal · Failed (CA unreachable)
Live Dashboard, <5 Second Latency
WebSocket connections push updates to all connected clients in real-time. Incident feed, metrics, approval requests, and health maps all update live—no manual refresh needed.
Browser opens dashboard:
1. Establish WebSocket connection to control.sentienguard.com
2. Subscribe to real-time incident feed
3. Receive updates as incidents occur
Incident detected at 14:35:43:
1. Control plane detects anomaly (14:35:43.124Z)
2. WebSocket broadcast to all connected clients (14:35:43.289Z)
3. Dashboard updates incident feed (14:35:43.450Z)
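The per-hop latency follows directly from the three timestamps in the steps above. A small sketch of that arithmetic, with helper names that are illustrative only:

```python
from datetime import datetime, timedelta


def parse_ts(ts: str) -> datetime:
    """Parse a time-of-day stamp such as '14:35:43.124Z'."""
    return datetime.strptime(ts, "%H:%M:%S.%fZ")


def ms_between(start: str, end: str) -> int:
    """Whole milliseconds elapsed between two stamps on the same day."""
    return (parse_ts(end) - parse_ts(start)) // timedelta(milliseconds=1)


detected, broadcast, displayed = "14:35:43.124Z", "14:35:43.289Z", "14:35:43.450Z"
print(ms_between(detected, broadcast))   # 165 (detection to broadcast)
print(ms_between(broadcast, displayed))  # 161 (broadcast to display)
print(ms_between(detected, displayed))   # 326 (end to end)
```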
Total latency: 326ms (detection to display)
What Updates in Real-Time
1. Incident Feed (Live)
New (0 sec ago): prod-db-03 · disk_cleanup · Executing…
Updated (1 min ago): prod-db-03 · disk_cleanup · Resolved (87s MTTR)
2. Metrics (Live)
3. Approval Requests (Live)
Slack & Dashboard simultaneously
prod-db-05 · postgres_connection_reset
4. Health Map (Live)
Notification Preferences
Dashboard
Slack
Full Dashboard Access on Mobile
Responsive layouts for on-call engineers. Approve playbooks, view incident feeds, and check infrastructure health from your phone. Push notifications for approval requests.
Incident Feed
disk_cleanup · Resolved (87s)
postgres_reset · Awaiting Approval
ssl_cert_renewal · Failed
Approval Flow
Approval Required: prod-db-05 · postgres_connection_reset · Connection pool: 98%
Approved: Playbook executing now…
Metrics Dashboard
Today
This Week
Infrastructure
Create, Edit, Test Playbooks in Dashboard
Built-in YAML editor with syntax highlighting, auto-completion, real-time validation, linting, dry-run testing, and full version control. No external tools needed.
Built-In YAML Editor Features
Playbook Creation Workflow
Dashboard → Playbooks → [Create New Playbook]
Template selection:
- Blank playbook
- Disk cleanup template
- Service restart template
- Kubernetes template
- Custom command template
Selected: Service restart template
Click: [Validate YAML]
Validation results:
✅ Syntax: Valid YAML
✅ Schema: All required fields present
✅ Commands: Syntax valid
✅ Rollback: Defined for critical steps
⚠️ Warning: approval_gate.required=true (needs approval each time)
Lint suggestions:
💡 Consider adding health check timeout (currently unlimited)
💡 Add tags for better searchability
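The checks reported above can be pictured as a function that takes a parsed playbook and returns errors and warnings. The sketch below is an assumption-laden illustration. The required-field list and the rule details are guesses based on the messages shown, not the editor's real schema. The rollback check is left out because the page does not define which steps count as critical.

```python
# Assumed schema: top-level fields that the example playbook on this page uses.
REQUIRED_FIELDS = ("name", "version", "trigger", "steps", "verification")


def validate_playbook(playbook: dict):
    """Return (errors, warnings) for a playbook already parsed from YAML."""
    errors, warnings = [], []

    for field in REQUIRED_FIELDS:
        if field not in playbook:
            errors.append(f"missing required field: {field}")

    for step in playbook.get("steps", []):
        if not step.get("command"):
            errors.append(f"step {step.get('name', '?')!r} has no command")

    if playbook.get("approval_gate", {}).get("required"):
        warnings.append("approval_gate.required=true (needs approval each time)")

    for check in playbook.get("verification", []):
        if check.get("type") == "http" and "timeout" not in check:
            warnings.append("health check has no timeout (currently unlimited)")

    if not playbook.get("metadata", {}).get("tags"):
        warnings.append("no tags set; add tags for better searchability")

    return errors, warnings
```

Errors would block a save, while warnings correspond to the [Fix Warnings] / [Save Anyway] choice.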
[Fix Warnings] [Save Anyway]
name: custom_app_restart
version: 1.0.0
description: |
  Restart custom application when memory exceeds 90%.
  Gracefully stops app, clears cache, restarts, verifies health.
metadata:
  tags: ["memory", "restart", "application"]
  author: "alice.jones@company.com"
  created: "2026-02-10"
trigger:
  metric: memory_usage
  threshold: "> 90%"
  duration: 5m
approval_gate:
  required: true  # First deployment, require approval
  notify_channel: "#ops-production"
steps:
  - name: stop_application
    action: ssh_command
    command: "systemctl stop custom-app"
    timeout: 30s
    rollback: "systemctl start custom-app"
  - name: clear_cache
    action: ssh_command
    command: "rm -rf /var/cache/custom-app/*"
    timeout: 10s
  - name: start_application
    action: ssh_command
    command: "systemctl start custom-app"
    timeout: 30s
    rollback: "systemctl stop custom-app"
verification:
  - type: http
    url: "http://localhost:8080/health"
    expected_status: 200
    retry: 3
    retry_delay: 10s
  - type: metric
    metric: memory_usage
    threshold: "< 85%"
notes: |
  Initial version. Monitor success rate over 10 runs before enabling
  autonomous execution (approval_gate: required: false).
Click: [Test Playbook]
Target host: staging-app-01 (dropdown)
Mode: Dry-run (no actual execution)
Click: [Run Test]
Dry-run results:
✅ Step 1: stop_application (simulated, 0.2s)
✅ Step 2: clear_cache (simulated, 0.1s)
✅ Step 3: start_application (simulated, 0.3s)
✅ Verification: http health check (simulated, PASS)
✅ Verification: memory check (simulated, PASS)
Total estimated duration: 87 seconds
Estimated success probability: 94% (based on similar playbooks)
[Save Playbook] [Deploy to Production]
Click: [Save Playbook]
Playbook saved:
- Name: custom_app_restart
- Version: 1.0.0
- Status: Active
- Approval: Required (until proven)
Next steps:
1. Trigger manually on staging to validate
2. Review execution logs
3. After 5 successful runs, consider autonomous mode
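The promotion rule in step 3 can be written as a simple check over run history. The page does not say whether the five successful runs must be consecutive. The sketch below assumes they are the five most recent runs and that all of them succeeded. The threshold is a parameter because the example playbook's own notes mention monitoring over 10 runs. The function name is hypothetical.

```python
from typing import Sequence


def ready_for_autonomous(results: Sequence[bool], required_successes: int = 5) -> bool:
    """True when the most recent `required_successes` runs all succeeded.

    `results` is ordered oldest to newest; True marks a successful run.
    """
    recent = results[-required_successes:]
    return len(recent) == required_successes and all(recent)
```

For example, `ready_for_autonomous([True] * 5)` is true, while a history that ends in a failure is not ready however many earlier runs succeeded.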
[Trigger Manually] [View in Library]
Playbook Version Control
Version History — disk_cleanup_prod_db
Current: v1.4.2
Added verification retry logic
Increased timeout for log rotation step
Added hash verification step
+ 8 more versions
 name: disk_cleanup_prod_db
-version: 1.4.1
+version: 1.4.2
 description: Clear disk space on production databases
 steps:
   - name: clear_temp_files
     action: ssh_command
     command: "find /tmp -type f -mtime +7 -delete"
-    timeout: 30s
+    timeout: 60s  # Increased timeout for large temp directories
 verification:
   - type: metric
     metric: disk_usage
     threshold: "< 80%"
+    retry: 3  # NEW: Retry verification 3 times
+    retry_delay: 10s
Changelog:
+ Added retry logic for health verification
+ Increased timeout for temp file deletion
+ Improved reliability by 2 percentage points (96% vs 94% success rate)
Common Questions
Yes. Unlimited concurrent users. Each sees real-time updates independently. Perfect for NOC (Network Operations Center) wall displays, team collaboration, or distributed teams.
Dashboard enters offline mode (banner notification). Shows last known state. Reconnects automatically when the connection is restored. Missed updates are loaded on reconnect. No data loss.
Yes. Generate shareable links with read-only access. Options include specific incident timelines, infrastructure health maps, and compliance reports. Set expiration from 1 day to never. No login required for read-only links.
Multiple export options: CSV (all incidents with filters applied), JSON (machine-readable for custom tooling), PDF (executive summaries, compliance reports), API (programmatic access for custom dashboards).
Yes (Enterprise tier). Drag-and-drop customization: move incident feed, health map, playbook performance widgets. Hide irrelevant sections like anomaly detection or compliance widgets. Save custom layouts for different workflows.
Command Center shows actions taken, not just metrics. Datadog says "Disk 91%" (observation). Command Center says "Disk 91% → 72% via disk_cleanup_prod_db (87s)" (observation + action + outcome). Focus: what we did about problems, not just what problems exist.
See Command Center in Action
Deploy agents, open Command Center, watch incidents resolve in real-time, review audit trail, generate reports.
What to explore:
Free tier: 3 nodes, full Command Center access, unlimited users, no credit card.