SOC-SIM
2022-01-25
SOC Simulator (soc-sim) -- Technical Documentation
1. Overview
The SOC Simulator is TryHackMe's interactive Security Operations Center training platform. It places users in realistic SOC analyst scenarios where they triage security alerts using industry-standard SIEM tools (Splunk, Elastic, or Azure Sentinel), investigate incidents on an analyst workstation VM, classify alerts, and write case reports -- all in a time-bound, scored environment.
The system orchestrates the full lifecycle: scenario selection, VM provisioning, log ingestion, timed alert release, real-time investigation, report submission, AI-powered evaluation, and scoring.
2. High-Level Architecture
3. Directory Structure
Backend (API)
apps/api/app/api/v2/src/
├── routes/companies/soc-sim.ts # Company-scoped API routes
├── controllers/soc-sim/ # Request handlers
├── services/soc-sim/
│ ├── manage-runs.ts # Run lifecycle (init, pause, resume, terminate)
│ ├── vms.ts # VM deployment & management
│ ├── runs.ts # Run CRUD operations
│ ├── alerts.ts # Alert import from Sanity CMS
│ ├── run-alerts.ts # Run alert lifecycle & scoring
│ ├── run-logs.ts # Log ingestion via EventBridge
│ ├── reports.ts # Case report management
│ ├── run-alert-assign.ts # Alert assignment logic
│ ├── run-evaluate.ts # AI-powered evaluation
│ ├── multiplayer-runs.ts # Multiplayer coordination
│ ├── sentinel.ts # Azure Sentinel integration
│ └── socket.ts # WebSocket event emitters
├── models/soc-sim/
│ ├── run.ts # SOCSimRun Mongoose model
│ └── run-alert.ts # SOCSimRunAlert Mongoose model
├── common/
│ ├── interfaces/soc-sim/ # TypeScript interfaces
│ ├── enums/soc-sim/ # Enums (status, severity, types)
│ ├── constants/vms/index.ts # VM IDs, health check config
│ └── constants/soc-sim/ # SOC Sim constants
└── infra/sockets/handlers/soc-sim.ts # WebSocket event handlers
Frontend
apps/frontend/src/features/soc-sim/
├── soc-sim.tsx # Root component
├── soc-sim.slice.ts # Core RTK Query API slice
├── soc-sim.types.ts # TypeScript types
├── landing-sections/ # Home, scenarios, stats, leaderboard
├── scenario-onboarding/ # Onboarding flow
├── soc-sim-dashboard/ # Active run dashboard
├── alert-queue/ # Alert triage interface
├── case-reports/ # Report listing
├── case-reports-details/ # Individual report editor
├── playbooks/ # Playbook listing
├── summary/ # Run summary page
├── public-summary/ # Shareable summary
├── components/ # Shared components (nav, VM frame)
├── contexts/ # Polling, modal contexts
└── tour/ # Coach-marking system
4. Data Models
SOCSimRun (soc_sim_runs)
| Field | Type | Description |
|---|---|---|
| scenario | String | Sanity CMS scenario ID |
| title | String | Scenario display name |
| user | ObjectId (User) | Run owner |
| status | Enum | starting, ready, completed, terminated, abandoned, paused |
| startedAt / endedAt | Date | Run time boundaries |
| pausedAt / resumedAt | Date | Pause/resume timestamps |
| actualRuntime / pausedDuration | Number | Time accounting (seconds) |
| vms.siem | { url, instance } | SIEM VM connection |
| vms.analyst | { url, instance } | Analyst VM connection |
| vms.azure | { labId, url, credentials, instance } | Azure Sentinel connection |
| siemTool | Enum | splunk, elastic, sentinel |
| scenarioSpeed | Number | Alert/log release speed multiplier |
| alertTypes | String[] | Alert categories in this scenario |
| multiplayer | Array | Multi-user participation data |
| certificationId | ObjectId | Optional certification exam link |
| pointsAwarded / closedAlerts | Number | Scoring metrics |
| mttr / meanDwellTime | Number | Performance metrics |
| success | Boolean | Whether run was successful |
| isErrored | Boolean | VM deployment failure flag |
SOCSimRunAlert (soc_sim_run_alerts)
| Field | Type | Description |
|---|---|---|
| run | ObjectId (SOCSimRun) | Parent run |
| status | Enum | AWAITING_ACTION, IN_PROGRESS, RESOLVED |
| severity | Enum | low, medium, high, critical |
| type | Enum | phishing, process, execution, network, web attack, rce |
| incidentId | String | Scenario incident identifier |
| alertRule | String | Detection rule name |
| description / details | String | Alert content (JSON-encoded details) |
| shownAt | Date | When the alert becomes visible to user |
| assignedAt / closedAt | Date | Investigation timestamps |
| resolveTime | Number | Time to resolve (seconds) |
| user | ObjectId | Assigned analyst (multiplayer) |
| releaseTime | Number | Original release offset (seconds) |
| report | Embedded | classification, requiresEscalation, writeup, evaluation |
| meta | Embedded | isTruePositive, isEscalationRequired, points, playbook |
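The alert fields above map naturally onto a TypeScript shape. The sketch below is illustrative only: RunAlertSketch is a hypothetical, trimmed-down subset of the SOCSimRunAlert fields (the real interfaces live under common/interfaces/soc-sim/), and visibleAlerts is an assumed helper showing how shownAt could gate which alerts a user sees.

```typescript
// Hypothetical, trimmed-down view of a SOCSimRunAlert document.
type AlertStatus = "AWAITING_ACTION" | "IN_PROGRESS" | "RESOLVED";
type AlertSeverity = "low" | "medium" | "high" | "critical";

interface RunAlertSketch {
  incidentId: string;
  status: AlertStatus;
  severity: AlertSeverity;
  shownAt: Date; // when the alert becomes visible to the user
  releaseTime: number; // original release offset, in seconds
}

// Assumed helper: an alert counts as visible once its shownAt time has passed.
function visibleAlerts(alerts: RunAlertSketch[], now: Date): RunAlertSketch[] {
  return alerts.filter((alert) => alert.shownAt.getTime() <= now.getTime());
}
```

Filtering on shownAt at read time means alerts can be inserted up front when the run becomes ready and still be released gradually.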
5. Run Lifecycle
Lifecycle Steps
- STARTING -- Run document created, VM deployment triggered asynchronously
- READY -- VMs are healthy, logs scheduled, alerts inserted; user can begin investigation
- PAUSED -- VMs terminated, alert shownAt timestamps frozen
- COMPLETED -- All true-positive alerts resolved; scoring + AI evaluation triggered
- TERMINATED -- User manually ended the run before completing
- ABANDONED -- System auto-terminated (VM expiry, inactivity)
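The lifecycle states can be read as a small state machine. The transition table below is an assumption inferred from the descriptions above, not taken from manage-runs.ts. In particular, it assumes a resumed run returns directly to ready, whereas the real service may pass through starting again while VMs are redeployed.

```typescript
type RunStatus =
  | "starting"
  | "ready"
  | "paused"
  | "completed"
  | "terminated"
  | "abandoned";

// Assumed transition table; completed, terminated and abandoned are terminal.
const ALLOWED_TRANSITIONS: Record<RunStatus, RunStatus[]> = {
  starting: ["ready", "terminated", "abandoned"],
  ready: ["paused", "completed", "terminated", "abandoned"],
  paused: ["ready", "terminated", "abandoned"],
  completed: [],
  terminated: [],
  abandoned: [],
};

// Returns true when moving from one status to another is allowed by the table.
function canTransition(from: RunStatus, to: RunStatus): boolean {
  return ALLOWED_TRANSITIONS[from].indexOf(to) !== -1;
}
```

Keeping the table in one place makes it easy to reject invalid requests, such as pausing a run that has already completed.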
6. VM Creation & Deployment
VM Types & IDs
| VM Role | SIEM Tool | Hardcoded Upload ID |
|---|---|---|
| SIEM | Splunk (with collector API) | 6900943a0e5714f3db8fefa2 |
| SIEM | Elastic | 68a86b8926462357824ff3da |
| SIEM | Sentinel | 68bf3507a2d75b47d470c4ee (Azure Lab) |
| Analyst | All tools | 668c3018946888fcc0034de6 |
These defaults can be overridden per-scenario via Sanity CMS fields (analyst_vm_upload_id, siem_vm_upload_id, siem_elastic_vm_upload_id).
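A minimal sketch of the override lookup follows. The helper names are hypothetical, and the mapping of siem_vm_upload_id to the Splunk image is an assumption based on the field names. No Sentinel override field is listed here, so the sketch always uses the default for Sentinel.

```typescript
type SiemTool = "splunk" | "elastic" | "sentinel";

// Hardcoded defaults from the table above.
const DEFAULT_SIEM_UPLOAD_IDS: Record<SiemTool, string> = {
  splunk: "6900943a0e5714f3db8fefa2",
  elastic: "68a86b8926462357824ff3da",
  sentinel: "68bf3507a2d75b47d470c4ee",
};
const DEFAULT_ANALYST_UPLOAD_ID = "668c3018946888fcc0034de6";

// Optional per-scenario overrides, named after the Sanity CMS fields.
interface ScenarioVmOverrides {
  analyst_vm_upload_id?: string;
  siem_vm_upload_id?: string;
  siem_elastic_vm_upload_id?: string;
}

function resolveSiemUploadId(tool: SiemTool, scenario: ScenarioVmOverrides): string {
  // Assumption: siem_vm_upload_id overrides the Splunk image only.
  if (tool === "splunk" && scenario.siem_vm_upload_id) {
    return scenario.siem_vm_upload_id;
  }
  if (tool === "elastic" && scenario.siem_elastic_vm_upload_id) {
    return scenario.siem_elastic_vm_upload_id;
  }
  return DEFAULT_SIEM_UPLOAD_IDS[tool];
}

function resolveAnalystUploadId(scenario: ScenarioVmOverrides): string {
  // Falls back to the default when the override is missing or empty.
  return scenario.analyst_vm_upload_id || DEFAULT_ANALYST_UPLOAD_ID;
}
```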
Deployment Details
- EC2 Deployment: Uses VMsService.deployVm() which creates an EC2 instance from a pre-configured AMI with the appropriate security groups and subnets
- Retry Logic: Up to 2 retries on failure; on each retry, previously deployed instances are terminated
- Guacamole Connection: Each VM gets a remote desktop connection via Apache Guacamole, providing browser-based access through an iframe
- Instance Tracking: Each deployed VM gets an Instance document in MongoDB with a 3-hour expiry time
- Cost Tagging: All VMs tagged with CostCenter.SOC_SIM
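The retry rule can be sketched as a loop that cleans up after each failed attempt. This is not the actual vms.ts code. The deploy and terminate callbacks are hypothetical stand-ins for the real service calls, and the sketch is synchronous for brevity even though real deployment is presumably asynchronous.

```typescript
// Hypothetical handle for a deployed VM.
interface DeployedInstance {
  id: string;
}

const MAX_RETRIES = 2; // "up to 2 retries" means at most 3 attempts in total

function deployWithRetry(
  roles: string[],
  deploy: (role: string) => DeployedInstance,
  terminate: (instance: DeployedInstance) => void,
): DeployedInstance[] {
  let lastError: unknown = new Error("deployment was never attempted");
  for (let attempt = 0; attempt <= MAX_RETRIES; attempt++) {
    const deployed: DeployedInstance[] = [];
    try {
      for (const role of roles) {
        deployed.push(deploy(role));
      }
      return deployed; // every VM came up on this attempt
    } catch (error) {
      lastError = error;
      // Terminate whatever this attempt deployed so no partial set is left
      // running (this also runs after the final failed attempt).
      deployed.forEach((instance) => terminate(instance));
    }
  }
  throw lastError;
}
```

Terminating partial deployments before retrying avoids paying for orphaned instances when, for example, the SIEM VM starts but the analyst VM does not.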
7. Alert Generation & Release
Alert Timing Formula
scenarioSpeed = 0 → shownAt = run.startedAt + index (all at once)
scenarioSpeed = 1 → shownAt = run.startedAt + releaseTime + 150s buffer
scenarioSpeed > 1 → shownAt = run.startedAt + (releaseTime / scenarioSpeed)
The 150-second buffer at scenarioSpeed === 1 ensures that each alert appears only after the corresponding SIEM logs have been ingested, so the evidence is already available to investigate when the alert fires.
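The timing formula translates into a small pure function. In this sketch computeShownAt is a hypothetical name, releaseTime is taken to be in seconds (as in the data model), and the index for scenarioSpeed = 0 is assumed to be added as milliseconds purely to preserve ordering. Speeds between 0 and 1 are not covered by the formula, so the sketch rejects them.

```typescript
const LOG_INGESTION_BUFFER_SECONDS = 150;

function computeShownAt(
  startedAt: Date,
  releaseTimeSeconds: number,
  index: number,
  scenarioSpeed: number,
): Date {
  const startMs = startedAt.getTime();
  if (scenarioSpeed === 0) {
    // Assumption: index is added as milliseconds, only to keep a stable order.
    return new Date(startMs + index);
  }
  if (scenarioSpeed === 1) {
    // Real-time release plus the buffer that lets logs arrive first.
    return new Date(startMs + (releaseTimeSeconds + LOG_INGESTION_BUFFER_SECONDS) * 1000);
  }
  if (scenarioSpeed > 1) {
    // Compressed release; no buffer is applied in this branch.
    return new Date(startMs + (releaseTimeSeconds / scenarioSpeed) * 1000);
  }
  throw new Error("unsupported scenarioSpeed: " + scenarioSpeed);
}
```

For example, an alert with releaseTime = 60 is shown 210 seconds after the run starts at speed 1, and 30 seconds after the start at speed 2.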
Alert Properties from CMS
Each scenario alert in Sanity contains:
- severity (critical/high/medium/low)
- type (phishing/process/execution/network/web attack/rce)
- incidentId, alertRule, description
- details (JSON-encoded alert payload)
- releaseTime (seconds offset from run start)
- meta.isTruePositive -- ground truth for scoring
- meta.isEscalationRequired -- ground truth for escalation scoring
- meta.classificationPoints, meta.escalationPoints, meta.evaluationPoints
- meta.evaluationCriteria -- criteria for AI evaluation
- meta.playbook -- linked playbook reference
8. Log Ingestion Pipeline
Log Scheduling Details
- Splunk/Elastic: Each log becomes an AWS EventBridge one-time schedule that triggers a Lambda function at the log's release_time. The Lambda sends the log via HTTP to the SIEM VM's collector API. Logs are processed in batches of 400.
- Sentinel: Logs are converted to CSV, uploaded to Azure Blob Storage in batches of 5,000, then a lab action triggers ingestion into the Sentinel workspace.
- Timestamp Alignment: Log timestamps are recalculated relative to run.startedAt so that log events and alert times are consistent.
- Scenario Speed: When scenarioSpeed > 1, the EventBridge schedule time is compressed (release_time / scenarioSpeed), but the embedded log timestamp remains at real-time offset to match alert details.
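The batching and speed-compression rules above can be illustrated without any AWS calls. In this sketch the helper names and the ScenarioLog shape are hypothetical, and release_time is assumed to be in seconds from run start, which is not stated for logs. Behaviour for scenarioSpeed below 1 is not described in this section, so the sketch rejects it.

```typescript
const LOG_BATCH_SIZE = 400; // Splunk/Elastic batch size from the text above

// Splits items into consecutive batches of at most `size` elements.
function chunk<T>(items: T[], size: number): T[][] {
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Hypothetical log shape; release_time is assumed to be seconds from run start.
interface ScenarioLog {
  release_time: number;
}

function scheduleTimes(
  log: ScenarioLog,
  startedAt: Date,
  scenarioSpeed: number,
): { fireAt: Date; eventTimestamp: Date } {
  if (scenarioSpeed < 1) {
    throw new Error("scenarioSpeed below 1 is not covered by this sketch");
  }
  const offsetMs = log.release_time * 1000;
  return {
    // When the schedule fires: compressed by the speed multiplier.
    fireAt: new Date(startedAt.getTime() + offsetMs / scenarioSpeed),
    // Timestamp embedded in the log: stays at the real-time offset.
    eventTimestamp: new Date(startedAt.getTime() + offsetMs),
  };
}
```

With scenarioSpeed = 4, a log with release_time = 120 would be delivered 30 seconds into the run while still carrying a timestamp 120 seconds after the start.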
9. Scoring & Evaluation
Scoring Breakdown
| Category | Condition | Points |
|---|---|---|
| Classification | Correct TP or FP classification | meta.classificationPoints (from CMS) |
| Classification Penalty | Classified FP as TP | -10 |
| Escalation | TP correctly escalated | meta.escalationPoints (default 10) |
| Evaluation | AI-scored writeup quality | Up to meta.evaluationPoints |
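The classification and escalation rows can be sketched as a single function; the AI evaluation row is left out because its output shape is not described here. The types and function name are hypothetical, and two readings are assumptions: a true positive misclassified as a false positive scores 0, and "correctly escalated" means a correctly classified true positive that required escalation and was escalated.

```typescript
const FALSE_ALARM_PENALTY = -10;
const DEFAULT_ESCALATION_POINTS = 10;

// Hypothetical subset of the alert's meta (ground truth) fields.
interface AlertMetaSketch {
  isTruePositive: boolean;
  isEscalationRequired: boolean;
  classificationPoints: number;
  escalationPoints?: number;
}

// Hypothetical subset of the analyst's report.
interface AlertReportSketch {
  classifiedAsTruePositive: boolean;
  requiresEscalation: boolean;
}

function classificationAndEscalationPoints(
  meta: AlertMetaSketch,
  report: AlertReportSketch,
): number {
  let points = 0;
  if (report.classifiedAsTruePositive === meta.isTruePositive) {
    points += meta.classificationPoints; // correct TP or FP call
  } else if (report.classifiedAsTruePositive) {
    points += FALSE_ALARM_PENALTY; // FP classified as TP
  }
  // Assumption: escalation points need a correct TP call plus a required,
  // actually requested escalation.
  if (
    meta.isTruePositive &&
    report.classifiedAsTruePositive &&
    meta.isEscalationRequired &&
    report.requiresEscalation
  ) {
    points += meta.escalationPoints ?? DEFAULT_ESCALATION_POINTS;
  }
  return points;
}
```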
Completion Trigger
A run completes automatically when all true-positive alerts have been resolved (areTruePositivesResolved()). False-positive alerts do not need to be resolved for completion. Upon completion:
- VMs are terminated asynchronously
- Classification & escalation points are calculated
- AI evaluation runs on all report writeups
- Run stats computed (MTTR, FP rate, mean dwell time)
- Certification section updated (if applicable)
- Segment analytics event fired
- Mission progress tracked
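The completion check itself is simple to express. The body below is a sketch of what areTruePositivesResolved() might look like, not the actual run-alerts.ts code, and CompletionAlert is a hypothetical minimal shape.

```typescript
// Hypothetical minimal alert shape needed for the completion check.
interface CompletionAlert {
  status: "AWAITING_ACTION" | "IN_PROGRESS" | "RESOLVED";
  meta: { isTruePositive: boolean };
}

// True when every true-positive alert is resolved; false positives are ignored.
function areTruePositivesResolved(alerts: CompletionAlert[]): boolean {
  return alerts
    .filter((alert) => alert.meta.isTruePositive)
    .every((alert) => alert.status === "RESOLVED");
}
```

Note that this version returns true for a run with no true-positive alerts at all; whether the real implementation treats that case as complete is not stated here.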
10. Real-Time Communication
The WebSocket layer is primarily used for multiplayer runs where multiple analysts share the same alert queue. Events notify all participants of alert assignments and run status changes in real-time.
11. Feature Flags (GrowthBook)
| Flag | Type | Purpose |
|---|---|---|
| soc-sim-vms-deploy | Boolean | Enables real VM deployment (off = mock mode for local dev) |
| soc-sim-pause-run | Boolean | Enables pause/resume functionality |
| soc-sim-playbooks | Boolean | Enables playbook feature |
| soc-sim-scenario-speed | Number | Override scenario speed multiplier |
| soc-sim-live-log-streaming | Boolean | Live log streaming mode |
| soc-sim-coach-marking | Boolean | Tour/onboarding coach marks |
| soc-sim-badges | Boolean | Badge system |
| soc-sim-siem-tool-modal | Boolean | SIEM tool selection UI |
| soc-sim-public-summary-page | Boolean | Public shareable summary |
| soc-sim-hacktivities | Boolean | Hacktivities integration |
| soc-sim-short-scenarios | Boolean | Short scenario format |
| verify-and-restore-socsim-analyst-vm | Boolean | VM restore feature |
12. Frontend User Flow
13. API Endpoints Summary
Public Routes (User-facing)
| Method | Endpoint | Description |
|---|---|---|
| GET | /soc-sim/runs/active | Get active run data |
| POST | /soc-sim/runs | Create new run |
| POST | /soc-sim/runs/pause | Pause active run |
| POST | /soc-sim/runs/resume | Resume paused run |
| POST | /soc-sim/runs/terminate | Terminate run |
| GET | /soc-sim/alerts | Get paginated alerts (visible only) |
| PUT | /soc-sim/alerts/assign | Assign alert to user |
| PUT | /soc-sim/alerts/unassign | Unassign alert |
| GET | /soc-sim/alerts/details | Get alert details |
| GET/POST/PUT/DELETE | /soc-sim/reports/* | Case report CRUD |
| PATCH | /soc-sim/vms/extend | Extend VM time by 1 hour |
| POST | /soc-sim/vms/refresh-url | Refresh Guacamole connection URL |
| GET | /soc-sim/content/scenarios | List available scenarios |
| GET | /soc-sim/content/playbooks | Get playbooks for run |
| GET | /soc-sim/stats | User statistics |
| GET | /soc-sim/leaderboard | Leaderboard |
| GET | /soc-sim/runs/summary | Run summary data |
Company Routes (Management Dashboard)
| Method | Endpoint | Description |
|---|---|---|
| GET | /companies/soc-sim/users | Company SOC Sim users |
| POST | /companies/soc-sim/assignment | Create scenario assignment |
| POST | /companies/soc-sim/stats/progression | Team progression stats |
| POST | /companies/soc-sim/stats | Company SOC Sim stats |
| GET | /companies/soc-sim/runs | User runs for company |
| POST | /companies/soc-sim/stats/rate | Alert classification rate |
14. Infrastructure Summary
Key infrastructure characteristics:
- No dedicated IaC -- SOC Sim reuses the platform's shared VM infrastructure
- VM Lifetime: 3 hours default, extendable by 1 hour (max once, must be within last hour, max 6 hours total)
- Health Check Polling: Every 3 seconds via Agenda.js until VMs respond or 30-minute timeout
- EventBridge Schedules: One-time schedules per log entry, cleaned up on run termination
- Cost Center Tagging: All resources tagged CostCenter.SOC_SIM for billing attribution
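The health-check cadence above implies a bounded number of polls. The sketch below models the decision a recurring job could make on each tick. The constant and function names are hypothetical (the real values live in constants/vms/index.ts), and the Agenda.js scheduling itself is not shown.

```typescript
const HEALTH_CHECK_INTERVAL_MS = 3 * 1000; // poll every 3 seconds
const HEALTH_CHECK_TIMEOUT_MS = 30 * 60 * 1000; // give up after 30 minutes

// Upper bound on polls for one run: 1,800,000 ms / 3,000 ms = 600.
const MAX_HEALTH_CHECKS = HEALTH_CHECK_TIMEOUT_MS / HEALTH_CHECK_INTERVAL_MS;

type PollDecision = "ready" | "retry" | "timeout";

// Decides what the polling job should do next, given the time elapsed since
// deployment started and whether all VMs responded on this check.
function nextPollDecision(elapsedMs: number, allVmsHealthy: boolean): PollDecision {
  if (allVmsHealthy) return "ready";
  if (elapsedMs >= HEALTH_CHECK_TIMEOUT_MS) return "timeout";
  return "retry";
}
```

Keeping the decision pure makes it straightforward to unit test independently of the scheduler and the VMs.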