SOC-SIM

2022-01-25

SOC Simulator (soc-sim) -- Technical Documentation

1. Overview

The SOC Simulator is TryHackMe's interactive Security Operations Center training platform. It places users in realistic SOC analyst scenarios where they triage security alerts using industry-standard SIEM tools (Splunk, Elastic, or Azure Sentinel), investigate incidents on an analyst workstation VM, classify alerts, and write case reports -- all in a time-bound, scored environment.

The system orchestrates the full lifecycle: scenario selection, VM provisioning, log ingestion, timed alert release, real-time investigation, report submission, AI-powered evaluation, and scoring.

2. High-Level Architecture

3. Directory Structure

Backend (API)

apps/api/app/api/v2/src/
├── routes/companies/soc-sim.ts          # Company-scoped API routes
├── controllers/soc-sim/                 # Request handlers
├── services/soc-sim/
│   ├── manage-runs.ts                   # Run lifecycle (init, pause, resume, terminate)
│   ├── vms.ts                           # VM deployment & management
│   ├── runs.ts                          # Run CRUD operations
│   ├── alerts.ts                        # Alert import from Sanity CMS
│   ├── run-alerts.ts                    # Run alert lifecycle & scoring
│   ├── run-logs.ts                      # Log ingestion via EventBridge
│   ├── reports.ts                       # Case report management
│   ├── run-alert-assign.ts             # Alert assignment logic
│   ├── run-evaluate.ts                  # AI-powered evaluation
│   ├── multiplayer-runs.ts             # Multiplayer coordination
│   ├── sentinel.ts                      # Azure Sentinel integration
│   └── socket.ts                        # WebSocket event emitters
├── models/soc-sim/
│   ├── run.ts                           # SOCSimRun Mongoose model
│   └── run-alert.ts                     # SOCSimRunAlert Mongoose model
├── common/
│   ├── interfaces/soc-sim/              # TypeScript interfaces
│   ├── enums/soc-sim/                   # Enums (status, severity, types)
│   ├── constants/vms/index.ts           # VM IDs, health check config
│   └── constants/soc-sim/              # SOC Sim constants
└── infra/sockets/handlers/soc-sim.ts    # WebSocket event handlers

Frontend

apps/frontend/src/features/soc-sim/
├── soc-sim.tsx                          # Root component
├── soc-sim.slice.ts                     # Core RTK Query API slice
├── soc-sim.types.ts                     # TypeScript types
├── landing-sections/                    # Home, scenarios, stats, leaderboard
├── scenario-onboarding/                 # Onboarding flow
├── soc-sim-dashboard/                   # Active run dashboard
├── alert-queue/                         # Alert triage interface
├── case-reports/                        # Report listing
├── case-reports-details/                # Individual report editor
├── playbooks/                           # Playbook listing
├── summary/                             # Run summary page
├── public-summary/                      # Shareable summary
├── components/                          # Shared components (nav, VM frame)
├── contexts/                            # Polling, modal contexts
└── tour/                                # Coach-marking system

4. Data Models

SOCSimRun (`soc_sim_runs`)

Field	Type	Description
`scenario`	String	Sanity CMS scenario ID
`title`	String	Scenario display name
`user`	ObjectId (User)	Run owner
`status`	Enum	`starting`, `ready`, `completed`, `terminated`, `abandoned`, `paused`
`startedAt` / `endedAt`	Date	Run time boundaries
`pausedAt` / `resumedAt`	Date	Pause/resume timestamps
`actualRuntime` / `pausedDuration`	Number	Time accounting (seconds)
`vms.siem`	`{ url, instance }`	SIEM VM connection
`vms.analyst`	`{ url, instance }`	Analyst VM connection
`vms.azure`	`{ labId, url, credentials, instance }`	Azure Sentinel connection
`siemTool`	Enum	`splunk`, `elastic`, `sentinel`
`scenarioSpeed`	Number	Alert/log release speed multiplier
`alertTypes`	String[]	Alert categories in this scenario
`multiplayer`	Array	Multi-user participation data
`certificationId`	ObjectId	Optional certification exam link
`pointsAwarded` / `closedAlerts`	Number	Scoring metrics
`mttr` / `meanDwellTime`	Number	Performance metrics
`success`	Boolean	Whether run was successful
`isErrored`	Boolean	VM deployment failure flag

SOCSimRunAlert (`soc_sim_run_alerts`)

Field	Type	Description
`run`	ObjectId (SOCSimRun)	Parent run
`status`	Enum	`AWAITING_ACTION`, `IN_PROGRESS`, `RESOLVED`
`severity`	Enum	`low`, `medium`, `high`, `critical`
`type`	Enum	`phishing`, `process`, `execution`, `network`, `web attack`, `rce`
`incidentId`	String	Scenario incident identifier
`alertRule`	String	Detection rule name
`description` / `details`	String	Alert content (JSON-encoded details)
`shownAt`	Date	When the alert becomes visible to the user
`assignedAt` / `closedAt`	Date	Investigation timestamps
`resolveTime`	Number	Time to resolve (seconds)
`user`	ObjectId	Assigned analyst (multiplayer)
`releaseTime`	Number	Original release offset (seconds)
`report`	Embedded	classification, requiresEscalation, writeup, evaluation
`meta`	Embedded	isTruePositive, isEscalationRequired, points, playbook
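
As an illustration of how `shownAt` gates visibility, the sketch below models a subset of the SOCSimRunAlert fields and filters out alerts that have not been released yet. The `RunAlert` interface and the `visibleAlerts` helper are hypothetical names introduced here, and representing ObjectIds as strings is an assumption; the actual Mongoose model may differ.

```typescript
// Subset of the SOCSimRunAlert fields from the table above.
type AlertStatus = "AWAITING_ACTION" | "IN_PROGRESS" | "RESOLVED";
type AlertSeverity = "low" | "medium" | "high" | "critical";

interface RunAlert {
  run: string; // ASSUMPTION: ObjectId rendered as a string
  status: AlertStatus;
  severity: AlertSeverity;
  incidentId: string;
  shownAt: Date; // when the alert becomes visible to the user
  releaseTime: number; // original release offset, in seconds
}

// Hypothetical helper: an alert counts as visible once its shownAt has passed.
// Returns a new array, oldest alert first; the input is not mutated.
function visibleAlerts(alerts: readonly RunAlert[], now: Date): RunAlert[] {
  return alerts
    .filter((alert) => alert.shownAt.getTime() <= now.getTime())
    .sort((a, b) => a.shownAt.getTime() - b.shownAt.getTime());
}
```

A filter of this kind would explain why the alerts endpoint can return "visible only" results without deleting or re-inserting documents: every alert exists from the start of the run and only its `shownAt` decides when it appears.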

5. Run Lifecycle

Lifecycle Steps

STARTING -- Run document created, VM deployment triggered asynchronously
READY -- VMs are healthy, logs scheduled, alerts inserted; user can begin investigation
PAUSED -- VMs terminated, alert shownAt timestamps frozen
COMPLETED -- All true-positive alerts resolved; scoring + AI evaluation triggered
TERMINATED -- User manually ended the run before completing
ABANDONED -- System auto-terminated (VM expiry, inactivity)
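
One way to read the steps above is as a small state machine. The sketch below is an assumption inferred from the step descriptions, not a copy of the logic in manage-runs.ts: the transition table, `canTransition`, and `isTerminal` are hypothetical, and the `paused -> starting` edge assumes that resuming redeploys VMs because pausing terminates them.

```typescript
type RunStatus =
  | "starting"
  | "ready"
  | "paused"
  | "completed"
  | "terminated"
  | "abandoned";

// ASSUMPTION: transitions inferred from the lifecycle descriptions, not from the codebase.
const ALLOWED_TRANSITIONS: Record<RunStatus, readonly RunStatus[]> = {
  starting: ["ready", "terminated", "abandoned"],
  ready: ["paused", "completed", "terminated", "abandoned"],
  // Pausing terminates the VMs, so a resume is assumed to go back through deployment.
  paused: ["starting", "terminated", "abandoned"],
  completed: [],
  terminated: [],
  abandoned: [],
};

// True when the move from one status to another is allowed by the table above.
function canTransition(from: RunStatus, to: RunStatus): boolean {
  return ALLOWED_TRANSITIONS[from].indexOf(to) !== -1;
}

// A status is terminal when no further transitions leave it.
function isTerminal(status: RunStatus): boolean {
  return ALLOWED_TRANSITIONS[status].length === 0;
}
```

Keeping the table in one place makes illegal moves (for example, resuming a completed run) easy to reject before any VM work starts.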

6. VM Creation & Deployment

VM Types & IDs

VM Role	SIEM Tool	Hardcoded Upload ID
SIEM	Splunk (with collector API)	`6900943a0e5714f3db8fefa2`
SIEM	Elastic	`68a86b8926462357824ff3da`
SIEM	Sentinel	`68bf3507a2d75b47d470c4ee` (Azure Lab)
Analyst	All tools	`668c3018946888fcc0034de6`

These defaults can be overridden per-scenario via Sanity CMS fields (analyst_vm_upload_id, siem_vm_upload_id, siem_elastic_vm_upload_id).
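
The override behaviour can be sketched as a lookup that prefers the scenario's CMS fields and falls back to the hardcoded IDs. This is a minimal sketch under assumptions: `resolveUploadIds` and `ScenarioVmOverrides` are hypothetical names, `siem_vm_upload_id` is assumed to apply to Splunk only, and Sentinel is assumed to have no override because none is listed.

```typescript
type SiemTool = "splunk" | "elastic" | "sentinel";

// Default upload IDs from the table above.
const DEFAULT_UPLOAD_IDS = {
  siem: {
    splunk: "6900943a0e5714f3db8fefa2",
    elastic: "68a86b8926462357824ff3da",
    sentinel: "68bf3507a2d75b47d470c4ee",
  },
  analyst: "668c3018946888fcc0034de6",
} as const;

// Optional per-scenario overrides, named after the Sanity CMS fields.
interface ScenarioVmOverrides {
  analyst_vm_upload_id?: string;
  siem_vm_upload_id?: string; // ASSUMPTION: applies to the Splunk SIEM VM
  siem_elastic_vm_upload_id?: string;
}

// Hypothetical resolver: CMS override first, hardcoded default otherwise.
function resolveUploadIds(
  siemTool: SiemTool,
  scenario: ScenarioVmOverrides,
): { siem: string; analyst: string } {
  let siemOverride: string | undefined;
  if (siemTool === "splunk") siemOverride = scenario.siem_vm_upload_id;
  else if (siemTool === "elastic") siemOverride = scenario.siem_elastic_vm_upload_id;
  // `||` rather than `??` so that an empty CMS field also falls back to the default.
  return {
    siem: siemOverride || DEFAULT_UPLOAD_IDS.siem[siemTool],
    analyst: scenario.analyst_vm_upload_id || DEFAULT_UPLOAD_IDS.analyst,
  };
}
```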

Deployment Details

EC2 Deployment: Uses VMsService.deployVm() which creates an EC2 instance from a pre-configured AMI with the appropriate security groups and subnets
Retry Logic: Up to 2 retries on failure; before each retry, instances deployed by the failed attempt are terminated
Guacamole Connection: Each VM gets a remote desktop connection via Apache Guacamole, providing browser-based access through an iframe
Instance Tracking: Each deployed VM gets an Instance document in MongoDB with a 3-hour expiry time
Cost Tagging: All VMs tagged with CostCenter.SOC_SIM
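
The retry behaviour described above can be sketched as a loop that cleans up after each failed attempt. This is not the actual VMsService code: `deployRunVms`, `deployOne`, and `terminate` are hypothetical callbacks standing in for the real deployment and termination calls, and the sketch also cleans up after the final failed attempt, which the text does not specify.

```typescript
type VmRole = "siem" | "analyst";

interface DeployedVm {
  role: VmRole;
  instanceId: string;
}

// Hypothetical orchestration: one initial attempt plus up to `maxRetries` retries.
async function deployRunVms(
  deployOne: (role: VmRole) => Promise<DeployedVm>,
  terminate: (vm: DeployedVm) => Promise<void>,
  maxRetries = 2,
): Promise<DeployedVm[]> {
  const roles: VmRole[] = ["siem", "analyst"];
  let lastError: unknown;

  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    const deployed: DeployedVm[] = [];
    try {
      for (const role of roles) {
        deployed.push(await deployOne(role));
      }
      return deployed; // every VM for this run is up
    } catch (err) {
      lastError = err;
      // Terminate whatever this attempt managed to start before trying again,
      // so a partially deployed run never leaks instances.
      for (const vm of deployed) {
        await terminate(vm);
      }
    }
  }
  throw lastError;
}
```

Passing the deploy and terminate steps in as functions keeps the retry policy testable without touching EC2.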

7. Alert Generation & Release

Alert Timing Formula

scenarioSpeed = 0  →  shownAt = run.startedAt + index (all at once)
scenarioSpeed = 1  →  shownAt = run.startedAt + releaseTime + 150s buffer
scenarioSpeed > 1  →  shownAt = run.startedAt + (releaseTime / scenarioSpeed)

The 150-second buffer at scenarioSpeed === 1 delays each alert until the corresponding SIEM logs have been ingested, so the evidence is already available to investigate when the alert fires.
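
The timing formula can be expressed as a single function. This is a sketch, not the code in run-alerts.ts: `computeShownAt` is a hypothetical name, the unit of the `index` offset at speed 0 is not stated in the formula and is assumed to be milliseconds, and speeds between 0 and 1 are rejected because the formula does not cover them.

```typescript
// Buffer applied at normal speed so logs land in the SIEM before the alert appears.
const LOG_INGESTION_BUFFER_SECONDS = 150;

// Hypothetical helper mirroring the three cases of the timing formula.
function computeShownAt(
  startedAt: Date,
  releaseTime: number, // seconds after run start
  scenarioSpeed: number,
  index: number, // position of the alert within the scenario
): Date {
  const startMs = startedAt.getTime();

  if (scenarioSpeed === 0) {
    // ASSUMPTION: `index` is added as milliseconds, just enough to keep ordering stable.
    return new Date(startMs + index);
  }
  if (scenarioSpeed === 1) {
    return new Date(startMs + (releaseTime + LOG_INGESTION_BUFFER_SECONDS) * 1000);
  }
  if (scenarioSpeed > 1) {
    // Compressed timeline; the formula applies no buffer in this case.
    return new Date(startMs + (releaseTime / scenarioSpeed) * 1000);
  }
  throw new RangeError("scenarioSpeed must be 0, 1, or greater than 1");
}
```

For example, with a run started at 00:00:00 and a `releaseTime` of 60 seconds, the alert is shown at 00:03:30 at speed 1 (60 s + 150 s) and at 00:00:30 at speed 2 (60 s / 2).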

Alert Properties from CMS

Each scenario alert in Sanity contains:

severity (critical/high/medium/low)
type (phishing/process/execution/network/web attack/rce)
incidentId, alertRule, description
details (JSON-encoded alert payload)
releaseTime (seconds offset from run start)
meta.isTruePositive -- ground truth for scoring
meta.isEscalationRequired -- ground truth for escalation scoring
meta.classificationPoints, meta.escalationPoints, meta.evaluationPoints
meta.evaluationCriteria -- criteria for AI evaluation
meta.playbook -- linked playbook reference

8. Log Ingestion Pipeline

Log Scheduling Details

Splunk/Elastic: Each log becomes an AWS EventBridge one-time schedule that triggers a Lambda function at the log's release_time. The Lambda sends the log via HTTP to the SIEM VM's collector API. Logs are processed in batches of 400.
Sentinel: Logs are converted to CSV, uploaded to Azure Blob Storage in batches of 5,000, then a lab action triggers ingestion into the Sentinel workspace.
Timestamp Alignment: Log timestamps are recalculated relative to run.startedAt so that log events and alert times are consistent.
Scenario Speed: When scenarioSpeed > 1, the EventBridge schedule time is compressed to release_time / scenarioSpeed, but the timestamp embedded in the log keeps its real-time (uncompressed) offset so that it matches the alert details.
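
The batching and time-compression rules above can be sketched without any AWS calls. The names `chunk`, `planLog`, and `ScenarioLog` are hypothetical, and treating every speed of 1 or below as "no compression" is an assumption; the real service would go on to create an EventBridge schedule per planned log.

```typescript
interface ScenarioLog {
  release_time: number; // seconds after run start
  payload: string;
}

const SPLUNK_ELASTIC_BATCH_SIZE = 400;

// Split a list into consecutive batches of at most `size` items.
function chunk<T>(items: readonly T[], size: number): T[][] {
  if (!Number.isInteger(size) || size <= 0) {
    throw new RangeError("size must be a positive integer");
  }
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Hypothetical planner: when the schedule fires versus what timestamp the log carries.
function planLog(
  log: ScenarioLog,
  startedAt: Date,
  scenarioSpeed: number,
): { fireAt: Date; eventTimestamp: Date } {
  // ASSUMPTION: only speeds above 1 compress the schedule.
  const scheduleOffset =
    scenarioSpeed > 1 ? log.release_time / scenarioSpeed : log.release_time;
  return {
    fireAt: new Date(startedAt.getTime() + scheduleOffset * 1000),
    // The embedded timestamp keeps the uncompressed offset, aligned to run.startedAt.
    eventTimestamp: new Date(startedAt.getTime() + log.release_time * 1000),
  };
}
```

With 1,000 logs and the batch size of 400, `chunk` yields batches of 400, 400, and 200. At speed 2, a log with `release_time` 600 fires 5 minutes into the run but carries a timestamp 10 minutes after `run.startedAt`.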

9. Scoring & Evaluation

Scoring Breakdown

Category	Condition	Points
Classification	Correct TP or FP classification	`meta.classificationPoints` (from CMS)
Classification Penalty	Classified FP as TP	-10
Escalation	TP correctly escalated	`meta.escalationPoints` (default 10)
Evaluation	AI-scored writeup quality	Up to `meta.evaluationPoints`
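
To make the breakdown concrete, the sketch below scores a single alert. It is an illustration under assumptions, not the code in run-alerts.ts: `scoreAlert`, `AlertReport.classifiedAsTruePositive`, and `evaluationScore` are hypothetical names, a true positive misclassified as a false positive is assumed to score 0 because the table lists no penalty for it, and escalation points are assumed to be awarded only when escalation was both required and requested.

```typescript
interface AlertMeta {
  isTruePositive: boolean;
  isEscalationRequired: boolean;
  classificationPoints: number;
  escalationPoints?: number; // defaults to 10 when absent
  evaluationPoints: number;
}

interface AlertReport {
  classifiedAsTruePositive: boolean; // hypothetical field name
  requiresEscalation: boolean;
  evaluationScore?: number; // hypothetical: score returned by the AI evaluation
}

const FALSE_POSITIVE_AS_TP_PENALTY = -10;
const DEFAULT_ESCALATION_POINTS = 10;

function scoreAlert(meta: AlertMeta, report: AlertReport) {
  let classification = 0;
  if (report.classifiedAsTruePositive === meta.isTruePositive) {
    classification = meta.classificationPoints;
  } else if (!meta.isTruePositive && report.classifiedAsTruePositive) {
    classification = FALSE_POSITIVE_AS_TP_PENALTY;
  } // ASSUMPTION: a TP classified as FP scores 0

  const correctlyEscalated =
    meta.isTruePositive &&
    report.classifiedAsTruePositive &&
    meta.isEscalationRequired &&
    report.requiresEscalation;
  const escalation = correctlyEscalated
    ? meta.escalationPoints ?? DEFAULT_ESCALATION_POINTS
    : 0;

  // The evaluation score is capped at meta.evaluationPoints and never negative.
  const evaluation = Math.max(
    0,
    Math.min(report.evaluationScore ?? 0, meta.evaluationPoints),
  );

  return {
    classification,
    escalation,
    evaluation,
    total: classification + escalation + evaluation,
  };
}
```

For a true positive worth 20 classification points and up to 30 evaluation points, a correct classification with escalation and a full-marks writeup totals 20 + 10 + 30 = 60 under these assumptions.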

Completion Trigger

A run completes automatically when all true-positive alerts have been resolved (areTruePositivesResolved()). False-positive alerts do not need to be resolved for completion. Upon completion:

VMs are terminated asynchronously
Classification & escalation points are calculated
AI evaluation runs on all report writeups
Run stats computed (MTTR, FP rate, mean dwell time)
Certification section updated (if applicable)
Segment analytics event fired
Mission progress tracked
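
The completion check can be sketched as a predicate over the run's alerts. The function name comes from the text above, but this body is an assumption rather than the actual implementation; in particular, it considers every alert in the run, including ones not yet shown, which the real check may or may not do.

```typescript
interface CompletionAlert {
  status: "AWAITING_ACTION" | "IN_PROGRESS" | "RESOLVED";
  meta: { isTruePositive: boolean };
}

// Assumed shape of the check: every true-positive alert must be RESOLVED.
// False positives are ignored, whatever their status.
function areTruePositivesResolved(alerts: readonly CompletionAlert[]): boolean {
  return alerts
    .filter((alert) => alert.meta.isTruePositive)
    .every((alert) => alert.status === "RESOLVED");
}
```

Note that `every` returns true for an empty list, so under this sketch a scenario with no true positives would count as complete immediately; a real implementation would likely need to guard against that case.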

10. Real-Time Communication

The WebSocket layer is primarily used for multiplayer runs, where multiple analysts share the same alert queue. Events notify all participants of alert assignments and run status changes in real time.

11. Feature Flags (GrowthBook)

Flag	Type	Purpose
`soc-sim-vms-deploy`	Boolean	Enables real VM deployment (off = mock mode for local dev)
`soc-sim-pause-run`	Boolean	Enables pause/resume functionality
`soc-sim-playbooks`	Boolean	Enables playbook feature
`soc-sim-scenario-speed`	Number	Override scenario speed multiplier
`soc-sim-live-log-streaming`	Boolean	Live log streaming mode
`soc-sim-coach-marking`	Boolean	Tour/onboarding coach marks
`soc-sim-badges`	Boolean	Badge system
`soc-sim-siem-tool-modal`	Boolean	SIEM tool selection UI
`soc-sim-public-summary-page`	Boolean	Public shareable summary
`soc-sim-hacktivities`	Boolean	Hacktivities integration
`soc-sim-short-scenarios`	Boolean	Short scenario format
`verify-and-restore-socsim-analyst-vm`	Boolean	VM restore feature

12. Frontend User Flow

13. API Endpoints Summary

Public Routes (User-facing)

Method	Endpoint	Description
`GET`	`/soc-sim/runs/active`	Get active run data
`POST`	`/soc-sim/runs`	Create new run
`POST`	`/soc-sim/runs/pause`	Pause active run
`POST`	`/soc-sim/runs/resume`	Resume paused run
`POST`	`/soc-sim/runs/terminate`	Terminate run
`GET`	`/soc-sim/alerts`	Get paginated alerts (visible only)
`PUT`	`/soc-sim/alerts/assign`	Assign alert to user
`PUT`	`/soc-sim/alerts/unassign`	Unassign alert
`GET`	`/soc-sim/alerts/details`	Get alert details
`GET/POST/PUT/DELETE`	`/soc-sim/reports/*`	Case report CRUD
`PATCH`	`/soc-sim/vms/extend`	Extend VM time by 1 hour
`POST`	`/soc-sim/vms/refresh-url`	Refresh Guacamole connection URL
`GET`	`/soc-sim/content/scenarios`	List available scenarios
`GET`	`/soc-sim/content/playbooks`	Get playbooks for run
`GET`	`/soc-sim/stats`	User statistics
`GET`	`/soc-sim/leaderboard`	Leaderboard
`GET`	`/soc-sim/runs/summary`	Run summary data

Company Routes (Management Dashboard)

Method	Endpoint	Description
`GET`	`/companies/soc-sim/users`	Company SOC Sim users
`POST`	`/companies/soc-sim/assignment`	Create scenario assignment
`POST`	`/companies/soc-sim/stats/progression`	Team progression stats
`POST`	`/companies/soc-sim/stats`	Company SOC Sim stats
`GET`	`/companies/soc-sim/runs`	User runs for company
`POST`	`/companies/soc-sim/stats/rate`	Alert classification rate

14. Infrastructure Summary

Key infrastructure characteristics:

No dedicated IaC -- SOC Sim reuses the platform's shared VM infrastructure
VM Lifetime: 3 hours by default, extendable by 1 hour (at most once, only during the final hour, up to 6 hours total)
Health Check Polling: Every 3 seconds via Agenda.js until the VMs respond or a 30-minute timeout elapses
EventBridge Schedules: One-time schedules per log entry, cleaned up on run termination
Cost Center Tagging: All resources tagged CostCenter.SOC_SIM for billing attribution
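
The health check numbers above imply a fixed polling budget: 30 minutes at one check every 3 seconds is at most 600 checks. The sketch below expresses that arithmetic and the per-tick decision as pure functions. The names are hypothetical, and the real job is scheduled through Agenda.js rather than called directly like this.

```typescript
const HEALTH_CHECK_INTERVAL_MS = 3 * 1000; // every 3 seconds
const HEALTH_CHECK_TIMEOUT_MS = 30 * 60 * 1000; // 30-minute timeout

type HealthCheckDecision = "ready" | "retry" | "timed_out";

// Upper bound on the number of polls before the timeout is reached.
function maxHealthCheckAttempts(): number {
  return Math.floor(HEALTH_CHECK_TIMEOUT_MS / HEALTH_CHECK_INTERVAL_MS);
}

// Hypothetical per-tick decision: stop when healthy, give up after the timeout,
// otherwise poll again after the interval.
function nextHealthCheckStep(
  startedPollingAt: Date,
  now: Date,
  allVmsHealthy: boolean,
): HealthCheckDecision {
  if (allVmsHealthy) return "ready";
  const elapsedMs = now.getTime() - startedPollingAt.getTime();
  return elapsedMs >= HEALTH_CHECK_TIMEOUT_MS ? "timed_out" : "retry";
}
```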