Corporate Wellness Tracking Platform
Overview
A multi-tenant platform enabling companies to engage employees through gamified health challenges, digital health assessments, activity tracking via wearables, a points-based reward economy, and a partner benefit marketplace – serving both web and mobile clients, with full English/French localization.
| Attribute | Details |
|---|---|
| Domain | B2B Corporate Wellness & Health |
| Role | Tech Lead and System Architect – design and full-stack delivery |
| Cloud | AWS (Lambda, API Gateway, SQS, SSM, Amplify, VPC) |
| Stack | TypeScript, Node.js, React, MongoDB, Redis |
| Scale | 14 microservices, ~68 API endpoints, 33 data models, 5 external integrations |
Architecture
Technical Decisions
The decisions below are the ones I consider most consequential. Each shaped how the system behaves at scale and under failure.
Serverless in a VPC
| Decision | Why |
|---|---|
| AWS Lambda over containers | Zero-ops scaling for bursty B2B traffic patterns; pay-per-invocation cost model aligned with multi-tenant usage |
| All Lambdas in VPC private subnets | Enterprise compliance requirement – no direct internet exposure; outbound traffic routes through a NAT Gateway with a static IP that third parties can allowlist |
| SQS FIFO for financial transactions | Exactly-once, ordered processing for the points economy; decoupled write path survives Lambda cold starts |
Monorepo with Shared Packages
| Decision | Why |
|---|---|
| Lerna + Yarn Workspaces | 15 shared libraries + 14 service modules in one repo; code reuse without sacrificing independent deployability |
| Shared packages for cross-cutting concerns | Database layer (33 models), authorization (RBAC), token management, request validation – each service imports what it needs without duplicating logic |
| Per-service Serverless config | Each microservice deploys independently with its own IAM role, VPC binding, and environment; a faulty deploy or misconfigured permission stays contained to one service |
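To illustrate per-service configuration, here is a minimal `serverless.ts` sketch for a single service, assuming Serverless Framework v3 syntax (`provider.vpc`, `provider.iam.role.statements`, `${ssm:...}` variables). The service name, function, region, runtime, and SSM parameter paths are hypothetical placeholders, not the platform's actual values.

```typescript
// serverless.ts for one hypothetical service. All names and SSM paths are placeholders.
const serverlessConfiguration = {
  service: "points-worker",
  frameworkVersion: "3",
  provider: {
    name: "aws",
    runtime: "nodejs18.x",
    region: "eu-west-1",
    // Every function in this service runs in private subnets; outbound traffic
    // leaves through the VPC's NAT Gateway, so third parties see one static IP.
    vpc: {
      securityGroupIds: ["${ssm:/wellness/vpc/lambda-sg-id}"],
      subnetIds: [
        "${ssm:/wellness/vpc/private-subnet-a}",
        "${ssm:/wellness/vpc/private-subnet-b}",
      ],
    },
    // IAM statements for this service's role only; other services get their own.
    iam: {
      role: {
        statements: [
          {
            Effect: "Allow",
            Action: ["sqs:ReceiveMessage", "sqs:DeleteMessage", "sqs:GetQueueAttributes"],
            Resource: "${ssm:/wellness/sqs/points-queue-arn}",
          },
        ],
      },
    },
    // Per-service environment, resolved from SSM at deploy time.
    environment: {
      POINTS_QUEUE_URL: "${ssm:/wellness/sqs/points-queue-url}",
    },
  },
  functions: {
    applyTransaction: {
      handler: "src/handlers/applyTransaction.handler",
      events: [{ sqs: { arn: "${ssm:/wellness/sqs/points-queue-arn}", batchSize: 1 } }],
    },
  },
};

module.exports = serverlessConfiguration;
```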
Security by Design
| Decision | Why |
|---|---|
| Auth0 for identity | Delegated authentication for both end-users and admins; multi-tenant with separate database connections per client |
| Casbin RBAC with dynamic policies | Operators get static role-based policies; end-users get dynamically generated policies scoped to their own resources – no permissions database to maintain |
| Encrypted PII collection | Separate encrypted data model for personal information; application-level encryption on top of at-rest encryption |
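One common way to implement application-level field encryption is AES-256-GCM via Node's built-in `node:crypto` module. The sketch below assumes that cipher and a hypothetical `EncryptedField` storage shape; the caller supplies the 32-byte key (in production it would come from a managed secret store, which is an assumption here, not a detail from the platform).

```typescript
import { createCipheriv, createDecipheriv, randomBytes } from "node:crypto";

// Hypothetical storage shape for one encrypted PII field (all parts base64-encoded).
interface EncryptedField {
  iv: string;   // per-encryption nonce
  tag: string;  // GCM authentication tag
  data: string; // ciphertext
}

const ALGORITHM = "aes-256-gcm";

// Encrypts one field with a 32-byte key supplied by the caller.
function encryptField(plaintext: string, key: Buffer): EncryptedField {
  // GCM needs a unique IV per encryption under the same key; 12 bytes is the recommended length.
  const iv = randomBytes(12);
  const cipher = createCipheriv(ALGORITHM, key, iv);
  const data = Buffer.concat([cipher.update(plaintext, "utf8"), cipher.final()]);
  return {
    iv: iv.toString("base64"),
    tag: cipher.getAuthTag().toString("base64"),
    data: data.toString("base64"),
  };
}

// Decrypts and authenticates; final() throws if the ciphertext, IV, or tag was altered.
function decryptField(field: EncryptedField, key: Buffer): string {
  const decipher = createDecipheriv(ALGORITHM, key, Buffer.from(field.iv, "base64"));
  decipher.setAuthTag(Buffer.from(field.tag, "base64"));
  const plain = Buffer.concat([
    decipher.update(Buffer.from(field.data, "base64")),
    decipher.final(),
  ]);
  return plain.toString("utf8");
}
```

Because GCM authenticates as well as encrypts, a tampered record fails loudly on read instead of decrypting to garbage.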
System Design Deep Dives
1. Points Economy – Async Transaction Processing
The most architecturally interesting subsystem. Users earn points through various actions (completing challenges, tracking activity, reading articles). The system must guarantee no double-rewards and no balance inconsistency.
Design rationale:
- Two-phase via queues rather than a single synchronous write – if the balance update fails, the transaction stays PENDING and can be retried without re-issuing the reward
- FIFO deduplication prevents double-rewards from retries or duplicate API calls that arrive within SQS's 5-minute deduplication interval
- Decoupled workers mean the user gets a fast 202 response; the heavy work happens asynchronously
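The two-phase flow can be sketched with in-memory stand-ins for the SQS FIFO queue and the MongoDB collections. Everything here (`awardPoints`, `processNext`, `FakeFifoQueue`, the `userId:actionId` transaction id) is hypothetical and assumes each reward-eligible action has a unique `actionId`; it illustrates the pattern, not the platform's code.

```typescript
type TxStatus = "PENDING" | "RESOLVED";

interface PointsTransaction {
  id: string;
  userId: string;
  points: number;
  status: TxStatus;
}

interface QueueMessage {
  groupId: string; // maps to SQS MessageGroupId: ordering is preserved per user
  dedupId: string; // maps to SQS MessageDeduplicationId
  body: { transactionId: string };
}

const DEDUP_WINDOW_MS = 5 * 60 * 1000; // SQS FIFO deduplication interval

// Hypothetical in-memory stand-in for an SQS FIFO queue.
class FakeFifoQueue {
  private messages: QueueMessage[] = [];
  private seen = new Map<string, number>(); // dedupId -> time first accepted

  send(msg: QueueMessage, now = Date.now()): boolean {
    const firstSeen = this.seen.get(msg.dedupId);
    if (firstSeen !== undefined && now - firstSeen < DEDUP_WINDOW_MS) {
      return false; // duplicate inside the window: accepted but never delivered
    }
    this.seen.set(msg.dedupId, now);
    this.messages.push(msg);
    return true;
  }

  receive(): QueueMessage | undefined {
    return this.messages.shift();
  }
}

const transactions = new Map<string, PointsTransaction>();
const balances = new Map<string, number>();
const queue = new FakeFifoQueue();

// Phase 1 (API Lambda): record a PENDING transaction, enqueue it, answer 202.
function awardPoints(userId: string, actionId: string, points: number): number {
  const id = `${userId}:${actionId}`; // deterministic id doubles as the dedup key
  if (!transactions.has(id)) {
    transactions.set(id, { id, userId, points, status: "PENDING" });
  }
  queue.send({ groupId: userId, dedupId: id, body: { transactionId: id } });
  return 202;
}

// Phase 2 (worker Lambda): apply the balance change once, then mark RESOLVED.
function processNext(): void {
  const msg = queue.receive();
  if (!msg) return;
  const tx = transactions.get(msg.body.transactionId);
  if (!tx || tx.status === "RESOLVED") return; // redelivery of an applied tx is a no-op
  // Against MongoDB, the increment and the status change must be atomic (or the
  // increment keyed on the transaction id) so a crash between them can't double-apply.
  balances.set(tx.userId, (balances.get(tx.userId) ?? 0) + tx.points);
  tx.status = "RESOLVED";
}
```

Against real SQS, `send` corresponds to the AWS SDK v3 `SendMessageCommand` with `MessageGroupId` and `MessageDeduplicationId` set; the status check in the worker covers redeliveries that fall outside the deduplication interval.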
2. Authorization – Dynamic Policy Generation
Rather than maintaining a permissions database, I designed the authorization layer to generate access policies on the fly from the caller's JWT.
Why this approach: Adding a new user requires zero permission configuration. Their access scope is derived entirely from their identity token. Operator roles are defined declaratively as policy rules – adding a new admin role is a config change, not a code change.
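A minimal sketch of token-derived policies, assuming hypothetical `tenantId` and `roles` claims, hypothetical role names and URL paths, and a hand-rolled matcher standing in for Casbin's enforcer. Rules are shaped like Casbin `p, sub, obj, act` lines, with `*` path wildcards and regex actions in the spirit of Casbin's `keyMatch` and `regexMatch` functions.

```typescript
// Hypothetical claims read from an already-verified Auth0 JWT.
interface TokenClaims {
  sub: string;      // user id
  tenantId: string; // hypothetical custom claim
  roles: string[];  // empty for end-users
}

// One rule, shaped like a Casbin "p, sub, obj, act" policy line.
type PolicyRule = [sub: string, obj: string, act: string];

// Static operator policies: a new admin role is a new entry here, not new code.
const OPERATOR_POLICIES: Partial<Record<string, Array<[obj: string, act: string]>>> = {
  "tenant-admin": [["/tenants/:tenantId/*", "(GET|POST|PUT|DELETE)"]],
  support: [["/tenants/:tenantId/users/*", "GET"]],
};

function generatePolicies(claims: TokenClaims): PolicyRule[] {
  const bind = (obj: string) =>
    obj.replace(":tenantId", claims.tenantId).replace(":userId", claims.sub);

  if (claims.roles.length > 0) {
    const rules: PolicyRule[] = [];
    for (const role of claims.roles) {
      for (const [obj, act] of OPERATOR_POLICIES[role] ?? []) {
        rules.push([claims.sub, bind(obj), act]);
      }
    }
    return rules;
  }
  // End-users: policies scoped to their own resources only.
  return [
    [claims.sub, bind("/users/:userId"), "(GET|PUT)"],
    [claims.sub, bind("/users/:userId/*"), "(GET|POST)"],
  ];
}

// Simplified keyMatch-style check: "*" matches any suffix, everything else is literal.
function pathMatches(pattern: string, path: string): boolean {
  const escaped = pattern
    .split("*")
    .map((part) => part.replace(/[.+?^${}()|[\]\\]/g, "\\$&"))
    .join(".*");
  return new RegExp(`^${escaped}$`).test(path);
}

function enforce(claims: TokenClaims, path: string, method: string): boolean {
  return generatePolicies(claims).some(
    ([sub, obj, act]) =>
      sub === claims.sub && pathMatches(obj, path) && new RegExp(`^${act}$`).test(method),
  );
}
```

Nothing is persisted: the rule set exists only for the duration of the request, which is what removes per-user permission setup.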
3. Challenge Engine – Unified Model, Multiple Lifecycles
Three distinct challenge types (individual, team, and company-wide collaborative) share a single data model but follow type-specific state machines.
The design tradeoff: a single unified model means simpler queries and shared progression logic, but requires careful validation to enforce type-specific rules. I extracted the state machines and their validation into a dedicated shared package so every service that touches challenges applies the same rules.
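One way to express "one model, several lifecycles" is a shared base transition table with per-type overrides. The states and per-type rules below (a team-formation step for team challenges, no enrollment phase for collaborative ones) are illustrative assumptions, not the platform's actual lifecycle.

```typescript
type ChallengeType = "individual" | "team" | "collaborative";
type ChallengeState =
  | "DRAFT" | "ENROLLING" | "TEAM_FORMATION" | "ACTIVE" | "COMPLETED" | "CANCELLED";

// One shared data model for all three types.
interface Challenge {
  id: string;
  type: ChallengeType;
  state: ChallengeState;
}

type TransitionTable = Partial<Record<ChallengeState, ChallengeState[]>>;

// Transitions every type shares.
const BASE: TransitionTable = {
  DRAFT: ["ENROLLING", "CANCELLED"],
  ENROLLING: ["ACTIVE", "CANCELLED"],
  ACTIVE: ["COMPLETED", "CANCELLED"],
};

// Type-specific rules layered on top (illustrative only).
const TRANSITIONS: Record<ChallengeType, TransitionTable> = {
  individual: BASE,
  // Team challenges insert a team-formation step between enrollment and start.
  team: {
    ...BASE,
    ENROLLING: ["TEAM_FORMATION", "CANCELLED"],
    TEAM_FORMATION: ["ACTIVE", "CANCELLED"],
  },
  // Company-wide challenges skip enrollment and go straight to ACTIVE.
  collaborative: { ...BASE, DRAFT: ["ACTIVE", "CANCELLED"] },
};

function canTransition(type: ChallengeType, from: ChallengeState, to: ChallengeState): boolean {
  return TRANSITIONS[type][from]?.includes(to) ?? false;
}

// Returns a new object so callers can persist only validated state changes.
function transition(challenge: Challenge, to: ChallengeState): Challenge {
  if (!canTransition(challenge.type, challenge.state, to)) {
    throw new Error(`${challenge.type} challenge cannot go from ${challenge.state} to ${to}`);
  }
  return { ...challenge, state: to };
}
```

Under this layout, adding a fourth challenge type means adding one entry to `TRANSITIONS`; queries and progression code keep working against the shared `Challenge` shape.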
Request Handler Pattern
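Every endpoint across all 14 services follows the same pipeline: authentication, authorization, request validation, business logic, and centralized error handling through a shared response formatter. This was a deliberate architectural choice to enforce consistency.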
Why it matters: New services or endpoints added by any team member follow the same auth, validation, and error-handling path. No one can accidentally skip authorization. The shared response formatter guarantees consistent CORS headers and status codes.
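A minimal sketch of such a pipeline as a higher-order handler factory, assuming API Gateway REST proxy events (typed inline rather than via `@types/aws-lambda`). `createHandler`, `HttpError`, and the step names are hypothetical; in the real services each step would come from the shared packages (JWT verification, Casbin, Yup), and the wildcard CORS origin is a placeholder.

```typescript
// Minimal API Gateway proxy shapes, declared inline to keep the sketch self-contained.
interface ApiEvent {
  httpMethod: string;
  path: string;
  headers: Record<string, string | undefined>;
  body: string | null;
}
interface ApiResult {
  statusCode: number;
  headers: Record<string, string>;
  body: string;
}
interface Identity {
  sub: string;
}

class HttpError extends Error {
  constructor(public readonly statusCode: number, message: string) {
    super(message);
    Object.setPrototypeOf(this, HttpError.prototype); // keep instanceof working on older targets
  }
}

// Each step is pluggable; an endpoint cannot be built without all of them.
interface Pipeline<TBody, TResult> {
  authenticate: (event: ApiEvent) => Identity;                 // e.g. verify the JWT
  authorize: (identity: Identity, event: ApiEvent) => boolean; // e.g. Casbin enforce
  validate: (raw: unknown) => TBody;                           // e.g. a Yup schema
  handle: (body: TBody, identity: Identity) => Promise<TResult>;
}

// Shared response formatter: one place for CORS headers and status codes.
function respond(statusCode: number, payload: unknown): ApiResult {
  return {
    statusCode,
    headers: { "Content-Type": "application/json", "Access-Control-Allow-Origin": "*" },
    body: JSON.stringify(payload),
  };
}

function createHandler<TBody, TResult>(p: Pipeline<TBody, TResult>) {
  return async (event: ApiEvent): Promise<ApiResult> => {
    try {
      const identity = p.authenticate(event);
      if (!p.authorize(identity, event)) throw new HttpError(403, "Forbidden");
      let raw: unknown;
      try {
        raw = event.body ? JSON.parse(event.body) : {};
      } catch {
        throw new HttpError(400, "Malformed JSON body");
      }
      return respond(200, await p.handle(p.validate(raw), identity));
    } catch (err) {
      if (err instanceof HttpError) return respond(err.statusCode, { message: err.message });
      return respond(500, { message: "Internal Server Error" }); // never leak internals
    }
  };
}
```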
Key Engineering Challenges
| Problem | Approach | Result |
|---|---|---|
| Financial consistency in a distributed points system | Two-phase async processing with SQS FIFO; PENDING → RESOLVED transaction states | Zero balance inconsistencies; fast user-facing response |
| Multi-tenant authorization without a permissions DB | Dynamic Casbin policies generated from identity tokens | Instant policy changes; zero per-user configuration |
| Cold-start latency in VPC Lambdas | MongoDB connection pooling; provisioned concurrency for critical paths | Met enterprise security requirements without sacrificing UX |
| Three challenge types, one codebase | Unified data model with type-specific state machines in a shared package | Consistent API surface; new challenge types require only new state rules |
| 15 shared packages across 14 services | Lerna monorepo with Yarn Workspaces; clear package boundaries | Code reuse without deployment coupling |
| 5 external enterprise integrations | Dedicated integration package per vendor with standardized error handling | Clean abstraction; swapping a vendor touches one package |
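The usual Lambda technique behind "connection pooling" is to cache the connection promise in module scope so warm invocations reuse it. A minimal sketch with a hypothetical `createConnectionCache` helper; with Mongoose, `connect` would be something like `() => mongoose.connect(uri)`, and handlers often also set `context.callbackWaitsForEmptyEventLoop = false` so the open socket doesn't keep the invocation alive.

```typescript
// Hypothetical helper: memoizes a connection promise in module scope.
// A warm Lambda container keeps module state between invocations, so
// connect() runs once per container instead of once per request.
function createConnectionCache<T>(connect: () => Promise<T>): () => Promise<T> {
  let cached: Promise<T> | null = null;
  return () => {
    if (!cached) {
      cached = connect().catch((err: unknown) => {
        // Forget a failed attempt so the next call retries instead of
        // handing every later invocation the same rejected promise.
        cached = null;
        throw err;
      });
    }
    return cached;
  };
}
```

Caching the promise rather than the resolved connection also means concurrent callers during a cold start share one in-flight connect instead of opening several.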
Technology Stack
| Area | Technologies |
|---|---|
| Backend | TypeScript, Node.js, Serverless Framework, MongoDB (Mongoose), Redis, Yup, Jest |
| Frontend | TypeScript, React, Redux Toolkit, React Query, Material-UI, Formik, i18next |
| Cloud | AWS Lambda (VPC), API Gateway, SQS FIFO, SSM, Amplify, NAT Gateway |
| Auth | Auth0 (identity), Casbin (RBAC), JWT, application-level encryption |
| Integrations | Salesforce (CRM), Validic (health data), WordPress (CMS), Auth0 (IdP) |
| Quality | SonarQube, ESLint, OpenAPI 3.0 documentation |
Retrospective – What I Would Do Differently
| Area | Original approach | What I’d change | Why |
|---|---|---|---|
| Observability | Basic logging | Distributed tracing (X-Ray) + structured logging + dashboards | 14 services across queues – debugging without traces is painful |
| Transaction integrity | App-level two-phase via SQS | MongoDB multi-document transactions for the synchronous path | Reduces complexity for cases that don’t need async |
| Scheduled jobs | Manual deployment (cron workaround) | EventBridge Scheduler | The Serverless Framework cron bug was worked around rather than solved |
| Frontend build | CRA (Create React App) | Vite or Next.js | Faster builds, better DX, SSR options for SEO-relevant pages |
| Secret management | SSM (with some legacy hardcoded dev values) | SSM-only + pre-commit secret scanning (e.g., gitleaks) | Defense in depth; no secrets should ever touch source control |