Smart Sleep Pod Platform
Overview
A full-stack IoT platform connecting smart sleep pods to a digital booking ecosystem. Users discover nearby pods on a map, book a time slot, pay with credits, and unlock the pod with a PIN code. Operators manage their fleet, track revenue, and monitor pod health from a backoffice dashboard. Think of it as Airbnb meets WeWork — but for 30-minute naps, powered by IoT.
| Domain | IoT + Hospitality Tech (B2C + B2B) |
|---|---|
| Role | Senior Software Engineer & Technical Lead |
| Team | 4 Engineers (2 Backend, 1 Mobile, 1 Frontend) |
| Stack | Node.js, Express, MongoDB, Redis, RabbitMQ, React, React Native, MQTT (target-state design) |
| Scale | 6 domain modules, 16 data models, real-time IoT events, 3 external integrations |
System Architecture
The platform is composed of three applications (a backend API, a backoffice dashboard, and a mobile app) and a supporting infrastructure layer, all designed by me from the ground up.
Architecture Decisions
| Decision | Rationale |
|---|---|
| Modular monolith over microservices | Small team, fast iteration — modules are independently deployable later if needed |
| RabbitMQ for async workflows | Decouples booking → billing → notification chains; enables retry semantics |
| Socket.io + Redis adapter | Real-time updates to clients with horizontal scaling support |
| Cookie-based sessions over JWT | Server-side session invalidation; simpler revocation for a security-sensitive domain |
| Geospatial (2dsphere) indexes | Index-backed proximity search across the entire pod fleet, with no full collection scan per query |
| PIN-code based access | No BLE/NFC dependency on user devices — works with any phone |
Backend — Modular Monolith
I designed the backend as a module-per-domain architecture. Each module owns its models, controllers, permissions, and message queue bindings. This gave us the isolation benefits of microservices with the operational simplicity of a monolith.
Module Internal Structure
Every module follows the same convention, which I defined as the team’s standard:
modules/<name>/
├── models/ # Mongoose schemas (domain entities)
├── controllers/ # Business logic (Express handlers)
├── iam/ # Routes + permission declarations
├── rmq/ # RabbitMQ exchange & queue bindings
├── helpers/ # Module-specific utilities
├── sockets/ # Real-time event handlers
├── schemas/ # AJV validation schemas
└── i18n/ # Translation files (en, pt, fr)
This pattern made onboarding fast — any engineer could understand a new module in minutes.
Data Model
16 domain entities across 6 bounded contexts, with geospatial indexing and event-sourced IoT telemetry.
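As an illustration of the geospatial indexing mentioned above, the sketch below builds a MongoDB proximity filter of the kind a pod-discovery endpoint might run. The `buildNearbyPodsFilter` helper, the `location` field name, and the 2 km default radius are assumptions for illustration, not details of the actual schema.

```javascript
// Hypothetical helper: builds a MongoDB filter for "pods near me".
// Assumes each pod document stores a GeoJSON Point in a `location` field
// covered by a 2dsphere index (the field name is an assumption).
function buildNearbyPodsFilter(longitude, latitude, maxDistanceMeters = 2000) {
  if (longitude < -180 || longitude > 180) throw new RangeError('longitude out of range');
  if (latitude < -90 || latitude > 90) throw new RangeError('latitude out of range');
  return {
    location: {
      $near: {
        // GeoJSON order is [longitude, latitude], not [latitude, longitude].
        $geometry: { type: 'Point', coordinates: [longitude, latitude] },
        $maxDistance: maxDistanceMeters, // meters when querying GeoJSON points
      },
    },
  };
}

const filter = buildNearbyPodsFilter(-9.1393, 38.7223, 1500);
console.log(JSON.stringify(filter, null, 2));
```

With Mongoose, a filter like this could be passed to a model's `find()`; `$near` returns matches sorted from nearest to farthest, which suits a "pods around me" map view.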
Booking Lifecycle
This is the most complex flow in the system: it spans the mobile app, backend API, billing system, message queue, and IoT hardware. I designed it as an event-driven pipeline to keep each concern isolated.
Booking State Machine
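A minimal sketch of how the booking states could be encoded as a transition table. Only INPROGRESS and COMPLETED are named in this document; PENDING, CONFIRMED, and CANCELLED are assumed state names, and the triggers reuse the platform's own event names (TRANSACTION.COMPLETED, TRANSACTION.ERROR, BOOKING.USERCANCELLED, OPEN_DOOR, CLOSE_DOOR).

```javascript
// Transition table: state -> { trigger -> next state }.
// PENDING, CONFIRMED and CANCELLED are assumed names for illustration.
const TRANSITIONS = {
  PENDING: {
    'TRANSACTION.COMPLETED': 'CONFIRMED',
    'TRANSACTION.ERROR': 'CANCELLED',
    'BOOKING.USERCANCELLED': 'CANCELLED',
  },
  CONFIRMED: {
    OPEN_DOOR: 'INPROGRESS',
    'BOOKING.USERCANCELLED': 'CANCELLED',
  },
  INPROGRESS: { CLOSE_DOOR: 'COMPLETED' },
  COMPLETED: {}, // terminal
  CANCELLED: {}, // terminal
};

function nextState(current, trigger) {
  const allowed = TRANSITIONS[current];
  if (!allowed) throw new Error(`Unknown state: ${current}`);
  const next = allowed[trigger];
  if (!next) throw new Error(`Illegal transition: ${current} on ${trigger}`);
  return next;
}

// Happy path: payment succeeds, user opens the pod, door closes at the end.
let state = 'PENDING';
for (const trigger of ['TRANSACTION.COMPLETED', 'OPEN_DOOR', 'CLOSE_DOOR']) {
  state = nextState(state, trigger);
}
console.log(state); // COMPLETED
```

Keeping the transitions in one table means an out-of-order event (for example, OPEN_DOOR on a cancelled booking) fails loudly instead of silently overwriting the status.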
Event-Driven Architecture (RabbitMQ)
I chose RabbitMQ to decouple the booking, billing, and notification domains. This allowed us to handle payment failures gracefully — a failed transaction triggers a compensating event rather than a cascading error.
Event Catalog
| Event | Flow | Purpose |
|---|---|---|
| BOOKING.NEW | Bookings → Billing | Initiate payment for new booking |
| BOOKING.USERCANCELLED | Bookings → Billing | Trigger refund |
| TRANSACTION.COMPLETED | Billing → Bookings | Confirm booking after successful payment |
| TRANSACTION.ERROR | Billing → Bookings | Cancel booking on payment failure |
| PURCHASE.CREDITS | Billing → Balance | Credit user account |
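To illustrate the compensating-event idea, the sketch below wires the booking and billing handlers through an in-memory stand-in for the broker. The `createBus` helper, the payload fields, and the PENDING/CONFIRMED/CANCELLED status names are assumptions; in the real system these routing keys travel through RabbitMQ rather than synchronous function calls.

```javascript
// In-memory stand-in for the broker (the real system uses RabbitMQ).
function createBus() {
  const handlers = new Map(); // routingKey -> [handler]
  return {
    subscribe(routingKey, handler) {
      if (!handlers.has(routingKey)) handlers.set(routingKey, []);
      handlers.get(routingKey).push(handler);
    },
    publish(routingKey, payload) {
      for (const handler of handlers.get(routingKey) || []) handler(payload);
    },
  };
}

const bus = createBus();
const bookings = new Map(); // bookingId -> { status }

// Billing: attempt the charge, then report the outcome as an event.
bus.subscribe('BOOKING.NEW', ({ bookingId, credits, balance }) => {
  const outcome = balance >= credits ? 'TRANSACTION.COMPLETED' : 'TRANSACTION.ERROR';
  bus.publish(outcome, { bookingId });
});

// Bookings: confirm on success, compensate (cancel) on failure.
bus.subscribe('TRANSACTION.COMPLETED', ({ bookingId }) => {
  bookings.get(bookingId).status = 'CONFIRMED';
});
bus.subscribe('TRANSACTION.ERROR', ({ bookingId }) => {
  bookings.get(bookingId).status = 'CANCELLED';
});

function createBooking(bookingId, credits, balance) {
  bookings.set(bookingId, { status: 'PENDING' });
  bus.publish('BOOKING.NEW', { bookingId, credits, balance });
  return bookings.get(bookingId).status;
}

console.log(createBooking('b1', 10, 25)); // CONFIRMED
console.log(createBooking('b2', 10, 3));  // CANCELLED
```

The booking module never calls billing directly: a payment failure arrives as just another event, so it is handled by an ordinary subscriber rather than by an exception propagating across modules.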
IoT Integration
Each physical pod contains a microcontroller that communicates with the platform. I designed the IoT layer to ingest device events and correlate them with active bookings in real time.
IoT Event Types
| Event | Meaning | Platform Action |
|---|---|---|
| WRONG_CODE | Invalid PIN entered | Log, potential lockout after N attempts |
| OPEN_DOOR | Valid PIN → door opens | Transition booking → INPROGRESS |
| SESSION_START | User occupies pod | Start session timer |
| SESSION_END | Timer expires / user exits | Prepare completion |
| CLOSE_DOOR | Door shuts | Finalize booking → COMPLETED |
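The platform actions in the table can be sketched as a single event handler. This is a simplified in-memory version under stated assumptions: the lockout threshold of 3 stands in for the unspecified "N attempts", each pod is assumed to start with one CONFIRMED booking, and `createPodTracker` is a hypothetical name.

```javascript
const MAX_WRONG_CODES = 3; // assumed value for "N attempts"

// Hypothetical in-memory tracker; the real system would look up the
// active booking for the pod in the database.
function createPodTracker() {
  const pods = new Map(); // podId -> { wrongCodes, lockedOut, bookingStatus }

  function getPod(podId) {
    if (!pods.has(podId)) {
      // Assumption: every pod starts with one confirmed booking.
      pods.set(podId, { wrongCodes: 0, lockedOut: false, bookingStatus: 'CONFIRMED' });
    }
    return pods.get(podId);
  }

  return function handleDeviceEvent({ podId, type }) {
    const pod = getPod(podId);
    switch (type) {
      case 'WRONG_CODE':
        pod.wrongCodes += 1;
        if (pod.wrongCodes >= MAX_WRONG_CODES) pod.lockedOut = true;
        break;
      case 'OPEN_DOOR':
        pod.wrongCodes = 0; // a valid PIN resets the counter
        pod.bookingStatus = 'INPROGRESS';
        break;
      case 'CLOSE_DOOR':
        if (pod.bookingStatus === 'INPROGRESS') pod.bookingStatus = 'COMPLETED';
        break;
      case 'SESSION_START':
      case 'SESSION_END':
        break; // session timer handling is omitted in this sketch
      default:
        throw new Error(`Unknown device event: ${type}`);
    }
    return { ...pod }; // return a copy so callers cannot mutate tracker state
  };
}

const handleDeviceEvent = createPodTracker();
handleDeviceEvent({ podId: 'pod-1', type: 'OPEN_DOOR' });
const finished = handleDeviceEvent({ podId: 'pod-1', type: 'CLOSE_DOOR' });
console.log(finished.bookingStatus); // COMPLETED
```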
Authentication & Authorization
I implemented a hierarchical IAM (Identity and Access Management) system that binds permissions to roles and attaches them to route definitions. This means adding a new API endpoint automatically enforces its access policy, with no manual middleware wiring.
Permission format: modules:<domain>:<resource>:<action> — e.g., modules:cochilo-iot:devices:create.
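A rough sketch of how role inheritance and route-level permission declarations might fit together. The three role names, the `/devices` routes, and every permission string other than `modules:cochilo-iot:devices:create` are hypothetical; the real system has five role tiers that are not listed here.

```javascript
// Hypothetical role hierarchy: each role inherits everything its parent has.
const ROLES = {
  user: { inherits: null, permissions: ['modules:bookings:bookings:create'] },
  operator: { inherits: 'user', permissions: ['modules:cochilo-iot:devices:read'] },
  admin: { inherits: 'operator', permissions: ['modules:cochilo-iot:devices:create'] },
};

// Walk up the inheritance chain and collect every granted permission.
function permissionsFor(roleName) {
  const granted = new Set();
  let name = roleName;
  while (name) {
    const role = ROLES[name];
    if (!role) throw new Error(`Unknown role: ${name}`);
    role.permissions.forEach((permission) => granted.add(permission));
    name = role.inherits;
  }
  return granted;
}

// Routes declare the permission they require; access checks are derived
// from the declaration instead of being wired per endpoint.
const routes = [
  { method: 'POST', path: '/devices', permission: 'modules:cochilo-iot:devices:create' },
  { method: 'GET', path: '/devices', permission: 'modules:cochilo-iot:devices:read' },
];

function canAccess(roleName, method, path) {
  const route = routes.find((r) => r.method === method && r.path === path);
  if (!route) return false; // undeclared routes are denied by default
  return permissionsFor(roleName).has(route.permission);
}

console.log(canAccess('admin', 'POST', '/devices'));    // true
console.log(canAccess('operator', 'POST', '/devices')); // false
```

In an Express application, the same check would typically live in one middleware generated from the route table, so a new route only needs its `permission` field filled in.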
Infrastructure Design
What I Would Deploy in Production
I designed the target-state cloud architecture for production readiness, even though the initial deployment was Docker on VMs.
Scaling Projections
| Dimension | 100 Pods | 1,000 Pods | 10,000 Pods |
|---|---|---|---|
| MQTT connections | 100 | 1,000 | 10,000 |
| Events/day | 1,500 | 15,000 | 150,000 |
| Telemetry msgs/min | 200 | 2,000 | 20,000 |
| API instances | 1–2 | 2–4 | 4–8 |
| DB read replicas | 0 | 1 | 2–3 |
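The projections scale linearly with fleet size, which implies per-pod rates of 15 events/day (1,500 / 100) and 2 telemetry messages/min (200 / 100). The short calculation below reproduces the table rows from those rates and converts them to average per-second load; the `project` function is illustrative, and real traffic would likely be burstier than these averages.

```javascript
const EVENTS_PER_POD_PER_DAY = 15;   // 1,500 events/day divided by 100 pods
const TELEMETRY_PER_POD_PER_MIN = 2; // 200 msgs/min divided by 100 pods
const SECONDS_PER_DAY = 24 * 60 * 60;

function project(pods) {
  const eventsPerDay = pods * EVENTS_PER_POD_PER_DAY;
  const telemetryPerMin = pods * TELEMETRY_PER_POD_PER_MIN;
  return {
    mqttConnections: pods, // one persistent connection per pod
    eventsPerDay,
    telemetryPerMin,
    avgEventsPerSec: eventsPerDay / SECONDS_PER_DAY,
    avgTelemetryPerSec: telemetryPerMin / 60,
  };
}

for (const pods of [100, 1000, 10000]) {
  console.log(pods, project(pods));
}
```

At 10,000 pods, telemetry averages roughly 333 messages per second while booking-critical events average under 2 per second, which is the asymmetry that motivates treating the two streams differently.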
IoT at Scale — Hot/Warm/Cold Pipeline
For 10,000+ pods, I designed a tiered ingestion pipeline that separates critical events from telemetry, so that high-volume sensor data cannot delay booking-critical events such as door unlocks.
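One possible reading of the hot/warm/cold split, sketched in memory. The tier definitions are assumptions: hot holds the booking-critical device events, warm holds telemetry that is aggregated before storage, and cold receives everything else for archival. The `kind` field on messages and the `createPipeline` helper are also hypothetical.

```javascript
// Assumed tiering: hot = booking-critical events, warm = telemetry,
// cold = anything else, kept only for archival.
const HOT_EVENTS = new Set(['WRONG_CODE', 'OPEN_DOOR', 'SESSION_START', 'SESSION_END', 'CLOSE_DOOR']);

function classify(message) {
  if (message.kind === 'event' && HOT_EVENTS.has(message.type)) return 'hot';
  if (message.kind === 'telemetry') return 'warm';
  return 'cold';
}

function createPipeline() {
  const tiers = { hot: [], warm: [], cold: [] };
  return {
    ingest(message) {
      const tier = classify(message);
      tiers[tier].push(message);
      return tier;
    },
    // Warm tier: collapse raw telemetry readings into one average per pod.
    summarizeWarm() {
      const sums = new Map(); // podId -> { total, count }
      for (const { podId, value } of tiers.warm) {
        const sum = sums.get(podId) || { total: 0, count: 0 };
        sum.total += value;
        sum.count += 1;
        sums.set(podId, sum);
      }
      return [...sums].map(([podId, sum]) => ({ podId, avg: sum.total / sum.count }));
    },
    sizes() {
      return { hot: tiers.hot.length, warm: tiers.warm.length, cold: tiers.cold.length };
    },
  };
}

const pipeline = createPipeline();
pipeline.ingest({ kind: 'event', type: 'OPEN_DOOR', podId: 'pod-1' });
pipeline.ingest({ kind: 'telemetry', podId: 'pod-1', value: 20 });
pipeline.ingest({ kind: 'telemetry', podId: 'pod-1', value: 22 });
pipeline.ingest({ kind: 'diagnostic', podId: 'pod-1' });
console.log(pipeline.sizes(), pipeline.summarizeWarm());
```

In a deployed version each tier would map to separate infrastructure (for example, a low-latency queue for hot events and batch storage for cold data); the routing decision itself stays this small.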
Tech Stack
| Layer | Technology |
|---|---|
| Backend | Node.js, Express, MongoDB, Mongoose, RabbitMQ, Socket.io, Redis, Passport.js |
| Backoffice | React, Redux, Redux-Saga, Bootstrap, FullCalendar, Axios |
| Mobile | React Native, Redux, Redux-Saga, React Navigation, Google Maps, Geolocation |
| Infrastructure | Docker, PM2, GitLab CI/CD, designed for AWS (ECS, DocumentDB, IoT Core) |
| External Services | PagSeguro (payments), SendGrid (email), Twilio (SMS) |
Retrospective — What I Would Do Differently
Building this system end-to-end taught me a lot. Here’s what I’d change with the benefit of hindsight:
1. MQTT from Day One, Not REST for IoT
What we did: IoT devices push events via REST (POST /devices/:id/events).
The problem: REST is request-response — no server-initiated commands, no persistent connection for health monitoring, higher latency for time-sensitive events like door unlocks.
What I’d do now: Start with MQTT (via AWS IoT Core or EMQX) from the beginning. Use topics like cochilo/cabine/{id}/events for device → platform and cochilo/cabine/{id}/commands for platform → device. This gives us bidirectional communication, connection state awareness (Last Will & Testament), and QoS guarantees.
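A small sketch of the topic scheme described above: helpers that build the per-pod topics, a subscription filter covering every pod, and a parser for incoming topic names. The helper names are hypothetical; the `+` single-level wildcard is standard MQTT.

```javascript
const TOPIC_PREFIX = 'cochilo/cabine';

// Device -> platform
function eventsTopic(podId) {
  return `${TOPIC_PREFIX}/${podId}/events`;
}

// Platform -> device
function commandsTopic(podId) {
  return `${TOPIC_PREFIX}/${podId}/commands`;
}

// '+' is MQTT's single-level wildcard: this filter matches the events
// topic of every pod, so one backend subscription covers the fleet.
const ALL_EVENTS_FILTER = `${TOPIC_PREFIX}/+/events`;

// Extract the pod ID and channel from a concrete topic name.
// Returns null for anything outside the scheme.
function parseTopic(topic) {
  const match = /^cochilo\/cabine\/([^/+#]+)\/(events|commands)$/.exec(topic);
  if (!match) return null;
  return { podId: match[1], channel: match[2] };
}

console.log(eventsTopic('pod-42'));               // cochilo/cabine/pod-42/events
console.log(parseTopic(commandsTopic('pod-42'))); // { podId: 'pod-42', channel: 'commands' }
console.log(ALL_EVENTS_FILTER);                   // cochilo/cabine/+/events
```

Separating `events` and `commands` per pod also makes broker-side access control simpler: a device can be allowed to publish only to its own events topic and subscribe only to its own commands topic.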
2. TypeScript Everywhere
What we did: Backend and mobile are plain JavaScript, with JSDoc type hints in only a few areas.
The problem: As the codebase grew, refactoring became risky. Model shape mismatches between backend responses and mobile/backoffice consumers caused silent bugs.
What I’d do now: TypeScript across all three applications with a shared types package. One source of truth for API contracts — if the backend changes a response shape, the mobile and backoffice get compile-time errors, not runtime crashes.
3. API Versioning & Contract Testing
What we did: Single /api/v1 prefix, no contract tests between clients and backend.
The problem: Breaking changes in the API required coordinated deploys across all three applications. We had incidents where a backend deploy broke the mobile app because a field name changed.
What I’d do now: Implement API contract testing (Pact or similar). Each client declares what it expects from the API. CI fails if the backend breaks a contract before the client is updated.
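The core idea of consumer-driven contracts can be shown without any tooling. The sketch below is not Pact's API; it is a hand-rolled check in which the client declares the fields it relies on and CI compares them against a sample backend response. The endpoint shape and field names (`pinCode`, `startsAt`) are hypothetical.

```javascript
// What the mobile client relies on in a booking response (hypothetical shape).
const mobileBookingContract = {
  id: 'string',
  status: 'string',
  pinCode: 'string',
  startsAt: 'string',
};

// Returns a list of violations; an empty list means the contract holds.
function checkContract(contract, response) {
  const violations = [];
  for (const [field, expectedType] of Object.entries(contract)) {
    if (!(field in response)) {
      violations.push(`missing field: ${field}`);
    } else if (typeof response[field] !== expectedType) {
      violations.push(`wrong type for ${field}: expected ${expectedType}, got ${typeof response[field]}`);
    }
  }
  return violations;
}

// A backend change that renames `pinCode` to `pin` is caught before deploy.
const renamedResponse = { id: 'b1', status: 'CONFIRMED', pin: '1234', startsAt: '2024-01-01T10:00:00Z' };
const violations = checkContract(mobileBookingContract, renamedResponse);
console.log(violations); // reports the missing pinCode field

if (violations.length > 0) {
  console.log('Contract broken: fail the CI job here');
}
```

Extra fields in the response are deliberately ignored, so the backend can add data freely; only removing or retyping something a client depends on fails the check.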
4. Event Sourcing for Bookings
What we did: Bookings are mutable documents — status updates overwrite the previous state.
The problem: We lost the audit trail. When a booking ended up in an unexpected state, debugging required correlating logs across modules.
What I’d do now: Event-source the booking aggregate. Store events (BookingCreated, BookingConfirmed, SessionStarted, BookingCompleted) and derive current state from the event stream. This gives us a full audit log, enables time-travel debugging, and makes it trivial to build analytics.
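A minimal sketch of deriving booking state from an event stream, using the four event names above. The status values, the `version` counter, and the `replay` helper are assumptions added for illustration.

```javascript
// Status each follow-up event moves the booking into (status names assumed).
const STATUS_AFTER = {
  BookingConfirmed: 'CONFIRMED',
  SessionStarted: 'INPROGRESS',
  BookingCompleted: 'COMPLETED',
};

// Pure function: (previous state, event) -> next state.
function applyEvent(state, event) {
  if (event.type === 'BookingCreated') {
    return { id: event.bookingId, status: 'PENDING', version: 1 };
  }
  const status = STATUS_AFTER[event.type];
  if (!status) throw new Error(`Unknown event type: ${event.type}`);
  if (!state) throw new Error(`${event.type} arrived before BookingCreated`);
  return { ...state, status, version: state.version + 1 };
}

// Current state is a fold over the stream; `upTo` allows "time travel"
// to the state after any number of events.
function replay(events, upTo = events.length) {
  return events.slice(0, upTo).reduce(applyEvent, null);
}

const stream = [
  { type: 'BookingCreated', bookingId: 'b1' },
  { type: 'BookingConfirmed' },
  { type: 'SessionStarted' },
  { type: 'BookingCompleted' },
];

console.log(replay(stream).status);    // COMPLETED
console.log(replay(stream, 2).status); // CONFIRMED
```

Because the stream is append-only, the question "how did this booking end up in this state?" is answered by reading its events in order rather than by correlating logs across modules.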
5. Push Notifications from the Start
What we did: No push notifications. Users had to open the app to check booking status.
The problem: Users missed booking confirmations and session reminders. This hurt conversion and UX.
What I’d do now: Integrate Firebase Cloud Messaging (FCM) for mobile push and web push for the backoffice. Trigger notifications from RabbitMQ consumers — TRANSACTION.COMPLETED → push “Your pod is confirmed! PIN: XXXX”.
6. Environment & Secret Management
What we did: API keys and URLs hardcoded in the mobile app’s config file. HTTP in development, no secret rotation.
The problem: Security risk. Anyone decompiling the APK gets API keys. HTTP traffic is sniffable.
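What I’d do now: Keep third-party secrets off the device entirely and proxy those calls through the backend, since anything bundled into the app (including values injected at build time with react-native-config) can still be extracted from the APK. Use react-native-config only for non-secret, per-environment settings such as API base URLs. Beyond that: HTTPS everywhere (even in dev with self-signed certs), secrets in AWS Secrets Manager with automatic rotation, and X.509 device certificates for IoT authentication.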
7. Observability Before We Needed It
What we did: Basic console.log debugging. No structured logging, no distributed tracing, no alerting.
The problem: When things broke in production, we SSH’d into the server and tailed logs. Finding the root cause of a failed booking could take hours.
What I’d do now: Structured JSON logging (Winston/Pino), OpenTelemetry for distributed tracing across the async RabbitMQ pipeline, CloudWatch dashboards for business metrics (bookings/hour, payment success rate, pod utilization), and PagerDuty alerts for anomalies.
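To make the structured-logging point concrete, here is a dependency-free sketch of a logger that writes one JSON object per line and carries a correlation ID through child loggers. Libraries such as Pino and Winston provide this out of the box; the `createLogger` function below is a hypothetical stand-in, not either library's API, and the field names are assumptions.

```javascript
// Minimal structured logger: one JSON object per line, with bound context.
// `write` is injectable so output can go to stdout, a file, or a test buffer.
function createLogger(context = {}, write = (line) => console.log(line)) {
  function log(level, message, fields = {}) {
    write(JSON.stringify({
      time: new Date().toISOString(),
      level,
      message,
      ...context,
      ...fields,
    }));
  }
  return {
    info: (message, fields) => log('info', message, fields),
    error: (message, fields) => log('error', message, fields),
    // Child loggers add context (such as a correlation ID) to every line.
    child: (extra) => createLogger({ ...context, ...extra }, write),
  };
}

const lines = [];
const logger = createLogger({ service: 'bookings' }, (line) => lines.push(line));

// One correlation ID follows the booking across the async pipeline.
const bookingLog = logger.child({ correlationId: 'booking-123' });
bookingLog.info('payment requested', { credits: 10 });
bookingLog.error('payment failed', { reason: 'insufficient credits' });

lines.forEach((line) => console.log(line));
```

With every line carrying the same `correlationId`, tracing a failed booking becomes a single filter query in the log store instead of tailing files over SSH.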
Impact & Key Metrics
- Designed and delivered a full-stack IoT platform from zero to production
- Architected a system handling real-time device events, async payment workflows, and geospatial queries
- Built a modular monolith that enabled a 4-person team to ship features independently across 6 domain modules
- Designed the IAM system supporting 5 role tiers with hierarchical permission inheritance
- Created an event-driven pipeline processing booking → payment → notification flows with failure compensation
- Designed a scaling architecture supporting 100 to 10,000+ IoT devices with tiered ingestion
Built with care for users who just need a good nap.