Smart Sleep Pod Platform

Overview

A full-stack IoT platform connecting smart sleep pods to a digital booking ecosystem. Users discover nearby pods on a map, book a time slot, pay with credits, and unlock the pod with a PIN code. Operators manage their fleet, track revenue, and monitor pod health from a backoffice dashboard. Think of it as Airbnb meets WeWork — but for 30-minute naps, powered by IoT.

Domain: IoT + Hospitality Tech (B2C + B2B)
Role: Senior Software Engineer & Technical Lead
Team: 4 Engineers (2 Backend, 1 Mobile, 1 Frontend)
Stack: Node.js, Express, MongoDB, Redis, RabbitMQ, React, React Native, MQTT
Scale: 6 domain modules, 16 data models, real-time IoT events, 3 external integrations

System Architecture

The platform is composed of three applications and a supporting infrastructure layer, all designed by me from the ground up.

Architecture Decisions

Decision | Rationale
Modular monolith over microservices | Small team, fast iteration; modules can be split into separately deployable services later if needed
RabbitMQ for async workflows | Decouples booking → billing → notification chains; enables retry semantics
Socket.io + Redis adapter | Real-time updates to clients with horizontal scaling support
Cookie-based sessions over JWT | Server-side session invalidation; simpler revocation for a security-sensitive domain
Geospatial (2dsphere) indexes | Sub-millisecond proximity search across the entire pod fleet
PIN-code based access | No BLE/NFC dependency on user devices; works with any phone

Backend — Modular Monolith

I designed the backend as a module-per-domain architecture. Each module owns its models, controllers, permissions, and message queue bindings. This gave us the isolation benefits of microservices with the operational simplicity of a monolith.

Module Internal Structure

Every module follows the same convention, which I defined as the team’s standard:

modules/<name>/
├── models/        # Mongoose schemas (domain entities)
├── controllers/   # Business logic (Express handlers)
├── iam/           # Routes + permission declarations
├── rmq/           # RabbitMQ exchange & queue bindings
├── helpers/       # Module-specific utilities
├── sockets/       # Real-time event handlers
├── schemas/       # AJV validation schemas
└── i18n/          # Translation files (en, pt, fr)

This pattern made onboarding fast — any engineer could understand a new module in minutes.


Data Model

16 domain entities across 6 bounded contexts, with geospatial indexing and event-sourced IoT telemetry.
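
As a rough illustration of the geospatial lookup, the sketch below builds the kind of MongoDB filter a 2dsphere index can serve. It assumes pods store a GeoJSON Point in a field named `location`; that field name, the function name, and the example coordinates are hypothetical and not taken from the actual schema. `$near`, `$geometry`, and `$maxDistance` are standard MongoDB operators.

```javascript
// Builds a MongoDB filter for pods near a point, nearest first.
// Assumes a hypothetical `location` field holding a GeoJSON Point
// with a 2dsphere index on it.
function buildNearbyPodsFilter(longitude, latitude, maxDistanceMeters) {
  if (longitude < -180 || longitude > 180 || latitude < -90 || latitude > 90) {
    throw new RangeError("coordinates out of range");
  }
  return {
    location: {
      $near: {
        // GeoJSON order is [longitude, latitude], not [latitude, longitude].
        $geometry: { type: "Point", coordinates: [longitude, latitude] },
        $maxDistance: maxDistanceMeters, // meters when querying GeoJSON points
      },
    },
  };
}

// Example: pods within 500 m of an arbitrary point.
const nearbyFilter = buildNearbyPodsFilter(-46.6333, -23.5505, 500);
console.log(JSON.stringify(nearbyFilter));
```

With Mongoose, a filter like this would typically be passed to `find` on the pod model. Mixing up the coordinate order is the most common mistake with GeoJSON queries, which is why the builder takes longitude first.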


Booking Lifecycle

This is the most complex flow in the system: it spans the mobile app, backend API, billing system, message queue, and IoT hardware. I designed it as an event-driven pipeline to keep each concern isolated.

Booking State Machine
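
A minimal sketch of how the booking states could be guarded in code. Only INPROGRESS and COMPLETED are named in this document (in the IoT event table); PENDING, CONFIRMED, and CANCELLED, as well as the exact set of allowed transitions, are assumptions made for illustration.

```javascript
// Allowed booking transitions. INPROGRESS and COMPLETED appear in the
// IoT event table; PENDING, CONFIRMED and CANCELLED are assumed names.
const BOOKING_TRANSITIONS = {
  PENDING: ["CONFIRMED", "CANCELLED"],
  CONFIRMED: ["INPROGRESS", "CANCELLED"],
  INPROGRESS: ["COMPLETED"],
  COMPLETED: [],
  CANCELLED: [],
};

// Returns a copy of the booking in the next status, or throws if the
// transition is not listed in the table above.
function transitionBooking(booking, nextStatus) {
  const known = Object.prototype.hasOwnProperty.call(BOOKING_TRANSITIONS, booking.status);
  if (!known) {
    throw new Error(`Unknown status: ${booking.status}`);
  }
  if (!BOOKING_TRANSITIONS[booking.status].includes(nextStatus)) {
    throw new Error(`Illegal transition ${booking.status} -> ${nextStatus}`);
  }
  return { ...booking, status: nextStatus };
}

const confirmed = transitionBooking({ id: "b1", status: "PENDING" }, "CONFIRMED");
console.log(confirmed.status);
```

Keeping the transitions in one table means every module that changes a booking's status goes through the same check, so an out-of-order event fails loudly instead of silently overwriting state.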


Event-Driven Architecture (RabbitMQ)

I chose RabbitMQ to decouple the booking, billing, and notification domains. This allowed us to handle payment failures gracefully — a failed transaction triggers a compensating event rather than a cascading error.

Event Catalog

Event | Flow | Purpose
BOOKING.NEW | Bookings → Billing | Initiate payment for new booking
BOOKING.USERCANCELLED | Bookings → Billing | Trigger refund
TRANSACTION.COMPLETED | Billing → Bookings | Confirm booking after successful payment
TRANSACTION.ERROR | Billing → Bookings | Cancel booking on payment failure
PURCHASE.CREDITS | Billing → Balance | Credit user account
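
To make the compensation idea concrete, here is a small in-memory sketch of the BOOKING.NEW → TRANSACTION.COMPLETED / TRANSACTION.ERROR round trip from the event catalog above. It uses a toy publish/subscribe object in place of RabbitMQ and makes no broker calls. The status names, payload fields, and the `chargeCredits` callback are hypothetical.

```javascript
// Tiny in-memory publish/subscribe bus standing in for RabbitMQ.
function createBus() {
  const handlers = new Map();
  return {
    subscribe(event, handler) {
      if (!handlers.has(event)) handlers.set(event, []);
      handlers.get(event).push(handler);
    },
    publish(event, payload) {
      for (const handler of handlers.get(event) || []) handler(payload);
    },
  };
}

function wireBookingFlow(bus, bookings, chargeCredits) {
  // Billing side: try to charge, then report the outcome as an event.
  bus.subscribe("BOOKING.NEW", ({ bookingId, userId, price }) => {
    const ok = chargeCredits(userId, price);
    bus.publish(ok ? "TRANSACTION.COMPLETED" : "TRANSACTION.ERROR", { bookingId });
  });
  // Bookings side: confirm on success, compensate (cancel) on failure.
  bus.subscribe("TRANSACTION.COMPLETED", ({ bookingId }) => {
    bookings.get(bookingId).status = "CONFIRMED";
  });
  bus.subscribe("TRANSACTION.ERROR", ({ bookingId }) => {
    bookings.get(bookingId).status = "CANCELLED";
  });
}

const bus = createBus();
const bookings = new Map([
  ["b1", { status: "PENDING" }],
  ["b2", { status: "PENDING" }],
]);
const balances = { alice: 10, bob: 0 };
wireBookingFlow(bus, bookings, (userId, price) => {
  if (balances[userId] < price) return false;
  balances[userId] -= price;
  return true;
});

bus.publish("BOOKING.NEW", { bookingId: "b1", userId: "alice", price: 5 });
bus.publish("BOOKING.NEW", { bookingId: "b2", userId: "bob", price: 5 });
console.log(bookings.get("b1").status, bookings.get("b2").status);
```

The bookings side never calls billing directly. It only reacts to the outcome event, so a payment failure becomes an ordinary state change rather than an exception that propagates across modules.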

IoT Integration

Each physical pod contains a microcontroller that communicates with the platform. I designed the IoT layer to ingest device events and correlate them with active bookings in real time.

IoT Event Types

Event | Meaning | Platform Action
WRONG_CODE | Invalid PIN entered | Log, potential lockout after N attempts
OPEN_DOOR | Valid PIN → door opens | Transition booking → INPROGRESS
SESSION_START | User occupies pod | Start session timer
SESSION_END | Timer expires / user exits | Prepare completion
CLOSE_DOOR | Door shuts | Finalize booking → COMPLETED
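
The mapping from device events to platform actions could be sketched as a pure reducer like the one below. The event names come from the table above. The state field names, the lockout threshold of 3, and the rule that CLOSE_DOOR only finalizes a booking after SESSION_END are assumptions added for the example.

```javascript
const MAX_WRONG_CODES = 3; // assumed value for the "N attempts" threshold

// Applies one device event to a pod's active booking state and returns
// the updated state. Field names are hypothetical.
function applyDeviceEvent(state, event) {
  switch (event.type) {
    case "WRONG_CODE": {
      const wrongCodes = state.wrongCodes + 1;
      return { ...state, wrongCodes, lockedOut: wrongCodes >= MAX_WRONG_CODES };
    }
    case "OPEN_DOOR":
      return { ...state, wrongCodes: 0, bookingStatus: "INPROGRESS" };
    case "SESSION_START":
      return { ...state, sessionStartedAt: event.at };
    case "SESSION_END":
      return { ...state, sessionEndedAt: event.at };
    case "CLOSE_DOOR":
      // Assumption: the door also closes when the user first enters,
      // so only finalize once the session has ended.
      return state.sessionEndedAt != null
        ? { ...state, bookingStatus: "COMPLETED" }
        : state;
    default:
      throw new Error(`Unknown device event: ${event.type}`);
  }
}

const initialPodState = {
  bookingStatus: "CONFIRMED",
  wrongCodes: 0,
  lockedOut: false,
  sessionStartedAt: null,
  sessionEndedAt: null,
};
const deviceEvents = [
  { type: "WRONG_CODE", at: 0 },
  { type: "OPEN_DOOR", at: 1000 },
  { type: "CLOSE_DOOR", at: 2000 },
  { type: "SESSION_START", at: 3000 },
  { type: "SESSION_END", at: 1803000 },
  { type: "CLOSE_DOOR", at: 1805000 },
];
const finalPodState = deviceEvents.reduce(
  (state, event) => applyDeviceEvent(state, event),
  initialPodState
);
console.log(finalPodState.bookingStatus);
```

Because the reducer has no side effects, the same function can be replayed over stored device events when investigating why a booking ended up in a given state.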

Authentication & Authorization

I implemented a hierarchical IAM (Identity and Access Management) system that binds permissions to roles and attaches them to route definitions. Because each route declares the permission it requires, a new API endpoint has its access policy enforced automatically, with no manual middleware wiring.

Permission format: modules:<domain>:<resource>:<action> — e.g., modules:cochilo-iot:devices:create.
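
A simplified sketch of how such a permission string could be parsed and checked against a role hierarchy. The three roles, their grants, the `cochilo-bookings` domain, and the `*` wildcard action are illustrative assumptions; the real system has five role tiers and its matching rules are not described here.

```javascript
// Role hierarchy: each role inherits everything its parent can do.
// Role names and grants are illustrative assumptions.
const ROLE_PARENTS = { admin: "operator", operator: "user", user: null };
const ROLE_GRANTS = {
  user: ["modules:cochilo-bookings:bookings:create"],
  operator: ["modules:cochilo-iot:devices:read"],
  admin: ["modules:cochilo-iot:devices:*"], // "*" wildcard is an assumption
};

function parsePermission(permission) {
  const parts = permission.split(":");
  if (parts.length !== 4 || parts[0] !== "modules") {
    throw new Error(`Malformed permission: ${permission}`);
  }
  const [, domain, resource, action] = parts;
  return { domain, resource, action };
}

function grantMatches(grant, required) {
  const g = parsePermission(grant);
  const r = parsePermission(required);
  return (
    g.domain === r.domain &&
    g.resource === r.resource &&
    (g.action === "*" || g.action === r.action)
  );
}

// Walks from the role up through its parents until a grant matches.
function roleCan(role, required) {
  for (let current = role; current; current = ROLE_PARENTS[current]) {
    const grants = ROLE_GRANTS[current] || [];
    if (grants.some((grant) => grantMatches(grant, required))) return true;
  }
  return false;
}

// A route entry carries its own permission, so the access check is
// derived from the declaration instead of being wired by hand.
const createDeviceRoute = {
  method: "POST",
  path: "/devices",
  permission: "modules:cochilo-iot:devices:create",
};
console.log(
  roleCan("admin", createDeviceRoute.permission),
  roleCan("operator", createDeviceRoute.permission)
);
```

In an Express application, a single generic middleware could read the `permission` field from the matched route entry and call a check like `roleCan` with the session's role.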


Infrastructure Design

What I Would Deploy in Production

I designed the target-state cloud architecture for production readiness, even though the initial deployment was Docker on VMs.

Scaling Projections

Dimension | 100 Pods | 1,000 Pods | 10,000 Pods
MQTT connections | 100 | 1,000 | 10,000
Events/day | 1,500 | 15,000 | 150,000
Telemetry msgs/min | 200 | 2,000 | 20,000
API instances | 1–2 | 2–4 | 4–8
DB read replicas | 0 | 1 | 2–3
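
The first three rows of the table scale linearly with fleet size. The short calculation below reproduces them from the per-pod rates implied by the 100-pod column (15 events per pod per day, 2 telemetry messages per pod per minute) and converts them to average per-second rates. The function name is hypothetical, and the per-second figures are averages, not peak estimates.

```javascript
// Per-pod rates implied by the 100-pod column of the table.
const EVENTS_PER_POD_PER_DAY = 1500 / 100; // 15
const TELEMETRY_PER_POD_PER_MIN = 200 / 100; // 2

function projectLoad(pods) {
  const eventsPerDay = pods * EVENTS_PER_POD_PER_DAY;
  const telemetryPerMin = pods * TELEMETRY_PER_POD_PER_MIN;
  return {
    mqttConnections: pods, // one persistent connection per pod
    eventsPerDay,
    telemetryPerMin,
    // Averages only; real traffic will have peaks above these.
    eventsPerSecond: eventsPerDay / 86400,
    telemetryPerSecond: telemetryPerMin / 60,
  };
}

for (const pods of [100, 1000, 10000]) {
  console.log(pods, projectLoad(pods));
}
```

At 10,000 pods the telemetry stream averages roughly 333 messages per second while booking-critical events average fewer than 2 per second. That gap of two orders of magnitude is the reason to treat the two streams differently.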

IoT at Scale — Hot/Warm/Cold Pipeline

For 10,000+ pods, I designed a tiered ingestion pipeline that separates critical events from telemetry.
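
The routing decision at the front of such a pipeline could look roughly like the sketch below. The tier definitions are assumptions, since this write-up does not spell them out: "hot" is taken to mean booking-critical device events handled immediately, "warm" means telemetry that is buffered and processed in batches, and "cold" means everything else, kept only for archival. The `TELEMETRY` message type is also hypothetical.

```javascript
// Events from the IoT Event Types table are treated as booking-critical.
const CRITICAL_EVENTS = new Set([
  "WRONG_CODE",
  "OPEN_DOOR",
  "SESSION_START",
  "SESSION_END",
  "CLOSE_DOOR",
]);

// Tier meanings are assumptions: hot = handle now, warm = batch,
// cold = archive only.
function selectTier(message) {
  if (CRITICAL_EVENTS.has(message.type)) return "hot";
  if (message.type === "TELEMETRY") return "warm";
  return "cold";
}

// Groups a batch of incoming messages by tier.
function partitionByTier(messages) {
  const tiers = { hot: [], warm: [], cold: [] };
  for (const message of messages) tiers[selectTier(message)].push(message);
  return tiers;
}

const tiers = partitionByTier([
  { type: "OPEN_DOOR", podId: "p1" },
  { type: "TELEMETRY", podId: "p1", temperatureC: 22 },
  { type: "DEBUG_DUMP", podId: "p2" },
]);
console.log(tiers.hot.length, tiers.warm.length, tiers.cold.length);
```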


Tech Stack

Layer | Technology
Backend | Node.js, Express, MongoDB, Mongoose, RabbitMQ, Socket.io, Redis, Passport.js
Backoffice | React, Redux, Redux-Saga, Bootstrap, FullCalendar, Axios
Mobile | React Native, Redux, Redux-Saga, React Navigation, Google Maps, Geolocation
Infrastructure | Docker, PM2, GitLab CI/CD, designed for AWS (ECS, DocumentDB, IoT Core)
External Services | PagSeguro (payments), SendGrid (email), Twilio (SMS)

Retrospective — What I Would Do Differently

Building this system end-to-end taught me a lot. Here’s what I’d change with the benefit of hindsight:

1. MQTT from Day One, Not REST for IoT

What we did: IoT devices push events via REST (POST /devices/:id/events).

The problem: REST is request-response — no server-initiated commands, no persistent connection for health monitoring, higher latency for time-sensitive events like door unlocks.

What I’d do now: Start with MQTT (via AWS IoT Core or EMQX) from the beginning. Use topics like cochilo/cabine/{id}/events for device → platform and cochilo/cabine/{id}/commands for platform → device. This gives us bidirectional communication, connection state awareness (Last Will & Testament), and QoS guarantees.
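
A small sketch of helpers for the topic scheme described above. The topic layout comes from the text; the function names are hypothetical, and no MQTT client library is involved. The `+` in the subscription filter is the standard MQTT single-level wildcard.

```javascript
const TOPIC_ROOT = "cochilo/cabine";

// Builds the topic for one pod. "/", "+" and "#" are the level separator
// and wildcards in MQTT, so they must not appear in a pod id.
function deviceTopic(podId, channel) {
  if (channel !== "events" && channel !== "commands") {
    throw new Error(`Unknown channel: ${channel}`);
  }
  if (!/^[^/+#]+$/.test(podId)) {
    throw new Error(`Invalid pod id: ${podId}`);
  }
  return `${TOPIC_ROOT}/${podId}/${channel}`;
}

// Extracts the pod id and channel from a concrete topic, or returns
// null if the topic does not follow the scheme.
function parseDeviceTopic(topic) {
  const match = /^cochilo\/cabine\/([^/+#]+)\/(events|commands)$/.exec(topic);
  return match ? { podId: match[1], channel: match[2] } : null;
}

// Filter the platform could subscribe to in order to receive events
// from every pod.
const ALL_EVENTS_FILTER = `${TOPIC_ROOT}/+/events`;

console.log(deviceTopic("pod-42", "commands"));
console.log(parseDeviceTopic("cochilo/cabine/pod-42/events"));
console.log(ALL_EVENTS_FILTER);
```

Keeping events and commands on separate topics per pod also makes broker-side access control simpler: a device only needs publish rights on its own events topic and subscribe rights on its own commands topic.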

2. TypeScript Everywhere

What we did: Backend and mobile are JavaScript, with JSDoc type hints in only some areas.

The problem: As the codebase grew, refactoring became risky. Model shape mismatches between backend responses and mobile/backoffice consumers caused silent bugs.

What I’d do now: TypeScript across all three applications with a shared types package. One source of truth for API contracts — if the backend changes a response shape, the mobile and backoffice get compile-time errors, not runtime crashes.

3. API Versioning & Contract Testing

What we did: Single /api/v1 prefix, no contract tests between clients and backend.

The problem: Breaking changes in the API required coordinated deploys across all three applications. We had incidents where a backend deploy broke the mobile app because a field name changed.

What I’d do now: Implement API contract testing (Pact or similar). Each client declares what it expects from the API. CI fails if the backend breaks a contract before the client is updated.
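
The core idea of a consumer-driven contract can be shown without any tooling. The sketch below is not Pact's API; it is a hand-rolled check under assumed field names, where the client declares the fields and primitive types it relies on and a CI step compares them against a sample backend response.

```javascript
// A consumer contract: the fields (and primitive types) the mobile app
// relies on in a booking response. Field names are hypothetical.
const mobileBookingContract = {
  id: "string",
  status: "string",
  pin: "string",
  startsAt: "string",
};

// Returns a list of human-readable violations; empty means compatible.
// Extra fields in the response are allowed.
function findContractViolations(contract, response) {
  const violations = [];
  for (const [field, expectedType] of Object.entries(contract)) {
    if (!(field in response)) {
      violations.push(`missing field: ${field}`);
    } else if (typeof response[field] !== expectedType) {
      violations.push(
        `wrong type for ${field}: expected ${expectedType}, got ${typeof response[field]}`
      );
    }
  }
  return violations;
}

// Simulates the incident described above: the backend renamed a field.
const renamedResponse = {
  id: "b1",
  status: "CONFIRMED",
  pinCode: "4821",
  startsAt: "2024-01-01T10:00:00Z",
};
console.log(findContractViolations(mobileBookingContract, renamedResponse));
```

A dedicated tool adds what this sketch lacks: contracts published by each client, provider-side verification against a running backend, and versioning so a backend change is checked against every released client.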

4. Event Sourcing for Bookings

What we did: Bookings are mutable documents — status updates overwrite the previous state.

The problem: We lost the audit trail. When a booking ended up in an unexpected state, debugging required correlating logs across modules.

What I’d do now: Event-source the booking aggregate. Store events (BookingCreated, BookingConfirmed, SessionStarted, BookingCompleted) and derive current state from the event stream. This gives us a full audit log, enables time-travel debugging, and makes it trivial to build analytics.
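
A minimal sketch of deriving booking state from an event stream, using the four event names listed above. The state shape, the status values assigned to each event, and the `replay` helper are assumptions for illustration.

```javascript
// Folds one event into the booking state. Status values are assumed.
function applyBookingEvent(state, event) {
  if (state === null && event.type !== "BookingCreated") {
    throw new Error("stream must start with BookingCreated");
  }
  switch (event.type) {
    case "BookingCreated":
      return { id: event.bookingId, status: "PENDING", history: [event.type] };
    case "BookingConfirmed":
      return { ...state, status: "CONFIRMED", history: [...state.history, event.type] };
    case "SessionStarted":
      return { ...state, status: "INPROGRESS", history: [...state.history, event.type] };
    case "BookingCompleted":
      return { ...state, status: "COMPLETED", history: [...state.history, event.type] };
    default:
      throw new Error(`Unknown booking event: ${event.type}`);
  }
}

// Rebuilds state from the first `upTo` events; omit `upTo` for the
// current state, or pass a smaller number to inspect a past state.
function replay(events, upTo = events.length) {
  return events
    .slice(0, upTo)
    .reduce((state, event) => applyBookingEvent(state, event), null);
}

const bookingEvents = [
  { type: "BookingCreated", bookingId: "b1" },
  { type: "BookingConfirmed" },
  { type: "SessionStarted" },
  { type: "BookingCompleted" },
];
console.log(replay(bookingEvents).status);
console.log(replay(bookingEvents, 2).status);
```

The second call is the "time-travel" part: replaying only the first two events shows the booking exactly as it was after confirmation, which a mutable document cannot offer once it has been overwritten.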

5. Push Notifications from the Start

What we did: No push notifications. Users had to open the app to check booking status.

The problem: Users missed booking confirmations and session reminders. This hurt conversion and UX.

What I’d do now: Integrate Firebase Cloud Messaging (FCM) for mobile push and web push for the backoffice. Trigger notifications from RabbitMQ consumers — TRANSACTION.COMPLETED → push “Your pod is confirmed! PIN: XXXX”.

6. Environment & Secret Management

What we did: API keys and URLs hardcoded in the mobile app’s config file. HTTP in development, no secret rotation.

The problem: Security risk. Anyone decompiling the APK gets API keys. HTTP traffic is sniffable.

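What I’d do now: Build-time environment injection (react-native-config) for non-secret configuration, with third-party secrets kept behind the backend, since any value bundled into the app can still be extracted; HTTPS everywhere (even in dev with self-signed certs); secrets in AWS Secrets Manager with automatic rotation; and X.509 device certificates for IoT authentication.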

7. Observability Before We Needed It

What we did: Basic console.log debugging. No structured logging, no distributed tracing, no alerting.

The problem: When things broke in production, we SSH’d into the server and tailed logs. Finding the root cause of a failed booking could take hours.

What I’d do now: Structured JSON logging (Winston/Pino), OpenTelemetry for distributed tracing across the async RabbitMQ pipeline, CloudWatch dashboards for business metrics (bookings/hour, payment success rate, pod utilization), and PagerDuty alerts for anomalies.
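
To show what structured logging buys over console.log, here is a dependency-free sketch of a JSON log line that carries a correlation id. It is not the Winston or Pino API, and the field names (`correlationId`, `bookingId`, `event`) are an assumed convention.

```javascript
// Builds one JSON log line. Passing `now` explicitly keeps the function
// deterministic in tests. Field names are an assumed convention.
function logLine(level, message, context = {}, now = new Date()) {
  return JSON.stringify({
    time: now.toISOString(),
    level,
    message,
    ...context,
  });
}

// The same correlation id would travel in the message headers of every
// event in the booking pipeline, so one search finds the whole chain.
const failureLine = logLine(
  "error",
  "payment failed",
  { correlationId: "req-123", bookingId: "b2", event: "TRANSACTION.ERROR" },
  new Date(0)
);
console.log(failureLine);
```

With every module logging the same `correlationId`, finding the root cause of a failed booking becomes a single filtered query instead of tailing logs on a server.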


Impact & Key Metrics

  • Designed and delivered a full-stack IoT platform from zero to production
  • Architected a system handling real-time device events, async payment workflows, and geospatial queries
  • Built a modular monolith that enabled a 4-person team to ship features independently across 6 domain modules
  • Designed the IAM system supporting 5 role tiers with hierarchical permission inheritance
  • Created an event-driven pipeline processing booking → payment → notification flows with failure compensation
  • Designed a scaling architecture supporting 100 to 10,000+ IoT devices with tiered ingestion

Built with care for users who just need a good nap.