Smart Sleep Pod Platform

Overview

A full-stack IoT platform connecting smart sleep pods to a digital booking ecosystem. Users discover nearby pods on a map, book a time slot, pay with credits, and unlock the pod with a PIN code. Operators manage their fleet, track revenue, and monitor pod health from a backoffice dashboard. Think of it as Airbnb meets WeWork — but for 30-minute naps, powered by IoT.

Domain	IoT + Hospitality Tech (B2C + B2B)
Role	Senior Software Engineer & Technical Lead
Team	4 Engineers (2 Backend, 1 Mobile, 1 Frontend)
Stack	Node.js, Express, MongoDB, Redis, RabbitMQ, React, React Native, MQTT
Scale	6 domain modules, 16 data models, real-time IoT events, 3 external integrations

System Architecture

The platform is composed of three applications and a supporting infrastructure layer, all designed by me from the ground up.

Architecture Decisions

Decision	Rationale
Modular monolith over microservices	Small team, fast iteration; modules can be split out into separately deployable services later if needed
RabbitMQ for async workflows	Decouples booking → billing → notification chains; enables retry semantics
Socket.io + Redis adapter	Real-time updates to clients with horizontal scaling support
Cookie-based sessions over JWT	Server-side session invalidation; simpler revocation for a security-sensitive domain
Geospatial (2dsphere) indexes	Fast, index-backed proximity search across the entire pod fleet, with no full collection scan
PIN-code based access	No BLE/NFC dependency on user devices — works with any phone

Backend — Modular Monolith

I designed the backend as a module-per-domain architecture. Each module owns its models, controllers, permissions, and message queue bindings. This gave us much of the isolation of microservices (clear ownership and boundaries per domain) with the operational simplicity of a monolith.

Module Internal Structure

Every module follows the same convention, which I defined as the team’s standard:

modules/<name>/
├── models/        # Mongoose schemas (domain entities)
├── controllers/   # Business logic (Express handlers)
├── iam/           # Routes + permission declarations
├── rmq/           # RabbitMQ exchange & queue bindings
├── helpers/       # Module-specific utilities
├── sockets/       # Real-time event handlers
├── schemas/       # AJV validation schemas
└── i18n/          # Translation files (en, pt, fr)

This pattern made onboarding fast: because every module is laid out the same way, an engineer who knew one module could find their way around any other.

Data Model

16 domain entities across 6 bounded contexts, with geospatial indexing and event-sourced IoT telemetry.
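To make the geospatial indexing concrete, here is a minimal sketch of the query shape a "pods near me" lookup could use. It assumes each pod document stores a GeoJSON Point in a field called `location` covered by a 2dsphere index; the field name and the helper `buildNearbyPodsFilter` are hypothetical, not taken from the codebase.

```javascript
// Hypothetical helper: builds a MongoDB filter for pods near a point.
// Assumes pods store a GeoJSON Point in `location` (assumed field name)
// and that the field has a 2dsphere index.
function buildNearbyPodsFilter(longitude, latitude, maxDistanceMeters) {
  if (longitude < -180 || longitude > 180 || latitude < -90 || latitude > 90) {
    throw new RangeError('coordinates out of range');
  }
  return {
    location: {
      $near: {
        // GeoJSON order is [longitude, latitude], not [latitude, longitude].
        $geometry: { type: 'Point', coordinates: [longitude, latitude] },
        $maxDistance: maxDistanceMeters, // meters when querying GeoJSON points
      },
    },
  };
}

// Example: pods within 500 m of an arbitrary point.
const nearbyFilter = buildNearbyPodsFilter(-9.1393, 38.7223, 500);
console.log(JSON.stringify(nearbyFilter));
```

A filter like this could be passed to a Mongoose `find()` call; `$near` returns matches ordered from nearest to farthest, which suits a map view.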

Booking Lifecycle

The most complex flow in the system — spans the mobile app, backend API, billing system, message queue, and IoT hardware. I designed this as an event-driven pipeline to keep each concern isolated.

Booking State Machine
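
A booking state machine can be sketched as a transition table plus a guard function. `INPROGRESS` and `COMPLETED` are the states named elsewhere in this document; `PENDING`, `CONFIRMED`, and `CANCELLED` are assumed names added only for illustration, and the real set of states may differ.

```javascript
// Allowed transitions per state. INPROGRESS and COMPLETED come from this
// document; PENDING, CONFIRMED and CANCELLED are assumed names.
const TRANSITIONS = {
  PENDING: ['CONFIRMED', 'CANCELLED'],
  CONFIRMED: ['INPROGRESS', 'CANCELLED'],
  INPROGRESS: ['COMPLETED'],
  COMPLETED: [],
  CANCELLED: [],
};

// Returns a copy of the booking in the next state, or throws if the
// transition is not in the table.
function transition(booking, nextStatus) {
  const allowed = TRANSITIONS[booking.status] || [];
  if (!allowed.includes(nextStatus)) {
    throw new Error(`Illegal transition ${booking.status} -> ${nextStatus}`);
  }
  return { ...booking, status: nextStatus };
}

let demoBooking = { id: 'b1', status: 'PENDING' };
demoBooking = transition(demoBooking, 'CONFIRMED');
demoBooking = transition(demoBooking, 'INPROGRESS');
console.log(demoBooking.status); // INPROGRESS
```

Keeping the table in one place means an illegal jump (for example `COMPLETED` back to `INPROGRESS`) fails loudly instead of silently overwriting state.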

Event-Driven Architecture (RabbitMQ)

I chose RabbitMQ to decouple the booking, billing, and notification domains. This allowed us to handle payment failures gracefully — a failed transaction triggers a compensating event rather than a cascading error.

Event Catalog

Event	Flow	Purpose
`BOOKING.NEW`	Bookings → Billing	Initiate payment for new booking
`BOOKING.USERCANCELLED`	Bookings → Billing	Trigger refund
`TRANSACTION.COMPLETED`	Billing → Bookings	Confirm booking after successful payment
`TRANSACTION.ERROR`	Billing → Bookings	Cancel booking on payment failure
`PURCHASE.CREDITS`	Billing → Balance	Credit user account
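
The booking and billing rows of the catalog can be illustrated with a small in-memory stand-in for the broker. This is a sketch of the event flow only: `createBus` is not the RabbitMQ API, and the booking statuses (`PENDING`, `CONFIRMED`, `CANCELLED`), the `chargeCredits` callback, and the payload fields are assumptions.

```javascript
// Minimal in-memory stand-in for a message broker (NOT the RabbitMQ API).
function createBus() {
  const handlers = new Map();
  return {
    subscribe(event, handler) {
      if (!handlers.has(event)) handlers.set(event, []);
      handlers.get(event).push(handler);
    },
    publish(event, payload) {
      for (const handler of handlers.get(event) || []) handler(payload);
    },
  };
}

function wireBookingFlow(bus, bookings, chargeCredits) {
  // Billing side: react to a new booking and report the outcome as an event.
  bus.subscribe('BOOKING.NEW', ({ bookingId, amount }) => {
    const ok = chargeCredits(amount);
    bus.publish(ok ? 'TRANSACTION.COMPLETED' : 'TRANSACTION.ERROR', { bookingId });
  });
  // Bookings side: confirm on success, compensate (cancel) on failure.
  bus.subscribe('TRANSACTION.COMPLETED', ({ bookingId }) => {
    bookings.get(bookingId).status = 'CONFIRMED';
  });
  bus.subscribe('TRANSACTION.ERROR', ({ bookingId }) => {
    bookings.get(bookingId).status = 'CANCELLED';
  });
}

const bus = createBus();
const bookings = new Map([
  ['b1', { status: 'PENDING' }],
  ['b2', { status: 'PENDING' }],
]);
wireBookingFlow(bus, bookings, (amount) => amount <= 10); // pretend the balance is 10 credits
bus.publish('BOOKING.NEW', { bookingId: 'b1', amount: 5 });
bus.publish('BOOKING.NEW', { bookingId: 'b2', amount: 50 });
console.log(bookings.get('b1').status, bookings.get('b2').status); // CONFIRMED CANCELLED
```

The point of the shape is that the Bookings side never calls Billing directly; a payment failure arrives as just another event to handle.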

IoT Integration

Each physical pod contains a microcontroller that communicates with the platform. I designed the IoT layer to ingest device events and correlate them with active bookings in real time.

IoT Event Types

Event	Meaning	Platform Action
`WRONG_CODE`	Invalid PIN entered	Log, potential lockout after N attempts
`OPEN_DOOR`	Valid PIN → door opens	Transition booking → INPROGRESS
`SESSION_START`	User occupies pod	Start session timer
`SESSION_END`	Timer expires / user exits	Prepare completion
`CLOSE_DOOR`	Door shuts	Finalize booking → COMPLETED
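
The table above can be read as a function from (booking, device event) to an updated booking. The sketch below assumes a lockout threshold of 3 for the unspecified "N attempts", hypothetical field names (`wrongCodes`, `lockedOut`, `sessionStartedAt`, `sessionEndedAt`), and one extra guard that is not in the table: `CLOSE_DOOR` only finalizes the booking once the session has ended.

```javascript
const MAX_WRONG_CODES = 3; // assumed value for "N attempts"

// Applies one device event to the booking active on that pod and returns
// a new booking object. Unknown event types leave the booking unchanged.
function applyDeviceEvent(booking, event) {
  switch (event.type) {
    case 'WRONG_CODE': {
      const wrongCodes = booking.wrongCodes + 1;
      return { ...booking, wrongCodes, lockedOut: wrongCodes >= MAX_WRONG_CODES };
    }
    case 'OPEN_DOOR':
      return { ...booking, status: 'INPROGRESS' };
    case 'SESSION_START':
      return { ...booking, sessionStartedAt: event.at };
    case 'SESSION_END':
      return { ...booking, sessionEndedAt: event.at };
    case 'CLOSE_DOOR':
      // Assumption: a door closing mid-session (user shutting it behind
      // them) should not complete the booking.
      return booking.sessionEndedAt !== undefined
        ? { ...booking, status: 'COMPLETED' }
        : booking;
    default:
      return booking;
  }
}

const deviceEvents = [
  { type: 'WRONG_CODE', at: 1 },
  { type: 'OPEN_DOOR', at: 2 },
  { type: 'SESSION_START', at: 3 },
  { type: 'SESSION_END', at: 1803 },
  { type: 'CLOSE_DOOR', at: 1810 },
];
const startState = { status: 'CONFIRMED', wrongCodes: 0, lockedOut: false };
const endState = deviceEvents.reduce(applyDeviceEvent, startState);
console.log(endState.status, endState.wrongCodes); // COMPLETED 1
```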

Authentication & Authorization

I implemented a hierarchical IAM (Identity and Access Management) system that binds permissions to roles and attaches them to route definitions. Because each route declares the permission it requires, adding a new API endpoint automatically enforces its access policy, with no manual middleware wiring.

Permission format: `modules:<domain>:<resource>:<action>`, for example `modules:cochilo-iot:devices:create`.
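
A minimal sketch of how a permission string, a role hierarchy, and a route declaration could fit together. The role names, the `GRANTS` table, the `cochilo-bookings` permission, and the `/devices` route are all hypothetical; only the four-part permission format and the `modules:cochilo-iot:devices:create` example come from this document.

```javascript
// Role hierarchy from lowest to highest (names are hypothetical).
// A role inherits every permission granted to the roles before it.
const ROLE_ORDER = ['guest', 'user', 'operator', 'admin', 'superadmin'];

// Permissions granted at each tier (illustrative values).
const GRANTS = {
  user: ['modules:cochilo-bookings:bookings:create'],
  operator: ['modules:cochilo-iot:devices:read'],
  admin: ['modules:cochilo-iot:devices:create'],
};

function parsePermission(permission) {
  const parts = permission.split(':');
  if (parts.length !== 4 || parts[0] !== 'modules') {
    throw new Error(`Malformed permission: ${permission}`);
  }
  const [, domain, resource, action] = parts;
  return { domain, resource, action };
}

function can(role, permission) {
  parsePermission(permission); // throws on a malformed permission string
  const level = ROLE_ORDER.indexOf(role);
  if (level === -1) return false;
  return ROLE_ORDER.slice(0, level + 1).some((r) => (GRANTS[r] || []).includes(permission));
}

// Each route carries its own permission, so declaring the route is what
// attaches the access policy.
const routes = [
  { method: 'POST', path: '/devices', permission: 'modules:cochilo-iot:devices:create' },
];

function authorize(role, method, path) {
  const route = routes.find((r) => r.method === method && r.path === path);
  return Boolean(route) && can(role, route.permission);
}

console.log(authorize('admin', 'POST', '/devices'));    // true
console.log(authorize('operator', 'POST', '/devices')); // false
```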

Infrastructure Design

What I Would Deploy in Production

I designed the target-state cloud architecture for production readiness, even though the initial deployment was Docker on VMs.

Scaling Projections

Dimension	100 Pods	1,000 Pods	10,000 Pods
MQTT connections	100	1,000	10,000
Events/day	1,500	15,000	150,000
Telemetry msgs/min	200	2,000	20,000
API instances	1–2	2–4	4–8
DB read replicas	0	1	2–3
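
The first three rows of the table scale linearly, so they reduce to two per-pod rates: 15 events per pod per day and 2 telemetry messages per pod per minute. A small sketch of that arithmetic, with `projectLoad` as a hypothetical helper name:

```javascript
// Per-pod rates implied by the 100-pod column of the table.
const EVENTS_PER_POD_PER_DAY = 1500 / 100;   // 15
const TELEMETRY_PER_POD_PER_MIN = 200 / 100; // 2

function projectLoad(pods) {
  const telemetryPerMin = pods * TELEMETRY_PER_POD_PER_MIN;
  const eventsPerDay = pods * EVENTS_PER_POD_PER_DAY;
  return {
    mqttConnections: pods, // one persistent connection per pod
    eventsPerDay,
    telemetryPerMin,
    telemetryPerSec: telemetryPerMin / 60,
    eventsPerSec: eventsPerDay / 86400, // average, ignores daily peaks
  };
}

console.log(projectLoad(10000));
```

At 10,000 pods this works out to roughly 333 telemetry messages per second on average against fewer than 2 booking events per second, which is why the two streams are worth handling differently.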

IoT at Scale — Hot/Warm/Cold Pipeline

For 10,000+ pods, I designed a tiered ingestion pipeline that separates critical events from telemetry.
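
The core of such a pipeline is a routing decision made once per incoming message. The sketch below is an illustration under assumptions, not the designed pipeline: it treats the five IoT event types listed earlier as critical, and it assumes "hot" means immediate processing, "warm" means recent telemetry kept queryable, and "cold" means long-term archival.

```javascript
// Event types that affect an active booking and need immediate handling.
const CRITICAL_EVENTS = new Set([
  'WRONG_CODE', 'OPEN_DOOR', 'SESSION_START', 'SESSION_END', 'CLOSE_DOOR',
]);

// Returns the sinks a message should be delivered to. Tier meanings are
// assumptions: hot = process now, warm = recent telemetry store,
// cold = archive. Everything is archived; only the fast path differs.
function sinksFor(message) {
  return CRITICAL_EVENTS.has(message.type) ? ['hot', 'cold'] : ['warm', 'cold'];
}

console.log(sinksFor({ type: 'OPEN_DOOR' }));   // [ 'hot', 'cold' ]
console.log(sinksFor({ type: 'TEMPERATURE' })); // [ 'warm', 'cold' ]
```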

Tech Stack

Layer	Technology
Backend	Node.js, Express, MongoDB, Mongoose, RabbitMQ, Socket.io, Redis, Passport.js
Backoffice	React, Redux, Redux-Saga, Bootstrap, FullCalendar, Axios
Mobile	React Native, Redux, Redux-Saga, React Navigation, Google Maps, Geolocation
Infrastructure	Docker, PM2, GitLab CI/CD, designed for AWS (ECS, DocumentDB, IoT Core)
External Services	PagSeguro (payments), SendGrid (email), Twilio (SMS)

Retrospective — What I Would Do Differently

Building this system end-to-end taught me a lot. Here’s what I’d change with the benefit of hindsight:

1. MQTT from Day One, Not REST for IoT

What we did: IoT devices push events via REST (POST /devices/:id/events).

The problem: REST is request-response — no server-initiated commands, no persistent connection for health monitoring, higher latency for time-sensitive events like door unlocks.

What I’d do now: Start with MQTT (via AWS IoT Core or EMQX) from the beginning. Use topics like cochilo/cabine/{id}/events for device → platform and cochilo/cabine/{id}/commands for platform → device. This gives us bidirectional communication, connection state awareness (Last Will & Testament), and QoS guarantees.
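
A minimal sketch of helpers for the topic layout described above. The function names are hypothetical, and the code only builds and parses topic strings; it does not use any MQTT client library.

```javascript
const TOPIC_PREFIX = 'cochilo/cabine';

// Device -> platform and platform -> device topics for one pod.
const eventsTopic = (podId) => `${TOPIC_PREFIX}/${podId}/events`;
const commandsTopic = (podId) => `${TOPIC_PREFIX}/${podId}/commands`;

// Subscription filter the platform could use to receive every pod's
// events. `+` is the MQTT single-level wildcard.
const ALL_EVENTS_FILTER = `${TOPIC_PREFIX}/+/events`;

// Extracts the pod id and channel from a concrete topic, or returns null
// if the topic does not match the layout (wildcards are rejected).
function parseTopic(topic) {
  const match = /^cochilo\/cabine\/([^/+#]+)\/(events|commands)$/.exec(topic);
  return match ? { podId: match[1], channel: match[2] } : null;
}

console.log(eventsTopic('pod-42'));                      // cochilo/cabine/pod-42/events
console.log(parseTopic('cochilo/cabine/pod-42/commands')); // { podId: 'pod-42', channel: 'commands' }
```

Giving each pod its own `commands` topic also makes it straightforward to restrict a device's credentials to its own subtree.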

2. TypeScript Everywhere

What we did: Backend and mobile are JavaScript, with type hints via JSDoc in only some areas.

The problem: As the codebase grew, refactoring became risky. Model shape mismatches between backend responses and mobile/backoffice consumers caused silent bugs.

What I’d do now: TypeScript across all three applications with a shared types package. One source of truth for API contracts — if the backend changes a response shape, the mobile and backoffice get compile-time errors, not runtime crashes.

3. API Versioning & Contract Testing

What we did: Single /api/v1 prefix, no contract tests between clients and backend.

The problem: Breaking changes in the API required coordinated deploys across all three applications. We had incidents where a backend deploy broke the mobile app because a field name changed.

What I’d do now: Implement API contract testing (Pact or similar). Each client declares what it expects from the API. CI fails if the backend breaks a contract before the client is updated.
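
The idea behind consumer-driven contracts can be shown without any framework. The sketch below is not Pact's API; it is a hand-rolled check in which a client declares the fields and types it relies on, and the field names (`id`, `status`, `startsAt`, `pin`) are hypothetical.

```javascript
// What a client expects from a booking response (hypothetical fields).
const bookingContract = { id: 'string', status: 'string', startsAt: 'string', pin: 'string' };

// Returns a list of violations; an empty list means the contract holds.
// Extra fields in the response are allowed.
function checkContract(contract, response) {
  const violations = [];
  for (const [field, expectedType] of Object.entries(contract)) {
    if (!(field in response)) {
      violations.push(`missing field: ${field}`);
    } else if (typeof response[field] !== expectedType) {
      violations.push(`wrong type for ${field}: expected ${expectedType}, got ${typeof response[field]}`);
    }
  }
  return violations;
}

// A backend change that renames `startsAt` to `startTime` is caught:
const renamedResponse = { id: 'b1', status: 'CONFIRMED', startTime: '2024-01-01T10:00:00Z', pin: '1234' };
console.log(checkContract(bookingContract, renamedResponse)); // [ 'missing field: startsAt' ]
```

Run in the backend's CI against real handler output, a check like this turns the renamed-field incident into a failed build rather than a broken mobile release.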

4. Event Sourcing for Bookings

What we did: Bookings are mutable documents — status updates overwrite the previous state.

The problem: We lost the audit trail. When a booking ended up in an unexpected state, debugging required correlating logs across modules.

What I’d do now: Event-source the booking aggregate. Store events (BookingCreated, BookingConfirmed, SessionStarted, BookingCompleted) and derive current state from the event stream. This gives us a full audit log, enables time-travel debugging, and makes analytics much easier to build.
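
Deriving state from the event stream amounts to a fold over the events. A minimal sketch, assuming hypothetical status names for the first two events (`PENDING`, `CONFIRMED`) and a hypothetical `bookingId` field on `BookingCreated`:

```javascript
// Applies one event to the current state and returns the next state.
// The stream is expected to start with BookingCreated.
function applyEvent(state, event) {
  const next = (status) => ({ ...state, status, history: [...state.history, event.type] });
  switch (event.type) {
    case 'BookingCreated':
      return { id: event.bookingId, status: 'PENDING', history: [event.type] };
    case 'BookingConfirmed':
      return next('CONFIRMED');
    case 'SessionStarted':
      return next('INPROGRESS');
    case 'BookingCompleted':
      return next('COMPLETED');
    default:
      return state; // unknown events are ignored
  }
}

// Current state is never stored; it is rebuilt by replaying events.
function replay(events) {
  return events.reduce(applyEvent, null);
}

const stream = [
  { type: 'BookingCreated', bookingId: 'b1' },
  { type: 'BookingConfirmed' },
  { type: 'SessionStarted' },
  { type: 'BookingCompleted' },
];
console.log(replay(stream).status);             // COMPLETED
console.log(replay(stream.slice(0, 2)).status); // CONFIRMED
```

Replaying a prefix of the stream, as in the last line, is the "time-travel" part: it shows exactly what the booking looked like after any given event.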

5. Push Notifications from the Start

What we did: No push notifications. Users had to open the app to check booking status.

The problem: Users missed booking confirmations and session reminders. This hurt conversion and UX.

What I’d do now: Integrate Firebase Cloud Messaging (FCM) for mobile push and web push for the backoffice. Trigger notifications from RabbitMQ consumers — TRANSACTION.COMPLETED → push “Your pod is confirmed! PIN: XXXX”.

6. Environment & Secret Management

What we did: API keys and URLs hardcoded in the mobile app’s config file. HTTP in development, no secret rotation.

The problem: Security risk. Anyone decompiling the APK gets API keys. HTTP traffic is sniffable.

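What I’d do now: Build-time environment injection (react-native-config) to keep environment-specific URLs and keys out of source control, with the caveat that anything bundled into the app can still be extracted from the binary, so real secrets stay on the backend. Beyond that: HTTPS everywhere (even in dev with self-signed certs), secrets in AWS Secrets Manager with automatic rotation, and X.509 device certificates for IoT authentication.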

7. Observability Before We Needed It

What we did: Basic console.log debugging. No structured logging, no distributed tracing, no alerting.

The problem: When things broke in production, we SSH’d into the server and tailed logs. Finding the root cause of a failed booking could take hours.

What I’d do now: Structured JSON logging (Winston/Pino), OpenTelemetry for distributed tracing across the async RabbitMQ pipeline, CloudWatch dashboards for business metrics (bookings/hour, payment success rate, pod utilization), and PagerDuty alerts for anomalies.
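
The difference between `console.log` debugging and structured logging is mostly about emitting one machine-parseable record per event, with a shared identifier that survives the hop through the message queue. The sketch below is not the Winston or Pino API; `formatLog` and the `correlationId` and `bookingId` fields are hypothetical.

```javascript
// Formats one log entry as a single JSON line. Core keys are written
// last so caller-supplied fields cannot overwrite them.
function formatLog(level, message, fields = {}, now = new Date()) {
  return JSON.stringify({ ...fields, time: now.toISOString(), level, message });
}

// The same correlationId is attached in the API handler and in the queue
// consumer, so one search finds every step of a booking.
const correlationId = 'req-123';
console.log(formatLog('info', 'booking created', { correlationId, bookingId: 'b1' }));
console.log(formatLog('error', 'payment failed', { correlationId, bookingId: 'b1' }));
```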

Impact & Key Metrics

Designed and delivered a full-stack IoT platform from zero to production
Architected a system handling real-time device events, async payment workflows, and geospatial queries
Built a modular monolith that enabled a 4-person team to ship features independently across 6 domain modules
Designed the IAM system supporting 5 role tiers with hierarchical permission inheritance
Created an event-driven pipeline processing booking → payment → notification flows with failure compensation
Designed a scaling architecture supporting 100 to 10,000+ IoT devices with tiered ingestion

Built with care for users who just need a good nap.