PLATFORM ENGINEERING & ARCHITECTURE

Engineering a Multi-Tenant SaaS Platform from the Ground Up

How 20+ years of enterprise IT experience shaped the architecture of Path of Progress — a production SaaS platform built solo, held to enterprise standards from day one.

12+ Containerized Services
Schema-Level Tenant Isolation
Zero Stored Passwords
4-Layer Defense in Depth

01 / Philosophy

Enterprise Standards, Solo Execution

With over 20 years in the IT industry, I've seen firsthand what separates systems that survive production from systems that don't. Scalability, fault tolerance, and redundancy aren't things you bolt on later — they're foundational decisions that shape everything downstream.

"Architect it the way an enterprise would architect a customer-facing SaaS product. No shortcuts on security. No 'good enough for now' on data isolation. No skipping the connection pooler because the user count is low today."

— Design Principle, Day One

Every layer of the stack was designed with the assumption that it would need to scale, recover from failure, and withstand real-world operational pressure. The commitment to enterprise-grade practices — applied within the constraints of a solo development effort — is what defined the project and what made it a meaningful engineering challenge.

02 / Challenge

Enterprise Rigor on a Solo Budget

The core challenge was never "how do I build a chore app." It was: how do I deliver the same reliability, security, and operational maturity that a well-staffed engineering team would, while working as a single developer?

Multi-Tenancy with Real Isolation

Not just application-level filtering, but schema-level separation in the database, so that tenant boundaries are enforced structurally instead of depending on every query carrying the right WHERE clause.

Enterprise Auth for a Family Audience

Balancing robust SSO, secure token management, and session lifecycle controls with the reality that end users include kids on shared devices who sign in with a four-digit PIN.

Complex Workflow Orchestration

Recurring tasks with rotation logic, approval workflows, timed expirations, and nightly batch operations that must execute reliably without human oversight across every tenant.

Fault Tolerance & Observability from Day One

Health checks, automatic recovery, resource limits, centralized logging, and metrics collection so that problems surface before users notice them — not after.

Sustainable Solo Operations

A deployment model that's reproducible, containerized, and automated enough that one person can maintain it — without SSH-ing into a server at midnight to restart a process.

03 / Infrastructure

Containerized Multi-Service with a Phased Path to Orchestration

Every concern — API serving, background processing, task scheduling, caching, connection pooling, object storage, monitoring — runs in its own isolated container, orchestrated together on a shared network.

The containerization strategy was designed from the beginning as a phased approach: build and stabilize on Docker Compose first, then migrate to Kubernetes for multi-host orchestration and high availability before the platform goes fully public. This wasn't a deferral — it was a deliberate sequencing decision. Introducing container orchestration before the application architecture is rock-solid just multiplies complexity in two places at once.

By proving stability on a single-host Docker deployment first, the Kubernetes migration becomes a pure infrastructure concern rather than a debugging exercise against a moving application target.

A monolith would have coupled concerns that needed independent lifecycle management — a long-running background job shouldn't be able to take down the API. A full microservices deployment from day one would have introduced orchestration overhead before the service boundaries were proven. The containerized multi-service approach delivers fault isolation and independent restartability today, and every container is already designed to be portable to Kubernetes pod definitions — same images, same health checks, same resource constraints, different orchestrator.

03b / Key Infrastructure Decisions

Every Layer, Purpose-Built

Connection Pooling

All database traffic routes through a dedicated pooling layer using transaction-mode pooling, preventing connection exhaustion under load and allowing worker processes to scale without proportionally scaling database connections.
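Transaction-mode pooling has one consequence for a schema-per-tenant design worth spelling out: the pooler returns the server connection to the pool after every transaction, so session-level state such as a plain SET search_path can carry over to whichever client gets that connection next. A minimal sketch, assuming PostgreSQL behind a transaction-mode pooler such as PgBouncer (neither is named in this case study), scopes the tenant schema with SET LOCAL so it ends with the transaction. The `tenant_<id>` naming, the `shared` schema name, and the helper names are hypothetical.

```python
import re

# Hypothetical convention: one schema per tenant, named "tenant_<id>".
_SCHEMA_RE = re.compile(r"tenant_[a-z0-9_]{1,50}")
SHARED_SCHEMA = "shared"  # hypothetical name for the common schema


def quote_schema(schema: str) -> str:
    """Validate a tenant schema name, then return it as a quoted SQL identifier."""
    if not _SCHEMA_RE.fullmatch(schema):
        raise ValueError(f"invalid tenant schema name: {schema!r}")
    return f'"{schema}"'


def tenant_transaction_sql(schema: str, statements: list[str]) -> list[str]:
    """Wrap one tenant's statements in a transaction that scopes search_path.

    SET LOCAL is discarded at COMMIT or ROLLBACK, so when a transaction-mode
    pooler hands the server connection to another client, no tenant context
    is left behind on it.
    """
    return [
        "BEGIN",
        # Tenant schema first; the shared schema second for global catalog lookups.
        f"SET LOCAL search_path TO {quote_schema(schema)}, {SHARED_SCHEMA}",
        *statements,
        "COMMIT",
    ]
```

In production these statements would run through a database driver, whose own identifier-quoting helper (for example psycopg's `sql.Identifier`) is preferable to hand-rolled quoting.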

Background Task Processing

Runs in a dedicated worker container with its own resource limits, separate from the API server. Task results persist to the database — not to the in-memory cache — providing a durable audit trail of every automated operation.

Deliberately Ephemeral Cache

The caching layer serves as a message broker and pub/sub backbone, but stores nothing that can't be regenerated. If it restarts, no durable data is affected and all services reconnect automatically — simplifying recovery by design.

S3-Compatible Object Storage

A dedicated S3-compatible service handles file uploads and persistent assets, keeping binary data out of the database and out of application containers, with lifecycle management handled by the storage service.

04 / Data Architecture

Schema-Level Tenant Isolation

The platform supports multiple independent households, and each one needs complete data isolation: not just for privacy, but as a structural safeguard, so that a missed tenant filter in application code can't surface one family's data inside another's.

Each tenant receives its own database schema, with tenant-specific data — users, tasks, rewards, configurations — fully isolated at the database layer. Shared resources like the global digital item catalog and system-wide configuration live in a common schema accessible to all tenants.

Three isolation models were evaluated:

Shared Tables with Tenant ID Filtering

Simpler to implement, but isolation depends entirely on application logic. One missed WHERE clause is a data leak.

Separate Databases per Tenant

Maximum isolation, but operationally expensive. Migrations, backups, and connection management multiply with every tenant.

Schema-Based Isolation (Selected)

Enforces boundaries at the database level while keeping all tenants in a single instance. One migration set is applied to every tenant schema from a single codebase, one backup captures every tenant, and connection pooling remains centralized.
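In a schema-per-tenant layout a migration is written once but still has to be applied inside every tenant schema. A minimal sketch of that loop, with an in-memory ledger standing in for a per-schema migrations table (all names hypothetical): each migration is recorded per schema, so a re-run after a partial failure only applies what is still pending.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical migration: a name plus a function that runs it inside one schema.
Migration = tuple[str, Callable[[str], None]]


@dataclass
class MigrationLedger:
    """Records which migrations have been applied to which tenant schema."""
    applied: dict[str, set[str]] = field(default_factory=dict)

    def is_applied(self, schema: str, name: str) -> bool:
        return name in self.applied.get(schema, set())

    def mark(self, schema: str, name: str) -> None:
        self.applied.setdefault(schema, set()).add(name)


def migrate_all(schemas: list[str], migrations: list[Migration], ledger: MigrationLedger) -> int:
    """Apply every pending migration to every tenant schema; return how many ran."""
    ran = 0
    for schema in schemas:
        for name, apply in migrations:  # migrations run in declared order
            if ledger.is_applied(schema, name):
                continue  # already applied here, so re-runs are safe
            # In a real database, apply() and mark() should commit in one transaction.
            apply(schema)
            ledger.mark(schema, name)
            ran += 1
    return ran
```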

The background task system is also tenant-aware. Every scheduled job — nightly resets, rotation processing, penalty calculations — executes within the correct tenant context, ensuring that batch operations respect the same isolation boundaries as API requests.
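A minimal sketch of a tenant-aware batch runner, assuming jobs are plain Python callables that receive the tenant's schema name and using contextvars to carry the active tenant. `run_for_all_tenants` is hypothetical, and continuing past a tenant whose job fails is an illustrative choice rather than a documented behavior of the platform.

```python
import contextvars
import logging
from contextlib import contextmanager
from typing import Callable, Iterable, Iterator

# The tenant for the code currently executing; None means "no tenant bound".
current_tenant = contextvars.ContextVar("current_tenant", default=None)


@contextmanager
def tenant_context(schema: str) -> Iterator[None]:
    """Bind a tenant for the duration of a block, then restore the previous value."""
    token = current_tenant.set(schema)
    try:
        yield
    finally:
        current_tenant.reset(token)


def run_for_all_tenants(schemas: Iterable[str], job: Callable[[str], None]) -> dict[str, str]:
    """Run `job` once per tenant inside that tenant's context.

    A failure in one tenant is logged and recorded without stopping the others.
    """
    results: dict[str, str] = {}
    for schema in schemas:
        with tenant_context(schema):
            try:
                job(schema)
                results[schema] = "ok"
            except Exception:
                logging.exception("job failed for tenant %s", schema)
                results[schema] = "failed"
    return results
```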

05 / Identity & Access

Two-Stage Session Architecture

Enterprise security meets family-friendly UX through a layered session model that never compromises on either.

In an enterprise context, the answer to authentication is well-established: delegate to a dedicated identity provider, use industry-standard protocols, enforce strong credentials, and manage sessions with short-lived tokens. I didn't want to deviate from any of that.

But the reality of the target audience introduced constraints most enterprise applications never face. Kids don't have their own email addresses. Families share devices. Complex passwords and MFA aren't practical for an eight-year-old. The challenge was preserving the full enterprise security model at the perimeter while providing an age-appropriate experience inside.

Stage 1

Household Authentication

Enterprise-grade SSO via an external identity provider using OpenID Connect, with secure token exchange and a proper session lifecycle. The parent authenticates once, establishing a verified tenant context.

Stage 2

Profile Selection

Individual family members select their profile using a simplified PIN. The session upgrades from tenant-level to user-level, granting access to that specific user's data and permissions.

The system models its authentication states as a formal state machine. Switching profiles downgrades the session back to tenant-level without requiring re-authentication against the identity provider. A full sign-out terminates everything and explicitly prevents silent session reuse.

The identity provider handles everything it should (credential storage, token issuance, session management), and the application never stores or manages passwords directly.
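The session model maps naturally onto a small state machine. A minimal sketch, assuming three states (signed out, household-authenticated, profile-selected) and a hypothetical `verify_pin` callback; the OIDC exchange itself is out of scope and is represented only by the call to `household_login`.

```python
from enum import Enum, auto
from typing import Callable, Optional


class SessionState(Enum):
    SIGNED_OUT = auto()  # no identity-provider session
    HOUSEHOLD = auto()   # parent signed in via OIDC: tenant context only
    PROFILE = auto()     # a family member selected their profile with a PIN


class Session:
    def __init__(self, verify_pin: Callable[[str, str], bool]) -> None:
        self.state = SessionState.SIGNED_OUT
        self.tenant: Optional[str] = None
        self.user: Optional[str] = None
        self._verify_pin = verify_pin  # hypothetical check: (user_id, pin) -> bool

    def household_login(self, tenant: str) -> None:
        """Called after a successful OIDC login; establishes tenant-level context."""
        if self.state is not SessionState.SIGNED_OUT:
            raise RuntimeError("already signed in")
        self.state, self.tenant = SessionState.HOUSEHOLD, tenant

    def select_profile(self, user: str, pin: str) -> None:
        """Upgrade from tenant-level to user-level access."""
        if self.state is not SessionState.HOUSEHOLD:
            raise RuntimeError("profile selection requires a household session")
        if not self._verify_pin(user, pin):
            raise PermissionError("invalid PIN")
        self.state, self.user = SessionState.PROFILE, user

    def switch_profile(self) -> None:
        """Downgrade to tenant-level; no new identity-provider round trip."""
        if self.state is SessionState.PROFILE:
            self.state, self.user = SessionState.HOUSEHOLD, None

    def sign_out(self) -> None:
        """Terminate everything so the session cannot be silently reused."""
        self.state, self.tenant, self.user = SessionState.SIGNED_OUT, None, None
```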

06 / Workflow Engine

Database-Driven Task Orchestration

Far beyond basic CRUD — a composable workflow engine managing complex scheduling, multi-user assignments, and nightly batch reconciliation across tenant boundaries.

The platform's task system manages recurring schedules, multi-user rotation assignments, approval pipelines, timed claim-and-release mechanics, team-based shared assignments, and nightly batch reconciliation — all operating within multi-tenant schema isolation.

Composable Recurrence & Rotation

A task's recurrence schedule (when it appears) and its rotation assignment (who it's assigned to) are decoupled and compose independently without special-case logic. A weekly task can rotate between three users, each assigned every three weeks.
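The decoupling can be sketched as two independent functions: recurrence yields the dates a task appears, and rotation maps the n-th occurrence to an assignee. This assumes a fixed-interval recurrence and round-robin rotation, which are simplifications; the platform's actual rule types aren't described here.

```python
from datetime import date, timedelta
from itertools import islice
from typing import Iterator


def occurrences(start: date, every_days: int) -> Iterator[date]:
    """Recurrence: when the task appears (here, a fixed interval in days)."""
    current = start
    while True:
        yield current
        current += timedelta(days=every_days)


def assignee(occurrence_index: int, rotation: list[str]) -> str:
    """Rotation: who gets the n-th occurrence (round-robin)."""
    return rotation[occurrence_index % len(rotation)]


def schedule(start: date, every_days: int, rotation: list[str], count: int) -> list[tuple[date, str]]:
    """Compose the two independently: pair each occurrence with its assignee."""
    return [
        (day, assignee(i, rotation))
        for i, day in enumerate(islice(occurrences(start, every_days), count))
    ]
```

With a 7-day interval and three users, each user's turns land 21 days apart, which is the "every three weeks" case described in the paragraph above. Neither function knows about the other, so changing the recurrence never requires touching the rotation logic.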

Template-Instance Pattern

Shared community tasks act as templates. When claimed, the system generates a user-specific instance while the template manages its own lifecycle — resets, timed expirations, and availability constraints.
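A minimal sketch of the template-instance split, assuming a hypothetical `TaskTemplate` that enforces an availability limit and a claim window, and a `TaskInstance` created per claim. The field names and the one-open-claim default are illustrative.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class TaskInstance:
    template_id: str
    user_id: str
    expires_at: datetime
    completed: bool = False


@dataclass
class TaskTemplate:
    template_id: str
    claim_window: timedelta  # how long a claim stays valid (assumed)
    max_claims: int = 1      # availability constraint (assumed)
    instances: list[TaskInstance] = field(default_factory=list)

    def release_expired(self, now: datetime) -> None:
        """Timed expiration: drop uncompleted claims whose window has passed."""
        self.instances = [i for i in self.instances if i.completed or i.expires_at > now]

    def claim(self, user_id: str, now: datetime) -> Optional[TaskInstance]:
        """Create a user-specific instance if the template is still available."""
        self.release_expired(now)
        open_claims = [i for i in self.instances if not i.completed]
        if len(open_claims) >= self.max_claims:
            return None
        instance = TaskInstance(self.template_id, user_id, now + self.claim_window)
        self.instances.append(instance)
        return instance

    def reset(self) -> None:
        """Template-level reset (for example, nightly): clears all instances."""
        self.instances.clear()
```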

Shared-Instance Team Model

Multiple users link to the same task record through a junction table with role-based permissions. No data duplication — completion, progress, and rewards operate on a single source of truth.
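The junction-table model can be illustrated with SQLite from Python's standard library. The platform's database and actual columns aren't specified, so the task/task_member tables and the owner/member/viewer roles below are hypothetical. Completion is written once on the shared task row, so every linked member sees the same state.

```python
import sqlite3

SCHEMA = """
CREATE TABLE task (
    id INTEGER PRIMARY KEY,
    title TEXT NOT NULL,
    completed INTEGER NOT NULL DEFAULT 0
);
-- Junction table: many users link to one task record, each with a role.
CREATE TABLE task_member (
    task_id INTEGER NOT NULL REFERENCES task(id),
    user_id TEXT NOT NULL,
    role TEXT NOT NULL CHECK (role IN ('owner', 'member', 'viewer')),
    PRIMARY KEY (task_id, user_id)
);
"""

# Hypothetical permission rule: viewers can see the task but not complete it.
CAN_COMPLETE = {"owner", "member"}


def complete_task(conn: sqlite3.Connection, task_id: int, user_id: str) -> bool:
    """Mark the single shared task row complete if the user's role allows it."""
    row = conn.execute(
        "SELECT role FROM task_member WHERE task_id = ? AND user_id = ?",
        (task_id, user_id),
    ).fetchone()
    if row is None or row[0] not in CAN_COMPLETE:
        return False
    conn.execute("UPDATE task SET completed = 1 WHERE id = ?", (task_id,))
    return True
```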

Approval Workflows

Certain task completions are gated behind administrator review — entering a pending state with different reward and achievement logic depending on the resolution path.
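A minimal sketch of an approval gate: a submitted completion moves to PENDING, and the administrator's decision picks the resolution path. The reward rules here (full reward on approval, nothing on rejection) are placeholders; the case study only says the logic differs by path.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    ACTIVE = "active"
    PENDING = "pending"  # completed by the user, awaiting admin review
    APPROVED = "approved"
    REJECTED = "rejected"


@dataclass
class Submission:
    reward: int
    status: Status = Status.ACTIVE
    awarded: int = 0


def submit(sub: Submission) -> None:
    """User marks the task complete; it waits for review instead of paying out."""
    if sub.status is not Status.ACTIVE:
        raise ValueError(f"cannot submit from {sub.status.value}")
    sub.status = Status.PENDING


def resolve(sub: Submission, approved: bool) -> None:
    """Admin decision; each resolution path applies its own reward logic."""
    if sub.status is not Status.PENDING:
        raise ValueError("only pending submissions can be resolved")
    if approved:
        sub.status, sub.awarded = Status.APPROVED, sub.reward
    else:
        # Placeholder rule: a rejected completion earns nothing.
        sub.status, sub.awarded = Status.REJECTED, 0
```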

Nightly Batch Operations

Automated jobs run on schedule to maintain system integrity across all tenants:

Daily reset: returns completed and failed tasks to active status for the new day
Rotation advancement: processes rotation schedules and updates assignments
Streak reconciliation: identifies missed completion requirements and adjusts tracking
Penalty calculation: an optional per-tenant feature that applies consequences for incomplete mandatory tasks

Each job is tenant-aware, idempotent where possible, and logged for auditability. They run in the isolated background worker so batch processing doesn't affect API responsiveness.
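For the daily reset, idempotency can be as simple as recording the business date each tenant was last reset and skipping a second run for the same date. A minimal sketch with hypothetical in-memory state; against a database, the marker update and the task updates would belong in one transaction.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class TenantState:
    task_status: dict[str, str]  # task_id -> "active" | "completed" | "failed"
    last_reset: Optional[date] = None


def daily_reset(state: TenantState, today: date) -> int:
    """Return completed/failed tasks to active once per day; safe to re-run."""
    if state.last_reset == today:
        return 0  # already ran for this date, so a retry changes nothing
    changed = 0
    for task_id, status in list(state.task_status.items()):
        if status in ("completed", "failed"):
            state.task_status[task_id] = "active"
            changed += 1
    state.last_reset = today
    return changed
```

A retried or duplicated run on the same date is a no-op, so a completion recorded after the reset is not wiped out by a second trigger.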

07 / Observability

Two Layers of Visibility

Infrastructure monitoring tells me whether the system is healthy. In-app observability gives end users the same depth of visibility into their own data that I'd expect from an enterprise operations dashboard.

Infrastructure Layer

Container & Host Metrics

CPU, memory, network, and disk I/O per service, scraped by external monitoring and visualized in dashboards.

Centralized Log Aggregation

Full ELK stack — container logs, application logs, and audit events shipped to a durable, searchable store for incident investigation and pattern detection.

Health Checks & Auto-Recovery

Every critical service exposes health endpoints with automatic restart on failure. Resource ceilings prevent any single container from starving the others.

Application Layer

Real-Time Operational Views

Live data backed by WebSocket infrastructure — who's completed what, what's pending, what's overdue, across every active user.

Deep Reporting & Analytics

Completion rates, streak histories, transaction logs, per-question accuracy breakdowns, error pattern analysis — the drill-down depth of a BI tool.

Full Audit Logging

Every meaningful action captured — completions, purchases, transactions, approvals, administrative changes. Timestamped, attributed, fully traceable.

"You can't manage what you can't see. The people responsible for outcomes deserve the same quality of tooling that an operations team would expect — whether they're managing a data center or managing a household."

08 / Operational Maturity

Defense in Depth & Operational Sustainability

Security and operations aren't features — they're layers. Every layer catches what the one above might miss.

Defense-in-Depth Rate Limiting

Rate limiting operates at multiple layers of the stack. No single layer is solely responsible — each one catches what the layer above might miss:

Layer 1: Cloud WAF & Bot Mitigation
Layer 2: Firewall IPS Policies
Layer 3: Identity Provider Rate Limits
Layer 4: Database Connection Limits

Centralized Secrets Management

API keys, database credentials, third-party service tokens, and encryption keys are stored and managed through HashiCorp Vault. Secrets are injected at runtime rather than stored in environment files or compose configurations — ensuring they're centrally auditable, rotatable, and never stored in plaintext on disk alongside application code.
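On the application side, runtime injection typically means reading each secret from a location populated at start-up rather than from a committed env file. A minimal sketch, assuming a sidecar such as Vault Agent renders each secret to its own file under a hypothetical /run/secrets directory (the case study doesn't say which injection mechanism is used). The loader fails fast on a missing secret and never includes the value in errors.

```python
import os
from pathlib import Path

# Hypothetical mount point populated at runtime (for example, by a Vault Agent template).
SECRETS_DIR = Path(os.environ.get("SECRETS_DIR", "/run/secrets"))


class MissingSecretError(RuntimeError):
    pass


def read_secret(name: str, secrets_dir: Path = SECRETS_DIR) -> str:
    """Return one secret's value, read from disk on each call so rotated values are seen."""
    path = secrets_dir / name
    try:
        value = path.read_text(encoding="utf-8").strip()
    except FileNotFoundError:
        # Fail fast; the error names the secret but never reveals a value.
        raise MissingSecretError(f"secret {name!r} not found in {secrets_dir}") from None
    if not value:
        raise MissingSecretError(f"secret {name!r} is empty")
    return value
```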

Kubernetes-Ready Deployment

Multi-host high availability is a hard prerequisite before the platform goes fully public. All images are pre-built and pulled from a private registry: no build-on-deploy, no drift between environments. Because every service was containerized from day one with proper health checks, resource constraints, and dependency declarations, the Kubernetes migration is positioned to add multi-node scheduling, automated failover, horizontal pod autoscaling, and parallel queue processing across nodes.

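Health checks map directly to Kubernetes liveness and readiness probes. Declared dependency chains ensure services start in the correct order today; because Kubernetes has no direct equivalent of Compose's depends_on, that ordering carries over as readiness gating and, where needed, init containers. It's the final infrastructure gate before general availability.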
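When health checks become Kubernetes probes, liveness and readiness answer different questions: should this container be restarted, and should it receive traffic right now? A framework-free sketch, assuming hypothetical dependency checks passed in as callables; a web framework would expose the two functions as separate HTTP endpoints.

```python
from typing import Callable

Check = Callable[[], bool]


def liveness() -> tuple[int, dict]:
    """Fails only if this process itself is broken.

    Deliberately ignores dependencies, so a database outage makes pods
    unready instead of putting them into a restart loop.
    """
    return 200, {"status": "alive"}


def readiness(checks: dict[str, Check]) -> tuple[int, dict]:
    """Fails while any dependency is unavailable, so traffic is routed elsewhere."""
    results = {}
    for name, check in checks.items():
        try:
            results[name] = bool(check())
        except Exception:
            results[name] = False  # a check that raises counts as not ready
    code = 200 if all(results.values()) else 503
    return code, {"status": "ready" if code == 200 else "not_ready", "checks": results}
```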

Stateful Component Classification

Every stateful component is classified by criticality, driving backup strategy, recovery planning, and infrastructure investment:

Critical

External database, object storage, file upload volumes — durable storage, external hosting, backup automation.

Ephemeral

Cache layer, scheduler state, monitoring history — designed to recover through restart alone.

09 / Takeaways

Enterprise Patterns, Proven Solo

Schema-Level Data Isolation

Multi-tenant architecture with tenant boundaries enforced structurally at the database layer, rather than depending on per-query filtering in application code.

Delegated Identity Management

Zero stored passwords. Enterprise-grade SSO with a two-stage session model that balances security with real-world usability constraints.

Defense-in-Depth Security

Four-layer rate limiting, centralized secrets management through HashiCorp Vault, and WAF-level bot mitigation at the edge.

Full-Stack Observability

ELK-based centralized logging, container and host metrics, in-app audit trails, and real-time operational dashboards at both infrastructure and application layers.

Fault-Isolated Architecture

Every concern containerized independently with health checks, auto-recovery, and resource limits — designed from day one as Kubernetes-portable pod definitions.

Phased, Deliberate Scaling

Prove stability on Docker first, migrate to Kubernetes for multi-host HA before going public. Architectural judgment over architectural ambition.

"Enterprise best practices aren't just for enterprises. The discipline of applying these patterns from day one — rather than retrofitting them when something breaks — is what separates a sustainable platform from a fragile one. Knowing when to apply a pattern, how to right-size it, and where to draw the line between rigor and over-engineering — that's the real skill."

Need this level of architecture?

Whether you're building a new platform from scratch or modernizing an existing system, the same enterprise patterns apply. Let's talk about what your architecture needs.