Skip to content

ADR-0003: Tenant data isolation enforcement

  • Status: Superseded by ADR-0004 (2026-05-02)
  • Date: 2026-04-27
  • Decider(s): Theo (SA)

Superseded. The chokepoint mechanism described here (single SQLAlchemy session-event hook, ContextVar-bound current_stringer_id, loud-failure on unbound) is retained by ADR-0004. What changes in 0004:

  • The Player entity is replaced by a Person + ClientProfile split. Player.stringer_id no longer exists; ClientProfile.stringer_id plays that role for the per-stringer view, and Person carries no stringer scope.
  • The player_shares table is replaced by two share tables (order_shares for per-job grants from either a stringer or a client; person_stringer_share for client-initiated future-inclusive grants per target stringer).
  • The chokepoint predicate broadens accordingly. Writes remain strictly stringer_id = :me.
  • The mandated integration test (every tenant-scoped table refuses unscoped reads) stays, simply against the new table set.

Read ADR-0004 for the current decision. The text below is preserved verbatim for historical context.

Context

RBO V2 is multi-tenant from day one (Topic 1 — Stefan's call). The privacy expectation is strict per-stringer isolation by default, with one deliberate cross-tenant breach: the player_shares table allowing stringer A to share a Player record (and history) read-only with stringer B.

The fundamental risk is a silent cross-tenant data leak — a query forgets the WHERE stringer_id = :me clause and serves Player A's orders to Stringer B. At our scale, the data is mild PII (names, dates, CHF amounts), but the loss-of-trust consequence is severe enough that "we just have to remember" is not a defensible mitigation.

The platform-level decision in ADR-0001 commits us to row-level tenancy. This ADR commits to how the row-level filter is enforced, and what mechanism prevents regression.

Options

  • (1) Per-handler manual filtering. Every endpoint writes WHERE stringer_id = :me itself. Simple; relies on developer discipline; fails the silent-leak risk because the test surface is "every handler" with no central guarantee.
  • (2) Single ORM-session-layer chokepoint. A SQLAlchemy do_orm_execute event hook intercepts every query, appends the tenant filter, and refuses queries with no bound current_stringer_id. One implementation, one place to audit, one place to test.
  • (3) Postgres Row-Level Security (RLS) policies. Database-side enforcement. Strong isolation; survives ORM bugs. But: requires per-request SET LOCAL app.current_stringer_id = ...; fights gotrue's auth schema layout; harder to audit policy diff in code review; the visibility/share predicates are awkward to express purely in SQL policies.
  • (4) DB-per-tenant. Strong isolation by physical separation. Conflicts with Atlas's locked db-per-app policy. Operationally absurd at our 1–5 stringer scale.

Decision

Option 2: enforce at the SQLAlchemy ORM session layer.

Concretely:

  1. Tenant-scoped models declare __tenant_column__ = "stringer_id" or inherit TenantBase.
  2. Request middleware binds current_stringer_id (and current_role) into a ContextVar after JWT validation by gotrue.
  3. A do_orm_execute event hook on every Session:
  4. For tenant-scoped models in the FROM clause, AND-in <model>.stringer_id = :current_stringer_id.
  5. For models with a visibility column (Racket, String), additionally AND-in (<model>.created_by_stringer_id = :me OR <model>.visibility = 'shared').
  6. For Player, additionally AND-in (stringer_id = :me OR id IN (SELECT player_id FROM player_shares WHERE target_stringer_id = :me)).
  7. If current_stringer_id is unbound, the query is refused (raised exception, logged loudly).
  8. Admin role can opt into session.bypass_tenant = True per-session — the only legitimate way to read across tenants. The bypass is logged on every query it permits.

player_shares is the only deliberate cross-tenant relation. Writes to a shared Player remain owner-only.

The required integration test

This ADR mandates the test as part of the decision, not as a wishlist item:

Test: for every model class declared as tenant-scoped (__tenant_column__ set or TenantBase parent), open a SQLAlchemy session without binding current_stringer_id and attempt a SELECT * FROM <table> LIMIT 1. The test passes iff the query is refused (raises the tenancy-not-bound exception). The test must enumerate every tenant-scoped model class via reflection — adding a new tenant-scoped table without adding it to __tenant_column__ is a test failure.

This test is the gate that turns "we believe the chokepoint covers everything" into "we mechanically prove it on every CI run."

Consequences

Good

  • Single chokepoint — the audit surface is one event-hook implementation, not N handlers.
  • Failure-mode is loud — an unbound query raises rather than silently returning all rows.
  • The deliberate breach (player_shares) is exactly one place in the chokepoint. Anyone reviewing the code sees it; anyone widening it has to touch this single function.
  • Admin bypass is auditable — every bypassed query is logged. At one-admin scale this is operationally manageable; at multi-admin scale it remains tractable.
  • Visibility and share predicates compose cleanly — they're SQL fragments added by the same hook, not orthogonal mechanisms.

Costs we accept

  • Deal-breaker risk: silent data leak. Fully acknowledged. The chokepoint plus the integration test are the mitigation. This ADR exists because that risk is the headline.
  • Raw SQL via session.execute(text(...)) bypasses the hook. Code-review rule: every such call requires a comment explaining the scope. Where possible, prefer ORM queries.
  • The hook adds a small per-query CPU cost (parsing the FROM clause, mutating the WHERE clause). At our row counts and query volume, immeasurable.
  • Cross-app code reuse is constrained — the chokepoint is RBO-specific. CSD/ALJ on the same platform have different tenancy stories (mostly single-tenant). This is fine; it's not a platform-wide library.
  • Postgres RLS is left on the table as a future hardening if the chokepoint ever proves leaky. Adding RLS later is additive (defense-in-depth); doesn't invalidate this ADR.

Cross-references