architecture · Oct 2023 – Sep 2024

Migrating Aviation Operational Data from Relational DBs to MongoDB

Query performance +20% and downtime −15% migrating legacy relational systems to MongoDB across aviation operational workflows.

Role: Owned the schema redesign, aggregation-pipeline design, and the migration rollout
Team: Backend integrations team
Scope: Operational data powering enterprise integrations and reporting at Universal Weather and Aviation
Level: Software Engineer (SDE-1)

Query performance

baseline (relational)+20% faster

Method: Compared latency on the top operational queries before vs after migration, on the read-heavy hot paths the redesign targeted.
Window: post-migration

Operational downtime

baseline−15%

Method: Compared downtime attributable to the data layer before vs after the migration.
Window: post-migration

Context

At Universal Weather and Aviation I worked on the backend integrations powering enterprise airline operations across multiple GDS platforms. The operational data behind those integrations — and the reporting on top of it — lived in normalized relational tables, and the hot queries that joined across several tables were getting slower as the data grew.

The brief was to migrate that data to MongoDB. The interesting part wasn't moving bytes; it was deciding what shape the data should take on the other side, and how to move live operational data without taking the integrations down.

The data-model decision

The trap in a relational-to-document migration is to lift your tables straight across — keep the same normalized shape, replace SQL joins with application-side lookups, and end up paying the relational cost without the relational guarantees.

Instead I modeled documents around how the data is actually read: embed what's read together, and precompute the reporting views as indexed aggregation pipelines. The workload was overwhelmingly read- and report-heavy, so trading a little write-time complexity for read-time speed was the right exchange. That's where the ~20% query improvement came from.

Migrating without an outage

Because this was live operational data, I migrated incrementally rather than in a big-bang cutover: dual-write to both stores, verify that reads matched, then shift traffic one collection at a time. Every step was independently reversible — which is what kept the downtime change to −15% instead of risking a hard outage.

What broke

The honest part: an early aggregation pipeline that flew in staging did full collection scans in production, because the compound index I'd built didn't match the pipeline's match-and-sort order — the equality-then-sort-then-range field ordering a compound index needs in order to actually be used. Staging simply didn't have the data volume to expose it. A reporting path was slow for a short window after one cutover until I corrected the index. The lesson stuck — now I read the explain plan on every pipeline against production-scale data before a cutover, not after, and confirm it's an index scan rather than a collection scan.

Outcomes

Query performance improved roughly 20% on the hot paths, data-layer downtime dropped about 15%, and the document-modeling approach became the template for later reporting work on the team.

Decision Records

DR-001 · architecture

Redesigning the data model around read and aggregation patterns — rather than lifting normalized tables 1:1 — is what produced the query gain

Context

The legacy operational data lived in normalized relational tables. The hottest queries joined across several tables and were getting slower as data grew. A naive move to MongoDB would have preserved those joins as application-side lookups and made things worse, not better.

Options considered

✗
Keep relational, add a read replica
Buys headroom but doesn't address the structural cost of multi-table joins on the hot paths, and adds another piece of infrastructure to operate.
✗
Lift-and-shift tables 1:1 into MongoDB collections
Preserves the join shape as application-side lookups — you inherit the relational cost without the relational guarantees. A common migration trap.
✓
Redesign documents around the actual query and aggregation patterns
Embed the data that is read together; precompute reporting views with aggregation pipelines. Trades write-time complexity for read-time speed, which matched a read-heavy workload.

Tradeoffs

Denormalized documents mean some data is duplicated and writes touch more places. Acceptable because the workload is overwhelmingly read- and report-heavy, and the duplicated fields change rarely.

Outcome

The top operational queries got measurably faster and reporting aggregations moved from multi-join queries to indexed pipelines.

DR-002 · architecture

An incremental migration with a read-verification window is what kept the change safe on live operational data

Context

This was live operational data feeding enterprise integrations. A single big-bang cutover would have concentrated all the risk into one window with no easy rollback.

Options considered

✗
Big-bang cutover in a maintenance window
All risk concentrated in one window; rollback means restoring from backup and losing anything written during the window.
✓
Incremental migration with dual-writes and a read-verification window
Write to both stores, verify reads match, then shift traffic collection by collection. Each step is independently reversible.

Tradeoffs

Dual-writes add temporary complexity and a window where both stores must stay consistent. Worth it because reversibility on live operational data mattered more than a faster cutover.

Outcome

Traffic shifted incrementally without a major migration incident, and overall data-layer downtime dropped versus the relational baseline.

Where I Got It Wrong

Failure 1

An early aggregation pipeline that was fast in staging did full collection scans in production, because the supporting compound index didn't match the pipeline's match-and-sort order

Cost: A reporting path was slow for a short period after one of the incremental cutovers, until the index was corrected

Lesson: Validate aggregation pipelines against production-scale data and read the explain plan before cutover — staging data volume hides index problems that only surface at scale

Long-Term Impact

The document model and aggregation-pipeline approach became the pattern the team reused for later reporting work.