Rail & Transit Data Architecture April 17, 2026

Your Rail Network Collects More Data Than Any Modern Enterprise — And Makes Worse Decisions Than Most

Hint: It's not a technology failure. It's a data architecture failure.

Maria Wright-Noor
Data Engineer · Snowflake · Microsoft Fabric · Databricks · dbt

Every major US transit authority is drowning in signals. Telematics from rolling stock. Passenger flow sensors at platforms. Maintenance logs from depots. Timetable data. Asset management systems. Real-time signalling feeds. The data exists — and it multiplies every single day.

So why are 44% of transit officials unable to see what's actually happening inside their own networks? Why do a third of transit leaders identify data fragmentation as a significant barrier to sound management? Why do rail networks, operating assets with 30- to 50-year lifespans, still discover faults reactively after the disruption has already cost them passengers, budget, and credibility?

The answer isn't a lack of data. It's that the data doesn't talk to itself. And in 2025, that problem is no longer just operational. It's regulatory.

The Compliance Clock Is Running

The Federal Transit Administration's National Transit Database (NTD) requires transit agencies to submit structured, standardised data as a condition of receiving federal funding, and those requirements are tightening. For the 2025 and 2026 reporting years, the FTA has mandated new GTFS (General Transit Feed Specification) fields, expanded asset reporting categories (including ADA accessibility data), and added rail-specific infrastructure counts. Agencies that cannot cleanly extract, validate, and submit this data on schedule risk compliance failures that directly threaten their federal funding.

California has gone further. Caltrans now publishes monthly GTFS quality reports for every transit provider in the state — visible, public scorecards of data quality. Agencies that cannot meet the California Minimum GTFS Guidelines are put on two-year improvement plans. The era of transit data as an internal, informal, "figure it out later" problem is over.

The regulatory infrastructure is now built around the assumption that transit agencies have clean, structured, interoperable data. Most don't. And the gap between what regulators expect and what most agencies can actually deliver traces back to the same fragmented architecture that has been quietly degrading operational performance for years.


Three Silos. One System Breaking Down.

In our work with organisations undergoing data transformation, we see the same three fragmentation patterns repeat across US rail and transit networks of every size:

  1. Operations vs. Infrastructure

    Train operations and infrastructure management capture data independently, in incompatible formats, with no shared naming conventions or data contracts. When the FTA asks for a unified asset picture, teams scramble to reconcile spreadsheets that were never designed to align.

  2. Maintenance vs. Asset Intelligence

    Maintenance teams log faults reactively. IoT telemetry and vehicle health data live in a separate platform. The bridge between "what failed" and "what is about to fail" is never built — so predictive maintenance remains a strategy slide rather than a daily operational reality.

  3. Passenger vs. Network

    Ticketing data, journey planning data, and real-time disruption feeds operate on different systems. Passengers experience their journey as one thing. The agencies managing it experience it as three. Dynamic rerouting, demand forecasting, and real-time capacity decisions all require these silos to collapse. Most US networks haven't started.
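To make the silo problem concrete, here is a minimal Python sketch of bridging "what failed" and "what is about to fail" once both systems map to a shared asset key. The record types, the `MAINT_TO_CANONICAL` and `TELEM_TO_CANONICAL` crosswalk, the ID formats and the bearing-temperature field are all hypothetical illustrations, not any agency's real systems.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fault:
    work_order: str
    asset_ref: str  # the maintenance system's own vehicle label


@dataclass(frozen=True)
class Reading:
    vehicle_id: str  # the telematics platform's vehicle identifier
    bearing_temp_c: float


# Hypothetical crosswalk (the data contract): each system's local ID mapped to one canonical asset ID.
MAINT_TO_CANONICAL = {"CAR-0412": "RV-412", "CAR-0413": "RV-413"}
TELEM_TO_CANONICAL = {"veh412": "RV-412", "veh413": "RV-413"}


def join_on_canonical(faults, readings):
    """Pair each fault with the peak telemetry reading for the same asset.

    Returns (joined, unmatched): joined rows are (canonical_id, work_order,
    peak_temp_c); unmatched lists work orders that could not be linked.
    """
    peak = {}
    for r in readings:
        cid = TELEM_TO_CANONICAL.get(r.vehicle_id)
        if cid is not None:
            peak[cid] = max(peak.get(cid, float("-inf")), r.bearing_temp_c)

    joined, unmatched = [], []
    for f in faults:
        cid = MAINT_TO_CANONICAL.get(f.asset_ref)
        if cid in peak:
            joined.append((cid, f.work_order, peak[cid]))
        else:
            # Label drift or missing telemetry: report it instead of dropping it silently.
            unmatched.append(f.work_order)
    return joined, unmatched


faults = [Fault("WO-1", "CAR-0412"), Fault("WO-2", "CAR-412")]  # WO-2 carries a drifted label
readings = [Reading("veh412", 71.5), Reading("veh412", 88.0), Reading("veh413", 60.0)]
joined, unmatched = join_on_canonical(faults, readings)
print(joined)     # [('RV-412', 'WO-1', 88.0)]
print(unmatched)  # ['WO-2']
```

Comparing the raw identifiers ("CAR-0412" against "veh412") would match nothing at all. The crosswalk is what makes the join possible, and returning the unmatched work orders keeps naming drift visible instead of letting it quietly shrink the dataset.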

The cost is measurable. Agencies report scheduling repeated maintenance windows on the same corridor because the work was planned in isolation, when a unified data view would have bundled it into one. Compensation payments accumulate. And by the time an insight has been extracted, reconciled across systems, and escalated to a decision-maker, the moment to act has already passed.

What Data Interoperability Actually Means — and What It Unlocks

Data interoperability in transit is not about buying new platforms. It's about enforcing a shared schema — a common data contract — across every system that touches your network.

GTFS is the most visible example. It's not just a file format. It's a schema: a strict definition of how routes, stops, trips, timetables, and fares must be structured so that any system can read, compare, and act on the data. Over 10,000 agencies in more than 100 countries have adopted it — not because they had to, but because the moment data follows a consistent schema, it becomes usable across every downstream system simultaneously.
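Treating GTFS as a schema can be illustrated with a small validation pass. The following minimal sketch, assuming the feed's text files have already been loaded into memory, checks a subset of required columns and the foreign keys that tie routes, trips, stops and stop times together. `REQUIRED`, `validate_feed` and the sample feed are hypothetical; a real feed should go through a full GTFS validator, which checks far more rules.

```python
import csv
import io

# Columns this sketch treats as required for a simple fixed-route feed.
# The GTFS Schedule reference defines the full rules, including conditional fields.
REQUIRED = {
    "routes.txt": {"route_id", "route_type"},
    "stops.txt": {"stop_id"},
    "trips.txt": {"route_id", "service_id", "trip_id"},
    "stop_times.txt": {"trip_id", "stop_id", "stop_sequence"},
}


def read_table(text):
    """Parse CSV text into (set of column names, list of row dicts)."""
    reader = csv.DictReader(io.StringIO(text))
    rows = list(reader)
    return set(reader.fieldnames or []), rows


def validate_feed(feed):
    """feed maps file name -> CSV text; returns a list of human-readable errors."""
    errors, tables = [], {}
    for name, required in REQUIRED.items():
        if name not in feed:
            errors.append(f"{name}: file missing")
            tables[name] = []
            continue
        columns, rows = read_table(feed[name])
        missing = required - columns
        if missing:
            errors.append(f"{name}: missing columns {sorted(missing)}")
        tables[name] = rows

    # Foreign keys: every reference must resolve to a row in the parent file.
    route_ids = {r.get("route_id") for r in tables["routes.txt"]}
    stop_ids = {s.get("stop_id") for s in tables["stops.txt"]}
    trip_ids = {t.get("trip_id") for t in tables["trips.txt"]}
    for t in tables["trips.txt"]:
        if t.get("route_id") not in route_ids:
            errors.append(f"trips.txt: trip {t.get('trip_id')} has unknown route_id {t.get('route_id')}")
    for st in tables["stop_times.txt"]:
        if st.get("trip_id") not in trip_ids:
            errors.append(f"stop_times.txt: unknown trip_id {st.get('trip_id')}")
        if st.get("stop_id") not in stop_ids:
            errors.append(f"stop_times.txt: trip {st.get('trip_id')} has unknown stop_id {st.get('stop_id')}")
    return errors


feed = {
    "routes.txt": "route_id,route_short_name,route_type\nR1,Red,1\n",
    "stops.txt": "stop_id,stop_name,stop_lat,stop_lon\nS1,Union,38.897,-77.006\n",
    "trips.txt": "route_id,service_id,trip_id\nR1,WKDY,T1\nR9,WKDY,T2\n",
    "stop_times.txt": (
        "trip_id,arrival_time,departure_time,stop_id,stop_sequence\n"
        "T1,08:00:00,08:00:30,S1,1\n"
        "T1,08:05:00,08:05:30,S2,2\n"
    ),
}
for error in validate_feed(feed):
    print(error)
```

Run against the sample, it flags trip T2 for pointing at a route that does not exist and the second stop time for referencing an undefined stop S2. Neither error is visible inside any single file; each only appears when the files are read against a shared schema.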

When a US transit agency enforces schema discipline across its core data, these are the classes of decisions that improve:

  • Federal NTD Compliance. Structured, validated data pipelines replace the annual scramble to reconcile figures manually for FTA submission. Agencies with a sound data architecture report NTD compliance as a routine output, not a recurring fire drill.
  • On-Time Performance Reporting. OTP is the primary public KPI for every US transit agency. WMATA (Washington DC) now publishes real-time performance through its MetroPulse platform. The MTA (New York) reported 83.7% weekday subway on-time performance in 2025 — a 2.1-point improvement driven in part by data-led operational decisions. The MBTA (Boston) publishes monthly performance scorecards across heavy rail, bus, and paratransit. These numbers exist because the underlying data is structured, named consistently, and queryable in real time. Agencies without that foundation are guessing.
  • Predictive Maintenance Scheduling. When asset condition data, maintenance history, and telematics share a common schema, the system can identify degradation patterns before failure. Without schema alignment, these datasets cannot be joined — and every maintenance decision is made on incomplete information.
  • Dynamic Capacity Management. When ridership data (from ticketing), vehicle availability (from dispatch), and timetable data (from scheduling) follow a unified schema, operators can make real-time capacity decisions: holding a train, adding a vehicle, rerouting a service. When those datasets live in incompatible formats, the decision is made manually, late, and reactively.
  • ADA Compliance Tracking. New FTA NTD requirements mandate structured reporting on ADA-accessible station assets. Agencies without a governed asset data model cannot produce this accurately, and inaccurate ADA reporting carries serious legal and funding consequences.
  • Multi-Agency Trip Planning. In metro regions served by multiple operators — LA Metro and Metrolink, MTA and NJ Transit — a shared schema makes seamless journey planning possible. Without it, each agency's data lives in its own dialect, and passengers bear the cost of that translation failure.
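On-time performance is simple arithmetic once schedule and actuals share keys. A minimal sketch, assuming scheduled times come from GTFS `stop_times.txt` and actual arrivals from a vehicle-location feed keyed on the same `(trip_id, stop_id)`; the on-time window of 1 minute early to 5 minutes late is an assumption, since each agency publishes its own definition.

```python
import math

# Assumed on-time window; agencies define their own thresholds.
EARLY_TOLERANCE_S = 60   # up to 1 minute early counts as on time
LATE_TOLERANCE_S = 300   # up to 5 minutes late counts as on time


def to_seconds(hms):
    """Parse 'HH:MM:SS'. GTFS allows hours >= 24 for after-midnight service,
    so this avoids datetime parsing, which would reject '24:03:00'."""
    h, m, s = (int(part) for part in hms.split(":"))
    return h * 3600 + m * 60 + s


def on_time_performance(scheduled, actual):
    """scheduled and actual map (trip_id, stop_id) -> 'HH:MM:SS'.

    Returns (otp_percent, unmatched). Actuals with no schedule row are
    returned separately rather than silently excluded.
    """
    on_time = total = 0
    unmatched = []
    for key, observed in actual.items():
        if key not in scheduled:
            unmatched.append(key)
            continue
        delta = to_seconds(observed) - to_seconds(scheduled[key])
        total += 1
        if -EARLY_TOLERANCE_S <= delta <= LATE_TOLERANCE_S:
            on_time += 1
    otp = 100.0 * on_time / total if total else math.nan
    return otp, unmatched


scheduled = {("T1", "S1"): "08:00:00", ("T1", "S2"): "08:05:00",
             ("T2", "S1"): "23:58:00", ("T2", "S2"): "24:03:00"}
actual = {("T1", "S1"): "08:00:40", ("T1", "S2"): "08:11:00",
          ("T2", "S1"): "23:57:30", ("T2", "S2"): "24:04:00",
          ("T9", "S1"): "08:20:00"}
otp, unmatched = on_time_performance(scheduled, actual)
print(otp, unmatched)  # 75.0 [('T9', 'S1')]
```

Agencies differ on whether OTP is measured at every stop, at timepoints, or at terminals, so treat this as an illustration of the join rather than any agency's published formula. The unmatched list matters as much as the percentage: an observation that cannot be tied to a scheduled trip is a schema gap, not a rounding error.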

The Consequence of Waiting

Every quarter a US transit agency delays data unification, it pays a compounding cost: repeated maintenance windows, reactive fault response, manual NTD reconciliation, and growing exposure to regulatory non-compliance. The connected rail market is growing from $38 billion today toward $51 billion by 2030. The investment is already flowing. The question is whether your data architecture can absorb it, or whether new tools will simply land on top of the same fragmented foundation and produce the same fragmented results.

The data exists. The regulatory mandate exists. The performance gap is visible in every public dashboard. What's missing, in most cases, is the architecture that connects it.


Three Non-Negotiables for Getting This Right

  1. Adopt and enforce a shared schema across operational systems

    GTFS is the floor, not the ceiling. Every internal system — maintenance, dispatch, asset management — needs its own data contract that maps to a unified model. Without this, integration is perpetually temporary.

  2. Build a governed data layer, not just a data lake

    Centralising data without governing it creates a new silo — just a bigger one. Ownership, access controls, lineage tracking, and data quality rules must be built in from the start, not added later.

  3. Make compliance a data pipeline output, not a reporting task

    NTD submission, GTFS quality scores, ADA asset counts — these should flow from your data architecture automatically. If they require manual effort today, that manual effort is masking a structural problem.
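Non-negotiables 2 and 3 can be sketched together: a quality gate runs against a governed asset table, and the compliance figure is produced only when the gate passes. This is a minimal sketch under stated assumptions: the `Station` record, the rules in `check_quality` and the per-mode count are hypothetical and do not reproduce the actual NTD reporting forms.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Station:
    station_id: str
    mode: str                       # hypothetical mode codes, e.g. "heavy_rail"
    ada_accessible: Optional[bool]  # None means the station has not been assessed
    owner: str                      # accountable data owner (governance metadata)


class QualityGateError(Exception):
    """Raised when governed data fails a rule, so no report figure is produced."""


def check_quality(stations):
    """Return a list of rule violations; an empty list means the gate passes."""
    problems, seen = [], set()
    for s in stations:
        if s.station_id in seen:
            problems.append(f"{s.station_id}: duplicate station_id")
        seen.add(s.station_id)
        if s.ada_accessible is None:
            problems.append(f"{s.station_id}: ADA status not assessed")
        if not s.owner:
            problems.append(f"{s.station_id}: no data owner assigned")
    return problems


def ada_station_counts(stations):
    """Return {mode: (accessible, total)}, refusing to report on failing data."""
    problems = check_quality(stations)
    if problems:
        raise QualityGateError("; ".join(problems))
    counts = {}
    for s in stations:
        accessible, total = counts.get(s.mode, (0, 0))
        counts[s.mode] = (accessible + int(s.ada_accessible), total + 1)
    return counts


stations = [
    Station("ST1", "heavy_rail", True, "asset-mgmt"),
    Station("ST2", "heavy_rail", False, "asset-mgmt"),
    Station("ST3", "light_rail", True, "asset-mgmt"),
]
print(ada_station_counts(stations))  # {'heavy_rail': (1, 2), 'light_rail': (1, 1)}
```

Adding a station with `ada_accessible=None` or an empty `owner` makes `ada_station_counts` raise instead of returning a number. The design choice is deliberate: a failing rule stops the report, so data problems surface in the pipeline rather than in a reviewer's spreadsheet.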

This is the work — and it's overdue for most US transit networks.

If you're leading a transit or rail organisation wrestling with fragmented data, compliance pressure, or the gap between your operational reality and your performance dashboards, this is exactly what we help solve at One Big Table. The first step is always the same: understand what you actually have before you decide what to build.

What data challenge is your network dealing with right now?

