Methodology

How the performance claims on this site are produced.

Overview

The performance figures cited on this site come from running the platform on synthetic vehicle trajectory data, not on any specific employer's dataset. The synthetic data is generated with SUMO (Simulation of Urban MObility), an open-source traffic microsimulator developed by the German Aerospace Center (DLR) and licensed under EPL-2.0.

Synthetic data generation

Cities: Tokyo and Osaka.
Source road networks: OpenStreetMap extracts (ODbL).
Vehicles per simulated day: approximately 1 million per city.
Simulated duration: 1 day (24 hours).
Output: synthetic GPS pings every 5 seconds per vehicle, written to per-day Parquet partitions.
Total probe points: roughly 84.7 million for Tokyo and 96.3 million for Osaka (about 181 million combined).
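As a quick consistency check on these figures, dividing probe points by vehicle count gives the average number of pings per vehicle, and multiplying by the 5-second interval gives the implied average time each vehicle is active in the simulation. A minimal sketch, using the exact vehicle counts reported under Results and assuming every vehicle emits exactly one ping per interval while it is on the network:

```rust
// Back-of-envelope check on the synthetic dataset figures.
// Assumption: one ping per 5-second interval while a vehicle is on the network.
const PING_INTERVAL_S: f64 = 5.0;

/// Average number of pings emitted per vehicle.
fn pings_per_vehicle(probe_points: f64, vehicles: f64) -> f64 {
    probe_points / vehicles
}

/// Implied average time on the network per vehicle, in minutes.
fn minutes_on_network(probe_points: f64, vehicles: f64) -> f64 {
    pings_per_vehicle(probe_points, vehicles) * PING_INTERVAL_S / 60.0
}

fn main() {
    // (city, probe points, vehicles) as reported on this page.
    let cities = [("Tokyo", 84.7e6, 992_364.0), ("Osaka", 96.3e6, 989_820.0)];
    for (name, points, vehicles) in cities {
        println!(
            "{name}: {:.1} pings/vehicle, ~{:.1} min on network",
            pings_per_vehicle(points, vehicles),
            minutes_on_network(points, vehicles)
        );
    }
}
```

With these inputs the averages come out to roughly 85 pings (about 7 minutes) per Tokyo vehicle and 97 pings (about 8 minutes) per Osaka vehicle, so the average simulated vehicle is active for minutes rather than for the full 24-hour simulation.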

Pipeline architecture

A streaming Rust pipeline ingests the synthetic trajectories, map-matches them to the road network in parallel across all CPU cores, then enriches and aggregates the results into columnar geospatial output. It is designed to handle terabyte-scale inputs on commodity hardware.
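This page does not say which map-matching algorithm the pipeline uses. As a hypothetical illustration of the "map-match in parallel across all cores" step only, the sketch below snaps each ping to its geometrically nearest road segment and fans the work out over scoped threads from the Rust standard library. The `Segment` type, planar coordinates, and nearest-segment rule are all assumptions made for the example; practical matchers generally also use trajectory context and a spatial index instead of a linear scan.

```rust
use std::thread;

/// Hypothetical road segment in planar (projected) coordinates.
#[derive(Clone, Copy, Debug)]
struct Segment {
    id: u32,
    a: (f64, f64),
    b: (f64, f64),
}

/// Squared distance from point `p` to segment `s`.
fn dist2_to_segment(p: (f64, f64), s: &Segment) -> f64 {
    let (dx, dy) = (s.b.0 - s.a.0, s.b.1 - s.a.1);
    let len2 = dx * dx + dy * dy;
    // Project p onto the segment's line, clamped so the foot stays on the segment.
    let t = if len2 == 0.0 {
        0.0
    } else {
        (((p.0 - s.a.0) * dx + (p.1 - s.a.1) * dy) / len2).clamp(0.0, 1.0)
    };
    let (cx, cy) = (s.a.0 + t * dx, s.a.1 + t * dy);
    (p.0 - cx).powi(2) + (p.1 - cy).powi(2)
}

/// Snap one ping to its nearest segment by linear scan (no spatial index).
fn nearest_segment(p: (f64, f64), segments: &[Segment]) -> Option<u32> {
    segments
        .iter()
        .min_by(|x, y| dist2_to_segment(p, x).total_cmp(&dist2_to_segment(p, y)))
        .map(|s| s.id)
}

/// Split the pings into one chunk per available core, match each chunk on a
/// scoped thread, and concatenate the results in input order.
fn match_parallel(points: &[(f64, f64)], segments: &[Segment]) -> Vec<Option<u32>> {
    let workers = thread::available_parallelism().map(|n| n.get()).unwrap_or(1);
    let chunk_len = points.len().div_ceil(workers).max(1);
    thread::scope(|scope| {
        let handles: Vec<_> = points
            .chunks(chunk_len)
            .map(|chunk| {
                scope.spawn(move || {
                    chunk.iter().map(|&p| nearest_segment(p, segments)).collect::<Vec<_>>()
                })
            })
            .collect();
        handles
            .into_iter()
            .flat_map(|h| h.join().expect("worker thread panicked"))
            .collect()
    })
}

fn main() {
    // Two parallel east-west roads, 5 units apart.
    let segments = [
        Segment { id: 1, a: (0.0, 0.0), b: (10.0, 0.0) },
        Segment { id: 2, a: (0.0, 5.0), b: (10.0, 5.0) },
    ];
    let pings = [(1.0, 0.5), (4.0, 4.0), (9.0, 2.0)];
    println!("{:?}", match_parallel(&pings, &segments)); // [Some(1), Some(2), Some(1)]
}
```

Scoped threads let workers borrow the input slices directly without `Arc` or copying, and because each chunk's results are joined in spawn order, the output lines up index-for-index with the input pings.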

Benchmark setup

Hardware: a single commodity multi-core workstation.
OS: Linux.
Toolchain: rustc 1.95.0 (release profile, codegen-units tuned).
Pipeline configuration: production defaults (all cores, streaming enabled, tuned spatial index).
What is measured: wall-clock time from process start to output flush, including startup, index build, all I/O, and matching.
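The measured quantity is end-to-end wall-clock time, not matching time alone. A minimal sketch of that distinction, with hypothetical toy stages standing in for the real index build, matching, and output: the clock starts before any work and stops only after output is flushed, so the total necessarily covers every stage. Timing from inside `main` misses the small pre-`main` process start-up; an external tool such as GNU `time -v` covers the whole process and also reports maximum resident set size, one common way to obtain a peak-RAM figure.

```rust
use std::io::Write;
use std::time::{Duration, Instant};

/// Run `f` and return its result together with the wall-clock time it took.
fn timed<T>(f: impl FnOnce() -> T) -> (T, Duration) {
    let start = Instant::now();
    let out = f();
    (out, start.elapsed())
}

/// Per-stage and end-to-end wall-clock times for one run.
#[derive(Debug)]
struct Timings {
    index: Duration,
    matching: Duration,
    flush: Duration,
    total: Duration,
}

/// Toy run with hypothetical stand-in stages (not the real pipeline).
fn run_pipeline(n: u64) -> Timings {
    // Start the clock before any work so every stage falls inside `total`.
    let run_start = Instant::now();
    let (index, index_t) = timed(|| (0..n).collect::<Vec<u64>>());
    let (matched, matching_t) = timed(|| index.iter().filter(|&&x| x % 3 == 0).count());
    let (_, flush_t) = timed(|| {
        let mut out = std::io::BufWriter::new(Vec::new());
        writeln!(out, "matched {matched} records").expect("write to Vec cannot fail");
        // The run is only over once buffered output has been flushed.
        out.flush().expect("flush to Vec cannot fail");
    });
    Timings {
        index: index_t,
        matching: matching_t,
        flush: flush_t,
        total: run_start.elapsed(),
    }
}

fn main() {
    let t = run_pipeline(1_000_000);
    println!("{t:?}");
}
```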

Results

Two runs underpin the speed and scale claims.

Speed pilot (Niigata, ~17 million probe records): the Rust matcher finishes in about 2 minutes (~140,000 records/sec) at ~3 GB peak RAM. On the same hardware, an equivalent single-threaded Python implementation takes ~45 minutes (~12 GB) and a Java implementation ~12 minutes (~8 GB). That makes the Rust matcher roughly 20× faster than Python and 6× faster than Java, using about a quarter of Python's peak memory and under 40% of Java's.

Production scale (SUMO synthetic): the full Tokyo run ingests 992,364 vehicles / 84.7M probe points and writes 2,473,954 matched-link rows in 2 h 09 m (19 GB peak RAM). The Osaka run ingests 989,820 vehicles / 96.3M probe points and writes 1,924,373 matched-link rows in 1 h 39 m (13 GB peak RAM).
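The derived figures in this section follow from the raw ones by simple division. A short sketch that recomputes them from the values quoted here, treating the approximate pilot figures as exact:

```rust
/// Records (or probe points) processed per second of wall-clock time.
fn throughput(records: f64, seconds: f64) -> f64 {
    records / seconds
}

/// Convert an "h m" duration to seconds.
fn hm_to_secs(hours: u32, minutes: u32) -> f64 {
    f64::from(hours * 3600 + minutes * 60)
}

fn main() {
    // Speed pilot (Niigata), using the approximate figures quoted above.
    let records = 17e6;
    let (rust_s, python_s, java_s) = (hm_to_secs(0, 2), hm_to_secs(0, 45), hm_to_secs(0, 12));
    let (rust_gb, python_gb, java_gb) = (3.0, 12.0, 8.0);
    println!("Rust: {:.0} records/s", throughput(records, rust_s));
    println!("speedup: {:.1}x vs Python, {:.1}x vs Java", python_s / rust_s, java_s / rust_s);
    println!(
        "peak RAM: {:.0}% of Python's, {:.1}% of Java's",
        100.0 * rust_gb / python_gb,
        100.0 * rust_gb / java_gb
    );

    // Production-scale runs: (city, probe points, wall-clock time).
    let runs = [("Tokyo", 84.7e6, hm_to_secs(2, 9)), ("Osaka", 96.3e6, hm_to_secs(1, 39))];
    for (city, points, secs) in runs {
        println!("{city}: {:.0} probe points/s", throughput(points, secs));
    }
}
```

The pilot works out to about 142,000 records/s, a 22.5× speedup over Python and 6× over Java, at 25% of Python's and 37.5% of Java's peak memory. The production runs come out near 11,000 (Tokyo) and 16,000 (Osaka) probe points per second, roughly an order of magnitude below the pilot rate.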

Caveats

Synthetic SUMO trajectories follow simulated driver behavior with idealized GPS sampling. Real-world probe data carries additional noise (GPS drift, dropouts, partial trips), which the production pipeline handles in preprocessing stages that SUMO output does not exercise. The numbers here therefore represent the matcher's peak performance on clean inputs; on noisy real-world data, production throughput typically reaches 50–80% of these figures. All benchmark numbers cited elsewhere on this site come from runs on the synthetic dataset described above; the site does not publish figures from any specific employer engagement.
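Because the band is expressed as a fraction of throughput, its effect on run time is the reciprocal: at 50–80% of the clean-input rate, the same job takes 1.25 to 2 times as long. A minimal sketch applying the band to two figures from this page, as illustrative arithmetic rather than measured real-world results:

```rust
/// Lower and upper bounds of the 50-80% real-world throughput band.
const BAND: (f64, f64) = (0.5, 0.8);

/// Throughput range implied for noisy real-world data.
fn real_world_throughput(clean_per_s: f64) -> (f64, f64) {
    (clean_per_s * BAND.0, clean_per_s * BAND.1)
}

/// Wall-clock range implied for the same input: time scales with
/// 1 / fraction, so the 80% bound gives the shorter run.
fn real_world_secs(clean_secs: f64) -> (f64, f64) {
    (clean_secs / BAND.1, clean_secs / BAND.0)
}

fn main() {
    // Niigata pilot: ~140,000 records/s on clean input.
    let (lo, hi) = real_world_throughput(140_000.0);
    println!("pilot: {lo:.0} to {hi:.0} records/s");

    // Tokyo run: 2 h 09 m = 7,740 s on clean input.
    let (fast, slow) = real_world_secs(7_740.0);
    println!("Tokyo: {:.1} to {:.1} hours", fast / 3600.0, slow / 3600.0);
}
```

For the Niigata pilot this gives about 70,000 to 112,000 records/s; for the Tokyo run, roughly 2.7 to 4.3 hours instead of 2 h 09 m.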