Performance
This page reports the cost of generating a CBC catalogue data product with
gwmock-signal across backends (lal, pycbc, ripple), methods (per-event vs. the
batched on-device path), and hardware.
Each run produces the catalogue twice: a cold run that pays one-time
JIT/XLA compilation, and a warm steady-state run. It records both wall
times, their difference (compile_seconds), throughput, core-hours, peak memory,
and output size. The warm numbers are the headline: at catalogue scale the
one-time compile amortizes away, so steady state is what a year-long run
actually sees. The cold points are kept beside them because a GPU's compile is
larger than a CPU's, which can mask the device's advantage at small event
counts. In the table below, for example, the A30 and the EPYC both finish the
cold ripple batched run in 27 s, although the A30's warm run takes 12 s against
19 s.
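The relationship between the recorded quantities can be sketched in a few lines of Python. This is an illustration only, not the benchmark's own code: the helper name `derived_metrics` is hypothetical, the core-hours formula (warm wall time times core count) is an assumption, and flooring the compile time at zero is a guess based on the table rows where the cold run was faster than the warm one and compile is listed as 0.0. The event count of 5000 is taken from the reproduce command and is assumed to apply to the tabulated runs.

```python
def derived_metrics(n_events, cold_wall_s, warm_wall_s, n_cores=1):
    """Derive per-run metrics from the two measured wall times.

    Hypothetical helper for illustration; not part of gwmock-signal.
    """
    if warm_wall_s <= 0:
        raise ValueError("warm_wall_s must be positive")
    return {
        # One-time JIT/XLA cost: cold minus warm, floored at zero (assumed).
        "compile_seconds": max(float(cold_wall_s - warm_wall_s), 0.0),
        # Steady-state throughput uses the warm run only.
        "warm_events_per_s": n_events / warm_wall_s,
        # Assumed definition: warm wall time scaled by the cores in use.
        "core_hours": warm_wall_s * n_cores / 3600.0,
    }


# lal per-event on the EPYC row: cold 151 s, warm 148 s.
row = derived_metrics(n_events=5000, cold_wall_s=151, warm_wall_s=148)
print(round(row["compile_seconds"], 1), round(row["warm_events_per_s"]))  # 3.0 34
```

With 5000 events this reproduces the tabulated 34 ev/s and 3.0 s compile for that row. Rows whose wall times are shown rounded may differ from the table in the last digit.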
The charts are scatter plots over the waveform model (x-axis): each
backend/method/hardware cell is a coloured point, so the hardware comparison
lives in the colour legend and models sit side by side. Cold and warm appear as
different point shapes, and models are sorted with the best result on the
left (highest throughput, lowest wall time / memory). The gwmock-signal
version each cell was produced with is shown in the tooltip and the table.
The charts are interactive — hover for exact values, click a cell in the legend to isolate it, and use the chart menu to export. The table below is sortable and searchable.
| cell | model | device | gwmock-signal | warm ev/s | cold wall (s) | warm wall (s) | compile (s) | peak mem (GB) | output (GB) | contributor |
|---|---|---|---|---|---|---|---|---|---|---|
| lal per-event (CPU) | IMRPhenomD | AMD EPYC 7643 48-Core Processor | 0.9.0 | 34 | 151 | 148 | 3.0 | 2.0 | 0.81 | |
| lal per-event (CPU) | IMRPhenomD | Intel(R) Core(TM) i7-4770K CPU @ 3.50GHz | 0.9.0 | 26 | 192 | 191 | 1.2 | 2.0 | 0.81 | |
| pycbc per-event (CPU) | IMRPhenomD | AMD EPYC 7643 48-Core Processor | 0.9.0 | 9 | 569 | 566 | 3.9 | 2.4 | 0.81 | |
| pycbc per-event (CPU) | IMRPhenomD | Intel(R) Core(TM) i7-4770K CPU @ 3.50GHz | 0.9.0 | 7 | 767 | 767 | 0.2 | 2.4 | 0.81 | |
| ripple batched (CPU) | IMRPhenomD | AMD EPYC 7643 48-Core Processor | 0.9.0 | 262 | 27 | 19 | 7.7 | 11.1 | 0.81 | |
| ripple batched (CPU) | IMRPhenomD | Intel(R) Core(TM) i7-4770K CPU @ 3.50GHz | 0.9.0 | 177 | 37 | 28 | 8.4 | 9.5 | 0.81 | |
| ripple batched (GPU a30) | IMRPhenomD | NVIDIA A30 | 0.9.0 | 420 | 27 | 12 | 15.1 | 6.6 | 0.81 | |
| ripple batched (GPU) | IMRPhenomD | NVIDIA GeForce RTX 5060 Ti | 0.9.0 | 225 | 36 | 22 | 13.5 | 6.5 | 0.81 | |
| ripple per-event (CPU) | IMRPhenomD | AMD EPYC 7643 48-Core Processor | 0.9.0 | 3 | 1907 | 1999 | 0.0 | 5.6 | 0.81 | |
| ripple per-event (CPU) | IMRPhenomD | Intel(R) Core(TM) i7-4770K CPU @ 3.50GHz | 0.9.0 | 2 | 2240 | 2312 | 0.0 | 5.7 | 0.81 | |
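To see how a larger compile cost can hide a faster device at small event counts, the ripple batched rows can be fed into a simple break-even estimate. The sketch below assumes a deliberately simplified model in which cold wall time is the compile time plus the event count divided by warm throughput. The function name `break_even_events` is hypothetical, and the model ignores any other fixed overheads.

```python
def break_even_events(compile_a_s, rate_a, compile_b_s, rate_b):
    """Event count at which device B's cold wall time equals device A's.

    Assumed model: cold_wall = compile + n_events / rate.
    B is the device with the larger compile cost and the higher rate.
    """
    if rate_b <= rate_a:
        raise ValueError("device B must have the higher throughput")
    # Solve compile_a + n / rate_a == compile_b + n / rate_b for n.
    return (compile_b_s - compile_a_s) / (1.0 / rate_a - 1.0 / rate_b)


# ripple batched, EPYC CPU (7.7 s compile, 262 ev/s)
# vs NVIDIA A30 (15.1 s compile, 420 ev/s), values from the table.
n = break_even_events(7.7, 262, 15.1, 420)
print(round(n))  # 5154
```

Under this model the A30 only pulls ahead of the EPYC on a cold run beyond roughly 5000 events, which matches the two rows sharing a 27 s cold wall time while their warm times differ.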
Reproduce / contribute
uv run --extra signal gwmock-benchmark signal performance \
--backend ripple --method batched --n-events 5000 \
-o data/signal/performance/ripple_batched_<your-gpu>.json
Then open a pull request adding the data file; figures and tables regenerate automatically. See Contribute a benchmark.