Performance

This page reports the cost of generating a CBC catalogue data product with gwmock-signal across backends (lal, pycbc, ripple), methods (per-event versus the batched on-device path), and hardware.

Each run produces the catalogue twice: a cold run that pays one-time JIT/XLA compilation, then a warm steady-state run. It records both wall times, their difference (compile_seconds), throughput, core-hours, peak memory, and output size. The warm numbers are the headline: at catalogue scale the one-time compile amortizes away, so steady state is what a year-long run actually sees. The cold points are kept beside them because compilation takes longer on a GPU than on a CPU, which can mask the device's advantage at small event counts.
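The derived quantities named above can be sketched from the two wall times. This is a minimal illustration, not the gwmock-benchmark implementation: the function name, the `n_cores` parameter, the core-hours formula, and the clamping of the compile estimate at zero are all assumptions (the clamp is suggested by the 0.0 compile entries in rows where the cold run was faster than the warm one).

```python
from dataclasses import dataclass


@dataclass
class RunMetrics:
    compile_seconds: float
    warm_events_per_second: float
    warm_core_hours: float


def derive_metrics(n_events, cold_wall_s, warm_wall_s, n_cores=1):
    """Derive summary metrics from one cold and one warm run (hypothetical helper)."""
    # Assumption: compile cost is the cold/warm difference, clamped at zero
    # so that timing noise never produces a negative compile time.
    compile_seconds = max(cold_wall_s - warm_wall_s, 0.0)
    # Steady-state throughput uses the warm wall time only.
    warm_events_per_second = n_events / warm_wall_s
    # Assumption: core-hours = wall time x cores in use, converted to hours.
    warm_core_hours = warm_wall_s * n_cores / 3600.0
    return RunMetrics(compile_seconds, warm_events_per_second, warm_core_hours)


if __name__ == "__main__":
    # Rounded wall times from the NVIDIA A30 row, with 5000 events.
    print(derive_metrics(n_events=5000, cold_wall_s=27.0, warm_wall_s=12.0))
```

Because the table shows rounded wall times, values recomputed this way will differ slightly from the tabulated compile and throughput columns.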

The charts are scatter plots over the waveform model (x-axis): each backend/method/hardware cell is a coloured point, so the hardware comparison lives in the colour legend and models sit side by side. Cold and warm runs appear as different point shapes, and models are sorted with the best result on the left (highest throughput, or lowest wall time or memory). The gwmock-signal version each cell was produced with is shown in the tooltip and the table.

The charts are interactive: hover for exact values, click a cell in the legend to isolate it, and use the chart menu to export. The table below is sortable and searchable.

cell	model	device	gwmock-signal	warm ev/s	cold wall (s)	warm wall (s)	compile (s)	peak mem (GB)	output (GB)
lal per-event (CPU)	IMRPhenomD	AMD EPYC 7643 48-Core Processor	0.9.0	34	151	148	3.0	2.0	0.81
lal per-event (CPU)	IMRPhenomD	Intel(R) Core(TM) i7-4770K CPU @ 3.50GHz	0.9.0	26	192	191	1.2	2.0	0.81
pycbc per-event (CPU)	IMRPhenomD	AMD EPYC 7643 48-Core Processor	0.9.0	9	569	566	3.9	2.4	0.81
pycbc per-event (CPU)	IMRPhenomD	Intel(R) Core(TM) i7-4770K CPU @ 3.50GHz	0.9.0	7	767	767	0.2	2.4	0.81
ripple batched (CPU)	IMRPhenomD	AMD EPYC 7643 48-Core Processor	0.9.0	262	27	19	7.7	11.1	0.81
ripple batched (CPU)	IMRPhenomD	Intel(R) Core(TM) i7-4770K CPU @ 3.50GHz	0.9.0	177	37	28	8.4	9.5	0.81
ripple batched (GPU a30)	IMRPhenomD	NVIDIA A30	0.9.0	420	27	12	15.1	6.6	0.81
ripple batched (GPU)	IMRPhenomD	NVIDIA GeForce RTX 5060 Ti	0.9.0	225	36	22	13.5	6.5	0.81
ripple per-event (CPU)	IMRPhenomD	AMD EPYC 7643 48-Core Processor	0.9.0	3	1907	1999	0.0	5.6	0.81
ripple per-event (CPU)	IMRPhenomD	Intel(R) Core(TM) i7-4770K CPU @ 3.50GHz	0.9.0	2	2240	2312	0.0	5.7	0.81
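To make the amortization argument concrete, the sketch below estimates the event count at which a cell with a larger compile cost but higher warm throughput overtakes a cheaper-to-compile cell. It assumes a simple linear cost model (total time = compile time + events / warm throughput), which is an illustration rather than something the benchmark itself reports; the function names are hypothetical.

```python
def total_seconds(n_events, compile_s, events_per_s):
    """Assumed linear cost model: one-time compile plus steady-state generation."""
    return compile_s + n_events / events_per_s


def break_even_events(compile_a, rate_a, compile_b, rate_b):
    """Event count above which cell b is faster than cell a overall.

    Returns None when b has no per-event advantage, so it never pulls ahead
    at scale. Returns 0.0 when b is ahead from the first event.
    """
    per_event_saving = 1.0 / rate_a - 1.0 / rate_b
    if per_event_saving <= 0:
        return None
    extra_compile = compile_b - compile_a
    return max(extra_compile / per_event_saving, 0.0)


if __name__ == "__main__":
    # Table values: ripple batched on the EPYC CPU (7.7 s compile, 262 ev/s)
    # versus ripple batched on the A30 GPU (15.1 s compile, 420 ev/s).
    n = break_even_events(7.7, 262.0, 15.1, 420.0)
    print(f"break-even at about {n:.0f} events")
```

With the rounded table values, the crossover lands at a little over 5,000 events, which fits the two cells showing the same 27 s cold wall time. Above that count the GPU's higher steady-state throughput dominates under this model.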

Reproduce / contribute

uv run --extra signal gwmock-benchmark signal performance \
    --backend ripple --method batched --n-events 5000 \
    -o data/signal/performance/ripple_batched_<your-gpu>.json

Then open a pull request adding the data file; figures and tables regenerate automatically. See Contribute a benchmark.