Generating Data
This guide shows how to use gwmock to create realistic mock data for
gravitational-wave detectors.
It uses the Einstein Telescope (ET) triangular configuration located in the Meuse-Rhine Euregion as an example.
For an overview of all example configuration files for ET simulations, see the Examples page. For a quick guide on reading and working with the output GWF files, see the Reading Data page.
Generating Detector Noise
Detector noise can be generated using configuration files in the examples/noise directory. An example configuration for producing one day of ET noise data is provided in uncorrelated_gaussian/et_triangle_emr/config.yaml:
globals:
  # Runtime arguments shared by simulators (can be overridden per-simulator)
  simulator-arguments:
    # Sampling frequency in Hz (affects time resolution and file size)
    sampling-frequency: 4096
    # Duration of each generated segment in seconds
    duration: 4096
    # Human-friendly total duration (may be parsed by the config loader)
    total-duration: 1 day
    # GPS start time for the simulation
    start-time: 1577491218
  # Working directory for generated outputs (relative or absolute)
  working-directory: .
  # Default directory where generated frame files are written
  output-directory: output
  # Directory to store metadata sidecars and checkpoints
  metadata-directory: metadata
simulators:
  # Uncorrelated Gaussian/Colored noise simulator config for ET triangular detectors
  noise:
    class: gwmock_noise.ColoredNoiseSimulator
    arguments:
      # PSD file describing the colored noise shape used by the simulator
      psd_file: ET_10_full_cryo_psd.txt
      # List of detector names to generate noise for (network support)
      detectors:
        - E1_triangle_emr
        - E2_triangle_emr
        - E3_triangle_emr
      # Low frequency cutoff (Hz) to apply to the generated time series
      low_frequency_cutoff: 2
      # RNG seed for reproducibility; change per-run for different realizations
      seed: 42
    output:
      # Template used to construct the output file name at runtime. Uses template variables
      # such as `detectors`, `start_time` and `duration` expanded by the config loader.
      file_name: 'E-{{ detectors }}_NOISE_STRAIN-{{ start_time }}-{{ duration }}.gwf'
      arguments:
        # Channel name (template) for the output frame (e.g. "E1_triangle_emr:STRAIN")
        channel: '{{ detectors }}:STRAIN'
This configuration generates one day of noise data for each detector (E1, E2, E3). Each frame file covers 4096 seconds, resulting in 22 frame files per detector, starting on 1 January 2030 (GPS time 1577491218).
Noise is simulated using the ET_10_full_cryo_psd sensitivity curve from the CoBA Science Study, which is publicly available. A low-frequency cutoff of 2 Hz is applied.
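To check which frame files a run should produce, you can expand the file-name template by hand. The sketch below is illustrative only. It assumes that segment i starts at start-time + i × duration, that the number of segments is ceil(total-duration / duration) (see Adjusting Dataset Duration), and that {{ detectors }} expands to a single detector name per file. Python's str.format stands in for the Jinja-style template, and the helper names are hypothetical rather than part of the gwmock API.

```python
import math

# Values taken from the noise configuration above.
START_TIME = 1577491218   # GPS start time (1 January 2030)
FRAME_DURATION = 4096     # "duration": seconds per frame file
TOTAL_DURATION = 86400    # "total-duration: 1 day" in seconds
DETECTORS = ["E1_triangle_emr", "E2_triangle_emr", "E3_triangle_emr"]

# str.format stand-in for the Jinja-style file_name template (assumption).
TEMPLATE = "E-{detectors}_NOISE_STRAIN-{start_time}-{duration}.gwf"


def frame_start_times(start, frame_duration, total_duration):
    """GPS start time of each frame, assuming contiguous frames."""
    n_frames = math.ceil(total_duration / frame_duration)
    return [start + i * frame_duration for i in range(n_frames)]


def expected_file_names(detector):
    """Expected frame file names for one detector (hypothetical helper)."""
    return [
        TEMPLATE.format(detectors=detector, start_time=t0, duration=FRAME_DURATION)
        for t0 in frame_start_times(START_TIME, FRAME_DURATION, TOTAL_DURATION)
    ]


for detector in DETECTORS:
    names = expected_file_names(detector)
    print(detector, len(names), names[0], names[-1])
```

Comparing this list with the contents of the output directory is a quick way to spot missing segments.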
To generate the ET noise data, run:
# Create working directory
mkdir noise_et_triangle_emr
cd noise_et_triangle_emr
# Copy configuration file to your working directory
gwmock config --get noise/uncorrelated_gaussian/et_triangle_emr --output config.yaml
# Run simulation
gwmock simulate config.yaml
Storage Requirements
Each GWF file is approximately 123 MB. For three detectors with 22 files each:
- Data files: ~7.9 GB
- Metadata: ~55 KB
- Total: ~7.9 GB
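To estimate storage for another configuration, multiply the number of detectors by the number of frame files and the per-file size. The sketch below assumes that every frame file, including the last one, is a full ~123 MB file and that 1 GB = 1024 MB. Real sizes depend on compression and on how the final frame is written.

```python
import math

FRAME_SIZE_MB = 123        # approximate size of one 4096 s GWF file
N_DETECTORS = 3            # E1, E2, E3
FRAME_DURATION_S = 4096
TOTAL_DURATION_S = 86400   # one day


def estimate_data_gb(n_detectors, total_s, frame_s, frame_mb):
    """Rough data volume in GB (1 GB = 1024 MB), assuming full-size frames."""
    n_frames = math.ceil(total_s / frame_s)
    return n_detectors * n_frames * frame_mb / 1024


size_gb = estimate_data_gb(N_DETECTORS, TOTAL_DURATION_S, FRAME_DURATION_S, FRAME_SIZE_MB)
print(f"~{size_gb:.1f} GB")
```

The estimate scales linearly with the total duration and with the number of detectors.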
Generating CBC Signals
Compact Binary Coalescence (CBC) signals can be generated using configuration files in the examples/signal/bbh and examples/signal/bns directories.
Binary Black Hole (BBH) Signals
An example configuration for producing one day of ET data containing BBH signals
from a realistic population is provided in
signal/bbh/et_triangle_emr/config.yaml:
globals:
  simulator-arguments:
    # Sampling frequency in Hz (affects time resolution and file size)
    sampling-frequency: 4096
    # Duration of each generated segment in seconds
    duration: 4096
    # Human-friendly total duration (may be parsed by the config loader)
    total-duration: 1 day
    # GPS start time for the simulation
    start-time: 1577491218
  # Working directory for generated outputs (relative or absolute)
  working-directory: .
  # Default directory where generated frame files are written
  output-directory: output
  # Directory to store metadata sidecars and checkpoints
  metadata-directory: metadata
orchestration:
  population:
    backend: FilePopulationLoader
    source-type: bbh
    n-samples: 128
    arguments:
      # Path or URL to population file (HDF5) containing simulated BBH events
      path: https://sandbox.zenodo.org/records/413548/files/18321_1yrCatalogBBH.h5
  signal:
    # Waveform model and arguments for signal generation
    waveform-model: IMRPhenomXPHM
    # Additional waveform model arguments
    waveform-arguments:
      # Reference frequency in Hz for waveform generation
      reference_frequency: 50
    # Minimum frequency in Hz for the waveform generation
    minimum-frequency: 2
    # List of detector names to generate signal for (network support)
    detectors:
      - ET-Triangle-EMR
    output:
      output_directory: signal
      # Template used to construct the output file name at runtime. Uses template variables
      # such as `detectors`, `start_time` and `duration` expanded by the config loader.
      file_name: 'E-{{ detectors }}_STRAIN_BBH-{{ start_time }}-{{ duration }}.gwf'
      arguments:
        # Channel name (template) for the output frame (e.g. "E1_triangle_emr:STRAIN")
        channel: '{{ detectors }}:STRAIN'
  noise:
    arguments:
      psd_file: ET_10_full_cryo_psd.txt
      seed: 42
      detectors:
        - E1_triangle_emr
        - E2_triangle_emr
        - E3_triangle_emr
    output:
      output_directory: noise
      file_name: et-noise-{{ counter }}.gwf
      arguments:
        channel_prefix: STRAIN
As with the noise example, this configuration file produces one day of data per detector, with each frame file lasting 4096 seconds (for a total of 22 frame files), starting on 1 January 2030.
BBH signals drawn from the 18321_1yrCatalogBBH.h5 population file, which was used in the CoBA study and is publicly available, are injected into zero noise. The IMRPhenomXPHM waveform model is used with a low-frequency cutoff of 2 Hz, and Earth rotation effects are included.
To generate the ET data with BBH signals, run:
# Create working directory
mkdir bbh_et_triangle_emr
cd bbh_et_triangle_emr
# Copy configuration file to your working directory
gwmock config --get signal/bbh/et_triangle_emr --output config.yaml
# Run simulation
gwmock simulate config.yaml
Note
The configuration file automatically downloads the BBH population file from a Zenodo repository.
The file is saved in a cache directory (by default, ~/.gwmock/population/).
When the same population file is needed again, gwmock uses the cached copy to avoid re-downloading.
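The caching behaviour described in this note follows a common download-once pattern, sketched below with the standard library only. This is not gwmock's implementation. The cache layout (each file stored under the last component of its URL inside ~/.gwmock/population/) and the function name cached_population_file are assumptions made for illustration.

```python
import shutil
import urllib.request
from pathlib import Path
from urllib.parse import urlparse

# Default cache location mentioned in the note above.
DEFAULT_CACHE = Path.home() / ".gwmock" / "population"


def cached_population_file(url, cache_dir=DEFAULT_CACHE, opener=urllib.request.urlopen):
    """Return a local path for `url`, downloading it only if it is not cached yet.

    Hypothetical helper: the cached file is named after the URL's basename.
    """
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    target = cache_dir / Path(urlparse(url).path).name
    if target.exists():
        return target                       # cache hit: no network access
    partial = target.with_name(target.name + ".part")
    with opener(url) as response, open(partial, "wb") as out:
        shutil.copyfileobj(response, out)   # stream the download to disk
    partial.replace(target)                 # publish only a complete download
    return target
```

Writing to a temporary ".part" file and renaming it afterwards keeps an interrupted download from being mistaken for a valid cached copy.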
Binary Neutron Star (BNS) Signals
An example configuration for producing one day of ET data containing BNS signals
from a realistic population is provided in
signal/bns/et_triangle_emr/config.yaml.
It is equivalent to the BBH example configuration, except for the population file and the waveform model:
orchestration:
  population:
    arguments:
      path: https://sandbox.zenodo.org/records/413548/files/18321_1yrCatalogBNS.h5
  signal:
    waveform-model: IMRPhenomPv2_NRTidalv2
BNS signals drawn from the 18321_1yrCatalogBNS.h5 population file, which was used in the CoBA study and is publicly available, are injected into zero noise. The IMRPhenomPv2_NRTidalv2 waveform model is used with a low-frequency cutoff of 2 Hz, and Earth rotation effects are included.
To generate the ET data with BNS signals, run:
# Create working directory
mkdir bns_et_triangle_emr
cd bns_et_triangle_emr
# Copy configuration file to your working directory for BNS simulation
gwmock config --get signal/bns/et_triangle_emr --output config.yaml
# Run simulation
gwmock simulate config.yaml
Note
The configuration file automatically downloads the BNS population file from a Zenodo repository.
The file is saved in a cache directory (by default, ~/.gwmock/population/).
When the same population file is needed again, gwmock uses the cached copy to avoid re-downloading.
Generating Transient Noise Artifacts (Glitches)
Glitches can be generated using configuration files in the examples/glitch directory. These examples attach gwmock-noise glitch models through orchestration.noise.arguments.glitches, so glitches are treated as detector artifacts in the protocol-only pipeline. The bundled gengli integration currently supports only blip glitches.
An example configuration for producing one day of ET data for the E1 detector
containing blip glitches from a realistic population is provided in
glitch/gengli/et_triangle_emr/e1/config.yaml:
globals:
  simulator-arguments:
    sampling-frequency: 4096
    duration: 4096
    total-duration: 1 day
    start-time: 1577491218
  working-directory: .
  output-directory: output
  metadata-directory: metadata
orchestration:
  population:
    backend: FilePopulationLoader
    source-type: bbh
    n-samples: 1
    arguments:
      path: examples/glitch/gengli/bbh_smoke_population.csv
  signal:
    waveform-model: IMRPhenomXPHM
    minimum-frequency: 2
    detectors:
      - ET1_EMR
    output:
      output_directory: signal
      file_name: signal-{{ detectors }}-{{ start_time }}-{{ duration }}.gwf
      arguments:
        channel: '{{ detectors }}:STRAIN'
  noise:
    arguments:
      seed: 42
      detectors:
        - E1_triangle_emr
      glitches:
        - kind: gengli_blip
          rate: 0.0011111111111111111
          amplitude_distribution:
            distribution: lognormal
            mean: 1.0
            std: 0.0
          population_file: https://sandbox.zenodo.org/records/413548/files/blip_glitch_population_E1.hdf5
          psd_file: ET_10_full_cryo_psd.txt
          low_frequency_cutoff: 5.0
    output:
      output_directory: noise
      file_name: glitch-{{ counter }}.gwf
      arguments:
        channel_prefix: STRAIN
This configuration file generates one day of data for the E1 detector, divided into 4096-second frame files (for a total of 22 frames), starting on 1 January 2030.
Blip glitches drawn from the blip_glitch_population_E1.hdf5 population file are injected into zero noise. This population file can be generated from Gravity Spy tables with gwmock-noise build-blip-glitch-table. The glitches are modeled on LIGO blip glitches observed during the O3 observing run and recolored to match the ET sensitivity.
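The rate value 0.0011111111111111111 in the glitches entry equals 1/900. The sketch below makes two assumptions that the configuration does not state: that the rate is in glitches per second, and that glitch times follow a homogeneous Poisson process. Under those assumptions it estimates the expected counts and draws one random set of glitch times for a single frame. The helper is hypothetical and does not reproduce gwmock-noise's own sampling.

```python
import random

RATE_HZ = 0.0011111111111111111   # "rate" from the glitch configuration (assumed per second)
FRAME_DURATION_S = 4096
DAY_S = 86400

# Expected number of glitches = rate x duration.
expected_per_frame = RATE_HZ * FRAME_DURATION_S   # about 4.55 per 4096 s frame
expected_per_day = RATE_HZ * DAY_S                # about 96, i.e. one every 15 minutes


def poisson_glitch_times(rate, t_start, duration, rng):
    """Draw event times of a homogeneous Poisson process on [t_start, t_start + duration)."""
    times = []
    t = t_start
    while True:
        t += rng.expovariate(rate)   # exponential waiting time between glitches
        if t >= t_start + duration:
            return times
        times.append(t)


rng = random.Random(42)  # fixed seed so the draw is reproducible
glitches = poisson_glitch_times(RATE_HZ, 1577491218, FRAME_DURATION_S, rng)
print(f"{expected_per_frame:.2f} expected, {len(glitches)} drawn in this frame")
```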
To generate the ET data for detector E1 with glitches, run:
# Create working directory
mkdir -p glitch_et_triangle_emr/e1
cd glitch_et_triangle_emr/e1
# Copy configuration file to your working directory for glitch simulation
gwmock config --get glitch/gengli/et_triangle_emr/e1 --output config.yaml
# Run simulation
gwmock simulate config.yaml
Note
The configuration file automatically downloads the glitch population file from a
Zenodo repository.
The file is saved in a cache directory (by default, ~/.gwmock/population/).
When the same population file is needed again, gwmock uses the cached copy to avoid re-downloading.
Using Different Detector Configurations
gwmock includes several pre-configured Einstein Telescope detector geometries, available in gwmock/detector/detectors:
- Triangular Configuration (Meuse-Rhine Euregion): E1_triangle_emr, E2_triangle_emr, E3_triangle_emr
- Triangular Configuration (Sardinia): E1_triangle_sardinia, E2_triangle_sardinia, E3_triangle_sardinia
- 2L Aligned Configuration: E1_2L_aligned_sardinia, E2_2L_aligned_emr
- 2L Misaligned Configuration: E1_2L_misaligned_sardinia, E2_2L_misaligned_emr
To use a specific configuration, update the detectors list in your
configuration file:
detectors:
- E1_2L_aligned_sardinia
- E2_2L_aligned_emr
You don't need to include all detectors. For example, to generate only E1 data:
detectors:
- E1_2L_aligned_sardinia
Using Different Sensitivity Curves
Several Einstein Telescope sensitivity curves (PSD files) are available in gwmock/detector/noise_curves/. These correspond to the curves used in the CoBA study.
To use a specific sensitivity curve, set psd_file in the simulator arguments:
simulators:
  noise:
    arguments:
      psd_file: ET_15_HF_psd.txt
Note
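The detector geometries assume 10 km arms for triangular configurations and 15 km arms for 2L configurations. Choose a sensitivity curve computed for the matching arm length; the number in the file name (e.g. ET_10_full_cryo_psd.txt or ET_15_HF_psd.txt) gives the arm length in km.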
Adjusting Dataset Duration
The length of a dataset is controlled by:
start-time: # GPS start time of the dataset
duration: # Duration per frame file (seconds)
total-duration: # Total duration of the dataset
To change the dataset duration, simply adjust these parameters in your configuration file.
You can also change the sampling frequency of your dataset (the number of
samples per second, measured in Hz), using the sampling-frequency argument.
Total number of frame files:
The number of frame files is the total duration of the dataset divided by the duration of each frame file, rounded up to the next integer:
max_samples = ceil(total-duration / duration)
For example, one day (86400 s) of 4096-second frames gives ceil(86400 / 4096) = ceil(21.09) = 22 frame files.
Note
The total-duration argument can be passed as a float in seconds, or as a str specifying the time unit
("1 day", "5 days", "2 weeks", "2 months", etc.).
The supported time units are second, minute, hour, day, week, month (30 days), and year (365 days).
Singular and plural forms are both accepted (e.g., "1 day" and "2 days").
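The minimal parser below shows how these unit rules turn a total-duration value into seconds, and then applies the ceil formula above to get the number of frame files. It is a sketch written for this page under the stated rules (month = 30 days, year = 365 days, singular or plural units accepted). It is not gwmock's config loader, and the function names are hypothetical.

```python
import math
import re

# Seconds per unit, following the list of supported units above.
UNIT_SECONDS = {
    "second": 1,
    "minute": 60,
    "hour": 3600,
    "day": 86400,
    "week": 7 * 86400,
    "month": 30 * 86400,
    "year": 365 * 86400,
}

# A number, optional whitespace, then a unit with an optional plural "s".
_PATTERN = re.compile(r"^\s*([0-9]*\.?[0-9]+)\s*([a-z]+?)s?\s*$")


def duration_to_seconds(value):
    """Convert a float (seconds) or a string such as '2 weeks' into seconds."""
    if isinstance(value, (int, float)):
        return float(value)
    match = _PATTERN.match(value.lower())
    if match is None or match.group(2) not in UNIT_SECONDS:
        raise ValueError(f"unsupported duration: {value!r}")
    return float(match.group(1)) * UNIT_SECONDS[match.group(2)]


def number_of_frames(total_duration, frame_duration):
    """Frame files needed to cover total_duration, rounded up."""
    return math.ceil(duration_to_seconds(total_duration) / frame_duration)


print(duration_to_seconds("1 day"))      # 86400.0
print(duration_to_seconds("2 weeks"))    # 1209600.0
print(number_of_frames("1 day", 4096))   # 22
```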
Tip
A UTC/GPS time converter is available at the Gravitational Wave Open Science Center.
Tip
Sampling frequencies are often powers of 2 for efficiency. Common choices:
- 4096 Hz (standard for GW data analysis)
- 2048 Hz
- 16384 Hz (high-frequency instruments)
Lowering the sampling frequency reduces computation time and file size, but it also lowers the highest resolvable frequency (Nyquist frequency = sampling_frequency / 2).
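The per-frame effect of this trade-off is easy to compute. The sketch below reports the Nyquist frequency and the uncompressed size of one single-channel 4096 s frame for a few sampling frequencies. It assumes 8-byte (float64) samples; real GWF files can differ because of compression and metadata.

```python
FRAME_DURATION_S = 4096
BYTES_PER_SAMPLE = 8  # assumption: double-precision strain samples


def frame_summary(sampling_frequency, duration=FRAME_DURATION_S):
    """Return (Nyquist frequency in Hz, samples per frame, raw size in MiB)."""
    nyquist = sampling_frequency / 2
    n_samples = sampling_frequency * duration
    raw_mib = n_samples * BYTES_PER_SAMPLE / 2**20
    return nyquist, n_samples, raw_mib


for fs in (2048, 4096, 16384):
    nyquist, n, mib = frame_summary(fs)
    print(f"{fs:>5} Hz: Nyquist {nyquist:.0f} Hz, {n} samples, {mib:.0f} MiB raw")
```

At 4096 Hz this gives 128 MiB of raw samples per frame, which is comparable to the ~123 MB per GWF file quoted under Storage Requirements.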
Generate Multi-Detector Correlated Noise
You can generate multi-detector correlated noise by specifying a cross-power spectral density (CSD) file:
Warning
The example configuration file is not fully tested yet. Use at your own risk.
globals:
  simulator-arguments:
    sampling-frequency: 4096
    duration: 4096
    total-duration: '1 day'
    start-time: 1577491218
  working-directory: './ET_Triangle_EMR_correlated_noise'
  output-directory: 'data'
  metadata-directory: 'metadata'
simulators:
  noise:
    class: gwmock_noise.CorrelatedNoiseSimulator
    arguments:
      psd_file: ET_10_full_cryo_psd.txt
      csd_file: path_to_csd_file.txt
      detectors:
        - E1_Triangle_EMR
        - E2_Triangle_EMR
        - E3_Triangle_EMR
      low_frequency_cutoff: 2
      seed: 42
    output:
      file_name: 'E-{{ detectors }}_CORRELATED-NOISE_STRAIN-{{ start_time }}-{{ duration }}.gwf'
      arguments:
        channel: '{{ detectors }}:STRAIN'
gwmock uses a windowing approach to generate long-duration datasets. If the
input CSD varies rapidly with frequency, this windowing can introduce artifacts
in the resulting frame files.
A diagnostic tool to check whether your CSD file is susceptible to such issues will be provided soon.
Resume Interrupted Simulations
If a simulation is interrupted, resume it by running the same command:
# Start
gwmock simulate config.yaml
# If interrupted, resume
gwmock simulate config.yaml
gwmock automatically detects the last checkpoint (stored in the metadata directory) and continues from there.
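The general idea behind checkpoint-based resumption fits in a few lines. This is not gwmock's implementation: the checkpoint file, its JSON layout, and the run_with_checkpoint helper are hypothetical. They only illustrate why re-running the same command skips segments that were already written.

```python
import json
from pathlib import Path


def run_with_checkpoint(n_segments, write_segment, checkpoint):
    """Call write_segment(i) for every segment not yet recorded in `checkpoint`.

    The checkpoint (hypothetical format) stores the index of the next segment
    to produce and is updated only after a segment has been written.
    """
    checkpoint = Path(checkpoint)
    next_index = 0
    if checkpoint.exists():
        next_index = json.loads(checkpoint.read_text())["next_segment"]
    for i in range(next_index, n_segments):
        write_segment(i)
        checkpoint.write_text(json.dumps({"next_segment": i + 1}))
    return next_index  # index at which this run started
```

Because the checkpoint is updated after each completed segment, an interruption only causes the unfinished segment to be regenerated on the next run.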
Combining Data Types
To create realistic mock data, you may generate noise, signals, and glitches separately, then combine them:
gwmock simulate noise_config.yaml
gwmock simulate signal_config.yaml
gwmock simulate glitch_config.yaml
Then merge the files using GWpy (see Reading Data for details).
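Once the noise, signal, and glitch frames for the same detector and GPS segment have been read into arrays (for example with GWpy, as described on the Reading Data page), combining them is a sample-wise sum. The sketch below uses plain NumPy arrays and a hypothetical Segment container so that it runs without frame files. It refuses to add streams whose start time, sampling rate, or length differ, which is the main pitfall when merging separately generated outputs.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class Segment:
    """Minimal stand-in for a strain time series (hypothetical container)."""
    t0: float            # GPS start time in seconds
    sample_rate: float   # samples per second
    data: np.ndarray     # strain samples


def combine(*segments):
    """Sum aligned segments sample by sample (e.g. noise + signal + glitches)."""
    first = segments[0]
    for seg in segments[1:]:
        aligned = (
            seg.t0 == first.t0
            and seg.sample_rate == first.sample_rate
            and seg.data.shape == first.data.shape
        )
        if not aligned:
            raise ValueError("segments are not aligned; check start time, rate and length")
    total = np.sum([seg.data for seg in segments], axis=0)
    return Segment(first.t0, first.sample_rate, total)


# Toy example: three 1-second streams at 4 Hz starting at the same GPS time.
t0 = 1577491218
noise = Segment(t0, 4.0, np.array([0.1, -0.2, 0.3, 0.0]))
signal = Segment(t0, 4.0, np.array([0.0, 0.5, 0.5, 0.0]))
glitch = Segment(t0, 4.0, np.array([0.0, 0.0, 1.0, 0.0]))
merged = combine(noise, signal, glitch)
print(merged.data)
```

With GWpy, the same checks correspond to comparing each TimeSeries' t0, sample_rate, and length before adding the data.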