Generating Data¶
This guide shows how to use gwmock to create realistic mock data for
gravitational-wave detectors.
It uses the Einstein Telescope (ET) triangular configuration located in the Meuse-Rhine Euregion as an example.
For an overview of all example configuration files for ET simulations, see the Examples page. For a quick guide on reading and working with the output GWF files, see the Reading Data page.
Generating Detector Noise¶
Detector noise can be generated using configuration files in the
examples/noise
directory. An example configuration for producing one day of ET noise data is
provided in
uncorrelated_gaussian/et_triangle_emr/config.yaml:
globals:
  simulator-arguments:
    sampling-frequency: 4096
    duration: 4096
    total-duration: 1 day
    start-time: 1577491218
  working-directory: .
  output-directory: output
  metadata-directory: metadata
orchestration:
  noise:
    arguments:
      psd_file: ET_10_full_cryo_psd
      seed: 42
      minimum_frequency: 3
      detectors:
        - ET-Triangle-EMR
    output:
      output_directory: noise
      file_name: 'E-{{ detectors }}_STRAIN_NOISE-{{ start_time }}-{{ duration }}.gwf'
      arguments:
        channel: '{{ detectors }}:STRAIN'
This configuration uses the ET-Triangle-EMR network alias, which expands to
its three interferometers (ET1_EMR, ET2_EMR, ET3_EMR), generating one day
of noise data per interferometer. Each frame file covers 4096 seconds, resulting
in 21 frame files per interferometer, starting on 1 January 2030.
Noise is simulated using the publicly available ET_10_full_cryo_psd sensitivity curve from the CoBA Science Study, with a low-frequency cutoff of 3 Hz.
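As a rough illustration of the frame layout described above, the following sketch lists the GPS start time and file name of every frame. It assumes that consecutive frames start exactly one frame duration apart and that the {{ detectors }} placeholder is filled with one interferometer name per file. Both points are assumptions, and frame_start_times and frame_file_name are hypothetical helpers, not part of gwmock.

```python
# Hypothetical helpers (not part of gwmock) that mirror the file_name template
# 'E-{{ detectors }}_STRAIN_NOISE-{{ start_time }}-{{ duration }}.gwf'.


def frame_start_times(start_time: int, frame_duration: int, n_frames: int) -> list[int]:
    """GPS start time of each frame, assuming back-to-back frames."""
    return [start_time + i * frame_duration for i in range(n_frames)]


def frame_file_name(detector: str, start_time: int, duration: int) -> str:
    """Fill in the template for one interferometer and one frame."""
    return f"E-{detector}_STRAIN_NOISE-{start_time}-{duration}.gwf"


# Interferometer names taken from the alias expansion described in the text.
detectors = ["ET1_EMR", "ET2_EMR", "ET3_EMR"]
starts = frame_start_times(start_time=1577491218, frame_duration=4096, n_frames=21)
names = [frame_file_name(d, t, 4096) for d in detectors for t in starts]

print(len(names), "files expected")
print("first:", names[0])
print("last: ", names[-1])
```

The names actually written by gwmock may differ if the template is rendered differently, so treat the output as a checklist of what to expect rather than a guarantee.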
To generate the ET noise data, run:
# Create working directory
mkdir noise_et_triangle_emr
cd noise_et_triangle_emr
# Copy configuration file to your working directory
gwmock config --get noise/uncorrelated_gaussian/et_triangle_emr --output config.yaml
# Run simulation
gwmock simulate config.yaml
Storage Requirements¶
Each GWF file is approximately 123 MB. For three detectors with 21 files each:
- Data files: ~7.6 GB
- Metadata: ~52.5 KB
- Total: ~7.6 GB
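The data-file estimate can be reproduced with a few lines of arithmetic. This is only a sketch based on the approximate 123 MB file size quoted above. The ~7.6 GB figure matches a conversion of 1024 MB per GB, which is an assumption about how the number was derived; with 1000 MB per GB the same total reads as roughly 7.7 GB.

```python
# Back-of-the-envelope storage estimate using the numbers quoted in the text.
n_detectors = 3
frames_per_detector = 21
file_size_mb = 123  # approximate size of a single GWF file

n_files = n_detectors * frames_per_detector
total_mb = n_files * file_size_mb

# Assumption: the quoted total uses 1024 MB per GB.
total_gb_binary = total_mb / 1024
total_gb_decimal = total_mb / 1000

print(f"{n_files} files, {total_mb} MB in total")
print(f"{total_gb_binary:.1f} GB (1024 MB per GB)")
print(f"{total_gb_decimal:.1f} GB (1000 MB per GB)")
```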
Generating CBC Signals¶
Compact Binary Coalescence (CBC) signals can be generated using configuration
files in the
examples/signal/bbh
directory.
Binary Black Hole (BBH) Signals¶
An example configuration for producing one day of ET data containing BBH signals
is provided in
signal/bbh/et_triangle_emr/config.yaml:
globals:
  simulator-arguments:
    sampling-frequency: 4096
    duration: 4096
    total-duration: 1 day
    start-time: 1000000540
  working-directory: .
  output-directory: output
  metadata-directory: metadata
orchestration:
  population:
    backend: FilePopulationLoader
    source-type: bbh
    arguments:
      path: 'https://sandbox.zenodo.org/records/514722/files/mdc1_bbh.h5'
  signal:
    waveform-model: IMRPhenomXPHM
    minimum-frequency: 10
    earth-rotation: true
    detectors:
      - ET-Triangle-EMR
    output:
      output_directory: signal
      file_name: 'E-{{ detectors }}_STRAIN_BBH-{{ start_time }}-{{ duration }}.gwf'
      arguments:
        channel: '{{ detectors }}:STRAIN'
As with the noise example, this configuration file produces one day of data per interferometer, with each frame file lasting 4096 seconds (21 frame files per interferometer), starting at GPS 1000000540 (14 September 2011).
BBH signals are injected into zero noise. The population is loaded with the
FilePopulationLoader from the MDC1 BBH catalogue hosted on Zenodo
(mdc1_bbh.h5); because n-samples is omitted, the full catalogue is used. The
IMRPhenomXPHM
waveform model is used, with a low-frequency cutoff of 10 Hz and the
time-dependent detector response enabled (earth-rotation: true).
To generate the ET data with BBH signals, run:
# Create working directory
mkdir bbh_et_triangle_emr
cd bbh_et_triangle_emr
# Copy configuration file to your working directory
gwmock config --get signal/bbh/et_triangle_emr --output config.yaml
# Run simulation
gwmock simulate config.yaml
Binary Neutron Star (BNS) Signals¶
BNS datasets are produced the same way, using the signal/bns/* examples. These
set population.source-type: bns, load the MDC1 BNS catalogue from Zenodo
(mdc1_bns.h5), and use the tidal
IMRPhenomPv2_NRTidalv2
waveform model with a low-frequency cutoff of 20 Hz:
globals:
  simulator-arguments:
    sampling-frequency: 4096
    duration: 4096
    total-duration: 1 day
    start-time: 1000000540
  working-directory: .
  output-directory: output
  metadata-directory: metadata
orchestration:
  population:
    backend: FilePopulationLoader
    source-type: bns
    arguments:
      path: 'https://sandbox.zenodo.org/records/514722/files/mdc1_bns.h5'
  signal:
    waveform-model: IMRPhenomPv2_NRTidalv2
    minimum-frequency: 20
    earth-rotation: true
    detectors:
      - ET-Triangle-EMR
    output:
      output_directory: signal
      file_name: 'E-{{ detectors }}_STRAIN_BNS-{{ start_time }}-{{ duration }}.gwf'
      arguments:
        channel: '{{ detectors }}:STRAIN'
To generate the ET data with BNS signals, run:
gwmock config --get signal/bns/et_triangle_emr --output config.yaml
gwmock simulate config.yaml
Generating Transient Noise Artifacts (Glitches)¶
Glitches can be generated using configuration files in the
examples/noise/glitches
directory. These examples attach
gwmock-noise
glitch models through orchestration.noise.arguments.glitches, so glitches are
treated as detector artifacts in the protocol-only pipeline. The bundled gengli
integration currently supports only blip glitches.
An example configuration for producing one day of ET data for the E1 detector
containing blip glitches from a realistic population is provided in
noise/glitches/gengli/et_triangle_emr/e1/config.yaml:
globals:
  simulator-arguments:
    sampling-frequency: 4096
    duration: 4096
    total-duration: 1 day
    start-time: 1577491218
  working-directory: .
  output-directory: output
  metadata-directory: metadata
orchestration:
  noise:
    arguments:
      seed: 42
      detectors:
        - E1_triangle_emr
      glitches:
        - kind: gengli_blip
          rate: 0.016666667 # 1 glitch per minute (1/60 Hz)
          amplitude_distribution: # no amplitude scaling (multiply by 1.0)
            distribution: lognormal
            mean: 1.0
            std: 0.0
          population_file: https://sandbox.zenodo.org/records/514722/files/blip_glitch_population_E1.hdf5
          psd_file: ET_10_full_cryo_psd
          low_frequency_cutoff: 5.0
    output:
      output_directory: glitch
      file_name: 'E-{{ detectors }}_STRAIN_GLITCH-{{ start_time }}-{{ duration }}.gwf'
      arguments:
        channel: '{{ detectors }}:STRAIN'
This configuration file generates one day of data for the E1 detector, divided into 4096-second frame files (for a total of 21 frames), starting on 1 January 2030.
Blip glitches are injected into zero noise from the
blip_glitch_population_E1.hdf5
population file, which can be generated from GravitySpy tables with
gwmock-noise build-blip-glitch-table. These glitches are modeled on LIGO blip
glitches observed during the O3 observing run and recolored to match the ET
sensitivity.
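To get a feel for how many glitches this configuration produces, the sketch below multiplies the configured rate by the data length. It gives only the expected (average) count. How gwmock-noise draws the individual glitch times is not described here, so the actual number in a given frame may differ.

```python
# Expected number of blip glitches for the example configuration.
rate_hz = 0.016666667        # value of `rate` in the configuration (about 1/60 Hz)
frame_duration_s = 4096      # `duration` of one frame file
n_frames = 21                # frame files in the one-day dataset

expected_per_frame = rate_hz * frame_duration_s
expected_total = expected_per_frame * n_frames

print(f"about {expected_per_frame:.1f} glitches per frame")
print(f"about {expected_total:.0f} glitches in the whole dataset")
```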
To generate the ET data for detector E1 with glitches, run:
# Create working directory
mkdir -p glitch_et_triangle_emr/e1
cd glitch_et_triangle_emr/e1
# Copy configuration file to your working directory for glitch simulation
gwmock config --get noise/glitches/gengli/et_triangle_emr/e1 --output config.yaml
# Run simulation
gwmock simulate config.yaml
Note
The configuration file automatically downloads the glitch population file from a
Zenodo repository.
The file is saved in a cache directory (by default, ~/.gwmock/population/).
When the same population file is needed again, gwmock uses the cached copy to avoid re-downloading.
Using Different Detector Configurations¶
gwmock includes several pre-configured Einstein Telescope detector geometries,
available in
gwmock/detector/detectors:
Triangular Configuration (Meuse-Rhine Euregion)
- E1_triangle_emr
- E2_triangle_emr
- E3_triangle_emr
Triangular Configuration (Sardinia)
- E1_triangle_sardinia
- E2_triangle_sardinia
- E3_triangle_sardinia
2L Aligned Configuration
- E1_2L_aligned_sardinia
- E2_2L_aligned_emr
2L Misaligned Configuration
- E1_2L_misaligned_sardinia
- E2_2L_misaligned_emr
To use a specific configuration, update the detectors list in your
configuration file:
detectors:
- E1_2L_aligned_sardinia
- E2_2L_aligned_emr
You don't need to include all detectors. For example, to generate only E1 data:
detectors:
- E1_2L_aligned_sardinia
Using Different Sensitivity Curves¶
Multiple Einstein Telescope sensitivity curves (PSD files) are available in
gwmock/detector/noise_curves/.
These correspond to those used in the CoBA study.
To use a specific sensitivity curve, set psd_file in the noise arguments:
orchestration:
  noise:
    arguments:
      psd_file: ET_15_HF_psd.txt
Note
The detector geometries assume 10 km arms for triangular configurations and 15 km arms for 2L configurations. Choose sensitivity curves accordingly.
Adjusting Dataset Duration¶
The length of a dataset is controlled by:
start-time: # GPS start time of the dataset
duration: # Duration per frame file (seconds)
total-duration: # Total duration of the dataset
To change the dataset duration, simply adjust these parameters in your configuration file.
You can also change the sampling frequency of your dataset (the number of
samples per second, measured in Hz), using the sampling-frequency argument.
Total number of frame files:
The total number of frame files depends on the duration of each frame file and the total duration of the dataset, and it's rounded to the nearest integer:
max_samples = round(total-duration / duration)
For example, a one-day dataset (86400 s) in 4096-second frames yields
round(86400 / 4096) = 21 frame files per interferometer.
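The rounding rule can be checked with a short sketch. The function name n_frame_files is hypothetical, and Python's built-in round is used as a stand-in for whatever rounding gwmock applies internally (the two could differ for exact .5 ties). The last lines assume every frame is exactly one frame duration long, in which case 21 frames cover slightly less than a full day.

```python
def n_frame_files(total_duration_s: float, frame_duration_s: float) -> int:
    """Number of frame files, following the rounding rule stated in the text."""
    return round(total_duration_s / frame_duration_s)


one_day_s = 86400
frame_duration_s = 4096

n_frames = n_frame_files(one_day_s, frame_duration_s)

# Assumption: each frame is exactly `frame_duration_s` long.
covered_s = n_frames * frame_duration_s
shortfall_s = one_day_s - covered_s

print(f"{n_frames} frame files covering {covered_s} s ({shortfall_s} s short of one day)")
```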
Note
The total-duration argument can be passed as a float in seconds, or as a str specifying the time unit
("1 day", "5 days", "2 weeks", "2 months", etc.).
The supported time units are second, minute, hour, day, week, month (30 days), and year (365 days).
Singular and plural forms are both accepted (e.g., "1 day" and "2 days").
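The conversion from a duration string to seconds can be sketched as follows. This is an illustrative re-implementation under the unit definitions listed above, not gwmock's own parser; parse_total_duration and UNIT_SECONDS are hypothetical names.

```python
import re

# Unit lengths in seconds, using the definitions from the note
# (a month is 30 days, a year is 365 days).
UNIT_SECONDS = {
    "second": 1,
    "minute": 60,
    "hour": 3600,
    "day": 86400,
    "week": 7 * 86400,
    "month": 30 * 86400,
    "year": 365 * 86400,
}


def parse_total_duration(value) -> float:
    """Convert a number of seconds or a string such as '2 weeks' to seconds."""
    if isinstance(value, (int, float)):
        return float(value)
    match = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*([A-Za-z]+)\s*", value)
    if match is None:
        raise ValueError(f"cannot parse duration: {value!r}")
    number, unit = match.groups()
    unit = unit.lower()
    # Accept plural forms by dropping a trailing "s".
    if unit not in UNIT_SECONDS and unit.endswith("s"):
        unit = unit[:-1]
    if unit not in UNIT_SECONDS:
        raise ValueError(f"unsupported time unit in {value!r}")
    return float(number) * UNIT_SECONDS[unit]


for example in (4096, "1 day", "2 weeks", "2 months"):
    print(example, "->", parse_total_duration(example), "s")
```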
Tip
A UTC/GPS time converter is available at the Gravitational Wave Open Science Center.
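For a quick offline check, GPS seconds can also be converted to UTC with the Python standard library. The sketch below is a simplification: it subtracts a fixed GPS-UTC offset instead of consulting a leap-second table. The default of 18 seconds has applied since 1 January 2017 and is assumed to still hold in 2030; earlier dates need the offset valid at that time (15 seconds in September 2011). The helper gps_to_utc is hypothetical.

```python
from datetime import datetime, timedelta, timezone

GPS_EPOCH = datetime(1980, 1, 6, tzinfo=timezone.utc)

# Assumption: GPS time is 18 s ahead of UTC (true since 2017-01-01,
# and assumed here to remain true for future dates).
DEFAULT_GPS_UTC_OFFSET_S = 18


def gps_to_utc(gps_seconds: int, offset_s: int = DEFAULT_GPS_UTC_OFFSET_S) -> datetime:
    """Convert GPS seconds to a UTC datetime using a fixed leap-second offset."""
    return GPS_EPOCH + timedelta(seconds=gps_seconds - offset_s)


noise_start = gps_to_utc(1577491218)
signal_start = gps_to_utc(1000000540, offset_s=15)

print("noise example starts: ", noise_start.isoformat())
print("signal example starts:", signal_start.isoformat())
```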
Tip
Sampling frequencies are often powers of 2 for efficiency. Common choices:
- 4096 Hz (standard for GW data analysis)
- 2048 Hz
- 16384 Hz (high-frequency instruments)
Lowering sampling frequency reduces computation time but also reduces the highest resolvable frequency (Nyquist limit = sampling_frequency / 2).
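The trade-off can be made concrete with a small sketch that prints the Nyquist limit and the number of samples in one 4096-second frame for the sampling frequencies listed above. The helper names are hypothetical.

```python
def nyquist_hz(sampling_frequency_hz: float) -> float:
    """Highest resolvable frequency for a given sampling frequency."""
    return sampling_frequency_hz / 2


def samples_per_frame(sampling_frequency_hz: int, frame_duration_s: int) -> int:
    """Number of samples stored per channel in one frame file."""
    return sampling_frequency_hz * frame_duration_s


for fs in (2048, 4096, 16384):
    print(
        f"{fs} Hz: Nyquist limit {nyquist_hz(fs):.0f} Hz, "
        f"{samples_per_frame(fs, 4096)} samples per 4096 s frame"
    )
```

The sampling frequency should stay comfortably above twice the highest frequency of interest, for example the upper end of the waveform content being injected.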
Generate Multi-Detector Correlated Noise¶
You can generate multi-detector correlated noise by specifying a cross-power
spectral density (CSD) file via the orchestration.noise backend. Pass
csd_file as a noise argument:
Warning
Correlated noise generation is experimental and not fully tested. Use at your own risk.
globals:
  simulator-arguments:
    sampling-frequency: 4096
    duration: 4096
    total-duration: '1 day'
    start-time: 1577491218
  working-directory: .
  output-directory: output
  metadata-directory: metadata
orchestration:
  noise:
    arguments:
      psd_file: ET_10_full_cryo_psd
      csd_file: path_to_csd_file.txt
      detectors:
        - ET-Triangle-EMR
      minimum_frequency: 3
      seed: 42
    output:
      output_directory: noise
      file_name: 'E-{{ detectors }}_STRAIN_CORRELATED-NOISE-{{ start_time }}-{{ duration }}.gwf'
      arguments:
        channel: '{{ detectors }}:STRAIN'
gwmock uses a windowing approach to generate long-duration datasets. If the
input CSD varies rapidly with frequency, this windowing can introduce artifacts
in the resulting frame files.
A diagnostic tool to check whether your CSD file is susceptible to such issues will be provided soon.
Resume Interrupted Simulations¶
If a simulation is interrupted, resume it by running the same command:
# Start
gwmock simulate config.yaml
# If interrupted, resume
gwmock simulate config.yaml
gwmock automatically detects and continues from the last checkpoint.
Combining Data Types¶
To create realistic mock data, you may generate noise, signals, and glitches separately, then combine them:
gwmock simulate noise_config.yaml
gwmock simulate signal_config.yaml
gwmock simulate glitch_config.yaml
Then merge the files using GWpy (see Reading Data for details).