Generating Data

This guide shows how to use gwmock to create realistic mock data for gravitational-wave detectors.

It uses the Einstein Telescope (ET) triangular configuration located in the Meuse-Rhine Euregion as an example.

For an overview of all example configuration files for ET simulations, see the Examples page. For a quick guide on reading and working with the output GWF files, see the Reading Data page.

Generating Detector Noise

Detector noise can be generated using configuration files in the examples/noise directory. An example configuration for producing one day of ET noise data is provided in uncorrelated_gaussian/et_triangle_emr/config.yaml:

globals:
    simulator-arguments:
        sampling-frequency: 4096
        duration: 4096
        total-duration: 1 day
        start-time: 1577491218
    working-directory: .
    output-directory: output
    metadata-directory: metadata

orchestration:
    noise:
        arguments:
            psd_file: ET_10_full_cryo_psd
            seed: 42
            minimum_frequency: 3
            detectors:
                - ET-Triangle-EMR
        output:
            output_directory: noise
            file_name:
                'E-{{ detectors }}_STRAIN_NOISE-{{ start_time }}-{{ duration
                }}.gwf'
            arguments:
                channel: '{{ detectors }}:STRAIN'

This configuration uses the ET-Triangle-EMR network alias, which expands to its three interferometers (ET1_EMR, ET2_EMR, ET3_EMR), generating one day of noise data per interferometer. Each frame file covers 4096 seconds, resulting in 21 frame files per interferometer, starting on 1 January 2030.

Noise is simulated using the publicly available ET_10_full_cryo_psd sensitivity curve from the CoBA Science Study, with a low-frequency cutoff of 3 Hz.

To generate the ET noise data, run:

# Create working directory
mkdir noise_et_triangle_emr
cd noise_et_triangle_emr

# Copy configuration file to your working directory
gwmock config --get noise/uncorrelated_gaussian/et_triangle_emr --output config.yaml

# Run simulation
gwmock simulate config.yaml

Storage Requirements

Each GWF file is approximately 123 MB. For three detectors with 21 files each:

  • Data files: ~7.6 GB
  • Metadata: ~52.5 KB
  • Total: ~7.6 GB
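The data-file total follows from simple arithmetic. The sketch below is only a rough check, using the approximate 123 MB per-file figure and assuming that GB here means 1024 MB (with 1000 MB per GB the result would be closer to 7.7 GB):

```python
# Rough storage estimate for the noise example (values taken from the text above).
FILE_SIZE_MB = 123        # approximate size of one GWF frame file
N_DETECTORS = 3           # ET-Triangle-EMR expands to three interferometers
FILES_PER_DETECTOR = 21   # one day of data in 4096-second frames


def total_storage_gb(file_size_mb, n_detectors, files_per_detector, mb_per_gb=1024):
    """Return the total size of all frame files in GB.

    mb_per_gb=1024 is an assumption about the units used in this guide.
    """
    n_files = n_detectors * files_per_detector
    return n_files * file_size_mb / mb_per_gb


total = total_storage_gb(FILE_SIZE_MB, N_DETECTORS, FILES_PER_DETECTOR)
print(f"{N_DETECTORS * FILES_PER_DETECTOR} files, about {total:.1f} GB")
```

The same function can be reused to estimate the footprint of longer datasets or of a single detector before starting a run.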

Generating CBC Signals

Compact Binary Coalescence (CBC) signals can be generated using configuration files in the examples/signal directory (signal/bbh for binary black holes and signal/bns for binary neutron stars).

Binary Black Hole (BBH) Signals

An example configuration for producing one day of ET data containing BBH signals is provided in signal/bbh/et_triangle_emr/config.yaml:

globals:
    simulator-arguments:
        sampling-frequency: 4096
        duration: 4096
        total-duration: 1 day
        start-time: 1000000540
    working-directory: .
    output-directory: output
    metadata-directory: metadata

orchestration:
    population:
        backend: FilePopulationLoader
        source-type: bbh
        arguments:
            path: 'https://sandbox.zenodo.org/records/514722/files/mdc1_bbh.h5'
    signal:
        waveform-model: IMRPhenomXPHM
        minimum-frequency: 10
        earth-rotation: true
        detectors:
            - ET-Triangle-EMR
        output:
            output_directory: signal
            file_name:
                'E-{{ detectors }}_STRAIN_BBH-{{ start_time }}-{{ duration
                }}.gwf'
            arguments:
                channel: '{{ detectors }}:STRAIN'

As with the noise example, this configuration produces one day of data per interferometer in 4096-second frame files (21 frame files per interferometer), starting at GPS time 1000000540 (14 September 2011).

BBH signals are injected into zero noise. The population is loaded with the FilePopulationLoader from the MDC1 BBH catalogue hosted on Zenodo (mdc1_bbh.h5); because n-samples is omitted, the full catalogue is used. The IMRPhenomXPHM waveform model is used, with a low-frequency cutoff of 10 Hz and the time-dependent detector response enabled (earth-rotation: true).

To generate the ET data with BBH signals, run:

# Create working directory
mkdir bbh_et_triangle_emr
cd bbh_et_triangle_emr

# Copy configuration file to your working directory
gwmock config --get signal/bbh/et_triangle_emr --output config.yaml

# Run simulation
gwmock simulate config.yaml

Binary Neutron Star (BNS) Signals

BNS datasets are produced the same way, using the signal/bns/* examples. These set population.source-type: bns, load the MDC1 BNS catalogue from Zenodo (mdc1_bns.h5), and use the tidal IMRPhenomPv2_NRTidalv2 waveform model with a low-frequency cutoff of 20 Hz:

globals:
    simulator-arguments:
        sampling-frequency: 4096
        duration: 4096
        total-duration: 1 day
        start-time: 1000000540
    working-directory: .
    output-directory: output
    metadata-directory: metadata

orchestration:
    population:
        backend: FilePopulationLoader
        source-type: bns
        arguments:
            path: 'https://sandbox.zenodo.org/records/514722/files/mdc1_bns.h5'
    signal:
        waveform-model: IMRPhenomPv2_NRTidalv2
        minimum-frequency: 20
        earth-rotation: true
        detectors:
            - ET-Triangle-EMR
        output:
            output_directory: signal
            file_name:
                'E-{{ detectors }}_STRAIN_BNS-{{ start_time }}-{{ duration
                }}.gwf'
            arguments:
                channel: '{{ detectors }}:STRAIN'

To generate the ET data with BNS signals, run:

gwmock config --get signal/bns/et_triangle_emr --output config.yaml
gwmock simulate config.yaml

Generating Transient Noise Artifacts (Glitches)

Glitches can be generated using configuration files in the examples/noise/glitches directory. These examples attach gwmock-noise glitch models through orchestration.noise.arguments.glitches, so glitches are treated as detector artifacts in the protocol-only pipeline. The bundled gengli integration currently supports only blip glitches.

An example configuration that produces one day of E1 detector data containing blip glitches drawn from a realistic population is provided in noise/glitches/gengli/et_triangle_emr/e1/config.yaml:

globals:
    simulator-arguments:
        sampling-frequency: 4096
        duration: 4096
        total-duration: 1 day
        start-time: 1577491218
    working-directory: .
    output-directory: output
    metadata-directory: metadata

orchestration:
    noise:
        arguments:
            seed: 42
            detectors:
                - E1_triangle_emr
            glitches:
                - kind: gengli_blip
                  rate: 0.016666667 # 1 glitch per minute (1/60 Hz)
                  amplitude_distribution: # no amplitude scaling (multiply by 1.0)
                      distribution: lognormal
                      mean: 1.0
                      std: 0.0
                  population_file: https://sandbox.zenodo.org/records/514722/files/blip_glitch_population_E1.hdf5
                  psd_file: ET_10_full_cryo_psd
                  low_frequency_cutoff: 5.0
        output:
            output_directory: glitch
            file_name:
                'E-{{ detectors }}_STRAIN_GLITCH-{{ start_time }}-{{ duration
                }}.gwf'
            arguments:
                channel: '{{ detectors }}:STRAIN'

This configuration file generates one day of data for the E1 detector, divided into 4096-second frame files (for a total of 21 frames), starting on 1 January 2030.

Blip glitches are injected into zero noise from the blip_glitch_population_E1.hdf5 population file, which can be generated from GravitySpy tables with gwmock-noise build-blip-glitch-table. These glitches are modeled on LIGO blip glitches observed during the O3 observing run and recolored to match the ET sensitivity.
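To get a feel for the configured rate, the expected number of glitches can be estimated by hand. This is a minimal sketch, assuming that rate is the mean number of glitches per second; how gwmock-noise actually draws glitch times is not described here, so the real counts may differ:

```python
RATE_HZ = 1 / 60          # configured glitch rate (1 glitch per minute)
FRAME_DURATION_S = 4096   # duration of one frame file
N_FRAMES = 21             # frame files in the one-day example


def expected_glitches(rate_hz, duration_s):
    """Expected number of glitches in a stretch of data of the given length."""
    return rate_hz * duration_s


per_frame = expected_glitches(RATE_HZ, FRAME_DURATION_S)
per_dataset = expected_glitches(RATE_HZ, N_FRAMES * FRAME_DURATION_S)
print(f"about {per_frame:.0f} glitches per frame, about {per_dataset:.0f} in total")
```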

To generate the ET data for detector E1 with glitches, run:

# Create working directory
mkdir -p glitch_et_triangle_emr/e1
cd glitch_et_triangle_emr/e1

# Copy configuration file to your working directory for glitch simulation
gwmock config --get noise/glitches/gengli/et_triangle_emr/e1 --output config.yaml

# Run simulation
gwmock simulate config.yaml

Note

gwmock automatically downloads the glitch population file referenced in the configuration from Zenodo and saves it in a cache directory (by default, ~/.gwmock/population/). When the same population file is needed again, gwmock uses the cached copy instead of downloading it again.

Using Different Detector Configurations

gwmock includes several pre-configured Einstein Telescope detector geometries, available in gwmock/detector/detectors:

Triangular Configuration (Meuse-Rhine Euregion)

  • E1_triangle_emr
  • E2_triangle_emr
  • E3_triangle_emr

Triangular Configuration (Sardinia)

  • E1_triangle_sardinia
  • E2_triangle_sardinia
  • E3_triangle_sardinia

2L Aligned Configuration

  • E1_2L_aligned_sardinia
  • E2_2L_aligned_emr

2L Misaligned Configuration

  • E1_2L_misaligned_sardinia
  • E2_2L_misaligned_emr

To use a specific configuration, update the detectors list in your configuration file:

detectors:
    - E1_2L_aligned_sardinia
    - E2_2L_aligned_emr

You don't need to include all detectors. For example, to generate only E1 data:

detectors:
    - E1_2L_aligned_sardinia

Using Different Sensitivity Curves

Multiple Einstein Telescope sensitivity curves (PSD files) are available in gwmock/detector/noise_curves/. These are the curves used in the CoBA study.

To use a specific sensitivity curve, set psd_file in the noise arguments:

orchestration:
    noise:
        arguments:
            psd_file: ET_15_HF_psd.txt

Note

The detector geometries assume 10 km arms for triangular configurations and 15 km arms for 2L configurations. Choose sensitivity curves accordingly.
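A quick consistency check between a detector name and a PSD file can catch mismatches early. The sketch below is hypothetical and not part of gwmock: it assumes that the number after ET_ in the PSD file name is the arm length in km, and that detector names containing "triangle" or "2L" identify the geometry, as in the names listed in this guide.

```python
import re


def expected_arm_length_km(detector):
    """Arm length implied by the detector name (10 km triangle, 15 km 2L)."""
    name = detector.lower().replace("-", "_")
    if "_triangle_" in name:
        return 10
    if "_2l_" in name:
        return 15
    raise ValueError(f"cannot infer geometry from detector name: {detector}")


def psd_arm_length_km(psd_file):
    """Arm length read from the PSD file name (assumed naming convention)."""
    match = re.match(r"ET_(\d+)_", psd_file)
    if match is None:
        raise ValueError(f"cannot infer arm length from PSD name: {psd_file}")
    return int(match.group(1))


def is_consistent(detector, psd_file):
    """True if the detector geometry and the PSD assume the same arm length."""
    return expected_arm_length_km(detector) == psd_arm_length_km(psd_file)


print(is_consistent("E1_triangle_emr", "ET_10_full_cryo_psd"))
print(is_consistent("E1_triangle_emr", "ET_15_HF_psd.txt"))
```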

Adjusting Dataset Duration

The length of a dataset is controlled by:

start-time: # GPS start time of the dataset
duration: # Duration per frame file (seconds)
total-duration: # Total duration of the dataset

To change the dataset duration, simply adjust these parameters in your configuration file.

You can also change the sampling frequency of your dataset (the number of samples per second, measured in Hz), using the sampling-frequency argument.

Total number of frame files:

The total number of frame files depends on the duration of each frame file and the total duration of the dataset, and it's rounded to the nearest integer:

max_samples = round(total-duration / duration)

For example, a one-day dataset (86400 s) in 4096-second frames yields round(86400 / 4096) = 21 frame files per interferometer.
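The rounding rule can be reproduced in a few lines. The sketch below is illustrative and not gwmock's own code. It assumes that frame files are contiguous, so each frame starts where the previous one ends. Under that assumption, the 21 frames of the one-day example cover 86016 s, slightly less than a full day.

```python
def frame_start_times(start_time, duration, total_duration):
    """Return the GPS start time of each frame file.

    Assumes contiguous frames; the frame count follows
    round(total-duration / duration) as described above.
    """
    n_frames = round(total_duration / duration)
    return [start_time + i * duration for i in range(n_frames)]


starts = frame_start_times(start_time=1577491218, duration=4096, total_duration=86400)
print(len(starts))              # number of frame files
print(starts[0], starts[-1])    # GPS start of the first and last frame
print(len(starts) * 4096)       # seconds actually covered by the frames
```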

Note

The total-duration argument can be passed as a float in seconds, or as a str specifying the time unit ("1 day", "5 days", "2 weeks", "2 months", etc.). The supported time units are:

  • second
  • minute
  • hour
  • day
  • week
  • month (30 days)
  • year (365 days)

Singular and plural forms are both accepted (e.g., "1 day" and "2 days").
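The unit conventions above can be illustrated with a small converter. This is a hypothetical helper, not gwmock's own parser; it only mirrors the units and conversions listed in this note.

```python
import re

# Seconds per unit, following the list above (month = 30 days, year = 365 days).
SECONDS_PER_UNIT = {
    "second": 1,
    "minute": 60,
    "hour": 3600,
    "day": 86400,
    "week": 7 * 86400,
    "month": 30 * 86400,
    "year": 365 * 86400,
}


def parse_total_duration(value):
    """Convert a number of seconds or a string such as '2 weeks' to seconds."""
    if isinstance(value, (int, float)):
        return float(value)
    match = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*([A-Za-z]+)\s*", value)
    if match is None:
        raise ValueError(f"cannot parse duration: {value!r}")
    amount, unit = float(match.group(1)), match.group(2).lower()
    if unit.endswith("s"):      # accept plural forms
        unit = unit[:-1]
    if unit not in SECONDS_PER_UNIT:
        raise ValueError(f"unsupported time unit: {unit!r}")
    return amount * SECONDS_PER_UNIT[unit]


for text in ("1 day", "5 days", "2 weeks", "2 months"):
    print(text, "->", parse_total_duration(text), "s")
```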

Tip

A UTC/GPS time converter is available at the Gravitational Wave Open Science Center.
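For scripted use, the conversion can also be sketched with the Python standard library. The example below is a simplified illustration, not a replacement for the GWOSC converter. Its leap-second table only covers dates from 2006 onward, and it assumes no leap seconds are added after 2017, which cannot be known in advance for future dates such as 2030.

```python
from datetime import datetime, timedelta, timezone

GPS_EPOCH = datetime(1980, 1, 6, tzinfo=timezone.utc)

# (UTC date from which the offset applies, GPS-UTC offset in seconds).
# Newest first; only entries from 2006 onward are listed.
LEAP_TABLE = [
    (datetime(2017, 1, 1, tzinfo=timezone.utc), 18),
    (datetime(2015, 7, 1, tzinfo=timezone.utc), 17),
    (datetime(2012, 7, 1, tzinfo=timezone.utc), 16),
    (datetime(2009, 1, 1, tzinfo=timezone.utc), 15),
    (datetime(2006, 1, 1, tzinfo=timezone.utc), 14),
]


def gps_utc_offset(utc):
    """Return GPS-UTC in seconds for a timezone-aware UTC datetime."""
    for since, offset in LEAP_TABLE:
        if utc >= since:
            return offset
    raise ValueError("dates before 2006-01-01 are not covered by this table")


def utc_to_gps(utc):
    """Convert a timezone-aware UTC datetime to integer GPS seconds."""
    return int((utc - GPS_EPOCH).total_seconds()) + gps_utc_offset(utc)


def gps_to_utc(gps):
    """Convert GPS seconds to a UTC datetime (dates from 2006 onward)."""
    for _, offset in LEAP_TABLE:
        utc = GPS_EPOCH + timedelta(seconds=gps - offset)
        if gps_utc_offset(utc) == offset:
            return utc
    raise ValueError("GPS time is not covered by this table")


print(utc_to_gps(datetime(2030, 1, 1, tzinfo=timezone.utc)))
print(gps_to_utc(1000000540).isoformat())
```

The two printed values correspond to the start times used in the noise and signal examples of this guide.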

Tip

Sampling frequencies are often powers of 2 for efficiency. Common choices:

  • 4096 Hz (standard for GW data analysis)
  • 2048 Hz
  • 16384 Hz (high-frequency instruments)

Lowering sampling frequency reduces computation time but also reduces the highest resolvable frequency (Nyquist limit = sampling_frequency / 2).
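The trade-off can be made concrete with a short calculation. This is a minimal sketch using only the relations stated above; the function names are illustrative.

```python
def nyquist_frequency(sampling_frequency):
    """Highest resolvable frequency in Hz for a given sampling frequency."""
    return sampling_frequency / 2


def samples_per_frame(sampling_frequency, duration):
    """Number of samples stored in one frame file of the given duration."""
    return int(sampling_frequency * duration)


for fs in (2048, 4096, 16384):
    print(fs, "Hz ->", nyquist_frequency(fs), "Hz Nyquist,",
          samples_per_frame(fs, 4096), "samples per 4096-second frame")
```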

Generate Multi-Detector Correlated Noise

You can generate multi-detector correlated noise by specifying a cross-power spectral density (CSD) file via the orchestration.noise backend.

Warning

Correlated noise generation is experimental and not fully tested. Use at your own risk.

Pass csd_file as a noise argument:

globals:
    simulator-arguments:
        sampling-frequency: 4096
        duration: 4096
        total-duration: '1 day'
        start-time: 1577491218
    working-directory: .
    output-directory: output
    metadata-directory: metadata

orchestration:
    noise:
        arguments:
            psd_file: ET_10_full_cryo_psd
            csd_file: path_to_csd_file.txt
            detectors:
                - ET-Triangle-EMR
            minimum_frequency: 3
            seed: 42
        output:
            output_directory: noise
            file_name:
                'E-{{ detectors }}_STRAIN_CORRELATED-NOISE-{{ start_time }}-{{
                duration }}.gwf'
            arguments:
                channel: '{{ detectors }}:STRAIN'

gwmock uses a windowing approach to generate long-duration datasets. If the input CSD varies rapidly with frequency, this windowing can introduce artifacts in the resulting frame files.

A diagnostic tool to check whether your CSD file is susceptible to such issues will be provided soon.

Resume Interrupted Simulations

If a simulation is interrupted, resume it by running the same command:

# Start
gwmock simulate config.yaml

# If interrupted, resume
gwmock simulate config.yaml

gwmock automatically detects and continues from the last checkpoint.

Combining Data Types

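To create realistic mock data, you can generate noise, signals, and glitches separately and then combine them. Use the same start-time, duration, and sampling-frequency in all configurations so that the resulting frame files line up sample by sample: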

gwmock simulate noise_config.yaml
gwmock simulate signal_config.yaml
gwmock simulate glitch_config.yaml

Then merge the files using GWpy (see Reading Data for details).
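As a rough illustration of the merge step, the sketch below reads one frame file per component with GWpy and adds them sample by sample. The helper names (read_strain, combine, merge_frames) are hypothetical. The sketch assumes that all inputs share the same start time, duration, sampling frequency, and channel name, and that a GWF backend for GWpy is installed. The Reading Data page remains the reference for the recommended workflow.

```python
from functools import reduce
import operator


def read_strain(path, channel):
    """Read one channel from a GWF file with GWpy (imported only when needed)."""
    from gwpy.timeseries import TimeSeries
    return TimeSeries.read(path, channel)


def combine(components):
    """Add the components sample by sample."""
    return reduce(operator.add, components)


def merge_frames(paths, channel, target):
    """Sum the given frame files and write the result to target."""
    total = combine(read_strain(path, channel) for path in paths)
    total.write(target)  # GWpy infers the GWF format from the file extension
    return total
```

A call would pass the noise, signal, and glitch files for the same detector and time segment, for example merge_frames([noise_path, signal_path, glitch_path], channel, "combined.gwf"), where the paths and channel name depend on your configuration.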