Generating Data

This guide shows how to use gwmock to create realistic mock data for gravitational-wave detectors.

It uses the Einstein Telescope (ET) triangular configuration located in the Meuse-Rhine Euregion as an example.

For an overview of all example configuration files for ET simulations, see the Examples page. For a quick guide on reading and working with the output GWF files, see the Reading Data page.

Generating Detector Noise

Detector noise can be generated using configuration files in the examples/noise directory. An example configuration for producing one day of ET noise data is provided in uncorrelated_gaussian/et_triangle_emr/config.yaml:

globals:
    # Runtime arguments shared by simulators (can be overridden per-simulator)
    simulator-arguments:
        # Sampling frequency in Hz (affects time resolution and file size)
        sampling-frequency: 4096
        # Duration of each generated segment in seconds
        duration: 4096
        # Human-friendly total duration (may be parsed by the config loader)
        total-duration: 1 day
        # GPS start time for the simulation
        start-time: 1577491218
    # Working directory for generated outputs (relative or absolute)
    working-directory: .
    # Default directory where generated frame files are written
    output-directory: output
    # Directory to store metadata sidecars and checkpoints
    metadata-directory: metadata

simulators:
    # Uncorrelated Gaussian/Colored noise simulator config for ET triangular detectors
    noise:
        class: gwmock_noise.ColoredNoiseSimulator
        arguments:
            # PSD file describing the colored noise shape used by the simulator
            psd_file: ET_10_full_cryo_psd.txt
            # List of detector names to generate noise for (network support)
            detectors:
                - E1_triangle_emr
                - E2_triangle_emr
                - E3_triangle_emr
            # Low frequency cutoff (Hz) to apply to the generated time series
            low_frequency_cutoff: 2
            # RNG seed for reproducibility; change per-run for different realizations
            seed: 42
        output:
            # Template used to construct the output file name at runtime. Uses template variables
            # such as `detectors`, `start_time` and `duration` expanded by the config loader.
            file_name:
                'E-{{ detectors }}_NOISE_STRAIN-{{ start_time }}-{{ duration
                }}.gwf'
            arguments:
                # Channel name (template) for the output frame (e.g. "E1_triangle_emr:STRAIN")
                channel: '{{ detectors }}:STRAIN'

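This configuration generates one day of noise data per detector (E1, E2, E3). Each frame file covers 4096 seconds, so the day is split into 22 frame files per detector (86400 / 4096 ≈ 21.1, rounded up). The data start at GPS time 1577491218, which corresponds to 1 January 2030, 00:00 UTC.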

Noise is simulated using the publicly available ET_10_full_cryo_psd sensitivity curve from the CoBA Science Study, with a low-frequency cutoff of 2 Hz.
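Each output file name is built from the file_name template, giving one file per detector and per segment. The sketch below expands the template for E1 with a simplified stand-in for the config loader's templating. The render and segment_start_times helpers are hypothetical and not part of gwmock. The sketch also assumes contiguous 4096-second segments, each named with the full 4096-second duration. How gwmock names a final partial segment is not covered here.

```python
from __future__ import annotations

import re

# file_name template from the noise example above.
FILE_NAME_TEMPLATE = "E-{{ detectors }}_NOISE_STRAIN-{{ start_time }}-{{ duration }}.gwf"


def render(template: str, **values: object) -> str:
    """Replace each '{{ key }}' placeholder with str(values[key]).

    Simplified stand-in for the config loader's templating (hypothetical helper).
    """
    return re.sub(r"\{\{\s*(\w+)\s*\}\}", lambda m: str(values[m.group(1)]), template)


def segment_start_times(start_time: int, duration: int, n_frames: int) -> list[int]:
    """GPS start time of each frame file, assuming contiguous segments."""
    return [start_time + k * duration for k in range(n_frames)]


starts = segment_start_times(1577491218, 4096, 22)
names = [
    render(FILE_NAME_TEMPLATE, detectors="E1_triangle_emr", start_time=t, duration=4096)
    for t in starts
]
print(len(names))  # 22
print(names[0])    # E-E1_triangle_emr_NOISE_STRAIN-1577491218-4096.gwf
```

Applying the same substitution to the channel template '{{ detectors }}:STRAIN' gives channel names such as E1_triangle_emr:STRAIN.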

To generate the ET noise data, run:

# Create working directory
mkdir noise_et_triangle_emr
cd noise_et_triangle_emr

# Copy configuration file to your working directory
gwmock config --get noise/uncorrelated_gaussian/et_triangle_emr --output config.yaml

# Run simulation
gwmock simulate config.yaml

Storage Requirements

Each GWF file is approximately 123 MB. For three detectors with 22 files each (66 files in total):

Data files: ~8 GB
Metadata: ~55 KB
Total: ~8 GB
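These figures scale linearly with the number of detectors and frame files. The small helper below lets you redo the estimate for other configurations. It is hypothetical, not part of gwmock, uses the ~123 MB per-file figure quoted above, and takes 1 GB as 1024 MB.

```python
def storage_estimate_gb(file_size_mb: float, n_detectors: int, n_frames: int) -> float:
    """Approximate data volume in GB (1 GB taken as 1024 MB)."""
    return file_size_mb * n_detectors * n_frames / 1024


# ET triangle example: 3 detectors x 22 frame files of ~123 MB each.
print(round(storage_estimate_gb(123, 3, 22), 1))  # 7.9
```

The per-file size itself depends on sampling-frequency and duration. If you change either, rescale file_size_mb accordingly; treat the result as a rough estimate, because compression may not scale exactly.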

Generating CBC Signals

Compact Binary Coalescence (CBC) signals can be generated using configuration files in the examples/signal/bbh and examples/signal/bns directories.

Binary Black Hole (BBH) Signals

An example configuration for producing one day of ET data containing BBH signals from a realistic population is provided in signal/bbh/et_triangle_emr/config.yaml:

globals:
    simulator-arguments:
        # Sampling frequency in Hz (affects time resolution and file size)
        sampling-frequency: 4096
        # Duration of each generated segment in seconds
        duration: 4096
        # Human-friendly total duration (may be parsed by the config loader)
        total-duration: 1 day
        # GPS start time for the simulation
        start-time: 1577491218
    # Working directory for generated outputs (relative or absolute)
    working-directory: .
    # Default directory where generated frame files are written
    output-directory: output
    # Directory to store metadata sidecars and checkpoints
    metadata-directory: metadata

orchestration:
    population:
        backend: FilePopulationLoader
        source-type: bbh
        n-samples: 128
        arguments:
            # Path or URL to population file (HDF5) containing simulated BBH events
            path: https://sandbox.zenodo.org/records/413548/files/18321_1yrCatalogBBH.h5
    signal:
        # Waveform model and arguments for signal generation
        waveform-model: IMRPhenomXPHM
        # Additional waveform model arguments
        waveform-arguments:
            # Reference frequency in Hz for waveform generation
            reference_frequency: 50
        # Minimum frequency in Hz for the waveform generation
        minimum-frequency: 2
        # List of detector names to generate signal for (network support)
        detectors:
            - ET-Triangle-EMR
        output:
            output_directory: signal
            # Template used to construct the output file name at runtime. Uses template variables
            # such as `detectors`, `start_time` and `duration` expanded by the config loader.
            file_name:
                'E-{{ detectors }}_STRAIN_BBH-{{ start_time }}-{{ duration
                }}.gwf'
            arguments:
                # Channel name (template) for the output frame (e.g. "E1_triangle_emr:STRAIN")
                channel: '{{ detectors }}:STRAIN'
    noise:
        arguments:
            psd_file: ET_10_full_cryo_psd.txt
            seed: 42
            detectors:
                - E1_triangle_emr
                - E2_triangle_emr
                - E3_triangle_emr
        output:
            output_directory: noise
            file_name: et-noise-{{ counter }}.gwf
            arguments:
                channel_prefix: STRAIN

As with the noise example, this configuration file produces one day of data per detector. Each frame file lasts 4096 seconds (22 frame files in total per detector), starting on 1 January 2030.

BBH signals are drawn from the publicly available 18321_1yrCatalogBBH.h5 population file used in the CoBA study and injected into zero noise. The IMRPhenomXPHM waveform model is used with a low-frequency cutoff of 2 Hz, and Earth rotation effects are included.

To generate the ET data with BBH signals, run:

# Create working directory
mkdir bbh_et_triangle_emr
cd bbh_et_triangle_emr

# Copy configuration file to your working directory
gwmock config --get signal/bbh/et_triangle_emr --output config.yaml

# Run simulation
gwmock simulate config.yaml

Note

The configuration file automatically downloads the BBH population file from a Zenodo repository. The file is saved in a cache directory (by default, ~/.gwmock/population/). When the same population file is needed again, gwmock uses the cached copy to avoid re-downloading.
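To check whether a population file is already cached (for example, before running offline), look for it in the cache directory. The sketch below assumes the cached copy keeps the file name from the URL. That layout and the cached_copy helper are assumptions, not documented gwmock behavior.

```python
from pathlib import Path
from urllib.parse import urlparse

# Default cache directory mentioned in the note above.
DEFAULT_CACHE_DIR = Path.home() / ".gwmock" / "population"


def cached_copy(url: str, cache_dir: Path = DEFAULT_CACHE_DIR) -> Path:
    """Expected location of the cached file (assumes the URL's file name is kept)."""
    return cache_dir / Path(urlparse(url).path).name


url = "https://sandbox.zenodo.org/records/413548/files/18321_1yrCatalogBBH.h5"
path = cached_copy(url)
print(path, "(cached)" if path.exists() else "(not cached yet)")
```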

Binary Neutron Star (BNS) Signals

An example configuration for producing one day of ET data containing BNS signals from a realistic population is provided in signal/bns/et_triangle_emr/config.yaml. It is equivalent to the BBH example configuration, except for:

orchestration:
    population:
        arguments:
            path: https://sandbox.zenodo.org/records/413548/files/18321_1yrCatalogBNS.h5
    signal:
        waveform-model: IMRPhenomPv2_NRTidalv2

BNS signals are drawn from the publicly available 18321_1yrCatalogBNS.h5 population file used in the CoBA study and injected into zero noise. The IMRPhenomPv2_NRTidalv2 waveform model is used with a low-frequency cutoff of 2 Hz, and Earth rotation effects are included.

To generate the ET data with BNS signals, run:

# Create working directory
mkdir bns_et_triangle_emr
cd bns_et_triangle_emr

# Copy configuration file to your working directory for BNS simulation
gwmock config --get signal/bns/et_triangle_emr --output config.yaml

# Run simulation
gwmock simulate config.yaml

Note

The configuration file automatically downloads the BNS population file from a Zenodo repository. The file is saved in a cache directory (by default, ~/.gwmock/population/). When the same population file is needed again, gwmock uses the cached copy to avoid re-downloading.

Generating Transient Noise Artifacts (Glitches)

Glitches can be generated using configuration files in the examples/glitch directory. These examples attach gwmock-noise glitch models through orchestration.noise.arguments.glitches, so glitches are treated as detector artifacts in the protocol-only pipeline. The bundled gengli integration currently supports only blip glitches.

An example configuration for producing one day of ET data for the E1 detector containing blip glitches from a realistic population is provided in glitch/gengli/et_triangle_emr/e1/config.yaml:

globals:
    simulator-arguments:
        sampling-frequency: 4096
        duration: 4096
        total-duration: 1 day
        start-time: 1577491218
    working-directory: .
    output-directory: output
    metadata-directory: metadata

orchestration:
    population:
        backend: FilePopulationLoader
        source-type: bbh
        n-samples: 1
        arguments:
            path: examples/glitch/gengli/bbh_smoke_population.csv
    signal:
        waveform-model: IMRPhenomXPHM
        minimum-frequency: 2
        detectors:
            - ET1_EMR
        output:
            output_directory: signal
            file_name:
                signal-{{ detectors }}-{{ start_time }}-{{ duration }}.gwf
            arguments:
                channel: '{{ detectors }}:STRAIN'
    noise:
        arguments:
            seed: 42
            detectors:
                - E1_triangle_emr
            glitches:
                - kind: gengli_blip
                  rate: 0.0011111111111111111
                  amplitude_distribution:
                      distribution: lognormal
                      mean: 1.0
                      std: 0.0
                  population_file: https://sandbox.zenodo.org/records/413548/files/blip_glitch_population_E1.hdf5
                  psd_file: ET_10_full_cryo_psd.txt
                  low_frequency_cutoff: 5.0
        output:
            output_directory: noise
            file_name: glitch-{{ counter }}.gwf
            arguments:
                channel_prefix: STRAIN

This configuration file generates one day of data for the E1 detector, divided into 4096-second frame files (22 frames in total), starting on 1 January 2030.

Blip glitches are drawn from the blip_glitch_population_E1.hdf5 population file and injected into zero noise. This population file can be generated from Gravity Spy tables with the gwmock-noise build-blip-glitch-table command. The glitches are modeled on LIGO blip glitches observed during the O3 observing run and recolored to match the ET sensitivity.
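The rate key sets how often glitches occur. The sketch below assumes it is expressed in glitches per second (0.0011111111111111111 = 1/900, i.e. one blip every 15 minutes on average) and that glitch times follow a Poisson process. It computes the expected counts and draws a sample set of arrival times. The sampler is illustrative and is not necessarily how gwmock-noise draws glitch times.

```python
import random

RATE_HZ = 0.0011111111111111111  # `rate` from the config; assumed to be glitches per second


def expected_glitches(rate_hz: float, seconds: float) -> float:
    """Mean number of glitches in a stretch of data for a Poisson process."""
    return rate_hz * seconds


def draw_glitch_times(rate_hz: float, start: float, seconds: float, seed: int = 42) -> list:
    """Poisson arrival times in [start, start + seconds) via exponential waiting times."""
    rng = random.Random(seed)
    times = []
    t = start + rng.expovariate(rate_hz)
    while t < start + seconds:
        times.append(t)
        t += rng.expovariate(rate_hz)
    return times


print(round(expected_glitches(RATE_HZ, 86400), 2))  # 96.0 per day
print(round(expected_glitches(RATE_HZ, 4096), 2))   # 4.55 per 4096-s frame
times = draw_glitch_times(RATE_HZ, 1577491218, 86400)
```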

To generate the ET data for detector E1 with glitches, run:

# Create working directory
mkdir -p glitch_et_triangle_emr/e1
cd glitch_et_triangle_emr/e1

# Copy configuration file to your working directory for glitch simulation
gwmock config --get glitch/gengli/et_triangle_emr/e1 --output config.yaml

# Run simulation
gwmock simulate config.yaml

Note

The configuration file automatically downloads the glitch population file from a Zenodo repository. The file is saved in a cache directory (by default, ~/.gwmock/population/). When the same population file is needed again, gwmock uses the cached copy to avoid re-downloading.

Using Different Detector Configurations

gwmock includes several pre-configured Einstein Telescope detector geometries, available in gwmock/detector/detectors:

Triangular Configuration (Meuse-Rhine Euregion)

E1_triangle_emr
E2_triangle_emr
E3_triangle_emr

Triangular Configuration (Sardinia)

E1_triangle_sardinia
E2_triangle_sardinia
E3_triangle_sardinia

2L Aligned Configuration

E1_2L_aligned_sardinia
E2_2L_aligned_emr

2L Misaligned Configuration

E1_2L_misaligned_sardinia
E2_2L_misaligned_emr

To use a specific configuration, update the detectors list in your configuration file:

detectors:
    - E1_2L_aligned_sardinia
    - E2_2L_aligned_emr

You don't need to include all detectors. For example, to generate only E1 data:

detectors:
    - E1_2L_aligned_sardinia

Using Different Sensitivity Curves

Multiple Einstein Telescope sensitivity curves (PSD files) are available in gwmock/detector/noise_curves/. These correspond to those used in the CoBA study.

To use a specific sensitivity curve:

simulators:
    noise:
        arguments:
            psd_file: ET_15_HF_psd.txt

Note

The detector geometries assume 10 km arms for triangular configurations and 15 km arms for 2L configurations. Choose sensitivity curves accordingly.
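One quick sanity check for the note above is to compare each detector's arm length with the arm length the PSD was computed for. The sketch assumes PSD file names follow the ET_<arm length in km>_... pattern of the files used in this guide (ET_10_full_cryo_psd.txt, ET_15_HF_psd.txt). It also assumes detector names contain either "triangle" or "_2L_". Both conventions, and the helper names, are assumptions.

```python
import re


def detector_arm_length_km(detector: str) -> int:
    """Arm length from the note above: 10 km for triangles, 15 km for 2L detectors."""
    name = detector.lower()
    if "triangle" in name:
        return 10
    if "_2l_" in name:
        return 15
    raise ValueError(f"unknown geometry for {detector!r}")


def psd_arm_length_km(psd_file: str) -> int:
    """Arm length encoded in names like 'ET_10_full_cryo_psd.txt' (assumed convention)."""
    match = re.match(r"ET_(\d+)_", psd_file)
    if match is None:
        raise ValueError(f"cannot infer arm length from {psd_file!r}")
    return int(match.group(1))


def psd_matches(detectors: list, psd_file: str) -> bool:
    """True if every detector's arm length matches the PSD's arm length."""
    return all(detector_arm_length_km(d) == psd_arm_length_km(psd_file) for d in detectors)


print(psd_matches(["E1_2L_aligned_sardinia", "E2_2L_aligned_emr"], "ET_15_HF_psd.txt"))  # True
print(psd_matches(["E1_triangle_emr"], "ET_15_HF_psd.txt"))                              # False
```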

Adjusting Dataset Duration

The length of a dataset is controlled by the following keys under globals.simulator-arguments:

start-time: # GPS start time of the dataset
duration: # Duration per frame file (seconds)
total-duration: # Total duration of the dataset

To change the dataset duration, simply adjust these parameters in your configuration file.

You can also change the sampling frequency of your dataset (the number of samples per second, in Hz) with the sampling-frequency key.

Total number of frame files:

The total number of frame files (max_samples) is the total duration of the dataset divided by the duration of each frame file, rounded up to the next integer:

max_samples = ceil(total-duration / duration)
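For the one-day examples in this guide, 86400 / 4096 ≈ 21.09, which rounds up to 22 frame files per detector. Here is the same arithmetic in Python. The n_frame_files helper is a convenience function, not a gwmock function.

```python
import math


def n_frame_files(total_duration_s: float, duration_s: float) -> int:
    """Number of frame files: total duration / per-file duration, rounded up."""
    return math.ceil(total_duration_s / duration_s)


print(n_frame_files(86400, 4096))  # 22 (one day of 4096-s files)
print(n_frame_files(7200, 4096))   # 2 (two hours of 4096-s files)
print(n_frame_files(86400, 3600))  # 24 (one day of 1-hour files)
```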

Note

The total-duration argument can be passed as a float in seconds, or as a str specifying the time unit ("1 day", "5 days", "2 weeks", "2 months", etc.). The supported time units are:

second
minute
hour
day
week
month (30 days)
year (365 days)

Singular and plural forms are both accepted (e.g., "1 day" and "2 days").
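The conversion from a duration string to seconds can be sketched as follows. This illustrates the units listed above (month = 30 days, year = 365 days) and is not gwmock's actual config parser. The parse_total_duration name is hypothetical.

```python
import re

# Seconds per unit, following the list above.
UNIT_SECONDS = {
    "second": 1,
    "minute": 60,
    "hour": 3600,
    "day": 86400,
    "week": 7 * 86400,
    "month": 30 * 86400,
    "year": 365 * 86400,
}


def parse_total_duration(value) -> float:
    """Convert seconds (int/float) or a string such as '2 weeks' to seconds."""
    if isinstance(value, (int, float)):
        return float(value)
    # Number, optional whitespace, unit name with an optional plural 's'.
    match = re.fullmatch(r"(\d+(?:\.\d+)?)\s*([a-z]+?)s?", value.strip().lower())
    if match is None or match.group(2) not in UNIT_SECONDS:
        raise ValueError(f"unsupported duration: {value!r}")
    return float(match.group(1)) * UNIT_SECONDS[match.group(2)]


print(parse_total_duration("1 day"))    # 86400.0
print(parse_total_duration("2 weeks"))  # 1209600.0
```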

Tip

A UTC/GPS time converter is available at the Gravitational Wave Open Science Center.
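You can also check the GPS start time used in the examples directly in Python with the standard library. This sketch hard-codes the current GPS-UTC offset of 18 seconds, which has applied since 1 January 2017, and assumes no further leap seconds are introduced. For leap-second-aware conversions, the gwpy.time module in GWpy (for example from_gps and to_gps) is a more robust choice.

```python
from datetime import datetime, timedelta, timezone

GPS_EPOCH = datetime(1980, 1, 6, tzinfo=timezone.utc)
# GPS - UTC offset in seconds (18 s since 2017-01-01); assumes no new leap seconds.
LEAP_SECONDS = 18


def gps_to_utc(gps_seconds: float) -> datetime:
    """Convert GPS seconds to a UTC datetime (valid only for times after 2017-01-01)."""
    return GPS_EPOCH + timedelta(seconds=gps_seconds - LEAP_SECONDS)


def utc_to_gps(when: datetime) -> float:
    """Convert a timezone-aware UTC datetime to GPS seconds (same validity range)."""
    return (when - GPS_EPOCH).total_seconds() + LEAP_SECONDS


print(gps_to_utc(1577491218).isoformat())  # 2030-01-01T00:00:00+00:00
```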

Tip

Sampling frequencies are often powers of 2 for efficiency. Common choices:

4096 Hz (standard for GW data analysis)
2048 Hz
16384 Hz (high-frequency instruments)

Lowering the sampling frequency reduces computation time and file size, but it also lowers the highest resolvable frequency (the Nyquist limit, sampling_frequency / 2).
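The trade-off can be made concrete with a few lines of Python. The helpers below compute the Nyquist frequency and the number of samples per frame for common choices, and the smallest power-of-two sampling frequency whose Nyquist limit reaches a given maximum frequency. They are illustrative and not part of gwmock.

```python
import math


def nyquist_hz(sampling_frequency: float) -> float:
    """Highest resolvable frequency for a given sampling frequency."""
    return sampling_frequency / 2


def samples_per_frame(sampling_frequency: int, duration_s: int) -> int:
    """Number of samples per channel in one frame file."""
    return sampling_frequency * duration_s


def min_power_of_two_rate(f_max_hz: float) -> int:
    """Smallest power-of-two sampling frequency with Nyquist limit >= f_max_hz."""
    return 1 << math.ceil(math.log2(2 * f_max_hz))


for fs in (2048, 4096, 16384):
    print(fs, nyquist_hz(fs), samples_per_frame(fs, 4096))
# 2048 1024.0 8388608
# 4096 2048.0 16777216
# 16384 8192.0 67108864
print(min_power_of_two_rate(1500))  # 4096
```

In practice you may want some margin above the highest frequency of interest, since the response near the Nyquist limit is affected by filtering.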

Generating Multi-Detector Correlated Noise

You can generate multi-detector correlated noise by specifying a cross-power spectral density (CSD) file:

Warning

The example configuration file is not fully tested yet. Use at your own risk.

globals:
    simulator-arguments:
        sampling-frequency: 4096
        duration: 4096
        total-duration: '1 day'
        start-time: 1577491218
    working-directory: './ET_Triangle_EMR_correlated_noise'
    output-directory: 'data'
    metadata-directory: 'metadata'

simulators:
    noise:
        class: gwmock_noise.CorrelatedNoiseSimulator
        arguments:
            psd_file: ET_10_full_cryo_psd.txt
            csd_file: path_to_csd_file.txt
            detectors:
                - E1_triangle_emr
                - E2_triangle_emr
                - E3_triangle_emr
            low_frequency_cutoff: 2
            seed: 42
        output:
            file_name:
                'E-{{ detectors }}_CORRELATED-NOISE_STRAIN-{{ start_time }}-{{
                duration }}.gwf'
            arguments:
                channel: '{{ detectors }}:STRAIN'

gwmock uses a windowing approach to generate long-duration datasets. If the input CSD varies rapidly with frequency, this windowing can introduce artifacts in the resulting frame files.

A diagnostic tool to check whether your CSD file is susceptible to such issues will be provided soon.
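At each frequency, the PSD and CSD together define a spectral matrix with the PSD on the diagonal and the CSD off the diagonal. A common way to draw noise with that structure is a per-frequency Cholesky factorisation. The sketch below illustrates the idea under simplifying assumptions: a real-valued CSD shared by all detector pairs, arbitrary units, and no windowing or time-domain conversion. It is not gwmock's CorrelatedNoiseSimulator. For three detectors sharing the same PSD p and CSD c, the matrix is positive definite only when -p/2 < c < p, which gives you a quick check to run on a CSD file.

```python
import numpy as np


def correlated_fd_noise(psd, csd, n_detectors, rng):
    """Frequency-domain noise with S_ii(f) = psd(f) and S_ij(f) = csd(f) for i != j.

    psd, csd: 1-D arrays with one value per frequency bin (same for every detector/pair).
    Returns a complex array of shape (n_detectors, n_bins). Physical normalisation
    (factors involving the frequency resolution) is deliberately omitted.
    """
    psd = np.asarray(psd, dtype=float)
    csd = np.asarray(csd, dtype=float)
    n_bins = psd.size
    # One n_detectors x n_detectors matrix per bin: CSD everywhere, PSD on the diagonal.
    spec = np.broadcast_to(csd[:, None, None], (n_bins, n_detectors, n_detectors)).copy()
    diag = np.arange(n_detectors)
    spec[:, diag, diag] = psd[:, None]
    # Per-bin Cholesky factor; raises LinAlgError if any matrix is not positive definite.
    chol = np.linalg.cholesky(spec)
    # Independent circular complex Gaussians with unit variance.
    shape = (n_bins, n_detectors)
    z = (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)
    # x_k = L_k z_k, so that E[x_k x_k^H] = L_k L_k^H = S_k.
    x = np.einsum("kij,kj->ki", chol, z)
    return x.T


rng = np.random.default_rng(42)
psd = np.ones(2049)  # placeholder spectrum in arbitrary units
x = correlated_fd_noise(psd, 0.3 * psd, 3, rng)
print(x.shape)  # (3, 2049)
```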

Resuming Interrupted Simulations

If a simulation is interrupted, resume it by running the same command:

# Start
gwmock simulate config.yaml

# If interrupted, resume
gwmock simulate config.yaml

gwmock automatically detects the checkpoint stored in the metadata directory (metadata-directory) and continues from there.

Combining Data Types

To create realistic mock data, you may generate noise, signals, and glitches separately, then combine them:

gwmock simulate noise_config.yaml
gwmock simulate signal_config.yaml
gwmock simulate glitch_config.yaml

Then combine the files by adding the strain time series of each component, for example with GWpy (see Reading Data for details).
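Below is a minimal sketch of the combining step. It assumes the noise, signal, and glitch frames share the same start time, sample rate, and length. It uses GWpy's TimeSeries.read and TimeSeries.write, and writing GWF requires a GWF I/O backend to be installed. The sum_aligned and merge_gwf helpers are hypothetical. The file and channel names in the usage comment are placeholders, not names gwmock is guaranteed to produce.

```python
from __future__ import annotations

import numpy as np


def sum_aligned(components: list[tuple[float, float, np.ndarray]]) -> tuple[float, float, np.ndarray]:
    """Add strain arrays that share start time (t0), sample rate and length."""
    t0, fs, first = components[0]
    total = np.array(first, dtype=float)  # copy, so the inputs are not modified
    for other_t0, other_fs, data in components[1:]:
        if other_t0 != t0 or other_fs != fs or len(data) != len(total):
            raise ValueError("components are not on the same time grid")
        total += data
    return t0, fs, total


def merge_gwf(sources: list[tuple[str, str]], output_path: str, output_channel: str) -> None:
    """Read (path, channel) pairs with GWpy, sum them, and write one GWF file."""
    from gwpy.timeseries import TimeSeries  # imported lazily; requires GWpy

    parts = [TimeSeries.read(path, channel) for path, channel in sources]
    t0, fs, data = sum_aligned([(p.t0.value, p.sample_rate.value, p.value) for p in parts])
    merged = TimeSeries(data, t0=t0, sample_rate=fs, name=output_channel, channel=output_channel)
    merged.write(output_path)


# Example usage (placeholder file and channel names):
# merge_gwf(
#     [("noise/E1-0.gwf", "E1_triangle_emr:STRAIN"),
#      ("signal/E1-0.gwf", "E1_triangle_emr:STRAIN")],
#     "E1_combined-1577491218-4096.gwf",
#     "E1_triangle_emr:STRAIN",
# )
```

The alignment check is kept in plain NumPy so that misaligned inputs fail loudly instead of being summed sample by sample.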