Configuration Files

This guide explains how to use and write configuration files to generate datasets tailored to your needs.

Verbosity

All gwmock commands accept a top-level --verbose / -v flag to control log output:

gwmock --verbose DEBUG simulate config.yaml   # detailed debug output
gwmock --verbose WARNING simulate config.yaml # only warnings and errors

Supported levels: NOTSET, DEBUG, INFO (default), WARNING, ERROR, CRITICAL.

Command-Line Options

Command simulate

gwmock simulate config.yaml

This is the primary command used to generate mock data. It takes a .yaml configuration file as input, which defines the simulation parameters.

Flag --overwrite (optional)

By default, gwmock does not overwrite existing output files. If a file already exists, the tool will raise an error and halt execution. To force overwriting of existing files, use the --overwrite flag:

gwmock simulate config.yaml --overwrite

Flag --dry-run (optional)

Test your configuration without generating data:

gwmock simulate config.yaml --dry-run

This validates the configuration and shows what would be generated without actually creating files.

Flag --output-dir (optional)

Override the output directory from the command line without editing the config:

gwmock simulate config.yaml --output-dir /scratch/my_run/data

Flag --metadata-dir (optional)

Override the metadata directory from the command line (config mode only):

gwmock simulate config.yaml --metadata-dir /scratch/my_run/metadata

Flag --metadata (optional)

Generate metadata files along with the data (enabled by default):

gwmock simulate config.yaml --metadata

Metadata files contain complete provenance information including:

  • Simulator configuration
  • Random number generator state
  • Output file names
  • Version information

Flags --author and --email (optional)

Include author information in the metadata files:

gwmock simulate config.yaml --author <your-name> --email <your-email>
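When runs are launched from a script, the simulate flags described above can be assembled programmatically. The following is a minimal sketch, not part of gwmock: build_simulate_command is a hypothetical helper, and it assumes the documented flags can be combined freely in one invocation. Only flags that appear in this guide are used.

```python
from typing import List, Optional


def build_simulate_command(
    config: str,
    *,
    verbose: Optional[str] = None,
    dry_run: bool = False,
    overwrite: bool = False,
    output_dir: Optional[str] = None,
    metadata_dir: Optional[str] = None,
    author: Optional[str] = None,
    email: Optional[str] = None,
) -> List[str]:
    """Build an argument list for `gwmock simulate` (hypothetical helper)."""
    cmd = ["gwmock"]
    # --verbose is a top-level flag, so it goes before the subcommand.
    if verbose is not None:
        cmd += ["--verbose", verbose]
    cmd += ["simulate", config]
    if dry_run:
        cmd.append("--dry-run")
    if overwrite:
        cmd.append("--overwrite")
    if output_dir is not None:
        cmd += ["--output-dir", output_dir]
    if metadata_dir is not None:
        cmd += ["--metadata-dir", metadata_dir]
    if author is not None:
        cmd += ["--author", author]
    if email is not None:
        cmd += ["--email", email]
    return cmd


# Validate first, then build the command for the real run.
dry_run_cmd = build_simulate_command("config.yaml", dry_run=True)
full_cmd = build_simulate_command(
    "config.yaml",
    verbose="DEBUG",
    overwrite=True,
    output_dir="/scratch/my_run/data",
)
print(" ".join(dry_run_cmd))
print(" ".join(full_cmd))
```

Each list can be passed to subprocess.run(cmd, check=True). How gwmock reports a failed dry run (for example, through its exit code) is not described here, so treat that part of any wrapper as an assumption to verify.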

Command config

gwmock config <flag>

This command is used to manage default and example configuration files. Exactly one of the flags --list, --get, or --init must be provided.

Flag --list

List all the available example configuration files stored in the examples directory (see the Examples page).

gwmock config --list

Flag --get

Copy one of the available example configuration files from the examples directory into the working directory. The <example_label> must be one of the example names listed by the gwmock config --list command.

gwmock config --get <example_label>

Flag --init

Create a default configuration file and save it to the working directory.

gwmock config --init config.yaml

Flag --overwrite (optional)

By default, gwmock does not overwrite existing configuration files. If a file already exists, the tool will raise an error and halt execution. To force overwriting of existing files, use the --overwrite flag together with --get or --init:

gwmock config --get noise/uncorrelated_gaussian/quick_start --overwrite
gwmock config --init config.yaml --overwrite

Flag --output (optional)

Specifies the directory or file path where the configuration file will be saved. This flag must be used together with --get or --init. If not provided, the working directory is used by default.

gwmock config --get <example_label> --output <directory or file>

Configuration File Structure

The configuration file uses YAML format. It consists of a shared globals section plus the adapter-backed orchestration schema.

Globals

Top-level shared parameters used across all simulators:

globals:
    working-directory: .
    output-directory: output
    metadata-directory: metadata
    simulator-arguments:
        sampling-frequency:
        duration:
        start-time:
        total-duration:
    output-arguments: {}

Key parameters:

  • working-directory: Base directory for operations
  • output-directory: Where to save generated data files
  • metadata-directory: Where to save metadata files
  • sampling-frequency: Sample rate in Hz
  • duration: Duration of each segment in seconds
  • start-time: GPS start time
  • total-duration: Total duration of the dataset
  • output-arguments: Additional global arguments passed to the file writer
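To see how duration, start-time, and total-duration might relate, the sketch below computes segment start times. It assumes that the dataset is split into consecutive segments of length duration covering total-duration, and that a trailing partial span still counts as a segment. Neither detail is confirmed by this guide; segment_start_times and the numeric values are illustrative only.

```python
import math
from typing import List


def segment_start_times(start_time: int, duration: int, total_duration: int) -> List[int]:
    """Return the GPS start time of each segment (illustrative assumption)."""
    # Assumption: segments are back-to-back and a partial last segment is kept.
    n_segments = math.ceil(total_duration / duration)
    return [start_time + i * duration for i in range(n_segments)]


# Hypothetical values: 4096 s segments covering 16384 s in total.
starts = segment_start_times(start_time=1000000000, duration=4096, total_duration=16384)
print(len(starts), starts)
```

Under these assumptions the example yields four segments, which is also the number of output files to expect per detector.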

Orchestration

The orchestration: section is required and must contain at least one of population, signal, or noise. CBC signal generation uses population plus signal. SGWB signal generation can use signal without population when signal.source-type is set.

orchestration:
    population:
        backend: FilePopulationLoader # or any registered backend alias
        source-type: bbh
        n-samples: 128 # optional; omit to load the full catalogue
        arguments:
            path: population.h5

    signal:
        waveform-model: IMRPhenomXPHM
        minimum-frequency: 10
        detectors:
            - ET-Triangle-EMR
        output:
            file_name:
                'E-{{ detectors }}_STRAIN_BBH-{{ start_time }}-{{ duration
                }}.gwf'
            arguments:
                channel: '{{ detectors }}:STRAIN'

    noise:
        arguments:
            psd_file: ET_10_full_cryo_psd
            seed: 42
            detectors:
                - ET-Triangle-EMR
        output:
            file_name:
                'E-{{ detectors }}_STRAIN_NOISE-{{ start_time }}-{{ duration
                }}.gwf'
            arguments:
                channel: '{{ detectors }}:STRAIN'

For SGWB studies, use signal.source-type: sgwb. Constructor options for the SGWB backend belong under signal.arguments, while spectrum parameters passed to simulate(...) belong under signal.parameters:

orchestration:
    signal:
        source-type: sgwb
        detectors:
            - ET-Triangle-Sardinia
        minimum-frequency: 5
        parameters:
            omega_ref: 1.0e-9
            spectral_index: 0.0
            reference_frequency: 25.0
        output:
            file_name: sgwb-{{ counter }}.hdf5

For the full schema and backend registration options, see the Orchestration guide.
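The section rules stated above can be expressed as a small pre-flight check on an already-parsed configuration. This is a sketch of one reading of those rules, not gwmock's own validator: check_orchestration is a hypothetical function, and it interprets "signal without population" as requiring signal.source-type to be present.

```python
from typing import Any, Dict, List


def check_orchestration(config: Dict[str, Any]) -> List[str]:
    """Return a list of problems found in the orchestration section."""
    orchestration = config.get("orchestration")
    if not isinstance(orchestration, dict):
        return ["missing required 'orchestration' section"]

    problems = []
    present = [key for key in ("population", "signal", "noise") if key in orchestration]
    if not present:
        problems.append("orchestration needs at least one of population, signal, or noise")

    # Assumed reading: a signal section without a population must name its source type.
    signal = orchestration.get("signal") or {}
    if "signal" in present and "population" not in present and "source-type" not in signal:
        problems.append("signal without population requires signal.source-type")
    return problems


# The SGWB layout from this guide, written as a parsed dictionary.
sgwb_config = {
    "orchestration": {
        "signal": {
            "source-type": "sgwb",
            "detectors": ["ET-Triangle-Sardinia"],
            "minimum-frequency": 5,
        }
    }
}
print(check_orchestration(sgwb_config))
print(check_orchestration({"orchestration": {}}))
```

The --dry-run flag remains the authoritative way to validate a configuration; a check like this is only useful for catching obvious omissions before invoking the tool.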

Transient glitches are configured on the noise side under orchestration.noise.arguments.glitches using public gwmock-noise glitch models. For example:

orchestration:
    noise:
        arguments:
            glitches:
                - kind: gengli_blip
                  rate: 0.0011111111111111111
                  amplitude_distribution:
                      distribution: lognormal
                      mean: 1.0
                      std: 0.0
                  population_file: glitches.hdf5
                  psd_file: https://example.org/ET_10_full_cryo_psd.txt

Template Variables

You can use Jinja2-style templates in configuration values such as file names and channel names:

orchestration:
    noise:
        arguments:
            detectors:
                - E1_triangle_emr
                - E2_triangle_emr
                - E3_triangle_emr
        output:
            file_name:
                'E-{{ detectors }}_STRAIN_NOISE-{{ start_time }}-{{ duration
                }}.gwf'
            arguments:
                channel: '{{ detectors }}:STRAIN'

In this example, file_name is automatically expanded for each detector being processed.

Common variables:

  • {{ start_time }}: GPS start time from globals
  • {{ duration }}: Segment duration from globals
  • {{ detectors }}: Current detector being processed. A network alias such as ET-Triangle-EMR expands to one file/channel per interferometer, with {{ detectors }} resolving to the per-interferometer token (ET1_EMR, ET2_EMR, ET3_EMR)
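The per-detector expansion described above can be illustrated with a few lines of standard-library Python. This is only a sketch of the idea and does not reproduce gwmock's template engine: render is a hypothetical helper that handles plain {{ name }} placeholders, and the start time and duration are made-up values.

```python
import re
from typing import Any, Dict


def render(template: str, context: Dict[str, Any]) -> str:
    """Replace each {{ name }} placeholder with the matching context value."""
    return re.sub(
        r"\{\{\s*(\w+)\s*\}\}",
        lambda match: str(context[match.group(1)]),
        template,
    )


template = "E-{{ detectors }}_STRAIN_NOISE-{{ start_time }}-{{ duration }}.gwf"

# Per-interferometer tokens for the ET-Triangle-EMR alias, as listed above.
interferometers = ["ET1_EMR", "ET2_EMR", "ET3_EMR"]

file_names = [
    render(template, {"detectors": ifo, "start_time": 1000000000, "duration": 4096})
    for ifo in interferometers
]
for name in file_names:
    print(name)
```

One template therefore produces one file name per interferometer, and the same substitution applies to the channel value.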

Checkpointing

gwmock automatically creates checkpoints during long simulations. If a process is interrupted:

  1. A .gwmock_checkpoint/simulation.checkpoint.json file is saved in the working directory
  2. Rerun the same command to resume from the last checkpoint
  3. The tool automatically detects the checkpoint and continues from where it left off

# Start simulation
gwmock simulate config.yaml

# If interrupted (Ctrl+C, crash, etc.), resume with same command
gwmock simulate config.yaml

The checkpoint contains:

  • Simulator state
  • Progress information
  • Already-generated file tracking
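A wrapper script may want to know whether a rerun will resume or start fresh. The sketch below only checks for the checkpoint file at the path given above; find_checkpoint is a hypothetical helper, and the file's internal layout is deliberately not inspected because it is not documented here.

```python
from pathlib import Path
from typing import Optional

# Location given in this guide, relative to the working directory.
CHECKPOINT_RELATIVE_PATH = Path(".gwmock_checkpoint") / "simulation.checkpoint.json"


def find_checkpoint(working_directory: str = ".") -> Optional[Path]:
    """Return the checkpoint path if the file exists, otherwise None."""
    candidate = Path(working_directory) / CHECKPOINT_RELATIVE_PATH
    return candidate if candidate.is_file() else None


checkpoint = find_checkpoint(".")
if checkpoint is not None:
    print(f"Checkpoint found at {checkpoint}; rerunning should resume.")
else:
    print("No checkpoint found; the next run starts from the beginning.")
```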

Resource Usage Summary

After every successful simulation, gwmock writes a resource_usage_summary.json file to the working directory. This file records CPU time, peak memory usage, and wall time for the run. It is always written (overwriting any previous summary) and is not controlled by a flag.
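Because the summary is overwritten on each run, it can be worth reading or archiving it right after a simulation finishes. The following sketch loads the file with the standard json module; load_resource_summary is a hypothetical helper, and since the key names inside the file are not listed in this guide, it makes no assumptions about them.

```python
import json
from pathlib import Path
from typing import Any, Optional


def load_resource_summary(working_directory: str = ".") -> Optional[Any]:
    """Load resource_usage_summary.json if present, otherwise return None."""
    path = Path(working_directory) / "resource_usage_summary.json"
    if not path.is_file():
        return None
    with path.open(encoding="utf-8") as handle:
        return json.load(handle)


summary = load_resource_summary(".")
if summary is None:
    print("No resource usage summary found.")
else:
    # Print whatever the file contains without assuming specific keys.
    print(json.dumps(summary, indent=2))
```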

Best Practices

  1. Use templates: Leverage Jinja2 templates for dynamic configuration
  2. Set seeds: Always set a seed for reproducibility
  3. Check space: Ensure sufficient disk space before long runs
  4. Use dry-run: Test configurations with --dry-run before full simulation
  5. Organize outputs: Use descriptive output-directory and metadata-directory names