Configuration Files¶
This guide explains how to use and write configuration files to generate datasets tailored to your needs.
Verbosity¶
All gwmock commands accept a top-level --verbose / -v flag to control log
output:
gwmock --verbose DEBUG simulate config.yaml # detailed debug output
gwmock --verbose WARNING simulate config.yaml # only warnings and errors
Supported levels: NOTSET, DEBUG, INFO (default), WARNING, ERROR,
CRITICAL.
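The level names are the same as those in Python's standard logging module. The sketch below shows how such names order as thresholds. It assumes gwmock maps the flag value onto the standard levels, which the text above does not state; the helper names are hypothetical.

```python
import logging

# Level names accepted by --verbose, as listed above.
LEVELS = ["NOTSET", "DEBUG", "INFO", "WARNING", "ERROR", "CRITICAL"]


def numeric_level(name: str) -> int:
    """Return the numeric threshold the logging module assigns to a level name."""
    return getattr(logging, name)


def is_shown(message_level: str, verbose: str = "INFO") -> bool:
    """True if a message at message_level passes the threshold set by --verbose."""
    return numeric_level(message_level) >= numeric_level(verbose)


for name in LEVELS:
    print(f"{name:8s} threshold {numeric_level(name)}")
```

Under this assumption, `--verbose WARNING` hides INFO and DEBUG messages, while `--verbose DEBUG` shows everything at DEBUG and above.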
Command-Line Options¶
Command simulate¶
gwmock simulate config.yaml
This is the primary command used to generate mock data. It takes a .yaml
configuration file as input, which defines the simulation parameters.
Flag --overwrite (optional)¶
By default, gwmock does not overwrite existing output files. If a file already
exists, the tool will raise an error and halt execution. To force overwriting of
existing files, use the --overwrite flag:
gwmock simulate config.yaml --overwrite
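The refuse-unless-forced behaviour described above can be sketched in a few lines of Python. This is an illustration of the pattern only, not gwmock's implementation; `write_output` is a hypothetical helper.

```python
from pathlib import Path


def write_output(path: Path, data: bytes, overwrite: bool = False) -> None:
    """Write data to path, refusing to replace an existing file unless asked.

    Mode "xb" makes open() raise FileExistsError if the file already exists;
    mode "wb" truncates and replaces it.
    """
    mode = "wb" if overwrite else "xb"
    with path.open(mode) as handle:
        handle.write(data)
```

Using exclusive-creation mode leaves the existence check to the operating system, so there is no gap between checking for the file and creating it.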
Flag --dry-run (optional)¶
Test your configuration without generating data:
gwmock simulate config.yaml --dry-run
This validates the configuration and shows what would be generated without actually creating files.
Flag --output-dir (optional)¶
Override the output directory from the command line without editing the config:
gwmock simulate config.yaml --output-dir /scratch/my_run/data
Flag --metadata-dir (optional)¶
Override the metadata directory from the command line (config mode only):
gwmock simulate config.yaml --metadata-dir /scratch/my_run/metadata
Flag --metadata (optional)¶
Generate metadata files along with the data (enabled by default):
gwmock simulate config.yaml --metadata
Metadata files contain complete provenance information including:
- Simulator configuration
- Random number generator state
- Output file names
- Version information
Flags --author and --email (optional)¶
Include author information in the metadata files:
gwmock simulate config.yaml --author <your-name> --email <your-email>
Command config¶
gwmock config <flag>
This command is used to manage default and example configuration files. Exactly
one of the flags --list, --get, or --init must be provided.
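The "exactly one of" rule corresponds to a required mutually exclusive group in Python's argparse. The parser below is a sketch of that rule, not gwmock's actual parser; the metavar names are assumptions.

```python
import argparse


def build_config_parser() -> argparse.ArgumentParser:
    """Build a parser that requires exactly one of --list, --get, or --init."""
    parser = argparse.ArgumentParser(prog="gwmock config")
    # required=True rejects an empty command line; the group itself
    # rejects any combination of two or more of its flags.
    mode = parser.add_mutually_exclusive_group(required=True)
    mode.add_argument("--list", action="store_true", help="list example configurations")
    mode.add_argument("--get", metavar="EXAMPLE_LABEL", help="copy an example configuration")
    mode.add_argument("--init", metavar="FILE", help="create a default configuration")
    return parser


args = build_config_parser().parse_args(["--get", "noise/uncorrelated_gaussian/quick_start"])
print(args.get)
```

With this setup, passing no flag or passing two of them makes argparse exit with a usage error.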
Flag --list¶
List all the available example configuration files stored in the examples
directory (see the Examples page).
gwmock config --list
Flag --get¶
Copy one of the available example configuration files from the examples
directory into the working directory. The <example_label> must be one of the
example names listed by the gwmock config --list command.
gwmock config --get <example_label>
Flag --init¶
Creates a default configuration file and saves it to the working directory.
gwmock config --init config.yaml
Flag --overwrite (optional)¶
By default, gwmock does not overwrite existing configuration files. If a file
already exists, the tool will raise an error and halt execution. To force
overwriting of existing files, use the --overwrite flag together with --get
or --init:
gwmock config --get noise/uncorrelated_gaussian/quick_start --overwrite
gwmock config --init config.yaml --overwrite
Flag --output (optional)¶
Specifies the directory or file path where the configuration file will be saved.
This flag must be used together with --get or --init. If not provided, the
working directory is used by default.
gwmock config --get <example_label> --output <directory or file>
Configuration File Structure¶
The configuration file uses YAML format. It consists of a shared globals
section plus the adapter-backed orchestration schema.
Globals¶
Top-level shared parameters used across all simulators:
globals:
  working-directory: .
  output-directory: output
  metadata-directory: metadata
  simulator-arguments:
    sampling-frequency:
    duration:
    start-time:
    total-duration:
  output-arguments: {}
Key parameters:
- working-directory: Base directory for operations
- output-directory: Where to save generated data files
- metadata-directory: Where to save metadata files
- sampling-frequency: Sample rate in Hz
- duration: Duration of each segment in seconds
- start-time: GPS start time
- total-duration: Total duration of the dataset
- output-arguments: Additional global arguments passed to the file writer
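To see how the timing parameters relate, the sketch below splits a dataset into consecutive segments. It assumes that total-duration is covered by back-to-back segments of length duration and that a partial final segment is rounded up; neither is confirmed by this guide, and the numeric values are placeholders.

```python
import math


def segment_plan(sampling_frequency, duration, start_time, total_duration):
    """Return (gps_start, n_samples) for each segment of the dataset.

    Assumption: segments are contiguous, and a remainder shorter than
    `duration` still gets its own full-length segment (ceil).
    """
    n_segments = math.ceil(total_duration / duration)
    n_samples = int(round(sampling_frequency * duration))
    return [(start_time + i * duration, n_samples) for i in range(n_segments)]


# Placeholder values, not defaults taken from gwmock.
plan = segment_plan(sampling_frequency=4096, duration=4,
                    start_time=1_000_000_000, total_duration=16)
for gps_start, n_samples in plan:
    print(gps_start, n_samples)
```

With these values the plan holds four segments of 16384 samples each, starting 4 s apart.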
Orchestration¶
The orchestration: section is required and must contain at least one of
population, signal, or noise. CBC signal generation uses population plus
signal. SGWB signal generation can use signal without population when
signal.source-type is set.
orchestration:
  population:
    backend: FilePopulationLoader # or any registered backend alias
    source-type: bbh
    n-samples: 128 # optional; omit to load the full catalogue
    arguments:
      path: population.h5
  signal:
    waveform-model: IMRPhenomXPHM
    minimum-frequency: 10
    detectors:
      - ET-Triangle-EMR
    output:
      file_name: 'E-{{ detectors }}_STRAIN_BBH-{{ start_time }}-{{ duration }}.gwf'
      arguments:
        channel: '{{ detectors }}:STRAIN'
  noise:
    arguments:
      psd_file: ET_10_full_cryo_psd
      seed: 42
    detectors:
      - ET-Triangle-EMR
    output:
      file_name: 'E-{{ detectors }}_STRAIN_NOISE-{{ start_time }}-{{ duration }}.gwf'
      arguments:
        channel: '{{ detectors }}:STRAIN'
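The structural rules stated at the start of this section can be checked on a configuration that has already been loaded into a dictionary. The function below is a hypothetical pre-flight check, not gwmock's validator. It reads "signal without population when signal.source-type is set" as a hard requirement, which is an assumption.

```python
def check_orchestration(config: dict) -> list:
    """Return a list of problems found in the orchestration section."""
    problems = []
    orchestration = config.get("orchestration")
    if not isinstance(orchestration, dict):
        return ["missing required 'orchestration' section"]

    present = [key for key in ("population", "signal", "noise") if key in orchestration]
    if not present:
        problems.append("orchestration needs at least one of population, signal, noise")

    # Assumed rule: a signal block with no population block must name its source type.
    signal = orchestration.get("signal") or {}
    if "signal" in orchestration and "population" not in orchestration:
        if "source-type" not in signal:
            problems.append("signal without population requires signal.source-type")
    return problems


print(check_orchestration({"orchestration": {"signal": {"source-type": "sgwb"}}}))
```

Returning a list rather than raising on the first problem lets a caller report every issue in one pass.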
For SGWB studies, use signal.source-type: sgwb. Constructor options for the
SGWB backend belong under signal.arguments, while spectrum parameters passed
to simulate(...) belong under signal.parameters:
orchestration:
  signal:
    source-type: sgwb
    detectors:
      - ET-Triangle-Sardinia
    minimum-frequency: 5
    parameters:
      omega_ref: 1.0e-9
      spectral_index: 0.0
      reference_frequency: 25.0
    output:
      file_name: sgwb-{{ counter }}.hdf5
For the full schema and backend registration options, see the Orchestration guide.
Transient glitches are configured on the noise side under
orchestration.noise.arguments.glitches using public gwmock-noise glitch
models. For example:
orchestration:
  noise:
    arguments:
      glitches:
        - kind: gengli_blip
          rate: 0.0011111111111111111
          amplitude_distribution:
            distribution: lognormal
            mean: 1.0
            std: 0.0
          population_file: glitches.hdf5
      psd_file: https://example.org/ET_10_full_cryo_psd.txt
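The rate value in this example is easier to read as an interval. The arithmetic below assumes the rate is given in events per second; this guide does not state the unit.

```python
RATE = 0.0011111111111111111  # value from the example above; unit assumed to be Hz


def mean_interval(rate: float) -> float:
    """Average time in seconds between glitches for a given rate."""
    return 1.0 / rate


def expected_glitches(rate: float, duration: float) -> float:
    """Expected number of glitches in a stretch of `duration` seconds."""
    return rate * duration


print(f"mean interval: {mean_interval(RATE):.1f} s")
print(f"expected per hour: {expected_glitches(RATE, 3600.0):.2f}")
```

Under that assumption the value corresponds to one glitch every 900 s on average, or about four per hour.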
Template Variables¶
You can use Jinja2-style templates in configuration values such as file names and channel names:
orchestration:
noise:
arguments:
detectors:
- E1_triangle_emr
- E2_triangle_emr
- E3_triangle_emr
output:
file_name:
'E-{{ detectors }}_STRAIN_NOISE-{{ start_time }}-{{ duration
}}.gwf'
arguments:
channel: '{{ detectors }}:STRAIN'
In this example, file_name is automatically expanded for each detector being
processed.
Common variables:
- {{ start_time }}: GPS start time from globals
- {{ duration }}: Segment duration from globals
- {{ detectors }}: Current detector being processed. A network alias such as ET-Triangle-EMR expands to one file/channel per interferometer, with {{ detectors }} resolving to the per-interferometer token (ET1_EMR, ET2_EMR, ET3_EMR)
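The per-detector expansion can be illustrated with a small substitution helper. This is a simplified stand-in written with the standard library, not the Jinja2 engine and not gwmock's code; it handles only plain `{{ name }}` placeholders, and the start time and duration values are placeholders.

```python
import re

# Matches {{ name }} with optional whitespace inside the braces.
_PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render(template: str, context: dict) -> str:
    """Replace each {{ name }} placeholder with the matching context value."""
    return _PLACEHOLDER.sub(lambda match: str(context[match.group(1)]), template)


def expand_per_detector(template: str, detectors: list, **context) -> list:
    """Render the template once per detector token."""
    return [render(template, {**context, "detectors": det}) for det in detectors]


names = expand_per_detector(
    "E-{{ detectors }}_STRAIN_NOISE-{{ start_time }}-{{ duration }}.gwf",
    ["ET1_EMR", "ET2_EMR", "ET3_EMR"],
    start_time=1000000000,  # placeholder GPS time
    duration=4,             # placeholder segment length in seconds
)
for name in names:
    print(name)
```

Each detector token yields its own file name, which is the behaviour described for network aliases above.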
Checkpointing¶
gwmock automatically creates checkpoints during long simulations. If a process is interrupted:
- A .gwmock_checkpoint/simulation.checkpoint.json file is saved in the working directory
- Rerun the same command to resume from the last checkpoint
- The tool automatically detects and continues from where it left off
# Start simulation
gwmock simulate config.yaml
# If interrupted (Ctrl+C, crash, etc.), resume with same command
gwmock simulate config.yaml
The checkpoint contains:
- Simulator state
- Progress information
- Already-generated file tracking
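The resume behaviour follows a common checkpoint pattern, sketched below. The JSON layout (a single "completed" list) and the function names are hypothetical; the real checkpoint file also stores simulator state and its format is not described here.

```python
import json
import os
from pathlib import Path


def load_completed(checkpoint: Path) -> set:
    """Read the set of finished segment indices, or an empty set if no checkpoint exists."""
    if checkpoint.exists():
        return set(json.loads(checkpoint.read_text())["completed"])
    return set()


def save_completed(checkpoint: Path, completed: set) -> None:
    """Write the checkpoint via a temporary file so an interrupt cannot leave it half-written."""
    checkpoint.parent.mkdir(parents=True, exist_ok=True)
    tmp = checkpoint.with_suffix(".tmp")
    tmp.write_text(json.dumps({"completed": sorted(completed)}))
    os.replace(tmp, checkpoint)


def run_segments(n_segments: int, checkpoint: Path, work) -> set:
    """Run work(i) for every segment not yet recorded in the checkpoint."""
    completed = load_completed(checkpoint)
    for index in range(n_segments):
        if index in completed:
            continue  # finished in an earlier run
        work(index)
        completed.add(index)
        save_completed(checkpoint, completed)
    return completed
```

Calling `run_segments` again with the same checkpoint path after an interruption skips the recorded segments and continues with the rest.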
Resource Usage Summary¶
After every successful simulation, gwmock writes a resource_usage_summary.json
file to the working directory. This file records CPU time, peak memory usage,
and wall time for the run. It is always written (overwriting any previous
summary) and is not controlled by a flag.
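The three recorded quantities can be measured with the Python standard library, as sketched below. The key names and the use of tracemalloc are assumptions; the actual layout of resource_usage_summary.json is not specified in this guide, and tracemalloc only counts memory allocated by Python itself.

```python
import json
import time
import tracemalloc
from pathlib import Path


def run_with_summary(task, summary_path: Path) -> dict:
    """Run task() and, if it succeeds, write CPU time, peak memory, and wall time as JSON."""
    tracemalloc.start()
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    try:
        task()
        wall = time.perf_counter() - wall_start
        cpu = time.process_time() - cpu_start
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    # Hypothetical key names; the file is replaced on every successful run.
    summary = {"cpu_time_s": cpu, "peak_memory_bytes": peak, "wall_time_s": wall}
    summary_path.write_text(json.dumps(summary, indent=2))
    return summary
```

Because the file is written only after `task()` returns, a failed run leaves any previous summary in place.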
Best Practices¶
- Use templates: Leverage Jinja2 templates for dynamic configuration
- Set seeds: Always set seed for reproducibility
- Check space: Ensure sufficient disk space before long runs
- Use dry-run: Test configurations with --dry-run before full simulation
- Organize outputs: Use descriptive output-directory and metadata-directory names