Reading Data

This guide explains how to read and work with GWF (frame file) data generated by gwmock.

We use the GWpy Python package for these examples. For more details, refer to the GWpy documentation.

Reading Frame Files

gwmock generates data in GWF (frame file) format. To read data from a frame file:

from gwpy.timeseries import TimeSeries

# Read a specific channel from a frame file
data = TimeSeries.read("filename.gwf", channel="ET1_EMR:STRAIN")

Parameters:

  • filename: Path to the GWF file
  • channel: Channel name to read (common format: DETECTOR:CHANNEL_NAME)

Example

from gwpy.timeseries import TimeSeries

# Read ET1_EMR strain data
e1_data = TimeSeries.read("E-ET1_EMR_STRAIN_NOISE-1577491218-4096.gwf", channel="ET1_EMR:STRAIN")

# Check properties
print(f"Duration: {e1_data.duration}")
print(f"Sampling frequency: {e1_data.sample_rate}")
print(f"Start time: {e1_data.t0}")

Merging Frame Files

Frame files generated by gwmock may contain different types of content (noise, signals, glitches). To obtain a realistic data stream, merge multiple files:

from gwpy.timeseries import TimeSeries

# Read noise and signal data
noise_data = TimeSeries.read("filename_noise.gwf", channel="ET1_EMR:STRAIN")
signal_data = TimeSeries.read("filename_signal.gwf", channel="ET1_EMR:STRAIN")

# Combine them
combined_data = noise_data.inject(signal_data)

You can also merge files directly using the CLI:

gwmock merge filename_noise.gwf filename_signal.gwf \
    --metadata noise/metadata/orchestration-0.metadata.json \
    --metadata signal/metadata/orchestration-0.metadata.json \
    --channel ET1_EMR:STRAIN \
    --output-channel ET1_EMR:STRAIN

This produces a merged frame file and a merged metadata file documenting all input files and merge details.

Merging Multiple Files

To merge a sequence of files:

from gwpy.timeseries import TimeSeries

files = [
    "E-ET1_EMR_STRAIN_NOISE-1000000000-1024.gwf",
    "E-ET1_EMR_STRAIN_NOISE-1000001024-1024.gwf",
    "E-ET1_EMR_STRAIN_NOISE-1000002048-1024.gwf"
]

# Read all files
data_list = [TimeSeries.read(f, channel="ET1_EMR:STRAIN") for f in files]

# Concatenate
combined = data_list[0]
for data in data_list[1:]:
    combined = combined.append(data)
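
By default, append expects each series to start where the previous one ends, so it can help to check the file names for gaps before reading anything. The sketch below assumes the file names follow the SITE-DESCRIPTION-GPSSTART-DURATION.gwf pattern seen in the examples on this page. parse_frame_name and find_gaps are hypothetical helpers, not part of gwmock or GWpy.

```python
import os
import re
from typing import NamedTuple

# Assumed naming pattern, inferred from the example file names on this page:
# SITE-DESCRIPTION-GPSSTART-DURATION.gwf
FRAME_NAME = re.compile(
    r"^(?P<site>[A-Za-z0-9]+)-(?P<description>\w+)"
    r"-(?P<gps_start>\d+)-(?P<duration>\d+)\.gwf$"
)


class FrameName(NamedTuple):
    site: str
    description: str
    gps_start: int
    duration: int


def parse_frame_name(filename):
    """Split a frame file name into its four fields."""
    match = FRAME_NAME.match(os.path.basename(filename))
    if match is None:
        raise ValueError(f"unexpected frame file name: {filename!r}")
    return FrameName(
        match["site"],
        match["description"],
        int(match["gps_start"]),
        int(match["duration"]),
    )


def find_gaps(filenames):
    """Return (expected_start, actual_start) for every non-contiguous pair."""
    frames = sorted(
        (parse_frame_name(name) for name in filenames),
        key=lambda frame: frame.gps_start,
    )
    gaps = []
    for previous, current in zip(frames, frames[1:]):
        expected = previous.gps_start + previous.duration
        if current.gps_start != expected:
            gaps.append((expected, current.gps_start))
    return gaps


files = [
    "E-ET1_EMR_STRAIN_NOISE-1000000000-1024.gwf",
    "E-ET1_EMR_STRAIN_NOISE-1000001024-1024.gwf",
    "E-ET1_EMR_STRAIN_NOISE-1000002048-1024.gwf",
]
print(find_gaps(files))  # → []
```

An empty list means every file starts exactly where the previous one ends. The check relies only on the names, so it does not replace inspecting the time properties of the data once it is loaded.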

Warning

Two time series can only be combined if:

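  1. Time properties match: Same sampling frequency and compatible time spans (overlapping segments when adding a signal to noise with inject, contiguous segments when joining files with append)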
  2. Units match: Both must have the same physical units (e.g., strain)

If units differ, override them before combining:

from astropy.units import Unit

noise_data.override_unit(Unit(""))
signal_data.override_unit(Unit(""))

Accessing Metadata

gwmock automatically generates metadata files for each simulation. Access them with:

import json

# Read a JSON metadata record
with open("metadata/orchestration-0.metadata.json", "r") as f:
    metadata = json.load(f)

print(metadata["schema_version"])    # e.g. "1.0.0"
print(metadata["gwmock_version"])    # e.g. "0.5.0"
print(metadata["config"])            # resolved config snapshot
print(metadata["outputs"])           # list of generated files with hashes

Metadata fields:

Field                 Description
schema_version        Provenance format version
gwmock_version        Package version used to generate the data
subpackage_versions   Versions of gwmock_signal, gwmock_noise, gwmock_pop
config                Resolved configuration snapshot for this run
config_sha256         SHA-256 hash of the resolved config
seed                  Top-level RNG seed
segment_seeds         Per-segment deterministic seeds
population            Population backend and provenance
signal                Signal backend, waveform model, detector network
noise                 Noise backend and PSD
outputs               List of generated files (path, channels, t0, duration, sha256)
host                  Platform, Python version, CPU, git SHA

For a quick guide on how to inspect and reuse metadata files to reproduce a dataset, see the Metadata Files page.
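
Since each entry in outputs records a hash, the metadata can also be used to check that the frame files on disk are the ones the run produced. The following is a minimal sketch, assuming each entry is a dictionary with "path" and "sha256" keys and that relative paths are resolved against a directory you supply. The exact key names and path layout are assumptions, and verify_outputs is a hypothetical helper rather than a gwmock function.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()


def verify_outputs(metadata, base_dir="."):
    """Compare every listed output file against its recorded hash.

    Assumes metadata["outputs"] is a list of dicts with "path" and
    "sha256" keys (an assumption about the record layout).
    Returns a dict mapping each path to "ok", "mismatch", or "missing".
    """
    results = {}
    for entry in metadata["outputs"]:
        path = Path(base_dir) / entry["path"]
        if not path.is_file():
            results[entry["path"]] = "missing"
        elif sha256_of_file(path) == entry["sha256"]:
            results[entry["path"]] = "ok"
        else:
            results[entry["path"]] = "mismatch"
    return results
```

With the metadata record loaded as shown earlier, verify_outputs(metadata, base_dir="path/to/run") returns one status per listed file. Reading in chunks keeps memory use flat even for large frame files.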

Working with Multiple Detectors

Process data from multiple detectors:

from gwpy.timeseries import TimeSeries

detectors = ["ET1_EMR", "ET2_EMR", "ET3_EMR"]

# Read data for each detector
detector_data = {}
for detector in detectors:
    channel = f"{detector}:STRAIN"
    filename = f"E-{detector}_STRAIN_NOISE-1000000000-1024.gwf"
    detector_data[detector] = TimeSeries.read(filename, channel=channel)

# Process or analyze each
for detector, data in detector_data.items():
    print(f"{detector}: {data.duration.to('minute')} of data")

Plotting Data

Visualize the data using GWpy's plotting utilities:

from gwpy.timeseries import TimeSeries
import matplotlib.pyplot as plt

# Read data
data = TimeSeries.read("E-ET1_EMR_STRAIN_NOISE-1000000000-1024.gwf", channel="ET1_EMR:STRAIN")

# Plot time series
plot = data.plot(title="Strain Data")
plot.show()

# Plot power spectral density
spectrum = data.psd()
plot = spectrum.plot()
plot.show()

Best Practices

  1. Always specify the channel: Use the full channel name in DETECTOR:CHANNEL_NAME format
  2. Check continuity: Verify time properties before combining files
  3. Preserve units: Don't remove or override units unless necessary
  4. Use metadata: Reference metadata files to understand generation parameters
  5. Handle large files: For files larger than available RAM, read one time window at a time (TimeSeries.read accepts start and end GPS times) instead of loading the whole file
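
For the last point, one option is to process a long file one time window at a time. The sketch below assumes that TimeSeries.read accepts start and end GPS times for frame files, as described in the GWpy documentation. iter_windows and read_in_windows are hypothetical helper names.

```python
def iter_windows(t0, duration, chunk):
    """Yield (start, end) GPS windows of at most `chunk` seconds.

    The windows cover [t0, t0 + duration) without gaps or overlap;
    the last one is shorter if `chunk` does not divide `duration`.
    """
    if chunk <= 0:
        raise ValueError("chunk must be positive")
    end_of_data = t0 + duration
    start = t0
    while start < end_of_data:
        end = min(start + chunk, end_of_data)
        yield start, end
        start = end


def read_in_windows(path, channel, t0, duration, chunk):
    """Yield one TimeSeries per window, so only one window is in memory."""
    # Imported here so iter_windows can be used without GWpy installed.
    from gwpy.timeseries import TimeSeries

    for start, end in iter_windows(t0, duration, chunk):
        yield TimeSeries.read(path, channel=channel, start=start, end=end)


# Four 256-second windows over a 1024-second file
for start, end in iter_windows(1000000000, 1024, 256):
    print(start, end)
```

Each window can then be analyzed and discarded before the next one is read. Statistics that need the whole stream at once, such as a PSD with a long FFT length, still have to be accumulated across windows.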

Troubleshooting

"Channel not found" error

Check available channels in the file:

from gwpy.io import gwf

# List all channels
channels = gwf.get_channel_names("filename.gwf")
print(channels)
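
When the requested name is close to an existing one (a typo, or the wrong detector prefix), the channel list can be searched for near matches. This is a minimal sketch using Python's standard difflib module. suggest_channels is a hypothetical helper, and the channel list is typed in by hand to stand in for the output of get_channel_names.

```python
import difflib


def suggest_channels(requested, available, limit=3):
    """Return up to `limit` available channel names that resemble `requested`."""
    return difflib.get_close_matches(requested, available, n=limit, cutoff=0.6)


# Stand-in for gwf.get_channel_names("filename.gwf")
available = ["ET1_EMR:STRAIN", "ET2_EMR:STRAIN", "ET3_EMR:STRAIN"]
requested = "ET1_EMR:STRIAN"  # note the typo in STRAIN

if requested not in available:
    matches = suggest_channels(requested, available)
    print(f"{requested!r} not found; closest matches: {matches}")
```

The closest match is listed first, so the first element is usually the name that was intended.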

Units mismatch

Ensure both time series have compatible units:

# Check units
print(data1.unit)
print(data2.unit)

# Convert if needed
data2_converted = data2.to("strain")

Time alignment issues

Verify time properties before merging:

# span holds the (start, end) GPS times of each series
print(f"Data 1: {data1.span[0]} to {data1.span[1]}")
print(f"Data 2: {data2.span[0]} to {data2.span[1]}")