Reading Data¶

This guide explains how to read and work with GWF (frame file) data generated by gwmock.

We use the GWpy Python package for these examples. For more details, refer to the GWpy documentation.

Reading Frame Files¶

To read a single channel from a frame file:

from gwpy.timeseries import TimeSeries

# Read specific channel from a frame file
data = TimeSeries.read("filename.gwf", channel="ET1_EMR:STRAIN")

Parameters:

filename: Path to the GWF file
channel: Channel name to read (common format: DETECTOR:CHANNEL_NAME)

Example¶

from gwpy.timeseries import TimeSeries

# Read ET1_EMR strain data
e1_data = TimeSeries.read("E-ET1_EMR_STRAIN_NOISE-1577491218-4096.gwf", channel="ET1_EMR:STRAIN")

# Check properties
print(f"Duration: {e1_data.duration}")
print(f"Sampling frequency: {e1_data.sample_rate}")
print(f"Start time: {e1_data.t0}")
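The file name in this example appears to encode the GPS start time and duration of the frame. As a minimal sketch, assuming gwmock file names follow the `<observatory>-<description>-<gps_start>-<duration>.gwf` pattern seen in the examples on this page, those values can be recovered without opening the file. The helper `parse_frame_name` and the `FrameName` tuple are hypothetical names used for illustration only.

```python
import re
from typing import NamedTuple

# Assumed pattern: <observatory>-<description>-<gps_start>-<duration>.gwf
_FRAME_NAME = re.compile(
    r"^(?P<observatory>[A-Za-z0-9]+)-(?P<description>[A-Za-z0-9_]+)"
    r"-(?P<gps_start>\d+)-(?P<duration>\d+)\.gwf$"
)


class FrameName(NamedTuple):
    observatory: str
    description: str
    gps_start: int  # GPS seconds
    duration: int   # seconds


def parse_frame_name(filename: str) -> FrameName:
    """Split a frame file name into its four dash-separated fields."""
    # Drop any directory part so that full paths are accepted too
    name = filename.replace("\\", "/").rsplit("/", 1)[-1]
    match = _FRAME_NAME.match(name)
    if match is None:
        raise ValueError(f"unexpected frame file name: {filename!r}")
    return FrameName(
        match["observatory"],
        match["description"],
        int(match["gps_start"]),
        int(match["duration"]),
    )


info = parse_frame_name("E-ET1_EMR_STRAIN_NOISE-1577491218-4096.gwf")
print(f"GPS start: {info.gps_start}, GPS end: {info.gps_start + info.duration}")
```

The parsed values can be compared with `t0` and `duration` of the time series as a quick consistency check.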

Merging Frame Files¶

Frame files generated by gwmock may contain different types of content (noise, signals, glitches). To obtain a realistic data stream, merge multiple files:

from gwpy.timeseries import TimeSeries

# Read noise and signal data
noise_data = TimeSeries.read("filename_noise.gwf", channel="ET1_EMR:STRAIN")
signal_data = TimeSeries.read("filename_signal.gwf", channel="ET1_EMR:STRAIN")

# Add the signal samples to the noise over their shared time span
combined_data = noise_data.inject(signal_data)

You can also merge files directly using the CLI:

gwmock merge filename_noise.gwf filename_signal.gwf \
    --metadata noise/metadata/orchestration-0.metadata.json \
    --metadata signal/metadata/orchestration-0.metadata.json \
    --channel ET1_EMR:STRAIN \
    --output-channel ET1_EMR:STRAIN

This produces a merged frame file and a merged metadata file documenting all input files and merge details.

Merging Multiple Files¶

To merge a sequence of files:

from gwpy.timeseries import TimeSeries

files = [
    "E-ET1_EMR_STRAIN_NOISE-1000000000-1024.gwf",
    "E-ET1_EMR_STRAIN_NOISE-1000001024-1024.gwf",
    "E-ET1_EMR_STRAIN_NOISE-1000002048-1024.gwf"
]

# Read all files
data_list = [TimeSeries.read(f, channel="ET1_EMR:STRAIN") for f in files]

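# Concatenate in time order. append() modifies the series in place by default,
# so start from a copy to leave data_list[0] untouched
combined = data_list[0].copy()
for data in data_list[1:]:
    combined.append(data)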

Warning

Two time series can only be combined if:

Time properties match: Same sampling frequency and compatible time spans (overlapping spans for inject, contiguous spans with no gap for append)
Units match: Both must have the same physical units (e.g., strain)

If units differ, override them before combining:

from astropy.units import Unit

noise_data.override_unit(Unit(""))
signal_data.override_unit(Unit(""))
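The continuity requirement can be checked before any data is read. The sketch below works on plain `(gps_start, duration)` pairs, here taken from the file names in the example above, and reports every boundary where one segment does not end exactly where the next begins. The helper name `find_gaps` is hypothetical and not part of gwmock or GWpy.

```python
def find_gaps(segments):
    """Return (expected_start, actual_start) for each boundary that is not contiguous.

    segments: iterable of (gps_start, duration) pairs, in seconds.
    An empty result means the segments cover one continuous interval.
    """
    ordered = sorted(segments)
    problems = []
    for (start, duration), (next_start, _) in zip(ordered, ordered[1:]):
        expected = start + duration
        if next_start != expected:
            # next_start > expected is a gap, next_start < expected is an overlap
            problems.append((expected, next_start))
    return problems


segments = [
    (1000000000, 1024),
    (1000001024, 1024),
    (1000002048, 1024),
]
print(find_gaps(segments))
```

If the list is not empty, the files should not be appended as they are.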

Accessing Metadata¶

gwmock automatically generates metadata files for each simulation. Access them with:

import json

# Read a JSON metadata record
with open("metadata/orchestration-0.metadata.json", "r") as f:
    metadata = json.load(f)

print(metadata["schema_version"])    # e.g. "1.0.0"
print(metadata["gwmock_version"])    # e.g. "0.5.0"
print(metadata["config"])            # resolved config snapshot
print(metadata["outputs"])           # list of generated files with hashes

Metadata fields:

Field	Description
`schema_version`	Provenance format version
`gwmock_version`	Package version used to generate the data
`subpackage_versions`	Versions of `gwmock_signal`, `gwmock_noise`, `gwmock_pop`
`config`	Resolved configuration snapshot for this run
`config_sha256`	SHA-256 hash of the resolved config
`seed`	Top-level RNG seed
`segment_seeds`	Per-segment deterministic seeds
`population`	Population backend and provenance
`signal`	Signal backend, waveform model, detector network
`noise`	Noise backend and PSD
`outputs`	List of generated files (path, channels, t0, duration, sha256)
`host`	Platform, Python version, CPU, git SHA

For a quick guide on how to inspect and reuse metadata files to reproduce a dataset, see the Metadata Files page.
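Because each entry in `outputs` records a SHA-256 hash, a downloaded or copied dataset can be checked against its metadata. The following is a minimal sketch, assuming each entry is a dictionary with the keys `path` and `sha256` (the exact key names are inferred from the table above) and that paths are relative to a directory you supply. The function names and the `base_dir` parameter are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large frame files need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()


def verify_outputs(metadata, base_dir="."):
    """Map each listed output path to True if the file exists and its hash matches."""
    results = {}
    for entry in metadata["outputs"]:
        # Assumed keys: "path" and "sha256"
        file_path = Path(base_dir) / entry["path"]
        results[entry["path"]] = (
            file_path.is_file() and sha256_of(file_path) == entry["sha256"]
        )
    return results
```

Calling `verify_outputs(metadata)` with the dictionary loaded above returns one boolean per listed file. A `False` value means the file is missing or differs from what was generated.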

Working with Multiple Detectors¶

Process data from multiple detectors:

from gwpy.timeseries import TimeSeries

detectors = ["ET1_EMR", "ET2_EMR", "ET3_EMR"]

# Read data for each detector
detector_data = {}
for detector in detectors:
    channel = f"{detector}:STRAIN"
    filename = f"E-{detector}_STRAIN_NOISE-1000000000-1024.gwf"
    detector_data[detector] = TimeSeries.read(filename, channel=channel)

# Process or analyze each
for detector, data in detector_data.items():
    print(f"{detector}: {data.duration.to('minute')} of data")

Plotting Data¶

Visualize the data using GWpy's plotting utilities:

from gwpy.timeseries import TimeSeries

# Read data
data = TimeSeries.read("E-ET1_EMR_STRAIN_NOISE-1000000000-1024.gwf", channel="ET1_EMR:STRAIN")

# Plot time series
plot = data.plot(title="Strain Data")
plot.show()

# Plot power spectral density, averaged over 4-second FFT segments
spectrum = data.psd(fftlength=4)
plot = spectrum.plot()
plot.show()

Best Practices¶

Always specify the channel: Use the full channel name in the format DETECTOR:CHANNEL_NAME
Check continuity: Verify time properties before combining files
Preserve units: Don't remove or override units unless necessary
Use metadata: Reference metadata files to understand generation parameters
Handle large files: For files larger than available RAM, read only the interval you need by passing the start and end arguments to TimeSeries.read
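One way to process a long file piece by piece is to split its time span into fixed-length windows and read one window at a time. The sketch below only computes the window boundaries. It assumes each `(start, end)` pair is then passed to `TimeSeries.read(..., start=start, end=end)`. The generator name `iter_windows` is hypothetical.

```python
def iter_windows(t0, duration, chunk):
    """Yield (start, end) GPS pairs covering [t0, t0 + duration) in steps of `chunk` seconds."""
    if chunk <= 0:
        raise ValueError("chunk must be positive")
    end_time = t0 + duration
    start = t0
    while start < end_time:
        # The last window is shortened so it never runs past the end of the data
        end = min(start + chunk, end_time)
        yield start, end
        start = end


for start, end in iter_windows(1577491218, 4096, 1024):
    # Each pair could be used as: TimeSeries.read(filename, channel=..., start=start, end=end)
    print(start, end)
```

Choose the window length so that one window of samples fits comfortably in memory.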

Troubleshooting¶

"Channel not found" error

Check available channels in the file:

from gwpy.io import gwf

# List all channels
channels = gwf.get_channel_names("filename.gwf")
print(channels)

Units mismatch

Ensure both time series have compatible units:

# Check units
print(data1.unit)
print(data2.unit)

# Convert if needed
data2_converted = data2.to("strain")

Time alignment issues

Verify time properties before merging:

# span holds the (start, end) GPS times of each series
print(f"Data 1: {data1.span[0]} to {data1.span[1]}")
print(f"Data 2: {data2.span[0]} to {data2.span[1]}")
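Once the start and end times are known, the shared interval of the two series can be computed directly. This is a minimal sketch on plain `(start, end)` pairs in GPS seconds, such as the values GWpy exposes through `TimeSeries.span`. The helper name `overlap` and the example numbers are hypothetical.

```python
def overlap(span_a, span_b):
    """Return the (start, end) interval shared by two spans, or None if they do not overlap."""
    start = max(span_a[0], span_b[0])
    end = min(span_a[1], span_b[1])
    return (start, end) if start < end else None


noise_span = (1000000000, 1000001024)
signal_span = (1000000100, 1000000132)  # hypothetical 32-second signal segment

print(overlap(noise_span, signal_span))
```

A result of `None` means the two series share no samples, so injecting one into the other cannot work.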