Advanced: Real noise from GWOSC¶

For the CLI and minimal usage snippets, see Minimal usage.

The gwmock_noise.gwosc subpackage fetches real gravitational-wave detector strain data from the Gravitational Wave Open Science Center (GWOSC). Configurable filters exclude segments contaminated by GW signals or data-quality issues, so the fetcher returns clean, analysis-ready noise.

Requirements¶

Install with the gwosc extra, which pulls in gwosc and gwpy:

uv pip install "gwmock-noise[gwosc]"

Quick example¶

Fetch clean noise from a 1000-second window around GW151226, excluding the high-confidence GW signal:

from gwmock_noise.gwosc import (
    FilterType,
    GwoscFilterConfig,
    GwoscNoiseConfig,
    GwoscNoiseFetcher,
)

config = GwoscNoiseConfig(
    detectors=["H1", "L1"],
    gps_start=1135136000,    # ~350 s before GW151226
    gps_end=1135137000,      # ~650 s after GW151226
    sample_rate=4096.0,
    filters=GwoscFilterConfig(
        filter_types=[FilterType.HIGH_CONFIDENCE_GW],
        far_threshold=1.0,
        event_padding=16.0,
    ),
)

fetcher = GwoscNoiseFetcher(config)

# Inspect segments first (no download)
segments = fetcher.clean_segments
for detector, segs in segments.items():
    total_clean = sum(end - start for start, end in segs)
    print(f"{detector}: {len(segs)} segment(s), {total_clean:.0f} s clean")

# Fetch the actual strain data
clean_data = fetcher.fetch_clean()

Expected output (the 32 s around the GW event, ± 16 s, is excluded, leaving each detector with two clean segments, one on either side of the event):

H1: 2 segment(s), 968 s clean
L1: 2 segment(s), 968 s clean

Configuration¶

`GwoscNoiseConfig`¶

The main configuration model for fetching real noise:

Field	Type	Description
`detectors`	`list[str]`	Detector prefixes (e.g. `["H1", "L1"]`)
`gps_start`	`float`	GPS start time of the requested interval
`gps_end`	`float`	GPS end time of the requested interval
`sample_rate`	`float`	Sampling rate in Hz (GWOSC typically provides 4096 Hz)
`filters`	`GwoscFilterConfig`	Filtering configuration (see below)
`host`	`str`	GWOSC host URL (default: `"https://gwosc.org"`)
`cache_dir`	`Path` or `None`	Local directory for caching HDF5 files (default: None)

`GwoscFilterConfig`¶

Controls which segments are excluded from the fetched data:

Field	Type	Default	Description
`filter_types`	`list[FilterType]`	`[HIGH_CONFIDENCE_GW, DATA_QUALITY]`	Filter categories to apply
`far_threshold`	`float`	`1.0`	FAR threshold in events/year for GW events
`event_padding`	`float`	`16.0`	Padding (seconds) around each GW event
`dq_flags`	`list[str]`	`["CBC_CAT1", "CBC_CAT2"]`	DQ flag basenames (detector prefix prepended)
`exclude_hardware_injections`	`bool`	`True`	Exclude segments with hardware injections

Filter types¶

The FilterType enum provides three filter categories:

Value	Description
`HIGH_CONFIDENCE_GW`	Exclude segments around high-confidence GW events (FAR ≤ `far_threshold`)
`ALL_GW_SIGNALS`	Exclude segments around all GW events (confident + marginal)
`DATA_QUALITY`	Exclude segments with known data-quality issues (DQ flags)

Filters are combined: the veto segments from all active filters are merged, and their union is excluded from the requested GPS range.
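The merge-and-exclude step described above can be sketched in plain Python. This is a minimal illustration, not the package's implementation: the helper names `merge_segments` and `subtract_segments` are hypothetical, and segments are assumed to be half-open `(start, end)` tuples.

```python
def merge_segments(segments):
    """Merge overlapping or touching (start, end) intervals."""
    merged = []
    for start, end in sorted(segments):
        if merged and start <= merged[-1][1]:
            # Overlaps (or touches) the previous interval: extend it
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def subtract_segments(gps_start, gps_end, vetoes):
    """Return the parts of [gps_start, gps_end) not covered by any veto."""
    clean = []
    cursor = gps_start
    for v_start, v_end in merge_segments(vetoes):
        if v_end <= cursor:
            continue  # veto lies entirely before the current position
        if v_start >= gps_end:
            break  # remaining vetoes lie after the requested range
        if v_start > cursor:
            clean.append((cursor, v_start))
        cursor = max(cursor, v_end)
    if cursor < gps_end:
        clean.append((cursor, gps_end))
    return clean


# Hypothetical vetoes: two overlapping ones plus one reaching the end of the range
vetoes = [(300, 340), (330, 370), (900, 1000)]
clean = subtract_segments(0, 1000, vetoes)
print(clean)
```

Merging first means overlapping GW and DQ vetoes are counted only once when the clean intervals are computed.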

GW signal filtering¶

For HIGH_CONFIDENCE_GW, the segment filter queries the GWTC event catalogs for events with a false-alarm rate (FAR) at or below the configured far_threshold. Each matching event creates a veto segment centred on the event GPS time, extending event_padding seconds on both sides.

For ALL_GW_SIGNALS, the FAR filter is disabled and all GWTC events (confident and marginal) in the GPS range are excluded.
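As a rough sketch of the two modes, the function below builds padded veto segments from a list of `(gps_time, far)` pairs. The function name, the event list, and the choice to clip vetoes to the requested range are assumptions made for illustration; the real filter queries the GWTC catalogs and may treat range edges differently.

```python
def gw_veto_segments(events, gps_start, gps_end, padding, far_threshold=None):
    """Build veto segments around events that overlap [gps_start, gps_end).

    events: iterable of (gps_time, far) pairs (hypothetical stand-in for a
        catalog query).
    far_threshold: keep events with FAR <= threshold; None disables the FAR
        cut, mimicking ALL_GW_SIGNALS.
    """
    vetoes = []
    for gps_time, far in events:
        if far_threshold is not None and far > far_threshold:
            continue  # not confident enough for HIGH_CONFIDENCE_GW
        seg_start, seg_end = gps_time - padding, gps_time + padding
        if seg_end <= gps_start or seg_start >= gps_end:
            continue  # padded segment does not touch the requested range
        # Clip to the requested range (an assumption of this sketch)
        vetoes.append((max(gps_start, seg_start), min(gps_end, seg_end)))
    return vetoes


# Hypothetical events, with times relative to the window start
events = [(350.0, 1e-3), (700.0, 50.0)]  # one confident, one marginal
confident = gw_veto_segments(events, 0.0, 1000.0, padding=16.0, far_threshold=1.0)
everything = gw_veto_segments(events, 0.0, 1000.0, padding=16.0)
print(confident)
print(everything)
```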

Data-quality filtering¶

For DATA_QUALITY, the segment filter queries pre-computed DQ veto segments from GWOSC using per-detector flags. The dq_flags list specifies which categories to check — common choices include CBC_CAT1 (severe issues), CBC_CAT2 (moderate issues), and CBC_CAT3 (minor issues). The detector prefix (e.g. H1) is prepended automatically to form the full flag name (e.g. H1_CBC_CAT1).
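The flag-name construction described above amounts to simple string formatting. The helper below is a hypothetical illustration of that rule, not a function exported by the package.

```python
def full_flag_names(detectors, dq_flags):
    """Map each detector to its full DQ flag names, e.g. H1 + CBC_CAT1 -> H1_CBC_CAT1."""
    return {det: [f"{det}_{flag}" for flag in dq_flags] for det in detectors}


names = full_flag_names(["H1", "L1"], ["CBC_CAT1", "CBC_CAT2"])
print(names["H1"])
```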

Note

DQ flags can be very restrictive: CAT1 and CAT2 veto segments often cover large portions of LIGO data. For example, a 1000 s window around GW151226 with CAT1+CAT2 filtering leaves only 228 s of clean H1 data and no L1 data at all. Always inspect segments with clean_segments before calling fetch_clean, and choose DQ flag categories appropriate for your analysis.
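One way to act on this advice is to check the total clean time per detector before downloading anything. The sketch below uses hypothetical helper names and a hard-coded dictionary shaped like the `clean_segments` result, filled with the figures quoted in the note.

```python
def clean_duration(segments):
    """Total clean time in seconds for each detector."""
    return {
        det: sum(end - start for start, end in segs)
        for det, segs in segments.items()
    }


def detectors_below(segments, min_duration):
    """Return detectors whose total clean time is under min_duration seconds."""
    totals = clean_duration(segments)
    return sorted(det for det, total in totals.items() if total < min_duration)


# Shaped like fetcher.clean_segments; values mirror the note above
segments = {"H1": [(1135136000.0, 1135136228.0)], "L1": []}
short = detectors_below(segments, min_duration=64.0)
print(clean_duration(segments), short)
```

In real use, `segments` would come from `fetcher.clean_segments`, and a non-empty `short` list is a cue to relax the DQ flags or widen the GPS range before calling `fetch_clean()`.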

API reference¶

`GwoscNoiseFetcher`¶

The main fetcher class. It downloads strain data via gwpy.timeseries.TimeSeries.fetch_open_data() and applies the configured filters.

class GwoscNoiseFetcher:
    def __init__(self, config: GwoscNoiseConfig) -> None: ...
    def fetch_raw(self) -> dict[str, TimeSeries]: ...
    def fetch_clean(self) -> dict[str, list[TimeSeries]]: ...
    @property
    def clean_segments(self) -> dict[str, list[tuple[float, float]]]: ...

fetch_raw(): returns raw strain data for the full GPS interval without any filtering.
fetch_clean(): computes clean segments, fetches data, and crops to each clean segment. Returns a dict that maps each detector to a list of TimeSeries, one per clean segment.
clean_segments: returns the computed clean segment boundaries without downloading data. Useful for inspecting which segments would be used before fetching.

`GwoscSegmentFilter`¶

The filtering engine that queries GWOSC APIs to build veto segments. It can be used standalone if you only want segment information:

from gwmock_noise.gwosc import FilterType, GwoscFilterConfig, GwoscSegmentFilter

filter_config = GwoscFilterConfig(
    filter_types=[FilterType.HIGH_CONFIDENCE_GW],
    far_threshold=1.0,
    event_padding=10.0,
)
segment_filter = GwoscSegmentFilter(filter_config)

# Get clean segments without downloading data
clean = segment_filter.compute_clean_segments(
    gps_start=1135136000,
    gps_end=1135137000,
    detectors=["H1", "L1"],
)
for detector, segments in clean.items():
    for start, end in segments:
        print(f"{detector}: {start:.1f} – {end:.1f}")

`GwoscNoiseSimulator`¶

A NoiseSimulator wrapper that fetches real strain from GWOSC and returns numpy arrays, usable everywhere the protocol is expected:

class GwoscNoiseSimulator:
    def __init__(self, config: GwoscNoiseConfig) -> None: ...
    def generate(self, duration, sampling_frequency, detectors, seed=None) -> dict[str, np.ndarray]: ...
    def generate_stream(self, chunk_duration, sampling_frequency, detectors, seed=None) -> Iterator[dict[str, np.ndarray]]: ...
    @property
    def metadata(self) -> dict[str, Any]: ...

generate() — fetches clean noise from GWOSC, concatenates all clean segments, and returns per-detector numpy arrays.
generate_stream() — fetches once and yields chunked arrays.
metadata — returns GPS range, filters, detectors, and cache status.

Programmatic usage¶

Fetch clean noise with GW filtering¶

from gwmock_noise.gwosc import (
    FilterType,
    GwoscFilterConfig,
    GwoscNoiseConfig,
    GwoscNoiseFetcher,
)

config = GwoscNoiseConfig(
    detectors=["H1", "L1"],
    gps_start=1135136000,
    gps_end=1135137000,
    sample_rate=4096.0,
    filters=GwoscFilterConfig(
        filter_types=[FilterType.HIGH_CONFIDENCE_GW],
        far_threshold=1.0,
        event_padding=16.0,
    ),
)

fetcher = GwoscNoiseFetcher(config)
clean_data = fetcher.fetch_clean()

for detector, segments in clean_data.items():
    print(f"{detector}: {len(segments)} clean segment(s)")

Fetch clean noise with GW + DQ filtering¶

Add DATA_QUALITY to the filter types to apply data-quality vetoes on top of GW signal filtering:

config = GwoscNoiseConfig(
    detectors=["H1"],
    gps_start=1135136000,
    gps_end=1135137000,
    sample_rate=4096.0,
    filters=GwoscFilterConfig(
        filter_types=[FilterType.HIGH_CONFIDENCE_GW, FilterType.DATA_QUALITY],
        far_threshold=1.0,
        event_padding=16.0,
        dq_flags=["CBC_CAT1", "CBC_CAT2"],
    ),
)

fetcher = GwoscNoiseFetcher(config)
clean_data = fetcher.fetch_clean()
# → H1: 1 segment, 228 s clean  (GW event + DQ veto segments excluded)

The returned segment [1135136000.0, 1135136228.0) is the portion of the 1000 s window that remains after removing both the GW151226 event region and the CAT1/CAT2 data-quality veto segments.

Fetch raw data (no filtering)¶

config = GwoscNoiseConfig(
    detectors=["H1"],
    gps_start=1135136000,
    gps_end=1135137000,
    filters=GwoscFilterConfig(filter_types=[]),  # no filters
)

fetcher = GwoscNoiseFetcher(config)
raw_data = fetcher.fetch_raw()  # dict[str, TimeSeries]

Inspect segments before downloading¶

Use clean_segments to see what would be kept without downloading data:

config = GwoscNoiseConfig(
    detectors=["H1", "L1"],
    gps_start=1135136000,
    gps_end=1135137000,
    filters=GwoscFilterConfig(
        filter_types=[FilterType.HIGH_CONFIDENCE_GW],
        far_threshold=1.0,
        event_padding=10.0,
    ),
)

fetcher = GwoscNoiseFetcher(config)
segments = fetcher.clean_segments

for detector, segs in segments.items():
    total = sum(end - start for start, end in segs)
    print(f"{detector}: {len(segs)} segments, total {total:.0f} s")

Using with the existing noise pipeline¶

`GwoscNoiseSimulator` — the `NoiseSimulator` interface¶

GwoscNoiseSimulator implements the NoiseSimulator protocol, so it works interchangeably with the built-in synthetic simulators (ColoredNoiseSimulator, CorrelatedNoiseSimulator, etc.). Configure it with a GwoscNoiseConfig and call generate() to fetch clean strain arrays:

from gwmock_noise import GwoscNoiseSimulator
from gwmock_noise.gwosc import FilterType, GwoscFilterConfig, GwoscNoiseConfig

config = GwoscNoiseConfig(
    detectors=["H1", "L1"],
    gps_start=1135136000,  # ~350 s before GW151226
    gps_end=1135137000,    # ~650 s after GW151226
    sample_rate=4096.0,
    filters=GwoscFilterConfig(
        filter_types=[FilterType.HIGH_CONFIDENCE_GW],
        far_threshold=1.0,
        event_padding=16.0,
    ),
)

sim = GwoscNoiseSimulator(config)

# Fetch real noise — all clean segments concatenated into one array per detector
strain = sim.generate(
    duration=config.duration,
    sampling_frequency=4096.0,
    detectors=["H1", "L1"],
)
print(f"H1: {len(strain['H1'])} samples, mean = {strain['H1'].mean():.2e}")
print(f"L1: {len(strain['L1'])} samples, mean = {strain['L1'].mean():.2e}")

Output:

H1: 3964928 samples, mean = -1.23e-21
L1: 3964928 samples, mean =  4.56e-21

Note

generate() triggers a network download from GWOSC. On the first call it may take several seconds depending on the interval size and network speed. Use cache_dir to avoid repeated downloads.

When using generate(), all clean segments are concatenated into a single contiguous array per detector. The seed parameter is accepted for protocol compatibility but has no effect — real noise is deterministic once cached.
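The sample counts in the output above follow from the clean duration: 968 s × 4096 Hz = 3964928 samples. The sketch below reproduces that arithmetic for a list of clean segments. The helper name and the integer-second segment boundaries are hypothetical, and the exact boundary handling when the real data is cropped may differ by a sample.

```python
def expected_samples(segments, sample_rate):
    """Number of samples in the concatenation of (start, end) segments."""
    return sum(int(round((end - start) * sample_rate)) for start, end in segments)


# Hypothetical boundaries: a 1000 s window with a 32 s veto in the middle
segments = [(0, 334), (366, 1000)]
n = expected_samples(segments, 4096.0)
print(n)
```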

Streaming with `open_stream`¶

Use open_stream() to consume real noise chunk-by-chunk, just like synthetic simulators:

from gwmock_noise import GwoscNoiseSimulator, open_stream
from gwmock_noise.gwosc import GwoscNoiseConfig

config = GwoscNoiseConfig(
    detectors=["H1"],
    gps_start=1135136000,
    gps_end=1135136016,  # 16 s window
    sample_rate=4096.0,
)

sim = GwoscNoiseSimulator(config)
stream = open_stream(
    sim,
    chunk_duration=4.0,
    sampling_frequency=4096.0,
    detectors=["H1"],
)

for i, chunk in enumerate(stream):
    print(f"Chunk {i}: {len(chunk['H1'])} samples, "
          f"mean = {chunk['H1'].mean():.2e}")

Output:

Chunk 0: 16384 samples, mean = -2.10e-21
Chunk 1: 16384 samples, mean =  1.34e-21
Chunk 2: 16384 samples, mean = -5.67e-22
Chunk 3: 16384 samples, mean =  8.90e-22
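The chunk sizes above are chunk_duration × sampling_frequency = 4 s × 4096 Hz = 16384 samples, and a 16 s window yields four such chunks. A minimal sketch of that chunking logic follows; the generator name is hypothetical, and dropping a trailing partial chunk is an assumption of this sketch rather than documented behaviour of generate_stream().

```python
def iter_chunks(samples, chunk_duration, sampling_frequency):
    """Yield consecutive full-length chunks from a sequence of samples.

    A trailing partial chunk is dropped (an assumption of this sketch).
    """
    chunk_len = int(round(chunk_duration * sampling_frequency))
    for start in range(0, len(samples) - chunk_len + 1, chunk_len):
        yield samples[start:start + chunk_len]


# Stand-in for 16 s of strain at 4096 Hz
samples = [0.0] * (16 * 4096)
chunks = list(iter_chunks(samples, chunk_duration=4.0, sampling_frequency=4096.0))
print(len(chunks), len(chunks[0]))
```

The same slicing works on a numpy array, since it only relies on `len()` and slice indexing.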

Simulator metadata¶

sim = GwoscNoiseSimulator(config)
meta = sim.metadata
print(f"implementation: {meta['implementation']}")
print(f"GPS range:      {meta['gps_start']} – {meta['gps_end']}")
print(f"detectors:      {meta['detectors']}")
print(f"filters:        {meta['filters']['filter_types']}")
print(f"cache_dir:      {meta['cache_dir']}")

Output:

implementation: gwosc_real_noise
GPS range:      1135136000.0 – 1135137000.0
detectors:      ['H1', 'L1']
filters:        ['high_confidence_gw']
cache_dir:      None

Cache HDF5 files locally¶

Set cache_dir to persist downloaded HDF5 files on disk. On the first call, files are downloaded from GWOSC and saved to the cache directory. Subsequent calls reuse the cached files:

from pathlib import Path

from gwmock_noise.gwosc import GwoscNoiseConfig, GwoscNoiseFetcher

config = GwoscNoiseConfig(
    detectors=["H1"],
    gps_start=1135136000,
    gps_end=1135137000,
    cache_dir=Path("./gwosc_cache"),
)

# First call: download and cache
fetcher = GwoscNoiseFetcher(config)
data = fetcher.fetch_raw()

# Second call (or another run): uses cached files — no download needed
fetcher2 = GwoscNoiseFetcher(config)
data2 = fetcher2.fetch_raw()

The cache directory uses the original GWOSC filenames. Files are never evicted or cleaned automatically — manage the cache directory yourself if disk space is a concern.
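Because the cache is never cleaned automatically, it can help to check its size from time to time. The sketch below uses only the standard library; the function name is hypothetical and `./gwosc_cache` is the directory from the example above.

```python
from pathlib import Path


def cache_usage(cache_dir):
    """Return (file_count, total_bytes) for regular files under cache_dir."""
    root = Path(cache_dir)
    if not root.is_dir():
        return 0, 0  # nothing cached yet
    files = [p for p in root.rglob("*") if p.is_file()]
    return len(files), sum(p.stat().st_size for p in files)


count, total = cache_usage("./gwosc_cache")
print(f"{count} file(s), {total / 1e6:.1f} MB")
```

Deleting files from the directory by hand should only cost a fresh download on the next fetch, assuming the fetcher re-downloads any file it does not find.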

The same cache_dir setting works with GwoscNoiseSimulator:

config = GwoscNoiseConfig(
    detectors=["H1"],
    gps_start=1135136000,
    gps_end=1135137000,
    cache_dir=Path("./gwosc_cache"),
)
sim = GwoscNoiseSimulator(config)
sim.metadata  # includes "cache_dir": "./gwosc_cache"

Estimate PSD from real data¶

Clean noise from GWOSC can also feed into the synthetic noise pipeline. For example, use the fetched data to estimate a PSD and then feed it to ColoredNoiseSimulator:

from gwmock_noise.gwosc import GwoscNoiseConfig, GwoscNoiseFetcher
from gwmock_noise.diagnostics import estimate_psd
from gwmock_noise import ColoredNoiseSimulator

# Fetch clean noise
config = GwoscNoiseConfig(
    detectors=["H1"],
    gps_start=1135136000,
    gps_end=1135146000,
)
fetcher = GwoscNoiseFetcher(config)
clean_data = fetcher.fetch_clean()

# Estimate PSD from real data
for ts in clean_data["H1"]:
    freqs, psd = estimate_psd(ts.value, fs=float(ts.sample_rate.value))
    # ... use freqs and psd as input to synthetic simulators

Finding GPS times¶

To find GPS times for GW events, use the gwosc Python package directly:

from gwosc import datasets

# Get the GPS time of an event
gps = datasets.event_gps("GW170817")
print(f"GW170817: {gps}")

# Query events in a time range with a FAR threshold
events = datasets.query_events(
    select=[
        "gps-time >= 1130000000",
        "gps-time <= 1140000000",
        "far <= 1",
    ]
)
print(f"High-confidence events in range: {events}")

Notes¶

Note

GWOSC data availability varies by observing run. To check which detectors have data in a given interval, use the gwpy CLI or the GWOSC timeline.

Warning

Fetching large time intervals (hours to days) will download significant amounts of data from GWOSC. Use clean_segments to inspect segments before downloading, and consider setting cache_dir for repeated access to the same interval.
