Exports
Research exports & data dictionary
HealthArchive provides metadata-only exports for research and reproducibility. Exports do not include raw HTML or full diff bodies.
Exports overview
Use the export manifest to discover available formats and limits.
Export manifest
https://api.healtharchive.ca/api/exports
Dataset releases
Quarterly metadata-only dataset releases are published on GitHub with checksums.
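A downloaded release file can be checked against its published checksum before use. The following is a minimal sketch, not the project's own tooling: the source does not say which hash algorithm the releases use, so SHA-256 is an assumption, and the function names are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # Read fixed-size chunks so large release files are not loaded at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path: Path, expected_hex: str) -> bool:
    """Compare a file's digest with a published hex value.

    ASSUMPTION: the published checksum is a SHA-256 hex string. Surrounding
    whitespace and letter case are ignored.
    """
    return sha256_of_file(path) == expected_hex.strip().lower()
```

If the release uses a different algorithm, swapping `hashlib.sha256` for the matching `hashlib` constructor should be the only change needed.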
Download / print
Download the data dictionary (Markdown) or use your browser’s print dialog to save as PDF.
Snapshots export (fields)
- snapshot_id: numeric snapshot ID.
- source_code / source_name: source identifiers.
- captured_url: URL captured at crawl time.
- normalized_url_group: canonical grouping key.
- capture_timestamp_utc: UTC timestamp (ISO-8601).
- language, status_code, mime_type, title: metadata when available.
- job_id / job_name: edition anchor (if available).
- snapshot_url: stable public URL for citation.
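The fields above can be loaded into typed records along these lines. This is a sketch under stated assumptions: it assumes the export is CSV with a header row using the field names listed here, that missing values arrive as empty strings, and that timestamps may end in "Z". The `Snapshot` class, the helper names, and the sample rows are illustrative only and do not come from HealthArchive.

```python
import csv
import io
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Iterator, Optional


def parse_utc(value: str) -> datetime:
    """Parse an ISO-8601 timestamp and return it as an aware UTC datetime."""
    text = value.strip()
    if text.endswith("Z"):
        # Older Python versions do not accept a trailing "Z" in fromisoformat.
        text = text[:-1] + "+00:00"
    parsed = datetime.fromisoformat(text)
    if parsed.tzinfo is None:
        # The field is documented as UTC, so treat naive values as UTC.
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed.astimezone(timezone.utc)


@dataclass(frozen=True)
class Snapshot:
    """Hypothetical record type covering a subset of the export fields."""
    snapshot_id: int
    source_code: str
    captured_url: str
    normalized_url_group: str
    captured_at: datetime
    status_code: Optional[int]
    title: Optional[str]
    snapshot_url: str


def read_snapshots(csv_text: str) -> Iterator[Snapshot]:
    """Yield one Snapshot per CSV row (ASSUMPTION: CSV with a header row)."""
    for row in csv.DictReader(io.StringIO(csv_text)):
        yield Snapshot(
            snapshot_id=int(row["snapshot_id"]),
            source_code=row["source_code"],
            captured_url=row["captured_url"],
            normalized_url_group=row["normalized_url_group"],
            captured_at=parse_utc(row["capture_timestamp_utc"]),
            # Optional metadata: an empty cell becomes None.
            status_code=int(row["status_code"]) if row.get("status_code") else None,
            title=row.get("title") or None,
            snapshot_url=row["snapshot_url"],
        )


# Made-up sample data for illustration; not real HealthArchive records.
SAMPLE = (
    "snapshot_id,source_code,captured_url,normalized_url_group,"
    "capture_timestamp_utc,status_code,title,snapshot_url\n"
    "1,demo,https://example.org/a?x=1,https://example.org/a,"
    "2024-01-15T12:30:00Z,200,Example page,https://example.org/snapshot/1\n"
    "2,demo,https://example.org/b,https://example.org/b,"
    "2024-01-16T08:00:00Z,,,https://example.org/snapshot/2\n"
)

for snapshot in read_snapshots(SAMPLE):
    print(snapshot.snapshot_id, snapshot.captured_at.isoformat(), snapshot.status_code)
```

Grouping the parsed records by normalized_url_group and sorting by capture time would give a per-page history, with snapshot_url available for citation.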
Changes export (fields)
- change_id: numeric change event ID.
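- source_code / source_name, normalized_url_group: source identifiers and canonical grouping key, as in the snapshots export.
- from_snapshot_id / to_snapshot_id and corresponding UTC timestamps: the two snapshots being compared.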
- change_type, summary, section/line counts, and change ratio.
- high_noise and diff_truncated flags.
- diff_version, normalization_version, computed_at_utc.
- compare_url: stable public URL for the diff view.
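One common use of the flags above is to set aside change events that are noisy or whose diff was truncated before analysis. The sketch below assumes the changes export is available as JSON Lines (one JSON object per line) and that the flags are JSON booleans or strings such as "true"; neither is confirmed by the source. The function names and sample records are hypothetical.

```python
import json
from typing import Any, Dict, Iterable, Iterator


def as_flag(value: Any) -> bool:
    """Interpret a flag that may be a JSON boolean or a string like "true"."""
    if isinstance(value, bool):
        return value
    return str(value).strip().lower() in {"1", "true", "yes"}


def clean_changes(lines: Iterable[str]) -> Iterator[Dict[str, Any]]:
    """Yield change records that are neither high-noise nor truncated.

    ASSUMPTION: input is JSON Lines using the field names from the list above.
    """
    for line in lines:
        if not line.strip():
            continue  # Skip blank lines between records.
        record = json.loads(line)
        if as_flag(record.get("high_noise")) or as_flag(record.get("diff_truncated")):
            continue
        yield record


# Made-up sample records for illustration; not real HealthArchive data.
SAMPLE_LINES = [
    '{"change_id": 10, "from_snapshot_id": 1, "to_snapshot_id": 2, '
    '"high_noise": false, "diff_truncated": false}',
    '{"change_id": 11, "from_snapshot_id": 2, "to_snapshot_id": 3, '
    '"high_noise": true, "diff_truncated": false}',
    "",
    '{"change_id": 12, "from_snapshot_id": 3, "to_snapshot_id": 4, '
    '"high_noise": "false", "diff_truncated": "true"}',
]

for change in clean_changes(SAMPLE_LINES):
    print(change["change_id"], change["from_snapshot_id"], change["to_snapshot_id"])
```

Whether to drop flagged events or only mark them depends on the analysis; dropping is shown here because it is the simpler case.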
Limitations
- Exports reflect captured content, not real-time source updates.
- Coverage is limited to in-scope sources and successful captures.
- Replay fidelity varies by site and asset type.
For bulk or custom exports, see the researchers page for the request workflow.
