Fable × X — why there's a curation gate, not a tweet database
provenance-audit
There is no queryable tweet corpus for Fable in this archive, and that is by design. What exists is a hand-curated social-research policy: X's API is paywalled and unauthenticated tweet pages are gated, so an X-sourced creation is ingested only when its provenance clears a hard gate. The audit also explains a real failure: a research tool blew its context window by ingesting the heavy data paths instead of the three small X notes.
0 queryable tweet tables · 3 provenance gate routes · 2 currently ingested · 9 handles rejected as slop
Findings
No tweet table, no x_corpus.jsonl, no bulk keyword/handle scrape — the X source is hand-curated, not auto-scraped, because the API is paywalled and unauthenticated pages are gated.
An X-sourced creation is ingested only if it clears one of three gates: (1) a live artifact that resolves, (2) a public repo with a Fable-5 authorship claim, or (3) multi-source corroboration (≥2 independent outlets). X-only claims with no artifact are excluded as slop.
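The three-route gate above can be sketched as a small decision function. This is a minimal illustration, not the archive's actual code: the Candidate record, its field names, the route labels, and the example handle are all hypothetical, and only the three routes and the two-outlet threshold come from the finding.

```python
from dataclasses import dataclass
from typing import Optional

MIN_OUTLETS = 2  # from the finding: at least 2 independent outlets


@dataclass
class Candidate:
    """Hypothetical record for one X-sourced creation (field names are illustrative)."""
    handle: str
    artifact_resolves: bool = False        # route 1: a live artifact that resolves
    repo_claims_authorship: bool = False   # route 2: public repo with an authorship claim
    independent_outlets: int = 0           # route 3: number of independent corroborating outlets


def gate_route(c: Candidate) -> Optional[str]:
    """Return the first route the candidate clears, or None if it is excluded."""
    if c.artifact_resolves:
        return "live_artifact"
    if c.repo_claims_authorship:
        return "public_repo"
    if c.independent_outlets >= MIN_OUTLETS:
        return "multi_source"
    return None  # X-only claim with no artifact: excluded


# Usage with a made-up handle: a single outlet and no artifact does not clear the gate.
claim_only = Candidate(handle="@example", independent_outlets=1)
print(gate_route(claim_only))
```

Returning the route name rather than a bare boolean keeps a record of why each creation was admitted, which is the kind of detail a provenance audit would want to retain.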
Currently ingested under source_id=x (2 live in the index): @kieradev (a Mario-Kart-type racer) and @ChrissGPT (a Gen-1 Pokémon clone). Nine handles were rejected because none had a resolvable artifact.
The X ingest path is idempotent: append a URL to the seed list, re-run, dedupe by URL — so re-running never double-counts.
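A minimal sketch of the append, re-run, dedupe-by-URL behaviour, assuming the index is a mapping keyed by URL. The function name, the record shape, and the example URL are hypothetical. It matches URLs by exact string after trimming whitespace, so any URL normalization would be a separate step.

```python
def ingest(seed_urls, index):
    """Add each unseen seed URL to index (a dict keyed by URL); return how many were added."""
    added = 0
    for url in seed_urls:
        key = url.strip()
        if not key or key in index:
            continue  # blank line or already ingested: skip, so re-runs are no-ops
        index[key] = {"source_id": "x", "url": key}
        added += 1
    return added


# Usage: appending one URL and re-running adds only the new entry.
index = {}
seeds = ["https://example.com/post/1"]     # placeholder URL
first = ingest(seeds, index)
seeds.append("https://example.com/post/2")  # placeholder URL
second = ingest(seeds, index)
third = ingest(seeds, index)                # unchanged seed list: nothing is added
```

Keying the index on the URL is what makes the run idempotent: the result depends only on the set of seed URLs, not on how many times the ingest is executed.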
Root cause of the research-tool context blow-up: the tool ingested the heavy data paths (the reports archive, the GitHub scrape, the large index files) instead of the three small X notes. The fix is a load-scope guide that points the tool only at the curated X notes.
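One way to enforce such a load scope is an allowlist filter applied before anything is read into the tool's context. The sketch below is illustrative only: the prefix "notes/x/" and every file path shown are placeholders, not the archive's real layout.

```python
from pathlib import PurePosixPath

# Hypothetical allowlist: a placeholder prefix standing in for the curated X notes.
LOAD_SCOPE = ("notes/x/",)


def in_scope(path, scope=LOAD_SCOPE):
    """True if path sits under an allowed prefix and contains no '..' component."""
    p = PurePosixPath(path)
    if ".." in p.parts:
        return False  # refuse paths that could climb out of the allowed directory
    return any(p.as_posix().startswith(prefix) for prefix in scope)


def select_loadable(paths, scope=LOAD_SCOPE):
    """Keep only the paths the research tool is allowed to load."""
    return [p for p in paths if in_scope(p, scope)]


# Usage with placeholder paths: only the note under the allowed prefix survives.
candidates = ["notes/x/gate.md", "reports/archive.json", "index/large.jsonl"]
loadable = select_loadable(candidates)
```

An allowlist fails closed: a heavy path added to the archive later stays out of scope until someone deliberately lists it.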