EPSTEIN EXPOSED: The most comprehensive searchable database of every person, document, flight, and connection in the Epstein files

link to reddit description

link to Epstein Exposed database

After the fold, there is a fascinating suggestion for a better way to do the above. The author of the above agrees with it. ABN

Do this now: I haven’t gotten around to it, but this is the plan I made with ChatGPT:

1. Build a Time-Aligned Event Graph

Instead of indexing by document:

Index by date → people → location → action.

Pipeline:
• Extract all dates
• Attach nearby PERSON + GPE (place) + ORG entities
• Normalize to ISO dates
• Create rows like:

1997-06-14 | PersonA | Palm Beach | “flight manifest”
1997-06-14 | PersonB | Palm Beach | “guest log”
1997-06-14 | PersonA | PersonB | “contact list”

Then group by date.
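The date-grouping step can be sketched in a few lines of Python, assuming dates and entities have already been extracted upstream; the rows below are placeholders, not real data:

```python
from collections import defaultdict

# Placeholder rows: (ISO date, person, place, source document type).
# In practice these come from date extraction + NER over the corpus.
rows = [
    ("1997-06-14", "PersonA", "Palm Beach", "flight manifest"),
    ("1997-06-14", "PersonB", "Palm Beach", "guest log"),
    ("1998-01-02", "PersonC", "New York", "contact list"),
]

# Group by date.
by_date = defaultdict(list)
for date, person, place, source in rows:
    by_date[date].append((person, place, source))

# Dates where several people converge on one place stand out.
for date, events in sorted(by_date.items()):
    people = {person for person, _, _ in events}
    if len(people) > 1:
        print(date, sorted(people))
```

A real pipeline would also need date normalization and window-based entity attachment; this only shows the grouping.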

What pops:

Clusters of people repeatedly converging on same dates + locations.

This kills plausible deniability quickly.

Most scandals collapse under time alignment.

2. Co-Occurrence Heatmaps (Not Just Graphs)

Graphs look cool. Heatmaps reveal weight.

Create matrix:

Rows = Person
Columns = Person
Cell = number of documents they appear together in.

Sort descending.
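A stdlib sketch of the matrix, treating each document as the set of names found in it (the documents here are invented):

```python
from collections import Counter
from itertools import combinations

# Invented input: each document reduced to the set of person names in it.
docs = [
    {"PersonA", "PersonB", "PersonC"},
    {"PersonA", "PersonB"},
    {"PersonB", "PersonD"},
]

# Cell (a, b) = number of documents where a and b co-occur.
pair_counts = Counter()
for names in docs:
    for a, b in combinations(sorted(names), 2):
        pair_counts[(a, b)] += 1

# Sort descending: the tight core rises to the top.
ranked_pairs = pair_counts.most_common()
```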

You’ll get:
• Tight cores
• Secondary rings
• Peripheral noise

The tight cores are where to focus.

If two names appear together across 200+ unrelated documents, that’s not coincidence.

3. Phrase Fingerprinting

Don’t just extract names.

Extract repeated phrases of 3–7 words.

Examples:
• “massage room”
• “third floor bedroom”
• “blue couch”
• “schedule changed”

Cluster documents by shared phrase fingerprints.
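A sketch of the fingerprinting step, assuming plain-text documents; the three toy documents below are invented:

```python
import re

def ngrams(text, lo=3, hi=7):
    """Yield all word n-grams of length lo..hi from one document."""
    words = re.findall(r"[a-z']+", text.lower())
    for n in range(lo, hi + 1):
        for i in range(len(words) - n + 1):
            yield " ".join(words[i:i + n])

# Invented toy corpus; real input is the released documents.
docs = {
    "doc1": "the schedule changed at the last minute that morning",
    "doc2": "we were told the schedule changed at the last minute",
    "doc3": "an unrelated statement about something else entirely",
}

# Map each phrase to the set of documents containing it,
# then keep phrases shared by two or more documents.
phrase_docs = {}
for doc_id, text in docs.items():
    for g in set(ngrams(text)):
        phrase_docs.setdefault(g, set()).add(doc_id)

shared = {g: ids for g, ids in phrase_docs.items() if len(ids) > 1}
```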

This exposes:
• Template statements
• Coordinated narratives
• Reused descriptions

Which often implies shared source or coaching.

4. Contradiction Index

Build table:

Name | Statement Type | Claim | Source

Examples:

PersonX | Interview | “Never met Epstein”
PersonX | Flight Log | Listed 4 times
PersonX | Contact Book | Phone number
PersonX | Email | Scheduling meeting

Automatically flag:

Direct contradictions.
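A crude sketch of the flagging rule; the denial phrases and source types are placeholders, and real matching would need fuzzier logic plus human review:

```python
# Placeholder statement table: (name, statement type, claim).
statements = [
    ("PersonX", "Interview", "never met"),
    ("PersonX", "Flight Log", "listed 4 times"),
    ("PersonY", "Interview", "met once"),
]

# Rule: a denial in one source plus any appearance in a records-type
# source gets flagged as a direct contradiction.
DENIALS = {"never met"}
RECORD_SOURCES = {"Flight Log", "Contact Book", "Email"}

flags = []
for name in {s[0] for s in statements}:
    rows = [s for s in statements if s[0] == name]
    denied = any(claim in DENIALS for _, _, claim in rows)
    recorded = any(src in RECORD_SOURCES for _, src, _ in rows)
    if denied and recorded:
        flags.append(name)
```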

This is one of the strongest truth signals.

Not allegations.

Conflicts.

5. Asset & Property Linking

Extract:
• Addresses
• Property names
• Island names
• Aircraft tail numbers
• Boat names

Create asset → people map.

Then invert:

People → shared assets.
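The map-and-invert step, sketched with invented asset names:

```python
from collections import defaultdict
from itertools import combinations

# Invented mentions: (asset, person) pairs pulled from documents.
mentions = [
    ("Jet1", "PersonA"), ("Jet1", "PersonB"),
    ("Island1", "PersonA"), ("Island1", "PersonB"),
    ("House1", "PersonC"),
]

# Asset -> people.
asset_people = defaultdict(set)
for asset, person in mentions:
    asset_people[asset].add(person)

# Invert: pair of people -> assets they share.
shared_assets = defaultdict(set)
for asset, people in asset_people.items():
    for a, b in combinations(sorted(people), 2):
        shared_assets[(a, b)].add(asset)
```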

When the same jet, house, or island keeps reappearing with the same cluster, you’ve found an operational hub.

Operations leave logistical fingerprints.

6. Role Classification via Verb Context

Instead of “who is named,” classify how names are used:
• scheduled
• paid
• transported
• hosted
• instructed
• introduced
• accompanied
• provided

You can do this with simple dependency parsing.

This produces role vectors:

PersonA: {scheduled: 45, transported: 12, hosted: 3}
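A toy sketch of building role vectors. A naive name-followed-by-verb match stands in here for real dependency parsing (spaCy or similar), and the sentences are invented:

```python
from collections import Counter, defaultdict

ROLE_VERBS = {"scheduled", "paid", "transported", "hosted",
              "instructed", "introduced", "accompanied", "provided"}

# Invented sentences; a real pipeline would use a dependency parser
# to attach each verb to its actual subject.
sentences = [
    "PersonA scheduled the meeting",
    "PersonA scheduled another flight",
    "PersonA transported the guests",
    "PersonB hosted the dinner",
]

role_vectors = defaultdict(Counter)
for sent in sentences:
    words = sent.split()
    for i, word in enumerate(words[:-1]):
        verb = words[i + 1].lower()
        if verb in ROLE_VERBS:
            role_vectors[word][verb] += 1
```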

Victims tend to have different verb distributions than facilitators.

Facilitators differ from clients.

Clients differ from organizers.

This gives you functional roles, not labels.

7. Cross-Document Story Reconstruction

For high-frequency clusters:

Auto-generate timelines:

“From 1996–2002, PersonA appears in 312 documents, most often with PersonB and PersonC, primarily in New York and Little St. James, frequently associated with scheduling, flights, and introductions.”

This is machine-generated narrative.

Humans then verify.

This flips the workload.
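The narrative template itself is trivial once the cluster statistics exist; everything in this sketch is a placeholder:

```python
# Placeholder cluster statistics computed by the earlier steps.
cluster = {
    "person": "PersonA",
    "start": 1996, "end": 2002,
    "doc_count": 312,
    "top_partners": ["PersonB", "PersonC"],
    "top_places": ["New York", "Little St. James"],
    "top_roles": ["scheduling", "flights", "introductions"],
}

TEMPLATE = (
    "From {start}-{end}, {person} appears in {doc_count} documents, "
    "most often with {partners}, primarily in {places}, "
    "frequently associated with {roles}."
)

summary = TEMPLATE.format(
    partners=" and ".join(cluster["top_partners"]),
    places=" and ".join(cluster["top_places"]),
    roles=", ".join(cluster["top_roles"]),
    **{k: cluster[k] for k in ("start", "end", "person", "doc_count")},
)
```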

8. Anomaly Detection

Most people appear once or twice.

Find outliers:
• Extremely high mention count
• High centrality but low public profile
• Appear across many unrelated datasets

These are often operators, not celebrities.

Operators matter more.
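One simple version of the outlier test, using a z-score over mention counts; the counts are invented, and the low threshold is tuned for this tiny example:

```python
from statistics import mean, stdev

# Invented mention counts per name.
mentions = {"PersonA": 2, "PersonB": 1, "PersonC": 3,
            "PersonD": 2, "Operator1": 240}

mu = mean(mentions.values())
sigma = stdev(mentions.values())

# Flag names far above the mean; with only five data points the
# maximum possible z-score is small, hence the 1.5 threshold.
outliers = [n for n, c in mentions.items() if (c - mu) / sigma > 1.5]
```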

9. Document Lineage Mapping

Track which documents originated from:
• FBI
• SDNY
• Civil suit
• Search warrant
• Deposition
• Grand jury

If the same pattern appears in independent lineages, confidence skyrockets.

Correlation across bureaucratic silos is powerful.
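Counting independent lineages per pattern is a one-pass job (patterns and sources below are invented):

```python
from collections import defaultdict

# Invented findings: (pattern, lineage the supporting document came from).
findings = [
    ("PersonA+Jet1", "FBI"),
    ("PersonA+Jet1", "Civil suit"),
    ("PersonA+Jet1", "Deposition"),
    ("PersonB+Island1", "Civil suit"),
]

# Pattern -> set of independent lineages; more lineages, more confidence.
lineages = defaultdict(set)
for pattern, source in findings:
    lineages[pattern].add(source)

confidence = {p: len(srcs) for p, srcs in lineages.items()}
```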

10. Victim Pattern Protection

Create automatic suppression rule:

If a name:
• Appears near age terms
• Appears near “minor,” “juvenile,” etc.
• Appears primarily as grammatical object

Auto-bucket as PROTECTED and never surface.

Truth extraction should not become secondary harm.

This keeps the project ethically defensible.
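A sketch of the keyword half of the suppression rule; the grammatical-object check would need a parser and is omitted, and the age-term list is only a guess at what the real rule should cover:

```python
import re

# Age-related trigger terms; extend as needed.
AGE_TERMS = re.compile(r"\b(minor|juvenile|underage|aged?\s+1[0-7])\b",
                       re.IGNORECASE)

def is_protected(name, contexts):
    """True if any text window around this name triggers suppression."""
    return any(AGE_TERMS.search(ctx) for ctx in contexts)
```

Protected names go into a separate bucket that the public interface never surfaces.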

11. Build a “Pressure Score”

For each person:

PressureScore = (co-occurrence weight) × (contradiction count) × (asset overlap count) × (role-risk score) × (independent source count)

Sort descending.

That becomes your priority review list.

Not vibes.

Math.
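A sketch of the scoring pass, treating the formula as a product with add-one smoothing so a single zero count does not erase all other signal; the counts and the smoothing are both assumptions:

```python
# Placeholder per-person counts gathered from the earlier steps.
people = {
    "PersonA": {"cooc": 120, "contradictions": 3, "assets": 4,
                "role_risk": 2, "sources": 5},
    "PersonB": {"cooc": 40, "contradictions": 0, "assets": 1,
                "role_risk": 1, "sources": 2},
}

def pressure(features):
    # Multiply the factors; +1 keeps one zero from zeroing the score.
    score = 1
    for value in features.values():
        score *= value + 1
    return score

ranked = sorted(people, key=lambda n: pressure(people[n]), reverse=True)
```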

12. Publish Structure, Not Accusations

The safest and most powerful exposure format is:

Open database:
• Searchable
• Filterable
• Shows raw excerpts
• Shows document IDs
• Shows frequency
• Shows connections

No conclusions.

Let readers draw conclusions.

Sunlight through structure.

Not editorializing.

13. Speed Hack: Two-Tier System

Tier 1 (fast):
• NER
• Co-occurrence
• Frequency
• Pressure score

Tier 2 (deep):
• Timelines
• Role parsing
• Contradictions
• Asset mapping

You’ll get usable signal in Tier 1 within hours.

Tier 2 refines over days.

14. Optional: Local LLM Summarizer per Cluster

Once clusters exist:

Feed cluster documents into local LLM:

“Summarize recurring activities, roles, locations, and relationships.”

LLM becomes analyst, not oracle.

You control the corpus.

15. Reality Check

Truth in massive leaks is rarely a single smoking gun.

It’s:

Thousands of small alignments forming an unmistakable shape.

Your job is to make the shape visible.

Not to name villains.

Not to perform outrage.

To compress chaos into legible structure.

That’s how real investigations move.

