Our data

Here's what we actually have.

No hand-waving. No vague claims about "comprehensive coverage." Here's exactly what's in the database, where it came from, and what we're still working on.

The database

What we have

Every source below is actively being ingested, parsed, and linked. The status under each one shows where we are right now. Not where we hope to be. Not a projection. Right now.

Property records
112M+ parcels
Assessed values, ownership, and addresses from county assessors. All 50 states are represented, but coverage varies widely by county. Some counties have full deed history; many are assessment rolls only. We're adding counties every week.
Actively building
SEC EDGAR
57K+ companies
10-K annual reports, proxy statements, Form 4 insider trades, 13F holdings. We extract executive compensation from DEF 14A filings (the hard part). Still parsing historical filings for many companies.
Parsing ongoing
IRS 990
1.9M nonprofits
990-PF private foundations with officer names, 4.6M grant records, 29M officer/director roles. We process the full XML returns, not just headers. Officer-to-entity linking is in progress.
Linking ongoing
FEC contributions
58M+ records
Federal individual campaign contributions. 200K+ linked to entity profiles so far. The raw data is loaded; the matching is what takes time. Federal only for now.
Matching in progress
Forbes lists
400+ profiles
Billionaires and wealthiest Americans. Used to calibrate our wealth estimates and cross-reference, not as a primary source. Small dataset, high value for validation.
Reference
News and media
50+ sources
Local business journals, philanthropy news, real estate transactions. Automated pipeline refreshes every 4 hours. Named-entity recognition (NER) identifies people and organizations in articles. Early days.
Early stage
101 ingestion modules and counting. Beyond the big six above, we also pull from FAA aircraft registrations, DOL retirement plan data, state corporate filings, professional licenses, federal contracts (USASpending), lobbyist registrations, conservation easements, and more. Some are fully loaded. Many are partially parsed. All are being worked on.
Roadmap

What we're building toward

These are real items on the build list, not aspirational marketing. We're a small team and we ship in the order that matters most to users.

State-level campaign contributions (beyond federal FEC): Planned
Corporate board interconnection mapping: In development
Real-time property transaction alerts: Planned
Charitable gift annuity and planned giving data: Planned
International holdings and offshore entities: Planned
Historical property transaction chains (deed history): In development
Hard limits

What we'll probably never have

Private bank account balances
Exact net worth (we estimate from visible assets)
Medical or insurance records
Non-public trust documents
Private company revenue (unless the company files with the SEC)

Some data is private for good reason. We work with what's public, we're honest about the boundaries, and we don't guess where we can't verify.

Under the hood

The interesting parts

Collecting data is the easy part (well, not easy, but tractable). The hard part is figuring out that "SMITH JOHN A" on a deed in Fairfax County is the same person as "John Smith" who donated to a Senate campaign and sits on a foundation board in McLean.
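
To make the problem concrete, here is a minimal sketch of the first step: reducing differently formatted names to a comparable key. The function `name_key` and its rules (ignore case, word order, and single-letter initials) are illustrative assumptions, not the production logic.

```python
import re


def name_key(raw: str) -> tuple[str, ...]:
    """Reduce a raw name to an order-independent tuple of uppercase tokens.

    Hypothetical helper for illustration only.
    """
    # Keep alphabetic runs only, so punctuation and digits are ignored.
    tokens = re.findall(r"[A-Za-z]+", raw.upper())
    # Drop single-letter tokens (middle initials such as the "A" in "SMITH JOHN A").
    core = [t for t in tokens if len(t) > 1]
    # Sort so "LAST FIRST" and "FIRST LAST" orderings produce the same key.
    return tuple(sorted(core))


print(name_key("SMITH JOHN A") == name_key("John Smith"))  # True
```

A shared key only makes two records candidates. Deciding whether they are actually the same person takes the additional signals described below.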

Entity resolution
The matching engine
Our entity resolution (ER) engine scores pairs of records across property, FEC, SEC, and IRS 990 data using weighted signals: name similarity (including a dictionary of 200+ nicknames), mailing address matching, co-ownership detection, county proximity, and name frequency analysis. Common names get a higher bar; rare names get more credit.
Borderline cases (scores between 0.20 and 0.70) get escalated to an LLM for a final decision. Every match is explainable: you can see which signals fired and why.
60M+ resolved entities
75M+ ownership links
221K cross-source matches
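
The scoring and escalation described above can be sketched roughly as follows. Only the signal names and the 0.20 and 0.70 thresholds come from the description; the weights, the `Record` fields, the three-entry nickname table, the use of `difflib` for name similarity, and the way name frequency discounts the name signal are all assumptions made for illustration.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher

# Tiny stand-in for the nickname dictionary (assumed entries).
NICKNAMES = {"BILL": "WILLIAM", "BOB": "ROBERT", "JACK": "JOHN"}

# Hypothetical weights; the real values are not published.
WEIGHTS = {"name": 0.5, "address": 0.3, "co_owner": 0.1, "county": 0.1}


@dataclass
class Record:
    first: str
    last: str
    address: str
    county: str
    co_owners: frozenset = frozenset()


def full_name(r: Record) -> str:
    # Map nicknames to a canonical first name before comparing.
    first = r.first.upper()
    return f"{NICKNAMES.get(first, first)} {r.last.upper()}"


def score_pair(a: Record, b: Record, name_frequency: float):
    """Return (score, signals). name_frequency: 0.0 = rare, 1.0 = very common."""
    signals = {
        "name": SequenceMatcher(None, full_name(a), full_name(b)).ratio(),
        "address": float(a.address.strip().upper() == b.address.strip().upper()),
        "co_owner": float(bool(a.co_owners & b.co_owners)),
        "county": float(a.county.upper() == b.county.upper()),
    }
    # Common names get a higher bar: discount the name signal (assumed formula).
    signals["name"] *= 1.0 - 0.5 * name_frequency
    score = sum(WEIGHTS[k] * v for k, v in signals.items())
    # Returning the signals keeps every decision explainable.
    return score, signals


def route(score: float) -> str:
    # Thresholds from the text; treating the boundaries as borderline is an assumption.
    if score > 0.70:
        return "match"
    if score < 0.20:
        return "no_match"
    return "escalate_to_llm"


a = Record("Bill", "Smith", "1 Main St", "Fairfax")
b = Record("William", "Smith", "1 Main St", "Fairfax")
score, fired = score_pair(a, b, name_frequency=0.0)
print(route(score), fired)
```

Returning the per-signal values alongside the score is what lets a reviewer see which signals fired for any given pair.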
AI research assistant
Powered by Claude
Ask natural-language questions ("Who are the largest political donors in Virginia with foundation board seats?") and get real answers from real data. Simple queries route to a fast model; complex cross-dataset questions get the full treatment.
For individual prospects, the AI generates research briefings: executive summary, wealth composition, giving patterns, board engagement, and prospect implications. Every claim cites its source. Every brief flags what it doesn't know.
Natural language
Query the database in plain English
Research briefs
Analyst-grade profiles in seconds
Fact extraction
Structured findings with confidence tiers
Built for ops, not just end users. Behind the product there's an admin panel with ER review queues, AI prompt editing, ingestion monitoring, data freshness tracking, and background process controls. We eat our own cooking.
Our approach

What makes our data different

01
Every number has a source.
We don't generate a black-box "wealth score." Every data point on a profile links back to the county assessor, SEC filing, IRS return, or FEC record it came from. You can check our work.
02
Confidence tiers, not false certainty.
We use a five-level confidence system: Primary Source, Structured Extraction, Cross-Source Inference, AI-Generated, Unverified. You always know how much to trust a data point.
03
We tell you what we don't know.
If we only have property data for someone and no SEC or philanthropic records, we say so. A partial picture honestly presented is more useful than a complete picture invented.
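
As a rough illustration of how sourced data points and confidence tiers might fit together, here is a minimal sketch. The five tier names come from the list above; the numeric ordering, the `DataPoint` fields, the `at_least` helper, and the sample values are assumptions for the example, not the actual schema or real data.

```python
from dataclasses import dataclass
from enum import IntEnum


class Confidence(IntEnum):
    # Assumed ordering: higher number means more trust.
    UNVERIFIED = 1
    AI_GENERATED = 2
    CROSS_SOURCE_INFERENCE = 3
    STRUCTURED_EXTRACTION = 4
    PRIMARY_SOURCE = 5


@dataclass(frozen=True)
class DataPoint:
    label: str
    value: str
    source: str  # where the value came from, so it can be checked
    confidence: Confidence


def at_least(points, floor: Confidence):
    """Keep only data points at or above the given confidence tier."""
    return [p for p in points if p.confidence >= floor]


# Made-up sample values, for illustration only.
points = [
    DataPoint("assessed_value", "1,250,000", "county assessor", Confidence.PRIMARY_SOURCE),
    DataPoint("exec_comp", "480,000", "DEF 14A filing", Confidence.STRUCTURED_EXTRACTION),
    DataPoint("same_person_as_donor", "yes", "ER match", Confidence.CROSS_SOURCE_INFERENCE),
    DataPoint("summary", "likely major donor", "AI brief", Confidence.AI_GENERATED),
]

for p in at_least(points, Confidence.STRUCTURED_EXTRACTION):
    print(p.label, p.source, p.confidence.name)
```

Attaching the tier to each data point, rather than to a whole profile, lets a partial profile stay honest about which parts are solid and which are inferred.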

Request a source

Looking for data we don't have yet? Tell us. We're a small team and we're genuinely responsive to what our users need. If we can find it in public records, we'll build it.