How to compare two audience segments using public data
"Compare two audiences" is usually a project that takes weeks. A research firm fields a panel, samples a few hundred respondents per segment, and comes back with means and a 20-page deck.
Most marketing teams have a clearer picture of one segment than the other. The one they brief campaigns against is dressed up in habits, quotes from interviews, and a stock photo. The other one, the one in the expansion brief, is a sentence in a roadmap doc.
The interesting question for spend is rarely about the well-known segment. It is the comparison: how do the two differ in income, geography, household type, and the constraints around each one? A persona invented in a workshop cannot answer that, because neither side has anything structural to compare.
Start with two real population queries
A comparison only works if both sides are anchored in the same data. The strongest free source for US audiences is the American Community Survey, the U.S. Census Bureau's annual household survey. It records age, geography, household type, income band, occupation, education, transport, language, and dozens of other variables for millions of respondents each year. The variables are stable, documented, and freely accessible.
In that frame, a segment is a filter on those variables. "Renters in Wayne County, Michigan, aged 25 to 34" is a query. "Single-earner households in metro areas of more than a million people, with income between $40,000 and $70,000" is another. Both are reproducible: anyone on the team can run them and get the same answer.
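To make "a segment is a filter" concrete, here is a minimal Python sketch. It assumes microdata has already been loaded into simple records; the `Person` fields and the `weight` column are hypothetical stand-ins (real PUMS files use their own variable codes and identify geography by PUMA rather than by county). The estimated size of a segment is the sum of survey weights over matching records, not the raw record count.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


# Hypothetical, simplified person record. Real ACS microdata (PUMS) uses its own
# variable codes and identifies geography by PUMA, not county, so map it first.
@dataclass(frozen=True)
class Person:
    age: int
    county: str
    renter: bool
    household_income: int
    weight: int  # survey weight: roughly how many people this record stands for


Segment = Callable[[Person], bool]


def renters_wayne_25_34(p: Person) -> bool:
    """'Renters in Wayne County, Michigan, aged 25 to 34' written as a predicate."""
    return p.renter and p.county == "Wayne County, MI" and 25 <= p.age <= 34


def weighted_size(records: Iterable[Person], segment: Segment) -> int:
    """Estimated segment population: the sum of weights of matching records."""
    return sum(p.weight for p in records if segment(p))


# Made-up records for illustration only.
sample = [
    Person(29, "Wayne County, MI", True, 52_000, 40),
    Person(41, "Wayne County, MI", True, 61_000, 35),
    Person(31, "Oakland County, MI", True, 58_000, 50),
    Person(33, "Wayne County, MI", False, 75_000, 45),
]
print(weighted_size(sample, renters_wayne_25_34))  # 40
```

Because the segment is a plain function, the second query in the example would be another predicate over the same records, and both can be re-run unchanged.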
Before drafting either segment, write down the variables that will define them. An earlier Cambium AI piece on using public data for market research covers which variables matter most for marketing work and how single-year and multi-year estimates trade off against each other.
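One way to write the variables down is as data rather than prose. The sketch below assumes a hypothetical agreed variable list (illustrative names, not ACS codes) and encodes both example segments as rules against it. The check fails loudly if either segment relies on a variable nobody agreed on, and the JSON dump gives a text form the team can version and re-run.

```python
import json

# Hypothetical variable list, agreed before either segment is drafted.
# These are illustrative names, not ACS variable codes.
DEFINING_VARIABLES = ["age", "county", "tenure", "household_income", "earners", "metro_population"]

# Each rule is either {"range": [low, high]} (inclusive, None = open-ended)
# or {"in": [allowed values]}.
SEGMENTS = {
    "wayne_renters_25_34": {
        "age": {"range": [25, 34]},
        "county": {"in": ["Wayne County, MI"]},
        "tenure": {"in": ["rented"]},
    },
    "single_earner_large_metro": {
        "earners": {"range": [1, 1]},
        "metro_population": {"range": [1_000_001, None]},
        "household_income": {"range": [40_000, 70_000]},
    },
}


def undefined_variables(segments: dict, variables: list) -> dict:
    """Map each segment name to any variables it uses that are not on the agreed list."""
    allowed = set(variables)
    return {
        name: sorted(set(rules) - allowed)
        for name, rules in segments.items()
        if set(rules) - allowed
    }


problems = undefined_variables(SEGMENTS, DEFINING_VARIABLES)
if problems:
    raise ValueError(f"segments use undefined variables: {problems}")

# A stable, diffable text form that can be stored and re-run next quarter.
print(json.dumps({"variables": DEFINING_VARIABLES, "segments": SEGMENTS}, indent=2, sort_keys=True))
```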
Compare the constraints, not the labels
Two segments with similar demographics often behave differently because the constraints around them are different. A 30-year-old with a $70,000 household income in central Boston is not the same buyer as a 30-year-old with a $70,000 household income in rural Indiana, even though the demographic profile reads the same on a slide.
What changes the behaviour is the set of constraints: rent as a share of income, commute time, household structure, and access to a vehicle. The survey measures each of these directly. The Cambium AI walk-through of comparing demographic data across two counties shows what happens when two areas with the same median income are compared on housing burden. The labels look the same. The decisions a marketing team makes for each are not.
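The constraint comparison can be sketched the same way. The households below are made up and the field names are hypothetical; the 30% cutoff follows the common convention of calling a household cost-burdened when it spends more than 30% of income on housing. Both segments have the same $70,000 income, yet their constraint profiles come out completely different.

```python
from dataclasses import dataclass


# Hypothetical household record; field names are illustrative, not ACS codes.
@dataclass(frozen=True)
class Household:
    segment: str
    monthly_gross_rent: float
    annual_income: float
    vehicles: int
    weight: float  # survey weight


# A common convention treats more than 30% of income on housing as cost-burdened.
COST_BURDEN_THRESHOLD = 0.30


def rent_burden(h: Household) -> float:
    """Annual gross rent as a share of annual household income."""
    return 12 * h.monthly_gross_rent / h.annual_income


def constraint_profile(households: list, segment: str) -> dict:
    """Weighted shares of a segment that are rent-burdened or have no vehicle."""
    rows = [h for h in households if h.segment == segment and h.annual_income > 0]
    total = sum(h.weight for h in rows)
    if total == 0:
        raise ValueError(f"no households with positive income in segment {segment!r}")
    burdened = sum(h.weight for h in rows if rent_burden(h) > COST_BURDEN_THRESHOLD)
    no_vehicle = sum(h.weight for h in rows if h.vehicles == 0)
    return {"cost_burdened_share": burdened / total, "no_vehicle_share": no_vehicle / total}


# Made-up households: same $70,000 income, different constraints.
households = [
    Household("boston", 2_400, 70_000, 0, 1.0),
    Household("boston", 1_500, 70_000, 1, 1.0),
    Household("rural_indiana", 900, 70_000, 2, 1.0),
    Household("rural_indiana", 1_300, 70_000, 1, 1.0),
]
for name in ("boston", "rural_indiana"):
    print(name, constraint_profile(households, name))
```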
Anchor each side to a verifiable source
The cost of a comparison built on workshop assumptions is not felt in the room where the comparison is made. It is felt later, in the campaign that under-delivers on the segment that was less well understood. The fix is to anchor each side to data that anyone can re-run.
The pre-tabulated tables sit on data.census.gov. For custom cuts, the Public Use Microdata Sample (PUMS) provides record-level data. For work that has to stay comparable across years, IPUMS USA at the University of Minnesota harmonises the same data with stable variable names. None of these requires a research budget, only a query and a definition.
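The Census Bureau also exposes the pre-tabulated tables through its Data API at api.census.gov. The sketch below builds a query for the ACS 5-year estimates and turns the JSON response (a header row followed by data rows) into dictionaries. Table `B01003` (total population), with the `E` estimate and `M` margin-of-error suffixes, is only an example; check the variable list for the year you query. The helper names are hypothetical, and the network call is defined but not run.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "https://api.census.gov/data"


def acs_url(year: int, dataset: str, variables: list, geography: str, within: str = "") -> str:
    """Build a Census Data API query URL, e.g. dataset="acs5" for 5-year estimates."""
    params = {"get": ",".join(variables), "for": geography}
    if within:
        params["in"] = within
    # Keep commas and colons readable; the API expects e.g. for=county:163.
    return f"{BASE_URL}/{year}/acs/{dataset}?{urlencode(params, safe=',:')}"


def rows_to_dicts(payload: list) -> list:
    """The API returns a JSON array whose first row is the header; values arrive as strings."""
    header, *rows = payload
    return [dict(zip(header, row)) for row in rows]


def fetch(url: str) -> list:
    """Download and decode one query (requires network access)."""
    with urlopen(url) as response:
        return json.load(response)


# Wayne County, Michigan is state FIPS 26, county FIPS 163.
url = acs_url(2022, "acs5", ["NAME", "B01003_001E", "B01003_001M"], "county:163", "state:26")
print(url)
```

Running the same URL later returns the same published estimates, which is what makes the segment definition re-runnable by anyone on the team.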
Show what each segment misses
Every dataset has edges. Survey panels under-sample renters, low-income households, people with limited transport access, and people who do not pick up the phone. The American Community Survey is the most comprehensive household survey in the US, but its estimates still carry margins of error that widen at the small-area level, and those should be reported next to the numbers, not hidden under a chart.
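Reading the margins of error off the page also decides whether a gap between two segments is real. The sketch below follows the approach in the Census Bureau's ACS guidance: convert each published 90% margin of error to a standard error, combine them, and compare the resulting z statistic against 1.645. It treats the two estimates as independent, and the input numbers are invented.

```python
import math

# ACS margins of error are published at the 90% confidence level.
Z_90 = 1.645


def standard_error(moe: float) -> float:
    return moe / Z_90


def moe_of_sum(*moes: float) -> float:
    """Approximate MOE for a sum of estimates (e.g. combining table rows)."""
    return math.sqrt(sum(m ** 2 for m in moes))


def compare(est_a: float, moe_a: float, est_b: float, moe_b: float) -> tuple:
    """Z test for the difference between two estimates, treated as independent."""
    se_diff = math.sqrt(standard_error(moe_a) ** 2 + standard_error(moe_b) ** 2)
    z = (est_a - est_b) / se_diff
    return z, abs(z) > Z_90


# Made-up estimates (percent rent-burdened) with published-style MOEs.
print(compare(42.0, 3.0, 36.0, 4.0))  # z ≈ 1.97, significant at 90%
print(compare(40.0, 3.0, 38.0, 4.0))  # z ≈ 0.66, not significant
```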
What changes when the comparison is grounded
A grounded comparison changes how the room talks about the two segments. The conversation moves from "the second segment feels riskier" to "the second segment has 18% higher rent burden and 22% lower car ownership; the messaging needs to be different". The cost of being wrong narrows because the assumptions are now visible.
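A figure like "18% higher" can mean a relative change or a percentage-point gap, and the two diverge quickly. The shares below are hypothetical, chosen only so the relative changes match the figures in the example sentence. Stating which measure is meant keeps the room from arguing about the wrong number.

```python
def point_difference(share_a: float, share_b: float) -> float:
    """Difference between two shares, in percentage points."""
    return (share_b - share_a) * 100


def relative_difference(value_a: float, value_b: float) -> float:
    """Change from a to b as a percentage of a."""
    return (value_b - value_a) / value_a * 100


# Hypothetical shares for the two segments.
burden_a, burden_b = 0.40, 0.472  # share of households that are rent-burdened
cars_a, cars_b = 0.90, 0.702      # share of households with a vehicle

print(f"rent burden: {relative_difference(burden_a, burden_b):+.0f}% relative, "
      f"{point_difference(burden_a, burden_b):+.1f} points")
print(f"car ownership: {relative_difference(cars_a, cars_b):+.0f}% relative, "
      f"{point_difference(cars_a, cars_b):+.1f} points")
```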
It is also repeatable. Next quarter, when the question comes back, the team is not redoing the workshop. They are tightening a filter.
For a worked example of the wedge format applied to a single occupation, the Cambium AI piece on who the average US CEO actually is shows what a single-segment view looks like when it is anchored in real population data. Compare it against the segment you brief against today.