StackSwap · Methodology v1.0.0
How we model 100,000 GTM stacks
Every cited statistic on stackswap.ai — "38% of modeled stacks contained both Outreach and HubSpot," "median annual recoverable: $93,240," "81.66% of stacks have at least one detected overlap" — comes from this simulation. Synthetic stacks, real engine, reproducible code. Here's exactly how it works.
The pipeline
Three steps, all deterministic from a fixed seed:
- Generate. A seeded RNG (Mulberry32) picks an archetype weighted by a realistic operator distribution, samples team headcount within that archetype's range, picks an industry, then assembles a tool list from the archetype's core tools, common additions (probabilistic), optional additions (probabilistic), and legacy drift (probabilistic). Every tool name is validated against `TOOL_LIST` at script start.
- Score. Each generated stack is passed to `scanStack(input)`, the same pure function that powers /stackscan. No HTTP, no DB. Output: current spend, optimized spend, per-tool verdicts, modeled recovery, overlap pairs detected, AI readiness scores, and tier pricing.
- Aggregate. Across all 100,000 runs we compute: tool prevalence (% of stacks containing each tool), overlap-pair prevalence and recovery distributions (median, p25/p75/p95), team-size-bucketed recovery medians, and headline summary stats. The result writes to `data/stack-simulation/aggregates.json` (committed in the repo, ~25KB). The raw per-stack CSV (~18MB) writes to `simulations/`, which is gitignored.
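The Generate step's determinism comes entirely from the seeded RNG. Here is a minimal TypeScript sketch of Mulberry32 (the standard public-domain construction; the function name is illustrative, not necessarily the script's actual export):

```typescript
// Mulberry32: a tiny 32-bit seeded PRNG. Same seed in, same stream out,
// which is what makes the whole pipeline reproducible.
function mulberry32(seed: number): () => number {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) | 0;
    let t = Math.imul(a ^ (a >>> 15), 1 | a);
    t = (t + Math.imul(t ^ (t >>> 7), 61 | t)) ^ t;
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296; // uniform in [0, 1)
  };
}

const rng = mulberry32(42);
const first = rng();
console.log(first === mulberry32(42)()); // → true: identical seeds replay the stream
```

Every probabilistic decision in Generate (archetype, headcount, industry, tool additions) draws from this single stream, so one seed fixes the entire 100,000-stack population.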
The 12 archetypes
Weights sum to 1.0. The distribution reflects estimated prevalence in the operator population, not an even split. Tools assigned to each archetype are sourced from realistic patterns we observe in the GTM operator community.
| Archetype | Team-size range | Weight |
|---|---|---|
| Early-stage B2B SaaS (founder-led) | 1-10 | 12% |
| Mid-market B2B SaaS (sales-led) | 15-50 | 20% |
| Growth-stage mid-market (multi-channel) | 30-100 | 15% |
| Enterprise RevOps | 100-500 | 8% |
| Dev-tools PLG | 10-50 | 10% |
| AI-native modern team | 10-40 | 6% |
| Late-stage enterprise (multi-region) | 500-2000 | 5% |
| Post-acquisition tangled (Outreach + Salesloft) | 50-300 | 3% |
| PLG + sales-assist hybrid | 25-150 | 7% |
| Mid-market sales with Apollo + HubSpot | 10-80 | 6% |
| Bootstrapped lean (1-15) | 1-15 | 5% |
| Marketing-led B2B (HubSpot heavy) | 20-80 | 3% |
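Weighted archetype selection reduces to a cumulative-weight walk over the table above. A sketch, using the first three rows (the function name is illustrative; the sketch normalizes so it also works on a subset of archetypes):

```typescript
interface Archetype {
  name: string;
  weight: number;
}

// First three rows of the archetype table above.
const ARCHETYPES: Archetype[] = [
  { name: "Early-stage B2B SaaS (founder-led)", weight: 0.12 },
  { name: "Mid-market B2B SaaS (sales-led)", weight: 0.20 },
  { name: "Growth-stage mid-market (multi-channel)", weight: 0.15 },
  // ...remaining nine archetypes omitted for brevity
];

// u is a uniform draw in [0, 1) from the seeded RNG. Walk the cumulative
// weights until the draw falls inside a bucket.
function pickArchetype(u: number, archetypes: Archetype[]): Archetype {
  const total = archetypes.reduce((sum, a) => sum + a.weight, 0);
  let cum = 0;
  for (const a of archetypes) {
    cum += a.weight / total; // normalize for the subset used in this sketch
    if (u < cum) return a;
  }
  return archetypes[archetypes.length - 1]; // guard against FP rounding
}
```

Because `u` comes from the seeded stream, the sequence of archetypes picked across 100,000 iterations is fully determined by the seed.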
What the engine produces
From the current run (`SIM_SEED=42 npm run simulate:100k`):
- 81.66% of modeled stacks contained at least one detected overlap pair.
- Median monthly recoverable per stack: $7,770 (annual: $93,240).
- Distribution: p25 $280/mo · p75 $21,321/mo · p95 $119,460/mo.
- Median current monthly stack spend: $20,700.
- Median overlap-pair count per stack: 2.
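The percentile figures above come from standard order statistics over the per-stack results. A sketch using linear interpolation between closest ranks (the aggregation script may use a different percentile definition; this is one common choice):

```typescript
// Percentile with linear interpolation between the two closest ranks.
function percentile(values: number[], p: number): number {
  const sorted = [...values].sort((a, b) => a - b);
  const idx = (p / 100) * (sorted.length - 1);
  const lo = Math.floor(idx);
  const hi = Math.ceil(idx);
  // When idx is an integer, lo === hi and the interpolation term is zero.
  return sorted[lo] + (sorted[hi] - sorted[lo]) * (idx - lo);
}

console.log(percentile([1, 2, 3, 4, 5], 50)); // → 3
```

Run once over the 100,000 per-stack recoverable values, this yields the p25/p50/p75/p95 cut points reported above.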
Top citable findings
- 81.66% of modeled stacks contained at least one overlap pair flagged by the engine.
- Median annual recoverable across 100k modeled stacks: $93,240.
- 29.97% of modeled stacks contained both HubSpot Marketing Hub and Salesforce; median modeled annual recovery from consolidating: $1,800.
- 23.79% of modeled stacks contained both Clari and Gong; median modeled annual recovery from consolidating: $1,200.
- 20.49% of modeled stacks contained both Apollo.io and ZoomInfo; median modeled annual recovery from consolidating: $2,400.
- 18.74% of modeled stacks contained both Outreach and Salesloft; median modeled annual recovery from consolidating: $1,200.
- 17.75% of modeled stacks contained both Apollo.io and Outreach; median modeled annual recovery from consolidating: $1,200.
Tool prevalence and engine verdicts
Top 20 tools by prevalence across 100,000 modeled stacks. Replace rate = % of stacks containing that tool where the engine flagged a replacement candidate (e.g. Apollo for Outreach). Remove rate = % where another tool already in the stack made it redundant.
| Tool | % of stacks | Replace rate | Remove rate |
|---|---|---|---|
| Slack | 100% | — | — |
| Notion | 62.81% | — | — |
| ZoomInfo | 61.37% | 84.19% | 1.78% |
| Outreach | 54.38% | 100% | — |
| Salesforce | 50.88% | — | 39.51% |
| Gong | 50.88% | — | 46.75% |
| HubSpot | 49.12% | — | 1.31% |
| LinkedIn Sales Navigator | 46.49% | — | — |
| Calendly | 44.6% | — | — |
| Apollo.io | 34.69% | — | 0.5% |
| HubSpot Marketing Hub | 32.94% | — | 29.95% |
| Marketo | 27.67% | 18.4% | — |
| Clari | 23.79% | — | — |
| Loom | 20.11% | — | — |
| Mixpanel | 19.94% | — | — |
| Linear | 19.68% | — | — |
| Salesloft | 18.74% | 100% | — |
| Segment | 18.07% | — | — |
| Clearbit | 13.99% | 61.76% | 32.17% |
| Chorus | 12.98% | 100% | — |
Most prevalent redundant pairs
The 15 overlap pairs that show up most often across the modeled population. Recovery values are deterministic per pair (see limitations) — consider them modeled annual upper bounds for the consolidation move.
| Pair | % of stacks with both | Median annual recovery |
|---|---|---|
| HubSpot Marketing Hub + Salesforce | 29.97% | $1,800/yr |
| Clari + Gong | 23.79% | $1,200/yr |
| Apollo.io + ZoomInfo | 20.49% | $2,400/yr |
| Outreach + Salesloft | 18.74% | $1,200/yr |
| Apollo.io + Outreach | 17.75% | $1,200/yr |
| Linear + Notion | 15.2% | $960/yr |
| Clearbit + ZoomInfo | 13.99% | $3,600/yr |
| Chorus + Gong | 12.98% | $1,200/yr |
| HubSpot Marketing Hub + Marketo | 11.91% | $17,280/yr |
| HubSpot + Mailchimp | 11.78% | $1,200/yr |
| 6sense + ZoomInfo | 11.49% | $15,600/yr |
| Clearbit + LinkedIn Sales Navigator | 9.63% | $540/yr |
| Salesforce + Salesforce Tableau | 5.97% | $7,200/yr |
| Marketo + Pardot | 5.75% | $33,000/yr |
| Bombora + ZoomInfo | 3.11% | $15,600/yr |
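Detection itself reduces to set membership: a pair fires when both tools are present in a stack, and the modeled recovery is a fixed lookup. A sketch with three pairs from the table above (the `OVERLAPS` shape here is an assumption for illustration, not the engine's actual schema):

```typescript
// Assumed shape: "ToolA|ToolB" key → fixed modeled annual recovery (USD/yr).
const OVERLAPS: Record<string, number> = {
  "Apollo.io|ZoomInfo": 2400,
  "Outreach|Salesloft": 1200,
  "Marketo|Pardot": 33000,
};

// A pair is flagged iff every tool in the pair is present in the stack.
function detectOverlaps(stack: string[]): { pair: string; recovery: number }[] {
  const have = new Set(stack);
  return Object.entries(OVERLAPS)
    .filter(([pair]) => pair.split("|").every((tool) => have.has(tool)))
    .map(([pair, recovery]) => ({ pair, recovery }));
}

console.log(detectOverlaps(["Slack", "Apollo.io", "ZoomInfo"]));
// one hit: Apollo.io|ZoomInfo at the fixed modeled $2,400/yr
```

The fixed per-pair recovery is why the distributions are tight (see limitations): every detection of a given pair contributes the same dollar value.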
Highest-value consolidation opportunities
The top 10 overlap pairs ranked by modeled annual recovery. Less prevalent than the list above, but higher-impact when present.
| Pair | Median annual recovery | % of stacks with both |
|---|---|---|
| Marketo + Pardot | $33,000/yr | 5.75% |
| 6sense + Demandbase | $24,000/yr | 2.72% |
| HubSpot Marketing Hub + Marketo | $17,280/yr | 11.91% |
| 6sense + ZoomInfo | $15,600/yr | 11.49% |
| Bombora + ZoomInfo | $15,600/yr | 3.11% |
| 6sense + Bombora | $12,000/yr | 1.76% |
| 6sense + Cognism | $12,000/yr | 1.16% |
| Bombora + Cognism | $12,000/yr | 0.53% |
| Salesforce + Salesforce Tableau | $7,200/yr | 5.97% |
| HubSpot + HubSpot Marketing Hub | $7,200/yr | 2.97% |
Recoverable spend by team size
Median modeled monthly waste scales sharply with headcount. The biggest single jump is from the 6-15 bucket to 16-25, the inflection point where stack drift starts costing real money. Useful framing for "when to audit your stack" conversations.
| Team size | % of all stacks | Median monthly recoverable | Median monthly spend |
|---|---|---|---|
| 1-5 | 7.81% | $0/mo | $580/mo |
| 6-15 | 13.04% | $80/mo | $1,950/mo |
| 16-25 | 11.05% | $5,120/mo | $11,750/mo |
| 26-50 | 32.03% | $8,680/mo | $21,090/mo |
| 51-100 | 18.01% | $21,410/mo | $46,030/mo |
| 101-200 | 5.97% | $25,890/mo | $74,050/mo |
| 201-500 | 7.15% | $76,280/mo | $178,300/mo |
| 501-1000 | 1.61% | $131,550/mo | $407,360/mo |
| 1000+ | 3.33% | $242,910/mo | $751,620/mo |
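The bucketing behind the table is a simple ascending upper-bound walk. A sketch (labels mirror the table; the function name is illustrative):

```typescript
// Upper bound → label, ascending; anything above the last bound is "1000+".
const BUCKETS: Array<[number, string]> = [
  [5, "1-5"], [15, "6-15"], [25, "16-25"], [50, "26-50"],
  [100, "51-100"], [200, "101-200"], [500, "201-500"], [1000, "501-1000"],
];

function teamSizeBucket(headcount: number): string {
  for (const [max, label] of BUCKETS) {
    if (headcount <= max) return label;
  }
  return "1000+";
}

console.log(teamSizeBucket(30)); // prints 26-50
```

Each stack's modeled headcount lands in exactly one bucket, and the per-bucket medians above are computed over the stacks in that bucket.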
Independent validation
Modeled stacks are how StackSwap measures the patterns. The State of GTM Engineering 2026 (OneGTM, n=228) is how the operators in those stacks describe themselves. The numbers below are theirs — included here to show that the things our engine scores for (overlap, consolidation, AI-readiness, fit-for-stage) match what the audience actually says they need.
Honest limitations
- Synthetic, not empirical. These are modeled stacks, not real customer scans. We're explicit about that everywhere the data is cited. Pre-revenue means no customer base; we'd rather model honestly than fabricate a customer count.
- Slack at 100% prevalence. Every archetype includes Slack as a core tool, so it shows up in 100% of stacks — an over-representation we know about. This doesn't affect overlap-pair statistics, but it inflates Slack's prevalence number.
- Operator-judged weights. The 12 archetype weights are estimated from operator pattern recognition, not external market research. We could be off by 5-10 percentage points on any given archetype's real-world prevalence.
- Deterministic recovery values. Each overlap pair has a fixed modeled annual recovery from the OVERLAPS table (e.g. Apollo.io + ZoomInfo = $2,400/yr modeled recovery on every detection). The p25-p75 tightness in our distributions reflects this engine determinism, not a bug.
- No business-context override. The engine flags overlap based on capability redundancy. In practice some teams run both tools for legitimate reasons (regulated industry, post-acquisition transition, specific motion). The modeled recovery is the upper bound, not the recommended action.
Reproducibility contract
Same seed (`SIM_SEED=42`) + same archetype templates + same engine version = bit-identical `aggregates.json`. Anyone can verify the cited statistics by running the script themselves. The `methodology_version` field bumps when:
- The scoring engine logic changes (new overlap pairs, updated cost modeling, etc.)
- Archetype templates are added, removed, or rebalanced
- Aggregation logic changes (new metrics, different percentile cutoffs)
When the version bumps, all content pages automatically pick up the new numbers on next deploy — the citation helpers in `lib/stack-simulation/citations.ts` always read the latest aggregates.
How to cite this dataset
Journalists and researchers are welcome to cite these statistics. Below are ready-to-paste citation formats. Please link back to this page so readers can verify the methodology.
Short citation (inline)
Source: StackSwap, "100,000 GTM Stack Simulation" (methodology v1.0.0, https://stackswap.ai/methodology).
Long citation (academic / report style)
StackSwap. (2026). 100,000 GTM Stack Simulation: Modeled tool prevalence, overlap, and recoverable spend across 12 operator archetypes (methodology v1.0.0, seed 42). Retrieved from https://stackswap.ai/methodology.
Example sentences (for journalists / writers)
- “According to a 100,000-stack simulation by StackSwap, 81.66% of B2B SaaS GTM stacks contain at least one redundant tool pair.”
- “StackSwap's modeled dataset of 100,000 synthetic GTM stacks puts the median annual recoverable spend at $93,240 per company.”
- “A modeled analysis of 100,000 GTM configurations found that [Tool A] and [Tool B] appear together in [X]% of stacks — see stackswap.ai/methodology for the full table.”
Pre-built citation helpers for every overlap pair and tool are available in `lib/stack-simulation/citations.ts` — `citationForOverlapPair()` and `citationForToolPrevalence()` return formatted sentences for any tool or pair in the dataset.
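A plausible shape for one of those helpers, for readers who want to see the mechanics. The input interface and field names here are assumptions for illustration; the real helper in lib/stack-simulation/citations.ts reads these values from aggregates.json and may differ in signature:

```typescript
// Assumed input shape; field names are illustrative.
interface OverlapPairStat {
  toolA: string;
  toolB: string;
  prevalencePct: number;        // % of modeled stacks containing both tools
  medianAnnualRecovery: number; // fixed modeled recovery, USD/yr
}

// Returns a ready-to-paste sentence in the same format as "Top citable findings".
function citationForOverlapPair(s: OverlapPairStat): string {
  return `${s.prevalencePct}% of modeled stacks contained both ${s.toolA} and ` +
    `${s.toolB}; median modeled annual recovery from consolidating: $${s.medianAnnualRecovery}.`;
}

console.log(citationForOverlapPair({
  toolA: "Outreach",
  toolB: "Salesloft",
  prevalencePct: 18.74,
  medianAnnualRecovery: 1200,
}));
```

Generating citations from the committed aggregates (rather than hand-writing them) is what keeps every published number in sync with the current methodology version.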
Source
- Simulator script:
scripts/simulate-stacks-100k.ts - Scoring engine:
lib/scanStack.ts(same function powers /stackscan) - Citation helpers:
lib/stack-simulation/citations.ts - Output dataset:
data/stack-simulation/aggregates.json(committed) - Tool universe:
data/tools.json
Related
- 90-day consolidation runbook (uses these statistics)
- Are you wasting money on Outreach? (cites Outreach prevalence + overlap stats)
- Are you wasting money on Apollo? (cites Apollo + ZoomInfo overlap)
- All tool overlap pairs (per-pair pages)
- GTM tools directory (every tool reviewed)
Canonical URL: https://stackswap.ai/methodology
Generated at: 2026-05-06T01:56:31.303Z