StackSwap · Methodology v1.1.0 · Last reviewed 2026-05-18

How we model 100,000 GTM stacks

Every cited statistic on stackswap.ai ("38% of modeled stacks contained both Outreach and HubSpot," "median annual recoverable: $93,240," "81.66% of stacks have at least one detected overlap") comes from this simulation. Synthetic stacks, real engine, reproducible code. Here's exactly how it works.

Editorial firewall

How we keep recommendations honest while running on affiliate revenue:

  1. Facts are separate from recommendations. Tool facts — price/seat, integrations, contract minimums, data coverage — live in data/tools.json and are sourced and verified independently of any commercial relationship. The facts do not change based on who pays us.
  2. Equal-depth coverage for non-affiliates. When we compare or rank tools, non-affiliate competitors get the same word count, the same data depth, and the same fact verification as affiliate partners. ZoomInfo, Outreach, Gong, and Salesforce appear at full depth across our comparison and ranking pages despite having no commercial relationship with us.
  3. Affiliate signal only breaks ties. When the recommendation engine returns a ranked list for a buyer profile, fit is the primary score — cost-effectiveness, integration depth, stack compatibility, AI-readiness. Affiliate status can break ties between equivalent-fit tools but never moves an unfit tool above a fit one.
  4. We turn down partners. See Carveouts below for tools and categories we have actively declined to recommend or take money to promote. The list is reviewed quarterly.
  5. Affiliate links are labeled inline. Every partner link carries a sponsored label and a disclosure on the page where it appears. Buying through a partner link costs the buyer nothing extra.

If you ever see a recommendation that looks affiliate-weighted rather than fit-weighted, email nick@stackswap.ai with the page URL. We treat that as a trust bug and triage it the same day.
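
Rule 3 of the firewall can be pictured as a two-key sort. The sketch below is illustrative only: the `Candidate` shape, the `fit` field, and `rankCandidates` are hypothetical names, and treating "equivalent fit" as exact score equality is an assumption, not a description of the production engine.

```typescript
// Hypothetical candidate shape; the real engine's types are not shown in this document.
interface Candidate {
  name: string;
  fit: number;        // primary score, higher is better (assumed scale)
  affiliate: boolean; // true when a commercial relationship exists
}

// Sort by fit first. Affiliate status is consulted only when fit is equal,
// so it can never lift a lower-fit tool above a higher-fit one.
function rankCandidates(candidates: Candidate[]): Candidate[] {
  return candidates.slice().sort((a, b) => {
    if (a.fit !== b.fit) return b.fit - a.fit;
    return Number(b.affiliate) - Number(a.affiliate);
  });
}

const ranked = rankCandidates([
  { name: "Tool A", fit: 70, affiliate: true },
  { name: "Tool B", fit: 85, affiliate: false },
  { name: "Tool C", fit: 85, affiliate: true },
]);

console.log(ranked.map((c) => c.name).join(" > ")); // Tool C > Tool B > Tool A
```

Tool A is an affiliate but stays last because its fit is lower; the affiliate flag only decides the order of the two tools tied at 85.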

Tool fact corpus

Per-tool facts — price/seat, integrations, contract structure, data coverage — live in data/tools.json. Where they come from and how they stay current:

Carveouts

Tools and categories we have actively declined to recommend or take money to promote. Reviewed quarterly — if something moves off this list, we explain why in the methodology change log.

If you are a vendor who thinks we have represented you unfairly, or a buyer who thinks a tool we recommend belongs on this list instead, email nick@stackswap.ai with specifics. Every flag gets a real answer, not a form reply.

The pipeline

Three steps, all deterministic from a fixed seed:

  1. Generate. A seeded RNG (Mulberry32) picks an archetype weighted by realistic operator distribution, samples team headcount within that archetype's range, picks an industry, then assembles a tool list from the archetype's core tools, common additions (probabilistic), optional additions (probabilistic), and legacy drift (probabilistic). Every tool name is validated against TOOL_LIST at script start.
  2. Score. Each generated stack is passed to scanStack(input) — the same pure function that powers /stackscan. No HTTP, no DB. Output: current spend, optimized spend, per-tool verdicts, modeled recovery, overlap pairs detected, AI readiness scores, and tier pricing.
  3. Aggregate. Across all 100,000 runs we compute: tool prevalence (% of stacks containing each tool), overlap-pair prevalence + recovery distributions (median, p25/p75/p95), team-size-bucketed recovery medians, and headline summary stats. The result writes to data/stack-simulation/aggregates.json — committed in the repo, ~25KB. Raw per-stack CSV (~18MB) writes to simulations/ which is gitignored.
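
The Generate step can be sketched roughly as follows. Mulberry32 is written in its commonly published form; everything else is an assumption made for illustration: the `Archetype` shape, `ARCHETYPES`, `pickWeighted`, `generateStack`, and the tool lists and probabilities are hypothetical, and the sketch leaves out industry selection, optional additions, legacy drift, and the TOOL_LIST validation.

```typescript
// Mulberry32: a small seeded 32-bit PRNG that returns floats in [0, 1).
function mulberry32(seed: number): () => number {
  let a = seed | 0;
  return () => {
    a = (a + 0x6d2b79f5) | 0;
    let t = Math.imul(a ^ (a >>> 15), 1 | a);
    t = (t + Math.imul(t ^ (t >>> 7), 61 | t)) ^ t;
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Hypothetical archetype shape. Names, weights and team-size ranges follow two rows
// of the archetype table; the tool lists and probabilities are invented for the demo.
interface Archetype {
  name: string;
  weight: number;
  teamSize: [number, number]; // inclusive headcount range
  core: string[];             // always included
  common: { tool: string; p: number }[]; // included with probability p
}

const ARCHETYPES: Archetype[] = [
  { name: "Mid-market B2B SaaS (sales-led)", weight: 0.2, teamSize: [15, 50],
    core: ["Slack", "Salesforce"], common: [{ tool: "Gong", p: 0.5 }] },
  { name: "Dev-tools PLG", weight: 0.1, teamSize: [10, 50],
    core: ["Slack", "Linear"], common: [{ tool: "Mixpanel", p: 0.4 }] },
];

// Weighted pick: walk the list, subtracting weights until the draw is used up.
function pickWeighted<T extends { weight: number }>(items: T[], rand: () => number): T {
  const total = items.reduce((sum, item) => sum + item.weight, 0);
  let r = rand() * total;
  for (const item of items) {
    r -= item.weight;
    if (r < 0) return item;
  }
  return items[items.length - 1]; // guard against floating-point drift
}

function generateStack(rand: () => number) {
  const archetype = pickWeighted(ARCHETYPES, rand);
  const [lo, hi] = archetype.teamSize;
  const headcount = lo + Math.floor(rand() * (hi - lo + 1));
  const tools = archetype.core.slice();
  for (const extra of archetype.common) {
    if (rand() < extra.p) tools.push(extra.tool);
  }
  return { archetype: archetype.name, headcount, tools };
}

// Two generators built from the same seed produce the same stack.
const first = JSON.stringify(generateStack(mulberry32(42)));
const second = JSON.stringify(generateStack(mulberry32(42)));
console.log(first === second); // true
```

Because every random decision flows through one seeded generator, the order of draws is fixed, which is what makes the downstream aggregates repeatable.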

The 12 archetypes

Weights sum to 1.0. Distribution reflects estimated operator population prevalence — not even-split. Tools assigned to each archetype are sourced from realistic patterns we observe in the GTM operator community.

| Archetype | Team-size range | Weight |
| --- | --- | --- |
| Early-stage B2B SaaS (founder-led) | 1-10 | 12% |
| Mid-market B2B SaaS (sales-led) | 15-50 | 20% |
| Growth-stage mid-market (multi-channel) | 30-100 | 15% |
| Enterprise RevOps | 100-500 | 8% |
| Dev-tools PLG | 10-50 | 10% |
| AI-native modern team | 10-40 | 6% |
| Late-stage enterprise (multi-region) | 500-2000 | 5% |
| Post-acquisition tangled (Outreach + Salesloft) | 50-300 | 3% |
| PLG + sales-assist hybrid | 25-150 | 7% |
| Mid-market sales with Apollo + HubSpot | 10-80 | 6% |
| Bootstrapped lean (1-15) | 1-15 | 5% |
| Marketing-led B2B (HubSpot heavy) | 20-80 | 3% |
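
A check that the weights sum to 1.0 might look like the sketch below. The weights are copied from the table above; the names `ARCHETYPE_WEIGHTS` and `assertWeightsSumToOne` and the tolerance value are assumptions, since the real assertion is not shown here.

```typescript
// Weights copied from the archetype table, in table order.
const ARCHETYPE_WEIGHTS: { name: string; weight: number }[] = [
  { name: "Early-stage B2B SaaS (founder-led)", weight: 0.12 },
  { name: "Mid-market B2B SaaS (sales-led)", weight: 0.2 },
  { name: "Growth-stage mid-market (multi-channel)", weight: 0.15 },
  { name: "Enterprise RevOps", weight: 0.08 },
  { name: "Dev-tools PLG", weight: 0.1 },
  { name: "AI-native modern team", weight: 0.06 },
  { name: "Late-stage enterprise (multi-region)", weight: 0.05 },
  { name: "Post-acquisition tangled (Outreach + Salesloft)", weight: 0.03 },
  { name: "PLG + sales-assist hybrid", weight: 0.07 },
  { name: "Mid-market sales with Apollo + HubSpot", weight: 0.06 },
  { name: "Bootstrapped lean (1-15)", weight: 0.05 },
  { name: "Marketing-led B2B (HubSpot heavy)", weight: 0.03 },
];

// Compare with a small tolerance: decimal fractions rarely sum to exactly 1 in floating point.
function assertWeightsSumToOne(weights: number[], tolerance = 1e-9): void {
  const total = weights.reduce((sum, w) => sum + w, 0);
  if (Math.abs(total - 1) > tolerance) {
    throw new Error(`Archetype weights sum to ${total}, expected 1.0`);
  }
}

assertWeightsSumToOne(ARCHETYPE_WEIGHTS.map((a) => a.weight));
```

Failing fast at script start means a mistyped weight stops the run instead of silently skewing every prevalence number.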

What the engine produces

From the current run (SIM_SEED=42 npm run simulate:100k):

How the AI-native score works

The AI-native score (0–100) estimates how ready a stack is to participate in AI and automation workflows — agent orchestration, data piping, programmatic triggers. It is a capability model, scored from each tool's documented integration surface (public API, automation hooks, data access, native AI) — not a measure of how heavily a given team actually uses those capabilities. It is deterministic: the same stack always scores the same. Source: lib/ai-readiness.ts.

Each tool gets four sub-scores (0–100), combined with fixed weights:

| Dimension | Weight | What it measures |
| --- | --- | --- |
| API | 30% | Public REST API, webhooks, SDKs: can an agent read and write this tool programmatically? |
| Automation | 30% | Native triggers and first-class support in Zapier / Make / n8n pipelines. |
| Data access | 20% | Export, warehouse sync, and API-level access to the data the tool holds. |
| AI-native | 20% | Built-in AI features, not bolt-on; the tool was designed around models, not retrofitted. |

Within each dimension a tool lands on a tier, so the numbers fall out of capability rather than being hand-set per tool: a legacy tool (one we carry an AI-native swap for) scores low across the board (~20–30), a tool with a strong public API scores 90 on the API axis, and an AI-native tool scores 95 on the AI axis.

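Worked example. A legacy tool scores about 26: API 25, automation 30, data 25, AI-native 20, blended as 25·0.3 + 30·0.3 + 25·0.2 + 20·0.2 = 25.5. An AI-native tool scores about 85: API 85, automation 80, data 82, AI-native 95, which blends the same way to 84.9. The stack score is the plain average of its tools' scores, so each swap moves the headline by roughly (85 − 26) / N points in an N-tool stack. That is why replacing two legacy tools with AI-native ones can swing a small stack 30+ points (roughly 40 points with three tools, roughly 30 with four, and less as the stack grows): you pull ~26s out of the average and drop ~85s in.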
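
The arithmetic in the worked example can be reproduced with a short sketch. The weights and sub-scores come from the text above; the type and function names (`SubScores`, `toolScore`, `stackScore`), the mid-tier `other` tool, and the choice to score an empty stack as 0 are assumptions for illustration, not the contents of lib/ai-readiness.ts.

```typescript
interface SubScores {
  api: number;
  automation: number;
  data: number;
  aiNative: number;
}

// Fixed dimension weights from the table above.
const WEIGHTS: SubScores = { api: 0.3, automation: 0.3, data: 0.2, aiNative: 0.2 };

// Per-tool score: weighted blend of the four sub-scores.
function toolScore(s: SubScores): number {
  return (
    s.api * WEIGHTS.api +
    s.automation * WEIGHTS.automation +
    s.data * WEIGHTS.data +
    s.aiNative * WEIGHTS.aiNative
  );
}

// Stack score: plain average of the per-tool scores.
function stackScore(tools: SubScores[]): number {
  if (tools.length === 0) return 0; // assumption: an empty stack scores 0
  return tools.reduce((sum, t) => sum + toolScore(t), 0) / tools.length;
}

const legacy: SubScores = { api: 25, automation: 30, data: 25, aiNative: 20 };
const aiNative: SubScores = { api: 85, automation: 80, data: 82, aiNative: 95 };
const other: SubScores = { api: 60, automation: 60, data: 60, aiNative: 60 }; // hypothetical mid-tier tool

console.log(toolScore(legacy).toFixed(1));   // 25.5
console.log(toolScore(aiNative).toFixed(1)); // 84.9

// Four-tool stack: swap both legacy tools for AI-native ones.
const before = stackScore([legacy, legacy, other, other]);
const after = stackScore([aiNative, aiNative, other, other]);
console.log((after - before).toFixed(1)); // 29.7
```

The swing depends on stack size: the same two swaps move a larger stack by fewer points because each tool carries a smaller share of the average.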

The projected score on a StackScan report is that same average recomputed after the recommended swaps are applied — legacy tools removed, replacements added. Four classification bands:

| Score | Classification |
| --- | --- |
| 0–30 | Legacy Stack |
| 31–60 | AI-Capable |
| 61–80 | Automation Ready |
| 81–100 | AI-Native Stack |
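
One way to picture the projected score and its band is the sketch below. The band edges come from the table above; the tool names, per-tool scores, the swap map, and the rounding of fractional averages before banding are all assumptions for illustration.

```typescript
type Band = "Legacy Stack" | "AI-Capable" | "Automation Ready" | "AI-Native Stack";

// Assumption: fractional averages are rounded to the nearest integer before banding.
function classify(score: number): Band {
  const s = Math.round(score);
  if (s <= 30) return "Legacy Stack";
  if (s <= 60) return "AI-Capable";
  if (s <= 80) return "Automation Ready";
  return "AI-Native Stack";
}

// Hypothetical per-tool scores and recommended swaps.
const TOOL_SCORES: Record<string, number> = {
  "Legacy tool A": 26,
  "Legacy tool B": 26,
  "Other tool": 60,
  "AI-native tool A": 85,
  "AI-native tool B": 85,
};
const SWAPS: Record<string, string> = {
  "Legacy tool A": "AI-native tool A",
  "Legacy tool B": "AI-native tool B",
};

function scoreOf(tool: string): number {
  const s = TOOL_SCORES[tool];
  if (s === undefined) throw new Error(`Unknown tool: ${tool}`);
  return s;
}

function stackAverage(stack: string[]): number {
  return stack.reduce((sum, t) => sum + scoreOf(t), 0) / stack.length;
}

// Replace each legacy tool with its recommended swap; leave other tools untouched.
function applySwaps(stack: string[]): string[] {
  return stack.map((t) => {
    const replacement = SWAPS[t];
    return replacement === undefined ? t : replacement;
  });
}

const current = ["Legacy tool A", "Legacy tool B", "Other tool"];
const currentScore = stackAverage(current);               // (26 + 26 + 60) / 3
const projectedScore = stackAverage(applySwaps(current)); // (85 + 85 + 60) / 3

console.log(classify(currentScore), "->", classify(projectedScore)); // AI-Capable -> Automation Ready
```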

Honest limitation. This scores capability, not adoption. A tool with a great API still scores 90 on that axis even if your team has never connected it — the score tells you what is possible with the stack, not what you have wired up. Read it as a ceiling on AI and automation leverage, not a usage report.

Top citable findings

Tool prevalence and engine verdicts

Top 20 tools by prevalence across 100,000 modeled stacks. Replace rate = % of stacks containing that tool where the engine flagged a replacement candidate (e.g. Apollo for Outreach). Remove rate = % where another tool already in the stack made it redundant.

Tool% of stacksReplace rateRemove rate
Slack100%
Notion62.81%
ZoomInfo61.37%84.19%1.78%
Outreach54.38%100%
Salesforce50.88%39.51%
Gong50.88%46.75%
HubSpot49.12%1.31%
LinkedIn Sales Navigator46.49%
Calendly44.6%
Apollo.io34.69%
HubSpot Marketing Hub32.94%29.95%
Marketo27.67%18.4%
Clari23.79%
Loom20.11%
Mixpanel19.94%
Linear19.68%
Salesloft18.74%100%
Segment18.07%
Clearbit13.99%61.76%32.17%
Chorus12.98%100%
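
The three columns can be derived from per-tool verdicts along these lines. The KEEP/REPLACE/REMOVE verdicts are the engine's; the `ScoredStack` shape, the `toolStats` helper, and the sample data are hypothetical.

```typescript
type Verdict = "KEEP" | "REPLACE" | "REMOVE";

// Hypothetical scored-stack shape: tool name -> verdict for tools present in the stack.
interface ScoredStack {
  verdicts: Record<string, Verdict>;
}

interface ToolStats {
  prevalencePct: number;  // % of all stacks containing the tool
  replaceRatePct: number; // % of stacks containing the tool where it was flagged REPLACE
  removeRatePct: number;  // % of stacks containing the tool where it was flagged REMOVE
}

// Percentage rounded to two decimals; 0 when the denominator is empty.
function pct(n: number, d: number): number {
  return d === 0 ? 0 : Math.round((n / d) * 10000) / 100;
}

function toolStats(stacks: ScoredStack[], tool: string): ToolStats {
  let present = 0;
  let replaced = 0;
  let removed = 0;
  for (const stack of stacks) {
    const verdict = stack.verdicts[tool];
    if (verdict === undefined) continue; // tool not in this stack
    present++;
    if (verdict === "REPLACE") replaced++;
    if (verdict === "REMOVE") removed++;
  }
  return {
    prevalencePct: pct(present, stacks.length),
    replaceRatePct: pct(replaced, present),
    removeRatePct: pct(removed, present),
  };
}

// Invented sample: four stacks, two of which contain Outreach.
const sampleStacks: ScoredStack[] = [
  { verdicts: { Slack: "KEEP", Outreach: "REPLACE" } },
  { verdicts: { Slack: "KEEP" } },
  { verdicts: { Slack: "KEEP", Outreach: "REPLACE" } },
  { verdicts: { Slack: "KEEP", Notion: "KEEP" } },
];

console.log(toolStats(sampleStacks, "Outreach"));
// { prevalencePct: 50, replaceRatePct: 100, removeRatePct: 0 }
```

Note the two different denominators: prevalence divides by all stacks, while replace and remove rates divide only by stacks that contain the tool.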

Most prevalent redundant pairs

The 15 overlap pairs that show up most often across the modeled population. Recovery values are deterministic per pair (see limitations) — consider them modeled annual upper bounds for the consolidation move.

| Pair | % of stacks with both | Median annual recovery |
| --- | --- | --- |
| HubSpot Marketing Hub + Salesforce | 29.97% | $1,800/yr |
| Clari + Gong | 23.79% | $1,200/yr |
| Apollo.io + ZoomInfo | 20.49% | $2,400/yr |
| Outreach + Salesloft | 18.74% | $1,200/yr |
| Apollo.io + Outreach | 17.75% | $1,200/yr |
| Linear + Notion | 15.2% | $960/yr |
| Clearbit + ZoomInfo | 13.99% | $3,600/yr |
| Chorus + Gong | 12.98% | $1,200/yr |
| HubSpot Marketing Hub + Marketo | 11.91% | $17,280/yr |
| HubSpot + Mailchimp | 11.78% | $1,200/yr |
| 6sense + ZoomInfo | 11.49% | $15,600/yr |
| Clearbit + LinkedIn Sales Navigator | 9.63% | $540/yr |
| Salesforce + Salesforce Tableau | 5.97% | $7,200/yr |
| Marketo + Pardot | 5.75% | $33,000/yr |
| Bombora + ZoomInfo | 3.11% | $15,600/yr |
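
A rough sketch of why recovery is identical on every detection of a pair: each pair carries one fixed value, so the distribution across stacks collapses to a single number. The two dollar figures are taken from the table above; the `Overlap` shape, the field names `a` and `b`, and the nearest-rank `percentile` helper are assumptions, not the engine's actual data layout.

```typescript
// Assumed row shape; `save` is the modeled annual recovery in USD.
interface Overlap {
  a: string;
  b: string;
  save: number;
}

// Two rows using the recovery values from the table above.
const OVERLAPS: Overlap[] = [
  { a: "Apollo.io", b: "ZoomInfo", save: 2400 },
  { a: "Clari", b: "Gong", save: 1200 },
];

// A pair is detected when both of its tools are present in the stack.
function detectOverlaps(stack: string[]): Overlap[] {
  const present = new Set(stack);
  return OVERLAPS.filter((o) => present.has(o.a) && present.has(o.b));
}

// Nearest-rank percentile over an ascending-sorted array.
function percentile(sortedAsc: number[], p: number): number {
  const rank = Math.ceil((p / 100) * sortedAsc.length);
  const index = Math.min(sortedAsc.length - 1, Math.max(0, rank - 1));
  return sortedAsc[index];
}

// Invented stacks: three contain Apollo.io + ZoomInfo, one does not.
const stacks: string[][] = [
  ["Slack", "Apollo.io", "ZoomInfo"],
  ["Slack", "Apollo.io", "ZoomInfo", "Gong"],
  ["Slack", "Gong", "Clari"],
  ["Slack", "ZoomInfo", "Apollo.io", "Notion"],
];

const recoveries: number[] = [];
for (const stack of stacks) {
  for (const hit of detectOverlaps(stack)) {
    if (hit.a === "Apollo.io" && hit.b === "ZoomInfo") recoveries.push(hit.save);
  }
}
recoveries.sort((x, y) => x - y);

console.log(recoveries.length);                                        // 3
console.log(percentile(recoveries, 25) === percentile(recoveries, 75)); // true
```

The variance across the population lives in how often a pair is detected, not in what each detection is worth.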

Highest-value consolidation opportunities

Top 10 overlap pairs by modeled annual recovery, ranked by dollars. Less prevalent than the list above, but higher-impact when present.

| Pair | Median annual recovery | % of stacks with both |
| --- | --- | --- |
| Marketo + Pardot | $33,000/yr | 5.75% |
| 6sense + Demandbase | $24,000/yr | 2.72% |
| HubSpot Marketing Hub + Marketo | $17,280/yr | 11.91% |
| 6sense + ZoomInfo | $15,600/yr | 11.49% |
| Bombora + ZoomInfo | $15,600/yr | 3.11% |
| 6sense + Bombora | $12,000/yr | 1.76% |
| 6sense + Cognism | $12,000/yr | 1.16% |
| Bombora + Cognism | $12,000/yr | 0.53% |
| Salesforce + Salesforce Tableau | $7,200/yr | 5.97% |
| HubSpot + HubSpot Marketing Hub | $7,200/yr | 2.97% |

Recoverable spend by team size

Median modeled monthly waste scales sharply with headcount. The biggest relative jump is from the 6–15 bucket ($80/mo) to the 16–25 bucket ($5,120/mo), the inflection point where stack drift starts costing real money; the largest absolute increases come above 200 seats. Useful framing for “when to audit your stack” conversations.

| Team size | % of all stacks | Median monthly recoverable | Median monthly spend |
| --- | --- | --- | --- |
| 1-5 | 7.81% | $0/mo | $580/mo |
| 6-15 | 13.04% | $80/mo | $1,950/mo |
| 16-25 | 11.05% | $5,120/mo | $11,750/mo |
| 26-50 | 32.03% | $8,680/mo | $21,090/mo |
| 51-100 | 18.01% | $21,410/mo | $46,030/mo |
| 101-200 | 5.97% | $25,890/mo | $74,050/mo |
| 201-500 | 7.15% | $76,280/mo | $178,300/mo |
| 501-1000 | 1.61% | $130,630/mo | $407,360/mo |
| 1000+ | 3.33% | $239,890/mo | $751,620/mo |
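
The bucketed medians could be computed along the lines of the sketch below. The bucket edges follow the table above; the `StackRow` shape, the helper names, the sample rows, and the convention of averaging the two middle values for even-sized groups are assumptions.

```typescript
// Finite bucket upper bounds from the table; anything larger falls into "1000+".
const BUCKETS: { label: string; max: number }[] = [
  { label: "1-5", max: 5 },
  { label: "6-15", max: 15 },
  { label: "16-25", max: 25 },
  { label: "26-50", max: 50 },
  { label: "51-100", max: 100 },
  { label: "101-200", max: 200 },
  { label: "201-500", max: 500 },
  { label: "501-1000", max: 1000 },
];

function bucketFor(headcount: number): string {
  for (const bucket of BUCKETS) {
    if (headcount <= bucket.max) return bucket.label;
  }
  return "1000+";
}

// Median; even-sized inputs average the two middle values (an assumed convention).
function median(values: number[]): number {
  if (values.length === 0) return NaN;
  const sorted = values.slice().sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

// Hypothetical per-stack row: headcount plus modeled monthly recoverable spend in USD.
interface StackRow {
  headcount: number;
  recoverableMonthly: number;
}

function medianByBucket(rows: StackRow[]): Record<string, number> {
  const grouped = new Map<string, number[]>();
  for (const row of rows) {
    const label = bucketFor(row.headcount);
    const values = grouped.get(label) || [];
    values.push(row.recoverableMonthly);
    grouped.set(label, values);
  }
  const result: Record<string, number> = {};
  grouped.forEach((values, label) => {
    result[label] = median(values);
  });
  return result;
}

// Invented rows, not simulation output.
const medians = medianByBucket([
  { headcount: 4, recoverableMonthly: 0 },
  { headcount: 12, recoverableMonthly: 80 },
  { headcount: 12, recoverableMonthly: 120 },
  { headcount: 20, recoverableMonthly: 5000 },
]);

console.log(medians); // { '1-5': 0, '6-15': 100, '16-25': 5000 }
```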

Independent validation

Modeled stacks are how StackSwap measures the patterns. The State of GTM Engineering 2026 (OneGTM, n=228) is how the operators in those stacks describe themselves. The numbers below are theirs — included here to show that the things our engine scores for (overlap, consolidation, AI-readiness, fit-for-stage) match what the audience actually says they need.

Honest limitations

  1. Synthetic, not empirical. These are modeled stacks, not real customer scans. We're explicit about that everywhere the data is cited. Pre-revenue means no customer base; we'd rather model honestly than fabricate a customer count.
  2. Slack at 100% prevalence. Every archetype includes Slack as a core tool, so it shows up in 100% of stacks — that's an over-representation we know about. Doesn't affect overlap-pair statistics but inflates Slack's prevalence number.
  3. Operator-judged weights. The 12 archetype weights are estimated from operator pattern recognition, not external market research. We could be off by 5-10 percentage points on any given archetype's real-world prevalence.
  4. Deterministic recovery values. Each overlap pair has a fixed modeled annual recovery from the OVERLAPS table (e.g. Apollo.io + ZoomInfo = $2,400/yr modeled recovery, every detection). The p25-p75 tightness in our distributions reflects this engine determinism, not a bug.
  5. No business-context override. The engine flags overlap based on capability redundancy. In practice some teams run both tools for legitimate reasons (regulated industry, post-acquisition transition, specific motion). The modeled recovery is the upper bound, not the recommended action.

Reproducibility contract

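Same seed (SIM_SEED=42) + same archetype templates + same engine version = bit-identical aggregates.json. Anyone can verify the cited statistics by running the script themselves. The methodology_version field bumps when:

  1. The engine changes (new overlap pairs, updated cost data, new replacement logic).
  2. We add or rebalance archetypes based on operator feedback.
  3. We get real customer scan data and want to compare modeled vs. empirical.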

When the version bumps, all content pages automatically pick up the new numbers on next deploy — the citation helpers in lib/stack-simulation/citations.ts always read the latest aggregates.

How to cite this dataset

Journalists and researchers are welcome to cite these statistics. Below are ready-to-paste citation formats. Please link back to this page so readers can verify the methodology.

Short citation (inline)

Source: StackSwap, "100,000 GTM Stack Simulation" (methodology v1.1.0, https://stackswap.ai/methodology).

Long citation (academic / report style)

StackSwap. (2026). 100,000 GTM Stack Simulation: Modeled tool prevalence, overlap, and recoverable spend across 12 operator archetypes (methodology v1.1.0, seed 42). Retrieved from https://stackswap.ai/methodology.

Example sentences (for journalists / writers)

Pre-built citation helpers for every overlap pair and tool are available in lib/stack-simulation/citations.ts: citationForOverlapPair() and citationForToolPrevalence() return formatted sentences for any tool or pair in the dataset.

FAQ

No, and that's the point. The 100,000 stacks are synthetic — generated by sampling realistic tool combinations from 12 operator archetypes, then run through the same scoring engine that powers the production StackScan tool at /stackscan. The aggregate statistics are real (derived from real engine runs on real cost data) but the inputs are modeled, not collected. We're explicit about this everywhere we cite the numbers because honesty about methodology beats fake customer counts.

Two reasons. First, we're pre-revenue with no real customer base — citing customer scans we don't have would be misleading. Second, even at scale a customer dataset would be biased toward customers who buy us (specific company sizes, motions, pain points). Synthetic stacks let us span the full operator distribution, including shapes a real customer pool would underweight. The methodology is reproducible: you can run the simulation yourself and get bit-identical outputs.

Each synthetic stack runs through `scanStack()` (lib/scanStack.ts in the open repo) — the same function called when a real user pastes their stack at /stackscan. It produces: current monthly spend (from the COSTS table in data/tools.json), optimized monthly spend (after applying overlap detection + replacement recommendations), per-tool verdicts (KEEP/REPLACE/REMOVE), and modeled annual recovery. We aggregate across all 100k runs to produce prevalence and recovery distributions per tool and per overlap pair.

It is a deterministic capability model (lib/ai-readiness.ts), not a usage metric. Each tool gets four 0-100 sub-scores: API access (30%), automation support (30%), data access (20%), and built-in AI (20%), set by which capability tier the tool falls into. A legacy tool scores ~20-30 per axis; a strong public API scores 90; an AI-native tool scores 95 on the AI axis. The per-tool total is the weighted blend; the stack score is the average across all tools. Bands: 0-30 Legacy, 31-60 AI-Capable, 61-80 Automation Ready, 81-100 AI-Native. The projected score on a StackScan report is the same average recomputed after the recommended swaps are applied. It measures what the stack makes possible, not what a given team has wired up. Full breakdown: stackswap.ai/methodology#ai-native-score.

Weights are estimated from operator population prevalence — not even-split. Mid-market sales-led (20%), growth-stage multi-channel (15%), early-stage founder-led (12%), dev-tools PLG (10%), enterprise RevOps (8%), and AI-native modern team (6%) are the primary buckets. We added late-stage enterprise (5%), post-acquisition tangled stack (3%), PLG + sales-assist hybrid (7%), and a few others for distribution coverage. Weights sum to 1.0 (asserted at script start).

The engine assigns a deterministic `save` value per overlap pair from the OVERLAPS array in data/tools.json. So every detection of "Apollo.io + ZoomInfo" gets the same modeled annual recovery ($2,400/yr in the current run). That tightness in the p25-p75 distribution is engine determinism showing through, not a bug. The detection rate (% of stacks containing the pair) varies by archetype mix — that's where the variance lives.

Clone the repo, install deps, then run `SIM_SEED=42 npm run simulate:100k`. Same seed + same archetypes + same engine = bit-identical aggregates.json. The run takes ~10 minutes on a modern laptop. Changing the engine or the archetype templates will produce different numbers; when that happens we bump `methodology_version` (currently 1.1.0) and refresh all cited content.

Honest list: (1) Slack appears in 100% of modeled stacks because every archetype includes it — that's an over-representation we know about. (2) The archetype distribution is operator-judged, not market-research-derived. We could be off by 5-10 points on any given archetype's weight. (3) Recovery numbers are deterministic per-overlap-pair (engine design choice). (4) The engine assumes overlap detection at face value; in practice some teams need both tools for legitimate reasons. None of these invalidate the citable patterns — but the numbers should be read as "modeled directional truth," not "empirical fact."

Yes. The dataset will refresh when (1) the engine changes (new overlap pairs, updated cost data, new replacement logic), (2) we add or rebalance archetypes based on operator feedback, or (3) we get real customer scan data and want to compare modeled vs. empirical. Each refresh bumps `methodology_version`. The citation helpers in `lib/stack-simulation/citations.ts` always read the latest aggregates — content automatically updates on next deploy.

No. Editorial decisions — what we recommend, how tools rank, what we write — are made independently of affiliate economics. See the Editorial firewall section above for the full policy. Short version: facts are separate from recommendations, non-affiliate competitors get equal-depth coverage, affiliate signal can only break ties between equivalent-fit tools, and we publish a Carveouts list of tools and categories we've actively declined to recommend or take money to promote (Hyperbound, deceptive-email-capture tools, cold-email-at-scale spam, review-site arbitrage). If you ever see a recommendation that looks affiliate-weighted rather than fit-weighted, email nick@stackswap.ai with the page URL — we treat that as a trust bug.

Canonical URL: https://stackswap.ai/methodology
Generated at: 2026-06-02T01:48:55.367Z