Evidence Notes for Human Intelligence Debt Proposed Study

Document Status — Preliminary Evidence Note · Companion to Paper 5

This is not a further conceptual paper in the Human Intelligence Gap series. It is the preliminary-evidence attachment that the measurement working paper (Paper 5, Measuring Human Intelligence Debt) points to: a record of the public signatures of architectural disorder we have already found, presented as preliminary supporting evidence rather than as a result. It documents a present-day deficit — that organisational architecturability is low now, and has resisted three decades of tooling — and a pattern of non-convergence. It does not claim to prove cumulative year-over-year decay, nor any causal or structural claim about Human Intelligence Debt; those are earned only by the experimental programme in Paper 5. Every third-party figure below is a benchmark with selection bias, flagged accordingly; series that count different populations with different methods are reported separately and never merged into a single number; analyst forecasts are marked as forecasts; and figures of contested origin are named as folklore.

A Flagged Evidence Map of Capability Multiplicity, Portfolio Answerability, and Three Decades of Non-Convergence — Compiled to Accompany the Human Intelligence Debt Measurement Programme

How to read the verification tags. Every figure carries a status tag, so it can be re-found, weighted, or challenged:

[confirmed] — located and corroborated in this research pass.
[vendor] — vendor benchmark; the publisher sells a solution to the problem it measures, on a cloud-forward, sprawl-prone sample (selection bias).
[consortium] — industry-consortium data; more neutral than single-vendor sources.
[academic] — peer-reviewed.
[secondary] — cited via a secondary source; primary not yet confirmed.
[needs-primary] — repeated widely but the primary citation is unconfirmed; verify or drop before external use.
[folklore] — contested or mis-attributed origin; cite as folklore, not evidence.
[firewalled] — real and rigorous, but about code architecture, a different level of analysis from enterprise architecturability; used as methodological precedent only.
[forecast] — an analyst prediction, not a measurement.

Prefatory Note — what this evidence can and cannot carry

The Human Intelligence Debt programme separates into three layers, and most of the risk in the work comes from blurring them: evidence (what public data shows), epistemics (what that licenses one to claim), and execution (how to measure the debt inside a real organisation). This note is the first layer, and it is disciplined about its own ceiling. The recurring failure mode — the one this note exists to prevent — is letting execution-layer causal claims borrow evidence-layer credibility.

So the single most important sentence in this document is a limit, not a claim: deficit is not decay. The public record establishes, durably and from many independent directions, that organisational architecturability is low today and has not converged despite successive waves of tooling. That is a present-day deficit. Whether architecturability has fallen over time — true cumulative decay — is a stronger, temporal claim that the public record only hints at, because almost every available series is a repeated snapshot of a changing population rather than a tracked measurement of the same estates. The honest verdict, stated up front and defended in the closing section, is that the evidence supports «flat failure despite rising investment and tooling», with two directions of genuine recent worsening, and falls short of proving accumulating entropy.

A second discipline runs throughout. The firms that publish most of this data — identity providers, SaaS-management vendors, ITAM and FinOps platforms — sell the cure for the disease they measure, to customers already predisposed to have it. Their figures are benchmarks, not population statistics, and they are flagged as such on every appearance. Where a more neutral source exists (an industry consortium, a peer-reviewed study, a government standard), it is used to triangulate. Numbers from incompatible panels are reported side by side, never averaged: they converge on a conclusion, not on a single figure.

Part 1 — The Three Indicators (the measurement spine)

Paper 5 proposes three external indicators of architectural entropy. This section gives them their evidence. Their suitability is unequal, and the framing has been corrected from earlier drafts in three specific ways that matter for credibility.

Indicator 1 — Market supply per capability (context only)

As the market for any capability matures, it produces more solutions able to serve that capability end to end, so the option to consolidate onto a capable platform becomes more available over time, not less. The cleanest public series is a single capability domain tracked on a stable method for fifteen years: the marketing-technology landscape, which recorded roughly 150 solutions in 2011 against more than fifteen thousand by the mid-2020s — a hundredfold growth in supply, with year-over-year growth flattening to a high-churn plateau in the most recent editions rather than indefinite expansion [vendor/industry; confirmed]. Adjacent counts tell the same story at smaller scale: customer-data platforms grew from roughly 17 (2016) to several hundred by the mid-2020s [vendor; confirmed].

This series carries two readings, and the discipline is to keep them apart. It shows that the supply of capable solutions has exploded, so the feasibility of consolidation has never been higher; and it shows that consolidation, repeatedly forecast by analysts across these years, did not occur at the vendor level — new entrants arrive faster than incumbents disappear. But a market-supply count measures products that exist, not whether one product covers a whole capability, and certainly not what any single organisation deploys. It measures optionality and fragmentation of the market, not end-to-end feasibility inside a firm. We therefore demote this indicator to context. The principled replacement — a minimum coherent solution set, the smallest set of solutions covering a threshold of weighted capability requirements — is a research contribution to be built, not data that exists, and must never become a precondition for publishing the rest.
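One way the minimum coherent solution set could eventually be operationalised is sketched below, purely as an illustration of the definition given above. It assumes each capability requirement carries a weight and each solution is described by the set of requirements it covers; the function name, the requirement names, the weights and the 80% threshold are all hypothetical. The sketch uses a greedy weighted-coverage heuristic, which yields a small covering set but does not guarantee the true minimum.

```python
from typing import Dict, List, Set


def greedy_coherent_set(
    requirements: Dict[str, float],
    solutions: Dict[str, Set[str]],
    threshold: float,
) -> List[str]:
    """Greedily pick solutions until `threshold` of total requirement weight is covered.

    Heuristic only: greedy selection approximates, but does not guarantee,
    the smallest possible covering set.
    """
    if not 0 < threshold <= 1:
        raise ValueError("threshold must be in (0, 1]")
    total = sum(requirements.values())
    if total <= 0:
        raise ValueError("requirements must carry positive total weight")

    target = threshold * total
    covered: Set[str] = set()
    chosen: List[str] = []
    covered_weight = 0.0

    while covered_weight < target - 1e-12:
        best, best_gain = None, 0.0
        for name in sorted(solutions):  # sorted for deterministic tie-breaking
            if name in chosen:
                continue
            # Weight of requirements this solution would newly cover.
            gain = sum(requirements.get(r, 0.0) for r in solutions[name] - covered)
            if gain > best_gain:
                best, best_gain = name, gain
        if best is None:
            raise ValueError("threshold cannot be reached with the given solutions")
        chosen.append(best)
        covered |= solutions[best]
        covered_weight += best_gain
    return chosen


# Hypothetical capability with four weighted requirements and four candidate solutions.
requirements = {"campaigns": 5.0, "email": 3.0, "analytics": 2.0, "consent": 1.0}
solutions = {
    "suite_a": {"campaigns", "email"},
    "tool_b": {"analytics"},
    "tool_c": {"email", "consent"},
    "suite_d": {"analytics", "consent"},
}
chosen = greedy_coherent_set(requirements, solutions, threshold=0.8)
print(chosen)
```

In this invented case two of the four solutions are enough to pass the threshold. The hard part of the research contribution is not this arithmetic but obtaining defensible requirement weights and coverage claims, which is why the indicator stays demoted to context.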

Indicator 2 — Capability multiplicity and functional redundancy (the strong core)

This is the corrected form of «applications deployed per capability». At the level of the whole estate, public panels make the trend visible. An identity-provider panel reports the average number of applications per company crossing one hundred for the first time in its 2025 edition, after sitting in the low nineties for several prior years, and records direct duplication — on the order of half of one suite’s customers also running a competing suite [vendor; confirmed]. A SaaS-management panel on a broader portfolio definition reports figures in the high two-hundreds to low three-hundreds, and — decisively for this indicator — observes per-capability redundancy directly: on the order of fourteen training applications, ten project-management tools, and nine or ten collaboration tools in a single organisation [vendor; confirmed]. A management platform reports roughly 371 applications per organisation, about half of them unmanaged [vendor].

What this indicator must not claim is monotonic growth. It does not hold: a managed-application panel shows two consecutive years of decline — roughly 130 (2022), 112 (2023), 106 (2024) — a clean refutation of any «counts always rise» story [vendor; confirmed]. The two panels that disagree on direction are measuring different populations with different definitions (a narrower managed set versus a broader portfolio including shadow IT), differing by a factor of about three in absolute count; they are not reconciled into one number, and their shapes agree on the conclusion that matters — sprawl peaked, briefly consolidated, and is re-inflating with AI tooling — without agreeing on a level.

The defensible claim, and the stronger one because the data support it, is therefore not growth but persistent multiplicity, persistent functional redundancy, and persistent churn surviving despite explicit consolidation effort. The right measures are not raw counts but: an Effective Application Count that weights applications by usage or workload share, so a niche tool and a system carrying ninety per cent of transactions are not treated as equivalent (an inverse-Herfindahl of usage share — relevant because one panel reports only about half of provisioned licences are actually used [vendor]); a Functional Redundancy Index; and a Justified Multiplicity Ratio allowing for the multiplicity a domain genuinely needs (resilience, regulation, regional autonomy, specialised function). A neutral government standard supports the underlying premise: the US Department of Defense Business Enterprise Architecture guidance treats multiple systems per capability, process or data domain as duplication, fragmentation and risk [secondary/needs-primary].
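As a minimal sketch of the Effective Application Count described above, the inverse-Herfindahl form can be computed from usage shares as follows. It assumes usage is available as one non-negative number per application (transactions, active users or workload hours); the function name and the example figures are illustrative and are not drawn from any panel.

```python
from typing import Mapping


def effective_application_count(usage: Mapping[str, float]) -> float:
    """Inverse-Herfindahl of usage share: 1 / sum(share_i ** 2).

    Equals the raw count when usage is spread evenly, and approaches 1
    when a single application carries almost all of the usage.
    """
    if any(v < 0 for v in usage.values()):
        raise ValueError("usage values must be non-negative")
    total = sum(usage.values())
    if total <= 0:
        raise ValueError("usage must contain at least one positive value")
    shares = [v / total for v in usage.values() if v > 0]
    return 1.0 / sum(s * s for s in shares)


# Hypothetical capability with ten tools: one carries 90% of the workload,
# the other nine share the remaining 10% equally.
usage = {"tool_0": 90.0, **{f"tool_{i}": 10.0 / 9 for i in range(1, 10)}}
raw_count = len(usage)
eac = effective_application_count(usage)
print(raw_count, round(eac, 2))
```

In this invented example ten deployed tools yield an effective count of about 1.23, which is the distinction a raw count hides. The Functional Redundancy Index and the Justified Multiplicity Ratio are not sketched here because this note does not fix their formulas.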

Indicator 3 — Portfolio answerability and retrieval cost (the strongest, the most exposed)

This is the corrected form of «TCO answerability», and it is the decisive indicator, because it would explain why the gap is so hard to close: the governance knowledge required to close it is itself missing. The supporting figures are directional and lean heavily on vendor sources, so it is the indicator that must be handled most defensively. Organisations underestimate their own estates — reportedly by about 1.7× on applications and about 3× on spend [vendor] — and the share of applications owned by central IT has fallen sharply on one panel, from roughly a quarter to roughly an eighth, with the great majority of applications and spend now sitting outside IT [vendor]. On the visibility question, an ITAM survey records the share of respondents reporting complete visibility falling across editions, from 47% to 43% [vendor; confirmed] — one of the few directly repeated cross-sections available, and therefore one of the strongest single hints of worsening, though still a repeated snapshot of a shifting sample. A cloud survey reports roughly 29% wasted cloud spend, with managing spend the top challenge for the large majority [vendor; confirmed]. On enterprise architecture specifically, the most-repeated analyst figure holds that only about a quarter of organisations get meaningful value from a configuration management database [needs-primary]; paired claims that a majority of manually entered CMDB data is inaccurate, or that most CMDB projects fail, circulate widely but lack a confirmable primary source and should be verified or dropped before external use [needs-primary].

Two corrections make this indicator defensible. First, on the construct: total cost of ownership is not complexity-invariant in the naive sense, so the question is not «can they produce a TCO number?» but «can they produce a reproducible cost under a standardised boundary and allocation method?» Ownership, likewise, is multidimensional — business, technical, budget, and the one that actually governs rationalisation, decommission authority — so the question is not whether a single generic owner name exists but whether explicit decommission accountability can be named. The instrument is a fixed discovery checklist applied to a sample of applications, recording the fraction answerable from governed sources, the fraction reconstructable only by archaeology, and the fraction unobtainable, scored as a Portfolio Answerability Index (completeness, provenance, freshness, consistency, retrieval-effort penalty) alongside a simple Time-to-Answer.
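The checklist instrument can be sketched in a few lines, under stated assumptions. The record layout, the status labels and the sample answers below are hypothetical. Only the three fractions and a median Time-to-Answer are computed; the composite Portfolio Answerability Index is left out because this note names its components but does not fix their weights.

```python
from collections import Counter
from dataclasses import dataclass
from statistics import median
from typing import Dict, List, Optional

# Assumed labels for the three outcomes named in the text.
STATUSES = ("governed", "archaeology", "unobtainable")


@dataclass(frozen=True)
class Answer:
    application: str
    question: str
    status: str                              # one of STATUSES
    hours_to_answer: Optional[float] = None  # None when no answer was obtained


def answerability_fractions(answers: List[Answer]) -> Dict[str, float]:
    """Share of checklist questions falling into each outcome."""
    if not answers:
        raise ValueError("at least one answer is required")
    counts = Counter(a.status for a in answers)
    unknown = set(counts) - set(STATUSES)
    if unknown:
        raise ValueError(f"unknown status labels: {sorted(unknown)}")
    n = len(answers)
    return {s: counts[s] / n for s in STATUSES}


def median_time_to_answer(answers: List[Answer]) -> Optional[float]:
    """Median hours over the questions that were answered at all."""
    times = [a.hours_to_answer for a in answers if a.hours_to_answer is not None]
    return median(times) if times else None


# Invented sample: two applications, three checklist questions each.
sample = [
    Answer("billing", "reproducible cost", "archaeology", 16.0),
    Answer("billing", "decommission authority", "unobtainable"),
    Answer("billing", "technical owner", "governed", 0.5),
    Answer("crm", "reproducible cost", "archaeology", 40.0),
    Answer("crm", "decommission authority", "archaeology", 6.0),
    Answer("crm", "technical owner", "governed", 1.0),
]
fractions = answerability_fractions(sample)
tta = median_time_to_answer(sample)
print(fractions, tta)
```

Keeping the three fractions separate, rather than collapsing them into one score too early, preserves the distinction between what is governed and what is merely recoverable by effort.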

Second, and decisively: this is a deficit, not a proven decay. The figures above establish that answerability is low now. The one repeated cross-section that worsened (visibility 47%→43%) is a population snapshot, not a tracked panel of the same estates; reverse causality compounds the danger, since sicker estates launch more modernisation programmes, so any cross-sectional association between «number of past programmes» and «poor answerability» cannot be read causally. Establishing genuine decay requires the longitudinal, archival or flow evidence that Paper 5’s velocity instrument and experimental ladder are built to supply. The most neutral longitudinal instrument in this space is an industry-consortium survey of financial-operations practice, which is the closest available approximation to a neutral repeated measure and should be weighted accordingly [consortium; confirmed].

Part 2 — The Wider Evidence Base (the non-convergence record)

The three indicators do not stand alone. They sit inside a thirty-year record across many technology waves, and that record is the single most persuasive thing this note has to offer — not because any one series proves decay, but because the same blocker recurs across every generation of tooling, which is exactly what an architectural-entropy account predicts. Each direction below is given with its figures, its meaning, and its caveat.

2.1 IT and software project outcomes

The longest-running series tracks the shares of IT projects rated successful, challenged and failed over three decades: from roughly 16% / 53% / 31% in 1994 to roughly 31% / 50% / 19% in the 2020s [confirmed]. Success roughly doubled from its 1994 low but then plateaued around a third despite Agile, DevOps and cloud, so the trend after that early gain is flat. The essential academic caveat is that a peer-reviewed critique (Eveleens & Verhoef, IEEE Software, 2010), applying the series’ own definitions to 5,457 forecasts across 1,211 real projects, found the figures to be a definitional artifact: success and challenge are defined by the accuracy of cost and time estimates, not by value delivered, so the flat trend partly reflects unchanged estimation bias rather than unchanged capability [academic]. A separate large study (McKinsey–Oxford, 2012, on more than 5,400 IT projects) corroborates the picture for large projects: on average about 45% over budget, 7% over time, and delivering about 56% less value than predicted, with about 17% of large projects classed as ruinous «black swans» [confirmed]. Meaning: a durable, decades-long failure plateau despite continuous tooling investment — strongly consistent with the thesis, but with a real definitional confound on the headline series.

2.2 Data warehouse, big data, and paradigm recurrence

Across data-platform waves the pattern repeats. Data-warehouse failure was put above 50% in the mid-2000s; a 2012 survey found roughly 41% considered successful [secondary]. Big-data project failure was estimated at around 60% in 2016, with a prominent analyst then calling that too conservative and putting the real rate «closer to 85%», later observing that nothing had changed [confirmed]. The most thesis-relevant signal here is not any single failure rate but the recurrence of paradigms: relational warehouse, data mart and operational data store, data lake (early 2010s), the «data swamp» backlash, the logical or modern warehouse (~2012), data fabric (~2019), data mesh (2019), and the lakehouse (~2020) — a new architecture roughly every five years, each promising to finally make data usable [confirmed]. Meaning: serial reinvention is itself evidence of non-convergence; if architecturability were improving, one would expect consolidation onto a stable paradigm, not a new one every half-decade.

2.3 AI, machine learning, and generative AI — the clearest recent signal

This is the area with the sharpest recent longitudinal movement, and it deserves careful sourcing because it is also the area most polluted by folklore. The widely-repeated claim that «87% of data science projects never reach production» traces to a 2019 conference panel, not to any rigorous survey, and should be treated as folklore, not evidence [folklore]. The defensible series is an analyst’s tracked production rate: roughly 53% of AI projects reaching production (2019), 54% (2021, n≈699), and 48% (2024, taking some eight months from prototype) — flat-to-declining despite a massive tooling build-out, and directly thesis-consistent, with a caveat that the survey year of the earliest figure is reported inconsistently [confirmed]. Two further sources sharpen the recent picture. A 2024 RAND study (65 practitioner interviews) found that more than 80% of AI projects fail — about twice the rate of non-AI IT — with data and infrastructure among the leading causes [academic-adjacent; confirmed]. And the strongest single piece of recent worsening: an enterprise survey (n≈1,006, late 2024) found the share of companies abandoning the majority of their AI initiatives surged from 17% to 42% year over year, with roughly 46% of projects scrapped between proof of concept and broad adoption [confirmed]. A widely-cited industry report (MIT NANDA, 2025) frames the same gap as roughly 95% of organisations seeing no measurable profit-and-loss impact from generative AI — high adoption, low transformation [confirmed; industry, not peer-reviewed]. Several analyst forecasts sit alongside these measurements and must be labelled as forecasts: that a substantial share of generative-AI projects will be abandoned after proof of concept, that a majority of AI projects will be abandoned for want of AI-ready data, and that a large share of agentic-AI projects will be cancelled within a few years [forecast]. 
Meaning: the production-rate plateau and the abandonment surge are the two cleanest recent worsening signals in the whole record — though both are recent enough that they could be artifacts of the generative-AI hype cycle rather than durable decay.

2.4 «Data isn’t ready» — the thirty-year root cause

The single most thesis-consistent qualitative finding is that the same root cause recurs across every wave. Failure was attributed to data not meeting requirements in the warehouse era; to integration, skills and governance in the big-data era; to a lack of adequate, well-managed training data in the machine-learning era; and to «AI-ready data», quality and integration in the generative-AI era. The often-cited «80% of data scientists’ time is spent on data preparation» traces to a 2014 newspaper figure of «50 to 80 percent», echoed by surveys across the following decade, and is best treated as a soft, partly artifactual figure whose persistence across a decade of tooling is the notable part, not its precision [confirmed origin; secondary critique]. Meaning: the identical blocker recurring across data warehousing, CRM, big data, data science and generative AI is the strongest non-convergence argument in the note — the same wall, hit by every new generation of tools.

2.5 ERP, digital transformation, dark data, data-quality cost, trust

ERP. Consultant reports put a large majority of implementations over timeline, with a five-year aggregate around 58% over budget, 65% over schedule, and roughly half achieving under half their expected benefits; a more recent edition shows some improvement on budget adherence against declining satisfaction — flat, with a vendor-selection caveat [vendor; confirmed].

Digital transformation. The familiar «70% fail» traces to a 2018 consultancy framing; the more precise version (a 2020 study) found only 30% of transformations met or exceeded their value target, 44% created partial value, and 26% limited value — drawn from work with about 70 companies plus an 825-executive survey, not «850 companies» as often paraphrased; a 2024 analysis put the shortfall figure near 88% [confirmed]. The stability is real but partly citation inertia, since definitions of «fail» drift across restatements — a changing-definition confound to flag.

Dark data. A 2019 survey put roughly 55% of organisational data «dark» — unused, or unknown — a share repeated through the mid-2020s, against estimates of data volume growing to hundreds of exabytes per day [vendor; confirmed]. The share is flat; the absolute quantity of un-modelled reality grows with total volume — a denominator effect that matters if architecturability is read as the compressibility of reality into a usable model.

Cost of poor data quality. A per-organisation figure rose across analyst restatements (roughly $8.8M, then $9.7M, then a canonical $12.9M/year in 2020, with later restatements near $15M), alongside a much-cited multi-trillion national figure [secondary/needs-primary]. This is the weakest evidence in the note and is flagged as such: it is most likely a measurement and restatement artifact across differing surveys and definitions, not a clean longitudinal series. It is included to be dismissed, not leaned on.

Trust in data. A vendor analytics survey reported confidence in data’s relevance down 18% and in its accuracy down 27% against a 2023 baseline; an earlier survey found only about a third of executives trust their data enough to act on it [vendor; confirmed]. Flat-to-worsening on trust, with a vendor caveat.

2.6 Technical debt — the direct self-reported worsening

The clearest self-reported worsening signal comes from technical-debt research: technical debt estimated at roughly 20–40% of the technology estate’s value (around 40% of IT balance sheets), 10–20% of new-product budgets diverted to servicing it, and — the load-bearing datapoint — about 60% of CIOs reporting that their technical debt had risen over the prior three years [confirmed]. About half of completed modernisation programmes reportedly failed to reduce technical debt, and a large majority of CIOs say existing-system complexity prevents next-generation investment [confirmed]; one developer survey put more than 40% of developer time on technical debt and bad code [secondary]. Meaning: a direct, self-reported worsening over a three-year window, from the people closest to the estate — recent, and consistent with accumulating debt, though again within the window where the generative-AI cycle could be inflating it.

Part 3 — The Firewalled Precedent: software-architecture decay

There is one place where architectural decay has already been measured rigorously, longitudinally, and in time-series data: the software-engineering literature on code architecture. Change entropy rises in files touched by many developers over a project’s history; architectural decay is predictable across successive versions of a system, including sudden decay events, and refactoring measurably reduces it [firewalled; needs-primary]. This is the best existing proof that architectural decay is a real, detectable, instrumentable phenomenon — and it supplies validated methods. But it must be firewalled: it concerns code architecture (modules, dependencies, commits), a different level of analysis from enterprise architecturability (systems, capabilities, governance). It is used here only as methodological precedent and plausibility — never as enterprise evidence, and never merged into the enterprise record.

Part 4 — Reading Decay From a Snapshot: the velocity instrument

Because almost every series above is a repeated snapshot of a changing population, the temporal question — has architecturability fallen? — is the one the public record answers least well. Paper 5’s velocity instrument is the response, and it belongs here because it is what would convert this note’s deficit into a defensible reading of direction. The move is to measure flows, not stocks: at a single instant, the rate at which order is destroyed against the rate at which it is created.

order_destruction_rate = (undocumented changes + new shadow assets + sole-owner departures + records going stale) per unit time
order_creation_rate = (assets onboarded to the model + records verified + assets decommissioned + handovers documented) per unit time

entropy_production ∝ order_destruction_rate − order_creation_rate ( > 0 ⇒ architecturability declining now )
fidelity_half_life ≈ ln(2) / verification_decay_rate (from the «last-verified» staleness distribution)

If destruction exceeds creation at the instant of measurement, architecturability is declining at that instant — no history required. The reading is honest about its own limit: it hints at instantaneous direction, not a proven multi-year trajectory, since flows can change. But it converts an almost unanswerable question into one obtainable from a single good audit, and there is a sharper point inside it that this note’s own difficulties demonstrate: the inability to reconstruct the past is itself a reading of decay. An orderly estate keeps its records; an estate that cannot tell you its own history is already telling you something. The reason this note can document a deficit but not a decay is, in part, the very phenomenon it is trying to measure.
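The flow reading above can be sketched as a small calculation. This is an illustration under stated assumptions, not the Paper 5 instrument: the field names, the 30-day window and every count are invented, and the half-life step assumes that «last-verified» ages follow an exponential distribution, so that the decay rate can be estimated as one over the mean age. That modelling choice is ours and is not specified in this note.

```python
import math
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class FlowCounts:
    # Order-destroying events observed in the audit window.
    undocumented_changes: int
    new_shadow_assets: int
    sole_owner_departures: int
    records_gone_stale: int
    # Order-creating events observed in the same window.
    assets_onboarded: int
    records_verified: int
    assets_decommissioned: int
    handovers_documented: int


def flow_rates(c: FlowCounts, window_days: float) -> Tuple[float, float]:
    """Return (order_destruction_rate, order_creation_rate) in events per day."""
    if window_days <= 0:
        raise ValueError("window_days must be positive")
    destroyed = (c.undocumented_changes + c.new_shadow_assets
                 + c.sole_owner_departures + c.records_gone_stale)
    created = (c.assets_onboarded + c.records_verified
               + c.assets_decommissioned + c.handovers_documented)
    return destroyed / window_days, created / window_days


def decay_rate_from_staleness(ages_days: Sequence[float]) -> float:
    """Assumed exponential model: rate estimated as 1 / mean age."""
    if not ages_days or sum(ages_days) <= 0:
        raise ValueError("need at least one positive age")
    return len(ages_days) / sum(ages_days)


def fidelity_half_life(verification_decay_rate: float) -> float:
    """fidelity_half_life = ln(2) / verification_decay_rate."""
    if verification_decay_rate <= 0:
        raise ValueError("decay rate must be positive")
    return math.log(2) / verification_decay_rate


# Invented 30-day audit window.
counts = FlowCounts(12, 5, 1, 42, 8, 25, 3, 9)
destruction, creation = flow_rates(counts, window_days=30)
net = destruction - creation  # sign only; the proportionality constant is unspecified
half_life = fidelity_half_life(decay_rate_from_staleness([30, 90, 180, 420]))
print(destruction, creation, net, round(half_life, 1))
```

In this invented window destruction runs at 2.0 events per day against 1.5 for creation, so the net sign is positive, and the assumed staleness ages give a half-life of roughly 125 days. Treating every event type as equally weighted is a simplification; a field instrument would need to justify any weighting it applies.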

Part 5 — The Ledger: established, contested, open

To keep the layers honest, the evidence sorts into three bins.

Established (publicly defensible now)

Enterprise software estates are large — from around a hundred identity-connected applications to several hundred in broader portfolios.
Functional redundancy per capability is directly observable — on the order of ten or more applications for several common functions.
Purchasing and ownership are increasingly decentralised, outside central IT.
Organisations systematically underestimate their own estates (applications and, more sharply, spend).
Visibility remains weak despite dedicated portfolio, ITAM, SaaS-management and FinOps tooling, and has worsened in at least one repeated cross-section.
Application count alone is not the whole problem — some estates are consolidating even as spend, churn and visibility problems persist.

Contested or refuted (claims that over-reach)

That applications per capability rise monotonically — refuted by the consecutive-decline panel.
That one application per capability is generally optimal — it varies by domain, and justified multiplicity is real.
That per-application answerability has decayed continuously over decades — the evidence shows a deficit, not a tracked decay.
That more modernisation programmes cause worse answerability — confounded by reverse causality (sick estates launch more programmes).
That Human Intelligence Debt is demonstrably growing as a population-level quantity — not established by this record.

Open (the research frontier — Paper 5’s territory)

The value of the recovery coefficient ρ and its dynamics (hysteresis) — field-only, unproven, the keystone empirical question.
A neutral, reproducible, longitudinal measure of architecturability — it does not exist publicly; building it (or the velocity instrument) may be the programme’s single most original contribution.
The oversight threshold (the U-shaped relationship between net oversight impact and coupling) that bridges this series to the AI Operational Integrity Architecture series.
Sector-specific multiplicity, answerability and decay profiles.

Part 6 — The Honest Verdict

Taken together, the public and historical record provides a credible hint — not proof — that the situation is at least not improving and may be worsening. Non-convergence despite three decades of investment is well supported: the failure plateaus, the recurring «data isn’t ready» blocker, and the five-yearly paradigm churn are mutually reinforcing and drawn from many independent directions. Genuine recent worsening is visible clearly in only two places — the year-over-year surge in AI-project abandonment, and the self-reported rise in technical debt — and both are recent enough to be confounded by the generative-AI hype cycle. Cumulative decay of architecturability itself is hinted at, not demonstrated, because no public series cleanly tracks the same estates over time.

Two further honesties are owed. First, the convergence of sources is partly an artifact of their overlap: much of this record draws on a similar pool of vendors and consultancies, so their agreement reflects one shared interpretation within that pool rather than independent confirmations of the world. Second, real confounds run throughout — rising ambition and scope (the target is harder, so constant failure may mask improving execution), genuinely increasing complexity, survivorship bias (failed firms exit, so surveys of survivors understate failure), and drifting definitions of «success» and «failure» across editions.

And there is one direction that cuts against the thesis, which intellectual honesty requires stating plainly: the brief SaaS consolidation of the early 2020s and the modest ERP budget-adherence gains show that focused ordering effort can work. That is consistent with the programme’s own premise — that entropy is reversible with sustained ordering effort — but it also demonstrates that decay is neither monotonic nor inevitable. The thesis is supported at the level of «non-convergence despite investment»; it is not yet established at the level of «measured cumulative decay». Closing that gap is precisely the job of Paper 5.

Finally, the benchmarks that would change this verdict, stated so the note can be held to account: if tracked AI production rates rebound above roughly 60% in future surveys, tooling is winning and the thesis weakens; if abandonment rates fall back toward earlier levels, the recent spike was a hype artifact, not decay; and if a purpose-built repeated measure of architecture-model fidelity shows fidelity rising over time, the architectural-entropy reading is directly disconfirmed. The framework names its own falsifiers — which is the property this evidence note exists to protect.

Sources and verification status

Figures are drawn from a public-data scan and a longitudinal evidence review as of early 2026, to be refreshed and primary-sourced before any external publication. They are grouped by direction; tags as defined at the head of this note.

IT/software project outcomes. Standish Group CHAOS series (1994–2020s) [confirmed]; Eveleens & Verhoef, «The Rise and Fall of the Chaos Report Figures», IEEE Software 27(1), 2010 [academic]; McKinsey–Oxford (2012), 5,400+ IT projects [confirmed].
Data warehouse / big data / paradigms. Analyst data-warehouse and big-data failure estimates (mid-2000s–2017) [secondary/confirmed]; the data-architecture paradigm timeline (data lake → logical DW → data fabric → data mesh → lakehouse) [confirmed].
AI / ML / GenAI. The «87% never reach production» panel claim [folklore]; analyst AI production-rate series 53%→54%→48% [confirmed]; analyst abandonment and AI-ready-data forecasts [forecast]; RAND, «The Root Causes of Failure for AI Projects» (2024), 65 interviews [academic-adjacent]; MIT NANDA, «The GenAI Divide» (2025) [confirmed; industry]; S&P Global «Voice of the Enterprise: AI & ML» (n≈1,006, late 2024), abandonment 17%→42% [confirmed]; Informatica CDO Insights (2025) [vendor].
«Data isn’t ready» persistence. The «50–80% data-prep» lineage (2014 onward) and its partial measurement-artifact critique [confirmed origin; secondary critique].
ERP. Annual consultant ERP reports [vendor; confirmed].
Digital transformation. The «70% fail» framing (2018) and the precise 30/44/26 breakdown (2020, ~70 companies + 825-executive survey) [confirmed]; later ~88%-shortfall analysis [confirmed].
Dark data. The ~55%-dark survey (2019, repeated through the mid-2020s); exabyte-per-day volume estimates [vendor; confirmed].
Cost of poor data quality. The $8.8M→$12.9M→~$15M analyst series and the multi-trillion national figure [secondary/needs-primary] — weakest evidence; likely a restatement artifact.
Trust / answerability. Vendor analytics-confidence survey (down 18% relevance, 27% accuracy vs 2023) [vendor; confirmed]; «one-third trust their data» (2019) [secondary].
Enterprise architecture / CMDB. «~25% get meaningful CMDB value» [needs-primary]; «~60% manual CMDB data inaccurate / ~80% CMDB projects fail» [needs-primary]; EA current/target-state documentation share [secondary/needs-primary]; US DoD Business Enterprise Architecture guidance on capability duplication [secondary/needs-primary].
SaaS sprawl. Identity-provider panel (≈101 apps/company, 2025; cross-suite duplication) [vendor; confirmed]; SaaS-management panel (high-200s to ~305; per-capability redundancy 14.2 / 9.9 / 9.5; underestimate ≈1.7× apps, ≈3× spend; IT-owned 23%→13%; ~49–54% licence use; ~84% apps / ~74% spend outside IT) [vendor; confirmed]; managed-app panel (130→112→106) [vendor; confirmed]; management-platform panel (~371 apps/org, ~half unmanaged) [vendor].
ITAM / cloud / FinOps. ITAM visibility series (complete visibility 47%→43%) and cloud-waste survey (~29%) [vendor; confirmed]; industry-consortium financial-operations survey [consortium; confirmed] — the most neutral longitudinal source.
Technical debt. Consultancy technical-debt research (20–40% of estate value; 60% of CIOs report a 3-year rise; ~half of modernisation programmes failed to reduce it; multi-trillion projected productivity loss) [confirmed]; developer-time-on-debt survey [secondary].
Software-architecture decay (firewalled). Change-entropy and architectural-decay-prediction literature [firewalled; needs-primary] — code architecture, methodological precedent only.
Academic context. Solow (1987); Brynjolfsson (1993); Brynjolfsson & Hitt (2000); Acemoglu & Restrepo (2019, 2022); David (1990); Bainbridge (1983); Carr (2014); Strauss & Star (articulation work) [academic].

Status of the whole: a synthesis of the public evidence as it stands. All vendor figures are to be refreshed; all needs-primary, folklore and firewalled items are to be verified, down-weighted or removed before external publication. The answerability-decay claim and the recovery coefficient ρ are not evidenced here — they await the field measurement specified in Paper 5.

This work is produced by the AI Integrity Management working group at The Integral Management Society, a Swiss non-profit association bringing together senior specialists from adaptive systems, complex systems, artificial intelligence, mission-critical operations and governance. It is a preliminary-evidence attachment to the Human Intelligence Debt measurement programme and documents a present-day deficit and non-convergence, not proven decay.