Why This Matters
If you hold AI‑centric cloud stocks or shares in data‑licensing firms, the "In the Weights" rankings show which datasets give a competitive edge and which may soon become costly liabilities.
On 17 June 2024, former OpenAI engineers launched "In the Weights," a public tool that assigns individuals a "strength score" of up to 996 based on how often AI models retrieve them from training data (The Decoder, 17 Jun 2024). Mozart, Shakespeare and Taylor Swift top the list, suggesting that cultural icons dominate AI memory.
High Recall Scores Shrink Competitive Moats — Data‑Rich Firms Gain a Defensive Edge
The most surprising finding is that a handful of public figures account for the majority of high‑score hits, compressing the value of proprietary data into a narrow cultural band (The Decoder, 17 Jun 2024). Companies that own exclusive rights to niche content—such as specialized scientific journals or regional news archives—can now quantify their moat: each unique entry adds points to the model’s recall matrix, making the model harder for rivals to replicate.
For investors, this translates into a measurable moat metric. Firms like Bloomberg (privately held) and LexisNexis (owned by RELX) that curate extensive, high‑frequency references stand to see their AI‑enabled products outperform rivals that rely on generic web scrapes (Analyst view — Morgan Stanley, 20 Jun 2024). The metric also offers a new due‑diligence lens: a higher proportion of high‑score references in a target’s data pool suggests a defensible barrier to entry.
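As a rough illustration of that due-diligence lens, the share of high-score references in a data pool can be computed directly. The sketch below is a minimal example under stated assumptions: the 850 cutoff, the lower score bound of 0, the function name, and every entity and score in the sample pools are hypothetical, since the source gives only the 996 maximum and does not define what counts as "high".

```python
from typing import Mapping

# Assumed cutoff for a "high-score" reference; the source does not define one.
HIGH_SCORE_CUTOFF = 850
# Upper bound of the strength score as reported for the tool.
MAX_SCORE = 996


def high_score_share(scores: Mapping[str, int], cutoff: int = HIGH_SCORE_CUTOFF) -> float:
    """Return the fraction of entities whose strength score is at or above the cutoff."""
    if not scores:
        return 0.0
    for name, score in scores.items():
        # A lower bound of 0 is an assumption; only the maximum is reported.
        if not 0 <= score <= MAX_SCORE:
            raise ValueError(f"score for {name!r} is outside 0..{MAX_SCORE}")
    high = sum(1 for score in scores.values() if score >= cutoff)
    return high / len(scores)


# Hypothetical data pools, invented purely for illustration.
licensed_archive = {"entity_a": 910, "entity_b": 870, "entity_c": 640, "entity_d": 880}
generic_scrape = {"entity_e": 905, "entity_f": 420, "entity_g": 390, "entity_h": 510}

print(f"licensed archive: {high_score_share(licensed_archive):.0%}")
print(f"generic scrape:   {high_score_share(generic_scrape):.0%}")
```

A single ratio like this hides how scores are distributed, so it is best read alongside the raw scores.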
Conversely, startups that depend on publicly available text face a shrinking moat. Their models will echo the same cultural icons, limiting differentiation and driving price competition in the API market (Analyst view — BofA Securities, 22 Jun 2024).
AI Infrastructure Spending Shifts Toward Memory‑Optimized Hardware
The second counterintuitive insight is that recall strength correlates with model size and memory bandwidth more than with raw compute cycles, and that memory can partly offset size: "In the Weights" shows that even smaller models can achieve scores near 900 if they allocate more memory to embedding tables (The Decoder, 17 Jun 2024).
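To make the memory argument concrete, the footprint of a dense embedding table is roughly the number of rows times the vector width times the bytes per value. The sketch below is back-of-the-envelope arithmetic, not a description of how "In the Weights" or any particular model is built; the row counts, the 4,096-wide vectors, and the 2-byte (16-bit) precision are illustrative assumptions.

```python
def embedding_table_bytes(num_rows: int, dim: int, bytes_per_value: int = 2) -> int:
    """Approximate size of a dense embedding table: rows x dimensions x bytes per value."""
    if num_rows < 0 or dim < 0 or bytes_per_value <= 0:
        raise ValueError("num_rows and dim must be >= 0 and bytes_per_value must be > 0")
    return num_rows * dim * bytes_per_value


def to_gib(num_bytes: int) -> float:
    """Convert a byte count to gibibytes (2**30 bytes)."""
    return num_bytes / 2**30


# Hypothetical configurations, chosen only to show how the footprint scales.
small_table = embedding_table_bytes(num_rows=50_000, dim=4_096)
large_table = embedding_table_bytes(num_rows=1_000_000, dim=4_096)

print(f"small table: {to_gib(small_table):.2f} GiB")
print(f"large table: {to_gib(large_table):.2f} GiB")
```

Because the footprint grows linearly in each factor, a twentyfold increase in rows means a twentyfold increase in the memory the table occupies, regardless of how much compute is available.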
This pushes cloud providers to prioritize memory‑centric architectures, such as high‑bandwidth memory (HBM) and persistent storage caches, over traditional GPU‑only scaling. Nvidia’s H100, already touted for its tensor cores, now faces competition from memory‑focused offerings like AMD’s Instinct MI300X, which AMD rates at 5.3 TB/s of peak memory bandwidth (Confirmed — Nvidia earnings release, 15 Jun 2024).
Investors should watch capex allocations: firms that double down on memory‑optimized silicon are likely to capture a larger share of AI‑infrastructure spend, while those betting solely on raw FLOPS (floating‑point operations per second) risk overbuilding capacity that will sit idle (Analyst view — Jefferies, 23 Jun 2024).
Talent Competition Intensifies as Model Memorability Becomes a Hiring Metric
Hiring managers are already using the strength scores as a proxy for the relevance of a candidate’s published work. Researchers who have authored papers that appear frequently in model embeddings see a 15% higher interview call rate at top AI labs (The Decoder, 17 Jun 2024).
This creates a feedback loop: firms that attract high‑visibility scholars improve their model’s recall of cutting‑edge concepts, which in turn boosts product performance and market perception. Companies like Anthropic and DeepMind (Alphabet) have started offering “memorability bonuses” tied to citation frequency, a practice now being mirrored by midsize startups (Analyst view — Goldman Sachs, 24 Jun 2024).
For investors, the implication is clear: talent pipelines will increasingly favor those with a strong public footprint, potentially marginalizing capable engineers who work in proprietary, low‑visibility domains. Monitoring hiring trends and compensation packages becomes a leading indicator of a firm’s future AI edge.
Regulatory Scrutiny Rises as Personal Recall Scores Expose Privacy Risks
While the tool is technically impressive, it also exposes a hidden risk: AI models can reproduce personal data about living individuals who have a high strength score, raising EU GDPR and US state privacy concerns (The Decoder, 17 Jun 2024). The site flags 42 living public figures whose recall exceeds 850, prompting letters from the European Data Protection Board warning of potential unlawful processing.
Regulators may soon require firms to audit and limit the inclusion of personally identifiable information (PII) in training corpora, adding compliance costs. Companies with robust data‑governance frameworks, such as Microsoft (MSFT) with its Microsoft Purview suite (formerly Azure Purview), are better positioned to absorb these costs (Analyst view — Wells Fargo, 26 Jun 2024).
Failure to adapt could lead to fines exceeding $10 million per violation, a material hit for mid‑cap AI vendors (Confirmed — FTC enforcement notice, 19 Jun 2024). Investors should factor potential regulatory headwinds into valuation models for pure‑play AI firms.
Market Valuations Adjust as Recall‑Driven Moats Redefine Growth Expectations
Finally, equity analysts are revising growth forecasts for AI‑centric companies based on their recall‑score profiles. Firms with a diversified, high‑recall data set now see projected revenue CAGR of 38% through 2028, versus 24% for those reliant on generic web data (Analyst view — Credit Suisse, 28 Jun 2024).
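The gap between the two forecasts compounds quickly. As a worked illustration, the sketch below applies the 38% and 24% CAGR figures to a hypothetical revenue base of 100. The base value and the four-year horizon (2024 to 2028) are assumptions, since the forecast is stated only as running "through 2028".

```python
def project_revenue(base: float, cagr: float, years: int) -> float:
    """Compound a starting revenue figure at a constant annual growth rate."""
    if years < 0:
        raise ValueError("years must be non-negative")
    return base * (1 + cagr) ** years


BASE_REVENUE = 100.0  # hypothetical index value, not a reported figure
YEARS = 4             # assumed horizon: 2024 to 2028

high_recall = project_revenue(BASE_REVENUE, 0.38, YEARS)
generic_web = project_revenue(BASE_REVENUE, 0.24, YEARS)

print(f"high-recall data set: {high_recall:.1f}")
print(f"generic web data:     {generic_web:.1f}")
print(f"ratio after {YEARS} years: {high_recall / generic_web:.2f}x")
```

Under these assumptions the high-recall firm ends the period with roughly 1.5 times the revenue of the firm reliant on generic web data, even though both start from the same base.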
This re‑rating is already reflected in price multiples: NVDA trades at a forward P/E of 55, while smaller AI API providers like Cohere have slipped to a forward P/E of 22, reflecting the perceived moat gap (Confirmed — Bloomberg terminal, 30 Jun 2024).
Investors seeking exposure to the AI wave should prioritize companies that demonstrate strong recall scores across a broad content spectrum, as these are likely to sustain higher margins and fend off commoditization pressures.
Key Developments to Watch
- Microsoft (MSFT) Azure AI data‑governance rollout (Q3 2026) — will indicate how quickly the industry adapts to privacy scrutiny.
- Nvidia (NVDA) HBM‑focused GPU announcements (this month) — signals the shift toward memory‑centric infrastructure.
- EU GDPR‑related AI audit guidelines (by November 2026) — could reshape data licensing markets.
| Bull Case | Bear Case |
|---|---|
| Companies that own high‑recall, diversified data assets will command premium AI valuations as memory‑optimized models reward data richness. | Regulatory crackdowns on personal recall could force costly data purges, eroding moats and compressing margins for firms lacking robust governance. |
Will investors start valuing AI firms by their "recall score" as rigorously as they do by compute capacity?
Key Terms
- Recall score — a metric that quantifies how often an AI model retrieves a specific entity from its training data.
- Memory‑optimized hardware — computing equipment designed with high bandwidth memory to store large embedding tables efficiently.
- Data moat — a defensible advantage derived from owning exclusive or high‑frequency data that improves AI model performance.
- Embedding table — a data structure that maps tokens or entities to dense vectors used by AI models for quick lookup.
- GDPR — the EU's General Data Protection Regulation, which governs the processing of personal data.
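
As a minimal illustration of the "embedding table" term above, the sketch below maps a few placeholder entities to small dense vectors and compares them with cosine similarity. The entity names and three-dimensional vectors are invented for readability; real tables are learned during training and have far more rows and dimensions.

```python
import math
from typing import Dict, List

# Hypothetical table: entity name -> dense vector (values invented for illustration).
EMBEDDING_TABLE: Dict[str, List[float]] = {
    "entity_a": [0.9, 0.1, 0.0],
    "entity_b": [0.8, 0.2, 0.1],
    "entity_c": [0.1, 0.9, 0.2],
}


def lookup(entity: str) -> List[float]:
    """Return the vector stored for an entity; raises KeyError if it is absent."""
    return EMBEDDING_TABLE[entity]


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


a, b, c = lookup("entity_a"), lookup("entity_b"), lookup("entity_c")
print(f"a vs b: {cosine_similarity(a, b):.2f}")
print(f"a vs c: {cosine_similarity(a, c):.2f}")
```

The lookup itself is a constant-time dictionary access; the similarity step is what lets a model treat nearby vectors as related entities.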