1. AutoAgent: The first library for self-optimizing agent harnesses — Kevin Gu

  • Why read: Discover how self-optimizing agents are now outperforming manual prompt engineering and tool design.
  • Summary: AutoAgent is an open-source library that uses a "meta-agent" to autonomously improve task agents by tweaking prompts and adding tools through thousands of parallel simulations. It recently took the #1 spot on SpreadsheetBench and TerminalBench, beating out every other entry that was hand-engineered by humans. This marks a shift from manual "grid search" prompting to agentic self-optimization where the meta-agent reads failure traces and keeps only the improvements. The core discovery is that agents are often better at "seeing like an agent" and designing their own action spaces than human developers are.
  • Link: https://twitter.com/kevingu/status/2039843234760073341/?rw_tt_thread=True
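The "read failure traces, keep only the improvements" loop described above is essentially greedy hill climbing over agent configurations. A minimal sketch, assuming hypothetical `mutate` and `evaluate` callables standing in for the meta-agent's rewrite step and the parallel simulations (none of these names come from AutoAgent itself):

```python
def optimize_agent(base_prompt, mutate, evaluate, rounds=10, sims=8):
    """Greedy self-optimization in the spirit of a meta-agent:
    propose a variant, score it across simulated tasks, and keep
    it only if it beats the incumbent."""
    best, best_score = base_prompt, evaluate(base_prompt, sims)
    for _ in range(rounds):
        candidate = mutate(best)           # e.g. rewrite the prompt, add a tool
        score = evaluate(candidate, sims)  # mean success over `sims` simulations
        if score > best_score:             # keep only improvements
            best, best_score = candidate, score
    return best, best_score
```

The real system presumably runs thousands of simulations in parallel and mutates tools as well as prompts; the control flow, though, is this simple accept/reject loop.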

2. LLM Knowledge Bases — Andrej Karpathy

  • Why read: Learn a high-signal workflow for personal research that prioritizes "compiled knowledge" over transient chat sessions.
  • Summary: Andrej Karpathy details his current workflow for building personal research wikis where an LLM acts as the primary compiler rather than just a chatbot. He indexes raw documents into markdown, uses an LLM to categorize concepts and write articles, and then views the entire structured output in Obsidian. This approach replaces "fancy RAG" with a simple directory of .md files that the LLM can navigate and "lint" for inconsistencies. By rendering answers as markdown files or slides rather than text blocks, his research "adds up" over time into a durable asset.
  • Link: https://twitter.com/karpathy/status/2039805659525644595/?rw_tt_thread=True
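One reason a plain directory of .md files beats "fancy RAG" here is that consistency checks become trivial scripts. A sketch of one such "lint" pass, assuming Obsidian-style `[[wikilinks]]` (the function name and check are illustrative, not Karpathy's actual tooling):

```python
import re
from pathlib import Path

WIKILINK = re.compile(r"\[\[([^\]|#]+)")  # Obsidian-style [[Note]] links

def lint_wiki(root):
    """Flag [[wikilinks]] that point at notes which don't exist yet --
    one cheap consistency check over a flat directory of .md files."""
    root = Path(root)
    notes = {p.stem for p in root.rglob("*.md")}
    broken = []
    for page in root.rglob("*.md"):
        for target in WIKILINK.findall(page.read_text()):
            if target.strip() not in notes:
                broken.append((page.name, target.strip()))
    return broken
```

The same pattern extends to other lints (orphan notes, stale dates, duplicate titles), which is exactly the kind of maintenance an LLM compiler can run over the whole wiki.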

3. Google's 20-Year Secret Is Now Available to Every Enterprise — Jaya Gupta

  • Why read: Understand why the next era of enterprise value will be built on "decision traces" rather than just behavioral clicks.
  • Summary: While consumer giants like Netflix and TikTok grew by instrumenting every user click, B2B companies have historically lacked such compounding loops because enterprise decisions (negotiations, approvals) were hidden. AI is changing this by making "decision traces"—the reasoning behind why a discount was given or a clause was redlined—structured and queryable for the first time. Companies that treat these traces as disposable exhaust will lose to those who treat them as "organizational intelligence." The future of SaaS lies in capturing the missing layer between event and outcome to create a business context graph.
  • Link: https://twitter.com/JayaGup10/status/2039737982576636294/?rw_tt_thread=True
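To make the "structured and queryable" claim concrete: a decision trace is just the reasoning layer between an event and its outcome, captured as a record. A hypothetical schema (field names are illustrative, not from the thread):

```python
from dataclasses import dataclass

@dataclass
class DecisionTrace:
    """One 'decision trace': the reasoning between an event and its
    outcome -- the layer behavioral clicks never captured."""
    event: str      # what happened, e.g. "discount_requested"
    actor: str      # who decided
    reasoning: str  # why the decision went the way it did
    outcome: str    # what was decided

def traces_for(traces, event):
    """Past reasoning becomes queryable: all traces for an event type."""
    return [t for t in traces if t.event == event]
```

Once decisions live in records like this, aggregating them into a business context graph is an indexing problem rather than an archaeology problem.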

4. Lessons from 5 Billion Tokens — Teddy Riker

  • Why read: A sobering look at why AI "productivity" doesn't actually clear your task list—it just raises the quality bar.
  • Summary: After using 5 billion Anthropic tokens at Ramp, Riker observes that he still completes the same number of priorities each week because the "quality floor" has radically shifted. Product specs have evolved from text docs to full working branches in QA, and strategy memos are now stress-tested against AI stakeholder simulations before reaching humans. The result is higher-fidelity output, but human verification remains the ultimate bottleneck. Operators should focus on this "human-in-the-loop" limit rather than simply trying to expand scope indefinitely.
  • Link: https://twitter.com/teddy_riker/status/2039649121812840796/?rw_tt_thread=True

5. Data Architectures For Tracing Harnesses & Agents — Aparna Dhinakaran

  • Why read: Infrastructure advice on moving AI data from 30-day SaaS "exhaust" to durable, queryable business assets.
  • Summary: Most companies make the mistake of treating agent traces like infrastructure logs (disposable after 15-30 days), but AI data is institutional knowledge. Dhinakaran argues for a unified data layer where traces, evaluations, and human annotations live in open formats like Parquet/Iceberg within your own data lake. This prevents "divergent copies" and allows for long-term pattern recognition that a 30-day retention window misses. By augmenting traces in place, organizations build a "business context graph" that serves as the training signal for future iterations.
  • Link: https://twitter.com/aparnadhinak/status/2039724128266334257/?rw_tt_thread=True
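The "augment traces in place" idea is the key design choice: evals and human annotations attach to the original trace record instead of spawning divergent copies. A toy in-memory sketch of that unified layer (class and method names are hypothetical; a real system would persist this as Parquet/Iceberg in your own data lake, as the thread argues):

```python
class TraceStore:
    """Toy unified trace layer: traces, evals, and human annotations
    all live on one record, augmented in place."""
    def __init__(self):
        self._rows = {}

    def log(self, trace_id, spans):
        self._rows[trace_id] = {"spans": spans, "evals": [], "annotations": []}

    def attach_eval(self, trace_id, score):
        self._rows[trace_id]["evals"].append(score)      # no divergent copy

    def annotate(self, trace_id, note):
        self._rows[trace_id]["annotations"].append(note)

    def training_rows(self, min_score):
        """Long-horizon pattern mining a 30-day retention window would
        miss: pull every annotated trace above a quality bar."""
        return [r for r in self._rows.values()
                if r["evals"] and max(r["evals"]) >= min_score]
```

Because everything keys off the same `trace_id`, the store doubles as the training signal for future iterations rather than disposable exhaust.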

6. The Great Convergence — Nicholas Charriere

  • Why read: A strategic analysis of why every major tech player (Linear, OpenAI, Anthropic) is converging on the same "agent harness" product shape.
  • Summary: Tech is moving toward a "general harness" model where software is defined by a goal, a set of tools, and a model loop. Breakthroughs like Claude Code have proven that a smart looping agent generalizes across almost any computer-based task if the tools are right. This convergence marks the end of "feature-led" moats, as autonomy levels become a mere configuration setting. Because these harnesses are just code, they possess the unique ability to reflect on and improve themselves, scaling along a dimension of indefinite research progress.
  • Link: https://twitter.com/nichochar/status/2039739581772554549/?rw_tt_thread=True
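The converged product shape described above (a goal, a set of tools, and a model loop) fits in a few lines. A minimal sketch under the assumption that `model` is any callable mapping the transcript so far to either a tool call or a final answer (the function and its protocol are illustrative, not any vendor's actual harness):

```python
def run_harness(goal, tools, model, max_steps=10):
    """Minimal 'general harness': loop a model over a goal and a tool
    registry until it signals completion. `model` returns either
    ("call", tool_name, arg) or ("done", answer)."""
    transcript = [("goal", goal)]
    for _ in range(max_steps):
        action = model(transcript)
        if action[0] == "done":
            return action[1]
        _, name, arg = action
        result = tools[name](arg)       # tool use is just a registry lookup
        transcript.append((name, result))
    return None                         # autonomy budget exhausted
```

Note how autonomy level really is just configuration here (`max_steps`, the contents of `tools`), which is the essay's point about feature-led moats dissolving.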

7. An AI state of the union: Simon Willison — Lenny's Newsletter

  • Why read: Insights from a trusted developer on why November 2025 was the definitive inflection point for AI coding.
  • Summary: Simon Willison argues that November 2025 was the moment agentic engineering crossed from "mostly works" to "actually works," fundamentally changing how builders operate. He reveals that he now writes 95% of his code from his phone, leading to intense "mental exhaustion" by 11 a.m. due to the high-bandwidth decision-making required. The interview highlights that mid-career engineers are more at risk than juniors because the "mechanic" aspect of the job is being swallowed by agents. He emphasizes that "agentic engineering" is now a primary skill set, not a peripheral one.
  • Link: https://www.lennysnewsletter.com/p/an-ai-state-of-the-union

8. Moonlake: Causal World Models should be Multimodal, Interactive, and Efficient — Latent.Space

  • Why read: Deep dive into the "frontier after LLMs": long-running, interactive world models built using game engines.
  • Summary: The Latent Space podcast explores a new approach to World Modeling that uses game engines to bootstrap multiplayer, interactive environments for agents. Unlike Google's Genie 3, which suffers from "terrain clipping" and non-interactivity, these models focus on physics and causal reasoning. The discussion suggests that the next generation of AI will be trained not just on text, but on the ability to interact with and predict physical outcomes in simulated worlds. This has massive implications for robotics and autonomous systems that require a "common sense" understanding of reality.
  • Link: https://www.latent.space/p/moonlake

9. Bad Analogies: Not Every Money-Burning Company is Amazon — Not Boring

  • Why read: A critical economic perspective on why OpenAI's massive burn shouldn't be blindly compared to Jeff Bezos's strategy.
  • Summary: Packy McCormick deconstructs the common defense that OpenAI's lack of profitability is "just like early Amazon." He argues that Bezos's "calculated loss" was built on a capital-efficient flywheel (AWS/Retail) that generated massive cash flow to reinvest. In contrast, frontier AI labs face unprecedented, non-negotiable capital expenditures just to stay in the race. The essay warns that "money burning" is not a strategy in itself; it only works if it leads to a structural cost advantage that your competitors cannot replicate.
  • Link: https://www.notboring.co/p/bad-analogies

10. The Kano model in AI — Des Traynor

  • Why read: A quick but vital strategy check on how AI is accelerating the "delight-to-commodity" lifecycle.
  • Summary: Intercom's founder observes that the Kano Model—where new features move from delightful differentiators to performance criteria to hard requirements—is being compressed by AI. What used to take years (competitors copying a feature until it becomes a baseline) now happens in mere months. For product teams, this means that "AI features" are commoditizing at record speed, requiring a constant focus on deeper data moats and "decision traces" to stay ahead. The "window of delight" is effectively closing.
  • Link: https://twitter.com/destraynor/status/2039781547855770021/?rw_tt_thread=True

Themes from yesterday

  • The "General Harness" Consensus: Every major SaaS and AI player is converging on the agentic loop (Goal + Tools + Harness) as the core product primitive.
  • Trace Durability: A shift from treating AI logs as transient "debugging exhaust" to seeing them as the foundational "organizational intelligence" of the enterprise.
  • Quality Inflation: AI isn't clearing schedules; it's raising the "standard" of work from a text doc to a working prototype, moving the bottleneck to human verification.
  • Kano Compression: The lifecycle of a feature from "mind-blowing differentiator" to "boring requirement" has collapsed from years to months.