1. “It’s Hard to Eval” Is a Product Smell — Hamel Husain

  • Why read: Shows why building verifiability into your product interface makes AI evaluations much easier.
  • Summary: AI tools often fail because they only give final answers, so users can't check the work without redoing it. Instead, surface intermediate steps such as SQL queries and assumptions. When the interface lets domain experts validate outputs naturally, the feedback they give improves. That transparency improves the user experience, lowers annotation costs, and strengthens evaluation signals. Skip opaque outputs and build transparent ones.
  • Read more
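
One way to picture "surfacing intermediate steps" is a response object that carries the query and assumptions alongside the final answer. This is a minimal sketch, not the article's implementation: the `VerifiableAnswer` class, its field names, and the sample revenue figure are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class VerifiableAnswer:
    """A final answer bundled with the work a reviewer can check (hypothetical shape)."""
    answer: str
    sql: str  # the query that produced the answer
    assumptions: list[str] = field(default_factory=list)

    def review_card(self) -> str:
        """Render the answer plus its supporting steps as plain text."""
        lines = [f"Answer: {self.answer}", "Query:", f"  {self.sql}"]
        if self.assumptions:
            lines.append("Assumptions:")
            lines.extend(f"  - {a}" for a in self.assumptions)
        return "\n".join(lines)


# Illustrative data only; none of these values come from the article.
result = VerifiableAnswer(
    answer="Q3 revenue was $1.2M",
    sql="SELECT SUM(amount) FROM orders WHERE quarter = 'Q3'",
    assumptions=["Refunds are excluded", "Amounts are in USD"],
)
print(result.review_card())
```

A domain expert who sees the query and the assumptions can reject a wrong answer without recomputing it, and that judgment can double as an evaluation label.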

2. 🎙️ How I AI: GLM-5.2 review & How Gusto built a new product line with Claude Code — Lenny's Newsletter

  • Why read: Argues that open-weight models are now good enough for complex production coding tasks.
  • Summary: GLM-5.2 rivals Claude Opus in reasoning, function calling, and handling million-token contexts. Self-hosting these models frees engineering teams from single-provider API pricing. Integrating GLM-5.2 into tools like Cursor and Claude Code takes less than an hour and enables autonomous bug hunting. The focus is shifting from model limits to cost, control, and avoiding vendor lock-in.
  • Read more

3. The Mythical Agent-Minute — Byrne @ The Diff

  • Why read: Updates the "Mythical Man-Month" for AI agents, outlining the new bottlenecks in software development.
  • Summary: AI development resembles traditional org charts: humans write specs for agents to execute. As agents quickly turn specs into prototypes, the bottleneck moves from coding speed to organizing ideas. Scaling agents, like adding human developers, creates friction in context sharing, tech debt, and parallel integration. Predicting timelines remains difficult. The main advantage is clearing backlogs of ideas that are now cheap to prototype.
  • Read more

4. When AI Costs More Than the Engineer — Tomasz Tunguz

  • Why read: Flags a coming shift where AI compute costs could exceed engineer salaries.
  • Summary: Top AI labs spend twice their payroll on compute. The broader market is following suit. The top 1% of companies now spend $89k per engineer on AI annually. As agents consume more tokens, costs will rise. If prices stay flat while demand grows, AI bills could hit $600k per engineer by 2029. If open-weight models drive prices down, costs might stabilize. Leaders need to model these scenarios before infrastructure expenses overtake payroll.
  • Read more
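
The scenarios in the summary can be modeled in a few lines. The sketch below is an assumption-laden illustration: the summary gives the $89k and $600k endpoints but not the base year, so the four-year horizon (2025 to 2029) and the 30% annual price decline are hypothetical inputs, and both function names are made up for this example.

```python
def implied_annual_growth(start: float, end: float, years: int) -> float:
    """Compound annual growth rate needed to move from start to end."""
    return (end / start) ** (1 / years) - 1


def project_spend(start: float, usage_growth: float,
                  price_change: float, years: int) -> list[float]:
    """Yearly spend when usage and unit price each change by a fixed rate."""
    spend = start
    path = [spend]
    for _ in range(years):
        spend *= (1 + usage_growth) * (1 + price_change)
        path.append(spend)
    return path


# Assumption: the $89k figure is a 2025 number, so 2029 is four years out.
growth = implied_annual_growth(89_000, 600_000, years=4)

# Scenario A: flat prices, usage grows at the implied rate.
flat_prices = project_spend(89_000, growth, price_change=0.0, years=4)

# Scenario B: same usage growth, but prices fall 30% a year (hypothetical).
falling_prices = project_spend(89_000, growth, price_change=-0.30, years=4)

print(f"implied usage growth: {growth:.1%} per year")
print(f"flat prices, final year: ${flat_prices[-1]:,.0f}")
print(f"falling prices, final year: ${falling_prices[-1]:,.0f}")
```

Swapping in a team's own base spend, horizon, and price assumptions shows how sensitive the final bill is to the price trend.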

5. How to Build Vertical AI that Actually Works w/ Christophe Rimann (Founder & CEO of Camber) — Luke Sophinos

  • Why read: Explains why solving unglamorous, specific operational bottlenecks builds profitable AI companies.
  • Summary: Successful vertical AI platforms ignore general-purpose moonshots and target specific, expensive operational problems. Camber built its business on healthcare claims, a task too costly for humans but ideal for AI. In markets with clear budgets and pain points, product capability matters more than distribution. Handling edge cases and beating human performance in production forms the moat. Operating in stealth lets teams refine their automation before incumbents catch on.
  • Read more

6. How to build a cloud software factory - spec-driven development skills — Zach Lloyd

  • Why read: Outlines how to integrate specialized AI agents to handle ambiguous software development tasks.
  • Summary: Simple bugs suit single-shot agents, but complex features need spec-driven development. A "Spec agent" can bridge high-level roadmaps and executable code. It breaks ambiguous requirements into technical specs and decides what needs human review versus what is ready to build. The process uses clear labels and GitHub Actions to route tasks to the right agents. This setup scales engineering output while maintaining architectural standards.
  • Read more
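
The label-based routing described above could look roughly like the following. This is a sketch under stated assumptions, not the author's setup: the label names, agent names, and priority order are hypothetical, and in practice a function like this would run inside a GitHub Actions job triggered when an issue is labeled.

```python
# Hypothetical labels and the worker each one maps to.
ROUTES = {
    "needs-human-review": "human",
    "needs-spec": "spec-agent",
    "ready-to-build": "build-agent",
}

# When several labels are present, the most cautious route wins.
PRIORITY = ["needs-human-review", "needs-spec", "ready-to-build"]


def route(labels: set[str]) -> str:
    """Pick a handler for an issue based on its labels."""
    for label in PRIORITY:
        if label in labels:
            return ROUTES[label]
    return "triage"  # unlabeled work goes to a default queue


print(route({"bug", "ready-to-build"}))
print(route({"needs-spec", "needs-human-review"}))
print(route(set()))
```

Checking the human-review label first means an ambiguous task is never handed to a build agent just because it also carries a build label.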

7. How to design good ML experiments and actually learn from them — George Grigorev

  • Why read: A Poolside researcher explains how to test AI infrastructure ideas and build research taste.
  • Summary: Running ML experiments at scale requires discipline. Every run must answer a specific question. Without a clear goal, results are uninterpretable, even if metrics improve. Good research taste means finding simple ideas that are easy to test and debug before adding complexity. A fast training framework lets simple changes succeed where they might otherwise fail. Start with a hypothesis, set a baseline, and log everything.
  • Read more
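
"Start with a hypothesis, set a baseline, and log everything" can be made concrete with a small run record. This is a minimal sketch, assuming a single scalar loss as the metric; the `Run` class, its fields, and the sample values are hypothetical and not taken from the article.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class Run:
    """One experiment: the question asked, the reference point, and the result."""
    hypothesis: str
    baseline_loss: float
    config: dict
    loss: float

    def delta(self) -> float:
        """Change versus the baseline; negative means the loss went down."""
        return self.loss - self.baseline_loss

    def to_log_line(self) -> str:
        """Serialize the full record as one JSON line for an append-only log."""
        record = asdict(self)
        record["delta_vs_baseline"] = self.delta()
        return json.dumps(record, sort_keys=True)


# Illustrative values only.
run = Run(
    hypothesis="A longer warmup reduces final loss at this batch size",
    baseline_loss=2.0,
    config={"warmup_steps": 2000, "batch_size": 256},
    loss=1.5,
)
print(run.to_log_line())
```

Writing the hypothesis into the same record as the metric keeps a later reader from having to guess what question a run was meant to answer.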

8. The Economy of Tokens — Vipul Ved Prakash

  • Why read: Shows how standardized AI interfaces are unbundling the intelligence ecosystem.
  • Summary: The AI industry is standardizing around transformers, OpenAI-compatible APIs, and agent harnesses, much like the early PC ecosystem. This unbundling lets teams innovate at each layer independently, cutting the cost and time to deploy frontier models. As tokens gain utility, agents are moving from demos to production. Open-weight models drive down inference costs and spread advancements quickly. Intelligence is industrializing, moving beyond a few major labs.
  • Read more

9. The Pricing Model That Kills Its Vendors — nishil

  • Why read: Argues that outcome-based pricing will commoditize AI agent startups.
  • Summary: Executives want clear ROI from AI, making outcome-based pricing attractive for sales. But charging only for resolved tickets or leads removes lock-in and forces competitors into the same model. As models and compute become commodities, the agents do too. When outcomes are auctioned, value flows to the platforms verifying the work, not the agents performing it. Competing on outcomes accelerates a startup's decline.
  • Read more

10. MEMORY IS THE MOAT — Gokul Rajaram

  • Why read: Summarizes Nikesh Arora's view that context and memory define enterprise AI winners.
  • Summary: Consumers tolerate hallucinations; enterprises demand precision. Value accrues to systems that operate autonomously without human backup. The real moat is the memory and context a system accumulates about a business, which raises switching costs. Reaching this depth requires massive investment in proprietary training that cannot be rented via API. Successful adoption requires redesigning workflows around AI rather than attaching it to old processes. The next enterprise software wave will be opinionated and alter workforce structures.
  • Read more

11. THE SMOOTH EXPONENTIAL — Gokul Rajaram

  • Why read: Outlines Dario Amodei's views on AI progress and why software complexity is a dying moat.
  • Summary: AI progress looks sudden from the outside, but Anthropic's CEO views it as a smooth, predictable exponential curve. Because AI automates software creation, engineering complexity no longer protects businesses. Deep customer relationships, domain knowledge, and operational skills are what remain. Amodei suggests companies audit their moats now to lean into durable advantages. Success depends on aligning the business model with company values and surviving rapid scaling.
  • Read more

12. Many Small Steps for Robots, One Giant Leap for Mankind — Packy McCormick

  • Why read: Argues that physical automation will advance through incremental engineering, not single breakthroughs.
  • Summary: Silicon Valley often assumes a single compute breakthrough will unlock general robotics. But deploying robots in the real world exposes massive variability. Autonomy means handling edge cases: shifting light, unstructured terrain, and unpredictable humans. Robotics will advance like aerospace or self-driving, by slowly expanding operational domains. Automating the physical world requires incremental engineering, not a single miracle model.
  • Read more

13. I think too many people see AI labs as just... — CJ Hess

  • Why read: Frames the AI race as a supply chain and infrastructure challenge, rather than a contest of smart models.
  • Summary: Judging AI labs by their foundation models ignores the battle for compute, power, and distribution. The race resembles the game Factorio: success means managing bottlenecks across data centers, hardware, and products. OpenAI has distribution through ChatGPT but depends on Microsoft's infrastructure. Meanwhile, xAI taps into an empire of cars, robots, satellites, and data centers. The winner will be whoever connects these massive physical and digital components without collapsing.
  • Read more

14. you probably don't need an expensive sandbox — Nathan Flurry

  • Why read: Highlights the inefficiency of running coding agents in heavy Linux VMs and suggests lightweight WebAssembly alternatives.
  • Summary: Developers are giving AI agents full Linux environments to boost performance, which leads to expensive, resource-heavy sandboxes. These VMs reserve gigabytes of RAM and CPU for agents that often sit idle. Most agents only need a virtual operating system to emulate basic commands, files, and networking. WebAssembly environments like agentOS cut RAM usage by 47x while keeping security intact. These lightweight setups let teams run coding agents in their backends at a fraction of the cost.
  • Read more
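
To see what a 47x RAM reduction means for capacity, here is a back-of-the-envelope sketch. Only the 47x factor comes from the summary; the 64 GiB host and the 2 GiB per-VM reservation are hypothetical numbers, and the calculation ignores CPU, disk, and host overhead.

```python
def agents_per_host(host_mib: int, vm_mib: int, reduction: int = 1) -> int:
    """Whole agents that fit when each needs vm_mib / reduction MiB of RAM."""
    return host_mib * reduction // vm_mib


HOST_MIB = 64 * 1024  # assumed 64 GiB host
VM_MIB = 2 * 1024     # assumed 2 GiB reserved per Linux VM

vm_agents = agents_per_host(HOST_MIB, VM_MIB)
wasm_agents = agents_per_host(HOST_MIB, VM_MIB, reduction=47)

print(f"VM sandboxes per host: {vm_agents}")
print(f"Lightweight sandboxes per host: {wasm_agents}")
```

Under these assumptions the same host holds 47 times as many agents when memory is the only constraint, which is where the cost savings would come from.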

15. The World as Model — Meltem Demirors

  • Why read: Explores how the future of physical infrastructure belongs to companies that can digitize and compress reality into actionable data.
  • Summary: Markets emerge when we find better ways to model the world, similar to how high-frequency trading exploited millisecond data. The physical economy of factories and grids is becoming an information system. With robotics and sensors breaking the language data bottleneck, the race is on to compress physical reality. Companies that map the industrial world accurately will control how physical assets are priced, traded, and insured. The advantage goes to those capturing predictive signals that incumbents cannot replicate.
  • Read more

Themes from yesterday

  • From Chat to Action: AI is moving from chatbots to agents. This requires verifiable outputs, updated pricing models, and new infrastructure to support software development and enterprise tasks.
  • Shifting Moats: Software complexity and model performance no longer protect businesses. The new moats are enterprise memory, proprietary physical data, and infrastructure scale.
  • Infrastructure Reality Check: Operators are finding that brute-force compute has limits. Robotics struggles with real-world variability, and coding agents waste money on heavy Linux VMs. Efficient engineering is beating raw power.