1. Building the most AI-pilled engineering team in the world — Lenny's Newsletter

  • Why read: How engineering leadership changes when AI boosts coding output 8x.
  • Summary: Fiona Fung manages teams building Claude Code and Cowork at Anthropic, where engineers now ship 8x more code. The bottleneck has moved from writing syntax to verifying outputs and scoping bigger projects. Managers now oversee both humans and agents while defending quality bars. Context-switching remains an unsolved problem. As development costs plummet, engineering culture needs to adapt, encouraging teams to take larger risks and make new mistakes.

2. General Agent: A Self-Evolving, Synthetic Agent Environment — primeintellect.ai

  • Why read: How a synthetic environment uses a two-player game to train AI agents continuously.
  • Summary: Prime Intellect open-sourced "general-agent," an environment that exposes agents to thousands of tools. A Synthesizer agent designs tasks, and a Solver agent validates them, creating a self-evolving training corpus. The tasks run on stateful database operations and use semantic checks for accuracy. Harder tasks seed the next wave, naturally increasing the difficulty. This trains agents on complex, multi-step tool use without requiring humans to write the tasks.

3. The Self-Improving Loop: a 300-agent swarm on Kimi K2.6 — Movez

  • Why read: How open-source models coordinate parallel swarms to beat expensive proprietary options.
  • Summary: A 300-agent swarm running on the open-source Kimi K2.6 model executed 4,000 coordinated steps for high-quality research. After each run, the system saves reusable skills and tighter constraints to improve over time. A frontier model like Opus 4.8 acts as the verifier, blocking garbage data from becoming a saved skill. This separates the learning engine from the verifier, avoiding the limits of rigid workflows. It shows how to build a swarm that compounds its intelligence every time it runs.
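The split between the learning swarm and the verifier in item 3 can be sketched as a simple gate in front of the skill library. This is a minimal illustration, not the article's implementation: `update_skill_library`, the `verify` callable, and the 0.8 threshold are hypothetical stand-ins, and a real verifier would be a call to a frontier model rather than a local function.

```python
from typing import Callable, Dict


def update_skill_library(
    library: Dict[str, str],
    candidates: Dict[str, str],
    verify: Callable[[str], float],
    threshold: float = 0.8,
) -> Dict[str, str]:
    """Return a new library that adds only the candidates the verifier accepts."""
    updated = dict(library)  # copy, so a bad run never mutates saved skills
    for name, body in candidates.items():
        score = verify(body)  # hypothetical: a score in [0, 1] from the verifier
        if score >= threshold:
            updated[name] = body
    return updated


def toy_verifier(skill_text: str) -> float:
    """Stand-in for the verifier model: accepts only skills that cite a source."""
    return 1.0 if "source:" in skill_text else 0.0


# Candidate skills proposed by the swarm after one run (invented examples).
library = update_skill_library(
    {},
    {
        "summarize_filing": "Extract revenue by segment. source: 10-K",
        "guess_numbers": "Estimate figures from memory.",
    },
    toy_verifier,
)
```

Because the gate only sees a `verify` callable, the verifier can be swapped without touching the code that proposes skills, which is the separation the summary describes.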

4. Write a spec, not a prompt — Movez

  • Why read: A one-line prompt lets an agent swarm decide everything, which burns credits and returns junk.
  • Summary: When deploying a swarm, treat the agents like contractors, not genies. Write a spec instead of a prompt. A good spec outlines what to collect, what is valid, allowed sources, output formats, and conflict resolution rules. The agent then uses the spec to break down tasks and structure its own workflow. This spec is the most valuable artifact in the loop. It eventually seeds a reusable skill, helping the swarm build on its knowledge.
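One way to make the spec in item 4 concrete is to treat it as structured data rather than free text. The sketch below is an assumption, not the article's format: the `SwarmSpec` class, its field names, and the sample values are all hypothetical, chosen to mirror the five elements listed in the summary.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SwarmSpec:
    """Hypothetical spec container; fields mirror the five elements in the summary."""
    collect: List[str]          # what to collect
    validity_rules: List[str]   # what counts as valid
    allowed_sources: List[str]  # where the agents may look
    output_format: str          # shape of the deliverable
    conflict_rule: str          # how to resolve disagreeing sources

    def missing_sections(self) -> List[str]:
        """Names of sections left empty, so a thin spec fails before any agent runs."""
        return [name for name, value in vars(self).items() if not value]

    def to_brief(self) -> str:
        """Render the spec as the text handed to the swarm."""
        missing = self.missing_sections()
        if missing:
            raise ValueError(f"incomplete spec: {missing}")
        return "\n".join([
            "COLLECT: " + "; ".join(self.collect),
            "VALID IF: " + "; ".join(self.validity_rules),
            "SOURCES: " + "; ".join(self.allowed_sources),
            "OUTPUT: " + self.output_format,
            "CONFLICTS: " + self.conflict_rule,
        ])


# Invented example values, for illustration only.
spec = SwarmSpec(
    collect=["company name", "latest funding round"],
    validity_rules=["round announced in a primary source"],
    allowed_sources=["company press releases", "regulatory filings"],
    output_format="one JSON object per company",
    conflict_rule="prefer the most recent primary source",
)
brief = spec.to_brief()
```

Refusing to render an incomplete spec is a deliberate choice here: a missing section is exactly the decision a one-line prompt would leave to the swarm.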

5. Escaping the chain-of-thought trap: What is next for LLM reasoning — Ben Dickson

  • Why read: Chain-of-Thought (CoT) prompting spikes compute costs without actually improving reasoning.
  • Summary: Uber and Meta are capping AI compute spend as CoT token costs spiral. CoT forces models to output step-by-step text, but research shows these tokens often mimic the format of human reasoning rather than reflect the computation that produced the answer. Models can get the right answer with fake reasoning steps, or fail despite a perfect logical trace. This text bottleneck drains budgets, slows inference, and creates an illusion of intelligence. Sustainable AI scaling requires moving past CoT to better reasoning mechanisms.

6. Evals: the strategic IP that will define the next era of AI — Garrett Lord

  • Why read: Evaluations are the strategic asset needed to move AI agents from pilot to production.
  • Summary: Enterprise AI often stalls in pilots because leaders can't measure accuracy or trust the system in production. To fix this, organizations have to measure AI with strict evaluation frameworks. A solid eval suite scores tool usage, captures judgment nuances, and grades tasks against objective rubrics in a simulation. Domain experts need to curate historical data and edge cases to pressure-test models. By treating AI as predictable engineering, companies turn their evals into proprietary IP.

7. Each function needs a distinct AI strategy — Garrett Lord

  • Why read: Enterprises need a segmented approach to AI, changing strategies by business unit and function.
  • Summary: Enterprise departments need specific AI tools, not a single monolithic solution. A mid-sized insurer might buy an off-the-shelf coding agent but build custom models to protect its underwriting logic as IP. Customer service might need vertical solutions optimized for RAG. Whatever the tool, each deployment still needs real setup, maintenance, and evaluation. Continuous performance management in enterprise AI depends on these function-specific strategies.

8. Introducing Sakana Fugu: A full multi-agent orchestration system — Sakana AI

  • Why read: Sakana Fugu provides multi-agent orchestration behind a single API endpoint to avoid vendor lock-in.
  • Summary: Sakana Fugu orchestrates a pool of swappable agents to handle complex tasks, hiding the complexity from the user's code. It manages model selection, delegation, and verification on the fly, matching the performance of frontier models. This setup acts as a hedge against centralized power and export controls. Fugu offers a low-latency version for daily tasks and an "Ultra" version for heavy research. Users can also disable specific agents in the pool for data compliance.

9. Coding Is No Longer the Bottleneck — Gokul Rajaram

  • Why read: AI code generation shifts engineering focus from writing syntax to scoping products.
  • Summary: As AI agents write and ship code at volume, writing syntax is no longer the bottleneck. The hard work has shifted to verifying code, scoping ambitious projects, and maintaining team culture. Features that used to be "too hard" can now be delegated to agents, making timid scoping a mistake. Engineering managers are moving from tracking lines of code to evaluating product impact. Teams need to hire creative builders with product sense and systems experts who can verify complex AI outputs.

10. Manage Through the Agent — Gokul Rajaram

  • Why read: Engineering leaders are using coding agents to track organizational output.
  • Summary: With developers shipping more code via AI, managers need agents to maintain visibility. By running an AI in every repo and Slack channel, managers can track what shipped and how it landed. The agent becomes a chat interface for exploring codebase changes, tracing bugs, and checking organizational health. This turns a code review tool into a management dashboard, letting leaders stay connected to the output of hundreds of developers without micromanaging.

11. Specs as the Review Framework — Gokul Rajaram

  • Why read: AI agents do better code reviews when they have explicit definitions of what good looks like.
  • Summary: Engineering teams should check detailed specs into their repos alongside the code. When specs are kept current, AI agents use them to grade every code change and pull request. This is the next step for test-driven development: the agent generates and enforces the test criteria. By writing the standard down once, the AI automatically verifies it on every change. This keeps engineering quality high, consistent, and cheap.

12. The Cursor prehistory — Ali Partovi

  • Why read: Early talent identification and mentorship led to Cursor's $60 billion acquisition by SpaceX.
  • Summary: Before Cursor existed, its founders were identified through Neo Scholars college scouting. Michael Truell and his co-founders showed strong coding skills and entrepreneurial drive early on. Investors acted as "people capitalists," offering mentorship and resources years before the startup existed. Early interactions showed Truell's mix of calm confidence, humility, and technical skill. This long-term bet on individuals led to the largest VC-backed startup acquisition in history.

13. A few thoughts on the open vs closed model landscape — Freda Duan

  • Why read: Models must be either the absolute best or incredibly cheap to survive market consolidation.
  • Summary: The model market is splitting into frontier models and ultra-cheap options like DeepSeek. Everything in between is getting squeezed out. Model routing is now required to assign tasks and maximize performance per dollar. While cheaper models lower the floor, base token pricing is misleading due to cache hit rates, stability, and retry costs. Demand for frontier intelligence proves that cheap models aren't "good enough" for complex reasoning yet. Total cost of ownership depends more on token efficiency and infrastructure than the listed price.
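The point in item 13 that list prices mislead can be illustrated with a small cost model. This is a sketch under stated assumptions, not the author's calculation: `ModelProfile` and `cost_per_success` are hypothetical names, every price and rate below is invented, and retries are assumed to be independent so that the expected number of attempts is 1 / success rate.

```python
from dataclasses import dataclass


@dataclass
class ModelProfile:
    """Hypothetical pricing profile; prices are dollars per million tokens."""
    price_in: float         # uncached input price
    price_cached_in: float  # input price on a cache hit
    price_out: float        # output price
    cache_hit_rate: float   # fraction of input tokens served from cache
    success_rate: float     # probability one attempt yields a usable result


def cost_per_success(m: ModelProfile, tokens_in: int, tokens_out: int) -> float:
    """Expected dollars per successful task, retrying until one attempt succeeds."""
    # Blend cached and uncached input prices by the cache hit rate.
    blended_in = (
        m.cache_hit_rate * m.price_cached_in
        + (1 - m.cache_hit_rate) * m.price_in
    )
    attempt = (tokens_in * blended_in + tokens_out * m.price_out) / 1_000_000
    # Assumption: independent retries, so expected attempts = 1 / success_rate.
    return attempt / m.success_rate


# Invented numbers: a low list price with no caching and frequent retries,
# versus a high list price with heavy cache reuse and few retries.
cheap = ModelProfile(0.5, 0.5, 2.0, cache_hit_rate=0.0, success_rate=0.25)
frontier = ModelProfile(2.0, 0.2, 8.0, cache_hit_rate=0.9, success_rate=0.9)

cheap_cost = cost_per_success(cheap, tokens_in=20_000, tokens_out=4_000)
frontier_cost = cost_per_success(frontier, tokens_in=20_000, tokens_out=4_000)
```

With these made-up inputs, the model with the 4x higher list price comes out cheaper per successful task. Different inputs would reverse the result, which is why a router needs measured hit rates and retry rates and cannot rely on the price sheet alone.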

14. Is AI talking about your product in the right way? — Kyle Poyar

  • Why read: How companies are influencing LLMs to ensure accurate product representation.
  • Summary: Users now rely on LLMs for product discovery, making brand representation critical. Companies like Clay, Glean, and Rippling are developing strategies to influence the data feeding these models. An LLM using outdated information or hallucinating can damage a brand. Marketers need to optimize content for AI, treating LLMs like a new search engine that needs its own SEO. Accurate representation requires constant monitoring and aligned structured data across all properties.

15. The AI Shift in GTM: New Leverage, Same Accountability — Scott Gibbs

  • Why read: AI changes Go-To-Market operations, requiring centralized data and upskilled RevOps teams.
  • Summary: AI brings efficiency to GTM strategies, but human leadership and cross-functional alignment are still required. To use AI well, companies need an Enterprise Data Warehouse as a single source of truth instead of fragmented vendor platforms. Revenue Operations teams have the highest upside for AI integration but need new roles like Data Engineers and AI Systems Leaders. Centralizing data prevents shadow AI and enables predictive models for deal risk. Embracing AI drives growth in bookings and retention while enhancing human success.

Themes from yesterday

  • Coding to Scoping: AI agents write syntax at volume, moving the engineering bottleneck to verifying outputs, writing specs, and setting larger project scopes.
  • Cost and Evaluation: Raw token generation via Chain-of-Thought is getting too expensive. Scaling requires proprietary evaluation frameworks and model routing to get the best performance per dollar.
  • Swarm Intelligence: Teams are coordinating massive, parallel swarms of specialized agents. These swarms use open-source models to iteratively improve and avoid vendor lock-in.