1. The Anatomy of a Reliable AI Agent — J.B.

  • Why read: A framework for shifting from simple prompting to building system-level harnesses for AI agents.
  • Summary: The real bottleneck in agent development is no longer the models, but the harnesses that wrap them. A reliable agent needs a harness that enforces workflows, mediates tool use, runs checks, and logs traces. This prevents agents from skipping steps, forgetting goals, or producing outputs that look done but fail in practice. Multi-layered harnesses constrain work, route subtasks, and create verification points so agents can complete tasks without constant human correction.
  • Read more
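The idea of enforced workflows with verification points can be sketched as a small loop. This is a minimal illustration under stated assumptions, not J.B.'s design: the `Step` and `Harness` names, the retry limit, and the string trace format are all hypothetical. Steps run in a fixed order, and each must pass its check before the next one starts, so the agent cannot skip ahead or hand back output that only looks done.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Step:
    name: str
    run: Callable[[dict], dict]    # hypothetical: the agent's work for this step
    check: Callable[[dict], bool]  # verification point that must pass before moving on


@dataclass
class Harness:
    steps: list[Step]
    max_retries: int = 2
    trace: list[str] = field(default_factory=list)  # simple log of every attempt

    def execute(self, state: dict) -> dict:
        # Steps run in a fixed order, so the agent cannot skip ahead.
        for step in self.steps:
            for attempt in range(1, self.max_retries + 2):
                state = step.run(state)
                ok = step.check(state)
                self.trace.append(f"{step.name} attempt={attempt} ok={ok}")
                if ok:
                    break
            else:
                # Every attempt failed its check: stop instead of pretending it is done.
                raise RuntimeError(f"step {step.name!r} failed verification")
        return state
```

Raising on a failed check, rather than continuing, is the design choice that keeps "looks done" from passing as "is done"; a real harness might escalate to a human at that point instead.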

2. UTILITY PER TOKEN: Why the Labs have a serious enterprise problem. — carried_no_interest

  • Why read: A critique of how enterprise incentives clash with AI value creation, focusing on "utility per token" instead of raw capability.
  • Summary: Large AI labs sell pure intelligence, but enterprise employees use it to buy back their own time without increasing their output. In large corporations, knowledge workers automate tasks and capture the economic surplus themselves, resulting in near-zero utility per token. In early-stage startups, incentives are tied to equity, so every token is aimed at real business problems. The barrier to AI ROI is the incentive structure of the user, not the model's capability.
  • Read more

3. Anatomy of a harness: building a coding agent that can run for hours — Sam Bhagwat

  • Why read: A look at the technical requirements for upgrading a fragile AI agent into a resilient, long-running process.
  • Summary: Naive agent loops fail in production because of context rot, repetitive loops, or instructions lost during compaction. Building a coding agent that runs autonomously for hours requires thread persistence and crash recovery. The agent needs to actively maintain a live task list and accept steering through an interruptible queue. Tools should pause for human approval on high-stakes decisions and fall back to plain-text policies for headless CI execution. A functional harness manages its own context and holds its decisions when no one is watching.
  • Read more
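Three of the mechanisms above (a live task list, an interruptible steering queue, and an approval gate that falls back to a plain-text policy when headless) fit in one small class. This is a hypothetical sketch, not Sam Bhagwat's implementation: the `Agent` class, the `"tool:argument"` task format, the `HIGH_STAKES` tool names, and the `allow`/`deny` policy syntax are all assumptions made for illustration.

```python
import queue
from dataclasses import dataclass, field
from typing import Callable, Optional

# Hypothetical tool names treated as high-stakes in this sketch.
HIGH_STAKES = {"delete_file", "git_push"}


def parse_policy(text: str) -> dict[str, bool]:
    """Parse a plain-text policy with lines like 'allow git_push' or 'deny delete_file'."""
    policy: dict[str, bool] = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[0] in ("allow", "deny"):
            policy[parts[1]] = parts[0] == "allow"
    return policy


@dataclass
class Agent:
    tasks: list[str]                                  # live task list, "tool:argument"
    approver: Optional[Callable[[str], bool]] = None  # None means headless, e.g. CI
    headless_policy: dict[str, bool] = field(default_factory=dict)
    steering: queue.Queue = field(default_factory=queue.Queue)
    done: list[str] = field(default_factory=list)

    def approve(self, tool: str) -> bool:
        if tool not in HIGH_STAKES:
            return True
        if self.approver is not None:
            return self.approver(tool)  # pause and ask a human
        return self.headless_policy.get(tool, False)  # no human: written policy, default deny

    def step(self) -> Optional[str]:
        # Drain steering messages between steps so a human can redirect a long run.
        urgent = []
        while not self.steering.empty():
            urgent.append(self.steering.get_nowait())
        self.tasks[:0] = urgent  # steering jumps ahead of queued work, in arrival order
        if not self.tasks:
            return None
        task = self.tasks.pop(0)
        tool = task.split(":", 1)[0]
        self.done.append(task if self.approve(tool) else f"skipped {task}")
        return task
```

Defaulting to deny when no policy line mentions a high-stakes tool is one reasonable choice for unattended runs; it trades some throughput for safety.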

4. Building a Good Vertical Agent — Peter Wang

  • Why read: Practical advice on optimizing agent accuracy by treating an LLM's context window as a layered cache.
  • Summary: High-accuracy vertical agents require careful context management rather than dumping files and tools into a prompt. The challenge is minimizing the context spent per task to keep prompts focused and avoid cache misses. Developers should structure agent context like CPU memory tiers (L1, L2, L3), keeping frequently needed information instantly accessible. The system should pull from the slower, broader tiers only for rare tasks. This directly trades information compression against speed and accuracy.
  • Read more
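The memory-tier analogy can be made concrete with a lookup that always includes a small L1 set, adds L2 documents on a topic hit, and searches the broad L3 archive only when the faster tiers miss. This is a sketch under assumptions, not Peter Wang's system: the `TieredContext` class is hypothetical, keyword matching stands in for whatever retrieval a real agent would use, and the character budget is a crude proxy for a token budget.

```python
from dataclasses import dataclass, field


@dataclass
class TieredContext:
    l1: list[str]                                     # always in the prompt: core rules
    l2: dict[str, str] = field(default_factory=dict)  # topic -> doc, loaded on a keyword hit
    l3: dict[str, str] = field(default_factory=dict)  # broad archive, searched only on a miss
    budget_chars: int = 2000                          # crude stand-in for a token budget

    def build(self, task: str) -> list[str]:
        words = set(task.lower().split())
        ctx = list(self.l1)
        hits = [doc for topic, doc in self.l2.items() if topic in words]
        if not hits:
            # Fall through to the slow, broad tier only when the fast tiers miss.
            hits = [doc for key, doc in self.l3.items() if key in words]
        for doc in hits:
            if sum(map(len, ctx)) + len(doc) > self.budget_chars:
                break  # stop before exceeding the context budget
            ctx.append(doc)
        return ctx
```

For example, with a hypothetical billing agent whose L2 tier holds a refund policy, `build("process a refund")` returns the L1 instructions plus that one document and nothing else.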

5. The Log Is the Agent — Ishaan Sehgal

  • Why read: A reframing of AI agents that defines them by their durable event history instead of their runtime or model.
  • Summary: An AI agent is its append-only log of events. This log contains every user input, tool call, tool result, and model output, alongside static session definitions. Treating the log as the source of truth lets any executor reconstruct the agent's exact state and resume work instantly. This database-like architecture makes agents fault-tolerant, easy to debug, and simple to audit. As long as state transitions are durably written, the agent survives crashes and can move between execution environments.
  • Read more
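The log-as-source-of-truth idea is essentially event sourcing: write each event durably, then rebuild state by replaying the log. The sketch below assumes a JSON-lines file and a handful of hypothetical event kinds (`user_input`, `tool_call`, `tool_result`, `model_output`); it is an illustration, not Ishaan Sehgal's schema.

```python
import json
import os
from pathlib import Path


class EventLog:
    """Append-only JSON-lines log; one line per event (event kinds are hypothetical)."""

    def __init__(self, path: Path):
        self.path = path

    def append(self, kind: str, **data) -> None:
        with self.path.open("a", encoding="utf-8") as f:
            f.write(json.dumps({"kind": kind, **data}) + "\n")
            f.flush()
            os.fsync(f.fileno())  # force the event to disk so it survives a crash

    def events(self) -> list[dict]:
        if not self.path.exists():
            return []
        with self.path.open(encoding="utf-8") as f:
            return [json.loads(line) for line in f if line.strip()]


def replay(events: list[dict]) -> dict:
    """Rebuild agent state purely from the log, so any executor can resume."""
    state = {"messages": [], "pending_tool": None}
    for e in events:
        if e["kind"] in ("user_input", "model_output"):
            state["messages"].append((e["kind"], e["text"]))
        elif e["kind"] == "tool_call":
            state["pending_tool"] = e["name"]  # a crash here leaves the call visibly pending
        elif e["kind"] == "tool_result":
            state["messages"].append(("tool_result", e["text"]))
            state["pending_tool"] = None
    return state
```

Because `replay` reads nothing but the file, a second process pointed at the same path reconstructs the same state, which is the property the article relies on for crash recovery and moving between environments.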

6. Inside AI-pilled engineering teams: Five lessons for scaling without losing the plot — Bessemer

  • Why read: Leadership strategies for managing code generation and agentic workflows in engineering organizations.
  • Summary: Adopting AI tools creates new pressures around code quality, security, and comprehension debt. To maintain speed without sacrificing quality, leaders are decoupling shipping velocity from release risk using two-tiered release systems. They are also standardizing infrastructure with LLM proxies to control costs and routing, while letting developers experiment with specific tools. This approach helps teams evaluate actual productivity gains, manage token budgets, and prepare for agentic development.
  • Read more

7. Of Termites & Tokens — tomcritchlow.com

  • Why read: An argument that AI should multiply organizational throughput instead of just making existing workflows cheaper.
  • Summary: Most companies fall into a margin trap, using AI to replace human effort in static processes. The real opportunity is treating the organization like a biological colony to expand workflow capacity. Instead of one competitor analysis per quarter, a swarm of agents can monitor the market continuously. This turns discrete projects into continuous organizational senses. Embracing distributed local action lets companies scale their impact exponentially.
  • Read more

8. Beyond Tokenmaxxing — Kayvon Beykpour

  • Why read: A proposed metric for evaluating engineering productivity with AI coding agents.
  • Summary: Tracking token spend alone ignores the value generated by AI tools. A better metric is "coding time," which measures the human-equivalent hours represented by code changes. This highlights the difference between code that is pushed and code that successfully lands in production. While pushed code is growing rapidly, landed code is rising more slowly, showing that human judgment and review remain essential. AI tools act as force multipliers, widening the gap between top and average engineers.
  • Read more

9. How to reduce AI costs? — stepan

  • Why read: A breakdown of techniques to control AI inference bills without sacrificing performance.
  • Summary: Token-saving habits are insufficient for controlling costs at scale. Prompt caching is the most effective optimization, cutting costs by up to 90% on repeated inputs if stable content is positioned first. Batch APIs for non-urgent tasks save 50%. Enforcing strict output discipline is critical because output tokens cost more than input tokens. Static model selection and dynamic routing within the harness ensure expensive models handle planning while cheaper models handle execution. Tuning the model's effort level on reasoning tasks prevents compute waste.
  • Read more
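Two of the cost claims above are easy to check with arithmetic: prefix caches reuse only the leading run of tokens two prompts share (hence "stable content first"), and cached input, batching, and output pricing combine into a per-request cost. The sketch below is illustrative only. The `cached_ratio=0.1` default encodes the "up to 90%" saving and `batch_ratio=0.5` the 50% batch discount; cache-write surcharges, minimum cacheable lengths, and whether discounts stack all vary by provider and are ignored here.

```python
def cached_prefix_tokens(prev: list[str], cur: list[str]) -> int:
    """Tokens reusable from a prefix cache: the length of the shared leading run."""
    n = 0
    for a, b in zip(prev, cur):
        if a != b:
            break
        n += 1
    return n


def request_cost(input_tokens: int, cached_tokens: int, output_tokens: int,
                 in_price: float, out_price: float,
                 cached_ratio: float = 0.1, batch: bool = False,
                 batch_ratio: float = 0.5) -> float:
    """Dollar cost of one request; prices are per million tokens.

    Assumptions: cached input is billed at cached_ratio of the input price,
    batching scales the whole request by batch_ratio, and cache-write
    surcharges are ignored. Real providers differ.
    """
    uncached = input_tokens - cached_tokens
    cost = (uncached * in_price
            + cached_tokens * in_price * cached_ratio
            + output_tokens * out_price) / 1_000_000
    return cost * batch_ratio if batch else cost
```

With hypothetical prices of $3 and $15 per million input and output tokens, a 100k-token prompt with 90k cached tokens and 2k output tokens costs about $0.087 instead of $0.33. Putting a changing timestamp before a stable system prompt drops the shared prefix, and therefore the cached tokens, to zero.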

10. lessons learned designing agentic experiences so far — Kazden

  • Why read: Insights into designing UI and UX for non-linear agentic software.
  • Summary: Designing for AI agents means creating a flexible substrate for infinite user paths rather than strict linear flows. The process starts with markdown files defining intent and structure. Initial testing happens in terminal interfaces to refine interactions before building a visual UI. Surface-level polish, like tool-call micro-interactions, comes only after choreography and simulated use cases have proven valuable. The work resembles molding clay more than drawing blueprints.
  • Read more

11. A Comprehensive Guide to Model Routing — notdiamond.ai

  • Why read: Clarifies the difference between AI gateways and model routers for enterprise inference infrastructure.
  • Summary: Intelligent model routing is necessary to select the most suitable and cost-effective model for each request. A gateway provides unified API access and billing, but a router makes active decisions based on prompt length, semantics, or quality predictors. This optimizes for cost, quality, and latency, preventing applications from wasting expensive tokens on simple tasks. Routing lets teams use a mix of models without hardcoding selections. Deploying a router is now a top priority for IT and finance teams.
  • Read more
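The gateway/router distinction comes down to who picks the model: a gateway forwards to whatever model the caller named, while a router estimates what the request needs and chooses the cheapest model that qualifies. The rule-based router below is a hypothetical sketch, not Not Diamond's product: the model names, prices, and tiers are invented, and a keyword set plus prompt length stand in for the semantic or quality predictors a production router would use.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str
    price_per_mtok: float  # hypothetical blended price per million tokens
    tier: int              # hypothetical capability tier, 1 (basic) to 3 (strongest)


MODELS = [
    Model("small-fast", 0.25, 1),
    Model("mid", 3.0, 2),
    Model("large-reasoning", 15.0, 3),
]
# Crude stand-in for a semantic classifier; matched as whole words, not substrings.
HARD_HINTS = {"prove", "plan", "architect", "debug"}


def required_tier(prompt: str) -> int:
    words = set(re.findall(r"[a-z]+", prompt.lower()))
    if words & HARD_HINTS:
        return 3
    if len(prompt.split()) > 200:  # long prompts get at least a mid-tier model
        return 2
    return 1


def route(prompt: str, models: list[Model] = MODELS) -> Model:
    """Pick the cheapest model whose tier meets the prompt's estimated need."""
    need = required_tier(prompt)
    eligible = [m for m in models if m.tier >= need]
    return min(eligible, key=lambda m: m.price_per_mtok)
```

Because callers only ever call `route`, the model mix can change without touching application code, which is the "no hardcoded selections" benefit described above.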

12. The 5-Minute AI Exercise I’d Run With Every Team — Hiten Shah

  • Why read: An exercise to uncover and operationalize hidden AI skills within a company.
  • Summary: Companies should stop focusing on generic tool rollouts and look at how employees repeatedly correct AI outputs. Asking teams which outputs they fix more than once a week uncovers gaps where human judgment is still required. These manual fixes indicate the model lacks a specific proprietary method or standard. Capturing the pattern behind these corrections lets teams build custom AI skills to solve workflow bottlenecks, turning daily frustrations into automated rules.
  • Read more

13. How we use LangChain to power Lumi, our e-commerce copilot for 180k+ merchants — Tadeo Donegana Braunschweig

  • Why read: An architectural case study on deploying a multi-agent supervisor system in an e-commerce platform.
  • Summary: The developers of Lumi moved from a single ReAct loop to a multi-agent architecture using LangGraph to support varied merchant requests. This supervisor-plus-specialists pattern ensures each sub-agent operates with a focused toolbelt and isolated context, preventing system prompt bloat. It improves routing precision and makes it easier to evaluate individual skills. The graph of sub-agents is constructed dynamically per request, letting the team gate features and run experiments without complex conditional edges.
  • Read more
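The supervisor-plus-specialists pattern, with the team assembled per request, can be sketched in plain Python. This deliberately does not use LangGraph's API, and it is not Lumi's code: the `Specialist` class, the feature-flag dictionary, and keyword-based intent matching (standing in for an LLM supervisor's routing decision) are all hypothetical. Each specialist carries its own small toolbelt, and gated features are simply left out of the team rather than handled by conditional edges.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Specialist:
    name: str
    tools: list[str]          # focused toolbelt: only this specialist sees these
    handles: set[str]         # hypothetical intent keywords
    run: Callable[[str], str]


def build_team(flags: dict[str, bool], registry: list[Specialist]) -> list[Specialist]:
    # Assemble sub-agents per request; a gated-off feature is just absent from the team.
    return [s for s in registry if flags.get(s.name, True)]


def supervise(request: str, team: list[Specialist]) -> str:
    words = set(request.lower().split())
    for s in team:
        if s.handles & words:
            return s.run(request)  # the specialist sees only the request, not others' context
    return "fallback: ask the merchant to clarify"
```

Building the team from flags on every request makes experiments cheap: turning a specialist off for a cohort is a dictionary change, not a new branch in a static graph.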

14. Direct the ship, don’t row it — Sherif Mansour

  • Why read: A take on how AI expands the roles of Product Managers, Designers, and Engineers instead of merging them.
  • Summary: The narrative that AI will merge PMs, designers, and engineers into a single role is flawed for medium-to-large organizations. Instead of converging, distinct crafts are expanding their throughput and overlapping slightly. PMs prototype more easily and designers draft better strategies, but core responsibilities remain distinct. The practical approach is to use increased capacity to solve bottlenecks in your specific area rather than abandoning your craft to do someone else's job.
  • Read more

15. The Rise of the AI-Native Organization and AI Specialist — Eric Jing

  • Why read: A look at how treating AI as a collaborative workforce creates a new ceiling for individual output.
  • Summary: AI is pushing businesses to become AI-native organizations where agents run alongside human employees. Successful platforms orchestrate models behind the scenes to deliver direct results instead of making users evaluate fragmented tools. This creates the AI specialist, a professional who manages tasks that previously required large teams by delegating to autonomous agents. Agents handle research and operational tasks, freeing humans to focus on higher-value decisions.
  • Read more

Themes from yesterday

  • From Models to Harnesses: The bottleneck in AI development has moved from model capabilities to the infrastructure (harnesses, routing, logging) that makes agents reliable.
  • Runaway Costs and Enterprise ROI: Leaders are trying to control inference budgets and translate pure token consumption into measurable business utility and landed code.
  • Expanding Organizational Throughput: Instead of cutting costs, AI is being used to expand workflow volume and build multi-agent systems.
  • Role Expansion vs. Convergence: AI enables cross-functional tasks, but mature organizations are seeing traditional roles expand their throughput rather than collapse into a single hybrid role.