1. Migrating from Claude to DeepSeek — Lindy

  • Why read: How to switch production LLMs and cut costs by 90% without breaking the user experience.
  • Summary: Lindy moved most agent traffic from Claude to DeepSeek v4 Flash, cutting inference costs by 90%. They warn that changing models is more than flipping a switch; naive swaps fail the vibe test in the real world. To fix this, they built offline evaluations to replay actual tasks and compare models. They found the inference provider matters as much as the model itself. Rolling out to internal traffic first caught bugs before they hit customers.
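
The offline evaluation idea in item 1 can be sketched roughly as follows. This is a minimal illustration, not Lindy's actual harness: the `Task` type, the `replay` function, the stand-in model callables, and the exact-match scoring are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Task:
    """A recorded production task: the prompt sent and the output judged acceptable."""
    prompt: str
    expected: str


def replay(tasks: List[Task], models: Dict[str, Callable[[str], str]]) -> Dict[str, float]:
    """Run every recorded task through each candidate model and return pass rates."""
    if not tasks:
        raise ValueError("need at least one recorded task")
    scores = {}
    for name, call_model in models.items():
        passed = sum(1 for t in tasks if call_model(t.prompt).strip() == t.expected)
        scores[name] = passed / len(tasks)
    return scores


# Hypothetical stand-ins for real model clients, so the sketch runs offline.
def incumbent(prompt: str) -> str:
    return prompt.upper()


def candidate(prompt: str) -> str:
    # Deliberately fails on longer prompts to show a regression being caught.
    return prompt.upper() if len(prompt) < 6 else prompt


tasks = [Task("hello", "HELLO"), Task("refund order", "REFUND ORDER")]
results = replay(tasks, {"incumbent": incumbent, "candidate": candidate})
```

In practice the scoring step would likely be a rubric or a grader model instead of exact match, and the same replay could be repeated per inference provider, since the summary notes the provider matters as much as the model.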

2. Our AI Agent now costs more than human junior bankers — Ori Eldarov

  • Why read: A look at the rising costs of AI agents and why model routing matters.
  • Summary: An internal AI agent's monthly run-rate hit $35K, making it more expensive than human junior bankers. This forced the team to abandon their frontier-only approach and focus on cost-efficiency. Model routing will likely become standard: expensive frontier models for complex reasoning and cheaper models for routine tasks. Because AI capability remains uneven, model bills often add to headcount costs instead of replacing them.
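
Model routing as described in item 2 could look something like the sketch below. The model names, prices, and task labels are hypothetical placeholders, not figures from the article, and a real router would need its own way of classifying tasks.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str
    cost_per_1k_tokens: float  # illustrative prices, not real quotes


FRONTIER = Model("frontier-model", 0.015)
CHEAP = Model("cheap-model", 0.001)

# Hypothetical task labels that warrant the expensive model.
COMPLEX_TASKS = {"financial_modeling", "multi_step_reasoning"}


def route(task_type: str) -> Model:
    """Send complex reasoning to the frontier model, everything else to the cheap one."""
    return FRONTIER if task_type in COMPLEX_TASKS else CHEAP


def estimated_cost(task_type: str, tokens: int) -> float:
    """Estimate the cost of one call after routing."""
    return route(task_type).cost_per_1k_tokens * tokens / 1000
```

For example, `route("format_table")` returns the cheap model, while `route("financial_modeling")` returns the frontier one.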

3. How to win Agent-led Growth. — Jay

  • Why read: How to optimize product discoverability now that AI agents make decisions.
  • Summary: Nearly 40% of consumers start searches with AI instead of Google, meaning Agent-led Growth is replacing SEO. To win, you need Agent Engine Optimization (AEO) so models recommend you, and Agent Experience (AX) so bots can onboard easily. Improving AEO means getting mentioned across the web before training runs, building dense data assets for RAG, and optimizing for AI discovery tools. Ranking first on Google isn't enough anymore; agents need to recommend you by default.

4. 13 signals you're inside a Supercompany — Greg & Taylor

  • Why read: How to tell if a company has actually integrated AI into how it works.
  • Summary: A "Supercompany" defaults to AI optimism and evaluates every task for automation. Signs include naming agents, rejecting bad AI first drafts, and using terms like orchestration and inference budgets. Employees share AI skills, and documents are written in markdown for agents to read. If you don't see these signals, the organization is probably falling behind on AI.

5. How to build a cloud software factory - the automatic triage skill — Zach Lloyd

  • Why read: How to build a semi-automated software development life cycle using agents.
  • Summary: A "cloud software factory" uses agents to handle issue triage, specification, coding, and verification. The author shows a workflow where a triage agent reviews incoming issues and sends automatable tasks to an implementation agent. This uses basic skills and loops instead of locking into one platform, so it works with existing issue trackers and cloud environments. You measure success by the percentage of issues the system implements up to the code review stage.
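
The triage-then-implement loop in item 5, along with its success metric, can be sketched as below. The `Issue` fields, the label-based triage rule, and the stand-in `implement` function are assumptions for illustration; the article's agents would make these decisions with a model, not a hard-coded rule.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Issue:
    title: str
    labels: List[str]
    reached_code_review: bool = False


def is_automatable(issue: Issue) -> bool:
    """Hypothetical triage rule: well-scoped bugs go to the implementation agent."""
    return "bug" in issue.labels and "needs-design" not in issue.labels


def implement(issue: Issue) -> None:
    """Stand-in for the implementation agent; here it just marks the issue as ready."""
    issue.reached_code_review = True


def run_factory(issues: List[Issue]) -> float:
    """Triage every issue, hand off automatable ones, return the share reaching review."""
    for issue in issues:
        if is_automatable(issue):
            implement(issue)
    if not issues:
        return 0.0
    return sum(i.reached_code_review for i in issues) / len(issues)


issues = [
    Issue("Crash on empty input", ["bug"]),
    Issue("Redesign onboarding", ["feature", "needs-design"]),
    Issue("Typo in error message", ["bug"]),
    Issue("Rework auth flow", ["bug", "needs-design"]),
]
rate = run_factory(issues)
```

Here `rate` is the fraction of all incoming issues that the system carried to the code review stage, which is the metric the summary describes.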

6. thoughts on why mcp didn't work, what's next — Rhys

  • Why read: Why the Model Context Protocol (MCP) fell short and why CLI tools work better for AI agents.
  • Summary: Early MCP servers were limited by security concerns and small toolsets, making them less useful than traditional CLIs. Giving agents a bash tool proved much more powerful, letting them install tools, chain commands, and filter output on the fly. MCP adds friction like client restarts, while CLIs build on decades of development. The future is in harnesses that support MCP, CLI, API, and GraphQL without locking you in.

7. The Rebel Alliance — Nick Grossman

  • Why read: Why a decentralized AI ecosystem will outcompete vertically integrated tech giants.
  • Summary: As AI shifts from chatbots to agents, the underlying architecture is moving from vertically integrated models to multi-layered systems. The "Rebel Alliance" is a growing ecosystem of open-weight models, orchestration tools, and specialized layers that distribute compute. This decentralized approach scales better than proprietary models trying to own everything. Agent web traffic now exceeds human traffic, meaning these composed agents will soon be part of every online experience.

8. Can you trust what Claude Code is about to do? — Jordan Crawford

  • Why read: How to review AI-generated code plans when you aren't technical enough to read the code.
  • Summary: Non-technical operators often rubber-stamp AI execution plans because they look like a wall of code. To fix this, the author uses "Plain Plan Mode," which forces the AI to explain its actions in ninth-grade English. It uses clear symbols for costs, warnings, and decisions, plus an "honesty rule" to make sure numbers match up. By approving the method instead of a black box, operators can safely manage coding agents.

9. The Future of Inference — Spencer Farrar

  • Why read: Why asynchronous inference is the missing piece for long-running AI agents.
  • Summary: The current AI inference stack is built for real-time chat. That makes it expensive and fragile for multi-turn agent workflows. Demand is now shifting toward semi-synchronous agents that think and code over minutes or hours. Making this compute cheap unlocks background coding and deep research. New orchestration platforms treat inference as a schedulable workload, which maximizes hardware use and cuts costs.
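
One way to picture "inference as a schedulable workload" from item 9 is an earliest-deadline-first queue, sketched below. The `Job` and `InferenceScheduler` names and the deadline-based policy are assumptions for illustration; the article does not specify how any particular platform schedules work.

```python
import heapq
from dataclasses import dataclass, field
from typing import List


@dataclass(order=True)
class Job:
    deadline_s: float                 # seconds from now by which a result is needed
    name: str = field(compare=False)  # excluded from ordering


class InferenceScheduler:
    """Earliest-deadline-first queue: tight deadlines run first, slack jobs wait."""

    def __init__(self) -> None:
        self._queue: List[Job] = []

    def submit(self, name: str, deadline_s: float) -> None:
        heapq.heappush(self._queue, Job(deadline_s, name))

    def next_batch(self, size: int) -> List[str]:
        """Pop up to `size` jobs to fill one batch on the hardware."""
        batch = []
        while self._queue and len(batch) < size:
            batch.append(heapq.heappop(self._queue).name)
        return batch


scheduler = InferenceScheduler()
scheduler.submit("overnight-research", deadline_s=3600)
scheduler.submit("chat-reply", deadline_s=1)
scheduler.submit("background-refactor", deadline_s=600)
first = scheduler.next_batch(2)
```

Because background jobs tolerate delay, a scheduler like this can hold them until capacity is free, which is the mechanism behind the utilization and cost gains the summary mentions.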

10. The AI era requires a different kind of experimentation. — Elena's Growth Scoop

  • Why read: Why old product experimentation playbooks no longer work in the AI era.
  • Summary: Traditional experimentation focuses on minor tweaks and fast wins. In the AI era, this doesn't work because product development speed and user interactions have changed. Teams need to stop A/B testing small design tweaks and start placing bigger bets on solving core complexity. You have to adopt this shift to make actual progress on AI-native products.

11. [AINews] OpenAI reports median internal Codex output tokens grew 56x in Research, 32x in Customer Support, 27x in … — AINews

  • Why read: Data on how fast AI usage is growing in non-engineering roles at OpenAI.
  • Summary: OpenAI's internal metrics show Codex token consumption surging across teams, including outside of coding. Over six months, median Codex output tokens grew 56x in Research, 32x in Customer Support, and 27x in Engineering. This shows how organizations adopt AI when given unlimited access. Even tech workers start off underusing AI before hitting a steep inflection point.

12. Scaling Laws, Carefully — Lilian Weng

  • Why read: How scaling laws govern compute, data, and model size allocation.
  • Summary: Scaling laws show that training loss drops predictably on a power-law curve as you increase model size, dataset size, and compute. This helps optimize how you spend compute during training. The slope of this curve seems tied to the specific problem domain, not the model's architecture. You have to understand these dynamics to estimate what it takes to build frontier models.
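
The power-law relationship in item 12 becomes a straight line in log-log space, so its slope can be estimated with ordinary least squares. The sketch below assumes the simplest form, loss = a * size^(-alpha), with no irreducible-loss term, and uses synthetic data; the function name and the numbers are illustrative, not taken from the article.

```python
import math
from typing import List, Tuple


def fit_power_law(sizes: List[float], losses: List[float]) -> Tuple[float, float]:
    """Fit loss = a * size**(-alpha) by least squares on log(size) vs log(loss)."""
    xs = [math.log(s) for s in sizes]
    ys = [math.log(v) for v in losses]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope


# Synthetic data generated from a = 10, alpha = 0.3; not measurements from a real model.
sizes = [1e6, 1e7, 1e8, 1e9]
losses = [10 * s ** -0.3 for s in sizes]
a, alpha = fit_power_law(sizes, losses)
```

The recovered `alpha` is the slope the summary refers to; with a fit like this in hand, one can extrapolate the loss expected at a larger model size before committing compute to it.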

13. The Age of Discovery — Markus J. Buehler

  • Why read: How AI is moving from summarizing text to generating new scientific discoveries.
  • Summary: Early AI compressed and repeated existing knowledge. Now, discovery AI generates its own data by proposing hypotheses, running simulations, and checking them against physical laws. This requires systems that revise themselves when new evidence contradicts what they know, treating their world model as a hypothesis. This shift will speed up breakthroughs in fields like materials science.

14. Everything is a Pipeline — Technically

  • Why read: A mental model that frames different technical architectures as simple data pipelines.
  • Summary: You can understand many complex technical systems by viewing them as pipelines. Whether it's a RAG setup, a data engineering architecture, or frontend compilation, the basics are the same. If you know how data is ingested, cleaned, and curated in one area, you can figure out other technical workflows. This mental model makes unfamiliar technical systems easier to reason about.
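
The pipeline mental model in item 14 maps directly onto function composition. The sketch below is an illustration only: the three stage functions and the string records are hypothetical, chosen to mirror the ingest, clean, and curate steps named in the summary.

```python
from functools import reduce
from typing import Callable, Iterable, List

Stage = Callable[[List[str]], List[str]]


def ingest(raw: List[str]) -> List[str]:
    """Pull records in; here the 'source' is just the input list."""
    return list(raw)


def clean(records: List[str]) -> List[str]:
    """Normalize whitespace and case, dropping empty records."""
    return [r.strip().lower() for r in records if r.strip()]


def curate(records: List[str]) -> List[str]:
    """Deduplicate while preserving first-seen order."""
    return list(dict.fromkeys(records))


def run_pipeline(data: List[str], stages: Iterable[Stage]) -> List[str]:
    """Feed the output of each stage into the next."""
    return reduce(lambda acc, stage: stage(acc), stages, data)


result = run_pipeline(["  Alpha ", "beta", "", "ALPHA"], [ingest, clean, curate])
```

Swapping the stage list is all it takes to describe a different system, for instance chunking, embedding, and retrieval stages for a RAG setup.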

15. GREAT BOARD MEETINGS VS NOT — Gokul Rajaram

  • Why read: The difference between useful and destructive board meetings.
  • Summary: A great board meeting focuses on the single most critical, uncomfortable topic the company faces. This focus highlights vital issues and drives success, even if preparation is stressful. Bad board meetings drag on, burn management's time on bloated presentations, and slow the company down. Founders need to drop exhaustive reporting and focus on specific problems to get value from their boards.

Themes from yesterday

  • The rising cost of agents: As AI agents handle longer, more complex tasks, inference costs are spiking. This is pushing teams toward model routing and asynchronous inference.
  • Agent-first infrastructure: The tech stack is shifting away from real-time chat toward multi-layered systems like the "Rebel Alliance" and "cloud software factories."
  • New ways to manage AI: With agents doing more autonomous work, operators need better ways to check their output. Solutions range from offline evaluations for model migrations to plain-English execution plans.