AI Builders Digest — 2026-06-29

X / TWITTER

Swyx

Swyx proposed a practical change to how open-model evaluations are reported: when closed APIs and open models differ sharply in dollar-per-token cost, thinking budgets should be compared by inference dollars as well as by token count. He also shared an impromptu AI Engineer preshow floor tour and AMA.
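
To make the dollar-versus-token point concrete, here is a minimal sketch of the arithmetic. The prices and budgets are hypothetical placeholders chosen for illustration, not real quotes from any provider, and the helper names are assumptions rather than anything Swyx posted.

```python
def cost_usd(tokens: int, usd_per_million_tokens: float) -> float:
    """Dollar cost of generating `tokens` at a given per-million-token price."""
    return tokens * usd_per_million_tokens / 1_000_000


def tokens_for_budget(budget_usd: float, usd_per_million_tokens: float) -> int:
    """Thinking tokens that a dollar budget buys at a given price."""
    return round(budget_usd / usd_per_million_tokens * 1_000_000)


# Hypothetical prices, for illustration only (not real quotes).
CLOSED_API_USD_PER_M = 10.00
OPEN_MODEL_USD_PER_M = 0.50

# Same token budget, very different spend.
same_tokens = 10_000
closed_cost = cost_usd(same_tokens, CLOSED_API_USD_PER_M)
open_cost = cost_usd(same_tokens, OPEN_MODEL_USD_PER_M)

# Same dollar budget, very different thinking budget.
budget_usd = 0.10
closed_tokens = tokens_for_budget(budget_usd, CLOSED_API_USD_PER_M)
open_tokens = tokens_for_budget(budget_usd, OPEN_MODEL_USD_PER_M)

print(f"{same_tokens} tokens: closed ${closed_cost:.3f}, open ${open_cost:.3f}")
print(f"${budget_usd:.2f} budget: closed {closed_tokens} tokens, open {open_tokens} tokens")
```

Under these assumed prices, an eval that fixes only the token budget understates how much thinking the open model could afford for the same spend, which is why reporting both axes is useful.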

Links:
- https://x.com/swyx/status/2070949306060931312
- https://x.com/swyx/status/2070971772548366788

Thibault Sottiaux, Codex and ChatGPT at OpenAI

Thibault Sottiaux posted a dense Codex update: long threads behave more smoothly, navigation and settings search are easier to use, zoom changes no longer misplace tooltips, dialogs, and menus, content copied to Slack keeps its Markdown formatting, and large text pastes no longer freeze the interface. The signal is that Codex is being hardened against the friction of real daily use, not just improved in core model capability.

Links:
- https://x.com/thsottiaux/status/2071071289247244481
- https://x.com/thsottiaux/status/2071077932244570112
- https://x.com/thsottiaux/status/2071089307062837744

Peter Yang

Peter Yang pushed back on a common framework that grades autonomy in levels: if a problem has been burning for days, holding off on telling others until some higher "level" is reached can make things worse than surfacing the issue early and solving it together. He also shared his Hermes health-check workflow, which produces a weekly health report from Withings, Fitbit, Google Health, an MCP server, and a mobile fitness app he vibe-coded.

Links:
- https://x.com/petergyang/status/2071058953115767275
- https://x.com/petergyang/status/2070906940352520477

Guillermo Rauch, Vercel CEO

Guillermo Rauch warned that cybersecurity capabilities driven by frontier models cut both ways: they are powerful for defense and for offense. His practical recommendation was to run deepsec or a similar harness with current frontier models, because companies that remain unaware of their own latent vulnerabilities will be more exposed once equivalent offensive capabilities spread.

Links:
- https://x.com/rauchg/status/2071047674187714830
- https://x.com/rauchg/status/2070982746080715052
- https://x.com/rauchg/status/2071085680017773046

Aaron Levie, Box CEO

Aaron Levie argued that token-cost optimization in enterprise AI is not a generic prompt trick. The valuable layer is the applied AI company that deeply understands workflows, context, business processes, evals, UX, and adoption, because that layer can deliver more intelligence per dollar for a specific use case.

Link:
- https://x.com/levie/status/2070937863806751154

Matt Turck, FirstMark Capital VC

Matt Turck used the history of smart glasses to make a market-timing point: Google, Microsoft, Meta, Apple, and now Snap have all tried different framings for the category, but mainstream demand remains unproven. The builder lesson is that AI alone may not rescue a product form whose core behavior has not yet crossed the thresholds of social acceptance and everyday utility.

Link:
- https://x.com/mattturck/status/2070972014945243622

Zara Zhang

Zara Zhang shared a sharp example of AI-native building leverage: within a year she went from barely knowing GitHub to 10k GitHub followers, while openly saying she still cannot write code by hand. Her framing is that the real skill is not traditional coding but connecting technology to user problems, solving her own pain points, and telling clear product stories.

Links:
- https://x.com/zarazhangrui/status/2070982013822333007
- https://x.com/zarazhangrui/status/2070982170219593904
- https://x.com/zarazhangrui/status/2071116793234813272

PODCASTS

Training Data: "Memory and Continual Learning: Engram's Dan Biderman and Jessy Lin"

The Takeaway: Engram's bet is that the next frontier for useful AI is not just bigger models or longer context windows, but models that continuously internalize a team's evolving context into their weights.

Dan Biderman and Jessy Lin describe Engram as a lab focused on memory and continual learning. Their core claim is that context engineering, RAG, tool use, and long prompts are useful but incomplete: companies need models that understand their workflows the way a long-tenured employee does. That means deciding which knowledge should stay external and which patterns should be internalized through training, adapters, LoRAs, supervised fine-tuning, RL, or distillation.

The most interesting economic argument is token compression. If a model has deeply learned a company's priorities, people, tools, and workflow conventions, it may answer in hundreds of tokens what a generic model would need tens or hundreds of thousands of retrieved context tokens to reconstruct. Engram is therefore selling memory not just as recall, but as lower inference cost, better domain behavior, and faster adaptation during the several-month gap before frontier models absorb a new capability.
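
The token-compression argument can be sketched as simple per-query arithmetic. Every number below is a hypothetical placeholder rather than a figure from the episode, and the flat per-token price is an assumption made to keep the comparison simple.

```python
# All numbers are hypothetical, chosen only to illustrate the argument.
USD_PER_MILLION_TOKENS = 2.00  # assumed flat price, same for both setups


def query_cost_usd(context_tokens: int, answer_tokens: int) -> float:
    """Cost of one query when every token is billed at the same flat rate."""
    return (context_tokens + answer_tokens) * USD_PER_MILLION_TOKENS / 1_000_000


# Generic model: needs a large block of retrieved context on every query.
generic_cost = query_cost_usd(context_tokens=100_000, answer_tokens=500)

# Model that has internalized the team's context: short prompt, same answer length.
internalized_cost = query_cost_usd(context_tokens=500, answer_tokens=500)

savings_ratio = generic_cost / internalized_cost
print(f"generic: ${generic_cost:.4f} per query")
print(f"internalized: ${internalized_cost:.4f} per query")
print(f"ratio: {savings_ratio:.1f}x")
```

This sketch leaves out the cost of training or updating the internalized model, so it shows only the per-query side of the trade-off.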

Link:
- https://www.youtube.com/watch?v=aiR7F4jqjXY

Generated through the Follow Builders skill: https://github.com/zarazhangrui/follow-builders