![[State of RL/Reasoning] IMO/IOI Gold, OpenAI o3/GPT-5, and Cursor Composer — Ashvin Nair, Cursor](https://assets.flightcast.com/V2Uploads/nvaja2542wefzb8rjg5f519m/01K4D8FB4MNA071BM5ZDSMH34N/square.jpg)
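This interview was recorded on site at NeurIPS 2025 with Ashvin Nair, a former member of OpenAI's o1/o3 team who is now the machine learning lead at Cursor. He describes his unusual path from a robotics PhD to large language model research, goes deep on how the o-series reasoning models took shape inside OpenAI, and looks ahead to the future of AI agents and robotics.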
Call to action: on behalf of Cursor, Ashvin invites people interested in code data, reward models, and product-model co-design to join the team.
From Berkeley robotics and OpenAI's 2017 Dota-era internship to shipping RL breakthroughs on GPT-4o, o1, and o3, and now leading model development at Cursor, Ashvin Nair has done it all.
We caught up with Ashvin at NeurIPS 2025 to dig into the inside story of OpenAI's reasoning team (spoiler: it went from a dozen people to 300+), why IOI Gold felt reachable in 2022 but somehow didn't change the world when o1 actually achieved it, and how RL doesn't generalize beyond the training distribution (and why that means you need to bring economically useful tasks into distribution by co-designing products and models). We also cover the deeper lessons from the RL research era (2017–2022) and why most of it didn't pan out because the community overfit to benchmarks, and how Cursor is uniquely positioned to do continual learning at scale, with policy updates every two hours and product-model co-design that keeps engineers in the loop instead of context-switching into ADHD hell. Finally, he shares his bet that the next paradigm shift is continual learning with infinite memory, where models experience something once (a bug, a mistake, a user pattern) and never forget it, storing millions of deployment tokens in weights without overloading capacity.