Microsoft Research
344K subscribers
1,156 views · 27 likes · 2025/03/03
Speaker: Diganta Misra
Host: Sanchit Ahuja
In the fast-evolving world of software libraries, code generation models are struggling to keep pace. Most existing benchmarks focus on static, version-agnostic code predictions, failing to capture the true complexity of adapting to frequent updates and maintaining compatibility with multiple library versions. To address this gap, we introduce GitChameleon, a novel dataset featuring 116 Python code completion tasks, each tied to specific library versions and accompanied by executable unit tests. This dataset is designed to rigorously evaluate the ability of large language models (LLMs) to generate version-specific code that is both syntactically correct and functionally accurate. Our findings are revealing: even state-of-the-art models like GPT-4o achieve a pass@10 of just 39.9% (43.7% with error feedback), highlighting significant limitations in their ability to adapt to versioned code. In this talk, I’ll explore why today’s LLMs, while impressive, still fall short in the dynamic landscape of evolving software libraries. By examining these challenges, we hope to spark a conversation about how to build more adaptable, reliable code generation tools for the future.
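For readers unfamiliar with the pass@10 figure quoted above, here is a minimal sketch of how a pass@k score is commonly computed from unit-test outcomes. It uses the widely cited unbiased estimator 1 - C(n-c, k)/C(n, k), where n is the number of samples generated for a task and c is the number that pass the tests. Whether GitChameleon uses exactly this estimator is an assumption; the function names and the sample counts below are hypothetical and not taken from the talk.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k for one task.

    n: total samples generated for the task
    c: samples that passed the task's unit tests
    k: number of attempts the metric allows
    """
    if not (0 <= c <= n) or not (1 <= k <= n):
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    # If fewer than k samples fail, every size-k draw contains a passing one.
    if n - c < k:
        return 1.0
    # Probability that a random size-k subset contains no passing sample,
    # subtracted from 1.
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Average per-task pass@k over (n, c) pairs, one pair per task."""
    if not results:
        raise ValueError("results must not be empty")
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)


if __name__ == "__main__":
    # Hypothetical outcomes for three tasks, 20 samples each.
    outcomes = [(20, 0), (20, 1), (20, 20)]
    print(benchmark_pass_at_k(outcomes, k=10))
```

A benchmark-level score is then the mean over tasks, so a task with no passing sample contributes 0 and a task where every sample passes contributes 1.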