Adapting Code LLMs for Under-Explored Domains
The talk surveys why code language models degrade once they move beyond English, Python, and the benchmarks the field already optimizes for. It then presents the data, modeling, and evaluation work (mHumanEval, TigerLLM, TigerCoder, MojoBench) that makes these under-explored settings measurable and improvable.