Publications
You can also find my publications on my Google Scholar profile.
Code LLM & Agents
Building intelligent coding assistants and autonomous agents that understand and generate code
Shansan Gong, Ruixiang Zhang, Huangjie Zheng, Jiatao Gu, Navdeep Jaitly, Lingpeng Kong, Yizhe Zhang, arXiv preprint, 2025
Abstract
Diffusion large language models (dLLMs) are compelling alternatives to autoregressive (AR) models because their denoising process operates over the entire sequence rather than token by token. We train a 7B model on 130B code tokens and propose coupled-GRPO, a novel reinforcement learning scheme built on a coupled sampling strategy. Our approach achieves a +4.4% gain on EvalPlus and shows how diffusion models can reduce their dependence on autoregressive decoding bias during code generation.
778 GitHub stars - Masked diffusion for code generation with Coupled-GRPO, achieving +4.4% on EvalPlus
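As a rough illustration of the decoding paradigm at play here, the sketch below shows generic confidence-based unmasking for a masked diffusion language model; it is not the DiffuCoder or coupled-GRPO implementation, and `model` is a hypothetical stand-in for a trained denoiser.

```python
# Generic masked-diffusion decoding sketch (not the DiffuCoder or coupled-GRPO
# implementation): start from an all-[MASK] sequence and unmask the most
# confident positions over a fixed number of denoising steps.
import torch


def masked_diffusion_decode(model, mask_id: int, length: int, steps: int = 16):
    tokens = torch.full((1, length), mask_id)            # fully masked start
    for step in range(steps):
        logits = model(tokens)                           # (1, length, vocab)
        probs = logits.softmax(-1)
        confidence, candidates = probs.max(-1)
        still_masked = tokens.eq(mask_id)
        # Reveal roughly 1/(remaining steps) of the masked positions,
        # choosing those the model is most confident about.
        n_reveal = max(1, int(still_masked.sum().item() / (steps - step)))
        confidence = confidence.masked_fill(~still_masked, -1.0)
        reveal = confidence.topk(n_reveal, dim=-1).indices
        tokens[0, reveal[0]] = candidates[0, reveal[0]]
    return tokens
```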
Jiayi Pan, Xingyao Wang, Graham Neubig, Navdeep Jaitly, Heng Ji, Alane Suhr, Yizhe Zhang, ICML, 2025
Abstract
We introduce SWE-Gym, a new training environment containing 2,438 real-world Python tasks, each with an executable codebase, unit tests, and natural language specifications. Fine-tuning language model-based software engineering agents on this dataset achieves up to 19% absolute gains in resolve rate on SWE-Bench Verified and Lite test sets. We further explore inference-time scaling via verifiers trained on agent trajectories, reaching state-of-the-art results for open-weight agents: 32.0% and 26.0% on the respective benchmarks. SWE-Gym, trained models, and agent trajectories are publicly available to support future research.
602 GitHub stars - Training framework for software engineering agents with real-world GitHub tasks
Xingyao Wang, Yangyi Chen, Lifan Yuan, Yizhe Zhang, Yunzhu Li, Hao Peng, Heng Ji, ICML, 2024
Abstract
Large language model (LLM) agents have demonstrated remarkable capabilities in automating complex tasks across diverse domains. However, most existing approaches rely on generating natural language actions or constrained predefined action spaces, which can be ambiguous or inflexible for real-world applications. We propose CodeAct, a framework that enables LLM agents to express and execute actions through executable Python code. This approach offers several advantages: (1) Python's expressiveness allows agents to combine multiple primitive actions flexibly, (2) code execution provides deterministic and verifiable action outcomes, and (3) the structured nature of code facilitates better error handling and debugging. We evaluate CodeAct on diverse interactive tasks spanning web browsing, database querying, and embodied control. Our experiments show that CodeAct agents consistently outperform both natural language and predefined action baselines, achieving state-of-the-art results while being more sample-efficient and robust to distribution shifts.
CodeAct agent achieves state-of-the-art on diverse interactive tasks using executable Python code
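To make the idea concrete, here is a minimal sketch of a CodeAct-style agent loop in which every action is a Python program executed in an interpreter; `query_llm` and the `TASK_COMPLETE` stop signal are hypothetical placeholders, not the paper's API.

```python
# Minimal sketch of a code-acting agent loop (illustrative only).
import contextlib
import io


def query_llm(history: list[str]) -> str:
    """Placeholder for an LLM call that returns a Python code action."""
    raise NotImplementedError


def run_code_action(code: str, env: dict) -> str:
    """Execute the generated code and capture stdout as the observation."""
    buffer = io.StringIO()
    try:
        with contextlib.redirect_stdout(buffer):
            exec(code, env)  # in practice this runs inside a sandbox
    except Exception as exc:  # surface errors so the agent can self-correct
        return f"Error: {exc!r}"
    return buffer.getvalue()


def codeact_loop(task: str, max_turns: int = 5) -> list[str]:
    history, env = [f"Task: {task}"], {}
    for _ in range(max_turns):
        code = query_llm(history)           # agent emits an executable action
        observation = run_code_action(code, env)
        history += [f"Action:\n{code}", f"Observation:\n{observation}"]
        if "TASK_COMPLETE" in observation:  # hypothetical stop signal
            break
    return history
```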
Xingyao Wang, Boxuan Li, Yufan Song, Frank F. Xu, Xiangru Tang, Mingchen Zhuge, Jiayi Pan, Yueqi Song, Bowen Li, Jaskirat Singh, et al., ICLR, 2024
Abstract
Recent advances in large language models (LLMs) have enabled AI systems to perform increasingly complex software engineering tasks. However, building generalist AI agents that can operate effectively across diverse real-world software development scenarios remains challenging. We present OpenHands, an open platform designed to enable the development and evaluation of AI agents that can act as generalist software developers. OpenHands provides a unified framework for agents to interact with software development environments through standardized actions, observations, and sandboxed execution. The platform supports diverse agent architectures and enables systematic evaluation on comprehensive benchmarks spanning code generation, debugging, issue resolution, and repository understanding tasks. We demonstrate that agents built on OpenHands can effectively tackle real-world GitHub issues and compete with state-of-the-art proprietary systems while being fully open-source. Our platform enables the research community to collaboratively advance towards more capable and generalizable AI software developers.
65.8k GitHub stars - Open platform enabling AI agents to perform complex software engineering tasks
Long-Horizon Planning
Enabling LLMs to perform complex, multi-step reasoning and planning over extended sequences
Yizhe Zhang, Jiarui Lu, Navdeep Jaitly, ACL, 2024
Abstract
Large language models (LLMs) have shown impressive capabilities in various NLP tasks, but their ability to perform multi-step reasoning and strategic planning in conversational settings remains unclear. We introduce the Entity-Deduction Arena, a benchmark designed to probe the conversational reasoning and planning capabilities of LLMs through an entity deduction game. In this game, an AI agent must identify a hidden entity by asking strategic yes/no questions, requiring the model to maintain context, reason about information gain, and plan question sequences effectively. Our comprehensive evaluation of state-of-the-art LLMs reveals that while they can handle short reasoning chains, they struggle significantly with long-horizon planning, often failing to ask informative questions or properly utilize previous answers. We analyze common failure modes including premature convergence, circular reasoning, and inefficient information gathering. Our findings highlight fundamental limitations in current LLMs' ability to perform sustained strategic reasoning in interactive settings and provide insights for developing more capable conversational AI systems.
Benchmark revealing LLMs struggle with long-horizon conversational reasoning and strategic information gathering
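The gist of the benchmark's game loop can be sketched as follows, assuming two hypothetical LLM roles (`ask_questioner` and `answer_judge`); this is illustrative pseudocode made runnable, not the released evaluation harness.

```python
# Illustrative entity-deduction game loop in the spirit of the benchmark.
def ask_questioner(dialogue: list[str]) -> str:
    """Questioner LLM proposes the next yes/no question or a final guess."""
    raise NotImplementedError


def answer_judge(secret_entity: str, question: str) -> str:
    """Judge LLM answers 'Yes', 'No', or 'Maybe' about the hidden entity."""
    raise NotImplementedError


def play_game(secret_entity: str, max_turns: int = 20) -> bool:
    dialogue = []
    for turn in range(max_turns):
        question = ask_questioner(dialogue)
        if question.lower().startswith("is it "):        # final guess
            guess = question[6:].rstrip("?").strip()
            return guess.lower() == secret_entity.lower()
        reply = answer_judge(secret_entity, question)
        dialogue += [f"Q{turn + 1}: {question}", f"A{turn + 1}: {reply}"]
    return False  # ran out of turns without a correct guess
```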
Yizhe Zhang, Jiatao Gu, Zhuofeng Wu, Shuangfei Zhai, Joshua Susskind, Navdeep Jaitly, NeurIPS, 2023
Abstract
Autoregressive models for text sometimes generate repetitive and low-quality output because errors accumulate during generation. We propose PLANNER, which combines latent semantic diffusion with autoregressive generation to produce fluent long-form text while maintaining paragraph-level control. The approach pairs a decoding module with a planning module that generates semantic embeddings progressively, demonstrating effectiveness on semantic generation, text completion, and summarization tasks.
Latent diffusion with planning module for controlled and diverse paragraph generation
RAG & Reasoning with Continuous Tokens
Retrieval-augmented generation and reasoning systems using continuous token representations
Jie He, Richard He Bai, Sinead Williamson, Jeff Z. Pan, Navdeep Jaitly, Yizhe Zhang, arXiv preprint, 2025
Abstract
Retrieval-augmented generation (RAG) enhances large language models (LLMs) with external knowledge but still suffers from long-context overhead and from retrieval and generation being optimized separately. We propose CLaRa, a framework that performs embedding-based compression and joint optimization in a shared continuous space. It introduces SCP, a data synthesis framework, and trains all components end-to-end with a language modeling loss and a differentiable top-k estimator, achieving state-of-the-art compression and reranking performance on QA benchmarks.
Continuous latent reasoning bridges retrieval and generation with joint optimization in shared continuous space
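As one way to picture a differentiable top-k, the sketch below uses a successive-softmax relaxation for passage selection; this is a common relaxation chosen purely for illustration and is not necessarily the estimator used in CLaRa.

```python
# Hedged sketch of a differentiable top-k relaxation for passage selection.
import torch


def soft_topk_weights(scores: torch.Tensor, k: int, tau: float = 0.5):
    """Return (batch, n_passages) weights that sum to k and are differentiable."""
    weights = torch.zeros_like(scores)
    masked = scores.clone()
    for _ in range(k):                       # iteratively peel off k soft picks
        pick = torch.softmax(masked / tau, dim=-1)
        weights = weights + pick
        # Suppress already-picked items in subsequent rounds.
        masked = masked + torch.log1p(-pick.clamp(max=1 - 1e-6))
    return weights


scores = torch.randn(2, 8, requires_grad=True)
w = soft_topk_weights(scores, k=3)
w.sum().backward()                           # gradients flow back to the scores
```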
Haoqiang Kang, Yizhe Zhang, Nikki Lijing Kuang, Nicklas Majamaki, Navdeep Jaitly, Yi-An Ma, Lianhui Qin, arXiv preprint, 2025
Abstract
Large language models (LLMs) demonstrate reasoning through chain-of-thought (CoT) generation. However, their autoregressive decoding may limit the ability to revisit and refine earlier tokens holistically, leading to inefficient exploration of diverse solutions. We propose LaDiR, which combines a variational autoencoder with a latent diffusion model to enable iterative refinement of reasoning. It encodes reasoning steps into thought tokens and uses blockwise bidirectional attention to generate diverse reasoning trajectories in parallel, showing improvements in accuracy, diversity, and interpretability across mathematical reasoning and planning benchmarks.
Latent diffusion with VAE for iterative refinement and diverse reasoning trajectories
Text Diffusion Models
Advancing non-autoregressive generation through diffusion-based approaches
Shansan Gong, Zijing Ou, Yizhe Zhang, Navdeep Jaitly, Mukai Li, arXiv preprint, 2025
Abstract
Autoregressive language models (ARMs) deliver strong likelihoods, but are inherently serial: they generate one token per forward pass, which limits throughput and inflates latency for long sequences. We introduce FS-DFM (Few-Step Discrete Flow-Matching), a diffusion language model designed to accelerate text generation. The model maintains consistent quality across different sampling step budgets with a stable update rule using teacher guidance from long-run trajectories. It achieves perplexity parity with 1,024-step baselines using only 8 steps and delivers up to 128× faster sampling for 1,024-token generation.
Achieves perplexity parity with 1024-step baselines using only 8 steps, 128× faster
Ruixiang Zhang, Shuangfei Zhai, Linh Tran, Yizhe Zhang, Tao Wang, Navdeep Jaitly, Joshua Susskind, arXiv preprint, 2025
Abstract
Standard discrete diffusion models treat all unobserved states identically by mapping them to an absorbing [MASK] token, creating an information void where semantic information that could be inferred from unmasked tokens is lost between denoising steps. We present CADD, which augments discrete state spaces with paired continuous latent space diffusion. This approach represents masked tokens as noisy yet informative vectors rather than collapsed states. At sampling time, the continuous latent can guide discrete denoising while enabling trade-offs between diverse outputs and contextually precise generation, demonstrating improvements over mask-based diffusion across text, image synthesis, and code modeling tasks.
Augments discrete states with continuous latents to avoid information void in masked tokens
Haoqiang Kang, Yizhe Zhang, Nikki Lijing Kuang, Nicklas Majamaki, Navdeep Jaitly, Yi-An Ma, Lianhui Qin, arXiv preprint, 2025
Abstract
Large language models (LLMs) demonstrate reasoning through chain-of-thought (CoT) generation. However, their autoregressive decoding may limit the ability to revisit and refine earlier tokens holistically, leading to inefficient exploration of diverse solutions. We propose LaDiR, which combines a variational autoencoder with a latent diffusion model to enable iterative refinement of reasoning. It encodes reasoning steps into thought tokens and uses blockwise bidirectional attention to generate diverse reasoning trajectories in parallel, showing improvements in accuracy, diversity, and interpretability across mathematical reasoning and planning benchmarks.
Latent diffusion with VAE for iterative refinement and diverse reasoning trajectories
Shansan Gong, Ruixiang Zhang, Huangjie Zheng, Jiatao Gu, Navdeep Jaitly, Lingpeng Kong, Yizhe Zhang, arXiv preprint, 2025
Abstract
Diffusion large language models (dLLMs) are compelling alternatives to autoregressive (AR) models because their denoising process operates over the entire sequence rather than token by token. We train a 7B model on 130B code tokens and propose coupled-GRPO, a novel reinforcement learning scheme built on a coupled sampling strategy. Our approach achieves a +4.4% gain on EvalPlus and shows how diffusion models can reduce their dependence on autoregressive decoding bias during code generation.
778 GitHub stars - Masked diffusion for code generation with Coupled-GRPO, achieving +4.4% on EvalPlus
Ruixiang Zhang, Shuangfei Zhai, Yizhe Zhang, James Thornton, Zijing Ou, Joshua Susskind, Navdeep Jaitly, ICML, 2025
Abstract
We present Target Concrete Score Matching (TCSM), a novel training objective for discrete diffusion models. TCSM provides a general framework with broad applicability, supporting pre-training discrete diffusion models directly from data samples. Many existing discrete diffusion approaches naturally emerge as special cases of our more general TCSM framework. The framework enables fine-tuning using reward functions, preference data, and knowledge distillation from autoregressive models by estimating the concrete score of the target distribution in the original clean data space.
Unified theoretical framework for discrete diffusion supporting pre-training and fine-tuning with rewards
Ruixiang Zhang, Shuangfei Zhai, Jiatao Gu, Yizhe Zhang, Huangjie Zheng, Tianrong Chen, Miguel Angel Bautista, Josh Susskind, Navdeep Jaitly, NeurIPS, 2025
Abstract
The paper presents TarFlowLM, a framework that reimagines language modeling by operating in continuous latent space rather than discrete tokens. We propose using transformer-based autoregressive normalizing flows to model these continuous representations, enabling bidirectional context capture through alternating-direction transformations and block-wise generation with variable token patch sizes. The approach introduces mixture-based coupling transformations to handle complex dependencies within the latent space and establishes theoretical links to conventional discrete autoregressive models. Experimental results demonstrate competitive likelihood performance while showcasing the framework's flexible modeling capabilities.
TarFlowLM reimagines language modeling in continuous latent space with bidirectional context capture
Jiatao Gu, Ying Shen, Shuangfei Zhai, Yizhe Zhang, Navdeep Jaitly, Joshua M. Susskind, NeurIPS, 2024
Abstract
We introduce Kaleido, a method for enhancing image generation diversity in conditional diffusion models. Diffusion models often exhibit limited diversity in the sampled images, particularly when sampling with a high classifier-free guidance weight. Our solution integrates an autoregressive language model that processes captions and generates intermediate latent representations—including textual descriptions, bounding boxes, object blobs, and visual tokens. These diverse latent variables serve as enriched conditioning signals for the diffusion process. Our experimental findings demonstrate that Kaleido successfully increases the variety of generated images while preserving quality and maintaining fidelity to the generated latent guidance signals, thereby enabling improved control over image generation outcomes.
Integrates autoregressive language model to generate enriched conditioning signals for diverse image generation
Shansan Gong, Shivam Agarwal, Yizhe Zhang, Jiacheng Ye, Lin Zheng, Mukai Li, Chenxin An, Peilin Zhao, Wei Bi, Jiawei Han, et al., ICLR, 2024
Abstract
Diffusion Language Models (DLMs) have emerged as a promising new paradigm for text generative modeling, potentially addressing limitations of autoregressive (AR) models. We demonstrate how to convert existing autoregressive models (GPT2 and LLaMA, ranging from 127M to 7B parameters) into diffusion models called DiffuGPT and DiffuLLaMA. Using less than 200B tokens, we achieve models competitive with their autoregressive counterparts while enabling unique capabilities like fill-in-the-middle generation without prompt reordering.
Convert GPT2 and LLaMA (127M-7B) to diffusion models competitive with AR counterparts
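A hedged sketch of the kind of masked-denoising objective involved in adapting an AR model into a diffusion LM is shown below; the actual recipe (attention-mask annealing, shift operations, schedules) follows the paper, not this toy, and `model` is assumed to run with bidirectional attention.

```python
# Toy masked-denoising training step: sample a masking rate, corrupt the input,
# and score only the masked positions with full (bidirectional) attention.
import torch
import torch.nn.functional as F


def diffusion_adaptation_loss(model, input_ids: torch.Tensor, mask_id: int):
    batch, length = input_ids.shape
    rate = torch.rand(batch, 1, device=input_ids.device)   # per-sequence mask rate
    is_masked = torch.rand(batch, length, device=input_ids.device) < rate
    corrupted = torch.where(
        is_masked, torch.full_like(input_ids, mask_id), input_ids
    )
    logits = model(corrupted)                               # (batch, length, vocab)
    loss = F.cross_entropy(
        logits[is_masked],                                  # (n_masked, vocab)
        input_ids[is_masked],                               # (n_masked,)
        reduction="mean",
    )
    return loss
```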
Jiatao Gu, Yuyang Wang, Yizhe Zhang, Qihang Zhang, Dinghuai Zhang, Navdeep Jaitly, Josh Susskind, Shuangfei Zhai, ICLR, 2024
Abstract
We introduce DART, a transformer-based approach to text-to-image generation that unifies autoregressive (AR) modeling and diffusion within a non-Markovian framework, allowing it to iteratively denoise image patches using an architecture similar to standard language models. Unlike traditional diffusion models limited by their Markovian property, DART overcomes this constraint without requiring image quantization, enabling more effective image modeling. A single architecture handles both text and image data under a unified training scheme. DART demonstrates competitive performance on class-conditioned and text-to-image tasks, providing a scalable, efficient alternative to traditional diffusion models and setting a new benchmark for high-quality image synthesis.
Unifies autoregressive and diffusion in non-Markovian framework for scalable text-to-image generation
Jiatao Gu, Shuangfei Zhai, Yizhe Zhang, Joshua M Susskind, Navdeep Jaitly, ICLR, 2023
Abstract
We introduce an innovative framework for high-resolution image and video generation. Our approach proposes a diffusion process that processes inputs across multiple resolutions simultaneously, utilizing a NestedUNet architecture in which features and parameters for small-scale inputs are nested within those of larger scales. A key innovation is a progressive training schedule that moves from lower to higher resolutions, substantially improving optimization for high-resolution outputs. The method demonstrates capabilities across diverse applications including class-conditioned image generation, high-resolution text-to-image synthesis, and text-to-video tasks. Notably, the approach enables training a single pixel-space model at resolutions up to 1024×1024, achieving strong zero-shot generalization using the CC12M dataset, which contains only 12 million images.
Multi-resolution diffusion with NestedUNet achieves 1024x1024 generation with strong zero-shot generalization
Yizhe Zhang, Jiatao Gu, Zhuofeng Wu, Shuangfei Zhai, Joshua Susskind, Navdeep Jaitly, NeurIPS, 2023
Abstract
Autoregressive models for text sometimes generate repetitive and low-quality output because errors accumulate during generation. We propose PLANNER, which combines latent semantic diffusion with autoregressive generation to produce fluent long-form text while maintaining paragraph-level control. The approach pairs a decoding module with a planning module that generates semantic embeddings progressively, demonstrating effectiveness on semantic generation, text completion, and summarization tasks.
Latent diffusion with planning module for controlled and diverse paragraph generation
Coding-Based AI Scientist
Developing AI systems that can autonomously discover knowledge through code
All Publications (Chronological)
Preprint
Shansan Gong, Zijing Ou, Yizhe Zhang, Navdeep Jaitly, Mukai Li, arXiv preprint, 2025
Abstract
Autoregressive language models (ARMs) deliver strong likelihoods, but are inherently serial: they generate one token per forward pass, which limits throughput and inflates latency for long sequences. We introduce FS-DFM (Few-Step Discrete Flow-Matching), a diffusion language model designed to accelerate text generation. The model maintains consistent quality across different sampling step budgets with a stable update rule using teacher guidance from long-run trajectories. It achieves perplexity parity with 1,024-step baselines using only 8 steps and delivers up to 128× faster sampling for 1,024-token generation.
Ruixiang Zhang, Shuangfei Zhai, Linh Tran, Yizhe Zhang, Tao Wang, Navdeep Jaitly, Joshua Susskind, arXiv preprint, 2025
Abstract
Standard discrete diffusion models treat all unobserved states identically by mapping them to an absorbing [MASK] token, creating an information void where semantic information that could be inferred from unmasked tokens is lost between denoising steps. We present CADD, which augments discrete state spaces with paired continuous latent space diffusion. This approach represents masked tokens as noisy yet informative vectors rather than collapsed states. At sampling time, the continuous latent can guide discrete denoising while enabling trade-offs between diverse outputs and contextually precise generation, demonstrating improvements over mask-based diffusion across text, image synthesis, and code modeling tasks.
Jie He, Richard He Bai, Sinead Williamson, Jeff Z. Pan, Navdeep Jaitly, Yizhe Zhang, arXiv preprint, 2025
Abstract
Retrieval-augmented generation (RAG) enhances large language models (LLMs) with external knowledge but still suffers from long-context overhead and from retrieval and generation being optimized separately. We propose CLaRa, a framework that performs embedding-based compression and joint optimization in a shared continuous space. It introduces SCP, a data synthesis framework, and trains all components end-to-end with a language modeling loss and a differentiable top-k estimator, achieving state-of-the-art compression and reranking performance on QA benchmarks.
Haoqiang Kang, Yizhe Zhang, Nikki Lijing Kuang, Nicklas Majamaki, Navdeep Jaitly, Yi-An Ma, Lianhui Qin, arXiv preprint, 2025
Abstract
Large language models (LLMs) demonstrate reasoning through chain-of-thought (CoT) generation. However, their autoregressive decoding may limit the ability to revisit and refine earlier tokens holistically, leading to inefficient exploration of diverse solutions. We propose LaDiR, which combines a variational autoencoder with a latent diffusion model to enable iterative refinement of reasoning. It encodes reasoning steps into thought tokens and uses blockwise bidirectional attention to generate diverse reasoning trajectories in parallel, showing improvements in accuracy, diversity, and interpretability across mathematical reasoning and planning benchmarks.
Shansan Gong, Ruixiang Zhang, Huangjie Zheng, Jiatao Gu, Navdeep Jaitly, Lingpeng Kong, Yizhe Zhang, arXiv preprint, 2025
Abstract
Diffusion large language models (dLLMs) are compelling alternatives to autoregressive (AR) models because their denoising process operates over the entire sequence rather than token by token. We train a 7B model on 130B code tokens and propose coupled-GRPO, a novel reinforcement learning scheme built on a coupled sampling strategy. Our approach achieves a +4.4% gain on EvalPlus and shows how diffusion models can reduce their dependence on autoregressive decoding bias during code generation.
Jiachang Liu, Dinghan Shen, Yizhe Zhang, Bill Dolan, Lawrence Carin, Weizhu Chen, arXiv preprint, 2022
Abstract
This paper explores applying GPT-3's few-shot learning to the SemEval 2021 MeasEval task, which involves identifying measurements and their associated attributes in scientific literature. Despite initial promise, we found that GPT-3 underperformed compared to our prior multi-turn question-answering approach. We identified several limitations: technical constraints included limits on the size of the prompt and answer, restricting the training signal available. More fundamentally, we discovered that generative models struggle with factual retention, and prompt modifications produced unpredictable results that hindered systematic performance improvement.
Deng Cai, Yizhe Zhang, Yichen Huang, Wai Lam, Bill Dolan, arXiv preprint, 2022
Yizhe Zhang, Deng Cai, arXiv preprint, 2022
Xiang Gao, Yizhe Zhang, Michel Galley, Bill Dolan, arXiv preprint, 2022
2025
Ruixiang Zhang, Shuangfei Zhai, Yizhe Zhang, James Thornton, Zijing Ou, Joshua Susskind, Navdeep Jaitly, ICML, 2025
Abstract
We present Target Concrete Score Matching (TCSM), a novel training objective for discrete diffusion models. TCSM provides a general framework with broad applicability, supporting pre-training discrete diffusion models directly from data samples. Many existing discrete diffusion approaches naturally emerge as special cases of our more general TCSM framework. The framework enables fine-tuning using reward functions, preference data, and knowledge distillation from autoregressive models by estimating the concrete score of the target distribution in the original clean data space.
Ruixiang Zhang, Shuangfei Zhai, Jiatao Gu, Yizhe Zhang, Huangjie Zheng, Tianrong Chen, Miguel Angel Bautista, Josh Susskind, Navdeep Jaitly, NeurIPS, 2025
Abstract
The paper presents TarFlowLM, a framework that reimagines language modeling by operating in continuous latent space rather than discrete tokens. We propose using transformer-based autoregressive normalizing flows to model these continuous representations, enabling bidirectional context capture through alternating-direction transformations and block-wise generation with variable token patch sizes. The approach introduces mixture-based coupling transformations to handle complex dependencies within the latent space and establishes theoretical links to conventional discrete autoregressive models. Experimental results demonstrate competitive likelihood performance while showcasing the framework's flexible modeling capabilities.
2024
Jiatao Gu, Ying Shen, Shuangfei Zhai, Yizhe Zhang, Navdeep Jaitly, Joshua M. Susskind, NeurIPS, 2024
Abstract
We introduce Kaleido, a method for enhancing image generation diversity in conditional diffusion models. Diffusion models often exhibit limited diversity in the sampled images, particularly when sampling with a high classifier-free guidance weight. Our solution integrates an autoregressive language model that processes captions and generates intermediate latent representations—including textual descriptions, bounding boxes, object blobs, and visual tokens. These diverse latent variables serve as enriched conditioning signals for the diffusion process. Our experimental findings demonstrate that Kaleido successfully increases the variety of generated images while preserving quality and maintaining fidelity to the generated latent guidance signals, thereby enabling improved control over image generation outcomes.
Xingyao Wang, Yangyi Chen, Lifan Yuan, Yizhe Zhang, Yunzhu Li, Hao Peng, Heng Ji, ICML, 2024
Abstract
Large language model (LLM) agents have demonstrated remarkable capabilities in automating complex tasks across diverse domains. However, most existing approaches rely on generating natural language actions or constrained predefined action spaces, which can be ambiguous or inflexible for real-world applications. We propose CodeAct, a framework that enables LLM agents to express and execute actions through executable Python code. This approach offers several advantages: (1) Python's expressiveness allows agents to combine multiple primitive actions flexibly, (2) code execution provides deterministic and verifiable action outcomes, and (3) the structured nature of code facilitates better error handling and debugging. We evaluate CodeAct on diverse interactive tasks spanning web browsing, database querying, and embodied control. Our experiments show that CodeAct agents consistently outperform both natural language and predefined action baselines, achieving state-of-the-art results while being more sample-efficient and robust to distribution shifts.
Yizhe Zhang, Jiarui Lu, Navdeep Jaitly, ACL, 2024
Abstract
Large language models (LLMs) have shown impressive capabilities in various NLP tasks, but their ability to perform multi-step reasoning and strategic planning in conversational settings remains unclear. We introduce the Entity-Deduction Arena, a benchmark designed to probe the conversational reasoning and planning capabilities of LLMs through an entity deduction game. In this game, an AI agent must identify a hidden entity by asking strategic yes/no questions, requiring the model to maintain context, reason about information gain, and plan question sequences effectively. Our comprehensive evaluation of state-of-the-art LLMs reveals that while they can handle short reasoning chains, they struggle significantly with long-horizon planning, often failing to ask informative questions or properly utilize previous answers. We analyze common failure modes including premature convergence, circular reasoning, and inefficient information gathering. Our findings highlight fundamental limitations in current LLMs' ability to perform sustained strategic reasoning in interactive settings and provide insights for developing more capable conversational AI systems.
Shansan Gong, Shivam Agarwal, Yizhe Zhang, Jiacheng Ye, Lin Zheng, Mukai Li, Chenxin An, Peilin Zhao, Wei Bi, Jiawei Han, et al., ICLR, 2024
Abstract
Diffusion Language Models (DLMs) have emerged as a promising new paradigm for text generative modeling, potentially addressing limitations of autoregressive (AR) models. We demonstrate how to convert existing autoregressive models (GPT2 and LLaMA, ranging from 127M to 7B parameters) into diffusion models called DiffuGPT and DiffuLLaMA. Using less than 200B tokens, we achieve models competitive with their autoregressive counterparts while enabling unique capabilities like fill-in-the-middle generation without prompt reordering.
Jiatao Gu, Yuyang Wang, Yizhe Zhang, Qihang Zhang, Dinghuai Zhang, Navdeep Jaitly, Josh Susskind, Shuangfei Zhai, ICLR, 2024
Abstract
We introduce DART, a transformer-based approach to text-to-image generation that unifies autoregressive (AR) modeling and diffusion within a non-Markovian framework, allowing it to iteratively denoise image patches using an architecture similar to standard language models. Unlike traditional diffusion models limited by their Markovian property, DART overcomes this constraint without requiring image quantization, enabling more effective image modeling. A single architecture handles both text and image data under a unified training scheme. DART demonstrates competitive performance on class-conditioned and text-to-image tasks, providing a scalable, efficient alternative to traditional diffusion models and setting a new benchmark for high-quality image synthesis.
2023
Shuangfei Zhai, Tatiana Likhomanenko, Etai Littwin, Dan Busbridge, Jason Ramapuram, Yizhe Zhang, Jiatao Gu, Joshua M Susskind, ICML, 2023
Abstract
We investigate training instability in Transformers by analyzing attention layer dynamics. Our research found that low attention entropy is accompanied by high training instability, which can take the form of oscillating loss or divergence. We propose σReparam, a technique that reparametrizes linear layers using spectral normalization plus a learned scalar to prevent entropy collapse. We provide theoretical grounding by proving that attention entropy decreases exponentially with the spectral norm of attention logits. Experimental validation spans multiple domains—vision, machine translation, speech recognition, and language modeling—demonstrating that σReparam enables competitive performance while eliminating common training requirements like warmup, weight decay, layer normalization, and adaptive optimizers.
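Below is a minimal sketch of the reparametrization described above: each weight matrix is scaled by a learned scalar divided by a power-iteration estimate of its spectral norm. This illustrates the idea rather than reproducing the authors' implementation.

```python
# σReparam-style linear layer: W_hat = (gamma / sigma(W)) * W, with sigma(W)
# estimated by one power-iteration step per forward pass.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SigmaReparamLinear(nn.Module):
    def __init__(self, in_features: int, out_features: int):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_features, in_features) * 0.02)
        self.gamma = nn.Parameter(torch.ones(1))          # learned scalar
        self.register_buffer("u", F.normalize(torch.randn(out_features), dim=0))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        with torch.no_grad():                             # power-iteration step
            v = F.normalize(self.weight.t() @ self.u, dim=0)
            self.u = F.normalize(self.weight @ v, dim=0)
        sigma = torch.dot(self.u, self.weight @ v)        # spectral norm estimate
        return F.linear(x, (self.gamma / sigma) * self.weight)
```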
Jiatao Gu, Shuangfei Zhai, Yizhe Zhang, Joshua M Susskind, Navdeep Jaitly, ICLR, 2023
Abstract
We introduce an innovative framework for high-resolution image and video generation. Our approach proposes a diffusion process that processes inputs across multiple resolutions simultaneously, utilizing a NestedUNet architecture in which features and parameters for small-scale inputs are nested within those of larger scales. A key innovation is a progressive training schedule that moves from lower to higher resolutions, substantially improving optimization for high-resolution outputs. The method demonstrates capabilities across diverse applications including class-conditioned image generation, high-resolution text-to-image synthesis, and text-to-video tasks. Notably, the approach enables training a single pixel-space model at resolutions up to 1024×1024, achieving strong zero-shot generalization using the CC12M dataset, which contains only 12 million images.
2022
Tianyu Liu, Yizhe Zhang, Chris Brockett, Yi Mao, Zhifang Sui, Weizhu Chen, Bill Dolan, ACL, 2022
Yizhe Zhang, Siqi Sun, Xiang Gao, Yuwei Fang, Chris Brockett, Michel Galley, Jianfeng Gao, Bill Dolan, AAAI, 2022
2021
Jungo Kasai, Hao Peng, Yizhe Zhang, Dani Yogatama, Yi Mao, Weizhu Chen, Noah A Smith, EMNLP, 2021
Yizhe Zhang, Xiang Gao, Sungjin Lee, Chris Brockett, Michel Galley, Jianfeng Gao, Bill Dolan, SIGDIAL, 2021
Dianqi Li, Yizhe Zhang, Hao Peng, Liqun Chen, Chris Brockett, Ming-Ting Sun, Bill Dolan, NAACL, 2021
Woon Sang Cho, Yizhe Zhang, Sudha Rao, Asli Celikyilmaz, Chenyan Xiong, Jianfeng Gao, Mengdi Wang, Bill Dolan, EACL, 2021
Ramakanth Pasunuru, Asli Celikyilmaz, Michel Galley, Chenyan Xiong, Yizhe Zhang, Mohit Bansal, Jianfeng Gao, AAAI, 2021
Zeqiu Wu, Michel Galley, Chris Brockett, Yizhe Zhang, Xiang Gao, Chris Quirk, Rik Koncel-Kedziorski, Jianfeng Gao, Hannaneh Hajishirzi, Mari Ostendorf, Bill Dolan, AAAI, 2021
2020
Shuyang Dai, Yu Cheng, Yizhe Zhang, Zhe Gan, JJ Liu, Lawrence Carin, ACCV, 2020
Yizhe Zhang, Guoyin Wang, Chunyuan Li, Zhe Gan, Chris Brockett, Bill Dolan, EMNLP, 2020
Abstract
We introduce POINTER, a model designed for text generation under lexical constraints. The approach operates through progressive insertion of new tokens between existing tokens in a parallel manner, applied recursively until completion. This generates a coarse-to-fine hierarchy that enhances interpretability. We pre-train on Wikipedia and achieve state-of-the-art results on constrained generation tasks. The non-autoregressive decoding strategy produces logarithmic time complexity during inference, offering efficiency advantages over traditional methods.
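A toy, hypothetical illustration of the progressive insertion process: at each stage one token (or a no-insertion marker) is proposed for every gap between existing tokens, so the sequence grows coarse-to-fine; `propose_insertions` stands in for the trained insertion model.

```python
# Toy POINTER-style progressive insertion (illustrative placeholder code).
NONE = "<none>"


def propose_insertions(tokens: list[str]) -> list[str]:
    """Return one proposed token per gap (len(tokens) + 1 gaps)."""
    raise NotImplementedError


def pointer_generate(constraints: list[str], max_stages: int = 6) -> list[str]:
    tokens = list(constraints)                 # start from the lexical constraints
    for _ in range(max_stages):
        proposals = propose_insertions(tokens)
        if all(p == NONE for p in proposals):  # converged: nothing left to insert
            break
        merged = []
        for gap, tok in enumerate(tokens):
            if proposals[gap] != NONE:
                merged.append(proposals[gap])  # insert before the existing token
            merged.append(tok)
        if proposals[-1] != NONE:              # final gap after the last token
            merged.append(proposals[-1])
        tokens = merged
    return tokens
```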
Chunyuan Li, Xiang Gao, Yuan Li, Xiujun Li, Baolin Peng, Yizhe Zhang, Jianfeng Gao, EMNLP, 2020
Abstract
We introduce Optimus, a large-scale Variational Autoencoder (VAE) for natural language processing. The model serves both as a powerful generative model and as an effective representation learning framework for natural language. Key contributions include: a universal latent embedding space for sentences, pre-trained on large text corpora and fine-tunable for various tasks; guided language generation superior to GPT-2, with abstract-level control via latent vectors; improved generalization on low-resource tasks compared to BERT, thanks to the smooth latent space structure; and state-of-the-art performance on VAE language modeling benchmarks. We aim to revitalize interest in deep generative models in the era of large-scale pre-training and make these methods more practical for the NLP community.
Jianqiao Li, Chunyuan Li, Guoyin Wang, Hao Fu, Yuhchen Lin, Liqun Chen, Yizhe Zhang, Chenyang Tao, Ruiyi Zhang, Wenlin Wang, Dinghan Shen, Qian Yang and Lawrence Carin, EMNLP, 2020
Xiang Gao, Yizhe Zhang, Michel Galley, Chris Brockett and Bill Dolan, EMNLP, 2020
Yu Cheng, Zhe Gan, Yizhe Zhang, Oussama Elachqar, Dianqi Li, Jingjing Liu, Findings of EMNLP, 2020
Siyang Yuan, Ke Bai, Liqun Chen, Yizhe Zhang, Chenyang Tao, Chunyuan Li, Guoyin Wang, Ricardo Henao, Lawrence Carin, BMVC (oral presentation), 2020
Xinnuo Xu, Yizhe Zhang, Lars Liden, Sungjin Lee, Interspeech, 2020
Pengyu Cheng, Renqiang Min, Dinghan Shen, Christopher Malon, Yizhe Zhang, Yitong Li and Lawrence Carin, ACL, 2020
Yichen Huang, Yizhe Zhang, Oussama Elachqar, Yu Cheng, ACL, 2020
Yizhe Zhang, Siqi Sun, Michel Galley, Yen-Chun Chen, Chris Brockett, Xiang Gao, Jianfeng Gao, Jingjing Liu, Bill Dolan, system demonstration, ACL, 2020
Abstract
We present DialoGPT, a conversational response generation model trained on 147 million conversation-like exchanges extracted from Reddit spanning 2005 to 2017. The system extends the Hugging Face PyTorch transformer and attains performance close to human in both automatic and human evaluation for single-turn dialogue. We demonstrate that our approach produces responses superior to baseline systems in relevance, content quality, and contextual consistency. We have made both the pretrained model and the training pipeline publicly available to advance research in neural dialogue systems.
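Since the pretrained checkpoints are public, a minimal single-turn usage sketch with Hugging Face transformers looks like this (using the released `microsoft/DialoGPT-medium` checkpoint):

```python
# Minimal single-turn generation with the released DialoGPT checkpoint.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("microsoft/DialoGPT-medium")
model = AutoModelForCausalLM.from_pretrained("microsoft/DialoGPT-medium")

# DialoGPT separates dialogue turns with the EOS token.
prompt = "Does money buy happiness?" + tokenizer.eos_token
input_ids = tokenizer(prompt, return_tensors="pt").input_ids
output_ids = model.generate(
    input_ids,
    max_length=100,
    pad_token_id=tokenizer.eos_token_id,
)
reply = tokenizer.decode(output_ids[0, input_ids.shape[-1]:], skip_special_tokens=True)
print(reply)
```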
Xinjie Fan, Yizhe Zhang, Zhendong Wang, Mingyuan Zhou, ICLR, 2020
Liqun Chen, Ke Bai, Chenyang Tao, Yizhe Zhang, Guoyin Wang, Wenlin Wang, Ricardo Henao, Lawrence Carin, AAAI, 2020
Yuan Li, Chunyuan Li, Yizhe Zhang, Xiujun Li, Guoqing Zheng, Lawrence Carin, Jianfeng Gao, AAAI, 2020
Chen Qu, Chenyan Xiong, Yizhe Zhang, Corby Rosset, W. Bruce Croft and Paul Bennett, SIGIR, 2020
2019
Xinnuo Xu, Yizhe Zhang, Lars Liden and Sungjin Lee, SIGDIAL (Best paper nomination), 2019
Woon Sang Cho, Pengchuan Zhang, Yizhe Zhang, Xiujun Li, Michel Galley, Chris Brockett, Mengdi Wang, Jianfeng Gao, Workshop on Narrative Understanding, NAACL, 2019
Dinghan Shen, Asli Celikyilmaz, Yizhe Zhang, Liqun Chen, Xin Wang, Jianfeng Gao, Lawrence Carin, ACL, 2019
Xiang Gao, Yizhe Zhang, Sungjin Lee, Michel Galley, Chris Brockett, Jianfeng Gao and Bill Dolan, EMNLP, 2019
Ping Yu, Ruiyi Zhang, Chunyuan Li, Yizhe Zhang, Changyou Chen, Imitation, Intent, and Interaction(I3), ICML, 2019
Vighnesh Leonardo Shiv, Chris Quirk, Anshuman Suri, Xiang Gao, Khuram Shahid, Nithya Govindarajan, Yizhe Zhang, Jianfeng Gao, Michel Galley, Chris Brockett, Tulasi Menon, Bill Dolan, system demonstration, ACL, 2019
Liqun Chen, Guoyin Wang, Chenyang Tao, Dinghan Shen, Yizhe Zhang and Lawrence Carin, ACL, 2019
Liqun Chen, Yizhe Zhang, Ruiyi Zhang, Chenyang Tao, Zhe Gan, Haichao Zhang, Bai Li, Dinghan Shen, Changyou Chen, Lawrence Carin, ICLR, 2019
Xiang Gao, Sungjin Lee, Yizhe Zhang, Chris Brockett, Michel Galley, Jianfeng Gao, Bill Dolan, NAACL, 2019
Woon Sang Cho, Yizhe Zhang, Sudha Rao, Chris Brockett and Sungjin Lee, WNGT, EMNLP, 2019
Dianqi Li, Yizhe Zhang, Zhe Gan, Yu Cheng, Chris Brockett, Ming-Ting Sun and Bill Dolan, EMNLP, 2019
2018
Wenlin Wang, Yunchen Pu, Vinay Kumar Verma, Kai Fan, Yizhe Zhang, Changyou Chen, Piyush Rai, Lawrence Carin, AAAI, 2018
Dinghan Shen, Guoyin Wang, Wenlin Wang, Martin Renqiang Min, Qinliang Su, Yizhe Zhang, Chunyuan Li, Ricardo Henao and Lawrence Carin, ACL, 2018
Yunchen Pu, Shuyang Dai, Zhe Gan, Weiyao Wang, Guoyin Wang, Yizhe Zhang, Ricardo Henao, Lawrence Carin, ICML, 2018
Guoyin Wang, Chunyuan Li, Wenlin Wang, Yizhe Zhang, Dinghan Shen, Xinyuan Zhang, Ricardo Henao and Lawrence Carin, ACL, 2018
Yizhe Zhang, Michel Galley, Jianfeng Gao, Zhe Gan, Xiujun Li, Chris Brockett, Bill Dolan, NIPS, 2018
Dinghan Shen, Yizhe Zhang, Ricardo Henao, Qinliang Su, Lawrence Carin, AAAI, 2018
Liqun Chen, Shuyang Dai, Chenyang Tao, Dinghan Shen, Zhe Gan, Haichao Zhang, Yizhe Zhang, Lawrence Carin, NIPS, 2018
2017
Zhe Gan, Liqun Chen, Weiyao Wang, Yunchen Pu, Yizhe Zhang, Lawrence Carin, NIPS, 2017
Yizhe Zhang, Changyou Chen, Zhe Gan, Ricardo Henao, Lawrence Carin, ICML, 2017
Yizhe Zhang, Dinghan Shen, Guoyin Wang, Ricardo Henao, Zhe Gan, Lawrence Carin, NIPS, 2017
Yizhe Zhang, Zhe Gan, Kai Fan, Zhi Chen, Ricardo Henao, Lawrence Carin, ICML, 2017
2016
Kai Fan, Yizhe Zhang, Lawrence Carin, Katherine Heller, ICDM, 2016
Yizhe Zhang, Xiangyu Wang, Changyou Chen, Lawrence Carin, NIPS, 2016
Changyou Chen, Nan Ding, Chunyuan Li, Yizhe Zhang, Lawrence Carin, NIPS, 2016
Yizhe Zhang, Ricardo Henao, Jianling Zhong, Lawrence Carin, Alexander Hartemink, AAAI, 2016
Yizhe Zhang, Changyou Chen, Ricardo Henao, Lawrence Carin, ECML, 2016
Yizhe Zhang, Zhe Gan, Lawrence Carin, Workshop on Adversarial Training, NIPS, 2016
Yizhe Zhang, Changyou Chen, Ricardo Henao, Lawrence Carin, ICDM, 2016
Yizhe Zhang, Ricardo Henao, Chunyuan Li, Lawrence Carin, IJCAI, 2016
2015
Yizhe Zhang, Yupeng He and Chaochun Wei, BMC Genomics, 2015
Yizhe Zhang, Ricardo Henao, Chunyuan Li, Lawrence Carin., Workshop on representation learning, NIPS, 2015
2012
Jiemeng Liu, Haifeng Wang, Hongxing Yang, Yizhe Zhang, Jinfeng Wang, Fangqing Zhao and Ji Qi, Nucleic Acids Research, 2012
Yupeng He, Yizhe Zhang, Guangyong Zheng and Chaochun Wei, BMC Genomics, 2012