Haonan Chang

I'm a fifth-year robotics Ph.D. student at Rutgers University in New Brunswick, advised by Prof. Abdeslam Boularias. I am currently working on combining the latest generative models, such as LLMs, VLMs, and video diffusion models (VDMs), with robot manipulation to achieve general manipulation abilities.

At Rutgers I've worked on LLM/VLM-driven manipulation (UniAff, A3VLM, LGMCTS), diffusion-based manipulation (DAP), LLM-driven scene understanding (OVSG, OVIR-3D), and dynamic scene reconstruction (Mono-STAR, STAR-non-prior). I also contributed to a series of works on scalable robot learning frameworks, including ARP and VKT.

In 2024, I was a research intern at ByteDance Foundation Seeds, working on combining video diffusion models with long-horizon manipulation tasks.

In 2023, I was a research intern at MERL, working on contact-rich robot manipulation in collaboration with Dr. Siddarth Jain. Our paper, Insert-One, was accepted to IROS 2024.

In 2022, I was an Applied Scientist intern at Amazon Lab126, working on the SLAM system for the Astro robot.

Previously, I received an M.S. in Robotics and an M.S. in Mechanical Engineering from the University of Michigan, Ann Arbor, advised by Prof. Chad Jenkins. I received my B.S. in Mechanical Engineering and Mathematics from Tsinghua University, Beijing, where I worked with Prof. Chuxiong Hu.

Email  /  CV  /  Scholar  /  Twitter  /  Github  /  LinkedIn


Research

I have a broad interest in robotics, and my research covers the perception, planning, and control of robots. Some papers are highlighted.

Autoregressive Action Sequence Learning for Robotic Manipulation
Xinyu Zhang, Yuhan Liu, Haonan Chang, Liam Schramm, Abdeslam Boularias
RAL, 2025   (state of the art on RLBench)
github / arXiv

We propose the Chunking Causal Transformer (CCT), which extends the next-single-token prediction of causal transformers to multi-token prediction in a single pass. We evaluate ARP across diverse robotic manipulation environments, including Push-T, ALOHA, and RLBench, and show that it outperforms state-of-the-art methods in all tested environments while being more efficient in computation and parameter count.

UniAff: A Unified Representation of Affordances for Tool Usage and Articulation with Vision-Language Models
Qiaojun Yu*, Siyuan Huang*, Xibin Yuan, Zhengkai Jiang, Ce Hao, Xin Li, Haonan Chang, Junbo Wang, Liu Liu, Hongsheng Li, Peng Gao, Cewu Lu
ICRA, 2025  
project page / huggingface

Previous studies on robotic manipulation are based on a limited understanding of the underlying 3D motion constraints and affordances. To address these challenges, we propose a comprehensive paradigm, termed UniAff, that integrates 3D object-centric manipulation and task understanding in a unified formulation.

A3VLM: Actionable Articulation-Aware Vision Language Model
Siyuan Huang*, Haonan Chang*, Yuhan Liu, Yimeng Zhu, Hao Dong, Peng Gao, Abdeslam Boularias, Hongsheng Li
CoRL, 2024  
github / arXiv

We propose an Articulation-Aware Vision Language Model that is able to locate the task-related articulation structures and affordances based on a language task description.

Scaling Manipulation Learning with Visual Kinematic Chain Prediction
Xinyu Zhang, Yuhan Liu, Haonan Chang, Abdeslam Boularias
CoRL, 2024  
project page / arXiv / github

We propose a unified representation, the Visual Kinematic Chain, that models different robotic tasks in a common form for scalable training.

LGMCTS: Language-Guided Monte-Carlo Tree Search for Executable Semantic Object Rearrangement
Haonan Chang, Kai Gao, Kowndinya Boyalakuntla, Alex Lee, Baichuan Huang, Harish Udhaya Kumar, Jingjin Yu, Abdeslam Boularias
IROS, 2024   (Oral Presentation)
project page / arXiv / github

We combine an LLM with a Monte-Carlo Tree Search planner to solve executable semantic object rearrangement tasks.

Insert-One: One-Shot Robust Visual-Force Servoing for Novel Object Insertion with 6-DoF Tracking
Haonan Chang, Abdeslam Boularias, Siddarth Jain
IROS, 2024  
arXiv

We propose a two-stage algorithm, combining visual servoing and force servoing, for insertion tasks on novel objects.

DAP: Diffusion-based Affordance Prediction for Multi-modality Storage
Haonan Chang, Kowndinya Boyalakuntla, Yuhan Liu, Xinyu Zhang, Liam Schramm, Abdeslam Boularias
IROS, 2024  
arXiv / github

We propose a diffusion-based affordance prediction architecture that locates interactable regions in the multi-modality storage problem.

Context-Aware Entity Grounding with Open-Vocabulary 3D Scene Graphs
Haonan Chang, Kowndinya Boyalakuntla, Shiyang Lu, Siwei Cai, Eric Jing, Shreesh Keskar, Shijie Geng, Adeeb Abbas, Lifeng Zhou, Kostas Bekris, Abdeslam Boularias
CoRL, 2023  
project page / arXiv / github

We present an Open-Vocabulary 3D Scene Graph (OVSG), a formal framework for grounding a variety of entities, such as object instances, agents, and regions, with free-form text-based queries.

OVIR-3D: Open-Vocabulary 3D Instance Retrieval Without Training on 3D Data
Shiyang Lu, Haonan Chang, Eric Jing, Abdeslam Boularias, Kostas Bekris
CoRL, 2023  
project page / arXiv

We propose OVIR-3D, a straightforward yet effective method for open-vocabulary 3D object instance retrieval without using any 3D data for training.

Mono-STAR: Mono-camera Scene-level Tracking and Reconstruction
Haonan Chang, Dhruv Metha, Shijie Geng, Abdeslam Boularias
IROS, 2022  
github / arXiv

We present Mono-STAR, the first real-time 3D reconstruction system that simultaneously supports semantic fusion, fast motion tracking, non-rigid object deformation, and topological change under a unified framework.

Scene-level Tracking and Reconstruction without Object Priors
Haonan Chang, Abdeslam Boularias
IROS, 2022  
github / arXiv

We present the first real-time system capable of tracking and reconstructing, individually, every visible object in a given scene, without any form of prior on the rigidness of the objects, texture existence, or object category.

GeoFusion: Geometric Consistency Informed Scene Estimation in Dense Clutter
Zhiqiang Sui, Haonan Chang, Ning Xu, Chad Jenkins
RAL, 2020  
arXiv

We propose GeoFusion, a SLAM-based scene estimation method for building an object-level semantic map in dense clutter.

GlassLoc: Plenoptic Grasp Pose Detection in Transparent Clutter
Zheming Zhou, Tianyang Pan, Shiyu Wu, Haonan Chang, Chad Jenkins
IROS, 2019  
arXiv

Transparent objects are prevalent across many environments of interest for dexterous robotic manipulation. GlassLoc classifies graspable locations in space informed by a Depth Likelihood Volume (DLV) descriptor.

Deep GRU neural network prediction and feedforward compensation for precision multiaxis motion control systems
Chuxiong Hu, Tiansheng Ou, Haonan Chang, Yu Zhu, Limin Zhu
IEEE/ASME Transactions on Mechatronics, 2020  
IEEE

We propose a gated recurrent unit (GRU) neural network prediction and compensation (NNC) strategy for precision multiaxis motion control systems with contouring performance orientation.

Toward modularization of neural network autonomous driving policy using parallel attribute networks
Zhuo Xu, Haonan Chang, Chen Tang, Changliu Liu
IEEE Intelligent Vehicles Symposium, 2019  
arXiv

We propose to modularize complicated driving policies in terms of driving attributes, and present the Parallel Attribute Networks (PAN), which learn to fulfill the requirements of each attribute in the driving tasks separately and later assemble their knowledge together.


Feel free to steal this website's source code. Do not scrape the HTML from this page itself, as it includes analytics tags that you do not want on your own website — use the github code instead. Also, consider using Leonid Keselman's Jekyll fork of this page.