I'm a fifth-year robotics Ph.D. student at Rutgers University in New Brunswick, advised by Prof. Abdeslam Boularias. My current work combines recent generative models, including large language models (LLMs), vision-language models (VLMs), and video diffusion models (VDMs), with robot manipulation, aiming at general-purpose manipulation abilities.
At Rutgers I have worked on LLM/VLM-driven manipulation (UniAff, A3VLM, LGMCTS), diffusion-based manipulation (DAP), LLM-driven scene understanding (OVSG, OVIR-3D), and dynamic scene reconstruction (Mono-STAR, STAR-non-prior). I have also contributed to a series of works on scalable robot learning frameworks, including ARP and VKT.
In 2024, I was a research intern at ByteDance Foundation Seeds, working on combining video diffusion models with long-horizon manipulation tasks.
In 2023, I was a research intern at MERL, working on contact-rich robot manipulation in collaboration with Dr. Siddarth Jain. Our paper, InsertOne, was accepted to IROS 2024.
In 2022, I was an Applied Scientist intern at Amazon Lab126, working on the SLAM system for the Astro robot.
Previously, I received an M.S. in Robotics and an M.S. in Mechanical Engineering from the University of Michigan, Ann Arbor, advised by Prof. Chad Jenkins. I received my
B.S. in Mechanical Engineering and Mathematics from Tsinghua University, Beijing, where I worked with Prof. Chuxiong Hu.
I have a broad interest in many aspects of robotics; my research covers robot perception, planning, and control. Selected papers are highlighted.
We propose the Chunking Causal Transformer (CCT), which extends the next-single-token prediction of causal transformers to multi-token prediction in a single pass. We evaluate ARP across diverse robotic manipulation environments, including Push-T, ALOHA, and RLBench, and show that it outperforms state-of-the-art methods in all tested environments while using less computation and fewer parameters.
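To make the chunk-level idea concrete, here is a minimal sketch of a block-causal attention mask in which each token attends to its own chunk and to all earlier chunks, which is what allows a whole chunk of tokens to be predicted in one pass. This is only an illustration under my own simplifying assumptions; the actual masking and decoding scheme of CCT in ARP differs in its details.

```python
import torch

def chunk_causal_mask(seq_len: int, chunk_size: int) -> torch.Tensor:
    """Boolean attention mask (True = may attend).

    Each position attends to every position in its own chunk and in all
    earlier chunks, so the tokens of one chunk can be produced together.
    Illustrative only; not the exact CCT mask from the ARP paper.
    """
    chunk_ids = torch.arange(seq_len) // chunk_size
    # Position i may attend to position j iff j's chunk is not later than i's chunk.
    return chunk_ids.unsqueeze(1) >= chunk_ids.unsqueeze(0)

mask = chunk_causal_mask(seq_len=8, chunk_size=4)
print(mask.int())
```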
Previous studies on robotic manipulation are based on a limited understanding of the underlying 3D motion constraints and affordances. To address these challenges, we propose a
comprehensive paradigm, termed UniAff, that integrates 3D object-centric manipulation and task understanding in a unified formulation.
We propose an Articulation-Aware Vision Language Model that can locate the task-relevant articulation structure and affordances from a language description of the task.
We present an Open-Vocabulary 3D Scene Graph (OVSG), a formal framework for grounding a variety of entities, such as object instances, agents, and regions, with free-form
text-based queries.
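As a toy illustration of the grounding problem (not the OVSG pipeline itself), the snippet below builds a tiny scene graph with networkx and resolves a simple "object, relation, anchor" query by label matching; the entity names and the `ground` helper are hypothetical, and OVSG grounds free-form language rather than exact label strings.

```python
import networkx as nx

# A toy scene graph: nodes are entities (objects, agents, regions),
# edges carry spatial relations between them.
G = nx.DiGraph()
G.add_node("cup_1", kind="object", label="red cup")
G.add_node("table_1", kind="object", label="dining table")
G.add_node("kitchen", kind="region", label="kitchen")
G.add_edge("cup_1", "table_1", relation="on")
G.add_edge("table_1", "kitchen", relation="in")

def ground(query_label: str, relation: str, anchor_label: str):
    """Naive grounding: return nodes whose label matches the query and
    that hold the requested relation to a node matching the anchor."""
    hits = []
    for u, v, data in G.edges(data=True):
        if (query_label in G.nodes[u]["label"]
                and data["relation"] == relation
                and anchor_label in G.nodes[v]["label"]):
            hits.append(u)
    return hits

print(ground("cup", "on", "table"))  # -> ['cup_1']
```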
We present Mono-STAR, the first real-time 3D reconstruction system that simultaneously supports semantic fusion, fast motion tracking, non-rigid object deformation, and topological
change under a unified framework.
We present the first real-time system capable of individually tracking and reconstructing every visible object in a given scene, without any prior on object rigidity, texture, or category.
Transparent objects are prevalent across many environments of interest for dexterous robotic manipulation. GlassLoc classifies graspable locations in space informed by a Depth
Likelihood Volume (DLV) descriptor.
We propose a gated recurrent unit (GRU) neural network prediction and compensation (NNC) strategy for precision multiaxis motion control systems, with a focus on contouring performance.
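As a rough sketch of the prediction-and-compensation idea, a small GRU can map a window of reference positions to a predicted tracking error, which is then subtracted from the command. The `GRUErrorPredictor` class and all sizes below are illustrative assumptions, not the architecture or training setup from the paper.

```python
import torch
import torch.nn as nn

class GRUErrorPredictor(nn.Module):
    """Toy GRU mapping a reference-trajectory window to a per-axis
    tracking-error prediction for the next step (illustrative only)."""
    def __init__(self, n_axes: int = 2, hidden: int = 32):
        super().__init__()
        self.gru = nn.GRU(n_axes, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_axes)

    def forward(self, ref_window: torch.Tensor) -> torch.Tensor:
        # ref_window: (batch, time, n_axes) segment of the reference trajectory
        _, h = self.gru(ref_window)
        return self.head(h[-1])  # predicted per-axis tracking error

model = GRUErrorPredictor()
ref = torch.randn(1, 20, 2)                 # a short 2-axis reference segment
compensated_cmd = ref[:, -1] - model(ref)   # subtract predicted error from the command
```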
We propose to modularize complicated driving policies in terms of driving attributes and present the parallel attribute networks (PAN), which learn to fulfill the requirements of the individual attributes in the driving tasks separately and later assemble their knowledge.
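The sketch below only illustrates the general idea of separate attribute heads whose outputs are assembled into one action; the attribute names, network sizes, and the simple averaging rule are my own placeholders, not the PAN architecture or its assembly mechanism.

```python
import torch
import torch.nn as nn

class AttributeNet(nn.Module):
    """One small policy head per driving attribute (placeholder network)."""
    def __init__(self, obs_dim: int = 16, act_dim: int = 2):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, 32), nn.ReLU(), nn.Linear(32, act_dim))

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.net(obs)

# Hypothetical attributes, each handled by its own separately trained head.
attributes = ["lane_keeping", "speed_limit", "obstacle_avoidance"]
heads = nn.ModuleDict({name: AttributeNet() for name in attributes})

obs = torch.randn(1, 16)
# Assemble the attribute policies, here by simple averaging of their actions.
action = torch.stack([heads[name](obs) for name in attributes]).mean(dim=0)
```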
Feel free to steal this website's source code. Do not scrape the HTML from this page itself, as it includes analytics tags that you do not want on your own website; use the GitHub code instead. Also, consider using Leonid Keselman's Jekyll fork of this page.