We collect real-world robot demonstration trajectories on a UR7e using Meta Quest 3 controllers, store them in human-readable logs, and replay the recorded motion with conditional execution based on sensory input.
Goal, Motivation & Applications
Modern robot learning policies (including imitation learning and diffusion models) are constrained by a scarcity of diverse, high-quality training data, particularly for 7-DoF manipulation tasks. To address this, we developed a novel, low-cost teleoperation interface that combines a Meta Quest 3 virtual reality headset with the UR7e robotic arm, allowing a human operator to perform manipulation tasks naturally in 3D space while the system records synchronized joint and gripper data for training modern imitation learning policies.
Modern robot learning (e.g., Diffusion Policies, Open X-Embodiment) is bottlenecked by the lack of high-quality demonstration data. Traditional collection methods such as kinesthetic teaching or "puppet" arms are dangerous, unintuitive, or expensive, and often produce noisy, inconsistent results for complex tasks.
The Core Problem: This project bridges two distinct coordinate systems: the unconstrained Euclidean space of a VR controller and the kinematic constraints of an industrial robot. The key technical challenge was developing a "Logical Home Pairing" algorithm (Absolute Pose Mapping) to translate user hand poses into safe and accurate robot trajectories in real-time without risking singularities or sudden jumps.
Beyond collecting training data for AI, this teleoperation architecture has immediate utility in several high-impact domains, such as remote human intervention in hazardous or unstructured environments where hard-coded automation falls short.
Criteria, Choices & Trade-offs
To act as a viable data collection platform, the system had to meet three critical performance benchmarks: low-latency real-time control, precise and smooth end-effector tracking, and reliable trajectory recording and replay.
We prioritized spatial intuition and operator ergonomics to ensure high-quality data collection. With this in mind, we implemented an Absolute Pose Mapping strategy rather than Relative Velocity Control.
Absolute mapping "locks" the robot's end-effector to the user's hand, enabling intuitive spatial understanding. Velocity control often suffers from drift which often makes precise stacking tasks frustrating for the user.
Workspace vs. Precision: By mapping 1:1, the user is physically limited by their own arm reach. We sacrificed the effectively unlimited range that could be gained by repeatedly re-homing the controller pose, in exchange for higher precision and safety within a fixed workspace.
Our architecture moves beyond academic theory to solve the "Data Bottleneck" in industrial robotics. We prioritized portability and fault-tolerance to create a deployable solution.
We replaced capital-intensive motion capture labs (e.g., OptiTrack) with consumer inside-out tracking. This drastically reduces the "cost per datum" and allows for decentralized data collection without specialized facility infrastructure.
Hard-coded automation fails on edge cases. Our teleoperation stack ensures reliability by providing a remote human fallback interface, allowing operators to resolve failures in hazardous or unstructured environments without stopping the line.
Full Pipeline
The system implements a 72 Hz synchronous control loop. To ensure intuitive handling, we decouple the absolute coordinate systems using a "Logical Home" calibration, which lets the operator instantly map a comfortable hand position (P_vr) to the robot's safety home (P_robot).
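In code, this calibration reduces to storing a single offset between the two home poses and composing every incoming controller pose with it. The sketch below is an illustrative assumption of how such a mapping can be structured (the class and variable names, and the use of NumPy/SciPy quaternions, are ours rather than the project's exact implementation):

```python
# Minimal sketch of the "Logical Home" absolute pose mapping (illustrative only).
import numpy as np
from scipy.spatial.transform import Rotation as R

class LogicalHome:
    """Maps VR controller poses to robot end-effector targets via a one-time
    offset captured when the operator triggers calibration."""

    def __init__(self, robot_home_pos, robot_home_quat):
        self.robot_home_pos = np.asarray(robot_home_pos)   # robot safety home (m)
        self.robot_home_rot = R.from_quat(robot_home_quat)
        self.vr_home_pos = None
        self.vr_home_rot = None

    def calibrate(self, vr_pos, vr_quat):
        # Capture the operator's comfortable hand pose as the VR "home" (P_vr).
        self.vr_home_pos = np.asarray(vr_pos)
        self.vr_home_rot = R.from_quat(vr_quat)

    def map_pose(self, vr_pos, vr_quat):
        # Express the current hand pose relative to the VR home, then apply
        # that relative motion about the robot's home pose (P_robot).
        d_pos = np.asarray(vr_pos) - self.vr_home_pos
        d_rot = R.from_quat(vr_quat) * self.vr_home_rot.inv()
        target_pos = self.robot_home_pos + d_pos
        target_rot = d_rot * self.robot_home_rot
        return target_pos, target_rot.as_quat()
```

Calling map_pose on every tracking update (at the loop rate) keeps the end-effector locked to the hand without exposing the raw, unconstrained VR coordinates to the robot.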
We provide users with the ability to record arbitrary teleoperation trajectories through a custom ROS 2 recording pipeline. Trajectory recording is initiated by launching the teleo_recorder ROS 2 node that we implemented, followed by invoking the service call ros2 service call /start_recording std_srvs/srv/Trigger "{}". Once triggered, the recording service begins capturing robot command messages published by the teleoperation stack (i.e., the command topic used to drive the robot during live control). These messages reflect the real-time motion generated during teleoperation and are logged continuously for the duration of the recording session. The resulting trajectories can then be replayed or processed downstream, enabling repeatable execution and offline analysis of teleoperated demonstrations.
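As an illustration of how such a recorder can be structured, the following is a minimal rclpy sketch; the command topic name, message type, and log format shown here are assumptions for illustration rather than the exact teleo_recorder implementation.

```python
# Hedged sketch of a recorder node in the spirit of teleo_recorder.
import rclpy
from rclpy.node import Node
from std_srvs.srv import Trigger
from sensor_msgs.msg import JointState  # assumed command message type

class TeleopRecorder(Node):
    def __init__(self):
        super().__init__('teleop_recorder')
        self.recording = False
        self.logfile = None
        # Assumed command topic driven by the teleoperation stack.
        self.sub = self.create_subscription(
            JointState, '/teleop/joint_commands', self.on_command, 10)
        self.srv = self.create_service(Trigger, 'start_recording', self.on_start)

    def on_start(self, request, response):
        # Open a fresh human-readable log and begin capturing commands.
        self.logfile = open('trajectory.txt', 'w')
        self.recording = True
        response.success = True
        response.message = 'Recording started'
        return response

    def on_command(self, msg):
        # Log each command with a timestamp for later replay or analysis.
        if self.recording:
            stamp = self.get_clock().now().nanoseconds * 1e-9
            self.logfile.write(f"{stamp} {' '.join(map(str, msg.position))}\n")

def main():
    rclpy.init()
    rclpy.spin(TeleopRecorder())

if __name__ == '__main__':
    main()
```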
We also give the user the option to replay any recorded trajectory by passing the path of its .txt file to a ROS 2 node. The node populates a job queue with the joint positions recorded at each 0.5-second sample and executes a motion to each of those joint positions in turn, smoothly replaying the entire trajectory.
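A minimal sketch of such a replay node is below; the controller topic, joint names, and log-file layout are assumptions for illustration, not necessarily those of the actual node.

```python
# Hedged sketch of trajectory replay from a recorded .txt file.
import sys
from collections import deque

import rclpy
from rclpy.node import Node
from builtin_interfaces.msg import Duration
from trajectory_msgs.msg import JointTrajectory, JointTrajectoryPoint

# Assumed UR joint ordering; the real node may use a different scheme.
JOINTS = ['shoulder_pan_joint', 'shoulder_lift_joint', 'elbow_joint',
          'wrist_1_joint', 'wrist_2_joint', 'wrist_3_joint']

class TrajectoryReplayer(Node):
    def __init__(self, path):
        super().__init__('trajectory_replayer')
        # Assumed trajectory topic exposed by the UR ROS 2 driver's controller.
        self.pub = self.create_publisher(
            JointTrajectory,
            '/scaled_joint_trajectory_controller/joint_trajectory', 10)
        # Job queue: one set of joint positions per 0.5 s sample in the log.
        self.queue = deque()
        with open(path) as f:
            for line in f:
                values = [float(v) for v in line.split()]
                self.queue.append(values[-6:])  # last six values: joint positions
        self.timer = self.create_timer(0.5, self.send_next)

    def send_next(self):
        if not self.queue:
            self.get_logger().info('Replay complete')
            self.timer.cancel()
            return
        msg = JointTrajectory()
        msg.joint_names = JOINTS
        point = JointTrajectoryPoint()
        point.positions = self.queue.popleft()
        # Reach the sample smoothly over the 0.5 s sampling interval.
        point.time_from_start = Duration(sec=0, nanosec=500_000_000)
        msg.points = [point]
        self.pub.publish(msg)

def main():
    rclpy.init()
    rclpy.spin(TrajectoryReplayer(sys.argv[1]))

if __name__ == '__main__':
    main()
```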
Another option is to run trajectory replay continuously with color sensing. First, a ROS 2 node is launched that loads the Intel RealSense D435i camera and publishes the detected color of objects to a ROS 2 topic. For color detection, we filter colors in the HSV color space and apply an adjustable depth range (0.75 m to 1.1 m in our demo) and adjustable area parameters to control the range of object sizes that can be detected. A second ROS 2 node then subscribes to the camera's detections; once a color is detected, it runs a full execution: loading the trajectory .txt file that corresponds to the detected color, populating a job queue with the joint positions recorded at each 0.5-second sample, and executing motions to each of those joint positions in turn to smoothly replay the entire trajectory. Once the trajectory has been replayed, the node returns to detecting colors and can replay another trajectory if another color is found.
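The detection step itself can be sketched as an HSV mask combined with depth and area gating; the HSV bounds, thresholds, and function name below are illustrative assumptions rather than the tuned values used in our demo.

```python
# Hedged sketch of color detection with HSV masking plus depth and area gating.
import cv2
import numpy as np

# Example HSV ranges (assumed values) for colors keyed to trajectories.
HSV_RANGES = {
    'red':  (np.array([0, 120, 70]),   np.array([10, 255, 255])),
    'blue': (np.array([100, 150, 50]), np.array([130, 255, 255])),
}

def detect_color(bgr_image, depth_image, depth_min=0.75, depth_max=1.1,
                 min_area=2000):
    """Return the first color whose mask, restricted to the valid depth band,
    contains a blob larger than min_area pixels; otherwise return None."""
    hsv = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2HSV)
    # Keep only pixels whose depth (in meters) falls inside the valid band.
    depth_mask = ((depth_image > depth_min) & (depth_image < depth_max)).astype(np.uint8)
    for name, (lo, hi) in HSV_RANGES.items():
        mask = cv2.inRange(hsv, lo, hi) & (depth_mask * 255)
        contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
        if any(cv2.contourArea(c) > min_area for c in contours):
            return name
    return None
```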
Overall, our system successfully met our core design criteria. We demonstrated a complete end-to-end VR-based teleoperation pipeline from the Meta Quest 3 to the UR7e robot arm, including real-time control, trajectory recording, and reliable trajectory replay. This outcome closely aligns with our initial project goals and the scope refined after meeting with our TA.
In particular, the system achieved stable and intuitive teleoperation, allowing a human operator wearing the Quest 3 headset to control the UR7e’s end-effector motion in real time with low perceptual latency. The mapping between VR controller motion and robot motion was sufficiently smooth to enable precise manipulation behaviors, validating our design choice of using VR as a natural and expressive human–robot interface. Users were able to guide the robot through meaningful manipulation trajectories with minimal training.
Trajectory recording and replay functioned as intended and served as a strong indicator of system robustness. Recorded demonstrations could be replayed consistently, with the robot closely following the original motion paths. This confirms that our trajectory encoding, time synchronization, and playback solutions preserved the essential structure of the human demonstrations. These results support our broader motivation of using teleoperation as a high-quality data collection method.
One flaw in our system is that it has no safety net to prevent collisions if the human operator makes a mistake while controlling it. For example, if the operator drops a controller, the arm will follow it downward and may crash into the table. To prevent this, we aim to add workspace bounding and proximity checks that block unsafe motions. Furthermore, instead of a basic color-sensing pipeline, we aim to incorporate reinforcement learning to build upon the teleoperated demonstrations collected in this project. Rather than relying on task-specific object placement, learned policies would adapt the demonstrated trajectories to new object poses and environmental variations, improving robustness and generalization.
Roles & Contributions
Interested in robotic control, perception, and learning. Focused on designing robots that can replicate human manipulation skills.
Focused on control and learning algorithms for robotic arms and humanoids to perform difficult manipulation tasks.
Focused on machine learning applications for robotics, specifically control strategies for humanoid platforms.
Research interests lie in robot learning, specifically enabling robots to acquire sophisticated skills through data-driven approaches.
Interested in humanoid robots and LLMs. Focusing on learning-based policies for motion execution in simulation.
Documentation & Resources
Our comprehensive final presentation covering the complete development process, technical challenges, experimental results, and future work:
Complete technical overview including motivation, design decisions, implementation details, experimental results, and team contributions.
B.S.: Aaron Zheng, Akshaj Gupta, Kourosh Salahi, Ziteng (Ender) Ji
M.S.: Samuel Mankoff
Link:
https://github.com/samuel-mankoff/meceng206a-Final-Project
We would like to thank the EECS 106A course staff for helping us throughout this project.