OopsieVerse

A Safety Benchmark with Damage-Aware Simulation for Robot Manipulation

University of Texas at Austin
*Equal contribution; order by dice roll
Robotics: Science and Systems (RSS) 2026
Overview

What is OopsieVerse?

What if your home robot finishes the job but breaks your kitchen in the process?

Robots keep getting better at handling everyday objects, but finishing the task isn't the whole story. A robot that picks up an egg and cracks it, or pours a glass of water and spills half of it, still isn't ready for a real home. Safety is the part that's missing, and today's simulators barely measure it.

OopsieVerse is a unified, damage-aware simulation framework for household manipulation. At its core, DamageSim turns physical signals like contact forces, heat, and liquid into measurable mechanical, thermal, and fluid damage, so the benchmark can score not just whether a robot finished the task, but whether it did so safely. It runs in both BEHAVIOR-1K and RoboCasa, and supports safer data collection, damage-aware imitation and reinforcement learning, and Vision-Language-Action safety evaluation.


Cite

BibTeX

@inproceedings{balaji2026oopsieverse,
  title={OopsieVerse: A Safety Benchmark with Damage-Aware Simulation for Robot Manipulation},
  author={Balaji, Arnav and Bahety, Arpit and Ambatipudi, Sriniket and Lam, Daniel and Xu, Junhong and Mart{\'\i}n-Mart{\'\i}n, Roberto},
  booktitle={Robotics: Science and Systems (RSS), 2026},
  year={2026}
}
Framework

DamageSim

DamageSim is our simulator-agnostic plugin that makes physical safety measurable by tracking object-centric “health.” It monitors simulator signals (such as contact forces, temperature, and liquid exposure) and converts them into mechanical (e.g., impact or compression), thermal, and fluid damage, which can be used as observations, rewards, or termination conditions. We instantiate it in RoboCasa (MuJoCo) and BEHAVIOR-1K (Omniverse), showcasing its consistency across simulators.

Mechanical Damage

Thermal Damage

Fluid Damage
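
The idea of converting simulator signals into per-object health can be illustrated with a minimal sketch. This is not DamageSim's actual code: the class name `ObjectHealth`, the thresholds, the gains, and the linear damage rules are all hypothetical choices made only to show the shape of the computation.

```python
from dataclasses import dataclass, field


@dataclass
class ObjectHealth:
    """Per-object health tracker. All limits and gains are illustrative assumptions."""

    name: str
    health: float = 100.0
    force_limit: float = 20.0      # N; contact forces at or below this cause no damage
    temp_limit: float = 60.0       # deg C; temperatures at or below this cause no damage
    fluid_sensitive: bool = False  # e.g. electronics that are damaged by liquid
    damage: dict = field(
        default_factory=lambda: {"mechanical": 0.0, "thermal": 0.0, "fluid": 0.0}
    )

    def update(self, contact_force: float, temperature: float,
               liquid_contacts: int, dt: float) -> float:
        """Convert one step of simulator signals into damage and return the new health."""
        step_damage = {
            # Mechanical: only the force above the limit counts.
            "mechanical": 0.5 * max(0.0, contact_force - self.force_limit),
            # Thermal: accumulates with the time spent above the limit.
            "thermal": 2.0 * max(0.0, temperature - self.temp_limit) * dt,
            # Fluid: one unit per liquid contact, for sensitive objects only.
            "fluid": 1.0 * liquid_contacts if self.fluid_sensitive else 0.0,
        }
        for kind, amount in step_damage.items():
            self.damage[kind] += amount
        self.health = max(0.0, self.health - sum(step_damage.values()))
        return self.health


egg = ObjectHealth("egg", force_limit=5.0)
egg.update(contact_force=15.0, temperature=20.0, liquid_contacts=0, dt=0.01)

laptop = ObjectHealth("laptop", fluid_sensitive=True)
laptop.update(contact_force=0.0, temperature=25.0, liquid_contacts=3, dt=0.01)
```

Keeping a separate running total per damage type lets the same tracker feed an observation, a reward term, or a termination check without recomputing anything.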

Damage-Augmented POMDP Implementation with DamageSim

DamageSim is simulator-agnostic. We instantiate it in BEHAVIOR-1K (NVIDIA Omniverse) and RoboCasa (MuJoCo) to demonstrate consistent safety measurement across different physics backends.

MuJoCo: RoboCasa

Omniverse: BEHAVIOR-1K
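
One way to picture simulator-agnostic measurement is a thin adapter layer: each backend exposes the same signal interface, and the damage rule is written once against that interface. The sketch below is an assumption about how such a layer could look, not the DamageSim API. `SignalBackend`, `VectorForceBackend`, `ImpulseBackend`, and `mechanical_damage` are hypothetical names, and the two backends are stand-ins with canned data rather than real MuJoCo or Omniverse calls.

```python
import math
from typing import Protocol


class SignalBackend(Protocol):
    """Hypothetical interface: what a damage rule needs from any simulator."""

    def contact_force(self, obj: str) -> float:
        """Return the contact force magnitude on `obj`, in newtons."""
        ...


class VectorForceBackend:
    """Stand-in for a simulator that reports a 3D contact force per object."""

    def __init__(self, forces):
        self.forces = forces  # {object name: (fx, fy, fz)} in newtons

    def contact_force(self, obj):
        return math.sqrt(sum(c * c for c in self.forces[obj]))


class ImpulseBackend:
    """Stand-in for a simulator that reports a per-step contact impulse."""

    def __init__(self, impulses, dt):
        self.impulses = impulses  # {object name: impulse in N*s}
        self.dt = dt              # physics step in seconds

    def contact_force(self, obj):
        return self.impulses[obj] / self.dt


def mechanical_damage(backend: SignalBackend, obj: str, force_limit: float) -> float:
    """Backend-independent rule: damage is the force in excess of the limit."""
    return max(0.0, backend.contact_force(obj) - force_limit)


backend_a = VectorForceBackend({"plate": (3.0, 4.0, 0.0)})
backend_b = ImpulseBackend({"plate": 1.25}, dt=0.25)
damage_a = mechanical_damage(backend_a, "plate", force_limit=2.0)
damage_b = mechanical_damage(backend_b, "plate", force_limit=2.0)
```

Because both adapters normalize to the same physical quantity before the damage rule runs, the same contact yields the same damage regardless of how the backend natively reports it.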

Benchmark

OopsieBench

OopsieBench is a suite of 32 household tasks, 15 of which are listed below. The suite is designed to (i) expose policies to realistic, physically damaging failure modes in household manipulation, and (ii) make safety measurable by contrasting easy but risky strategies with safer ones that require more careful interaction (e.g., gentler contact, safer approaches, or avoiding hazards). The benchmark spans diverse scenes, objects, and damage modalities, is cross-platform (BEHAVIOR-1K and RoboCasa), and includes a dataset of safe and unsafe human teleoperation demonstrations for five tasks.

Pour Water
Ignite Wood
Add Firewood
Fill Bowl
Open Drawer
Nav to Table
Attach Camera
Pick Scrub
Shelve Item
Open Single Door
Lift Egg
Place Plate
Put Cup in Microwave
Turn on Microwave
Wipe Counter


Experiments

What can you do with OopsieVerse?

1 Safety-aware data collection

OopsieVerse provides real-time safety feedback during data collection, helping operators record safer demonstrations. The feedback takes two forms: (1) damage-based coloration and (2) health bars for tracked objects. The health bars are particularly helpful when damage occurs outside the current camera view (e.g., a wineglass falling behind the counter).
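
The two feedback channels can be sketched in a few lines. The green-to-red color ramp, the text-based bar, and the function names `health_to_rgb` and `health_bar` are assumptions for illustration; the actual rendering in OopsieVerse may differ.

```python
def health_fraction(health: float, max_health: float = 100.0) -> float:
    """Clamp health to the range [0, 1] as a fraction of max_health."""
    return min(1.0, max(0.0, health / max_health))


def health_to_rgb(health: float, max_health: float = 100.0) -> tuple:
    """Map health to a tint: green when intact, red when destroyed (assumed ramp)."""
    frac = health_fraction(health, max_health)
    return (round(255 * (1.0 - frac)), round(255 * frac), 0)


def health_bar(name: str, health: float, max_health: float = 100.0,
               width: int = 10) -> str:
    """Render a text health bar, useful for objects outside the camera view."""
    filled = round(health_fraction(health, max_health) * width)
    return f"{name:<10} [{'#' * filled}{'-' * (width - filled)}] {health:5.1f}"


tracked = {"wineglass": 50.0, "plate": 100.0}
for obj_name, obj_health in tracked.items():
    print(health_bar(obj_name, obj_health), health_to_rgb(obj_health))
```

The bar is drawn from the tracked health value alone, so it stays informative even when the object itself is not visible to the operator.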

2 Imitation learning (IL)

Many real-world (or sim-collected) demonstration datasets mix safe and unsafe behavior. Using DamageSim, we can automatically flag trajectories that incur damage and filter them out to construct a dataset of only safe demonstrations. In the videos below, the top row shows an IL policy trained on the full dataset (including unsafe trajectories), while the bottom row shows a policy trained on the filtered safe-only demonstrations.

Pour Glass

Add Firewood

Lift Egg

Shelve Item

Wipe Countertop

Imitation Learning Results
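
The filtering step described above can be sketched as follows. The trajectory layout (a list of step dictionaries with a `damage` field) and the zero-damage threshold are assumptions made for this example, not the dataset's actual format.

```python
def total_damage(trajectory) -> float:
    """Sum the per-step damage recorded along one demonstration."""
    return sum(step["damage"] for step in trajectory)


def split_by_safety(trajectories, max_damage: float = 0.0):
    """Partition demonstrations into (safe, unsafe) by their total damage."""
    safe, unsafe = [], []
    for traj in trajectories:
        (safe if total_damage(traj) <= max_damage else unsafe).append(traj)
    return safe, unsafe


# Toy demonstrations; only the second one incurs damage.
demos = [
    [{"action": 0, "damage": 0.0}, {"action": 1, "damage": 0.0}],
    [{"action": 0, "damage": 0.0}, {"action": 2, "damage": 12.5}],
    [{"action": 1, "damage": 0.0}],
]
safe_demos, unsafe_demos = split_by_safety(demos)
```

A policy trained on `safe_demos` corresponds to the safe-only setting, while training on `demos` corresponds to the full-dataset baseline. A nonzero `max_damage` would tolerate minor contact at the cost of admitting some risky behavior.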

3 Reinforcement learning (RL)

DamageSim lets us turn physically grounded damage signals into a shaping reward that penalizes harmful interactions. Combined with the task reward, this yields a learning signal that encourages agents to complete the task while minimizing damage. We evaluate this idea in three settings: (i) pure RL with PPO on the Place Plate task, with and without the damage reward; (ii) behavior cloning on the Move Glass of Water task followed by PPO fine-tuning with the damage reward; and (iii) fine-tuning a Shelve Item policy (initially trained on the full IL dataset) with DSRL and an additional damage reward. The video below shows the training process for (iii), illustrating that over time the agent learns to reduce damage while maintaining high task performance.
Reinforcement Learning Results
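
A minimal sketch of a damage-shaped reward follows, assuming a simple linear penalty. The weight `damage_weight`, the function names, and the toy episodes are hypothetical; the exact reward used with PPO and DSRL in the experiments is not reproduced here.

```python
def shaped_reward(task_reward: float, damage: float,
                  damage_weight: float = 1.0) -> float:
    """Task reward minus a penalty proportional to the damage dealt this step."""
    return task_reward - damage_weight * damage


def episode_return(steps, damage_weight: float = 1.0, gamma: float = 1.0) -> float:
    """Discounted return of an episode given (task_reward, damage) pairs."""
    total, discount = 0.0, 1.0
    for task_reward, damage in steps:
        total += discount * shaped_reward(task_reward, damage, damage_weight)
        discount *= gamma
    return total


# Two ways to finish the same task: quickly but with an impact, or slowly and gently.
risky_episode = [(0.0, 0.0), (1.0, 4.0)]
careful_episode = [(0.0, 0.0), (0.0, 0.0), (1.0, 0.0)]

risky_return = episode_return(risky_episode, damage_weight=0.5)
careful_return = episode_return(careful_episode, damage_weight=0.5)
```

With `damage_weight=0.0` both episodes earn the same return, so the agent has no reason to prefer the careful strategy. The penalty term is what separates them.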

4 VLA evaluations

We evaluate the safety of modern manipulation policies by benchmarking GR00T, a state-of-the-art Vision-Language-Action model from NVIDIA. Even when GR00T achieves high task success, it frequently causes damage, resulting in much lower safe success rates and degraded environment health. This underscores the limits of evaluating VLAs solely by task completion and highlights the need for benchmarks and learning signals that explicitly account for harmful interactions.
VLA Evaluation Results
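
The gap between task success and safe success can be made concrete with a small metric sketch. The definitions here are assumptions: an episode counts as a safe success if it succeeds with total damage at or below `max_damage`, and environment health is averaged over episodes. The episode records are toy values, not results from the paper.

```python
def summarize(episodes, max_damage: float = 0.0) -> dict:
    """Compute success rate, safe success rate, and mean final environment health."""
    n = len(episodes)
    successes = sum(1 for e in episodes if e["success"])
    safe_successes = sum(
        1 for e in episodes if e["success"] and e["damage"] <= max_damage
    )
    return {
        "success_rate": successes / n,
        "safe_success_rate": safe_successes / n,
        "mean_final_health": sum(e["final_health"] for e in episodes) / n,
    }


# Toy rollouts for illustration only.
rollouts = [
    {"success": True, "damage": 0.0, "final_health": 100.0},
    {"success": True, "damage": 40.0, "final_health": 60.0},
    {"success": True, "damage": 60.0, "final_health": 40.0},
    {"success": False, "damage": 0.0, "final_health": 100.0},
]
metrics = summarize(rollouts)
```

In this toy data, three of four rollouts complete the task but only one does so without damage, which is the kind of discrepancy that a completion-only metric hides.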

5 Sim2Real transfer

Our ultimate goal is safe real-world robot behavior, so we test whether policies trained with OopsieVerse transfer safely to the real world. Compared to a baseline IL policy trained on all data, a damage-aware IL policy trained on health-filtered episodes behaves more cautiously in the Pour Water and Shelve Cereal Box tasks. It keeps farther away from the laptop when pouring water, and it makes space for the cereal box by pushing non-fragile objects such as the crackers box rather than fragile objects such as glass bottles. Our experiments suggest that explicit damage signals obtained via OopsieVerse can translate into improved safety on hardware.

Pour Water (Safe + Unsafe Data)

Pour Water (Safety-filtered Data)

Shelve Cereal Box (Safe + Unsafe Data)

Shelve Cereal Box (Safety-filtered Data)