DataMIL

Selecting Data for Robot Imitation Learning with Datamodels

* Equal Contribution, Equal Advising
1The University of Texas at Austin, 2MIT, 3Stanford University

TL;DR

Robotics has amassed ever larger and more diverse datasets for training generalist robot policies via imitation. However, while imitation learning (IL) is a powerful paradigm for teaching robots complex tasks, the performance of IL algorithms is highly sensitive to the data they are trained on. In this work, we introduce DataMIL, a novel data selection method that leverages datamodels to select high-quality datasets for IL. DataMIL selects data based on its relevance to the task at hand, ensuring that the policy learns from the most informative examples. We demonstrate the effectiveness of DataMIL on 60+ simulation and real-world tasks, most notably selecting relevant data from the Open X-Embodiment datasets, and show significant performance improvements over existing data selection methods.

What are datamodels?

Datamodels are a framework that tries to answer the question: how would the output of a model change if we had trained it on a different dataset? In other words, datamodels provide a way to directly measure how the presence or absence of each training sample would affect the model's output, without actually retraining the model on the new dataset. While there are several ways of estimating datamodels, we focus on regression- and metagradient-based datamodels, which assign a scalar influence score to each training sample based on how much it affects the model's output. These scores can then be used to select the most relevant samples, or to filter out the most harmful ones, for a given task.
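To make this concrete, below is a minimal sketch of regression-based datamodel estimation: train models on many random subsets of the data, record the output metric for each subset, and fit a linear model from the inclusion mask to that metric. This is an illustration of the general technique, not the paper's implementation; the `train_and_eval` callable and all hyperparameters are assumptions.

```python
import numpy as np

def estimate_datamodel(train_and_eval, n_samples, n_subsets=1000, keep_frac=0.5):
    """Fit a linear datamodel over n_samples training points.

    train_and_eval (assumed): trains a model on the subset of samples where
    mask == 1 and returns a scalar target metric (e.g., validation loss).
    """
    # Draw random inclusion masks and measure the outcome for each subset.
    masks = (np.random.rand(n_subsets, n_samples) < keep_frac).astype(np.float64)
    outcomes = np.array([train_and_eval(m) for m in masks])

    # Least-squares fit: outcome ~ masks @ theta + bias.
    # theta[i] is the estimated influence of sample i on the metric.
    X = np.hstack([masks, np.ones((n_subsets, 1))])
    coef, *_ = np.linalg.lstsq(X, outcomes, rcond=None)
    theta, bias = coef[:-1], coef[-1]
    return theta
```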

DataMIL: DataModels for Imitation Learning

Datamodels have found several applications in NLP and computer vision, but they need several key modifications before they can be applied to robotics. DataMIL (Datamodels for Imitation Learning) provides a recipe for adapting datamodels to robotics datasets: a tractable optimization objective in place of costly rollouts, plus several modifications that reduce estimation noise and improve the quality of the selected data.

1. Proxy Metric: validation loss over a few target-task demos as a proxy for costly rollouts (a minimal sketch follows this list).

2. Clustering: temporal clustering of training samples to reduce estimation noise.

3. Co-Training with Target: minimizing distribution shift by co-training with the target data.

Summary: Data Selection with DataMIL

Data selection with DataMIL is a two-step process:

  1. Estimate datamodels. We first cluster the training samples into trajectories or sub-trajectories, then estimate datamodels on them using our proposed target metric as the proxy.
  2. Select data for policy training. The datamodels assign a scalar score (influence) to each training sample, indicating how positively or negatively it influences our target metric (in our case, the validation loss over a few target-task demos). Using these scores, we select the top x% of samples to train our policy on (see the sketch after this list).

Finally, we employ a co-training recipe and train our final policy by sampling uniformly from the selected data and the target-task dataset.
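A hedged sketch of this selection-plus-co-training recipe using standard PyTorch utilities is below. The sign convention (more negative score means more helpful, since the metric is a loss), the 50/50 sampling split, and the dataset interfaces are our assumptions for illustration.

```python
import numpy as np
import torch
from torch.utils.data import ConcatDataset, DataLoader, Subset, WeightedRandomSampler

def make_cotrain_loader(scores, prior_dataset, target_dataset,
                        top_frac=0.10, batch_size=64):
    """Keep the top-x% of prior samples by datamodel score, then build a
    loader that draws from selected-prior and target data uniformly."""
    # Assumption: scores estimate the change in target validation loss when a
    # sample is included, so the most negative scores mark the most helpful data.
    k = max(1, int(top_frac * len(scores)))
    selected_idx = np.argsort(scores)[:k]
    selected = Subset(prior_dataset, selected_idx.tolist())

    combined = ConcatDataset([selected, target_dataset])
    # Put equal total probability mass on each source: a uniform co-training mix.
    weights = torch.cat([
        torch.full((len(selected),), 0.5 / len(selected)),
        torch.full((len(target_dataset),), 0.5 / len(target_dataset)),
    ])
    sampler = WeightedRandomSampler(weights, num_samples=len(combined),
                                    replacement=True)
    return DataLoader(combined, batch_size=batch_size, sampler=sampler)
```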


OXE Results

We evaluate DataMIL on over 60 tasks spanning both simulation and real-world settings. In the real world, we use the Open X-Embodiment dataset as the prior and test on four target tasks (shown above). As illustrated in the results below, DataMIL consistently outperforms existing data selection methods across tasks with diverse characteristics. Most notably, it successfully selects relevant data for a completely new embodiment, Tiago, by identifying useful samples from datasets collected with different robots, such as the Google Robot and WidowX. We also extend our evaluation to a multitask setting, where the target set includes multiple tasks, and show that DataMIL can effectively retrieve data that supports all of them. For a detailed analysis, please refer to the full paper.


What data does DataMIL select?

We qualitatively find 3 interesting insights about the data selected by DataMIL:

  1. Distribution of selected data. Below we show the distribution of datasets selected by DataMIL and some representative baselines on the Tiago-Sink and Franka-Pouch tasks. We find that the data selected by DataMIL typically spans several datasets, while similarity-based baselines select most of their data from a single dataset. We hypothesize that because no prior data exactly matches the target task, the selected data must be not only relevant but also general, enabling positive transfer of capabilities without overfitting the policy to a single type of domain.

Selected dataset distribution for the Tiago-Sink task (panels: DataMIL, Action Retrieval, Behavior Retrieval)

  2. Type of embodiment selected. DataMIL is able to select useful data for a completely new embodiment in the Tiago-Sink task. Even though the selected samples look visually quite different, drawn from datasets such as RT-1, BC-Z, and Bridge, they still capture the essence of the target task: robots operating on a tabletop from an ego-centric perspective. For baselines, even when the target embodiment is present in the prior data (e.g., the Franka-Pouch task), the selected data often comes from other embodiments, possibly because these methods put more weight on the scene and distractors when computing similarity. In contrast, DataMIL selects data from the correct embodiment when it is present in the prior dataset.

Selected dataset distribution for the Franka-Pouch task (panels: DataMIL, Flow Retrieval, Behavior Retrieval)

  3. Top and bottom samples. Interestingly, when we visually inspect the data selected by DataMIL (shown below), we find that the highest- and lowest-ranked samples typically look alike. This is in line with observations in computer vision, where the most useful data often looks very similar to the most harmful, albeit with different labels. Similarly, in robotics, similar states can have very different action distributions; while some of these actions may help reduce the policy loss on the target data, others may lead to large deviations, making them harmful for final policy learning.

Top- and bottom-ranked samples for the Franka-Ball, Franka-Pouch, Tiago-Sink, and Droid-Multitask tasks

BibTeX