Grasp-Anything
The Grasp-Anything [1] dataset is synthesized from foundation models. With 1 million samples and 3 million diverse objects, it substantially surpasses previous grasp datasets in scale and diversity, facilitating zero-shot grasp detection in both simulated and real-world settings.
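To make the sample format concrete, the snippet below is a minimal loading sketch. The directory layout, file names, and the 5D grasp-rectangle encoding (center x, y, width, height, rotation angle) are assumptions for illustration, not the dataset's documented API.

```python
# Illustrative loader for one Grasp-Anything-style sample; the directory
# names and the (x, y, w, h, theta) rectangle format are assumptions.
from pathlib import Path

import torch
from PIL import Image

root = Path("grasp-anything")   # hypothetical dataset root
sample_id = "0000001"           # hypothetical sample identifier

image = Image.open(root / "image" / f"{sample_id}.jpg")
grasps = torch.load(root / "grasp" / f"{sample_id}.pt")  # assumed (N, 5) tensor

for x, y, w, h, theta in grasps.tolist():
    print(f"grasp at ({x:.1f}, {y:.1f}), size {w:.0f}x{h:.0f}, angle {theta:.2f} rad")
```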
Data Pipeline
We use ChatGPT to generate an expansive array of scene descriptions. Next, we transform these descriptions into images using Stable Diffusion [2] and annotate them with grasp poses using a pretrained RAGT-3/3 model. The resulting grasp poses are post-processed to ensure high quality.
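The sketch below outlines this three-stage pipeline in Python. It is illustrative, not the released code: the ChatGPT prompt, the Stable Diffusion checkpoint, the `detect_grasps` stub (standing in for the RAGT-3/3 model, assumed to output scored 5D grasp rectangles), and the filtering threshold are all assumptions.

```python
# A minimal sketch of the Grasp-Anything generation pipeline.
import torch
from openai import OpenAI
from diffusers import StableDiffusionPipeline

client = OpenAI()  # expects OPENAI_API_KEY in the environment

def generate_scene_description() -> str:
    # Step 1: prompt ChatGPT for a scene description (prompt is illustrative).
    resp = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user",
                   "content": "Describe a cluttered table scene with "
                              "graspable everyday objects in one sentence."}],
    )
    return resp.choices[0].message.content

# Step 2: render the description into an image with Stable Diffusion.
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

def synthesize_image(description: str):
    return pipe(description).images[0]

def detect_grasps(image):
    # Placeholder for the pretrained RAGT-3/3 grasp detector; assumed to
    # return a list of (x, y, w, h, theta, score) grasp rectangles.
    raise NotImplementedError

def postprocess(grasps, min_score: float = 0.9):
    # Step 3: keep only high-confidence poses, mirroring the paper's
    # quality filtering (the threshold value here is an assumption).
    return [g for g in grasps if g[-1] >= min_score]
```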
Demonstration
Samples
References
- [1] An Dinh Vuong, Minh Nhat Vu, Hieu Le, Baoru Huang, Binh Huynh, Thieu Vo, Andreas Kugi, and Anh Nguyen. Grasp-Anything: Large-scale grasp dataset from foundation models. In ICRA, 2024.
- [2] Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, and Björn Ommer. High-resolution image synthesis with latent diffusion models. In CVPR, 2022.