🤖 Hand Action Prediction with VITRA
Upload a landscape-orientation, egocentric (first-person) image containing one or both hands, then provide text instructions to predict future 3D hand trajectories.
📝 Steps:
- Upload a landscape-view image containing hand(s).
- Enter text instructions describing the desired task.
- Optionally configure advanced settings, then click "Generate 3D Hand Trajectory".
💡 Tips:
- Use Left/Right Hand: Select which hand(s) to predict, based on which hands are detected in the image and which you want trajectories for.
- Instruction: Provide clear, specific imperative instructions for the left and right hands separately, entering each in its corresponding field. If the results are unsatisfactory, try adding more detail (e.g., object color or orientation).
- For best inference quality, capture landscape-view images from a camera height close to that of a human head. Highly unusual or distorted hand poses/positions may cause inference failures.
- Note that each generation produces only a single action chunk starting from the current state, which does not necessarily complete the entire task. Executing an entire chunk in one step may also reduce precision.
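The chunking tip above can be sketched in code: rather than executing a whole predicted chunk open-loop, execute only a short prefix and re-predict from the resulting state. This is a minimal illustrative sketch; `toy_predictor` is a hypothetical stand-in, not the VITRA API.

```python
from typing import Callable, List, Tuple

Waypoint = Tuple[float, float, float]  # a 3D hand position

def execute_receding_horizon(
    predict_chunk: Callable[[Waypoint], List[Waypoint]],
    start: Waypoint,
    steps_per_chunk: int,
    total_steps: int,
) -> List[Waypoint]:
    """Execute only the first `steps_per_chunk` waypoints of each
    predicted chunk, then re-predict from the new state, instead of
    running one whole chunk open-loop."""
    executed: List[Waypoint] = []
    state = start
    while len(executed) < total_steps:
        chunk = predict_chunk(state)      # one action chunk from the current state
        prefix = chunk[:steps_per_chunk]  # keep only a short prefix of the chunk
        executed.extend(prefix)
        state = prefix[-1]                # re-predict from where the hand ended up
    return executed[:total_steps]

# Toy predictor standing in for the model: moves the hand +0.1 in x per step.
def toy_predictor(state: Waypoint) -> List[Waypoint]:
    x, y, z = state
    return [(x + 0.1 * (i + 1), y, z) for i in range(15)]

traj = execute_receding_horizon(
    toy_predictor, (0.0, 0.0, 0.0), steps_per_chunk=5, total_steps=10
)
```

Re-predicting after each prefix keeps the trajectory conditioned on the latest state, which is why executing an entire chunk in one step tends to lose precision.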
📥 Input
⚙️ Prediction Settings
🖐️ Select Hands:
✍️ Instructions:
Left hand:
Right hand:
📋 Final Instruction:
Left hand: Put the trash into the garbage. Right hand: None.
🎬 Output
📚 Examples
👇 Click any example below to load the image and instruction