🦁 Zoo3D: Zero-Shot 3D Object Detection at Scene Level 🐼

GitHub Repository

Upload a video or a set of images to create a 3D reconstruction and run open-vocabulary 3D object detection for the text labels you provide. The app builds a point cloud of the scene and draws a colored wireframe bounding box around each detected object.
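
A wireframe box is just its 12 edges drawn as line segments. The sketch below shows one way to derive them. It is an illustration under assumptions, not the app's code: it assumes axis-aligned boxes given as a center and full size (the app's actual box format is not stated here), and `box_corners` and `wireframe_segments` are hypothetical helper names.

```python
from itertools import product


def box_corners(center, size):
    """Return the 8 corners of an axis-aligned box given its center and full size.

    Assumption: boxes are axis-aligned and described by (center, size).
    """
    cx, cy, cz = center
    hx, hy, hz = (s / 2.0 for s in size)
    # product() enumerates sign patterns in a fixed order, so corner index i
    # encodes the x, y, z signs in bits 2, 1, 0 (bit set = positive side).
    return [
        (cx + sx * hx, cy + sy * hy, cz + sz * hz)
        for sx, sy, sz in product((-1, 1), repeat=3)
    ]


def wireframe_segments(center, size):
    """Return the 12 edges of the box as (start, end) point pairs."""
    corners = box_corners(center, size)
    segments = []
    for i in range(8):
        for j in range(i + 1, 8):
            # Two corners share an edge when their sign patterns differ on exactly one axis.
            if bin(i ^ j).count("1") == 1:
                segments.append((corners[i], corners[j]))
    return segments
```

Each segment can then be drawn in the color assigned to the box's label. Using the XOR of corner indices avoids hard-coding an edge table.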

Getting Started:

Upload Your Data: Use "Upload Video" or "Upload Images". Videos are sampled at 1 frame per second.
Enter Text Labels (Required): Provide one or more labels separated by semicolons, e.g. chair; table; plant.
Detect: Click "Detect Objects". The app will reconstruct the scene (if needed) and then run detection.
Threshold (Optional): Tune the Detection Threshold (0–1). Higher values give fewer but more confident detections.
Visualize & Download: A single 3D view shows the point cloud with colored wireframe boxes, and a legend maps each color to its label. You can download the scene as a GLB file.
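
The sampling, threshold, and legend steps above can be sketched in Python. This is a minimal illustration under stated assumptions, not the app's code: `sample_frame_indices`, `filter_detections`, `legend_colors`, the `Detection` record, and `PALETTE` are hypothetical names. It assumes frames are picked by flooring `k * fps` for each second k, that the threshold is compared to a per-detection confidence score with `>=`, and that legend colors cycle through a fixed palette.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Detection:
    """Hypothetical detection record: a label and a confidence in [0, 1]."""
    label: str
    score: float


# Hypothetical palette; colors repeat if there are more labels than entries.
PALETTE = [(230, 25, 75), (60, 180, 75), (0, 130, 200), (245, 130, 48), (145, 30, 180)]


def sample_frame_indices(num_frames: int, fps: float, seconds_per_sample: float = 1.0) -> list[int]:
    """Return indices of frames taken once every `seconds_per_sample` seconds."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    step = fps * seconds_per_sample  # frames between consecutive samples
    indices = []
    k = 0
    while int(k * step) < num_frames:
        indices.append(int(k * step))  # floor to the frame at time k * seconds_per_sample
        k += 1
    return indices


def filter_detections(detections: list[Detection], threshold: float) -> list[Detection]:
    """Keep detections whose score reaches the threshold (assumed inclusive)."""
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be in [0, 1]")
    return [d for d in detections if d.score >= threshold]


def legend_colors(labels: list[str]) -> dict[str, tuple[int, int, int]]:
    """Map each distinct label, in first-seen order, to a palette color."""
    unique = list(dict.fromkeys(labels))  # de-duplicate while preserving order
    return {label: PALETTE[i % len(PALETTE)] for i, label in enumerate(unique)}
```

Raising the threshold shrinks the list returned by `filter_detections`, which matches the "higher means fewer, more confident detections" behavior described above.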

Notes: Reconstruction is triggered automatically on the first run. If no labels are provided, the app shows the error "Please enter at least one text label (separated by ';')."
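
The label rule in the notes can be sketched as follows. `parse_labels` is a hypothetical helper, not the app's code; it assumes surrounding whitespace is trimmed and empty entries (for example from a trailing ';') are dropped before checking that at least one label remains.

```python
from __future__ import annotations


def parse_labels(text: str | None) -> list[str]:
    """Split semicolon-separated labels, trimming whitespace and skipping empties."""
    labels = [part.strip() for part in (text or "").split(";")]
    labels = [label for label in labels if label]
    if not labels:
        # Message text taken from the notes above.
        raise ValueError("Please enter at least one text label (separated by ';').")
    return labels
```

For example, `parse_labels("chair; table; plant")` yields the three labels, while an input of only spaces and semicolons raises the error.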