Computer Vision for Objects used in Group Work: Challenges and Opportunities
Changsoo Jung, Sheikh Mannan, Jack Fitzgerald, Nathaniel Blanchard

TL;DR
This paper introduces FiboSB, a challenging 6D pose dataset for group work with small objects, evaluates state-of-the-art methods, and improves object detection accuracy for collaborative educational scenarios.
Contribution
The work presents a new dataset, FiboSB, benchmarks current 6D pose estimation methods to expose their limitations, and improves object detection by fine-tuning YOLO11-x.
Findings
Current algorithms struggle with small objects in group settings.
Fine-tuning YOLO11-x significantly improves detection accuracy.
FiboSB provides a challenging benchmark for future research.
Abstract
Interactive and spatially aware technologies are transforming educational frameworks, particularly in K-12 settings where hands-on exploration fosters deeper conceptual understanding. However, during collaborative tasks, existing systems often lack the ability to accurately capture real-world interactions between students and physical objects. This issue could be addressed with automatic 6D pose estimation, i.e., estimation of an object's position and orientation in 3D space from RGB images or videos. For collaborative groups that interact with physical objects, 6D pose estimates allow AI systems to relate objects and entities. As part of this work, we introduce FiboSB, a novel and challenging 6D pose video dataset in which groups of three participants solve an interactive task involving small hand-held cubes and a weight scale. This setup poses unique challenges for 6D pose because…
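The abstract defines a 6D pose as an object's position and orientation in 3D space. The following is a minimal sketch of that definition. It assumes a rigid object and a simple pinhole camera, and the paper summary specifies neither. The pose is a rotation R (3 degrees of freedom, built here from an axis and an angle) plus a translation t (3 degrees of freedom). Applying it to the corners of a cube and projecting them shows where the cube would appear in an RGB frame. The cube size, camera intrinsics, and helper names (`axis_angle_to_matrix`, `transform`, `project`, `cube_corners`) are hypothetical illustrations. They are not taken from FiboSB or from any method the paper evaluates.

```python
import math
from itertools import product
from typing import List, Tuple

Vec3 = Tuple[float, float, float]
Mat3 = Tuple[Vec3, Vec3, Vec3]


def axis_angle_to_matrix(axis: Vec3, angle: float) -> Mat3:
    """Rodrigues' formula: rotation of `angle` radians about `axis` (normalized here)."""
    norm = math.sqrt(sum(a * a for a in axis))
    kx, ky, kz = (a / norm for a in axis)
    c, s = math.cos(angle), math.sin(angle)
    v = 1.0 - c
    return (
        (c + kx * kx * v, kx * ky * v - kz * s, kx * kz * v + ky * s),
        (ky * kx * v + kz * s, c + ky * ky * v, ky * kz * v - kx * s),
        (kz * kx * v - ky * s, kz * ky * v + kx * s, c + kz * kz * v),
    )


def transform(R: Mat3, t: Vec3, p: Vec3) -> Vec3:
    """Apply the 6D pose (R, t) to an object-frame point: p_cam = R @ p + t."""
    return tuple(sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3))


def project(p_cam: Vec3, fx: float, fy: float, cx: float, cy: float) -> Tuple[float, float]:
    """Pinhole projection of a camera-frame point to pixel coordinates (u, v)."""
    x, y, z = p_cam
    if z <= 0:
        raise ValueError("point is behind the camera")
    return (fx * x / z + cx, fy * y / z + cy)


def cube_corners(side: float) -> List[Vec3]:
    """The 8 corners of an axis-aligned cube centered at the object origin."""
    h = side / 2.0
    return [(sx * h, sy * h, sz * h) for sx, sy, sz in product((-1, 1), repeat=3)]


# Hypothetical scene: a 3 cm cube rotated 30 degrees about the camera's y-axis,
# held 0.5 m in front of a camera with made-up intrinsics (fx = fy = 600, center 320, 240).
R = axis_angle_to_matrix((0.0, 1.0, 0.0), math.radians(30))
t = (0.0, 0.0, 0.5)
pixels = [project(transform(R, t, p), 600.0, 600.0, 320.0, 240.0) for p in cube_corners(0.03)]
```

A 6D pose estimator solves the inverse problem. It recovers R and t from the image alone. For a small cube only a few centimeters across, the projected corners span only a few dozen pixels in this setup. Hands can also hide some of them. These conditions are consistent with the paper's finding that small objects in group settings are hard.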
Taxonomy
Topics: Robot Manipulation and Learning · Hand Gesture Recognition Systems · Human Pose and Action Recognition
