Five Common Mistakes When Using YCB Sets in Manipulation Experiments

Researchers often select the YCB set simply because it is a community standard, without mapping it to a precise research question. As a result, they test everything a little and nothing in depth. Work on fine in-hand manipulation, for example, may not need the full variety of tools, containers, and food items; a narrower subset with rich contact interactions serves it better. A clear definition of the target capability should drive which objects, poses, and metrics are used and which can be safely ignored.

Inconsistent Object Subsets and Poses

A frequent problem is ad hoc selection of objects and initial poses. One lab may test on mugs and boxes, another on balls and fruit, yet both advertise “YCB results,” which undermines comparability. Random placement without documented pose distributions also makes the metrics opaque. A better approach is to define a fixed subset and a small catalog of initial poses, publish them, and reuse exactly the same configuration for all baselines, ablations, and follow‑up experiments.
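One lightweight way to enforce this is to pin the subset and pose catalog in version-controlled code or data that every experiment loads. The Python sketch below illustrates the idea; the object IDs follow standard YCB naming, but the particular selection and poses are hypothetical.

```python
# benchmark_config.py -- illustrative sketch of a pinned YCB subset and pose catalog.
# Object IDs use standard YCB naming; the selection and poses here are hypothetical.
from dataclasses import dataclass

@dataclass(frozen=True)
class InitialPose:
    """Object pose in the robot base frame: position (m) and yaw (rad)."""
    x: float
    y: float
    z: float
    yaw: float

# Fixed, published subset: every baseline and ablation uses exactly this list.
OBJECTS = [
    "003_cracker_box",
    "005_tomato_soup_can",
    "006_mustard_bottle",
    "025_mug",
]

# Small catalog of initial poses, enumerated explicitly rather than sampled at runtime.
POSE_CATALOG = {
    "003_cracker_box": [InitialPose(0.45, 0.00, 0.0, 0.0),
                        InitialPose(0.45, 0.10, 0.0, 1.57)],
    "005_tomato_soup_can": [InitialPose(0.50, -0.05, 0.0, 0.0)],
    "006_mustard_bottle": [InitialPose(0.48, 0.05, 0.0, 0.78)],
    "025_mug": [InitialPose(0.42, 0.00, 0.0, 3.14)],
}
```

Because the catalog is data rather than a sampling procedure, a follow-up study can rerun exactly the same configurations simply by importing the same file.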

Poor Control of Object Wear and Replacement

Physical YCB objects inevitably degrade: labels peel, surfaces become glossy, edges chip, and fillings compact. These subtle changes can significantly alter friction, compliance, and visual appearance, quietly shifting task difficulty over time.

Experiments that span months may therefore compare algorithms on effectively different object sets. Laboratories should track wear, replace heavily used items, and at minimum document the state of each object at the time of a major experimental campaign using photos and short descriptions.
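Such a log needs no special tooling. The sketch below appends dated condition records to a plain JSON Lines file; the schema and condition labels are assumptions for illustration, not any YCB standard.

```python
# object_log.py -- illustrative wear log for physical YCB objects.
# The schema and condition scale are assumptions, not part of any YCB standard.
import json
from datetime import date

def log_object_state(path, object_id, condition, notes, photo_file):
    """Append a dated condition record for one object to a JSON Lines file."""
    record = {
        "date": date.today().isoformat(),
        "object_id": object_id,
        "condition": condition,   # e.g. "new" / "worn" / "replaced"
        "notes": notes,
        "photo": photo_file,      # filename of a reference photo
    }
    with open(path, "a") as f:
        f.write(json.dumps(record) + "\n")

# Example: record visible wear before a major experimental campaign.
log_object_state("ycb_wear_log.jsonl", "025_mug",
                 condition="worn",
                 notes="glaze chipped on rim; handle friction noticeably lower",
                 photo_file="025_mug_2024-03-01.jpg")
```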

Ignoring Sensing and Calibration Bias

Another typical error is treating YCB as a fully standardized benchmark while ignoring local sensing and calibration. Depth cameras with different noise characteristics, lighting conditions, camera–arm calibration quality, and gripper accuracy can all drastically influence performance. When these factors are not measured and reported, it becomes unclear whether performance differences come from the algorithm or from the perception pipeline. Experiments should include calibration routines, quantitative reports of sensing accuracy, and, when possible, reuse of the same camera poses and lighting layouts across studies.
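As one concrete example of a quantitative sensing report, depth accuracy can be measured by imaging a flat board at a known distance and summarizing the residual error. The sketch below assumes the relevant depth readings are already available as a NumPy array; the demo data is synthetic.

```python
# depth_check.py -- illustrative depth-accuracy report against a flat target.
# Assumes `depth_roi` holds depth readings (meters) over a region of interest
# covering a flat board placed at a known distance from the camera.
import numpy as np

def depth_accuracy_report(depth_roi: np.ndarray, true_distance_m: float) -> dict:
    """Summarize depth sensor error over a planar target at a known range."""
    valid = depth_roi[np.isfinite(depth_roi) & (depth_roi > 0)]
    errors = valid - true_distance_m
    return {
        "true_distance_m": true_distance_m,
        "bias_mm": 1000.0 * float(np.mean(errors)),  # systematic offset
        "rmse_mm": 1000.0 * float(np.sqrt(np.mean(errors ** 2))),
        "valid_pixel_fraction": float(valid.size) / depth_roi.size,
    }

# Demo with synthetic data standing in for a real capture at 0.60 m.
rng = np.random.default_rng(0)
fake_roi = 0.60 + rng.normal(0.002, 0.004, size=(50, 50))
print(depth_accuracy_report(fake_roi, true_distance_m=0.60))
```

Publishing numbers like these alongside results lets readers judge whether a performance gap plausibly fits within the sensing noise of either setup.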

Overly Narrow Metrics

Many studies reduce evaluation to “success rate over N trials,” ignoring how success is achieved. Such scalar metrics hide differences in trajectories, contact forces, and execution time. An algorithm that succeeds slowly and with multiple regrasp attempts may receive the same score as a robust and smooth strategy. To use YCB effectively, researchers should complement binary success with richer metrics such as number of grasp attempts, total execution time, maximum contact force, and robustness to intentional perturbations or object shifts.
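A simple way to implement this is to record a structured result per trial instead of a single boolean and aggregate offline. The sketch below is illustrative; the field names are assumptions and should match whatever the experimental rig can actually measure.

```python
# trial_metrics.py -- illustrative multidimensional trial record.
# Field names are assumptions; include only quantities your rig can measure.
from dataclasses import dataclass
from statistics import mean

@dataclass
class TrialResult:
    success: bool
    grasp_attempts: int         # regrasps count against smoothness
    execution_time_s: float
    max_contact_force_n: float
    perturbation_applied: bool  # True if the object was intentionally shifted

def summarize(trials: list[TrialResult]) -> dict:
    """Aggregate richer statistics than a bare success rate."""
    perturbed = [t for t in trials if t.perturbation_applied]
    return {
        "success_rate": mean(t.success for t in trials),
        "mean_grasp_attempts": mean(t.grasp_attempts for t in trials),
        "mean_execution_time_s": mean(t.execution_time_s for t in trials),
        "max_contact_force_n": max(t.max_contact_force_n for t in trials),
        "perturbed_success_rate": (
            mean(t.success for t in perturbed) if perturbed else None),
    }
```

With records like these, the slow multi-regrasp strategy and the smooth one-shot strategy no longer collapse to the same scalar.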

Neglecting Protocol Transparency

Even when experiments are carefully designed, details often remain in internal lab notes rather than in the publication. Controller parameters, stopping criteria, reset procedures, and failure modes are sometimes omitted or only vaguely described. As a result, other groups are forced to guess missing pieces, and repeatability suffers. Treating the YCB protocol itself as a primary artifact—releasing launch scripts, configuration files, scene descriptions, and explicit lists of corner cases—turns a local experiment into a reusable benchmark.
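In practice this can be as simple as shipping a machine-readable manifest alongside the paper. The sketch below shows the kind of detail such an artifact might record; every parameter name here is hypothetical.

```python
# protocol_manifest.py -- hypothetical sketch of a released YCB protocol artifact.
# None of these parameter names come from a standard; they show the kind of
# details that otherwise stay buried in internal lab notes.
import json

PROTOCOL = {
    "benchmark": "ycb-pick-place-v1",  # hypothetical identifier
    "controller": {"type": "impedance", "stiffness_n_per_m": 800.0},
    "stopping_criteria": {"timeout_s": 60.0, "max_grasp_attempts": 3},
    "reset_procedure": "operator replaces object on printed pose template",
    "known_failure_modes": [
        "mug handle occluded at yaw > 2.8 rad",
        "soup can rolls off template after failed grasp",
    ],
}

with open("protocol_manifest.json", "w") as f:
    json.dump(PROTOCOL, f, indent=2)
```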

Checklist for Robust Use of YCB

  • Align object and pose selection with a clearly defined research question.
  • Fix and publish the chosen subset of objects and initial poses.
  • Monitor object wear and replace items that significantly change in appearance or friction.
  • Calibrate sensors, document camera and lighting setups, and report their accuracy.
  • Use multidimensional metrics and describe the full experimental protocol in detail.

Conclusion: From Objects to Reliable Benchmarks

YCB sets become true benchmarks only when their use is tightly structured. Avoiding the five mistakes above improves reproducibility and makes comparisons between manipulation algorithms significantly more meaningful. The objects themselves then serve as a stable foundation, while the real value comes from rigorous experimental design and transparent reporting.