Dataset Analysis

Analyzing the Statistical Properties of the Segmented Annotations

Category Distribution of Annotations

We compute the percentage of object instances in each category (and in each COCO super-category). To display different numbers of categories in the same plot, we compute the accumulated frequency over the sorted categories, so that all plots go from 0 to 100%. Hover over a plot point to see the name of the category it represents and its frequency. We also show the global sizes of the databases in a table.
              #Categories   #Images        #Instances
Pascal        20            2913           6934
  Train+Val                 1464+1449      3507+3427
SBD           20            11355          26843
  Train+Val                 8498+2857      20172+6671
COCO          80            123287         886284
  Train+Val                 82783+40504    597869+288415
In the three databases, person is by far the most common category (around 25-30%). The imbalance is especially pronounced in COCO: the second most common category is at 5%, 59 categories have less than 1% of the objects each, and 20 have less than 0.5%. Pascal and SBD categories, and especially super-categories in COCO, are more balanced.
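
The accumulated frequency described above can be sketched in a few lines of Python. This is only an illustration, not the code used to produce the plots: the function names `accumulated_frequency` and `count_below`, the toy label list, and the choice to sort categories by decreasing frequency are all assumptions made for the example.

```python
from collections import Counter


def accumulated_frequency(labels):
    """Return (category, percentage, accumulated percentage) tuples.

    Categories are sorted by decreasing frequency (an assumption; the
    text only says the categories are sorted).
    """
    counts = Counter(labels)
    total = sum(counts.values())
    rows = []
    running = 0
    for category, count in counts.most_common():
        running += count
        # Accumulate counts rather than percentages so the last value
        # is exactly 100.
        rows.append((category, 100.0 * count / total, 100.0 * running / total))
    return rows


def count_below(rows, threshold):
    """Count the categories whose share of instances is below threshold (in %)."""
    return sum(1 for _, percentage, _ in rows if percentage < threshold)


# Toy data: one category label per object instance (hypothetical values).
labels = ["person"] * 6 + ["car"] * 3 + ["dog"] * 1
rows = accumulated_frequency(labels)
for category, percentage, accumulated in rows:
    print(f"{category:8s} {percentage:5.1f}% {accumulated:6.1f}%")
print("categories below 20%:", count_below(rows, 20.0))
```

Because every curve ends at 100% regardless of how many categories a database has, curves for 20 and 80 categories can share one plot. A helper like `count_below` gives the kind of summary quoted above, such as the number of categories under 1%.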

