Separability criteria for the evaluation of boundary detection benchmarks
Title: Separability criteria for the evaluation of boundary detection benchmarks
Journal: IEEE Transactions on Image Processing
Abstract: There exists a significant number of benchmarks for evaluating the performance of boundary detection algorithms, most of them relying on some sort of comparison of the automatically-generated boundaries with human-labeled ones. Such benchmarks are composed of a representative image dataset, as well as a reliable comparison measure on the universe of boundary images. Although many such datasets and measures have been proposed, there is no clear way of knowing which combinations of them are the most suitable for the task. In this work, we introduce four quantitative criteria that allow for a sensible evaluation of the performance of a comparison measure on a given dataset. The criteria mimic the way in which humans understand boundary images, as well as their ability to recognize the underlying scenes. These criteria can, as a final goal, quantify the ability of the boundary detection benchmarks to evaluate the performance of boundary detection methods, either edge-based or segmentation-based.
Keywords: Boundary comparison; boundary image evaluation; edge detection; segmentation; error measure
Cite as: C. Lopez-Molina, H. Bustince and B. De Baets, “Separability criteria for the evaluation of boundary detection benchmarks”, IEEE Trans. on Image Processing, 25 (3), 1047-1055 (2016).
In the past, a range of strategies has been proposed for boundary quality evaluation, taking inspiration from classification theory, spatial topology or statistics. This has led to a rather fragmented landscape of experimental results obtained in several different ways, which makes them hard to compare and reduces their representativeness.
A quality evaluation method for boundary detection usually involves comparing the automatically-generated boundary images with the ground truth. This requires two core components: (a) a dataset and (b) a comparison measure able to capture the concordances and discordances between the boundary images and the ground truth. Such a measure can assess either the similarity (focus on quality) or the dissimilarity (focus on error) with respect to the ground truth. In our opinion, one of the major problems in quality evaluation for boundary detection is that datasets and comparison measures are audited independently of each other. This has led to decontextualized conclusions on the validity of different proposals.
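To make the notion of a comparison measure concrete, the following is a minimal sketch of a similarity measure (an F-measure) and its error counterpart on binary boundary images. It is an illustration only and is not taken from the paper: the function names, the list-of-lists image representation and the strict pixel-wise matching are all assumptions made for brevity. Measures used in practice typically tolerate small spatial displacements between matched boundary pixels.

```python
from typing import List, Tuple

# Assumed representation: a boundary image is a list of rows,
# where 1 marks a boundary pixel and 0 marks background.
BoundaryImage = List[List[int]]


def confusion_counts(candidate: BoundaryImage,
                     ground_truth: BoundaryImage) -> Tuple[int, int, int]:
    """Count true positives, false positives and false negatives
    using strict pixel-wise matching (no spatial tolerance)."""
    tp = fp = fn = 0
    for cand_row, gt_row in zip(candidate, ground_truth):
        for c, g in zip(cand_row, gt_row):
            if c and g:
                tp += 1
            elif c:
                fp += 1
            elif g:
                fn += 1
    return tp, fp, fn


def f_measure(candidate: BoundaryImage, ground_truth: BoundaryImage) -> float:
    """Similarity in [0, 1]: harmonic mean of precision and recall."""
    tp, fp, fn = confusion_counts(candidate, ground_truth)
    if tp + fp + fn == 0:
        return 1.0  # both images are empty, hence identical
    if tp == 0:
        return 0.0  # no boundary pixel is shared
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def f_error(candidate: BoundaryImage, ground_truth: BoundaryImage) -> float:
    """Dissimilarity view of the same measure (focus on error)."""
    return 1.0 - f_measure(candidate, ground_truth)


# Toy example: a vertical boundary and a candidate with one displaced pixel.
example_ground_truth = [[0, 1, 0],
                        [0, 1, 0],
                        [0, 1, 0]]
example_candidate = [[0, 1, 0],
                     [0, 1, 0],
                     [0, 0, 1]]

if __name__ == "__main__":
    print(f_measure(example_candidate, example_ground_truth))
    print(f_error(example_candidate, example_ground_truth))
```

In the toy example, two of the three candidate pixels coincide with the ground truth, so precision and recall are both 2/3. The displaced pixel is penalized twice (once as a false positive, once as a false negative), which is exactly the kind of behavior that differs from one comparison measure to another.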
A problem with previous approaches is that they judge a comparison measure or a dataset in isolation, although the usefulness of each is intrinsically linked to the other. In this work, we judge the combination of both, i.e. we do not judge the general suitability of either a comparison measure or a dataset, but that of their combination instead. We do so by elaborating on a set of minimal conditions that a comparison measure should satisfy on a given dataset. Then, we put the most popular comparison measures in the literature to the test in combination with the BSDS (the most widely used dataset).
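As a rough illustration of what judging a measure and a dataset together can look like, the sketch below checks one hypothetical condition: for each human-made ground truth, every ground truth of the same image should be closer (under the measure) than every ground truth of a different image. This condition, the function names and the toy data are assumptions made for illustration; they are not the four criteria defined in the paper.

```python
from typing import Callable, Dict, List

BoundaryImage = List[List[int]]  # 1 = boundary pixel, 0 = background
Measure = Callable[[BoundaryImage, BoundaryImage], float]


def pixel_disagreement(a: BoundaryImage, b: BoundaryImage) -> float:
    """Toy dissimilarity: fraction of pixels on which two images differ."""
    total = sum(len(row) for row in a)
    differing = sum(x != y
                    for row_a, row_b in zip(a, b)
                    for x, y in zip(row_a, row_b))
    return differing / total


def separation_rate(dataset: Dict[str, List[BoundaryImage]],
                    dissimilarity: Measure) -> float:
    """Fraction of ground truths that are strictly closer to all ground
    truths of their own image than to any ground truth of another image."""
    labelled = [(image_id, gt)
                for image_id, gts in dataset.items()
                for gt in gts]
    hits = evaluated = 0
    for i, (image_id, gt) in enumerate(labelled):
        same = [dissimilarity(gt, other)
                for j, (other_id, other) in enumerate(labelled)
                if j != i and other_id == image_id]
        different = [dissimilarity(gt, other)
                     for other_id, other in labelled
                     if other_id != image_id]
        if not same or not different:
            continue  # nothing to compare against
        evaluated += 1
        if max(same) < min(different):
            hits += 1
    return hits / evaluated if evaluated else 0.0


# Hypothetical dataset: two scenes, each labelled by two humans.
toy_dataset = {
    "vertical": [
        [[0, 1, 0], [0, 1, 0], [0, 1, 0]],
        [[0, 1, 0], [0, 1, 0], [0, 0, 1]],
    ],
    "horizontal": [
        [[0, 0, 0], [1, 1, 1], [0, 0, 0]],
        [[0, 0, 0], [1, 1, 1], [0, 0, 1]],
    ],
}

if __name__ == "__main__":
    print(separation_rate(toy_dataset, pixel_disagreement))
```

The result depends on both arguments at once: a different measure on the same dataset, or the same measure on a different dataset, can yield a different rate, which is why the pair is judged rather than either component alone.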
Code (in the KITT): The following pieces of code are of interest for the study and/or use of the developments in this work:
- No code uploaded yet (please email carlos.lopez-at-unavarra.es to request a copy);
- The paper at IEEE Xplore.
Related works (in the KITT):
- Twofold Consensus for boundary image ground truth;
- Quantitative error measures for edge detection.
Related works (web):
- [Martin03] – D. R. Martin, “An empirical approach to grouping and segmentation,” Ph.D. dissertation, University of California, Berkeley, 2003.
- [PontTuset13] – J. Pont-Tuset and F. Marques, “Measures and meta-measures for the supervised evaluation of image segmentation,” in Proc. of the IEEE Conf. on Computer Vision and Pattern Recognition, 2013, pp. 2131–2138.
- [Hou12] – X. Hou, A. Yuille, and C. Koch, “A meta-theory of boundary detection benchmarks,” in Proc. of the NIPS Workshop on Human Computation for Science and Computational Sustainability, 2012.
- [Hou13] – X. Hou, A. Yuille, and C. Koch, “Boundary detection benchmarking: Beyond F-Measures,” in Proc. of the IEEE Conf. on Computer Vision and Pattern Recognition, 2013, pp. 2123–2130.