Linear colour segmentation revisited

Anna Smagina; Valentina Bozhkova; Sergey Gladilin; Dmitry Nikolaev

arXiv:1901.00534·cs.CV·March 26, 2019

TL;DR

This paper revisits linear colour segmentation algorithms, introduces a novel region adjacency graph-based method with a projective transform for better shadow handling, and demonstrates improved results on a new benchmark dataset.

Contribution

It proposes a new segmentation algorithm based on region adjacency graphs and a projective transform, with demonstrated qualitative improvements over existing methods.

Findings

01

Qualitative advantages over other model-based algorithms

02

Positive effect of each proposed modification

03

Effective handling of shadows and highlights

Abstract

In this work we discuss the known algorithms for linear colour segmentation based on a physical approach and propose a new modification of segmentation algorithm. This algorithm is based on a region adjacency graph framework without a pre-segmentation stage. Proposed edge weight functions are defined from linear image model with normal noise. The colour space projective transform is introduced as a novel pre-processing technique for better handling of shadow and highlight areas. The resulting algorithm is tested on a benchmark dataset consisting of the images of 19 natural scenes selected from the Barnard's DXC-930 SFU dataset and 12 natural scene images newly published for common use. The dataset is provided with pixel-by-pixel ground truth colour segmentation for every image. Using this dataset, we show that the proposed algorithm modifications lead to qualitative advantages over…

Tables

Table 1: Comparison of Klinker’s and Nikolaev’s algorithms.

	Klinker’s	Nikolaev’s
Availability of the implementation	Lost	Lost but reconstructed
Optical image formation model	Does not take metals into account, allows only a single light source	Takes metals into account, allows multiple light sources
Additional heuristics	Consideration of L-shaped clusters of rank 2 for highlights, deep shadows and the off-scale area analysis	No
Algorithm infrastructure	A complex set of actions	Greedy merging technique supplemented by the edges locking
Use of region-competition to improve the accuracy of segmentation	Partly	No

Table 2: Segmentation quality of the proposed algorithm with the optimal configuration of the $\sigma_{0}$, $\sigma_{G}$, $\delta_{L}$ parameters; $\mu_{B}$ was tuned by an expert.

Sub-dataset	$\mu_{B}$	$\sigma_{0}$	$\sigma_{G}$	$\delta_{L}$	dataset-mIoU
Selected-SFU	230	10.0	1.0	22.5	0.65
IITP-close	160	8.5	1.0	25.0	0.85
IITP-diffuse	250	6.0	1.0	30.0	0.71

Equations

$$U(S^{M})=\sum_{m=1}^{M}\sum_{i=1}^{n_{m}}\rho_{r}^{2}(I^{M}_{m},\vec{p}_{m,i}), \tag{1}$$

$$d_{r}(k,l)=U(S^{M-1})-U(S^{M})=\sum_{i=1}^{n_{k}+n_{l}}\rho_{r}^{2}(I^{M-1}_{T},\vec{p}_{T,i})-\sum_{i=1}^{n_{k}}\rho_{r}^{2}(I^{M}_{k},\vec{p}_{k,i})-\sum_{i=1}^{n_{l}}\rho_{r}^{2}(I^{M}_{l},\vec{p}_{l,i}), \tag{2}$$

$$\vec{c}^{\prime}_{i}=\mathrm{H_{4x4}}\,\vec{c}_{i}, \tag{3}$$

$$\mathrm{H_{4x4}}=\begin{pmatrix}b&ab&ab&0\\ ab&b&ab&0\\ ab&ab&b&0\\ a-\frac{b}{2}+\frac{1}{2}&a-\frac{b}{2}+\frac{1}{2}&a-\frac{b}{2}+\frac{1}{2}&-a+\frac{3b}{2}-\frac{1}{2}\end{pmatrix}.$$

$$\mathrm{IoU}(S^{*},\tilde{S})=\frac{|S^{*}\cap\tilde{S}|}{|S^{*}\cup\tilde{S}|}.$$

$$\mathrm{dataset\text{-}mIoU}=\sum_{k=0}^{K}\min\left(\mathrm{IoU}(S^{*}_{k},\tilde{S}_{k}),0.5\right),$$

Full text

Linear colour segmentation revisited

Anna Smagina¹, Valentina Bozhkova¹, Sergey Gladilin¹, Dmitry Nikolaev¹

¹ Institute for Information Transmission Problems of the Russian Academy of Sciences (Kharkevich Institute)

Abstract

In this work we discuss the known algorithms for linear colour segmentation based on a physical approach and propose a new modification of the segmentation algorithm. This algorithm is based on a region adjacency graph framework without a pre-segmentation stage. The proposed edge weight functions are derived from a linear image model with normal noise. A projective transform of the colour space is introduced as a novel pre-processing technique for better handling of shadow and highlight areas. The resulting algorithm is tested on a benchmark dataset consisting of images of 19 natural scenes selected from Barnard’s DXC-930 SFU dataset and 12 natural scene images newly published for common use. The dataset is provided with pixel-by-pixel ground truth colour segmentation for every image. Using this dataset, we show that the proposed algorithm modifications lead to qualitative advantages over other model-based segmentation algorithms, and also show the positive effect of each proposed modification. The source code and datasets for this work are available for free access at http://github.com/visillect/segmentation.

keywords:

colour segmentation, colour space, colour homography, clusterisation

1 Introduction

Colour segmentation (CS) is one of the most interesting problems in image analysis. Its goal is to split images into segments – non-intersecting areas corresponding to uniformly coloured objects or their parts. Colour segmentation is used in e.g. augmented reality technology and object tracking in video streams. Although the colour of an object in the desired area of the image is considered constant, the colour of pixels in it can vary significantly due to differences in the illumination and the viewing angle for each pixel. Since segmentation divides an image into areas according to colour of physical world objects and not the colour of pixels, it can be viewed as a special case of the colour constancy problem [1], which is to determine the colour parameters of an object by its image. However, the CS is much easier since it aims only to outline the boundaries of uniformly coloured objects without determining the colour itself.

The physical approach to CS [2] consists in constructing algorithms from mathematical models of image formation, which are derived from the physical laws of light reflection and scattering. This approach is most prominently represented by Klinker’s [3] and Nikolaev’s [4] algorithms. Both are reviewed in detail in this paper, and we propose an algorithm that combines them. The image formation model can be divided into an optical image formation model – the spectral distribution of the sensor (camera) lighting – and the sensor model that forms a digital image from the optical one. Each point of the resulting digital image is a vector in the colour space (CSp). Usually the RGB CSp is used, so we will consider the CSp to be three-dimensional. The approach based on the linear model of image formation makes it possible to outline a uniformly coloured object with high accuracy by analysing, in the CSp, the shape of the clusters that correspond to the object under different lighting conditions.

Although the sensor intensity transfer function is non-linear in most cases [5], this non-linearity can often be corrected by calibration [6]. If the image source is unknown and no calibration data are available, a blind calibration can be applied, similar to the one proposed for radial distortion [7]. Special CS algorithms that do not assume sensor linearity [8] are required only when non-linearity correction is impossible, and are not considered in this paper.

In most of the recent works on colour segmentation [9] as well as on semantic and instance segmentation [10] that we have studied, neural networks are used without an image formation model. Classical algorithms may also be used for image segmentation [11, 12], although works studying them are quite rare. In this work we assume that a more accurate result with less risk of overfitting can be achieved by combining neural-network and classical model-based approaches. For example, in [13] a neural network is used to calculate the potential energy of each pixel in the classical watershed segmentation algorithm. Another example of combining neural-network and algorithmic approaches can be found in [14], where the authors propose a deep neural network architecture based on the classical convolutional network, but also containing additional intermediate layers that calculate the fast Hough transform. From that perspective, studying the physical approach to CS remains significant despite the active development of machine learning methods.

2 History of Physics-Based Approach to Linear Colour Segmentation

2.1 Klinker’s algorithm

One of the first published physics-based CS algorithms was that of Klinker and co-authors [3], based on Shafer’s [15] dichromatic reflectance model (DRM), the best-known model of the spectral distribution of sensor illumination. Shafer’s model describes a large class of materials – inhomogeneous dielectrics – which includes paints, plastics, ceramics, paper and various natural materials (see also [16, 17, 18]). For a uniformly coloured object covered with a glossy inhomogeneous dielectric and lit by one dominating light source, the model states that the light reflected by the object can be decomposed into a linear combination of specular (interface) and diffuse (body) reflectance components. In the sensor CSp such an object generates clusters that have skewed T- or L-shapes [3, 19]. The first stroke of such a cluster shape – the body part – extends from the black point, which defines its lower end, to the point of maximum body reflection. The other part starts somewhere along the body-reflection cluster and extends to the highlight maximum.

Klinker’s algorithm aims at distinguishing such clusters. It consists of two stages: pre-segmentation and main segmentation. At the pre-segmentation stage the image content is not used; rather, the entire image is divided regularly into square segments (cells) of equal size. Then a principal component analysis (PCA) of the colour distribution is carried out for each segment. As a result, the colour distribution of each segment is classified as pointlike, linear, planar or volumetric. From this point onwards, each segment is characterised by its model in the CSp – a specific point, line or plane, whose coordinates are set by the centre of mass of the colour cluster and its eigenvectors. When segments are merged or their boundaries change, the model is re-calculated.
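The PCA-based classification of a cell can be illustrated with a minimal sketch. It is not Klinker’s implementation (which, as noted below, was lost); the function name `classify_cell` and the threshold `noise_sigma` are assumptions introduced here, and a principal axis is simply counted as significant when the standard deviation along it exceeds that threshold.

```python
import numpy as np

def classify_cell(pixels, noise_sigma=2.0):
    """Classify an (n, 3) array of RGB values by its number of significant PCA axes.

    noise_sigma is a hypothetical threshold: an axis is significant when the
    standard deviation of the colours along it exceeds this value.
    Returns the class label and the centre of mass of the colour cluster.
    """
    pixels = np.asarray(pixels, dtype=float)
    centre = pixels.mean(axis=0)
    covariance = np.cov(pixels, rowvar=False, bias=True)
    eigenvalues = np.linalg.eigvalsh(covariance)  # ascending order
    spread = np.sqrt(np.clip(eigenvalues, 0.0, None))
    rank = int(np.sum(spread > noise_sigma))
    labels = ("pointlike", "linear", "planar", "volumetric")
    return labels[rank], centre
```

The count of significant axes (0 to 3) maps directly onto the four classes, so the same number can serve as the rank used later in Nikolaev’s terminology.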

The main segmentation stage starts with neighbouring segments being checked in pairs for the similarity of their colour models and merged into one if they pass the test. According to this test, their class must be the same, must not be volumetric and must persist after the merge. Further processing is based on modelling clusters with a straight line. Each cluster, in descending order of size, attaches neighbouring pixels of other segments that fit the model of the current segment well enough (according to the distance to the cluster’s axis). Segments of other classes can be reduced in area or fully absorbed. Since the linear clusters near the zero of the CSp are very close to each other, pixels close to zero are processed in a special way. Then, if the L-shape test for the clusters of neighbouring segments indicates the presence of a highlight, the linear segments are merged into planar ones according to the DRM. Finally, planar segments attach boundary pixels in the same way as at the corresponding stage for linear clusters, and this concludes the segmentation process.

Unfortunately, the implementation of Klinker’s algorithm was not preserved even by her co-authors [20].

2.2 Nikolaev’s algorithm

Nikolaev’s algorithm [4] is based on a model, proposed by Nikolaev, of the spectral distribution $F(\lambda)$ of the sensor illumination. This model is more general than the DRM: it covers images recorded under complex lighting conditions (more than one light source) [21] and containing objects with different reflective surface properties, including metals. In this model the $F(\lambda)$ of each point of the object is an element of a linear submanifold of dimension $r$ (which is called the rank of the object) in the spectral function space.

The sensor projects infinite-dimensional spectral illumination distributions into the three-dimensional RGB CSp, and this preserves the degeneracy if $r<3$: the points of the object form clusters in the CSp that lie in one plane ($r=2$), on a straight line ($r=1$) or at a single point ($r=0$). In some cases the cluster in the CSp may have a rank lower than the rank of the object. Note that the ranking of sets in the CSp corresponds exactly to the classification (pointlike, linear, planar or volumetric) used in Klinker’s algorithm, but Nikolaev applies it to clusters corresponding to whole objects, while Klinker classifies only areas of the preliminary segmentation.

The authors of [4] described a set of scenes containing objects with different types of surfaces, located in various lighting and observation conditions and creating clusters of different ranks. In particular, flat objects illuminated by a distant light source and observed from afar have rank 0 according to the linear theory. Strongly matte and/or metallic objects (the reflectance model for which is proposed by Healey and Tominaga [22, 17]) have rank 1 when illuminated by one close source. Convex glossy chromatic dielectrics have rank 2 when illuminated by a close source. Convex glossy chromatic dielectrics illuminated by two sources, a close chromatic one and a diffuse one, also have rank 2. An extended list of example scenes of various ranks for the 3D CSp is given in table 1 of [4]. Thus, the problem of CS can be reformulated as follows: in order to segment the image, one needs to decompose the colour histograms into point-like, linear and planar distributions.

Nikolaev’s algorithm has two stages. At the first stage, Gaussian filtering of the image is carried out, followed by pre-segmentation using morphological watersheds [23]. The second stage is the main segmentation, where the region adjacency graph (RAG) [24] is constructed and the region merging technique is applied to this graph successively with three different weight functions. The weight functions are chosen under the assumption that all segments have rank 0, 1 or 2, respectively, but the algorithm as a whole segments images containing arbitrary combinations of segments of all the listed ranks. We note that region-based image segmentation methods are often used in the development of automatic segmentation systems [25, 26].

The algorithm takes into account the fact that region merging with a weight function of a higher rank breaks the boundaries of objects of lower rank since it is always possible to draw a plane through a point and a straight line. To avoid this, the segments corresponding to uniformly coloured areas of the scene objects (with some degree of certainty) are excluded from processing (i.e. marked isolated) between stages of merging with different weight functions.

Unfortunately, the original implementation of Nikolaev’s algorithm was not preserved either, but it was later partially re-implemented [27]. In the new implementation the pre-segmentation phase was eliminated, and individual pixels were used as the initial partitioning elements. This implementation, by Khanipov, does not segment objects of rank other than 0 and contains only one region merging cycle. It was further extended to colour-texture segmentation [28].

2.3 Nikolaev’s and Klinker’s algorithms in comparison

Nikolaev’s and Klinker’s algorithms are compared in table 1. As we can see, Nikolaev’s algorithm is based on a more complete colour model and uses a well-researched graph-based region merging technique as its infrastructure, while Klinker’s algorithm uses a number of unique heuristics reflecting knowledge about the shape of clusters in the CSp, which improves the accuracy of segmentation. This suggests that modifying Nikolaev’s algorithm with techniques similar to those used in Klinker’s algorithm would be beneficial.

3 Proposed Algorithm for Colour Segmentation

In this work we propose a CS algorithm based on Nikolaev’s approach supplemented with heuristics based on Klinker’s observations. The proposed modifications are the following:

1. A bilateral filter is used for image pre-processing.

2. Individual pixels are used as elements of the initial segment set, as in [27].

3. RAG edge costs and the algorithm termination criterion are derived from the general approach of minimising the sum of squared deviations of the image from its linear model (section 3.1).

4. The algorithm works in a projectively transformed CSp, not in the linear CSp of the sensor (section 3.2).

5. The L- or T-shape of rank 2 clusters is taken into account (section 3.3).

6. A geometric heuristic is used to include off-scale (overexposed or over-saturated) areas into regions (section 3.4).

3.1 Approach to edge weight function construction

Segment merging in Nikolaev’s algorithm is governed by weight functions that estimate the colour proximity of two segments. In this work we propose an approach to weight function construction that is common to all ranks and is based on minimising the sum of squared deviations (SSD) of image pixel values from the segment models.

Within the linear colour theory we consider 3 segment models corresponding to ranks 0 to 2: a point, a straight line and a plane. We assume that all segments have the same rank $r$ at each iteration of the region merging algorithm. For each segment we choose the parameters $I$ of the model of rank $r$ (i.e. of the point, straight line or plane) so as to minimise the SSD of the segment pixels from the model.

Suppose that at some iteration of region merging for rank $r$ we have a segmentation $S^{M}=\{S_{m}\}_{m\in\{1,..,M\}}$ of the image into $M$ segments. Let each segment $m$ be given its model $I^{M}_{m}$ of rank $r$. Let $\rho_{r}(\vec{I^{M}_{m}},\vec{p})$ be the distance from pixel $p\in S_{m}$ to the model $I^{M}_{m}$ in the CSp. The parameters of such a model are estimated with the least-squares method.

Assuming the linear image formation model and normal noise, let us define the cost function of the segmentation $S^{M}$ as the sum of squared deviations of pixels from the models of their corresponding segments:

$$U(S^{M})=\sum_{m=1}^{M}\sum_{i=1}^{n_{m}}\rho_{r}^{2}(I^{M}_{m},\vec{p}_{m,i}), \tag{1}$$

where $\vec{p}=(\mathrm{R},\mathrm{G},\mathrm{B})^{\mathrm{T}}$ are coordinates of pixels in CSp and $n_{m}$ is the size of $m$ -th segment.

We can view the CS as the minimisation of (1) and approximate it with the greedy merging of the RAG. In terms of (1) a neighbouring pair of segments $S_{l}$ and $S_{k}$ from $S^{M}$ is an optimal merging if it leads to the smallest difference $U(S^{M-1})-U(S^{M})$ , where $S^{M-1}=S^{M}\backslash\{S_{l},S_{k}\}\cup\{S_{l}\cup S_{k}\}$ . Hence, the cost $d_{r}(k,l)$ of merging segments $S_{k}$ and $S_{l}$ could be associated with this difference and given as

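$$d_{r}(k,l)=U(S^{M-1})-U(S^{M})=\sum_{i=1}^{n_{k}+n_{l}}\rho_{r}^{2}(I^{M-1}_{T},\vec{p}_{T,i})-\sum_{i=1}^{n_{k}}\rho_{r}^{2}(I^{M}_{k},\vec{p}_{k,i})-\sum_{i=1}^{n_{l}}\rho_{r}^{2}(I^{M}_{l},\vec{p}_{l,i}), \tag{2}$$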

where $T$ is a result of joining segments $k$ and $l$ .

Note that to evaluate the SSD of the pixels of the $m$-th segment from its model, $\sum_{i=1}^{n_{m}}\rho_{r}^{2}(I^{M}_{m},\vec{p}_{m,i})$, it is enough to know the segment area, its centre of mass and the covariance matrix of its pixels. These three characteristics can be calculated from additive statistics, namely the area and the first and second moments (i.e. the sum of pixel components $\sum_{i=1}^{n_{m}}\{\mathrm{R}_{i},\mathrm{G}_{i},\mathrm{B}_{i}\}$ and the sum of all possible pairwise products of pixel components $\sum_{i=1}^{n_{m}}\{\mathrm{R}_{i},\mathrm{G}_{i},\mathrm{B}_{i}\}\{\mathrm{R}_{i},\mathrm{G}_{i},\mathrm{B}_{i}\}$). Thus the RAG edge costs with weight function (2) can be calculated without iterating through all segment points, in $O(1)$ time.
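The following sketch illustrates how the merge cost can be computed from additive statistics alone. It assumes that the least-squares model of rank $r$ passes through the centre of mass and spans the $r$ largest principal axes, so that the residual SSD is the sum of the $3-r$ smallest eigenvalues of the scatter matrix. The class `SegmentStats` and the function `merge_cost` are illustrative names, not taken from the authors’ code.

```python
import numpy as np

class SegmentStats:
    """Additive colour statistics of a segment: area, first and second moments."""

    def __init__(self, n, s1, s2):
        self.n = n    # number of pixels (area)
        self.s1 = s1  # sum of RGB vectors, shape (3,)
        self.s2 = s2  # sum of outer products p p^T, shape (3, 3)

    @classmethod
    def from_pixels(cls, pixels):
        p = np.asarray(pixels, dtype=float).reshape(-1, 3)
        return cls(len(p), p.sum(axis=0), p.T @ p)

    def __add__(self, other):
        # Merging two segments only adds their statistics, independent of area.
        return SegmentStats(self.n + other.n, self.s1 + other.s1, self.s2 + other.s2)

    def ssd(self, rank):
        """SSD of the pixels from the least-squares model of rank 0, 1 or 2."""
        mean = self.s1 / self.n
        scatter = self.s2 - self.n * np.outer(mean, mean)
        eigenvalues = np.clip(np.linalg.eigvalsh(scatter), 0.0, None)  # ascending
        # Residual = variance along the 3 - rank axes not covered by the model.
        return float(eigenvalues[: 3 - rank].sum())


def merge_cost(a, b, rank):
    """Increase of the total SSD caused by merging segments a and b."""
    return (a + b).ssd(rank) - a.ssd(rank) - b.ssd(rank)
```

For two collinear segments, for example, `merge_cost(a, b, 1)` is zero while `merge_cost(a, b, 0)` grows with the distance between their centres of mass.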

Such an approach to RAG edge weighting is well known [29] and is used, for example, in SAR image segmentation [30, 31], but it is applied to CS presumably for the first time. Note that in [32], as well as in the original Nikolaev’s algorithm, merging terminates when the weight of the best edge exceeds a given threshold. Such a termination criterion seems natural but lacks a sound mathematical justification. Firstly, the weight of the merged edge has the meaning of an error increment, not of the full error, and so cannot be easily connected with the expected noise level of the image. Secondly, for an incomplete graph (and the RAG is a planar graph, i.e. almost always incomplete) monotonic growth of the weight cannot be guaranteed in the sequence of best edges. So we propose another criterion.

On every merging step we update the sum of squared deviations $U(S^{M-1})$. As follows from (2), this can be done with one addition: $U(S^{M-1})=U(S^{M})+d_{r}(k,l)$. Merging terminates when the value $\sqrt{\frac{U(S^{M})}{N}}$, where $N$ is the total number of image pixels, exceeds a given threshold. When merging with the weight function $d_{0}$, the threshold is $\sigma_{0}$, an algorithm parameter proportional to the noise. When merging with the weight functions $d_{1}$ and $d_{2}$, the thresholds are $\sigma_{1}=\sqrt{\frac{2}{3}}\sigma_{0}$ and $\sigma_{2}=\sqrt{\frac{1}{3}}\sigma_{0}$.
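The termination test can be written compactly. In the sketch below the three thresholds are expressed through one formula, $\sigma_{r}=\sqrt{(3-r)/3}\,\sigma_{0}$, which reproduces the values given above; reading the factor as the share of colour dimensions left unexplained by a rank $r$ model is an interpretation rather than a statement from the text. The function names are hypothetical.

```python
import math

def rank_threshold(sigma0, rank):
    """Threshold sigma_r = sqrt((3 - r) / 3) * sigma0 for rank r in {0, 1, 2}."""
    return math.sqrt((3 - rank) / 3.0) * sigma0

def should_stop(total_ssd, n_pixels, sigma0, rank):
    """True when the RMS deviation sqrt(U / N) exceeds the threshold for this rank."""
    return math.sqrt(total_ssd / n_pixels) > rank_threshold(sigma0, rank)
```

In a merging loop, `total_ssd` would be increased by the cost of each accepted merge, and the loop would end as soon as `should_stop` returns true.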

3.2 Colour space projective transform

Note that the SSD minimisation approach introduced above still applies after a transition to a transformed CSp, provided that the transformation does not change the linear properties of the cluster models, i.e. the rank structure is preserved. This requires a one-to-one correspondence between planes of the initial space and planes of the target space and, as a result, one-to-one correspondences between lines and between points of the two spaces. In other words, CS can be applied in a projectively transformed CSp. A homography of the 3-dimensional CSp was proposed, presumably for the first time, in [33], but it was applied to another problem – photo-realistic colour transfer.

Besides the rank classification, the linear colour theory contains a sub-rank classification of clusters in the CSp [4]. It considers the orientation of a cluster relative to the coordinate axes of the CSp, which in most cases allows the dimension of the space for cluster analysis to be reduced. The requirement that the sub-rank structure be preserved under a CSp homography imposes some additional restrictions on it:

• the origin of the initial space should map to the origin of the target space,

• the line through the points $(0,0,0)$ and $(1,1,1)$ of the initial space should map to the line through the same points of the target space.

Further restrictions on the CSp homography are introduced below.

Let us consider two pixel pairs $(\vec{p}_{a},\vec{p}_{b})$ and $(\vec{p}_{b},\vec{p}_{c})$ such that $\lVert\vec{p}_{a}-\vec{p}_{b}\rVert=\lVert\vec{p}_{b}-\vec{p}_{c}\rVert$ and $l_{c}>l_{a}$, where $l=\frac{R+G+B}{3}$ is the pixel brightness. We distort the CSp so that the darker pair ends up farther apart than the brighter one: $\lVert\vec{p}^{\prime}_{a}-\vec{p}^{\prime}_{b}\rVert>\lVert\vec{p}^{\prime}_{b}-\vec{p}^{\prime}_{c}\rVert$. In the context of merging on a region adjacency graph, such a space transformation increases edge weights the more, the lower the average brightness of the segments at the edge end-points. This can be used to resolve conflicts when merging segments near zero brightness. Conflicts occur there because all matte clusters converge near the dark corner of the colour cube, as noted by Klinker in [3], where she excludes dark pixels from colour analysis.

Let us require the homography to be symmetrical with respect to the brightness axis (i.e. the line through $(0,0,0)$ and $(1,1,1)$). Note that although a non-linear correction of colour variation in the RGB space may improve the separation of clusters in the CSp from the point of view of human colour perception, it is rational to apply it independently of this transformation and of CS as such (as with the compensation of the sensor transfer function non-linearity), since such a correction requires information about the characteristics of the sensor colour coverage.

Also note that homographies of the CSp that differ only in the scale of the transformed space are equivalent. A change of scale does not influence the mutual orientation of clusters in the CSp, so it only entails recalculating the merging threshold to achieve an equivalent segmentation result with the region merging technique.

A transformation that satisfies the linear colour theory and the considerations above can be defined by the following correspondences between points of the initial and the target CSp:

• $(0,0,0)\leftrightarrow(0,0,0)$,

• $(1,1,1)\leftrightarrow(b,b,b)$,

• $(1,0,0)\leftrightarrow(1,a,a)$,

• $(0,1,0)\leftrightarrow(a,1,a)$,

• $(0,0,1)\leftrightarrow(a,a,1)$,

where $0\leq a\leq 1$, $\frac{2a+1}{3}\leq b\leq 1$ and $0\leq\mathrm{R},\mathrm{G},\mathrm{B}\leq 1$. Such a transformation lowers the resolution in brightness close to the point $(1,1,1)$ of the CSp and, conversely, increases it close to the origin (fig. 1). When $a>0,b=1$, only the points of maximum saturation of the colour cube are moved (fig. 1b). When $a=0,b<1$, only compression along the brightness axis (the diagonal of the colour cube) occurs (fig. 1c). When $a=0,b=1$, the transformation is the identity.

A homography of 3-dimensional space, given by a $4\times 4$ matrix, is defined uniquely by five pairs of corresponding points. So for each pair of parameter values $a$ and $b$ there is only one projective transformation. The homography defined by the matrix $\mathrm{H_{4x4}}$ transforms the coordinates of a pixel from $\vec{p}$ in the initial CSp to $\vec{p}^{\prime}$ in the following way:

$$\vec{c}^{\prime}_{i}=\mathrm{H_{4x4}}\,\vec{c}_{i}, \tag{3}$$

where $\vec{c}_{i}=(\mathrm{R},\mathrm{G},\mathrm{B},1)^{\mathrm{T}}$ and $\vec{c}^{\prime}_{i}=(\mathrm{R^{\prime}},\mathrm{G^{\prime}},\mathrm{B^{\prime}},\mathrm{W})^{\mathrm{T}}$ are the four-dimensional homogeneous coordinates of the vectors $\vec{p}_{i}$ and $\vec{p}^{\prime}_{i}$ respectively, and $\vec{p}^{\prime}_{i}=(\mathrm{R}^{\prime}/\mathrm{W},\mathrm{G}^{\prime}/\mathrm{W},\mathrm{B}^{\prime}/\mathrm{W})^{\mathrm{T}}$. The desired parametric transform family is defined by the homography matrix

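$$\mathrm{H_{4x4}}=\begin{pmatrix}b&ab&ab&0\\ ab&b&ab&0\\ ab&ab&b&0\\ a-\frac{b}{2}+\frac{1}{2}&a-\frac{b}{2}+\frac{1}{2}&a-\frac{b}{2}+\frac{1}{2}&-a+\frac{3b}{2}-\frac{1}{2}\end{pmatrix}.$$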
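A short numerical sketch of this transform family is given here. It assumes RGB values scaled to $[0,1]$ and column vectors $(\mathrm{R},\mathrm{G},\mathrm{B},1)^{\mathrm{T}}$, with the matrix written so that the five point correspondences listed above hold; the function names are illustrative and do not come from the authors’ repository.

```python
import numpy as np

def colour_homography(a, b):
    """4x4 homography for column vectors c = (R, G, B, 1)^T with RGB in [0, 1]."""
    x = a - b / 2 + 0.5
    y = -a + 3 * b / 2 - 0.5
    return np.array([
        [b,     a * b, a * b, 0.0],
        [a * b, b,     a * b, 0.0],
        [a * b, a * b, b,     0.0],
        [x,     x,     x,     y],
    ])

def transform_colours(rgb, a, b):
    """Apply the projective transform to an (..., 3) array of RGB values.

    Note: at the boundary b = (2a + 1) / 3 the last coordinate W vanishes
    at the origin, so black pixels need special care there.
    """
    rgb = np.asarray(rgb, dtype=float)
    ones = np.ones(rgb.shape[:-1] + (1,))
    c = np.concatenate([rgb, ones], axis=-1)
    c_prime = c @ colour_homography(a, b).T  # row-wise version of H c
    return c_prime[..., :3] / c_prime[..., 3:4]
```

With $a=0$ and $b=1$ the matrix reduces to the identity, and for other admissible values the cube corners map as listed above, which gives a quick sanity check of the matrix orientation.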

3.3 Considering L- or T-shape of rank 2 clusters

The L- and T-shapes of rank 2 clusters, noticed by Klinker in [3], lie beyond the linear theory. In the segmentation algorithm we check for the L- or T-structure of the union of two rank 1 clusters as an additional test before applying the region merging technique with the weight function (2) for rank 2. To do that, we model each of the merging clusters by a segment of a straight line along the major axis of the dispersion ellipsoid of its pixels in the CSp. The centre of this segment coincides with the centre of the dispersion ellipsoid, and its length equals twice the square root of the corresponding semi-axis of the dispersion ellipsoid. We assume that two clusters form a skewed T- or L-shape if the segments modelling them intersect in such a way that at least one end of at least one segment lies closer to the other segment than a threshold $\delta_{L}$, which is provided as an input of the segmentation algorithm.
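The check can be sketched as follows. The sketch makes two assumptions that the text does not fix: the half-length of the modelling segment is taken as the standard deviation along the major axis (square root of the largest covariance eigenvalue), and the test reduces to comparing the smallest end-to-segment distance with $\delta_{L}$. All function names are hypothetical.

```python
import numpy as np

def axis_segment(pixels):
    """End points of the line segment modelling a cluster along its major axis."""
    p = np.asarray(pixels, dtype=float)
    centre = p.mean(axis=0)
    eigenvalues, eigenvectors = np.linalg.eigh(np.cov(p, rowvar=False, bias=True))
    # Assumed half-length: one standard deviation along the major axis.
    half = np.sqrt(max(eigenvalues[-1], 0.0)) * eigenvectors[:, -1]
    return centre - half, centre + half

def point_segment_distance(q, s0, s1):
    """Euclidean distance from point q to the segment [s0, s1]."""
    d = s1 - s0
    denom = float(d @ d)
    t = 0.0 if denom == 0.0 else float(np.clip((q - s0) @ d / denom, 0.0, 1.0))
    return float(np.linalg.norm(q - (s0 + t * d)))

def forms_l_or_t(pixels_a, pixels_b, delta_l):
    """True if an end of one modelling segment lies within delta_l of the other."""
    a0, a1 = axis_segment(pixels_a)
    b0, b1 = axis_segment(pixels_b)
    distances = [
        point_segment_distance(a0, b0, b1),
        point_segment_distance(a1, b0, b1),
        point_segment_distance(b0, a0, a1),
        point_segment_distance(b1, a0, a1),
    ]
    return min(distances) < delta_l
```

A highlight cluster that branches off the middle of a body cluster passes this test, whereas two linear clusters that are far apart in the CSp do not.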

3.4 Geometric heuristic to include off-scale areas into regions

The pixels in the off-scale (overexposed or over-saturated) areas generally do not fall into the planar slice defined for the dichromatic plane of an object area; as a result, the planar segmentation excludes these off-scale areas. A geometric heuristic can be applied to include the distorted pixels into the region. As noticed by Klinker in [3], such pixels with distorted colours are generally found in the middle of a highlight region.

In the proposed algorithm, we apply this heuristic after region merging with the edge weight function $d_{2}$. A region in the RAG is considered off-scale if its average brightness exceeds a given threshold $\mu_{B}$. If an off-scale region has only one neighbour, then these two are merged. In addition to Klinker’s observation mentioned above, we consider that off-scale areas can also occur as stripes across the body, which is typical for cylindrical objects. So, if two regions are adjacent to an off-scale region but not to each other and form an L- or T-shaped cluster, all three are merged.
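The two merging rules can be expressed over a simple adjacency structure. The sketch below only decides which regions to merge and leaves the merging itself to the caller; the data layout (dictionaries keyed by region id) and the callback `is_l_or_t` are assumptions made for illustration.

```python
from itertools import combinations

def off_scale_merges(mean_brightness, adjacency, mu_b, is_l_or_t):
    """List groups of region ids to merge according to the off-scale heuristic.

    mean_brightness: dict region id -> average brightness of the region
    adjacency:       dict region id -> set of neighbouring region ids
    mu_b:            brightness threshold above which a region is off-scale
    is_l_or_t:       callable(u, v) -> bool, the L-/T-shape test for two regions
    """
    merges = []
    for region, brightness in mean_brightness.items():
        if brightness <= mu_b:
            continue  # not an off-scale region
        neighbours = adjacency[region]
        if len(neighbours) == 1:
            # Rule 1: an off-scale region with a single neighbour joins it.
            merges.append((region, *neighbours))
            continue
        # Rule 2: two neighbours that do not touch each other but form
        # an L- or T-shaped cluster are merged together with the region.
        for u, v in combinations(sorted(neighbours), 2):
            if v not in adjacency[u] and is_l_or_t(u, v):
                merges.append((region, u, v))
                break
    return merges
```

Passing the L-/T-shape test as a callback keeps the sketch independent of how cluster statistics are stored in the RAG.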

3.5 Formulation of proposed algorithm

The proposed algorithm consists of the following steps:

1. Applying bilateral filtering and the CSp projective transform given by (3).

2. Initialising the RAG with vertices corresponding to individual 4-connected pixels and the SSD equal to 0.

3. Applying the region merging technique with the weight function $d_{0}$ and a threshold $\sigma_{0}$ which is comparable with the noise level of the image.

4. Marking isolated segments of rank 0, i.e. those segments for which the minimal value $d_{min}$ of the Kullback-Leibler divergence to a neighbouring segment (among all adjacent edges) exceeds a certain threshold: $d_{min}>\sigma_{G}$.

5. Reinitialising the SSD with the sum of squared deviations of pixel values from the models of segments of rank 1.

6. Applying the region merging technique with the weight function $d_{1}$ and a threshold $\sigma_{1}=\sqrt{\frac{2}{3}}\sigma_{0}$, ignoring edges leading to isolated segments.

7. Applying the region merging technique with the weight function $d_{1}$ and a threshold $\sigma_{1}$, ignoring edges connecting two isolated segments.

8. Marking as isolated those pairs of vertices that do not pass the L- or T-shape check with a threshold $\delta_{L}$.

9. Reinitialising the SSD with the sum of squared deviations of pixel values from the models of segments of rank 2.

10. Applying the region merging technique with the weight function $d_{2}$ and a threshold $\sigma_{2}=\sqrt{\frac{1}{3}}\sigma_{0}$, ignoring edges connecting two isolated segments.

11. Finally, applying the geometric heuristic for off-scale areas on the RAG with a mean brightness threshold $\mu_{B}$.

Note that before the merging with the weight function $d_{2}$ we use the L-shape check for the rank 2 clusters with a threshold $d_{L}\ll\sigma_{G}$ to mark segments as isolated instead of using the criterion $d_{min}>\sigma_{G}$ .

4 Experimental results

4.1 Dataset

To evaluate the performance of the proposed algorithm we require a dataset satisfying 2 requirements. First of all, images should be recorded by a linear sensor (in order to satisfy Nikolayev’s linear colour theory) or at least sensor non-linearity type and parameters should be known to compensate them. Secondly, images of scenes shouldn’t contain over-detailed objects and colours which distinction is challenging even for a human. Unfortunately, datasets satisfying both requirements were not found in public sources. Therefore specific dataset was collected and released [34]. The dataset consists of three distinct parts (sub-datasets). Each part is fully acquired with the use of a single camera sensor.

The first part of the dataset consists of 19 images of natural scenes, 637x468 pixels each, selected from Barnard’s DXC-930 SFU dataset for colour research [35] (hereafter Selected-SFU). In the original dataset each scene is taken under 11 different close light sources. In this work the scenes under Philips Ultralume were chosen. Among the chosen scenes, 17 images contain flat sheets of paper of various colours (so-called "mondrians"), 5 images contain volumetric objects with no or insignificant highlights, and 2 more images depict volumetric objects with highlights and inter-reflection effects. For all images a linear contrast adjustment with the 95% quantile was applied, so that 5% of pixels were allowed to be saturated.
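The contrast adjustment described above can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes that the quantile is taken jointly over all channel values, that the scaling passes through zero (so that linearity is preserved), and that the output range is [0, 1]. The function name `stretch_contrast` is hypothetical.

```python
import numpy as np


def stretch_contrast(image: np.ndarray, quantile: float = 0.95) -> np.ndarray:
    """Scale linearly so that the given quantile maps to 1.0; brighter values saturate."""
    img = image.astype(np.float64)
    # Assumption: one white level for the whole image, computed over all channels.
    white = np.quantile(img, quantile)
    if white <= 0:
        return np.zeros_like(img)
    # Pure scaling (no offset) keeps the pixel values proportional to the input.
    return np.clip(img / white, 0.0, 1.0)


# Toy single-channel image with values 0..99.
demo = np.arange(100).reshape(10, 10)
out = stretch_contrast(demo)
print("saturated pixels:", int((out == 1.0).sum()))
```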

The second and the third parts of the dataset consist of natural scene images taken by our colleagues at IITP RAS that were used in [4] but not published. There are 5 images of about 450x500 pixels of scenes with a close light source (hereafter IITP-close) and 7 images of 1280x1024 pixels of scenes with a diffuse light source (hereafter IITP-diffuse). The camera model for IITP-close is unfortunately unknown. Images from the IITP-diffuse sub-dataset were taken with an Olympus D600L camera. All images contain volumetric objects, some with highlights, but with no inter-reflection effects.

To examine the linearity of the intensity transfer function of the recording sensor, regions of rank 1 were chosen in each sub-dataset and the shape of the corresponding clusters of pixels in CSp was analysed. For all three sub-datasets the shape of the clusters was shown to be well approximated by a line segment (fig. 2).
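One simple way to quantify how well a pixel cluster is approximated by a line is the share of its variance explained by the first principal direction. The sketch below is only an illustration under that assumption; the text does not state which criterion was actually used, and the function name `line_fit_quality` and the synthetic data are hypothetical.

```python
import numpy as np


def line_fit_quality(pixels: np.ndarray) -> float:
    """Fraction of cluster variance explained by the best-fit line (1.0 = perfect line)."""
    centred = pixels - pixels.mean(axis=0)
    # Singular values of the centred data give the spread along principal directions.
    singular_values = np.linalg.svd(centred, compute_uv=False)
    energy = singular_values ** 2
    total = energy.sum()
    return float(energy[0] / total) if total > 0 else 1.0


rng = np.random.default_rng(0)
# Synthetic rank 1 cluster: one colour vector scaled by varying brightness, plus noise.
brightness = rng.uniform(0.1, 1.0, size=500)
colour = np.array([0.8, 0.5, 0.2])
line_cluster = brightness[:, None] * colour + rng.normal(0.0, 0.005, size=(500, 3))
# Isotropic blob for contrast: no preferred direction.
blob_cluster = rng.normal(0.5, 0.05, size=(500, 3))

print("line cluster:", line_fit_quality(line_cluster))
print("blob cluster:", line_fit_quality(blob_cluster))
```

A value close to 1 indicates an elongated, line-like cluster, while an isotropic cloud in three dimensions gives a value near 1/3.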

All images are provided with pixel-by-pixel CS annotations. For annotation purposes, images were first automatically split into small subregions with guaranteed colour constancy, which were then manually merged if their colours were indistinguishable to the human eye. Segments containing fewer than 20 pixels were not annotated. In addition, annotations for regions corresponding to deep shadows (i.e. with average brightness close to zero) were provided (fig. 3b,c). Such regions appear in the Selected-SFU images containing volumetric objects. Information about colour in such regions is lost and cannot be recovered. The annotation for each shadow segment is provided as a separate binary mask.

4.2 Quality evaluation of proposed algorithm

On the dataset described above, we evaluated the quality of the proposed algorithm, which was configured as follows. Bilateral filtering was applied with smoothing parameters $f_{r}=50$ and $g_{s}=50$, where $f_{r}$ is the range kernel for smoothing differences in intensities and $g_{s}$ is the spatial kernel for smoothing differences in coordinates. The CSp homography was applied with parameters $a=0$, $b=0.4$. The off-scale brightness threshold $\mu_{B}$ was tuned by an expert for each sub-dataset and is provided in table 2. The value of $\mu_{B}$ is calculated for the non-projectively transformed CSp in a range from 0 to 255.

The values of the merging threshold $\sigma_{0}$ and the edge locking threshold $\sigma_{G}$, as well as the $\delta_{L}$ threshold, which is used to check the L-shape of the rank 2 clusters, were chosen in such a way as to achieve the best segmentation quality according to the metric. This procedure was applied separately for each sub-dataset.

To match output segments with the ground truth ones, the intersection-over-union (IoU) score, also known as the Jaccard index, was calculated for each possible pair of a ground truth segment $S^{*}$ and an output segment $\tilde{S}$:

$$\mathrm{IoU}(S^{*},\tilde{S})=\frac{|S^{*}\cap\tilde{S}|}{|S^{*}\cup\tilde{S}|},$$

which gives a ratio in [0, 1]. The evaluation quality was calculated as dataset-mIoU at IoU = .50 (which provides one-to-one segment matching); this is the official metric of the segmentation task in Pascal VOC [36] and numerous popular competitions and tasks:

[TABLE]

where $K$ is the number of ground truth segments. The quality range is also [0, 1].
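The pairwise IoU computation and the matching at IoU = .50 can be sketched as follows. This is a minimal illustration under stated assumptions: segmentations are given as integer label maps of equal shape, a pair is treated as matched when its IoU is at least 0.5, and the names `iou` and `match_segments` are hypothetical. The sketch stops at the matching step and does not compute the dataset-level score.

```python
import numpy as np


def iou(mask_a: np.ndarray, mask_b: np.ndarray) -> float:
    """Intersection over union (Jaccard index) of two boolean masks."""
    union = np.logical_or(mask_a, mask_b).sum()
    if union == 0:
        return 0.0
    return float(np.logical_and(mask_a, mask_b).sum() / union)


def match_segments(gt_labels: np.ndarray, out_labels: np.ndarray,
                   threshold: float = 0.5) -> dict:
    """Map each ground truth label to (output label, IoU) when IoU >= threshold."""
    matches = {}
    for g in np.unique(gt_labels):
        gt_mask = gt_labels == g
        # Only output segments that overlap this ground truth segment can match it.
        for o in np.unique(out_labels[gt_mask]):
            score = iou(gt_mask, out_labels == o)
            if score >= threshold and (int(g) not in matches or score > matches[int(g)][1]):
                matches[int(g)] = (int(o), score)
    return matches


# Toy example: the output boundary is shifted by one column.
gt = np.array([[0, 0, 0, 1, 1, 1],
               [0, 0, 0, 1, 1, 1]])
out = np.array([[0, 0, 0, 0, 1, 1],
                [0, 0, 0, 0, 1, 1]])
matches = match_segments(gt, out)
print(matches)
```

In the toy example ground truth segment 0 matches output segment 0 with IoU 6/8, and ground truth segment 1 matches output segment 1 with IoU 4/6.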

If binary masks with shadow segment annotations are present in the dataset for a given image, the output segments were first compared to the shadow segments. As shadow segments may overlap each other, an output segment may match several shadow segments under the IoU = .50 criterion. Therefore, to avoid ambiguity, each output segment was compared only with the shadow segment that had the maximum IoU with it. The output segments left unmatched with shadow segments were then compared to the other ground truth segments, excluding the sets of pixels already matched to the shadow segments.
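The disambiguation rule for overlapping shadow masks can be sketched as below: among all shadow masks, only the one with the maximum IoU is considered for a given output segment. The function name `best_shadow_match`, the use of "at least 0.5" as the acceptance rule, and the toy masks are assumptions made for illustration.

```python
import numpy as np


def best_shadow_match(out_mask: np.ndarray, shadow_masks: list,
                      threshold: float = 0.5):
    """Index of the shadow mask with maximum IoU, or None if that IoU is below threshold."""
    best_index, best_score = None, 0.0
    for index, shadow in enumerate(shadow_masks):
        union = np.logical_or(out_mask, shadow).sum()
        score = np.logical_and(out_mask, shadow).sum() / union if union else 0.0
        if score > best_score:
            best_index, best_score = index, score
    return best_index if best_score >= threshold else None


# Two overlapping shadow masks on a strip of 10 pixels.
shadow_a = np.zeros(10, dtype=bool)
shadow_a[0:5] = True
shadow_b = np.zeros(10, dtype=bool)
shadow_b[1:7] = True

# Output segment that passes the 0.5 criterion for both masks (IoU 5/6 and 5/7).
segment = np.zeros(10, dtype=bool)
segment[0:6] = True
# Output segment that overlaps neither shadow mask.
other = np.zeros(10, dtype=bool)
other[8:10] = True

print(best_shadow_match(segment, [shadow_a, shadow_b]))
print(best_shadow_match(other, [shadow_a, shadow_b]))
```

Here `segment` is assigned to the first shadow mask only, and `other` stays unmatched, so it would go on to be compared with the remaining ground truth segments.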

Table 2 shows the experimental results of the proposed algorithm for each of the sub-datasets, together with the optimal values of the adjustable parameters, and figure 3 illustrates the segmentation results. We see that the quality is high on the IITP-close sub-dataset, which consists of images captured in conditions similar to those of the model, whereas the quality on the Selected-SFU sub-dataset is the worst, reflecting the much more complex nature of its images.

Since we cannot compare the results of the proposed algorithm with Klinker’s or Nikolaev’s results, we resorted to testing the proposed modifications individually to demonstrate their positive impact on the segmentation result. We turned them off one by one, readjusted the parameter values to be optimal, and then tested the restricted algorithm on the dataset. As figure 4d shows, without the geometric heuristic for off-scales one of these areas remains unmerged with the highlight one; disabling the cluster L-shape check leads to erroneous merging of areas corresponding to different colours (fig. 4e); in other cases, we see false separations of colours on areas differing only in brightness even in the absence of sharp shadows (fig. 4e–h).

4.3 Properties of the proposed algorithm

Let us consider the properties of the algorithm as seen in figure 3. We see that the uniform CS is performed correctly in the following cases:

• flat and volumetric dielectric surfaces with diffuse shadows (except neighbouring very small uniformly coloured areas), without jagged contours or false positive segments near the object boundaries;

• deep shadows, as in figure 3e;

• overexposed areas on uniformly coloured dielectrics (fig. 3d) and metals (fig. 3c).

Some kinds of scenes are still difficult:

• image areas that differ only in brightness are not always segmented correctly: in fig. 3d the light blue book cover is merged with its white pages, and in fig. 3b, conversely, a part of the red wall is not merged with the red square block;

• in some cases sharp shadow boundaries are incorrectly considered to be separate objects (fig. 3c);

• some small segments are incorrectly merged with larger neighbouring ones (fig. 3a,e); this is caused by the choice of the cost function 1 and could be solved by its refinement;

• some images contain thin elongated gaps between segment boundaries (e.g. in fig. 3e between the left and central objects); this can be fixed by taking the segment perimeter into account in the cost function, as proposed in [29].

We specially note that our algorithm does not correctly process inter-reflections (as in fig. 3c on the metallic pan to the left), but inter-reflections cannot be described in terms of linear colour theory with rank lower than 3, and thus this behaviour is expected. We also cannot expect correct segmentation of very deep shadows with near-zero brightness (fig. 3b); this is accounted for in the reference dataset annotation.

5 Conclusion

The development of the linear CS algorithm based on Nikolaev’s approach with modifications inspired by Klinker’s heuristics is presented. A novel generalised approach to weight function construction is used, which is based on minimisation of the sum of squared deviations of the image from its linear model. Another proposed feature is the CSp projective transform as a pre-processing step, which allows better separation of segments in low illumination areas and thus better accounts for shadows, while preserving the linear properties of the cluster models in the CSp.

The proposed algorithm is tested on a novel dataset which is partially based on the (rather complex) Barnard’s DXC-930 SFU dataset [35], supplemented by images representing simpler conditions that allow a better study of how the algorithm adheres to the considered cases of the linear colour model. A per-pixel annotation is provided for each image of the dataset. Using this dataset, we show that all the proposed modifications do in fact enhance the segmentation quality. The experimentally discovered properties of the algorithm include good processing of strong shadows and overexposed areas, while some of the drawbacks may be attributed either to limitations of the model (segment rank less than 3) or of the weight function, which may be addressed in future work. Both the algorithm implementation and the dataset are available for public use at [37].

Acknowledgements.

This work is partially supported by the Russian Foundation for Basic Research (project 17-29-03514). The authors thank Dr. Pavel Chochia for the dataset images collected at IITP RAS. The authors also express their sincere gratitude to Dmitry Sidorchuk, Ivan Konovalenko and Alexey Glikin for their technical help.

Bibliography

[1] A. Gijsenij, T. Gevers, and J. Van De Weijer, “Computational color constancy: Survey and experiments,” IEEE Transactions on Image Processing 20 (9), 2475–2489 (2011).
[2] H.-D. Cheng, X. H. Jiang, Y. Sun, and J. Wang, “Color image segmentation: advances and prospects,” Pattern Recognition 34 (12), 2259–2281 (2001).
[3] G. J. Klinker, S. A. Shafer, and T. Kanade, “A physical approach to color image understanding,” International Journal of Computer Vision 4 (1), 7–38 (1990).
[4] D. P. Nikolaev and P. P. Nikolayev, “Linear color segmentation and its implementation,” Computer Vision and Image Understanding 94, 115–139 (2004).
[5] G. J. Klinker, S. A. Shafer, and T. Kanade, “The measurement of highlights in color images,” International Journal of Computer Vision 2 (1), 7–32 (1988).
[6] D. Shepelev, E. Ershov, A. Tereshin, T. Chernov, and D. P. Nikolaev, “Weighted search for a projective optical flow resistant to specular reflections,” Sensory Systems 32 (1), 73–82 (2018).
[7] I. A. Kunina, S. A. Gladilin, and D. P. Nikolaev, “Blind radial distortion compensation in a single image using a fast Hough transform,” Computer Optics 40 (3), 395–403 (2016).
[8] S.-K. Kim, S.-D. Lee, C.-Y. Kim, Y.-S. Seo, and D. Nikolayev, “New image segmentation method using mode finding, multi-link clustering, and region graph analysis,” in Vision Geometry XII, 5300, 138–146, International Society for Optics and Photonics (2004).