Neither Global Nor Local: A Hierarchical Robust Subspace Clustering For Image Data

Maryam Abdolali; Mohammad Rahmati

arXiv:1905.07220·cs.CV·January 7, 2020


TL;DR

This paper introduces a hierarchical robust subspace clustering method that combines local patch-based and global representations to improve robustness against noise, occlusion, and disguise in image data.

Contribution

A novel hierarchical framework that integrates local and global data representations for enhanced robustness in subspace clustering.

Findings

01

Effective in handling noise, occlusion, and disguise

02

Outperforms existing methods on real datasets

03

Provides robust clustering results in complex scenarios


Tables

Table 1. Table 5.1: Parameters of the compared approaches.

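Approach	Parameters (Extended Yale B)	Parameters (AR)	Parameters (COIL-20)
LG-SSC	$\alpha = 20$, $\lambda_{1} = 1$, $\lambda_{2} = 10$, $p = 4$, $s = 2$	$\alpha = 100$, $\lambda_{1} = 5$, $\lambda_{2} = 10$, $p = 4$, $s = 3$	$\alpha = 20$, $\lambda_{1} = 2$, $\lambda_{2} = 10$, $p = 4$, $s = 2$
MG-SSC	$\alpha = 20$, $p = 4$, $s = 3$	$\alpha = 100$, $p = 9$, $s = 3$	$\alpha = 20$, $p = 4$, $s = 2$
SSC	$\alpha = 20$	$\alpha = 100$	$\alpha = 20$
LRR	$\lambda = 0.009$	$\lambda = 0.095$	$\lambda = 0.0092$
EDSC	$\lambda_{1} = 0.06$, $\lambda_{2} = 0.01$, dim $= 10$, $\alpha = 4$	$\lambda_{1} = 0.06$, $\lambda_{2} = 0.01$, dim $= 10$, $\alpha = 4$	$\lambda_{1} = 0.06$, $\lambda_{2} = 0.01$, dim $= 12$, $\alpha = 8$
$S^{3} C$	$\gamma = 1$, $\alpha = 20$	$\gamma = 1$, $\alpha = 100$	$\gamma = 1$, $\alpha = 20$
LRSC	$\tau = 0.045$, $\alpha = 10^{5}$	$\tau = 0.07$, $\alpha = 0.1$	$\tau = 0.045$, $\alpha = 0.07$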

Table 2. Table 5.2: Average performance on the Extended Yale B data set with different number of subjects. The best performance is indicated in bold.

Algorithm	LG-SSC	MG-SSC	SSC	LRR	EDSC	$S^{3} C$	LRSC
2 subjects
ACC	99.92	99.91	98.14	89.69	97.35^∗	99.48^∗	96.23
NMI	99.54	99.38	93.16	66.69	-	-	82.05
ARI	99.70	99.66	94.29	68.13	-	-	86.23
3 subjects
ACC	99.42	99.87	96.70	79.09	96.35^∗	99.11^∗	93.55
NMI	99.04	99.43	92.75	59.61	-	-	81.06
ARI	98.99	99.62	92.61	53.22	-	-	82.61
5 subjects
ACC	99.35	99.78	95.68	65.46	94.89^∗	98.49^∗	90.46
NMI	99.02	99.32	91.56	54.53	-	-	80.74
ARI	98.85	99.46	90.17	39.07	-	-	78.99
8 subjects
ACC	99.41	99.72	94.13	59.02	93.93^∗	97.69^∗	76.36
NMI	98.54	99.35	90.58	56.34	-	-	70.71
ARI	98.65	99.38	86.44	36.27	-	-	59.15
10 subjects
ACC	99.68	99.68	92.60	60.42	92.76^∗	97.19^∗	66.56
NMI	99.33	99.33	89.37	59.79	-	-	66.27
ARI	99.31	99.31	82.72	38.13	-	-	49.23

Table 3. Table 5.3: Performance on the Extended Yale B data set with different number of subjects. The best performance is indicated in bold.

Algorithm	LG-SSC	MG-SSC	SSC	LRR	EDSC	$S^{3} C$	LRSC
15 subjects
ACC	99.47	100	78.81	64.30	86.44	88.24	68.75
NMI	99.10	100	79.14	66.18	88.97	91.25	71.57
ARI	98.87	100	60.89	41.18	80.13	84.94	51.65
20 subjects
ACC	98.73	98.65	73.61	68.07	88.51	85.73	71.08
NMI	98.05	98.02	76.67	70.68	90.79	91.15	75.49
ARI	97.31	97.04	54.50	42.99	81.98	82.79	52.90
30 subjects
ACC	98.69	92.43	74.66	71.50	87.22	84.91	71.24
NMI	98.09	94.57	77.69	75.35	91.22	90.48	75.19
ARI	97.27	88.35	51.20	43.90	79.46	80.63	52.90
38 subjects
ACC	93.37	90.27	70.67	66.28	85.29	78.71	70.17
NMI	94.91	91.47	75.44	72.19	90.08	86.78	75.19
ARI	86.03	78.99	40.52	45.99	72.67	68.16	52.46

Table 4. Table 5.4: Performance on the AR data set with different number of subjects. The best accuracy is indicated in bold.

#subjects

Metric

LG-SSC

MG-SSC

SSC

LRR

EDSC

S³C

LRSC

5

ACC

NMI

ARI

100

76.92

62.79

48.32

82.31

71.57

62.33

75.38

64.74

56.17

76.92

64.04

48.55

63.08

47.87

35.55

10

ACC

NMI

ARI

100

66.54

65.34

41.97

68.08

72.27

55.13

73.46

81.29

65.89

71.15

73.19

50.13

67.69

68.64

52.11

20

ACC

NMI

ARI

100

90.00

92.82

86.38

59.42

68.86

40.41

80.58

86.14

73.89

66.16

75.61

51.78

60.00

69.25

37.84

72.69

77.42

55.68

50

ACC

NMI

ARI

96.15

97.71

94.28

86.15

90.33

75.09

67.31

78.21

47.21

87.31

91.79

75.85

65.76

79.95

51.56

58.08

72.90

31.95

69.15

79.76

56.30

75

ACC

NMI

ARI

94.97

96.93

92.69

87.08

91.23

70.93

67.64

82.11

53.34

84.26

91.39

75.85

67.69

83.69

55.32

61.49

78.17

38.81

69.49

81.55

59.13

100

ACC

NMI

ARI

90.00

93.74

83.06

83.27

90.98

73.02

68.15

82.87

52.52

79.92

90.01

71.06

67.54

82.77

52.29

60.54

79.89

43.09

67.23

82.34

57.25

Table 5. Table 5.5: Performance on the COIL-20 data set with different number of clusters. The best accuracy is indicated in bold.

Metric	LG-SSC	MG-SSC	SSC	LRR	EDSC	$S^{3} C$	LRSC
ACC	89.58	78.26	78.68	54.86	84.51	74.86	64.09
NMI	95.34	87.92	90.39	70.03	93.52	88.28	72.29
ARI	85.53	72.30	74.89	42.19	81.54	66.88	52.22

Table 6. Table 5.6: Accuracy of LG-SSC with respect to different values for the number of levels ($s$) and the number of blocks in each level ($p$)

	$2 \times 2$			$3 \times 3$			$4 \times 4$
	s=2	s=3	s=4	s=2	s=3	s=4	s=2	s=3	s=4
AR (10)	60.38	100	99.61	86.54	100	99.62	100	100	22.31
YALE B (10)	100	100	99.84	99.84	100	99.68	99.68	100	18.13
COIL-20	89.44	77.84	-	77.22	74.16	-	74.37	72.5	-

Equations

$$\min_{C \in \mathbb{R}^{N \times N},\, E \in \mathbb{R}^{D \times N}} f(C) + g(E) \quad \text{such that} \quad X = XC + E \ \text{ and } \ C_{i,i} = 0 \ \text{ for all } i,$$

$$V_{j}^{(i-1)} = \arg\min_{V \in \mathbb{R}^{N \times n}} \sum_{k=1}^{p} \operatorname{tr}\left(V^{T} L^{i}_{\ell_{j}^{k}} V\right) - \alpha \sum_{k=1}^{p} \operatorname{tr}\left(V V^{T} U^{i}_{\ell_{j}^{k}} {U^{i}_{\ell_{j}^{k}}}^{T}\right) \quad \text{such that} \quad V^{T} V = I,$$

$$\left(L_{j}^{(i)}\right)_{summary} = \sum_{k=1}^{p} L^{i}_{\ell_{j}^{k}} - \alpha \sum_{k=1}^{p} U^{i}_{\ell_{j}^{k}} {U^{i}_{\ell_{j}^{k}}}^{T}.$$

$$K_{1}^{1} = (V_{1}^{1}) (V_{1}^{1})^{T},$$

$$\Theta_{1}^{1} = (1 - K_{1}^{1}),$$

$$C_{1}^{1} = \arg \cdots \quad \text{such that} \quad \operatorname{diag}(C) = 0,$$

$$\|C\|_{1} + \lambda_{1} \|\Theta_{1}^{1} \odot C\|_{1} = \sum_{i,j} |C_{ij}| \left(1 + \lambda_{1} (\Theta_{1}^{1})_{ij}\right)$$

$$C_{1}^{1} = \arg\min_{C \in \mathbb{R}^{N \times N}} \|C\|_{1} + \lambda_{1} \|\Theta_{1}^{1} \odot C\|_{1} + \lambda_{2} \sum_{j=1}^{N} \sum_{g \in G} \|(C_{:j})_{g}\|_{2} + \frac{\mu}{2} \|X - XC\|_{F}^{2} \quad \text{such that} \quad \operatorname{diag}(C) = 0,$$

$$\min_{C \in \mathbb{R}^{N \times N},\, Z \in \mathbb{R}^{N \times N}} \|C\|_{1} + \lambda_{1} \|\Theta \odot C\|_{1} + \lambda_{2} \sum_{j=1}^{N} \sum_{g \in G} \|(C_{:j})_{g}\|_{2} + \frac{\mu}{2} \|X - XZ\|_{F}^{2} \quad \text{such that} \quad Z = C - \operatorname{diag}(C).$$

$$L(C, Z, \Delta) = \|C\|_{1} + \lambda_{1} \|\Theta \odot C\|_{1} + \lambda_{2} \sum_{j=1}^{N} \sum_{g \in G} \|(C_{:j})_{g}\|_{2} + \frac{\mu}{2} \|X - XZ\|_{F}^{2} + \frac{\beta}{2} \|Z - (C - \operatorname{diag}(C))\|_{F}^{2} + \operatorname{tr}\left(\Delta^{T} (Z - C + \operatorname{diag}(C))\right),$$

$$(\mu X^{T} X + \beta I) Z^{(k+1)} = \mu X^{T} X + \beta C^{(k)} - \Delta^{(k)}.$$

$$\min_{C} \|W \odot C\|_{1} + \lambda_{2} \sum_{j=1}^{N} \sum_{g \in G} \|(C_{:j})_{g}\|_{2} + \frac{\beta}{2} \|Z - C\|_{F}^{2} + \operatorname{tr}\left(\Delta^{T} (Z - C)\right),$$

$$\min_{C} \|W \odot C\|_{1} + \lambda_{2} \sum_{j=1}^{N} \sum_{g \in G} \|(C_{:j})_{g}\|_{2} + \frac{\beta}{2} \left\|C - \left(Z + \frac{\Delta}{\beta}\right)\right\|_{F}^{2}.$$

$$\min_{(C_{:j})_{g}} \|(W_{:j})_{g} \odot (C_{:j})_{g}\|_{1} + \lambda_{2} \|(C_{:j})_{g}\|_{2} + \frac{\beta}{2} \left\|(C_{:j})_{g} - \left((Z_{:j})_{g} + \frac{(\Delta_{:j})_{g}}{\beta}\right)\right\|_{F}^{2}.$$

$$(W_{:j})_{g} \odot (S_{:j})_{g} + \lambda_{2} \frac{(C_{:j})_{g}}{\|(C_{:j})_{g}\|_{2}} + \beta \left((C_{:j})_{g} - \left((Z_{:j})_{g} + \frac{(\Delta_{:j})_{g}}{\beta}\right)\right),$$

$$[(S_{:j})_{g}]_{k} = \begin{cases} 1 & \text{if } [(C_{:j})_{g}]_{k} > 0 \\ [-1, 1] & \text{if } [(C_{:j})_{g}]_{k} = 0 \\ -1 & \text{if } [(C_{:j})_{g}]_{k} < 0 \end{cases}$$

$$(C_{:j})_{g} + \frac{\lambda_{2}}{\beta} \frac{(C_{:j})_{g}}{\|(C_{:j})_{g}\|_{2}} = (Z_{:j})_{g} + \frac{(\Delta_{:j})_{g}}{\beta} - \frac{(W_{:j})_{g} \odot (S_{:j})_{g}}{\beta}.$$

$$\begin{aligned} (J_{:j})_{g} &= T_{b}\left((Z_{:j})_{g} + \frac{(\Delta_{:j})_{g}}{\beta} - \frac{(W_{:j})_{g} \odot (S_{:j})_{g}}{\beta},\ \frac{\lambda_{2}}{\beta}\right), \quad \text{for } j \in [1, \ldots, N] \ \&\ g \in G \\ C^{(k+1)} &= J - \operatorname{diag}(J) \end{aligned}$$

$$T_{b}(x, \rho) = \frac{x}{\|x\|_{2}} \max\{0, \|x\|_{2} - \rho\}.$$

$$\begin{aligned} (J_{:j})_{g} &= T_{b}\left(T\left((Z_{:j})_{g} + \frac{(\Delta_{:j})_{g}}{\beta},\ \frac{(W_{:j})_{g}}{\beta}\right),\ \frac{\lambda_{2}}{\beta}\right), \quad \text{for } j \in [1, \ldots, N] \ \&\ g \in G \\ C^{(k+1)} &= J - \operatorname{diag}(J). \end{aligned}$$

$$T(x, \rho) = (|x| - \rho)_{+} \operatorname{sign}(x)$$

$$\Delta^{(k+1)} = \Delta^{(k)} + \beta \left(Z^{(k+1)} - C^{(k+1)}\right)$$

$$(\mu X^{T} X + \beta I) Z^{(k+1)} = \mu X^{T} X + \beta C^{(k)} - \Delta^{(k)}$$

$$\begin{aligned} (J_{:j})_{g} &= T_{b}\left(T\left((Z_{:j}^{(k+1)})_{g} + \frac{(\Delta_{:j}^{(k)})_{g}}{\beta},\ \frac{(W_{:j})_{g}}{\beta}\right),\ \frac{\lambda_{2}}{\beta}\right), \quad \text{for } j \in [1, \ldots, N] \ \&\ g \in G \\ C^{(k+1)} &= J - \operatorname{diag}(J). \end{aligned}$$

$$\Delta^{(k+1)} = \Delta^{(k)} + \beta \left(Z^{(k+1)} - C^{(k+1)}\right)$$

$$\text{ACC}\,(\%) = \frac{\#\text{ of correctly classified points}}{\text{total } \#\text{ of points}} \times 100.$$
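The $Z$, $C$ and $\Delta$ updates listed above can be sketched in a few lines of NumPy. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the function names are hypothetical, the groups $G$ are assumed to be one fixed partition of the row indices shared by all columns, and the weight matrix is taken as $W = 1 + \lambda_{1}\Theta$, which is what the identity $\|C\|_{1} + \lambda_{1}\|\Theta \odot C\|_{1} = \sum_{i,j}|C_{ij}|(1+\lambda_{1}\Theta_{ij})$ suggests.

```python
import numpy as np

def soft_threshold(x, rho):
    """T(x, rho) = (|x| - rho)_+ sign(x), elementwise; rho may be an array."""
    return np.sign(x) * np.maximum(np.abs(x) - rho, 0.0)

def block_threshold(x, rho):
    """T_b(x, rho) = x / ||x||_2 * max(0, ||x||_2 - rho), with T_b(0, rho) = 0."""
    norm = np.linalg.norm(x)
    if norm == 0.0:
        return np.zeros_like(x)
    return x / norm * max(0.0, norm - rho)

def admm_iteration(X, C, Delta, W, groups, lam2, mu, beta):
    """One pass of the Z, C and Delta updates (hypothetical helper)."""
    N = X.shape[1]
    G = X.T @ X
    # Z-update: (mu X^T X + beta I) Z = mu X^T X + beta C - Delta
    Z = np.linalg.solve(mu * G + beta * np.eye(N), mu * G + beta * C - Delta)
    # C-update: elementwise shrinkage, then group shrinkage, per column and group
    R = Z + Delta / beta
    J = np.zeros_like(C)
    for j in range(N):
        for g in groups:  # g is an index array for one group (assumed layout)
            shrunk = soft_threshold(R[g, j], W[g, j] / beta)
            J[g, j] = block_threshold(shrunk, lam2 / beta)
    C_new = J - np.diag(np.diag(J))
    # Dual update
    Delta_new = Delta + beta * (Z - C_new)
    return Z, C_new, Delta_new

# Toy run with made-up data and parameters (all values are assumptions).
rng = np.random.default_rng(0)
X = rng.standard_normal((5, 6))
N = X.shape[1]
Theta = rng.uniform(0.0, 2.0, size=(N, N))   # stand-in for 1 - K
W = 1.0 + 1.0 * Theta                        # W = 1 + lambda_1 * Theta
groups = [np.array([0, 1, 2]), np.array([3, 4, 5])]
C, Delta = np.zeros((N, N)), np.zeros((N, N))
for _ in range(50):
    Z, C, Delta = admm_iteration(X, C, Delta, W, groups,
                                 lam2=0.5, mu=10.0, beta=1.0)
```

Applying the elementwise operator first and the block operator second mirrors the nested form $T_{b}(T(\cdot,\cdot),\cdot)$ of the $J$ update, and subtracting the diagonal of $J$ enforces the zero-diagonal constraint on $C$.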


Full text

Neither Global Nor Local: A Hierarchical Robust Subspace Clustering For Image Data

Maryam Abdolali, Mohammad Rahmati∗

Department of Computer Engineering and Information Technology, Amirkabir University of Technology, Tehran, Iran.

Abstract

In this paper, we consider the problem of subspace clustering in the presence of contiguous noise, occlusion and disguise. We argue that the self-expressive representation of data in current state-of-the-art approaches is severely sensitive to occlusions and complex real-world noise. To alleviate this problem, we propose a hierarchical framework that brings together the robustness of local patch-based representations and the discriminative property of global representations. This approach consists of 1) a top-down stage, in which the input data is repeatedly divided into smaller patches, and 2) a bottom-up stage, in which the low-rank embeddings of the local patches in the field of view of a corresponding patch in the upper level are merged on a Grassmann manifold. This summarized information provides two key pieces of information for the corresponding patch on the upper level: cannot-links and recommended-links. This information is employed for computing a self-expressive representation of each patch at the upper levels using a weighted sparse group lasso optimization problem. Numerical results on several real datasets confirm the efficiency of our approach.

Keywords: sparse subspace clustering, multi-scale, Grassmann manifold, group lasso, spectral clustering, graph fusion

1 Introduction

Detecting low-dimensional structures in high-dimensional data is essential to many applications in areas including computer vision and image processing. In many of these applications, data often have fewer degrees of freedom than the high ambient dimension suggests. This observation led to the development of several classical approaches for finding a low-dimensional representation of the data [8, 14, 7]. A large family of these approaches, including the well-known Principal Component Analysis (PCA) method [18], assumes that the data lie in a single low-dimensional subspace/manifold. However, in many applications, data points are distributed around several subspaces and are better represented by multiple low-dimensional subspaces. Modeling data with a collection of subspaces leads to a more general problem that is often referred to as Subspace Clustering [39]:

Definition 1 (Subspace Clustering): Let $X={\{x_{i}\in\mathbb{R}^{D}\}}_{i=1}^{N}$ be a collection of data points from $n$ unknown subspaces $S_{1},S_{2},\cdots,S_{n}$ with intrinsic dimensions $d_{1},d_{2},\cdots,d_{n}$ (where $d_{i}\ll D$). The goal of subspace clustering is to segment the data points according to their underlying subspaces and to identify the parameters of each subspace.

This problem has received considerable attention over the past decade and many attempts have been made to address it [19, 41, 37, 15, 30, 36, 45]. Among them, a family of approaches based on recent developments in low-rank and sparse representation has achieved state-of-the-art results [10, 11, 25, 24]. These algorithms have two key steps: 1) constructing an affinity matrix (neighborhood graph) using the self-expressiveness property of data [10, 43], which assumes that each data point can be written as a linear combination of other data points, and 2) obtaining the data clusters by applying spectral clustering to the affinity matrix. In particular, the affinity matrix in these algorithms is constructed by solving the following problem:

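$$\min_{C \in \mathbb{R}^{N \times N},\, E \in \mathbb{R}^{D \times N}} f(C) + g(E) \quad \text{such that} \quad X = XC + E \ \text{ and } \ C_{i,i} = 0 \ \text{ for all } i, \tag{1.1}$$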

where $X\in\mathbb{R}^{D\times N}$ is the data matrix (with $D$-dimensional data points as columns and $N$ the number of data points), $C\in\mathbb{R}^{N\times N}$ is the coefficient matrix and $E\in\mathbb{R}^{D\times N}$ is the error matrix. $f(\cdot)$ and $g(\cdot)$ denote the regularizations on the matrices $C$ and $E$, respectively. The three main representative methods among these algorithms, namely SSC (Sparse Subspace Clustering) [11], LRR (Low Rank Representation) [24] and LSR (Least Square Regression) [28], differ in the choice of $f(C)$. In particular, $f(C)$ is the $\ell_{1}$ norm for SSC, which prioritizes sparse solutions, the nuclear norm (the sum of the singular values of the input matrix) for LRR, which prioritizes low-rank solutions, and the Frobenius norm for LSR. Once the problem in (1.1) is solved, the affinity matrix is obtained by symmetrizing the coefficient matrix $C$ via $|C|+|C^{T}|$. The clusters are then computed by applying spectral clustering to the affinity matrix.
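As a small illustration of the first step, the sketch below computes a self-expressive coefficient matrix with a Frobenius-norm regularizer and symmetrizes it into an affinity matrix. It is a minimal sketch under assumptions: the function names are hypothetical, the error term is folded into a least-squares fit $\|X - XC\|_{F}^{2} + \lambda\|C\|_{F}^{2}$, and the zero-diagonal constraint is dropped so that the closed form $C = (X^{T}X + \lambda I)^{-1}X^{T}X$ applies.

```python
import numpy as np

def lsr_coefficients(X, lam=0.1):
    """Closed-form minimizer of ||X - XC||_F^2 + lam * ||C||_F^2."""
    G = X.T @ X                       # N x N Gram matrix of the data points
    N = G.shape[0]
    return np.linalg.solve(G + lam * np.eye(N), G)

def affinity_from_coefficients(C):
    """Symmetrize a coefficient matrix as |C| + |C^T|."""
    A = np.abs(C)
    return A + A.T

# Toy data: 4 points on each of two orthogonal 2-D subspaces of R^4.
rng = np.random.default_rng(0)
X1 = np.vstack([rng.standard_normal((2, 4)), np.zeros((2, 4))])
X2 = np.vstack([np.zeros((2, 4)), rng.standard_normal((2, 4))])
X = np.hstack([X1, X2])               # D = 4, N = 8

C = lsr_coefficients(X, lam=0.1)
A = affinity_from_coefficients(C)     # block-diagonal for this toy data
```

For this idealized data the two subspaces are orthogonal, so the Gram matrix, and with it the affinity, has no entries linking points from different subspaces; real image data is far less clean.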

Even though advances in the sparse and low-rank representation literature contributed significantly to the development of elegant subspace clustering algorithms, these algorithms are based on a global self-expressive representation of data in which the entire data point is expressed as a linear combination of other data points. However, clustering based on a global data representation can easily be affected by occlusions and severely corrupted noisy blocks [3], and the conventional $\ell_{1}$ or Frobenius norms that are often used to regularize the error matrix are incapable of modeling the complexity of real-world data corruptions [29]. On the other hand, block-wise (local) representations tend to be more robust to occlusions and contiguous noise, but combining the clustering results of the local patches by a trivial majority voting scheme might be drastically affected by non-informative patches.

In order to overcome this major shortcoming, we propose an efficient hierarchical framework that combines the advantages of local representation with the global self-expressive representation of data. Specifically, in our approach, each data point is divided into a collection of blocks/patches and the local connectivities of these patches are summarized using the corresponding low-dimensional embeddings on a Grassmann manifold [38]. This local connectivity information is then used to guide the global self-expressive representation by providing prior knowledge on 1) which connections should be avoided, using weighted sparse regularization, and 2) which connections are recommended by the local representations, using group sparse regularization [12].

The main contribution of this paper is a novel framework, dubbed LG-SSC (Locally-Guided SSC), that bridges the gap between the discriminative global representation and the robust local alternative in order to achieve a robust discriminant representation for subspace clustering. The proposed framework consists of two key steps: 1) combining the diverse characteristics of local patches using a hierarchical low-dimensional analysis on Grassmann manifolds and 2) computing a global sparse self-expressive representation of data using group lasso and weighted sparse regularizations. As we will show, our approach is more robust to occlusions, illumination effects and contiguous block noise.

This paper is an extension of the work originally presented in [2]. Inspired by the significant increase in clustering accuracy obtained with the self-expressive representation of local patches in our preliminary work, we further utilize this information in a different hierarchical approach. In particular, in this paper, the local information fused at each level is fed back to guide the calculation of the coefficient matrix at the upper level in order to achieve a more robust representation. This approach, which can be considered coefficient calculation-level fusion, exhibits stronger robustness and clustering accuracy than the previous subspace estimation-level fusion.

The rest of the paper is organized as follows: In Section 2, related works are briefly reviewed. After presenting the motivation behind our proposed approach in Section 3, the two major steps are explained in detail in Section 4. We evaluate the performance of the proposed approach in Section 5 and conclude our work in Section 6.

2 Related Works

Self-expressiveness-based subspace clustering methods enjoy broad theoretical guarantees for data that is drawn perfectly from a collection of low-dimensional subspaces [34, 11], and recently these foundations have been extended to noisy cases as well [44]. A simple, intuitive way of modeling errors was first proposed in [10], where the self-expressiveness term was relaxed to $X=XC+E$. The elements of the error matrix $E$ were generally modeled by independent Laplacian or Gaussian distributions using the $\ell_{1}$ or Frobenius norms. Even though this has the advantage of simplicity and of maintaining the convexity of the optimization problem, in practice error matrices are generated by more complicated variations, and the independence assumption between error elements in these models is too restrictive.

To further improve robustness, other distributions were used to model the noise, including Cauchy [23] and Mixture of Gaussians [46]. The authors of [40] consider the errors directly in the original space by proposing a non-convex formulation in which the corrupted data are assumed to be generated by a linear combination of the error matrix and the clean data matrix, and self-expressiveness holds for the clean data. Even though these approaches show some improvements in dealing with practical noise, they still assume that the elements of the error matrix are independent. However, the contiguous error caused by occlusion, illumination effects and disguise does not follow this assumption.

Several approaches tried to improve robustness in the representation space instead of the input space. Lu et al. [27] utilized the relatively new Trace Lasso norm to regularize the coefficient matrix in order to improve the grouping effect. In [33], an iterative approach was proposed in which a linear transformation was learned such that low-rank structures for data from the same subspaces were restored and a maximally separated structure for data from different subspaces was obtained. In [22], a unified optimization framework for learning both the coefficient matrix and the segmentation was presented. In this iterative approach, a segmentation matrix was constructed at each iteration to help re-weight the representation matrix in order to avoid certain connections. Lu et al. [26] proposed a block diagonal induced regularizer to explicitly enforce the Laplacian matrix corresponding to the coefficient matrix to be block diagonal. This forces the neighborhood graph to contain exactly $n$ connected components and might reduce the number of wrong connections.

Although the above approaches could enhance the robustness in some cases, an important shortcoming is that they are based on a global representation and hence can easily be affected by severely corrupted regions in data points. Patch-based image representations were often used to increase robustness to contiguous noise in the sparse representation literature for classification [47, 35, 20]. Multi-scale patch-based regularization using the nuclear norm for modeling the error has shown effectiveness for face classification [29] and single low-rank subspace estimation [1]. In our previous work, MG-SSC (Multi-Graph based SSC) [2], a multi-layer graph was constructed by dividing each data point into patches of different sizes and computing the sparse affinity matrix corresponding to each collection of patches using SSC. A summarized low-dimensional representation of this multi-layer graph was computed using a weighted subspace analysis of the individual graphs on a Grassmann manifold. In this paper, the advantages of global and local (patch-based) representations are brought together for the problem of subspace clustering in a different, novel framework. In the following sections, the details of the proposed approach are discussed thoroughly.

3 The importance of local representations in robust subspace clustering

In many practical subspace clustering cases, the data could be partially occluded or corrupted. Even in these cases, detecting the low-dimensional structures may still be possible using the redundancy that is often present in high-dimensional data. However, corrupted samples might affect the neighborhood graph so severely that they lead to a completely wrong understanding of the data structure.

We illustrate this concept with a real-world example. We consider facial images of three subjects from the AR database [31] (more on this database in the experiment section). A few samples from the selected images are shown in Figure 3.1. As can be seen, several varieties of corruption are present in these images, including disguise using sunglasses and scarves, and illumination variations. It is usually assumed that facial images of a subject under different lighting conditions can be approximated by a nine-dimensional linear subspace [4] and hence a collection of facial images of several subjects lies on a union of nine-dimensional subspaces [10].

We select three subsets $\{X^{(i)}\}_{i=1}^{3}$ from these images: $X^{(1)}$, which includes images with neither illumination effects nor disguise; $X^{(2)}$, which includes images with no disguise but under different illumination effects; and $X^{(3)}$, which includes all images under both variations. The SSC algorithm is applied to these subsets of images separately. The normalized spectral embeddings of the three corresponding coefficient matrices $\{C^{(i)}\}_{i=1}^{3}$ are plotted in Figure 3.2. In particular, for each coefficient matrix $C^{(i)}$, the corresponding Laplacian matrix $L^{(i)}$ is calculated as $L^{(i)}=I-{D^{(i)}}^{-\frac{1}{2}}C^{(i)}{D^{(i)}}^{-\frac{1}{2}}$, where $D^{(i)}$ is the diagonal matrix with its $(k,k)$th entry defined as $D^{(i)}_{kk}=\sum_{j}C^{(i)}_{kj}$. The normalized eigenvectors corresponding to the second and third smallest eigenvalues of the Laplacian matrix provide a 2-dimensional representation of the data affinity. The representations for the three individuals are plotted in three different colors (red, blue and black). By comparing the three embeddings in Figure 3.2 (a)-(c), it can be seen that illumination and especially occlusion can affect the global self-expressive coefficients. In particular, the occlusion (sunglasses and scarves) makes the detection of the three subspaces impossible.
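The embedding step described above can be sketched as follows. This is an illustrative sketch rather than the authors' code: the function name is hypothetical, the input is assumed to be a symmetric nonnegative affinity matrix with positive row sums, and "normalized" is read here as scaling each row of the embedding to unit $\ell_{2}$ norm.

```python
import numpy as np

def spectral_embedding_2d(A):
    """2-D embedding from the normalized Laplacian L = I - D^{-1/2} A D^{-1/2}."""
    d = A.sum(axis=1)                           # node degrees (assumed positive)
    d_inv_sqrt = 1.0 / np.sqrt(d)
    L = np.eye(len(A)) - d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]
    eigvals, eigvecs = np.linalg.eigh(L)        # eigenvalues in ascending order
    Y = eigvecs[:, 1:3]                         # second and third smallest
    return Y / np.linalg.norm(Y, axis=1, keepdims=True)   # unit-norm rows

# Toy affinity: 3 groups of 4 points, strong links inside a group, weak across.
labels = np.repeat(np.arange(3), 4)
A = np.where(labels[:, None] == labels[None, :], 1.0, 0.01)
np.fill_diagonal(A, 0.0)

Y = spectral_embedding_2d(A)                    # one 2-D point per data point
```

With such a clean affinity, points of the same group collapse onto a single location in the plane, which is the behavior the ideal panels of a plot like Figure 3.2 would show; corrupted affinities smear these locations.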

In contrast to global representations, patch-based representations can be more robust to severely corrupted regions. To illustrate this, we divide each facial image in $X^{(3)}$ (which contains all images of the three selected individuals) into 4 patches and plot the spectral embedding of each of the 4 patches right next to the corresponding patch in Figure 3.3. (The spectrum of each patch is obtained using the proposed algorithm by dividing it into 4 more patches; we explain this in the next section.) As can be noticed, each local representation provides some information on the connectivities within the data points; however, in almost none of them are the 3 clusters perfectly separated. Moreover, with no prior knowledge we cannot identify the best representation among these 4 representations. Our proposed framework, based on multi-layer graph fusion, obtains a (spectral kernel) summary representation from these local representations (which is shown in Figure 3.3 (b) for completeness). This summary representation is an $N\times N$ matrix with values between -1 and 1, whose entries quantify the possibility of two corresponding data points belonging to the same subspace according to the local representations. Values closer to 1 indicate a higher possibility. This information is used to improve the quality of the global representation (details are presented in the next section). The final global low-dimensional embedding corresponding to our proposed LG-SSC is plotted in Figure 3.3 (c), and it can be seen that not only are the three clusters perfectly separated, but all the samples within each cluster also have exactly the same embedding in this case. This suggests that our proposed framework can improve within-cluster connectivities along with the overall clustering accuracy.
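A brief sketch of why the summary representation is bounded between -1 and 1: if it is formed as $K = VV^{T}$ from an embedding $V$ whose rows have unit $\ell_{2}$ norm, every entry is an inner product of two unit vectors. The code below is illustrative only; the function name and the random stand-in for $V$ are assumptions, and $\Theta = 1 - K$ follows the form given in the equation list.

```python
import numpy as np

def summary_kernel(V):
    """K = V V^T for an embedding V whose rows are scaled to unit l2 norm."""
    V = V / np.linalg.norm(V, axis=1, keepdims=True)
    return V @ V.T

rng = np.random.default_rng(0)
V = rng.standard_normal((6, 3))   # hypothetical N x n summary embedding
K = summary_kernel(V)             # entries are cosines, hence in [-1, 1]
Theta = 1.0 - K                   # near 0 for likely same-subspace pairs
```

Pairs with $K_{ij}$ close to 1 receive a weight $\Theta_{ij}$ close to 0, so a penalty weighted by $\Theta$ discourages exactly the connections that the local views consider unlikely.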

4 Proposed Framework: Locally-Guided SSC

The proposed hierarchical framework, dubbed LG-SSC, consists of a top-down stage and a bottom-up stage. LG-SSC has two major ingredients at each level of this hierarchy: 1) dividing the data into local patches (in the top-down stage) and summarizing the connectivity information between them (in the bottom-up stage) and 2) guiding the detection of low-dimensional structures using this information (in the bottom-up stage). In this section, details of these two key parts are presented.

4.1 Division and local information summarization

Given the data matrix $X=[x_{1},x_{2},...,x_{N}]$, we construct a hierarchical structure composed of $s$ levels. At the top of this structure is the given data matrix. In the next level, each image is divided into $p$ patches (typically 4 or 9). The division is carried out for each image $x_{i}$ in its (2D) matrix form. This process is repeated for each patch in the next level. The created $p$ patches form the field of view of the corresponding patch from the upper level. This procedure is shown in Figure 4.1 for $s=3$ and $p=4$ for one image. Let the whole gallery of patches at level $i$ be represented by $X^{(i)}=\{X^{(i)}_{1},...,X^{(i)}_{T}\}$, where $T=p^{(i-1)}$ is the number of patches generated at level $i$ and $X^{(i)}_{j}$ ($j=1,...,T$) is the collection of local patches (at the $i$th level) with the same coordinates (position) from all images. Clearly, the patches in the lower levels are smaller in size and generally contain less discriminative information about the whole image. However, they are more robust to corruptions.
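The top-down division can be sketched for a single image as follows. This is a minimal sketch under assumptions: the function names are hypothetical, $p$ is assumed to be a perfect square laid out as a `grid` by `grid` arrangement, and the image sides are assumed to be divisible by `grid` at every level.

```python
import numpy as np

def split_into_patches(image, grid):
    """Split a 2-D array into grid x grid equal patches, in row-major order."""
    h, w = image.shape
    ph, pw = h // grid, w // grid     # assumes h and w are divisible by grid
    return [image[r * ph:(r + 1) * ph, c * pw:(c + 1) * pw]
            for r in range(grid) for c in range(grid)]

def build_hierarchy(image, grid, levels):
    """Return a list whose entry i-1 holds the p**(i-1) patches of level i."""
    hierarchy = [[image]]
    for _ in range(levels - 1):
        next_level = []
        for patch in hierarchy[-1]:
            next_level.extend(split_into_patches(patch, grid))
        hierarchy.append(next_level)
    return hierarchy

image = np.arange(64, dtype=float).reshape(8, 8)
hierarchy = build_hierarchy(image, grid=2, levels=3)   # p = 4, s = 3
```

Because each patch's children are appended consecutively, the $p$ patches in the field of view of patch $j$ at one level occupy positions $jp, \ldots, jp + p - 1$ (0-based) at the next level.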

Suppose that in the $i$ th level, the collection of patches $X^{(i)}_{j}$ (where $j\in\{1,...,p^{i-1}\}$ ) are further divided to $p$ patches. Hence, at the $(i+1)$ th level, $p$ coefficient matrices can be obtained corresponding to the set of $p$ collected patches. We denote these patches and their corresponding coefficient representations by $\{X^{(i+1)}_{\ell_{j}^{k}}\}_{k=1}^{p}=\{X^{(i+1)}_{\ell_{j}^{1}},...,X^{(i+1)}_{\ell_{j}^{p}}\}$ and $\{C^{(i+1)}_{\ell_{j}^{k}}\}_{k=1}^{p}=\{C^{(i+1)}_{\ell_{j}^{1}},...,C^{(i+1)}_{\ell_{j}^{p}}\}$ respectively. $\ell_{j}^{k}$ ( $k=1,...,p$ ) indicates the indices of collection of patches that are generated by dividing the $j$ th patch in the upper level. These patches are in the field of view of the collection of patches in $X^{(i)}_{j}$ . The set $\{C^{(i+1)}_{\ell_{j}^{k}}\}_{k=1}^{p}$ contains $p$ different types of relationships between the data points. With no prior knowledge, it is highly non-trivial to choose the best representation among them. In fact, different coefficient matrices can be interpreted as different views from the larger patches at the finer level where each coefficient matrix corresponds to an affinity matrix of a graph. Hence, in order to combine the information of the different coefficient matrices, we use a multi-layer graph fusion approach [9] by transforming the information of each representation into a subspace on the Grassmann manifold.

The utilized graph fusion approach [9] is briefly presented as follows: Given $p$ different graph affinity matrices with common vertex set representing the data points, the goal is to merge the information provided by different individual graphs. To summarize the intrinsic relationships between the data points (the vertices of the graph), the problem can be mapped to the problem of finding a low-dimensional representation such that it preserves information of each affinity matrix in a meaningful way.

As mentioned in Section 2, for each graph affinity matrix ( $A^{i}_{\ell_{j}^{k}}$ ), there is a corresponding normalized spectral embedding ( $U^{i}_{\ell_{j}^{k}}$ ) which can be considered as the low-dimensional representation of the affinity matrix. This spectral embedding can be obtained by simply calculating the $n$ eigenvectors corresponding to the $n$ smallest eigenvalues of the Laplacian matrix (computed from the affinity matrix). Using the collection of subspace representations of the $p$ available affinity matrices, one can naturally define a summary representation of multiple affinity matrices by optimizing the following problem [9]:

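$$\min_{V\in\mathbb{R}^{N\times n}}\;\sum_{k=1}^{p}\mathrm{tr}\left(V^{T}L^{i}_{\ell_{j}^{k}}V\right)+\alpha\left[pn-\sum_{k=1}^{p}\mathrm{tr}\left(VV^{T}U^{i}_{\ell_{j}^{k}}(U^{i}_{\ell_{j}^{k}})^{T}\right)\right]\quad\text{s.t.}\quad V^{T}V=I$$

where $L^{i}_{\ell_{j}^{k}}$ is the Laplacian matrix of $A^{i}_{\ell_{j}^{k}}$ and $\alpha$ is the regularization parameter. The first term in the above optimization problem attempts to find a summary representation $V$ from the collection of subspace representations $\{U^{i}_{\ell_{j}^{k}}\}^{p}_{k=1}$ such that the connectivity information of each individual affinity matrix is preserved. The second term is the sum of squared projection distances between $V$ and the individual subspaces $\{U^{i}_{\ell_{j}^{k}}\}^{p}_{k=1}$ on a Grassmann manifold. This term forces the summary representation to be close to the individual subspace representations on the Grassmann manifold. This optimization problem has a closed-form solution based on the Rayleigh-Ritz theorem [43]. The summary representation $V$ can be obtained by calculating the $n$ eigenvectors corresponding to the $n$ smallest eigenvalues of the following matrix:

$$\sum_{k=1}^{p}L^{i}_{\ell_{j}^{k}}-\alpha\sum_{k=1}^{p}U^{i}_{\ell_{j}^{k}}(U^{i}_{\ell_{j}^{k}})^{T}$$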

Each row of the matrix $V^{(i-1)}_{j}$ is normalized to have unit $\ell_{2}$ norm. As we shall see, the subspace representation of the $p$ local patches in the $i$ th level can be used to calculate the coefficient matrix of the corresponding patch in the higher level ( $i-1$ ) more efficiently.
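The fusion step can be sketched as follows. This is a hedged illustration, not the authors' code: it assumes the symmetric normalized Laplacian $I-D^{-1/2}AD^{-1/2}$ and the standard modified-Laplacian solution of the Grassmann fusion method in [9] (sum of Laplacians minus $\alpha$ times the sum of projectors $UU^{T}$ ). The function names are hypothetical.

```python
import numpy as np

def normalized_laplacian(A):
    """Symmetric normalized Laplacian L = I - D^{-1/2} A D^{-1/2}."""
    d = A.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(d, 1e-12))
    return np.eye(A.shape[0]) - (d_inv_sqrt[:, None] * A) * d_inv_sqrt[None, :]

def smallest_eigenvectors(M, n):
    """Eigenvectors of a symmetric matrix belonging to its n smallest eigenvalues."""
    _, vecs = np.linalg.eigh(M)  # eigh returns eigenvalues in ascending order
    return vecs[:, :n]

def fuse_on_grassmann(affinities, n, alpha):
    """Summary embedding V (rows normalized) of several graphs on one vertex set."""
    laplacians = [normalized_laplacian(A) for A in affinities]
    embeddings = [smallest_eigenvectors(L, n) for L in laplacians]
    # assumed modified Laplacian: sum_k L_k - alpha * sum_k U_k U_k^T
    L_mod = sum(laplacians) - alpha * sum(U @ U.T for U in embeddings)
    V = smallest_eigenvectors(L_mod, n)
    norms = np.linalg.norm(V, axis=1, keepdims=True)
    return V / np.maximum(norms, 1e-12)

# Toy usage: three layers that share the same two groups of four points.
block = np.ones((4, 4)) - np.eye(4)
base = np.block([[block, np.zeros((4, 4))], [np.zeros((4, 4)), block]])
affinities = [w * base for w in (1.0, 0.5, 2.0)]
V = fuse_on_grassmann(affinities, n=2, alpha=0.5)
K = V @ V.T  # pairwise similarities of the normalized rows
```

In this toy case every layer agrees on the grouping, so rows of `V` from the same group coincide and rows from different groups are orthogonal.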

4.2 From local summarization to global representations

After dividing the image data into local patches at different scales, a bottom-up process is carried out in order to obtain the global coefficient matrix. For a clear explanation, suppose we have only two levels ( $s=2$ ) and each data point (an image) is divided into 4 patches ( $p=4$ ). The overview of LG-SSC for $p=4$ and $s=2$ is illustrated in Figure 4.2. In particular, we start from the coarsest level (second level here) and the regular SSC algorithm is applied on the collection of patches in this level. Hence, 4 coefficient matrices $\{C^{2}_{\ell_{1}^{k}}\}^{4}_{k=1}$ are obtained.

Using the methodology in the previous subsection, a summary representation ( $V^{1}_{1}\in\mathbb{R}^{N\times n}$ ) from the 4 local representations can be calculated efficiently. We define the spectral kernel of the summary representation in the embedded space as:

$$K^{1}_{1}=V^{1}_{1}(V^{1}_{1})^{T}$$

The matrix $K^{1}_{1}\in[-1,1]^{N\times N}$ provides information on the similarity between samples in the embedded space. Entries with low values indicate connections that the local representations strongly agree should be avoided. Therefore, we can encode the cannot-links by adding an extra penalty on the coefficient matrix where the values of $K^{1}_{1}$ are low. We define the matrix $\Theta^{1}_{1}\in\mathbb{R}^{N\times N}$ as:

[TABLE]

The entries of the matrix $\Theta^{1}_{1}$ have high values when the corresponding entries in $K^{1}_{1}$ have low values. Using this matrix, we obtain the coefficient matrix $C^{1}_{1}$ in the fine level (global representation in this example) by optimizing the following problem:

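$$\min_{C}\;\|C\|_{1}+\lambda_{1}\|\Theta^{1}_{1}\odot C\|_{1}+\frac{\mu}{2}\|X-XC\|_{F}^{2}\quad\text{s.t.}\quad\mathrm{diag}(C)=0$$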

where $\lambda_{1}$ and $\mu$ are regularization parameters and $\odot$ denotes the element-wise (Hadamard) product. This is in fact a weighted lasso optimization problem, where we are adding an extra penalty on the entries of the matrix $C$ when the corresponding entries in $\Theta_{1}^{1}$ are high. This can be validated by rewriting the first two terms as:

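$$\|C\|_{1}+\lambda_{1}\|\Theta^{1}_{1}\odot C\|_{1}=\sum_{i,j}\left(1+\lambda_{1}(\Theta^{1}_{1})_{ij}\right)|(C)_{ij}|$$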

where $(\cdot)_{ij}$ denotes the $(i,j)$ th entry of the input matrix.
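The weighted-lasso reading can be checked numerically. The sketch below assumes only that the entries of $\Theta$ are non-negative, which is what makes $|\Theta_{ij}C_{ij}|=\Theta_{ij}|C_{ij}|$ ; the matrix `theta` is a random placeholder, not the paper's definition of $\Theta^{1}_{1}$ .

```python
import numpy as np

rng = np.random.default_rng(0)
N, lambda1 = 6, 5.0
C = rng.standard_normal((N, N))
theta = rng.random((N, N))  # placeholder non-negative penalties (assumption)

# first two terms: ||C||_1 + lambda1 * ||Theta (Hadamard) C||_1
two_terms = np.abs(C).sum() + lambda1 * np.abs(theta * C).sum()

# equivalent weighted l1 norm with W_ij = 1 + lambda1 * Theta_ij
W = 1.0 + lambda1 * theta
weighted_l1 = (W * np.abs(C)).sum()
```

Both quantities agree up to floating-point error, so the extra term only rescales the per-entry $\ell_{1}$ weights.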

Remark 1.

The elements of the matrix $K_{j}^{i}$ (here $i=j=1$ ) can be interpreted as a quantified measure of the possibility that the two corresponding samples belong to the same subspace according to the obtained summary representation. In particular, the closer the values of these elements are to one, the higher the possibility that the two samples belong to the same subspace. Hence, we can drop the additional penalty on entries with high values, specifically the ones higher than a predefined threshold. In practice, this threshold can be set to any value in the range $[0.7,0.9]$ as the proposed framework is not sensitive to this value.
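The thresholding in Remark 1 can be sketched as below. The penalty matrix is passed in as an argument because its exact definition is not reproduced here, the kernel is assumed to be the Gram matrix of the row-normalized summary embedding, and the function name `relax_penalty` is hypothetical.

```python
import numpy as np

def relax_penalty(theta, V, tau=0.8):
    """Drop the cannot-link penalty wherever the spectral kernel exceeds tau."""
    K = V @ V.T  # assumes the rows of V have unit l2 norm, so K lies in [-1, 1]
    relaxed = theta.copy()
    relaxed[K > tau] = 0.0
    return relaxed, K

# Toy usage: samples 0 and 1 share a direction, sample 2 is orthogonal to both.
V_toy = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
theta_toy = np.ones((3, 3))  # placeholder penalties, not the paper's Theta
relaxed, K_toy = relax_penalty(theta_toy, V_toy, tau=0.8)
```

Only the pair (0, 1) and the diagonal lose their penalty; the pairs involving sample 2 keep it.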

However, the summary representation $V^{1}_{1}$ not only indicates the "cannot-links" constraints but also contains information about which connections are recommended by the local views. By applying the k-means algorithm on the normalized rows of $V^{1}_{1}$ , an initial grouping of data points based on the summary of local representations is obtained. This grouping information denotes the connectivities that the local representations recommend. Let $G$ be the set that contains the indices of data points in each cluster according to $V^{1}_{1}$ . This information is added to the problem (4.7) as follows:

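$$\min_{C}\;\|C\|_{1}+\lambda_{1}\|\Theta^{1}_{1}\odot C\|_{1}+\lambda_{2}\sum_{j=1}^{N}\sum_{g\in G}\|(C_{:j})_{g}\|_{2}+\frac{\mu}{2}\|X-XC\|_{F}^{2}\quad\text{s.t.}\quad\mathrm{diag}(C)=0$$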

where the third term is the group lasso regularization [12]. Here $(\cdot)_{:j}$ denotes the $j$ th column of the input matrix and $(\cdot)_{g}$ indicates the input matrix’s rows within the set $g$ . Group lasso [12] is the generalization of standard lasso in which the entries of coefficient matrix corresponding to the indicated groups are either included or excluded from the regression model together. Using this regularization, we encourage the samples to get connected to the samples within the recommended groups (improving connectivity) and simultaneously to get disconnected from the groups that are recommended to be avoided.
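As an illustration of the group lasso term, the sketch below evaluates $\sum_{j}\sum_{g\in G}\|(C_{:j})_{g}\|_{2}$ for a given grouping. It assumes an unweighted sum over groups; the labels are supplied directly here, whereas in the framework they would come from k-means on the rows of the summary representation. The function name is hypothetical.

```python
import numpy as np

def group_lasso_penalty(C, labels):
    """Sum over columns j and groups g of the l2 norm of the rows of C[:, j] in g."""
    labels = np.asarray(labels)
    total = 0.0
    for g in np.unique(labels):
        rows = np.flatnonzero(labels == g)
        # one l2 norm per column, restricted to the rows of group g
        total += np.linalg.norm(C[rows, :], axis=0).sum()
    return total

# Toy usage: rows 0 and 1 form one group, row 2 forms another.
C_toy = np.array([[3.0, 0.0],
                  [4.0, 0.0],
                  [0.0, 2.0]])
penalty = group_lasso_penalty(C_toy, labels=[0, 0, 1])
```

Because each group contributes an unsquared norm, a column either uses a group of samples as a whole or leaves it out, which is the selection behaviour described above.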

This procedure can be easily extended to cases with higher number of levels ( $s>2$ ). In general, at the bottom level of this hierarchical structure, the typical SSC is applied on the set of patches $\{X_{j}^{s}\}^{p^{s-1}}_{j=1}$ . In the upper levels, the explained method (with two key steps) is applied on the gallery of patches individually and this procedure is continued until the root level which outputs the final global coefficient representation and the corresponding segmentation result.

The proposed framework not only computes the segmentation based on the global discriminant representation of the data but also takes advantage of robust local knowledge in each level of the hierarchy, leading to a robust discriminant representation.

4.3 Optimization

In this section, we introduce the optimization of the proposed convex problem in (4.4) using the Alternating Direction Method of Multipliers (ADMM) [5]. First, we introduce an auxiliary matrix $Z\in\mathbb{R}^{N\times N}$ and consider the following problem (the superscripts and subscripts in the optimization problem are dropped for ease of reading):

[TABLE]

Next, using penalty parameter $\beta>0$ and matrix $\Delta\in\mathbb{R}^{N\times N}$ of Lagrange multipliers for the equality constraint, we get the following problem:

[TABLE]

where $tr(.)$ is the trace operator. Based on ADMM, this problem is optimized using the following iterative procedure:

With an abuse of notation, let $(C^{(k)},Z^{(k)})$ be the optimization variables at iteration $k$ and $\Delta^{(k)}$ be the Lagrangian multiplier at the same iteration:

•

Find $Z^{(k+1)}$ , by minimizing $\mathcal{L}$ with respect to $Z$ , while the rest of the variables are fixed by setting the derivative of $\mathcal{L}$ with respect to $Z$ to zero:

[TABLE]

The matrix $Z^{(k+1)}$ is obtained by solving the above system of linear equations using approaches such as the conjugate gradient method.

•

Find $C^{(k+1)}$ , by minimizing $\mathcal{L}$ with respect to $C$ , while the other variables are fixed. Note that we can rewrite this as:

[TABLE]

where the $(i,j)$ th entry of $W$ is defined by $W_{ij}=1+\lambda_{1}\Theta_{ij}$ , and we can simplify it further as:

[TABLE]

The constraint of $diag(C)=0$ can be considered by projecting the solution to the above problem on this constraint. This problem is group-separable, so we consider:

[TABLE]

By taking the (sub)gradient of the above problem, we have:

[TABLE]

where $S$ is a matrix defined as follows:

[TABLE]

$[{(S_{:j})_{g}}]_{k}$ denotes $k$ -th element of the vector $(S_{:j})_{g}$ . By setting (4.6) to zero:

[TABLE]

Hence the matrix C can be updated as:

[TABLE]

where:

[TABLE]

Using this definition, (4.7) can be further summarized as:

[TABLE]

Here $\mathcal{T}$ is the shrinkage-thresholding operator and is defined as:

[TABLE]

•

Find $\Delta^{(k+1)}$ by gradient ascent:

[TABLE]

These steps are repeated until convergence is met. Note that the problem in (4.5) is convex and consists of two separable blocks of variables, hence, the solution obtained by ADMM is guaranteed to be optimal. Algorithm 1 summarizes the updates for the ADMM implementation.
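The three-step iteration can be illustrated on a simplified problem. The sketch below is not Algorithm 1: it omits the group lasso term, solves the $Z$ -step with a direct linear solve instead of conjugate gradient, and assumes the splitting $Z=C$ with the objective $\sum_{ij}W_{ij}|C_{ij}|+\frac{\mu}{2}\|X-XZ\|_{F}^{2}$ . The function names and the default $\beta$ are hypothetical.

```python
import numpy as np

def soft_threshold(V, T):
    """Element-wise shrinkage: sign(v) * max(|v| - t, 0)."""
    return np.sign(V) * np.maximum(np.abs(V) - T, 0.0)

def admm_weighted_lasso(X, W, mu, beta=10.0, n_iter=200):
    """ADMM sketch for the weighted-l1 self-expressive problem (no group term)."""
    N = X.shape[1]
    G = X.T @ X
    system = mu * G + beta * np.eye(N)  # fixed matrix of the Z-step
    C = np.zeros((N, N))
    Delta = np.zeros((N, N))
    for _ in range(n_iter):
        # Z-step: (mu X^T X + beta I) Z = mu X^T X + beta C - Delta
        Z = np.linalg.solve(system, mu * G + beta * C - Delta)
        # C-step: weighted shrinkage, then projection onto diag(C) = 0
        C = soft_threshold(Z + Delta / beta, W / beta)
        np.fill_diagonal(C, 0.0)
        # multiplier step: gradient ascent on the dual variable
        Delta = Delta + beta * (Z - C)
    return C, Z

# Toy usage: two orthogonal 2D subspaces in R^6 with five points each.
rng = np.random.default_rng(0)
X_toy = np.zeros((6, 10))
X_toy[:3, :5] = rng.standard_normal((3, 2)) @ rng.standard_normal((2, 5))
X_toy[3:, 5:] = rng.standard_normal((3, 2)) @ rng.standard_normal((2, 5))
X_toy /= np.linalg.norm(X_toy, axis=0, keepdims=True)
gram = np.abs(X_toy.T @ X_toy)
np.fill_diagonal(gram, 0.0)
mu_toy = 20.0 / gram.max(axis=0).min()
C_hat, Z_hat = admm_weighted_lasso(X_toy, np.ones((10, 10)), mu_toy)
```

With uniform weights this reduces to plain SSC; larger entries of `W` raise the shrinkage threshold for the corresponding connections, which is how the cannot-link penalty enters the $C$ -step.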

5 Experiments

In this section, we demonstrate the effectiveness of LG-SSC in the presence of illumination effects, occlusion and disguise using 3 publicly available databases for face and object clustering: Extended Yale B [13, 21], AR database [31] and Coil-20 [32]. The algorithm and all the experiments are implemented in Matlab and run on a computer with an Intel Core i7-3770 CPU at 3.40 GHz and 16 GB of RAM.

Evaluation metrics: To evaluate the quality of clustering, we use three well-known metrics, namely Accuracy (ACC), Normalized Mutual Information (NMI) [42] and Adjusted Rand Index (ARI) [16]. The accuracy of the clustering algorithm is calculated by the following formula:

[TABLE]

In NMI, the mutual information between segmentation result and ground-truth clusters is calculated and is then scaled between 0 and 1. ARI computes the Rand Index score and corrects it for chance. We multiply the values of NMI and ARI by 100 to have an easier comparison with ACC.
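A common way to compute clustering accuracy is sketched below. It assumes the usual definition (the percentage of correctly labeled samples under the best one-to-one matching between predicted and ground-truth clusters), which may differ in form from the formula above. The brute-force search over permutations is only practical for a handful of clusters, and the function name is hypothetical.

```python
import numpy as np
from itertools import permutations

def clustering_accuracy(true_labels, pred_labels):
    """Accuracy in percent under the best one-to-one relabeling of the clusters."""
    true_labels = np.asarray(true_labels)
    pred_labels = np.asarray(pred_labels)
    true_ids = np.unique(true_labels)
    pred_ids = np.unique(pred_labels)
    if len(pred_ids) > len(true_ids):
        raise ValueError("this sketch assumes no more predicted than true clusters")
    best = 0
    # try every injective assignment of predicted cluster ids to true cluster ids
    for perm in permutations(true_ids, len(pred_ids)):
        mapping = dict(zip(pred_ids, perm))
        mapped = np.array([mapping[label] for label in pred_labels])
        best = max(best, int((mapped == true_labels).sum()))
    return 100.0 * best / len(true_labels)

# Toy usage: the predicted ids are permuted and one sample is misassigned.
acc = clustering_accuracy([0, 0, 1, 1, 2, 2], [1, 1, 0, 0, 2, 0])
```

For larger numbers of clusters, the matching is usually solved with the Hungarian algorithm instead of enumeration.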

Compared algorithms: The performance of proposed LG-SSC algorithm is compared with 6 subspace clustering algorithms: SSC [11], LRR [24], EDSC [17], $S^{3}C$ [22], LRSC [40] and MG-SSC [2] (our preliminary work). The parameters of each algorithm are tuned for the best result and are reported in Table 5.1 for each database.

5.1 Extended Yale B face data set

The Extended Yale B database [13, 21] contains 2,414 frontal-face images of 38 humans. There are about 64 images, each of size $192\times 168$ pixels, per individual. The face images were captured under various lighting conditions. Similar to [11], the images were downsampled to $48\times 42$ pixels. For LG-SSC, we set $p=4$ and $s=2$ . In order to study the effect of the number of clusters on the clustering performance, we implement 2 different experimental settings: 1) We follow the setting utilized in [11], which has been implicitly adopted as the general setting for reporting the performance of subspace clustering algorithms on this database over the past years. In particular, for $n\in\{2,3,5,8,10\}$ clusters, the images of 38 subjects are divided into 4 groups of [1-10], [11-20], [21-30] and [31-38]. For $n\in\{2,3,5,8\}$ clusters, all possible choices of $n$ subjects within each group are considered, and for $n=10$ subjects, only the first three groups are considered. The subspace clustering algorithms are applied on the corresponding subsets of images and the average ACC, NMI and ARI values over these subsets are reported in Table 5.2. The numbers indicated with * are taken from the corresponding papers. 2) For $n\in\{15,20,30,38\}$ , we simply select the images of the first $n$ subjects of the database and apply the subspace clustering algorithms. The values of the three metrics ACC, NMI and ARI for each subspace clustering algorithm are reported in Table 5.3.

We observe that:

•

LG-SSC and MG-SSC significantly outperform other approaches in all cases. Specifically, for $n\geq 15$ , the accuracy of LG-SSC is more than 20% higher than that of SSC, which is the foundation of this approach. The results indicate the efficiency of the hierarchical structure of LG-SSC in dealing with severe illumination effects.

•

MG-SSC performs slightly better than LG-SSC when the number of clusters is low. However, when the number of clusters is increased to 20, LG-SSC outperforms MG-SSC. This confirms that by gradually feeding summarized information of local patches in a hierarchical framework, the robustness can increase, especially in more challenging cases.

•

The performance of LG-SSC is quite stable with respect to the number of clusters.

•

The performance of SSC, LRR, $S^{3}C$ and LRSC decreases significantly as the number of clusters increases.

•

Sparse-based approaches, in particular SSC and $S^{3}C$ , perform generally better compared to low-rank based approaches of LRR and LRSC.

•

EDSC benefits from a specific post-processing step that the other 6 approaches do not use, and this post-processing of the affinity matrix plays a major role in the accuracy of clustering. We noted that without this post-processing step, the quality of the obtained coefficient matrix of EDSC is similar to that of LRR and LRSC. This makes sense as EDSC regularizes the coefficient matrix using the Frobenius norm, which exhibits characteristics similar to those of the nuclear norm in subspace clustering.

5.2 AR face data set

The AR database [31] contains frontal face images of 100 individuals (50 men and 50 women). There are 26 color pictures collected for each person. The images include facial variations such as illumination changes, different expressions and facial disguises using sunglasses and scarves. Compared to Extended Yale B, this database is more challenging because of occlusions and fewer images per individual. We downsampled each image to $48\times 42$ pixels and converted them to gray scale. We set $p=4$ and $s=3$ . The performance of LG-SSC with respect to different numbers of clusters is compared with MG-SSC, SSC, LRR, LRSC, EDSC and $S^{3}C$ in Table 5.4. We observe that:

•

The performance of almost all approaches (except LRR, MG-SSC and LG-SSC) degrades compared to the Extended Yale B database. This result is expected as the AR database is more challenging and has a higher number of clusters.

•

LG-SSC has a better performance compared to other approaches in all cases by a large margin. The robustness of LG-SSC is clearly evident for this database and the patch-based representations are elegantly guiding the global self-expressive representation to a more robust clustering segmentation.

•

LG-SSC consistently performs better than MG-SSC, which further highlights the efficiency of LG-SSC in combining the information of local patches with the calculation of a robust global self-expressive representation.

•

The occlusions are degrading the performance of SSC, EDSC, $S^{3}C$ and LRSC even in the simplest case of 5 clusters. This shows the sensitivity of these approaches to the occlusions and contiguous corruptions.

•

LRSC attempts to recover a clean dictionary by optimizing a nonconvex problem, however this approach assumes that the data is contaminated by sparse error which is clearly violated in this database.

•

The post-processing step of EDSC cannot improve the performance in this case. This is due to the severely corrupted global representation that makes the correction difficult (if not impossible).

•

The third best performance is achieved by LRR. The LRR approach is considered to be the extension of RPCA [6] to the union of subspaces. The dense representations of the nuclear norm appear to be more suitable than sparse representations for data with complex noise structures.

For better visualization and comparison, the coefficient matrices corresponding to each algorithm for the first 5 individuals are plotted in Figure 5.1. The block diagonal structure in LG-SSC's coefficient matrix is clearly evident. The two major components for the success of a subspace clustering algorithm, namely (i) subspace-preserving connections and (ii) strong connectivity within each subspace, are present for LG-SSC. However, the coefficient matrices of other approaches are contaminated due to illumination effects and disguises, which affects the clustering performance as well. Note that MG-SSC does not output a final coefficient matrix, hence the comparison is done with the other 5 approaches.

5.3 Coil-20 data set

The Columbia Object Image Library (COIL-20) [32] contains 1440 gray-scale images of 20 objects in different poses. The images of the objects were taken by placing objects on a turntable against a black background. There are 72 images per object, each of size $128\times 128$ pixels. We downsampled each image to $32\times 32$ pixels. Even though this database is clean, with no noise or occlusions, it is still interesting to observe the performance of LG-SSC in dealing with data from clean subspaces. We divided each image into 9 overlapping patches and set $s=2$ . This is due to the fact that for images of the objects, the patches closer to the center of the image contain more meaningful information compared to other patches. The performance of our proposed LG-SSC is compared with other subspace clustering algorithms in Table 5.5. We can conclude that:

•

Even though MG-SSC fails to increase the accuracy in dealing with clean images of objects, LG-SSC is able to increase the clustering accuracy by more than 10%. This suggests that locally guided self-expressiveness might improve the quality of clustering in the challenging case of close subspaces.

•

EDSC enjoys the benefits of post-processing the coefficient matrix and has the second best performance.

•

Sparse-based approaches (SSC and $S^{3}C$ ) have higher performance compared to low-rank based algorithms (LRR and LRSC). In general, sparse-based approaches have stronger theoretical guarantees compared to low-rank based alternatives and hence, for clean data sets, they are usually expected to perform better.

5.4 Parameter Analysis

In LG-SSC, there are three parameters used in the optimization problem (4.4), which control the trade-off between four qualities: (i) sparsity, (ii) ignoring cannot-links, (iii) respecting the recommended-links and (iv) the self-expressive reconstruction error. Following the methodology in [11], we set the regularization parameter $\mu$ as $\frac{\alpha}{\max_{j\neq i}|x_{i}^{T}x_{j}|}$ where $\alpha>1$ is tuned for each dataset. In this paper, we set this parameter to the values that were commonly used and reported in the subspace clustering literature.
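The data-dependent choice of $\mu$ can be sketched as below. The expression above leaves the treatment of the index $i$ implicit; this sketch assumes the convention of [11], which takes the minimum over $i$ of $\max_{j\neq i}|x_{i}^{T}x_{j}|$ . The function name `ssc_mu` is hypothetical.

```python
import numpy as np

def ssc_mu(X, alpha):
    """mu = alpha / min_i max_{j != i} |x_i^T x_j| for a data matrix with columns x_i."""
    gram = np.abs(X.T @ X)
    np.fill_diagonal(gram, 0.0)  # exclude the j == i terms
    return alpha / gram.max(axis=0).min()

# Toy usage: three unit-norm points in the plane.
r = 1.0 / np.sqrt(2.0)
X_small = np.array([[1.0, 0.0, r],
                    [0.0, 1.0, r]])
mu_small = ssc_mu(X_small, alpha=20.0)
```

For every point the largest absolute inner product with another point is $1/\sqrt{2}$ here, so the result equals $20\sqrt{2}$ .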

The behavior of LG-SSC with respect to $\lambda_{1}$ and $\lambda_{2}$ is empirically validated on all three databases (AR, Extended Yale B and Coil-20). We consider the clustering performance for the first 10 subjects of AR and Yale B and the 20 objects of Coil-20. The clustering accuracy with respect to different values of these parameters is illustrated for each database in Figure 5.2. It can be seen that for values of $\lambda_{1}\in[2,10]$ and $\lambda_{2}\in[0.5,2]$ , the accuracy is quite stable for all three cases. In particular, Yale B is the least sensitive to the values of $\lambda_{1}$ and $\lambda_{2}$ and Coil-20 is the most sensitive database. For the Yale B database, as long as $\lambda_{1}$ and $\lambda_{2}$ are not too small, the accuracy is almost 100%. Interestingly, when setting $\lambda_{1}\in[2,20]$ and $\lambda_{2}=0$ , the accuracy is still 100%. This suggests that for this database, the "cannot-links" information is more important than the "recommended-links" information. However, for Coil-20, the recommended-links information plays an important role in boosting the accuracy of the basic SSC (which is around 78.68%).

We also evaluate the effect of patch sizes ( $p$ ) and the number of levels ( $s$ ) on the clustering accuracy. We consider $s\in\{2,3,4\}$ and $p\in\{2,3,4\}$ . The performance of LG-SSC for different values of $s$ and $p$ for the first 10 subjects of Yale B and AR and 20 objects of Coil-20 are reported in Table 5.6. For the AR database, the clustering accuracy is not affected by the patch-size as long as $s=3$ . Because for $s=2,p=2$ and $s=2,p=3$ , the patches at the coarse level are not robust themselves and they contain occluded parts of image, hence, the robustness is not transferred to the fine scale. This can be confirmed by considering the case where $s=2$ and $p=4$ . In this case, the accuracy is 100% because the patches in the second level are small enough to contain robust discriminant information. By increasing the number of levels and patches to 4 ( $s=4$ and $p=4$ ), the accuracy decreases significantly to 22.31%. In this case, the patches at the last level are very small and hence, neither robust nor discriminant information can be fed into upper levels. For Yale B, which is relatively less challenging compared to AR, the accuracy is almost 100% in all cases except for the case with $s=4$ and $p=4$ where the patches get intuitively very small. For this database, $s=2$ is sufficient to increase robustness to illumination variations. Interestingly, the best accuracy for the Coil-20 database is achieved only for $s=2$ and $p=2$ . Note that in this case, each image is $32\times 32$ pixels and hence we do not consider $s=4$ . For object clustering, the edges play a critical role, hence the patches should be considered such that they contain enough edge information for accurate clustering.

5.5 Neither Global Nor Local

In this section, we discuss the role of the multi-layer graph fusion approach and emphasize that, in most cases, neither the individual local patches nor the global data alone lead to a robust discriminant representation. However, merging the local representations using their low-dimensional embedding on the Grassmann manifold can provide a summary representation which highlights the information that the majority of local representations tend to agree on. To illustrate this, the clustering accuracy of each local patch for the three databases (all samples of each database are considered) is plotted in Figure LABEL:local. For the AR dataset, two cases of $p=4$ and $p=16$ are considered. As can be seen, none of the local patches reaches an accuracy higher than 65%, but applying k-means on the summarized low-dimensional embedding of these 16 patches leads to an accuracy near 85% (first column from right). LG-SSC further boosts this accuracy to near 90%. When $p=4$ , the observation from Table 5.6 is repeated: not only does none of the local patches reach an accuracy higher than 65%, but the merged information of these 4 local patches also does not boost the performance significantly. As mentioned previously, this is because none of these patches has a robust representation. In Yale B, the coefficient matrix corresponding to the 4th patch has the highest clustering accuracy and LG-SSC improves this accuracy without being affected by other patches, e.g. the 1st patch. For the Coil-20 database, not only do the local patches not lead to high clustering accuracy, but the merged low-dimensional embedding also does not increase the clustering accuracy significantly. However, LG-SSC still increases the clustering accuracy. This is due to the fact that the merged information guides the global sparse self-expressive representation and, for this database, the local representations provide sufficient information to avoid misclustering of closely related objects.

6 Conclusion

In this paper, we uncovered the importance of local representations in improving the robustness of self-expressive based subspace clustering approaches. The proposed hierarchical approach bridges the gap between robust local representations and the discriminant global alternative in order to obtain a robust discriminant self-expressive representation for the input data. This approach consists of two key ingredients: 1) Efficiently summarizing local representations using a low-rank embedding on a Grassmann manifold to obtain the cannot-links and recommended-links on which the local patches agree. 2) Incorporating this summarized information into the optimization problem for calculating the self-expressive representation in each level using weighted group lasso regularization. The robustness of the proposed approach to occlusion and complex noise was confirmed by experimental results.

Bibliography


[1] Abdolali, M., Rahmati, M.: Multiscale decomposition in low-rank approximation. IEEE Signal Processing Letters 24(7), 1015–1019 (2017)
[2] Abdolali, M., Rahmati, M.: From local to global subspace clustering for image data. In: ICASSP 2019-2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 3787–3791. IEEE (2019)
[3] Abdolali, M., Rahmati, M.: Robust subspace clustering for image data using clean dictionary estimation and group lasso based matrix completion. Journal of Visual Communication and Image Representation (2019)
[4] Basri, R., Jacobs, D.W.: Lambertian reflectance and linear subspaces. IEEE Transactions on Pattern Analysis & Machine Intelligence (2), 218–233 (2003)
[5] Boyd, S., Parikh, N., Chu, E., Peleato, B., Eckstein, J., et al.: Distributed optimization and statistical learning via the alternating direction method of multipliers. Foundations and Trends in Machine Learning 3(1), 1–122 (2011)
[6] Candès, E.J., Li, X., Ma, Y., Wright, J.: Robust principal component analysis? Journal of the ACM (JACM) 58(3), 11 (2011)
[7] Carreira-Perpinán, M.A.: A review of dimension reduction techniques. Department of Computer Science. University of Sheffield. Tech. Rep. CS-96-09 9, 1–69 (1997)
[8] Cayton, L.: Algorithms for manifold learning. Univ. of California at San Diego Tech. Rep 12(1-17), 1 (2005)
