Image Stitching by Line-guided Local Warping with Global Similarity Constraint

Tian-Zhu Xiang; Gui-Song Xia; Xiang Bai; Liangpei Zhang

arXiv:1702.07935·cs.CV·July 17, 2018

TL;DR

This paper introduces a line-guided local warping technique with a global similarity constraint to improve low-textured image stitching, achieving better alignment and reduced distortions compared to existing methods.

Contribution

It proposes a novel line-guided local warping approach combined with a global similarity constraint for more accurate and less distorted image stitching.

Findings

1. Outperforms state-of-the-art image stitching methods.
2. Effectively reduces projective distortions in stitched images.
3. Provides accurate alignment in low-textured regions.

Abstract

Low-textured image stitching remains a challenging problem. It is difficult to achieve good alignment and it is easy to break image structures due to insufficient and unreliable point correspondences. Moreover, because of the viewpoint variations between multiple images, the stitched images suffer from projective distortions. To solve these problems, this paper presents a line-guided local warping method with a global similarity constraint for image stitching. Line features, which serve well for geometric descriptions and scene constraints, are employed to guide image stitching accurately. On the one hand, the line features are integrated into a local warping model through a designed weight function. On the other hand, line features are adopted to impose strong geometric constraints, including line correspondence and line collinearity, to improve the stitching performance through mesh…

Tables

Table 1: Comparison of constraints on School

Methods	APAP	LAPAP	LAPAP+CPW	Proposed
Cor	1.024	0.730	0.666	0.664
$Err_{mg}$	2.766	2.099	1.785	1.708

Table 2: Comparison with original CPW model on Rooftops

Methods	Cor	$Err_{mg}^{(p)}$	$Err_{mg}^{(l)}$	$Err_{mg}$
Original CPW	6.831	0.825	1.187	1.043
Improved CPW	6.390	0.973	0.491	0.682
Proposed	4.903	0.967	0.492	0.681

Table 3: Comparison of alignment on Desk

Methods	Cor	$Err_{mg}^{(p)}$	$Err_{mg}^{(l)}$	$Err_{mg}$
Baseline	0.390	4.894	5.632	5.001
CPW	0.299	1.534	3.703	1.849
APAP	0.360	2.652	4.407	2.907
Proposed	0.169	1.562	0.594	1.422

Table 4: Quantitative evaluation on Ceiling

Methods	Cor	$Err_{mg}^{(p)}$	$Err_{mg}^{(l)}$	$Err_{mg}$
Baseline	0.755	3.200	2.059	2.452
SPHP	0.631	2.876	1.989	2.292
Proposed	0.200	1.343	0.695	0.921

Table 5: Quantitative evaluation on Temple

Methods	Cor	$Err_{mg}^{(p)}$	$Err_{mg}^{(l)}$	$Err_{mg}$
Baseline	6.240	1.899	0.954	1.430
SPHP	4.334	1.756	0.919	1.341
Proposed	1.515	0.592	0.529	0.561

Table 6: Quantitative evaluation of local-based methods

Dataset	Church				Block				Wall
Methods	Cor	$Err_{mg}^{(p)}$	$Err_{mg}^{(l)}$	$Err_{mg}$	Cor	$Err_{mg}^{(p)}$	$Err_{mg}^{(l)}$	$Err_{mg}$	Cor	$Err_{mg}^{(p)}$	$Err_{mg}^{(l)}$	$Err_{mg}$
CPW	4.950	0.599	0.876	0.686	2.561	1.600	1.582	1.592	0.308	2.348	2.100	2.312
APAP	6.485	1.319	1.261	1.300	3.013	2.719	1.710	2.263	0.252	3.490	2.178	3.302
SPHP+APAP	4.281	1.310	1.280	1.301	2.849	2.668	1.651	2.208	0.198	3.498	2.249	3.318
Proposed	3.090	0.630	0.515	0.594	1.880	1.550	0.627	1.133	0.081	2.248	0.478	1.993

Equations

\hat{\mathbf{h}}=\arg\min_{\mathbf{h}}\left(\sum\nolimits_{i}\left\|\mathbf{p}_{i}^{\prime}\times\mathbf{H}\mathbf{p}_{i}\right\|^{2}+\sum\nolimits_{j}\left\|{\mathbf{l}_{j}^{\prime}}^{T}\times\mathbf{H}\mathbf{p}_{j}^{0,1}\right\|^{2}\right)=\arg\min_{\mathbf{h}}\left(\sum\nolimits_{i}\left\|\mathbf{A}_{i}\mathbf{h}\right\|^{2}+\sum\nolimits_{j}\left\|\mathbf{B}_{j}\mathbf{h}\right\|^{2}\right),\quad\text{s.t.}\ \left\|\mathbf{h}\right\|=1,

\hat{\mathbf{h}}=\arg\min_{\mathbf{h}}\left\|\mathbf{C}\mathbf{h}\right\|^{2},\quad\text{s.t.}\ \left\|\mathbf{h}\right\|=1,

\mathbf{h}_{k}=\arg\min_{\mathbf{h}}\left\|\mathbf{W}_{k}\mathbf{C}\mathbf{h}\right\|^{2},\quad\text{s.t.}\ \left\|\mathbf{h}\right\|=1,

w^{p_{i}}=\max\left(\exp\left(-\left\|\mathbf{p}_{*}-\mathbf{p}_{i}\right\|^{2}/\sigma^{2}\right),\eta\right),

w^{l_{j}}=\max\left(\exp\left(-d_{l}(\mathbf{p}_{*},\mathbf{l}_{j})^{2}/\sigma^{2}\right),\eta\right),

d_{l}(\mathbf{p}_{*},\mathbf{l}_{j})=\begin{cases}\min\left(\left\|\mathbf{p}_{*}-\mathbf{p}_{j}^{0}\right\|,\left\|\mathbf{p}_{*}-\mathbf{p}_{j}^{1}\right\|\right)&\text{(a)}\\ \left|a_{j}x_{*}+b_{j}y_{*}+c_{j}\right|/\sqrt{a_{j}^{2}+b_{j}^{2}}&\text{(b)}\end{cases},

E_{p}=\sum\nolimits_{i}\left\|\mathbf{w}_{i}^{T}\mathbf{V}_{i}-\mathbf{p}_{i}^{\prime}\right\|^{2},

E_{g}=\sum\nolimits_{i}\left\|\mathbf{V}_{i}-\overline{\mathbf{V}}_{i}\right\|^{2},

\overline{\mathbf{V}}_{1}=\overline{\mathbf{V}}_{2}+\mu(\overline{\mathbf{V}}_{3}-\overline{\mathbf{V}}_{2})+\nu\mathbf{R}(\overline{\mathbf{V}}_{3}-\overline{\mathbf{V}}_{2}),\quad\mathbf{R}=\begin{bmatrix}0&1\\ -1&0\end{bmatrix},

E_{s}(\mathbf{V}_{1})=\varphi\left\|\mathbf{V}_{1}-\left(\mathbf{V}_{2}+\mu(\mathbf{V}_{3}-\mathbf{V}_{2})+\nu\mathbf{R}(\mathbf{V}_{3}-\mathbf{V}_{2})\right)\right\|^{2},

E_{l}=\sum\nolimits_{j,k}\left\|\left({\mathbf{l}_{j}^{\prime}}^{T}\cdot\mathbf{w}_{j,k}^{T}\mathbf{V}_{j,k}\right)/\sqrt{{a_{j}^{\prime}}^{2}+{b_{j}^{\prime}}^{2}}\right\|^{2}.

E_{c}=\sum\nolimits_{i,k}\left\|\left(\hat{\mathbf{l}}_{i}^{T}\cdot\mathbf{w}_{i,k}^{T}\mathbf{V}_{i,k}\right)/\sqrt{\hat{a}_{i}^{2}+\hat{b}_{i}^{2}}\right\|^{2}.

E=\alpha E_{p}+\beta E_{g}+\gamma E_{s}+\delta E_{l}+\rho E_{c},

\mathbf{H}_{i}^{\prime}=\tau\mathbf{H}_{i}+\xi\mathbf{S},

\mathbf{T}_{i}^{\prime}=\mathbf{H}_{i}^{\prime}\mathbf{H}_{i}^{-1},

\begin{bmatrix}q_{1}&q_{2}&q_{3}\\ q_{4}&q_{5}&q_{6}\\ -c&0&1\end{bmatrix}=\underbrace{\begin{bmatrix}q_{1}+cq_{3}&q_{2}&q_{3}\\ q_{4}+cq_{6}&q_{5}&q_{6}\\ 0&0&1\end{bmatrix}}_{\mathbf{Q}_{a}}\underbrace{\begin{bmatrix}1&0&0\\ 0&1&0\\ -c&0&1\end{bmatrix}}_{\mathbf{Q}_{p}},

\det\mathbf{J}(u,v)=\det\mathbf{J}_{a}(u,v)\cdot\det\mathbf{J}_{p}(u,v)=\lambda_{a}\cdot\frac{1}{(1-cu)^{3}},

\xi=\left\langle\overrightarrow{p_{min}p_{i}}\cdot\overrightarrow{p_{min}p_{max}}\right\rangle/\left|\overrightarrow{p_{min}p_{max}}\right|,

Cor(I,I^{\prime})=\frac{1}{N}\sum\nolimits_{\pi}\left(1-NCC(\mathbf{p},\mathbf{p}^{\prime})\right)^{2},

\begin{array}{c}Err_{mg}^{(p)}=\frac{1}{M}\sum\nolimits_{i=1}^{M}\left\|f(\mathbf{p}_{i})-\mathbf{p}_{i}^{\prime}\right\|\\ Err_{mg}^{(l)}=\frac{1}{2K}\sum\nolimits_{j=1}^{K}\sum\nolimits_{i=0}^{1}d_{l}(f(\mathbf{p}_{l_{j}}^{i}),\mathbf{l}_{j}^{\prime})\\ Err_{mg}=(Err_{mg}^{(p)}*M+Err_{mg}^{(l)}*2K)/(M+2K)\end{array},

Full text

Image Stitching by Line-guided Local Warping with Global Similarity Constraint

Tianzhu Xiang¹, Gui-Song Xia¹, Xiang Bai², Liangpei Zhang¹

¹ State Key Lab. LIESMARS, Wuhan University, Wuhan, China.

² Electronic Information School, Huazhong University of Science and Technology, China.

Abstract

Low-textured image stitching remains a challenging problem. It is difficult to achieve good alignment and it is easy to break image structures due to insufficient and unreliable point correspondences. Moreover, because of the viewpoint variations between multiple images, the stitched images suffer from projective distortions. To solve these problems, this paper presents a line-guided local warping method with a global similarity constraint for image stitching. Line features, which serve well for geometric descriptions and scene constraints, are employed to guide image stitching accurately. On the one hand, the line features are integrated into a local warping model through a designed weight function. On the other hand, line features are adopted to impose strong geometric constraints, including line correspondence and line collinearity, to improve the stitching performance through mesh optimization. To mitigate projective distortions, we adopt a global similarity constraint, which is integrated with the projective warps via a designed weight strategy. This constraint causes the final warp to change gradually from a projective to a similarity transformation across the image. Finally, the images undergo a two-stage alignment scheme that provides accurate alignment and reduces projective distortion. We evaluate our method on a series of images and compare it with several other methods. The experimental results demonstrate that the proposed method delivers convincing stitching performance and that it outperforms other state-of-the-art methods.

1 Introduction

Because images are limited by a camera’s narrow field of view (FOV), image stitching combines a group of images with overlapping regions to generate a single, but larger, mosaic with a wider FOV. Image stitching has been widely used in many tasks in photogrammetry [1], remote sensing [2] and computer vision [3, 4].

Two main approaches are typically taken to produce visually satisfactory stitching results [5]: (1) developing better alignment models and (2) employing image composition algorithms, such as seam cutting [6] and blending [7]. Image alignment is the first and most crucial step in image stitching. Although advanced image composition methods can reduce stitching artifacts and improve the stitching performance, they cannot address obvious misalignments. When a seam or blending area coincides with misaligned areas, the current image composition schemes will fail to provide a satisfactory stitched image [8].

Most previous image stitching methods estimate global geometric transformations (e.g., similarity, affine or projective transformation) to bring the overlapping images into alignment. However, these methods require the camera rotation to have a fixed projective center or the scenes to have limited depth variance [9], which are restrictive assumptions that are often violated in practice, resulting in artifacts in the stitched images, e.g., misalignments or ghosting.

To compensate for these geometric assumptions, some spatially-varying warping methods for image stitching have been proposed in recent years that can be roughly categorized into two groups: multiple homographies and mesh-based warping. The former estimates multiple homographies that are compatible with local geometries to align the input images, e.g., as-projective-as-possible (APAP) warping [5]. Mesh-based warping first pre-warps the image using global homography; then, it adopts some energy functions to optimize the alignment, treating it as a mesh warping problem, e.g., content-preserving warping (CPW) [10]. The high degrees of freedom (DoFs) involved in these methods can better handle parallax than can global transformations; thus, they can provide satisfactory stitching results. However, some challenges remain to be addressed:

1. The current methods often fail to achieve satisfactory alignment in low-texture images. Due to the high DoFs, these methods inevitably depend heavily on point correspondences [11]. However, keypoints are difficult to detect in some low-texture images because the homogeneous regions, such as indoor walls, sky and artificial structures, are not distinctive enough to provide rich and reliable correspondences. Hence, these methods often erroneously estimate the warping model, which causes misalignments.

2. The influence of projective distortions has not been fully considered. Because many methods are based on projective transformations, e.g., CPW [10], APAP [5], the stitched results of images taken under various photographing viewpoints may suffer from projective distortions [12] in the non-overlapping regions, including both shape and perspective distortions. For instance, some regions in the stitched image may be stretched or non-uniformly enlarged, and it is difficult to preserve the perspective of each image (Fig. 1(b), Fig. 7(a)).

3. The image structure distortion has not been fully considered. Some local warping models, e.g., CPW [10], APAP [5], may bend line structures, especially when stitching low-texture images. For instance, insufficient or unreliable keypoints cause APAP to erroneously estimate some local transformations, which results in misalignment of the local regions and distorts the line structures that span multiple local regions, while CPW employs only feature correspondences and content smoothness to optimize the global transformation and does not consider structural constraints.

The challenges to image stitching can be clearly seen in Fig. 1. Fig. 1 (a) shows the original images and the detected features (points and lines). In some homogeneous regions, only a few points are detected and matched, making it difficult to estimate an accurate transformation. Fig. 1 (b) shows the stitching results from global homography [13], CPW [10], APAP [5] and the proposed method. When the restrictive imaging conditions are violated, the global homography model does not fit the data correctly; thus, it results in obvious misalignments (the red boxes). In low-textured areas with insufficient correspondence (red boxes), CPW lacks sufficient data to align the pre-warping result, and APAP cannot estimate accurate local homographies, causing obvious misalignments. The lack of point correspondences also leads to structural deformations in CPW and APAP (blue boxes), where straight lines are deformed into curves. Due to the projective transformation used in these three models and the fact that no measures are taken to eliminate distortions, the stitched image results of these methods suffer from severe projective distortions (the yellow boxes), where the chairs are enlarged non-uniformly.

The above problems provide strong motivation for improving the performance of image stitching. To our knowledge, only a few studies have been conducted to address any of the aforementioned problems; consequently, additional efforts are needed. Recent studies [14, 15] have reported that line features can be used to improve the alignment performance, and [12] and [16] recently showed that similarity transformations are advantageous in reducing distortions. Inspired by these studies, our work is based on the following two assumptions:

1. In most man-made environments, line features are relatively abundant; thus, they can be regarded as effective supplements that can provide rich correspondences for accurate warping model estimation [17]. Furthermore, line features depict the geometrical and structural information of scenes [18, 19, 20]; thus, they can also be used to preserve the image structures.

2. Similarity transformation [12] does not introduce shape distortion because it consists only of translation, rotation and uniform scaling. A similarity transformation can be regarded as a combination of panning, zooming and in-plane camera rotation; therefore, it preserves the viewing direction.

It is thus of great interest to investigate how to integrate line features and global similarity transformation to improve the image stitching performance. To this end, this paper presents a line-guided local warping model for image stitching with a global similarity constraint. More precisely, this method adopts a two-stage scheme to achieve good alignment. First, pre-warping is jointly estimated using both point and line features. Then, extended mesh-based warping is used to further align the pre-warping result. Line features are integrated into the mesh-based warping framework and act as structural constraints to preserve image structures. Finally, to prevent undesirable distortions, the global similarity transformation is adopted as a similarity constraint and used to adjust the estimated warping model. The contributions of our work are as follows:

- We introduce line features to guide image stitching, especially in low-texture cases. Line features play a significant role mainly in two aspects: 1) they are integrated into the local warping model using a weight function to achieve accurate alignment; 2) they are employed to impose strong geometric constraints (i.e., line correspondence and line collinearity) to refine the stitching performance.

- We present a weight integration strategy to combine the global similarity constraint with models of global homography or multiple homographies. Using this strategy, the resultant warp achieves a smooth transition from a projective to a similarity transformation across the image, which significantly mitigates the projective distortions in non-overlapping regions.

- We propose a robust and effective two-stage stitching framework that combines the local multiple homographies model and the mesh-based warping model with line and global similarity constraints. The proposed method addresses local variation well to ensure image alignment by local stitching and flexible refinement. The method also preserves image structures and multiple perspectives through strong geometrical and structural constraints. The proposed method achieves state-of-the-art performance.

The remainder of this paper is organized as follows. Section 2 gives a brief review of the related works. Section 3 describes the proposed method in detail. The experimental results and analyses are reported in Section 4. Finally, we draw some conclusions and provide remarks in Section 5.

2 Related works

Numerous studies have been devoted to image stitching; a comprehensive survey can be found in [9]. The global homography model [13] works well for planar scenes or for scenes acquired with parallax-free camera motion, but violation of these assumptions may lead to ghosting artifacts.

Recently, spatially-varying warping methods have been proposed that flexibly address parallax. Liu et al. [10] proposed the content-preserving warping (CPW) method, which was first used in video stabilization. CPW adopts registration error and content smoothness to refine the pre-warping result obtained by global homography. A simple extension of the global homography method, called dual-homography warping (DHW), was presented in [21]; it divides the entire scene into two planes: a distant plane and a ground plane. The final warping is obtained by a linear combination of the two homographies estimated from the point correspondences of each plane. However, this method struggles with complex scenes. Lin et al. [22] proposed the smoothly varying affine (SVA) warping method for image stitching. SVA can handle local deformations while preserving global affinity. However, because there are insufficient DoFs in the affine model, SVA cannot achieve projective warping. Zaragoza et al. [5] extended the previous method and proposed an as-projective-as-possible (APAP) warping method for image stitching. APAP achieves a smoothly varying projective stitching field estimated by a moving direct linear transformation (DLT) [23]. It maintains a global projection while allowing local non-projective deviations. Zhang et al. [3] proposed a parallax-tolerant image stitching method that seeks the optimal homography evaluated by the seam cost and uses CPW to refine the alignment. However, except for SVA, these methods are based on projective transformations; thus, the stitched images often suffer from projective distortions. In addition, the resulting images may suffer from structural deformations because of the nonlinear local transformations in the model.

In recent years, similarity transformation, which is composed of translation, rotation and scaling, has been introduced into image stitching. It is combined with projective transformations into a single warp to constrain the projective distortions. Chang et al. [12] proposed a shape-preserving half-projective (SPHP) warping for image stitching that adopts projective, transition and similarity transformations to achieve a gradual change from a projective to a similarity transformation across the image. SPHP can significantly reduce the distortions and preserve the image shape; however, it may introduce structural deformations, e.g., line distortions, when the scene is dominated by line structures. Lin et al. [16] proposed an adaptive as-natural-as-possible (AANAP) warping that linearizes the homography in the non-overlapping regions and combines these homographies with a global similarity transformation using a direct and simple distance-based weight strategy to mitigate perspective distortions. However, some distortions still exist locally when stitching images (Fig. 13(b)).

It is worth noting that spatially-varying warping-based image stitching is highly dependent on point correspondences. When there are insufficient reliable keypoints (such as in low-texture images), the effects of the estimated models will degrade. More recently, Joo et al. [14] introduced line correspondences into the local warping model, but this approach requires a user to annotate the straight lines, and setting the parameters for this method is complex. Li et al. [15] proposed a dual-feature warping method for motion model estimation that combines line segments and points to estimate the global homography. However, this method still suffers from projective distortions.

3 The proposed approach

This section introduces the proposed method for image stitching in detail. The main idea is to integrate line constraints and a global similarity constraint into a two-stage alignment framework. The outline of our method is illustrated in Fig. 2. The first-stage alignment (presented in Section 3.1) involves estimating an accurate warping model using line guidance. Linear features are adopted as alignment constraints to jointly estimate both global and local homography with point correspondences, which provide rich and reliable correspondences even in low-texture images. To further improve the stitching performance, we adopt mesh optimization based on the extended content-preserving warping framework presented in Section 3.2. Then, the linear feature constraints (i.e., line correspondence and line collinearity) are combined to further refine the alignment and preserve the image structures. Finally, to mitigate the projective distortions, a global similarity transformation, estimated by a set of selected points in the approximate image projection plane, is employed to constrain the distortions caused by projective warping via a weighted integration strategy (Section 3.3). Based on the proposed warping model, we are able to achieve accurate and distortion-free image stitching.

3.1 Line-guided warping model

Point features are often adopted for image alignment. Given the target and reference images $I,I^{{}^{\prime}}$ , $\mathbb{R}\times\mathbb{R}\mapsto\mathbb{R}$ , and a pair of matching points: $\mathbf{p}=[x,y,1]$ and $\mathbf{p^{{}^{\prime}}}=[x^{{}^{\prime}},y^{{}^{\prime}},1]$ where $x,y\in\mathbb{R}$ , the global homography, $\mathbf{H}\in\mathbb{R}^{3\times 3}$ : $\mathbf{p^{{}^{\prime}}}=\mathbf{Hp}$ , can be estimated by minimizing the algebraic distance $\sum\nolimits_{i}{{\left\|{\mathbf{p}_{i}^{{}^{\prime}}\times\mathbf{H}{\mathbf{p}_{i}}}\right\|}^{2}}$ between a set of matching points, where $i$ is the index of matching points.

However, as stated previously, keypoints extracted from images are rare in some low-texture scenarios, thus it is difficult to estimate an accurate global homography for image stitching. Hence, line features, which are salient in artificial scenarios, are adopted as the alignment constraint to guide the global homography estimation.

Let $\mathbf{l}=[{a,b,c}]^{T}$ , $\mathbf{l^{{}^{\prime}}}=[{a^{{}^{\prime}},b^{{}^{\prime}},c^{{}^{\prime}}}]^{T}$ , with $a,b,c\in\mathbb{R}$ be a pair of matching lines in the target and reference images respectively. Here, $\mathbf{p}^{0,1}=[x^{0,1},y^{0,1},1]$ denotes the two endpoints of line $\mathbf{l}$ . They satisfy ${{\mathbf{l^{{}^{\prime}}}}^{T}}\mathbf{H}{\mathbf{p}^{0,1}}=0$ , which means that the endpoints transformed by $\mathbf{H}$ from $\mathbf{l}$ should lie on the corresponding line $\mathbf{l}^{{}^{\prime}}$ . Therefore, $\mathbf{H}$ can be estimated by minimizing the algebraic distance $\sum\nolimits_{j}{{{\left\|{{\mathbf{l}_{j}^{{}^{\prime}}}^{T}\times\mathbf{H}\mathbf{p}_{j}^{0,1}}\right\|}^{2}}}$ using a set of matching lines, where $j$ is the index of the matching lines.

The homography is then estimated jointly by point and line correspondences:

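$$\hat{\mathbf{h}}=\arg\min_{\mathbf{h}}\left(\sum\nolimits_{i}\left\|\mathbf{p}_{i}^{\prime}\times\mathbf{H}\mathbf{p}_{i}\right\|^{2}+\sum\nolimits_{j}\left\|{\mathbf{l}_{j}^{\prime}}^{T}\times\mathbf{H}\mathbf{p}_{j}^{0,1}\right\|^{2}\right)=\arg\min_{\mathbf{h}}\left(\sum\nolimits_{i}\left\|\mathbf{A}_{i}\mathbf{h}\right\|^{2}+\sum\nolimits_{j}\left\|\mathbf{B}_{j}\mathbf{h}\right\|^{2}\right),\quad\text{s.t.}\ \left\|\mathbf{h}\right\|=1,\tag{1}$$

where $\mathbf{h}=[{h_{1}},{h_{2}},{h_{3}},{h_{4}},{h_{5}},{h_{6}},{h_{7}},{h_{8}},{h_{9}}]^{T}$ is the column vector representation of $\mathbf{H}$ , and $\mathbf{A}_{i}$ , $\mathbf{B}_{j}\in\mathbb{R}^{2\times 9}$ are the coefficient matrices computed from the $i$-th matching point and the $j$-th matching line, respectively.

Stacking all the coefficient matrices of points ( $\mathbf{A}_{i}$ ) and lines ( $\mathbf{B}_{j}$ ) vertically into a unified matrix $\mathbf{C}=[\mathbf{A};\mathbf{B}]$ , Eq. (1) can be rewritten as follows:

$$\hat{\mathbf{h}}=\arg\min_{\mathbf{h}}\left\|\mathbf{C}\mathbf{h}\right\|^{2},\quad\text{s.t.}\ \left\|\mathbf{h}\right\|=1.\tag{2}$$

The global homography $\mathbf{H}$ is given by the least significant right singular vector of $\mathbf{C}$ , i.e., the right singular vector associated with the smallest singular value. Note that before estimation, all the entries of the stacked matrices $\left[{{A_{i}};{B_{j}}}\right]$ should be normalized for numerical stability. In this study, we adopt the point-centric normalization approach proposed in [24].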
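The joint estimation in Eqs. (1) and (2) can be sketched in a few lines of NumPy. This is a minimal illustration rather than the authors' implementation: the function names (`point_rows`, `line_rows`, `estimate_homography`) are hypothetical, $\mathbf{h}$ is assumed to be the row-major flattening of $\mathbf{H}$, and the normalization step of [24] is omitted for brevity.

```python
import numpy as np

def point_rows(p, p_ref):
    """Two DLT rows (A_i) from p' x (H p) = 0 for one point match.

    Assumes h is the row-major flattening of H (an illustrative choice).
    """
    x, y = p
    xr, yr = p_ref
    return np.array([
        [0, 0, 0, -x, -y, -1, yr * x, yr * y, yr],
        [x, y, 1, 0, 0, 0, -xr * x, -xr * y, -xr],
    ], dtype=float)

def line_rows(endpoints, line_ref):
    """Two rows (B_j) from l'^T H p^k = 0, one per endpoint of the target line."""
    a, b, c = line_ref
    return np.array(
        [[a * x, a * y, a, b * x, b * y, b, c * x, c * y, c] for x, y in endpoints],
        dtype=float,
    )

def estimate_homography(points, points_ref, line_endpoints, lines_ref):
    """Stack point and line rows into C and take the least significant
    right singular vector. Normalization is deliberately left out here."""
    blocks = [point_rows(p, q) for p, q in zip(points, points_ref)]
    blocks += [line_rows(e, l) for e, l in zip(line_endpoints, lines_ref)]
    C = np.vstack(blocks)
    _, _, vt = np.linalg.svd(C)
    H = vt[-1].reshape(3, 3)
    # Fix the scale for readability; assumes H[2, 2] is not (near) zero.
    return H / H[2, 2]
```

Each point match and each line match contributes two rows, so sparse keypoints in a low-texture region can be supplemented by line matches without changing the solver.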

Local homography can handle parallax better than global homography due to the higher DoFs [5]. Therefore, we extend the line-guided global homography to local homographies. The input images are first divided into uniform grid meshes. The local homography $\mathbf{h}_{k}$ of the $k$-th mesh cell, centered at $\mathbf{p}_{*}=[x_{*},y_{*}]$ , is estimated by

$$\mathbf{h}_{k}=\arg\min_{\mathbf{h}}\left\|\mathbf{W}_{k}\mathbf{C}\mathbf{h}\right\|^{2},\quad\text{s.t.}\ \left\|\mathbf{h}\right\|=1,\tag{3}$$

where $\mathbf{W}_{k}=diag\left(\left[{\mathbf{w}^{p}},{\mathbf{w}^{l}}\right]\right)$ , and $\mathbf{w}^{p}\in\mathbb{R}^{2N}$ and $\mathbf{w}^{l}\in\mathbb{R}^{2M}$ denote the weight factors for the point and line correspondences, respectively. Specifically, $\mathbf{w}^{p}=[w^{p_{1}}w^{p_{1}}...w^{p_{N}}w^{p_{N}}]$ , and $\mathbf{w}^{l}=[w^{l_{1}}w^{l_{1}}...w^{l_{M}}w^{l_{M}}]$ . Therefore, the solution is the least significant right singular vector of $\mathbf{W}_{k}\mathbf{C}$ .

The point weight factor $\mathbf{w}^{p}$ is calculated by the Gaussian weighted Euclidean distance:

$$w^{p_{i}}=\max\left(\exp\left(-\left\|\mathbf{p}_{*}-\mathbf{p}_{i}\right\|^{2}/\sigma^{2}\right),\eta\right),\tag{4}$$

where $\mathbf{p}_{i}$ is the $i$-th keypoint, $\sigma$ is the scale parameter, and $\eta\in[0,1]$ is a lower bound that avoids the numerical issues caused by vanishing weights when the mesh center $\mathbf{p}_{*}$ is far away from keypoint $\mathbf{p}_{i}$ , as shown in Fig. 3(a).

The line weight factor $\mathbf{w}^{l}$ is calculated as follows:

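$$w^{l_{j}}=\max\left(\exp\left(-d_{l}(\mathbf{p}_{*},\mathbf{l}_{j})^{2}/\sigma^{2}\right),\eta\right),\tag{5}$$

where ${d_{l}}{(\mathbf{p}_{*},\mathbf{l}_{j})}$ is the shortest distance between the mesh center $\mathbf{p}_{*}$ and line $\mathbf{l}_{j}$ , calculated as follows:

$$d_{l}(\mathbf{p}_{*},\mathbf{l}_{j})=\begin{cases}\min\left(\left\|\mathbf{p}_{*}-\mathbf{p}_{j}^{0}\right\|,\left\|\mathbf{p}_{*}-\mathbf{p}_{j}^{1}\right\|\right)&\text{(a)}\\ \left|a_{j}x_{*}+b_{j}y_{*}+c_{j}\right|/\sqrt{a_{j}^{2}+b_{j}^{2}}&\text{(b)}\end{cases}\tag{6}$$

where $\mathbf{p}_{j}^{0}$ , $\mathbf{p}_{j}^{1}$ are the endpoints of line $\mathbf{l}_{j}:\mathbf{l}_{j}=[a_{j},b_{j},c_{j}]$ . As shown in Fig. 3(b), when $\mathbf{p}_{*}$ is in the $R_{1}$ or $R_{2}$ region, $d_{l}$ is calculated by (a), and when $\mathbf{p}_{*}$ is in the $R_{3}$ region, $d_{l}$ is calculated by (b). From Eqs. (4) and (5), the weight is greater when the keypoint or line is closer to the mesh center $\mathbf{p}_{*}$ , which causes the local homography to be a better fit for the local structure around $\mathbf{p}_{*}$ .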
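The weighting scheme and the weighted solve for one mesh center can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' code: the function names are hypothetical, regions $R_{1}$ and $R_{2}$ are assumed to be the areas where the perpendicular foot of $\mathbf{p}_{*}$ falls beyond an endpoint of the segment, the segment is assumed to have non-zero length, and any values passed for `sigma` and `eta` are arbitrary.

```python
import numpy as np

def point_weight(center, p, sigma, eta):
    """Gaussian weight of a keypoint, clamped from below by eta."""
    d2 = np.sum((np.asarray(center, float) - np.asarray(p, float)) ** 2)
    return max(np.exp(-d2 / sigma**2), eta)

def segment_distance(center, p0, p1):
    """Shortest distance from the mesh center to a line segment.

    Assumption: case (a) applies when the perpendicular foot lies outside
    the segment, case (b) when it lies between the two endpoints.
    """
    c, p0, p1 = (np.asarray(v, float) for v in (center, p0, p1))
    d = p1 - p0
    t = np.dot(c - p0, d) / np.dot(d, d)  # position of the foot along the segment
    if t < 0.0 or t > 1.0:
        # Case (a): the nearer endpoint determines the distance.
        return min(np.linalg.norm(c - p0), np.linalg.norm(c - p1))
    # Case (b): |a x + b y + c| / sqrt(a^2 + b^2) with (a, b) normal to the segment.
    a, b = d[1], -d[0]
    c0 = -(a * p0[0] + b * p0[1])
    return abs(a * c[0] + b * c[1] + c0) / np.hypot(a, b)

def line_weight(center, p0, p1, sigma, eta):
    """Gaussian weight of a line segment, clamped from below by eta."""
    return max(np.exp(-segment_distance(center, p0, p1) ** 2 / sigma**2), eta)

def local_homography(C, w_points, w_lines):
    """Solve min ||W_k C h|| s.t. ||h|| = 1 for one mesh center.

    C has two rows per correspondence (points first, then lines),
    so every weight is repeated twice along the diagonal of W_k.
    """
    w = np.repeat(np.concatenate([w_points, w_lines]), 2)
    _, _, vt = np.linalg.svd(w[:, None] * C)
    return vt[-1].reshape(3, 3)
```

Because of the lower bound `eta`, distant correspondences never vanish completely, so every cell still has a full set of constraints even when no feature lies nearby.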

3.2 Alignment refinement with line constraints

This section describes the adoption of mesh optimization as the second step of the two-stage alignment scheme to further improve the performance of image stitching. Content-preserving warping is a mesh-based warping method that was first used for video stabilization in [10] and, later, successfully applied to image stitching [25, 26, 27]. It is well-suited for small local adjustments. In our work, the line feature constraints (i.e., the line correspondence constraint and the line collinearity constraint) are integrated into the content-preserving warping framework to both maintain the image structures and refine the alignment.

The target image $I$ is first divided into a regular grid mesh. In our case, the grid mesh is used to guide the image warping. Let $\overline{\mathbf{V}}$ denote the vertices of the grid mesh in the pre-warping image transformed by the line-guided warping model. Alignment refinement is performed to find a group of deformed vertices $\mathbf{V}$ using energy optimization. An arbitrary point $\mathbf{p}$ in the pre-warping image can be represented by a linear combination of the four mesh vertices ${\mathbf{V}}={\left[{\mathbf{V}_{1},\mathbf{V}_{2},\mathbf{V}_{3},\mathbf{V}_{4}}\right]^{T}}$ of the quad that contains it: ${\mathbf{p}}=\mathbf{w}^{T}{\mathbf{V}}$ , where the weights $\mathbf{w}={\left[{w_{1},w_{2},w_{3},w_{4}}\right]^{T}}$ are calculated by inverse bilinear interpolation [28] and sum to 1. Therefore, the image warping problem can be formulated as a mesh warping problem. In fact, it is an optimization problem in which the objective is to accurately align the pre-warping image to the reference image while avoiding obvious distortions. The energy terms used in this paper are detailed below.
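To make the representation ${\mathbf{p}}=\mathbf{w}^{T}{\mathbf{V}}$ concrete, the sketch below computes the four weights for the simplest case, an axis-aligned rectangular cell, where inverse bilinear interpolation has a closed form. The function name `bilinear_weights` and the vertex ordering (top-left, top-right, bottom-left, bottom-right) are assumptions made for illustration; a general quad would require solving the inverse bilinear mapping instead.

```python
import numpy as np

def bilinear_weights(p, top_left, cell_w, cell_h):
    """Weights w with p = w^T V for an axis-aligned rectangular cell.

    Assumed vertex order: top-left, top-right, bottom-left, bottom-right.
    """
    u = (p[0] - top_left[0]) / cell_w  # horizontal position inside the cell, in [0, 1]
    v = (p[1] - top_left[1]) / cell_h  # vertical position inside the cell, in [0, 1]
    return np.array([(1 - u) * (1 - v), u * (1 - v), (1 - u) * v, u * v])

# A 10 x 10 cell and a point inside it.
V = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0], [10.0, 10.0]])
w = bilinear_weights((3.0, 4.0), (0.0, 0.0), 10.0, 10.0)

# The same weights relocate the point once the vertices have been deformed.
V_deformed = V + np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 0.5], [1.0, 0.5]])
p_deformed = w @ V_deformed
```

Since the weights are fixed once per point, every energy term that involves a point becomes linear in the unknown vertex positions.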

3.2.1 Content-preserving warping

Content-preserving warping [25] includes three energy terms: a point alignment term, a global alignment term and a smoothness term.

The point alignment term $E_{p}$ is used to align the feature points in the target image or pre-warping image to the corresponding points in the reference image as much as possible. It is defined as follows:

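$$E_{p}=\sum\nolimits_{i}\left\|\mathbf{w}_{i}^{T}\mathbf{V}_{i}-\mathbf{p}_{i}^{\prime}\right\|^{2},\tag{7}$$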

where $\mathbf{p}_{i}^{{}^{\prime}}$ is the matching point in the reference image. This term ensures the alignment of the overlapping region.

The global alignment term $E_{g}$ is used to constrain the image regions without feature correspondences to be as consistent as possible with the pre-warping result:

$$E_{g}=\sum\nolimits_{i}\left\|\mathbf{V}_{i}-\overline{\mathbf{V}}_{i}\right\|^{2},\tag{8}$$

where $\overline{\mathbf{V}}_{i}$ is the corresponding vertex in the pre-warping result.

The smoothness term $E_{s}$ encourages each grid in the pre-warping result to preserve similarity during warping to avoid shape distortions as much as possible. Precisely, given a triangle $\vartriangle\overline{\mathbf{V}}_{0}\overline{\mathbf{V}}_{1}\overline{\mathbf{V}}_{2}$ in the pre-warping result, the vertex $\overline{\mathbf{V}}_{0}$ can be represented by $\overline{\mathbf{V}}_{1}$ and $\overline{\mathbf{V}}_{2}$ as shown below:

[TABLE]

where $\mu,\nu$ are the coordinate values of $\overline{\mathbf{V}}_{0}$ in the coordinated system defined by the other two vertices. During warping, the triangle uses a similarity transformation to preserve the relative relationship of the three vertices and avoid local distortions. The smoothness term is

[TABLE]

where $\varphi$ is a weight used to measure the salience of the triangle as in [10]. The weight more strongly preserves the shapes of high-salience regions than those of low-salience regions. The full smoothness energy term is formed by summing Eq. (10) over all the vertices.
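The behavior of the smoothness term can be checked with a small sketch. It assumes the local coordinate system uses the edge $\overline{\mathbf{V}}_{2}-\overline{\mathbf{V}}_{1}$ and its 90-degree rotation, which is the usual choice in content-preserving warps but is an assumption here because the equations are not reproduced in this text; all function names are hypothetical.

```python
import math


def rot90(d):
    """Apply R = [[0, 1], [-1, 0]] to the 2-D vector d."""
    return (d[1], -d[0])


def local_coords(v0, v1, v2):
    """Coordinates (mu, nu) of v0 in the frame spanned by v2 - v1 and its rotation."""
    d = (v2[0] - v1[0], v2[1] - v1[1])
    e = (v0[0] - v1[0], v0[1] - v1[1])
    r = rot90(d)
    n = d[0] * d[0] + d[1] * d[1]
    return (e[0] * d[0] + e[1] * d[1]) / n, (e[0] * r[0] + e[1] * r[1]) / n


def smoothness_energy(v0, v1, v2, mu, nu, phi=1.0):
    """phi * squared deviation of v0 from the position predicted by (mu, nu)."""
    d = (v2[0] - v1[0], v2[1] - v1[1])
    r = rot90(d)
    px = v1[0] + mu * d[0] + nu * r[0]
    py = v1[1] + mu * d[1] + nu * r[1]
    return phi * ((v0[0] - px) ** 2 + (v0[1] - py) ** 2)


def similarity(p, s=2.0, deg=30.0, t=(3.0, -1.0)):
    c, n = math.cos(math.radians(deg)), math.sin(math.radians(deg))
    return (s * (c * p[0] - n * p[1]) + t[0], s * (n * p[0] + c * p[1]) + t[1])


def shear(p):
    return (p[0] + 0.5 * p[1], p[1])


tri = [(0.0, 1.0), (0.0, 0.0), (1.0, 0.0)]          # V0, V1, V2 before warping
mu, nu = local_coords(*tri)
e_sim = smoothness_energy(*[similarity(p) for p in tri], mu, nu)
e_shear = smoothness_energy(*[shear(p) for p in tri], mu, nu)
```

In this sketch a similarity transformation of the triangle leaves the energy at zero up to rounding, whereas the shear is penalized, which matches the stated purpose of the term.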

3.2.2 Line correspondence term

However, content-preserving warping terms only ensure the point alignment in the overlapping regions; thus, the line correspondences are taken into consideration to further improve the alignment.

A line correspondence term is utilized to ensure that the line correspondences are well aligned. Let $\mathbf{l}_{j}$ and $\mathbf{l}_{j}^{\prime}$ be a pair of corresponding lines in the target and reference images, respectively. Line $\mathbf{l}_{j}$ is cut into several short line segments by the edges of the mesh cells that it traverses. The endpoints of the short line segments from $\mathbf{l}_{j}$ are denoted by ${\mathbf{p}_{j,k}}$, where $k$ is the index of the endpoints, and ${\mathbf{p}_{j,k}^{\prime}}$ denotes the endpoints in the pre-warping image transformed from ${\mathbf{p}_{j,k}}$ by the preceding warping process, ${\mathbf{p}_{j,k}^{\prime}}=\mathbf{w}_{j,k}^{T}{\mathbf{V}_{j,k}}$. The line correspondence term requires the distance from each ${\mathbf{p}_{j,k}^{\prime}}$ to the corresponding line $\mathbf{l}_{j}^{\prime}$ to be as small as possible:

[TABLE]

The line correspondence term not only enhances the image alignment but also, together with the line collinearity term below, preserves the straightness of line structures.
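The point-to-line distance behind the line correspondence term can be sketched as follows. The sketch assumes that the reference line is given by normalized coefficients $(a,b,c)$ and that the term sums squared distances; both are assumptions, since the equation is not reproduced in this text, and the function names are hypothetical.

```python
import math


def line_through(p, q):
    """Normalized coefficients (a, b, c) of the line a*x + b*y + c = 0 through p and q."""
    a, b = p[1] - q[1], q[0] - p[0]
    c = p[0] * q[1] - q[0] * p[1]
    n = math.hypot(a, b)
    return (a / n, b / n, c / n)


def line_correspondence_energy(points, line):
    """Sum of squared distances from the warped sample points to the reference line."""
    a, b, c = line
    return sum((a * x + b * y + c) ** 2 for x, y in points)


ref_line = line_through((0.0, 2.0), (4.0, 2.0))   # reference line y = 2
samples = [(1.0, 2.5), (3.0, 1.5)]                # warped segment endpoints
energy = line_correspondence_energy(samples, ref_line)
```

Since each sample point is a fixed linear combination of mesh vertices, each residual $a x + b y + c$ is linear in the vertices and the term stays quadratic.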

3.2.3 Line collinearity term

However, the above terms may not reduce the distortions (e.g., line structure distortions) in the non-overlapping regions where there are few point or line correspondences. To capitalize on the line features and preserve the line structure, we adopt the line collinearity constraint.

The line collinearity term is used to preserve the straightness of linear structures in the target image as much as possible. Let $\mathbf{p}_{i,k}$ denote the endpoints of line $\mathbf{l}_{i}$ and its intersection points with the grid in the non-overlapping regions, and let $\mathbf{p}_{i,k}^{\prime}$ denote the corresponding points of $\mathbf{p}_{i,k}$ in the pre-warping result. The line should remain straight after warping, that is, the transformed points $\mathbf{p}_{i,k}^{\prime}$ should lie on the same line. This requirement is expressed by making the distance from each point $\mathbf{p}_{i,k}^{\prime}$ to the line ${\hat{\mathbf{l}}}_{i}$ as small as possible, where ${\hat{\mathbf{l}}}_{i}$ is the line through the head and tail endpoints of $\mathbf{p}_{i,k}^{\prime}$. The term is defined as follows:

[TABLE]

Together, the line collinearity term and the line correspondence term maintain the line structures well.
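As a rough illustration of the collinearity measure, the sketch below sums the squared distances of the intermediate points to the line through the head and tail points. The squared-distance form and the function name are assumptions made for illustration and are not taken from the paper's equation.

```python
import math


def collinearity_energy(points):
    """Sum of squared distances of interior points to the head-tail line."""
    (hx, hy), (tx, ty) = points[0], points[-1]
    dx, dy = tx - hx, ty - hy
    length = math.hypot(dx, dy)
    total = 0.0
    for x, y in points[1:-1]:
        # The 2-D cross product divided by the segment length is the signed distance.
        cross = dx * (y - hy) - dy * (x - hx)
        total += (cross / length) ** 2
    return total


bent = [(0.0, 0.0), (1.0, 0.2), (2.0, -0.1), (3.0, 0.0)]
straight = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)]
e_bent = collinearity_energy(bent)
e_straight = collinearity_energy(straight)
```

A line that stays straight after warping contributes no energy, while any bending of the intermediate points is penalized.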

3.2.4 Objective function

The above five energy terms are then combined as an energy optimization problem in which the objective function is

[TABLE]

where $\alpha,\ \beta,\ \gamma,\ \delta,\ \rho$ are the weight factors for each energy term. In our implementation, $\alpha=1$, $\beta=0.001$, $\gamma=0.01$, $\delta=1$, and $\rho=0.001$. The above function is quadratic; consequently, it can be minimized with a sparse linear solver. The final result is obtained through texture mapping.
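Why a quadratic objective reduces to a linear system can be seen in a toy sketch. It keeps only a point alignment residual and a global alignment residual for the x-coordinates of a single quad, and it uses dense Gaussian elimination as a stand-in for the sparse solver mentioned above. The function names, the toy data, and the assignment of the weights 1 and 0.001 to these two residuals are assumptions for illustration.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def minimize_quadratic(rows, rhs, weights):
    """Minimize sum_k weights[k] * (rows[k] . x - rhs[k])**2 via the normal equations."""
    n = len(rows[0])
    ata = [[0.0] * n for _ in range(n)]
    atb = [0.0] * n
    for row, t, wt in zip(rows, rhs, weights):
        for i in range(n):
            atb[i] += wt * row[i] * t
            for j in range(n):
                ata[i][j] += wt * row[i] * row[j]
    return solve_linear(ata, atb)


v_bar = [0.0, 10.0, 10.0, 0.0]       # pre-warped x-coordinates of one quad
w = [0.25, 0.25, 0.25, 0.25]         # bilinear weights of a feature at the quad center
alpha, beta = 1.0, 0.001             # toy weights for the two residual types
rows = [w] + [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
rhs = [6.0] + v_bar                  # the feature should move from x = 5 to x = 6
weights = [alpha] + [beta] * 4
x = minimize_quadratic(rows, rhs, weights)
```

In this toy case all four vertices shift by almost one pixel, because the strongly weighted feature match dominates the weakly weighted pull toward the pre-warped positions. A real mesh has thousands of unknowns with very few nonzeros per row, which is why a sparse solver is the practical choice.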

3.3 Distortion reduction by global similarity constraint

To reduce the projective distortions in the non-overlapping regions, the global similarity transformation is adopted to adjust the local warping model.

Chang et al. [12] have shown that a similarity transformation is effective in mitigating distortions. If we can find a similarity transformation that approximately represents the camera motion of the image projection plane, that transformation can be applied to offset the camera motion [16]. RANSAC [29] is used to iteratively segment the matching points into groups. Each group of point correspondences can be used to estimate a similarity transformation. The estimate with the smallest rotation angle is selected as the optimal candidate [30]. As shown in Fig. 4, the group of points in green is chosen to estimate the global similarity transformation. The plane composed of green points approximates the image projection plane because the camera is nearly perpendicular to the ground when shooting.
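The selection rule can be sketched as below, assuming the RANSAC grouping has already produced the groups of correspondences. Each similarity is fitted in closed form by writing 2-D points as complex numbers, $z^{\prime}=az+b$, where the phase of $a$ is the rotation angle. The function names and the synthetic data are hypothetical.

```python
import cmath
import math


def fit_similarity(src, dst):
    """Least-squares similarity z' = a*z + b, returned as complex (a, b)."""
    zs = [complex(*p) for p in src]
    zd = [complex(*p) for p in dst]
    ms, md = sum(zs) / len(zs), sum(zd) / len(zd)
    num = sum((d - md) * (s - ms).conjugate() for s, d in zip(zs, zd))
    den = sum(abs(s - ms) ** 2 for s in zs)
    a = num / den
    return a, md - a * ms


def pick_smallest_rotation(groups):
    """Fit one similarity per group and keep the one with the smallest |rotation|."""
    fits = [fit_similarity(src, dst) for src, dst in groups]
    return min(fits, key=lambda ab: abs(cmath.phase(ab[0])))


def transform(points, deg, scale, shift):
    """Synthetic data helper: rotate by deg, scale, and translate by a complex shift."""
    a = cmath.rect(scale, math.radians(deg))
    out = [a * complex(*p) + shift for p in points]
    return [(z.real, z.imag) for z in out]


src_a = [(0.0, 0.0), (10.0, 0.0), (10.0, 5.0), (2.0, 7.0)]
src_b = [(1.0, 1.0), (8.0, 2.0), (5.0, 9.0)]
groups = [(src_a, transform(src_a, 20.0, 1.0, 4 + 2j)),
          (src_b, transform(src_b, 5.0, 1.1, 1 - 3j))]
best_a, best_b = pick_smallest_rotation(groups)
best_deg = math.degrees(cmath.phase(best_a))
```

Here the second group is selected because its fitted rotation (5 degrees) is smaller than that of the first group (20 degrees).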

3.3.1 Similarity constraint

An image patch can be transformed by a projective transformation (e.g., a homography), which provides good alignment but may cause distortions, such as stretching. An image patch can also be warped by a similarity transformation, which introduces no distortions but may result in poor alignment due to its limited DoFs. Integrating the two types of transformations using weights can therefore both ensure good alignment and reduce distortions. The similarity constraint procedure is described in Algorithm 1. The global similarity transformation is combined with global or local homographies using weight factors. To create a smooth transition, the whole image should be considered. The weight integration is calculated as follows:

[TABLE]

where $\mathbf{H}_{i}$ is the homography in the $i$-th grid mesh, and $\mathbf{H}_{i}^{\prime}$ is the final homography in the $i$-th grid mesh. Here, $\mathbf{S}$ is the similarity transformation, and $\tau$ and $\xi$ are weight coefficients with $\tau+\xi=1$. The calculation of these two weights will be described later. In a global homography model, the homography of every grid mesh is the same.

The corresponding warping procedure should also be applied to the reference image because the similarity transformation also adjusts the overlapping regions. The warping procedure for the reference image can be formulated as follows:

[TABLE]

where $\mathbf{T}_{i}^{\prime}$ is the warping procedure for the reference image in the $i$-th grid mesh.
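The weighted integration can be sketched under two assumptions that this text does not confirm, because the equations are not reproduced: the blend is the entry-wise combination $\mathbf{H}_{i}^{\prime}=\tau\mathbf{H}_{i}+\xi\mathbf{S}$, with $\xi$ weighting the similarity, and the reference image is compensated by $\mathbf{T}_{i}^{\prime}=\mathbf{H}_{i}^{\prime}\mathbf{H}_{i}^{-1}$. The matrices and function names are hypothetical.

```python
def matmul3(a, b):
    return [[sum(a[r][k] * b[k][c] for k in range(3)) for c in range(3)]
            for r in range(3)]


def inv3(m):
    """Inverse of a 3x3 matrix via the adjugate."""
    (a, b, c), (d, e, f), (g, h, i) = m
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    adj = [[e * i - f * h, c * h - b * i, b * f - c * e],
           [f * g - d * i, a * i - c * g, c * d - a * f],
           [d * h - e * g, b * g - a * h, a * e - b * d]]
    return [[v / det for v in row] for row in adj]


def blend(h, s, xi):
    """Entry-wise blend tau*H + xi*S with tau = 1 - xi (assumed form)."""
    tau = 1.0 - xi
    return [[tau * h[r][c] + xi * s[r][c] for c in range(3)] for r in range(3)]


def reference_warp(h_blend, h):
    """Assumed compensation for the reference image: T' = H' H^{-1}."""
    return matmul3(h_blend, inv3(h))


H = [[1.0, 0.1, 5.0], [0.05, 0.9, -3.0], [0.003, 0.004, 1.0]]
S = [[0.98, -0.17, 4.0], [0.17, 0.98, -2.0], [0.0, 0.0, 1.0]]
T_near = reference_warp(blend(H, S, 0.0), H)   # cell dominated by the homography
H_far = blend(H, S, 1.0)                       # cell dominated by the similarity
```

With $\xi=0$ the cell keeps its homography and the reference image is left untouched (identity warp); with $\xi=1$ the cell is warped purely by the similarity.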

As shown in Fig. 5, when a point is far from the overlapping regions (especially the distorted non-overlapping regions) the procedure assigns a high weight for the similarity transformation to mitigate the distortions as much as possible. In contrast, for points near the overlapping regions, it assigns a high weight for the homography to ensure accurate alignment. Using this weight combination, the final warp smoothly changes from a projective to a similarity transformation across the image, which preserves the image shape and maintains the multi-perspective.

3.3.2 Weighting strategy

The weight coefficient calculation stems from the analysis of projective transformation. According to [31], let $\mathbf{R}$ be a rotation transformation that transforms the image coordinate $(x,y)$ to a new coordinate $(u,v)$ . Based on $\mathbf{p^{{}^{\prime}}}=\mathbf{H}\mathbf{p}$ , a new projective transformation $\mathbf{Q}$ that transforms $(u,v)$ to $(x^{{}^{\prime}},y^{{}^{\prime}})$ meets $\mathbf{p^{{}^{\prime}}}=\mathbf{Q}[u,v,1]^{T}=\mathbf{H}\mathbf{R}[u,v,1]^{T}$ , where $\mathbf{H}=[h_{1},h_{2},h_{3};h_{4},h_{5},h_{6};h_{7},h_{8},1]$ , and $\mathbf{Q}=[q_{1},q_{2},q_{3};q_{4},q_{5},q_{6};q_{7},q_{8},1]$ .

Supposing that the rotation angle is $\theta=\arctan\left({{h_{8}}/{h_{7}}}\right)$ , we can obtain ${q_{8}}=-{h_{7}}\sin\theta+{h_{8}}\cos\theta=0$ . Then, $\mathbf{Q}$ can be decomposed as follows:

[TABLE]

where $c=\sqrt{h_{7}^{2}+h_{8}^{2}}$ . Here, $\mathbf{Q}_{a}$ is the affine transformation, and $\mathbf{Q}_{p}$ is the projective transformation. Defining the local scale change [32] at point $(u,v)$ under the projective transformation as the determinant of the Jacobian of $\mathbf{Q}$ at point $(u,v)$ , the local scale change is calculated as follows:

[TABLE]

where $\det$ denotes the determinant, and ${\lambda}_{a}$ is independent of $u$ and $v$. It can be seen that the local scale change derived from $\mathbf{Q}$ depends only on $u$. In other words, the distortions of the projective transformation occur only along the $u$-axis. Therefore, the distortions can be effectively eliminated if the weight coefficients are calculated along the $u$ direction in the $(u,v)$ coordinate system.
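The two claims above, that the rotation makes $q_{8}$ vanish and that the local scale change then depends only on $u$, can be checked numerically. The sketch uses `atan2`, which agrees with $\arctan(h_{8}/h_{7})$ when $h_{7}>0$, and compares a finite-difference Jacobian determinant with the standard closed form $\det(\mathbf{Q})/(q_{7}u+q_{8}v+1)^{3}$ for a homography. The example matrix and function names are hypothetical.

```python
import math


def matmul3(a, b):
    return [[sum(a[r][k] * b[k][c] for k in range(3)) for c in range(3)]
            for r in range(3)]


def det3(m):
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def rotate_homography(h):
    """Return Q = H R and theta, with theta chosen so that q8 = 0."""
    theta = math.atan2(h[2][1], h[2][0])
    c, s = math.cos(theta), math.sin(theta)
    r = [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]
    return matmul3(h, r), theta


def warp(q, u, v):
    w = q[2][0] * u + q[2][1] * v + q[2][2]
    return ((q[0][0] * u + q[0][1] * v + q[0][2]) / w,
            (q[1][0] * u + q[1][1] * v + q[1][2]) / w)


def jacobian_det(q, u, v, eps=1e-3):
    """Central-difference estimate of the Jacobian determinant of the warp at (u, v)."""
    xu1, yu1 = warp(q, u + eps, v)
    xu0, yu0 = warp(q, u - eps, v)
    xv1, yv1 = warp(q, u, v + eps)
    xv0, yv0 = warp(q, u, v - eps)
    dxdu, dydu = (xu1 - xu0) / (2 * eps), (yu1 - yu0) / (2 * eps)
    dxdv, dydv = (xv1 - xv0) / (2 * eps), (yv1 - yv0) / (2 * eps)
    return dxdu * dydv - dxdv * dydu


H = [[1.0, 0.1, 5.0], [0.05, 0.9, -3.0], [0.003, 0.004, 1.0]]
Q, theta = rotate_homography(H)
c = math.hypot(H[2][0], H[2][1])           # c = sqrt(h7^2 + h8^2)
scale_a = jacobian_det(Q, 20.0, -30.0)     # same u, different v
scale_b = jacobian_det(Q, 20.0, 50.0)
closed_form = det3(Q) / (c * 20.0 + 1.0) ** 3
```

In this example the two finite-difference values agree with each other and with the closed form, so changing $v$ at fixed $u$ leaves the local scale change unchanged.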

The weight coefficients are designed based on the distance of grid points in the $u$ direction; the goal is to provide a gradual change from a projective to a similarity transformation across the image to preserve the image content in non-overlapping regions. As shown in Fig. 6, the center of the reference image is used as the coordinate origin $o$, and the unit vector on the $u$-axis is denoted by $\overrightarrow{ou}=({1,0})$. For an arbitrary mesh center $\mathbf{p}$, $d$ is the projected length of the vector $\overrightarrow{o{\mathbf{p}}}$ on the vector $\overrightarrow{ou}$. The projected points ${\mathbf{p}_{max}}$ and $\mathbf{p}_{min}$, which have the maximum and minimum values of $d$, respectively, can then be calculated. For the $i$-th grid, the weight coefficients are calculated as follows:

[TABLE]

where $\langle\overrightarrow{\mathbf{p}_{min}\mathbf{p}_{i}}\cdot\overrightarrow{{\mathbf{p}_{max}}{\mathbf{p}_{min}}}\rangle$ denotes the projection length of $\overrightarrow{{\mathbf{p}_{min}}{\mathbf{p}_{i}}}$ on $\overrightarrow{{\mathbf{p}_{max}}{\mathbf{p}_{min}}}$, and $\tau=1-\xi$.
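A minimal sketch of such a weighting is a linear ramp along the $u$-axis. It assumes that the projection length is normalized by the distance between $\mathbf{p}_{min}$ and $\mathbf{p}_{max}$ and that the ramp runs from $\mathbf{p}_{min}$ (where $\xi=0$) to $\mathbf{p}_{max}$ (where $\xi=1$); neither detail is confirmed by this text, and the function name is hypothetical.

```python
def ramp_weights(centers_uv, origin=(0.0, 0.0), axis=(1.0, 0.0)):
    """Return (tau, xi) per mesh center from its projected length d on the u-axis."""
    d = [(p[0] - origin[0]) * axis[0] + (p[1] - origin[1]) * axis[1]
         for p in centers_uv]
    d_min, d_max = min(d), max(d)
    xi = [(di - d_min) / (d_max - d_min) for di in d]   # 0 at p_min, 1 at p_max
    tau = [1.0 - x for x in xi]
    return tau, xi


centers = [(0.0, 40.0), (50.0, -10.0), (100.0, 25.0)]   # mesh centers in (u, v)
tau, xi = ramp_weights(centers)
```

Only the $u$-coordinate of each center affects its weight, which is consistent with the observation that projective distortion varies only along the $u$-axis.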

As shown in Fig. 7, APAP adopts local homographies for alignment, which aim to be globally projective while allowing local deviations. However, the stitched image suffers from projective distortions; for instance, the buildings are undesirably stretched and not parallel to the temples. In addition, the perspective distortions in the non-overlapping regions are obvious. In contrast, using a global similarity constraint, the proposed warping model preserves the shapes of objects and maintains the perspective of each image.

4 Experimental results and analysis

This section describes several experiments conducted to assess the performance of the proposed method on a series of challenging images. In our experiments, the testing images were acquired casually, using different shooting positions and angles.

Given a pair of input images, the keypoints are detected and matched by SIFT [33] in the VLFeat library [34]. The line features are detected by a line segment detector (LSD) [35] and matched by line-point invariants [36] or line-junction-line [37]. Then, RANSAC is used to remove the mismatches, and the remaining inliers are input to the stitching algorithms. We compared our approach with several other methods. The parameters of the other methods were set as suggested in the respective papers, and we used the source code provided by the authors of those papers to obtain the compared results. For our method, $\sigma$ is 8.5, and $\eta$ is 0.01. The experiments were conducted on a PC with an Intel i3-2120 3.3 GHz CPU and 8 GB of RAM. Excluding feature detection and matching, the proposed method takes 20–30 s to stitch two images with a resolution of 800 $\times$ 600.

To better compare the methods and reduce interference, we avoided post-processing methods such as blending or seam cutting as detailed in [9]. Instead, the aligned images are simply blended by intensity average so that any misalignments remain obvious.

To assess the accuracy of the image stitching alignment quantitatively, the metrics of correlation (Cor) [16] and mean geometric error ( $Err_{mg}$ ) [14] are adopted. Cor is defined as one minus the normalized cross correlation (NCC) over the neighborhood of a $3\times 3$ window, that is

[TABLE]

where $N$ is the number of pixels in the overlapping region $\pi$ , and $\mathbf{p}$ and $\mathbf{p}^{{}^{\prime}}$ are the pixels in image $I$ and $I^{{}^{\prime}}$ , respectively. $Cor$ reflects the similarity of two images in the overlapping regions. The smaller the $Cor$ value is, the better the stitching result is.
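A sketch of the Cor metric follows. The text defines it as one minus the NCC over $3\times 3$ windows; the aggregation over the overlap as a root mean square, the treatment of the whole image interior as the overlapping region, and the skipping of constant windows (where the NCC is undefined) are assumptions made for this illustration.

```python
import math


def ncc(a, b):
    """NCC of two equally sized flat patches; None if either patch is constant."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    da = [x - ma for x in a]
    db = [x - mb for x in b]
    den = math.sqrt(sum(x * x for x in da) * sum(x * x for x in db))
    if den == 0.0:
        return None
    return sum(x * y for x, y in zip(da, db)) / den


def window(img, x, y):
    """Flattened 3x3 neighborhood of pixel (x, y)."""
    return [img[j][i] for j in (y - 1, y, y + 1) for i in (x - 1, x, x + 1)]


def cor(img1, img2):
    """Root mean square of (1 - NCC) over all interior pixels (assumed aggregation)."""
    errs = []
    for y in range(1, len(img1) - 1):
        for x in range(1, len(img1[0]) - 1):
            v = ncc(window(img1, x, y), window(img2, x, y))
            if v is not None:
                errs.append((1.0 - v) ** 2)
    return math.sqrt(sum(errs) / len(errs))


base = [[float(x * x + 3 * y) for x in range(5)] for y in range(5)]
brighter = [[2.0 * v + 3.0 for v in row] for row in base]   # gain and offset change
inverted = [[-v for v in row] for row in base]              # contrast reversal
```

Under these assumptions `cor(base, base)` and `cor(base, brighter)` are zero up to rounding, since the NCC ignores gain and offset, while `cor(base, inverted)` reaches the maximum value of 2.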

$Err_{mg}$ is defined as the mean geometric error on points and lines, that is

[TABLE]

where $f:\mathbb{R}^{2}\mapsto\mathbb{R}^{2}$ is the estimated warping, $M$ is the number of point correspondences, $\mathbf{p}_{i}$ and $\mathbf{p}_{i}^{{}^{\prime}}$ are a pair of point correspondences, $K$ is the number of line correspondences, and ${d}_{l}$ denotes the projected distance of the endpoints of $\mathbf{l}_{j}$ to its correspondence line $\mathbf{l}_{j}^{{}^{\prime}}$ . A smaller $Err_{mg}$ value indicates a better stitching result.

In the following subsections, we first verify the performance of the proposed method on image alignment and distortion reduction. Then, we report the experimental comparison results including the comparison with the global-based methods and the local-based methods.

4.1 Image alignment

Fig. 8 illustrates the performance of each constraint in the proposed method, including the line-guided local warping estimation, the line correspondence constraint, and the line collinearity constraint. Fig. 8(b) shows the result of line-guided warping combined with APAP (LAPAP), which largely improves the alignment compared to APAP, as can be clearly seen in the closeup. However, LAPAP introduces structural distortions, e.g., the bent lines on the buildings, shown by the red circle in the blue closeup. With CPW optimization, LAPAP+CPW refines the alignment performance (shown in Fig. 8(c)), but some slight misalignments still exist. Combined with the line correspondence (LineCorr) constraint, LAPAP+CPW+LineCorr provides good alignment (Fig. 8(d)). However, structural distortions, e.g., line deformations, are not handled well, as can be clearly seen in the blue closeup. By adding the line collinearity constraint to restrain the structural deformation, the proposed method provides a good stitching result with less distortion in this example (Fig. 8(e)). Quantitative evaluations of Cor and $Err_{mg}$ are shown in Table 1 and are consistent with the visual results.

Fig. 9 shows a comparison of the original point-based CPW model [25] and the proposed CPW model on the Rooftops images (acquired from the open dataset of [22]). Some errors or distortions are highlighted by the red circles. The stitching process is based on the proposed two-stage alignment. Fig. 9(a) shows the results from the original CPW model, in which misalignments are obvious, especially on the rooftops (red circle). Additionally, the roadside trees are stretched. Under the constraints of line features, Fig. 9(b) improves the alignment performance and produces more accurate results. As can be seen, line features provide a better geometric description than do point features alone, and the line features function as strong constraints for image stitching. Fig. 9(c) shows the final stitching results. Due to the global similarity constraint, the distortions around the roadside trees are largely mitigated, and the proposed method achieves a satisfactory stitching result. Table 2 shows the quantitative comparison. The improved CPW model largely reduces the alignment errors (mainly line errors and total error). By using the similarity constraint, the proposed method obtains a lower Cor.

Next, we compared the proposed method with other flexible warping methods to evaluate the alignment performance, namely, global homography (baseline) [13], CPW (using global warping for the initial alignment) [10], and APAP [5]. For completeness, the proposed method is also compared with the Image Composite Editor (ICE) [38] (a common commercial tool for image stitching) by inputting two images at once. For ICE, we used the final post-processed results for the comparison because the original alignment results are not obtainable in the standard version of ICE. In addition, no quantitative comparison of ICE is provided.

Fig. 10 shows the Desk image pair and the detected features. In most of the low-textured areas, keypoints are difficult to extract, resulting in insufficient matching points for the estimation of the warping model. However, line features can be used as an effective complement for alignment purposes.

The comparison results are shown in Fig. 11. Because the images violate the assumptions, the baseline warp is unable to align them properly; it produces obvious misalignments (see the red boxes in Fig. 11 (a)). ICE, CPW, and APAP provide relatively better stitching results, but a non-negligible number of ghost artifacts remain. In Fig. 11(b), although ICE uses blending and pixel selection to conceal the misalignments, the post-processing is clearly not completely successful; for instance, there are obvious misalignments on the vertical edge of the desk. Due to an insufficient number of corresponding keypoints along the vertical edge of the desk, CPW and APAP cannot provide an accurate warping model for image alignment; consequently, ghosting occurs in these regions (see the red boxes in Fig. 11 (c) and (d)). With the help of line correspondences and the two-stage robust alignment scheme, our method results in satisfactory stitching performance with accurate alignment and few ghost artifacts (Fig. 11(e)). Note that our method also reduces the need for post-processing.

Table 3 depicts the Cor and $Err_{mg}$ values of the compared methods on the Desk image pair. As listed, CPW’s stronger constraint on point correspondences results in a smaller alignment error on point ${Err}_{mg}^{(p)}$ ; however, the alignment errors on line ${Err}_{mg}^{(l)}$ and $Err_{mg}$ remain large. The proposed method reduces the geometric error and results in better accuracy than do the other tested methods.

4.2 Distortion reduction

To investigate the distortion reduction performance, SPHP [12] and AANAP [16] were compared with the proposed method on the Railtracks and Temple Square image pairs (both acquired from the open dataset of [5]).

Fig. 12 shows the stitching results of four methods: APAP [5], SPHP [12], SPHP with an assumption of no rotation (SPHP1) [12], and our method. Due to its simple extrapolation of the projective transformation to non-overlapping regions, the APAP result (Fig. 12(a)) shows projective distortions in the non-overlapping regions. In the blue box in the closeup, the car is enlarged, and the palm tree is obviously slanted. By introducing the similarity transformation, SPHP can largely mitigate these projective distortions. In Fig. 12(b), SPHP preserves the shape of the car, but it introduces an unnatural rotation. In addition, the construction site (in the red box) is tilted to the left. In contrast, SPHP1 preserves the shape and reduces the perspective distortion, but the construction site is now tilted slightly to the right (Fig. 12(c)). Using the global similarity constraint, the proposed method largely eliminates all these distortions, providing a pleasing stitching result, as is clearly shown in Fig. 12(d).

Fig. 13 shows a comparison of the proposed method with AANAP [16] on distortion reduction. Fig. 13(a) shows that APAP achieves good alignment, but it suffers from shape and perspective distortion problems, for example, in the stretched and tilted buildings at the right of the image. By linearizing the homography and using the similarity transformation, AANAP provides an attractive result in which the projective distortions have been largely mitigated (Fig. 13(b)). However, as shown in the red circle of the enlarged view, the lines on the ground are slightly deformed. Our method yields more appealing stitching results in this example (Fig. 13(c)).

4.3 Comparisons with global-based methods

In this section, the proposed method is compared with three global-based methods: global homography (Baseline) [13], ICE [38], and SPHP [12]. For our method (called the global version), global homography is adopted during the first alignment stage and jointly estimated by point and line correspondences to pre-warp the source images.

Fig. 14 shows the two pairs of original images for stitching: Ceiling and Temple (the Temple images were acquired from the open dataset of [5]). The low-textured content of Ceiling results in the detection of only a limited number of unevenly distributed keypoints, which may degrade the warping model’s estimations. However, line correspondences are abundant, which can improve the image alignment. Temple provides rich point correspondences, but the scene contains multiple distinct planes, which is a challenge for the global-based methods.

Figs. 15 and 16 show the results of the global-based methods on the Ceiling and Temple image pairs. As shown, due to the model deficiencies, the Baseline warp cannot provide satisfactory stitching results; there are numerous misalignments and projective distortions. The ICE and SPHP methods improve the stitching performance, especially in reducing projective distortions. For instance, the door in Fig. 15 and the people in Fig. 16 have few distortions, but the bricks of the ceiling in the non-overlapping area in the ICE result (Fig. 15(b)) are slightly stretched. In addition, alignment errors in these two pairs of images (the red circles in Figs. 15 and 16) remain obvious. In contrast, the proposed method is more flexible and robust in handling the alignment, not only because of the line-guided warping estimation but also because of the alignment constraints in the mesh-based framework. With the similarity constraint, our method provides good stitching results with minimal distortions.

Tables 4 and 5 contain a quantitative comparison on Ceiling and Temple, showing that our method provides the results with the fewest errors. On Ceiling, our method performs the best because the line features play an important role in scenes without reliable keypoint correspondences. On the Temple image, which has rich and reliable keypoints, the role of the line features may be reduced, but they still help to improve the alignment accuracy.

4.4 Comparisons with local-based methods

The global version works well in preserving the content and perspective, but it is somewhat less robust when aligning images taken from widely differing viewpoints. Because local homographies offer more DoFs and greater flexibility, the variant of our method that uses local homographies in the pre-warping stage (called the local version) can handle the parallax issue. Therefore, in this section, we compared it with three other local-based methods: CPW [10], APAP [5], and SPHP+APAP [12]. Fig. 17 shows the original Church, Block, and Wall images for the comparison experiments (the Church and Block images were acquired from the open dataset of [3]). Some images have little texture, which limits the extracted features. Moreover, the images' corresponding views vary greatly.

The stitching results on these three pairs of images are provided in Fig. 18. In terms of alignment accuracy, CPW and APAP allow higher DoFs than does global homography, but they also produce misalignments in regions that lack point correspondences (the areas partially highlighted in red boxes). In addition, CPW and APAP may cause local structure deformation in structural regions that lack keypoints. The red closeups clearly show that straight lines are bent (e.g., the stair railing in Church, the building edge in Block, and the wall edge in Wall). Using the similarity transformation, SPHP+APAP reduces the projective distortions and preserves the shape and perspective, mitigating the building distortion in the non-overlapping regions in both Church and Block. In comparison, our method not only provides accurate alignment, which benefits from the two-stage alignment scheme, but also preserves image structures and perspectives due to the line and similarity constraints.

Table 6 shows the quantitative results of the compared methods. Our method consistently achieves better accuracy than CPW, APAP, and SPHP+APAP, except for ${Err}_{mg}^{(p)}$ on the Church result. CPW adopts feature alignment as a strong constraint; therefore, it provides a good quantitative result in ${Err}_{mg}^{(p)}$. However, its results on the other criteria are unsatisfactory on the Church image. Overall, our method achieves the best quantitative results.

4.5 Stitching of multiple images

Figs. 19 and 20 show the stitching results of multiple images on the Apartments and Garden data, respectively (these images were acquired from the open dataset of [5]). Some distinct errors are highlighted in boxes. As can be seen, Autostitch and ICE result in some obvious misalignments because they use only global homography for alignment, which is unsuitable for images whose views differ by factors other than pure rotation. In contrast, our method largely improves the stitching performance because of the flexible line-guided local homographies and mesh optimization. Thus, the proposed method produces satisfactory stitching results that contain few misalignments and distortions.

5 Conclusion

This paper proposed a line-guided local warping method for image stitching with a global similarity constraint. Our method integrates multiple constraints, including line features and global similarity constraints, into a two-stage image stitching framework that achieves accurate alignment and mitigates distortions. The line features are employed as an effective supplement to point features for alignment. Then, the line feature constraints (line matching and line collinearity) are integrated into the mesh-based warping framework, which further improves the alignment while preserving the image structures. Additionally, the global similarity transformation is combined with the projective warping to maintain the image content and perspective. As shown by the experimental results, the proposed method achieves good image stitching results with the fewest alignment errors and distortions among the compared methods. The proposed method depends on line detection and matching; thus, incomplete or broken line segments may influence its structure-preserving performance. In future work, we would like to explore other complex structure constraints, such as contours [39, 40], to improve the image stitching performance, and explore the possibility of applying our warping model to other applications, such as video stabilization [41].

Bibliography


[1] S. Pang, M. Sun, X. Hu, and Z. Zhang, “SGM-based seamline determination for urban orthophoto mosaicking,” ISPRS Journal of Photogrammetry and Remote Sensing, vol. 112, pp. 1–12, 2016.
[2] X. Li, N. Hui, H. Shen, Y. Fu, and L. Zhang, “A robust mosaicking procedure for high spatial resolution remote sensing images,” ISPRS Journal of Photogrammetry and Remote Sensing, vol. 109, pp. 108–125, 2015.
[3] Y. Zhang, S. Song, P. Tan, and J. Xiao, “PanoContext: A whole-room 3D context model for panoramic scene understanding,” in Proceedings of the 13th European Conference on Computer Vision, Zurich, Switzerland, 2014, pp. 668–686.
[4] M. Brown and D. G. Lowe, “Recognising panoramas,” in International Conference on Computer Vision, Nice, France, 2003, pp. 1218–1227.
[5] J. Zaragoza, T.-J. Chin, Q.-H. Tran, M. S. Brown, and D. Suter, “As-projective-as-possible image stitching with moving DLT,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 36, no. 7, pp. 1285–1298, 2014.
[6] J. Gao, Y. Li, T.-J. Chin, and M. S. Brown, “Seam-driven image stitching,” in Eurographics, Girona, Spain, 2013, pp. 45–48.
[7] W. Wang and M. K. Ng, “A variational method for multiple-image blending,” IEEE Transactions on Image Processing, vol. 21, no. 4, pp. 1809–1822, 2011.
[8] J. Hu, D.-Q. Zhang, H. Yu, and C. W. Chen, “Multi-objective content preserving warping for image stitching,” in IEEE International Conference on Multimedia and Expo, Turin, Italy, 2015, pp. 1–6.