Enhancing Underexposed Photos using Perceptually Bidirectional Similarity

Qing Zhang; Yongwei Nie; Lei Zhu; Chunxia Xiao; Wei-Shi Zheng

arXiv:1907.10992·cs.CV·July 9, 2020


TL;DR

This paper introduces a novel method for enhancing underexposed photos by ensuring perceptual consistency through a bidirectional similarity criterion, leading to artifact-free, high-quality images and videos.

Contribution

It proposes a new perceptually bidirectional similarity criterion and formulates enhancement as a constrained illumination estimation problem, improving over existing methods.

Findings

01. Outperforms state-of-the-art enhancement methods

02. Effectively reduces visual artifacts like color distortion and detail loss

03. Extends to underexposed video enhancement with consistent illumination propagation

Abstract

Although remarkable progress has been made, existing methods for enhancing underexposed photos tend to produce visually unpleasing results due to the existence of visual artifacts (e.g., color distortion, loss of details and uneven exposure). We observed that this is because they fail to ensure the perceptual consistency of visual information between the source underexposed image and its enhanced output. To obtain high-quality results free of these artifacts, we present a novel underexposed photo enhancement approach that is able to maintain the perceptual consistency. We achieve this by proposing an effective criterion, referred to as perceptually bidirectional similarity, which explicitly describes how to ensure the perceptual consistency. Particularly, we adopt the Retinex theory and cast the enhancement problem as a constrained illumination estimation optimization, where we…

Tables2

Table I: Quantitative comparison between our method and other video enhancement methods in terms of “mean/standard deviation” of DE and NIQE.

	Input	PPVE [33]	PDPF [35]	Ours
DE	5.81/0.27	7.38/0.38	7.14/0.31	7.53/0.23
NIQE	4.35/0.41	3.63/0.47	3.37/0.43	3.12/0.35

Table II: Quantitative comparison between our method and the state-of-the-art methods on the six employed datasets.

Dataset	Original		NPE [7]		WVM [8]		JieP [9]		LIME [10]		HDRNet [11]		DPE [12]		Ours
	DE	NIQE	DE	NIQE	DE	NIQE	DE	NIQE	DE	NIQE	DE	NIQE	DE	NIQE	DE	NIQE
NPE	6.56	3.89	7.22	3.18	7.03	3.55	7.34	3.11	7.54	3.31	7.33	3.51	7.13	3.62	7.64	3.02
MEF	6.07	4.27	7.14	3.59	6.89	3.84	7.29	3.51	7.32	3.71	7.16	3.63	7.08	3.76	7.56	3.37
MF	6.36	3.35	7.11	3.02	7.14	3.25	7.23	3.17	7.49	3.12	7.19	3.26	7.03	3.41	7.74	2.81
LIME	6.02	4.47	6.91	4.09	6.82	4.29	6.98	3.87	7.39	4.10	7.18	3.95	6.87	4.31	7.45	3.57
VV	6.63	3.38	7.43	2.73	7.32	2.97	7.48	2.81	7.53	2.89	7.62	2.92	7.46	3.17	7.81	2.75
FiveK	6.45	3.29	7.09	2.93	7.03	3.12	7.16	2.82	7.21	2.88	7.11	2.79	6.93	3.17	7.25	2.68

Equations22

I = S \times R, \tag{1}

\max I_{p}^{c} = \Gamma(S_{p}^{\min}), \quad \forall c \in \{r, g, b\}, \tag{2}

\left\{ \begin{array}{lc} \nabla R_{p} = 0, & |\nabla I_{p}| \leq \tau \\ \partial_{d} R_{p} / \partial_{d} I_{p} \geq 1, & |\nabla I_{p}| > \tau \end{array} \right. \tag{3}

\mathrm{RTV}(S_{p}) = \mathcal{H}(S_{p}) + \mathcal{V}(S_{p}), \tag{4}

\mathcal{H}(S_{p}) = \sum_{q \in \mathcal{N}_{p}} u_{q}^{x} w_{q}^{x} (\partial_{x} S_{q})^{2}, \tag{5}

S^{\prime}_{p} = \max I_{p}^{c}, \quad \forall c \in \{r, g, b\}. \tag{6}

\begin{split} &\mathop{\arg\min}\limits_{S} \sum\limits_{p} (S_{p} - S^{\prime}_{p})^{2} + \lambda \big( \mathcal{H}(S_{p}) + \mathcal{V}(S_{p}) \big), \quad \mathrm{s.t.} \\ &S^{\min}_{p} \leq S_{p} \leq 1, \quad \left\{ \begin{array}{lc} \nabla (I_{p}/S_{p}) = 0, & |\nabla I_{p}| \leq \tau \\ \partial_{d}(I_{p}/S_{p}) / \partial_{d} I_{p} \geq 1, & |\nabla I_{p}| > \tau \end{array} \right. \end{split} \tag{7}

S_{p} = \frac{1}{\mathcal{Z}_{p}} \sum_{q_{\downarrow} \in \Omega_{p_{\downarrow}}} \bar{S}_{q_{\downarrow}} f(\| p_{\downarrow} - q_{\downarrow} \|) g(S^{\prime}_{p} - S^{\prime}_{q}), \tag{8}

i = \mathop{\arg\max}\limits_{i} P(H_{i} \mid p) \propto P(p \mid H_{i}) P(H_{i}), \tag{9}

P(p \mid H_{i}) = \frac{1}{|H_{i}|} \sum_{q \in \Psi_{p^{\prime}}} \phi_{i}(L_{q,t-1}) \, \mathcal{G}\left( \frac{L_{q,t-1} - L_{p,t}}{d} \right), \tag{10}

P(H_{i}) = \min_{q^{\prime}} \left( \frac{1}{D(p^{\prime}, q^{\prime})} \right). \tag{11}

Full text

Enhancing Underexposed Photos using

Perceptually Bidirectional Similarity

Qing Zhang, Yongwei Nie, Lei Zhu, Chunxia Xiao, and Wei-Shi Zheng

Q. Zhang and W.-S. Zheng are with the School of Data and Computer Science, Sun Yat-sen University, Guangzhou 510006, China. W.-S. Zheng is also with the Peng Cheng Laboratory, Shenzhen 518005, China, and the Key Laboratory of Machine Intelligence and Advanced Computing (Sun Yat-sen University), Ministry of Education, China. Y. Nie is with the School of Computer Science and Engineering, South China University of Technology, Guangzhou 510006, China. L. Zhu is with the Department of Computer Science and Engineering, The Chinese University of Hong Kong. C. Xiao is with the School of Computer Science, Wuhan University, Wuhan 430072, China.

Abstract

Although remarkable progress has been made, existing methods for enhancing underexposed photos tend to produce visually unpleasing results due to the existence of visual artifacts (e.g., color distortion, loss of details and uneven exposure). We observed that this is because they fail to ensure the perceptual consistency of visual information between the source underexposed image and its enhanced output. To obtain high-quality results free of these artifacts, we present a novel underexposed photo enhancement approach that is able to maintain the perceptual consistency. We achieve this by proposing an effective criterion, referred to as perceptually bidirectional similarity, which explicitly describes how to ensure the perceptual consistency. Particularly, we adopt the Retinex theory and cast the enhancement problem as a constrained illumination estimation optimization, where we formulate perceptually bidirectional similarity as constraints on illumination and solve for the illumination which can recover the desired artifact-free enhancement results. In addition, we describe a video enhancement framework that adopts the presented illumination estimation for handling underexposed videos. To this end, a probabilistic approach is introduced to propagate illuminations of sampled keyframes to the entire video by tackling a Bayesian Maximum A Posteriori problem. Extensive experiments demonstrate the superiority of our method over the state-of-the-art methods.

Index Terms:

Underexposed photo enhancement, perceptually bidirectional similarity, illumination estimation.

I Introduction

With the popularization of readily available cameras on cell phones, people are increasingly interested in taking photos. However, capturing well-exposed photos under complex lighting conditions (e.g., low-light and back-light) remains a challenge for non-expert users. Hence, underexposed photos are inevitably created (see Fig. 1(a) for an example). Due to the low detail visibility and dull colors, these photos not only look unpleasing and fail to capture what the user desires, but also adversely affect various image analysis tasks, such as segmentation [1, 2], object recognition [3, 4] and saliency detection [5, 6]. To improve image aesthetics and benefit subsequent applications, automatic underexposed photo enhancement techniques are thus widely required.

Underexposed photo enhancement is a challenging task, since it is highly non-linear and subjective. Commercial software such as Adobe Lightroom and Photoshop allows users to interactively retouch photos, but it remains difficult for non-experts to use. Other easy-to-use alternatives such as the “Auto Enhance” on iPhone and the “Auto Tone” in Lightroom allow enhancing underexposed photos with a single click. However, they may fail to produce high-quality results due to the inherent difficulty of automatically balancing all assorted appearance factors (e.g., brightness, contrast, and saturation) in the adjustment, as shown in Fig. 1(b) and (c).

There have been various underexposed photo enhancement algorithms in the research community. Early approaches work by performing contrast enhancement [13, 14]. Many subsequent approaches [15, 16, 9, 10, 17] rely on the Retinex theory [18], the camera response function [19], and the inverted images [20, 21] to enhance photos. Others learn data-driven photo adjustment by utilizing either traditional machine learning techniques [22, 23, 24], or deep neural networks [11, 25, 26, 27, 12, 28, 29]. However, as shown in Fig. 2, these methods still have their respective limitations, e.g., unclear details, local overexposure and color distortion, so they fail to produce visually pleasing results.

To address the limitations of previous methods, we present a novel method for enhancing underexposed photos. Our method is built upon the observation that the main reason why existing methods produce visually unpleasing results is that they may break the perceptual consistency of visual information between the underexposed input and its enhanced output. For instance, an enhanced image that suffers from loss of detail is unsatisfactory, since it breaks the edge consistency with the input image. Based on this observation, we propose perceptually bidirectional similarity (PBS) for explicitly enforcing the perceptual consistency, and formulate underexposed photo enhancement as PBS-constrained illumination estimation by defining PBS as constraints on illumination, which allows us to recover high-quality results from the acquired illumination. Besides, an illumination-estimation-based video enhancement framework is described to handle underexposed videos, where we sample keyframes for illumination estimation and then propagate the illuminations of keyframes to other video frames in a temporally coherent fashion via a Bayesian formulation.

In summary, the contributions of this paper are as follows:

• First, we propose PBS, a simple yet effective criterion for explicitly describing how to ensure the perceptual consistency during underexposed photo enhancement.

• Second, we design PBS-constrained illumination estimation for enhancing underexposed photos in a way that avoids the artifacts encountered by previous methods.

• Third, we adopt the proposed illumination estimation and introduce an underexposed video enhancement framework, which produces very competitive video enhancement results compared to existing methods.

• Fourth, we evaluate the performance of our method in enhancing underexposed photos on six datasets and compare it with various state-of-the-art methods. Results show that our method outperforms previous methods.

A preliminary version of this work appeared in [30]. In this paper, we have extended the earlier conference version in four aspects. First, we present an effective video enhancement framework based on the proposed PBS-constrained illumination estimation. In particular, a probabilistic illumination propagation approach is introduced to obtain a temporally coherent illumination sequence for an input video from illuminations of sampled keyframes. Second, we introduce an efficient implementation for our illumination estimation. Third, we provide a deeper analysis of our method, including its relationship to color constancy and its potential for correcting overexposed images. Fourth, we have conducted extensive experiments to evaluate the advantage of our method, including further comparisons with more recent learning-based methods and evaluations on additional datasets.

II Related Work

Histogram-based methods. One of the most widely-adopted image enhancement techniques is histogram equalization (HE), which increases image contrast by finding a transformation function that evens out the intensity histogram. However, it tends to cause loss of contrast for regions with high frequencies. To improve the result, Zuiderveld et al. [31] presented the contrast limited adaptive histogram equalization (CLAHE) by setting a limit on the derivative of the slope of the transformation function. This method is quite effective in contrast enhancement, but may induce ghosting artifacts. Although there are many subsequent HE-based variants [32, 13], they may also produce unsatisfactory results.

Sigmoid-mapping-based methods. Mapping pixel intensities with sigmoid functions is another way to enhance underexposed images. As globally applying sigmoid mapping may generate visually distorted results, existing methods usually perform local intensity mapping. For instance, Bennett and McMillan [33] decomposed the input image into a base layer and a detail layer, and then applied different mappings for the two layers to preserve the image details. Yuan and Sun [34] segmented the input image into subregions and computed luminance-aware detail-preserving mapping for each subregion. Zhang et al. [35] created multiple tone-mapped versions for the input image and fused them into a well-exposed image. Since finding locally optimal sigmoid mappings and ensuring globally smooth transitions are difficult, these methods may not work well for images with uneven exposure.

Retinex-based methods. This kind of method is built upon the assumption that an underexposed image is the pixel-wise product of the expected enhanced image and a single-channel illumination map. In this way, image enhancement can be reduced to an illumination estimation problem. Jobson et al. [36] made an early attempt at this problem, but their results often look unnatural. Although subsequent methods significantly improve the results [7, 15, 37, 8, 9, 10, 38], they may also induce visual artifacts such as loss of details, color distortion and uneven exposure. Our method also belongs to this category. It extends the previous work [30] in four different ways, as mentioned in the introduction, and is able to robustly generate visually pleasing results free of the visual artifacts encountered by previous methods.

Learning-based methods. An increasing number of efforts have focused on learning-based methods since the pioneering work of Bychkovsky et al. [22], which provides a dataset consisting of image pairs for tone adjustment. Yan et al. [24] achieved automatic color enhancement by tackling a learning-to-rank problem, while Yan et al. [39] enabled semantic-aware image enhancement by leveraging scene semantics. Gharbi et al. [11] proposed bilateral learning to enable real-time image enhancement, while Chen et al. [12] designed an unpaired learning model for enhancement based on a two-way generative adversarial network (GAN). Yang et al. [40] corrected LDR images by using a deep reciprocating HDR transformation. Cai et al. [41] learned a contrast enhancer from multi-exposure images. Deep encoder-decoder networks are also utilized to enhance low-light images [42, 43]. More recently, Jiang et al. [44] introduced EnlightenGAN for low-light enhancement, while two other recent methods work by performing deep Retinex decomposition [28, 29]. However, these methods may not work well on images that are significantly different from the training images.

III Underexposed Photo Enhancement

This section presents our underexposed photo enhancement approach. We first summarize the background knowledge on Retinex-based image enhancement and illustrate how to cast photo enhancement as an illumination estimation problem. Then, we introduce PBS and analyze how we define it as constraints on illumination. Next, we formulate PBS-constrained illumination estimation for enhancing underexposed photos while avoiding the common visual artifacts, and provide in-depth model analysis. Finally, we describe an efficient implementation for the illumination estimation.

III-A Background on Retinex-based Image Enhancement

Retinex-based image enhancement [8, 10] assumes that an underexposed image $I$ (normalized to [0,1]) is the pixel-wise product of the desired enhanced image $R$ and a single-channel illumination map $S$ , which is expressed as

$$I = S \times R, \tag{1}$$

where $\times$ denotes pixel-wise multiplication. With the above assumption, image enhancement can be reduced to an illumination estimation problem, since the enhanced image can be recovered by $R=I/S$ as long as $S$ is known.
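The recovery step $R=I/S$ can be sketched as follows. This is a minimal illustration assuming NumPy arrays; the function name `recover_enhanced` and the `eps` guard against division by zero are assumptions made for the example and do not come from the paper.

```python
import numpy as np

def recover_enhanced(I, S, eps=1e-6):
    """Recover R = I / S for an H x W x 3 image I in [0, 1]
    and a single-channel H x W illumination map S."""
    I = np.asarray(I, dtype=np.float64)
    S = np.asarray(S, dtype=np.float64)
    # Guard against division by zero (eps is an illustrative assumption).
    S = np.maximum(S, eps)
    # Broadcast the single-channel illumination over the color channels.
    return I / S[..., None]

# A uniformly dark image with illumination 0.5 is brightened by a factor of 2.
I = np.full((2, 2, 3), 0.2)
S = np.full((2, 2), 0.5)
R = recover_enhanced(I, S)
```

Because $S\leq 1$ in this setting, the division can only brighten the input or leave it unchanged.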

III-B Perceptually Bidirectional Similarity (PBS)

We first analyze the common issues encountered by existing methods, which inspire the proposal of PBS. As shown in Fig. 3(b)-(g), color distortion, uneven exposure and loss of detail are the three main issues. CLAHE [31] and NPE [7] distort the skin color and mistakenly make the girl’s face and arms gray, giving rise to color family mismatch between the input image and the enhanced outputs. Yuan and Sun [34] and WVM [8] induce exposure inconsistency around the arms and the body, while these regions have consistent exposure in the input image. Bennett and McMillan [33] and LIME [10] overexpose the background and lead to loss of detail.

From the above analysis, we have come to an important observation: the reason why existing methods fail to produce visually pleasing results is that they break the perceptual consistency of color, detail and local exposure distribution between the input image and the enhanced output. In other words, this observation suggests that a good enhanced image should not only improve the detail visibility of the underexposed regions, but also satisfy two properties: 1) it should contain all the visual information (possibly as enhanced versions) in the input image; 2) it should not introduce new visual information that does not exist in the input image. Aware of this, we propose perceptually bidirectional similarity (PBS), which more specifically characterizes the aforementioned two requirements for the enhanced image $R$ of an underexposed image $I$ : 1) colors and details in $I$ should all exist in $R$ as properly enhanced versions ( $\geq 1$ ), and regions in $I$ with consistent exposure should also have consistent exposure in $R$ ; 2) $R$ should not contain distorted colors, additional details and exposure inconsistencies that originally do not exist in $I$ .

III-C PBS as Constraints on Illumination

To utilize PBS, we define it as three constraints on illumination $S$ , which help ensure the bidirectional perceptual consistency of color, detail and exposure distribution between the input image $I$ and the enhanced image $R$ , respectively.

Color consistency. To preserve color consistency, we require that each pixel’s color in $R$ and $I$ be in the same color family by imposing a range constraint on $S$ . Since $R=I/S$ and $I$ is normalized to [0,1], small (large) $S$ yields $R$ with high (low) RGB values. Intuitively, color inconsistency may appear in the form of mismatched colors in $R$ caused by naive color truncation, when $S$ is too small to guarantee that each RGB color channel in $R$ remains in the color gamut [0,1]. Hence, we bound $S$ to be no less than a value that can enlarge the maximum RGB color channel of each pixel in $I$ to the upper bound 1 through $R=I/S$ , which is expressed as

$$\max I_{p}^{c} = \Gamma(S_{p}^{\min}), \quad \forall c \in \{r, g, b\}, \tag{2}$$

where $I^{c}_{p}$ is a color channel at pixel $p$ . $\Gamma(\alpha)=\alpha^{\gamma}$ is the Gamma function with $\gamma\in(0,1)$ , which is an optional operation for further illumination adjustment. From Eq. 2, we can easily obtain $S^{\min}_{p}=(\max I^{c}_{p})^{1/\gamma}$ . To avoid mistakenly darkening the input underexposed image, we set the upper bound of $S$ to 1, in which case the input will be directly taken as the output. Overall, for each pixel $p$ , the color consistency constraint can be defined as $S^{\min}_{p}\leq S_{p}\leq 1$ .
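The range constraint $S^{\min}_{p}\leq S_{p}\leq 1$ amounts to a per-pixel clamp, as the following minimal sketch shows. The function names are hypothetical, and the example assumes the Gamma-adjusted recovery $R=I/S^{\gamma}$ used in the paper's final recovery step, with $\gamma=0.6$ as in the reported experiments.

```python
import numpy as np

def illumination_lower_bound(I, gamma=0.6):
    """S_min = (max_c I^c)^(1/gamma) for an H x W x 3 image I in [0, 1]."""
    max_rgb = np.asarray(I, dtype=np.float64).max(axis=2)
    return max_rgb ** (1.0 / gamma)

def clamp_illumination(S, I, gamma=0.6):
    """Project S onto the color-consistency range [S_min, 1] per pixel."""
    return np.clip(S, illumination_lower_bound(I, gamma), 1.0)

# One pixel whose candidate illumination (0.05) is too small and would
# push the green channel out of the [0, 1] gamut.
gamma = 0.6
I = np.array([[[0.2, 0.4, 0.1]]])
S = clamp_illumination(np.array([[0.05]]), I, gamma)
R = I / (S[..., None] ** gamma)  # largest channel lands at the bound 1
```

After clamping, the largest channel of the recovered pixel reaches the bound 1 and the other channels keep their ratios to it, so no channel is truncated.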

Detail consistency. We formulate the detail consistency described by PBS from a perspective of edge consistency as follows: 1) If $I$ is smooth at pixel $p$ , then $R$ should also be smooth at $p$ ; 2) If $I$ has an edge at pixel $p$ , then $R$ should have a stronger, or at least equivalent edge at $p$ . By associating edge with gradient and directional derivative, the above two cases can be characterized as the following constraint:

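$$\left\{ \begin{array}{lc} \nabla R_{p} = 0, & |\nabla I_{p}| \leq \tau \\ \partial_{d} R_{p} / \partial_{d} I_{p} \geq 1, & |\nabla I_{p}| > \tau \end{array} \right. \tag{3}$$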

where $\nabla$ denotes the gradient operator. $\partial_{d\in\{x,y\}}$ is the first order derivative along the horizontal ( $x$ ) or vertical ( $y$ ) direction. $\tau$ is a small constant (typically 1e-5) for determining whether there is an edge at a pixel. Note that Eq. 3 can also be expressed in terms of $S$ by replacing $R$ with $I/S$ .
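To make the two cases of Eq. 3 concrete, the sketch below checks them on single-channel images. Several choices here are assumptions for illustration only: forward differences for the derivatives, a numerical tolerance `tol`, and testing $\partial_{d}R\cdot\partial_{d}I\geq(\partial_{d}I)^{2}$ in place of the ratio, which is equivalent when $\partial_{d}I\neq 0$ and avoids dividing by zero otherwise.

```python
import numpy as np

def forward_diff(X):
    """Forward differences along x (columns) and y (rows), zero at the border."""
    X = np.asarray(X, dtype=np.float64)
    dx = np.zeros_like(X)
    dy = np.zeros_like(X)
    dx[:, :-1] = X[:, 1:] - X[:, :-1]
    dy[:-1, :] = X[1:, :] - X[:-1, :]
    return dx, dy

def satisfies_detail_consistency(I, R, tau=1e-5, tol=1e-9):
    """Check the two cases of the detail consistency constraint."""
    Ix, Iy = forward_diff(I)
    Rx, Ry = forward_diff(R)
    smooth = np.hypot(Ix, Iy) <= tau
    # Case 1: where I is smooth, R must be smooth as well.
    smooth_ok = np.all(np.hypot(Rx, Ry)[smooth] <= tol)
    # Case 2: where I has an edge, R's directional derivative must be at
    # least as strong and have the same sign (ratio >= 1).
    edge = ~smooth
    edge_ok = all(
        np.all((dR * dI)[edge] >= (dI * dI)[edge] - tol)
        for dI, dR in ((Ix, Rx), (Iy, Ry))
    )
    return bool(smooth_ok and edge_ok)

ramp = np.tile(np.linspace(0.0, 0.4, 5), (4, 1))  # horizontal gradient
ok_brighter = satisfies_detail_consistency(ramp, 2.0 * ramp)
ok_flatter = satisfies_detail_consistency(ramp, 0.5 * ramp)
```

Doubling the ramp strengthens every edge and passes the check, while halving it weakens the edges and fails.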

Exposure distribution consistency. According to Eq. 1, the key to preserving the exposure distribution consistency is to ensure that $S$ is locally smooth for regions with similar brightness in the input. To this end, we alternatively adopt the relative total variation (RTV) measure [45] as the smoothness regularizer for obtaining piecewise smooth illumination, while maintaining prominent illumination discontinuities across regions. Adopting this regularizer can also help enhance image contrast, because when adjacent pixels $p$ and $q$ have similar illumination values ( $S_{p}\approx S_{q}$ ), their contrast in the enhanced image $R$ can be estimated as $|R_{p}-R_{q}|\approx|I_{p}-I_{q}|/S_{p}$ , which will be enhanced, since $S\leq 1$ . Note, other edge-aware smoothness regularizers [46, 47] can also work with our approach. Formally, the RTV measure is defined as

$$\mathrm{RTV}(S_{p}) = \mathcal{H}(S_{p}) + \mathcal{V}(S_{p}), \tag{4}$$

where $\mathcal{H}(S_{p})$ and $\mathcal{V}(S_{p})$ denote the $x$ - and $y$ -direction RTV measure, respectively. Specifically, the $x$ -direction measure $\mathcal{H}(S_{p})$ is written as

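$$\mathcal{H}(S_{p}) = \sum_{q \in \mathcal{N}_{p}} u_{q}^{x} w_{q}^{x} (\partial_{x} S_{q})^{2}, \tag{5}$$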

where $\mathcal{N}_{p}$ denotes a $15\times 15$ window centered at pixel $p$ . $u^{x}_{q}=G_{\sigma}*(|G_{\sigma}*\partial_{x}S_{q}|+\epsilon)^{-1}$ and $w^{x}_{q}=(|\partial_{x}S_{q}|+\epsilon)^{-1}$ , where $G_{\sigma}$ denotes a Gaussian kernel with standard deviation $\sigma=3$ , $*$ is the convolution operator, and $\epsilon=1e-3$ is used for preventing division by zero. $\mathcal{V}(S_{p})$ is defined similarly.
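A rough NumPy sketch of the $x$ -direction measure $\mathcal{H}(S_{p})$ is given below, using the stated window size, $\sigma$ and $\epsilon$ . The forward-difference derivative, the edge-replicating padding, the Gaussian truncation at $3\sigma$ , and all function names are assumptions made for illustration; the paper does not specify these details.

```python
import numpy as np

def gaussian_kernel(sigma=3.0):
    """Normalized 1-D Gaussian, truncated at 3 sigma (assumed truncation)."""
    radius = int(3 * sigma)
    x = np.arange(-radius, radius + 1)
    k = np.exp(-x**2 / (2.0 * sigma**2))
    return k / k.sum()

def filter_separable(X, k):
    """Apply the odd-length 1-D kernel k along both axes, keeping X's shape."""
    r = len(k) // 2
    P = np.pad(X, r, mode="edge")
    P = np.apply_along_axis(lambda v: np.convolve(v, k, mode="valid"), 1, P)
    P = np.apply_along_axis(lambda v: np.convolve(v, k, mode="valid"), 0, P)
    return P

def rtv_x(S, sigma=3.0, window=15, eps=1e-3):
    """x-direction measure: windowed sum of u * w * (d_x S)^2."""
    S = np.asarray(S, dtype=np.float64)
    dx = np.zeros_like(S)
    dx[:, :-1] = S[:, 1:] - S[:, :-1]            # forward difference
    g = gaussian_kernel(sigma)
    u = filter_separable(1.0 / (np.abs(filter_separable(dx, g)) + eps), g)
    w = 1.0 / (np.abs(dx) + eps)
    # An unnormalized box filter sums over the window centered at each pixel.
    return filter_separable(u * w * dx**2, np.ones(window))

S_step = np.zeros((20, 20))
S_step[:, 10:] = 1.0
H_step = rtv_x(S_step)
```

The $y$ -direction measure would follow by differencing along rows instead of columns.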

III-D PBS-constrained Illumination Estimation

This section illustrates how we formulate underexposed photo enhancement as PBS-constrained illumination estimation. We first introduce how to obtain an initial illumination for an input image. Then, we adopt the PBS constraints and design an optimization framework for refining the initial illumination, so that we can obtain an illumination that is able to recover a PBS-satisfying enhanced image.

Intuitively, the brightness of different areas in an image roughly reflects the magnitude of illumination. Hence, inspired by [18], we obtain the initial illumination $S^{\prime}$ by treating the maximum values among the RGB color channels of the input image $I$ as the illumination values, which is expressed as

$$S^{\prime}_{p} = \max I_{p}^{c}, \quad \forall c \in \{r, g, b\}. \tag{6}$$

As analyzed by [10], by this means, the initial illumination can better model the global illumination distribution, and also ensures that the enhanced image $R$ will be less saturated.

Although the initial illumination roughly depicts the overall illumination distribution, it typically contains rich details and textures that are not caused by illumination discontinuities, making the enhanced image directly recovered from it visually unrealistic, as shown in Fig. 4(c). Hence, we propose to estimate a refined illumination $S$ that satisfies the PBS constraints on illumination. To this end, we formulate the following objective function for estimating the desired illumination $S$ :

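$$\begin{split} &\mathop{\arg\min}\limits_{S} \sum\limits_{p} (S_{p} - S^{\prime}_{p})^{2} + \lambda \big( \mathcal{H}(S_{p}) + \mathcal{V}(S_{p}) \big), \quad \mathrm{s.t.} \\ &S^{\min}_{p} \leq S_{p} \leq 1, \quad \left\{ \begin{array}{lc} \nabla (I_{p}/S_{p}) = 0, & |\nabla I_{p}| \leq \tau \\ \partial_{d}(I_{p}/S_{p}) / \partial_{d} I_{p} \geq 1, & |\nabla I_{p}| > \tau \end{array} \right. \end{split} \tag{7}$$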

where $\lambda$ is the balancing weight. The first term $(S_{p}-S^{\prime}_{p})^{2}$ forces the target illumination to be close to the initial illumination in structure, while the second term and the two constraints encode the PBS constraints. The objective function in Eq. 7 can be solved by introducing auxiliary variables to divide the intractable problem into several tractable subproblems (see [30] for details). With the refined illumination $S$ , the final enhanced image is recovered by $R=I/S^{\gamma}$ . Fig. 4 shows an example image enhanced by the proposed PBS-constrained illumination estimation. We can see that the refined illumination removes the redundant texture details in the initial illumination and yields a more appealing enhancement result.

III-E Model Analysis

Effectiveness of each PBS constraint. Fig. 5 validates the effectiveness of each PBS constraint. We can see that the skin color is obviously distorted when we remove the color consistency constraint (see Fig. 5(b)), while removing the detail consistency constraint makes the grass as well as the face and arm overexposed (see Fig. 5(c)). Without the exposure distribution consistency constraint, the enhanced image shows unpleasing exposure inconsistency around the body (see Fig. 5(d)), while these regions have similar exposure level in the input image. Last, by combining all the three PBS constraints, we obtain a visually pleasing result with clear details, vivid color, distinct contrast and consistent exposure distribution, as shown in Fig. 5(e).

Parameter setting. The key parameter of our approach is $\lambda$ , which determines the smoothness level of the estimated illumination. In general, we set a large $\lambda$ for highly textured images. $\gamma$ is another parameter that affects the result quality. In all our experiments, we empirically set $\lambda=0.8$ and $\gamma=0.6$ , which are able to produce reasonably good results for our test images. Fig. 6 evaluates the effect of varying $\lambda$ and $\gamma$ . As shown in the first row, a large $\lambda$ produces results with strong local contrast. However, this effect becomes less obvious when $\lambda>0.8$ . As a large $\lambda$ typically requires more iterations to converge, we fix $\lambda=0.8$ as a trade-off. The second row of Fig. 6 shows how $\gamma$ affects the results. We can see that the result without Gamma adjustment (namely $\gamma=1$ ) is also satisfactory, but too bright to be aesthetically pleasing. Decreasing $\gamma$ reduces the overall brightness, but at the cost of lowering the overall visibility. To obtain better visual results, we set $\gamma=0.6$ for our test images.

Convergence analysis. The PBS-constrained illumination estimation optimization in Eq. 7 stops iteration when: (i) the difference between two consecutive solutions is less than a small threshold (1e-3), or (ii) the maximum number of iterations (we empirically set it as 20) is reached. Fig. 7 shows the convergence curve for an example image. As shown, the illumination estimation converges after 7 iterations, and more iterations barely improve the result.
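The two stopping criteria can be sketched as a generic iteration loop. The `step` callable below is a hypothetical stand-in for one pass of the actual solver, which is described in [30] and not reproduced here, and measuring the difference between consecutive solutions by the maximum absolute change is an assumption.

```python
import numpy as np

def iterate_until_converged(step, S0, tol=1e-3, max_iters=20):
    """Apply `step` repeatedly; stop when consecutive solutions differ by
    less than `tol` or after `max_iters` iterations."""
    S = np.asarray(S0, dtype=np.float64)
    iters = 0
    for iters in range(1, max_iters + 1):
        S_next = step(S)
        diff = np.max(np.abs(S_next - S))  # assumed difference measure
        S = S_next
        if diff < tol:
            break
    return S, iters

# Toy update with fixed point 0.5, standing in for a real solver pass.
toy_step = lambda S: 0.5 * S + 0.25
S_final, n_iters = iterate_until_converged(toy_step, np.zeros((4, 4)))
```

For this toy update the change halves every pass, so the tolerance is met well before the iteration cap.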

III-F Efficient Implementation

The PBS-constrained illumination estimation in Eq. 7 runs slowly in practice compared with [37, 10, 11], because it involves iteratively solving a set of subproblems. To make it more efficient and scalable to high-resolution images, we introduce an efficient computation for it.

Considering that illumination in natural images is generally piece-wise smooth and very suitable for edge-aware sampling, we propose to sample a low-resolution (low-res) input for illumination estimation, and upsample the low-res illumination to full-resolution (full-res) for enhancing the full-res underexposed image. Specifically, we first downsample the input image with its larger dimension (width or height) no more than 400 pixels, and perform illumination estimation on the downsampled low-res input. Then, we apply joint bilateral upsampling (JBU) [48] to transform the low-res illumination $\bar{S}$ to full-res version $S$ in an edge-aware manner, which is expressed as

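$$S_{p} = \frac{1}{\mathcal{Z}_{p}} \sum_{q_{\downarrow} \in \Omega_{p_{\downarrow}}} \bar{S}_{q_{\downarrow}} f(\| p_{\downarrow} - q_{\downarrow} \|) g(S^{\prime}_{p} - S^{\prime}_{q}), \tag{8}$$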

where $S^{\prime}$ is the initial illumination (full-res) obtained from Eq. 6. $p$ and $q$ denote coordinates of pixels in $S$ and $S^{\prime}$ . $p_{\downarrow}$ and $q_{\downarrow}$ denote coordinates of pixels in the low-res solution $\bar{S}$ . $f$ and $g$ are spatial and range filter kernels in terms of truncated Gaussian with standard deviation $\sigma_{d}=0.5$ and $\sigma_{r}=0.1$ , respectively. $\Omega$ denotes a $5\times 5$ window centered at pixel $p_{\downarrow}$ . $\mathcal{Z}_{p}$ is the normalizing factor that sums the filter weight $f(\cdot)g(\cdot)$ . Using the above implementation, the runtime for enhancing a 685 $\times$ 1024 image in Fig. 8(a) drops from 3 seconds to 0.3 seconds on a PC with Core i5-7400 CPU, while the enhanced image is visually indistinguishable from that of the naive implementation, as shown in Fig. 8.
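The upsampling in Eq. 8 can be sketched with plain loops as below. This is an unoptimized illustration, not the authors' implementation: the mapping between low-res and full-res coordinates by simple scaling and rounding, the skipping of out-of-range window positions, and the function name are all assumptions.

```python
import numpy as np

def joint_bilateral_upsample(S_low, guide, sigma_d=0.5, sigma_r=0.1, radius=2):
    """Upsample low-res illumination S_low to the size of `guide`
    (the full-res initial illumination S') in an edge-aware manner."""
    H, W = guide.shape
    h, w = S_low.shape
    sy, sx = h / H, w / W                      # full-res -> low-res scale
    out = np.empty((H, W))
    for y in range(H):
        for x in range(W):
            yl, xl = y * sy, x * sx            # fractional low-res position
            cy = min(int(round(yl)), h - 1)
            cx = min(int(round(xl)), w - 1)
            num = den = 0.0
            for qy in range(cy - radius, cy + radius + 1):
                for qx in range(cx - radius, cx + radius + 1):
                    if not (0 <= qy < h and 0 <= qx < w):
                        continue
                    # Full-res pixel corresponding to the low-res sample q.
                    gy = min(int(round(qy / sy)), H - 1)
                    gx = min(int(round(qx / sx)), W - 1)
                    dist2 = (qy - yl) ** 2 + (qx - xl) ** 2
                    f = np.exp(-dist2 / (2.0 * sigma_d**2))    # spatial
                    diff = guide[y, x] - guide[gy, gx]
                    g = np.exp(-diff**2 / (2.0 * sigma_r**2))  # range
                    num += S_low[qy, qx] * f * g
                    den += f * g
            out[y, x] = num / den
    return out

# A step edge in the guide keeps the upsampled illumination sharp.
guide = np.full((8, 8), 0.2)
guide[:, 4:] = 0.8
S_low = np.full((4, 4), 0.2)
S_low[:, 2:] = 0.8
S_full = joint_bilateral_upsample(S_low, guide)
```

Because the range kernel suppresses samples across the guide's edge, pixels on either side of the step keep their own side's illumination instead of blending the two.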

IV Underexposed Video Enhancement

This section describes how we extend our method to handle underexposed videos. Applying the PBS-constrained illumination estimation to each video frame independently tends to cause temporal inconsistencies in the form of jittering artifacts, and naively extending the illumination estimation to the entire video is computationally expensive. We therefore propose to obtain a temporally coherent illumination sequence by propagating illuminations of sparsely sampled keyframes to the others. Fig. 9 shows the pipeline of our method for enhancing underexposed videos. For a given underexposed video, we first sample some keyframes, and then perform illumination estimation to obtain their illuminations. Next, we propagate these illuminations to other temporally adjacent frames. Finally, a video denoising operation is applied to remove noise in the enhanced video recovered from the obtained illumination sequence. In the following we describe each step in detail.

IV-A Keyframe Extraction & Illumination Estimation

The first step in our underexposed video enhancement pipeline searches for keyframes. Intuitively, keyframes that approximately depict the overall illumination changes of the source video are required to allow reliable illumination propagation. Based on this observation, we begin by taking the first frame as a keyframe, and then select the nearest frame that differs from the first keyframe in luminance at more than $30\%$ of its pixels as the second keyframe. The third keyframe is similarly determined based on the second keyframe. We iteratively perform the above operation to collect all keyframes. Note that we compute the luminance difference in Lab color space, and consider two pixels to be different in luminance if the normalized difference is no less than a threshold $\ell=0.1$ . In addition, Gaussian smoothing is applied to the luminance channel of the source underexposed video to reduce the effect of noise before extracting the keyframes.
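The keyframe selection rule can be sketched as follows, assuming the frames are supplied as already smoothed luminance maps normalized to [0,1]. The function name is hypothetical, and the Lab conversion and Gaussian smoothing are left out of the sketch.

```python
import numpy as np

def select_keyframes(lum_frames, pixel_thresh=0.1, frac_thresh=0.3):
    """Return indices of keyframes: the first frame, then each frame that
    differs from the latest keyframe at more than `frac_thresh` of pixels."""
    if len(lum_frames) == 0:
        return []
    keys = [0]
    ref = np.asarray(lum_frames[0], dtype=np.float64)
    for t in range(1, len(lum_frames)):
        cur = np.asarray(lum_frames[t], dtype=np.float64)
        # A pixel counts as changed if its luminance differs by >= 0.1.
        changed = np.abs(cur - ref) >= pixel_thresh
        if changed.mean() > frac_thresh:
            keys.append(t)
            ref = cur
    return keys

# Five constant frames whose brightness drifts upward over time.
levels = [0.20, 0.22, 0.35, 0.36, 0.50]
frames = [np.full((4, 4), v) for v in levels]
keyframes = select_keyframes(frames)
```

In this toy sequence, frames 2 and 4 each differ from the preceding keyframe by 0.15 at every pixel, so they become keyframes alongside frame 0.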

The second step in our pipeline estimates illumination for the collected keyframes. In order to achieve higher efficiency, the illumination estimation optimization in Eq. 7 together with the efficient implementation in Eq. 8 are employed to obtain illumination for each keyframe.

IV-B Temporal Illumination Propagation

In the third step, we propagate the illuminations of the keyframes to the rest of the video frames. For each keyframe, we propagate its illumination over successive frames until a new keyframe is found, which starts a new round of illumination propagation. We repeat this illumination propagation until we reach the end of the video. Fig. 10 shows how our illumination propagation works.
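The control flow of this step can be sketched as below. The two callables `estimate` and `propagate` are hypothetical placeholders standing in for the keyframe illumination estimation and the per-frame propagation, respectively; only the ordering of operations is taken from the description above.

```python
def illumination_sequence(frames, keyframes, estimate, propagate):
    """Build one illumination per frame.

    `estimate(frame)` and `propagate(prev_frame, prev_illum, frame)` are
    hypothetical callables supplied by the caller. Keyframes get a freshly
    estimated illumination; every other frame inherits from its predecessor.
    """
    keyset = set(keyframes)
    if 0 not in keyset:
        raise ValueError("the first frame must be a keyframe")
    illum = []
    for t, frame in enumerate(frames):
        if t in keyset:
            illum.append(estimate(frame))  # start a new propagation round
        else:
            illum.append(propagate(frames[t - 1], illum[t - 1], frame))
    return illum
```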

Let $f_{t}$ ($t=1,2,\ldots$) be the frames of an input video. For a pixel $p$ in a frame $f_{t}$ that is not a keyframe (so its illumination is unknown at this point), with luminance (i.e., the Y channel in YUV color space) denoted as $L_{p,t}$, we aim to predict its most likely illumination value based on the previous frame $f_{t-1}$ (either a keyframe or a frame whose illumination has already been propagated) using a Bayesian formulation. To simplify the problem, we construct a histogram $H$ of 16 bins for the illumination values of the frame $f_{t-1}$, where $H_{i}$ denotes the $i$-th bin and $|H_{i}|$ returns the number of pixels assigned to the bin. In this way, the illumination propagation problem reduces to finding the histogram bin of the previous frame $f_{t-1}$ that pixel $p$ in the current frame $f_{t}$ belongs to. To achieve this, we introduce a probabilistic approach that finds the bin index maximizing the posterior probability $P(H_{i}|p)$ by solving a Maximum A Posteriori (MAP) problem as

$$i^{*}=\arg\max_{i}P(H_{i}|p)=\arg\max_{i}P(p|H_{i})\,P(H_{i}),$$

where $P(p|H_{i})$ denotes the likelihood that pixel $p$ belongs to the bin $H_{i}$, and $P(H_{i})$ is a prior. Below we describe these two terms in detail.
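As a small illustration of the bin selection, the sketch below picks the bin that maximizes the product of likelihood and prior, which is the MAP choice because the evidence term does not depend on the bin. The helper names `bin_index` and `map_bin` are hypothetical, and equal-width bins over $[0,1]$ are an assumption, since the bin layout is not specified here.

```python
import numpy as np

NUM_BINS = 16  # number of histogram bins used for the previous frame

def bin_index(illum, num_bins=NUM_BINS):
    """Map illumination values in [0, 1] to bin indices 0..num_bins-1.

    Assumption: equal-width bins over [0, 1].
    """
    idx = np.floor(np.asarray(illum, dtype=np.float64) * num_bins).astype(int)
    return np.clip(idx, 0, num_bins - 1)

def map_bin(likelihood, prior):
    """Return the index i maximizing P(p|H_i) * P(H_i).

    `likelihood` and `prior` are 1-D arrays with one entry per bin.
    """
    posterior = np.asarray(likelihood, dtype=np.float64) * np.asarray(prior, dtype=np.float64)
    return int(np.argmax(posterior))
```

For instance, with likelihoods `[0.1, 0.6, 0.3]` and priors `[0.5, 0.1, 0.4]`, the products are `0.05`, `0.06` and `0.12`, so the third bin is selected even though the second has the highest likelihood.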

We compute the likelihood that the illumination value of a pixel $p$ in frame $f_{t}$ belongs to the bin $H_{i}$ based on the probability density function of $H_{i}$. Our main idea is to employ standard non-parametric density estimation to measure the compatibility of assigning a pixel to a bin. By adopting the Parzen window-based approximation, we define $P(p|H_{i})$ as

[TABLE]

where $\Psi_{p^{\prime}}$ denotes an $N\times N$ ($N=30$) square window centered at pixel $p^{\prime}$ in frame $f_{t-1}$, and $p^{\prime}=p+v_{p}$ is the pixel corresponding to $p$ (in frame $f_{t}$), as indicated by the motion vector $v_{p}$ between $f_{t}$ and $f_{t-1}$. $q$ indexes pixels within the window $\Psi_{p^{\prime}}$, $L_{q,t-1}$ denotes the luminance value of the pixel $q$ in frame $f_{t-1}$, and $\phi_{i}(L_{q,t-1})$ returns the number of pixels with luminance value $L_{q,t-1}$ in the $i$-th bin $H_{i}$. $\mathcal{G}$ is a Parzen window defined by a 1-D Gaussian kernel function with width $d=5$. Note that the optical flow of the source video is computed by the method of [49].

Since explicitly computing $P(H_{i})$ is difficult, we instead follow common MAP solutions [50] and devise a smoothness term to approximate the prior $P(H_{i})$. For the pixel $p^{\prime}$ (in $f_{t-1}$) computed from $p$ (in $f_{t}$) by optical flow, we define $D(p^{\prime},q^{\prime})$ as the Euclidean distance between pixels $p^{\prime}$ and $q^{\prime}$, where $q^{\prime}$ denotes a pixel within a square window centered at $p^{\prime}$ that belongs to the $i$-th bin $H_{i}$. Formally, the prior $P(H_{i})$ is formulated as

[TABLE]

Note that $P(H_{i})$ can reasonably be treated as a prior since it is independent of the luminance of a pixel, and it can be computed once the previous frame has been processed.

IV-C Video Denoising

While the proposed method can robustly enhance underexposed videos, it may also amplify the underlying noise. Unlike in still images, noise is usually non-negligible in dynamic video. Thus, to further improve the visual quality, the final step employs a video denoising operation to reduce the noise level of the enhanced video. To balance denoising performance against runtime efficiency, we adopt V-BM4D [51], though any other video denoising algorithm would also work with our method.

IV-D Result and Comparison

Fig. 11 shows an example underexposed video enhanced by our approach. As can be seen, by obtaining the illumination sequence, we successfully light up the underexposed regions and reveal the underlying texture details of the umbrella. The video denoising operation further reduces the noise level of the enhanced video and generates a better result. Fig. 12 compares our method with the previous underexposed video enhancement methods PPVE [33] and PDPF [35]. We can see that [33] produces an over-saturated result and induces clear jittering artifacts around the legs, while the result of [35] fails to present distinct contrast and vivid color. In comparison, our method produces a more appealing result. Note that the average numbers of frames between two adjacently sampled keyframes for the videos shown in Fig. 11 and Fig. 12 are 25 and 19, respectively.

We also follow [35] to evaluate video enhancement performance via a user study. Specifically, we use five videos from [35] for testing. For each video, we ask 10 subjects to rate the enhancement results produced by [33, 35] and our method in terms of temporal consistency and visual effect, using a rating scale from 1 (worst) to 3 (best). As shown in Fig. 15, the rating distribution indicates that the results produced by our method are preferred by human subjects. Table I further reports the mean and standard deviation of the DE and NIQE scores (see Section V-A for details of the two metrics) for the video enhancement results employed in the user study. As shown, our method outperforms the other two compared methods, since it achieves higher DE and lower NIQE values. Besides, our method also achieves a lower standard deviation on the two metrics, demonstrating that it better preserves the overall temporal consistency.

V Experiment

V-A Datasets and Evaluation Metrics

Benchmark datasets. We employ six benchmark datasets to evaluate our method: the NPE dataset [7], the MEF dataset [52], the MF dataset [37], the LIME dataset [10], the VV dataset (https://sites.google.com/site/vonikakis/datasets) and the FiveK dataset [22]. Note that, for the FiveK dataset, we randomly select 100 underexposed images for evaluation, while the remaining 4900 images are used for training the HDRNet method [11] to be compared.

Evaluation metrics. Since most benchmark datasets do not provide ground truth enhanced images, we employ two commonly used no-reference metrics to quantitatively evaluate the algorithm performance. The first one is DE (discrete entropy) [53], which measures the performance of detail/contrast enhancement. The second one is NIQE (natural image quality evaluator) [54], which is a learned model for assessing the overall naturalness of images. In general, a high DE value of an enhanced image means that the detail visibility of the original image is better improved, while a low NIQE value indicates that the enhanced image has good naturalness. Although this does not hold in every case, high DE and low NIQE values usually indicate reasonably good results.
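To make the DE metric concrete, the sketch below computes the Shannon entropy of an image's gray-level histogram. It assumes an 8-bit grayscale input and a base-2 logarithm; the exact variant used in [53] may differ in such details, and the function name `discrete_entropy` is hypothetical.

```python
import numpy as np

def discrete_entropy(gray, levels=256):
    """Shannon entropy (in bits) of the gray-level histogram.

    Assumption: `gray` holds integer intensities in 0..levels-1.
    """
    gray = np.asarray(gray)
    counts = np.bincount(gray.ravel().astype(np.int64), minlength=levels)
    p = counts / counts.sum()   # empirical probability of each gray level
    p = p[p > 0]                # empty levels contribute nothing (0 * log 0 = 0)
    return float(-(p * np.log2(p)).sum())
```

A constant image has zero entropy, while an 8-bit image that uses all 256 levels equally often reaches the maximum of 8 bits, which is why a well-enhanced image with a wider spread of intensities scores higher.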

V-B Comparison with State-of-the-art Methods

We compare our method with six recent underexposed photo enhancement methods: NPE [7], WVM [8], JieP [9], LIME [10], HDRNet [11] and DPE [12]. The first four are Retinex-based methods, while the last two are deep-learning-based methods. For a fair comparison, we obtain the results of the compared methods either from the online demo programs or by running the implementations provided by the authors with the recommended parameter settings. In the following, we conduct the comparison in three aspects: visual comparison, quantitative comparison, and a user study.

Visual comparison. We first show visual comparisons in Fig. 13 and 14 on two challenging cases from the employed datasets: (i) a non-uniformly exposed photo with dim candlelight and imperceptible scene details (from the MEF dataset), and (ii) a uniformly underexposed photo in which few details of the crawling baby are visible (from the FiveK dataset). Comparing the results, we can see that our method outperforms the compared methods and has the following two advantages. First, it is able to recover more details and better contrast for the underexposed regions, without degrading other parts of the image. Second, it reveals more vivid and natural colors, which makes our enhanced images look more realistic. Please see the supplementary material for more visual comparisons between our method and the state-of-the-art methods.

Quantitative comparison. Second, we quantitatively evaluate the performance of our method by comparing it with other methods in terms of the DE and NIQE metrics. Table II reports the quantitative comparison results. Note that the original average DE and NIQE values for each dataset are also shown for reference. As can be seen, all methods increase the DE value due to the detail/contrast enhancement, and reduce the NIQE value by lightening the underexposed regions. Among them, our method achieves higher DE and lower NIQE than the other compared methods on almost all the datasets, which indicates that our method not only recovers clearer details and more distinct contrast, but also better preserves the overall naturalness and photorealism of the enhanced images.

User study. Since evaluating the visual quality of the enhanced images involves personal preference, we further conducted a user study to compare the results. To this end, we enhanced each test image in the six employed datasets using our method and the other six compared methods, and recruited 100 subjects via Amazon Mechanical Turk to rate the results. Specifically, for each test image, each subject was asked to rate seven different enhancement results (ours and those of the other six methods) using a Likert scale from 1 (worst) to 7 (best), according to the following common requirements for the results: (i) clear details and distinct contrast, (ii) natural and vivid color, (iii) no loss of detail and no overexposure, (iv) well-preserved photorealism. To avoid subjective bias, the results were anonymized and presented to the subjects in random order. After the subjects finished rating all the results, we computed the average rating obtained by each method on each dataset. Fig. 16 summarizes the ratings, where we can see that our method receives higher ratings than the others, demonstrating that the results generated by our algorithm are, on average, preferred by human subjects.

V-C More Analysis

Relationship to color constancy. Our approach can also be extended to produce a visually plausible color constancy effect by performing the illumination estimation separately on each RGB channel. As shown in Fig. 17, compared with the properly exposed image, our method not only improves the scene visibility of the underexposed image, but also partially removes the color of the candlelight, e.g., on the background curtain. Note that color constancy is a challenging problem, and low-light conditions make it even more difficult. Our three-channel illumination map extension is only a very simple attempt at this problem. Hence, it may not always produce a satisfactory color constancy effect, e.g., the desktop in Fig. 17(c).

Application to overexposure correction. Our method is also applicable to overexposure correction. As found by [55], the inverted version of an overexposed image can be seen as an underexposed image, allowing us to fix overexposed regions by enhancing the corresponding underexposed regions in the inverted image. For a given overexposed image $I$ , we first compute its inverted image $\hat{I}$ by $\hat{I}=1-I$ . Then we perform illumination estimation on $\hat{I}$ to obtain the illumination $\hat{S}$ , from which we recover the enhanced image $\hat{R}$ . Finally, we get the overexposure corrected result $R$ by performing another inversion operation $R=1-\hat{R}$ . Fig. 18 shows two examples.
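The invert, enhance, invert procedure can be sketched as follows. The callable `estimate_illumination` is a hypothetical placeholder for the illumination estimation, the recovery $\hat{R}=\hat{I}/\hat{S}$ is the standard Retinex division assumed here, and the clipping with a small `eps` is an added safeguard against division by zero rather than part of the described method.

```python
import numpy as np

def correct_overexposure(image, estimate_illumination, eps=1e-3):
    """Correct an overexposed image with values in [0, 1].

    `estimate_illumination` is a hypothetical callable returning an
    illumination map with the same shape as its input.
    """
    image = np.asarray(image, dtype=np.float64)
    inverted = 1.0 - image                            # I_hat = 1 - I
    illum = np.clip(estimate_illumination(inverted), eps, 1.0)  # S_hat
    enhanced = np.clip(inverted / illum, 0.0, 1.0)    # R_hat = I_hat / S_hat
    return 1.0 - enhanced                             # R = 1 - R_hat
```

With a constant illumination of 0.5, a pixel of value 0.9 is inverted to 0.1, brightened to 0.2, and inverted back to 0.8, so the overexposed pixel is pulled down as intended.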

Limitations. Our method has limitations. As shown in Fig. 19, our method and the compared state-of-the-art methods all fail to produce visually compelling results for the test image in Fig. 19(a), since the regions of the knight and the horse are almost black and contain barely any texture or detail. Another limitation is that our method may amplify noise together with the fine-scale details when the input image is noisy.

V-D Additional Results

Fig. 20 shows more results produced by our method, where the underexposed images are diverse and involve various lighting conditions, including: (i) a nighttime outdoor image with an irregular light source in the center of the image (1st column), (ii) an evenly exposed image with little detail of the dog and the grassland (2nd column), (iii) an indoor image with underexposed objects on the desk (3rd column) and (iv) an unevenly exposed image with the sky normally exposed and the building underexposed (4th column). As shown, our method produces good results in all these cases.

VI Conclusion and Future Work

We have presented an approach for enhancing underexposed photos. We reveal why previous methods tend to produce visually unpleasing results from the perspective of perceptual consistency of visual information, and accordingly propose perceptually bidirectional similarity (PBS) for explicitly describing how to maintain the perceptual consistency. We then design a PBS-constrained illumination estimation for enhancing underexposed photos while avoiding the common visual artifacts. In addition, we extend our method to handle underexposed videos by introducing a probabilistic approach for propagating illumination along the temporal dimension. We have performed extensive experiments on six benchmark datasets, and compared our method with various state-of-the-art methods to demonstrate its superiority.

Acknowledgment

The authors would like to thank the anonymous reviewers for their constructive comments. This work was partially supported by the National Key Research and Development Program of China (2016YFB1001001), NSFC (61802453, U1911401, U1811461, 61902275), Fundamental Research Funds for the Central Universities (19lgpy216, D2190670), Guangdong Province Science and Technology Innovation Leading Talents (2016TX03X157), Guangdong NSF Project (2018B030312002, 2019A1515010860), Guangzhou Research Project (201902010037), and Research Projects of Zhejiang Lab (2019KD0AB03). The corresponding author of this work is Wei-Shi Zheng.

Bibliography

[1] X. Liang, L. Lin, W. Yang, P. Luo, J. Huang, and S. Yan, “Clothes co-parsing via joint image segmentation and labeling with application to clothing retrieval,” IEEE Transactions on Multimedia, vol. 18, no. 6, pp. 1175–1186, 2016.
[2] B. Kang, Y. Lee, and T. Q. Nguyen, “Depth-adaptive deep neural network for semantic segmentation,” IEEE Transactions on Multimedia, vol. 20, no. 9, pp. 2478–2490, 2018.
[3] J. C. Nascimento and J. S. Marques, “Performance evaluation of object detection algorithms for video surveillance,” IEEE Transactions on Multimedia, vol. 8, no. 4, pp. 761–774, 2006.
[4] X. Dong, J. Shen, D. Yu, W. Wang, J. Liu, and H. Huang, “Occlusion-aware real-time object tracking,” IEEE Transactions on Multimedia, vol. 19, no. 4, pp. 763–771, 2016.
[5] H. Li, F. Meng, and K. N. Ngan, “Co-salient object detection from multiple images,” IEEE Transactions on Multimedia, vol. 15, no. 8, pp. 1896–1909, 2013.
[6] X. Lin, Z.-J. Wang, L. Ma, and X. Wu, “Saliency detection via multi-scale global cues,” IEEE Transactions on Multimedia, 2018.
[7] S. Wang, J. Zheng, H.-M. Hu, and B. Li, “Naturalness preserved enhancement algorithm for non-uniform illumination images,” IEEE Transactions on Image Processing, vol. 22, no. 9, pp. 3538–3548, 2013.
[8] X. Fu, D. Zeng, Y. Huang, X.-P. Zhang, and X. Ding, “A weighted variational model for simultaneous reflectance and illumination estimation,” in CVPR, 2016, pp. 2782–2790.