Total Directional Variation for Video Denoising
Simone Parisotto, Carola-Bibiane Schönlieb

TL;DR
This paper introduces a variational video denoising method using total directional variation (TDV) regularisation, which leverages anisotropic structure encoding via a volumetric structure tensor to improve denoising performance.
Contribution
The paper extends the TDV regulariser to video denoising by incorporating a volumetric structure tensor, enhancing the preservation of anisotropic features in videos.
Findings
Outperforms V-BM3D and V-BM4D in PSNR on the synthetic Franke videos and is comparable on natural videos, particularly at high noise levels
Effectively captures anisotropic structures in videos
Demonstrates improved denoising quality in numerical experiments
Abstract
In this paper, we propose a variational approach for video denoising based on the total directional variation (TDV) regulariser proposed in Parisotto et al. (2018) for image denoising and interpolation. In the TDV regulariser, the underlying image structure is encoded by means of weighted derivatives so as to enhance the anisotropic structures in images, e.g. stripes or curves with a dominant local directionality. For the extension of TDV to video denoising, the space-time structure is captured by the volumetric structure tensor guiding the smoothing process. We discuss this and present our whole video denoising workflow. Our numerical results are compared with some state-of-the-art video denoising methods.
Table 1: PSNR (dB) obtained with the optimal parameters. Values in parentheses are the parameters used.

| Name | Noise std | Input | V-BM3D | V-BM4D | TDV (smoothing, smoothing, regularisation) | ROF 2D+t (regularisation) |
|---|---|---|---|---|---|---|
| Franke grey-scale | 10 | 28.13 | 45.99 | 46.90 | 49.16 (1.66, 1.71, 16.27) | 42.56 (16.27) |
| Franke grey-scale | 20 | 22.11 | 41.64 | 42.67 | 45.23 (2.00, 2.00, 8.10) | 38.18 (8.10) |
| Franke grey-scale | 35 | 17.25 | 38.63 | 39.34 | 41.89 (2.40, 2.40, 4.70) | 34.59 (4.70) |
| Franke grey-scale | 50 | 14.15 | 36.37 | 37.17 | 39.64 (2.70, 2.70, 3.30) | 32.30 (3.30) |
| Franke grey-scale | 70 | 11.23 | 30.60 | 35.03 | 37.44 (3.00, 3.00, 2.45) | 30.45 (2.45) |
| Franke coloured | 10 | 28.13 | 47.13 | 48.21 | 50.51 (1.89, 1.92, 16.59) | 44.10 (16.59) |
| Franke coloured | 20 | 22.11 | 42.96 | 43.97 | 46.46 (2.35, 2.35, 8.35) | 39.93 (8.35) |
| Franke coloured | 35 | 17.25 | 40.18 | 40.47 | 42.97 (2.79, 2.83, 4.74) | 36.36 (4.74) |
| Franke coloured | 50 | 14.15 | 38.11 | 38.15 | 40.74 (3.13, 3.17, 3.45) | 34.41 (3.45) |
| Franke coloured | 70 | 11.23 | 31.72 | 35.90 | 38.62 (3.50, 3.50, 2.45) | 32.29 (2.45) |
| Salesman | 10 | 28.13 | 37.30 | 37.12 | 35.24 (0.55, 0.68, 29.25) | 31.48 (29.25) |
| Salesman | 20 | 22.11 | 34.13 | 33.33 | 31.96 (0.70, 0.75, 13.93) | 28.16 (13.93) |
| Salesman | 35 | 17.25 | 30.79 | 30.20 | 29.36 (0.89, 0.89, 7.95) | 26.01 (7.95) |
| Salesman | 50 | 14.15 | 28.32 | 28.33 | 27.78 (1.05, 1.06, 5.45) | 24.78 (5.45) |
| Salesman | 70 | 11.23 | 24.55 | 26.68 | 26.34 (1.27, 1.32, 3.96) | 23.87 (3.96) |
| Water | 10 | 28.13 | 43.83 | 44.68 | 43.13 (0.93, 1.15, 25.75) | 39.18 (25.75) |
| Water | 20 | 22.11 | 40.59 | 41.02 | 39.84 (1.18, 1.35, 12.60) | 35.94 (12.60) |
| Water | 35 | 17.25 | 37.75 | 37.90 | 37.14 (1.40, 1.40, 6.95) | 33.36 (6.95) |
| Water | 50 | 14.15 | 35.58 | 35.85 | 35.41 (1.61, 1.65, 4.80) | 31.83 (4.80) |
| Water | 70 | 11.23 | 30.11 | 33.87 | 33.78 (1.80, 1.85, 3.45) | 30.51 (3.45) |
Table 2: PSNR (dB) obtained with the quasi-optimal parameters from the rule of thumb (19). Values in parentheses are the parameters used.

| Name | Noise std | Input | V-BM3D | V-BM4D | TDV (smoothing, smoothing, regularisation) | ROF 2D+t (regularisation) |
|---|---|---|---|---|---|---|
| Miss America | 10 | 28.13 | 39.64 | 39.93 | 39.25 (0.63, 0.63, 25.50) | 36.93 (25.50) |
| Miss America | 20 | 22.11 | 37.95 | 37.78 | 37.28 (0.90, 0.90, 12.75) | 34.60 (12.75) |
| Miss America | 35 | 17.25 | 36.03 | 35.77 | 35.44 (1.19, 1.19, 7.29) | 32.72 (7.29) |
| Miss America | 50 | 14.15 | 34.19 | 34.26 | 34.14 (1.42, 1.42, 5.10) | 31.47 (5.10) |
| Miss America | 70 | 11.23 | 28.86 | 32.64 | 32.85 (1.68, 1.68, 3.64) | 30.29 (3.64) |
| Miss America | 90 | 9.05 | 27.42 | 31.27 | 31.87 (1.90, 1.90, 2.83) | 29.42 (2.83) |
| Xylophone coloured | 10 | 28.13 | 37.82 | 37.49 | 35.96 (0.63, 0.63, 25.50) | 32.70 (25.50) |
| Xylophone coloured | 20 | 22.11 | 34.70 | 34.13 | 33.06 (0.90, 0.90, 12.75) | 29.57 (12.75) |
| Xylophone coloured | 35 | 17.25 | 32.06 | 31.65 | 30.93 (1.19, 1.19, 7.29) | 27.16 (7.29) |
| Xylophone coloured | 50 | 14.15 | 29.98 | 30.07 | 29.58 (1.42, 1.42, 5.10) | 25.72 (5.10) |
| Xylophone coloured | 70 | 11.23 | 25.89 | 28.51 | 28.32 (1.68, 1.68, 3.64) | 24.43 (3.64) |
| Xylophone coloured | 90 | 9.05 | 24.50 | 27.32 | 27.37 (1.90, 1.90, 2.83) | 23.57 (2.83) |
Simone Parisotto (1), Carola-Bibiane Schönlieb (2)
(1) CCA, University of Cambridge, Wilberforce Road, Cambridge, CB3 0WA, UK. Email: [email protected]
(2) DAMTP, University of Cambridge, Wilberforce Road, Cambridge, CB3 0WA, UK. Email: [email protected]
Acknowledgements: SP acknowledges UK EPSRC grant EP/L016516/1 for the CCA DTC. CBS acknowledges support from Leverhulme Trust project on Breaking the non-convexity barrier, EPSRC grant Nr. EP/M00483X/1, the EPSRC Centre EP/N014588/1, the RISE projects CHiPS and NoMADS, the CCIMI and the Alan Turing Institute.
Keywords: Total directional variation, video denoising, anisotropy, structure tensor, variational methods.
1 Introduction
Video denoising refers to the task of removing noise in digital videos. Compared to image denoising, video denoising is usually a more challenging task due to the computational cost of processing large data and the redundancy of information, i.e. the expected similarity between two consecutive frames that should be inherited by the denoised video. A straightforward approach to video denoising is to denoise each frame of the video independently, by using the broad literature on image denoising methods, see e.g. [26, 24, 6, 23, 4, 11, 15, 3, 19, 12, 21]. The computational cost is then spread across the frames by processing them sequentially, which is seen as an advantage. However, a significant disadvantage of this frame-by-frame processing is the appearance of flickering artefacts, so that a post-processing motion compensation step may be required [18, 2].
In recent years different approaches have been proposed for solving the video denoising problem: we refer to the introduction of [1] for an extensive survey. Notably, patch-based approaches are usually considered among the most promising video denoising methods in that they are able to achieve qualitatively good denoising results. For example, V-BM3D is the 3D extension of the BM3D collaborative filters [10]: without inspecting the time-consistency of the motion, V-BM3D independently filters 2D patches that are similar within a 3D spatio-temporal neighbourhood. As mentioned in [1], while V-BM3D generally achieves good denoising results, the problem of flickering still occurs. For this reason, the authors of V-BM3D developed an extension, called V-BM4D, where the patch similarity is explored along spatio-temporal blocks defined by a motion vector, see [17]. Similarly, in [5] the authors propose to group patches via an optical flow equation based on [28] and implemented in [25]. In these approaches, while the incorporation of motion helps to provide consistency in time, the denoising results also suffer from the lack of accuracy in the estimated motion. A possible way to avoid the motion estimation is to consider 3D rectangular patches so as to inherently model the 3D structure and motion in the spatio-temporal video dimensions. However, such an approach is not efficient for uniform motion or homogeneous spatial patterns, since rectangular 3D patches are less repeatable than motion-compensated patches, cf. the discussion on this topic in [1]. Motivated by this reasoning, the authors of [1] introduce a Bayesian patch-based video denoising approach with rectangular 3D patches modelled as independent and identically distributed samples from an unknown a priori distribution: each patch is then denoised by minimising the expected mean square error.
Other approaches to video denoising are the straightforward extension of the Rudin-Osher-Fatemi (ROF) model [24] to 3D data by using a spatio-temporal total variation (referred to in what follows as ROF 2D+t), joint video denoising with the computation of the flow [7], and CNN approaches [13].
Scope of the paper.
In this paper we propose an extension of the recently introduced total directional variation (TDV) regulariser [21, 20] for video denoising, via the following variational regularisation model:
[TABLE]
where is the denoised video, is a weighting field that encodes directional features in two spatial and one temporal dimension, is the regularisation parameter and is a given noisy video. The model (1) will be made more precise in the next sections, where we mainly focus on its discrete and numerical aspects. In order to accommodate spatio-temporal data, we consider here a modification of the TDV regulariser given in [21, 20] that also derives directionality in the temporal dimension. Unlike the patch-based approaches, we compute for each voxel the vector field of the motion, to be encoded as a weight in the TDV regulariser. With this voxel-based approach we reduce the flickering artefacts which appear in patch-based approaches due to the patch selection, especially in regions of smooth motion. Results are presented for a variety of videos corrupted with Gaussian white noise.
Organisation of the paper.
This paper is organised as follows: in Section 2 we describe the estimation of the vector fields, the TDV regulariser and the variational model to be minimised; in Section 3 we describe the optimisation method for solving the TDV video denoising problem and comment on the selection of parameters; in Section 4 we show denoising results on a selection of videos corrupted with Gaussian noise of varying strength.
2 Total directional variation for video denoising
Let be a clean video and a spatial, rectangular domain indexed by , with number of frames and colours. Let be a corrupted version of in each space-time voxel by i.i.d. Gaussian noise of zero mean and (possibly known) variance :
[TABLE]
In what follows, we propose to compute a denoised video by solving
[TABLE]
where is the proposed total directional variation regulariser w.r.t. a weighting field , both specified in the next sections, and a regularisation parameter.
2.1 The directional information
In order to capture directional information of in (3), we eigen-decompose the two-dimensional structure tensor [27] in each coordinate plane.
To do so, we first construct the 3D structure tensor: let be two smoothing parameters, be the Gaussian kernels of standard deviation and , respectively, and let . Then the 3D structure tensor reads as
[TABLE]
where , for each .
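The construction of the 3D structure tensor can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes the standard definition (pre-smooth the video with a Gaussian of the first standard deviation, take the outer product of its gradient, then smooth each entry with a Gaussian of the second standard deviation). The names `sigma`, `rho`, `smooth` and `structure_tensor_3d` are hypothetical, and the kernel truncation and edge-replicating boundary handling are choices made here for brevity.

```python
import numpy as np

def gaussian_kernel(std):
    """Normalised 1D Gaussian kernel truncated at about 3 standard deviations."""
    radius = max(1, int(np.ceil(3 * std)))
    x = np.arange(-radius, radius + 1)
    k = np.exp(-0.5 * (x / std) ** 2)
    return k / k.sum()

def smooth(vol, std):
    """Separable Gaussian smoothing along every axis with edge-replicated borders."""
    k = gaussian_kernel(std)
    r = len(k) // 2
    for axis in range(vol.ndim):
        pad = [(r, r) if a == axis else (0, 0) for a in range(vol.ndim)]
        padded = np.pad(vol, pad, mode="edge")
        vol = np.apply_along_axis(
            lambda v: np.convolve(v, k, mode="valid"), axis, padded)
    return vol

def structure_tensor_3d(video, sigma, rho):
    """Return a (3, 3, X, Y, T) array: smoothed outer product of the gradient."""
    grads = np.gradient(smooth(video, sigma))  # derivatives along the three axes
    J = np.empty((3, 3) + video.shape)
    for i in range(3):
        for j in range(i, 3):
            J[i, j] = J[j, i] = smooth(grads[i] * grads[j], rho)
    return J
```

For a grey-scale video stored as an array with two spatial axes and one temporal axis, `structure_tensor_3d(video, sigma, rho)[0, 2]` would hold the mixed entry between the first spatial axis and time.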
For a straightforward application to the TDV regulariser in [21], we extract the 2D sub-tensors of (4), whose eigen-decomposition encodes structural information in each of the coordinate frames spanned by , and :
[TABLE]
For each , the eigenvector has eigenvalue . The tangential directions in the 2D planes and are , respectively, with the gradient directions, see Fig. 1.
From (5), the ratios between the eigenvalues, called confidence, measure the local anisotropy of the gradient on the slices within a certain neighbourhood:
[TABLE]
Here, and the closer to 0, the higher is the local anisotropy.
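The eigen-decomposition of one 2D sub-tensor and the resulting confidence can be sketched in closed form. The sketch below assumes that the confidence is the ratio of the smaller to the larger eigenvalue, which is consistent with the statement that values close to 0 indicate high anisotropy; the exact definition in (6) is not reproduced here. The function names and the small constant `eps` are hypothetical.

```python
import numpy as np

def eig_sym_2x2(a, b, c):
    """Eigen-decomposition of the symmetric matrix [[a, b], [b, c]], element-wise.

    Returns the larger and smaller eigenvalue, the unit eigenvector of the
    larger one (gradient direction) and the orthogonal one (tangential direction).
    """
    mean = 0.5 * (a + c)
    dev = np.sqrt((0.5 * (a - c)) ** 2 + b ** 2)
    lam_max, lam_min = mean + dev, mean - dev
    theta = 0.5 * np.arctan2(2.0 * b, a - c)  # orientation of the dominant direction
    e_grad = np.stack([np.cos(theta), np.sin(theta)])
    e_tang = np.stack([-np.sin(theta), np.cos(theta)])
    return lam_max, lam_min, e_grad, e_tang

def confidence(lam_max, lam_min, eps=1e-12):
    """Eigenvalue ratio in [0, 1]; values near 0 indicate a strongly oriented structure."""
    return np.clip(lam_min, 0.0, None) / (lam_max + eps)
```

Applied to the smoothed entries of a sub-tensor, these functions return one direction pair and one confidence value per voxel.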
2.2 The regulariser
The TDV regulariser is composed of a gradient operator weighted by a tensor , whose purpose is to smooth along selected directions. In view of the spatial-temporal data, we extend the natural gradient operator to the Cartesian planes , and . We will denote with the concatenation of resulting 2-dimensional gradients. Further, we encode (5) and (6) in , leading to the weighted gradient for the video function :
[TABLE]
Note that is computed once from the noisy input . For a fixed frame with and direction the gradient is the directional derivative of along w.r.t. the frame . See [22, Fig. 3.12] for more details about this choice. With this notation in place, we consider the total directional variation (TDV) regulariser,
[TABLE]
By plugging (8) into (9) we reinterpret (9) as a penalisation of the rate of change along , with coefficients as bias in the gradient estimation. Note that, while in [21] the regulariser has been proposed for a general order of derivatives, we consider here only a regulariser of first differential order.
2.3 Connections to optical flow
Let be a voxel and its intensity in the grey-scale video sequence . If is moved by a small increment between two frames, then the brightness constancy constraint reads
[TABLE]
If is sufficiently smooth, then the optical flow constraint is derived [16, 14] as a linearisation of (10) with respect to a velocity field :
[TABLE]
For a specific field with , Equation (11) is equivalent to
[TABLE]
We can now re-write (11) by means of the following velocity vector fields:
[TABLE]
leading to
[TABLE]
Here, the right-hand side of (14) encodes the components that we aim to penalise in (8). Thus, the penalisation of (8) is equivalent to the penalisation of the left-hand side of (14), provided that (11) holds with the velocity fields in (13). Note that the weights and add a contribution in the direction of the gradients , respectively.
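The linearised brightness constancy constraint can be checked numerically on a simple example. The sketch below uses one spatial dimension plus time for brevity (the text works with two spatial dimensions) and a pattern translating at a constant speed; the variable names and the test pattern are assumptions made for illustration.

```python
import numpy as np

# Space-time grid (one spatial dimension plus time, for brevity).
x = np.linspace(0.0, 2.0 * np.pi, 400)
t = np.linspace(0.0, 1.0, 200)
X, T = np.meshgrid(x, t, indexing="ij")

velocity = 0.7                        # constant speed of the moving pattern
u = np.sin(3.0 * (X - velocity * T))  # intensity is constant along x - velocity * t

# Finite-difference approximations of the spatial and temporal derivatives.
u_x = np.gradient(u, x, axis=0)
u_t = np.gradient(u, t, axis=1)

# Optical flow constraint: u_x * velocity + u_t should vanish up to
# discretisation error, while u_t itself is of order one.
residual = u_x * velocity + u_t
print(np.abs(residual[1:-1, 1:-1]).max(), np.abs(u_t).max())
```

The residual is small compared with the temporal derivative, which is what the optical flow constraint expresses for a brightness-preserving translation.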
2.4 The minimisation problem
We aim to find the denoised video from the noisy input video by solving the minimisation problem (3). For the numerical optimisation of (3) we use a primal-dual scheme [9]. For this, we rewrite (3) as a saddle point problem for the operator , whose adjoint will be denoted by . In what follows, we denote by the primal variable, the dual variable, the Fenchel conjugate of , by the fidelity term and by the dedicated parameters of the primal-dual algorithm, see [9] for more details on the primal-dual schemes in image processing and [21] for their application to variational problems with TDV regulariser. The resulting saddle-point problem reads
[TABLE]
In the primal-dual algorithm solving (15) we need the proximal operators:
[TABLE]
where is the identity matrix. Note that is uniformly convex, with convexity parameter , so the dual problem is smooth. An accelerated version of the primal-dual algorithm can be used in this case, e.g. [8, Alg. 2], starting with where and is the squared operator norm, (which holds in connection with the discretisation in (17) and stepsize ).
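To illustrate the two proximal steps and the acceleration, the following sketch applies an accelerated primal-dual iteration in the style of [8, Alg. 2] to a much simpler problem: 1D total variation denoising with a quadratic fidelity term. It is not the TDV operator of this paper: the operator is a plain forward difference (squared norm at most 4), and the names `eta`, `tau`, `sigma` and `gamma` are assumptions. The stopping rule uses the root mean square change of the dual variable, as described for the workflow in Section 3.2.

```python
import numpy as np

def grad(u):
    """Forward differences of a 1D signal (length n - 1)."""
    return np.diff(u)

def grad_adjoint(p):
    """Adjoint of grad, so that <grad(u), p> == <u, grad_adjoint(p)>."""
    return np.concatenate(([-p[0]], p[:-1] - p[1:], [p[-1]]))

def tv_denoise_accelerated_pd(f, eta, max_iter=500, tol=1e-6):
    """Minimise sum|grad(u)| + eta/2 * ||u - f||^2 by an accelerated primal-dual scheme."""
    u, u_bar = f.copy(), f.copy()
    p = np.zeros(f.size - 1)
    tau = sigma = 0.5   # tau * sigma * L^2 <= 1 with L^2 <= 4 for forward differences
    gamma = eta         # convexity parameter of the quadratic fidelity term
    for _ in range(max_iter):
        # Dual step: the prox of the conjugate of the l1 norm is a projection onto [-1, 1].
        p_new = p + sigma * grad(u_bar)
        p_new = p_new / np.maximum(1.0, np.abs(p_new))
        # Primal step: the prox of the quadratic fidelity has a closed form.
        u_new = (u - tau * grad_adjoint(p_new) + tau * eta * f) / (1.0 + tau * eta)
        # Acceleration: update the step sizes and extrapolate.
        theta = 1.0 / np.sqrt(1.0 + 2.0 * gamma * tau)
        tau, sigma = theta * tau, sigma / theta
        u_bar = u_new + theta * (u_new - u)
        rms_dual_change = np.sqrt(np.mean((p_new - p) ** 2))
        u, p = u_new, p_new
        if rms_dual_change < tol:
            break
    return u
```

Replacing `grad` and `grad_adjoint` by a weighted spatio-temporal gradient and its adjoint, with the step sizes adapted to the corresponding operator norm, would give the structure of the scheme discussed above.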
3 The discrete model
In the discrete model, is a rectangular grid of size and a video is a volumetric data array of size (height × width × frames × colours). Here, we consider grey-scale videos () along the axes , with , and . An extension to coloured videos is straightforward by processing each colour channel separately. Here, a fixed identifies a voxel in the gridded video domain, i.e. a small cube of size in each axis direction. Then, is the intensity in the voxel in the grey-scale video sequence . The noisy input video is denoted by , and the other discrete vectorial quantities are denoted analogously, namely and for .
3.1 Discretisation of derivative operators and vector fields
We describe a finite difference scheme on the voxels by introducing the discrete gradient operator , with defined via the central finite differences on half step-size and Neumann conditions as
[TABLE]
Remark 1.
While lies at the vertices of the discrete grid, lies on its edges. Thus, (17) is advantageous for local anisotropy since it has sub-pixel precision and a more compact stencil radius than the classical forward scheme.
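One common reading of a central difference on half step-size is a difference between neighbouring voxels that is attributed to the midpoint between them, i.e. to an edge of the grid. The sketch below follows that reading and sets the last slice along each axis to zero as a homogeneous Neumann condition. It is an assumption made for illustration and may differ in detail from (17); the function name and the grid size `h` are hypothetical.

```python
import numpy as np

def staggered_gradient(video, h=1.0):
    """Neighbour differences located at voxel midpoints, one component per axis.

    Component k at index i approximates the derivative along axis k at the
    midpoint between voxels i and i + 1. The last slice along each axis has
    no neighbour and is set to zero (homogeneous Neumann condition).
    """
    components = []
    for axis in range(video.ndim):
        d = np.zeros(video.shape, dtype=float)
        interior = [slice(None)] * video.ndim
        interior[axis] = slice(0, -1)
        d[tuple(interior)] = np.diff(video, axis=axis) / h
        components.append(d)
    return np.stack(components)
```

Because each difference only involves two adjacent voxels, the stencil is more compact than a central difference over two grid steps, in line with Remark 1.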
In (7), acts on as follows:
[TABLE]
Any field with and confidence , , will be discretised in the cell centres of the discrete grid domain. The weighting multiplication in (7) is performed via an intermediate averaging interpolation operator that avoids artefacts due to the grid offset: this gives
3.2 TDV for video denoising
The TDV-based workflow consists of two steps, with pseudo-code in Alg. 1. The first one computes the directions via the eigen-decomposition in (5) while the second one is the primal-dual algorithm [8, Alg. 2], whose stopping criterion is the root mean square difference between two consecutive dual variable iterates.
4 Results
In this section we discuss the numerical results for video denoising obtained with Alg. 1. The videos considered have been taken from a benchmark video dataset (see footnote 1), and the results are available online (see footnote 2). Each video has values in corrupted with Gaussian noise. We tested different noise levels with standard deviation without clipping the videos so as to conform to the observation model.
Footnote 1: Videos are freely available: Salesman and Miss America at www.cs.tut.fi/~foi/GCF-BM3D; Xylophone in MATLAB; Water (re-scaled, grey-scaled and clipped, Jay Miller, CC 3.0) at www.videvo.net/video/water-drop/477; Franke's function (a synthetic surface moving on fixed trajectories: the coloured one changes with the parula colormap).
Footnote 2: Results are available at http://www.simoneparisotto.com/TDV4videodenoising.
The quality of the denoised result is evaluated by the peak signal-to-noise ratio (PSNR) value w.r.t. a ground truth video . The model requires the parameters as input. Once provided, we solve the saddle-point minimisation problem in (15) via the accelerated primal-dual algorithm with , see [8, Alg. 2]. Here the tolerance for the stopping criterion is fixed to (on average reached in iterations). However, we experienced faster convergence and similar results with and bigger tolerances, e.g. .
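The PSNR evaluation can be sketched as follows. The sketch assumes intensities scaled to the range from 0 to 1 (peak value 1), time stored along the last axis, and noise added without clipping; these conventions and all names are assumptions. The "input" PSNR values in the tables are consistent with noise levels given on a 0-255 scale, since 20 log10(255/10) is approximately 28.13.

```python
import numpy as np

def psnr(reference, estimate, peak=1.0):
    """Peak signal-to-noise ratio in dB over the whole array."""
    mse = np.mean((np.asarray(reference) - np.asarray(estimate)) ** 2)
    return 10.0 * np.log10(peak ** 2 / mse)

def framewise_psnr(reference, estimate, peak=1.0):
    """PSNR of each frame, assuming time is the last axis."""
    return np.array([psnr(reference[..., k], estimate[..., k], peak)
                     for k in range(reference.shape[-1])])

# Synthetic check: a constant video corrupted by Gaussian noise, not clipped.
rng = np.random.default_rng(0)
clean = np.full((64, 64, 20), 0.5)
noise_std = 10.0 / 255.0
noisy = clean + noise_std * rng.standard_normal(clean.shape)

print(psnr(clean, noisy))            # expected to be close to 28.13 dB
print(framewise_psnr(clean, noisy))  # one value per frame
```

The frame-by-frame variant corresponds to the per-frame comparison mentioned in Section 4.2.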
4.1 Selection of parameters
In the model, is sensitive to the choice of both for the vector fields, and the regularisation parameter that is chosen according to the noise level. Choosing those parameters by a trial and error approach is computationally expensive and the best parameters may differ, even for videos with the same noise level. In particular, the parameters depend on structure in the data, e.g. flat regions versus motions versus small details. Therefore, a strategy for tuning them is needed.
To estimate appropriate values for that render good results for a variety of videos, we compute optimal parameters via a line-search maximising the PSNR on a small selection of video denoising examples for which the ground truth is available. The result of this optimisation is given in Table 1. In the line-search, the parameters attaining the maximal PSNR value are computed iteratively: Alg. 1 is applied for two different choices of at a time, and the parameter set for the next iteration is then moved towards the neighbourhood of the choice that returns the larger PSNR. In this search we constrain [27]. The line-search is stopped when, for the currently best parameters, all the other neighbours within a certain radius report an inferior PSNR value. In Fig. 2 we show the trajectory of the parameters during this line-search for the Franke video corrupted with Gaussian noise with . We observe that there exists a range of parameters in which the PSNR values are almost the same.
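The stopping rule of such a search can be illustrated with a simplified greedy neighbourhood search. This is a sketch under assumptions, not the procedure used for Table 1: it evaluates all neighbours on a fixed grid at each move rather than two choices at a time, it omits the constraint on the parameters, and `score` is a hypothetical callable that would run the denoiser and return the PSNR.

```python
from itertools import product

def neighbourhood_search(score, start, step, max_moves=1000):
    """Greedy search on a grid: move to the best neighbour until none improves.

    score: callable taking the parameters as positional arguments (higher is better).
    start: initial parameter tuple; step: grid spacing for each parameter.
    """
    best = tuple(start)
    best_score = score(*best)
    for _ in range(max_moves):
        candidates = [
            tuple(b + d * s for b, d, s in zip(best, delta, step))
            for delta in product((-1, 0, 1), repeat=len(best))
            if any(delta)
        ]
        top_score, top = max((score(*c), c) for c in candidates)
        if top_score <= best_score:
            break  # every neighbour is no better: stop
        best, best_score = top, top_score
    return best, best_score
```

For example, with a toy objective that peaks at (1.5, 2.0), `neighbourhood_search(lambda a, b: -((a - 1.5) ** 2 + (b - 2.0) ** 2), (0.5, 0.5), (0.25, 0.25))` walks along the grid to that peak. In practice each evaluation of `score` is a full denoising run, which is why the search is expensive.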
By looking at the estimated parameters from the line-search approach in Table 1, we suggest the following rule of thumb for their selection in Alg. 1:
[TABLE]
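The rule (19) itself is not reproduced above, but the parameters listed in Table 2 can be compared with a simple fit. The sketch below checks the observation that, for a noise standard deviation on a 0-255 scale, the listed regularisation parameter matches 255 divided by the noise level and both smoothing parameters are close to 0.2 times its square root. This fit is an observation made here from the tabulated values and should not be read as the formula of the paper.

```python
import math

# Noise std (0-255 scale) -> parameters listed in Table 2
# (smoothing, smoothing, regularisation).
table2 = {
    10: (0.63, 0.63, 25.50),
    20: (0.90, 0.90, 12.75),
    35: (1.19, 1.19, 7.29),
    50: (1.42, 1.42, 5.10),
    70: (1.68, 1.68, 3.64),
    90: (1.90, 1.90, 2.83),
}

def fitted_parameters(noise_std):
    """Hypothetical fit to Table 2, not the rule (19) of the paper."""
    smoothing = 0.2 * math.sqrt(noise_std)
    regularisation = 255.0 / noise_std
    return smoothing, smoothing, regularisation

for noise_std, listed in table2.items():
    fitted = fitted_parameters(noise_std)
    largest_gap = max(abs(a - b) for a, b in zip(listed, fitted))
    print(noise_std, listed, largest_gap)
```

The gaps stay at the level of the two-decimal rounding used in the table, which suggests that the regularisation weight scales inversely with the noise level and the smoothing scales with its square root.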
4.2 Numerical results
For the so-found optimal parameters we compare in Table 1 the PSNR values achieved for our approach () with patch-based filters (V-BM3D v2.0 and V-BM4D v1.0, default parameters and normal-complexity profile).
In Figs. 3 and 4 the visual comparison is shown for selected frames of the Franke and Water videos (corrupted by noise with ). The time-consistency achieved by our approach is apparent in the frame-by-frame PSNR comparison.
Video denoising results that use the quasi-optimal parameters computed with (19) are reported in Table 2: selected frames of videos corrupted with a high noise level of are shown in Figs. 5 and 6, with frame-by-frame PSNR values.
4.3 Discussion of results
We compared our variational denoising approach with patch-based (V-BM3D/V-BM4D) and variational (ROF 2D+t) methods. Patch-based methods are usually computationally faster than the variational approaches (including ours) but they tend to suffer from flickering and staircasing artefacts due to their patch-based nature. We experienced that our MATLAB code (not optimised for speed) is approximately slower than V-BM4D (C++ code with MEX interface) with normal-complexity profile. Both quantitative (via PSNR) and qualitative results (visual inspection) are relevant indicators for video denoising.
From the PSNR values in Tables 1 and 2, the TDV approach is comparable with the patch-based ones, with many single frames achieving a higher PSNR value than with the patch-based methods. Also, as the noise level changes, the PSNR values deteriorate less than with the patch-based methods, demonstrating the consistency of our approach.
Visual results confirm that the approach improves upon patch-based methods by producing fewer flickering and staircasing artefacts, especially when the motion is smooth, due to the coherence imposed also along the time dimension.
5 Conclusions
In this paper, we proposed a variational approach with the total directional variation (TDV) regulariser for video denoising. We extended the range of applications of TDV regularisation from image processing, as demonstrated in [21], to videos. We compared with some state-of-the-art patch-based algorithms for video denoising and obtained comparable results, especially for high noise levels, while reducing artefacts in regions with smooth large motion, where the patch-based approach shows some weakness. We expect to further improve the results by refining the estimation of the anisotropic fields [5] and by using higher-order derivatives in the TDV definition [21]. This is left for future research.
References
- [1] Arias, P., Morel, J.M.: Video Denoising via Empirical Bayesian Estimation of Space-Time Patches. J Math Imaging Vis 60 (1), 70–93 (2018)
- [2] Brailean, J.C., et al.: Noise reduction filters for dynamic image sequences: a review. Proceedings of the IEEE 83 (9), 1272–1292 (1995)
- [3] Bredies, K., Kunisch, K., Pock, T.: Total generalized variation. SIAM J Imaging Sci 3 (3), 492–526 (2010)
- [4] Buades, A., Coll, B., Morel, J.M.: A non-local algorithm for image denoising. In: IEEE CVPR 2005. vol. 2, pp. 60–65 (2005)
- [5] Buades, A., Lisani, J., Miladinović, M.: Patch-based video denoising with optical flow estimation. IEEE Trans Image Process 25 (6), 2573–2586 (2016)
- [6] Buades, A., Coll, B., Morel, J.M.: A review of image denoising algorithms, with a new one. SIAM J Multiscale Model Simul 4 (2), 490–530 (2005)
- [7] Burger, M., Dirks, H., Schönlieb, C.: A variational model for joint motion estimation and image reconstruction. SIAM J Imaging Sci 11 (1), 94–128 (2018)
- [8] Chambolle, A., Pock, T.: A first-order primal-dual algorithm for convex problems with applications to imaging. J Math Imaging Vis 40 (1), 120–145 (2011)
