Hue Modification Localization By Pair Matching
Quoc-Tin Phan, Michele Vascotto, Giulia Boato

TL;DR
This paper introduces a Siamese neural network approach to localize hue modifications in images by patch matching, effectively detecting manipulations even under JPEG compression.
Contribution
It presents a novel neural network-based method for localizing hue modifications using patch matching within the same image, robust to compression artifacts.
Findings
Effective in detecting hue modifications in uncompressed images.
Robust performance under JPEG compression.
Produces heatmaps highlighting manipulated regions.
Table I: F1 score (%) on uncompressed images for selected hue modification angles.

| Method / angle (°) | 30 | 90 | 150 | 210 | 270 | 330 |
|---|---|---|---|---|---|---|
| Choi et al. | 66.16 | 67.15 | 65.05 | 68.34 | 67.92 | 68.21 |
| SpliceBuster | 18.12 | 29.57 | 32.99 | 26.80 | 21.97 | 12.29 |
| Siamese-T-0.8 | 66.44 | 65.28 | 66.22 | 69.28 | 70.00 | 63.23 |
| Siamese-G-0.95 | 71.79 | 73.82 | 73.04 | 74.41 | 74.29 | 69.86 |

Table II: F1 score (%) when hue modification is performed before JPEG compression, by JPEG quality factor.

| Method / QF | 75 | 80 | 85 | 90 | 95 | 100 |
|---|---|---|---|---|---|---|
| Choi et al. | 9.80 | 9.74 | 9.69 | 9.85 | 10.05 | 39.47 |
| SpliceBuster | 12.80 | 12.47 | 14.26 | 14.66 | 15.57 | 21.44 |
| Siamese-T-0.8 | 51.96 | 53.43 | 54.77 | 54.56 | 56.81 | 61.60 |
| Siamese-G-0.95 | 61.41 | 60.83 | 63.03 | 64.65 | 66.73 | 69.11 |

Table III: F1 score (%) when hue modification is performed on JPEG images that are then compressed again, by first JPEG quality factor.

| Method / first QF | 75 | 80 | 85 | 90 | 95 | 100 |
|---|---|---|---|---|---|---|
| Choi et al. | 9.68 | 9.62 | 9.55 | 9.70 | 9.65 | 9.65 |
| SpliceBuster | 5.07 | 4.63 | 3.08 | 6.35 | 4.74 | 4.82 |
| Siamese-T-0.8 | 52.85 | 46.76 | 41.29 | 49.93 | 46.13 | 46.09 |
| Siamese-G-0.95 | 63.51 | 57.52 | 49.83 | 58.36 | 56.32 | 55.75 |
Quoc-Tin Phan
DISI, University of Trento, Italy
Michele Vascotto
DISI, University of Trento, Italy
Giulia Boato
DISI, University of Trento, Italy
Abstract
Hue modification is the adjustment of the hue property of color images. Performing hue modification on an image is trivial, and it can be abused to mislead viewers. Since shapes, edges and textural information remain unchanged after hue modification, this type of manipulation is relatively hard to detect and localize. Since small patches from the same image share the same Color Filter Array (CFA) configuration and demosaicing, any distortion introduced by local hue modification can be detected by patch matching within the image. In this paper, we propose to localize hue modification by means of a Siamese neural network specifically designed for matching two inputs. By combining the network outputs, we are able to form a heatmap that potentially highlights malicious regions. Our proposed method deals well not only with uncompressed images but also with JPEG compression, an operation that usually hinders the exploitation of CFA and demosaicing artifacts. Experimental evidence corroborates the effectiveness of the proposed method.
Index Terms:
Hue modification, patch matching, Siamese network
I Introduction
Modern photography is losing its innocence due to the widespread use of image manipulation software, which allows even inexperienced users to modify digital images in different ways. Image content is characterized mainly by geometric information, such as texture, edges and shapes, and by color information. Color modifications, even though they do not affect geometric details, deceive human perception. They are easy to perform and, if implemented carefully, hard to detect.
In this paper, we address the problem of local hue modification, defined as the adjustment of the angular position on the color circle (or color wheel) within an image area. Figure 1 illustrates hue modification by different angles (see https://www.imagemagick.org/Usage/color_mods/, last accessed 15/02/2019).
To cope with local image manipulations, previous works look for artifacts of the Color Filter Array (CFA) and of camera sensor pattern noise. Most camera sensors employ a CFA, where each sensor element captures the light at a certain wavelength corresponding to one color component. The remaining color components at each position are interpolated from surrounding pixels; this interpolation is referred to as demosaicing. Image manipulations will likely generate local or global disturbances which are inconsistent with ordinary demosaicing artifacts [1]. A more blind way to detect local disturbances is to extract statistical features of rich models capturing different types of neighboring dependencies [2]. These features have proven effective in manipulation detection and localization, see for instance [3, 4]. Besides demosaicing, the imperfections of camera sensors also create a sort of camera fingerprint, the so-called Photo-Response Non-Uniformity (PRNU) noise, which is supposed to be present in every image [5]. In the presence of manipulation, this pattern noise is distorted, and the distortion can be exploited as a useful clue, provided that the reference PRNU can be reliably estimated and the forged region is sufficiently large.
The specific local image manipulation considered in this paper, hue modification, distorts demosaicing artifacts and neighboring dependencies. Based on this fact, the pioneering work in [6] analyzes demosaicing artifacts and then estimates the hue modification. It relies on the observation that an interpolated value is larger than the minimum and smaller than the maximum of its neighborhood. Hence, on the green channel, the pixels that do not satisfy this condition should mostly be pixels originally captured in this channel, with only a minority of interpolated pixels, resulting in a large ratio between the two counts. The hue modification is estimated by searching over a set of modification angles until this ratio is maximized. We point out that CFA analysis requires knowledge of the CFA configuration, at least of the positions of the green component. Such information is not always available, especially for online images. Moreover, when the image undergoes JPEG compression, demosaicing artifacts are significantly distorted. In contrast, the methods proposed in [7] and [8] recover the modification angle by modifying the questioned image with a set of angles and matching its residual with the reference PRNU. In real scenarios, this technique is very difficult to exploit, since the assumption of knowing the reference PRNU (or having access to images to estimate it) is very strong and cannot be easily satisfied.
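The min/max criterion exploited in [6] can be illustrated on a synthetic green channel. The sketch below is a simplified illustration of the counting step only, not the estimator of [6]: it assumes a hypothetical RGGB layout (green where row + column is odd), bilinear interpolation of each missing green value from its four green neighbours, and hypothetical helper names (`green_mask`, `bilinear_green`, `minmax_violation_ratio`).

```python
import numpy as np

def green_mask(h, w):
    # Hypothetical RGGB layout: green samples sit where (row + col) is odd.
    rows, cols = np.indices((h, w))
    return (rows + cols) % 2 == 1

def four_neighbours(x):
    # Up, down, left, right neighbours; "reflect" padding preserves the Bayer parity at borders.
    p = np.pad(x, 1, mode="reflect")
    return np.stack([p[:-2, 1:-1], p[2:, 1:-1], p[1:-1, :-2], p[1:-1, 2:]])

def bilinear_green(raw, mask):
    # Each missing green value is the mean of its four (captured) green neighbours.
    interp = four_neighbours(np.where(mask, raw, 0.0)).mean(axis=0)
    return np.where(mask, raw, interp)

def minmax_violation_ratio(green, mask, eps=1.0):
    # Count pixels lying outside [min, max] of their four neighbours,
    # separately for captured and interpolated positions.
    nbrs = four_neighbours(green)
    outside = (green < nbrs.min(axis=0)) | (green > nbrs.max(axis=0))
    captured = np.count_nonzero(outside & mask)
    interpolated = np.count_nonzero(outside & ~mask)
    return (captured + eps) / (interpolated + eps)
```

On a pristine green channel, interpolated positions never violate the criterion, so the ratio is large. After hue modification the green channel mixes in red and blue content, interpolated positions start violating the criterion, and the ratio drops; [6] searches over candidate angles for the one that maximizes it.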
In this work, we propose a novel method for detecting hue modification. Our methodology exploits the fact that two patches from the same image share the same inherent CFA configuration and demosaicing. Hue modification on a local region creates inconsistencies with the rest of the image, so pairwise patch matching can reveal the forged region. To this end, we propose a solution based on Siamese neural networks [9], trained on positive pairs (two pristine patches) and negative pairs (a pristine and a modified patch). JPEG compression before and after hue modification is included during training, giving the network the capability to deal with real-world conditions. Finally, we fuse multiple outputs of patch matching into a single decision map (heatmap), to which postprocessing is applied to precisely localize the forged region (Section II). Experiments demonstrate the effectiveness of the proposed approach (Section III).
II Proposed Method
Hue modification is performed in the HSV space by adding an angle $\theta$ to the hue component $H$. Since $H$ is defined on a circle, hue modification is periodic with period $360^\circ$, i.e., a modification by $\theta$ is identical to a modification by $\theta - 360^\circ$. Besides hue, the other attributes of a color in HSV space are saturation and value (brightness), whose changes differ from hue modification. Here, we investigate the detection of hue modification on: i) uncompressed images, and ii) JPEG images where the modification is carried out before or after compression.
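A minimal sketch of local hue modification as described above, assuming RGB values in [0, 1] and Python's standard `colorsys` module for the RGB/HSV conversion. The paper does not state which tool generated its training data, so the helper names `rotate_hue_pixel` and `local_hue_modification` are hypothetical. `colorsys` stores hue in [0, 1), so an angle is added as angle/360 and wrapped modulo 1, which makes the 360° periodicity explicit.

```python
import colorsys
import numpy as np

def rotate_hue_pixel(r, g, b, theta_deg):
    """Shift the hue of one RGB pixel (components in [0, 1]) by theta_deg degrees."""
    h, s, v = colorsys.rgb_to_hsv(r, g, b)   # colorsys returns h in [0, 1)
    h = (h + theta_deg / 360.0) % 1.0         # wrap around the color circle
    return colorsys.hsv_to_rgb(h, s, v)       # saturation and value are unchanged

def local_hue_modification(img, mask, theta_deg):
    """Apply the hue shift only where mask is True; img is a float H x W x 3 array in [0, 1]."""
    out = img.copy()
    for y, x in zip(*np.nonzero(mask)):
        out[y, x] = rotate_hue_pixel(*img[y, x], theta_deg)
    return out
```

For example, shifting pure red by 120° gives pure green, and a shift of 390° gives the same result as 30°. A per-pixel loop is slow on full images but keeps the HSV arithmetic easy to follow.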
Given two rectangular patches $P_1$ and $P_2$ of size $m \times n$ from the same image, we aim to estimate the logistic prediction $\hat{y}$ that the two patches are inconsistent with respect to their modification angles $\theta_1$ and $\theta_2$. The two patches are consistent ($y = 0$) if $\theta_1 = \theta_2$ and inconsistent ($y = 1$) if $\theta_1 \neq \theta_2$.
We propose to verify the inconsistency of $P_1$ and $P_2$ by means of a Siamese neural network [9]. Siamese neural networks have recently been exploited in multimedia forensics applications [10, 11, 12]. This architecture consists of two identical sub-networks $f$ with shared weights, followed by a non-linear classifier $g$ that outputs an inconsistency score $z = g\big(f(P_1), f(P_2)\big)$, whose standard logistic activation is defined as:

$$\hat{y} = \sigma(z) = \frac{1}{1 + e^{-z}}$$
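The network parameters $\Theta$ are jointly optimized to minimize the binary cross-entropy between the network's logistic predictions $\hat{y}_i$ and the patch inconsistencies $y_i$, written as a loss function over $N$ training pairs:

$$\mathcal{L}(\Theta) = -\frac{1}{N} \sum_{i=1}^{N} \Big[ y_i \log \hat{y}_i + (1 - y_i) \log (1 - \hat{y}_i) \Big]$$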
Figure 2 sketches the network architecture. We use the 50-layer Residual Network (ResNet50) [13] as the feature extractor $f$, which outputs a $d$-dimensional feature vector. The inconsistency between the features extracted from the two patches is evaluated by a pointwise squared-difference operator. The classifier $g$ is a multilayer perceptron composed of one hidden layer and a single-unit output layer with sigmoid activation outputting $\hat{y}$.
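The pair-scoring head and the binary cross-entropy loss can be sketched in NumPy. This is an illustration of the forward computation only, not a trainable implementation: it assumes the two feature vectors have already been produced by the shared extractor (ResNet50 in the paper, stubbed here by random vectors), a ReLU hidden activation and a hypothetical hidden width, neither of which the text specifies.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def pair_score(feat1, feat2, W1, b1, w2, b2):
    """Inconsistency probability y_hat for one pair of feature vectors."""
    diff = (feat1 - feat2) ** 2               # pointwise squared difference
    hidden = np.maximum(0.0, diff @ W1 + b1)  # hidden layer (ReLU is an assumption)
    z = hidden @ w2 + b2                      # single output unit: score z
    return sigmoid(z)                         # logistic activation

def bce_loss(y_true, y_pred, eps=1e-12):
    """Mean binary cross-entropy between labels y and predictions y_hat."""
    y_pred = np.clip(y_pred, eps, 1.0 - eps)  # avoid log(0)
    return -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))
```

Because the squared difference is symmetric, swapping the two patches gives the same score, which is a natural property for a pair-matching network.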
We train two separate Siamese networks end-to-end on large-scale synthetic training sets. The first model is trained on patches extracted from uncompressed images from RAISE [14] and Dresden [15]. To train the second model, we use the same patches and perform hue modification before or after JPEG compression with random quality factors drawn from a fixed range. The parameters of $f$ are initialized from a ResNet50 pretrained on ImageNet [16]. At each training iteration, we optimize the loss function with respect to $\Theta$ on a mini-batch of pairs, half of which are labeled as positive, i.e., both patches are unmodified, and the other half as negative, i.e., one patch is modified by an angle randomly selected in steps of 30° and its counterpart is unmodified. Hue modification and JPEG compression are carried out on the fly during training. We use the Adam optimizer and halve the initial learning rate at regular epoch intervals after an initial number of epochs, until convergence.
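The balanced mini-batch construction described above can be sketched as follows. The angle set {30°, 60°, ..., 330°} is an assumption (the text only states a step of 30°), the function name `sample_pair_specs` is hypothetical, and the labels follow the convention of Section II: y = 0 for consistent pairs (both unmodified), y = 1 for inconsistent pairs (one patch modified). Only the per-pair angles and labels are produced; applying them to patches is left to a hue-shift routine.

```python
import numpy as np

ANGLES = np.arange(30, 360, 30)  # assumed candidate angles: 30, 60, ..., 330 degrees

def sample_pair_specs(batch_size, rng):
    """Return per-pair (theta_1, theta_2, y) for a balanced mini-batch.

    First half: both patches unmodified (y = 0).
    Second half: one patch hue-shifted by a random angle, the other unmodified (y = 1).
    """
    half = batch_size // 2
    theta1 = np.zeros(batch_size, dtype=int)
    theta2 = np.zeros(batch_size, dtype=int)
    theta2[half:] = rng.choice(ANGLES, size=batch_size - half)
    # Randomly choose which side of each modified pair carries the hue shift.
    swap = rng.random(batch_size) < 0.5
    swap[:half] = False
    theta1[swap], theta2[swap] = theta2[swap], theta1[swap]
    y = (theta1 != theta2).astype(int)
    return theta1, theta2, y
```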
II-A Detection and Localization
II-A1 Heatmap creation
The described architecture outputs the logistic inconsistency of a patch pair. Given a test image, we collect all inconsistency scores and generate a single localization heatmap which potentially indicates malicious regions.
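Let $M$ and $N$ be the height and width of the image, and $m$ and $n$ the height and width of a patch. Using a sliding window with stride $s$, the total number of patches is $K = K_h K_w$, where $K_h = \lfloor (M - m)/s \rfloor + 1$ and $K_w = \lfloor (N - n)/s \rfloor + 1$ are the numbers of patches along each dimension.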
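The sliding-window grid can be extracted without copying pixels using NumPy's `sliding_window_view` (NumPy 1.20 or later). A minimal sketch, assuming an M×N×C image, m×n patches and stride s; the helper name `extract_patches` is hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def extract_patches(img, m, n, s):
    """Return a (K_h, K_w, m, n, C) grid of m x n patches taken with stride s."""
    windows = sliding_window_view(img, (m, n), axis=(0, 1))  # (M-m+1, N-n+1, C, m, n)
    grid = windows[::s, ::s]                                 # keep every s-th window position
    return np.moveaxis(grid, 2, -1)                          # (K_h, K_w, m, n, C)
```

For a 100×120 image, 32×32 patches and stride 16, the grid is 5×6, i.e. K_h = ⌊68/16⌋ + 1 = 5 and K_w = ⌊88/16⌋ + 1 = 6. The result is a read-only view, so patches should be copied before being modified.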
Computing inconsistency scores on all possible pairs is generally expensive, because the number of pairs grows quadratically with $K$. However, almost all of the computational burden lies in the feature extraction network $f$, which is composed of convolutional layers. In a naive pairwise scheme, each patch is paired with the other $K - 1$ patches and thus passed through $f$ about $K$ times. This redundancy can be removed: we first pre-extract the low-dimensional features of all patches by evaluating $f(P_k)$, $k = 1, \dots, K$, and then compute $\hat{y}$ for all possible pairs from the precomputed features.
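The feature-caching trick can be sketched as follows: the feature extractor runs once per patch (K calls), and the cheap head is then evaluated on all K² feature pairs by broadcasting. Here `extract` and `head` are hypothetical stand-ins for the feature extractor f and the classifier with its sigmoid; the head is assumed to be vectorized over leading axes.

```python
import numpy as np

def pairwise_inconsistency(patches, extract, head):
    """All-pairs inconsistency matrix with one feature-extraction pass per patch.

    patches: sequence of K patches
    extract: patch -> (d,) feature vector (stand-in for f)
    head:    (..., d) squared differences -> (...) probabilities (stand-in for g + sigmoid)
    """
    feats = np.stack([extract(p) for p in patches])         # K calls to f instead of ~K^2
    sq_diff = (feats[:, None, :] - feats[None, :, :]) ** 2  # (K, K, d) pointwise squared differences
    return head(sq_diff)                                    # (K, K) matrix of y_hat
```

The (K, K, d) intermediate grows quadratically in memory; for large images it can be computed in row blocks without changing the result.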
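For each patch $P_k$ within the image, an inconsistency map $C_k$ is built. Arranging all patches according to their spatial location on the image, $C_k(u, v)$ is the inconsistency between the $k$-th patch and the patch at grid position $(u, v)$, where $1 \le u \le K_h$ and $1 \le v \le K_w$.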
It is typical to assume that the forged region is relatively small compared to the background. Thus, the majority of maps $C_k$, $k \in \mathcal{P}$ ($\mathcal{P}$ denotes the patches in the pristine region), expose inconsistencies with the forged region, while the remaining maps expose inconsistencies with the pristine region, as shown in Figure 3. To fuse the inconsistency maps of the majority patches belonging to the pristine region into a single map $\bar{C}$, we follow the approach in [12] and compute $\bar{C}$ with the mean shift algorithm [17], which iteratively finds the mean of the majority (the mode).
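A possible implementation of the fusion step treats each flattened map as a point and runs mean shift towards the densest cluster. The settings of [12] are not given here, so this sketch assumes a Gaussian kernel on the root-mean-square distance between maps, a hypothetical bandwidth of 0.25, and initialization at the element-wise median; it illustrates the idea (find the mode of the K maps) rather than reproducing [12].

```python
import numpy as np

def mean_shift_mode(maps, bandwidth=0.25, n_iter=100, tol=1e-6):
    """Fuse K inconsistency maps of shape (K, K_h, K_w) into their mode via Gaussian mean shift."""
    X = maps.reshape(len(maps), -1)                # one row per flattened map
    center = np.median(X, axis=0)                  # robust starting point (assumption)
    for _ in range(n_iter):
        d2 = np.mean((X - center) ** 2, axis=1)    # mean squared distance of each map to the center
        w = np.exp(-d2 / (2.0 * bandwidth ** 2))   # Gaussian kernel weights
        new_center = w @ X / w.sum()               # kernel-weighted mean of the maps
        shift = np.max(np.abs(new_center - center))
        center = new_center
        if shift < tol:
            break
    return center.reshape(maps.shape[1:])
```

Maps from the minority (forged) patches are far from the majority and receive negligible weight, so the result stays close to the pristine patches' consensus map rather than to a plain average.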
Eventually, $\bar{C}$ is a subsampled ($K_h \times K_w$) heatmap which potentially highlights the malicious region. The full-size heatmap is obtained by resizing $\bar{C}$ with bilinear interpolation. If the forged region is larger than the background, we obtain an inverted heatmap, since the background is then the smaller area.
II-A2 Postprocessing
The standard logistic output $\hat{y}$ can be interpreted as the posterior probability that two patches $P_i$ and $P_j$ are inconsistent. After mean shifting, each element $\bar{C}(u, v)$ indicates how probable it is that the corresponding patch is forged, because $\bar{C}$ is the representative inconsistency map of pristine patches with respect to all patches. While the threshold 0.5 may be a reasonable choice for deciding whether two patches are inconsistent, it is not straightforward to apply this rule to pixel-level predictions. Moreover, as keeping the False Alarm Rate (FAR) low is critical in forensic applications, a postprocessing step is important for pixel-level prediction. In this respect, postprocessing on each image is cast as finding a statistical threshold according to which a pixel is marked as forged or pristine. We apply a simple postprocessing based on the assumption that $\bar{C}$ (to avoid adding new notation, we mean $\bar{C}$ after resizing) follows a Gaussian distribution, $\bar{C} \sim \mathcal{N}(\mu, \sigma^2)$. We fix $\tau$ such that 5% of the right tail is decided as forged. $\tau$ is lower bounded by 0.5 to maintain an acceptable FAR, namely $\tau = \max(\tau_{0.95}, 0.5)$, where $\tau_{0.95}$ is the solution of:

$$\int_{-\infty}^{\tau_{0.95}} \frac{1}{\sqrt{2\pi}\,\sigma} \exp\!\left(-\frac{(x - \mu)^2}{2\sigma^2}\right) dx = 0.95$$

Compared to the threshold 0.5, $\tau$ results in a better or equal FAR. An example of postprocessing is shown in Figure 4.
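The adaptive threshold can be computed with Python's `statistics.NormalDist` (Python 3.8 or later), whose `inv_cdf` gives the Gaussian quantile directly. A minimal sketch, assuming the heatmap has already been resized to full resolution; the helper name is hypothetical, and the defaults (0.95 quantile, lower bound 0.5) are inferred from the method name Siamese-G-0.95 and the standard 0.5 logistic decision threshold.

```python
import numpy as np
from statistics import NormalDist

def gaussian_threshold_mask(heatmap, quantile=0.95, floor=0.5):
    """Binarize a heatmap with tau = max(Gaussian quantile of its values, floor)."""
    mu = float(heatmap.mean())
    sigma = max(float(heatmap.std()), 1e-12)  # guard against a constant heatmap
    tau = max(NormalDist(mu, sigma).inv_cdf(quantile), floor)
    return heatmap > tau, tau                 # True marks pixels decided as forged
```

On a mostly dark heatmap with a small bright region, the Gaussian quantile falls below 0.5 and the floor takes over; on a heatmap with widely spread scores, the quantile lies well above 0.5, so fewer pixels are flagged than with the plain 0.5 rule.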
III Experiments
We evaluate our approach under different configurations and on different test sets.
III-A Test set
To the best of our knowledge, there is no publicly available dataset for the problem of hue modification. Thus, to evaluate our method, we generated the test set from raw images of an external Canon 600D camera (never seen in the training phase) with a known CFA pattern. Raw images are decoded with dcraw. For each image, a top-left region of fixed size is cropped out. The forged area is a random convex shape fitted within a bounding box, which is positioned at a random location on the image. Next, we perform hue modification on the pixels inside the polygons and generate multiple test sets:
- Uncompressed test set: uncompressed images are demosaiced from raw images by dcraw and subjected to local hue modification. For each modification angle step, hue modification is carried out on a group of uncompressed images.
- Single-compression test set: hue modification by different angles (the same number of images for each modification angle step) is carried out on part of the images, while the remaining images are left unmodified. Afterwards, all images are JPEG compressed with quality factors from 75 to 100, step 5.
- Double-compression test set: images are first JPEG compressed with quality factors from 75 to 100, step 5. Afterwards, hue modification by different angles (one angle per group of images) is carried out on part of the JPEG images, while the remaining images are left unmodified. All of them are then compressed again with the default quality factor. Because of this second JPEG compression, this set is more challenging, since the training images are only subjected to a single JPEG compression.
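The random convex forged regions used for the test sets above can be generated in several ways. One possibility, sketched here with hypothetical parameters (number of random points, box size) and hypothetical helper names, takes the convex hull (Andrew's monotone chain) of random points drawn inside a randomly placed bounding box and rasterizes it by testing every pixel center against the hull edges.

```python
import numpy as np

def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices (x, y) in counter-clockwise order."""
    pts = sorted(map(tuple, points))
    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return np.array(lower[:-1] + upper[:-1])

def random_convex_mask(img_h, img_w, box_h, box_w, n_points=8, rng=None):
    """Boolean mask of a random convex polygon inside a randomly placed box_h x box_w box."""
    rng = np.random.default_rng(rng)
    top = rng.integers(0, img_h - box_h + 1)
    left = rng.integers(0, img_w - box_w + 1)
    pts = np.column_stack([rng.uniform(left, left + box_w, n_points),
                           rng.uniform(top, top + box_h, n_points)])  # (x, y) pairs
    hull = convex_hull(pts)
    ys, xs = np.mgrid[0:img_h, 0:img_w] + 0.5                         # pixel centers
    inside = np.ones((img_h, img_w), dtype=bool)
    # A point is inside a CCW convex polygon if it lies left of (or on) every edge.
    for (x0, y0), (x1, y1) in zip(hull, np.roll(hull, -1, axis=0)):
        inside &= (x1 - x0) * (ys - y0) - (y1 - y0) * (xs - x0) >= 0
    return inside
```

The resulting mask can be passed to any local hue-shift routine to modify only the pixels inside the polygon.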
III-B Setups
The performance of our method is compared with the following state-of-the-art methods: Choi et al. [6], based on CFA artifacts and explicitly designed for the estimation of hue modification, and SpliceBuster [3], based on statistical features of rich models [2] and selected for comparison since those features potentially capture local disturbances caused by local hue modification. We do not compare with [7, 8], given their strong assumption about the availability of the reference PRNU, which is unrealistic in practical scenarios.
This work focuses on the localization of hue modification rather than on its estimation. Choi et al. [6] is an estimator that returns the modification angle by searching over a feasible range. To convert Choi et al. into a localization method, we use a sliding window similar to our method and search the angle over a fixed range with a fixed step. If the angle found is 0° or 360°, i.e., no modification, the patch is marked as pristine. Choi et al. therefore outputs a binary map. The other method, SpliceBuster [3], returns the negative log-likelihood that a pixel is pristine; hence, a large value indicates a high probability that the pixel is forged. We linearly scale the returned map into [0, 1] and apply the same postprocessing described in Section II-A2 to obtain the binary map. To demonstrate the advantage of our postprocessing, we also report the performance of the proposed method when a simple fixed threshold is applied to binarize the heatmap. We empirically found that a threshold of 0.8 yields the most acceptable results.
We aggregate True Positive (TP), True Negative (TN), False Positive (FP), and False Negative (FN) over all images and report average True Positive Rate (TPR), True Negative Rate (TNR) and F1 score.
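The aggregation can be sketched as follows: pixel counts are summed over all images first, and the rates are computed once from the totals, so large images weigh more than small ones. The helper name is hypothetical; F1 uses the identity F1 = 2TP / (2TP + FP + FN).

```python
import numpy as np

def aggregated_metrics(preds, gts):
    """TPR, TNR and F1 from TP/TN/FP/FN summed over all (prediction, ground-truth) mask pairs."""
    tp = tn = fp = fn = 0
    for pred, gt in zip(preds, gts):
        pred, gt = np.asarray(pred, bool), np.asarray(gt, bool)
        tp += np.count_nonzero(pred & gt)    # forged pixels detected
        tn += np.count_nonzero(~pred & ~gt)  # pristine pixels kept
        fp += np.count_nonzero(pred & ~gt)   # false alarms
        fn += np.count_nonzero(~pred & gt)   # missed forged pixels
    tpr = tp / (tp + fn) if tp + fn else 0.0
    tnr = tn / (tn + fp) if tn + fp else 0.0
    f1 = 2 * tp / (2 * tp + fp + fn) if tp + fp + fn else 0.0
    return tpr, tnr, f1
```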
III-C Quantitative Evaluation
III-C1 Detection on uncompressed images
In this section, we evaluate the first model, trained on uncompressed images, and compare it with Choi et al. and SpliceBuster on the uncompressed test set. Figure 5 presents the TPR and TNR obtained by all methods.
Choi et al. is expected to detect hue modification on uncompressed images, since this type of manipulation distorts demosaicing artifacts. It achieves a high TPR, which implies that almost all forged pixels are correctly detected, at the cost of a slightly worse TNR. SpliceBuster, on the other hand, correctly detects only a small fraction of forged pixels and, as a consequence, yields a very high TNR. We conjecture that the rich-model features [2] are ineffective for hue modification detection.
Our Siamese network with heatmaps binarized by a fixed threshold of 0.8 is denoted by Siamese-T-0.8. The alternative, in which heatmaps are postprocessed with the adaptive threshold designed in Section II-A2, is denoted by Siamese-G-0.95. Siamese-G-0.95 clearly outperforms Siamese-T-0.8 in all cases. In fact, a fixed threshold over all heatmaps cannot deal with the high variability of predicted scores across heatmaps, so an adaptive threshold is more effective. Interestingly, the TPR reveals that modification angles in the middle of the range are easier to detect with our methods. This is explainable since the strength of hue modification is periodic with period 360°: very small or very large positive angles correspond to mild modifications (e.g., a shift of 330° is equivalent to −30°).
We summarize the overall performance for selected modification angles in Table I. In terms of F1 score, Siamese-G-0.95 outperforms all other methods.
III-C2 Detection in the presence of JPEG compression
We now target more practical scenarios, where hue modification is performed in the presence of JPEG compression. It is well acknowledged that JPEG compression has a strong impact on demosaicing artifacts [18, 19, 20]. Choi et al. is also very sensitive to JPEG compression, since the counts of interpolated and recorded pixels become less accurate [6].
We assess the second model, trained on JPEG images, under two testing conditions: i) hue modification performed on uncompressed images followed by JPEG compression, and ii) hue modification performed on JPEG-compressed images, which are subsequently compressed again with the default quality factor. Note that no second JPEG compression is performed during training.
The TPR and TNR of all methods when hue modification is performed before JPEG compression, i.e., on the single-compression test set, are shown in the first column of Figure 6. Choi et al. fails to spot the forged area unless the image is compressed with the highest quality factor (QF = 100). SpliceBuster, on the other hand, detects only a small fraction of forged pixels. Our proposed methods perform far better than the other two competitors on this set, keeping TPR at an acceptable level while always retaining a high TNR. In the right column of Figure 6, i.e., on the double-compression test set, the overall TPR and TNR of our methods are slightly degraded compared to the single-compression set. This degradation can be attributed to the second JPEG compression. In fact, we generally observe a correlation between performance degradation and the first quality factor: the higher the first QF, the lower the performance. While Choi et al. performs reasonably on the single-compression set at QF = 100, it loses that capability on the double-compression set.
The overall F1 scores for several selected quality factors are reported in Tables II and III. Our two methods, in particular Siamese-G-0.95, outperform the other two methods by a large margin. Note that Choi et al. achieves an F1 score of only 39.47% at QF = 100 on the single-compression set, while its TPR and TNR in the same configuration are much higher (see the left column of Figure 6). This is due to the high number of FP, which penalizes precision and, as a consequence, the F1 score. However, since TN dominates FP (the pristine area being large compared to the forged area), TNR is not effectively penalized.
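A hypothetical numerical illustration (the counts below are invented for exposition, not taken from the experiments) shows how a detector can keep a high TNR while its F1 score collapses when the forged area is small. Assume $10^6$ pixels, of which $5 \times 10^4$ are forged:

```latex
\begin{aligned}
TP &= 45{,}000, \quad FN = 5{,}000, \quad FP = 100{,}000, \quad TN = 850{,}000 \\
TPR &= \frac{TP}{TP + FN} = \frac{45{,}000}{50{,}000} = 0.90 \\
TNR &= \frac{TN}{TN + FP} = \frac{850{,}000}{950{,}000} \approx 0.895 \\
F_1 &= \frac{2\,TP}{2\,TP + FP + FN} = \frac{90{,}000}{195{,}000} \approx 0.46
\end{aligned}
```

The 100,000 false positives barely move TNR because they are diluted among 950,000 pristine pixels, but they outnumber the true positives and therefore dominate the F1 denominator.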
III-D Qualitative Inspection
In Figure 7, we provide detection results on realistic examples manually created using GIMP. Hue modification is carried out on uncompressed images (upper rows), and the modified images are then JPEG compressed with the highest quality (lower rows). Siamese-G-0.95 (last column) clearly yields better detection maps than Choi et al. and SpliceBuster.
IV Conclusion
We have proposed a data-driven countermeasure against hue modification on color images based on patch matching. This is achieved by means of a Siamese architecture which receives two patches and outputs the likelihood that they are inconsistent. A single localization map is generated from the inconsistency scores of multiple patches. Our models perform well on both uncompressed and JPEG-compressed images, even though JPEG compression distorts CFA and demosaicing artifacts. Our future investigations will focus on estimating the hue modification angle, based on which the original image can be recovered.
- [1] P. Ferrara, T. Bianchi, A. De Rosa, and A. Piva, “Image forgery localization via fine-grained analysis of CFA artifacts,” IEEE Trans. on Information Forensics and Security, vol. 7, no. 5, pp. 1566–1577, 2012.
- [2] J. Fridrich and J. Kodovsky, “Rich models for steganalysis of digital images,” IEEE Trans. on Information Forensics and Security, vol. 7, no. 3, pp. 868–882, 2012.
- [3] D. Cozzolino, G. Poggi, and L. Verdoliva, “Splicebuster: A new blind image splicing detector,” in Proc. of WIFS, 2015, pp. 1–6.
- [4] H. Li, W. Luo, X. Qiu, and J. Huang, “Image forgery localization via integrating tampering possibility maps,” IEEE Trans. on Information Forensics and Security, vol. 12, no. 5, pp. 1240–1252, 2017.
- [5] M. Chen, J. Fridrich, M. Goljan, and J. Lukáš, “Determining image origin and integrity using sensor noise,” IEEE Trans. on Information Forensics and Security, vol. 3, no. 1, pp. 74–90, 2008.
- [6] C.-H. Choi, H.-Y. Lee, and H.-K. Lee, “Estimation of color modification in digital images by CFA pattern change,” Forensic Science International, vol. 226, pp. 94–105, Jan. 2013.
- [7] J. Hou, H. Jang, and H. Lee, “Hue modification estimation using sensor pattern noise,” in Proc. of ICIP, 2014, pp. 5287–5291.
- [8] J. Hou and H. Lee, “Detection of hue modification using photo response nonuniformity,” IEEE Trans. on Circuits and Systems for Video Technology, vol. 27, no. 8, pp. 1826–1832, 2017.
