Blind Deconvolution with Scale Ambiguity

Abstract: Recent years have witnessed significant advances in single image deblurring due to the increasing popularity of electronic imaging equipment. Most existing blind image deblurring algorithms focus on designing distinctive image priors for blur kernel estimation, which usually play regularization roles in the deconvolution formulation. However, little research effort has been devoted to the relative scale ambiguity between the latent image and the blur kernel. The well-known L1 normalization constraint, i.e., fixing the sum of all the kernel weights to one, is commonly selected to remove this ambiguity. In contrast to this arbitrary choice, in this paper we introduce an Lp-norm normalization constraint on the blur kernel associated with a hyper-Laplacian prior. We show that the employed hyper-Laplacian regularizer can be transformed into a joint regularized prior based on a scale factor. We quantitatively show that a proper choice of p makes the joint prior sufficient to favor sharp solutions over trivial solutions (the blurred input and the delta kernel). This facilitates kernel estimation within the conventional maximum a posteriori (MAP) framework. We carry out numerical experiments on several synthesized datasets and find that the proposed method with p = 2 generates the highest average kernel similarity, the highest average PSNR, and the lowest average error ratio; based on these numerical results, we set p = 2 in our experiments. Evaluation on real blurred images demonstrates that the results of the proposed method are visually better than those of state-of-the-art deblurring methods.


Introduction
Signal processing is an active research topic in electronics and information science, applied throughout today's digital era, especially in cultural, military, health, and scientific research domains. As a 2D signal, an image plays an important role in conveying information, and the study of images has attracted much attention. Single image deblurring is a classical problem in the image processing community. The methods [1][2][3][4][5][6][7] are among the most representative approaches to the deblurring problem, and the work of Lai et al. [8] provided an overview of a series of deblurring methods [9][10][11][12][13][14]. When the blur is spatially invariant, two unknowns, i.e., a blur kernel (a.k.a. point spread function, PSF) and a latent image, are expected to be recovered from a single blurred input. The convolution operator is most commonly used to describe the blur process:

f = u * k + ε, (1)

where f, u, k and ε represent the blurred image, the latent image, the blur kernel, and inevitable additive Gaussian noise, respectively, and * denotes the convolution operator. Blind image deblurring is a well-known ill-posed problem because there are infinitely many pairs of u and k that satisfy (1). To make the problem well-posed, plenty of methods focus on making additional assumptions about the latent image or the blur kernel to constrain the solution space.
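As a minimal illustration (not the paper's code), the forward model (1) can be simulated with a hand-rolled "same"-size convolution; the image, kernel, and noise level below are toy values chosen for the example:

```python
import numpy as np

def blur(u, k, noise_std=0.0, seed=0):
    """Simulate f = u * k + eps: 'same'-size convolution of the image u
    with the kernel k (edge padding), plus additive Gaussian noise."""
    kh, kw = k.shape
    ph, pw = kh // 2, kw // 2
    up = np.pad(u, ((ph, ph), (pw, pw)), mode="edge")
    kf = k[::-1, ::-1]  # flip the kernel: true convolution, not correlation
    f = np.zeros(u.shape, dtype=float)
    for i in range(kh):
        for j in range(kw):
            f += kf[i, j] * up[i:i + u.shape[0], j:j + u.shape[1]]
    rng = np.random.default_rng(seed)
    return f + noise_std * rng.standard_normal(u.shape)

# a flat image stays flat under a normalized kernel and zero noise
f = blur(np.ones((6, 6)), np.ones((3, 3)) / 9.0)
```

The explicit double loop keeps the correspondence with the definition of discrete convolution visible; a practical implementation would use an FFT-based product instead.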

The main contributions of this work are summarized as follows:

• We introduce an Lp normalization constraint on the blur kernel associated with a hyper-Laplacian image prior. This prior can be converted to a joint prior, which ensures the physical validity of the blur kernel.

• We provide statistical analysis on the joint prior and quantitatively verify that it favors ground-truth solutions over trivial solutions for some proper selections of p. This property contributes to the success of our method for blind image deblurring.

• An efficient algorithm is developed to solve the deconvolution formulation alternately, which converges well in practice.

• Quantitative and qualitative experiments on both synthesized datasets and real images demonstrate that the proposed method with p = 2 performs favorably against the state-of-the-art deblurring methods.

Related Work
Blind image deblurring is an important low-level vision topic which has attracted tremendous research attention. It is an ill-posed problem, and still remains a challenging task. To make the problem tractable, plenty of deblurring methods aim to introduce additional information to constrain the solution space.
Following studies of natural image statistics [22], a number of deblurring methods have adopted regularization terms that encourage the sparsity of image gradients. Levin et al. [19] presented a detailed analysis showing that a method based on variational Bayesian inference is able to avoid trivial solutions, in contrast to naive MAP based methods. Unfortunately, this approach is computationally expensive. Introducing an explicit edge selection step for kernel estimation [1,23] within the conventional MAP framework has also proven effective. However, strong edges may not always be available, for example in face images. Recently, a wide variety of efficient methods based on the MAP framework have been proposed. Many of them focus on designing different data terms [4,24] and various kinds of image priors [3,5,7,[25][26][27][28]. In addition, patch-based methods [13,29,30] have been developed to sidestep classical regularizers and have shown impressive performance. These methods usually search for similar patches [13] or exploit sharp patches in an external dictionary [29,30], both of which require heavy computation.
On the other hand, some methods employ different blur kernel priors [12,[31][32][33], either encouraging the estimated blur kernel to be sparse or discouraging the delta kernel. In addition, Zhang et al. [21] imposed a unit Frobenius norm constraint on the blur kernel. However, an explicit solution for the latent image, which is critical to their method, is often not easy to obtain. In comparison, Jin et al. [17] considered blur kernel normalization for the convex isotropic TV regularizer without requiring an explicit solution. Our work is motivated by theirs; the main difference is that we incorporate a non-convex hyper-Laplacian prior in this paper. Recently, the remarkable success of deep learning has provided new ideas for image deblurring. Generally, deep learning based deblurring methods, such as [34][35][36][37][38][39], train end-to-end systems on large datasets, seeking implicit mapping functions between blurred images and the corresponding blur-free images. Nah et al. [34] train a multi-scale Convolutional Neural Network (CNN) to progressively restore sharp images in an end-to-end manner without explicitly estimating the blur kernel. Kupyn et al. [35] develop an end-to-end learning method for motion deblurring based on a conditional Generative Adversarial Network (GAN) and a content loss. Tao et al. [37] propose a scale-recurrent network equipped with a Convolutional Long Short-Term Memory (ConvLSTM) layer [40] to further ensure hidden information flow between images of different resolutions. Unlike conventional blind deblurring methods, which output both the blur kernel and the clear image, these methods make no attempt to estimate the blur kernel. They have two main limitations. One is that the essential training process is very time-consuming.
The other is that most existing deep learning based methods depend on consistency between the training and testing datasets, which can hinder their generalization ability. These methods are less effective on facial images with large blur kernels. In this regard, conventional deblurring methods still retain certain advantages.

The Proposed Approach
In this section, we develop a deconvolution method for blind image deblurring. First we deduce the transformed joint prior and present our deblurring model in Section 3.1; then we put forward the optimization procedure in Section 3.2.

Joint Prior
Consider one representative optimization model for the deblurring problem:

min_{u,k} ||u * k − f||_2^2 + λ ||∇u||_{0.5}, s.t. k ≥ 0, ||k||_1 = 1, (2)

where f is the blurred image, u is the latent image, k is the blur kernel, * denotes the convolution operator, and λ is a positive parameter. The employed hyper-Laplacian prior is ||∇u||_{0.5} = ∑_x |u_h(x)|^{0.5} + |u_v(x)|^{0.5}, where x denotes the pixel index and u_h, u_v denote the horizontal and vertical derivatives. k ≥ 0 enforces element-wise non-negativity. This deconvolution formulation has been solved in [41] when k is fixed, and when u is fixed it involves a convex problem. Nonetheless, it may fail to recover pleasing solutions because the hyper-Laplacian prior does not always favor clear images over blurred ones. We find that if we consider an Lp normalization constraint on the blur kernel, the regularization can be converted into a joint version that statistically favors true solutions over trivial solutions for proper selections of p. Moreover, we do not need to propose any novel prior on the blur kernel in (2) beyond the Lp normalization constraint. This is different from most existing blind deblurring methods formulated within the classical MAP framework, which need an additional smoothing constraint on the blur kernel (||k||_1 or ||k||_2^2) in addition to the L1 normalization constraint. In the following, we present the transformed deconvolution formulation by introducing a scale factor and provide some discussion.

Transformed Deconvolution Formulation
In order to demonstrate how the scale factor affects the optimization process, we introduce a scale factor s into model (2) and analyze which parts change. Suppose that (u, k) and (z, m) are two solutions with the relationship u = sz and k = m/s, where s > 0 and s ≠ 1. Then the data term ||u * k − f||_2^2 is the same as ||z * m − f||_2^2. In contrast, the regularization term is rescaled: ||∇u||_{0.5} = ||∇(sz)||_{0.5} = s^{0.5} ||∇z||_{0.5}. Moreover, as ||k||_1 = 1, we have ||m||_1 = s, which is not equal to one. This indicates that the L1 normalization constraint can no longer be satisfied.
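This ambiguity is easy to verify numerically; the toy image and kernel below are arbitrary illustrative values, not data from the paper:

```python
import numpy as np

s = 2.5
u = np.array([[1.0, 2.0], [3.0, 4.0]])   # toy latent image
k = np.array([[0.1, 0.2], [0.3, 0.4]])   # toy kernel with ||k||_1 = 1
z, m = u / s, k * s                      # the rescaled pair: u = s*z, k = m/s

# convolution is bilinear, so u * k and z * m are identical
# (1D convolution of the flattened arrays is enough to show this) ...
conv_uk = np.convolve(u.ravel(), k.ravel())
conv_zm = np.convolve(z.ravel(), m.ravel())

# ... hence the data term cannot distinguish the two pairs, while the
# L1 normalization breaks: ||m||_1 = s instead of 1
l1_m = np.abs(m).sum()
```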
The above analysis also supports the view that the L1 normalization constraint is just an arbitrary choice. We argue that it is necessary to impose a more general normalization constraint on the blur kernel. In this paper, we consider the following optimization model with the Lp constraint:

min_{z,m} ||z * m − f||_2^2 + λ ||∇z||_{0.5}, s.t. m ≥ 0, ||m||_p = 1, (3)

where z and m denote the latent image and the blur kernel, respectively, and ||m||_p = (∑_i m_i^p)^{1/p} denotes the Lp norm of m, where i denotes the index of kernel elements.
Optimizing (3) directly may be impracticable due to the complex Lp constraint (p is unknown). However, the aforementioned analysis inspires us to transform (3) into the following expression by relating (u, k) to (z, m) through k = m/||m||_1 and u = z ||m||_1:

min_{u,k} ||u * k − f||_2^2 + λ ||k||_p^{0.5} ||∇u||_{0.5}, s.t. k ≥ 0, ||k||_1 = 1, (4)

where ||k||_p^{0.5} = (∑_i k_i^p)^{0.5/p}. There are differences between model (2) and model (4), even though they have similar formulations. The regularization term in model (4) is a rescaled version of the one in model (2): the penalty coefficient λ ||k||_p^{0.5} varies during the iterations because it depends on the previously iterated k, whereas the penalty coefficient λ in model (2) is a fixed scalar. In addition, model (2) is usually used for non-blind deblurring; in the blind case, additional kernel priors (e.g., ||k||_2^2 or ||k||_1) are usually employed to solve for the blur kernel k. In comparison, model (4) can be adopted for both kernel estimation and intermediate latent image reconstruction. λ ||k||_p^{0.5} ||∇u||_{0.5} plays the role of a joint prior in this work. We solve (4) in Section 3.2. Note that the constraint on the blur kernel in (4) is still the L1 constraint, which is consistent with the common physical understanding of the PSF.
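A short sketch (with random toy inputs, using forward differences as the discrete gradient) makes the key property of the joint prior in (4) concrete: it is invariant under the rescaling (u, k) → (s·u, k/s), which is exactly what removes the scale ambiguity:

```python
import numpy as np

def hyper_laplacian(u):
    """||∇u||_{0.5}: sum over pixels of |u_h|^0.5 + |u_v|^0.5,
    with forward differences as a simple discrete gradient."""
    uh, uv = np.diff(u, axis=1), np.diff(u, axis=0)
    return (np.abs(uh) ** 0.5).sum() + (np.abs(uv) ** 0.5).sum()

def joint_prior(u, k, p, lam=1.0):
    """The joint regularizer lam * ||k||_p^{0.5} * ||∇u||_{0.5} of model (4)."""
    return lam * (k ** p).sum() ** (0.5 / p) * hyper_laplacian(u)

rng = np.random.default_rng(0)
u = rng.random((16, 16))
k = rng.random((5, 5)); k /= k.sum()
s = 3.0
# rescaling (u, k) -> (s*u, k/s) leaves the joint prior unchanged, since
# ||k/s||_p^{0.5} = s^{-0.5} ||k||_p^{0.5} and ||∇(s*u)||_{0.5} = s^{0.5} ||∇u||_{0.5}
a, b = joint_prior(u, k, p=2), joint_prior(s * u, k / s, p=2)
```

The two scale factors cancel exactly, which is why a single weight λ can serve both the u- and k-subproblems.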

Statistical Analysis
By introducing the Lp normalization constraint, we obtain the transformed deconvolution formulation (4), in which the regularization term can be regarded as a joint prior on u and k. A natural question is whether this joint regularization term facilitates the deblurring process to generate a pair of pleasing solutions. To answer it, we study the statistical property of the joint prior and find that it statistically favors ground-truth sharp solutions over trivial solutions.
To verify this, we calculate the average prior ratios between trivial pairs (f, δ) and ground-truth pairs (u, k) for different p on the benchmark dataset [19]. Specifically, we first choose 20 different values of p varying from 0.5 to 5. For each fixed p, we compute the prior ratio

(||δ||_p^{0.5} ||∇f||_{0.5}) / (||k||_p^{0.5} ||∇u||_{0.5})

on each sample in the dataset. Then, we calculate the average prior ratio over the 32 images for each fixed p to analyze how often the prior favors the ground-truth pairs. The average prior ratio curve is shown in Figure 1. It can be observed that for p > 1.2, the trivial pairs have larger prior values than the ground-truth ones. This indicates that sharp solutions, rather than blurred ones, are favorably encouraged, which further ensures the success of the proposed deconvolution.

Figure 1. Average prior ratios on the dataset [19]. The average prior ratios are always larger than one when p > 1.2, which illustrates that the choice of p > 1.2 makes the average energy values of the sharp solutions lower than those of the blurred ones.
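The ratio computation can be sketched as follows; the step-edge image, its hand-computed 3-tap moving average, and the uniform kernel are toy stand-ins for a dataset sample, not the paper's data:

```python
import numpy as np

def hyper_laplacian(u):
    uh, uv = np.diff(u, axis=1), np.diff(u, axis=0)
    return (np.abs(uh) ** 0.5).sum() + (np.abs(uv) ** 0.5).sum()

def prior_ratio(f, u, k, p):
    """(||δ||_p^{0.5} ||∇f||_{0.5}) / (||k||_p^{0.5} ||∇u||_{0.5});
    a value > 1 means the joint prior favors the sharp pair (u, k)
    over the trivial pair (f, δ). Note ||δ||_p^{0.5} = 1 for any p."""
    delta = np.zeros_like(k)
    delta[k.shape[0] // 2, k.shape[1] // 2] = 1.0
    num = (delta ** p).sum() ** (0.5 / p) * hyper_laplacian(f)
    den = (k ** p).sum() ** (0.5 / p) * hyper_laplacian(u)
    return num / den

# toy sample: a vertical step edge and its 3-tap horizontal moving average
u = np.zeros((8, 8)); u[:, 4:] = 1.0
f = np.zeros((8, 8)); f[:, 3] = 1 / 3; f[:, 4] = 2 / 3; f[:, 5:] = 1.0
k = np.ones((3, 3)) / 9.0
r = prior_ratio(f, u, k, p=2)   # > 1: the sharp pair is preferred
```

Blurring spreads the single strong edge into several weak ones, and |·|^0.5 penalizes many small gradients more than one large gradient, which is why the ratio exceeds one for this sample.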

Optimization
We solve (4) by alternately optimizing u and k. The optimization details of the two sub-problems are described in the following.

Blind Kernel Estimation
Intermediate Image Estimation. With the blur kernel k output from the previous iteration, the intermediate latent image u is estimated by

min_u ||u * k − f||_2^2 + λ ||k||_p^{0.5} ||∇u||_{0.5}. (5)

We use the iteratively reweighted least squares (IRLS) [42] method to solve (5). At the t-th iteration, we need to solve the following quadratic problem:

min_u ||u * k − f||_2^2 + λ ||k||_p^{0.5} ∑_x (w_{h,x}^t |u_h(x)|^2 + w_{v,x}^t |u_v(x)|^2), (6)

where w_{h,x}^t = |u_h^{t−1}(x)|^{−1.5} and w_{v,x}^t = |u_v^{t−1}(x)|^{−1.5}, t denotes the iteration index, and the subscript x denotes the spatial location of a pixel. (6) is a weighted least squares problem which can be solved by the conjugate gradient method.
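The IRLS idea behind (6) can be shown on a one-variable toy problem (a sketch under our own simplifications, not the paper's solver; the eps floor on the weight is our own stabilization choice):

```python
def irls_halfnorm(a, lam, iters=30, eps=1e-6):
    """IRLS sketch for min_x (x - a)^2 + lam * |x|^0.5: each step
    replaces |x|^0.5 by w * x^2 with w = |x_prev|^{-1.5} (the two
    agree at x = x_prev), then solves the resulting quadratic in
    closed form, x = a / (1 + lam * w). eps guards w near x = 0."""
    x = a
    for _ in range(iters):
        w = max(abs(x), eps) ** -1.5
        x = a / (1.0 + lam * w)
    return x

x0 = irls_halfnorm(2.0, 0.0)    # no regularization: x stays at a
x1 = irls_halfnorm(2.0, 0.1)    # the |x|^0.5 penalty shrinks x toward zero
```

In the full problem, x plays the role of each gradient component of u, and the quadratic step is solved with conjugate gradients rather than in closed form.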
Kernel Estimation. Given a fixed intermediate latent image u, the blur kernel k can be obtained by solving the following model:

min_k ||u * k − f||_2^2 + λ ||k||_p^{0.5} ||∇u||_{0.5}, s.t. k ≥ 0. (7)

According to the analysis in [9], kernel estimation in the gradient domain can be more accurate. Thus, we replace the image intensity with the image derivatives in the data fidelity term; that is, the blur kernel k is estimated by

min_k ||∇u * k − ∇f||_2^2 + λ ||∇u||_{0.5} ||k||_p^{0.5}, s.t. k ≥ 0. (8)

Similar to (6), the IRLS method is utilized to solve (8) by iteratively optimizing a weighted least squares problem (9), in which the term ||k||_p^{0.5} is replaced by a quadratic surrogate weighted by the previous iterate of k. Because of the benefit from the transformed deconvolution formulation (4), we are not required to impose the Lp normalization on the blur kernel k in practice. Similar to most existing methods, after obtaining k we set its negative elements to 0 and divide k by its L1 norm so that the sum of its elements is 1.
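The post-processing step just described (clip negatives, then L1-normalize) is simple enough to state directly; the input kernel here is a toy example:

```python
import numpy as np

def project_kernel(k):
    """Post-process the raw kernel estimate as described above:
    set negative entries to 0, then divide by the L1 norm so the
    kernel weights sum to one (the physical PSF constraints)."""
    k = np.maximum(k, 0.0)
    s = k.sum()
    return k / s if s > 0 else k

p = project_kernel(np.array([[-0.2, 0.5], [0.3, 0.4]]))
```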
To obtain better results, the proposed kernel estimation process is carried out in a coarse-to-fine manner using an image pyramid. u and k are initialized as the blurred image and the delta kernel at the coarsest level.
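The pyramid scheme can be sketched as below. This is a skeleton under our own assumptions: `estimate` is a hypothetical placeholder for one alternating u/k optimization pass, and the nearest-neighbour `resize`, the number of levels, the 0.5 scale factor, and the 5 × 5 kernel size are illustrative choices, not the paper's settings:

```python
import numpy as np

def resize(img, shape):
    """Nearest-neighbour resampling -- crude, but enough for a sketch."""
    ys = np.linspace(0, img.shape[0] - 1, shape[0]).round().astype(int)
    xs = np.linspace(0, img.shape[1] - 1, shape[1]).round().astype(int)
    return img[np.ix_(ys, xs)]

def coarse_to_fine(f, n_levels=3, scale=0.5,
                   estimate=lambda f, u, k: (u, k)):
    """Start at the coarsest scale with u = blurred image and k = delta
    kernel, then pass (u, k) upward as the initialization of each finer
    level. `estimate(f, u, k)` stands in for one alternating solve."""
    shapes = [tuple(int(round(d * scale ** l)) for d in f.shape)
              for l in reversed(range(n_levels))]      # coarsest first
    u = resize(f, shapes[0])                           # init: blurred image
    k = np.zeros((5, 5)); k[2, 2] = 1.0                # init: delta kernel
    for sh in shapes:
        f_l = resize(f, sh)                            # blurred image at this level
        u = resize(u, sh)                              # upscale previous estimate
        u, k = estimate(f_l, u, k)                     # alternating u/k solve
    return u, k

u, k = coarse_to_fine(np.arange(64.0).reshape(8, 8))
```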

Final Deblurring
With the blur kernel k determined, a variety of non-blind deconvolution methods can be used to estimate the latent image. We simply employ the hyper-Laplacian prior based method proposed by Levin et al. [42] to recover the latent image. The formulation of this non-blind deconvolution method is

min_u ||u * k − f||_2^2 + ζ ||∇u||_{0.5}, (10)

where ζ is a regularization weight. Figure 2 illustrates the whole pipeline of our framework, Algorithm 1 presents the main steps of the whole deblurring process, and Figure 3 shows the diagram of the proposed algorithm.

Figure 2. The pipeline of the proposed algorithm: the blurred input passes through coarse-to-fine iterative optimization, producing the intermediate latent image and the estimated blur kernel. In this algorithm, u and k are solved iteratively for each image scale in a coarse-to-fine pyramid framework.

Algorithm 1 Proposed blind deblurring algorithm
Input: Blurred image f.
Initialize the intermediate image u and kernel k with the results from the coarser level.
for l = 1 → K_pyramid do
    Estimate u according to (5);
    Estimate k according to (8);
end for
Estimate the final latent image by solving (10).
Output: Latent image u and blur kernel k.

Discussion and Experimental Results
In this section, we first provide some discussion of the proposed method in Section 4.1: we empirically determine an optimal value of p in terms of three evaluation criteria in Section 4.1.1. Using this p, we discuss the convergence property of the proposed algorithm in Section 4.1.2 and provide the parameter analysis in Section 4.1.3. The limitations are discussed in Section 4.1.4. Then we evaluate the proposed method on both synthetic and real-world blurred images and compare it with state-of-the-art image deblurring methods in Section 4.2.
The proposed algorithm is implemented in MATLAB on a computer with an Intel Xeon E5630 CPU @2.53 GHz and 12 GB RAM. To process a 255 × 255 blurred image with kernel size 27 × 27, the proposed algorithm estimates the blur kernel in around 69 seconds without code optimization.
Parameter settings. In all experiments, the regularization weight parameter λ is set to be 0.09, ζ is set to be 0.003, and p is set to be 2.

Determination on Parameter p
In the proposed method, the parameter p plays a critical role as it implicitly influences the normalization of the blur kernel, and it is difficult to determine theoretically. In order to choose a suitable p, we carried out experiments on a synthetic dataset testing 19 different values of p varying from 0.5 to 5 with a step size of 0.25. To generate blurred images, we randomly pick 100 images from the BSDS500 dataset [43] and blur them with the 8 kernels introduced by [19]. Three commonly used evaluation criteria, kernel similarity, error ratio, and PSNR, are utilized for evaluation.
The kernel similarity measures the accuracy of the estimated kernel k_est against the ground-truth kernel k_GT. This metric calculates the maximum response of normalized cross-correlation between k_est and k_GT over possible shifts. The error ratio metric proposed by [19] is commonly used to evaluate the restored result, and the PSNR is a commonly used metric to evaluate the quality of recovered results in comparison with ground-truth images.
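The kernel similarity metric can be sketched as follows (a simplified version assuming same-shape kernels; the zero-padded brute-force shift search is our own illustrative implementation, and the random test kernel is a toy value):

```python
import numpy as np

def kernel_similarity(k_est, k_gt):
    """Maximum normalized cross-correlation between the estimated and
    ground-truth kernels over integer shifts; 1 means a perfect match
    up to translation."""
    a = k_est / np.linalg.norm(k_est)
    b = k_gt / np.linalg.norm(k_gt)
    H, W = b.shape
    ap = np.pad(a, ((H, H), (W, W)))    # zero padding allows all shifts
    best = 0.0
    for dy in range(ap.shape[0] - H + 1):
        for dx in range(ap.shape[1] - W + 1):
            best = max(best, float((ap[dy:dy + H, dx:dx + W] * b).sum()))
    return best

rng = np.random.default_rng(1)
k = rng.random((5, 5)); k /= k.sum()
sim = kernel_similarity(k, k)   # identical kernels: similarity 1
```

By the Cauchy–Schwarz inequality the correlation of the two unit-normalized kernels never exceeds 1, so the score is naturally bounded above by a perfect match.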
The statistical results are summarized in Figure 4. In the subfigures, the three horizontal axes denote the values of p, and the three vertical axes denote the average kernel similarity, average error ratio and average PSNR, respectively. As can be seen in Figure 4a, in terms of the average kernel similarity, the optimal p lies in [1.5, 2]. Figure 4b demonstrates that when p ∈ [1.75, 2.25], the proposed method generates results with error ratio values smaller than 2, whereas other p values yield relatively large error ratios. In Figure 4c, we observe that p values in [1.5, 3] achieve an average PSNR higher than 25, with the highest reached at p = 2. Based on the above observations, we set p = 2 in all of our experiments.

Convergence Property
We present the convergence property of the proposed method with p = 2 in this section. For better demonstration, we quantitatively evaluate the convergence of the proposed method on the benchmark dataset introduced by [19]. The results shown in Figure 5 demonstrate that the proposed method converges in fewer than 10 iterations, in terms of both the average kernel similarity and the average energies computed from model (5).

Parameter Analysis
The proposed model involves two main parameters, ζ and λ. The parameter ζ is used in the final non-blind deblurring step to recover the final deblurred images with the estimated kernels. For the selection of ζ, we follow the existing non-blind deblurring algorithm [42]. The parameter λ is a regularization weight. We evaluate the effect of λ on the deblurred results on the dataset [19] with the average kernel similarity metric and the average PSNR value. Figure 6 shows that the selection of λ does not greatly influence the quality of the deblurred results. Overall, the proposed algorithm achieves the best performance when λ = 0.09. Based on these statistical evaluations, the regularization weight λ is empirically set to 0.09 in all experiments.

Limitations
The proposed method is based on an existing simple image prior. It may fail in some cases. Here we present one special case. As our method is under the uniform-blur assumption, if an image is blurred non-uniformly, like rotation blur, our method cannot restore the image effectively. Figure 7a shows one real example. As can be seen in Figure 7b, our method fails to generate desirable results.

Quantitative Evaluation on Synthetic Datasets
To better evaluate the effectiveness of the proposed method, we perform experiments on three mainstream benchmark datasets [8,19,44]. For fair comparison, we use the implementations of the state-of-the-art methods to estimate the blur kernels. The non-blind deconvolution method [42] is utilized to generate the final deblurring results.
We first test our method on the benchmark dataset [19], which contains 32 blurred images generated from 4 ground-truth images with 8 blur kernels. We compare our method with 9 state-of-the-art methods, including Levin et al. [11], Krishnan et al. [10], Perrone and Favaro [14], Michaeli and Irani [13], Ren et al. [3], Dong et al. [23], Yan et al. [16], Pan et al. [5] and Jin et al. [17]. We evaluate the effectiveness of the proposed method by both error ratio and average kernel similarity. Figure 8 shows the comparisons of the cumulative error ratios and average kernel similarity. The graph in Figure 8a shows that at an error ratio of 2.2 our success rate already exceeds 90%, whereas Yan et al. [16], which performs best among the 9 compared methods, does not reach 90% until the error ratio rises to 2.8. Similarly, the proposed method has higher average kernel similarity values than the other compared methods, as demonstrated in Figure 8b.

Figure 8. Quantitative evaluation on the dataset [19]. Comparisons on these two quality metrics demonstrate that our method performs well against the other methods.
Figure 9. Quantitative evaluation on the dataset [44]: (a) cumulative error ratios; (b) average PSNR. In particular, our method reaches a 90% success rate at error ratio 3.8. In terms of average PSNR, our method achieves 30.31, leading the state-of-the-art methods.
Next, we test our method on the benchmark dataset [44], which contains 4 images and 12 blur kernels. We also compare the proposed method against 7 state-of-the-art methods, including Krishnan et al. [10], Shan et al. [20], Perrone and Favaro [14], Ren et al. [3], Pan et al. [5], Dong et al. [23] and Jin et al. [17], in terms of error ratio and average PSNR. As shown in Figure 9a, our method takes the lead, with 90% of its outputs below error ratio 3.8. Figure 9b shows that our method has the highest average PSNR among all the methods evaluated.
Moreover, we evaluate our method on the benchmark dataset provided by Lai et al. [8], which contains 100 images including low-illumination, face, and text images. We compare our method with 7 state-of-the-art methods: Perrone and Favaro [14], Michaeli and Irani [13], Pan et al. [5], Ren et al. [3], Dong et al. [23], Yan et al. [16] and Jin et al. [17]. The error ratio and average PSNR are used as evaluation criteria on this dataset. Figure 10a presents the cumulative distribution of error ratios; our method exceeds a 70% success rate at error ratio 4.6, the best among the compared methods. Figure 10b shows that the proposed method also performs well in terms of average PSNR.
In addition, we test our method on some other synthetic images. To provide visual comparisons, we show two examples in Figure 11. The numbers in green on the top left corner of each image are the PSNR values, which quantitatively measure the quality of the deblurring results. Both visually and quantitatively, Figure 11 shows that the proposed method achieves favorable results against the other deblurring methods.

Deblurring on Real Blurred Images
We note that two existing methods [10,14] have formulations very similar to (4), so we first show an example in Figure 12 compared with these two methods. The deblurring results shown in Figure 12b,c still contain severe blur effects and ringing artifacts; in comparison, our result shown in Figure 12d is better. Figure 13 shows another real blurred example compared with the state-of-the-art deblurring methods [1,5,10,[13][14][15]17,29]. The close-ups of the deblurred results by [10,15], shown in Figure 13c,h, contain significant blur, especially in the text region. The deblurred results by the methods [13,29] contain significant ringing artifacts, as shown in Figure 13e,d. The deblurred result by Jin et al. [17], shown in Figure 13k, contains some unpleasing effects. Although the methods [1,3,5,14,23] handle this image well, their deblurred results contain some noise around the text in the background, as shown in Figure 13b,j,f,g,i, respectively. In contrast, our method generates a clear result with less noise; both the text and the background of our deblurred result are much clearer than those of the other methods.
Another real blurred image is shown in Figure 14a. The results by the methods [10,23,27], shown in Figure 14b,g,i, contain some blur effects. In Figure 14d,e,h,c, the results by the methods [13,14,17,29] contain significant ringing artifacts. The result by [3], shown in Figure 14f, is visually better than the above results; compared with it, our method generates a deblurred image with clearer details, as shown in Figure 14j. Figure 15 shows one example from the dataset [44] compared with [1,3,10,11,[13][14][15]23]. The deblurring methods [1,13,23] produce ringing artifacts, as shown in Figure 15b,e,h. The deblurring results by the methods [3,11,15] are over-smoothed, as shown in Figure 15i,c,g. The deblurred result by Krishnan et al. [10] in Figure 15d still contains some noise, and the result by Perrone and Favaro [14], shown in Figure 15f, is also unpleasing. In contrast, our method generates a much clearer result, as shown in Figure 15j.
In addition, we provide visual comparisons with state-of-the-art deep learning based methods in Figure 16. The deep learning based methods [35,37] are less effective in handling this blurred image, as shown in Figure 16b,c. The result by the method [27] in Figure 16d is visually acceptable, but our result, shown in Figure 16e, is clearer.

Domain Specific Images
We further evaluate our method on two kinds of domain-specific images, i.e., text and face images. Text images are rather challenging for most deblurring methods. Figure 17 displays the deblurring results on two text images. For fair comparison, the final deblurred results of both the L1/L2 kernel prior based methods and our method are recovered by the non-blind deblurring method [42]. Figure 17b shows the results obtained with the L1 kernel prior (min_k ||∇u * k − ∇f||_2^2 + λ||k||_1), Figure 17c shows the results obtained with the L2 kernel prior (min_k ||∇u * k − ∇f||_2^2 + λ||k||_2^2), and Figure 17d shows the results of the text deblurring method [15]. Compared with the results using the L1/L2 kernel priors on the logo text image, our method estimates the kernel structure correctly while the other methods fail, as shown in the top row of Figure 17. The bottom row of Figure 17 shows the deblurring results for a document text image. Our method generates a better result than the L1/L2 kernel priors and a result comparable to that of the specially designed text deblurring method [15].
Face image deblurring is challenging because blurry face images contain few edges or textures. Figure 18 shows deblurring results on a face image. As can be seen, there are several artifacts in most of the results produced by the other state-of-the-art methods, as shown in Figure 18b-k. Our method generates a clearer result with fewer ringing artifacts, as shown in Figure 18l.
Figure 16. Visual comparison with deep learning based methods: (a) blurred image; (b) [35]; (c) [37]; (d) [27]; (e) ours.

Figure 17. Text deblurring results by the L1/L2 kernel priors, the text deblurring method [15], and ours, respectively. Our method generates better results than the two kernel prior methods, and visually comparable or even better results than the method specially designed for text deblurring [15].

Conclusions and Future Work
In this paper, we consider the relative scale ambiguity between the latent image and the blur kernel based on an existing hyper-Laplacian regularization. We impose an Lp normalization constraint on the blur kernel instead of the arbitrary L1 normalization constraint selected in most existing deblurring methods, and show that the hyper-Laplacian regularization can be transformed into a joint prior. Statistical results demonstrate that the joint prior favors ground-truth sharp solutions over trivial solutions when p > 1.2, which helps our deconvolution formulation avoid trivial solutions. To determine the parameter p, we carry out numerical experiments and find that p = 2 is a good choice. Both quantitative and qualitative experiments on benchmark datasets and real images demonstrate that the proposed method achieves state-of-the-art results without any heuristic filtering strategies to select salient edges or any kernel sparsity prior. Our future work will focus on considering the scale ambiguity by imposing the Lp normalization constraint on the blur kernel associated with other image priors, and on improving computational efficiency to enable real-time processing on embedded platforms.