Reversible Data Hiding Algorithm in Fully Homomorphic Encrypted Domain

This paper proposes a reversible data hiding scheme by exploiting the DGHV fully homomorphic encryption, and analyzes the feasibility of the scheme for data hiding from the perspective of information entropy. In the proposed algorithm, additional data can be embedded directly into a DGHV fully homomorphic encrypted image without any preprocessing. On the sending side, by using two encrypted pixels as a group, a data hider can get the difference of two pixels in a group. Additional data can be embedded into the encrypted image by shifting the histogram of the differences with the fully homomorphic property. On the receiver side, a legal user can extract the additional data by getting the difference histogram, and the original image can be restored by using modular arithmetic. Besides, the additional data can be extracted after decryption while the original image can be restored. Compared with the previous two typical algorithms, the proposed scheme can effectively avoid preprocessing operations before encryption and can successfully embed and extract additional data in the encrypted domain. The extensive testing results on the standard images have certified the effectiveness of the proposed scheme.


Introduction
Reversible data hiding [1][2][3] is an efficient technology that combines the robustness and provability of digital information, and embeds information for authentication. The hidden data can be extracted completely and the original carrier can be restored completely after data extraction. Because of the existence of these characteristics, reversible data hiding has been applied in many areas such as business and military. The most basic reversible data hiding technology dealt with the redundancy of digital information and then embedded the additional data. There are several reversible data hiding algorithms using difference expansion [4][5][6], histogram shifting [7][8][9][10] and the new prediction error algorithms have higher payload and better image quality [11][12][13][14][15].
For the sake of information security and privacy protection, data are usually encrypted before uploading and transmission. The sender encrypts the plaintext by using the keys and sends the encrypted ciphertext to the receiver. Since the ciphertext is sent, the security of the information is ensured. The receiver can decrypt the ciphertext into plaintext based on the obtained keys. The secret homomorphism was proposed by Rivest in 1978 [16]. First, several plaintexts were encrypted, then the encrypted ciphertexts could be multiplied or added, and decrypted finally. After experimental verification, the results of the operations performed under the idea has been consistent with the results of performing the same operations directly on the same plaintext. There have been more developments in the design of homomorphic encryption schemes. Because of the particularity of the encryption scheme, the homomorphic encryption technology can perform data operations in the encrypted domain without affecting the final decrypted data. Therefore, homomorphic encryption technologies have been widely used in the operation of secure data. The common homomorphic encryption technology had Pallier encryption system [17], RSA encryption system [18] and ElGmal encryption system [19]. Reversible data hiding technique in encrypted domain is based on this property. Reversible data hiding technique in encrypted domain can implement data hiding procedure in encrypted domain and can recover the original plaintext without error after decryption and data extraction. Owing to this merit, reversible data hiding in encrypted images has been a research hotspot in information security community recently.
There are several categories of reversible data hiding techniques in the literature. The first category is difference expansion-based algorithm, originally proposed by Tian [4]. The reversible data hiding algorithms based on the difference expansion algorithm utilize the correlation of the pixel values of adjacent pixels, and replace the original pixels with the difference of adjacent pixels. After that, the other category of reversible data hiding algorithms was proposed by Thodi [20] and then developed by Li et al. [12]. There are several methods for reversible data hiding in encrypted domain. In [21], by using absolute mean difference of multiple neighboring pixels, the authors proposed a reversible data hiding algorithm in encrypted domain. In [22], the authors proposed another algorithm based on discrete fourier transform and compressive sensing in encrypted domain. In [23,24], the authors proposed two methods for reversible data hiding by using Paillier cryptosystem. In [23], two adjacent pixels as a group are encrypted and then the differences of the two adjacent pixels in each group are computed to generate a difference histogram. The information can be hiding into the histogram by using histogram shifting technique and the homomorphic properties of Paillier system.
In recent years, image encryption technology has developed rapidly. Li [25] briefly summarized the design of image encryption schemes and made an analysis of the challenge of the image encryption faced in the future. In this paper, we propose a new reversible data hiding scheme in the fully homomorphic encrypted domain with the DGHV public key cryptosystem which was proposed by Dijk, Gentry, Halevi and Vaikuntanathan in [26]. By analyzing the insufficiency of existing homomorphic encryption systems, the homomorphic encryption scheme has been improved for multiple bits data encryption in [27]. With the improved encryption scheme, in an image, the use of two adjacent pixels as a group is encrypted by using the same random number. After that, the difference of the two adjacent pixels in each group was computed for computation of histogram. In the encrypted domain, additional data can be embedded and extracted by shifting the difference histogram. Without the data hiding key, the embedded data cannot be extracted. In addition, the additional data can be extracted from the directly decrypted image and the original image can be restored. Compared with the the work in [23,24], the proposed scheme has no preprocessing operations and has lower computational cost.
The remainer of this paper is organized as follows. Shannon's information entropy theory in combination with the encrypted domain is introduced in Section 2. A brief introduction of fully homomorphic encryption over the integers is given in Section 3. The details of the proposed reversible data hiding scheme is introduced in Section 4. Experiment results are given in Section 5. Finally, we have a conclusion in Section 6.

Shannon Information Theory
In 1948, Shannon put forward the mathematical theory of communication, which initiated the study of modern information theory [28]. He considered that information entropy can be used to measure the probability distribution of the pixels of a grayscale image. The larger is the information entropy, the more uniform is the probability distribution of the gray image pixels. In the Shannon's information theory, the set of different states of the message samples in the information source is called the probability space, which can be represented by X. In the probability space, the probability of occurrence of the samples is different, and their uncertainty is different. The greater is the probability of a sample appearing, the smaller is its uncertainty; conversely, the smaller is the probability of a sample appearing, the greater is its uncertainty. If the probability of sample x i is p(x i ), the information entropy is defined as where H(X) is called information entropy. As is known, the encrypted image loses the correlation between pixels and increases the information entropy of the image. The increase of the entropy makes the histogram distribution of image more uniform. Li [29] pointed out that the information entropy of encrypted image tends to the maximum, thus extra information is difficult to be embedded in the encrypted domain.
In the proposed algorithm, we used two adjacent pixels as a group and encrypted the two pixels in a group with the same parameter. In such a way, the correlation between adjacent pixels can be transmitted to the encrypted domain. As a result, the difference histogram distribution of the encrypted image is not uniform. Thus, there is a residual entropy space for additional information embedding in the encrypted domain.

Fully Homomorphic Encryption over the Integers
DGHV fully homomorphic encryption over the integers, which has the characteristics of additive homomorphism and multiplicative homomorphism, is widely used in the field of security. In this cryptosystem, a plaintext is encrypted with public keys. The plaintext can be retrieved after decrypting corresponding ciphertexts with private keys. To ensure the security of the encryption system, the proposed algorithm introduces the greatest common divisor (GCD) problem. Since additive homomorphism and multiplicative homomorphism are permitted in this cryptosystem, it provides an efficient approach to process the original data in encrypted domain.

Key Generation
Select two integers p and q. p is an odd number and is used as the private key, and q is the large integer. For the security of the ciphertexts, the two integers p and q must satisfy q >> p. The greatest common divisor problem is introduced by adding a number of ciphertexts x i (0 ≤ i ≤ l) with plaintexts of 0, so that the value of ciphertext is large, to ensure it is not easy to decrypt the ciphertext. At the same time, we must ensure that x 0 is the largest. The public key is pk, pk =< x 0 , x 1 , ..., x l >.

Encryption
For each original data m, select an integer r randomly. r is an integer that is generated randomly during the encryption process and n is the number of bits encrypted at one time. Private key p must satisfy Denote the encryption function as E [·]. The corresponding ciphertext c can be obtained by The greatest common divisor problem is introduced by adding a number of ciphertexts with a plaintext of 0, so that the value of ciphertext is large, and the ciphertext is not easy to be decrypted. Equation (3) can be formulated as Equation (4) by introducing the greatest common divisor problem where c represents the ciphertext of m after adding the greatest common divisors ∑ i∈S x i , and S is a subset of {0, 1, ..., l}.

Decryption
With corresponding private key p, the original plaintext m can be derived by where the decryption function is denoted as D[·].

Homomorphic Addition
According to Equations (3) and (5), we have the following derivations for the ciphertexts of m 1 and m 2 : where the result of encryption after addition of plaintexts m 1 and m 2 is the same as the result of encrypting the plaintexts m 1 and m 2 and then adding in the ciphertext domain.

Homomorphic Multiplication
According to Equations (3) and (5), we have the following derivations for the ciphertexts of m 1 and m 2 : = [((m 1 + 2 n r 1 )(m 2 + 2 n r 2 ) + p((m 1 + 2 n r 1 )q 2 + (m 2 + 2 n r 2 )q 1 + pq 1 q 2 )) mod p] mod 2 n = ((m 1 + 2 n r 1 )(m 2 + 2 n r 2 )) mod 2 n = (m 1 m 2 + 2 n (m 1 r 2 + m 2 r 1 + 2 n r 1 r 2 )) mod 2 n = m 1 m 2 (11) where the result of encryption after multiplication of plaintexts m 1 and m 2 is the same as that of multiplication of ciphertext after encryption of plaintext m 1 and m 2 . In conclusion, DGHV homomorphic encryption satisfies both additive homomorphism and multiplicative homomorphism, which means that DGHV is a fully homomorphic encryption scheme. Figure 1 plots the sketch of the proposed scheme, which is composed of four main phases: image encryption, data hiding, data extraction and image restoration. For the data hiding algorithm, refer to [23]. Reversible data hiding is an effective authentication or content integrity verification technique in which hidden data can be completely extracted and the original carrier can be recovered non-destructively after data extraction [30]. After the image owner encrypts the image, the data hider will embed the hidden data in the encrypted domain. With the private key, the receiver can use the corresponding decryption method to extract the encrypted embedded data. Then, by using the steps of extracting algorithm, the embedded data in the encrypted domain is extracted. Only when the private key is possessed, the receiver can obtain an image containing the embedded data similar to the original image. After decrypting, the embedded data can be extracted and the original image can be restored.

Image Encryption
According to the property of DGHV homomorphic cryptosystem, the parameter r(k) is selected randomly to ensure security. It is difficult to embed additional data directly into an encrypted image, because magnitude relationships among plaintexts cannot be kept to the corresponding ciphertexts. For this reason, we designed a corresponding data hiding strategy to embed additional data in the encrypted domain. With this strategy, we can shift the difference histogram for hiding data in encrypted domain.
Firstly, groups of two selected pixels are chosen from original image. Denote the two pixels in kth group as P 1 (k) and P 2 (k). A data owner selects an integer r(k) randomly, and encrypts P 1 (k) and P 2 (k) with the public key ∑ i∈S(k) x i , where C 1 (k) and C 2 (k) are the corresponding ciphertexts of P 1 (k) and P 2 (k), respectively, and S(k) is a subset of {0, 1, ..., l} in the kth group. We let n = 9 in this paper to ensure correct decryption. To ensure the security of the ciphertext, we select an integer r 2 in each group, which satisfies r 2 = i and encrypt 0 to generate C 2 (k) , Let D[C(k)] be the decrypted version of C(k) and D[C(k)] satisfies A data owner encrypts the original image, and a data hider can obtain the encrypted image and r 2 . The embedded data are also encrypted in this method.

Data Hiding
With the same method of the data encryption, the additional data can be encrypted. Data hider calculates the difference of the two pixels of a group, and then calculates the positive peak point and the negative peak point of the histogram. The pixels corresponding to the two peak points are used to embed additional data.

Data Embedding
After the image owner encrypts the original image by using the public key pk, the encrypted image I m and r 2 is transmitted to the data hider. When the data hider receives the encrypted image, the data hider obtains C(0) by receiving r 2 , and makes C 2 (k) subtract C(0) to recover C 2 (k). The C 2 (k) can be calculated by Then, the data hider embeds the additional data in the encrypted image and obtains a new image I w . Finally, the data hider sends I w and r 2 , the position of the positive peak point of the histogram and the position of the negative peak point of the histogram to the receiver, and the receiver can extract the embedded data and restore the original image with private keys.
In the proposed algorithm, the adjacent two encrypted pixels C 1 (k) and C 2 (k) are subtracted to obtain C d (k). Then, the position of the positive peak point of the histogram is recorded as EC max , and the position of the negative peak point of the histogram is recorded as EC min . When C d (k) = EC max , one bit of encrypted additional data is embedded in the adjacent encrypted pixel C 1 (k), and, when C d (k) = EC min , one bit of encrypted additional data is embedded in the adjacent encrypted pixel C 2 (k). In this paper, only one round of extra information is embedded.
The specific additional data embedding steps are shown as follows. Firstly, the difference between adjacent pixels in the encrypted domain is calculated as Secondly, the position of the positive peak point of the histogram and the position of the negative peak point of the histogram are selected, respectively. When C 1 (k) C 2 (k), the additional data embeds in the right half of the histogram. When C 1 (k) < C 2 (k), the additional data embeds in the left half of the histogram. The histogram shifting process is shown in Figure 2.
Thirdly, the additional information is embedded in the carrier image in the encrypted domain by shifting the difference histogram. When the additional data is encrypted, we denote it as E w (w). In this paper, the same public key ∑ i∈S(k) x i and random number r(k) are used to encrypt the additional information. Bit 1 and bit 0 are denoted as E(0) and E(1), respectively, in the encrypted domain and encrypted by the public key, which is the same as the public key used to encrypt the additional information. The embedded image pixels are C w1 (k) and C w2 (k). C w1 (k) and C w2 (k) are calculated as follows: If C w2 (k) = (C 2 (k) + E(0)) mod x 0 Denote P w1 (k) and P w2 (k) as the plaintext versions of C w1 (k) and C w2 (k), respectively. The effect of data hiding on plaintexts is to change P 1 (k) and P 2 (k) to P w1 (k) and P w2 (k): If P 1 (k) P 2 (k), To ensure the security of the ciphertext, according to Equations (14), we generate C w2 (k) with r 2 in each group, It can be seen that the embedding algorithm can be used not only in the plaintext domain, but also in the ciphertext domain. In other words, the data owner sends the encrypted image to the data hider, and then the data hider embeds the encrypted additional data into the encrypted image directly.

Data Extraction and Image Restoration
In the proposed scheme, data extraction and image restoration can be completed together. There are two ways to extract the hidden data and restore the original image.

Extract the Hidden Data and Restore Original Ciphertext Image in Encrypted Domain
When the receiver receives the encrypted embedded image, r 2 , the position of the positive and the negative peak point of the histogram, which has 18 bits of side information, the receiver can find the embedded position and use the following method to extract the embedded additional data and restore original pixel: Firstly, the receiver obtains C(0) by receiving r 2 , and makes C w2 (k) subtract C(0) to recover C w2 (k). The C w2 (k) can be calculated by Secondly, calculate the difference between adjacent pixels: where C wd (k) is denoted as the difference between adjacent pixels, which contain the additional data in the encrypted domain. Thirdly, extract embedded data and restore original pixels: where W(w) represents the extracted additional data after decryption, p is the private key and P(k) is the restored image after decryption. It can be seen that the receiver can extract the additional data and restore original image without any losses.

Extract the Hidden Data and Restore the Original Image after Decryption
When the receiver receives the encrypted embedded image, the position of the positive peak point of the histogram and the position of the negative peak point of the histogram, the receiver can find the embedding position and use the following method to extract the embedded additional data and restore the original image: Firstly, decrypt the encrypted embedded image by: where P w (k) represents the decrypted image including the additional data in the plaintext domain. Secondly, calculate the differences between adjacent pixels: where P wd (k) is denoted as the difference between the adjacent pixels in the plaintext domain.
However, since the value range of the grayscale image pixel in the plaintext domain is [0,255], the pixel value of some images restored to the plaintext domain may be 256 after decryption, which is the overflowing problem. At this point, we change the pixel value 256 to 255, and restore their position information. Then, we can embed these positions into the watermarked image through a reversible algorithm. On the receiver side, the receiver can recover the image by extracting the position information with the reversible algorithm. After that, the legal receiver can extract the additional data and restore the original image without any loss.

Experimental Results
In the experiment, we objectively evaluated the experimental results from the aspects of computational cost, peak signal-to-noise ratio (PSNR), embedded rate and imperceptibility. At the same time, by comparing with Xiang [23] and Xiang [24], it showed that the DGHV fully homomorphic encryption watermarking algorithm proposed in this paper is greatly improved in terms of computational cost.

Computational Cost
A series of experiments confirmed the performance of proposal algorithm. Different standard test images were used to verify the results of the experiment. One of the important performance indicators for measuring the homomorphic encryption algorithms is computational cost. The lower is the computational cost, the more useful the program is in the application. Table 1 presents four grayscale images of size 512 × 512 for testing and shows the average computational cost of ten times under different capacity.
On the one hand, it can be seen from the experimental data in Table 1 that the encryption and decryption times of different images are slightly different while the embedding capacity and the carrier image size are the same. Besides, the embedding and extraction times are also slightly different, respectively, which are mainly due to the different images in the text. On the other hand, at the stage of embedding additional data and extracting data, the computational cost increases as the number of embedded data increases.
By comparing a plaintext and the resulting version from the corresponding ciphertext, the results show that the pixel values of the original image are the same as the pixel values of the image pixels after decryption. As for the extraction of the additional data, there is no data loss or disorder of the sequence in the extraction of the embedded data.  Table 2 uses three same images, respectively, and selects Lena diagrams but different size as a group to compare the average computational cost of ten times of embedded additional data with different number of bits. In Table 2, it can be seen that the computational cost of the proposed algorithm corresponds to the size of the embedded data and the original image. For the same embedding rate, the smaller is the carrier image size, the lower is the computational cost. Furthermore, the computational cost is positively related to the size of the original image. The larger is the image, the higher is the computational cost needed for the encryption. The size of the embedded data has less influence on the computational cost in comparison with the image size.

Security
The security of hiding information in the encrypted domain is one of the important issues. The algorithm proposed in this paper is an algorithm for hiding information in the DGHV full homomorphic encrypted domain, which has both the property of addition homomorphism and multiplication homomorphism. To ensure the security of additional information, the greatest common divisor problem is introduced in the proposed algorithm. That is to say, some ciphertext sets x i with zero in the plaintext are added. Let x i be the public key set and randomly select a subset of x i as the public key for the encryption. Because the additive subsets are ciphertext with 0 in the plaintext, these subsets have no effect on the decryption. If an attacker does not know the private key, he cannot decrypt the encrypted image and cannot extract the additional information since the plaintexts were encrypted with the random integer r(k). Therefore, the security of the embedded information is ensured.

Imperceptibility
PSNR is one of the important indicators to measure the imperceptibility of the watermarking algorithm in the spatial domain. Generally speaking, the larger is the PSNR, the better is the imperceptibility achieved. Table 3 lists the PSNR values with four different images sized by 512 × 512.
We can see that all of the PSNR values are over 50 dB, showing the embedding distortion is smaller. In Table 3, it can be seen that, when the embedding capacity is larger, the PSNR decreases accordingly.  Figure 3 plots the four original images (Lena, Airplane, Lake and Man), the encrypted images after watermarking, the directly decrypted images, and the restored images. In total, 4096 bits of data are embedded in the encrypted images, respectively. We have the following observations from this figure:

Original Images and Their Processed Versions
(1) The encrypted image after embedding data has no correlation with the original image.
The experimental results show that the additional information in the DGHV fully homomorphic encrypted domain has achieved good results, which depends on the security of DGHV fully homomorphic encryption. That is, without the private key, the encrypted image cannot be decrypted and the additional information cannot be extracted in the encrypted domain. (2) Comparing the original image with the decrypted image containing the additional information, the visual distortion due to the watermark is small. After the original images are encrypted and embedded with 4096 bits of data, the PSNR values of the watermarked images are greater than 57 dB in Figure 3, which has satisfactory imperceptibility.
The information embedded in the fully homomorphic encrypted domain in the proposed algorithm is reversible, and the original image can be successfully recovered without distortion after decrypting and extracting the embedded information.

Performance Comparison
The proposed algorithm in this paper uses the histogram shifting technique to embed additional information in the DGHV encryption domain. Compared with Xiang [23], the algorithm proposed in this paper has lower computational cost.
To compare the proposed method with two existing methods [23,24], Figure 4 plots the PSNR values at different embedding rates by using the Lena image and the Airplane image, respectively. It can be seen in Figure 4 that, at the low embedding rate, the PSNR is slightly different from that of the method in [23]. At the high embedding rate, the PSNR of the proposed algorithm is higher than that of the method in [23]. In addition, the PSNR value of the proposed method is much higher than that of the method in [24]. Compared with Xiang [24], the proposed algorithm has greatly improved the imperceptibility of the image after embedding for the same embedding rate.  Method in [23] Method in [24] Proposed method Method in [23] Method in [24] Proposed method (b)

Conclusions
In this paper, we propose a reversible data hiding algorithm with DGHV fully homomorphic encryption and analyzed the feasibility of the scheme from the perspective of information entropy. In the proposed algorithm, we used two adjacent pixels as a group and encrypted the two pixels in a group with the same parameter. In such a way, the correlation between adjacent pixels can be transmitted to the encrypted domain. As a result, the difference histogram distribution of the encrypted image is not uniform. Thus, there is a residual entropy space for additional information embedding in the encrypted domain.
This method has better solved the problems of quickly encrypting multiple bits of data and embedding additional data in an encrypted domain. This algorithm has lower computational cost, higher security, and better imperceptibility. In future research, how to embed a robust and reversible watermark into this encrypted domain will be considered.