Hyperspectral Image Compression Using Vector Quantization, PCA and JPEG2000

Compression of hyperspectral imagery increases the efficiency of image storage and transmission. It is especially useful to alleviate congestion in the downlinks of the planes and satellites from which these images are usually taken. A novel compression algorithm is presented here. It first spectrally decorrelates the image using Vector Quantization and Principal Component Analysis (PCA), and then applies JPEG2000 to the Principal Components (PCs), exploiting spatial correlations for compression. We take advantage of the fact that dimensionality reduction preserves more information in the first components, allocating more depth to the first PCs. We optimize the selection of parameters by maximizing the distortion-ratio performance across the test images. An increase of 1 to 3 dB in Signal to Noise Ratio (SNR) for the same compression ratio is found over just using PCA + JPEG2000, while also speeding up compression and decompression by more than 10%. A formula is proposed which determines the configuration of the algorithm, obtaining results that range from heavily compressed, low-SNR images to lightly compressed, near-lossless ones.


Introduction
Images have been used since their inception to study different subjects of interest. Information can be remotely captured (with the only physical interaction being at most lighting the subject) and then analyzed by powerful algorithms to extract information the naked eye cannot see.
Traditionally, three different wavelengths have been captured per pixel, the building blocks of an image. This is enough to represent all the colors a human can see, since we perceive different colors as a mixture of red, green, and blue. Hyperspectral images extend this concept by sampling hundreds of wavelengths per pixel, making what is called a spectral signature, ranging from below the infrared to above the ultraviolet.
Spectral signatures greatly improve remote sensing capabilities, and have been used to identify materials within an image [1], for agriculture [2], or even for cancer detection [3].
These studies greatly benefit from the massive amounts of data that can be captured. A single image is usually millions of pixels in size, with each containing a couple hundred samples [4]. When dealing with this data, a bottleneck arises around storage, which is limited on the planes and satellites where capture devices are usually mounted.
A viable solution is to compress the images, which can be done in a lossy or lossless way [5][6][7]. If all of the information is of critical importance, and any distortion to the original image could cause errors further down the line, lossless compression is the way to go. For most purposes, however, a moderate loss can be allowed, achieving much higher compression ratios.

Several approaches to hyperspectral compression can be found in the literature:
• Ideas from image compression have been extended from 2D to 3D space [9,10], adapting to the 3D-like matrix structure of hyperspectral images.
• Dictionary-like methods have been proposed [11], in which spectral libraries are used to aid with compression.
• Dimensionality reduction has been used in the spectral direction, along with image compression on each of the reduced planes [12].
In this paper, a novel technique for lossy compression, composed of multiple steps, is explored:
• First, nonlinear dimensionality reduction is applied in the spectral direction. This is accomplished in two steps [13]: pixels are clustered using the nearest neighbor algorithm, and then Principal Component Analysis (PCA) is performed on each cluster with the same number of target dimensions. The aim of this step is to eliminate spectral correlation, transforming the original bands into Principal Components (PCs). This transform has been shown to preserve relevant image characteristics in the transformed space [14], which is an interesting feature for compression.
• Secondly, each PC is individually compressed using two techniques found in the JPEG2000 standard [15]: Wavelet transforms and Block Coding. Compression is tuned for each PC, allocating more bit depth to the most important PCs.
• Finally, all of the data is packed into the compressed stream, along with the necessary metadata to decompress it. This metadata notably includes Wavelet and Block Coding settings, as well as all of the inverse PCA transforms and the pixel classification, which is arithmetically encoded.
We base the results of our algorithm on two metrics: Signal to noise ratio (SNR) and compression ratio. A third metric is derived from those: distortion-ratio, which measures, given a compression ratio, how good the quality (SNR) is. Thus, it cannot be measured directly, but when given two images compressed at the same ratio, we say that the one with higher SNR has a higher distortion-ratio metric. Results show that an image can be compressed to around 1/200 of its original size with no perceivable loss of information to the naked eye and up to 2000 times while preserving good SNR.
The rest of the paper is organized as follows: We first take an in-depth look at the algorithm, to see exactly how every step works. Then the implementation is explained, and the results are shown and compared with [12]. Finally, conclusions are drawn from the results, and we hint at possible future research lines in which we think compression could be improved even more.

The Algorithm
Hyperspectral images H can be seen as 3-dimensional cubes of data of size N_x × N_y × N_z. The x and y dimensions are spatial, while the third, z, is spectral and accommodates the different wavelengths. The nature of hyperspectral images makes two types of correlation appear:
• First, we have spectral correlation: bands that are close together have similar values in the same pixels. This is due to the spectrum of the different materials being a continuum, and to our samples being close together (<10 nm differences in wavelength).
• On the other hand we have spatial correlation: neighboring pixels are expected to have similar values, since image features will usually span regions many pixels in size.
Compression in general takes advantage of redundancies in the input data. In this case the spectral redundancies are more pronounced than the spatial ones, and treating them independently can improve our distortion-ratio performance. Traditional image compression algorithms, when applied per band, do not take into account the spectral correlation, and lose compression ratio. Extensions to the 3-dimensional space assume the correlation is the same both in the spectral and spatial direction, which is not usually the case, and either distort the image more (if the large correlation is assumed) or achieve worse compression ratio (if the small correlation is assumed).
Tailoring the compression algorithm to the specifics of the input data can improve its performance, and that is the core of our approach: extending ideas found in the literature to better suit the nature of hyperspectral data.
Our starting point is the work done in [12]. The idea is to combine dimensionality reduction (PCA) along the z-axis with JPEG2000 compression in the (x, y) plane. Every plane (the reduction will not usually collapse z to a single point) is individually compressed, since the spectral correlation is mostly gone after reduction.
We aim to improve it in two ways: First, we add a vector quantization step prior to the PCA reduction to better adapt to local spectral statistics, by independently reducing clusters of pixels with similar characteristics. Secondly, we do not use a commercially available JPEG2000 solution. Instead, we have followed the standard and implemented the algorithm with a few variations, adapting to our needs (e.g., eliminate certain markers that are useful for partially decoding images, but that bloat image size). Our modified version follows these steps, which can be seen in Figure 1.

Dimensionality Reduction
Linear reduction (such as PCA) assumes a linear correlation model. This works for most cases, and PCA is well-known and used in the literature, but it can be improved. In our case, we decided to use VQPCA (Vector-Quantization PCA) [13].
VQPCA is a piecewise linear reduction model. It is a midway point that benefits from the simplicity of linear models, but offers results close to nonlinear ones. It works by quantizing the pixels (vectors) obtaining a user-defined number of representatives, which partition the z-dimensional space into a Voronoi [16] diagram. Each Voronoi cell is independently reduced by a different PCA, trained using only the points of its own cell. With this, we are exploiting pixel spectral similarities twice: similar pixels are grouped together in Voronoi cells, and then a PCA decides which directions maximize variance within each cluster. The result is that rounding errors are reduced, since applying PCA to clusters of similar pixels ensures less variance, and thus better resolution in the quantization step (Section 2.3).
In our case, we use the K-Means [17] algorithm to group the pixels p ∈ R^n (so N_z = n) of the image H into c different clusters. This yields centroids P_1, ..., P_c and sets S_1, ..., S_c corresponding to the cells of the Voronoi diagram with centers P_1, ..., P_c. We then perform PCA on each of the sets S_i, choosing a target dimension m for the reduced space. We obtain matrices W_i ∈ M_{m×n} and W̃_i ∈ M_{n×m} that respectively project the original data onto the reduced space and immerse the reduced data into the original space. With this we obtain the reduced image H̃, where each pixel p ∈ S_i is projected as W_i p. With this approach, similar pixels are grouped together before projection, so the PCA algorithm can ignore the variance present between clusters, and better separate the pixels within each one. This comes at a cost: first, we need to store multiple PCA inversion matrices; and second, we also need to store, for each pixel, the index i of the cluster it belongs to, in order to invert the process using W̃_i. Compression is all about reducing size, and this overhead actually increases it, so we might wonder where the benefit is. Since this algorithm achieves better separation than PCA, we might get the same SNR with VQPCA and m dimensions as with PCA and m + 1 dimensions. The extra dimension saved compensates for the extra space needed by VQPCA. Note that, in the case that only one cluster (c = 1) is used, VQPCA is equivalent to PCA. This makes VQPCA a good choice, since it can always fall back on a well-tested method in case the clustering does not improve compression. We will later see the effect of the number of clusters on the distortion-ratio performance.
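The reduction pipeline described above (K-Means clustering followed by one PCA per cluster) can be sketched in a few lines. This is a minimal NumPy sketch, not the Jypec implementation (which relies on SMILE and EJML); the function names, the toy Lloyd iteration, and the use of per-cluster means are our own assumptions.

```python
import numpy as np

def vqpca_fit(pixels, c, m, iters=20, seed=0):
    """Fit VQPCA: cluster pixels with K-Means, then fit one PCA per cluster.

    pixels: (num_pixels, n) array of spectral signatures.
    Returns centroids, per-cluster means, and projection matrices W_i (m x n).
    """
    rng = np.random.default_rng(seed)
    cent = pixels[rng.choice(len(pixels), c, replace=False)]
    for _ in range(iters):  # plain Lloyd iterations (toy K-Means)
        lab = np.argmin(((pixels[:, None] - cent[None]) ** 2).sum(-1), axis=1)
        for i in range(c):
            if np.any(lab == i):
                cent[i] = pixels[lab == i].mean(0)
    means, Ws = [], []
    for i in range(c):
        X = pixels[lab == i]
        mu = X.mean(0)
        # PCA via SVD: rows of Vt are the principal directions of the cluster
        _, _, Vt = np.linalg.svd(X - mu, full_matrices=False)
        means.append(mu)
        Ws.append(Vt[:m])            # W_i (m x n): projects onto reduced space
    return cent, means, Ws

def vqpca_reduce(pixels, cent, means, Ws):
    lab = np.argmin(((pixels[:, None] - cent[None]) ** 2).sum(-1), axis=1)
    red = np.stack([Ws[l] @ (p - means[l]) for p, l in zip(pixels, lab)])
    return red, lab                  # lab must be stored to invert the process

def vqpca_restore(red, lab, means, Ws):
    # For an orthonormal PCA basis, the immersion matrix is just W_i transposed
    return np.stack([Ws[l].T @ r + means[l] for r, l in zip(red, lab)])
```

With c = 1 this degenerates to plain PCA, mirroring the fallback property noted above.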

Wavelet Transform
After reducing H, H̃ is now a cube of size N_x × N_y × m. We have removed the z-axis correlation, so we will now process each PC B_j ∈ M_{N_x×N_y}, j = 1, ..., m, separately.
A wavelet transform exploits spatial redundancies in a matrix to separate it into regions with different but consistent patterns, which can later be efficiently encoded. The wavelet transform selected is the CDF 9/7 from the JPEG2000 standard [18]. It will be applied to each PC independently, and we can configure how many times it is recursively applied. More passes will compress the image more, but will also produce results of a lesser SNR.
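The CDF 9/7 transform can be sketched in one dimension via its lifting formulation. The lifting coefficients below are the commonly cited JPEG2000 values, but which of the two output channels receives the scaling constant K varies between formulations, so treat this as a hedged sketch whose main guarantee is that the inverse exactly undoes the forward pass.

```python
import numpy as np

# Lifting coefficients of the irreversible CDF 9/7 wavelet (JPEG2000 Part 1).
ALPHA, BETA  = -1.586134342, -0.05298011854
GAMMA, DELTA =  0.8829110762, 0.4435068522
K = 1.230174104914   # scaling constant; channel assignment here is a convention

def _left(a):    # symmetric extension on the left neighbour
    return np.concatenate(([a[0]], a[:-1]))

def _right(a):   # symmetric extension on the right neighbour
    return np.concatenate((a[1:], [a[-1]]))

def fwt97(x):
    """One lifting pass of CDF 9/7 on an even-length 1D signal."""
    e, o = x[0::2].astype(float), x[1::2].astype(float)
    o += ALPHA * (e + _right(e))    # predict 1
    e += BETA  * (_left(o) + o)     # update 1
    o += GAMMA * (e + _right(e))    # predict 2
    e += DELTA * (_left(o) + o)     # update 2
    return e / K, o * K             # low-pass, high-pass

def iwt97(low, high):
    """Exact inverse: undo scaling, then the lifting steps in reverse."""
    e, o = low * K, high / K
    e -= DELTA * (_left(o) + o)
    o -= GAMMA * (e + _right(e))
    e -= BETA  * (_left(o) + o)
    o -= ALPHA * (e + _right(e))
    x = np.empty(2 * len(e))
    x[0::2], x[1::2] = e, o
    return x
```

In the 2D case, one pass applies this transform to every row and then every column of a PC; the w recursive passes mentioned above reapply it to the resulting low-pass quadrant.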

Variable Bit-Depth Quantization
Both the dimensionality reduction and the wavelet transform work on floating point numbers. When coding, it is easier to deal with integer arithmetic, since it usually takes up less space, and also because processing integers is faster. So, before coding, our data is quantized.
Quantization means choosing an integer interval ([−2^(q−1) + 1, 2^(q−1) − 1] in this case), and mapping the input data to that interval, rounding off any extra decimals. The exponent q is what we call the bit-depth. A larger bit-depth means better SNR, but also a worse compression ratio. Since not all the PCs B_j carry the same amount of information (PCs with lower j hold more information about the input data), we can use different bit-depths q_1, ..., q_m for the different PCs.

Coding
After quantization, the block coder from the JPEG2000 standard comes into play. Exploiting the patterns left by the wavelet transform, the image is coded using a binary arithmetic coder. The standard specifies a way of marking the compressed stream so that partial, lower SNR representations of the original data can be recovered without decompressing the whole image. For an image that is sent over the Internet this is useful, since previews can be obtained without receiving all of the data. Analyzing images requires the full picture to be available, so we have decided to not use standard-compliant JPEG2000 compression, but a custom format which eliminates all markers, resulting in a size reduction.

Metadata
Once coding is done, the original data H is compressed. However, to get the full picture we also need the metadata. Hyperspectral metadata usually contains geographical information, the wavelength of each band, sensor calibration settings and more. This is usually stored in plain text, but we also compress it by storing it in native raw values (byte, short, int, . . . ).
On top of that, we of course need metadata about the algorithm itself. The different configuration parameters are saved along the way (bit-depth values, wavelet pass number, and VQPCA recovery metadata). These are all encoded raw except the dictionary which maps pixels to their Voronoi cell. This information, a matrix of indices 0, ..., c − 1, is arithmetically coded using a predictive model. It works by predicting the index of a pixel based on already processed (in a line-by-line sweep of the image) neighboring pixels. If the prediction is correct, a special symbol 0 is encoded indicating a hit, while if the prediction fails, the full index i = 1, ..., c is encoded. The predictive model simply uses the previous symbol to guess the current one.
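The prediction step for the cluster-index map can be sketched as follows. Only the symbol stream is modeled here (hit symbol 0, or the full index shifted to 1, ..., c); the arithmetic coder that would actually compress this stream is omitted, and the function names are our own.

```python
def predict_encode(indices):
    """Map a raster-scan cluster-index stream to prediction symbols.

    Prediction: current index equals the previous one. A hit emits the
    special symbol 0; a miss emits the full index shifted into 1..c.
    On smooth images the result is mostly zeros, which an arithmetic
    coder then compresses very efficiently.
    """
    out, prev = [], None
    for i in indices:
        out.append(0 if i == prev else i + 1)
        prev = i
    return out

def predict_decode(symbols):
    """Invert predict_encode: a 0 repeats the previous index."""
    out, prev = [], None
    for s in symbols:
        prev = prev if s == 0 else s - 1
        out.append(prev)
    return out
```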
With that, the full image is compressed.

Implementation
The algorithm has been fully implemented in Java, under a command line interface named Jypec (Java hYPErspectral Compressor). The code can be found on GitHub [19].
Both the dimensionality reduction and wavelet steps work on 32-bit float matrices. Intermediate accumulators and calculations use 64-bit double precision numbers. This was chosen since the raw image data, usually 16-bit integers, fit without loss in floats, taking up half the memory we would need if we used double precision. We found no significant difference in distortion-ratio performance when double precision was used throughout.
After quantization, the rest of the operations are done over q_j-bit sign-magnitude integers for each PC B_j.
The nearest neighbors algorithm used comes from the SMILE [17] API. Unlike other implementations [20], it can be trained with a subset of samples speeding up computation times.
The PCA algorithm has been implemented from scratch, relying on EJML [21] for fast matrix multiplication.
The wavelet transform implements the lifting [22] scheme to accelerate execution, and coefficients are kept between ±0.5 with renormalization procedures, to avoid coefficient runoff.
Quantization is done via a deadzone quantizer, which for a depth of q bits maps to 2 q − 1 values. (One value is lost since the mapping interval for zero is twice as wide as the rest).
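A deadzone quantizer along those lines might look like this. The mapping of the ±0.5 coefficient range onto 2^q − 1 sign-magnitude values follows the description above, but the exact step size used by Jypec is an assumption.

```python
import numpy as np

def deadzone_quantize(x, q):
    """Deadzone quantizer for coefficients in [-0.5, 0.5] at bit depth q.

    Produces sign-magnitude integers in [-(2**(q-1)-1), 2**(q-1)-1],
    i.e. 2**q - 1 distinct values: the zero bin spans (-delta, delta),
    twice as wide as every other bin, which is why one value is "lost".
    """
    levels = 2 ** (q - 1) - 1
    delta = 0.5 / levels          # step size (assumes the +-0.5 coefficient range)
    idx = np.sign(x) * np.floor(np.abs(x) / delta)
    return np.clip(idx, -levels, levels).astype(int)

def deadzone_dequantize(idx, q):
    levels = 2 ** (q - 1) - 1
    delta = 0.5 / levels
    # reconstruct at bin midpoints; the zero bin reconstructs to exactly 0
    return np.where(idx == 0, 0.0, np.sign(idx) * (np.abs(idx) + 0.5) * delta)
```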
Finally, the image is divided in smaller blocks which are then coded according to the JPEG2000 standard. The partitioning is fully deterministic so as to not have to store positional information about the blocks.
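The deterministic partitioning can be sketched as a fixed raster-order tiling; since coder and decoder run the same loop, no block positions need to be stored. The 64-sample block side is an assumption (a typical JPEG2000 code-block size), not a value taken from Jypec.

```python
def blocks(h, w, size=64):
    """Deterministic raster-order tiling of an h x w plane into code blocks.

    Yields (row, col, height, width) for each block; edge blocks are
    truncated. Both ends of the codec derive the same sequence, so no
    positional metadata enters the compressed stream.
    """
    for y in range(0, h, size):
        for x in range(0, w, size):
            yield y, x, min(size, h - y), min(size, w - x)
```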
We allow fine tuning of the different parameters of the algorithm to test different configurations and see which one obtains the best results.

• Subsampling factor t: Indicates the percentage of pixels (randomly chosen) used for dimensionality reduction training. The lower it is, the faster the algorithm performs (training costs for VQ and PCA increase quadratically and linearly, respectively, with t), but usually at the cost of distortion-ratio performance (see Section 4).
• Number of wavelet passes w: How many times the wavelet is recursively applied to each PC.
• Target dimensions m: Number of dimensions that we are reducing the original image to.
• Target bit depth q: Base bit depth used to process all of the PCs created after reducing H.
• Number of clusters c used for Vector Quantization in VQPCA.
• PC bit shave: This is how variable bit-depth is specified. For each PC, one can indicate how many bits to shave (s_1, ..., s_m) from the target bit depth q, creating the values q_1 = q − s_1, ..., q_m = q − s_m.

Experimental Results
We are interested in obtaining a good distortion-ratio performance. That is, given a certain compression ratio we want the minimum distortion possible or, conversely, given a distortion we want the maximum compression ratio.
The metric used to measure distortion is the signal to noise ratio SNR (Equation (2)). This is used to measure the difference between the original image H_a and the result of compressing and then uncompressing it (H_b). SNR is based on the mean square error MSE (Equation (3)) between the samples h_a_k, h_b_k of the hyperspectral images H_a, H_b, both with K = N_x × N_y × N_z samples:

SNR = 10 log_10 (σ(H_a)² / MSE)    (2)

MSE = (1/K) Σ_{k=1..K} (h_a_k − h_b_k)²    (3)

where σ(H)² (Equation (4)) is the variance of H, used to stabilize the metric when there is too much variance in the input image:

σ(H)² = (1/K) Σ_{k=1..K} (h_k − μ(H))²    (4)

with μ(H) being the mean value of the samples h_k ∈ H.
We can see that, under this definition, SNR is not symmetric. From now on, when we refer to an uncompressed image's SNR, H a will always be the original image, allowing us to compare results in a fair manner.
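Under these definitions the metric is direct to compute; this small sketch makes the asymmetry explicit, since the variance is always taken over the original image H_a.

```python
import numpy as np

def snr_db(original, restored):
    """SNR of a reconstruction, per the (asymmetric) definition above:
    the variance of the ORIGINAL image over the mean squared error."""
    mse = np.mean((original - restored) ** 2)
    var = np.var(original)        # sigma(H_a)^2, computed on H_a only
    return 10.0 * np.log10(var / mse)
```

Swapping the arguments changes which image supplies the variance, which is exactly why we fix H_a to the original when reporting results.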
Unless otherwise stated, the parameters from Section 3 are set as follows:
• t: We found no relevant effect on distortion-ratio when setting t > 0.25, so that is the value used throughout the experiments. Performance lowered slightly for 0.1 < t < 0.25, and rapidly degraded for t < 0.1, so values in those ranges should be avoided.

• The number of wavelet transform passes w is set to three, following the usual value set in the literature [9,10].

• The number of dimensions m and global bit depth q are not fixed. We begin by optimizing those in Section 4.1.
• No bit shaving is done at this point. Bit shave effects are studied later in Section 4.3.
• VQPCA is configured to use one cluster c (equivalent to PCA). This will later be optimized in Section 4.2.

Varying Dimensions and Bit Depth
We start by analyzing the effect of changing the dimensions (m) and bit depth (q) when compressing our image. All images present similar behavior when being compressed, so for brevity we show compression results of the Cuprite image in Figure 2a for SNR and in Figure 2b for compression ratio.
The idea now is to maximize SNR for a given compression ratio. We take the values from Figure 2a,b and create contour maps with 50 lines each. We then intersect each compression ratio contour line with all of the SNR contour lines, and keep the intersection(s) with the highest SNR line. In this way we create a map of points that gives us a configuration for the algorithm which maximizes SNR for a given compression ratio (see Figure 3a). The number 50 is chosen since it gives a good number of points to later create a regression line, with more points not adding significance to the results.
We repeat this process for every image in our test library, creating a 2D histogram showing which configurations have been optimal (in terms of maximum SNR for a given compression ratio) more times; each hit represents an optimal configuration for some image. Polynomial regression over this histogram gives us a path that, going from highest to lowest compression ratio, tells us what configuration to use to get the optimal SNR for that compression ratio (see Figure 3b). We will refer to values from this regression (shown in Equation (6)) as the QR configuration, which gives us optimal values for q and m as a function of an index j (which ranges from zero onwards).
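The optimization procedure above can be approximated in a few lines. Instead of intersecting contour lines, this sketch discretizes the compression-ratio axis and keeps, for each target ratio, the measured configuration with maximum SNR among those that still reach it; the grid layout and names are our own assumptions.

```python
import numpy as np

def optimal_path(snr, ratio, n_levels=50):
    """Trace the best (q, m) configuration per compression-ratio level.

    snr[i, j] and ratio[i, j] hold measured SNR and compression ratio for
    bit depth q_i and dimensions m_j. For each of n_levels target ratios
    we keep the configuration with maximum SNR among those that still
    reach the target - a discrete stand-in for the contour-intersection
    procedure described in the text.
    """
    path = []
    for target in np.linspace(ratio.min(), ratio.max(), n_levels):
        feasible = ratio >= target
        if not feasible.any():
            break
        masked = np.where(feasible, snr, -np.inf)
        i, j = np.unravel_index(np.argmax(masked), snr.shape)
        path.append((target, i, j))
    return path
```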

Optimal Number of Clusters
Now we want to see the effect of changing the number of clusters c of the VQPCA algorithm, following the optimal distortion-ratio path given by the QR configuration. The general trend can be seen in Figure 4a, which shows the points at which SNR is maximized given a certain compression ratio. An outlier (the CPR image), which behaves differently, is shown in Figure 4b; the rest of the images followed the general trend. In any case, we now add the parameter c to our QR configuration, creating the QRC configuration, as seen in Equation (7).

Variable Bit-Depth
So far, every PC of the image has been compressed with the same amount of bits. Dimensionality reduction usually keeps more information in the first PCs, progressively adding less detail with subsequent ones. This suggests allocating different bit depths (q_1, ..., q_m) to the m different PCs.
The way we do this is by specifying the initial bit depth q, and then the bit shave values s_1, ..., s_m, yielding bit depths q_1, ..., q_m for all m PCs (see Section 3).
We have tried 13 different shave patterns to see which one yields the best distortion-ratio performance. They can be seen in Figure 5a. Some have been selected based on their properties (e.g., pattern 8 halves the information stored after every 2^i-th PC, pattern 9 halves the information stored on every PC), while others are just tests to see how they would perform. We compress our image test bench, varying the pattern as well as the QRC configuration, obtaining the heat map of Figure 5b, which shows which configurations obtained optimal distortion-ratio performance more often. In this case, Patterns 5 and 8 come out on top.
In Figure 6 we see the extent of the increase in distortion-ratio performance, relative to using no bit shaving. For the same SNR, we achieve compression ratios up to 1.8 times better when using the bit shaving pattern 8.

Final Results
In the end, we have optimized the compression parameters q, m, c and s_i, and we can define a configuration F(j) = {QRC(j), s_i(j)} = {q(j), m(j), c(j), s_i(j)}. Increasing j will result in an increase of SNR and a decrease of compression ratio, all while following the optimal distortion-ratio path. Figure 7 shows the results for our test image set, where we have compressed our images with configuration F(j), j = 0, ..., 19. With the exception of the CPR image, all the others follow a similar slope, but achieve the same qualities at significantly different compression ratios. Depending on what we want to use the image for, we will have to choose which j to set. Numbers are not always descriptive of the results, so, as an example, we include the results of compressing the SUW image: Figure 8 shows how the SNR progresses from left to right, while the compression ratio decreases.

Discussion
Parameter optimization results show that, for configurations F_j with low values of j, it is best to increase the number of dimensions m and the bit depth q at the same rate, starting with m = 1, q = 5, suggesting that both have an equal impact on distortion-ratio performance. After reaching the m = 4, q = 8 mark, it is best to increase m around three times as much as q to stay on the optimal path. Bit depth is thus important only up to a certain point, after which added bits have little impact compared to adding more PCs.
With respect to c, it is best kept at 1 (where VQPCA is equivalent to PCA) when j < 5. After that, adding more clusters does slightly improve distortion-ratio performance. This is however image dependent, and a clear correlation was not found.
Bit shave is an important tool that highlights the fact that dimensionality reduction concentrates information in the first PCs. From our tests, Pattern 8 clearly achieved the best results, shaving one more bit every time the PC counter goes past a power of two. The amount of bits shaved, despite being small compared to other patterns, is enough to significantly reduce image size, while at the same time preserving good SNR. Compression is also faster when shaving since less bits are coded.
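Our reading of pattern 8 can be written down directly; the formula below is our own reconstruction of the description above, not a definition taken from the paper: the shave value grows by one every time the 1-based PC counter passes a power of two, and each PC then gets depth q_j = q − s_j.

```python
import math

def pattern8_shave(m):
    """Bit-shave values under our reading of pattern 8: one extra bit is
    shaved every time the (1-based) PC counter passes a power of two."""
    return [int(math.log2(j)) for j in range(1, m + 1)]
```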
Configurations F j offer great distortion-ratio performance across all test images. It is worth noting, however, that results are sometimes image dependent, so optimal parameter selection for a concrete image may vary slightly.

Comparison with Other Algorithms
Comparison of compression algorithms is a tricky subject, since it can only be done when using the same compression metrics and images. We have been able to compare our algorithm (Jypec) with the PCA + JPEG2000 approach from [12], showing an improvement in distortion-ratio performance, as seen in Figure 9.
Execution times (both in compression and decompression) have also improved (Figure 10). This is mainly driven by the bit shaving technique, which saves time when encoding the final bit stream, as each band requires fewer bit planes to be coded. Figure 9 compares the distortion-ratio performance of the work done in [12] and the algorithm developed here: except in some isolated cases, Jypec achieves better distortion-ratio performance, having higher SNR for the same compression ratio and higher compression ratio for the same SNR. Figure 10 shows that execution times clearly decrease for both compression and decompression. Compression ratio also improves consistently, being always better and almost double in some cases. SNR mostly improves, and where it does not, the increase in compression ratio makes up for it (see Figure 9).

Conclusions and Future Research Lines
A novel hyperspectral image compression algorithm is presented, based on the idea of first applying dimensionality reduction along the spectral axis, and then individually compressing each Principal Component (PC).
Vector-Quantization Principal Component Analysis (VQPCA) is chosen for reduction over the well-known PCA algorithm, which is a particular case of VQPCA. This method clusters the input pixels into groups, with each getting its own PCA fitted to improve separation.
After reduction, each of the resulting PCs is individually compressed using ideas from the JPEG2000 standard. A wavelet transform is applied, splitting the PC into four regions with increased local redundancy that allow for better compression, while losing minimal information in the process. A variable bit-depth quantizer transforms each PC from floating point to integer arithmetic, allowing PCs with more information to have greater bit-depths. Finally, a deterministic blocking algorithm, along with a custom version of the JPEG2000 block coder, encodes and compresses the information into the final bit stream, avoiding any markers that might bloat the final size.
Aside from compressing data, the image as well as the algorithm metadata are also compressed and embedded into the output stream, enabling decompression with just one file.
Results show an improvement in distortion-ratio performance over PCA + JPEG2000, with a 1 to 3 dB increase in SNR for the same compression ratio. This is mainly driven by the variable bit-depth, allocating more bits to the most important PCs. Compression and decompression times also exhibit a speedup of 16% and 12% respectively.
Jypec, the framework developed for testing these ideas, can be a great platform to expand this study. More compression flows can easily be explored in the future, especially by testing other nonlinear dimensionality reduction algorithms, or by including different coding techniques aside from JPEG2000. Finally, it is clear that compression is highly dependent on the input data. Studies which optimize compression parameters to obtain the best distortion-ratio performance will be highly valuable to further enhance how we treat hyperspectral data.

Funding: This work has been supported by the Spanish MINECO projects TIN2013-40968-P and TIN2017-87237-P.