On the Utilization of Reversible Colour Transforms for Lossless 2-D Data Compression

Reversible Colour Transforms (RCTs) in conjunction with the Bi-level Burrows-Wheeler Compression Algorithm (BBWCA) allow for high-level lossless image compression, as demonstrated in this study. The RCT produces image data that is far more correlated among neighbouring pixels than in the RGB colour space. This aids the Burrows-Wheeler Transform (BWT) based compression scheme and yields high compression ratios in the subsequent stages of the algorithm. The proposed scheme has been validated against a range of benchmark schemes, and its performance surpasses the other schemes. The proposed compression outperforms techniques developed exclusively for 2-D electroencephalogram (EEG), RASTER map and Color Filter Array (CFA) image compression. The proposed system shows no dependency on parameters such as image size, image type or the medium in which the image is captured. A comprehensive analysis of the proposed scheme concludes that it achieves a significant increase in compression with complexity comparable to the various benchmark schemes.


Introduction
Lossless image compression is regarded as one of the most challenging tasks in the research community. Lossless coding methods are employed to retain image quality in applications where visual quality is critical. In fields such as remote sensing, business documentation, digital radiography, the medical industry and satellite imagery, work is being done to achieve significant error-free compression [1][2][3].
Lossless image compression is also utilized in techniques such as information hiding and RASTER map compression [4,5]. Satellite and medical imagery are often high-resolution and at times require the original data to be protected from any loss. Different types of images, such as grayscale, multi-spectral, hyperspectral, electroencephalogram (EEG) and Magnetic Resonance Imaging (MRI) images, have been processed to achieve lossless compression [6][7][8][9][10][11][12][13].
Lossless compression is achieved mainly by exploiting the redundancy of intensity levels in the image, which makes it a demanding task to reduce the file size without compromising quality [14][15][16]. By transforming the image data prior to compression, higher data redundancy can be obtained, which in turn leads to higher compression [14]. Highly correlated pixel regions offer greater information redundancy, and this inter-pixel redundancy is utilized for effective lossless compression. This research work is oriented around enhancing the lossless compression of colour images based on the Bi-level Burrows-Wheeler Compression Algorithm (BBWCA) by utilizing colour space transformations. The BBWCA developed by Khan et al. in [14] utilized the Reversible Colour Transform (RCT), focusing on its extensive range of applications given in [15]. In the proposed technique, various colour space transformations were analyzed for the increase in inter-pixel redundancy they provide, leading to higher compression. Section 2 presents the various techniques employed for lossless compression, with a major focus on the literature related to the Burrows-Wheeler Compression Algorithm (BWCA). RCTs are presented in Section 3. Experimental results in comparison with benchmark schemes and different image types are given in Section 4. Conclusions, contributions and future directions for the study are presented in Section 5.

Burrows-Wheeler Compression Algorithm (BWCA)
The classical Burrows-Wheeler Compression Algorithm (BWCA) was presented by J. Abel in [17] and is shown in Figure 1. It comprises four stages, which are explained briefly below and depicted in Figure 2. A comprehensive examination of the scheme and its various stages is presented in the works of [14,15,[18][19][20][21]. Various variants of the algorithm have since been developed by modifying either the complete scheme or its individual stages [18,22,23]. The source data is input to the first stage, the Burrows-Wheeler Transform (BWT). The BWT is a lossless transform invented by M. Burrows and D. J. Wheeler for text compression [19].
By increasing the recurrence of image pixels, redundancy is enhanced, which in turn leads to better compression. In the case of BWCA, it is the BWT that generates runs of symbols, which are encoded by the MTF transform in such a way that the occurrence of zeros increases. The later stages of RLE and entropy coding then compress these runs. The various stages of BWCA are detailed as follows in order to elucidate their effectiveness for lossless transformation and compression.
The BWCA is based on the Burrows-Wheeler Transform (BWT), with the secondary stages taking advantage of it [19]. The BWT is a reversible transform and creates chains of symbol repetitions by lexicographically sorting the permutations of the input string or pixels. The output contains long runs of symbols, which allows the later stages of MTF and RLE to compress these runs effectively [17]. Figure 2 shows the transformation of the example string 'bananana' for clarity. In the first step of BWT, the string's permutations are created by circularly left-shifting one element at a time. Figure 2a shows all the permutations. Figure 2b depicts the lexicographical sorting of the matrix in Figure 2a. The permutations are sorted in ascending alphabetical order, where the order is checked column by column. For the example string, the original string is now at position 5, and the last column of the sorted matrix contains the string 'nnnbaaaa', with repetitions of the symbols 'n' and 'a'.
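The 'bananana' walk-through above can be reproduced with a naive rotation-sort sketch (illustrative only; practical BWT coders use suffix arrays to avoid materializing the full rotation matrix):

```python
def bwt(s: str):
    """Naive Burrows-Wheeler Transform: build all circular rotations,
    sort them lexicographically, and return the last column together
    with the row index of the original string."""
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    last_column = "".join(r[-1] for r in rotations)
    return last_column, rotations.index(s)

last, idx = bwt("bananana")
print(last, idx + 1)  # nnnbaaaa 5  (row 5, counting from 1, as in Figure 2b)
```

The last column groups the repeated 'n' and 'a' symbols together, exactly as in the figure.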
Larger input data will create better runs; however, it induces computational cost, as memory and computation requirements increase steeply with input size.
The string 'nnnbaaaa' is given to the MTF encoder stage in reverse order to put it in ascending order. The MTF stage encodes each symbol as its current index in a list of all symbols. The symbol is then moved to the front of the list so that, if the same symbol is repeated, it receives an index of "zero" from the list. Through this method, more runs of zeros are generated for repeating symbols.
As shown in Figure 2c, the string is encoded in conjunction with a list of all symbols. Initially the list is 'abn'. The position of the symbol 'a' is index '0', so the four 'a's are encoded as '0000'. When the symbol 'b' is reached, its index in the list is 1, and the sequence so far becomes '00001'. The symbol 'b' is then moved to the front of the list so that, if it is repeated, more '0' indexes will be produced. Through this principle, the MTF encoder aims at increasing the runs of zeros and ones. The RLE stage then encodes these runs. RLE-0 is usually employed, and runs of other symbols are not encoded by RLE, since runs of zeros are by far the most common.
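The MTF step described above can be sketched as follows (illustrative only; the alphabet 'abn' and the input 'aaaabnnn' are the reversed BWT output from the example):

```python
def mtf_encode(data, alphabet):
    """Move-To-Front: emit each symbol's current index in the table,
    then move that symbol to the front of the table."""
    table = list(alphabet)
    output = []
    for symbol in data:
        index = table.index(symbol)
        output.append(index)
        table.pop(index)         # remove the symbol...
        table.insert(0, symbol)  # ...and move it to the front
    return output

print(mtf_encode("aaaabnnn", "abn"))  # [0, 0, 0, 0, 1, 2, 0, 0]
```

The first five indices match the '00001' sequence in the text, and the output is dominated by zeros, ready for RLE-0.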
Different algorithms have been proposed for this RLE stage. Abel [17] presented the RLE-BIT and RLE-EXP algorithms, while Burrows and Wheeler [19] used the Zero Run Transform (RLE-0). The proposed model uses the RLE-0 scheme. Once RLE has been performed, the output is encoded by an entropy coder such as a Huffman or arithmetic encoder.
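A common formulation of the zero-run transform encodes a run of n zeros in bijective base 2 using two run symbols (the RUNA/RUNB convention popularized by bzip2); the sketch below follows that convention, which may differ in detail from the exact RLE-0 variant used here:

```python
def rle0_encode(data):
    """Encode runs of zeros in bijective base 2 with digits
    {1: 'RUNA', 2: 'RUNB'}, least significant digit first;
    non-zero symbols pass through unchanged."""
    out, n = [], 0
    for x in list(data) + [None]:    # None flushes a trailing run
        if x == 0:
            n += 1
            continue
        while n > 0:                 # emit the pending zero run
            digit = 1 if n % 2 else 2
            out.append('RUNA' if digit == 1 else 'RUNB')
            n = (n - digit) // 2
        if x is not None:
            out.append(x)
    return out

print(rle0_encode([0, 0, 0, 0, 1, 2, 0, 0]))
# ['RUNB', 'RUNA', 1, 2, 'RUNB']
```

The run of four zeros becomes just two run symbols (4 = RUNB·1 + RUNA·2), so long zero runs shrink logarithmically.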
An entropy coder compresses data by assigning fewer bits to frequently occurring symbols; in this way, symbols with high occurrence are represented compactly. Both the Huffman and arithmetic encoder models consider the probability of each data symbol or pixel. Huffman encoders provide better speed, while arithmetic coders provide better compression. This research work employs the arithmetic coder. Details of the arithmetic encoder are given in Appendix A.
The BWCA is not complex in logic and offers high compression due to its main component, the Burrows-Wheeler Transform (BWT) which increases the redundancy of the data, in turn leading to higher compression. To further enhance the compression ability of the scheme, various modifications to the BWCA have been made including the works of Arnavut et al. in [24], Deorowicz et al. in [23], J. Abel in [17,18,22], Schindler in [25], Balkenhol and Shtarkov in [26], Arnavut et al. in [27] and Khan et al. in [28] to name a few.
This research work augments the compression performance of the Bi-level Burrows-Wheeler Compression Algorithm (BBWCA) devised by Khan et al. in [14]. The scheme offers higher image compression compared to other benchmark schemes by transforming the colour intensity levels over the dimensional space using a Reversible Colour Transform (RCT). The RCT increases the inter-pixel redundancy, aiding the compressibility of the data at the later stages of the algorithm. Figure 3 shows an overview of the BBWCA.
Prior to the input to the RCT, the true colour image data is DC shifted, resulting in a grey-level range from −128 to +127. In comparison to the original image, the resulting image becomes more skewed, aiding much better compression during the later stages. Once the data has been DC shifted and converted from RGB to the YUV colour space, it is further processed in the BWT stage. To achieve better compression results, Khan et al. in [14] employed the JPEG 2000 RCT [29]. In contrast to the conventional BWCA approach, BBWCA transforms the image through row-wise and column-wise application of the BWT over the image data, resulting in a highly correlated 2-D image. Redundancy among the pixels is further enhanced by the KMTF transform [28,30], in conjunction with RLE-0 and arithmetic encoding. By utilizing the RCT, BBWCA is able to achieve higher inter-pixel redundancy and in turn higher compression. Various RCTs exist, both time and frequency based, which can be explored for higher compression. The main aim of this research work is to investigate these RCTs and demonstrate their effect on the compression performance offered by the BBWCA. The following section presents a brief history and the various types of RCTs designed by the research community in the field of image processing.
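The DC level shift mentioned above is a simple subtraction of half the dynamic range; a minimal sketch for 8-bit samples (the exact shift used by BBWCA follows [14]):

```python
def dc_shift(pixels, bit_depth=8):
    """Shift unsigned samples in [0, 2^b - 1] to a signed range
    centred on zero, e.g. [-128, +127] for 8-bit data."""
    offset = 1 << (bit_depth - 1)
    return [p - offset for p in pixels]

print(dc_shift([0, 128, 255]))  # [-128, 0, 127]
```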

Reversible Colour Transforms (RCTs)
Converting data from one imaging domain to another prior to compression can produce better compression results with the BWCA. Guo et al. utilized the Discrete Wavelet Transform (DWT), a lossless wavelet transform, in conjunction with a Huffman encoder and achieved higher compression. The DWT has also been utilized for lossless compression in the methodologies proposed by Srikanth and Meher in [31] and Mozammel et al. in [32]. Telagarapu et al. in [33] employed the Discrete Cosine Transform (DCT) and Discrete Wavelet Transform (DWT), showing that the DWT outperformed the DCT in terms of lossless compression. Strutz et al. in [34] used automatic selection of colour transforms for efficient coding, either lossy or lossless; it was 2 percent more efficient in the case of lossless coding. In Strutz's earlier work in [35], various RCTs were developed for efficient compression. By not fixing the RCT for image regions, better compression was achieved.
Colour space transformation refers to the conversion of image information from one colour domain to another. Multiple linear and non-linear transformations of the true colour image are utilized to define a colour image in terms of luminance and colour [16,[36][37][38][39]. RCTs have been utilised to boost compression of RGB images by increasing data redundancy in other colour spaces [39,40]. A fully reversible integer colour transform converts the imaging data into various colour spaces [29,41] and has been utilized for lossless image compression by JPEG 2000 [29,[42][43][44].
This colour conversion helps in achieving high compression by reducing an image's entropy prior to encoding [38,39].
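For reference, the JPEG 2000 reversible colour transform [29] maps RGB to one luma and two chroma components using integer arithmetic only; a sketch together with its exact inverse:

```python
def rct_forward(r, g, b):
    """JPEG 2000 RCT: Y = floor((R + 2G + B) / 4), U = B - G, V = R - G."""
    y = (r + 2 * g + b) >> 2   # >> 2 is floor division by 4
    u = b - g
    v = r - g
    return y, u, v

def rct_inverse(y, u, v):
    g = y - ((u + v) >> 2)     # floor division also holds for negatives
    return v + g, g, u + g     # R, G, B

r, g, b = 200, 50, 120
assert rct_inverse(*rct_forward(r, g, b)) == (r, g, b)  # perfectly reversible
```

The floor in the forward transform is exactly undone by the floor in the inverse, which is what makes the transform lossless despite the division.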
Researchers have proposed various RCTs, including the methodologies presented in [16,29,37,38,42,[44][45][46][47]. This research focuses on the evaluation and investigation of the new and efficient RCTs proposed by Starosolski in [45]. RDgDb, LDgDb, RDgEb, LDgEb, YA2UA2VA2, YA6UA6VA6 and YA7UA7VA7 were listed among the lossless transforms utilized, and a comparative analysis was drawn with the RCTs of JPEG 2000, JPEG XR, JPEG-LS, etc. Each transform offers its own advantages; e.g., RDgDb requires only two integer subtractions per image pixel and thus has low computational complexity [45], while the LDgDb lossless colour transform offers a better approximation of the analogous transformation in the human visual system, thus outperforming the YCoCg-R transform. Table 1 lists various RCTs used in image processing, together with their forward and reverse equations, including the JPEG 2000 RCT [14,30], the modular arithmetic variant mRCT [30] and the YCoCg-R transform of the JPEG XR standard [43].
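As an illustration of how lightweight these transforms can be, here is a sketch of RDgDb, assuming the commonly cited formulation from [45] (first component R kept untransformed, Dg = R − G, Db = G − B, i.e., two subtractions per pixel); the exact component definitions should be checked against Table 1:

```python
def rdgdb_forward(r, g, b):
    # Assumed RDgDb formulation per [45]: R kept, two channel differences
    return r, r - g, g - b

def rdgdb_inverse(c1, dg, db):
    r = c1
    g = r - dg
    b = g - db
    return r, g, b

assert rdgdb_inverse(*rdgdb_forward(200, 50, 120)) == (200, 50, 120)
```

Because the forward transform uses only exact integer subtractions, the inverse recovers the original RGB triplet bit-for-bit.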

Experimental Setup and Results
All simulations and experiments with the proposed method were performed on machines equipped with an Intel Core i7 quad-core CPU running at 2.9 GHz, with 8 GB RAM and 2 GB virtual memory, under the Microsoft Windows operating system. The algorithms were developed in MATLAB and were evaluated using the following image datasets. Simulations were performed by applying the proposed technique to these image sets and comparing against benchmark schemes. Six different colour transforms were utilized: RDgDb, LDgDb, LDgEb, YA2UA2VA2, YA6UA6VA6 and YA7UA7VA7 [43]. Simulation results are presented as follows.

Lossless Compression of Colour Images
The BBWCA was tested on sets of colour images to gauge the efficiency of the RCTs. BMF, FLIC, StuffIt, FP8 and GRALIC were used as benchmark schemes for evaluation purposes.
The image compression benchmark website [48] features the afore-mentioned techniques, which serve as the core of the most recent compression schemes. The proposed scheme was also gauged against the formerly crafted YUV RCT based BBWCA [15]. The Kodak test images [49] shown in Figure 4 were used for initial testing. All the pictures have a resolution of 768 × 512 × 3 pixels, totaling 1152 KB each. Compression results for these images are listed in Table 2. The YA6UA6VA6 RCT based BBWCA compressed the archive of images to 4199 KB against the raw data set of 11,520 KB. The next best size, 4231 KB, was achieved by the GRALIC method. The YA6UA6VA6 RCT based BBWCA tops the benchmark compression schemes, as it compresses the well-homogenized grey levels resulting from the RCT. The proposed scheme was also tested on natural images acquired from the Uncompressed Colour Image Database (UCID) of the University of Loughborough, as shown in Figure 5. In this case, the proposed system employed the top-performing YA6UA6VA6 RCT for BBWCA. PNG, FP8, BMF, FLIC, GRALIC and BBWCA (YUV) were utilized as benchmark schemes.
Compression results are presented in Table 3. The BBWCA surpasses the benchmark schemes, showing a significant reduction of the total archive size of 14,450 KB to a compressed size of 5519 KB, a Compression Ratio (CR) of 2.62. This is a significant enhancement over the pre-existing YUV RCT based BBWCA scheme, which reduced the archive to 7829 KB with a CR of 1.85.
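The compression ratio figures quoted above follow directly from the archive sizes:

```python
def compression_ratio(original_kb, compressed_kb):
    """CR = uncompressed size / compressed size."""
    return original_kb / compressed_kb

print(round(compression_ratio(14450, 5519), 2))  # 2.62 (proposed YA6UA6VA6 BBWCA)
print(round(compression_ratio(14450, 7829), 2))  # 1.85 (YUV RCT based BBWCA)
```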

Lossless Compression of RASTER Maps
RASTER maps are electronic maps embedded in navigation devices and consist chiefly of redundant yet indispensable data. The data takes up considerable storage space and, being critical, its loss is highly damaging. Lossless image compression is employed to decrease the data size while maintaining the full quality of these images. Techniques proposed by Mao et al. in [4] and Akimov et al. in [48] prove to be pivotal in this respect. The BBWCA utilizing the YUV RCT gave higher compression results than BLiSE and the other schemes used as benchmarks. The YA6UA6VA6 RCT BBWCA shows promising results in comparison to the YUV RCT based BBWCA. The RASTER maps used to draw a comparative analysis of the compression achieved by the benchmark schemes and the proposed BBWCA (YA6UA6VA6) algorithm are shown in Figure 6. Figure 7 demonstrates the results of the system for the respective RASTER maps.

Lossless Compression of CFA Images
CFA images undergo lossless compression to reduce the computational complexity of later stages. Lee et al. in [50] proposed a scheme for the lossless compression of HDR imagery: the images are first de-interlaced into individual channels, after which weighted template-matching-based prediction is performed, compressing the high-resolution images at a very low computational cost. Chung et al. in [51] proposed a method using context matching accompanied by reduction techniques in the spectral domain for colour difference estimation.
The CFA images utilized for the evaluation of the designed system are given in Figure 8. The Bits Per Pixel (BPP) achieved by the schemes under discussion are represented in Figure 9. The JPEG-LS, JPEG 2000, LCMI, CMBP and HPCM techniques [51] are used as benchmark schemes for comparison with the proposed system. The YUV RCT based BBWCA of Khan et al. in [14] is also compared with the proposed method to determine its compression efficiency and performance.
The proposed YA6UA6VA6 RCT based BBWCA compresses the data with an average BPP of 3.35. In comparison to the YUV BBWCA, HPCM and CMBP schemes, it shows increases of 18, 20 and 26 percent, respectively, in effective lossless compression.
The transform mainly offers the advantage of producing a large number of redundant grey-scale values in the U and V channels. Once this redundant data is transformed by the BWT, it yields highly correlated 2-D image data, which is efficiently compressed by the later stages.

Lossless Compression of 2-D EEG Data
EEG examination and storage produce large volumes of data containing significant patient information. Compression algorithms reduce the storage requirements of the data and provide effective means of preventing overwriting due to the small storage capacity of EEG machines.
Antonio et al. in [52] proposed a method based on coding the derivations of the EEG instead of the unprocessed data; lossless compression was achieved by combining various techniques, including Huffman encoding and vector quantization. Dauwels et al. in [53] devised a methodology utilizing wavelets to achieve compression of multi-channel EEG. Effective compression is achieved through wavelet-based volumetric coding and energy-based lossless compression of the resulting wavelet bands, followed by tensor-based coding. The latter scheme recorded the best compression results.
Lin et al. in [54] proposed a multi-channel EEG compression scheme based on Independent Component Analysis (ICA) followed by Set Partitioning in Hierarchical Trees (SPIHT). Prior to the application of ICA, the algorithm utilizes Principal Component Analysis (PCA); SPIHT is then also applied to compress any residues, adding compression efficiency over coding the independent components alone. Xu et al. in [55] proposed a compression method based on the No List Set Partitioning in Hierarchical Trees (NLSPIHT) algorithm for 1.5-D EEG. It uses the 1-D DWT instead of SPIHT-based 2-D compression, requiring lower computational complexity and power. Daou et al. in [56] proposed another scheme utilizing the same techniques as mentioned above, i.e., DWT and SPIHT.
Experimentation was conducted using EEG data samples from the University of Bonn [57]. Figure 10 depicts the compression results achieved for the EEG data in terms of sizes and ratios. Five datasets labelled A-E constitute the data, where each label represents a condition. The proposed YA6UA6VA6 RCT based BBWCA shows a significant improvement in compression for the source signals.

Scalability to High Resolution Imagery
The proposed scheme provides a single platform for the compression of various forms of 2-D data and images, including imagery acquired from high-resolution sensors. 4K images, HDR images and MRI/CT image volumes are our future work directions.
Since there is substantial redundancy in these datasets, one method that we are currently investigating involves applying MTF transform encoding prior to BBWCA. Figure 11 shows the amount of data that can be reduced to long runs of zeros by applying the MTF transform to the pristine image before encoding it with BBWCA. Figure 11a shows the original image and Figure 11b its MTF encoded version; Figure 11c shows the histograms of both images. The large number of zeros in the MTF encoded version is beneficial for data reduction prior to the BWT and its successive stages.
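The effect illustrated in Figure 11 can be reproduced on any smooth, highly redundant signal: MTF maps repeated values to zeros. A toy 1-D sketch over the 8-bit alphabet (real images would be flattened row-wise first; the sample row here is hypothetical):

```python
def mtf_encode(data):
    """Move-To-Front over the full 8-bit alphabet."""
    table = list(range(256))
    out = []
    for v in data:
        i = table.index(v)
        out.append(i)
        table.insert(0, table.pop(i))  # move value to the front
    return out

# a flat, highly redundant row, as in smooth regions of natural images
row = [10] * 50 + [11] * 50 + [10] * 50
encoded = mtf_encode(row)
print(encoded.count(0) / len(encoded))  # 0.98: almost all zeros
```

Only the three positions where the value changes produce non-zero indices; everything else becomes a zero run for the later stages to exploit.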

Conclusions
The utilization of reversible transforms in conjunction with the BWCA algorithm resulted in greater compression compared to the benchmark schemes. The YA6UA6VA6 RCT achieved the best compression results in general for various sizes of images and creates a single base for the compression of a variety of data, including CFA, EEG and RASTER data. The proposed compression outperforms techniques developed exclusively for 2-D EEG, RASTER map and Color Filter Array (CFA) image compression. In the future, we also aim to apply the DCT and DWT within the BBWCA to achieve lossy compression of the data as well. In comparison to the existing schemes, for CFA images the proposed system gives 3.35 Bits Per Pixel (BPP) compared to 3.95. Similarly, for EEG data, the proposed system achieved a compression ratio of 3.12 and gave promising results for RASTER image compression.

Conflicts of Interest:
The authors declare no conflict of interest.

Appendix A. The Arithmetic Coder
Huffman coding replaces each symbol with a particular code depending on the Huffman tree generated from the prior probabilities of the symbols in the message. Arithmetic coding, on the other hand, departs from this idea by replacing a set of input symbols with a single floating-point output number. The length and complexity of the message lead to a higher number of bits being required to generate the output. The output of an arithmetic coder, say x here, is a single number such that 0 ≤ x < 1.
x is uniquely decodable given the probabilities of the symbols.
This interval keeps shrinking for lengthier messages, which in turn requires a higher number of bits to represent the range. The process continues for the subsequent symbols, and frequently occurring symbols narrow the range less than infrequent symbols do. Figure A1 shows an encoding example for the arithmetic encoder for better comprehension. The example of a fixed-model arithmetic code from the work of Witten et al. [59] is presented below. Suppose we have the set of alphabets [a,e,i,o,u,!], where ! is used as the end-of-message indicator. The probabilities of these alphabets are given in Table A1. The message "eaii!" is then encoded as follows. The alphabet "e" is the first symbol of the message and is therefore encoded first. The first symbol is important, as it defines the initial interval that will be subsequently narrowed. As seen in Figure A1, in the first step the alphabet is arranged on the vertical probability interval. The range of "e" starts at 0.2 and ends at 0.5. The second alphabet in the message is "a", so the range of "e", [0.2, 0.5), needs to be narrowed. The equations below give the new range:

range = upper_bound − lower_bound (A1)

The range of a "symbol" is then calculated as

lower_limit : lower_limit + range × "symbol"_probability (A2)

So, for "a" the overall range is 0.5 − 0.2 = 0.3, and the new range for symbol "a" is calculated as 0.2 : 0.2 + 0.3 × 0.2, giving us [0.2, 0.26). Similarly, the ranges for all the symbols are calculated as shown in Table A1. Now consider how a message is decoded by the arithmetic coder. Say the decoder only has the final range of the message, i.e., [0.23354, 0.2336). According to the probabilities given in Table A2, this range shows that the first character has to be "e". Once the first character is decoded, the decoder mimics the encoder and recreates the probability ranges for the rest of the symbols.
Initially the range was [0, 1), and after seeing "e" it became [0.2, 0.5). From here, as per the probabilities, the range is narrowed for symbol "a", giving [0.2, 0.26). The process is repeated until the ranges for all symbols have been generated at the decoder end, and the message is then read off from the final range.
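The interval narrowing for the message "eaii!" can be verified with a few lines (a sketch of the encoder side only; the probabilities are read from the worked example, with the values for o and u assumed to be 0.2 and 0.1 as in Table A1):

```python
def arithmetic_encode(message, probs):
    """Narrow the interval [0, 1) once per symbol, using each
    symbol's cumulative probability as its sub-interval offset."""
    cumulative, c = {}, 0.0
    for sym, p in probs.items():
        cumulative[sym] = c
        c += p
    low, high = 0.0, 1.0
    for sym in message:
        width = high - low
        high = low + width * (cumulative[sym] + probs[sym])
        low = low + width * cumulative[sym]
    return low, high

probs = {'a': 0.2, 'e': 0.3, 'i': 0.1, 'o': 0.2, 'u': 0.1, '!': 0.1}
low, high = arithmetic_encode("eaii!", probs)
print(round(low, 5), round(high, 4))  # 0.23354 0.2336
```

The result matches the final range [0.23354, 0.2336) quoted in the decoding example; any single number inside this interval identifies the message.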