Application of Image Fusion in Diagnosis and Treatment of Liver Cancer

Abstract: With the accelerated development of medical imaging equipment and techniques, image fusion technology has been effectively applied for diagnosis, biopsy and radiofrequency ablation, especially for liver tumors. Tumor treatment relying on a single medical imaging modality might face challenges due to the deep positioning of the lesions, operation history and the specific background conditions of the liver disease. Image fusion technology has been employed to address these challenges. Using image fusion technology, one can obtain real-time anatomical imaging superimposed with functional images showing the same plane to facilitate the diagnosis and treatment of liver tumors. This paper presents a review of the key principles of image fusion technology and its application in tumor treatments, particularly in liver tumors, and concludes with a discussion of the limitations and prospects of image fusion technology.

/ MR images (F, H) the remaining tumor tissue lesion could clearly be identified. Additional lesions near the liver hilus are adenomas with strong arterial contrast-agent enhancement (I). In the liver-specific contrast phase lesions are hypointense (K). Similar to the FNH, no significant 18F-FDG-uptake is seen (J, L) [183].


Introduction
Medical imaging equipment has developed rapidly in the last decade, with widespread usage in clinical diagnosis and treatment. Two main imaging modes are employed that utilize different principles and equipment. The first is the anatomical imaging mode that mainly provides anatomical information with high resolution. The X-ray-based method, computed tomography (CT), which falls into the category of anatomical imaging, is the first technique developed for the noninvasive acquisition of images within the human body. CT is particularly effective for imaging tissues with large differences in density. At present, whole-body scans can be performed with the latest generation of CT systems, including multi-slice detectors that allow precise visualization, even for very small vessels. Magnetic resonance imaging (MRI) uses radio waves and magnets to generate body tissue images. Compared to CT, MRI uses nonionizing electromagnetic radiation, and appears devoid of exposure-related hazards. The technique employs radiofrequency (RF) radiation in the presence of carefully controlled magnetic fields to produce high-quality, cross-sectional images of the body in any plane. Using MRI, high spatial resolution can be effectively used to identify soft tissue within the human body [1,2].
The functional imaging mode mainly provides functional and metabolic information. One such method is single photon emission computed tomography (SPECT). SPECT instruments provide three-dimensional (tomographic) images of the distribution of radioactive tracer molecules introduced into the body; these images are generated from multiple 2D images of the body acquired at different angles [3]. Another widely used method in this mode is positron emission tomography (PET). PET is a nuclear medicine functional imaging technique used to observe metabolic processes in tissues and organs as an aid to disease diagnosis [4].
The main difference between SPECT and PET is the decay mechanism of the radiotracers used: while SPECT measures photons of gamma decay from a tracer nuclide, the PET scan uses 511-keV [...] and translations. This step matches the input images using their characteristics in order to facilitate image fusion. The next stage is to find rules to integrate multiple input images into one comprehensive image. Medical images can be fused pixel by pixel, or on the basis of feature extraction, region segmentation and marker points of anatomical structures or lesions. After fusion is complete, the operator interface often displays the original and overlaid cross-sectional images side by side. The fused image can help doctors to make accurate decisions for various diagnoses. For the efficient treatment of liver tumors, information on tumor size, location and number can be obtained accurately by the fusion method. Compared with ultrasound technology used previously, the development of new image fusion technology has greatly improved diagnostic accuracy [9]. For instance, image fusion guidance technology has been widely used in thermal ablation therapy in cases where two-dimensional ultrasonography does not clearly show liver cancer lesions. This method combines high-contrast CT/MRI with real-time guidance and the evaluation of ablation borders via ultrasonography to clearly delineate liver cancer lesions. More examples of medical image fusion applied in liver tumor diagnosis and treatment are discussed in Section 5.
In these procedures, image registration across modalities is important and can strongly affect the quality of image fusion. We therefore briefly introduce medical image registration, which mainly comprises four steps:
1. Feature extraction: The first step of image registration is to extract image features (feature descriptors), such as feature points, edges, contours, areas or structures, from the input images.
2. Feature matching: The second step of image registration is feature matching. It is used to find the correspondence between the features extracted in Step 1.
3. Determination of geometric transformation parameters: This is the most important step in the image registration procedure. Based on the correspondence between the extracted features from Step 2, a suitable geometric transformation model is selected. Then, based on a certain measurement function, the geometric transformation parameters are determined. The commonly used geometric transformation models are:
• Rigid transformation, which mainly consists of rotation and translation.
• Similarity transformation, which mainly consists of translation, rotation and scaling.
• Affine transformation, which mainly consists of translation, rotation, scaling and shearing.
• Projective transformation, which combines translation, rotation, scaling and shearing with perspective distortion.
4. Image resampling and registration: Finally, with an appropriate interpolation function, the floating image is mapped to the reference image's coordinate space to complete the registration (the floating and reference images are the input images to be registered).
Among the above steps, the optimization of the geometric transformation parameters in Step 3 is crucial for the quality of image registration. A function is often used to measure the similarity between the floating and reference images. Common functions are root mean square (RMS) difference, correlation, normalized cross-correlation, gradient cross-correlation, gradient difference, image entropy, mutual information, normalized mutual information, etc. The transformation parameters are then optimized to maximize this function (or to minimize it, for difference measures such as RMS). This converts registration into a multiparameter optimization problem with multiple local optima. Traditional optimization methods include gradient descent, the conjugate gradient method and genetic algorithms.
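As a minimal sketch of the similarity-driven parameter search described above, the following Python example estimates a pure translation by exhaustively maximizing normalized cross-correlation. The function names (`ncc`, `register_translation`), the integer search window and the use of circular shifts via `np.roll` are illustrative assumptions and are not taken from any method cited here; a practical registration would use a richer transformation model, interpolation-based resampling and one of the optimizers named above.

```python
import numpy as np

def ncc(a, b):
    """Normalized cross-correlation between two images of equal shape."""
    a0 = a - a.mean()
    b0 = b - b.mean()
    denom = np.sqrt((a0 ** 2).sum() * (b0 ** 2).sum())
    return float((a0 * b0).sum() / denom) if denom > 0 else 0.0

def register_translation(reference, floating, max_shift=5):
    """Brute-force search for the integer shift (dy, dx) that maximizes NCC.

    Assumption: circular shifts (np.roll) stand in for proper resampling.
    """
    best_score, best_shift = -np.inf, (0, 0)
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            moved = np.roll(floating, (dy, dx), axis=(0, 1))
            score = ncc(reference, moved)
            if score > best_score:
                best_score, best_shift = score, (dy, dx)
    return best_shift, best_score
```

The exhaustive search avoids the local optima mentioned above only because the parameter space here is tiny; with rotation, scaling or shearing added, an iterative optimizer becomes necessary.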
Recently, deep learning has become more and more popular for image fusions and registration [10]. One approach of deep learning used for image registration is to drive iterative optimization using deep learning.
Appl. Sci. 2020, 10, 1171
In such a method, instead of using a traditional feature descriptor, a deep learning model is trained to learn the feature descriptor that guides the fusion of the images. For instance, a deep learning algorithm, the convolutional autoencoder, is often used to extract features from the images for registration [11]. Then, using optimization methods such as gradient descent, the measurement function is maximized and the image is registered. In the medical field, this kind of method has been used in the registration and fusion of CT/MR [12,13] and MR/US [14][15][16]. However, such iterative methods often require a long time and have difficulty achieving efficient real-time registration.
The second approach is to pretrain the deep learning network to directly obtain the transformation parameters for image registration. Such networks can be further divided into two categories: supervised learning and unsupervised learning. Supervised learning has been applied to register/fuse CT/US [17] and MR/US [18][19][20][21]. Training a supervised network requires ground truth. Two kinds of ground truth are generally used: ground truth data from the traditional registration method described above (i.e., Steps 1 to 4) [22,23], and simulated ground truth data. For example, Eppenhof and Pluim generated image pairs using random transformations and thereby obtained ground truth data for the registration of CT images [24]. Similar ground truth generation methods are used for the fusion of MR images [25].
For unsupervised learning, the deep learning network is trained without the need for ground truth data. The most widely used network is VoxelMorph. This framework, proposed by Balakrishnan et al. [26], trains the network using a metric that quantifies how similar the registered (warped) image is to the reference input image. In the medical field, unsupervised learning has been used for the registration/fusion of CT/MR [27][28][29] and US/MRI [30].

Imaging Fusion Algorithms
There are three levels of image fusion: the pixel level, feature level and decision level. Pixel level fusion is the most basic image fusion method, which directly acts on the pixels in the image and does not need to extract features, but requires strict image registration. Feature level fusion requires extracting features in the image, such as size, edge, shape, texture information and other details. Decision-level fusion is able to extract, identify and classify valuable objects in the fusion image, and perform fusion at a higher level. For medical image fusion, both the pixel level and feature level are usually applied. Due to the particularity of liver imaging, it is necessary to combine several methods to achieve image fusion. The following are the commonly used image fusion methods.

Arithmetic Combination
The fastest fusion method is the arithmetic combination. Simple weighted fusion, also known as 'Weighted Averaging', is one of the simplest and most straightforward methods of arithmetic combination. The principle of the weighted average image fusion algorithm is to take the pixel values of the original images directly and perform weighted averaging to obtain the pixel values of the fused image. Similarly, in the Simple Maximum/Minimum Method, the fused image is obtained by selecting the maximum/minimum intensity of corresponding pixels from the input images. Arithmetic combination has the advantages of easy implementation and fast calculation. However, detailed information within the image cannot be captured, image contrast is reduced, and the edges of the image are altered with this technique, resulting in unsatisfactory fusion effects in most applications. Furthermore, because this method requires strict registration in advance, its fusion effect on noisy images is not ideal.
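The arithmetic rules above can be sketched in a few lines of Python. This is a minimal illustration that assumes two registered grayscale images of equal shape stored as NumPy arrays; the function names and the default weight of 0.5 are illustrative choices.

```python
import numpy as np

def weighted_average_fusion(img_a, img_b, weight_a=0.5):
    """Pixel-wise weighted average; weight_a is the weight of img_a."""
    return weight_a * img_a + (1.0 - weight_a) * img_b

def maximum_fusion(img_a, img_b):
    """Pixel-wise selection of the larger intensity."""
    return np.maximum(img_a, img_b)

def minimum_fusion(img_a, img_b):
    """Pixel-wise selection of the smaller intensity."""
    return np.minimum(img_a, img_b)

a = np.array([[0.0, 4.0], [8.0, 2.0]])
b = np.array([[2.0, 0.0], [4.0, 6.0]])
average = weighted_average_fusion(a, b)  # [[1, 2], [6, 4]]
```

In the averaged result the largest difference between neighboring pixels drops from 8 (in `a`) to 5, which illustrates the loss of contrast noted above.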

IHS
A color image can be represented in the red-green-blue (RGB) system of three primary colors. However, this system does not conform to the human understanding of color. Another way of describing color uses hue H, saturation S and intensity I. The hue H is determined by the dominant wavelength of the spectrum, the saturation S characterizes the proportion of the dominant wavelength in the spectrum and the intensity I represents the brightness of the spectrum. In RGB space, the three spectral coordinates (R, G and B) are strongly coupled, and a change in any component alters the entire spectrum. As a result, image processing in RGB space is difficult. In IHS space, on the other hand, the main spectral information is reflected in hue and saturation, while changes in intensity have a very limited effect on the spectral information and are easy to process. The main objective in fusing high-resolution and multispectral images is to add the details of the high-resolution image while retaining spectral information. Consequently, it is easier to conduct image fusion in IHS space. Examples of transformation from RGB values to IHS values are shown in Equations (1)-(3). In the medical field, IHS is useful for pseudocolor image processing and fusion. A pseudocolor image converts grayscale values to RGB values in order to present the details of the medical image more clearly. A good example is the pseudocolor PET image, which is widely used for liver lesion detection [31,32]. The fusion of pseudocolor PET and MRI images using methods such as IHS-PCA [31], the IHS-wavelet transformation-based method [33,34] and IHS-salient feature extraction [35] has been shown to be useful for both human visualization and the objective evaluation of lesions.
Here, ν1 and ν2 are the translation values.
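Equations (1)-(3) are not reproduced in this text. As a hedged illustration only, the sketch below uses one commonly used linear RGB-to-IHS transform, which may differ from the form the authors intended: the intensity is the mean of R, G and B, two intermediate values `v1` and `v2` (playing the role of ν1 and ν2) are linear combinations of R, G and B, and hue and saturation follow as H = arctan(v2/v1) and S = sqrt(v1² + v2²). The matrix `M` and all function names are assumptions of this example.

```python
import numpy as np

# Assumed linear transform: [I, v1, v2]^T = M [R, G, B]^T
M = np.array([
    [1.0 / 3.0, 1.0 / 3.0, 1.0 / 3.0],
    [-np.sqrt(2) / 6, -np.sqrt(2) / 6, 2 * np.sqrt(2) / 6],
    [1 / np.sqrt(2), -1 / np.sqrt(2), 0.0],
])
M_INV = np.linalg.inv(M)

def rgb_to_ivv(rgb):
    """Map an (..., 3) RGB array to (..., 3) values [I, v1, v2]."""
    return rgb @ M.T

def hue_saturation(v1, v2):
    """Hue (angle in radians) and saturation from the intermediate values."""
    return np.arctan2(v2, v1), np.hypot(v1, v2)

def ihs_fusion(rgb, gray):
    """Replace the intensity of a color image by a registered gray image."""
    ivv = rgb_to_ivv(rgb)
    ivv[..., 0] = gray          # substitute the intensity channel
    return ivv @ M_INV.T        # back to RGB; hue and saturation unchanged
```

For a pseudocolor PET image `rgb` and a registered MRI slice `gray`, `ihs_fusion(rgb, gray)` would keep the PET color information while taking its brightness from the MRI slice.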

Principal Component Analysis
Principal component analysis (PCA) is a technique for reducing the dimensionality of a large dataset. PCA is mathematically defined as an orthogonal linear transformation that maps the data to a new coordinate system, such that the greatest variance by a scalar projection of the data lies on the first coordinate, the second greatest variance on the second coordinate, and so on [36]. In this manner, PCA [37][38][39] helps to reduce noise and redundant information and to highlight the key features in the dataset. PCA is widely used in various applications, including image compression, image enhancement, image coding, random noise signal removal and image rotation. For image fusion, PCA can extract the key features of the images, which highlights the similarities and differences between the input images while reducing the noise level. Based on these key features, optimal weights can then be found for transferring the input image information to the fused image. Here, we present an example from Miao et al. [40]. We define the elements of matrices I_A and I_B as the gray level or color of each pixel in the input images A and B, respectively. First, a wavelet-based method is used to decompose the input images into low- and high-frequency components. Secondly, using PCA, the eigenvector of images A and B is obtained as (X_A, X_B)^T. Thirdly, the weight values of images A and B, w_A and w_B, for the low-frequency part are computed. Fourthly, the low-frequency components are fused using these weights. Finally, the high-frequency fusion is achieved by the maximum weight method. The fused image is then obtained by combining the new high-frequency and low-frequency images. In the medical field, PCA has been applied in the fusion of MRI, CT, PET and US [41][42][43]. PCA can also be combined with decomposition methods, such as IHS, the pyramid method, the discrete wavelet transform [44], the Curvelet transform, the Contourlet transform [45] and the Non-Subsampled Contourlet transform [31][46][47][48][49][50][51][52][53][54][55][56].
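The weighting idea can be illustrated with a short sketch. It assumes the commonly used PCA fusion rule in which the two weights are the normalized components of the principal eigenvector of the 2 x 2 covariance matrix of the input images; this may not match the exact equations of Miao et al. [40], and the wavelet decomposition step is omitted, so the weights are applied to the images directly. The function names are illustrative.

```python
import numpy as np

def pca_weights(img_a, img_b):
    """Weights from the principal eigenvector of the 2x2 covariance matrix."""
    data = np.stack([img_a.ravel(), img_b.ravel()])  # 2 x N, one row per image
    cov = np.cov(data)
    eigenvalues, eigenvectors = np.linalg.eigh(cov)  # ascending eigenvalues
    # Assumption: absolute values resolve the sign ambiguity of the eigenvector.
    principal = np.abs(eigenvectors[:, -1])
    return principal / principal.sum()

def pca_fusion(img_a, img_b):
    """Weighted sum of the two images using the PCA weights."""
    w_a, w_b = pca_weights(img_a, img_b)
    return w_a * img_a + w_b * img_b
```

The image with the larger variance receives the larger weight, which is the sense in which PCA emphasizes the input carrying more information.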

Pyramid Method
The principle of the pyramid method is to decompose each input image into a multiscale pyramid image sequence (i.e., the resolution of the image is reduced step by step in the pyramid sequence shown in Figure 1). The low-resolution images are in the upper layers and the high-resolution images in the lower layers, with each upper layer image being 1/4 the size of the layer below it. The pyramids of all input images are fused layer by layer using a specific rule. The fused image is then reconstructed from the resulting synthetic pyramid by inverting the pyramid generation process. Based on this theory, multiple pyramid fusion algorithms (e.g., the Gaussian pyramid and the Laplacian pyramid) have been proposed with different pyramid decomposition structures, fusion rules and reconstruction methods. In the medical field, the pyramid method has been applied in fusing multimodal medical images, such as MRI/CT, PET/MRI and SPECT/MRI [57][58][59].
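A minimal Laplacian pyramid fusion can be sketched as follows. The 5-tap smoothing kernel, the pixel-repetition upsampling and the fusion rule (largest absolute value for the detail layers, average for the coarsest layer) are assumptions chosen for brevity and are not taken from the cited works; the inputs are assumed to be registered grayscale arrays of equal shape.

```python
import numpy as np

KERNEL = np.array([1.0, 4.0, 6.0, 4.0, 1.0]) / 16.0  # binomial smoothing kernel

def blur(img):
    """Separable smoothing with reflected borders; output keeps the input shape."""
    padded = np.pad(img, 2, mode="reflect")
    rows = np.apply_along_axis(lambda r: np.convolve(r, KERNEL, mode="valid"), 1, padded)
    return np.apply_along_axis(lambda c: np.convolve(c, KERNEL, mode="valid"), 0, rows)

def downsample(img):
    """Smooth, then keep every second row and column (1/4 of the pixels)."""
    return blur(img)[::2, ::2]

def upsample(img, shape):
    """Repeat pixels to the requested shape, then smooth."""
    up = np.repeat(np.repeat(img, 2, axis=0), 2, axis=1)[:shape[0], :shape[1]]
    return blur(up)

def laplacian_pyramid(img, levels):
    gauss = [img.astype(float)]
    for _ in range(levels - 1):
        gauss.append(downsample(gauss[-1]))
    details = [g - upsample(g_up, g.shape) for g, g_up in zip(gauss[:-1], gauss[1:])]
    return details + [gauss[-1]]  # detail layers plus the coarsest image

def pyramid_fusion(img_a, img_b, levels=3):
    pyr_a = laplacian_pyramid(img_a, levels)
    pyr_b = laplacian_pyramid(img_b, levels)
    fused = [np.where(np.abs(x) >= np.abs(y), x, y) for x, y in zip(pyr_a[:-1], pyr_b[:-1])]
    fused.append(0.5 * (pyr_a[-1] + pyr_b[-1]))
    out = fused[-1]
    for detail in reversed(fused[:-1]):  # inverse of the pyramid generation
        out = detail + upsample(out, detail.shape)
    return out
```

Because each detail layer stores exactly what the upsampled coarser layer lacks, reconstruction is exact whenever the layers are left unchanged.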

Wavelet Transformation Based Methods
In the field of image fusion, the wavelet transform-based method, which was initially developed for signal processing [60], is widely used for high-pass filtering. Image detail is the result of high contrast, which corresponds to high values in the frequency domain. With the discrete wavelet transform (DWT), these details can be detected using functions that are localized in both space and frequency. For image fusion, this detail information from the input images can then be extracted and fused into a new image using certain fusion rules, such as maximum selection, weighted average and PCA. The low-frequency parts of the images can then be fused in a similar way.

DWT has been widely used in CT/MRI and MRI/PET medical image fusion [61][62][63][64][65][66][67][68]. However, DWT is known to be sensitive to the translation/shift of input signals, and therefore, translation among signals may exert a negative impact on effectiveness. Contourlet transform is a two-dimensional image representation based on wavelet multiscale analysis known as Pyramidal Directional Filter Bank (PDFB) [69]. Compared with DWT, its basis functions are characterized by multiscale features, directionality, anisotropy and locality.
Such basis functions effectively represent edge and curve singularities, and allow the efficient extraction of geometric and texture information in the image to obtain a better fusion effect. The multiscale geometric analytical tool used in the Contourlet transform offers the excellent spatial and frequency domain localization properties of wavelet analysis, as well as multidirectional and multiscale characteristics, good anisotropy and suitability for describing the geometric characteristics of an image [70,71]. However, for the Contourlet transform, shift invariance is lost as a result of the subsampling scheme used for the multiscale partition. To overcome this difficulty, researchers have introduced an improved version of the Contourlet transform, the Non-Subsampled Contourlet transform (NSCT) [72]. In the medical field, the Contourlet transform and the NSCT have also been used to fuse MRI/PET and CT/MRI [70,73,74]. Another deficit of the wavelet transform is its limited ability to represent the edges and geometric structures of the image. The Curvelet transform [75][76][77], a multiresolution and multidirection pyramid that can preserve geometric regularity along edges [78], has been proposed to overcome this difficulty. Ali et al. [79] proposed a Curvelet transform (CVT)-based method for the combination of CT and MRI. However, as highlighted in other studies [80], CVT is not built directly in the discrete domain, and thus does not provide a multiresolution representation of geometry. The Shearlet transform (ST) and the non-subsampled Shearlet transform (NSST) [81][82][83][84] are other state-of-the-art tools that are optimal for sparse directional image representation. Based on composite wavelets, an optimal approximation of 2D functions is obtained. Compared to the Contourlet method, these methods have the advantages of directional selectivity and computational efficiency. Because the number of shearing directions is not restricted, ST is used for the fusion of 2D and 3D medical images [82,83], and NSST has been applied in CT/MRI image fusion [85,86].
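The DWT-based fusion described at the start of this section can be sketched with a single-level 2D Haar transform written directly in NumPy. The choice of the Haar wavelet, the single decomposition level and the fusion rules (average for the low-frequency band, maximum absolute value for the detail bands) are assumptions made for illustration; images are assumed to be registered and to have even height and width.

```python
import numpy as np

def haar_dwt2(img):
    """Single-level 2D Haar transform: low-frequency band plus three detail bands."""
    a, b = img[0::2, 0::2], img[0::2, 1::2]
    c, d = img[1::2, 0::2], img[1::2, 1::2]
    low = (a + b + c + d) / 2.0
    col_diff = (a - b + c - d) / 2.0   # differences between neighboring columns
    row_diff = (a + b - c - d) / 2.0   # differences between neighboring rows
    diag = (a - b - c + d) / 2.0       # diagonal differences
    return low, (col_diff, row_diff, diag)

def haar_idwt2(low, details):
    """Exact inverse of haar_dwt2."""
    col_diff, row_diff, diag = details
    out = np.empty((2 * low.shape[0], 2 * low.shape[1]))
    out[0::2, 0::2] = (low + col_diff + row_diff + diag) / 2.0
    out[0::2, 1::2] = (low - col_diff + row_diff - diag) / 2.0
    out[1::2, 0::2] = (low + col_diff - row_diff - diag) / 2.0
    out[1::2, 1::2] = (low - col_diff - row_diff + diag) / 2.0
    return out

def dwt_fusion(img_a, img_b):
    """Average the low-frequency bands, keep the stronger detail coefficient."""
    low_a, det_a = haar_dwt2(img_a.astype(float))
    low_b, det_b = haar_dwt2(img_b.astype(float))
    low = 0.5 * (low_a + low_b)
    details = tuple(np.where(np.abs(x) >= np.abs(y), x, y) for x, y in zip(det_a, det_b))
    return haar_idwt2(low, details)
```

Shifting one input by a single pixel changes which pixels are grouped into each 2 x 2 block, which is one way to see the shift sensitivity of the DWT mentioned above.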

Pulse-Coupled Neural Network
For image fusion, a pulse-coupled neural network (PCNN) is often used as a feature extraction method [87,88]. As shown in Figure 2, the PCNN is a single-layer, two-dimensional, laterally connected neural network. The neurons are connected directly to the pixels of the input image, so the size of the PCNN equals the size of the image. Each neuron is also connected to its neighboring neurons, as shown in Figure 2. Image feature extraction using a PCNN is an iterative process. At each iteration, each neuron receives the corresponding pixel's intensity as an external stimulus. The outputs of its neighboring neurons from the previous iteration are treated as an internal stimulus and are combined with the external stimulus. When the total stimulus exceeds a threshold, the neuron pulses (fires), producing an output intensity of one at the corresponding location in the output image. After firing, the neuron's threshold increases sharply and then decays exponentially until the neuron fires again. Through iterative computation, PCNN neurons produce a series of pulse outputs, which contain different features (e.g., high-frequency features, low-frequency features or edges) of the input images and can be used for various image processing applications. Multichannel PCNNs have been proposed to process multiple feature images with single or multiple PCNNs in order to fuse these images [90].
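The iteration described above can be sketched with a simplified PCNN. The multiplicative coupling of the external and internal stimuli, the 8-neighbor linking, all parameter values and the fusion rule (keep the pixel whose neuron fired more often) are assumptions of this example, not the configuration of any cited work.

```python
import numpy as np

def neighbor_sum(outputs):
    """Sum of the 8 neighboring outputs of every neuron (zero outside the image)."""
    p = np.pad(outputs, 1, mode="constant")
    return (p[:-2, :-2] + p[:-2, 1:-1] + p[:-2, 2:] +
            p[1:-1, :-2] + p[1:-1, 2:] +
            p[2:, :-2] + p[2:, 1:-1] + p[2:, 2:])

def pcnn_firing_counts(img, iterations=10, beta=0.2, decay=0.3, jump=20.0):
    """Count how often each neuron fires; one neuron per pixel."""
    stimulus = img.astype(float)           # external stimulus: pixel intensity
    fired = np.zeros_like(stimulus)        # outputs of the previous iteration
    threshold = np.ones_like(stimulus)
    counts = np.zeros_like(stimulus)
    for _ in range(iterations):
        linking = neighbor_sum(fired)      # internal stimulus from neighbors
        activity = stimulus * (1.0 + beta * linking)
        threshold = threshold * np.exp(-decay)        # exponential decay
        fired = (activity > threshold).astype(float)  # pulse output is 0 or 1
        threshold = threshold + jump * fired          # sharp rise after firing
        counts += fired
    return counts

def pcnn_fusion(img_a, img_b, **params):
    """Keep, per pixel, the input whose neuron fired at least as often."""
    counts_a = pcnn_firing_counts(img_a, **params)
    counts_b = pcnn_firing_counts(img_b, **params)
    return np.where(counts_a >= counts_b, img_a, img_b)
```

The firing count acts as the extracted feature: brighter pixels, and pixels with recently fired neighbors, cross the decaying threshold sooner.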

Fuzzy Logic Based Methods
Fuzzy logic is a multivalued logic that allows intermediate values to be defined between conventional evaluations, such as true/false, yes/no and high/low. Fuzzy systems are those that are directly based on fuzzy logic. These systems are mainly composed of fuzzification, a knowledge base, a fuzzy inference engine and defuzzification. Fuzzification is the conversion of a system input into fuzzy sets, with a degree of membership anywhere within the interval [0, 1] assigned by a membership function. A membership function is a curve that defines how each point in the input space is mapped to a membership value. The knowledge base stores all information on the fuzzy controller, including knowledge and required control objectives in the specific application field. These core factors determine the performance of the fuzzy controller. The function of the fuzzy inference engine is to convert the fuzzy "if-then" rules into a mapping according to the rules of fuzzy logic. Defuzzification is the conversion of the fuzzy output quantity into a crisp output.
Fuzzy logic is also applicable to image fusion. In this process, local features of the image are extracted and combined with fuzzy logic to compute weights for each pixel [115,116]. The fuzzy logic-based fusion rule is often used to cope with blurry image fusion. As a fusion rule, it can be further combined with DWT [117], NSCT [35,118,119] and NSST [120] for medical imaging. In this application, DWT/NSCT/NSST is performed on the source images to obtain high- and low-frequency sub-bands. Next, a fuzzy logic-based fusion rule is applied for the fusion of the high-frequency [35,117,118] or low-frequency [120] sub-bands, and for enhancing the global contrast of the image [119]. The fuzzy logic algorithm can overcome the loss of edge information and the color distortion in DWT/NSCT/NSST and improve image contrast. Neuro-fuzzy methods combine artificial neural networks with fuzzy logic to produce a hybrid intelligent system, in which the humanlike reasoning style of the fuzzy system is combined with the learning procedure of artificial neural networks. This approach uses an artificial neural network to train the parameters of the membership function, and has been used in the fusion of MRI/CT images [121,122]. Similar to fuzzy logic fusion, neuro-fuzzy logic can be combined with WT, the Contourlet transform [123], NSST [124] and NSCT [125] to optimize fusion performance.
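As a toy illustration of computing per-pixel weights with fuzzy logic, the sketch below fuzzifies a local feature (gradient magnitude) with a ramp membership function and converts the two membership degrees into crisp weights. The feature, the membership function, its breakpoints `low` and `high`, and the weighting rule are all assumptions of this example; the cited methods use their own features and rule bases.

```python
import numpy as np

def local_activity(img):
    """Local feature: gradient magnitude at every pixel."""
    gy, gx = np.gradient(img.astype(float))
    return np.hypot(gx, gy)

def membership(activity, low=0.0, high=0.25):
    """Ramp membership function: degree of 'high detail' in [0, 1]."""
    return np.clip((activity - low) / (high - low), 0.0, 1.0)

def fuzzy_weighted_fusion(img_a, img_b, eps=1e-6):
    """Weight each pixel by the relative membership degree of its source."""
    mu_a = membership(local_activity(img_a))   # fuzzification
    mu_b = membership(local_activity(img_b))
    # Crisp weight of img_a; eps gives equal weights where both degrees are 0.
    w_a = (mu_a + eps) / (mu_a + mu_b + 2.0 * eps)
    return w_a * img_a + (1.0 - w_a) * img_b
```

Where only one input shows an edge, its weight approaches one, so the edge is carried into the fused image; in flat regions both inputs contribute equally.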

Sparse Representation and Compressive Sensing Based Methods
Recently, the sparse representation of signals has become a popular topic of research. The method assumes that the input image (usually expressed as a column vector) can be represented by a linear combination of a few elements (column vectors), which are referred to as atoms. The atoms compose a dictionary. In this technique, the most important issue is the choice of dictionary. Two methods are usually employed: (1) an analytical dictionary built by selecting a specified transformation matrix via the Fourier, wavelet, Curvelet or Gabor transform, and (2) a learned dictionary built by training on data. The second method effectively achieves higher accuracy in extracting complex image features, a better representation of the various features of images and good adaptability. The common procedures in this category include K-means generalized SVD (K-SVD) [126][127][128][129][130], PCA [131,132], online dictionary learning, the method of optimal directions and adaptive sparse representation [133]. However, such multidimensional signal/image processing usually involves a large amount of data. According to compressive sensing theory [134], an image can be compressed with a few random projections if it is sparse in a certain transform domain and can be sparsely represented [135][136][137][138].
Experimental results show that image fusion using compressive sensing can preserve the rich texture information of the input images while reducing the amount of data required and the complexity of the processing [135,136]. For the sparse representation, the test sample (input image) is represented as a column vector. Similarly, a dictionary matrix is created whose column vectors represent the training samples (atoms). Compressive sensing, using random projection, can be applied to reduce the dimensions of both the test vector and the dictionary matrix. Then, the representation coefficients are obtained using sparse coding techniques, such as orthogonal matching pursuit, simultaneous orthogonal matching pursuit, a joint sparse representation model, approximate sparse representation with a multi-selection strategy and convolutional sparse representation. Next, the sparse representation coefficients from the input images are fused by a certain fusion rule. Finally, the fused image is reconstructed from the fused sparse representation coefficients. For instance, PET/CT/MRI images are fused using a K-SVD-based learned dictionary and the Orthogonal Matching Pursuit (OMP) algorithm [139]. Similarly, sparse representation-based methods have been applied for the fusion of CT/MRI and MR images [138,140,141].
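The sparse coding and coefficient fusion steps can be sketched as follows. The example implements a basic orthogonal matching pursuit and fuses two vectorized patches by keeping the coefficient vector with the larger L1 norm; this fusion rule, the function names and the use of a fixed dictionary passed in by the caller are assumptions of the example. Dictionary learning (e.g., K-SVD) and the random projection step of compressive sensing are not shown.

```python
import numpy as np

def omp(dictionary, signal, sparsity):
    """Orthogonal matching pursuit: greedy sparse coding of one signal.

    dictionary: (n, k) array whose columns are atoms (assumed unit norm).
    """
    signal = signal.astype(float)
    residual = signal.copy()
    support, solution = [], np.zeros(0)
    for _ in range(sparsity):
        if np.linalg.norm(residual) < 1e-10:
            break
        # Pick the atom most correlated with the current residual.
        idx = int(np.argmax(np.abs(dictionary.T @ residual)))
        if idx not in support:
            support.append(idx)
        # Re-fit all selected atoms by least squares.
        solution, *_ = np.linalg.lstsq(dictionary[:, support], signal, rcond=None)
        residual = signal - dictionary[:, support] @ solution
    coeffs = np.zeros(dictionary.shape[1])
    if support:
        coeffs[support] = solution
    return coeffs

def fuse_patches(dictionary, patch_a, patch_b, sparsity=2):
    """Fuse two vectorized patches by keeping the more active coefficient vector."""
    coeffs_a = omp(dictionary, patch_a, sparsity)
    coeffs_b = omp(dictionary, patch_b, sparsity)
    chosen = coeffs_a if np.abs(coeffs_a).sum() >= np.abs(coeffs_b).sum() else coeffs_b
    return dictionary @ chosen  # reconstruction from the fused coefficients
```

In a full pipeline this would be applied to every overlapping patch of the registered input images, and the fused patches would be averaged back into an image.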

Edge-Preserving Based Methods
The edge-preserving filter emerged is an effective tool for image processing applications. The overall grayscale of traditional image smoothing filtering tends to be consistent in the neighborhood to achieve a smoothing effect. It is useful for the image with the similar pixel values (grayscale). However, this assumption does not hold at the edge of the image, which contains key information, since in this region, the grayscale tends to vary significantly with space coordinates. These grayscale variations provide meaningful image information. Thus, in many applications, weakening or filtering of the edges of the image during the filtering process is not desirable. Bilateral filters [142,143] can solve the problem of low-pass filtering. A grayscale difference weighting value is introduced into the original spatial low-pass filter based on the original spatial filtering weight. The grayscale phase and spatial difference distances have similar local characteristics. This suggests that when a point differs greatly from the center point in grayscale, it is considered that the point is distant from that center point. Guided filter is a similar edge-preserving smoothing filter that applies an optimal local linear approximation to achieve the edge-preserving goal. In this method, the guiding filter guides the filter with a guiding graph G, while the filter window radius, r, and the smoothing intensity, ε, are adjustable parameters. The linear transformation of the model ensures that the appearance of the output edge only depends on the edge of the guiding graph G. After the image is smoothed by edge preservation, the large edge structure of the image is preserved, and small fluctuations corresponding to noise are smoothed out. The above characteristics of bilateral and guided filters can optimize fusion weight to ensure that the fusion is more smoothly connected and visually natural. 
These features are extremely useful in medical image fusion applications, such as fusion of CT and MR images [142,143].
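As an illustration of the guided filter described above, the following sketch implements the standard local linear model with a plain box filter. It assumes single-channel floating-point images; the function names, the test image and the parameter values (r = 2, ε = 0.01) are illustrative choices rather than settings from the cited studies.

```python
import numpy as np

def box_mean(img, r):
    """Mean over a (2r+1) x (2r+1) window, replicating pixels at the borders."""
    padded = np.pad(img, r, mode="edge")
    out = np.zeros_like(img, dtype=float)
    h, w = img.shape
    for dy in range(2 * r + 1):
        for dx in range(2 * r + 1):
            out += padded[dy:dy + h, dx:dx + w]
    return out / (2 * r + 1) ** 2

def guided_filter(guide, src, r=2, eps=1e-2):
    """Edge-preserving smoothing of src steered by the guidance image."""
    guide = guide.astype(float)
    src = src.astype(float)
    mean_g = box_mean(guide, r)
    mean_s = box_mean(src, r)
    var_g = box_mean(guide * guide, r) - mean_g ** 2
    cov_gs = box_mean(guide * src, r) - mean_g * mean_s
    # Local linear model: output = a * guide + b within each window.
    a = cov_gs / (var_g + eps)
    b = mean_s - a * mean_g
    # Average the coefficients of all windows covering each pixel.
    return box_mean(a, r) * guide + box_mean(b, r)

# Noisy vertical step edge, filtered with itself as guidance.
rng = np.random.default_rng(0)
step = np.zeros((32, 32))
step[:, 16:] = 1.0
noisy = step + 0.05 * rng.standard_normal(step.shape)
smoothed = guided_filter(noisy, noisy, r=2, eps=1e-2)
```

In flat regions the local variance is small compared with ε, so the coefficient a is close to 0 and the output approaches the local mean. Near the step the variance is large, a is close to 1, and the edge is passed through almost unchanged.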

Deep Learning (DL) Methods
DL algorithms have a strong capacity for feature extraction and data representation, and have achieved notable results in medical image processing. The application of DL techniques to image fusion has emerged as an active topic in the last three years.

Convolutional Neural Networks
Convolutional neural networks (CNNs), a popular deep learning model, offer a new approach to image fusion. CNNs are able to extract the most important features from a large number of samples. A CNN resembles a multilayer perceptron that has been designed to reduce processing requirements. CNNs consist of an input layer, an output layer and hidden layers that include multiple convolutional layers, pooling layers, fully connected layers and normalization layers. A convolutional layer defines multiple filters, each acting as a window that scans the entire image. After training, it can output many feature maps. These outputs provide multiscale and multiangle features, together with the location information of each feature. The advantages of feature extraction by CNNs can be fully utilized in pixel-level or feature-level image fusion. Another operation in CNNs is spatial pooling (max-pooling, min-pooling), which brings some desirable invariances, including translation, rotation and scale, into the model to a certain extent. Fully connected layers act as a classifier. CNNs overcome the difficulty of manually designing complicated activity level measurements and fusion rules [144]: the activity level measurement and the fusion rule can be jointly generated by training a CNN model. The feasibility of using CNNs for medical image fusion has already been reported [145]. A CNN can decompose the original images into high-frequency and low-frequency images [146], which are then fused using a regional matching rule to obtain the final fused image. Kumar et al. [147] developed a supervised CNN that learns to merge the data from PET-CT images of lung cancer. CNNs have also been applied to fuse MRI/CT, MRI/SPECT, multiparametric MR [148] and PET/MRI [149] images. CNNs can also be combined with a wavelet transform for the fusion of CT and MR images [150]. In this method, the wavelet transform coefficients are first obtained by decomposing the input images. The trained CNN model is then used to improve the resolution of the high-frequency coefficients. A similar procedure is applied to combine a CNN with NSST to merge CT and MR images [151].
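The two basic operations mentioned above, a filter scanning the image and spatial max-pooling, can be sketched in a few lines. This is a toy illustration with a hand-picked gradient kernel instead of trained weights; the helper names `conv2d_valid` and `max_pool` are hypothetical.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide the kernel over the image (cross-correlation, as used in CNNs)
    and return the feature map for all fully overlapping positions."""
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.empty((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def max_pool(fmap, size=2):
    """Non-overlapping max-pooling; trailing rows/columns that do not fill
    a complete block are dropped."""
    h, w = fmap.shape
    h, w = h - h % size, w - w % size
    blocks = fmap[:h, :w].reshape(h // size, size, w // size, size)
    return blocks.max(axis=(1, 3))

# A vertical edge and the same edge shifted by one pixel.
image = np.zeros((6, 6))
image[:, 3:] = 1.0
shifted = np.zeros((6, 6))
shifted[:, 4:] = 1.0

kernel = np.array([[-1.0, 1.0]])        # horizontal gradient filter
fmap = conv2d_valid(image, kernel)
fmap_shifted = conv2d_valid(shifted, kernel)
pooled = max_pool(fmap)
pooled_shifted = max_pool(fmap_shifted)
```

The two feature maps respond at different columns, but after pooling they are identical. This is the limited translation invariance referred to in the text. It holds only for shifts that stay within one pooling block.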

Convolutional Sparse Representation
Sparse representation has been widely used in various image fusion tasks. Due to the modeling burden and computational cost, traditional sparse representation has always been performed on local image patches rather than on the entire image. The concept of convolutional sparse coding (CSC) originates from the deconvolutional networks proposed by Zeiler et al. [152]. Its fundamental principle is to obtain a convolutional decomposition of an image under a sparsity constraint. As an image representation approach, CSC is also termed convolutional sparse representation (CSR). In contrast to conventional sparse representation, the CSR model computes the sparse representation of the entire image. The resulting representation is single-valued and optimized over the entire image [153,154]. Liu et al. [153] introduced CSR into the field of medical image fusion for MRI/CT. Qiu et al. proposed a CSR-based fusion method to fuse mis-registered GFP and phase contrast images in biomedical image fusion [155,156].
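For reference, the convolutional decomposition with a sparsity constraint is commonly written as the following optimization problem. The notation is an assumption introduced here and is not taken from the cited works: x is the whole image, d_m are the M dictionary filters, s_m are the coefficient maps, * denotes convolution and λ is the weight of the sparsity penalty.

```latex
\min_{\{s_m\}} \; \frac{1}{2} \left\| \sum_{m=1}^{M} d_m * s_m - x \right\|_2^2
  + \lambda \sum_{m=1}^{M} \left\| s_m \right\|_1
```

Because each coefficient map s_m has the same size as x, the representation is computed once for the entire image rather than separately for overlapping patches.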

Stacked Autoencoders
A standard stacked autoencoder (SAE) is formed by stacking multiple autoencoders. The network can be learned by pretraining each layer before its successor using a back-propagation algorithm. At each layer, an autoencoder is used to obtain a set of features by jointly using an encoder and a decoder [157]. To prevent the network from learning a trivial solution, stacked sparse autoencoders and stacked denoising autoencoders [158] have been proposed as improvements on the basic SAE. SAE-based DL models have been applied to image fusion for multimodal medical image feature extraction. The extracted features can be used to design optimal fusion rules [159] and to obtain better fused images. In this method, a multitask loss function related to image fusion quality is used to train the network.
In summary, the clinical applications of the fusion methods are shown in Table 1.

Image Fusion Indicators
Several indicators can be used to evaluate fusion quality. They fall into two groups: fused image assessment and fused image metrics. Assessment results can be obtained from subjective ratings, computational metrics and objective human tasks.
A number of image quality metrics [160] have been proposed, including mean square error (MSE), root mean square error (RMSE), peak signal-to-noise ratio (PSNR), mean absolute error (MAE), quality index, mutual information (MI), the Petrovic and Xydeas metric and Piella's Quality Index [161].
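Several of the metrics listed above have simple closed forms. The sketch below uses their standard textbook definitions and assumes 8-bit intensity images (peak value 255) and a histogram-based estimate of mutual information; the function names and the bin count are illustrative choices.

```python
import numpy as np

def mse(ref, img):
    """Mean square error between a reference image and a fused image."""
    return float(np.mean((ref.astype(float) - img.astype(float)) ** 2))

def rmse(ref, img):
    """Root mean square error."""
    return float(np.sqrt(mse(ref, img)))

def mae(ref, img):
    """Mean absolute error."""
    return float(np.mean(np.abs(ref.astype(float) - img.astype(float))))

def psnr(ref, img, peak=255.0):
    """Peak signal-to-noise ratio in dB; infinite for identical images."""
    err = mse(ref, img)
    return float("inf") if err == 0 else float(10.0 * np.log10(peak ** 2 / err))

def mutual_information(a, b, bins=32):
    """Mutual information (in bits) estimated from the joint histogram."""
    joint, _, _ = np.histogram2d(a.ravel(), b.ravel(), bins=bins)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)   # marginal of a, shape (bins, 1)
    py = pxy.sum(axis=0, keepdims=True)   # marginal of b, shape (1, bins)
    nz = pxy > 0                          # skip empty bins to avoid log(0)
    return float(np.sum(pxy[nz] * np.log2(pxy[nz] / (px @ py)[nz])))

# Small worked example: every pixel of the fused image is off by 2.
ref = np.array([[0.0, 10.0], [20.0, 30.0]])
img = ref + 2.0
errors = (mse(ref, img), rmse(ref, img), mae(ref, img))
```

For MSE, RMSE and MAE lower values are better, whereas for PSNR and MI higher values indicate that the fused image retains more of the source information.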

Ultrasonography/MRI, Ultrasonography/CT and Ultrasonography/PET-CT
Ultrasonography, a traditional, practical and convenient imaging technology, presents major advantages in the diagnosis of liver cancer. The recently developed ultrasound-based fusion imaging technology plays an important role in early diagnosis and ensuring minimally invasive procedures [170]. Several liver tumor types exist, including benign and primary liver malignant tumors and metastatic liver cancer. Benign liver lesions include hemangioma, focal nodular hyperplasia (FNH) and hepatic adenoma. Primary liver malignancies originate from hepatocytes, intrahepatic bile duct epithelial cells, endothelial cells and connective tissue. A number of tumor types, such as FNH and hepatic adenoma, have similar manifestations in enhanced CT or dynamically-enhanced MRI, and are therefore often difficult to distinguish. As the concept of fusion imaging technology was gradually implemented in clinical practice in the 1990s, ultrasonography was initially applied to the field of image fusion to identify the types of liver tumor [171].
CEUS/CT and CEUS/MRI fusion images were successfully generated and employed to dynamically observe blood flow in lesions and the blood perfusion of tumors, facilitating the diagnosis of different types of intrahepatic lesions, and in particular, minor lesions. CEUS/CT and MRI fusion imaging clearly achieves higher diagnostic efficacy than CEUS, CT or MRI alone. At present, image fusion technology based on CEUS is mainly applied for (1) diagnosis and treatment of small liver cancer, (2) the evaluation of minimally invasive treatment methods, such as liver tumor TACE and RFA, and (3) the early diagnosis and treatment of new or recurrent liver cancers or liver metastases after surgery.

Utility of CEUS/MRI in Diagnosis and Treatment of Small Liver Cancer
The natural course of subclinical stage liver cancer is at least two years. Studies of subclinical stage liver cancer show that the 5-year survival rate of small liver cancer is markedly higher than that of advanced large liver cancer. However, the clinical diagnosis rate of small liver cancer remains a major problem for clinicians. In addition to screening for AFP in blood samples, effective imaging examination methods are urgently required to optimize clinical practice.
Contrast-enhanced ultrasound has a high diagnostic efficacy for liver lesions, especially small lesions less than 1 cm [162]. However, ultrasonography presents inherent limitations. First, operators with significant clinical experience are essential for effective implementation. Second, ultrasonography lacks high resolution, and is unable to provide information on spatial hierarchical relationships. As mentioned above, MRI images are characterized by high soft tissue resolution and multiple signals, especially for intrahepatic vessel imaging. At the same time, micro-cancerous nodules, precancerous nodules and vascular tumor thrombi display different signals on MRI sequences. Introduction of the liver-specific contrast agent Gd-EOB-DTPA (gadolinium ethoxybenzyl diethylenetriamine pentaacetic acid, also known as gadoxetate disodium) has greatly improved the image quality of enhanced MRI scans, clearly revealing the boundaries of liver cancer and micro-lesions [172].
Multistage image fusion technology of contrast-enhanced ultrasound and MRI is reported to improve the accuracy of the diagnosis of small liver cancer. Originally, conventional ultrasound and MRI were used to conduct an image fusion of the axial profile of the liver. Furthermore, upon combination of contrast-enhanced ultrasonography to observe the blood flow direction of liver lesions, data from the fusion image of contrast-enhanced ultrasound (CEUS) and magnetic resonance imaging (MRI) allowed the determination of the three-dimensional parameter index of tumor lesions in liver (size, spatial location, blood supply arteries, drainage area), resulting in increased detection rates of suspicious small liver cancer. Moreover, the main blood supply source of liver cancer, portal vein tumor thrombus and the specific hepatic arterioportal fistulas caused by tumor pathological factors were clearly displayed.

Utility of CEUS/MRI in Radiofrequency Ablation of Liver Cancer
Radiofrequency ablation has been widely applied in the radical treatment of small and multinodular liver cancers due to its advantages of safety, simplicity and minimal invasiveness. The 5-year survival rate following radiofrequency ablation of small liver cancer is reported to be close to that of surgical resection [173]. Based on the advantages of CEUS/CT fusion imaging technology, local ablation of liver tumors has gradually become an important adjunct to the surgical treatment of primary liver cancer and colorectal cancer with liver metastases.
CEUS/CT fusion imaging has many advantages over conventional ultrasound-guided treatment for liver lesions that are difficult to ablate. Conventional ultrasonography is unsuitable in the following situations: (1) the target lesion for ablation shows a similar echo signal to surrounding sclerosing nodules of liver tissue, which are difficult to identify owing to low spatial resolution, (2) some target lesions cannot be clearly displayed due to the interference of diaphragm movement or close proximity to the diaphragm, colon and gas movement, and (3) after repeated TACE or RFA treatment, the local echo of advanced liver tumor lesions is mixed, the lesion boundary is blurred, and it is difficult to distinguish the initial and recurrent lesions. CEUS/CT fusion imaging technology addresses these issues by combining the real-time, continuous dynamic observation of the ablation effect with the high-resolution localization of lesions provided by enhanced CT. Therefore, accurate needle placement is achieved under difficult conditions, and the flexibility of ultrasonography is used to adjust the direction of intraoperative needle insertion to pinpoint the locations of difficult lesions. Recent reports indicate that the success rate of fusion imaging-guided radiofrequency ablation technology has increased to 93% [8]. CEUS/CT fusion imaging additionally allows multiple ablation plans to be drawn up for complex liver lesions based on enhanced CT three-dimensional imaging, including the reasonable placement of needles and the needling path. Such planning reduces the risk of accidentally penetrating anatomical structures and vital vessels around the lesion, such as the diaphragm, colon and small intestine, and thus helps to avoid pneumothorax, hydrothorax, colon fistula and intestinal fistula after radiofrequency ablation.
In view of the above advantages, CEUS/CT fusion imaging technology facilitates precise radiofrequency ablation of the liver, which is of high clinical value.

Utility of CEUS/PET-CT in Transarterial Chemoembolization Treatment of Liver Cancer
Transarterial Chemoembolization (TACE) has been employed for the treatment of liver tumors since the 1980s. With increasing information on the mechanisms underlying liver cancer progression, clinicians are gradually realizing that surgical resection is inadequate to treat cases with a background of liver cirrhosis and the biological characteristics of polycentric or multiple liver cancer, which are responsible for the high recurrence of liver cancer after surgery [174]. In addition, pathologists have confirmed that once the size of the hepatocellular carcinoma exceeds 5 cm, the incidence of tumor lesions invading the portal and hepatic vein branches to form vascular thrombus vessels is greatly increased, which is the basis of intrahepatic dissemination and distant hematogenous metastasis. TACE therapy for liver cancer has been gradually applied for clinical treatment with minimal trauma and significant efficacy. However, clinicians are yet to establish the type of liver tumor suitable for TACE therapy, the optimal means to evaluate the efficacy of TACE therapy and the specific indicators that should be evaluated before intervention. According to clinical practice, before interventional embolization for liver cancer, the scope of lesions and the number of sub-lesions should be determined, considering the clinical difficulty of the TACE-mediated control of the intrahepatic dissemination of tumors. Methods of application of existing imaging technologies to evaluate the residual tumor range after TACE, the specific times of new rounds of TACE treatment, and outcomes of tumor necrosis after treatment, are of clinical significance.
TACE treatment is based on vascular embolization, and its effect depends on efficient blood flow into the tumor. Although enhanced CT or MRI is currently accepted as the gold standard for the imaging diagnosis of liver cancer, the information provided by enhanced CT or MRI is mostly static and cannot clarify the tissue perfusion of the tumor-bearing liver segment. The blood vessels supplying the liver cancer could not be clearly defined in previous analyses, and it was therefore impossible to assess whether the route of TACE into the tumor-bearing liver segment was effective, or the level of tumor necrosis after TACE. CEUS can be used to display micro-perfusion in liver tissues in real time, but its success is highly dependent on the experience of the operator, and it is subject to interference from various objective factors. First, reactive congestion of liver tissue around the lesion in the early stages after interventional embolization is reported to affect measurement of the range of residual tumor lesions [163]. Second, contrast-enhanced images of the focal arterial stage were not evident when TACE was applied to intrahepatic cholangiocarcinoma with blood supply deficiency, and the detection rate of residual lesions was low. Additionally, contrast enhancement and regression were rapid in the arterial phase of tumor lesions with a rich blood supply, presenting the imaging characteristics of "fast in and fast out" [162]. CEUS may overlook residual lesions in different section scans. CEUS/PET-CT fusion imaging technology integrates the advantages of CEUS and PET-CT or enhanced PET-CT to display iodine-oil deposition, the even distribution of the embolization agent and the area of tumor necrosis in liver cancer.
Therefore, the technique allows not only accurate estimation of the scope of the tumor lesion necrosis area after TACE, but also the recognition of blood flow signals in the necrotic area and a timely detection of residual lesions, which is of significant value in improving the efficacy of TACE, reducing recurrence, and ultimately improving the survival rate and the quality of life of patients. Due to the speed requirement of real-time CEUS and CT/MRI fusion, arithmetic combination is usually used in the above clinical applications. One example of the influence of fusing CEUS and CT/MRI images in a real clinical application is given by Xu et al. [164].
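The arithmetic combination mentioned above is, in its simplest form, a pixel-wise weighted average of two registered images. The following sketch assumes that the two images have already been registered and resampled to the same grid; the function names, the min-max normalization and the default weight alpha = 0.5 are illustrative assumptions, not details reported in the cited clinical studies.

```python
import numpy as np

def normalize(img):
    """Rescale intensities to [0, 1]; a constant image maps to all zeros."""
    img = img.astype(float)
    span = img.max() - img.min()
    return np.zeros_like(img) if span == 0 else (img - img.min()) / span

def arithmetic_fusion(anatomical, functional, alpha=0.5):
    """Pixel-wise weighted average of two registered images.

    alpha is the weight of the anatomical image (e.g. CT or MRI);
    1 - alpha is the weight of the functional or real-time image (e.g. CEUS).
    """
    if anatomical.shape != functional.shape:
        raise ValueError("images must be registered to the same grid")
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return alpha * normalize(anatomical) + (1.0 - alpha) * normalize(functional)

# Toy 2 x 2 "images" with different intensity ranges.
ct = np.array([[0.0, 100.0], [200.0, 400.0]])
ceus = np.array([[10.0, 10.0], [30.0, 20.0]])
fused = arithmetic_fusion(ct, ceus, alpha=0.5)
```

Each output pixel needs only one multiply-add per input image, which is why this rule suits real-time display. The trade-off is that contrast from either source is reduced in proportion to its weight.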

Detection of Intrahepatic Cholangiocarcinoma
Intrahepatic cholangiocarcinoma (ICC) is the second most common hepatic malignant tumor type after hepatocellular carcinoma, which is characteristically associated with invasive growth, the occurrence of satellite foci and intrahepatic metastasis [162]. Preoperative imaging findings of intrahepatic cholangiocarcinoma foci and their sub-foci and accurate delineation of the boundaries of tumor lesions provide the basis for good surgical results. To improve the long-term survival rate of patients with bile duct cell carcinoma, the elucidation of the spatial anatomical relationships of bile duct cell carcinoma foci with intrahepatic portal vein and hepatic venous systems, and an improvement of the R0 resection rate, are necessary steps.
Currently, multistage-enhanced CT and dynamic-enhanced MRI are commonly used in clinical practice for the detection of intrahepatic cholangiocarcinoma. These two technologies have the following disadvantages: the diagnostic rate for small tumor lesions <1 cm is not high, and an accurate display of tumor boundaries when the lesion differentiation degree is low and the capsule is incomplete, is difficult. The liver-specific contrast agent Gd-EOB-DTPA-enhanced MRI scan has the advantage of clearly displaying tumor lesion boundaries and intrahepatic microscopic lesions. However, due to the length of time required for MRI, images of intrahepatic vessels are likely to contain artifacts due to respiratory non-coordination [175]. With an extensive clinical application of medical image fusion technology, researchers have circumvented the disadvantages of the above techniques via the preoperative fusion of enhanced CT/MRI imaging in ICC. Fusion imaging allowed the determination of whether or not the lesion invades important hepatic vessels as well as establishment of the anatomical relationship between cholangiocarcinoma and satellite sub-focal lesions.
In other words, the technology facilitated the quantitative evaluation of whether main and sub-focal lesions were in the same tumor-bearing liver segment, and thus the possibility of combined vascular resection. In this way, preoperative assessment could be used to effectively guide whether to perform regular segment resection, combined segment resection or enlarged lobectomy, leading to the avoidance of unexpected situations, such as postoperative liver insufficiency and liver failure.

Surgical Operation Assistance
Image fusion is a valuable tool for planning treatment strategies and examining pathological changes. The CT portal vein image is automatically registered with Gd-EOB-DTPA-enhanced MRI images using Mitworkbetch software prior to the operation. The CT-MRI fusion image provides detailed lesion information, in turn, improving diagnostic accuracy [176,177]. At the same time, three-dimensional models and virtual surgical images based on CT/MRI fusion image reconstruction are applied to guide the key surgical procedures. Combined with indocyanine green molecular fluorescence images, CT/MRI image fusion efficiently defines tumor boundaries and identifies hidden microscopic lesions, leading to improved surgical precision. In addition, fusion imaging is particularly useful for lesions for which anatomical images are difficult to obtain from various angles with conventional techniques, such as anatomically complex hilar lesions. Based on tumor location in combination with the distance between the tumor and intrahepatic vasculature and spatial positional relationship, the optimal virtual resection plane can be determined. Additionally, fusion imaging may be effective for therapeutic evaluation. Posttreatment changes can be easily clarified by creating a fused image prior to treatment. Moreover, since 3D images can be obtained without difficulty, we may be able to successfully simulate surgical treatment in the future. Fusion imaging further allows patients to visually understand the disease process.
Intraoperative bleeding is a critical aspect of liver surgery, and an important factor affecting the success of surgery and the postoperative recovery of patients [178]. Enhanced preoperative CT/MRI fusion imaging can assist surgeons to better understand information related to intrahepatic vascular alignment, portal vein alignment variation, location of main hepatic vein branches and spatial distance from the tumor lesions. Thus, identification of important vessels and anatomical markers surrounding the lesion can be improved by the assessment of enhanced CT/MRI image fusions before the operation, which is vital in reducing intraoperative bleeding and operation times and accelerating postoperative recovery. Queisner et al. [179] conducted a series of clinical studies to evaluate the efficiency of contrast-enhanced CT/MRI image fusion technology in hepatectomy for different anatomical locations. Data from their study suggest that most preoperative surgical planning schemes based on image fusion are similar to the actual operative procedure conducted following the exploration of the liver. Therefore, routine preoperative contrast-enhanced CT/MRI image fusion could provide a valuable guide for planning surgical procedures, leading to the improvement of surgical treatment outcomes.
For the fusion of CT and MRI in real clinical applications, the common methods are arithmetic combination [177], the PCA-wavelet transformation-based method [47,48,49,53,54,56,117] and the pyramid method [58,59]. Here, we show one real clinical example: the assessment of the cryoablation margin using MRI-CT fusion imaging in hepatic malignancies [180]. In that study, MRI-CT fusion imaging was achieved successfully in 46 (97.9%) of 47 lesions, and was useful for evaluating the minimal ablative margin (MAM) of cryoablation in hepatic malignancies. An example of the fused MRI-CT images from Chen et al. [180] using arithmetic combination is shown in Figure 4.

PET/CT & PET/MRI
PET/CT is the most widely used fusion imaging technique in clinical diagnosis. However, due to its imaging principles, PET/CT has limited spatial resolution for locating tumor lesions and can only show a specific standard uptake value (SUV) range, so it fails to accurately evaluate the position of a liver tumor and its relationship to adjacent structures in anatomical space. PET/MRI fusion technology has greater advantages than PET/CT in the assessment of tumor morphology, function and metabolism [165,166]. Firstly, owing to diffusion-weighted imaging (DWI), perfusion-weighted imaging (PWI) and MR spectroscopy, MRI is far superior to CT in the functional imaging of human soft tissue organs. Secondly, unlike CT, the MRI component of a PET/MRI system does not impose an additional ionizing radiation burden on the patient or operator. Thirdly, on CT images, signals of some abdominal and pelvic lesions may be disrupted by bowel peristalsis, poor bladder filling or uterine displacement. This type of interference is often unavoidable in the imaging process and affects the observation of diseased organs, but it can be effectively overcome by hybrid PET/MRI [167].
Clinical findings suggest that PET/MRI has an advantage in regular postoperative follow-up for patients at high risk of liver metastases from colorectal cancer with regard to the monitoring of tumor recurrence, especially in distinguishing inflammatory tissue around the surgical area of rectal cancer lesions, distant liver metastasis and the clinical TNM stage. With regard to adjacent tissue, taking into consideration both the metabolism of 18 F-FDG in postoperative inflammatory tissues of rectal cancer and DWI images in enhanced MRI, morphologic and functional imaging can effectively discriminate whether the newly formed mass in the rectal cancer area is an inflammatory scar or recurrent tumor tissue [181]. In terms of N-staging, 18 F-FDG activity is not specific to cancer, since it has been observed in macrophages involved in inflammatory and infectious diseases. Cancer patients with acute inflammatory or infectious diseases also display high SUV signals on PET/CT images, which makes it impossible to determine whether lymph nodes with high metabolic signals present an inflammatory lesion or neoplastic metastasis that affects the N stage of correctly diagnosed cancer patients. Owing to the significant benefits of the MRI in soft tissue imaging of lymph nodes, PET/MRI is obviously superior to PET/CT in distinguishing internal lymph node structures. Taking the morphological, functional and metabolic features of suspected lymph nodes into consideration, PET/MRI can effectively distinguish tissue structures, such as fat hilum, margin and necrotic area within lymph nodes, which allows determination of the tumor metabolism of lymph node tissue with high suspicion of metastasis, and thus the differentiation of malignant from benign lymph nodes. With regard to M-staging, PET/MRI, which can distinguish metastatic lesions less than 1 cm, is of greater diagnostic value in patients with suspected liver metastasis of rectal cancer. 
Reiner reported a higher diagnostic rate of PET/MRI relative to enhanced CT/PET fusion [182]. Simultaneously, PET/MRI in the diagnosis of primary liver cancer can clearly distinguish whether the portal and hepatic vein systems display any important pathological features of tumor thrombus involvement, which can provide a foundation for clinical decisions of subsequent treatment. For the fusion of PET-MRI for clinical applications, the wavelet transformation-based method [66,73,74,130], IHS-PCA [31] and deep learning methods [151,154] are generally used. On the other hand, wavelet transformation-based methods [52,57,168,169] and deep learning [147] are generally applied for the fusion of PET-CT. Here, we show an example in Figure 5, which compares the accuracy of the fused image of PET/MRI and single modal MRI in the correct identification of a patient with liver lesions [183]. The data from this study indicate that the fusion of PET/MRI can increase the identification rate of the liver malignant lesion from 94.4% to 100%.
Appl. Sci. 2020, 10, 1171

Figure 5. A 25-year-old female patient with a history of colorectal cancer presented with multiple liver lesions after surgery. The focal nodular hyperplasia (FNH) in the right liver shows arterial contrast-agent enhancement (A) and is still hyperintense in the liver-specific contrast phase (C). No significant 18F-FDG uptake is seen (B,D). A second lesion in the right liver is rated as a colorectal liver metastasis due to incomplete resection. The tumor lesion is detectable by MRI neither without nor with the liver-specific contrast phase (E,G). In the fused PET/MR images (F,H), the remaining tumor tissue can clearly be identified. Additional lesions near the liver hilus are adenomas with strong arterial contrast-agent enhancement (I). In the liver-specific contrast phase, these lesions are hypointense (K). Similar to the FNH, no significant 18F-FDG uptake is seen (J,L) [183].

Discussion on the Limitations and Prospects of Medical Image Fusion Technology
In modern clinical practice, physicians place increasing demands on the accuracy and efficiency of visually aided medical diagnostic systems. Image fusion techniques can efficiently process and combine information from different imaging devices, which plays an important role in the precise localization of tumors and in the early diagnosis and treatment of cancer. With advances in computer systems and medical imaging equipment, image fusion technology will develop further and may substantially change clinical diagnosis. The research trend is to develop new algorithms that make the registration of multimodal medical images more accurate and the fusion more efficient, and thereby improve diagnostic performance. In the following, we present several challenges and research trends in this topic.
The extensive application of different medical imaging modalities has played a globally recognized role in the diagnosis of liver cancers [184]. However, these modalities still have several flaws. One research challenge and focus is therefore to further improve the single modalities, for example by reducing the rising cost of medical imaging and decreasing the patient's radiation exposure time while maintaining image quality [185,186]. In addition, the current clinical application of image fusion is still limited to merging medical images from two independent medical devices, so the patient must undergo multiple examinations. Consequently, image fusion is more expensive than single-modality imaging, which limits its application in the clinical diagnosis and treatment of tumors.
To reduce the examination cost and the risk of additional radiation, it would be ideal to develop devices that can perform multimodality examinations simultaneously while maintaining high image quality.
The development of efficient medical image registration technologies is also very important. One topic is motion compensation, that is, addressing the significant alignment errors caused by patient breathing and movement. To optimize the linkage of the fused images, data from the same respiratory phase should be used for registration, and the patient should remain in the same scanning position to the greatest possible extent, particularly for organs that move with respiration. Automatic registration will be the development direction of image fusion volume navigation technology, with the aim of optimizing the registration process and the fusion accuracy. In addition, improving the accuracy and speed of the available registration algorithms may promote the utility of fusion imaging. New algorithms and methods may be introduced to take into account organ movement caused by changes in breathing or position. An ideal system would be based on automatic registration using complex electromagnetic tracking and computer-aided imaging algorithms, without depending on external reference points or anatomical landmarks selected by the user. These features would allow wider use of this technology by individuals with less experience in image fusion. With further development of follow-up physical diagnosis technologies, improved fusion imaging may be widely applied in clinical practice to achieve early diagnosis and treatment of tumors.
Another interesting research topic is to reduce the computational time of registration/fusion algorithms, that is, to speed up the fusion procedure. The relatively long computation time limits the implementation of several fusion algorithms in clinical applications with strict timing requirements. For example, as discussed in Section 5.1, for the real-time fusion of US with MRI or CT for liver lesion diagnosis, current clinical applications can only use fusion by simple arithmetic combination, owing to its fast processing speed. Several research directions could help to overcome this challenge, such as (i) applying high-performance computing, which utilizes parallel computing, to obtain an efficient image fusion process; and (ii) using a pretrained deep learning network for high-speed image fusion.
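To illustrate why arithmetic combination is fast enough for real-time use, the following minimal sketch blends two images pixel by pixel with a single weight. It is not the implementation of any clinical system: the function name `fuse_weighted`, the weight `alpha` and the toy pixel values are assumptions made for illustration, and the two inputs are assumed to be already registered, equal in size and normalized to a common intensity range.

```python
from typing import List

Image = List[List[float]]


def fuse_weighted(img_a: Image, img_b: Image, alpha: float = 0.5) -> Image:
    """Pixel-wise blend: alpha * img_a + (1 - alpha) * img_b.

    Assumes both images are already registered, have the same shape,
    and share a common intensity range (hypothetical preconditions).
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    same_rows = len(img_a) == len(img_b)
    if not same_rows or any(len(ra) != len(rb) for ra, rb in zip(img_a, img_b)):
        raise ValueError("images must have identical shapes")
    return [
        [alpha * a + (1.0 - alpha) * b for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(img_a, img_b)
    ]


# Toy 2x2 frames standing in for a US frame and a CT slice (made-up values).
us_frame = [[0.0, 1.0], [0.5, 0.5]]
ct_frame = [[1.0, 1.0], [0.0, 0.5]]
fused = fuse_weighted(us_frame, ct_frame, alpha=0.5)
```

Each output pixel needs one multiplication per input and one addition, so the cost grows linearly with the number of pixels, and every pixel can be computed independently, which is also what makes this kind of operation easy to parallelize.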
As shown in Section 4.10, deep learning methods are becoming increasingly popular for the registration/fusion of medical images. However, the robustness and availability of datasets still constrain the use of such methods in clinical image fusion. First, the amount of available data is often very limited because of the privacy of clinical data. Second, having medical experts label the available data is very time-consuming and costly. Finally, the quality of the data, especially pathological data, cannot be guaranteed. To address the data shortage, researchers have proposed and applied data augmentation in medical image processing, which increases the diversity and size of the dataset without acquiring new data. The simplest augmentation functions, such as random image rotations or nonlinear deformations, are easy to implement but lack the ability to emulate real variations. More advanced methods, such as few-shot/one-shot learning [187] and attribute-guided augmentation (AGA) [188], can produce a wide variety of realistic new images for deep learning-based image fusion from few supervised data samples. Unsupervised learning is another important research direction for overcoming the small-dataset challenge. For instance, the stacked autoencoder is one of the popular unsupervised feature extraction algorithms used in image registration/fusion. A more recently developed learning module, the Spatial Transformer (ST) [189], makes explicit use of the spatial information in the data and can be inserted into CNNs. This makes CNNs invariant to translation, scaling, rotation and common distortions without additional training, and thus able to register medical images without training datasets. Another relatively new and popular algorithm, the Generative Adversarial Network (GAN) [190], trains a generative network and a discriminative network at the same time. The network can be trained end-to-end and can learn representative features in a completely unsupervised way, which provides a research direction for deep learning-based image registration/fusion.
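The simplest form of augmentation mentioned above can be sketched in a few lines. The example below is an illustration only and is not taken from the cited works: it restricts the random rotations to quarter turns and adds an optional horizontal flip, and the helper names (`rotate90`, `flip_horizontal`, `augment`) and the toy image are hypothetical.

```python
import random
from typing import List

Image = List[List[float]]


def rotate90(img: Image) -> Image:
    """Rotate an image by 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]


def flip_horizontal(img: Image) -> Image:
    """Mirror an image left to right."""
    return [row[::-1] for row in img]


def augment(img: Image, rng: random.Random) -> Image:
    """Apply a random number of quarter turns, then an optional flip."""
    out = [list(row) for row in img]
    for _ in range(rng.randrange(4)):  # 0 to 3 quarter turns
        out = rotate90(out)
    if rng.random() < 0.5:
        out = flip_horizontal(out)
    return out


rng = random.Random(0)  # fixed seed so the sketch is reproducible
sample = [[1.0, 2.0], [3.0, 4.0]]  # made-up 2x2 image
augmented = [augment(sample, rng) for _ in range(8)]
```

Such transformations only rearrange existing pixels, which is consistent with the limitation noted above that they cannot emulate real anatomical or pathological variation. For a registered multimodal pair, the same transformation would have to be applied to both images so that they remain aligned.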
As discussed in the previous sections, each existing image fusion technique has its own strengths and weaknesses. By efficiently combining different image fusion methods, their advantages could be combined to obtain higher image quality, while their weaknesses could be avoided.
At the same time, new and more efficient algorithms still need to be developed to improve the quality and visualization of the fused image, while reducing the errors caused by differences in resolution and dimension between images and by image noise. Signal noise can negatively affect the image fusion process. An efficient denoising algorithm would therefore be useful for suppressing signal noise in the medical images and enhancing the quality of the fused images.
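As a concrete illustration of a denoising step applied before fusion, the sketch below uses a 3x3 median filter. The text does not name a specific denoising method, so the choice of a median filter is an assumption made here, as are the function name `median_filter3`, the border handling and the toy image.

```python
from statistics import median
from typing import List

Image = List[List[float]]


def median_filter3(img: Image) -> Image:
    """3x3 median filter; border pixels reuse the nearest valid neighbor."""
    h, w = len(img), len(img[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            # Clamp indices so the window never leaves the image.
            window = [
                img[min(max(y + dy, 0), h - 1)][min(max(x + dx, 0), w - 1)]
                for dy in (-1, 0, 1)
                for dx in (-1, 0, 1)
            ]
            row.append(median(window))
        out.append(row)
    return out


# Made-up 3x3 image with one impulse-noise pixel in the center.
noisy = [
    [0.2, 0.2, 0.2],
    [0.2, 9.0, 0.2],
    [0.2, 0.2, 0.2],
]
denoised = median_filter3(noisy)
```

In this toy case the isolated outlier is replaced by the value of its neighbors. A median filter is suited mainly to impulse-like noise and can blur fine structures, so it stands in here for whichever denoising algorithm is appropriate for a given modality.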
Finally, many articles to date have documented case reports with only a small number of patients. Larger multicenter randomized studies, including cost-benefit analyses and clinical impact studies, are required to further evaluate the efficacy of medical image fusion technology.