Visible Spectrum and Infra-Red Image Matching: A New Method

Abstract: Textural and intensity changes between Visible Spectrum (VS) and Infra-Red (IR) images degrade the performance of feature points. We propose a new method based on a regression technique to overcome this problem. The proposed method consists of three main steps. In the first step, feature points are detected from VS-IR images and Modified Normalized (MN)-Scale Invariant Feature Transform (SIFT) descriptors are computed. In the second step, correct MN-SIFT descriptor matches are identified between VS-IR images with projection error, and a regression model is trained on the correct MN-SIFT descriptors. In the third step, the regression model is used to process the MN-SIFT descriptors of test VS images in order to remove misalignment with the MN-SIFT descriptors of test IR images and to overcome textural and intensity changes. Experiments are performed on two different VS-IR image datasets. The experimental results show that the proposed method performs well, demonstrating on average 14% and 15% better precision and matching scores than the recently proposed Histograms of Directional Maps (HoDM) descriptor on the RGB-NIR and MSD datasets, respectively. The experiments are also performed with different state-of-the-art feature matching strategies, and the results show that the proposed method again achieves better matching and precision scores than the other descriptors.


Introduction
Visible Spectrum (VS) and Infra-Red (IR) images are used in a wide variety of computer vision applications, such as image registration [1,2], face recognition [3], scene category recognition [4], stereo matching [5] and medical image analysis [6]. IR images provide complementary information to visible spectrum images in order to make image analysis more reliable [4].
In the last two decades, a large number of feature point detectors and descriptors have been proposed [7]. These detectors and descriptors have been designed for grayscale and RGB images to overcome common types of transformations and deformations between the images [8]. In cross-spectral applications, such as VS-IR image matching, the detectors and descriptors underperform due to high textural and intensity changes between VS-IR images [9]. To overcome this problem, several new and modified versions of the Scale Invariant Feature Transform (SIFT) [10] have been proposed that minimize the effects of intensity and textural changes between the images by using features such as Canny edges [11], local contrast and differential excitations [12], local binary patterns [13,14], normalized gradients [15] and local self-similarity [16] for descriptor construction.
In contrast, this paper proposes a regression based method to overcome the effects of intensity and textural changes. The proposed method consists of three steps. In the first step, it detects SIFT feature points [10] and computes Modified Normalized (MN)-SIFT [17] descriptors for every detected feature point. It then identifies correct MN-SIFT descriptor matches between pairs of training VS-IR images with projection error [18] and fits a regression model on the correct MN-SIFT descriptors. The MN-SIFT descriptors of test VS images are passed through the regression model to remove the effects of textural and intensity changes prior to their descriptor matching with the MN-SIFT descriptors of test IR images. To the best of our knowledge, no similar method exists in the literature. The main contributions of this paper are as follows:
• A performance evaluation of different feature point detectors and descriptors on VS-IR images.
• A new regression based method for VS-IR image matching.
The rest of the paper is organized as follows: Section 2 presents related work. Section 3 presents the proposed method. Section 4 presents the experimental setup and the results. Finally, the paper is concluded in Section 5.

Related Work
Intensity and textural changes between multisensor images degrade the performance of feature points. Various new and modified versions of the SIFT algorithm have been proposed to overcome such changes. Gradient Orientation Modification (GOM)-SIFT is an extension of SIFT that makes the SIFT descriptor robust against textural and intensity changes. It modifies the image gradients around the SIFT feature points [1] and then uses the modified gradients for descriptor construction. Orientation Restricted (OR)-SIFT is based on a similar idea. It computes SIFT descriptors and combines the elements of the SIFT descriptor in opposite orientation directions [2] to overcome textural and intensity changes.
In the Edge Oriented Histogram (EOH) method, the distribution and orientation of Canny edges are used instead of image gradients for descriptor construction [5]. The EOH descriptor works well on VS-IR images but under performs in the presence of rotational changes between the images. Directional filters are used in Reference [11] to overcome this problem and to construct VAR-EOH descriptors. Similarly, the image edges are computed with Local Contrast (LC) and Differential Excitation (DE) kernels and are used instead of image gradients in the SIFT algorithm to compute LC-SIFT and DE-SIFT descriptors [12].
In the Local Binary Patterns of Gradients (LBPG) [14] approach, Center Symmetric Local Binary Patterns (CSLBP) [13] are used to modify both the gradient magnitude and orientation maps around the feature points. This gives modified image gradients, which are used as features in the construction of the LBPG descriptor. LBPG shows superior performance against textural and intensity changes, but its large descriptor size (256 dimensions) makes the descriptor matching process computationally expensive. To overcome this limitation, image gradients around the feature points are normalized and used as features in the construction of Normalized Gradient (NG)-SIFT descriptors [15]. NG-SIFT works well on multisensor images of structured scenes, but underperforms on textured scene images. In the Modified Normalized gradient (MN)-SIFT method, MN features are used to overcome this problem [17]. The Local Self Similarity (LSS) descriptor [16] uses the self-similarity of pixels, edges, and repetitive patterns in descriptor construction. The extended versions of LSS are the Dense Adaptive Self Correlation [19] and Fully Convolutional Self Similarity [20] descriptors, both of which are computed in a dense manner over the whole image.
Ye et al. [21] propose a phase congruency approach. They generate features by computing the amplitude and orientation of the phase congruency model to match multi-modal images. They call the descriptor the Histogram of Orientated Phase Congruency (HOPC); it encapsulates both edge and structural information for multi-modal image matching.
Kim et al. [22] propose Local Self Similarity Frequency (LSSF) descriptors. They compute LSSF with features, which are obtained through frequency domain analysis of local internal layout of the self similarities. The LSSF employs a correlation surface to reduce intensity and textural differences between VS and near IR images. LSSF is invariant to rotational changes and uses a log polar binning scheme for descriptor construction.
Sedaghat and Ebadi propose an adaptive binning strategy to construct descriptors for remote sensing images [23]. They use Hessian affine feature point detector to extract normalized image patches for descriptor construction. Unlike SIFT, they use an adaptive histogram quantization strategy to incorporate both location and gradient orientation information to make the descriptors robust against viewpoint, intensity and textural changes.
Nunes and Padua compute descriptors by extracting structural properties of the image [24]. They use log-Gabor filters and name the descriptor the Multispectral Feature Descriptor (MFD). Similarly, Sobel filters are used in four different directions to compute the Histograms of Directional Maps (HoDM) descriptor [25]. Binary descriptors such as Oriented FAST and Rotated BRIEF (ORB) [26] and Binary Robust Invariant Scalable Keypoints (BRISK) [27] are also used to overcome intensity changes between the images. Some methods employ the theory of physics to understand the phenomena behind textural and intensity changes and thereby improve cross-spectral image matching results [28,29].
In Reference [9], the authors compare different descriptors on multisensor images using the image matching framework of [18]. These descriptors are gradient based descriptors such as SIFT, NG-SIFT, LC-SIFT, DE-SIFT and MN-SIFT, intensity order based descriptors [30], Haar wavelet based descriptors [31] and local binary pattern based descriptors [13,14]. It is shown that descriptors computed on normalized image patches extracted through Harris feature points [32] and SIFT feature points demonstrate better performance than descriptors computed on normalized image patches extracted with ORB and BRISK detectors. It is also shown that MN-SIFT, compared to the others, demonstrates better results on multisensor images.
The regression method proposed in this paper is also based on MN-SIFT descriptors. It uses MN-SIFT descriptors to train a regression model to overcome textural and intensity changes between VS-IR images. The regression model processes the MN-SIFT descriptors of test VS images prior to their descriptor matching with the MN-SIFT descriptors of IR images and improves the image matching results.

Proposed Method
This section presents the proposed method. It consists of three main steps: feature point detection, MN-SIFT descriptor construction and regression model training & testing. Figure 1 shows a block diagram for the proposed method. Each block is briefly described below:

Image Datasets
We use two different image datasets: (i) the RGB-NIR dataset [4], and (ii) the Multimodal Stereo dataset (MSD) [5]. The RGB-NIR dataset consists of Visible Spectrum (VS: 400-700 nm) and Near-Infra-Red (NIR: 750-1100 nm) images of 477 different indoor and outdoor scenes. The MSD dataset consists of VS and Long-Wave-Infra-Red (LWIR: 800-1500 nm) images of 100 different outdoor scenes. We randomly divide each dataset into two disjoint sets: a training set and a test set. The training set consists of 10% of the image pairs of the dataset and is used only for regression model training, whereas the test set consists of the remaining 90% of the image pairs and is used only to evaluate the proposed descriptor (Reg-SIFT) and to compare its performance with other state of the art descriptors.
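The split described above can be sketched as follows; the helper name, fixed seed and 10% training fraction are illustrative choices, not the authors' code:

```python
import random

def split_dataset(pair_ids, train_fraction=0.1, seed=0):
    """Randomly split image-pair ids into disjoint training and test sets."""
    rng = random.Random(seed)
    ids = list(pair_ids)
    rng.shuffle(ids)
    n_train = max(1, round(train_fraction * len(ids)))
    return ids[:n_train], ids[n_train:]

# e.g., the 100 MSD scene pairs: 10 for training, 90 for testing
train_pairs, test_pairs = split_dataset(range(100))
```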

SIFT Feature Point Detection
We use the SIFT feature point detector [10] to detect feature points from the VS-IR images of the training and test sets. The SIFT detector is based on scale space images, which are computed by convolving the input image with a variable-scale Gaussian kernel.
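A minimal, illustrative sketch of the scale-space idea behind the SIFT detector follows; it builds a small Difference-of-Gaussians stack with a separable Gaussian blur and reports local extrema across scale and space. It omits SIFT's octaves, subpixel refinement and edge-response filtering, and all function names and thresholds are our assumptions:

```python
import numpy as np

def gaussian_kernel(sigma):
    """1-D Gaussian kernel, normalized to sum to 1."""
    radius = int(3 * sigma)
    x = np.arange(-radius, radius + 1)
    k = np.exp(-x**2 / (2 * sigma**2))
    return k / k.sum()

def blur(img, sigma):
    """Separable Gaussian blur with reflect padding at the borders."""
    k = gaussian_kernel(sigma)
    r = len(k) // 2
    padded = np.pad(img, r, mode="reflect")
    rows = np.apply_along_axis(lambda v: np.convolve(v, k, mode="valid"), 1, padded)
    return np.apply_along_axis(lambda v: np.convolve(v, k, mode="valid"), 0, rows)

def dog_extrema(img, sigmas=(1.6, 2.0, 2.5, 3.2), thresh=0.03):
    """Keypoints as local extrema of a small Difference-of-Gaussians stack."""
    stack = np.stack([blur(img, s) for s in sigmas])
    dog = stack[1:] - stack[:-1]
    pts = []
    for i in range(1, dog.shape[0] - 1):          # interior scales only
        for y in range(1, img.shape[0] - 1):
            for x in range(1, img.shape[1] - 1):
                cube = dog[i-1:i+2, y-1:y+2, x-1:x+2]
                v = dog[i, y, x]
                if abs(v) > thresh and (v == cube.max() or v == cube.min()):
                    pts.append((x, y, sigmas[i]))
    return pts
```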

Projection Error
Projection error (ε) is defined as the Euclidean distance between the reference image feature points and the projected feature points [18]. The projected feature points are obtained by projecting the feature points of the target image onto the reference image with a ground truth homography K. This homography is known in advance between the reference and target images according to References [18,33]. We use a projection error threshold of 2 pixels in this paper to identify corresponding feature points between the reference and target images for the proposed method.
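A short sketch of this computation, assuming K is a 3 × 3 homography applied in homogeneous coordinates (the function name is ours):

```python
import numpy as np

def projection_error(pt_target, pt_reference, K):
    """Euclidean distance between a reference-image feature point and the
    target-image point projected into the reference image with homography K."""
    x, y = pt_target
    px, py, pw = K @ np.array([x, y, 1.0])     # homogeneous projection
    projected = np.array([px / pw, py / pw])   # back to pixel coordinates
    return float(np.linalg.norm(projected - np.asarray(pt_reference, dtype=float)))
```

A pair of points is then accepted as corresponding when the returned error is at most 2 pixels.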

MN-SIFT
To compute MN-SIFT [17] descriptors, a circular region around each detected SIFT feature point is cropped from the image. The radius of the region is proportional to the scale (σ) of the SIFT feature point. The region is subdivided into 4 × 4 location bins, as shown in Figure 2. The bins are denoted as H_{r,c}, where r, c = 1, 2, 3, 4. The region is convolved with derivative kernels to obtain derivatives F_h and F_v along the horizontal and vertical directions, respectively. The gradient magnitude (Ω) and gradient orientation (β) are calculated at each pixel location (x, y) as:

Ω(x, y) = sqrt(F_h(x, y)² + F_v(x, y)²),  β(x, y) = arctan(F_v(x, y) / F_h(x, y)).

Then modified normalized gradient magnitudes (Ω̄) are computed as:

Ω̄(x, y) = (Ω(x, y) − Ω_min) / (Ω_max − Ω_min),

where Ω_min and Ω_max are the region's minimum and maximum gradient magnitude values, respectively. The pixels of each location bin are identified as {H(x, y) : x ∈ [l_c, u_c] ∧ y ∈ [l_r, u_r]}, where the bin boundaries l_c = (c − 1)s/4, u_c = cs/4, l_r = (r − 1)s/4 and u_r = rs/4 are derived from the region's size s. The gradient orientations β of the region are quantized into eight levels as:

L(x, y) = mod(round(4β(x, y)/π), 8),

where mod(·) represents the modular operator; for example, the modular operation on β(x, y) = −π/2 produces L(x, y) = 6. Then a feature histogram h_{r,c,t} is computed for each location bin as:

h_{r,c,t} = Σ_{(x,y) ∈ H_{r,c}} Ω̄(x, y) δ(L(x, y) − t),

where t = 0, 1, ..., 7 and δ(a) = 1 if a = 0 and 0 otherwise. The histograms are concatenated over all the location bins H_{r,c} to obtain the 128-dimensional MN-SIFT descriptor.
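The steps above can be sketched as follows. This is an illustrative approximation, not the reference MN-SIFT implementation: it uses np.gradient in place of the derivative kernels, works on a square patch rather than a circular region, and the quantization rule is reconstructed from the β = −π/2 → level 6 example in the text:

```python
import numpy as np

def mn_sift_descriptor(patch):
    """Sketch of MN-SIFT over a square patch around a feature point:
    min-max normalized gradient magnitudes, 8-level quantized orientations,
    and a 4 x 4 grid of 8-bin histograms (128-D descriptor)."""
    fh = np.gradient(patch, axis=1)            # horizontal derivative F_h
    fv = np.gradient(patch, axis=0)            # vertical derivative F_v
    mag = np.hypot(fh, fv)                     # gradient magnitude
    span = mag.max() - mag.min()
    mag = (mag - mag.min()) / span if span > 0 else np.zeros_like(mag)
    beta = np.arctan2(fv, fh)
    # quantize orientations into 8 levels; beta = -pi/2 maps to level 6
    levels = np.mod(np.round(4 * beta / np.pi), 8).astype(int)
    s = patch.shape[0]
    desc = []
    for r in range(4):
        for c in range(4):
            cell_mag = mag[r*s//4:(r+1)*s//4, c*s//4:(c+1)*s//4]
            cell_lvl = levels[r*s//4:(r+1)*s//4, c*s//4:(c+1)*s//4]
            hist = np.array([cell_mag[cell_lvl == t].sum() for t in range(8)])
            desc.append(hist)
    return np.concatenate(desc)               # 16 bins x 8 levels = 128-D
```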

Regression Modeling Using Corresponding Descriptors
We trained a regression model on corresponding MN-SIFT descriptors of the training set. To understand the training process, let I_R and I_T be two images of the same scene, depicting the same scene contents in the VS and IR bands, respectively. Feature points are detected on the I_R and I_T images and the feature point locations (pixels) are stored as (x_r, y_r) and (x_t, y_t), respectively, where r = 1, 2, 3, ..., u and t = 1, 2, 3, ..., v, and u and v represent the total numbers of feature points detected on the I_R and I_T images, respectively. The feature points of I_R are projected onto I_T with a homography K, which acts as ground truth data between I_R and I_T. This homography is known in advance between every VS-IR image pair of the training set according to References [18,34]. We use a projection error of 2 pixels to identify corresponding feature points between I_R and I_T. Figure 3 shows the detected and corresponding feature points as blue '+' and green 'o' markers, respectively, between the VS-LWIR images of an MSD scene.
Then MN-SIFT descriptors are computed for the corresponding feature points. Such descriptors are referred to as corresponding/correct descriptors. Let R_1 = [r_{1,1}, r_{1,2}, ..., r_{1,m}] be a descriptor of image I_R and let T_1 = [t_{1,1}, t_{1,2}, ..., t_{1,m}] be its corresponding descriptor match in image I_T, where m = 128 represents the length of the R_1 and T_1 MN-SIFT descriptors. Let f(R_1, Θ_1) be a model function that gives an error e_1 when it is subtracted from the first element of T_1:

e_1 = t_{1,1} − f(R_1, Θ_1),

where Θ_1 are the parameters of f(R_1, Θ_1), which are learnt to minimize the squared error e_1², that is, the square of the error between the first element of T_1 and the model function f(R_1, Θ_1). To learn Θ_1, a projection error equal to or less than 2 pixels is used to identify n corresponding MN-SIFT descriptors between the I_R and I_T images. The corresponding descriptors are stored as matrices R and T, where the corresponding MN-SIFT descriptors R_i and T_i form the ith rows of R and T, with i = 1, 2, 3, ..., n.
The regression modeling, as explained above, was based on a single image pair, that is, I_R and I_T. In the case of a dataset, for instance the RGB-NIR and MSD datasets, each dataset is randomly divided into two disjoint sets, one for training and the other for testing. The corresponding descriptors are obtained from each image pair of the training set and are appended as rows to form the matrices R and T. The SIFT detector gives on average 380 corresponding feature points per image pair with a projection error equal to or less than 2 pixels. If there are 10 image pairs in the training set, then the total number of corresponding descriptors (i.e., rows of R and T) obtained is 380 × 10 = 3800, and the unknown Θs of the model functions are learnt on these corresponding descriptors as explained above.
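Under the assumption that f is linear in the descriptor elements (the LR case below), the per-element fitting above collapses into one least-squares solve for all 128 model functions at once; the function names and the added bias column are our choices, not the authors':

```python
import numpy as np

def fit_descriptor_regression(R, T):
    """Least-squares fit of one linear model per descriptor element:
    T[:, j] is approximated by a linear function of the rows of R plus a bias.
    R and T are n x 128 matrices of corresponding VS and IR descriptors."""
    n = R.shape[0]
    A = np.hstack([R, np.ones((n, 1))])            # append bias column
    Theta, *_ = np.linalg.lstsq(A, T, rcond=None)  # 129 x 128 parameter matrix
    return Theta

def apply_regression(F, Theta):
    """Map VS descriptors F (k x 128) into the IR descriptor space."""
    A = np.hstack([F, np.ones((F.shape[0], 1))])
    return A @ Theta
```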
In fact, the Θs represent the parameters of a regression model. In this paper we use five different regression models to compute the proposed Reg-SIFT: Linear Regression (LR), Decision Tree Regression (DTR), Random Forest Regression (RFR), Support Vector Machine Regression (SVMR) and Multi-Layer Perceptron Regression (MLPR). All these models are implemented in Python with the Sklearn library except MLPR, which is implemented using the Keras and Tensorflow libraries. Figure 1 shows the Reg-SIFT block, which is obtained by processing the MN-SIFT descriptors of test VS images with the trained regression model. Since we use five different models, Reg ∈ {LR, DTR, RFR, SVMR, MLPR}. To understand the processing of MN-SIFT to get Reg-SIFT, consider I_F and I_G, two images of the test set depicting the same scene in the VS and IR bands, respectively. SIFT feature points are detected and MN-SIFT descriptors are computed. Let F_w and G_z be two sets of MN-SIFT descriptors that belong to the I_F and I_G images, respectively, where w = 1, 2, 3, ..., w and z = 1, 2, 3, ..., z, and the total numbers of descriptors computed for I_F and I_G are denoted as w and z, respectively. Each F_w descriptor is processed through the regression model (i.e., testing) to obtain a Reg-SIFT descriptor G_w = [g_{w,1}, g_{w,2}, g_{w,3}, ..., g_{w,m}], whose elements are computed with the learnt model functions as g_{w,j} = f_j(F_w, Θ_j) for j = 1, 2, ..., m. The Reg-SIFT descriptors are then matched with the MN-SIFT descriptors (G_z) of image I_G.
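A sketch of the Sklearn side of this setup follows (the Keras-based MLPR is omitted). The hyperparameters are placeholders rather than the authors' settings, and SVR is wrapped in MultiOutputRegressor because it is single-output while descriptors are 128-dimensional:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import RandomForestRegressor
from sklearn.multioutput import MultiOutputRegressor
from sklearn.svm import SVR

# Placeholder hyperparameters; the paper does not report its settings.
REGRESSORS = {
    "LR": LinearRegression(),
    "DTR": DecisionTreeRegressor(max_depth=8),
    "RFR": RandomForestRegressor(n_estimators=20),
    "SVMR": MultiOutputRegressor(SVR()),  # SVR is single-output, so wrap it
}

def reg_sift(model_name, R, T, F):
    """Train the chosen model on corresponding descriptors (R -> T) and
    transform the test VS descriptors F into Reg-SIFT descriptors."""
    model = REGRESSORS[model_name]
    model.fit(R, T)
    return model.predict(F)
```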
After that, image matching using Reg-SIFT is carried out and the matching results are compared with state of the art descriptors.
We use MN-SIFT for the proposed Reg-SIFT because of its better robustness towards intensity and textural changes. MN-SIFT is based on MN features, which contain both local textural and structural information, and it performs well in cross-spectral applications compared to NG-SIFT [9], which encapsulates only structural information [15]. The experimental results of [9] show that MN-SIFT demonstrates better performance on multisensor images than the SIFT, LC-SIFT, LBPG, DE-SIFT and CS-LBP descriptors.
Another reason for choosing MN-SIFT is its descriptor construction process, which is simple compared to those of the EOH, LSS, MFD and HoDM descriptors. EOH uses Canny edges; Canny edge detection is relatively easy on VS images but fails on LWIR/NIR images due to low contrast. LSS is based on the local self-similarity between a small region and a larger one around the feature points, computed with a sum-of-squared-differences approach. MFD uses directional log-Gabor filters, which are more computationally expensive than the simple directional filters used in MN-SIFT. HoDM uses Sobel filters in four different directions to compute image gradients. The absolute values of the gradients are then calculated and the weak gradients are suppressed with a hypothesis. The four image gradient values at each pixel location are compared and binarized, and the absolute and binary gradients are then read with a spatial pooling scheme to compute HoDM descriptors. The HoDM process is also more computationally expensive than MN-SIFT.

Experimental Setup and Results
This section presents image matching, image datasets, evaluation criteria, experimental setup and results.

Image Matching
Image matching is used as a test problem in this paper to evaluate the performance of Reg-SIFT and to compare Reg-SIFT with state of the art feature point descriptors. Image matching is widely used as a test problem for the performance evaluation of feature points [15,34-36]. In this paper, the image matching framework of Heinly et al. [18], which is based on projection error, is used. Image matching is carried out in three steps [18]: (i) feature point detection, (ii) feature point description and (iii) feature point matching. Image matching is usually performed between pairs of images of the same scene under scale, rotation, affine, and other common types of deformations and transformations. In this paper, the image matching is performed between VS and IR images, which exhibit high textural and intensity changes.

Descriptor Matcher
We use the Brute Force Descriptor Matcher (BFDM) of OpenCV for descriptor matching. BFDM supports both the L2 norm (Euclidean distance) and the Hamming distance. The L2 norm is suitable for the SIFT, CS-LBP, GLOH, LIOP, LBPG, LC-SIFT, DE-SIFT, NG-SIFT, MN-SIFT and Reg-SIFT descriptors, whereas the Hamming distance is best for binary descriptors such as BRIEF, ORB, FREAK and BRISK.
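For float descriptors, BFDM's L2-norm matching reduces to an exhaustive nearest-neighbour search. A small NumPy equivalent is sketched below (the function name is ours, and the cross-check and ratio-test options of OpenCV's matcher are omitted):

```python
import numpy as np

def brute_force_match(desc_a, desc_b):
    """Nearest-neighbour matching under the L2 norm, the same strategy as a
    brute-force matcher for float descriptors: every descriptor in desc_a is
    paired with the index of its closest descriptor in desc_b."""
    # pairwise squared L2 distances via ||a - b||^2 = ||a||^2 - 2ab + ||b||^2
    d2 = (np.sum(desc_a**2, axis=1)[:, None]
          - 2 * desc_a @ desc_b.T
          + np.sum(desc_b**2, axis=1)[None, :])
    nearest = np.argmin(d2, axis=1)
    return [(i, int(j)) for i, j in enumerate(nearest)]
```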

Evaluation Criteria
BFDM returns a list of putative descriptor matches, which we store in matrix form as:

M(w, z) = 1 if BFDM identifies descriptors w and z as a putative match, and 0 otherwise,

where w and z are two descriptors of the VS/RGB and LWIR/NIR images, respectively. To identify the correct and false matches present in matrix M, we compute a ground truth matrix H with the help of the projection error [18] as:

H(w, z) = 1 if the feature points of descriptors w and z are at ε ≤ 2, and 0 otherwise.

Then the total numbers of correct (N_c) and false (N_f) matches are computed as:

N_c = Σ_{w,z} M(w, z) H(w, z),  N_f = Σ_{w,z} M(w, z) (1 − H(w, z)).

Matching and precision scores are computed as performance metrics [18,34]. The matching score is the ratio between the number of correct descriptor matches (N_c) and the smaller of the numbers of descriptors in the pair of images (min{w, z}). Precision is defined as the ratio between the number of correct descriptor matches (N_c) and the total number of descriptor matches (N_c + N_f). Matching and precision scores are computed on a per-image-pair basis over the test set, as illustrated in Figure 4, and then the average matching and precision scores are presented.

Figure 4. Image matching framework for performance evaluation of feature points on VS and near-infra-red/long-wave-infra-red (NIR/LWIR) images of the test set.
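The scores above can be computed from the binary matrices M and H as follows; the function signature is ours:

```python
import numpy as np

def matching_and_precision(M, H, n_desc_vs, n_desc_ir):
    """Matching and precision scores from the putative-match matrix M and the
    ground-truth matrix H (both binary, VS descriptors x IR descriptors)."""
    n_correct = int(np.sum((M == 1) & (H == 1)))   # N_c
    n_false = int(np.sum((M == 1) & (H == 0)))     # N_f
    matching = n_correct / min(n_desc_vs, n_desc_ir)
    precision = n_correct / (n_correct + n_false) if n_correct + n_false else 0.0
    return matching, precision
```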

Experimental Results
The experimental results are divided into three parts. In the first part, a comparison of different feature point detectors is presented with the objective of identifying the best ones for VS-IR images. The repeatability score and the number of correspondences are used as performance metrics for this comparison. The repeatability score is defined as the fraction of feature points identified as corresponding feature points between two images. In the second part, the two best detectors are paired with seventeen different descriptors, one by one, in order to find the best detector-descriptor combinations for VS-IR images. In the third part, the proposed Reg-SIFT is computed with the LR, DTR, RFR, SVMR and MLPR regression techniques; the best regression technique is selected and used in the subsequent comparisons, where Reg-SIFT is compared with the best detector-descriptor pairs of the second part.
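The repeatability score can be sketched as follows, assuming the image-B points have already been projected into image A with the ground truth homography; the 2-pixel eps threshold follows the projection-error criterion used throughout, and the helper name is ours:

```python
def repeatability(points_a, points_b_projected, n_a, n_b, eps=2.0):
    """Repeatability score: the fraction of detected points that have a
    counterpart within eps pixels after projecting image-B points into image A."""
    matched = sum(
        1 for (xa, ya) in points_a
        if any((xa - xb) ** 2 + (ya - yb) ** 2 <= eps ** 2
               for (xb, yb) in points_b_projected)
    )
    return matched / min(n_a, n_b)
```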

Comparison of Feature Point Detectors
We use the SIFT, SURF, ORB, BRISK, FAST and Harris [32] detectors. Table 1 shows the comparison results. It can be seen that the SIFT detector obtains on average the best performance. It demonstrates the best number of feature point correspondences and the best repeatability score, that is, 903 correspondences and 62.3%, respectively, on the RGB-NIR dataset. The Harris detector achieves the second best results overall; it obtains the best results on MSD, followed by the SIFT detector.

Comparison of Feature Point Descriptors
We pair the Harris and SIFT detectors with different descriptor types for this comparison, choosing these two detectors based on their better performance on VS-IR images compared to the other detectors, as discussed in the previous section. The Harris and SIFT detectors are paired one by one with seventeen different descriptor types to evaluate the performance of the different detector-descriptor combinations on the RGB-NIR and MSD datasets and to identify the best combinations for VS-IR images. The comparison is based on average matching and precision scores. We use the notation detector + descriptor to represent a detector-descriptor pair; for example, SIFT + ORB means that the detector is SIFT and the descriptor is ORB. BFDM is used for descriptor matching. Table 2 shows the average matching scores. SIFT + SIFT achieves a matching score of 34% on the RGB-NIR dataset, whereas Harris + SIFT achieves a 24% matching score. SIFT + MN-SIFT achieves the best matching score of 46% on the RGB-NIR dataset and outperforms all other descriptor combinations with the SIFT detector. In the case of the Harris detector, Harris + HoDM demonstrates the best performance. It can be seen that combining descriptors with the SIFT detector gives better results than pairing them with the Harris detector. The matching scores achieved on MSD are very low compared to the RGB-NIR dataset, due to the very high textural and intensity changes between VS-LWIR images compared to RGB-NIR images. The comparison shows that SIFT + HoDM outperforms all other detector-descriptor combinations on the MSD dataset; similarly, Harris + HoDM also demonstrates good results on MSD. Table 2. Average matching score (%) achieved by different detector-descriptor pairs on the RGB-NIR and MSD datasets for image matching. The maximum value in each column is highlighted.
Descriptor | SIFT (RGB-NIR) | Harris (RGB-NIR) | SIFT (MSD) | Harris (MSD)
SIFT       | 34 | 24 |  3 |  1
SURF       | 19 | 14 |  1 |  0
ORB        | 36 | 28 |  1 |  1
BRISK      | 32 | 25 |  1 |  0
GOM-SIFT   | 35 | 26 |  4 |  2
OR-SIFT    | 35 | 27 |  4 |  3
EOH        | 25 | 19 |  3 |  3
LBPG       | 32 | 28 |  2 |  1
LC-SIFT    | 37 | 29 |  2 |  1
DE-SIFT    | 39 | 31 |  2 |  1
NG-SIFT    | 38 | 31 |  5 |  4
MN-SIFT    | 46 | 36 |  6 |  6
LSS        | 31 | 27 |  3 |  2
VAR-EOH    | 24 | 17 |  2 |  1
LSSF       | 32 | 24 |  7 |  5
MFD        | 43 | 34 | 11 |  8
HoDM       | 45 | 38 | 16 | 14

Table 3 shows the average precision scores obtained by the different detector-descriptor combinations. SIFT + MN-SIFT and Harris + HoDM demonstrate the best precision scores on the RGB-NIR dataset, whereas SIFT + HoDM and Harris + HoDM outperform the others on MSD. The above comparisons show that combining the SIFT detector with different descriptors produces better results than combining descriptors with the Harris detector. That is why we use the SIFT detector to detect feature points for computing the MN-SIFT and Reg-SIFT descriptors in the proposed method.

Comparison between Proposed Descriptor and State of the Art
In this section, the performance of the proposed descriptor (Reg-SIFT) is compared with different state of the art descriptors. The comparison is based on average matching and precision scores. Only the SIFT detector is used, as it obtains better results than the Harris detector on VS-IR images; Reg-SIFT and the other descriptors evaluated in this section are computed for SIFT feature points. Table 4 shows a comparison between different regression models for computing Reg-SIFT descriptors. The models compared are LR, DTR, RFR, SVMR and MLPR. The Reg-SIFT computed with the LR model is referred to as LR-SIFT, and similarly for the other models. The comparison shows that the MLPR model gives better precision and matching scores than the others. In the subsequent comparisons we therefore use MLPR for Reg-SIFT and compare its performance with seventeen different descriptors. Table 4. Average matching and precision scores (%) achieved by Reg-SIFT using different regression techniques. Correct and false descriptor matches are identified with a projection error less than or equal to 2 pixels, as explained above. It can be seen that SIFT + SIFT gives only one correct match, obtaining matching and precision scores of 0.5% and 0.8%, respectively. SIFT + MFD obtains 9% matching and 24% precision scores. SIFT + HoDM obtains matching and precision scores of 12% and 44%, respectively, whereas the proposed Reg-SIFT obtains matching and precision scores of 20.5% and 72.4%, respectively (see Figure 5d), outperforming SIFT, MFD and HoDM. Reg-SIFT obtains almost 8.5% (matching) and 28.4% (precision) better scores than HoDM. Figure 6 shows a comparison of the proposed descriptor with seventeen different state of the art descriptors. It can be seen that the proposed descriptor obtains average matching scores of 60% and 27% on the RGB-NIR and MSD datasets, respectively, and outperforms all other descriptors.
Similarly, the average precision scores show that the proposed descriptor achieves 81% and 37% on the RGB-NIR and MSD datasets, respectively, and outperforms all other descriptors. Compared to HoDM, the proposed descriptor demonstrates 15% and 13% better matching and precision scores on the RGB-NIR dataset, whereas on MSD it performs 11% (matching) and 14% (precision) better than HoDM.

Computational Complexity
This section presents computational complexity comparisons. All the experiments are performed on a single desktop PC with Windows 7 as the operating system, an i5-2520M CPU and 16 GB of memory. GPU and parallel computations are not used. Table 7 shows the average time taken (in seconds) by different feature point detectors to detect 1000 feature points per image. Ten different images of the same size, that is, 1024 × 768, are used and the average time is presented. The comparison shows that the FAST detector takes the least time on average to detect feature points. Table 8 shows the average time taken (in seconds) by different descriptor construction methods to compute 1000 descriptors per image. Ten different images of the same size, that is, 1024 × 679, are used for this comparison, and the average time is reported in the table. The comparison shows that ORB takes the least time on average to compute descriptors.

Conclusions
Textural and intensity changes between visible spectrum and infrared images degrade the performance of feature points. The proposed method overcomes such changes by training a regression model on correct feature point descriptor matches. It uses the regression model to align the descriptors prior to their descriptor matching. The experimental results show that the proposed method gives promising results. It gives relatively better results on RGB-NIR images compared to VS-LWIR images. The experimental results show that the proposed method obtains on average matching scores of 60% and 27% on RGB-NIR and MSD datasets, respectively, whereas HoDM obtains 45% and 16% average matching scores, respectively. The proposed method also achieves average precision scores of 81% and 37% and performs better than HoDM, which achieves 68% and 23% precision scores on RGB-NIR and MSD datasets, respectively. The experiments are also performed with different state-of-the-art feature matching strategies. The experimental results show that the proposed method also demonstrates better matching and precision scores with them compared to other descriptors.