A Novel Statistical Method for Scene Classification Based on Multi-Object Categorization and Logistic Regression
Sensors 2020, 20, 3871

In recent years, interest in scene classification of different indoor-outdoor scene images has increased due to major developments in visual sensor techniques. Scene classification has been demonstrated to be an efficient method for environmental observations, but it is a challenging task considering the complexity of multiple objects in scenery images. These images include a combination of different properties and objects (i.e., color, text, and regions), and they are classified on the basis of optimal features. In this paper, an efficient multiclass object categorization method is proposed for the indoor-outdoor scene classification of scenery images using benchmark datasets. We illustrate two improved methods, the fuzzy c-means and mean shift algorithms, which infer multiple object segmentation in complex images. Multiple object categorization is achieved through multiple kernel learning (MKL), which considers local descriptors and signatures of regions. The relations between multiple objects are then examined by an intersection over union algorithm. Finally, scene classification is achieved by using Multi-class Logistic Regression (McLR). Experimental evaluation demonstrated that our scene classification method is superior to other conventional methods, especially when dealing with complex images. Our system should be applicable in various domains such as drone targeting, autonomous driving, global positioning systems, robotics and tourist guide applications.


Introduction
Scene classification uses visual sensor technologies to explore the semantically significant information contained inside an image. Scene classification is the process of assigning categorizing labels to whole scenes based on the visual sensory data of the scene and the structure and relationships between multiple objects presented in the images. Sensors identify two broad categories (i.e., indoor and outdoor) to generally classify different scenes and these are further divided into different sub-categories based on the categories and labels pertaining to the specific multiple objects presented in the images. Visual sensors use the different properties of objects such as their local and global features to classify the whole scene. Scenery images comprise a wide variety of knowledge about the behavior of various objects which have visible features such as borders, corners, and point clouds and these enable us to learn, modify, consider alternative solutions and create new techniques to examine complex scenes. Scene interpretation [1,2] should be capable of accommodating changes in the environment being observed, identifying the vital characteristics of various objects and defining relationships among various objects in order to represent the actual scene behaviors [3,4].
Such scene information needs consistent and accurate object classification, which aims to distinguish images by evaluating semantic object properties. Object classification has been widely adopted in applications such as smart monitoring and image retrieval. It also offers supplemental knowledge in the field of activity recognition. Unlike object classification, which concentrates only on limited parts of an image, scene classification is the next step, leading to scene recognition and labelling based on such limited object information [5,6]. Many scenes comprise complicated object relationships and, because variations among scenes can be quite subtle, accurate scene classification is a challenging task in the area of pattern matching and recognition.
The main function of scene classification is to recognize all the objects present in the scene and to describe semantics for the accurate labeling of the whole scene. Researchers have produced a lot of work on multiple object categorization [7] for scene classification, but several challenges can still affect the accuracy of object categorization and recognition, such as changes in illumination, the size of objects, view orientation, and occlusion between objects in complex images. Several articles [8] used a place category strategy that presents a more detailed list of the objects, a summary of their spatial correlations and other static features to discriminate scenes, which affects recognition accuracy. Therefore, we propose a novel methodology that combines similar region clustering, textures of objects, local/global descriptors and class distribution probability estimation. Our methodology improves significantly on the performance of existing methods.
To overcome the challenges encountered in scene classification, we propose a multiple object categorization-based method to perform scene classification of scenery images from benchmark datasets. As the first step, the proposed system preprocesses the images. We achieve efficient segmentation using two segmentation algorithms: (i) Modified Fast Super-Pixel Based Fuzzy C-Mean Segmentation (MFCS) and (ii) Mean Shift Segmentation (MSS). In the second step, the results of the two algorithms are compared and analyzed. In the third step, we achieve multiple object categorization by evaluating the multiple regions detector and matching the signatures and local descriptors of the regions of images. A kernel function is used to obtain an object similarity score. Finally, the Expected Intersection over Union (EIOU) and Multi-class Logistic Regression (McLR) are used for scene classification over challenging datasets. The main contributions of our work are as follows.

• To the best of our knowledge, this is the first time that signatures of objects, local descriptors and multiple kernel learning for object categorization and multi-class logistic regression for scene classification have been introduced.
• Fusing of geometric and SIFT feature descriptors for object and scene classification.
• Accurate multiple region extraction and label indexing of complex scene datasets.
• Significant improvement in the accuracy of object and scene classification with less computational time compared to other state-of-the-art methods.
Related work is discussed in Section 2. Section 3 illustrates and details the methodology of our proposed scene classification system. Section 4 presents an analysis of our experimental results and a detailed description of the datasets. Section 5 concludes this paper.

Related Work
Exploring multiple object locations, their scale, view orientation and the impact of scenery images are challenging tasks in the field of visual sensors [9,10]. We have studied the literature in several domains, such as multi-object categorization, object segmentation and labeling, and scene classification, in order to establish proper parameters and metrics for our proposed method.

Object Segmentation
Image segmentation consists of transforming an image into a set of pixel regions represented by a mask or labels in an image. This transformation of an image into a set of pixels (a segment) allows the processing of important segments only. There are numerous techniques for the segmentation of objects. Sezgin et al. [11] categorized thresholding techniques into the following groups: (a) a histogram shape-based technique, (b) a clustering-based technique, (c) an entropy-based technique, (d) an object attribute-based technique, (e) a spatial method, and (f) thresholding methods. Sujji et al. [12] discussed threshold techniques for segmenting an image to detect the contours of tumors in the brain. Bi et al. [13] proposed a segmentation method based on the fusion of motion, color and stereo cues of objects. Yan et al. [14] proposed k-means clustering based on color image enhancement for the segmentation of cells. They computed the gray value components of the R, G, and B distributions to find the mean value of these distributions. Additionally, they used the YCbCr color space to represent the three clusters, achieved by dividing the improved color images. Kamdi et al. [15] explained region growing algorithms for segmentation by comparing their advantages and disadvantages. Moreover, they divided the image into regions of similar pixels by mean and by min-max techniques. In k-means clustering, the number of segments k is defined in advance to partition the image into k groups. The k groups are formed based on the similarity of color intensity or on the minimum variance from the centroid to the target pixel.

Single/Multiple Object Categorization
The object categorization field poses many challenges for researchers, such as finding the location of each object, identifying and describing the interactions among objects, identifying occluding objects, and delineating groups for meaningful outcomes. Wong et al. [16] proposed an algorithm for detecting an object online and classifying the various objects in the image. They suggested fast tracking of all the objects in the scene via kernel learning instead of depending on prior knowledge of the specific object. Their implementation was evaluated on the Neovision2 tower benchmark dataset and was a biologically inspired implementation that determined the shape and the movement of an object. Sumbul et al. [17] devised a multisource region attention network that computed per-source feature representations and assigned attention scores to candidate regions sampled around the expected object positions by utilizing their representations. They used multispectral techniques that achieved accuracies up to 64.2%. Martin et al. [18] designed a Bayesian inference model to examine prior knowledge of each object for multiple object tracking. The model then updated the possible mass function for closer object discrimination and applied a rate of convergence for correct classification. Lecumberry et al. [19] computed a shape similarity measure and used the steepest descent minimization method to model each object's shape iteratively. They used energy optimization for the automatic classification of multiple objects.

Scene Classification
Similarly, scene classification is a domain that provides new directions such as complicated scene contents/labels due to major ambiguities [20], similar object properties among different scenes, and multi-instance learning in confused scenes. Shi et al. [21] proposed a context-based saliency detection algorithm that marks salient regions in images. They used a CNN model to construct feature points and tested it on five datasets, i.e., LabelMe, UIUC-Sports, Scene-15, MIT67, and SUN, where it produced effective results only with indoor scenes. Zhang et al. [22] proposed the MVFL-VC method along with labeled object categorization algorithms. In addition, a mapping function was used to find the correlation with the labels in images. Zhou et al. [23] proposed a simple method for indoor-outdoor scene classification, which included a bag-of-features model to construct multiple resolution images and highlighted it with dense regions. Then, partition modalities were used to produce better results for scene classification.
Hayat et al. [24] introduced an indoor scene categorization method based on large-scale spatial layout, scale variations and rich feature descriptors for multiple distinct objects. In addition, tailored feature representations were learned by a Convolutional Neural Network to effectively adopt large-scale classification. Zou et al. [25] proposed an effective scene classification approach in which a fusion of local/global spatial features was adopted as a collaborative representation. These features were processed by multiscale completed local binary patterns, Gabor features and SIFT patterns. Finally, they implemented kernel collaborative classification for scene discrimination. Ismail et al. [26] proposed a two-step method for indoor scene classification. Initially, spatial layout estimation was performed to estimate three orthogonal vanishing points, and then the relationships between scene elements were represented by a layout estimation method to retrieve a high scene classification score.

Overview of Solution Framework
In this section, we propose a novel scene classification approach, along with object categorization, that accurately recognizes and labels all target objects present in the scene. The proposed scene classification system starts with preprocessing, which clears unwanted information such as noise and normalizes object sizes for all images in the datasets. Then, object segmentation is applied to the preprocessed data using two distinct segmentation algorithms: modified fast super-pixel based fuzzy c-means clustering and mean shift segmentation. Multiple object categorization is performed using multiple kernel learning. Finally, the proposed system achieves scene classification by using the EIOU score and McLR. Figure 1 presents an overview of the proposed scene classification system.

Preprocessing and Normalization
Images are captured under different conditions, such as various lighting and environments, which produce noise and high intensity values in the images (see Figure 2a). To solve these issues during preprocessing, an Adaptive Weighted Median Filter (AWMF) [27] is applied. The filter uses an $M \times N$ sliding window that slides over each image and uses the local statistics of the image to weight the filtering process. The relative weight $W_{i,j}$ of the pixel $(i, j)$ is calculated from the following quantities: $W_0$ is the weight of the central pixel of the filter frame (i.e., $3 \times 3$ or $5 \times 5$), $a$ is the scaling factor used for the scale of the filter frame (i.e., 3 or 5), $D$ is the Euclidean distance between pixels, and $U_{x,y}$ and $V_{x,y}$ are the mean and variance of the $M \times N$ sliding window, respectively. Figure 2 demonstrates the preprocessing steps, which include both noisy images and filtered images.
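The filtering step can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the weight form `W = W0 - a * D * V / U` (rounded and floored at zero) is an assumption modeled on common adaptive weighted median filters, and the function name and the default values of `w0` and `a` are hypothetical.

```python
import numpy as np

def adaptive_weighted_median(image, size=3, w0=99.0, a=10.0):
    """Adaptive weighted median filter sketch for a 2-D grayscale image."""
    k = size // 2
    padded = np.pad(np.asarray(image, dtype=float), k, mode="reflect")
    # Euclidean distance D of each window cell from the window centre.
    yy, xx = np.mgrid[-k:k + 1, -k:k + 1]
    dist = np.sqrt(yy ** 2 + xx ** 2).ravel()
    out = np.empty(np.shape(image), dtype=float)
    for y in range(out.shape[0]):
        for x in range(out.shape[1]):
            win = padded[y:y + size, x:x + size].ravel()
            u, v = win.mean(), win.var()  # local mean U and variance V
            # ASSUMED weight form: W = W0 - a * D * V / U, rounded, floored at 0.
            w = np.rint(w0 - a * dist * v / max(u, 1e-8))
            w = np.clip(w, 0, None).astype(int)
            # Weighted median: repeat each pixel value by its integer weight.
            out[y, x] = np.median(np.repeat(win, w))
    return out
```

Under this assumed weight form, flat regions (low variance) receive nearly uniform weights and behave like a plain median filter, while high-variance windows concentrate the weight on the central pixel and therefore preserve edges.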

Single/Multiple Object Segmentation
This section provides a detailed description of single/multiple object segmentation. Object segmentation is a process in which an image is split into multiple regions. Segmentation can be achieved according to similarities in pixels or colors in a scene. As different scenes contain multiple regions, the delineation of these regions through segmentation is a significant but challenging process in scene classification. Accuracy in segmentation greatly influences accuracy and consistency in scene classification. Images are segmented into multiple regions, which are labeled with different colors. For object segmentation, two robust methods are considered: (i) modified fast super-pixel based fuzzy c-means clustering image segmentation (MFCS) and (ii) mean shift segmentation (MSS).

Modified Fast Super-pixel Based Fuzzy C-Mean Segmentation (MFCS)
Using the MFCS clustering algorithm, we achieved improved color image segmentation results compared to conventional FCM [28] methods. At the start of the process, overlapping elements are identified and pixels are taken as data points, as in other clustering approaches. Then, following fuzzy logic, each pixel is considered to belong to more than one cluster rather than to just one defined cluster. MFCS segments the image by iteratively minimizing a squared error objective function $J_M(U, V)$, which restricts the optimal clusters of the image by minimizing the weights within the clusters:

$$J_M(U, V) = \sum_{i=1}^{c} \sum_{j=1}^{n} u_{ij}^{r} \, \lVert x_j - v_i \rVert^2$$

where $c$ is the number of clusters, $n$ is the number of data points, $r$ is any real number that controls the fuzziness of the resulting clusters, $u_{ij}^{r}$ is the membership of pixel $x_j$ in the $i$th cluster and $v_i$ is the center of the $i$th cluster. $J_M(U, V)$ measures the distance between each pixel and the cluster center; a pixel is assigned a high membership value when its distance from the cluster center is minimal.

The set of data points is $\{x_1, \ldots, x_n\}$ and the set of cluster centers is $\{v_1, \ldots, v_c\}$. The proposed MFCS is carried out in steps (Algorithm 1): inside an outer while loop, for each data point in the image, Step 1 measures the membership $u_{ij}$ of the data point to the $c$ clusters and Step 2 updates the cluster centers $v_i$.

The conventional FCM algorithm works on the local spatial information of pixels, so the neighboring regions of every pixel must be analyzed at each iteration, which causes high computational complexity. Therefore, the proposed algorithm uses super pixel-based pre-segmentation [29] and density-based spatial clustering of applications with noise (DBSCAN) to decrease the computational complexity of conventional FCM. Figure 3 presents the results of super pixel-based pre-segmentation. The proposed method achieved the segmentation of the color image in a few seconds on the MATLAB platform running on a 2.5 GHz Intel Core i5 CPU with 8 GB of RAM (Intel, Santa Clara, CA, USA).
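The two steps of the clustering loop can be illustrated with a short Python sketch of conventional fuzzy c-means. It does not include the super-pixel pre-segmentation or density-based clustering that distinguish MFCS, and the update rules used are the textbook FCM ones, assumed here because the source does not spell them out. The function name, the default `r=2.0`, and the stopping tolerance are illustrative choices.

```python
import numpy as np

def fuzzy_c_means(x, c, r=2.0, max_iter=100, tol=1e-5, seed=0):
    """Conventional FCM on data x of shape (n, d); returns (u, v)."""
    rng = np.random.default_rng(seed)
    u = rng.random((c, x.shape[0]))
    u /= u.sum(axis=0)  # memberships of each point sum to 1 over clusters
    for _ in range(max_iter):
        um = u ** r
        # Step 2: cluster centers v_i as membership-weighted means.
        v = um @ x / um.sum(axis=1, keepdims=True)
        # Squared distances between every center and every point, shape (c, n).
        d2 = ((x[None, :, :] - v[:, None, :]) ** 2).sum(axis=2)
        d2 = np.maximum(d2, 1e-12)  # avoid division by zero at a center
        # Step 1: u_ij proportional to d_ij^(-2/(r-1)), normalised over clusters.
        inv = d2 ** (-1.0 / (r - 1.0))
        u_new = inv / inv.sum(axis=0)
        converged = np.abs(u_new - u).max() < tol
        u = u_new
        if converged:
            break
    return u, v
```

For an image, `x` would hold one row per pixel (or per super-pixel, which is where the computational saving of a pre-segmentation would come from), and `u.argmax(axis=0)` gives a hard label per row.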

Mean Shift-Based Segmentation (MSS)
The proposed system segments an image into multiple regions using the Mean Shift Segmentation [30] algorithm. The MSS algorithm searches for the highest concentration of similar pixels in the sample image and estimates the local density of pixels. MSS then performs density estimation iteratively and finds the local modes of the density [31], so that all pixels whose local density is close to that of a mode are shifted to clusters of similar attributes (see Figure 5). This is a non-parametric clustering technique that does not depend on any prior knowledge of the objects or picture elements. Therefore, it can find cluster centers quickly and perform efficient object segmentation. The proposed system uses kernel density estimation to locate these modes. The kernel density $k_E(x)$ of the window function is estimated in the $D$-dimensional space $S^D$ for $n$ pixels $x_j$, $j = 1, 2, 3, \ldots, n$, at a location $x$. It depends on $h_{x_j}$, the width of the kernel (window function), which is in turn determined from $d_{x_j}$, the probability density function of the given pixel space, and a constant $h$. The kernel (window function) $K(x)$ satisfies conditions that include

$$\int x \, K(x) \, dx = 0 \qquad (11)$$
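The iterative procedure can be sketched in Python as below. This is a simplified illustration under stated assumptions: it uses a Gaussian kernel with a single fixed `bandwidth` rather than the per-pixel width $h_{x_j}$ described above, and the helper `label_modes` with its `merge_radius` parameter is a hypothetical way of grouping converged points into clusters.

```python
import numpy as np

def mean_shift(points, bandwidth, max_iter=50, tol=1e-4):
    """Shift every point towards a density mode using a Gaussian kernel."""
    modes = points.astype(float).copy()
    for _ in range(max_iter):
        # Squared distances from each current estimate to every sample.
        d2 = ((modes[:, None, :] - points[None, :, :]) ** 2).sum(axis=2)
        w = np.exp(-0.5 * d2 / bandwidth ** 2)  # kernel weights
        shifted = w @ points / w.sum(axis=1, keepdims=True)
        done = np.abs(shifted - modes).max() < tol
        modes = shifted
        if done:
            break
    return modes

def label_modes(modes, merge_radius):
    """Give the same label to converged points closer than merge_radius."""
    labels = np.empty(len(modes), dtype=int)
    centers = []
    for i, m in enumerate(modes):
        for k, c in enumerate(centers):
            if np.linalg.norm(m - c) < merge_radius:
                labels[i] = k
                break
        else:
            centers.append(m)
            labels[i] = len(centers) - 1
    return labels, np.array(centers)
```

For image segmentation, each row of `points` would typically combine pixel position and color; no number of clusters is supplied, which reflects the non-parametric nature of the method.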

Thus, the proposed system analyses the results of the MFCS and MSS algorithms with respect to segmentation accuracy against the ground truths and computation time efficiency. MFCS takes less computation time and produces clearer results than MSS; therefore, we used the MFCS results for further experiments. Figure 6 shows the comparison between MFCS and MSS. The segmentation accuracies are evaluated by comparing the results with the given ground truths of all classes from the dataset. Evaluation is carried out on the basis of the pixels of segmented objects and ground truths. Table 1 reports the segmented object accuracies after comparing them with the ground truth labels; the mean segmentation accuracy is 86.77%.

Table 1 abbreviations: fl = flower; bo = boat; sh = sheep; do = dog; ca = car; co = cow; bi = bird; ro = road; bd = body; gr = grass; ch = chair; du = duck; bu = building; sk = sky; tr = tree; si = sign; ct = cat; wt = water; bc = bicycle; bk = book.

Tables 2 and 3 report the total computational time of the MFCS and MSS algorithms over the MSRC and Corel-10k datasets, respectively.
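A pixel-based comparison against ground truth of the kind described above could be computed as in the following sketch. The exact accuracy definition used for Table 1 is not stated in the text, so the per-class measure here (the fraction of ground-truth pixels of a class that the segmentation labels as that class) is an assumption, and the function name is hypothetical.

```python
import numpy as np

def per_class_pixel_accuracy(pred, truth, class_ids):
    """Per-class fraction of ground-truth pixels that are labeled correctly."""
    scores = {}
    for k in class_ids:
        mask = truth == k            # ground-truth pixels of class k
        if mask.any():               # skip classes absent from this image
            scores[k] = float((pred[mask] == k).mean())
    return scores

truth = np.array([[0, 0], [1, 1]])
pred = np.array([[0, 1], [1, 1]])
scores = per_class_pixel_accuracy(pred, truth, class_ids=[0, 1])
mean_accuracy = sum(scores.values()) / len(scores)
```

Averaging the per-class scores, as in the last line, gives a single summary figure comparable in spirit to a mean segmentation accuracy.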

Object Categorization
In this section, the proposed system uses the Multiple Kernel Learning (MKL) method [32] to achieve multiple object categorization based on multiple regions and the signatures of those regions in complex scenes. In object categorization, an image $j$ (containing clusters $c$ of multiple objects, represented by different colors obtained by the segmentation process) is first described by a set of local descriptors $D_j$ (i.e., SIFT, HOG) computed over a region $R$ of the image. The signature $x_j$ is computed from the local descriptors $D_j$ by a function $f_R : D_j \rightarrow x_j$. This conversion uses the following quantities: $\mathrm{Cen}_c$ is the center of cluster $c$, $|c|$ is the total number of descriptors in cluster $c$ over all the images of a class, $D_{icj}$ are the descriptors of image $j$ that belong to cluster $c$, and $\mu_c$ is the mean of the centered descriptors that belong to cluster $c$. $\mu_{j,c}$ is the signature computed for image $j$ and cluster $c$; it is then converted into a vector $\mathrm{vec}_{j,c}$. The signature vector $x_j$ of image $j$ is obtained by concatenating $\mathrm{vec}_{j,c}$ over all $C$ clusters:

$$\mathrm{vec}_j = [\mathrm{vec}_{j,1} \; \ldots \; \mathrm{vec}_{j,C}]$$

Figure 7 shows the results of the HOG and SIFT descriptors. The descriptors of the defined region $R$ are operated on using a deformable parts model [33], which produces multiple regions by drawing rectangular bounding boxes [34] over the images. The proposed system only uses the bounding box regions with the maximum scores given by the detector. These rectangular bounding boxes indicate the regions of different foreground objects.

After defining accurate regions $R$ of objects within the image, the similarity between images $i$ and $j$ based on the signature (extracted vectors) of region $R$ is measured using a kernel function $k_R$. However, an image holds multiple regions, and similarity must be achieved over the entire image. Therefore, the proposed system computes an overall similarity in which weights $\omega_R$ are associated with the multiple regions. Figure 8 illustrates the object categorization method using multiple kernel learning.
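The combination of region kernels can be illustrated with the Python sketch below. It assumes the usual multiple kernel learning form, a weighted sum of per-region kernels, and it assumes a linear kernel (dot product) for $k_R$; neither form is confirmed by the text, and the function names are hypothetical. In a full MKL system the weights $\omega_R$ would be learned rather than fixed by hand as they are here.

```python
import numpy as np

def region_kernel(sig_a, sig_b):
    """ASSUMED k_R: linear kernel (dot product) between two region signatures."""
    return float(np.dot(sig_a, sig_b))

def image_similarity(regions_i, regions_j, weights):
    """Weighted sum over regions R of w_R * k_R(x_i^R, x_j^R)."""
    return sum(w * region_kernel(a, b)
               for w, a, b in zip(weights, regions_i, regions_j))

# Two images, each described by the signatures of two regions.
regions_i = [np.array([1.0, 0.0]), np.array([0.0, 2.0])]
regions_j = [np.array([1.0, 0.0]), np.array([0.0, 2.0])]
similarity = image_similarity(regions_i, regions_j, weights=[0.5, 0.5])
```

A weighted sum of valid kernels with non-negative weights is itself a valid kernel, which is why this form can be passed directly to a kernel classifier.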

Scene Classification
After multiple object categorization, the labeled information is further used for scene classification. This involves two components: (1) the Expected Intersection over Union (EIOU) [35] score and (2) Multi-class Logistic Regression (McLR) [36]. EIOU is measured for the foreground objects, and McLR is used to solve the multi-class classification problem of recognizing the scenes in the images.

Expected Intersection over Union score (EIOU)
The EIOU score indicates how accurately the objects and their regions have been predicted. The proposed system assigns an EIOU score to every foreground object in the images of all scenes, and the scene is classified based on the EIOU of its foreground objects. To evaluate the EIOU function, we use the multiple objects $y_j$, their locations, and the predicted objects $\hat{y}_j$. The expected intersection over union $U_{EIOU}$ is computed over the $C$ classes from the per-class score $U^{(C)}_{iou}$, with $y_j \in \{1, \ldots, C\}\ \forall j \in V$, where $V$ is the set of all pixels in all images. The indicator function $\mathbb{1}\{y_j = k \wedge \hat{y}_j = k\}$ returns 1 if its condition is true and 0 otherwise. The ratio of the sums of pixels gives the value of $U^{(C)}_{iou}$, which is the EIOU score of the objects. The computed EIOU score is shown over the objects in Figure 9.
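The per-class ratio of pixel sums can be illustrated with a short sketch. It assumes the common mean intersection-over-union form, in which the per-class score is the number of pixels where ground truth and prediction both equal class $k$ divided by the number of pixels where either equals $k$, averaged over the classes. The function name `mean_iou`, the decision to skip classes absent from both label maps, and the toy labels are assumptions made for illustration.

```python
def mean_iou(y_true, y_pred, num_classes):
    """Average per-class intersection over union for flat label lists.

    Classes are numbered 1..num_classes. A class that appears in neither
    y_true nor y_pred is skipped (an assumed convention).
    """
    scores = []
    for k in range(1, num_classes + 1):
        # Indicator sums: both labels equal k (intersection),
        # at least one label equals k (union).
        inter = sum(1 for t, p in zip(y_true, y_pred) if t == k and p == k)
        union = sum(1 for t, p in zip(y_true, y_pred) if t == k or p == k)
        if union > 0:
            scores.append(inter / union)
    return sum(scores) / len(scores) if scores else 0.0


# Hypothetical ground-truth and predicted pixel labels.
y_true = [1, 1, 2, 2, 3, 3]
y_pred = [1, 2, 2, 2, 3, 1]

score = mean_iou(y_true, y_pred, num_classes=3)
print(round(score, 3))
```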

Multi-Class Logistic Regression (McLR)
McLR is used to classify a whole scene based on multiple objects and their features. When there are multiple classes, McLR predicts the probability that a given input $x$ belongs to the $j$-th class (over all classes of the dataset). In McLR, a classifier is designed to distinguish $c = 1, 2, \ldots, K$ classes from $L$ labeled training images, using the feature vectors as input. The $L$ labeled training images are $T_L = \{(x_1, z_1), \ldots, (x_L, z_L)\}$, and the posterior class distribution (PCD) is obtained by estimating the logistic regressors $\omega$. Figure 10 shows the systematic flow of multi-class logistic regression.

In the McLR model, $w^{(c)}$ is the logistic regressor for class $c$, the feature vectors are denoted $x = (x_1, \ldots, x_j)$, and the logistic regressors of all classes are collected in the set $w$. The posterior class probability is then obtained from the regressors $w$.
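As an illustration of how class probabilities follow from per-class regressors, the sketch below assumes the standard multinomial logistic (softmax) form, in which the probability of class $c$ is proportional to $\exp(w^{(c)\top} x)$. The function name `class_posteriors`, the feature vector and the regressor values are hypothetical.

```python
import math


def class_posteriors(x, regressors):
    """Softmax class probabilities from one regressor vector per class."""
    logits = [sum(w_d * x_d for w_d, x_d in zip(w, x)) for w in regressors]
    # Subtract the maximum logit for numerical stability.
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical feature vector and regressors for K = 3 classes.
x = [1.0, 2.0]
regressors = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]

probs = class_posteriors(x, regressors)
predicted = max(range(len(probs)), key=probs.__getitem__)
print(predicted, [round(p, 3) for p in probs])
```

The predicted scene is the class with the highest probability; in practice the regressors would be estimated from the labeled training set $T_L$ rather than set by hand.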

Experimental Setup and Evaluation
In this section, we present details of the experimental setup and evaluation. Object segmentation accuracy and computation time are used to evaluate the performance of the proposed system on challenging indoor and outdoor datasets. We carried out the experiments in Matlab on a machine with a 2.5 GHz Intel Core i3 CPU and 8 GB of RAM. To evaluate the performance of the proposed scene classification system, we used three datasets: MSRC [37], Corel-10k [38] and CVPR 67 [39]. For training and testing, we used leave-one-out cross-validation: a dataset of n observations is split so that one observation is used for testing and the remaining n-1 observations are used for training. Prediction weights are then observed for each observation set. The details of each dataset, the experimental results and comparisons of the proposed scene classification method with other state-of-the-art scene classification methods are given below.
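The leave-one-out splitting scheme can be sketched as follows. The experiments in this paper were run in Matlab; this Python fragment only illustrates the splitting logic, and the generator name `leave_one_out_splits` is hypothetical.

```python
def leave_one_out_splits(n):
    """Yield (train_indices, test_indices) pairs for n observations.

    Each observation is held out exactly once for testing, and the
    remaining n - 1 observations are used for training.
    """
    for test_idx in range(n):
        train_idx = [i for i in range(n) if i != test_idx]
        yield train_idx, [test_idx]


splits = list(leave_one_out_splits(4))
for train_idx, test_idx in splits:
    print("train:", train_idx, "test:", test_idx)
```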

MSRC Dataset
The MSRC dataset contains 591 scene images. We used twenty classes for the experimental evaluation: flower, boat, sheep, dog, car, chair, cow, bird, road, body, grass, building, sky, tree, sign, cat, water, bicycle, book and duck. Figure 13 shows example images from the MSRC dataset. This dataset comprises a variety of complicated scene images containing various objects, at a resolution of 213 × 320.

Corel-10k Dataset
The Corel-10k dataset contains 10,000 scene images covering multiple classes, including challenging images of different sizes and backgrounds. We performed experimental evaluations over twenty classes: rhino, deer, car, water, building, elephant, plane, tree, tiger, bike, wolf, dog, boat, flower, bear, sky, land, cat, bird and fish. Figure 14 presents example images from the Corel-10k dataset.

CVPR 67 indoor Scene Dataset
The CVPR 67 dataset contains 67 indoor scene classes and 15,620 images in total, with each class containing at least 100 scene images. We performed the experimental evaluation on ten classes of indoor scenes (kitchen, bedroom, bathroom, corridor, elevator, locker-room, waiting-room, dining-room, game-room and garage). Figure 15 presents some example images from the CVPR 67 indoor scene dataset.

Experimental Results
In the experiments, we examined the mean classification accuracy and compared the proposed method with existing methods, considering the indoor-outdoor scenes of all images. The proposed system achieved informative results owing to its robust object segmentation techniques (i.e., MFCS and MSS), which lead to better performance in scene classification.

Experiment 1: Using the MSRC Dataset
On the MSRC dataset, we evaluated the scene classification accuracy of the proposed system. Table 4 shows that the major scene classes of the MSRC dataset are classified with high accuracy. Table 5 compares the classification accuracy of the proposed method with that of other state-of-the-art methods; the proposed method achieves the best result (88.75%).

Experiment 2: Using the Corel-10k Dataset
In the experiments on the Corel-10k dataset, the proposed method was applied to 20 different scenes and obtained a classification accuracy of 85.75%, as shown in Table 6. Similarly, Table 7 shows that the proposed method has higher recognition accuracy than other state-of-the-art methods such as VLAD, TNNV and LLC.

Experiment 3: Using the CVPR 67 Indoor Scene Dataset
In the experimental evaluation on the CVPR 67 indoor scene dataset, the proposed method achieved a scene classification accuracy of 80.02% over 10 different classes. The accuracy on the CVPR 67 dataset is lower than on the MSRC and Corel-10k datasets because of the multiple occluded objects in the real-world scenes that make up the dataset. When an object is hidden behind other objects, it is difficult to recognize because of this occlusion effect. Table 8 shows the confusion matrix of classification on the CVPR 67 dataset.
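For reference, a mean classification accuracy can be derived from a confusion matrix as sketched below. The sketch assumes that rows hold raw counts per true class and that the mean is taken over per-class accuracies; the function name `mean_class_accuracy` and the counts are hypothetical and do not reproduce any table in this paper.

```python
def mean_class_accuracy(confusion):
    """Mean of per-class accuracies from a square confusion matrix.

    confusion[k][m] is the number of samples of true class k that were
    predicted as class m. Classes without samples are skipped.
    """
    per_class = []
    for k, row in enumerate(confusion):
        total = sum(row)
        if total > 0:
            per_class.append(row[k] / total)
    return sum(per_class) / len(per_class)


# Hypothetical counts for three scene classes.
confusion = [
    [8, 2, 0],
    [1, 9, 0],
    [0, 3, 7],
]

acc = mean_class_accuracy(confusion)
print(round(acc, 3))
```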

Methods    Classification Accuracy (%)
Bayesian model [40]    82.9
Scene classification using machine performance [41]    81.0
Scene classification with weighted method [42]    84.7
Proposed Method    88.75

Table 6. Confusion matrix of accuracy for object classification of outdoor scenes for the proposed approach using the Corel-10k dataset.

Table 7. Comparison of the proposed method with other state-of-the-art methods using the Corel-10k dataset.

Conclusions
In this work, we proposed an effective new scene classification system that segments single and multiple objects and classifies complex indoor-outdoor scenes. In the proposed system, object segmentation was addressed using two robust algorithms, MFCS and MSS. In addition, object similarity was examined by multiple kernel learning, and logistic regression was used for complex scene classification. Experimental evaluations show that the proposed system outperforms other state-of-the-art systems in terms of computation time, segmentation and accuracy.
In future work, we will analyze scenery images in greater depth to improve the accuracy of scene classification and will work to reduce its computational complexity. We also plan to apply deep learning to indoor-outdoor scene classification to further improve classification accuracy and to expand the applicability of our work.