Machine Learning Classifiers Evaluation for Automatic Karyogram Generation from G-Banded Metaphase Images

Featured Application: The results of the methodology described in this work are part of an automatic system that generates cytogenetic reports for the Laboratory of Cytogenetics of the Children’s Hospital of Tamaulipas.

Abstract: This work evaluates a set of machine learning algorithms and selects the most appropriate one for classifying segmented chromosome images acquired using the Giemsa staining technique (G-banding). The evaluation and selection of the best classification algorithms was carried out over a dataset of 119 Q-band chromosome images, and the obtained results were then applied to a dataset of 24 G-band chromosome images, manually classified by an expert of the Laboratory of Cytogenetics of the Children’s Hospital of Tamaulipas. The evaluation of 51 classifiers showed that, for the selected features, the best classification accuracy was obtained by a backpropagation neural network. One of the main contributions of this study is a two-stage classification scheme based on the best classifier found by the initial evaluation. In stage 1, chromosome images are classified into three major groups. In stage 2, the output of stage 1 is used as the input of a multiclass classifier. Using this scheme, 82% of the G-band image samples and 88% of the samples of a Q-band image bank available in the literature, consisting of 119 chromosome studies, were successfully classified. The proposed work is part of a desktop application that allows cytogeneticists to automatically generate cytogenetic reports.


Introduction
Chromosome analysis is an essential task performed by cytogeneticists in hospitals and specialized clinical laboratories in order to promptly diagnose cancer and genetic abnormalities. This analysis is based on a karyotype, the graphical classification of the chromosomes in a photograph of a cell taken during metaphase, a stage of mitosis. In metaphase, the chromosomes are easily observable through an optical microscope [1,2].
This paper is not the first attempt to compare the performance of various machine learning methods for medical image-based diagnosis. In Reference [3], feature evaluation from structural magnetic resonance images is proposed. In Reference [4], machine learning methods were evaluated to diagnose Parkinson's disease (PD) based on voice patterns. In Reference [5], a performance
Manual karyogram construction is a complex task demanding time and expertise. Several efforts have been made to create automatic systems for computer-based karyotyping [2,6-13]. Several studies implement machine learning techniques such as Support Vector Machines [7,8,14-16], Nearest Neighbor Algorithms [17,18], Wavelets [19], Bayesian techniques [20,21] and, mainly, Artificial Neural Networks [19,22-25]. Nevertheless, automatic image-based karyotyping remains an open research topic. One of the main problems arises when the image contains overlapping or touching chromosomes, because each chromosome must be cut out individually to present the karyogram. Another important aspect when segmenting and classifying chromosomes is the staining technique applied to acquire the microscopic image. The most common staining techniques are G-band, C-band, R-band and Q-band, named after the stain used in the cellular culture [2].
Q-banding was the first technique used in chromosome studies. There exists at least one public image database that has been used to develop computational systems to automatically build karyograms. One such database was made available in Reference [26], and is composed of 119 chromosome studies, including their karyotypes. This database was formed under well controlled conditions, producing homogeneous images across every study, and is ideal to test automatic karyotyping systems. Nonetheless, the Q-banding technique is not very common nowadays because of the high cost of the required staining materials and equipment, as well as the lighting requirements to avoid fast vanishing of the staining effects [27].
After Q-banding, the G-banding (Giemsa stain) technique emerged, and it is currently popular because the required equipment is less costly than that of Q-banding, while preserving the stain over the same regions of the chromosomes. Although Q-banding has been used extensively in the literature for proposing automatic karyotyping systems [8,14,28,29], G-banding is the staining technique used in the Children's Hospital of Tamaulipas (CHT) because of its low cost and high availability. Currently, there are no public databases of karyotype images or systems to automatically build karyograms using this technique. The computer-based automatic construction of karyograms performs two complementary and sequential tasks: image segmentation and classification of the segmented chromosomes [30,31]. Several works segment and classify Q-band chromosome images with varying results [8,26,32,33].
It is important to define the optimal chromosome features in order to obtain a good accuracy. In the literature, shape description, length, centromere position, and the banding pattern have been used as chromosome descriptors [12,18,30,31,34-36]. In References [8,33], the authors note that these characteristics are useful to determine whether the chromosome was correctly segmented, but they cannot be used as the only descriptors of the chromosome. They also propose to use the band pattern profile and the intensity levels along the medial axis of the chromosome. In Reference [33], the authors approximate the medial axis using transversal lines along the chromosome with different orientations. In a similar way, in Reference [26], the medial axis is computed using transversal lines along the skeleton of the chromosome. They begin by selecting a point on one of the ends of the chromosome; then, transversal lines are traced at increasing angles until the full circle is covered. The shortest line is selected and its midpoint is identified as the first coordinate of the medial axis. This process is repeated until the second endpoint is reached. Finally, the obtained coordinates are smoothed using splines.
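The transversal-line step described for Reference [26] can be illustrated with a minimal Python sketch. It is not the implementation of that work: the binary-mask representation, the function names `chord_through` and `medial_point`, the half-pixel step and the number of tested angles are all assumptions made for illustration. Since each line is traced in both directions from the point, angles in [0, π) already cover the full circle.

```python
import math


def chord_through(mask, row, col, angle, step=0.5):
    """End points of the line through (row, col) at `angle`, clipped to the
    foreground (truthy pixels) of `mask`, a list of equally long rows.
    The starting point is assumed to lie inside the foreground."""
    height, width = len(mask), len(mask[0])
    d_row, d_col = math.sin(angle), math.cos(angle)

    def walk(sign):
        # Advance in small steps until the next position leaves the chromosome.
        r, c = float(row), float(col)
        while True:
            nr = r + sign * step * d_row
            nc = c + sign * step * d_col
            ir, ic = math.floor(nr + 0.5), math.floor(nc + 0.5)
            inside = 0 <= ir < height and 0 <= ic < width and mask[ir][ic]
            if not inside:
                return r, c
            r, c = nr, nc

    return walk(+1), walk(-1)


def medial_point(mask, row, col, n_angles=36):
    """Midpoint of the shortest transversal line through (row, col)."""
    best_length, best_mid = None, None
    for k in range(n_angles):
        angle = math.pi * k / n_angles
        (r1, c1), (r2, c2) = chord_through(mask, row, col, angle)
        length = math.hypot(r1 - r2, c1 - c2)
        if best_length is None or length < best_length:
            best_length = length
            best_mid = ((r1 + r2) / 2, (c1 + c2) / 2)
    return best_mid
```

Repeating `medial_point` while advancing along the chromosome, and then smoothing the collected coordinates, would approximate the full medial axis; the advancing rule and the spline smoothing are left out of this sketch.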
Through the analysis of the medial axis it is possible to obtain the length of the chromosome and other features, such as the gray level intensities along this axis and the corresponding band profile [18,34]. In Reference [8], a feature vector is constructed using the chromosome length, the gray intensity levels along 98 transversal lines traced over the medial axis, and the chromosome area, computed by counting the active pixels of the binary chromosome image. The methodology presented in Reference [26] uses a vector composed of 131 features, including area, length, perimeter and 64 gray level intensities extracted over lines normal to the medial axis of the chromosome. These vectors are normalized, making it possible to compare chromosomes from different images and improving the performance of the classifier. Reference [36] proposes the use of features inspired by a human expert's classification method, such as the width, position and average intensity of the two most eye-catching regions of each chromosome, to improve the classification accuracy.
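As a rough illustration of how such a normalized feature vector could be assembled, the following Python sketch resamples an intensity profile to a fixed number of samples and scales every feature to the [0, 1] range. The normalization rules are assumptions, since the cited works are not quoted here on that point: length and area are divided by hypothetical per-image maxima (`max_length`, `max_area`), and the intensities are min-max scaled. The function names are also illustrative.

```python
def resample_profile(profile, n_samples):
    """Linearly resample a 1-D intensity profile to `n_samples` values."""
    if n_samples < 2 or len(profile) < 2:
        raise ValueError("need at least two input values and two output samples")
    last = len(profile) - 1
    resampled = []
    for i in range(n_samples):
        position = i * last / (n_samples - 1)
        low = int(position)
        high = min(low + 1, last)
        fraction = position - low
        resampled.append(profile[low] * (1 - fraction) + profile[high] * fraction)
    return resampled


def min_max_normalize(values):
    """Scale values to [0, 1]; a constant profile maps to all zeros."""
    lowest, highest = min(values), max(values)
    if highest == lowest:
        return [0.0 for _ in values]
    return [(v - lowest) / (highest - lowest) for v in values]


def build_feature_vector(length, area, profile, max_length, max_area, n_samples=64):
    """Concatenate normalized length, area and resampled intensities.

    `max_length` and `max_area` are assumed to be the largest values found
    in the same image, so that vectors from different images are comparable.
    """
    intensities = min_max_normalize(resample_profile(profile, n_samples))
    return [length / max_length, area / max_area] + intensities
```

With `n_samples=64` the intensity part matches the 64 gray level values mentioned for Reference [26]; the remaining features of the 131-element vector are not reproduced in this sketch.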
Concerning the automatic classification of chromosomes to build a karyogram, the authors of Reference [37] propose three classification techniques: (a) a backpropagation neural network, (b) fuzzy-logic rules, and (c) Euclidean-distance-based template matching. Using 13 intensity values extracted along the medial axis, they report mean-squared-error-based accuracies of 94%, 93% and 95%, respectively. In Reference [26], a neural network was also used, obtaining an accuracy of 94%; the number of intensity values was increased to 64, extracted in a similar way to Reference [37], and the accuracy was measured using k-fold cross validation with k = 3. In Reference [33], an accuracy of 90% was achieved using a gray-level-based similarity measure, this time with k = 5. All three approaches use Q-band chromosome images. In the existing literature, different measurements are used to evaluate the performance of the methods, and a standard measurement does not exist [9]. In this research, k-fold cross validation with k = 10 is used as the accuracy measurement, with an acceptability threshold of 90% according to the state of the art.
In this paper, a two-stage automatic classifier of G-band chromosome images is proposed. The classifier was selected among 51 machine learning algorithms. Since the available dataset is composed of 24 karyograms segmented using a semi-automatic tool specially developed for this task, it is not possible to use advanced techniques such as Deep Learning, which require a much bigger dataset than the available one. For this reason, only classical machine learning algorithms were evaluated. To evaluate the 51 classifiers, the image database reported in Reference [26], acquired using the Q-banding technique, was used. These images were acquired in a well controlled environment, which favors the classification task. On the other hand, the G-band images obtained at the CHT were not homogeneous, making it difficult to define a classification model with high accuracy. In addition, the Q-banding dataset has roughly five times more images than the G-banding dataset. The rest of the paper is organized as follows. In Section 2, materials and methodology are described. In Section 3, results and discussion are presented. Finally, in Section 4, a conclusion is outlined.
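The k-fold cross validation used as accuracy measurement can be sketched in a few lines of Python. This is a generic illustration rather than the Weka procedure used in this work: the function names are hypothetical, `train` and `predict` stand for any classifier, and the folds are taken in order without the shuffling or stratification that a practical setup would normally add.

```python
def k_fold_splits(n_samples, k):
    """Yield (train_indices, test_indices) for k contiguous folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and the number of samples")
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        # The first `extra` folds receive one additional sample.
        size = base + (1 if fold < extra else 0)
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


def cross_validated_accuracy(samples, labels, k, train, predict):
    """Fraction of samples correctly classified when each fold is held out once.

    `train(samples, labels)` returns a model and `predict(model, sample)`
    returns a label; both are placeholders for a real classifier.
    """
    correct = 0
    for train_idx, test_idx in k_fold_splits(len(samples), k):
        model = train([samples[i] for i in train_idx],
                      [labels[i] for i in train_idx])
        correct += sum(predict(model, samples[i]) == labels[i] for i in test_idx)
    return correct / len(samples)
```

For 119 samples and k = 10, this split produces nine folds of 12 samples and one fold of 11.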

Materials
For the development of the proposed evaluation, the following databases were used:

• A dataset obtained in collaboration with cytogeneticists of the CHT. The database consists of 24 G-banded prometaphase images acquired from 24 different patients and their corresponding manual karyograms. This dataset is under construction, and results have not been published elsewhere. In order to use the data of the Laboratory of Cytogenetics of the CHT, the Research and Education Department of the CHT reviewed and approved the use of the karyotypes in this work, judging that no appreciable risks or ethical issues were encountered, since no personal data associated with the karyotypes were issued.
• A dataset retrieved from the Laboratory of Biomedical Imaging (BioImLab) of the University of Padova. The database consists of 119 Q-banded prometaphase images and their corresponding manual karyograms acquired from the same number of cells [38].

To evaluate the proposed two-stage classifier scheme and to implement the desktop application for semi-automatic chromosome classification, the following software tools were used:

• Matlab R2014 was used for the preprocessing, segmentation and transformation required to separate the chromosomes from the input prometaphase image. This software was also used to test multiple MLPs with different numbers of neurons in the hidden layer in order to find the network configuration that yields the best performance.
• Weka 3.6.7 was used to train and test the classifiers selected for the comparison reported in this work.
To acquire the microscope images and evaluate the machine learning algorithms reported in this work, the following hardware tools were used:

Methods
The classification presented in this work is part of a sequential process to automatically build a cytogenetic report. Figure 3 shows the steps required to generate this report. The first two steps are related to chromosome segmentation and feature extraction, which were performed using a semi-automatic tool programmed for this purpose. This tool uses geometry and image processing techniques related to pixel labelling to perform these tasks. The tool is not addressed in this work; the segmentation and feature extraction process is planned to be presented in another paper. This paper is focused on the third step, the chromosome classification, which is based on two main stages: a coarse classification, where each chromosome is classified into one of three main groups; and a fine classification, where each chromosome in the coarse classification is assigned to one of the 24 chromosome types. Finally, the classification results are used to build the final karyogram.

Outline of the Proposed Automatic Classification
To build the multistage classification described in step 3 of Figure 3, five phases were conceived. Phases 1 to 4 are related to the construction of the classification model, while phase 5 is related to the development of a GUI that integrates the steps presented in Figure 3. Figure 4 shows the phases required to generate a classification model and build a karyogram.

Phase 1. Classifiers Training
During this phase, a set $M$ is generated, whose elements are the classification models constructed to classify the chromosome images. Each model $m_n$ is an element of the set $M$, that is, $m_n \in M$, and is composed of features, algorithms and architectures. It is defined as $m_n = (c_i, a_j, q_k)$, where $c_i$ represents the features, $a_j$ the algorithms, and $q_k$ the architectures.

Features. The elements of the set $C$ are the features extracted from the segmented chromosome images. In this way, $c_i$ is a subset of the set $C$, where $i$ is the index of the current subset. Table 1 describes the features used in each subset of $C$.

Algorithms. The set of classification algorithms, $A$, is composed of 51 algorithms found in the Weka (Waikato Environment for Knowledge Analysis) platform. Hence, $a_j$ represents a subset of algorithms, where $a_j \subset A$, and $j$ is the index of the current subset. Table 2 summarizes the algorithms of set $a_1$. Table 3 summarizes the top rated algorithms, $a_2$, obtained at the end of phase 1.

Architectures. The set $Q$ is composed of the binary and multiclass architectures used in the classification. In this way, $q_k$ represents a classification architecture, where $q_k \subset Q$, and $k$ is the index of the current architecture. Table 4 summarizes the architectures of sets $q_1$ and $q_2$.

As depicted in Figure 4, phase 1 is divided into 5 activities. The first 4 activities are repeated as many times as the number of required classification models, $m_n$. In activity 5, the best element of the set $M$ is selected. These activities are described below:

1. From the literature, groups of features that can be extracted from Q-band chromosome images were formed. These were listed as the feature subsets $c_1$, $c_2$, $c_3$, and $c_4$ and are presented in Table 1. In this activity, one of the feature groups $c_i$ is selected to be extracted from every chromosome in the image database. With these features, two separate datasets are generated: the first one is used to train the classifiers and the second one is used as the test dataset.

2. In this activity, a group of classification algorithms available in the Weka platform is selected from the set $A$. These elements form the subset $a_1$. According to the experimental results, the elements of the subset $a_1$ were modified to form the subset of algorithms $a_2$. The algorithms of subset $a_1$ are listed in Table 2, and those of subset $a_2$ in Table 3.

3. This activity defines the architectures that will be used in the chromosome classification. An architecture $q_k$ represents how the chromosomes are going to be classified into one of the 24 output classes. For example, one architecture could be a multiclass classification, where the chromosome is assigned directly to 1 of the 24 classes. Another option is to divide the chromosomes into autosomes and sex chromosomes, and to identify whether a chromosome belongs to one of these groups or not (binary classification). Table 4 summarizes the architectures defined in this activity.

4. Training and testing of a model $m_n$ is performed in this activity. The training and testing datasets, the set of algorithms $a_j$ and the classification architecture $q_k$ are the elements of the current model, $m_n$. The training and testing accuracies are reported, and they are used as the evaluation metric for the next activity.

5. Once several models have been generated (through activities 1 to 4), the best chromosome classification model, $m_n$, is identified based on the chromosome classification accuracy. The current $m_n$ could represent either the output of phase 1 or the final classification model, $m$.
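The structure of the model set can be illustrated with a short Python sketch, in which each candidate model is a triple of a feature subset, an algorithm subset and an architecture. The string labels and the `evaluate` callback are placeholders introduced for illustration. Note also that the sketch enumerates every combination, whereas in this work only some models were actually built, following activities 1 to 4.

```python
from itertools import product

# Labels mirror the subsets named in the text; they carry no data here.
FEATURE_SETS = ["c1", "c2", "c3", "c4"]
ALGORITHM_SETS = ["a1", "a2"]
ARCHITECTURES = ["q1", "q2", "q3", "q4"]


def enumerate_models(feature_sets, algorithm_sets, architectures):
    """Return every candidate model m_n = (c_i, a_j, q_k) as a tuple."""
    return list(product(feature_sets, algorithm_sets, architectures))


def select_best_model(models, evaluate):
    """Activity 5: keep the model with the highest accuracy.

    `evaluate(model)` is a hypothetical callback that trains and tests the
    model (activity 4) and returns its classification accuracy.
    """
    return max(models, key=evaluate)
```

With the labels above, `enumerate_models` yields 4 × 2 × 4 = 32 candidate triples.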

Phase 2. Classifier Analysis
In this phase, the algorithm $a_j$ from the classification model $m$ obtained in phase 1 is analyzed in order to find a relationship between the results and the configuration of this algorithm. This is an intermediate phase to integrate the selected classifier into the next phase.

Phase 3. Application Development
Here, a GUI is developed to allow the cytogeneticist to use a semi-automatic tool to segment the chromosomes in G-band images. Then, the segmented chromosomes are used to build the G-band image database. This tool is also used to extract the chromosome features defined in the classification model, $m$, that was obtained as a result of phase 1.

Phase 4. Classifiers Training
During phase 1, a classification model, $m$, was obtained by training the algorithms $a_j$ using the Q-band image database. In this phase, the algorithms $a_j$ are trained again, following the classification architecture $q_k$ obtained in phase 1, but this time using the features extracted from the G-band chromosome images. Each model is validated using k-fold cross validation with k = 10.

Phase 5. Application Integration
In this phase, an application consisting of 3 modules, named A, B and C, was developed.

Module A:
This module includes a GUI and the segmentation related operations. It allows the user to generate the segmented chromosomes and arrange them in directories.

Module B:
It comprises the automatic classification operations, including the development of a GUI that allows the user to: (i) Enter the segmented chromosomes obtained by the module A; (ii) Use the classification model m, obtained in phase 4. Its output is composed of the classified chromosomes.

Module C:
It is a GUI that generates a karyogram using the classified chromosomes obtained in module B. This karyogram is interactive, since the module allows the user to change the chromosome polarity and its membership class. In addition, this module generates a cytogenetic report in the format defined by the cytogeneticist.

Results and Discussion
Results presented in this section are reported according to the stages previously described. Four experiments were performed in order to identify the best classification algorithm for the aimed application. Two more experiments were carried out to find out the feature set that best describes the chromosome in the context of automatic classification.

Experiment 1. Feature Selection
The results of this experiment yield the features that best describe the chromosomes and the accuracy related to each tested feature set. For a first training round, the set of algorithms $a_1$ and the group of features $c_1$ were used, along with a multi-class architecture, $q_1$, where the chromosomes were classified into classes 1 to 24. In this experiment, the highest accuracy was 60.80%, obtained by the Random Forest (RF) algorithm. The rest of the tested classification algorithms achieved a classification accuracy under 57%. The experiment was repeated with the normalized version of the $c_1$ features (set $c_2$). The results show that the Multilayer Perceptron (MLP) artificial neural network (ANN) obtained an accuracy of 60.72%. Thus, the accuracy obtained by the RF algorithm in the first round was 0.08 percentage points greater than that of the MLP algorithm in the second round. In the second round, when using the $c_2$ set (normalized data), all the classification algorithms obtained better results, except the RF. In Figure 5, the accuracies for the $c_1$ and $c_2$ feature sets are compared.
This experiment did not yield the optimal feature set to describe the chromosomes, although its results showed that the feature vector must be normalized in order to improve the accuracy in 98% of the tested algorithms. On the other hand, through this experiment, the training time for each classifier could be measured. Training the 51 classifiers, using 32 features on the selected hardware, took at least 24 h for each classifier; thus, it was proposed to reduce the number of algorithms to be tested, in order to work with the algorithms that obtained the best results in the shortest time.
In this experiment, the models generated with reduced feature sets obtained lower accuracies (60%) than the models where the full set of features is used. This indicates that feature selection and reduction would not improve the results obtained with the full set of features.

Experiment 2. Training Time
The purpose of this experiment is to reduce the number of classifiers to test, by identifying and keeping those yielding the best results for the experimental data set.
For the next tests, the feature set $c_3$ and the multi-class architecture $q_1$ were used. In the first round, where 10 chromosome images were used, the best classification accuracy, 64.73%, was obtained by the MLP classifier. In order to identify the classifiers yielding an accuracy similar to that of the MLP classifier, the classifiers with an accuracy above 60% were selected to form the set of classifiers $a_2$, as shown in Figure 6. The set $a_2$ is presented in Table 3. Next, using the set of 119 images, the best accuracy was obtained by the same classifier (MLP): 86.77%. The accuracies obtained in this experiment are reported in Figure 7.

In these experiments, 51 algorithms were tested and 5 of them obtained accuracies above 60%. The selection of the 60% threshold is arbitrary, considering that a 50% accuracy is not better than a random classification. The accuracy obtained in this round of experiments was used to discard the algorithms with the worst accuracies for our purpose and to reduce the training time in future experiments. It is worth noting that during the second round of this experiment more training information was used, raising the execution time but improving the classification accuracy from 67.58% to 86.78% when using the MLP algorithm. Figure 7 shows the classifiers that yielded the best accuracies. The results of this experiment show that the MLP classifier yielded the best accuracies and could be used in the final classification model.
Execution time was not studied, but results showed that between 5 and 7 s are needed to classify a whole set of chromosomes once the trained model was obtained.

Experiment 3. Two Stage Classification
Once the classifiers yielding the best accuracy were identified as the set $a_2$, in this third experiment the classification process was divided into two stages. This division is depicted in Figure 3. In the pre-classification stage, the objective is to segregate the chromosomes into wide groups, and to use this output as the input of a final classification into 24 classes. In Reference [39], the authors propose applying a post-classification process to re-assign a chromosome to a different class when a wrong number of chromosomes is found in some class. In this work, a post-classification process is executed to reassign the misclassified chromosomes to their correct class.
According to the International System for Human Cytogenomic Nomenclature (ISCN) [40], chromosomes can be grouped by shape and area. In Reference [25], the authors propose a hierarchical classification approach, where they divide the chromosomes into seven groups. For the G-band chromosome images used in this work, this pre-classification into seven groups gave accuracy results lower than the single-phase classification. It was found that a pre-classification into three groups worked better for the G-band dataset. The three pre-classification groups are defined according to the shape and area of the chromosome. The 23 chromosome pairs are divided as follows:

Pre-Classification Stage
This stage could be performed using two types of architectures: multi-class and binary, which are explained below:

1. Multi-class ($q_2$ architecture). Two groups of features, $c_3$ and $c_4$, are used to decide to which of the three groups the current chromosome belongs.

2. Binary ($q_3$ architecture). The same feature groups are used to decide whether the current chromosome: (a) belongs to group 1 (G1) or does not belong to group 1 (NOTG1); (b) belongs to group 2 (G2) or does not belong to group 2 (NOTG2); (c) belongs to group 3 (G3) or does not belong to group 3 (NOTG3).

Figure 8a shows the results of the multi-class classifiers. The accuracy of all these classifiers was close to 90%. On the other hand, the results of the binary classifiers are shown in Figure 8b-d, where it can be observed that most of the tested classifiers obtained accuracies close to 90%, except the SMO classifier in group 2, where non-optimized hyperparameters were used. The best results were obtained using the $q_3$ architecture (binary classification) and the feature set $c_3$. It is worth noting that the classifier with the best accuracy was the MLP.

The purpose of this experiment was to pre-classify the chromosomes into three different groups, working with the feature sets $c_3$ and $c_4$. The set $c_3$ is a vector of 131 features, while the set $c_4$ is composed of only 3. It was observed that the best results were obtained using the feature set $c_3$, both in the binary and in the multiclass classifiers. The binary and multiclass classification schemes were compared using the MLP classifier, since it gave the best results in both schemes. The minimum and maximum accuracies of both schemes are compared in Figure 9. The minimum and maximum percentages for the binary scheme are 73.16% and 98.13%, while the results for the multi-class scheme are 73.10% and 64.94%, respectively. This test showed that the binary classifiers obtained higher accuracies.
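The binary scheme amounts to training one "belongs / does not belong" classifier per group. A minimal Python sketch of how the training targets for those three classifiers could be derived from the group labels is given below; the function names are hypothetical, and the mapping from chromosome pairs to groups is deliberately left out.

```python
def to_binary_targets(group_labels, positive_group):
    """Relabel group ids as 'G<n>' / 'NOTG<n>' for one binary classifier."""
    positive = f"G{positive_group}"
    negative = f"NOTG{positive_group}"
    return [positive if g == positive_group else negative for g in group_labels]


def binary_target_sets(group_labels, groups=(1, 2, 3)):
    """One target list per binary classifier, keyed by group id."""
    return {g: to_binary_targets(group_labels, g) for g in groups}
```

For example, `to_binary_targets([1, 2, 3, 2], 2)` marks the second and fourth samples as `G2` and the others as `NOTG2`.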

Post-Classification Stage
In this stage, the pre-classification into 3 groups was applied over the same training set in the same way as in phase 2, but this time using a reduced multi-class classification scheme with the feature sets $c_3$ and $c_4$. Figure 10 presents the results of this experiment, where once more the best results are obtained using the feature set $c_3$ and the MLP algorithm, with accuracies of 95.30%, 91.77% and 90.02% for the groups 1, 2 and 3, respectively.
Once the chromosomes were classified into 3 wide classes, the next step was to use this classification as the input of a multi-class classifier. The highest accuracies were obtained by the MLP classifier, being 95%, 91% and 90%, for the groups 1, 2 and 3, respectively.
Finally, the performance of the whole classification scheme was tested with a complete classification round, creating a dataset containing every testing sample. When the same sample was assigned to two different classes during phase 1, its final class was reassigned according to the highest membership percentage. Once the classification into three wide classes was obtained, phase 2 was carried out, using the multi-class algorithms over 24 classes, where 76.44% of the whole set of chromosomes was correctly classified.
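The complete round can be summarized with a small Python sketch: the three binary classifiers each report a membership value, conflicts are resolved by keeping the group with the highest value, and the multi-class classifier of that group assigns the final class. The callables in `group_scorers` and `fine_classifiers` are placeholders for trained models, and all names are assumptions made for illustration.

```python
def resolve_group(memberships):
    """Pick the group with the highest membership percentage.

    `memberships` maps a group name to the membership value reported by
    that group's binary classifier, e.g. {"G1": 0.2, "G2": 0.9, "G3": 0.6}.
    """
    return max(memberships, key=memberships.get)


def classify_two_stage(features, group_scorers, fine_classifiers):
    """Stage 1: choose a wide group; stage 2: classify within that group.

    `group_scorers[group](features)` returns a membership value and
    `fine_classifiers[group](features)` returns a chromosome class.
    """
    memberships = {group: scorer(features)
                   for group, scorer in group_scorers.items()}
    group = resolve_group(memberships)
    return group, fine_classifiers[group](features)
```

A consequence of this design is that an error in stage 1 cannot be recovered in stage 2, which is why the accuracy of the whole set is lower than the per-group accuracies.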

Experiment 4. Redefinition of the 2 Stage Classification
In experiment 3, accuracies above 90% were obtained in the pre-classification stage using the following configuration:
• Feature group $c_3$;
• MLP classifier; and
• $q_3$ (Phase 1, binary classifiers) and $q_4$ (Phase 2, multi-class classifiers) architectures.
The 90% accuracy threshold was selected observing that the maximum accuracy reported in the literature was 94%, obtained in Reference [26] using a refined ANN. In the present work, it was decided to round this value down to the nearest ten, since a non-refined ANN is being used.
In a new experiment, the same configuration was used, but this time the 3 wide groups were redefined during the training stage. This new group definition was based on the size characterization dictated by the International System for Human Cytogenomic Nomenclature (ISCN) [40]. The changes to the model previously presented in Figure 3 are shown in Table 5. In phase 1, the obtained accuracies were 99.49%, 97.85% and 98.64% for the groups 1, 2 and 3, respectively. In phase 2, the accuracies were 99.24%, 92.69% and 92.47% for the groups 1, 2 and 3. The results of experiment 4 are shown in Figure 11. The classification of the whole chromosome set using the modified model yielded an accuracy of 88.45%, which is 12.01 percentage points higher than the result of the initial two-stage classification (76.44%).
The classification algorithm used for this test was the MLP, which is an Artificial Neural Network (ANN). The default configuration of this algorithm in Weka is presented in Figure 12.

Application of the Proposed Model to the G-Band Images Dataset
The G-band image dataset (GID) is composed of 24 studies of chromosomes in metaphase. The dataset and its manual classification by an expert were provided by the laboratory of cytogenetics of the CHT. Using automatic and semi-automatic segmentation tools developed by our team, the chromosomes were individually segmented and a dataset of 1097 G-band chromosome images was composed. If the karyotypes were taken from healthy people, the total number of chromosomes of the dataset should be 1104, but some of the provided studies presented some anomaly in the number of chromosomes, for example, some studies lacked chromosome 4 or 18. It is worth noting that the number of elements of the G-band dataset is only 20% of that of the Q-band dataset. Figure 13 shows an example of the chromosomes in G bands.
The segmented GID was then used to train the ANN architecture presented in Figure 12. Using binary and multi-class classifiers, in phase 1, the accuracies were 95.26%, 90.56% and 93.6%, for the groups 1, 2, and 3, respectively. In phase 2, the accuracies were 91.12%, 77.08% and 84.4%, for the same groups. Finally, the overall accuracy was 80.99% for the whole set.

Changes in the Number of Neurons in the Hidden Layer
In order to improve the accuracies obtained in the 3 wide classes for the GID, the number of neurons in the hidden layer of the ANN presented in Figure 12 was varied from 1 to the maximum number of neurons in the default configuration of Weka (70 neurons). The accuracy obtained for each number of neurons is presented in Figure 14.
For the phase 1 binary classification, the highest accuracies were 94.71%, 89.56% and 92.7%, for the groups 1, 2 and 3, respectively. In phase 2, the obtained percentages were 91.71%, 76.51% and 86.4%, for the same groups. Using this new configuration, the overall accuracy was 82.36% for the whole set. Table 6 summarizes the tested number of neurons in the hidden layer for each classifier, and their corresponding classification results.
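The sweep over hidden-layer sizes can be sketched as follows in Python. The `evaluate` callback is a placeholder for training and cross-validating an MLP with the given number of hidden neurons, and the `tolerance` parameter (in percentage points) is an assumption introduced here to express the idea of preferring a smaller network when its accuracy is practically the same as the best one.

```python
def sweep_hidden_neurons(evaluate, max_neurons=70, tolerance=1.0):
    """Evaluate hidden-layer sizes 1..max_neurons.

    Returns (best_n, smallest_n, scores), where `best_n` has the highest
    accuracy and `smallest_n` is the smallest size whose accuracy is within
    `tolerance` of the best. `evaluate(n)` is a hypothetical callback that
    returns the accuracy obtained with n hidden neurons.
    """
    scores = {n: evaluate(n) for n in range(1, max_neurons + 1)}
    best_n = max(scores, key=scores.get)
    smallest_n = min(n for n, score in scores.items()
                     if scores[best_n] - score <= tolerance)
    return best_n, smallest_n, scores
```

Choosing `smallest_n` instead of `best_n` trades a bounded loss in accuracy for shorter training and evaluation times.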
It is worth noting that the accuracy of classification do not show a radical change when using less neurons. For example, in the case of the group 1, when using binary classifiers and varying number of hidden-layer neurons form 1 to 70, the classification results were between 94% and 95%; and for the rest of the groups the observed behavior was similar. This fact is important because if the high classification results persist even using less neurons, the training and evaluation times are reduced, without affecting the functionality of the classifier. Figure 15 Figure 16a-c, the confussion matrices for the first classification stage are shown. The lower accuracy was found in the group 2, where a 0.91 accuracy is obtained. In Figure 16d, the confussion matrix has a minimun accuracy of 0.88 for chromosomes C4 and C5. In Figure 16e, the minimum obtained accuracy is 0.51 for the class C23. Finally, in Figure 16f, the lower accuracy is 0.5 for the class C24. The group that exhibits the lowest accuracies is the group 2 of the multi-class classification, where most of the tested classifiers have scores below 60%. Finally, the chromosome number 10 presented the lowest accuracy (35%).
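The per-class values read from a confusion matrix can be computed as in the following Python sketch. It assumes that rows correspond to the true classes and columns to the predicted classes, which is the layout Weka prints; the function names are illustrative.

```python
def per_class_accuracy(confusion):
    """Fraction of each true class that was predicted correctly.

    `confusion[i][j]` counts samples of true class i predicted as class j.
    A class with no samples is reported as 0.0.
    """
    accuracies = []
    for i, row in enumerate(confusion):
        total = sum(row)
        accuracies.append(row[i] / total if total else 0.0)
    return accuracies


def overall_accuracy(confusion):
    """Fraction of all samples that lie on the diagonal."""
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    return correct / total if total else 0.0
```

The minimum of `per_class_accuracy` identifies the class that the classifier confuses most often.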
In this section, a two-stage ANN classification model applied to G-band chromosome images was proposed. This model obtained an accuracy of 88.45% on a Q-band image dataset. The same model attained 82.02% when applied to a G-band image dataset. It is worth noting that the G-band dataset was composed of only 24 karyotypes, while the Q-band dataset contained 119. A classification model trained with more data would be expected to obtain better results.
A two-stage classification scheme was proposed. In the first stage, the chromosomes are classified into 3 wide groups, based on the ISCN length characterization of chromosomes. In the second stage, the output of the first stage is used as input for a multi-class classifier applied to each wide class. This allowed us to improve the accuracies. The ANN used in this scheme was tuned by reducing the number of neurons in the hidden layer without affecting the final classification result.
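The routing logic of the two-stage scheme can be sketched as follows. This is an illustration under stated assumptions, not the implementation used in this work: the classifiers are hypothetical callables standing in for trained models, and the thresholds and labels in the toy functions are arbitrary.

```python
from typing import Callable, Dict, Sequence

Features = Sequence[float]
GroupClassifier = Callable[[Features], int]   # stage 1: features -> wide group id
ClassClassifier = Callable[[Features], str]   # stage 2: features -> class label


def classify_two_stage(features: Features,
                       stage1: GroupClassifier,
                       stage2: Dict[int, ClassClassifier]) -> str:
    """Route a sample to the stage-2 classifier of its predicted wide group."""
    group = stage1(features)
    return stage2[group](features)


# Hypothetical stand-ins for trained models; thresholds are illustrative only.
def toy_stage1(f: Features) -> int:
    length = f[0]
    if length > 0.66:
        return 1
    return 2 if length > 0.33 else 3


toy_stage2: Dict[int, ClassClassifier] = {
    1: lambda f: "group1-class",
    2: lambda f: "group2-class",
    3: lambda f: "group3-class",
}

label = classify_two_stage([0.8, 0.1], toy_stage1, toy_stage2)
```

In this sketch, each stage-2 classifier only knows the classes of its own group, so a stage-1 error cannot be corrected in stage 2.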

Desktop Application for Semi-Automatic Chromosome Classification
In Figure 17, a screenshot of the developed semi-automated chromosome segmentation desktop app is shown. The cytogeneticist can read chromosome images from the microscope connected to the computer that hosts the desktop application. Once the chromosome image has been acquired, the cytogeneticist can choose between two segmentation modes: automatic or semi-automatic. In the semi-automatic mode, the cytogeneticist interacts with the app by manually selecting the approximate medial axis to aid the segmentation of the displayed chromosomes.
The segmented chromosomes are then automatically classified by the previously trained two-stage classifier to generate a preliminary karyotype with the classification label for each chromosome.
The cytogeneticist then uses the generated results to build the final karyogram, with the possibility of manually correcting misclassified chromosomes. In Figure 18, an example of a preliminary karyogram is shown. This karyogram is the output generated by the proposed system and may need additional intervention by the cytogeneticist. In this specific situation, the misclassified chromosomes must be manually rearranged by the cytogeneticist to complete the final karyogram.
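The step from a preliminary to a final karyogram can be sketched as follows. This is a minimal illustration and not the code of the desktop application: the function name, the chromosome identifiers and the labels are hypothetical, and manual corrections are modelled simply as overrides of the predicted labels.

```python
from collections import defaultdict
from typing import Dict, List


def build_karyogram(predicted: Dict[str, str],
                    corrections: Dict[str, str]) -> Dict[str, List[str]]:
    """Group chromosome ids by class label; manual corrections override predictions."""
    final_labels = {**predicted, **corrections}
    karyogram = defaultdict(list)
    for chrom_id, label in sorted(final_labels.items()):
        karyogram[label].append(chrom_id)
    return dict(karyogram)


# Hypothetical ids and labels, for illustration only.
predicted = {"chr_a": "C1", "chr_b": "C1", "chr_c": "C2", "chr_d": "C3"}
corrections = {"chr_d": "C2"}  # the expert relabels one misclassified chromosome
karyogram = build_karyogram(predicted, corrections)
```

Passing an empty `corrections` dictionary yields the preliminary karyogram, so the same function covers both the automatic output and the expert-reviewed result.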

Conclusions
Most of the chromosome classification models reported in the literature are trained using Q-band image datasets. The two-stage classification model proposed in this work was trained with a dataset of images acquired with the more affordable and common G-band staining technique. The dataset, composed of 24 images, was provided by the CHT.
This work was limited to the study of the best classifier, based on classification accuracy, for the task of classifying G-band chromosome images to build a karyogram. Only the classifiers available through the Weka platform were tested. The ANN that was finally selected was used with its default configuration. Deep-learning techniques were not studied because they require manually annotating large amounts of training data, which is a very time-consuming and expensive process. The set of karyotype images provided by the CHT is composed of only 24 images; hence, only classical classification algorithms were tested. The execution time of each classifier was not studied, but results showed that between 5 and 7 s are needed to classify a whole set of segmented chromosomes once the trained model has been obtained.
MLP-ANNs have proven their effectiveness in modelling the features that best describe a phenomenon. In this work, the features extracted from the chromosome images are used to feed an MLP-ANN. Three sets of features were tested and the set best describing the chromosomes, in the intended application context, was selected. This feature set allows the ANN to distinguish between 24 different classes.
The main contribution of this work is the proposal of a two-stage chromosome classification scheme. In the first stage, the chromosomes are classified into 3 wide classes. This output is used as the input of a second stage, where the pre-classified chromosomes are assigned to 1 of the 24 available classes. This scheme improved the classification accuracy from 76% to 88% when using the Q-band database.
The two-stage chromosome classification scheme proposed in this work achieves results comparable with those found in the literature, even though a non-optimized MLP-ANN classifier was used. As future work, the optimization of the network hyper-parameters and topology will be explored. To objectively determine the most suitable classification algorithm for the intended application, a set of 51 algorithms was evaluated using a dataset different from the one used during the training stage. In this evaluation, the MLP-ANN obtained the best classification results, both with the multi-class classifiers and the binary ones. Using the Q-band dataset [26], the proposed classification scheme correctly classified 88% of the whole set, while on the G-band image dataset it obtained an accuracy of 82%. It should be noted that the G-band image dataset is composed of only 24 karyotypes, while the Q-band dataset is composed of 119 studies.
It was observed that reducing the number of neurons in the hidden layer of the MLP-ANN reduced the training time without affecting the classification results. In fact, for the 3 tested binary classifiers, starting from 4 neurons, the classification accuracy was similar to that obtained when using the 66 default neurons. Finally, the classification scheme presented in this work was implemented in an application that allowed the cytogeneticists of the CHT to reduce the time required to generate a cytogenetic report from several hours to a few minutes.
An automatic system that integrates all the processing phases (chromosome segmentation, chromosome classification and the automated diagnosis of genetic diseases) is desired. However, several issues must be solved to reach this state, such as the lack of a larger set of labelled G-banding images. A larger set of images would increase the classification accuracy of the existing models and could allow us to test a more robust classification scheme based on deep learning techniques. Also, the integration of a fully automatic segmentation module would complete the proposed system.