DETECTION OF CAVITARY AND
MILIARY TUBERCULOSIS FROM CT
SCAN IMAGES
A PROJECT REPORT
Submitted by

PRIYANKA. J
(2015202052)
A report of the project
submitted to the Faculty of
INFORMATION SCIENCE AND COMMUNICATION ENGINEERING
in partial fulfillment
for the award of the degree
of

MASTER OF COMPUTER APPLICATIONS

DEPARTMENT OF INFORMATION SCIENCE AND TECHNOLOGY
COLLEGE OF ENGINEERING, GUINDY
ANNA UNIVERSITY
CHENNAI 600 025
MAY 2018

ANNA UNIVERSITY
CHENNAI – 600 025
BONA FIDE CERTIFICATE

Certified that this project report titled DETECTION OF CAVITARY
AND MILIARY TUBERCULOSIS FROM CT SCAN IMAGES is the bona fide
work of PRIYANKA. J (2015202052) who carried out project work under my
supervision. Certified further that to the best of my knowledge and belief, the
work reported herein does not form part of any other thesis or dissertation on
the basis of which a degree or an award was conferred on an earlier occasion on
this or any other candidate.

PLACE : CHENNAI

Dr. H. KHANNA NEHEMIAH

DATE :

ASSOCIATE PROFESSOR
PROJECT GUIDE
RAMANUJAN COMPUTING CENTRE
COLLEGE OF ENGINEERING, GUINDY
ANNA UNIVERSITY
CHENNAI 600025

COUNTERSIGNED

Dr. SASWATI MUKHERJEE
PROFESSOR AND HEAD OF THE DEPARTMENT
DEPARTMENT OF INFORMATION SCIENCE AND TECHNOLOGY
COLLEGE OF ENGINEERING, GUINDY
ANNA UNIVERSITY
CHENNAI 600025

ABSTRACT

A Computer Aided Diagnosis (CAD) system is proposed for the detection of
cavitary and miliary tuberculosis. Cavitary tuberculosis is characterised by
cavities of 1 cm to 6 cm in diameter, and miliary tuberculosis by micro nodules
distributed randomly across all the lobes of the lungs. The datasets for the
proposed system were obtained from reputed hospitals in Tamil Nadu. Noise in
the lung CT scan image is removed using a Gaussian filter, and Otsu’s
thresholding method is used to convert the grayscale image to a binary image
from which the lungs are segmented. The cavities are extracted from the
segmented lungs by measuring their diameter and are classified based on their
size.
The regions of interest are the cavities and micro nodules, from which the
images are classified as cavitary or miliary tuberculosis. Texture, shape and
geometrical features are extracted from the regions of interest. Bee colony
optimization with sequential forward selection and bee colony optimization with
rough dependency measure are used to select two subsets of features from the
extracted features. The selected features are classified using a radial basis
function neural network, based on which the lungs are classified as miliary,
normal or other diseased lungs. The accuracy obtained with the features
selected by bee colony optimization with forward selection is 93.33%, and the
accuracy obtained with bee colony optimization with rough dependency measure
is 86.67%.

ABSTRACT (TAMIL)

Tamil Abstract

ACKNOWLEDGEMENT

The satisfaction that accompanies the success would be incomplete
without mentioning the names of the people who made it possible.
It is my privilege to express my sincere thanks to my project guide
Dr. H. Khanna Nehemiah, Associate Professor, Ramanujan Computing Centre,
Anna University, Chennai for his keen interest, inspiring guidance and
constant encouragement with my work during all stages, to bring this thesis to
fruition.
I deeply express my sincere thanks to Dr. Saswati Mukherjee,
Professor and Head, Department of Information Science and Technology, Anna
University, Chennai for extending the facilities of the department to my project
and for her unstinting support.
I would like to express my sincere thanks to the project committee
members, Dr. P. Yogesh, Associate Professor, Dr. K. Vidya, Assistant
Professor, Dr. P. Prabavathy, Teaching Fellow and Mrs. G. Mahalakshmi,
Teaching Fellow, Department of Information Science and Technology, Anna
University, Chennai for their valuable suggestions, encouragement and constant
motivation throughout the duration of my project. I thank my parents, family
and friends for bearing with me throughout the course of my project.

PRIYANKA. J

TABLE OF CONTENTS

ABSTRACT
ABSTRACT (TAMIL)
ACKNOWLEDGEMENT
LIST OF TABLES
LIST OF FIGURES
LIST OF SYMBOLS AND ABBREVIATIONS

1 INTRODUCTION
  1.1 OBJECTIVE
  1.2 CHALLENGES
  1.3 ORGANIZATION OF THE REPORT

2 LITERATURE SURVEY
  2.1 COMPUTER AIDED DIAGNOSIS
  2.2 BEE COLONY OPTIMIZATION
  2.3 FEATURE SUBSET SELECTION
  2.4 RADIAL BASIS FUNCTION NEURAL NETWORK

3 SYSTEM DESIGN
  3.1 PROPOSED SYSTEM
  3.2 PREPROCESSING SUBSYSTEM
  3.3 SEGMENTATION SUBSYSTEM
  3.4 ROI EXTRACTION SUBSYSTEM
  3.5 FEATURE EXTRACTION SUBSYSTEM
  3.6 FEATURE SELECTION SUBSYSTEM
    3.6.1 Bee Colony Optimization
    3.6.2 Sequential Forward Selection
    3.6.3 Rough Dependency Measure
  3.7 CLASSIFICATION SUBSYSTEM

4 IMPLEMENTATION AND RESULTS
  4.1 DATASET DESCRIPTION
  4.2 PREPROCESSING AND SEGMENTATION
  4.3 ROI EXTRACTION
  4.4 FEATURE EXTRACTION
  4.5 FEATURE SELECTION
  4.6 CLASSIFICATION
  4.7 EVALUATION METRICS
    4.7.1 Performance Evaluation of the Classification System
    4.7.2 Experimental Results

5 CONCLUSION AND FUTURE WORK
  5.1 CONCLUSION
  5.2 FUTURE WORK

REFERENCES

LIST OF TABLES

4.1 Confusion Matrix
4.2 Results Obtained from the Proposed System

LIST OF FIGURES

3.1 System Framework for the Detection of Cavitary and Miliary Tuberculosis
4.1 Preprocessed and Segmented Image
4.2 ROI Extraction
4.3 Extracted Features
4.4 Selected Features by Bee Colony Optimization with Sequential Forward Selection
4.5 Selected Features by Bee Colony Optimization with Rough Dependency Measure
4.6 Results by Bee Colony Optimization with Sequential Forward Selection
4.7 Results by Bee Colony Optimization with Rough Dependency Measure

LIST OF SYMBOLS AND ABBREVIATIONS

ACO     Ant Colony Optimization
BCO     Bee Colony Optimization
CAD     Computer Aided Diagnosis
CSM     Cosine Similarity Measure
CT      Computed Tomography
DICOM   Digital Imaging and Communications in Medicine
FN      False Negative
FP      False Positive
GLCM    Gray Level Co-occurrence Matrix
LGXP    Local Gabor XOR Pattern
NB      Naive Bayes
RBFNN   Radial Basis Function Neural Network
RDM     Rough Dependency Measure
ROI     Region of Interest
SFS     Sequential Forward Selection
SVM     Support Vector Machine
TB      Tuberculosis
TN      True Negative
TP      True Positive
UCI     University of California at Irvine
WHO     World Health Organization

CHAPTER 1
INTRODUCTION

Tuberculosis (TB) is an infectious disease and the second largest killer in
the world. According to the World Health Organization (WHO), TB affects
12 million people, of whom nearly 2 million die. WHO launched the Global Plan
to Stop Tuberculosis to promote accurate diagnosis and effective treatment.
There are two kinds of tuberculosis infection, latent and active. In latent TB
the bacteria are in an inactive state but can later become active, whereas
active TB can be transmitted to others. The disease that develops initially
after exposure to Mycobacterium tuberculosis is called primary TB, and the
disease that develops after reactivation of a previous infection is called
reactivation TB.
Reactivation TB produces cavities, which are indicative of active
tuberculosis; cavities are uncommon in primary TB. Cavities are 1 cm to 6 cm
in diameter. Miliary TB consists of micro nodules, 1 mm to 3 mm in diameter,
distributed randomly across all the lobes of the lungs. TB is generally
screened using chest X-rays, which are less sensitive than Computed Tomography
(CT) scan images. Changes in density are more apparent in CT scans.
An image is an artifact that depicts visual perception. A digital image is a
numeric representation of a two-dimensional image, made up of a finite set of
digital values called pixels. Medical imaging is the process of creating
visual representations of the interior of the body for clinical analysis, and
is used to diagnose, monitor or treat medical conditions. CT produces
cross-sectional images of the body. CT images are used to diagnose a disease
and to monitor its progression. They show a more detailed view of the chest
than chest X-rays, so CT images are widely used in diagnosis. The
interpretation of patients' medical images with the help of a computer is
called Computer Aided Diagnosis.
Computer Aided Diagnosis (CAD) is used by doctors for the interpretation of
medical images, in the detection and diagnosis of different types of diseases.
Radiologists use the output of a CAD system as a second opinion before making
a diagnosis. In this project, a CAD system is used to detect cavitary and
miliary TB from CT slices of the lungs. Feature extraction is the main
contribution of this project, in which texture, shape and run length features
are extracted.
The main objective of this project is to improve feature selection. Feature
selection is the selection of a subset of relevant features, with redundant or
irrelevant features eliminated. There are three approaches to feature
selection: the filter approach, the wrapper approach and the hybrid approach.
The proposed system uses the filter approach, in which features are selected
according to their scores under various statistical functions. In this
project, the features are selected by the maximum probability values computed
by applying the bee colony algorithm.
The bee colony algorithm plays an important role in selecting the features in
the proposed system. Bees are insects that live in colonies. A bee colony
consists of three types of bees: employed bees, onlookers and scouts. Each
food source is initially assigned to one employed bee. Employed bees evaluate
the nectar amount of their food source, hold it in memory and dance in the
hive, which is their living area. Each onlooker bee chooses a food source
according to how well the employed bees dance for it. Abandoned food sources
are replaced with new food sources discovered by the scouts. The best food
sources are selected until no food sources remain.
In the feature selection subsystem, bee colony optimization with sequential
forward selection and bee colony optimization with rough dependency measure
are used. Bee colony optimization with sequential forward selection selects
the features based on their probability values, which are calculated from the
fitness values of the features. Bee colony optimization with rough dependency
measure selects the features based on the indiscernibility relation between
the features, and on the probability values of the features selected through
that relation.
The classification subsystem is used to accurately predict the target class
labels. In this project, binary classification is used, in which the class
labels are only 1 and 0. Classification results are tested by comparing the
predicted values with the known class labels. A Radial Basis Function Neural
Network (RBFNN) is used as the classification method. It consists of three
layers: an input layer, a hidden layer with a radial basis activation function
and an output layer.
In this project, the computer aided diagnosis system detects cavitary and
miliary tuberculosis from CT chest images. The lungs are segmented by Otsu’s
thresholding method. Cavities and micro nodules in the CT slices are
considered the regions of interest, from which the texture, shape and run
length features are extracted. The bee colony algorithm is used to select
features from the extracted features. The selected features are used to train
the radial basis function neural network.

1.1

OBJECTIVE

The main objective of the project is to take CT image datasets containing
cavitary and miliary tuberculosis and to predict the occurrence of cavitary
TB, miliary TB and other diseases from the datasets with high accuracy, so
that physicians can choose the best treatment method at low cost.

1.2

CHALLENGES

Selecting features from the dataset using the bee colony algorithm is a
challenging task. A second challenge is to classify an unknown chest CT image
correctly into its disease class.

1.3

ORGANIZATION OF THE REPORT

Chapter 2 presents the literature survey on image processing and the
detection of diseases.
Chapter 3 gives a detailed description of the system framework of the
proposed work.
Chapter 4 describes the implementation of the detection of cavitary and
miliary tuberculosis and the results obtained.
Chapter 5 presents the conclusion and future work.

CHAPTER 2
LITERATURE SURVEY

This chapter describes the work done by other researchers and the
methodologies they used.

2.1

COMPUTER AIDED DIAGNOSIS

Anita Titus et al. [1] proposed a CAD system to detect cavitary and miliary
tuberculosis from CT scan images using the Local Gabor XOR Pattern (LGXP)
technique. Noise in the lung CT scan image is removed by a Gaussian filter,
followed by iterative thresholding. Morphological erosion, dilation, opening
and closing operations are then used to segment the lungs. A region growing
technique is used to extract the cavities from the segmented lungs by their
intensity levels and sizes. LGXP is applied to the CT slices that do not have
cavities to convert them into texton images. The LGXP histogram is extracted
from the texton images, and the features extracted from the histogram are fed
to a neural network, which classifies the CT slices as miliary TB, normal lung
or other diseases. The reported accuracy is 96% for cavitary TB, 93% for
miliary TB using texton based LGXP and 88% for miliary TB using LGXP.

2.2

BEE COLONY OPTIMIZATION

Uzer et al. [2] proposed a feature selection method using the artificial bee
colony algorithm, with classification of medical datasets using support vector
machines. The system uses the hepatitis, liver disorder and diabetes datasets
from the University of California at Irvine (UCI) repository. Features are
selected from the datasets by the artificial bee colony algorithm with forward
selection. The selected features are classified by a support vector machine
with k-fold cross-validation. The reported accuracies are 94.92%, 74.81% and
79.29% for the hepatitis, liver disorder and diabetes datasets respectively.

2.3

FEATURE SUBSET SELECTION

K.B.Nahato et al. [3] proposed a knowledge mining technique for clinical
datasets using rough sets and a backpropagation neural network. The objective
is to build a classifier that predicts the presence or absence of a disease
from a minimal set of attributes extracted from the clinical dataset. The work
combines the rough set method with a backpropagation neural network (RS-BPNN)
in two stages. In the first stage, missing values in the datasets are handled
and the appropriate attributes are selected by the indiscernibility relation
method. In the second stage, a backpropagation neural network classifies the
data using the selected features. The datasets used are the hepatitis,
Wisconsin breast cancer and Statlog heart disease datasets from the University
of California at Irvine (UCI) repository. The accuracies obtained for
hepatitis, breast cancer and heart disease are 97.3%, 98.6% and 90.4%
respectively.
Sweetlin Dhalia et al. [4] proposed a CAD system to detect pulmonary
hamartoma from CT scan images using ant colony optimization for feature
selection. The lungs are segmented from the CT slices using Otsu’s
thresholding method. In this work, nodules are considered to be the regions of
interest. Textural, shape and geometrical features are extracted from the
regions of interest. A filter approach combined with ant colony optimization
is used to select features from the extracted features. The Cosine Similarity
Measure (CSM) and the Rough Dependency Measure (RDM) direct the Ant Colony
Optimization (ACO) to obtain two subsets of features independently. The
selected features are used to train two classifiers, a Support Vector Machine
(SVM) and a Naive Bayes (NB) classifier. The four resulting trained
classifiers are tested and evaluated to obtain the performance measures. The
results achieved by the four trained classifiers are 88%, 85%, 93% and 91%.

2.4

RADIAL BASIS FUNCTION NEURAL NETWORK

D.S.Elizabeth et al. [5] developed a computer aided diagnosis system that
selects a significant slice from a computed tomography (CT) scan in order to
analyze each nodule from a set of slices in Digital Imaging and Communications
in Medicine (DICOM) format. The lung parenchyma is segmented from each slice
of the CT image using the greedy snake algorithm. A region growing algorithm
is used to extract the regions of interest (ROIs), which are nodules. For each
nodule, the slice having the largest area is chosen as the significant slice.
Texture and shape features are extracted from the ROIs and used to train a
neural network. The neural network used in their work is a Radial Basis
Function Neural Network (RBFNN), which classifies each nodule as cancerous or
non-cancerous. The system has an accuracy of about 94.4%.

CHAPTER 3
SYSTEM DESIGN

This chapter describes the design of the proposed system and its overall
implementation.

3.1

PROPOSED SYSTEM
The CAD system framework for the detection of cavitary and miliary
tuberculosis from CT scan images is shown in Figure 3.1.

Figure 3.1: System Framework for the Detection of Cavitary and Miliary
Tuberculosis

The proposed work develops a CAD system to detect cavitary and miliary
tuberculosis from CT scan images, which helps in their quick identification.
The CT scan images of the lungs collected for the proposed system include
cavitary, miliary and other diseased lung images. First, noise is removed from
the lung CT image using a Gaussian filter. The enhanced image is then
segmented using Otsu’s thresholding method. The segmentation process removes
the bones and unwanted portions of the image. In the segmented slice,
background removal is performed using morphological erosion, opening and
closing operations. The ROI is selected from the segmented image. From the
ROI, texture, shape and geometrical features are extracted. Bee colony
optimization with forward selection and bee colony optimization with rough
dependency measure are used to select the relevant features from the extracted
features. The selected features are used to train the radial basis function
neural network classifier. Performance measures such as accuracy and the
confusion matrix are used to evaluate the trained classifier.

3.2

PREPROCESSING SUBSYSTEM

Image preprocessing is done to enhance the image features and to remove
unwanted distortions. The chest CT slices used in the project have dimensions
of 1024 x 1024 pixels. In the preprocessing step, each CT slice is converted
to a grayscale slice. As Gaussian noise is present, the CT slices are passed
through a Gaussian filter, which blurs the image with a Gaussian function to
reduce noise and detail. The Gaussian function is shown in Equation 3.1.
\[
P(x) = \frac{1}{\sigma\sqrt{2\pi}} \, e^{-\frac{(x-\mu)^2}{2\sigma^2}} \tag{3.1}
\]

where σ is the standard deviation and µ is the mean.

Input :
Lung CT scan slice
Process :
Step 1 : The slice is converted to a grayscale image.
Step 2 : A Gaussian filter is used to blur the grayscale image, which removes
noise from the image.
Output :
Denoised image
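
The denoising step can be illustrated with a minimal pure-Python sketch. It samples the Gaussian of Equation 3.1 with zero mean to build a one-dimensional kernel and applies it along rows and then columns. The function names, the kernel radius, the border clamping and the toy 5 x 5 image are assumptions made for illustration; the report does not state which implementation or parameters were used.

```python
import math

def gaussian_kernel(sigma, radius):
    """Sample the zero-mean Gaussian at integer offsets and normalise to sum 1."""
    weights = [math.exp(-(k * k) / (2.0 * sigma * sigma))
               for k in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def blur_rows(image, kernel):
    """Convolve every row with the 1-D kernel, clamping indices at the borders."""
    radius = len(kernel) // 2
    width = len(image[0])
    out = []
    for row in image:
        new_row = []
        for x in range(width):
            acc = 0.0
            for k, w in enumerate(kernel):
                xx = min(max(x + k - radius, 0), width - 1)
                acc += w * row[xx]
            new_row.append(acc)
        out.append(new_row)
    return out

def transpose(image):
    return [list(col) for col in zip(*image)]

def gaussian_blur(image, sigma=1.0, radius=2):
    """Separable 2-D Gaussian blur: filter the rows, then the columns."""
    kernel = gaussian_kernel(sigma, radius)
    rows_done = blur_rows(image, kernel)
    return transpose(blur_rows(transpose(rows_done), kernel))

# Hypothetical grayscale patch with a single noisy pixel in the centre.
noisy = [[10, 10, 10, 10, 10],
         [10, 10, 10, 10, 10],
         [10, 10, 90, 10, 10],
         [10, 10, 10, 10, 10],
         [10, 10, 10, 10, 10]]
smoothed = gaussian_blur(noisy, sigma=1.0, radius=1)
```

After filtering, the isolated bright pixel is pulled towards its neighbours while flat regions are left unchanged, which is the behaviour the preprocessing subsystem relies on.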

3.3

SEGMENTATION SUBSYSTEM

Segmentation is the process of partitioning an image into distinct regions.
The left and right lobes of the lung are separated using morphological
operations.
Otsu’s segmentation algorithm is used to separate the lung tissues from the
CT slice by finding a suitable threshold. Airways, disease patterns and
sometimes image noise may appear as holes in the segmented binary image. In a
CT image, the intensity values of the lung pixels are similar to those of the
background pixels, so the background pixels are also removed to obtain the
segmented lung fields.
Input :
Preprocessed lung CT slice
Process :
Step 1 : Select an initial threshold T, taken as the global threshold value
given by Otsu’s method.
Step 2 : Segment the image into two groups of pixels: G1 with intensity values
greater than T and G2 with intensity values less than or equal to T.
Step 3 : Compute the average intensity values µ1 and µ2 of the pixels in G1
and G2.
Step 4 : Compute a new threshold using Equation 3.2.

\[
T = \frac{1}{2}(\mu_1 + \mu_2) \tag{3.2}
\]

Step 5 : Repeat steps 2 to 4 until the value of T remains the same in
successive iterations.
Step 6 : For each pixel I(x,y), set I(x,y) = 1 if I(x,y) > T; 0 otherwise.
Step 7 : Replace the black pixels present inside the lung regions with the
white pixel intensity value.
Step 8 : Eliminate the connected components.
Output :
Segmented Lungs
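
Steps 1 to 6 of the thresholding procedure can be sketched as follows. This is an illustrative sketch only: it starts from the global mean instead of an Otsu estimate, stops when T changes by less than an assumed tolerance `tol`, and runs on a hypothetical 3 x 3 patch. Hole filling and connected-component removal (steps 7 and 8) are not shown.

```python
def iterative_threshold(pixels, tol=0.5):
    """Refine the threshold T until it changes by less than `tol`."""
    t = sum(pixels) / len(pixels)            # Step 1 (assumed start: global mean)
    while True:
        g1 = [p for p in pixels if p > t]    # Step 2: brighter group
        g2 = [p for p in pixels if p <= t]   # Step 2: darker group
        if not g1 or not g2:                 # degenerate image, nothing to split
            return t
        mu1 = sum(g1) / len(g1)              # Step 3: group means
        mu2 = sum(g2) / len(g2)
        new_t = 0.5 * (mu1 + mu2)            # Step 4: Equation 3.2
        if abs(new_t - t) < tol:             # Step 5: stop when T is stable
            return new_t
        t = new_t

def binarize(image, t):
    """Step 6: pixels above the threshold become 1, all others 0."""
    return [[1 if p > t else 0 for p in row] for row in image]

# Hypothetical grayscale patch with a dark and a bright population.
image = [[20, 25, 200],
         [30, 210, 220],
         [22, 28, 205]]
flat = [p for row in image for p in row]
threshold = iterative_threshold(flat)
mask = binarize(image, threshold)
```

For this patch the threshold settles midway between the two group means, and the mask separates the bright pixels from the dark ones.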

3.4

ROI EXTRACTION SUBSYSTEM

The regions of interest are the pathology bearing regions in this work.
Cavities in the lungs are indicative of post-primary tuberculosis. Cavities
are close to spherical in shape, greater than 1 cm and less than 6 cm in
diameter, and contain air, fluid or both. If cavities are present, the images
are classified as cavitary TB images; otherwise they are classified as miliary
and other diseased images.
Input :
Segmented lungs
Process :
Step 1 : The cavities are selected as regions of interest, where the cavities
are identified by their diameter.
Step 2 : If cavities are present, the images are classified as cavitary TB
images.
Step 3 : The images are classified as miliary and other diseased images when
no cavities are identified.
Output :
Classified images
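
A minimal sketch of the diameter test is given below. It assumes that each candidate region is summarised by its area in pixels, that the diameter is approximated by the diameter of a circle of equal area, and that the pixel spacing is 0.7 mm. All three are illustrative assumptions, not details taken from the report.

```python
import math

def equivalent_diameter_cm(area_pixels, pixel_spacing_mm):
    """Diameter, in centimetres, of the circle with the same area as the region."""
    area_mm2 = area_pixels * pixel_spacing_mm ** 2
    return 2.0 * math.sqrt(area_mm2 / math.pi) / 10.0

def label_slice(region_areas, pixel_spacing_mm=0.7, min_cm=1.0, max_cm=6.0):
    """Label a slice as cavitary TB if any region falls in the cavity size range."""
    for area in region_areas:
        diameter = equivalent_diameter_cm(area, pixel_spacing_mm)
        if min_cm < diameter < max_cm:
            return "cavitary TB"
    return "miliary or other"

# Hypothetical region areas (in pixels) for two slices.
with_cavity = label_slice([20, 2000])
without_cavity = label_slice([20, 35])
```

Slices labelled "miliary or other" would then go on to feature extraction and classification.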

3.5

FEATURE EXTRACTION SUBSYSTEM

Extraction of features from an image is one of the important tasks in this
project. A feature is a piece of information relevant to the task of detecting
the disease. Features are distinct properties of the image that convey a lot
of information about it. Gray Level Co-occurrence Matrix (GLCM) features and
shape features are extracted from the image; the GLCM features are computed
in the orientations 0°, 45°, 90° and 135°.
Input :
Regions of interest.
Process :
Step 1 : Compute the six GLCM features for each ROI in four orientations. The
features are contrast, correlation, dissimilarity, energy, entropy and
homogeneity.
Step 2 : Compute the shape features, namely area, eccentricity, centroid,
orientation, filled area, convex area, Euler number, equiv-diameter, solidity,
extent, perimeter, major axis length and minor axis length, from each ROI.
Step 3 : Combine the twenty-two features obtained from step 1 and the thirteen
features from step 2.
Step 4 : Construct the feature vector.
Output :
Feature vector.
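
The GLCM computation in step 1 can be sketched in plain Python as shown below. The sketch builds a symmetric, normalised co-occurrence matrix for one pixel offset and derives five of the six texture features (correlation is omitted for brevity). The offset convention, the definition of energy as the sum of squared entries, and the 4 x 4 test image with four gray levels are assumptions for illustration; toolkits differ on some of these definitions.

```python
import math

def glcm(image, dx, dy, levels):
    """Normalised, symmetric co-occurrence matrix for the pixel offset (dx, dy)."""
    counts = [[0.0] * levels for _ in range(levels)]
    height, width = len(image), len(image[0])
    for y in range(height):
        for x in range(width):
            x2, y2 = x + dx, y + dy
            if 0 <= x2 < width and 0 <= y2 < height:
                i, j = image[y][x], image[y2][x2]
                counts[i][j] += 1
                counts[j][i] += 1          # count the pair in both directions
    total = sum(map(sum, counts))
    return [[c / total for c in row] for row in counts]

def texture_features(p):
    """Texture descriptors computed from a normalised GLCM `p`."""
    n = range(len(p))
    return {
        "contrast": sum(p[i][j] * (i - j) ** 2 for i in n for j in n),
        "dissimilarity": sum(p[i][j] * abs(i - j) for i in n for j in n),
        "energy": sum(p[i][j] ** 2 for i in n for j in n),
        "homogeneity": sum(p[i][j] / (1 + (i - j) ** 2) for i in n for j in n),
        "entropy": -sum(v * math.log2(v) for row in p for v in row if v > 0),
    }

# Offsets for 0, 45, 90 and 135 degrees (image y axis pointing down).
OFFSETS = {0: (1, 0), 45: (1, -1), 90: (0, -1), 135: (-1, -1)}

# Hypothetical ROI quantised to four gray levels.
roi = [[0, 0, 1, 1],
       [0, 0, 1, 1],
       [0, 2, 2, 2],
       [2, 2, 3, 3]]
p0 = glcm(roi, *OFFSETS[0], levels=4)
features_0deg = texture_features(p0)
```

Repeating the last two lines for each entry in `OFFSETS` gives one set of texture values per orientation, which can then be concatenated with the shape features into the feature vector.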

3.6

FEATURE SELECTION SUBSYSTEM

The objective of this subsystem is to select a subset of relevant features
with which to construct a classifier model. In this work, a filter based bee
colony optimization is used, in which the features are selected based on their
intrinsic characteristics. This makes the filter approach faster to implement,
thereby increasing its computational efficiency.

3.6.1

Bee Colony Optimization

Bees are insects that live in colonies. In a real bee colony, some tasks are
performed by specialized individuals. These specialized bees try to maximize
the nectar amount stored in the hive through efficient division of labour and
self-organization. The bee colony algorithm models three kinds of bees:
employed bees, onlooker bees and scout bees. Each food source is initially
assigned to one employed bee.
Half of the colony consists of employed bees and the other half of onlooker
bees. Employed bees are responsible for exploiting the nectar sources explored
earlier and for informing the waiting onlooker bees in the hive about the
quality of the food source sites they are exploiting. Onlooker bees wait in
the hive and choose a food source to exploit based on the information shared
by the employed bees. Scouts search the environment for new food sources,
either at random or guided by internal motivation or possible external clues.
The bee colony optimization algorithm is as follows:
Input :
Extracted features
Process :
Step 1 : Initialize the food source positions.
Step 2 : Each employed bee produces a new food source in her food source site
and exploits the better source.
Step 3 : Each onlooker bee selects a source depending on the quality of her
solution, produces a new food source in the selected source site and exploits
the better source.
Step 4 : Determine the source to be abandoned and allocate its employed bee as
a scout for searching new food sources.
Step 5 : Memorize the best food source found so far.
Step 6 : Repeat steps 2-5 until the stopping criterion is met.
Output :
Selected features
The above procedure can be applied to feature reduction. The bees select
feature subsets at random, the fitness of each subset is calculated, and the
best subset is identified at each iteration. This procedure is repeated for a
number of iterations to find the optimal subset.
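
The following is a compact sketch of how the six steps might be applied to feature subsets. Each food source is a set of feature indices, a neighbouring source is produced by flipping one feature in or out, and a source that fails to improve more than `limit` times is abandoned to a scout. The fitness function is a toy stand-in that rewards two hypothetical "useful" features; the colony size, cycle count and all names are assumptions and do not reproduce the settings used in this project.

```python
import random

def toy_fitness(subset):
    """Hypothetical fitness: reward features 0 and 3, penalise subset size."""
    useful = {0, 3}
    return len(useful & subset) - 0.1 * len(subset)

def neighbour(subset, n_features, rng):
    """Flip one randomly chosen feature in or out of the subset."""
    flipped = set(subset)
    flipped ^= {rng.randrange(n_features)}
    return flipped

def bee_colony_select(n_features, fitness, n_sources=6, cycles=100,
                      limit=20, seed=0):
    rng = random.Random(seed)

    def random_subset():
        return {f for f in range(n_features) if rng.random() < 0.5}

    sources = [random_subset() for _ in range(n_sources)]   # Step 1
    trials = [0] * n_sources
    best = max(sources, key=fitness)

    def try_improve(i):
        """Greedy choice between the current source and a neighbour."""
        candidate = neighbour(sources[i], n_features, rng)
        if fitness(candidate) > fitness(sources[i]):
            sources[i], trials[i] = candidate, 0
        else:
            trials[i] += 1

    for _ in range(cycles):                                 # Step 6
        for i in range(n_sources):                          # Step 2: employed bees
            try_improve(i)
        scores = [fitness(s) for s in sources]
        low = min(scores)
        weights = [s - low + 1e-9 for s in scores]
        for _ in range(n_sources):                          # Step 3: onlooker bees
            i = rng.choices(range(n_sources), weights=weights)[0]
            try_improve(i)
        for i in range(n_sources):                          # Step 4: scout bees
            if trials[i] > limit:
                sources[i], trials[i] = random_subset(), 0
        best = max(sources + [best], key=fitness)           # Step 5
    return best

best_subset = bee_colony_select(n_features=8, fitness=toy_fitness)
```

In a real pipeline the toy fitness would be replaced by a measure computed from the extracted features, such as the probability or dependency values described in the following subsections.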

3.6.2

Sequential Forward Selection

Sequential forward selection starts with an empty set and sequentially adds
the feature with the maximum probability value to the set that has already
been selected. The bee colony algorithm with sequential forward selection is
as follows:
Input :
Extracted features
Process :
Step 1 : Select features at random from the extracted features.
Step 2 : The employed bees compute the fitness value of each feature selected
in the previous step.
Step 3 : The onlooker bees select features based on their probability values,
which are calculated from the fitness values.
Step 4 : The features with a probability greater than 0.5 are retained as the
selected features.
Output :
Selected features
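
Steps 3 and 4 can be illustrated with the short sketch below. The report does not give the formula that converts fitness to probability, so the sketch assumes a scaling commonly used in artificial bee colony implementations, p = 0.9 * fit / max(fit) + 0.1. The feature names and fitness values are hypothetical.

```python
def selection_probabilities(fitness):
    """Map fitness values into (0.1, 1.0] (assumed formula, see lead-in)."""
    best = max(fitness.values())
    return {name: 0.9 * value / best + 0.1 for name, value in fitness.items()}

def select_features(fitness, threshold=0.5):
    """Keep the features whose probability exceeds the threshold."""
    probabilities = selection_probabilities(fitness)
    return sorted(name for name, p in probabilities.items() if p > threshold)

# Hypothetical fitness values for four extracted features.
fitness = {"contrast": 0.82, "entropy": 0.30, "area": 0.75, "extent": 0.10}
selected = select_features(fitness)
```

With these values only the two features with the highest fitness pass the 0.5 threshold.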
3.6.3

Rough Dependency Measure
Rough set feature selection provides a filter-based tool by which knowledge
may be extracted from a domain in a concise way, retaining the information
content whilst reducing the amount of knowledge involved. The central concept
in rough sets is indiscernibility. The bee colony algorithm with rough
dependency measure is as follows:
Input :
Extracted features
Process :
Step 1 : Select the initial parameter values for BCO.
Step 2 : Initialize the population.
Step 3 : Calculate the objective and fitness values.
Step 4 : Take the best feature subset found as the global optimum.
Step 5 : do
Step 5.a : Produce a new feature subset.
Step 5.b : Calculate the fitness and probability values.
Step 5.c : Produce the solutions for the onlookers.
Step 5.d : Apply greedy selection for the onlookers.
Step 5.e : Determine the abandoned solutions and the scouts.
Step 5.f : Calculate the best feature subset of the cycle.
Step 5.g : Memorize the best feature subset found so far.
Step 6 : Repeat for the maximum number of cycles.
Output :
Selected features
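
The rough dependency measure that guides the search can be sketched as follows. The sketch uses the standard rough set definition, in which the dependency of the decision attribute D on a feature subset P is the fraction of objects whose indiscernibility class under P lies entirely inside one decision class. The toy decision table and its attribute names are hypothetical, and the features are assumed to be already discretised.

```python
from collections import defaultdict

def partition(rows, attributes):
    """Equivalence classes of the indiscernibility relation on `attributes`."""
    blocks = defaultdict(set)
    for index, row in enumerate(rows):
        blocks[tuple(row[a] for a in attributes)].add(index)
    return list(blocks.values())

def dependency(rows, condition_attrs, decision_attr):
    """Rough dependency: size of the positive region divided by the table size."""
    decision_blocks = partition(rows, [decision_attr])
    positive = 0
    for block in partition(rows, condition_attrs):
        # A block is in the positive region if it fits inside one decision class.
        if any(block <= d for d in decision_blocks):
            positive += len(block)
    return positive / len(rows)

# Hypothetical discretised decision table.
rows = [
    {"texture": "high", "shape": "round", "label": "miliary"},
    {"texture": "high", "shape": "round", "label": "miliary"},
    {"texture": "low", "shape": "round", "label": "normal"},
    {"texture": "low", "shape": "irregular", "label": "other"},
]
gamma_texture = dependency(rows, ["texture"], "label")
gamma_both = dependency(rows, ["texture", "shape"], "label")
```

A subset with dependency 1.0 discerns every object in the table as well as the full feature set does, so a search can use this value as the fitness of a candidate subset.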
3.7

CLASSIFICATION SUBSYSTEM
The selected feature subsets are used to train the radial basis function
neural network classifier. Radial Basis Function (RBF) networks typically have
three layers: an input layer, a hidden layer with a non-linear RBF activation
function and a linear output layer. A Radial Basis Function (RBF) is a
real-valued function whose value depends only on the Euclidean distance from a
center. RBF networks are similar to two-layer networks, except that the
activation function is replaced with a radial basis function, specifically a
Gaussian radial basis function.
Functions that depend only on the distance from a center vector are radially
symmetric about that vector, hence the name radial basis function. In the
basic form, all inputs are connected to each hidden neuron. K-means clustering
is used to determine the centers of the radial basis functions. Given an input
x, an RBF network produces a weighted sum of the hidden layer outputs. If a
cluster has zero or one points assigned to it, its standard deviation is set
to the average of the standard deviations of the other clusters.
Input :
Selected features
Process :
Step 1 : Assign the target vector for the training process.
Step 2 : Compare the values obtained from the training process with the
target values.
Step 3 : Perform error correction.
Step 4 : Update the weight vector.
Step 5 : Repeat until the error is minimized.
Step 6 : Display the classified result.
Output :
Diagnostic result.
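
The forward pass of such a network can be sketched as below: each hidden unit applies a Gaussian radial basis function to the distance between the input and its center, and the output is a weighted sum of the hidden activations. The centers, widths and weights here are hypothetical values chosen for illustration. In the described system the centers would come from K-means clustering and the weights from training, neither of which is shown.

```python
import math

def rbf_activation(x, center, sigma):
    """Gaussian radial basis function of the Euclidean distance to `center`."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, center))
    return math.exp(-squared_distance / (2.0 * sigma ** 2))

def rbf_forward(x, centers, sigmas, weights, bias=0.0):
    """Output of an RBF network: weighted sum of the hidden activations."""
    hidden = [rbf_activation(x, c, s) for c, s in zip(centers, sigmas)]
    return bias + sum(w * h for w, h in zip(weights, hidden))

# Hypothetical two-unit network on two-dimensional feature vectors.
centers = [(0.0, 0.0), (1.0, 1.0)]
sigmas = [0.5, 0.5]
weights = [0.0, 1.0]          # only the second center votes for class 1

score_near = rbf_forward((0.9, 1.1), centers, sigmas, weights)
score_far = rbf_forward((0.1, 0.0), centers, sigmas, weights)
predicted_near = 1 if score_near > 0.5 else 0
predicted_far = 1 if score_far > 0.5 else 0
```

An input close to the second center produces a score near 1 and is assigned class 1, while an input far from it produces a score near 0.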

CHAPTER 4
IMPLEMENTATION AND RESULTS

4.1

DATASET DESCRIPTION

Training is carried out on 40 lung CT scan slices, comprising 16 slices with
miliary TB, 13 slices with cavities and 11 slices with other diseases. The
datasets were obtained from reputed hospitals in Tamil Nadu.

4.2

PREPROCESSING AND SEGMENTATION

Pre-processed lung CT scan images and the results are depicted in
Figure 4.1.

Figure 4.1: Preprocessed and Segmented Image

A CT slice is taken as the input image. The input image is denoised using a
Gaussian filter, which blurs the image to reduce noise. The denoised image is
converted into a binary image using Otsu’s thresholding algorithm and the
lungs are then segmented. The segmented lungs are superimposed on the original
image, from which the regions of interest can be extracted.

4.3

ROI EXTRACTION

Extraction of ROIs from an image is shown in Figure 4.2. Pathology bearing
regions are considered as ROIs. The ROIs are extracted from the segmented lung
image without background or from the superimposed image.

Figure 4.2: ROI Extraction

4.4

FEATURE EXTRACTION

The texture, geometric and shape features extracted from the regions of
interest are shown in Figure 4.3. The texture features are extracted from the
Gray Level Co-occurrence Matrix (GLCM).

Figure 4.3: Extracted Features

4.5

FEATURE SELECTION

Figure 4.4 displays the selected features obtained from the extracted
features by applying bee colony algorithm with forward selection and Figure 4.5
displays the selected features obtained from the extracted features by applying
bee colony algorithm with rough dependency measure.

Figure 4.4: Selected Features by Bee Colony Optimization with Sequential
Forward Selection

Figure 4.5: Selected Features by Bee Colony Optimization with Rough
Dependency Measure

4.6

CLASSIFICATION

The results for the two selected feature subsets, with the accuracy obtained
by classification using the radial basis function neural network, are shown in
Figure 4.6 and Figure 4.7.

Figure 4.6: Results by Bee Colony Optimization with Sequential Forward
Selection

Figure 4.7: Results by Bee Colony Optimization with Rough Dependency
Measure

4.7

EVALUATION METRICS

This section lists the metrics used to evaluate the performance of the
proposed system. The metrics calculated are precision, recall and accuracy.

4.7.1

Performance Evaluation of the Classification System

The performance of the radial basis function neural network classifier on the
given training dataset is evaluated. The prediction outcomes form the entries
of the confusion matrix.
True Positive (TP): a diseased image is correctly identified as diseased.
True Negative (TN): a normal image is correctly identified as normal.
False Positive (FP): a normal image is incorrectly identified as diseased.
False Negative (FN): a diseased image is incorrectly identified as normal.
Accuracy is the number of correct predictions divided by the total number of
samples in the dataset, that is, the proportion of true results in the
population. Accuracy can be calculated by Equation 4.1.
Precision is the number of correct positive predictions divided by the total
number of positive predictions, that is, the proportion of true positives among
all positive results. Precision can be calculated by Equation 4.2.


Recall is the number of correct positive predictions divided by the total number
of actual positives. Recall relates to the test's ability to identify positive
results. For a medical test used to identify a disease, the recall of the test
is the proportion of people who have the disease who test positive for it. The
equation for obtaining recall is shown in Equation 4.3.
Sensitivity is the ability of a test to correctly identify those with the disease
(true positive rate), so it coincides with recall, and specificity is the ability
of the test to correctly identify those without the disease (true negative rate).
Sensitivity and specificity can be obtained by Equation 4.4 and Equation 4.5.
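
The metric definitions in this section (Equations 4.1 to 4.5) can be sketched
in code as follows. This is a minimal illustration, not the evaluation code of
the proposed system. The function name and the counts in the usage example are
hypothetical, and returning 0.0 when a denominator is zero is an assumption
made to keep the sketch simple.

```python
def classification_metrics(tp, tn, fp, fn):
    """Compute the metrics of Equations 4.1 to 4.5 from confusion matrix counts."""

    def ratio(numerator, denominator):
        # Assumption: report 0.0 instead of failing on an empty denominator.
        return numerator / denominator if denominator else 0.0

    return {
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),  # Equation 4.1
        "precision": ratio(tp, tp + fp),                # Equation 4.2
        "recall": ratio(tp, tp + fn),                   # Equation 4.3
        "sensitivity": ratio(tp, tp + fn),              # Equation 4.4
        "specificity": ratio(tn, tn + fp),              # Equation 4.5
    }


# Hypothetical counts chosen only to illustrate the arithmetic.
metrics = classification_metrics(tp=40, tn=45, fp=5, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Recall and sensitivity are computed from the same expression, which makes
explicit that the two names refer to the same quantity.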

\[ \text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{4.1} \]

\[ \text{Precision} = \frac{TP}{TP + FP} \tag{4.2} \]

\[ \text{Recall} = \frac{TP}{TP + FN} \tag{4.3} \]

\[ \text{Sensitivity} = \frac{TP}{TP + FN} \tag{4.4} \]

\[ \text{Specificity} = \frac{TN}{TN + FP} \tag{4.5} \]

4.7.2 Experimental Results

The confusion matrix obtained from the miliary and cavitary TB datasets
is shown in Table 4.1, and the accuracy, precision and recall are given in
Table 4.2.

As Table 4.2 shows, the accuracy obtained from bee colony optimization
with sequential forward selection (93.33%) is higher than the accuracy obtained
from bee colony optimization with rough dependency measure (86.67%).


Table 4.1: Confusion Matrix

Class          Miliary TB    Cavitary TB    Others    Total
Miliary TB     14            5              1         20
Cavitary TB    5             12             3         20

Table 4.2: Results Obtained from the Proposed System

Methodology                      Accuracy    Precision    Recall
Miliary TB using BCO with SFS    93.33%      90%          90%
Miliary TB using BCO with RDM    86.67%      63%          63%


CHAPTER 5
CONCLUSION AND FUTURE WORK

5.1 CONCLUSION

In this proposed work, both cavitary and miliary tuberculosis are
detected from CT scan images. Lung CT scan images are denoised using a
Gaussian filter and segmented using Otsu’s thresholding. The cavity regions
are extracted from the segmented lungs using the region growing technique.
Cavities are considered as regions of interest. The texture, shape and
geometric features are extracted from the regions of interest. Feature
selection is done by bee colony optimization (BCO) with sequential forward
selection and by BCO with rough dependency measure. The selected features are
used to train the radial basis function neural network. Based on the result
and analysis of the classifier, BCO with sequential forward selection has an
accuracy of 93.33%, precision of 90%, recall of 90% and sensitivity of 93%.
The results obtained from BCO with rough dependency measure are accuracy
86.6%, precision 86%, recall 90% and sensitivity 96%. We can conclude that
BCO with sequential forward selection gives higher accuracy than BCO with
rough dependency measure.

5.2 FUTURE WORK

Bee colony optimization has the disadvantage that it does not make
use of secondary information. In future, the performance of the system could
be improved by applying new fitness tests to the algorithm parameters, or by
replacing bee colony optimization with other combinations of preprocessing and
feature selection techniques.

