Machine Learning Can Detect Thyroid Cancer – Scientists Discover
Thyroid cancer is an endocrine cancer that develops in the thyroid gland. Much work has gone into improving diagnosis, and thyroidectomy remains the main treatment. For surgery to go well and avoid complications, the doctor needs an accurate preoperative diagnosis.
A team of three scientists, Nan Miles Xi, Lin Wang, and Chuanjia Yang, from Loyola University Chicago and the George Washington University in the United States and Shengjing Hospital of China Medical University in China, proposed a machine learning framework to predict thyroid nodule malignancy based on a novel clinical dataset. The current human evaluation of thyroid nodule malignancy is prone to error and may not provide an accurate preoperative diagnosis. The team used 10-fold cross-validation and bootstrap analysis to quantify and evaluate model performance under uncertainty, together with permutation predictor importance. A comparison of model predictions with expert evaluation shows that the framework outperforms human judgement in predicting thyroid nodule malignancy. The method is accurate and interpretable and, as a result, useful in the early detection of thyroid cancer.
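The bootstrap analysis mentioned above can be illustrated with a minimal sketch. It assumes a simple percentile bootstrap over paired true labels and predictions. The function names, the number of resamples, and the toy labels are all illustrative assumptions, not the study's actual procedure or data.

```python
import random

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def bootstrap_ci(y_true, y_pred, metric=accuracy, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for a metric (hypothetical helper)."""
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    for _ in range(n_resamples):
        # Resample nodules with replacement, keeping label/prediction pairs together.
        idx = [rng.randrange(n) for _ in range(n)]
        stats.append(metric([y_true[i] for i in idx], [y_pred[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_resamples)]
    upper = stats[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper

# Toy labels (1 = malignant, 0 = benign), invented purely for illustration.
y_true = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0] * 10
y_pred = [1, 0, 1, 0, 0, 0, 1, 1, 1, 0] * 10

point = accuracy(y_true, y_pred)
low, high = bootstrap_ci(y_true, y_pred)
print(f"accuracy = {point:.2f}, 95% bootstrap interval = ({low:.2f}, {high:.2f})")
```

The width of the interval gives a rough sense of how much a reported metric could move if a different sample of nodules had been collected.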
The data for this research were collected from 724 patients hospitalised at China Medical University's Shengjing Hospital between 2010 and 2012. All patients underwent thyroidectomy, and the dataset included information on their nodule malignancy, demographic information, ultrasound characteristics, and blood test findings. The average age of the patients was 46.61 years, with a range of 13 to 82 years. There were 200 nodules from male patients (16.23%) and 1032 nodules from female patients (83.77%). The thyroid function assays included free triiodothyronine, free thyroxine, thyroid-stimulating hormone, thyroid peroxidase antibodies, and thyroglobulin antibodies, among others. Based on their echogenicity, all thyroids were divided into two groups: even (89.12%) and uneven (10.88%).
They trained six machine learning models to predict nodule malignancy from the dataset described above: gradient boosting machine, logistic regression, linear discriminant analysis, support vector machines with radial and linear kernels, and random forest. In 10-fold cross-validation, the patients are split into ten groups. The models were trained on the training data and then used to predict nodule malignancy in the held-out test data, and the process was repeated until the models had predicted every patient group and its associated nodules. To limit the variability introduced by random splitting, they repeated the 10-fold cross-validation 10 times. To assess each model's performance, they compared its predictions with the true nodule malignancy and calculated the accuracy, area under the receiver operating characteristic curve, sensitivity, specificity, and precision.
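The evaluation loop described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' code: the fold assignment by patient, the one-feature threshold "model" standing in for the six real classifiers, and the synthetic data are all assumptions made for the example. The area under the ROC curve is left out because it needs predicted scores rather than hard labels.

```python
import random
from collections import defaultdict

def grouped_folds(patient_ids, k=10, seed=0):
    """Assign whole patients to k folds so one patient's nodules stay together."""
    patients = sorted(set(patient_ids))
    random.Random(seed).shuffle(patients)
    fold_of = {p: i % k for i, p in enumerate(patients)}
    folds = defaultdict(list)
    for row, p in enumerate(patient_ids):
        folds[fold_of[p]].append(row)
    return [folds[i] for i in range(k)]

def ratio(num, den):
    """Division that returns 0.0 when the denominator is zero."""
    return num / den if den else 0.0

def binary_metrics(y_true, y_pred):
    """Confusion-matrix metrics with 1 = malignant, 0 = benign."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    precision, sensitivity = ratio(tp, tp + fp), ratio(tp, tp + fn)
    return {
        "accuracy": ratio(tp + tn, len(pairs)),
        "sensitivity": sensitivity,
        "specificity": ratio(tn, tn + fp),
        "precision": precision,
        "f1": ratio(2 * precision * sensitivity, precision + sensitivity),
    }

def fit_threshold(X_train, y_train):
    """Placeholder model: midpoint between the class means of feature 0."""
    pos = [x[0] for x, t in zip(X_train, y_train) if t == 1]
    neg = [x[0] for x, t in zip(X_train, y_train) if t == 0]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2

def predict_threshold(threshold, x):
    return int(x[0] > threshold)

def repeated_grouped_cv(X, y, patient_ids, fit, predict, k=10, repeats=10):
    """Run k-fold cross-validation `repeats` times with different random splits."""
    results = []
    for r in range(repeats):
        y_pred = [None] * len(y)
        for test_rows in grouped_folds(patient_ids, k, seed=r):
            held_out = set(test_rows)
            train_rows = [i for i in range(len(y)) if i not in held_out]
            model = fit([X[i] for i in train_rows], [y[i] for i in train_rows])
            for i in test_rows:
                y_pred[i] = predict(model, X[i])
        results.append(binary_metrics(y, y_pred))
    return results

# Synthetic data for illustration only: 60 patients with 1 to 3 nodules each.
rng = random.Random(42)
X, y, patient_ids = [], [], []
for patient in range(60):
    for _ in range(rng.randint(1, 3)):
        label = rng.randint(0, 1)
        X.append([rng.gauss(2.0 * label, 1.0)])
        y.append(label)
        patient_ids.append(patient)

results = repeated_grouped_cv(X, y, patient_ids, fit_threshold, predict_threshold)
mean_accuracy = sum(r["accuracy"] for r in results) / len(results)
print(f"mean accuracy over {len(results)} repeats: {mean_accuracy:.3f}")
```

Splitting by patient rather than by nodule keeps nodules from the same person out of both the training and test sets at once, which would otherwise make the estimate look better than it is.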
The machine learning models perform well in terms of prediction. The best-performing model, random forest, outperformed expert judgement by 12% on accuracy and 14% on F1 score, the two overall metrics. One explanation for the improved prediction is that machine learning models can capture complicated nonlinear interactions between variables. Such interactions are inherent in the dataset and are difficult for people to recognise. The models are also more aggressive in predicting malignant nodules. As a consequence, the machine learning model may be used to detect thyroid cancer. The variable importance analysis revealed important factors in the diagnosis of malignant thyroid nodules. These factors corroborate prior results from clinical and modelling investigations. According to the findings, nodules with increased blood flow are more likely to be cancerous.
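To make the idea of permutation predictor importance concrete, here is a minimal sketch. It shuffles one predictor at a time and records how far accuracy falls; a large drop means the model relies on that predictor. The toy model, the feature names `blood_flow` and `noise`, and the synthetic data are hypothetical and are not taken from the study.

```python
import random

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def permutation_importance(predict, X, y, n_repeats=20, seed=0):
    """Mean drop in accuracy when each feature column is shuffled in turn."""
    rng = random.Random(seed)
    baseline = accuracy(y, [predict(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between feature j and the label
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            permuted = accuracy(y, [predict(row) for row in X_perm])
            drops.append(baseline - permuted)
        importances.append(sum(drops) / n_repeats)
    return importances

# Toy setup, for illustration only: the label depends on feature 0 alone.
rng = random.Random(1)
X = [[rng.random(), rng.random()] for _ in range(200)]
y = [int(row[0] > 0.5) for row in X]

def toy_model(row):
    """Stand-in for a trained classifier; it only looks at feature 0."""
    return int(row[0] > 0.5)

importances = permutation_importance(toy_model, X, y)
for name, value in zip(["blood_flow", "noise"], importances):
    print(f"{name}: mean accuracy drop = {value:.3f}")
```

In this toy case the second feature gets an importance of zero because the stand-in model never uses it, while shuffling the first feature pushes accuracy towards chance.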
More data from bigger patient cohorts would be required to better study the real influence of nodule location. Several other topics also need further investigation. First, the current dataset includes 724 individuals with a total of 1232 nodules. Although the sample size is not small, collecting additional data will broaden the range of patients and nodules. A model trained on a larger and more diverse dataset will generalise better to new patients in real-world use. Second, the ultrasound-related characteristics in the dataset were extracted from the original ultrasonography by sonographers. Such an extraction may miss critical elements that are only visible in the raw images. A deep convolutional neural network could be used to capture that information directly from the ultrasound images and improve model performance. Third, the modelling process could include new data sources that go beyond typical ultrasound and blood tests. Single-cell RNA-sequencing (scRNA-seq), a cutting-edge sequencing tool that reveals genome-wide gene expression at the single-cell level, may be used to analyse and compare thyroid nodule transcriptomes. scRNA-seq may be able to improve the model's performance in the same way that it has in other cancer diagnostics and personalised treatment.
This study's machine learning framework can also be used to assess the quality of clinical data for thyroid cancer diagnosis. The machine learning models can be trained on a new dataset in the same way as in this research. A high-quality dataset should include enough information to allow algorithms to reliably predict nodule malignancy. This means that the models' predictive performance on another dataset can serve as a proxy for that dataset's quality.