breast cancer detection with supervised machine learning

Date | Aug 2015
Technologies | R · Python · scikit-learn · caret

In 2008, approximately 1,380,000 new cases of breast cancer were diagnosed worldwide. Today, breast cancer is the most common tumor among women. In 2002, it caused 411,000 deaths worldwide, making it the leading cause of cancer death among women and accounting for 14% of all deaths from malignant tumors. Breast cancer patients with the same stage of disease can have markedly different treatment responses and overall outcomes. Despite significant improvements in cancer survival over the last 20 years, predicting the outcome of treatment remains an open question.

In this work, the gene expression data of 78 patients were summarized using a set of elementary flux modes (EFMs). In particular, a sufficiently large set of EFMs was calculated for the production or consumption of each metabolite that the cell can excrete or absorb. The gene expression datasets obtained from the 78 breast cancer patients were then projected onto the EFMs, and the mean p-value of each metabolite across all the EFMs was calculated. This procedure yields a matrix whose entry in the i-th row and j-th column is the mean p-value of the i-th metabolite in the j-th patient. These p-values are used as features for the final classification.
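The paragraph above does not specify how each EFM's p-value is computed from the projected expression data, nor which classifier or outcome labels were used. The following is a minimal sketch under those gaps: it assumes the projection step has already produced one p-value per EFM per patient, and that "mean p-value for each metabolite" means averaging over the EFMs calculated for that metabolite. It then builds the metabolite × patient matrix and runs a cross-validated classifier on it. The function name `mean_pvalue_matrix`, the logistic-regression classifier, the binary labels and the array sizes are hypothetical choices for illustration, and the inputs are random placeholders rather than the study's data.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def mean_pvalue_matrix(efm_pvalues, efm_metabolite):
    """Average EFM-level p-values into a (metabolites x patients) matrix.

    efm_pvalues:    shape (n_efms, n_patients); p-value of each EFM in each
                    patient, ASSUMED to come from the projection step.
    efm_metabolite: shape (n_efms,); ID of the metabolite each EFM was
                    calculated for (hypothetical encoding).
    Returns (metabolite_ids, P) where P[i, j] is the mean p-value of
    metabolite_ids[i] in patient j.
    """
    efm_pvalues = np.asarray(efm_pvalues, dtype=float)
    efm_metabolite = np.asarray(efm_metabolite)
    metabolite_ids = np.unique(efm_metabolite)  # sorted; each has >= 1 EFM
    P = np.vstack([efm_pvalues[efm_metabolite == m].mean(axis=0)
                   for m in metabolite_ids])
    return metabolite_ids, P


# Random placeholder data (NOT the study's data): 78 patients, 20 metabolites,
# 50 EFMs per metabolite.
rng = np.random.default_rng(0)
n_patients, n_metabolites, efms_per_metabolite = 78, 20, 50
efm_metabolite = np.repeat(np.arange(n_metabolites), efms_per_metabolite)
efm_pvalues = rng.uniform(size=(efm_metabolite.size, n_patients))
labels = np.tile([0, 1], n_patients // 2)  # hypothetical binary outcome

metabolite_ids, P = mean_pvalue_matrix(efm_pvalues, efm_metabolite)
X = P.T  # scikit-learn expects one row per sample, i.e. per patient

# Hypothetical classifier choice: standardized logistic regression, 5-fold CV.
clf = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
scores = cross_val_score(clf, X, labels, cv=5)
print(f"mean CV accuracy: {scores.mean():.2f}")
```

Note the transpose: the matrix described in the text has metabolites as rows and patients as columns, while scikit-learn treats rows as samples, so the patients must become the rows before fitting. Because the placeholder inputs are random, the cross-validated accuracy should stay near chance; with real data, `labels` would hold whatever clinical label the classification targets, and the classifier would be chosen and tuned accordingly.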