(Discussion) Which dataset should I use for breast cancer classification based on homologous recombination deficiency?

https://preview.redd.it/gne7m3lzk1id1.png?width=972&format=png&auto=webp&s=db84e0d9bdbcd2ba756e085c59691bdf5391c937

I’ve decided to work on a thesis and a paper on a similar topic (using ML for something related to cancer and genetic data) and I think this will be a good reference.

The author, Mia Josephine Jeffris, basically chose a transcriptome dataset and used SVC to classify cancers into those with homologous recombination deficiency and those without. I plan to use another model and compare its performance with basic models (random forest, decision tree, SVM, etc.) and hopefully propose a new model that can classify with fewer features while producing SOTA metrics (AUC and accuracy).
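To make the "fewer features" goal concrete, here is a rough sketch of how I imagine checking it, assuming scikit-learn. The data is synthetic (`make_classification` standing in for a real expression matrix), the helper `auc_with_k_features` and the feature counts are hypothetical, and `SelectKBest` is just one possible selection method, not the one used in the reference.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for a (samples x genes) expression matrix; NOT real data.
X, y = make_classification(
    n_samples=200, n_features=500, n_informative=20, random_state=0
)


def auc_with_k_features(k):
    """Mean cross-validated AUC of a linear SVC using the top-k features."""
    # Selection lives inside the pipeline so it is refit on each training
    # fold, which avoids leaking test-fold information into the selection.
    pipe = Pipeline([
        ("scale", StandardScaler()),
        ("select", SelectKBest(f_classif, k=k)),
        ("clf", SVC(kernel="linear")),
    ])
    cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
    return cross_val_score(pipe, X, y, cv=cv, scoring="roc_auc").mean()


feature_counts = [10, 50, 200]  # hypothetical values to compare
auc_by_k = {k: auc_with_k_features(k) for k in feature_counts}
for k, auc in auc_by_k.items():
    print(f"k={k:>3}  mean AUC={auc:.3f}")
```

If AUC stays roughly flat as `k` shrinks, that would support the claim that a smaller gene panel is enough. On real data the numbers will of course differ.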

My idea is: find a transcriptome dataset, do some exploration, preparation, and scaling, then train models and test their performance. I also plan to publish this in a relevant journal, so the dataset needs to be high-dimensional, recent, and somewhat large.
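As a minimal sketch of that workflow, assuming scikit-learn: the snippet below scales the data, trains the baseline models, and reports cross-validated AUC and accuracy. The dataset is synthetic and only a placeholder for whatever transcriptome dataset I end up using, and the hyperparameters are library defaults, not tuned choices.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Placeholder data: 200 samples x 300 features, binary label
# (think HRD vs. non-HRD). NOT real expression data.
X, y = make_classification(
    n_samples=200, n_features=300, n_informative=15, random_state=42
)

# Baselines to compare. Only the SVM needs scaling; tree-based models
# are insensitive to feature scale.
models = {
    "svm": make_pipeline(StandardScaler(), SVC()),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=42),
    "decision_tree": DecisionTreeClassifier(random_state=42),
}

# Same stratified folds for every model so the comparison is fair.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

results = {}
for name, model in models.items():
    scores = cross_validate(model, X, y, cv=cv, scoring=["roc_auc", "accuracy"])
    results[name] = {
        "auc": scores["test_roc_auc"].mean(),
        "accuracy": scores["test_accuracy"].mean(),
    }

for name, metrics in results.items():
    print(f"{name:>14}  AUC={metrics['auc']:.3f}  acc={metrics['accuracy']:.3f}")
```

Any new model would be added to `models` and evaluated on the same folds, so its metrics are directly comparable to the baselines.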

I am new to the topic. Do you have any tips, or know of recent papers on related ideas? I would be very grateful for the help.

submitted by /u/icy_end_7