Sources for Datasets

Grand Challenge hosts various challenges in the medical domain and provides curated datasets related to those challenges. It serves as a platform for researchers to collaborate and develop solutions for complex medical problems.

  1. International Cancer Genome Consortium (ICGC)
    The International Cancer Genome Consortium aims to decipher the genomic changes underlying different types of cancers. It provides access to a large repository of cancer genomic data, facilitating research on cancer biology and personalized medicine.
  2. OpenNeuro
    OpenNeuro is a platform that hosts open-access datasets in the field of neuroscience. It provides a range of neuroimaging datasets, including fMRI, EEG, and MEG, allowing researchers to investigate brain function and connectivity.
  3. DrugBank
    DrugBank is a comprehensive resource that offers pharmaceutical and drug-related datasets. It provides information on drug structures, targets, interactions, and clinical data, supporting drug discovery and pharmacological research.
  4. European Bioinformatics Institute (EMBL-EBI)
    The European Bioinformatics Institute (EMBL-EBI) hosts a variety of biological data resources. It includes databases such as Ensembl for genome annotations, ArrayExpress for gene expression data, and UniProt for protein sequences, among others.
  5. CORD-19
    CORD-19 is a dataset specifically focused on COVID-19 research. It consists of scholarly articles related to the coronavirus, enabling researchers to study the virus, its transmission, and potential treatments.
  6. World Health Organization (WHO) Global Health Observatory Data
    The WHO Global Health Observatory Data is a repository of global health statistics and indicators. It covers various health-related topics, including disease prevalence, mortality rates, healthcare infrastructure, and more.
  7. Human Mortality Database
    The Human Mortality Database provides detailed mortality and population data from various countries. It includes demographic information, cause of death statistics, and life tables, enabling researchers to study mortality trends and patterns.
  8. NOAA National Centers for Environmental Information
    The National Centers for Environmental Information (NCEI) provides access to a wide range of environmental datasets. It includes climate data, weather observations, oceanographic data, and geophysical information, supporting research on climate change, weather patterns, and natural hazards.
  9. Urban Sound Dataset
    The Urban Sound Dataset offers audio recordings of urban environments. It includes various urban sounds, such as car horns, sirens, and street music. This dataset is commonly used for research in audio classification and environmental sound analysis.
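
The Human Mortality Database entry in the list above mentions life tables. As a rough illustration of what such a table encodes, the sketch below builds a toy one from single-year death probabilities. The function name `life_table`, the starting cohort (radix) of 100,000, the assumption that deaths fall halfway through each year, and the input probabilities are all illustrative assumptions; this is not the database's published method, and the numbers are not real data.

```python
def life_table(qx, radix=100_000.0):
    """Build a toy single-year life table from death probabilities.

    qx[i] is the probability of dying between exact ages i and i + 1.
    Assumption: deaths occur, on average, halfway through each interval.
    """
    # Survivors at each exact age, starting from the radix.
    lx = [radix]
    for q in qx:
        lx.append(lx[-1] * (1.0 - q))

    # Person-years lived within each one-year interval.
    Lx = [(lx[i] + lx[i + 1]) / 2.0 for i in range(len(qx))]

    # Person-years remaining at and above each age (reverse cumulative sum).
    Tx = []
    running = 0.0
    for person_years in reversed(Lx):
        running += person_years
        Tx.append(running)
    Tx.reverse()

    # Remaining life expectancy at each age.
    ex = [Tx[i] / lx[i] if lx[i] > 0 else 0.0 for i in range(len(qx))]
    return {"lx": lx[:-1], "Lx": Lx, "Tx": Tx, "ex": ex}


# Made-up probabilities for a four-age toy population (not real mortality data).
toy_qx = [0.1, 0.2, 0.5, 1.0]
table = life_table(toy_qx)
print(round(table["ex"][0], 2))  # life expectancy at age 0 in the toy population
```

Real life tables typically use finer assumptions about when deaths occur within an interval and treat the oldest, open-ended age group separately, so this sketch is only a reading aid for the columns such tables contain.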

  1. MIMIC
    The Medical Information Mart for Intensive Care (MIMIC) is a freely available critical care database. It includes de-identified health data from over 60,000 ICU patients, allowing researchers to explore clinical outcomes and develop predictive models.
  2. PhysioNet
    PhysioNet is a resource for physiological signal processing and analysis. It provides a wide range of datasets, including electrocardiograms (ECG), blood pressure recordings, sleep data, and more. The datasets aim to advance research in cardiovascular and neurophysiological systems.
  3. Cancer Imaging Archive (TCIA)
    The Cancer Imaging Archive is a repository of medical images, specifically focused on cancer-related imaging data. It offers a vast collection of images, such as computed tomography (CT), magnetic resonance imaging (MRI), and positron emission tomography (PET), for researchers working in oncology.
  4. Medical ImageNet
    Medical ImageNet is a dataset specifically designed for medical image analysis, with a particular focus on COVID-19. It contains a large collection of chest X-rays and computed tomography (CT) scans to aid researchers in developing diagnostic and prognostic models.
  5. ADNI (Alzheimer’s Disease Neuroimaging Initiative)
    ADNI is a research initiative that provides access to a vast collection of neuroimaging and biomarker data related to Alzheimer’s disease. The dataset includes MRI scans, PET scans, genetic information, and cognitive assessments, enabling researchers to study the progression and early detection of Alzheimer’s.
  6. Radiology Data from the RSNA (Radiological Society of North America)
    The RSNA offers datasets focused on radiology and medical imaging. It includes various datasets, such as chest X-rays, mammograms, and computed tomography (CT) scans, allowing researchers to develop and evaluate algorithms for image interpretation and diagnosis.
  7. UCI ML Breast Cancer Wisconsin (Diagnostic) Dataset
    The UCI ML Breast Cancer Wisconsin (Diagnostic) dataset provides diagnostic information about breast cancer. It contains features computed from digitized images of fine needle aspirates of breast masses, describing cell nuclei properties such as radius, texture, perimeter, and smoothness. The dataset is commonly used for classification tasks related to breast cancer diagnosis.
  8. Open-i
    Open-i is an open-access biomedical image search engine. It provides a vast collection of medical images, including X-rays, CT scans, and histopathology images. Researchers can utilize this resource to explore and analyze various medical conditions.
  9. HIV Databases
    The HIV Databases host a comprehensive collection of HIV molecular data, including sequences, alignments, structures, and immunological data. These datasets are valuable for researchers working on HIV/AIDS-related studies and vaccine development.
  10. Autism Brain Imaging Data Exchange (ABIDE)
    ABIDE is a public repository that curates and shares neuroimaging data related to Autism Spectrum Disorder (ASD). It includes MRI scans from individuals with ASD and typically developing individuals, allowing researchers to investigate the neural basis of ASD.
  11. ClinicalTrials.gov
    ClinicalTrials.gov is a comprehensive registry and database of clinical trials conducted worldwide. It provides access to a wide range of clinical trial datasets, enabling researchers to explore treatment outcomes, adverse events, and other clinical parameters.
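
For the Breast Cancer Wisconsin (Diagnostic) entry in the list above, a minimal loading sketch may help when preparing the data for a classification task. It assumes the raw file follows the UCI layout of one comma-separated row per sample with no header: an ID, a diagnosis code (M for malignant, B for benign), then 30 numeric features. The function name `parse_wdbc` is hypothetical, and the two sample rows are made-up placeholders rather than real records.

```python
import csv
import io


def parse_wdbc(text):
    """Split WDBC-style CSV text into labels and feature rows.

    Labels are 1 for malignant (M) and 0 for benign (B).
    Assumed column layout: ID, diagnosis, then the numeric features.
    """
    labels, features = [], []
    for row in csv.reader(io.StringIO(text)):
        if not row:  # skip blank lines
            continue
        diagnosis = row[1].strip()
        if diagnosis not in ("M", "B"):
            raise ValueError(f"unexpected diagnosis code: {diagnosis!r}")
        labels.append(1 if diagnosis == "M" else 0)
        features.append([float(value) for value in row[2:]])
    return labels, features


# Two made-up rows in the assumed layout (placeholder values, not patient data).
sample = "\n".join([
    ",".join(["1001", "M"] + ["2.0"] * 30),
    ",".join(["1002", "B"] + ["1.0"] * 30),
])
labels, features = parse_wdbc(sample)
print(labels, len(features[0]))
```

The resulting `labels` and `features` lists can then be handed to whichever classifier is being evaluated; to read a downloaded file instead, its text contents would be passed to the same function.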


  1. OpenfMRI
    OpenfMRI is a resource for sharing and accessing functional MRI (fMRI) datasets. It hosts a collection of publicly available fMRI data, allowing researchers to analyze brain activity and develop models related to cognitive processes and mental disorders.
  2. Alzheimer’s Disease Genetics Consortium (ADGC)
    The ADGC offers datasets related to genetics research on Alzheimer’s disease. It includes genome-wide association study (GWAS) data, exome sequencing data, and phenotypic information. Researchers can utilize these datasets to investigate the genetic factors associated with Alzheimer’s disease.
  3. Sleep-EDF
    The Sleep-EDF dataset contains polysomnographic recordings of sleep patterns. It includes EEG, EOG, and EMG signals recorded from individuals during sleep. This dataset serves as a valuable resource for researchers studying sleep disorders and related phenomena.
  4. Human Connectome Project (HCP)
    The HCP provides a comprehensive collection of neuroimaging data, including structural and functional MRI scans, along with behavioral and demographic information. It enables researchers to study the brain’s connectivity and understand its relationship to behavior and cognition.