GENERATION AND PROCESSING OF BIOINFORMATICS DATABASE
Abstract
Positioned at the intersection of Bioinformatics, Computational Biology, and Public Health, this study addresses the challenge of preprocessing genomic data. As the volume of genomic and gene expression data grows, data filtering has become a fundamental technique for organizing records, extracting relevant information, and developing new research hypotheses. In this context, the aim of this work is to generate a gene expression dataset, using database processing and management techniques, for the study of Alzheimer’s disease and for breast cancer subtype classification. This goal was achieved with Python and its associated libraries, drawing on reputable public repositories. Data preparation involved extraction, merging, column organization, and normalization. One of the main achievements was a dedicated function that automates the data preparation pipeline. The practical outcome is a set of cohesive, high-quality databases intended as a resource for the scientific community. Validation of each step demonstrated the effectiveness of the approach and underscores the importance of preprocessing for obtaining reliable results in Bioinformatics.
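The preparation steps named in the abstract (merging, column organization, and normalization, wrapped in a single function) can be illustrated with a minimal sketch. It is not the authors' function. The function name `prepare_expression_dataset`, the column names `sample_id` and `label`, the toy tables, and the choice of min-max scaling are all assumptions made for illustration, since the abstract does not specify the normalization method or the data layout. The sketch assumes pandas is available.

```python
import pandas as pd


def prepare_expression_dataset(expression, metadata,
                               sample_col="sample_id", label_col="label"):
    """Merge, organize, and normalize an expression table (illustrative only)."""
    # Merge: keep only the samples present in both tables.
    merged = expression.merge(
        metadata[[sample_col, label_col]], on=sample_col, how="inner"
    )

    # Column organization: sample id first, gene columns sorted, label last.
    gene_cols = sorted(c for c in merged.columns if c not in (sample_col, label_col))
    merged = merged[[sample_col, *gene_cols, label_col]].copy()

    # Normalization (assumed here: min-max scaling of each gene to [0, 1]).
    genes = merged[gene_cols].astype(float)
    span = (genes.max() - genes.min()).replace(0, 1.0)  # constant genes map to 0
    merged[gene_cols] = (genes - genes.min()) / span
    return merged


# Hypothetical toy tables standing in for data extracted from a public repository.
expression = pd.DataFrame({
    "sample_id": ["s1", "s2", "s3"],
    "GENE_B": [2.0, 4.0, 6.0],
    "GENE_A": [10.0, 10.0, 10.0],
})
metadata = pd.DataFrame({
    "sample_id": ["s1", "s2", "s4"],
    "label": ["case", "control", "case"],
})

prepared = prepare_expression_dataset(expression, metadata)
print(prepared)
```

Keeping every step inside one function means the same sequence can be rerun unchanged on each dataset, which is the kind of automation the abstract describes. A different normalization (for example a log transform or z-score) could be substituted in the last step without changing the rest.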
License

This work is licensed under a Creative Commons Attribution 4.0 International License.