Bioinformatics Cheatsheet

1) Genomic Data File Types

Acronym  Format Name                       File Extension  Encoding  Notes
-------  --------------------------------  --------------  --------  -----
FASTA    -                                 .fa .fasta      Text      First used by the FASTA software package.
FASTQ    -                                 .fq .fastq      Text      Adds per-base quality values to FASTA-style sequence records.
GenBank  -                                 .gb .gbk        Text      Developed by NCBI for the GenBank project.
SAM      Sequence Alignment/Map            .sam            Text      Tab-delimited text format for read alignments.
BAM      Binary Alignment/Map              .bam            Binary    Same information as SAM but in binary format.
BAI      Binary Alignment/Map Index File   .bai            Binary    Index (a table of contents) for a BAM file.
VCF      Variant Call Format               .vcf            Text      Describes variations from a reference sequence.
GTF      Gene Transfer Format              .gtf            Text      Based on GFF version 2; old but still popular.
GFF      General Feature Format            .gff            Text      Predecessor of GTF and GFF3.
GFF3     General Feature Format Version 3  .gff3           Text      An enhancement to GFF.
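
Example: reading FASTA and FASTQ records

The difference between the two sequence formats is easiest to see in code. The sketch below is a minimal illustration, not a production parser. The function names read_fasta and read_fastq are hypothetical, the FASTQ reader assumes strict four-line records, and the quality decoding assumes Phred+33 encoding. For real data, an established library such as Biopython (Bio.SeqIO) is usually the safer choice.

```python
from io import StringIO


def read_fasta(handle):
    """Yield (header, sequence) pairs from FASTA text.

    A record starts at a '>' line; the sequence may span several lines.
    """
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def read_fastq(handle):
    """Yield (header, sequence, scores) from four-line FASTQ records.

    Assumption: each record is exactly four lines ('@' header, sequence,
    '+' separator, quality string). Reading stops at the first empty
    line or at the end of the input.
    """
    while True:
        header = handle.readline().strip()
        if not header:
            break
        sequence = handle.readline().strip()
        handle.readline()  # '+' separator line, ignored here
        quality = handle.readline().strip()
        # Assumption: Phred+33 encoding (ASCII code minus 33).
        scores = [ord(ch) - 33 for ch in quality]
        yield header[1:], sequence, scores


# Tiny in-memory examples (made-up data, for illustration only).
fasta_text = ">seq1 example\nACGT\nACGT\n"
fastq_text = "@read1\nACGT\n+\nIIII\n"

fasta_records = list(read_fasta(StringIO(fasta_text)))
fastq_records = list(read_fastq(StringIO(fastq_text)))

print(fasta_records)
print(fastq_records)
```

The FASTA record carries only a header and a sequence, while the FASTQ record also carries one quality score per base.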

2) Sources of Genomic Data Files

NCBI Genome: https://www.ncbi.nlm.nih.gov/genome/

Ensembl: https://www.ensembl.org/

International Genome Sample Resource (1000 Genomes Project data): https://www.internationalgenome.org/data/