Format Acronym | Format Name | File Extension | Encoding | Notes |
---|---|---|---|---|
FASTA | | .fa .fasta | Text | First used by the FASTA software. Holds sequences only. Commonly used for reference genomes. |
FASTQ | | .fq .fastq | Text | Adds per-base quality values to FASTA-style sequences. Usually the output of sequencers. |
GenBank | | .gb .gbk | Text | Developed by NCBI for the GenBank project. |
SAM | Sequence Alignment/Map | .sam | Text | Tab-delimited text format for read alignments against a reference. |
BAM | Binary Alignment/Map | .bam | Binary | Same information as SAM, in a compressed binary format. |
BAI | Binary Alignment/Map Index File | .bai | Binary | A table of contents for a BAM file. |
VCF | Variant Call Format | .vcf | Text | Information about variations from a reference. |
GTF | Gene Transfer Format | .gtf | Text | Derived from GFF version 2. Old but still popular. |
GFF | General Feature Format | .gff | Text | Tab-delimited format for genomic features; the basis for GTF. |
GFF3 | General Feature Format Version 3 | .gff3 | Text | An enhancement to GFF. |
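
To make the difference between the first two formats in the table concrete, the sketch below parses one FASTA stream and one FASTQ stream from in-memory strings using only the Python standard library. The function names `read_fasta` and `read_fastq`, the sample records, and the assumption of unwrapped four-line FASTQ records with Phred+33 quality encoding are illustrative choices, not part of any particular tool. Real pipelines would more likely rely on an established parsing library.

```python
import io


def read_fasta(handle):
    """Yield (header, sequence) pairs from a FASTA-formatted text stream."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            # Sequences may be wrapped over several lines.
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def read_fastq(handle):
    """Yield (header, sequence, qualities) from a FASTQ text stream.

    Assumption: four lines per record and Phred+33 quality encoding.
    """
    while True:
        header = handle.readline().strip()
        if not header:
            break  # end of input
        sequence = handle.readline().strip()
        handle.readline()  # '+' separator line, ignored
        quality = handle.readline().strip()
        # Phred+33: quality score = ASCII code of the character minus 33.
        qualities = [ord(ch) - 33 for ch in quality]
        yield header[1:], sequence, qualities


# Hypothetical sample data, for illustration only.
fasta_text = ">seq1 example\nACGT\nTTGA\n>seq2\nGGCC\n"
fastq_text = "@read1\nACGT\n+\nII5!\n"

fasta_records = list(read_fasta(io.StringIO(fasta_text)))
fastq_records = list(read_fastq(io.StringIO(fastq_text)))
```

With the sample data, `fasta_records` holds two (header, sequence) pairs, and `fastq_records` holds one record whose third element is a list of per-base quality scores, which is the information FASTQ adds on top of FASTA.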

URL | Description |
---|---|
https://www.ncbi.nlm.nih.gov/genome/ | Genomic data available from NIH NCBI Datasets. |
https://www.ensembl.org/ | Ensembl is a genome browser for vertebrate genomes. |
https://www.internationalgenome.org/data/ | The International Genome Sample Resource (IGSR) and the 1000 Genomes Project. |
https://b1mg-project.eu/ | The Beyond 1 Million Genomes (B1MG) project is helping to create a network of genetic and clinical data across Europe. |
https://duos.org/ | Data Use Oversight System (DUOS).

Description |
---|
Data processing website with built-in tools. |
Workflow management, Java- and Groovy-based. |
Workflow management, Python-based. |
A place to share your Shiny applications online, R-based. |