| Format Acronym | Format Name | File Extension | Encoding | Notes |
|---|---|---|---|---|
| FASTA | FASTA format | .fa .fasta | Text | First used by the FASTA software. Holds sequences only. Commonly used for reference genomes. |
| FASTQ | FASTQ format | .fq .fastq | Text | Adds per-base quality values to FASTA-style sequences. Usually the output of sequencers. |
| GenBank | GenBank flat file format | .gb .gbk | Text | Developed by NCBI for the GenBank project. |
| SAM | Sequence Alignment Map | .sam | Text | Tab-delimited text format for read alignments against a reference. |
| BAM | Binary Alignment Map | .bam | Binary | Same information as SAM but in binary format. |
| BAI | Binary Alignment/Map Index File | .bai | Binary | A table of contents for a BAM file. |
| VCF | Variant Call Format | .vcf | Text | Information about variations from a reference. |
| GTF | Gene Transfer Format | .gtf | Text | Based on GFF version 2. Old but still popular. |
| GFF | General Feature Format | .gff | Text | The original feature format, from which GTF is derived. |
| GFF3 | General Feature Format Version 3 | .gff3 | Text | An enhancement to GFF. |
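
To make the difference between the two sequence formats in the table concrete, here is a minimal sketch that reads FASTA and FASTQ records using only the Python standard library. The function names `read_fasta` and `read_fastq` and the sample records are hypothetical, chosen for illustration. The FASTQ reader assumes four lines per record and Phred+33 quality encoding, which is common but not universal.

```python
import io


def read_fasta(handle):
    """Yield (header, sequence) pairs from a FASTA text stream."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            # Sequences may be wrapped over several lines.
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def read_fastq(handle):
    """Yield (header, sequence, scores) from a four-line-per-record FASTQ stream."""
    while True:
        header = handle.readline().strip()
        if not header:
            break
        sequence = handle.readline().strip()
        handle.readline()  # the '+' separator line is skipped
        quality = handle.readline().strip()
        # Assumption: Phred+33 encoding, so the score is the ASCII code minus 33.
        scores = [ord(char) - 33 for char in quality]
        yield header[1:], sequence, scores


# Hypothetical sample data, not taken from a real dataset.
fasta_text = ">seq1 example\nACGT\nTTGA\n"
fastq_text = "@read1\nACGT\n+\nII5!\n"

fasta_records = list(read_fasta(io.StringIO(fasta_text)))
fastq_records = list(read_fastq(io.StringIO(fastq_text)))

print(fasta_records)
print(fastq_records)
```

For real data an established parser such as Biopython's `SeqIO` is usually a safer choice, since it handles edge cases that this sketch ignores.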

| URL | Description |
|---|---|
| https://www.ncbi.nlm.nih.gov/genome/ | Genomic data available from NIH NCBI Datasets. |
| https://www.ensembl.org/ | Ensembl is a genome browser for vertebrate genomes. |
| https://www.internationalgenome.org/data/ | The International Genome Sample Resource (IGSR) and the 1000 Genomes Project. |
| https://b1mg-project.eu/ | The Beyond 1 Million Genomes (B1MG) project is helping to create a network of genetic and clinical data across Europe. |
| https://duos.org/ | Data Use Oversight System (DUOS). |

- Data-processing website with built-in tools.
- Workflow management, Java and Groovy based.
- Workflow management, Python based.
- A place to share your Shiny applications online, R based.

- OHDSI: Observational Health Data Sciences and Informatics, pronounced ‘Odyssey’.
- OMOP: Observational Medical Outcomes Partnership.
- OMOP CDM: OMOP Common Data Model.