Glossary

FASTA

FASTA is a widely used text-based format in bioinformatics for representing nucleotide or amino acid sequences. Each record starts with a single-line description, the identifier line, which begins with a greater-than symbol (“>”) that separates it from the sequence data. The lines that follow contain the sequence itself, in which each letter represents a nucleotide or an amino acid.
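To illustrate the format, the following minimal Python sketch reads FASTA-formatted text into (identifier line, sequence) pairs. It assumes that a sequence may be wrapped over several lines and that blank lines can be ignored; the function name read_fasta and the example records are hypothetical. Dedicated libraries such as Biopython provide more complete parsers.

```python
import io
from typing import Iterator, TextIO


def read_fasta(handle: TextIO) -> Iterator[tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted text.

    The header is the identifier line without its leading '>';
    wrapped sequence lines are joined into one string.
    """
    header = None
    chunks: list[str] = []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            # A new record begins: emit the previous one, if any.
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:]
            chunks = []
        else:
            if header is None:
                raise ValueError("sequence data found before the first '>' line")
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


# Hypothetical example records; the first sequence is wrapped over two lines.
example = """>seq1 hypothetical protein fragment
MKTAYIAKQR
QISFVKSHFS
>seq2 hypothetical DNA fragment
ATGGCGTAC
"""

records = list(read_fasta(io.StringIO(example)))
for header, seq in records:
    print(header, len(seq))
```

Each record is reported with its full sequence length (20 residues and 9 bases here), regardless of how the lines were wrapped.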

The format takes its name from the FASTA program, a sequence similarity search tool. The program uses a “hashing” strategy to find matching short stretches of identical residues of length k, called “k-tuples” (comparable to the “words” in BLAST). Matching tuples that have the same relative position in their respective sequences, that is, the same offset between the query position and the database position, are grouped together. The program then looks for regions of high similarity between the query and the database sequence, which are regions dense in such tuple matches. A substitution matrix is then used to score these regions and judge whether they are good matches. Neighboring high-scoring segments are joined together to form a single alignment.
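The first two steps above, hashing k-tuples and grouping matches that share the same offset (the same “diagonal”), can be sketched in Python as follows. This is a simplified illustration, not the FASTA program's implementation: the function names are hypothetical, k = 2 is chosen only to suit the small example, and the later steps (rescoring regions with a substitution matrix and joining segments) are omitted.

```python
from collections import defaultdict


def ktuple_diagonals(query: str, target: str, k: int = 2) -> dict[int, list[int]]:
    """Find identical k-tuples shared by two sequences, grouped by diagonal.

    The diagonal of a hit is (target position - query position); hits on
    the same diagonal have the same relative position in both sequences.
    """
    # Hashing step: map every k-tuple of the query to the positions where it occurs.
    index: dict[str, list[int]] = defaultdict(list)
    for i in range(len(query) - k + 1):
        index[query[i:i + k]].append(i)

    # Scan the target and record each shared k-tuple under its diagonal.
    diagonals: dict[int, list[int]] = defaultdict(list)
    for j in range(len(target) - k + 1):
        for i in index.get(target[j:j + k], []):
            diagonals[j - i].append(i)
    return dict(diagonals)


def best_diagonal(diagonals: dict[int, list[int]]) -> tuple[int, list[int]]:
    """Return the diagonal with the most k-tuple hits and its query positions."""
    return max(diagonals.items(), key=lambda item: len(item[1]))


hits = ktuple_diagonals("MKTAYIAK", "GGMKTAYIAKAY", k=2)
print(hits)
print(best_diagonal(hits))
```

In this example, seven shared 2-tuples fall on diagonal 2, marking the region where the two sequences align, while the single hit on diagonal 7 is an isolated chance match that a dense-region search would ignore.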

BLAST

Basic Local Alignment Search Tool

BLAST (Basic Local Alignment Search Tool) is an algorithm that compares an input protein or nucleotide sequence (the query) against a library or database of sequences and reports matches ranked by their statistical significance.

The statistical significance of each match is described by its E-value, which estimates how many matches with at least that score would be expected to occur by chance in a search of a database of that size. The lower the E-value, the more significant the match.
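Under the Karlin-Altschul statistics used by BLAST, the E-value of a score S is E = K x m x n x exp(-lambda x S), where m and n are the lengths of the query and the database, and K and lambda are constants that depend on the scoring system. The Python sketch below evaluates this formula and the equivalent bit-score form, E = m x n x 2^(-S'). The parameter values and sequence lengths are placeholders chosen for illustration, and the sketch ignores the effective-length correction that BLAST applies to m and n.

```python
import math

# Placeholder Karlin-Altschul parameters (assumed for illustration only);
# real values depend on the substitution matrix and gap penalties.
LAMBDA = 0.267
K = 0.041


def e_value(raw_score: float, query_len: int, db_len: int,
            lam: float = LAMBDA, k: float = K) -> float:
    """Expected number of chance hits scoring >= raw_score: E = K*m*n*exp(-lambda*S)."""
    return k * query_len * db_len * math.exp(-lam * raw_score)


def bit_score(raw_score: float, lam: float = LAMBDA, k: float = K) -> float:
    """Normalised score S' = (lambda*S - ln K) / ln 2, so that E = m*n*2**(-S')."""
    return (lam * raw_score - math.log(k)) / math.log(2)


m, n = 300, 50_000_000  # hypothetical query length and database size (residues)
for s in (50, 100, 150):
    print(s, f"{bit_score(s):.1f} bits", f"E = {e_value(s, m, n):.2e}")
```

The exponential term is why a modest increase in raw score lowers the E-value by orders of magnitude, while doubling the database size only doubles it.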

BLAST uses fast heuristic methods, which give up the guarantee of finding the optimal alignment in exchange for speed.

Scientists use the resulting matches to make inferences about the structure and function of the query protein.

The search begins with a step called “seeding”, in which the query sequence is split into a list of overlapping “words”, each 3 amino acid residues long for protein searches (nucleotide searches use longer words). The algorithm then looks through its database for sequences containing matching words. Each candidate word pair is scored with a substitution matrix, and it counts as a match (a “hit”) only if its score is at or above a threshold. Each hit is then extended in both directions while the running score is tracked; extension stops once the score drops too far below the best score reached so far, and the resulting alignment is discarded if its score falls below a threshold.
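The seeding and extension steps can be sketched in Python as follows. This is a simplified, hypothetical illustration rather than BLAST's implementation: it compares every query word with every database word directly (protein BLAST instead precomputes the neighborhood of words scoring above the threshold and looks them up), uses a toy match/mismatch score in place of a real substitution matrix such as BLOSUM62, and performs only ungapped extension. The threshold, X-drop and minimum-score values are arbitrary.

```python
MATCH, MISMATCH = 5, -1  # hypothetical toy scores standing in for a real substitution matrix


def toy_score(a: str, b: str) -> int:
    """Score one aligned residue pair."""
    return MATCH if a == b else MISMATCH


def seed_hits(query: str, subject: str, w: int = 3, threshold: int = 9,
              score=toy_score) -> list[tuple[int, int]]:
    """Return (query_pos, subject_pos) for every word pair scoring >= threshold."""
    hits = []
    for i in range(len(query) - w + 1):
        for j in range(len(subject) - w + 1):
            word_score = sum(score(query[i + t], subject[j + t]) for t in range(w))
            if word_score >= threshold:
                hits.append((i, j))
    return hits


def extend_hit(query: str, subject: str, i: int, j: int, w: int,
               x_drop: int, score=toy_score) -> tuple[int, int, int, int]:
    """Extend the word hit at (i, j) in both directions without gaps.

    Each direction stops once the running score falls more than x_drop
    below the best score seen. Returns (score, q_start, q_end, s_start),
    with q_end exclusive.
    """
    best = sum(score(query[i + t], subject[j + t]) for t in range(w))
    # Extend to the right of the word.
    run, right, step = best, 0, 0
    while i + w + step < len(query) and j + w + step < len(subject):
        run += score(query[i + w + step], subject[j + w + step])
        step += 1
        if run > best:
            best, right = run, step
        elif best - run > x_drop:
            break
    # Extend to the left, starting from the best right-hand extension.
    run, left, step = best, 0, 0
    while i - step - 1 >= 0 and j - step - 1 >= 0:
        run += score(query[i - step - 1], subject[j - step - 1])
        step += 1
        if run > best:
            best, left = run, step
        elif best - run > x_drop:
            break
    return best, i - left, i + w + right, j - left


def search(query: str, subject: str, w: int = 3, threshold: int = 9,
           x_drop: int = 5, min_score: int = 20) -> list[tuple[int, int, int, int]]:
    """Seed, extend, drop weak alignments, and return unique hits, best first."""
    hsps = set()  # several hits on one diagonal often extend to the same alignment
    for i, j in seed_hits(query, subject, w, threshold):
        hsp = extend_hit(query, subject, i, j, w, x_drop)
        if hsp[0] >= min_score:
            hsps.add(hsp)
    return sorted(hsps, reverse=True)


print(search("MKTAYIAKQR", "GGGMKTAYLAKQRGG"))
```

Here all eight word hits lie on the same diagonal, so they extend to the same alignment, [(44, 0, 10, 3)]: the whole query aligned from subject position 3 with one I/L mismatch, scoring 9 x 5 - 1 = 44.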

The results of BLAST searches can help infer evolutionary and functional relationships between sequences. They can also help identify the members of gene families.

MSA

Multiple Sequence Alignment

 

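A multiple sequence alignment (MSA) is a very effective tool for sequence comparison. It aligns three or more nucleotide or amino acid sequences at once, so that corresponding residues line up in the same columns. Many MSA methods build the alignment progressively from pairwise alignments. MSAs can be used to infer evolutionary relationships and as a basis for phylogenetic analyses.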
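Progressive MSA methods repeatedly rely on pairwise alignment. The Python sketch below shows that building block, a Needleman-Wunsch global alignment of two sequences with a linear gap penalty. The match, mismatch and gap scores are hypothetical; real aligners typically use substitution matrices and affine gap penalties, and a full progressive method would also build a guide tree and align groups of sequences, which this sketch does not do.

```python
def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1,
                     gap: int = -2) -> tuple[int, str, str]:
    """Globally align a and b with a linear gap penalty.

    Returns (score, aligned_a, aligned_b), where '-' marks a gap.
    """
    n, m = len(a), len(b)

    def pair(x: str, y: str) -> int:
        return match if x == y else mismatch

    # dp[i][j] holds the best score for aligning a[:i] with b[:j].
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * gap  # a[:i] aligned entirely against gaps
    for j in range(1, m + 1):
        dp[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = max(dp[i - 1][j - 1] + pair(a[i - 1], b[j - 1]),  # match or mismatch
                           dp[i - 1][j] + gap,   # gap in b
                           dp[i][j - 1] + gap)   # gap in a

    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and dp[i][j] == dp[i - 1][j - 1] + pair(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and dp[i][j] == dp[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return dp[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


score, top, bottom = needleman_wunsch("GATTACA", "GATACA")
print(score)
print(top)
print(bottom)
```

For these two sequences the optimal score is 4: six matched positions and a single gap opposite one of the T residues.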
