Sequence Alignment

From Canonica AI

Introduction

Sequence alignment is a method used in bioinformatics to arrange sequences of DNA, RNA, or protein so that regions of similarity can be identified. These similarities may reflect functional, structural, or evolutionary relationships between the sequences. Aligned sequences of nucleotide or amino acid residues are typically represented as rows within a matrix. Gaps are inserted between the residues so that identical or similar characters line up in the same column; each gap marks a possible insertion or deletion (indel) in one sequence relative to the other.

A close-up view of a computer screen displaying a sequence alignment of DNA sequences.

Types of Sequence Alignment

There are two main types of sequence alignment: global alignment and local alignment.

Global Alignment

Global alignment aligns every residue of each sequence, spanning the sequences from end to end. This type of alignment is most useful when the sequences being compared are similar in length and structure and are expected to have evolved from a common ancestor. The most common algorithm used for global alignment is the Needleman-Wunsch algorithm, which is based on dynamic programming.
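The dynamic programming idea behind global alignment can be illustrated with a minimal sketch. The function name and the scoring values (match, mismatch, and a linear gap penalty) are assumptions chosen for illustration, not parameters taken from any particular tool.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Return (score, aligned_a, aligned_b) for a global alignment of a and b.

    Illustrative sketch with a linear gap penalty; the scoring values are
    arbitrary assumptions.
    """
    n, m = len(a), len(b)
    # score[i][j] is the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap  # a[:i] aligned entirely against gaps
    for j in range(1, m + 1):
        score[0][j] = j * gap  # b[:j] aligned entirely against gaps
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + sub,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,      # a[i-1] against a gap
                score[i][j - 1] + gap,      # b[j-1] against a gap
            )
    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i -= 1
            j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


print(needleman_wunsch("ACGT", "AGT"))
```

Both running time and memory grow with the product of the two sequence lengths, which is why this approach suits pairs of sequences of moderate size.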

Local Alignment

Local alignment identifies regions of similarity within long sequences that are often widely divergent overall. It is often used when the sequences are not similar over their full length but are suspected to contain regions of similarity, or similar sequence motifs, "hidden" within otherwise dissimilar sequence. The Smith-Waterman algorithm is a commonly used method for local alignment.
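A local alignment can be sketched with the same kind of score table as a global one, with two changes: cell scores are never allowed to fall below zero, and the traceback starts at the highest-scoring cell rather than the corner. The function name and scoring values below are illustrative assumptions.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-1):
    """Return (score, local_a, local_b) for the best local alignment of a and b.

    Illustrative sketch with a linear gap penalty; scoring values are
    arbitrary assumptions.
    """
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    best, best_pos = 0, (0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                0,                          # start a fresh local alignment
                score[i - 1][j - 1] + sub,  # extend diagonally
                score[i - 1][j] + gap,      # gap in b
                score[i][j - 1] + gap,      # gap in a
            )
            if score[i][j] > best:
                best, best_pos = score[i][j], (i, j)
    # Trace back from the best cell until a zero score is reached.
    out_a, out_b = [], []
    i, j = best_pos
    while i > 0 and j > 0 and score[i][j] > 0:
        sub = match if a[i - 1] == b[j - 1] else mismatch
        if score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i -= 1
            j -= 1
        elif score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


# The shared core is recovered even though the flanking regions differ.
print(smith_waterman("CCCCGATTACACCCC", "TTTGATTACATTT"))
```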

Sequence Alignment Algorithms

Sequence alignment algorithms are computational procedures that search for the optimal, or a near-optimal, alignment of two or more sequences under a given scoring system. These algorithms can be either pairwise, meaning they align two sequences, or multiple, meaning they align more than two sequences.

Pairwise Sequence Alignment

Pairwise sequence alignment methods are used to find the best-matching local or global alignment of two sequences. They handle only two sequences at a time, but they are efficient to calculate and are often used for sequence database searching, where a query is compared with each database entry in turn.

Multiple Sequence Alignment

Multiple sequence alignment involves the alignment of more than two sequences. It is often used to identify conserved regions across a group of sequences hypothesized to be evolutionarily related, and the resulting alignments serve to assess sequence conservation, to predict protein secondary and tertiary structures, and to construct phylogenetic trees.
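One way to see how a multiple alignment exposes conserved regions is to measure, for each column, how often the most common residue occurs. The sketch below assumes the alignment has already been computed by some other means and is supplied as equal-length rows using "-" for gaps; the function name and the conservation measure are illustrative choices, not a standard definition.

```python
from collections import Counter


def column_conservation(alignment):
    """Return, per column, the fraction of rows carrying the most common residue.

    `alignment` is a list of equal-length strings with '-' marking gaps.
    Gaps count against conservation because the denominator is the row count.
    """
    if len({len(row) for row in alignment}) != 1:
        raise ValueError("all rows of an alignment must have the same length")
    scores = []
    for column in zip(*alignment):
        residues = [c for c in column if c != "-"]
        if not residues:
            scores.append(0.0)  # column made up entirely of gaps
            continue
        most_common_count = Counter(residues).most_common(1)[0][1]
        scores.append(most_common_count / len(column))
    return scores


# Toy alignment invented for illustration.
print(column_conservation(["ACGT", "ACGA", "A-GC"]))
```

Columns scoring close to 1.0 are candidates for conserved positions; more refined measures, such as entropy-based scores, follow the same column-by-column pattern.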

Scoring Systems

Scoring systems are used to quantify the quality of an alignment, and different scoring systems are used depending on the specific requirements of the alignment. Scoring systems generally assign scores for matches, mismatches, and gaps.
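As a minimal sketch of such a scoring system, the function below sums a per-column score over an existing pairwise alignment. The function name and the specific values for matches, mismatches, and gaps are assumptions for illustration; real scoring schemes are chosen to suit the sequences being compared.

```python
def alignment_score(row_a, row_b, match=1, mismatch=-1, gap=-2):
    """Score two aligned rows column by column ('-' marks a gap).

    The default scores are arbitrary illustrative values.
    """
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    total = 0
    for x, y in zip(row_a, row_b):
        if x == "-" or y == "-":
            total += gap       # one sequence has an indel at this column
        elif x == y:
            total += match     # identical residues
        else:
            total += mismatch  # substitution
    return total


# Four matches (+4) and two gap columns (-4) give a total of 0.
print(alignment_score("ACGT-A", "AC-TTA"))
```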

Substitution Matrices

A substitution matrix gives the score for aligning any pair of characters, so that likely substitutions are penalized less than unlikely ones. The most commonly used substitution matrices for proteins are the PAM (Point Accepted Mutation) series and the BLOSUM (Blocks Substitution Matrix) series.
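The lookup role of a substitution matrix can be sketched with a small nucleotide example. The matrix below is a toy built for illustration, with made-up values that penalize transitions (purine to purine, or pyrimidine to pyrimidine) less than transversions; it does not reproduce PAM or BLOSUM values, and the function names are hypothetical.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def build_toy_matrix(match=2, transition=-1, transversion=-2):
    """Build a symmetric nucleotide substitution matrix as a dict of pairs.

    The scores are illustrative assumptions, not published values.
    """
    matrix = {}
    for x in "ACGT":
        for y in "ACGT":
            if x == y:
                matrix[(x, y)] = match
            elif {x, y} <= PURINES or {x, y} <= PYRIMIDINES:
                matrix[(x, y)] = transition    # same chemical class
            else:
                matrix[(x, y)] = transversion  # purine <-> pyrimidine
    return matrix


def score_ungapped(a, b, matrix):
    """Sum matrix scores over two equal-length, gap-free sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(matrix[(x, y)] for x, y in zip(a, b))


toy = build_toy_matrix()
print(toy[("A", "G")], toy[("A", "C")])
print(score_ungapped("ACGT", "GCTT", toy))
```

A protein matrix works the same way, with one entry for each pair of amino acids.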

Gap Penalties

Gap penalties are used to discourage the introduction of gaps in the alignment. The penalty is usually larger for creating a new gap (gap opening penalty) than for extending an existing one (gap extension penalty).
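A scheme with separate opening and extension penalties is often called an affine gap penalty. Under one common convention, assumed here, a gap of length k costs o + (k - 1)e, where o is the opening penalty and e the extension penalty; some tools instead charge o + ke. The function names and default values in this sketch are illustrative assumptions.

```python
from itertools import groupby


def affine_gap_cost(length, gap_open=10, gap_extend=1):
    """Cost of one gap of `length` positions: open + (length - 1) * extend.

    Costs are returned as positive numbers to be subtracted from a score.
    """
    if length <= 0:
        return 0
    return gap_open + (length - 1) * gap_extend


def total_gap_cost(aligned_row, gap_open=10, gap_extend=1):
    """Sum the affine cost of every run of '-' in one aligned row."""
    total = 0
    for is_gap, run in groupby(aligned_row, key=lambda c: c == "-"):
        if is_gap:
            total += affine_gap_cost(len(list(run)), gap_open, gap_extend)
    return total


# One gap of length 3 costs 12; three separate gaps of length 1 would cost 30.
print(affine_gap_cost(3), 3 * affine_gap_cost(1))
print(total_gap_cost("AC---GT-A"))
```

Because a single long gap is cheaper than several short ones of the same total length, this kind of penalty favors alignments in which indels are grouped together.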

Applications of Sequence Alignment

Sequence alignment is used in various applications in bioinformatics such as phylogenetic analysis, protein secondary structure prediction, domain searching, and database searching.

Phylogenetic Analysis

In phylogenetic analysis, sequences from different species are aligned so that corresponding positions can be compared. The degree of sequence similarity is then used as an indicator of how closely related the species are, with more similar sequences generally taken to indicate a more recent common ancestor.
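A simple way to turn an alignment into a measure of evolutionary closeness is the proportion of aligned positions at which two sequences differ, sometimes called the p-distance. The sketch below assumes a precomputed alignment and skips any column containing a gap; the function names, labels, and toy sequences are invented for illustration.

```python
def p_distance(row_a, row_b):
    """Fraction of differing sites among columns where neither row has a gap."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    compared = differences = 0
    for x, y in zip(row_a, row_b):
        if x == "-" or y == "-":
            continue  # ignore columns with a gap in either sequence
        compared += 1
        if x != y:
            differences += 1
    if compared == 0:
        raise ValueError("no gap-free columns to compare")
    return differences / compared


def distance_matrix(alignment):
    """Return {(name_a, name_b): p_distance} for every ordered pair of rows."""
    names = sorted(alignment)
    return {(p, q): p_distance(alignment[p], alignment[q])
            for p in names for q in names}


# Toy aligned sequences with hypothetical labels.
toy_alignment = {
    "species_a": "ACGTACGT",
    "species_b": "ACGTACGA",
    "species_c": "ACCTA-GA",
}
for pair, d in sorted(distance_matrix(toy_alignment).items()):
    print(pair, round(d, 3))
```

Distance-based tree-building methods take a matrix of this kind as input, usually after correcting the raw proportions for multiple substitutions at the same site.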

Protein Secondary Structure Prediction

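In protein secondary structure prediction, aligning a protein with related sequences reveals patterns of conservation and variation that help predict elements such as alpha helices and beta strands. Alignment with well-characterized proteins can also help identify the function of newly discovered proteins.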

Domain Searching

In domain searching, sequence alignment helps identify domains that are shared between protein sequences, even when the rest of the sequences differ.

Database Searching

In database searching, a query sequence is compared by alignment with the entries of a sequence database, and the entries are ranked by similarity to find sequences related to the query.
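Because aligning a query against every database entry in full can be slow, search tools commonly use a fast filter first. The sketch below illustrates one such idea under simplifying assumptions: entries are ranked by the number of distinct short words (k-mers) they share with the query. It is not the method of any particular program, and the function names and toy database are hypothetical.

```python
def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def rank_by_shared_kmers(query, database, k=3):
    """Rank database entries by the number of distinct k-mers shared with query.

    `database` maps entry names to sequences. Entries sharing no k-mer are
    dropped; ties are broken alphabetically by name.
    """
    query_words = kmers(query, k)
    hits = []
    for name, seq in database.items():
        shared = len(query_words & kmers(seq, k))
        if shared:
            hits.append((name, shared))
    hits.sort(key=lambda hit: (-hit[1], hit[0]))
    return hits


# Toy database invented for illustration.
toy_db = {
    "seq1": "TTGATTACATT",
    "seq2": "CCCCCCCC",
    "seq3": "AGATTCC",
}
print(rank_by_shared_kmers("GATTACA", toy_db))
```

In practice the top-ranked candidates would then be aligned to the query with a full local alignment to obtain a proper score.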

See Also