BLAST for dummies - mn/ibv/bioinfwiki (2024)

Contents

  • 1 BLAST for dummies
    • 1.1 Sequence similarity searches: queries and hits
    • 1.2 The BLAST database
    • 1.3 Aligning query and hit sequences
    • 1.4 Understanding the BLAST output

Sequence similarity searches: queries and hits

The BLAST algorithm is more or less the standard way of performing sequence similarity searches. By ‘sequences’, we mean biological (nucleotide or amino acid) sequences. Such searches are performed for many different reasons. Typically, the user has one (or many) unknown sequences and wants to understand what these sequences are or what they do. In BLAST terminology, these are the query sequences. A sequence search will (hopefully) identify sequences that are similar (or even identical) to the queries. The identified sequences are often called the hit sequences (or just hits). Typically, much more is known about the hits than about the query. For instance, we may know that a specific hit is an enzyme. If the match between the query and the hit is sufficiently good, we may conclude that the query sequence is also an enzyme (but not necessarily with exactly the same specificity!). Sometimes, we also perform BLAST searches with queries that are already known to the user. Continuing the previous example, we may use the sequence of a well-known enzyme as the query. In that case, the hits do not help us identify the nature of the query sequence, but they may tell us something about the distribution of this particular protein in other organisms (provided this information is included in the hit descriptions).

The BLAST database

From the above it is clear that, in order to provide information about a given query, BLAST needs a collection of sequences to compare the query against. This collection of sequences is known as the database. When executing, BLAST compares the query sequence to every single sequence in the database. If a sufficiently strong similarity is detected, BLAST outputs that database sequence as a hit. Both the query and the sequences used to build the database are normally provided in FASTA format, i.e. each sequence begins with a header line starting with the “>” character, followed by the actual sequence on the following lines. A database is therefore often built from a single FASTA file containing very many separate sequences. Such BLAST databases can be created by the user, but often previously created databases are used.
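
To make the FASTA layout concrete, here is a minimal Python sketch that reads FASTA text into a dictionary of header → sequence. It is illustrative only: `read_fasta` and the example records are hypothetical, and real BLAST+ databases are built from FASTA files with the `makeblastdb` program rather than read like this.

```python
from typing import Dict, Iterable, List, Optional


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA-formatted lines into a {header: sequence} dictionary.

    The header is the text after ">" on a definition line; the sequence is
    the concatenation of all following lines up to the next ">" line.
    Duplicate headers are not handled: a later record overwrites an earlier one.
    """
    records: Dict[str, str] = {}
    header: Optional[str] = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if header is not None:
                records[header] = "".join(chunks)  # store the previous record
            header = line[1:].strip()
            chunks = []
        elif header is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            chunks.append(line)
    if header is not None:
        records[header] = "".join(chunks)  # store the last record
    return records


# Hypothetical two-record FASTA file; the first sequence spans two lines.
example = """>seq1 hypothetical enzyme
MKTAYIAKQR
QISFVKSHFS
>seq2 hypothetical transporter
MLSRAVCGTS
"""
db = read_fasta(example.splitlines())
print(db)
# {'seq1 hypothetical enzyme': 'MKTAYIAKQRQISFVKSHFS', 'seq2 hypothetical transporter': 'MLSRAVCGTS'}
```

Note that a sequence may be wrapped over several lines; the parser simply joins them.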

Sometimes, the user wants to find hit sequences that are 100% identical to the query. Such a search is obviously easy to accomplish. Finding matches that are similar (but not identical) is a much more difficult task. BLAST is (within certain limits) able to do this. But this also implies that not all hits for a given query are equal; some hits will be better than others. In fact, some hits may display so little similarity with the query that we should disregard them altogether.
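
Why exact matching is the easy case can be seen in a few lines of Python: finding identical sequences (or exact occurrences of the query) needs nothing more than string comparison. The function names and the three-sequence “database” below are hypothetical. Note that `protC`, which differs from the query by a single isoleucine → leucine substitution, is missed entirely; catching such near-matches is exactly what BLAST is for.

```python
def identical_hits(query: str, database: dict) -> list:
    """Return headers of database sequences that are 100% identical to the query."""
    q = query.upper()
    return [header for header, seq in database.items() if seq.upper() == q]


def containing_hits(query: str, database: dict) -> list:
    """Return headers of database sequences that contain the query as an exact substring."""
    q = query.upper()
    return [header for header, seq in database.items() if q in seq.upper()]


database = {
    "protA": "MKTAYIAKQR",
    "protB": "MKTAYIAKQRQISFVKSHFS",
    "protC": "MKTAYLAKQR",  # one substitution (I -> L): not an exact match
}
print(identical_hits("MKTAYIAKQR", database))   # ['protA']
print(containing_hits("MKTAYIAKQR", database))  # ['protA', 'protB']
```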

Aligning query and hit sequences

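How does BLAST identify similarity between sequences? BLAST tries to create an alignment between the query and a given database sequence. To start with, a short matching word (a “seed”) must be found between the query and the database sequence. For nucleotide searches the seed is a short, 100% identical match; for protein searches the word only needs to reach a threshold score under the scoring matrix (see below). If such a seed is found, the alignment is extended in both directions. Matching characters are awarded points and mismatches cost points; as long as the running score does not fall too far below the best score reached so far, the extension continues. Once the score drops more than a set amount below that best score, the extension stops, the alignment is trimmed back to the point where the score was highest, and, if that score is high enough, the hit is reported.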
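
The seed-and-extend idea can be sketched in Python for the simple nucleotide case. This is a simplified, ungapped version written for illustration: `find_seeds` looks up exact words of length `word_size`, and `extend_seed` extends each seed in both directions, stopping when the running score falls more than `x_drop` below the best score seen (an “X-drop” rule) and trimming back to that best point. The parameter values (`word_size=4`, `match=1`, `mismatch=-2`, `x_drop=3`) are assumptions chosen to keep the example small, not BLAST's defaults, and real BLAST additionally performs gapped extension.

```python
from typing import Iterator, Tuple


def find_seeds(query: str, subject: str, word_size: int = 4) -> Iterator[Tuple[int, int]]:
    """Yield (query_pos, subject_pos) for every exact word match of length word_size."""
    index = {}
    for i in range(len(query) - word_size + 1):
        index.setdefault(query[i:i + word_size], []).append(i)
    for j in range(len(subject) - word_size + 1):
        for i in index.get(subject[j:j + word_size], []):
            yield i, j


def _extend(pairs, match: int, mismatch: int, x_drop: int) -> Tuple[int, int]:
    """Walk along aligned character pairs; return (best score gain, length at that best)."""
    best, running, best_len = 0, 0, 0
    for length, (a, b) in enumerate(pairs, start=1):
        running += match if a == b else mismatch
        if running > best:
            best, best_len = running, length
        elif best - running > x_drop:
            break  # X-drop: score fell too far below the best seen so far
    return best, best_len


def extend_seed(query, subject, qpos, spos, word_size=4, match=1, mismatch=-2, x_drop=3):
    """Ungapped extension of one seed in both directions.

    Returns (score, q_start, q_end, s_start, s_end); ends are exclusive and
    the alignment is trimmed back to where the running score peaked.
    """
    seed = sum(match if a == b else mismatch
               for a, b in zip(query[qpos:qpos + word_size], subject[spos:spos + word_size]))
    right, r_len = _extend(zip(query[qpos + word_size:], subject[spos + word_size:]),
                           match, mismatch, x_drop)
    left, l_len = _extend(zip(reversed(query[:qpos]), reversed(subject[:spos])),
                          match, mismatch, x_drop)
    return (seed + left + right,
            qpos - l_len, qpos + word_size + r_len,
            spos - l_len, spos + word_size + r_len)


query = "ACGTACGTTAGC"
subject = "GGACGTACGTTAGCAA"
hits = [extend_seed(query, subject, i, j) for i, j in find_seeds(query, subject)]
best = max(hits)
print(best)  # (12, 0, 12, 2, 14)
```

Several seeds fall inside the same region here; a real implementation would avoid re-extending seeds already covered by an earlier extension.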

It is not necessary to understand the exact mechanism behind this algorithm. But it is clear that BLAST needs to be told precisely how to score an alignment (i.e. how many points to award or subtract for each pair of characters, together with penalties for gaps). For nucleotide sequences, this is done in a very simple manner: a match is rewarded with a fixed positive score and a mismatch is penalized with a fixed negative score. For amino acid sequences, this becomes a bit more complicated. Some amino acids are so dissimilar that they are not awarded points (or indeed get a negative score). But some amino acids are quite similar to each other, such as leucine and isoleucine. Such pairs get positive scores, although lower than those for identical amino acids. The precise scores for every possible amino acid pair are defined in so-called substitution matrices (matrix files). The standard BLAST matrix is BLOSUM62. Along with specifying a query and a database, the user can specify which matrix to use; if none is given, BLOSUM62 is used by default.
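
As an illustration of the two scoring schemes, the Python sketch below scores ungapped alignments with a small subset of BLOSUM62 (only the residues L, I, V and D) and with a fixed nucleotide reward/penalty. The matrix subset is copied for illustration only; use the full published matrix in practice. The nucleotide values (`reward=1`, `penalty=-2`) are assumptions, one common choice rather than a universal default.

```python
# A small subset of the BLOSUM62 substitution matrix: the hydrophobic
# residues L, I, V and the acidic residue D. The full matrix covers all
# 20 amino acids (plus ambiguity codes); this subset is for illustration only.
BLOSUM62_SUBSET = {
    ("L", "L"): 4, ("I", "I"): 4, ("V", "V"): 4, ("D", "D"): 6,
    ("L", "I"): 2, ("L", "V"): 1, ("I", "V"): 3,
    ("L", "D"): -4, ("I", "D"): -3, ("V", "D"): -3,
}


def pair_score(a: str, b: str) -> int:
    """Look up a substitution score; the matrix is symmetric, so try both orders."""
    if (a, b) in BLOSUM62_SUBSET:
        return BLOSUM62_SUBSET[(a, b)]
    return BLOSUM62_SUBSET[(b, a)]  # raises KeyError for residues outside the subset


def protein_score(s1: str, s2: str) -> int:
    """Score an ungapped protein alignment by summing matrix entries column by column."""
    if len(s1) != len(s2):
        raise ValueError("ungapped alignment requires equal-length sequences")
    return sum(pair_score(a, b) for a, b in zip(s1, s2))


def nucleotide_score(s1: str, s2: str, reward: int = 1, penalty: int = -2) -> int:
    """Score an ungapped nucleotide alignment with a fixed match reward and mismatch penalty."""
    if len(s1) != len(s2):
        raise ValueError("ungapped alignment requires equal-length sequences")
    return sum(reward if a == b else penalty for a, b in zip(s1, s2))


print(protein_score("LIVD", "LIVD"))     # 18: all identical
print(protein_score("LIVD", "ILVD"))     # 14: L/I swaps still score positively
print(protein_score("LIVD", "DIVL"))     # 0: L/D pairs are penalized
print(nucleotide_score("ACGT", "ACGA"))  # 1: three matches, one mismatch
```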

It is important to understand that this way of creating alignments is a heuristic, not a perfect algorithm. It is used in BLAST because it is very fast, but it may miss or under-report certain types of similarities. (The interested reader may look up “dynamic programming”, for example the Smith-Waterman algorithm, which is guaranteed to find the optimal local alignment for a given scoring scheme.) The great advantage of BLAST is not its exactness, but its speed.
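
For comparison, a dynamic-programming local alignment in the style of Smith-Waterman can be sketched as follows. It fills a score matrix in which every cell holds the best local alignment score ending at that position (never below zero), then traces back from the highest cell. The scoring function (`+2` match, `-1` mismatch) and the linear gap penalty of `-2` are assumptions for the example; BLAST itself uses separate gap-opening and gap-extension costs. Because every cell is filled, the cost grows with the product of the two sequence lengths, which is why this approach is much slower than BLAST on large databases.

```python
from typing import Callable, Tuple


def smith_waterman(a: str, b: str, score: Callable[[str, str], int],
                   gap: int = -2) -> Tuple[int, str, str]:
    """Optimal local alignment with a linear gap penalty.

    Returns (best score, aligned a, aligned b), with '-' marking gaps.
    """
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]  # H[i][j]: best local score ending at a[i-1], b[j-1]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            H[i][j] = max(0,  # start a new local alignment here
                          H[i - 1][j - 1] + score(a[i - 1], b[j - 1]),  # pair a[i-1] with b[j-1]
                          H[i - 1][j] + gap,  # gap in b
                          H[i][j - 1] + gap)  # gap in a
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)
    # Trace back from the best cell until a zero cell is reached.
    top, bottom = [], []
    i, j = best_pos
    while i > 0 and j > 0 and H[i][j] > 0:
        if H[i][j] == H[i - 1][j - 1] + score(a[i - 1], b[j - 1]):
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(top)), "".join(reversed(bottom))


def simple_score(x: str, y: str) -> int:
    """Hypothetical scoring: +2 for a match, -1 for a mismatch."""
    return 2 if x == y else -1


print(smith_waterman("AAACGTACGTCCC", "GGGCGTACGTGGG", simple_score))
# (14, 'CGTACGT', 'CGTACGT')

score2, top2, bottom2 = smith_waterman("ACGTTACG", "ACGTACG", simple_score)
print(score2)  # 12: seven matches (7 x 2) minus one gap (2)
print(top2)
print(bottom2)  # contains one gap opposite one of the two T's
```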

Understanding the BLAST output

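It should be clear from the above that the output of BLAST consists of a list of hits for a given query sequence. The hits are ordered according to their similarity with the query. The most basic measure of similarity is the raw “score”, which is simply the sum of points awarded to the BLAST-generated alignment. Because raw scores depend on the scoring matrix used, BLAST also reports a normalized “bit score”, which can be compared between searches that use different scoring systems. From the score, BLAST calculates the “E-value”: the number of hits with at least this score that would be expected to occur purely by chance in a search of a database of this size. The lower the E-value, the less likely it is that the hit is a chance match.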
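
The conversion from raw score to bit score and E-value follows the Karlin-Altschul statistics: S' = (λS - ln K)/ln 2 and E = m·n·2^(-S'), which is equivalent to E = K·m·n·e^(-λS), where m and n are the query and database lengths. The Python sketch below applies these formulas. The values λ = 0.267 and K = 0.041 are those commonly quoted for BLOSUM62 with gap costs 11/1 and are used here as assumptions; the query and database lengths are hypothetical, and real BLAST uses “effective” lengths corrected for edge effects rather than the raw lengths used here.

```python
import math


def bit_score(raw_score: float, lam: float, K: float) -> float:
    """Normalized score S' = (lambda * S - ln K) / ln 2."""
    return (lam * raw_score - math.log(K)) / math.log(2)


def e_value(raw_score: float, lam: float, K: float, m: int, n: int) -> float:
    """Expected number of chance hits scoring at least raw_score: E = m * n * 2^(-S')."""
    return m * n * 2 ** (-bit_score(raw_score, lam, K))


# Assumed parameters: lambda and K as commonly quoted for BLOSUM62, gap costs 11/1.
LAMBDA, K = 0.267, 0.041
QUERY_LEN, DB_LEN = 300, 50_000_000  # hypothetical query length and total database length

for raw in (50, 100, 150):
    bits = bit_score(raw, LAMBDA, K)
    evalue = e_value(raw, LAMBDA, K, QUERY_LEN, DB_LEN)
    print(f"raw={raw:4d}  bits={bits:6.1f}  E={evalue:.2g}")
```

The loop shows the key behaviour: each increase in raw score lowers the E-value exponentially, and a larger database (larger n) raises the E-value for the same score.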

It is possible to run BLAST with multiple query sequences. In that case, BLAST simply processes one query at a time and appends the results to the same output file, each block starting with the definition line of the query used. With many queries in one BLAST run, the output can quickly become overwhelming. In that case, it is useful to use a tool to visualize the BLAST output. One such tool has been developed at the University of Oslo (UiO):

BLASTGrabber: a bioinformatic tool for visualization, analysis and sequence selection of massive BLAST data

(doi:10.1186/1471-2105-15-128)

If you are interested in other options, you can read the following paper:

BLAST output visualization in the new sequencing era

(doi:10.1093/bib/bbt009)
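
Before reaching for a visualization tool, a quick overview of a multi-query run can be obtained by parsing BLAST+ tabular output (`-outfmt 6`), which writes one line per hit with twelve tab-separated default columns. The Python sketch below assumes that format; the field labels are names chosen here to match the default column order, the sample lines are made up, and the E-value cutoff of 1e-5 is an arbitrary choice. It keeps, for each query, only the best hits by bit score.

```python
import csv
from collections import defaultdict

# Labels for the twelve default columns of BLAST+ tabular output (-outfmt 6), in order.
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def best_hits(lines, max_evalue=1e-5, top_n=1):
    """Group hits by query, drop hits above max_evalue, keep the top_n by bit score."""
    per_query = defaultdict(list)
    for row in csv.reader(lines, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip empty lines and comment lines
        hit = dict(zip(FIELDS, row))
        hit["evalue"] = float(hit["evalue"])
        hit["bitscore"] = float(hit["bitscore"])
        if hit["evalue"] <= max_evalue:
            per_query[hit["qseqid"]].append(hit)
    return {q: sorted(hits, key=lambda h: h["bitscore"], reverse=True)[:top_n]
            for q, hits in per_query.items()}


# Made-up example lines in -outfmt 6 layout.
sample = [
    "query1\tsubjA\t98.5\t200\t3\t0\t1\t200\t5\t204\t1e-100\t370",
    "query1\tsubjB\t45.0\t180\t95\t4\t10\t185\t20\t195\t2e-20\t95.1",
    "query2\tsubjC\t30.0\t60\t40\t2\t5\t64\t1\t58\t0.5\t25.0",
]
result = best_hits(sample)
for query_id, hits in result.items():
    for h in hits:
        print(query_id, h["sseqid"], h["evalue"], h["bitscore"])
# query1 subjA 1e-100 370.0
```

Here `query2` disappears because its only hit has an E-value of 0.5, well above the cutoff.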

Latest Posts
Article information

Author: Margart Wisoky

Last Updated:

Views: 6590

Rating: 4.8 / 5 (78 voted)

Reviews: 93% of readers found this page helpful

Author information

Name: Margart Wisoky

Birthday: 1993-05-13

Address: 2113 Abernathy Knoll, New Tamerafurt, CT 66893-2169

Phone: +25815234346805

Job: Central Developer

Hobby: Machining, Pottery, Rafting, Cosplaying, Jogging, Taekwondo, Scouting

Introduction: My name is Margart Wisoky, I am a gorgeous, shiny, successful, beautiful, adventurous, excited, pleasant person who loves writing and wants to share my knowledge and understanding with you.