Support Vector Machine (SVM) Algorithm

Support Vector Machine (SVM) is a powerful machine learning algorithm used for linear or nonlinear classification, regression, and even outlier detection tasks. SVMs can be applied to a variety of tasks, such as text classification, image classification, spam detection, handwriting identification, gene expression analysis, face detection, and anomaly detection. SVMs are adaptable and efficient across these applications because they can handle high-dimensional data and nonlinear relationships.

SVMs are effective because they search for the maximum-margin separating hyperplane between the different classes in the target feature.

Support Vector Machine

Support Vector Machine (SVM) is a supervised machine learning algorithm used for both classification and regression. Although it can handle regression problems, it is best suited to classification. The main objective of the SVM algorithm is to find the optimal hyperplane in an N-dimensional space that separates the data points of different classes in the feature space, while keeping the margin between the hyperplane and the closest points of each class as large as possible. The dimension of the hyperplane depends on the number of features: with two input features the hyperplane is just a line, with three it is a 2-D plane, and beyond three it becomes difficult to visualize.

Let’s consider two independent variables x1, x2, and one dependent variable which is either a blue circle or a red circle.

[Figure: Linearly separable data points]

From the figure above, it is clear that there are multiple lines (our hyperplane here is a line because we are considering only two input features, x1 and x2) that segregate the data points, i.e., classify between the red and blue circles. So how do we choose the best line, or in general the best hyperplane, to segregate our data points?

How does SVM work?

One reasonable choice as the best hyperplane is the one that represents the largest separation or margin between the two classes.

[Figure: Multiple hyperplanes separating the data from two classes]

So we choose the hyperplane whose distance to the nearest data point on each side is maximized. If such a hyperplane exists, it is known as the maximum-margin hyperplane, or hard margin. From the figure above, we therefore choose L2. Now consider the scenario shown below.

[Figure: Selecting a hyperplane for data with an outlier]

Here we have one blue ball inside the boundary of the red balls. So how does SVM classify the data? Simple: the blue ball among the red ones is an outlier of the blue class. The SVM algorithm ignores the outlier and finds the best hyperplane that maximizes the margin; SVM is robust to outliers.

[Figure: The optimal hyperplane, which ignores the outlier]

For data like this, SVM finds the maximum margin as it did for the previous data sets, and in addition it adds a penalty each time a point crosses the margin. The margins in such cases are called soft margins. With a soft margin, SVM tries to minimize an objective of the form (1/margin) + λ·Σ(penalty). Hinge loss is a commonly used penalty: if there is no violation there is no hinge loss, and if there is a violation the hinge loss is proportional to the distance of the violation.
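To make the hinge loss concrete, here is a minimal NumPy sketch (the scores and labels below are made up for illustration): points that respect the margin contribute zero loss, and violating points contribute loss proportional to how far they cross.

Python

import numpy as np

def hinge_loss(scores, labels):
    # Hinge loss for labels in {-1, +1}, given raw scores w^T x + b.
    # Zero when a point is on the correct side with margin >= 1;
    # otherwise it grows linearly with the distance of the violation.
    return np.maximum(0, 1 - labels * scores)

# The first point respects the margin, the second violates it,
# and the third is misclassified outright.
scores = np.array([2.0, 0.5, -0.3])
labels = np.array([1, 1, 1])
print(hinge_loss(scores, labels))  # [0.  0.5 1.3]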

So far, we have been talking about linearly separable data (the groups of blue and red balls are separable by a straight line). What do we do if the data are not linearly separable?

[Figure: Original 1D dataset for classification]

Say our data is as shown in the figure above. SVM solves this by creating a new variable using a kernel. For a point x_i on the line, we create a new variable y_i as a function of its distance from the origin o. If we plot this, we get something like the figure shown below.

[Figure: Mapping the 1D data to 2D makes the two classes separable]

In this case, the new variable y is created as a function of distance from the origin. A non-linear function that creates a new variable is referred to as a kernel.
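A small sketch of this idea with made-up 1D data: adding a second coordinate y = x^2 (a function of the distance from the origin) turns the non-separable 1D problem into a linearly separable 2D one.

Python

import numpy as np

# Hypothetical 1D data: one class sits near the origin, the other far from it.
x_near = np.array([-1.0, -0.5, 0.0, 0.5, 1.0])   # class A
x_far = np.array([-4.0, -3.0, 3.0, 4.0])         # class B

# Kernel-style feature map: add y = x^2 as a second coordinate.
phi_near = np.column_stack([x_near, x_near ** 2])
phi_far = np.column_stack([x_far, x_far ** 2])

# In the (x, x^2) plane the horizontal line y = 4 now separates the classes.
print((phi_near[:, 1] < 4).all(), (phi_far[:, 1] > 4).all())  # True True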

Support Vector Machine Terminology

  1. Hyperplane: The decision boundary that separates the data points of different classes in the feature space. In the case of linear classification, it is the set of points satisfying the linear equation wx + b = 0.
  2. Support Vectors: The data points closest to the hyperplane, which play a critical role in deciding the hyperplane and the margin.
  3. Margin: The distance between the support vectors and the hyperplane. The main objective of the support vector machine algorithm is to maximize this margin; a wider margin indicates better classification performance.
  4. Kernel: A mathematical function used in SVM to map the original input data points into high-dimensional feature spaces, so that a separating hyperplane can be found even when the data points are not linearly separable in the original input space. Common kernel functions are linear, polynomial, radial basis function (RBF), and sigmoid.
  5. Hard Margin: The maximum-margin hyperplane, i.e., a hyperplane that separates the data points of different categories perfectly, without any misclassifications.
  6. Soft Margin: When the data is not perfectly separable or contains outliers, SVM permits a soft-margin technique. The soft-margin formulation introduces a slack variable for each data point, which relaxes the strict margin requirement and permits some misclassifications or violations. It finds a compromise between widening the margin and reducing violations.
  7. C: The regularization parameter that balances margin maximization against the penalty for misclassifications. A larger value of C imposes a stricter penalty for margin violations or misclassified points, which results in a smaller margin and possibly fewer misclassifications (see the sketch after this list).
  8. Hinge Loss: A typical loss function in SVMs. It penalizes misclassifications and margin violations, and it is usually combined with the regularization term to form the objective function.
  9. Dual Problem: SVM can be solved through the dual of the optimization problem, which involves finding the Lagrange multipliers associated with the support vectors. The dual formulation enables the kernel trick and more efficient computation.
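To see the effect of C in practice, here is a quick sketch using scikit-learn's SVC on synthetic, slightly overlapping blobs (the data and parameter values are illustrative):

Python

from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Two slightly overlapping clusters.
X, y = make_blobs(n_samples=100, centers=2, cluster_std=2.0, random_state=0)

for C in (0.01, 100.0):
    clf = SVC(kernel="linear", C=C).fit(X, y)
    # A small C tolerates violations (wide margin, many support vectors);
    # a large C penalizes them heavily (narrow margin, fewer support vectors).
    print(f"C={C}: {clf.n_support_.sum()} support vectors")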

Mathematical intuition of Support Vector Machine

Consider a binary classification problem with two classes, labeled as +1 and -1. We have a training dataset consisting of input feature vectors X and their corresponding class labels Y.

The equation for the linear hyperplane can be written as:

[Tex]w^Tx+ b = 0[/Tex]

The vector w represents the normal vector to the hyperplane, i.e., the direction perpendicular to the hyperplane. The parameter b in the equation represents the offset, i.e., the distance of the hyperplane from the origin along the normal vector w.

The distance between a data point x_i and the decision boundary can be calculated as:

[Tex]d_i = \frac{w^T x_i + b}{||w||}[/Tex]

where ||w|| represents the Euclidean norm of the normal vector w; the sign of d_i indicates which side of the hyperplane the point lies on.

For a linear SVM classifier:

[Tex]\hat{y} = \left\{ \begin{array}{cl} 1 & : \ w^Tx+b \geq 0 \\ 0 & : \ w^Tx+b < 0 \end{array} \right.[/Tex]
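These two formulas translate directly into code; a minimal sketch with an illustrative w and b:

Python

import numpy as np

w = np.array([2.0, -1.0])   # illustrative normal vector
b = -0.5                    # illustrative offset

def signed_distance(x):
    # d_i = (w^T x_i + b) / ||w||
    return (w @ x + b) / np.linalg.norm(w)

def predict(x):
    # y_hat = 1 if w^T x + b >= 0, else 0
    return 1 if w @ x + b >= 0 else 0

x = np.array([1.0, 0.5])
print(signed_distance(x), predict(x))  # ~0.447 1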

Optimization:

  • For Hard margin linear SVM classifier:

[Tex]\underset{w,b}{\text{minimize}}\frac{1}{2}w^Tw =\underset{W,b}{\text{minimize}}\frac{1}{2}\left\| w \right\|^{2} \\ \text{subject to}\; y_i(w^Tx_i + b) \geq 1 \;for\; i = 1, 2,3, \cdots,m[/Tex]

The target label of the i-th training instance is denoted by t_i: t_i = -1 for negative instances (where y_i = 0) and t_i = +1 for positive instances (where y_i = 1), because we require a decision boundary that satisfies the constraint: [Tex]t_i(w^T x_i + b)\geq1[/Tex]

  • For Soft margin linear SVM classifier:

[Tex]\underset{w,b}{\text{minimize }}\frac{1}{2}w^Tw+ C \sum_{i=1}^m \zeta_{i} \\ \text{subject to } y_i(w^Tx_i + b)\ge 1-\zeta_{i}\;\; and \; \zeta_{i} \ge 0\;\; for \; i = 1, 2,3, \cdots,m[/Tex]

  • Dual Problem: SVM can also be solved through the dual of the optimization problem, which involves finding the Lagrange multipliers associated with the support vectors. The optimal Lagrange multipliers α_i maximize the following dual objective function:

[Tex]\underset{\alpha}{\text{maximize}}: \underset{i\to m}{\sum}\alpha_i - \frac{1}{2}\underset{i\to m\;}{\sum}\;\underset{j\to m}{\sum} \alpha_i\alpha_j t_i t_j K(x_i, x_j)[/Tex]

where,

  • αi is the Lagrange multiplier associated with the ith training sample.
  • K(xi, xj) is the kernel function that computes the similarity between two samples xi and xj. It allows SVM to handle nonlinear classification problems by implicitly mapping the samples into a higher-dimensional feature space.
  • The term ∑αi represents the sum of all Lagrange multipliers.

Once the dual problem has been solved and the optimal Lagrange multipliers have been found, the SVM decision boundary can be described in terms of these multipliers and the support vectors. The training samples with α_i > 0 are the support vectors, and the decision function is given by:

[Tex]f(x)= \underset{i\to m}{\sum}\alpha_i t_i K(x_i, x) + b \\ t_i(w^Tx_i+b) = 1 \Longleftrightarrow b= t_i - w^Tx_i[/Tex]
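Once an SVC is fitted, scikit-learn exposes the pieces of this dual solution: dual_coef_ stores the products α_i·t_i for the support vectors, support_vectors_ stores the x_i, and intercept_ stores b, so the decision function can be reproduced by hand. A sketch using the RBF kernel (the gamma value is arbitrary):

Python

import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
clf = SVC(kernel="rbf", gamma=0.5).fit(X, y)

# K(x_i, x) between the support vectors and a few query points.
K = rbf_kernel(clf.support_vectors_, X[:5], gamma=0.5)

# f(x) = sum_i alpha_i t_i K(x_i, x) + b, with alpha_i t_i in dual_coef_.
f = clf.dual_coef_ @ K + clf.intercept_
print(np.allclose(f.ravel(), clf.decision_function(X[:5])))  # True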

Types of Support Vector Machine

Based on the nature of the decision boundary, Support Vector Machines (SVMs) can be divided into two main types:

  • Linear SVM: Linear SVMs use a linear decision boundary to separate the data points of different classes. When the data can be precisely linearly separated, linear SVMs are very suitable. This means that a single straight line (in 2D) or a hyperplane (in higher dimensions) can entirely divide the data points into their respective classes. A hyperplane that maximizes the margin between the classes is the decision boundary.
  • Non-Linear SVM: Non-Linear SVM can be used to classify data when it cannot be separated into two classes by a straight line (in the case of 2D). By using kernel functions, nonlinear SVMs can handle nonlinearly separable data. The original input data is transformed by these kernel functions into a higher-dimensional feature space, where the data points can be linearly separated. A linear SVM is used to locate a nonlinear decision boundary in this modified space.

Popular kernel functions in SVM

The SVM kernel is a function that takes a low-dimensional input space and transforms it into a higher-dimensional space, i.e., it converts a non-separable problem into a separable one. It is mostly useful for non-linear separation problems. Simply put, the kernel performs some fairly complex data transformations and then finds out how to separate the data based on the defined labels or outputs.

[Tex]\begin{aligned} \text{Linear : } K(x_i,x_j) &= x_i^Tx_j \\ \text{Polynomial : } K(x_i,x_j) &= (\gamma x_i^Tx_j+r)^d \\ \text{Gaussian RBF : } K(x_i,x_j) &= \exp(-\gamma\| x_i-x_j\|^2) \\ \text{Sigmoid : } K(x_i, x_j) &= \tanh(\alpha x_i^Tx_j + b) \end{aligned}[/Tex]
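Written out in NumPy, with x_i and x_j as inputs and γ (gamma), r, d, α (alpha), b as kernel hyperparameters (the values below are illustrative defaults):

Python

import numpy as np

def linear(xi, xj):
    return xi @ xj

def polynomial(xi, xj, gamma=1.0, r=0.0, d=3):
    return (gamma * (xi @ xj) + r) ** d

def rbf(xi, xj, gamma=0.5):
    return np.exp(-gamma * np.sum((xi - xj) ** 2))

def sigmoid(xi, xj, alpha=0.01, b=0.0):
    return np.tanh(alpha * (xi @ xj) + b)

xi, xj = np.array([1.0, 2.0]), np.array([0.5, -1.0])
print(linear(xi, xj), polynomial(xi, xj), rbf(xi, xj), sigmoid(xi, xj))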

Advantages of SVM

  • Effective in high-dimensional cases.
  • Memory-efficient, as the decision function uses only a subset of the training points (the support vectors).
  • Different kernel functions can be specified for the decision function, and it is possible to specify custom kernels.

SVM implementation in Python

Predict whether a cancer is benign or malignant. Historical data about patients diagnosed with cancer enables doctors to differentiate malignant cases from benign ones, given the independent attributes.

Steps

  • Load the breast cancer dataset from sklearn.datasets.
  • Separate the input features and target variable.
  • Build and train an SVM classifier using the RBF kernel.
  • Plot a scatter plot of the input features.
  • Plot the decision boundary.
Python

# Load the important packages
from sklearn.datasets import load_breast_cancer
import matplotlib.pyplot as plt
from sklearn.inspection import DecisionBoundaryDisplay
from sklearn.svm import SVC

# Load the dataset
cancer = load_breast_cancer()
X = cancer.data[:, :2]  # keep only the first two features so we can plot in 2D
y = cancer.target

# Build the model
svm = SVC(kernel="rbf", gamma=0.5, C=1.0)
# Train the model
svm.fit(X, y)

# Plot the decision boundary
DecisionBoundaryDisplay.from_estimator(
    svm,
    X,
    response_method="predict",
    cmap=plt.cm.Spectral,
    alpha=0.8,
    xlabel=cancer.feature_names[0],
    ylabel=cancer.feature_names[1],
)

# Scatter plot
plt.scatter(X[:, 0], X[:, 1], c=y, s=20, edgecolors="k")
plt.show()

Output:

[Figure: Breast cancer classification with an SVM RBF kernel]




FAQs

What are support vectors in the SVM algorithm?

The margin boundaries adjacent to the optimal hyperplane run through the data points that determine the maximal margin; those data points are known as support vectors. The SVM algorithm is widely used in machine learning as it can handle both linear and nonlinear classification tasks.

Why is SVM considered one of the best algorithms?

SVM is considered one of the best algorithms because it can handle high-dimensional data, is effective in cases with limited training samples, and can handle non-linear classification using kernel functions.

What is SVM in machine learning?

Support Vector Machine (SVM)

Support Vector Machine is an effective supervised machine learning algorithm used for classification and regression tasks. The main objective of SVM is to find an optimal hyperplane that best separates the data into different classes in a high-dimensional space.

What are the basics of SVM?

The main objective of the SVM algorithm is to find the optimal hyperplane in an N-dimensional space that separates the data points of different classes in the feature space, keeping the margin between the hyperplane and the closest points of each class as large as possible.

What is a real-life example of a support vector machine?

Real-Life Applications of SVMs:

Image Recognition: Facebook's facial recognition feature uses SVM to identify faces in uploaded photos. Google Photos also uses SVM to categorize and search images. Speech Recognition: Apple's Siri and Google Assistant use SVM to recognize voice commands.

How many support vectors are needed in SVM?

Hard Margin Linear SVM

In a hard-margin linear SVM, all of the support vectors lie exactly on the margin. Regardless of the number of dimensions or the size of the data set, the number of support vectors can be as few as 2.
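This is easy to check empirically with scikit-learn, whose n_support_ attribute reports the support-vector count per class (synthetic, well-separated data for illustration):

Python

from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Well-separated blobs: only a handful of points end up defining the margin.
X, y = make_blobs(n_samples=500, centers=2, cluster_std=0.5, random_state=0)
clf = SVC(kernel="linear", C=1e6).fit(X, y)  # a very large C approximates a hard margin
print(clf.n_support_)  # support vectors per class, typically just a few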

What is the main goal of SVM in machine learning?

The goal of the Support Vector Machine (SVM) algorithm in machine learning is to find an optimal hyperplane that separates different classes of data points in a high-dimensional space. SVM is a supervised learning algorithm that can be used for both classification and regression tasks.

What are the disadvantages of SVM?

SVM Disadvantages

Long training time for large datasets. The final model is difficult to understand and interpret, including the variable weights and their individual impact. Since the final model is not easy to inspect, small calibrations cannot be made to it, so it is hard to incorporate business logic.

When to use SVM?

Linear SVM is used when the data is linearly separable, which means that the classes can be separated with a straight line (in 2D) or a flat plane (in 3D). The SVM algorithm finds the hyperplane that best divides the data into classes. Non-Linear SVM is used when the data is not linearly separable.

How does SVM actually work?

SVMs can handle both linearly separable and non-linearly separable data. They do this by using different types of kernel functions, such as the linear kernel, polynomial kernel or radial basis function (RBF) kernel. These kernels enable SVMs to effectively capture complex relationships and patterns in the data.

Is SVM supervised or unsupervised?

Support vector machines (SVMs) are a set of supervised learning methods used for classification, regression, and outlier detection.

What are support vectors in hard margin SVM?

Support vectors are special because they are the training points that define the maximum margin of the hyperplane to the data set and they therefore determine the shape of the hyperplane. If you were to move one of them and retrain the SVM, the resulting hyperplane would change.

What are the support vectors in SVR?

Support Vectors: In SVR, data points that are closest to the regression line and define the margin are known as “support vectors.” These points play a crucial role in determining the regression model.

What is the support of a vector?

The support of a vector is the number of non-zero elements in that vector.

What is the difference between logistic regression and support vector machines?

Difference between SVM and Logistic Regression

SVM is based on the geometrical properties of the data, while logistic regression is based on a statistical approach. The risk of overfitting is lower with SVM, while logistic regression is more vulnerable to overfitting. A side-by-side sketch on the same dataset is shown below.
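The sketch below uses illustrative settings: both models learn a linear decision boundary on the scaled data, but the SVM optimizes hinge loss with margin maximization while logistic regression optimizes log loss.

Python

from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Same preprocessing for both; only the loss being optimized differs:
# hinge loss with margin maximization for SVM, log loss for logistic regression.
svm = make_pipeline(StandardScaler(), SVC(kernel="linear")).fit(X_tr, y_tr)
lr = make_pipeline(StandardScaler(), LogisticRegression()).fit(X_tr, y_tr)
print("SVM accuracy:", svm.score(X_te, y_te))
print("LogReg accuracy:", lr.score(X_te, y_te))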
