Mastering the Art of Feature Selection: Python Techniques for Visualizing Feature Importance (2024)

Feature selection is one of the most crucial steps in building machine learning models. As a data scientist, I know the importance of identifying and selecting the most relevant features that contribute to the predictive power of the model while minimising the effects of irrelevant or redundant features. One way to do this is by visualising feature importance.

In this article, I will share my experience with different methods for visualising feature importance in a dataset using Python. I will provide code snippets and examples for each method and explain their interpretation. By the end of this article, you will have a deeper understanding of the different methods available for visualising feature importance and how to apply them to your own datasets.

Method 1: Correlation Matrix Heatmap

One way to visualise feature importance is by creating a correlation matrix heatmap. A correlation matrix is a table that shows the pairwise correlations between different features in the dataset.

The heatmap shows the strength of the correlation between each pair of features (we take absolute values here, so the sign of the relationship is ignored). A value close to 1 indicates that two features are strongly related, while a value close to 0 indicates little to no linear relationship between them.

In our case, we use a correlation matrix heatmap to identify highly correlated features in the dataset. Highly correlated features provide largely redundant information to the model, which can hurt its performance. By visualising the correlation matrix heatmap, we can spot such features and remove them from the dataset, as sketched after the heatmap below.

Here’s an example of using a correlation matrix heatmap to visualise feature correlation in a dataset with both continuous and discrete features:

# Required imports (matplotlib and seaborn are reused in the later examples)
import matplotlib.pyplot as plt
import seaborn as sns

# Create a matrix of absolute pairwise correlations
# ('features' is assumed to be a pandas DataFrame of the predictor columns)
corr_matrix = features.corr().abs()

# Plot the heatmap
plt.figure(figsize=(12, 8))
sns.heatmap(corr_matrix, annot=True, cmap='GnBu', linewidths=0.2, vmin=0, vmax=1)
plt.xlabel('Features')
plt.ylabel('Features')
plt.title('Feature Importances using Correlation Matrix Heatmap')
plt.show()

[Figure: correlation matrix heatmap of the features]
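
If the goal is to actually prune redundant features rather than just inspect them, a common follow-up is to drop one feature from each highly correlated pair. Here is a minimal sketch, reusing the corr_matrix from above with an illustrative threshold of 0.9 (the threshold is an assumption, not a universal rule):

import numpy as np

# Keep only the upper triangle so each pair is considered once
upper = corr_matrix.where(np.triu(np.ones(corr_matrix.shape, dtype=bool), k=1))

# Drop any column that is correlated above the threshold with another feature
to_drop = [col for col in upper.columns if (upper[col] > 0.9).any()]
features_reduced = features.drop(columns=to_drop)
print(f'Dropped {len(to_drop)} highly correlated features: {to_drop}')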

Alternatively, we can look at how strongly each feature is correlated with the target variable. Features with a strong correlation may be important for the model's predictions, and visualising them gives us insight into how they influence the target.

Here’s an example code snippet:

# Create a correlation matrix with target variable
corr_with_target = features.corrwith(target)

# Sort features by correlation with target variable
corr_with_target = corr_with_target.sort_values(ascending=False)

# Plot the heatmap
plt.figure(figsize=(4, 8))
sns.heatmap(corr_with_target.to_frame(), cmap='GnBu', annot=True)
plt.title('Correlation with Target Variable')
plt.show()

[Figure: heatmap of feature correlations with the target variable]
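
To turn this view into an actual selection, one option is to keep only the features whose absolute correlation with the target exceeds a cutoff. A small sketch, using an illustrative cutoff of 0.1:

# Keep features with a non-trivial absolute correlation with the target
selected_by_corr = corr_with_target[corr_with_target.abs() > 0.1].index.tolist()
print(selected_by_corr)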

Method 2: Univariate Feature Selection

Another way to visualise feature importance is by using univariate feature selection. Univariate feature selection is a statistical method that selects the features with the highest statistical significance with respect to the target variable. In other words, it selects the features that are most likely to be relevant for predicting the target variable.

It is important to mention that the effectiveness of this method can be influenced by the scale of the features.
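
In scikit-learn, the chi-square test additionally requires non-negative inputs, so the df_scaled DataFrame used in the snippets below is assumed to come from min-max scaling. Here is a minimal sketch of that step (features and target are the predictor DataFrame and label column used throughout the article):

import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Scale every feature to the [0, 1] range so chi-square can be applied
scaler = MinMaxScaler()
df_scaled = pd.DataFrame(scaler.fit_transform(features), columns=features.columns)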

Here’s an example of using univariate feature selection to visualise feature importance in a dataset with both continuous and discrete features, using the chi-square test:

# required imports
from sklearn.feature_selection import SelectKBest, chi2
import numpy as np

# apply univariate feature selection (chi-square requires non-negative inputs)
best_features = SelectKBest(score_func=chi2, k=5).fit(df_scaled, target)

# get the scores and selected features
scores = best_features.scores_
selected_features = df_scaled.columns[best_features.get_support()]

sorted_idxs = np.argsort(scores)[::-1]
sorted_scores = scores[sorted_idxs]
sorted_feature_names = np.array(df_scaled.columns)[sorted_idxs]

# plot scores
plt.figure(figsize=(12, 6))
sns.barplot(x=sorted_scores, y=sorted_feature_names)
plt.xlabel('Scores')
plt.ylabel('Features')
plt.title('Feature Importances using Univariate Feature Selection (Chi-square test)')
plt.show()

[Figure: bar plot of chi-square scores per feature]

Here’s an example of using univariate feature selection to visualise feature importance in a dataset with both continuous and discrete features, using the ANOVA F-test:

from sklearn.feature_selection import f_classif

# apply univariate feature selection using the ANOVA F-test
best_features = SelectKBest(score_func=f_classif, k=5).fit(df_scaled, target)

# get the scores and selected features
scores = best_features.scores_
selected_features = df_scaled.columns[best_features.get_support()]

sorted_idxs = np.argsort(scores)[::-1]
sorted_scores = scores[sorted_idxs]
sorted_feature_names = np.array(df_scaled.columns)[sorted_idxs]

# plot scores
plt.figure(figsize=(12, 6))
sns.barplot(x=sorted_scores, y=sorted_feature_names)
plt.xlabel('Scores')
plt.ylabel('Features')
plt.title('Feature Importances using Univariate Feature Selection (ANOVA)')
plt.show()

[Figure: bar plot of ANOVA F-scores per feature]

Strictly speaking, chi-square and mutual information tests are meant for discrete features, while ANOVA and correlation-based tests are meant for continuous ones. Here I applied each test to all features rather than splitting them by type, since I wanted to see how the resulting scores are affected.
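
For completeness, here is a sketch of the mutual information variant mentioned above, using scikit-learn's mutual_info_classif with the same df_scaled and target as before (by default it treats all columns as continuous; discrete columns can be flagged via its discrete_features argument):

import pandas as pd
from sklearn.feature_selection import SelectKBest, mutual_info_classif

# score each feature by its mutual information with the target
mi_selector = SelectKBest(score_func=mutual_info_classif, k=5).fit(df_scaled, target)
mi_scores = pd.Series(mi_selector.scores_, index=df_scaled.columns).sort_values(ascending=False)

# plot scores
plt.figure(figsize=(12, 6))
sns.barplot(x=mi_scores.values, y=mi_scores.index)
plt.xlabel('Mutual Information')
plt.ylabel('Features')
plt.title('Feature Importances using Univariate Feature Selection (Mutual Information)')
plt.show()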

Method 3: Recursive Feature Elimination

Recursive feature elimination (RFE) is a machine learning technique that selects features by recursively considering smaller and smaller sets of features. It starts with all features, fits a model, eliminates the least important feature (or features) according to the model's importance scores, and repeats until the desired number of features remains. In this case, we set the n_features_to_select parameter to keep the 5 most important features.

Here’s an example of using recursive feature elimination to visualise feature importance in a dataset with both continuous and discrete features:

from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE

# Create a random forest classifier as the base estimator
clf = RandomForestClassifier()

# Apply recursive feature elimination
selector = RFE(clf, n_features_to_select=5)
selector = selector.fit(features, target)
X_new = selector.transform(features)

# Plot the importances of the selected features
# (estimator_ is the random forest refit on the 5 features kept by RFE)
importances = selector.estimator_.feature_importances_
std = np.std([tree.feature_importances_ for tree in selector.estimator_.estimators_], axis=0)
indices = np.argsort(importances)[::-1]

plt.figure(figsize=(12, 6))
plt.bar(range(X_new.shape[1]), importances[indices], color="r", yerr=std[indices], align="center")
plt.xticks(range(X_new.shape[1]), features.columns[selector.get_support()][indices], rotation=90)
plt.xlim([-1, X_new.shape[1]])
plt.ylabel('Feature Importance Scores')
plt.xlabel('Features')
plt.title('Feature Importances using Recursive Feature Elimination based on Random Forest')
plt.show()

[Figure: importances of the features selected by recursive feature elimination]
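
Besides the importances of the selected features, the fitted selector also exposes a ranking_ attribute covering every feature: the selected features get rank 1, and features eliminated earlier get progressively higher ranks. A short sketch, reusing the selector and features from above (pandas is assumed to be imported as pd):

# Rank of every feature: 1 = selected, larger values = eliminated earlier
ranking = pd.Series(selector.ranking_, index=features.columns).sort_values()
print(ranking)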

Method 4: Feature Importance from Tree-based Models

Another method for visualising feature importance is by using tree-based models such as Random Forest or Gradient Boosting. These models can be used to rank the importance of each feature in the dataset. In Python, we can use the feature_importances_ attribute of the trained tree-based models to get the feature importance scores. The scores can be visualised using a bar chart.

Here is an example code snippet for visualising feature importance from a Random Forest model:

# Train Random Forest model
rf_model = RandomForestClassifier()
rf_model.fit(features, target)

# Get feature importances
importances = rf_model.feature_importances_

# Visualize feature importances
plt.figure(figsize=(12, 6))
plt.bar(range(features.shape[1]), importances)
plt.xticks(range(features.shape[1]), features.columns, rotation=90)
plt.ylabel('Feature Importance Scores')
plt.xlabel('Features')
plt.title('Feature Importances using Random Forest')
plt.show()

[Figure: bar plot of Random Forest feature importances]
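
The same pattern works for the Gradient Boosting model mentioned above; only the estimator changes. A minimal sketch (this time sorting the importances before plotting, which is a stylistic choice rather than part of the original example):

from sklearn.ensemble import GradientBoostingClassifier
import pandas as pd

# Train Gradient Boosting model
gb_model = GradientBoostingClassifier()
gb_model.fit(features, target)

# Sort feature importances so the most important features come first
gb_importances = pd.Series(gb_model.feature_importances_, index=features.columns).sort_values(ascending=False)

# Visualize feature importances
plt.figure(figsize=(12, 6))
plt.bar(range(len(gb_importances)), gb_importances.values)
plt.xticks(range(len(gb_importances)), gb_importances.index, rotation=90)
plt.ylabel('Feature Importance Scores')
plt.xlabel('Features')
plt.title('Feature Importances using Gradient Boosting')
plt.show()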

Method 5: LASSO Regression

LASSO (Least Absolute Shrinkage and Selection Operator) is a variant of linear regression that performs both feature selection and regularisation to prevent overfitting. LASSO shrinks the regression coefficients of less important features to zero, effectively removing them from the model; the remaining non-zero coefficients indicate the important features.

It is important to mention that the effectiveness of this method can be influenced by the scale of the features.

Here’s an example of using LASSO regression to visualise feature importance in a dataset with both continuous and discrete features:

from sklearn.linear_model import LassoCV

# Fit LASSO, choosing the regularisation strength by 5-fold cross-validation
lasso = LassoCV(cv=5, random_state=0)
lasso.fit(df_scaled, target)

# Plot the coefficients
plt.figure(figsize=(10,6))
plt.plot(range(len(df_scaled.columns)), lasso.coef_, marker='o', markersize=8, linestyle='None')
plt.axhline(y=0, color='gray', linestyle='--', linewidth=2)
plt.xticks(range(len(df_scaled.columns)), df_scaled.columns, rotation=90)
plt.ylabel('Coefficients')
plt.xlabel('Features')
plt.title('Feature Importance using LASSO Regression')
plt.show()

[Figure: LASSO coefficients per feature]
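
Because LASSO drives the coefficients of less useful features exactly to zero, the surviving features can be read straight off the fitted model. A short sketch using the lasso object from above:

# Features whose coefficients were not shrunk to zero
selected = df_scaled.columns[lasso.coef_ != 0]
print(f'LASSO kept {len(selected)} of {df_scaled.shape[1]} features: {list(selected)}')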

Conclusion

In this article, we explored different methods for visualising feature importance in a dataset using Python. We covered correlation matrix heatmaps, univariate feature selection, recursive feature elimination, feature importance from tree-based models, and LASSO regression.

Visualising feature importance is an important step in the machine learning workflow as it helps identify the most important features that contribute to the predictive power of the model. By using the methods covered in this article, you can gain insights into the relationships between features and their impact on the target variable.

Remember, feature selection is not a one-size-fits-all approach, and the best method for your dataset may depend on your specific problem and data. Therefore, it is always a good idea to try different methods and evaluate their performance before selecting the best one for your problem.

Additionally, it’s important to note that feature importance is just one aspect of feature selection. Depending on the problem at hand, other methods such as principal component analysis (PCA) or independent component analysis (ICA) may be more appropriate. Finally, domain knowledge should guide feature selection rather than relying solely on automatic methods.
