How XGBoost algorithm works—ArcGIS Pro (2024)

XGBoost is a supervised machine learning method for classification and regression and is used by the Train Using AutoML tool. XGBoost is short for extreme gradient boosting. The method is based on decision trees and improves on other methods such as random forest and gradient boosting. It performs well with large, complicated datasets by using various optimization methods.

To fit a training dataset using XGBoost, an initial prediction is made. Residuals are computed from the predicted and observed values. A decision tree is then built on the residuals: a similarity score is calculated for the residuals in each leaf, as well as the gain in similarity produced by each candidate split. The gains are compared to choose the feature and threshold for each node. An output value for each leaf is also calculated from its residuals; for classification, these values are typically calculated using log odds and probabilities. The outputs of the tree update the predictions, and the new residuals are used to construct the next tree. This process is repeated until the residuals stop decreasing or a specified number of iterations is reached. Each subsequent tree learns from the previous trees and is not assigned equal weight, unlike in random forest.
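As a concrete illustration of the similarity score and gain described above, the following sketch computes them for a single candidate split on regression residuals. The toy data, variable names, and use of Python with NumPy are illustrative assumptions, not part of the tool.

```python
import numpy as np

def similarity(residuals, lam=1.0):
    # Similarity score of a leaf: squared sum of residuals divided by
    # (number of residuals + lambda), where lambda is the regularization parameter.
    return residuals.sum() ** 2 / (len(residuals) + lam)

def split_gain(residuals, left_mask, lam=1.0):
    # Gain of a split: similarity of the two child leaves minus that of the parent.
    left, right = residuals[left_mask], residuals[~left_mask]
    return similarity(left, lam) + similarity(right, lam) - similarity(residuals, lam)

# Toy regression data; the initial prediction is the mean of the observed values.
y = np.array([3.0, 5.0, 9.0, 11.0])
residuals = y - y.mean()

# Candidate split: the first two observations go to the left leaf.
left_mask = np.array([True, True, False, False])
print(split_gain(residuals, left_mask))  # the split with the largest gain is chosen
```

Splits with larger gain are preferred; a larger lambda shrinks the similarity scores and therefore the gain, which is how regularization discourages splits driven by a few observations.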

To use this model for prediction, the output of each tree, multiplied by a learning rate, is added to the initial prediction to arrive at the final value or class.
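For example, for a regression the final value can be assembled as follows; the initial prediction, the per-tree leaf outputs, and the learning rate here are hypothetical numbers chosen only to make the arithmetic visible.

```python
initial_prediction = 7.0           # for example, the mean of the training targets
tree_outputs = [-2.5, -0.8, -0.3]  # hypothetical leaf outputs from three trees for one sample
learning_rate = 0.3

# Each tree's output is scaled by the learning rate and added to the initial prediction.
prediction = initial_prediction + learning_rate * sum(tree_outputs)
print(prediction)  # 7.0 + 0.3 * (-3.6) = 5.92
```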

XGBoost uses the following parameters and methods to optimize the algorithm and provide better results and performance:

  • Regularization—A regularization parameter (lambda) is used when calculating the similarity scores to reduce the model's sensitivity to individual data points and to avoid overfitting (see the parameter sketch after this list).
  • Pruning—A tree complexity parameter (gamma) is used to compare the gains. Branches where the gain is smaller than the gamma value are removed. This prevents overfitting by trimming unnecessary branches and reducing the depth of the trees.
  • Weighted quantile sketch—Instead of testing every possible value as the threshold for splitting the data, only weighted quantiles are used as candidates. The quantiles are selected using a sketch algorithm, which approximates the distribution of data that is spread across multiple systems on a network.
  • Parallel learning—The data is divided into blocks that can be processed in parallel to create the trees or to perform other computations.
  • Sparsity-aware split finding—XGBoost handles sparsity in the data by trying both directions of a split and choosing a default direction based on the calculated gain.
  • Cache-aware access—The cache memory of the system is used to calculate the similarity scores and output values. Because cache memory is faster to access than main memory, this improves the overall performance of the model.
  • Blocks for out-of-core computation—This method works with large datasets that cannot fit in the cache or main memory and must be kept on hard drives. The dataset is divided into blocks and compressed; decompressing the data in main memory is faster than reading uncompressed data from the hard drive. Another technique, called sharding, is used when the data must be kept on multiple hard drives.
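Several of these settings surface directly as parameters of the open-source xgboost Python package, which can help connect the concepts above to a concrete configuration. The sketch below is only illustrative: the data and parameter values are arbitrary and are not the settings used by the Train Using AutoML tool.

```python
import numpy as np
import xgboost as xgb

# Illustrative training data.
rng = np.random.default_rng(0)
X = rng.random((100, 4))
y = X @ np.array([1.0, -2.0, 0.5, 3.0]) + rng.normal(scale=0.1, size=100)

model = xgb.XGBRegressor(
    n_estimators=100,    # number of boosting rounds (trees)
    learning_rate=0.3,   # scales each tree's contribution to the prediction
    reg_lambda=1.0,      # lambda: regularization term in the similarity scores
    gamma=0.0,           # tree complexity parameter: minimum gain required to keep a split
    max_depth=6,         # limits the depth of each tree
    tree_method="hist",  # histogram (quantile-based) split finding
)
model.fit(X, y)
print(model.predict(X[:5]))
```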

Additional Resources

Chen, Tianqi, and Carlos Guestrin. 2016. "XGBoost: A Scalable Tree Boosting System." In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 785–794.

XGBoost Documentation
