Hacker Pig Latin: A Base64 Primer for Security Analysts (2024)

Figure 1: Image by Daniel Berkman, via Adobe Stock

If you have young kids, you'll relate to the value of being able to speak in code. For a few years, I was able to use Pig Latin to speak covertly around my kids. It was handy, and surprisingly effective, until they decoded the scheme and began speaking Pig Latin in front of me. Artypay overray.

I think about this every time I witness attacks where pieces are encoded (not encrypted) to hide what's going on. One of these encodings is typically Base64. Why is this so common? Most machines speak Base64, but most security analysts don't.

In this post, I'll explain what the Base64 encoding scheme is, then discuss how it's used both for good and evil intent. Next, I'll look at some common detection applications of Base64 and where they sometimes fall short, giving advice on what you can do to strengthen them. Finally, I'll address some other encoding algorithms in the wild to help round out the topic and perhaps give you the ability to dead reckon when the bad guys may be trying to hide something.

What Is Base64?
Base64 is an encoding scheme that can take any binary input and represent it using a set of 64 ASCII characters. It's important to note that Base64 is not encryption; it's an encoding scheme, so decoding it is trivial. Simple, free Base64 encode/decode tools are easy to find online.

Encoding in Base64 is an inflationary operation: the 11-character input string "Hello World" converts to 16 characters in Base64.

"Hello World" --( Base64Encode )--> "SGVsbG8gV29ybGQ="

Figure 4: Base64 Table (Credit: Wikimedia Commons)
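To make the inflation concrete, here is a minimal Python sketch using the standard library's base64 module. The variable names are mine, not part of any tool discussed here. Every 3 input bytes become 4 output characters, and the final group is padded with "=".

```python
import base64
import math

text = b"Hello World"
encoded = base64.b64encode(text).decode("ascii")

# Every 3 input bytes (rounded up) produce 4 output characters.
expected_length = 4 * math.ceil(len(text) / 3)

print(encoded)
print(len(text), "bytes in,", len(encoded), "characters out")
```

The same formula explains why longer inputs grow by roughly a third.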

Let's look at a few more Base64 strings.

"Secret string" => U2VjcmV0IHN0cmluZw==

"Be sure to drink your ovaltine" => QmUgc3VyZSB0byBkcmluayB5b3VyIG92YWx0aW5l

They all contain characters from the set [A-Za-z0-9/+] and can end with 0-2 equal signs. Why these characters? The purpose of Base64 is to encode anything (namely binary data) into the characters that are carried easily by text-only protocols.

For example, e-mail was originally only designed to carry text data. As e-mail evolved, the protocols that delivered email didn't. Attaching binary documents like pictures and media files was not possible. The path of least resistance to allow email to progress was to create a binary-to-text encoding scheme rather than altering the protocol.

One facet of the SMTP protocol that makes this clear is the end of message indicator. In SMTP, the signal an email client uses to show the end of a message is for it to supply a single line that contains only a period. (SMTP protocol implementation details, although long, are surprisingly easy to read: https://tools.ietf.org/html/rfc2821. Isn't this period trick odd? What if an email author wanted to send a single line with a period in their email?) Send arbitrary (non-text) data as part of the message body and you could possibly interfere with this protocol feature, and likely others, too.

Another common legitimate example of Base64 use is embedding raw binary data (e.g., images) in-line with HTML pages. HTML is a text-only format after all, and if you want to carry an image right in the page, rather than as a hyperlink for the browser to fetch over a separate connection, Base64 is your answer.

Why Might Attackers Use Base64?
Base64 is often used to hide the plaintext elements of an attack that can't be concealed under the veil of encryption. Look for Base64 use in early stages of attacks, when the breach is narrow.

Using real encryption is hard during early attack stages because encryption requires tooling and key exchange. The adversary can't guarantee that the required cryptographic tools will be available and accessible on the victim host to decrypt anything. But Base64 tooling is far more ubiquitous.

Even if we presume tooling to not be an issue, carrying a symmetric key with the encrypted payload defeats the purpose; asymmetric keys aren't a solution as this requires both infrastructure and further exposure. When an adversary uses encryption, it usually occurs later in the attack and piggybacks over third-party infrastructure.

Examples:
Let's walk through a simple encoding exercise. Encoding the two letters "IN" into Base64 becomes "SU4=". Like so:

Figure 5: Encoding "IN" into Base64
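If you'd like to reproduce the exercise by hand, the sketch below walks the same steps in Python: lay the bytes out as bits, regroup them six at a time, and look each group up in the standard alphabet. The function name manual_b64encode is my own and exists only for illustration; in practice you would call base64.b64encode.

```python
import base64

ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"

def manual_b64encode(data: bytes) -> str:
    # Write the input as one long bit string, 8 bits per byte.
    bits = "".join(f"{byte:08b}" for byte in data)
    # Zero-fill on the right so the bits split evenly into 6-bit groups.
    bits += "0" * (-len(bits) % 6)
    # Map each 6-bit group to its character in the Base64 alphabet.
    chars = [ALPHABET[int(bits[i:i + 6], 2)] for i in range(0, len(bits), 6)]
    # Pad the output with "=" up to a multiple of four characters.
    return "".join(chars) + "=" * (-len(chars) % 4)

print(manual_b64encode(b"IN"))
```

Because the regrouping runs across byte boundaries, a given input character rarely lands in the same 6-bit slots twice, which is the root of the offset issue discussed next.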

Important!
There's one BIG takeaway to absorb as you look at the tables and example: There is no direct translation between ASCII and its Base64 equivalent. In other words, the character "A" translates to three different representations in Base64 depending on what the offset is. It's this misunderstanding that lies at the root of the problem with many Base64 security detections.

These encoded strings illustrate this:

"Secret string" =Base64=> U2VjcmV0IHN0cmluZw==

"ASecret string" =Base64=> QVNlY3JldCBzdHJpbmc=

"AASecret string" =Base64=> QUFTZWNyZXQgc3RyaW5n

--- As we prepend more characters now, things begin repeating ---

"AAASecret string" =Base64=> QUFBU2VjcmV0IHN0cmluZw==

...

A detection that looks at the Base64 version of "Secret string" must consider that it has three representations.

Common Base64 Analysis Techniques and Oversights
When analyzing a string believed to be nefarious plaintext data hidden with Base64, it's important to remember that the suspect string may be only a fragment. It might be necessary to add padding to the beginning and adjust padding at the end to get the decoded text out. Let's look at an example using the CyberChef tool.

Our suspect string is:

ldCBodHRwOi8vMTAuMS4yLjMvdG9vbGtpdHMvbm90aGluZ190b19zZWVfaGVyZS5iaW4=

Step 1: Adjust Trailing Padding if Necessary
We put the suspect string into CyberChef and choose the "From Base64" recipe, which produces the error: "Data is not a valid byteArray." Adjust the number of trailing "=" from 0-2 until the error goes away. In this example, deleting the "=" allows for decoding.

Figure 6: CyberChef

Step 2: If Plaintext Isn't Apparent, Prepend Some Characters
If the output looks to be binary and you suspect text, don't give up yet. Add some characters to the beginning to see if it's simply a bit alignment problem due to truncated data. You can use any valid Base64 character here, but consider using the "/" as the injected padding tends to stand out better (unless the first encoded character is already a "/"). From our test string, three padding characters caused the plaintext to be revealed.

Figure 7: CyberChef output after prepending padding characters
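The two steps can also be scripted. What follows is a rough Python sketch, not a CyberChef equivalent: decode_fragment is a hypothetical helper of mine, and FRAGMENT is my assumed clean reading of the suspect string (Base64 alphabet characters only). It strips trailing "=", tries each possible number of prepended filler characters, and re-pads the end so Python's stricter decoder accepts the input.

```python
import base64
import binascii

# Assumed clean form of the suspect fragment (Base64 alphabet characters only).
FRAGMENT = "ldCBodHRwOi8vMTAuMS4yLjMvdG9vbGtpdHMvbm90aGluZ190b19zZWVfaGVyZS5iaW4="

def decode_fragment(fragment: str, filler: str = "/") -> dict[int, bytes]:
    """Decode a possibly truncated Base64 fragment at each alignment."""
    body = fragment.rstrip("=")              # Step 1: drop trailing padding
    results = {}
    for shift in range(4):                   # Step 2: prepend 0-3 filler characters
        candidate = filler * shift + body
        if len(candidate) % 4 == 1:
            # One leftover character cannot encode a full byte, so drop it.
            candidate = candidate[:-1]
        candidate += "=" * (-len(candidate) % 4)   # re-pad to a multiple of 4
        try:
            results[shift] = base64.b64decode(candidate)
        except binascii.Error:
            continue
    return results

for shift, data in decode_fragment(FRAGMENT).items():
    print(shift, data.decode("ascii", errors="replace"))
```

Only one alignment should yield readable text; the first few bytes of that output are junk produced by the filler characters and can be ignored.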

Where Will I See Base64?
A security analyst will encounter Base64 encoded strings in a variety of places.

The routine and most common places come from examining mail attachments and embedded content (mostly images) from web pages. Other places should cause analysts to be on alert -- for instance, when Base64 strings are detected on the command line.

Below is an example of a reverse shell hiding in plain sight using a PowerShell command. (Ref: mkpsrevshell.py, https://gist.github.com/tothi/ab288fb523a4b32b51a53e542d40fe58.) This leverages the "-e / -EncodedCommand" feature of PowerShell that allows a Base64 string to be passed in. PowerShell will decode the Base64, then execute the script inside.

Figure 8: Ref: mkpsrevshell.py, https://gist.github.com/tothi/ab288fb523a4b32b51a53e542d40fe58

The behavior of spawning a process with Base64 reflected on the command line by itself is suspicious. If you're monitoring Windows process creation, you should inspect when you see that happening.
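When you do inspect one of these command lines, keep in mind that PowerShell expects the -EncodedCommand value to be Base64 of the script text encoded as UTF-16LE, so a plain Base64 decode leaves a zero byte after every ASCII character. Here is a small Python sketch for triage; the helper names and the harmless sample script are my own, not taken from the referenced gist.

```python
import base64

def encode_powershell_command(script: str) -> str:
    # PowerShell's -EncodedCommand takes Base64 of the UTF-16LE script text.
    return base64.b64encode(script.encode("utf-16-le")).decode("ascii")

def decode_powershell_encoded_command(encoded: str) -> str:
    # Reverse the process: Base64-decode, then interpret the bytes as UTF-16LE.
    return base64.b64decode(encoded).decode("utf-16-le")

# Hypothetical, harmless sample script used only to show the round trip.
sample = encode_powershell_command("Write-Output 'hello'")
print(sample)
print(decode_powershell_encoded_command(sample))
```

A side effect of the UTF-16LE step is that encoded ASCII scripts contain a lot of "A" characters at regular intervals, which is a handy visual tell.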

Let's look at another common oversight spotted in a Sigma IDS rule. The rule fragment below is published to Sigma and looks for a particular Base64 string (among other things, see full rule for that):

Figure 9: Sigma rule fragment

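This rule contains a detection element that fires if the string "L3NlcnZlc" is observed. According to the rule, this string translates to "/server=". In fact, it falls a bit short. If we use CyberChef, we notice that it actually translates to "/servet", a mistake probably introduced because the input string carries a trailing "=" sign. Now that we are savvy Base64 sleuths, we can update this rule to the correct string: "L3NlcnZlcj0=". And, using our knowledge of the bit offset problem, we can add the two other Base64 variants that will detect the same thing: "y9zZXJ2ZXI9" and "c2VydmVyPQ". One caution: the first and last characters of a variant can depend on the bytes next to the match (the leading "y" in the second variant, for example, carries two bits of the preceding byte), so trimming each variant to the characters determined by "/server=" alone is the more robust choice.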
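Working out offset variants by hand is error-prone, so here is a minimal Python sketch that generates them. The function base64_search_strings is hypothetical (my own naming, not part of Sigma), and it deliberately trims any leading or trailing character whose value would depend on bytes outside the search term, so each result should match regardless of what surrounds the term in the encoded stream.

```python
import base64

def base64_search_strings(needle: bytes) -> list[str]:
    """Return the Base64 substrings for `needle` at each of the 3 byte offsets."""
    variants = []
    for offset in range(3):
        # Stand in for the unknown preceding bytes with `offset` filler bytes.
        encoded = base64.b64encode(b"\x00" * offset + needle).decode("ascii")
        # Skip leading characters that mix in bits from the filler bytes.
        skip = (0, 2, 3)[offset]
        core = encoded.rstrip("=")[skip:]
        # If the needle ends mid-group, the final character also carries bits
        # of whatever byte follows, so it is dropped as well.
        if (offset + len(needle)) % 3:
            core = core[:-1]
        variants.append(core)
    return variants

for variant in base64_search_strings(b"/server="):
    print(variant)
```

The trimmed strings are slightly shorter than the full encodings, which trades a little specificity for not missing a match because of a neighboring byte.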

Another common Base64 exposure for security analysts is examining HTTP Basic Authentication. (Maybe this isn't as "common" as it used to be, but I'm pretty sure every security analyst has seen at least one of these alerts fire.) Here's an example of an HTTP header using it. The problem here is now pretty obvious. This is a plain-text password. HTTP Basic auth carries the convention of Base64 encoded "username:password" in the "Authorization" client header. This example decodes to "joeuser:very$ecure".

Figure 10: HTTP header using Basic Authentication
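Recovering the credentials from such a header takes only a few lines. The Python sketch below is illustrative: parse_basic_auth is a hypothetical helper, and the header value is rebuilt from the example credentials rather than copied from a real capture.

```python
import base64

def parse_basic_auth(header_value: str) -> tuple[str, str]:
    """Split an Authorization header value into (username, password)."""
    scheme, _, token = header_value.partition(" ")
    if scheme.lower() != "basic":
        raise ValueError("not a Basic Authorization header")
    decoded = base64.b64decode(token).decode("utf-8")
    # Only the first ":" separates the username from the password.
    username, _, password = decoded.partition(":")
    return username, password

# Build the header value the way a client would, then take it apart again.
header_value = "Basic " + base64.b64encode(b"joeuser:very$ecure").decode("ascii")
print(header_value)
print(parse_basic_auth(header_value))
```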

Other Encoding Schemes
If you're a security analyst, at this point you may have realized a great evil application for Base64: data exfiltration over DNS! But there are a couple of problems here. First, the defined character set for Base64 includes characters not allowed in DNS strings (+, /, =). Second, DNS is case-insensitive. An adversary couldn't guarantee that their Base64 encoded subdomain wouldn't get "lowered" along the way. But … there's always Base32! Base32 is very similar to Base64 encoding, except it carries data when we can't use upper/lowercase to encode information. Base32 is even more inflationary than Base64 (8 output characters for every 5 input bytes, versus 4 for every 3), so encoding large amounts of data for exfiltration using Base32 is sure to be a very loud network event.
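To see why Base32 survives the trip through DNS, consider this short Python sketch. The helper names to_dns_label and from_dns_label are hypothetical, and the sample payload is just a stand-in. The encoded label uses only letters and the digits 2-7, so lowercasing it loses nothing.

```python
import base64

def to_dns_label(data: bytes) -> str:
    # Base32 output is A-Z and 2-7; dropping "=" and lowercasing keeps it DNS-friendly.
    return base64.b32encode(data).decode("ascii").rstrip("=").lower()

def from_dns_label(label: str) -> bytes:
    # Restore the case and the padding (to a multiple of 8) before decoding.
    padded = label.upper() + "=" * (-len(label) % 8)
    return base64.b32decode(padded)

payload = b"Secret string"
label = to_dns_label(payload)

print(label)
print(from_dns_label(label))
print(len(base64.b64encode(payload)), "Base64 chars vs",
      len(base64.b32encode(payload)), "Base32 chars")
```

Since a single DNS label is limited to 63 characters, any sizable payload has to be spread across many queries, which is where the noise comes from.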

Don't forget, too, that Base16 (hex) and Base2 (binary) are also valid encoding schemes with readily accessible tooling. Security analysts see these everywhere in their daily work, but rarely have to analyze them as an adversary technique the way they do Base64.

Variants of Base64 use different alphabets. For instance, there's a URL- and filename-safe variant that substitutes "-" for "+" and "_" for "/". So just because you see something that looks like a Base64 string but has a "-" or "_" in it, don't discount it too quickly. The CyberChef tool demonstrated earlier can be configured for these alternate alphabets.
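Python's standard library handles the URL- and filename-safe alphabet directly, in which "-" stands in for "+" and "_" stands in for "/". The quick sketch below uses three bytes I picked specifically because they encode to only those special characters.

```python
import base64

# Bytes chosen so that every output character is one of the "special" ones.
raw = bytes([0xFB, 0xFF, 0xFE])

standard = base64.b64encode(raw).decode("ascii")
urlsafe = base64.urlsafe_b64encode(raw).decode("ascii")

print(standard, urlsafe)

# Decoding the URL-safe form recovers the same bytes.
print(base64.urlsafe_b64decode(urlsafe) == raw)
```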

Summary
We explored Base64 encoding from the security analyst's perspective. Base64 encoding is traditionally used to convert binary data to printable text characters, but it can also be used to hide plaintext. Security analysts should keep these common techniques in mind while performing investigations, as all too often encoding plaintext as Base64 is enough to make the best detection engine (our eyes) miss it.

Once understood, Base64 detection flaws can be identified and signatures/logic improved to reflect all possible permutations.

Article information

Author: Maia Crooks Jr
