Basics
Understanding URL Encoding
a.k.a. Percent Encoding, and why is it important?
A URL (Uniform Resource Locator) is the address you use to look up a website on the internet. It is also what makes it possible for a client to make requests to a server.
So what is URL Encoding (or Percent Encoding)?
In simple terms, URL Encoding is a filter that converts certain characters of a URL into a % followed by the two-digit hexadecimal value of the character.
Now, we can’t just replace the entire URL with its hexadecimal equivalent; that would defeat the purpose, since the URL would no longer be readable. Instead, we typically encode the characters marked as reserved in RFC 3986 when they appear as data rather than as delimiters, along with anything outside the unreserved set, and leave the unreserved characters (letters, digits, -, ., _ and ~) as they are.
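To make the reserved/unreserved split concrete, here is a minimal sketch using Python's standard urllib.parse.quote. The sample strings and variable names are made up for illustration, and safe="" is passed so that no reserved character is exempted (by default quote leaves "/" alone).

```python
from urllib.parse import quote

# Unreserved characters (letters, digits, "-", ".", "_", "~") pass through unchanged.
unreserved = "abc-XYZ_0.9~"
encoded_unreserved = quote(unreserved, safe="")
print(encoded_unreserved)  # abc-XYZ_0.9~

# Reserved characters and the space become % followed by two hex digits.
value = "a&b=c d/e?"
encoded_value = quote(value, safe="")
print(encoded_value)  # a%26b%3Dc%20d%2Fe%3F
```

Encoding "&", "=", "/" and "?" this way matters when they are part of a value, because left as-is they would be read as URL delimiters.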
Why do we use URL encoding?
Let’s say an attacker tries to attack a web app by injecting malicious code in the form of an XSS attack.
A basic XSS script would look like this:
<script>alert("Haxxor")</script>
If we encode this particular script using URL encoding, it would look something like this:
%3Cscript%3Ealert(%22Haxxor%22)%3C%2Fscript%3E
As long as it stays in this encoded form, the browser sees plain text rather than a script tag, which renders this particular code useless. Now, there are other ways to do an XSS attack in this case, which we’ll cover some other day.
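The encoded string above can be reproduced with a short sketch, again assuming Python's urllib.parse. Passing safe="()" is a choice made here only to match the output shown in this post; without it, quote would also encode the parentheses as %28 and %29.

```python
from urllib.parse import quote, unquote

payload = '<script>alert("Haxxor")</script>'

# safe="()" keeps the parentheses literal so the result matches the
# encoded string shown in the post; this is an illustrative assumption.
encoded = quote(payload, safe="()")
print(encoded)  # %3Cscript%3Ealert(%22Haxxor%22)%3C%2Fscript%3E

# Decoding restores the original text exactly.
decoded = unquote(encoded)
print(decoded == payload)  # True
```

Note that the encoding is reversible: the payload is only harmless while it stays encoded, so an app that decodes it and writes it into a page unescaped is exposed again.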
TL;DR
URL Encoding converts reserved, unsafe, and non-ASCII characters in URLs to a format that is universally accepted and understood by all web browsers and servers. It first converts the character to one or more bytes. Then each byte is represented by two hexadecimal digits preceded by a percent sign (%), e.g. %xy. The percent sign is used as an escape character.
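The two steps in the TL;DR (character to bytes, then each byte to %xy) can be sketched by hand. The helper name percent_encode_char is hypothetical, and UTF-8 is assumed as the byte encoding, which is the usual choice on the web but is not stated in this post.

```python
def percent_encode_char(ch: str) -> str:
    """Encode one character as %XX for each of its bytes (UTF-8 assumed)."""
    # Step 1: character -> one or more bytes.
    # Step 2: each byte -> "%" plus two uppercase hex digits.
    return "".join(f"%{byte:02X}" for byte in ch.encode("utf-8"))

print(percent_encode_char(" "))  # %20
print(percent_encode_char("é"))  # %C3%A9
print(percent_encode_char("€"))  # %E2%82%AC
```

A space is a single byte, so it becomes one %xy group, while "é" and "€" take two and three bytes in UTF-8 and so expand to two and three groups.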