What are JSON Web Tokens (JWTs), and how did they rise to prominence?

HTTP is stateless:

HTTP (Hypertext Transfer Protocol) is the de facto standard for navigating the web. It makes it possible to exchange data over the web and follows the client-server model. It is also a stateless protocol.

Let's assume we have a simple static website (example.com) where a user does not need to authenticate or be authorized to access resources. When a user opens the website, the browser sends a request to a server. The server responds with some resources and then completely forgets about the user. When the user sends another request, the server has no idea that it comes from the same user. The information, or state, that would help the server identify the user is not maintained anywhere. (Note: state, in simple terms, is just a collection of variables that store some useful information, such as the identity of a user visiting a website.)

Benefits of statelessness:

In simple terms, HTTP was designed to be stateless for the sake of simplicity.

Saving the state can be tricky, particularly in distributed environments. Also, recovering from a fault becomes harder since we need to recover states as well after the system crashes.

Challenges due to statelessness:

Let's say you logged in to amazon.com and added a pair of shoes (item id = 1) to the cart. In the background, Amazon verifies your login credentials and creates a cart for you. For the sake of argument, let's assume Amazon does not maintain the state of the user internally. Then, when you later add a pair of socks (item id = 2) to the cart, you need to send your login credentials to Amazon once again so that it can recognize you and add the new item to the cart you created earlier. Hence, every request needs to be paired with login credentials. This is a big burden to undertake.

Getting out of statelessness madness:

Session: The traditional antidote to statelessness

A session is created in the following way. The client sends login credentials to the server; the server authenticates the request, creates a session, and returns the session id to the client. The client, which is typically a browser, stores the session id inside a cookie. From then on, whenever the client requests a server resource, it sends the session id along with the request. The server extracts the session id from the request and looks it up in its session store (for example, a database). If the session is found and still valid, the user is authorized. If not, the request is rejected as belonging to an invalid session.

With sessions, a client does not need to send login credentials on each request.
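
The session flow described above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: USERS, SESSIONS, login and add_to_cart are hypothetical names, the "database" is an in-memory dictionary, and the password is compared in plain text purely to keep the sketch short (a real server would store and compare password hashes).

```python
import secrets

# Hypothetical in-memory stand-ins for the user table and the session store.
USERS = {"alice": "s3cret"}
SESSIONS = {}

def login(username, password):
    """Check the credentials, create a session, and return its id (None on failure)."""
    if USERS.get(username) != password:
        return None
    session_id = secrets.token_urlsafe(32)  # unguessable random session id
    SESSIONS[session_id] = {"user": username, "cart": []}
    return session_id

def add_to_cart(session_id, item_id):
    """Look up the session id sent with the request; reject unknown ids."""
    session = SESSIONS.get(session_id)
    if session is None:
        raise PermissionError("invalid session")
    session["cart"].append(item_id)
    return session["cart"]

sid = login("alice", "s3cret")  # credentials are sent only once
add_to_cart(sid, 1)             # shoes
print(add_to_cart(sid, 2))      # socks land in the same cart: [1, 2]
```

Note that every request still triggers a lookup in SESSIONS, which is exactly the server-side state the next section is concerned with.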

Problems with sessions:

Sessions can be problematic in a distributed environment.

Let's say we have two servers in our system, server1 and server2, and a current user session, session1. Both servers hold a copy of the session, so the user's requests can be authorized by either of them.

Now, say that the old session has expired and a new session, session2, is created. It is first stored on server1 but, for some reason, it is not replicated to server2 in time. In this case, if a request carrying the new session id session2 is routed to server2, the session won't be recognized and the user will not be authenticated.
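
To make the failure mode concrete, here is a tiny simulation. It is a sketch only: the two sets stand in for each server's copy of the session store, replication is "performed" by hand, and all names are hypothetical.

```python
# Hypothetical per-server copies of the session store.
server1_sessions = {"session1"}
server2_sessions = {"session1"}

def is_authorized(server_sessions, session_id):
    """A server only recognizes sessions present in its own copy."""
    return session_id in server_sessions

# session1 expires and session2 is created, but only server1 has been updated so far.
server1_sessions.discard("session1")
server1_sessions.add("session2")

print(is_authorized(server1_sessions, "session2"))  # True
print(is_authorized(server2_sessions, "session2"))  # False: server2's copy is stale
```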

JWT to the rescue:

The main reason for the aforementioned issue is the fact that sessions are stored on servers. This can be solved by having a mechanism to store session information on the client side. This is exactly what JWT does.

What is JWT?

Borrowing the definition from the official website,

JSON Web Token (JWT) is an open standard (RFC 7519) that defines a compact and self-contained way for securely transmitting information between parties as a JSON object. This information can be verified and trusted because it is digitally signed.

JWTs are used for two main tasks:

a. Authorization:

Once the user is authenticated, the server generates a JWT and gives it back to the client. The client can then send that token with subsequent requests to prove that it is authorized.

b. Information Exchange:

JWTs can also be used to transfer information securely. Because a JWT is signed, it is possible to verify that the sender is indeed who they claim to be. Moreover, since the header and payload are covered by the signature, it also protects against message tampering (more on this later).

The structure of JWT:

The structure of a JWT is base64UrlEncode(header).base64UrlEncode(payload).base64UrlEncode(signature)

Header:

It consists of two parts:

  1. The type of token: In the case of JWT, the token type will be JWT.

  2. The signing algorithm used: for example, HMAC with SHA-256 (HS256) or RSA with SHA-256 (RS256)

{
  "alg": "RS256",
  "typ": "JWT"
}

Then, this JSON is Base64Url encoded to form the first part of the token.
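
As a quick illustration, the header above can be encoded with Python's standard library. This is a minimal sketch: b64url_encode is a hypothetical helper name, and the compact JSON separators are a choice made here so that the output contains no whitespace.

```python
import base64
import json

def b64url_encode(data: bytes) -> str:
    """Base64Url-encode and strip the trailing '=' padding, as JWTs do."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")

header = {"alg": "RS256", "typ": "JWT"}
# Serialize without spaces, then encode the resulting bytes.
header_json = json.dumps(header, separators=(",", ":")).encode("utf-8")
encoded_header = b64url_encode(header_json)
print(encoded_header)  # eyJhbGciOiJSUzI1NiIsInR5cCI6IkpXVCJ9
```

Base64Url differs from plain Base64 in that it uses "-" and "_" instead of "+" and "/", which keeps the token safe to place in URLs and HTTP headers.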

Payload:

The payload consists of claims. Claims are the information about the entity being authorized. They are of three types:

  1. Registered claims: These are predefined claims that are not mandatory but are recommended for inclusion in a JWT.

    iss (issuer): The entity that issued the token. E.g., the backend service of an application.

    aud (audience): The entities for which the JWT is intended. E.g., if the issuer generates JWTs for multiple related applications, all these applications can be considered the audience of the token.

    sub (subject): The entity whose information is stored in the claims. E.g., a user or user id.

    exp (expiration time): Defines the time on or after which the JWT must not be accepted for processing.

  2. Public claims: Public claims can be created by anyone at will but, to avoid collisions, they should either be registered in the IANA JSON Web Token Claims registry or be named using a collision-resistant namespace such as a URI.

  3. Private claims: Multiple parties can agree to create private claims to facilitate information exchange between them. Since they are private, they need not be registered in the token registry.
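
To show how a receiver might use the registered claims, here is a hedged sketch of a claims check. The function name check_registered_claims, the issuer URL, and the audience names are all hypothetical, and treating iss and aud as required is a policy assumed for this example, not something the standard mandates. The check would run only after the token's signature has been verified.

```python
import time

def check_registered_claims(claims, expected_issuer, expected_audience, now=None):
    """Return a list of problems found in the registered claims (empty if none)."""
    now = time.time() if now is None else now
    problems = []
    if claims.get("iss") != expected_issuer:
        problems.append("unexpected issuer")
    aud = claims.get("aud")
    audiences = aud if isinstance(aud, list) else [aud]  # aud may be a string or a list
    if expected_audience not in audiences:
        problems.append("token not intended for this audience")
    exp = claims.get("exp")
    if exp is not None and now >= exp:  # reject on or after the expiration time
        problems.append("token expired")
    return problems

claims = {
    "iss": "https://auth.example.com",  # hypothetical issuer
    "aud": ["shop", "billing"],         # hypothetical audiences
    "sub": "123412342",
    "exp": 2000,                        # seconds since the Unix epoch
}
print(check_registered_claims(claims, "https://auth.example.com", "shop", now=1000))  # []
print(check_registered_claims(claims, "https://auth.example.com", "shop", now=2000))  # ['token expired']
```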

An example payload could be:

{
  "sub": "123412342",
  "name": "Pramithas Dhakal",
  "is_verified": true
}

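Then, it is Base64Url encoded to form the second part of the token. Note that encoding is not encryption: anyone who holds the token can decode and read the payload, so it should not contain secrets.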
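
The following sketch encodes the example payload and then decodes it again without any key, which is a handy way to convince yourself that the payload is readable by anyone. The helper names b64url_encode and b64url_decode are hypothetical.

```python
import base64
import json

def b64url_encode(data: bytes) -> str:
    """Base64Url-encode and strip the '=' padding."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")

def b64url_decode(text: str) -> bytes:
    """Restore the stripped padding, then Base64Url-decode."""
    padding = "=" * (-len(text) % 4)
    return base64.urlsafe_b64decode(text + padding)

payload = {"sub": "123412342", "name": "Pramithas Dhakal", "is_verified": True}
encoded_payload = b64url_encode(json.dumps(payload, separators=(",", ":")).encode("utf-8"))

# No key is involved in getting the claims back.
decoded_payload = json.loads(b64url_decode(encoded_payload))
print(decoded_payload == payload)  # True
```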

Signature:

To create a signature, a key is needed: a shared secret for HMAC-based algorithms such as HS256, or a private key for RSA-based algorithms such as RS256. We take the encoded header and payload and sign them with that key using the algorithm named in the header. (More on this later.)

If the algorithm used is RS256, the resulting signature will be:

RS256(
  base64UrlEncode(header) + "." +
  base64UrlEncode(payload),
  privateKey
)
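
RS256 needs an RSA implementation, which Python's standard library does not provide, so the sketch below swaps in HS256 (HMAC with SHA-256), the other algorithm mentioned in the header section. The shape of the computation is the same: sign the encoded header and payload joined by a dot. The function name sign_hs256 and the demo secret are assumptions made for this example.

```python
import base64
import hashlib
import hmac
import json

def b64url_encode(data: bytes) -> str:
    """Base64Url-encode and strip the '=' padding."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")

def sign_hs256(header: dict, payload: dict, secret: bytes) -> str:
    """Build header.payload.signature using HMAC-SHA256 over the signing input."""
    encoded_header = b64url_encode(json.dumps(header, separators=(",", ":")).encode("utf-8"))
    encoded_payload = b64url_encode(json.dumps(payload, separators=(",", ":")).encode("utf-8"))
    signing_input = encoded_header + "." + encoded_payload
    signature = hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest()
    return signing_input + "." + b64url_encode(signature)

token = sign_hs256(
    {"alg": "HS256", "typ": "JWT"},
    {"sub": "123412342", "name": "Pramithas Dhakal", "is_verified": True},
    b"demo-secret-do-not-use-in-production",  # hypothetical shared secret
)
print(token.split(".")[0])  # eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9
```

With HS256 the same secret both creates and checks the signature, so every verifier must be trusted with it. RS256 avoids this by signing with a private key and verifying with a public one.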

How does JWT signing work?

  1. First, we have the Base64Url-encoded header and payload separated by a dot. This is called the signing input, and it looks like this: base64UrlEncode(header) + "." + base64UrlEncode(payload)

  2. Then, we hash the signing input using the SHA-256 hashing algorithm.

    SHA-256(base64UrlEncode(header) + "." + base64UrlEncode(payload))

  3. Then, the token issuer encrypts the hash with its private key to generate the signature. (This is a simplified view; RS256 also pads the hash according to PKCS #1 v1.5 before the RSA operation.)

    RSA(SHA-256(base64UrlEncode(header) + "." + base64UrlEncode(payload)), privateKey)

How does validation work?

  1. Decode the header and find out the algorithm, which in our case is RS256.

  2. Calculate a fresh SHA-256 hash of the signing input, i.e. the encoded header and payload exactly as received.

  3. Decrypt the signature using the public key. The decrypted value is the original hash created by the authorization server.

  4. Compare the newly calculated hash with the one generated by the authorization server.

  5. If they match, the data has not been tampered with. If they do not match, the data has been tampered with, and the token is rejected.
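
The hash-then-sign idea and validation steps 2 to 5 can be imitated with textbook RSA on a toy key. This is an illustration only and not real RS256: the key is built from two small, publicly known Mersenne primes, the hash is simply reduced modulo N instead of being padded, and the names sign and verify are hypothetical. Step 1 (reading the algorithm from the header) is skipped, and the signing input is a placeholder string.

```python
import hashlib

# Toy key built from two known Mersenne primes. Insecure; for illustration only.
P, Q = 2**61 - 1, 2**89 - 1
N = P * Q                          # public modulus
E = 65537                          # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))  # private exponent (Python 3.8+)

def digest_as_int(signing_input: str) -> int:
    """SHA-256 of the signing input, reduced modulo N to fit the toy key."""
    digest = hashlib.sha256(signing_input.encode("ascii")).digest()
    return int.from_bytes(digest, "big") % N

def sign(signing_input: str) -> int:
    """Issuer side: transform the hash with the private exponent."""
    return pow(digest_as_int(signing_input), D, N)

def verify(signing_input: str, signature: int) -> bool:
    """Verifier side: recover the hash with the public key and compare."""
    recovered = pow(signature, E, N)              # step 3
    fresh = digest_as_int(signing_input)          # step 2
    return recovered == fresh                     # steps 4 and 5

signature = sign("encodedHeader.encodedPayload")
print(verify("encodedHeader.encodedPayload", signature))   # True
print(verify("encodedHeader.tamperedPayload", signature))  # False
```

The verifier never needs the private exponent D, which is why a service can validate RS256 tokens while being unable to mint new ones.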

This is a basic overview of how JWTs work.

Thus, JWT came into existence to solve the problems we had with sessions. In this post, I focused mainly on the drawbacks of sessions and on how JWT overcomes them.

But this does not necessarily mean that JWT is better or worse than sessions.

Both have their own merits and demerits. A more elaborate comparison is beyond the scope of this blog post and is left as a research exercise for readers.