Overview of the lesson

The Hypertext Transfer Protocol, or simply HTTP, is the core mechanism of the Web: it defines how systems should communicate and how the messages they exchange are formatted.

What you'll learn

  • What HTTP is and how it works.
  • HTTP requests.
  • HTTP responses.
  • How to analyse HTTP requests.

In this lesson, we will explore what the HTTP protocol is, how it works, and how you can analyse HTTP messages in your browser. The HTTP protocol is already very well documented, so instead of a full discussion of its semantics, we will focus on an overview while offering relevant resources to extend your understanding.

Prerequisites: A basic understanding of web applications and how they work. You should also be familiar with the concept of a web server.
Objective: To get familiar with the basic features of the HTTP protocol: what it can do and its intended uses.
TL;DR
  • Hypertext Transfer Protocol (HTTP) is a network protocol used to transmit documents or plain-text data between a client and a web server.
  • HTTP is a stateless protocol. This means the web server doesn't have any information about the user between requests.
  • To communicate with the web server, the client (usually your browser) sends an HTTP request based on your interaction with the web application. The server processes the request and responds with a message called an HTTP response, which is then displayed in your browser.
  • HTTP headers are simple clear-text key-value pairs (e.g., Accept: text/html).
  • The first line of an HTTP request contains three elements: the verb, the path, and the HTTP version. This first line is followed by multiple headers and a message that is optional.
  • The structure of HTTP responses is similar to the one described above. The difference is that the first line of a response contains the HTTP version, a status code, and a status message (e.g., HTTP/1.1 200 OK).
  • As an ethical hacker, it is essential to understand HTTP messages and how to analyse them as you will work a lot with them.

HTTP is the core protocol (a set of rules that define how networked devices should communicate) for transmitting data over the Web. What started as a simple project with barely a page and a half of documentation quickly turned into a de facto standard used not just by web applications, but by any Internet-connected device.

How does the HTTP protocol work?

At its most fundamental level, HTTP relies on another protocol suite, TCP/IP, as its transmission mechanism. Simply put, when a user visits a website, their web browser establishes a TCP connection to the server on port 80 and sends the user’s request. (A port is a unique number used to identify a service running on a server. Port 80 is the default port allocated for HTTP, although the developer can change it to another number.)

While multiple HTTP requests can be sent through a single TCP connection, each HTTP request is independent, which makes HTTP a stateless protocol. This means that the web server cannot, on its own, tell whether two requests come from the same sender. It treats every request as a unique and autonomous interaction.

So, the browser and the web server communicate by exchanging messages, as shown in Fig 1. One purpose of HTTP is to define and standardise the format of these messages. The communication process works as follows: a client (usually a web browser) sends a message called an HTTP request to the server based on the user's interaction with the web application. The server processes the request and responds with a message called an HTTP response, which is then displayed to the user.

Figure 1 - Basic HTTP workflow.
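To make the workflow above more concrete, here is a minimal Python sketch of one request/response exchange over a TCP connection. It is only an illustration under a few assumptions: the names HelloHandler, fetch_raw, and demo are hypothetical, the server is Python's built-in http.server running on the local machine, and it listens on a random free port rather than port 80.

```python
import socket
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class HelloHandler(BaseHTTPRequestHandler):
    """Hypothetical handler that answers every GET with a short text body."""

    def do_GET(self):
        body = b"Hello World!"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo output quiet


def fetch_raw(host: str, port: int, path: str) -> bytes:
    """Open a TCP connection, send one HTTP request, return the raw response."""
    request = (
        f"GET {path} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        "Connection: close\r\n"
        "\r\n"
    )
    with socket.create_connection((host, port), timeout=5) as conn:
        conn.sendall(request.encode("ascii"))
        chunks = []
        while True:
            chunk = conn.recv(4096)
            if not chunk:  # the server closed the connection
                break
            chunks.append(chunk)
    return b"".join(chunks)


def demo() -> bytes:
    # Port 0 asks the operating system for any free port.
    server = HTTPServer(("127.0.0.1", 0), HelloHandler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        port = server.server_address[1]
        return fetch_raw("127.0.0.1", port, "/docs/index.html")
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(demo().decode("iso-8859-1"))
```

Running the script prints the raw response exactly as it travelled over the TCP connection: a status line, a few headers, an empty line, and the body. A browser does the same thing behind the scenes before rendering the result.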
Note:
In reality, there are more computers between a browser and the server handling the request: there are routers, modems, and more. Thanks to the layered design of the Web, these are hidden in the network and transport layers. HTTP sits on top, at the application layer. Although important for diagnosing network problems, the underlying layers are mostly irrelevant to the description of HTTP.

So, a more accurate representation of this process would look like this:

Figure 2 - Intermediary devices that forward the HTTP messages.

HTTP messages

HTTP was conceived as a simple, text-based protocol. All messages are in a human-readable (a.k.a. plain-text or clear-text) format, which facilitates debugging and testing (yeah, that’s great for us ethical hackers). The structure of both the request and the response is divided into two sections: headers and body.

A header is a key-value pair that can carry a lot of information, such as the date, the content type, the IP address of the issuer, the name of the web server, and much more. Each header sits on its own line, and an empty line marks the end of the header section. Below is an example of request headers, followed by an example of response headers:

Accept: text/html
Accept-Language: de

Translated into English, these headers are equivalent to saying “Hey web server, I want the information in HTML format. Also, I would like the information in the German language, if possible.”

Content-Length: 281
Set-Cookie: lastReadArticle=1234; expires=Wed, 28-Jul-2021 14:04:00 GMT;

These response headers are equivalent to saying, “Hey browser, here is the content you requested. Its length is 281 octets (bytes). Also, set a cookie with the name lastReadArticle and the value 1234 that expires on Wed, 28-Jul-2021.”
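The key-value nature of headers can be illustrated with a few lines of Python. This is a minimal sketch, not a full parser: parse_headers is a hypothetical helper, it lower-cases header names for easy lookup, and it simply keeps the last value if the same header appears twice.

```python
def parse_headers(raw: str) -> dict[str, str]:
    """Turn 'Name: value' lines into a dictionary keyed by lower-cased name."""
    headers = {}
    for line in raw.splitlines():
        if not line.strip():
            break  # an empty line marks the end of the header section
        # Split on the first colon only, since values may contain colons too.
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return headers


request_headers = parse_headers("Accept: text/html\nAccept-Language: de\n")
print(request_headers["accept"])           # text/html
print(request_headers["accept-language"])  # de
```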

The body of an HTTP message is the message itself. It can be plain-text, HTML, XML, JSON or a set of key-value pairs (e.g., username=admin&password=supersecurepassword).
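As a small illustration of the key-value body format, the sketch below uses Python's standard urllib.parse module to decode the example string above into a dictionary and to encode it back. The variable names are only illustrative.

```python
from urllib.parse import parse_qsl, urlencode

body = "username=admin&password=supersecurepassword"

# Decode the form-style body into a dictionary of fields.
fields = dict(parse_qsl(body))
print(fields["username"])  # admin

# Encode a dictionary back into the same key=value&key=value format.
encoded = urlencode(fields)
print(encoded == body)  # True
```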

HTTP requests

Below is an example of an HTTP request:

GET /docs/index.html HTTP/1.1
Host: www.uphack.io
Accept: */*
Accept-Language: en-us
Accept-Encoding: gzip, deflate
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_4)
(blank line)

The first line of every HTTP request contains three elements, each delimited by a whitespace character:

  • The HTTP method (or verb) that tells the server what action a user wants to perform.

GET is usually used to retrieve information, while POST is used to send information to the server (such as contact form data). This is not a rule, and it all depends on the developer’s implementation: if they want to use the POST method to retrieve information, that’s fine; they can do that. These are not the only HTTP methods available, though. HTTP also supports some lesser-used verbs such as PUT, DELETE, HEAD, TRACE, and OPTIONS.

  • The path of the requested resource — in our example, that is /docs/index.html
  • The HTTP version, which is self-explanatory. There are four versions: 0.9, 1.0, 1.1, and 2. While 0.9 and 1.0 are still supported by popular web servers, such as Nginx or Apache, all modern web applications use HTTP/1.1 or HTTP/2 due to their significantly better performance and capabilities.
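The three elements of the request line can be pulled apart with a simple split. The following is a minimal sketch, assuming a well-formed request line like the one in the example request; RequestLine and parse_request_line are hypothetical names chosen for illustration.

```python
from typing import NamedTuple


class RequestLine(NamedTuple):
    method: str
    path: str
    version: str


def parse_request_line(line: str) -> RequestLine:
    """Split 'METHOD PATH VERSION' into its three parts."""
    parts = line.strip().split(" ")
    if len(parts) != 3:
        raise ValueError(f"expected three elements, got {len(parts)}")
    method, path, version = parts
    if not version.startswith("HTTP/"):
        raise ValueError(f"unexpected version field: {version!r}")
    return RequestLine(method, path, version)


first_line = parse_request_line("GET /docs/index.html HTTP/1.1")
print(first_line.method)   # GET
print(first_line.path)     # /docs/index.html
print(first_line.version)  # HTTP/1.1
```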

Apart from these three elements, there are also some interesting HTTP headers.

  • Host — specifies the domain name of the server.
  • Accept — describes the content-types that the browser wants in the response.
  • Accept-Language — describes the language in which the content should be displayed (e.g., en-gb for British English, de for German, etc.). If the server can’t find the requested language, it will simply ignore this header.
  • Accept-Encoding — specifies the content encoding scheme that should be used to compress the response.
  • User-Agent — provides details about the client (i.e., your web browser and the operating system).

You can find all the HTTP headers that your browser can send, along with a short description of each, here.
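Putting the pieces together, a request is just the request line, the headers, and an empty line joined into one block of text. The sketch below assembles such a message; build_request is a hypothetical helper, and it covers only requests without a body.

```python
def build_request(method: str, path: str, headers: dict[str, str]) -> str:
    """Assemble a body-less HTTP/1.1 request as text."""
    lines = [f"{method} {path} HTTP/1.1"]
    lines += [f"{name}: {value}" for name, value in headers.items()]
    # Every line ends with CRLF; one extra empty line ends the header section.
    return "\r\n".join(lines) + "\r\n\r\n"


message = build_request(
    "GET",
    "/docs/index.html",
    {"Host": "www.uphack.io", "Accept": "*/*", "Accept-Language": "en-us"},
)
print(message)
```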

HTTP response

This is what an HTTP response looks like:

HTTP/1.1 200 OK
Server: Apache
Last-Modified: Sun, 31 May 2020 01:43:58 GMT
cf-request-id: 0309ff745a0000d208e51e5200000001
Accept-Ranges: bytes
Content-Length: 12
Vary: Accept-Encoding
Content-Type: text/plain

Hello World!

Each HTTP response starts with the HTTP version, a status code, and a status message. The status code indicates whether the request has been successful or not. For instance, if the requested resource exists, the server will return 200 OK, as in the above example. On the other hand, if the resource does not exist or the user is not allowed to access that resource, a 404 Not Found or a 403 Forbidden status is returned. There are many other status codes, and they are grouped into five classes:

  1. Informational 1xx (e.g., 100 Continue, 101 Switching Protocols)
  2. Successful 2xx (e.g., 200 OK, 201 Created, 202 Accepted)
  3. Redirection 3xx (e.g., 301 Moved Permanently, 302 Found)
  4. Client Error 4xx (e.g., 401 Unauthorized, 403 Forbidden, 404 Not Found)
  5. Server Error 5xx (e.g., 500 Internal Server Error, 502 Bad Gateway)

The full list of HTTP status codes and their usage was originally defined in RFC 2616, which has since been superseded by RFC 9110.
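Because the class of a status code is determined by its first digit, it is easy to classify a code programmatically. Here is a minimal sketch using Python's standard http.HTTPStatus enum for the status messages; STATUS_CLASSES and describe_status are hypothetical names.

```python
from http import HTTPStatus

# The first digit of the status code selects one of the five classes.
STATUS_CLASSES = {
    1: "Informational",
    2: "Successful",
    3: "Redirection",
    4: "Client Error",
    5: "Server Error",
}


def describe_status(code: int) -> str:
    """Return the code, its standard message, and its class."""
    status_class = STATUS_CLASSES.get(code // 100, "Unknown")
    try:
        phrase = HTTPStatus(code).phrase
    except ValueError:
        phrase = "unregistered code"  # not known to Python's HTTPStatus enum
    return f"{code} {phrase} ({status_class})"


print(describe_status(200))  # 200 OK (Successful)
print(describe_status(404))  # 404 Not Found (Client Error)
print(describe_status(502))  # 502 Bad Gateway (Server Error)
```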

After the first line, the web server responds with several HTTP headers, followed by an empty line and the body (the requested document or data). The purpose of response headers is to offer additional information about the response. In our example, we can see the Server, Content-Type, and Content-Length headers, among others.
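The structure described above (status line, headers, empty line, body) can be taken apart in a few lines of Python. This is a simplified sketch that assumes a complete, text-only response; parse_response is a hypothetical helper, and the sample message is a shortened version of the example response.

```python
def parse_response(raw: str):
    """Split a raw HTTP response into its status line, headers, and body."""
    # The first empty line separates the header section from the body.
    head, _, body = raw.partition("\r\n\r\n")
    status_line, *header_lines = head.split("\r\n")
    # The status message may contain spaces, so split at most twice.
    version, code, reason = status_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return version, int(code), reason, headers, body


raw_response = (
    "HTTP/1.1 200 OK\r\n"
    "Server: Apache\r\n"
    "Content-Length: 12\r\n"
    "Content-Type: text/plain\r\n"
    "\r\n"
    "Hello World!"
)
version, code, reason, headers, body = parse_response(raw_response)
print(code, reason)               # 200 OK
print(headers["content-length"])  # 12
print(len(body))                  # 12
```

Note that the Content-Length header matches the length of the body, which is how the client knows when it has received the whole message.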

Analysing HTTP requests

Head over to any website and open your browser’s developer tools. Then look for a tab called Network. This is how it should look in Google Chrome.

Figure 3 - HTTP requests in browser.

As an exercise, try to get familiar with HTTP. Navigate to different websites and observe the requests sent by your browser. Don’t hesitate to consult HTTP documentation whenever you don’t understand something.

Conclusion

As an ethical hacker, it is essential to understand the HTTP protocol and how it works. In a future lesson, you will learn how to edit those requests and modify the behavior of an application, but for now, make sure you have an in-depth understanding of the basics. This lesson was intended to provide an overview rather than comprehensive technical documentation. We encourage you to take a look at the following resources to broaden your knowledge:
