Overview of the lesson
The Hypertext Transfer Protocol, or simply HTTP, is the core mechanism of the Web: it defines how systems should communicate and the format of the messages they exchange.
What you'll learn
In this lesson we will explore what HTTP is, how it works, and how you can analyse HTTP messages in your browser. HTTP is already very well documented, so instead of a full discussion of its semantics, we will focus on an overview, while offering relevant resources to extend your understanding.
Prerequisites | A basic understanding of web applications and how they work. You should also be familiar with the concept of a web server. |
Objective | To get familiar with the basic features of the HTTP protocol: what it can do and its intended uses. |
TL;DR
HTTP is the core protocol (a set of rules that define how networked devices should communicate) for transmitting data over the Web. What started as a simple project with barely a page and a half of documentation quickly turned into a de facto standard used not just by web applications, but by all kinds of Internet-connected devices.
At its most fundamental level, HTTP relies on another protocol suite, TCP/IP, as its transport mechanism. Simply put, when a user visits a website, their web browser establishes a TCP connection to the server on port 80 and sends the user’s request. (A port is a number that identifies a service running on a server. Port 80 is the default port for HTTP, but the server can be configured to listen on a different one.)
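To make the TCP step concrete, here is a minimal sketch of what a client does under the hood: build the request text, open a TCP connection, send the bytes, and read whatever comes back. The function names build_request and fetch_raw are hypothetical, and the Connection: close header is an addition made here so that the server closes the connection once it has answered. Running fetch_raw needs network access, and many sites answer plain-HTTP requests on port 80 with a redirect.

```python
import socket


def build_request(host, path="/"):
    """Return the raw bytes of a minimal HTTP/1.1 GET request."""
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",
        "Connection: close",  # assumption: ask the server to close after responding
        "",                   # empty line that ends the header section
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def fetch_raw(host, path="/", port=80, timeout=5.0):
    """Open a TCP connection, send the request, and return the raw response bytes."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(build_request(host, path))
        chunks = []
        while True:
            chunk = sock.recv(4096)
            if not chunk:  # the server closed the connection
                break
            chunks.append(chunk)
    return b"".join(chunks)
```

Calling build_request("www.uphack.io", "/docs/index.html") alone is enough to inspect the exact bytes a client would put on the wire, without touching the network.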
While multiple HTTP requests can be sent through a single TCP connection, each HTTP request is independent, which makes HTTP a stateless protocol. This means that the protocol itself does not tell the web server whether two requests came from the same sender. The server treats every request as a unique and autonomous interaction.
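The following sketch illustrates this with Python's standard library, assuming a throwaway server on the local machine: two requests travel over one TCP connection, yet the server handles each one on its own and keeps nothing between them. The handler class, the seen list, and the paths /first and /second are all hypothetical names chosen for the demonstration.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

seen = []  # (client_port, path) for every request the server receives


class EchoPathHandler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # allows persistent (keep-alive) connections

    def do_GET(self):
        # Each request is handled in isolation: nothing is read from earlier ones.
        seen.append((self.client_address[1], self.path))
        body = self.path.encode()
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demonstration quiet


def two_requests_one_connection():
    """Send two GET requests over a single connection and return both bodies."""
    server = HTTPServer(("127.0.0.1", 0), EchoPathHandler)  # port 0: OS picks a free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    conn = http.client.HTTPConnection("127.0.0.1", server.server_address[1], timeout=5)
    bodies = []
    try:
        for path in ("/first", "/second"):
            conn.request("GET", path)
            bodies.append(conn.getresponse().read().decode())
    finally:
        conn.close()
        server.shutdown()
        server.server_close()
    return bodies
```

After a call, both entries in seen carry the same client port, which shows that the requests shared one connection. Anything that ties requests together, such as a login session, has to be layered on top, for example with cookies.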
So, the browser and the web server communicate by exchanging messages, as shown in Fig 1. One purpose of HTTP is to define and standardise the format of these messages. The communication process works as follows: a client (usually a web browser) sends a message called an HTTP request to the server, based on the user’s interaction with the web application. The server processes the request and responds with a message called an HTTP response, which is then displayed to the user.
So, a more accurate representation of this process would look like this:
HTTP was conceived as a simple, text-based protocol. All messages are in a human-readable (a.k.a. plain-text or clear-text) format, which facilitates debugging and testing (yeah, that’s great for us ethical hackers). The structure of both the request and the response is divided into two sections: headers and body.
A header is a key-value pair that can carry many kinds of information, such as the date, the content type, the IP address of the issuer, the name of the web server, and many others. Each header sits on its own line, and an empty line marks the end of the headers. Below is an example of request headers versus response headers:
Accept: text/html
Accept-Language: de
Translated into English, these headers are equivalent to saying “Hey web server, I want the information in HTML format. Also, I would like the information in the German language, if possible.”
Content-Length: 281
Set-Cookie: lastReadArticle=1234; expires=Wed, 28-Jul-2021 14:04:00 GMT;
These response headers are equivalent to saying, “Hey browser, here is the content you requested. Its length is 281 octets. Also, set a cookie with the name lastReadArticle and value 1234 that expires on Wed, 28-Jul-2021.”
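Because every header is just a name, a colon, and a value, turning header lines into a lookup table takes only a few lines. This is a minimal sketch rather than a full parser: parse_headers is a hypothetical helper, it lower-cases names because header names are case-insensitive, and it keeps only the last value when a header repeats (which real responses may do, for instance with several Set-Cookie lines). The cookie is shortened to its name and value.

```python
def parse_headers(raw):
    """Turn 'Name: value' lines into a dict keyed by lower-cased header name."""
    headers = {}
    for line in raw.splitlines():
        if not line.strip():
            continue  # skip empty lines
        # Split on the first colon only, since values may contain colons too.
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return headers


request_headers = parse_headers("Accept: text/html\nAccept-Language: de")
response_headers = parse_headers(
    "Content-Length: 281\n"
    "Set-Cookie: lastReadArticle=1234"
)
```

Looking up request_headers["accept-language"] then gives "de", and response_headers["content-length"] gives the string "281".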
The body of an HTTP message is the message itself. It can be plain-text, HTML, XML, JSON or a set of key-value pairs (e.g., username=admin&password=supersecurepassword).
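As a small illustration of the key-value form, Python's standard urllib.parse module can decode and encode such a body. The variable names are chosen here for the example only.

```python
from urllib.parse import parse_qs, urlencode

body = "username=admin&password=supersecurepassword"

# parse_qs maps each key to a list, because a key may appear more than once.
fields = parse_qs(body)

# urlencode goes the other way and builds the body from a dict.
encoded = urlencode({"username": "admin", "password": "supersecurepassword"})
```

Here fields["username"] is ["admin"], and encoded is identical to the original body string.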
Below is an example of an HTTP request:
GET /docs/index.html HTTP/1.1
Host: www.uphack.io
Accept: */*
Accept-Language: en-us
Accept-Encoding: gzip, deflate
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_4)
(blank line)
The first line of every HTTP request contains three elements, each delimited by a space: the method (GET), the path of the requested resource (/docs/index.html), and the protocol version (HTTP/1.1).
GET is usually used to retrieve information, while POST is used to send information to the server (such as contact form data). This is a convention rather than a rule, and it depends on the developer’s implementation: a developer who wants to use POST to retrieve information can do that. These are not the only HTTP methods available. HTTP also supports some lesser-used verbs such as PUT, DELETE, HEAD, TRACE, and OPTIONS.
Apart from these three elements, the request also carries some interesting HTTP headers.
You can find all the HTTP headers that your browser can send, along with a short description, here.
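The first line of the example request can be taken apart with a single split. The sketch below assumes a well-formed line; parse_request_line and KNOWN_METHODS are hypothetical names, and the set lists only the methods mentioned in this lesson.

```python
# Only the methods named in this lesson; HTTP defines a few more.
KNOWN_METHODS = {"GET", "POST", "PUT", "DELETE", "HEAD", "TRACE", "OPTIONS"}


def parse_request_line(line):
    """Split a request line into method, target path, and protocol version."""
    method, target, version = line.strip().split(" ", 2)
    return {
        "method": method,
        "target": target,
        "version": version,
        "known_method": method in KNOWN_METHODS,
    }


parsed = parse_request_line("GET /docs/index.html HTTP/1.1")
```

For the example request, parsed["method"] is "GET", parsed["target"] is "/docs/index.html", and parsed["version"] is "HTTP/1.1".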
This is what an HTTP response looks like:
HTTP/1.1 200 OK
Server: Apache
Last-Modified: Sun, 31 May 2020 01:43:58 GMT
cf-request-id: 0309ff745a0000d208e51e5200000001
Accept-Ranges: bytes
Content-Length: 12
Vary: Accept-Encoding
Content-Type: text/plain
(blank line)
Hello World!
Each HTTP response starts with the HTTP version, a status code, and a status message. The status code indicates whether the request has been successful or not. For instance, if the requested resource exists, the server will return 200 OK, as in the above example. On the other hand, if the resource does not exist or the user is not allowed to access it, a 404 Not Found or a 403 Forbidden status is returned, respectively. There are many other status codes, and they are grouped into five classes: informational (1xx), successful (2xx), redirection (3xx), client error (4xx), and server error (5xx).
The entire list of HTTP status codes and their usage is defined in RFC 9110, which supersedes the original RFC 2616.
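The class of a status code follows from its first digit, so it can be derived with integer division. The sketch below uses Python's standard http.HTTPStatus enumeration for the status messages; the describe function and the STATUS_CLASSES table are hypothetical helpers written for this example.

```python
from http import HTTPStatus

# The first digit of a status code identifies its class.
STATUS_CLASSES = {
    1: "informational",
    2: "successful",
    3: "redirection",
    4: "client error",
    5: "server error",
}


def describe(code):
    """Return the code, its standard message, and the class it belongs to."""
    status = HTTPStatus(code)  # raises ValueError for codes Python does not know
    return f"{code} {status.phrase} ({STATUS_CLASSES[code // 100]})"
```

For example, describe(200) returns "200 OK (successful)" and describe(404) returns "404 Not Found (client error)".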
After the first line, the web server responds with several HTTP headers, followed by a blank line and the body (the requested document or data). The purpose of response headers is to offer additional information about the response. In our example, we can see Server, Content-Type, and Content-Length, among others.
Head over to any website and open your browser’s developer tools. Then look for a tab called Network. This is how it should look in Google Chrome.
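Putting the pieces together, a response can be split into status line, headers, and body at the first blank line. The following is a simplified sketch, not a complete parser: parse_response is a hypothetical helper, and RAW_RESPONSE is a shortened copy of the example response with the line endings written out explicitly.

```python
RAW_RESPONSE = (
    "HTTP/1.1 200 OK\r\n"
    "Server: Apache\r\n"
    "Content-Length: 12\r\n"
    "Content-Type: text/plain\r\n"
    "\r\n"
    "Hello World!"
)


def parse_response(raw):
    """Split a raw response into version, status code, message, headers, and body."""
    # The first blank line separates the header section from the body.
    head, _, body = raw.partition("\r\n\r\n")
    status_line, *header_lines = head.split("\r\n")
    version, code, reason = status_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return version, int(code), reason, headers, body


version, code, reason, headers, body = parse_response(RAW_RESPONSE)
```

A quick consistency check is to compare int(headers["content-length"]) with len(body.encode()): both are 12 for this response.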
As an exercise, try to get familiar with HTTP. Navigate to different websites and observe the requests sent by your browser. Don’t hesitate to consult HTTP documentation whenever you don’t understand something.
As an ethical hacker, it is essential to understand HTTP and how it works. In a future lesson, you will learn how to edit those requests and modify the behavior of an application. For now, make sure you have a solid understanding of the basics. This lesson is intended to give you an overview rather than comprehensive technical documentation. We encourage you to take a look at the following resources to broaden your knowledge: