HTTP from First Principles: The Complete Guide for Backend Engineers

As backend engineers, we operate in a world of complex systems. But if we start discussing every single component, we could be stuck for years. Instead, we'll focus on the topics that appear in the vast majority—say, 90%—of all codebases.

With that in mind, let's talk about the HTTP protocol, the primary medium through which our browsers talk to our servers. While many protocols exist for client-server communication, HTTP is one of the most ubiquitous, and mastering it is non-negotiable.

The Two Foundational Ideas of HTTP

At the heart of the HTTP protocol lie two ideas that define its architecture and behavior.

1. Statelessness: The Protocol with No Memory

Statelessness means that the server has no memory of past interactions. Each HTTP request is treated as a new, unrelated event.

  • Self-Contained Requests: Because the server doesn't remember past requests, each new request must carry all the necessary information for the server to process it. This includes the URL, method, headers, and any authentication tokens or session information needed for that specific interaction.

  • The Benefits of Statelessness:

    • Simplicity: Server architecture is greatly simplified. There's no need to store and manage session information for every active user, which would require additional resources and complexity.

    • Scalability: A stateless protocol makes it easy to distribute requests across multiple servers (load balancing). Since no single server needs to keep track of a session, any server can handle any request, making horizontal scaling seamless.

    • Resilience: If a server crashes, it doesn't affect the client's state, as there is no session memory that needs to be restored.

Of course, applications like e-commerce sites need to maintain state for user logins or shopping carts. Developers implement state management on top of the stateless HTTP protocol using techniques like cookies, sessions, and tokens, which we'll explore later in this series.
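
To make the idea of a self-contained request concrete, here is a minimal sketch of token-based authentication in Python. The names `SECRET_KEY`, `sign`, and `authenticate` are hypothetical, and a real system would more likely use an established token format such as JWT. The point is only that the server looks nothing up in memory: the request carries everything needed to verify it.

```python
import hashlib
import hmac
from typing import Optional

# Hypothetical shared secret; every server in the pool would load the same value.
SECRET_KEY = b"demo-secret"


def sign(user_id: str) -> str:
    """Issue a token of the form '<user_id>.<signature>'."""
    signature = hmac.new(SECRET_KEY, user_id.encode(), hashlib.sha256).hexdigest()
    return f"{user_id}.{signature}"


def authenticate(headers: dict) -> Optional[str]:
    """Return the user id if the request carries a valid token, else None.

    No session store is consulted, so any server holding SECRET_KEY
    can handle any request.
    """
    scheme, _, token = headers.get("Authorization", "").partition(" ")
    if scheme != "Bearer":
        return None
    user_id, _, signature = token.rpartition(".")
    expected = hmac.new(SECRET_KEY, user_id.encode(), hashlib.sha256).hexdigest()
    valid = hmac.compare_digest(signature.encode(), expected.encode())
    return user_id if user_id and valid else None
```

Because verification depends only on the request and the shared secret, a load balancer can route each request to any server without sticky sessions.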

2. The Client-Server Model

HTTP operates on a strict client-server model:

  • The Client (typically a web browser, mobile app, or another application) initiates all communication by sending a request to a server. The client is responsible for providing all information the server needs.

  • The Server (which hosts websites, APIs, or other content) waits for these incoming requests, processes them, and sends back an appropriate response.

The key thing to remember is that communication is always initiated by the client.

(Note: Throughout our discussion, we can safely assume HTTP and HTTPS are interchangeable. HTTPS is simply HTTP secured with an encryption layer like TLS, but the underlying principles are the same.)

The Connection: How Messages Travel

For a client and server to communicate, they first need to establish a connection. HTTP/1.1 and HTTP/2 rely on TCP (Transmission Control Protocol) for this: it is a reliable, connection-oriented protocol that retransmits lost packets and delivers data in order. Where this sits in the network stack is often visualized using the OSI Model.

As backend engineers, we primarily operate at Layer 7, the Application Layer. Deeper concepts like the TCP 3-way handshake or TLS encryption exist in lower layers (Transport, Session) and border on network engineering. While good to know, we will focus on the Application Layer to avoid getting lost in a rabbit hole. For now, all you need to remember is that a reliable network connection is established between the client and server before messages are sent.
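
The following sketch shows that an HTTP exchange is just text sent over an established TCP connection. It uses only the Python standard library and starts a throwaway local server so that it runs anywhere. `HelloHandler`, `fetch_raw`, and `demo` are illustrative names, not part of any framework.

```python
import socket
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class HelloHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        body = b"hello"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def fetch_raw(host: str, port: int, path: str = "/") -> bytes:
    """Open a TCP connection, send one HTTP request as bytes, return the raw response."""
    request = (
        f"GET {path} HTTP/1.1\r\n"
        f"Host: {host}:{port}\r\n"
        "Connection: close\r\n"
        "\r\n"
    ).encode("ascii")
    with socket.create_connection((host, port), timeout=5) as conn:
        conn.sendall(request)
        chunks = []
        while data := conn.recv(4096):  # read until the server closes the connection
            chunks.append(data)
    return b"".join(chunks)


def demo() -> bytes:
    server = HTTPServer(("127.0.0.1", 0), HelloHandler)  # port 0: let the OS pick one
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        return fetch_raw("127.0.0.1", server.server_address[1])
    finally:
        server.shutdown()
        server.server_close()
```

Calling `demo()` returns the full response bytes: a status line, headers, a blank line, and the body `hello`. The TCP handshake happens inside `socket.create_connection`, which is exactly the part we can treat as a given at the Application Layer.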

The Evolution of HTTP Versions

  • HTTP/1.0: Opened a new TCP connection for every single request/response cycle. This was inefficient and slow.

  • HTTP/1.1: Introduced persistent connections, allowing multiple requests and responses over the same TCP connection. This dramatically improved performance, and HTTP/1.1 remains in very wide use today.

  • HTTP/2: Introduced multiplexing, allowing multiple requests and responses to be in flight concurrently over a single connection. It uses binary framing instead of a plain-text message format.

  • HTTP/3: Built on the QUIC protocol (which runs over UDP instead of TCP) to further reduce latency and better handle packet loss.
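
Persistent connections can be observed with a small experiment. In this sketch (standard library only; `PortEchoHandler` and `client_ports` are made-up names), a local HTTP/1.1 server replies with the client's source port. Since a source port identifies a TCP connection, getting the same port back twice suggests both requests travelled over one connection.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class PortEchoHandler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # lets the stdlib server keep connections alive

    def do_GET(self):
        # The client's source port identifies the underlying TCP connection.
        body = str(self.client_address[1]).encode()
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def client_ports(request_count: int = 2) -> list:
    """Send several requests over one HTTPConnection and collect the ports the server saw."""
    server = HTTPServer(("127.0.0.1", 0), PortEchoHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    conn = http.client.HTTPConnection("127.0.0.1", server.server_address[1], timeout=5)
    try:
        ports = []
        for _ in range(request_count):
            conn.request("GET", "/")
            ports.append(conn.getresponse().read().decode())
        return ports
    finally:
        conn.close()  # close the client side first so the handler loop can exit
        server.shutdown()
        server.server_close()
```

Under HTTP/1.0 semantics each request would arrive from a fresh connection, and therefore usually from a different port.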

Anatomy of an HTTP Message

Let's break down what these request and response messages actually look like.

The Request Message

A request message is sent by the client to the server.

HTTP
POST /api/users HTTP/1.1
Host: api.example.com
User-Agent: Mozilla/5.0
Authorization: Bearer <your_jwt_token>
Accept: application/json
Content-Type: application/json
Content-Length: 54

{
  "name": "Jitendra Sahoo",
  "email": "jitendra@example.com"
}

  • Request Line: Contains the Method (POST), the request target (/api/users, the path of the resource), and the HTTP Version (HTTP/1.1).

  • Headers: Key-value pairs providing metadata about the request.

  • Blank Line: A single blank line separates the headers from the body.

  • Request Body: The actual data being sent to the server.
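
The four parts listed above can be pulled apart with a few lines of Python. This is a simplified sketch for illustration: `parse_request` is a hypothetical helper, and it ignores edge cases such as repeated headers, which real servers and frameworks handle for you.

```python
def parse_request(raw: bytes) -> dict:
    """Split a raw HTTP/1.1 request into request line, headers, and body."""
    head, _, body = raw.partition(b"\r\n\r\n")  # the blank line ends the headers
    request_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
    method, target, version = request_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()  # header names are case-insensitive
    return {
        "method": method,
        "target": target,
        "version": version,
        "headers": headers,
        "body": body,
    }
```

Note that on the wire each line ends with CRLF (`\r\n`), so the "blank line" is really the byte sequence `\r\n\r\n`.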

The Response Message

A response message is sent by the server back to the client.

HTTP
HTTP/1.1 201 Created
Date: Wed, 23 Jul 2025 00:25:02 GMT
Content-Type: application/json
Content-Length: 88
Cache-Control: no-cache

{
  "id": "12345",
  "name": "Jitendra Sahoo",
  "email": "jitendra@example.com"
}

  • Status Line: Contains the HTTP Version, the Status Code (201), and the Status Text (Created).

  • Headers: Key-value pairs providing metadata about the response.

  • Blank Line: Separates headers from the body.

  • Response Body: The data or resource being sent back.
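
Going the other way, a response can be assembled from the same parts. The sketch below uses a hypothetical `build_response` helper and computes Content-Length from the encoded body, since the header counts bytes rather than characters.

```python
import json
from http import HTTPStatus


def build_response(status: HTTPStatus, payload: dict) -> bytes:
    """Serialize a JSON payload into a complete HTTP/1.1 response message."""
    body = json.dumps(payload).encode("utf-8")
    head = (
        f"HTTP/1.1 {status.value} {status.phrase}\r\n"  # status line
        "Content-Type: application/json\r\n"
        f"Content-Length: {len(body)}\r\n"  # byte length of the body
        "\r\n"  # blank line separating headers from body
    )
    return head.encode("ascii") + body
```

For example, `build_response(HTTPStatus.CREATED, {"id": "12345"})` produces a message whose first line is `HTTP/1.1 201 Created`.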

The Power of HTTP Headers

Headers are one of the most critical and extensible parts of HTTP. Think of them like the address and shipping information on the outside of a parcel—they provide essential metadata without needing to open the package itself.

Categorizing Headers

  • Request Headers: Provide information about the request and the client (e.g., User-Agent, Authorization, Accept).

  • General Headers: Apply to both requests and responses, providing context about the message itself (e.g., Date, Cache-Control, Connection).

  • Representation Headers: Describe the body of the message (the resource itself), such as its format, size, or encoding (e.g., Content-Type, Content-Length, Content-Encoding).

  • Security Headers: Enhance security by instructing the browser on how to behave (e.g., Content-Security-Policy, Strict-Transport-Security).

The Ideas of Extensibility and Remote Control

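  • Extensibility: HTTP is powerful because new custom headers can be added without changing the protocol itself, allowing it to adapt to new technologies. Custom headers were historically prefixed with X-, a convention that RFC 6648 has since deprecated, although many such headers remain in use.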

  • Remote Control: Headers allow the client to act as a "remote control" for the server, influencing how it processes the request. For example, the Accept header tells the server what content format the client prefers (e.g., application/json).
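
As a rough illustration of the "remote control" idea, the sketch below lets the Accept header choose the response format. `parse_accept` and `negotiate` are hypothetical helpers, and the matching is deliberately simplified: it honours q-values and wildcards but not the full precedence rules of the HTTP specification.

```python
from typing import Optional


def parse_accept(accept_header: str) -> list:
    """Return (media_type, q) pairs sorted by descending preference."""
    entries = []
    for item in accept_header.split(","):
        media_type, *params = [part.strip() for part in item.split(";")]
        if not media_type:
            continue
        q = 1.0  # default weight when no q parameter is given
        for param in params:
            key, _, value = param.partition("=")
            if key.strip() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        entries.append((media_type.lower(), q))
    return sorted(entries, key=lambda entry: entry[1], reverse=True)


def negotiate(accept_header: str, supported: list) -> Optional[str]:
    """Pick a supported type the client accepts, or None (a server might then send 406)."""
    for media_type, q in parse_accept(accept_header):
        if q <= 0:
            continue  # q=0 means "not acceptable"
        for candidate in supported:
            if media_type in ("*/*", candidate):
                return candidate
            if media_type.endswith("/*") and candidate.startswith(media_type[:-1]):
                return candidate
    return None
```

With `supported = ["text/html", "application/json"]`, the header `text/html;q=0.5, application/json` selects JSON, because the client gave HTML a lower weight.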

HTTP Methods: The Language of Intent

HTTP methods define the intent of the action the client wants to perform on a resource.

  • GET: Retrieve data. Should not modify anything.

  • POST: Create a new resource.

  • PUT: Replace an existing resource entirely.

  • PATCH: Partially update an existing resource (generally preferred over PUT when only some fields change).

  • DELETE: Remove a resource.

Idempotency: A Key Concept

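An HTTP method is idempotent if making the same request multiple times leaves the server in the same state as making it once. The responses may differ (a second DELETE might return 404), but the effect on the resource does not.

  • Idempotent Methods: GET, PUT, DELETE (as well as HEAD and OPTIONS). Fetching data twice is the same as fetching it once. Replacing a resource with the same data twice is the same as doing it once. Deleting a resource twice is the same as deleting it once (the second time it's already gone).

  • Non-Idempotent Methods: POST. Submitting a request to create a note twice will result in two different notes being created. PATCH is not guaranteed to be idempotent either, since a patch can describe a relative change such as "append this item".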
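
The difference is easy to see with a toy in-memory store. `NoteStore` below is a hypothetical stand-in for a server's resource collection, with one method per HTTP verb.

```python
import itertools


class NoteStore:
    """In-memory stand-in for a server's collection of notes."""

    def __init__(self):
        self.notes = {}
        self._ids = itertools.count(1)

    def post(self, text: str) -> int:
        """POST /notes: every call creates a new resource with a new id."""
        note_id = next(self._ids)
        self.notes[note_id] = text
        return note_id

    def put(self, note_id: int, text: str) -> None:
        """PUT /notes/<id>: replaces the resource; repeating it changes nothing further."""
        self.notes[note_id] = text

    def delete(self, note_id: int) -> bool:
        """DELETE /notes/<id>: returns False if the note was already gone."""
        return self.notes.pop(note_id, None) is not None
```

Calling `post("hello")` twice leaves two notes in the store, while calling `put(1, "hello")` twice leaves the store exactly as it was after the first call. This is why clients and proxies can safely retry idempotent requests after a timeout, but must be careful with POST.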

The OPTIONS Method and CORS (Cross-Origin Resource Sharing)

You probably won't use the OPTIONS method directly, but it's crucial for how the modern web works. It's used by browsers to handle CORS.

Same-Origin Policy: For security, browsers restrict scripts on a web page from reading responses to requests sent to an origin (the combination of scheme, host, and port) different from the one that served the page.

CORS is the mechanism that allows servers to safely relax this policy. For "complex" requests (like a PUT, DELETE, or a request with custom headers like Authorization), the browser first sends an OPTIONS preflight request. This request asks the server for permission.

  1. The Preflight Request (OPTIONS): The browser sends an OPTIONS request to the resource URL, asking things like "Do you allow PUT requests from my origin?" and "Do you allow an Authorization header?"

  2. The Preflight Response: The server, if configured for CORS, responds with special Access-Control-Allow-* headers, such as:

    • Access-Control-Allow-Origin: https://example.com (Yes, I allow requests from this origin).

    • Access-Control-Allow-Methods: GET, PUT, DELETE (These are the methods I allow).

    • Access-Control-Allow-Headers: Content-Type, Authorization (These are the headers I allow).

  3. The Actual Request: If the browser is satisfied with the permissions granted in the preflight response, it then sends the actual PUT request. If not, it blocks the request and shows a CORS error in the console.

  4. Simple Requests: A "simple" request (for example, a plain GET with no custom headers) skips the preflight and is sent directly. The response must still include an Access-Control-Allow-Origin header that names the requesting origin (or *); otherwise the browser refuses to expose the response to the page.
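
On the server side, answering a preflight boils down to comparing what the browser asks for against an allow-list. The sketch below assumes hypothetical allow-lists (`ALLOWED_ORIGINS`, `ALLOWED_METHODS`, `ALLOWED_HEADERS`). The request headers it reads, Origin, Access-Control-Request-Method, and Access-Control-Request-Headers, are the ones browsers send with a preflight. In practice a framework's CORS middleware usually does this for you.

```python
# Hypothetical policy for illustration; a real service would load this from configuration.
ALLOWED_ORIGINS = {"https://example.com"}
ALLOWED_METHODS = {"GET", "PUT", "DELETE"}
ALLOWED_HEADERS = {"content-type", "authorization"}


def preflight_headers(request_headers: dict) -> dict:
    """Return the CORS headers for an OPTIONS preflight, or {} to deny it."""
    origin = request_headers.get("Origin", "")
    method = request_headers.get("Access-Control-Request-Method", "")
    requested = {
        name.strip().lower()
        for name in request_headers.get("Access-Control-Request-Headers", "").split(",")
        if name.strip()
    }
    if origin not in ALLOWED_ORIGINS or method not in ALLOWED_METHODS:
        return {}
    if not requested <= ALLOWED_HEADERS:  # every requested header must be allowed
        return {}
    return {
        "Access-Control-Allow-Origin": origin,
        "Access-Control-Allow-Methods": ", ".join(sorted(ALLOWED_METHODS)),
        "Access-Control-Allow-Headers": ", ".join(sorted(ALLOWED_HEADERS)),
        "Vary": "Origin",  # tells caches the response depends on the Origin header
    }
```

Returning no CORS headers is enough to deny the request: the browser, not the server, enforces the policy.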

HTTP Status Codes: The Universal Language of Outcomes

Status codes are a standardized way for the server to communicate the result of a request, so the client doesn't have to guess based on the response body. They are categorized by their first digit:

  • 1xx (Informational): The request was received and processing is continuing. (Rarely used directly by developers).

  • 2xx (Success): The request was successfully received, understood, and accepted.

    • 200 OK: Standard success.

    • 201 Created: A new resource was created.

    • 204 No Content: Success, but there's no body to return.

  • 3xx (Redirection): The client needs to take additional action to complete the request.

    • 301 Moved Permanently: The resource has a new permanent URL.

    • 304 Not Modified: Used for caching. Tells the client its cached version is still valid.

  • 4xx (Client Error): The request contains bad syntax or cannot be fulfilled.

    • 400 Bad Request: Invalid data sent by the client.

    • 401 Unauthorized: Authentication is required, and credentials are either missing or invalid.

    • 403 Forbidden: Authenticated, but lacking permissions.

    • 404 Not Found: The requested resource could not be found.

    • 409 Conflict: The request could not be completed due to a conflict (e.g., trying to create a resource with a name that already exists).

  • 5xx (Server Error): The server failed to fulfill a valid request.

    • 500 Internal Server Error: A generic, unexpected error on the server.

    • 503 Service Unavailable: The server is temporarily down for maintenance or overloaded.
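
Since the first digit carries the category, a client can classify any status code with integer division. The sketch below uses Python's standard `http.HTTPStatus` enum for the reason phrases; `describe` is a hypothetical helper name.

```python
from http import HTTPStatus

STATUS_CLASSES = {
    1: "Informational",
    2: "Success",
    3: "Redirection",
    4: "Client Error",
    5: "Server Error",
}


def describe(code: int) -> str:
    """Return a label such as '404 Not Found (Client Error)'."""
    category = STATUS_CLASSES.get(code // 100, "Unknown")
    try:
        phrase = HTTPStatus(code).phrase
    except ValueError:  # a code the standard library does not know
        phrase = "Unrecognized"
    return f"{code} {phrase} ({category})"
```

This is also a reasonable fallback strategy for clients: a code you have never seen before can still be handled according to its class.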

Handling Large Data and Connections

Caching with ETags

HTTP caching allows browsers to store responses and avoid re-downloading unchanged data.

  1. First Request: The server responds with the data, a Cache-Control header (e.g., specifying how long to cache), and an ETag header (an opaque identifier for this version of the content, often a hash of it).

  2. Subsequent Requests: The browser sends the ETag value in an If-None-Match request header.

  3. Server Response: If the content hasn't changed, the server responds with a lightweight 304 Not Modified status, and the browser uses its cached copy. If it has changed, the server sends a 200 OK with the new content and a new ETag.
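
The three steps above can be sketched as a single server-side function. `make_etag` and `conditional_get` are hypothetical names, the `max-age=60` value is an arbitrary choice for the example, and weak validators (the `W/` prefix) are left out to keep it short.

```python
import hashlib


def make_etag(content: bytes) -> str:
    """Derive an ETag from the content; the double quotes are part of the header value."""
    return '"' + hashlib.sha256(content).hexdigest()[:16] + '"'


def conditional_get(content: bytes, request_headers: dict) -> tuple:
    """Return (status, headers, body) for a GET that may carry If-None-Match."""
    etag = make_etag(content)
    headers = {"ETag": etag, "Cache-Control": "max-age=60"}
    client_tags = [
        tag.strip() for tag in request_headers.get("If-None-Match", "").split(",")
    ]
    if etag in client_tags or "*" in client_tags:
        return 304, headers, b""  # client copy is current, so no body is sent
    return 200, headers, content
```

The saving comes from the empty body on 304: the client pays for a round trip but not for re-downloading the content.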

Content Negotiation and Compression

Clients can specify their preferences using Accept (e.g., application/json), Accept-Language, and Accept-Encoding headers. If the client indicates it can handle gzip compression, the server can compress large responses before sending them, drastically reducing file size and saving bandwidth. The browser then automatically decompresses the content.
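
Here is a minimal sketch of that decision using Python's standard `gzip` module. `maybe_compress` is a hypothetical helper, and it ignores q-values in Accept-Encoding for brevity.

```python
import gzip


def maybe_compress(body: bytes, accept_encoding: str) -> tuple:
    """Return (headers, payload), compressing only if the client advertised gzip."""
    offered = {
        token.split(";")[0].strip().lower() for token in accept_encoding.split(",")
    }
    if "gzip" in offered:
        payload = gzip.compress(body)
        headers = {
            "Content-Encoding": "gzip",  # tells the client how to decode the body
            "Content-Length": str(len(payload)),
            "Vary": "Accept-Encoding",  # caches must store variants separately
        }
        return headers, payload
    return {"Content-Length": str(len(body))}, body
```

The Content-Encoding response header is what lets the browser know it must decompress the payload before handing it to the page.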

Handling Large Files

  • Uploading (Client to Server): For large files like images or videos, clients use multipart/form-data requests. The body is split into parts separated by a boundary string, with each form field, including the file's binary data, sent as its own part.

  • Downloading (Server to Client): For large responses, servers can use chunked transfer encoding (Transfer-Encoding: chunked in HTTP/1.1) or event streams (Content-Type: text/event-stream). This allows the server to send the data in continuous chunks over a persistent Keep-Alive connection, so the client can start processing the data as it arrives instead of waiting for the entire file to download.
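
To show what "chunks" look like on the wire, here is a small sketch of HTTP/1.1 chunked framing: each chunk is its length in hexadecimal, a CRLF, the data, and another CRLF, with a zero-length chunk marking the end. `encode_chunked` and `decode_chunked` are illustrative helpers that do not handle trailers or chunk extensions.

```python
from typing import Iterable, Iterator


def encode_chunked(pieces: Iterable[bytes]) -> Iterator[bytes]:
    """Frame each piece as a chunk and finish with the zero-length terminator."""
    for piece in pieces:
        if piece:  # an empty chunk would signal end-of-body prematurely
            yield f"{len(piece):x}".encode("ascii") + b"\r\n" + piece + b"\r\n"
    yield b"0\r\n\r\n"


def decode_chunked(stream: bytes) -> bytes:
    """Reassemble a complete chunked body into the original bytes."""
    body, pos = b"", 0
    while True:
        line_end = stream.index(b"\r\n", pos)
        size = int(stream[pos:line_end], 16)  # chunk sizes are hexadecimal
        if size == 0:
            return body
        start = line_end + 2
        body += stream[start:start + size]
        pos = start + size + 2  # skip the CRLF that follows the chunk data
```

Because the sender never has to know the total size in advance, the Content-Length header is omitted for chunked responses.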


This covers the essential, foundational components of HTTP. By internalizing how the protocol works—from its stateless nature to the intricate dance of headers, methods, and status codes—you move beyond being a framework-specific developer. You become a true backend engineer, equipped with a mental map to understand, debug, and build robust systems in any environment.

