Introduction
Most backend resources either stay too high-level (“use FastAPI”) or jump into code without a clean mental model. This post builds backend engineering from first principles — so a motivated non-computer-science learner can follow it, while still using correct CS language and real production patterns.
Who this is for: self-taught developers, scientists transitioning into software, and Python backend interview prep. Examples use FastAPI + Python, but the principles apply to any stack.
How to read: skim the headings once, then re-read with the code blocks and build a tiny demo API as you go.
Every backend topic here fits the request lifecycle: HTTP → routing → parsing → validation → auth → business logic → data layer → response, plus cross-cutting concerns (middleware, caching, security, observability) that wrap the whole pipeline.
What you’ll be able to explain after this:
- Why HTTP is stateless and how apps still store user state (cookies/JWT/sessions)
- How to design clean REST resources, status codes, and pagination
- Validation as a trust boundary (controller → service → repository)
- AuthN vs AuthZ + common security failures (CSRF, token leakage, brute force)
- Caching layers (ETag/CDN/Nginx/Redis) and staleness/invalidation tradeoffs
- Where performance really goes (DB queries, N+1, indexes, timeouts)
- When to use background jobs and queues (Celery/RQ), retries, idempotency
- What “production-ish” looks like (Docker Compose + Nginx reverse proxy)
Table of contents
- 0. First principles: the backend’s 4 jobs
- 1. HTTP fundamentals: requests, responses, and status codes
- 2. REST architecture: constraints and why they matter
- 3. HTTP method semantics: safe, idempotent, retry-friendly
- 4. Resource design: nouns, URLs, and CRUD mapping
- 5. Working with APIs: Postman/Insomnia + failure modes
- 6. Lists done right: pagination, sorting, filtering
- 7. JSON contract: serialization & deserialization
- 8. Validation: the trust boundary (controller → service → DB)
- 9. Access control: authentication vs authorization (cookies/JWT)
- 10. Middleware & CORS: cross-cutting concerns
- 11. Caching layers: HTTP/CDN/proxy/Redis and staleness
- 12. Scaling: vertical vs horizontal (stateless design)
- 13. Performance: measure first, then fix the big costs
- 14. Data layer: ORM, transactions, indexes (FastAPI + SQLAlchemy)
- 15. Background jobs: queues, retries, idempotency
- 16. Testing: pytest, integration tests, contracts
- 17. CI: GitHub Actions basics
- 18. Security essentials: production mindset
- 19. Observability: logs, metrics, tracing
- 20. Production basics: Docker Compose + Nginx reverse proxy
- 21. Quick review: 10 backend concepts (interview drill)
- 22. Data-intensive backends (performance + reliability patterns)
0) First principles: what a backend is
A backend exists to do four fundamental jobs:
- Expose capabilities via a stable interface (usually HTTP APIs).
- Enforce correctness (validation + business rules).
- Control access (authentication + authorization).
- Manage state reliably (databases, caches, queues) and operate under load (performance, scaling, observability).
Everything else (frameworks, ORMs, caches, message queues) is a tool to serve these jobs.
1) The ground: network + HTTP
1.1 Request → Response
HTTP is a message protocol:
- client sends a request (method, path, headers, body)
- server sends a response (status code, headers, body)
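On the wire, one full exchange looks like this (illustrative endpoint and payload):
GET /tasks/123 HTTP/1.1
Host: api.example.com
Accept: application/json

HTTP/1.1 200 OK
Content-Type: application/json

{"id": 123, "title": "write blog post", "done": false}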
1.2 Methods (verbs)
Common methods:
- GET: read
- POST: create / submit action
- PUT: replace
- PATCH: partial update
- DELETE: delete
- HEAD: like GET but no body
- OPTIONS: capabilities / CORS preflight
1.3 Status codes (API "physics")
Use status codes consistently:
- 200 OK: successful read/update
- 201 Created: successful create
- 202 Accepted: accepted for async processing
- 204 No Content: success with no response body
- 400 Bad Request: invalid request format
- 401 Unauthorized: not authenticated
- 403 Forbidden: authenticated but not allowed
- 404 Not Found: resource not found
- 409 Conflict: duplicates / version conflict
- 422 Unprocessable Entity: validation errors (FastAPI default)
- 429 Too Many Requests: rate limited
- 500 Internal Server Error: unexpected failure
2) What is a REST API?
REST is an architecture style defined by constraints. It's not a library.
2.1 REST = Representation + State + Transfer
- Representation (RE): how the resource is represented (JSON, HTML, XML).
- State (S): current properties of the resource.
- Transfer (T): movement of representation via HTTP (GET/POST/…).
Example:
GET /tasks/123 transfers a JSON representation of task #123.
2.2 REST constraints
- Client–Server separation
- UI logic stays on the client; data + rules stay on the server.
- Uniform interface
- consistent endpoints, methods, status codes, and payload shapes.
- Layered system
- intermediaries (load balancer, gateway, proxy) can exist; each layer interacts with adjacent layer only.
- Cacheable
- responses explicitly declare if caching is allowed and for how long.
- Stateless
- server does not rely on stored client context between requests (unless you choose sessions explicitly).
- Code on demand (optional)
- server may send executable code (e.g., JavaScript) to extend client functionality.
3) Method semantics: safe + idempotent
These properties matter for retries, caching, and correctness.
3.1 Safe
A safe operation should not change server state:
GET, HEAD, OPTIONS
3.2 Idempotent
An operation is idempotent if repeating it yields the same end state:
- GET: idempotent (and safe)
- PUT: idempotent (replace)
- DELETE: idempotent (deleting again → still deleted)
- POST: usually not idempotent (creates a new resource each time)
- PATCH: depends on implementation (often not guaranteed)
Why it matters: if the client retries due to network failure, idempotent methods prevent duplicate side effects.
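A tiny sketch of the difference (in-memory store, illustrative only): retrying the PUT converges to the same end state, while retrying the POST creates duplicates.
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()
tasks = {}
counter = {"next_id": 0}

class TaskIn(BaseModel):
    title: str

@app.post("/tasks", status_code=201)
def create_task(payload: TaskIn):
    # Not idempotent: every retry allocates a new ID and a new task
    counter["next_id"] += 1
    task = {"id": counter["next_id"], "title": payload.title}
    tasks[task["id"]] = task
    return task

@app.put("/tasks/{task_id}")
def replace_task(task_id: int, payload: TaskIn):
    # Idempotent: sending this twice leaves exactly the same end state
    tasks[task_id] = {"id": task_id, "title": payload.title}
    return tasks[task_id]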
4) Resources: design by nouns
A resource is any noun-like business object:
users, tasks, tags, orders, documents
4.1 Good URL patterns
- collection: /tasks
- item: /tasks/{id}
- nested: /users/{id}/tasks
Keep URLs:
- noun-based (no verbs in path if possible)
- stable
- consistent across the API
4.2 CRUD mapping
- Create: POST /tasks
- Read: GET /tasks, GET /tasks/{id}
- Update (full): PUT /tasks/{id}
- Update (partial): PATCH /tasks/{id}
- Delete: DELETE /tasks/{id}
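A partial sketch of this mapping in FastAPI (in-memory store; create and update follow the same pattern):
from fastapi import FastAPI, HTTPException, Response

app = FastAPI()
tasks = {}

@app.get("/tasks")
def list_tasks():
    return list(tasks.values())

@app.get("/tasks/{task_id}")
def get_task(task_id: int):
    if task_id not in tasks:
        raise HTTPException(status_code=404, detail="Task not found")
    return tasks[task_id]

@app.delete("/tasks/{task_id}", status_code=204)
def delete_task(task_id: int):
    # Idempotent: deleting an already-deleted task still ends in "deleted"
    tasks.pop(task_id, None)
    return Response(status_code=204)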
4.3 Beyond CRUD (actions)
Sometimes you need an action that doesn't map cleanly onto CRUD:
- POST /tasks/{id}/complete
- POST /payments/{id}/refund
Prefer modeling as state change (e.g., done=true) when possible.
5) API interface design in practice (Postman/Insomnia)
Tools like Postman and Insomnia help you:
- test endpoints and payloads
- validate status codes
- keep "collections" as a living contract
A professional habit:
- test success and all failure modes:
- invalid input → 422
- unauthenticated → 401
- unauthorized → 403
- not found → 404
- conflict → 409
6) Pagination + sorting + filtering
6.1 Why pagination is not optional
Without pagination:
- responses become huge
- DB gets overloaded
- UI becomes slow (especially infinite scroll)
6.2 Offset pagination (page + limit)
Query:
GET /tasks?limit=20&page=2&sort=-created_at
Rules:
- limit must have bounds (e.g., 1..100)
- page starts at 1
- provide defaults: limit=20, page=1, sort=-created_at
Pros: easy
Cons: slow/unstable for deep pages, duplicates when data changes
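A sketch of such an endpoint with bounded parameters (db_list_tasks is a hypothetical query helper):
from fastapi import FastAPI, Query

app = FastAPI()

@app.get("/tasks")
def list_tasks(
    limit: int = Query(20, ge=1, le=100),  # default 20, bounded 1..100
    page: int = Query(1, ge=1),            # pages start at 1
    sort: str = Query("-created_at"),
):
    offset = (page - 1) * limit
    # Translates to: SELECT ... ORDER BY created_at DESC LIMIT :limit OFFSET :offset
    return db_list_tasks(limit=limit, offset=offset, sort=sort)  # hypothetical helper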
6.3 Cursor pagination (best for infinite scroll)
Instead of a page number, the client sends a cursor token that encodes the last item seen (typically created_at + id):
GET /tasks?limit=20&cursor=2026-01-20T12:00:00Z|a1b2...
Pros: stable and scalable
Cons: more complex
Cursor pagination usually needs:
- a stable sort: (created_at, id)
- a cursor token: "created_at|id"
Pseudo-implementation
# where tasks are ordered by created_at desc, id desc
# cursor = "2026-01-20T12:00:00Z|<id>"
# Fetch items with (created_at, id) < cursor tuple in same order.
In production you also:
- sign/encrypt the cursor token
- validate token format
- return next_cursor in the response
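A runnable sketch of the keyset query with SQLAlchemy (minimal Task model for illustration; cursor signing/validation deliberately omitted):
from datetime import datetime
from typing import Optional
from sqlalchemy import Column, DateTime, Integer, String, select, tuple_
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()

class Task(Base):
    __tablename__ = "tasks"
    id = Column(Integer, primary_key=True)
    title = Column(String(120), nullable=False)
    created_at = Column(DateTime, nullable=False)

def list_tasks(session: Session, cursor: Optional[str] = None, limit: int = 20):
    # Stable sort: (created_at desc, id desc)
    stmt = select(Task).order_by(Task.created_at.desc(), Task.id.desc()).limit(limit)
    if cursor:
        created_at_s, id_s = cursor.split("|")
        # Keyset condition: rows strictly "after" the cursor in this ordering.
        # Row-value comparison works on Postgres; emulate with an OR expression on engines without it.
        stmt = stmt.where(
            tuple_(Task.created_at, Task.id) < (datetime.fromisoformat(created_at_s), int(id_s))
        )
    rows = session.execute(stmt).scalars().all()
    next_cursor = f"{rows[-1].created_at.isoformat()}|{rows[-1].id}" if rows else None
    return rows, next_cursor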
7) Serialization & deserialization
- Deserialization: request JSON → typed objects
- Serialization: typed objects → response JSON
In FastAPI, Pydantic handles:
- type coercion
- validation
- schema generation (OpenAPI docs)
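A tiny round-trip sketch: Pydantic deserializes and validates the request body into a typed object, and response_model serializes the return value back to JSON.
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class TaskIn(BaseModel):    # deserialization target: request JSON → object
    title: str
    done: bool = False

class TaskOut(BaseModel):   # serialization source: object → response JSON
    id: int
    title: str
    done: bool

@app.post("/tasks", response_model=TaskOut, status_code=201)
def create_task(payload: TaskIn):
    # payload is already typed and validated at this point
    return TaskOut(id=1, title=payload.title, done=payload.done)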
8) Validation (trust nothing from the network)
Validation enforces an input contract: structure (schema), types, and constraints. In backend systems it is a trust boundary: every request payload is untrusted until it passes checks at the API edge and domain rules inside the application.
Validation reduces failures caused by unexpected payloads (bugs), inconsistent/partial inputs (corrupted data), and hostile or abusive requests (security, including oversized bodies and injection attempts).
8.1 What to validate (contract + invariants)
- Presence: required fields must exist
- Types: string vs integer vs list/object
- Constraints: min/max length, numeric ranges, allowed enums
- Formats: email, UUID, URL, ISO-8601 datetime
- Normalization: trim whitespace, lowercase emails, canonical forms
- Cross-field rules: e.g., start_date < end_date, min ≤ max
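Cross-field rules can live on the schema itself; here is a sketch of the start_date < end_date rule (Pydantic v2 syntax):
from datetime import date
from pydantic import BaseModel, model_validator

class BookingIn(BaseModel):
    start_date: date
    end_date: date

    @model_validator(mode="after")
    def check_date_order(self):
        # Cross-field invariant: start must come before end
        if self.start_date >= self.end_date:
            raise ValueError("start_date must be before end_date")
        return self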
8.2 Validation vs sanitization vs escaping
- Validation rejects inputs that violate the contract (correctness gate)
- Sanitization transforms inputs into a canonical safer form (trim/normalize)
- Escaping / parameterization prevents injection when input is used in a context (SQL/HTML)
Validation is not a complete injection defense. For SQL use parameterized queries; for HTML/templates use correct escaping. Never concatenate raw user input into SQL.
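For example, with SQLAlchemy's text() the value travels as a bound parameter, separate from the SQL (a sketch; the users table is assumed):
from sqlalchemy import create_engine, text

engine = create_engine("sqlite:///./app.db")

def find_user_by_email(email: str):
    # Bound parameter: the driver sends query and value separately,
    # so user input can never change the query structure.
    with engine.connect() as conn:
        return conn.execute(
            text("SELECT id, email FROM users WHERE email = :email"),
            {"email": email},
        ).first()

# NEVER do this (classic injection vector):
#   text(f"SELECT id, email FROM users WHERE email = '{email}'")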
8.3 Validation across layers (Controller → Service → Repository)
In a layered backend architecture, validation is defense in depth. Each layer validates what it owns:
- Controller (FastAPI route + Pydantic): boundary validation of untrusted network input (schema, types, basic constraints). Invalid payloads typically return 422.
- Service (domain/business rules): semantic validation (uniqueness, state transitions, cross-field domain rules, permission decisions). These map to stable application errors (e.g., 409 Conflict).
- Repository/DB (integrity): final enforcement using constraints and transactions (UNIQUE/NOT NULL/CHECK/FK). This layer prevents race-condition inconsistencies and translates DB exceptions into domain errors.
Controller validation avoids wasting resources on bad input, service validation encodes business meaning, and repository/DB constraints guarantee correctness even under concurrency.
8.4 Error semantics (API behavior)
- 400 Bad Request: malformed JSON / invalid syntax
- 422 Unprocessable Content: valid JSON but fails schema/constraint validation (FastAPI's default)
- 409 Conflict: well-formed request but violates a business invariant (e.g., duplicate unique field)
Good validation errors should be specific, consistent, and safe (do not leak internal stack traces, raw DB errors, or secrets).
8.5 FastAPI example (layered validation with clean error mapping)
This example shows: (1) Pydantic boundary validation, (2) service business checks, (3) repository integrity as a last guardrail.
from fastapi import FastAPI, HTTPException, status
from pydantic import BaseModel, EmailStr, Field
from typing import Dict
app = FastAPI()
# ---------- Domain error ----------
class ConflictError(Exception):
pass
# ---------- 1) Controller schema ----------
class UserCreateIn(BaseModel):
email: EmailStr
name: str = Field(min_length=1, max_length=60)
class UserOut(BaseModel):
id: int
email: EmailStr
name: str
# ---------- 3) Repository (integrity) ----------
class UserRepository:
def __init__(self):
self._users_by_email: Dict[str, dict] = {}
self._id = 0
def email_exists(self, email: str) -> bool:
return email.lower() in self._users_by_email
def insert_user(self, email: str, name: str) -> dict:
# In real DB: UNIQUE(email) enforces this under concurrency.
if self.email_exists(email):
raise ConflictError("email already exists")
self._id += 1
user = {"id": self._id, "email": email.lower(), "name": name}
self._users_by_email[email.lower()] = user
return user
# ---------- 2) Service (business rules) ----------
class UserService:
def __init__(self, repo: UserRepository):
self.repo = repo
def create_user(self, email: str, name: str) -> dict:
# Business invariant: email must be unique
if self.repo.email_exists(email):
raise ConflictError("email already exists")
return self.repo.insert_user(email=email, name=name)
repo = UserRepository()
svc = UserService(repo)
@app.post("/users", response_model=UserOut, status_code=status.HTTP_201_CREATED)
def create_user(payload: UserCreateIn):
try:
return svc.create_user(payload.email, payload.name)
except ConflictError as e:
raise HTTPException(status_code=409, detail=str(e))
8.6 Practical limits (defense against abuse)
Validation also includes resource bounding: even “valid” inputs can be abusive if they are too large, too frequent, or too expensive to process. Limits preserve availability and stable latency.
- Max request size: cap body size to prevent memory pressure and payload DoS
- Rate limiting: protect expensive endpoints (login, search) from brute force and spikes
- Pagination limits: cap page_size (e.g., max 100) to avoid large scans
- Timeouts: apply timeouts to DB calls/external APIs to avoid stuck workers
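As a sketch, the body-size limit as an app-level guard (a reverse proxy should enforce it too, e.g., Nginx's client_max_body_size):
from fastapi import FastAPI, Request
from fastapi.responses import JSONResponse

app = FastAPI()
MAX_BODY_BYTES = 1_000_000  # illustrative 1 MB cap

@app.middleware("http")
async def limit_body_size(request: Request, call_next):
    # Cheap early rejection based on the declared Content-Length
    declared = request.headers.get("content-length")
    if declared and declared.isdigit() and int(declared) > MAX_BODY_BYTES:
        return JSONResponse(status_code=413, content={"detail": "Payload too large"})
    return await call_next(request)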
“In FastAPI, I validate request shape at the boundary with Pydantic (422), enforce business invariants in the service, rely on DB constraints in the repository for integrity under concurrency, and apply resource limits (payload size, rate limits, pagination caps, timeouts) to protect availability.”
9) Authentication and Authorization
Authentication answers: Who are you?
Authorization answers: What are you allowed to do?
AuthN (authentication) establishes identity. AuthZ (authorization) enforces permissions on resources. You can be authenticated but not authorized (e.g., logged in but forbidden).
9.0 Typical HTTP status codes
- 401 Unauthorized: not authenticated (missing/invalid credentials)
- 403 Forbidden: authenticated but not authorized
9.1 Typical approaches
1) Session cookie (stateful)
Server stores session state (e.g., in Redis/DB). Client holds a session ID cookie.
- Pros: easy logout/invalidation, good for browsers
- Cons: requires server-side state and storage; scaling needs shared session store
2) JWT Bearer token (stateless)
Client sends Authorization: Bearer <jwt>. JWT contains claims (user id, roles, expiry),
signed by server. No session lookup is required for each request.
- Pros: scalable; works well across services
- Cons: revocation is harder (needs denylist/short expiry); token leakage is serious
3) API keys (simple but limited)
Key identifies the client/application, often used for service-to-service or public APIs.
- Pros: simple to implement
- Cons: weak identity model (often no user context), rotation and leakage risks
9.2 Cookies (short but important)
For browser-based auth, cookies must be configured to reduce XSS/CSRF risks:
- HttpOnly: prevents JavaScript from reading the cookie (mitigates token theft via XSS)
- Secure: cookie is only sent over HTTPS
- SameSite: reduces CSRF by restricting cross-site cookie sending
Cookie example (secure session cookie)
Set-Cookie: session_id=abc123; HttpOnly; Secure; SameSite=Lax; Path=/
If you use cookies for authentication, you must consider CSRF defenses: SameSite, CSRF tokens, and verifying Origin/Referer for sensitive requests.
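Setting that cookie from FastAPI might look like this (sketch; session creation itself omitted):
from fastapi import FastAPI, Response

app = FastAPI()

@app.post("/login")
def login(response: Response):
    session_id = "abc123"  # in reality: a random ID stored server-side (e.g., Redis)
    response.set_cookie(
        key="session_id",
        value=session_id,
        httponly=True,   # not readable by JavaScript (XSS mitigation)
        secure=True,     # sent over HTTPS only
        samesite="lax",  # reduces CSRF exposure
        path="/",
    )
    return {"status": "logged in"}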
9.3 Authorization models (how permissions are expressed)
- RBAC (Role-Based Access Control): roles like admin/editor/viewer
- ABAC (Attribute-Based): policies based on attributes (user, resource, context)
- Resource-based checks: “user can access only their own objects”
9.4 Example: protect endpoint + role check (FastAPI)
from fastapi import FastAPI, Depends, HTTPException, status
app = FastAPI()
def get_current_user():
# verify token/session and return user object
return {"id": "u1", "role": "user"}
def require_admin(user=Depends(get_current_user)):
if user["role"] != "admin":
raise HTTPException(status_code=status.HTTP_403_FORBIDDEN, detail="Forbidden")
return user
@app.delete("/admin/users/{user_id}")
def delete_user(user_id: str, admin=Depends(require_admin)):
return {"deleted": user_id}
“Authentication proves identity (401 if missing/invalid). Authorization enforces permissions (403 if not allowed). For browsers I prefer secure session cookies + CSRF defenses; for APIs JWT bearer tokens are common with short expiry and rotation.”
9.5 Why “HTTP is stateless” matters
HTTP is stateless, meaning each request is independent and the server does not automatically remember any client state between requests. Request #2 must contain all information needed to handle it, or the server must be able to look up required context using identifiers provided by the client.
Applications still need state (login sessions, shopping carts). Stateless means the protocol does not preserve that state automatically. State is carried by the client (cookies/tokens) or stored in a server-side database/cache and retrieved per request.
Where state lives in practice
- Client-side: cookies or Authorization headers are sent with every request
- Server-side: session data stored in Redis/DB, fetched by a session ID from the cookie
Stateful vs stateless authentication
- Session cookie (stateful): cookie contains session ID, server loads session from storage
- JWT bearer (stateless): token contains claims, server verifies signature without DB lookup
“HTTP is stateless: every request must be self-contained. We implement user state using cookies or tokens, and if we use sessions, the server retrieves state from a shared store like Redis.”
Diagram: how “stateless HTTP” still supports login state
HTTP is stateless, so the server doesn’t remember you automatically. The client must send context on every request (cookie/token), and the server may fetch state from storage.
STATEFUL AUTH (Session Cookie + Server-side session store)
--------------------------------------------------------
Browser API Server Redis/DB
| POST /login | |
|----------------------->| create session |
| |--------------------------->| SET session:abc = {user_id, roles, ...}
| |<---------------------------|
| Set-Cookie: session_id=abc |
|<-----------------------| |
|
| GET /profile
| Cookie: session_id=abc
|----------------------->| lookup session by ID |
| |--------------------------->| GET session:abc
| |<---------------------------| {user_id, roles, ...}
| | authorize + respond |
|<-----------------------| 200 OK |
STATELESS AUTH (JWT Bearer Token)
---------------------------------
Client API Server
| POST /login |
|---------------------->| issue JWT (signed)
|<----------------------| 200 OK + access_token
|
| GET /profile
| Authorization: Bearer <jwt>
|---------------------->| verify signature + exp
| (no session lookup needed)
|<----------------------| 200 OK
9.6 JWT (Bearer token) + RBAC in FastAPI (minimal example)
This section shows the core idea of JWT-based authentication and role-based authorization (RBAC) in FastAPI. The flow is:
- User logs in → backend verifies credentials → issues a signed JWT
- Client sends Authorization: Bearer <token> on each request
- Backend verifies JWT signature + expiry → extracts identity (sub) and role → enforces permissions
This is intentionally minimal to teach the concept. Real production JWT systems require stronger controls: key rotation (kid/JWKS), issuer/audience validation, refresh tokens, revocation strategy, secure secret management, and careful claim validation.
Install
pip install python-jose[cryptography] passlib[bcrypt]
Conceptual model (claims you care about)
- sub: subject (user identifier)
- role: authorization role (e.g., user/admin)
- exp: expiration time (token lifetime)
JWT verification gives you authentication (who the user is). Role checks implement
authorization (what the user is allowed to do). You will usually return 401 for invalid/missing tokens
and 403 for “authenticated but not allowed.”
Minimal FastAPI JWT + RBAC code
The example below includes:
(1) password verification (bcrypt),
(2) token issuance (/login),
(3) dependency that extracts the current user from the Bearer token,
(4) an admin-only endpoint.
from datetime import datetime, timedelta, timezone
from typing import Optional, Dict
from fastapi import FastAPI, Depends, HTTPException, status
from fastapi.security import OAuth2PasswordBearer, OAuth2PasswordRequestForm
from jose import jwt, JWTError
from passlib.context import CryptContext
from pydantic import BaseModel
app = FastAPI()
# -----------------------------
# Minimal config (DO NOT hardcode secrets in production)
# -----------------------------
SECRET_KEY = "change-me-in-production"
ALGORITHM = "HS256"
ACCESS_TOKEN_EXPIRE_MINUTES = 30
pwd_context = CryptContext(schemes=["bcrypt"], deprecated="auto")
oauth2_scheme = OAuth2PasswordBearer(tokenUrl="login")
# -----------------------------
# Fake user store (replace with DB)
# -----------------------------
# In a real app: store password hashes, not plain passwords.
# Here we hash at startup for demo clarity.
fake_users_db: Dict[str, dict] = {
"alice": {"username": "alice", "role": "user", "password_hash": pwd_context.hash("alicepass")},
"admin": {"username": "admin", "role": "admin", "password_hash": pwd_context.hash("adminpass")},
}
class TokenOut(BaseModel):
access_token: str
token_type: str = "bearer"
class User(BaseModel):
username: str
role: str
# -----------------------------
# Helpers
# -----------------------------
def verify_password(plain_password: str, password_hash: str) -> bool:
return pwd_context.verify(plain_password, password_hash)
def authenticate_user(username: str, password: str) -> Optional[User]:
record = fake_users_db.get(username)
if not record:
return None
if not verify_password(password, record["password_hash"]):
return None
return User(username=record["username"], role=record["role"])
def create_access_token(*, sub: str, role: str, expires_minutes: int) -> str:
now = datetime.now(timezone.utc)
payload = {
"sub": sub,
"role": role,
"iat": int(now.timestamp()),
"exp": int((now + timedelta(minutes=expires_minutes)).timestamp()),
}
return jwt.encode(payload, SECRET_KEY, algorithm=ALGORITHM)
# -----------------------------
# Auth dependency: parse and validate JWT
# -----------------------------
def get_current_user(token: str = Depends(oauth2_scheme)) -> User:
cred_error = HTTPException(
status_code=status.HTTP_401_UNAUTHORIZED,
detail="Invalid authentication credentials",
headers={"WWW-Authenticate": "Bearer"},
)
try:
payload = jwt.decode(token, SECRET_KEY, algorithms=[ALGORITHM])
username: str = payload.get("sub")
role: str = payload.get("role")
if not username or not role:
raise cred_error
except JWTError:
raise cred_error
# Optional: verify user still exists (common in production)
if username not in fake_users_db:
raise cred_error
return User(username=username, role=role)
# -----------------------------
# Authorization dependency: RBAC
# -----------------------------
def require_admin(user: User = Depends(get_current_user)) -> User:
if user.role != "admin":
raise HTTPException(status_code=status.HTTP_403_FORBIDDEN, detail="Forbidden")
return user
# -----------------------------
# Routes
# -----------------------------
@app.post("/login", response_model=TokenOut)
def login(form: OAuth2PasswordRequestForm = Depends()):
user = authenticate_user(form.username, form.password)
if not user:
raise HTTPException(status_code=status.HTTP_401_UNAUTHORIZED, detail="Incorrect username or password")
token = create_access_token(sub=user.username, role=user.role, expires_minutes=ACCESS_TOKEN_EXPIRE_MINUTES)
return TokenOut(access_token=token)
@app.get("/me")
def read_me(user: User = Depends(get_current_user)):
return {"username": user.username, "role": user.role}
@app.get("/admin/metrics")
def admin_metrics(admin: User = Depends(require_admin)):
return {"ok": True, "message": f"Hello {admin.username}, you are an admin."}
How to test quickly (curl)
# 1) login to get JWT
curl -X POST http://localhost:8000/login \
-H "Content-Type: application/x-www-form-urlencoded" \
-d "username=admin&password=adminpass"
# 2) use the token on protected endpoint
curl http://localhost:8000/admin/metrics \
-H "Authorization: Bearer <PASTE_TOKEN_HERE>"
Minimum production checklist
- Validate claims: check exp (and in production also iss and aud)
- Key management: rotate keys; consider a kid header + JWKS for multiple keys
- Token lifetime: short-lived access tokens; refresh tokens for longer sessions
- Revocation strategy: denylist or session store for “logout everywhere”
- Secure transport: HTTPS everywhere; never log tokens
- RBAC vs ABAC: roles are simple; attribute/policy checks may be needed for fine-grained control
“JWT gives stateless authentication: verify signature + expiry, extract sub and claims.
Authorization is enforced separately (RBAC dependency). Invalid/missing token → 401; insufficient role → 403.”
10) Middleware & CORS (cross-cutting concerns)
Some backend problems are not “business logic.” They are concerns that apply to every request: logging, timing, auth, security headers, compression, request IDs, CORS, etc. Instead of repeating the same code in every endpoint, backends use middleware.
Middleware is code that runs around your endpoints:
before the request reaches the route handler and/or after the handler returns a response.
Think of it as a pipeline: request → middleware chain → route handler → middleware chain → response.
10.1 Why middleware exists (real-world reasons)
- Consistency: apply headers/logging/auth rules uniformly
- Observability: add request IDs, timing, metrics
- Security: add security headers, block oversized bodies, enforce HTTPS behind proxy
- Performance: caching headers, compression, rate limiting (often at proxy)
10.2 FastAPI middleware example: request ID + timing
This adds a correlation ID (useful for logs) and exposes response time. In production you’d also log it (or send to tracing/metrics).
import time, uuid
from fastapi import FastAPI, Request
app = FastAPI()
@app.middleware("http")
async def add_request_id_and_timing(request: Request, call_next):
request_id = request.headers.get("X-Request-ID") or str(uuid.uuid4())
start = time.perf_counter()
response = await call_next(request)
duration_ms = (time.perf_counter() - start) * 1000
response.headers["X-Request-ID"] = request_id
response.headers["X-Response-Time-ms"] = f"{duration_ms:.2f}"
return response
When debugging production: request ID + structured logs can reduce “guessing time” massively.
10.3 CORS: what it actually is (and what it is NOT)
CORS (Cross-Origin Resource Sharing) is a browser security rule. It controls whether a web page running on one origin (domain) is allowed to call APIs on another origin.
- Origin = scheme + host + port (e.g., https://app.com)
- If your frontend is on http://localhost:3000 and your API on http://localhost:8000, that is cross-origin.
CORS is not authentication and not a server security boundary. It only restricts what browsers allow. Non-browser clients (curl, Postman) can call your API regardless of CORS. You still need AuthN/AuthZ on the server.
10.4 Preflight (OPTIONS): why the browser sends it
For some requests, the browser first sends a preflight request (OPTIONS /endpoint) to ask the server which methods and headers are allowed. This happens for "non-simple" requests: custom headers like Authorization, methods like PUT/DELETE, or a Content-Type such as application/json.
Typical flow (browser):
1) OPTIONS /api/secure
Origin: http://localhost:3000
Access-Control-Request-Method: GET
Access-Control-Request-Headers: Authorization
2) Server replies with:
Access-Control-Allow-Origin: http://localhost:3000
Access-Control-Allow-Methods: GET
Access-Control-Allow-Headers: Authorization
3) Browser then sends the real GET request
10.5 FastAPI CORS configuration (recommended patterns)
If you control the frontend origins, whitelist them explicitly. Avoid wildcard * in production.
from fastapi import FastAPI
from fastapi.middleware.cors import CORSMiddleware
app = FastAPI()
ALLOWED_ORIGINS = [
"http://localhost:3000",
"https://www.janmajay.de",
]
app.add_middleware(
CORSMiddleware,
allow_origins=ALLOWED_ORIGINS,
allow_credentials=True, # needed if you use cookies
allow_methods=["GET", "POST", "PUT", "PATCH", "DELETE", "OPTIONS"],
allow_headers=["Authorization", "Content-Type", "X-Request-ID"],
)
10.6 Cookies + CORS (the part that breaks people)
If you use cookie-based auth across origins, you must set:
- allow_credentials=True in the CORS middleware
- SameSite=None; Secure on the cookie (requires HTTPS)
- Frontend must send credentials (fetch: credentials: "include")
fetch("https://api.example.com/me", {
method: "GET",
credentials: "include"
});
Cookies across origins raise CSRF risk. If you use cookies for auth, use SameSite + CSRF protections for sensitive actions.
10.7 Practical CORS rules (safe defaults)
- Whitelist exact origins (don't use * in production)
- Only allow the headers you need (especially Authorization)
- Don't confuse CORS with security: AuthN/AuthZ are still required
- Handle preflight (OPTIONS) or your frontend will "mysteriously fail"
“Middleware handles cross-cutting concerns like logging, timing, headers, and auth uniformly. CORS is a browser policy for cross-origin calls; it’s not auth. In production I whitelist origins and handle preflight correctly.”
11) Caching (speed by remembering)
Caching is a performance technique where we store the result of an expensive operation (DB query, API call, computation) so repeated requests can reuse it instead of recomputing. In CS terms, caching trades space (memory/storage) for time (lower latency) and reduces load on upstream systems.
Caching can exist at many layers (each with different scope and consistency guarantees):
- Browser cache (HTTP caching, client-side)
- CDN cache (edge caching, near end users)
- Reverse proxy cache (Nginx/Varnish in front of your app)
- Application cache (Redis/Memcached, app-controlled)
- DB indexes (not a cache; query-acceleration structures inside the DB engine)
Important: do not cache everything. Caching introduces the risk of stale data. You must define a freshness policy (e.g., TTL), invalidation strategy, or revalidation mechanism.
11.0 Cache vocabulary: hit, miss, TTL
- Cache hit: data exists in cache → fast response
- Cache miss: not in cache → fetch from origin/DB → store → return
- TTL (Time-To-Live): expiry time for a cached entry (limits staleness)
11.1 HTTP caching with ETag (best for GET)
HTTP caching is especially effective for GET endpoints and static resources.
One robust strategy is revalidation using ETag.
Idea:
- server responds with an ETag representing the current resource version (often a hash)
- client later sends If-None-Match with that ETag
- server returns 304 Not Modified if unchanged (no body, saves bandwidth)
- server returns 200 with new content + a new ETag if changed
Combine ETag with Cache-Control for explicit freshness:
Cache-Control: public, max-age=60 means the response can be reused for 60 seconds before revalidation.
FastAPI example (ETag + Cache-Control):
from fastapi import FastAPI, Request, Response
import hashlib, json
app = FastAPI()
@app.get("/api/config")
def get_config(request: Request, response: Response):
payload = {"featureA": True, "version": 3}
body = json.dumps(payload, separators=(",", ":")).encode()
etag = hashlib.sha256(body).hexdigest()
    if request.headers.get("if-none-match") == etag:
        # A 304 must have an empty body, so return a bare Response
        return Response(status_code=304)
response.headers["ETag"] = etag
response.headers["Cache-Control"] = "public, max-age=60"
return payload
11.2 CDN caching (edge cache, closest to the user)
A CDN caches responses at edge locations near users. It reduces latency and load on your origin server. CDNs work best for static assets and cacheable public GET responses.
- High impact for global audiences (lower round-trip time)
- Use TTL and cache rules carefully
- Avoid caching private/user-specific responses as public
11.3 Reverse proxy caching with Nginx (cache in front of the app)
A reverse proxy (e.g., Nginx) can cache upstream responses so your app/DB does not get hit for repeated requests. This is useful for public GET endpoints and for absorbing traffic bursts.
Minimal Nginx proxy cache example:
# inside http { ... }
proxy_cache_path /var/cache/nginx levels=1:2 keys_zone=api_cache:10m
max_size=1g inactive=60m use_temp_path=off;
server {
listen 80;
server_name example.com;
location /api/ {
proxy_pass http://127.0.0.1:8000;
proxy_cache api_cache;
proxy_cache_key "$scheme$request_method$host$request_uri";
# cache only successful responses
proxy_cache_valid 200 10m;
proxy_cache_valid 404 1m;
# do not cache when auth/cookies exist (safety rule)
proxy_no_cache $http_authorization $http_cookie;
proxy_cache_bypass $http_authorization $http_cookie;
add_header X-Cache-Status $upstream_cache_status always;
}
}
Debug tip: the first request usually shows X-Cache-Status: MISS, the next shows HIT.
11.4 Application caching with Redis (cache-aside / lazy loading)
Redis is commonly used as an application cache because it is fast, supports TTL, and provides atomic operations. A standard approach is cache-aside:
- read from cache
- if miss → read from DB
- store result in cache (with TTL)
- return result
Python + Redis example (cache-aside):
import json
from redis import Redis
r = Redis(host="localhost", port=6379, decode_responses=True)
def get_user_profile(user_id: str) -> dict:
key = f"user:profile:{user_id}"
cached = r.get(key)
if cached is not None:
return json.loads(cached)
# expensive operation (DB query)
profile = db_fetch_user_profile(user_id)
# TTL limits staleness
r.set(key, json.dumps(profile), ex=60)
return profile
Consistency note: TTL-based caching may serve stale data for up to TTL seconds. For stronger consistency, invalidate the relevant cache keys on writes/updates.
11.5 Cache stampede (thundering herd) + mitigation
A cache stampede occurs when many requests miss simultaneously (e.g., popular key expires), causing a burst of DB load. Common mitigations:
- single-flight locking: only one request recomputes, others wait and reuse
- TTL jitter: add small random noise to TTL to avoid synchronized expirations
- stale-while-revalidate: serve slightly stale data while refreshing in the background
Best-effort Redis lock example (single-flight + TTL jitter):
import json, random, time
from redis import Redis
r = Redis(decode_responses=True)
def get_with_lock(key: str, ttl_s: int, compute_fn):
cached = r.get(key)
if cached is not None:
return json.loads(cached)
lock_key = key + ":lock"
got_lock = r.set(lock_key, "1", nx=True, ex=10) # lock auto-expires
if got_lock:
try:
value = compute_fn()
jitter = random.randint(0, 10)
r.set(key, json.dumps(value), ex=ttl_s + jitter)
return value
finally:
r.delete(lock_key)
# someone else recomputing: wait briefly and retry
for _ in range(5):
time.sleep(0.05)
cached2 = r.get(key)
if cached2 is not None:
return json.loads(cached2)
# fallback policy choice
return compute_fn()
11.6 DB indexes are not cache (but essential for performance)
A DB index is a data structure (e.g., B-tree) maintained by the database to accelerate queries. It is not a cache because it is part of the DB engine’s storage and changes query complexity (often from scan to logarithmic lookup). Indexes improve read performance but usually increase write cost and storage usage.
12) Scaling: vertical vs horizontal
12.1 Vertical scaling (scale up)
You increase resources on one machine:
- more CPU
- more RAM
- faster disk
Example
- A single VM: upgrade from 2 CPU / 4GB RAM → 8 CPU / 16GB RAM
Pros: simple
Cons: hard limit; single point of failure
12.2 Horizontal scaling (scale out)
You run multiple replicas of your service:
- 2, 4, 10 backend instances
- a load balancer distributes requests
Pros: scalable + resilient
Cons: requires stateless design + shared state in DB/cache
12.3 Horizontal scaling example with Docker + Nginx load balancing
docker-compose.yml
services:
api:
build: .
deploy:
replicas: 3 # (works in swarm; for local dev use multiple services or docker compose scale)
environment:
- DATABASE_URL=postgresql://postgres:postgres@db:5432/app
depends_on:
- db
nginx:
image: nginx:alpine
ports:
- "8080:80"
volumes:
- ./nginx.conf:/etc/nginx/nginx.conf:ro
depends_on:
- api
db:
image: postgres:16
environment:
POSTGRES_PASSWORD: postgres
POSTGRES_DB: app
nginx.conf
events {}
http {
upstream api_upstream {
# in real setups, you'd list service DNS names or use service discovery
# Example conceptually:
server api:8000;
}
server {
listen 80;
location / {
proxy_pass http://api_upstream;
proxy_set_header Host $host;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
}
}
}
12.4 Concurrency vs Parallelism (backend mental model)
Many beginners confuse concurrency with parallelism. Backends care about both — but for different reasons.
- Concurrency: handling multiple requests in overlapping time (good for I/O waits).
- Parallelism: doing multiple computations at the same time (needs multiple CPU cores).
Why backends are mostly “I/O bound”
A typical request spends most time waiting on database, network calls, or disk, not executing Python code. While you wait, concurrency lets you serve other requests.
Request timeline (typical)
--------------------------
parse+validate: 2ms
DB query: 120ms (waiting)
serialize: 3ms
total: 125ms
Main lesson: DB/network waiting dominates.
3 execution models used in real backends
- 1) Thread-per-request (classic): simple mental model; good for blocking I/O; too many threads can hurt.
- 2) Async event loop (async/await): one thread can juggle many in-flight I/O waits efficiently.
- 3) Multi-process workers: uses multiple CPU cores; good isolation; common with Gunicorn.
Async improves concurrency for I/O. It does not make CPU-heavy work faster. CPU-heavy work needs parallelism (multiple processes) or a background job queue.
Async vs Background jobs (common confusion)
- async/await: “I can serve other requests while waiting for DB/HTTP.”
- background jobs: “This task should not run in the request path at all.”
Use async when: waiting on DB, waiting on HTTP, waiting on Redis
Use background jobs: PDF processing, video conversion, ML inference, embeddings, long pipelines
Python reality check: GIL (one sentence only)
In CPython, CPU-bound Python code does not run truly in parallel in threads due to the GIL. For CPU-heavy work, prefer multiple processes or move work to background workers.
Concurrency “tools” backends use
- Connection pools (DB): limit concurrent DB connections (prevents overload)
- Timeouts: don’t let requests hang forever
- Backpressure: reject/queue work when overloaded
- Rate limiting: protects scarce resources
“Concurrency is overlapping requests (great for I/O waits); parallelism is true simultaneous execution (CPU cores). Async helps I/O-bound endpoints; CPU-heavy tasks go to background workers or multi-process scaling.”
Imagine a request takes 30ms total, but only 3ms is actual CPU work. The other 27ms is usually waiting on the database/network.
A typical CPU runs around 3–4 GHz. At 3.5 GHz, in 30ms a single core has about 105 million CPU cycles available — and your handler might use only a small fraction of them. In a synchronous/blocking design, your thread just sits there waiting.
This is why concurrency matters: while one request waits on I/O, the server can make progress on other requests instead of wasting time.
12.4.1 FastAPI example: concurrency with async I/O (DB/HTTP waiting)
Below, both endpoints do the same thing: call an external API and return the result. The async version can keep handling other requests while waiting on the network. The blocking version ties up a worker while it waits.
Async only helps if the work is truly I/O wait and the libraries are async-friendly.
If you call blocking code inside an async def, you can still block the event loop.
from fastapi import FastAPI
import time
import httpx
import requests
app = FastAPI()
# ---------------------------
# BAD for high concurrency (blocking I/O)
# ---------------------------
@app.get("/blocking-weather")
def blocking_weather():
# This blocks the worker while waiting on the network.
r = requests.get("https://httpbin.org/delay/1", timeout=3)
return {"status": r.status_code}
# ---------------------------
# GOOD for high concurrency (async I/O)
# ---------------------------
@app.get("/async-weather")
async def async_weather():
# This yields control while waiting, so the server can handle other requests.
async with httpx.AsyncClient(timeout=3.0) as client:
r = await client.get("https://httpbin.org/delay/1")
return {"status": r.status_code}
Practical rule: If your endpoint spends time waiting (DB/HTTP/Redis), prefer async I/O libraries.
12.4.2 FastAPI example: parallelism for CPU-heavy work (process pool)
For CPU-heavy work (hashing, image processing, ML inference), async does not help. You need parallelism using multiple CPU cores. A simple pattern is to offload CPU work to a process pool.
CPython threads are limited for CPU-bound code by the GIL. A ProcessPool uses multiple OS processes → true parallel CPU execution across cores.
from fastapi import FastAPI
from concurrent.futures import ProcessPoolExecutor
import hashlib
app = FastAPI()
# A global pool (one per app process)
cpu_pool = ProcessPoolExecutor(max_workers=4)
def heavy_cpu_task(n: int) -> str:
# Artificial CPU work: repeated hashing
x = b"hello"
for _ in range(n):
x = hashlib.sha256(x).digest()
return x.hex()
@app.get("/cpu-sync")
def cpu_sync(n: int = 200_000):
# This blocks the worker CPU (bad under load)
out = heavy_cpu_task(n)
return {"result": out[:16]}
@app.get("/cpu-parallel")
async def cpu_parallel(n: int = 200_000):
# Offload CPU work to another process (parallelism)
import asyncio
loop = asyncio.get_running_loop()
out = await loop.run_in_executor(cpu_pool, heavy_cpu_task, n)
return {"result": out[:16]}
For real systems, CPU-heavy work is often better as a background job (Celery/RQ), especially if it may take seconds+ or needs retries. Use process pools for “medium” CPU tasks that must return quickly.
12.4.3 One clean decision table
| Problem type | Best tool | Why |
|---|---|---|
| I/O wait (DB/HTTP/Redis) | async/await + async libs | Free the server while waiting |
| CPU heavy (hashing, ML, image/PDF) | multi-process / process pool / job queue | Use multiple cores (true parallelism) |
| Long-running pipeline (seconds-minutes) | background jobs (Celery/RQ) | Durable + retries + doesn’t block requests |
“Async increases concurrency for I/O-bound endpoints by letting the server do other work while waiting. CPU-heavy work needs parallelism (processes) or background workers — async won’t make CPU faster.”
In horizontal scaling, your API must be stateless (or store session state in Redis / DB).
13) Performance: what matters most
Backend performance is primarily about latency (time per request) and throughput (requests per second). In practice, most slow backends are not slow because of Python itself — they are slow because the request path spends time waiting on I/O (database, network) or doing too much work per request.
A request handler is a pipeline: parse → validate → query/compute → respond. Performance work is about finding the dominant cost in that pipeline and reducing it.
13.1 The backend performance hierarchy (typical bottlenecks)
The following “hierarchy” is a useful rule-of-thumb: when an endpoint is slow, these are usually the reasons, in roughly decreasing frequency.
- Database queries dominate latency: poor queries, missing indexes, large result sets, and N+1 query patterns often outweigh everything else.
- External API calls dominate latency: network round trips and third-party services introduce unpredictable latency and failures.
- CPU-heavy work blocks worker threads/processes: serialization, large JSON transformations, PDF/image processing, or ML inference can saturate CPU and reduce throughput.
Optimize the biggest wait first: if you spend 300ms in the DB and 10ms in Python code, optimizing the Python part won’t move the needle.
13.2 Measure first: where time actually goes
Performance tuning without measurement is guessing. The minimal professional approach:
- Add timing logs around DB calls and external HTTP calls.
- Inspect query plans (e.g., EXPLAIN) for slow database queries.
- Track percentiles: p50 vs p95 vs p99 latency (tail latency matters in production).
FastAPI example: timing middleware (quick visibility)
import time
from fastapi import FastAPI, Request
app = FastAPI()
@app.middleware("http")
async def timing_middleware(request: Request, call_next):
start = time.perf_counter()
resp = await call_next(request)
duration_ms = (time.perf_counter() - start) * 1000
resp.headers["X-Response-Time-ms"] = f"{duration_ms:.2f}"
return resp
13.3 Practical rules (high-impact improvements)
1) Paginate lists (never return unbounded collections)
Returning “all rows” is a common performance and memory failure. Pagination bounds work per request and improves perceived performance. Prefer cursor-based pagination for large datasets; offset pagination is simpler but slows down at high offsets.
from fastapi import Query
@app.get("/items")
def list_items(limit: int = Query(20, ge=1, le=100), offset: int = Query(0, ge=0)):
# SELECT ... LIMIT :limit OFFSET :offset
return db_list_items(limit=limit, offset=offset)
2) Index columns used in filters/sorts
Indexes speed up lookups and sorting, but cost extra work on writes. Index columns that appear frequently in WHERE, JOIN, and ORDER BY clauses. Verify with query plans rather than guessing.
More indexes → faster reads, slower writes, more storage. Use indexes based on real query patterns.
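As a sketch, a composite index declared in SQLAlchemy to match a common filter + sort pattern (hypothetical Task model):
from sqlalchemy import Column, DateTime, Index, Integer
from sqlalchemy.orm import declarative_base

Base = declarative_base()

class Task(Base):
    __tablename__ = "tasks"
    id = Column(Integer, primary_key=True)
    user_id = Column(Integer, nullable=False)
    created_at = Column(DateTime, nullable=False)

    # Matches: SELECT ... WHERE user_id = ? ORDER BY created_at DESC
    __table_args__ = (
        Index("ix_tasks_user_id_created_at", "user_id", "created_at"),
    )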
3) Avoid N+1 queries
The N+1 problem happens when you fetch a list (1 query), then for each item fetch related data (N queries). It is common with ORMs if relationships are lazily loaded. Fix it by using joins, eager loading, or batch queries.
Example pattern:
Bad:
1 query: fetch 100 posts
100 queries: fetch author for each post
Total: 101 queries (slow)
Good:
1 query: fetch posts + authors (JOIN / eager load)
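With SQLAlchemy, eager loading turns that pattern into a bounded number of queries. A sketch (illustrative Post/Author models):
from sqlalchemy import Column, ForeignKey, Integer, String, create_engine, select
from sqlalchemy.orm import Session, declarative_base, relationship, selectinload

Base = declarative_base()

class Author(Base):
    __tablename__ = "authors"
    id = Column(Integer, primary_key=True)
    name = Column(String(80), nullable=False)

class Post(Base):
    __tablename__ = "posts"
    id = Column(Integer, primary_key=True)
    title = Column(String(120), nullable=False)
    author_id = Column(Integer, ForeignKey("authors.id"), nullable=False)
    author = relationship(Author)

engine = create_engine("sqlite:///./app.db")

with Session(engine) as session:
    # selectinload fetches all needed authors in one extra query: 2 queries total
    stmt = select(Post).options(selectinload(Post.author)).limit(100)
    posts = session.execute(stmt).scalars().all()
    author_names = [p.author.name for p in posts]  # no lazy queries triggered here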
4) Cache expensive reads (but handle staleness)
If the same expensive data is requested repeatedly, cache it (Redis, Nginx cache, HTTP caching). Use TTL to limit staleness and consider invalidation on writes for critical correctness.
import json
from redis import Redis
r = Redis(decode_responses=True)
def get_stats():
key = "stats:v1"
cached = r.get(key)
if cached:
return json.loads(cached)
data = db_compute_stats() # expensive query/aggregation
r.set(key, json.dumps(data), ex=30) # cache for 30s
return data
5) Use async for I/O waits, not for CPU-heavy work
async/await improves concurrency when your handler spends time waiting on I/O (HTTP calls, DB calls).
It does not make CPU-heavy code faster. CPU-heavy tasks should be moved to:
background workers (Celery/RQ), or optimized with native libraries, or parallelized safely.
Async helps when you wait on the network; background jobs help when you burn CPU.
6) Add timeouts for external calls (performance + reliability)
External services can be slow or hang. Always set timeouts, and consider retries with backoff for transient failures. Without timeouts, slow dependencies can saturate workers and cascade into outages.
import httpx
def fetch_user_from_partner(user_id: str):
with httpx.Client(timeout=3.0) as client:
r = client.get(f"https://partner.example/api/users/{user_id}")
r.raise_for_status()
return r.json()
13.4 A realistic performance scenario (end-to-end)
Suppose GET /orders is slow. A typical optimization workflow:
- Measure: log DB time + external call time + total time (p50/p95).
- Fix query shape: avoid selecting unused columns, limit result size, paginate.
- Add/adjust indexes: on user_id, created_at if used in filters/sorts.
- Remove N+1: join related tables or eager load.
- Cache: cache expensive aggregates (e.g., summary totals) with TTL.
- Protect dependencies: add timeouts/retries for external services.
“Most backend latency is DB + network. I measure first (percentiles), then fix query patterns (pagination, indexes, avoid N+1), cache expensive reads, use async for I/O waits, and always set timeouts on external calls.”
14) Data layer: ORM design (FastAPI + SQLAlchemy)
An ORM (Object–Relational Mapper) is a programming abstraction that maps relational database tables (rows/columns) to language objects (Python classes/instances). Instead of writing raw SQL for every operation, you work with objects and relations, and the ORM generates SQL and tracks changes for you.
Table users ↔ Python class User
Row in users ↔ instance of User
Column email ↔ attribute User.email
14.1 Why ORMs are used (benefits)
- Productivity: CRUD operations become concise and less error-prone
- Maintainability: domain model lives in code (types, relationships, constraints)
- Portability: same ORM code can target SQLite/Postgres/MySQL (with caveats)
- Safety: parameterized queries by default reduce SQL injection risk
- Transactions: ORMs integrate well with unit-of-work + session patterns
14.2 What an ORM does under the hood (unit of work + identity map)
Mature ORMs (including SQLAlchemy ORM) implement two key ideas:
- Identity map: within a session, each DB row is represented by a single Python object. If you query the same row twice, you usually get the same object instance (consistency inside the session).
- Unit of work: the ORM tracks changes you make to objects and flushes them as SQL (INSERT/UPDATE/DELETE) on commit().
ORMs are not “free performance.” You still must understand SQL, indexes, and query patterns (especially to avoid N+1 queries and accidental full-table scans).
14.3 A small theoretical example: objects vs tables
Suppose you have a relational table:
CREATE TABLE posts (
id INTEGER PRIMARY KEY,
title TEXT NOT NULL,
created_at TIMESTAMP NOT NULL,
updated_at TIMESTAMP NOT NULL
);
In an ORM, you represent this table as a class. The ORM maps class attributes to columns and generates SQL for you.
When you create an object and commit, the ORM emits an INSERT. When you modify an attribute and commit,
it emits an UPDATE.
14.4 created_at / updated_at (timestamps for auditing)
In production systems, created_at and updated_at are common auditing fields:
- created_at: time the row was created (immutable)
- updated_at: time the row was last modified (changes on update)
14.5 Minimal FastAPI + SQLAlchemy ORM stack (SQLite demo)
This is a realistic minimal stack:
- SQLAlchemy ORM for mapping classes ↔ tables
- SQLite for demo (swap to Postgres in production)
SQLite is great for demos and local dev. In production, Postgres is preferred for concurrency, robustness, and advanced indexing/features. The ORM layer remains similar, but performance and operational behavior differ.
14.6 Minimal code example (model + session + sorting)
The code below shows: (1) an ORM model, (2) automatic timestamps, and (3) sorting by created_at.
from datetime import datetime
from fastapi import FastAPI, Depends, Query
from sqlalchemy import create_engine, Column, Integer, String, DateTime, select, desc, asc
from sqlalchemy.orm import declarative_base, sessionmaker, Session
DATABASE_URL = "sqlite:///./app.db"
engine = create_engine(
DATABASE_URL,
connect_args={"check_same_thread": False} # needed for SQLite + threads
)
SessionLocal = sessionmaker(bind=engine, autocommit=False, autoflush=False)
Base = declarative_base()
class Post(Base):
__tablename__ = "posts"
id = Column(Integer, primary_key=True, index=True)
title = Column(String(120), nullable=False)
created_at = Column(DateTime, nullable=False, default=datetime.utcnow)
updated_at = Column(DateTime, nullable=False, default=datetime.utcnow, onupdate=datetime.utcnow)
Base.metadata.create_all(bind=engine)
def get_db():
db = SessionLocal()
try:
yield db
finally:
db.close()
app = FastAPI()
@app.post("/posts")
def create_post(title: str, db: Session = Depends(get_db)):
post = Post(title=title)
db.add(post)
db.commit()
db.refresh(post)
return {"id": post.id, "title": post.title, "created_at": post.created_at}
@app.get("/posts")
def list_posts(
sort: str = Query("desc", pattern="^(asc|desc)$"),
db: Session = Depends(get_db)
):
order = desc(Post.created_at) if sort == "desc" else asc(Post.created_at)
posts = db.execute(select(Post).order_by(order).limit(50)).scalars().all()
return [{"id": p.id, "title": p.title, "created_at": p.created_at, "updated_at": p.updated_at} for p in posts]
14.7 Common ORM pitfalls (fast interview checklist)
- N+1 queries: fetching relationships in a loop; fix with joins/eager loading
- Unbounded queries: missing pagination/limits
- Missing indexes: slow filters/sorts without indexes (verify with query plans)
- Session misuse: long-lived sessions or leaking sessions across requests
“An ORM maps tables to objects and uses a session (identity map + unit of work) to generate SQL and manage transactions. It improves productivity, but you still need SQL awareness to avoid N+1 queries and slow scans.”
15) Background jobs (RQ / Celery) for heavy tasks
A background job is work that runs outside the HTTP request–response lifecycle. The API handler enqueues a task and returns quickly; the heavy/slow part is executed by a separate worker process (often on another machine). This design increases reliability and throughput for real-world systems.
Background jobs are tasks executed asynchronously after the API response, typically via a queue (Redis/RabbitMQ/SQS) and workers that consume tasks.
15.1 Why background jobs exist
- Keep API fast: return response quickly (low latency)
- Prevent timeouts: avoid long blocking operations inside web workers
- Improve throughput: free request handlers to serve more traffic
- Enable retries safely: transient failures can be retried with backoff
- Isolate resources: heavy CPU/RAM work runs in worker pool, not API processes
15.2 Typical use cases
- Email/SMS: verification email after signup, password reset
- RAG pipelines: chunking documents, generating embeddings, indexing vectors
- Media processing: resizing images, transcoding video/audio
- Analytics: event ingestion, aggregation, periodic reports
- Webhooks: delivery with retries and exponential backoff
15.3 Common pattern (Producer → Queue → Worker)
The web server acts as a producer and enqueues jobs. A queue/broker stores jobs. A worker acts as a consumer and executes them. Results are stored in a DB/cache and can be queried through a status endpoint.
- POST /jobs → enqueue job → returns job_id (202 Accepted)
- GET /jobs/{job_id} → job state + result/error
async/await is primarily about non-blocking I/O inside the same process.
Background jobs mean the work happens in separate execution (workers), potentially durable and retriable.
15.4 Response semantics
For heavy tasks, the API should usually return 202 Accepted with a job_id.
This indicates the request was accepted for processing, but is not complete yet.
Example response:
{
"status": "queued",
"job_id": "a1b2c3d4"
}
15.5 Minimal in-process background tasks (FastAPI BackgroundTasks)
Framework background tasks (e.g., FastAPI BackgroundTasks) are useful for small, best-effort jobs
but they are not a durable queue (tasks can be lost if the server restarts).
from fastapi import FastAPI, BackgroundTasks
app = FastAPI()
def send_verification_email(to_email: str) -> None:
# call SMTP/provider here
pass
@app.post("/signup")
def signup(email: str, background_tasks: BackgroundTasks):
# create user in DB ...
background_tasks.add_task(send_verification_email, email)
return {"status": "created"}
Not durable (lost on crash), competes with API for CPU/memory, limited visibility/retry control. For heavy tasks, use a real queue (RQ/Celery).
15.6 Celery + Redis example (durable queue + workers)
Celery uses a broker (Redis/RabbitMQ) to store jobs and worker processes to execute them. This is a standard production pattern for background processing.
Worker: define task (tasks.py)
from celery import Celery
celery_app = Celery(
"worker",
broker="redis://localhost:6379/0",
backend="redis://localhost:6379/1",
)
@celery_app.task(bind=True, max_retries=3)
def build_embeddings(self, document_id: str):
try:
# heavy pipeline:
# 1) load document
# 2) chunk text
# 3) generate embeddings
# 4) store vectors + build index
return {"document_id": document_id, "status": "done"}
except Exception as exc:
# exponential backoff for transient failures
raise self.retry(exc=exc, countdown=2 ** self.request.retries)
API server: enqueue job (app.py)
from fastapi import FastAPI
from tasks import build_embeddings, celery_app
app = FastAPI()
@app.post("/documents/{document_id}/embed")
def embed_document(document_id: str):
job = build_embeddings.delay(document_id) # enqueue
return {"status": "queued", "job_id": job.id}
@app.get("/jobs/{job_id}")
def job_status(job_id: str):
res = celery_app.AsyncResult(job_id)
return {"state": res.state, "result": res.result}
15.7 Reliability topics
- Idempotency: tasks may retry; ensure re-running does not create duplicates
- Retry policy: retry transient errors; fail fast on permanent input errors
- Dead-letter queue (DLQ): move repeatedly failing jobs for later inspection
- Observability: log job_id, duration, failures; track metrics
- Ordering/priority: some systems need priority queues and rate limiting
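To make the idempotency point concrete: a minimal claim-before-work guard, assuming Redis as the claim store (key names and the process function are illustrative, not from a specific library):
import redis
r = redis.Redis(host="localhost", port=6379, db=2)
def run_job_once(idempotency_key: str, payload: dict) -> bool:
    # SET NX EX: only the first worker to claim the key proceeds,
    # so duplicate deliveries and client retries become no-ops
    claimed = r.set(f"job:claim:{idempotency_key}", "1", nx=True, ex=3600)
    if not claimed:
        return False  # already processed or currently in flight
    try:
        process(payload)  # hypothetical task body
    except Exception:
        r.delete(f"job:claim:{idempotency_key}")  # release so a retry can run
        raise
    return True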
“For heavy work, return 202 + job_id, process via queue + workers, and design tasks to be idempotent with retries/backoff and good observability.”
(For more queue and worker patterns, see the data-intensive backends section near the end of this post.)
16) Testing with pytest (backend quality)
Install:
pip install pytest httpx
Example test using FastAPI TestClient:
from fastapi.testclient import TestClient
from main import app
client = TestClient(app)
def test_create_and_get_task():
r = client.post("/tasks", json={"title": "hello", "done": False})
assert r.status_code == 201
task = r.json()
assert task["title"] == "hello"
r2 = client.get(f"/tasks/{task['id']}")
assert r2.status_code == 200
assert r2.json()["id"] == task["id"]
Test principles:
- unit tests for pure functions (fast)
- integration tests for API endpoints
- database tests using a temporary DB or test containers (see the fixture sketch below)
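A sketch of the temporary-DB idea, assuming the app exposes a get_db dependency and a SQLAlchemy Base (names borrowed from the data-layer section; adjust to your project):
import pytest
from fastapi.testclient import TestClient
from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker
from main import app, get_db, Base  # assumed names from the app under test
@pytest.fixture()
def client(tmp_path):
    # fresh SQLite file per test: isolated, fast, no shared state
    engine = create_engine(f"sqlite:///{tmp_path}/test.db")
    Base.metadata.create_all(engine)
    TestSession = sessionmaker(bind=engine)
    def override_get_db():
        db = TestSession()
        try:
            yield db
        finally:
            db.close()
    app.dependency_overrides[get_db] = override_get_db
    yield TestClient(app)
    app.dependency_overrides.clear()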
17) CI (GitHub Actions)
.github/workflows/ci.yml
name: CI
on: [push, pull_request]
jobs:
test:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- uses: actions/setup-python@v5
with:
python-version: "3.11"
- run: pip install -r requirements.txt
- run: pytest -q
18) Security essentials (production mindset)
Security is not one feature — it’s a collection of defaults that limit damage when something goes wrong. The goal is simple: reduce attack surface, prevent easy mistakes, and fail safely under bad inputs, leaked credentials, and broken dependencies.
Assume: inputs are malicious, credentials leak, dependencies fail, and traffic spikes — then design defaults so the system degrades safely.
18.1 Don’t leak internals (errors, stack traces, debug mode)
In production, never expose stack traces, file paths, raw SQL errors, or secrets to users. Return a generic error to clients and log the details internally with a request ID.
from fastapi import FastAPI, Request
from fastapi.responses import JSONResponse
import logging
app = FastAPI()
log = logging.getLogger("app")
@app.exception_handler(Exception)
async def catch_all(request: Request, exc: Exception):
log.exception("Unhandled error") # log full stack trace internally
return JSONResponse(status_code=500, content={"detail": "Internal Server Error"})
- Dev: detailed errors help you
- Prod: detailed errors help attackers
18.2 Secrets (env vars, rotation, and “never log tokens”)
Secrets include DB passwords, JWT signing keys, API keys, OAuth client secrets. One rule covers 90% of incidents: secrets must not live in Git or logs.
- Store: environment variables or a secret manager
- Rotate: treat leaks as inevitable; rotation is your recovery path
- Log hygiene: never log Authorization headers, cookies, or passwords
import os
DATABASE_URL = os.environ["DATABASE_URL"]
SECRET_KEY = os.environ["SECRET_KEY"] # JWT signing key
“It’s fine, it’s only on my server.” If it’s in Git history, HTML, or logs, it eventually leaks.
18.3 Browser threats (XSS vs CSRF) — why cookies need extra care
If you use cookies for authentication, understand these two common web threats:
- XSS: attacker injects JavaScript into your site → tries to steal data or perform actions
- CSRF: browser automatically sends cookies → attacker triggers actions from another site
Practical takeaway: cookie-based auth needs good cookie flags and CSRF defenses for sensitive actions.
Set-Cookie: session_id=...; HttpOnly; Secure; SameSite=Lax; Path=/;
See also: Authentication/Cookies section for the meaning of HttpOnly, Secure, and SameSite.
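In FastAPI these flags map directly onto Response.set_cookie; a minimal sketch (the session value is a placeholder for a real server-side session id):
from fastapi import FastAPI, Response
app = FastAPI()
@app.post("/login")
def login(response: Response):
    # ... verify credentials and create a server-side session first ...
    response.set_cookie(
        key="session_id",
        value="opaque-random-session-id",  # placeholder
        httponly=True,   # not readable by JavaScript (limits XSS impact)
        secure=True,     # only sent over HTTPS
        samesite="lax",  # withheld on most cross-site requests (CSRF help)
        path="/",
    )
    return {"status": "ok"}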
18.4 HTTPS/TLS (security + correctness)
HTTPS is not optional for real systems. Without HTTPS, credentials and tokens can be intercepted,
and cookies are unsafe (the Secure flag becomes meaningless).
Nginx: redirect HTTP → HTTPS (minimal)
server {
listen 80;
server_name example.com;
return 301 https://$host$request_uri;
}
18.5 Security headers (cheap, high impact)
Security headers reduce browser attack surface. They don’t replace validation/auth, but they harden defaults.
from fastapi import Request
@app.middleware("http")
async def security_headers(request: Request, call_next):
resp = await call_next(request)
resp.headers["X-Content-Type-Options"] = "nosniff"
resp.headers["X-Frame-Options"] = "DENY"
resp.headers["Referrer-Policy"] = "strict-origin-when-cross-origin"
# Start simple; CSP needs tuning:
# resp.headers["Content-Security-Policy"] = "default-src 'self';"
return resp
18.6 Abuse controls (rate limits + payload limits + timeouts)
Many “attacks” are just resource exhaustion: too many requests, huge bodies, or slow upstream calls. Apply hard limits to preserve availability.
- Rate limiting: protect login/search and expensive endpoints
- Max request size: avoid huge payload DoS
- Timeouts: external calls must not hang workers
Nginx: cap request body size
server {
client_max_body_size 5m;
}
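Nginx caps what enters the system; on the app side, every outbound call also needs a deadline. A sketch using httpx (the limits and URL are illustrative):
import httpx
# bound connect time and total time so a slow upstream cannot hang a worker
TIMEOUT = httpx.Timeout(5.0, connect=2.0)
def fetch_profile(user_id: str) -> dict | None:
    try:
        r = httpx.get(f"https://api.example.com/users/{user_id}", timeout=TIMEOUT)
        r.raise_for_status()
        return r.json()
    except httpx.TimeoutException:
        return None  # degrade gracefully instead of blocking the worker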
18.7 File uploads (the forgotten attack surface)
Uploads create real risk: large-payload DoS, zip bombs, malicious file types, path traversal. Safe defaults (sketched after this list):
- limit size (proxy + app)
- validate content type (extension is not enough)
- store outside web root (don’t serve raw uploads directly)
- randomize filenames (avoid collisions and path tricks)
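A sketch of those defaults in FastAPI (requires python-multipart; paths and limits are illustrative, and real type checks should sniff file content rather than trust the declared content type):
import uuid
from pathlib import Path
from fastapi import FastAPI, HTTPException, UploadFile
app = FastAPI()
UPLOAD_DIR = Path("/srv/uploads")            # outside the web root
ALLOWED_TYPES = {"image/png", "image/jpeg"}
MAX_BYTES = 5 * 1024 * 1024                  # match the proxy limit
@app.post("/upload")
async def upload(file: UploadFile):
    # content_type is client-supplied: treat it as a first filter only
    if file.content_type not in ALLOWED_TYPES:
        raise HTTPException(415, "unsupported media type")
    data = await file.read()
    if len(data) > MAX_BYTES:
        raise HTTPException(413, "file too large")
    # random filename: no collisions, no path tricks from user input
    dest = UPLOAD_DIR / f"{uuid.uuid4().hex}.bin"
    dest.write_bytes(data)
    return {"stored_as": dest.name}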
18.8 Dependency hygiene (silent killer)
Many real incidents come from outdated dependencies. Pin versions, update regularly, and audit in CI.
pip install pip-audit
pip-audit
18.9 Security checklist (fast revision)
- Input boundary: validate early; reject bad payloads
- Access control: AuthN + AuthZ; least privilege
- No leaks: no stack traces / debug in prod; safe error responses
- Secrets: env/secret manager; rotate; never log tokens/passwords
- Browser hardening: cookie flags, CSRF awareness, security headers
- Transport: HTTPS everywhere
- Abuse limits: rate limits, body caps, timeouts
- Uploads: strict limits and safe storage
- Dependencies: pin + audit + update discipline
“I treat security as safe defaults: strict boundaries (validation/auth), no internal leakage, secrets outside Git/logs, HTTPS everywhere, hardened browser surface (cookies/headers/CSRF), abuse limits (rate/body/timeouts), safe uploads, and dependency hygiene.”
19) Observability
- structured logs (JSON logs)
- request IDs / correlation IDs (see the middleware sketch below)
- metrics (latency, error rate, throughput)
- traces (distributed tracing if microservices)
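A minimal middleware sketch combining the first three points (field names are illustrative):
import json
import logging
import time
import uuid
from fastapi import FastAPI, Request
app = FastAPI()
log = logging.getLogger("app")
@app.middleware("http")
async def request_logging(request: Request, call_next):
    request_id = request.headers.get("X-Request-ID", uuid.uuid4().hex)
    start = time.perf_counter()
    response = await call_next(request)
    response.headers["X-Request-ID"] = request_id  # echo for correlation
    log.info(json.dumps({
        "request_id": request_id,
        "method": request.method,
        "path": request.url.path,
        "status": response.status_code,
        "duration_ms": round((time.perf_counter() - start) * 1000, 1),
    }))
    return response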
20) Quick Review Table: 10 backend concepts
This table is a fast revision checklist. For each row, you should be able to explain: (1) what it is, (2) why it matters, (3) one real example.
| Concept | What it is (theory) | Why it matters + typical tools |
|---|---|---|
| 1) Authentication vs Authorization | AuthN proves identity (“who are you?”). AuthZ enforces permissions (“what can you do?”). HTTP: 401 vs 403. | Prevents unauthorized access and defines the security model. Tools: session cookies, JWT bearer tokens, OAuth2, RBAC/ABAC policies. |
| 2) Rate limiting | Bounds requests per identity (IP/user/token) using algorithms like token bucket/leaky bucket. | Protects availability, prevents brute force, stabilizes latency. Tools: Nginx/Cloudflare rate limits, Redis counters, API gateways. |
| 3) Database indexing | Indexes are DB-managed data structures (often B-trees) that accelerate query lookup and ordering. | Faster reads, but increased write cost + storage. Don’t index everything; index based on query patterns. Tools: EXPLAIN query plans, composite indexes. |
| 4) Transactions + ACID | Transaction = atomic unit of work. ACID: Atomicity, Consistency, Isolation, Durability. | Guarantees correctness under concurrency; prevents partial updates. Tools: DB transactions, isolation levels, row locks, optimistic locking. |
| 5) Caching | Stores results to avoid recomputation (space ↔ time tradeoff). Key issues: staleness, invalidation, TTL. | Lower latency and reduced DB/origin load; risk of stale reads and stampedes. Tools: Redis, Nginx cache, CDN cache, HTTP cache (ETag/Cache-Control). |
| 6) Message queues | Producer → queue → consumer model for async work; jobs processed by workers with ack/retry semantics. | Handles heavy tasks reliably, decouples services, smooths spikes. Tools: Celery/RQ, Redis/RabbitMQ/SQS, DLQ, idempotency patterns. |
| 7) Load balancing | Distributes traffic across instances. Strategies: round-robin, least-connections, hashing, sticky sessions. | Improves availability and throughput; enables horizontal scaling. Tools: Nginx/HAProxy/Cloud LB, autoscaling, health checks. |
| 8) CAP theorem | Under network partition, choose between Consistency and Availability; Partition tolerance is required. | Guides distributed DB/service design tradeoffs (CP vs AP). Tools: consensus (Raft), eventual consistency, quorum reads/writes. |
| 9) Reverse proxy | Front door for apps: routes requests to upstreams and can terminate TLS, cache, compress, and filter traffic. | Central place for security + performance controls; improves deployability. Tools: Nginx, Envoy, Traefik (TLS, caching, rate limiting, routing). |
| 10) CDN | Distributed edge network that caches/serves content near users; reduces origin load and latency. | Faster global delivery, better burst handling; must set caching rules carefully. Tools: Cloudflare/Akamai/Fastly, cache rules, TTL, purge/invalidation. |
For each row: say one definition sentence, one tradeoff sentence, and one tool/example sentence. That’s usually enough to answer most backend interview “concept” questions cleanly.
21) Production basics: Docker Compose + Nginx reverse proxy
A common production setup puts Nginx as a reverse proxy in front of your app container. Nginx accepts client traffic, routes requests, and can add TLS termination, compression, caching, and rate limiting. Your FastAPI app runs behind it (often with Uvicorn/Gunicorn).
Client → Nginx (reverse proxy) → FastAPI (app) → DB/Redis
Docker Compose example (FastAPI + Nginx)
This Compose file runs two services: app (FastAPI) and nginx (reverse proxy).
Nginx forwards requests to the app using the Docker service name app on port 8000.
version: "3.9"
services:
app:
build: .
container_name: fastapi_app
expose:
- "8000"
environment:
- ENV=production
restart: unless-stopped
nginx:
image: nginx:1.27-alpine
container_name: nginx_proxy
ports:
- "80:80"
volumes:
- ./nginx.conf:/etc/nginx/conf.d/default.conf:ro
depends_on:
- app
restart: unless-stopped
Minimal Nginx reverse proxy config
This configuration forwards all requests to the FastAPI app. It also forwards common proxy headers so your app can read the real client IP and scheme (useful for logs, redirects, auth callbacks).
server {
listen 80;
server_name _;
location / {
proxy_pass http://app:8000;
proxy_http_version 1.1;
proxy_set_header Host $host;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_set_header X-Forwarded-Proto $scheme;
proxy_set_header X-Real-IP $remote_addr;
# reasonable timeouts for upstream
proxy_connect_timeout 5s;
proxy_read_timeout 60s;
proxy_send_timeout 60s;
}
}
Production notes (short but essential)
- Don’t run debug: use production settings and proper logging.
- Run multiple workers: for CPU-bound scaling, prefer Gunicorn with Uvicorn workers (or scale containers horizontally behind Nginx).
- Health checks: add a /health endpoint (sketched below) and configure monitoring.
- TLS/HTTPS: terminate TLS at Nginx or use a managed proxy (e.g., Cloudflare). For real production, add HTTPS.
- Secrets: never bake API keys into images; use env vars or secret managers.
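A minimal health/readiness sketch (dependencies_ok is a hypothetical check against your DB/Redis):
from fastapi import FastAPI
from fastapi.responses import JSONResponse
app = FastAPI()
@app.get("/health")
def health():
    # liveness: cheap and dependency-free; load balancers poll this
    return {"status": "ok"}
@app.get("/ready")
def ready():
    # readiness: only accept traffic when dependencies respond
    if not dependencies_ok():  # hypothetical DB/Redis ping
        return JSONResponse(status_code=503, content={"status": "degraded"})
    return {"status": "ready"}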
In real production you should serve HTTPS. A common pattern is Nginx + Let’s Encrypt (Certbot) or a managed edge proxy. Keep HTTP (80) only for redirecting to HTTPS (443).
22) Data-intensive backends (real-world architectures + technology choices)
A data-intensive backend is a system where the hard part is not CRUD — the hard part is moving + transforming + serving data reliably at scale. These systems fail in different ways: duplicates, out-of-order events, partial writes, overloaded downstreams, long tail latency, and “one bad tenant” issues.
HOT PATH (user-facing, strict latency) COLD PATH (heavy, async, reliable)
API → validate → read/cache → respond | ingest → transform → index/aggregate → publish
Strong backends keep the hot path boring and predictable, and push heavy work to the cold path.
22.1 Real examples of “data-intensive” systems
- Analytics/event tracking: clickstream → Kafka → warehouse → dashboards
- Media/OCR pipelines: upload → queue → OCR/ETL → searchable index
- Search/recommendation: ingest content → compute features → serve ranked results
- Payments/orders: state machines + idempotency + auditability
- IoT/telemetry: high-frequency writes + aggregation + downsampling
22.2 Reference architectures
A) Queue-based “job pipeline” (most common prototype)
Client
→ API (FastAPI)
→ Postgres (metadata + job state)
→ Object storage (S3/MinIO/local) for large payloads/files
→ Queue (Redis/RabbitMQ/Kafka)
→ Workers (Celery/RQ/Arq) for heavy processing
→ Cache (Redis) for hot reads + rate limit
B) Streaming/event-driven pipeline (Kafka-style)
Producers → Kafka topics → stream processors (Flink/Spark/ksqlDB)
→ sinks (ClickHouse/BigQuery/Postgres/Elastic)
→ API reads optimized stores
Most teams don’t need Kafka on day 1. Start with queue + workers. Add streaming only when you truly need: huge throughput, event ordering/partitioning, or many downstream consumers.
22.3 Technology choices: what goes where (practical mapping)
- Postgres/MySQL: metadata, transactions, job states, permissions, audit logs
- S3/MinIO/local FS: big blobs (PDFs, images, exports, embeddings files)
- Redis: cache, rate limit, locks, queues (small pipelines)
- ClickHouse / BigQuery: analytics queries, aggregations, time-series at scale
- Elasticsearch/OpenSearch: full-text search + filters
22.4 Data modeling for pipelines (the part people miss)
- Immutable events are easier than mutable state. Store “what happened”, derive views later.
- Version everything: doc_version, schema_version, pipeline_version.
- Separate raw vs derived: raw input (object store) vs derived artifacts (DB/index/warehouse).
- Explicit job states: queued → running → success/failed (+ retry_count + last_error; see the sketch after this list)
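A sketch of these conventions in plain Python (swap the dataclass for a SQLAlchemy model in a real service):
import enum
from dataclasses import dataclass
class JobState(str, enum.Enum):
    QUEUED = "queued"
    RUNNING = "running"
    SUCCESS = "success"
    FAILED = "failed"
@dataclass
class Job:
    job_id: str
    doc_id: str
    schema_version: int = 1           # version everything
    pipeline_version: str = "v1"
    state: JobState = JobState.QUEUED
    retry_count: int = 0
    last_error: str | None = None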
22.5 Reliability patterns (real production painkillers)
1) Idempotency (avoid duplicates under retry)
Networks fail. Clients retry. Workers retry. Without idempotency, you will duplicate jobs and corrupt derived data.
Idempotency key examples:
- upload_id
- tenant_id + file_sha256
- order_id + operation_type
- doc_id + version + chunk_index
2) Retry policy (transient vs permanent)
- Retry: timeouts, 5xx, connection resets
- Fail fast: invalid input, forbidden access, schema mismatch
- DLQ: after N retries → dead letter queue for manual inspection
3) Backpressure (systems die without it)
- Queue + 202: accept request, return job_id, process async
- 429 / rate limit: protect DB, workers, and external APIs
- Load shedding: degrade features (“no rerank”, “no export”) under overload
4) Outbox pattern (don’t lose events)
If you write to Postgres and also publish to a queue, you can lose one of them on crash. Outbox stores the message in the same DB transaction, and a dispatcher publishes later.
Transaction:
INSERT job row
INSERT outbox row (event to publish)
Commit
Dispatcher reads outbox → publishes → marks delivered
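A sketch of the outbox write and dispatch loop, assuming a SQLAlchemy session and hypothetical jobs/outbox tables:
import json
from sqlalchemy import text
def create_job_with_outbox(session, job_id: str) -> None:
    # one transaction: job row and outbox row commit (or roll back) together
    session.execute(
        text("INSERT INTO jobs (id, status) VALUES (:id, 'queued')"),
        {"id": job_id},
    )
    session.execute(
        text("INSERT INTO outbox (id, event) VALUES (:id, :event)"),
        {"id": job_id,
         "event": json.dumps({"type": "job.created", "job_id": job_id})},
    )
    session.commit()
def dispatch_outbox(session, publish) -> None:
    # separate loop: publish undelivered events, then mark them delivered
    rows = session.execute(
        text("SELECT id, event FROM outbox WHERE delivered = false LIMIT 100")
    ).fetchall()
    for row in rows:
        publish(row.event)  # e.g., push to Redis/RabbitMQ/Kafka
        session.execute(
            text("UPDATE outbox SET delivered = true WHERE id = :id"),
            {"id": row.id},
        )
    session.commit()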
22.6 Performance engineering: where time really goes
- Tail latency (p95/p99) matters more than average
- Batch I/O: fewer round-trips beats micro-optimizing Python
- Connection pooling: DB pools & HTTP client pools are critical (see the sketch below)
- Use async for I/O waits, not for CPU-heavy work
- Cache what’s safe: hot reads, precomputed views, aggregated results
If a request is waiting on DB/network for ~30 ms, a ~3 GHz CPU core could execute on the order of 100 million cycles in that time. Concurrency wins by not wasting waiting time, not by “making Python faster”.
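The pooling point in code, assuming SQLAlchemy (pool sizes are illustrative):
from sqlalchemy import create_engine
# one engine per process: it owns the connection pool;
# never create an engine (or a raw connection) per request
engine = create_engine(
    "postgresql+psycopg2://user:pass@db:5432/app",
    pool_size=10,        # steady-state connections
    max_overflow=5,      # temporary burst headroom
    pool_timeout=5,      # fail fast instead of queueing forever
    pool_pre_ping=True,  # transparently drop dead connections
)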
22.7 A concrete prototype: “Document Processing Service” (real backend model)
This is a strong interview demo because it includes: file upload, object storage, job queue, worker processing, and polling/streaming status.
Endpoints:
- POST /documents → upload metadata + get presigned URL (or direct upload)
- POST /documents/{id}/ingest → enqueue processing job (returns 202 + job_id)
- GET /jobs/{job_id} → status: queued/running/success/failed
- GET /documents/{id} → returns derived outputs (text, index status, etc.)
FastAPI sketch (enqueue + status)
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel
import uuid
import time
app = FastAPI()
# Pretend stores (replace with Postgres + Redis queue in real code)
JOBS = {}
DOCS = {}
class IngestReq(BaseModel):
tenant_id: str
object_key: str # path in S3/MinIO/local
idempotency_key: str
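# Note: this sketch merges document creation and ingestion into one
# endpoint; the endpoint list above splits them into two calls.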
@app.post("/documents/ingest", status_code=202)
def ingest(req: IngestReq):
# Idempotency: return existing job if same key was already used
for job_id, job in JOBS.items():
if job["idempotency_key"] == req.idempotency_key:
return {"job_id": job_id, "status": job["status"]}
doc_id = str(uuid.uuid4())
job_id = str(uuid.uuid4())
DOCS[doc_id] = {"tenant_id": req.tenant_id, "object_key": req.object_key}
JOBS[job_id] = {
"doc_id": doc_id,
"tenant_id": req.tenant_id,
"status": "queued",
"created_at": time.time(),
"idempotency_key": req.idempotency_key,
"retry_count": 0,
"last_error": None,
}
# In real system: publish job_id into Redis/RabbitMQ/Kafka
return {"job_id": job_id, "doc_id": doc_id, "status": "queued"}
@app.get("/jobs/{job_id}")
def job_status(job_id: str):
job = JOBS.get(job_id)
if not job:
raise HTTPException(404, "job not found")
return job
Production upgrade: Postgres for JOBS/DOCS, Redis/RabbitMQ for queue, Celery/RQ workers to process, S3/MinIO for file storage, and structured logs + metrics for visibility.
22.8 Observability (what you log/measure in real data pipelines)
- Request: request_id, tenant_id, endpoint, status, duration_ms
- Queue: queue depth, enqueue rate, worker concurrency, retry counts
- Jobs: success rate, p95 processing time, failure reasons, DLQ size
- DB: slow query logs, connection pool saturation, locks
- Cost: external API calls per tenant/day (if any)
“I design a hot path with predictable latency and a cold path with queues/workers. I use idempotency + retries + DLQ + backpressure to survive failures, store raw vs derived separately (object store + DB/index), and I measure p95/p99 plus queue depth to keep the system stable under load.”
Final checklist: backend maturity
When you build any feature, ask:
- What is the resource + contract?
- What validation and invariants must hold?
- What authn/authz rules apply?
- Where is truth stored (DB)?
- How will it scale (stateless + cache + queue)?
- How is it tested and deployed?
- How do I observe it in production?