Modern applications thrive on data. They process, analyze, and present information to users in meaningful ways. Dynamic applications, in particular, rely on constantly updating data. Servers act as repositories for this ever-changing information, serving it to clients (web browsers, mobile apps, other services) upon request. In the world of web-based APIs, understanding response time is paramount.
As API designers, we must set realistic Service Level Agreements (SLAs), considering both technological constraints and budget limitations.
Example: For real-time voice communication over the internet, excessive latency can significantly degrade the user experience. If a one-way delay exceeds 100 milliseconds, noticeable delays and disruptions can occur. Therefore, API designers need to establish target latency thresholds and carefully design the system to meet those goals.
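A latency budget like this can be enforced in monitoring code. The sketch below flags samples that break a one-way delay budget; the 100 ms threshold comes from the example above, while the function name and sample values are illustrative:

```python
# Illustrative SLA check: flag latency samples that exceed a one-way budget.
# The threshold matches the voice example above; the samples are hypothetical.
SLA_ONE_WAY_MS = 100

def violations(samples_ms):
    """Return the latency samples (in ms) that break the SLA budget."""
    return [s for s in samples_ms if s > SLA_ONE_WAY_MS]

measured = [42, 87, 130, 95, 210]  # hypothetical one-way delays in ms
print(violations(measured))        # [130, 210]
```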
Years of innovation in web services have set a high bar for user expectations. Slow or unresponsive applications are quickly abandoned, making performance a key factor in user satisfaction. To meet these expectations, we must ask:
- How quickly does our API process requests and send back responses?
- How does the volume of requests impact API performance?
Different APIs exhibit different latency profiles depending on what they do. Each operation touches a different tier of the memory, storage, and network hierarchy, and each tier has its own performance characteristics. The following table of typical latency numbers gives a sense of the differences:
| Operation | Typical Latency |
| --- | --- |
| CPU Register Access | 0.4 ns |
| L1 Cache Access | 1 ns |
| L2 Cache Access | 4 ns |
| L3 Cache Access | 10 ns |
| Main Memory Access | 100 ns |
| NVMe SSD Read | 20 μs |
| SSD Write | 50 μs |
| HDD Disk Seek | 2-5 ms |
| Database Query | 10-100 ms |
| Network: Same Region | 0.5-2 ms |
| Network: Cross-US | 50-100 ms |
| Network: Cross-Continental | 100-200 ms |
| Network: Satellite | 500-700 ms |
Key Point: A region refers to a broad geographical area (e.g., North America, Europe). A zone is a more isolated location within a region (e.g., a specific data center cluster). A data center is a physical facility containing servers and network infrastructure. Understanding these distinctions is crucial when analyzing latency, as network distances play a significant role.
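One way to build intuition for the table above is to compare each operation against the fastest ones. This sketch scales a few of the table's entries against an L1 cache access; the midpoints chosen for the ranged entries (disk seek, cross-US network) are illustrative:

```python
# A few typical latencies from the table above, in nanoseconds.
# Ranged entries use an illustrative midpoint.
LATENCY_NS = {
    "CPU register access": 0.4,
    "L1 cache access": 1,
    "Main memory access": 100,
    "NVMe SSD read": 20_000,          # 20 us
    "HDD disk seek": 3_500_000,       # midpoint of the 2-5 ms range
    "Network, cross-US": 75_000_000,  # midpoint of the 50-100 ms range
}

baseline = LATENCY_NS["L1 cache access"]
for op, ns in LATENCY_NS.items():
    # How many L1 cache accesses fit in one of these operations?
    print(f"{op}: {ns / baseline:,.1f}x an L1 cache access")
```

The gap is striking: a single cross-US round trip costs as much as tens of millions of cache accesses, which is why network distance dominates most API latency budgets.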
Latency vs. Response Time: Two Sides of the Same Coin
The time it takes for a client to send a request and receive a response from a server is a critical measure of API performance. This time can be broken down into two components:
- Latency (Network Latency): This is the time it takes for a message (request or response) to travel across the network from client to server, excluding any processing time on the server.
- Processing Time: The time the server spends handling the request, including database queries, computations, file operations, and any other necessary processing.
API Response Time, the total time from request initiation to response reception, encompasses both network latency and server processing time.
```mermaid
sequenceDiagram
    participant Client
    participant Network
    participant Server
    Note over Client,Server: Total Response Time
    Client->>Network: Request Start
    Note over Network: Network Latency
    Network->>Server: Request Arrives
    Note over Server: Processing Time
    Server->>Network: Response Start
    Note over Network: Network Latency
    Network->>Client: Response Arrives
    Note over Client,Network: Latency Component
    Note over Server: Processing Component
```
We can express this relationship as:
Response Time = Latency + Processing Time
To optimize response time, we must address both latency and processing time. Latency is influenced by factors like distance between client and server, network congestion, and the presence of intermediary components like caches or proxy servers. Ping measurements often provide a rough estimate of network latency.
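From the client's side, total response time is straightforward to measure with a timer around an HTTP call. This minimal sketch captures latency plus processing time together; the endpoint URL shown in the comment is a placeholder:

```python
import time
import urllib.request

def timed_get(url):
    """Measure total response time (network latency + server processing)
    for a single GET request. Returns (elapsed_ms, body_size)."""
    start = time.perf_counter()
    with urllib.request.urlopen(url, timeout=10) as resp:
        body = resp.read()  # reading the body includes data download time
    elapsed_ms = (time.perf_counter() - start) * 1000
    return elapsed_ms, len(body)

# Example with a placeholder endpoint:
# ms, size = timed_get("https://api.example.com/health")
# print(f"{ms:.1f} ms for {size} bytes")
```

Note that a single sample conflates the two components; separating latency from processing time requires server-side instrumentation or a baseline ping measurement as described above.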
Key Point: Acceptable response times vary depending on the application. Generally, response times under one second are considered good. However, for real-time applications like online gaming or video conferencing, much lower latencies are required for a seamless user experience.
Factors Affecting Response Time: A Deeper Look
Numerous factors contribute to an API’s response time. The table at the beginning of this piece provides some insights into the time taken for various operations on the server side. Let’s examine other factors that influence the journey of a request from client to server and back:
```mermaid
sequenceDiagram
    participant Client
    participant DNS
    participant LoadBalancer
    participant Server
    participant Database
    Note over Client,Database: Request Journey
    Client->>DNS: DNS Lookup
    DNS-->>Client: IP Resolution
    Client->>LoadBalancer: TCP Handshake
    LoadBalancer-->>Client: Connection Established
    Client->>LoadBalancer: SSL/TLS Handshake
    LoadBalancer-->>Client: Secure Connection
    Client->>LoadBalancer: HTTP Request
    LoadBalancer->>Server: Forward Request
    Server->>Database: Data Retrieval
    Database-->>Server: Return Data
    Server->>LoadBalancer: Process & Format Response
    LoadBalancer-->>Client: HTTP Response
    Note over Client,Database: Total Response Time = Sum of all operations
```
- DNS Lookup: If the client doesn’t already know the server’s IP address, it needs to perform a DNS lookup, querying a DNS server to resolve the domain name (e.g., api.example.com) into an IP address.
- TCP Handshake: Before data can be exchanged, a TCP connection must be established between client and server using a three-way handshake.
- SSL/TLS Handshake: If the communication is over HTTPS, an additional SSL/TLS handshake is required to establish a secure, encrypted channel.
- Request Transmission: Once the connection is established, the client sends its HTTP request to the server.
- Server Processing: The server receives the request and processes it, potentially involving database queries, computations, or file operations.
- Data Retrieval (If Applicable): If the request involves retrieving data, the server fetches it from the database or other storage.
- Response Transmission: The server sends the HTTP response back to the client, containing the requested data or a status code indicating the outcome of the request.
- Data Download: The client receives and processes the response data.
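The first three steps above can be timed individually from the client using the standard library. This sketch separates the DNS lookup, TCP handshake, and TLS handshake; the hostname in the usage comment is a placeholder:

```python
import socket
import ssl
import time

def connection_phases(host, port=443, use_tls=True):
    """Time the DNS lookup, TCP handshake, and (optionally) TLS handshake
    for a new connection. Returns each phase's duration in milliseconds."""
    t0 = time.perf_counter()
    ip = socket.getaddrinfo(host, port)[0][4][0]             # DNS lookup
    t1 = time.perf_counter()
    sock = socket.create_connection((ip, port), timeout=10)  # TCP three-way handshake
    t2 = time.perf_counter()
    if use_tls:
        ctx = ssl.create_default_context()
        sock = ctx.wrap_socket(sock, server_hostname=host)   # SSL/TLS handshake
    t3 = time.perf_counter()
    sock.close()
    return {
        "dns_ms": (t1 - t0) * 1000,
        "tcp_ms": (t2 - t1) * 1000,
        "tls_ms": (t3 - t2) * 1000 if use_tls else 0.0,
    }

# Example with a placeholder host:
# print(connection_phases("api.example.com"))
```

Keep-alive connections and DNS caching exist precisely to amortize these setup phases across many requests rather than paying them every time.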
Key Point: The time taken for DNS lookup, TCP handshake, and SSL/TLS handshake is often referred to as the base time. This base time, combined with the round-trip time for the request and response, contributes to the overall latency.
We can represent this as:
Latency = Base Time + Round-Trip Time (Request/Response) + Data Download Time
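Plugging hypothetical numbers into this formula makes the breakdown concrete. All component timings below are made up for illustration:

```python
# Hypothetical component timings (in ms) plugged into the formula above.
base_time = 20 + 30 + 40  # DNS lookup + TCP handshake + SSL/TLS handshake
round_trip = 2 * 35       # request out + response back, ~35 ms each way
data_download = 15        # time to download the response body

latency = base_time + round_trip + data_download
print(latency)  # 175
```

In this example the connection setup (base time) alone accounts for more than half the total, which is why connection reuse is one of the highest-leverage latency optimizations.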
Think About It: Imagine a scenario where a client is accessing an API hosted in a different geographical region. Would you expect the response time to be the same for clients in various locations? Why or why not?