Introduction to Response Time in APIs

Modern applications thrive on data. They process, analyze, and present information to users in meaningful ways.  Dynamic applications, in particular, rely on constantly updating data.  Servers act as repositories for this ever-changing information, serving it to clients (web browsers, mobile apps, other services) upon request.  In the world of web-based APIs, understanding response time is paramount.

As API designers, we must set realistic Service Level Agreements (SLAs), considering both technological constraints and budget limitations.

Example:  For real-time voice communication over the internet, excessive latency can significantly degrade the user experience.  If a one-way delay exceeds 100 milliseconds, noticeable delays and disruptions can occur.   Therefore, API designers need to establish target latency thresholds and carefully design the system to meet those goals.
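A latency target like this can be encoded as a simple budget check. The sketch below is illustrative: the 100 ms threshold comes from the voice example above, and the function name is our own.

```python
# Sketch: checking a measured one-way delay against a latency SLA target.
# The 100 ms threshold follows the voice-communication example above.

VOICE_ONE_WAY_BUDGET_MS = 100

def within_latency_budget(measured_delay_ms: float,
                          budget_ms: float = VOICE_ONE_WAY_BUDGET_MS) -> bool:
    """Return True if the measured one-way delay meets the SLA target."""
    return measured_delay_ms <= budget_ms

print(within_latency_budget(80))   # meets the 100 ms voice budget
print(within_latency_budget(140))  # exceeds it
```

In practice such a check would run against percentile measurements (e.g., p95 or p99 delay), not a single sample.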

Years of innovation in web services have set a high bar for user expectations.  Slow or unresponsive applications are quickly abandoned, making performance a key factor in user satisfaction. To meet these expectations, we must ask:

  • How quickly does our API process requests and send back responses?
  • How does the volume of requests impact API performance?

Different APIs, depending on their functionality, exhibit varying levels of latency.  Some operations involve accessing different types of memory, each with its own performance characteristics.  To get a sense of these differences, refer to this table of typical latency numbers:

| Operation | Typical Latency |
| --- | --- |
| CPU Register Access | 0.4 ns |
| L1 Cache Access | 1 ns |
| L2 Cache Access | 4 ns |
| L3 Cache Access | 10 ns |
| Main Memory Access | 100 ns |
| NVMe SSD Read | 20 μs |
| SSD Write | 50 μs |
| HDD Disk Seek | 2-5 ms |
| Database Query | 10-100 ms |
| Network: Same Region | 0.5-2 ms |
| Network: Cross-US | 50-100 ms |
| Network: Cross-Continental | 100-200 ms |
| Network: Satellite | 500-700 ms |
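To build intuition for the scale differences in the table, the sketch below expresses a few of these typical figures as multiples of an L1 cache access. The values are the same illustrative numbers as above, not measurements.

```python
# Illustrative latency figures from the table above, in nanoseconds.
LATENCY_NS = {
    "L1 cache access": 1,
    "main memory access": 100,
    "NVMe SSD read": 20_000,                     # 20 microseconds
    "same-region network hop": 500_000,          # 0.5 milliseconds
    "cross-US network round trip": 50_000_000,   # 50 milliseconds
}

# Express each operation as a multiple of an L1 cache access.
baseline = LATENCY_NS["L1 cache access"]
for operation, ns in LATENCY_NS.items():
    print(f"{operation}: {ns / baseline:,.0f}x L1 cache access")
```

A single cross-US round trip costs as much as tens of millions of cache accesses, which is why network hops, not CPU work, usually dominate API response time.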

Key Point:  A region refers to a broad geographical area (e.g., North America, Europe). A zone is a more isolated location within a region (e.g., a specific data center cluster).  A data center is a physical facility containing servers and network infrastructure.  Understanding these distinctions is crucial when analyzing latency, as network distances play a significant role.

Latency vs. Response Time: Two Sides of the Same Coin

The time it takes for a client to send a request and receive a response from a server is a critical measure of API performance.  This time can be broken down into two components:

  • Latency (Network Latency):  This is the time it takes for a message (request or response) to travel across the network from client to server, excluding any processing time on the server.
  • Processing Time:  The time the server spends handling the request, including database queries, computations, file operations, and any other necessary processing.

API Response Time, the total time from request initiation to response reception, encompasses both network latency and server processing time.

```mermaid
sequenceDiagram
    participant Client
    participant Network
    participant Server

    Note over Client,Server: Total Response Time

    Client->>Network: Request Start
    Note over Network: Network Latency
    Network->>Server: Request Arrives

    Note over Server: Processing Time

    Server->>Network: Response Start
    Note over Network: Network Latency
    Network->>Client: Response Arrives

    Note over Client,Network: Latency Component
    Note over Server: Processing Component
```
We can express this relationship as:

Response Time = Latency + Processing Time

To optimize response time, we must address both latency and processing time.  Latency is influenced by factors like distance between client and server, network congestion, and the presence of intermediary components like caches or proxy servers.  Ping measurements often provide a rough estimate of network latency.
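Putting the formula into code: if a ping gives a rough estimate of the latency component, server processing time can be approximated by subtraction. A minimal sketch, with illustrative names and numbers:

```python
def estimated_processing_time_ms(total_response_ms: float,
                                 ping_rtt_ms: float) -> float:
    """Approximate server processing time from the relationship
    Response Time = Latency + Processing Time, using a ping round trip
    as a rough stand-in for the network latency component."""
    processing = total_response_ms - ping_rtt_ms
    return max(processing, 0.0)  # a noisy ping can exceed the total

# Example: a 250 ms response with a 40 ms ping suggests ~210 ms of processing.
print(estimated_processing_time_ms(250, 40))
```

This is only a first-order estimate: ping (ICMP) can be routed or prioritized differently from application traffic, so real measurements should come from instrumentation on both client and server.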

Key Point:  Acceptable response times vary depending on the application.  Generally, response times under one second are considered good.  However, for real-time applications like online gaming or video conferencing, much lower latencies are required for a seamless user experience.

Factors Affecting Response Time: A Deeper Look

Numerous factors contribute to an API’s response time.  The table at the beginning of this piece provides some insights into the time taken for various operations on the server side.  Let’s examine other factors that influence the journey of a request from client to server and back:

```mermaid
sequenceDiagram
    participant Client
    participant DNS
    participant LoadBalancer
    participant Server
    participant Database

    Note over Client,Database: Request Journey
    Client->>DNS: DNS Lookup
    DNS-->>Client: IP Resolution

    Client->>LoadBalancer: TCP Handshake
    LoadBalancer-->>Client: Connection Established

    Client->>LoadBalancer: SSL/TLS Handshake
    LoadBalancer-->>Client: Secure Connection

    Client->>LoadBalancer: HTTP Request
    LoadBalancer->>Server: Forward Request

    Server->>Database: Data Retrieval
    Database-->>Server: Return Data

    Server->>LoadBalancer: Process & Format Response
    LoadBalancer-->>Client: HTTP Response

    Note over Client,Database: Total Response Time = Sum of all operations
```

  1. DNS Lookup: If the client doesn’t already know the server’s IP address, it needs to perform a DNS lookup, querying a DNS server to resolve the domain name (e.g., api.example.com) into an IP address.
  2. TCP Handshake: Before data can be exchanged, a TCP connection must be established between client and server using a three-way handshake.
  3. SSL/TLS Handshake: If the communication is over HTTPS, an additional SSL/TLS handshake is required to establish a secure, encrypted channel.
  4. Request Transmission: Once the connection is established, the client sends its HTTP request to the server.
  5. Server Processing: The server receives the request and processes it, potentially involving database queries, computations, or file operations.
  6. Data Retrieval (If Applicable): If the request involves retrieving data, the server fetches it from the database or other storage.
  7. Response Transmission: The server sends the HTTP response back to the client, containing the requested data or a status code indicating the outcome of the request.
  8. Data Download: The client receives and processes the response data.

Key Point:  The time taken for DNS lookup, TCP handshake, and SSL/TLS handshake is often referred to as the base time.  This base time, combined with the round-trip time for the request and response, contributes to the overall latency.

We can represent this as:

Latency = Base Time + Round-Trip Time (Request/Response) + Data Download Time
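This decomposition translates directly into code. In the sketch below, the stage timings are illustrative placeholders for values you would measure in practice (for example, from browser navigation timing or a profiling tool):

```python
# Illustrative stage timings in milliseconds (placeholder values).
dns_lookup_ms = 20
tcp_handshake_ms = 30
tls_handshake_ms = 40
round_trip_ms = 60      # request out + response headers back
data_download_ms = 25

# Base time covers connection setup: DNS + TCP + TLS handshakes.
base_time_ms = dns_lookup_ms + tcp_handshake_ms + tls_handshake_ms

# Latency = Base Time + Round-Trip Time + Data Download Time
latency_ms = base_time_ms + round_trip_ms + data_download_ms
print(f"base time: {base_time_ms} ms, total latency: {latency_ms} ms")
```

Note that the base time is paid mostly on the first request: with connection reuse (HTTP keep-alive) and DNS caching, subsequent requests skip the DNS lookup and handshakes, which is one of the cheapest latency optimizations available.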

Think About It: Imagine a scenario where a client is accessing an API hosted in a different geographical region. Would you expect the response time to be the same for clients in various locations? Why or why not?