
Latency

Also known as: lag, response time, round-trip time

Latency is the time delay between initiating an action and seeing its effect. In software, it usually refers to the time between sending a request (like clicking a link or calling an API) and receiving the first byte of the response. It’s measured in milliseconds (ms).

Lower latency means faster response. A web page with 50ms latency feels instant. One with 3,000ms (3 seconds) feels slow.
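The measurement itself is simple: record a clock before the action, record it again when the response arrives, and take the difference. A minimal sketch (the `timed_ms` helper and the use of `time.sleep` as a stand-in for a slow request are illustrative, not from any particular library):

```python
import time

def timed_ms(fn, *args):
    """Run fn and return (result, elapsed time in milliseconds)."""
    start = time.perf_counter()  # monotonic clock, suited to interval timing
    result = fn(*args)
    elapsed_ms = (time.perf_counter() - start) * 1000
    return result, elapsed_ms

# time.sleep here stands in for a slow request or query
_, latency_ms = timed_ms(time.sleep, 0.05)
print(f"{latency_ms:.0f} ms")  # roughly 50 ms
```

`time.perf_counter()` is preferred over wall-clock time for this because it is monotonic and cannot jump backward mid-measurement.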

You’ll hear this when…

“Latency is high” or “we’re seeing latency spikes” means the system is responding slowly. Latency appears in performance monitoring, infrastructure planning, and user experience discussions.

Common latency contexts:

  • Network latency: The time for data to travel between two points. Physical distance matters — a server in the same city responds faster than one on another continent.
  • Database latency: The time for a database query to return results.
  • API latency: The total time for an API call to complete, including network transit and server processing.
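These contexts nest: an API call's latency is roughly the sum of network transit in both directions plus everything the server does in between. A back-of-envelope sketch with hypothetical numbers:

```python
# Rough decomposition of one API call (all numbers hypothetical)
network_out_ms = 20   # request travels to the server
server_ms = 45        # queueing + processing + database query
network_back_ms = 20  # response travels back
api_latency_ms = network_out_ms + server_ms + network_back_ms  # 85 ms total
```

This is why moving a server closer to users (cutting the two network terms) can speed up an API even when the server code is unchanged.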

The P99 latency (99th percentile) is a frequently cited metric — it tells you what the slowest 1% of requests experience. A P99 of 500ms means 99% of requests complete in under 500ms, but 1% take longer.
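Computing a percentile from recorded samples can be sketched with the nearest-rank method (the `percentile` helper below is illustrative; monitoring systems each have their own estimators):

```python
import math

def percentile(samples_ms, p):
    """Nearest-rank percentile: the value at or below which p% of samples fall."""
    ranked = sorted(samples_ms)
    k = math.ceil(p / 100 * len(ranked)) - 1
    return ranked[max(k, 0)]

# 99 fast requests and 1 slow outlier (hypothetical numbers)
latencies = [100] * 99 + [800]
p99 = percentile(latencies, 99)  # 100 — 99% of requests finish at or under 100 ms
```

Note how the single 800 ms outlier is invisible at P99 but would dominate P100 (the maximum), which is why teams often track several percentiles at once.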

Latency vs. throughput

Latency is how long a single request takes. Throughput is how many requests a system can handle per second. A system can have low latency but low throughput (handles each request quickly, but can’t handle many at once) or vice versa. Both matter, but they measure different things.
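The relationship can be sketched with a simple capacity model (a simplified form of Little's law; the numbers are hypothetical):

```python
# Each request occupies one worker for its full latency
latency_s = 0.1  # 100 ms per request
workers = 1
throughput = workers / latency_s       # 10 requests/second

# Adding workers raises throughput without changing per-request latency
throughput_8 = 8 / latency_s           # 80 requests/second
```

This is why scaling out improves throughput but does nothing for the latency an individual user feels; only making each request faster lowers latency.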

Source: Google — Site Reliability Engineering