Peak EWMA Load Balancer
This load balancer should be configured with the type URL
type.googleapis.com/envoy.extensions.load_balancing_policies.peak_ewma.v3alpha.PeakEwma.
Note
Peak EWMA is a contrib extension that must be explicitly enabled at Envoy build time. See Contrib builds for details.
The Peak EWMA (Exponentially Weighted Moving Average) load balancer implements a latency- and active-requests-aware variant of the Power of Two Choices (P2C) algorithm. It automatically routes traffic to the best-performing hosts based on real-time latency measurements and outstanding active requests.
Peak EWMA takes more signals into account than Envoy’s other load balancing algorithms: both measured request latency and the number of outstanding requests per host. When a host slows down, its cost rises and Peak EWMA shifts traffic away from it.
Peak EWMA is also well-suited for cross-data-center routing: it naturally prefers upstream hosts in the closest data center, but seamlessly fails over to other data centers during slowdowns (and fails back when performance recovers).
In scenarios where all upstream hosts have similar request latency, Peak EWMA behaves equivalently to equal-weighted least request load balancing (using P2C selection).
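To make the last two behaviors concrete, the sketch below applies the cost function given under Algorithm Overview, RTT_peak_ewma * (active_requests + 1), to a few made-up hosts. The host names, latencies, request counts, and the `cost` helper are illustrative assumptions and are not part of Envoy.

```python
def cost(rtt_ms: float, active_requests: int) -> float:
    """Cost from Algorithm Overview: RTT_peak_ewma * (active_requests + 1)."""
    return rtt_ms * (active_requests + 1)

# Equal latency: ordering hosts by cost is the same as ordering them by
# active requests, which is the least-request behavior.
equal_latency = {"a": (5.0, 3), "b": (5.0, 1), "c": (5.0, 7)}
by_cost = sorted(equal_latency, key=lambda h: cost(*equal_latency[h]))
by_active = sorted(equal_latency, key=lambda h: equal_latency[h][1])

# Cross-data-center: a loaded local host (2 ms) is still cheaper than an idle
# remote host (40 ms) until a slowdown pushes the local cost above the remote one.
local_normal = cost(2.0, 4)   # 10.0
remote_idle = cost(40.0, 0)   # 40.0
local_slow = cost(50.0, 4)    # 250.0
```

In this example the remote host only becomes the cheaper choice once the local host's latency has degraded, and it loses that advantage again when the local latency recovers.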
Note
Peak EWMA requires both the load balancing policy AND the HTTP filter to function properly.
The HTTP filter (envoy.filters.http.peak_ewma) measures request RTT and provides timing
data to the load balancer. Without the HTTP filter, the load balancer cannot collect
latency measurements.
Note: This requirement may change if Peak EWMA becomes a core Envoy load balancing algorithm in the future, which would require core changes to integrate RTT measurement functionality.
Important
Peak EWMA considers latency and load when making routing decisions. It does not handle unhealthy hosts or error responses directly. This is especially critical because upstream hosts that fast-fail (return errors quickly) may appear to have low latency, causing Peak EWMA to send them a greater proportion of traffic — exactly the opposite of what you want.
Always configure Envoy’s health checking and outlier detection to automatically remove failing hosts from the load balancing pool before Peak EWMA makes routing decisions.
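A small numeric illustration of the fast-fail problem, using the cost function from Algorithm Overview. The latencies and request counts are invented for the example.

```python
def cost(rtt_ms: float, active_requests: int) -> float:
    """Cost from Algorithm Overview: RTT_peak_ewma * (active_requests + 1)."""
    return rtt_ms * (active_requests + 1)

# A healthy host that does real work in about 20 ms.
healthy_cost = cost(rtt_ms=20.0, active_requests=2)    # 60.0

# A broken host that returns errors in about 1 ms. Peak EWMA does not look at
# the response status, so only the short RTT is visible to it.
fast_fail_cost = cost(rtt_ms=1.0, active_requests=2)   # 3.0

# The lower cost wins the comparison, so the broken host attracts traffic.
preferred = "fast_fail" if fast_fail_cost < healthy_cost else "healthy"
```

With outlier detection configured, the failing host would be ejected on its error responses and would never reach this comparison.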
Algorithm Overview
Peak EWMA uses the cost function: Cost = RTT_peak_ewma * (active_requests + 1)
Key characteristics:
- Latency-sensitive: Automatically de-prioritizes slow hosts
- Load-aware: Considers both latency and current request count
- O(1) complexity: Efficient P2C selection scales to large clusters
- Adaptive: No manual tuning required; responds to performance changes
- Health-agnostic: Operates only on healthy hosts as determined by health checking and outlier detection
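The cost function and P2C selection described above can be sketched as follows. This is a minimal illustration, not the extension's implementation: the `Host` class, its field names, and `pick_host` are hypothetical, and the host list is assumed to be non-empty and already filtered to healthy hosts.

```python
import random
from dataclasses import dataclass

@dataclass
class Host:
    name: str
    rtt_peak_ewma_ms: float
    active_requests: int

    def cost(self) -> float:
        # Cost = RTT_peak_ewma * (active_requests + 1)
        return self.rtt_peak_ewma_ms * (self.active_requests + 1)

def pick_host(hosts: list[Host], rng: random.Random) -> Host:
    """Power of Two Choices: sample two distinct hosts, return the cheaper one."""
    if len(hosts) == 1:
        return hosts[0]
    first, second = rng.sample(hosts, 2)
    return first if first.cost() <= second.cost() else second

hosts = [
    Host("fast", rtt_peak_ewma_ms=5.0, active_requests=0),
    Host("busy", rtt_peak_ewma_ms=5.0, active_requests=3),
    Host("slow", rtt_peak_ewma_ms=50.0, active_requests=0),
]
chosen = pick_host(hosts, random.Random())
```

Each selection compares only two hosts, so the work per request does not grow with the size of the cluster. In this three-host example the slow host is always paired with a cheaper one and therefore is never chosen.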
Integration with Health Management
Peak EWMA works in conjunction with Envoy’s health management systems:
- Health Checking: Only hosts that pass active health checks are considered for load balancing
- Outlier Detection: Hosts ejected by outlier detection are automatically excluded from selection
- Error Handling: HTTP error responses (4xx/5xx) do not directly affect Peak EWMA routing decisions
For comprehensive host health management, configure Peak EWMA alongside:
cluster:
  # Health checking removes unresponsive hosts
  health_checks:
  - timeout: 5s
    interval: 10s
    http_health_check:
      path: "/health"
  # Outlier detection removes hosts with high error rates
  outlier_detection:
    consecutive_5xx: 3
    interval: 30s
    base_ejection_time: 30s
  # Peak EWMA optimizes among remaining healthy hosts
  load_balancing_policy:
    policies:
    - typed_extension_config:
        name: envoy.load_balancing_policies.peak_ewma
        typed_config:
          "@type": type.googleapis.com/envoy.extensions.load_balancing_policies.peak_ewma.v3alpha.PeakEwma
          decay_time: 10s
# HTTP filter configuration (in the HTTP connection manager's filter list) - required for RTT measurement
http_filters:
- name: envoy.filters.http.peak_ewma
  typed_config:
    "@type": type.googleapis.com/envoy.extensions.filters.http.peak_ewma.v3alpha.PeakEwmaConfig
Configuration Parameters
Peak EWMA supports the following configuration parameters:
- decay_time (google.protobuf.Duration, default: 10s): The time window over which latency observations decay to half their original weight. Shorter values adapt faster to performance changes; longer values provide more stability.
- aggregation_interval (google.protobuf.Duration, default: 100ms): Frequency of EWMA data aggregation from worker threads. Lower values provide fresher data but increase CPU overhead.
- max_samples_per_host (google.protobuf.UInt32Value, default: 1,000): Ring buffer size per host per worker thread for RTT samples. Larger values handle traffic bursts better but consume more memory.
  The request rate each host can sustain per worker without dropping samples is max_samples_per_host / aggregation_interval. With the defaults, 1,000 samples / 100 ms = 10,000 requests per second.
- default_rtt (google.protobuf.Duration, default: 10ms): Baseline RTT for cost calculations when no measurements are available yet. Should reflect expected latency in your environment.
- penalty_value (google.protobuf.DoubleValue, default: 1,000,000.0): Cost penalty for hosts without RTT data. Changing this value is rarely necessary.
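To show what decay_time means in practice, here is a minimal sketch of a time-decayed average in which the old value keeps half its weight after one decay_time, as described above. The "peak" rule (a sample higher than the current average replaces it immediately) is the conventional Peak EWMA formulation and is an assumption here; this page does not confirm the extension's exact update rule. Both function names are hypothetical.

```python
def decay_weight(elapsed_s: float, decay_time_s: float = 10.0) -> float:
    """Weight kept by the old value after elapsed_s; halves every decay_time_s."""
    return 0.5 ** (elapsed_s / decay_time_s)

def update_peak_ewma(current_ms: float, sample_ms: float,
                     elapsed_s: float, decay_time_s: float = 10.0) -> float:
    """Blend a new RTT sample into the running average (assumed update rule)."""
    if sample_ms > current_ms:
        # Peak behavior: react to a latency spike immediately.
        return sample_ms
    w = decay_weight(elapsed_s, decay_time_s)
    # Recovery is gradual: the old value fades according to elapsed time.
    return current_ms * w + sample_ms * (1.0 - w)

after_spike = update_peak_ewma(current_ms=20.0, sample_ms=100.0, elapsed_s=1.0)
after_recovery = update_peak_ewma(current_ms=100.0, sample_ms=20.0, elapsed_s=10.0)
```

Under these assumptions a spike to 100 ms is adopted at once, while a 20 ms sample arriving one decay_time later only brings the average down to 60 ms. A shorter decay_time makes that recovery faster at the price of a noisier estimate.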
Example configuration
Minimal configuration with defaults suitable for most deployments:
cluster:
  load_balancing_policy:
    policies:
    - typed_extension_config:
        name: envoy.load_balancing_policies.peak_ewma
        typed_config:
          "@type": type.googleapis.com/envoy.extensions.load_balancing_policies.peak_ewma.v3alpha.PeakEwma
# HTTP filter configuration (in the HTTP connection manager's filter list) - required for RTT measurement
http_filters:
- name: envoy.filters.http.peak_ewma
  typed_config:
    "@type": type.googleapis.com/envoy.extensions.filters.http.peak_ewma.v3alpha.PeakEwmaConfig
Complete configuration showing all available parameters:
cluster:
  load_balancing_policy:
    policies:
    - typed_extension_config:
        name: envoy.load_balancing_policies.peak_ewma
        typed_config:
          "@type": type.googleapis.com/envoy.extensions.load_balancing_policies.peak_ewma.v3alpha.PeakEwma
          decay_time: 10s
          aggregation_interval: 100ms
          max_samples_per_host: 1000
          default_rtt: 10ms
          penalty_value: 1000000.0
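When sizing max_samples_per_host and aggregation_interval, the buffer capacity relation from Configuration Parameters can be checked with a few lines of arithmetic. The helper names and the 25,000 RPS target are hypothetical and chosen only for illustration.

```python
import math

def rps_capacity(max_samples_per_host: int, aggregation_interval_ms: int) -> float:
    """Requests per second one host can receive per worker before samples drop."""
    return max_samples_per_host * 1000 / aggregation_interval_ms

def required_samples(target_rps: int, aggregation_interval_ms: int) -> int:
    """Smallest max_samples_per_host that holds one interval of samples."""
    return math.ceil(target_rps * aggregation_interval_ms / 1000)

# Values from the complete configuration above: 1000 samples, 100 ms interval.
configured_capacity = rps_capacity(1000, 100)

# Hypothetical target: 25,000 RPS to a single host from a single worker.
needed = required_samples(25_000, 100)
```

If a single host is expected to receive more than the configured capacity from one worker, either raise max_samples_per_host or shorten aggregation_interval, keeping in mind the memory and CPU trade-offs noted for each parameter.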
Statistics
The Peak EWMA load balancer outputs statistics in the cluster.<cluster_name>.peak_ewma. namespace.
| Name | Type | Description |
|---|---|---|
| samples_recorded | Counter | Total RTT samples recorded across all hosts |
| samples_dropped | Counter | Samples dropped due to buffer overflow |
| ewma_calculations | Counter | Number of EWMA calculations performed |
| hosts_with_data | Gauge | Number of hosts with available EWMA data |
| aggregation_cycles | Counter | Number of aggregation timer cycles executed |
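One way to use these statistics is to watch samples_dropped for signs that the per-host buffer is too small. The sketch below parses the plain-text `name: value` lines produced by Envoy's admin stats endpoint. The cluster name `backend`, the sample values, and the `parse_peak_ewma_stats` helper are all made up for the example.

```python
def parse_peak_ewma_stats(stats_text: str, cluster_name: str) -> dict:
    """Extract the peak_ewma.* stats for one cluster from text stats output."""
    prefix = f"cluster.{cluster_name}.peak_ewma."
    stats = {}
    for line in stats_text.splitlines():
        name, sep, value = line.strip().partition(": ")
        if sep and name.startswith(prefix):
            stats[name[len(prefix):]] = int(value)
    return stats

# Made-up sample of stats output for a cluster named "backend".
sample = """\
cluster.backend.peak_ewma.samples_recorded: 48210
cluster.backend.peak_ewma.samples_dropped: 12
cluster.backend.peak_ewma.hosts_with_data: 6
cluster.backend.upstream_rq_total: 48222
"""

stats = parse_peak_ewma_stats(sample, "backend")
if stats.get("samples_dropped", 0) > 0:
    print("RTT samples were dropped; consider a larger max_samples_per_host")
```

A steadily increasing samples_dropped counter suggests that traffic to some host exceeds the buffer capacity described under Configuration Parameters.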
Performance Characteristics
Peak EWMA provides the following performance characteristics:
- Selection complexity: O(1) per request using the Power of Two Choices algorithm
- Memory usage: Configurable via the max_samples_per_host parameter
- CPU overhead: Minimal during request processing; periodic aggregation every 100ms (the default aggregation_interval)
The load balancer maintains constant selection time regardless of cluster size.
API Reference
The Peak EWMA load balancing policy is configured using the
envoy.extensions.load_balancing_policies.peak_ewma.v3alpha.PeakEwma proto message.