Peak EWMA Load Balancer

  • This load balancer should be configured with the type URL type.googleapis.com/envoy.extensions.load_balancing_policies.peak_ewma.v3alpha.PeakEwma.

Note

Peak EWMA is a contrib extension that must be explicitly enabled at Envoy build time. See Contrib builds for details.

The Peak EWMA (Exponentially Weighted Moving Average) load balancer implements a latency- and active-requests-aware variant of the Power of Two Choices (P2C) algorithm. It automatically routes traffic to the best-performing hosts based on real-time latency measurements and outstanding active requests.

Peak EWMA considers more input data than Envoy’s other load balancing algorithms, which enables it to make superior routing decisions. In the case of a slowdown, it seamlessly moves traffic away from the affected host.

Peak EWMA is also well-suited for cross-data-center routing: it naturally prefers upstream hosts in the closest data center, but seamlessly fails over to other data centers during slowdowns (and fails back when performance recovers).

In scenarios where all upstream hosts have similar request latency, Peak EWMA behaves equivalently to equal-weighted least request load balancing (using P2C selection).

Note

Peak EWMA requires both the load balancing policy AND the HTTP filter to function properly. The HTTP filter (envoy.filters.http.peak_ewma) measures request RTT and provides timing data to the load balancer. Without the HTTP filter, the load balancer cannot collect latency measurements.

Note: This requirement may change if Peak EWMA becomes a core Envoy load balancing algorithm in the future, which would require core changes to integrate RTT measurement functionality.

Important

Peak EWMA considers latency and load when making routing decisions. It does not handle unhealthy hosts or error responses directly. This is especially critical because upstream hosts that fast-fail (return errors quickly) may appear to have low latency, causing Peak EWMA to send them a greater proportion of traffic — exactly the opposite of what you want.

Always configure Envoy’s health checking and outlier detection to automatically remove failing hosts from the load balancing pool before Peak EWMA makes routing decisions.

Algorithm Overview

Peak EWMA uses the cost function: Cost = RTT_peak_ewma * (active_requests + 1)

Key characteristics:

  • Latency-sensitive: Automatically de-prioritizes slow hosts

  • Load-aware: Considers both latency and current request count

  • O(1) complexity: Efficient P2C selection scales to large clusters

  • Adaptive: No manual tuning required, responds to performance changes

  • Health-agnostic: Operates only on healthy hosts as determined by health checking and outlier detection

Integration with Health Management

Peak EWMA works in conjunction with Envoy’s health management systems:

  • Health Checking: Only hosts that pass active health checks are considered for load balancing

  • Outlier Detection: Hosts ejected by outlier detection are automatically excluded from selection

  • Error Handling: HTTP error responses (4xx/5xx) do not directly affect Peak EWMA routing decisions

For comprehensive host health management, configure Peak EWMA alongside:

cluster:
  # Health checking removes unresponsive hosts
  health_checks:
  - timeout: 5s
    interval: 10s
    http_health_check:
      path: "/health"

  # Outlier detection removes hosts with high error rates
  outlier_detection:
    consecutive_5xx: 3
    interval: 30s
    base_ejection_time: 30s

  # Peak EWMA optimizes among remaining healthy hosts
  load_balancing_policy:
    policies:
    - typed_extension_config:
        name: envoy.load_balancing_policies.peak_ewma
        typed_config:
          "@type": type.googleapis.com/envoy.extensions.load_balancing_policies.peak_ewma.v3alpha.PeakEwma
          decay_time: 10s

# HTTP filter configuration - required for RTT measurement
http_filters:
- name: envoy.filters.http.peak_ewma
  typed_config:
    "@type": type.googleapis.com/envoy.extensions.filters.http.peak_ewma.v3alpha.PeakEwmaConfig

Configuration Parameters

Peak EWMA supports the following configuration parameters:

decay_time (google.protobuf.Duration, default: 10s)

The time window over which latency observations decay to half their original weight. Shorter values adapt faster to performance changes, longer values provide more stability.

aggregation_interval (google.protobuf.Duration, default: 100ms)

Frequency of EWMA data aggregation from worker threads. Lower values provide fresher data but increase CPU overhead.

max_samples_per_host (google.protobuf.UInt32Value, default: 1,000)

Ring buffer size per host per worker thread for RTT samples. Larger values handle traffic bursts better but consume more memory.

Buffer capacity = max_samples_per_host / aggregation_interval = RPS capacity per host per worker.

default_rtt (google.protobuf.Duration, default: 10ms)

Baseline RTT for cost calculations when no measurements are available yet. Should reflect expected latency in your environment.

penalty_value (google.protobuf.DoubleValue, default: 1,000,000.0)

Cost penalty for hosts without RTT data. You probably should not change this value.

Example configuration

Minimal configuration with defaults suitable for most deployments:

cluster:
  load_balancing_policy:
    policies:
    - typed_extension_config:
        name: envoy.load_balancing_policies.peak_ewma
        typed_config:
          "@type": type.googleapis.com/envoy.extensions.load_balancing_policies.peak_ewma.v3alpha.PeakEwma

# HTTP filter configuration - required for RTT measurement
http_filters:
- name: envoy.filters.http.peak_ewma
  typed_config:
    "@type": type.googleapis.com/envoy.extensions.filters.http.peak_ewma.v3alpha.PeakEwmaConfig

Complete configuration showing all available parameters:

cluster:
  load_balancing_policy:
    policies:
    - typed_extension_config:
        name: envoy.load_balancing_policies.peak_ewma
        typed_config:
          "@type": type.googleapis.com/envoy.extensions.load_balancing_policies.peak_ewma.v3alpha.PeakEwma
          decay_time: 10s
          aggregation_interval: 100ms
          max_samples_per_host: 1000
          default_rtt: 10ms
          penalty_value: 1000000.0

Statistics

The Peak EWMA load balancer outputs statistics in the cluster.<cluster_name>.peak_ewma. namespace.

Name

Type

Description

samples_recorded

Counter

Total RTT samples recorded across all hosts

samples_dropped

Counter

Samples dropped due to buffer overflow

ewma_calculations

Counter

Number of EWMA calculations performed

hosts_with_data

Gauge

Number of hosts with available EWMA data

aggregation_cycles

Counter

Number of aggregation timer cycles executed

Performance Characteristics

Peak EWMA provides the following performance characteristics:

  • Selection complexity: O(1) per request using Power of Two Choices algorithm

  • Memory usage: Configurable via max_samples_per_host parameter

  • CPU overhead: Minimal during request processing, periodic aggregation every 100ms

The load balancer maintains constant selection time regardless of cluster size.

API Reference

The Peak EWMA load balancing policy is configured using the envoy.extensions.load_balancing_policies.peak_ewma.v3alpha.PeakEwma proto message.