Frequently Asked Questions

Benchmark Methodology

Are synthetic benchmarks representative of real-world performance?

No, but that's intentional. This benchmark measures the library's performance ceiling: tools like openssl speed isolate cryptographic performance from everything else the application does.

Real applications (Nginx, Node.js) spend time on HTTP parsing, TCP management, and logging. A 15% OpenSSL regression might only cause a 1-2% application slowdown.

We're measuring the engine, not the whole car. If the library is slower, the application cannot be faster, but other factors will dilute the impact.
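The dilution effect can be sketched with a toy model. Both numbers below (the crypto share of request time and the size of the regression) are illustrative assumptions, not measurements:

```python
# Toy model: an application slows down only in proportion to the share
# of its time actually spent inside the crypto library.

def app_slowdown(crypto_share: float, crypto_regression: float) -> float:
    """Overall slowdown when only the crypto portion gets slower."""
    new_time = crypto_share * (1 + crypto_regression) + (1 - crypto_share)
    return new_time - 1.0

# If crypto is 10% of request time and the library regresses 15%:
print(f"{app_slowdown(0.10, 0.15):.1%}")  # prints 1.5%
```

This is why a double-digit library regression often surfaces as a low single-digit application slowdown.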

Why test on localhost instead of over a real network?

To eliminate network jitter. The s_time handshake test connects to s_server on localhost to measure CPU cost, not network latency.

In production, network RTT often dominates connection time. A 0.5ms CPU regression might be invisible against 50ms of network latency. Testing on localhost isolates the CPU efficiency we're trying to measure.

Why compile from source instead of using distro packages?

To compare codebases, not packagers' optimization skills.

Linux distributions apply heavy patching and compiler flags (-O3, -march=native). Using upstream defaults ensures we're comparing OpenSSL 1.1 vs 3.x code changes, not Debian's vs RHEL's build process.

Absolute numbers may differ from apt-get install openssl, but relative regressions between versions remain valid.

Doesn't Docker add overhead?

On Linux, Docker containers share the host kernel (namespace and cgroup isolation), so overhead is negligible. On macOS/Windows, Docker runs inside a virtual machine, which does add virtualization overhead.

However, since every version runs in the same Docker environment, the overhead cancels out. We're measuring trends across versions, not absolute hardware limits.

Block Sizes Explained

What is a "block size" in cryptographic benchmarks?

In cryptographic benchmarks, "block size" refers to the size of data chunks being encrypted or hashed in a single operation—NOT the underlying cipher's internal block size.

Understanding the terminology:
  1. Cipher block size (fixed): AES always operates on 128-bit (16-byte) internal blocks. This is a property of the algorithm itself and cannot be changed.
  2. Benchmark block size (variable): The amount of data passed to the encryption function in one call. When we test "8KB blocks," we're encrypting 8,192 bytes in a single EVP_EncryptUpdate() call.

Why block size matters for performance:

Every encryption operation has overhead:

  • Context initialization: Setting up the cipher context
  • Key schedule: Preparing round keys
  • Provider lookup (OpenSSL 3.x): Finding the algorithm implementation
  • Function call overhead: Entry/exit costs
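A back-of-the-envelope model shows how this fixed per-call cost eats into small-block throughput. The 200 ns overhead and ~4 GB/s bulk rate are invented round numbers for illustration, not measured values:

```python
# Sketch: effective throughput when every call pays a fixed overhead.
PER_CALL_OVERHEAD_NS = 200   # assumed setup + dispatch cost per call
BYTES_PER_NS = 4.0           # assumed steady-state rate (~4 GB/s)

def effective_rate_gbps(block_size: int) -> float:
    """Bytes processed per nanosecond (numerically ~GB/s)."""
    crypto_ns = block_size / BYTES_PER_NS
    return block_size / (PER_CALL_OVERHEAD_NS + crypto_ns)

for size in (16, 64, 256, 1024, 8192):
    print(f"{size:>5} B blocks: {effective_rate_gbps(size):5.2f} GB/s")
```

Under these assumptions a 16-byte block reaches only a small fraction of the 8 KB rate, which matches the shape of the comparison below.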

Example comparison:

Scenario          Data Size   Block Size   Overhead Impact
IoT sensor        64 bytes    64B          High - 100% overhead per message
Database field    256 bytes   256B         Medium - overhead still significant
File encryption   1 MB        8KB          Low - overhead amortized
TLS record        16 KB       16KB         Minimal - bulk throughput

Real-world implications:

  • Chat applications encrypting 100-byte messages see lower throughput than the "8KB" benchmark numbers suggest
  • Bulk file encryption achieves near-maximum throughput shown in benchmarks
  • Database encryption (per-field) falls somewhere in between

This is why we test multiple block sizes: to help you understand performance across different use cases.

What block sizes are tested in this benchmark?

16B, 64B, 256B, 1KB, and 8KB.

  • Small blocks (16-64B) stress initialization overhead
  • Large blocks (8KB) show maximum throughput
  • The gap reveals Provider architecture overhead in OpenSSL 3.x

Note: Currently, only 1KB and 8KB block-size data is captured in the benchmark results. Smaller block sizes (16B, 64B, 256B) require additional benchmark runs to collect.

Statistical Iterations

Why run each version multiple times?

To eliminate measurement noise and provide statistical confidence. A single run might be affected by temporary system conditions. Multiple iterations reveal whether performance is consistent or variable.

Results include mean ± standard deviation. Low stddev indicates reliable measurement; high stddev suggests investigating system interference.
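As a minimal sketch (with invented sample values), the reported summary can be reproduced with Python's statistics module:

```python
# Summarize repeated iterations as mean ± standard deviation.
from statistics import mean, stdev

runs_mbps = [412.3, 408.9, 415.1]   # hypothetical per-iteration throughput
avg = mean(runs_mbps)
sd = stdev(runs_mbps)               # sample standard deviation
rel = sd / avg                      # relative stddev, as used by the thresholds below
print(f"{avg:.1f} ± {sd:.1f} MB/s ({rel:.1%} relative)")
```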

How many iterations should I use?

Depends on your needs:

  • 2 iterations: Minimal statistics, fast
  • 3 iterations: Recommended default
  • 10 iterations: Publication quality
  • 20 iterations: Maximum confidence

Each iteration runs in a fresh Docker container. GitHub Actions runs all iterations in parallel, so wall-clock time stays roughly constant (~30 minutes), but CI minutes scale linearly with the iteration count.

What if I see high standard deviation (>5%)?

High variance indicates performance instability. Possible causes:

  • System load during test
  • Thermal throttling
  • Resource contention

Solutions:

  • Increase iterations to 10+
  • Review detailed-iterations.json for outliers
  • Check CI runner specifications

AVX Impact Testing

What are AVX and AVX2?

Advanced Vector Extensions (AVX/AVX2) are CPU instruction set extensions that enable SIMD (Single Instruction Multiple Data) operations. They allow processing multiple data elements simultaneously, significantly accelerating cryptographic operations.

Does the benchmark test AVX impact?

Yes. The benchmark automatically runs with and without AVX enabled to measure the performance difference. This is controlled using the OPENSSL_ia32cap environment variable, which masks the CPU capability bits OpenSSL detects at runtime.

Why is AVX particularly important for ML-KEM?

ML-KEM (Kyber) is a lattice-based post-quantum algorithm that involves many matrix and polynomial operations. These operations map extremely well to SIMD instructions:

  • Vector additions/multiplications: Core of lattice operations
  • Parallel NTT (Number Theoretic Transform): AVX2 accelerates by 4-8x
  • Packing/unpacking: Vectorized bit manipulation

You'll often see 50-100%+ improvement with AVX enabled for ML-KEM vs disabled.

How do I run the AVX impact test locally?

Use the standalone test script:

./scripts/test-avx-impact.sh 3.5.4

Or run inside a Docker container:

docker run --rm openssl-bench:3.5.4 ./avx_benchmark.sh

Interpreting Results

Why does throughput improve but handshakes slow down in OpenSSL 3.x?

Different architectural components:

Throughput increase: Updated assembly optimizations, better pipelining for bulk operations.

Handshake decrease: Provider model introduces abstraction layers. Handshakes involve many small operations (RNG, hashing, signing), and per-operation overhead accumulates.

Why is TLS 1.2 session resumption faster than TLS 1.3?

TLS 1.2 session resumption completely bypasses asymmetric crypto. TLS 1.3 PSK resumption still performs HKDF key derivation and potentially ephemeral DH for enhanced forward secrecy.

  • TLS 1.2 resumption: 30-40K+ CPS
  • TLS 1.3 resumption: 6-7K CPS

TLS 1.3 provides better security properties, but TLS 1.2 resumption remains faster in pure throughput.

How do I know if results are reliable?

Check the standard deviation:

  • <1%: Excellent consistency
  • 1-5%: Acceptable
  • >5%: Investigate

Also review detailed-iterations.json for outliers or patterns.
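The thresholds above can be wrapped in a small helper; the cutoffs come from this FAQ, not from the benchmark tooling itself:

```python
def reliability(relative_stddev: float) -> str:
    """Map relative stddev (0.03 == 3%) to the FAQ's reliability buckets."""
    if relative_stddev < 0.01:
        return "excellent"
    if relative_stddev <= 0.05:
        return "acceptable"
    return "investigate"

print(reliability(0.004))  # prints excellent
```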

Why does OpenSSL 3.5.x show lower AES-GCM 1KB performance?
This is NOT a performance regression—it's a benchmark methodology change.

Commit 607a46d fixed the openssl speed command to properly test AEAD ciphers by computing and validating authentication tags. Previous versions were testing an unrealistic scenario that didn't include tag operations.

OpenSSL 3.5.x benchmark numbers are more realistic but not directly comparable to 3.4.x and earlier. The actual encryption performance hasn't regressed—the measurement is now more accurate.

👉 See full explanation in the Version Analysis

Technical Details

What tools are used for each metric?

  • Throughput: openssl speed -evp [algo] (uses hardware acceleration)
  • Handshakes: openssl s_time + openssl s_server
  • Asymmetric ops: openssl speed [algo] (RSA, ECDSA, ECDH)
  • PQC: openssl speed ml-kem-768 (OpenSSL 3.5+ only)

Why use the -evp flag?

The EVP (envelope) interface is OpenSSL's high-level API and enables hardware acceleration such as AES-NI. Testing without -evp measures the legacy software-only code path, which is irrelevant for production use.

Found a problem? Have an improvement?
Fork the repository and submit a pull request!
Licensed under Apache 2.0 • Community-driven development