# OpenSSL Performance Benchmark Results

Analysis of OpenSSL performance regressions and improvements across versions 1.1.1w through 3.6.0.

## Test Methodology & System Info

**Statistical Validation:** Each OpenSSL version was tested **3 times** in separate containers. All reported values are **means** across those runs, with standard deviations (± values) shown where significant. Repeating each run reduces the effect of run-to-run variance, although three samples give only a rough estimate of spread.
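
As a rough illustration of how a "mean ± standard deviation" entry can be derived from three runs, the sketch below uses Python's `statistics` module. The three run values are hypothetical placeholders, not measurements from this report, and the choice of the sample (n-1) standard deviation is an assumption, since the report does not say which estimator was used.

```python
import statistics


def summarize(runs):
    """Return (mean, sample standard deviation) for a list of run results."""
    mean = statistics.mean(runs)
    # Sample standard deviation (n-1 denominator). This is an assumption:
    # the report does not state which estimator it used.
    stdev = statistics.stdev(runs)
    return mean, stdev


def format_result(runs):
    """Format runs in the style used by the tables, e.g. '7,963 ± 271'."""
    mean, stdev = summarize(runs)
    return f"{round(mean):,} ± {round(stdev):,}"


# Hypothetical connections/sec from three container runs.
example_runs = [6000, 6100, 6200]
print(format_result(example_runs))
```

With only three samples, the standard deviation is itself a coarse estimate, so small differences between versions should be read with that in mind.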

All tests were conducted in isolated Docker containers (Debian Bookworm) to ensure environment consistency. Each version was compiled from source.

### CPU Information

| Property | Value |
|----------|-------|
| **Model** | Neoverse-N2 |
| **Architecture** | aarch64 |
| **Cores** | 4 |
| **Hardware Acceleration** | AES, PMULL, SHA, NEON/ASIMD, SVE |

> ℹ️ **ARM64 Architecture:** This is an ARM processor. ARM uses NEON/ASIMD and optional SVE for vectorized operations instead of AVX.

### Operating System & Environment

| Property | Value |
|----------|-------|
| **Container OS** | Debian GNU/Linux 12 |
| **Kernel** | Linux 6.14.0-1014-azure |
| **Container** | Docker/Debian |
| **Platform** | linux-aarch64 |

### OpenSSL Build Configuration

All versions compiled from upstream source with consistent settings:

```
gcc -fPIC -pthread -Wa,--noexecstack -Wall -O3 -DOPENSSL_USE_NODELETE -DOPENSSL_PIC -DOPENSSL_CPUID_OBJ -DOPENSSL_BN_ASM_MONT -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DKECCAK1600_ASM -DVPAES_ASM -DECP_NISTZ256_ASM -DPOLY1305_ASM -DNDEBUG 
```

**Test Definitions:**
- **Algorithm Throughput:** Measured using `openssl speed -evp [algo]`. This uses the high-level EVP (envelope) interface, which uses hardware acceleration where available (on this ARM64 host, the ARMv8 AES, PMULL, and SHA instructions rather than x86 AES-NI). It represents raw encryption speed for bulk data transfer.
- **TLS Handshake:** Measured using `openssl s_time -new`. This creates repeated new TLS connections to a local `openssl s_server`. It stresses the CPU-intensive parts of the protocol (key exchange, certificate parsing, signature verification) rather than network I/O.
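
The two measurements above could be driven from a small script along the following lines. This is a minimal sketch that only builds the command lines. The host, port, and durations are hypothetical defaults, and using `-reuse` for the "Resumed" figures is an assumption about how this report was produced.

```python
import shlex


def speed_cmd(algo, seconds=3):
    """Build an `openssl speed` invocation for one EVP algorithm."""
    return ["openssl", "speed", "-evp", algo, "-seconds", str(seconds)]


def s_time_cmd(host="localhost", port=4433, seconds=10, reuse=False):
    """Build an `openssl s_time` invocation against a running s_server.

    host, port, and seconds are hypothetical defaults for illustration.
    """
    mode = "-reuse" if reuse else "-new"
    return ["openssl", "s_time", "-connect", f"{host}:{port}", mode,
            "-time", str(seconds)]


# Print the commands rather than running them, so the sketch has no
# dependency on a local OpenSSL build or a listening server.
for cmd in (speed_cmd("aes-256-gcm"), s_time_cmd(), s_time_cmd(reuse=True)):
    print(shlex.join(cmd))
```

Each list can be passed to `subprocess.run` once an `openssl s_server` is listening on the chosen port.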

## Performance Analysis

### Why Throughput Improved but Handshakes Slowed Down

You may notice a divergence in the results: **bulk AES-256-GCM throughput increases from 3.1 onward (SHA256 stays essentially flat), while handshake performance decreases across all 3.x releases.**

1.  **Throughput Increase:** OpenSSL 3.x includes updated assembly optimizations and better pipelining for modern processors, and its EVP layer is tuned for bulk operations on large blocks (8KB). In these results the gain appears from 3.1.8 onward (roughly 21-23% over 1.1.1w for AES-256-GCM), while 3.0.18 is about 5% below the baseline.
2.  **Handshake Decrease:** The drop in handshake performance is primarily due to the architectural overhaul in OpenSSL 3.0, specifically the "Provider" model. This introduced abstraction layers that require property queries and provider lookups for every cryptographic operation. Since a TLS handshake involves many *small* operations (random number generation, hashing, signing), this per-operation overhead accumulates, resulting in fewer connections per second compared to the leaner 1.1.1 architecture.

## Version Overview

| Version | Release Date | Series Introduced | Series Features |
|---------|--------------|-------------------|-----------------|
| **1.1.1w** | 2023-09-11 | Sept 2018 | Final release of the 1.1.1 LTS series (EOL Sept 2023). Support for TLS 1.3, SHA-3, X448/Ed448. The performance baseline. |
| **3.0.18** | 2025-09-30 | Sept 2021 | LTS release. Introduced the Provider architecture (FIPS 140-2). Major architectural overhaul often cited as the cause of performance regressions. |
| **3.1.8** | 2025-02-11 | Mar 2023 | FIPS 140-3 compliance. Focused on performance improvements over 3.0 and addressing initial regressions. |
| **3.2.6** | 2025-09-30 | Nov 2023 | Client-side QUIC support. TLS certificate compression (RFC 8879). Deterministic ECDSA (RFC 6979). |
| **3.3.5** | 2025-09-30 | Apr 2024 | QUIC trace and polling improvements. New EVP_DigestSqueeze API. Further performance tuning. |
| **3.4.3** | 2025-09-30 | Oct 2024 | FIPS indicators. Composite signature algorithms. PBMAC1 support. New integrity checks. |
| **3.5.4** | 2025-09-30 | Apr 2025 | Post-Quantum Cryptography (ML-KEM, ML-DSA). QUIC server support. |
| **3.6.0** | 2025-10-01 | Oct 2025 | EVP_SKEY opaque symmetric keys. LMS signature verification. FIPS 186-5 deterministic ECDSA. C99 required. |

## TLS Handshake Performance (Connections/sec)

> **Why this matters:** Handshake performance is critical for web servers handling many short-lived connections. This was a primary regression point in OpenSSL 3.0.

> **Measurement Reliability:** Each value is the mean of 3 independent runs ± standard deviation.

| Version | New Connections | Resumed | Change vs 1.1.1w (New) |
|---------|----------------:|--------:|-----------------:|
| **1.1.1w** | 7,963 ± 271 | 8,256 ± 27 | Baseline |
| **3.0.18** | 5,896 ± 234 | 6,198 ± 107 | -26.0% |
| **3.1.8** | 6,322 ± 149 | 6,650 ± 179 | -20.6% |
| **3.2.6** | 6,532 ± 129 | 6,788 ± 33 | -18.0% |
| **3.3.5** | 6,687 ± 71 | 6,856 ± 73 | -16.0% |
| **3.4.3** | 7,068 ± 140 | 7,358 ± 50 | -11.2% |
| **3.5.4** | 6,354 ± 195 | 6,538 ± 100 | -20.2% |
| **3.6.0** | 6,260 ± 146 | 6,509 ± 83 | -21.4% |
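
The change column can be reproduced from the new-connection means. The sketch below assumes the percentages were computed as a simple relative difference against the 1.1.1w mean and rounded to one decimal place, which matches the published figures.

```python
def pct_change(value, baseline):
    """Percentage change of value relative to baseline, rounded to 0.1."""
    return round((value - baseline) / baseline * 100, 1)


# Mean new-connection rates (connections/sec) from the table above.
new_cps = {
    "1.1.1w": 7963, "3.0.18": 5896, "3.1.8": 6322, "3.2.6": 6532,
    "3.3.5": 6687, "3.4.3": 7068, "3.5.4": 6354, "3.6.0": 6260,
}

baseline = new_cps["1.1.1w"]
for version, cps in new_cps.items():
    if version != "1.1.1w":
        print(f"{version}: {pct_change(cps, baseline):+.1f}%")
```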

## Algorithm Throughput (KB/s)

> **Why this matters:** Raw encryption speed affects bulk data transfer. AES-256-GCM is the standard for TLS, and SHA256 is ubiquitous for signing.

> **Statistical Note:** Values are the mean of 3 iterations; standard deviations are not shown in this table.

| Version | AES-256-GCM (8K) | SHA256 (8K) |
|---------|-----------------:|------------:|
| **1.1.1w** | 2,988,428 | 2,110,927 |
| **3.0.18** | 2,838,918 | 2,103,114 |
| **3.1.8** | 3,630,171 | 2,101,903 |
| **3.2.6** | 3,646,764 | 2,100,520 |
| **3.3.5** | 3,648,158 | 2,100,191 |
| **3.4.3** | 3,650,666 | 2,100,323 |
| **3.5.4** | 3,649,709 | 2,099,900 |
| **3.6.0** | 3,673,116 | 2,100,481 |

## Multi-threaded Performance (Scalability)

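> **Why this matters:** High-performance servers use multiple cores. HAProxy authors noted that [OpenSSL 3.0 performance was measurably lower in multi-threaded environments](https://www.haproxy.com/blog/state-of-ssl-stacks), often due to lock contention in the new Provider architecture. This test checks for that weakness in bulk encryption. The Scaling Factor is multi-core throughput divided by the single-threaded AES-256-GCM (8K) result; every version scales almost linearly across the 4 cores, so no contention is visible in this workload.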

| Version | Multi-Core Throughput (8K, KB/s) | Scaling Factor |
|---------|---------------------------:|---------------:|
| **1.1.1w** | 11,916,442 | 3.99x |
| **3.0.18** | 11,323,267 | 3.99x |
| **3.1.8** | 14,471,823 | 3.99x |
| **3.2.6** | 14,566,063 | 3.99x |
| **3.3.5** | 14,556,212 | 3.99x |
| **3.4.3** | 14,561,750 | 3.99x |
| **3.5.4** | 14,549,122 | 3.99x |
| **3.6.0** | 14,636,534 | 3.98x |
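
The scaling factors can be checked with a short calculation. The sketch assumes the single-threaded baseline is the AES-256-GCM (8K) column from the Algorithm Throughput table; the report does not say so explicitly, but the ratios reproduce the published values.

```python
CORES = 4  # from the CPU Information table


def scaling_factor(multi_kbps, single_kbps):
    """Multi-core throughput divided by single-threaded throughput."""
    return round(multi_kbps / single_kbps, 2)


# version: (single-threaded AES-256-GCM 8K KB/s, multi-core 8K KB/s)
results = {
    "1.1.1w": (2988428, 11916442),
    "3.0.18": (2838918, 11323267),
    "3.6.0": (3673116, 14636534),
}

for version, (single, multi) in results.items():
    factor = scaling_factor(multi, single)
    efficiency = factor / CORES * 100
    print(f"{version}: {factor}x ({efficiency:.1f}% of ideal on {CORES} cores)")
```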

## Post-Quantum Cryptography (PQC)

> **What is this?** ML-KEM-768 is a quantum-resistant Key Encapsulation Mechanism. These algorithms are computationally heavier than classic ECC.

| Version | ML-KEM-768 (Ops/sec) |
|---------|---------------------:|
| **3.5.4** | 36,745 |
| **3.6.0** | 36,768 |

## Bellingrath Alignment: Certificate Type Comparison

> **Context:** William Bellingrath (Juniper Networks) specifically tested both RSA and ECDSA certificates in his [OpenSSL 3.x Performance presentation](https://www.youtube.com/watch?v=b01y5FDx-ao). These tests replicate that methodology.

### TLS 1.3 Performance by Certificate Type

| Version | RSA-2048 (New) | RSA-2048 (Resume) | ECDSA P-256 (New) | ECDSA P-256 (Resume) |
|---------|---------------:|------------------:|------------------:|---------------------:|
| **1.1.1w** | 7,963 | 8,256 | 16,717 | 16,935 |
| **3.0.18** | 5,896 | 6,198 | 10,191 | 10,334 |
| **3.1.8** | 6,322 | 6,650 | 11,539 | 11,643 |
| **3.2.6** | 6,532 | 6,788 | 11,776 | 11,927 |
| **3.3.5** | 6,687 | 6,856 | 11,991 | 11,979 |
| **3.4.3** | 7,068 | 7,358 | 13,738 | 13,832 |
| **3.5.4** | 6,354 | 6,538 | 11,253 | 11,348 |
| **3.6.0** | 6,260 | 6,509 | 10,846 | 10,926 |

### TLS 1.2 Performance by Cipher Suite (Bellingrath's Test Matrix)

| Version | ECDHE-RSA-AES128-GCM | ECDHE-ECDSA-AES128-GCM | AES256-GCM-SHA384 |
|---------|---------------------:|-----------------------:|------------------:|
| **1.1.1w** | 8,282 | 16,889 | 8,268 |
| **3.0.18** | 6,147 | 10,042 | 6,845 |
| **3.1.8** | 6,556 | 11,209 | 7,373 |
| **3.2.6** | 6,659 | 11,358 | 7,499 |
| **3.3.5** | 6,704 | 11,480 | 7,583 |
| **3.4.3** | 7,160 | 13,149 | 8,055 |
| **3.5.4** | 7,198 | 13,064 | 8,050 |
| **3.6.0** | 7,042 | 12,632 | 7,929 |

### Session Resumption Comparison (CPS)

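> **Why test resumption?** TLS session resumption reuses cryptographic parameters, which can make it several times faster than a full handshake (roughly 5-7x for TLS 1.2 on the 3.x builds below, compared with the ECDHE-RSA full-handshake figures). Bellingrath tested both to measure overhead.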

| Version | TLS 1.3 RSA (Resume) | TLS 1.2 RSA (Resume) |
|---------|---------------------:|---------------------:|
| **1.1.1w** | 8,256 | 8,313 |
| **3.0.18** | 6,198 | 41,247 |
| **3.1.8** | 6,650 | 42,130 |
| **3.2.6** | 6,788 | 39,142 |
| **3.3.5** | 6,856 | 38,490 |
| **3.4.3** | 7,358 | 36,728 |
| **3.5.4** | 6,538 | 36,907 |
| **3.6.0** | 6,509 | 35,224 |

**Understanding the Performance Gap:**

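On the 3.x builds, TLS 1.2 session resumption reaches roughly 35,000-42,000 CPS, compared with about 6,200-7,400 CPS for TLS 1.3 resumption. The 1.1.1w TLS 1.2 figure (8,313 CPS) is the exception: it is close to the full-handshake rate, which suggests resumption may not have taken effect in that run and the value should be re-checked. The likely contributors to the gap are: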

1. **TLS 1.2 Resumption Simplicity:** Session tickets completely bypass expensive asymmetric cryptography. The server decrypts the ticket, retrieves the cached master secret, and derives new symmetric keys—no public key operations required.

2. **TLS 1.3 PSK Complexity:** Pre-Shared Key (PSK) resumption in TLS 1.3 is more secure (better forward secrecy) but performs additional operations: HKDF key derivation, optional ephemeral Diffie-Hellman exchanges, and more complex state management.

3. **Code Maturity:** TLS 1.2 has been optimized for over a decade. TLS 1.3 (introduced in OpenSSL 1.1.1) and especially the OpenSSL 3.x Provider architecture are still being tuned.

4. **OpenSSL 3.x Provider Overhead:** The abstraction layers in OpenSSL 3.x add per-operation overhead that accumulates during handshakes with many small cryptographic operations.

**Practical Impact:** While TLS 1.3 provides superior security properties (mandatory perfect forward secrecy, encrypted handshakes), TLS 1.2 session resumption completes more connections per second in these results. For most applications, TLS 1.3's security benefits outweigh this performance difference, but environments with very high connection rates may need to consider this tradeoff.
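
To put numbers on the gap, the sketch below computes the ratio of resumed to full-handshake rates for a few versions. Pairing the "TLS 1.2 RSA (Resume)" column with the ECDHE-RSA-AES128-GCM full-handshake column is an assumption; the report does not state which cipher suite the resumption test used.

```python
def speedup(resumed_cps, full_cps):
    """Ratio of resumed to full-handshake connections/sec."""
    return round(resumed_cps / full_cps, 2)


# version: (full handshake CPS, resumed CPS), taken from the tables above.
tls13_rsa = {
    "1.1.1w": (7963, 8256), "3.0.18": (5896, 6198), "3.6.0": (6260, 6509),
}
# Assumption: ECDHE-RSA-AES128-GCM is the full-handshake counterpart of
# the "TLS 1.2 RSA (Resume)" column.
tls12_rsa = {
    "1.1.1w": (8282, 8313), "3.0.18": (6147, 41247), "3.6.0": (7042, 35224),
}

for version in tls13_rsa:
    full13, resumed13 = tls13_rsa[version]
    full12, resumed12 = tls12_rsa[version]
    print(f"{version}: TLS 1.3 {speedup(resumed13, full13)}x, "
          f"TLS 1.2 {speedup(resumed12, full12)}x")
```

A ratio close to 1.0 means the resumed run cost about as much as a full handshake, which is worth investigating before drawing conclusions from that row.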


## Schmatz Algorithm Benchmarks

> **Context:** Martin Schmatz (IBM) emphasized comprehensive algorithm testing in his [OpenSSL Performance Analysis](https://www.youtube.com/watch?v=69gUVhOEaVM). These tests measure raw cryptographic operation speed independent of TLS overhead.

### RSA Key Size Comparison (ops/sec)

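> **Why test key sizes?** RSA-4096 provides a larger security margin but is much slower than RSA-2048: in the rows below, signing is about 6.6x slower and verification about 3.8x slower. Understanding this tradeoff is critical for certificate selection.
>
> **Note:** Rows of `0` (3.2.6 onward) almost certainly indicate that no result was captured for that version, not that the operation ran at zero ops/sec.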

| Version | RSA-2048 Sign | RSA-2048 Verify | RSA-3072 Sign | RSA-3072 Verify | RSA-4096 Sign | RSA-4096 Verify |
|---------|-------------:|----------------:|-------------:|----------------:|-------------:|----------------:|
| **1.1.1w** | 1,307 | 52,122 | 442 | 24,125 | 199 | 13,835 |
| **3.0.18** | 1,309 | 52,685 | 443 | 24,243 | 199 | 13,880 |
| **3.1.8** | 1,312 | 52,666 | 443 | 24,286 | 199 | 13,880 |
| **3.2.6** | 0 | 0 | 0 | 0 | 0 | 0 |
| **3.3.5** | 0 | 0 | 0 | 0 | 0 | 0 |
| **3.4.3** | 0 | 0 | 0 | 0 | 0 | 0 |
| **3.5.4** | 0 | 0 | 0 | 0 | 0 | 0 |
| **3.6.0** | 0 | 0 | 0 | 0 | 0 | 0 |

### ECDSA Curve Comparison (ops/sec)

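> **Why test curves?** P-256 is fastest and most common. P-384 is required by some compliance regimes. P-521 offers the highest security margin at a significant performance cost. In these results P-384 verification is slower than P-521 until 3.2.6, where it improves roughly 5x.
>
> **Note:** The Sign columns read `0` for every version, which most likely indicates that the signing result was not captured rather than a real rate.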

| Version | P-256 Sign | P-256 Verify | P-384 Sign | P-384 Verify | P-521 Sign | P-521 Verify |
|---------|----------:|-------------:|----------:|-------------:|----------:|-------------:|
| **1.1.1w** | 0 | 53,164 | 0 | 1,317 | 0 | 4,589 |
| **3.0.18** | 0 | 51,592 | 0 | 1,289 | 0 | 4,118 |
| **3.1.8** | 0 | 53,042 | 0 | 1,295 | 0 | 4,125 |
| **3.2.6** | 0 | 53,126 | 0 | 6,815 | 0 | 4,120 |
| **3.3.5** | 0 | 53,083 | 0 | 6,821 | 0 | 4,113 |
| **3.4.3** | 0 | 52,844 | 0 | 6,817 | 0 | 4,116 |
| **3.5.4** | 0 | 52,891 | 0 | 6,814 | 0 | 4,118 |
| **3.6.0** | 0 | 52,770 | 0 | 6,805 | 0 | 4,117 |

### ECDH Key Exchange (ops/sec)

> **Why test ECDH?** Elliptic Curve Diffie-Hellman is used in TLS to establish shared secrets. This is a major component of handshake CPU cost.

| Version | ECDH P-256 | ECDH P-384 | ECDH P-521 |
|---------|----------:|----------:|----------:|
| **1.1.1w** | 20,975 | 1,380 | 4,138 |
| **3.0.18** | 20,977 | 1,363 | 4,143 |
| **3.1.8** | 21,113 | 1,367 | 4,144 |
| **3.2.6** | 21,102 | 4,744 | 4,129 |
| **3.3.5** | 21,088 | 4,745 | 4,128 |
| **3.4.3** | 21,094 | 4,741 | 4,128 |
| **3.5.4** | 21,078 | 4,744 | 4,129 |
| **3.6.0** | 21,074 | 4,744 | 4,136 |

### Block Size Sensitivity (AES-256-GCM KB/s)

> **What This Shows:** This benchmark measures AES-256-GCM encryption throughput across different block sizes (16 bytes to 8KB) to reveal how cryptographic operations scale with data size.

> **Key Insights:**
> - **Small blocks (16-64 bytes)** are dominated by fixed per-call overhead (Provider dispatch, key scheduling, and context setup)
> - **Medium blocks (256 bytes - 1KB)** show the transition where throughput begins to approach the bulk rate
> - **Large blocks (8KB)** achieve maximum throughput by amortizing that overhead across more data
> - **The gap between versions** is largest at small sizes: at 16 bytes, 3.0.18 reaches about 9% of the 1.1.1w rate and 3.6.0 about 15%

> **Real-World Impact:** Applications encrypting small messages (e.g., individual database fields, IoT sensor data) will see much lower throughput than bulk encryption (file encryption, large API payloads).

| Version | 16 Bytes | 64 Bytes | 256 Bytes | 1024 Bytes | 8192 Bytes |
|---------|--------:|---------:|----------:|-----------:|-----------:|
| **1.1.1w** | 651,911 | 1,729,783 | 2,601,013 | 2,895,856 | 2,988,428 |
| **3.0.18** | 57,421 | 219,163 | 724,683 | 1,683,899 | 2,838,918 |
| **3.1.8** | 87,865 | 326,937 | 997,383 | 2,325,127 | 3,630,171 |
| **3.2.6** | 92,940 | 343,851 | 1,034,247 | 2,378,635 | 3,646,764 |
| **3.3.5** | 94,332 | 347,819 | 1,043,712 | 2,384,240 | 3,648,158 |
| **3.4.3** | 94,711 | 349,905 | 1,046,224 | 2,390,473 | 3,650,666 |
| **3.5.4** | 93,680 | 344,273 | 1,037,346 | 2,386,750 | 3,649,709 |
| **3.6.0** | 94,930 | 349,117 | 1,048,854 | 2,395,307 | 3,673,116 |
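
One way to read the 16-byte column is as an average time per encryption call. The sketch below converts throughput into nanoseconds per call, assuming 1 KB = 1000 bytes (the convention `openssl speed` uses for its output). The result includes the AES-GCM work itself, so it is an upper bound on overhead rather than a direct measurement of it.

```python
def ns_per_call(kb_per_s, block_bytes, bytes_per_kb=1000):
    """Average nanoseconds per call implied by a throughput figure.

    bytes_per_kb=1000 is an assumption about the units in the table.
    """
    calls_per_s = kb_per_s * bytes_per_kb / block_bytes
    return 1e9 / calls_per_s


# 16-byte AES-256-GCM throughput (KB/s) from the table above.
throughput_16b = {"1.1.1w": 651911, "3.0.18": 57421, "3.6.0": 94930}

for version, kbps in throughput_16b.items():
    print(f"{version}: {ns_per_call(kbps, 16):.1f} ns per 16-byte call")
```

On this reading, a 16-byte call costs a few tens of nanoseconds on 1.1.1w and a few hundred on 3.0.18, which is consistent with a fixed per-call cost dominating at small sizes.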

## Mráz Optimization Impact

> **What is this?** Tomáš Mráz (OpenSSL core developer) presented [performance tuning recommendations](https://www.youtube.com/watch?v=Cv-43gJJFIs) for OpenSSL 3.x. These tests compare default configuration vs. an optimized configuration that:
> - Loads only the default provider (no FIPS, no legacy)
> - Sets explicit `default_properties` to avoid property queries
> - Uses a minimal OpenSSL configuration
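
The report does not include the optimized configuration file itself. As a rough guess at what such a file could look like, the sketch below writes a minimal OpenSSL config that activates only the default provider and sets `default_properties`. The section names, the file name, and the property string `provider=default` are assumptions for illustration, not the configuration used for these measurements.

```python
import os
import tempfile

# Assumed minimal configuration: default provider only, with an explicit
# default property query. Section names are arbitrary labels.
MINIMAL_CONF = """\
openssl_conf = openssl_init

[openssl_init]
providers = provider_sect
alg_section = evp_properties

[provider_sect]
default = default_sect

[default_sect]
activate = 1

[evp_properties]
default_properties = provider=default
"""


def write_minimal_conf(directory):
    """Write the assumed config into directory and return its path."""
    path = os.path.join(directory, "openssl-minimal.cnf")  # hypothetical name
    with open(path, "w", encoding="ascii") as fh:
        fh.write(MINIMAL_CONF)
    return path


def env_with_conf(conf_path):
    """Copy the current environment and point OPENSSL_CONF at conf_path."""
    env = dict(os.environ)
    env["OPENSSL_CONF"] = conf_path
    return env


conf_path = write_minimal_conf(tempfile.mkdtemp())
env = env_with_conf(conf_path)
print(env["OPENSSL_CONF"])
```

The returned environment can be passed as `env=` to `subprocess.run` when launching `openssl s_server` and `openssl s_time`, so that both sides pick up the same configuration.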

### TLS 1.3 Handshake: Default vs Optimized

| Version | Default (CPS) | Optimized (CPS) | Improvement |
|---------|-------------:|----------------:|------------:|
| **3.0.18** | 5,896 | 6,094 | +3.4% |
| **3.1.8** | 6,322 | 6,593 | +4.3% |
| **3.2.6** | 6,532 | 6,608 | +1.2% |
| **3.3.5** | 6,687 | 6,690 | +0.1% |
| **3.4.3** | 7,068 | 7,225 | +2.2% |
| **3.5.4** | 6,354 | 6,473 | +1.9% |
| **3.6.0** | 6,260 | 6,327 | +1.1% |
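
Most of these gains are small relative to run-to-run variation. The sketch below compares each gain with the standard deviation reported for the default new-connection runs in the TLS Handshake Performance table. Using one standard deviation as the threshold is a crude heuristic chosen for illustration, and the spread of the optimized runs is not reported, so this is only a rough screen.

```python
def exceeds_noise(default_mean, default_std, optimized_mean):
    """True if the gain is larger than one std dev of the default runs."""
    return (optimized_mean - default_mean) > default_std


# version: (default mean CPS, default std dev, optimized mean CPS)
tls13 = {
    "3.0.18": (5896, 234, 6094),
    "3.1.8": (6322, 149, 6593),
    "3.2.6": (6532, 129, 6608),
    "3.3.5": (6687, 71, 6690),
    "3.4.3": (7068, 140, 7225),
    "3.5.4": (6354, 195, 6473),
    "3.6.0": (6260, 146, 6327),
}

for version, (mean, std, optimized) in tls13.items():
    verdict = "above" if exceeds_noise(mean, std, optimized) else "within"
    print(f"{version}: gain {optimized - mean:+d} CPS is {verdict} one std dev ({std})")
```

By this screen only 3.1.8 and 3.4.3 show a gain larger than one standard deviation, so the remaining improvements should be treated as inconclusive without more runs.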

### TLS 1.2 Handshake: Default vs Optimized

| Version | Default (CPS) | Optimized (CPS) | Improvement |
|---------|-------------:|----------------:|------------:|
| **3.0.18** | 6,147 | 6,153 | +0.1% |
| **3.1.8** | 6,556 | 6,636 | +1.2% |
| **3.2.6** | 6,659 | 6,598 | -0.9% |
| **3.3.5** | 6,704 | 6,714 | +0.1% |
| **3.4.3** | 7,160 | 7,236 | +1.1% |
| **3.5.4** | 7,198 | 7,233 | +0.5% |
| **3.6.0** | 7,042 | 7,064 | +0.3% |
