性能接近理论上限的的大文件 hashing
从 compio 到无锁 SPSC queue,逐步逼近 openssl speed 的大文件 hashing 实现 笔者下载到 SDEZ 1.65 后,发现 7-zip SHA256 计算性能并不理想,于是本着写玩具的念头搓了一个可能是目前最快的 hash 工具。 第一版基于 compio ,但是单纯使用 compio + 同线程 hashing 对于单个大文件顺序读的效果并不理想。 第二版改为完全单线程,read 与 hash 负载集中,达到了 50% 理论吞吐极限。 此时 strace 抓了一下 Linux 第三版改为 IO Thread 分离, 好问题。 虽然没有人提供。 绞尽脑汁的 userspace I/O 优化,第一步应该先看看有没有合适的flag。 内核可以在更底层根据你将要进行的读取策略,进行第一步优化。 Repo:fasthash前言
sha256sum syscall ,还引入了 POSIX_FADV_SEQUENTIAL,对应 Windows FILE_FLAG_SEQUENTIAL_SCAN 。crossfire 无锁 SPSC channel 与无锁 queue buffer pool,终于达到了 openssl speed 16k block microbench 性能的 97%。实现
use std::fs::File;
use std::io::{BorrowedBuf, Read as _};
use std::sync::Arc;
use crossfire::flavor::Queue;
use digest::Digest;
fn hash_file<D: Digest>(
mut file: File,
buffer_size: usize,
queue_len: usize,
) -> std::io::Result<Box<[u8]>> {
let mut ctx = D::new();
let queue = Arc::new(crossfire::spsc::Array::new(queue_len));
let (readed_tx, readed_rx) = crossfire::spsc::bounded_blocking(queue_len);
for _ in 0..queue_len {
_ = queue.push(Vec::with_capacity(buffer_size));
}
let queue_ = queue.clone();
let io_result = std::thread::spawn(move || {
loop {
let Some(mut buffer) = queue_.pop() else {
std::thread::yield_now();
continue;
};
let mut cursor = BorrowedBuf::from(buffer.spare_capacity_mut());
file.read_buf(cursor.unfilled())?;
if cursor.len() == 0 {
break std::io::Result::Ok(());
} else {
let new_len = cursor.len();
unsafe { buffer.set_len(new_len) };
_ = readed_tx.send(buffer);
}
}
});
while let Ok(mut buffer) = readed_rx.recv() {
ctx.update(&buffer);
buffer.clear();
_ = queue.push(buffer);
}
io_result.join().expect("file io thread panicked")?;
Ok(ctx.finalize().to_vec().into_boxed_slice())
}实际测试数据
Command Time Notes fasthash sha256 -b 1MiB file36s queue=8 fasthash sha256 -b 256K file52s queue=8 sha256sum file67s uutils 0.9.0 hashlib* 68s Python 3.13.5 openssl sha256 file78s OpenSSL 3.6.0 MSVC – 85s Nanazip 6.0.1742 Get-FileHash file96.9s PowerShell 5.1 certutil -hashfile file sha256107s sha256sum file123s Microsoft coreutils open -r file | hash sha256sum130s NuShell 0.112.2 openssl speed -evp sha256
Doing sha256 ops for 3s on 16 size blocks: 15350007 sha256 ops in 2.91s
Doing sha256 ops for 3s on 64 size blocks: 13970470 sha256 ops in 2.97s
Doing sha256 ops for 3s on 256 size blocks: 10049997 sha256 ops in 2.88s
Doing sha256 ops for 3s on 1024 size blocks: 4545898 sha256 ops in 3.00s
Doing sha256 ops for 3s on 8192 size blocks: 743081 sha256 ops in 2.94s
Doing sha256 ops for 3s on 16384 size blocks: 386881 sha256 ops in 2.95s
version: 3.6.0
built on: Wed Oct 8 20:29:58 2025 UTC
options: bn(64,64)
compiler: cl /Z7 /Fdossl_static.pdb /Gs0 /GF /Gy /MD /W3 /wd4090 /nologo /O2 -DL_ENDIAN -DOPENSSL_PIC -D"OPENSSL_BUILDING_OPENSSL" -D"OPENSSL_SYS_WIN32" -D"WIN32_LEAN_AND_MEAN" -D"UNICODE" -D"_UNICODE" -D"_CRT_SECURE_NO_DEPRECATE" -D"_WINSOCK_DEPRECATED_NO_WARNINGS" -D"NDEBUG" -D_WINSOCK_DEPRECATED_NO_WARNINGS -D_WIN32_WINNT=0x0502
CPUINFO: OPENSSL_ia32cap=0xfffaf38bffcbffff:0x184007a4239c27a9:0x00400810bc18c410:0x0000000000000000:0x0000000000000000
The 'numbers' are in 1000s of bytes per second processed.
type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes 16384 bytes
sha256 84507.57k 301173.92k 894886.69k 1551666.52k 2072279.00k 2146423.98k为什么不换算法?
gxhash/rapidhash/xxh3 这些力大砖飞或 SIMD 优化的新型 non-crypto 算法对于大文件校验是很合适的。后记