Checksum Algorithm Benchmarking
We are adding checksum providers to DDF. This initial benchmark gives a brief overview of the performance of each checksum algorithm tested.
Assumptions and Decisions
The following assumptions and decisions were made for this testing:
- The file sizes tested (~5 MB, ~50 MB, ~200 MB) are sufficient to give an idea of each algorithm's performance.
- Five runs per file size are a sufficient sample for benchmarking.
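The measurement setup described above (three sizes, five runs each) could be sketched roughly as follows. This is not the harness that produced the numbers in this document. The class name `ChecksumTiming`, its method names, and the choice to feed a 1 MiB in-memory chunk repeatedly instead of reading real files are all assumptions made for illustration. Only the JDK classes `java.util.zip.CRC32` and `java.security.MessageDigest` are used.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Random;
import java.util.zip.CRC32;

/** Hypothetical timing sketch; not the code used to produce the tables in this document. */
class ChecksumTiming {

    /** Feeds the chunk to CRC32 chunkCount times and returns the elapsed seconds. */
    static double timeCrc32(byte[] chunk, int chunkCount) {
        long start = System.nanoTime();
        CRC32 crc = new CRC32();
        for (int i = 0; i < chunkCount; i++) {
            crc.update(chunk, 0, chunk.length);
        }
        crc.getValue(); // finish the computation before stopping the clock
        return (System.nanoTime() - start) / 1e9;
    }

    /** Feeds the chunk to MD5 chunkCount times and returns the elapsed seconds. */
    static double timeMd5(byte[] chunk, int chunkCount) {
        try {
            long start = System.nanoTime();
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            for (int i = 0; i < chunkCount; i++) {
                md5.update(chunk, 0, chunk.length);
            }
            md5.digest(); // finish the computation before stopping the clock
            return (System.nanoTime() - start) / 1e9;
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 is not available in this JVM", e);
        }
    }

    public static void main(String[] args) {
        int[] sizesMb = {5, 50, 200}; // nominal sizes taken from the assumptions above
        int runs = 5;                 // five runs per size, as in the tables

        // Assumption: random in-memory data stands in for the test files.
        byte[] chunk = new byte[1024 * 1024];
        new Random(42).nextBytes(chunk);

        for (int sizeMb : sizesMb) {
            for (int run = 1; run <= runs; run++) {
                System.out.printf("~%d MB, run %d: CRC32 %.3f s, MD5 %.3f s%n",
                        sizeMb, run, timeCrc32(chunk, sizeMb), timeMd5(chunk, sizeMb));
            }
        }
    }
}
```

Because this sketch never touches the disk, its timings exclude file I/O and are not directly comparable with the tables below.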
CRC32 Benchmark Results (in seconds)
Test # | ~5MB | ~50MB | ~200MB |
---|---|---|---|
1 | .005 | .106 | .576 |
2 | .004 | .082 | .216 |
3 | .005 | .054 | .24 |
4 | .005 | .052 | .401 |
5 | .005 | .053 | .239 |
MD5 Benchmark Results (in seconds)
Test # | ~5MB | ~50MB | ~200MB |
---|---|---|---|
1 | .015 | .208 | 1.038 |
2 | .015 | .208 | .731 |
3 | .016 | .214 | .959 |
4 | .016 | .21 | .705 |
5 | .015 | .258 | .963 |
~5MB Comparison (in seconds)
Algorithm | Min | Max | Avg |
---|---|---|---|
CRC32 | .004 | .005 | .0048 |
MD5 | .015 | .016 | .0154 |
~50MB Comparison (in seconds)
Algorithm | Min | Max | Avg |
---|---|---|---|
CRC32 | .052 | .106 | .0694 |
MD5 | .208 | .258 | .2196 |
~200MB Comparison (in seconds)
Algorithm | Min | Max | Avg |
---|---|---|---|
CRC32 | .216 | .576 | .3344 |
MD5 | .705 | 1.038 | .8792 |
Average Seconds per MB Comparison (average time divided by nominal file size)
Algorithm | ~5MB | ~50MB | ~200MB |
---|---|---|---|
CRC32 | .00096 | .001388 | .001672 |
MD5 | .00308 | .004392 | .004396 |
Average Seconds per MB Across All Sizes
Algorithm | ~Avg s/MB |
---|---|
CRC32 | .00134 |
MD5 | .003956 |
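Each per-megabyte figure in the tables above is the average run time divided by the nominal file size, which makes it a time per MB. Throughput in MB/s is the reciprocal. The sketch below redoes that arithmetic for the ~200MB runs. The class and method names are hypothetical, and the sizes are treated as exactly 200 MB, which is an approximation because the document only gives nominal sizes.

```java
import java.util.Arrays;

/** Hypothetical helper that reproduces the summary arithmetic from the run times. */
class ChecksumStats {

    /** Mean of the run times, in seconds. */
    static double average(double[] seconds) {
        return Arrays.stream(seconds).average().orElse(Double.NaN);
    }

    /** Average time per megabyte: lower is better. */
    static double secondsPerMb(double[] seconds, double sizeMb) {
        return average(seconds) / sizeMb;
    }

    /** Throughput in megabytes per second: higher is better. */
    static double mbPerSecond(double[] seconds, double sizeMb) {
        return sizeMb / average(seconds);
    }

    public static void main(String[] args) {
        // Run times copied from the ~200MB columns of the result tables.
        double[] crc32 = {0.576, 0.216, 0.240, 0.401, 0.239};
        double[] md5 = {1.038, 0.731, 0.959, 0.705, 0.963};
        double sizeMb = 200.0; // nominal size, assumed exact for this calculation

        System.out.printf("CRC32: avg %.4f s, %.6f s/MB, %.1f MB/s%n",
                average(crc32), secondsPerMb(crc32, sizeMb), mbPerSecond(crc32, sizeMb));
        System.out.printf("MD5:   avg %.4f s, %.6f s/MB, %.1f MB/s%n",
                average(md5), secondsPerMb(md5, sizeMb), mbPerSecond(md5, sizeMb));
    }
}
```

At the nominal 200 MB size this works out to roughly 598 MB/s for CRC32 and roughly 227 MB/s for MD5.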
Result Analysis
- For files of ~50 MB and larger, the first run often takes significantly longer than the later runs (most visibly for CRC32), which skews the averages upward.
- CRC32 has the lowest average time per MB, that is, the highest throughput.
- CRC32 outperforms MD5 on every metric (min, max, and average) in every size category.
Decision
Based on customer requirements, we added Adler32 for performance (it is faster than CRC32) and SHA-256 for secure checksums.
https://github.com/codice/ddf/tree/2.27.x/libs/checksum/src/main/java/org/codice/ddf/checksum/impl
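For readers unfamiliar with the two algorithms named in the decision, the sketch below computes both using only JDK classes (`java.util.zip.Adler32` and `java.security.MessageDigest`). It does not show DDF's provider API. The class `ChecksumExamples` and its methods are hypothetical, and the linked implementations may be structured differently.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.zip.Adler32;

/** Hypothetical illustration of Adler32 and SHA-256 with the JDK; not DDF's provider code. */
class ChecksumExamples {

    /** Returns the Adler-32 checksum of the data as an unsigned 32-bit value in a long. */
    static long adler32(byte[] data) {
        Adler32 adler = new Adler32();
        adler.update(data, 0, data.length);
        return adler.getValue();
    }

    /** Returns the SHA-256 digest of the data as a lowercase hex string. */
    static String sha256Hex(byte[] data) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256").digest(data);
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 is not available in this JVM", e);
        }
    }

    public static void main(String[] args) {
        byte[] data = "abc".getBytes(StandardCharsets.UTF_8);
        System.out.printf("Adler32: %08x%n", adler32(data));
        System.out.println("SHA-256: " + sha256Hex(data));
    }
}
```

Adler32 produces a 32-bit value intended for fast error detection, while SHA-256 produces a 256-bit cryptographic digest, which matches the performance and security roles given to them in the decision.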