Checksum Algorithm Benchmarking

We are adding checksum providers to DDF. This initial benchmark gives a brief overview of the performance of each checksum algorithm tested.


Assumptions and Decisions

Certain assumptions and decisions were made during this testing:


  1. The file sizes tested (~5MB, ~50MB, ~200MB) are sufficient to give an idea of the performance of each algorithm.
  2. Five test runs per file size are a sufficient sample size for benchmarking.
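
The setup described above (a fixed input size, five timed runs) can be sketched as follows. This is a hypothetical harness for illustration, not the code that produced the numbers on this page: the class and method names are invented here, the input is pseudo-random bytes held in memory rather than a file, and CRC32 comes from the JDK's java.util.zip package.

```java
import java.util.Random;
import java.util.function.ToLongFunction;
import java.util.zip.CRC32;

/** Hypothetical timing harness; names and input data are assumptions. */
final class ChecksumTimingSketch {

    /** Runs the checksum `runs` times over `data`; returns elapsed seconds per run. */
    static double[] timeRuns(ToLongFunction<byte[]> checksum, byte[] data, int runs) {
        double[] seconds = new double[runs];
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            checksum.applyAsLong(data);
            seconds[i] = (System.nanoTime() - start) / 1_000_000_000.0;
        }
        return seconds;
    }

    /** CRC32 over a byte array using the JDK implementation. */
    static long crc32(byte[] data) {
        CRC32 crc = new CRC32();
        crc.update(data, 0, data.length);
        return crc.getValue();
    }

    public static void main(String[] args) {
        // ~5MB of pseudo-random bytes stands in for the ~5MB test file.
        byte[] data = new byte[5 * 1024 * 1024];
        new Random(42).nextBytes(data);

        double[] seconds = timeRuns(ChecksumTimingSketch::crc32, data, 5);
        for (int i = 0; i < seconds.length; i++) {
            System.out.printf("run %d: %.4f s%n", i + 1, seconds[i]);
        }
    }
}
```

On a JVM the first iteration may include class loading and JIT warm-up, so a harness like this may show a slower first run than later runs.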

CRC32 Benchmark Results (in seconds)

Test #    ~5MB     ~50MB    ~200MB
1         0.005    0.106    0.576
2         0.004    0.082    0.216
3         0.005    0.054    0.240
4         0.005    0.052    0.401
5         0.005    0.053    0.239

MD5 Benchmark Results (in seconds)

Test #    ~5MB     ~50MB    ~200MB
1         0.015    0.208    1.038
2         0.015    0.208    0.731
3         0.016    0.214    0.959
4         0.016    0.210    0.705
5         0.015    0.258    0.963

~5MB Comparison (in seconds)

Algorithm    Min      Max      Avg
CRC32        0.004    0.005    0.0048
MD5          0.015    0.016    0.0154

~50MB Comparison (in seconds)

Algorithm    Min      Max      Avg
CRC32        0.052    0.106    0.0694
MD5          0.208    0.258    0.2196

~200MB Comparison (in seconds)

Algorithm    Min      Max      Avg
CRC32        0.216    0.576    0.3344
MD5          0.705    1.038    0.8792
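
The Min, Max, and Avg columns follow directly from the five runs per file size. A minimal sketch of that reduction, using the CRC32 ~200MB timings from the results table above; the class and method names are invented for illustration.

```java
import java.util.Arrays;
import java.util.DoubleSummaryStatistics;

/** Hypothetical helper that reduces a set of timed runs to min, max, and average. */
final class RunSummarySketch {

    /** Collects min, max, and average over the given run times (in seconds). */
    static DoubleSummaryStatistics summarize(double... seconds) {
        return Arrays.stream(seconds).summaryStatistics();
    }

    public static void main(String[] args) {
        // CRC32 ~200MB timings, runs 1 to 5, taken from the results table.
        DoubleSummaryStatistics crc200 = summarize(0.576, 0.216, 0.24, 0.401, 0.239);
        System.out.printf("min=%.3f max=%.3f avg=%.4f%n",
                crc200.getMin(), crc200.getMax(), crc200.getAverage());
    }
}
```

Running the same reduction over the other columns should reproduce the remaining comparison rows.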

~Seconds per MB Comparison (average time / file size)

Algorithm    ~5MB       ~50MB       ~200MB
CRC32        0.00096    0.001388    0.001672
MD5          0.00308    0.004392    0.004396

~Average Seconds per MB Comparison

Algorithm    ~Avg s/MB
CRC32        0.00134
MD5          0.003956
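
The per-MB figures in these two tables equal each average time divided by the nominal file size, so they are seconds per MB; the reciprocal gives throughput in MB per second. A short sketch of both conversions, assuming nominal sizes of exactly 5, 50, and 200 MB (the actual test files were only approximately these sizes) and using invented names:

```java
/** Hypothetical conversion helpers between average time and per-MB rates. */
final class PerMegabyteSketch {

    /** Average seconds spent per MB of input. */
    static double secondsPerMb(double avgSeconds, double sizeMb) {
        return avgSeconds / sizeMb;
    }

    /** Throughput in MB processed per second. */
    static double mbPerSecond(double avgSeconds, double sizeMb) {
        return sizeMb / avgSeconds;
    }

    public static void main(String[] args) {
        // Average ~200MB times from the comparison table: CRC32 0.3344 s, MD5 0.8792 s.
        System.out.printf("CRC32: %.6f s/MB, %.1f MB/s%n",
                secondsPerMb(0.3344, 200), mbPerSecond(0.3344, 200));
        System.out.printf("MD5:   %.6f s/MB, %.1f MB/s%n",
                secondsPerMb(0.8792, 200), mbPerSecond(0.8792, 200));
    }
}
```

A lower seconds-per-MB value means a faster algorithm, which is why CRC32's smaller numbers indicate better performance than MD5's.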


Result Analysis

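  • For file sizes of ~50MB and above, the data appear skewed: the first run often takes significantly longer than the other runs (for example, CRC32 at ~200MB took 0.576 s on run 1 versus 0.216 to 0.401 s on runs 2 to 5).
  • CRC32 has the lowest average time per MB (~0.00134 s/MB versus ~0.003956 s/MB for MD5), which corresponds to the highest throughput.
  • CRC32 outperforms MD5 on every metric (min, max, and average) in every size category.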


Decision

Based on customer requirements, we added Adler32 for performance (it is faster than CRC32) and SHA-256 for secure checksums.

https://github.com/codice/ddf/tree/2.27.x/libs/checksum/src/main/java/org/codice/ddf/checksum/impl
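
For orientation, both algorithms named in the decision are available in the JDK: Adler32 in java.util.zip and SHA-256 through java.security.MessageDigest. The sketch below only shows how they can be computed with those standard classes. It is not the DDF implementation at the link above, and the class and method names are invented for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.zip.Adler32;

/** Hypothetical example of computing Adler32 and SHA-256 with JDK classes. */
final class DecisionAlgorithmsSketch {

    /** Adler32 checksum of a byte array (fast, not cryptographically secure). */
    static long adler32(byte[] data) {
        Adler32 adler = new Adler32();
        adler.update(data, 0, data.length);
        return adler.getValue();
    }

    /** SHA-256 digest of a byte array as a lowercase hex string. */
    static String sha256Hex(byte[] data) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256").digest(data);
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is a required algorithm on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        byte[] data = "example payload".getBytes(StandardCharsets.UTF_8);
        System.out.println("Adler32: " + Long.toHexString(adler32(data)));
        System.out.println("SHA-256: " + sha256Hex(data));
    }
}
```

Adler32 suits integrity checks where speed matters; SHA-256 is the choice when the checksum must resist deliberate tampering.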