SimpleChecksums

Documentation for SimpleChecksums.

Synopsis

Installation

using Pkg
Pkg.add("SimpleChecksums")

Use

using SimpleChecksums

data = UInt8.(0:255);

# super-simple sums:
@assert additive_checksum(UInt16, data) == additive16(data) == 0x7f80
@assert additive_checksum(UInt32, data) == additive32(data) == 0x00007f80
@assert additive_checksum(UInt64, data) == additive64(data) == 0x0000000000007f80

# BSD and SYS-V checksums from GNU sum utility:
@assert bsd_checksum(UInt16, data) == bsd16(data) == 0x0200
@assert sysv_checksum(UInt16, data) == sysv16(data) == 0x7f80

# Fletcher's and Adler's checksums:
@assert fletcher_checksum(UInt16, data) == fletcher16(data) == 0x5500
@assert fletcher_checksum(UInt32, data) == fletcher32(data) == 0xaaaa7f80
@assert fletcher_checksum(UInt64, data) == fletcher64(data) == 0x002aaa8000007f80

@assert fletcher_checksum(UInt16, data, UInt16(0), 0x100) == fletcher16a(data) == 0x8080
@assert fletcher_checksum(UInt32, data, UInt32(0), 0x10000) == fletcher32a(data) == 0xaa807f80
@assert fletcher_checksum(UInt64, data, UInt64(0), 0x100000000) == fletcher64a(data) == 0x002aaa8000007f80

@assert fletcher_checksum(UInt32, data, UInt32(1), UInt32(65521), 5552) == adler32(data) == 0xadf67f81

Why?

Checksums are small summaries of data that can be used to detect errors introduced to the data during transmission or storage. In modern times, checksums have largely been replaced with cryptographic hashes like MD5 and SHA, but these functions and the computational power required to compute them in a reasonable amount of time did not exist when many of the data transmission and storage standards we use today were invented. If we want to use the checksums that appear in these standards in Julia code, it helps to have a standard library to compute them correctly and efficiently.

If you are building a new data transmission or storage standard and need a way to check for errors, consider using a cryptographic hash like SHA, or at least a better error-detecting code like CRC, before choosing any of these functions.

API

SimpleChecksums.additive_checksumMethod
additive_checksum(T, data, init::T = zero(T), modulo::T = typemax(T)) where {T<:Unsigned}

Compute a simple additive hash of the values in data.

Sums all the values in data, then computes modulo modulo, optionally starting from init instead of zero. Sums are allowed to overflow, which resets the sum to zero.

This function is extremely fast but suffers from many collisions. Flaws to keep in mind are:

  1. zero values in data do not affect the checksum value at all;
  2. the order of the values in data can be permuted without changing the checksum value; and
  3. because of the modulo operation, the checksum of all zeros is equal to the checksum of all modulo values.

Arguments

  • T: An unsinged integer type.
  • data: A single Unsigned value or an iterator of values.
  • init::Unsigned = zero(T): Optional starting value.
  • modulo::T = typemax(T): A number to use as the modulo for addition. Typical choices are typemax(T) (the default) or some prime number close to but less than typemax(T).

Predefined convenience functions are additive16, additive32, and additive64.

See also: sysv16 for a different way of folding the sum into a fixed number of bits.

source
SimpleChecksums.adler32Function
adler32(data, init::Unsigned = one(UInt32))

Compute Adler's 32-bit checksum.

Adler's 32-bit checksum is the same as Fletcher's 32-bit checksum with a modulus of 65521 (the largest 16-bit prime) and an inital value of one. This is a strict improvement on fletcher16, but has a higher probability of collision, worse likelihood of error detection, and is slower than fletcher32. Prefer fletcher32 instead.

source
SimpleChecksums.bsd_checksumFunction
bsd_checksum(UInt16, data, init::UInt16 = zero(UInt16))

Compute a hash of data using the method defined in BSD's sum utility (and implemented in the GNU sum program).

The BSD checksum computes a 16-bit checksum value by bit-rotating the checksum value from the previous step to the right by 1 bit, then adding the next value from data and repeating. Sums are allowed to overflow, which resets the sum to zero.

This function is fast but suffers from many collisions. Flaws to keep in mind are:

  1. zero values at the beginning of data do not affect the checksum value at all (unless the initial value is set to something other than zero); and
  2. runs of zeros in data that are multiples of 16 in length do not affect the checksum value at all.

Arguments

  • data: A single Unsigned value or an iterator of values.
  • init::UInt16 = zero(UInt16): Optional starting value.

Predefined convenience function is bsd16.

source
SimpleChecksums.fletcher16Function
fletcher16(data, [init, modulo])

Alias of fletcher_checksum(UInt16, data, init, modulo=0x00ff, blocksize=380368696).

source
SimpleChecksums.fletcher32Function
fletcher32(data, [init, modulo])

Alias of fletcher_checksum(UInt32, data, init, modulo=0x0000ffff, blocksize=23726746).

source
SimpleChecksums.fletcher32aFunction
fletcher32a(data, [init, modulo])

Alias of fletcher_checksum(UInt32, data, init, modulo=0x00010000, blocksize=23726565).

source
SimpleChecksums.fletcher64Function
fletcher64(data, [init, modulo])

Alias of fletcher_checksum(UInt64, data, init, modulo=0x00000000ffffffff, blocksize=92681).

source
SimpleChecksums.fletcher64aFunction
fletcher64a(data, [init, modulo])

Alias of fletcher_checksum(UInt64, data, init, modulo=0x0000000100000000, blocksize=92681).

source
SimpleChecksums.fletcher_checksumMethod
fletcher_checksum(T, data, init::T = zero(T), modulo::T = typemax(T), blocksize::Integer = 1)

Compute a hash of data using Fletcher's checksum.

Fletcher's checksum computes two values: c0, a running sum of values modulo some number, and c1, a running sum of c0 values modulo some number. The two values are contatenated bitwise into a single unsigned integer.

This function is fast and good at detecting small errors. Flaws to keep in mind are:

  1. zero values at the beginning of data do not affect the checksum value at all (unless the initial value is set to something other than zero); and
  2. the worst-case performance for this algorithm occurs when data are truly random, but all errors of up to 2 bits are detected on all runs of data of length modulo * (sizeof(T)*4) bits.

Arguments

  • T: An unsinged integer type.
  • data: A single Unsigned value or an iterator of values.
  • init::Unsigned = zero(T): Optional starting value.
  • modulo::T = typemax(T): Optional modulo value, applied independently to the sum and the sum-of-sums after each block is summed.
  • blocksize::Integer = 1: Optional block size for summing before applying the modulo operation. blocksize should be chosen with modulo, T, and eltype(data) in mind to make sure the sum operation does not overflow.

Predefined convenience functions are fletcher16, fletcher32, and fletcher64 for the standard modulo value of typemax(T) shifted right half the number of bits in the checksum, and fletcher16a, fletcher32a, and fletcher64a for the alternate modulo value of 1 left shifted half the number of bits in the checksum.

See also: adler32.

source
SimpleChecksums.sysv_checksumFunction
sysv_checksum(UInt16, data, init::UInt16 = zero(UInt16))

Compute a hash of data using the method defined in SYS-V's sum utility (and implemented in the GNU sum program).

The SYS-V checksum computes a 16-bit checksum value using a simple 32-bit additive sum, then folding the sum into a 16-bit number by adding the low 16 bits to the high 16 bits (shifted right by 16 bits). This fold is performed twice to guarantee the result is a 16-bit number. Unlike the SYS-V and GNU implementations, the sum in this implementation is allowed to overflow, which resets the sum to zero (SYS-V and GNU will raise an error instead).

This function is very fast but suffers from many collisions. Flaws to keep in mind are:

  1. zero values anywhere in data do not affect the checksum value at all (unless the initial value is set to something other than zero); and
  2. the order of the data can be permuted without affecting the checksum value.

Arguments

  • data: A single Unsigned value or an iterator of values.
  • init::UInt16 = zero(UInt16): Optional starting value.

Predefined convenience function is sysv16.

See also: additive16.

source