
Validating integrity of files with md5 checksum

Introduction

An MD5 checksum is a 32-character hexadecimal string that represents the 128-bit MD5 hash of a file's contents. If two files have the same MD5 checksum, there is a high probability that the two files are identical.
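The relationship between the 128-bit hash and its 32-character hexadecimal form can be seen with a minimal sketch using Python's standard hashlib module. The input bytes are only an illustration, not something taken from this article.

```python
import hashlib

# Hash a small in-memory sample (illustrative input)
digest = hashlib.md5(b"hello world")

raw = digest.digest()          # the hash as raw bytes
hex_form = digest.hexdigest()  # the same hash as hexadecimal text

print(len(raw) * 8)   # 128 (bits)
print(len(hex_form))  # 32 (hex characters, 4 bits each)
print(hex_form)       # 5eb63bbbe01eeed093cb22bb8f5acdc3
```

Each hexadecimal character encodes 4 bits, which is why a 128-bit hash is always printed as 32 characters.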

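It is very unlikely that any two non-identical files in the real world will have the same MD5 hash, unless they have been specifically created to have the same hash. Because such collisions can be constructed deliberately, MD5 is well suited to detecting accidental corruption but should not be relied on to detect deliberate tampering.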

After downloading or transferring a file, we can compute its MD5 checksum and compare it with the expected value. By doing this, we can verify the integrity of the file. Virtually any change to a file will cause its MD5 hash to change.
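As a small illustration of how sensitive the hash is to changes, the sketch below hashes two byte strings that differ by a single trailing character. The helper name md5_hex and the sample text are assumptions made for this example.

```python
import hashlib


def md5_hex(data: bytes) -> str:
    """Return the MD5 hash of data as a lowercase hex string."""
    return hashlib.md5(data).hexdigest()


original = b"The quick brown fox jumps over the lazy dog"
modified = original + b"."  # one extra character at the end

print(md5_hex(original))  # 9e107d9d372bb6826bd81d3542a419d6
print(md5_hex(modified))  # e4d909c290d0fb1ca068ffaddf22cbd0
```

The two results share no obvious pattern even though the inputs differ by one byte, which is what makes a checksum comparison a useful integrity check.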

There are a variety of tools and libraries available to calculate an MD5 checksum. Below are a few ways to do so.

Using md5sum

md5sum is a program that is included in most Unix-like operating systems and in compatibility layers such as Cygwin.

Compute md5sum for the input file
$ md5sum samplefile.txt
3b85ec9ab2984b91070128be6aae25eb  samplefile.txt
Check MD5 hash for multiple files
$ md5sum filetohashA.txt filetohashB.txt filetohashC.txt

595f44fec1e92a71d3e9e77456ba80d1  filetohashA.txt
71f920fa275127a7b60fa4d4d41432a3  filetohashB.txt
43c191bf6d6c3f263a8cd0efd4a058ab  filetohashC.txt
Check MD5
$ echo '595f44fec1e92a71d3e9e77456ba80d1  filetohashA.txt' | md5sum -c
filetohashA.txt: OK
Create MD5 hash file hash.md5
$ md5sum filetohashA.txt filetohashB.txt filetohashC.txt > hash.md5

$ cat hash.md5
595f44fec1e92a71d3e9e77456ba80d1  filetohashA.txt
71f920fa275127a7b60fa4d4d41432a3  filetohashB.txt
43c191bf6d6c3f263a8cd0efd4a058ab  filetohashC.txt

$ md5sum -c hash.md5
filetohashA.txt: OK
filetohashB.txt: OK
filetohashC.txt: OK
Checksum Fail
$ echo 'D43F2404CA13E22594E5C8B04D3BBB81  README.md' | md5sum -c
README.md: FAILED
md5sum: WARNING: 1 computed checksum did NOT match
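
The check performed by md5sum -c can be approximated in a few lines of Python. This is a rough sketch, not a reimplementation of md5sum: the function names and the base_dir parameter are hypothetical, and it assumes every line has the default layout of a 32-character hash, a space, a one-character mode flag, and the file name.

```python
import hashlib
import os
import tempfile


def md5_of_file(path, chunk_size=8192):
    """Return the MD5 hex digest of a file, read in chunks."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def check_listing(listing_path, base_dir="."):
    """Map each file named in an md5sum-style listing to True (match) or False."""
    results = {}
    with open(listing_path, "r", encoding="utf-8") as listing:
        for line in listing:
            line = line.rstrip("\n")
            if not line:
                continue
            # Columns 0-31: hash, 32: space, 33: mode flag, 34 onward: file name
            expected, name = line[:32].lower(), line[34:]
            actual = md5_of_file(os.path.join(base_dir, name))
            results[name] = actual == expected
    return results


# Demo with throwaway files; the names and contents are illustrative only
with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, "good.txt"), "wb") as f:
        f.write(b"hello world")
    with open(os.path.join(tmp, "bad.txt"), "wb") as f:
        f.write(b"tampered")
    with open(os.path.join(tmp, "hash.md5"), "w", encoding="utf-8") as f:
        f.write("5eb63bbbe01eeed093cb22bb8f5acdc3  good.txt\n")
        f.write("5eb63bbbe01eeed093cb22bb8f5acdc3  bad.txt\n")
    results = check_listing(os.path.join(tmp, "hash.md5"), base_dir=tmp)

for name, ok in results.items():
    print(f"{name}: {'OK' if ok else 'FAILED'}")
```

Unlike md5sum, this sketch does not handle missing files, escaped file names, or the BSD-style --tag format.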

Syntax and Options

md5sum [OPTION]… [FILE]…

$ md5sum --help
Usage: md5sum [OPTION]... [FILE]...
Print or check MD5 (128-bit) checksums.

With no FILE, or when FILE is -, read standard input.

  -b, --binary         read in binary mode
  -c, --check          read MD5 sums from the FILEs and check them
      --tag            create a BSD-style checksum
  -t, --text           read in text mode (default)
  -z, --zero           end each output line with NUL, not newline,
                       and disable file name escaping

The following five options are useful only when verifying checksums:
      --ignore-missing  don't fail or report status for missing files
      --quiet          don't print OK for each successfully verified file
      --status         don't output anything, status code shows success
      --strict         exit non-zero for improperly formatted checksum lines
  -w, --warn           warn about improperly formatted checksum lines

      --help     display this help and exit
      --version  output version information and exit

The sums are computed as described in RFC 1321.  When checking, the input
should be a former output of this program.  The default mode is to print a
line with checksum, a space, a character indicating input mode ('*' for binary,
' ' for text or where binary is insignificant), and name for each FILE.

GNU coreutils online help: <https://www.gnu.org/software/coreutils/>
Full documentation at: <https://www.gnu.org/software/coreutils/md5sum>
or available locally via: info '(coreutils) md5sum invocation'

Using Certutil

Certutil.exe is a Windows command-line program that is installed as part of Certificate Services. You can use Certutil.exe to dump and display certification authority (CA) configuration information, configure Certificate Services, back up and restore CA components, and verify certificates, key pairs, and certificate chains.

Certutil has many functions, mostly related to viewing and managing certificates, but the -hashfile subcommand can be used on any file to get a hash in MD5, SHA256, or several other formats.

Using the -hashfile option, we can generate and display a cryptographic hash for a specific file. The algorithm name is passed after the file name; if it is omitted, certutil uses SHA1.

C:\> certutil -hashfile -?

Hash algorithms: MD2 MD4 MD5 SHA1 SHA256 SHA384 SHA512
Compute MD5 hash using the -hashfile option
C:\> certutil -hashfile Readme.md MD5

MD5 hash of Readme.md
595f44fec1e92a71d3e9e77456ba80d1
CertUtil: -hashfile command completed successfully.
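
The idea of choosing the hash algorithm by name, as certutil -hashfile does, can be sketched in Python with hashlib.new. The function name hash_file and the SUPPORTED tuple are assumptions for this example, and the sketch covers only the algorithms from certutil's list that hashlib reliably provides (MD2 and MD4 are left out).

```python
import hashlib
import os
import tempfile

# Algorithm names accepted by this sketch (a subset of certutil's list)
SUPPORTED = ("MD5", "SHA1", "SHA256", "SHA384", "SHA512")


def hash_file(path, algorithm="SHA1", chunk_size=8192):
    """Hash a file with the named algorithm; the SHA1 default mirrors certutil."""
    name = algorithm.upper()
    if name not in SUPPORTED:
        raise ValueError(f"unsupported algorithm: {algorithm}")
    h = hashlib.new(name.lower())
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


# Demo on a throwaway file; the file name and contents are illustrative only
with tempfile.TemporaryDirectory() as tmp:
    sample = os.path.join(tmp, "Readme.md")
    with open(sample, "wb") as f:
        f.write(b"hello world")
    hashes = {name: hash_file(sample, name) for name in SUPPORTED}

for name, value in hashes.items():
    print(f"{name}: {value}")
```

Passing the algorithm explicitly, as in hash_file(sample, "MD5"), avoids accidentally comparing a SHA1 result against a published MD5 value.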

Using Java

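Using standard Java classes
import java.io.File;
import java.io.FileInputStream;
import java.io.InputStream;
import java.security.MessageDigest;

// Returns the MD5 digest of the file as "0x" followed by uppercase hex,
// or null if the file cannot be read. Strip the "0x" prefix and lowercase
// the result before comparing it with md5sum output.
public String checksum(File file) {
  // try-with-resources closes the stream even if reading fails
  try (InputStream fin = new FileInputStream(file)) {
    MessageDigest md5er = MessageDigest.getInstance("MD5");
    byte[] buffer = new byte[1024];
    int read;
    // Feed the file to the digest one buffer at a time
    while ((read = fin.read(buffer)) != -1) {
      md5er.update(buffer, 0, read);
    }
    byte[] digest = md5er.digest();
    StringBuilder strDigest = new StringBuilder("0x");
    for (byte b : digest) {
      // Mask to an unsigned value and pad to two hex digits
      strDigest.append(String.format("%02X", b & 0xff));
    }
    return strDigest.toString();
  } catch (Exception e) {
    return null;
  }
}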
Using Apache Commons dependency
org.apache.commons.codec.digest.DigestUtils
    .md5Hex(org.apache.commons.io.FileUtils.readFileToByteArray(file))
Using Spring and Apache Commons dependency
org.springframework.util.DigestUtils
    .md5DigestAsHex(org.apache.commons.io.FileUtils.readFileToByteArray(file))
Using Guava dependency
maven:
<dependency>
    <groupId>com.google.guava</groupId>
    <artifactId>guava</artifactId>
    <version>29.0-jre</version>
</dependency>

gradle:
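implementation group: 'com.google.guava', name: 'guava', version: '29.0-jre'

// Files.hash(file, ...) is deprecated in recent Guava releases; hash through a ByteSource instead
HashCode md5 = Files.asByteSource(file).hash(Hashing.md5());
byte[] md5Bytes = md5.asBytes();
String md5Hex = md5.toString();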
Using Fast MD5 dependency
maven:
<dependency>
    <groupId>com.joyent.util</groupId>
    <artifactId>fast-md5</artifactId>
    <version>2.7.1</version>
</dependency>

gradle:
implementation group: 'com.joyent.util', name: 'fast-md5', version: '2.7.1'

String hash = MD5.asHex(MD5.getHash(new File(filename)));

Using Python

import hashlib

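with open("your_filename.png", "rb") as f:
    file_hash = hashlib.md5()
    # Read in 8 KiB chunks so large files are not loaded into memory at once.
    # The := (walrus) operator requires Python 3.8 or newer.
    while chunk := f.read(8192):
        file_hash.update(chunk)

print(file_hash.digest())     # raw 16-byte digest
print(file_hash.hexdigest())  # to get a printable str instead of bytes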
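
The same chunked loop can be wrapped in a reusable function and paired with a comparison against a published checksum. This is a sketch under assumptions: the names md5_of_file and matches_md5 are made up for this example, and the loop uses iter() with a sentinel instead of the walrus operator so that it also runs on Python versions older than 3.8.

```python
import hashlib
import os
import tempfile


def md5_of_file(path, chunk_size=8192):
    """Return the MD5 hex digest of the file at path, read chunk by chunk."""
    file_hash = hashlib.md5()
    with open(path, "rb") as f:
        # iter() keeps calling f.read until it returns b"" at end of file
        for chunk in iter(lambda: f.read(chunk_size), b""):
            file_hash.update(chunk)
    return file_hash.hexdigest()


def matches_md5(path, expected_hex):
    """Compare a file's MD5 with an expected hex string, ignoring case and padding."""
    return md5_of_file(path) == expected_hex.strip().lower()


# Demo on a throwaway file larger than one chunk (illustrative data)
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.bin")
    with open(path, "wb") as f:
        f.write(b"a" * 20000)
    computed = md5_of_file(path)
    ok_upper = matches_md5(path, computed.upper())  # uppercase input still matches
    ok_wrong = matches_md5(path, "0" * 32)          # a wrong checksum is rejected

print(computed, ok_upper, ok_wrong)
```

Normalizing case before comparing is useful because some tools print the hash in uppercase while md5sum and hexdigest() use lowercase.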

Using Go

package main

import (
	"crypto/md5"
	"encoding/hex"
	"fmt"
	"io"
	"os"
)

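// hashFileMD5 returns the MD5 checksum of the file at filePath as a hex string.
func hashFileMD5(filePath string) (string, error) {
	file, err := os.Open(filePath)
	if err != nil {
		return "", err
	}
	defer file.Close()

	// Stream the file into the hash so it is never fully loaded into memory
	hash := md5.New()
	if _, err := io.Copy(hash, file); err != nil {
		return "", err
	}
	return hex.EncodeToString(hash.Sum(nil)), nil
}

func main() {
	// os.Args[0] is the program itself; the file to hash is the first argument
	if len(os.Args) < 2 {
		fmt.Fprintf(os.Stderr, "usage: %s <file>\n", os.Args[0])
		os.Exit(1)
	}
	hash, err := hashFileMD5(os.Args[1])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println(hash)
}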

Madan Narra

Software developer, Consultant & Architect

Madan is a software developer, writer, and ex-failed-startup co-founder. He has over 10 years of experience building scalable and distributed systems using Java, JavaScript, and Node.js. He writes about software design and architecture best practices with Java and is especially passionate about Microservices, API Development, Distributed Applications and Frontend Technologies.
