Validating the integrity of files with MD5 checksums
Last modified: 27 Jun, 2020
Introduction
An MD5 checksum is the 128-bit MD5 hash of a file, usually written as a 32-character hexadecimal number. If two files have the same MD5 checksum, there is a high probability that the two files are identical.
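It is very unlikely that two different files in the real world will have the same MD5 hash unless they were deliberately crafted to collide. Because such collisions can be constructed, an MD5 checksum reliably detects accidental corruption but should not be relied on to detect deliberate tampering.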
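To make the "128 bits, 32 hex characters" relationship concrete, here is a short illustration using Python's standard hashlib module. The input "abc" is one of the test strings listed in RFC 1321, and the digest in the comment is the value that RFC gives for it.

```python
import hashlib

# MD5 produces a 128-bit (16-byte) digest regardless of input size;
# written in hex (4 bits per character), that is 32 characters.
h = hashlib.md5(b"abc")
digest_bits = h.digest_size * 8
hex_length = len(h.hexdigest())
print(digest_bits)    # 128
print(hex_length)     # 32
print(h.hexdigest())  # 900150983cd24fb0d6963f7d28e17f72
```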
After downloading or transferring a file, or when comparing two copies, we can compute its MD5 checksum and compare it with the expected value (for example, one published alongside the download). Because virtually any change to a file changes its MD5 hash, matching checksums indicate that the file is intact.
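The compute-and-compare workflow can be sketched in a few lines of Python. This is a minimal illustration, not one of the tools described below: `md5_of_file` and `verify_md5` are hypothetical helper names, and the temporary file stands in for a downloaded file. Changing a single byte is enough to make verification fail.

```python
import hashlib
import os
import tempfile

def md5_of_file(path, chunk_size=8192):
    """Return a file's MD5 checksum as 32 lowercase hex characters."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        # Read fixed-size chunks so large files need not fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def verify_md5(path, expected):
    """Compare a file's checksum with an expected value (case-insensitive)."""
    return md5_of_file(path) == expected.strip().lower()

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.bin")
    with open(path, "wb") as f:
        f.write(b"hello world\n")
    expected = md5_of_file(path)            # checksum recorded before "transfer"
    intact = verify_md5(path, expected)     # True: file unchanged
    with open(path, "wb") as f:
        f.write(b"hello World\n")           # simulate corruption of a single byte
    corrupted = verify_md5(path, expected)  # False: checksum no longer matches

print(intact, corrupted)  # True False
```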
Many tools and programs can calculate an MD5 checksum. Below are a few ways to do so.
Using md5sum
md5sum is a program included in most Unix-like operating systems and in compatibility layers such as Cygwin.
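Compute the checksum of a single file:
$ md5sum samplefile.txt
3b85ec9ab2984b91070128be6aae25eb  samplefile.txt
Compute the checksums of several files at once:
$ md5sum filetohashA.txt filetohashB.txt filetohashC.txt
595f44fec1e92a71d3e9e77456ba80d1  filetohashA.txt
71f920fa275127a7b60fa4d4d41432a3  filetohashB.txt
43c191bf6d6c3f263a8cd0efd4a058ab  filetohashC.txt
Check a file against a known checksum supplied on standard input (checksum, two spaces, file name):
$ echo '595f44fec1e92a71d3e9e77456ba80d1  filetohashA.txt' | md5sum -c
filetohashA.txt: OK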
Save the checksums to a file, then verify all of them later with -c:
$ md5sum filetohashA.txt filetohashB.txt filetohashC.txt > hash.md5
$ cat hash.md5
595f44fec1e92a71d3e9e77456ba80d1  filetohashA.txt
71f920fa275127a7b60fa4d4d41432a3  filetohashB.txt
43c191bf6d6c3f263a8cd0efd4a058ab  filetohashC.txt
$ md5sum -c hash.md5
filetohashA.txt: OK
filetohashB.txt: OK
filetohashC.txt: OK
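If a checksum does not match, md5sum reports FAILED, prints a warning, and exits with a non-zero status:
$ echo 'D43F2404CA13E22594E5C8B04D3BBB81  README.md' | md5sum -c
README.md: FAILED
md5sum: WARNING: 1 computed checksum did NOT match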
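What `md5sum -c` does with a checksum list can be approximated in Python. This is a simplified sketch, not md5sum's actual implementation: `check_md5_lines`, `md5_hex`, and the `base_dir` parameter are hypothetical names, only the standard "checksum, space, mode character, name" line format is handled (no BSD `--tag` lines, no escaped file names), and malformed lines are silently skipped. md5sum itself resolves the listed names relative to the current directory; `base_dir` plays that role here.

```python
import hashlib
import os
import tempfile

def md5_hex(path):
    """MD5 checksum of a file as lowercase hex, read in chunks."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            h.update(chunk)
    return h.hexdigest()

def check_md5_lines(lines, base_dir="."):
    """Return {file name: "OK" or "FAILED"} for md5sum-style lines."""
    results = {}
    for line in lines:
        line = line.rstrip("\r\n")
        # 32 hex digits, a space, a mode character (' ' text, '*' binary), the name.
        digest, sep, mode, name = line[:32], line[32:33], line[33:34], line[34:]
        is_hex = len(digest) == 32 and all(c in "0123456789abcdefABCDEF" for c in digest)
        if not is_hex or sep != " " or mode not in (" ", "*") or not name:
            continue  # not a standard md5sum line; skip it
        actual = md5_hex(os.path.join(base_dir, name))
        results[name] = "OK" if actual == digest.lower() else "FAILED"
    return results

with tempfile.TemporaryDirectory() as tmp:
    for name, data in [("a.txt", b"alpha\n"), ("b.txt", b"beta\n")]:
        with open(os.path.join(tmp, name), "wb") as f:
            f.write(data)
    lines = [
        md5_hex(os.path.join(tmp, "a.txt")) + "  a.txt",  # correct checksum
        "0" * 32 + "  b.txt",                             # deliberately wrong
    ]
    results = check_md5_lines(lines, base_dir=tmp)

print(results)  # {'a.txt': 'OK', 'b.txt': 'FAILED'}
```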
Syntax and Options
Usage: md5sum [OPTION]... [FILE]...
Print or check MD5 (128-bit) checksums.
With no FILE, or when FILE is -, read standard input.
  -b, --binary          read in binary mode
  -c, --check           read MD5 sums from the FILEs and check them
      --tag             create a BSD-style checksum
  -t, --text            read in text mode (default)
  -z, --zero            end each output line with NUL, not newline,
                          and disable file name escaping
The following five options are useful only when verifying checksums:
      --ignore-missing  don't fail or report status for missing files
      --quiet           don't print OK for each successfully verified file
      --status          don't output anything, status code shows success
      --strict          exit non-zero for improperly formatted checksum lines
  -w, --warn            warn about improperly formatted checksum lines
      --help            display this help and exit
      --version         output version information and exit
The sums are computed as described in RFC 1321. When checking, the input
should be a former output of this program. The default mode is to print a
line with checksum, a space, a character indicating input mode ('*' for binary,
' ' for text or where binary is insignificant), and name for each FILE.
GNU coreutils online help: <https://www.gnu.org/software/coreutils/>
Full documentation at: <https://www.gnu.org/software/coreutils/md5sum>
or available locally via: info '(coreutils) md5sum invocation'
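The output format described in the help text (checksum, a space, a mode character, then the name) and the BSD-style format selected by `--tag` can be reproduced in Python for illustration. `md5sum_line` and `md5sum_tag_line` are hypothetical helpers that take the file contents as bytes; they do not implement md5sum's escaping of unusual file names. The digest shown is the MD5 of empty input.

```python
import hashlib

def md5sum_line(data, name, binary=False):
    """Build a line in md5sum's default format: checksum, a space,
    a mode character ('*' for binary, ' ' for text), then the file name."""
    mode = "*" if binary else " "
    return f"{hashlib.md5(data).hexdigest()} {mode}{name}"

def md5sum_tag_line(data, name):
    """Build a BSD-style line like the one `md5sum --tag` prints."""
    return f"MD5 ({name}) = {hashlib.md5(data).hexdigest()}"

text_line = md5sum_line(b"", "empty.txt")
binary_line = md5sum_line(b"", "empty.txt", binary=True)
tag_line = md5sum_tag_line(b"", "empty.txt")
print(text_line)    # d41d8cd98f00b204e9800998ecf8427e  empty.txt
print(binary_line)  # d41d8cd98f00b204e9800998ecf8427e *empty.txt
print(tag_line)     # MD5 (empty.txt) = d41d8cd98f00b204e9800998ecf8427e
```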
Using Certutil
Certutil.exe is a Windows command-line program that is installed as part of Certificate Services. You can use Certutil.exe to dump and display certification authority (CA) configuration information, configure Certificate Services, back up and restore CA components, and verify certificates, key pairs, and certificate chains.
Certutil has many functions, mostly related to viewing and managing certificates, but the -hashfile subcommand can be used on any file to compute a hash with MD5, SHA256, or several other algorithms.
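With the -hashfile option, we can generate and display a cryptographic hash for a specific file. The hash algorithm is given after the file name; if it is omitted, certutil uses SHA1, so MD5 must be requested explicitly.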
C:\> certutil -hashfile -?
Hash algorithms: MD2 MD4 MD5 SHA1 SHA256 SHA384 SHA512
C:\> certutil -hashfile Readme.md MD5
MD5 hash of Readme.md
595f44fec1e92a71d3e9e77456ba80d1
CertUtil: -hashfile command completed successfully.
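Like certutil, Python's hashlib can hash the same file with several algorithms. The sketch below is only an analogue of certutil's behavior: `hash_file` is a hypothetical helper whose SHA-1 default mirrors certutil's, and MD2/MD4 are left out because hashlib does not guarantee them. The expected digests for "abc" are the standard published test values.

```python
import hashlib
import os
import tempfile

def hash_file(path, algorithm="sha1"):
    """Hex digest of a file with any hashlib algorithm name.

    The SHA-1 default mirrors certutil -hashfile, which uses SHA1
    when no algorithm is given."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            h.update(chunk)
    return h.hexdigest()

# Write a small test file, then hash it with several algorithms.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.txt")
    with open(path, "wb") as f:
        f.write(b"abc")
    digests = {alg: hash_file(path, alg)
               for alg in ("md5", "sha1", "sha256", "sha384", "sha512")}

for alg, hexdigest in digests.items():
    # Each hex character encodes 4 bits of the digest.
    print(f"{alg}: {len(hexdigest) * 4}-bit {hexdigest}")
```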
Using Java
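import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
public String checksum(File file) throws IOException {
    MessageDigest md5er;
    try {
        md5er = MessageDigest.getInstance("MD5");
    } catch (NoSuchAlgorithmException e) {
        // Every Java platform is required to support MD5, so this should not happen.
        throw new IllegalStateException(e);
    }
    // Read the file in chunks; try-with-resources closes the stream even on error.
    try (InputStream fin = Files.newInputStream(file.toPath())) {
        byte[] buffer = new byte[8192];
        int read;
        while ((read = fin.read(buffer)) != -1) {
            md5er.update(buffer, 0, read);
        }
    }
    // Format the 16-byte digest as 32 lowercase hex characters, as md5sum prints it.
    StringBuilder hex = new StringBuilder();
    for (byte b : md5er.digest()) {
        hex.append(String.format("%02x", b));
    }
    return hex.toString();
}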
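Using Apache Commons Codec (the file is read with Apache Commons IO):
org.apache.commons.codec.digest.DigestUtils
    .md5Hex(org.apache.commons.io.FileUtils.readFileToByteArray(file))
Using Spring's DigestUtils:
org.springframework.util.DigestUtils
    .md5DigestAsHex(org.apache.commons.io.FileUtils.readFileToByteArray(file))
Both snippets read the whole file into memory, so they suit small files best.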
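Using Guava:
maven:
<dependency>
  <groupId>com.google.guava</groupId>
  <artifactId>guava</artifactId>
  <version>29.0-jre</version>
</dependency>
gradle:
implementation group: 'com.google.guava', name: 'guava', version: '29.0-jre'
import com.google.common.hash.HashCode;
import com.google.common.hash.Hashing;
import com.google.common.io.Files;
// asByteSource(file).hash(...) streams the file through the hash function.
HashCode md5 = Files.asByteSource(file).hash(Hashing.md5());
byte[] md5Bytes = md5.asBytes();
String md5Hex = md5.toString(); // 32 lowercase hex characters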
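Using the Fast MD5 library:
maven:
<dependency>
  <groupId>com.joyent.util</groupId>
  <artifactId>fast-md5</artifactId>
  <version>2.7.1</version>
</dependency>
gradle:
implementation group: 'com.joyent.util', name: 'fast-md5', version: '2.7.1'
import java.io.File;
import com.twmacinta.util.MD5;
// getHash reads the file and returns the 16-byte digest; asHex formats it as hex.
String hash = MD5.asHex(MD5.getHash(new File(filename)));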
Using Python
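import hashlib
# Hash the file in 8 KiB chunks so large files need not fit in memory.
# The := (walrus) operator requires Python 3.8 or later.
with open("your_filename.png", "rb") as f:
    file_hash = hashlib.md5()
    while chunk := f.read(8192):
        file_hash.update(chunk)
print(file_hash.digest())     # raw 16-byte digest (bytes)
print(file_hash.hexdigest())  # printable 32-character hex string, as md5sum shows it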
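On Python 3.11 and later, the standard library's `hashlib.file_digest` performs the chunked read for you. The sketch below uses it when available and falls back to a manual loop (without the walrus operator) on older versions; `md5_file` is a hypothetical helper name.

```python
import hashlib
import os
import sys
import tempfile

def md5_file(path):
    """Return a file's MD5 checksum as a hex string."""
    with open(path, "rb") as f:
        if sys.version_info >= (3, 11):
            # Python 3.11+: hashlib.file_digest reads the file object for us.
            return hashlib.file_digest(f, "md5").hexdigest()
        # Older versions: manual chunked loop.
        h = hashlib.md5()
        for chunk in iter(lambda: f.read(8192), b""):
            h.update(chunk)
        return h.hexdigest()

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.txt")
    with open(path, "wb") as f:
        f.write(b"abc")
    checksum = md5_file(path)

print(checksum)  # 900150983cd24fb0d6963f7d28e17f72
```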
Using Go
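package main
import (
	"crypto/md5"
	"encoding/hex"
	"fmt"
	"io"
	"os"
)
// hashFileMD5 returns the MD5 checksum of the file at filePath
// as a 32-character lowercase hex string, like md5sum prints.
func hashFileMD5(filePath string) (string, error) {
	file, err := os.Open(filePath)
	if err != nil {
		return "", err
	}
	defer file.Close()
	// Stream the file through the hash instead of reading it all into memory.
	hash := md5.New()
	if _, err := io.Copy(hash, file); err != nil {
		return "", err
	}
	return hex.EncodeToString(hash.Sum(nil)), nil
}
func main() {
	// os.Args[0] is the program itself; the file to hash is the first argument.
	if len(os.Args) < 2 {
		fmt.Fprintf(os.Stderr, "usage: %s FILE\n", os.Args[0])
		os.Exit(2)
	}
	hash, err := hashFileMD5(os.Args[1])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println(hash)
}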