Sanity check for BZ2 files before decompression

To check whether a file's integrity is intact without expanding it to disk, you can either decompress it to /dev/null or use bzip2's built-in test option:

bzip2 -dc file.bz2 > /dev/null
bzip2 -tv file.bz2
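bzip2 -t reports its result through the exit status (0 when the file tests clean), so the check is easy to script. Here is a minimal sketch; the function name check_bz2_files is made up for this example, and it assumes bzip2 is on your PATH.

```shell
# Hypothetical helper: test each file given as an argument with bzip2 -t
# and print one status line per file. Returns non-zero if any file fails.
check_bz2_files() {
    failed=0
    for f in "$@"; do
        if bzip2 -t -- "$f" 2>/dev/null; then
            echo "OK   $f"
        else
            echo "BAD  $f"
            failed=1
        fi
    done
    return "$failed"
}
```

You would call it as check_bz2_files file.bz2. Keep in mind that this still decompresses the whole file in memory, so it costs the same CPU time as the two commands above.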

Now, the problem can be that the set of split files was not complete! So when I executed

cat file.bz2.001 file.bz2.002 file.bz2.003 file.bz2.004 > file.bz2

I thought that 004 was the last part, but it was not. How can I know that without wasting CPU cycles? Let me show you.

NOTE: If you have not yet concatenated the files, you can do the header check on the first part and the footer check on the last one 😉 In my case, I had already concatenated them (not a problem, I can append more parts to it later), so I am doing both checks on the same file.
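Appending parts that turn up later can be sketched like this. The helper name append_parts and the part names in the usage line are hypothetical; the only thing that matters is that the parts are given in the right order.

```shell
# Hypothetical helper: append the given parts, in the order listed,
# to the end of an already concatenated file.
append_parts() {
    target=$1
    shift
    for part in "$@"; do
        cat -- "$part" >> "$target"
    done
}
```

For example: append_parts file.bz2 file.bz2.005 file.bz2.006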

Check the header. Every bzip2 file starts with 42 5a 68 xx, which is the ASCII text BZh followed by a digit from 1 to 9 that gives the block size:

head -c4 file.bz2 | hexdump -C

Should result in

00000000  42 5a 68 39                                       |BZh9|

And it does
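If you want to script the header check instead of reading the hexdump by eye, something along these lines should work. It is only a sketch: the function name bz2_has_header is made up, and it assumes head -c and od are available, as they are on most Linux and BSD systems.

```shell
# Hypothetical helper: succeed if the file starts with "BZh" followed by
# a block-size digit from 1 to 9 (hex 42 5a 68 31..39).
bz2_has_header() {
    magic=$(head -c 4 "$1" | od -An -v -tx1 | tr -d ' \t\n')
    case $magic in
        425a683[1-9]) return 0 ;;
        *) return 1 ;;
    esac
}
```

For example: bz2_has_header file.bz2 && echo "header looks fine"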

But when I check the footer with the command

tail -c64 file.bz2 | hexdump -C

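A complete bzip2 stream ends with the end-of-stream marker 17 72 45 38 50 90, followed by a 4-byte checksum, so the marker should show up near the end of that output. Keep in mind that bzip2 data is bit-aligned, so the marker only appears as these exact bytes when it happens to land on a byte boundary. What I expected to see was a line like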

00000000  17 72 45 38 50 90 00 00  00 00 00 00 00 00 00 00  |.rE8P...........|

But it is not. In my case it was

00000000  e4 fc 2e 36 2f e7 d5 bf  74 d5 3b 0f ef 4a 61 15  |...6/...t.;..Ja.|

That looks like arbitrary compressed data rather than an end-of-stream marker.
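Because the marker can sit at any bit offset, a more robust footer check is to search for it bit by bit. The sketch below does that over the last 11 bytes of the file (6 bytes of marker, 4 bytes of checksum, and up to 1 byte of padding). The function names are made up for this example, and it assumes the file holds a single bzip2 stream with nothing appended after it.

```shell
# Convert a line of lowercase hex digits on stdin into a string of bits.
hex_to_bits() {
    awk 'BEGIN { split("0000 0001 0010 0011 0100 0101 0110 0111 1000 1001 1010 1011 1100 1101 1110 1111", nib, " ") }
         { for (i = 1; i <= length($0); i++)
               printf "%s", nib[index("0123456789abcdef", substr($0, i, 1))] }'
}

# Hypothetical helper: succeed if the 48-bit end-of-stream marker
# 0x177245385090 occurs at any bit offset in the last 11 bytes of the file.
bz2_has_end_marker() {
    marker=$(echo 177245385090 | hex_to_bits)
    tail_bits=$(tail -c 11 "$1" | od -An -v -tx1 | tr -d ' \t' | hex_to_bits)
    case $tail_bits in
        *"$marker"*) return 0 ;;
        *) return 1 ;;
    esac
}
```

For example: bz2_has_end_marker file.bz2 || echo "no end marker, parts are probably missing". Finding the marker only suggests that the last part is present; it does not prove the data in between is intact, so bzip2 -t is still the final word.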

In conclusion, I should find all the remaining parts and concatenate them to this file before wasting time decompressing it.
