Repair Corrupted PDF

One of our clients had two of his PDF files corrupted. Open either file in Acrobat Reader and all you get is an error message: "There was an error opening this document. The file is damaged and could not be repaired." Since those PDF files had previously been saved into the document management system that I designed, it somehow became my responsibility to return sanity to his files.

Anyway, the customer is always right.

So I began my quest to repair these corrupted PDF files. Being an Open Source die-hard, I naturally gave FOSS solutions a try first. Searches on both Google and Yahoo turned up nothing. All the existing command-line tools (ghostscript and the assorted pdf2blah converters) bark at those files, telling me that the cross-reference table cannot be rebuilt and that nothing can be extracted.
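For context, the "cross-reference table cannot be rebuilt" complaint refers to the xref table near the end of a PDF, which records the byte offset of every numbered object. A common way to rebuild it is to brute-force scan the file for `N G obj` headers. The Python below is a minimal sketch of that idea, not what any of the tools above actually do, and it assumes a classic (uncompressed) xref layout: it does not look inside compressed object streams (PDF 1.5+), does not fix damaged stream lengths, does not write the trailer (which would also need `/Size` and a `/Root` entry pointing at the catalog), and its regex can be fooled by "obj"-like bytes inside binary streams. The function names `scan_objects` and `build_xref` are hypothetical.

```python
import re
from typing import Dict, Tuple

# An indirect object header such as "12 0 obj": object number, generation, keyword.
# The lookbehind stops the pattern from matching the tail of a longer number.
OBJ_HEADER = re.compile(rb"(?<!\d)(\d+)\s+(\d+)\s+obj\b")


def scan_objects(data: bytes) -> Dict[int, Tuple[int, int]]:
    """Map object number -> (generation, byte offset) by scanning the raw bytes.

    Heuristic: later headers overwrite earlier ones, since incremental updates
    append newer versions of objects to the end of the file.
    """
    offsets: Dict[int, Tuple[int, int]] = {}
    for m in OBJ_HEADER.finditer(data):
        num, gen = int(m.group(1)), int(m.group(2))
        offsets[num] = (gen, m.start())
    return offsets


def build_xref(offsets: Dict[int, Tuple[int, int]]) -> bytes:
    """Emit a single-subsection classic xref table (each entry is exactly 20 bytes)."""
    size = max(offsets, default=0) + 1
    # Object numbers with no header found become free entries, linked into the
    # free list that starts at object 0 and whose last entry points back to 0.
    free = [n for n in range(1, size) if n not in offsets]
    next_free = iter(free[1:] + [0])
    out = [b"xref\n", b"%d %d\n" % (0, size)]
    out.append(b"%010d 65535 f\r\n" % (free[0] if free else 0))
    for n in range(1, size):
        if n in offsets:
            gen, off = offsets[n]
            out.append(b"%010d %05d n\r\n" % (off, gen))
        else:
            out.append(b"%010d 00000 f\r\n" % next(next_free))
    return b"".join(out)
```

To try it on a damaged file, read the file in binary mode and pass the bytes through `scan_objects` and then `build_xref`. If an object number turns up more than once, the last occurrence wins, which matches how incremental saves append to a PDF; whether that is the right choice for a particular corrupted file is a judgment call.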

pdftk

Then, through a forum, I was referred to pdftk, the PDF toolkit. Its main purpose is merging and splitting PDF files, but it also claims to "repair corrupted PDF (where possible)".

It turns out that "where possible" means that most of the time it is not possible. It failed to open the files, with a very terse error message, let alone repair them. Since pdftk is GPL'ed, I thought I might be able to add more debugging output to figure out what exactly was wrong with these corrupted files. Then it turned out that it is actually written in Java and needs gcj to compile. Setting up the build environment was too much work, so I gave up.

Solution

Update: I realised that publishing something like this is probably not fair to the commercial companies behind some of these PDF repair programs. I have kept the content, encrypted to my own key, for my own reference.

-----BEGIN PGP MESSAGE-----
Version: GnuPG v1.4.1 (GNU/Linux)

hQIOAyUSukYgoM25EAf/Sf5ZcVs2OARrExFVYqGyE/5jA/tEbHyfeZ0ilKLeknWf
kG47gya6AinSlsqLcYW12vBwFVebG054n8n+ytIAMCTMmYBNDPqR6IimkaKRbohy
rbrl8JdKs28x6kDFiiN3iVSM7EEKZ7zcGfgwApITAQKnPAd9+ZQgNa+u0N+PH4SZ
DXLwlwmLU1zT261v9CeLI5sdOKQP7w47TKVC6XoBP8/enjcNj6vy5PWo3z+Lst+S
ywiCRpFE6LxrHZk10BdP8iZIspkD4Fh3MFfRLMYXlnYNFqlAqoVS2sp1B/YnHi2i
8J7HhM4wxsvcIPVsmDX9o0xyDQWx5PoY7hYPB1YHlQf+IjBJL1TulLRgyj+JJrvc
Bn/DpbKknEDFIoShEQc5Ic8Qyk4Yw3pLdYHhRwwk+CFh8aWCjDb/10AOF51XnymL
TPgtNrsX8LV+kHsA5Wvp5BE7f+GXsRXyyIRe9Ibl6RUKmLo/niVvdeeZGeJE49ne
0RZdUmkCzyQiMvXGzdTEuoWTgDC4QJLJWWckFPvBfDE/WLUW4oe3b3zFqbOSkA0E
YcWiIvm6r05H4X8gPfDUh6JURjOUFQUJChYZETpvLftq3ssipCYEht0sHwlJNIrB
vO6NQhQhFibqPhvREfUtvZ8+Y9EsFmhAaQJJPry/tZjTAIvPiTpyAFRzHKE2Y0BU
tNLqAcjYZiHMNw+c4rciwfP+GBgBanKc6KiqWjhlmg6mTAGcVppBrhtPMqbUGi4p
c3jH8chZB82eqH77+gVgsYM/fAlP8R7xXHM+Gyt7PxHgrXwFUwk6MS0AnXIgSm+v
9xR3/M1B2mFYcFK3MbCLr8pLNaU5dlRmnWEBD50+hcMTy0CU1yYYoo/Y3iuXitHV
x5upbPemaqTMPg+7Fc/nhKLSa7FeKnAqTclCl2kSSy4/QFWHXJdlZTPbM//1Rs6u
Y+bAEVl7TfUsesaxwwY8Do5ER3L9TfV08RUGHFUVAh2WrR/wLZpvtsLGWJLKFSYz
eEvKWH2M9ptxYq2eIevZt/zlVqp6SaJIHTHfxgbI9E1KD4FVG/J0FKZFshS2glsI
2fugxax1joOdN1ixAooZ+M/A9OFrAhR6hACxjvdtZ1dmMDCE0d84vWK8SNi7bqYt
NzA2qEsl5/rTfCTWK6yZkhBPi+yFcufuiyGQTS32NCieaKk3w4D2WEUPinH4+Otj
c264qVFt88BQxOX/YZQViCXoCsXgxj17TSswt9HHsHbW0XrS4mvOP7rCig1OypS9
3zmLZBxFG4tl9YFVx3auxTINmCRqyOK51j4r4f+He1dLOcnWV2OX+9+Hax6XJvay
w79siUoXtw6niCVElhX7dkZ2FW6wCtKtP2Z/aeLjsWw+wjl+3tWWcX8s0M6Qo4Px
KXglBLQZc+vdzuufZ1e2aJrtU+rTIx5KqlCeB3GQCznevGLRc5tf85whPwpzf1eD
Hi7K55AkFV+MoKwBcky4cPEJlwOOWVwOLUF3STy9cYDdatraO86pCqbMpYm+xldX
lrSq44+0yoIlrfVHF+wpiMwZfxisUt8C5joUWdocJ4r3H8y4j1uklgwlqHjEE2n9
+i7SZc2p/50IJFn0lJm5pciOGBevGMkhGKfIKtvGzx2HsLI1qLHiwb1qC/vv97dZ
D3YTqU0yPO7GaYq5YdKiecZiXNruUbbpB/0XxpQ27e0JMeWLKWj5GF+1KNVUO7jo
NYIaLiKBltpcQydZBPv2oeVZ+fa7mtpliMshwFURuNcWtQOabRwe34YMdgHWI3kQ
tOH23KYLN+MdeAyeWbs0QJvdGzZYWyC/r0HQBLk2lS3neqvPOSrXmzU9PzwLGotz
Ed1BXN94daXRp3mqbLhYWgulsLyjraJrcwj9tp08S+du4hU+z4sFLxkwjfcLrayB
b5oPz0PPVs5Asl5K2WFzWKg6Ef+/NxgaDK8MBBC8JFuZgIIMZ+angqBgP5MFn8rJ
17QORMJxplKEVYhVAMh++Js0ytvZn42ZLIQ9crU73DjZsex11D3Jr09p+10FS/uB
VW2bCJSq+/t5tVkahmkl4px2GMBIFiKGepeui6oaTjOH9dT6o+huLqJYHRvrkxgJ
Tup29T7bCESqV2TggHRZzFKTcf7Vr1r9UZsW7IBBNbJcxJvHCHkrBU41iOPhR6aq
1hldc+JHzy+7RALHxQfVbPDvr6nebRVuURQ11lgkWIDB1hVz9K0GhOw71bYllG/o
I/DfjW18X0awlcO5azQ9jVVKzGbESsS5u6gyTd7HlzTopswAyZC7TmjjOqfhgZnS
hlWCwoHIC7WBUpDS6mSn4Tuskc4e4pGvFN2WpL9xtNy+pKSqqRKk8y24kpwAmIn4
3pHTd35oW98SLJbS8eEodht1J7G1wQvnKHpAiG8eYBg85vXK17nASbWqpveg
=T8oG
-----END PGP MESSAGE-----

Update again: The vendor in question has since fixed their program so that repaired PDFs can no longer be copied out of the evaluation version. Good on them. Sorry, I do not have a copy of the older version of that program (so there is no need to ask me for it).

Verdict

So in the end I managed to repair those two corrupted PDF files and keep the client happy, using just the evaluation version of a commercial program. As for making a crippled evaluation version: there is little point crippling the application itself, as there may be trivial ways of getting around it. Cripple the output instead to annoy potential customers, and they might buy your software if they are desperate enough.