One of our clients had two of his PDF files corrupted. Try to open either file in Acrobat Reader and all you get is an error message: “There was an error opening this document. The file is damaged and could not be repaired.” Since those PDF files had previously been saved into the document management system that I designed, somehow it became my responsibility to return sanity to his files.
Anyway, the customer is always right.
So I began my quest to repair these corrupted PDF files. Being an Open Source die-hard, I would obviously give FOSS solutions a try first. Neither Google nor Yahoo searches revealed anything useful. All the existing command-line tools, ghostscript and the various pdf2blah converters, barked at those files, telling me that the cross-reference table could not be rebuilt and that nothing could be extracted.
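For anyone wondering what "rebuilding the cross-reference table" actually involves: the table is an index of byte offsets for each numbered object in the file, and a repair tool can try to recreate it by scanning the raw bytes for object headers. The following is only a minimal sketch of that idea in Python, not what ghostscript or any other tool really does. The function names are my own, and it assumes a classic (uncompressed) PDF whose objects start with the usual `N G obj` marker. It can be fooled by binary stream data that happens to look like a header, and it does not chain free entries properly.

```python
import re

# Matches the start of an indirect object, e.g. b"12 0 obj".
OBJ_HEADER = re.compile(rb"(\d+)\s+(\d+)\s+obj\b")


def scan_object_offsets(data: bytes) -> dict[int, tuple[int, int]]:
    """Map object number -> (byte offset, generation) by brute-force scanning."""
    offsets = {}
    for match in OBJ_HEADER.finditer(data):
        obj_num = int(match.group(1))
        generation = int(match.group(2))
        # A later definition overwrites an earlier one, since incremental
        # updates append newer copies of an object towards the end of the file.
        offsets[obj_num] = (match.start(), generation)
    return offsets


def build_xref_table(offsets: dict[int, tuple[int, int]]) -> bytes:
    """Render a classic xref section from the scanned offsets."""
    size = max(offsets, default=0) + 1
    # Object 0 is always the head of the free list.
    lines = [b"xref", b"0 %d" % size, b"0000000000 65535 f "]
    for num in range(1, size):
        if num in offsets:
            offset, generation = offsets[num]
            lines.append(b"%010d %05d n " % (offset, generation))
        else:
            # Simplification: mark missing objects as free without
            # linking them into a proper free list.
            lines.append(b"0000000000 00000 f ")
    # Each entry is 20 bytes: 18 characters, a space, and a line feed.
    return b"\n".join(lines) + b"\n"
```

A real repair would also have to locate the document catalog, write a new trailer and `startxref` pointer, and cope with truncated or overwritten objects, which is presumably where the free tools gave up on my client's files.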
Then through a forum I was referred to pdftk — the PDF toolkit. The main functionality is for merging and splitting multiple PDF files, but it also claims to “repair corrupted PDF (where possible)”.
It turns out that "where possible" means that most of the time it is not possible. It failed to open the files and printed only a minimal error message, let alone repairing them. Since pdftk is GPL'ed, I thought I might be able to add more debugging output to figure out what exactly was wrong with these corrupted files. Then it turned out that it is actually developed in Java and needs gcj to compile. Setting up the build environment was too much work, so I gave up.
Update: I realised that publishing something like this is probably not good for the commercial companies behind some of this PDF repair software. I still keep the content, encrypted to myself, for my own reference.
-----BEGIN PGP MESSAGE----- Version: GnuPG v1.4.1 (GNU/Linux) hQIOAyUSukYgoM25EAf/Sf5ZcVs2OARrExFVYqGyE/5jA/tEbHyfeZ0ilKLeknWf kG47gya6AinSlsqLcYW12vBwFVebG054n8n+ytIAMCTMmYBNDPqR6IimkaKRbohy rbrl8JdKs28x6kDFiiN3iVSM7EEKZ7zcGfgwApITAQKnPAd9+ZQgNa+u0N+PH4SZ DXLwlwmLU1zT261v9CeLI5sdOKQP7w47TKVC6XoBP8/enjcNj6vy5PWo3z+Lst+S ywiCRpFE6LxrHZk10BdP8iZIspkD4Fh3MFfRLMYXlnYNFqlAqoVS2sp1B/YnHi2i 8J7HhM4wxsvcIPVsmDX9o0xyDQWx5PoY7hYPB1YHlQf+IjBJL1TulLRgyj+JJrvc Bn/DpbKknEDFIoShEQc5Ic8Qyk4Yw3pLdYHhRwwk+CFh8aWCjDb/10AOF51XnymL TPgtNrsX8LV+kHsA5Wvp5BE7f+GXsRXyyIRe9Ibl6RUKmLo/niVvdeeZGeJE49ne 0RZdUmkCzyQiMvXGzdTEuoWTgDC4QJLJWWckFPvBfDE/WLUW4oe3b3zFqbOSkA0E YcWiIvm6r05H4X8gPfDUh6JURjOUFQUJChYZETpvLftq3ssipCYEht0sHwlJNIrB vO6NQhQhFibqPhvREfUtvZ8+Y9EsFmhAaQJJPry/tZjTAIvPiTpyAFRzHKE2Y0BU tNLqAcjYZiHMNw+c4rciwfP+GBgBanKc6KiqWjhlmg6mTAGcVppBrhtPMqbUGi4p c3jH8chZB82eqH77+gVgsYM/fAlP8R7xXHM+Gyt7PxHgrXwFUwk6MS0AnXIgSm+v 9xR3/M1B2mFYcFK3MbCLr8pLNaU5dlRmnWEBD50+hcMTy0CU1yYYoo/Y3iuXitHV x5upbPemaqTMPg+7Fc/nhKLSa7FeKnAqTclCl2kSSy4/QFWHXJdlZTPbM//1Rs6u Y+bAEVl7TfUsesaxwwY8Do5ER3L9TfV08RUGHFUVAh2WrR/wLZpvtsLGWJLKFSYz eEvKWH2M9ptxYq2eIevZt/zlVqp6SaJIHTHfxgbI9E1KD4FVG/J0FKZFshS2glsI 2fugxax1joOdN1ixAooZ+M/A9OFrAhR6hACxjvdtZ1dmMDCE0d84vWK8SNi7bqYt NzA2qEsl5/rTfCTWK6yZkhBPi+yFcufuiyGQTS32NCieaKk3w4D2WEUPinH4+Otj c264qVFt88BQxOX/YZQViCXoCsXgxj17TSswt9HHsHbW0XrS4mvOP7rCig1OypS9 3zmLZBxFG4tl9YFVx3auxTINmCRqyOK51j4r4f+He1dLOcnWV2OX+9+Hax6XJvay w79siUoXtw6niCVElhX7dkZ2FW6wCtKtP2Z/aeLjsWw+wjl+3tWWcX8s0M6Qo4Px KXglBLQZc+vdzuufZ1e2aJrtU+rTIx5KqlCeB3GQCznevGLRc5tf85whPwpzf1eD Hi7K55AkFV+MoKwBcky4cPEJlwOOWVwOLUF3STy9cYDdatraO86pCqbMpYm+xldX lrSq44+0yoIlrfVHF+wpiMwZfxisUt8C5joUWdocJ4r3H8y4j1uklgwlqHjEE2n9 +i7SZc2p/50IJFn0lJm5pciOGBevGMkhGKfIKtvGzx2HsLI1qLHiwb1qC/vv97dZ D3YTqU0yPO7GaYq5YdKiecZiXNruUbbpB/0XxpQ27e0JMeWLKWj5GF+1KNVUO7jo NYIaLiKBltpcQydZBPv2oeVZ+fa7mtpliMshwFURuNcWtQOabRwe34YMdgHWI3kQ tOH23KYLN+MdeAyeWbs0QJvdGzZYWyC/r0HQBLk2lS3neqvPOSrXmzU9PzwLGotz 
Ed1BXN94daXRp3mqbLhYWgulsLyjraJrcwj9tp08S+du4hU+z4sFLxkwjfcLrayB b5oPz0PPVs5Asl5K2WFzWKg6Ef+/NxgaDK8MBBC8JFuZgIIMZ+angqBgP5MFn8rJ 17QORMJxplKEVYhVAMh++Js0ytvZn42ZLIQ9crU73DjZsex11D3Jr09p+10FS/uB VW2bCJSq+/t5tVkahmkl4px2GMBIFiKGepeui6oaTjOH9dT6o+huLqJYHRvrkxgJ Tup29T7bCESqV2TggHRZzFKTcf7Vr1r9UZsW7IBBNbJcxJvHCHkrBU41iOPhR6aq 1hldc+JHzy+7RALHxQfVbPDvr6nebRVuURQ11lgkWIDB1hVz9K0GhOw71bYllG/o I/DfjW18X0awlcO5azQ9jVVKzGbESsS5u6gyTd7HlzTopswAyZC7TmjjOqfhgZnS hlWCwoHIC7WBUpDS6mSn4Tuskc4e4pGvFN2WpL9xtNy+pKSqqRKk8y24kpwAmIn4 3pHTd35oW98SLJbS8eEodht1J7G1wQvnKHpAiG8eYBg85vXK17nASbWqpveg =T8oG -----END PGP MESSAGE-----
Update again: The said vendor has already fixed their program to prevent copying the repaired PDF out of the evaluation version of the software. Good on them, and sorry, I do not have a copy of the older version of that program (so there is no need to ask me about it).
So in the end I managed to repair those two corrupted PDF files and keep the client happy, using nothing more than the evaluation version of a commercial product. As for making a crippled evaluation version: there is little point in crippling the application itself, as there may be trivial ways of getting around it. Cripple the output instead. That annoys potential customers, and they might buy your software if they are desperate enough.