Batch search & replace in PDF files
The other day I found out I had misspelled a word in a whole batch of automatically generated PDF files. Regenerating all of them would have been a lot of work, as the PDF files were plots created using Perl/PDL, gnuplot and epstopdf (available in texlive-extra-utils), and the input data was scattered over about 20 different machines. Of course I could have hand-edited all the files using Inkscape, but that would also have been a lot of work. Instead, I discovered there's an easy way to automatically search & replace text strings in PDF files on my Linux system, using sed and pdftk.
First, make sure you have pdftk installed. On Ubuntu you can simply do:
sudo apt-get install pdftk
Then, use a shell script to uncompress the PDF files, replace the text and recompress them. For instance:
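#!/bin/bash
# Usage: replacepdftext.sh OLDTEXT NEWTEXT FILE.pdf
oldtext=$1
newtext=$2
pdffile=$3
# Keep a backup, then uncompress so sed can see the text
cp "$pdffile" "$pdffile.bak"
pdftk "$pdffile" output "$pdffile.tmp" uncompress
# OLDTEXT is interpreted as a sed regular expression
sed -i "s/$oldtext/$newtext/g" "$pdffile.tmp"
pdftk "$pdffile.tmp" output "$pdffile" compress
rm "$pdffile.tmp"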
You can easily modify this to run on a whole batch of files. Actually, I just made this as a quick hack, and executed the script using something like:
for i in *.pdf ; do replacepdftext.sh oldword newword "$i" ; done
But I'm sure there's a better way to integrate batch processing in the script itself...
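One possible way to fold the loop into the script itself is sketched below. This is only a sketch under a few assumptions: it relies on GNU sed (for the -i option), the function names replace_pdf_text and replace_in_pdfs are made up for illustration, and the old text is still treated as a sed regular expression, so characters like / or . in it would need escaping.

```shell
#!/bin/bash
# Sketch: batch version of the search & replace script.
# Usage: replacepdftext.sh OLDTEXT NEWTEXT FILE.pdf [FILE.pdf ...]

# Replace OLDTEXT with NEWTEXT in a single PDF file (hypothetical helper name).
replace_pdf_text() {
    local oldtext=$1 newtext=$2 pdffile=$3
    local tmpfile="$pdffile.tmp"

    # Keep a backup, then uncompress so the text is visible to sed.
    cp -- "$pdffile" "$pdffile.bak" || return 1
    pdftk "$pdffile" output "$tmpfile" uncompress || return 1

    # OLDTEXT is interpreted as a sed regular expression (assumes GNU sed -i).
    sed -i "s/$oldtext/$newtext/g" "$tmpfile" || return 1

    # Recompress over the original and remove the temporary file.
    pdftk "$tmpfile" output "$pdffile" compress || return 1
    rm -f -- "$tmpfile"
}

# Loop over every file given after the two text arguments.
replace_in_pdfs() {
    if [ "$#" -lt 3 ]; then
        echo "usage: replacepdftext.sh OLDTEXT NEWTEXT FILE.pdf [FILE.pdf ...]" >&2
        return 1
    fi
    local oldtext=$1 newtext=$2 status=0 pdffile
    shift 2
    for pdffile in "$@"; do
        if ! replace_pdf_text "$oldtext" "$newtext" "$pdffile"; then
            echo "failed: $pdffile" >&2
            status=1
        fi
    done
    return "$status"
}

# Only run when arguments were given, so the file can also be sourced.
if [ "$#" -gt 0 ]; then
    replace_in_pdfs "$@"
fi
```

With this version the shell expands the wildcard and the script does the looping, so the call becomes: replacepdftext.sh oldword newword *.pdf. A file that fails is reported on stderr and the remaining files are still processed.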