The other day I found out I had misspelled a word in a whole batch of
automatically generated PDF files. Regenerating all of them would be a
lot of work, as the PDF files were plots created using perl/PDL,
gnuplot and epstopdf (available in texlive-extra-utils), and
the input data was scattered over about 20 different machines. Of course
I could have hand-edited all the files using Inkscape, but that
would also be a lot of work. Instead, I discovered there's an easy way
to automatically search & replace text strings in PDF files on my Linux
system, using sed and pdftk.
First, make sure you have pdftk installed. On Ubuntu you can simply do:
sudo apt-get install pdftk
Then, use a shell script to uncompress the PDF file, replace the text and recompress the result. For instance:
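#!/bin/bash
# Usage: replacepdftext.sh oldtext newtext file.pdf
oldtext=$1
newtext=$2
pdffile=$3
# Keep a backup of the original file
cp "$pdffile" "$pdffile.bak"
# Uncompress the page streams so the text becomes visible to sed
pdftk "$pdffile" output "$pdffile.tmp" uncompress
# oldtext is used as a sed regular expression, so characters like / and . are special
sed -i "s/$oldtext/$newtext/g" "$pdffile.tmp"
# Recompress under the original file name and remove the temporary file
pdftk "$pdffile.tmp" output "$pdffile" compress
rm "$pdffile.tmp"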
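One caveat with the script above: sed treats the old text as a regular expression, and "/", "&" and "\" have a special meaning in the s command, so a search string such as "m/s" or "3.5" is not matched as literal text. A minimal sketch of a workaround follows. The helper names escape_sed_pattern and escape_sed_replacement are made up for this example (they are not part of sed or pdftk), and the sketch assumes single-line strings and sed's default basic regular expressions:

```shell
#!/bin/bash
# Hypothetical helpers, not part of sed or pdftk: backslash-escape a literal
# string so it can be dropped into sed's s/pattern/replacement/ command.

# Escape the basic-regex metacharacters ] [ \ . ^ $ * and the "/" delimiter.
escape_sed_pattern() {
    printf '%s' "$1" | sed 's|[][\/.^$*]|\\&|g'
}

# Escape "\", "&" (which means "the matched text") and the "/" delimiter.
escape_sed_replacement() {
    printf '%s' "$1" | sed 's|[\/&]|\\&|g'
}

# In the script above, the sed line would then become something like:
#   sed -i "s/$(escape_sed_pattern "$oldtext")/$(escape_sed_replacement "$newtext")/g" "$pdffile.tmp"
```

The helpers use "|" as the delimiter for their own s commands so that "/" can appear in the bracket expressions without extra escaping.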
You can easily modify the script to run on a whole batch of files. I just made it as a quick hack, and ran it using something like:
for i in *.pdf ; do replacepdftext.sh oldword newword "$i" ; done
But I'm sure there's a better way to integrate batch processing in the script itself...
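One possible way to fold the loop into the script itself is sketched below. This is only a sketch under a few assumptions: pdftk and GNU sed (for "sed -i") are installed, the function name replace_pdf_text is invented for this example, and everything after the first two arguments is treated as a PDF file. A file that fails at any step is reported and the loop moves on to the next one:

```shell
#!/bin/bash
# Sketch: replacepdftext.sh oldtext newtext file1.pdf [file2.pdf ...]
# Assumes pdftk and GNU sed are available; replace_pdf_text is a made-up name.

replace_pdf_text() {
    local oldtext="$1"
    local newtext="$2"
    local pdffile
    shift 2    # the remaining arguments are the PDF files to process
    for pdffile in "$@"; do
        # Backup, uncompress, replace, recompress, clean up; stop at the
        # first failing step for this file and report it on stderr.
        cp "$pdffile" "$pdffile.bak" &&
        pdftk "$pdffile" output "$pdffile.tmp" uncompress &&
        sed -i "s/$oldtext/$newtext/g" "$pdffile.tmp" &&
        pdftk "$pdffile.tmp" output "$pdffile" compress &&
        rm "$pdffile.tmp" ||
        echo "failed: $pdffile" >&2
    done
}

# Only do something when at least one file was given.
if [ "$#" -ge 3 ]; then
    replace_pdf_text "$@"
else
    echo "usage: replacepdftext.sh oldtext newtext file.pdf [more.pdf ...]" >&2
fi
```

With this version the whole batch becomes a single call, for example: replacepdftext.sh oldword newword *.pdf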