Batch search & replace in PDF files

The other day I found out I had misspelled a word in a whole batch of automatically generated PDF files. Regenerating all of them would have been a lot of work: the PDF files were plots created using perl/PDL, gnuplot and epstopdf (available in texlive-extra-utils), and the input data was scattered over about 20 different machines. Of course I could have hand-edited all the files in Inkscape, but that would have been tedious too. Instead, I discovered there's an easy way to search & replace text strings in PDF files automatically on my Linux system, using sed and pdftk.

First, make sure you have pdftk installed. On Ubuntu you can simply do: sudo apt-get install pdftk

Then use a shell script that uncompresses each PDF file (so the page contents are no longer compressed and sed can see the text), replaces the text, and compresses the file again. For instance:


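#!/bin/bash
# Usage: replacepdftext.sh OLDTEXT NEWTEXT FILE.pdf
set -e

oldtext=$1
newtext=$2
pdffile=$3

# Keep a backup of the original file
cp "$pdffile" "$pdffile.bak"
# Decompress the page content streams so sed can find the text
pdftk "$pdffile" output "$pdffile.tmp" uncompress
# Replace the text; LC_ALL=C makes sed treat the file as raw bytes.
# Note: OLDTEXT is interpreted as a sed regular expression.
LC_ALL=C sed -i "s/$oldtext/$newtext/g" "$pdffile.tmp"
# Recompress under the original file name and clean up
pdftk "$pdffile.tmp" output "$pdffile" compress
rm "$pdffile.tmp"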
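The script hands $oldtext and $newtext straight to sed, so characters such as `.`, `*`, `[` or `/` in the search text act as regular-expression syntax, and `&` or `/` in the replacement text are special too. Here is a minimal sketch of helpers that escape a literal string first, assuming bash and GNU sed's default basic regular expressions; backslash_escape, escape_sed_pattern and escape_sed_replacement are hypothetical names, not part of sed.

```shell
#!/bin/bash
# Hypothetical helpers: escape a literal string for use in sed's s/.../.../ command.

# Prefix every character of STRING that also occurs in CHARS with a backslash.
backslash_escape() {
  local chars=$1 s=$2 out='' c i
  for (( i = 0; i < ${#s}; i++ )); do
    c=${s:i:1}
    if [[ $chars == *"$c"* ]]; then out+="\\$c"; else out+=$c; fi
  done
  printf '%s' "$out"
}

# Pattern side: \ / . [ ^ $ * are special in a basic regex ("/" is the delimiter).
escape_sed_pattern()     { backslash_escape '\/.[^$*' "$1"; }
# Replacement side: only \, & and the / delimiter are special.
escape_sed_replacement() { backslash_escape '\/&' "$1"; }

# Prints "X&Y a-b/c": the dot only matches a literal dot, and & is inserted literally.
printf 'a.b/c a-b/c\n' | sed "s/$(escape_sed_pattern 'a.b/c')/$(escape_sed_replacement 'X&Y')/g"
```

In the script, the sed call would then use `s/$(escape_sed_pattern "$oldtext")/$(escape_sed_replacement "$newtext")/g` as its expression, so both words are matched and inserted literally.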

You could easily modify the script to process a whole batch of files itself. Since this was just a quick hack, I simply ran it in a loop, using something like:

for i in *.pdf ; do replacepdftext.sh oldword newword "$i" ; done
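Before running the loop, it can help to check which files actually contain the misspelled word. A minimal sketch, assuming bash; pdfs_containing is a hypothetical name. It relies on pdftk writing to stdout when given `output -`, and uses grep's basic regular expressions so it matches what the sed call would match. A word may be stored in pieces inside a PDF content stream (for example for kerning), so a file that is not listed might still contain it, and sed would miss such occurrences as well.

```shell
#!/bin/bash
# Hypothetical helper: print the names of the given PDF files whose
# uncompressed content matches OLDTEXT. Nothing is modified.
# Usage: pdfs_containing OLDTEXT FILE.pdf...
pdfs_containing() {
  local oldtext=$1 f
  shift
  for f in "$@"; do
    # "output -" sends the uncompressed PDF to stdout; grep -q stops at the first match
    if pdftk "$f" output - uncompress 2>/dev/null | LC_ALL=C grep -q -- "$oldtext"; then
      printf '%s\n' "$f"
    fi
  done
}
```

For example, `pdfs_containing oldword *.pdf` lists the candidates; unreadable or missing files are silently left out.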

But I'm sure there's a better way to build batch processing into the script itself...
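One way to move the loop into the script itself is sketched below, assuming bash and the same backup/uncompress/sed/compress steps as the script above; replace_in_pdfs is a hypothetical name. It accepts any number of files after the two words, keeps going when one file fails, and returns a non-zero status if any file was skipped or failed.

```shell
#!/bin/bash
# Hypothetical function: replace OLDTEXT with NEWTEXT in every PDF file given.
# OLDTEXT is used as a sed regular expression, as in the script above.
# Usage: replace_in_pdfs OLDTEXT NEWTEXT FILE.pdf...
replace_in_pdfs() {
  if [ "$#" -lt 3 ]; then
    echo "usage: replace_in_pdfs OLDTEXT NEWTEXT FILE.pdf..." >&2
    return 1
  fi
  local oldtext=$1 newtext=$2 status=0 pdffile
  shift 2

  for pdffile in "$@"; do
    if [ ! -f "$pdffile" ]; then
      echo "skipping $pdffile: not a regular file" >&2
      status=1
      continue
    fi
    # Back up, uncompress, edit and recompress; report and continue on failure
    cp "$pdffile" "$pdffile.bak" &&
      pdftk "$pdffile" output "$pdffile.tmp" uncompress &&
      LC_ALL=C sed -i "s/$oldtext/$newtext/g" "$pdffile.tmp" &&
      pdftk "$pdffile.tmp" output "$pdffile" compress ||
      { echo "failed on $pdffile" >&2; status=1; }
    rm -f "$pdffile.tmp"
  done
  return "$status"
}
```

After sourcing the file, `replace_in_pdfs oldword newword *.pdf` handles the whole batch; appending the line `replace_in_pdfs "$@"` turns it into a standalone script.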
