I have had occasion to scan a few papers to PDF format recently. Typically this produces large PDF files, because several bytes are spent on the RGB color value of every pixel of the page image. Of course, the original document is generally black and white, so faithfully representing its coloring is pointless and makes the PDF file larger than it needs to be.
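To put rough numbers on that, here is a back-of-the-envelope sketch of the uncompressed raster size of one US-letter page. It assumes a 150 DPI image (your scanner's resolution may differ; 150 DPI also happens to be pdftoppm's default rendering resolution, which the script below relies on). Real scanner PDFs usually compress the image (JPEG or similar), so on-disk sizes will differ, but the ratio between 3 bytes and 1 bit per pixel is the heart of the matter.

```sh
#!/bin/sh
# Uncompressed size of one 8.5 x 11 inch page image.
# Assumption: 150 DPI (pdftoppm's default rendering resolution).
DPI=150
W=$((85 * DPI / 10))             # 8.5 in -> 1275 pixels wide
H=$((11 * DPI))                  # 11 in  -> 1650 pixels tall
PIXELS=$((W * H))
RGB_BYTES=$((PIXELS * 3))        # 3 bytes (R, G, B) per pixel
GRAY_BYTES=$PIXELS               # 1 byte per pixel
BW_BYTES=$(( (W + 7) / 8 * H ))  # 1 bit per pixel, each row padded to whole bytes
echo "pixels: $PIXELS"
echo "RGB:    $RGB_BYTES bytes"
echo "gray:   $GRAY_BYTES bytes"
echo "1-bit:  $BW_BYTES bytes"
```

That works out to about 6.3 MB for RGB versus about 264 KB for 1-bit, a factor of roughly 24 before any compression is applied.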
So I made a little script which takes a PDF file written by the scanning program and 'monochromizes' it.
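"Monochromizing" here just means thresholding: every gray pixel is scaled to the range 0..1 and compared with a fixed cutoff, becoming either pure white or pure black. The toy awk version below illustrates the idea on a tiny hypothetical 4x2 plain-text ("P2") PGM image, using the same 0.85 cutoff the script passes to pamthreshold. It is only a sketch of the technique, not pamthreshold's actual code, and it assumes a simplified PGM layout (no comments, header fields on separate lines).

```sh
#!/bin/sh
# Simple global thresholding of a plain (ASCII) PGM into a plain PBM.
# tiny.pgm is a made-up example image.
THRESHOLD=0.85
cat > tiny.pgm <<'EOF'
P2
4 2
255
255 240 200 30
250 100 230 0
EOF
result=$(awk -v t="$THRESHOLD" '
  BEGIN   { cut = t + 0 }                    # force numeric comparison
  NR == 1 { next }                           # skip the "P2" magic number
  NR == 2 { print "P1"; print; next }        # PBM header: magic, then width height
  NR == 3 { maxval = $1; next }              # maximum gray value
  {
    row = ""
    for (f = 1; f <= NF; f++)                # PBM convention: 0 = white, 1 = black
      row = row (f > 1 ? " " : "") (($f / maxval >= cut) ? 0 : 1)
    print row
  }' tiny.pgm)
printf '%s\n' "$result"
rm tiny.pgm
```

Light pixels (240/255 is about 0.94) come out white, while mid-gray ones (200/255 is about 0.78) already turn black, which is why a fairly high cutoff like 0.85 keeps faint pencil or toner marks visible.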
Here is the script - it uses various image processing commands from the Linux world (pdftk, poppler's pdftoppm, Netpbm and Ghostscript's ps2pdf). It is not 'fancy', just utilitarian - use at your own risk! The reduction in size of the PDF file can be significant - so if you are struggling with overly large PDFs, this script (or your own modification of it) may be of value.
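#!/bin/sh
# Convert a scanned PDF into a black-and-white ("monochrome") PDF.
# Usage: thisscript input.pdf   (writes input.bw.pdf in the current directory)
INPUT="$1"
NPAGES=`pdftk "$INPUT" dump_data | grep NumberOfPages | awk '{print $2}'`
OUTPUTFILE=`basename "$INPUT" .pdf`.bw.pdf
i=0
while [ $i -lt $NPAGES ]
do
  i=`expr $i + 1`
  # Zero-padded page number, so that newpage*.pdf expands in page order
  d=`printf '%05d' $i`
  echo "page $i of $NPAGES"
  # Extract page i into its own PDF
  pdftk A="$INPUT" cat A$i output page$d.pdf
  # Render it as grayscale; -singlefile makes the output name exactly tmp.pgm
  pdftoppm -gray -singlefile page$d.pdf tmp
  # Threshold to pure black and white, then wrap as 8.5-inch-wide PostScript
  pamthreshold -simple -threshold=0.85 tmp.pgm | \
    pnmtops -imagewidth=8.5 > tmp.ps
  # Convert back to PDF (ps2pdf writes tmp.pdf next to tmp.ps)
  ps2pdf -dPDFSETTINGS=/ebook tmp.ps
  mv tmp.pdf newpage$d.pdf
  rm page$d.pdf
done
pdftk newpage*.pdf cat output "$OUTPUTFILE"
rm newpage*.pdf tmp.ps tmp.pgm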
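The zero-padding of page numbers (printf '%05d') matters because the final `pdftk newpage*.pdf cat` relies on the shell expanding the glob in sorted, character-by-character order, not numeric order. This small sketch, run in a throwaway directory with the C locale assumed for a predictable sort, shows what would go wrong without it:

```sh
#!/bin/sh
# Glob expansion sorts names as strings, so "10" lands before "2".
LC_ALL=C
export LC_ALL
dir=$(mktemp -d)
cd "$dir" || exit 1
for i in 1 2 10; do
  touch "plain$i.pdf"                       # unpadded page numbers
  touch "padded$(printf '%05d' "$i").pdf"   # padded, as in the script
done
plain=$(echo plain*.pdf)
padded=$(echo padded*.pdf)
echo "$plain"    # plain1.pdf plain10.pdf plain2.pdf
echo "$padded"   # padded00001.pdf padded00002.pdf padded00010.pdf
cd / && rm -rf "$dir"
```

With unpadded names, page 10 would be stitched in right after page 1; five digits keep documents of up to 99999 pages in order.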