Class ExtractClippedImages


  • public class ExtractClippedImages
    extends Object

    Clipped Image Extraction from PDF files


    This class provides a simple Java API to extract clipped images from a PDF file. It also provides a static convenience method for dumping all the images from a PDF file, or from a directory containing PDF files, at a set of sizes.

    Example 1 - access API methods

    ExtractClippedImages extract = new ExtractClippedImages("C:/pdfs/mypdf.pdf");
    //extract.setPassword("password");
    if (extract.openPDFFile()) {
        int pageCount = extract.getPageCount();
        for (int page = 1; page <= pageCount; page++) {

            int imagesOnPageCount = extract.getImageCount(page);
            for (int imageNumber = 0; imageNumber < imagesOnPageCount; imageNumber++) {
                // clipped version of image number imageNumber on this page
                BufferedImage clippedImage = extract.getClippedImage(page, imageNumber);
            }
        }
    }

    extract.closePDFfile();
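
Example 1 obtains each image as a BufferedImage but does not store it. The following is a minimal sketch of one way to write such an image to disk using the standard javax.imageio API. The class ClippedImageSaver, its method names and the file naming scheme are hypothetical and are not part of JPedal.

```java
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import javax.imageio.ImageIO;

// Hypothetical helper, not part of JPedal.
final class ClippedImageSaver {

    // Builds a name such as "page3_image0.png" (naming scheme is an assumption).
    static String fileNameFor(int page, int imageNumber, String format) {
        return "page" + page + "_image" + imageNumber + "." + format;
    }

    // Writes the image into outDir and returns the file that was written.
    static File save(BufferedImage image, File outDir, int page,
                     int imageNumber, String format) throws IOException {
        if (!outDir.isDirectory() && !outDir.mkdirs()) {
            throw new IOException("Cannot create directory " + outDir);
        }
        File target = new File(outDir, fileNameFor(page, imageNumber, format));
        // ImageIO.write returns false when no suitable writer is available
        if (!ImageIO.write(image, format, target)) {
            throw new IOException("No ImageIO writer for format " + format);
        }
        return target;
    }
}
```

Inside the inner loop of Example 1, a call such as ClippedImageSaver.save(clippedImage, new File("out"), page, imageNumber, "png") would then store each image as it is extracted, which avoids holding every image in memory at once.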

    Example 2 - convenience static method

    Extract images with the clip applied and rescale them to a set of predefined pixel heights (-1 keeps the raw size). The String[] holds pairs of height and directory name:
    ExtractClippedImages.writeAllClippedImagesToDirs(inputDir, outDir, "png", new String[]{"500","scaled","-1","rawSize"});
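
To illustrate what rescaling to a pixel height means, here is a sketch that scales a BufferedImage to a target height using only java.awt. It assumes the aspect ratio is preserved and that -1 returns the image unchanged. The class HeightScaler is hypothetical, and JPedal's own scaling may work differently.

```java
import java.awt.Graphics2D;
import java.awt.RenderingHints;
import java.awt.image.BufferedImage;

// Hypothetical helper, not part of JPedal.
final class HeightScaler {

    // Returns src scaled to targetHeight pixels high, or src itself for -1.
    static BufferedImage scaleToHeight(BufferedImage src, int targetHeight) {
        if (targetHeight == -1 || targetHeight == src.getHeight()) {
            return src; // keep raw size
        }
        if (targetHeight <= 0) {
            throw new IllegalArgumentException("Height must be -1 or positive");
        }
        // Width follows the height so the aspect ratio is preserved (assumption)
        float ratio = targetHeight / (float) src.getHeight();
        int targetWidth = Math.max(1, Math.round(src.getWidth() * ratio));

        BufferedImage scaled =
            new BufferedImage(targetWidth, targetHeight, BufferedImage.TYPE_INT_ARGB);
        Graphics2D g = scaled.createGraphics();
        try {
            g.setRenderingHint(RenderingHints.KEY_INTERPOLATION,
                               RenderingHints.VALUE_INTERPOLATION_BILINEAR);
            g.drawImage(src, 0, 0, targetWidth, targetHeight, null);
        } finally {
            g.dispose(); // release graphics resources
        }
        return scaled;
    }
}
```

Scaling up beyond the raw size cannot add detail, so quality is limited by the image stored in the PDF.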

    Example 3 - Access directly from the Jar

    ExtractClippedImages can be run directly from the jar using the command below. It extracts all images from a PDF file or a directory of PDF files to a defined output directory:
    java -cp libraries_needed org/jpedal/examples/images/ExtractClippedImages $inputDir $output_dir imageOutputType h1 dir1 h2 dir2 ... hn dirn
    Values containing spaces must be enclosed in double quotes, as in "This is one value". The values passed are:
    • $inputDir - directory containing files.
    • $output_dir - directory to put files in.
    • imageOutputType - save images as jpeg, tiff or png.
    • Any number of pairs of values: h dir
      h - height required in pixels as an integer for output (-1 means keep current size)
      dir - directory to write out images.
    So to create 3 versions of the image (one at original size, one at 100 and one at 50 pixels high), you would use:
    java -cp libraries_needed org/jpedal/examples/images/ExtractClippedImages pdfFiles output png -1 raw 100 medium 50 thumbnail/
    Note: image quality depends on the raw image in the original PDF. Extraction can be very memory intensive.
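
The argument layout described above (input directory, output directory, image type, then height and directory pairs) can be sketched as a small parser. This is only an illustration of the layout. The class ExtractArgs is hypothetical, and the check against jpeg, tiff and png is an assumption based on the values listed above, not JPedal's actual validation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical holder for the command-line values, not part of JPedal.
final class ExtractArgs {
    final String inputDir;
    final String outputDir;
    final String imageType;
    final List<Integer> heights = new ArrayList<>();
    final List<String> dirs = new ArrayList<>();

    ExtractArgs(String[] args) {
        // Three fixed values plus at least one complete "h dir" pair
        if (args.length < 5 || (args.length - 3) % 2 != 0) {
            throw new IllegalArgumentException(
                "Expected: inputDir outputDir imageType h1 dir1 [h2 dir2 ...]");
        }
        inputDir = args[0];
        outputDir = args[1];
        imageType = args[2].toLowerCase(Locale.ROOT);
        if (!imageType.equals("jpeg") && !imageType.equals("tiff")
                && !imageType.equals("png")) {
            throw new IllegalArgumentException("Unsupported image type: " + args[2]);
        }
        for (int i = 3; i < args.length; i += 2) {
            // parseInt throws NumberFormatException if h is not an integer
            heights.add(Integer.parseInt(args[i]));
            dirs.add(args[i + 1]);
        }
    }
}
```

With the arguments pdfFiles output png -1 raw 100 medium 50 thumbnail/, the parser yields heights -1, 100 and 50 paired with the directories raw, medium and thumbnail/.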

    See our support pages for more information on extracting images.
    • Constructor Detail

      • ExtractClippedImages

        public ExtractClippedImages​(String fileName)
        Sets up an ExtractClippedImages instance to open a PDF File
        Parameters:
        fileName - full path to a single PDF file
      • ExtractClippedImages

        public ExtractClippedImages​(byte[] byteArray)
        Sets up an ExtractClippedImages instance to open a PDF file contained as a BLOB within a byte[] stream
        Parameters:
        byteArray - the PDF file data as a byte array
    • Method Detail

      • getClippedImage

        public BufferedImage getClippedImage​(int page,
                                             int imageNumber)
                                      throws PdfException
        Extracts any image from any page. Processing the images on each page in turn is recommended, as it is quicker.
        Parameters:
        page - logical page number (1 is first page)
        imageNumber - image on page (0 is first image)
        Returns:
        BufferedImage
        Throws:
        PdfException - PdfException
      • writeAllClippedImagesToDirs

        public static void writeAllClippedImagesToDirs​(String inputDir,
                                                       String outDir,
                                                       String imageType,
                                                       String[] subDirs)
                                                throws PdfException
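        Convenience method to extract all the images from a directory of PDF files
        Parameters:
        inputDir - directory of input files
        outDir - directory of output files
        imageType - format to save images as (jpeg, tiff or png)
        subDirs - pairs of values: a height in pixels (-1 keeps the current size) followed by the subdirectory to write images of that height to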
        Throws:
        PdfException - PdfException
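
Because subDirs is a flat String[] of alternating heights and directory names, it is easy to get the order wrong. The sketch below builds the array from an ordered map. The class SubDirsBuilder is hypothetical and not part of JPedal, and the rule that a height must be -1 or positive is an assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of JPedal.
final class SubDirsBuilder {

    // Flattens height -> directory entries into {"h1","dir1","h2","dir2",...}
    static String[] toSubDirs(Map<Integer, String> heightToDir) {
        List<String> pairs = new ArrayList<>();
        for (Map.Entry<Integer, String> entry : heightToDir.entrySet()) {
            int height = entry.getKey();
            if (height != -1 && height <= 0) {
                throw new IllegalArgumentException(
                    "Height must be -1 or positive: " + height);
            }
            pairs.add(Integer.toString(height));
            pairs.add(entry.getValue());
        }
        return pairs.toArray(new String[0]);
    }
}
```

Passing a LinkedHashMap keeps the pairs in insertion order, so a map of 500 to "scaled" and -1 to "rawSize" produces the same array as in Example 2.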
      • main

        public static void main​(String[] args)
        main routine which checks for any files passed and runs the demo
        Parameters:
        args - arguments
      • getImageCount

        public int getImageCount​(int page)
                          throws PdfException
        returns an image count for the selected page
        Parameters:
        page - logical page number
        Returns:
        int number of images (0 if no images)
        Throws:
        PdfException - PdfException
      • setPassword

        public void setPassword​(String password)
        sets the Owner or User password to use when opening an encrypted PDF file
        Parameters:
        password - the USER or OWNER password for the PDF file
      • getPageCount

        public int getPageCount()
        number of pages in the PDF file (page numbers start at 1)
        Returns:
        page count
      • openPDFFile

        public boolean openPDFFile()
                            throws PdfException
        Opens the PDF file so that it can be accessed. The return value must be checked, as it will be false if the file cannot be opened for any reason.
        Returns:
        true if successful
        Throws:
        PdfException - if there is a problem opening the file
      • closePDFfile

        public void closePDFfile()
        ensure PDF file is closed once no longer needed and all resources released