Class ExtractImages

java.lang.Object
org.jpedal.examples.images.ExtractImages

public class ExtractImages extends Object

Image Extraction from PDF files


This class provides a simple Java API for extracting images from a PDF file. It also offers static convenience methods for dumping all the images from a PDF file or from a directory containing PDF files.

See our Support Pages for more info on Image Extraction.
  • Constructor Summary

    Constructors
    Constructor
    Description
    ExtractImages(byte[] byteArray)
    Sets up an ExtractImages instance to open a PDF file held in memory as a byte[] (for example, a BLOB)
    ExtractImages(String fileName)
    Sets up an ExtractImages instance to open a PDF file
  • Method Summary

    Modifier and Type
    Method
    Description
    void
    closePDFfile()
    Ensures the PDF file is closed and all resources are released once it is no longer needed
    BufferedImage
    getImage(int page, int imageNumber, boolean imageAsDisplayed)
    Extracts any image from any page. Processing the images on each page in turn is recommended because it is quicker
    int
    getImageCount(int page)
    Returns the number of images on the selected page
    org.jpedal.objects.PdfImageData
    getImageData(int page)
     
    int
    getPageCount()
    Returns the number of pages in the PDF file (page numbering starts at 1)
    static void
    main(String[] args)
    This class will allow you to extract Images via command line from a single PDF file or a directory of PDF files.
    boolean
    openPDFFile()
    Opens the PDF file so that it can be accessed. Check the return value: it is false if the file cannot be opened for any reason
    void
    setPassword(String password)
    Sets the owner or user password to use when opening an encrypted PDF file
    static void
    writeAllImagesToDir(String inputDir, String outputDir, String imageType, boolean generateMetaData, boolean outputPagesInSepDirs)
    Convenience method to extract all the images in a directory of PDF files
    static void
    writeAllImagesToDir(String inputDir, String password, String outputDir, String imageType, boolean generateMetaData, boolean outputPagesInSepDirs)
    Convenience method to extract all the images in a directory of PDF files
    static void
    writeAllImagesToDir(String inputDir, String password, String outputDir, String imageType, boolean generateMetaData, boolean outputPagesInSepDirs, ErrorTracker errorTracker)
    Convenience method to extract all the images in a directory of PDF files

    Methods inherited from class java.lang.Object

    equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
  • Constructor Details

    • ExtractImages

      public ExtractImages(String fileName)
      Sets up an ExtractImages instance to open a PDF File
      Parameters:
      fileName - full path to a single PDF file
    • ExtractImages

      public ExtractImages(byte[] byteArray)
      Sets up an ExtractImages instance to open a PDF file held in memory as a byte[] (for example, a BLOB)
      Parameters:
      byteArray - PDF file data
  • Method Details

    • writeAllImagesToDir

      public static void writeAllImagesToDir(String inputDir, String password, String outputDir, String imageType, boolean generateMetaData, boolean outputPagesInSepDirs) throws PdfException
      Convenience method to extract all the images in a directory of PDF files
      Parameters:
      inputDir - directory containing PDF files
      password - password used to open PDF files
      outputDir - directory for writing out images
      imageType - 3 letter value for image format to be used
      generateMetaData - if true include additional XML file with metadata on image
      outputPagesInSepDirs - if true place images from each page in separate sub-directory
      Throws:
      PdfException - if problem with processing PDF files
    • writeAllImagesToDir

      public static void writeAllImagesToDir(String inputDir, String outputDir, String imageType, boolean generateMetaData, boolean outputPagesInSepDirs) throws PdfException
      Convenience method to extract all the images in a directory of PDF files
      Parameters:
      inputDir - directory containing PDF files
      outputDir - directory for writing out images
      imageType - 3 letter value for image format to be used
      generateMetaData - if true include additional XML file with metadata on image
      outputPagesInSepDirs - if true place images from each page in separate sub-directory
      Throws:
      PdfException - if problem with processing PDF files
    • writeAllImagesToDir

      public static void writeAllImagesToDir(String inputDir, String password, String outputDir, String imageType, boolean generateMetaData, boolean outputPagesInSepDirs, ErrorTracker errorTracker) throws PdfException
      Convenience method to extract all the images in a directory of PDF files
      Parameters:
      inputDir - directory containing PDF files
      password - password used to open PDF files
      outputDir - directory for writing out images
      imageType - 3 letter value for image format to be used
      generateMetaData - if true include additional XML file with metadata on image
      outputPagesInSepDirs - if true place images from each page in separate sub-directory
      errorTracker - a custom error tracker
      Throws:
      PdfException - if problem with processing PDF files
    • main

      public static void main(String[] args)
      This class will allow you to extract Images via command line from a single PDF file or a directory of PDF files.
      The example expects three parameters:
      • Value 1 is the file name or directory of PDF files to process
      • Value 2 is the directory to write the images to
      • Value 3 is the image type (jpeg, tiff, png). Default is png
      Parameters:
      args - The expected arguments are described above.
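      The argument handling described above can be sketched as follows. This is a minimal illustration and not JPedal's own code: ExtractArgs and parse are hypothetical names, and the sketch assumes that the third value may be omitted (falling back to png) and that the type is matched case-insensitively.

```java
import java.util.Locale;

// Hypothetical helper, not part of JPedal: holds the three command-line values.
final class ExtractArgs {
    final String input;      // Value 1: PDF file or directory of PDF files
    final String outputDir;  // Value 2: directory to write the images to
    final String imageType;  // Value 3: jpeg, tiff or png

    private ExtractArgs(String input, String outputDir, String imageType) {
        this.input = input;
        this.outputDir = outputDir;
        this.imageType = imageType;
    }

    // Assumption: the third value is optional and defaults to png.
    static ExtractArgs parse(String[] args) {
        if (args == null || args.length < 2 || args.length > 3) {
            throw new IllegalArgumentException(
                "Expected: <file or directory> <output directory> [jpeg|tiff|png]");
        }
        String type = args.length == 3 ? args[2].toLowerCase(Locale.ROOT) : "png";
        if (!type.equals("jpeg") && !type.equals("tiff") && !type.equals("png")) {
            throw new IllegalArgumentException("Unsupported image type: " + type);
        }
        return new ExtractArgs(args[0], args[1], type);
    }
}
```

      Rejecting an unknown image type before any PDF is opened keeps a typo on the command line from surfacing only after a long batch run has started.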
    • getImage

      public BufferedImage getImage(int page, int imageNumber, boolean imageAsDisplayed) throws PdfException
      Extracts any image from any page. Processing the images on each page in turn is recommended because it is quicker.
      Parameters:
      page - logical page number (1 is first page)
      imageNumber - image on page (0 is first image)
      imageAsDisplayed - if true return image as displayed (with scaling/rotation) otherwise use raw stored image (often but not always the same). Neither is clipped
      Returns:
      BufferedImage
      Throws:
      PdfException - if problem with extracting image from PDF file
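      The page-by-page order recommended above, together with the indexing convention (pages count from 1, images from 0), can be sketched like this. To keep the sketch compilable without JPedal on the classpath, it uses a hypothetical PageImageSource interface that mirrors the documented getPageCount, getImageCount and getImage signatures but omits the PdfException they declare. StubSource and PageOrderWalk are likewise illustrative names only.

```java
import java.awt.image.BufferedImage;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in mirroring three documented ExtractImages methods.
// Assumption: the real methods also declare PdfException, omitted here.
interface PageImageSource {
    int getPageCount();
    int getImageCount(int page);
    BufferedImage getImage(int page, int imageNumber, boolean imageAsDisplayed);
}

// In-memory stub used only to exercise the loop:
// counts[i] is the number of images on page i + 1.
final class StubSource implements PageImageSource {
    private final int[] counts;
    StubSource(int... counts) { this.counts = counts; }
    public int getPageCount() { return counts.length; }
    public int getImageCount(int page) { return counts[page - 1]; }
    public BufferedImage getImage(int page, int imageNumber, boolean imageAsDisplayed) {
        return new BufferedImage(1, 1, BufferedImage.TYPE_INT_RGB);
    }
}

final class PageOrderWalk {
    // Visits every image one page at a time and records "page:imageNumber".
    static List<String> walk(PageImageSource source, boolean imageAsDisplayed) {
        List<String> visited = new ArrayList<>();
        for (int page = 1; page <= source.getPageCount(); page++) {        // pages start at 1
            int imagesOnPage = source.getImageCount(page);
            for (int imageNumber = 0; imageNumber < imagesOnPage; imageNumber++) { // images start at 0
                BufferedImage image = source.getImage(page, imageNumber, imageAsDisplayed);
                if (image != null) {
                    visited.add(page + ":" + imageNumber);
                }
            }
        }
        return visited;
    }
}
```

      With JPedal available, the same nested loop would call an ExtractImages instance directly and handle PdfException around the calls.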
    • getImageCount

      public int getImageCount(int page) throws PdfException
      Returns the number of images on the selected page
      Parameters:
      page - logical page number
      Returns:
      number of images on the page (0 if there are none)
      Throws:
      PdfException - if problem with opening PDF file
    • getImageData

      public org.jpedal.objects.PdfImageData getImageData(int page) throws PdfException
      Throws:
      PdfException
    • setPassword

      public void setPassword(String password)
      Sets the owner or user password to use when opening an encrypted PDF file
      Parameters:
      password - the USER or OWNER password for the PDF file
    • getPageCount

      public int getPageCount()
      Returns the number of pages in the PDF file (page numbering starts at 1)
      Returns:
      page count
    • openPDFFile

      public boolean openPDFFile() throws PdfException
      Opens the PDF file so that it can be accessed. Check the return value: it is false if the file cannot be opened for any reason.
      Returns:
      true if successful
      Throws:
      PdfException - if there is a problem opening the file
    • closePDFfile

      public void closePDFfile()
      Ensures the PDF file is closed and all resources are released once it is no longer needed
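      One way to combine the open check with a guaranteed close is a try/finally wrapper, sketched below. PdfSession is a hypothetical interface that mirrors the documented openPDFFile and closePDFfile signatures without the PdfException that the real openPDFFile declares. RecordingSession and OpenCloseSketch are illustrative names. The sketch also assumes that calling closePDFfile after a failed open is harmless, which this page does not state.

```java
// Hypothetical stand-in mirroring the documented open/close methods.
// Assumption: the real openPDFFile() also declares PdfException, omitted here.
interface PdfSession {
    boolean openPDFFile();
    void closePDFfile();
}

// Stub that records the order of calls so the control flow can be checked.
final class RecordingSession implements PdfSession {
    private final boolean opens;
    final StringBuilder calls = new StringBuilder();
    RecordingSession(boolean opens) { this.opens = opens; }
    public boolean openPDFFile() { calls.append("open;"); return opens; }
    public void closePDFfile() { calls.append("close;"); }
}

final class OpenCloseSketch {
    // Runs the work only if the file opened, and closes the session on every path.
    static boolean withOpenFile(PdfSession session, Runnable work) {
        try {
            if (!session.openPDFFile()) {
                return false; // file could not be opened, nothing to process
            }
            work.run();
            return true;
        } finally {
            session.closePDFfile(); // assumed safe even when the open failed
        }
    }
}
```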