Class ConvertPagesToImages

Image Extraction from PDF files

This class provides a simple Java API to extract pages as images from a PDF file. It also provides a static convenience method if you just want to dump all the pages as images from a PDF file or a directory containing PDF files. isBackgroundTransparent MUST be false when generating JPEG images.

Note: ConvertPagesToHiResImages is recommended as the first choice for generating better quality images when converting with non-default settings.
See our Support Page for examples on converting PDF pages to images.
There is a simpler example (org.jpedal.examples.images.ConvertPagesToHiResImages) for producing higher resolution images of pages (but it is likely to be slower).

Field Summary

| Modifier and Type | Field | Description |
| --- | --- | --- |
| static int | maxPageCount | Used as part of a test to limit pages to the first 10. Please do not use. |

Constructor Summary

| Constructor | Description |
| --- | --- |
| ConvertPagesToImages(byte[] byteArray) | Sets up a ConvertPagesToImages instance to open a PDF file contained as a BLOB within a byte[] stream (do not pad with additional empty bytes). |
| ConvertPagesToImages(String fileName) | Sets up a ConvertPagesToImages instance to open a PDF file. |

Method Summary

| Modifier and Type | Method | Description |
| --- | --- | --- |
| void | closePDFfile() | Ensures the PDF file is closed once it is no longer needed and all resources are released. |
| void | extractPagesAsImages(String file_name, String output_dir, String name, boolean isTransparent) | |
| BufferedImage | getPageAsImage(int page) | Returns a BufferedImage of the PDF page with a white background. |
| BufferedImage | getPageAsImage(int page, boolean isBackgroundTransparent) | Returns a BufferedImage of the PDF page with a white or transparent background. |
| int | getPageCount() | Number of pages in the PDF file (page numbers start at 1). |
| Iterator<Integer> | getPageRange() | Gets the page numbers. |
| static void | main(String[] args) | Main routine, which checks the parameters passed in and runs the conversion. |
| boolean | openPDFFile() | Opens the PDF file and sets the page range if there is one. |
| void | setFitToSize(int[] dimensions) | Outputs the image to fit a certain size (respecting aspect ratio). The value set overrides any setPageScaling() setting. |
| void | setPageRange(PageRanges r) | Sets the page range. |
| void | setPageScaling(float pageScaling) | Alters the page scaling (the default is 1.33f, which gives the same size as Acrobat at 100%). |
| void | setPassword(String password) | Sets the Owner or User password to use when opening an encrypted PDF file. |
| static void | writeAllPagesAsImagesToDir(String inputDir, String outDir, String format, float pageScaling) | Convenience static method to convert a PDF file or a directory of files. |
| static void | writeAllPagesAsImagesToDir(String inputDir, String outDir, String format, float pageScaling, String password) | Convenience static method to convert a PDF file with a password, or a directory of files. |
| static void | writeAllPagesAsImagesToDir(String inputDir, String outDir, String format, float pageScaling, String password, ErrorTracker errorTracker) | Convenience static method to convert a PDF file with a password, or a directory of files. |
| static void | writeAllPagesAsImagesToDir(String inputDir, String outDir, String format, int[] dimensions) | Convenience static method to convert a PDF file or a directory of files to a set output size. |
| static void | writeAllPagesAsImagesToDir(String inputDir, String outDir, String format, String pageRange, float pageScaling) | Convenience static method to convert a PDF file with page ranges. See class SetOfIntegerSyntax for an explanation of the syntax. |

Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Field Details
- maxPageCount
  
  public static int maxPageCount
  
  Used as part of a test to limit pages to the first 10. Please do not use.

Constructor Details
- ConvertPagesToImages
  
  public ConvertPagesToImages(String fileName)
  
  Sets up a ConvertPagesToImages instance to open a PDF file
  
  Parameters:
  
  fileName - full path to a single PDF file
- ConvertPagesToImages
  
  public ConvertPagesToImages(byte[] byteArray)
  
  Sets up a ConvertPagesToImages instance to open a PDF file contained as a BLOB within a byte[] stream (do not pad with additional empty bytes)
  
  Parameters:
  
  byteArray - file's BLOB

Method Details
- writeAllPagesAsImagesToDir
  
  public static void writeAllPagesAsImagesToDir(String inputDir, String outDir, String format, float pageScaling) throws PdfException
  
  convenience static method to convert PDF file or directory of files
  
  Parameters:
  
  inputDir - directory of files to convert
  
  outDir - directory of output
  
  format - format of images
  
  pageScaling - scaling
  
  Throws:
  
  PdfException - PdfException
- writeAllPagesAsImagesToDir
  
  public static void writeAllPagesAsImagesToDir(String inputDir, String outDir, String format, float pageScaling, String password) throws PdfException
  
  convenience static method to convert a PDF file with a password, or a directory of files
  
  Parameters:
  
  inputDir - directory of files to convert
  
  outDir - directory of output
  
  format - format of images
  
  pageScaling - scaling
  
  password - to access PDF files
  
  Throws:
  
  PdfException - PdfException
- writeAllPagesAsImagesToDir
  
  public static void writeAllPagesAsImagesToDir(String inputDir, String outDir, String format, float pageScaling, String password, ErrorTracker errorTracker) throws PdfException
  
  convenience static method to convert a PDF file with a password, or a directory of files, reporting through a custom error tracker
  
  Parameters:
  
  inputDir - directory of files to convert
  
  outDir - directory of output
  
  format - format of images
  
  pageScaling - scaling
  
  password - to access PDF files
  
  errorTracker - custom error tracker
  
  Throws:
  
  PdfException - PdfException
- writeAllPagesAsImagesToDir
  
  public static void writeAllPagesAsImagesToDir(String inputDir, String outDir, String format, int[] dimensions) throws PdfException
  
  convenience static method to convert PDF file or directory of files to a set output size
  
  Parameters:
  
  inputDir - directory of files to convert
  
  outDir - directory of output
  
  format - format of images
  
  dimensions - int[]{width, height}
  
  Throws:
  
  PdfException - PdfException
- writeAllPagesAsImagesToDir
  
  public static void writeAllPagesAsImagesToDir(String inputDir, String outDir, String format, String pageRange, float pageScaling) throws PdfException
  
  convenience static method to convert a PDF file with page ranges. See class SetOfIntegerSyntax for an explanation of the syntax.
  
  Parameters:
  
  inputDir - directory of files to convert
  
  outDir - directory of output
  
  format - format of images
  
  pageRange - the range of pages we want to output
  
  pageScaling - float of the scaling
  
  Throws:
  
  PdfException - PdfException
- setPageScaling
  
  public void setPageScaling(float pageScaling)
  
  alter page scaling (the default is 1.33f, which gives the same size as Acrobat at 100%).
  If setFitToSize(final int[] dimensions) is also set, this value will be ignored.
  
  Parameters:
  
  pageScaling - scaling
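
The default of 1.33f is close to 96/72, which is the ratio between a 96 dpi screen and the 1/72 inch unit used for PDF page sizes. The sketch below is not part of the library; it only illustrates the arithmetic under the assumption that the scaling value multiplies the page size in points, and the class name, method name and rounding are hypothetical.

```java
public class PageScalingSketch {

    /**
     * Estimates the pixel size of a rendered page.
     * Assumption: the scaling value multiplies the page size in points (1/72 inch)
     * and the result is rounded to the nearest pixel.
     */
    static int[] scaledSize(float widthPts, float heightPts, float pageScaling) {
        return new int[] {
            Math.round(widthPts * pageScaling),
            Math.round(heightPts * pageScaling)
        };
    }

    public static void main(String[] args) {
        // US Letter is 612 x 792 points
        int[] px = scaledSize(612f, 792f, 1.33f);
        System.out.println(px[0] + " x " + px[1]); // prints "814 x 1053"
    }
}
```

Doubling the value to 2.66f would, under the same assumption, give an image twice as wide and twice as high.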
- setFitToSize
  
  public void setFitToSize(int[] dimensions)
  
  output image to fit a certain size (respecting aspect ratio). The value set will override any setPageScaling() setting.
  
  Parameters:
  
  dimensions - int[]{width, height} in pixels
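
Fitting a page into a bounding box while respecting its aspect ratio usually means applying the smaller of the two axis scale factors. The following is a minimal sketch of that calculation, not the library's implementation; the class name, method name and rounding are assumptions made for illustration.

```java
public class FitToSizeSketch {

    /**
     * Computes the largest size that fits inside dimensions = {width, height}
     * while keeping the page's aspect ratio.
     * Assumption: the result is rounded to the nearest pixel.
     */
    static int[] fit(double pageWidth, double pageHeight, int[] dimensions) {
        // The smaller factor guarantees that neither side exceeds the box
        double scale = Math.min(dimensions[0] / pageWidth, dimensions[1] / pageHeight);
        return new int[] {
            (int) Math.round(pageWidth * scale),
            (int) Math.round(pageHeight * scale)
        };
    }

    public static void main(String[] args) {
        // A 612 x 792 portrait page fitted into a 1000 x 1000 pixel box
        int[] size = fit(612, 792, new int[] {1000, 1000});
        System.out.println(size[0] + " x " + size[1]); // prints "773 x 1000"
    }
}
```

For a portrait page the height is the limiting side, so the width ends up smaller than the requested box.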
- setPageRange
  
  public void setPageRange(PageRanges r)
  
  Set the page range
  
  Parameters:
  
  r - attribute to obtain the pages to convert.
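
The reference to SetOfIntegerSyntax elsewhere on this page suggests that PageRanges is the standard javax.print.attribute.standard.PageRanges class; that is an assumption here, not something this page states. Under that assumption, the sketch below shows how a range string such as "1-3,7" expands into individual page numbers. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import javax.print.attribute.standard.PageRanges;

public class PageRangeSketch {

    /** Expands a SetOfIntegerSyntax-style string into an ordered list of page numbers. */
    static List<Integer> expand(String spec) {
        PageRanges ranges = new PageRanges(spec);
        List<Integer> pages = new ArrayList<>();
        // next(x) returns the smallest member greater than x, or -1 when there is none
        for (int page = ranges.next(0); page != -1; page = ranges.next(page)) {
            pages.add(page);
        }
        return pages;
    }

    public static void main(String[] args) {
        System.out.println(expand("1-3,7")); // prints "[1, 2, 3, 7]"
    }
}
```

A PageRanges object built this way could then be passed to setPageRange(PageRanges r) before the file is opened, if the assumption about the class holds.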
- openPDFFile
  
  public boolean openPDFFile() throws PdfException, RuntimeException
  
  open the pdf file and set the page range if there is one
  
  Returns:
  
  true if the PDF file is open
  
  Throws:
  
  PdfException - if there is a problem opening the file
  
  RuntimeException
- extractPagesAsImages
  
  public void extractPagesAsImages(String file_name, String output_dir, String name, boolean isTransparent)
- main
  
  public static void main(String[] args)
  
  main routine which checks parameters passed in and runs the conversion
  
  Parameters:
  
  args - arguments
- getPageAsImage
  
  public BufferedImage getPageAsImage(int page, boolean isBackgroundTransparent) throws PdfException
  
  Parameters:
  
  page - Logical page number in PDF (first page is 1)
  
  isBackgroundTransparent - defines if BufferedImage has a white or transparent background
  
  Returns:
  
  BufferedImage of PDF page
  
  Throws:
  
  PdfException - if there are any issues decoding the PDF file
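
JPEG has no alpha channel, which is why isBackgroundTransparent must be false for JPEG output. If you already hold an image with a transparent background, one common way to make it JPEG-safe is to paint it onto an opaque white RGB image first. The sketch below uses only standard java.awt classes; the class and method names are hypothetical, and it assumes the transparent page image has an alpha channel (for example TYPE_INT_ARGB).

```java
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;

public class FlattenForJpegSketch {

    /** Paints the source image onto an opaque white RGB image of the same size. */
    static BufferedImage flattenOnWhite(BufferedImage source) {
        BufferedImage rgb = new BufferedImage(
                source.getWidth(), source.getHeight(), BufferedImage.TYPE_INT_RGB);
        Graphics2D g = rgb.createGraphics();
        try {
            // Fill with white first so transparent pixels end up white, not black
            g.setColor(Color.WHITE);
            g.fillRect(0, 0, rgb.getWidth(), rgb.getHeight());
            g.drawImage(source, 0, 0, null);
        } finally {
            g.dispose();
        }
        return rgb;
    }

    public static void main(String[] args) {
        // A new ARGB image starts fully transparent
        BufferedImage transparent = new BufferedImage(2, 2, BufferedImage.TYPE_INT_ARGB);
        BufferedImage flat = flattenOnWhite(transparent);
        System.out.println(flat.getType() == BufferedImage.TYPE_INT_RGB);
        System.out.println(Integer.toHexString(flat.getRGB(0, 0)));
    }
}
```

Requesting the page with isBackgroundTransparent set to false avoids this extra step entirely.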
- getPageAsImage
  
  public BufferedImage getPageAsImage(int page) throws PdfException
  
  Parameters:
  
  page - Logical page number in PDF (first page is 1)
  
  Returns:
  
  BufferedImage of PDF page with white background
  
  Throws:
  
  PdfException - if there are any issues decoding the PDF file
- getPageRange
  
  public Iterator<Integer> getPageRange()
  
  get the page numbers
  
  Returns:
  
  Iterator of page numbers
- setPassword
  
  public void setPassword(String password)
  
  sets the Owner or User password to use when opening an encrypted PDF file
  
  Parameters:
  
  password - the USER or OWNER password for the PDF file
- getPageCount
  
  public int getPageCount()
  
  number of pages in the PDF file (page numbers start at 1)
  
  Returns:
  
  page count
- closePDFfile
  
  public void closePDFfile()
  
  ensure PDF file is closed once no longer needed and all resources released
