Class ConvertPagesToImages


  • public class ConvertPagesToImages
    extends Object

    Image Extraction from PDF files


    This class provides a simple Java API to extract pages as images from a PDF file. It also provides a static convenience method for when you just want to dump all the pages of a PDF file, or of a directory containing PDF files, as images.

    isBackgroundTransparent MUST be false when generating JPEG images.
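JPEG has no alpha channel, which is why isBackgroundTransparent must be false for JPEG output. If a page has already been rendered with a transparent background, one option is to composite it onto white with plain Java 2D before encoding. The sketch below uses only the JDK. The JpegFlatten class is hypothetical and is not part of JPedal.

```java
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;

// Hypothetical helper (not part of JPedal): flattens a transparent image for JPEG output.
public class JpegFlatten {

    /** Draws src onto a new opaque TYPE_INT_RGB image whose background is white. */
    public static BufferedImage toOpaqueRgb(BufferedImage src) {
        BufferedImage out = new BufferedImage(src.getWidth(), src.getHeight(), BufferedImage.TYPE_INT_RGB);
        Graphics2D g = out.createGraphics();
        try {
            g.setColor(Color.WHITE);
            g.fillRect(0, 0, src.getWidth(), src.getHeight());
            // Transparent pixels in src leave the white background showing through.
            g.drawImage(src, 0, 0, null);
        } finally {
            g.dispose();
        }
        return out;
    }

    public static void main(String[] args) {
        // Stand-in for a page rendered with isBackgroundTransparent = true:
        // every pixel starts fully transparent, except one opaque red pixel.
        BufferedImage page = new BufferedImage(2, 2, BufferedImage.TYPE_INT_ARGB);
        page.setRGB(0, 0, 0xFFFF0000);
        BufferedImage jpegReady = toOpaqueRgb(page);
        System.out.printf("hasAlpha=%b corner=%06x background=%06x%n",
                jpegReady.getColorModel().hasAlpha(),
                jpegReady.getRGB(0, 0) & 0xFFFFFF,
                jpegReady.getRGB(1, 1) & 0xFFFFFF);
    }
}
```

The flattened image can then be written with javax.imageio.ImageIO.write(image, "jpg", file). Requesting a white background from getPageAsImage in the first place avoids this extra copy.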

    Note: When converting with non-default settings, ConvertPagesToHiResImages is recommended as the first choice, because it generates better-quality images.

    Example 1 - access API methods

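    import java.awt.image.BufferedImage;
    import org.jpedal.examples.images.ConvertPagesToImages;

    // openPDFFile() and getPageAsImage() throw PdfException, which the caller must handle
    ConvertPagesToImages extract = new ConvertPagesToImages("C:/pdfs/mypdf.pdf");
    //extract.setPassword("password");

    //reduce size of image (also reduces memory usage)
    //extract.setPageScaling(0.25f); // default value is 1.33f, which gives the same size as Adobe Acrobat at 100%

    // must be false if the images will be saved as JPEG
    boolean isBackgroundTransparent = false;

    if (extract.openPDFFile()) {
        int pageCount = extract.getPageCount();

        // pages are numbered from 1
        for (int page = 1; page <= pageCount; page++) {
            BufferedImage image = extract.getPageAsImage(page, isBackgroundTransparent);
            // save or process image here
        }
    }

    extract.closePDFfile();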

    Example 2 - convenience static method

    Extract all pages as images

    ConvertPagesToImages.writeAllPagesAsImagesToDir("pdfs", "images", "png", 1.33f);

    Example 3 - Access directly from the Jar

    ConvertPagesToImages can also be run directly from the jar. The following command extracts all pages from a PDF file, or from every PDF file in a directory, to a defined output directory:
    java -cp libraries_needed org.jpedal.examples.images.ConvertPagesToImages inputValues
    where inputValues consists of four values:
    • First value: The PDF filename (including the path if needed) or a directory containing PDF files. If it contains spaces, it must be enclosed in double quotes (i.e. "C:/Path with spaces/").
    • Second value: The location to write out the images extracted from the PDF file or files. If it contains spaces, it must be enclosed in double quotes (i.e. "C:/Path with spaces/").
    • Third value: The required output image type (png is used if nothing is specified). Options are tiff, bmp, png and jpg.
    • Fourth value: Page scaling; 1.33f gives the same size as the page appears in Acrobat at 100%.
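The four input values can be validated before any conversion starts. The sketch below is hypothetical and is NOT JPedal's own main(); ConversionArgs and its Options fields are illustrative names. Falling back to 1.33f when the fourth value is omitted is an assumption based on the default stated for setPageScaling.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical argument checker for the four values described above (not JPedal code).
public class ConversionArgs {

    static final Set<String> FORMATS = new HashSet<>(Arrays.asList("tiff", "bmp", "png", "jpg"));

    /** Parsed values; field names are illustrative. */
    public static final class Options {
        public final String input;
        public final String outputDir;
        public final String format;
        public final float scaling;

        Options(String input, String outputDir, String format, float scaling) {
            this.input = input;
            this.outputDir = outputDir;
            this.format = format;
            this.scaling = scaling;
        }
    }

    public static Options parse(String[] args) {
        if (args.length < 2) {
            throw new IllegalArgumentException(
                    "usage: <pdf file or directory> <output directory> [tiff|bmp|png|jpg] [scaling]");
        }
        // Third value: image type, png if nothing is specified.
        String format = args.length > 2 ? args[2].toLowerCase(Locale.ROOT) : "png";
        if (!FORMATS.contains(format)) {
            throw new IllegalArgumentException("unsupported image type: " + format);
        }
        // Fourth value: page scaling. Falling back to 1.33f is an assumption here.
        // Float.parseFloat also accepts a trailing 'f', so "1.33f" parses as 1.33.
        float scaling = args.length > 3 ? Float.parseFloat(args[3]) : 1.33f;
        if (!(scaling > 0)) { // also rejects NaN
            throw new IllegalArgumentException("scaling must be a positive number");
        }
        return new Options(args[0], args[1], format, scaling);
    }

    public static void main(String[] args) {
        Options o = parse(new String[] {"C:/Path with spaces/", "images", "JPG", "0.5"});
        System.out.println(o.input + " -> " + o.outputDir + " as " + o.format + " at " + o.scaling);
    }
}
```

The double quotes around paths with spaces are consumed by the shell, so a value such as "C:/Path with spaces/" arrives in args[0] as a single string that already contains the spaces.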
    There is another example (org.jpedal.examples.images.ConvertPagesToHiResImages) for producing higher-resolution images of pages, although it is likely to be slower.
    • Field Detail

      • maxPageCount

        public static int maxPageCount
        Used as part of testing to limit conversion to the first 10 pages. Please do not use.
    • Constructor Detail

      • ConvertPagesToImages

        public ConvertPagesToImages(String fileName)
        Sets up a ConvertPagesToImages instance to open a PDF file
        Parameters:
        fileName - full path to a single PDF file
      • ConvertPagesToImages

        public ConvertPagesToImages(byte[] byteArray)
        Sets up a ConvertPagesToImages instance to open a PDF file held as a BLOB in a byte[] array (do not pad it with additional empty bytes)
        Parameters:
        byteArray - file's BLOB
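Because the byte[] must not be padded with empty bytes, the array passed in should be exactly as long as the PDF data. The sketch below shows JDK-only ways to get such an array; PdfBytes and its methods are hypothetical helpers, and InputStream.readAllBytes() requires Java 9 or later.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

// Hypothetical helpers for building an unpadded byte[] (not part of JPedal).
public class PdfBytes {

    /** The returned array is exactly as long as the file. */
    public static byte[] fromFile(Path pdf) throws IOException {
        return Files.readAllBytes(pdf);
    }

    /** Reads a stream to its end; the array is sized to the bytes actually read. */
    public static byte[] fromStream(InputStream in) throws IOException {
        return in.readAllBytes();
    }

    /** If the data sits in a larger, reused buffer, keep only the valid prefix. */
    public static byte[] trim(byte[] buffer, int validLength) {
        return Arrays.copyOf(buffer, validLength);
    }

    public static void main(String[] args) {
        byte[] reused = new byte[1024];
        byte[] header = "%PDF-1.7".getBytes(StandardCharsets.US_ASCII);
        System.arraycopy(header, 0, reused, 0, header.length);
        // Passing `reused` directly would append 1016 empty bytes after the data.
        byte[] exact = trim(reused, header.length);
        System.out.println(exact.length);
    }
}
```

Usage would then look like new ConvertPagesToImages(PdfBytes.fromFile(Paths.get("C:/pdfs/mypdf.pdf"))), with the IOException handled by the caller.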
    • Method Detail

      • writeAllPagesAsImagesToDir

        public static void writeAllPagesAsImagesToDir(String inputDir,
                                                      String outDir,
                                                      String format,
                                                      float pageScaling)
                                               throws PdfException
        Convenience static method to convert a PDF file, or a directory of PDF files, to images
        Parameters:
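        inputDir - PDF file, or directory of PDF files, to convert
        outDir - directory to write the images to
        format - image format (tiff, bmp, png or jpg)
        pageScaling - page scaling (1.33f gives the same size as Acrobat at 100%)
        Throws:
        PdfException - if there is a problem processing the PDF file(s)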
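A method that accepts either one PDF file or a directory first has to resolve its input into a list of files, then choose an output name per page. The sketch below shows one way to do that with java.nio.file. It is a hypothetical illustration, not JPedal's implementation, and the "name_page.format" naming scheme is an assumption.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.stream.Stream;

// Hypothetical input handling for a file-or-directory converter (not JPedal code).
public class PdfInputs {

    /** Returns the file itself, or the .pdf files directly inside a directory, sorted. */
    public static List<Path> collectPdfs(Path input) throws IOException {
        List<Path> result = new ArrayList<>();
        if (Files.isDirectory(input)) {
            try (Stream<Path> entries = Files.list(input)) { // closes the directory handle
                entries.filter(Files::isRegularFile)
                       .filter(p -> p.getFileName().toString().toLowerCase(Locale.ROOT).endsWith(".pdf"))
                       .sorted()
                       .forEach(result::add);
            }
        } else if (Files.isRegularFile(input)) {
            result.add(input);
        }
        return result;
    }

    /** Hypothetical naming scheme: <pdf name without extension>_<page>.<format>. */
    public static String outputName(Path pdf, int page, String format) {
        String name = pdf.getFileName().toString();
        int dot = name.lastIndexOf('.');
        String base = dot > 0 ? name.substring(0, dot) : name;
        return base + "_" + page + "." + format;
    }

    public static void main(String[] args) throws IOException {
        Path input = Paths.get(args.length > 0 ? args[0] : ".");
        for (Path pdf : collectPdfs(input)) {
            System.out.println(pdf + " -> " + outputName(pdf, 1, "png"));
        }
    }
}
```

Only the top level of a directory is scanned here; Files.walk could be used instead if subdirectories should be included.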
      • setPageScaling

        public void setPageScaling(float pageScaling)
        Alters the page scaling (default is 1.33f, which gives the same size as Acrobat at 100%)
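PDF page sizes are measured in points (1/72 inch), so the rendered image is roughly the page size in points multiplied by pageScaling. The sketch below assumes that relationship and treats 1.33f as approximately 96/72, i.e. a 96 dpi display at 100% zoom; JPedal's renderer may round slightly differently. It also shows why a smaller scaling saves memory: an ARGB BufferedImage stores 4 bytes per pixel, so memory grows with the square of the scaling. PageScalingMath is a hypothetical helper.

```java
// Hypothetical arithmetic helper (not part of JPedal).
public class PageScalingMath {

    /** Approximate pixel length for a page dimension given in PDF points. */
    public static int pixels(float points, float pageScaling) {
        return Math.round(points * pageScaling);
    }

    /** Equivalent resolution in dots per inch (72 points per inch). */
    public static float dpi(float pageScaling) {
        return pageScaling * 72f;
    }

    /** Raw pixel storage for TYPE_INT_ARGB (one 4-byte int per pixel), ignoring object overhead. */
    public static long approxArgbBytes(int width, int height) {
        return (long) width * height * 4;
    }

    public static void main(String[] args) {
        float[] scalings = {1.33f, 0.25f};
        for (float s : scalings) {
            // US Letter is 612 x 792 points.
            int w = pixels(612f, s);
            int h = pixels(792f, s);
            System.out.println(s + " -> " + w + " x " + h + " px, ~" + dpi(s) + " dpi, "
                    + approxArgbBytes(w, h) + " bytes");
        }
    }
}
```

At 1.33f a US Letter page comes out at about 814 x 1053 pixels; at 0.25f it is about 153 x 198, which needs under 4% of the memory.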
        Parameters:
        pageScaling - scaling
      • main

        public static void main(String[] args)
        main routine which checks parameters passed in and runs the conversion
        Parameters:
        args - arguments
      • getPageAsImage

        public BufferedImage getPageAsImage(int page,
                                            boolean isBackgroundTransparent)
                                     throws PdfException
        Parameters:
        page - Logical page number in PDF (first page is 1)
        isBackgroundTransparent - whether the BufferedImage has a transparent (true) or white (false) background
        Returns:
        BufferedImage of PDF page
        Throws:
        PdfException - if there are any issues decoding the PDF file
      • getPageAsImage

        public BufferedImage getPageAsImage(int page)
                                     throws PdfException
        Parameters:
        page - Logical page number in PDF (first page is 1)
        Returns:
        BufferedImage of PDF page with white background
        Throws:
        PdfException - if there are any issues decoding the PDF file
      • setPassword

        public void setPassword(String password)
        Sets the owner or user password to use when opening an encrypted PDF file
        Parameters:
        password - the USER or OWNER password for the PDF file
      • getPageCount

        public int getPageCount()
        Returns the number of pages in the PDF file (pages are numbered from 1)
        Returns:
        page count
      • openPDFFile

        public boolean openPDFFile()
                            throws PdfException
        Opens the PDF file so that it can be accessed. The return value must be checked: it is false if the file cannot be opened for any reason.
        Returns:
        true if successful
        Throws:
        PdfException - if there is a problem opening the file
      • closePDFfile

        public void closePDFfile()
        Ensures the PDF file is closed and all resources are released once it is no longer needed
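The open / check / iterate / close sequence from Example 1 can be wrapped in try/finally so that closePDFfile() runs even when a page fails to decode. To keep the sketch compilable without the JPedal jar, PageRenderer is a hypothetical interface that only mirrors the documented method names (with PdfException simplified to Exception); with JPedal, the same calls would be made on a ConvertPagesToImages instance.

```java
import java.awt.image.BufferedImage;
import java.util.function.Consumer;

// Hypothetical sketch of the documented lifecycle; PageRenderer is not a JPedal type.
public class ConversionLoop {

    /** Mirrors the documented methods; checked exceptions simplified to Exception. */
    public interface PageRenderer {
        boolean openPDFFile() throws Exception;
        int getPageCount();
        BufferedImage getPageAsImage(int page, boolean isBackgroundTransparent) throws Exception;
        void closePDFfile();
    }

    /** Renders every page with a white background; returns how many pages reached the sink. */
    public static int renderAll(PageRenderer renderer, Consumer<BufferedImage> sink) throws Exception {
        int rendered = 0;
        try {
            // openPDFFile() reports failure by returning false, so the result must be checked.
            if (renderer.openPDFFile()) {
                int pageCount = renderer.getPageCount();
                // Logical page numbers start at 1 and run up to pageCount inclusive.
                for (int page = 1; page <= pageCount; page++) {
                    sink.accept(renderer.getPageAsImage(page, false));
                    rendered++;
                }
            }
        } finally {
            // Release the file and its resources even if a page throws.
            renderer.closePDFfile();
        }
        return rendered;
    }

    public static void main(String[] args) throws Exception {
        // Fake two-page document so the sketch runs without a PDF library.
        PageRenderer fake = new PageRenderer() {
            public boolean openPDFFile() { return true; }
            public int getPageCount() { return 2; }
            public BufferedImage getPageAsImage(int page, boolean transparent) {
                return new BufferedImage(10 * page, 10, BufferedImage.TYPE_INT_RGB);
            }
            public void closePDFfile() { System.out.println("closed"); }
        };
        int n = renderAll(fake, img -> System.out.println("page width " + img.getWidth()));
        System.out.println(n + " pages");
    }
}
```

Passing each page to a consumer, instead of collecting every BufferedImage in a list, keeps only one page image alive at a time, which matters for long documents at high scaling.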