Class ExtractEmbeddedFiles

java.lang.Object
org.jpedal.examples.BaseExample
org.jpedal.examples.acroform.ExtractEmbeddedFiles

public class ExtractEmbeddedFiles extends org.jpedal.examples.BaseExample

File Extraction from PDF files


This class provides a simple Java API to extract embedded files and file attachments from a PDF file. It also offers a static convenience method for dumping all files from a single PDF file or from a directory containing PDF files. All files are extracted to a folder at the given output location, with a name matching the PDF filename.

Example 1 - access API methods


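 ExtractEmbeddedFiles extract=new ExtractEmbeddedFiles("C:/pdfs/mypdf.pdf");
 //uncomment and set if the PDF file is password protected
 //extract.setPassword("password");
 if (extract.openPDFFile()) {
     //files held in the EmbeddedFiles dictionary (such as those found in Portfolios)
     if (extract.containsEmbeddedFiles()) {
         extract.extractEmbeddedFiles("C:/output/");
     }
     //files held in file attachment annotations
     if (extract.containsFilesAttachments()) {
         extract.extractFileAttachments("C:/output/");
     }
 }
 extract.closePDFfile();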
 

Example 2 - convenience static method

Extract all embedded files and file attachments from a PDF file.


 ExtractEmbeddedFiles.extractAllFilesFromPdf("C:/pdfs/mypdf.pdf", "C:/output");
 

Example 3 - Access directly from the Jar

ExtractEmbeddedFiles can be run directly from the jar with the command below. It extracts all embedded files and file attachments from a PDF file, or from a directory of PDF files, to a defined output directory:

java -cp libraries_needed org/jpedal/examples/acroform/ExtractEmbeddedFiles inputValues

Where inputValues is 2 values:
  • First value: The PDF filename (including the path if needed) or a directory containing PDF files. If it contains spaces it must be enclosed in double quotes (e.g. "C:/Path with spaces/").
  • Second value: The location to write out the files extracted from the PDF file or files. If it contains spaces it must be enclosed in double quotes (e.g. "C:/Path with spaces/").
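The two values described above can be checked before they are handed to the command. The following is a minimal sketch only, not part of the JPedal API: the class name ExtractArgsCheck and the method validate are hypothetical, and the specific checks (input must exist, a single input file must end in .pdf, the output location must not be an existing regular file) are assumptions about what a caller might want to verify.

```java
import java.io.File;
import java.util.Locale;

// Hypothetical helper (not part of JPedal): checks the two command-line values.
final class ExtractArgsCheck {

    /** Returns null when the values look usable, otherwise a short error message. */
    static String validate(String[] args) {
        if (args == null || args.length != 2) {
            return "Expected 2 values: <PDF file or directory> <output directory>";
        }
        File input = new File(args[0]);
        if (!input.exists()) {
            return "Input does not exist: " + args[0];
        }
        // Assumption: a single input file should carry a .pdf extension.
        if (input.isFile() && !input.getName().toLowerCase(Locale.ROOT).endsWith(".pdf")) {
            return "Input file is not a PDF: " + args[0];
        }
        File output = new File(args[1]);
        if (output.exists() && !output.isDirectory()) {
            return "Output location is not a directory: " + args[1];
        }
        return null;
    }
}
```

A caller could run validate(args) first and only pass args on to ExtractEmbeddedFiles.main(args) when it returns null.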

  • Constructor Details

    • ExtractEmbeddedFiles

      public ExtractEmbeddedFiles(String fileName)
    • ExtractEmbeddedFiles

      public ExtractEmbeddedFiles(byte[] byteArray)
  • Method Details

    • main

      public static void main(String[] args)
    • setPassword

      public void setPassword(String password)
      Parameters:
      password - the USER or OWNER password for the PDF file
    • containsFilesAttachments

      public boolean containsFilesAttachments()
      Method to flag if the current file contains file attachments.
      Returns:
      True if file attachments are present, otherwise false.
    • extractFileAttachments

      public void extractFileAttachments(String outputDirectory)
      Extract files from file attachment annotations in the open file and place them in the output directory specified. A directory named after the PDF file is created in the given directory and holds all extracted files. Any existing files of the same name are replaced during extraction. This does not extract files contained within the EmbeddedFiles dictionary (such as those found in Portfolios).
      Parameters:
      outputDirectory - Path where the extracted files should be saved.
    • extractAllEmbeddedFilesAsMap

      public static Map<String,byte[]> extractAllEmbeddedFilesAsMap(String inputFilename) throws PdfException
      Throws:
      PdfException
    • extractEmbeddedFile

      public byte[] extractEmbeddedFile(String requestedFile)
    • getEmbeddedFileNames

      public String[] getEmbeddedFileNames()
    • extractAllFileAttachmentsAsMap

      public static Map<String,byte[]> extractAllFileAttachmentsAsMap(String inputFilename) throws PdfException
      Throws:
      PdfException
    • extractAllFileAttachmentsOnPageAsMap

      public static Map<String,byte[]> extractAllFileAttachmentsOnPageAsMap(String inputFilename, int page) throws PdfException
      Throws:
      PdfException
    • extractFileAttachment

      public byte[] extractFileAttachment(String requestedFile)
    • extractAllFileAttachmentFilesOnPage

      public HashMap<String,byte[]> extractAllFileAttachmentFilesOnPage(int page)
    • getFileAttachmentNames

      public String[] getFileAttachmentNames()
    • containsEmbeddedFiles

      public boolean containsEmbeddedFiles()
      Method to flag if the current file contains embedded files.
      Returns:
      True if embedded files are present, otherwise false.
    • extractEmbeddedFiles

      public void extractEmbeddedFiles(String outputDirectory)
      Extract embedded files and place them in the output directory specified. A directory named after the PDF file is created in the given directory and holds all extracted files. Any existing files of the same name are replaced during extraction. This does not extract files contained within File Attachment annotations.
      Parameters:
      outputDirectory - Path where the extracted files should be saved.
    • extractAllFilesFromPdf

      public static void extractAllFilesFromPdf(String inputDir, String outputDir) throws PdfException
      Static method to extract all embedded files and file attachments from a PDF file or a directory of PDF files.
      Parameters:
      inputDir - PDF file or directory of PDF files to extract from
      outputDir - directory to write the extracted files to
      Throws:
      PdfException - PdfException
    • showEmbeddedFilesDetails

      public void showEmbeddedFilesDetails()
    • showFileAttachmentDetails

      public void showFileAttachmentDetails()
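
Several methods above (extractAllEmbeddedFilesAsMap, extractAllFileAttachmentsAsMap, extractAllFileAttachmentsOnPageAsMap) return a Map<String,byte[]> rather than writing to disk. The following is a minimal sketch of how such a map might be saved. It assumes that each key is a file name and each value is that file's content, which this page does not state. The class ExtractedFileWriter and the method writeAll are hypothetical names and not part of JPedal.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Map;

// Hypothetical helper (not part of JPedal): saves a name-to-bytes map to a directory.
final class ExtractedFileWriter {

    /**
     * Writes each entry into outputDir, replacing existing files of the same name.
     * Assumption: keys are file names and values are file contents.
     * Returns the number of files written.
     */
    static int writeAll(Map<String, byte[]> files, Path outputDir) throws IOException {
        Files.createDirectories(outputDir);
        int written = 0;
        for (Map.Entry<String, byte[]> entry : files.entrySet()) {
            if (entry.getKey() == null || entry.getValue() == null) {
                continue; // nothing usable to write
            }
            // Keep only the last path element so a name such as "../x" stays inside outputDir.
            Path fileName = Paths.get(entry.getKey()).getFileName();
            String name = fileName == null ? "" : fileName.toString();
            if (name.isEmpty() || name.equals(".") || name.equals("..")) {
                continue; // no usable file name
            }
            // Files.write creates the file or truncates an existing one by default.
            Files.write(outputDir.resolve(name), entry.getValue());
            written++;
        }
        return written;
    }
}
```

With JPedal on the classpath, the result of ExtractEmbeddedFiles.extractAllEmbeddedFilesAsMap("C:/pdfs/mypdf.pdf") could be passed as the first argument, subject to the assumption about the map's contents.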