Class FindTextInRectangle


  • public class FindTextInRectangle
    extends Object

    Find text in PDF files


    This class provides a simple Java API to find text in a PDF file, plus a static convenience method for searching a PDF file or a directory containing PDF files.

    Example 1 - access API methods

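    import org.jpedal.examples.text.FindTextInRectangle; // package assumed
    import org.jpedal.exception.PdfException;
    import org.jpedal.grouping.SearchType;

    FindTextInRectangle extract = new FindTextInRectangle("C:/pdfs/mypdf.pdf");
    //extract.setPassword("password"); // only needed for encrypted files
    try {
        if (extract.openPDFFile()) {
            int pageCount = extract.getPageCount();
            for (int page = 1; page <= pageCount; page++) {
                // coords is empty if the text does not occur on this page
                float[] coords = extract.findTextOnPage(page, "textToFind", SearchType.MUTLI_LINE_RESULTS);
            }
        }
    } catch (PdfException e) {
        e.printStackTrace();
    } finally {
        extract.closePDFfile(); // always release resources
    }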


    Example 2 - convenience static method

    Find text in a file or set of files and return the results as a list containing one float[] per page. The method throws PdfException if the search fails.

    import java.util.List;

    List<float[]> resultsForPages = FindTextInRectangle.findTextOnAllPages(pdfFile, "textToFind");
    float[] coordsForPage1 = resultsForPages.get(0); // index 0 holds page 1


    See our Support Pages for more information on Text Searching.
    • Constructor Detail

      • FindTextInRectangle

        public FindTextInRectangle(String fileName)
        Sets up a FindTextInRectangle instance to open a PDF file
        Parameters:
        fileName - full path to a single PDF file
      • FindTextInRectangle

        public FindTextInRectangle(byte[] byteArray)
        Sets up a FindTextInRectangle instance to open a PDF file supplied in memory as a BLOB in a byte[] array
        Parameters:
        byteArray - array holding the PDF file data (the BLOB)
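The byte[] constructor expects the whole PDF to be in memory already. A minimal sketch of loading it with the JDK alone is shown below. PdfBytes is a hypothetical helper, not part of JPedal, and its check that the data starts with the "%PDF-" header is a simplification (some readers tolerate a few stray bytes before the header).

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical helper (not a JPedal type): loads a PDF for the byte[] constructor. */
final class PdfBytes {

    private static final byte[] PDF_MAGIC = "%PDF-".getBytes(StandardCharsets.US_ASCII);

    private PdfBytes() {}

    /** Reads the whole file and rejects it if it does not start with "%PDF-". */
    static byte[] load(Path file) throws IOException {
        byte[] data = Files.readAllBytes(file);
        if (!hasPdfHeader(data)) {
            throw new IOException("Not a PDF (missing %PDF- header): " + file);
        }
        return data;
    }

    /** Returns true if the array begins with the ASCII bytes "%PDF-". */
    static boolean hasPdfHeader(byte[] data) {
        if (data.length < PDF_MAGIC.length) {
            return false;
        }
        for (int i = 0; i < PDF_MAGIC.length; i++) {
            if (data[i] != PDF_MAGIC[i]) {
                return false;
            }
        }
        return true;
    }
}
```

The result could then be passed straight to the constructor, for example `new FindTextInRectangle(PdfBytes.load(Path.of("C:/pdfs/mypdf.pdf")))`.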
    • Method Detail

      • findTextOnPage

        public float[] findTextOnPage(int page,
                                      String textToFind,
                                      int searchType)
                               throws PdfException
        Returns the coordinates of all matches on the specified page. The origin of the coordinates is the bottom-left corner of the (unrotated) page.
        Parameters:
        page - page number to search (pages start at 1)
        textToFind - text to look for
        searchType - a static int from the org.jpedal.grouping.SearchType class
        Returns:
        float[] containing the coordinates of every match on the page, or an empty array if no results are found. The following five values are repeated for each text area found:
        [0]=result x1 coord
        [1]=result y1 coord
        [2]=result x2 coord
        [3]=result y2 coord
        [4]=-101 if the next text area is the remainder of this match on another line; any other value can be ignored.
        Throws:
        PdfException - if an error occurs while searching the page
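Because the returned array is flat, callers usually walk it in fixed-size groups. The sketch below assumes each text area occupies five consecutive entries in the [0]..[4] order documented above, repeating for every area, and that -101 in the fifth slot links an area to the one after it. SearchHits and Rect are hypothetical names, not JPedal types.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not a JPedal type): splits a findTextOnPage result into matches. */
final class SearchHits {

    /** Floats per text area, following the [0]..[4] layout. */
    static final int STRIDE = 5;
    /** Value in slot [4] meaning the next area continues this match on another line. */
    static final float CONTINUES = -101f;

    /** One rectangle in PDF coordinates (origin at the bottom-left of the page). */
    record Rect(float x1, float y1, float x2, float y2) {}

    private SearchHits() {}

    /**
     * Groups rectangles into matches: a match that wraps onto a second line
     * becomes one inner list holding two rectangles.
     */
    static List<List<Rect>> group(float[] coords) {
        List<List<Rect>> matches = new ArrayList<>();
        List<Rect> current = new ArrayList<>();
        for (int i = 0; i + STRIDE <= coords.length; i += STRIDE) {
            current.add(new Rect(coords[i], coords[i + 1], coords[i + 2], coords[i + 3]));
            // Any value other than -101 ends the current match.
            if (coords[i + 4] != CONTINUES) {
                matches.add(current);
                current = new ArrayList<>();
            }
        }
        if (!current.isEmpty()) {
            matches.add(current); // trailing continuation with no closing area
        }
        return matches;
    }
}
```

Keeping wrapped matches together like this is convenient when each match should be highlighted or counted once.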
      • findTextOnPage

        public float[] findTextOnPage(int page,
                                      int x1,
                                      int y1,
                                      int x2,
                                      int y2,
                                      String textToFind,
                                      int searchType)
                               throws PdfException
        Returns the coordinates of all matches inside the given rectangle on the specified page. The origin of the coordinates is the bottom-left corner of the (unrotated) page.
        Parameters:
        page - page to search
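        x1 - x coordinate of one corner of the search rectangle
        y1 - y coordinate of the same corner
        x2 - x coordinate of the opposite corner
        y2 - y coordinate of the opposite corner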
        textToFind - text to look for
        searchType - A static int from org.jpedal.grouping.SearchType class
        Returns:
        float[] containing the coordinates of every match on the page, or an empty array if no results are found. The following five values are repeated for each text area found:
        [0]=result x1 coord
        [1]=result y1 coord
        [2]=result x2 coord
        [3]=result y2 coord
        [4]=-101 if the next text area is the remainder of this match on another line; any other value can be ignored.
        Throws:
        PdfException - if an error occurs while searching the page
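Both findTextOnPage variants report coordinates with the origin at the bottom-left of the unrotated page, while most screen and image APIs put the origin at the top-left. A minimal conversion sketch follows. PageCoords is a hypothetical helper, the page height has to be obtained elsewhere (this class does not document a method for it), and rotated pages or crop-box offsets are not handled.

```java
/** Hypothetical helper (not a JPedal type): converts a result to a top-left-origin box. */
final class PageCoords {

    private PageCoords() {}

    /**
     * Returns {left, top, width, height}, with "top" measured downward from the
     * top edge of a page of the given height. Corners may arrive in either order.
     * Assumes an unrotated page with no crop-box offset.
     */
    static float[] toTopLeft(float x1, float y1, float x2, float y2, float pageHeight) {
        float left = Math.min(x1, x2);
        float right = Math.max(x1, x2);
        float upper = Math.max(y1, y2); // larger y is higher on the page in PDF space
        float lower = Math.min(y1, y2);
        return new float[] {left, pageHeight - upper, right - left, upper - lower};
    }
}
```

For a US Letter page (792 points high), a hit spanning (100, 680) to (200, 700) becomes a 100 x 20 box whose top edge sits 92 points below the top of the page.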
      • findTextOnAllPages

        public static List<float[]> findTextOnAllPages(String inputDir,
                                                       String textToFind)
                                                throws PdfException
        Convenience method to find text in a PDF file
        Parameters:
        inputDir - path to a PDF file or to a directory containing PDF files
        textToFind - text to look for
        Returns:
        A List containing one float[] per page, where the entry for page n is at index n-1. The origin of the coordinates is the bottom-left corner of the (unrotated) page, and the values in each array are organised in the following order, repeated for each text area found:
        [0]=result x1 coord
        [1]=result y1 coord
        [2]=result x2 coord
        [3]=result y2 coord
        [4]=-101 if the next text area is the remainder of this match on another line; any other value can be ignored.
        Throws:
        PdfException - if an error occurs while searching the file
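Since findTextOnAllPages returns one array per page, a caller often needs to know which pages actually contain matches. The sketch below assumes, as in Example 2, that index 0 of the list holds page 1 and that each text area uses five floats. AllPagesSummary is a hypothetical helper, not a JPedal type.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper (not a JPedal type): summarises a findTextOnAllPages result. */
final class AllPagesSummary {

    private AllPagesSummary() {}

    /**
     * Maps 1-based page numbers to the number of text areas found on that page,
     * skipping pages with no results. Assumes index 0 holds page 1.
     */
    static Map<Integer, Integer> rectanglesPerPage(List<float[]> resultsForPages) {
        Map<Integer, Integer> counts = new LinkedHashMap<>(); // keeps page order
        for (int i = 0; i < resultsForPages.size(); i++) {
            float[] coords = resultsForPages.get(i);
            if (coords != null && coords.length >= 5) {
                counts.put(i + 1, coords.length / 5);
            }
        }
        return counts;
    }
}
```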
      • setPassword

        public void setPassword(String password)
        Sets the password needed to open an encrypted PDF file. Call it before openPDFFile().
        Parameters:
        password - the USER or OWNER password for the PDF file
      • getPageCount

        public int getPageCount()
        Returns the number of pages in the PDF file. Pages are numbered from 1.
        Returns:
        page count
      • openPDFFile

        public boolean openPDFFile()
                            throws PdfException
        Opens the PDF file so that it can be searched.
        Returns:
        true if successful
        Throws:
        PdfException - if an error occurs while opening the file
      • closePDFfile

        public void closePDFfile()
        Closes the PDF file and releases all associated resources once it is no longer needed.
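FindTextInRectangle is not listed as implementing AutoCloseable, so releasing it reliably needs either try/finally or a small adapter. The sketch below shows the adapter approach using a method reference. StandInExtractor is a hypothetical stand-in that mimics only openPDFFile() and closePDFfile() so the code compiles without JPedal, and PdfCloser is likewise a hypothetical interface.

```java
/** Hypothetical interface: an AutoCloseable whose close() throws no checked exception. */
interface PdfCloser extends AutoCloseable {
    @Override
    void close();
}

/** Hypothetical stand-in that mimics only FindTextInRectangle's open/close methods. */
final class StandInExtractor {
    boolean closed;

    boolean openPDFFile() {
        return true;
    }

    void closePDFfile() {
        closed = true;
    }
}

final class CloseDemo {

    private CloseDemo() {}

    /**
     * Opens the file, runs the search step, and closes the file on every exit
     * path, including when the search step throws.
     */
    static boolean searchThenClose(StandInExtractor extract, Runnable search) {
        // The method reference adapts closePDFfile() to try-with-resources.
        try (PdfCloser closer = extract::closePDFfile) {
            if (!extract.openPDFFile()) {
                return false;
            }
            search.run();
            return true;
        }
    }
}
```

With the real class, `extract::closePDFfile` would bind the same way, but openPDFFile() and findTextOnPage() throw the checked PdfException, so the try block would also need a catch clause or a throws declaration.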