Package org.jpedal.examples.text
Class FindTextInRectangle
java.lang.Object
org.jpedal.examples.BaseExample
org.jpedal.examples.text.FindTextInRectangle
public class FindTextInRectangle
extends org.jpedal.examples.BaseExample
Find text in PDF files
This class provides a simple Java API to find text in a PDF file. It also offers a static convenience method for searching a single PDF file or a directory containing PDF files.
See our Support Pages for more information on Text Searching.
-
Constructor Summary
Constructor    Description
FindTextInRectangle(byte[] byteArray)
    Sets up a FindTextInRectangle instance to open a PDF file contained as a BLOB within a byte[]
FindTextInRectangle(String fileName)
    Sets up a FindTextInRectangle instance to open a PDF file
Method Summary
Modifier and Type    Method    Description
void    decodeFile(String file_name)
    Routine to decode a file
static List<float[]>    findTextOnAllPages(String inputDir, String textToFind)
    Convenience method to find text in a PDF file
float[]    findTextOnPage(int page, int x1, int y1, int x2, int y2, String textToFind, int searchType)
    Return the coords for the page specified. The origin of the coords is the bottom left-hand corner (on the unrotated page)
float[]    findTextOnPage(int page, String textToFind, int searchType)
    Return the coords for the page specified. The origin of the coords is the bottom left-hand corner (on the unrotated page)
int    getPageCount()
    Number of pages in PDF file (starting at 1)
void    setPassword(String password)
Methods inherited from class org.jpedal.examples.BaseExample
closePDFfile, openPDFFile
-
Constructor Details
-
FindTextInRectangle
public FindTextInRectangle(String fileName)
Sets up a FindTextInRectangle instance to open a PDF file.
Parameters:
fileName - full path to a single PDF file
-
FindTextInRectangle
public FindTextInRectangle(byte[] byteArray)
Sets up a FindTextInRectangle instance to open a PDF file contained as a BLOB within a byte[].
Parameters:
byteArray - array that holds the BLOB
-
-
Method Details
-
decodeFile
public void decodeFile(String file_name) throws PdfException
Routine to decode a file.
Throws:
PdfException
-
findTextOnPage
public float[] findTextOnPage(int page, String textToFind, int searchType) throws PdfException
Return the coords for the page specified. The origin of the coords is the bottom left-hand corner (on the unrotated page).
Parameters:
page - page number to check for results
textToFind - text to look for
searchType - a static int from the org.jpedal.grouping.SearchType class
Returns:
float[] containing all coords for the page, or an empty array if no results are found
[0]=result x1 coord
[1]=result y1 coord
[2]=result x2 coord
[3]=result y2 coord
[4]=-101 to show that the next text area is the remainder of this word on another line; any other value is ignored
Throws:
PdfException
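The flat float[] layout described above can be awkward to work with directly. The following is a minimal sketch of unpacking it into one object per result. It assumes that results are packed back to back in groups of five values, which the index list suggests but does not state outright. SearchHits and Hit are hypothetical helper names and are not part of JPedal.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch: unpack a flat coords array into one object per search result. */
final class SearchHits {

    /** Marker documented for index [4]: the word continues in the next text area. */
    static final float CONTINUES_ON_NEXT_LINE = -101f;

    /** Assumption: each result occupies exactly five consecutive slots. */
    static final int VALUES_PER_HIT = 5;

    /** Hypothetical holder for one result rectangle (not a JPedal type). */
    static final class Hit {
        final float x1, y1, x2, y2;
        final boolean continuesOnNextLine;

        Hit(float x1, float y1, float x2, float y2, boolean continuesOnNextLine) {
            this.x1 = x1;
            this.y1 = y1;
            this.x2 = x2;
            this.y2 = y2;
            this.continuesOnNextLine = continuesOnNextLine;
        }
    }

    /** Splits the array into groups of five; a trailing partial group is skipped. */
    static List<Hit> unpack(float[] coords) {
        List<Hit> hits = new ArrayList<>();
        if (coords == null) {
            return hits;
        }
        for (int i = 0; i + VALUES_PER_HIT <= coords.length; i += VALUES_PER_HIT) {
            boolean continues = coords[i + 4] == CONTINUES_ON_NEXT_LINE;
            hits.add(new Hit(coords[i], coords[i + 1], coords[i + 2], coords[i + 3], continues));
        }
        return hits;
    }

    public static void main(String[] args) {
        // Made-up sample data in the documented layout, for illustration only.
        float[] sample = {72f, 700f, 120f, 688f, 0f, 72f, 650f, 90f, 638f, -101f};
        for (Hit h : unpack(sample)) {
            System.out.println(h.x1 + "," + h.y1 + " to " + h.x2 + "," + h.y2
                    + " continues=" + h.continuesOnNextLine);
        }
    }
}
```

In real use, the array passed to unpack would be the value returned by findTextOnPage.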
-
findTextOnPage
public float[] findTextOnPage(int page, int x1, int y1, int x2, int y2, String textToFind, int searchType) throws PdfException
Return the coords for the page specified. The origin of the coords is the bottom left-hand corner (on the unrotated page).
Parameters:
page - page to search
x1 - x1 coord of the rectangle to search
y1 - y1 coord of the rectangle to search
x2 - x2 coord of the rectangle to search
y2 - y2 coord of the rectangle to search
textToFind - text to look for
searchType - a static int from the org.jpedal.grouping.SearchType class
Returns:
float[] containing all coords for the page, or an empty array if no results are found
[0]=result x1 coord
[1]=result y1 coord
[2]=result x2 coord
[3]=result y2 coord
[4]=-101 to show that the next text area is the remainder of this word on another line; any other value is ignored
Throws:
PdfException
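Because the coords use a bottom-left origin, drawing a result in a top-left-origin space (such as a typical screen or image surface) requires flipping the y axis. The sketch below shows that arithmetic only. PdfToTopLeft is a hypothetical helper, and pageHeight is assumed to be supplied by the caller in the same units as the coords. Page rotation and scaling are not handled.

```java
/** Sketch: convert a bottom-left-origin result rectangle to top-left-origin form. */
final class PdfToTopLeft {

    /**
     * Returns {left, top, width, height} measured from the top-left corner.
     * pageHeight is an assumption: the caller must supply the unrotated page
     * height in the same units as the coordinates.
     */
    static float[] toTopLeftRect(float x1, float y1, float x2, float y2, float pageHeight) {
        float left = Math.min(x1, x2);
        // The higher of the two y values is the top edge in bottom-left space.
        float top = pageHeight - Math.max(y1, y2);
        float width = Math.abs(x2 - x1);
        float height = Math.abs(y2 - y1);
        return new float[] {left, top, width, height};
    }

    public static void main(String[] args) {
        // Made-up rectangle on a page assumed to be 792 units high.
        float[] r = toTopLeftRect(72f, 700f, 120f, 688f, 792f);
        System.out.println(r[0] + " " + r[1] + " " + r[2] + " " + r[3]);
    }
}
```

Taking the minimum and maximum of each pair means the helper does not depend on which corner the result reports first.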
-
findTextOnAllPages
public static List<float[]> findTextOnAllPages(String inputDir, String textToFind) throws PdfException
Convenience method to find text in a PDF file.
Parameters:
inputDir - a PDF file
textToFind - text to look for
Returns:
ArrayList containing a set of float[] values for all pages (-1 for actual page). The origin of the coords is the bottom left-hand corner (on the unrotated page), organised in the following order:
[0]=result x1 coord
[1]=result y1 coord
[2]=result x2 coord
[3]=result y2 coord
[4]=-101 to show that the next text area is the remainder of this word on another line; any other value is ignored
Throws:
PdfException
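A returned list of per-page arrays can be summarised with a short loop. The sketch below counts results per page under two assumptions: that "-1 for actual page" means list index i holds the results for page i + 1, and that each result occupies five values. AllPagesSummary is a hypothetical helper, not part of JPedal.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch: count search results per page from a list of flat coords arrays. */
final class AllPagesSummary {

    /** Assumption: each result occupies five consecutive values. */
    static final int VALUES_PER_HIT = 5;

    /** Returns one count per list entry; entry i is assumed to describe page i + 1. */
    static int[] hitsPerPage(List<float[]> results) {
        int[] counts = new int[results.size()];
        for (int i = 0; i < results.size(); i++) {
            float[] coords = results.get(i);
            counts[i] = (coords == null) ? 0 : coords.length / VALUES_PER_HIT;
        }
        return counts;
    }

    public static void main(String[] args) {
        // Made-up data: one result on page 1, none on page 2.
        List<float[]> sample = new ArrayList<>();
        sample.add(new float[] {72f, 700f, 120f, 688f, 0f});
        sample.add(new float[0]);
        int[] counts = hitsPerPage(sample);
        for (int i = 0; i < counts.length; i++) {
            System.out.println("page " + (i + 1) + ": " + counts[i] + " result(s)");
        }
    }
}
```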
-
setPassword
public void setPassword(String password)
Parameters:
password - the USER or OWNER password for the PDF file
-
getPageCount
public int getPageCount()
Number of pages in PDF file (starting at 1).
Returns:
page count
-