PDF Index Assistant

Miscellaneous

Acrobotics Inc. specializes in automation software for use with Adobe Acrobat 5.0 and Acrobat 6.0. Its programs perform tasks that Adobe Acrobat doesn’t, such as creating word indexes, on word-searchable pages contained in Portable Document Format (PDF) files. It’s a niche market for people who use word-searchable PDF files to review large batches of documents. I was curious to see how well the tool would reveal information in legal documents that might otherwise go undetected in a traditional word search.

The PDF Index Assistant program took less than three minutes to install. For the program to work, you must use the full licensed version of Acrobat 5.0 or 6.0. PDF Index Assistant will not work with Acrobat Reader.

Acrobotics states that PDF Index Assistant can build a comprehensive index for all text found in a PDF file. It’s a standalone program that must be started outside of Acrobat. Once started, it prompts you to select a PDF file that contains searchable text. Keep in mind it will not work on all PDF files, because not all text you see displayed in a PDF file is searchable; a scanned page, for example, may be only an image of the text. Performing a “Find” in Acrobat for a word you can see on screen generally will determine if the text is searchable. If the word isn’t found, then the PDF file doesn’t contain searchable text.

I first selected a PDF file containing a five-page generic confidentiality agreement that had previously been scanned and then run through optical character recognition (OCR) using Adobe Capture. Within seconds, PDF Index Assistant produced an alphabetically sorted list of all words, numbers and symbols, with page references for each. The final results opened in Microsoft Notepad as a text file. The report was an eight-page list of everything in the PDF file. For example, the number “0” appears on Page 1 of the document in the PDF file. The word “Agreement” appears on all five pages of the document. More interesting was that PDF Index Assistant picked up instances where “agreement” was misspelled or misidentified by the OCR program. To me, this is the key strength of PDF Index Assistant. Because the index lists every letter combination found in the file, the user can find all variations of the word being searched, including those a traditional word search would miss.
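To illustrate the kind of index described above, here is a minimal Python sketch. It is not Acrobotics’ implementation: it assumes the text of each page has already been extracted into a list of strings, the function names build_index and near_variants are hypothetical, and it indexes only letters and digits (the program also lists symbols). The sample pages, including the OCR-style misreading “Agreernent,” are made up for the example.

```python
import re
from collections import defaultdict
from difflib import get_close_matches


def build_index(pages):
    """Map each lowercased token to the sorted 1-based page numbers where it occurs."""
    index = defaultdict(set)
    for page_number, text in enumerate(pages, start=1):
        # Simplification: only runs of letters and digits are indexed.
        for token in re.findall(r"[A-Za-z0-9]+", text):
            index[token.lower()].add(page_number)
    # Return the entries in alphabetical order, like the report described above.
    return {word: sorted(nums) for word, nums in sorted(index.items())}


def near_variants(index, word, cutoff=0.8):
    """List indexed tokens that closely resemble `word` (possible OCR errors)."""
    matches = get_close_matches(word, list(index), n=10, cutoff=cutoff)
    return [m for m in matches if m != word]


# Made-up sample pages; page 2 contains an OCR-style misreading.
pages = [
    "This Agreement is dated 0 days after signing.",
    "The Agreernent binds both parties.",
    "Each party to the agreement shall keep it confidential.",
]

index = build_index(pages)
for word, page_numbers in index.items():
    print(word, page_numbers)

print(near_variants(index, "agreement"))
```

Scanning the alphabetical list by eye, as the review describes, serves the same purpose as near_variants here: misread words sort close to the correct spelling and stand out.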

Next, I used a 471-page PDF file that I had run through a generic OCR program. The file contained all documents associated with a patent prosecution history, which included forms, pleadings, correspondence and articles. Traditionally, this is a challenging package of documents to review. PDF Index Assistant stalled a couple of times trying to index the words, but after about four hours it produced an alphabetically sorted list of words, numbers and symbols with page references for each. I was able to use other programs while PDF Index Assistant was chugging away on my desktop, but they ran noticeably slower. Although the time it took to complete the index would try the nerves of the most patient individual, the word list was exceptionally detailed and accurate. Acrobotics said the processing time depends on the speed of the computer.

Thankfully, PDF Index Assistant has the option of automatically removing categories of common words. This means that pronouns, conjunctions and numbers don’t have to show up in the finished index results. Excluding them lets the user cut down on indexing time.
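The effect of excluding common words can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the COMMON_WORDS list and the filter_index function are hypothetical, the sample index is made up, and the sketch filters a finished index, whereas the program presumably skips these words while indexing, which is where the time savings would come from.

```python
# Hypothetical list of pronouns and conjunctions; the program's actual
# categories are not documented in this review.
COMMON_WORDS = {"a", "an", "and", "but", "he", "it", "or", "she", "the", "they", "we", "you"}


def filter_index(index, common_words=COMMON_WORDS, drop_numbers=True):
    """Return a copy of the index without common words and, optionally, numbers."""
    return {
        word: page_numbers
        for word, page_numbers in index.items()
        if word not in common_words and not (drop_numbers and word.isdigit())
    }


# Made-up sample index: token -> pages on which it appears.
index = {
    "0": [1],
    "agreement": [1, 2, 3, 4, 5],
    "and": [1, 2],
    "confidential": [3],
    "the": [1, 2, 3],
}

filtered = filter_index(index)
for word, page_numbers in filtered.items():
    print(word, page_numbers)
```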

PDF Index Assistant
Acrobotics Inc.
www.acrobotics.net
Fax: (918) 592-4389
Price: $195.

Windows 95/98/ME/NT/2000/XP. Fully licensed version of Acrobat 5.0 or 6.0.

Reviewed by Paul D. Pollard, a graduate of Villanova University Law School and director of litigation support for Fish & Richardson.

Dec/Jan '05 Issue

PROS
Inexpensive. Can index any PDF file that contains searchable text. Provides every letter combination, making it unlikely to miss a word.

CONS
Not recommended for large PDF files.

VERDICT
Has accurate results, but can’t benefit all litigation practices. Using it for long documents requires patience to allow the program to complete indexing.



Updated 11/22/04
© Law Office Computing Magazine
www.lawofficecomputing.com
(800) 394-2626