Find words in PDF from another file (NA)

Cap'n Jack

I have a collection of several hundred research papers, all in PDF; the authors of each paper are listed in the PDF. I also have an Excel file of people who received research grants (~100-150 people).

I want to see which papers were authored by people in my list.

Tools available to me:
Windows XP Pro
Adobe Reader 6
MS Excel 2002, MS Word 2002

I can install Visual Studio 6, or download Visual Basic or Visual C++ Express (whatever the current version is)

The task is to read each last name, search the PDFs for it, and put the matching file names next to that name in my Excel file.

The brute force method is to copy each last name by hand into the Adobe Reader search dialog and have it search through the files.

Is there a way to do it programmatically? The number of hits would be low enough that I can live with false positives, so just identifying the files that contain the name is good enough.

Thanks much!
-->Jack
 
Hmm, if you can install C#, there are interop methods for accessing MS Office apps such as Word and Excel. Accessing a PDF file would likely require purchasing something, but if you just open it as a text file you'll probably be okay, since you're just searching for strings.
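The "open it as a text file" idea can be sketched roughly as below. This is only a sketch under assumptions, written in Python rather than C#: the function names `searchable_bytes` and `files_mentioning` are made up for illustration, and the approach is lossy. Most PDFs store page text in compressed streams, so a plain string search on the raw file often finds nothing; the sketch therefore also tries to inflate any stream that zlib accepts. Even then a name can be missed when the PDF splits it across several text operators or uses a custom font encoding, so an empty result does not prove the name is absent.

```python
import re
import zlib
from pathlib import Path

# Matches the body of every "stream ... endstream" section in a PDF file.
STREAM_RE = re.compile(rb"stream\r?\n(.*?)endstream", re.DOTALL)


def searchable_bytes(raw: bytes) -> bytes:
    """Return the raw file bytes plus every stream that inflates with zlib, lowercased."""
    chunks = [raw]
    for match in STREAM_RE.finditer(raw):
        try:
            # decompressobj tolerates the trailing newline before "endstream".
            chunks.append(zlib.decompressobj().decompress(match.group(1)))
        except zlib.error:
            pass  # not a Flate-compressed stream (image data, other filters, ...)
    return b"\n".join(chunks).lower()


def files_mentioning(names, folder):
    """Map each name to the PDF file names whose bytes contain it (case-insensitive)."""
    hits = {name: [] for name in names}
    for pdf in sorted(Path(folder).glob("*.pdf")):
        data = searchable_bytes(pdf.read_bytes())
        for name in names:
            if name.lower().encode("latin-1", "ignore") in data:
                hits[name].append(pdf.name)
    return hits
```

A call such as `files_mentioning(["Smith", "Jones"], r"C:\papers")` (the folder is a placeholder) returns a dictionary of name to file names that could then be pasted into the spreadsheet.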
 
Install the Google Desktop utility (desktop.google.com). It will index whatever you tell it to index, including the full text of all documents, PDF files included. Then just use that tool to search through them.
 
Hmm, if you can install C#, there are interop methods for accessing MS Office apps such as Word and Excel. Accessing a PDF file would likely require purchasing something, but if you just open it as a text file you'll probably be okay, since you're just searching for strings.

The VBA in Word or Excel lets me do that within Office apps now. I'm not keen on giving money to Adobe, since their software always calls home and looks for stuff for me to buy.
 
Install the Google Desktop utility (desktop.google.com). It will index whatever you tell it to index, including the full text of all documents, PDF files included. Then just use that tool to search through them.

Jason, how do I access the index? It seems like that just moves the problem to another piece of software; I would still have to look up each name manually.
 
Jack--I could do this with PHP/Linux very easily (probably take me 15 minutes). If you're interested let me know. Basically I'd just need all the PDFs and a text file with the names you want searched. I could come over sometime and show you how to do it in a virtual machine or something.

There probably is an easy way to do it with all that fancy Microsoft technology--but that just isn't my area of expertise.
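For anyone curious what the "PDFs plus a text file of names" approach might look like, here is a rough sketch in Python rather than PHP. It assumes the text of each PDF has already been extracted to a .txt file with the same base name (for example with a converter such as pdftotext), and that the names file holds one last name per line. The function names and file layout are made up for illustration. Names are matched as whole words, so "Li" does not hit "Light", and the result is written as a CSV file that Excel can open.

```python
import csv
import re
from pathlib import Path


def load_names(names_file):
    """Read one last name per line, skipping blank lines."""
    lines = Path(names_file).read_text(encoding="utf-8").splitlines()
    return [line.strip() for line in lines if line.strip()]


def match_names(names, text_folder):
    """Return (name, pdf_file_name) pairs for every whole-word, case-insensitive hit."""
    patterns = {
        name: re.compile(r"\b" + re.escape(name) + r"\b", re.IGNORECASE)
        for name in names
    }
    rows = []
    for txt in sorted(Path(text_folder).glob("*.txt")):
        text = txt.read_text(encoding="utf-8", errors="ignore")
        for name, pattern in patterns.items():
            if pattern.search(text):
                # Report the PDF the text was assumed to be extracted from.
                rows.append((name, txt.stem + ".pdf"))
    return rows


def write_report(rows, csv_path):
    """Write the hits to a CSV file with a header row."""
    with open(csv_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["last_name", "file"])
        writer.writerows(rows)
```

Keeping the names file outside the folder of extracted text avoids it being scanned as if it were a paper.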
 
Jesse- thank you for your generous offer. I started to check the results we'd get manually before calling you in.

Two things get in the way. First, a name often appears in a paper's reference list rather than as an author. Second, many of the Chinese authors seem to share only a few surnames. Between the two I wind up with so many hits that manual searching works just as well as anything else, since I still need to check each hit to see if I'm looking at the right person.
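One partial way to cut down the reference-list hits, sketched here in Python under assumptions: search only the start of each paper's extracted text, stopping at a "References" style heading if one is found. The function name `front_matter`, the heading words, and the 3000-character limit are all guesses for illustration. It does nothing for the shared-surname problem, so those hits would still need checking by hand.

```python
import re

# A line consisting only of a typical reference-section heading (assumed wording).
REFERENCES_HEADING = re.compile(
    r"^\s*(references|bibliography|works cited)\s*$",
    re.IGNORECASE | re.MULTILINE,
)


def front_matter(text, max_chars=3000):
    """Return the part of the text most likely to hold the author list."""
    match = REFERENCES_HEADING.search(text)
    body = text[:match.start()] if match else text
    return body[:max_chars]
```

Searching `front_matter(text)` instead of the full text would skip names that only appear as cited authors, at the cost of missing an author list that sits unusually far into the document.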

A program isn't going to save me much work after all.

Thanks to everyone for their advice!
 