Over time one gains quite a library of PDF files, everything from how-tos and tutorials to eBooks. I’ve been searching for quite some time for a good way to index these types of documents on a Linux-based server. Microsoft has its solution in SharePoint, and you can get close to a fully indexed solution with tools like Google Desktop. What I really wanted was a good web-based solution. Enter Sphider. It’s fairly straightforward to give Sphider PDF indexing capability.

Sphider and Sphider-plus provide a simple solution for web-based indexing. Each is a collection of PHP scripts and commands for administration, indexing, and searching.

Instead of using the built-in converter, try this:
1. Download the precompiled Linux binary of pdftotext, included in the xpdf bundle, from: www.foolabs.com/xpdf/download.html
2. Unzip/untar the package and keep only the pdftotext file (it has no extension, that’s OK)
3. Rename “pdftotext” to “pdftotext.script”
4. Upload this file via FTP to the “converter” directory of Sphider-plus
5. Identify the physical path of your web site (your hosting provider should be able to give you this information)
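6. Create an empty text file and write two lines into it. The first line is not shown here; for a shell wrapper like this it is presumably the usual shebang, #!/bin/sh. The second line is:
/PATH/TO/YOUR/WEB/DOWN/TO/converter/pdftotext.script $1 -
7. Adapt the full path above to your needs and use single forward slashes (not double backslashes)
The second line begins with a slash and ends WITH the minus sign (a plain hyphen, which tells pdftotext to write the text to standard output)!
(Thanks to the user who posted this hint some time ago)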
8. Save this file as “pdftotext” (without the quotes)
9. Upload it to the converter dir
10. Set the permissions of both pdftotext and pdftotext.script to 755 or 777 (whatever is needed for them to run correctly)
11. Set the permissions of the converter dir to 777! Otherwise indexing fails because pdftotext is unable to write the temp file it needs.
12. Last: change the pdftotext path in conf.php to:
$pdftotext_path = '/PATH/TO/YOUR/WEB/DOWN/TO/converter/pdftotext';
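
If you have shell access instead of only FTP, steps 6 through 11 can be sketched as a short script. This is only a rough sketch under a few assumptions: CONVERTER_DIR is a placeholder name of mine (it falls back to a throwaway temp directory so the script can be tried safely), the "#!/bin/sh" first line of the wrapper is my guess at the usual shebang, and the stand-in pdftotext.script exists only so the sketch runs on a machine without xpdf. On a real install, pdftotext.script is the renamed binary from step 3.

```shell
#!/bin/sh
# Placeholder: point CONVERTER_DIR at the real "converter" directory
# of your Sphider-plus install. Defaults to a temp dir for a dry run.
CONVERTER_DIR="${CONVERTER_DIR:-$(mktemp -d)}"

# Stand-in for the renamed xpdf binary, created only if none is present.
# It just echoes its first argument; the real binary extracts the text.
if [ ! -e "$CONVERTER_DIR/pdftotext.script" ]; then
    printf '#!/bin/sh\necho "text of $1"\n' > "$CONVERTER_DIR/pdftotext.script"
fi

# Steps 6-9: write the two-line wrapper. It forwards the PDF path ($1)
# and a trailing "-" so pdftotext prints the text to standard output.
# The shebang line is an assumption; the quoting is added for safety.
cat > "$CONVERTER_DIR/pdftotext" <<EOF
#!/bin/sh
"$CONVERTER_DIR/pdftotext.script" "\$1" -
EOF

# Step 10: make both files executable.
chmod 755 "$CONVERTER_DIR/pdftotext" "$CONVERTER_DIR/pdftotext.script"

# Step 11: let the web server user write temp files in the directory.
chmod 777 "$CONVERTER_DIR"
```

Running "$CONVERTER_DIR/pdftotext" some.pdf by hand afterwards is a quick way to check that the wrapper works before pointing $pdftotext_path in conf.php at it.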
