Searching in file attachments on a Drupal platform

Interesting information on a topic is often not visible at first sight. For instance, a platform user might write a short note about the topic on a simple Drupal page while the really interesting information is hidden in a PDF attachment. The standard Drupal search engine does not search inside file attachments, so a custom solution has to be found.
1 answer

Using an Apache Solr server

To make the contents of attached files searchable, an Apache Solr server (http://lucene.apache.org/solr/) can be set up. Drupal can then use it to search attached documents through the modules “Search API”, “Search API Solr search”, and “Search API attachments”. Solr is an open source search platform designed to be integrated easily into many different applications. It is highly scalable and provides the features expected from a modern search engine: handling of rich documents such as MS Word or PDF files, powerful full-text search, faceted search, and many more. This makes it well suited to solving this challenge.
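To give an idea of what a full-text request to Solr looks like, the following minimal sketch builds the URL for a query against Solr's standard /select handler. The host, the core name "drupal", and the search phrase are assumptions made for illustration only; in a real setup the Search API modules construct and send such requests themselves.

```python
from urllib.parse import urlencode


def build_solr_query_url(base_url, core, text, rows=10):
    """Build a URL for a full-text query against Solr's /select handler."""
    params = {
        "q": text,     # the full-text query string
        "rows": rows,  # maximum number of hits to return
        "wt": "json",  # ask Solr for a JSON response
    }
    return f"{base_url.rstrip('/')}/{core}/select?{urlencode(params)}"


# Hypothetical server address and core name, for illustration only.
url = build_solr_query_url("http://localhost:8983/solr", "drupal", "annual report")
print(url)
```

The sketch only assembles the URL and does not contact a server, so it can be tried without a running Solr instance.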

Note:
Drupal coordinates many internal procedures with a scheduling mechanism called “Cron”, which can be run as a separate program on the web server or, since Drupal version 7, from within Drupal itself. Attached documents only become searchable after they have been published and then indexed by a Cron job. Once a piece of content has been indexed, it can easily be retrieved using the Solr search interface provided by the Search API module; it works exactly like the interface of the standard Drupal search function but makes use of a Solr server that runs somewhere else.
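The delay between publishing and searchability can be pictured with a small toy model. This is not Drupal's or Search API's actual code: the class ToyIndexer, its methods, and the batch size are hypothetical names chosen to show that a published document stays invisible to the search until a Cron run has processed it.

```python
from collections import deque


class ToyIndexer:
    """Toy model of cron-driven indexing (not Drupal's implementation)."""

    def __init__(self, batch_size=2):
        self.batch_size = batch_size  # assumed number of items per cron run
        self.pending = deque()        # published but not yet indexed
        self.index = {}               # doc_id -> lower-cased text

    def publish(self, doc_id, text):
        # Publishing only queues the document for a later cron run.
        self.pending.append((doc_id, text))

    def run_cron(self):
        # Index at most batch_size queued documents per run.
        indexed = 0
        while self.pending and indexed < self.batch_size:
            doc_id, text = self.pending.popleft()
            self.index[doc_id] = text.lower()
            indexed += 1
        return indexed

    def search(self, term):
        # Only documents that are already indexed can be found.
        term = term.lower()
        return sorted(d for d, text in self.index.items() if term in text)


indexer = ToyIndexer(batch_size=2)
indexer.publish("report.pdf", "Annual report on Solr performance")
before = indexer.search("solr")  # cron has not run yet
indexer.run_cron()
after = indexer.search("solr")   # the document is now indexed
print(before, after)
```

If a freshly uploaded attachment does not show up in the search results, the first thing to check is therefore whether Cron has run since the content was published.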
