Like most CMS search extensions, Joomla Search indexes content from inside the CMS.
Roughly speaking, it works as follows:
- We tell the search extension which types of content we are using and where/how these are stored in the Joomla database
- The search then either generates its own index on the basis of this information or searches through the relevant database tables/fields in real time
For simple search scenarios Joomla Search has several advantages: It’s easy to set up, and if we only need to search through articles and other native Joomla content, it works pretty much out of the box.
Requirements for this project
The search index should include the entire frontend/website regardless of the source of the content (e.g. articles, modules, various components, SPPB pages). It should also index any self-hosted PDF/DOC files linked to anywhere on the site.
The search needs only a simple user interface, but should be easy to maintain and highly scalable. It should also allow the user to search multiple websites, not all of which are built with Joomla, from one interface.
The search should allow easy filtering of results by content language and source website. The layout of the results page should be easy to manipulate, and individual results should feature contextual snippets.
Why not use Joomla Search?
Varied content types
With Joomla Search, things get more complicated with each additional type of content.
For a website using a variety of components, modules and methods to generate and render content, it takes a lot of effort to set Joomla Search up to take all of that content into account. Furthermore, there is no way of indexing content within e.g. PDF and DOC files.
Extending Joomla Search?
It would have been possible to develop our own, more substantial Joomla search extension, but this would have required searching and filtering the database directly, an approach which would be likely to weigh heavily on the database/server, and so affect performance.
Finally, there is no way to allow the user to search multiple websites from one search instance. If all of the sites in question were Joomla-based, it might be possible to implement a custom API within each site to return results for that site. However, developing this would be a serious undertaking, and even then the solution would only work for sites built with Joomla.
Assessing Google GSS as a solution
Although Google GSS was the natural first choice for this project, we also reviewed and considered various other options, such as Apache Solr and ElasticSearch: each a great solution in its own way but neither as well-suited to our requirements as GSS.
For those that haven’t encountered GSS, Google Site Search allows you to display standard Google search results within your own personal search engine. The search results are drawn from a predefined subset of Google’s actual search index.
Google GSS has many advantages:
- It indexes everything visible to the user, including files. It also makes use of keywords, metadata and user location when indexing and displaying results.
- The results returned by the API are the same as displayed with the normal Google search, containing snippets, titles, descriptions, URLs etc.
- GSS is far faster and more powerful than any standard CMS search extension, and consumes no local DB/server resources.
- The Google search API is very powerful and offers many features (e.g. filtering) that can be easily implemented.
- Finally, GSS is well-documented, can be used with any platform or technology (Python, PHP…) and is easy to implement and maintain.
There are a few disadvantages:
- Since GSS draws on the existing Google search index, the underlying data belongs to Google and cannot be manually manipulated to suit the client.
- New content is indexed as Google sees fit – it’s possible to request a re-index but there’s no guarantee on how quickly this will take place.
- Results can be filtered by language, but only by one language at a time. Google doesn’t support filtering the search results by multiple languages (e.g. English and German).
Implementing Google GSS within Joomla
Custom Search API
Google GSS works on the basis of the Custom Search API. We build and submit search queries as requests to the API, and GSS sends back the results in either XML or JSON format.
Since relatively little code is necessary to integrate GSS within a website, and since we planned to implement GSS on multiple sites, we chose not to develop a dedicated Joomla extension, but rather base our integration solely on PHP, using just two PHP files and a Joomla article layout override to render this within the site.
Search form & page layout
Our first PHP file handles the form and layout, and includes an HTML-form-based search interface to capture the search query/options, and the layout necessary to display results in the frontend.
The form contains an input field for the search query, a select field for the domains/site filter and two checkboxes for the language filter.
This script was implemented within the context of an article layout override, with a file path something like /templates/mytemplate/html/com_content/article/search.php. To render the search within the website, we simply created an article and assigned it this layout. The article itself didn’t need to contain any content, unless the site wanted e.g. to include static text on the search page. We then set up a single article menu item to set the URL alias (e.g. /search) and thus create the starting point on our site for the search.
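As a rough illustration of the form described above, here is a minimal sketch of how the markup might be generated. The function name, field names, domain list, language choices (English and German) and the action URL are all hypothetical and not taken from the original code. In a real Joomla layout override the file would also start with the usual `defined('_JEXEC') or die;` guard.

```php
<?php
// Hypothetical sketch only: names, domains and languages are assumptions.

/**
 * Build the search form as an HTML string.
 *
 * @param array  $domains Map of domain => human-readable label.
 * @param string $action  URL of the script that processes the form.
 */
function render_search_form(array $domains, string $action): string
{
    // First option means "no site filter".
    $options = '<option value="">All sites</option>';
    foreach ($domains as $domain => $label) {
        $options .= sprintf(
            '<option value="%s">%s</option>',
            htmlspecialchars($domain, ENT_QUOTES),
            htmlspecialchars($label, ENT_QUOTES)
        );
    }

    return '<form id="gss-form" method="post" action="'
        . htmlspecialchars($action, ENT_QUOTES) . '">'
        . '<input type="search" name="q" required>'
        . '<select name="site">' . $options . '</select>'
        . '<label><input type="checkbox" name="lang[]" value="en"> English</label>'
        . '<label><input type="checkbox" name="lang[]" value="de"> German</label>'
        . '<button type="submit">Search</button>'
        . '</form>';
}

// Example usage with placeholder domains.
$formHtml = render_search_form(
    ['www.example.com' => 'Main site', 'docs.example.org' => 'Documentation'],
    '/gss-search.php'
);
```

Escaping the domain and label values with htmlspecialchars() keeps the form safe even if the domain list is later loaded from configuration rather than hardcoded.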
Data processing & API requests
The second PHP file is the brains behind the operation. This file is called on form submit via Ajax from our first file and it:
- Contains the search logic, and handles all of our data and parameters
- Submits the API request on form submit
- Processes the submitted $_POST data and the response returned by the API
- Contains a hardcoded list of the domains that the user can search/filter, used to populate the filter dropdown in the search form
The Custom Search API requires the following parameters for each API request:
- Google API URL
- Our API client ID
- Number of search results returned per request (or default)
- Offset, which helps us handle pagination
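To make the pagination parameter concrete, here is a small sketch of how the offset could be derived from a page number. It assumes the offset is zero-based, as the start=0 in the sample request further down suggests; the function name gss_offset is hypothetical.

```php
<?php
// Hypothetical helper: converts a 1-based page number into a result offset.
// Assumes a zero-based offset (the first page starts at 0).
function gss_offset(int $page, int $perPage = 10): int
{
    // Guard against page numbers below 1 coming from user input.
    $page = max(1, $page);

    return ($page - 1) * $perPage;
}

$gss_num   = 10;
$gss_start = gss_offset(3, $gss_num); // third page of 10 results: offset 20
```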
This script handles the submitted POST data and builds the query from it as follows:
    $gss_url = $gss_base_url               // Google API URL
        . '&cx=' . $gss_client_id          // API client ID
        . '&num=' . $gss_num               // Number of results
        . '&start=' . $gss_start           // Result offset
        . '&hl=' . $gss_lang               // UI display language
        . '&lr=lang_' . $gss_lang          // Result language filter
        . '&q=' . urlencode($gss_q);       // Search query + site filter*
*The q parameter contains the query from the search input field, as well as an additional (optional) parameter site, which is used by Google to define which sites/domains to search.
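One way the q value might be assembled from the form input is sketched below. It assumes the site filter is appended to the query as a `site:` search operator and that the chosen domain is validated against the hardcoded domain list; the function name and the example domains are hypothetical.

```php
<?php
// Hypothetical sketch: combine the search term with an optional site filter.
// Assumes the filter is expressed as a "site:" operator inside q.
function gss_build_q(string $term, string $site, array $allowedDomains): string
{
    $q = trim($term);

    // Only accept domains from the hardcoded list, never raw user input.
    if ($site !== '' && in_array($site, $allowedDomains, true)) {
        $q .= ' site:' . $site;
    }

    return $q;
}

$allowed = ['www.example.com', 'docs.example.org'];
$gss_q   = gss_build_q(' joomla search ', 'docs.example.org', $allowed);
// $gss_q is now "joomla search site:docs.example.org"
```

The result is still passed through urlencode() when the request URL is built, so the space and colon are encoded at that point rather than here.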
The script sends the query using curl and fetches the results from Google. Curl is very powerful, but in our case all we need to do is initialise curl, set the URL containing the query and enable CURLOPT_RETURNTRANSFER, which tells curl to return the response from curl_exec() as a string instead of printing it directly.
Here’s the code:
    $curl = curl_init();
    curl_setopt_array($curl, array(
        CURLOPT_RETURNTRANSFER => 1,
        CURLOPT_URL => $gss_url,
    ));
    $resp = curl_exec($curl);
    curl_close($curl);
The final API request looks something like:
&cx=our_id&num=10&start=0&hl=en&lr=lang_en&q=search term here
The response containing the search results is built by Google. Since it is in XML format, we load it into the built-in PHP class SimpleXMLElement, which represents an XML document as an object. After that, all that remains is to walk through the results and render them as HTML.
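The parsing and rendering step could look roughly like the sketch below. The element names (RES for the result set, R for a single result, U, T and S for URL, title and snippet) follow Google's XML results format as far as I recall it, so treat them as assumptions and check them against a real response. The sample XML and both function names are made up for illustration.

```php
<?php
// Hypothetical sketch: element names RES/R/U/T/S are assumptions about the
// XML response format and should be verified against a real response.

/** Extract URL, title and snippet for each result in the XML response. */
function gss_parse_results(string $xml): array
{
    // Throws an Exception if the XML is malformed.
    $doc = new SimpleXMLElement($xml);

    $results = [];
    if (!isset($doc->RES)) {
        return $results; // no result set, e.g. zero hits
    }
    foreach ($doc->RES->R as $r) {
        $results[] = [
            'url'     => (string) $r->U,
            'title'   => (string) $r->T,
            'snippet' => (string) $r->S,
        ];
    }

    return $results;
}

/** Render the parsed results as a simple HTML list. */
function gss_render_results(array $results): string
{
    $items = '';
    foreach ($results as $r) {
        // Titles and snippets may carry markup such as <b>; strip it here.
        $items .= '<li><a href="' . htmlspecialchars($r['url'], ENT_QUOTES) . '">'
            . htmlspecialchars(strip_tags($r['title'])) . '</a>'
            . '<p>' . htmlspecialchars(strip_tags($r['snippet'])) . '</p></li>';
    }

    return '<ul class="gss-results">' . $items . '</ul>';
}

// Made-up sample response for demonstration purposes.
$sampleXml = <<<'XML'
<GSP>
  <RES>
    <R N="1">
      <U>https://www.example.com/a</U>
      <T>First &lt;b&gt;result&lt;/b&gt;</T>
      <S>Snippet one</S>
    </R>
    <R N="2">
      <U>https://www.example.com/b</U>
      <T>Second result</T>
      <S>Snippet two</S>
    </R>
  </RES>
</GSP>
XML;

$results = gss_parse_results($sampleXml);
$html    = gss_render_results($results);
```

Keeping parsing and rendering in separate functions makes it easier to change the layout of the results page without touching the code that reads the response.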
Was the project a success?
The solution as described above took about one to two weeks to implement, plus an additional one to two days of research and planning beforehand.
The final result looks great and (unsurprisingly) matches Google itself in terms of performance.
Expect the unexpected
The project was a success … for about a month: In March 2017 Google announced that they would be discontinuing the GSS service entirely in April 2018.