Search UI Autocompletion

We’re investigating how to provide autocomplete functionality on certain input fields in the Search UI (that is, providing suggestions to the user as they type). Some potential applications of this functionality include the catalog search fields and the gazetteer search box.

For the catalog search fields, we’re looking into providing suggestions based on the search phrases the user has previously entered, and also on the search phrases entered by other users of the same DDF instance. The search phrases used to provide suggestions should be preserved between DDF restarts.

Possible Autocomplete Implementations

Client-side

  1. Browser-based autocomplete. This will work if we’re okay with suggestions based only on the user’s previous input field values in their browser, rather than on the searches made by all users of that particular DDF instance. However, the browser’s built-in autocompletion gives suggestions based on previous inputs from many input fields, not just a particular one.

    To avoid getting those extra suggestions, we can use the Web Storage API’s localStorage. When the user submits a query, add the query string to an array kept in window.localStorage, making sure there aren’t any duplicates (localStorage holds only strings, so the array would have to be serialized, e.g. as JSON). This data is stored in the browser per origin and does not expire, although the user can clear it, so it could easily be used to power a <datalist> (an element that defines the possible suggestions for an <input>) for the search input field. We could also use jQuery UI’s autocomplete or another JavaScript autocomplete widget if we wanted more configurability. Because the data lives in the browser, the suggestions would persist between DDF restarts.

Server-side

  1. In-memory autocomplete. That is, keep a data structure in memory that contains the input field values previously used on that DDF instance and use it to give autocomplete suggestions. Some quick Google searches revealed a few immature and unmaintained libraries that would give decent suggestions, but they appear to have limited configurability. With this approach, we would also have to figure out how to make the suggestions persist between DDF restarts; a simple solution would be to write the phrases out to a text file.

    If we’re okay with having only prefix-based suggestions (i.e. only matching the user’s input against the beginning of the words and phrases available for suggestion), a trie would be a simple solution and there are already some decent implementations available.

  2. Solr/Lucene suggestions. A separate index, or even a dictionary file (which is just a text file), could be used as the persistence mechanism for the input field value suggestions. Solr has a well-documented “Suggester” component that would make it relatively easy to implement a suggestion service. Lucene also provides suggestion APIs, but using Lucene would likely involve more work than using Solr. This solution would give us the most flexibility in determining how suggestions are curated.
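
The prefix-only trie mentioned under the in-memory option could look roughly like the following. This is a minimal sketch rather than a recommended library: the class name PhraseTrie and its methods are made up for illustration, matching is case-sensitive, and persistence between restarts is left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical prefix trie for suggestion lookup; not an existing DDF class. */
class PhraseTrie {
    private static final class Node {
        // TreeMap keeps children sorted, so suggestions come back alphabetically.
        final Map<Character, Node> children = new TreeMap<>();
        boolean endOfPhrase;
    }

    private final Node root = new Node();

    /** Adds a phrase; inserting the same phrase twice has no further effect. */
    void insert(String phrase) {
        Node node = root;
        for (char c : phrase.toCharArray()) {
            node = node.children.computeIfAbsent(c, k -> new Node());
        }
        node.endOfPhrase = true;
    }

    /** Returns up to limit stored phrases that start with prefix. */
    List<String> suggest(String prefix, int limit) {
        List<String> results = new ArrayList<>();
        Node node = root;
        for (char c : prefix.toCharArray()) {
            node = node.children.get(c);
            if (node == null) {
                return results; // nothing is stored under this prefix
            }
        }
        collect(node, new StringBuilder(prefix), limit, results);
        return results;
    }

    // Depth-first walk that stops as soon as limit phrases have been gathered.
    private void collect(Node node, StringBuilder current, int limit, List<String> out) {
        if (out.size() >= limit) {
            return;
        }
        if (node.endOfPhrase) {
            out.add(current.toString());
        }
        for (Map.Entry<Character, Node> child : node.children.entrySet()) {
            if (out.size() >= limit) {
                return;
            }
            current.append(child.getKey());
            collect(child.getValue(), current, limit, out);
            current.setLength(current.length() - 1); // backtrack
        }
    }
}
```

Lookup cost depends on the prefix length and the number of results collected, not on the total number of stored phrases, which is the main appeal of a trie here.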

A Potential Server-Side Implementation

We could define a “Suggester” interface that every suggestion provider would implement. The implementations would be registered as OSGi services, with a separate service for each autocompletion type (e.g. one for catalog search suggestions and one for gazetteer suggestions). Each service would have a named string property that serves as the key for looking up the suggester service you want (e.g. “search” for the catalog search suggester and “gazetteer” for the gazetteer suggester).

Since a suggestion service would be used primarily by the UI, it would make sense to expose it through a REST endpoint. When you hit the /suggest endpoint, you’d pass the string for which you want completion suggestions, the number of suggestions you want, and the key of the suggestion service you want to use (as described above). The endpoint would simply return a JSON list of the top suggested words and phrases.
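
To make the shape of this concrete, here is a rough sketch of the interface and the lookup-by-key step. Everything in it is an assumption for illustration: the method signature, the class name SuggestEndpoint, and the use of a plain map where the real design would query the OSGi service registry by property. The hand-rolled JSON output is only there to keep the example free of dependencies.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical contract; the name and signature are assumptions, not an existing DDF API. */
interface Suggester {
    List<String> suggest(String input, int limit);
}

/** Stand-in for the /suggest endpoint logic, with a map in place of the OSGi service registry. */
class SuggestEndpoint {
    private final Map<String, Suggester> suggestersByKey = new HashMap<>();

    /** Mimics an OSGi service being registered with its key property, e.g. "search". */
    void register(String key, Suggester suggester) {
        suggestersByKey.put(key, suggester);
    }

    /** Returns a JSON array of suggestions, or an empty array for an unknown key. */
    String handle(String key, String input, int limit) {
        Suggester suggester = suggestersByKey.get(key);
        if (suggester == null) {
            return "[]";
        }
        List<String> suggestions = suggester.suggest(input, limit);
        StringBuilder json = new StringBuilder("[");
        for (int i = 0; i < suggestions.size(); i++) {
            if (i > 0) {
                json.append(',');
            }
            json.append('"').append(escape(suggestions.get(i))).append('"');
        }
        return json.append(']').toString();
    }

    // Escapes only backslashes and double quotes; a real endpoint would use a JSON library.
    private static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }
}
```

Returning an empty list for an unknown key is just one possible choice; an error response might be preferable so that a misconfigured UI is easier to spot.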

For search field suggestions, a Solr implementation (via SolrJ) would probably be the best option because persistence is already taken care of and we’d have the most options for determining how suggestions are curated.

Collecting the Words and Phrases to Suggest

A PreQueryPlugin seems like a logical place to record incoming queries in the suggestion collection, but it would not work if suggestions are meant to be based only on queries submitted via the Search UI input fields. PreQueryPlugins run for queries from any source, so the fact that one ran does not mean the user actually submitted a query from the Search UI.

To record the user’s query string, we would probably need an endpoint that takes a query string and the keys of the suggestion services to update, and adds the query to each of those services’ suggestion collections. For this to work, the “Suggester” interface would need methods for inserting phrases into the suggestion collection.
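
A sketch of the recording side might look like this, assuming a “Suggester” contract with a hypothetical add method alongside suggest. The class names, the trimming of whitespace, and the choice to silently skip unknown keys are all illustrative assumptions, and the in-memory implementation exists only to make the example runnable.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical contract with an insertion method; names and signatures are assumptions. */
interface Suggester {
    List<String> suggest(String input, int limit);

    void add(String phrase);
}

/** Naive in-memory implementation, used here only to make the sketch runnable. */
class InMemorySuggester implements Suggester {
    // LinkedHashSet drops duplicates and keeps insertion order.
    private final Set<String> phrases = new LinkedHashSet<>();

    @Override
    public void add(String phrase) {
        phrases.add(phrase);
    }

    @Override
    public List<String> suggest(String input, int limit) {
        List<String> matches = new ArrayList<>();
        for (String phrase : phrases) {
            if (matches.size() >= limit) {
                break;
            }
            if (phrase.startsWith(input)) {
                matches.add(phrase);
            }
        }
        return matches;
    }
}

/** Stand-in for the recording endpoint: adds one query to each named suggester. */
class SuggestionRecorder {
    private final Map<String, Suggester> suggestersByKey = new HashMap<>();

    void register(String key, Suggester suggester) {
        suggestersByKey.put(key, suggester);
    }

    /** Returns the keys that were actually updated; blank queries and unknown keys are skipped. */
    List<String> record(String query, List<String> keys) {
        List<String> updated = new ArrayList<>();
        String normalized = query == null ? "" : query.trim();
        if (normalized.isEmpty()) {
            return updated;
        }
        for (String key : keys) {
            Suggester suggester = suggestersByKey.get(key);
            if (suggester != null) {
                suggester.add(normalized);
                updated.add(key);
            }
        }
        return updated;
    }
}
```

Because the Search UI would call this endpoint explicitly on submit, only queries that actually originate from the UI end up in the suggestion collections.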

Factoring in Frequency

If we want our suggestions to take into account how frequently each input field value is used, we can probably do so with Solr or Lucene by calculating a weight for each entry based on its frequency of use and using one of the lookup implementations that takes weight into account. Otherwise, we can provide suggestions based solely on how well each candidate matches the user’s input string.
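
The weighting idea can be illustrated without any search library. The sketch below is plain Java, not the Solr or Lucene API: it assumes the weight of a phrase is simply the number of times it has been recorded, and ranks prefix matches by that weight. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical frequency-weighted phrase store; plain Java, not the Solr or Lucene API. */
class WeightedPhrases {
    // Weight of a phrase = number of times it has been recorded.
    private final Map<String, Integer> weights = new HashMap<>();

    void record(String phrase) {
        weights.merge(phrase, 1, Integer::sum);
    }

    /** Prefix matches ordered by descending weight, ties broken alphabetically. */
    List<String> suggest(String prefix, int limit) {
        List<Map.Entry<String, Integer>> matches = new ArrayList<>();
        for (Map.Entry<String, Integer> entry : weights.entrySet()) {
            if (entry.getKey().startsWith(prefix)) {
                matches.add(entry);
            }
        }
        matches.sort((a, b) -> {
            int byWeight = Integer.compare(b.getValue(), a.getValue()); // higher weight first
            return byWeight != 0 ? byWeight : a.getKey().compareTo(b.getKey());
        });
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Integer> entry : matches) {
            if (result.size() >= limit) {
                break;
            }
            result.add(entry.getKey());
        }
        return result;
    }
}
```

A raw count is the simplest possible weight; a real implementation might decay old entries or cap the influence of a single user, but those choices are outside what this note has decided.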

Summary

For search query input values, if we’re okay with having suggestions based only on the user’s previous searches and not those of all users querying that particular DDF instance, then we would recommend storing the user’s queries in the browser via localStorage. The stored queries can easily be used to populate a <datalist> for the search input field, or to act as a backing data source for jQuery UI’s autocomplete or another JavaScript autocomplete widget. This solution has the advantage of requiring only client-side code changes (and small ones at that). Additionally, it allows the suggestion data to persist between DDF restarts.

If curating search suggestions based on the searches of all users querying a particular DDF instance is necessary, then we recommend the server-side solution described above. This solution is fairly extensible, as adding a new autocompletion service would only require implementing another “Suggester”, registering it as an OSGi service, and determining how to collect your suggestion data.