Solr Catalog Provider Apps
Description
The Solr Catalog Provider (SCP) is an implementation of the CatalogProvider
interface that uses Apache Solr as a data store. Notable features of the SCP include:
- Supports Extensible Metacards
- Fast, simple contextual searching
- Indexes XML attributes as well as CDATA sections and XML text elements
- Simple relative (//element) and absolute (/root/element) XPath support
- Works with an embedded, local Solr Server (all-in-one Catalog)
- No configuration necessary on a single-node distribution
- Configurable data directory for Solr indexes
- Works with a standalone Solr Server
Usage
The Solr Catalog Provider is used in conjunction with an Apache Solr Server data store. It can work with either an embedded, local Solr Server instance or an external Solr Server. The embedded, local instance is a lightweight solution that works out of the box without any configuration. However, it does not provide a Solr Admin GUI or a "REST-like HTTP/XML and JSON API." If either of these is needed, see the Standalone Solr Server App.
Two different apps exist:
- catalog-solr-app: includes the Solr Catalog Provider and an embedded Solr Server, all in one.
- catalog-solr-external-app: includes only the Solr Catalog Provider and is meant to be used only with the Standalone Solr Server App (catalog-solr-server-app).
App Comparison / Usage Chart
Feature | Embedded Solr: Pro | Embedded Solr: Con | Standalone Solr: Pro | Standalone Solr: Con |
---|---|---|---|---|
Scalability | | | | |
Flexibility | | | | |
(Administrative) Tools | | | | |
Security | | | | |
Performance | | | | |
Backup/Recovery | | | | |
When to Use
Use the local, embedded Solr Catalog Provider when only one DDF instance is necessary and scalability is not an issue. The local, embedded Solr Catalog Provider requires no installation and little to no configuration, because it is ready out of the box. It is well suited for demonstrations, training exercises, and sparse querying and ingesting. For heavy query and ingest processing, use the Standalone Solr Server on a separate machine. See the Standalone Solr Server Recommended Configuration. Both apps can store the same amount of data and indexes.
Installation and Uninstallation
The Solr Source can be installed and uninstalled using the normal processes described in the Configuration section. Ensure that no other Catalog Provider is installed before installing this Catalog Provider.
Embedded Solr Server and Solr Catalog Provider
Users can use the Solr Catalog Provider with an embedded Solr Server by installing the catalog-solr-provider feature (if it is not already installed). Installing this feature installs a Solr Catalog Provider and starts an instance of an embedded Java Solr Server within the distribution. Optional configurations are available. See the Configuration section for more information.
Solr Catalog Provider for External Solr
If the Solr Server is not embedded within the current distribution, install the external Solr Catalog Provider by installing the catalog-solr-external-provider feature. This feature does not install any Solr Servers. Instead, it provides an "unconfigured" Solr Catalog Provider. See the Configuration section for how to configure this Solr Catalog Provider to connect to an external Solr Server.
Configuration
Embedded Solr Server and Solr Catalog Provider
No configuration is necessary for the embedded Solr Server and the Solr Catalog Provider to work out of the box. The standard installation described above is sufficient. When the catalog-solr-provider feature is installed, it stores the Solr index files in <DISTRIBUTION_INSTALLATION_DIRECTORY>/data/solr by default, and no parameters need to be specified. In addition, the catalog-solr-provider feature contains all files necessary for Solr to start the server.
However, this component can be configured to specify the directory to use for data storage using the normal processes described in the Configuration section.
The configurable properties for the SCP are accessed from the Catalog Embedded Solr Catalog Provider Configurations in the Admin Console.
Handy Tip
The Embedded (Local) Solr Catalog Provider works on startup without any configuration because a local embedded Solr Server is automatically started and pre-configured.
Configurable Properties
Title | Property | Type | Description | Default Value | Required |
---|---|---|---|---|---|
Data Directory File Path | dataDirectoryPath | String | Specifies the directory to use for data storage. A restart of the distribution is necessary for this property to take effect. If the path contains directories that do not exist, SCP will attempt to create them. Out of the box (without configuration), SCP writes to <DISTRIBUTION_INSTALLATION_DIRECTORY>/data/solr. If dataDirectoryPath is left blank (empty string), it defaults to <DISTRIBUTION_INSTALLATION_DIRECTORY>/data/solr. To minimize confusion, use an absolute path such as /opt/solr_data on Linux or C:/solr_data on Windows. Write permission on the directory is required. | | No |
Force Auto Commit | forceAutoCommit | Boolean / Checkbox | WARNING: Performance impact. Force auto-commit only in special cases. Forcing auto-commit makes search results visible immediately. | Unchecked/False | No |
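The sketch below shows how the Data Directory File Path rules in the table above could be applied. It is a minimal illustration under stated assumptions and is not SCP's code. The function name `resolve_data_directory` is hypothetical, and so is its `install_dir` parameter, which stands in for <DISTRIBUTION_INSTALLATION_DIRECTORY>. This document does not say how SCP treats relative paths, so the sketch leaves them relative to the current working directory.

```python
import os
from pathlib import Path

# Default location relative to the installation directory, per the table above.
DEFAULT_SUBDIR = Path("data") / "solr"


def resolve_data_directory(data_directory_path: str, install_dir: str) -> Path:
    """Hypothetical helper: resolve and prepare the Solr data directory.

    install_dir stands in for <DISTRIBUTION_INSTALLATION_DIRECTORY> (an assumption).
    """
    if not data_directory_path or not data_directory_path.strip():
        # Blank (empty string) falls back to <install dir>/data/solr.
        # Treating whitespace-only values as blank is an assumption of this sketch.
        target = Path(install_dir) / DEFAULT_SUBDIR
    else:
        # Relative paths are left relative to the current working directory here;
        # SCP's handling of relative paths is not described in this document.
        target = Path(data_directory_path.strip())
    # Attempt to create any missing directories along the path.
    target.mkdir(parents=True, exist_ok=True)
    # Write permission on the directory is required.
    if not os.access(target, os.W_OK):
        raise PermissionError(f"cannot write to {target}")
    return target
```

For example, `resolve_data_directory("", "/opt/ddf")` would return and create `/opt/ddf/data/solr`. `resolve_data_directory("/opt/solr_data", "/opt/ddf")` would use `/opt/solr_data` directly.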
Solr Configuration Files
Apache Solr uses configuration files to customize the behavior of the Solr Server. These files are located at <DISTRIBUTION_INSTALLATION_DIRECTORY>/etc/solr. Take care when editing these files, because they directly affect the functionality and performance of the Solr Catalog Provider. A restart of the distribution is necessary for changes to take effect.
Note on Solr Configuration File Changes
Solr configuration files should not be changed in most cases. Changes to schema.xml will most likely require code changes within the Solr Catalog Provider.
Moving Solr Data to a New Location
If SCP has just been installed for the first time, changing the Data Directory File Path property and restarting the distribution is all that is necessary, because no data has been written into Solr yet. However, if the location must change after data has already been ingested in a previous location, the following steps are required:
- Change the Data Directory File Path property within the Catalog Embedded Solr Catalog Provider Configuration in the Admin Console to the desired future location of the Solr data files.
- Shut down the distribution.
- Find the future location on the drive. If it does not exist, create the directories.
- Find the location of the current Solr data files and copy all the directories in that location to the future location. For instance, if the previous Solr data files existed at C:/solr_data and they need to move to C:/solr_data_new, copy all directories within C:/solr_data into C:/solr_data_new. Usually this consists of copying the index and tlog directories into the new data directory.
- Start the distribution. SCP should recognize the index files and be able to query them as before.
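The "copy all the directories" step in the procedure above can be scripted with the Python standard library. This is a minimal sketch, and the helper name `move_solr_data` is hypothetical. It assumes the distribution is already shut down. It copies every subdirectory of the old data directory (usually index and tlog) into the new one and leaves the originals in place. `shutil.copytree` with `dirs_exist_ok` requires Python 3.8 or later.

```python
import shutil
from pathlib import Path
from typing import List


def move_solr_data(old_dir: str, new_dir: str) -> List[str]:
    """Hypothetical helper: copy every directory in old_dir into new_dir.

    Run only while the distribution is shut down. Returns the copied names.
    """
    src, dst = Path(old_dir), Path(new_dir)
    # Create the future location if it does not exist yet.
    dst.mkdir(parents=True, exist_ok=True)
    copied = []
    for child in sorted(src.iterdir()):
        if child.is_dir():
            # Typically index/ and tlog/; originals are left untouched.
            shutil.copytree(child, dst / child.name, dirs_exist_ok=True)
            copied.append(child.name)
    return copied
```

For the example in the steps above, `move_solr_data("C:/solr_data", "C:/solr_data_new")` would copy the index and tlog directories. The Data Directory File Path property must still point at C:/solr_data_new before the distribution is started.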
Note: Changes Require a Distribution Restart
If the Data Directory File Path property is changed, no changes will occur to the SCP until the distribution has been restarted.
Handy Tip
If the Data Directory File Path property is changed to a new directory and the previous data is not moved into that directory, Solr will create an empty index there, and no previously ingested data will be available. It is therefore possible to keep Solr files in multiple places and toggle between those locations for different sets of data.
Solr Catalog Provider for External Solr
For the external Solr Catalog Provider to work, it must be pointed at the external Solr Server. When the catalog-solr-external-provider feature is installed, it remains in an unconfigured state until the user provides an HTTP URL for the external Solr Server. The configurable properties for this SCP are accessed from the Catalog External Solr Catalog Provider Configurations in the Admin Console.
Configurable Properties
Title | Property | Type | Description | Default Value | Required |
---|---|---|---|---|---|
HTTP URL | url | String | HTTP URL of the standalone, preconfigured Solr 4.x Server. | http://localhost:8181/solr | Yes |
Force Auto Commit | forceAutoCommit | Boolean / Checkbox | WARNING: Performance Impact. Only in special cases should auto-commit be forced. Forcing auto-commit makes the search results visible immediately. | Unchecked/False | No |
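A quick sanity check of the url value can catch typos before it is saved. The helper below is hypothetical. It only parses the string with `urllib.parse.urlparse` and does not contact the server. Accepting https in addition to http is an assumption of this sketch, not documented SCP behavior.

```python
from urllib.parse import urlparse

# Default value of the url property, from the table above.
DEFAULT_URL = "http://localhost:8181/solr"


def normalize_solr_url(url: str = DEFAULT_URL) -> str:
    """Hypothetical check for the external SCP url property (no network access)."""
    cleaned = url.strip()
    parsed = urlparse(cleaned)
    # Require an HTTP(S) scheme and a host; https support is an assumption.
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"not an HTTP(S) URL: {url!r}")
    # Drop a trailing slash so later path joins are consistent.
    return cleaned.rstrip("/")
```

For example, `normalize_solr_url("http://solr.example.com:8983/solr/")` returns `http://solr.example.com:8983/solr`. A value such as `ftp://host/solr` is rejected.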
Implementation Details
Indexing Text
When storing fields, the Solr Catalog Provider analyzes and tokenizes the text values of STRING_TYPE and XML_TYPE AttributeTypes. These fields are indexed in at least three ways: in raw form, analyzed with case sensitivity, and analyzed without case sensitivity. For XML, the Solr Catalog Provider analyzes and tokenizes XML CDATA sections, XML element text values, and XML attribute values.
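The sketch below is a simplified view of the three index forms and of the XML values that get analyzed. It assumes a plain word-character tokenizer in place of Solr's actual analyzers. The function names and dictionary keys are hypothetical and do not correspond to SCP field names. Python's ElementTree reports CDATA content as ordinary element text, so CDATA sections are collected along with other text.

```python
import re
import xml.etree.ElementTree as ET

# Simplified tokenizer (assumption): runs of word characters.
TOKEN = re.compile(r"\w+")


def index_forms(value: str) -> dict:
    """Three index forms of a text value, using a simplified analyzer."""
    tokens = TOKEN.findall(value)
    return {
        "raw": value,                                  # stored verbatim
        "tokens": tokens,                              # analyzed, case-sensitive
        "tokens_lower": [t.lower() for t in tokens],   # analyzed, case-insensitive
    }


def xml_text_values(xml: str) -> list:
    """Collect attribute values and element text (CDATA included) in document order."""
    values = []
    for elem in ET.fromstring(xml).iter():
        values.extend(elem.attrib.values())
        # text precedes the first child; tail follows the element's end tag.
        for text in (elem.text, elem.tail):
            if text and text.strip():
                values.append(text.strip())
    return values
```

Each value returned by `xml_text_values` would then be analyzed in the same three ways as a plain string. With this simplified tokenizer, a case-insensitive search for "hello" would match "Hello World" only through the `tokens_lower` form.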
Known Issues
- When searching with the ANY_TEXT field, SCP does not search all text fields within the Catalog Provider. Instead, it searches only the METADATA field.
- SCP does not fully support spatial capabilities.
- SCP does not support ingesting or querying GeometryCollection WKT.
- SCP does not support crossing the International Date Line or pole wrapping.
- SCP ignores the following AttributeDescriptor methods: isIndexed, isTokenized, isMultivalued, isStored. Instead, SCP indexes, tokenizes, and stores data based on the AttributeFormat. For example, it stores but does not index all fields labeled AttributeFormat.BINARY, regardless of user instruction. SCP currently has no multivalue support, even though Solr supports it.
- SCP has a 1000 nautical mile limit for nearest neighbor queries. If a point is not provided, the centroid of the shape is used for distance calculations.
- SCP does not fully support XPath. Attributes and equality expressions are not currently supported.
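The nearest neighbor limit can be illustrated with a great-circle (haversine) distance check. This sketch rests on several assumptions:

- It uses a mean Earth radius of 3440.065 nautical miles.
- It takes coordinates as (lon, lat) pairs in WKT order.
- When the query shape is not a single point, it uses a planar shoelace centroid computed on raw longitude and latitude. This is only an approximation.

SCP's actual spatial computation is not described in this document, and all helper names are hypothetical. The sketch does not handle the International Date Line or the poles.

```python
import math

EARTH_RADIUS_NM = 3440.065   # mean Earth radius in nautical miles (assumption)
MAX_NN_DISTANCE_NM = 1000.0  # limit stated in Known Issues


def haversine_nm(lat1, lon1, lat2, lon2):
    """Great-circle distance in nautical miles between two lat/lon points."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * EARTH_RADIUS_NM * math.asin(math.sqrt(a))


def polygon_centroid(points):
    """Planar centroid of a simple polygon of (lon, lat) vertices (shoelace formula)."""
    area2 = cx = cy = 0.0
    n = len(points)
    for i in range(n):
        x0, y0 = points[i]
        x1, y1 = points[(i + 1) % n]
        cross = x0 * y1 - x1 * y0
        area2 += cross                 # twice the signed area
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    if area2 == 0:
        raise ValueError("degenerate polygon")
    return cx / (3 * area2), cy / (3 * area2)


def nn_origin(shape):
    """Use the point itself if one is given, otherwise the shape's centroid."""
    return shape[0] if len(shape) == 1 else polygon_centroid(shape)


def within_nn_limit(shape, candidate):
    """True if candidate (lon, lat) is within the nearest neighbor limit."""
    lon0, lat0 = nn_origin(shape)
    lon1, lat1 = candidate
    return haversine_nm(lat0, lon0, lat1, lon1) <= MAX_NN_DISTANCE_NM
```

One degree of latitude is roughly 60 nautical miles, so the 1000 nautical mile limit spans about 16.7 degrees of latitude from the query origin.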