Index

This class represents an Index inside an Elasticsearch cluster. It provides a set of methods that allow the user to query the index and add new data.

The class also keeps a buffer of documents waiting to be pushed to the index. The user can add documents to the buffer, and the class pushes them as soon as the buffer is full. The user can also force the push by flushing the buffer.

To initialize an index:

client = JayAPI::Elasticsearch::ClientFactory.new(
  cluster_url: 'https://my-cluster.elastic.io'
).create(max_attempts: 3, wait_strategy: :constant, wait_interval: 2)

index = JayAPI::Elasticsearch::Index.new(
  client: client,
  index_name: 'my_index'
)

The cluster_url and the index_name are the only required parameters. If the cluster uses Elasticsearch’s default port (9200) and has no authentication in place, this is all you need. In most cases, however, that is not enough, so you can also provide the following extra parameters:

  • port: The port number where the Elasticsearch cluster is listening for connections.

  • username: The username to use when authenticating against the cluster.

  • password: The user’s password.

  • batch_size: The number of documents the Index will store in its buffer before triggering an automatic flush.

  • logger: The logger to which messages should be written. If you don’t pass a logger, the class will create one.

The create method, which returns the client object, also takes optional arguments that define the connection retry behaviour:

  • max_attempts: Sets the maximum number of reconnection attempts in response to server errors.

  • wait_strategy: Determines the strategy for wait intervals between reconnection attempts. Options are:

    • :constant - Maintains a consistent wait time specified by wait_interval.

    • :geometric - Increases the wait time geometrically based on wait_interval.

  • wait_interval: Specifies the base wait time (in seconds) for the chosen wait_strategy.
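To make the difference between the two strategies concrete, here is a minimal, self-contained sketch that computes the waits between retries. It does not use JayAPI: wait_schedule is a hypothetical helper, and the growth factor of 2 for :geometric is an assumption, since the text above does not state the factor the library uses.

```ruby
# Illustrative only: computes the wait before each retry under two strategies.
# `wait_schedule` is a hypothetical helper and the growth factor of 2 for
# :geometric is an assumption, not something taken from JayAPI.
def wait_schedule(strategy:, base:, retries:, factor: 2)
  (0...retries).map do |attempt|
    case strategy
    when :constant then base
    when :geometric then base * (factor**attempt)
    else raise ArgumentError, "unknown strategy: #{strategy.inspect}"
    end
  end
end

puts wait_schedule(strategy: :constant, base: 2, retries: 3).inspect  # [2, 2, 2]
puts wait_schedule(strategy: :geometric, base: 2, retries: 3).inspect # [2, 4, 8]
```

With a base of 2 seconds, the constant strategy always waits 2 seconds, while the geometric one (under the assumed factor) waits longer after every failed attempt.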

#push

The push method stores a document in the Index’s buffer. When the buffer reaches its maximum number of records, it is flushed automatically.

push takes a single Hash: the document you want to send to the index.

Warning

When using the push method, make sure to call flush at the end. Automatic flushing only occurs when the buffer is full, so if you do not call flush at the end of the run you might lose some documents.

Example:

documents.each do |document|
  # do something with your document, then push it
  index.push(document)
end

index.flush # Do not forget to flush the index at the end.
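The following sketch shows why the final flush matters. ToyBufferedIndex is a hypothetical stand-in written for this illustration, not the JayAPI implementation; it only mimics the behaviour described above (buffer, automatic flush when full, explicit flush) and records batches instead of sending them to a cluster.

```ruby
# Hypothetical stand-in for the Index buffer; NOT the JayAPI implementation.
# It records flushed batches in memory instead of contacting a cluster.
class ToyBufferedIndex
  attr_reader :sent_batches

  def initialize(batch_size:)
    @batch_size = batch_size
    @buffer = []
    @sent_batches = []
  end

  # Stores the document and flushes automatically once the buffer is full.
  def push(document)
    @buffer << document
    flush if @buffer.size >= @batch_size
  end

  # Records the buffered documents as one batch and empties the buffer.
  def flush
    return if @buffer.empty?

    @sent_batches << @buffer.dup
    @buffer.clear
  end

  # Number of documents still waiting in the buffer.
  def queue_size
    @buffer.size
  end
end

toy = ToyBufferedIndex.new(batch_size: 10)
25.times { |i| toy.push({ id: i }) }

puts toy.sent_batches.map(&:size).inspect # [10, 10]: two automatic flushes
puts toy.queue_size                       # 5 documents are still waiting

toy.flush
puts toy.queue_size                       # 0 after the explicit flush
```

With 25 documents and a batch size of 10, the last 5 documents never trigger an automatic flush; they only leave the buffer when flush is called explicitly.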

#index

index pushes a document directly to the Elasticsearch cluster without adding it to the buffer first, so you don’t need to call flush.

index takes a single Hash: the document you want to send to the index.

Example:

index.index(my_document)

Note

Pushing documents one at a time is very inefficient because the Index needs to perform an HTTP request for each one. If you want to send many documents, use push instead.
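As a rough back-of-the-envelope comparison, the sketch below counts requests for both approaches. request_counts is a hypothetical helper, the batch size of 100 is an arbitrary example value, and the idea that each flushed batch costs a single HTTP request is an assumption about how flushes are sent.

```ruby
# Rough request count for sending `documents` documents.
# Assumptions: one HTTP request per #index call (as stated in the note above)
# and one request per flushed batch (not confirmed by this document).
def request_counts(documents:, batch_size:)
  {
    index: documents,
    push: (documents.to_f / batch_size).ceil
  }
end

counts = request_counts(documents: 1_000, batch_size: 100)
puts counts[:index] # 1000
puts counts[:push]  # 10
```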

#flush

Flushes the current buffer to Elasticsearch, pushing all the documents currently stored in the queue (if there are any).

Example:

documents.each do |document|
  index.push(document)
end

index.flush

#queue_size

Returns the number of documents currently waiting to be flushed to Elasticsearch.

Example:

index.queue_size # => 16

#delete_by_query

This method allows you to remove the documents that match the given query from the index. The method has a single parameter:

  • query: A Hash with the query you want to use to match documents for deletion. For more information on this parameter or how to create queries see the #search method documentation.

On success the method will return a Hash with information about the executed command, for example:

{
  took: 740,
  timed_out: false,
  total: 1748,
  deleted: 1748,
  batches: 2,
  version_conflicts: 0,
  noops: 0,
  retries: { bulk: 0, search: 0 },
  throttled_millis: 0,
  requests_per_second: -1.0,
  throttled_until_millis: 0,
  failures: []
}

On error an Elasticsearch::Transport::Transport::ServerError will be raised.
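Because a successful call can still leave some matching documents in place (for example after version conflicts), it can be useful to inspect the returned Hash. The sketch below uses the response shown above as sample data. summarize_deletion is a hypothetical helper, not part of JayAPI, and it assumes the Hash has symbol keys as in that example.

```ruby
# Hypothetical helper: condenses the Hash returned by delete_by_query.
# Assumes symbol keys, as in the sample response shown in this section.
def summarize_deletion(response)
  {
    complete: !response[:timed_out] && response[:failures].empty? &&
      response[:deleted] == response[:total],
    deleted: response[:deleted],
    skipped: response[:total] - response[:deleted],
    version_conflicts: response[:version_conflicts]
  }
end

# Sample data copied from the response above, not a live result.
response = {
  took: 740, timed_out: false, total: 1748, deleted: 1748, batches: 2,
  version_conflicts: 0, noops: 0, retries: { bulk: 0, search: 0 },
  throttled_millis: 0, requests_per_second: -1.0,
  throttled_until_millis: 0, failures: []
}

summary = summarize_deletion(response)
puts summary[:complete] # true
puts summary[:skipped]  # 0
```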