Index
This class represents an Index inside an Elasticsearch cluster. It provides a set of methods that allow the user to query the index and add new data.
The class also keeps a buffer of documents waiting to be pushed to the index. You can add documents to the buffer, and the class pushes them as soon as the buffer is full. You can also force the push by flushing the buffer.
To initialize an index:
client = JayAPI::Elasticsearch::ClientFactory.new(
  cluster_url: 'https://my-cluster.elastic.io'
).create(max_attempts: 3, wait_strategy: :constant, wait_interval: 2)

index = JayAPI::Elasticsearch::Index.new(
  client: client,
  index_name: 'my_index'
)
The cluster_url and the index_name are the only required parameters. If the cluster is configured to use Elasticsearch’s default port (9200) and has no authentication in place, this is all you need. However, in most cases that will not be enough, so you can also provide the following extra parameters:
port: The port number where the Elasticsearch cluster is listening for connections.
username: The username to use when authenticating against the cluster.
password: The user’s password.
batch_size: The number of documents the Index will store in its buffer before triggering an automatic flush.
logger: The logger that should receive the class’s messages. If you don’t pass a logger, the class will create one.
The create method, which returns the client object, also takes optional arguments that define the connection retry behaviour:
max_attempts: Sets the maximum number of reconnection attempts in response to server errors.
wait_strategy: Determines the strategy for wait intervals between reconnection attempts. Options are:
  :constant - Maintains a consistent wait time specified by wait_time.
  :geometric - Increases the wait time geometrically based on wait_time.
wait_time: Specifies the base wait time (in seconds) for the chosen wait_strategy.
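To make the difference between the two strategies concrete, here is a minimal sketch of how the pauses between attempts could evolve. The helper wait_intervals and its retries parameter are hypothetical and not part of JayAPI, and the ratio of 2 used for the geometric strategy is an assumption, since the text above does not state the actual ratio.

```ruby
# Hypothetical helper (not part of JayAPI): lists the pause, in seconds,
# that would precede each retry under the given strategy.
def wait_intervals(strategy:, wait_time:, retries:)
  (0...retries).map do |retry_index|
    case strategy
    when :constant
      wait_time # same pause before every retry
    when :geometric
      wait_time * (2**retry_index) # assumed ratio of 2 between pauses
    else
      raise ArgumentError, "Unknown wait strategy: #{strategy.inspect}"
    end
  end
end

constant_waits = wait_intervals(strategy: :constant, wait_time: 2, retries: 3)
geometric_waits = wait_intervals(strategy: :geometric, wait_time: 2, retries: 3)

puts "constant:  #{constant_waits.inspect}"
puts "geometric: #{geometric_waits.inspect}"
```

Under these assumptions, a geometric strategy backs off quickly, which may suit a cluster that needs time to recover, while a constant strategy keeps the total waiting time easier to predict.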
#push
The push method stores a document in the Index’s buffer. If the buffer reaches the maximum number of records, it is flushed automatically.
push takes a single Hash: the document you want to send to the index.
Warning
When using the push method, make sure to call flush at the end.
Automatic flushing only occurs when the buffer is full. If you do not call flush at the end of the run, you might lose some documents.
Example:
documents.each do |document|
  # do something with your document, then push it
  index.push(document)
end

index.flush # Do not forget to flush the index at the end.
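The reason the final flush matters can be seen in a small, self-contained model of the buffering behaviour described above. SketchBuffer is a hypothetical stand-in written for illustration only; it is not JayAPI's implementation and sends nothing over the network.

```ruby
# Hypothetical stand-in for the buffering behaviour; not JayAPI code.
class SketchBuffer
  attr_reader :sent_batches

  def initialize(batch_size:)
    @batch_size = batch_size
    @queue = []
    @sent_batches = [] # each entry stands in for one request to the cluster
  end

  # Stores the document and flushes automatically once the buffer is full.
  def push(document)
    @queue << document
    flush if @queue.size >= @batch_size
  end

  # Sends whatever is currently queued (does nothing on an empty queue).
  def flush
    return if @queue.empty?

    @sent_batches << @queue.dup
    @queue.clear
  end

  def queue_size
    @queue.size
  end
end

buffer = SketchBuffer.new(batch_size: 3)
7.times { |i| buffer.push({ id: i }) }

sent_before_flush = buffer.sent_batches.flatten.size # documents sent automatically
pending = buffer.queue_size                          # documents still waiting

buffer.flush
sent_after_flush = buffer.sent_batches.flatten.size

puts "sent before flush: #{sent_before_flush}, pending: #{pending}"
puts "sent after flush:  #{sent_after_flush}"
```

With a batch size of 3 and 7 documents, the last document never fills a batch on its own, so it only leaves the buffer when flush is called explicitly.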
#index
index pushes a document directly to the Elasticsearch cluster without adding it to the buffer first, so you don’t need to call flush.
index takes a single Hash: the document you want to send to the index.
Example:
index.index(my_document)
Note
Pushing documents one at a time is very inefficient because the Index needs to perform an HTTP request for each one. If you want to send many documents, use push instead.
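As a rough illustration of the cost difference, the following sketch counts the requests needed in each case. The helper requests_needed is hypothetical, and the calculation assumes that every flushed batch costs exactly one HTTP request, which the text above does not state explicitly.

```ruby
# Hypothetical arithmetic; assumes one HTTP request per flushed batch.
def requests_needed(document_count, batch_size: nil)
  return document_count if batch_size.nil? # one request per document

  (document_count.to_f / batch_size).ceil # the last batch may be partial
end

one_by_one = requests_needed(1_000)
batched = requests_needed(1_000, batch_size: 100)
partial = requests_needed(1_050, batch_size: 100)

puts "one at a time: #{one_by_one} requests"
puts "batches of 100: #{batched} requests"
```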
#search
The search method allows you to search the Elasticsearch index for documents matching the provided query. This method takes two arguments:
query: A Hash with the query you want to execute. This Hash is converted to JSON before being sent to Elasticsearch, so it must follow Elasticsearch’s DSL. There is no limit to what you can put in this Hash; no validation or transformation is performed. Queries can be as simple or as complex as you want.
type (optional): Specify :search_after to use the Search After feature. This is needed if you have more than 10,000 matching documents.
You can compose the query yourself or use the QueryBuilder, which offers an easier, albeit limited, interface.
The search method returns a QueryResults object, which you can use to iterate over the result set in batches.
Example:
index.search(
  query: {
    match_all: { }
  },
  sort: [
    {
      '@timestamp': 'desc'
    }
  ],
  type: :search_after
)
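For background, Elasticsearch's Search After feature pages through results by taking the sort values of the last hit on one page and sending them as search_after in the next request. The sketch below shows that step in isolation. The helper next_page_body and the sample response are hypothetical; how the class described here handles this internally is not covered by this page.

```ruby
require 'json'

# Hypothetical helper (not part of JayAPI): builds the request body for the
# next page from the previous body and the previous response.
def next_page_body(body, response)
  last_hit = response.dig('hits', 'hits')&.last
  return nil if last_hit.nil? # no hits left, so there is no next page

  # The sort values of the last hit become the cursor for the next request.
  body.merge(search_after: last_hit.fetch('sort'))
end

body = {
  query: { match_all: {} },
  sort: [{ '@timestamp': 'desc' }],
  size: 2
}

# Hand-written sample shaped like an Elasticsearch search response.
response = {
  'hits' => {
    'hits' => [
      { '_id' => 'a', 'sort' => [1_700_000_200_000] },
      { '_id' => 'b', 'sort' => [1_700_000_100_000] }
    ]
  }
}

second_page = next_page_body(body, response)
second_page_json = JSON.generate(second_page)

puts second_page_json
```

Because the cursor is derived from the sort values, the query needs a sort clause, as in the example above.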
#flush
Flushes the current buffer to Elasticsearch, pushing all the documents currently stored in the queue (if there are any).
Example:
documents.each do |document|
  index.push(document)
end

index.flush
#queue_size
Returns the number of documents currently waiting to be flushed to Elasticsearch.
Example:
index.queue_size # => 16
#delete_by_query
This method allows you to remove the documents that match the given query from the index. The method has a single parameter:
query: A Hash with the query you want to use to match documents for deletion. For more information on this parameter or on how to create queries, see the #search method documentation.
On success, the method returns a Hash with information about the executed command, for example:
{
  took: 740,
  timed_out: false,
  total: 1748,
  deleted: 1748,
  batches: 2,
  version_conflicts: 0,
  noops: 0,
  retries: { bulk: 0, search: 0 },
  throttled_millis: 0,
  requests_per_second: -1.0,
  throttled_until_millis: 0,
  failures: []
}
On error, an Elasticsearch::Transport::Transport::ServerError is raised.
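A successful call does not necessarily mean every matching document was removed, so it can be worth inspecting the returned Hash. The following is a minimal sketch of such a check. The helper deletion_complete? is hypothetical, and it assumes the Hash uses symbol keys as in the example above.

```ruby
# Hypothetical helper (not part of JayAPI): reports whether the command
# removed every document it matched.
def deletion_complete?(result)
  !result[:timed_out] &&
    result[:failures].empty? &&
    result[:version_conflicts].zero? &&
    result[:deleted] == result[:total]
end

# Trimmed copy of the example result shown above.
result = {
  took: 740,
  timed_out: false,
  total: 1748,
  deleted: 1748,
  version_conflicts: 0,
  failures: []
}

complete = deletion_complete?(result)
partial = deletion_complete?(result.merge(deleted: 1700, version_conflicts: 48))

puts "complete run: #{complete}"
puts "run with conflicts: #{partial}"
```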