I. Basic Concepts#
- NRT: Near Real Time, which has two meanings: a write becomes searchable after only a very small delay (about one second), and search and analysis requests in Elasticsearch typically complete within seconds.
- Cluster: A cluster provides indexing and search services externally and consists of one or more nodes. The cluster name determines which cluster each node belongs to (the default name is elasticsearch).
- Node: A single Elasticsearch server instance is called a node. A node is part of a cluster and has its own name, which defaults to a UUID generated at startup but can also be configured manually. A node can only join one Elasticsearch cluster.
- Primary Shard: The primary shard is the unit of data storage. Horizontal scaling is achieved by adding shards; the shard count is specified at index creation and cannot be modified afterward.
- Replica Shard: A replica shard is a copy of a primary shard's data. It ensures high availability and prevents data loss, and it also shares the shard's search load, improving the overall throughput and performance of the cluster.
- Index: An index is a collection of documents with the same structure, analogous to a database instance in a relational database (after the removal of types in Es 6.0, it essentially corresponds to a single table).
- Type: Deprecated since Es 6.0. Reference link: Removal of mapping types
- Document: A document is the smallest unit of storage in Elasticsearch, in JSON format, similar to a record (a row) in a relational database. Document structure is flexible, but documents in the same index should keep their structures as similar as possible.
Lucene Retrieval#
Lucene Concepts#
Basic retrieval process: Query analysis => Tokenization => Keyword search => Search ranking
Inverted Index#
Traditionally, retrieval means scanning articles one by one to find the positions of a keyword. An inverted index instead uses a tokenization strategy to build a mapping table from terms to the articles that contain them, so the list of matching articles can be fetched almost directly from a keyword. In short, it finds documents by content, whereas a traditional index such as MySQL's finds documents by ID.
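For instance, here is a hypothetical two-document corpus and the inverted index it produces:

```
Doc 1: "the quick brown fox"
Doc 2: "the lazy brown dog"

Term    => Documents containing it
brown   => [1, 2]
dog     => [2]
fox     => [1]
lazy    => [2]
quick   => [1]
the     => [1, 2]
```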
The underlying implementation is based on the FST (Finite State Transducer) data structure, which Lucene has used extensively since version 4. FST has two advantages:
- Small space usage: it compresses the term dictionary by sharing the prefixes and suffixes of words.
- Fast query speed: O(len(str)) query time complexity.
Reference links🔗:
- elasticsearch inverted index principle
- In-depth analysis of Lucene's FST dictionary
II. Using Elasticsearch#
CRUD Operations#
- Get API
- Delete API
- Update API
- Bulk API
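As a quick illustration, a minimal Bulk API request might look like this (the index name and fields are hypothetical; the body is newline-delimited JSON, with each action line followed by its optional source line):

```
curl -X POST "localhost:9200/_bulk?pretty" -H 'Content-Type: application/x-ndjson' -d'
{ "index" : { "_index" : "test", "_id" : "1" } }
{ "field1" : "value1" }
{ "delete" : { "_index" : "test", "_id" : "2" } }
'
```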
Search#
- Search API
- Aggregations
- Query DSL
- Elasticsearch SQL
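For example, a simple match query expressed in the Query DSL (index and field names here are hypothetical):

```
curl -X GET "localhost:9200/my-index/_search?pretty" -H 'Content-Type: application/json' -d'
{
  "query": {
    "match": { "title": "elasticsearch" }
  }
}
'
```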
Analyzer Tokenization#
- Standard Analyzer: The default analyzer; splits text into words and lowercases them.
- Simple Analyzer: Splits on non-letter characters (symbols are filtered out) and lowercases terms.
- Stop Analyzer: Like the Simple Analyzer, but also filters out stop words (the, is, ...); lowercases terms.
- Whitespace Analyzer: Splits on whitespace, without lowercasing.
- IK: A Chinese analyzer that requires plugin installation.
- ICU: An internationalized analyzer that requires plugin installation.
- jieba: A popular Chinese tokenizer.
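You can test how an analyzer tokenizes a given string with the _analyze API; for example, with the standard analyzer:

```
curl -X GET "localhost:9200/_analyze?pretty" -H 'Content-Type: application/json' -d'
{
  "analyzer": "standard",
  "text": "Hello, Elasticsearch World"
}
'
```

This returns the lowercased tokens hello, elasticsearch, and world.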
Index Management#
Alias#
- Group multiple indices.
  When creating indices by month, consider creating indices by day first. An index template can automatically create the daily indices for log data, and a monthly index alias can then group the daily indices together.
- Flexibly switch indices without modifying code, allowing zero-downtime migration of index data.
  For example, when an index needs to be changed (modifying shards, mapping, renaming, ...), simply bind the newly created index to the corresponding alias, and after the data migration is complete, delete the old index's binding to the alias.
  For zero-downtime migration, see the official documentation: Changing Mapping with Zero Downtime
- Create different "views" of the same index.
  Suitable for multi-tenant scenarios: for example, if different users need to see different data in the same index, a filtered alias can be created to do the filtering.
- Write through an alias that points to multiple physical indices.
  When writes must go through an alias that references multiple physical indices, designate a write index: all index and update requests sent to the alias resolve to that single write index. Each alias can have only one write index; if none is specified and the alias references multiple indices, writes are rejected.
Example operations:

```
curl -X POST "localhost:9200/_aliases?pretty" -H 'Content-Type: application/json' -d'
{
  "actions" : [
    { "remove" : { "index" : "test1", "alias" : "alias1" } },
    { "add" : { "index" : "test2", "alias" : "alias1" } }
  ]
}
'

curl -X POST "localhost:9200/_aliases?pretty" -H 'Content-Type: application/json' -d'
{
  "actions" : [
    {
      "add" : {
        "index" : "test1",
        "alias" : "alias2",
        "filter" : { "term" : { "user" : "kimchy" } }
      }
    }
  ]
}
'
```
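To designate a write index when an alias spans multiple physical indices, newer Es versions support the is_write_index flag (the index and alias names below are illustrative):

```
curl -X POST "localhost:9200/_aliases?pretty" -H 'Content-Type: application/json' -d'
{
  "actions" : [
    { "add" : { "index" : "logs-1", "alias" : "logs", "is_write_index" : true } },
    { "add" : { "index" : "logs-2", "alias" : "logs" } }
  ]
}
'
```

Writes to the logs alias now go to logs-1, while searches still cover both indices.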
Reference link: Index Aliases
Rollover API#
Rolls an alias over to a newly created index when the current index meets specified conditions (such as age, document count, or size), commonly used for date-based index rollover.
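A minimal sketch, assuming a write alias named logs_write and hypothetical thresholds:

```
curl -X POST "localhost:9200/logs_write/_rollover?pretty" -H 'Content-Type: application/json' -d'
{
  "conditions": {
    "max_age": "1d",
    "max_docs": 1000000
  }
}
'
```

If any condition is met, a new index is created and the alias's write index switches to it.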
Reference link: Rollover Index
Index Mapping#
- Mapping settings
- Index Template
- Dynamic Template: Dynamically sets field types based on the data type recognized by Elasticsearch, combined with field name matching rules.
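A minimal dynamic template sketch: any string field whose name ends in _str is mapped to keyword (the index name and naming pattern are assumptions for illustration):

```
curl -X PUT "localhost:9200/my-index?pretty" -H 'Content-Type: application/json' -d'
{
  "mappings": {
    "dynamic_templates": [
      {
        "strings_as_keyword": {
          "match_mapping_type": "string",
          "match": "*_str",
          "mapping": { "type": "keyword" }
        }
      }
    ]
  }
}
'
```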
Routing#
When a document is indexed, it is stored in a single primary shard. How does Elasticsearch know which shard a document should be stored in? When we create a document, how does it decide whether to store it in shard 1 or shard 2?
In fact, this is determined by the following formula:

```
shard = hash(routing) % number_of_primary_shards
```

`routing` is a variable value, which defaults to the document's `_id` but can also be set to a custom value. `routing` is passed through a hash function to produce a number, which is then divided by `number_of_primary_shards` (the number of primary shards) to get the remainder. This remainder, which falls between 0 and `number_of_primary_shards - 1`, is the position of the shard where the document resides.
This is why the number of primary shards must be fixed when an index is created and must never change: if it changed, all previously computed routing values would become invalid, and the documents would no longer be retrievable. (Newer versions of Es can split or shrink the primary shards of an index under certain conditions, but only by an integer factor: an index can be split to n times its shard count or shrunk to 1/n of it, so 8 shards can become 16 or 4, but never 9.)
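For example, a custom routing value can be supplied when indexing and must then be supplied again when reading (the index name and routing value are hypothetical):

```
curl -X PUT "localhost:9200/my-index/_doc/1?routing=user1&pretty" -H 'Content-Type: application/json' -d'
{ "title": "routed document" }
'

curl -X GET "localhost:9200/my-index/_doc/1?routing=user1&pretty"
```

All documents sharing the same routing value land on the same shard.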
III. Cluster Architecture#
Cluster Roles#
- Master Node: Globally unique, responsible for cluster elections and managing cluster changes. The master node can also act as a data node, but this is not recommended.
- Data Node: Responsible for storing data and executing data-related operations. Generally, data read/write processes only interact with data nodes.
- Ingest Node: Introduced in Es 5.0. Preprocessing allows data to be transformed through defined processors and pipelines before a document is indexed, i.e., before the data is written (a minimal pipeline sketch follows this list). Reference link: Ingest Node
- Coordinating Node: The coordinating node forwards client requests to the data nodes that hold the data. Each data node executes the request locally and returns its results to the coordinating node, which collects these results and merges them into a single global result.
- Tribe Node: Deprecated and replaced by Cross-cluster search.
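As mentioned above, a minimal ingest pipeline sketch: a set processor adds a field to every document written through the pipeline (the pipeline, index, and field names are hypothetical):

```
curl -X PUT "localhost:9200/_ingest/pipeline/add-env?pretty" -H 'Content-Type: application/json' -d'
{
  "description": "add an env field before indexing",
  "processors": [
    { "set": { "field": "env", "value": "prod" } }
  ]
}
'

curl -X PUT "localhost:9200/my-index/_doc/1?pipeline=add-env&pretty" -H 'Content-Type: application/json' -d'
{ "message": "hello" }
'
```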
Cluster Health Status#
- Green: All primary and replica shards are operating normally.
- Yellow: Primary shards are normal, but not all replica shards are operating normally, indicating potential single point of failure risks.
- Red: Some primary shards are not operating normally.
Each index also has the above three statuses; if any replica shard is abnormal, it is in Yellow status.
You can check the cluster status with `curl -X GET "localhost:9200/_cluster/health?pretty"`; for more details refer to: Cluster health API.
Cluster Expansion#
When expanding the cluster and adding nodes, shards will be evenly distributed across the nodes in the cluster, achieving load balancing for indexing and search processes. These operations are automatically handled by the system.
Reference: Expansion Design.
Major Internal Modules#
Cluster#
The Cluster module encapsulates the implementation of cluster management executed by the master node, managing cluster status and maintaining cluster-level configuration information. Its main functions are:
- Manage cluster status and publish the newly generated cluster status to all nodes in the cluster.
- Call the allocation module to execute shard allocation, deciding which shards should be allocated to which nodes.
- Directly migrate shards among the nodes in the cluster to maintain data balance.
Allocation#
Encapsulates functions and strategies related to shard allocation, including the allocation of primary and replica shards. This module is called by the master node. The processes of creating new indices and fully restarting the cluster require shard allocation.
Discovery Module#
Responsible for discovering nodes in the cluster and electing the master node. When nodes join or leave the cluster, the master node takes appropriate actions. From a certain perspective, the discovery module plays a role similar to ZooKeeper, electing the master and managing the cluster topology.
Gateway#
Responsible for the persistent storage of the cluster state broadcast by the master node, and for restoring it during a full cluster restart.
Indices Module#
Manages global-level index settings, excluding index-level settings (index settings are divided into global and per-index levels). It also encapsulates the functionality for index data recovery. The recovery of primary and replica shards needed during the cluster startup phase is implemented in this module.
HTTP#
The HTTP module allows access to ES APIs via JSON over HTTP. The HTTP module is essentially completely asynchronous, meaning there are no blocking threads waiting for responses. The benefit of using asynchronous communication for HTTP is to solve the C10k problem (10,000 concurrent connections). In some scenarios, consider using HTTP keepalive to improve performance. Note: Do not use HTTP chunking on the client side.
Transport Module#
Used for internal communication between nodes within the cluster. Each request from one node to another uses the transport module. Like the HTTP module, the transport module is also completely asynchronous. The transport module uses TCP communication, and each node maintains several TCP long connections with other nodes. All internal communications between nodes are carried by this module.
Engine#
The Engine module encapsulates operations on Lucene and calls to translog, serving as the ultimate provider for read and write operations on a shard.
ES uses the Guice framework for modular management. Guice is a lightweight dependency injection framework developed by Google. Software design often advises depending on abstractions rather than concrete implementations; IoC realizes this idea by handling object creation and management internally.