A healthy Wazuh deployment depends heavily on the stability of the Wazuh Indexer cluster.
When administrators log in to the dashboard and discover a yellow cluster status accompanied by unassigned shards, it often raises concerns about data availability, alert processing, and overall platform reliability.
The Wazuh Indexer, which is built on the OpenSearch search and analytics engine, stores security events, alerts, inventory data, vulnerability information, and other operational records generated throughout the Wazuh ecosystem.
Because the indexer serves as the data layer for the entire platform, any cluster health issue can have downstream effects on detection capabilities and dashboard functionality.
A yellow cluster status indicates that all primary shards are available, but one or more replica shards cannot be assigned to cluster nodes.
While the system remains operational, the cluster is running without full redundancy.
This means that if another node failure occurs before the replica shards are successfully allocated, some data may become unavailable.
Unassigned shards can also create several operational problems, including:
- Reduced fault tolerance
- Increased recovery times during failures
- Slower search and query performance
- Delayed alert visibility in the dashboard
- Indexing bottlenecks during peak workloads
- Increased risk of cluster instability
In many environments, yellow cluster health appears after infrastructure changes, node failures, storage limitations, version upgrades, or incorrect cluster settings.
Administrators frequently encounter the issue after building a new cluster, replacing hardware, expanding storage, or performing maintenance activities.
In this guide, you’ll learn:
- What a yellow cluster status means in Wazuh Indexer
- How shard allocation works internally
- Why shards become unassigned
- The most common root causes behind yellow cluster health
- How to diagnose allocation failures using cluster APIs
- Step-by-step methods to restore a healthy green cluster state
- Best practices to prevent future shard allocation problems
If you’re already dealing with memory-related Indexer issues, you may also find our guide on How to Tune OpenSearch Heap Size to Stop Wazuh High Memory Crashes useful, since JVM pressure is a frequent contributor to shard allocation failures.
Understanding Yellow Cluster Status in Wazuh Indexer
What Is Wazuh Indexer?
Wazuh Indexer is the component responsible for storing, indexing, and searching security data generated throughout the Wazuh platform.
Built on OpenSearch, it acts as the centralized data repository where alerts, agent information, vulnerability data, inventory records, and audit logs are stored for fast retrieval and analysis.
Without the indexer, the Wazuh Dashboard would have no searchable data and many core security monitoring features would cease to function.
Role of the Indexer in the Wazuh Architecture
A typical Wazuh deployment consists of three major components:
- Wazuh Manager
- Wazuh Indexer
- Wazuh Dashboard
The workflow generally looks like this:
Endpoints
│
▼
Wazuh Agents
│
▼
Wazuh Manager
│
▼
Wazuh Indexer
│
▼
Wazuh Dashboard
The manager receives and analyzes security events from agents.
Once processed, the resulting alerts are forwarded to the indexer where they are stored inside OpenSearch indices.
The dashboard then queries the indexer whenever users perform searches, view alerts, investigate incidents, or generate reports.
Relationship Between Wazuh Manager, Indexer, and Dashboard
Each component depends on the others:
| Component | Primary Function |
|---|---|
| Wazuh Manager | Event analysis and rule processing |
| Wazuh Indexer | Data storage and search |
| Wazuh Dashboard | Visualization and management interface |
If the indexer develops shard allocation issues, the dashboard may begin showing:
- Missing data
- Delayed search results
- Incomplete visualizations
- Dashboard loading problems
In severe cases, you may encounter errors similar to those discussed in Wazuh Dashboard Not Loading? Complete Troubleshooting Guide and Troubleshooting “No Matching Indices Found” Error in Wazuh Dashboard.
What Does a Yellow Cluster Status Mean?
OpenSearch clusters report health using three primary states:
| Status | Meaning |
|---|---|
| Green | All primary and replica shards are assigned |
| Yellow | All primary shards assigned, one or more replica shards unassigned |
| Red | One or more primary shards are unassigned |
A yellow status indicates that the cluster remains operational because all primary data is still available.
However, redundancy is compromised.
If a node containing a primary shard fails while its replica remains unassigned, data availability may be affected until recovery procedures are completed.
OpenSearch officially defines yellow health as a state where primary shards are active but replica allocation has not been fully completed.
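The three states can be derived from the shard list itself. The following is a minimal sketch, assuming input in the two-column form returned by _cat/shards?h=prirep,state; the function name classify_health is a hypothetical helper, not part of OpenSearch or Wazuh.

```shell
# classify_health: read "prirep state" pairs on stdin and print green, yellow, or red.
# Hypothetical helper; the input format is assumed to match _cat/shards?h=prirep,state.
# A shard counts as active when its state is STARTED or RELOCATING.
classify_health() {
  awk '
    NF >= 2 && $2 != "STARTED" && $2 != "RELOCATING" {
      if ($1 == "p") red = 1      # an inactive primary makes the cluster red
      else yellow = 1             # an inactive replica makes it yellow at most
    }
    END {
      if (red) print "red"
      else if (yellow) print "yellow"
      else print "green"
    }
  '
}
```

For example, piping the output of curl -sk -u admin:password "https://localhost:9200/_cat/shards?h=prirep,state" into classify_health should print the same status that the Cluster Health API reports.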
Primary Shards vs Replica Shards
Every OpenSearch index is divided into shards.
Primary Shards
Primary shards store the original indexed data.
Every document written to an index is first stored within a primary shard.
Replica Shards
Replica shards are copies of primary shards.
They provide:
- High availability
- Improved fault tolerance
- Additional search capacity
- Faster recovery during failures
For example:
Index: wazuh-alerts
Primary Shard 1 → Node A
Replica Shard 1 → Node B
If Node A fails, Node B can immediately serve the data.
Why Yellow Status Usually Indicates Replica Allocation Problems
In most Wazuh deployments, yellow status occurs because replica shards cannot be assigned.
Common reasons include:
- Single-node clusters
- Offline cluster members
- Disk space limitations
- Allocation restrictions
- Resource exhaustion
Since primary shards remain healthy, the cluster stays functional, but administrators should still resolve the issue to restore redundancy.
What Are Unassigned Shards?
An unassigned shard is a shard that OpenSearch knows should exist but cannot currently place on any node.
The shard remains part of cluster metadata but has no active location.
How Shards Become Unassigned
Common scenarios include:
- Node crashes
- Server reboots
- Storage failures
- Disk watermark violations
- Replica allocation conflicts
- Cluster reconfiguration errors
- Index corruption
When one of these events occurs, OpenSearch attempts to relocate the shard automatically.
If relocation fails, the shard becomes unassigned.
Impact on Search Performance and Fault Tolerance
Unassigned shards affect cluster behavior in several ways.
Reduced fault tolerance
The biggest concern is the loss of redundancy. If a primary shard fails before a replica becomes available, recovery becomes much more difficult.
Longer recovery times
Clusters with unassigned replicas typically require more time to rebalance after node failures.
Reduced search scalability
Replica shards help distribute query workloads across multiple nodes. Missing replicas force all searches onto fewer resources.
Potential indexing delays
In heavily loaded clusters, allocation issues can increase indexing pressure and slow ingestion rates.
For these reasons, production guidance for OpenSearch consistently recommends keeping every shard assigned to maximize resilience and cluster performance.
Common Causes of Unassigned Shards in Wazuh Indexer
Single-Node Wazuh Deployments with Replica Shards
The most common cause of yellow cluster health is a single-node deployment configured with replica shards.
OpenSearch intentionally prevents replica shards from being placed on the same node as their primary shard.
For example:
Node 1
├─ Primary Shard 1
└─ Replica Shard 1 ❌
Because both copies would reside on the same server, redundancy would provide no protection against node failure.
As a result, replica shards remain unassigned and the cluster reports yellow status.
Why Replicas Cannot Be Allocated on the Same Node
This behavior is by design.
OpenSearch's same-shard allocation rule never places two copies of the same shard on one node.
A replica must therefore reside on a different node than its primary counterpart.
Common Default Configuration Issues
Administrators often encounter yellow status immediately after:
- Installing a new single-node Wazuh environment
- Creating new indices with default replica settings
- Migrating from a clustered environment to a standalone server
In these cases, reducing the replica count to zero may be appropriate.
Node Failures or Offline Nodes
In multi-node clusters, yellow health frequently occurs when one or more nodes become unavailable.
Cluster Member Outages
Possible causes include:
- Hardware failures
- Power outages
- VM crashes
- Operating system failures
When a node disappears, OpenSearch attempts to reassign affected shards elsewhere.
If reassignment cannot occur, shards become unassigned.
Network Interruptions
Temporary connectivity problems can cause nodes to leave the cluster unexpectedly.
Examples include:
- Firewall rule changes
- DNS failures
- Routing issues
- VLAN misconfigurations
Node Communication Problems
OpenSearch relies on continuous cluster communication.
Issues with certificates, transport-layer connectivity, or node discovery can prevent healthy shard allocation.
If you’re troubleshooting certificate-related cluster communication failures, see How to Fix Wazuh Certificate Errors.
Disk Watermark Thresholds Reached
OpenSearch includes protective mechanisms that prevent shard allocation when storage becomes critically low.
This behavior helps avoid data corruption and catastrophic disk exhaustion.
Low Disk Watermark
At the low watermark threshold, OpenSearch begins avoiding allocation to nodes with limited free space.
High Disk Watermark
At the high watermark threshold, OpenSearch actively relocates shards away from overloaded nodes.
Flood-Stage Watermark
At flood-stage levels, indices may become read-only to prevent further storage consumption.
This frequently causes both indexing and allocation problems.
Cluster Allocation Settings Preventing Assignment
Misconfigured cluster settings can unintentionally block shard placement.
Disabled Shard Allocation
Administrators sometimes disable allocation during maintenance operations and forget to re-enable it afterward.
Examples include:
cluster.routing.allocation.enable: none
This immediately prevents shard assignment.
Allocation Filtering Rules
Node filters can restrict where shards are allowed to reside.
Improper filters may eliminate all eligible nodes.
Cluster Routing Restrictions
Routing constraints based on node attributes can also prevent successful allocation.
Insufficient Cluster Resources
Resource exhaustion is another common cause of yellow cluster status.
Memory Pressure
When JVM memory usage becomes excessive, nodes may fail shard recoveries or become unstable.
This is one reason why proper heap sizing is critical.
See How to Tune OpenSearch Heap Size to Stop Wazuh High Memory Crashes.
CPU Bottlenecks
High CPU utilization can slow cluster operations, delay rebalancing, and extend shard recovery times.
If system-wide resource pressure exists, review Why Is Wazuh Using High CPU? Troubleshooting Guide.
JVM Heap Limitations
Insufficient heap allocation can cause:
- Long garbage collection pauses
- Slow cluster state updates
- Failed shard recoveries
- Node instability
Corrupted or Missing Index Data
Occasionally, OpenSearch cannot allocate a shard because the underlying data is damaged.
Unexpected Shutdowns
Abrupt power loss or forced process termination can leave shard data in an inconsistent state.
Filesystem Corruption
Storage corruption may prevent OpenSearch from reading required shard files.
Storage Failures
Failing SSDs, RAID problems, and network storage interruptions can all trigger unassigned shard scenarios.
Version Mismatches Between Cluster Nodes
Mixed software versions can create shard allocation problems after upgrades.
Mixed-Version Clusters
Running different Indexer/OpenSearch versions across nodes may lead to compatibility issues.
While temporary mixed-version states are often supported during rolling upgrades, prolonged version inconsistencies can cause cluster instability.
Upgrade-Related Allocation Issues
Administrators sometimes encounter yellow status after:
- Partial upgrades
- Failed upgrades
- Interrupted migrations
- Incomplete cluster restarts
When troubleshooting post-upgrade issues, verify that all cluster nodes are running compatible versions and that shard migrations completed successfully.
How to Check Cluster Health
Before attempting to fix unassigned shards, you should first verify the current health of your Wazuh Indexer cluster.
OpenSearch provides several APIs that reveal cluster status, shard allocation information, node availability, and resource utilization.
Checking cluster health helps you determine:
- Whether the cluster is green, yellow, or red
- How many shards are affected
- Whether node failures are involved
- If the issue is isolated or cluster-wide
- Which troubleshooting steps should be performed next
Verify Current Cluster Status
The quickest way to check cluster health is using the Cluster Health API:
curl -k -u admin:password \
"https://localhost:9200/_cluster/health?pretty"
Example output:
{
"cluster_name": "wazuh-cluster",
"status": "yellow",
"timed_out": false,
"number_of_nodes": 3,
"active_primary_shards": 120,
"active_shards": 240,
"unassigned_shards": 5
}
This command provides a high-level overview of the cluster’s current condition.
If your cluster status is yellow, pay special attention to the number of unassigned shards and available nodes.
Key Fields to Review
status
The status field indicates overall cluster health.
Possible values include:
| Status | Meaning |
|---|---|
| Green | All primary and replica shards assigned |
| Yellow | Replica shards unassigned |
| Red | Primary shards unavailable |
A yellow cluster is generally less urgent than a red cluster because data remains accessible through primary shards.
active_shards
This field shows the total number of active shards currently assigned within the cluster.
A sudden decrease may indicate node failures or allocation issues.
unassigned_shards
This is one of the most important metrics when troubleshooting yellow cluster health.
Example:
"unassigned_shards": 12
Any value greater than zero indicates allocation problems that require investigation.
number_of_nodes
This field displays the number of nodes currently participating in the cluster.
Compare this value against your expected cluster size.
For example:
"number_of_nodes": 2
If you expect three nodes but only see two, a node may be offline or unable to communicate with the cluster.
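The comparison between expected and detected nodes can be scripted. The sketch below assumes a single line of output from _cat/health?h=status,node.total,unassign (status, node count, unassigned shard count); check_health is a hypothetical helper name, and the suggested next steps in its messages are only a rough guide.

```shell
# check_health: compare one line of "status node.total unassign" (read from stdin)
# against the number of nodes you expect, passed as the first argument.
check_health() {
  expected_nodes="$1"
  read -r status nodes unassigned
  if [ "$nodes" -lt "$expected_nodes" ]; then
    echo "$status: $nodes of $expected_nodes nodes present, $unassigned unassigned shards (check for offline nodes)"
  elif [ "$unassigned" -gt 0 ]; then
    echo "$status: all $nodes nodes present, $unassigned unassigned shards (check allocation settings and disk)"
  else
    echo "$status: all $nodes nodes present, no unassigned shards"
  fi
}
```

A possible invocation for a three-node design is curl -sk -u admin:password "https://localhost:9200/_cat/health?h=status,node.total,unassign" | check_health 3.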
Check Cluster Health from Wazuh Dashboard
Administrators can also review cluster health through the Wazuh Dashboard.
This approach is useful when API access is unavailable or when performing routine health checks.
Navigating to Indexer Monitoring Views
Depending on your Wazuh version, navigate to:
Dashboard
└── Index Management
└── Indices
or
Dashboard
└── Dev Tools
Some deployments may also expose OpenSearch monitoring dashboards that provide:
- Cluster health status
- Node availability
- Shard allocation statistics
- JVM utilization
- Disk usage metrics
Identifying Warning Indicators
Common indicators of shard allocation problems include:
- Yellow cluster health warnings
- Unassigned shard alerts
- Missing index replicas
- Increased cluster recovery activity
- Delayed dashboard searches
- Incomplete visualizations
If dashboard searches are returning incomplete results, you may also encounter symptoms discussed in Troubleshooting “No Matching Indices Found” Error in Wazuh Dashboard.
Similarly, severe allocation issues can eventually contribute to dashboard availability problems covered in Wazuh Dashboard Not Loading? Complete Troubleshooting Guide.
Identify Which Shards Are Unassigned
Once you’ve confirmed that the cluster is yellow, the next step is determining exactly which shards are affected.
The Cluster Health API only reports the number of unassigned shards.
To identify the specific indices involved, you’ll need to inspect shard allocation details.
List Unassigned Shards
Use the Cat Shards API:
curl -k -u admin:password \
"https://localhost:9200/_cat/shards?v"
Example output:
index shard prirep state node
wazuh-alerts-4.x-2026.06 0 p STARTED node-1
wazuh-alerts-4.x-2026.06 0 r UNASSIGNED
This output provides a detailed view of every shard in the cluster.
Understanding the Output
Several fields are particularly useful when investigating allocation problems.
Index Name
The index column identifies which index owns the shard.
Example:
wazuh-alerts-4.x-2026.06
This helps determine whether the problem affects:
- Alert indices
- Vulnerability indices
- Inventory data
- Custom indices
Shard Number
The shard field identifies the specific shard ID.
Example:
0
1
2
Multiple unassigned shards from the same index often indicate a broader allocation problem.
Primary vs Replica
The prirep column indicates whether the shard is primary or replica.
Values include:
p = Primary
r = Replica
Most yellow cluster issues involve replica shards.
Example:
r
If primary shards become unassigned, the cluster status typically changes to red.
Current State
The state field displays the shard’s current status.
Examples include:
STARTED
INITIALIZING
RELOCATING
UNASSIGNED
Any shard marked as UNASSIGNED requires further investigation.
Display Only Unassigned Shards
Large clusters may contain hundreds or thousands of shards.
To simplify analysis, request only the key columns and filter the output for unassigned shards:
curl -sk -u admin:password \
"https://localhost:9200/_cat/shards?h=index,shard,prirep,state,node" | grep UNASSIGNED
Example output:
wazuh-alerts-4.x-2026.06 0 r UNASSIGNED
wazuh-monitoring-2026.06 1 r UNASSIGNED
This allows you to quickly identify:
- Which indices are affected
- Whether the shard is primary or replica
- The scale of the allocation problem
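To gauge the scale at a glance, the shard list can be reduced to a count per index. This is a minimal sketch, assuming the four leading columns index, shard, prirep, and state as requested with the h parameter; count_unassigned is a hypothetical helper name.

```shell
# count_unassigned: read "index shard prirep state [node]" lines on stdin and
# print "<index> <p|r> <count>" for every group of UNASSIGNED shards.
count_unassigned() {
  awk '$4 == "UNASSIGNED" { n[$1 " " $3]++ }
       END { for (k in n) print k, n[k] }' | sort
}
```

Feeding it the output of curl -sk -u admin:password "https://localhost:9200/_cat/shards?h=index,shard,prirep,state" yields one line per affected index, which makes it easy to spot whether a single index or the whole cluster is involved.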
At this stage, you’ve identified what is unassigned.
The next step is determining why OpenSearch refuses to allocate those shards.
Use the Cluster Allocation Explain API
The Cluster Allocation Explain API is one of the most valuable troubleshooting tools available in OpenSearch.
Rather than guessing why shards remain unassigned, this API provides detailed allocation decisions generated by the cluster’s allocation engine.
In many cases, the explanation immediately identifies the root cause.
Determine Why a Shard Cannot Be Assigned
Run the following command:
curl -k -u admin:password \
-X GET "https://localhost:9200/_cluster/allocation/explain?pretty"
Example response:
{
"index": "wazuh-alerts-4.x-2026.06",
"shard": 0,
"primary": false,
"current_state": "unassigned",
"can_allocate": "no"
}
The output may also include detailed node-by-node allocation decisions that explain why OpenSearch rejected candidate nodes.
It is usually the first tool to reach for when investigating shard allocation failures.
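When called without a request body, the API explains an arbitrary unassigned shard. To target a specific shard from the Cat Shards output, the request can carry a body with the index, shard, and primary fields. The sketch below reuses the placeholder host and credentials from the other examples; explain_body and explain_shard are hypothetical helper names.

```shell
# explain_body: build the JSON request body for _cluster/allocation/explain.
# Arguments: index name, shard number, "true" for a primary or "false" for a replica.
explain_body() {
  printf '{"index": "%s", "shard": %s, "primary": %s}\n' "$1" "$2" "$3"
}

# explain_shard: send the request for one shard.
# Host and credentials are placeholders; adjust them for your environment.
explain_shard() {
  curl -sk -u admin:password \
    -X GET "https://localhost:9200/_cluster/allocation/explain?pretty" \
    -H 'Content-Type: application/json' \
    -d "$(explain_body "$1" "$2" "$3")"
}
```

For instance, explain_shard wazuh-alerts-4.x-2026.06 0 false asks about the replica of shard 0 in that index.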
How to Interpret Allocation Explanations
The most important information appears in the allocation decision section.
NO Decisions
A NO decision means OpenSearch found no eligible node for shard placement.
Example causes include:
- Disk thresholds exceeded
- Allocation disabled
- Node attribute restrictions
- Missing nodes
- Replica conflicts
Example:
"can_allocate": "no"
These situations typically require configuration changes before allocation can proceed.
THROTTLE Decisions
A THROTTLE decision indicates that allocation is allowed but temporarily delayed.
This commonly occurs when:
- Large recoveries are already running
- Cluster rebalancing is active
- Recovery limits have been reached
Example:
"can_allocate": "throttle"
In many cases, waiting for existing recoveries to finish resolves the issue automatically.
Disk-Related Restrictions
The allocation explanation may indicate that a node exceeds configured disk watermarks.
Example message:
node exceeds high disk watermark
When this occurs, OpenSearch blocks new shard allocations until sufficient storage is freed.
If disk pressure is widespread across the cluster, additional storage or node expansion may be required.
Replica Allocation Conflicts
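One of the most common explanations in Wazuh environments comes from the same_shard decider, which reports a message along these lines (the exact wording varies by version):
a copy of this shard is already allocated to this node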
This typically occurs when:
- A single-node deployment uses replicas
- Too few nodes exist for the configured replica count
- One or more cluster nodes are offline
For example:
Cluster Nodes: 1
Replicas: 1
Because the replica cannot be stored on the same node as its primary shard, OpenSearch leaves the replica unassigned and reports yellow cluster health.
Once you’ve identified the allocation reason, you can begin applying targeted fixes rather than relying on trial-and-error troubleshooting.
Fix 1: Reduce Replica Count in Single-Node Deployments
For single-node Wazuh installations, unassigned replica shards are by far the most common reason for a yellow cluster status.
In many cases, nothing is actually broken. OpenSearch is simply unable to place replica shards because no additional nodes exist within the cluster.
If your deployment contains only one Indexer node, reducing the replica count to zero is often the correct solution.
Why Replica Shards Cause Yellow Status
OpenSearch is designed to maintain redundancy by storing replica shards on different nodes than their corresponding primary shards.
When a cluster contains only a single node, this requirement cannot be satisfied.
As a result:
- Primary shards remain assigned
- Replica shards remain unassigned
- Cluster health becomes yellow
Replica Allocation Requirements
Consider the following example:
Node-1
├── Primary Shard 0
└── Replica Shard 0 ❌
OpenSearch intentionally prevents the replica from being placed on the same node.
The allocation engine treats this as an invalid configuration because both copies would be lost if the server failed.
According to OpenSearch shard allocation guidelines, replica shards must always reside on separate nodes to provide meaningful fault tolerance.
Single-Node Limitations
A single-node cluster can safely operate without replicas.
However, administrators should understand the trade-off:
| Configuration | Fault Tolerance |
|---|---|
| 1 primary + 1 replica on separate nodes | Survives the loss of one node |
| 1 primary + 0 replicas | None |
For labs, test environments, proof-of-concept deployments, and small standalone servers, setting replicas to zero is generally acceptable.
Production environments should typically use multiple Indexer nodes instead.
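Because each copy of a shard needs its own node, the largest replica count that can be fully assigned is the number of data nodes minus one. A minimal sketch of that arithmetic follows; max_replicas is a hypothetical helper, and it ignores allocation filters or awareness settings that could lower the limit further.

```shell
# max_replicas: largest number_of_replicas that can be fully assigned
# on a cluster with the given number of data nodes (nodes - 1, never below 0).
max_replicas() {
  nodes="$1"
  if [ "$nodes" -le 1 ]; then
    echo 0
  else
    echo $((nodes - 1))
  fi
}
```

With one node the result is 0, which is why a single-node deployment with any replicas configured stays yellow.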
Check Current Replica Configuration
Before making changes, verify the current replica settings.
Run:
curl -k -u admin:password \
"https://localhost:9200/_all/_settings?pretty"
Look for:
"number_of_replicas": "1"
or
"number_of_replicas": "2"
If replicas are configured but only one node exists in the cluster, yellow status is expected.
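A more compact view of the same setting is available from the Cat Indices API, whose rep column holds the replica count. The sketch below assumes two-column input as returned by _cat/indices?h=index,rep; indices_with_replicas is a hypothetical helper name.

```shell
# indices_with_replicas: read "index rep" pairs on stdin and print only
# the indices that are configured with at least one replica.
indices_with_replicas() {
  awk 'NF >= 2 && $2 + 0 > 0 { print $1, $2 }'
}
```

On a single-node cluster, every index printed by curl -sk -u admin:password "https://localhost:9200/_cat/indices?h=index,rep" | indices_with_replicas contributes unassigned replicas.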
Set Replicas to Zero
Update all indices:
curl -k -u admin:password \
-X PUT "https://localhost:9200/*/_settings" \
-H 'Content-Type: application/json' \
-d '{
"index": {
"number_of_replicas": 0
}
}'
Expected response:
{
"acknowledged": true
}
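This instructs OpenSearch to stop expecting replica shards for the existing indices.
The previously unassigned replicas will disappear from cluster health calculations.
Indices created later still take their replica count from the matching index template, so a template that requests replicas will turn the cluster yellow again as soon as the next index is created.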
Verify Cluster Returns to Green
After applying the change, verify cluster health:
curl -k -u admin:password \
"https://localhost:9200/_cluster/health?pretty"
Expected output:
{
"status": "green"
}
If the cluster remains yellow, another allocation issue is likely present and additional troubleshooting is required.
Fix 2: Restore Offline Indexer Nodes
In multi-node deployments, yellow cluster health frequently occurs because one or more Indexer nodes have gone offline.
When a node leaves the cluster unexpectedly, OpenSearch may be unable to allocate replica shards until the missing node returns or replacement capacity becomes available.
Check Cluster Nodes
First, determine which nodes are currently participating in the cluster.
Run:
curl -k -u admin:password \
"https://localhost:9200/_cat/nodes?v"
Example output:
ip heap.percent ram.percent cpu load_1m node.role master name
10.0.0.10 45 70 12 0.32 dimr * node-1
10.0.0.11 52 68 18 0.40 dimr - node-2
Compare the results against your expected cluster design.
For example:
| Expected Nodes | Detected Nodes |
|---|---|
| 3 | 2 |
If a node is missing, that is likely the cause of the unassigned shards.
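The expected-versus-detected comparison can also name the missing nodes. This is a minimal sketch, assuming one node name per line on stdin as returned by _cat/nodes?h=name; missing_nodes is a hypothetical helper, and node-1 through node-3 are example names.

```shell
# missing_nodes: print each expected node name (given as arguments) that does
# not appear in the list of node names read from stdin, one name per line.
missing_nodes() {
  present=$(awk '{ print $1 }')   # first field only, so padding is ignored
  for name in "$@"; do
    printf '%s\n' "$present" | grep -qxF -- "$name" || echo "$name"
  done
}
```

For example, curl -sk -u admin:password "https://localhost:9200/_cat/nodes?h=name" | missing_nodes node-1 node-2 node-3 prints nothing when all three nodes have joined the cluster.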
Verify Node Connectivity
If nodes are missing, investigate communication issues before modifying cluster settings.
Network Communication Checks
Confirm basic connectivity between Indexer servers:
ping node-2
or
telnet node-2 9300
Connectivity failures may indicate routing or network problems.
Firewall Verification
Ensure firewall rules allow OpenSearch transport traffic between cluster members.
Common issues include:
- Newly applied firewall policies
- Cloud security group changes
- VLAN segmentation changes
- Internal ACL restrictions
OpenSearch transport traffic must be permitted between all cluster nodes.
TLS Configuration Validation
TLS certificate problems can prevent nodes from joining the cluster.
Review:
- Node certificates
- Certificate expiration dates
- Trusted CA configuration
- Transport layer TLS settings
Certificate-related cluster communication failures are discussed in detail in How to Fix Wazuh Certificate Errors.
Restart Failed Indexer Services
If connectivity appears normal but a node remains offline, restart the Indexer service.
Linux Systems
On Linux systems:
systemctl restart wazuh-indexer
Monitor logs during startup:
journalctl -u wazuh-indexer -f
Watch for:
- Cluster join failures
- TLS errors
- Memory allocation issues
- Corrupted index warnings
Verify Service Status
After restarting, confirm the service is running correctly:
systemctl status wazuh-indexer
Example healthy output:
Active: active (running)
Once the node rejoins the cluster, OpenSearch should begin allocating previously unassigned replica shards automatically.
If memory-related crashes caused the outage, review How to Tune OpenSearch Heap Size to Stop Wazuh High Memory Crashes.
Fix 3: Resolve Disk Watermark Issues
Disk usage is one of the most common causes of unassigned shards in production Wazuh clusters.
To prevent storage exhaustion and potential data corruption, OpenSearch enforces disk watermark thresholds that restrict shard allocation when storage becomes critically low.
When these thresholds are exceeded, replica shards often become unassigned and cluster health changes to yellow.
Check Disk Usage Across Nodes
Begin by reviewing allocation and storage usage:
curl -k -u admin:password \
"https://localhost:9200/_cat/allocation?v"
Example output:
shards disk.indices disk.used disk.avail disk.total disk.percent host
120 450gb 780gb 20gb 800gb 97 node-1
Pay particular attention to:
- disk.percent
- disk.avail
- disk.used
Nodes approaching capacity are common candidates for allocation failures.
Understand Watermark Thresholds
OpenSearch uses three primary disk watermarks.
Low Watermark
Default: approximately 85% disk utilization.
Once exceeded, OpenSearch begins avoiding new shard allocations on the affected node.
High Watermark
Default: approximately 90% disk utilization.
At this threshold, OpenSearch actively relocates shards away from the node whenever possible.
Flood Stage Watermark
Default: approximately 95% disk utilization.
At this stage:
- Indices may become read-only
- Indexing operations may fail
- Allocation restrictions become much more aggressive
OpenSearch engineers strongly recommend maintaining sufficient free space to avoid reaching flood-stage conditions.
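The three thresholds can be expressed as a simple classification. The sketch below hard-codes the default values quoted above (85, 90, 95) and does not read custom cluster settings; it also treats a value exactly at a threshold as exceeding it, which is a conservative simplification. The name watermark_stage is hypothetical, and the argument must be a whole number such as the disk.percent value from the Cat Allocation output.

```shell
# watermark_stage: classify an integer disk usage percentage against the
# default watermarks (low 85, high 90, flood stage 95).
watermark_stage() {
  used="$1"
  if [ "$used" -ge 95 ]; then
    echo "flood_stage"
  elif [ "$used" -ge 90 ]; then
    echo "high"
  elif [ "$used" -ge 85 ]; then
    echo "low"
  else
    echo "ok"
  fi
}
```

The node in the example output above, at 97 percent, falls into the flood-stage range.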
Free Disk Space
If disk watermarks are preventing allocation, the safest solution is to reduce storage consumption.
Delete Old Indices
Many Wazuh environments retain historical alert data longer than necessary.
Review older indices:
curl -k -u admin:password \
"https://localhost:9200/_cat/indices?v"
Delete unused historical data after confirming retention requirements.
Before removing data, review your retention strategy in How to Configure Wazuh Log Retention.
Expand Storage
If data growth is expected to continue, increasing storage capacity is often preferable to deleting data.
Options include:
- Expanding virtual disks
- Adding larger SSDs
- Migrating to larger storage volumes
- Adding additional Indexer nodes
Archive Historical Data
Older security data may be archived to external storage instead of remaining in the active cluster.
Common destinations include:
- Object storage
- Backup repositories
- Long-term compliance archives
This approach preserves historical visibility while reducing Indexer storage pressure.
Temporarily Adjust Watermark Settings
If immediate allocation is required while storage remediation is underway, you can temporarily increase watermark thresholds.
curl -k -u admin:password \
-X PUT "https://localhost:9200/_cluster/settings" \
-H 'Content-Type: application/json' \
-d '{
"transient": {
"cluster.routing.allocation.disk.watermark.low": "90%",
"cluster.routing.allocation.disk.watermark.high": "95%"
}
}'
This may allow shard allocation to proceed temporarily.
However, this should be considered a short-term workaround rather than a permanent solution.
Running clusters near full capacity increases the risk of:
- Performance degradation
- Allocation instability
- Index corruption
- Flood-stage lockouts
After freeing space or expanding storage, restore watermark values to their recommended settings and verify that all shards have been successfully assigned.
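One way to restore the previous values is to set the transient overrides to null, which removes them so the cluster falls back to its persistent or default settings. The following sketch reuses the placeholder host and credentials from the other examples; reset_watermarks_body and reset_watermarks are hypothetical helper names.

```shell
# reset_watermarks_body: JSON body that removes the transient watermark
# overrides by setting them to null.
reset_watermarks_body() {
  printf '%s\n' \
    '{' \
    '  "transient": {' \
    '    "cluster.routing.allocation.disk.watermark.low": null,' \
    '    "cluster.routing.allocation.disk.watermark.high": null' \
    '  }' \
    '}'
}

# reset_watermarks: send the body to the cluster settings API.
# Host and credentials are placeholders; adjust them for your environment.
reset_watermarks() {
  curl -sk -u admin:password \
    -X PUT "https://localhost:9200/_cluster/settings" \
    -H 'Content-Type: application/json' \
    -d "$(reset_watermarks_body)"
}
```

After calling reset_watermarks, the watermark entries should no longer appear in the transient section of the cluster settings.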
Fix 4: Re-Enable Shard Allocation
Administrators often disable shard allocation temporarily during maintenance activities, cluster upgrades, node replacements, or migration operations.
If allocation is not re-enabled afterward, replica shards remain unassigned and the cluster continues reporting a yellow status.
Fortunately, this is one of the easiest allocation problems to diagnose and resolve.
Check Current Allocation Settings
Begin by reviewing the cluster’s allocation configuration:
curl -k -u admin:password \
"https://localhost:9200/_cluster/settings?pretty"
Look for settings similar to:
{
"persistent": {
"cluster": {
"routing": {
"allocation": {
"enable": "none"
}
}
}
}
}
or
"cluster.routing.allocation.enable": "primaries"
Common values include:
| Setting | Meaning |
|---|---|
| all | Allocate all shard types |
| primaries | Allocate only primary shards |
| new_primaries | Allocate newly created primaries only |
| none | Disable all shard allocation |
If the setting is anything other than all, it may explain why replica shards remain unassigned.
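The table reduces to a simple rule: only the value all lets replicas be allocated. A minimal sketch of that check follows; replicas_blocked is a hypothetical helper that takes the setting's value as its argument.

```shell
# replicas_blocked: succeed (exit status 0) when the given value of
# cluster.routing.allocation.enable prevents replica allocation.
replicas_blocked() {
  case "$1" in
    primaries|new_primaries|none) return 0 ;;
    *) return 1 ;;
  esac
}
```

It can be used in a condition such as: if replicas_blocked none; then echo "replica allocation is disabled"; fi.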
Enable Allocation
Restore normal shard allocation using:
curl -k -u admin:password \
-X PUT "https://localhost:9200/_cluster/settings" \
-H 'Content-Type: application/json' \
-d '{
"persistent": {
"cluster.routing.allocation.enable": "all"
}
}'
Expected response:
{
"acknowledged": true
}
This instructs OpenSearch to resume normal allocation and rebalancing operations across the cluster.
Verify Allocation Resumes
After enabling allocation, monitor cluster recovery progress:
curl -k -u admin:password \
"https://localhost:9200/_cluster/health?pretty"
You should see:
- Decreasing unassigned shard counts
- Increasing active shard counts
- Ongoing recovery activity
You can also monitor shard movement:
curl -k -u admin:password \
"https://localhost:9200/_cat/recovery?v"
Example:
index shard time type stage
wazuh-alerts-4.x 0 10s peer done
If allocation resumes successfully, cluster health should eventually transition from yellow to green.
After maintenance windows, always verify that allocation has been re-enabled before returning the cluster to production service.
Fix 5: Address Memory and JVM Pressure
Insufficient memory is another common reason shards remain unassigned.
Even when disk space and node availability are healthy, shard recoveries can stall or fail on nodes that are under excessive JVM pressure, which leaves the affected shards unassigned.
High heap utilization can slow cluster state updates, delay shard recoveries, and cause node instability.
Check JVM Heap Utilization
Start by reviewing JVM statistics:
curl -k -u admin:password \
"https://localhost:9200/_nodes/stats/jvm?pretty"
Example output (abridged; in the full response these fields appear under nodes.<node_id>.jvm.mem):
{
"heap_used_percent": 87,
"heap_max_in_bytes": 8589934592
}
Pay particular attention to:
- heap_used_percent
- heap_max_in_bytes
- garbage collection statistics
- memory pool utilization
Clusters consistently operating near heap limits frequently experience allocation problems.
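To reduce the JVM statistics to a single number you can compare against a threshold, something like the following may help. It is a minimal sketch: max_heap_used_percent is a hypothetical helper name, and it relies on simple text matching rather than a real JSON parser.

```shell
# Hypothetical helper: reads _nodes/stats/jvm JSON on stdin and prints the
# highest heap_used_percent reported by any node (0 if none is found).
max_heap_used_percent() {
  grep -o '"heap_used_percent"[[:space:]]*:[[:space:]]*[0-9]*' \
    | sed 's/.*:[[:space:]]*//' \
    | awk 'BEGIN { max = 0 } $1 + 0 > max { max = $1 + 0 } END { print max }'
}

# Usage against a live cluster (host and credentials as in the examples above):
# curl -sk -u admin:password "https://localhost:9200/_nodes/stats/jvm" \
#   | max_heap_used_percent
```

A result that stays above roughly 85 across repeated runs suggests the heap deserves attention before you look elsewhere.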
Identify Memory Bottlenecks
Several warning signs indicate memory-related allocation issues.
High Heap Utilization
A healthy OpenSearch cluster generally maintains enough free heap space to handle indexing, searches, and shard recovery operations.
Warning signs include:
Heap Usage > 85%
Sustained utilization above this threshold often correlates with performance degradation.
Frequent Garbage Collection
Excessive garbage collection activity can indicate memory pressure.
Symptoms include:
- Slow queries
- Delayed indexing
- Cluster state update delays
- Recovery timeouts
Example log entries:
[gc][young] duration [3.4s]
or
[gc][old] duration [12.7s]
Long GC pauses can temporarily prevent allocation activities from completing.
Out-of-Memory Conditions
The most severe scenario involves JVM crashes.
Look for errors such as:
java.lang.OutOfMemoryError: Java heap space
Nodes experiencing OOM events may leave the cluster entirely, creating additional unassigned shards.
Recommended Heap Configuration
Proper heap sizing is one of the most important aspects of Indexer stability.
Sizing Guidelines
General OpenSearch recommendations include:
| Server RAM | Recommended Heap |
|---|---|
| 8 GB | 4 GB |
| 16 GB | 8 GB |
| 32 GB | 16 GB |
| 64 GB | 31 GB |
The general OpenSearch guidance is to give the JVM about 50% of system memory, capped at roughly 31 GB, and to leave the remaining RAM for the operating system and filesystem cache.
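The sizing rule in the table can be expressed as a small calculation. This is only a sketch of the rule of thumb described here (half of RAM, capped at 31 GB); recommended_heap_gb is a hypothetical helper name, and real sizing should also account for workload.

```shell
# Hypothetical helper: prints a suggested heap size in GB for a given amount
# of system RAM in GB, using half of RAM capped at 31 GB.
recommended_heap_gb() {
  ram_gb=$1
  heap_gb=$((ram_gb / 2))
  if [ "$heap_gb" -gt 31 ]; then
    heap_gb=31
  fi
  echo "$heap_gb"
}

# Print the suggestion for the RAM sizes listed in the table.
for ram in 8 16 32 64; do
  echo "${ram} GB RAM -> $(recommended_heap_gb "$ram") GB heap"
done
```

The resulting value would be applied to both -Xms and -Xmx in the Indexer's jvm.options file, which on Wazuh installations is typically /etc/wazuh-indexer/jvm.options.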
Heap Tuning Best Practices
For Wazuh Indexer environments:
- Keep Xms and Xmx identical
- Keep the heap below roughly 31–32 GB so the JVM can continue using compressed object pointers
- Monitor garbage collection behavior regularly
- Review shard counts periodically
- Scale nodes before memory becomes constrained
For a deeper discussion of heap sizing and memory optimization, see How to Tune OpenSearch Heap Size to Stop Wazuh High Memory Crashes.
If excessive CPU usage accompanies memory pressure, you may also find Why Is Wazuh Using High CPU? Troubleshooting Guide useful.
After addressing memory issues, OpenSearch may automatically resume shard allocation once sufficient resources become available.
Fix 6: Force a Shard Reroute
In some situations, the original allocation problem has already been resolved, but OpenSearch has not yet retried assigning affected shards.
When this occurs, manually triggering a reroute can accelerate recovery.
When Manual Rerouting Is Appropriate
Rerouting should generally be considered after the root cause has been fixed.
Examples include:
- A failed node has rejoined the cluster
- Disk space has been reclaimed
- Allocation settings have been restored
- Memory pressure has subsided
- Network connectivity has been repaired
A reroute should not be used as a substitute for correcting the underlying problem.
Recovery Scenarios
Common situations where rerouting helps include:
- Cluster recovery after maintenance
- Node replacement projects
- Storage expansion events
- Recovery from temporary outages
- Post-upgrade shard recovery
For example:
Problem Fixed
↓
Shards Still Unassigned
↓
Trigger Reroute
↓
Allocation Resumes
Post-Maintenance Allocation Issues
Administrators often encounter lingering unassigned shards after:
- Cluster restarts
- Rolling upgrades
- Node migrations
- Configuration changes
Even after the environment becomes healthy again, some allocation decisions may need to be retried.
Retry Failed Allocations
Trigger a reroute using:
curl -k -u admin:password \
-X POST "https://localhost:9200/_cluster/reroute?retry_failed=true"
Expected response (abridged; the full response also includes the resulting cluster state):
{
"acknowledged": true
}
The retry_failed parameter instructs OpenSearch to retry shards it gave up on after too many consecutive failed allocation attempts (the limit is index.allocation.max_retries, 5 by default).
This often resolves allocation delays after transient failures.
Verify Recovery Progress
After initiating the reroute, monitor cluster health:
curl -k -u admin:password \
"https://localhost:9200/_cluster/health?pretty"
The goal is a response that contains:
{
"status": "green",
"unassigned_shards": 0
}
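If you want to pull just these two fields out of the health response, for example in a script or cron job, a sketch along these lines may be enough. health_summary is a hypothetical helper name, and the text matching assumes the standard field names shown above.

```shell
# Hypothetical helper: reads _cluster/health JSON on stdin and prints
# "<status> <unassigned_shards>" on one line.
health_summary() {
  json=$(cat)
  status=$(printf '%s\n' "$json" \
    | grep -o '"status"[[:space:]]*:[[:space:]]*"[a-z]*"' \
    | sed 's/.*"\([a-z]*\)"$/\1/')
  # The leading quote keeps this from matching "delayed_unassigned_shards".
  unassigned=$(printf '%s\n' "$json" \
    | grep -o '"unassigned_shards"[[:space:]]*:[[:space:]]*[0-9]*' \
    | sed 's/.*:[[:space:]]*//')
  echo "$status $unassigned"
}

# Usage against a live cluster (host and credentials as in the examples above):
# curl -sk -u admin:password "https://localhost:9200/_cluster/health" | health_summary
```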
You can also monitor ongoing recovery operations:
curl -k -u admin:password \
"https://localhost:9200/_cat/recovery?v"
Additionally, verify shard placement:
curl -k -u admin:password \
"https://localhost:9200/_cat/shards?v"
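Successful recovery of an unassigned shard typically follows this sequence (RELOCATING appears only when an already started shard is being moved to another node during rebalancing):
UNASSIGNED
↓
INITIALIZING
↓
STARTED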
Once all shards reach the STARTED state and no unassigned shards remain, cluster health should return to green and full redundancy will be restored.
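To see at a glance how many shards are in each state, the _cat/shards output can be tallied. The following is a minimal sketch; count_shard_states is a hypothetical helper name, and it expects the output of _cat/shards restricted to the state column.

```shell
# Hypothetical helper: reads "_cat/shards?h=state" output on stdin (one state
# per line) and prints "<STATE> <count>" for each state, sorted by name.
count_shard_states() {
  awk 'NF { count[$1]++ } END { for (s in count) print s, count[s] }' | sort
}

# Usage against a live cluster (host and credentials as in the examples above):
# curl -sk -u admin:password "https://localhost:9200/_cat/shards?h=state" \
#   | count_shard_states
```

Recovery is complete when STARTED is the only state left in the output.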
Fix 7: Recover Corrupted Indices
Although less common than disk, memory, or allocation configuration issues, index corruption can also leave shards permanently unassigned.
When OpenSearch cannot read shard metadata or underlying segment files, allocation attempts may repeatedly fail even though cluster resources are otherwise healthy.
In these situations, administrators must identify the corrupted index and either restore it from backup or rebuild it.
Detect Corrupted Indices
The first step is determining whether corruption is actually responsible for the allocation failure.
The Cluster Allocation Explain API often reveals corruption-related errors, but Indexer logs usually provide the most detailed information.
Reviewing Indexer Logs
Review Indexer logs on affected nodes:
journalctl -u wazuh-indexer -f
or
tail -f /var/log/wazuh-indexer/wazuh-cluster.log
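Look for repeated shard recovery failures involving the same index. The log file is named after the cluster.name setting, so adjust the path if your cluster is not called wazuh-cluster.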
Common Corruption Indicators
Common warning messages include:
corrupt index
failed shard recovery
corrupted segment file
checksum mismatch
translog corruption detected
Potential causes include:
- Unexpected server shutdowns
- Storage controller failures
- Filesystem corruption
- Faulty disks
- Incomplete writes during crashes
Investigate storage integrity problems immediately, because corruption may indicate an underlying hardware failure rather than an isolated software issue.
Restore from Snapshot
If a recent snapshot exists, restoration is typically the safest recovery method.
Snapshots preserve:
- Index mappings
- Documents
- Settings
- Shard metadata
Restoring from a known-good backup often resolves corruption without requiring manual reconstruction.
Snapshot Recovery Process
A typical recovery workflow looks like:
Identify Corrupted Index
↓
Delete Damaged Index
↓
Restore Snapshot
↓
Verify Recovery
Example restore command:
curl -k -u admin:password \
-X POST "https://localhost:9200/_snapshot/repository/snapshot_name/_restore"
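Replace repository and snapshot_name with the values from your own snapshot configuration. Without a request body, the restore covers every index in the snapshot and fails for any index that already exists and is open, so pass an indices value in the body when you only need the damaged index.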
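A restore limited to a single index might look like the sketch below. restore_body is a hypothetical helper name, and the index name in the usage comment is a made-up placeholder; indices and include_global_state are standard restore options, but check them against your Indexer version before relying on this.

```shell
# Hypothetical helper: prints a restore request body that limits the restore
# to one index and leaves the cluster's global state untouched.
restore_body() {
  printf '{"indices": "%s", "include_global_state": false}\n' "$1"
}

# Usage (repository, snapshot_name, and the index name are placeholders):
# curl -k -u admin:password \
#   -X POST "https://localhost:9200/_snapshot/repository/snapshot_name/_restore" \
#   -H 'Content-Type: application/json' \
#   -d "$(restore_body 'wazuh-alerts-4.x-2024.01.15')"
```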
Validation Steps
After restoration:
- Verify index health.
- Check shard assignment.
- Confirm document counts.
- Validate dashboard searches.
- Review cluster health.
Run:
curl -k -u admin:password \
"https://localhost:9200/_cluster/health?pretty"
and verify that unassigned shard counts decrease.
Rebuild Problematic Indices
If no usable snapshot exists, rebuilding the index may be necessary.
This is generally considered a last-resort option.
Last-Resort Recovery Option
The recovery process usually involves:
Delete Corrupted Index
↓
Create New Index
↓
Reingest Data
↓
Validate Searches
Depending on the affected index, data may be regenerated from:
- Wazuh agents
- Log sources
- External SIEM feeds
- Historical archives
Potential Data-Loss Considerations
Before deleting a corrupted index, understand the consequences.
Possible outcomes include:
- Loss of historical alerts
- Missing vulnerability records
- Incomplete compliance data
- Reduced forensic visibility
For this reason, maintaining regular backups is critical.
If retention and recovery planning are part of your environment, review How to Configure Wazuh Log Retention for additional guidance.
Monitoring Cluster Recovery
After applying a fix, you should continuously monitor the cluster until all shards have been successfully assigned.
Even when the root cause has been resolved, large clusters may require time to relocate shards, perform recoveries, and rebalance workloads.
Monitoring progress helps confirm that corrective actions are actually working.
Track Cluster Health Changes
The simplest approach is to repeatedly query cluster health.
On Linux systems:
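watch -n 10 'curl -sk -u admin:password "https://localhost:9200/_cluster/health?pretty"'
This refreshes cluster health every ten seconds. The -s flag hides curl's progress meter, and quoting the URL keeps the shell from interpreting the ? character.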
Pay particular attention to:
- status
- active_shards
- relocating_shards
- initializing_shards
- unassigned_shards
A healthy recovery generally shows:
Unassigned Shards ↓
Initializing Shards ↑
Active Shards ↑
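When you would rather have a script block until the cluster is green than watch the output yourself, a simple polling loop can do the job. This is a sketch under stated assumptions: wait_for_green and cluster_status are hypothetical helper names, and the status is read from the status column of the _cat/health API.

```shell
# Hypothetical helper: polls a command that prints the cluster status word
# until it prints "green" or the attempt limit is reached.
# $1 = command to run, $2 = maximum attempts, $3 = seconds between attempts
wait_for_green() {
  fetch_cmd=$1
  max_attempts=$2
  interval=$3
  attempt=1
  while [ "$attempt" -le "$max_attempts" ]; do
    status=$($fetch_cmd)
    echo "attempt $attempt: status=$status"
    if [ "$status" = "green" ]; then
      return 0
    fi
    attempt=$((attempt + 1))
    sleep "$interval"
  done
  return 1
}

# Assumed status source: the status column of _cat/health, whitespace removed.
cluster_status() {
  curl -sk -u admin:password "https://localhost:9200/_cat/health?h=status" \
    | tr -d '[:space:]'
}

# Usage: check every 10 seconds, give up after 60 attempts.
# wait_for_green cluster_status 60 10 || echo "cluster is still not green"
```

Passing the status command as an argument keeps the loop itself independent of how the status is fetched, which also makes it easy to try out with a stub.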
Monitor Shard Allocation Progress
To track individual shard recoveries:
curl -k -u admin:password \
"https://localhost:9200/_cat/recovery?v"
Example output:
index shard stage
wazuh-alerts 0 init
wazuh-alerts 1 done
Recovery stages commonly include:
| Stage | Meaning |
|---|---|
| init | Recovery started |
| index | Segment transfer in progress |
| verify_index | Integrity validation |
| translog | Transaction log replay |
| finalize | Final cleanup after replay |
| done | Recovery complete |
Large clusters containing terabytes of data may require hours to fully recover.
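On a busy cluster the recovery list can be long, so it may help to filter out everything that has already finished. The following is a minimal sketch; pending_recoveries is a hypothetical helper name, and it expects _cat/recovery output limited to the index, shard, and stage columns.

```shell
# Hypothetical helper: reads "_cat/recovery?h=index,shard,stage" output on
# stdin, prints the recoveries that have not reached the done stage, and
# ends with a one-line total.
pending_recoveries() {
  awk 'NF >= 3 && $3 != "done" { print; n++ } END { print "pending: " (n + 0) }'
}

# Usage against a live cluster (host and credentials as in the examples above):
# curl -sk -u admin:password \
#   "https://localhost:9200/_cat/recovery?h=index,shard,stage" | pending_recoveries
```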
Confirm Green Cluster Status
A cluster should not be considered fully healthy until it returns to green.
Run:
curl -k -u admin:password \
"https://localhost:9200/_cluster/health?pretty"
Expected output:
{
"status": "green"
}
All Shards Assigned
Verify that no unassigned shards remain:
"unassigned_shards": 0
This confirms successful allocation.
Replica Allocation Completed
Confirm replica shards are active:
curl -k -u admin:password \
"https://localhost:9200/_cat/shards?v"
Replica entries should display:
STARTED
rather than:
UNASSIGNED
No Allocation Warnings
Finally, ensure that:
- Allocation Explain API reports no issues
- Cluster logs are free of allocation errors
- Dashboard monitoring shows healthy status
- Searches return expected results
Once these checks are complete, the recovery process can be considered successful.
Best Practices to Prevent Future Unassigned Shards
Most shard allocation issues are preventable through proactive cluster management.
The following best practices help maintain healthy Wazuh Indexer environments and reduce the likelihood of future yellow cluster states.
Size Clusters Appropriately
Many allocation issues originate from undersized infrastructure.
When planning a cluster, consider:
- Daily log volume
- Agent count
- Retention requirements
- Search workload
- Future growth projections
Avoid running production deployments with minimal hardware resources.
Monitor Disk Usage Before Watermarks Trigger
Waiting until disks exceed watermark thresholds often results in emergency remediation efforts.
Instead:
- Monitor storage utilization continuously
- Configure alerts at 70–80% utilization
- Expand capacity before critical thresholds are reached
Proactive storage management significantly reduces allocation-related incidents.
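A simple early-warning check can be built on the _cat/allocation API. This is a sketch, not a replacement for proper monitoring: nodes_over_disk_threshold is a hypothetical helper name, and the 75 in the usage comment is just one value inside the 70–80% range suggested above.

```shell
# Hypothetical helper: reads "_cat/allocation?h=node,disk.percent" output on
# stdin and prints nodes whose disk usage is at or above the given percentage.
# Rows without a numeric percentage (such as UNASSIGNED) are skipped.
nodes_over_disk_threshold() {
  awk -v limit="$1" '$2 ~ /^[0-9]+$/ && $2 + 0 >= limit + 0 { print $1, $2 "%" }'
}

# Usage against a live cluster (host and credentials as in the examples above):
# curl -sk -u admin:password \
#   "https://localhost:9200/_cat/allocation?h=node,disk.percent" \
#   | nodes_over_disk_threshold 75
```

Empty output means no node has crossed the chosen threshold.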
Regularly Review Cluster Health
Cluster health checks should become part of routine operational procedures.
Recommended monitoring includes:
_cluster/health
_cat/shards
_cat/allocation
Early detection allows administrators to resolve minor issues before they escalate.
Use Appropriate Replica Counts
Replica settings should match cluster architecture.
Examples:
| Cluster Size | Recommended Replicas |
|---|---|
| 1 Node | 0 |
| 2 Nodes | 1 |
| 3+ Nodes | 1 or more depending on redundancy goals |
Overly aggressive replica counts can create unnecessary allocation pressure.
Maintain Consistent Node Versions
Mixed-version clusters frequently create allocation complications.
Best practices include:
- Upgrade all nodes promptly
- Follow supported upgrade paths
- Avoid prolonged mixed-version deployments
- Validate compatibility before upgrades
This reduces the risk of shard recovery failures after maintenance.
Configure Snapshot Backups
Snapshots are one of the most effective protections against index corruption and data loss.
A comprehensive backup strategy should include:
- Scheduled snapshots
- Offsite storage
- Recovery testing
- Retention policies
Snapshots remain the primary recovery mechanism for catastrophic index failures.
Monitor JVM Heap and Garbage Collection
Heap pressure is often an early warning sign of future allocation problems.
Monitor:
- Heap utilization
- GC frequency
- GC duration
- Node memory consumption
If memory usage trends upward over time, address the issue before allocation failures occur.
Plan Capacity Growth Ahead of Time
Successful Wazuh deployments rarely remain static.
Agent counts increase, log volumes grow, and retention requirements expand.
Capacity planning should include:
- Storage growth forecasting
- Heap sizing reviews
- Node expansion planning
- Performance trend analysis
Organizations that regularly evaluate future resource requirements tend to experience fewer allocation-related outages than those that react only after problems appear.
By combining proactive monitoring, proper sizing, regular backups, and disciplined maintenance procedures, administrators can dramatically reduce the likelihood of encountering yellow cluster health and unassigned shard issues in the future.
Frequently Asked Questions (FAQ)
Question: What causes yellow cluster status in Wazuh Indexer?
A yellow cluster status occurs when all primary shards are available but one or more replica shards remain unassigned.
The most common causes include:
- Single-node deployments with replica shards enabled
- Offline Indexer nodes
- Disk watermark threshold violations
- Disabled shard allocation
- JVM memory pressure
- Network communication issues
- Corrupted indices
- Version mismatches between cluster nodes
By definition, every yellow state traces back to a replica that could not be allocated; the list above covers the usual triggers.
Question: Is a yellow cluster status dangerous?
A yellow cluster status is not immediately critical because all primary shards remain available and searchable.
However, it should not be ignored.
A yellow cluster indicates reduced fault tolerance. If a node containing primary shards fails before replica shards are assigned, data availability may be affected.
While less severe than a red cluster, yellow status should still be investigated and resolved as soon as practical.
Question: Can Wazuh function normally with a yellow cluster?
In many cases, yes.
The Wazuh Manager, Indexer, and Dashboard generally continue operating because primary shards remain available.
However, administrators may experience:
- Reduced redundancy
- Slower search performance
- Longer recovery times during failures
- Increased risk of data unavailability if another node fails
The cluster may appear healthy to end users while still carrying significant operational risk.
Question: Why are replica shards unassigned in a single-node deployment?
OpenSearch does not allow a replica shard to be stored on the same node as its primary shard.
For example:
Node-1
├── Primary Shard
└── Replica Shard ❌
Because there is no second node available, replica allocation becomes impossible.
As a result:
- Primary shards remain active
- Replica shards remain unassigned
- Cluster health becomes yellow
For standalone Wazuh deployments, setting the replica count to zero is typically the correct solution.
Question: How do I identify which shards are unassigned?
The easiest method is using the Cat Shards API:
curl -k -u admin:password \
"https://localhost:9200/_cat/shards?v"
You can also display only relevant fields:
curl -k -u admin:password \
"https://localhost:9200/_cat/shards?h=index,shard,prirep,state,node"
Look for entries with:
UNASSIGNED
The output will identify:
- Affected indices
- Shard numbers
- Primary or replica designation
- Current allocation status
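To narrow the output to unassigned shards and include the reason OpenSearch recorded for each one, you could filter the Cat Shards output as sketched below. list_unassigned is a hypothetical helper name, and it assumes the unassigned.reason column is requested as the fifth field.

```shell
# Hypothetical helper: reads
# "_cat/shards?h=index,shard,prirep,state,unassigned.reason" output on stdin
# and prints one readable line per unassigned shard.
list_unassigned() {
  awk '$4 == "UNASSIGNED" {
    kind = ($3 == "p") ? "primary" : "replica"
    reason = ($5 == "") ? "unknown" : $5
    printf "%s shard %s (%s): %s\n", $1, $2, kind, reason
  }'
}

# Usage against a live cluster (host and credentials as in the examples above):
# curl -sk -u admin:password \
#   "https://localhost:9200/_cat/shards?h=index,shard,prirep,state,unassigned.reason" \
#   | list_unassigned
```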
Question: What is the fastest way to return a cluster to green?
The answer depends entirely on the root cause.
Examples include:
| Cause | Fix |
|---|---|
| Single-node replicas | Set replicas to zero |
| Offline node | Restore node connectivity |
| Disk watermarks | Free storage space |
| Disabled allocation | Re-enable shard allocation |
| JVM pressure | Resolve memory bottlenecks |
The fastest way to identify the correct fix is usually the Cluster Allocation Explain API:
curl -k -u admin:password \
-X GET "https://localhost:9200/_cluster/allocation/explain?pretty"
This API typically reveals the exact reason allocation is failing. Called without a request body, it explains the first unassigned shard it finds.
Question: Can low disk space cause unassigned shards?
Yes.
Low disk space is one of the most common causes of unassigned shards in production environments.
When configured disk watermarks are exceeded, OpenSearch may:
- Block new allocations
- Relocate existing shards
- Mark indices read-only
- Prevent replica recovery
Administrators should monitor storage utilization long before disk watermarks are reached.
Question: Should I manually reroute unassigned shards?
Manual rerouting can be useful, but only after the root cause has been fixed.
For example:
- A node has returned to service
- Disk space has been freed
- Allocation settings have been corrected
In these cases:
curl -k -u admin:password \
-X POST "https://localhost:9200/_cluster/reroute?retry_failed=true"
may accelerate recovery.
However, rerouting should not be used as a substitute for addressing the underlying allocation problem.
Question: What is the difference between yellow and red cluster status?
The difference is based on which shard types are unavailable.
| Status | Meaning |
|---|---|
| Green | All shards assigned |
| Yellow | Replica shards unassigned |
| Red | Primary shards unassigned |
A yellow cluster still has access to all primary data.
A red cluster indicates that some primary shards are unavailable, which may result in missing or inaccessible data.
Red cluster status generally requires more urgent intervention.
Question: Will restarting Wazuh Indexer fix unassigned shards?
Sometimes, but not always.
Restarting the Indexer may help if the issue involves:
- Temporary node failures
- Hung allocation processes
- Short-lived communication problems
However, restarting will not fix:
- Replica allocation conflicts
- Disk watermark violations
- Disabled allocation settings
- Corrupted indices
- Improper replica configurations
Before restarting services, administrators should determine the actual allocation failure reason using the Cluster Allocation Explain API.
Conclusion
A yellow cluster status in Wazuh Indexer indicates that replica shards cannot be assigned, leaving the cluster operational but without full redundancy.
While the system typically continues processing alerts and serving dashboard queries, unresolved unassigned shards increase operational risk and reduce the cluster’s ability to withstand future failures.
The most common causes include:
- Single-node deployments with replica shards enabled
- Offline or unreachable Indexer nodes
- Disk watermark threshold violations
- Disabled shard allocation settings
- JVM memory pressure
- Corrupted indices
- Cluster version inconsistencies
The key to resolving yellow cluster health efficiently is identifying the exact reason OpenSearch is refusing shard allocation rather than relying on trial-and-error troubleshooting.
A recommended troubleshooting workflow is:
Check Cluster Health
↓
Identify Unassigned Shards
↓
Run Allocation Explain API
↓
Determine Root Cause
↓
Apply Targeted Fix
↓
Monitor Recovery
↓
Verify Green Status
By using tools such as:
_cluster/health
_cat/shards
_cat/allocation
_cluster/allocation/explain
administrators can quickly diagnose and resolve most shard allocation problems.
Long-term stability depends on proactive cluster management.
Proper sizing, adequate storage capacity, regular health monitoring, consistent node versions, JVM tuning, and reliable snapshot backups all play critical roles in preventing future allocation failures.
If you’re building or maintaining a production Wazuh deployment, consider reviewing How to Build a Wazuh Indexer Cluster, How to Tune OpenSearch Heap Size to Stop Wazuh High Memory Crashes, and How to Configure Wazuh Log Retention to further improve Indexer reliability and resilience.
With the right monitoring practices and troubleshooting methodology, you can keep your Wazuh Indexer cluster healthy, maintain full shard redundancy, and prevent yellow cluster status issues from disrupting security operations.
