Fixing Wazuh 502 Bad Gateway: Troubleshooting Guide

A 502 Bad Gateway in a Wazuh deployment indicates that the Wazuh Dashboard (frontend layer) is unable to receive a valid response from its upstream backend services, typically the indexer or API layer.

In practical terms, the dashboard usually sits behind a reverse proxy (commonly Nginx). The 502 means either the proxy cannot get a valid response from the dashboard, or the dashboard itself cannot communicate with the backend service responsible for search and analytics.

Why It Commonly Appears After Upgrades, Restarts, or Cluster Changes

This error frequently surfaces in operational scenarios such as:

  • Post-upgrade service startup delays (indexer not fully ready)
  • Cluster reconfiguration or node replacement
  • Certificate rotation or security plugin changes
  • Resource contention during restart storms

In distributed Wazuh stacks, service dependency order matters: if the indexer layer is not fully healthy, the dashboard fails with a 502 instead of degrading gracefully.

Overview of Wazuh–OpenSearch Architecture

A typical deployment follows this flow:

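Reverse proxy (optional) → Dashboard → Wazuh server API and Indexer (OpenSearch)

  • Dashboard renders the UI and queries both backends directly
  • Wazuh server API authenticates requests and returns agent and manager data
  • OpenSearch (the Wazuh indexer) handles indexing and query execution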

If any layer in this chain is unavailable or misconfigured, the dashboard cannot complete request routing, resulting in a 502 error.



Understanding the Root Cause

 

What Triggers a 502 in the Wazuh Dashboard

A 502 occurs when the reverse proxy receives an invalid response (or no response) from upstream services.

In Wazuh environments, this is almost always a backend availability or routing issue.

Backend (OpenSearch) Unreachable

If the indexer cluster is:

  • Down
  • Not initialized
  • Failing TLS handshake
  • Experiencing JVM heap exhaustion

…the dashboard cannot retrieve required data for rendering.

Reverse Proxy (Nginx/Apache) Misrouting

When Nginx is used in front of the dashboard:

  • Incorrect upstream IP/port leads to failed proxy_pass
  • Misconfigured SSL termination breaks backend communication

Wazuh Indexer Cluster Not Ready

Even when services are “running,” the cluster may still be:

  • Yellow or red status
  • Recovering shards
  • Initializing security plugins

This leads to partial readiness states where the dashboard cannot complete API calls.

Role of OpenSearch in Dashboard Rendering

The dashboard depends heavily on indexer queries for:

  • Alerts
  • Security events
  • Rule execution results
  • Agent telemetry

If OpenSearch cannot respond within timeout thresholds, the proxy interprets this as a backend failure and returns a 502.

Difference Between “502 Bad Gateway” vs “Not Ready Yet”

These two errors are often confused:

  • 502 Bad Gateway → hard failure at proxy or upstream connection level
  • “Dashboard Server Is Not Ready Yet” → service startup delay or initialization phase



Common Causes of Wazuh Dashboard 502 Errors

 

OpenSearch Service Down or Unhealthy

When the OpenSearch service is down or degraded:

  • Dashboard loses access to indices
  • Queries fail immediately
  • Proxy returns 502 due to no upstream response

Common triggers:

  • Node crash
  • Disk full conditions
  • Corrupted shard allocation

Cluster Not Initialized

A partially initialized cluster often occurs after:

  • Fresh installs
  • Multi-node scaling events
  • Security plugin misconfiguration

During this state, the cluster exists but cannot serve requests.

Node Failures or Memory Pressure

Frequent in production environments:

  • High JVM heap usage causes GC thrashing
  • Node becomes unresponsive
  • Cluster health drops to RED

Service Not Binding to Expected Port (9200/443)

If OpenSearch is misconfigured:

  • Service may bind only to localhost
  • TLS port mismatch breaks dashboard connectivity
  • Health checks fail silently
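One way to confirm a binding problem like the one described above is to look at the listening sockets on the indexer host. The sketch below assumes `ss` (from iproute2) is available. `listen_addrs` is a hypothetical helper name, not something shipped with Wazuh.

```shell
# Hypothetical helper: read `ss -tln` output on stdin and print every
# local address that is listening on the given port.
listen_addrs() {
    awk -v port="$1" 'NR > 1 {
        n = split($4, parts, ":")        # last ":"-separated piece is the port
        if (parts[n] == port) print $4
    }'
}

# Example (run on the indexer host):
#   ss -tln | listen_addrs 9200
```

If the only address printed is 127.0.0.1:9200 while the dashboard runs on a different host, the indexer is bound to localhost only and remote connections will be refused.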

Network or Firewall Misconfiguration

Blocked Internal Traffic Between Dashboard and Indexer

Common in cloud or containerized deployments:

  • Security groups blocking port 9200/443
  • Kubernetes NetworkPolicies restricting traffic

Incorrect Security Group Rules

In AWS-style environments:

  • Dashboard cannot reach indexer endpoint
  • Inter-node communication fails


Reverse Proxy Misconfiguration

Nginx Upstream Pointing to Wrong Host/Port

If proxy_pass points incorrectly:

  • Requests never reach backend
  • Immediate 502 responses occur

Timeout Misconfigurations

  • Default timeouts too short for large queries
  • Slow indexer responses dropped by proxy

SSL Termination Issues

  • Mismatched TLS versions between dashboard and indexer
  • Invalid certificate chain

Certificate or Security Plugin Failures

Expired TLS Certificates

If certificates expire:

  • Handshake fails instantly
  • Dashboard cannot establish secure connection

OpenSearch Security Plugin Misalignment

Security plugin mismatches cause:

  • Authentication rejection loops
  • Invalid node trust relationships

Auth Handshake Failures

Often seen after:

  • Cluster scaling
  • Manual cert rotation

Resource Exhaustion

High CPU / RAM Usage in OpenSearch Cluster

When system resources are saturated:

  • Query latency spikes
  • Requests time out
  • Proxy returns 502

JVM Heap Pressure Causing Request Drops

  • Garbage collection pauses stall responses
  • Nodes become temporarily unreachable
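Heap pressure can usually be spotted before it turns into dropped requests. A minimal sketch follows, assuming the `_cat/nodes` API is reachable with admin credentials. `hot_nodes` and the 85% threshold are illustrative choices, not official guidance.

```shell
# Hypothetical helper: read "name heap.percent" rows on stdin and print
# nodes whose heap usage is at or above the limit (default 85).
hot_nodes() {
    awk -v limit="${1:-85}" '$2 + 0 >= limit + 0 { print $1 " heap=" $2 "%" }'
}

# Example:
#   curl -s -k -u "admin:<password>" \
#     "https://localhost:9200/_cat/nodes?h=name,heap.percent" | hot_nodes 85
```

Nodes that stay on this list across several checks are likely candidates for the GC stalls described above.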



Step-by-Step Troubleshooting Guide

 

Check Service Status

Start by confirming that all core Wazuh components are actually running.

A 502 often masks a simple service outage.

Verify the dashboard service:

systemctl status wazuh-dashboard

Check indexer service (OpenSearch-based):

systemctl status wazuh-indexer

For cluster health, query the indexer directly:

curl -k -u admin:<password> "https://localhost:9200/_cluster/health?pretty"

Cluster states:

  • Green → Fully operational (no action needed)
  • Yellow → Functional, but one or more replica shards are unassigned
  • Red → Critical shard failures (likely root cause of 502)
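The status check above can be scripted so it fits into a runbook. This is a rough sketch that parses the health JSON with `sed` rather than assuming `jq` is installed. The function names `health_status` and `triage_hint` are hypothetical.

```shell
# Hypothetical helper: extract the "status" field from _cluster/health
# JSON read on stdin (works with or without ?pretty).
health_status() {
    sed -n 's/.*"status"[[:space:]]*:[[:space:]]*"\([a-z]*\)".*/\1/p' | head -n 1
}

# Hypothetical helper: map a status string to the triage guidance listed above.
triage_hint() {
    case "$1" in
        green)  echo "fully operational; look at the proxy or network instead" ;;
        yellow) echo "serving requests; replica shards are unassigned" ;;
        red)    echo "primary shards missing; likely root cause of the 502" ;;
        *)      echo "no status returned; indexer unreachable or auth failed" ;;
    esac
}

# Example:
#   status=$(curl -s -k -u "admin:<password>" "https://localhost:9200/_cluster/health" | health_status)
#   triage_hint "$status"
```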

Validate OpenSearch Connectivity

A large percentage of 502 errors come from broken or misrouted backend connections.

Test API accessibility:

curl -k https://<opensearch-host>:9200

If authentication is enabled:

curl -k -u admin:<password> https://<opensearch-host>:9200

Key checks:

  • DNS resolution correctness
  • Firewall rules allowing port 9200 / 443
  • Correct protocol (HTTP vs HTTPS mismatch is common)
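When the curl test fails, its exit code often points to which of the checks above is at fault. The mapping below is a sketch based on curl's documented exit codes. The wording of each hint is an interpretation, and `explain_curl_exit` is a hypothetical helper name.

```shell
# Hypothetical helper: translate a curl exit code into a likely cause.
explain_curl_exit() {
    case "$1" in
        0)  echo "connected: check the HTTP status and response body" ;;
        6)  echo "DNS: host name could not be resolved" ;;
        7)  echo "refused or unreachable: service down, wrong port, or firewall" ;;
        28) echo "timeout: packets dropped or indexer overloaded" ;;
        35) echo "TLS handshake failed: possible HTTP vs HTTPS mismatch" ;;
        52) echo "empty reply: possible plain HTTP request to a TLS port" ;;
        60) echo "certificate not trusted by the CA bundle in use" ;;
        *)  echo "curl exit code $1: see the EXIT CODES section of man curl" ;;
    esac
}

# Example:
#   curl -s -k -o /dev/null --max-time 5 "https://<opensearch-host>:9200"
#   explain_curl_exit "$?"
```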

Inspect Logs

Logs are the most reliable source of truth in Wazuh troubleshooting.

Wazuh Dashboard Logs

/var/log/wazuh-dashboard/wazuh-dashboard.log

Look for:

  • upstream connection refused
  • timeout errors
  • TLS handshake failures

OpenSearch Logs

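/var/log/wazuh-indexer/wazuh-cluster.log

(The file is named after the cluster; wazuh-cluster is the default name. A standalone OpenSearch install typically logs under /var/log/opensearch/ instead.)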

Look for:

  • shard allocation failures
  • JVM heap pressure warnings
  • cluster state blocks

System Logs

journalctl -u wazuh-dashboard --no-pager
journalctl -u wazuh-indexer --no-pager
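To avoid reading full logs by eye, the output can be filtered for the failure signatures listed above. The keyword list here is an assumption covering refused connections, timeouts, and TLS problems, so treat it as a starting point. `find_clues` is a hypothetical helper.

```shell
# Hypothetical helper: print lines from stdin that mention common
# connectivity failures. The keyword list is an assumption; extend as needed.
find_clues() {
    grep -i -E 'connection refused|ECONNREFUSED|timed out|timeout|handshake|certificate'
}

# Examples:
#   journalctl -u wazuh-dashboard --no-pager --since "1 hour ago" | find_clues
#   journalctl -u wazuh-indexer --no-pager --since "1 hour ago" | find_clues | tail -n 20
```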

Review Reverse Proxy Settings

Most 502 errors surface at the Nginx/Apache layer when upstream routing fails.

Validate Nginx/Apache Config

For Nginx:

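# The address below must match server.host and server.port
# in the dashboard's opensearch_dashboards.yml.
upstream wazuh_dashboard {
    server 127.0.0.1:5601;
}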

Check:

  • Correct upstream IP/port
  • No stale or duplicated upstream blocks
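A quick way to audit the upstream targets is to extract them from the configuration and compare them with what is actually listening. The sketch below assumes a conventional Nginx layout. `upstream_targets` is a hypothetical helper and the config path in the example is a guess.

```shell
# Hypothetical helper: read an Nginx config on stdin and print the
# address of every "server" entry inside an upstream block.
upstream_targets() {
    awk '
        /^[[:space:]]*upstream[[:space:]]/ { in_up = 1; next }
        in_up && /[}]/                     { in_up = 0; next }
        in_up && $1 == "server"            { sub(/;$/, "", $2); print $2 }
    '
}

# Example (config path is an assumption):
#   upstream_targets < /etc/nginx/conf.d/wazuh.conf
#   nginx -t    # syntax-check the configuration before reloading
```

Each printed address should correspond to a socket the dashboard is listening on. Duplicate or unexpected entries suggest a stale upstream block.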

Increase Timeout Thresholds

If OpenSearch is slow under load, raise the proxy timeouts (values are in seconds) in the relevant server or location block:

proxy_connect_timeout 120;
proxy_read_timeout 300;
proxy_send_timeout 300;


Verify Certificates and Authentication

TLS and security plugin mismatches are a major hidden cause of 502 errors.

Check TLS Handshake Errors

Look for:

  • SSL handshake failed
  • certificate expired
  • x509: certificate signed by unknown authority

Validate CA Chain Consistency

Ensure:

  • Same CA used across dashboard and indexer
  • Intermediate certs are correctly installed
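Both checks above can be run with `openssl`. This is a minimal sketch: the helper names are hypothetical, and the certificate paths in the example are assumptions about a default-style install that should be adjusted to your environment.

```shell
# Hypothetical helper: succeed if the PEM certificate in $1 is still valid
# $2 days from now (default 30). Relies on `openssl x509 -checkend`.
cert_valid_for_days() {
    openssl x509 -checkend "$(( ${2:-30} * 86400 ))" -noout -in "$1" >/dev/null
}

# Hypothetical helper: succeed if the certificate in $2 chains to the CA in $1.
signed_by_ca() {
    openssl verify -CAfile "$1" "$2" >/dev/null 2>&1
}

# Example; paths and file names are assumptions:
#   cert_valid_for_days /etc/wazuh-dashboard/certs/dashboard.pem 30 || echo "expires soon"
#   signed_by_ca /etc/wazuh-indexer/certs/root-ca.pem /etc/wazuh-dashboard/certs/dashboard.pem \
#     || echo "dashboard cert not signed by the indexer's CA"
```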

Security Plugin Compatibility

The OpenSearch security plugin must match:

  • Wazuh version
  • Indexer version
  • Dashboard version



Advanced Fixes

 

Restart Sequence Correction

Incorrect restart order is a classic cause of transient 502 errors.

Proper sequence:

  1. OpenSearch (Indexer)
  2. Wazuh manager
  3. Wazuh dashboard

This ensures backend readiness before UI initialization.
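The sequence above can be captured in a small function so it is applied the same way every time. This is a sketch that assumes the standard systemd unit names. `restart_in_order` and the `DRY_RUN` switch are hypothetical conveniences.

```shell
# Hypothetical helper: restart Wazuh services in dependency order.
# Set DRY_RUN=1 to print the commands instead of running them.
restart_in_order() {
    for svc in wazuh-indexer wazuh-manager wazuh-dashboard; do
        if [ "${DRY_RUN:-0}" = "1" ]; then
            echo "systemctl restart $svc"
        else
            systemctl restart "$svc" || return 1   # stop at the first failure
        fi
    done
}

# Example:
#   DRY_RUN=1 restart_in_order    # preview
#   restart_in_order              # run (requires root)
```

Note that a successful `systemctl restart` only means the process started. The indexer may still need time before it can serve requests.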

Reset OpenSearch Cluster State (If Corrupted)

If cluster metadata is corrupted or the cluster is stuck in a RED state, a reset may be the only way forward.

Use caution: this can impact stored alerts and indices.

Typical scenarios requiring reset:

  • Broken shard allocation loops
  • Failed rolling upgrades
  • Persistent red cluster state

High-level approach:

  • Stop services
  • Clear problematic cluster metadata
  • Reinitialize nodes cleanly
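Before clearing anything, it may be worth listing which shards are actually unassigned and why, since that often narrows the problem to a few indices. The sketch below assumes the `_cat/shards` API is still answering. `unassigned_shards` is a hypothetical helper.

```shell
# Hypothetical helper: read "index shard prirep state reason" rows on stdin
# and print only the shards that are unassigned.
unassigned_shards() {
    awk '$4 == "UNASSIGNED" { print $1, "shard", $2, $3, "reason=" $5 }'
}

# Example:
#   curl -s -k -u "admin:<password>" \
#     "https://localhost:9200/_cat/shards?h=index,shard,prirep,state,unassigned.reason" \
#     | unassigned_shards
```

In the output, `p` marks a primary shard and `r` a replica. Unassigned primaries are what drive a red status. The `_cluster/allocation/explain` API can give more detail on a specific shard.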

Rebuild Wazuh Dashboard Index

Corrupted dashboard indices can trigger persistent 502 errors.

Fix approach:

  • Delete stale dashboard index
  • Allow system to regenerate fresh index on startup

Common symptom:

  • Dashboard loads partially then fails API calls



Adjust JVM and Heap Settings

OpenSearch is JVM-heavy, and improper memory allocation leads to instability.

Recommended adjustments:

  • Set heap size to roughly 50% of system RAM, and keep it below about 32 GB so the JVM can continue to use compressed object pointers
  • Ensure Xms and Xmx are equal
  • Monitor GC pauses

Common configuration file:

/etc/wazuh-indexer/jvm.options

(A standalone OpenSearch install uses /etc/opensearch/jvm.options.)
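The heap sizing rule can be worked out mechanically. The sketch below takes total RAM in MB, halves it, and caps the result at 31 GB (a conservative reading of the 32 GB rule; the exact cap is an assumption). Both function names are hypothetical.

```shell
# Hypothetical helper: given total RAM in MB, print a heap size in MB
# that is half of RAM but never more than 31 GB.
recommended_heap_mb() {
    half=$(( $1 / 2 ))
    cap=$(( 31 * 1024 ))
    if [ "$half" -gt "$cap" ]; then echo "$cap"; else echo "$half"; fi
}

# Hypothetical helper: print matching -Xms/-Xmx lines for jvm.options.
heap_flags() {
    mb=$(recommended_heap_mb "$1")
    printf '%s\n' "-Xms${mb}m" "-Xmx${mb}m"
}

# Example (Linux; reads total RAM from /proc/meminfo):
#   heap_flags "$(awk '/MemTotal/ { print int($2 / 1024) }' /proc/meminfo)"
```

For a 16 GB host this yields 8192 MB for both settings. For a 128 GB host the cap applies and the result is 31744 MB.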

Symptoms of bad tuning:

  • 502 under load spikes
  • intermittent request failures
  • cluster node restarts

Prevention Best Practices

 

Monitor OpenSearch Cluster Health Continuously

Implement continuous monitoring of:

  • Cluster state (green/yellow/red)
  • Node availability
  • Shard allocation status

This prevents silent degradation that leads to 502 failures.

Set Alerting for 502 and Connection Failures

Configure alerting for:

  • HTTP 502 spikes
  • upstream connection failures
  • indexer downtime events

This allows proactive remediation before full dashboard outage.
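If no monitoring stack is in place yet, a cron-friendly check on the proxy's access log is a reasonable stopgap. This sketch assumes Nginx's default "combined" log format, where the status code is the ninth whitespace-separated field. `count_502` and the log path are assumptions.

```shell
# Hypothetical helper: count 502 responses in combined-format access log
# lines read on stdin.
count_502() {
    awk '$9 == 502 { n++ } END { print n + 0 }'
}

# Example (log path is an assumption):
#   n=$(tail -n 1000 /var/log/nginx/access.log | count_502)
#   [ "$n" -gt 10 ] && echo "502 spike: $n of the last 1000 requests"
```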

Use Proper Resource Allocation (CPU/RAM)

Under-provisioning is a leading cause of instability:

  • Ensure sufficient heap memory for OpenSearch
  • Avoid CPU contention with other workloads
  • Separate indexer nodes in larger deployments

Automate Service Health Checks on Startup

Implement startup validation scripts:

  • Confirm OpenSearch is reachable before starting dashboard
  • Validate cluster state before exposing UI

This avoids partial startup failures that manifest as 502 errors.
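A startup gate of this kind can be as simple as a retry loop around a health probe. The following is a sketch: `wait_until`, `indexer_ready`, and the `INDEXER_PASSWORD` variable are hypothetical names, and accepting yellow as "ready" is an assumption that suits single-node setups.

```shell
# Hypothetical helper: run a command up to $1 times, sleeping $2 seconds
# between attempts; succeed as soon as the command succeeds.
wait_until() {
    attempts=$1; delay=$2; shift 2
    i=1
    while [ "$i" -le "$attempts" ]; do
        "$@" && return 0
        sleep "$delay"
        i=$((i + 1))
    done
    return 1
}

# Hypothetical probe: succeed when the indexer reports green or yellow.
# INDEXER_PASSWORD is an assumed environment variable.
indexer_ready() {
    curl -s -k --max-time 5 -u "admin:${INDEXER_PASSWORD}" \
        "https://localhost:9200/_cluster/health" \
        | grep -q -E '"status"[[:space:]]*:[[:space:]]*"(green|yellow)"'
}

# Example: wait up to 5 minutes, then start the dashboard.
#   wait_until 30 10 indexer_ready && systemctl start wazuh-dashboard
```

Keeping the retry loop separate from the probe makes it easy to reuse the same gate for other dependencies.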

Maintain Certificate Lifecycle Management

TLS mismanagement is one of the most preventable causes of 502 errors:

  • Rotate certificates before expiry
  • Use centralized CA management
  • Monitor expiration dates proactively



When the Issue Indicates a Bigger Problem

A Wazuh dashboard 502 Bad Gateway error is often treated as a surface-level connectivity issue, but in mature environments it can signal deeper architectural or capacity problems in your stack, especially within the Wazuh and OpenSearch layer.

Cluster Scaling Limitations

If 502 errors begin appearing during peak ingestion windows or sustained query loads, the system may be hitting fundamental scaling ceilings.

Common indicators:

  • Increasing query latency before failure
  • Nodes frequently entering “yellow” or “red” state
  • Uneven shard distribution across cluster nodes

This typically means the current cluster topology cannot support the ingestion-to-query ratio.

Under-Provisioned Nodes

Resource starvation is one of the most common long-term causes of recurring 502 errors.

Symptoms include:

  • JVM heap consistently near maximum allocation
  • CPU saturation during indexing spikes
  • Frequent garbage collection pauses

When nodes are under-provisioned, even healthy configurations will intermittently fail under load, leading to upstream timeouts that manifest as 502 errors at the dashboard layer.

High Ingestion Rates Causing Backlog

If event ingestion exceeds indexing throughput:

  • Queues begin to accumulate in OpenSearch
  • Shard refresh delays increase
  • Query responses become delayed or dropped

This backlog leads directly to proxy-level failures because the dashboard cannot receive timely responses.

Need for Distributed Architecture Redesign

Persistent or recurring 502 errors across multiple nodes may indicate structural design limitations rather than misconfiguration.

This may require:

  • Separating indexer nodes from dashboard/API workloads
  • Introducing dedicated hot/warm tier storage
  • Scaling horizontally with additional OpenSearch nodes
  • Load balancing ingestion pipelines

In larger deployments, treating 502 errors as isolated incidents often delays the necessary architectural evolution.


FAQ

Question: Why am I getting 502 Bad Gateway on Wazuh Dashboard?

In most cases, the error occurs because the dashboard cannot communicate with its backend services (primarily OpenSearch), or because the reverse proxy layer is failing to route requests correctly.

Question: Is this a dashboard issue or OpenSearch issue?

In the majority of cases, it originates from:

  • Unhealthy or unreachable OpenSearch cluster
  • Reverse proxy misconfiguration (Nginx/Apache upstream issues)
  • Network or firewall restrictions between components

The dashboard itself is usually not the root cause.

Question: Can restarting Wazuh fix the error?

Sometimes, yes—but only in limited cases such as:

  • Temporary service startup sequencing issues
  • Short-lived resource spikes
  • Stale connections after a restart

If the underlying issue is cluster health, memory pressure, or misconfiguration, restarts only provide temporary relief.


Question: Does 502 mean data loss?

No. A 502 Bad Gateway error is strictly a communication failure between services, not a data integrity issue.

Your logs, alerts, and indices remain intact unless a separate storage or cluster corruption event is present.

Question: How do I confirm OpenSearch is the root cause?

Direct validation is the most reliable approach:

  • Check cluster health:

    curl -k -u admin:<password> "https://localhost:9200/_cluster/health?pretty"
  • Test API responsiveness:

    curl -k -u admin:<password> https://localhost:9200

If these fail or return errors, the issue lies within the indexer layer rather than the dashboard.


Conclusion

The Wazuh dashboard 502 Bad Gateway error is almost always a symptom of upstream service disruption rather than a standalone UI issue.

Recap of Primary Root Causes

Most occurrences trace back to:

  • Unhealthy or unresponsive OpenSearch cluster
  • Reverse proxy misrouting or timeout misconfiguration
  • Resource exhaustion (CPU, memory, JVM heap)
  • TLS or certificate mismatches between components

Importance of Checking OpenSearch Health First

The first and most reliable diagnostic step is always verifying the health of the OpenSearch-based indexer layer.

If the cluster is degraded, all downstream services will reflect failure symptoms like 502 errors.

Emphasis on Structured Troubleshooting vs Random Restarts

Blind restarts may temporarily mask symptoms but do not resolve:

  • Cluster instability
  • Memory pressure issues
  • Misconfigured proxy routing
  • Network segmentation problems

A structured diagnostic approach prevents repeated outages and reduces mean time to recovery.

Final Recommendation: Proactive Monitoring and Proper Resource Planning

Long-term stability depends on:

  • Continuous monitoring of cluster health and node performance
  • Proper CPU/RAM provisioning for indexing workloads
  • Automated alerting for early warning signals (latency, red cluster state, failed nodes)
  • Planned scaling strategies for ingestion growth

For environments running production workloads, treating 502 errors as “rare incidents” rather than “capacity signals” is one of the most common operational mistakes.
