Why Is Wazuh Using High CPU? Troubleshooting Guide

Wazuh is a distributed security monitoring system built on a modular architecture composed of agents, manager, indexer, and dashboard components.

At a high level:

  • Agents collect telemetry from endpoints (logs, FIM events, system activity).
  • Wazuh Manager processes incoming events, applies decoders, evaluates rules, and triggers alerts.
  • Indexer (OpenSearch/Elasticsearch-based) stores and indexes security events for search and correlation.
  • Dashboard provides visualization and investigation capabilities.

This architecture is powerful, but inherently resource-intensive.

In SIEM/XDR systems like Wazuh, CPU spikes are expected under certain workloads, especially during:

  • High log ingestion bursts
  • Rule evaluation surges
  • Indexing backpressure in OpenSearch/Elasticsearch

In line with NIST’s continuous monitoring guidance, it helps to treat security telemetry pipelines as high-throughput analytical systems whose compute demand fluctuates significantly under real-time detection workloads.

Symptoms of High CPU Usage

When CPU saturation occurs in a Wazuh environment, common symptoms include:

  • Delayed or missing alerts in the dashboard
  • Increased event processing latency on the manager
  • Slow query performance in the dashboard
  • Agent communication lag or queue buildup
  • System-wide performance degradation on the Wazuh node(s)

In production deployments, these symptoms often indicate bottlenecks in either rule processing (manager layer) or indexing throughput (storage layer).


How to Identify High CPU Usage in Wazuh

Before fixing CPU issues, you need to pinpoint which component is responsible.

Using top, htop, and System Monitoring Tools

Start with standard Linux observability tools:

  • top / htop → Identify processes consuming CPU in real time
  • pidstat → Break down CPU usage per process thread
  • vmstat → Detect CPU run queue pressure
  • iostat → Check if CPU spikes correlate with disk I/O saturation

Focus on:

  • wazuh-analysisd
  • wazuh-remoted
  • wazuh-db
  • filebeat (if used in your pipeline)
  • OpenSearch/Elasticsearch JVM processes

Checking Wazuh Manager Processes

On the manager node, the most CPU-heavy components are typically:

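  • wazuh-analysisd → decoding and rule evaluation engine (most common culprit)
  • wazuh-remoted → receives and queues events sent by agents
  • wazuh-logcollector → collects the manager's own local logs (agent events arrive through wazuh-remoted)
  • wazuh-db → state tracking and integrity data handling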

Sustained high CPU usage in analysisd usually indicates:

  • Too many active rules
  • High event throughput
  • Inefficient decoding patterns

Indexer and OpenSearch CPU Impact

If CPU spikes originate from the indexer layer:

  • OpenSearch/Elasticsearch JVM heap pressure may be high
  • Garbage collection cycles increase CPU consumption
  • Shard rebalancing or indexing bursts may overload CPU

Elastic’s performance documentation notes that indexing throughput is tightly coupled with heap sizing and shard strategy, and misconfiguration can lead to CPU saturation during ingestion spikes.

Correlating Spikes with Log Ingestion Rates

To confirm root cause:

  • Compare CPU spikes with log ingestion rate (EPS: events per second)
  • Check if spikes align with agent deployment changes or traffic surges
  • Review queue metrics in Wazuh manager logs

A sudden increase in EPS without filtering is one of the most common triggers of CPU saturation.
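
To make the comparison concrete, one option is to compute a correlation coefficient between EPS and CPU samples taken at the same intervals. The sketch below is a minimal illustration; the sample values are hypothetical, and how you collect them (for example from your own monitoring stack) is left as an assumption.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat series carries no correlation signal
    return cov / sqrt(var_x * var_y)


# Hypothetical per-minute samples, aligned by timestamp.
eps_samples = [800, 820, 2400, 2600, 900, 850]
cpu_samples = [35.0, 36.0, 88.0, 93.0, 40.0, 37.0]

r = pearson(eps_samples, cpu_samples)
print(f"correlation between EPS and CPU: {r:.2f}")
```

A value close to 1 suggests CPU is tracking ingestion volume. A weak correlation points toward other causes, such as rule complexity or indexer behavior.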


Most Common Causes of High CPU Usage

Excessive Log Volume

High log volume is the primary driver of CPU overload in Wazuh environments.

Typical causes:

  • No log filtering at the agent level
  • High-frequency system logs (auditd, syslog, application debug logs)
  • Misconfigured syslog ingestion pipelines flooding the manager

When every event is forwarded without filtering, the manager is forced to:

  1. Decode each event
  2. Apply rule matching
  3. Evaluate correlation logic

Because this work is repeated for every event, CPU load rises roughly in proportion to event volume and can saturate quickly during bursts.
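
A rough back-of-the-envelope model helps show how per-event work adds up. The function below is only a sketch: the number of rules evaluated per event and the cost per evaluation are hypothetical figures you would have to measure or estimate yourself, not constants published by Wazuh.

```python
def cores_needed(eps, rules_per_event, microseconds_per_rule):
    """Estimate CPU cores kept busy by rule evaluation alone.

    eps                    events per second reaching the manager
    rules_per_event        assumed average rule evaluations per event
    microseconds_per_rule  assumed average cost of one evaluation
    """
    busy_seconds_per_second = eps * rules_per_event * microseconds_per_rule / 1_000_000
    return busy_seconds_per_second


# Hypothetical workload: 50 evaluations per event at 2 microseconds each.
for eps in (1_000, 5_000, 10_000):
    print(eps, "EPS ->", round(cores_needed(eps, 50, 2.0), 2), "cores")
```

Under these assumptions, halving the event volume or the number of rules evaluated per event halves the estimated CPU demand, which is why filtering and rule pruning are usually the first levers to try.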

Rule Overload and Inefficient Decoders

Wazuh performance heavily depends on rule and decoder efficiency.

Common issues:

  • Too many active rules (especially unused or redundant rules)
  • Complex regex patterns in decoders
  • Duplicate rule evaluation across multiple rule groups

Each event may trigger dozens or even hundreds of rule evaluations, significantly increasing CPU usage in analysisd.

Wazuh Manager Bottlenecks

The manager layer is often the first point of failure under load.

Key bottleneck patterns:

  • analysisd saturation → CPU maxed due to rule evaluation backlog
  • Queue backlog issues → events waiting in processing queues
  • Thread contention → multiple workers competing for CPU cycles

When queues fill up, latency increases and CPU remains pinned at high utilization as the system tries to catch up.

Indexer / OpenSearch Pressure

The indexing layer can silently become the CPU bottleneck.

Typical issues include:

  • Heavy indexing load from high event ingestion rates
  • Poor shard configuration (too many or too large shards)
  • Insufficient JVM heap allocation causing frequent garbage collection cycles

OpenSearch documentation highlights that improper shard sizing can significantly degrade indexing throughput and increase CPU consumption due to coordination overhead.

Agent Misconfiguration

Poor endpoint configuration can push unnecessary load upstream.

Common misconfigurations:

  • Over-reporting agents sending verbose logs
  • File Integrity Monitoring (FIM) scanning too frequently
  • Audit and rootcheck modules enabled with overly aggressive policies

This leads to:

  • High event volume at the source
  • Amplified processing load on the manager
  • Increased indexing pressure downstream


Step-by-Step Troubleshooting Guide

This section focuses on isolating whether CPU pressure originates from the Wazuh manager, indexer, or ingestion pipeline, and then progressively reducing load in a controlled manner.

Check System Resource Usage

Start by establishing a baseline of system utilization.

Identify top CPU-consuming processes

Run standard Linux profiling tools:

  • top / htop → quick real-time view of CPU-heavy processes
  • pidstat -u 1 → per-process CPU usage over time
  • ps -eo pid,ppid,cmd,%cpu,%mem --sort=-%cpu | head → snapshot of top consumers

Focus on:

  • wazuh-analysisd
  • wazuh-remoted
  • wazuh-db
  • OpenSearch / Elasticsearch Java process
  • Filebeat (if used in ingestion pipeline)

Validate whether the issue is in the manager or the indexer

A key distinction:

  • Manager CPU spike
    • High wazuh-analysisd or wazuh-remoted CPU usage
    • High rule evaluation latency
    • Increased queue size in /var/ossec/queue/
  • Indexer CPU spike
    • High JVM CPU usage
    • Frequent garbage collection cycles
    • Slow indexing or shard reallocation

This separation is critical because tuning strategies differ significantly between layers.
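
As a quick way to apply this distinction, the sketch below sums CPU usage per layer from `ps` output. It assumes text produced by something like `ps -eo comm,%cpu --no-headers`, that manager daemons have names starting with `wazuh-`, and that the indexer JVM appears as `java`. The sample output is hypothetical.

```python
def cpu_by_layer(ps_output):
    """Sum %CPU per layer from 'command cpu' lines of ps output."""
    totals = {"manager": 0.0, "indexer": 0.0, "other": 0.0}
    for line in ps_output.strip().splitlines():
        parts = line.split()
        if len(parts) < 2:
            continue
        command, cpu = parts[0], parts[-1]
        try:
            value = float(cpu)
        except ValueError:
            continue  # skip a header row or malformed line
        if command.startswith("wazuh-"):
            totals["manager"] += value
        elif command == "java":  # assumption: indexer JVM shows up as "java"
            totals["indexer"] += value
        else:
            totals["other"] += value
    return totals


# Hypothetical snapshot of a single-node deployment.
sample = """\
wazuh-analysisd  92.5
wazuh-remoted    11.0
wazuh-db          4.5
java             38.0
sshd              0.1
"""

totals = cpu_by_layer(sample)
dominant = max(("manager", "indexer"), key=totals.get)
print(totals, "-> dominant layer:", dominant)
```

On a host that runs other Java services, the `java` match would need to be narrowed, for example by filtering on the process owner.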


Analyze Wazuh Logs

Wazuh logs provide direct visibility into bottlenecks and queue saturation.

Key log files

  • /var/ossec/logs/ossec.log → core manager activity and errors
  • /var/ossec/logs/alerts/alerts.json → generated alerts and rule activity

What to look for

  • Repeated warnings along the lines of the following (exact wording varies by daemon and Wazuh version):
    • “Queue is full”
    • “Too many events received”
    • “Analysisd high load”
  • Dropped event messages
  • Frequent decoder failures
  • Sudden spikes in alert generation frequency

A pattern of repeated queue warnings is a strong indicator that CPU is being saturated due to ingestion or rule processing overload.
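
Counting warnings per daemon makes such patterns easier to spot than reading the log by eye. The sketch below assumes the common `YYYY/MM/DD HH:MM:SS daemon: LEVEL: message` line layout of ossec.log. The sample lines and their messages are made up for illustration and are not real Wazuh output.

```python
import re
from collections import Counter

# Assumed line layout: "<date> <time> <daemon>: <LEVEL>: <message>"
LINE_RE = re.compile(
    r"^\S+ \S+ (?P<daemon>[\w:-]+): "
    r"(?P<level>WARNING|ERROR|CRITICAL): (?P<message>.*)$"
)


def summarize(log_text, keyword="queue"):
    """Count WARNING/ERROR/CRITICAL lines per daemon and collect keyword hits."""
    counts = Counter()
    hits = []
    for line in log_text.splitlines():
        match = LINE_RE.match(line)
        if not match:
            continue  # INFO lines and unparseable lines are ignored
        counts[(match["daemon"], match["level"])] += 1
        if keyword in match["message"].lower():
            hits.append(line)
    return counts, hits


# Illustrative lines only; real messages differ.
sample_log = """\
2024/05/01 10:00:01 wazuh-analysisd: INFO: example startup message
2024/05/01 10:00:05 wazuh-remoted: WARNING: example: input queue is full
2024/05/01 10:00:06 wazuh-remoted: WARNING: example: input queue is full
2024/05/01 10:00:09 wazuh-analysisd: ERROR: example decoder failure
"""

counts, queue_hits = summarize(sample_log)
for (daemon, level), n in counts.most_common():
    print(daemon, level, n)
print("queue-related lines:", len(queue_hits))
```

To use it on a real system, read the text from /var/ossec/logs/ossec.log and adjust the keyword to match the messages your version emits.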

Reduce Log Ingestion Load

One of the most effective ways to immediately reduce CPU pressure is to lower event volume before it reaches the manager.

Filter noisy logs at agent level

  • Exclude verbose system logs (debug-level application logs)
  • Limit high-frequency event sources (e.g., auditd, syslog spam)
  • Apply ignore rules in agent configuration

Disable unnecessary modules

Disable modules that are not required in your environment:

  • Rootcheck (if not actively used)
  • Unused FIM directories
  • Excess cloud integrations or collectors

Reducing upstream noise directly reduces analysisd CPU load.

Related internal articles:

  • How to Configure File Integrity Monitoring (FIM) in Wazuh
  • How to Monitor Linux Endpoints Using Wazuh

Optimize Rules and Decoders

Rule and decoder efficiency has a direct impact on CPU usage in the manager layer.

Disable unused rules

  • Audit active rule sets
  • Remove rules that are not relevant to your environment
  • Disable entire rule groups if not needed

Merge duplicate rules

  • Consolidate overlapping detection logic
  • Avoid multiple rules triggering on the same event pattern
  • Reduce redundant regex evaluation

Use rule frequency tuning

  • Apply frequency and timeframe options to limit repetitive triggering
  • Prevent high-volume alerts from repeatedly firing on identical conditions

This reduces both CPU usage and alert noise.
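
The effect of a frequency and timeframe threshold can be pictured as a sliding-window counter. The class below is a simplified illustration of that idea, not Wazuh's actual implementation. The name `FrequencyGate` and its behavior after firing are assumptions made for the example.

```python
from collections import deque


class FrequencyGate:
    """Fire once `frequency` matches are seen within `timeframe` seconds."""

    def __init__(self, frequency, timeframe):
        self.frequency = frequency
        self.timeframe = timeframe
        self.matches = deque()

    def observe(self, timestamp):
        """Record a match at `timestamp` (seconds); return True if the gate fires."""
        self.matches.append(timestamp)
        # Drop matches that have fallen out of the time window.
        while self.matches and timestamp - self.matches[0] > self.timeframe:
            self.matches.popleft()
        if len(self.matches) >= self.frequency:
            self.matches.clear()  # start counting again after firing
            return True
        return False


# Ten identical events, one every 10 seconds.
gate = FrequencyGate(frequency=5, timeframe=60)
fired = [t for t in range(0, 100, 10) if gate.observe(t)]
print(fired)  # [40, 90]
```

In this example ten raw matches produce two alerts instead of ten, which is the kind of reduction frequency tuning aims for.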

Tune Indexer Settings

If the bottleneck is in OpenSearch/Elasticsearch, indexing configuration must be optimized.

Adjust shard size

  • Avoid an excessive number of small shards (high per-shard overhead)
  • Avoid oversized shards (slow queries and merges)
  • Aim for balanced shard distribution per node
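
A simple calculation can turn these guidelines into a starting shard count. The sketch below assumes a target of about 30 GB per primary shard, chosen here as a middle value within the commonly cited range of tens of gigabytes. Treat both the target and the function name as assumptions to adjust for your cluster.

```python
from math import ceil


def primary_shards(index_size_gb, target_shard_gb=30):
    """Suggest a primary shard count for an index of the given size."""
    if index_size_gb <= 0 or target_shard_gb <= 0:
        raise ValueError("sizes must be positive")
    return max(1, ceil(index_size_gb / target_shard_gb))


# Hypothetical daily index sizes in GB.
for size in (5, 90, 100):
    print(f"{size} GB/day -> {primary_shards(size)} primary shard(s)")
```

Small daily indices end up with a single shard under this rule, which avoids the overhead of many tiny shards.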

Increase heap memory (if needed)

  • Size the JVM heap appropriately: a common guideline is about 50% of system RAM, kept below roughly 32 GB so the JVM can keep using compressed object pointers
  • Monitor garbage collection frequency: frequent GC burns CPU without doing useful work
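
The heap guideline can be expressed as a small helper. This is a sketch under stated assumptions: the 31 GB cap is a conservative value picked for the example, and the resulting `-Xms`/`-Xmx` lines would typically go into the indexer's jvm.options file, whose location depends on your installation.

```python
def suggested_heap_gb(total_ram_gb, cap_gb=31):
    """Half of system RAM, capped at `cap_gb` (an assumed conservative limit)."""
    if total_ram_gb < 2:
        raise ValueError("expected at least 2 GB of RAM")
    return min(total_ram_gb // 2, cap_gb)


def jvm_heap_options(heap_gb):
    """Render matching minimum and maximum heap flags for the JVM."""
    return [f"-Xms{heap_gb}g", f"-Xmx{heap_gb}g"]


for ram in (8, 16, 64, 128):
    heap = suggested_heap_gb(ram)
    print(f"{ram} GB RAM ->", " ".join(jvm_heap_options(heap)))
```

Setting the minimum and maximum heap to the same value avoids heap resizing at runtime. The remaining memory is left to the operating system's file cache.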

Reduce indexing refresh rate

  • Increase refresh_interval to reduce indexing overhead
  • Batch indexing where possible to reduce CPU spikes
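
The refresh interval is an index-level setting that can be changed through the `_settings` API. The sketch below only builds the HTTP request and does not send it. The base URL, the `wazuh-alerts-*` index pattern, and the 30-second value are assumptions for illustration, and authentication and TLS handling are deliberately left out.

```python
import json
from urllib.request import Request


def refresh_interval_request(base_url, index_pattern, interval):
    """Build a PUT request that sets index.refresh_interval on matching indices."""
    body = json.dumps({"index": {"refresh_interval": interval}}).encode("utf-8")
    return Request(
        url=f"{base_url}/{index_pattern}/_settings",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


# Hypothetical endpoint and value; nothing is sent over the network here.
req = refresh_interval_request("https://localhost:9200", "wazuh-alerts-*", "30s")
print(req.get_method(), req.full_url)
print(req.data.decode())
```

A longer interval means new events take longer to become searchable, so pick a value that your alerting and investigation workflows can tolerate.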

Performance Optimization Best Practices

Once immediate CPU issues are stabilized, long-term optimizations help prevent recurrence.

Enable log throttling

  • Limit repetitive event ingestion
  • Prevent burst traffic from overwhelming the analysis pipeline

Use centralized filtering strategies

  • Filter logs at the ingestion layer rather than at the manager
  • Standardize syslog filtering across all agents
  • Apply consistent log severity thresholds

Optimize agent configurations

  • Reduce unnecessary FIM monitoring paths
  • Disable unused integrations per endpoint type
  • Tune log collection frequency per environment role (server vs workstation)

Scale Wazuh horizontally (multi-node setup)

  • Split roles across multiple nodes:
    • Manager nodes
    • Indexer nodes
    • Dashboard nodes
  • Distribute ingestion load to avoid single-node CPU saturation

This is especially important in environments with sustained high EPS (events per second).

Regular performance audits

  • Monitor CPU trends over time
  • Review rule efficiency quarterly
  • Analyze ingestion growth patterns
  • Benchmark system under peak load conditions


Advanced Debugging Techniques

For persistent or complex CPU issues, deeper system-level diagnostics are required.

Enable debug logging in Wazuh manager

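  • Increase log verbosity through the internal options (for example, analysisd.debug in /var/ossec/etc/local_internal_options.conf), restart the manager, and revert the setting afterwards because debug logging adds load of its own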
  • Helps identify rule processing delays and queue bottlenecks
  • Useful for pinpointing inefficient decoders or rules

Use performance profiling tools

  • pidstat → CPU usage per thread over time
  • perf top → live view of the hottest functions, in both user space and the kernel
  • strace → system call tracing for bottleneck detection

These tools help determine whether CPU usage is driven by:

  • User-space rule evaluation
  • Kernel I/O waits
  • Indexing or disk bottlenecks

Monitor queue metrics in real time

Key areas:

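  • Event queue usage and dropped-event counters (for example, the statistics the manager writes to /var/ossec/var/run/wazuh-analysisd.state and wazuh-remoted.state)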
  • Agent buffer backlog
  • Analysisd processing lag

A continuously growing queue is a direct indicator that processing capacity is below ingestion rate.
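
One way to watch queue pressure is to parse the statistics file that wazuh-analysisd writes, typically /var/ossec/var/run/wazuh-analysisd.state. The sketch below assumes a plain `key='value'` layout with `#` comments and field names ending in `_queue_usage`. Both are assumptions based on the format seen in some releases, so check them against the file on your own manager. The sample text is hypothetical.

```python
def parse_state(text):
    """Parse key='value' lines, ignoring comments and blank lines."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, raw = line.partition("=")
        raw = raw.strip().strip("'\"")
        try:
            values[key.strip()] = float(raw)
        except ValueError:
            values[key.strip()] = raw  # keep non-numeric values as text
    return values


def saturated_queues(state, threshold=0.8):
    """Return queue names whose usage ratio is at or above the threshold."""
    return sorted(
        key for key, value in state.items()
        if key.endswith("_queue_usage")
        and isinstance(value, float)
        and value >= threshold
    )


# Hypothetical excerpt; real files contain many more counters.
sample_state = """\
# Hypothetical excerpt
events_received='120000'
events_dropped='350'
event_queue_usage='0.93'
syscheck_queue_usage='0.10'
"""

state = parse_state(sample_state)
print("dropped:", state["events_dropped"])
print("saturated:", saturated_queues(state))
```

Sampling the file at a fixed interval and comparing successive values shows whether usage is trending upward or only spiking briefly.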

Trace rule execution timing

  • Identify slow rules using debug logs
  • Detect regex-heavy rules causing CPU spikes
  • Reorder or disable inefficient rules based on execution cost

This level of tracing is often necessary in large-scale deployments where rule complexity becomes the primary performance limiter.


When to Scale Your Wazuh Deployment

Scaling becomes necessary when optimization alone can no longer stabilize CPU usage or ingestion throughput.

At this point, the issue is no longer configuration efficiency—it is architectural capacity.

CPU consistently above threshold (>80–90%)

Sustained high CPU utilization on the manager or indexer nodes indicates that the system is operating at or beyond its designed processing capacity.

Key signals:

  • analysisd or OpenSearch processes consistently pegged near max CPU
  • No improvement after rule tuning or log filtering
  • Increased event processing latency even under normal load

At this stage, additional tuning yields diminishing returns.
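
To separate sustained saturation from short spikes, a check on consecutive samples is often enough. The sketch below is a minimal example; the 85% threshold and the run length of five samples are assumptions to tune, and the sample series are hypothetical.

```python
def sustained_above(samples, threshold=85.0, min_consecutive=5):
    """Return True if CPU stays at or above `threshold` for a long enough run."""
    run = 0
    for value in samples:
        run = run + 1 if value >= threshold else 0
        if run >= min_consecutive:
            return True
    return False


# Hypothetical one-minute CPU samples (percent).
spiky = [20, 95, 30, 96, 25, 97, 22]
pinned = [88, 90, 91, 89, 93, 92]

print("spiky :", sustained_above(spiky))
print("pinned:", sustained_above(pinned))
```

Only the second series counts as sustained, which is the pattern that suggests scaling rather than further tuning.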

High event ingestion rates

A rapid increase in EPS (events per second) is one of the strongest indicators that scaling is required.

Common triggers:

  • New logging sources (cloud integrations, Kubernetes clusters)
  • Increased audit verbosity across endpoints
  • Security incidents generating burst telemetry

When ingestion grows faster than processing capacity, CPU saturation becomes unavoidable without horizontal scaling.

Growing number of endpoints

As agent count increases:

  • Rule evaluation workload scales linearly (or worse, depending on rule complexity)
  • Log aggregation pressure increases on the manager
  • Queue depth grows under peak traffic

Large environments require:

  • Multi-manager deployments
  • Load-balanced agent distribution
  • Dedicated indexer clusters

Indexer unable to keep up with ingestion

When the indexer becomes the bottleneck:

  • Indexing latency increases
  • CPU usage remains high even during idle periods
  • Shard reallocation or GC cycles dominate processing time

This typically indicates the need for:

  • Additional indexer nodes
  • Better shard distribution
  • Increased hardware resources per node


Frequently Asked Questions (FAQ)

Question: Why is Wazuh using so much CPU?

High CPU usage in Wazuh is typically caused by excessive log ingestion, inefficient rule evaluation, or indexer bottlenecks.

The most common root cause is unfiltered high-volume telemetry overwhelming the analysisd process.

Question: Which Wazuh process consumes the most CPU?

In most deployments:

  • Manager layer: wazuh-analysisd is the primary CPU consumer
  • Indexer layer: OpenSearch/Elasticsearch JVM process dominates CPU usage

The exact bottleneck depends on whether the system is rule-bound or indexing-bound.

Question: Can reducing rules improve performance?

Yes. Reducing active rules directly lowers CPU consumption in analysisd because fewer evaluations are performed per event.

Best practices:

  • Disable unused rulesets
  • Remove redundant detection logic
  • Avoid overly complex regex patterns

Question: Does increasing memory reduce CPU usage?

Not directly.

Increasing memory may:

  • Reduce garbage collection pressure on the indexer
  • Improve caching efficiency

However, CPU usage is primarily driven by:

  • Rule evaluation complexity
  • Event volume
  • Indexing workload

So memory tuning helps indirectly, not as a primary fix.

Question: How do I monitor Wazuh performance effectively?

Effective monitoring requires visibility across all layers:

  • System tools: top, htop, pidstat
  • Wazuh logs: /var/ossec/logs/ossec.log
  • Indexer metrics: JVM heap, GC activity, shard health
  • Queue monitoring: event backlog and processing delays

A strong approach is correlating:

  • CPU spikes
  • EPS (event ingestion rate)
  • Queue depth
  • Alert latency


Conclusion

High CPU usage in Wazuh is rarely caused by a single factor.

It is usually the result of compounding pressure across ingestion, rule evaluation, and indexing layers.

Recap main causes of high CPU usage

The most common contributors include:

  • Excessive log volume without filtering
  • Inefficient or overloaded rule sets
  • Manager-side bottlenecks in analysisd
  • Indexer pressure from shard or heap misconfiguration
  • Misconfigured or overly verbose agents

Importance of tuning and monitoring

Sustainable Wazuh performance depends on continuous tuning:

  • Reducing noise at the source (agents)
  • Optimizing detection logic (rules/decoders)
  • Ensuring indexing efficiency (OpenSearch tuning)
  • Monitoring system health proactively rather than reactively

Without ongoing observability, CPU issues tend to reappear as environments scale.

Recommendation: proactive optimization over reactive troubleshooting

Instead of waiting for CPU spikes to impact alerting or system stability, organizations should:

  • Establish baseline performance metrics
  • Continuously audit rule and log efficiency
  • Scale architecture before saturation occurs

A properly tuned Wazuh deployment is not just about preventing CPU spikes—it is about maintaining predictable detection performance under evolving security workloads.
