Wazuh is a distributed security monitoring system built on a modular architecture composed of agents, manager, indexer, and dashboard components.
At a high level:
- Agents collect telemetry from endpoints (logs, FIM events, system activity).
- Wazuh Manager processes incoming events, applies decoders, evaluates rules, and triggers alerts.
- Indexer (OpenSearch/Elasticsearch-based) stores and indexes security events for search and correlation.
- Dashboard provides visualization and investigation capabilities.
This architecture is powerful, but inherently resource-intensive.
In SIEM/XDR systems like Wazuh, CPU spikes are expected under certain workloads, especially during:
- High log ingestion bursts
- Rule evaluation surges
- Indexing backpressure in OpenSearch/Elasticsearch
In line with NIST's continuous monitoring guidance, it helps to treat security telemetry pipelines as high-throughput analytical systems whose compute demand fluctuates significantly under real-time detection workloads.
Symptoms of High CPU Usage
When CPU saturation occurs in a Wazuh environment, common symptoms include:
- Delayed or missing alerts in the dashboard
- Increased event processing latency on the manager
- Slow query performance in the dashboard
- Agent communication lag or queue buildup
- System-wide performance degradation on the Wazuh node(s)
In production deployments, these symptoms often indicate bottlenecks in either rule processing (manager layer) or indexing throughput (storage layer).
How to Identify High CPU Usage in Wazuh
Before fixing CPU issues, you need to pinpoint which component is responsible.
Using top, htop, and System Monitoring Tools
Start with standard Linux observability tools:
- top / htop → Identify processes consuming CPU in real time
- pidstat → Break down CPU usage per process thread
- vmstat → Detect CPU run queue pressure
- iostat → Check if CPU spikes correlate with disk I/O saturation
Focus on:
- wazuh-analysisd
- wazuh-remoted
- wazuh-db
- filebeat (if used in your pipeline)
- OpenSearch/Elasticsearch JVM processes
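As a rough illustration of the check above, the following Python sketch sums CPU usage per watched process name from `ps` output. The watched names come from the list above; matching the indexer JVM by the command name `java` is an assumption, and the sample text in the demo is made up for illustration.

```python
import subprocess
from collections import defaultdict

# Names taken from the list above. "java" for the indexer JVM is an assumption.
WATCHED = ("wazuh-analysisd", "wazuh-remoted", "wazuh-db", "filebeat", "java")


def cpu_by_process(ps_output, watched=WATCHED):
    """Sum %CPU per watched command name from `ps -eo comm,pcpu` output."""
    totals = defaultdict(float)
    for line in ps_output.splitlines()[1:]:  # skip the header row
        parts = line.rsplit(None, 1)  # %CPU is the last column
        if len(parts) != 2:
            continue
        name, cpu = parts[0].strip(), parts[1]
        if name not in watched:
            continue
        try:
            totals[name] += float(cpu)
        except ValueError:
            continue  # ignore rows whose CPU column is not numeric
    return dict(totals)


def snapshot():
    """Run ps on the local host and aggregate the result (not called in the demo)."""
    out = subprocess.run(
        ["ps", "-eo", "comm,pcpu"], capture_output=True, text=True, check=True
    ).stdout
    return cpu_by_process(out)


# Demo on made-up sample text, so the sketch runs without a Wazuh host.
SAMPLE = (
    "COMMAND         %CPU\n"
    "wazuh-analysisd 42.5\n"
    "java            30.0\n"
    "java             5.5\n"
    "bash             0.1\n"
)
for name, cpu in sorted(cpu_by_process(SAMPLE).items(), key=lambda kv: -kv[1]):
    print(f"{name:20s} {cpu:6.1f}%")
```

Note that `ps` reports CPU averaged over each process's lifetime, so this is a coarse first pass; top or pidstat remain the better tools for short spikes.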
Checking Wazuh Manager Processes
On the manager node, the most CPU-heavy components are typically:
- analysisd → rule evaluation engine (most common culprit)
- logcollector → log collection (normalization itself happens in the analysisd decoders)
- wazuh-db → state tracking and integrity data handling
Sustained high CPU usage in analysisd usually indicates:
- Too many active rules
- High event throughput
- Inefficient decoding patterns
Indexer and OpenSearch CPU Impact
If CPU spikes originate from the indexer layer:
- OpenSearch/Elasticsearch JVM heap pressure may be high
- Garbage collection cycles increase CPU consumption
- Shard rebalancing or indexing bursts may overload CPU
Elastic’s performance documentation notes that indexing throughput is tightly coupled with heap sizing and shard strategy, and misconfiguration can lead to CPU saturation during ingestion spikes.
Correlating Spikes with Log Ingestion Rates
To confirm root cause:
- Compare CPU spikes with log ingestion rate (EPS: events per second)
- Check if spikes align with agent deployment changes or traffic surges
- Review queue metrics in Wazuh manager logs
A sudden increase in EPS without filtering is one of the most common triggers of CPU saturation.
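One way to approximate this correlation is to bucket alerts by minute and compare the busiest minutes with the CPU spike timestamps. The sketch below assumes alerts are stored one JSON object per line with an ISO-8601 `timestamp` field, as in alerts.json; because only events that generated an alert appear there, the counts are a lower-bound proxy for EPS rather than a true ingestion rate. The sample lines are made up.

```python
import json
from collections import Counter


def alerts_per_minute(lines):
    """Count alerts per minute from JSON-lines input (one alert per line)."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            alert = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip partially written or corrupt lines
        ts = alert.get("timestamp", "") if isinstance(alert, dict) else ""
        if isinstance(ts, str) and len(ts) >= 16:
            counts[ts[:16]] += 1  # bucket key: "YYYY-MM-DDTHH:MM"
    return counts


# Demo on made-up alert lines.
SAMPLE = [
    '{"timestamp": "2024-05-01T10:15:01.000+0000", "rule": {"id": "1001"}}',
    '{"timestamp": "2024-05-01T10:15:40.000+0000", "rule": {"id": "1002"}}',
    '{"timestamp": "2024-05-01T10:16:05.000+0000", "rule": {"id": "1001"}}',
    "not json",
]
for minute, count in alerts_per_minute(SAMPLE).most_common(5):
    print(minute, count)
```

Against a real manager, the same function can be fed an open file handle, for example `open("/var/ossec/logs/alerts/alerts.json")`.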
Most Common Causes of High CPU Usage
Excessive Log Volume
High log volume is the primary driver of CPU overload in Wazuh environments.
Typical causes:
- No log filtering at the agent level
- High-frequency system logs (auditd, syslog, application debug logs)
- Misconfigured syslog ingestion pipelines flooding the manager
When every event is forwarded without filtering, the manager is forced to:
- Decode each event
- Apply rule matching
- Evaluate correlation logic
Because each of these steps runs for every event, CPU usage grows at least in proportion to event volume, and faster once queues start to back up.
Rule Overload and Inefficient Decoders
Wazuh performance heavily depends on rule and decoder efficiency.
Common issues:
- Too many active rules (especially unused or redundant rules)
- Complex regex patterns in decoders
- Duplicate rule evaluation across multiple rule groups
Each event may trigger dozens or even hundreds of rule evaluations, significantly increasing CPU usage in analysisd.
Wazuh Manager Bottlenecks
The manager layer is often the first point of failure under load.
Key bottleneck patterns:
- analysisd saturation → CPU maxed due to rule evaluation backlog
- Queue backlog issues → events waiting in processing queues
- Thread contention → multiple workers competing for CPU cycles
When queues fill up, latency increases and CPU remains pinned at high utilization as the system tries to catch up.
Indexer / OpenSearch Pressure
The indexing layer can silently become the CPU bottleneck.
Typical issues include:
- Heavy indexing load from high event ingestion rates
- Poor shard configuration (too many or too large shards)
- Insufficient JVM heap allocation causing frequent garbage collection cycles
OpenSearch documentation highlights that improper shard sizing can significantly degrade indexing throughput and increase CPU consumption due to coordination overhead.
Agent Misconfiguration
Poor endpoint configuration can push unnecessary load upstream.
Common misconfigurations:
- Over-reporting agents sending verbose logs
- File Integrity Monitoring (FIM) scanning too frequently
- Audit and rootcheck modules enabled with overly aggressive policies
This leads to:
- High event volume at the source
- Amplified processing load on the manager
- Increased indexing pressure downstream
Related references:
- How to Install a Wazuh Agent on Windows Server
- How to Monitor Windows Event Logs Using Wazuh
- How to Monitor Failed SSH Login Attempts Using Wazuh
Step-by-Step Troubleshooting Guide
This section focuses on isolating whether CPU pressure originates from the Wazuh manager, indexer, or ingestion pipeline, and then progressively reducing load in a controlled manner.
Check System Resource Usage
Start by establishing a baseline of system utilization.
Identify top CPU-consuming processes
Run standard Linux profiling tools:
- top / htop → quick real-time view of CPU-heavy processes
- pidstat -u 1 → per-process CPU usage over time
- ps -eo pid,ppid,cmd,%cpu,%mem --sort=-%cpu | head → snapshot of top consumers
Focus on:
- wazuh-analysisd
- wazuh-remoted
- wazuh-db
- OpenSearch / Elasticsearch Java process
- Filebeat (if used in ingestion pipeline)
Validate whether issue is manager vs indexer
A key distinction:
- Manager CPU spike
  - High analysisd or logcollector CPU usage
  - High rule evaluation latency
  - Increased queue size in /var/ossec/queue/
- Indexer CPU spike
  - High JVM CPU usage
  - Frequent garbage collection cycles
  - Slow indexing or shard reallocation
This separation is critical because tuning strategies differ significantly between layers.
Analyze Wazuh Logs
Wazuh logs provide direct visibility into bottlenecks and queue saturation.
Key log files
- /var/ossec/logs/ossec.log → core manager activity and errors
- /var/ossec/logs/alerts/alerts.json → generated alerts and rule activity
What to look for
- Repeated warnings along these lines (exact wording varies by version):
  - “Queue is full”
  - “Too many events received”
  - “Analysisd high load”
- Dropped event messages
- Frequent decoder failures
- Sudden spikes in alert generation frequency
A pattern of repeated queue warnings is a strong indicator that CPU is being saturated due to ingestion or rule processing overload.
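To make such patterns easier to spot, a small script can count how often warning keywords appear in the log. The keyword patterns below are hypothetical placeholders, not an authoritative list of Wazuh messages, and the sample lines are made up; align both with what your version actually writes to ossec.log.

```python
import re
from collections import Counter

# Hypothetical keyword patterns; adjust to the messages your version logs.
PATTERNS = {
    "queue_full": re.compile(r"queue is full", re.IGNORECASE),
    "dropped": re.compile(r"\bdropp(?:ed|ing)\b", re.IGNORECASE),
    "flooding": re.compile(r"flood", re.IGNORECASE),
}


def count_warnings(lines, patterns=PATTERNS):
    """Count lines matching each labelled pattern (a line may match several)."""
    hits = Counter()
    for line in lines:
        for label, rx in patterns.items():
            if rx.search(line):
                hits[label] += 1
    return hits


# Made-up sample lines for the demo; these are not real Wazuh output.
SAMPLE = [
    "2024/05/01 10:15:00 example-daemon: WARNING: Queue is full",
    "2024/05/01 10:15:01 example-daemon: WARNING: Queue is full",
    "2024/05/01 10:15:02 example-daemon: WARNING: event dropped",
    "2024/05/01 10:15:03 example-daemon: INFO: started",
]
for label, count in count_warnings(SAMPLE).most_common():
    print(label, count)
```

Running the counter over successive time windows shows whether warnings are isolated or recurring, which is the distinction that matters here.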
Reduce Log Ingestion Load
One of the most effective ways to immediately reduce CPU pressure is to lower event volume before it reaches the manager.
Filter noisy logs at agent level
- Exclude verbose system logs (debug-level application logs)
- Limit high-frequency event sources (e.g., auditd, syslog spam)
- Apply ignore rules in the agent configuration
Disable unnecessary modules
Disable modules that are not required in your environment:
- Rootcheck (if not actively used)
- Unused FIM directories
- Excess cloud integrations or collectors
Reducing upstream noise directly reduces analysisd CPU load.
Related internal articles:
- How to Configure File Integrity Monitoring (FIM) in Wazuh
- How to Monitor Linux Endpoints Using Wazuh
Optimize Rules and Decoders
Rule and decoder efficiency has a direct impact on CPU usage in the manager layer.
Disable unused rules
- Audit active rule sets
- Remove rules that are not relevant to your environment
- Disable entire rule groups if not needed
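A rule audit can start from data the manager already produces. The sketch below counts how often each rule ID appears in alert output, assuming one JSON alert per line with a `rule.id` field, as in alerts.json. Rules that dominate the counts are candidates for tuning. Rules that never appear over a long window are candidates for review, with the caveat that rules below the alert threshold or evaluated without matching are not visible this way. The sample lines are made up.

```python
import json
from collections import Counter


def rule_hit_counts(lines):
    """Count alerts per rule ID from JSON-lines input."""
    counts = Counter()
    for line in lines:
        try:
            alert = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip blank, truncated, or corrupt lines
        rule = alert.get("rule") if isinstance(alert, dict) else None
        if isinstance(rule, dict) and "id" in rule:
            counts[str(rule["id"])] += 1
    return counts


# Demo on made-up alert lines.
SAMPLE = [
    '{"rule": {"id": "1001", "level": 3}}',
    '{"rule": {"id": "1001", "level": 3}}',
    '{"rule": {"id": "2002", "level": 7}}',
    "",
]
for rule_id, hits in rule_hit_counts(SAMPLE).most_common(10):
    print(rule_id, hits)
```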
Merge duplicate rules
- Consolidate overlapping detection logic
- Avoid multiple rules triggering on the same event pattern
- Reduce redundant regex evaluation
Use rule frequency tuning
- Apply frequency and timeframe options to limit repetitive triggering
- Prevent high-volume alerts from repeatedly firing on identical conditions
This reduces both CPU usage and alert noise.
Tune Indexer Settings
If the bottleneck is in OpenSearch/Elasticsearch, indexing configuration must be optimized.
Adjust shard size
- Avoid excessive small shards (high overhead)
- Avoid oversized shards (slow queries and merges)
- Aim for balanced shard distribution per node
Increase heap memory (if needed)
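- Ensure JVM heap is appropriately sized (commonly about 50% of system RAM, and kept below roughly 32 GB so the JVM can keep using compressed object pointers)
- Monitor garbage collection frequency: frequent GC cycles burn CPU without doing useful indexing work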
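The sizing rule of thumb can be written down as a small calculation. This is a sketch of the common guideline only (half of RAM, capped just under 32 GB); the 31 GB cap and the function names are illustrative assumptions, and the right value for a given node still depends on its workload.

```python
def recommended_heap_gb(total_ram_gb, cap_gb=31):
    """Half of system RAM in whole GB, at least 1 and capped at cap_gb."""
    if total_ram_gb <= 0:
        raise ValueError("total_ram_gb must be positive")
    return max(1, min(total_ram_gb // 2, cap_gb))


def jvm_heap_flags(total_ram_gb):
    """Return matching -Xms/-Xmx flags so the heap is not resized at runtime."""
    heap = recommended_heap_gb(total_ram_gb)
    return [f"-Xms{heap}g", f"-Xmx{heap}g"]


for ram in (8, 16, 64, 128):
    print(ram, "GB RAM ->", " ".join(jvm_heap_flags(ram)))
```

Setting the minimum and maximum heap to the same value is the usual practice for search nodes, since it avoids pauses caused by heap resizing.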
Reduce indexing refresh rate
- Increase refresh_interval to reduce indexing overhead
- Batch indexing where possible to reduce CPU spikes
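As an illustration of the refresh setting, the sketch below builds (but does not send) an index settings update. The base URL, the `wazuh-alerts-*` index pattern, and the 30s value are assumptions for the example; authentication and TLS handling are omitted and would be needed against a real cluster.

```python
import json
import urllib.request


def refresh_interval_request(base_url, index_pattern, interval="30s"):
    """Build a PUT <index>/_settings request that changes refresh_interval."""
    body = json.dumps({"index": {"refresh_interval": interval}}).encode("utf-8")
    url = f"{base_url.rstrip('/')}/{index_pattern}/_settings"
    return urllib.request.Request(
        url,
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


# Host and index pattern are placeholders; nothing is sent in this demo.
req = refresh_interval_request("https://localhost:9200", "wazuh-alerts-*")
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
```

A change made this way applies to indices that already exist; indices created later would typically need the same setting in their index template. A longer refresh interval also means new events take longer to become searchable, so the value is a trade-off rather than a free win.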
Performance Optimization Best Practices
Once immediate CPU issues are stabilized, long-term optimizations help prevent recurrence.
Enable log throttling
- Limit repetitive event ingestion
- Prevent burst traffic from overwhelming analysis pipeline
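To see why throttling protects the analysis pipeline, it can help to model a bounded buffer that drains at a fixed rate, similar in spirit to the agent-side client buffer with its queue size and events-per-second limit. The simulation below is a simplified model with made-up numbers, not a description of how Wazuh implements its buffer.

```python
def simulate_buffer(arrivals, queue_size, drain_per_second):
    """Simulate a bounded queue drained at a fixed rate.

    arrivals: number of events arriving in each one-second tick.
    Returns (forwarded, dropped, peak_queue_depth).
    """
    queued = forwarded = dropped = peak = 0
    for incoming in arrivals:
        accepted = min(incoming, queue_size - queued)  # fill remaining space
        dropped += incoming - accepted                 # overflow is discarded
        queued += accepted
        peak = max(peak, queued)
        sent = min(queued, drain_per_second)           # rate-limited forwarding
        queued -= sent
        forwarded += sent
    return forwarded, dropped, peak


# A two-second burst against a 1000-event buffer drained at 500 events/s.
burst = [100, 2000, 2000, 50, 0, 0]
forwarded, dropped, peak = simulate_buffer(burst, queue_size=1000, drain_per_second=500)
print("forwarded:", forwarded, "dropped:", dropped, "peak queue:", peak)
```

The downstream component never sees more than the drain rate, so a burst costs dropped or delayed events at the edge instead of CPU saturation on the manager. Whether that trade is acceptable depends on how important the throttled source is.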
Use centralized filtering strategies
- Filter logs at ingestion layer rather than manager
- Standardize syslog filtering across all agents
- Apply consistent log severity thresholds
Optimize agent configurations
- Reduce unnecessary FIM monitoring paths
- Disable unused integrations per endpoint type
- Tune log collection frequency per environment role (server vs workstation)
Scale Wazuh horizontally (multi-node setup)
- Split roles across multiple nodes:
  - Manager nodes
  - Indexer nodes
  - Dashboard nodes
- Distribute ingestion load to avoid single-node CPU saturation
This is especially important in environments with a sustained high EPS (events per second) rate.
Regular performance audits
- Monitor CPU trends over time
- Review rule efficiency quarterly
- Analyze ingestion growth patterns
- Benchmark system under peak load conditions
Advanced Debugging Techniques
For persistent or complex CPU issues, deeper system-level diagnostics are required.
Enable debug logging in Wazuh manager
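- Increase log verbosity (debug levels are normally set in /var/ossec/etc/local_internal_options.conf, for example analysisd.debug, rather than in ossec.conf)
- Helps identify rule processing delays and queue bottlenecks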
- Useful for pinpointing inefficient decoders or rules
Use performance profiling tools
- pidstat → CPU usage per thread over time
- perf top → function-level CPU hotspots in user and kernel space
- strace → system call tracing for bottleneck detection
These tools help determine whether CPU usage is driven by:
- User-space rule evaluation
- Kernel I/O waits
- Indexing or disk bottlenecks
Monitor queue metrics in real time
Key areas:
- Event queue depth (queue/fts/, queue/rids/)
- Agent buffer backlog
- Analysisd processing lag
A continuously growing queue is a direct indicator that processing capacity is below ingestion rate.
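Queue pressure can also be read from daemon statistics where they are available. The sketch below parses a simple `key='value'` statistics file and flags queues above a usage threshold. It assumes the manager exposes such a file (for example an analysisd state file under /var/ossec/var/run/), that queue usage keys end in `_queue_usage`, and that usage is reported as a fraction between 0 and 1; the key names in the demo are hypothetical, so check them against your version.

```python
import re

# Matches lines such as: some_key='0.93'   (quotes optional)
_LINE = re.compile(r"^\s*([A-Za-z_][A-Za-z0-9_]*)\s*=\s*'?([^'\n]*)'?\s*$")


def parse_state(text):
    """Parse key='value' lines into a dict, converting numeric values to float."""
    values = {}
    for line in text.splitlines():
        if line.lstrip().startswith("#"):
            continue  # comment line
        match = _LINE.match(line)
        if not match:
            continue
        key, raw = match.groups()
        try:
            values[key] = float(raw)
        except ValueError:
            values[key] = raw
    return values


def saturated_queues(values, threshold=0.8):
    """Return the names of queue-usage keys at or above the threshold."""
    return sorted(
        key for key, value in values.items()
        if key.endswith("_queue_usage") and isinstance(value, float) and value >= threshold
    )


# Made-up sample content with hypothetical key names.
SAMPLE = """# example statistics
event_queue_usage='0.93'
alert_queue_usage='0.10'
events_dropped='12'
"""
state = parse_state(SAMPLE)
print("saturated:", saturated_queues(state))
print("dropped:", state.get("events_dropped"))
```

Sampling the file at intervals and comparing successive readings shows whether a queue is draining or steadily filling.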
Trace rule execution timing
- Identify slow rules using debug logs
- Detect regex-heavy rules causing CPU spikes
- Reorder or disable inefficient rules based on execution cost
This level of tracing is often necessary in large-scale deployments where rule complexity becomes the primary performance limiter.
When to Scale Your Wazuh Deployment
Scaling becomes necessary when optimization alone can no longer stabilize CPU usage or ingestion throughput.
At this point, the issue is no longer configuration efficiency—it is architectural capacity.
CPU consistently above threshold (>80–90%)
Sustained high CPU utilization on the manager or indexer nodes indicates that the system is operating at or beyond its designed processing capacity.
Key signals:
- analysisd or OpenSearch processes consistently pegged near max CPU
- No improvement after rule tuning or log filtering
- Increased event processing latency even under normal load
At this stage, additional tuning yields diminishing returns.
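The difference between a short spike and sustained saturation can be checked mechanically. The sketch below takes a series of CPU utilization samples (in percent, collected however you prefer) and reports whether most of them sit above a threshold. The 85% threshold and 90% fraction are illustrative defaults, not recommended values.

```python
def sustained_high_cpu(samples, threshold=85.0, min_fraction=0.9):
    """True if at least min_fraction of the samples are at or above threshold."""
    if not samples:
        return False
    above = sum(1 for sample in samples if sample >= threshold)
    return above / len(samples) >= min_fraction


# Made-up sample series: a brief spike versus a pegged CPU.
brief_spike = [20.0] * 18 + [99.0, 99.0]
pegged = [92.0] * 19 + [50.0]
print("brief spike sustained?", sustained_high_cpu(brief_spike))
print("pegged sustained?", sustained_high_cpu(pegged))
```

A node that only trips this check during known bursts is usually a tuning problem; one that trips it during normal load is the scaling signal described in this section.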
High event ingestion rates
A rapid increase in EPS (events per second) is one of the strongest indicators that scaling is required.
Common triggers:
- New logging sources (cloud integrations, Kubernetes clusters)
- Increased audit verbosity across endpoints
- Security incidents generating burst telemetry
When ingestion grows faster than processing capacity, CPU saturation becomes unavoidable without horizontal scaling.
Growing number of endpoints
As agent count increases:
- Rule evaluation workload scales linearly (or worse, depending on rule complexity)
- Log aggregation pressure increases on the manager
- Queue depth grows under peak traffic
Large environments require:
- Multi-manager deployments
- Load-balanced agent distribution
- Dedicated indexer clusters
Indexer unable to keep up with ingestion
When the indexer becomes the bottleneck:
- Indexing latency increases
- CPU usage remains high even during idle periods
- Shard reallocation or GC cycles dominate processing time
This typically indicates the need for:
- Additional indexer nodes
- Better shard distribution
- Increased hardware resources per node
Frequently Asked Questions (FAQ)
Question: Why is Wazuh using so much CPU?
High CPU usage in Wazuh is typically caused by excessive log ingestion, inefficient rule evaluation, or indexer bottlenecks.
The most common root cause is unfiltered high-volume telemetry overwhelming the analysisd process.
Question: Which Wazuh process consumes the most CPU?
In most deployments:
- Manager layer: wazuh-analysisd is the primary CPU consumer
- Indexer layer: OpenSearch/Elasticsearch JVM process dominates CPU usage
The exact bottleneck depends on whether the system is rule-bound or indexing-bound.
Question: Can reducing rules improve performance?
Yes. Reducing active rules directly lowers CPU consumption in analysisd because fewer evaluations are performed per event.
Best practices:
- Disable unused rulesets
- Remove redundant detection logic
- Avoid overly complex regex patterns
Question: Does increasing memory reduce CPU usage?
Not directly.
Increasing memory may:
- Reduce garbage collection pressure on the indexer
- Improve caching efficiency
However, CPU usage is primarily driven by:
- Rule evaluation complexity
- Event volume
- Indexing workload
So memory tuning helps indirectly, not as a primary fix.
Question: How do I monitor Wazuh performance effectively?
Effective monitoring requires visibility across all layers:
- System tools: top, htop, pidstat
- Wazuh logs: /var/ossec/logs/ossec.log
- Indexer metrics: JVM heap, GC activity, shard health
- Queue monitoring: event backlog and processing delays
A strong approach is correlating:
- CPU spikes
- EPS (event ingestion rate)
- Queue depth
- Alert latency
Conclusion
High CPU usage in Wazuh is rarely caused by a single factor.
It is usually the result of compounding pressure across ingestion, rule evaluation, and indexing layers.
Recap main causes of high CPU usage
The most common contributors include:
- Excessive log volume without filtering
- Inefficient or overloaded rule sets
- Manager-side bottlenecks in analysisd
- Indexer pressure from shard or heap misconfiguration
- Misconfigured or overly verbose agents
Importance of tuning and monitoring
Sustainable Wazuh performance depends on continuous tuning:
- Reducing noise at the source (agents)
- Optimizing detection logic (rules/decoders)
- Ensuring indexing efficiency (OpenSearch tuning)
- Monitoring system health proactively rather than reactively
Without ongoing observability, CPU issues tend to reappear as environments scale.
Recommendation: proactive optimization over reactive troubleshooting
Instead of waiting for CPU spikes to impact alerting or system stability, organizations should:
- Establish baseline performance metrics
- Continuously audit rule and log efficiency
- Scale architecture before saturation occurs
Internal reference cluster for ongoing optimization:
- Wazuh vs Splunk
- Wazuh vs Graylog
- Wazuh vs OSSIM
A properly tuned Wazuh deployment is not just about preventing CPU spikes—it is about maintaining predictable detection performance under evolving security workloads.
