As your security environment grows, Wazuh can quickly begin processing millions of events every day.
Endpoint telemetry, system logs, cloud events, file integrity monitoring, vulnerability scans, and custom detection rules all compete for CPU, memory, storage, and indexing resources.
Without proper tuning, even a well-designed deployment can suffer from high resource utilization, delayed alerts, slow dashboards, dropped events, and excessive false positives.
A variety of factors can negatively impact performance, including excessive log collection, inefficient custom rules, oversized File Integrity Monitoring (FIM) configurations, insufficient OpenSearch heap memory, overloaded managers, poor storage performance, and unnecessary event duplication.
As environments scale, these issues compound and can significantly reduce both detection speed and operational visibility.
Several major components influence overall Wazuh performance:
- Wazuh agents
- Logcollector
- Syscollector
- Syscheck (File Integrity Monitoring)
- Rootcheck
- Active Response
- Wazuh Manager
- OpenSearch Indexer
- Wazuh Dashboard
- Storage subsystem
- Network bandwidth
- Detection rules
- Decoders
- Index lifecycle management
Each component affects overall system performance differently, so optimize end to end rather than focusing on a single bottleneck.
In this guide, you’ll learn how Wazuh processes security data, where performance bottlenecks typically occur, which configuration settings have the greatest impact, and how to optimize every major component of the platform.
You’ll also learn practical techniques for scaling Wazuh, reducing unnecessary workload, improving indexing performance, minimizing false positives, and building a faster, more stable deployment for enterprise environments.
The performance of a Wazuh deployment is closely tied to overall monitoring architecture.
See The Complete Wazuh Monitoring Guide to understand how monitoring components generate and process security telemetry throughout your environment.
Understanding Wazuh Performance
Optimizing Wazuh starts with understanding how security events travel through the platform.
Every log, file change, vulnerability scan, or endpoint event passes through multiple processing stages before appearing as an alert in the dashboard.
Performance issues can occur at any point in this pipeline, making it essential to understand each component’s role.
How Wazuh Processes Security Data
A simplified processing pipeline looks like this:
Endpoint
│
▼
Wazuh Agent
│
▼
Log Collection
(Syscheck / Syscollector / Rootcheck)
│
▼
Secure Agent Communication
│
▼
Wazuh Manager
│
├── Decoders
├── Rules
├── Correlation
└── Active Response
│
▼
OpenSearch Indexer
│
▼
Wazuh Dashboard
Each stage consumes different system resources and can become a bottleneck under heavy workloads.
Wazuh Agents
The Wazuh agent runs on monitored endpoints and is responsible for collecting security telemetry.
Depending on its configuration, an agent may collect:
- Operating system logs
- Windows Event Logs
- Linux Syslog
- Application logs
- File Integrity Monitoring events
- Inventory information
- Package inventory used for vulnerability detection
- Security configuration assessments
Although each individual agent consumes relatively little CPU, thousands of agents can collectively generate enormous event volumes that stress the manager and indexer.
Reducing unnecessary data collection at the endpoint is often the most effective optimization strategy because every event that is never collected is also never transmitted, decoded, evaluated, or indexed.
Logcollector
Logcollector continuously monitors configured log sources and forwards new entries to the Wazuh manager.
Performance issues commonly occur when administrators:
- Monitor unnecessary log files
- Collect verbose debug logs
- Read duplicate log sources
- Include excessive wildcard paths
- Process extremely high-volume applications
Poor log collection strategies often generate far more events than security teams actually need.
For a detailed walkthrough of preventing lost events during heavy log ingestion, see Fix Wazuh Logcollector Dropped Messages.
Syscollector
Syscollector inventories endpoint assets such as:
- Installed software
- Hardware
- Operating system information
- Running processes
- Network interfaces
- Packages
Because inventory data changes infrequently, aggressive scan intervals usually provide little additional value while increasing CPU usage and network traffic.
Scheduling inventory scans appropriately helps reduce unnecessary endpoint load.
Syscheck (File Integrity Monitoring)
Syscheck monitors file systems for:
- File creation
- File deletion
- Permission changes
- Ownership changes
- Content modifications
- Registry changes (Windows)
While File Integrity Monitoring is one of Wazuh’s most valuable security capabilities, it is also one of the most resource-intensive.
Scanning large directory trees, frequently changing files, build directories, container volumes, package caches, or temporary folders can consume significant CPU and generate excessive alerts.
Learn how to dramatically reduce resource consumption in How to Stop Wazuh File Integrity Monitoring (FIM) From Eating Your CPU.
Rootcheck
Rootcheck searches systems for indicators of compromise, rootkits, hidden processes, suspicious ports, and unauthorized system modifications.
Because rootkit detection generally runs at scheduled intervals rather than continuously, its performance impact is usually modest.
However, unnecessarily frequent scans across thousands of endpoints can noticeably increase CPU utilization.
Active Response
Active Response automatically executes predefined remediation actions when certain rules trigger.
Examples include:
- Blocking malicious IP addresses
- Killing malicious processes
- Disabling compromised accounts
- Running custom scripts
Performance issues rarely originate from Active Response itself but can arise when response scripts are inefficient or trigger excessively due to noisy detection rules.
Wazuh Manager
The manager acts as the central processing engine.
Its responsibilities include:
- Receiving agent events
- Decoding logs
- Evaluating detection rules
- Correlating events
- Generating alerts
- Coordinating Active Response
- Forwarding alerts to the indexer
As deployments grow, the manager often becomes the primary CPU bottleneck because every incoming event must pass through its rule engine.
Inefficient custom rules, excessive event volume, and unnecessary correlation logic significantly increase processing time.
Indexer (OpenSearch)
After alerts are generated, they are stored inside OpenSearch.
The indexer is responsible for:
- Writing alerts
- Maintaining indexes
- Compressing data
- Executing searches
- Running aggregations
- Serving dashboard queries
High indexing latency, insufficient heap memory, disk bottlenecks, or oversized shards can dramatically reduce overall system responsiveness.
Learn how to properly size Java heap memory in How to Tune OpenSearch Heap Size to Stop Wazuh High Memory Crashes.
Dashboard
The Wazuh Dashboard provides visualization, search, reporting, and investigation capabilities.
Dashboard performance depends on:
- Query complexity
- Index size
- OpenSearch performance
- Browser resources
- Visualization configuration
- Aggregation speed
A slow dashboard often indicates underlying indexing or storage bottlenecks rather than problems with the interface itself.
Where Performance Bottlenecks Usually Occur
Although every environment is different, performance issues tend to appear in a handful of predictable areas.
Endpoint Resource Usage
Agents consume CPU while collecting logs, monitoring files, scanning configurations, and generating telemetry.
Common causes include:
- Oversized FIM configurations
- Excessive Windows Event Logs
- Frequent inventory scans
- Large log files
- High-frequency scheduled scans
Manager Processing
The manager evaluates every incoming event against a ruleset that can contain thousands of detection rules.
Heavy workloads increase:
- CPU utilization
- Processing queues
- Event latency
- Memory consumption
Large enterprise deployments often require clustering or load balancing to distribute processing.
Rule Evaluation
Custom rules with inefficient matching logic increase processing time considerably.
Common issues include:
- Overly broad regex patterns
- Excessive nested rules
- Duplicate rules
- Poor rule ordering
- Expensive correlation logic
Event Decoding
Before rule evaluation, every log must be decoded into structured fields.
Complex decoders and malformed log formats increase parsing overhead and reduce throughput.
Alert Indexing
Writing alerts to OpenSearch requires:
- JSON serialization
- Index mapping
- Shard selection
- Disk writes
- Replication
- Segment merging
Slow disks or poorly configured indexes can create indexing backlogs that delay alert availability.
Search Performance
Large indexes increase search latency, particularly when dashboards execute multiple aggregations simultaneously.
Performance depends heavily on:
- Heap allocation
- Index lifecycle policies
- Shard sizing
- Query optimization
Dashboard Rendering
Complex visualizations and large time ranges require significant processing.
Rendering delays commonly result from expensive backend queries rather than browser limitations.
Storage Limitations
Storage performance affects nearly every component.
Slow disks increase:
- Indexing latency
- Search times
- Snapshot duration
- Recovery speed
- Cluster stability
Using SSD or NVMe storage typically provides substantial improvements for high-ingestion environments.
Expert Insight: The official Wazuh documentation recommends carefully limiting collected data, tuning monitored directories, and optimizing manager and indexer resources before simply increasing hardware capacity. Eliminating unnecessary workload generally produces larger performance gains than adding CPU alone.
Key Factors That Affect Wazuh Performance
Even powerful servers can struggle if Wazuh is configured inefficiently.
Most performance problems stem from excessive data collection rather than insufficient hardware.
Understanding the primary workload drivers helps prioritize optimization efforts.
Log Volume
Every collected log must be transmitted, decoded, evaluated against detection rules, indexed, stored, and queried.
As log volume increases, resource consumption rises across every component of the platform.
The most effective optimization strategy is often reducing unnecessary events before they ever reach the manager.
High Event Ingestion
Organizations monitoring thousands of endpoints may process hundreds of thousands, or even millions, of events each hour.
High ingestion rates increase:
- CPU utilization
- Memory usage
- Network bandwidth
- Indexing latency
- Storage consumption
- Search complexity
Instead of collecting everything, prioritize logs with meaningful security value.
Excessive Windows Event Logs
Windows Event Logs are among the largest contributors to event volume.
Administrators frequently collect:
- Security
- System
- Application
- PowerShell
- Sysmon
- DNS
- Task Scheduler
- Print Service
- WMI
- Defender
Without filtering, these channels often generate significant noise and unnecessary processing.
Verbose Application Logging
Applications running in debug or verbose modes can generate thousands of events every minute.
Examples include:
- Web servers
- Database servers
- Java applications
- Containers
- Kubernetes workloads
- Development environments
Whenever possible, reduce logging verbosity in production while preserving security-relevant events.
Duplicate Log Collection
Duplicate events waste CPU, storage, bandwidth, and indexing capacity.
Common causes include:
- Monitoring identical log files twice
- Collecting Windows logs through multiple mechanisms
- Duplicate syslog forwarding
- Multiple agents monitoring shared resources
- SIEM integrations forwarding identical events
Removing duplicate collection improves performance without sacrificing visibility.
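One way to spot overlapping collection is to check whether any file is matched by more than one configured path. The sketch below is a minimal illustration rather than a Wazuh tool: the function name and sample paths are hypothetical, and it uses Python's `fnmatch` instead of Wazuh's own wildcard handling.

```python
from collections import defaultdict
from fnmatch import fnmatch


def find_duplicate_sources(patterns, files):
    """Return {file: [patterns]} for files matched by more than one pattern."""
    matches = defaultdict(list)
    for path in files:
        for pattern in patterns:
            if fnmatch(path, pattern):
                matches[path].append(pattern)
    return {path: pats for path, pats in matches.items() if len(pats) > 1}


# Hypothetical configured locations and files present on an endpoint.
patterns = ["/var/log/nginx/*.log", "/var/log/nginx/access.log", "/var/log/auth.log"]
files = ["/var/log/nginx/access.log", "/var/log/nginx/error.log", "/var/log/auth.log"]

duplicates = find_duplicate_sources(patterns, files)
for path, pats in duplicates.items():
    print(f"{path} is collected by {len(pats)} entries: {pats}")
```

Here the access log is matched by both the wildcard entry and the explicit entry, so one of the two can be removed.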
Expert Insight: According to the OpenSearch project, reducing unnecessary indexing workload typically provides greater improvements than hardware upgrades because indexing is one of the most resource-intensive operations performed by the search engine.
Excessive event volume often leads to noisy detections.
Learn practical filtering techniques in How to Reduce False Positives in Wazuh.
If excessive event volume is driving CPU utilization on the manager, see Why Is Wazuh Using High CPU? Troubleshooting Guide.
File Integrity Monitoring (FIM)
File Integrity Monitoring (FIM) is one of Wazuh’s most valuable security capabilities because it detects unauthorized changes to files, directories, registry keys, and system configurations.
However, it is also one of the most resource-intensive modules in the platform.
Improperly configured FIM can significantly increase CPU utilization on endpoints, generate millions of events, and overwhelm the Wazuh manager.
Optimizing FIM is usually one of the quickest ways to improve overall Wazuh performance without sacrificing meaningful security visibility.
Large Directories
Monitoring large directory trees dramatically increases the amount of work performed during every scan.
Examples include:
- User home directories
- Development repositories
- Virtual machine images
- Docker volumes
- Kubernetes persistent volumes
- Backup directories
- Package caches
- Temporary folders
- Log archives
Many of these locations contain hundreds of thousands of files that rarely provide useful security telemetry.
Instead of monitoring entire drives, focus on directories containing:
- System binaries
- Configuration files
- Critical application data
- Authentication files
- Startup scripts
- Security-sensitive executables
Reducing the number of monitored files directly lowers CPU usage, memory consumption, and event generation.
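Before adding a directory to FIM, it can help to know how many files it actually contains. The following is a rough sketch for ranking candidate paths by file count. The function names are hypothetical and the script is independent of Wazuh itself.

```python
import os


def count_files(root):
    """Count non-directory entries under root, ignoring unreadable directories."""
    total = 0
    for _dirpath, _dirnames, filenames in os.walk(root, onerror=lambda err: None):
        total += len(filenames)
    return total


def rank_paths_by_size(paths):
    """Return (path, file_count) pairs for existing directories, largest first."""
    counts = [(path, count_files(path)) for path in paths if os.path.isdir(path)]
    return sorted(counts, key=lambda item: item[1], reverse=True)
```

Calling `rank_paths_by_size(["/etc", "/usr/bin", "/home"])` on an endpoint would show which candidates dominate the scan workload. Paths with very large counts are the first ones to narrow or exclude.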
Frequent File Changes
Some directories experience constant file modifications.
Examples include:
- Web server access logs
- Application log directories
- Browser caches
- Temporary files
- Database transaction logs
- Container overlay filesystems
- Build artifacts
- CI/CD workspaces
Monitoring rapidly changing files generates a continuous stream of FIM events that consume processing resources across the entire Wazuh pipeline.
Exclude high-churn directories whenever possible and monitor only files that provide meaningful security value.
Real-Time Monitoring Overhead
Real-time monitoring enables Wazuh to detect file changes immediately instead of waiting for scheduled scans.
While this improves detection speed, it also increases endpoint resource usage because the operating system continuously watches monitored files for changes.
In environments with frequent write operations, real-time monitoring can generate substantial CPU activity.
A balanced approach often works best:
- Use real-time monitoring for critical system directories.
- Schedule periodic scans for lower-risk locations.
- Exclude temporary or frequently changing paths.
This approach preserves rapid detection for sensitive assets while reducing unnecessary workload.
Hash Calculation Costs
Whenever a monitored file changes, Wazuh calculates cryptographic hashes to verify file integrity.
Depending on configuration, this may include:
- MD5
- SHA-1
- SHA-256
Although modern processors calculate hashes efficiently, hashing thousands of large files consumes noticeable CPU time and disk I/O.
Hash calculations become especially expensive when monitoring:
- Large databases
- Virtual machine disks
- Backup files
- ISO images
- Media repositories
Limiting hash generation to security-critical files significantly reduces resource consumption while maintaining effective integrity monitoring.
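To get a feel for what hashing costs on a particular endpoint, you can time it directly. This sketch uses Python's standard `hashlib` to compute the three digests listed above in a single pass. It only approximates the work involved and does not reproduce how Wazuh computes hashes internally. The function name is hypothetical.

```python
import hashlib
import time


def hash_file(path, algorithms=("md5", "sha1", "sha256"), chunk_size=1024 * 1024):
    """Hash a file with several algorithms in one pass; return (digests, seconds)."""
    hashers = {name: hashlib.new(name) for name in algorithms}
    start = time.perf_counter()
    with open(path, "rb") as handle:
        # Read in chunks so large files are never loaded into memory at once.
        while chunk := handle.read(chunk_size):
            for hasher in hashers.values():
                hasher.update(chunk)
    elapsed = time.perf_counter() - start
    return {name: h.hexdigest() for name, h in hashers.items()}, elapsed
```

Running this against a multi-gigabyte backup or disk image and multiplying by the number of such files gives a rough lower bound on the cost of including them in integrity scans.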
Expert Insight: The official Wazuh documentation recommends carefully defining monitored paths and excluding frequently changing directories to reduce unnecessary File Integrity Monitoring workload. Targeted monitoring provides better scalability than attempting to monitor entire filesystems.
For a complete walkthrough of reducing File Integrity Monitoring resource usage, see How to Stop Wazuh File Integrity Monitoring (FIM) From Eating Your CPU.
Detection Rules
Detection rules determine whether incoming events represent suspicious or malicious activity.
Every event received by the Wazuh manager is matched against a ruleset that can contain thousands of rules, making rule processing one of the largest contributors to CPU utilization.
Well-designed rules improve both detection quality and system performance.
Poorly written rules can dramatically slow event processing and increase alert latency.
Expensive Custom Rules
Custom rules are extremely powerful but often introduce unnecessary overhead.
Common performance issues include:
- Matching against every incoming event
- Multiple nested conditions
- Broad wildcard matching
- Large lookup lists
- Unnecessary regular expressions
- Duplicate rule logic
Each additional condition requires more CPU cycles during evaluation.
Whenever possible, create narrowly scoped rules that evaluate only relevant event types.
Large Rulesets
Many organizations continually add community rules, compliance packs, vendor content, and internally developed detections.
While comprehensive coverage improves visibility, oversized rulesets increase processing time because every event must be compared against more detection logic.
Regularly review your ruleset to:
- Remove obsolete rules
- Disable unused integrations
- Consolidate duplicate detections
- Archive deprecated content
- Prioritize high-value detections
Smaller, well-maintained rulesets generally perform better than excessively large collections.
Regex Complexity
Regular expressions are among the most CPU-intensive operations performed during rule evaluation.
Poorly optimized regex patterns can:
- Require excessive backtracking
- Evaluate unnecessary text
- Consume significant CPU
- Delay event processing
Examples of inefficient patterns include:
- Nested wildcards
- Broad “match everything” expressions
- Repeated capture groups
- Unanchored expressions
Whenever possible:
- Match specific fields instead of entire log messages.
- Use exact string matching when practical.
- Anchor regex patterns to expected positions.
- Keep expressions as simple as possible.
Even small regex optimizations can noticeably improve throughput in high-volume environments.
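The guidelines above can be illustrated with a short sketch. It uses Python's `re` module purely for demonstration. Wazuh rules use their own pattern syntax, so the expressions here are not meant to be pasted into a rule file. The event fields and function name are hypothetical.

```python
import re

# Hypothetical decoded event; field names are illustrative only.
event = {
    "program": "sshd",
    "message": "Failed password for invalid user admin from 203.0.113.7 port 52144 ssh2",
}

# Broad and unanchored: several ".*" segments give the engine room to backtrack.
BROAD = re.compile(r".*Failed.*password.*from.*")

# Narrow: anchored at the start and limited to the text that matters.
ANCHORED = re.compile(r"^Failed password for (?:invalid user )?(\S+) from ")


def is_failed_login(evt):
    """Apply cheap exact checks first, then the anchored pattern."""
    if evt.get("program") != "sshd":
        return False
    if "Failed password" not in evt["message"]:
        return False
    return ANCHORED.match(evt["message"]) is not None
```

Both patterns match the sample message, but the anchored version fails fast on unrelated lines, and the exact-match checks on `program` and the substring mean most events never reach the regex at all.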
Rule Chaining
Rule chaining allows one rule to trigger another, enabling sophisticated correlation and threat detection.
However, deep dependency chains increase processing time because multiple rules must execute before an alert is generated.
Complex correlation logic should be reserved for high-value detections rather than routine event processing.
A practical optimization strategy is to:
- Perform simple filtering first.
- Eliminate obvious benign events.
- Reserve advanced correlation for suspicious activity.
This minimizes unnecessary computation while preserving detection accuracy.
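The ordering described above can be sketched as a small pipeline in which a cheap check runs on every event and the expensive step runs only on what remains. This is a conceptual illustration, not Wazuh's rule engine, and the function names, event fields, and thresholds are all hypothetical.

```python
def process_events(events, is_benign, correlate):
    """Run expensive correlation only on events that survive cheap filtering."""
    alerts = []
    skipped = 0
    for event in events:
        if is_benign(event):  # cheap check, runs on every event
            skipped += 1
            continue
        alert = correlate(event)  # expensive check, runs on the remainder
        if alert is not None:
            alerts.append(alert)
    return alerts, skipped


def low_severity(event):
    """Hypothetical cheap filter: drop events below level 3."""
    return event["level"] < 3


def brute_force(event):
    """Hypothetical stand-in for costly correlation logic."""
    if event["failures"] >= 10:
        return f"brute force from {event['srcip']}"
    return None


events = [
    {"level": 2, "srcip": "10.0.0.5", "failures": 0},
    {"level": 5, "srcip": "203.0.113.7", "failures": 12},
    {"level": 5, "srcip": "10.0.0.9", "failures": 1},
]
alerts, skipped = process_events(events, low_severity, brute_force)
```

In this sample, one of three events is discarded before correlation runs. In a high-volume environment the discarded share is typically much larger, which is where the savings come from.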
Expert Insight: Security engineers generally recommend filtering low-value events as early as possible in the processing pipeline. Reducing unnecessary rule evaluations improves throughput and allows computationally expensive correlation logic to focus on higher-risk events.
Inefficient detection logic often contributes to alert fatigue.
See How to Reduce False Positives in Wazuh for techniques that improve both performance and detection quality.
OpenSearch Performance
The Wazuh Indexer, powered by OpenSearch, stores alerts and powers dashboard searches, visualizations, and investigations.
Even if the Wazuh manager processes events efficiently, poor OpenSearch performance can create indexing delays, slow searches, and unresponsive dashboards.
Properly tuning the indexer is essential for large-scale deployments.
Heap Size
OpenSearch runs on the Java Virtual Machine (JVM), making heap allocation one of its most important performance settings.
Heap memory stores:
- Search caches
- Field data
- Query results
- Cluster metadata
- Index structures
Insufficient heap memory may cause:
- Frequent garbage collection
- Slow searches
- Indexing delays
- Node instability
- Out-of-memory errors
Conversely, allocating excessive heap reduces the operating system’s available file cache, which can also hurt performance.
OpenSearch generally recommends allocating approximately 50% of available RAM to the JVM heap, up to a ceiling of roughly 32 GB, while leaving the remaining memory for the operating system’s file cache.
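As a rough illustration of that guideline, the sketch below derives a heap size from total RAM and formats it as the standard JVM `-Xms` and `-Xmx` options. The 31 GB cap is an assumption chosen to stay under the commonly cited threshold of about 32 GB, and the function names are hypothetical. Treat the result as a starting point to validate against real heap usage.

```python
def recommended_heap_gb(total_ram_gb, max_heap_gb=31):
    """Half of RAM (whole GB), at least 1 GB, capped at max_heap_gb."""
    if total_ram_gb <= 0:
        raise ValueError("total_ram_gb must be positive")
    return max(1, min(total_ram_gb // 2, max_heap_gb))


def jvm_heap_options(total_ram_gb):
    """Return matching minimum and maximum heap flags."""
    heap = recommended_heap_gb(total_ram_gb)
    # Equal minimum and maximum values avoid heap resizing at runtime.
    return [f"-Xms{heap}g", f"-Xmx{heap}g"]


print(jvm_heap_options(16))
```

For a 16 GB node this yields an 8 GB heap, leaving the other half for the operating system's file cache.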
JVM Garbage Collection
Garbage collection periodically frees unused Java memory.
Under heavy workloads, frequent garbage collection pauses can temporarily interrupt indexing and query execution.
Common symptoms include:
- Dashboard freezes
- Indexing latency
- High CPU utilization
- Slow searches
- Cluster instability
Monitoring garbage collection activity helps identify whether memory tuning is required before adding additional hardware.
Shard Configuration
Every index is divided into one or more shards.
Improper shard sizing is a common cause of poor OpenSearch performance.
Too many small shards increase:
- Cluster overhead
- Memory usage
- Search coordination
- Metadata processing
Oversized shards increase:
- Recovery time
- Rebalancing duration
- Query latency
A balanced shard strategy improves indexing efficiency while maintaining fast search performance.
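A simple way to reason about shard counts is to divide the expected index size by a target shard size. The sketch below assumes a 30 GB target, which is an illustrative value rather than a fixed rule. The function name is hypothetical, and the right target depends on your workload.

```python
import math


def primary_shards_needed(index_size_gb, target_shard_gb=30):
    """Smallest primary shard count that keeps each shard at or under the target."""
    if index_size_gb < 0 or target_shard_gb <= 0:
        raise ValueError("sizes must be non-negative and the target positive")
    return max(1, math.ceil(index_size_gb / target_shard_gb))


# A small daily index needs only one shard; a larger one needs several.
print(primary_shards_needed(5))
print(primary_shards_needed(100))
```

The useful takeaway is the shape of the calculation: an index of a few gigabytes gains nothing from multiple primary shards, while a very large index should be split so that no single shard becomes slow to recover or rebalance.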
Disk I/O
Disk performance directly affects nearly every OpenSearch operation.
Slow storage increases:
- Alert indexing latency
- Search response times
- Segment merging
- Snapshot duration
- Recovery performance
Enterprise deployments typically benefit from SSD or NVMe storage because indexing workloads involve continuous random reads and writes.
In many deployments, storage latency becomes the primary bottleneck before CPU resources are exhausted.
Storage Capacity
Storage planning extends beyond simply having enough free disk space.
As indexes grow larger:
- Searches become slower.
- Snapshot sizes increase.
- Recovery takes longer.
- Merge operations consume more resources.
- Cluster maintenance becomes more difficult.
Implementing Index State Management (ISM) policies, the OpenSearch counterpart to index lifecycle management, along with retention policies and regular index cleanup helps maintain consistent performance over time.
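Retention decisions are easier to make with a rough capacity estimate. This back-of-the-envelope sketch multiplies daily index volume by retention and by the number of copies, then adds free-space headroom. The function name, the single replica, and the 25% headroom are assumptions for illustration only.

```python
def estimate_storage_gb(daily_index_gb, retention_days, replicas=1, headroom=0.25):
    """Approximate disk needed: daily volume x retention x copies, plus headroom."""
    copies = 1 + replicas
    raw = daily_index_gb * retention_days * copies
    return raw * (1 + headroom)


# Example: 10 GB of alerts per day, kept for 90 days with one replica.
print(estimate_storage_gb(10, 90))
```

With these inputs the estimate is 2,250 GB. Halving retention or trimming daily volume reduces the figure proportionally, which is often cheaper than adding disks.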
Expert Insight: The OpenSearch project emphasizes that efficient memory allocation, appropriate shard sizing, and fast storage often deliver greater performance improvements than simply adding CPU cores. Proper cluster design is critical for maintaining indexing and query performance at scale.
If memory pressure is causing indexing delays or crashes, see How to Tune OpenSearch Heap Size to Stop Wazuh High Memory Crashes.
Hardware Resources
Although software optimization should always come before hardware upgrades, adequate infrastructure is essential for maintaining a responsive and reliable Wazuh deployment.
Every component, from agents to the manager and OpenSearch Indexer, depends on sufficient compute, memory, storage, and network resources to process security events efficiently.
Simply adding more hardware is rarely enough to solve performance problems caused by excessive logging, inefficient detection rules, or poor configuration.
However, properly sized infrastructure provides the foundation needed for stable, scalable security monitoring.
CPU
CPU is one of the most heavily utilized resources in a Wazuh deployment.
The processor is responsible for:
- Event decoding
- Rule evaluation
- Log parsing
- File integrity monitoring
- Data compression
- Search execution
- Dashboard queries
- OpenSearch indexing
High CPU utilization often indicates one or more of the following:
- Excessive event ingestion
- Inefficient custom rules
- Large FIM workloads
- Complex regular expressions
- Heavy search activity
- Frequent OpenSearch garbage collection
Monitor sustained CPU usage rather than occasional spikes.
Temporary increases during scheduled scans or indexing operations are normal, while consistently high utilization usually indicates a bottleneck that requires investigation.
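The distinction between a spike and sustained load can be expressed as a simple check over periodic CPU samples. This is a minimal sketch: the 85% threshold, the run length of five samples, and the function name are hypothetical values to adapt to your own baseline.

```python
def sustained_high_cpu(samples, threshold=85.0, min_consecutive=5):
    """Return True if min_consecutive samples in a row exceed the threshold."""
    run = 0
    for value in samples:
        run = run + 1 if value > threshold else 0
        if run >= min_consecutive:
            return True
    return False


spiky = [20, 95, 30, 25, 97, 22]           # brief peaks during scans
sustained = [40, 90, 92, 91, 95, 93, 50]   # five high readings in a row

print(sustained_high_cpu(spiky))
print(sustained_high_cpu(sustained))
```

Only the second series is flagged, because its high readings persist across consecutive samples rather than appearing as isolated peaks.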
Whenever possible:
- Separate the Wazuh Manager and OpenSearch Indexer onto dedicated servers.
- Scale horizontally for enterprise deployments.
- Reduce unnecessary workload before increasing CPU resources.
Memory
Memory plays a critical role in maintaining smooth performance across every component.
Insufficient RAM can lead to:
- Swapping
- Slow searches
- Queue backlogs
- Delayed alerts
- Dashboard latency
- OpenSearch instability
Memory is particularly important for:
- OpenSearch heap allocation
- Operating system page cache
- Search caches
- Manager processing queues
- Agent buffers
Regular monitoring helps identify gradual memory growth that may indicate oversized indexes, insufficient heap allocation, or increasing workload.
Disk Performance
Security monitoring platforms perform continuous disk operations.
Examples include:
- Writing alerts
- Reading log files
- Updating indexes
- Performing snapshots
- Merging index segments
- Searching historical data
Traditional hard drives often become performance bottlenecks under sustained indexing workloads.
Solid-state drives (SSD) and NVMe storage typically provide:
- Faster indexing
- Lower search latency
- Quicker recovery
- Improved dashboard responsiveness
- Better cluster stability
Storage performance frequently has a greater impact on OpenSearch responsiveness than additional CPU cores.
Network Bandwidth
Every agent continuously communicates with the Wazuh manager.
Bandwidth requirements increase as organizations collect:
- Security logs
- File integrity events
- Vulnerability information
- Cloud telemetry
- Container logs
- Windows Event Logs
Network congestion may result in:
- Delayed event delivery
- Agent disconnections
- Increased processing queues
- Synchronization delays
- Dropped messages
While most deployments do not saturate modern enterprise networks, geographically distributed environments should monitor network latency and bandwidth utilization to ensure reliable agent communication.
Expert Insight: Wazuh recommends sizing infrastructure based on expected event volume and deployment scale rather than endpoint count alone. A relatively small number of servers generating high log volumes can consume more resources than thousands of lightly monitored endpoints.
High CPU utilization is often caused by how workload is distributed across components rather than by insufficient hardware.
See Why Is Wazuh Using High CPU? Troubleshooting Guide for practical troubleshooting techniques.
Agent Configuration
The Wazuh agent serves as the first stage of the data collection pipeline.
Efficient agent configuration reduces unnecessary workload before events ever reach the manager, making it one of the most effective ways to optimize overall platform performance.
Instead of processing every available data source, configure agents to collect only information that supports your organization’s security objectives.
Monitoring Frequency
Monitoring frequency determines how often an agent performs scheduled tasks such as inventory collection, policy evaluation, and integrity scans.
Very short intervals increase:
- CPU utilization
- Disk activity
- Network traffic
- Event generation
Longer intervals reduce resource consumption while remaining appropriate for information that changes infrequently.
Different monitoring tasks should use intervals that reflect the expected rate of change.
For example:
- Hardware inventory may only require daily collection.
- Software inventory may be collected every few hours.
- Security logs should be monitored continuously.
- File Integrity Monitoring depends on the sensitivity of monitored files.
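The effect of interval choices is easy to underestimate across a large fleet. The short calculation below shows how the number of scheduled scans per day scales with agent count and interval. The agent count, intervals, and function name are hypothetical.

```python
SECONDS_PER_DAY = 86_400


def scans_per_day(agent_count, interval_seconds):
    """Approximate number of scheduled scans the fleet performs each day."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    return agent_count * SECONDS_PER_DAY // interval_seconds


agents = 2_000
hourly = scans_per_day(agents, 3_600)      # one scan per agent per hour
twice_daily = scans_per_day(agents, 43_200)  # one scan per agent every 12 hours

print(hourly, twice_daily)
```

Moving 2,000 agents from an hourly scan to a 12-hour interval cuts the fleet-wide total from 48,000 to 4,000 scans per day, along with the results each scan sends to the manager.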
Module Selection
Every enabled module consumes system resources.
Common Wazuh modules include:
- Logcollector
- Syscheck
- Syscollector
- Rootcheck
- Vulnerability Detection
- Security Configuration Assessment
- Active Response
Not every endpoint requires every module.
For example:
- Database servers may prioritize log monitoring.
- Domain controllers may emphasize authentication events.
- Development systems may require different monitoring than production servers.
- Container hosts may benefit from specialized configurations.
Disabling unnecessary modules reduces endpoint overhead and lowers the total event volume processed by the manager.
Scan Intervals
Scheduled scans should balance detection speed with resource consumption.
Aggressive scanning schedules may:
- Increase endpoint CPU usage.
- Generate duplicate data.
- Produce unnecessary network traffic.
- Create processing spikes on the manager.
Review scan schedules for:
- Syscheck
- Rootcheck
- Syscollector
- Vulnerability Detection
- Security Configuration Assessment
Adjust intervals based on operational requirements rather than using identical settings across every endpoint.
Event Buffering
Temporary spikes in event generation can overwhelm network links or the Wazuh manager.
Event buffering helps agents temporarily store events until they can be transmitted successfully.
Proper buffering improves reliability by:
- Reducing dropped events
- Handling temporary network interruptions
- Smoothing traffic bursts
- Preventing unnecessary retransmissions
However, excessively large buffers may increase endpoint memory usage and delay alert delivery if events accumulate faster than they can be processed.
Finding the appropriate balance depends on expected event volume and network reliability.
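The trade-off between buffer size and dropped events can be explored with a small simulation of a bounded queue. This sketch is conceptual and does not model the Wazuh agent's actual buffer implementation. The parameter names, burst pattern, and rates are hypothetical.

```python
def simulate_buffer(arrivals_per_second, drain_rate, capacity):
    """Simulate a bounded queue second by second.

    Returns (dropped_events, peak_queue_depth).
    """
    queued = 0
    dropped = 0
    peak = 0
    for arrivals in arrivals_per_second:
        queued += arrivals
        if queued > capacity:
            # Anything beyond capacity is lost.
            dropped += queued - capacity
            queued = capacity
        peak = max(peak, queued)
        # Send up to drain_rate events this second.
        queued -= min(queued, drain_rate)
    return dropped, peak


burst = [100, 100, 2000, 2000, 100, 100]  # events arriving each second

small = simulate_buffer(burst, drain_rate=500, capacity=1000)
large = simulate_buffer(burst, drain_rate=500, capacity=5000)
print(small, large)
```

With the smaller buffer, 2,500 events are dropped during the burst. The larger buffer drops none, but its queue peaks at 3,500 events, which at 500 events per second means some events wait several seconds before being sent.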
Expert Insight: Many experienced Wazuh administrators recommend optimizing agents before tuning the manager because every unnecessary event eliminated at the endpoint reduces processing, indexing, storage, and search workload throughout the entire platform.
If agents are generating excessive log traffic that overwhelms the manager, see Fix Wazuh Logcollector Dropped Messages for techniques to improve ingestion reliability.
Measuring Wazuh Performance
Performance optimization should always be driven by measurable data rather than assumptions.
Establishing performance baselines allows administrators to identify bottlenecks, validate configuration changes, and monitor long-term trends as the environment grows.
Regular monitoring also helps detect gradual degradation before it affects security operations.
Performance Metrics to Monitor
Several key metrics provide a comprehensive view of overall Wazuh health.
Rather than focusing on a single resource, monitor the entire processing pipeline, from endpoint collection to dashboard visualization, to identify where delays originate.
CPU Utilization
CPU usage indicates how efficiently the platform processes incoming events.
Monitor CPU consumption for:
- Wazuh agents
- Wazuh Manager
- OpenSearch Indexer
- Dashboard server
Sustained high CPU utilization often indicates:
- Excessive log volume
- Expensive detection rules
- Heavy File Integrity Monitoring
- Large search workloads
- Insufficient hardware resources
Track CPU usage trends over time to identify workload growth before it becomes a critical issue.
Memory Consumption
Memory usage provides insight into system stability.
Monitor:
- Total RAM utilization
- JVM heap usage
- Swap activity
- Operating system page cache
- Process memory growth
Unexpected increases may indicate:
- Memory leaks
- Oversized indexes
- Growing search caches
- Poor heap allocation
Consistent monitoring helps prevent unexpected service interruptions.
Disk Usage
Storage monitoring should include both capacity and performance.
Track:
- Available disk space
- Disk throughput
- IOPS
- Read latency
- Write latency
- Snapshot storage
Running out of storage can halt indexing, while slow storage significantly increases search and dashboard response times.
Indexing Latency
Indexing latency measures how quickly alerts become searchable after being generated.
Increasing latency often indicates:
- Slow disks
- Insufficient heap memory
- Indexing backlogs
- Large merge operations
- Heavy ingestion workloads
Keeping indexing delays low ensures analysts can investigate threats in near real time.
Search Latency
Search latency measures how long OpenSearch requires to execute queries.
Slow searches may result from:
- Large indexes
- Poor shard sizing
- Expensive aggregations
- Insufficient memory
- Heavy concurrent searches
Tracking search performance helps maintain a responsive dashboard experience.
Queue Sizes
Internal queues temporarily hold events awaiting processing.
Monitor queue growth throughout the pipeline.
Rapidly increasing queues often indicate downstream bottlenecks such as:
- Overloaded managers
- Slow indexing
- Network congestion
- Rule evaluation delays
Persistent queue growth should be investigated before events begin dropping.
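As a rough illustration of queue monitoring, the sketch below parses a statistics file made of key='value' lines and flags queues that are nearly full or counters that show drops. It assumes the manager exposes such a file (on many installations /var/ossec/var/run/wazuh-analysisd.state) and that the field names end in _queue_usage or dropped; both the path and the field names are assumptions to verify against your installed version, and the 0.8 limit is a hypothetical threshold.

```python
import re

# Matches lines such as: events_dropped='0'
_STATE_LINE = re.compile(r"^(\w+)='([^']*)'$")


def parse_state(text):
    """Parse key='value' lines into a dict, skipping comments and blank lines."""
    values = {}
    for line in text.splitlines():
        match = _STATE_LINE.match(line.strip())
        if match:
            values[match.group(1)] = match.group(2)
    return values


def queue_warnings(state, usage_limit=0.8):
    """Return sorted field names that show a nearly full queue or dropped events."""
    warnings = []
    for key, raw in state.items():
        try:
            value = float(raw)
        except ValueError:
            continue  # ignore non-numeric fields
        if key.endswith("_queue_usage") and value >= usage_limit:
            warnings.append(key)
        elif key.endswith("dropped") and value > 0:
            warnings.append(key)
    return sorted(warnings)


# Hypothetical sample content, for illustration only.
sample = """# Events dropped
events_dropped='12'
# Event queue usage
event_queue_usage='0.93'
syscheck_queue_usage='0.10'
"""
print(queue_warnings(parse_state(sample)))
```

Running a check like this on a schedule and trending the results makes it easier to spot sustained queue growth before events start dropping.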
Agent Connection Status
Healthy agents continuously communicate with the Wazuh manager.
Monitor:
- Connected agents
- Disconnected agents
- Authentication failures
- Communication latency
- Synchronization delays
Unexpected agent disconnects may indicate network issues, overloaded managers, certificate problems, or endpoint resource exhaustion.
Events per Second (EPS)
Events per Second (EPS) is one of the most important capacity planning metrics.
Tracking EPS helps administrators:
- Estimate infrastructure requirements
- Detect workload spikes
- Measure optimization improvements
- Forecast future hardware needs
Monitor both:
- Average EPS
- Peak EPS
Peak ingestion rates often determine infrastructure sizing because temporary spikes can overload systems even when average workloads remain relatively low.
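The difference between average and peak EPS can be made concrete with a small calculation. The sketch below is a minimal example that assumes you already have event timestamps as epoch seconds (for instance, extracted from alert logs); the function name and input format are illustrative, not part of Wazuh.

```python
from collections import Counter


def eps_stats(timestamps):
    """Return (average_eps, peak_eps) for event times given in epoch seconds."""
    if not timestamps:
        return 0.0, 0
    # Count how many events fall into each whole second.
    per_second = Counter(int(t) for t in timestamps)
    # Average over the full observed span, including idle seconds.
    span = max(per_second) - min(per_second) + 1
    average = len(timestamps) / span
    peak = max(per_second.values())
    return average, peak


average, peak = eps_stats([100.0, 100.2, 100.9, 101.5, 104.1])
print(f"average={average:.2f} eps, peak={peak} eps")
```

In this sample the average is 1 event per second while the peak is 3, which is the kind of gap that drives sizing decisions.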
Expert Insight: Capacity planning guides from OpenSearch emphasize monitoring workload trends over time rather than relying on instantaneous resource usage. Long-term metrics reveal growth patterns and help organizations scale infrastructure before performance degradation impacts production environments.
If monitoring reveals excessive manager CPU utilization during peak ingestion periods, see Why Is Wazuh Using High CPU? Troubleshooting Guide.
Useful Linux Monitoring Tools
Effective Wazuh performance tuning requires visibility at the operating system level.
Linux provides a set of low-level diagnostic tools that help identify CPU saturation, memory pressure, disk bottlenecks, and I/O contention.
These tools are essential for distinguishing between application-level inefficiencies and infrastructure constraints.
top
top provides a real-time view of system resource utilization.
It helps identify:
- Processes consuming high CPU
- Memory-heavy services
- Load averages
- System-wide resource pressure
In Wazuh environments, top is commonly used to detect spikes in:
- Wazuh Manager CPU usage during rule evaluation
- OpenSearch JVM memory consumption
- Log processing surges during ingestion bursts
htop
htop is an enhanced, interactive version of top.
It provides:
- Color-coded CPU and memory usage
- Per-core CPU utilization
- Easier process navigation
- Tree view of process relationships
It is particularly useful for quickly identifying whether bottlenecks originate from:
- OpenSearch (Java processes)
- Wazuh manager processes
- System-level I/O contention
vmstat
vmstat provides insight into system performance at the kernel level.
It reports:
- CPU scheduling
- Memory usage
- Swap activity
- Block I/O
- System interrupts
Key indicators of performance issues include:
- High swap usage (memory pressure)
- High CPU wait time (I/O bottlenecks)
- Frequent context switching (overloaded CPU)
iostat
iostat focuses on disk performance and is critical for diagnosing OpenSearch bottlenecks.
It helps monitor:
- Disk read/write throughput
- I/O wait times
- Device utilization
High I/O wait is a strong indicator that:
- Indexing is saturating storage
- Disk latency is limiting search performance
- Snapshot or merge operations are overwhelming the system
sar
sar (System Activity Reporter) is useful for historical performance analysis.
It tracks:
- CPU utilization over time
- Memory consumption trends
- Network activity
- Disk I/O history
Unlike real-time tools, sar is valuable for identifying recurring performance patterns such as:
- Daily ingestion spikes
- Scheduled scan overhead
- Nightly indexing pressure
free
free provides a snapshot of system memory usage.
It shows:
- Total RAM
- Used memory
- Available memory
- Buffers and cache
In Wazuh deployments, low available memory often correlates with:
- OpenSearch heap pressure
- Large query workloads
- Excessive indexing activity
df
df monitors disk space usage.
It is essential for ensuring:
- Index storage does not reach capacity limits
- Log partitions do not fill up
- Snapshot repositories remain functional
Running out of disk space can halt indexing entirely, making this one of the most critical monitoring tools.
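The same check can be automated. The sketch below uses Python's standard shutil.disk_usage to compute the used percentage of a filesystem and classify it; the 80% and 90% thresholds are hypothetical and should be set below whatever disk watermarks your indexer enforces. Note that the figure can differ slightly from df output because of reserved blocks.

```python
import shutil


def disk_used_percent(path):
    """Return the percentage of the filesystem holding `path` that is in use."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


def disk_status(used_percent, warn=80.0, critical=90.0):
    """Classify a usage percentage against hypothetical alert thresholds."""
    if used_percent >= critical:
        return "critical"
    if used_percent >= warn:
        return "warning"
    return "ok"


# Check the current directory's filesystem; point this at your index data path.
percent = disk_used_percent(".")
print(f"{percent:.1f}% used -> {disk_status(percent)}")
```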
dstat
dstat provides a combined view of CPU, memory, disk, and network usage.
It is especially useful for:
- Correlating network spikes with event ingestion
- Identifying I/O bursts during indexing
- Observing system-wide resource contention in real time
Wazuh Logs That Help Diagnose Performance Problems
Wazuh generates multiple log streams across its architecture.
These logs are essential for diagnosing performance bottlenecks, failed processing stages, and system-level inefficiencies.
Each component provides different visibility into system behavior.
Manager Logs
The Wazuh manager logs are the primary source of operational diagnostics.
They help identify:
- Rule evaluation delays
- Event decoding errors
- Queue overflows
- Active response execution issues
- Agent communication problems
Common performance-related symptoms include:
- Increased event latency warnings
- Buffer overflow messages
- Rule processing bottlenecks
- Dropped event indicators
When diagnosing high CPU usage or alert delays, manager logs are usually the first place to investigate.
If manager CPU is consistently high during event processing, see Why Is Wazuh Using High CPU? Troubleshooting Guide.
Agent Logs
Agent logs provide insight into endpoint-side performance issues.
They help identify:
- Logcollector failures
- File Integrity Monitoring overload
- Syscollector delays
- Connectivity issues with the manager
- Buffer saturation on endpoints
Typical performance signals include:
- Missed log entries
- High local CPU usage on endpoints
- Buffer overflow warnings
- Delayed event transmission
Agent-side issues often cascade into manager-side performance problems when events are retransmitted or batched inefficiently.
OpenSearch Logs
OpenSearch logs are critical for diagnosing indexing and search performance issues.
They reveal:
- Heap memory pressure
- Garbage collection activity
- Slow queries
- Shard rebalancing
- Indexing failures
- Disk watermark warnings
Common performance indicators include:
- Long GC pause times
- Thread pool rejections
- Index write delays
- Shard allocation failures
These logs are essential when dashboards become slow or alerts are delayed in appearing.
For memory-related crashes or instability, see How to Tune OpenSearch Heap Size to Stop Wazuh High Memory Crashes.
Dashboard Logs
The Wazuh Dashboard logs help diagnose frontend and query-layer performance issues.
They include:
- API request latency
- Failed query executions
- Visualization rendering errors
- Authentication delays
- Backend connection issues
While the dashboard is rarely the root cause of performance issues, it often exposes upstream problems such as slow indexing or inefficient queries.
Optimizing Wazuh Agents
Wazuh agents are the first line of data collection and have a significant impact on overall system performance.
Poorly configured agents generate excessive data, increasing load across the entire pipeline, from network transmission to manager processing and OpenSearch indexing.
Effective optimization focuses on reducing unnecessary telemetry while preserving security visibility.
Reduce Unnecessary Log Collection
Not all logs provide meaningful security value.
Collecting everything leads to unnecessary noise, higher CPU usage, and increased storage consumption.
Focus on:
- Security-relevant logs
- Authentication events
- System-critical application logs
- Endpoint behavior indicators
Avoid collecting:
- Debug logs in production
- High-frequency application logs
- Redundant telemetry sources
Reducing log collection at the source is one of the most effective performance optimizations available.
Exclude Noisy Log Sources
Certain log sources generate excessive, low-value events.
Common examples include:
- Browser caches
- Temporary application files
- Container runtime logs
- Build directories
- High-frequency debug outputs
Excluding these sources prevents unnecessary ingestion and reduces downstream processing load.
Filter Unnecessary Events
Filtering allows agents to discard irrelevant events before transmission.
This reduces:
- Network bandwidth usage
- Manager CPU load
- Indexing overhead
- Storage consumption
Event filtering is particularly useful in high-volume environments where only a subset of logs is relevant for security monitoring.
Limit Verbose Applications
Applications running in verbose or debug mode can overwhelm Wazuh systems with excessive logs.
Examples include:
- Web servers in debug mode
- Database systems with query logging enabled
- Development environments
- Container orchestration platforms with high verbosity settings
Whenever possible, adjust logging levels to production-appropriate settings while preserving security-relevant events.
Optimize File Integrity Monitoring
File Integrity Monitoring (FIM) is one of the most resource-intensive Wazuh features.
Proper optimization is essential for maintaining system stability and preventing unnecessary CPU and disk usage.
See How to Stop Wazuh File Integrity Monitoring (FIM) From Eating Your CPU for a deeper breakdown of optimization strategies.
Reduce Monitored Directories
Monitoring fewer directories significantly reduces CPU usage and event generation.
Prioritize:
- System binaries
- Security-critical configuration files
- Authentication directories
- Application configuration paths
Avoid broad directory monitoring such as entire file systems or user home directories unless explicitly required.
Exclude Temporary Folders
Temporary and cache directories generate constant file changes that produce high event volumes.
Common exclusions include:
- /tmp
- Application cache directories
- Browser cache locations
- Build output directories
- Container ephemeral storage
Excluding these paths prevents unnecessary FIM load.
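One way to find exclusion candidates is to measure which directories actually churn. The sketch below is a standalone helper, not a Wazuh feature: it walks a directory tree and counts files modified within a recent window, so the busiest directories can be reviewed before deciding what to exclude. The one-hour window is an arbitrary assumption.

```python
import os
import time
from collections import Counter


def high_churn_dirs(root, window_seconds=3600, now=None):
    """Return (directory, recently_modified_file_count) pairs, busiest first."""
    now = time.time() if now is None else now
    counts = Counter()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            try:
                mtime = os.stat(os.path.join(dirpath, name)).st_mtime
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if now - mtime <= window_seconds:
                counts[dirpath] += 1
    return counts.most_common()
```

Calling high_churn_dirs on a monitored path and reviewing the top few entries gives a data-driven starting point; directories with security relevance should stay monitored even if they change often.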
Increase Scan Intervals
Frequent scans can overwhelm endpoints, especially in large file systems.
Increasing scan intervals:
- Reduces CPU usage
- Decreases disk I/O
- Lowers event volume
This is particularly effective for non-critical directories.
Disable Unnecessary Hashing
Hash calculation is one of the most expensive operations in FIM.
Reducing hashing frequency or limiting it to critical files helps:
- Lower CPU consumption
- Reduce disk I/O
- Improve scan performance
Only enable hashing where integrity verification is truly required.
Monitor Only Critical Files
The most effective FIM optimization strategy is narrowing scope.
Focus on:
- Authentication files
- System binaries
- Configuration files
- Privilege escalation paths
Avoid monitoring files that change frequently without security implications.
Optimize Scheduled Scans
Scheduled scans contribute significantly to endpoint and manager workload, especially in large environments.
Proper tuning ensures consistent performance without compromising detection coverage.
Syscheck
Syscheck scans detect file changes and configuration modifications.
Poor configuration can result in excessive CPU usage and large event volumes.
Optimization strategies include:
- Reducing scan scope
- Increasing scan intervals
- Excluding high-churn directories
Rootcheck
Rootcheck identifies rootkits and system compromises.
To optimize performance:
- Avoid overly frequent scans
- Focus on critical endpoints
- Schedule scans during off-peak hours
Vulnerability Scans
Vulnerability detection consumes CPU and network resources.
Optimization approaches include:
- Staggering scan schedules
- Reducing scan frequency on stable systems
- Prioritizing high-risk assets
Inventory Collection
Inventory modules (Syscollector) gather system information.
To reduce overhead:
- Increase collection intervals
- Limit unnecessary data types
- Avoid redundant collection across environments
Tune Agent Resource Usage
Agent-level tuning is one of the highest-leverage optimization strategies in Wazuh because every event eliminated at the endpoint reduces load across the rest of the pipeline: manager processing, indexing, storage, and search.
Reduce Polling Frequency
Frequent polling increases CPU usage, disk activity, and network traffic on endpoints.
Adjust polling intervals based on asset criticality, security requirements, and how often data actually changes:
- Increase Syscollector intervals for stable systems
- Reduce inventory refresh frequency on large fleets
- Avoid overly aggressive scan schedules for low-risk endpoints
Over-polling often produces redundant data without improving detection capability.
Optimize Buffering
Agent buffers temporarily store events during network interruptions, bursts, or periods when manager throughput is limited.
Proper tuning helps:
- Smooth traffic spikes
- Prevent event loss during transient outages
- Reduce retransmission overhead
However, excessive buffering can:
- Increase endpoint memory usage
- Delay event delivery
- Mask downstream bottlenecks
Buffer size should reflect expected peak ingestion, not theoretical maximums.
Disable Unused Modules
Every enabled module consumes CPU, memory, and I/O resources on the endpoint, and adds network traffic and manager processing load.
Depending on the environment, commonly unnecessary modules include:
- Vulnerability Detection on non-production systems
- Rootcheck on containerized workloads
- Syscollector on short-lived instances
- Active Response where manual remediation is preferred
Disabling unused modules reduces endpoint overhead and lowers the total event volume entering the system.
Only enable modules that directly support your monitoring objectives.
Optimizing the Wazuh Manager
The Wazuh Manager is responsible for decoding events, evaluating rules, performing correlation, and generating alerts.
It is often the primary CPU bottleneck in large deployments.
Optimize Rule Processing
Rule evaluation is one of the most expensive operations in the Wazuh pipeline.
Each incoming event may be checked against a ruleset that contains thousands of rules, making efficiency critical.
Remove Unused Rules
Unused or irrelevant rules still consume CPU during evaluation.
Optimization steps include:
- Disabling unused compliance packs
- Removing legacy detections
- Eliminating duplicate rule sets
- Pruning environment-specific irrelevant rules
A smaller, well-maintained ruleset significantly improves throughput.
Simplify Regex Patterns
Regular expressions are computationally expensive and should be used sparingly.
Optimization strategies:
- Prefer exact string matching over regex when possible
- Anchor patterns to reduce backtracking
- Avoid nested wildcards and overly broad expressions
- Limit regex to high-value detections only
Even minor regex improvements can reduce CPU usage at scale.
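The first two strategies can be illustrated outside of Wazuh. The sketch below uses Python's re module (not the regex engines Wazuh itself uses) to show a cheap substring test in front of an anchored, specific pattern. The log format and function name are assumptions chosen for the example.

```python
import re

# Anchored and specific: fails fast on lines that do not start with the expected text.
FAILED_LOGIN = re.compile(r"^Failed password for (?:invalid user )?(\S+) from (\S+)")


def extract_failed_login(message):
    """Return (user, source_ip) for a failed SSH login message, else None."""
    # Cheap exact-string test first; most lines are rejected here
    # without ever running the regular expression.
    if "Failed password" not in message:
        return None
    match = FAILED_LOGIN.match(message)
    return match.groups() if match else None


print(extract_failed_login("Failed password for root from 198.51.100.2 port 2222 ssh2"))
print(extract_failed_login("Accepted password for root from 198.51.100.2 port 2222 ssh2"))
```

The same idea applies when writing rules: let a simple literal condition narrow the candidates, and reserve the expensive pattern for the few events that pass it.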
Optimize Rule Order
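Wazuh organizes rules in a parent/child hierarchy (for example through if_sid), and a child rule is only evaluated once its parent has matched, so poorly structured rules force more comparisons per event and increase processing time.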
Best practices:
- Place high-frequency rules early
- Filter benign events before complex evaluation
- Prioritize simple conditions before expensive logic
Efficient rule ordering reduces unnecessary computation.
Reduce Expensive Correlations
Correlation rules combine multiple events into higher-level detections but are computationally intensive.
To optimize:
- Limit correlation depth
- Avoid overly broad matching windows
- Use correlation only for high-confidence detections
- Pre-filter events before correlation logic executes
Reduce False Positives
False positives increase system load by generating unnecessary alerts, increasing indexing volume, and overwhelming analysts.
See How to Reduce False Positives in Wazuh for detailed tuning strategies.
Rule Tuning
Fine-tuning detection rules improves both accuracy and performance.
Approaches include:
- Adjusting rule severity levels
- Narrowing event conditions
- Disabling overly sensitive detections
- Aligning rules with real environment behavior
Well-tuned rules reduce unnecessary processing downstream.
Threshold Adjustments
Threshold-based rules trigger only after a defined number of events occur.
Proper tuning:
- Reduces alert noise
- Prevents repeated triggering for benign behavior
- Improves signal-to-noise ratio
However, thresholds must be balanced to avoid missing genuine threats.
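To make the trade-off concrete, the sketch below models a threshold rule as a sliding window: it fires only when a given number of events arrive within a timeframe. This is an illustrative model, not Wazuh's internal implementation; the class name and the parameter names (borrowed loosely from the frequency and timeframe options used in rules) are assumptions.

```python
from collections import deque


class FrequencyThreshold:
    """Fire once `frequency` events are seen within `timeframe` seconds."""

    def __init__(self, frequency, timeframe):
        self.frequency = frequency
        self.timeframe = timeframe
        self._times = deque()

    def observe(self, timestamp):
        """Record an event time and return True if the threshold is reached."""
        self._times.append(timestamp)
        # Drop events that have fallen out of the window.
        while self._times and timestamp - self._times[0] > self.timeframe:
            self._times.popleft()
        if len(self._times) >= self.frequency:
            self._times.clear()  # start counting again after firing
            return True
        return False


rule = FrequencyThreshold(frequency=3, timeframe=60)
print([rule.observe(t) for t in (0, 10, 20, 25)])
```

Raising the frequency or shortening the timeframe reduces noise but also raises the chance that a slow, deliberate attack never reaches the threshold.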
Event Suppression
Event suppression prevents repeated alerts from identical or low-value events.
Benefits include:
- Reduced indexing load
- Lower storage usage
- Improved dashboard clarity
Suppression should be applied carefully to avoid hiding meaningful anomalies.
Custom Rule Refinement
Custom rules should be reviewed regularly to ensure efficiency.
Key improvements:
- Remove redundant conditions
- Avoid overlapping logic
- Consolidate similar rules
- Optimize field matching
Poorly designed custom rules are a common source of performance degradation.
Improve Queue Performance
Wazuh uses internal queues to manage event flow between agents, the manager, and the indexer.
Queue inefficiencies often lead to event delays or drops.
Event Queues
Event queues temporarily store incoming logs before processing.
When queues become saturated:
- Events are delayed
- Memory usage increases
- Processing latency grows
Queue saturation typically indicates downstream bottlenecks in rule processing or indexing.
Processing Workers
Processing workers handle event decoding and rule evaluation.
To optimize:
- Ensure sufficient worker allocation for workload size
- Scale horizontally in high-ingestion environments
- Avoid CPU contention between manager processes
Insufficient workers lead to backlogs and delayed alert generation.
Connection Tuning
Agent-to-manager connections must be stable and efficient.
Optimization includes:
- Proper TCP configuration
- Load balancing for large deployments
- Reducing connection churn
- Ensuring consistent network latency
Connection instability increases retransmissions and queue pressure.
Optimize Active Response
Active Response automates mitigation actions but can become a performance burden if misconfigured.
Avoid Unnecessary Executions
Each response action consumes CPU and system resources.
Avoid triggering responses for:
- Low-confidence alerts
- High-frequency benign events
- Non-actionable detections
Overuse of automation can significantly increase system load.
Configure Cooldown Periods
Cooldown periods prevent repeated execution of the same response within a short timeframe.
Benefits include:
- Reduced system thrashing
- Lower CPU usage
- Prevention of redundant actions
Cooldowns are essential in noisy environments.
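The cooldown idea can be sketched in a few lines. The example below is a generic illustration rather than Wazuh's own mechanism: it remembers when an action last ran for a given key (here a hypothetical action name and source IP pair) and suppresses repeats until the cooldown has elapsed.

```python
class CooldownGate:
    """Allow an action for a key at most once per cooldown period."""

    def __init__(self, cooldown_seconds):
        self.cooldown = cooldown_seconds
        self._last_run = {}

    def should_run(self, key, now):
        """Return True and record the run if the key is outside its cooldown."""
        last = self._last_run.get(key)
        if last is not None and now - last < self.cooldown:
            return False
        self._last_run[key] = now
        return True


gate = CooldownGate(cooldown_seconds=600)
key = ("block-ip", "203.0.113.7")  # hypothetical action and target
print(gate.should_run(key, now=0), gate.should_run(key, now=30), gate.should_run(key, now=600))
```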
Limit Automation Scope
Active Response should be reserved for high-confidence threats.
Best practices:
- Apply to critical severity rules only
- Restrict execution to specific endpoints
- Avoid broad system-wide automation
This ensures responsiveness without overwhelming system resources.
Optimizing OpenSearch Performance
OpenSearch is responsible for indexing, storing, and searching Wazuh alerts.
Poor configuration here can severely impact dashboard performance and alert visibility.
Tune JVM Heap Size
JVM heap size directly affects indexing stability and search performance.
See How to Tune OpenSearch Heap Size to Stop Wazuh High Memory Crashes for detailed configuration guidance.
Recommended Heap Allocation
General best practices include:
- Allocate ~50% of system RAM to heap (up to a safe limit)
- Keep the heap below ~32 GB so the JVM can continue to use compressed object pointers
- Ensure remaining RAM is available for OS file cache
Balanced heap allocation improves both indexing and search performance.
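These guidelines reduce to simple arithmetic. The sketch below computes a heap size as half of system RAM, capped at 31 GB to stay safely under the ~32 GB limit, and formats the matching JVM flags. The 31 GB ceiling and the helper names are assumptions for illustration; the resulting -Xms and -Xmx values would go in the indexer's JVM options file.

```python
def recommended_heap_gb(total_ram_gb, ratio=0.5, ceiling_gb=31):
    """Return a heap size in whole GB: a share of RAM, capped at the ceiling."""
    heap = min(total_ram_gb * ratio, ceiling_gb)
    return max(1, int(heap))


def jvm_options(heap_gb):
    """Return matching minimum and maximum heap flags."""
    # Equal -Xms and -Xmx avoids heap resizing at runtime.
    return [f"-Xms{heap_gb}g", f"-Xmx{heap_gb}g"]


for ram in (16, 64, 128):
    print(ram, "GB RAM ->", jvm_options(recommended_heap_gb(ram)))
```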
Garbage Collection Tuning
Garbage collection (GC) affects latency and responsiveness.
Symptoms of poor GC tuning:
- Query delays
- Indexing pauses
- CPU spikes
- Irregular performance patterns
Optimizing GC reduces pause times and improves system stability.
Avoid Oversized Heaps
Excessively large heaps can:
- Increase GC pause duration
- Reduce OS cache efficiency
- Degrade search performance
Proper sizing is more effective than simply maximizing memory allocation.
Optimize Index Management
Efficient index management ensures OpenSearch remains performant as data grows.
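Index Lifecycle Management (ISM in OpenSearch)
OpenSearch implements index lifecycle management through Index State Management (ISM), the counterpart of Elasticsearch ILM. A policy automates index transitions through stages such as: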
- Hot (active indexing)
- Warm (reduced activity)
- Cold (archival)
- Delete (removal)
This prevents uncontrolled index growth.
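In OpenSearch, a lifecycle like the one above is usually written as an Index State Management (ISM) policy. The sketch below builds a simplified three-state policy (hot, warm, delete) as a Python dictionary and prints it as JSON. The index pattern, state names, and ages are hypothetical, and the policy should be checked against the ISM documentation for your version before being submitted to the ISM policies API.

```python
import json


def build_ism_policy(index_pattern, warm_after, delete_after):
    """Return an ISM-style policy body with hot, warm, and delete states."""
    return {
        "policy": {
            "description": "Hypothetical lifecycle policy for Wazuh alert indices",
            "default_state": "hot",
            "states": [
                {
                    "name": "hot",
                    "actions": [],
                    "transitions": [
                        {"state_name": "warm", "conditions": {"min_index_age": warm_after}}
                    ],
                },
                {
                    "name": "warm",
                    # Older indices no longer receive writes.
                    "actions": [{"read_only": {}}],
                    "transitions": [
                        {"state_name": "delete", "conditions": {"min_index_age": delete_after}}
                    ],
                },
                {"name": "delete", "actions": [{"delete": {}}], "transitions": []},
            ],
            "ism_template": {"index_patterns": [index_pattern], "priority": 1},
        }
    }


policy = build_ism_policy("wazuh-alerts-*", warm_after="7d", delete_after="90d")
print(json.dumps(policy, indent=2))
```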
Index Rotation
Regular index rotation:
- Limits shard size
- Improves search efficiency
- Reduces indexing overhead
Proper rotation policies are essential for long-term scalability.
Retention Policies
Retention policies define how long data is stored.
Benefits:
- Controlled storage growth
- Faster queries
- Reduced maintenance overhead
Retention should align with compliance requirements.
Delete Old Indices
Old indices should be removed or archived to prevent:
- Storage exhaustion
- Slow searches
- Increased cluster overhead
Automated cleanup improves long-term performance stability.
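Before automating deletion, it helps to preview which indices a retention period would remove. The sketch below assumes daily indices whose names end in a YYYY.MM.DD date (as in the common wazuh-alerts-4.x-YYYY.MM.DD pattern) and returns the names older than the retention window. It only selects names; it does not contact the cluster or delete anything.

```python
import re
from datetime import date, timedelta

# Daily index names are assumed to end with a date such as 2024.01.15.
_DATE_SUFFIX = re.compile(r"(\d{4})\.(\d{2})\.(\d{2})$")


def expired_indices(names, retention_days, today):
    """Return index names whose date suffix is older than the retention window."""
    cutoff = today - timedelta(days=retention_days)
    expired = []
    for name in names:
        match = _DATE_SUFFIX.search(name)
        if not match:
            continue  # not a dated index; leave it alone
        try:
            index_date = date(*map(int, match.groups()))
        except ValueError:
            continue  # suffix looked like a date but is not a valid one
        if index_date < cutoff:
            expired.append(name)
    return expired


names = ["wazuh-alerts-4.x-2024.01.01", "wazuh-alerts-4.x-2024.03.01", ".kibana_1"]
print(expired_indices(names, retention_days=30, today=date(2024, 3, 31)))
```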
Improve Search Performance
Search performance directly affects dashboard responsiveness and analyst efficiency.
Optimize Mappings
Efficient mappings reduce indexing and search overhead.
Best practices:
- Use appropriate field types
- Avoid unnecessary full-text indexing
- Disable unused fields
Poor mappings increase storage and query complexity.
Reduce Shard Count
Too many shards increase cluster overhead.
Effects include:
- Higher memory usage
- Slower queries
- Increased coordination overhead
Proper shard sizing improves performance significantly.
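Shard sizing is mostly division. The sketch below estimates a primary shard count from an expected index size and a target shard size. The 30 GB default is an assumption that sits inside the commonly cited range of a few tens of gigabytes per shard; adjust it to your own guidance and hardware.

```python
import math


def primary_shards(index_size_gb, target_shard_gb=30):
    """Return the number of primary shards needed to stay near the target size."""
    return max(1, math.ceil(index_size_gb / target_shard_gb))


for size in (5, 90, 100):
    print(f"{size} GB index -> {primary_shards(size)} primary shard(s)")
```

A small daily index therefore needs a single primary shard, and splitting it further only adds overhead.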
Merge Segments
Segment merging improves search efficiency by reducing index fragmentation.
Benefits:
- Faster queries
- Lower disk usage
- Improved indexing stability
However, merging should be balanced to avoid excessive I/O load.
Query Optimization
Inefficient queries degrade performance.
Optimization strategies:
- Avoid wildcard-heavy searches
- Use time filters whenever possible
- Limit aggregation complexity
- Narrow query scope
Well-structured queries dramatically improve dashboard responsiveness.
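As an example of a narrowly scoped query, the sketch below builds a search body that restricts results to a recent time window and a minimum rule level, using filter clauses instead of wildcards. The field names timestamp and rule.level are assumed from the usual Wazuh alert schema, and the function name is hypothetical.

```python
def recent_high_severity_query(hours=24, min_level=10, size=100):
    """Return a search body limited by time range and rule level."""
    return {
        "size": size,
        "query": {
            "bool": {
                # Filter clauses skip relevance scoring and can be cached.
                "filter": [
                    {"range": {"timestamp": {"gte": f"now-{hours}h"}}},
                    {"range": {"rule.level": {"gte": min_level}}},
                ]
            }
        },
        "sort": [{"timestamp": {"order": "desc"}}],
    }


print(recent_high_severity_query(hours=6, min_level=12))
```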
Storage Optimization
Storage is a foundational component of Wazuh performance.
SSD vs HDD
SSDs provide significantly better performance than HDDs:
- Lower latency
- Higher IOPS
- Faster indexing
- Improved search performance
HDDs often become bottlenecks in high-ingestion environments.
RAID Considerations
RAID configuration impacts redundancy and performance:
- RAID 1 improves redundancy
- RAID 10 balances performance and redundancy
- RAID 5 may introduce write penalties
RAID selection should reflect workload intensity and resilience requirements.
Disk Monitoring
Continuous disk monitoring helps prevent:
- Storage exhaustion
- Performance degradation
- Indexing failures
Key metrics include:
- Disk usage
- I/O latency
- Throughput
- Queue depth
Reducing High CPU Usage
High CPU usage in Wazuh environments typically results from cumulative inefficiencies across multiple components rather than a single issue.
Common Causes
File Integrity Monitoring
- Large directory scans
- Frequent file changes
- Real-time monitoring overhead
- Excessive hashing
Rule Evaluation
- Expensive regex patterns
- Large rulesets
- Poor rule ordering
- Excessive correlation logic
Large Log Volumes
- Excessive Windows Event Logs
- Verbose application logging
- Duplicate log sources
- High ingestion rates
OpenSearch Indexing
- Large shards
- Insufficient heap memory
- Slow disk performance
- High garbage collection activity
Agent Scanning
- Frequent Syscollector scans
- Overactive vulnerability detection
- High-frequency polling intervals
Addressing these areas holistically produces the most significant performance improvements.
Troubleshooting High CPU
Because several components usually contribute at once (event ingestion, rule processing, File Integrity Monitoring, and OpenSearch indexing), troubleshooting works best as a process of elimination rather than a search for a single culprit.
For a deeper breakdown of root causes and diagnostics, see Why Is Wazuh Using High CPU? Troubleshooting Guide.
Identify the Affected Process
The first step is to determine which component is consuming CPU resources.
Key processes to inspect:
- wazuh-manager (rule evaluation, decoding, correlation)
- filebeat / log forwarders (log shipping)
- java (OpenSearch JVM)
- Agent processes (endpoint-side load)
Use system monitoring tools to isolate whether CPU usage is concentrated on:
- A single node (localized issue)
- A cluster-wide pattern (systemic issue)
- Specific time windows (scheduled scans or ingestion spikes)
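A quick way to rank processes is to parse ps output. The sketch below assumes output in the shape produced by ps -eo pcpu,pid,comm on Linux and returns the busiest entries; the sample text and process names (such as wazuh-analysisd and wazuh-remoted for individual manager daemons) are illustrative.

```python
def top_cpu_processes(ps_output, limit=5):
    """Parse `ps -eo pcpu,pid,comm` style text; return (cpu, pid, name) rows, busiest first."""
    rows = []
    for line in ps_output.strip().splitlines()[1:]:  # skip the header row
        parts = line.split(None, 2)
        if len(parts) != 3:
            continue
        try:
            rows.append((float(parts[0]), int(parts[1]), parts[2].strip()))
        except ValueError:
            continue  # ignore malformed lines
    rows.sort(reverse=True)
    return rows[:limit]


# Hypothetical sample output, for illustration only.
sample = """%CPU   PID COMMAND
 2.0   812 wazuh-remoted
85.5  1021 wazuh-analysisd
40.1  1500 java
"""
for cpu, pid, name in top_cpu_processes(sample, limit=2):
    print(f"{cpu:5.1f}% {pid:6d} {name}")
```

On a live system the input could come from subprocess.run(["ps", "-eo", "pcpu,pid,comm"], capture_output=True, text=True).stdout.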
Analyze Workload
Once the affected process is identified, evaluate the workload it is handling.
Common workload indicators include:
- Events per second (EPS) spikes
- Large bursts of Windows Event Logs
- FIM scan activity
- Scheduled vulnerability scans
- Heavy dashboard query traffic
Understanding workload patterns helps distinguish between expected peak behavior and misconfiguration-driven overload.
Review Configuration
Configuration issues are a leading cause of sustained CPU saturation.
Focus on:
- Log sources and verbosity levels
- Rule complexity and redundancy
- FIM scope and scan frequency
- OpenSearch heap allocation
- Index shard configuration
Many CPU issues are resolved by removing unnecessary processing rather than increasing hardware capacity.
Apply Targeted Optimizations
After identifying the bottleneck, apply specific fixes:
- Reduce log ingestion volume
- Simplify or disable expensive rules
- Optimize FIM configurations
- Tune OpenSearch heap and shard settings
- Adjust agent polling intervals
Targeted changes are more effective than broad system upgrades.
Fixing Memory Problems
Memory issues in Wazuh deployments primarily originate from OpenSearch heap pressure, large datasets, and inefficient query patterns.
If left unresolved, they can lead to service instability, slow searches, and system crashes.
Common Memory Issues
OpenSearch Heap Exhaustion
When JVM heap memory is insufficient, OpenSearch may:
- Trigger frequent garbage collection
- Reject indexing requests
- Crash under load
- Degrade search performance
This is one of the most common causes of Wazuh instability in large deployments.
Large Rule Sets
Excessive rule complexity indirectly contributes to memory pressure by:
- Increasing event processing time
- Expanding in-memory queues
- Raising correlation overhead
Heavy Searches
Complex queries, especially those with aggregations over large time ranges, increase:
- Memory consumption
- CPU usage
- GC frequency
Memory Leaks
Although less common, misconfigured plugins or inefficient processes can gradually increase memory usage over time, eventually leading to instability.
Best Practices
Heap Sizing
Proper heap sizing is critical for OpenSearch stability.
Key principles:
- Allocate approximately 50% of system RAM to JVM heap (within safe limits)
- Stay below the ~32 GB heap threshold at which the JVM stops using compressed object pointers
- Preserve sufficient RAM for OS file caching
For detailed configuration guidance, see How to Tune OpenSearch Heap Size to Stop Wazuh High Memory Crashes.
JVM Monitoring
Monitor garbage collection behavior to detect early signs of memory stress:
- GC pause frequency
- Heap usage trends
- Allocation rates
- Old generation pressure
Memory Alerts
Set alerts for:
- Sustained high heap usage
- Swap activity
- Increasing GC pause times
- Indexing latency spikes
Early detection prevents cascading system failures.
Capacity Planning
Memory requirements should scale with:
- Event ingestion rate
- Index size
- Retention period
- Dashboard query complexity
Proper planning prevents reactive scaling and unexpected outages.
Optimizing Log Collection
Log collection is one of the highest-impact areas for performance optimization because it directly determines how much data enters the Wazuh pipeline.
Reduce Logcollector Overhead
Logcollector continuously monitors configured sources. Inefficient configuration can overwhelm both endpoints and the manager.
Exclude Unnecessary Logs
Exclude sources that do not contribute to security visibility, such as:
- Debug logs
- Temporary application logs
- Cache directories
- Development artifacts
Optimize Polling
Excessive polling increases CPU usage and network traffic.
Best practices:
- Increase polling intervals for low-value logs
- Avoid unnecessary real-time monitoring where scheduling is sufficient
- Align polling frequency with log generation rates
Filter Duplicate Events
Duplicate log sources significantly increase processing overhead.
Common duplication sources:
- Multiple agents monitoring the same file
- Redundant syslog forwarding
- Overlapping application logging configurations
Reduce Noisy Applications
Verbose applications generate excessive logs that provide little security value.
Examples include:
- Debug-enabled web servers
- Database query logging
- Container runtime verbosity
- Development tools in production
Prevent Dropped Messages
Dropped logs indicate that the system is overwhelmed and cannot process events fast enough.
See Fix Wazuh Logcollector Dropped Messages for detailed mitigation strategies.
Increase Buffers
Larger buffers help absorb short-term spikes in log volume, but must be carefully balanced to avoid memory pressure on endpoints.
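How long a buffer can absorb a spike follows from three numbers: its capacity, the incoming burst rate, and the rate at which events are drained. The sketch below does that arithmetic. The example values of 5000 events and 500 events per second are assumptions that mirror typical agent client buffer settings; check the values configured in your own deployment.

```python
def seconds_until_full(buffer_events, burst_eps, drain_eps):
    """Return how many seconds a burst can last before the buffer overflows."""
    if burst_eps <= drain_eps:
        return float("inf")  # the buffer never fills
    return buffer_events / (burst_eps - drain_eps)


# A 5000-event buffer draining at 500 eps while 1500 eps arrive.
print(seconds_until_full(buffer_events=5000, burst_eps=1500, drain_eps=500))
```

If real bursts regularly outlast this window, reducing the burst at its source is usually more effective than enlarging the buffer.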
Reduce Log Bursts
Control sudden ingestion spikes by:
- Staggering agent reporting intervals
- Reducing simultaneous scan schedules
- Avoiding synchronized batch jobs across endpoints
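Staggering can be done deterministically so that each endpoint always gets the same offset. The sketch below hashes an agent identifier into a start delay within a chosen window; the function name and the one-hour window are assumptions, and the resulting offset would be applied by whatever tool schedules your scans.

```python
import hashlib


def stagger_offset(agent_id, window_seconds):
    """Map an agent ID to a stable start delay in [0, window_seconds)."""
    digest = hashlib.sha256(agent_id.encode("utf-8")).digest()
    # Use the first 8 bytes of the hash as an integer, then wrap into the window.
    return int.from_bytes(digest[:8], "big") % window_seconds


for agent in ("001", "002", "003"):
    print(agent, stagger_offset(agent, window_seconds=3600))
```

Because the offset depends only on the identifier, it survives restarts and needs no central coordination.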
Improve Storage Performance
Slow storage increases backlog formation and contributes to dropped messages.
Upgrading to SSD or NVMe significantly improves ingestion stability.
Verify Manager Throughput
Ensure the Wazuh manager can process incoming events at peak load.
If ingestion exceeds processing capacity, queue buildup and event drops become inevitable.
Scaling Wazuh for Large Environments
As environments grow, single-node or minimally configured deployments become insufficient.
Scaling ensures Wazuh can handle increasing event volume while maintaining performance and reliability.
Horizontal Scaling
Horizontal scaling distributes workload across multiple nodes.
Multiple Managers
Deploying multiple Wazuh managers:
- Distributes event processing
- Reduces CPU bottlenecks
- Improves fault tolerance
Load Balancing
Load balancers distribute agent traffic across available managers, preventing overloading of a single node.
Distributed Architecture
A distributed design separates:
- Agents
- Managers
- Indexers
- Dashboards
This improves scalability and isolates performance bottlenecks.
OpenSearch Cluster Scaling
OpenSearch must scale alongside the Wazuh manager to maintain performance.
Dedicated Master Nodes
Master nodes (called cluster manager nodes in recent OpenSearch releases) handle cluster coordination and should not be burdened with indexing workloads.
Data Nodes
Data nodes store and index logs. Scaling data nodes improves:
- Indexing throughput
- Query performance
- Storage capacity
Coordinating Nodes
Coordinating nodes route search and aggregation requests to data nodes and merge the results, which offloads that work from data nodes and improves dashboard responsiveness.
Replica Planning
Replicas improve:
- Fault tolerance
- Read performance
- Query distribution
However, they also increase storage requirements and indexing overhead.
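The storage cost of replicas is easy to estimate, since every replica is a full copy of the primary data. The sketch below is back-of-the-envelope arithmetic. The daily volume, retention period, and 20% headroom are assumed example values, and the function name is hypothetical.

```python
def cluster_storage_gb(daily_primary_gb, retention_days, replicas, headroom=0.2):
    """Estimate total cluster storage for a given replica count.

    Each replica is a full copy, so stored data scales with (1 + replicas).
    headroom adds spare capacity on top of the raw figure.
    """
    copies = 1 + replicas
    raw = daily_primary_gb * retention_days * copies
    return raw * (1 + headroom)


no_replica = cluster_storage_gb(50, 90, replicas=0)
one_replica = cluster_storage_gb(50, 90, replicas=1)
print(f"{no_replica:.0f} GB without replicas, {one_replica:.0f} GB with one")
```

Going from zero to one replica doubles the storage requirement in this model, so replica count and retention period are best planned together.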
Agent Scaling Best Practices
Enrollment Strategy
Efficient onboarding prevents configuration drift and performance issues.
Best practices:
- Use centralized enrollment
- Apply consistent policies
- Automate configuration distribution
Group Policies
Grouping agents allows:
- Consistent configuration
- Reduced management overhead
- Targeted optimization strategies
Configuration Management
Automated configuration management ensures:
- Uniform logging policies
- Controlled FIM scope
- Consistent scan intervals
Wazuh Performance Optimization Checklist
A structured checklist ensures consistent tuning across environments.
- Monitor CPU, memory, disk, and network utilization
- Reduce unnecessary log collection
- Tune File Integrity Monitoring
- Remove noisy detection rules
- Reduce false positives
- Optimize OpenSearch heap size
- Configure index lifecycle management
- Optimize shard allocation
- Rotate and delete old indices
- Increase Logcollector efficiency
- Monitor indexing latency
- Benchmark after configuration changes
- Scale infrastructure before bottlenecks occur
- Review performance metrics regularly
Common Wazuh Performance Mistakes
Many performance issues stem from predictable configuration mistakes.
Monitoring Everything
Collecting all possible logs creates excessive noise and unnecessary processing overhead.
Ignoring Noisy Logs
Failing to filter verbose applications significantly increases ingestion volume.
Oversized FIM Configurations
Monitoring entire filesystems leads to massive CPU and storage consumption.
Poor OpenSearch Heap Configuration
Incorrect heap sizing causes instability, slow searches, and indexing failures.
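A widely cited rule of thumb for OpenSearch is to give the JVM heap about half of the node's RAM while staying below roughly 32 GB, so the JVM can keep using compressed object pointers. The helper below encodes that rule as a starting point, not a guarantee. The 31 GB cap and the function names are assumptions, and the right value for your workload should be confirmed by monitoring.

```python
def suggested_heap_gb(total_ram_gb, cap_gb=31):
    """Half of RAM, capped so the heap stays under the ~32 GB threshold."""
    return min(total_ram_gb // 2, cap_gb)


def jvm_heap_flags(total_ram_gb):
    """Return matching -Xms/-Xmx flags so the heap does not resize at runtime."""
    heap = suggested_heap_gb(total_ram_gb)
    return [f"-Xms{heap}g", f"-Xmx{heap}g"]


print(jvm_heap_flags(16))
print(jvm_heap_flags(128))
```

Setting the minimum and maximum heap to the same value is the usual practice. The remaining RAM is left to the operating system's file cache, which OpenSearch relies on heavily.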
Too Many Shards
Every shard carries fixed heap and cluster-state overhead, so an excessive shard count slows the cluster down even when each shard holds little data.
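Shard counts grow faster than most people expect with daily indices. The following sketch counts the total shards a retention policy produces and how many land on each data node. All inputs are example values and the function names are hypothetical.

```python
import math


def total_shards(indices_per_day, primaries, replicas, retention_days):
    """Count every shard copy (primaries plus replicas) kept in the cluster."""
    return indices_per_day * primaries * (1 + replicas) * retention_days


def shards_per_node(shard_count, data_nodes):
    """Worst-case shards on one node, assuming an even spread."""
    return math.ceil(shard_count / data_nodes)


# One daily index kept for 90 days with one replica.
three_primaries = total_shards(1, primaries=3, replicas=1, retention_days=90)
one_primary = total_shards(1, primaries=1, replicas=1, retention_days=90)
print(three_primaries, one_primary, shards_per_node(three_primaries, 3))
```

For small daily volumes, dropping from three primaries to one cuts the shard count to a third without losing any data.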
Keeping Data Forever
Unlimited retention leads to storage exhaustion and degraded performance.
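A retention policy comes down to deciding which indices are older than a cutoff. The sketch below picks expired indices by parsing a date suffix from the index name. It assumes daily indices that end in `YYYY.MM.DD` (for example, names shaped like `wazuh-alerts-4.x-2024.01.01`) and only returns the names. Actual deletion should go through index lifecycle policies or your own reviewed tooling.

```python
import re
from datetime import date, datetime, timedelta

# Assumed naming convention: the index name ends with YYYY.MM.DD.
_DATE_SUFFIX = re.compile(r"(\d{4}\.\d{2}\.\d{2})$")


def expired_indices(index_names, today, retention_days):
    """Return index names whose date suffix is older than the retention window."""
    cutoff = today - timedelta(days=retention_days)
    expired = []
    for name in index_names:
        match = _DATE_SUFFIX.search(name)
        if not match:
            continue  # leave indices without a date suffix alone
        index_day = datetime.strptime(match.group(1), "%Y.%m.%d").date()
        if index_day < cutoff:
            expired.append(name)
    return expired


names = [
    "wazuh-alerts-4.x-2024.01.01",
    "wazuh-alerts-4.x-2024.03.15",
    "wazuh-alerts-4.x-2024.04.01",
    ".kibana_1",
]
old = expired_indices(names, today=date(2024, 4, 10), retention_days=30)
print(old)
```

Passing `today` explicitly keeps the function easy to test and makes a dry run against a past or future date straightforward.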
Ignoring Capacity Planning
Lack of planning results in reactive scaling and unexpected outages.
Not Monitoring Performance Metrics
Without metrics, optimization becomes guesswork rather than engineering.
Frequently Asked Questions (FAQ)
Question: What is Wazuh performance optimization?
It is the process of tuning agents, managers, and OpenSearch to improve event processing efficiency, reduce resource usage, and increase detection speed.
Question: Why is Wazuh using so much CPU?
Common causes include high log volume, inefficient rules, File Integrity Monitoring overload, and OpenSearch indexing pressure.
Question: How do I reduce Wazuh memory usage?
Optimize OpenSearch heap size, reduce query complexity, and limit data ingestion.
Question: How can I make Wazuh faster?
Reduce log volume, tune rules, optimize indexing, and improve storage performance.
Question: How do I optimize File Integrity Monitoring?
Limit monitored directories, exclude temporary folders, and reduce scan frequency.
Question: Why is OpenSearch crashing with Wazuh?
Usually due to heap exhaustion, poor shard configuration, or insufficient memory allocation.
Question: How do I reduce false positives in Wazuh?
Tune rules, adjust thresholds, and suppress noisy events.
Question: What causes dropped Logcollector messages?
High ingestion rates, insufficient buffers, or manager throughput limitations.
Question: How many events per second can Wazuh handle?
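It depends on hardware, configuration, and tuning, so there is no single figure. Measure sustained and peak events per second (EPS) in your own environment and size managers and indexers against those numbers.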
Question: Should I use SSDs for Wazuh?
Yes. SSDs significantly improve indexing, search, and overall system responsiveness.
Question: What hardware is recommended for production Wazuh deployments?
Multi-core CPUs, sufficient RAM for OpenSearch heap, and SSD/NVMe storage are recommended.
Question: How often should I review Wazuh performance?
Ideally continuously, through monitoring dashboards, with deeper reviews whenever you scale the deployment or change its configuration.
Conclusion
Wazuh performance optimization is a continuous process rather than a one-time configuration task.
The most impactful improvements come from reducing unnecessary workload at the source, tuning detection logic, and ensuring that OpenSearch is properly sized and maintained.
The key strategies across all environments include minimizing log noise, optimizing File Integrity Monitoring, refining detection rules, properly configuring OpenSearch heap and shards, and continuously monitoring system metrics to detect early signs of degradation.
As deployments scale, proactive capacity planning becomes essential.
Performance issues are far easier to prevent than to resolve after they impact production systems.
For deeper implementation guidance, explore the following detailed optimization guides:
