The Complete Wazuh Performance Optimization Guide

As your security environment grows, Wazuh can quickly begin processing millions of events every day.

Endpoint telemetry, system logs, cloud events, file integrity monitoring, vulnerability scans, and custom detection rules all compete for CPU, memory, storage, and indexing resources.

Without proper tuning, even a well-designed deployment can suffer from high resource utilization, delayed alerts, slow dashboards, dropped events, and excessive false positives.

Many factors can degrade performance, including excessive log collection, inefficient custom rules, oversized File Integrity Monitoring (FIM) configurations, insufficient OpenSearch heap memory, overloaded managers, poor storage performance, and unnecessary event duplication.

As environments scale, these issues compound and can significantly reduce both detection speed and operational visibility.

Several major components influence overall Wazuh performance:

  • Wazuh agents
  • Logcollector
  • Syscollector
  • Syscheck (File Integrity Monitoring)
  • Rootcheck
  • Active Response
  • Wazuh Manager
  • OpenSearch Indexer
  • Wazuh Dashboard
  • Storage subsystem
  • Network bandwidth
  • Detection rules
  • Decoders
  • Index lifecycle management

Each component contributes differently to overall system performance, so optimization needs to cover the whole pipeline rather than a single bottleneck.

In this guide, you’ll learn how Wazuh processes security data, where performance bottlenecks typically occur, which configuration settings have the greatest impact, and how to optimize every major component of the platform.

You’ll also learn practical techniques for scaling Wazuh, reducing unnecessary workload, improving indexing performance, minimizing false positives, and building a faster, more stable deployment for enterprise environments.

The performance of a Wazuh deployment is closely tied to its overall monitoring architecture.

See The Complete Wazuh Monitoring Guide to understand how monitoring components generate and process security telemetry throughout your environment.


Understanding Wazuh Performance

Optimizing Wazuh starts with understanding how security events travel through the platform.

Every log, file change, vulnerability scan, or endpoint event passes through multiple processing stages before appearing as an alert in the dashboard.

Performance issues can occur at any point in this pipeline, making it essential to understand each component’s role.

How Wazuh Processes Security Data

A simplified processing pipeline looks like this:

Endpoint
     │
     ▼
Wazuh Agent
     │
     ▼
Log Collection
(Syscheck / Syscollector / Rootcheck)
     │
     ▼
Secure Agent Communication
     │
     ▼
Wazuh Manager
     │
     ├── Decoders
     ├── Rules
     ├── Correlation
     └── Active Response
     │
     ▼
OpenSearch Indexer
     │
     ▼
Wazuh Dashboard

Each stage consumes different system resources and can become a bottleneck under heavy workloads.

Wazuh Agents

The Wazuh agent runs on monitored endpoints and is responsible for collecting security telemetry.

Depending on its configuration, an agent may collect:

  • Operating system logs
  • Windows Event Logs
  • Linux Syslog
  • Application logs
  • File Integrity Monitoring events
  • Inventory information
  • Vulnerability detection data
  • Security configuration assessments

Although each individual agent consumes relatively little CPU, thousands of agents can collectively generate enormous event volumes that stress the manager and indexer.

Reducing unnecessary data collection at the endpoint is often the most effective optimization strategy because it eliminates unnecessary processing throughout the rest of the pipeline.

Logcollector

Logcollector continuously monitors configured log sources and forwards new entries to the Wazuh manager.

Performance issues commonly occur when administrators:

  • Monitor unnecessary log files
  • Collect verbose debug logs
  • Read duplicate log sources
  • Include excessive wildcard paths
  • Process extremely high-volume applications

Poor log collection strategies often generate far more events than security teams actually need.

For a detailed walkthrough of preventing lost events during heavy log ingestion, see Fix Wazuh Logcollector Dropped Messages.

Syscollector

Syscollector inventories endpoint assets such as:

  • Installed software
  • Hardware
  • Operating system information
  • Running processes
  • Network interfaces
  • Packages

Because inventory data changes infrequently, aggressive scan intervals usually provide little additional value while increasing CPU usage and network traffic.

Scheduling inventory scans appropriately helps reduce unnecessary endpoint load.

Syscheck (File Integrity Monitoring)

Syscheck monitors file systems for:

  • File creation
  • File deletion
  • Permission changes
  • Ownership changes
  • Content modifications
  • Registry changes (Windows)

While File Integrity Monitoring is one of Wazuh’s most valuable security capabilities, it is also one of the most resource-intensive.

Scanning large directory trees, frequently changing files, build directories, container volumes, package caches, or temporary folders can consume significant CPU and generate excessive alerts.

Learn how to dramatically reduce resource consumption in How to Stop Wazuh File Integrity Monitoring (FIM) From Eating Your CPU.

Rootcheck

Rootcheck searches systems for indicators of compromise, rootkits, hidden processes, suspicious ports, and unauthorized system modifications.

Because rootkit detection generally runs at scheduled intervals rather than continuously, its performance impact is usually modest.

However, unnecessarily frequent scans across thousands of endpoints can noticeably increase CPU utilization.

Active Response

Active Response automatically executes predefined remediation actions when certain rules trigger.

Examples include:

  • Blocking malicious IP addresses
  • Killing malicious processes
  • Disabling compromised accounts
  • Running custom scripts

Performance issues rarely originate from Active Response itself but can arise when response scripts are inefficient or trigger excessively due to noisy detection rules.

Wazuh Manager

The manager acts as the central processing engine.

Its responsibilities include:

  • Receiving agent events
  • Decoding logs
  • Evaluating detection rules
  • Correlating events
  • Generating alerts
  • Coordinating Active Response
  • Forwarding alerts to the indexer

As deployments grow, the manager often becomes the primary CPU bottleneck because every incoming event must pass through its rule engine.

Inefficient custom rules, excessive event volume, and unnecessary correlation logic significantly increase processing time.

Indexer (OpenSearch)

After alerts are generated, they are stored in the Wazuh indexer, which is built on OpenSearch.

The indexer is responsible for:

  • Writing alerts
  • Maintaining indexes
  • Compressing data
  • Executing searches
  • Running aggregations
  • Serving dashboard queries

High indexing latency, insufficient heap memory, disk bottlenecks, or oversized shards can dramatically reduce overall system responsiveness.

Learn how to properly size Java heap memory in How to Tune OpenSearch Heap Size to Stop Wazuh High Memory Crashes.

Dashboard

The Wazuh Dashboard provides visualization, search, reporting, and investigation capabilities.

Dashboard performance depends on:

  • Query complexity
  • Index size
  • OpenSearch performance
  • Browser resources
  • Visualization configuration
  • Aggregation speed

A slow dashboard often indicates underlying indexing or storage bottlenecks rather than problems with the interface itself.

Where Performance Bottlenecks Usually Occur

Although every environment is different, performance issues tend to appear in a handful of predictable areas.

Endpoint Resource Usage

Agents consume CPU while collecting logs, monitoring files, scanning configurations, and generating telemetry.

Common causes include:

  • Oversized FIM configurations
  • Excessive Windows Event Logs
  • Frequent inventory scans
  • Large log files
  • High-frequency scheduled scans

Manager Processing

The manager evaluates every incoming event against thousands of detection rules.

Heavy workloads increase:

  • CPU utilization
  • Processing queues
  • Event latency
  • Memory consumption

Large enterprise deployments often require clustering or load balancing to distribute processing.

Rule Evaluation

Custom rules with inefficient matching logic increase processing time considerably.

Common issues include:

  • Overly broad regex patterns
  • Excessive nested rules
  • Duplicate rules
  • Poor rule ordering
  • Expensive correlation logic

Event Decoding

Before rule evaluation, every log must be decoded into structured fields.

Complex decoders and malformed log formats increase parsing overhead and reduce throughput.

Alert Indexing

Writing alerts to OpenSearch requires:

  • JSON serialization
  • Index mapping
  • Shard selection
  • Disk writes
  • Replication
  • Segment merging

Slow disks or poorly configured indexes can create indexing backlogs that delay alert availability.

Search Performance

Large indexes increase search latency, particularly when dashboards execute multiple aggregations simultaneously.

Performance depends heavily on:

  • Heap allocation
  • Index lifecycle policies
  • Shard sizing
  • Query optimization

Dashboard Rendering

Complex visualizations and large time ranges require significant processing.

Rendering delays commonly result from expensive backend queries rather than browser limitations.

Storage Limitations

Storage performance affects nearly every component.

Slow disks increase:

  • Indexing latency
  • Search times
  • Snapshot duration
  • Recovery speed
  • Cluster stability

Using SSD or NVMe storage typically provides substantial improvements for high-ingestion environments.

Expert Insight: The official Wazuh documentation recommends carefully limiting collected data, tuning monitored directories, and optimizing manager and indexer resources before simply increasing hardware capacity. Eliminating unnecessary workload generally produces larger performance gains than adding CPU alone.


Key Factors That Affect Wazuh Performance

Even powerful servers can struggle if Wazuh is configured inefficiently.

Most performance problems stem from excessive data collection rather than insufficient hardware.

Understanding the primary workload drivers helps prioritize optimization efforts.

Log Volume

Every collected log must be transmitted, decoded, evaluated against detection rules, indexed, stored, and queried.

As log volume increases, resource consumption rises across every component of the platform.

The most effective optimization strategy is often reducing unnecessary events before they ever reach the manager.

High Event Ingestion

Organizations monitoring thousands of endpoints may process hundreds of thousands, or even millions, of events each hour.

High ingestion rates increase:

  • CPU utilization
  • Memory usage
  • Network bandwidth
  • Indexing latency
  • Storage consumption
  • Search complexity

Instead of collecting everything, prioritize logs with meaningful security value.
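To put these ingestion figures in perspective, the arithmetic can be sketched in a few lines of Python. This is only a back-of-the-envelope estimate: the function names, the fleet size, the per-agent event rate, and the average event size are all hypothetical values chosen for illustration, not Wazuh defaults or measurements.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def estimate_daily_events(agents: int, events_per_second_per_agent: float) -> int:
    """Return the estimated number of events the whole fleet generates per day."""
    return round(agents * events_per_second_per_agent * SECONDS_PER_DAY)


def estimate_daily_storage_gb(daily_events: int, avg_event_bytes: int) -> float:
    """Return the raw daily volume in GB, before replication or compression."""
    return daily_events * avg_event_bytes / 1_000_000_000


# Hypothetical fleet: 2,000 agents averaging 0.5 events per second each,
# with an assumed average event size of 1,000 bytes.
daily_events = estimate_daily_events(agents=2000, events_per_second_per_agent=0.5)
daily_gb = estimate_daily_storage_gb(daily_events, avg_event_bytes=1000)
print(f"{daily_events:,} events/day, roughly {daily_gb:.1f} GB/day before replicas")
```

Plugging in your own measured rates shows how quickly a small reduction per agent adds up across the fleet: halving the per-agent rate halves every downstream figure.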

Excessive Windows Event Logs

Windows Event Logs are among the largest contributors to event volume.

Administrators frequently collect:

  • Security
  • System
  • Application
  • PowerShell
  • Sysmon
  • DNS
  • Task Scheduler
  • Print Service
  • WMI
  • Defender

Without filtering, these channels often generate significant noise and unnecessary processing.

Verbose Application Logging

Applications running in debug or verbose modes can generate thousands of events every minute.

Examples include:

  • Web servers
  • Database servers
  • Java applications
  • Containers
  • Kubernetes workloads
  • Development environments

Whenever possible, reduce logging verbosity in production while preserving security-relevant events.

Duplicate Log Collection

Duplicate events waste CPU, storage, bandwidth, and indexing capacity.

Common causes include:

  • Monitoring identical log files twice
  • Collecting Windows logs through multiple mechanisms
  • Duplicate syslog forwarding
  • Multiple agents monitoring shared resources
  • SIEM integrations forwarding identical events

Removing duplicate collection improves performance without sacrificing visibility.
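One way to spot duplicate file-based sources is to review the list of configured log locations for exact repeats and for explicit paths that a wildcard already covers. The sketch below assumes you have already extracted those locations into a plain list of strings; the function name and the sample paths are hypothetical. It uses Python's fnmatch rules, whose wildcard semantics may differ from how the agent expands patterns, so treat the output as candidates to review rather than confirmed duplicates.

```python
from collections import Counter
from fnmatch import fnmatchcase


def find_duplicate_sources(locations: list[str]) -> dict[str, list]:
    """Report exact duplicates and explicit paths that a wildcard entry also matches."""
    counts = Counter(locations)
    exact = sorted(path for path, seen in counts.items() if seen > 1)

    unique = sorted(counts)
    # Treat any entry containing a glob metacharacter as a wildcard pattern.
    wildcards = [p for p in unique if any(ch in p for ch in "*?[")]
    explicit = [p for p in unique if p not in wildcards]

    # Note: fnmatch's "*" also matches "/", so nested paths are reported too.
    covered = [
        (path, pattern)
        for path in explicit
        for pattern in wildcards
        if fnmatchcase(path, pattern)
    ]
    return {"exact": exact, "covered_by_wildcard": covered}


# Hypothetical list of configured locations.
configured = [
    "/var/log/auth.log",
    "/var/log/*.log",
    "/var/log/auth.log",
    "/opt/app/app.log",
]
print(find_duplicate_sources(configured))
```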

Expert Insight: According to the OpenSearch project, reducing unnecessary indexing workload typically provides greater improvements than hardware upgrades because indexing is one of the most resource-intensive operations performed by the search engine.

Excessive event volume often leads to noisy detections.

Learn practical filtering techniques in How to Reduce False Positives in Wazuh.

If excessive event volume is driving CPU utilization on the manager, see Why Is Wazuh Using High CPU? Troubleshooting Guide.


File Integrity Monitoring (FIM)

File Integrity Monitoring (FIM) is one of Wazuh’s most valuable security capabilities because it detects unauthorized changes to files, directories, registry keys, and system configurations.

However, it is also one of the most resource-intensive modules in the platform.

Improperly configured FIM can significantly increase CPU utilization on endpoints, generate millions of events, and overwhelm the Wazuh manager.

Optimizing FIM is usually one of the quickest ways to improve overall Wazuh performance without sacrificing meaningful security visibility.

Large Directories

Monitoring large directory trees dramatically increases the amount of work performed during every scan.

Examples include:

  • User home directories
  • Development repositories
  • Virtual machine images
  • Docker volumes
  • Kubernetes persistent volumes
  • Backup directories
  • Package caches
  • Temporary folders
  • Log archives

Many of these locations contain hundreds of thousands of files that rarely provide useful security telemetry.

Instead of monitoring entire drives, focus on directories containing:

  • System binaries
  • Configuration files
  • Critical application data
  • Authentication files
  • Startup scripts
  • Security-sensitive executables

Reducing the number of monitored files directly lowers CPU usage, memory consumption, and event generation.
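Before adding a directory to a FIM configuration, it can help to measure how many files it actually contains. The following is a minimal sketch that walks a directory tree and counts regular files and bytes; the function name is hypothetical, and it is a standalone helper, not part of Wazuh.

```python
import os
import stat


def summarize_tree(root: str) -> dict[str, int]:
    """Count regular files and their total size under root, without following symlinks."""
    files = 0
    total_bytes = 0
    for dirpath, _dirnames, filenames in os.walk(root, followlinks=False):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                info = os.lstat(path)
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if not stat.S_ISREG(info.st_mode):
                continue  # ignore symlinks, sockets, and other special files
            files += 1
            total_bytes += info.st_size
    return {"files": files, "bytes": total_bytes}
```

Running it against a candidate path (for example, summarize_tree("/etc") on a Linux host) gives a rough sense of scan cost: a directory with a few thousand configuration files is a very different workload from a home directory tree with several hundred thousand entries.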

Frequent File Changes

Some directories experience constant file modifications.

Examples include:

  • Web server access logs
  • Application log directories
  • Browser caches
  • Temporary files
  • Database transaction logs
  • Container overlay filesystems
  • Build artifacts
  • CI/CD workspaces

Monitoring rapidly changing files generates a continuous stream of FIM events that consume processing resources across the entire Wazuh pipeline.

Exclude high-churn directories whenever possible and monitor only files that provide meaningful security value.

Real-Time Monitoring Overhead

Real-time monitoring enables Wazuh to detect file changes immediately instead of waiting for scheduled scans.

While this improves detection speed, it also increases endpoint resource usage because the operating system continuously watches monitored files for changes.

In environments with frequent write operations, real-time monitoring can generate substantial CPU activity.

A balanced approach often works best:

  • Use real-time monitoring for critical system directories.
  • Schedule periodic scans for lower-risk locations.
  • Exclude temporary or frequently changing paths.

This approach preserves rapid detection for sensitive assets while reducing unnecessary workload.

Hash Calculation Costs

During scans, and whenever a monitored file changes, Wazuh calculates cryptographic hashes to verify file integrity.

Depending on configuration, this may include:

  • MD5
  • SHA-1
  • SHA-256

Although modern processors calculate hashes efficiently, hashing thousands of large files consumes noticeable CPU time and disk I/O.

Hash calculations become especially expensive when monitoring:

  • Large databases
  • Virtual machine disks
  • Backup files
  • ISO images
  • Media repositories

Limiting hash generation to security-critical files significantly reduces resource consumption while maintaining effective integrity monitoring.
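To get a feel for hashing cost, you can measure SHA-256 throughput on a host and extrapolate to the amount of data you plan to monitor. The sketch below is an approximation under stated assumptions: it hashes an in-memory buffer, so it captures CPU cost only and ignores disk I/O, and both function names are hypothetical helpers rather than anything Wazuh provides.

```python
import hashlib
import time


def measure_sha256_throughput(sample_mb: int = 16) -> float:
    """Hash an in-memory buffer once and return the observed throughput in MB/s."""
    buffer = bytes(sample_mb * 1024 * 1024)  # zero-filled sample data
    start = time.perf_counter()
    hashlib.sha256(buffer).hexdigest()
    elapsed = time.perf_counter() - start
    return sample_mb / elapsed


def estimate_hash_seconds(total_mb: float, throughput_mb_s: float) -> float:
    """Estimate the CPU seconds needed to hash total_mb at the given throughput."""
    if throughput_mb_s <= 0:
        raise ValueError("throughput must be positive")
    return total_mb / throughput_mb_s


throughput = measure_sha256_throughput()
# Hypothetical workload: 200 GB of virtual machine images under a monitored path.
print(f"~{estimate_hash_seconds(200 * 1024, throughput):.0f} CPU seconds per full hash pass")
```

Real scans will usually be slower than this estimate because the files must also be read from disk, which is why large images and backups are good candidates for exclusion.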

Expert Insight: The official Wazuh documentation recommends carefully defining monitored paths and excluding frequently changing directories to reduce unnecessary File Integrity Monitoring workload. Targeted monitoring provides better scalability than attempting to monitor entire filesystems.

For a complete walkthrough of reducing File Integrity Monitoring resource usage, see How to Stop Wazuh File Integrity Monitoring (FIM) From Eating Your CPU.


Detection Rules

Detection rules determine whether incoming events represent suspicious or malicious activity.

Every event received by the Wazuh manager is evaluated against a ruleset containing thousands of rules, making rule processing one of the largest contributors to CPU utilization.

Well-designed rules improve both detection quality and system performance.

Poorly written rules can dramatically slow event processing and increase alert latency.

Expensive Custom Rules

Custom rules are extremely powerful but often introduce unnecessary overhead.

Common performance issues include:

  • Matching against every incoming event
  • Multiple nested conditions
  • Broad wildcard matching
  • Large lookup lists
  • Unnecessary regular expressions
  • Duplicate rule logic

Each additional condition requires more CPU cycles during evaluation.

Whenever possible, create narrowly scoped rules that evaluate only relevant event types.

Large Rulesets

Many organizations continually add community rules, compliance packs, vendor content, and internally developed detections.

While comprehensive coverage improves visibility, oversized rulesets increase processing time because every event must be compared against more detection logic.

Regularly review your ruleset to:

  • Remove obsolete rules
  • Disable unused integrations
  • Consolidate duplicate detections
  • Archive deprecated content
  • Prioritize high-value detections

Smaller, well-maintained rulesets generally perform better than excessively large collections.

Regex Complexity

Regular expressions are among the most CPU-intensive operations performed during rule evaluation.

Poorly optimized regex patterns can:

  • Require excessive backtracking
  • Evaluate unnecessary text
  • Consume significant CPU
  • Delay event processing

Examples of inefficient patterns include:

  • Nested wildcards
  • Broad “match everything” expressions
  • Repeated capture groups
  • Unanchored expressions

Whenever possible:

  • Match specific fields instead of entire log messages.
  • Use exact string matching when practical.
  • Anchor regex patterns to expected positions.
  • Keep expressions as simple as possible.

Even small regex optimizations can noticeably improve throughput in high-volume environments.
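The difference between a broad and a narrow pattern can be illustrated with a short example. Note the assumptions: this uses Python's re module purely to show the general principle, not the regex engine Wazuh uses for rules, and the log line, pattern names, and function name are hypothetical.

```python
import re
from typing import Optional

# Broad pattern: leading and trailing wildcards force the engine to consume the
# whole line and backtrack, even when the interesting text is in a known position.
BROAD = re.compile(r".*Failed password for .* from (\d+\.\d+\.\d+\.\d+).*")

# Narrower pattern: no surrounding wildcards, and the user field is limited to
# a single non-space token (optionally prefixed by "invalid user ").
NARROW = re.compile(
    r"Failed password for (?:invalid user )?\S+ from (\d{1,3}(?:\.\d{1,3}){3})"
)


def extract_source_ip(message: str) -> Optional[str]:
    """Return the source IP of a failed SSH login, or None if the line is unrelated."""
    # Cheap exact-string check first: most lines never reach the regex at all.
    if "Failed password" not in message:
        return None
    match = NARROW.search(message)
    return match.group(1) if match else None


line = "sshd[1234]: Failed password for invalid user admin from 203.0.113.5 port 22 ssh2"
print(extract_source_ip(line))
```

Both patterns extract the same address from this line, but the narrow version does less work per event and skips unrelated lines entirely thanks to the literal pre-check.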

Rule Chaining

Rule chaining allows one rule to trigger another, enabling sophisticated correlation and threat detection.

However, deep dependency chains increase processing time because multiple rules must execute before an alert is generated.

Complex correlation logic should be reserved for high-value detections rather than routine event processing.

A practical optimization strategy is to:

  • Perform simple filtering first.
  • Eliminate obvious benign events.
  • Reserve advanced correlation for suspicious activity.

This minimizes unnecessary computation while preserving detection accuracy.

Expert Insight: Security engineers generally recommend filtering low-value events as early as possible in the processing pipeline. Reducing unnecessary rule evaluations improves throughput and allows computationally expensive correlation logic to focus on higher-risk events.

Inefficient detection logic often contributes to alert fatigue.

See How to Reduce False Positives in Wazuh for techniques that improve both performance and detection quality.


OpenSearch Performance

The Wazuh Indexer, powered by OpenSearch, stores alerts and powers dashboard searches, visualizations, and investigations.

Even if the Wazuh manager processes events efficiently, poor OpenSearch performance can create indexing delays, slow searches, and unresponsive dashboards.

Properly tuning the indexer is essential for large-scale deployments.

Heap Size

OpenSearch relies on the Java Virtual Machine (JVM), making heap allocation one of its most important performance settings.

Heap memory stores:

  • Search caches
  • Field data
  • Query results
  • Cluster metadata
  • Index structures

Insufficient heap memory may cause:

  • Frequent garbage collection
  • Slow searches
  • Indexing delays
  • Node instability
  • Out-of-memory errors

Conversely, allocating excessive heap reduces the operating system’s available file cache, which can also hurt performance.

OpenSearch generally recommends allocating approximately 50% of available RAM to the JVM heap, without exceeding roughly 32 GB, while leaving the remaining memory for the operating system and its file cache.
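The sizing guideline can be expressed as a small calculation. This is a minimal sketch: the function names are hypothetical, and the 32 GB ceiling is an assumption based on the commonly cited upper bound for JVM heaps, so confirm the exact limit for your JVM and OpenSearch version. The -Xms and -Xmx flags are the standard JVM options for minimum and maximum heap size.

```python
def recommended_heap_gb(total_ram_gb: float, cap_gb: float = 32.0) -> float:
    """Return half of system RAM, capped at cap_gb (an assumed upper bound)."""
    if total_ram_gb <= 0:
        raise ValueError("total_ram_gb must be positive")
    return min(total_ram_gb / 2, cap_gb)


def jvm_heap_flags(heap_gb: float) -> list[str]:
    """Return matching minimum and maximum heap flags, expressed in megabytes."""
    heap_mb = int(heap_gb * 1024)
    return [f"-Xms{heap_mb}m", f"-Xmx{heap_mb}m"]


for ram_gb in (16, 64, 128):  # hypothetical server sizes
    heap = recommended_heap_gb(ram_gb)
    print(ram_gb, "GB RAM ->", jvm_heap_flags(heap))
```

Setting the minimum and maximum to the same value avoids heap resizing at runtime, which is why the sketch emits both flags with identical sizes.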

JVM Garbage Collection

Garbage collection periodically frees unused Java memory.

Under heavy workloads, frequent garbage collection pauses can temporarily interrupt indexing and query execution.

Common symptoms include:

  • Dashboard freezes
  • Indexing latency
  • High CPU utilization
  • Slow searches
  • Cluster instability

Monitoring garbage collection activity helps identify whether memory tuning is required before adding additional hardware.

Shard Configuration

Every index is divided into one or more shards.

Improper shard sizing is a common cause of poor OpenSearch performance.

Too many small shards increase:

  • Cluster overhead
  • Memory usage
  • Search coordination
  • Metadata processing

Oversized shards increase:

  • Recovery time
  • Rebalancing duration
  • Query latency

A balanced shard strategy improves indexing efficiency while maintaining fast search performance.
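A simple way to reason about shard sizing is to divide the expected index size by a target shard size. The sketch below assumes a 30 GB target, chosen here as a midpoint of the 10 to 50 GB range often cited in OpenSearch guidance; both the default and the function name are assumptions for illustration, so adjust them to your own workload.

```python
import math


def primary_shard_count(index_size_gb: float, target_shard_gb: float = 30.0) -> int:
    """Return how many primary shards keep each shard near target_shard_gb."""
    if index_size_gb < 0 or target_shard_gb <= 0:
        raise ValueError("sizes must be non-negative and the target must be positive")
    # Every index needs at least one primary shard, even when it is tiny.
    return max(1, math.ceil(index_size_gb / target_shard_gb))


for size_gb in (5, 90, 400):  # hypothetical daily index sizes
    print(size_gb, "GB ->", primary_shard_count(size_gb), "primary shard(s)")
```

For small daily indexes this often yields a single primary shard, which illustrates the point above: splitting a few gigabytes across many shards adds overhead without improving performance.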

Disk I/O

Disk performance directly affects nearly every OpenSearch operation.

Slow storage increases:

  • Alert indexing latency
  • Search response times
  • Segment merging
  • Snapshot duration
  • Recovery performance

Enterprise deployments typically benefit from SSD or NVMe storage because indexing workloads involve continuous random reads and writes.

In many deployments, storage latency becomes the primary bottleneck well before CPU resources are exhausted.

Storage Capacity

Storage planning extends beyond simply having enough free disk space.

As indexes grow larger:

  • Searches become slower.
  • Snapshot sizes increase.
  • Recovery takes longer.
  • Merge operations consume more resources.
  • Cluster maintenance becomes more difficult.

Implementing index lifecycle management (called Index State Management, or ISM, in OpenSearch), retention policies, and regular index cleanup helps maintain consistent performance over time.
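The selection logic behind a retention policy can be sketched as follows. In practice a lifecycle or retention policy on the indexer would normally handle this for you; the code only illustrates which indexes fall outside a retention window. It assumes daily indexes whose names end in a YYYY.MM.DD date, and the function name and sample index names are hypothetical.

```python
import re
from datetime import date, datetime, timedelta

# Matches a trailing date such as "2024.01.15" at the end of an index name.
INDEX_DATE = re.compile(r"(\d{4}\.\d{2}\.\d{2})$")


def expired_indices(names: list[str], today: date, retention_days: int) -> list[str]:
    """Return the index names whose trailing date is older than the retention window."""
    cutoff = today - timedelta(days=retention_days)
    expired = []
    for name in names:
        match = INDEX_DATE.search(name)
        if not match:
            continue  # not a dated index; leave it alone
        try:
            index_day = datetime.strptime(match.group(1), "%Y.%m.%d").date()
        except ValueError:
            continue  # digits that do not form a valid calendar date
        if index_day < cutoff:
            expired.append(name)
    return expired


names = ["wazuh-alerts-4.x-2024.01.01", "wazuh-alerts-4.x-2024.03.01", "settings"]
print(expired_indices(names, today=date(2024, 3, 31), retention_days=30))
```

Only the January index is reported here, because the March index falls exactly on the cutoff date and indexes without a trailing date are ignored.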

Expert Insight: The OpenSearch project emphasizes that efficient memory allocation, appropriate shard sizing, and fast storage often deliver greater performance improvements than simply adding CPU cores. Proper cluster design is critical for maintaining indexing and query performance at scale.

If memory pressure is causing indexing delays or crashes, see How to Tune OpenSearch Heap Size to Stop Wazuh High Memory Crashes.


Hardware Resources

Although software optimization should always come before hardware upgrades, adequate infrastructure is essential for maintaining a responsive and reliable Wazuh deployment.

Every component, from agents to the manager and OpenSearch Indexer, depends on sufficient compute, memory, storage, and network resources to process security events efficiently.

Simply adding more hardware is rarely enough to solve performance problems caused by excessive logging, inefficient detection rules, or poor configuration.

However, properly sized infrastructure provides the foundation needed for stable, scalable security monitoring.

CPU

CPU is one of the most heavily utilized resources in a Wazuh deployment.

The processor is responsible for:

  • Event decoding
  • Rule evaluation
  • Log parsing
  • File integrity monitoring
  • Data compression
  • Search execution
  • Dashboard queries
  • OpenSearch indexing

High CPU utilization often indicates one or more of the following:

  • Excessive event ingestion
  • Inefficient custom rules
  • Large FIM workloads
  • Complex regular expressions
  • Heavy search activity
  • Frequent OpenSearch garbage collection

Monitor sustained CPU usage rather than occasional spikes.

Temporary increases during scheduled scans or indexing operations are normal, while consistently high utilization usually indicates a bottleneck that requires investigation.
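The distinction between a spike and sustained load can be made concrete by looking for consecutive samples above a threshold. This is a minimal sketch: the 85% threshold, the run length of five samples, and the function names are illustrative assumptions, not Wazuh recommendations.

```python
def longest_run_above(samples: list[float], threshold: float) -> int:
    """Return the length of the longest consecutive run of samples above threshold."""
    longest = current = 0
    for value in samples:
        current = current + 1 if value > threshold else 0
        longest = max(longest, current)
    return longest


def is_sustained_high(samples: list[float], threshold: float = 85.0, min_run: int = 5) -> bool:
    """True when CPU stayed above threshold for at least min_run consecutive samples."""
    return longest_run_above(samples, threshold) >= min_run


# Hypothetical one-minute CPU samples (percent).
scheduled_scan = [20, 95, 30, 25, 97, 22]      # isolated spikes
overloaded = [60, 88, 90, 92, 91, 89, 93]      # stays high

print(is_sustained_high(scheduled_scan), is_sustained_high(overloaded))
```

Alerting on the run length rather than on any single sample avoids paging someone every time a scheduled scan briefly saturates a core.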

Whenever possible:

  • Separate the Wazuh Manager and OpenSearch Indexer onto dedicated servers.
  • Scale horizontally for enterprise deployments.
  • Reduce unnecessary workload before increasing CPU resources.

Memory

Memory plays a critical role in maintaining smooth performance across every component.

Insufficient RAM can lead to:

  • Swapping
  • Slow searches
  • Queue backlogs
  • Delayed alerts
  • Dashboard latency
  • OpenSearch instability

Memory is particularly important for:

  • OpenSearch heap allocation
  • Operating system page cache
  • Search caches
  • Manager processing queues
  • Agent buffers

Regular monitoring helps identify gradual memory growth that may indicate oversized indexes, insufficient heap allocation, or increasing workload.

Disk Performance

Security monitoring platforms perform continuous disk operations.

Examples include:

  • Writing alerts
  • Reading log files
  • Updating indexes
  • Performing snapshots
  • Merging index segments
  • Searching historical data

Traditional hard drives often become performance bottlenecks under sustained indexing workloads.

Solid-state drives (SSD) and NVMe storage typically provide:

  • Faster indexing
  • Lower search latency
  • Quicker recovery
  • Improved dashboard responsiveness
  • Better cluster stability

Storage performance frequently has a greater impact on OpenSearch responsiveness than additional CPU cores.

Network Bandwidth

Every agent continuously communicates with the Wazuh manager.

Bandwidth requirements increase as organizations collect:

  • Security logs
  • File integrity events
  • Vulnerability information
  • Cloud telemetry
  • Container logs
  • Windows Event Logs

Network congestion may result in:

  • Delayed event delivery
  • Agent disconnections
  • Increased processing queues
  • Synchronization delays
  • Dropped messages

While most deployments do not saturate modern enterprise networks, geographically distributed environments should monitor network latency and bandwidth utilization to ensure reliable agent communication.

Expert Insight: Wazuh recommends sizing infrastructure based on expected event volume and deployment scale rather than endpoint count alone. A relatively small number of servers generating high log volumes can consume more resources than thousands of lightly monitored endpoints.

High CPU utilization is often caused by workload distribution rather than insufficient hardware.

See Why Is Wazuh Using High CPU? Troubleshooting Guide for practical troubleshooting techniques.


Agent Configuration

The Wazuh agent serves as the first stage of the data collection pipeline.

Efficient agent configuration reduces unnecessary workload before events ever reach the manager, making it one of the most effective ways to optimize overall platform performance.

Instead of processing every available data source, configure agents to collect only information that supports your organization’s security objectives.

Monitoring Frequency

Monitoring frequency determines how often an agent performs scheduled tasks such as inventory collection, policy evaluation, and integrity scans.

Very short intervals increase:

  • CPU utilization
  • Disk activity
  • Network traffic
  • Event generation

Longer intervals reduce resource consumption while remaining appropriate for information that changes infrequently.

Different monitoring tasks should use intervals that reflect the expected rate of change.

For example:

  • Hardware inventory may only require daily collection.
  • Software inventory may be collected every few hours.
  • Security logs should be monitored continuously.
  • File Integrity Monitoring depends on the sensitivity of monitored files.

Module Selection

Every enabled module consumes system resources.

Common Wazuh modules include:

  • Logcollector
  • Syscheck
  • Syscollector
  • Rootcheck
  • Vulnerability Detection
  • Security Configuration Assessment
  • Active Response

Not every endpoint requires every module.

For example:

  • Database servers may prioritize log monitoring.
  • Domain controllers may emphasize authentication events.
  • Development systems may require different monitoring than production servers.
  • Container hosts may benefit from specialized configurations.

Disabling unnecessary modules reduces endpoint overhead and lowers the total event volume processed by the manager.

Scan Intervals

Scheduled scans should balance detection speed with resource consumption.

Aggressive scanning schedules may:

  • Increase endpoint CPU usage.
  • Generate duplicate data.
  • Produce unnecessary network traffic.
  • Create processing spikes on the manager.

Review scan schedules for:

  • Syscheck
  • Rootcheck
  • Syscollector
  • Vulnerability Detection
  • Security Configuration Assessment

Adjust intervals based on operational requirements rather than using identical settings across every endpoint.

Event Buffering

Temporary spikes in event generation can overwhelm network links or the Wazuh manager.

Event buffering helps agents temporarily store events until they can be transmitted successfully.

Proper buffering improves reliability by:

  • Reducing dropped events
  • Handling temporary network interruptions
  • Smoothing traffic bursts
  • Preventing unnecessary retransmissions

However, excessively large buffers may increase endpoint memory usage and delay alert delivery if events accumulate faster than they can be processed.

Finding the appropriate balance depends on expected event volume and network reliability.
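The trade-off between buffer size and drain rate can be explored with a small simulation. This is a simplified model under stated assumptions: it is loosely inspired by a bounded queue with a fixed send rate, it is not the agent's actual buffering implementation, and the function name and all numbers are hypothetical.

```python
def simulate_buffer(arrivals_per_second: list[int], queue_size: int, drain_per_second: int) -> dict[str, int]:
    """Simulate a bounded queue that drains at a fixed rate, one step per second."""
    queued = dropped = peak = 0
    for arrivals in arrivals_per_second:
        queued += arrivals
        if queued > queue_size:
            # Anything beyond the queue capacity is lost.
            dropped += queued - queue_size
            queued = queue_size
        peak = max(peak, queued)
        # Send as many events as the rate limit allows this second.
        queued -= min(queued, drain_per_second)
    return {"dropped": dropped, "peak_queue": peak, "remaining": queued}


burst = [100, 900, 900, 100, 0]  # hypothetical events generated per second
print(simulate_buffer(burst, queue_size=1000, drain_per_second=500))
print(simulate_buffer(burst, queue_size=5000, drain_per_second=500))
```

With the smaller queue, part of the burst is dropped; with the larger one, nothing is lost but events wait longer before being sent, which mirrors the reliability versus latency trade-off described above.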

Expert Insight: Many experienced Wazuh administrators recommend optimizing agents before tuning the manager because every unnecessary event eliminated at the endpoint reduces processing, indexing, storage, and search workload throughout the entire platform.

If agents are generating excessive log traffic that overwhelms the manager, see Fix Wazuh Logcollector Dropped Messages for techniques to improve ingestion reliability.


Measuring Wazuh Performance

Performance optimization should always be driven by measurable data rather than assumptions.

Establishing performance baselines allows administrators to identify bottlenecks, validate configuration changes, and monitor long-term trends as the environment grows.

Regular monitoring also helps detect gradual degradation before it affects security operations.

Performance Metrics to Monitor

Several key metrics provide a comprehensive view of overall Wazuh health.

Rather than focusing on a single resource, monitor the entire processing pipeline, from endpoint collection to dashboard visualization, to identify where delays originate.

CPU Utilization

CPU usage indicates how efficiently the platform processes incoming events.

Monitor CPU consumption for:

  • Wazuh agents
  • Wazuh Manager
  • OpenSearch Indexer
  • Dashboard server

Sustained high CPU utilization often indicates:

  • Excessive log volume
  • Expensive detection rules
  • Heavy File Integrity Monitoring
  • Large search workloads
  • Insufficient hardware resources

Trend CPU usage over time to identify workload growth before it becomes a critical issue.

Memory Consumption

Memory usage provides insight into system stability.

Monitor:

  • Total RAM utilization
  • JVM heap usage
  • Swap activity
  • Operating system page cache
  • Process memory growth

Unexpected increases may indicate:

  • Memory leaks
  • Oversized indexes
  • Growing search caches
  • Poor heap allocation

Consistent monitoring helps prevent unexpected service interruptions.

Disk Usage

Storage monitoring should include both capacity and performance.

Track:

  • Available disk space
  • Disk throughput
  • IOPS
  • Read latency
  • Write latency
  • Snapshot storage

Running out of storage can halt indexing, while slow storage significantly increases search and dashboard response times.

Indexing Latency

Indexing latency measures how quickly alerts become searchable after being generated.

Increasing latency often indicates:

  • Slow disks
  • Insufficient heap memory
  • Indexing backlogs
  • Large merge operations
  • Heavy ingestion workloads

Keeping indexing delays low ensures analysts can investigate threats in near real time.

Search Latency

Search latency measures how long OpenSearch requires to execute queries.

Slow searches may result from:

  • Large indexes
  • Poor shard sizing
  • Expensive aggregations
  • Insufficient memory
  • Heavy concurrent searches

Tracking search performance helps maintain a responsive dashboard experience.

Queue Sizes

Internal queues temporarily hold events awaiting processing.

Monitor queue growth throughout the pipeline.

Rapidly increasing queues often indicate downstream bottlenecks such as:

  • Overloaded managers
  • Slow indexing
  • Network congestion
  • Rule evaluation delays

Persistent queue growth should be investigated before events begin dropping.
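
One way to watch for persistent growth is to sample a queue-usage figure at intervals and flag a run of strictly increasing readings. The sketch below assumes the manager exposes statistics as key='value' lines in a state file (for example wazuh-analysisd.state under the installation's var/run directory); the field names in the sample text are illustrative assumptions and should be checked against your version.

```python
def parse_state(text):
    """Parse key='value' lines, skipping blanks and '#' comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, raw = line.partition("=")
        values[key.strip()] = raw.strip().strip("'\"")
    return values


def queue_is_growing(usages, min_samples=3):
    """True when the last min_samples readings strictly increase."""
    recent = usages[-min_samples:]
    if len(recent) < min_samples:
        return False
    return all(a < b for a, b in zip(recent, recent[1:]))


# Hypothetical state-file content used only for illustration.
sample = "# Event queue\nevent_queue_usage='0.42'\nevent_queue_size='16384'\n"
state = parse_state(sample)
print(state["event_queue_usage"], queue_is_growing([0.10, 0.25, 0.42]))
```

A strictly increasing run is a deliberately simple signal; a production check would more likely compare against a moving average.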

Agent Connection Status

Healthy agents continuously communicate with the Wazuh manager.

Monitor:

  • Connected agents
  • Disconnected agents
  • Authentication failures
  • Communication latency
  • Synchronization delays

Unexpected agent disconnects may indicate network issues, overloaded managers, certificate problems, or endpoint resource exhaustion.

Events per Second (EPS)

Events per Second (EPS) is one of the most important capacity planning metrics.

Tracking EPS helps administrators:

  • Estimate infrastructure requirements
  • Detect workload spikes
  • Measure optimization improvements
  • Forecast future hardware needs

Monitor both:

  • Average EPS
  • Peak EPS

Peak ingestion rates often determine infrastructure sizing because temporary spikes can overload systems even when average workloads remain relatively low.
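
The difference between average and peak EPS can be illustrated with a small calculation. This is a minimal sketch that assumes you already have event timestamps in epoch seconds (for example, extracted from an alerts file); the function name and sample data are hypothetical.

```python
from collections import Counter


def eps_stats(timestamps):
    """Return (average_eps, peak_eps) for event times given in epoch seconds."""
    if not timestamps:
        return 0.0, 0
    # Bucket events by whole second to find the busiest second.
    per_second = Counter(int(t) for t in timestamps)
    span = max(per_second) - min(per_second) + 1
    average = len(timestamps) / span
    peak = max(per_second.values())
    return average, peak


# Ten events over four seconds, six of them in a single burst.
sample = [100.1, 100.7, 101.2, 102.0, 102.1, 102.2, 102.3, 102.4, 102.9, 103.5]
print(eps_stats(sample))
```

In this sample the peak is more than double the average, which is the gap that sizing decisions need to account for.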

Expert Insight: Capacity planning guides from OpenSearch emphasize monitoring workload trends over time rather than relying on instantaneous resource usage. Long-term metrics reveal growth patterns and help organizations scale infrastructure before performance degradation impacts production environments.

If monitoring reveals excessive manager CPU utilization during peak ingestion periods, see Why Is Wazuh Using High CPU? Troubleshooting Guide.


Useful Linux Monitoring Tools

Effective Wazuh performance tuning requires visibility at the operating system level.

Linux provides a set of low-level diagnostic tools that help identify CPU saturation, memory pressure, disk bottlenecks, and I/O contention.

These tools are essential for distinguishing between application-level inefficiencies and infrastructure constraints.

top

top provides a real-time view of system resource utilization.

It helps identify:

  • Processes consuming high CPU
  • Memory-heavy services
  • Load averages
  • System-wide resource pressure

In Wazuh environments, top is commonly used to detect spikes in:

  • Wazuh Manager CPU usage during rule evaluation
  • OpenSearch JVM memory consumption
  • Log processing surges during ingestion bursts

htop

htop is an enhanced, interactive version of top.

It provides:

  • Color-coded CPU and memory usage
  • Per-core CPU utilization
  • Easier process navigation
  • Tree view of process relationships

It is particularly useful for quickly identifying whether bottlenecks originate from:

  • OpenSearch (Java processes)
  • Wazuh manager processes
  • System-level I/O contention

vmstat

vmstat provides insight into system performance at the kernel level.

It reports:

  • CPU scheduling
  • Memory usage
  • Swap activity
  • Block I/O
  • System interrupts

Key indicators of performance issues include:

  • High swap usage (memory pressure)
  • High CPU wait time (I/O bottlenecks)
  • Frequent context switching (overloaded CPU)

iostat

iostat focuses on disk performance and is critical for diagnosing OpenSearch bottlenecks.

It helps monitor:

  • Disk read/write throughput
  • I/O wait times
  • Device utilization

High I/O wait is a strong indicator that:

  • Indexing is saturating storage
  • Disk latency is limiting search performance
  • Snapshot or merge operations are overwhelming the system
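
The I/O wait figure these tools report can also be derived directly from the kernel's counters. The sketch below computes the share of CPU time spent in iowait between two samples of the aggregate "cpu" line from /proc/stat; the sample lines are made up for illustration, and the helper name is hypothetical.

```python
def iowait_percent(before, after):
    """Percent of CPU time spent in iowait between two 'cpu' lines from /proc/stat."""
    a = [int(x) for x in before.split()[1:]]
    b = [int(x) for x in after.split()[1:]]
    deltas = [y - x for x, y in zip(a, b)]
    # Fields: user nice system idle iowait irq softirq steal (guest time is
    # already counted inside user/nice, so only the first eight are summed).
    total = sum(deltas[:8])
    return 100.0 * deltas[4] / total if total else 0.0


# Two illustrative samples taken some seconds apart.
before = "cpu  1000 0 500 8000 200 0 50 0 0 0"
after = "cpu  1100 0 550 8600 440 0 60 0 0 0"
print(iowait_percent(before, after))
```

On a live system the two lines would come from reading /proc/stat twice with a short sleep in between.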

sar

sar (System Activity Reporter) is useful for historical performance analysis.

It tracks:

  • CPU utilization over time
  • Memory consumption trends
  • Network activity
  • Disk I/O history

Unlike real-time tools, sar is valuable for identifying recurring performance patterns such as:

  • Daily ingestion spikes
  • Scheduled scan overhead
  • Nightly indexing pressure

free

free provides a snapshot of system memory usage.

It shows:

  • Total RAM
  • Used memory
  • Available memory
  • Buffers and cache

In Wazuh deployments, low available memory often correlates with:

  • OpenSearch heap pressure
  • Large query workloads
  • Excessive indexing activity

df

df monitors disk space usage.

It is essential for ensuring:

  • Index storage does not reach capacity limits
  • Log partitions do not fill up
  • Snapshot repositories remain functional

Running out of disk space can halt indexing entirely, making this one of the most critical monitoring tools.
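
The same check can be automated. The following sketch classifies disk usage against thresholds of 85, 90, and 95 percent, which are assumed here to mirror OpenSearch's default low, high, and flood-stage disk watermarks; adjust them if your cluster settings differ. The path passed in is a placeholder.

```python
import shutil

# Assumed thresholds, highest first, modeled on OpenSearch's default watermarks.
WATERMARKS = [(95.0, "flood_stage"), (90.0, "high"), (85.0, "low")]


def classify_usage(used_percent):
    """Map a used-space percentage to a watermark name."""
    for threshold, name in WATERMARKS:
        if used_percent >= threshold:
            return name
    return "ok"


def check_path(path):
    """Return (used_percent, status) for the filesystem containing path."""
    usage = shutil.disk_usage(path)
    used_percent = 100.0 * usage.used / usage.total if usage.total else 0.0
    return used_percent, classify_usage(used_percent)


# "." is a placeholder; point this at the indexer's data directory.
print(check_path("."))
```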

dstat

dstat provides a combined view of CPU, memory, disk, and network usage.

It is especially useful for:

  • Correlating network spikes with event ingestion
  • Identifying I/O bursts during indexing
  • Observing system-wide resource contention in real time

Wazuh Logs That Help Diagnose Performance Problems

Wazuh generates multiple log streams across its architecture.

These logs are essential for diagnosing performance bottlenecks, failed processing stages, and system-level inefficiencies.

Each component provides different visibility into system behavior.

Manager Logs

The Wazuh manager logs are the primary source of operational diagnostics.

They help identify:

  • Rule evaluation delays
  • Event decoding errors
  • Queue overflows
  • Active response execution issues
  • Agent communication problems

Common performance-related symptoms include:

  • Increased event latency warnings
  • Buffer overflow messages
  • Rule processing bottlenecks
  • Dropped event indicators

When diagnosing high CPU usage or alert delays, manager logs are usually the first place to investigate.

If manager CPU is consistently high during event processing, see Why Is Wazuh Using High CPU? Troubleshooting Guide.

Agent Logs

Agent logs provide insight into endpoint-side performance issues.

They help identify:

  • Logcollector failures
  • File Integrity Monitoring overload
  • Syscollector delays
  • Connectivity issues with the manager
  • Buffer saturation on endpoints

Typical performance signals include:

  • Missed log entries
  • High local CPU usage on endpoints
  • Buffer overflow warnings
  • Delayed event transmission

Agent-side issues often cascade into manager-side performance problems when events are retransmitted or batched inefficiently.

OpenSearch Logs

OpenSearch logs are critical for diagnosing indexing and search performance issues.

They reveal:

  • Heap memory pressure
  • Garbage collection activity
  • Slow queries
  • Shard rebalancing
  • Indexing failures
  • Disk watermark warnings

Common performance indicators include:

  • Long GC pause times
  • Thread pool rejections
  • Index write delays
  • Shard allocation failures

These logs are essential when dashboards become slow or alerts are delayed in appearing.

For memory-related crashes or instability, see How to Tune OpenSearch Heap Size to Stop Wazuh High Memory Crashes.

Dashboard Logs

The Wazuh Dashboard logs help diagnose frontend and query-layer performance issues.

They include:

  • API request latency
  • Failed query executions
  • Visualization rendering errors
  • Authentication delays
  • Backend connection issues

While the dashboard is rarely the root cause of performance issues, it often exposes upstream problems such as slow indexing or inefficient queries.


Optimizing Wazuh Agents

Wazuh agents are the first line of data collection and have a significant impact on overall system performance.

Poorly configured agents generate excessive data, increasing load across the entire pipeline, from network transmission to manager processing and OpenSearch indexing.

Effective optimization focuses on reducing unnecessary telemetry while preserving security visibility.

Reduce Unnecessary Log Collection

Not all logs provide meaningful security value.

Collecting everything leads to unnecessary noise, higher CPU usage, and increased storage consumption.

Focus on:

  • Security-relevant logs
  • Authentication events
  • System-critical application logs
  • Endpoint behavior indicators

Avoid collecting:

  • Debug logs in production
  • High-frequency application logs
  • Redundant telemetry sources

Reducing log collection at the source is one of the most effective performance optimizations available.

Exclude Noisy Log Sources

Certain log sources generate excessive, low-value events.

Common examples include:

  • Browser caches
  • Temporary application files
  • Container runtime logs
  • Build directories
  • High-frequency debug outputs

Excluding these sources prevents unnecessary ingestion and reduces downstream processing load.

Filter Unnecessary Events

Filtering allows agents to discard irrelevant events before transmission.

This reduces:

  • Network bandwidth usage
  • Manager CPU load
  • Indexing overhead
  • Storage consumption

Event filtering is particularly useful in high-volume environments where only a subset of logs is relevant for security monitoring.

Limit Verbose Applications

Applications running in verbose or debug mode can overwhelm Wazuh systems with excessive logs.

Examples include:

  • Web servers in debug mode
  • Database systems with query logging enabled
  • Development environments
  • Container orchestration platforms with high verbosity settings

Whenever possible, adjust logging levels to production-appropriate settings while preserving security-relevant events.


Optimize File Integrity Monitoring

File Integrity Monitoring (FIM) is one of the most resource-intensive Wazuh features.

Proper optimization is essential for maintaining system stability and preventing unnecessary CPU and disk usage.

See How to Stop Wazuh File Integrity Monitoring (FIM) From Eating Your CPU for a deeper breakdown of optimization strategies.

Reduce Monitored Directories

Monitoring fewer directories significantly reduces CPU usage and event generation.

Prioritize:

  • System binaries
  • Security-critical configuration files
  • Authentication directories
  • Application configuration paths

Avoid broad directory monitoring such as entire file systems or user home directories unless explicitly required.

Exclude Temporary Folders

Temporary and cache directories generate constant file changes that produce high event volumes.

Common exclusions include:

  • /tmp
  • Application cache directories
  • Browser cache locations
  • Build output directories
  • Container ephemeral storage

Excluding these paths prevents unnecessary FIM load.

Increase Scan Intervals

Frequent scans can overwhelm endpoints, especially in large file systems.

Increasing scan intervals:

  • Reduces CPU usage
  • Decreases disk I/O
  • Lowers event volume

This is particularly effective for non-critical directories.

Disable Unnecessary Hashing

Hash calculation is one of the most expensive operations in FIM.

Reducing hashing frequency or limiting it to critical files helps:

  • Lower CPU consumption
  • Reduce disk I/O
  • Improve scan performance

Only enable hashing where integrity verification is truly required.

Monitor Only Critical Files

The most effective FIM optimization strategy is narrowing scope.

Focus on:

  • Authentication files
  • System binaries
  • Configuration files
  • Privilege escalation paths

Avoid monitoring files that change frequently without security implications.
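
The scoping ideas above (fewer directories, explicit exclusions, longer intervals, selective hashing) can be sketched as a small generator for a syscheck configuration fragment. The element and attribute names used here (frequency, directories, ignore, check_all, check_sum) follow Wazuh's syscheck options as commonly documented, but treat them as assumptions and verify them against the reference for your version. The paths and interval are examples only.

```python
import xml.etree.ElementTree as ET


def build_syscheck(directories, ignores, frequency_seconds):
    """Render a <syscheck> fragment from (path, hashing_enabled) pairs."""
    root = ET.Element("syscheck")
    ET.SubElement(root, "frequency").text = str(frequency_seconds)
    for path, hashing in directories:
        attrs = {"check_all": "yes"}
        if not hashing:
            # Assumed option: skip hash calculation where it is not needed.
            attrs["check_sum"] = "no"
        ET.SubElement(root, "directories", attrs).text = path
    for path in ignores:
        ET.SubElement(root, "ignore").text = path
    return ET.tostring(root, encoding="unicode")


fragment = build_syscheck(
    directories=[("/etc", True), ("/var/www", False)],
    ignores=["/tmp", "/var/cache"],
    frequency_seconds=43200,  # scan every 12 hours
)
print(fragment)
```

Generating the fragment from a short list makes the monitored scope easy to review, which is the point of narrowing it in the first place.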


Optimize Scheduled Scans

Scheduled scans contribute significantly to endpoint and manager workload, especially in large environments.

Proper tuning ensures consistent performance without compromising detection coverage.

Syscheck

Syscheck scans detect file changes and configuration modifications.

Poor configuration can result in excessive CPU usage and large event volumes.

Optimization strategies include:

  • Reducing scan scope
  • Increasing scan intervals
  • Excluding high-churn directories

Rootcheck

Rootcheck identifies rootkits and system compromises.

To optimize performance:

  • Avoid overly frequent scans
  • Focus on critical endpoints
  • Schedule scans during off-peak hours

Vulnerability Scans

Vulnerability detection consumes CPU and network resources.

Optimization approaches include:

  • Staggering scan schedules
  • Reducing scan frequency on stable systems
  • Prioritizing high-risk assets

Inventory Collection

Inventory modules (Syscollector) gather system information.

To reduce overhead:

  • Increase collection intervals
  • Limit unnecessary data types
  • Avoid redundant collection across environments


Tune Agent Resource Usage

Agent-level tuning is one of the highest-leverage optimization strategies in Wazuh because every event eliminated at the endpoint reduces load across the entire pipeline: manager processing, indexing, storage, and search.

Reduce Polling Frequency

Frequent polling increases CPU usage, disk activity, and network traffic on endpoints.

Adjust polling intervals based on how often data actually changes:

  • Increase Syscollector intervals for stable systems
  • Reduce inventory refresh frequency on large fleets
  • Avoid overly aggressive scan schedules for low-risk endpoints

Over-polling often produces redundant data without improving detection capability.

Optimize Buffering

Agent buffering temporarily stores events when network or manager throughput is limited.

Proper tuning helps:

  • Smooth traffic spikes
  • Prevent event loss during transient outages
  • Reduce retransmission overhead

However, excessive buffering can:

  • Increase endpoint memory usage
  • Delay event delivery
  • Mask upstream bottlenecks

Buffer size should reflect expected peak ingestion, not theoretical maximums.
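
A quick calculation shows how long a buffer can absorb a burst. This sketch assumes a buffer with a fixed capacity in events and a fixed drain rate in events per second, which is how the agent's client buffer is generally described (queue size and events-per-second limit); the numbers below are illustrative, not recommendations.

```python
def seconds_until_full(queue_size, drain_eps, incoming_eps):
    """Seconds a buffer absorbs a burst before filling; None if it never fills."""
    surplus = incoming_eps - drain_eps
    if surplus <= 0:
        return None
    return queue_size / surplus


# Illustrative values: 5000-event buffer draining at 500 EPS.
print(seconds_until_full(5000, 500, 1500))  # burst of 1500 EPS
print(seconds_until_full(5000, 500, 400))   # sustained load below the drain rate
```

If expected bursts last longer than the first result, either the buffer, the drain rate, or the volume generated at the source has to change.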

Disable Unused Modules

Every enabled module consumes CPU, memory, and I/O resources.

Commonly unnecessary modules depending on environment include:

  • Vulnerability Detection on non-production systems
  • Rootcheck on containerized workloads
  • Syscollector on short-lived instances
  • Active Response where manual remediation is preferred

Disabling unused modules reduces endpoint overhead and significantly lowers total event volume entering the system.


Optimizing the Wazuh Manager

The Wazuh Manager is responsible for decoding events, evaluating rules, performing correlation, and generating alerts.

It is often the primary CPU bottleneck in large deployments.

Optimize Rule Processing

Rule evaluation is one of the most expensive operations in the Wazuh pipeline.

Each incoming event is compared against thousands of rules, making efficiency critical.

Remove Unused Rules

Unused or irrelevant rules still consume CPU during evaluation.

Optimization steps include:

  • Disabling unused compliance packs
  • Removing legacy detections
  • Eliminating duplicate rule sets
  • Pruning environment-specific irrelevant rules

A smaller, well-maintained ruleset significantly improves throughput.

Simplify Regex Patterns

Regular expressions are computationally expensive and should be used sparingly.

Optimization strategies:

  • Prefer exact string matching over regex when possible
  • Anchor patterns to reduce backtracking
  • Avoid nested wildcards and overly broad expressions
  • Limit regex to high-value detections only

Even minor regex improvements can reduce CPU usage at scale.
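
The first two strategies can be illustrated outside Wazuh. The sketch below uses Python's re module, not Wazuh's own regex engine, so it demonstrates the principle rather than actual rule syntax: a cheap literal check rejects most lines, and the pattern is anchored at the start of the message. The pattern and sample messages are hypothetical.

```python
import re

# Hypothetical detection pattern, anchored so non-matching lines fail fast.
FAILED_LOGIN = re.compile(r"^Failed password for (?:invalid user )?(\S+) from (\S+)")


def extract_failed_login(message):
    """Return (user, source_ip) for a failed-login message, else None."""
    # Literal prefix test first: most lines never reach the regex engine.
    if not message.startswith("Failed password"):
        return None
    match = FAILED_LOGIN.match(message)
    return match.groups() if match else None


print(extract_failed_login("Failed password for invalid user admin from 10.0.0.5 port 22 ssh2"))
print(extract_failed_login("Accepted password for bob from 10.0.0.6 port 22 ssh2"))
```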

Optimize Rule Order

Wazuh evaluates rules in a hierarchy, checking child rules only after a parent rule matches, so poorly structured rules increase the work done for every event.

Best practices:

  • Place high-frequency rules early
  • Filter benign events before complex evaluation
  • Prioritize simple conditions before expensive logic

Efficient rule ordering reduces unnecessary computation.

Reduce Expensive Correlations

Correlation rules combine multiple events into higher-level detections but are computationally intensive.

To optimize:

  • Limit correlation depth
  • Avoid overly broad matching windows
  • Use correlation only for high-confidence detections
  • Pre-filter events before correlation logic executes

Reduce False Positives

False positives increase system load by generating unnecessary alerts, increasing indexing volume, and overwhelming analysts.

See How to Reduce False Positives in Wazuh for detailed tuning strategies.

Rule Tuning

Fine-tuning detection rules improves both accuracy and performance.

Approaches include:

  • Adjusting rule severity levels
  • Narrowing event conditions
  • Disabling overly sensitive detections
  • Aligning rules with real environment behavior

Well-tuned rules reduce unnecessary processing downstream.

Threshold Adjustments

Threshold-based rules trigger only after a defined number of events occur.

Proper tuning:

  • Reduces alert noise
  • Prevents repeated triggering for benign behavior
  • Improves signal-to-noise ratio

However, thresholds must be balanced to avoid missing genuine threats.
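
The trade-off is easier to see with a sliding-window counter. The sketch below fires once a given number of events occur within a time window, loosely modeled on the frequency and timeframe idea used in Wazuh rules; it is a simplified illustration and does not reproduce Wazuh's exact counting semantics. The class name is hypothetical.

```python
from collections import deque


class FrequencyThreshold:
    """Fire when `frequency` events occur within `timeframe` seconds."""

    def __init__(self, frequency, timeframe):
        self.frequency = frequency
        self.timeframe = timeframe
        self.seen = deque()

    def observe(self, timestamp):
        """Record one event; return True when the threshold is reached."""
        self.seen.append(timestamp)
        # Drop events that have fallen out of the sliding window.
        while self.seen and timestamp - self.seen[0] > self.timeframe:
            self.seen.popleft()
        if len(self.seen) >= self.frequency:
            self.seen.clear()  # start counting again after firing
            return True
        return False


threshold = FrequencyThreshold(frequency=3, timeframe=60)
results = [threshold.observe(t) for t in [0, 10, 100, 110, 120]]
print(results)
```

Raising the frequency or shortening the timeframe reduces noise, but the early events at 0 and 10 seconds show how slow, spaced-out activity can stay under the threshold.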

Event Suppression

Event suppression prevents repeated alerts from identical or low-value events.

Benefits include:

  • Reduced indexing load
  • Lower storage usage
  • Improved dashboard clarity

Suppression should be applied carefully to avoid hiding meaningful anomalies.

Custom Rule Refinement

Custom rules should be reviewed regularly to ensure efficiency.

Key improvements:

  • Remove redundant conditions
  • Avoid overlapping logic
  • Consolidate similar rules
  • Optimize field matching

Poorly designed custom rules are a common source of performance degradation.


Improve Queue Performance

Wazuh uses internal queues to manage event flow between agents, the manager, and the indexer.

Queue inefficiencies often lead to event delays or drops.

Event Queues

Event queues temporarily store incoming logs before processing.

When queues become saturated:

  • Events are delayed
  • Memory usage increases
  • Processing latency grows

Queue saturation typically indicates downstream bottlenecks in rule processing or indexing.

Processing Workers

Processing workers handle event decoding and rule evaluation.

To optimize:

  • Ensure sufficient worker allocation for workload size
  • Scale horizontally in high-ingestion environments
  • Avoid CPU contention between manager processes

Insufficient workers lead to backlogs and delayed alert generation.

Connection Tuning

Agent-to-manager connections must be stable and efficient.

Optimization includes:

  • Proper TCP configuration
  • Load balancing for large deployments
  • Reducing connection churn
  • Ensuring consistent network latency

Connection instability increases retransmissions and queue pressure.


Optimize Active Response

Active Response automates mitigation actions but can become a performance burden if misconfigured.

Avoid Unnecessary Executions

Each response action consumes CPU and system resources.

Avoid triggering responses for:

  • Low-confidence alerts
  • High-frequency benign events
  • Non-actionable detections

Overuse of automation can significantly increase system load.

Configure Cooldown Periods

Cooldown periods prevent repeated execution of the same response within a short timeframe.

Benefits include:

  • Reduced system thrashing
  • Lower CPU usage
  • Prevention of redundant actions

Cooldowns are essential in noisy environments.

Limit Automation Scope

Active Response should be reserved for high-confidence threats.

Best practices:

  • Apply to critical severity rules only
  • Restrict execution to specific endpoints
  • Avoid broad system-wide automation

This ensures responsiveness without overwhelming system resources.


Optimizing OpenSearch Performance

OpenSearch is responsible for indexing, storing, and searching Wazuh alerts.

Poor configuration here can severely impact dashboard performance and alert visibility.

Tune JVM Heap Size

JVM heap size directly affects indexing stability and search performance.

See How to Tune OpenSearch Heap Size to Stop Wazuh High Memory Crashes for detailed configuration guidance.

Recommended Heap Allocation

General best practices include:

  • Allocate ~50% of system RAM to heap (up to a safe limit)
  • Keep the heap below ~32 GB so the JVM can continue using compressed object pointers
  • Ensure remaining RAM is available for OS file cache

Balanced heap allocation improves both indexing and search performance.
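
These guidelines reduce to a simple calculation. The sketch below takes half of system RAM, caps it at 31 GB (an assumed conservative ceiling just under the ~32 GB threshold), and emits matching minimum and maximum heap flags. The function names are hypothetical.

```python
def recommended_heap_gb(total_ram_gb, ceiling_gb=31):
    """Half of RAM, capped below the ~32 GB threshold, never less than 1 GB."""
    return max(1, min(total_ram_gb // 2, ceiling_gb))


def jvm_options(total_ram_gb):
    """Return -Xms/-Xmx flags with identical values for a fixed-size heap."""
    heap = recommended_heap_gb(total_ram_gb)
    return [f"-Xms{heap}g", f"-Xmx{heap}g"]


for ram in (8, 16, 64, 128):
    print(ram, jvm_options(ram))
```

Setting both flags to the same value keeps the heap from resizing at runtime; the remaining RAM is left to the operating system's file cache.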

Garbage Collection Tuning

Garbage collection (GC) affects latency and responsiveness.

Symptoms of poor GC tuning:

  • Query delays
  • Indexing pauses
  • CPU spikes
  • Irregular performance patterns

Optimizing GC reduces pause times and improves system stability.

Avoid Oversized Heaps

Excessively large heaps can:

  • Increase GC pause duration
  • Reduce OS cache efficiency
  • Degrade search performance

Proper sizing is more effective than simply maximizing memory allocation.


Optimize Index Management

Efficient index management ensures OpenSearch remains performant as data grows.

Index Lifecycle Management

In OpenSearch, index lifecycle management is provided by Index State Management (ISM). An ISM policy automates index transitions through states such as:

  • Hot (active indexing)
  • Warm (reduced activity)
  • Cold (archival)
  • Delete (removal)

This prevents uncontrolled index growth.

Index Rotation

Regular index rotation:

  • Limits shard size
  • Improves search efficiency
  • Reduces indexing overhead

Proper rotation policies are essential for long-term scalability.

Retention Policies

Retention policies define how long data is stored.

Benefits:

  • Controlled storage growth
  • Faster queries
  • Reduced maintenance overhead

Retention should align with compliance requirements.
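
Retention has a direct, calculable effect on storage. The following back-of-the-envelope sketch estimates indexed storage from average EPS, average event size, retention, and replica count. The overhead factor is an assumption standing in for index structures and headroom, and real on-disk size depends on mappings and compression.

```python
def storage_estimate_gb(avg_eps, avg_event_bytes, retention_days,
                        replicas=1, overhead=1.15):
    """Rough indexed storage in GiB for a given retention period."""
    daily_bytes = avg_eps * avg_event_bytes * 86_400  # seconds per day
    total_bytes = daily_bytes * retention_days * (1 + replicas) * overhead
    return total_bytes / 1024 ** 3


# Illustrative inputs: 1000 EPS, 1 KiB per event, 90 days, one replica.
print(round(storage_estimate_gb(1000, 1024, 90), 1))
```

Halving the retention period or the event volume halves the estimate, which is why retention and source-side filtering are usually reviewed together.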

Delete Old Indices

Old indices should be removed or archived to prevent:

  • Storage exhaustion
  • Slow searches
  • Increased cluster overhead

Automated cleanup improves long-term performance stability.
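
In OpenSearch, automated cleanup is typically expressed as an Index State Management (ISM) policy. The sketch below builds a minimal two-state policy body (hot, then delete after a minimum index age) as a Python dictionary. The index pattern and the 90-day age are assumptions for illustration; the body would be submitted through the ISM policy API or the dashboard.

```python
import json


def retention_policy(index_pattern, delete_after_days):
    """Build a minimal ISM policy that deletes indices past a given age."""
    return {
        "policy": {
            "description": f"Delete {index_pattern} after {delete_after_days} days",
            "default_state": "hot",
            "states": [
                {
                    "name": "hot",
                    "actions": [],
                    "transitions": [{
                        "state_name": "delete",
                        "conditions": {"min_index_age": f"{delete_after_days}d"},
                    }],
                },
                {"name": "delete", "actions": [{"delete": {}}], "transitions": []},
            ],
            # Attach the policy automatically to newly created matching indices.
            "ism_template": {"index_patterns": [index_pattern], "priority": 1},
        }
    }


policy = retention_policy("wazuh-alerts-*", 90)
print(json.dumps(policy, indent=2))
```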


Improve Search Performance

Search performance directly affects dashboard responsiveness and analyst efficiency.

Optimize Mappings

Efficient mappings reduce indexing and search overhead.

Best practices:

  • Use appropriate field types
  • Avoid unnecessary full-text indexing
  • Disable unused fields

Poor mappings increase storage and query complexity.

Reduce Shard Count

Too many shards increase cluster overhead.

Effects include:

  • Higher memory usage
  • Slower queries
  • Increased coordination overhead

Proper shard sizing improves performance significantly.
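
Shard counts can be estimated rather than guessed. This sketch assumes a target shard size of 30 GB, a commonly cited middle-of-the-road figure rather than a rule, and daily indices; both assumptions should be adapted to your workload.

```python
import math


def primary_shards(index_size_gb, target_shard_gb=30):
    """Primary shards needed to keep each shard near the target size."""
    return max(1, math.ceil(index_size_gb / target_shard_gb))


def total_shards(daily_index_gb, retention_days, replicas=1):
    """Total shards in the cluster for daily indices kept for retention_days."""
    return primary_shards(daily_index_gb) * (1 + replicas) * retention_days


print(primary_shards(5), primary_shards(90))
print(total_shards(daily_index_gb=5, retention_days=90))
```

The second result shows how small daily indices still accumulate many shards over a long retention period, which is one argument for rolling over by size or using weekly or monthly indices.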

Merge Segments

Segment merging improves search efficiency by reducing index fragmentation.

Benefits:

  • Faster queries
  • Lower disk usage
  • Improved indexing stability

However, merging should be balanced to avoid excessive I/O load.

Query Optimization

Inefficient queries degrade performance.

Optimization strategies:

  • Avoid wildcard-heavy searches
  • Use time filters whenever possible
  • Limit aggregation complexity
  • Narrow query scope

Well-structured queries dramatically improve dashboard responsiveness.
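
As an illustration of a narrowly scoped query, the sketch below builds a search body that restricts results to a recent time window and a minimum rule level using filter clauses, and caps the number of hits returned. The field names timestamp and rule.level are assumed from the usual Wazuh alert schema; confirm them against your index mappings.

```python
def recent_high_severity_query(hours, min_level, size=50):
    """Search body limited by time range, severity, and result size."""
    return {
        "size": size,
        "track_total_hits": False,  # skip exact hit counting
        "sort": [{"timestamp": {"order": "desc"}}],
        "query": {
            "bool": {
                # Filter clauses do not compute relevance scores.
                "filter": [
                    {"range": {"timestamp": {"gte": f"now-{hours}h", "lte": "now"}}},
                    {"range": {"rule.level": {"gte": min_level}}},
                ]
            }
        },
    }


query = recent_high_severity_query(hours=24, min_level=10)
print(query)
```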


Storage Optimization

Storage is a foundational component of Wazuh performance.

SSD vs HDD

SSDs provide significantly better performance than HDDs:

  • Lower latency
  • Higher IOPS
  • Faster indexing
  • Improved search performance

HDDs often become bottlenecks in high-ingestion environments.

RAID Considerations

RAID configuration affects both redundancy and performance:

  • RAID 1 improves redundancy
  • RAID 10 balances performance and redundancy
  • RAID 5 may introduce write penalties

RAID selection should reflect workload intensity and resilience requirements.

Disk Monitoring

Continuous disk monitoring helps prevent:

  • Storage exhaustion
  • Performance degradation
  • Indexing failures

Key metrics include:

  • Disk usage
  • I/O latency
  • Throughput
  • Queue depth

Reducing High CPU Usage

High CPU usage in Wazuh environments typically results from cumulative inefficiencies across multiple components rather than a single issue.

Common Causes

File Integrity Monitoring

  • Large directory scans
  • Frequent file changes
  • Real-time monitoring overhead
  • Excessive hashing

Rule Evaluation

  • Expensive regex patterns
  • Large rulesets
  • Poor rule ordering
  • Excessive correlation logic

Large Log Volumes

  • Excessive Windows Event Logs
  • Verbose application logging
  • Duplicate log sources
  • High ingestion rates

OpenSearch Indexing

  • Large shards
  • Insufficient heap memory
  • Slow disk performance
  • High garbage collection activity

Agent Scanning

  • Frequent Syscollector scans
  • Overactive vulnerability detection
  • High-frequency polling intervals

Addressing these areas holistically produces the most significant performance improvements.


Troubleshooting High CPU

High CPU usage in Wazuh environments is rarely caused by a single component.

Instead, it typically results from a combination of excessive event ingestion, inefficient rule processing, heavy File Integrity Monitoring workloads, and indexing pressure in OpenSearch.

For a deeper breakdown of root causes and diagnostics, see Why Is Wazuh Using High CPU? Troubleshooting Guide.

Identify the Affected Process

The first step is to determine which component is consuming CPU resources.

Key processes to inspect:

  • wazuh-analysisd (decoding, rule evaluation, correlation)
  • wazuh-remoted (agent connections and event intake)
  • filebeat / log forwarders (alert shipping to the indexer)
  • java (OpenSearch JVM)
  • Agent processes (endpoint-side load)

Use system monitoring tools to isolate whether CPU usage is concentrated on:

  • A single node (localized issue)
  • A cluster-wide pattern (systemic issue)
  • Specific time windows (scheduled scans or ingestion spikes)
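
A quick way to see which component dominates is to total CPU usage per command name. The sketch below parses text in the format produced by `ps -eo pcpu,comm`; the sample output and the process names in it are made up for illustration.

```python
from collections import defaultdict


def cpu_by_command(ps_output):
    """Sum %CPU per command name, highest first."""
    totals = defaultdict(float)
    for line in ps_output.strip().splitlines()[1:]:  # skip the header row
        pcpu, _, command = line.strip().partition(" ")
        totals[command.strip()] += float(pcpu)
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# Hypothetical output of: ps -eo pcpu,comm
sample = """%CPU COMMAND
 42.0 wazuh-analysisd
 35.5 java
  3.1 wazuh-remoted
  1.0 java
"""
for command, cpu in cpu_by_command(sample):
    print(f"{cpu:6.1f}  {command}")
```

Running the same summary at different times of day helps separate a steady load from one tied to scheduled scans or ingestion spikes.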

Analyze Workload

Once the affected process is identified, evaluate the workload it is handling.

Common workload indicators include:

  • Events per second (EPS) spikes
  • Large bursts of Windows Event Logs
  • FIM scan activity
  • Scheduled vulnerability scans
  • Heavy dashboard query traffic

Understanding workload patterns helps distinguish between expected peak behavior and misconfiguration-driven overload.

Review Configuration

Configuration issues are a leading cause of sustained CPU saturation.

Focus on:

  • Log sources and verbosity levels
  • Rule complexity and redundancy
  • FIM scope and scan frequency
  • OpenSearch heap allocation
  • Index shard configuration

Many CPU issues are resolved by removing unnecessary processing rather than increasing hardware capacity.

Apply Targeted Optimizations

After identifying the bottleneck, apply specific fixes:

  • Reduce log ingestion volume
  • Simplify or disable expensive rules
  • Optimize FIM configurations
  • Tune OpenSearch heap and shard settings
  • Adjust agent polling intervals

Targeted changes are more effective than broad system upgrades.


Fixing Memory Problems

Memory issues in Wazuh deployments primarily originate from OpenSearch heap pressure, large datasets, and inefficient query patterns.

If left unresolved, they can lead to service instability, slow searches, and system crashes.

Common Memory Issues

OpenSearch Heap Exhaustion

When JVM heap memory is insufficient, OpenSearch may:

  • Trigger frequent garbage collection
  • Reject indexing requests
  • Crash under load
  • Degrade search performance

This is one of the most common causes of Wazuh instability in large deployments.

Large Rule Sets

Excessive rule complexity indirectly contributes to memory pressure by:

  • Increasing event processing time
  • Expanding in-memory queues
  • Raising correlation overhead

Heavy Searches

Complex queries, especially those with aggregations over large time ranges, increase:

  • Memory consumption
  • CPU usage
  • GC frequency

Memory Leaks

Although less common, misconfigured plugins or inefficient processes can gradually increase memory usage over time, eventually leading to instability.

Best Practices

Heap Sizing

Proper heap sizing is critical for OpenSearch stability.

Key principles:

  • Allocate approximately 50% of system RAM to JVM heap (within safe limits)
  • Keep the heap below ~32 GB so the JVM can continue using compressed object pointers
  • Preserve sufficient RAM for OS file caching

For detailed configuration guidance, see How to Tune OpenSearch Heap Size to Stop Wazuh High Memory Crashes.

JVM Monitoring

Monitor garbage collection behavior to detect early signs of memory stress:

  • GC pause frequency
  • Heap usage trends
  • Allocation rates
  • Old generation pressure

Memory Alerts

Set alerts for:

  • Sustained high heap usage
  • Swap activity
  • Increasing GC pause times
  • Indexing latency spikes

Early detection prevents cascading system failures.

Capacity Planning

Memory requirements should scale with:

  • Event ingestion rate
  • Index size
  • Retention period
  • Dashboard query complexity

Proper planning prevents reactive scaling and unexpected outages.


Optimizing Log Collection

Log collection is one of the highest-impact areas for performance optimization because it directly determines how much data enters the Wazuh pipeline.

Reduce Logcollector Overhead

Logcollector continuously monitors configured sources. Inefficient configuration can overwhelm both endpoints and the manager.

Exclude Unnecessary Logs

Exclude sources that do not contribute to security visibility, such as:

  • Debug logs
  • Temporary application logs
  • Cache directories
  • Development artifacts

Optimize Polling

Excessive polling increases CPU usage and network traffic.

Best practices:

  • Increase polling intervals for low-value logs
  • Avoid unnecessary real-time monitoring where scheduling is sufficient
  • Align polling frequency with log generation rates

Filter Duplicate Events

Duplicate log sources significantly increase processing overhead.

Common duplication sources:

  • Multiple agents monitoring the same file
  • Redundant syslog forwarding
  • Overlapping application logging configurations

Reduce Noisy Applications

Verbose applications generate excessive logs that provide little security value.

Examples include:

  • Debug-enabled web servers
  • Database query logging
  • Container runtime verbosity
  • Development tools in production

Prevent Dropped Messages

Dropped logs indicate that the system is overwhelmed and cannot process events fast enough.

See Fix Wazuh Logcollector Dropped Messages for detailed mitigation strategies.

Increase Buffers

Larger buffers help absorb short-term spikes in log volume, but must be carefully balanced to avoid memory pressure on endpoints.
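
The effect of buffer size on a short burst can be illustrated with a simple bounded-queue simulation. This is a toy model, not a description of Wazuh internals: the parameter names are illustrative and do not correspond to specific Wazuh settings, and the traffic pattern is made up.

```python
from typing import Sequence


def simulate_buffer(arrivals: Sequence[int], capacity: int, drain_per_tick: int) -> int:
    """Return how many events are dropped by a bounded buffer.

    arrivals holds the number of events arriving in each tick; after each
    tick's arrivals are queued, up to drain_per_tick events are forwarded.
    """
    queued = 0
    dropped = 0
    for incoming in arrivals:
        accepted = min(incoming, capacity - queued)  # only what fits is kept
        dropped += incoming - accepted
        queued += accepted
        queued -= min(queued, drain_per_tick)
    return dropped


# Quiet traffic with a two-tick burst in the middle.
burst = [100, 100, 900, 900, 100, 100]

print(simulate_buffer(burst, capacity=500, drain_per_tick=500))   # 800 dropped
print(simulate_buffer(burst, capacity=1500, drain_per_tick=500))  # 0 dropped
```

The larger buffer absorbs the burst because average arrivals stay below the drain rate. If arrivals exceed the drain rate for long enough, any buffer eventually fills, which is why buffer increases need to be paired with volume reduction.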

Reduce Log Bursts

Control sudden ingestion spikes by:

  • Staggering agent reporting intervals
  • Reducing simultaneous scan schedules
  • Avoiding synchronized batch jobs across endpoints
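
One common way to stagger schedules is to derive a stable offset from each agent's identifier, so that every endpoint starts at a different point inside the same window. The sketch below assumes your configuration management tool can compute and apply such an offset; the function name and agent IDs are hypothetical.

```python
import hashlib


def stagger_offset_seconds(agent_id: str, window_seconds: int) -> int:
    """Return a deterministic offset in [0, window_seconds) for an agent.

    Hashing the ID spreads agents roughly evenly across the window and
    gives the same agent the same offset on every run.
    """
    if window_seconds <= 0:
        raise ValueError("window_seconds must be positive")
    digest = hashlib.sha256(agent_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % window_seconds


# Spread scan start times for three hypothetical agents across one hour.
for agent_id in ("001", "002", "003"):
    offset = stagger_offset_seconds(agent_id, window_seconds=3600)
    print(f"agent {agent_id}: start scan {offset} s after the hour")
```

A hash is used instead of a random number so that offsets remain stable across redeployments.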

Improve Storage Performance

Slow storage increases backlog formation and contributes to dropped messages.

Upgrading to SSD or NVMe significantly improves ingestion stability.

Verify Manager Throughput

Ensure the Wazuh manager can process incoming events at peak load.

If ingestion exceeds processing capacity, queue buildup and event drops become inevitable.


Scaling Wazuh for Large Environments

As environments grow, single-node or minimally configured deployments become insufficient.

Scaling ensures Wazuh can handle increasing event volume while maintaining performance and reliability.

Horizontal Scaling

Horizontal scaling distributes workload across multiple nodes.

Multiple Managers

Deploying multiple Wazuh managers:

  • Distributes event processing
  • Reduces CPU bottlenecks
  • Improves fault tolerance

Load Balancing

Load balancers distribute agent traffic across the available managers so that no single node is overloaded.

Distributed Architecture

A distributed design separates:

  • Agents
  • Managers
  • Indexers
  • Dashboards

This improves scalability and isolates performance bottlenecks.

OpenSearch Cluster Scaling

OpenSearch must scale alongside the Wazuh manager to maintain performance.

Dedicated Master Nodes

Dedicated master nodes (called cluster manager nodes in current OpenSearch releases) handle cluster coordination and should not be burdened with indexing or search workloads.

Data Nodes

Data nodes store and index logs. Scaling data nodes improves:

  • Indexing throughput
  • Query performance
  • Storage capacity

Coordinating Nodes

Coordinating nodes receive search and aggregation requests, distribute them to the data nodes, and merge the results. Moving that work off the data nodes improves dashboard responsiveness.

Replica Planning

Replicas improve:

  • Fault tolerance
  • Read performance
  • Query distribution

However, they also increase storage requirements and indexing overhead.

Agent Scaling Best Practices

Enrollment Strategy

Efficient onboarding prevents configuration drift and performance issues.

Best practices:

  • Use centralized enrollment
  • Apply consistent policies
  • Automate configuration distribution

Group Policies

Grouping agents allows:

  • Consistent configuration
  • Reduced management overhead
  • Targeted optimization strategies

Configuration Management

Automated configuration management ensures:

  • Uniform logging policies
  • Controlled FIM scope
  • Consistent scan intervals

Wazuh Performance Optimization Checklist

A structured checklist ensures consistent tuning across environments.

  • Monitor CPU, memory, disk, and network utilization
  • Reduce unnecessary log collection
  • Tune File Integrity Monitoring
  • Tune or disable noisy detection rules
  • Reduce false positives
  • Optimize OpenSearch heap size
  • Configure index lifecycle management
  • Optimize shard allocation
  • Rotate and delete old indices
  • Increase Logcollector efficiency
  • Monitor indexing latency
  • Benchmark after configuration changes
  • Scale infrastructure before bottlenecks occur
  • Review performance metrics regularly

Common Wazuh Performance Mistakes

Many performance issues stem from predictable configuration mistakes.

Monitoring Everything

Collecting all possible logs creates excessive noise and unnecessary processing overhead.

Ignoring Noisy Logs

Failing to filter verbose applications significantly increases ingestion volume.

Oversized FIM Configurations

Monitoring entire filesystems leads to massive CPU and storage consumption.

Poor OpenSearch Heap Configuration

Incorrect heap sizing causes instability, slow searches, and indexing failures.

Too Many Shards

Excessive shards increase cluster overhead and reduce efficiency.
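
Shard counts grow faster than most teams expect because they multiply across retention, primaries, and replicas. The arithmetic below is a sketch with hypothetical inputs: it assumes one index per day, and the function names are illustrative.

```python
import math


def total_shards(indices_per_day: int, retention_days: int,
                 primaries: int, replicas: int) -> int:
    """Total shard copies held in the cluster at steady state."""
    return indices_per_day * retention_days * primaries * (1 + replicas)


def shards_per_node(total: int, data_nodes: int) -> int:
    """Shard copies each data node carries if they are spread evenly."""
    return math.ceil(total / data_nodes)


# One daily index kept for 90 days, with one replica, on three data nodes.
for primaries in (3, 1):
    total = total_shards(1, 90, primaries, replicas=1)
    print(f"{primaries} primaries: {total} shards, "
          f"{shards_per_node(total, 3)} per data node")
```

In this example, dropping from three primaries to one cuts the cluster from 540 shards to 180 without changing retention.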

Keeping Data Forever

Unlimited retention leads to storage exhaustion and degraded performance.

Ignoring Capacity Planning

Lack of planning results in reactive scaling and unexpected outages.

Not Monitoring Performance Metrics

Without metrics, optimization becomes guesswork rather than engineering.


Frequently Asked Questions (FAQ)

Question: What is Wazuh performance optimization?

It is the process of tuning agents, managers, and OpenSearch to improve event processing efficiency, reduce resource usage, and increase detection speed.

Question: Why is Wazuh using so much CPU?

Common causes include high log volume, inefficient rules, File Integrity Monitoring overload, and OpenSearch indexing pressure.

Question: How do I reduce Wazuh memory usage?

Optimize OpenSearch heap size, reduce query complexity, and limit data ingestion.

Question: How can I make Wazuh faster?

Reduce log volume, tune rules, optimize indexing, and improve storage performance.

Question: How do I optimize File Integrity Monitoring?

Limit monitored directories, exclude temporary folders, and reduce scan frequency.

Question: Why is OpenSearch crashing with Wazuh?

Usually due to heap exhaustion, poor shard configuration, or insufficient memory allocation.

Question: How do I reduce false positives in Wazuh?

Tune rules, adjust thresholds, and suppress noisy events.

Question: What causes dropped Logcollector messages?

High ingestion rates, insufficient buffers, or manager throughput limitations.

Question: How many events per second can Wazuh handle?

There is no fixed limit. Throughput depends on hardware, configuration, and tuning, so benchmark your own deployment at peak load to find the events per second (EPS) it can sustain.

Question: Should I use SSDs for Wazuh?

Yes. SSDs significantly improve indexing, search, and overall system responsiveness.

Question: What hardware is recommended for production Wazuh deployments?

Multi-core CPUs, sufficient RAM for OpenSearch heap, and SSD/NVMe storage are recommended.

Question: How often should I review Wazuh performance?

Regularly, ideally continuously via monitoring dashboards, with deeper reviews during scaling or configuration changes.


Conclusion

Wazuh performance optimization is a continuous process rather than a one-time configuration task.

The most impactful improvements come from reducing unnecessary workload at the source, tuning detection logic, and ensuring that OpenSearch is properly sized and maintained.

The key strategies across all environments include minimizing log noise, optimizing File Integrity Monitoring, refining detection rules, properly configuring OpenSearch heap and shards, and continuously monitoring system metrics to detect early signs of degradation.

As deployments scale, proactive capacity planning becomes essential.

Performance issues are far easier to prevent than to resolve after they impact production systems.

For deeper implementation guidance, explore the following detailed optimization guides:
