The Complete Wazuh Performance Optimization Guide

As your security environment grows, Wazuh can quickly begin processing millions of events every day.

Endpoint telemetry, system logs, cloud events, file integrity monitoring, vulnerability scans, and custom detection rules all compete for CPU, memory, storage, and indexing resources.

Without proper tuning, even a well-designed deployment can suffer from high resource utilization, delayed alerts, slow dashboards, dropped events, and excessive false positives.

A variety of factors can negatively impact performance, including excessive log collection, inefficient custom rules, oversized File Integrity Monitoring (FIM) configurations, insufficient OpenSearch heap memory, overloaded managers, poor storage performance, and unnecessary event duplication.

As environments scale, these issues compound and can significantly reduce both detection speed and operational visibility.

Several major components influence overall Wazuh performance:

Wazuh agents
Logcollector
Syscollector
Syscheck (File Integrity Monitoring)
Rootcheck
Active Response
Wazuh Manager
OpenSearch Indexer
Wazuh Dashboard
Storage subsystem
Network bandwidth
Detection rules
Decoders
Index lifecycle management

Each component affects performance differently, so effective optimization covers the whole pipeline rather than a single bottleneck.

In this guide, you’ll learn how Wazuh processes security data, where performance bottlenecks typically occur, which configuration settings have the greatest impact, and how to optimize every major component of the platform.

You’ll also learn practical techniques for scaling Wazuh, reducing unnecessary workload, improving indexing performance, minimizing false positives, and building a faster, more stable deployment for enterprise environments.

The performance of a Wazuh deployment is closely tied to the overall monitoring architecture.

See The Complete Wazuh Monitoring Guide to understand how monitoring components generate and process security telemetry throughout your environment.

Understanding Wazuh Performance

Optimizing Wazuh starts with understanding how security events travel through the platform.

Every log, file change, vulnerability scan, or endpoint event passes through multiple processing stages before appearing as an alert in the dashboard.

Performance issues can occur at any point in this pipeline, making it essential to understand each component’s role.

How Wazuh Processes Security Data

A simplified processing pipeline looks like this:

Endpoint
     │
     ▼
Wazuh Agent
     │
     ▼
Data Collection
(Logcollector / Syscheck / Syscollector / Rootcheck)
     │
     ▼
Secure Agent Communication
     │
     ▼
Wazuh Manager
     │
     ├── Decoders
     ├── Rules
     ├── Correlation
     └── Active Response
     │
     ▼
OpenSearch Indexer
     │
     ▼
Wazuh Dashboard

Each stage consumes different system resources and can become a bottleneck under heavy workloads.

Wazuh Agents

The Wazuh agent runs on monitored endpoints and is responsible for collecting security telemetry.

Depending on its configuration, an agent may collect:

Operating system logs
Windows Event Logs
Linux Syslog
Application logs
File Integrity Monitoring events
Inventory information
Vulnerability detection data
Security configuration assessments

Although each individual agent consumes relatively little CPU, thousands of agents can collectively generate enormous event volumes that stress the manager and indexer.

Reducing unnecessary data collection at the endpoint is often the most effective optimization strategy because it eliminates unnecessary processing throughout the rest of the pipeline.

Logcollector

Logcollector continuously monitors configured log sources and forwards new entries to the Wazuh manager.

Performance issues commonly occur when administrators:

Monitor unnecessary log files
Collect verbose debug logs
Read duplicate log sources
Include excessive wildcard paths
Process extremely high-volume applications

Poor log collection strategies often generate far more events than security teams actually need.

For a detailed walkthrough of preventing lost events during heavy log ingestion, see Fix Wazuh Logcollector Dropped Messages.

Syscollector

Syscollector inventories endpoint assets such as:

Installed software
Hardware
Operating system information
Running processes
Network interfaces
Open ports

Because inventory data changes infrequently, aggressive scan intervals usually provide little additional value while increasing CPU usage and network traffic.

Scheduling inventory scans appropriately helps reduce unnecessary endpoint load.

Syscheck (File Integrity Monitoring)

Syscheck monitors file systems for:

File creation
File deletion
Permission changes
Ownership changes
Content modifications
Registry changes (Windows)

While File Integrity Monitoring is one of Wazuh’s most valuable security capabilities, it is also one of the most resource-intensive.

Scanning large directory trees, frequently changing files, build directories, container volumes, package caches, or temporary folders can consume significant CPU and generate excessive alerts.

Learn how to dramatically reduce resource consumption in How to Stop Wazuh File Integrity Monitoring (FIM) From Eating Your CPU.

Rootcheck

Rootcheck searches systems for indicators of compromise, rootkits, hidden processes, suspicious ports, and unauthorized system modifications.

Since rootkit detection is generally performed at scheduled intervals rather than continuously, performance impact is usually modest.

However, unnecessarily frequent scans across thousands of endpoints can noticeably increase CPU utilization.

Active Response

Active Response automatically executes predefined remediation actions when certain rules trigger.

Examples include:

Blocking malicious IP addresses
Killing malicious processes
Disabling compromised accounts
Running custom scripts

Performance issues rarely originate from Active Response itself but can arise when response scripts are inefficient or trigger excessively due to noisy detection rules.

Wazuh Manager

The manager acts as the central processing engine.

Its responsibilities include:

Receiving agent events
Decoding logs
Evaluating detection rules
Correlating events
Generating alerts
Coordinating Active Response
Forwarding alerts to the indexer

As deployments grow, the manager often becomes the primary CPU bottleneck because every incoming event must pass through its rule engine.

Inefficient custom rules, excessive event volume, and unnecessary correlation logic significantly increase processing time.

Indexer (OpenSearch)

After alerts are generated, they are stored inside OpenSearch.

The indexer is responsible for:

Writing alerts
Maintaining indexes
Compressing data
Executing searches
Running aggregations
Serving dashboard queries

High indexing latency, insufficient heap memory, disk bottlenecks, or oversized shards can dramatically reduce overall system responsiveness.

Learn how to properly size Java heap memory in How to Tune OpenSearch Heap Size to Stop Wazuh High Memory Crashes.

Dashboard

The Wazuh Dashboard provides visualization, search, reporting, and investigation capabilities.

Dashboard performance depends on:

Query complexity
Index size
OpenSearch performance
Browser resources
Visualization configuration
Aggregation speed

A slow dashboard often indicates underlying indexing or storage bottlenecks rather than problems with the interface itself.

Where Performance Bottlenecks Usually Occur

Although every environment is different, performance issues tend to appear in a handful of predictable areas.

Endpoint Resource Usage

Agents consume CPU while collecting logs, monitoring files, scanning configurations, and generating telemetry.

Common causes include:

Oversized FIM configurations
Excessive Windows Event Logs
Frequent inventory scans
Large log files
High-frequency scheduled scans

Manager Processing

The manager decodes every incoming event and matches it against a ruleset that contains thousands of detection rules.

Heavy workloads increase:

CPU utilization
Processing queues
Event latency
Memory consumption

Large enterprise deployments often require clustering or load balancing to distribute processing.

Rule Evaluation

Custom rules with inefficient matching logic increase processing time considerably.

Common issues include:

Overly broad regex patterns
Excessive nested rules
Duplicate rules
Poor rule ordering
Expensive correlation logic

Event Decoding

Before rule evaluation, every log must be decoded into structured fields.

Complex decoders and malformed log formats increase parsing overhead and reduce throughput.

Alert Indexing

Writing alerts to OpenSearch requires:

JSON serialization
Index mapping
Shard selection
Disk writes
Replication
Segment merging

Slow disks or poorly configured indexes can create indexing backlogs that delay alert availability.

Search Performance

Large indexes increase search latency, particularly when dashboards execute multiple aggregations simultaneously.

Performance depends heavily on:

Heap allocation
Index lifecycle policies
Shard sizing
Query optimization

Dashboard Rendering

Complex visualizations and large time ranges require significant processing.

Rendering delays commonly result from expensive backend queries rather than browser limitations.

Storage Limitations

Storage performance affects nearly every component.

Slow disks increase:

Indexing latency
Search times
Snapshot duration
Recovery speed
Cluster stability

Using SSD or NVMe storage typically provides substantial improvements for high-ingestion environments.

Expert Insight: The official Wazuh documentation recommends carefully limiting collected data, tuning monitored directories, and optimizing manager and indexer resources before simply increasing hardware capacity. Eliminating unnecessary workload generally produces larger performance gains than adding CPU alone.

Key Factors That Affect Wazuh Performance

Even powerful servers can struggle if Wazuh is configured inefficiently.

Most performance problems stem from excessive data collection rather than insufficient hardware.

Understanding the primary workload drivers helps prioritize optimization efforts.

Log Volume

Every collected log must be transmitted, decoded, evaluated against detection rules, indexed, stored, and queried.

As log volume increases, resource consumption rises across every component of the platform.

The most effective optimization strategy is often reducing unnecessary events before they ever reach the manager.

High Event Ingestion

Organizations monitoring thousands of endpoints may process hundreds of thousands, or even millions, of events each hour.

High ingestion rates increase:

CPU utilization
Memory usage
Network bandwidth
Indexing latency
Storage consumption
Search complexity

Instead of collecting everything, prioritize logs with meaningful security value.

Excessive Windows Event Logs

Windows Event Logs are among the largest contributors to event volume.

Administrators frequently collect:

Security
System
Application
PowerShell
Sysmon
DNS
Task Scheduler
Print Service
WMI
Defender

Without filtering, these channels often generate significant noise and unnecessary processing.

Verbose Application Logging

Applications running in debug or verbose modes can generate thousands of events every minute.

Examples include:

Web servers
Database servers
Java applications
Containers
Kubernetes workloads
Development environments

Whenever possible, reduce logging verbosity in production while preserving security-relevant events.

Duplicate Log Collection

Duplicate events waste CPU, storage, bandwidth, and indexing capacity.

Common causes include:

Monitoring identical log files twice
Collecting Windows logs through multiple mechanisms
Duplicate syslog forwarding
Multiple agents monitoring shared resources
SIEM integrations forwarding identical events

Removing duplicate collection improves performance without sacrificing visibility.

Expert Insight: According to the OpenSearch project, reducing unnecessary indexing workload typically provides greater improvements than hardware upgrades because indexing is one of the most resource-intensive operations performed by the search engine.

Excessive event volume often leads to noisy detections.

Learn practical filtering techniques in How to Reduce False Positives in Wazuh.

If excessive event volume is driving CPU utilization on the manager, see Why Is Wazuh Using High CPU? Troubleshooting Guide.

File Integrity Monitoring (FIM)

File Integrity Monitoring (FIM) is one of Wazuh’s most valuable security capabilities because it detects unauthorized changes to files, directories, registry keys, and system configurations.

However, it is also one of the most resource-intensive modules in the platform.

Improperly configured FIM can significantly increase CPU utilization on endpoints, generate millions of events, and overwhelm the Wazuh manager.

Optimizing FIM is usually one of the quickest ways to improve overall Wazuh performance without sacrificing meaningful security visibility.

Large Directories

Monitoring large directory trees dramatically increases the amount of work performed during every scan.

Examples include:

User home directories
Development repositories
Virtual machine images
Docker volumes
Kubernetes persistent volumes
Backup directories
Package caches
Temporary folders
Log archives

Many of these locations contain hundreds of thousands of files that rarely provide useful security telemetry.

Instead of monitoring entire drives, focus on directories containing:

System binaries
Configuration files
Critical application data
Authentication files
Startup scripts
Security-sensitive executables

Reducing the number of monitored files directly lowers CPU usage, memory consumption, and event generation.
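
Before adding a path to FIM, it can help to measure how many files it actually contains. The following is a minimal sketch, not part of Wazuh: the function names `count_files_by_top_dir` and `largest_trees` are hypothetical, and the script only counts directory entries with the standard library so that oversized trees stand out.

```python
import os
from collections import Counter


def count_files_by_top_dir(root: str) -> Counter:
    """Count non-directory entries under each immediate subdirectory of root.

    Files that sit directly in root are counted under the key ".".
    Symbolic links to directories are not followed (os.walk default).
    """
    counts = Counter()
    for dirpath, _dirnames, filenames in os.walk(root):
        rel = os.path.relpath(dirpath, root)
        top = "." if rel == "." else rel.split(os.sep)[0]
        counts[top] += len(filenames)
    return counts


def largest_trees(root: str, limit: int = 5) -> list[tuple[str, int]]:
    """Return the subdirectories holding the most files, largest first."""
    return count_files_by_top_dir(root).most_common(limit)


if __name__ == "__main__":
    # Hypothetical candidate path; replace with a directory you plan to monitor.
    for name, total in largest_trees("/etc"):
        print(f"{total:>8}  {name}")
```

Subdirectories that dominate the output are the first candidates to exclude or to monitor with a narrower path.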

Frequent File Changes

Some directories experience constant file modifications.

Examples include:

Web server access logs
Application log directories
Browser caches
Temporary files
Database transaction logs
Container overlay filesystems
Build artifacts
CI/CD workspaces

Monitoring rapidly changing files generates a continuous stream of FIM events that consume processing resources across the entire Wazuh pipeline.

Exclude high-churn directories whenever possible and monitor only files that provide meaningful security value.

Real-Time Monitoring Overhead

Real-time monitoring enables Wazuh to detect file changes immediately instead of waiting for scheduled scans.

While this improves detection speed, it also increases endpoint resource usage because the operating system continuously watches monitored files for changes.

In environments with frequent write operations, real-time monitoring can generate substantial CPU activity.

A balanced approach often works best:

Use real-time monitoring for critical system directories.
Schedule periodic scans for lower-risk locations.
Exclude temporary or frequently changing paths.

This approach preserves rapid detection for sensitive assets while reducing unnecessary workload.

Hash Calculation Costs

Whenever a monitored file changes, Wazuh calculates cryptographic hashes to verify file integrity.

Depending on configuration, this may include:

MD5
SHA-1
SHA-256

Although modern processors calculate hashes efficiently, hashing thousands of large files consumes noticeable CPU time and disk I/O.

Hash calculations become especially expensive when monitoring:

Large databases
Virtual machine disks
Backup files
ISO images
Media repositories

Limiting hash generation to security-critical files significantly reduces resource consumption while maintaining effective integrity monitoring.
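
To get a feel for how hashing cost grows with file size, the sketch below times the three digests over in-memory buffers using Python's `hashlib`. It is an illustration only: it does not reproduce how Syscheck hashes files, the buffer sizes are arbitrary, and real files add disk I/O on top of the CPU time measured here.

```python
import hashlib
import time

ALGORITHMS = ("md5", "sha1", "sha256")


def hash_all(data: bytes) -> dict[str, str]:
    """Return the hex digest of data for each algorithm in ALGORITHMS."""
    return {name: hashlib.new(name, data).hexdigest() for name in ALGORITHMS}


def time_hashing(size_bytes: int) -> float:
    """Seconds spent computing all three digests over a zero-filled buffer."""
    data = bytes(size_bytes)
    start = time.perf_counter()
    hash_all(data)
    return time.perf_counter() - start


if __name__ == "__main__":
    # Arbitrary sizes chosen for illustration.
    for size_mib in (1, 16, 64):
        elapsed = time_hashing(size_mib * 1024 * 1024)
        print(f"{size_mib:>3} MiB: {elapsed:.4f} s for {len(ALGORITHMS)} digests")
```

On most machines the elapsed time scales roughly linearly with buffer size, which is why multi-gigabyte files such as virtual machine disks are poor candidates for hash-based monitoring.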

Expert Insight: The official Wazuh documentation recommends carefully defining monitored paths and excluding frequently changing directories to reduce unnecessary File Integrity Monitoring workload. Targeted monitoring provides better scalability than attempting to monitor entire filesystems.

For a complete walkthrough of reducing File Integrity Monitoring resource usage, see How to Stop Wazuh File Integrity Monitoring (FIM) From Eating Your CPU.

Detection Rules

Detection rules determine whether incoming events represent suspicious or malicious activity.

Every event received by the Wazuh manager is evaluated against thousands of rules, making rule processing one of the largest contributors to CPU utilization.

Well-designed rules improve both detection quality and system performance.

Poorly written rules can dramatically slow event processing and increase alert latency.

Expensive Custom Rules

Custom rules are extremely powerful but often introduce unnecessary overhead.

Common performance issues include:

Matching against every incoming event
Multiple nested conditions
Broad wildcard matching
Large lookup lists
Unnecessary regular expressions
Duplicate rule logic

Each additional condition requires more CPU cycles during evaluation.

Whenever possible, create narrowly scoped rules that evaluate only relevant event types.

Large Rulesets

Many organizations continually add community rules, compliance packs, vendor content, and internally developed detections.

While comprehensive coverage improves visibility, oversized rulesets increase processing time because every event must be compared against more detection logic.

Regularly review your ruleset to:

Remove obsolete rules
Disable unused integrations
Consolidate duplicate detections
Archive deprecated content
Prioritize high-value detections

Smaller, well-maintained rulesets generally perform better than excessively large collections.

Regex Complexity

Regular expressions are among the most CPU-intensive operations performed during rule evaluation.

Poorly optimized regex patterns can:

Require excessive backtracking
Evaluate unnecessary text
Consume significant CPU
Delay event processing

Examples of inefficient patterns include:

Nested wildcards
Broad “match everything” expressions
Repeated capture groups
Unanchored expressions

Whenever possible:

Match specific fields instead of entire log messages.
Use exact string matching when practical.
Anchor regex patterns to expected positions.
Keep expressions as simple as possible.

Even small regex optimizations can noticeably improve throughput in high-volume environments.
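
The guidelines above can be illustrated with a short sketch. It uses Python's `re` module purely for demonstration; Wazuh rules use their own regex engines, so syntax and performance characteristics differ. The sample messages and pattern names are assumptions made for this example.

```python
import re

# Broad and unanchored: the leading ".*" makes the engine scan the whole message.
BROAD = re.compile(r".*Failed password.*from.*")

# Anchored to the expected prefix and limited to the two fields of interest.
ANCHORED = re.compile(r"^Failed password for (?:invalid user )?(\S+) from (\S+)")


def is_failed_login(message: str) -> bool:
    """Exact string check: no regex engine involved at all."""
    return message.startswith("Failed password")


def extract_user_and_ip(message: str):
    """Return (user, source_ip) for a failed login message, else None."""
    match = ANCHORED.match(message)
    return (match.group(1), match.group(2)) if match else None


if __name__ == "__main__":
    sample = "Failed password for invalid user admin from 203.0.113.7 port 22 ssh2"
    print(is_failed_login(sample))
    print(extract_user_and_ip(sample))
```

The cheap prefix check can act as a first filter, so the regex only runs on messages that are already known to be relevant.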

Rule Chaining

Rule chaining allows one rule to trigger another, enabling sophisticated correlation and threat detection.

However, deep dependency chains increase processing time because multiple rules must execute before an alert is generated.

Complex correlation logic should be reserved for high-value detections rather than routine event processing.

A practical optimization strategy is to:

Perform simple filtering first.
Eliminate obvious benign events.
Reserve advanced correlation for suspicious activity.

This minimizes unnecessary computation while preserving detection accuracy.

Expert Insight: Security engineers generally recommend filtering low-value events as early as possible in the processing pipeline. Reducing unnecessary rule evaluations improves throughput and allows computationally expensive correlation logic to focus on higher-risk events.

Inefficient detection logic often contributes to alert fatigue.

See How to Reduce False Positives in Wazuh for techniques that improve both performance and detection quality.

OpenSearch Performance

The Wazuh Indexer, powered by OpenSearch, stores alerts and powers dashboard searches, visualizations, and investigations.

Even if the Wazuh manager processes events efficiently, poor OpenSearch performance can create indexing delays, slow searches, and unresponsive dashboards.

Properly tuning the indexer is essential for large-scale deployments.

Heap Size

OpenSearch relies on the Java Virtual Machine (JVM), making heap allocation one of its most important performance settings.

Heap memory stores:

Search caches
Field data
Query results
Cluster metadata
Index structures

Insufficient heap memory may cause:

Frequent garbage collection
Slow searches
Indexing delays
Node instability
Out-of-memory errors

Conversely, allocating excessive heap reduces the operating system’s available file cache, which can also hurt performance.

OpenSearch generally recommends allocating approximately 50% of available RAM to the JVM heap, up to a ceiling of roughly 32 GB, while leaving the remaining memory for the operating system's file cache.
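
As a rough aid, the half-of-RAM guideline can be expressed as a small calculation. This is a sketch under stated assumptions: the helper names are hypothetical, and the default ceiling of 31 GB reflects the commonly cited advice to stay just under 32 GB, which you should confirm against the OpenSearch documentation for your version.

```python
def recommended_heap_gb(total_ram_gb: int, cap_gb: int = 31) -> int:
    """Half of system RAM, limited to cap_gb (assumed ceiling, see above)."""
    if total_ram_gb < 2:
        raise ValueError("at least 2 GB of RAM is needed for this estimate")
    return max(1, min(total_ram_gb // 2, cap_gb))


def jvm_heap_options(total_ram_gb: int) -> list[str]:
    """Return matching minimum and maximum heap flags in jvm.options syntax."""
    heap = recommended_heap_gb(total_ram_gb)
    return [f"-Xms{heap}g", f"-Xmx{heap}g"]


if __name__ == "__main__":
    for ram in (8, 16, 64, 128):
        print(ram, "GB RAM ->", " ".join(jvm_heap_options(ram)))
```

Setting the minimum and maximum heap to the same value avoids heap resizing at runtime, which is the usual practice for JVM-based search nodes.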

JVM Garbage Collection

Garbage collection periodically frees unused Java memory.

Under heavy workloads, frequent garbage collection pauses can temporarily interrupt indexing and query execution.

Common symptoms include:

Dashboard freezes
Indexing latency
High CPU utilization
Slow searches
Cluster instability

Monitoring garbage collection activity helps identify whether memory tuning is required before adding additional hardware.

Shard Configuration

Every index is divided into one or more shards.

Improper shard sizing is a common cause of poor OpenSearch performance.

Too many small shards increase:

Cluster overhead
Memory usage
Search coordination
Metadata processing

Oversized shards increase:

Recovery time
Rebalancing duration
Query latency

A balanced shard strategy improves indexing efficiency while maintaining fast search performance.
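
One way to reason about shard sizing is to work backward from a target shard size. The sketch below assumes a target of 30 GB per shard, a figure within the range often suggested for log-style workloads; treat it as a placeholder and not an official Wazuh or OpenSearch value. The function names are hypothetical.

```python
import math


def primary_shards(index_size_gb: float, target_shard_gb: float = 30.0) -> int:
    """Primary shards needed so that each shard stays near the target size."""
    if index_size_gb < 0 or target_shard_gb <= 0:
        raise ValueError("sizes must be positive")
    return max(1, math.ceil(index_size_gb / target_shard_gb))


def total_shards(primaries: int, replicas: int, indexes_retained: int) -> int:
    """Total shards the cluster holds across all retained indexes."""
    return primaries * (1 + replicas) * indexes_retained


if __name__ == "__main__":
    # Hypothetical deployment: 5 GB of alerts per daily index, 90 days kept.
    per_index = primary_shards(5)
    print(per_index, total_shards(per_index, replicas=1, indexes_retained=90))
```

A small daily index needs only one primary shard. Configuring more than that multiplies the shard count across every retained index without improving performance.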

Disk I/O

Disk performance directly affects nearly every OpenSearch operation.

Slow storage increases:

Alert indexing latency
Search response times
Segment merging
Snapshot duration
Recovery performance

Enterprise deployments typically benefit from SSD or NVMe storage because indexing workloads involve continuous random reads and writes.

In indexing-heavy deployments, storage latency often becomes the primary bottleneck before CPU resources are exhausted.

Storage Capacity

Storage planning extends beyond simply having enough free disk space.

As indexes grow larger:

Searches become slower.
Snapshot sizes increase.
Recovery takes longer.
Merge operations consume more resources.
Cluster maintenance becomes more difficult.

Implementing index lifecycle policies (Index State Management, or ISM, in OpenSearch), retention policies, and regular index cleanup helps maintain consistent performance over time.
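
Retention decisions are easier to make with a rough storage estimate. The following sketch multiplies event rate, average alert size, retention, and replica count. All inputs are assumptions you would replace with measured values, and the result approximates raw document volume only: on-disk size differs because of compression and index structures.

```python
SECONDS_PER_DAY = 86_400


def daily_index_gb(avg_eps: float, avg_alert_bytes: int) -> float:
    """Approximate raw alert volume per day, in GiB."""
    return avg_eps * SECONDS_PER_DAY * avg_alert_bytes / 1024**3


def retained_storage_gb(
    avg_eps: float,
    avg_alert_bytes: int,
    retention_days: int,
    replicas: int = 1,
    overhead: float = 1.1,  # assumed safety margin, not a measured figure
) -> float:
    """Approximate storage needed to keep retention_days of alerts."""
    daily = daily_index_gb(avg_eps, avg_alert_bytes)
    return daily * retention_days * (1 + replicas) * overhead


if __name__ == "__main__":
    # Hypothetical workload: 1,000 alerts per second at about 1 KiB each.
    print(f"{daily_index_gb(1000, 1024):.1f} GiB per day")
    print(f"{retained_storage_gb(1000, 1024, retention_days=90):.0f} GiB retained")
```

Halving either the alert rate or the retention period halves the estimate, which is why filtering at the source and enforcing retention policies have such a direct effect on storage.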

Expert Insight: The OpenSearch project emphasizes that efficient memory allocation, appropriate shard sizing, and fast storage often deliver greater performance improvements than simply adding CPU cores. Proper cluster design is critical for maintaining indexing and query performance at scale.

If memory pressure is causing indexing delays or crashes, see How to Tune OpenSearch Heap Size to Stop Wazuh High Memory Crashes.

Hardware Resources

Although software optimization should always come before hardware upgrades, adequate infrastructure is essential for maintaining a responsive and reliable Wazuh deployment.

Every component, from agents to the manager and OpenSearch Indexer, depends on sufficient compute, memory, storage, and network resources to process security events efficiently.

Simply adding more hardware is rarely enough to solve performance problems caused by excessive logging, inefficient detection rules, or poor configuration.

However, properly sized infrastructure provides the foundation needed for stable, scalable security monitoring.

CPU

CPU is one of the most heavily utilized resources in a Wazuh deployment.

The processor is responsible for:

Event decoding
Rule evaluation
Log parsing
File integrity monitoring
Data compression
Search execution
Dashboard queries
OpenSearch indexing

High CPU utilization often indicates one or more of the following:

Excessive event ingestion
Inefficient custom rules
Large FIM workloads
Complex regular expressions
Heavy search activity
Frequent OpenSearch garbage collection

Monitor sustained CPU usage rather than occasional spikes.

Temporary increases during scheduled scans or indexing operations are normal, while consistently high utilization usually indicates a bottleneck that requires investigation.
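
The distinction between spikes and sustained load can be made explicit when analyzing collected samples. This is a minimal sketch: the 85% threshold and the window of five consecutive samples are arbitrary assumptions, and the samples would come from whatever monitoring tool you already use.

```python
def sustained_high(samples: list[float], threshold: float = 85.0, window: int = 5) -> bool:
    """True if `window` consecutive samples are all at or above threshold."""
    run = 0
    for value in samples:
        run = run + 1 if value >= threshold else 0
        if run >= window:
            return True
    return False


if __name__ == "__main__":
    spiky = [20, 95, 30, 96, 25, 40]    # isolated peaks, e.g. scheduled scans
    loaded = [50, 90, 91, 92, 60]       # three high samples in a row
    print(sustained_high(spiky, window=3))   # False
    print(sustained_high(loaded, window=3))  # True
```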

Whenever possible:

Separate the Wazuh Manager and OpenSearch Indexer onto dedicated servers.
Scale horizontally for enterprise deployments.
Reduce unnecessary workload before increasing CPU resources.

Memory

Memory plays a critical role in maintaining smooth performance across every component.

Insufficient RAM can lead to:

Swapping
Slow searches
Queue backlogs
Delayed alerts
Dashboard latency
OpenSearch instability

Memory is particularly important for:

OpenSearch heap allocation
Operating system page cache
Search caches
Manager processing queues
Agent buffers

Regular monitoring helps identify gradual memory growth that may indicate oversized indexes, insufficient heap allocation, or increasing workload.

Disk Performance

Security monitoring platforms perform continuous disk operations.

Examples include:

Writing alerts
Reading log files
Updating indexes
Performing snapshots
Merging index segments
Searching historical data

Traditional hard drives often become performance bottlenecks under sustained indexing workloads.

Solid-state drives (SSD) and NVMe storage typically provide:

Faster indexing
Lower search latency
Quicker recovery
Improved dashboard responsiveness
Better cluster stability

Storage performance frequently has a greater impact on OpenSearch responsiveness than additional CPU cores.

Network Bandwidth

Every agent continuously communicates with the Wazuh manager.

Bandwidth requirements increase as organizations collect:

Security logs
File integrity events
Vulnerability information
Cloud telemetry
Container logs
Windows Event Logs

Network congestion may result in:

Delayed event delivery
Agent disconnections
Increased processing queues
Synchronization delays
Dropped messages

While most deployments do not saturate modern enterprise networks, geographically distributed environments should monitor network latency and bandwidth utilization to ensure reliable agent communication.

Expert Insight: Wazuh recommends sizing infrastructure based on expected event volume and deployment scale rather than endpoint count alone. A relatively small number of servers generating high log volumes can consume more resources than thousands of lightly monitored endpoints.

High CPU utilization is often caused by how workload is distributed across components rather than by insufficient hardware.

See Why Is Wazuh Using High CPU? Troubleshooting Guide for practical troubleshooting techniques.

Agent Configuration

The Wazuh agent serves as the first stage of the data collection pipeline.

Efficient agent configuration reduces unnecessary workload before events ever reach the manager, making it one of the most effective ways to optimize overall platform performance.

Instead of processing every available data source, configure agents to collect only information that supports your organization’s security objectives.

Monitoring Frequency

Monitoring frequency determines how often an agent performs scheduled tasks such as inventory collection, policy evaluation, and integrity scans.

Very short intervals increase:

CPU utilization
Disk activity
Network traffic
Event generation

Longer intervals reduce resource consumption while remaining appropriate for information that changes infrequently.

Different monitoring tasks should use intervals that reflect the expected rate of change.

For example:

Hardware inventory may only require daily collection.
Software inventory may be collected every few hours.
Security logs should be monitored continuously.
File Integrity Monitoring depends on the sensitivity of monitored files.

Module Selection

Every enabled module consumes system resources.

Common Wazuh modules include:

Logcollector
Syscheck
Syscollector
Rootcheck
Vulnerability Detection
Security Configuration Assessment
Active Response

Not every endpoint requires every module.

For example:

Database servers may prioritize log monitoring.
Domain controllers may emphasize authentication events.
Development systems may require different monitoring than production servers.
Container hosts may benefit from specialized configurations.

Disabling unnecessary modules reduces endpoint overhead and lowers the total event volume processed by the manager.

Scan Intervals

Scheduled scans should balance detection speed with resource consumption.

Aggressive scanning schedules may:

Increase endpoint CPU usage.
Generate duplicate data.
Produce unnecessary network traffic.
Create processing spikes on the manager.

Review scan schedules for:

Syscheck
Rootcheck
Syscollector
Vulnerability Detection
Security Configuration Assessment

Adjust intervals based on operational requirements rather than using identical settings across every endpoint.

Event Buffering

Temporary spikes in event generation can overwhelm network links or the Wazuh manager.

Event buffering helps agents temporarily store events until they can be transmitted successfully.

Proper buffering improves reliability by:

Reducing dropped events
Handling temporary network interruptions
Smoothing traffic bursts
Preventing unnecessary retransmissions

However, excessively large buffers may increase endpoint memory usage and delay alert delivery if events accumulate faster than they can be processed.

Finding the appropriate balance depends on expected event volume and network reliability.
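
The trade-off can be explored with a simplified model of a bounded buffer that drains at a fixed rate. This sketch is not Wazuh's implementation. The parameter names `queue_size` and `events_per_second` are borrowed from the agent's client buffer settings for readability, and the one-second tick is an assumption of the model.

```python
from collections import deque


class LeakyBuffer:
    """Bounded FIFO that releases at most events_per_second events per tick."""

    def __init__(self, queue_size: int, events_per_second: int):
        self.queue = deque()
        self.queue_size = queue_size
        self.events_per_second = events_per_second
        self.sent = 0
        self.dropped = 0

    def offer(self, event) -> bool:
        """Queue an event, or count it as dropped when the buffer is full."""
        if len(self.queue) >= self.queue_size:
            self.dropped += 1
            return False
        self.queue.append(event)
        return True

    def tick(self) -> list:
        """Simulate one second of draining and return the released events."""
        batch = []
        while self.queue and len(batch) < self.events_per_second:
            batch.append(self.queue.popleft())
        self.sent += len(batch)
        return batch


def simulate(arrivals_per_second: list[int], queue_size: int, events_per_second: int):
    """Return (sent, dropped, still_queued) after replaying the arrivals."""
    buf = LeakyBuffer(queue_size, events_per_second)
    for count in arrivals_per_second:
        for i in range(count):
            buf.offer(i)
        buf.tick()
    return buf.sent, buf.dropped, len(buf.queue)


if __name__ == "__main__":
    # A burst of 10 events into a small buffer: part of the burst is lost.
    print(simulate([10, 0, 0], queue_size=6, events_per_second=2))  # (6, 4, 0)
```

A larger `queue_size` would absorb the whole burst, at the cost of more memory and a longer delay before the last event is delivered.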

Expert Insight: Many experienced Wazuh administrators recommend optimizing agents before tuning the manager because every unnecessary event eliminated at the endpoint reduces processing, indexing, storage, and search workload throughout the entire platform.

If agents are generating excessive log traffic that overwhelms the manager, see Fix Wazuh Logcollector Dropped Messages for techniques to improve ingestion reliability.

Measuring Wazuh Performance

Performance optimization should always be driven by measurable data rather than assumptions.

Establishing performance baselines allows administrators to identify bottlenecks, validate configuration changes, and monitor long-term trends as the environment grows.

Regular monitoring also helps detect gradual degradation before it affects security operations.

Performance Metrics to Monitor

Several key metrics provide a comprehensive view of overall Wazuh health.

Rather than focusing on a single resource, monitor the entire processing pipeline, from endpoint collection to dashboard visualization, to identify where delays originate.

CPU Utilization

CPU usage indicates how efficiently the platform processes incoming events.

Monitor CPU consumption for:

Wazuh agents
Wazuh Manager
OpenSearch Indexer
Dashboard server

Sustained high CPU utilization often indicates:

Excessive log volume
Expensive detection rules
Heavy File Integrity Monitoring
Large search workloads
Insufficient hardware resources

Trend CPU usage over time to identify workload growth before it becomes a critical issue.

Memory Consumption

Memory usage provides insight into system stability.

Monitor:

Total RAM utilization
JVM heap usage
Swap activity
Operating system page cache
Process memory growth

Unexpected increases may indicate:

Memory leaks
Oversized indexes
Growing search caches
Poor heap allocation

Consistent monitoring helps prevent unexpected service interruptions.

Disk Usage

Storage monitoring should include both capacity and performance.

Track:

Available disk space
Disk throughput
IOPS
Read latency
Write latency
Snapshot storage

Running out of storage can halt indexing, while slow storage significantly increases search and dashboard response times.
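
A simple capacity check can compare disk usage against the thresholds at which OpenSearch starts restricting shard allocation and writes. In this sketch the 85%, 90%, and 95% values are assumed to match the default disk watermarks. Verify them against your cluster settings, since they are configurable. The function names are hypothetical.

```python
import shutil

# Assumed default disk watermarks, ordered from most to least severe.
WATERMARKS = (("flood_stage", 95.0), ("high", 90.0), ("low", 85.0))


def used_percent(path: str) -> float:
    """Percentage of the filesystem containing path that is in use."""
    usage = shutil.disk_usage(path)
    return usage.used / usage.total * 100


def watermark_status(percent_used: float) -> str:
    """Name of the highest watermark reached, or "ok" if none is reached."""
    for name, limit in WATERMARKS:
        if percent_used >= limit:
            return name
    return "ok"


if __name__ == "__main__":
    # "." is a placeholder; point this at the indexer's data directory.
    percent = used_percent(".")
    print(f"{percent:.1f}% used -> {watermark_status(percent)}")
```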

Indexing Latency

Indexing latency measures how quickly alerts become searchable after being generated.

Increasing latency often indicates:

Slow disks
Insufficient heap memory
Indexing backlogs
Large merge operations
Heavy ingestion workloads

Keeping indexing delays low ensures analysts can investigate threats in near real time.

Search Latency

Search latency measures how long OpenSearch requires to execute queries.

Slow searches may result from:

Large indexes
Poor shard sizing
Expensive aggregations
Insufficient memory
Heavy concurrent searches

Tracking search performance helps maintain a responsive dashboard experience.

Queue Sizes

Internal queues temporarily hold events awaiting processing.

Monitor queue growth throughout the pipeline.

Rapidly increasing queues often indicate downstream bottlenecks such as:

Overloaded managers
Slow indexing
Network congestion
Rule evaluation delays

Persistent queue growth should be investigated before events begin dropping.
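
To separate a short burst from persistent growth, it can help to look at several consecutive queue readings rather than a single value. The following is a minimal sketch that assumes you sample queue usage at a fixed interval from whatever source your deployment exposes. The function name sustained_growth and the window of five samples are hypothetical choices.

```python
def sustained_growth(samples, window=5):
    """Return True if the last `window` samples are strictly increasing."""
    if len(samples) < window:
        return False
    recent = samples[-window:]
    return all(later > earlier for earlier, later in zip(recent, recent[1:]))

# Hypothetical queue usage readings (percent), oldest first
steady_climb = [10, 12, 11, 13, 15, 18, 22]
short_burst = [10, 40, 12, 11, 13, 9, 10]

print(sustained_growth(steady_climb))  # queue keeps growing across the window
print(sustained_growth(short_burst))   # spike that drained again
```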

Agent Connection Status

Healthy agents continuously communicate with the Wazuh manager.

Monitor:

Connected agents
Disconnected agents
Authentication failures
Communication latency
Synchronization delays

Unexpected agent disconnects may indicate network issues, overloaded managers, certificate problems, or endpoint resource exhaustion.

Events per Second (EPS)

Events per Second (EPS) is one of the most important capacity planning metrics.

Tracking EPS helps administrators:

Estimate infrastructure requirements
Detect workload spikes
Measure optimization improvements
Forecast future hardware needs

Monitor both:

Average EPS
Peak EPS

Peak ingestion rates often determine infrastructure sizing because temporary spikes can overload systems even when average workloads remain relatively low.
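
The difference between average and peak EPS is easy to see with a small calculation. This sketch assumes event timestamps are available as seconds (for example, parsed from alert records). The function name eps_stats and the sample data are made up for illustration.

```python
from collections import Counter

def eps_stats(event_times):
    """Return (average_eps, peak_eps) for a list of event timestamps in seconds."""
    if not event_times:
        return 0.0, 0
    # Bucket events into whole seconds to find the busiest second
    per_second = Counter(int(t) for t in event_times)
    span = int(max(event_times)) - int(min(event_times)) + 1
    average = len(event_times) / span
    peak = max(per_second.values())
    return average, peak

# Ten events over a five-second window, six of them inside the same second
sample = [0.1, 0.5, 1.2, 3.0, 3.1, 3.2, 3.3, 3.4, 3.9, 4.5]
average, peak = eps_stats(sample)
print(f"average={average:.1f} eps, peak={peak} eps")
```

Here the average is 2 events per second, yet one second carries 6 events. Sizing for the average alone would leave no headroom for that burst.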

Expert Insight: Capacity planning guides from OpenSearch emphasize monitoring workload trends over time rather than relying on instantaneous resource usage. Long-term metrics reveal growth patterns and help organizations scale infrastructure before performance degradation impacts production environments.

If monitoring reveals excessive manager CPU utilization during peak ingestion periods, see Why Is Wazuh Using High CPU? Troubleshooting Guide.

Useful Linux Monitoring Tools

Effective Wazuh performance tuning requires visibility at the operating system level.

Linux provides a set of low-level diagnostic tools that help identify CPU saturation, memory pressure, disk bottlenecks, and I/O contention.

These tools are essential for distinguishing between application-level inefficiencies and infrastructure constraints.

top

top provides a real-time view of system resource utilization.

It helps identify:

Processes consuming high CPU
Memory-heavy services
Load averages
System-wide resource pressure

In Wazuh environments, top is commonly used to detect spikes in:

Wazuh Manager CPU usage during rule evaluation
OpenSearch JVM memory consumption
Log processing surges during ingestion bursts

htop

htop is an enhanced, interactive version of top.

It provides:

Color-coded CPU and memory usage
Per-core CPU utilization
Easier process navigation
Tree view of process relationships

It is particularly useful for quickly identifying whether bottlenecks originate from:

OpenSearch (Java processes)
Wazuh manager processes
System-level I/O contention

vmstat

vmstat provides insight into system performance at the kernel level.

It reports:

CPU scheduling
Memory usage
Swap activity
Block I/O
System interrupts

Key indicators of performance issues include:

Sustained swap-in and swap-out activity in the si and so columns (memory pressure)
High CPU wait time in the wa column (I/O bottlenecks)
Frequent context switching in the cs column (overloaded CPU)

iostat

iostat focuses on disk performance and is critical for diagnosing OpenSearch bottlenecks.

It helps monitor:

Disk read/write throughput
I/O wait times
Device utilization

High I/O wait is a strong indicator that:

Indexing is saturating storage
Disk latency is limiting search performance
Snapshot or merge operations are overwhelming the system
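
The I/O wait percentage that tools like iostat report can be derived from two readings of the aggregate cpu line in /proc/stat on Linux. The sketch below shows the arithmetic on two hard-coded sample lines, so it runs anywhere. The function name iowait_percent and the counter values are illustrative assumptions.

```python
def iowait_percent(before, after):
    """Compute the iowait share between two aggregate 'cpu' lines from /proc/stat.

    Field order after the 'cpu' label: user, nice, system, idle, iowait,
    irq, softirq, steal, guest, guest_nice.
    """
    first = [int(value) for value in before.split()[1:]]
    second = [int(value) for value in after.split()[1:]]
    deltas = [b - a for a, b in zip(first, second)]
    # Sum only the first eight fields: guest time is already counted in user/nice
    total = sum(deltas[:8])
    return 100.0 * deltas[4] / total if total else 0.0

# Hypothetical counters captured a few seconds apart
sample_before = "cpu  100 0 50 800 50 0 0 0 0 0"
sample_after = "cpu  150 0 70 900 130 0 0 0 0 0"

print(f"iowait: {iowait_percent(sample_before, sample_after):.1f}%")
```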

sar

sar (System Activity Reporter) is useful for historical performance analysis.

It tracks:

CPU utilization over time
Memory consumption trends
Network activity
Disk I/O history

Unlike real-time tools, sar is valuable for identifying recurring performance patterns such as:

Daily ingestion spikes
Scheduled scan overhead
Nightly indexing pressure

free

free provides a snapshot of system memory usage.

It shows:

Total RAM
Used memory
Available memory
Buffers and cache

In Wazuh deployments, low available memory often correlates with:

OpenSearch heap pressure
Large query workloads
Excessive indexing activity

df

df monitors disk space usage.

It is essential for ensuring:

Index storage does not reach capacity limits
Log partitions do not fill up
Snapshot repositories remain functional

Running out of disk space can halt indexing entirely, making this one of the most critical monitoring tools.
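
The same check can be scripted so that it feeds an alert instead of relying on someone running df by hand. This is a minimal sketch using Python's standard library. The 80% and 90% thresholds and the function names are assumptions for illustration and should be aligned with your own disk watermark settings.

```python
import shutil

def percent_used(path):
    """Return the percentage of the filesystem containing `path` that is in use."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total

def classify(pct, warn=80.0, crit=90.0):
    """Map a usage percentage to a status label (thresholds are illustrative)."""
    if pct >= crit:
        return "critical"
    if pct >= warn:
        return "warning"
    return "ok"

# Check the root filesystem; point this at your index data path instead
pct = percent_used("/")
print(f"/ is {pct:.1f}% full: {classify(pct)}")
```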

dstat

dstat provides a combined view of CPU, memory, disk, and network usage.

It is especially useful for:

Correlating network spikes with event ingestion
Identifying I/O bursts during indexing
Observing system-wide resource contention in real time

Wazuh Logs That Help Diagnose Performance Problems

Wazuh generates multiple log streams across its architecture.

These logs are essential for diagnosing performance bottlenecks, failed processing stages, and system-level inefficiencies.

Each component provides different visibility into system behavior.

Manager Logs

The Wazuh manager logs are the primary source of operational diagnostics.

They help identify:

Rule evaluation delays
Event decoding errors
Queue overflows
Active response execution issues
Agent communication problems

Common performance-related symptoms include:

Increased event latency warnings
Buffer overflow messages
Rule processing bottlenecks
Dropped event indicators

When diagnosing high CPU usage or alert delays, manager logs are usually the first place to investigate.

If manager CPU is consistently high during event processing, see Why Is Wazuh Using High CPU? Troubleshooting Guide.

Agent Logs

Agent logs provide insight into endpoint-side performance issues.

They help identify:

Logcollector failures
File Integrity Monitoring overload
Syscollector delays
Connectivity issues with the manager
Buffer saturation on endpoints

Typical performance signals include:

Missed log entries
High local CPU usage on endpoints
Buffer overflow warnings
Delayed event transmission

Agent-side issues often cascade into manager-side performance problems when events are retransmitted or batched inefficiently.

OpenSearch Logs

OpenSearch logs are critical for diagnosing indexing and search performance issues.

They reveal:

Heap memory pressure
Garbage collection activity
Slow queries
Shard rebalancing
Indexing failures
Disk watermark warnings

Common performance indicators include:

Long GC pause times
Thread pool rejections
Index write delays
Shard allocation failures

These logs are essential when dashboards become slow or alerts are delayed in appearing.

For memory-related crashes or instability, see How to Tune OpenSearch Heap Size to Stop Wazuh High Memory Crashes.

Dashboard Logs

The Wazuh Dashboard logs help diagnose frontend and query-layer performance issues.

They include:

API request latency
Failed query executions
Visualization rendering errors
Authentication delays
Backend connection issues

While the dashboard is rarely the root cause of performance issues, it often exposes upstream problems such as slow indexing or inefficient queries.

Optimizing Wazuh Agents

Wazuh agents are the first line of data collection and have a significant impact on overall system performance.

Poorly configured agents generate excessive data, increasing load across the entire pipeline, from network transmission to manager processing and OpenSearch indexing.

Effective optimization focuses on reducing unnecessary telemetry while preserving security visibility.

Reduce Unnecessary Log Collection

Not all logs provide meaningful security value.

Collecting everything leads to unnecessary noise, higher CPU usage, and increased storage consumption.

Focus on:

Security-relevant logs
Authentication events
System-critical application logs
Endpoint behavior indicators

Avoid collecting:

Debug logs in production
High-frequency application logs
Redundant telemetry sources

Reducing log collection at the source is one of the most effective performance optimizations available.

Exclude Noisy Log Sources

Certain log sources generate excessive, low-value events.

Common examples include:

Browser caches
Temporary application files
Container runtime logs
Build directories
High-frequency debug outputs

Excluding these sources prevents unnecessary ingestion and reduces downstream processing load.

Filter Unnecessary Events

Filtering allows agents to discard irrelevant events before transmission.

This reduces:

Network bandwidth usage
Manager CPU load
Indexing overhead
Storage consumption

Event filtering is particularly useful in high-volume environments where only a subset of logs is relevant for security monitoring.

Limit Verbose Applications

Applications running in verbose or debug mode can overwhelm Wazuh systems with excessive logs.

Examples include:

Web servers in debug mode
Database systems with query logging enabled
Development environments
Container orchestration platforms with high verbosity settings

Whenever possible, adjust logging levels to production-appropriate settings while preserving security-relevant events.

Optimize File Integrity Monitoring

File Integrity Monitoring (FIM) is one of the most resource-intensive Wazuh features.

Proper optimization is essential for maintaining system stability and preventing unnecessary CPU and disk usage.

See How to Stop Wazuh File Integrity Monitoring (FIM) From Eating Your CPU for a deeper breakdown of optimization strategies.

Reduce Monitored Directories

Monitoring fewer directories significantly reduces CPU usage and event generation.

Prioritize:

System binaries
Security-critical configuration files
Authentication directories
Application configuration paths

Avoid broad directory monitoring such as entire file systems or user home directories unless explicitly required.
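
Before adding a directory to FIM, it can be useful to estimate how many files it would bring into scope. The sketch below simply counts files under each candidate path. The function name count_files and the candidate list are hypothetical, and the count is only a rough proxy for scan cost.

```python
import os

def count_files(root):
    """Count regular directory entries under `root`, including subdirectories."""
    total = 0
    # os.walk silently skips directories it cannot read
    for _dirpath, _dirnames, filenames in os.walk(root):
        total += len(filenames)
    return total

# Hypothetical candidate paths; replace with the directories you are considering
for candidate in ["/etc", "/usr/bin"]:
    print(f"{candidate}: {count_files(candidate)} files")
```

A path that reports hundreds of thousands of files is usually a sign that the scope should be narrowed to specific subdirectories.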

Exclude Temporary Folders

Temporary and cache directories generate constant file changes that produce high event volumes.

Common exclusions include:

/tmp
Application cache directories
Browser cache locations
Build output directories
Container ephemeral storage

Excluding these paths prevents unnecessary FIM load.

Increase Scan Intervals

Frequent scans can overwhelm endpoints, especially in large file systems.

Increasing scan intervals:

Reduces CPU usage
Decreases disk I/O
Lowers event volume

This is particularly effective for non-critical directories.

Disable Unnecessary Hashing

Hash calculation is one of the most expensive operations in FIM.

Reducing hashing frequency or limiting it to critical files helps:

Lower CPU consumption
Reduce disk I/O
Improve scan performance

Only enable hashing where integrity verification is truly required.

Monitor Only Critical Files

The most effective FIM optimization strategy is narrowing scope.

Focus on:

Authentication files
System binaries
Configuration files
Privilege escalation paths

Avoid monitoring files that change frequently without security implications.

Optimize Scheduled Scans

Scheduled scans contribute significantly to endpoint and manager workload, especially in large environments.

Proper tuning ensures consistent performance without compromising detection coverage.

Syscheck

Syscheck scans detect file changes and configuration modifications.

Poor configuration can result in excessive CPU usage and large event volumes.

Optimization strategies include:

Reducing scan scope
Increasing scan intervals
Excluding high-churn directories

Rootcheck

Rootcheck identifies rootkits and system compromises.

To optimize performance:

Avoid overly frequent scans
Focus on critical endpoints
Schedule scans during off-peak hours

Vulnerability Scans

Vulnerability detection consumes CPU and network resources.

Optimization approaches include:

Staggering scan schedules
Reducing scan frequency on stable systems
Prioritizing high-risk assets

Inventory Collection

Inventory modules (Syscollector) gather system information.

To reduce overhead:

Increase collection intervals
Limit unnecessary data types
Avoid redundant collection across environments

Tune Agent Resource Usage

Beyond individual modules, overall agent behavior must be tuned to ensure efficient resource utilization. Agent-level tuning is one of the highest-leverage optimization strategies in Wazuh because every event eliminated at the endpoint reduces load across the entire pipeline: manager processing, indexing, storage, and search.

Reduce Polling Frequency

Frequent polling increases CPU usage, disk activity, and network traffic on endpoints.

Adjust polling intervals based on asset criticality, security requirements, and how often data actually changes:

Increase Syscollector intervals for stable systems
Reduce inventory refresh frequency on large fleets
Avoid overly aggressive scan schedules for low-risk endpoints

Over-polling often produces redundant data without improving detection capability.

Optimize Buffering

Agent buffering temporarily stores events when network or manager throughput is limited.

Proper tuning helps:

Smooth traffic spikes
Prevent event loss during transient outages
Reduce retransmission overhead

However, excessive buffering can:

Increase endpoint memory usage
Delay event delivery
Mask upstream bottlenecks

Buffer size should reflect expected peak ingestion, not theoretical maximums.

Disable Unused Modules

Every enabled module consumes CPU, memory, and I/O resources.

Commonly unnecessary modules depending on environment include:

Vulnerability Detection on non-production systems
Rootcheck on containerized workloads
Syscollector on short-lived instances
Active Response where manual remediation is preferred

Disabling unused modules reduces endpoint overhead and significantly lowers total event volume entering the system.

Optimizing the Wazuh Manager

The Wazuh Manager is responsible for decoding events, evaluating rules, performing correlation, and generating alerts.

It is often the primary CPU bottleneck in large deployments.

Optimize Rule Processing

Rule evaluation is one of the most expensive operations in the Wazuh pipeline.

Each incoming event is compared against thousands of rules, making efficiency critical.

Remove Unused Rules

Unused or irrelevant rules still consume CPU during evaluation.

Optimization steps include:

Disabling unused compliance packs
Removing legacy detections
Eliminating duplicate rule sets
Pruning environment-specific irrelevant rules

A smaller, well-maintained ruleset significantly improves throughput.

Simplify Regex Patterns

Regular expressions are computationally expensive and should be used sparingly.

Optimization strategies:

Prefer exact string matching over regex when possible
Anchor patterns to reduce backtracking
Avoid nested wildcards and overly broad expressions
Limit regex to high-value detections only

Even minor regex improvements can reduce CPU usage at scale.
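
The first two strategies can be illustrated outside Wazuh. The sketch below uses Python's re module, which is only an analogy because Wazuh rules use their own regex syntax. It assumes the message text has already been extracted from the log line. The pattern, the function name extract_failed_login, and the sample messages are illustrative.

```python
import re

# Anchored at the start so the engine never scans the rest of the line for a match
FAILED_LOGIN = re.compile(r"^Failed password for (?:invalid user )?(\S+) from (\S+)")

def extract_failed_login(message):
    """Return (user, source_ip) for a failed SSH login message, else None."""
    # Cheap literal check first: most lines are rejected without running the regex
    if not message.startswith("Failed password"):
        return None
    match = FAILED_LOGIN.match(message)
    return (match.group(1), match.group(2)) if match else None

print(extract_failed_login("Failed password for invalid user admin from 203.0.113.7 port 22 ssh2"))
print(extract_failed_login("Accepted password for root from 10.0.0.5 port 22 ssh2"))
```

The same idea applies to rule design: let an inexpensive literal condition discard most events so that the regular expression only runs on likely candidates.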

Optimize Rule Order

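Wazuh organizes rules into a hierarchy: child rules linked to a parent (for example with if_sid) are only evaluated after the parent matches, so rule structure and ordering directly affect processing time.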

Best practices:

Place high-frequency rules early
Filter benign events before complex evaluation
Prioritize simple conditions before expensive logic

Efficient rule ordering reduces unnecessary computation.

Reduce Expensive Correlations

Correlation rules combine multiple events into higher-level detections but are computationally intensive.

To optimize:

Limit correlation depth
Avoid overly broad matching windows
Use correlation only for high-confidence detections
Pre-filter events before correlation logic executes

Reduce False Positives

False positives increase system load by generating unnecessary alerts, increasing indexing volume, and overwhelming analysts.

See How to Reduce False Positives in Wazuh for detailed tuning strategies.

Rule Tuning

Fine-tuning detection rules improves both accuracy and performance.

Approaches include:

Adjusting rule severity levels
Narrowing event conditions
Disabling overly sensitive detections
Aligning rules with real environment behavior

Well-tuned rules reduce unnecessary processing downstream.

Threshold Adjustments

Threshold-based rules trigger only after a defined number of events occur.

Proper tuning:

Reduces alert noise
Prevents repeated triggering for benign behavior
Improves signal-to-noise ratio

However, thresholds must be balanced to avoid missing genuine threats.
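
The logic behind a threshold rule is a sliding window: count matching events and fire only when enough of them fall inside a time frame. The sketch below models that behavior in isolation. It is not Wazuh's implementation, and the class name ThresholdRule and its parameters are hypothetical, loosely mirroring the frequency and timeframe options found in Wazuh rules.

```python
from collections import deque

class ThresholdRule:
    """Fire once `frequency` events are seen within `timeframe` seconds."""

    def __init__(self, frequency, timeframe):
        self.frequency = frequency
        self.timeframe = timeframe
        self._times = deque()

    def observe(self, timestamp):
        """Record one matching event; return True if the threshold is reached."""
        self._times.append(timestamp)
        # Drop events that have fallen out of the time window
        while self._times and timestamp - self._times[0] > self.timeframe:
            self._times.popleft()
        if len(self._times) >= self.frequency:
            self._times.clear()  # start counting again after firing
            return True
        return False

rule = ThresholdRule(frequency=3, timeframe=60)
for t in [0, 10, 200, 210, 220]:
    print(t, rule.observe(t))
```

In this run, the two early events expire before a third arrives, so the rule only fires on the final event, when three matches land within 60 seconds. Raising the frequency or shortening the time frame reduces noise further, at the cost of missing slower attacks.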

Event Suppression

Event suppression prevents repeated alerts from identical or low-value events.

Benefits include:

Reduced indexing load
Lower storage usage
Improved dashboard clarity

Suppression should be applied carefully to avoid hiding meaningful anomalies.

Custom Rule Refinement

Custom rules should be reviewed regularly to ensure efficiency.

Key improvements:

Remove redundant conditions
Avoid overlapping logic
Consolidate similar rules
Optimize field matching

Poorly designed custom rules are a common source of performance degradation.

Improve Queue Performance

Wazuh uses internal queues to manage event flow between agents, the manager, and the indexer.

Queue inefficiencies often lead to event delays or drops.

Event Queues

Event queues temporarily store incoming logs before processing.

When queues become saturated:

Events are delayed
Memory usage increases
Processing latency grows

Queue saturation typically indicates downstream bottlenecks in rule processing or indexing.

Processing Workers

Processing workers handle event decoding and rule evaluation.

To optimize:

Ensure sufficient worker allocation for workload size
Scale horizontally in high-ingestion environments
Avoid CPU contention between manager processes

Insufficient workers lead to backlogs and delayed alert generation.

Connection Tuning

Agent-to-manager connections must be stable and efficient.

Optimization includes:

Proper TCP configuration
Load balancing for large deployments
Reducing connection churn
Ensuring consistent network latency

Connection instability increases retransmissions and queue pressure.

Optimize Active Response

Active Response automates mitigation actions but can become a performance burden if misconfigured.

Avoid Unnecessary Executions

Each response action consumes CPU and system resources.

Avoid triggering responses for:

Low-confidence alerts
High-frequency benign events
Non-actionable detections

Overuse of automation can significantly increase system load.

Configure Cooldown Periods

Cooldown periods prevent repeated execution of the same response within a short timeframe.

Benefits include:

Reduced system thrashing
Lower CPU usage
Prevention of redundant actions

Cooldowns are essential in noisy environments.
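
A cooldown amounts to remembering when each action last ran and refusing to repeat it too soon. The following is a minimal sketch of that idea, not Wazuh's Active Response code. The class name Cooldown, the key format, and the 300-second period are illustrative assumptions.

```python
class Cooldown:
    """Allow an action for a given key at most once per `seconds`."""

    def __init__(self, seconds):
        self.seconds = seconds
        self._last_run = {}

    def allow(self, key, now):
        """Return True and record the run if `key` is outside its cooldown."""
        last = self._last_run.get(key)
        if last is not None and now - last < self.seconds:
            return False
        self._last_run[key] = now
        return True

cooldown = Cooldown(seconds=300)
print(cooldown.allow("block:203.0.113.7", now=0))    # first trigger
print(cooldown.allow("block:203.0.113.7", now=100))  # repeat inside the cooldown
print(cooldown.allow("block:203.0.113.7", now=300))  # cooldown has elapsed
```

Keying the cooldown on both the action and its target (for example, the source IP) keeps one noisy host from suppressing responses against others.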

Limit Automation Scope

Active Response should be reserved for high-confidence threats.

Best practices:

Apply to critical severity rules only
Restrict execution to specific endpoints
Avoid broad system-wide automation

This ensures responsiveness without overwhelming system resources.

Optimizing OpenSearch Performance

OpenSearch is responsible for indexing, storing, and searching Wazuh alerts.

Poor configuration here can severely impact dashboard performance and alert visibility.

Tune JVM Heap Size

JVM heap size directly affects indexing stability and search performance.

See How to Tune OpenSearch Heap Size to Stop Wazuh High Memory Crashes for detailed configuration guidance.

Recommended Heap Allocation

General best practices include:

Allocate ~50% of system RAM to heap (up to a safe limit)
Keep the heap below ~32 GB so the JVM can continue using compressed object pointers
Ensure remaining RAM is available for OS file cache

Balanced heap allocation improves both indexing and search performance.
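
These rules of thumb reduce to a short calculation. The sketch below takes half of system RAM and caps it just under the 32 GB boundary. The function name recommended_heap_gb and the 31 GB cap are assumptions chosen for illustration, so validate the result against your own workload.

```python
def recommended_heap_gb(total_ram_gb, cap_gb=31):
    """Return a starting heap size: half of RAM, capped, and at least 1 GB."""
    return max(1, min(total_ram_gb // 2, cap_gb))

for ram in (8, 16, 64, 128):
    heap = recommended_heap_gb(ram)
    # -Xms and -Xmx are the standard JVM flags for initial and maximum heap
    print(f"{ram} GB RAM: -Xms{heap}g -Xmx{heap}g")
```

Setting the initial and maximum heap to the same value avoids heap resizing at runtime. On hosts with 64 GB or more, the memory above the cap is left to the operating system's file cache.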

Garbage Collection Tuning

Garbage collection (GC) affects latency and responsiveness.

Symptoms of poor GC tuning:

Query delays
Indexing pauses
CPU spikes
Irregular performance patterns

Optimizing GC reduces pause times and improves system stability.

Avoid Oversized Heaps

Excessively large heaps can:

Increase GC pause duration
Reduce OS cache efficiency
Degrade search performance

Proper sizing is more effective than simply maximizing memory allocation.

Optimize Index Management

Efficient index management ensures OpenSearch remains performant as data grows.

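Index Lifecycle Management (ILM)

Lifecycle policies automate index transitions through stages. In OpenSearch, this feature is called Index State Management (ISM); ILM is the Elasticsearch name for the same concept. A typical policy moves indices through: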

Hot (active indexing)
Warm (reduced activity)
Cold (archival)
Delete (removal)

This prevents uncontrolled index growth.

Index Rotation

Regular index rotation:

Limits shard size
Improves search efficiency
Reduces indexing overhead

Proper rotation policies are essential for long-term scalability.

Retention Policies

Retention policies define how long data is stored.

Benefits:

Controlled storage growth
Faster queries
Reduced maintenance overhead

Retention should align with compliance requirements.

Delete Old Indices

Old indices should be removed or archived to prevent:

Storage exhaustion
Slow searches
Increased cluster overhead

Automated cleanup improves long-term performance stability.
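
The selection step of an automated cleanup can be sketched as follows. It assumes daily indices whose names end in a YYYY.MM.DD date, as in wazuh-alerts-4.x-2024.03.31. The function name expired_indices and the 30-day retention are illustrative. The sketch only decides which indices are past retention and leaves the actual deletion to your lifecycle policy or tooling.

```python
from datetime import date, datetime, timedelta

def expired_indices(names, today, retention_days):
    """Return the index names whose date suffix is older than the retention window."""
    cutoff = today - timedelta(days=retention_days)
    expired = []
    for name in names:
        try:
            day = datetime.strptime(name[-10:], "%Y.%m.%d").date()
        except ValueError:
            continue  # skip names without a date suffix
        if day < cutoff:
            expired.append(name)
    return expired

indices = [
    "wazuh-alerts-4.x-2024.01.01",
    "wazuh-alerts-4.x-2024.03.25",
    "wazuh-alerts-4.x-2024.03.31",
    ".kibana_1",
]
print(expired_indices(indices, today=date(2024, 3, 31), retention_days=30))
```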

Improve Search Performance

Search performance directly affects dashboard responsiveness and analyst efficiency.

Optimize Mappings

Efficient mappings reduce indexing and search overhead.

Best practices:

Use appropriate field types
Avoid unnecessary full-text indexing
Disable unused fields

Poor mappings increase storage and query complexity.

Reduce Shard Count

Too many shards increase cluster overhead.

Effects include:

Higher memory usage
Slower queries
Increased coordination overhead

Proper shard sizing improves performance significantly.
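
A rough starting point for shard count is to divide the expected index size by a target shard size. The sketch below assumes a 30 GB target per primary shard, which is a commonly cited ballpark for log workloads rather than a fixed rule. The function name primary_shards is hypothetical.

```python
import math

def primary_shards(index_size_gb, target_shard_gb=30):
    """Estimate primary shards for an index of the given size (at least one)."""
    return max(1, math.ceil(index_size_gb / target_shard_gb))

for daily_gb in (5, 90, 100):
    print(f"{daily_gb} GB/day index: {primary_shards(daily_gb)} primary shard(s)")
```

For small daily indices, a single primary shard is often enough. Many small shards across long retention periods are a typical source of the overhead described above.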

Merge Segments

Segment merging improves search efficiency by reducing index fragmentation.

Benefits:

Faster queries
Lower disk usage
Improved indexing stability

However, merging should be balanced to avoid excessive I/O load.

Query Optimization

Inefficient queries degrade performance.

Optimization strategies:

Avoid wildcard-heavy searches
Use time filters whenever possible
Limit aggregation complexity
Narrow query scope

Well-structured queries dramatically improve dashboard responsiveness.

Storage Optimization

Storage is a foundational component of Wazuh performance.

SSD vs HDD

SSDs provide significantly better performance than HDDs:

Lower latency
Higher IOPS
Faster indexing
Improved search performance

HDDs often become bottlenecks in high-ingestion environments.

RAID Considerations

RAID configuration impacts both redundancy and performance:

RAID 1 improves redundancy
RAID 10 balances performance and redundancy
RAID 5 may introduce write penalties

RAID selection should reflect workload intensity and resilience requirements.

Disk Monitoring

Continuous disk monitoring helps prevent:

Storage exhaustion
Performance degradation
Indexing failures

Key metrics include:

Disk usage
I/O latency
Throughput
Queue depth

Reducing High CPU Usage

High CPU usage in Wazuh environments typically results from cumulative inefficiencies across multiple components rather than a single issue.

Common Causes

File Integrity Monitoring

Large directory scans
Frequent file changes
Real-time monitoring overhead
Excessive hashing

Rule Evaluation

Expensive regex patterns
Large rulesets
Poor rule ordering
Excessive correlation logic

Large Log Volumes

Excessive Windows Event Logs
Verbose application logging
Duplicate log sources
High ingestion rates

OpenSearch Indexing

Large shards
Insufficient heap memory
Slow disk performance
High garbage collection activity

Agent Scanning

Frequent Syscollector scans
Overactive vulnerability detection
High-frequency polling intervals

Addressing these areas holistically produces the most significant performance improvements.

Troubleshooting High CPU

High CPU usage in Wazuh environments is rarely caused by a single component.

Instead, it typically results from a combination of excessive event ingestion, inefficient rule processing, heavy File Integrity Monitoring workloads, and indexing pressure in OpenSearch.

For a deeper breakdown of root causes and diagnostics, see Why Is Wazuh Using High CPU? Troubleshooting Guide.

Identify the Affected Process

The first step is to determine which component is consuming CPU resources.

Key processes to inspect:

Wazuh manager daemons, mainly wazuh-analysisd (decoding, rule evaluation, correlation)
filebeat / log forwarders (log shipping)
java (OpenSearch JVM)
Agent processes (endpoint-side load)

Use system monitoring tools to isolate whether CPU usage is concentrated on:

A single node (localized issue)
A cluster-wide pattern (systemic issue)
Specific time windows (scheduled scans or ingestion spikes)
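
When several processes share a name (OpenSearch threads, multiple manager daemons), summing CPU by command makes the heaviest component easier to spot. The sketch below parses text in the two-column format produced by ps -eo pcpu,comm on Linux. The sample output and its figures are made up, and the function name cpu_by_command is hypothetical.

```python
from collections import defaultdict

def cpu_by_command(ps_output):
    """Sum %CPU per command name and return (command, total) pairs, highest first."""
    totals = defaultdict(float)
    for line in ps_output.strip().splitlines()[1:]:  # skip the header row
        pcpu, command = line.split(None, 1)
        totals[command.strip()] += float(pcpu)
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)

# Hypothetical output captured from: ps -eo pcpu,comm
sample = """
%CPU COMMAND
62.5 wazuh-analysisd
30.0 java
 4.5 wazuh-remoted
 1.0 java
"""

for command, total in cpu_by_command(sample):
    print(f"{command}: {total:.1f}%")
```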

Analyze Workload

Once the affected process is identified, evaluate the workload it is handling.

Common workload indicators include:

Events per second (EPS) spikes
Large bursts of Windows Event Logs
FIM scan activity
Scheduled vulnerability scans
Heavy dashboard query traffic

Understanding workload patterns helps distinguish between expected peak behavior and misconfiguration-driven overload.

Review Configuration

Configuration issues are a leading cause of sustained CPU saturation.

Focus on:

Log sources and verbosity levels
Rule complexity and redundancy
FIM scope and scan frequency
OpenSearch heap allocation
Index shard configuration

Many CPU issues are resolved by removing unnecessary processing rather than increasing hardware capacity.

Apply Targeted Optimizations

After identifying the bottleneck, apply specific fixes:

Reduce log ingestion volume
Simplify or disable expensive rules
Optimize FIM configurations
Tune OpenSearch heap and shard settings
Adjust agent polling intervals

Targeted changes are more effective than broad system upgrades.

Fixing Memory Problems

Memory issues in Wazuh deployments primarily originate from OpenSearch heap pressure, large datasets, and inefficient query patterns.

If left unresolved, they can lead to service instability, slow searches, and system crashes.

Common Memory Issues

OpenSearch Heap Exhaustion

When JVM heap memory is insufficient, OpenSearch may:

Trigger frequent garbage collection
Reject indexing requests
Crash under load
Degrade search performance

This is one of the most common causes of Wazuh instability in large deployments.

Large Rule Sets

Excessive rule complexity indirectly contributes to memory pressure by:

Increasing event processing time
Expanding in-memory queues
Raising correlation overhead

Heavy Searches

Complex queries, especially those with aggregations over large time ranges, increase:

Memory consumption
CPU usage
GC frequency

Memory Leaks

Although less common, misconfigured plugins or inefficient processes can gradually increase memory usage over time, eventually leading to instability.

Best Practices

Heap Sizing

Proper heap sizing is critical for OpenSearch stability.

Key principles:

Allocate approximately 50% of system RAM to JVM heap (within safe limits)
Keep the heap below the ~32 GB threshold so the JVM can continue using compressed object pointers
Preserve sufficient RAM for OS file caching

For detailed configuration guidance, see How to Tune OpenSearch Heap Size to Stop Wazuh High Memory Crashes.

JVM Monitoring

Monitor garbage collection behavior to detect early signs of memory stress:

GC pause frequency
Heap usage trends
Allocation rates
Old generation pressure

Memory Alerts

Set alerts for:

Sustained high heap usage
Swap activity
Increasing GC pause times
Indexing latency spikes

Early detection prevents cascading system failures.

Capacity Planning

Memory requirements should scale with:

Event ingestion rate
Index size
Retention period
Dashboard query complexity

Proper planning prevents reactive scaling and unexpected outages.
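
Index size, one of the inputs above, can be approximated from ingestion rate and retention. The sketch below is back-of-the-envelope arithmetic only. The function name index_storage_gb, the 1,000-byte average event size, and the single replica are assumptions, and real on-disk size also depends on mappings and compression.

```python
def index_storage_gb(eps, avg_event_bytes, retention_days, replicas=1):
    """Estimate total index storage in GB (1 GB = 10**9 bytes)."""
    daily_bytes = eps * avg_event_bytes * 86_400  # seconds per day
    return daily_bytes * retention_days * (1 + replicas) / 1e9

# Hypothetical workload: 1,000 EPS, 1,000-byte events, 90 days, one replica
estimate = index_storage_gb(eps=1000, avg_event_bytes=1000, retention_days=90)
print(f"Estimated storage: {estimate:,.0f} GB")
```

With these inputs, the workload produces 86.4 GB per day, which grows to roughly 15.5 TB over 90 days once the replica is included.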

Optimizing Log Collection

Log collection is one of the highest-impact areas for performance optimization because it directly determines how much data enters the Wazuh pipeline.

Reduce Logcollector Overhead

Logcollector continuously monitors configured sources. Inefficient configuration can overwhelm both endpoints and the manager.

Exclude Unnecessary Logs

Exclude sources that do not contribute to security visibility, such as:

Debug logs
Temporary application logs
Cache directories
Development artifacts

Optimize Polling

Excessive polling increases CPU usage and network traffic.

Best practices:

Increase polling intervals for low-value logs
Avoid unnecessary real-time monitoring where scheduling is sufficient
Align polling frequency with log generation rates

Filter Duplicate Events

Duplicate log sources significantly increase processing overhead.

Common duplication sources:

Multiple agents monitoring the same file
Redundant syslog forwarding
Overlapping application logging configurations

Reduce Noisy Applications

Verbose applications generate excessive logs that provide little security value.

Examples include:

Debug-enabled web servers
Database query logging
Container runtime verbosity
Development tools in production

Prevent Dropped Messages

Dropped logs indicate that the system is overwhelmed and cannot process events fast enough.

See Fix Wazuh Logcollector Dropped Messages for detailed mitigation strategies.

Increase Buffers

Larger buffers help absorb short-term spikes in log volume, but must be carefully balanced to avoid memory pressure on endpoints.

Reduce Log Bursts

Control sudden ingestion spikes by:

Staggering agent reporting intervals
Reducing simultaneous scan schedules
Avoiding synchronized batch jobs across endpoints
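
One way to stagger schedules without coordinating endpoints is to derive a start offset from each agent's identifier. The sketch below hashes the identifier so the offset is stable across runs and spread across the window. The function name stagger_offset, the agent IDs, and the one-hour window are illustrative assumptions.

```python
import hashlib

def stagger_offset(agent_id, window_seconds):
    """Return a stable per-agent delay in the range [0, window_seconds)."""
    digest = hashlib.sha256(agent_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % window_seconds

# Spread hypothetical agents across a one-hour window
for agent in ("001", "002", "003"):
    print(agent, stagger_offset(agent, window_seconds=3600))
```

A cryptographic hash is used here only because it gives the same result on every host and run, unlike Python's built-in hash for strings.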

Improve Storage Performance

Slow storage increases backlog formation and contributes to dropped messages.

Upgrading to SSD or NVMe significantly improves ingestion stability.

Verify Manager Throughput

Ensure the Wazuh manager can process incoming events at peak load.

If ingestion exceeds processing capacity, queue buildup and event drops become inevitable.

Scaling Wazuh for Large Environments

As environments grow, single-node or minimally configured deployments become insufficient.

Scaling ensures Wazuh can handle increasing event volume while maintaining performance and reliability.

Horizontal Scaling

Horizontal scaling distributes workload across multiple nodes.

Multiple Managers

Deploying multiple Wazuh managers:

Distributes event processing
Reduces CPU bottlenecks
Improves fault tolerance

Load Balancing

Load balancers distribute agent traffic across available managers, preventing overloading of a single node.

Distributed Architecture

A distributed design separates:

Agents
Managers
Indexers
Dashboards

This improves scalability and isolates performance bottlenecks.

OpenSearch Cluster Scaling

OpenSearch must scale alongside the Wazuh manager to maintain performance.

Dedicated Master Nodes

Master nodes handle cluster coordination and should not be burdened with indexing workloads.

Data Nodes

Data nodes store and index logs. Scaling data nodes improves:

Indexing throughput
Query performance
Storage capacity

Coordinating Nodes

Coordinating nodes handle search and aggregation requests, improving dashboard responsiveness.

Replica Planning

Replicas improve:

Fault tolerance
Read performance
Query distribution

However, they also increase storage requirements and indexing overhead.

Agent Scaling Best Practices

Enrollment Strategy

Efficient onboarding prevents configuration drift and performance issues.

Best practices:

Use centralized enrollment
Apply consistent policies
Automate configuration distribution

Group Policies

Grouping agents allows:

Consistent configuration
Reduced management overhead
Targeted optimization strategies

Configuration Management

Automated configuration management ensures:

Uniform logging policies
Controlled FIM scope
Consistent scan intervals

Wazuh Performance Optimization Checklist

A structured checklist ensures consistent tuning across environments.

Monitor CPU, memory, disk, and network utilization
Reduce unnecessary log collection
Tune File Integrity Monitoring
Remove noisy detection rules
Reduce false positives
Optimize OpenSearch heap size
Configure index lifecycle management (Index State Management in OpenSearch)
Optimize shard allocation
Rotate and delete old indices
Increase Logcollector efficiency
Monitor indexing latency
Benchmark after configuration changes
Scale infrastructure before bottlenecks occur
Review performance metrics regularly

Common Wazuh Performance Mistakes

Many performance issues stem from predictable configuration mistakes.

Monitoring Everything

Collecting all possible logs creates excessive noise and unnecessary processing overhead.

Ignoring Noisy Logs

Failing to filter verbose applications significantly increases ingestion volume.

Oversized FIM Configurations

Monitoring entire filesystems leads to massive CPU and storage consumption.

Poor OpenSearch Heap Configuration

Incorrect heap sizing causes instability, slow searches, and indexing failures.
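
A commonly cited starting point is to give the JVM about half of the node's RAM, capped just below 32 GB so that compressed object pointers stay enabled, and to leave the rest for the operating system's file cache. The sketch below encodes that rule of thumb; the function name and the 31 GB cap are assumptions for illustration, and the result should be validated against your own workload.

```python
def suggested_heap_gb(ram_gb: int, cap_gb: int = 31) -> int:
    """Rule-of-thumb heap size: half of RAM, at least 1 GB, at most cap_gb."""
    if ram_gb <= 0:
        raise ValueError("ram_gb must be positive")
    return max(1, min(ram_gb // 2, cap_gb))


if __name__ == "__main__":
    for ram in (8, 16, 64, 128):
        heap = suggested_heap_gb(ram)
        # Minimum and maximum heap are normally set to the same value.
        print(f"{ram} GB RAM -> -Xms{heap}g -Xmx{heap}g")
```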

Too Many Shards

Excessive shards increase cluster overhead and reduce efficiency.
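
Shard counts grow faster than most people expect because they multiply across indices, replicas, and retention. The sketch below assumes time-based indices with a fixed layout; the function name and the sample values (one index per day, 3 primaries, 1 replica, 90 days) are illustrative.

```python
def total_shards(
    indices_per_day: int,
    primaries_per_index: int,
    replicas: int,
    retention_days: int,
) -> int:
    """Shards held in the cluster once retention is fully populated."""
    if min(indices_per_day, primaries_per_index, replicas, retention_days) < 0:
        raise ValueError("inputs must not be negative")
    # Each primary shard has `replicas` additional copies.
    return indices_per_day * primaries_per_index * (1 + replicas) * retention_days


if __name__ == "__main__":
    print(total_shards(1, 3, 1, 90))  # daily index, 3 primaries, 1 replica
    print(total_shards(1, 1, 1, 90))  # same data with 1 primary per index
```

In this example, dropping from three primaries to one per daily index cuts the shard count from 540 to 180 without changing retention.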

Keeping Data Forever

Unlimited retention leads to storage exhaustion and degraded performance.
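
Retention can be enforced automatically with an OpenSearch Index State Management (ISM) policy that deletes indices once they reach a given age. The sketch below only builds the policy document as JSON; the 90-day retention, the helper name, and the choice of the `wazuh-alerts-*` pattern are assumptions to adapt to your own requirements.

```python
import json


def build_retention_policy(index_pattern: str, retention_days: int) -> dict:
    """Build an ISM policy body that deletes matching indices after N days."""
    if retention_days <= 0:
        raise ValueError("retention_days must be positive")
    return {
        "policy": {
            "description": f"Delete {index_pattern} indices after {retention_days} days",
            "default_state": "hot",
            "states": [
                {
                    # Indices stay here until they are old enough to delete.
                    "name": "hot",
                    "actions": [],
                    "transitions": [
                        {
                            "state_name": "delete",
                            "conditions": {"min_index_age": f"{retention_days}d"},
                        }
                    ],
                },
                {"name": "delete", "actions": [{"delete": {}}], "transitions": []},
            ],
            # Attach the policy automatically to newly created matching indices.
            "ism_template": {"index_patterns": [index_pattern], "priority": 1},
        }
    }


if __name__ == "__main__":
    print(json.dumps(build_retention_policy("wazuh-alerts-*", 90), indent=2))
```

The printed JSON is the kind of body that the ISM policies API accepts (a PUT to `_plugins/_ism/policies/<policy_id>`). Test it on a non-production cluster first, since a delete action cannot be undone.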

Ignoring Capacity Planning

Lack of planning results in reactive scaling and unexpected outages.

Not Monitoring Performance Metrics

Without metrics, optimization becomes guesswork rather than engineering.

Frequently Asked Questions (FAQ)

Question: What is Wazuh performance optimization?

It is the process of tuning agents, managers, and OpenSearch to improve event processing efficiency, reduce resource usage, and increase detection speed.

Question: Why is Wazuh using so much CPU?

Common causes include high log volume, inefficient rules, File Integrity Monitoring overload, and OpenSearch indexing pressure.

Question: How do I reduce Wazuh memory usage?

Optimize OpenSearch heap size, reduce query complexity, and limit data ingestion.

Question: How can I make Wazuh faster?

Reduce log volume, tune rules, optimize indexing, and improve storage performance.

Question: How do I optimize File Integrity Monitoring?

Limit monitored directories, exclude temporary folders, and reduce scan frequency.

Question: Why is OpenSearch crashing with Wazuh?

Usually due to heap exhaustion, poor shard configuration, or insufficient memory allocation.

Question: How do I reduce false positives in Wazuh?

Tune rules, adjust thresholds, and suppress noisy events.

Question: What causes dropped Logcollector messages?

High ingestion rates, insufficient buffers, or manager throughput limitations.

Question: How many events per second can Wazuh handle?

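There is no fixed limit. Throughput depends on hardware, configuration, and tuning, so benchmark your own deployment at peak load; well-tuned, horizontally scaled deployments can sustain very high EPS.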

Question: Should I use SSDs for Wazuh?

Yes. SSDs significantly improve indexing, search, and overall system responsiveness.

Question: What hardware is recommended for production Wazuh deployments?

Multi-core CPUs, sufficient RAM for OpenSearch heap, and SSD/NVMe storage are recommended.

Question: How often should I review Wazuh performance?

Continuously, if possible, through monitoring dashboards, with deeper reviews whenever you scale the deployment or change its configuration.

Conclusion

Wazuh performance optimization is a continuous process rather than a one-time configuration task.

The most impactful improvements come from reducing unnecessary workload at the source, tuning detection logic, and ensuring that OpenSearch is properly sized and maintained.

The key strategies across all environments include minimizing log noise, optimizing File Integrity Monitoring, refining detection rules, properly configuring OpenSearch heap and shards, and continuously monitoring system metrics to detect early signs of degradation.

As deployments scale, proactive capacity planning becomes essential.

Performance issues are far easier to prevent than to resolve after they impact production systems.

For deeper implementation guidance, explore the following detailed optimization guides: