Troubleshooting Wazuh Manager Core Dumps

A Wazuh Manager core dump is one of the clearest indicators that something has gone seriously wrong inside the Wazuh server.

When a critical Wazuh process crashes unexpectedly, the operating system may generate a core dump file containing a snapshot of the process’s memory, execution state, loaded libraries, and stack traces at the exact moment of failure.

While many administrators focus on restoring service availability after a crash, the core dump itself often contains the information needed to identify and permanently resolve the underlying problem.

Core dumps should never be treated as isolated incidents. In most environments, they are symptoms of deeper issues such as software bugs, memory corruption, resource exhaustion, incompatible integrations, corrupted databases, configuration errors, or operating system limitations.

Ignoring repeated core dumps can lead to recurring outages, degraded security visibility, and unreliable event processing.

During an outage, important security events may be delayed, dropped, or never processed at all, potentially allowing malicious activity to go unnoticed.

In this guide, you’ll learn how Wazuh Manager core dumps occur, how to identify the affected components, how to analyze crash data, and how to systematically troubleshoot the most common root causes. You’ll also learn preventive measures that reduce the likelihood of future crashes and improve overall Wazuh stability.

For a complete guide, see The Ultimate Wazuh Troubleshooting Guide: Fix Common Issues.

Understanding Wazuh Manager Core Dumps

What Is a Core Dump?

A core dump is a file generated by the Linux kernel when a running process terminates unexpectedly due to a fatal error such as a segmentation fault, illegal instruction, memory access violation, or abort signal.

Think of a core dump as a forensic snapshot of a crashed process.

It captures the internal state of the application at the exact moment the failure occurred, allowing developers and system administrators to reconstruct what happened.

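Linux generates a core dump only when dumping is enabled: the process's core file size limit (RLIMIT_CORE, reported by ulimit -c) must allow it, and the kernel.core_pattern setting determines where the dump is sent.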

Depending on the distribution and configuration, these files may be stored directly on disk or managed by systemd-coredump.
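As a quick way to check the limit side of this, the sketch below interprets the value reported by ulimit -c. The helper name core_limit_status is hypothetical, and the result describes the current shell only; a running service can have a different limit, and a piped handler such as systemd-coredump may apply its own rules.

```shell
# core_limit_status VALUE
# Interpret a core file size limit as reported by `ulimit -c`.
# (Hypothetical helper name, used for illustration only.)
core_limit_status() {
    case "$1" in
        0)         echo "disabled" ;;
        unlimited) echo "unlimited" ;;
        *)         echo "limited to $1 blocks" ;;
    esac
}

# Check the limit that applies to the current shell.
core_limit_status "$(ulimit -c)"
```

A result of "disabled" typically means no traditional core file will be written for processes that inherit this limit.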

Core dump files typically contain:

Process memory contents
Stack traces
Register values
Loaded shared libraries
Thread information
Execution state
Signal information that triggered the crash

This information is extremely valuable during troubleshooting because it allows engineers to determine the exact code path that caused the failure.

According to the Linux kernel documentation, core dumps are specifically designed to assist post-mortem debugging by preserving process state after abnormal termination.

For Wazuh deployments experiencing repeated crashes, core dump analysis often reveals the root cause far faster than reviewing log files alone.

How Wazuh Manager Generates Core Dumps

The Wazuh Manager consists of multiple services and internal modules working together to collect, analyze, store, and correlate security events.

Under normal conditions, these processes shut down gracefully when administrators stop the service. During a graceful shutdown, processes release resources, close connections, and exit cleanly without generating a core dump.

Core dumps occur when a process crashes unexpectedly before normal cleanup procedures can execute.

Several Wazuh components are commonly involved in crash scenarios:

analysisd

The analysis engine responsible for decoding events, matching rules, generating alerts, and processing incoming security data.

remoted

Handles communication between Wazuh agents and the manager, including event reception and agent connectivity.

logcollector

Responsible for collecting and forwarding local logs when installed on manager systems.

modulesd

Runs various Wazuh modules and integrations, including vulnerability detection and external data sources.

authd

Handles agent registration and authentication processes.

Related reading: /fix-authd-registration-failures-wazuh-agent-password-mismatched-guide/

wazuh-db

Manages internal database operations and agent-related data storage.

Related reading: Fixing wazuh-db Worker Thread Crashes

If any of these components encounter fatal software defects, corrupted data structures, memory allocation failures, or resource exhaustion conditions, the Linux kernel may write a core dump as it terminates the process.

Why Core Dumps Should Never Be Ignored

Many administrators make the mistake of restarting Wazuh services after a crash without investigating the cause.

While this may temporarily restore functionality, the underlying issue often remains unresolved.

Core dumps frequently indicate hidden stability problems such as:

Memory leaks
Software defects
Corrupted index data
Resource exhaustion
Integration failures
Operating system limitations
Incompatible package versions

When manager processes crash, event processing may be interrupted for seconds, minutes, or even hours depending on how quickly the issue is detected.

These interruptions can lead to:

Lost log events
Delayed alerts
Missed detections
Incomplete incident timelines
Agent communication failures

Security researchers at the SANS Institute have repeatedly emphasized that monitoring gaps and logging interruptions significantly reduce an organization’s ability to detect and investigate attacks.

Even if a crashed process automatically restarts, the resulting monitoring blind spot may leave critical security events unrecorded.

For this reason, every Wazuh Manager core dump should be treated as a high-priority operational incident requiring investigation and root cause analysis.

Common Symptoms of Wazuh Manager Core Dumps

Unexpected Service Restarts

One of the most common indicators of manager core dumps is frequent service restarts.

Administrators may notice that the Wazuh Manager service repeatedly transitions between running and failed states.

On systems managed by systemd, automatic restart policies often hide the original crash by immediately launching a new process instance.

Common indicators include:

Frequent service restarts in logs
Unexpected manager downtime
Intermittent dashboard functionality
Repeated crash-recovery cycles

You can often identify these patterns using:

systemctl status wazuh-manager
journalctl -u wazuh-manager

In severe cases, the service may enter a restart loop where crashes occur immediately after startup.

Missing or Delayed Security Alerts

Another major symptom is a sudden interruption in alert generation.

When critical components such as analysisd crash, incoming events may stop being analyzed even if agents continue sending logs.

Administrators may observe:

Missing alerts
Delayed detections
Reduced alert volume
Incomplete rule matches
Gaps in security monitoring

This behavior is often mistaken for rule configuration problems when the real issue is a crashed processing component.

Agent Connection Problems

Manager crashes frequently affect agent communication.

Depending on which component fails, agents may:

Disconnect unexpectedly
Fail health checks
Miss heartbeat acknowledgments
Experience registration failures
Stop transmitting events

Common symptoms include:

Agents showing disconnected status
Increased reconnect attempts
Communication timeout errors
Delayed event delivery

High Resource Utilization Before Crashes

Many Wazuh Manager crashes are preceded by abnormal resource consumption.

Before the core dump occurs, administrators may observe:

Memory Spikes

Rapid increases in RAM consumption can indicate memory leaks, oversized event queues, or database issues.

CPU Saturation

Excessive event processing workloads can push manager processes to 100% CPU utilization for extended periods.

File Descriptor Exhaustion

Large environments handling thousands of agent connections may exhaust available file descriptors, causing process instability.

The Open Source Security Foundation (OpenSSF) recommends continuous monitoring of resource consumption as part of production security platform reliability practices.

Tracking resource trends often helps administrators identify the root cause before the next crash occurs.
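One lightweight way to track a memory trend is to sample a process's resident set size at intervals. The sketch below assumes a Linux /proc filesystem; the helper names rss_from_status and sample_rss are hypothetical.

```shell
# rss_from_status FILE
# Print the VmRSS value (in kB) from a /proc/<PID>/status file.
rss_from_status() {
    awk '/^VmRSS:/ { print $2 }' "$1"
}

# sample_rss PID COUNT INTERVAL
# Print COUNT timestamped RSS samples for PID, INTERVAL seconds apart.
sample_rss() {
    i=0
    while [ "$i" -lt "$2" ]; do
        printf '%s %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" \
            "$(rss_from_status "/proc/$1/status")"
        i=$((i + 1))
        if [ "$i" -lt "$2" ]; then sleep "$3"; fi
    done
}
```

For example, sample_rss "$(pgrep -o wazuh-analysisd)" 60 5 would record five minutes of samples. A value that keeps climbing under a steady workload is a hint worth investigating, not proof of a leak.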

Core Dump Files Appearing on Disk

The most direct symptom is the appearance of core dump files themselves.

Depending on the Linux distribution and core dump configuration, files may appear in locations such as:

/var/lib/systemd/coredump/
/var/crash/
/tmp/

Common file naming patterns include:

core
core.12345
core.wazuh-analysisd.12345

Many administrators first discover the problem when:

Disk usage unexpectedly increases
Backup jobs begin processing large dump files
System monitoring tools report new files
Crash analysis directories fill with data

The presence of one or more core dump files should immediately trigger an investigation, especially if multiple dumps are generated over a short period of time.

Repeated dumps almost always indicate a persistent stability problem rather than an isolated incident.

Common Causes of Wazuh Manager Core Dumps

Understanding why a Wazuh Manager process crashed is the most important part of troubleshooting.

While the core dump itself provides evidence of the failure, identifying the underlying root cause prevents future outages and improves platform stability.

Below are the most common reasons Wazuh Manager components generate core dumps.

Memory Exhaustion and Out-of-Memory Conditions

Memory-related failures are among the leading causes of Wazuh Manager crashes.

As event volume grows, Wazuh must allocate memory for:

Event processing
Rule evaluation
Agent communication
Internal queues
Database operations
Vulnerability detection tasks

If memory consumption exceeds available resources, processes can become unstable and eventually crash.

Memory Leaks

A memory leak occurs when an application allocates memory but fails to release it after use.

Over time, leaked memory accumulates until the process exhausts available RAM or virtual memory resources.

Common indicators include:

Gradually increasing memory usage
Slower performance over time
Crashes after running for several days or weeks
Frequent OOM (Out of Memory) events

Excessive Event Volumes

Large environments can overwhelm manager processes with sudden bursts of events.

Examples include:

Malware outbreaks
Log forwarding loops
Firewall floods
Authentication storms
Network scanning activity

When event rates exceed processing capacity, resource consumption can spike dramatically.

Large Queues

If event ingestion exceeds processing speed, internal queues may grow uncontrollably.

Large queues consume memory and increase pressure on components such as analysisd and remoted.

Resource Starvation

Memory shortages often affect more than one subsystem.

When critical resources become scarce, Wazuh processes may fail to allocate memory required for normal operations, resulting in crashes and core dumps.

Corrupted Log Data

Wazuh processes millions of log events from different sources and formats.

Not all logs are well-formed.

Unexpected input can occasionally trigger parser failures or expose software defects.

Malformed Log Entries

Corrupted or improperly formatted logs may contain:

Missing fields
Broken JSON structures
Invalid delimiters
Unexpected character sequences

Although Wazuh is designed to handle malformed data gracefully, some edge cases can trigger crashes in affected versions.

Unexpected Encoding Formats

Encoding mismatches can create parsing problems.

Examples include:

UTF-8 versus UTF-16 mismatches
Invalid Unicode characters
Binary data embedded in text logs

These issues may cause decoders or integrations to behave unpredictably.

Oversized Events

Extremely large events can consume excessive memory during processing.

Examples include:

Multi-megabyte JSON payloads
Large audit logs
Oversized application logs
Corrupted log files

Oversized events have historically been responsible for processing failures across many SIEM and log management platforms.

Software Bugs and Version Defects

Not every crash is caused by environmental issues.

Sometimes the root cause is a software defect within Wazuh itself.

Known Bugs in Specific Releases

Every major software platform occasionally ships with defects that are later corrected through updates.

Before spending hours troubleshooting, check whether the crash matches a known issue documented in the Wazuh release notes, which frequently contain bug fixes related to stability, memory management, and crash prevention.

Module-Specific Crashes

Certain modules may crash independently while the rest of the platform continues functioning.

Examples include:

Vulnerability detection modules
Cloud integrations
External connectors
Agent enrollment services

Module-specific failures often appear repeatedly under identical workloads.

Edge-Case Processing Failures

Many software defects only occur under unusual circumstances.

Examples include:

Extremely large event payloads
Rare decoder combinations
Unexpected agent behavior
Concurrent processing conditions

These failures can be difficult to reproduce without analyzing the core dump.

Third-Party Integration Issues

Many Wazuh deployments include custom integrations and external automation.

While powerful, these integrations can also introduce instability.

External Scripts

Custom scripts may:

Consume excessive resources
Return malformed data
Create deadlocks
Trigger unexpected module behavior

Poorly written integrations are a common source of intermittent crashes.

Custom Integrations

Organizations frequently connect Wazuh to:

Ticketing platforms
Threat intelligence feeds
SIEM solutions
Automation tools

Problems within these integrations can indirectly affect manager stability.

API-Related Crashes

API integrations occasionally trigger failures when:

Responses are malformed
Timeouts occur unexpectedly
Authentication failures are not handled correctly
Returned data exceeds expected limits

Reviewing integration logs often helps identify these issues.

Database and Internal Communication Problems

Several Wazuh components depend on internal database communication.

Failures in this area can cause instability throughout the platform.

wazuh-db Failures

The wazuh-db service handles important internal operations involving agent and configuration data.

When wazuh-db becomes unstable, dependent components may also fail.

Database Corruption

Corrupted databases can lead to:

Failed queries
Invalid responses
Unexpected process termination
Repeated crash cycles

Corruption may result from:

Improper shutdowns
Disk failures
File system corruption
Incomplete upgrades

IPC Communication Issues

Wazuh components communicate internally through Inter-Process Communication (IPC) mechanisms.

If IPC channels become corrupted or unavailable, processes may:

Hang indefinitely
Receive invalid responses
Terminate unexpectedly

These failures often appear in manager logs shortly before a crash.

Rule and Decoder Configuration Errors

Custom configurations can introduce instability when not properly tested.

Invalid Custom Rules

Incorrect rule syntax may cause parsing failures during startup or event processing.

Always validate custom rules before deploying them to production.

Recursive Logic Problems

Poorly designed rule chains can create excessive processing overhead.

Examples include:

Circular rule references
Excessive inheritance chains
Deep dependency relationships

These conditions can dramatically increase CPU and memory usage.

Decoder Parsing Issues

Custom decoders sometimes fail when encountering unexpected data.

Common problems include:

Incorrect regex patterns
Missing fields
Invalid assumptions about log structure

Decoder-related crashes often appear only when specific event types are processed.

Disk and File System Problems

Storage problems can affect every Wazuh component.

Full Disks

A full disk can prevent:

Log writing
Queue storage
Database updates
Temporary file creation

When critical write operations fail, processes may terminate unexpectedly.

Corrupted Storage

File system corruption can damage:

Databases
Index files
Queue files
Configuration files

Corruption often causes recurring crashes that persist across service restarts.

File Permission Issues

Incorrect ownership or permissions may prevent Wazuh components from accessing required files.

Symptoms include:

Startup failures
Unexpected process exits
Incomplete initialization
Core dumps during file operations

Operating System and Dependency Issues

The underlying operating system can also contribute to crashes.

Unsupported Libraries

Library mismatches may occur after:

Partial upgrades
Repository changes
Manual package installations

An incompatible shared library can cause immediate application crashes.

Broken Package Dependencies

Missing dependencies may prevent modules from functioning correctly.

Administrators should verify package integrity whenever crashes occur after upgrades.

Kernel-Related Compatibility Problems

Certain kernel versions occasionally expose compatibility issues with user-space applications.

According to guidance from the Linux Foundation, maintaining supported kernel and dependency combinations is an important best practice for production system stability.

When crashes begin shortly after operating system updates, dependency compatibility should be investigated immediately.

Step 1: Confirm That a Core Dump Occurred

Before investigating root causes, verify that a core dump was actually generated.

This helps distinguish true process crashes from configuration problems, graceful shutdowns, or service restarts.

Check Wazuh Service Status

Start by examining the current service state.

Using systemctl

Run:

systemctl status wazuh-manager

Look for:

Failed status indicators
Signal termination messages
Segmentation fault errors
Restart loop behavior

Recent crashes often appear directly within the status output.

Reviewing Recent Crashes

Systemd typically records termination events in the journal.

Use:

journalctl -u wazuh-manager -n 200

Look for messages containing:

Segmentation fault
Aborted
Core dumped
Killed
Signal 11
Signal 6

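Most of these entries indicate that a core dump may have been generated. The exception is Killed: a process terminated by SIGKILL, which is what the kernel's OOM killer sends, never produces a core dump, so a Killed entry points to memory pressure rather than to a dump file.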
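To scan a longer stretch of journal output for these signatures, a small filter can help. This is a minimal sketch: the helper name crash_lines is hypothetical, and the pattern list is an assumption that you may need to extend for your distribution's message wording.

```shell
# crash_lines
# Read log text on stdin and print only the lines that match
# common crash signatures (case-insensitive).
crash_lines() {
    grep -Ei 'segmentation fault|segfault|core dumped|code=dumped|aborted|signal (6|11)|SIG(SEGV|ABRT)'
}
```

Typical usage would be journalctl -u wazuh-manager --since yesterday --no-pager | crash_lines. The filter exits with a non-zero status when nothing matches, which is convenient in scripts.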

Identify Terminated Processes

If the manager service restarted automatically, determine which component actually failed.

Examples include:

wazuh-analysisd
wazuh-remoted
wazuh-db
wazuh-modulesd
wazuh-authd

Knowing the affected process significantly narrows the investigation scope.

Check Kernel Messages

The Linux kernel often records crash details.

Use:

dmesg -T | grep -i "segfault"

or:

journalctl -k

Kernel logs frequently reveal:

Faulting addresses
Signal numbers
Memory violations
Crashed binaries
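Kernel segfault lines can be condensed into a short summary. The sketch below assumes the common message layout "name[pid]: segfault at ADDRESS ip ... sp ... error N in LIBRARY[...]"; the exact layout varies by kernel version and architecture, and the helper name segfault_summary is hypothetical.

```shell
# segfault_summary
# Read kernel log text on stdin and print "process[pid] fault address 0x..."
# for each segfault line.
segfault_summary() {
    awk '/segfault at/ {
        for (i = 2; i <= NF; i++) {
            if ($i == "segfault") {
                proc = $(i - 1)          # e.g. "wazuh-analysisd[1234]:"
                sub(/:$/, "", proc)      # drop the trailing colon
                print proc, "fault address 0x" $(i + 2)
            }
        }
    }'
}
```

For example: dmesg -T | segfault_summary. A fault address of 0x0 or another very small value often points to a null pointer dereference.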

Review Service Logs

Examine Wazuh logs immediately before the crash.

Useful files include:

/var/ossec/logs/ossec.log

Look for:

Error messages
Module failures
Database issues
Queue overflows
Resource warnings

Many root causes become visible shortly before process termination.

Verify Core Dump Generation

After confirming a crash occurred, determine whether Linux created a dump file.

Using coredumpctl

On systemd-based systems:

coredumpctl list

Example output:

TIME                            PID   UID   GID SIG
Mon 2026-01-10 10:15:12 UTC    1234   0     0   11

This confirms a process generated a core dump.

You can inspect details using:

coredumpctl info

Using Core Dump Files Directly

On systems that store traditional core files:

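find / -type f \( -name "core" -o -name "core.*" \) 2>/dev/null

If matching files are present, a crash likely occurred and further analysis can begin. Unrelated files can share the core prefix, so confirm each candidate with the file command before analyzing it.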
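Because a filename alone proves little, one way to confirm a candidate is to inspect its ELF header: a real core file starts with the ELF magic bytes and has an e_type of ET_CORE (4). The sketch below checks exactly that; the helper name is_elf_core is hypothetical, and compressed dumps stored by systemd-coredump will not match until they are extracted.

```shell
# is_elf_core FILE
# Succeed if FILE begins with an ELF header whose e_type is ET_CORE.
is_elf_core() {
    # Read the first 18 bytes as one continuous lowercase hex string.
    hex=$(od -An -tx1 -N18 "$1" 2>/dev/null | tr -d ' \n')
    magic=$(printf '%s' "$hex" | cut -c1-8)     # bytes 0-3: 0x7f 'E' 'L' 'F'
    order=$(printf '%s' "$hex" | cut -c11-12)   # byte 5: 01 little-endian, 02 big-endian
    etype=$(printf '%s' "$hex" | cut -c33-36)   # bytes 16-17: e_type
    [ "$magic" = "7f454c46" ] || return 1
    case "$order$etype" in
        010400|020004) return 0 ;;   # ET_CORE (4) in either byte order
        *)             return 1 ;;
    esac
}
```

You could pipe the output of the find command into a loop such as: while IFS= read -r f; do is_elf_core "$f" && echo "$f"; done.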

Step 2: Locate Wazuh Core Dump Files

Once you’ve confirmed a crash occurred, locate the dump file associated with the failed process.

Common Core Dump Locations

The storage location depends on Linux distribution and core dump configuration.

systemd-coredump Storage

Modern distributions commonly use systemd-coredump.

Typical location:

/var/lib/systemd/coredump/

List available dumps:

coredumpctl list

Extract a dump to a file if necessary:

coredumpctl dump <PID> --output=core.<PID>

Traditional Core File Locations

Older systems often write core files directly to disk.

Common locations include:

/var/crash/
/tmp/

or the working directory of the crashed process.

Search for them using:

find / -type f \( -name "core" -o -name "core.*" \) 2>/dev/null

Custom Core Pattern Locations

Linux allows administrators to customize dump storage through the kernel.core_pattern setting. View the current value with:

cat /proc/sys/kernel/core_pattern

Example:

/var/core/core.%e.%p

This setting determines where newly generated dumps are written.
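The value of core_pattern falls into three cases: a leading | pipes the dump to a handler program, a leading / writes it to an absolute path, and anything else is written relative to the crashed process's working directory. A minimal sketch that classifies a pattern (the helper name describe_core_pattern is hypothetical):

```shell
# describe_core_pattern PATTERN
# Explain where the kernel will send a dump for the given core_pattern value.
describe_core_pattern() {
    case "$1" in
        '|'*) echo "piped to handler: ${1#?}" ;;
        /*)   echo "written to absolute path: $1" ;;
        *)    echo "written relative to the crashed process working directory: $1" ;;
    esac
}
```

For example: describe_core_pattern "$(cat /proc/sys/kernel/core_pattern)". On systems using systemd-coredump, this usually reports a piped handler.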

Determine Which Process Crashed

Finding a core file is only the first step.

Next, identify which Wazuh component generated it.

Mapping Dump Files to Wazuh Components

Many dump files include process information within the filename.

Examples:

core.wazuh-analysisd.12345
core.wazuh-remoted.9876
core.wazuh-db.5555

This immediately identifies the affected service.

If filenames are not descriptive, use:

coredumpctl info

or:

file core.*

to obtain executable information.

Identifying Affected Services

The most commonly crashed Wazuh processes include:

Process	Primary Function
analysisd	Event analysis and rule matching
remoted	Agent communications
wazuh-db	Internal database operations
modulesd	Module execution
authd	Agent enrollment and authentication

Identifying the crashed process early dramatically reduces troubleshooting time because it allows you to focus on the subsystem most likely responsible for the failure.
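When dump filenames follow a core.<name>.<pid> style, the component name can be pulled out mechanically. The sketch below assumes that naming style, which depends on your core_pattern; the helper name core_component is hypothetical.

```shell
# core_component PATH
# Print the process name embedded in a core dump filename,
# or "unknown" if the name carries no process name.
core_component() {
    base=${1##*/}                      # strip any directory part
    case "$base" in
        core.*) name=${base#core.}     # drop the leading "core."
                name=${name%%.*} ;;    # keep the text before the next dot
        *)      name="" ;;
    esac
    case "$name" in
        '')       echo "unknown" ;;
        *[!0-9]*) echo "$name" ;;      # contains a non-digit: a process name
        *)        echo "unknown" ;;    # digits only: a bare PID
    esac
}
```

For example, core_component /var/core/core.wazuh-db.5555 prints wazuh-db, while a bare core.12345 yields unknown and needs coredumpctl info or file instead.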

Step 3: Examine Wazuh Logs Before the Crash

Once you’ve identified the crashed process and located the core dump, the next step is reviewing Wazuh logs generated immediately before the failure.

In many cases, the logs reveal the root cause without requiring deep core dump analysis.

Fatal errors, resource exhaustion warnings, database communication failures, and malformed event processing issues often appear minutes or even seconds before a process terminates.

Review Manager Logs

Wazuh components generate extensive operational logs that provide valuable context surrounding a crash.

Rather than focusing only on the exact crash timestamp, examine activity occurring several minutes beforehand.

Many failures are preceded by warning messages that progressively worsen until the process becomes unstable.

Important Log Locations

The primary log file for troubleshooting manager crashes is:

/var/ossec/logs/ossec.log

Search recent activity using:

tail -n 500 /var/ossec/logs/ossec.log

Or filter by component:

grep analysisd /var/ossec/logs/ossec.log

grep remoted /var/ossec/logs/ossec.log

grep wazuh-db /var/ossec/logs/ossec.log

For system-level events, also review:

journalctl -u wazuh-manager

and

journalctl -xe

These logs frequently contain information unavailable within ossec.log itself.

Identifying Fatal Errors

Start by searching for obvious failure indicators.

Examples include:

ERROR
CRITICAL
FATAL
Aborted
Segmentation fault
Out of memory
Connection refused
Database error

Useful commands:

grep -Ei "fatal|critical|error|abort" /var/ossec/logs/ossec.log

Look especially for messages occurring immediately before the service restart or crash timestamp.

Many Wazuh crashes leave a clear trail of warnings before termination.

Look for Warning Messages Leading Up to the Crash

Warnings often provide the earliest indication of instability.

Administrators frequently overlook these messages because the service may continue functioning temporarily before finally crashing.

Queue Warnings

Queue-related warnings indicate that incoming events are arriving faster than they can be processed.

Examples include:

Queue is full
Event queue saturated
Messages dropped

Large queue backlogs can contribute to memory pressure and eventual process failures.

Memory Allocation Errors

Memory-related warnings should always be treated seriously.

Examples:

Cannot allocate memory
Out of memory
Allocation failed

These messages often appear before segmentation faults and process crashes.

Decoder Failures

Malformed or unexpected log formats can trigger decoder problems.

Examples:

Decoder error
Regex compilation failed
Invalid log format

Repeated decoder failures may indicate corrupted log sources or configuration issues.

Database Communication Errors

Database instability frequently affects multiple Wazuh components.

Watch for messages such as:

Database connection failed
Unable to communicate with wazuh-db
IPC timeout
Database unavailable

These warnings often precede crashes involving modulesd, analysisd, or wazuh-db itself.

Correlate Crash Timing

Finding errors is important, but understanding their relationship to the crash is even more valuable.

Building a Timeline

Create a timeline of events leading to the crash.

Document:

Time	Event
10:01	Queue warnings begin
10:05	Memory usage spikes
10:08	Database timeout errors appear
10:10	Process crashes
10:10	Core dump generated

This approach often reveals patterns that individual log entries cannot.

Matching Logs to Crash Events

Compare timestamps from:

ossec.log
journalctl
dmesg
coredumpctl
monitoring systems

Your goal is to identify what changed immediately before the failure occurred.
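To build such a timeline, it helps to cut each log down to the minutes around the crash. The sketch below assumes that every line starts with a "YYYY/MM/DD HH:MM:SS" timestamp, as ossec.log lines normally do; other sources use different formats and would need their own filter. The helper name log_window is hypothetical.

```shell
# log_window START END
# Print lines from stdin whose leading "YYYY/MM/DD HH:MM:SS" timestamp
# falls within [START, END]. The fixed-width format sorts correctly as text.
log_window() {
    awk -v start="$1" -v end="$2" '{
        ts = $1 " " $2
        if (ts >= start && ts <= end) print
    }'
}
```

For example: log_window "2026/01/10 10:00:00" "2026/01/10 10:10:59" < /var/ossec/logs/ossec.log. Lines without a leading timestamp, such as continuation lines, are skipped by this comparison.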

Experienced incident responders frequently emphasize timeline reconstruction as one of the most effective methods for identifying root causes because it helps distinguish symptoms from the actual triggering event.

Step 4: Analyze the Core Dump

Once you’ve collected relevant logs, it’s time to examine the core dump itself.

Core dump analysis can reveal exactly where the process failed and which function triggered the crash.

Even if you’re not a software developer, basic analysis often provides enough information to identify known bugs, resource issues, or module-specific failures.

Install Required Debugging Tools

Several tools are required before you can inspect a dump file.

GDB

The GNU Debugger (GDB) is the most common utility used for Linux crash analysis.

Install it on Debian-based systems:

sudo apt install gdb

On RHEL-based systems:

sudo yum install gdb

GDB allows you to inspect:

Stack traces
Thread information
Register values
Memory state
Crashed functions

The official GNU debugger documentation provides detailed guidance on post-mortem debugging techniques.

Debug Symbol Packages

Without debugging symbols, stack traces may contain limited information.

Install:

Wazuh debug packages (if available)
Operating system debug symbols
Library debug packages

Symbols allow GDB to display function names and source code references rather than memory addresses.

Open the Core Dump in GDB

After installing the necessary tools, load the dump.

Loading the Dump

Example:

gdb /var/ossec/bin/wazuh-analysisd core.12345

Or using systemd:

coredumpctl gdb <PID>

GDB will load the process state captured at the time of the crash.

Generating a Backtrace

The first command most engineers run is:

bt

or:

thread apply all bt

This generates a backtrace showing the sequence of function calls that led to the crash.

Simplified example with illustrative function names:

#0 process_event()
#1 decode_log()
#2 rule_matching()
#3 main()

The backtrace is often the single most valuable artifact produced during troubleshooting.

Understanding Backtrace Output

A backtrace may look intimidating at first, but several patterns are easy to recognize.

Function Call Stacks

The stack trace shows which functions were executing when the process failed.

Repeated function names may indicate:

Infinite recursion
Looping logic
Stack exhaustion

These patterns frequently point directly to software defects.

Segmentation Faults

A segmentation fault (SIGSEGV) occurs when a process attempts to access memory it is not permitted to access, such as an unmapped address or a write to read-only memory.

Example:

Program terminated with signal SIGSEGV
Segmentation fault

This is one of the most common causes of Wazuh core dumps.

Abort Signals

Abort signals typically appear as:

Program terminated with signal SIGABRT

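These crashes usually occur when the program or a library calls abort() after an internal safety check fails, for example a failed assertion or heap corruption detected by the C library.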

Memory Access Violations

Memory corruption indicators may include:

Invalid pointers
Null pointer dereferences
Buffer overflows
Corrupted heap structures

When these patterns appear, a software defect is often involved.

According to guidance from the GNU Project and major Linux distribution maintainers, stack traces and signal information are usually the most important artifacts for diagnosing application crashes.

Collect Information for Vendor Support

If the root cause is not immediately obvious, gather information before opening a support case or GitHub issue.

Backtrace Output

Save the output of the following GDB commands to a text file:

bt
thread apply all bt

These are typically the first artifacts requested by support engineers.
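Capturing the backtraces does not require an interactive session. A minimal sketch using GDB's batch mode follows; the helper name save_backtrace and the file paths in the usage line are hypothetical.

```shell
# save_backtrace BINARY COREFILE OUTPUT
# Run GDB non-interactively and write both backtraces to OUTPUT.
save_backtrace() {
    gdb -batch \
        -ex "bt" \
        -ex "thread apply all bt" \
        "$1" "$2" > "$3" 2>&1
}
```

For example: save_backtrace /var/ossec/bin/wazuh-analysisd core.12345 backtrace.txt. Review the file before sharing it, since backtraces can include argument values taken from process memory.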

System Details

Collect:

uname -a

cat /etc/os-release

This helps identify operating system compatibility issues.

Version Information

Document:

/var/ossec/bin/wazuh-control info

or:

rpm -qa | grep wazuh

or:

dpkg -l | grep wazuh

Include:

Wazuh version
Operating system version
Kernel version
Installed integrations
Deployment architecture

Providing complete diagnostic information significantly accelerates vendor troubleshooting.
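The system and version details above can be gathered into a single file in one pass. This is a sketch only: the helper name collect_system_info is hypothetical, and it covers just the commands already listed in this section.

```shell
# collect_system_info OUTPUT
# Write kernel, OS release, and Wazuh package details to OUTPUT.
collect_system_info() {
    {
        echo "== uname -a =="
        uname -a
        echo "== /etc/os-release =="
        if [ -r /etc/os-release ]; then
            cat /etc/os-release
        else
            echo "(not available)"
        fi
        echo "== Wazuh packages =="
        if command -v dpkg >/dev/null 2>&1; then
            dpkg -l | grep -i wazuh || echo "(no Wazuh packages found)"
        elif command -v rpm >/dev/null 2>&1; then
            rpm -qa | grep -i wazuh || echo "(no Wazuh packages found)"
        else
            echo "(no supported package manager found)"
        fi
    } > "$1"
}
```

Attach the resulting file to the support case alongside the saved backtraces.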

Step 5: Verify System Resource Health

A surprisingly large percentage of Wazuh crashes are caused by resource exhaustion rather than software defects.

Before assuming a bug exists, verify that the underlying system has sufficient resources to support the workload.

Check Available Memory

Memory shortages are among the most common causes of instability.

Physical RAM

Review memory utilization:

free -h

Look for:

Very low available memory
Consistently high utilization
Frequent memory pressure events

Memory consumption approaching system limits should be investigated immediately.
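For scripted checks, the same information is available in /proc/meminfo. The sketch below computes the percentage of memory still available; the helper name mem_available_pct is hypothetical, and any alert threshold you attach to it is a local judgment call.

```shell
# mem_available_pct [MEMINFO_FILE]
# Print MemAvailable as a whole-number percentage of MemTotal.
# Reads /proc/meminfo unless another file in the same format is given.
mem_available_pct() {
    awk '/^MemTotal:/     { total = $2 }
         /^MemAvailable:/ { avail = $2 }
         END { if (total > 0) printf "%d\n", avail * 100 / total }' "${1:-/proc/meminfo}"
}
```

Running mem_available_pct from a scheduled job and logging the result gives a simple history to compare against crash times.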

Swap Usage

Check swap activity:

swapon --show

and

free -h

Heavy swap usage often indicates insufficient physical memory.

Systems relying extensively on swap frequently experience:

Increased latency
Slower event processing
Process instability
Unexpected crashes

Monitor CPU Utilization

CPU saturation can create cascading failures throughout the manager.

Sustained High CPU Usage

Monitor system load:

top

or:

htop

Look for:

CPU usage consistently above 80–90%
Load averages exceeding CPU core counts
wazuh-analysisd consuming excessive resources

Process-Level Analysis

Identify which processes are consuming resources:

ps aux --sort=-%cpu | head

and:

ps aux --sort=-%mem | head

This helps determine whether the crash is linked to a specific component.

Verify Disk Capacity

Storage exhaustion can destabilize Wazuh and its supporting services.

Filesystem Usage

Check available space:

df -h

Pay special attention to:

/
/var
/var/ossec
OpenSearch data volumes

Full filesystems commonly trigger service failures.

Inode Availability

A filesystem can run out of inodes even when free space remains.

Check inode consumption:

df -i

Low inode availability may prevent new files from being created.
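Both checks can be automated by filtering df output in its portable (-P) format. The sketch below flags filesystems at or above a chosen usage percentage; the helper name over_threshold is hypothetical, and it assumes mount points without spaces.

```shell
# over_threshold PERCENT
# Read `df -P` or `df -Pi` output on stdin and print the mount point and
# usage of every filesystem at or above PERCENT used.
over_threshold() {
    awk -v limit="$1" 'NR > 1 {
        used = $5
        sub(/%$/, "", used)              # "95%" -> "95"
        if (used + 0 >= limit + 0) print $6, $5
    }'
}
```

For example, df -P | over_threshold 90 checks disk space and df -Pi | over_threshold 90 checks inode usage.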

Inspect File Descriptor Limits

Wazuh managers handling thousands of agents may encounter file descriptor limitations.

Current Limits

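View the limit that applies to your current shell session (a running Wazuh process can have a different limit, which is recorded in /proc/<PID>/limits):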

ulimit -n

Review system-wide settings:

cat /proc/sys/fs/file-max

Low limits can cause:

Connection failures
Queue problems
Service instability
Unexpected process exits

Increasing Limits When Necessary

If limits are too restrictive, adjust:

/etc/security/limits.conf

Example:

wazuh soft nofile 65535
wazuh hard nofile 65535

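Note that limits.conf is applied by PAM to login sessions. A manager started by systemd takes its limit from the LimitNOFILE= setting of the wazuh-manager unit instead, which can be raised with a drop-in override.

After increasing limits, restart affected services, confirm the new value in /proc/<PID>/limits, and continue monitoring.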
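To see how close a running process is to its limit, compare its open descriptor count with the soft limit recorded under /proc. This is a sketch assuming a Linux /proc filesystem and sufficient privileges to read another user's process entries; both helper names are hypothetical.

```shell
# nofile_soft_limit LIMITS_FILE
# Print the soft "Max open files" limit from a /proc/<PID>/limits file.
nofile_soft_limit() {
    awk '/^Max open files/ { print $4 }' "$1"
}

# open_fd_count PID
# Print the number of file descriptors PID currently has open
# (0 if the process does not exist or cannot be read).
open_fd_count() {
    ls "/proc/$1/fd" 2>/dev/null | wc -l | tr -d ' '
}
```

For a given PID, open_fd_count "$pid" approaching nofile_soft_limit "/proc/$pid/limits" suggests that descriptor exhaustion is a plausible contributor to the crash.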

Resource validation is a critical troubleshooting step because many crashes that initially appear to be software bugs ultimately turn out to be memory shortages, CPU saturation, disk exhaustion, or operating system limitations.

Step 6: Validate Wazuh Configuration

Configuration problems are a common source of Wazuh Manager instability, especially in environments with extensive customization.

Custom rules, decoders, integrations, and manual configuration changes can introduce unexpected behavior that eventually leads to process crashes.

If core dumps began appearing after a configuration change, validation should be one of your highest-priority troubleshooting steps.

Check Manager Configuration Syntax

Before investigating more complex causes, verify that the manager configuration is syntactically correct.

Even small formatting mistakes can create instability or prevent components from operating properly.

Validate ossec.conf

The primary Wazuh Manager configuration file is:

/var/ossec/etc/ossec.conf

Inspect the file for:

Missing XML tags
Invalid nesting
Duplicate configuration blocks
Typographical errors
Unsupported options

Wazuh logs often reveal configuration-related errors during startup.

Review:

tail -n 200 /var/ossec/logs/ossec.log

immediately after restarting the service.
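
On a busy manager the log scrolls quickly, so it can help to filter it down to error-level lines. The sketch below assumes that ossec.log entries carry a level tag written as ": ERROR:" or ": CRITICAL:" after the daemon name; the function name log_errors is illustrative. Wazuh daemons also generally accept a -t option that tests the configuration without starting the daemon, but confirm this with the daemon's help output for your version.

```shell
# Print ERROR and CRITICAL lines from a Wazuh log file.
# $1 = log file (defaults to /var/ossec/logs/ossec.log)
log_errors() {
    grep -E ': (ERROR|CRITICAL):' "${1:-/var/ossec/logs/ossec.log}"
}

# Example: show the 20 most recent error-level lines after a restart.
# log_errors | tail -n 20
```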

Identify Recent Changes

One of the fastest ways to locate a root cause is to determine what changed before the crashes began.

Ask questions such as:

Were new rules recently added?
Was Wazuh upgraded?
Was an integration deployed?
Were decoder changes introduced?
Were manager settings modified?

Many incidents can be traced directly to a recent configuration change.

If version control is available, compare current and previous configurations.
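
When no version control is in place, file modification times are a rough substitute. The following sketch lists files changed within a given number of days; the function name recently_changed and the seven-day window are illustrative assumptions.

```shell
# List regular files under a directory that were modified in the last N days.
# $1 = directory, $2 = number of days
recently_changed() {
    find "$1" -type f -mtime "-$2" -print
}

# Example: configuration, rule, and decoder files changed in the last week.
# recently_changed /var/ossec/etc 7
```

Modification times show that a file changed, not what changed, so a diff against a known-good backup is still needed to see the actual edit.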

Review Custom Rules

Custom detection rules are powerful but can introduce processing problems when improperly designed.

Detect Faulty Rule Logic

Review recently added rules for:

Invalid syntax
Unsupported fields
Excessive inheritance
Circular dependencies
Inefficient matching logic

Examples of problematic patterns include an if_sid reference such as:

<if_sid>100001</if_sid>

that points to a rule ID that does not exist, and rule chains that reference each other recursively.

These issues can dramatically increase processing overhead and occasionally expose edge-case software defects.
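
References to rule IDs that do not exist can be found mechanically by comparing the IDs used in if_sid elements with the IDs that are actually defined. The sketch below is a simple text-based check, not an XML parser. The function name undefined_if_sids is an illustrative assumption, it expects id attributes in double quotes, and it does not detect circular chains. Pass it both your custom rule files and the stock ruleset files, otherwise references to built-in rules will be reported as missing.

```shell
# Print every rule ID referenced by <if_sid> that has no matching
# <rule id="..."> definition in the files passed as arguments.
undefined_if_sids() {
    # IDs defined by <rule ... id="N" ...>
    defined=$(grep -ohE '<rule[^>]* id="[0-9]+"' "$@" |
              sed -E 's/.* id="([0-9]+)"$/\1/' | sort -un)
    # IDs referenced inside <if_sid>...</if_sid> (comma- or space-separated)
    referenced=$(grep -ohE '<if_sid>[^<]*</if_sid>' "$@" |
                 sed -E 's/<[^>]+>//g' | tr ', ' '\n\n' |
                 grep -E '^[0-9]+$' | sort -un)
    for sid in $referenced; do
        printf '%s\n' "$defined" | grep -qx "$sid" || printf '%s\n' "$sid"
    done
}

# Example (paths are the usual defaults; adjust to your installation):
# undefined_if_sids /var/ossec/etc/rules/*.xml /var/ossec/ruleset/rules/*.xml
```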

Test Rule Changes Safely

Never deploy major rule changes directly to production without validation.

Use:

/var/ossec/bin/wazuh-logtest

to verify behavior before rollout.

This tool allows administrators to:

Test rule matching
Validate syntax
Verify decoder interactions
Identify unexpected behavior

Review Custom Decoders

Custom decoders are another frequent source of instability.

Decoder errors may not appear immediately and can remain hidden until a specific log format is processed.

Decoder Validation

Inspect custom decoders for:

Invalid regular expressions
Incorrect field mappings
Missing parent decoders
Unsupported XML elements

Validate decoder behavior using representative log samples before deployment.

Common Decoder Mistakes

The most common issues include:

Overly complex regex patterns
Greedy matching expressions
Invalid capture groups
Decoder inheritance errors
Assumptions about log structure

For example, a decoder may work perfectly with expected logs but fail when encountering malformed or unexpected input.

These edge cases can trigger excessive resource consumption or process instability under heavy workloads.

Step 7: Investigate Database and Module Failures

Many Wazuh Manager crashes originate from internal modules rather than the manager framework itself.

Database communication problems, module failures, and subsystem-specific defects can all produce core dumps.

The goal of this step is to identify whether a particular component is consistently involved in the crash.

Check wazuh-db Health

The wazuh-db daemon is one of the most important components within the Wazuh architecture.

Many manager functions rely on it for agent information, inventory and scan results, and other internal data operations.

Database Errors

Review logs for messages such as:

Database error
Database unavailable
Query failed
Database timeout

Search logs using:

grep -i database /var/ossec/logs/ossec.log

Repeated database errors often indicate corruption, communication failures, or resource exhaustion.

Communication Failures

Wazuh components communicate extensively with wazuh-db.

Common warning messages include:

Unable to communicate with wazuh-db
IPC timeout
Socket communication error
Connection lost

When communication breaks down, dependent processes may become unstable and eventually crash.

Review Wazuh Modules

Several Wazuh modules perform specialized functions and may crash independently under certain conditions.

Examine logs for module-specific errors.

Vulnerability Detection

The vulnerability detection module processes package inventory information and vulnerability feeds.

Potential issues include:

Corrupted vulnerability databases
Feed synchronization failures
Excessive resource consumption
Version compatibility problems

Syscollector

Syscollector gathers inventory information from monitored endpoints.

Problems may occur when:

Agents send unexpected inventory data
Large environments generate excessive inventory updates
Resource limits are reached

Review Syscollector-related log entries surrounding the crash.

FIM

File Integrity Monitoring (FIM) can generate significant processing workloads.

Potential crash contributors include:

Monitoring extremely large directories
Excessive file changes
Aggressive scan schedules
Resource exhaustion

SCA

Security Configuration Assessment (SCA) scans can place additional load on manager resources.

Review:

Scan frequency
Policy complexity
Concurrent scan activity

Large-scale SCA deployments occasionally expose scalability issues.

Identify Module-Specific Crashes

The objective is to determine whether crashes consistently occur within the same subsystem.

Look for patterns such as:

Every crash involving wazuh-modulesd
Every crash occurring during vulnerability scans
Every crash occurring after FIM activity
Every crash occurring during agent enrollment

Consistent patterns usually indicate a module-specific problem.
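
A quick way to look for such patterns is to count error-level log lines per component. The sketch below assumes the ossec.log layout of date, time, component name, and level tag, with the level written as ": ERROR:" or ": CRITICAL:". The function name errors_by_component is an illustrative assumption.

```shell
# Count ERROR/CRITICAL lines per logging component, most frequent first.
# $1 = log file (defaults to /var/ossec/logs/ossec.log)
errors_by_component() {
    awk '/: (ERROR|CRITICAL):/ {
             name = $3
             sub(/:$/, "", name)     # drop the trailing colon after the name
             count[name]++
         }
         END { for (n in count) print count[n], n }' \
        "${1:-/var/ossec/logs/ossec.log}" |
    sort -rn
}

# Example: errors_by_component | head
```

A component that dominates the output shortly before each crash is a reasonable first candidate for isolation.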

Isolating Problematic Components

If evidence points toward a specific module, isolate it for testing.

For example:

Disable the suspected module.
Restart Wazuh.
Monitor system stability.
Compare behavior before and after the change.

This controlled approach often confirms the root cause quickly.

Temporary Module Disablement for Testing

Disabling a module temporarily can help determine whether it is responsible for the crashes.

Examples include:

Vulnerability Detection
SCA
Syscollector
Third-party integrations

Do not leave critical security features disabled permanently, but temporary testing can provide valuable diagnostic information.

Document every change so that configurations can be restored after troubleshooting.
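
Part of documenting a change is keeping a copy of the file as it was before the edit. The following is a minimal sketch of a timestamped backup; the function name backup_config and the .bak suffix are illustrative assumptions.

```shell
# Copy a configuration file to a timestamped backup next to it and print
# the backup path. cp -p preserves mode and timestamps (and ownership
# when permitted).
# $1 = file to back up
backup_config() {
    backup="$1.$(date +%Y%m%d-%H%M%S).bak"
    cp -p "$1" "$backup" && printf '%s\n' "$backup"
}

# Example: backup_config /var/ossec/etc/ossec.conf
```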

Step 8: Determine Whether the Crash Is a Known Bug

Not every core dump is caused by local configuration problems or infrastructure issues.

Sometimes the crash is the result of a documented software defect that has already been identified and fixed by the Wazuh development team.

Before investing excessive time in deep debugging, verify whether the issue is already known.

Verify Installed Wazuh Version

Begin by identifying the exact version running in your environment.

Examples:

rpm -qa | grep wazuh

or:

dpkg -l | grep wazuh

Document:

Manager version
Agent versions
Dashboard version
Indexer version

Version mismatches can sometimes contribute to instability.
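
The manager can also report its own version, which is useful on hosts where the package manager output is ambiguous. The sketch below assumes that /var/ossec/bin/wazuh-control info prints shell-style lines such as WAZUH_VERSION="...", which should be verified on your installation. The function name parse_wazuh_version is illustrative.

```shell
# Extract the version string from KEY="value" lines read on stdin.
parse_wazuh_version() {
    sed -n 's/^WAZUH_VERSION="\(.*\)"$/\1/p'
}

# Example:
# /var/ossec/bin/wazuh-control info | parse_wazuh_version
```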

Review Release Notes

Wazuh release notes frequently contain bug fixes addressing:

Memory leaks
Segmentation faults
Database crashes
Module instability
Integration failures

Pay particular attention to fixes involving the component identified in your backtrace.

For example, if analysisd generated the core dump, search release notes for analysisd-related fixes.

Search Known Issues

The next step is reviewing publicly reported bugs.

Search using:

Error messages
Backtrace functions
Signal names
Module names
Version numbers

You may discover that other administrators have already encountered the same issue.

Compare Crash Signatures

Core dump analysis becomes especially valuable when comparing crash signatures against known defects.

Matching Stack Traces

If your backtrace contains functions such as the following (illustrative names):

process_event()
decode_event()
db_query()

search those function names together with your Wazuh version.

Matching stack traces are often strong evidence that you’re encountering an existing bug.

Many software vendors use crash signatures as the primary method for categorizing and resolving defects.
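
A compact crash signature can be built by extracting the function names of the top frames from a saved GDB backtrace. The sketch below assumes the usual GDB frame layout, in which a line starts with #N and the function name follows either directly or after an address and the word "in". The function name bt_signature and the default of five frames are illustrative assumptions.

```shell
# Print the function names of the first N frames in a saved GDB backtrace.
# $1 = file containing `bt` output, $2 = number of frames (default 5)
bt_signature() {
    awk -v max="${2:-5}" '/^#[0-9]+ / && shown < max {
        # "#0  0x... in func (...)" versus "#2  func (...)"
        name = ($3 == "in") ? $4 : $2
        print name
        shown++
    }' "$1"
}

# Example: bt_signature backtrace.txt 5 | paste -sd' ' -
```

The resulting list of names, combined with the Wazuh version and the signal, makes a practical search string for issue trackers.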

Existing Bug Reports

Review issue reports for:

Similar stack traces
Similar workloads
Similar deployment architectures
Matching error messages

Pay attention to comments from maintainers because they often contain workarounds or temporary mitigations.

Fixed Versions

If a bug has already been fixed, upgrading may be the fastest resolution.

Before upgrading:

Verify the bug matches your symptoms.
Review release notes carefully.
Confirm upgrade compatibility.
Test in a non-production environment whenever possible.

Related reading:

How to Upgrade a Wazuh Agent

Many organizations spend days troubleshooting issues that have already been resolved in newer releases.

Checking known bugs early in the investigation process can save significant time and effort while reducing future stability risks.

Step 9: Apply Corrective Actions

After identifying the likely root cause of the crash, the next step is implementing corrective actions that permanently eliminate the issue.

Avoid the temptation to simply restart the manager and move on.

A successful troubleshooting effort should not only restore service but also reduce the likelihood of future crashes.

Upgrade to a Stable Release

If your investigation points to a known software defect, upgrading to a newer stable release is often the most effective solution.

Many Wazuh Manager core dumps are eventually traced back to bugs that have already been fixed by the development team.

Before upgrading:

Review release notes
Verify compatibility requirements
Back up critical configurations
Test upgrades in a staging environment
Validate custom rules and integrations

Pay particular attention to fixes involving:

analysisd crashes
memory leaks
database communication failures
module instability
agent communication issues

Fix Resource Bottlenecks

Resource exhaustion is one of the most common causes of manager instability.

If memory, CPU, disk, or file descriptor limitations contributed to the crash, address them before returning the system to production.

Common corrective actions include:

Increasing available RAM
Expanding swap space
Adding CPU resources
Increasing file descriptor limits
Expanding storage capacity
Reducing event ingestion rates

Organizations that proactively address infrastructure bottlenecks often eliminate recurring crash cycles without making any application-level changes.

Correct Configuration Errors

Configuration issues should be corrected immediately once identified.

Examples include:

Invalid XML syntax
Incorrect module settings
Broken integrations
Unsupported configuration options
Misconfigured cluster settings

After applying corrections:

Validate the configuration.
Restart affected services.
Review startup logs.
Monitor for recurring errors.

Repair Corrupted Files

Corrupted files frequently contribute to unexpected process failures.

Files that may require repair or replacement include:

Internal databases
Queue files
Configuration files
Index data
Integration artifacts

Potential indicators of corruption include:

Unexpected parsing failures
Repeated database errors
Invalid file format messages
Consistent crashes during startup

When corruption is suspected, restore affected files from a known-good backup whenever possible.

Remove Faulty Customizations

Customizations often introduce instability, especially after upgrades.

Examples include:

Custom scripts
Third-party integrations
Custom decoders
Custom rules
Modified startup procedures

Temporarily remove nonessential customizations and observe system behavior.

If crashes stop occurring, reintroduce customizations individually until the problematic component is identified.

Tune Event Processing Workloads

Large environments frequently overwhelm Wazuh through sheer event volume.

Potential tuning strategies include:

Filtering unnecessary logs
Reducing noisy event sources
Optimizing custom rules
Limiting excessive FIM activity
Adjusting scan schedules
Increasing processing capacity

The goal is to ensure that incoming workloads remain within the capacity of the manager infrastructure.

Preventing Future Wazuh Manager Core Dumps

While troubleshooting is important, prevention is even more valuable.

Organizations that implement proactive monitoring and maintenance practices experience significantly fewer stability incidents than those operating reactively.

The following best practices can dramatically reduce the likelihood of future core dumps.

Keep Wazuh Updated

Running outdated software increases exposure to:

Known bugs
Memory leaks
Stability defects
Security vulnerabilities
Compatibility issues

Establish a process for:

Reviewing release notes
Evaluating new versions
Testing upgrades
Deploying approved updates

According to guidance from the Wazuh project, staying current with supported releases is one of the most effective ways to maintain platform stability.

Monitor Resource Consumption Proactively

Resource-related crashes rarely occur without warning.

Monitor key metrics such as:

Memory utilization
CPU usage
Queue depth
Disk capacity
File descriptor usage
Process restart frequency

Alerting on abnormal trends allows administrators to intervene before instability develops.

Validate Configuration Changes Before Deployment

Every configuration change carries risk.

Before deploying modifications:

Review syntax carefully
Validate XML structures
Test integrations
Verify dependencies
Document changes

A formal change validation process can eliminate many avoidable outages.

Test Custom Rules and Decoders in Staging

Custom content should never be deployed directly to production without testing.

A staging environment allows administrators to verify:

Rule behavior
Decoder accuracy
Performance impact
Compatibility with existing configurations

Many production incidents originate from untested customizations rather than defects in Wazuh itself.

Implement Log and Performance Monitoring

Effective monitoring provides early warning signs before crashes occur.

Track:

Service restarts
Error messages
Queue growth
Database communication failures
Memory allocation warnings
Agent connectivity issues

Monitoring platforms should generate alerts whenever abnormal behavior is detected.

Early detection of abnormal system behavior is critical for maintaining application reliability.

Establish Routine Health Checks

Periodic health reviews help identify hidden issues before they become critical.

A routine health check may include:

Reviewing logs
Verifying module status
Checking disk utilization
Examining memory trends
Confirming agent connectivity
Reviewing cluster health

Organizations that conduct regular health assessments often discover developing problems long before they cause outages.

Maintain Sufficient System Capacity

As deployments grow, infrastructure requirements increase.

Many Wazuh environments remain stable for months before suddenly experiencing crashes due to capacity constraints.

Review capacity regularly and plan for:

Additional agents
Higher event volumes
New integrations
Increased retention periods
Expanded security monitoring requirements

Maintaining adequate headroom helps prevent resource-related failures and improves overall reliability.

When to Escalate to Wazuh Support

Some crashes cannot be fully diagnosed internally.

If the root cause remains unclear after completing the troubleshooting process, escalation may be necessary.

Providing complete diagnostic information significantly improves the chances of a fast resolution.

Information to Collect Before Opening a Case

Support engineers can only work with the information provided.

Gather as much evidence as possible before opening a ticket or submitting a bug report.

Core Dump Files

Collect:

Original core dump files
coredumpctl output
Crash timestamps
Associated process names

These files often contain the most valuable diagnostic data.

Backtraces

Generate and save:

bt
thread apply all bt

outputs from GDB.

Backtraces are frequently the first artifact requested by developers.
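
The two GDB commands above can be run non-interactively so the output lands in a file that is easy to attach to a ticket. This is a sketch using GDB's batch mode; the function name save_backtrace and the file names in the example are illustrative placeholders.

```shell
# Write `bt` and `thread apply all bt` output for a core file to a text file.
# $1 = binary that crashed, $2 = core file, $3 = output file
save_backtrace() {
    command -v gdb >/dev/null 2>&1 || { echo "gdb is not installed" >&2; return 1; }
    [ -r "$1" ] && [ -r "$2" ] || { echo "binary or core file not readable" >&2; return 1; }
    gdb -batch -ex 'bt' -ex 'thread apply all bt' "$1" "$2" > "$3" 2>&1
}

# Example (core.12345 is a placeholder file name):
# save_backtrace /var/ossec/bin/wazuh-analysisd core.12345 backtrace.txt
```

Without debug symbols for the crashed binary, the frames will show addresses rather than function names, which limits how useful the backtrace is.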

Wazuh Logs

Include:

/var/ossec/logs/ossec.log

particularly entries immediately preceding the crash.

Capture:

Error messages
Warning messages
Service restart events
Database communication failures

Version Information

Document:

Wazuh Manager version
Wazuh Agent versions
Dashboard version
Indexer version
Operating system version
Kernel version

Version details often help identify known bugs quickly.

System Specifications

Provide:

CPU count
Available RAM
Storage configuration
Number of agents
Daily event volume
Cluster architecture

Environmental information helps support engineers reproduce conditions associated with the crash.

Creating a Useful Support Request

A well-prepared support request can reduce troubleshooting time from days to hours.

Diagnostic Information Checklist

Before submitting a case, ensure you have collected:

Core dump files
Stack traces
Wazuh logs
System logs
Version information
Resource utilization data
Configuration changes made before the crash
Relevant screenshots or error messages

The more evidence provided, the faster engineers can isolate the root cause.
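
Collecting the checklist items into a single archive keeps the evidence together. The sketch below copies whichever of the listed files are readable into a staging directory, adds basic system information, and packs the result. The function name collect_support_bundle and the archive naming are illustrative assumptions, and files with the same base name would overwrite each other in the staging directory.

```shell
# Copy readable files into a staging directory and pack them into a
# gzip-compressed tarball. Prints the tarball path on success.
# $1 = output directory, remaining arguments = files to include
collect_support_bundle() {
    out_dir=$1
    shift
    stage="$out_dir/wazuh-support-$(date +%Y%m%d-%H%M%S)"
    mkdir -p "$stage" || return 1
    for f in "$@"; do
        [ -r "$f" ] && cp -p "$f" "$stage/"
    done
    { uname -a; date; } > "$stage/system-info.txt"
    tar -czf "$stage.tar.gz" -C "$out_dir" "$(basename "$stage")" &&
        printf '%s\n' "$stage.tar.gz"
}

# Example (backtrace.txt is a placeholder for your saved GDB output):
# collect_support_bundle /tmp/support /var/ossec/logs/ossec.log backtrace.txt
```

Core dumps and logs can contain sensitive data from process memory, so review the archive and share it only through a trusted channel.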

Reproduction Details

One of the most valuable pieces of information is whether the crash can be reproduced consistently.

Document:

What happened before the crash
Which component failed
How frequently it occurs
Whether specific logs trigger the failure
Whether certain integrations are involved
Whether the crash appeared after an upgrade or configuration change

Providing clear reproduction steps dramatically increases the likelihood that developers can identify and fix the underlying problem.

By combining detailed diagnostics, core dump analysis, resource validation, configuration reviews, and proactive monitoring, most Wazuh Manager core dumps can be resolved systematically.

The key is treating every core dump as an opportunity to identify and eliminate the underlying cause rather than simply restoring service and waiting for the next crash.

Frequently Asked Questions (FAQ)

Question: What causes Wazuh Manager core dumps?

Wazuh Manager core dumps can be triggered by a wide range of issues, including:

Memory exhaustion
Memory leaks
Software defects
Corrupted log data
Database communication failures
Faulty custom rules or decoders
Third-party integration problems
Disk and filesystem issues
Operating system dependency conflicts

The core dump itself is not the root cause. It is evidence that a process terminated unexpectedly.

Identifying the underlying trigger requires reviewing logs, analyzing the dump, and examining system health.

Question: Where are Wazuh core dump files stored?

The location depends on your Linux distribution and core dump configuration.

Common locations include:

/var/lib/systemd/coredump/
/var/crash/
/tmp/

Some systems use custom storage paths defined by:

cat /proc/sys/kernel/core_pattern

If you’re unsure where dumps are being stored, use:

find / -type f \( -name "core" -o -name "core.*" \) 2>/dev/null

or:

coredumpctl list

to locate them.
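
The value of core_pattern determines which of these locations applies. A value that starts with a pipe character hands the dump to a helper program, an absolute path is used as a file name template, and anything else is written relative to the crashed process's working directory. The sketch below classifies a value accordingly; the function name core_pattern_kind is an illustrative assumption.

```shell
# Classify a kernel core_pattern value.
# $1 = contents of /proc/sys/kernel/core_pattern
core_pattern_kind() {
    case "$1" in
        '|'*) echo "piped to handler: ${1#|}" ;;
        /*)   echo "absolute path template: $1" ;;
        *)    echo "relative to process working directory: $1" ;;
    esac
}

# Example: core_pattern_kind "$(cat /proc/sys/kernel/core_pattern)"
```

When the handler is systemd-coredump, coredumpctl list is the right tool; otherwise search the directory that the template points to.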

Question: How do I know which Wazuh process crashed?

Several methods can help identify the affected process.

Start by reviewing:

coredumpctl info

You can also inspect:

systemd logs
kernel logs
ossec.log
core dump filenames

Common Wazuh processes that generate core dumps include:

wazuh-analysisd
wazuh-remoted
wazuh-db
wazuh-modulesd
wazuh-authd

Identifying the crashed process is one of the most important steps because it narrows the investigation to a specific subsystem.

Question: Can a core dump cause data loss?

A core dump itself does not cause data loss.

However, the crash that generated the dump can interrupt:

Event processing
Alert generation
Agent communication
Database operations
Log collection

Depending on the duration of the outage, some security events may be delayed, missed, or lost entirely.

This is why recurring crashes should be treated as high-priority operational incidents.

Question: How do I analyze a Wazuh core dump using GDB?

Install GDB and load the dump file:

gdb /var/ossec/bin/wazuh-analysisd core.12345

For systemd-managed dumps:

coredumpctl gdb <PID>

After loading the dump, generate a stack trace using:

bt

or:

thread apply all bt

The resulting backtrace shows the function calls that occurred before the crash and is often the most valuable artifact during troubleshooting.

Question: Are core dumps always caused by software bugs?

No.

While software defects can certainly generate core dumps, many crashes are caused by environmental problems such as:

Insufficient memory
High CPU utilization
Full disks
Corrupted databases
Invalid configurations
Third-party integrations
Dependency conflicts

In production environments, resource-related issues are often just as common as software bugs.

Question: Should I delete core dump files after analysis?

Yes, in most cases.

Core dump files can consume significant disk space, especially when large processes crash.

However, do not delete them until:

Analysis has been completed.
Backtraces have been collected.
Required support artifacts have been archived.
Any support cases have been opened.

Once the necessary information has been preserved, old dumps can usually be removed safely.

Question: Can insufficient memory trigger Wazuh Manager crashes?

Absolutely.

Memory shortages are one of the most common causes of Wazuh instability.

When available memory becomes limited, Wazuh components may experience:

Allocation failures
Queue growth
Performance degradation
Process termination
Core dump generation

Administrators should regularly monitor:

free -h

and overall memory consumption trends to identify problems before they cause outages.
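
A process terminated by the kernel's out-of-memory killer receives SIGKILL and does not produce a core dump, so a manager process that disappears without leaving a dump is worth checking against the kernel log. The sketch below filters kernel log text for typical OOM messages. The function name oom_events is an illustrative assumption, and the exact message wording varies between kernel versions.

```shell
# Filter kernel log text on stdin for out-of-memory killer activity.
oom_events() {
    grep -iE 'out of memory|oom-kill|oom_reaper'
}

# Examples:
# journalctl -k --no-pager | oom_events
# dmesg | oom_events
```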

Question: How can I prevent recurring core dumps?

The most effective prevention strategies include:

Keeping Wazuh updated
Monitoring resource utilization
Testing configuration changes before deployment
Validating custom rules and decoders
Reviewing logs regularly
Performing routine health checks
Maintaining adequate system capacity

Proactive maintenance is significantly more effective than reacting to crashes after they occur.

Question: When should I contact Wazuh support?

Consider contacting support or opening a bug report when:

The root cause remains unclear after troubleshooting
Crashes continue after corrective actions
The backtrace points to a possible software defect
Multiple manager components are crashing
Core dumps appear immediately after upgrades
You suspect a previously unknown bug

Before escalating, gather:

Core dump files
GDB backtraces
Wazuh logs
System logs
Version information
System specifications

Providing complete diagnostic information dramatically improves the chances of a quick resolution.

Conclusion

Wazuh Manager core dumps are among the most serious indicators of instability within a Wazuh deployment.

While it may be tempting to simply restart the affected service and move on, doing so often leaves the underlying problem unresolved and increases the likelihood of future outages.

A systematic troubleshooting approach is far more effective.

The workflow outlined in this guide begins by confirming that a core dump actually occurred and then moves through locating the associated dump files, reviewing the logs leading up to the crash, analyzing the dump with GDB, validating system resources, checking configurations, investigating database and module failures, determining whether the issue matches a known software defect, and applying corrective actions.

Throughout the investigation, the primary objective should be identifying the root cause rather than treating the symptoms.

Whether the issue stems from memory exhaustion, malformed log data, corrupted databases, faulty integrations, configuration errors, or a software bug, understanding why the process crashed is the key to preventing it from happening again.

Long-term stability depends on strong operational practices, including:

Keeping Wazuh updated with supported releases
Monitoring memory, CPU, disk, and queue utilization
Testing custom rules and decoders before deployment
Validating configuration changes carefully
Performing routine health checks
Maintaining sufficient infrastructure capacity as deployments grow

By combining proactive monitoring, disciplined change management, and thorough root-cause analysis, administrators can significantly reduce the frequency of Wazuh Manager crashes and maintain a more reliable, resilient, and effective security monitoring platform.