Troubleshooting Wazuh Manager Core Dumps

A Wazuh Manager core dump is one of the clearest indicators that something has gone seriously wrong inside the Wazuh server.

When a critical Wazuh process crashes unexpectedly, the operating system may generate a core dump file containing a snapshot of the process’s memory, execution state, loaded libraries, and stack traces at the exact moment of failure.

While many administrators focus on restoring service availability after a crash, the core dump itself often contains the information needed to identify and permanently resolve the underlying problem.

Core dumps should never be treated as isolated incidents. In most environments, they are symptoms of deeper issues such as software bugs, memory corruption, resource exhaustion, incompatible integrations, corrupted databases, configuration errors, or operating system limitations.

Ignoring repeated core dumps can lead to recurring outages, degraded security visibility, and unreliable event processing.

During an outage, important security events may be delayed, dropped, or never processed at all, potentially allowing malicious activity to go unnoticed.

In this guide, you’ll learn how Wazuh Manager core dumps occur, how to identify the affected components, how to analyze crash data, and how to systematically troubleshoot the most common root causes. You’ll also learn preventive measures that reduce the likelihood of future crashes and improve overall Wazuh stability.


Understanding Wazuh Manager Core Dumps

What Is a Core Dump?

A core dump is a file generated by the Linux kernel when a running process terminates unexpectedly due to a fatal error such as a segmentation fault, illegal instruction, memory access violation, or abort signal.

Think of a core dump as a forensic snapshot of a crashed process.

It captures the internal state of the application at the exact moment the failure occurred, allowing developers and system administrators to reconstruct what happened.

Linux generates a core dump only when dumping is enabled, which depends on the process’s core file size limit (ulimit -c) and the kernel’s core_pattern setting.

Depending on the distribution and configuration, these files may be stored directly on disk or managed by systemd-coredump.

Core dump files typically contain:

  • Process memory contents
  • Stack traces
  • Register values
  • Loaded shared libraries
  • Thread information
  • Execution state
  • Signal information that triggered the crash

This information is extremely valuable during troubleshooting because it allows engineers to determine the exact code path that caused the failure.

According to the Linux kernel documentation, core dumps are specifically designed to assist post-mortem debugging by preserving process state after abnormal termination.

For Wazuh deployments experiencing repeated crashes, core dump analysis often reveals the root cause far faster than reviewing log files alone.

How Wazuh Manager Generates Core Dumps

The Wazuh Manager consists of multiple services and internal modules working together to collect, analyze, store, and correlate security events.

Under normal conditions, these processes shut down gracefully when administrators stop the service. During a graceful shutdown, processes release resources, close connections, and exit cleanly without generating a core dump.

Core dumps occur when a process crashes unexpectedly before normal cleanup procedures can execute.

Several Wazuh components are commonly involved in crash scenarios:

analysisd

The analysis engine responsible for decoding events, matching rules, generating alerts, and processing incoming security data.

remoted

Handles communication between Wazuh agents and the manager, including event reception and agent connectivity.

logcollector

Collects logs from the manager host itself and forwards them for analysis.

modulesd

Runs various Wazuh modules and integrations, including vulnerability detection and external data sources.

authd

Handles agent registration and authentication processes.

Related reading: /fix-authd-registration-failures-wazuh-agent-password-mismatched-guide/

wazuh-db

Manages internal database operations and agent-related data storage.

Related reading: Fixing wazuh-db Worker Thread Crashes

If any of these components encounter fatal software defects, corrupted data structures, memory allocation failures, or resource exhaustion conditions, Linux may generate a core dump before the process terminates.

Why Core Dumps Should Never Be Ignored

Many administrators make the mistake of restarting Wazuh services after a crash without investigating the cause.

While this may temporarily restore functionality, the underlying issue often remains unresolved.

Core dumps frequently indicate hidden stability problems such as:

  • Memory leaks
  • Software defects
  • Corrupted index data
  • Resource exhaustion
  • Integration failures
  • Operating system limitations
  • Incompatible package versions

When manager processes crash, event processing may be interrupted for seconds, minutes, or even hours depending on how quickly the issue is detected.

These interruptions can lead to:

  • Lost log events
  • Delayed alerts
  • Missed detections
  • Incomplete incident timelines
  • Agent communication failures

Security researchers at the SANS Institute have repeatedly emphasized that monitoring gaps and logging interruptions significantly reduce an organization’s ability to detect and investigate attacks.

Even if a crashed process automatically restarts, the resulting monitoring blind spot may leave critical security events unrecorded.

For this reason, every Wazuh Manager core dump should be treated as a high-priority operational incident requiring investigation and root cause analysis.


Common Symptoms of Wazuh Manager Core Dumps

Unexpected Service Restarts

One of the most common indicators of manager core dumps is frequent service restarts.

Administrators may notice that the Wazuh Manager service repeatedly transitions between running and failed states.

On systems managed by systemd, automatic restart policies often hide the original crash by immediately launching a new process instance.

Common indicators include:

  • Frequent service restarts in logs
  • Unexpected manager downtime
  • Intermittent dashboard functionality
  • Repeated crash-recovery cycles

You can often identify these patterns using:

systemctl status wazuh-manager
journalctl -u wazuh-manager

In severe cases, the service may enter a restart loop where crashes occur immediately after startup.

Missing or Delayed Security Alerts

Another major symptom is a sudden interruption in alert generation.

When critical components such as analysisd crash, incoming events may stop being analyzed even if agents continue sending logs.

Administrators may observe:

  • Missing alerts
  • Delayed detections
  • Reduced alert volume
  • Incomplete rule matches
  • Gaps in security monitoring

This behavior is often mistaken for rule configuration problems when the real issue is a crashed processing component.

Agent Connection Problems

Manager crashes frequently affect agent communication.

Depending on which component fails, agents may:

  • Disconnect unexpectedly
  • Fail health checks
  • Miss heartbeat acknowledgments
  • Experience registration failures
  • Stop transmitting events

Common symptoms include:

  • Agents showing disconnected status
  • Increased reconnect attempts
  • Communication timeout errors
  • Delayed event delivery

High Resource Utilization Before Crashes

Many Wazuh Manager crashes are preceded by abnormal resource consumption.

Before the core dump occurs, administrators may observe:

Memory Spikes

Rapid increases in RAM consumption can indicate memory leaks, oversized event queues, or database issues.

CPU Saturation

Excessive event processing workloads can push manager processes to 100% CPU utilization for extended periods.

Related reading: Why Is Wazuh Using High CPU? Troubleshooting Guide

File Descriptor Exhaustion

Large environments handling thousands of agent connections may exhaust available file descriptors, causing process instability.

The Open Source Security Foundation (OpenSSF) recommends continuous monitoring of resource consumption as part of production security platform reliability practices.

Tracking resource trends often helps administrators identify the root cause before the next crash occurs.

Core Dump Files Appearing on Disk

The most direct symptom is the appearance of core dump files themselves.

Depending on the Linux distribution and core dump configuration, files may appear in locations such as:

/var/lib/systemd/coredump/
/var/crash/
/tmp/

Common file naming patterns include:

core
core.12345
core.wazuh-analysisd.12345

Many administrators first discover the problem when:

  • Disk usage unexpectedly increases
  • Backup jobs begin processing large dump files
  • System monitoring tools report new files
  • Crash analysis directories fill with data

The presence of one or more core dump files should immediately trigger an investigation, especially if multiple dumps are generated over a short period of time.

Repeated dumps almost always indicate a persistent stability problem rather than an isolated incident.


Common Causes of Wazuh Manager Core Dumps

Understanding why a Wazuh Manager process crashed is the most important part of troubleshooting.

While the core dump itself provides evidence of the failure, identifying the underlying root cause prevents future outages and improves platform stability.

Below are the most common reasons Wazuh Manager components generate core dumps.

Memory Exhaustion and Out-of-Memory Conditions

Memory-related failures are among the leading causes of Wazuh Manager crashes.

As event volume grows, Wazuh must allocate memory for:

  • Event processing
  • Rule evaluation
  • Agent communication
  • Internal queues
  • Database operations
  • Vulnerability detection tasks

If memory consumption exceeds available resources, processes can become unstable and eventually crash.

Memory Leaks

A memory leak occurs when an application allocates memory but fails to release it after use.

Over time, leaked memory accumulates until the process exhausts available RAM or virtual memory resources.

Common indicators include:

  • Gradually increasing memory usage
  • Slower performance over time
  • Crashes after running for several days or weeks
  • Frequent OOM (Out of Memory) events

Excessive Event Volumes

Large environments can overwhelm manager processes with sudden bursts of events.

Examples include:

  • Malware outbreaks
  • Log forwarding loops
  • Firewall floods
  • Authentication storms
  • Network scanning activity

When event rates exceed processing capacity, resource consumption can spike dramatically.

Large Queues

If event ingestion exceeds processing speed, internal queues may grow uncontrollably.

Large queues consume memory and increase pressure on components such as analysisd and remoted.

Resource Starvation

Memory shortages often affect more than one subsystem.

When critical resources become scarce, Wazuh processes may fail to allocate memory required for normal operations, resulting in crashes and core dumps.

Corrupted Log Data

Wazuh processes millions of log events from different sources and formats.

Not all logs are well-formed.

Unexpected input can occasionally trigger parser failures or expose software defects.

Malformed Log Entries

Corrupted or improperly formatted logs may contain:

  • Missing fields
  • Broken JSON structures
  • Invalid delimiters
  • Unexpected character sequences

Although Wazuh is designed to handle malformed data gracefully, some edge cases can trigger crashes in affected versions.

Unexpected Encoding Formats

Encoding mismatches can create parsing problems.

Examples include:

  • UTF-8 versus UTF-16 mismatches
  • Invalid Unicode characters
  • Binary data embedded in text logs

These issues may cause decoders or integrations to behave unpredictably.

Oversized Events

Extremely large events can consume excessive memory during processing.

Examples include:

  • Multi-megabyte JSON payloads
  • Large audit logs
  • Oversized application logs
  • Corrupted log files

Oversized events have historically been responsible for processing failures across many SIEM and log management platforms.

Software Bugs and Version Defects

Not every crash is caused by environmental issues.

Sometimes the root cause is a software defect within Wazuh itself.

Known Bugs in Specific Releases

Every major software platform occasionally ships with defects that are later corrected through updates.

Before spending hours troubleshooting, check whether the crash matches a known issue. Wazuh release notes frequently contain bug fixes related to stability, memory management, and crash prevention.

Module-Specific Crashes

Certain modules may crash independently while the rest of the platform continues functioning.

Examples include:

  • Vulnerability detection modules
  • Cloud integrations
  • External connectors
  • Agent enrollment services

Module-specific failures often appear repeatedly under identical workloads.

Edge-Case Processing Failures

Many software defects only occur under unusual circumstances.

Examples include:

  • Extremely large event payloads
  • Rare decoder combinations
  • Unexpected agent behavior
  • Concurrent processing conditions

These failures can be difficult to reproduce without analyzing the core dump.

Third-Party Integration Issues

Many Wazuh deployments include custom integrations and external automation.

While powerful, these integrations can also introduce instability.

External Scripts

Custom scripts may:

  • Consume excessive resources
  • Return malformed data
  • Create deadlocks
  • Trigger unexpected module behavior

Poorly written integrations are a common source of intermittent crashes.

Custom Integrations

Organizations frequently connect Wazuh to:

  • Ticketing platforms
  • Threat intelligence feeds
  • SIEM solutions
  • Automation tools

Problems within these integrations can indirectly affect manager stability.

API-Related Crashes

API integrations occasionally trigger failures when:

  • Responses are malformed
  • Timeouts occur unexpectedly
  • Authentication failures are not handled correctly
  • Returned data exceeds expected limits

Reviewing integration logs often helps identify these issues.

Database and Internal Communication Problems

Several Wazuh components depend on internal database communication.

Failures in this area can cause instability throughout the platform.

wazuh-db Failures

The wazuh-db service handles important internal operations involving agent and configuration data.

When wazuh-db becomes unstable, dependent components may also fail.

Related reading:

Fixing wazuh-db Worker Thread Crashes

Database Corruption

Corrupted databases can lead to:

  • Failed queries
  • Invalid responses
  • Unexpected process termination
  • Repeated crash cycles

Corruption may result from:

  • Improper shutdowns
  • Disk failures
  • File system corruption
  • Incomplete upgrades

IPC Communication Issues

Wazuh components communicate internally through Inter-Process Communication (IPC) mechanisms.

If IPC channels become corrupted or unavailable, processes may:

  • Hang indefinitely
  • Receive invalid responses
  • Terminate unexpectedly

These failures often appear in manager logs shortly before a crash.

Rule and Decoder Configuration Errors

Custom configurations can introduce instability when not properly tested.

Invalid Custom Rules

Incorrect rule syntax may cause parsing failures during startup or event processing.

Always validate custom rules before deploying them to production.

Recursive Logic Problems

Poorly designed rule chains can create excessive processing overhead.

Examples include:

  • Circular rule references
  • Excessive inheritance chains
  • Deep dependency relationships

These conditions can dramatically increase CPU and memory usage.

Decoder Parsing Issues

Custom decoders sometimes fail when encountering unexpected data.

Common problems include:

  • Incorrect regex patterns
  • Missing fields
  • Invalid assumptions about log structure

Decoder-related crashes often appear only when specific event types are processed.

Disk and File System Problems

Storage problems can affect every Wazuh component.

Full Disks

A full disk can prevent:

  • Log writing
  • Queue storage
  • Database updates
  • Temporary file creation

When critical write operations fail, processes may terminate unexpectedly.

Corrupted Storage

File system corruption can damage:

  • Databases
  • Index files
  • Queue files
  • Configuration files

Corruption often causes recurring crashes that persist across service restarts.

File Permission Issues

Incorrect ownership or permissions may prevent Wazuh components from accessing required files.

Symptoms include:

  • Startup failures
  • Unexpected process exits
  • Incomplete initialization
  • Core dumps during file operations

Operating System and Dependency Issues

The underlying operating system can also contribute to crashes.

Unsupported Libraries

Library mismatches may occur after:

  • Partial upgrades
  • Repository changes
  • Manual package installations

An incompatible shared library can cause immediate application crashes.

Broken Package Dependencies

Missing dependencies may prevent modules from functioning correctly.

Administrators should verify package integrity whenever crashes occur after upgrades.

Kernel-Related Compatibility Problems

Certain kernel versions occasionally expose compatibility issues with user-space applications.

According to guidance from the Linux Foundation, maintaining supported kernel and dependency combinations is an important best practice for production system stability.

When crashes begin shortly after operating system updates, dependency compatibility should be investigated immediately.


Step 1: Confirm That a Core Dump Occurred

Before investigating root causes, verify that a core dump was actually generated.

This helps distinguish true process crashes from configuration problems, graceful shutdowns, or service restarts.

Check Wazuh Service Status

Start by examining the current service state.

Using systemctl

Run:

systemctl status wazuh-manager

Look for:

  • Failed status indicators
  • Signal termination messages
  • Segmentation fault errors
  • Restart loop behavior

Recent crashes often appear directly within the status output.

Reviewing Recent Crashes

Systemd typically records termination events in the journal.

Use:

journalctl -u wazuh-manager -n 200

Look for messages containing:

Segmentation fault
Aborted
Core dumped
Killed
Signal 11
Signal 6

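Entries mentioning “core dumped”, a segmentation fault, or signals 11 (SIGSEGV) and 6 (SIGABRT) usually mean a core dump was generated, provided dumps are enabled. “Killed” indicates SIGKILL, typically sent by the kernel’s out-of-memory killer, which ends the process without writing a dump.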
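As a rough sketch, the indicators above can be folded into one reusable filter. The function name crash_lines is hypothetical, and the pattern list is an assumption: it combines the strings listed above with systemd’s code=dumped marker. Adjust it to match what your journal actually records.

```shell
# Print only the lines that look like crash records.
# Reads log text on stdin, so it works with journalctl output or saved files.
# "Killed" is deliberately left out: SIGKILL does not produce a core dump.
crash_lines() {
    grep -Ei 'segmentation fault|segfault|core dumped|code=dumped|aborted|signal (6|11)'
}

# Typical use (not run here):
#   journalctl -u wazuh-manager -n 200 --no-pager | crash_lines
```

Because the filter reads standard input, the same function can be reused on archived journal exports from other managers.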

Identify Terminated Processes

If the manager service restarted automatically, determine which component actually failed.

Examples include:

  • wazuh-analysisd
  • wazuh-remoted
  • wazuh-db
  • wazuh-modulesd
  • wazuh-authd

Knowing the affected process significantly narrows the investigation scope.

Check Kernel Messages

The Linux kernel often records crash details.

Use:

dmesg -T | grep -i "segfault"

or:

journalctl -k

Kernel logs frequently reveal:

  • Faulting addresses
  • Signal numbers
  • Memory violations
  • Crashed binaries
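When several crashes have accumulated, a per-process summary makes the pattern easier to see. The sketch below assumes the usual kernel record layout, in which the process name and PID appear as name[pid] immediately before the word segfault. The function name segfault_summary is hypothetical.

```shell
# Summarise kernel segfault records as "<process> <count>".
# Reads dmesg or journalctl -k text on stdin.
segfault_summary() {
    grep -oE '[^] []+\[[0-9]+\]: segfault' |
        sed -E 's/\[[0-9]+\]: segfault$//' |
        sort | uniq -c | awk '{ print $2, $1 }'
}

# Typical use (not run here):
#   dmesg -T | segfault_summary
```

A process that appears many times in this summary is the natural starting point for the steps that follow.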

Review Service Logs

Examine Wazuh logs immediately before the crash.

Useful files include:

/var/ossec/logs/ossec.log

Look for:

  • Error messages
  • Module failures
  • Database issues
  • Queue overflows
  • Resource warnings

Many root causes become visible shortly before process termination.

Verify Core Dump Generation

After confirming a crash occurred, determine whether Linux created a dump file.

Using coredumpctl

On systemd-based systems:

coredumpctl list

Example output:

TIME                         PID  UID  GID SIG COREFILE EXE
Sat 2026-01-10 10:15:12 UTC 1234    0    0  11 present  /var/ossec/bin/wazuh-analysisd

This confirms a process generated a core dump.

You can inspect details using:

coredumpctl info

Using Core Dump Files Directly

On systems that store traditional core files:

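find / -type f \( -name "core" -o -name "core.[0-9]*" -o -name "core.wazuh-*" \) 2>/dev/null

Restricting the name patterns avoids matching unrelated files whose names merely begin with “core”. If files are present, a crash likely occurred and further analysis can begin.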


Step 2: Locate Wazuh Core Dump Files

Once you’ve confirmed a crash occurred, locate the dump file associated with the failed process.

Common Core Dump Locations

The storage location depends on Linux distribution and core dump configuration.

systemd-coredump Storage

Modern distributions commonly use systemd-coredump.

Typical location:

/var/lib/systemd/coredump/

List available dumps:

coredumpctl list

Extract a dump to a file if necessary:

coredumpctl dump <PID> --output=core.<PID>

Traditional Core File Locations

Older systems often write core files directly to disk.

Common locations include:

/var/crash/
/tmp/

or the working directory of the crashed process.

Search for them using:

find / -type f \( -name "core" -o -name "core.[0-9]*" -o -name "core.wazuh-*" \) 2>/dev/null

Custom Core Pattern Locations

Linux allows administrators to customize dump storage through:

cat /proc/sys/kernel/core_pattern

Example:

/var/core/core.%e.%p

This setting determines where newly generated dumps are written.
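The value of core_pattern also tells you which retrieval method applies. As a minimal sketch: a leading pipe character means the kernel hands the dump to a helper program, while any other value is a file name template. The function name describe_core_pattern and its messages are hypothetical.

```shell
# Classify a core_pattern value.
# A leading "|" pipes the dump to a helper; anything else is a file template.
describe_core_pattern() {
    case $1 in
        '|'*systemd-coredump*) echo "piped to systemd-coredump (use coredumpctl)" ;;
        '|'*)                  echo "piped to another handler" ;;
        /*)                    echo "written to an absolute path" ;;
        *)                     echo "written relative to the working directory of the crashed process" ;;
    esac
}

# Typical use (not run here):
#   describe_core_pattern "$(cat /proc/sys/kernel/core_pattern)"
```

In the example pattern above, %e expands to the executable name and %p to the process ID, which is why the resulting file names identify the crashed component.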

Determine Which Process Crashed

Finding a core file is only the first step.

Next, identify which Wazuh component generated it.

Mapping Dump Files to Wazuh Components

Many dump files include process information within the filename.

Examples:

core.wazuh-analysisd.12345
core.wazuh-remoted.9876
core.wazuh-db.5555

This immediately identifies the affected service.
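If you have many dump files, the component name can be pulled out of each file name automatically. This is a sketch that assumes the core.<name>.<pid> naming shown above, which depends on your core_pattern. The function name component_from_core_name is hypothetical.

```shell
# Extract the component name from a file named core.<name>.<pid>.
# Prints nothing and returns 1 when the name carries no component.
component_from_core_name() {
    base=${1##*/}                      # strip any leading directory
    case $base in
        core.*.*)
            base=${base#core.}         # drop the "core." prefix
            printf '%s\n' "${base%%.*}" ;;
        *)
            return 1 ;;
    esac
}

# Typical use (not run here):
#   for f in /var/core/core.*; do component_from_core_name "$f"; done | sort | uniq -c
```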

If filenames are not descriptive, use:

coredumpctl info

or:

file core.*

to obtain executable information.

Identifying Affected Services

The most commonly crashed Wazuh processes include:

Process     Primary Function
analysisd   Event analysis and rule matching
remoted     Agent communications
wazuh-db    Internal database operations
modulesd    Module execution
authd       Agent enrollment and authentication

Identifying the crashed process early dramatically reduces troubleshooting time because it allows you to focus on the subsystem most likely responsible for the failure.


Step 3: Examine Wazuh Logs Before the Crash

Once you’ve identified the crashed process and located the core dump, the next step is reviewing Wazuh logs generated immediately before the failure.

In many cases, the logs reveal the root cause without requiring deep core dump analysis.

Fatal errors, resource exhaustion warnings, database communication failures, and malformed event processing issues often appear minutes or even seconds before a process terminates.

Review Manager Logs

Wazuh components generate extensive operational logs that provide valuable context surrounding a crash.

Rather than focusing only on the exact crash timestamp, examine activity occurring several minutes beforehand.

Many failures are preceded by warning messages that progressively worsen until the process becomes unstable.

Important Log Locations

The primary log file for troubleshooting manager crashes is:

/var/ossec/logs/ossec.log

Search recent activity using:

tail -500 /var/ossec/logs/ossec.log

Or filter by component:

grep analysisd /var/ossec/logs/ossec.log
grep remoted /var/ossec/logs/ossec.log
grep wazuh-db /var/ossec/logs/ossec.log

For system-level events, also review:

journalctl -u wazuh-manager

and

journalctl -xe

These logs frequently contain information unavailable within ossec.log itself.
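To focus on the minutes leading up to a crash, you can cut a time window out of ossec.log. The sketch below assumes the plain-text log format, where every line begins with a YYYY/MM/DD HH:MM:SS timestamp. The function name log_window is hypothetical.

```shell
# Print log lines whose timestamp falls within [start, end].
# Assumes each line begins with "YYYY/MM/DD HH:MM:SS", so plain string
# comparison orders the timestamps correctly.
log_window() {
    awk -v start="$2" -v end="$3" '
        { ts = substr($0, 1, 19) }
        ts >= start && ts <= end
    ' "$1"
}

# Typical use (not run here):
#   log_window /var/ossec/logs/ossec.log "2026/01/10 10:00:00" "2026/01/10 10:15:12"
```

Set the end of the window to the crash time reported by journalctl or coredumpctl, and the start to ten or fifteen minutes earlier.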

Identifying Fatal Errors

Start by searching for obvious failure indicators.

Examples include:

ERROR
CRITICAL
FATAL
Aborted
Segmentation fault
Out of memory
Connection refused
Database error

Useful commands:

grep -Ei "fatal|critical|error|abort" /var/ossec/logs/ossec.log

Look especially for messages occurring immediately before the service restart or crash timestamp.

Many Wazuh crashes leave a clear trail of warnings before termination.

Look for Warning Messages Leading Up to the Crash

Warnings often provide the earliest indication of instability.

Administrators frequently overlook these messages because the service may continue functioning temporarily before finally crashing.

Queue Warnings

Queue-related warnings indicate that incoming events are arriving faster than they can be processed.

Examples include:

Queue is full
Event queue saturated
Messages dropped

Large queue backlogs can contribute to memory pressure and eventual process failures.

Related reading:

Fix Wazuh Logcollector Dropped Messages

Memory Allocation Errors

Memory-related warnings should always be treated seriously.

Examples:

Cannot allocate memory
Out of memory
Allocation failed

These messages often appear before segmentation faults and process crashes.

Decoder Failures

Malformed or unexpected log formats can trigger decoder problems.

Examples:

Decoder error
Regex compilation failed
Invalid log format

Repeated decoder failures may indicate corrupted log sources or configuration issues.

Related reading:

How to Create Custom Detection Rules in Wazuh (With Examples)

Database Communication Errors

Database instability frequently affects multiple Wazuh components.

Watch for messages such as:

Database connection failed
Unable to communicate with wazuh-db
IPC timeout
Database unavailable

These warnings often precede crashes involving modulesd, analysisd, or wazuh-db itself.

Related reading:

Fixing wazuh-db Worker Thread Crashes

Correlate Crash Timing

Finding errors is important, but understanding their relationship to the crash is even more valuable.

Building a Timeline

Create a timeline of events leading to the crash.

Document:

Time    Event
10:01   Queue warnings begin
10:05   Memory usage spikes
10:08   Database timeout errors appear
10:10   Process crashes
10:10   Core dump generated

This approach often reveals patterns that individual log entries cannot.
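A first draft of such a timeline can be generated from ossec.log by counting problem messages per minute. This sketch assumes the plain-text log format, with a leading YYYY/MM/DD HH:MM:SS timestamp and a severity tag such as WARNING or ERROR after the daemon name. The function name severity_per_minute is hypothetical.

```shell
# Count ERROR/WARNING/CRITICAL lines per minute in an ossec.log-style file.
# Output: "YYYY/MM/DD HH:MM <count>", oldest minute first.
severity_per_minute() {
    awk '/: (ERROR|WARNING|CRITICAL): / { n[substr($0, 1, 16)]++ }
         END { for (m in n) print m, n[m] }' "$1" | sort
}

# Typical use (not run here):
#   severity_per_minute /var/ossec/logs/ossec.log | tail -30
```

A minute in which the count jumps sharply is usually worth reading line by line.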

Matching Logs to Crash Events

Compare timestamps from:

  • ossec.log
  • journalctl
  • dmesg
  • coredumpctl
  • monitoring systems

Your goal is to identify what changed immediately before the failure occurred.

Experienced incident responders frequently emphasize timeline reconstruction as one of the most effective methods for identifying root causes because it helps distinguish symptoms from the actual triggering event.


Step 4: Analyze the Core Dump

Once you’ve collected relevant logs, it’s time to examine the core dump itself.

Core dump analysis can reveal exactly where the process failed and which function triggered the crash.

Even if you’re not a software developer, basic analysis often provides enough information to identify known bugs, resource issues, or module-specific failures.

Install Required Debugging Tools

Several tools are required before you can inspect a dump file.

GDB

The GNU Debugger (GDB) is the most common utility used for Linux crash analysis.

Install it on Debian-based systems:

sudo apt install gdb

On RHEL-based systems:

sudo yum install gdb

GDB allows you to inspect:

  • Stack traces
  • Thread information
  • Register values
  • Memory state
  • Crashed functions

The official GNU debugger documentation provides detailed guidance on post-mortem debugging techniques.

Debug Symbol Packages

Without debugging symbols, stack traces may contain limited information.

Install:

  • Wazuh debug packages (if available)
  • Operating system debug symbols
  • Library debug packages

Symbols allow GDB to display function names and source code references rather than memory addresses.

Open the Core Dump in GDB

After installing the necessary tools, load the dump.

Loading the Dump

Example:

gdb /var/ossec/bin/wazuh-analysisd core.12345

Or using systemd:

coredumpctl gdb <PID>

GDB will load the process state captured at the time of the crash.

Generating a Backtrace

The first command most engineers run is:

bt

or:

thread apply all bt

This generates a backtrace showing the sequence of function calls that led to the crash.

Example (function names are illustrative):

#0 process_event()
#1 decode_log()
#2 rule_matching()
#3 main()

The backtrace is often the single most valuable artifact produced during troubleshooting.

Understanding Backtrace Output

A backtrace may look intimidating at first, but several patterns are easy to recognize.

Function Call Stacks

The stack trace shows which functions were executing when the process failed.

Repeated function names may indicate:

  • Infinite recursion
  • Looping logic
  • Stack exhaustion

These patterns frequently point directly to software defects.

Segmentation Faults

A segmentation fault (SIGSEGV) occurs when a process attempts to access memory it does not own.

Example:

Program terminated with signal SIGSEGV
Segmentation fault

This is one of the most common causes of Wazuh core dumps.

Abort Signals

Abort signals typically appear as:

Program terminated with signal SIGABRT

These crashes often occur when internal safety checks detect invalid program states.

Memory Access Violations

Memory corruption indicators may include:

  • Invalid pointers
  • Null pointer dereferences
  • Buffer overflows
  • Corrupted heap structures

When these patterns appear, a software defect is often involved.

According to guidance from the GNU Project and major Linux distribution maintainers, stack traces and signal information are usually the most important artifacts for diagnosing application crashes.

Collect Information for Vendor Support

If the root cause is not immediately obvious, gather information before opening a support case or GitHub issue.

Backtrace Output

Save the output of these GDB commands to a text file:

bt
thread apply all bt

These are typically the first artifacts requested by support engineers.
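Both backtraces can be captured without an interactive session by running GDB in batch mode. The following is a minimal sketch. The function name save_backtrace and the GDB override variable are hypothetical conveniences, while -batch and -ex are standard GDB options.

```shell
# Write the crashing thread's backtrace and all-thread backtraces to a file.
# Set GDB to override the debugger binary (useful for dry runs).
save_backtrace() {
    binary=$1 core=$2 out=$3
    "${GDB:-gdb}" -batch -ex 'bt' -ex 'thread apply all bt' \
        "$binary" "$core" > "$out" 2>&1
}

# Typical use (not run here):
#   save_backtrace /var/ossec/bin/wazuh-analysisd core.12345 backtrace.txt
```

Standard error is captured as well, so warnings about missing debug symbols end up in the same file.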

System Details

Collect:

uname -a
cat /etc/os-release

This helps identify operating system compatibility issues.

Version Information

Document:

/var/ossec/bin/wazuh-control info

or:

rpm -qa | grep wazuh

or:

dpkg -l | grep wazuh

Include:

  • Wazuh version
  • Operating system version
  • Kernel version
  • Installed integrations
  • Deployment architecture

Providing complete diagnostic information significantly accelerates vendor troubleshooting.


Step 5: Verify System Resource Health

A surprisingly large percentage of Wazuh crashes are caused by resource exhaustion rather than software defects.

Before assuming a bug exists, verify that the underlying system has sufficient resources to support the workload.

Check Available Memory

Memory shortages are among the most common causes of instability.

Physical RAM

Review memory utilization:

free -h

Look for:

  • Very low available memory
  • Consistently high utilization
  • Frequent memory pressure events

Memory consumption approaching system limits should be investigated immediately.
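For scripted checks, the available-memory figure can be read directly from /proc/meminfo rather than parsed out of free. The sketch below reports MemAvailable as a percentage of MemTotal. The function name mem_available_pct is hypothetical, and any alert threshold you apply to the result is your own choice.

```shell
# Report MemAvailable as a whole-number percentage of MemTotal.
# Reads /proc/meminfo by default; pass another file to analyse a saved copy.
mem_available_pct() {
    awk '/^MemTotal:/     { total = $2 }
         /^MemAvailable:/ { avail = $2 }
         END { if (total > 0) printf "%d\n", avail * 100 / total }' "${1:-/proc/meminfo}"
}

# Typical use (not run here):
#   echo "Available memory: $(mem_available_pct)%"
```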

Swap Usage

Check swap activity:

swapon --show

and

free -h

Heavy swap usage often indicates insufficient physical memory.

Systems relying extensively on swap frequently experience:

  • Increased latency
  • Slower event processing
  • Process instability
  • Unexpected crashes

Monitor CPU Utilization

CPU saturation can create cascading failures throughout the manager.

Sustained High CPU Usage

Monitor system load:

top

or:

htop

Look for:

  • CPU usage consistently above 80–90%
  • Load averages exceeding CPU core counts
  • analysisd consuming excessive resources

Process-Level Analysis

Identify which processes are consuming resources:

ps aux --sort=-%cpu | head

and:

ps aux --sort=-%mem | head

This helps determine whether the crash is linked to a specific component.

Verify Disk Capacity

Storage exhaustion can destabilize Wazuh and its supporting services.

Filesystem Usage

Check available space:

df -h

Pay special attention to:

  • /
  • /var
  • /var/ossec
  • Wazuh indexer (OpenSearch) data volumes, if the indexer runs on the same host

Full filesystems commonly trigger service failures.

Related reading:

How to Fix a Yellow Cluster Status in Wazuh Indexer

Inode Availability

A filesystem can run out of inodes even when free space remains.

Check inode consumption:

df -i

Low inode availability may prevent new files from being created.
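Both the space check and the inode check can be filtered against a threshold. The sketch below assumes POSIX-format df output on standard input; fs_over_threshold is a hypothetical helper, and it does not handle mount points that contain spaces.

```shell
# Sketch: list mount points whose usage is at or above a percentage threshold.
# Works on block usage (df -P) and inode usage (df -Pi), since both put the
# percentage in column 5 and the mount point in column 6.
fs_over_threshold() {
    awk -v limit="$1" 'NR > 1 {
        use = $5
        sub(/%/, "", use)
        if (use + 0 >= limit + 0) print $6, $5
    }'
}

# Usage:
#   df -P  | fs_over_threshold 90
#   df -Pi | fs_over_threshold 90
```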

Inspect File Descriptor Limits

Wazuh managers handling thousands of agents may encounter file descriptor limitations.

Current Limits

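View the limit for the current shell session (running Wazuh daemons may have different limits, which are visible in /proc/<PID>/limits):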

ulimit -n

Review system-wide settings:

cat /proc/sys/fs/file-max

Low limits can cause:

  • Connection failures
  • Queue problems
  • Service instability
  • Unexpected process exits
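Because ulimit -n reports the limit of the current shell, it helps to read the limit that actually applies to a running daemon. The following sketch parses a /proc/<PID>/limits file; max_open_files is a hypothetical helper, and the usage lines assume pgrep is installed and wazuh-analysisd is running.

```shell
# Sketch: print the soft "Max open files" limit from a /proc/<pid>/limits file.
max_open_files() {
    # Columns: "Max open files" <soft> <hard> <units>; the soft limit is field 4
    awk '/^Max open files/ { print $4 }' "$1"
}

# Usage:
#   pid=$(pgrep -o wazuh-analysisd)
#   max_open_files "/proc/$pid/limits"
#   ls "/proc/$pid/fd" | wc -l        # descriptors currently open
```

Comparing the number of open descriptors with the soft limit shows how close the process is to exhausting it.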

Increasing Limits When Necessary

If limits are too restrictive, adjust:

/etc/security/limits.conf

Example:

wazuh soft nofile 65535
wazuh hard nofile 65535

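After increasing limits, restart affected services and continue monitoring. Note that on systemd-based systems, /etc/security/limits.conf applies to PAM login sessions and is generally not consulted for services; for the wazuh-manager service, the LimitNOFILE directive in the unit file or a drop-in override determines the effective limit.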
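One way to raise the limit for a systemd-managed manager is a drop-in file. This is a minimal sketch: write_nofile_dropin is a hypothetical helper, the file name limits.conf is arbitrary, and 65535 simply mirrors the example value above. Check the current value first with systemctl show wazuh-manager -p LimitNOFILE, since the packaged unit may already set one.

```shell
# Sketch: write a systemd drop-in that sets LimitNOFILE for a unit.
write_nofile_dropin() {
    dropin_dir="$1"          # e.g. /etc/systemd/system/wazuh-manager.service.d
    limit="${2:-65535}"
    mkdir -p "$dropin_dir" || return 1
    printf '[Service]\nLimitNOFILE=%s\n' "$limit" > "$dropin_dir/limits.conf"
}

# Usage (as root):
#   write_nofile_dropin /etc/systemd/system/wazuh-manager.service.d 65535
#   systemctl daemon-reload
#   systemctl restart wazuh-manager
```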

Resource validation is a critical troubleshooting step because many crashes that initially appear to be software bugs ultimately turn out to be memory shortages, CPU saturation, disk exhaustion, or operating system limitations.


Step 6: Validate Wazuh Configuration

Configuration problems are a common source of Wazuh Manager instability, especially in environments with extensive customization.

Custom rules, decoders, integrations, and manual configuration changes can introduce unexpected behavior that eventually leads to process crashes.

If core dumps began appearing after a configuration change, validation should be one of your highest-priority troubleshooting steps.

Check Manager Configuration Syntax

Before investigating more complex causes, verify that the manager configuration is syntactically correct.

Even small formatting mistakes can create instability or prevent components from operating properly.

Validate ossec.conf

The primary Wazuh Manager configuration file is:

/var/ossec/etc/ossec.conf

Inspect the file for:

  • Missing XML tags
  • Invalid nesting
  • Duplicate configuration blocks
  • Typographical errors
  • Unsupported options

Wazuh logs often reveal configuration-related errors during startup.

Review:

tail -n 100 /var/ossec/logs/ossec.log

immediately after restarting the service.
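To focus on configuration problems after a restart, the log can be filtered for error-level entries. The sketch below assumes the usual ossec.log line layout of date, time, daemon name, and level; recent_errors is a hypothetical helper.

```shell
# Sketch: show ERROR and CRITICAL entries among the last N lines of a Wazuh log.
# grep exits with status 1 when nothing matches, which here means "no errors".
recent_errors() {
    logfile="${1:-/var/ossec/logs/ossec.log}"
    lines="${2:-200}"
    tail -n "$lines" "$logfile" | grep -E ': (ERROR|CRITICAL): '
}

# Usage:
#   recent_errors
#   recent_errors /var/ossec/logs/ossec.log 500
# Assumption: Wazuh daemons such as wazuh-analysisd accept a -t flag that
# tests the configuration and exits; confirm with the daemon's -h output.
```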

Related reading:

How to Fix ossec.conf Syntax Errors in Wazuh Agents

Identify Recent Changes

One of the fastest ways to locate a root cause is determining what changed before the crashes began.

Ask questions such as:

  • Were new rules recently added?
  • Was Wazuh upgraded?
  • Was an integration deployed?
  • Were decoder changes introduced?
  • Were manager settings modified?

Many incidents can be traced directly to a recent configuration change.

If version control is available, compare current and previous configurations.
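When no version control is in place, file modification times give a rough view of what changed. This is a minimal sketch that assumes the default configuration directory /var/ossec/etc; recently_changed is a hypothetical helper.

```shell
# Sketch: list regular files under a directory modified within the last N days.
recently_changed() {
    dir="${1:-/var/ossec/etc}"
    days="${2:-7}"
    find "$dir" -type f -mtime "-$days" -print
}

# Usage:
#   recently_changed /var/ossec/etc 7
#   diff -u /path/to/backup/ossec.conf /var/ossec/etc/ossec.conf
```

The diff line assumes a backup copy exists at a path of your choosing.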

Review Custom Rules

Custom detection rules are powerful but can introduce processing problems when improperly designed.

Detect Faulty Rule Logic

Review recently added rules for:

  • Invalid syntax
  • Unsupported fields
  • Excessive inheritance
  • Circular dependencies
  • Inefficient matching logic

Examples of problematic patterns include an if_sid element such as:

<if_sid>100001</if_sid>

that references a rule ID that does not exist, or rule chains whose dependencies loop back on each other.

These issues can dramatically increase processing overhead and occasionally expose edge-case software defects.
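Dangling if_sid references can be found with a simple text scan. The sketch below is a rough check, not an XML parser: it assumes each rule and if_sid tag sits on a single line, and find_missing_if_sid is a hypothetical helper name. Pass both the bundled and the custom rule files so that references to built-in rules resolve.

```shell
# Sketch: report <if_sid> references to rule IDs that are not defined in any
# of the files given as arguments. Output: "<missing id> <file>", sorted.
find_missing_if_sid() {
    awk '
        # Record every rule ID defined in the input files
        match($0, /<rule[^>]*id="[0-9]+"/) {
            s = substr($0, RSTART, RLENGTH)
            sub(/.*id="/, "", s); sub(/"$/, "", s)
            defined[s] = 1
        }
        # Record every ID referenced by an if_sid element (comma separated)
        match($0, /<if_sid>[^<]*<\/if_sid>/) {
            s = substr($0, RSTART, RLENGTH)
            gsub(/<\/?if_sid>/, "", s)
            n = split(s, ids, /[, ]+/)
            for (i = 1; i <= n; i++) if (ids[i] != "") refs[ids[i]] = FILENAME
        }
        END { for (r in refs) if (!(r in defined)) print r, refs[r] }
    ' "$@" | sort
}

# Usage (default install paths):
#   find_missing_if_sid /var/ossec/ruleset/rules/*.xml /var/ossec/etc/rules/*.xml
```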

Test Rule Changes Safely

Never deploy major rule changes directly to production without validation.

Use:

/var/ossec/bin/wazuh-logtest

to verify behavior before rollout.

This tool allows administrators to:

  • Test rule matching
  • Validate syntax
  • Verify decoder interactions
  • Identify unexpected behavior

Review Custom Decoders

Custom decoders are another frequent source of instability.

Decoder errors may not appear immediately and can remain hidden until a specific log format is processed.

Decoder Validation

Inspect custom decoders for:

  • Invalid regular expressions
  • Incorrect field mappings
  • Missing parent decoders
  • Unsupported XML elements

Validate decoder behavior using representative log samples before deployment.

Common Decoder Mistakes

The most common issues include:

  • Overly complex regex patterns
  • Greedy matching expressions
  • Invalid capture groups
  • Decoder inheritance errors
  • Assumptions about log structure

For example, a decoder may work perfectly with expected logs but fail when encountering malformed or unexpected input.

These edge cases can trigger excessive resource consumption or process instability under heavy workloads.


Step 7: Investigate Database and Module Failures

Many Wazuh Manager crashes originate from internal modules rather than the manager framework itself.

Database communication problems, module failures, and subsystem-specific defects can all produce core dumps.

The goal of this step is identifying whether a particular component is consistently involved in the crash.

Check wazuh-db Health

The wazuh-db daemon is one of the most important components within the Wazuh architecture.

Many manager functions rely on it for configuration management, agent information, and internal data operations.

Database Errors

Review logs for messages containing phrases such as the following (exact wording varies by Wazuh version):

Database error
Database unavailable
Query failed
Database timeout

Search logs using:

grep -i database /var/ossec/logs/ossec.log

Repeated database errors often indicate corruption, communication failures, or resource exhaustion.

Communication Failures

Wazuh components communicate extensively with wazuh-db.

Warning messages to watch for resemble the following (exact wording varies by Wazuh version):

Unable to communicate with wazuh-db
IPC timeout
Socket communication error
Connection lost

When communication breaks down, dependent processes may become unstable and eventually crash.

Related reading:

Fixing wazuh-db Worker Thread Crashes

Review Wazuh Modules

Several Wazuh modules perform specialized functions and may crash independently under certain conditions.

Examine logs for module-specific errors.

Vulnerability Detection

The vulnerability detection module processes package inventory information and vulnerability feeds.

Potential issues include:

  • Corrupted vulnerability databases
  • Feed synchronization failures
  • Excessive resource consumption
  • Version compatibility problems

Related reading:

Wazuh Vulnerability Detection Not Working? Here’s How to Fix It

Syscollector

Syscollector gathers inventory information from monitored endpoints.

Problems may occur when:

  • Agents send unexpected inventory data
  • Large environments generate excessive inventory updates
  • Resource limits are reached

Review Syscollector-related log entries surrounding the crash.

FIM

File Integrity Monitoring (FIM) can generate significant processing workloads.

Potential crash contributors include:

  • Monitoring extremely large directories
  • Excessive file changes
  • Aggressive scan schedules
  • Resource exhaustion

SCA

Security Configuration Assessment (SCA) scans can place additional load on manager resources.

Review:

  • Scan frequency
  • Policy complexity
  • Concurrent scan activity

Large-scale SCA deployments occasionally expose scalability issues.

Identify Module-Specific Crashes

The objective is determining whether crashes consistently occur within the same subsystem.

Look for patterns such as:

  • Every crash involving wazuh-modulesd
  • Every crash occurring during vulnerability scans
  • Every crash occurring after FIM activity
  • Every crash occurring during agent enrollment

Consistent patterns usually indicate a module-specific problem.
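A quick way to look for such patterns is to tally error-level log entries by the daemon or module that wrote them. This sketch assumes the usual ossec.log layout in which the third field names the source; errors_by_daemon is a hypothetical helper.

```shell
# Sketch: count ERROR/CRITICAL log entries per reporting daemon or module,
# most frequent first.
errors_by_daemon() {
    grep -E ': (ERROR|CRITICAL): ' "${1:-/var/ossec/logs/ossec.log}" |
        awk '{ name = $3; sub(/:$/, "", name); count[name]++ }
             END { for (n in count) print count[n], n }' |
        sort -rn
}

# Usage: errors_by_daemon /var/ossec/logs/ossec.log
```

A source that dominates the tally shortly before each crash is a reasonable first candidate for isolation.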

Isolating Problematic Components

If evidence points toward a specific module, isolate it for testing.

For example:

  1. Disable the suspected module.
  2. Restart Wazuh.
  3. Monitor system stability.
  4. Compare behavior before and after the change.

This controlled approach often confirms the root cause quickly.

Temporary Module Disablement for Testing

Disabling a module temporarily can help determine whether it is responsible for the crashes.

Examples include:

  • Vulnerability Detection
  • SCA
  • Syscollector
  • Third-party integrations

Do not leave critical security features disabled permanently, but temporary testing can provide valuable diagnostic information.

Document every change so that configurations can be restored after troubleshooting.
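Taking a timestamped copy of the configuration before each test makes the restore step straightforward. A minimal sketch, assuming the default ossec.conf path; backup_config is a hypothetical helper.

```shell
# Sketch: keep a timestamped copy of a configuration file before editing it.
# Prints the path of the backup on success.
backup_config() {
    src="${1:-/var/ossec/etc/ossec.conf}"
    dest="$src.bak.$(date +%Y%m%d%H%M%S)"
    cp -p "$src" "$dest" && echo "$dest"
}

# Usage:
#   backup=$(backup_config)                          # before disabling a module
#   diff -u "$backup" /var/ossec/etc/ossec.conf      # review what changed
#   cp -p "$backup" /var/ossec/etc/ossec.conf        # restore after testing
```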


Step 8: Determine Whether the Crash Is a Known Bug

Not every core dump is caused by local configuration problems or infrastructure issues.

Sometimes the crash is the result of a documented software defect that has already been identified and fixed by the Wazuh development team.

Before investing excessive time in deep debugging, verify whether the issue is already known.

Verify Installed Wazuh Version

Begin by identifying the exact version running in your environment.

Examples:

rpm -qa | grep wazuh

or:

dpkg -l | grep wazuh

Document:

  • Manager version
  • Agent versions
  • Dashboard version
  • Indexer version

Version mismatches can sometimes contribute to instability.

Review Release Notes

Wazuh release notes frequently contain bug fixes addressing:

  • Memory leaks
  • Segmentation faults
  • Database crashes
  • Module instability
  • Integration failures

Pay particular attention to fixes involving the component identified in your backtrace.

For example, if wazuh-analysisd generated the core dump, search release notes for fixes related to wazuh-analysisd.

Search Known Issues

The next step is reviewing publicly reported bugs.

Search using:

  • Error messages
  • Backtrace functions
  • Signal names
  • Module names
  • Version numbers

You may discover that other administrators have already encountered the same issue.

Compare Crash Signatures

Core dump analysis becomes especially valuable when comparing crash signatures against known defects.

Matching Stack Traces

If your backtrace contains function names such as the following (illustrative placeholders, not necessarily real Wazuh symbols):

process_event()
decode_event()
db_query()

search those function names together with your Wazuh version.

Matching stack traces are often strong evidence that you’re encountering an existing bug.

Many software vendors use crash signatures as the primary method for categorizing and resolving defects.
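To make traces easier to compare and search for, a backtrace can be reduced to the function names of its top frames. The sketch below assumes standard GDB frame lines that begin with "#<number>"; crash_signature is a hypothetical helper, and the frame names in the usage example are placeholders.

```shell
# Sketch: reduce a GDB backtrace read from stdin to the function names of its
# first N frame lines, joined into a one-line signature.
crash_signature() {
    awk -v max="${1:-5}" '
        /^#[0-9]+/ && n < max {
            # Frames look like "#1  0xADDR in func (...)" or "#0  func (...)"
            fn = ($3 == "in") ? $4 : $2
            sig = (n++ ? sig " < " : "") fn
        }
        END { print sig }'
}

# Usage (bt.txt holds saved output of the GDB "bt" command):
#   crash_signature 3 < bt.txt
#   e.g. frames process_event, decode_event, main give
#   "process_event < decode_event < main"
```

With saved output from thread apply all bt, the function only sees the first frames in the file, so feed it a single thread's trace.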

Existing Bug Reports

Review issue reports for:

  • Similar stack traces
  • Similar workloads
  • Similar deployment architectures
  • Matching error messages

Pay attention to comments from maintainers because they often contain workarounds or temporary mitigations.

Fixed Versions

If a bug has already been fixed, upgrading may be the fastest resolution.

Before upgrading:

  1. Verify the bug matches your symptoms.
  2. Review release notes carefully.
  3. Confirm upgrade compatibility.
  4. Test in a non-production environment whenever possible.

Related reading:

How to Upgrade a Wazuh Agent

Many organizations spend days troubleshooting issues that have already been resolved in newer releases.

Checking known bugs early in the investigation process can save significant time and effort while reducing future stability risks.


Step 9: Apply Corrective Actions

After identifying the likely root cause of the crash, the next step is implementing corrective actions that permanently eliminate the issue.

Avoid the temptation to simply restart the manager and move on.

A successful troubleshooting effort should not only restore service but also reduce the likelihood of future crashes.

Upgrade to a Stable Release

If your investigation points to a known software defect, upgrading to a newer stable release is often the most effective solution.

Many Wazuh Manager core dumps are eventually traced back to bugs that have already been fixed by the development team.

Before upgrading:

  • Review release notes
  • Verify compatibility requirements
  • Back up critical configurations
  • Test upgrades in a staging environment
  • Validate custom rules and integrations

Pay particular attention to fixes involving:

  • analysisd crashes
  • memory leaks
  • database communication failures
  • module instability
  • agent communication issues

Fix Resource Bottlenecks

Resource exhaustion is one of the most common causes of manager instability.

If memory, CPU, disk, or file descriptor limitations contributed to the crash, address them before returning the system to production.

Common corrective actions include:

  • Increasing available RAM
  • Expanding swap space
  • Adding CPU resources
  • Increasing file descriptor limits
  • Expanding storage capacity
  • Reducing event ingestion rates

Organizations that proactively address infrastructure bottlenecks often eliminate recurring crash cycles without making any application-level changes.

Correct Configuration Errors

Configuration issues should be corrected immediately once identified.

Examples include:

  • Invalid XML syntax
  • Incorrect module settings
  • Broken integrations
  • Unsupported configuration options
  • Misconfigured cluster settings

After applying corrections:

  1. Validate the configuration.
  2. Restart affected services.
  3. Review startup logs.
  4. Monitor for recurring errors.

Repair Corrupted Files

Corrupted files frequently contribute to unexpected process failures.

Files that may require repair or replacement include:

  • Internal databases
  • Queue files
  • Configuration files
  • Index data
  • Integration artifacts

Potential indicators of corruption include:

  • Unexpected parsing failures
  • Repeated database errors
  • Invalid file format messages
  • Consistent crashes during startup

When corruption is suspected, restore affected files from a known-good backup whenever possible.

Remove Faulty Customizations

Customizations often introduce instability, especially after upgrades.

Examples include:

  • Custom scripts
  • Third-party integrations
  • Custom decoders
  • Custom rules
  • Modified startup procedures

Temporarily remove nonessential customizations and observe system behavior.

If crashes stop occurring, reintroduce customizations individually until the problematic component is identified.

Tune Event Processing Workloads

Large environments frequently overwhelm Wazuh through sheer event volume.

Potential tuning strategies include:

  • Filtering unnecessary logs
  • Reducing noisy event sources
  • Optimizing custom rules
  • Limiting excessive FIM activity
  • Adjusting scan schedules
  • Increasing processing capacity

The goal is ensuring that incoming workloads remain within the capacity of the manager infrastructure.


Preventing Future Wazuh Manager Core Dumps

While troubleshooting is important, prevention is even more valuable.

Organizations that implement proactive monitoring and maintenance practices experience significantly fewer stability incidents than those operating reactively.

The following best practices can dramatically reduce the likelihood of future core dumps.

Keep Wazuh Updated

Running outdated software increases exposure to:

  • Known bugs
  • Memory leaks
  • Stability defects
  • Security vulnerabilities
  • Compatibility issues

Establish a process for:

  • Reviewing release notes
  • Evaluating new versions
  • Testing upgrades
  • Deploying approved updates

Staying current with supported releases is one of the most effective ways to maintain platform stability.

Monitor Resource Consumption Proactively

Resource-related crashes rarely occur without warning.

Monitor key metrics such as:

  • Memory utilization
  • CPU usage
  • Queue depth
  • Disk capacity
  • File descriptor usage
  • Process restart frequency

Alerting on abnormal trends allows administrators to intervene before instability develops.

Validate Configuration Changes Before Deployment

Every configuration change carries risk.

Before deploying modifications:

  • Review syntax carefully
  • Validate XML structures
  • Test integrations
  • Verify dependencies
  • Document changes

A formal change validation process can eliminate many avoidable outages.

Test Custom Rules and Decoders in Staging

Custom content should never be deployed directly to production without testing.

A staging environment allows administrators to verify:

  • Rule behavior
  • Decoder accuracy
  • Performance impact
  • Compatibility with existing configurations

Many production incidents originate from untested customizations rather than defects in Wazuh itself.

Implement Log and Performance Monitoring

Effective monitoring provides early warning signs before crashes occur.

Track:

  • Service restarts
  • Error messages
  • Queue growth
  • Database communication failures
  • Memory allocation warnings
  • Agent connectivity issues

Monitoring platforms should generate alerts whenever abnormal behavior is detected.

Early detection of abnormal system behavior is critical for maintaining application reliability.

Establish Routine Health Checks

Periodic health reviews help identify hidden issues before they become critical.

A routine health check may include:

  • Reviewing logs
  • Verifying module status
  • Checking disk utilization
  • Examining memory trends
  • Confirming agent connectivity
  • Reviewing cluster health

Organizations that conduct regular health assessments often discover developing problems long before they cause outages.

Maintain Sufficient System Capacity

As deployments grow, infrastructure requirements increase.

Many Wazuh environments remain stable for months before suddenly experiencing crashes due to capacity constraints.

Review capacity regularly and plan for:

  • Additional agents
  • Higher event volumes
  • New integrations
  • Increased retention periods
  • Expanded security monitoring requirements

Maintaining adequate headroom helps prevent resource-related failures and improves overall reliability.


When to Escalate to Wazuh Support

Some crashes cannot be fully diagnosed internally.

If the root cause remains unclear after completing the troubleshooting process, escalation may be necessary.

Providing complete diagnostic information significantly improves the chances of a fast resolution.

Information to Collect Before Opening a Case

Support engineers can only work with the information provided.

Gather as much evidence as possible before opening a ticket or submitting a bug report.

Core Dump Files

Collect:

  • Original core dump files
  • coredumpctl output
  • Crash timestamps
  • Associated process names

These files often contain the most valuable diagnostic data.

Backtraces

Generate and save the output of the following GDB commands:

bt
thread apply all bt

Backtraces are frequently the first artifact requested by developers.
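Backtraces can be captured without an interactive session by running GDB in batch mode. This is a minimal sketch; save_backtrace is a hypothetical helper, and the binary and core file names in the usage line are examples only.

```shell
# Sketch: save the output of "bt" and "thread apply all bt" for a core file.
save_backtrace() {
    binary="$1"
    core="$2"
    out="${3:-backtrace.txt}"
    [ -r "$core" ] || { echo "core file not readable: $core" >&2; return 1; }
    # -batch runs the given commands and exits; -ex supplies each command
    gdb -batch -ex "bt" -ex "thread apply all bt" "$binary" "$core" > "$out" 2>&1
}

# Usage:
#   save_backtrace /var/ossec/bin/wazuh-analysisd core.12345 analysisd-bt.txt
```

Traces are far more useful when debug symbols for the crashed binary are installed; without them, many frames appear as "??".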

Wazuh Logs

Include:

/var/ossec/logs/ossec.log

particularly entries immediately preceding the crash.

Capture:

  • Error messages
  • Warning messages
  • Service restart events
  • Database communication failures

Version Information

Document:

  • Wazuh Manager version
  • Wazuh Agent versions
  • Dashboard version
  • Indexer version
  • Operating system version
  • Kernel version

Version details often help identify known bugs quickly.

System Specifications

Provide:

  • CPU count
  • Available RAM
  • Storage configuration
  • Number of agents
  • Daily event volume
  • Cluster architecture

Environmental information helps support engineers reproduce conditions associated with the crash.

Creating a Useful Support Request

A well-prepared support request can reduce troubleshooting time from days to hours.

Diagnostic Information Checklist

Before submitting a case, ensure you have collected:

  • Core dump files
  • Stack traces
  • Wazuh logs
  • System logs
  • Version information
  • Resource utilization data
  • Configuration changes made before the crash
  • Relevant screenshots or error messages

The more evidence provided, the faster engineers can isolate the root cause.

Reproduction Details

One of the most valuable pieces of information is whether the crash can be reproduced consistently.

Document:

  • What happened before the crash
  • Which component failed
  • How frequently it occurs
  • Whether specific logs trigger the failure
  • Whether certain integrations are involved
  • Whether the crash appeared after an upgrade or configuration change

Providing clear reproduction steps dramatically increases the likelihood that developers can identify and fix the underlying problem.

By combining detailed diagnostics, core dump analysis, resource validation, configuration reviews, and proactive monitoring, most Wazuh Manager core dumps can be resolved systematically.

The key is treating every core dump as an opportunity to identify and eliminate the underlying cause rather than simply restoring service and waiting for the next crash.


Frequently Asked Questions (FAQ)

 

Question: What causes Wazuh Manager core dumps?

Wazuh Manager core dumps can be triggered by a wide range of issues, including:

  • Memory exhaustion
  • Memory leaks
  • Software defects
  • Corrupted log data
  • Database communication failures
  • Faulty custom rules or decoders
  • Third-party integration problems
  • Disk and filesystem issues
  • Operating system dependency conflicts

The core dump itself is not the root cause. It is evidence that a process terminated unexpectedly.

Identifying the underlying trigger requires reviewing logs, analyzing the dump, and examining system health.

Question: Where are Wazuh core dump files stored?

The location depends on your Linux distribution and core dump configuration.

Common locations include:

/var/lib/systemd/coredump/
/var/crash/
/tmp/

Some systems use custom storage paths defined by the kernel's core_pattern setting, which you can view with:

cat /proc/sys/kernel/core_pattern

If you’re unsure where dumps are being stored, use:

find / -type f \( -name "core" -o -name "core.*" \) 2>/dev/null

or:

coredumpctl list

to locate them.
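The core_pattern value itself indicates where dumps go: a leading pipe character means the kernel hands the dump to a helper program, while anything else is a file name template. The sketch below interprets the value on that basis; describe_core_pattern is a hypothetical helper.

```shell
# Sketch: explain where the kernel sends core dumps, based on core_pattern.
describe_core_pattern() {
    pattern="${1:-$(cat /proc/sys/kernel/core_pattern)}"
    case "$pattern" in
        \|*systemd-coredump*)
            printf '%s\n' "piped to systemd-coredump (use coredumpctl)" ;;
        \|*)
            printf '%s\n' "piped to a helper program: $pattern" ;;
        /*)
            printf '%s\n' "written to an absolute path: $pattern" ;;
        *)
            printf '%s\n' "written relative to the crashed process's working directory: $pattern" ;;
    esac
}

# Usage: describe_core_pattern
```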

Question: How do I know which Wazuh process crashed?

Several methods can help identify the affected process.

Start by reviewing:

coredumpctl info

You can also inspect:

  • systemd logs
  • kernel logs
  • ossec.log
  • core dump filenames

Common Wazuh processes that generate core dumps include:

  • wazuh-analysisd
  • wazuh-remoted
  • wazuh-db
  • wazuh-modulesd
  • wazuh-authd

Identifying the crashed process is one of the most important steps because it narrows the investigation to a specific subsystem.

Question: Can a core dump cause data loss?

A core dump itself does not cause data loss.

However, the crash that generated the dump can interrupt:

  • Event processing
  • Alert generation
  • Agent communication
  • Database operations
  • Log collection

Depending on the duration of the outage, some security events may be delayed, missed, or lost entirely.

This is why recurring crashes should be treated as high-priority operational incidents.

Question: How do I analyze a Wazuh core dump using GDB?

Install GDB and load the dump file:

gdb /var/ossec/bin/wazuh-analysisd core.12345

For systemd-managed dumps:

coredumpctl gdb <PID>

After loading the dump, generate a stack trace using:

bt

or:

thread apply all bt

The resulting backtrace shows the function calls that occurred before the crash and is often the most valuable artifact during troubleshooting.

Question: Are core dumps always caused by software bugs?

No.

While software defects can certainly generate core dumps, many crashes are caused by environmental problems such as:

  • Insufficient memory
  • High CPU utilization
  • Full disks
  • Corrupted databases
  • Invalid configurations
  • Third-party integrations
  • Dependency conflicts

In production environments, resource-related issues are often just as common as software bugs.

Question: Should I delete core dump files after analysis?

Yes, in most cases.

Core dump files can consume significant disk space, especially when large processes crash.

However, do not delete them until:

  1. Analysis has been completed.
  2. Backtraces have been collected.
  3. Required support artifacts have been archived.
  4. Any support cases have been opened.

Once the necessary information has been preserved, old dumps can usually be removed safely.
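Before removing anything, it is safer to list the candidates first. A minimal sketch, assuming dump file names start with "core"; old_core_dumps is a hypothetical helper, and the 30-day default is arbitrary.

```shell
# Sketch: list core files older than N days so they can be reviewed before removal.
old_core_dumps() {
    dir="$1"
    days="${2:-30}"
    find "$dir" -type f -name 'core*' -mtime "+$days" -print
}

# Usage:
#   old_core_dumps /var/lib/systemd/coredump 30
# After archiving anything still needed, the same find expression can be run
# with -delete in place of -print.
```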

Question: Can insufficient memory trigger Wazuh Manager crashes?

Absolutely.

Memory shortages are one of the most common causes of Wazuh instability.

When available memory becomes limited, Wazuh components may experience:

  • Allocation failures
  • Queue growth
  • Performance degradation
  • Process termination
  • Core dump generation

Administrators should regularly monitor:

free -h

and overall memory consumption trends to identify problems before they cause outages.

Question: How can I prevent recurring core dumps?

The most effective prevention strategies include:

  • Keeping Wazuh updated
  • Monitoring resource utilization
  • Testing configuration changes before deployment
  • Validating custom rules and decoders
  • Reviewing logs regularly
  • Performing routine health checks
  • Maintaining adequate system capacity

Proactive maintenance is significantly more effective than reacting to crashes after they occur.

Question: When should I contact Wazuh support?

Consider contacting support or opening a bug report when:

  • The root cause remains unclear after troubleshooting
  • Crashes continue after corrective actions
  • The backtrace points to a possible software defect
  • Multiple manager components are crashing
  • Core dumps appear immediately after upgrades
  • You suspect a previously unknown bug

Before escalating, gather:

  • Core dump files
  • GDB backtraces
  • Wazuh logs
  • System logs
  • Version information
  • System specifications

Providing complete diagnostic information dramatically improves the chances of a quick resolution.


Conclusion

Wazuh Manager core dumps are among the most serious indicators of instability within a Wazuh deployment.

While it may be tempting to simply restart the affected service and move on, doing so often leaves the underlying problem unresolved and increases the likelihood of future outages.

A systematic troubleshooting approach is far more effective.

The workflow outlined in this guide begins by confirming that a core dump actually occurred and locating the associated dump files. It then moves through reviewing the logs leading up to the crash, analyzing the dump with GDB, validating system resources, checking configurations, investigating database and module failures, and determining whether the issue matches a known software defect.

Throughout the investigation, the primary objective should be identifying the root cause rather than treating the symptoms.

Whether the issue stems from memory exhaustion, malformed log data, corrupted databases, faulty integrations, configuration errors, or a software bug, understanding why the process crashed is the key to preventing it from happening again.

Long-term stability depends on strong operational practices, including:

  • Keeping Wazuh updated with supported releases
  • Monitoring memory, CPU, disk, and queue utilization
  • Testing custom rules and decoders before deployment
  • Validating configuration changes carefully
  • Performing routine health checks
  • Maintaining sufficient infrastructure capacity as deployments grow

By combining proactive monitoring, disciplined change management, and thorough root-cause analysis, administrators can significantly reduce the frequency of Wazuh Manager crashes and maintain a more reliable, resilient, and effective security monitoring platform.
