Serverhacks: Diagnosing and Resolving Linux Out-Of-Memory (OOM) Issues


Welcome back to Serverhacks—a collection of tips, tricks, and troubleshooting guides for servers, networking, and system administration. I’m Corels from Emmanuel Corels Creatives, and today we’re tackling one of the most critical challenges for system administrators: Linux Out-Of-Memory (OOM) issues. When a server runs out of memory, processes can be killed abruptly, services may become unresponsive, and overall system stability can be compromised. In this guide, we’ll walk through a systematic approach to diagnose OOM issues, identify their causes, and implement solutions to stabilize your system.


Understanding OOM Issues

When your Linux server exhausts its available memory, the kernel’s OOM killer is triggered. This mechanism terminates processes to free up memory, choosing victims by a heuristic score (exposed per process in /proc/<PID>/oom_score) that tends to fall on the largest memory consumers. Common causes include:

  • Memory leaks in applications
  • Misconfigured services consuming excessive cache
  • Insufficient swap space
  • Unexpected load spikes

A methodical approach helps you pinpoint the root cause and address the underlying problem before it leads to system instability.
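
For a quick sense of which processes the kernel would sacrifice first, you can inspect the per-process scores it keeps under /proc. Here is a minimal sketch (the score files are standard, but the values vary by kernel and workload, and processes may exit mid-loop, hence the error suppression):

    # List the ten highest OOM scores along with the owning process names
    for p in /proc/[0-9]*; do
        printf '%s\t%s\n' "$(cat "$p/oom_score" 2>/dev/null)" "$(cat "$p/comm" 2>/dev/null)"
    done | sort -rn | head -n 10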


Step 1: Identify OOM Events in Logs

Start by confirming that your system has encountered OOM issues.

  • Check Kernel Logs:

    sudo dmesg | grep -i -E 'oom|killed process'
    

    This command searches for OOM-related messages in the kernel ring buffer. Look for lines indicating that a process was killed due to memory exhaustion.

  • Review System Logs:

    sudo journalctl -k | grep -i oom
    

    This provides detailed kernel log entries related to OOM events. Identifying the exact time and affected processes can help narrow down the culprit.
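
If you know roughly when the incident happened, timestamped and time-filtered views make it much easier to match the kill to a workload spike. A small sketch (dmesg -T needs a reasonably recent util-linux, and the “2 hours ago” window is just an example):

    # Kernel messages with human-readable timestamps
    sudo dmesg -T | grep -i -E 'out of memory|oom-killer'

    # Kernel journal entries from a specific time window
    sudo journalctl -k --since "2 hours ago" | grep -i oom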


Step 2: Monitor Memory Usage

Understanding your system’s memory consumption is crucial.

  • Real-Time Monitoring with top or htop:

    top -o %MEM
    

    or install and run:

    htop
    

    These tools show you which processes are consuming the most memory. Look for any runaway processes or unexpected spikes.

  • Check Free Memory and Swap:

    free -m
    

    This command displays memory and swap usage in megabytes. Note the values for total, used, free, and available memory. Low free memory and high swap usage are red flags.
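
A single snapshot can hide a slow climb, so it also helps to sample memory over time. A minimal sketch (the log path is just an example; write wherever your user has permission):

    # Memory, swap, and paging activity every 5 seconds, 12 samples
    vmstat 5 12

    # Append a timestamped used-memory reading to a log file
    echo "$(date '+%F %T') used: $(free -m | awk '/^Mem:/{print $3}')MB" >> ~/mem_usage.log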


Step 3: Analyze Process Memory Consumption

Drill down into specific processes to see if they are leaking memory or consuming more than expected.

  • Examine Detailed Process Information:

    ps aux --sort=-%mem | head -n 11
    

    This lists the ten most memory-hungry processes (the first output line is the column header). Investigate any process whose usage looks abnormally high.

  • Use pmap for Process Memory Maps: For a specific process ID (PID), run:

    sudo pmap -x <PID> | tail -n 1
    

    This shows the total memory used by the process. Consistent growth over time can indicate a memory leak.
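
If you suspect a leak in one particular service, sampling its resident set size over time is more telling than a one-off reading; steady growth that never plateaus is the classic symptom. A rough sketch (replace <PID> with the process ID you are watching):

    # Log the RSS (in KB) of one process every 60 seconds until it exits
    while ps -p <PID> > /dev/null; do
        echo "$(date '+%F %T') RSS: $(ps -o rss= -p <PID>) KB"
        sleep 60
    done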


Step 4: Check and Configure Swap Space

Swap space acts as an overflow for RAM. Insufficient swap can lead to OOM conditions.

  • View Swap Usage:

    free -m
    

    Check if swap is being used excessively or if it’s nearly full.

  • Add or Increase Swap: If needed, create a swap file:

    sudo fallocate -l 2G /swapfile
    sudo chmod 600 /swapfile
    sudo mkswap /swapfile
    sudo swapon /swapfile
    

    Then add it to /etc/fstab to make it persistent:

    /swapfile none swap sw 0 0
    

    Adjust the swap size based on your server’s needs.
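
While you are looking at swap, it is also worth checking vm.swappiness, which controls how eagerly the kernel swaps. It will not prevent OOM on its own, but a sensible value keeps RAM available for active processes. A minimal sketch (10 is a common choice for servers; tune it for your workload):

    # Check the current value (60 is the usual default)
    cat /proc/sys/vm/swappiness

    # Change it immediately (lost on reboot)
    sudo sysctl vm.swappiness=10

    # Persist the setting across reboots
    echo 'vm.swappiness=10' | sudo tee -a /etc/sysctl.conf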


Step 5: Optimize Application and Service Configurations

Sometimes, tuning application settings can prevent OOM conditions.

  • Configure PHP-FPM Memory Limits (if applicable): In your php.ini, set an appropriate memory limit:

    memory_limit = 256M
    

    Restart PHP-FPM (adjust the service name to match your installed PHP version):

    sudo systemctl restart php7.4-fpm
    
  • Optimize Java Applications: For Java-based applications, adjust the JVM heap size using the -Xms and -Xmx flags (a short example follows after this list).

  • Tune Caching Services: If you use caching mechanisms like Redis or memcached, ensure their memory limits are configured appropriately to avoid consuming all available memory.
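
To make the last two points concrete, here is a hedged sketch; the heap sizes, the application path, and the Redis limits are placeholders to adapt to your own workload:

    # Java: cap the heap so the JVM cannot outgrow the host
    java -Xms512m -Xmx1g -jar /path/to/your-app.jar

    # Redis (in redis.conf): cap memory and pick an eviction policy
    maxmemory 512mb
    maxmemory-policy allkeys-lru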


Step 6: Implement Resource Limits and Monitoring

To prevent runaway processes, you can set resource limits and employ proactive monitoring.

  • Set Per-User Resource Limits: Edit /etc/security/limits.conf to cap resources for users. Note that the rss item is ignored by modern Linux kernels, so use the address-space limit (as, in KB) instead:

    * soft as 1048576
    * hard as 2097152
    

    This caps each process’s virtual address space at 1 GB (soft) and 2 GB (hard). Be careful with applications that reserve large mappings (the JVM, for example), and for individual services prefer the systemd-based cap sketched at the end of this step.

  • Automate Monitoring with Scripts: Create a script that logs memory usage and alerts you when thresholds are exceeded. For example:

    #!/bin/bash
    # Current "used" memory in MB, read from the Mem: line of free -m
    MEM_USAGE=$(free -m | awk '/^Mem:/{print $3}')
    # Alert threshold in MB -- tune this to your server's total RAM
    THRESHOLD=800
    if [ "$MEM_USAGE" -gt "$THRESHOLD" ]; then
        # Requires a working mail setup (e.g. mailutils or postfix)
        echo "Warning: High memory usage detected: ${MEM_USAGE}MB" | mail -s "Memory Alert" admin@yourdomain.com
    fi
    

    Schedule this script with cron:

    crontab -e
    

    Add:

    */5 * * * * /path/to/memory_check.sh
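
For services managed by systemd, a cgroup-based memory cap is usually more dependable than limits.conf. This is a hedged sketch for a hypothetical myapp.service (requires a reasonably recent systemd; adjust the unit name and limits to your service):

    # Create a drop-in override with: sudo systemctl edit myapp.service
    # Add the following, then restart the service.
    [Service]
    MemoryHigh=768M
    MemoryMax=1G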
    

Final Thoughts

Diagnosing and resolving Linux OOM issues involves a comprehensive approach—reviewing logs, monitoring memory usage, analyzing process behavior, checking swap configuration, and optimizing application settings. By following these steps, you can identify the root causes of memory exhaustion and implement effective measures to keep your server stable.

Take your time to test each diagnostic step and adjust configurations based on your server’s specific workload. If you have any questions or need further assistance, feel free to reach out. Happy troubleshooting, and here’s to a smoothly running server environment!


Explained with clarity by
Corels – Admin, Emmanuel Corels Creatives


