Serverhacks: Diagnosing and Resolving Linux Out-Of-Memory (OOM) Issues


Welcome back to Serverhacks—a collection of tips, tricks, and troubleshooting guides for servers, networking, and system administration. I’m Corels from Emmanuel Corels Creatives, and today we’re tackling one of the most critical challenges for system administrators: Linux Out-Of-Memory (OOM) issues. When a server runs out of memory, processes can be killed abruptly, services may become unresponsive, and overall system stability can be compromised. In this guide, we’ll walk through a systematic approach to diagnose OOM issues, identify their causes, and implement solutions to stabilize your system.


Understanding OOM Issues

When your Linux server exhausts its available memory, the kernel’s OOM killer is triggered. This mechanism terminates processes to free up memory, choosing victims by a heuristic score (exposed per process in /proc/<PID>/oom_score) that tends to fall on the largest memory consumers. Common causes include:

  • Memory leaks in applications
  • Misconfigured services consuming excessive cache
  • Insufficient swap space
  • Unexpected load spikes

A methodical approach helps you pinpoint the root cause and address the underlying problem before it leads to system instability.
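
For a quick sense of which processes the kernel would sacrifice first, you can inspect the per-process scores it keeps under /proc. Here is a minimal sketch (the score files are standard, but the values vary by kernel and workload, and processes may exit mid-loop, hence the error suppression):

    # List the ten highest OOM scores along with the owning process names
    for p in /proc/[0-9]*; do
        printf '%s\t%s\n' "$(cat "$p/oom_score" 2>/dev/null)" "$(cat "$p/comm" 2>/dev/null)"
    done | sort -rn | head -n 10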


Step 1: Identify OOM Events in Logs

Start by confirming that your system has encountered OOM issues.

  • Check Kernel Logs:

    sudo dmesg | grep -i -E 'oom|killed process'
    

    This command searches for OOM-related messages in the kernel ring buffer. Look for lines indicating that a process was killed due to memory exhaustion.

  • Review System Logs:

    sudo journalctl -k | grep -i oom
    

    This provides detailed kernel log entries related to OOM events. Identifying the exact time and affected processes can help narrow down the culprit.
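
If you know roughly when the incident happened, timestamped and time-filtered views make it much easier to match the kill to a workload spike. A small sketch (dmesg -T needs a reasonably recent util-linux, and the “2 hours ago” window is just an example):

    # Kernel messages with human-readable timestamps
    sudo dmesg -T | grep -i -E 'out of memory|oom-killer'

    # Kernel journal entries from a specific time window
    sudo journalctl -k --since "2 hours ago" | grep -i oom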


Step 2: Monitor Memory Usage

Understanding your system’s memory consumption is crucial.

  • Real-Time Monitoring with top or htop:

    top -o %MEM
    

    or install and run:

    htop
    

    These tools show you which processes are consuming the most memory. Look for any runaway processes or unexpected spikes.

  • Check Free Memory and Swap:

    free -m
    

    This command displays memory and swap usage in megabytes. Note the values for total, used, free, and available memory. Low free memory and high swap usage are red flags.
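
A single snapshot can hide a slow climb, so it also helps to sample memory over time. A minimal sketch (the log path is just an example; write wherever your user has permission):

    # Memory, swap, and paging activity every 5 seconds, 12 samples
    vmstat 5 12

    # Append a timestamped used-memory reading to a log file
    echo "$(date '+%F %T') used: $(free -m | awk '/^Mem:/{print $3}')MB" >> ~/mem_usage.log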


Step 3: Analyze Process Memory Consumption

Drill down into specific processes to see if they are leaking memory or consuming more than expected.

  • Examine Detailed Process Information:

    ps aux --sort=-%mem | head -n 11
    

    This lists the ten most memory-hungry processes (the first output line is the column header). Investigate any process whose usage looks abnormally high.

  • Use pmap for Process Memory Maps: For a specific process ID (PID), run:

    sudo pmap -x <PID> | tail -n 1
    

    This shows the total memory used by the process. Consistent growth over time can indicate a memory leak.
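
If you suspect a leak in one particular service, sampling its resident set size over time is more telling than a one-off reading; steady growth that never plateaus is the classic symptom. A rough sketch (replace <PID> with the process ID you are watching):

    # Log the RSS (in KB) of one process every 60 seconds until it exits
    while ps -p <PID> > /dev/null; do
        echo "$(date '+%F %T') RSS: $(ps -o rss= -p <PID>) KB"
        sleep 60
    done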


Step 4: Check and Configure Swap Space

Swap space acts as an overflow for RAM. Insufficient swap can lead to OOM conditions.

  • View Swap Usage:

    free -m
    

    Check if swap is being used excessively or if it’s nearly full.

  • Add or Increase Swap: If needed, create a swap file:

    sudo fallocate -l 2G /swapfile
    sudo chmod 600 /swapfile
    sudo mkswap /swapfile
    sudo swapon /swapfile
    

    Then add it to /etc/fstab to make it persistent:

    /swapfile none swap sw 0 0
    

    Adjust the swap size based on your server’s needs.
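
While you are looking at swap, it is also worth checking vm.swappiness, which controls how eagerly the kernel swaps. It will not prevent OOM on its own, but a sensible value keeps RAM available for active processes. A minimal sketch (10 is a common choice for servers; tune it for your workload):

    # Check the current value (60 is the usual default)
    cat /proc/sys/vm/swappiness

    # Change it immediately (lost on reboot)
    sudo sysctl vm.swappiness=10

    # Persist the setting across reboots
    echo 'vm.swappiness=10' | sudo tee -a /etc/sysctl.conf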


Step 5: Optimize Application and Service Configurations

Sometimes, tuning application settings can prevent OOM conditions.

  • Configure PHP-FPM Memory Limits (if applicable): In your php.ini, set an appropriate memory limit:

    memory_limit = 256M
    

    Restart PHP-FPM (adjust the service name to match your installed PHP version):

    sudo systemctl restart php7.4-fpm
    
  • Optimize Java Applications: For Java-based applications, adjust the JVM heap size using the -Xms and -Xmx flags (a short example follows after this list).

  • Tune Caching Services: If you use caching mechanisms like Redis or memcached, ensure their memory limits are configured appropriately to avoid consuming all available memory.
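
To make the last two points concrete, here is a hedged sketch; the heap sizes, the application path, and the Redis limits are placeholders to adapt to your own workload:

    # Java: cap the heap so the JVM cannot outgrow the host
    java -Xms512m -Xmx1g -jar /path/to/your-app.jar

    # Redis (in redis.conf): cap memory and pick an eviction policy
    maxmemory 512mb
    maxmemory-policy allkeys-lru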


Step 6: Implement Resource Limits and Monitoring

To prevent runaway processes, you can set resource limits and employ proactive monitoring.

  • Set Per-User Resource Limits: Edit /etc/security/limits.conf to cap resources for users. Note that the rss item is ignored by modern Linux kernels, so use the address-space limit (as, in KB) instead:

    * soft as 1048576
    * hard as 2097152
    

    This caps each process’s virtual address space at 1 GB (soft) and 2 GB (hard). Be careful with applications that reserve large mappings (the JVM, for example), and for individual services prefer the systemd-based cap sketched at the end of this step.

  • Automate Monitoring with Scripts: Create a script that logs memory usage and alerts you when thresholds are exceeded. For example:

    #!/bin/bash
    # Current "used" memory in MB, read from the Mem: line of free -m
    MEM_USAGE=$(free -m | awk '/^Mem:/{print $3}')
    # Alert threshold in MB -- tune this to your server's total RAM
    THRESHOLD=800
    if [ "$MEM_USAGE" -gt "$THRESHOLD" ]; then
        # Requires a working mail setup (e.g. mailutils or postfix)
        echo "Warning: High memory usage detected: ${MEM_USAGE}MB" | mail -s "Memory Alert" admin@yourdomain.com
    fi
    

    Schedule this script with cron:

    crontab -e
    

    Add:

    */5 * * * * /path/to/memory_check.sh
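
For services managed by systemd, a cgroup-based memory cap is usually more dependable than limits.conf. This is a hedged sketch for a hypothetical myapp.service (requires a reasonably recent systemd; adjust the unit name and limits to your service):

    # Create a drop-in override with: sudo systemctl edit myapp.service
    # Add the following, then restart the service.
    [Service]
    MemoryHigh=768M
    MemoryMax=1G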
    

Final Thoughts

Diagnosing and resolving Linux OOM issues involves a comprehensive approach—reviewing logs, monitoring memory usage, analyzing process behavior, checking swap configuration, and optimizing application settings. By following these steps, you can identify the root causes of memory exhaustion and implement effective measures to keep your server stable.

Take your time to test each diagnostic step and adjust configurations based on your server’s specific workload. If you have any questions or need further assistance, feel free to reach out. Happy troubleshooting, and here’s to a smoothly running server environment!


Explained with clarity by
Corels – Admin, Emmanuel Corels Creatives


