Out of memory ошибка ubuntu

I had the exact problem. I bough a new MSI laptop and installed 64GB of ram and I got «out of memory» when I selected the first item on the ubuntu boot menu.

I tried countless combinations of failures including what was describe in other posts on this site and other sites.

I determined that the problem is related to the boot program being a 32 bit program which allows a total of 4GB of memory space for the boot program to use. My laptop as an i7-12800hx processor with integrated graphics and an nvidia-3070ti.

I tried the same install usb drive on a PC with no integrated video card on the CPU and it worked fine. The PC it work on had a thread-ripper CPU with no integrated graphics and a geforce GTX 1080 pci-express graphics card.

To get the laptop to boot and install ubuntu 22.04.1, I had to go into my bios (right ctrl-shift, left alt-f2 to get to the advanced bios) and switch primary display to the PEG (PCI Express Graphics (nvidia rtx 3070ti)) and disable anything to do with the intel integrated graphics on the chip. Then I had to hook up a monitor with a lower resolution (1680×1050 ) (through a thunderbolt docking station) to get it to work. I tried with a 4K monitor but it did not sync up properly. Doing all of this prevented the mapping of shared memory to the graphics controller into the lower 4GB of ram.

After I did all of that, the installation worked. I could boot from the USB boot menu and install ubuntu 22.04. After the installation was complete, I had to switch my graphics back to normal in the bios.

This fix worked great for me. The installation was basically flawless and it did not involve creating new ramfs file systems or modifying grub. No software changes at all. It was simply disabling the integrated graphics in my bios for the install.

By the way, after about 30 failed attempts, I went and installed fedora 36 to see if it was related to ubuntu. Fedora installed fine the first time. I just could not get the drivers that I needed to use nvenc in ffmpeg and still have the system boot after loading drivers. It failed so I went back to trying to find the solution using ubuntu.

Also, I did leave the secure-boot disabled from one of my many earlier attempts. You will probably need to disable secure-boot functionality in your bios also. (See other posts on this topic for that)

I hope this post helps you solve your installation problem.

dave hansen

Check the System for Swap Information

Before we begin, we will take a look at our operating system to see if we already have some swap space available. We can have multiple swap files or swap partitions, but generally one should be enough.

We can see if the system has any configured swap by typing:

sudo swapon -s

Filename                Type        Size    Used    Priority

If you only get back the header of the table, as I’ve shown above, you do not currently have any swap space enabled.

Another, more familiar way of checking for swap space is with the free utility, which shows us system memory usage. We can see our current memory and swap usage in Megabytes by typing:

free -m
             total       used       free     shared    buffers     cached
Mem:          3953        154       3799          0          8         83
-/+ buffers/cache:         62       3890
Swap:            0          0          0

As you can see above, our total swap space in the system is «0». This matches what we saw with the previous command.

Check Available Space on the Hard Drive Partition

The typical way of allocating space for swap is to use a separate partition devoted to the task. However, altering the partitioning scheme is not always possible. We can just as easily create a swap file that resides on an existing partition.

Before we do this, we should be aware of our current disk usage. We can get this information by typing:

df -h
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda         70G  5.3G   64G   4% /
none            4.0K     0  4.0K   0% /sys/fs/cgroup
udev            2.0G   12K  2.0G   1% /dev
tmpfs           396M  312K  396M   1% /run
none            5.0M     0  5.0M   0% /run/lock
none            2.0G     0  2.0G   0% /run/shm
none            100M     0  100M   0% /run/user

As you can see on the first line, our hard drive partition has 70 Gigabytes available, so we have a huge amount of space to work with.

Although there are many opinions about the appropriate size of a swap space, it really depends on your personal preferences and your application requirements. Generally, an amount equal to or double the amount of RAM on your system is a good starting point.

Since our system has 8 Gigabytes of RAM, so we will create a swap space of 8 Gigabytes to match my system’s RAM.

Create a Swap File

Now that we know our available hard drive space, we can go about creating a swap file within our filesystem.

We will create a file called swapfile in our root (/) directory. The file must allocate the amount of space we want for our swap file. There are two main methods of doing this:

The Slower Method

Traditionally, we would create a file with preallocated space by using the dd command. This versatile disk utility writes from one location to another location.

We can use this to write zeros to the file from a special device in Linux systems located at /dev/zero that just spits out as many zeros as requested.

We specify the file size by using a combination of bs for block size and count for the number of blocks. What we assign to each parameter is almost entirely arbitrary. What matters is what the product of multiplying them turns out to be.

For instance, in our example, we’re looking to create a 4 Gigabyte file. We can do this by specifying a block size of 1 Gigabyte and a count of 4:

sudo dd if=/dev/zero of=/swapfile bs=1G count=8
8+0 records in
8+0 records out
8589934592 bytes (8.3 GB) copied, 18.6227 s, 231 MB/s

Check your command before pressing ENTER because this has the potential to destroy data if you point the of (which stands for output file) to the wrong location.

We can see that 8 Gigabytes have been allocated by typing:

ls -lh /swapfile
-rw-r—r— 1 root root 8.0G Nov 22 10:08 /swapfile

If you’ve completed the command above, you may notice that it took quite a while. In fact, you can see in the output that it took my system 36 seconds to create the file. That is because it has to write 8 Gigabytes of zeros to the disk.

If you want to learn how to create the file faster, remove the file and follow along below:

sudo rm /swapfile

The Faster Method

The quicker way of getting the same file is by using the fallocate program. This command creates a file of a preallocated size instantly, without actually having to write dummy contents.

We can create a 8 Gigabyte file by typing:

sudo fallocate -l 8G /swapfile

The prompt will be returned to you almost immediately. We can verify that the correct amount of space was reserved by typing:

ls -lh /swapfile
-rw-r—r— 1 root root 8.0G Nov 22 10:10 /swapfile

As you can see, our file is created with the correct amount of space set aside.

Enabling the Swap File

Right now, our file is created, but our system does not know that this is supposed to be used for swap. We need to tell our system to format this file as swap and then enable it.

Before we do that though, we need to adjust the permissions on our file so that it isn’t readable by anyone besides root. Allowing other users to read or write to this file would be a huge security risk. We can lock down the permissions by typing:

sudo chmod 600 /swapfile

Verify that the file has the correct permissions by typing:

ls -lh /swapfile
-rw——- 1 root root 8.0G Nov 22 10:11 /swapfile

As you can see, only the columns for the root user have the read and write flags enabled.

Now that our file is more secure, we can tell our system to set up the swap space by typing:

sudo mkswap /swapfile
Setting up swapspace version 1, size = 8388600 KiB
no label, UUID=e3f2e7cf-b0a9-4cd4-b9ab-814b8a7d6933

Our file is now ready to be used as a swap space. We can enable this by typing:

sudo swapon /swapfile

We can verify that the procedure was successful by checking whether our system reports swap space now:

sudo swapon -s
Filename                Type        Size    Used    Priority
/swapfile               file        8388600 0       -1

We have a new swap file here. We can use the free utility again to corroborate our findings:

free -m
             total       used       free     shared    buffers     cached
Mem:          7906        202       7704          0          5         30
-/+ buffers/cache:         66       7446
Swap:         8190          0       8190

Our swap has been set up successfully and our operating system will begin to use it as necessary.

Make the Swap File Permanent

We have our swap file enabled, but when we reboot, the server will not automatically enable the file. We can change that though by modifying the fstab file.

Edit the file with root privileges in your text editor:

sudo nano /etc/fstab

At the bottom of the file, you need to add a line that will tell the operating system to automatically use the file you created:

/swapfile   none    swap    sw    0   0

Save and close the file when you are finished.

Swap Settings

There are a few options that you can configure that will have an impact on your system’s performance when dealing with swap.

The swappiness parameter configures how often your system swaps data out of RAM to the swap space. This is a value between 0 and 100 that represents a percentage.

With values close to zero, the kernel will not swap data to the disk unless absolutely necessary. Remember, interactions with the swap file are «expensive» in that they take a lot longer than interactions with RAM and they can cause a significant reduction in performance. Telling the system not to rely on the swap much will generally make your system faster.

Values that are closer to 100 will try to put more data into swap in an effort to keep more RAM space free. Depending on your applications’ memory profile or what you are using your server for, this might be better in some cases.

We can see the current swappiness value by typing:

cat /proc/sys/vm/swappiness
60

For a Desktop, a swappiness setting of 60 is not a bad value. For a Server, we’d probably want to move it closer to 0.

We can set the swappiness to a different value by using the sysctl command.

For instance, to set the swappiness to 10, we could type:

sudo sysctl vm.swappiness=10
vm.swappiness = 10

This setting will persist until the next reboot. We can set this value automatically at restart by adding the line to our /etc/sysctl.conf file:

sudo nano /etc/sysctl.conf

At the bottom, you can add:

vm.swappiness=10

Save and close the file when you are finished.

Another related value that you might want to modify is the vfs_cache_pressure. This setting configures how much the system will choose to cache inode and dentry information over other data.

Basically, this is access data about the filesystem. This is generally very costly to look up and very frequently requested, so it’s an excellent thing for your system to cache. You can see the current value by querying the proc filesystem again:

cat /proc/sys/vm/vfs_cache_pressure
100

As it is currently configured, our system removes inode information from the cache too quickly. We can set this to a more conservative setting like 50 by typing:

sudo sysctl vm.vfs_cache_pressure=50
vm.vfs_cache_pressure = 50

Again, this is only valid for our current session. We can change that by adding it to our configuration file like we did with our swappiness setting:

sudo nano /etc/sysctl.conf

At the bottom, add the line that specifies your new value:

vm.vfs_cache_pressure = 50

Save and close the file when you are finished.

Conclusion

If you are running into OOM (out of memory) errors, or if you find that your system is unable to use the applications you need, the best solution is to optimize your application configurations or upgrade your server. Configuring swap space, however, can give you more flexibility and can help buy you time on a less powerful server.

Время на прочтение
5 мин

Количество просмотров 47K

Когда в Linux сервер базы данных непредвиденно завершает работу, нужно найти причину. Причин может быть несколько. Например, SIGSEGV — сбой из-за бага в бэкенд-сервере. Но это редкость. Чаще всего просто заканчивается пространство на диске или память. Если закончилось пространство на диске, выход один — освободить место и перезапустить базу данных.

Out-Of-Memory Killer

Когда у сервера или процесса заканчивается память, Linux предлагает 2 пути решения: обрушить всю систему или завершить процесс (приложение), который съедает память. Лучше, конечно, завершить процесс и спасти ОС от аварийного завершения. В двух словах, Out-Of-Memory Killer — это процесс, который завершает приложение, чтобы спасти ядро от сбоя. Он жертвует приложением, чтобы сохранить работу ОС. Давайте сначала обсудим, как работает OOM и как его контролировать, а потом посмотрим, как OOM Killer решает, какое приложение завершить.

Одна из главных задач Linux — выделять память процессам, когда они ее просят. Обычно процесс или приложение запрашивают у ОС память, а сами используют ее не полностью. Если ОС будет выдавать память всем, кто ее просит, но не планирует использовать, очень скоро память закончится, и система откажет. Чтобы этого избежать, ОС резервирует память за процессом, но фактически не выдает ее. Память выделяется, только когда процесс действительно собирается ее использовать. Случается, что у ОС нет свободной памяти, но она закрепляет память за процессом, и когда процессу она нужна, ОС выделяет ее, если может. Минус в том, что иногда ОС резервирует память, но в нужный момент свободной памяти нет, и происходит сбой системы. OOM играет важную роль в этом сценарии и завершает процессы, чтобы уберечь ядро от паники. Когда принудительно завершается процесс PostgreSQL, в логе появляется сообщение:

Out of Memory: Killed process 12345 (postgres).

Если в системе мало памяти и освободить ее невозможно, вызывается функция out_of_memory. На этом этапе ей остается только одно — завершить один или несколько процессов. OOM-killer должен завершать процесс сразу или можно подождать? Очевидно, что, когда вызывается out_of_memory, это связано с ожиданием операции ввода-вывода или подкачкой страницы на диск. Поэтому OOM-killer должен сначала выполнить проверки и на их основе решить, что нужно завершить процесс. Если все приведенные ниже проверки дадут положительный результат, OOM завершит процесс.

Выбор процесса

Когда заканчивается память, вызывается функция out_of_memory(). В ней есть функция select_bad_process(), которая получает оценку от функции badness(). Под раздачу попадет самый «плохой» процесс. Функция badness() выбирает процесс по определенным правилам.

  1. Ядру нужен какой-то минимум памяти для себя.
  2. Нужно освободить много памяти.
  3. Не нужно завершать процессы, которые используют мало памяти.
  4. Нужно завершить минимум процессов.
  5. Сложные алгоритмы, которые повышают шансы на завершение для тех процессов, которые пользователь сам хочет завершить.

Выполнив все эти проверки, OOM изучает оценку (oom_score). OOM назначает oom_score каждому процессу, а потом умножает это значение на объем памяти. У процессов с большими значениями больше шансов стать жертвами OOM Killer. Процессы, связанные с привилегированным пользователем, имеют более низкую оценку и меньше шансов на принудительное завершение.

postgres=# SELECT pg_backend_pid();
pg_backend_pid 
----------------
    3813
(1 row)

Идентификатор процесса Postgres — 3813, поэтому в другой оболочке можно получить оценку, используя этот параметр ядра oom_score:

vagrant@vagrant:~$ sudo cat /proc/3813/oom_score
2

Если вы совсем не хотите, чтобы OOM-Killer завершил процесс, есть еще один параметр ядра: oom_score_adj. Добавьте большое отрицательное значение, чтобы снизить шансы на завершение дорогого вам процесса.

sudo echo -100 > /proc/3813/oom_score_adj

Чтобы задать значение oom_score_adj, установите OOMScoreAdjust в блоке сервиса:

[Service]
OOMScoreAdjust=-1000

Или используйте oomprotect в команде rcctl.

rcctl set <i>servicename</i> oomprotect -1000

Принудительное завершение процесса

Когда один или несколько процессов уже выбраны, OOM-Killer вызывает функцию oom_kill_task(). Эта функция отправляет процессу сигнал завершения. В случае нехватки памяти oom_kill() вызывает эту функцию, чтобы отправить процессу сигнал SIGKILL. В лог ядра записывается сообщение.

Out of Memory: Killed process [pid] [name].

Как контролировать OOM-Killer

В Linux можно включать и отключать OOM-Killer (хотя последнее не рекомендуется). Для включения и отключения используйте параметр vm.oom-kill. Чтобы включить OOM-Killer в среде выполнения, выполните команду sysctl.

sudo -s sysctl -w vm.oom-kill = 1

Чтобы отключить OOM-Killer, укажите значение 0 в этой же команде:

sudo -s sysctl -w vm.oom-kill = 0

Результат этой команды сохранится не навсегда, а только до первой перезагрузки. Если нужно больше постоянства, добавьте эту строку в файл /etc/sysctl.conf:

echo vm.oom-kill = 1 >>/etc/sysctl.conf

Еще один способ включения и отключения — написать переменную panic_on_oom. Значение всегда можно проверить в /proc.

$ cat /proc/sys/vm/panic_on_oom
0

Если установить значение 0, то когда закончится память, kernel panic не будет.

$ echo 0 > /proc/sys/vm/panic_on_oom

Если установить значение 1, то когда закончится память, случится kernel panic.

echo 1 > /proc/sys/vm/panic_on_oom

OOM-Killer можно не только включать и выключать. Мы уже говорили, что Linux может зарезервировать для процессов больше памяти, чем есть, но не выделять ее по факту, и этим поведением управляет параметр ядра Linux. За это отвечает переменная vm.overcommit_memory.

Для нее можно указывать следующие значения:

0: ядро само решает, стоит ли резервировать слишком много памяти. Это значение по умолчанию в большинстве версий Linux.
1: ядро всегда будет резервировать лишнюю память. Это рискованно, ведь память может закончиться, потому что, скорее всего, однажды процессы затребуют положенное.
2: ядро не будет резервировать больше памяти, чем указано в параметре overcommit_ratio.

В этом параметре вы указываете процент памяти, для которого допустимо избыточное резервирование. Если для него нет места, память не выделяется, в резервировании будет отказано. Это самый безопасный вариант, рекомендованный для PostgreSQL. На OOM-Killer влияет еще один элемент — возможность подкачки, которой управляет переменная cat /proc/sys/vm/swappiness. Эти значения указывают ядру, как обрабатывать подкачку страниц. Чем больше значение, тем меньше вероятности, что OOM завершит процесс, но из-за операций ввода-вывода это негативно сказывается на базе данных. И наоборот — чем меньше значение, тем выше вероятность вмешательства OOM-Killer, но и производительность базы данных тоже выше. Значение по умолчанию 60, но если вся база данных помещается в память, лучше установить значение 1.

Итоги

Пусть вас не пугает «киллер» в OOM-Killer. В данном случае киллер будет спасителем вашей системы. Он «убивает» самые нехорошие процессы и спасает систему от аварийного завершения. Чтобы не приходилось использовать OOM-Killer для завершения PostgreSQL, установите для vm.overcommit_memory значение 2. Это не гарантирует, что OOM-Killer не придется вмешиваться, но снизит вероятность принудительного завершения процесса PostgreSQL.

Ubuntu 12

The ‘Out of Memory’ error can be a common issue when installing Ubuntu 22.04 on a dual boot system. This error is often due to insufficient memory allocation during the installation process. This article will guide you through the steps to resolve this issue.

To fix the ‘Out of Memory’ error when installing Ubuntu 22.04 on dual boot, you can try disabling TPM, SGX, and Secure Boot in the UEFI/BIOS settings. If the issue persists, create a live USB using Rufus or consider installing Ubuntu 21.10 as a temporary workaround.

  1. Understanding the Error
  2. Pre-installation Checks
  3. Step-by-Step Guide to Fix the Error
    • Step 1: Disable TPM, SGX, and Secure Boot
    • Step 2: Create a Live USB Using Rufus
    • Step 3: Check for Known Bugs
    • Step 4: Consider Installing Ubuntu 21.10
  4. Conclusion

Understanding the Error

The ‘Out of Memory’ error typically occurs when the system runs out of available memory during the installation process. This could be due to a variety of reasons such as insufficient physical memory, a large number of applications running in the background, or incorrect memory allocation settings.

Pre-installation Checks

Before you start the installation process, it’s recommended to perform some checks:

  1. Update BIOS/UEFI: Ensure that your BIOS/UEFI is updated to the latest version. This can be done by visiting your manufacturer’s website and downloading the latest firmware update.
  2. Check System Requirements: Make sure your system meets the minimum requirements for Ubuntu 22.04. This includes at least 4GB of RAM and 25GB of free hard drive space.

Step-by-Step Guide to Fix the Error

Step 1: Disable TPM, SGX, and Secure Boot

Access your UEFI/BIOS menu by pressing the appropriate key during the system startup. The key varies depending on the manufacturer but is typically one of the function keys (F1, F2, F10, etc.), the DEL key, or the ESC key.

Once in the UEFI/BIOS menu, navigate to the security settings and disable the following features:

  • TPM (Trusted Platform Module)
  • SGX (Software Guard Extensions)
  • Secure Boot

These features can interfere with the installation of new operating systems, and disabling them can often resolve the ‘Out of Memory’ error.

Step 2: Create a Live USB Using Rufus

If you’re still experiencing issues, consider creating your live USB using a different tool. Rufus is a popular option that’s easy to use and reliable. You can download Rufus from the official website here.

Once Rufus is downloaded and installed, follow these steps:

  1. Insert your USB drive.
  2. Open Rufus and select your USB drive from the ‘Device’ dropdown.
  3. In the ‘Boot selection’ dropdown, select ‘Disk or ISO image’.
  4. Click the ‘Select’ button and navigate to your Ubuntu 22.04 ISO file.
  5. Click ‘Start’ to create your live USB.

Step 3: Check for Known Bugs

Check if there is a known bug with Ubuntu 22.04 that might be causing the issue. You can refer to the Ubuntu bug tracker here for more information.

Step 4: Consider Installing Ubuntu 21.10

As a last resort, consider installing Ubuntu 21.10 instead of 22.04. This can be a temporary workaround until the issue with 22.04 is resolved.

Conclusion

Remember to always backup your important data before making any changes to your system or attempting to install a new operating system. While the ‘Out of Memory’ error can be frustrating, it’s often a simple fix. By following the steps outlined in this guide, you should be able to successfully install Ubuntu 22.04 on your dual boot system.

If you encounter the ‘Out of Memory’ error during the Ubuntu 22.04 installation, you can try disabling TPM, SGX, and Secure Boot in your UEFI/BIOS menu. This can often resolve the issue by removing any interference with the installation process.

To check if your system meets the minimum requirements for Ubuntu 22.04, ensure that you have at least 4GB of RAM and 25GB of free hard drive space. You can check your system specifications by going to the «System Settings» or «About» section in your operating system.

Yes, using a different tool to create a live USB can sometimes help resolve the ‘Out of Memory’ error. Rufus is a popular option that you can try using to create your live USB. It is easy to use and reliable, and you can download it from the official website.

You can find information about known bugs with Ubuntu 22.04 by visiting the Ubuntu bug tracker. The bug tracker provides a platform for users to report and track bugs, and you can search for any known bugs related to the ‘Out of Memory’ error on the Ubuntu bug tracker website.

If you are unable to resolve the ‘Out of Memory’ error in Ubuntu 22.04, you can consider installing Ubuntu 21.10 as a temporary workaround. While it may not have the latest features and updates of 22.04, it can be a viable option until the issue with 22.04 is resolved.

Out-of-memory (OOM) errors take place when the Linux kernel can’t provide enough memory to run all of its user-space processes, causing at least one process to exit without warning. Without a comprehensive monitoring solution, OOM errors can be tricky to diagnose.

In this post, you will learn how to diagnose OOM errors in Linux kernels by:

  • Analyzing different types of OOM error logs
  • Choosing the most revealing metrics to explain low-memory situations on your hosts
  • Using a profiler to understand memory-heavy processes
  • Setting up automated alerts to troubleshoot OOM error messages more easily

Identify the error message

OOM error logs are normally available in your host’s syslog (in the file /var/log/syslog). In a dynamic environment with a large number of ephemeral hosts, it’s not realistic to comb through system logs manually—you should forward your logs to a monitoring platform for search and analysis. This way, you can configure your monitoring platform to parse these logs so you can query them and set automated alerts. Your monitoring platform should enrich your logs with metadata, including the host and application that produced them, so you can localize issues for further troubleshooting.

There are two major types of OOM error, and you should be prepared to identify each of these when diagnosing OOM issues:

  • Error messages from user-space processes that handle OOM errors themselves
  • Error messages from the kernel-space OOM Killer

Error messages from user-space processes

User-space processes receive access to system memory by making requests to the kernel, which returns a set of memory addresses (virtual memory) that the kernel will later assign to pages in physical RAM. When a user-space process first requests a virtual memory mapping, the kernel usually grants the request regardless of how many free pages are available. The kernel only allocates free pages to that mapping when it attempts to access memory with no corresponding page in RAM.

When an application fails to obtain a virtual memory mapping from the kernel, it will often handle the OOM error itself, emit a log message, then exit. If you know that certain hosts will be dedicated to memory-intensive processes, you should determine in advance what OOM logs these processes output, then set up alerts on these logs. Consider running game days to see what logs your system generates when it runs out of memory, and consult the documentation or source of your critical applications to ensure that your log management system can ingest and parse OOM logs.

The information you can obtain from error logs differs by application. For example, if a Go program attempts to request more memory than is available on the system, it will print a log that resembles the following, print a stack trace, then exit.

fatal error: runtime: out of memory

In this case, the log prints a detailed stack trace for each goroutine running at the time of the error, enabling you to figure out what the process was attempting to do before exiting. In this stack trace, we can see that our demo application was requesting memory while calling the *ImageProcessor.IdentifyImage() method.

goroutine 1 [running]:
runtime.systemstack_switch()
/usr/local/go/src/runtime/asm_amd64.s:330 fp=0xc0000b9d10 sp=0xc0000b9d08 pc=0x461570
runtime.mallocgc(0x278f3774, 0x695400, 0x1, 0xc00007e070)
/usr/local/go/src/runtime/malloc.go:1046 +0x895 fp=0xc0000b9db0 sp=0xc0000b9d10 pc=0x40c575
runtime.makeslice(0x695400, 0x278f3774, 0x278f3774, 0x38)
/usr/local/go/src/runtime/slice.go:49 +0x6c fp=0xc0000b9de0 sp=0xc0000b9db0 pc=0x44a9ec
demo_app/imageproc.(*ImageProcessor).IdentifyImage(0xc00000c320, 0x278f3774, 0xc0278f3774)
demo_app/imageproc/imageproc.go:36 +0xb5 fp=0xc0000b9e38 sp=0xc0000b9de0 pc=0x5163f5
demo_app/imageproc.(*ImageProcessor).IdentifyImage-fm(0x278f3774, 0x36710769)
demo_app/imageproc/imageproc.go:34 +0x34 fp=0xc0000b9e60 sp=0xc0000b9e38 pc=0x5168b4
demo_app/imageproc.(*ImageProcessor).Activate(0xc00000c320, 0x36710769, 0xc000064f68, 0x1)
demo_app/imageproc/imageproc.go:88 +0x169 fp=0xc0000b9ee8 sp=0xc0000b9e60 pc=0x516779
main.main()
demo_app/main.go:39 +0x270 fp=0xc0000b9f88 sp=0xc0000b9ee8 pc=0x66cd50
runtime.main()
/usr/local/go/src/runtime/proc.go:203 +0x212 fp=0xc0000b9fe0 sp=0xc0000b9f88 pc=0x435e72
runtime.goexit()
/usr/local/go/src/runtime/asm_amd64.s:1373 +0x1 fp=0xc0000b9fe8 sp=0xc0000b9fe0 pc=0x463501

Since this behavior is baked into the Go runtime, Go-based infrastructure tools like Consul and Docker will output similar messages in low-memory conditions.

In addition to error messages from user-space processes, you’ll want to watch for OOM messages produced by the Linux kernel’s OOM Killer.

Error messages from the OOM Killer

If a Linux machine is seriously low on memory, the kernel invokes the OOM Killer to terminate a process. As with user-space OOM error logs, you should treat OOM Killer logs as indications of overall memory saturation.

How the OOM Killer works

To understand when the kernel produces OOM errors, it helps to know how the OOM Killer works. The OOM Killer terminates a process using heuristics. It assigns each process on your system an OOM score between 0 and 1000, based on its memory consumption as well as a user-configurable adjustment score. It then terminates the process with the highest score. This means that, by default, the OOM Killer may end up killing processes you don’t expect. (You will see different behavior if you configure the kernel to panic on OOM without invoking the OOM Killer, or to always kill the task that invoked the OOM Killer instead of assigning OOM scores.)

The kernel invokes the OOM Killer when it tries—but fails—to allocate free pages. When the kernel fails to retrieve a page from any memory zone in the system, it attempts to obtain free pages by other means, including memory compaction, direct reclaim, and searching again for free pages in case the OOM Killer had terminated a process during the initial search. If no free pages are available, the kernel triggers the OOM Killer. In other words, the kernel does not “see” that there are too few pages to satisfy a memory mapping until it is too late.

OOM Killer logs

You can use the OOM Killer logs both to identify which hosts in your system have run out of memory and to get detailed information on how much memory different processes were using at the time of the error. You can find an annotated example in the git commit that added this logging information to the Linux kernel. For example, the OOM Killer logs provide data on the system’s memory conditions at the time of the error.

Mem-Info:
active_anon:895388 inactive_anon:43 isolated_anon:0
active_file:13 inactive_file:9 isolated_file:1
unevictable:0 dirty:0 writeback:0 unstable:0
slab_reclaimable:4352 slab_unreclaimable:7352
mapped:4 shmem:226 pagetables:3101 bounce:0
free:21196 free_pcp:150 free_cma:0

As you can see, the system had 21,196 pages of free memory when it threw the OOM error. With each page on the system holding up to 4 kB of memory, we can assume that the kernel needed to allocate at least 84.784 MB of physical memory to meet the requirements of currently running processes.

Understand the problem

If your monitoring platform has applied the appropriate metadata to your logs, you should be able to tell from a wave of OOM error messages which hosts are facing critically low memory conditions. The next step is to understand how frequently your hosts reach these conditions—and when memory utilization has spiked anomalously—by monitoring system memory metrics. In this section, we’ll review some key memory metrics and how to get more context around OOM errors.

Choose the right metrics

Datadog's Live Process view

As we’ve seen, it’s difficult for the Linux kernel to determine when low-memory conditions will produce OOM errors. To make matters worse, the kernel is notoriously imprecise at measuring its own memory utilization, as a process can be allocated virtual memory but not actually use the physical RAM that those addresses map to. Since absolute measures of memory utilization can be unreliable, you should use another approach: determine what levels of memory utilization correlate with OOM errors in your hosts and what levels indicate a healthy baseline (e.g. by running game days).

When monitoring overall memory utilization in relation to a healthy baseline, you should track both how much virtual memory the kernel has mapped for your processes and how much physical memory your processes are using (called the resident set size). If you’ve configured your system to use a significant amount of swap space—space on disk where the kernel stores inactive pages—you can also track this in order to monitor total memory utilization across your system.

Determine the scope

Your monitoring platform should tag your memory metrics with useful metadata about where those metrics came from—or at least the name of the associated host and process. The host tag is useful for determining if any one host is consuming an unusual amount of memory, whether in comparison to other hosts or to its baseline memory consumption.

You should also be able to group and filter your timeseries data either by process within a single host, or across all hosts in a cluster. This will help you determine whether any one process is using an unusual amount of memory. You’ll want to know which processes are running on each system and how long they’ve been running—in short, how recently your system has demanded an unsustainable amount of memory.

Find aggravating factors

A dashboard showing a web server process's memory consumption alongside its request count

If certain processes seem unusually memory-intensive, you’ll want to investigate them further by tracking metrics for other parts of your system that may be contributing to the issue.

For processes with garbage-collected runtimes, you can investigate garbage collection as one source of higher-than-usual memory utilization. On a timeseries graph of heap memory utilization for a single process, garbage collection forms a sawtooth pattern (e.g. the JVM)—if the sawtooth does not return to a steady baseline, you likely have a memory leak. To see if your process’s runtime is garbage collecting as expected, graph the count of garbage collection events alongside heap memory usage.

Alternatively, you can search your logs for messages accompanying the “cliffs” of the sawtooth pattern. If you run a Go program with the GODEBUG environment variable assigned to gctrace=1, for example, the Go runtime will output a log every time it runs a garbage collection. (The screenshot below shows a graph of garbage collection log frequency over time, plus a typical list of garbage collection logs.)

Datadog's Log Analytics view showing Golang garbage collection logs.

Another factor has to do with the work that a process is performing. If an application is managing memory in a healthy way (i.e. without memory leaks) but still using more than expected, the application may be handling unusual levels of work. You’ll want to graph work metrics for memory-heavy applications, such as request rates for a web application, query throughput for a database, and so on.

Find unnecessary allocations

A Datadog memory profile.

If a process doesn’t seem to be subject to any aggravating factors, it’s likely that the process is requesting more memory than you anticipate. One tool that helps you identify these memory requests is a memory profile.

Memory profiles visualize both the order of function calls within a call stack and how much heap memory each function call allocates. Using a memory profiler, you can quickly determine whether a given call is particularly memory intensive. And by examining the call’s parents and children, you can determine why a heavy allocation is taking place. For example, a profile could include a memory-intensive code path introduced by a recent feature release, suggesting that you should optimize the new code for memory utilization.

Get notified on OOM errors

An alert based on OOM error logs.

The most direct way to find out about OOM errors is to set alerts on OOM log messages: whenever your system detects a certain number of OOM errors in a particular interval, your monitoring platform can alert your team. But to prevent OOM errors from impacting your system, you’ll want to be notified before OOM errors begin to terminate processes. Here again, knowing the healthy baseline usage of virtual memory and RSS utilization allows you to set alerts when memory utilization approaches unhealthy levels. Ideally, you should be using a monitoring platform that can forecast resource usage and alert your team based on expected trends, and flag anomalies in system metrics automatically.

Investigate OOM errors in a single platform

Since the Linux kernel provides an imprecise view of its own memory usage and relies on page allocation failures to raise OOM errors, you need to monitor a combination of OOM error logs, memory utilization metrics, and memory profiles.

A monitoring service like Datadog unifies all of these sources of data in a single platform, so you get comprehensive visibility into your system’s memory usage—whether at the level of the process or across every host in your environment. To stay on top of memory saturation issues, you can also set up alerts to automatically notify you about OOM logs or projected memory usage. By getting notified when OOM errors are likely, you’ll know which parts of your system you need to investigate to prevent application downtime. If you’re curious about using Datadog to monitor your infrastructure and applications, sign up for a free trial.

About Datadog

Datadog is a SaaS-based monitoring and analytics platform for cloud-scale infrastructure, applications, logs, and more. Datadog delivers complete visibility into the performance of modern applications in one place through its fully unified platform. By reducing the number of tools needed to troubleshoot performance problems, Datadog reduces the time needed to detect and resolve issues. With vendor-backed integrations for 400+ technologies, a world-class enterprise customer base, and focus on ease of use, Datadog is the leading choice for infrastructure and software engineering teams looking to improve cross-team collaboration, accelerate development cycles, and reduce operational and development costs.

Tags: devops, linux, memory, sysadmins

Понравилась статья? Поделить с друзьями:
  • Outlook ошибка 421
  • Out of memory ошибка mathcad
  • Outlook ошибка 403
  • Outlook почистить ошибки синхронизации
  • Out of memory ошибка pubg