Thursday, December 13, 2007

Case Study: Linux Memory Management

PageAllocation
The Linux page allocator, from mm/page_alloc.c, is the main memory allocation mechanism in the Linux kernel. It has to deal with allocations from many parts of the Linux kernel, under many different circumstances. Consequently the Linux page allocator is fairly complex, and easiest to understand in the context of its environment. Because of this, this wiki article begins with an explanation of exactly what the page allocator needs to do, before going into the details of how things are done.

I am writing this article bit by bit whenever I feel like it. If you feel like writing something, go right ahead - RikvanRiel


Contents

memory allocators
gfp mask
page allocation order
alloc_pages
_alloc_pages
buddy allocator
per-cpu page queues
hot/cold pages
NUMA tradeoffs

memory allocators
Various different parts of the Linux kernel allocate memory, under different circumstances. Most memory allocations happen on behalf of userspace programs; these allocations can use any memory in the system (highmem, zone_normal and dma) and, if free memory is low, can wait for memory to be freed by the pageout code. Page cache and page table allocations can also use any memory in the system and can wait for memory to be freed.

Most kernel level allocations are different and can only use memory that is directly mapped into kernel address space (zone_normal and dma). Most, though not all, kernel level allocations can wait for memory to be freed, if free memory is low.

Allocations from interrupt context are different. They can not wait for memory to be freed, so if free memory on the system is low at the time of the allocation, the allocation will simply fail.

PageReplacementDesign
This page describes a new page replacement design by Rik van Riel, Lee Schermerhorn and others.

If you spot any conceptual problems or oversights in this design, or have questions on the details, please email or IRC RikvanRiel. Once nobody can find holes, the concepts will probably work


The problem
In brief, the current page replacement mechanism in Linux (up to and including 2.6.24) has two problems:

Sometimes the kernel evicts the wrong pages, resulting in bad performance.
The kernel scans over pages that should not be evicted. This results in increased CPU use on large memory systems (several GB), but results in catastrophic CPU use and lock contention on huge systems (>128GB of RAM).

More details on the problem space can be found on ProblemWorkloads and page replacement requirements.


High level overview
Evicts pages with minimal scanning. Systems with 1TB RAM exist, the VM cannot afford to scan pages that should not be evicted.
Most page churn comes from reading large files. Evict those pages first.
Evict other pages when we do not have enough memory for readahead or we keep reading in the same file data over and over again.
Filesystem IO is much more efficient than swap IO. Only evict file cache if we can.
Use reference and refault data to balance filesystem cache and anonymous (process) memory.
Optimize the normal case; also optimize the worst case. The VM must be robust at all times, so the worst case has to work well too.

Wednesday, November 21, 2007

Two Reasons Why Regional Bank decided to buy six server computer than one supercomputer:

  1. it can have at least backup for their important files when the other computers is being errored.
  2. it can have other computers running while the other needs maintenance.

Operating System News

Review: Mac OS X Shines In Comparison With Windows Vista

Amid the hype surrounding the release of Windows Vista, Mac users are taking solace from the fact that OS X is still a champ on many fronts. Here are some reasons our reviewer John C. Welch opts for Apple.
By John C. Welch InformationWeek January 6, 2007 08:00 AM
OS X vs. Vista
Image Gallery
If you believe all the hype, installing the new Windows Vista Windows, there's a lot of it that is, quite frankly, just Microsoft (NSDQ: MSFT) making up for lost time. The last non-server release of Windows was in 2001 with Windows XP, with only a single major interim update in Service Pack 2. In the same time, Apple has been steadily releasing updates to Mac OS X on what was a yearly schedule, now around every 18 months.
This means that while Mac OS X has been steadily evolving through 10.0, 10.1, 10.2, 10.3, and 10.4, and is now working toward 10.5, Microsoft was waiting on what would become Vista. When it was obvious the original Longhorn OS wasn't going to happen, it took the Windows Server 2003 code base and used that for the basis of Vista. It also chopped quite a few features out of Vista, most notably the WinFS object-based data storage and management system, which had been promised in various forms since the first blurbs about Cairo in the early 1990s.
Microsoft had two serious issues. First, it had to make this update of Windows revolutionary enough that it came close to justifying the delay. Second, it had to come up with something that would stand up well with its main competitor in the desktop OS market, Mac OS X. Has it succeeded at both? I'd argue that the former's almost a nonissue: Vista will sell well, because the world won't have a choice. As far as the latter, well, probably, but you'd be hard-pressed to say Vista's better than Mac OS X.
In a nutshell, Vista vs. Mac OS X is Revolution vs. Evolution. It's about a massive, long-delayed upgrade that has to account for almost six years of progress by its competitors, versus a well-executed strategy of regular updates. While updating an operating system is never something that can be called easy, Apple's strategy has been the better one for keeping its OS on top of things, something Microsoft has admitted to in a roundabout way.

Source: http://www.informationweek.com/