XV6 Lab2

Memory Management

Introduction

Memory management consists of two parts:

  • A physical memory allocator for the kernel, which the kernel uses to request memory and later release it. The allocator operates in units of 4096 bytes, called pages. A data structure named PageInfo (defined in inc/memlayout.h) is provided for tracking which parts of memory are free or allocated, as well as how many processes are sharing each allocated page.

  • Virtual memory, which maps the virtual addresses used by kernel and user software to addresses in physical memory. The MMU performs the mapping by consulting page tables. Our job is to set up page tables as needed.

Before we get started, we need to switch our git repo to the ‘lab2’ branch by running:

git checkout -b lab2 origin/lab2

Physical Page Management

Introduction

The operating system needs to know which parts of memory are currently in use and which are not. JOS manages physical memory at page granularity, so the MMU can map and protect each allocated piece of memory.

We use C structure PageInfo

struct PageInfo {
	// Next page on the free list.
	struct PageInfo *pp_link;

	// pp_ref is the count of pointers (usually in page table entries)
	// to this page, for pages allocated using page_alloc.
	// Pages allocated at boot time using pmap.c's
	// boot_alloc do not have valid reference count fields.
	uint16_t pp_ref;
};

to keep track of which pages are free. Each PageInfo object represents one physical page. This is fundamental, since the virtual memory part later needs free pages to store the page tables themselves.

From the last lecture’s reading, we know that a virtual address is first translated to a linear address. If paging is enabled, the linear address is then translated to the actual physical address; otherwise the linear address is used directly as the physical address.

We also know that in JOS, all usable physical memory is mapped starting at virtual address 0xf0000000, and inc/memlayout.h defines the macro KERNBASE accordingly:

// All physical memory mapped at this address
#define KERNBASE 0xF0000000

so once we know the size of the whole physical memory space, we can calculate how many pages we need simply by dividing by 4096, or equivalently with a logical right shift: paddr >> 12.

Now it’s time to write some code implementing the physical allocator.
The first function to implement is boot_alloc(). As the comments say, it should only be used while JOS is setting up virtual memory. Its argument n means the caller needs n bytes of memory. The first call returns the (already virtual) address of the first free page; each subsequent call returns an address advanced past the previous, page-rounded allocations. The first call happens here:

// create initial page directory.
kern_pgdir = (pde_t *) boot_alloc(PGSIZE);
memset(kern_pgdir, 0, PGSIZE);

in the mem_init() function. To understand what kern_pgdir means, we need to understand how a linear address is structured. Fortunately, the format of a linear address is explained in the comments of inc/mmu.h.

// A linear address 'la' has a three-part structure as follows:
//
// +--------10------+-------10-------+---------12----------+
// | Page Directory |   Page Table   | Offset within Page  |
// |      Index     |      Index     |                     |
// +----------------+----------------+---------------------+
//  \--- PDX(la) --/ \--- PTX(la) --/ \---- PGOFF(la) ----/
//  \---------- PGNUM(la) ----------/

There are some other macros for the convenience defined here, such as:

#define PGNUM(la)	(((uintptr_t) (la)) >> PTXSHIFT)

// page directory index
#define PDX(la)		((((uintptr_t) (la)) >> PDXSHIFT) & 0x3FF)

// page table index
#define PTX(la)		((((uintptr_t) (la)) >> PTXSHIFT) & 0x3FF)

// offset in page
#define PGOFF(la)	(((uintptr_t) (la)) & 0xFFF)

// construct linear address from indexes and offset
#define PGADDR(d, t, o)	((void*) ((d) << PDXSHIFT | (t) << PTXSHIFT | (o)))

In short, once paging is enabled, there is a mapping from linear addresses to physical addresses. It works as follows:

  • the page directory index selects a page table from the page directory
  • the page table index selects a page from that page table
  • the final physical address is the page frame address plus the offset

These steps are collectively called page translation.

Quoting the Intel 80386 Reference Manual: “A page table is simply an array of 32-bit page specifiers. A page table is itself a page, and therefore contains 4 Kilobytes of memory or at most 1K 32-bit entries.
Two levels of tables are used to address a page of memory. At the higher level is a page directory. The page directory addresses up to 1K page tables of the second level. A page table of the second level addresses up to 1K pages. All the tables addressed by one page directory, therefore, can address 1M pages (2^20). Because each page contains 4K bytes (2^12 bytes), the tables of one page directory can span the entire physical address space of the 80386 (2^20 times 2^12 = 2^32).”

Q&A

  • boot_alloc

Prototype:

static void *boot_alloc(uint32_t n);

Let’s take a look at these two macros:

// Rounding operations (efficient when n is a power of 2)
// Round down to the nearest multiple of n
#define ROUNDDOWN(a, n)						\
({								\
	uint32_t __a = (uint32_t) (a);				\
	(typeof(a)) (__a - __a % (n));				\
})
// Round up to the nearest multiple of n
#define ROUNDUP(a, n)						\
({								\
	uint32_t __n = (uint32_t) (n);				\
	(typeof(a)) (ROUNDDOWN((uint32_t) (a) + __n - 1, __n));	\
})

Then we notice comments:

// Allocate a chunk large enough to hold 'n' bytes, then update
// nextfree. Make sure nextfree is kept aligned to a multiple of PGSIZE.

We first need to save the previous value of nextfree, then advance the pointer to the start of the next free page. nextfree persists across calls because it is static, and rounding n up to a multiple of PGSIZE keeps it page-aligned. So the code looks like this:

result = nextfree;
nextfree += ROUNDUP(n, PGSIZE);

  • mem_init

Prototype:

void mem_init(void);

The purpose of this step is to allocate enough space for a struct PageInfo for each page. This chunk of memory immediately follows the page directory.

pages = (struct PageInfo *)boot_alloc(npages * sizeof(struct PageInfo));
memset(pages, 0, npages * sizeof(struct PageInfo));
  • page_init

Prototype:

void page_init(void);

Check the comments:

// 1) Mark physical page 0 as in use.
//    This way we preserve the real-mode IDT and BIOS structures
//    in case we ever need them. (Currently we don't, but...)
// 2) The rest of base memory, [PGSIZE, npages_basemem * PGSIZE)
//    is free.
// 3) Then comes the IO hole [IOPHYSMEM, EXTPHYSMEM), which must
//    never be allocated.
// 4) Then extended memory [EXTPHYSMEM, ...).

Moreover, the virtual memory map in inc/memlayout.h helps a lot:

/*
 * Virtual memory map:                                Permissions
 *                                                    kernel/user
 *
 *    4 Gig -------->  +------------------------------+
 *                     |                              | RW/--
 *                     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 *                     :              .               :
 *                     :              .               :
 *                     :              .               :
 *                     |~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~| RW/--
 *                     |                              | RW/--
 *                     |   Remapped Physical Memory   | RW/--
 *                     |                              | RW/--
 *    KERNBASE, ---->  +------------------------------+ 0xf0000000      --+
 *    KSTACKTOP        |     CPU0's Kernel Stack      | RW/--  KSTKSIZE   |
 *                     | - - - - - - - - - - - - - - -|                   |
 *                     |      Invalid Memory (*)      | --/--  KSTKGAP    |
 *                     +------------------------------+                   |
 *                     |     CPU1's Kernel Stack      | RW/--  KSTKSIZE   |
 *                     | - - - - - - - - - - - - - - -|                 PTSIZE
 *                     |      Invalid Memory (*)      | --/--  KSTKGAP    |
 *                     +------------------------------+                   |
 *                     :              .               :                   |
 *                     :              .               :                   |
 *    MMIOLIM ------>  +------------------------------+ 0xefc00000      --+
 *                     |       Memory-mapped I/O      | RW/--  PTSIZE
 * ULIM, MMIOBASE -->  +------------------------------+ 0xef800000
 *                     |  Cur. Page Table (User R-)   | R-/R-  PTSIZE
 *    UVPT      ---->  +------------------------------+ 0xef400000
 *                     |          RO PAGES            | R-/R-  PTSIZE
 *    UPAGES    ---->  +------------------------------+ 0xef000000
 *                     |           RO ENVS            | R-/R-  PTSIZE
 * UTOP,UENVS ------>  +------------------------------+ 0xeec00000
 * UXSTACKTOP -/       |     User Exception Stack     | RW/RW  PGSIZE
 *                     +------------------------------+ 0xeebff000
 *                     |       Empty Memory (*)       | --/--  PGSIZE
 *    USTACKTOP  --->  +------------------------------+ 0xeebfe000
 *                     |      Normal User Stack       | RW/RW  PGSIZE
 *                     +------------------------------+ 0xeebfd000
 *                     |                              |
 *                     |                              |
 *                     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 *                     .                              .
 *                     .                              .
 *                     .                              .
 *                     |~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~|
 *                     |     Program Data & Heap      |
 *    UTEXT -------->  +------------------------------+ 0x00800000
 *    PFTEMP ------->  |       Empty Memory (*)       |        PTSIZE
 *                     |                              |
 *    UTEMP -------->  +------------------------------+ 0x00400000      --+
 *                     |       Empty Memory (*)       |                   |
 *                     | - - - - - - - - - - - - - - -|                   |
 *                     |  User STAB Data (optional)   |                 PTSIZE
 *    USTABDATA ---->  +------------------------------+ 0x00200000        |
 *                     |       Empty Memory (*)       |                   |
 *    0 ------------>  +------------------------------+                 --+
 *
 * (*) Note: The kernel ensures that "Invalid Memory" is *never* mapped.
 *     "Empty Memory" is normally unmapped, but user programs may map pages
 *     there if desired.  JOS user programs map pages temporarily at UTEMP.
 */

// At IOPHYSMEM (640K) there is a 384K hole for I/O. From the kernel,
// IOPHYSMEM can be addressed at KERNBASE + IOPHYSMEM. The hole ends
// at physical address EXTPHYSMEM.
#define IOPHYSMEM	0x0A0000
#define EXTPHYSMEM	0x100000

With that map in mind, page_init just needs a loop with a few if-else cases.

void
page_init(void)
{
	size_t i;
	for (i = 0; i < npages; i++) {
		if (i == 0) {
			// pp_ref = 1 means the page is in use, otherwise it is free
			pages[i].pp_ref = 1;
			pages[i].pp_link = NULL;
		} else if (i < npages_basemem) {
			pages[i].pp_ref = 0;
			pages[i].pp_link = page_free_list;	// push onto the free list
			page_free_list = &pages[i];
		} else if (i <= (EXTPHYSMEM/PGSIZE) || i < ((size_t)boot_alloc(0)-KERNBASE) / PGSIZE) {
			// IO hole and kernel memory: never free
			pages[i].pp_ref = 1;
			pages[i].pp_link = NULL;
		} else {
			pages[i].pp_ref = 0;
			pages[i].pp_link = page_free_list;
			page_free_list = &pages[i];
		}
	}
}
  • page_alloc

Prototype:

struct PageInfo *page_alloc(int alloc_flags)

Allocate a physical page if page_free_list is non-NULL. We also need to zero the page’s memory if alloc_flags & ALLOC_ZERO is set.

struct PageInfo *
page_alloc(int alloc_flags)
{
	if (!page_free_list)
		return NULL;

	struct PageInfo *store = page_free_list;	// the page being allocated
	page_free_list = page_free_list->pp_link;

	if (alloc_flags & ALLOC_ZERO)
		memset(page2kva(store), 0, PGSIZE);	// clear memory space

	store->pp_link = NULL;
	store->pp_ref = 0;	// the caller is responsible for incrementing pp_ref
	return store;
}
  • page_free

Prototype:

void page_free(struct PageInfo *pp);

This re-adds a page to page_free_list. If the page is still in use (nonzero pp_ref) or already on the free list (non-NULL pp_link), we should panic the kernel >_<; after all, we must not free such a page forcefully.

void
page_free(struct PageInfo *pp)
{
	// Hint: You may want to panic if pp->pp_ref is nonzero or
	// pp->pp_link is not NULL.
	if (pp->pp_ref != 0 || pp->pp_link != NULL)
		panic("You can't free this page");

	pp->pp_link = page_free_list;
	page_free_list = pp;
}

Let’s run check_page_alloc() to see if pages are allocated successfully.

Yeah :)

Virtual Memory

Introduction

First of all, being clear about segments and pages is fundamental. Setting segmentation aside, I find it enough for this lab to understand the page directory and the page table. Being familiar with these two lets us access them with C pointers. Luckily, the Intel 80386 Reference Manual tells us enough.

One thing that needs extra attention: the page frame address stored in an entry is a physical address.

After memory for the page directory has been allocated, we need to map all of physical memory at 0xf0000000 so the kernel can read and write memory for which it knows only the physical address. The reason for this remapping is that the kernel cannot bypass virtual address translation, and thus cannot load from or store to physical addresses directly.

Q&A

  • pgdir_walk

Prototype:

pte_t *pgdir_walk(pde_t *pgdir, const void *va, int create);

// Given 'pgdir', a pointer to a page directory, pgdir_walk returns
// a pointer to the page table entry (PTE) for linear address 'va'.
// This requires walking the two-level page table structure.

To obtain the address of a page directory entry, we just add an index to the given pgdir; the index here is PDX(va). If this directory entry exists and its present bit indicates the entry can be used in address translation, we use this macro:

// Address in page table or page directory entry
#define PTE_ADDR(pte)	((physaddr_t) (pte) & ~0xFFF)

to get the physical address of the page table, then use KADDR to turn it into the corresponding virtual address. Finally, just as we indexed into the directory at the start, we add another index, this time PTX(va).

According to the rest of these comments below:

// Hint 1: you can turn a PageInfo * into the physical address of the
// page it refers to with page2pa() from kern/pmap.h.
//
// Hint 2: the x86 MMU checks permission bits in both the page directory
// and the page table, so it's safe to leave permissions in the page
// directory more permissive than strictly necessary.
//
// Hint 3: look at inc/mmu.h for useful macros that manipulate page
// table and page directory entries.

One possible implementation for this function could be like this:

pte_t *
pgdir_walk(pde_t *pgdir, const void *va, int create)
{
	/* page table (1K 32-bit entries) referenced by 'pg_dir_entry = pgdir + PDX(va)'
	 * +------------------+
	 * |page table entry 0|
	 * |page table entry 1|
	 * |page table entry 2|
	 * |page table entry 3|
	 * |...            x-1|
	 * |..               x|
	 * +------------------+
	 */
	assert(pgdir != NULL);
	pde_t *pg_dir_entry = NULL;		// page directory entry
	pte_t *pg_tbl_entry = NULL;		// page table entry, the return value
	struct PageInfo *page_info = NULL;	// newly allocated page table page

	pg_dir_entry = pgdir + PDX(va);

	// The present bit indicates whether a page table entry can be used
	// in address translation. P=1 indicates that the entry can be used.
	if (*pg_dir_entry & PTE_P)
		return (pte_t *)KADDR(PTE_ADDR(*pg_dir_entry)) + PTX(va);

	if (create) {
		page_info = page_alloc(0);	// allocate a page for a page table
		if (page_info == NULL)
			return NULL;

		memset(page2kva(page_info), 0, PGSIZE);
		page_info->pp_ref++;		// manually increment the reference count
		pg_tbl_entry = (pte_t *)page2kva(page_info);	// kernel virtual address of the new table
		*pg_dir_entry = PADDR(pg_tbl_entry) | PTE_P | PTE_U | PTE_W;	// hint 2
		return pg_tbl_entry + PTX(va);
	}

	return NULL;
}
  • boot_map_region

Prototype:

static void boot_map_region(pde_t *pgdir, uintptr_t va, size_t size, physaddr_t pa, int perm);

This is just to set up a static mapping.

// Map [va, va+size) of virtual address space to physical [pa, pa+size)
// in the page table rooted at pgdir. Size is a multiple of PGSIZE, and
// va and pa are both page-aligned.
// Use permission bits perm|PTE_P for the entries.
// Hint: the TA solution uses pgdir_walk

Also keep this in mind: In a page directory, the page frame address is the address of a page table. In a second-level page table, the page frame address is the address of the page frame that contains the desired memory operand.

static void
boot_map_region(pde_t *pgdir, uintptr_t va, size_t size, physaddr_t pa, int perm)
{
	pte_t *pg_tbl_entry = NULL;
	size_t i;

	for (i = 0; i < size/PGSIZE; i++) {
		pg_tbl_entry = pgdir_walk(pgdir, (const void *)va, 1);
		if (NULL == pg_tbl_entry)
			panic("error! Page table entry shouldn't be NULL!");
		*pg_tbl_entry = pa | perm | PTE_P;	// set the static mapping

		va += PGSIZE;
		pa += PGSIZE;
	}
}
  • page_lookup

Prototype:

struct PageInfo *page_lookup(pde_t *pgdir, void *va, pte_t **pte_store);

Return the physical page mapped at virtual address va. The comment says we may need pa2page, which takes a physical address as its only parameter and returns the corresponding PageInfo. pa2page is worth a look:

static inline struct PageInfo*
pa2page(physaddr_t pa)
{
	if (PGNUM(pa) >= npages)
		panic("pa2page called with invalid pa");
	return &pages[PGNUM(pa)];
}

// page number field of address
#define PGNUM(la)	(((uintptr_t) (la)) >> 12)

So we are given a virtual address va. To obtain the physical page, we first need the corresponding physical address, which resides in the page table entry mentioned above (extracted with the macro PTE_ADDR). The task therefore becomes getting that entry, and the pgdir_walk we implemented earlier is exactly the tool for it. There is also one extra parameter named pte_store: if it is non-NULL, we must store the address (a C pointer) of the page table entry in it.

struct PageInfo *
page_lookup(pde_t *pgdir, void *va, pte_t **pte_store)
{
	// create = 0: a lookup should not allocate a page table
	pte_t *pg_tbl_entry = pgdir_walk(pgdir, va, 0);
	if (NULL == pg_tbl_entry || !(*pg_tbl_entry & PTE_P))
		return NULL;
	if (pte_store != NULL)
		*pte_store = pg_tbl_entry;	// store the address of the PTE in pte_store
	physaddr_t pa = PTE_ADDR(*pg_tbl_entry);
	return pa2page(pa);
}
  • page_remove

Prototype:

void page_remove(pde_t *pgdir, void *va);

// Unmaps the physical page at virtual address 'va'.
// If there is no physical page at that address, silently does nothing.

Here are the details we must get right for a page to be removed correctly:

// Details:
// - The ref count on the physical page should decrement.
// - The physical page should be freed if the refcount reaches 0.
// - The pg table entry corresponding to 'va' should be set to 0.
//   (if such a PTE exists)
// - The TLB must be invalidated if you remove an entry from
//   the page table.

page_decref should be used here properly.

//
// Decrement the reference count on a page,
// freeing it if there are no more refs.
//
void
page_decref(struct PageInfo* pp)
{
	if (--pp->pp_ref == 0)
		page_free(pp);
}

First, we locate the page with page_lookup. Details 1 and 2: use page_decref to decrement the page’s reference count once it is found. Detail 3: set the page table entry to 0, which is simple. Finally, invalidating the TLB is handled by another small helper:

//
// Invalidate a TLB entry, but only if the page tables being
// edited are the ones currently in use by the processor.
//
void
tlb_invalidate(pde_t *pgdir, void *va)
{
	// Flush the entry only if we're modifying the current address space.
	// For now, there is only one address space, so always invalidate.
	invlpg(va);
}

What tlb_invalidate does under the hood isn’t entirely clear to me yet; invlpg simply flushes the TLB entry for va. With that, here is my code:

void
page_remove(pde_t *pgdir, void *va)
{
	pte_t *pte = NULL;
	pte_t **pte_store = &pte;	// pte_store cannot be NULL
	struct PageInfo *page_info = page_lookup(pgdir, va, pte_store);
	if (!page_info)
		return;

	page_decref(page_info);
	*pte = 0;			// set the page table entry to 0
	tlb_invalidate(pgdir, va);
}
  • page_insert

Prototype:

int page_insert(pde_t *pgdir, struct PageInfo *pp, void *va, int perm)

In my opinion, there are three situations to handle.

First, an existing mapping to a different page: invalidate the old mapping and map the new page.

Second, re-inserting the same page at the same address: everything stays the same, with no net change to pp_ref.

Third, no page table exists yet and one cannot be allocated, e.g. due to insufficient memory.

Considering all three, the first thing to do is obtain the page table entry with pgdir_walk (create = 1). If no entry can be returned, we return the error code -E_NO_MEM. If it is a repeat mapping, i.e. page2pa(pp) equals PTE_ADDR(*pg_tbl_entry), we decrement pp_ref, invalidate the TLB entry, and then remap; since pp_ref is incremented again afterwards, the count is unchanged. Otherwise, if a different page is present, we remove the previous mapping with page_remove before installing the new one.

int
page_insert(pde_t *pgdir, struct PageInfo *pp, void *va, int perm)
{
	pte_t *pg_tbl_entry = pgdir_walk(pgdir, va, 1);
	if (NULL == pg_tbl_entry)
		return -E_NO_MEM;

	if (page2pa(pp) == PTE_ADDR(*pg_tbl_entry)) {
		// same page re-inserted: drop the old reference and invalidate
		pp->pp_ref--;
		tlb_invalidate(pgdir, va);
	} else if (*pg_tbl_entry & PTE_P) {
		// a different page was mapped here: remove it first
		page_remove(pgdir, va);
	}

	// map the new page
	*pg_tbl_entry = page2pa(pp) | perm | PTE_P;
	pp->pp_ref++;
	return 0;
}

In the end, we can run check_page to make sure all the functions work correctly.