Project 2: Custom system call and signal handling for insane memory read; and ELF analysis
Due date
Nov 15, 2017, 11:59pm (please demo to me 3-5pm Nov 16).
Goals
The goals of this project are (1) to practice basic kernel programming, (2) to practice signal handling, and (3) to deepen your understanding of the ELF file format and virtual memory.
Details
Implement a user-space function int readAddr(void *p, int *v). Given a user-space virtual address p, read the word at that address and store the value at v.
You should iterate through every address in user space at an interval of PAGE_SIZE * 1024.
char *p = 0;
unsigned long validPages = 0, invalidPages = 0;
for ( ; (unsigned long)p < TASK_SIZE; p += PAGE_SIZE * 1024) {
    ...
    int a = 0;
    int r = readAddr(p, &a);
    if (r == -1)    // return value -1 means the read was invalid
        invalidPages++;
    else {
        validPages++;
        printf("%p: %d\n", p, a);
    }
    ...
}
printf("%lu out of %lu pages are valid", validPages, validPages + invalidPages);
The first two sub-projects should implement the readAddr above in the following two ways, respectively. The third sub-project is about static and dynamic ELF file analysis.
- Please implement a system call isReadable(char*) and make use of it to implement the user-space function readAddr for safe reading. The system call isReadable(char* p) returns whether the address passed as p falls into a readable memory area: 0 if it does, -1 if it does not. You should make use of the mm field in task_struct. Specifically, mm is of type mm_struct, which contains a field mmap pointing to a list of nodes, each a vm_area_struct describing one virtual memory area (VMA). A VMA describes a range of virtual address space and the access operations allowed on it from user space (VM_READ, VM_WRITE, VM_EXEC, etc.). Note that VM_EXEC also implies that the area is readable. Also note that an area marked VM_IO is not safe to read, because such a region maps a device's I/O space.
- Make use of signal handling to handle invalid reads. Reading a memory cell at an arbitrary address may be just fine or may trigger a SIGSEGV signal. To deal with the latter case, you can install a signal handler using sigaction. Whenever a SIGSEGV signal occurs, capture it and recover your program execution using sigsetjmp/siglongjmp. Specifically, in readAddr, before each (un)safe read, call sigsetjmp; if the subsequent read triggers a SIGSEGV, the signal handler calls siglongjmp to restore the control flow and tell readAddr to return -1.
- For the third sub-project, you only need to write a piece of code and then write a report describing how different program elements (e.g., functions, global variables, local variables, string literals) in your code are stored in different sections (such as .text, .data, .bss) of the ELF file, and how they are stored in different segments (such as code, data, call stack) of the memory address space. Your report should include at least five different sections and five different segments. For example, in the report you can point out that the global variable int g = 100; is stored in the .data section in the ELF file and in the data segment during execution. You should use various tools (such as readelf, objdump, gdb, nm, etc.) to collect evidence (in the form of screenshots) to support your description. Assuming your report covers X different sections and Y segments, you will get bonus/penalty points of (X+Y-10).
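
For the first sub-project, the check inside isReadable comes down to walking the VMA list and testing flags. The sketch below models that walk in plain user-space C so that the logic can be compiled and tested outside the kernel. The type struct mock_vma and the function is_readable_model are hypothetical stand-ins, not kernel definitions, and the flag values are assumed to mirror the kernel's VM_* bits. A real system call would walk the list reached through the current task's mm instead, and would need to hold the mm's lock while doing so.

```c
#include <stddef.h>

/* Flag values assumed to mirror the kernel's VM_* bits; for illustration only. */
#define VM_READ  0x0001UL
#define VM_WRITE 0x0002UL
#define VM_EXEC  0x0004UL
#define VM_IO    0x4000UL

/* Hypothetical stand-in for the kernel's vm_area_struct. */
struct mock_vma {
    unsigned long vm_start;     /* first address inside the area */
    unsigned long vm_end;       /* first address past the area */
    unsigned long vm_flags;     /* VM_READ | VM_WRITE | ... */
    struct mock_vma *vm_next;   /* next area, sorted by address */
};

/* Returns 0 if addr lies in a readable, non-I/O area; otherwise -1. */
int is_readable_model(const struct mock_vma *head, unsigned long addr)
{
    const struct mock_vma *vma;

    for (vma = head; vma != NULL; vma = vma->vm_next) {
        if (addr < vma->vm_start)
            break;              /* list is sorted, so addr falls in a gap */
        if (addr >= vma->vm_end)
            continue;           /* addr is past this area, keep walking */
        if (vma->vm_flags & VM_IO)
            return -1;          /* device I/O space: not safe to read */
        if (vma->vm_flags & (VM_READ | VM_EXEC))
            return 0;           /* VM_EXEC is treated as readable, as noted above */
        return -1;              /* mapped, but not readable */
    }
    return -1;                  /* no area contains addr */
}
```

The early break relies on the list being sorted by address, which is an assumption of this model; dropping it only costs a full walk for unmapped addresses.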

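For the second sub-project, the sigsetjmp/siglongjmp recovery described above could look roughly like the following. This is a minimal sketch, not a reference solution: the names installFaultHandler, on_fault, read_env, and read_armed are made up for illustration, and catching SIGBUS in addition to SIGSEGV is an extra precaution that the assignment does not ask for.

```c
#define _POSIX_C_SOURCE 200809L   /* expose sigaction and sigsetjmp under strict -std modes */
#include <setjmp.h>
#include <signal.h>
#include <stddef.h>
#include <string.h>

static sigjmp_buf read_env;                    /* where the handler jumps back to */
static volatile sig_atomic_t read_armed = 0;   /* 1 only while a probe is in flight */

/* Fault handler (hypothetical name): unwind into readAddr if a probe is active. */
static void on_fault(int sig)
{
    if (read_armed)
        siglongjmp(read_env, 1);
    /* The fault did not come from a probe: fall back to the default action. */
    signal(sig, SIG_DFL);
    raise(sig);
}

/* Installs the handler once, before any probing. Returns 0 on success, -1 on error. */
int installFaultHandler(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_fault;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGSEGV, &sa, NULL) == -1)
        return -1;
    return sigaction(SIGBUS, &sa, NULL);   /* assumption: some faults arrive as SIGBUS */
}

/* Reads the int at p into *v. Returns 0 on success, -1 if the read faulted. */
int readAddr(void *p, int *v)
{
    /* savemask = 1: siglongjmp also restores the signal mask, so the
       signal that was blocked while the handler ran is unblocked again. */
    if (sigsetjmp(read_env, 1) != 0) {
        read_armed = 0;
        return -1;              /* we arrived here from the handler */
    }
    read_armed = 1;
    *v = *(volatile int *)p;    /* the possibly faulting read */
    read_armed = 0;
    return 0;
}
```

Call installFaultHandler() once at startup and then use readAddr(p, &a) inside the scan loop. Passing 1 as the second argument of sigsetjmp matters here: with 0, the signal can stay blocked after the first recovery, and the next invalid read would then terminate the process instead of reaching the handler.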
Tips
To obtain PAGE_SIZE and TASK_SIZE, please refer to the code below:
#include <unistd.h>

unsigned long PAGE_SIZE = 0, TASK_SIZE = 0;

// The assignments below must run inside a function (e.g., at the start of main)
PAGE_SIZE = sysconf(_SC_PAGESIZE);
if (sizeof(void *) == sizeof(int))    // 32-bit system
    TASK_SIZE = 0xc0000000UL;
else                                  // 64-bit system
    TASK_SIZE = (1UL << 47) - PAGE_SIZE;
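
To get a feel for how long the scan will run, it can help to work out how many probes the loop's stride of PAGE_SIZE * 1024 implies for these TASK_SIZE values. The helper below is only an illustrative sketch; probe_count is a hypothetical name and is not part of the assignment.

```c
#include <stdint.h>

/* Number of iterations of:
 *   for (p = 0; p < task_size; p += page_size * 1024)
 */
uint64_t probe_count(uint64_t task_size, uint64_t page_size)
{
    uint64_t stride = page_size * 1024;         /* bytes skipped per probe */

    if (stride == 0)
        return 0;                               /* avoid dividing by zero */
    return (task_size + stride - 1) / stride;   /* ceiling division */
}
```

With 4 KiB pages this gives 768 probes for the 32-bit layout and 33,554,432 for the 64-bit layout, so on a 64-bit system the cost of each probe (one system call, or one possible signal delivery) is repeated tens of millions of times.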
Submission
Your submission should include the code (kernel modifications should be submitted as a kernel patch), a readme file describing your design, how to compile and use your code, and each member's contribution in the case of group programming, and a report which consists of the following parts:
- APIs used to allocate memory in Linux user space, and when to use which.
- In your second subproject, you will encounter SIGSEGV. Describe how the system recognizes p is an invalid address and triggers a SIGSEGV. Hint: your answer should involve TLB, page table, and VMA.
- What information is saved at /proc/$pid/maps? How is it generated? Hint: the first subproject above.
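
When exploring the last question, it may be useful to read the maps file of your own process and compare it with what readAddr reports. The sketch below assumes that each line of /proc/self/maps begins with an address range and a four-character permission string; count_readable_regions is a hypothetical helper name.

```c
#include <stdio.h>

/*
 * Counts the regions in /proc/self/maps whose permission string starts with 'r'.
 * Returns -1 if the file cannot be opened (for example, on a non-Linux system).
 */
long count_readable_regions(void)
{
    FILE *f = fopen("/proc/self/maps", "r");
    char line[512];
    long readable = 0;

    if (f == NULL)
        return -1;
    while (fgets(line, sizeof line, f) != NULL) {
        unsigned long start, end;
        char perms[5];

        /* Each line is expected to begin with "start-end perms". */
        if (sscanf(line, "%lx-%lx %4s", &start, &end, perms) == 3 &&
            perms[0] == 'r')
            readable++;
    }
    fclose(f);
    return readable;
}
```

Printing start, end, and perms inside the loop gives a quick table to check against the addresses that the scan loop reported as valid.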
Environment
Linux (any kernel version >= 2.6 is fine) and C/C++.
How to create a kernel patch
Assume the original kernel code linux-2.6.18-old and the modified code linux-2.6.18-new are in the same directory src. Pay attention to the current working directory (I omitted the cd command), and make sure you back up your code before running the commands below.
# Remove all intermediate files, such as the object files and configuration files
src/linux-2.6.18-new$ make distclean
# Generate the patch. -r: recursive, -u: a unified output wrt the difference, -N: handle new files
src$ diff -urN linux-2.6.18-old linux-2.6.18-new >patchfile
# Now you can verify the patch by applying it to the original code
# "-p1" makes the "patch" command ignore "linux-2.6.18-old/" inside the generated patch
src/linux-2.6.18-old$ patch -p1 <../patchfile
# You should see no difference
src$ diff -r linux-2.6.18-old linux-2.6.18-new
References
You may find the following articles useful
- Adding a Hello World System Call to Linux kernel 3.16.0. Surya, 2014. link
- How the Kernel Manages Your Memory. Duarte, 2009. link
- A wrong example that can be your starting point for recovering from SIGSEGV. link
- Anatomy of a Program in Memory. Duarte, 2009. link
- The 101 of ELF Binaries on Linux: Understanding and Analysis. link
- ELF Hello World Tutorial. Santilli. link