Post

Portable Executable (PE) Structure

Portable Executable (PE) Structure

Banner PEStruct

A Portable Executables (PE) is the file format for executables on Windows. A few examples of PE file extensions are .exe, .dll, .sys, and .scr. In this blog psot, I’ll discuss the Windows PE structure, which is fundamental knowledge to those working in IT, specifically if you work with Windows-based malware in any capacity, as a developer or reverse engineer.

PE Structure

The diagram below shows a general, simplified structure of a PE file, with every header being a data structure.

PE-structure-diagram

DOS Header (IMAGE_DOS_HEADER)

The DOS header is always prefixed with two bytes, 0x4D and 0x5A, or 4D5A, this is commonly referred to as the magic bytes MZ. These bytes represent the DOS header signature, which is used to confirm that the file being parsed or inspected is a valid PE file. The DOS header is a data structure, defined as follows in the code block below:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
typedef struct _IMAGE_DOS_HEADER {      // DOS .EXE header
    WORD   e_magic;                     // Magic number
    WORD   e_cblp;                      // Bytes on last page of file
    WORD   e_cp;                        // Pages in file
    WORD   e_crlc;                      // Relocations
    WORD   e_cparhdr;                   // Size of header in paragraphs
    WORD   e_minalloc;                  // Minimum extra paragraphs needed
    WORD   e_maxalloc;                  // Maximum extra paragraphs needed
    WORD   e_ss;                        // Initial (relative) SS value
    WORD   e_sp;                        // Initial SP value
    WORD   e_csum;                      // Checksum
    WORD   e_ip;                        // Initial IP value
    WORD   e_cs;                        // Initial (relative) CS value
    WORD   e_lfarlc;                    // File address of relocation table
    WORD   e_ovno;                      // Overlay number
    WORD   e_res[4];                    // Reserved words
    WORD   e_oemid;                     // OEM identifier (for e_oeminfo)
    WORD   e_oeminfo;                   // OEM information; e_oemid specific
    WORD   e_res2[10];                  // Reserved words
    LONG   e_lfanew;                    // Offset to the NT header
  } IMAGE_DOS_HEADER, *PIMAGE_DOS_HEADER;

The two members of the above struct which are the most important are the e_magic and e_lfanew

  • e_magic is as described above, the Magic Bytes (MZ) denoted as 0x5A4D here.

  • e_lfanew is a 4-byte value that holds an offset at the beginning of the NT Header, which is located at 0x3C

DOS Stub

The DOS stub is a short piece of code that resides at the beginning of every Portable Executable (PE) file format within Windows operating systems.

This stub is executed when a user tries to run the file on MS-DOS or equivalent platforms. Since PE files can’t run on such platforms, the DOS stub usually prints out a message along the lines of “This program cannot be run in DOS mode.” By default, this is what the DOS stub does, but it can be modified to implement other routines, though it’s rarely done because modern systems do not use the MS-DOS environment for execution.

Therefore, the main purpose of the DOS stub in the modern context is to provide backward compatibility and prevent DOS-based systems from executing these PE files and potentially causing system instability.

NT Header (IMAGE_NT_HEADERS)

The NT header is essential for one key reason: it incorporates the other two image headers contained in a PE file, FileHeader and OptionalHeader, which together contains a large amount of information about a PE file.

Where the DOS header has a unique signiture to identify it, namely 0x4D and 0x5A, the NT header has a unique signiture too, equal to the “PE” string, which is 0x50 and 0x45 bytes. However, since this signiture is of the data type DWORD, it’ll be represented as 0x50450000. This is exactly the same as the “PE” string, execpt that it is padded with two null bytes. The e_lfanew member is used to reach the NT header.

Depending on the machines’ architecture, the NT header structures will be different, see below:

32-Bit

1
2
3
4
5
typedef struct _IMAGE_NT_HEADERS {
  DWORD                   Signature;
  IMAGE_FILE_HEADER       FileHeader;
  IMAGE_OPTIONAL_HEADER32 OptionalHeader;
} IMAGE_NT_HEADERS32, *PIMAGE_NT_HEADERS32;

64-Bit

1
2
3
4
5
typedef struct _IMAGE_NT_HEADERS64 {
    DWORD                   Signature;
    IMAGE_FILE_HEADER       FileHeader;
    IMAGE_OPTIONAL_HEADER64 OptionalHeader;
} IMAGE_NT_HEADERS64, *PIMAGE_NT_HEADERS64;

Functionally, the only difference between the two is the OptionalHeader data structure, that being IMAGE_OPTIONAL_HEADER32 and IMAGE_OPTIONAL_HEADER64 respectively.

File Header (IMAGE_FILE_HEADER)

As previously mentioned, this header can be accessed via the NT header. The most important members of the FileHeader are:

  • NumberOfSections - The number of sections in the PE file.

  • Characteristics - Flags that specify certain attributes about the executable file, such as whether it is a dynamic-link library (DLL) or a console application.

  • SizeOfOptionalHeader - The size of the optional header, covered next.

The code block below shows the struct for the FileHeader

1
2
3
4
5
6
7
8
9
typedef struct _IMAGE_FILE_HEADER {
  WORD  Machine;
  WORD  NumberOfSections;
  DWORD TimeDateStamp;
  DWORD PointerToSymbolTable;
  DWORD NumberOfSymbols;
  WORD  SizeOfOptionalHeader;
  WORD  Characteristics;
} IMAGE_FILE_HEADER, *PIMAGE_FILE_HEADER;

Microsofts’ official documentation provides more information and context around the FileHeader.

Optional Header (IMAGE_OPTIONAL_HEADER)

Contrary to assumption, the OptionalHeader is important. This header is so named “optional” because some file ypes do not have it. The OptionalHeader has two versions, a 32-bit and 64-bit version, they are nearly identical, aside from the fact that the 32-bit version has some extra members that the 64-bit version doesn’t have. In addition, ULONGLONG is used in the 64-bit version, where DWORD is used in the 32-bit version.

See the code blocks below for the distinction.

32-Bit

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
typedef struct _IMAGE_OPTIONAL_HEADER {
  WORD                 Magic;
  BYTE                 MajorLinkerVersion;
  BYTE                 MinorLinkerVersion;
  DWORD                SizeOfCode;
  DWORD                SizeOfInitializedData;
  DWORD                SizeOfUninitializedData;
  DWORD                AddressOfEntryPoint;
  DWORD                BaseOfCode;
  DWORD                BaseOfData;
  DWORD                ImageBase;
  DWORD                SectionAlignment;
  DWORD                FileAlignment;
  WORD                 MajorOperatingSystemVersion;
  WORD                 MinorOperatingSystemVersion;
  WORD                 MajorImageVersion;
  WORD                 MinorImageVersion;
  WORD                 MajorSubsystemVersion;
  WORD                 MinorSubsystemVersion;
  DWORD                Win32VersionValue;
  DWORD                SizeOfImage;
  DWORD                SizeOfHeaders;
  DWORD                CheckSum;
  WORD                 Subsystem;
  WORD                 DllCharacteristics;
  DWORD                SizeOfStackReserve;
  DWORD                SizeOfStackCommit;
  DWORD                SizeOfHeapReserve;
  DWORD                SizeOfHeapCommit;
  DWORD                LoaderFlags;
  DWORD                NumberOfRvaAndSizes;
  IMAGE_DATA_DIRECTORY DataDirectory[IMAGE_NUMBEROF_DIRECTORY_ENTRIES];
} IMAGE_OPTIONAL_HEADER32, *PIMAGE_OPTIONAL_HEADER32;

64-Bit

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
typedef struct _IMAGE_OPTIONAL_HEADER64 {
  WORD                 Magic;
  BYTE                 MajorLinkerVersion;
  BYTE                 MinorLinkerVersion;
  DWORD                SizeOfCode;
  DWORD                SizeOfInitializedData;
  DWORD                SizeOfUninitializedData;
  DWORD                AddressOfEntryPoint;
  DWORD                BaseOfCode;
  ULONGLONG            ImageBase;
  DWORD                SectionAlignment;
  DWORD                FileAlignment;
  WORD                 MajorOperatingSystemVersion;
  WORD                 MinorOperatingSystemVersion;
  WORD                 MajorImageVersion;
  WORD                 MinorImageVersion;
  WORD                 MajorSubsystemVersion;
  WORD                 MinorSubsystemVersion;
  DWORD                Win32VersionValue;
  DWORD                SizeOfImage;
  DWORD                SizeOfHeaders;
  DWORD                CheckSum;
  WORD                 Subsystem;
  WORD                 DllCharacteristics;
  ULONGLONG            SizeOfStackReserve;
  ULONGLONG            SizeOfStackCommit;
  ULONGLONG            SizeOfHeapReserve;
  ULONGLONG            SizeOfHeapCommit;
  DWORD                LoaderFlags;
  DWORD                NumberOfRvaAndSizes;
  IMAGE_DATA_DIRECTORY DataDirectory[IMAGE_NUMBEROF_DIRECTORY_ENTRIES];
} IMAGE_OPTIONAL_HEADER64, *PIMAGE_OPTIONAL_HEADER64;

The OptionalHeader has plenty of information that can be used, listed below are some of the most common struct members:

  • Magic - Describes the state of the image file (32 or 64-bit image)

  • MajorOperatingSystemVersion - The major version number of the required operating system

  • MinorOperatingSystemVersion - The minor version number of the required operating system

  • SizeOfCode - The size of the .text section (Discussed later)

  • AddressOfEntryPoint - Offset to the entry point of the file (Typically the main function)

  • BaseOfCode - Offset to the start of the .text section

  • SizeOfImage - The size of the image file in bytes

  • ImageBase - It specifies the preferred address at which the application is to be loaded into memory when it is executed.

  • DataDirectory - One of the most important members in the optional header. This is an array of IMAGE_DATA_DIRECTORY, which contains the directories in a PE file (discussed next).

Data Directory

The data directory can be accessed from the last member of the OptionalHeader, as described above. This is an array of the data type IMAGE_DATA_DIRECTORY; which has the data structre shown in the code block below:

1
2
3
4
typedef struct _IMAGE_DATA_DIRECTORY {
    DWORD   VirtualAddress;
    DWORD   Size;
} IMAGE_DATA_DIRECTORY, *PIMAGE_DATA_DIRECTORY;

The data directory array has a constant size value of 16, with each element in the array represents a specific data directory. The specific data directories are shown below:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
#define IMAGE_DIRECTORY_ENTRY_EXPORT          0   // Export Directory
#define IMAGE_DIRECTORY_ENTRY_IMPORT          1   // Import Directory
#define IMAGE_DIRECTORY_ENTRY_RESOURCE        2   // Resource Directory
#define IMAGE_DIRECTORY_ENTRY_EXCEPTION       3   // Exception Directory
#define IMAGE_DIRECTORY_ENTRY_SECURITY        4   // Security Directory
#define IMAGE_DIRECTORY_ENTRY_BASERELOC       5   // Base Relocation Table
#define IMAGE_DIRECTORY_ENTRY_DEBUG           6   // Debug Directory
#define IMAGE_DIRECTORY_ENTRY_ARCHITECTURE    7   // Architecture Specific Data
#define IMAGE_DIRECTORY_ENTRY_GLOBALPTR       8   // RVA of GP
#define IMAGE_DIRECTORY_ENTRY_TLS             9   // TLS Directory
#define IMAGE_DIRECTORY_ENTRY_LOAD_CONFIG    10   // Load Configuration Directory
#define IMAGE_DIRECTORY_ENTRY_BOUND_IMPORT   11   // Bound Import Directory in headers
#define IMAGE_DIRECTORY_ENTRY_IAT            12   // Import Address Table
#define IMAGE_DIRECTORY_ENTRY_DELAY_IMPORT   13   // Delay Load Import Descriptors
#define IMAGE_DIRECTORY_ENTRY_COM_DESCRIPTOR 14   // COM Runtime descriptor

The next two sections Export Directory and Import Address Table are shown as item (0) and item (12) in the above table respectively.

Export Directory

The export directory contains information about the functions and variables that a Dynamic Link Library (DLL) or an executable makes available to other programs. This directory is part of the PE file’s data structure and provides the ability for external programs to use the library’s exported symbols (e.g., functions) (kernel32.dll exporting CreateFileA).

Import Address Table (IAT)

The Import Address Table (IAT) in a PE file plays a crucial role in dynamic linking, which allows an executable to use functions or data from external Dynamic Link Libraries (DLLs). Essentially, the IAT is a structure within the PE file that holds addresses of the functions imported from DLLs. During the loading of the executable, these addresses are resolved by the operating system’s loader. An exxample of this coule be Malware.exe importing CreateFileA from kernel32.dll.

PE Sections

The PE sections contains the code needed to create an executable (.exe). Each PE section is given a unique name which may contain executable code or resource information. PE sections can be dynamic, with it being possible add manually add sections later on. IMAGE_FILE_HEADER.NumberOfSections will help to determine the total number of sections.

If we are talking about what sections exist in nearly every PE file, review the list below:

  • .text - Contains the executable code which is the written code.

  • .data - Contains initialized data which are variables initialized in the code.

  • .rdata - Contains read-only data. These are constant variables prefixed with const.

  • .idata - Contains the import tables. These are tables of information related to the functions called using the code. This is used by the Windows PE Loader to determine which DLL files to load to the process, along with what functions are being used from each DLL.

  • .reloc - Contains information on how to fix up memory addresses so that the program can be loaded into memory without any errors.

  • .rsrc - Used to store resources such as icons and bitmaps

Each PE section has an IMAGE_SECTION_HEADER data structure that contains information about it. These structures are saved under the NT headers in a PE file and are stacked above each other where each structure represents a section.

Referring to the IMAGE_SECTION_HEADER structure, as below:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
typedef struct _IMAGE_SECTION_HEADER {
  BYTE  Name[IMAGE_SIZEOF_SHORT_NAME];
  union {
    DWORD PhysicalAddress;
    DWORD VirtualSize;
  } Misc;
  DWORD VirtualAddress;
  DWORD SizeOfRawData;
  DWORD PointerToRawData;
  DWORD PointerToRelocations;
  DWORD PointerToLinenumbers;
  WORD  NumberOfRelocations;
  WORD  NumberOfLinenumbers;
  DWORD Characteristics;
} IMAGE_SECTION_HEADER, *PIMAGE_SECTION_HEADER;

the elements are as follows:

  • Name - The name of the section. (e.g. .text, .data, .rdata).
  • PhysicalAddress or VirtualSize - The size of the section when it is in memory.
  • VirtualAddress - Offset of the start of the section in memory.

Conclusion

This has been my fundamental look at the structure of a PE file. If your analysing malware, this knowledge will be helpful. Feel free to refer back to it when you need to. I haven’t gone into a ton of detail, there is a lot more we could discuss, but this blog would be much longer.

This post is licensed under CC BY 4.0 by the author.