Secure code
Secure software does precisely what it is supposed to do. This means it starts in a secure state and cannot transition to an insecure state. Achieving secure code involves learning from past security issues and applying state-of-the-art techniques tailored to specific systems or programming languages, all while maintaining the highest possible code quality.
As C is one of the most popular programming languages, many attackers exploit its vulnerabilities. The reason there are so many vulnerabilities in C and C++ is that these languages trust the programmer to know what they are doing. They aim to be small and simple, prioritizing speed but not guaranteeing portability.
C defines different behaviors for portability issues that can create opportunities for security attacks. Most security vulnerabilities result from Undefined Behavior and Unspecified Behavior.
- Undefined Behavior (UB): Behavior caused by ambiguous failures.
- Unspecified Behavior (USB): Behavior caused by ambiguous failures.
- Implementation-Defined Behavior (IDB): Behavior dependent on the compiler implementation.
- Locale-Specific Behavior (LSB): Ambiguities based on dependencies with locale-specific functions.
CIA Triad
Confidentiality.
Confidentiality is ensured when a system guarantees that unauthorized acquisition of information is not possible. The principal methods to protect confidentiality are based on data encryption.
Integrity
Integrity is ensured when a system guarantees that unauthorized subjects cannot manipulate data without being detected. To prevent data manipulation and preserve data integrity within the system, Message Authentication Codes (MACs) and Digital Signatures are used. The sub-goals of integrity include:
- Correctness: Data has not been modified. Data may appear complete, but some bytes might have been altered.
- Completeness: Data has been fully transferred. Data may be correct, but if not entirely sent, it can cause system misbehavior.
- Freshness: Data is not outdated. Data may be complete and unmodified, but if too much time has passed, it may be risky to use as it could be obsolete.
Availability
Availability is ensured when a system enforces that authenticated and authorized subjects are not denied access by an unauthorized attacker. This can occur when system availability is reduced due to high resource consumption during peak times or through malicious overload attacks.
Memory segments
The C standard does not specify a particular memory layout, so each microcontroller defines how it organizes its segments and the data mapping used.
Code segment
The code segment, also known as the text segment, is a memory area that contains frequently executed code. This segment is typically read-only to prevent the risk of being overwritten, which could cause issues like *buffer overflows. The code segment does not contain variables (global or local) and, due to its fixed size, is often stored in ROM in embedded systems.
Data segments
These data segments contain global variables and static variables, both initialized and uninitialized, as well as constants. These segments can be classified as read-only areas for constant variables and read-write areas for variables that can be modified at runtime. The size of this segment is determined by the values in the source code and does not change at runtime.
Heap segment
The heap segment is used to support dynamic memory allocation. In C, this is done using the malloc, calloc, and realloc functions. Memory allocation on the heap is comparatively slower to access and write to compared to other memory segments. In the heap, memory remains allocated until it is deallocated, and if it is not, memory leaks can easily occur. The heap can be seen as a large pool of memory where allocations are made randomly, which can lead to fragmentation when linking different types of structures and data types.
Stack segment
The stack segment is used to store variables created within functions, which go out of scope when the function exits. This includes local variables, function arguments, and return addresses. The stack operates on a LIFO (Last In, First Out) structure, and memory allocation on the stack is very fast. However, the stack is relatively small, so deeply nested functions or functions without proper bounds can cause a stack overflow.
Stack buffer overflow
The stack segment is organized as a LIFO (Last In, First Out) structure. It is used to store variables created within functions, which go out of scope when the function exits:
- Local variables of the function.
- Arguments passed to the function.
- Return addresses.
The stack is relatively small, so numerous nested function calls, recursion without boundaries, or very large local arrays can cause a stack overflow.
Essentially, the stack structure relies on a small memory area where scoped variables are continuously defined. The amount of memory allocated is defined at compile time, but there are cases where a user can cause an overflow of the size of a variable or pointer, affecting other variables and pointers both inside and outside the stack. Here are some examples:
- An overflow of the size of an array can overwrite a local variable within the stack scope, leading to unexpected behavior.
- An overflow of the size of an array can overwrite the instruction pointer, allowing an attacker to redirect execution to their code or to unexpected code, causing unpredictable behavior.
- An overflow of the size of an array can cause a stack overflow (exceeding the stack’s defined boundaries), potentially affecting other memory areas unintentionally modified by an attacker.
Pointer Subterfuge
Function pointers can be overwritten to transfer control to an attacker’s code, causing issues in the software. When the program executes the function pointer call, the attacker’s code runs instead of the original code.
Function pointers can also be modified when they are the target of an assignment, allowing attackers to control the address and alter other memory locations.
Function pointers can be vulnerable in two main ways:
- They can be directed to execute code from an attacker, compromising system integrity called Code Injection.
- They can be directed to execute original code that triggers a preliminary failure, leading to system misbehavior, called Arc Injection. Arc Injection can occur when the return pointer is altered to point to any part of the running program. For example, it could be redirected to a function that would normally require security authorization. Code injection occurs, for example, when a pointer is overwritten with malicious code provided by an attacker. This code is typically included in the same payload that triggered the Arc Injection.
Buffer overflow exploits.
There are several exploits that can be used to overwrite function pointers, with buffer overflows being one of the most common. These buffer overflows frequently occur due to a failure to restrict writing within array boundaries, often caused by improperly bounded loops.
For a buffer overflow to affect a function pointer, the buffer must be located in the same segment as the target function pointer. Executables typically have two types of data segments: Data and BSS. The Data segment contains all initialized global variables and constants, while the BSS segment contains all uninitialized global variables.
Recommendations
- Always ensure that sufficient memory is allocated for a specific purpose.
- Validate data from external interfaces before allocating memory for or based on them.
- Avoid allowing attacker-controlled data to determine the size of allocated memory.
Heap Smashing
When a buffer overflow occurs in a variable stored on the heap, it can result in entries in the Global Offset Table (GOT) being overwritten, corrupting the resolution of addresses for dynamically linked libraries. It’s also possible for the program flow to be manipulated, affecting data on the heap or the metadata of the memory allocator.
Integer overflow
Due to the lack of distinction in the optimization phase, compilers may not differentiate between signed and unsigned variables, potentially optimizing in ways that cause logical issues and integer overflows. For example, if an upper limit expects an unsigned variable but a signed variable is used, the interpretation may result in the signed value wrapping around, causing the larger supported value to be considered lower, thus passing an unintended condition.
For more examples of integer exploits, consider:
- Integer overflow or wraparound
- Integer underflow.
- Unexpected sign extension.
- Signed to unsigned conversion error.
- Unsigned to signed conversion error.
- Numeric truncation error.
To minimize the risk of integer exploits, follow these recommendations:
- Use unsigned integers whenever possible unless the logic specifically requires a signed integer. Avoid using int as the default.
- Carefully consider the implications of converting from signed to unsigned.
- Be mindful of conversions from a smaller integer domain.
- Perform boundary checks to prevent integer overflows.
- Never deliberately rely on signed integer overflows.
Methods of security.
Input and Output validation
The process of developing input data validation involves:
- Determining the trust boundaries.
- Identifying input sources.
- Specifying minimum and maximum values, valid content, and size range.
- Validating that the input data adheres to these specifications.
- Ensuring that your code can handle valid input values effectively.
Canonicalizacion y Normalizacion
The goal of canonicalization and normalization of input is to reduce it to its simplest and most unified form. These steps are recommended to be performed before input validation, as they simplify the complexity of subsequent validations.
Canonicalization is the process of converting data to its simplest form.
Normalization is the process of standardizing input data by eliminating differences that do not affect the meaning of the data. Example of Canonicalization and Normalization
- Input value: “ Gearstech/CHE …$ /”
- Canonicalization: “ Gearstech/CHE…$/” -> Removes extra spaces and unsupported characters.
- Normalization: “GearsTech/CHE666…$” -> Standardizes the name “GearsTech” to match the predefined name of the association.
Input Output Sanitization
The goal of sanitization is to ensure that the data is well-formed by removing or replacing characters that cannot be part of a username (in this case).
Sanitization: “GearsTech/CHE666” -> Removes unexpected characters from a username, allowing only alphanumeric characters.
Input Validation
Validation is the final step in input validation, ensuring that the input data falls within the expected domain.
Validation: “GearsTech/CHE66” -> Only two numbers are expected, so one is removed.