.NET Control Flow Analysis (1)-Getting Started

Category: Tag:

I personally think that the three most difficult protections under .NET should be:

IL-level virtual machine
Control flow confusion
Jit Hook

If these protections are not lifted, any of them will greatly hinder our analysis of an assembly using dnSpy.

Today we will briefly study one of the difficulties I think is “control flow”. The premise of learning control flow analysis is that you have a certain understanding of IL. If you understand, then congratulations. For some simple control flow obfuscation, you can write your own anti-obfuscation tool. If you don’t understand it, it’s okay. I have to take it slowly. I have spent some time studying this myself. To read this article, you must open vs, write the code yourself, and follow the article step by step.

Before I wrote the control flow analysis project, the purpose was to learn control flow, and I wanted to understand the implementation and principles of the lowest level. Although there are ready-made projects, such as dnSpy’s de4dot.blocks, de4dot is open source under the GPLv3 agreement, and the project itself is not maintained. The key is that there are not many comments in the project. If there are any bugs that can only be fixed by yourself, it is better to write a project yourself and start writing from scratch.

The control flow is very abstract, <del>I am not particularly familiar with those drawing libraries</del>, so we just use the simplest method to convert all the control flow into string information and display it for us to debug. I remember in a previous post I posted, a classmate said that he wanted a video commentary. It just happens that the explanation of the control flow is not clear with a few lines of text, so I will record a video and add some things that are not clear in the article. This morning, I learned the dirty graph library. At the end of the article, there is the code to draw the control flow! ! !

This article is just an introduction and will not involve any control flow confusion. We only need to build a framework that can analyze the control flow, express the control flow, and feed the analysis and processing results back to the method body.

The entire project has a lot of code, go to blank lines and comments. So only the more critical code will be mentioned below, and the rest of the code can be viewed as an attachment. The attached code is complete and can be used directly by adding it to the newly created project.

Define structure
Reminder: The structure defined in the article is slightly different from that in de4dot.blocks, but the idea is the same

To build a house, you must make a good frame, and a frame must also choose materials. We need to define a structure that can convert the linear instruction flow, exception handling clauses, and local variables in the method body into a structure that is easier to analyze.

For example, IDA will be truncated before jumping to become a code fragment, and this fragment can become a block. We can use various jump statements to convert a method body into many blocks, that is, “blocks”.

Of course, things are far from that simple. When we return to .NET, we can find that there are exception handling clauses in .NET.

For example, there are two red boxes in the figure. The first is try and the second is catch. If the execution of the try is normal, then the catch will not be entered, so we also need to rely on exception handling clauses to block the method body.

So is it over? Certainly not. A try or a catch can be called a scope. We can jump from a smaller scope of the child scope to a larger scope of the parent scope, but we cannot jump from a larger scope of the parent scope To a smaller sub-scope.

The code in the first picture is illegal, and the code in the second picture is legal.

From the IL level, we can only jump to the first statement in a scope, and cannot jump to other statements in the scope. What does it mean?

The second statement of the try block that br jumps to in the first picture is illegal.

What should I do if I want to leave the try block? Use the leave command.

In order to prevent this situation in C#, there is also the above-mentioned “large range cannot enter a small range, and a small range can enter a large range”.

The catch block will not be directly referenced by any jump instruction, only when an exception occurs in the try block, it will enter the catch block.

So far, we can define the structure like this (part of the code has been omitted):

 

public enum ScopeBlockType {
    Normal,
    Try,
    Filter,
    Catch,
    Finally,
    Fault
}
public interface IBlock {
    IBlock Scope { get; set; }
    bool HasExtraData { get; }
    void PushExtraData(object obj);
    void PopExtraData();
    T PeekExtraData<T>();
}
public abstract class BlockBase : IBlock {
    private IBlock _scope;
    private Stack<object> _extraDataStack;
    public IBlock Scope {
        get => _scope;
        set => _scope = value;
    }
    public bool HasExtraData => _extraDataStack != null && _extraDataStack.Count != 0;
    public T PeekExtraData<T>() {
        return (T)_extraDataStack.Peek();
    }
    public void PopExtraData() {
        _extraDataStack.Pop();
    }
    public void PushExtraData(object obj) {
        if (_extraDataStack == null)
            _extraDataStack = new Stack<object>();
        _extraDataStack.Push(obj);
    }
}
public sealed class BasicBlock : BlockBase {
    private List<Instruction> _instructions;
    private OpCode _branchOpcode;
    private BasicBlock _fallThrough;
    private BasicBlock _conditionalTarget;
    private List<BasicBlock> _switchTargets;
    public List<Instruction> Instructions {
        get => _instructions;
        set => _instructions = value;
    }
    public bool IsEmpty => _instructions.Count == 0;
    public OpCode BranchOpcode {
        get => _branchOpcode;
        set => _branchOpcode = value;
    }
    public BasicBlock FallThrough {
        get => _fallThrough;
        set => _fallThrough = value;
    }
    public BasicBlock ConditionalTarget {
        get => _conditionalTarget;
        set => _conditionalTarget = value;
    }
    public List<BasicBlock> SwitchTargets {
        get => _switchTargets;
        set => _switchTargets = value;
    }
}
public abstract class ScopeBlock : BlockBase {
    protected List<IBlock> _blocks;
    protected ScopeBlockType _type;
    public List<IBlock> Blocks {
        get => _blocks;
        set => _blocks = value;
    }
    public IBlock FirstBlock {
        get => _blocks[0];
        set => _blocks[0] = value;
    }
    public IBlock LastBlock {
        get => _blocks[_blocks.Count - 1];
        set => _blocks[_blocks.Count - 1] = value;
    }
    public ScopeBlockType Type {
        get => _type;
        set => _type = value;
    }
}
public sealed class TryBlock : ScopeBlock {
    private readonly List<ScopeBlock> _handlers;
    public List<ScopeBlock> Handlers => _handlers;
}
public sealed class FilterBlock : ScopeBlock {
    private HandlerBlock _handler;
    public HandlerBlock Handler {
        get => _handler;
        set => _handler = value;
    }
}
public sealed class HandlerBlock : ScopeBlock {
    private ITypeDefOrRef _catchType;
    public ITypeDefOrRef CatchType {
        get => _catchType;
        set => _catchType = value;
    }
}
public sealed class MethodBlock : ScopeBlock {
    private List<Local> _variables;
    public List<Local> Variables {
        get => _variables;
        set => _variables = value;
    }
}

Let me explain this definition. There is a very strange BlockBase and ExtraData. This ExtraData can be understood as additional data. Sometimes when we analyze the control flow, we need to bind a piece of data and a block together. At this time, ExtraData is It worked. Because it is not binding data once, there may be a lot of data to be bound, so we need Stack<T>, which is the stack type, which can be first-in and last-out, which is very in line with our programming habits. Push data during initialization and Peek when it is used. Just finished Pop.

BasicBlock is the smallest unit, called the basic block. For convenience, if the last instruction of the basic block changes the control flow, we will delete the last instruction in the basic block, assign it to the field _branchOpcode, and then assign the jump target to _fallThrough, _conditionalTarget, _switchTargets. In this way, it becomes much more convenient for us to update the jump relationship between the control flow.

Many basic blocks together can become a scope block, that is, ScopeBlock. Of course ScopeBlock can also be nested within each other, for example, one ScopeBlock contains another ScopeBlock.

 

Instruction stream to block
The block in the subtitle refers to the structure we defined earlier, for example, BasicBlock is called a basic block.

Returning to the disassembled control flow graph shown by IDA, we can find that the control flow is actually a directed graph.

This directed graph may have loops, self-loops, and one point may be connected to many points. We can use some of the thoughts of “graphs” to deal with (this is not difficult, search for BFS, DFS, directed graphs, and just figure out these 3)

We add a class called “BlockParser” and add the following code:

public sealed class BlockParser {
    private readonly IList<Instruction> _instructions;
    private readonly IList<ExceptionHandler> _exceptionHandlers;
    private readonly IList<Local> _variables;

    public BlockParser(IList<Instruction> instructions, IList<ExceptionHandler> exceptionHandlers, IList<Local> variables) {
        if (instructions == null)
            throw new ArgumentNullException(nameof(instructions));
        if (exceptionHandlers == null)
            throw new ArgumentNullException(nameof(exceptionHandlers));
        if (variables == null)
            throw new ArgumentNullException(nameof(variables));
        if (HasNotSupportedInstruction(instructions))
            throw new NotSupportedException("存在不受支持的指令。");

        _instructions = instructions;
        _exceptionHandlers = exceptionHandlers;
        _variables = variables;
    }

    private static bool HasNotSupportedInstruction(IEnumerable<Instruction> instructions) {
        foreach (Instruction instruction in instructions)
            switch (instruction.OpCode.Code) {
            case Code.Jmp:
                return true;
            }
        return false;
    }
}

We don’t need to process the jmp instruction, because it is very troublesome to process, and this instruction will not appear in normal .NET programs. If you want to know what jmp is, you can see MSDN. The jmp in IL and the assembled jmp are not the same thing.

First we have to analyze the potential entrance. Why is it potential? Because the method body may be confused, full of instructions that cannot be executed, such as this:

The part in the red box is the basic block that will not be used, although the first statement IL_0005 in the red box is indeed an entry point.

We add fields:

private bool[] _isEntrys;
private int[] _blockLengths;

_isEntrys indicates whether an instruction is a potential entry point. If it is, there will be a record in _blockLengths, indicating that the basic block represented by this entry point has several instructions.

Then we can scan directly from the beginning to the end of the method body to get the two information mentioned above.

private void AnalyzeEntrys() {
    _isEntrys = new bool[_instructions.Count];
    _isEntrys[0] = true;
    for (int i = 0; i < _instructions.Count; i++) {
        Instruction instruction;

        instruction = _instructions[i];
        switch (instruction.OpCode.FlowControl) {
        case FlowControl.Branch:
        case FlowControl.Cond_Branch:
        case FlowControl.Return:
        case FlowControl.Throw:
            if (i + 1 != _instructions.Count)
                // If the current is not the last instruction, then the next instruction is the new entry
                _isEntrys[i + 1] = true;
            if (instruction.OpCode.OperandType == OperandType.InlineBrTarget)
                // brX
                _isEntrys[_instructionDictionary[(Instruction)instruction.Operand]] = true;
            else if (instruction.OpCode.OperandType == OperandType.InlineSwitch)
                // switch
                foreach (Instruction target in (IEnumerable<Instruction>)instruction.Operand)
                    _isEntrys[_instructionDictionary[target]] = true;
            break;
        }
    }
    foreach (ExceptionHandlerInfo exceptionHandlerInfo in _exceptionHandlerInfos) {
        _isEntrys[exceptionHandlerInfo.TryStartIndex] = true;
        if (exceptionHandlerInfo.TryEndIndex != _instructions.Count)
            _isEntrys[exceptionHandlerInfo.TryEndIndex] = true;
        // try
        if (exceptionHandlerInfo.FilterStartIndex != -1)
            _isEntrys[exceptionHandlerInfo.FilterStartIndex] = true;
        // filter
        _isEntrys[exceptionHandlerInfo.HandlerStartIndex] = true;
        if (exceptionHandlerInfo.HandlerEndIndex != _instructions.Count)
            _isEntrys[exceptionHandlerInfo.HandlerEndIndex] = true;
        // handler
    }
}

Next, we need to start DFS or BFS from each entry, eliminate invalid entry points through jump instructions, and create basic blocks.

DFS, or depth-first search, has a very downside, that is, if the method body is too large and there are too many jumps, the stack may overflow, because the depth-first search is recursive. BFS, which is breadth first search, has more code than DFS, but not much. In the future, many of our algorithms will use recursive operations, so we choose DFS with less code. When we want to block, we can create a new thread and specify a 4MB 16MB stack, which will never overflow.

private void AnalyzeReferencesAndCreateBasicBlocks(int startIndex) {
    if (!_isEntrys[startIndex])
        throw new InvalidOperationException("Entrance recognition error。");

    int exitIndex;
    Instruction exit;
    int nextEntryIndex;

    exitIndex = FindExitIndex(startIndex, out _blockLengths[startIndex]);
    _basicBlocks[startIndex] = new BasicBlock(EnumerateInstructions(startIndex, _blockLengths[startIndex]));
    exit = _instructions[exitIndex];
    switch (exit.OpCode.FlowControl) {
    case FlowControl.Branch:
        // Branch block is quoted
        nextEntryIndex = _instructionDictionary[(Instruction)exit.Operand];
        if (_blockLengths[nextEntryIndex] == 0)
            // Branch block not analyzed
            AnalyzeReferencesAndCreateBasicBlocks(nextEntryIndex);
        break;
    case FlowControl.Cond_Branch:
        // The next block and branch block are quoted
        nextEntryIndex = exitIndex + 1;
        if (nextEntryIndex < _instructions.Count && _blockLengths[nextEntryIndex] == 0)
            // The next instruction is the entry of the unanalyzed new block
            AnalyzeReferencesAndCreateBasicBlocks(nextEntryIndex);
        if (exit.OpCode.OperandType == OperandType.InlineBrTarget) {
            // bxx
            nextEntryIndex = _instructionDictionary[(Instruction)exit.Operand];
            if (_blockLengths[nextEntryIndex] == 0)
                AnalyzeReferencesAndCreateBasicBlocks(nextEntryIndex);
        }
        else if (exit.OpCode.OperandType == OperandType.InlineSwitch) {
            // switch
            foreach (Instruction nextEntry in (IEnumerable<Instruction>)exit.Operand) {
                nextEntryIndex = _instructionDictionary[nextEntry];
                if (_blockLengths[nextEntryIndex] == 0)
                    AnalyzeReferencesAndCreateBasicBlocks(nextEntryIndex);
            }
        }
        break;
    case FlowControl.Call:
    case FlowControl.Next:
        // The next block is quoted
        nextEntryIndex = exitIndex + 1;
        if (_blockLengths[nextEntryIndex] == 0)
            // We don’t need to judge whether it’s the end
            // If there are no instructions such as ret, br, and throw, the end of the block is reached, indicating that the control flow of the method body is problematic
            AnalyzeReferencesAndCreateBasicBlocks(nextEntryIndex);
        break;
    }
}

This is the core code of the effective and used basic block created by DFS. Of course, the basic block created is still linear, without any block-to-block information.

Then we need to add branches to these basic blocks, that is, add the jump relationship between the blocks (there is no containment relationship):

private void AddBranchs() {
    BasicBlock nextBasicBlock;

    nextBasicBlock = null;
    for (int i = _basicBlocks.Length - 1; i >= 0; i--) {
        BasicBlock basicBlock;
        List<Instruction> instructions;
        int lastInstructionIndex;
        Instruction lastInstruction;

        basicBlock = _basicBlocks[i];
        if (basicBlock == null)
            continue;
        instructions = basicBlock.Instructions;
        lastInstructionIndex = instructions.Count - 1;
        lastInstruction = instructions[lastInstructionIndex];
        switch (lastInstruction.OpCode.FlowControl) {
        case FlowControl.Branch:
            basicBlock.BranchOpcode = lastInstruction.OpCode;
            basicBlock.FallThrough = _basicBlocks[_instructionDictionary[(Instruction)lastInstruction.Operand]];
            instructions.RemoveAt(lastInstructionIndex);
            break;
        case FlowControl.Cond_Branch:
            if (nextBasicBlock == null)
                // nextBasicBlock should not be null, because we have removed invalid code before
                throw new InvalidOperationException();
            basicBlock.BranchOpcode = lastInstruction.OpCode;
            basicBlock.FallThrough = nextBasicBlock;
            if (lastInstruction.OpCode.Code == Code.Switch) {
                Instruction[] switchTargets;

                switchTargets = (Instruction[])lastInstruction.Operand;
                basicBlock.SwitchTargets = new List<BasicBlock>(switchTargets.Length);
                for (int j = 0; j < switchTargets.Length; j++)
                    basicBlock.SwitchTargets.Add(_basicBlocks[_instructionDictionary[switchTargets[j]]]);
            }
            else
                basicBlock.ConditionalTarget = _basicBlocks[_instructionDictionary[(Instruction)lastInstruction.Operand]];
            instructions.RemoveAt(lastInstructionIndex);
            break;
        case FlowControl.Call:
        case FlowControl.Next:
            if (nextBasicBlock == null)
                throw new InvalidOperationException();
            basicBlock.BranchOpcode = OpCodes.Br;
            basicBlock.FallThrough = nextBasicBlock;
            break;
        case FlowControl.Return:
        case FlowControl.Throw:
            basicBlock.BranchOpcode = lastInstruction.OpCode;
            instructions.RemoveAt(lastInstructionIndex);
            break;
        }
        nextBasicBlock = basicBlock;
    }
}

To deal with some simple cases, you may not need to convert to a tree structure, that is, you do not need to add containment relationships between basic blocks. But in most cases, we need to convert to a tree structure (the MethodBlock we defined earlier).

The containment relationship between blocks is only related to exception handling clauses, we only need to merge the basic blocks contained in try/catch together.

Before that, we have to define a new exception handling clause structure. We can mark the entry point of a try, which is the first basic block in the try scope, to indicate that there is an exception handling clause. Because at the same entry point, there may be multiple handlers for a try block, such as:

try {
    code...
}
catch (Exception1) {
    code...
}
catch (Exception2) {
    code...
}

There may be more complicated situations, and there are nesting relationships between try:

try {
    try {
        code...
    }
    catch {
        code...
    }
    code...
}
catch (Exception1) {
    code...
}
catch (Exception2) {
    code...
}

Considering these circumstances, our structure can be defined as such. Of course, this definition is not necessary. It’s just because I prefer to define it this way, and I find it convenient. If you have a more convenient structure, you can also define it in your own way to achieve the final goal.

private sealed class LinkedExceptionHandlerInfo {
    private readonly ExceptionHandlerInfo _value;
    private readonly List<ExceptionHandlerInfo> _handlers;
    private List<LinkedExceptionHandlerInfo> _children;

    public ExceptionHandlerInfo TryInfo => _value;

    public List<ExceptionHandlerInfo> Handlers => _handlers;

    public bool HasChildren => _children != null && _children.Count != 0;

    public List<LinkedExceptionHandlerInfo> Children {
        get {
            if (_children == null)
                _children = new List<LinkedExceptionHandlerInfo>();
            return _children;
        }
    }
}

Then, we need to analyze the relationship between the exception handling clauses and save it in the structure just defined.

Let’s add another false exception handling clause called dummy. Dummy is the parent of any exception handling clause. This allows us to perform recursive operations to merge all blocks in the scope into one scope.

private LinkedExceptionHandlerInfo _linkedExceptionHandlerInfoRoot;

private void AnalyzeExceptionHandlers() {
    _linkedExceptionHandlerInfoRoot = new LinkedExceptionHandlerInfo(new ExceptionHandlerInfo(0, int.MaxValue));
    // Create Dummy
    foreach (ExceptionHandlerInfo exceptionHandlerInfo in _exceptionHandlerInfos) {
        bool isTryEqual;
        LinkedExceptionHandlerInfo scope;

        if (!exceptionHandlerInfo.IsVisited) {
            Debug.Assert(false);
            // Under normal circumstances, we will not encounter invalid exception handling information, and currently no shell will add invalid exception handling information

            continue;
        }
        scope = _linkedExceptionHandlerInfoRoot.FindParent(exceptionHandlerInfo, out isTryEqual);
        if (isTryEqual)
            scope.Handlers.Add(exceptionHandlerInfo);
        else {
            List<LinkedExceptionHandlerInfo> children;
            LinkedExceptionHandlerInfo child;

            children = scope.Children;
            child = new LinkedExceptionHandlerInfo(exceptionHandlerInfo);
            if (!scope.HasChildren)
                children.Add(child);
            else {
                int subChildCount;

                subChildCount = 0;
                for (int i = 0; i < children.Count; i++) {
                    LinkedExceptionHandlerInfo subChild;

                    subChild = children[i];
                    // Determine whether child is the scope of a subChild
                    if (child.TryInfo.HasChild(subChild.TryInfo)) {
                        child.Children.Add(subChild);
                        subChildCount++;
                    }
                    else
                        // Advance subChild
                        children[i - subChildCount] = subChild;
                }
                children.RemoveRange(children.Count - subChildCount, subChildCount);
                children.Add(child);
            }
        }
    }
}

Remember the interface IBlock we declared earlier? All the block structures must inherit from IBlock, indicating that this is a block.

We add a field _blocks to represent the array of this interface.

private IBlock[] _blocks;

private void CombineExceptionHandlers(LinkedExceptionHandlerInfo linkedExceptionHandlerInfo) {
    ExceptionHandlerInfo tryInfo;
    TryBlock tryBlock;

    if (linkedExceptionHandlerInfo.HasChildren)
        // Find the smallest exception handling block
        foreach (LinkedExceptionHandlerInfo child in linkedExceptionHandlerInfo.Children)
            CombineExceptionHandlers(child);
    tryInfo = linkedExceptionHandlerInfo.TryInfo;
    tryBlock = new TryBlock(EnumerateNonNullBlocks(tryInfo.TryStartIndex, tryInfo.TryEndIndex));
    RemoveBlocks(tryInfo.TryStartIndex, tryInfo.TryEndIndex);
    _blocks[tryInfo.TryStartIndex] = tryBlock;
    // try
    foreach (ExceptionHandlerInfo handlerInfo in linkedExceptionHandlerInfo.Handlers) {
        AddHandler(tryBlock, handlerInfo);
        RemoveBlocks(handlerInfo.FilterStartIndex == -1 ? handlerInfo.HandlerStartIndex : handlerInfo.FilterStartIndex, handlerInfo.HandlerEndIndex);
    }
    // filter/handler
}

In this way, we get the containment relationship between the blocks and have a complete tree structure. Next, remove the null in the _blocks array, which is a MethodBlock.

Display block as text
The code is not BUG-free after it is written, and it may be BUG-free after many debugging. Control flow is inherently abstract, and without special treatment, you can’t see what branches are there and where it flows from and where you are when you look at a river. We use one of the simplest methods to convert to a string to display this block.

Let’s first add a Helper class called BlockEnumerator, which can help us traverse all blocks in an IBlock.

It looks like this:

public abstract class BlockEnumerator {
    protected void Enumerate(IEnumerable<IBlock> blocks);
    protected void Enumerate(IBlock block);
    protected virtual void OnBasicBlock(BasicBlock basicBlock);
    protected virtual void OnScopeBlockEnter(ScopeBlock scopeBlock);
    protected virtual void OnScopeBlockLeave(ScopeBlock scopeBlock);
    protected virtual void OnTryBlockEnter(TryBlock tryBlock);
    protected virtual void OnTryBlockLeave(TryBlock tryBlock);
    protected virtual void OnFilterBlockEnter(FilterBlock filterBlock);
    protected virtual void OnFilterBlockLeave(FilterBlock filterBlock);
    protected virtual void OnHandlerBlockEnter(HandlerBlock handlerBlock);
    protected virtual void OnHandlerBlockLeave(HandlerBlock handlerBlock);
    protected virtual void OnMethodBlockEnter(MethodBlock methodBlock);
    protected virtual void OnMethodBlockLeave(MethodBlock methodBlock);
}

We inherit this class, write a class called BlockPrinter, rewrite the virtual function of OnXXBlockXX in the base class. For example, when encountering basic blocks, we can do this:

protected override void OnBasicBlock(BasicBlock basicBlock) {
    StringBuilder branchInfo;

    if (_needNewLine)
        _buffer.AppendLine();
    WriteLine("// " + GetBlockIdString(basicBlock) + (basicBlock.IsEmpty ? " (empty)" : string.Empty));
    for (int i = 0; i < basicBlock.Instructions.Count; i++)
        WriteLine(basicBlock.Instructions[i].ToString());
    branchInfo = new StringBuilder();
    branchInfo.Append("// opcode:" + basicBlock.BranchOpcode.ToString());
    if (basicBlock.BranchOpcode.FlowControl == FlowControl.Branch)
        branchInfo.Append(" | fallthrough:" + GetBlockIdString(basicBlock.FallThrough));
    else if (basicBlock.BranchOpcode.FlowControl == FlowControl.Cond_Branch) {
        branchInfo.Append(" | fallthrough:" + GetBlockIdString(basicBlock.FallThrough));
        if (basicBlock.BranchOpcode.Code == Code.Switch) {
            branchInfo.Append(" | switchtarget:{");
            foreach (BasicBlock target in basicBlock.SwitchTargets)
                branchInfo.Append(GetBlockIdString(target) + " ");
            branchInfo[branchInfo.Length - 1] = '}';
        }
        else
            branchInfo.Append(" | condtarget:" + GetBlockIdString(basicBlock.ConditionalTarget));
    }
    WriteLine(branchInfo.ToString());
    _needNewLine = true;
}

Other types of IBlock are also handled in the same way. The code is not posted, and it is included in the attachment.

Block conversion back to instruction stream
The conversion of the instruction stream to the block is a complicated process, and the conversion of the block back to the instruction stream is much simpler.

First of all, we need to convert the tree-structured block back to a linear basic block array. Is this step the opposite of the step in “Instruction stream conversion to block”?

After converting to basic blocks, we can more easily generate jump statements and exception handling clauses in metadata.

We first add a class called BlockInfo, an instance of this class should be added to each basic block as additional data.

If a basic block is the entry point of an exception handling clause, then the _tryBlocks of the BlockInfo of this basic block will not be null, and we can use additional information to generate exception handling clauses.

private sealed class BlockInfo {
    private readonly int _index;
    private readonly Instruction _branchInstruction;
    private readonly List<TryBlock> _tryBlocks;
    private bool _canSkip;

    public int Index => _index;

    public Instruction BranchInstruction => _branchInstruction;

    public List<TryBlock> TryBlocks => _tryBlocks;

    /// <summary>
    /// Indicates whether the current block can be skipped (the current block must have only one br instruction, and the target of br is the next basic block)

    /// </summary>
    public bool CanSkip {
        get => _canSkip;
        set => _canSkip = value;
    }
}

Add another class called BlockLayouter. This class can convert MethodBlock into a collection of many basic blocks, and add additional data to the basic blocks

private sealed class BlockLayouter : BlockEnumerator {
    private readonly List<BasicBlock> _basicBlocks;
    private readonly List<TryBlock> _lastTryBlocks;
    private int _index;

    public BlockLayouter(List<BasicBlock> basicBlocks) {
        if (basicBlocks == null)
            throw new ArgumentNullException(nameof(basicBlocks));

        _basicBlocks = basicBlocks;
        _lastTryBlocks = new List<TryBlock>();
    }

    public void LayoutAndCreateBlockInfo(MethodBlock methodBlock) {
        Enumerate(methodBlock);
    }

    protected override void OnBasicBlock(BasicBlock basicBlock) {
        basicBlock.PushExtraData(new BlockInfo(_index, _lastTryBlocks));
        _basicBlocks.Add(basicBlock);
        _lastTryBlocks.Clear();
        _index++;
    }

    protected override void OnTryBlockEnter(TryBlock tryBlock) {
        _lastTryBlocks.Add(tryBlock);
    }
}

We use the converted List<BasicBlock> to generate the instruction stream.

private void GenerateInstructions() {
    _instructions = new List<Instruction>();
    for (int i = 0; i < _basicBlocks.Count - 1; i++) {
        BasicBlock basicBlock;

        basicBlock = _basicBlocks[i];
        if (basicBlock.IsEmpty && basicBlock.BranchOpcode.Code == Code.Br && basicBlock.FallThrough == _basicBlocks[i + 1])
            basicBlock.PeekExtraData<BlockInfo>().CanSkip = true;
    }
    // Set up CanSkip
    foreach (BasicBlock basicBlock in _basicBlocks) {
        Instruction branchInstruction;

        branchInstruction = basicBlock.PeekExtraData<BlockInfo>().BranchInstruction;
        branchInstruction.OpCode = basicBlock.BranchOpcode;
        if (branchInstruction.OpCode.FlowControl == FlowControl.Branch)
            branchInstruction.Operand = GetFirstInstruction(basicBlock.FallThrough);
        else if (branchInstruction.OpCode.FlowControl == FlowControl.Cond_Branch)
            if (branchInstruction.OpCode.Code == Code.Switch) {
                Instruction[] switchTargets;

                switchTargets = new Instruction[basicBlock.SwitchTargets.Count];
                for (int i = 0; i < switchTargets.Length; i++)
                    switchTargets[i] = GetFirstInstruction(basicBlock.SwitchTargets[i]);
                branchInstruction.Operand = switchTargets;
            }
            else
                branchInstruction.Operand = GetFirstInstruction(basicBlock.ConditionalTarget);
    }
    // Add jump instructions
    for (int i = 0; i < _basicBlocks.Count; i++) {
        BasicBlock basicBlock;
        BlockInfo blockInfo;
        Instruction branchInstruction;
        BasicBlock nextBasicBlock;

        basicBlock = _basicBlocks[i];
        blockInfo = basicBlock.PeekExtraData<BlockInfo>();
        if (blockInfo.CanSkip)
            continue;
        branchInstruction = blockInfo.BranchInstruction;
        nextBasicBlock = i + 1 == _basicBlocks.Count ? null : _basicBlocks[i + 1];
        if (branchInstruction.OpCode.Code == Code.Br) {
            AppendInstructions(basicBlock, basicBlock.FallThrough == nextBasicBlock);
            // When it is an unconditional jump instruction, if the direct block is the next block, then the branch instruction can be omitted
        }
        else if (branchInstruction.OpCode.FlowControl == FlowControl.Cond_Branch) {
            AppendInstructions(basicBlock, false);
            if (basicBlock.FallThrough != nextBasicBlock)
                // Need to fix the jump
                _instructions.Add(new Instruction(OpCodes.Br, GetFirstInstruction(basicBlock.FallThrough)));
        }
        else
            AppendInstructions(basicBlock, false);

    }
}

private void AppendInstructions(BasicBlock basicBlock, bool canSkipBranchInstruction) {
    if (!basicBlock.IsEmpty)
        _instructions.AddRange(basicBlock.Instructions);
    if (!canSkipBranchInstruction)
        _instructions.Add(basicBlock.PeekExtraData<BlockInfo>().BranchInstruction);
}

CanSkip in the code represents whether this block is a basic block with only br jump instructions and no other instructions, and the target of br is the next basic block. If it is, this basic block can be omitted and not processed at all. If the jump instruction of a basic block is br and the target is also the next basic block, then we only need to add other instructions of this basic block, without adding jump instructions.

The next step is to generate exception handling clauses. The core code is this:

private void GenerateExceptionHandlers() {
    _exceptionHandlers = new List<ExceptionHandler>();
    for (int i = _basicBlocks.Count - 1; i >= 0; i--) {
        // The innermost exception block should be declared first. (Error: 0x801318A4)
        // So we traverse in reverse order
        BasicBlock basicBlock;
        List<TryBlock> tryBlocks;

        basicBlock = _basicBlocks[i];
        tryBlocks = basicBlock.PeekExtraData<BlockInfo>().TryBlocks;
        if (tryBlocks == null || tryBlocks.Count == 0)
            continue;
        for (int j = tryBlocks.Count - 1; j >= 0; j--) {
            TryBlock tryBlock;

            tryBlock = tryBlocks[j];
            foreach (ScopeBlock scopeBlock in tryBlock.Handlers)
                if (scopeBlock is FilterBlock) {
                    FilterBlock filterBlock;

                    filterBlock = (FilterBlock)scopeBlock;
                    _exceptionHandlers.Add(GetExceptionHandler(tryBlock, GetFirstBasicBlock(filterBlock.FirstBlock), filterBlock.Handler));
                }
                else {
                    HandlerBlock handlerBlock;

                    handlerBlock = (HandlerBlock)scopeBlock;
                    _exceptionHandlers.Add(GetExceptionHandler(tryBlock, null, handlerBlock));
                }
        }
    }
}

There is also a list of local variables used in the generation, which is not posted here. In fact, my code for generating local variables may be a bit troublesome. I changed it directly to traverse all basic blocks. It was much easier to add local variables to the list when I found the first time I encountered them. The whole project is packaged, so I don’t bother to change it.

Draw control flow
This is a newly added section. I also temporarily learned how to draw control flow this morning, and I will post the rendering directly here. Then this drawn control flow does not seem to show the relationship between exception handling blocks.

The red line indicates an unconditional jump, and the green line indicates a conditional jump. It doesn’t seem to be useful to draw it, but it looks more comfortable.

 

Download

Control flow analysis project download

Control flow drawing project download address (no source code, decompile the ConsoleApp1.exe in the compressed package)

 

 

Reviews

There are no reviews yet.

Be the first to review “.NET Control Flow Analysis (1)-Getting Started”

Your email address will not be published. Required fields are marked *