Ninja Ferret : MSIL - 1. First steps

The random thoughts of a software developer

MSIL - 1. First steps

In this first post in the series I want to start simple: creating a basic calculator class that can Add and Subtract, using Reflection.Emit, to give a basic introduction to IL and some of the operations that I have encountered. In subsequent posts I will build on this knowledge until I reach a specific goal: dynamically generating WCF clients at runtime. All of the source code for this post can be found here.

What does IL look like?

Not knowing IL, I decided that the first thing to do was to write the class properly in C#, compile it, then use ILDASM to take a look at the resulting IL.

public class Calculator
{
   public int Add(int a, int b) { return a + b; }
   public int Subtract(int a, int b) { return a - b; }
}

Just looking at the Add method for now:


.method public hidebysig instance int32  Add(int32 a,
                     int32 b) cil managed
{
  // Code size       9 (0x9)
  .maxstack  2
  .locals init ([0] int32 CS$1$0000)
  IL_0000:  nop
  IL_0001:  ldarg.1
  IL_0002:  ldarg.2
  IL_0003:  add
  IL_0004:  stloc.0
  IL_0005:  br.s       IL_0007
  IL_0007:  ldloc.0
  IL_0008:  ret
} // end of method Calculator::Add

What does this all mean?

.maxstack 2 tells the runtime to expect a maximum of 2 items on the evaluation stack at any one time during the execution of this method.
.locals init ([0] int32 CS$1$0000) declares a local variable of type Int32, at index 0, that will hold the result of the sum. CS$1$0000 is a compiler-generated name, and init means the local is initialised to zero.
IL_0000: nop is an empty operation that has no effect; compilers typically emit it in Debug builds to give the debugger somewhere to place a breakpoint.
IL_0001: ldarg.1 pushes argument 1 (a) onto the stack. Note that for an instance method argument 0 is the implicit this parameter.
IL_0002: ldarg.2 pushes argument 2 (b) onto the stack.
IL_0003: add pops the two integers off the stack, adds them together and pushes the result onto the stack.
IL_0004: stloc.0 pops the value off the stack and stores it in the local variable at index 0 (declared above).
IL_0005: br.s IL_0007 transfers execution to the specified location. It is redundant here, as execution is being transferred to the very next instruction; it is an artefact of a Debug build, and an optimised Release build typically reduces the whole method to ldarg.1, ldarg.2, add, ret.
IL_0007: ldloc.0 loads the value from local variable 0 and pushes it onto the stack.
IL_0008: ret returns the value on top of the stack to the caller.
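To make the stack behaviour above concrete, here is a toy model of it. The `Run` helper and the instruction strings are hypothetical, my own invention for illustration: this is not how the CLR executes IL (the JIT compiles IL to machine code), it only traces the push and pop steps. Because the model has no byte offsets, `br.s` takes an instruction index (6, the position of `ldloc.0`) rather than `IL_0007`. It runs the ILDASM listing above alongside the shorter ldarg.1, ldarg.2, add, ret sequence that an optimised Release build typically produces, showing that the extra local and branch do not change the result. It assumes C# 9 top-level statements.

```csharp
using System;
using System.Collections.Generic;

// The ILDASM listing, with br.s retargeted to an instruction index (hypothetical encoding).
string[] debugAdd = { "nop", "ldarg.1", "ldarg.2", "add", "stloc.0", "br.s 6", "ldloc.0", "ret" };
// The shorter form an optimised build typically produces: no nop, no local, no branch.
string[] releaseAdd = { "ldarg.1", "ldarg.2", "add", "ret" };
// Subtract differs only in the arithmetic instruction.
string[] debugSubtract = { "nop", "ldarg.1", "ldarg.2", "sub", "stloc.0", "br.s 6", "ldloc.0", "ret" };

Console.WriteLine(Run(debugAdd, 1, 2));        // 3
Console.WriteLine(Run(releaseAdd, 1, 2));      // 3
Console.WriteLine(Run(debugSubtract, 10, 7));  // 3

// Toy evaluation-stack interpreter for the handful of instructions used above.
static int Run(string[] program, int arg1, int arg2)
{
    var stack = new Stack<int>();
    var locals = new int[1];   // like ".locals init": local 0 starts at zero
    int pc = 0;                // index of the next instruction to execute
    while (true)
    {
        string[] parts = program[pc].Split(' ');
        pc++;
        switch (parts[0])
        {
            case "nop": break;                                  // no effect
            case "ldarg.1": stack.Push(arg1); break;            // push a
            case "ldarg.2": stack.Push(arg2); break;            // push b
            case "add": stack.Push(stack.Pop() + stack.Pop()); break;
            case "sub":
            {
                int right = stack.Pop();                        // b was pushed last
                int left = stack.Pop();
                stack.Push(left - right);
                break;
            }
            case "stloc.0": locals[0] = stack.Pop(); break;     // pop into local 0
            case "ldloc.0": stack.Push(locals[0]); break;       // push local 0
            case "br.s": pc = int.Parse(parts[1]); break;       // jump to an instruction index
            case "ret": return stack.Pop();                     // return the top of the stack
            default: throw new InvalidOperationException($"Unknown instruction: {parts[0]}");
        }
    }
}
```

Note that for `sub` the order of the pops matters: the second argument is on top of the stack, so it is popped first.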

Generating this at runtime...

I am now ready to begin auto-generating the above IL and I will be using the following basic pattern:

Create a dynamic assembly to hold the new types
Create the TypeBuilder to generate the type
Create each required method using a MethodBuilder
Create the actual type

1. Creating the dynamic assembly

Before starting to generate types I have to first create an assembly and module in which to hold them:

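using System.Reflection;
using System.Reflection.Emit;
using System.Threading;
// .NET Framework API: .NET Core / .NET 5+ has no AppDomain.DefineDynamicAssembly or RunAndSave
var assemblyName = new AssemblyName {Name = "Calculator"};
var assemblyBuilder = Thread.GetDomain().DefineDynamicAssembly(assemblyName,
                            AssemblyBuilderAccess.RunAndSave);
var modBuild = assemblyBuilder.DefineDynamicModule("CalculatorModule",
                    string.Format("{0}.dll", assemblyName.Name));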

This simply defines a new dynamic assembly. AssemblyBuilderAccess.RunAndSave allows the code to be both executed and saved to the file system as an assembly. It does not save the assembly automatically (I will come to that later), but the ability to save can come in very handy when debugging the generated code. The alternative, AssemblyBuilderAccess.Run, simply allows the assembly to be executed in memory. (Note: I would normally suffix the assembly name with a GUID or timestamp so that it can be uniquely identified, especially if saving to the file system.)
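For anyone following along on .NET Core or .NET 5+, where AppDomain.DefineDynamicAssembly and RunAndSave are not available, a minimal sketch of step 1 might look like the following. It assumes C# 9 top-level statements, uses AssemblyBuilder.DefineDynamicAssembly with AssemblyBuilderAccess.Run (so the assembly lives in memory only), and applies the GUID suffix from the note above; CreateDynamicModule is a hypothetical helper name, not part of the framework.

```csharp
using System;
using System.Reflection;
using System.Reflection.Emit;

var first = CreateDynamicModule("Calculator");
var second = CreateDynamicModule("Calculator");

Console.WriteLine(first.Assembly.GetName().Name);   // Calculator_ followed by 32 hex digits
Console.WriteLine(first.Assembly.GetName().Name != second.Assembly.GetName().Name); // True

// Hypothetical helper: creates an in-memory dynamic assembly plus a module to hold types.
static (AssemblyBuilder Assembly, ModuleBuilder Module) CreateDynamicModule(string baseName)
{
    // The GUID suffix ("N" format: 32 hex digits, no dashes) keeps each generated name unique.
    var name = new AssemblyName($"{baseName}_{Guid.NewGuid():N}");
    AssemblyBuilder assembly = AssemblyBuilder.DefineDynamicAssembly(name, AssemblyBuilderAccess.Run);
    ModuleBuilder module = assembly.DefineDynamicModule(name.Name!);
    return (assembly, module);
}
```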

2. Create the TypeBuilder

The ModuleBuilder.DefineType() method is used to create the type builder:

var typeBuilder = modBuild.DefineType("Calculator",
                          TypeAttributes.Public |
                          TypeAttributes.Class |
                          TypeAttributes.AutoLayout |
                          TypeAttributes.AnsiClass |
                          TypeAttributes.BeforeFieldInit);

The TypeAttributes were simply taken from what ILDASM showed for my compiled implementation of this class. We can see that this is a public class, but there are a few interesting attributes here that I had not seen before:

TypeAttributes.AutoLayout specifies that the fields are automatically laid out by the Common Language Runtime.
TypeAttributes.AnsiClass tells the underlying runtime that LPTSTR is to be interpreted as ANSI when strings are marshalled to unmanaged code.
TypeAttributes.BeforeFieldInit states that calling static methods of the type does not force the system to initialize the type.
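One way to confirm that these flags end up on the generated type is to read them back through reflection after CreateType(). This sketch assumes .NET Core / .NET 5+ and C# 9 top-level statements; the assembly name TypeAttributesDemo is arbitrary. It also shows that Class, AutoLayout and AnsiClass all have the value zero, so including them documents intent rather than changing the type.

```csharp
using System;
using System.Reflection;
using System.Reflection.Emit;

var assemblyName = new AssemblyName("TypeAttributesDemo");
ModuleBuilder module = AssemblyBuilder
    .DefineDynamicAssembly(assemblyName, AssemblyBuilderAccess.Run)
    .DefineDynamicModule(assemblyName.Name!);

// The same combination of attributes as the DefineType call above.
TypeAttributes attributes = TypeAttributes.Public | TypeAttributes.Class |
    TypeAttributes.AutoLayout | TypeAttributes.AnsiClass | TypeAttributes.BeforeFieldInit;
Type type = module.DefineType("Calculator", attributes).CreateType()!;

// Each flag can be read back from the finished runtime type.
Console.WriteLine(type.IsPublic);       // True
Console.WriteLine(type.IsClass);        // True
Console.WriteLine(type.IsAutoLayout);   // True
Console.WriteLine(type.IsAnsiClass);    // True
Console.WriteLine(type.Attributes.HasFlag(TypeAttributes.BeforeFieldInit)); // True

// Class, AutoLayout and AnsiClass are zero-valued defaults.
Console.WriteLine((int)(TypeAttributes.Class | TypeAttributes.AutoLayout | TypeAttributes.AnsiClass)); // 0
```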

3. Create the methods

Methods are created using the TypeBuilder.DefineMethod() method to create a MethodBuilder and then using the MethodBuilder.GetILGenerator() method to retrieve the object that we will use to generate the code:

var methodBuilder = typeBuilder.DefineMethod("Add",
                    MethodAttributes.Public | MethodAttributes.Final);
methodBuilder.SetReturnType(typeof(int));
methodBuilder.SetParameters(new[] {typeof (int), typeof (int)});
methodBuilder.InitLocals = true;

var il = methodBuilder.GetILGenerator();
var label = il.DefineLabel();
il.DeclareLocal(typeof (int));
il.Emit(OpCodes.Ldarg_1);
il.Emit(OpCodes.Ldarg_2);
il.Emit(OpCodes.Add);
il.Emit(OpCodes.Stloc_0);
il.Emit(OpCodes.Br_S, label);
il.MarkLabel(label);
il.Emit(OpCodes.Ldloc_0);
il.Emit(OpCodes.Ret);

Hopefully the first few lines are self-explanatory: define the method with the chosen attributes (I'll explore method attributes in more detail in a later post), then set its return type and parameters. Setting methodBuilder.InitLocals = true tells the runtime to automatically initialise local variables to zero, which corresponds to init in .locals init. The remaining lines simply emit the operation codes identified in the IL above, with il.DeclareLocal(typeof(int)) declaring the local variable at index 0. Where you see "_1", "_2" etc. suffixes there is a more general version of the opcode that takes the index as a short second parameter to il.Emit(), e.g. il.Emit(OpCodes.Ldarg_1) can be written as il.Emit(OpCodes.Ldarg, (short)1).

The il.DefineLabel(), il.Emit(OpCodes.Br_S, label) and il.MarkLabel(label) calls show how to transfer execution to another point. The label is defined first, and the branch is then emitted telling the code to jump to that label, but at that point the label is not yet associated with a position in the IL; that only happens when il.MarkLabel(label) fixes the label's position. Now that the Add method has been defined, I repeat almost the same IL for the Subtract method, using OpCodes.Sub instead of OpCodes.Add.
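Since Add and Subtract differ only in one opcode, the emitting code can be factored into a helper. The sketch below is my own arrangement rather than the code from this post: DefineBinaryMethod is a hypothetical name, it uses MethodAttributes.Public | MethodAttributes.HideBySig to mirror the "public hidebysig" shown by ILDASM, it uses the general OpCodes.Ldarg, OpCodes.Stloc and OpCodes.Ldloc forms with a short index, and it targets .NET Core / .NET 5+ with C# 9 top-level statements.

```csharp
using System;
using System.Reflection;
using System.Reflection.Emit;

var assemblyName = new AssemblyName("CalculatorDemo");
ModuleBuilder module = AssemblyBuilder
    .DefineDynamicAssembly(assemblyName, AssemblyBuilderAccess.Run)
    .DefineDynamicModule(assemblyName.Name!);
TypeBuilder calculatorBuilder = module.DefineType("Calculator",
    TypeAttributes.Public | TypeAttributes.Class | TypeAttributes.BeforeFieldInit);

// Add and Subtract differ only in the arithmetic opcode.
DefineBinaryMethod(calculatorBuilder, "Add", OpCodes.Add);
DefineBinaryMethod(calculatorBuilder, "Subtract", OpCodes.Sub);

Type calculatorType = calculatorBuilder.CreateType()!;
object calculator = Activator.CreateInstance(calculatorType)!;
int sum = (int)calculatorType.GetMethod("Add")!.Invoke(calculator, new object[] { 1, 2 })!;
int difference = (int)calculatorType.GetMethod("Subtract")!.Invoke(calculator, new object[] { 10, 7 })!;
Console.WriteLine($"1 + 2 = {sum}");          // 1 + 2 = 3
Console.WriteLine($"10 - 7 = {difference}");  // 10 - 7 = 3

// Hypothetical helper: emits the Debug-style IL from the listing above for
// "public int <name>(int a, int b)", with `arithmetic` as the only varying opcode.
static void DefineBinaryMethod(TypeBuilder typeBuilder, string name, OpCode arithmetic)
{
    MethodBuilder method = typeBuilder.DefineMethod(name,
        MethodAttributes.Public | MethodAttributes.HideBySig,
        typeof(int), new[] { typeof(int), typeof(int) });
    method.InitLocals = true;                  // zero-initialise locals (.locals init)

    ILGenerator il = method.GetILGenerator();
    il.DeclareLocal(typeof(int));              // local 0 holds the result
    Label returnLabel = il.DefineLabel();      // defined now, position fixed later
    il.Emit(OpCodes.Ldarg, (short)1);          // equivalent to OpCodes.Ldarg_1
    il.Emit(OpCodes.Ldarg, (short)2);          // equivalent to OpCodes.Ldarg_2
    il.Emit(arithmetic);                       // OpCodes.Add or OpCodes.Sub
    il.Emit(OpCodes.Stloc, (short)0);          // equivalent to OpCodes.Stloc_0
    il.Emit(OpCodes.Br_S, returnLabel);        // branch to a label with no position yet
    il.MarkLabel(returnLabel);                 // the label now refers to the next instruction
    il.Emit(OpCodes.Ldloc, (short)0);          // equivalent to OpCodes.Ldloc_0
    il.Emit(OpCodes.Ret);
}
```

Passing the OpCode as a parameter keeps the Debug-style sequence in one place, so a third operation (for example OpCodes.Mul) would be a one-line addition.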

4. Produce the type

So all that is left to do is to create the type and test it:

var calculatorType = typeBuilder.CreateType();

var obj = Activator.CreateInstance(calculatorType);

var addMethod = calculatorType.GetMethod("Add", new[] {typeof (int), typeof (int)});
Console.WriteLine("1 + 2 = {0}", addMethod.Invoke(obj, new object[] {1, 2}));

var subtractMethod = calculatorType.GetMethod("Subtract", new[] { typeof(int), typeof(int) });
Console.WriteLine("10 - 7 = {0}", subtractMethod.Invoke(obj, new object[] { 10, 7 }));

I simply call typeBuilder.CreateType() once everything is defined, to create the type and begin using it. However, the remaining code is very ugly: when the application is compiled the type has not yet been created, so the compiler does not know what methods it will have, which leaves us using reflection to call every method. It would be nicer to have a known interface and automatically generate its implementation using Reflection.Emit, and that will be the topic of my next post.
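Short of the interface approach, one way to tidy up the call sites a little is to bind the generated method to a strongly typed delegate once, with Delegate.CreateDelegate, so that reflection is only used for the lookup. This still requires knowing the method's name and signature when the calling code is compiled, which is the gap a known interface would close. The sketch assumes .NET Core / .NET 5+ and C# 9 top-level statements, and to stay brief it emits a shorter Add without the local variable and branch (ldarg.1, ldarg.2, add, ret).

```csharp
using System;
using System.Reflection;
using System.Reflection.Emit;

var assemblyName = new AssemblyName("CalculatorDelegateDemo");
ModuleBuilder module = AssemblyBuilder
    .DefineDynamicAssembly(assemblyName, AssemblyBuilderAccess.Run)
    .DefineDynamicModule(assemblyName.Name!);
TypeBuilder calculatorBuilder = module.DefineType("Calculator", TypeAttributes.Public);

// A minimal Add: push both arguments, add them, return the result.
MethodBuilder addMethod = calculatorBuilder.DefineMethod("Add",
    MethodAttributes.Public | MethodAttributes.HideBySig,
    typeof(int), new[] { typeof(int), typeof(int) });
ILGenerator il = addMethod.GetILGenerator();
il.Emit(OpCodes.Ldarg_1);
il.Emit(OpCodes.Ldarg_2);
il.Emit(OpCodes.Add);
il.Emit(OpCodes.Ret);

Type calculatorType = calculatorBuilder.CreateType()!;
object calculator = Activator.CreateInstance(calculatorType)!;

// Reflection is used once, to find the method and bind it to this instance;
// afterwards each call is an ordinary delegate invocation.
var addDelegate = (Func<int, int, int>)Delegate.CreateDelegate(
    typeof(Func<int, int, int>), calculator, calculatorType.GetMethod("Add")!);

Console.WriteLine($"1 + 2 = {addDelegate(1, 2)}");   // 1 + 2 = 3
```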

Finally...

If you take anything away from this blog post, it should be that you need not be scared of Reflection.Emit: by writing a real class that is close to the implementation you want, you can identify which operations you need to do the job. Over the coming posts things will get more complicated and, hopefully, more useful as I move towards a real-world use of runtime class generation.
