Open GL with Assembly in 4kb - Notes from the Field

Written by: (Polaris & Syntax) / Northern Dragons

Contact: polaris@northerndragons.ca & syntax@northerndragons.ca

Hello all and welcome to this article on creating a 4kb OpenGL framework in Assembly and using it. This framework was used to create the intro "Trees" and was presented to the world at Assembly 2002. In making the framework, we had a lot of experiences, which we would like to share with you so you can avoid the insanity and terror of trying to write your first OpenGL intro in Assembly Language. :)

There were two primary phases to the creation of "Trees". The first phase involved the creation of the "framework", authored by Syntax of the Northern Dragons. The framework was left for a while until Polaris decided to try to code some stuff with it. Several (okay.. many) iterations later, the 4kb intro "Trees" was born. After completing the intro (having nearly lost our minds in the process), we decided to give the Internet world the documentation that we could really have used if it had existed. Use it well, as writing OpenGL enabled applications in Assembly seems to be a dying art with little information available.

Phase #1 – (The Framework) By Syntax / Northern Dragons

PART 1 - First Contact

How I came to write the initial framework is still a mystery, especially for me. Polaris and I both eagerly read the article by Upi / throb in HUGI 24 about writing a 512b intro which uses OpenGL. How can this be possible? More importantly, can we learn from this? The example was assembled using NASM, but we were using the movsd distribution of MASM32. Could we write the same thing in MASM32?? I tried and I couldn't get it to work, as much as I would have liked to. MASM32 had particular problems switching between code/data segments and putting data into the PE header sections. I'm sure some of you could actually get that to work. Maybe it could have worked, maybe not.

PART 2 - The Next Attempt

Then, one day I was looking around in the MASM32 example directory and found EXAMPLE8\MOB\NOIMPORT.EXE by MOB aka Drcmda. This application doesn't use ANY import libraries, but instead uses a technique of looking at the stack before the application was created to grab API call pointers from the runtime libraries. Polaris and I had already agreed that the import libraries were the biggest obstacle to keeping the app size down. So, using NOIMPORT as a base, I started to rewrite the code...

The plan was to retrieve the LoadLibrary and GetProcAddress calls from the OS, allocate an area of memory for the OpenGL call addresses, load the OpenGL DLL, use GetProcAddress to get the function addresses and store them in the allocated memory. I soon found this to be costly in terms of code (the size was around 2.5-3kb with OpenGL/OGU) and really, really awful to try and maintain, so what else?

PART 3 - Brighter Days Ahead :)

As I said, we had avoided importing the libraries because we thought this would bloat the app. This was the general theory, but we had been proved wrong before, so let's try it... a small test app hack later and I found that importing really didn't cause as much bloat as we first thought! Cool! The test app was rewritten a few (2) times, OpenGL was added and NeHe's Framework from Tutorial 1 was used as a basis for the OpenGL sections. Size was reduced in the following ways:

1. Use the EDIT class to base the app window on (idea from 512b Upi/throb, in HUGI 24). This removes about 100-200 bytes of code to register the window class, if I remember correctly. Another advantage of the linking is that unlike Upi/Throb's code, it works on XP without a change needed (Unicode is handled).

2. Linker options (combining sections, using the NOWIN98 option, etc). This involved lots of research on the net and experimentation.

3. Removing as much code as possible while still getting it to work. Removing ExitProcess() and cutting down the OpenGL code were the main savings.

4. Trying to use smaller opcodes for instructions. Instead of using "mov eax,0" we used xor eax,eax. This generates a smaller output. We found a few of these during the project, when Polaris was searching for bytes doing massive amounts of "left brain optimizations".
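To make the saving in item 4 concrete, here is a minimal C sketch that stores the two instructions as raw bytes and compares their lengths. The array and function names are invented for illustration. The encodings are the usual 32-bit x86 ones: B8 plus a 4-byte immediate for "mov eax,0", and 33 C0 for "xor eax,eax" (some assemblers emit the equivalent 31 C0).

```c
#include <stddef.h>

/* mov eax, 0   ->  opcode B8 followed by a 4-byte immediate */
static const unsigned char MOV_EAX_0[]   = { 0xB8, 0x00, 0x00, 0x00, 0x00 };

/* xor eax, eax ->  opcode 33 with ModRM byte C0 (eax, eax) */
static const unsigned char XOR_EAX_EAX[] = { 0x33, 0xC0 };

/* Bytes saved each time the xor form replaces the mov form. */
size_t bytes_saved_per_zeroing(void)
{
    return sizeof MOV_EAX_0 - sizeof XOR_EAX_EAX;
}
```

One thing to keep in mind: xor changes the flags while mov leaves them alone, so the swap is only safe where the flags are not needed afterwards.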

PART 4 - "Friends really don't let Friends do OpenGL in ASM"

After the framework had been built and used in an actual competition, we thought of three more techniques that we could have used to reduce the size.

1. Rearrange the data to allow greater compression via UPX or another executable compressor. Arranging the data so that identical data streams are contiguous allows for better compression. For example:

XXXXYYYYXXXXZZZZ = NOT SO GOOD COMPRESSION
XXXXXXXXYYYYZZZZ = BETTER COMPRESSION!

2. Rearrange the code to make it as sequential as possible, thus removing some calls. This gives only a small reduction; it is a small hack job for when you're desperate for space.

3. Hack UPX and re-write it to get greater compression (a technique we heard was used by Farbrausch!). Hacking UPX takes a lot of time and a talented coder. Unfortunately, Polaris was busy on the main bit of the intro and I was busy in my Real Life job, and no-one else was interested in it, so we didn't do this.
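The data ordering idea from item 1 can be illustrated with a small C sketch. It simply counts runs of identical bytes, which is only a crude stand-in for how well data packs; UPX's real compressor is far more sophisticated than this, and the function name count_runs is made up for the example.

```c
#include <stddef.h>
#include <string.h>

/* Count maximal runs of identical bytes in a string. Fewer runs
   means longer repeats, a rough proxy for "compresses better". */
size_t count_runs(const char *data)
{
    size_t len = strlen(data);
    size_t runs = 0;
    for (size_t i = 0; i < len; i++) {
        /* A new run starts at the first byte and at every change. */
        if (i == 0 || data[i] != data[i - 1])
            runs++;
    }
    return runs;
}
```

Fed the two strings from item 1, the interleaved layout gives four runs and the grouped layout gives three, even though both hold exactly the same bytes.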

So, in my hot little hands I had an OpenGL framework in assembly that weighed in at 2kb and worked on all 32-bit Windows platforms with no changes necessary. All we needed now was for it to do something cool... enter Polaris and his Tree.

Phase #2 – (The Tree) By Polaris / Northern Dragons

Part 1 – "Begin at the beginning..."

It can be pretty daunting to code something in assembly, but I found the task of developing an OpenGL enabled application a little more than just challenging. Here I was with a fancy OpenGL framework from Syntax, but no background in OpenGL, much less OpenGL in assembly, to pull the project off. The framework is straightforward to read and understand... but that doesn’t make it "ready to serve". Perhaps I started this late and don’t have my free copy of "the soon to be lost art of OpenGL assembly programming". So, with few resources in hand, I started the project.

It can be pretty frustrating to have to experiment (amongst blue screens of death etc) to figure out how something works. Shortly into the trees project I promised myself that I would share the information that I discovered. This might be old news for some of you... but I hope that a newer coder finds this information useful.

Part 2 – "It wasn’t always that way..."

The first step in the trees project involved actively seeking out some documentation on how to code in OpenGL. This is very easy to find; www.google.com or another search engine will easily link you up. Some good sites include the tutorials from nehe.gamedev.net, as well as the OpenGL redbook. The Neon Helium site does have tutorials in assembly, but I couldn’t get them to compile at first. The examples mention "The hardcode web site: bizarrecreations.webjump.com", and reference include definition files that are no longer in the masm32 package. I couldn’t find any examples or documentation that showed how to use the OpenGL include files in the masm32 package. Despite my best efforts, I couldn’t get hold of any documentation that showed what data types the routines expected on the stack. Every once in a while I check the bizarrecreations website and find that the site is still down.

After a lot of Internet searching, I did discover a sample masm32 OpenGL application that had these additional include files that every example I found seemed to use. I do not know what happened to the bizarrecreations web site, and can find only a few scant references and links to it. In any case, I started on my merry way with these include files, and have included them with this package for your reference and use. It will save you several hours of web searching. Included are:

GL.DEF – This is the meat and potatoes for the OpenGL library. Any routine that starts in gl... is found here.

GLU.DEF – This is the second half for the OpenGL Library. Any routine that starts in glu... is in here.

INCLUDE.DEF – Has a few extra (missing routines?) from the include file.

GDI32.INC, KERNEL32.INC, UNICODE.DEF, USER32.INC, WINEXTRA.DEF – I didn’t have to use these files; instead I used the files in the masm32 package for the windows functions. Guess these were required once upon a time.

Part 3 – "One small step for the demo coder... = one small intro application."

I wish I could say that it is super easy after this point, and that all you would have to do is take the excellent Neon Helium samples from http://nehe.gamedev.net/ with the include files. But I would most definitely be fibbing. Sitting down with a C++ prototype of what you want to do, the OpenGL redbook, and the Neon Helium samples will illuminate the path, but there are hidden pitfalls to be wary of.

RULE #1 Watch your registers.

I have yet to find specific documentation on what the register usage of the OpenGL routines is. The return codes do come back in EAX, so don’t trust EAX to survive a call. But what of the other registers? I honestly can’t say for certain. In principle the OpenGL entry points are ordinary Win32 stdcall routines, and that convention lets the callee trash EAX, ECX and EDX while preserving EBX, ESI, EDI and EBP. Perhaps the best way to be safe is to wrap calls in pushad / popad. If nothing else, don’t forget to preserve EAX. If you do have this documented somewhere, please send it to me!

RULE #2 Watch for the dreaded GL Float

The include files define GLFLOAT as a REAL4, but be sure to watch your interaction with it. Definitions for variables should look like: "gl05f GLfloat 0.5f". Failure to include the f specifier will not result in a compilation error, but it won’t work either. Also, ensure that you include the decimal point: 1f is different from 1.0f. A bad value tends to be so far out of bounds that you won’t see any results... and probably won’t know why for a while either.
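Why a wrong literal goes so badly astray is easier to see from the raw bits. The following C sketch is an illustration only: it assumes the failure mode is an integer bit pattern ending up where a REAL4 was expected, which may not be exactly what the assembler does for every malformed literal. The helper names float_bits and bits_as_float are invented for the example.

```c
#include <stdint.h>
#include <string.h>

/* Return the raw 32-bit pattern that represents a float value. */
uint32_t float_bits(float value)
{
    uint32_t bits;
    memcpy(&bits, &value, sizeof bits);
    return bits;
}

/* Reinterpret a raw 32-bit pattern as a float, which is what an
   OpenGL routine does with a DWORD it was told is a GLfloat. */
float bits_as_float(uint32_t bits)
{
    float value;
    memcpy(&value, &bits, sizeof value);
    return value;
}
```

The float 1.0 is stored as 3F800000h, not 00000001h. If the integer 1 reaches a routine that expects a GLfloat, it is read as a denormal number vanishingly close to zero, so geometry or colours built from it effectively disappear.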

RULE #3 Make sure you are sending the right stuff on the stack for parameters.

Even with the definition file, it can be challenging to know what is expected to be on the stack for some routines. Some routines may be undefined in the gl.def file, or unavailable. Knowing how to invoke them becomes a form of artwork, experimentation and frustration. After licking our mental wounds from developing our intro, I thought that there has to be a better way... and voilà! Our gift to you is the "Dictionary of Open GL Assembly Techniques". Included are as many examples of Open GL Assembler as I could dig up from the sources on the internet... each one showing a different piece of the puzzle. The routines are sorted alphabetically.
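One concrete case of "the right stuff on the stack" is the routines in the dictionary that take GLclampd or GLdouble parameters, such as glClearDepth and gluPerspective. Those types are 8-byte doubles, so each argument occupies two DWORDs on the stack rather than one. The C sketch below splits a double into those two halves; the struct and function names are hypothetical, and it assumes the usual little-endian IEEE 754 layout of a 32-bit Windows machine.

```c
#include <stdint.h>
#include <string.h>

/* The two DWORDs that make up one 8-byte double argument. */
typedef struct {
    uint32_t low;   /* bits 0..31  */
    uint32_t high;  /* bits 32..63 */
} DoubleHalves;

/* Split a double into the two 32-bit values that get pushed. */
DoubleHalves split_double(double value)
{
    uint64_t bits;
    DoubleHalves halves;
    memcpy(&bits, &value, sizeof bits);
    halves.low  = (uint32_t)(bits & 0xFFFFFFFFu);
    halves.high = (uint32_t)(bits >> 32);
    return halves;
}
```

For 1.0 the halves come out as high = 3FF00000h and low = 00000000h. Pushed by hand, the high DWORD would go first so that the low DWORD ends up at the lower address. Passing a single 4-byte float where a double is expected leaves every later parameter shifted by one DWORD.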

RULE #4 C++ Prototype

Okay, so you are really stuck, and don’t have a clue where to turn. The routine you are trying to use compiles just fine, but for whatever reason it isn’t working the way you expect. Chances are you’ve got a C++ prototype already, or you can whip one up super quickly.

Whip it up. Why? Not because you have time to waste, but because you don’t. If you compile it in release mode, you can use your compiler to do some of the work for you. Remember that the compiler has to make the same kind of preparations for the OpenGL routines that you do. There is no reason why you can’t look at the code masm assembled and compare it to the code that a C++ compiler generated.

The masm32 package from masm32.com has a great dis-assembler called "dumppe". You can access this tool from the command line, or from qeditor itself. When you compare the dumppe listings, make sure that you are comparing a normally compiled assembly program with a release mode C++ program. The highly optimized relocation technique that Syntax used for merging data sections in an executable will confuse dumppe, so you should compile the assembly program in "plain vanilla mode".

Once you have the dump, you should be able to find the function imports, and then search for the address of the routine you care about. Then you can look at the instructions that prepare the stack... what do they push, and how do they push the values? This should be enough to reveal why your code isn’t working as expected.

RULE #5 Don’t lose your mind, and share your knowledge & Check the bonus pack.

I love computer challenges, but I have to admit that I’m not a big fan of OpenGL in Assembly. I love assembly, and I really enjoy OpenGL... but there isn’t enough reference information for my liking about programming OpenGL with Assembly. Which values do the routines change? What do they expect on the stack (I mean really expect... not just 4 bytes of something)? And so forth. It would be nice one day if all of it was documented like Ralf Brown’s Interrupt List.

I hope that this article has helped to give you some ignition for your own ideas; and that the reference might save you some cycles that could be better spent on your production. If you find information that was hard to come by for yourself, I also hope that you will be inspired to share it. In this spirit, you will find a sample program in the HUGI archive, include files, and everything you need for it to compile. Have fun!

Phase #3 – DICTIONARY OF ASSEMBLY OPEN GL TECHNIQUES

Please note: This stuff was compiled and snagged from a number of example source codes downloaded from the internet.

Function Name / C Definition / Examples
glBegin void glBegin(
GLenum mode
);
Example 1:
invoke glBegin,GL_QUADS
Example 2:
invoke glBegin,GL_TRIANGLES
glBindTexture void glBindTexture(
GLenum target,
GLuint texture
);
Example 1:
invoke glBindTexture,GL_TEXTURE_2D,1
Example 2:
mov eax, offset texture
invoke glBindTexture,GL_TEXTURE_2D,DWORD PTR [eax]
Example 3:
number:DWORD
invoke glBindTexture,GL_TEXTURE_2D,number
glBlendFunc void glBlendFunc(
GLenum sfactor,
GLenum dfactor
);
Example 1:
invoke glBlendFunc,GL_SRC_ALPHA,GL_ONE
glCallList void glCallList(
GLuint list
);
Example 1:
box GLuint ?
invoke glCallList,box
glClear void glClear(
GLbitfield mask
);
Example 1:
invoke glClear,GL_COLOR_BUFFER_BIT
Example 2:
invoke glClear,GL_COLOR_BUFFER_BIT or GL_DEPTH_BUFFER_BIT
glClearColor void glClearColor(
GLclampf red,
GLclampf green,
GLclampf blue,
GLclampf alpha
);
Example 1:
_glClearColor 0.0f,0.0f,0.0f,0.0f
glClearDepth void glClearDepth(
GLclampd depth
);
Example 1:
_glClearDepth 1.0f
glColor3f void glColor3f(
GLfloat red,
GLfloat green,
GLfloat blue
);
Example 1:
_glColor3f 0.0f,0.0f,1.0f
glColor3fv void glColor3fv(
const GLfloat *v
);
Example 1:
CurrentLevel:dword
Colours GLfloat 0.66666666666f,0.0f,0.0f, .. Etc. etc.
mov eax,CurrentLevel ; calculate position for colours
inc eax
mov ebx, ((sizeof GLfloat)*3)
mul ebx
add eax, offset Colours;+24 for 16 bit modes
mov ebx,eax
invoke glColor3fv,ebx ; ebx points at the three GLfloats of the chosen colour
glColor3ub void glColor3ub(
GLubyte red,
GLubyte green,
GLubyte blue
);
Example 1:
invoke glColor3ub,0,0,255
Example 2:
stars STRUCT
r DWORD ?
g DWORD ?
b DWORD ?
dist GLfloat ?
angle GLfloat ?
stars ENDS
star stars num dup(<?>)
invoke glColor3ub,star[eax].r,star[eax].g,star[eax].b
glColor4f void glColor4f(
GLfloat red,
GLfloat green,
GLfloat blue,
GLfloat alpha
);
Example 1:
_glColor4f 1.0f,1.0f,1.0f,0.5f
glCullFace void glCullFace(
GLenum mode
);
Example 1:
invoke glCullFace,GL_BACK
Example 2:
invoke glCullFace,GL_FRONT
glDepthFunc void glDepthFunc(
GLenum func
);
Example 1:
invoke glDepthFunc,GL_LEQUAL
Example 2:
invoke glDepthFunc,GL_LESS
glDisable void glDisable(
GLenum cap
);
Example 1:
invoke glDisable,GL_BLEND
Example 2:
invoke glDisable,GL_DEPTH_TEST
Example 3:
invoke glDisable,GL_FOG
Example 4:
invoke glDisable,GL_LIGHTING
glEnable void glEnable(
GLenum cap
);
Example 1:
invoke glEnable,GL_BLEND
Example 2:
invoke glEnable,GL_COLOR_MATERIAL
Example 3:
invoke glEnable,GL_CULL_FACE
Example 4:
invoke glEnable,GL_DEPTH_TEST
Example 5:
invoke glEnable,GL_FOG
Example 6:
invoke glEnable,GL_LIGHT0
glEnd void glEnd(
void
);
Example 1:
invoke glEnd
glEndList void glEndList(
void
);
Example 1:
invoke glEndList
glFogf void glFogf(
GLenum pname,
GLfloat param
);
Example 1:
_glFogf GL_FOG_DENSITY,0.1f
Example 2:
_glFogf GL_FOG_END,-4.5f
Example 3:
_glFogf GL_FOG_START,-1.0f
glFrontFace void glFrontFace(
GLenum mode
);
Example 1:
invoke glFrontFace,GL_CCW
glGenLists GLuint glGenLists(
GLsizei range
);
Example 1:
invoke glGenLists,2
glGenTextures void glGenTextures(
GLsizei n,
GLuint *textures
);
Example 1:
texture GLuint ?
invoke glGenTextures,1,addr texture
Example 2:
local texture:DWORD
invoke glGenTextures,1,texture
glHint void glHint(
GLenum target,
GLenum mode
);
Example 1:
invoke glHint,GL_PERSPECTIVE_CORRECTION_HINT,GL_NICEST
glIndexi void glIndexi(
GLint c
);
Example 1:
invoke glIndexi,1
glLightfv void glLightfv(
GLenum light,
GLenum pname,
const GLfloat *params
);
Example:
LightPosition GLfloat 0.0,0.0,2.0,1.0
invoke glLightfv,GL_LIGHT1,GL_POSITION,ADDR LightPosition
glLoadIdentity void glLoadIdentity(
void
);
Example 1:
invoke glLoadIdentity
glNewList void glNewList(
GLuint list,
GLenum mode
);
Example 1:
box GLuint ?
invoke glNewList,box,GL_COMPILE
glNormal3f void glNormal3f(
GLfloat nx,
GLfloat ny,
GLfloat nz
);
Example 1:
_glNormal3f -1.0f,0.0f,0.0f
glPolygonMode void glPolygonMode(
GLenum face,
GLenum mode
);
Example 1:
invoke glPolygonMode,GL_BACK,GL_FILL
Example 2:
invoke glPolygonMode,GL_FRONT,GL_LINE
glPopMatrix void glPopMatrix(
void
);
Example 1:
invoke glPopMatrix
glPushMatrix void glPushMatrix(
void
);
Example 1:
invoke glPushMatrix
glRotatef void glRotatef(
GLfloat angle,
GLfloat x,
GLfloat y,
GLfloat z
);
Example 1:
_glRotatef 20.0f,1.0f,0.0f,0.0f
Example 2:
LOCAL lookupdown:GLfloat
_glRotatef lookupdown,1.0f,0.0f,0.0f
Example 3:
Point3Df STRUCT
X GLfloat ?
Y GLfloat ?
Z GLfloat ?
Point3Df ENDS
ALIGN DWORD
rot Point3Df <0.0>
_glRo
glShadeModel void glShadeModel(
GLenum mode
);
Example 1:
invoke glShadeModel,GL_SMOOTH
glTexCoord2f void glTexCoord2f(
GLfloat s,
GLfloat t
);
Example 1:
_glTexCoord2f 0.0f,0.0f
Example 2:
LOCAL fx,fy:GLfloat
_glTexCoord2f fx,fy
glTexEnvf void glTexEnvf(
GLenum target,
GLenum pname,
GLfloat param
);
Example 1:
invoke glTexEnvf,GL_TEXTURE_ENV,GL_TEXTURE_ENV_MODE,GL_MODULATE ; GL_MODULATE is an integer enum; glTexEnvi is the entry point that takes a GLint here
glTexParameteri void glTexParameteri(
GLenum target,
GLenum pname,
GLint param
);
Example 1:
Invoke glTexParameteri,GL_TEXTURE_2D,GL_TEXTURE_MAG_FILTER,GL_LINEAR
Example 2:
invoke glTexParameteri,GL_TEXTURE_2D,GL_TEXTURE_MIN_FILTER,GL_LINEAR_MIPMAP_NEAREST ; mipmap filters are only valid for the MIN filter
Example 3:
invoke glTexParameteri,GL_TEXTURE_2D,GL_TEXTURE_MAG_FILTER,G
glTranslatef void glTranslatef(
GLfloat x,
GLfloat y,
GLfloat z
);
Example 1:
_glTranslatef -1.5f,0.0f,-6.0f
Example 2:
zoom REAL4 -15.0f
_glTranslatef 0.0f,0.0f,zoom
Example 3:
stars STRUCT
r DWORD ?
g DWORD ?
b DWORD ?
dist GLfloat ?
angle GLfloat ?
stars ENDS
star
gluPerspective void gluPerspective(
GLdouble fovy,
GLdouble aspect,
GLdouble zNear,
GLdouble zFar
);
 
Example 1:
LOCAL ratio: GLdouble ; local variable
LOCAL w:GLsizei, h:GLsizei
fild w
fild h
fdivp st(1), st ; divide width by height
fstp ratio
_gluPerspective 45.0f,ratio,0.1f,100.0f
glVertex2f void glVertex2f(
GLfloat x,
GLfloat y
);
Example 1:
_glVertex2f -0.6f,-0.6f
glVertex3f void glVertex3f(
GLfloat x,
GLfloat y,
GLfloat z
);
Example 1:
_glVertex3f -1.0f,-1.0f,-1.0f
Example 2:
points GLfloat (SIZEOF_LINE*SIZEOF_LINE*3) dup(?)
_glVertex3f points[eax],points[ebx],points[ecx]
Example 3:
LOCAL x_m, y_m, z_m:DWORD
_glVertex3f x_m,y_m,z_m
glVertex3i void glVertex3i(
GLint x,
GLint y,
GLint z
);
Example 1:
invoke glVertex3i,-1,-1,-1
glViewport void glViewport(
GLint x,
GLint y,
GLsizei width,
GLsizei height
);
Example 1:
invoke glViewport,0,0,w,h
glFlush void glFlush(
void
);
Example 1:
invoke glFlush
glMatrixMode void glMatrixMode(
GLenum mode
);
Example 1:
invoke glMatrixMode,GL_MODELVIEW
Example 2:
invoke glMatrixMode,GL_PROJECTION

(Polaris & Syntax) / Northern Dragons