understanding gcc assembly output

_codist_ Posts: 17 Member
[b][red]This message was edited by _codist_ at 2004-9-12 13:27:16[/red][/b][hr]
Out of curiosity I disassembled a trivial executable I wrote in C, and had some trouble fully understanding the output. Here's the source of the executable:
[code]
void function()
{
    int i;
    i = 1;
}

int main()
{
    function();
    return 0;
}
[/code]
I compiled it using gcc 3.3.4 (command: gcc -o main main.c) and disassembled it using objdump 2.14.90.0.7 (command: objdump -d main). Here's the (shortened) output:
[code]
08048334 <function>:
 8048334:  55                     push   %ebp
 8048335:  89 e5                  mov    %esp,%ebp
 8048337:  83 ec 04               sub    $0x4,%esp
 804833a:  c7 45 fc 01 00 00 00   movl   $0x1,0xfffffffc(%ebp)
 8048341:  c9                     leave
 8048342:  c3                     ret

08048343 <main>:
 8048343:  55                     push   %ebp
 8048344:  89 e5                  mov    %esp,%ebp
 8048346:  83 ec 08               sub    $0x8,%esp
 8048349:  83 e4 f0               and    $0xfffffff0,%esp
 804834c:  b8 00 00 00 00         mov    $0x0,%eax
 8048351:  29 c4                  sub    %eax,%esp
 8048353:  e8 dc ff ff ff         call   8048334 <function>
 8048358:  b8 00 00 00 00         mov    $0x0,%eax
 804835d:  c9                     leave
 804835e:  c3                     ret
 804835f:  90                     nop
[/code]
As I understand it, the main function starts by setting up an 8-byte stack frame. Then the lowest 4 bits of esp are set to zero (line 8048349: and $0xfffffff0,%esp). What's the reason behind this? I'd have thought that manipulations like that would mess up the stack rather than do something useful ...
Anyway, I think I've got the rest of the main function: 0 is put in eax and then subtracted from esp (probably that's what they mean by the overhead C automatically adds), function() is called, 0 is put in eax again, the stack frame is torn down, and the function returns.

The function "function" also starts with the good old stack frame setup, sized 4 bytes this time to hold the local int variable. Then the value of the variable is set to 1 (line 804833a: movl $0x1,0xfffffffc(%ebp)). What I don't really understand here is how the addressing works. I thought that "0xfffffffc(%ebp)" meant "the memory address contained in ebp added to 0xfffffffc", but then the address referred to would already exceed 0xffffffff if ebp is greater than 3 ... What exactly does it mean, then? I expected something like movl $1,-4(%ebp) here ...

Any answers to these questions would be appreciated, and please also let me know if any of my interpretations of the assembly code are wrong ... Looking forward to your answers!


Comments

  • shaolin007 Posts: 1,018 Member
    [red]Then the lowest 4 bits in esp are set to zero (line 8048349: and $0xfffffff0,%esp). What's the reason behind this? I'd have supposed that manipulations like that rather mess up the stack than do something useful ...[/red]

    Depends, what is the value in ESP before the bitwise AND? Have you run it through a debugger?

    [red]What I don't really understand here is the way the addressing works - I thought that "0xfffffffc(%ebp)" means as much as "the memory address contained in ebp added to 0xfffffffc" - but then, the address refered to would already exceed 0xffffffff bytes if ebp is greater 3 ... What exactly does that mean, then? I expected something like movl $1,-4(%ebp) here ...[/red]

    I'm not too sure either, and it doesn't help that this looks backwards to me, since I'm used to Intel syntax, not AT&T. It looks like the variable 'i' was allocated 4 bytes (dword size, hence movl?), which explains the 4-byte stack frame. Then it was moved into the memory location 0xfffffffc. Why it was put at the EBP location puzzles me, as does why you would move that value into EBP and not ESP. Sorry I couldn't be of more help. :-(
  • _codist_ Posts: 17 Member
    [blue]Good idea. I set a breakpoint in main and found that ESP is 0xbffffa10 before the AND, so the instruction doesn't change the value in this run: the low 4 bits are already zero.[/blue]
    [blue]Finally figured that out now. 0xfffffffc is not an absolute memory address here, but a signed displacement. It is in fact the -4 I expected, written in hex as a 32-bit two's complement value. Seems I was too focused on memory addresses to notice that :-). In Intel syntax, that line would be "mov dword ptr [ebp-4],1".
    You're right that "movl" is used in AT&T syntax because of the dword size. Mnemonics take an 'l', 'w' or 'b' suffix in AT&T to indicate dword, word or byte operands; the suffix can be left out when a register operand already fixes the size, as in the "mov %esp,%ebp" line of the listing. That way, byte ptr and the like are not needed.[/blue]
  • shaolin007 Posts: 1,018 Member
    [green]
    I guess I'll have to take your word for it. :-) To me it is a confusing way to have EBP-4 represented. But that's my opinion though.[/green]
  • _codist_ Posts: 17 Member
    Same here. I think the instruction suffixes are a bit more pleasant to read than the ptr keywords, but the fact that sub eax,[ebx+ecx*4-20] is written subl -20(%ebx,%ecx,0x4),%eax in AT&T definitely makes me prefer Intel syntax.