INT13/AH=02 Bugs?

Hey,

Back again to annoy the community.
This time, I'm puzzled about my OS kernel loader's ReadSector code.
It uses INT13/AH=02 to read a sector from a FAT12 floppy.
The problem occurs when I make my kernel image file too big. 32 kB or below works fine, but when I get to 50 kB INT13 starts to fail. It loads a whole lot of sectors, but eventually fails somewhere around the hundredth (the exact LBA changes as I modify my kernel image, but as long as the image file stays roughly the same, it's the same LBA that gives me the problem).
I used a little code to skip the failed LBA, and then it works fine again for another couple of hundred-ish sectors and then WHAM! The error again.
The ReadSector function is very standard - pretty much what you find in any bootloader example that uses INT13 (i.e. LBA to CHS, INT13 one sector at a time, check CF, 5 retries with a disk reset between each one, then the next sector).
I am, like so many times before, puzzled. This time, however, I seriously doubt it's my coding, since it has proved its integrity multiple times.
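
For readers following along, the LBA-to-CHS step mentioned above can be sketched in C. This is only an illustration, not the poster's code: the geometry (18 sectors per track, 2 heads) is an assumption for a 1.44 MB floppy, and the names `chs_t` and `lba_to_chs` are made up for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed geometry of a 1.44 MB floppy; the thread does not state it. */
enum { SECTORS_PER_TRACK = 18, HEADS = 2 };

/* Hypothetical result type for this sketch. */
typedef struct {
    uint16_t cylinder; /* goes into CH (plus the top bits of CL) */
    uint8_t  head;     /* goes into DH */
    uint8_t  sector;   /* 1-based, goes into CL */
} chs_t;

/* Convert a 0-based LBA into cylinder/head/sector. */
chs_t lba_to_chs(uint32_t lba)
{
    chs_t chs;
    assert(SECTORS_PER_TRACK > 0 && HEADS > 0);
    chs.cylinder = (uint16_t)(lba / (SECTORS_PER_TRACK * HEADS));
    chs.head     = (uint8_t)((lba / SECTORS_PER_TRACK) % HEADS);
    chs.sector   = (uint8_t)(lba % SECTORS_PER_TRACK + 1);
    return chs;
}
```

With this geometry, `lba_to_chs(124)` gives cylinder 3, head 0, sector 17, which matches the debug output reported further down the thread.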

Comments


  • I doubt there's a bug with INT 0x13/function 02, but who knows. It is impossible to tell without seeing your current code and binary.

    I am betting there is a problem in your current source that is causing the problem, as it is the usual case.

    If you can/want, PM/Email me or post your current source, and I can see what I can find. I would not mind the end binary image as well for additional debugging.

    .:EvolutionEngine | .:MicroOS Operating System | Website :: OS Development Series (http://www.mt2002.sitesled.com)
  • :
    : I doubt there's a bug with INT 0x13/function 02, but who knows. It is
    : impossible to tell without seeing your current code, and binary.
    :

    I always like to think I'm right :) Anyway, there probably is a bug. I just hope for my sake that it's very hard to find ^^

    : I am betting there is a problem in your current source that is
    : causing the problem, as it is the usual case.
    :
    : If you can/want, PM/Email me or post your current source, and I can
    : see what I can find. I would not mind the end binary image as well
    : for additional debugging.

    I'll attach a ZIP with the code

    [color=Blue]EDIT/PS:[/color] I have a couple of debug functions that make it easy to print a string or a number (or an address, which is the number in hexadecimal). They're called DebugPrint, DebugPrintNum and DebugPrintAddr. Each of them requires DS=CS, and the Num/Addr ones require DS=ES=CS. Furthermore, they preserve all general and segment registers - they don't preserve the flags though. I always found them handy when debugging: I used them to check at which LBA INT13 failed etc.

    Best Regards,
    Richard

    The way I see it... Well, it's all pretty blurry

  • : I always like to think I'm right :) Anyway, there probably is a bug.
    : I just hope for my sake that it's very hard to find ^^

    Actually, it's probably something really simple :) That tends to be the nature of assembly, and that's what makes it hard :)

    If/When I see a problem, I'll let you know.

    Also, the Bochs debugger is not that hard. All you need to do is find the beginning address of the routine you want, set a breakpoint, and continue execution. Bochs will break at that function so that you can single-step.
  • :
    : : I always like to think I'm right :) Anyway, there probably is a bug.
    : : I just hope for my sake that it's very hard to find ^^
    :
    : Actually, it's probably something really simple :) That tends to be
    : the nature of assembly, that which makes it hard :)
    :

    We'll see :P

    : If/When I see a problem, I'll let you know.
    :
    : Also, Bochs debugger is not that hard. All you need to do is find
    : the beginning address of the routine you want, set a breakpoint, and
    : continue execution. Bochs will break at that function so that you
    : can single-step.

    That was truthfully my biggest problem - finding the entry point of a function. I haven't yet thought of a way to do this.
    I guess for debug purposes I could add a DW somewhere at a fixed location in the file and fill it with the address.
    Anyway, thanks again for your effort. I'll try this evening to debug it again myself.

    Best Regards,
    Richard

  • : : Actually, it's probably something really simple :) That tends to be
    : : the nature of assembly, that which makes it hard :)
    : :
    :
    : We'll see :P

    I guess we will :)

    : That was truthfully my biggest problem - finding the entry point of
    : a function. I haven't yet thought of a way to do this.
    : I guess for debug purposes I could add a DW somewhere at a fixed
    : location in the file and fill it with the address.
    : Anyway, thanks again for your effort. I'll try this evening to debug
    : it again myself.
    :
    I figured as much (that was what gave me the most trouble getting used to) :)

    This is what I do:

    1) The Bochs debugger begins at the jmp instruction set up by its ROM BIOS. Because the bootloader is loaded at 0x7c00, set a breakpoint there and continue execution. Or, if your kernel is loaded at 0x500, set a breakpoint there:
    [code]
    b 0x500 < set breakpoint to 0x500
    c < continue execution
    [/code]
    Bochs will break at 0x500--the first instruction in your kernel, loaded at 0x500.

    Now just follow it with your source line by line. This is not that hard, as you can jump over instructions that you don't care about.

    For example, if you want to skip past calling a function, just skip over 2 or 3 bytes (the size of the instruction) from your current position, and set a new breakpoint there.

    To see this in action, here is what Bochs outputs for my bootloader:
    [code]
    <0> [0x00007ccd] 0000:7ccd : call 7c3e ; e86eff
    [/code]
    Dissecting this, we have:
    [code]
    Current instruction address: (Physical) 0x00007ccd (Seg:Off) 0000:7ccd
    Section: unknown (It's pure binary, after all)
    Assembly Instruction: call 7c3e
    Machine Instruction (OPCode): 0xe86eff
    [/code]
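    [b]The call instruction carries the target inside of it[/b]: the bytes 6e ff are a little-endian displacement (0xff6e) relative to the next instruction, and Bochs shows the resolved target for you. This means the call instruction calls the function at [b]0x7c3e[/b]. [b]This is the function's address[/b].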

    If you want to step over this call (i.e., let the function run, but without single-stepping through it), look at its machine code: [b]0xe86eff[/b]. The instruction is 3 bytes long, starting from the current location (0x00007ccd). So, set a breakpoint at 0x00007ccd + 3 bytes:
    [code]
    b 0x7cd0 < jump over the call--we don't care for watching this execute
    c < continue
    [/code]
    Bochs will break to the instruction right after it.

    Bochs gives you all the information: the address that it is calling (the address of the function), the address of the currently executing instruction, etc.
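
The arithmetic behind the disassembly line above can be checked with a small C sketch. It assumes 16-bit real-mode code and the near-call form (opcode E8 followed by a 16-bit little-endian displacement); the function names are made up for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Address of the instruction after a 3-byte near CALL (E8 lo hi).
   This is where the breakpoint for stepping over the call goes. */
uint16_t near_call_return_addr(uint16_t ip)
{
    return (uint16_t)(ip + 3);
}

/* Target of a near CALL: next instruction address plus the signed 16-bit
   displacement. Unsigned 16-bit wrap-around gives the same result as
   signed addition, so no sign handling is needed. */
uint16_t near_call_target(uint16_t ip, const uint8_t insn[3])
{
    uint16_t rel;
    assert(insn[0] == 0xE8); /* only the near relative CALL is handled */
    rel = (uint16_t)(insn[1] | (insn[2] << 8));
    return (uint16_t)(near_call_return_addr(ip) + rel);
}
```

For the bytes e8 6e ff at 0x7ccd this gives a return address of 0x7cd0 and a target of 0x7c3e, the same two addresses Bochs shows.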

    ----------

    If you can output a map file, it's even easier, as the map file tells you the addresses of all routines you might want to set a breakpoint on. This is why I prefer C and C++ when debugging.

  • : Hey,
    :
    : Back again to annoy the community.
    : This time, I'm puzzled about my OS kernel loader's ReadSector code.
    : It uses INT13/AH=02 to read a sector from a FAT12 floppy.
    : The problem occurs when I make my kernel image file too big. 32 kB's
    : or below works fine, but when I get to 50 kB's INT13 starts to fail.

    Fail in what way? Are you getting an invalid sector error, or perhaps it's as simple as your disk going bad? What is the value in AX when INT 13 fails (the status code comes back in AH)? That'll clue you in to what the problem is.

    I'd suggest adding some debug stuff in your loadSectors routine that prints out the track/sector/head info (cx and dx respectively), and also the contents of es:bx just before int 13, and then print out ax immediately after.

    Most likely you've either told int 13 to go off disk (i.e., sector #19 on an 18-sector disk) or the data it reads is overwriting something critical in memory; this will give you really strange results sometimes.

    INT 13h is my friend. (I play around with bootable 80's games as a hobby) I can't sit back and watch you bad mouth it like this. :)

    -jeff!
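
As a rough aid for reading the value printed after a failed call, here is a C sketch that pulls the status out of AX and maps a handful of commonly documented INT 13h status codes to text. The table is deliberately partial and the function names are invented for this example; a full interrupt reference lists more codes.

```c
#include <stdint.h>

/* After INT 13h returns with CF set, the status is in AH (the high byte of AX). */
uint8_t int13_status(uint16_t ax)
{
    return (uint8_t)(ax >> 8);
}

/* Partial table of commonly documented status codes (not exhaustive). */
const char *int13_status_text(uint8_t status)
{
    switch (status) {
    case 0x00: return "success";
    case 0x01: return "invalid function or parameter";
    case 0x02: return "address mark not found";
    case 0x03: return "write-protected disk";
    case 0x04: return "sector not found";
    case 0x06: return "disk changed";
    case 0x08: return "DMA overrun";
    case 0x09: return "data boundary error (DMA across 64K boundary)";
    case 0x10: return "uncorrectable CRC/ECC error on read";
    case 0x20: return "controller failure";
    case 0x40: return "seek failed";
    case 0x80: return "timeout (drive not ready)";
    default:   return "status not in this partial table";
    }
}
```

Printing `int13_status_text(int13_status(ax))` next to the LBA and CHS values makes the failure mode visible without a separate lookup.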
  • : fail in what way? Are you getting an invalid sector error, or
    : perhaps it's as simple as your disk is going bad? What is the value
    : in AX when INT 13 fails? That'll clue you in to what the problem is.
    :
    : I'd suggest adding some debug stuff in your loadSectors routine that
    : prints out the track/sector/head info (cx and dx respectively), and
    : also the contents of es:bx just before int 13, and then print out ax
    : immediately after.
    :
    : Most likely you've either told int 13 to go off disk (ie, sector #19
    : on an 18 sector disk) or the contents of the data it read is
    : overwriting something critical in memory-this will give you really
    : strange results sometimes.
    :
    : INT 13h is my friend. (I play around with bootable 80's games as a
    : hobby) I can't sit back and watch you bad mouth it like this. :)
    :
    : -jeff!

    Well I kind of had a satisfying relationship with it, until now.

    The memory *should* be quite available. 0xFF00 is available at boot time right? I have assumed that 0x500 - 0xA0000 are there for me to exploit in any way I see fit.

    I'll do some more debugging and edit the results into this post (such as AX code)

    [color=Blue]EDIT:[/color] Here's my debug print for where the current error occurs: LBA=124, CHS=3-0-17, AX=0x0900
    As I was trying different sizes of my kernel, the error seems to occur for 40 kB image files or larger.
    Next step I tried to put some rubbish on the disk and then copy the kernel after the rubbish: LBA=210, CHS=5-1-13, AX=0x0900

    Hmm... that makes me think the memory location is unavailable or something since it seems to be always after the same amount of sectors. So, next debug (with buffer ES:DI): {0FF0:0000 210 5 1 13} [Err: 0900]

    Now I am quite sure... it's oh-ef-ef-oh-colon-quadruple-oh :/
    So, what's up with that memory location? I recognise the segment selector as FAT12 reserved cluster - but INT13 shouldn't know about that :-S

    ... *sigh* very nice of me to look up everything, but simply looking up the 09 error code tells me much more: "09h data boundary error (attempted DMA across 64K boundary or >80h sectors)".

    I'm getting a bit of 'WTF'. But indeed, that must be it. A 512-byte sector read into 0xFF00 ends at 0x100FF, which is past the 64 kB boundary.
    So... another buffer? Copy to temp sector buffer, REP MOVSW to the actual buffer, and then as a final REP MOVSD to the 1 MB boundary?
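
The boundary check itself is simple arithmetic and can be sketched in C as follows. The helper names are hypothetical; the only facts used are the real-mode address formula (segment * 16 + offset) and that the floppy DMA transfer must stay within one 64 KiB physical page.

```c
#include <assert.h>
#include <stdint.h>

/* Real-mode segment:offset to physical address. */
uint32_t phys_addr(uint16_t seg, uint16_t off)
{
    return ((uint32_t)seg << 4) + off;
}

/* Nonzero if the buffer [phys, phys + len) spans a 64 KiB physical boundary,
   i.e. its first and last byte lie in different 64 KiB pages. */
int crosses_64k(uint32_t phys, uint32_t len)
{
    assert(len > 0);
    return (phys >> 16) != ((phys + len - 1) >> 16);
}
```

For the failing buffer, `phys_addr(0x0FF0, 0x0000)` is 0xFF00 and `crosses_64k(0xFF00, 512)` is true; a buffer at 0xFE00 or at 0x10000 does not cross.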

    (H
  • : Hmm... that makes me think the memory location is unavailable or
    : something since it seems to be always after the same amount of
    : sectors. So, next debug (with buffer ES:DI): {0FF0:0000 210 5 1 13}
    : [Err: 0900]

    What's ES:BX at that point? That's where the data from the floppy is loaded, not ES:DI. You must have es:bx set to somewhere that makes the DMA controller angry. (I've never actually run into that before, so this is getting weird)

    : [color=Blue]PS (EDIT2):[/color] The MS/IBM INT13 extensions, are
    : they only available for the hard disk? (This is just a curiosity)

    It appears that way. I've never used them before, but the input requirements for drive # start with 80h, so that means hard disk.

    -jeff!
  • : : Hmm... that makes me think the memory location is unavailable or
    : : something since it seems to be always after the same amount of
    : : sectors. So, next debug (with buffer ES:DI): {0FF0:0000 210 5 1 13}
    : : [Err: 0900]
    :
    : what's ES:BX at that point? That's where the data from floppy is
    : loaded, not ES:DI. You must have es:bx set somewhere that the DMA
    : controller is angry. (I've never actually run into that before, so
    : this is getting weird)
    :

    That was a typo, sorry.
    No, it really was the DMA thing that prevented me from reading directly across 0x10000. I've solved it by creating a buffer at 0x5500 that'll hold a sector, then after each sector I copy the data to the right spot in the real destination. What a way to kill EVERY advantage DMA would ever have in such a situation. It definitely slows the code substantially, but compatibility with a somewhat bigger kernel image is more important to me anyway.

    Thanks for the help, both of you.

    Best Regards,
    Richard

  • : No it really was the DMA thing that prevented me from writing
    : directly to above 0x10000. I've solved it by creating a buffer at
    : 0x5500 that'll hold a sector, then after each sector copy the code
    : to the right spot in the real destination. What a way to kill EVERY
    : advantage DMA would ever have in such a situation. It definitely
    : slows the code substantially, but compatibility with a bit of a
    : bigger kernel image is more important to me anyway.

    Well, if you can move your starting load segment down just a wee bit, you might be able to skip that step: you're loading at 0FF0:0000, and that's half a sector away from a 64k boundary. I think that if you moved the segment where you start loading your kernel just a little lower (256 bytes!), then your floppy loads should line up perfectly on 64k boundaries and you shouldn't run into this problem at all.

    This was a really great problem to work on. Like I said, in all the years that I've been playing with int 13, I've never run into this (that I know of). I very well may have bumped into it and blamed it on something else, as it is not at all obvious what the problem was. I'm smarter now too.

    -jeff!
  • : well, if you can change your starting load segment down just a wee
    : bit, you might be able to skip that step: You're loading at ff0:0,
    : and that's 1/2 of a sector away from a 64k boundary. I think that
    : if you moved the segment that you start loading your kernel in just
    : a little lower (256 bytes!) then your floppy loads should line up
    : perfectly on 64k boundaries and you shouldn't run into this problem
    : at all.
    :

    I want to be able to load big kernels... certainly larger than 64 kB.
    I hope to start writing the next step of the OS in C++, which means the code will just fly out of proportion ;) At least, it'll generate a LOT more code a LOT faster than I am able to do with ASM. So, that's why I redesigned it to have a kernel image buffer area of around 670 kB. No matter where I hide that - I'll always cross the 64k boundary.

    : This was a really great problem to work on-like I said, of all the
    : years that I've been playing with int 13, I've never run into
    : this-that I know of. I very well may have bumped into it and blamed
    : it on something else, as this is not obvious at all what the problem
    : was. I'm smarter now too.
    :
    : -jeff!

    Glad to hear you amused yourself :)

    Best Regards,
    Richard

  • : I want to be able to load big kernels... certainly larger than 64 kB.
    : I hope to start writing the next step of the OS in C++, which means
    : the code will just fly out of proportion ;) At least, it'll generate
    : a LOT more code a LOT faster than I am able to do with ASM. So,
    : that's why I redesigned it to have a kernel image buffer area of
    : around 670 kB. No matter where I hide that - I'll always cross the
    : 64k boundary.

    What I was getting at is that one single transfer of 512 bytes is what is crossing the 64k boundary. You can load as big a kernel as you want; just make sure that none of the transfers individually crosses a boundary and you should be ok.

    So if you move the starting segment where you load the kernel, and make sure it's aligned on a 64k boundary, then you shouldn't run into this INT 13 problem (because 64k pages are cleanly divisible into 512-byte chunks), and then you shouldn't need your load-data-to-scratch-buffer and copy-into-proper-place routine.

    -jeff!
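
To illustrate the alignment argument above, here is a C sketch that walks a sector-by-sector load and reports the first sector whose buffer would span a 64 KiB boundary. It assumes one 512-byte sector per transfer into consecutive physical addresses; the function name and the example start addresses are chosen for illustration only.

```c
#include <assert.h>
#include <stdint.h>

#define SECTOR_SIZE 512u

/* Returns the index of the first sector whose 512-byte buffer spans a
   64 KiB physical boundary, or -1 if no transfer in the load does. */
long first_crossing_sector(uint32_t start_phys, uint32_t sector_count)
{
    uint32_t i;
    /* Guard against 32-bit overflow in the address arithmetic below. */
    assert(sector_count <= (UINT32_MAX - start_phys) / SECTOR_SIZE);
    for (i = 0; i < sector_count; i++) {
        uint32_t first = start_phys + i * SECTOR_SIZE;
        uint32_t last  = first + SECTOR_SIZE - 1;
        if ((first >> 16) != (last >> 16))
            return (long)i;
    }
    return -1;
}
```

Starting at 0xFF00 the very first sector crosses, while starting 256 bytes lower at 0xFE00 no sector ever does, however long the load is, because every buffer then begins on a multiple of 512 and 65536 is itself a multiple of 512.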

  • : What I was getting at is that 1 single transfer of 512 bytes is what
    : is crossing the 64k boundary. You can load as big of a kernel
    : as you want, just make sure that none of the transfers individually
    : cross a boundary and you should be ok.
    :
    : So if you move your starting segment where you load the kernel into,
    : and make sure it's aligned on 64k boundary, then you shouldn't run
    : into this INT 13 problem (because 64k pages are cleanly divisible by
    : 512 byte chunks), and then you shouldn't need your
    : load-data-to-scratch-buffer and copy-into-proper-place routine.
    :
    : -jeff!

    No can do - I want to load it starting at FFFF:0010 (1 MB), and then have support for more than 64 kB in size.

    Best Regards,
    Richard
