Binary file i/o problem?

saw7988 | Posts: 14 | Member
I'm writing a primitive file compression program. The general idea is that it reads an input file, does some computations, and writes a compressed version to another file. I'm using the fstream.h header and its read and write functions.

My text files work fine, but various other types of files stop working at certain places. For example, in one file it stops at the 6th byte, which happens to have the value 6. The loops and computations still run in the "background", but the variable I'm storing into stops getting updated.

Are there any quirks specific to binary file I/O that I need to watch out for, such as certain bytes that can't be read? I can't think of anything else it could be, because the program works with every text file I've tried.

Comments

  • Lundin | Posts: 3,711 | Member
    When saving/loading numbers you need to be aware of how the CPU stores them in memory. Here is an example:

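    [code]
    #include <stdio.h>

    int main()
    {
        unsigned long number = 0xAABBCCDD;
        int i;

        /* Print the first four bytes of number in memory order
           (assumes unsigned long is 4 bytes wide). */
        for(i=0; i<4; i++)
            printf("%X", *(i+(unsigned char*)&number) );

        return 0;
    }
    [/code]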

    On a [b]little endian[/b] CPU (like your PC), it will give the result: DDCCBBAA
    On a [b]big endian[/b] CPU it will give the result: AABBCCDD
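
    One common way to keep saved numbers independent of the CPU's byte order is to write them one byte at a time in a fixed order. The following is only a sketch of that idea; the helper names put_u32_le and get_u32_le are made up for this example and do not come from the thread or from any library.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper: store a 32-bit value as four bytes, least
// significant byte first, whatever the byte order of the CPU is.
void put_u32_le(unsigned char* out, std::uint32_t value)
{
    for (std::size_t i = 0; i < 4; ++i)
        out[i] = static_cast<unsigned char>((value >> (8 * i)) & 0xFFu);
}

// Hypothetical helper: rebuild the value from four bytes that were
// written by put_u32_le.
std::uint32_t get_u32_le(const unsigned char* in)
{
    std::uint32_t value = 0;
    for (std::size_t i = 0; i < 4; ++i)
        value |= static_cast<std::uint32_t>(in[i]) << (8 * i);
    return value;
}
```

    Because the shifts operate on the value rather than on its memory layout, a file written this way should read back the same on little endian and big endian machines.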
  • tsagld | Posts: 621 | Member
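    The most obvious cause is opening a binary file in text mode. That gives trouble, because on some platforms text mode translates certain bytes (such as line endings) as they are read or written.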
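
    As a rough illustration of reading a file as raw bytes, here is a sketch using the standard <fstream> header rather than the older fstream.h. The function name read_all_bytes is invented for this example. It opens the stream with std::ios::binary and checks gcount() to confirm that every byte arrived.

```cpp
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical helper: read an entire file as raw bytes.
// std::ios::binary turns off the byte translation that text mode
// performs on some platforms. Returns false if the file cannot be
// opened or cannot be read completely.
bool read_all_bytes(const std::string& path, std::vector<unsigned char>& bytes)
{
    std::ifstream in(path.c_str(), std::ios::in | std::ios::binary);
    if (!in)
        return false;

    // Find the file size by seeking to the end, then rewind.
    in.seekg(0, std::ios::end);
    const std::streamoff size = in.tellg();
    in.seekg(0, std::ios::beg);
    if (size < 0)
        return false;

    bytes.resize(static_cast<std::size_t>(size));
    if (size > 0)
        in.read(reinterpret_cast<char*>(&bytes[0]), size);

    // gcount() reports how many bytes the last read() delivered.
    return size == 0 || in.gcount() == size;
}
```

    If the returned vector is shorter than the file on disk, or the function returns false, the read itself is the problem; otherwise the bug is more likely in the code that processes the bytes.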



    Greets,
    Eric Goldstein
    http://www.gvh-maatwerk.nl


  • saw7988 | Posts: 14 | Member
    Well, I'm opening all files in ios::binary mode, and the data isn't coming through backwards or wrong. When I'm reading it, something goes wrong at specific characters and it stops. I can paste my code and explain what I'm doing, but the program only works with text files, so I was thinking the problem might be specific to non-text files.
  • IDK | Posts: 1,784 | Member
    The problem may be in your program. Post your code.
  • saw7988 | Posts: 14 | Member
    Ok, here's my program. I used to have chars instead of unsigned chars but neither method works. (Also a clarification of the use of chars vs unsigned chars would be greatly appreciated.)

    [code]
    #include <iostream.h>
    #include <fstream.h>
    #include <conio.h>
    #include <string.h>

    bool containsChar(unsigned char *string, char c);
    int getReps(unsigned char* string, int start, long size);
    unsigned char* compress(unsigned char* data, long size);

    int main(int argc, char* argv[])
    {
        if (argc != 2)
        {
            cout<<"Incorrect Usage...";
            getch();
            return 0;
        }

        char* file;
        file = argv[1];

        cout<<"Opening " <<file <<"..." <<endl;
        ifstream in(file, ios::binary);

        if (!in)
        {
            cout<<"Couldn't read " <<argv[1] <<".";
            getch();
            return 0;
        }

        in.seekg(0, ios::end);
        long filesize = in.tellg();
        in.seekg(0);

        unsigned char* data;
        data = new unsigned char[filesize];
        in.read(data, filesize);
        in.close();
        cout<<"File read." <<endl;

        unsigned char* cdata;
        cout<<"Compressing..." <<endl;
        cdata = compress(data, filesize);
        cout<<"Compressed." <<endl;

        delete [] data;

        cout<<"Writing compressed file..." <<endl;
        ofstream out("output.cpr", ios::binary);
        if (!out)
        {
            cout<<"Couldn't open output.cpr.";
            getch();
            return 0;
        }
        out.write(cdata, strlen((char*)cdata));
        out.close();
        cout<<"Done.";

        delete [] cdata;

        getch();
        return 0;
    }

    bool containsChar(unsigned char *string, char c)
    {
        int size = strlen((char*)string);

        for (int i=0; i<size; i++)
        {
            if (*(string+i) == c)
                return true;
        }

        return false;
    }

    int getReps(unsigned char* string, int start, long size)
    {
        int reps = 1;
        int pos = start;

        while (*(string+pos) == *(string+pos+1))
        {
            pos++;
            reps++;
            if (pos == size)
                break;
        }

        return reps;
    }

    unsigned char* compress(unsigned char* data, long size)
    {
        unsigned char* cdata;
        cdata = new unsigned char[size];
        char marker;

        int ascii = 33;
        while (containsChar(data, (char)ascii))
            ascii++;
        marker = (char)ascii;
        *cdata = marker;

        int pos1=0, pos2=1, reps; //first char in cdata is the marker

        while (pos1 < size)
        {
            reps = getReps(data, pos1, size);

            if (reps < 4)
            {
                for (int i=0; i<reps; i++)
                {
                    *(cdata+pos2) = *(data+pos1);
                    pos1++;
                    pos2++;
                }
            }
            else
            {
                if (reps > 255)
                    reps = 255;
                *(cdata+pos2) = marker;
                *(cdata+pos2+1) = (unsigned char)reps;
                *(cdata+pos2+2) = *(data+pos1);
                pos2 += 3;
                pos1 += reps;
            }
        }

        return cdata;
    }
    [/code]
  • IDK | Posts: 1,784 | Member
    : Ok, here's my program. I used to have chars instead of unsigned chars but neither method works. (Also a clarification of the use of chars vs unsigned chars would be greatly appreciated.)
    :

    [green]
    chars are unsigned chars by default, like int are unsigned ints as default.
    [/green]

    [green]
    I don't see anything wrong...
    [/green]
  • Lundin | Posts: 3,711 | Member
    : [green]
    : chars are unsigned chars by default, like int are unsigned ints as default.
    : [/green]


    No, for char it is compiler dependent (plain int, by contrast, is always signed). Most commonly char is signed. This is the reason why you should always explicitly write signed or unsigned when declaring a variable.
    From ANSI C:

    "The implementation shall define char to have the same range, representation, and behavior as either signed char or unsigned char."

    /--/

    "CHAR_MIN, defined in <limits.h>, will have one of the values 0 or SCHAR_MIN, and this can be used to distinguish the two options. Irrespective of the choice made, char is a separate type from the other two and is not compatible with either."
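
    To make the practical consequence concrete, here is a small sketch. The three function names are invented for this example. It uses CHAR_MIN as described in the quote above, and shows why comparing an unsigned char byte with a plain char (as containsChar in the posted program does) can give different answers on different compilers for byte values above 127.

```cpp
#include <climits>

// True when plain char behaves like signed char on this compiler.
bool plain_char_is_signed()
{
    return CHAR_MIN < 0;
}

// Naive comparison: both operands are promoted to int first. With a
// signed plain char, the byte 0xE9 is typically -23 as a char but 233
// as an unsigned char, so the two never compare equal.
bool naive_equal(unsigned char byte, char c)
{
    return byte == c;
}

// Converting both sides to unsigned char first gives the same answer
// whether plain char is signed or unsigned.
bool byte_equal(unsigned char byte, char c)
{
    return byte == static_cast<unsigned char>(c);
}
```

    For values in the ASCII range both comparisons agree, which may be one reason a mismatch like this goes unnoticed on text files.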

  • IDK | Posts: 1,784 | Member
    Thanks, I'm always learning something new.
  • tsagld | Posts: 621 | Member
    You say that the code works fine on text files and doesn't work on binary files. That seems logical to me, since you call strlen() on the data in several places.
    If the data contains a zero byte, strlen() interprets it as the end of the string, which is probably not what you want.
    Binary data may (and is even likely to) contain zero bytes; text data normally doesn't.
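
    A small sketch of the difference, with made-up helper names: length_seen_by_strlen shows what strlen() reports for a byte buffer, and contains_byte searches using a length that is passed in explicitly, so zero bytes are treated as ordinary data.

```cpp
#include <cstddef>
#include <cstring>

// What strlen() reports for a byte buffer: the number of bytes before
// the first zero byte, not the real size of the buffer. The buffer
// must contain a zero byte somewhere, or this reads past its end.
std::size_t length_seen_by_strlen(const unsigned char* data)
{
    return std::strlen(reinterpret_cast<const char*>(data));
}

// Search with an explicit size, so zero bytes are ordinary data.
bool contains_byte(const unsigned char* data, std::size_t size, unsigned char value)
{
    for (std::size_t i = 0; i < size; ++i)
    {
        if (data[i] == value)
            return true;
    }
    return false;
}
```

    For an 8-byte buffer whose 6th byte is zero, length_seen_by_strlen returns 5, so anything that relies on it never looks at the last three bytes. In the posted program the real length is already known (filesize), so it can be passed along instead of being recomputed with strlen().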


    Greets,
    Eric Goldstein
    http://www.gvh-maatwerk.nl


  • saw7988 | Posts: 14 | Member
    Oh wow, the strlen thing didn't even occur to me with zero bytes, I'll try to change that and see if it helps. Thanks
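
    For anyone landing on this thread later, here is one possible shape for that change. It is a sketch under stated assumptions, not the poster's final code: it keeps the same marker/count/value scheme as the posted program, but carries lengths in std::vector instead of calling strlen(). The names Bytes, run_length, find_marker and rle_compress are invented for this example, and it assumes 8-bit bytes.

```cpp
#include <cstddef>
#include <vector>

typedef std::vector<unsigned char> Bytes;

// How many times data[start] repeats starting at start, capped at 255
// (the count must fit in one byte) and at the end of the input.
std::size_t run_length(const Bytes& data, std::size_t start)
{
    std::size_t reps = 1;
    while (start + reps < data.size() && reps < 255 &&
           data[start + reps] == data[start])
        ++reps;
    return reps;
}

// Pick a byte value that never occurs in data, or -1 if all 256 occur.
int find_marker(const Bytes& data)
{
    bool used[256] = { false };
    for (std::size_t i = 0; i < data.size(); ++i)
        used[data[i]] = true;
    for (int v = 0; v < 256; ++v)
    {
        if (!used[v])
            return v;
    }
    return -1;
}

// Output layout: the marker byte first, then either literal bytes or
// (marker, count, value) triples for runs of four or more.
// Returns false when no unused byte value is available as a marker.
bool rle_compress(const Bytes& data, Bytes& out)
{
    const int marker = find_marker(data);
    if (marker < 0)
        return false;

    out.clear();
    out.push_back(static_cast<unsigned char>(marker));

    std::size_t pos = 0;
    while (pos < data.size())
    {
        const std::size_t reps = run_length(data, pos);
        if (reps < 4)
        {
            out.insert(out.end(), reps, data[pos]);   // copy short runs as-is
        }
        else
        {
            out.push_back(static_cast<unsigned char>(marker));
            out.push_back(static_cast<unsigned char>(reps));
            out.push_back(data[pos]);
        }
        pos += reps;
    }
    return true;
}
```

    The output length is simply out.size(), so the write call can use that instead of strlen(). Letting the vector grow also sidesteps the case where the output ends up longer than the input, and the bounds check in run_length avoids reading one byte past the end of the data.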