[advanced] Reading from a file, part I

Hi guys, am I the first girl to post here :) ?

Ok, let's get to the point:

let's say there is a text file with this type of

content:

[string] [int] [char] [int] [char] [string]

...

...

or

[Beta] [5],[0] [;] [beta load 4.0 build 127]

...

...

(not always in this order; ints might be first)


How would I recognize whether the data I'm reading is

a string (char[]), integer, double, or a single char?

Notice: everything after the character [;] is considered

a comment.

So the string "beta load 4.0 ..." is a "full"

string; numbers 4.0 and 127 are part of it (so I would

read it as a string)


Any ideas? I'm kinda new to files...


Comments

  • : Hi guys, am I the first girl to post here :) ?


    Probably not, but probably one of the first to announce it.




    : Ok, let's get to the point:


    Damn fine idea.




    : let's say there is a text file with this type of

    : content:

    : [string] [int] [char] [int] [char] [string]

    : ...

    : ...

    : or

    : [Beta] [5],[0] [;] [beta load 4.0 build 127]

    : ...

    : ...

    : (not always in this order; ints might be first)


    : How would I recognize whether the data I'm reading is

    : a string (char[]), integer, double, or a single char.


    Well, without getting down into language grammars, there's still a relatively simple way to explain it. However, it requires that your data be unambiguous. For example, if either of your 'int' fields can come in any order, how do you tell which number means what?


    If, before the comment, any field can come in any position, then you must have a way to tell the different inputs apart. The easiest way to tell a string from a number is to decide that if it starts with a letter, then it's a string, and if it starts with a digit, then it's an integer. I know this is insufficient to deal with floating point numbers in addition to ints, but bear with me -- this simple case may give you ideas for more complex cases.


    For each field, I'd recommend reading until the first non-whitespace character using getc() in C or stream.get() for some stream in C++. The moment you detect a non-whitespace character, examine it.


    If it's the comment character (;), then you can immediately break out of your field processing loop and deal with the comment separately. If it's a digit or a letter, then in both cases put the character back into the stream (ungetc() in C, and stream.putback() in C++) so that it is still there when you read the number or string.


    Once you've put back the digit or letter, you can then use standard string reading or number reading (fscanf in C, or using the >> operator with a stream in C++). The put-back character will be a part of the input. You don't have to handle it separately.
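    Here is a rough C sketch of that peek-and-put-back idea. The names field_kind, peek_field and count_fields are made up for this example, and it assumes the fields are separated by whitespace (so "5 0" rather than "5,0") and that words fit in 63 characters:

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>

/* Hypothetical field kinds for this sketch; the names are not from the post. */
enum field_kind { FIELD_END, FIELD_COMMENT, FIELD_NUMBER, FIELD_STRING };

/* Skip whitespace (including newlines), look at the next character, and
 * push it back unless it is the ';' that starts a comment. */
enum field_kind peek_field(FILE *in)
{
    int c;

    assert(in != NULL);
    do {
        c = getc(in);
    } while (c != EOF && isspace(c));

    if (c == EOF)
        return FIELD_END;
    if (c == ';')
        return FIELD_COMMENT;   /* ';' stays consumed; comment text follows */

    ungetc(c, in);              /* one character of pushback is guaranteed */
    return isdigit(c) ? FIELD_NUMBER : FIELD_STRING;
}

/* Example driver: count the integers and words that come before the
 * comment (or end of input). Returns the kind that stopped the loop. */
enum field_kind count_fields(FILE *in, int *ints, int *words)
{
    enum field_kind kind;
    int number;
    char word[64];

    *ints = 0;
    *words = 0;
    while ((kind = peek_field(in)) == FIELD_NUMBER || kind == FIELD_STRING) {
        if (kind == FIELD_NUMBER) {
            if (fscanf(in, "%d", &number) != 1)
                break;
            ++*ints;
        } else {
            if (fscanf(in, "%63s", word) != 1)
                break;
            ++*words;
        }
    }
    return kind;
}
```

    Because only the first character is examined, this cannot tell 4 from 4.0, and it treats anything that doesn't start with a digit (a minus sign, punctuation) as a string.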


    The problem with this technique is that you are only guaranteed to be able to put back a single character into a stream. Therefore, if you need to look further ahead than a single character to find out whether to read in an int or a double or a string (which is the case here, since 4 and 4.0 start the same way), you'll have to try the following:


    Read in characters until you reach a non-whitespace. From that point on, read characters until you reach a whitespace, putting the read characters into a buffer. When you're finished reading this 'whitespace delimited' input, you can null-terminate the character buffer you're dumping into. You can then examine the character buffer to decide if it's a number or a string, and if it's a number, to determine whether or not there's a decimal point. If it's a number, then you can use sscanf or an istrstream object (std::istringstream in newer C++) to parse the buffer and extract the relevant value.


    You could also analyze the buffer when it's a number and extract the value by yourself, but sscanf and istrstream already have this capability, so it is probably easier to go this route.
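    A minimal C sketch of the buffer approach might look like this. The names token_kind, read_token and classify_token are invented for the example. Note that it uses strtol and strtod instead of sscanf, because their end pointer makes it easy to check that the whole token was a number. It also treats any one-character token that isn't a number as a single char, which is an assumption about your format:

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical token kinds for this sketch. */
enum token_kind { TOK_NONE, TOK_INT, TOK_DOUBLE, TOK_CHAR, TOK_STRING };

/* Skip whitespace, then copy characters up to the next whitespace into
 * buf and null-terminate it. Returns the stored length (0 at end of
 * input). Tokens longer than the buffer are truncated. */
size_t read_token(FILE *in, char *buf, size_t size)
{
    size_t len = 0;
    int c;

    assert(in != NULL && buf != NULL && size > 0);
    do {
        c = getc(in);
    } while (c != EOF && isspace(c));

    while (c != EOF && !isspace(c)) {
        if (len + 1 < size)
            buf[len++] = (char)c;
        c = getc(in);
    }
    buf[len] = '\0';
    return len;
}

/* Decide what a token is. *ival or *dval is set only for the kind that
 * is returned (TOK_INT or TOK_DOUBLE respectively). */
enum token_kind classify_token(const char *buf, long *ival, double *dval)
{
    unsigned char first = (unsigned char)buf[0];
    char *end;

    if (first == '\0')
        return TOK_NONE;

    /* Only try numeric parsing when the token looks like a number, so
     * that words such as "nan" or "inf" stay strings. */
    if (isdigit(first) || first == '+' || first == '-' || first == '.') {
        long l;
        double d;

        errno = 0;
        l = strtol(buf, &end, 10);
        if (end != buf && *end == '\0' && errno != ERANGE) {
            *ival = l;
            return TOK_INT;
        }
        d = strtod(buf, &end);
        if (end != buf && *end == '\0') {
            *dval = d;
            return TOK_DOUBLE;
        }
    }
    return buf[1] == '\0' ? TOK_CHAR : TOK_STRING;
}
```

    With this, "127" comes back as an int, "4.0" as a double, "Beta" as a string and ";" as a single char, so the caller can check for the comment character before doing anything else with the token.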


    Obviously, the more complex your input, or the more complex your recognition of different input types, the harder it will be to create a rock-solid input mechanism.


    There are techniques for parsing data that don't require you to maintain a buffer like this, but they're a bit tougher to implement. I'll leave this idea out, for the moment. If you want to discuss it, post. :)



    : Notice: every thing after the character [;] is considered

    : a comment.

    : So, that the string "beta load 4.0 ..." is a "full"

    : string; numbers 4.0 and 127 are part of it (so I would

    : read it as a string)


    Well, no matter what technique you use, once you hit a ;, you can deal with the rest of the line as you see fit (as a string, I guess).
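    For example, assuming the ; itself has already been read, something like the following would grab the rest of the line as one string. read_comment is a made-up name, and the sketch assumes the comment fits in the buffer you pass in:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Call this right after the ';' has been consumed. Reads the rest of the
 * current line into buf, strips the line ending, and returns a pointer
 * to the first non-blank character inside buf (NULL at end of input).
 * If the line is longer than the buffer, the remainder stays unread. */
char *read_comment(FILE *in, char *buf, size_t size)
{
    assert(in != NULL && buf != NULL && size > 0);

    if (fgets(buf, (int)size, in) == NULL) {
        buf[0] = '\0';
        return NULL;
    }
    buf[strcspn(buf, "\r\n")] = '\0';   /* drop the trailing newline */
    return buf + strspn(buf, " \t");    /* skip blanks after the ';' */
}
```

    The numbers inside the comment (4.0 and 127 in your example) never go through any number parsing this way, so they simply remain part of the string.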


    : Any ideas? I'm kinda new to files...


    I hope this helps.


    Good luck.



