Efficient text file parsing - Programmers Heaven

Efficient text file parsing

thegreenstar Posts: 173 Member
I want to make a parser that will load info into a struct from a file. The format looks like this:

[code]
%Block Number One%
3
000.222.112.003.005.013+
321.012.012.350.015.015+
556.651.651.854.984.120+
[/code]

Where the first line (%Block Number One%) is loaded into a std::string, 3 is the number of block lines, and each block line is subdivided into cells. The cells are separated by '.' characters, with '+' marking the end of a line. I would actually rather eliminate the '+' and use a newline instead. Anyway, it is supposed to load each 'cell' into a struct containing three integers, and each digit (like each 0 in 000, or the two ones and the two in 112) is loaded into one of the three integers, so if we had

[code]
struct cell
{
int one, two, three;
};
[/code]


if we loaded 152 into it, one would equal 1, two would equal 5, and three would equal 2. Also, it needs to set one, two and three to -1 if the separator is a + sign (or a newline, which would be better). Sorry about the huge post, but I have no idea how to do this efficiently. Thanks.

Comments

  • blitzblitz Posts: 620Member

    This assumes that you have the file already open in text mode and the file pointer is positioned at the first line to be read:
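    [code]
    #include <cstdio>
    #include <vector>

    // Uses struct cell from the question.
    // Reads n_lines block lines from f and appends their cells to cells.
    // After each line, a cell with all members set to -1 is appended
    // as an end-of-line marker.
    void read_lines(FILE *f, size_t n_lines, std::vector<cell> &cells)
    {
        char buff[1024], *p;
        cell tmp_cell;

        while (n_lines--) {
            if (!fgets(buff, sizeof buff, f))
                break;                  // EOF or read error

            p = buff;
            do {
                // Each cell is three digits followed by a separator.
                tmp_cell.one = p[0] - '0';
                tmp_cell.two = p[1] - '0';
                tmp_cell.three = p[2] - '0';

                cells.push_back(tmp_cell);
                p += 4;
            } while (p[-1] == '.');     // stop at '+', a newline, or anything else that is not '.'

            tmp_cell.one = tmp_cell.two = tmp_cell.three = -1;
            cells.push_back(tmp_cell);
        }
    }
    [/code]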
    It also assumes that the lines have a known maximum length (1023 characters plus the terminator with this buffer). For arbitrarily long lines, you may have to read the input character by character...
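For lines of arbitrary length, one alternative to reading character by character is to let std::getline grow a std::string as needed. The following is only a sketch under some assumptions: the function name read_lines_stream is made up for illustration, it takes a std::istream instead of a FILE*, every cell is assumed to be exactly three digits, and a trailing '+' is treated as optional so the same code also accepts plain newline-terminated lines.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>   // only needed for the std::istringstream usage shown below
#include <string>
#include <vector>

struct cell
{
    int one, two, three;
};

// Hypothetical stream-based variant of read_lines.
// Reads n_lines block lines and appends their cells to cells, followed by a
// {-1, -1, -1} end-of-line marker after each line.
// Returns false if a line is missing or does not match "ddd.ddd. ... .ddd[+]".
bool read_lines_stream(std::istream &in, std::size_t n_lines, std::vector<cell> &cells)
{
    std::string line;

    while (n_lines--) {
        if (!std::getline(in, line))
            return false;               // fewer lines than announced

        // Drop an optional carriage return and an optional trailing '+'.
        if (!line.empty() && line[line.size() - 1] == '\r')
            line.erase(line.size() - 1);
        if (!line.empty() && line[line.size() - 1] == '+')
            line.erase(line.size() - 1);

        // A valid line is "ddd" followed by zero or more ".ddd" groups,
        // so its length is always 3 modulo 4.
        if (line.size() % 4 != 3)
            return false;

        for (std::size_t i = 0; i < line.size(); i += 4) {
            for (std::size_t k = 0; k < 3; ++k)
                if (line[i + k] < '0' || line[i + k] > '9')
                    return false;       // not a digit
            if (i + 3 < line.size() && line[i + 3] != '.')
                return false;           // wrong separator

            cell c;
            c.one = line[i] - '0';
            c.two = line[i + 1] - '0';
            c.three = line[i + 2] - '0';
            cells.push_back(c);
        }

        cell end_marker = { -1, -1, -1 };
        cells.push_back(end_marker);
    }
    return true;
}
```

It can be tried without a file by wrapping a string in a std::istringstream and passing that as the first argument; a std::ifstream works the same way.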

    Also, here's a test function that uses the values stored in the cells vector to get back to the initial format (from the input file):
    [code]
    #include <cstdio>
    #include <vector>

    // Prints the cells back out in the input file's format.
    // A cell whose members are -1 marks the end of a line.
    void print_cells(std::vector<cell> const &cells)
    {
        char buff[] = "000";
        bool first = true;

        for (std::vector<cell>::const_iterator it = cells.begin(); it != cells.end(); ++it) {
            if (it->one >= 0) {
                buff[0] = '0' + it->one;
                buff[1] = '0' + it->two;
                buff[2] = '0' + it->three;
                if (!first) putchar('.'); else first = false;
                printf("%s", buff);
            } else {
                puts("+"); first = true;
            }
        }
    }
    [/code]
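The read_lines function expects the file pointer to already sit on the first block line, so the block name and the line count have to be consumed first. Here is a minimal sketch of that step. The helper name read_block_header is made up for illustration, and it assumes the name line is wrapped in '%' characters that should be stripped before storing it, and that both header lines fit in the buffer.

```cpp
#include <cstddef>
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <string>

// Hypothetical helper: reads the "%Name%" line and the line-count line.
// On success, name holds the text between the '%' signs (an assumption about
// what should be stored), n_lines holds the count, and f is positioned at the
// first block line.
// Returns false on EOF, a malformed name line, or a non-numeric count.
bool read_block_header(std::FILE *f, std::string &name, std::size_t &n_lines)
{
    char buff[1024];

    // First line: %Block Number One%
    if (!std::fgets(buff, sizeof buff, f))
        return false;
    buff[std::strcspn(buff, "\r\n")] = '\0';    // cut off the line ending
    std::size_t len = std::strlen(buff);
    if (len < 2 || buff[0] != '%' || buff[len - 1] != '%')
        return false;
    name.assign(buff + 1, len - 2);

    // Second line: number of block lines that follow
    if (!std::fgets(buff, sizeof buff, f))
        return false;
    char *end;
    unsigned long n = std::strtoul(buff, &end, 10);
    if (end == buff)
        return false;                           // no digits found
    n_lines = static_cast<std::size_t>(n);
    return true;
}
```

After a successful call, the resulting n_lines can be passed straight to read_lines along with the same file pointer.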

    Regards,
    Blitz
