Problems with creating an index - Programmers Heaven

Howdy, Stranger!

It looks like you're new here. If you want to get involved, click one of these buttons!

Categories

Welcome to the new platform of Programmer's Heaven! We apologize for the inconvenience caused, if you visited us from a broken link of the previous version. The main reason to move to a new platform is to provide more effective and collaborative experience to you all. Please feel free to experience the new platform and use its exciting features. Contact us for any issue that you need to get clarified. We are more than happy to help you.

Problems with creating an index

sandratzsandratz Posts: 18Member
I wish my assignment was easy this semester but it isn't; it's very hard and complicated. I've been working in it for weeks now and my program still has bugs and logical errors.
I was wondering if you could please help me with this assignment. I would be very grateful.

I've included below a copy of what is expected for this assignmne , and below that, I've copied and pasted my code.

I commented the codes which gave compilation errors. I used Borland to compile the program.
It has the following bugs:
-It doesn't append words to the left and right nodes
-It doesn't store the page numbers into the linked list
-It doesn't filter out noise words
-It stores duplicate words
-The query function has problems

I would kindly appreciate your help.

Thanks in advance,

Introduction
This assignment deals with the construction of an index for a text book. You are provided with an ilustration of an Abstract Data Type (ADT) - index, with a well-defined interface.
You will need to define structs to represent this ADT, and use typedefs to create the ADT Index. The interface of the ADT consists of the following function prototypes.
void IndexInit(Index*);
void IndexAppend(Index*, char*, int);
void IndexPrint(Index*);
int IndexQuery(Index*, char*, int**);
void IndexDestroy(Index*);


Task Description
Part A - Implementation of the Client Program
Write a program to build an index for a text book and allow a user to query the index, i.e. to find page numbers given a word.
In detail, the program must perform the following tasks.
Declare an instance of Index ADT.
Read a chapter of the book titled "Open Sources - Voices of Open Source Revolution" the chapter is titled A Brief History of Hackerdom <./book.txt> , tokenise the input into words using the strtok() function, and construct the index. the input text file is separated into pages using tages. (Roughly there are 40 lines per page in the input file). So for example, page 3 begins immediately after the second tag.
The following code fragment illustrates how strtok() function can be used to tokenise a string into words.
#define SEPARATORS " .,?"
"
....
....
char *line, *word;
....
....

gets(line);
if (strlen(line)<=1) break;
word = strtok(line,SEPARATORS);
while(word!=NULL)
{
printf("%s
",word);
word = strtok(NULL,SEPARATORS);
}
....

Note that, not all the words in the book are inserted to the index. For each of the words tokenised from the book, check if it appears in the common-words file stopped-words.txt <./stopped-words.txt> and insert only if it does not appear in the common words file.
Display the entire index of the book on the standard output, each entry in a separate line.
Let the user to enter a word from the standard input and output the list of page numbers on the standard output. The output should contain only the input word and a comma separated list of page numbers. If the word appears in more than two adjacent pages, the list should be displayed as a range, as illustrated below. No other junk should be sent to the output. Let the user to continue querying, until he enters the word 'enough'.
Destroy the entire index.
Part B - Implementation of the Index ADT
Implement the above data structure using the binary tree-based representation and the associated functions to manipulate the ADT Index illustrated in Figure 01.
The Index ADT is a homogeneous collection of records, where each record comprises of a word and a list of page numbers that word appear in the text. The list of page numbers for each word is stored in an integer linked-list (as illustrated vertically in the diagram), and the list of words are themselves stored in the nodes of the binary tree.
The task of each function in the interface is given below.
IndexInit - Initialises an Index instance. It initialises the 'words' pointer (top-level pointer) to NULL, and the number of words to 0.
IndexAppend - appends an entry into the index. It first checks if the word already exists in the words tree, and if it exists, adds the page number into the page number list of that word. However, if the entered page number already exists in the corresponding list, it simply discards the new entry. If the word doesn't exist in the words list, it appends a new node into the top-level list and appends a node into its (new node's) page number list.
In short, the data structure does not allow duplicates either in the top-level (words) tree or page numbers lists. Top-level tree is maintained in the alphabatical order of words, and page number lists are maintained in the ascending order of page numbers. Each insertion should adhere these rules.
IndexPrint - displays the entire index, as appeared at the end of a text book, i.e. each indexed word in a separate line, followed by a comma separated ordered list of page numbers. If more than two adjacent page numbers appear in a page list, they should be displayed as a range, i.e. if the term 'Abstract' appear in pages 10, 45, 46, 47 and 48, it should be displayed as;
Abstract 10, 45 - 48.
The output should be in the alphabatical order of terms.
IndexQuery - reads in an Index ADT instance and a word as the input and returns the list of page numbers on which the word appears. The thrid argument in the prototype is an array of integers, which is used to return the results set. The function also returns the number of page numbers n the results set as well. (The return value of the function is used for this purpose).
IndexDestroy - as its name explains, it destroys the entire ADT instance and deallocates memory used for the whole data structure.
You may define as many local functions as necessary to construct the above functions in a well structured form, however, the ADT interface should include only the above-listed functions. In other words, any client program should access the Index ADT only using these functions, not any other local functions.
You must use a Makefile to compile the ADT implementation and the client program together, and your submission must include the Makefile used to compile your program.


IN MY HEADER FILE, I TYPED THE FOLLOWING:

//asgn.h
struct pagelist {
struct pagelist *next;
int page;
};

struct node {
char *word;
struct pagelist *head;
struct node *left;
struct node *right;
int size;
};

typedef struct node NODE;
typedef struct node *Index;

void IndexInit(Index *);
void IndexAppend(Index*, char*,int);
void IndexPrint(Index *);
int IndexQuery(Index*,char*,int**);
void IndexDestroy(Index *);

My implementation is as follows:
//asgn.c



#include
#include
#include
#include
#include "asgn.h"

#define MAX 10000

static char buff[MAX] = {0};

void IndexInit(Index *);
void IndexAppend(Index *, char *,int);
void IndexPrint(Index *);
int IndexQuery(Index*,char*,int**);
int noise(char *s);
char *tokeniser(char **, char *);
char *char_in_string(char *, int );
int compareString(const char *, const char *);
int PageNumbers(struct pagelist *);
struct pagelist *addPageNo(int);
int findWord (Index *, char);
void printPageNumbers(struct pagelist *);
char *GetFile1(char *,int, FILE *);
char *GetFile2(char *,int, FILE *);

char *word;
char *s=NULL;
char *SEPARATORS = " ";

void IndexPrint(Index *);


void IndexInit(Index *ind) {


(*ind) -> word = NULL;
(*ind) -> size = 0;
printf("Init. success.");
}

void IndexAppend(Index *ind, char *word, int page) {
int diff;
//struct pagelist *pl = malloc(sizeof(*pl));
*ind = (Index)malloc(sizeof(NODE));


diff = compareString((*ind)->word, word);
if ((*ind) !=NULL) {

(*ind)->word = word;
printf("%s ",(*ind)->word);
printf("%d
",page);
(*ind)->head = addPageNo(page);
//IndexPrint(&ind);

}
else
if (diff > 0) {
IndexAppend(&((*ind)->left), word, page);
}
else
IndexAppend(&((*ind)->right), word, page);


}

void IndexPrint(Index *ind) {

if ((*ind) != NULL) {
//printf("%s ", (ind)->word);
//printPageNumbers((*ind)->head);
}
}


int IndexQuery(Index *ind,char *word,int **results) {

//if (!findWord(&ind,*word)) {

// printf("Word not found.");

//}
//else *results = findWord(&ind,*word);

return **results;
}


void IndexDestroy(Index *ind) {
if((*ind) != NULL) {
free(*ind);
*ind=NULL;
}
}




int noise(char *c) {
int found = 0;

//while (GetFile2(buff, sizeof(buff),stdin) != NULL) {
//s=buff;
//while ((word=tokeniser(&s,SEPARATORS)) != NULL) {
if (c==s)
found=1;
//}
//}
return found;

}

char *tokeniser(char **s, char *delims) {
char *p = NULL;
char *q;

if ((s != NULL) && (*s != '') && (delims != NULL)) {
//pass over leading delimeters
while (NULL != char_in_string(delims, **s)) {
++*s;
}
if (**s != '') {
q = *s + 1;
p = *s;
while (*q != '' && NULL == char_in_string(delims,*q)) {
++q;
}
*s = q + (*q != '');
*q='';
}
}
return p;
}

char *char_in_string(char *s, int c) {
char *p = NULL;

if (s != NULL) {
if (c != '') {
while (*s != '' && *s != c) {
++s;
}
if (*s == c) {
p = s;
}
}
}
return p;
}

int compareString(const char *s, const char *t) {
int diff = 0;
char cs;
char ct;

while (diff==0 && *s !='' && *t != '') {
cs = tolower((unsigned char)*s);
ct = tolower((unsigned char) *t);

if (cs < ct) {
diff=-1;
}
else if (cs>ct) {
diff = 1;
}
++s;
++t;
}

if (diff==0 && *s != *t) {
if (*s == '') {
diff =-1;
}
else {
diff=1;
}
}
return diff;
}


struct pagelist* addPageNo(int page) {
struct pagelist *new = malloc(sizeof(*new));

if (new !=NULL) {
new->page = page;
}
return new;
}


void printPageNumbers(struct pagelist *pl){
if (pl != NULL) {
printPageNumbers(pl->next);
printf("%d ", pl->page);
}
}


int PageNumbers(struct pagelist *pl){
int **result;

if (pl != NULL) {
PageNumbers(pl->next);
**result = pl->page;
}
return **result;
}


char *GetFile1(char *s,int n, FILE *fp) {
int c;

char *p=s;

fp = fopen("book.txt","r");
while ( (--n > 0) && (c=getc(fp)) !=EOF) {

*p++ = c;


}
return p;
}


char *GetFile2(char *s,int n, FILE *fp) {
int c;

char *p=s;

fp = fopen("stopped-words.txt","r");
while ( (--n > 0) && (c=getc(fp)) !=EOF) {

*p++ = c;


}
return p;
}

//int findWord (Index *ind, char w) {
//int **results;

//if (ind != NULL) {
//if ((*ind)->word == w){
//*results = PageNumbers((*ind)->head);
//}
//}
//else if ((*ind)->word != w)
// findWord(&((*ind)->left), w);
//else findWord(&((*ind)->right), w);

//return *results;
//}


THE DRIVER IS AS FOLLOWS:
#include
#include
#include
#include
#include "asgn.c"



void main(void) {
Index ind;


#define MAXLINE 40000

char *s=NULL;

static char buffer[MAXLINE] = {0};

char *word;
char *pg="END";
int pagenum = 1;
char answer ='y';
char *w=NULL;
int **results;

char *SEPARATORS = "{}. ,?;";


//IndexInit(&ind);



while (GetFile1(buffer, sizeof(buffer),stdin) != NULL) {

s=buffer;
if (s==pg) {
++pagenum;
}
while ((word=tokeniser(&s,SEPARATORS)) != NULL) {


if (noise(word) == 0) {
IndexAppend(&ind,word,pagenum);
}


else {
continue;
}
}

}


while (answer != 'n') {
printf("Enter a word to be searched: ");
scanf("%s", &w);
**results = IndexQuery(&ind,w,results);

printf("Continue with Query? ");
scanf("%c", &answer);
}

IndexDestroy(&ind);
}




Comments

  • LundinLundin Posts: 3,711Member
    What is the problem? What is that you don't understand?
    Ask a specific question and use code tags when posting code:

    [leftbr]code[rightbr]
    You code here.
    [leftbr]/code[rightbr]

  • bilderbikkelbilderbikkel Posts: 754Member
    (Please use code tags next time)

    For debugging tricks, e.g. look at www.CodePedia.com/1/CppDebugging .
    Why do you use char* so often instead of std::string (see www.CodePedia.com/1/CppString) ?
    Try to use more assert statements (see www.CodePedia.com/1/Assert), they will help you find errors for you.

    Also, please ask a specific question.
    bilderbikkel

  • stoberstober Posts: 9,765Member ✭✭✭
    : I wish my assignment was easy this semester but it isn't; it's very hard and complicated. I've been working in it for weeks now and my program still has bugs and logical errors.


    Welcome to the real-world of programming! After graduation from college you can expect those types of problems (and even bigger ones) for the next 40 or so years.
  • WerewolfWareWerewolfWare Posts: 304Member
    [b][red]This message was edited by WerewolfWare at 2005-9-27 4:44:27[/red][/b][hr]
    [b][red]This message was edited by WerewolfWare at 2005-9-27 4:43:11[/red][/b][hr]
    I don't know if you've tried this already, but I find it very useful, when I counter an error, to put the entire code under a comment except for the first function you've written and check if every individual function work's properly. If this doesn't work, print out all the files, delete them from your computer, realgorithemize and start over again. I know this may sound stupid, but it actually works.

    Nobody here will do your work for you - meaning: don't ask for help with long codes (even if they are easy to figure out).

    Hope I helped , WEREWOLF.

    P.S.
    Here are some possible errors I've encounter (I'm writing from experience with TCLite so perhapse these corrections are'nt valid for your compiler):
    1.[code](*ind)->word = word;[/code]
    this is wrong. The proper way to address a certain field in a
    struct pointer is:
    I. ind->word
    II. (*ind).word
    Another correction in this line is: you can't assign the value of
    one string to another with the operator '='.

    2.[code]diff = compareString((*ind)->word, word);[/code]
    I think you are trying to address strcmp since as far as I know,
    there is no function (as far as I know) called compareString()
    And again , remark 1 is definetly valid in this line as well.

    3.The sequance:
    [code]diff = compareString((*ind)->word, word);
    if ((*ind) !=NULL) { [/code]

    is not efficient in terms of time - since you don't need diff in
    the if block, you should move it to the end of the block.

    4.[code]IndexAppend(&((*ind)->left), word, page);[/code]
    Again comment no.1, and also, since the field left is a pointer, the
    proper way to transfer it (as far as I remember) is:
    [code]IndexAppend(ind->left , word , page)[/code]

    5.[code]printf("%s ", (ind)->word); [/code]
    You don't need to put () everytime you want to address a field.
    [code]printf("%s ", ind->word); [/code]
    will do just fine.


    At this point I stopped checking because your code is long and I taught myself C programming when I was in the 8th grade with a book suitable for people who already learnt C once, so my programming is not as perfect as it could be. Plus, I'm only in high school so I doubt we have the same way of thinking about programming.

    I hope you take my corrections to your attention and comment back if you think I'm wrong so I will know better for next time.

    WEREWOLF



  • stoberstober Posts: 9,765Member ✭✭✭
    [b][red]This message was edited by stober at 2005-9-27 6:28:53[/red][/b][hr]
    : I don't know if you've tried this already, but I find it very useful, when I counter an error, to put the entire code under a comment except for the first function you've written and check if every individual function work's properly. If this doesn't work, print out all the files, delete them from your computer, realgorithemize and start over again. I know this may sound stupid, but it actually works.
    :

    "supiid" ? no. I've done that too. C/C++ compilers are a lot better at identifying errors than other types of compilers. For example, and SQL compiler I use at work sometimes just says "error near #", meaning that an error occurred somewhere between the first occurrence if the character '#' and end-of-file. Commenting out large blocks of code is about the only way to find such stupid err messages.





  • sandratzsandratz Posts: 18Member
    : What is the problem? What is that you don't understand?
    : Ask a specific question and use code tags when posting code:

    I've encountered the following problems with my code & :
    I commented the codes which gave compilation errors. I used Borland to compile the program.

    It has the following bugs and I don't know how to fix them:
    -It doesn't append words to the left and right nodes
    -It doesn't store the page numbers into the linked list
    -It doesn't filter out noise words
    -It stores duplicate words
    -The query function has problems
    I've include the requirements for my assignment and what is required so that you can help and my code; I used 3 files; asgn1.h, asgn1.c and asgndriver.c Can u pls tell me where I went wrong?


    Here is an introduction of the assignment
    This assignment, deals with the construction of an index for a text book. The Abstract Data Type (ADT) - index, which is illustrated in the Figure 01, is to be implemented as a top-down Red-Black Tree.
    An instance of ADT Index can accommodate the entire index of a book, i.e. an ordered list of distinct words and for each word, an ordered list of distinct page numbers.


    You will need to define structs to represent this ADT, and use typedefs to create the ADT Index. The interface of the ADT consists of the following function prototypes.
    void IndexInit(Index*);
    void IndexAppend(Index*, char*, int);
    void IndexPrint(Index*);
    int IndexQuery(Index*, char*, int**);
    void IndexDestroy(Index*);

    Task Description
    Part A - Implementation of the Client Program
    Write a program to build an index for a text book and allow a user to query the index, i.e. to find page numbers given a word.
    In detail, the program must perform the following tasks.
    1. Declare an instance of Index ADT.
    2. Read a chapter of the book titled "Open Sources - Voices of Open Source Revolution" The chapter is titled A Brief History of Hackerdom , tokenise the input into words using the strtok() function, and construct the index. The input text file is separated into pages using tags. (Roughly there are 40 lines per page in the input file). So for example, page 3 begins immediately after the second tag.
    The following code fragment illustrates how strtok() function can be used to tokenise a string into words.
    3. #define SEPARATORS " `~!@#$%^&*()_-+=[{]}|;:'",./?"
    4. ....
    5. ....
    6. char *line, *word;
    7. ....
    8. ....
    9.
    10. gets(line);
    11. if (strlen(line)<=1) break;
    12. word = strtok(line,SEPARATORS);
    13. while(word!=NULL)
    14. {
    15. printf("%s
    ",word);
    16. word = strtok(NULL,SEPARATORS);
    17. }
    18. ....
    19.
    Note that, not all the words in the book are inserted to the index. For each of the words tokenised from the book, check if it appears in the common-words file stopped-words.txt and insert only if it does not appear in the common words file.
    20. Display the entire index of the book on the standard output, each entry in a separate line.
    21. Let the user to enter a word from the standard input and output the list of page numbers on the standard output. The output should contain only the input word and a comma separated list of page numbers. If the word appears in more than two adjacent pages, the list should be displayed as a range, as illustrated below. No other junk should be sent to the output. Let the user to continue querying, until he enters the word 'enough'.
    22. Destroy the entire index.
    Part B - Implementation of the Index ADT
    Implement the above data structure using a top-down Red-Black Tree representation and the associated functions to manipulate the ADT Index illustrated in Figure 01. Note that, to accommodate easy navigation upwards the tree, each node contains a link to its parent node. You must include a flag (possibly a char) to indicate the colour of the node. You must implement the balancing rotations and colour flips using a non-recursive approach.
    The Index ADT is a homogeneous collection of records, where each record comprises of a word and a list of page numbers that word appear in the text. The list of page numbers for each word is stored in an integer linked-list (as illustrated vertically in the diagram), and the list of words are themselves stored in the nodes of the Red-Black tree.
    The task of each function in the interface is given below.
    1. IndexInit - Initialises an Index instance. It initialises the 'words' pointer to NULL, and the number of words to 0.
    2. IndexAppend - appends an entry <word, page> into the index. It first checks if the word already exists in the words tree, and if it exists, adds the page number into the page number list of that word. However, if the entered page number already exists in the corresponding list, it simply discards the new entry. If the word doesn't exist in the words tree, it appends a new node into the Red-Black tree and appends a node into its (new node's) page number list.
    In short, the data structure does not allow duplicates either in the top-level (words) tree or page numbers lists. Top-level tree is maintained in the alphabetically ordered Red-Black tree of words, and page number lists are maintained in the ascending order of page numbers. Each insertion should adhere these rules.
    3. IndexPrint - displays the entire index, as appeared at the end of a text book, i.e. each indexed word in a separate line, followed by a comma separated ordered list of page numbers. If more than two adjacent page numbers appear in a page list, they should be displayed as a range, i.e. if the term 'Abstract' appear in pages 10, 45, 46, 47 and 48, it should be displayed as;
    4. Abstract 10, 45 - 48.
    The output should be in the alphabetical order of words.
    5. IndexQuery - reads in an Index ADT instance and a word as the input and returns the list of page numbers on which the word appears. The third argument in the prototype is an array of integers, which is used to return the results set. The function also returns the number of page numbers n the results set as well. (The return value of the function is used for this purpose).
    6. IndexDestroy - as its name explains, it destroys the entire ADT instance and deallocates memory used for the whole data structure.
    You may define as many local functions as necessary to construct the above functions in a well structured form, however, the ADT interface should include only these functions. In other words, any client program should access the Index ADT only using these functions, not any other local functions.
    You must use a Makefile to compile the ADT implementation and the client program together, and your submission must include the Makefile used to compile your program.


    : [leftbr]code[rightbr]

    MY CODE:
    header file: asgn1.h
    struct pagelist {
    struct pagelist *next;
    int page;
    };

    struct node {
    char *word;
    struct pagelist *head;
    struct node *left;
    struct node *right;
    int size;
    };

    typedef struct node NODE;
    typedef struct node *Index;

    void IndexInit(Index *);
    void IndexAppend(Index*, char*,int);
    void IndexPrint(Index *);
    int IndexQuery(Index*,char*,int**);
    void IndexDestroy(Index *);

    implementation: asgn1.c

    #include
    #include
    #include
    #include
    #include "index1.h"

    #define MAX 10000

    static char buff[MAX] = {0};

    void IndexInit(Index *);
    void IndexAppend(Index *, char *,int);
    void IndexPrint(Index *);
    int IndexQuery(Index*,char*,int**);
    int noise(char *s);
    char *tokeniser(char **, char *);
    char *char_in_string(char *, int );
    int compareString(const char *, const char *);
    int PageNumbers(struct pagelist *);
    struct pagelist *addPageNo(int);
    int findWord (Index *, char);
    void printPageNumbers(struct pagelist *);
    char *GetFile1(char *,int, FILE *);
    char *GetFile2(char *,int, FILE *);

    char *word;
    char *s=NULL;
    char *SEPARATORS = " ";

    void IndexPrint(Index *);


    void IndexInit(Index *ind) {


    (*ind) -> word = NULL;
    (*ind) -> size = 0;
    printf("Init. success.");
    }

    void IndexAppend(Index *ind, char *word, int page) {
    int diff;
    struct pagelist *pl = malloc(sizeof(*pl));
    *ind = (Index)malloc(sizeof(NODE));


    diff = compareString((*ind)->word, word);
    if ((*ind) !=NULL) {

    (*ind)->word = word;
    printf("%s ",(*ind)->word);
    printf("%d
    ",page);
    (*ind)->head = addPageNo(page);
    IndexPrint(&ind);

    }
    else
    if (diff > 0) {
    IndexAppend(&((*ind)->left), word, page);
    }
    else
    IndexAppend(&((*ind)->right), word, page);


    }

    void IndexPrint(Index *ind) {

    if ((*ind) != NULL) {
    printf("%s ", (ind)->word);
    printPageNumbers((*ind)->head);
    }
    }


    int IndexQuery(Index *ind,char *word,int **results) {

    if (!findWord(&ind,*word)) {

    printf("Word not found.");

    }
    else *results = findWord(&ind,*word);

    return **results;
    }


    void IndexDestroy(Index *ind) {
    if((*ind) != NULL) {
    free(*ind);
    *ind=NULL;
    }
    }




    int noise(char *c) {
    int found = 0;

    while (GetFile2(buff, sizeof(buff),stdin) != NULL) {
    s=buff;
    while ((word=tokeniser(&s,SEPARATORS)) != NULL) {
    if (c==s)
    found=1;
    }
    }
    return found;

    }

    char *tokeniser(char **s, char *delims) {
    char *p = NULL;
    char *q;

    if ((s != NULL) && (*s != '') && (delims != NULL)) {
    //pass over leading delimeters
    while (NULL != char_in_string(delims, **s)) {
    ++*s;
    }
    if (**s != '') {
    q = *s + 1;
    p = *s;
    while (*q != '' && NULL == char_in_string(delims,*q)) {
    ++q;
    }
    *s = q + (*q != '');
    *q='';
    }
    }
    return p;
    }

    char *char_in_string(char *s, int c) {
    char *p = NULL;

    if (s != NULL) {
    if (c != '') {
    while (*s != '' && *s != c) {
    ++s;
    }
    if (*s == c) {
    p = s;
    }
    }
    }
    return p;
    }

    int compareString(const char *s, const char *t) {
    int diff = 0;
    char cs;
    char ct;

    while (diff==0 && *s !='' && *t != '') {
    cs = tolower((unsigned char)*s);
    ct = tolower((unsigned char) *t);

    if (cs < ct) {
    diff=-1;
    }
    else if (cs>ct) {
    diff = 1;
    }
    ++s;
    ++t;
    }

    if (diff==0 && *s != *t) {
    if (*s == '') {
    diff =-1;
    }
    else {
    diff=1;
    }
    }
    return diff;
    }


    struct pagelist* addPageNo(int page) {
    struct pagelist *new = malloc(sizeof(*new));

    if (new !=NULL) {
    new->page = page;
    }
    return new;
    }


    void printPageNumbers(struct pagelist *pl){
    if (pl != NULL) {
    printPageNumbers(pl->next);
    printf("%d ", pl->page);
    }
    }


    int PageNumbers(struct pagelist *pl){
    int **result;

    if (pl != NULL) {
    PageNumbers(pl->next);
    **result = pl->page;
    }
    return **result;
    }


    char *GetFile1(char *s,int n, FILE *fp) {
    int c;

    char *p=s;

    fp = fopen("book.txt","r");
    while ( (--n > 0) && (c=getc(fp)) !=EOF) {

    *p++ = c;


    }
    return p;
    }


    char *GetFile2(char *s,int n, FILE *fp) {
    int c;

    char *p=s;

    fp = fopen("stopped-words.txt","r");
    while ( (--n > 0) && (c=getc(fp)) !=EOF) {

    *p++ = c;


    }
    return p;
    }

    int findWord (Index *ind, char w) {
    int **results;

    if (ind != NULL) {
    if ((*ind)->word == w){
    *results = PageNumbers((*ind)->head);
    }
    }
    else if ((*ind)->word != w)
    findWord(&((*ind)->left), w);
    else findWord(&((*ind)->right), w);

    return *results;
    }

    driver: asgndriver.c

    #include
    #include
    #include
    #include
    #include "test.c"



    void main(void) {
    Index ind;


    #define MAXLINE 40000

    char *s=NULL;

    static char buffer[MAXLINE] = {0};

    char *word;
    char *pg="END";
    int pagenum = 1;
    char answer ='y';
    char *w=NULL;
    int **results;

    char *SEPARATORS = "{}. ,?;";


    //IndexInit(&ind);



    while (GetFile1(buffer, sizeof(buffer),stdin) != NULL) {

    s=buffer;
    if (s==pg) {
    ++pagenum;
    }
    while ((word=tokeniser(&s,SEPARATORS)) != NULL) {


    if (noise(word) == 0) {
    IndexAppend(&ind,word,pagenum);
    }


    else {
    continue;
    }
    }

    }


    while (answer != 'enough') {
    printf("Enter a word to be searched: ");
    scanf("%s", &w);
    **results = IndexQuery(&ind,w,results);

    printf("Continue with Query? ");
    scanf("%c", &answer);
    }

    IndexDestroy(&ind);
    }




    : [leftbr]/code[rightbr]
    :
    :

  • sandratzsandratz Posts: 18Member
    Hello,
    Thank for your time.
  • sandratzsandratz Posts: 18Member
    Thank for the reply.
    How can I insert words from one file(book.txt) into an ADT such as the one below, and store the words from that file into the root, left and right nodes? There is another file(noise.txt), from which we have to compare with book.txt; if words from book.txt appear in noise.txt, how can I prevent them from being stored in the nodes and also, how can I prevent duplicate words from being stored in the nodes? How can I store page numbers in the linkes list using this ADT? I'll include my code below. It doesn't append anything to the left and right nodes, it stores duplicate words and as for page numbers, only 2 zeros appear.

    struct node {
    char *word;
    struct pagelist *head;
    struct node *left;
    struct node *right;
    int size;
    };

    typedef struct node NODE;
    typedef struct node *Index;

    void IndexInit(Index *);
    void IndexAppend(Index*, char*,int);
    void IndexPrint(Index *);
    int IndexQuery(Index*,char*,int**);
    void IndexDestroy(Index *);


    implementation: asgn1.c

    #include
    #include
    #include
    #include
    #include "index1.h"

    #define MAX 10000

    static char buff[MAX] = {0};

    void IndexInit(Index *);
    void IndexAppend(Index *, char *,int);
    void IndexPrint(Index *);
    int IndexQuery(Index*,char*,int**);
    int noise(char *s);
    char *tokeniser(char **, char *);
    char *char_in_string(char *, int );
    int compareString(const char *, const char *);
    int PageNumbers(struct pagelist *);
    struct pagelist *addPageNo(int);
    int findWord (Index *, char);
    void printPageNumbers(struct pagelist *);
    char *GetFile1(char *,int, FILE *);
    char *GetFile2(char *,int, FILE *);

    char *word;
    char *s=NULL;
    char *SEPARATORS = " ";

    void IndexPrint(Index *);


    void IndexInit(Index *ind) {


    (*ind) -> word = NULL;
    (*ind) -> size = 0;
    printf("Init. success.");
    }

    void IndexAppend(Index *ind, char *word, int page) {
    int diff;
    struct pagelist *pl = malloc(sizeof(*pl));
    *ind = (Index)malloc(sizeof(NODE));


    diff = compareString((*ind)->word, word);
    if ((*ind) !=NULL) {

    (*ind)->word = word;
    printf("%s ",(*ind)->word);
    printf("%d
    ",page);
    (*ind)->head = addPageNo(page);
    IndexPrint(&ind);

    }
    else
    if (diff > 0) {
    IndexAppend(&((*ind)->left), word, page);
    }
    else
    IndexAppend(&((*ind)->right), word, page);


    }

    void IndexPrint(Index *ind) {

    if ((*ind) != NULL) {
    printf("%s ", (ind)->word);
    printPageNumbers((*ind)->head);
    }
    }


    int IndexQuery(Index *ind,char *word,int **results) {

    if (!findWord(&ind,*word)) {

    printf("Word not found.");

    }
    else *results = findWord(&ind,*word);

    return **results;
    }


    void IndexDestroy(Index *ind) {
    if((*ind) != NULL) {
    free(*ind);
    *ind=NULL;
    }
    }




    int noise(char *c) {
    int found = 0;

    while (GetFile2(buff, sizeof(buff),stdin) != NULL) {
    s=buff;
    while ((word=tokeniser(&s,SEPARATORS)) != NULL) {
    if (c==s)
    found=1;
    }
    }
    return found;

    }

    char *tokeniser(char **s, char *delims) {
    char *p = NULL;
    char *q;

    if ((s != NULL) && (*s != '') && (delims != NULL)) {
    //pass over leading delimeters
    while (NULL != char_in_string(delims, **s)) {
    ++*s;
    }
    if (**s != '') {
    q = *s + 1;
    p = *s;
    while (*q != '' && NULL == char_in_string(delims,*q)) {
    ++q;
    }
    *s = q + (*q != '');
    *q='';
    }
    }
    return p;
    }

    char *char_in_string(char *s, int c) {
    char *p = NULL;

    if (s != NULL) {
    if (c != '') {
    while (*s != '' && *s != c) {
    ++s;
    }
    if (*s == c) {
    p = s;
    }
    }
    }
    return p;
    }

    int compareString(const char *s, const char *t) {
    int diff = 0;
    char cs;
    char ct;

    while (diff==0 && *s !='' && *t != '') {
    cs = tolower((unsigned char)*s);
    ct = tolower((unsigned char) *t);

    if (cs < ct) {
    diff=-1;
    }
    else if (cs>ct) {
    diff = 1;
    }
    ++s;
    ++t;
    }

    if (diff==0 && *s != *t) {
    if (*s == '') {
    diff =-1;
    }
    else {
    diff=1;
    }
    }
    return diff;
    }


    struct pagelist* addPageNo(int page) {
    struct pagelist *new = malloc(sizeof(*new));

    if (new !=NULL) {
    new->page = page;
    }
    return new;
    }


    void printPageNumbers(struct pagelist *pl){
    if (pl != NULL) {
    printPageNumbers(pl->next);
    printf("%d ", pl->page);
    }
    }


    int PageNumbers(struct pagelist *pl){
    int **result;

    if (pl != NULL) {
    PageNumbers(pl->next);
    **result = pl->page;
    }
    return **result;
    }


    char *GetFile1(char *s,int n, FILE *fp) {
    int c;

    char *p=s;

    fp = fopen("book.txt","r");
    while ( (--n > 0) && (c=getc(fp)) !=EOF) {

    *p++ = c;


    }
    return p;
    }


    char *GetFile2(char *s,int n, FILE *fp) {
    int c;

    char *p=s;

    fp = fopen("stopped-words.txt","r");
    while ( (--n > 0) && (c=getc(fp)) !=EOF) {

    *p++ = c;


    }
    return p;
    }

    int findWord (Index *ind, char w) {
    int **results;

    if (ind != NULL) {
    if ((*ind)->word == w){
    *results = PageNumbers((*ind)->head);
    }
    }
    else if ((*ind)->word != w)
    findWord(&((*ind)->left), w);
    else findWord(&((*ind)->right), w);

    return *results;
    }

    driver: asgndriver.c

    #include
    #include
    #include
    #include
    #include "test.c"



    void main(void) {
    Index ind;


    #define MAXLINE 40000

    char *s=NULL;

    static char buffer[MAXLINE] = {0};

    char *word;
    char *pg="END";
    int pagenum = 1;
    char answer ='y';
    char *w=NULL;
    int **results;

    char *SEPARATORS = "{}. ,?;";


    //IndexInit(&ind);



    while (GetFile1(buffer, sizeof(buffer),stdin) != NULL) {

    s=buffer;
    if (s==pg) {
    ++pagenum;
    }
    while ((word=tokeniser(&s,SEPARATORS)) != NULL) {


    if (noise(word) == 0) {
    IndexAppend(&ind,word,pagenum);
    }


    else {
    continue;
    }
    }

    }


    while (answer != 'enough') {
    printf("Enter a word to be searched: ");
    scanf("%s", &w);
    **results = IndexQuery(&ind,w,results);

    printf("Continue with Query? ");
    scanf("%c", &answer);
    }

    IndexDestroy(&ind);
    }








  • bilderbikkelbilderbikkel Posts: 754Member
    [b][red]This message was edited by bilderbikkel at 2005-9-29 8:35:3[/red][/b][hr]
    No code tags, no answer from me...
    Especially not with such a long piece of code.
    Perhaps you should read http://www.catb.org/~esr/faqs/smart-questions.html once...

    Good luck,
    bilderbikkel
Sign In or Register to comment.