Howdy, Stranger!

It looks like you're new here. If you want to get involved, click one of these buttons!

Categories

Table lookup from K&R

p_vp_v Member Posts: 61
Hello all,

My reference is to section 6.6 in K&R (Second Edition), which presents an example of Table Lookup. I could not understand the code properly, and would be grateful if any of the members could help.

I assume that almost all the members should have access to the book. Hence I am not reproducing the entire code here; only those chunks that are relevant.


My first doubt is w.r.t to the struct nlist *install(char *name, char *def) routine. The code that I could not understand in the install routine is given below:

hashval = hash(name);/*this will store the value returned by hash routine in hashval */
np->next = hashtab[hashval];/* this will store the pointer in hashtab array at position hashval into np->next*/

Now, how is it assured that hashval is less than or equal to 100, since the size of the array is equal to 101?

hashtab[hashval] = np; /*this will store the pointer in np at position hashval in the hashtab array */

As far as I can understand, when install records a new element into hashtab, it does so before another already existing element, as evinced by the code above. Why?

My second doubt is w.r.t the struct nlist *lookup(char *s) routine. The loop in this routine begins at hashtab[hash(s)] position in the array, and checks for s until null is encountered.

K&R says "The hashing process produces a starting index in the array hashtab; if the string is to be found anywhere, it will be in the list of blocks beginning there."

Could anyone explain this algorithm to me?

My third doubt is w.r.t unsigned hash (char *s) routine. K&R says that "Unsigned arithmetic ensures that the hash value is non-negative."

How is it possible that the only arithmetic operation in the routine

hashval = *s + 31 * hashval;

could end up generating a negative value?

Thanks for all your valuable inputs.

Regards
PV


Comments

  • whoiewhoie Member Posts: 672
    : Hello all,

    Hi! :)


    : My first doubt is w.r.t to the struct nlist *install(char *name, char *def) routine. The code that I could not understand in the install routine is given below:
    :
    : hashval = hash(name);/*this will store the value returned by hash routine in hashval */
    : np->next = hashtab[hashval];/* this will store the pointer in hashtab array at position hashval into np->next*/
    :
    : Now, how is it assured that hashval is less than or equal to 100, since the size of the array is equal to 101?

    Look at the hash() function, and note that the return value is reduced modulo HASHSIZE (defined as 101). That means that no matter what the value of 'hashval' is (as long as it is unsigned), the return value can only be in the range 0...HASHSIZE - 1.


    : hashtab[hashval] = np; /*this will store the pointer in np at position hashval in the hashtab array */
    :
    : As far as I can understand, when install records a new element into hashtab, it does so before another already existing element, as evinced by the code above. Why?

    Adding a new element to the front of a list is the simplest and fastest way to do it. There isn't really any need to keep the list in sorted order, the point of this hash table is to get you close to the item you are looking for, and then do a linear search for the exact item. It may not seem like the most efficient way to do this, but it works quite well in practice.


    : My second doubt is w.r.t the struct nlist *lookup(char *s) routine. The loop in this routine begins at hashtab[hash(s)] position in the array, and checks for s until null is encountered.
    :
    : K&R says "The hashing process produces a starting index in the array hashtab; if the string is to be found anywhere, it will be in the list of blocks beginning there."
    :
    : Could anyone explain this algorithm to me?

    As I mentioned above, a hash table isn't meant to be a perfect lookup table (i.e., one element, one item), although they certainly can be. What you really want, is a big enough array so that only a few items are at each element. This way, we can maintain an O(1) lookup. The list at each element is used because rapid insertions are one of the benefits of a list, so our overall efficiency is quite good for the amount of effort it takes to create the table.



    : My third doubt is w.r.t unsigned hash (char *s) routine. K&R says that "Unsigned arithmetic ensures that the hash value is non-negative."
    :
    : How is it possible that the only arithmetic operation in the routine
    :
    : hashval = *s + 31 * hashval;
    :
    : could end up generating a negative value?

    Well, 's' is a string, and char can be either signed or unsigned. If one of the characters happens to have a negative integer value, then that would be a Bad Thing when we try to use a negative index on our hash table. By making 'hashval' an unsigned integer, we can keep the values positive according to the usual arithmetic conversions, no matter the signedness of 'char'.


    HTH,
    Will
    --
    http://www.tuxedo.org/~esr/faqs/smart-questions.html
    http://www.eskimo.com/~scs/C-faq/top.html
    http://www.parashift.com/c++-faq-lite/
    http://www.accu.org/


  • p_vp_v Member Posts: 61
    Will,
    Thanks for your reply. It helped a lot.

    But I am still not clear about the statement in K&R
    "...if the string is to be found anywhere, it will be in the list of blocks beginning there."

    According to the way the routine unsigned hash (char *s) has been written, each distinct string forms a distinct element of the hashtab array. Thus son and sony will be two distinct elements.
    So what does the above statement imply?
    When struct nlist *lookup(char *s) routine has to find out if a string has already been recorded in the hashtab array, it just has to search in one location, if my understanding is correct.

    Please reply.

    thanks for your time and education
    PV

  • whoiewhoie Member Posts: 672
    : Will,
    : Thanks for your reply. It helped a lot.

    Good to hear. :)


    : But I am still not clear about the statement in K&R
    : "...if the string is to be found anywhere, it will be in the list of blocks beginning there."
    :
    : According to the way the routine unsigned hash (char *s) has been written, each distinct string forms a distinct element of the hashtab array. Thus son and sony will be two distinct elements.
    : So what does the above statement imply?

    It implies the opposite of what you are asserting here, namely, that distinct strings [b]don't[/b] form distinct elements of the hashtab array. Hash tables are a sort of compromise between storage savings and lookup speed. They use less storage than an array sized for every possible input, and they can keep some sort of random access which a list by itself cannot provide. For example, when I hash your post using K&R2's function, here are all the strings that hash to the same value:

    [code=000000][color=ffffff]
    C:My Documents>foo < pvhash.txt
    "not" collided with "am", hash: 86
    "string" collided with "clear", hash: 8
    "be" collided with "string", hash: 8
    "found" collided with "is", hash: 37
    "of" collided with "be", hash: 8
    "beginning" collided with ""...if", hash: 61
    "way" collided with "about", hash: 24
    "unsigned" collided with "But", hash: 4
    "written," collided with "Will,", hash: 22
    "distinct" collided with "According", hash: 11
    "element" collided with "unsigned", hash: 4
    "son" collided with "found", hash: 37
    "sony" collided with "It", hash: 56
    "elements." collided with "lot.", hash: 94
    "does" collided with "*s)", hash: 33
    "above" collided with "your", hash: 40
    "out" collided with "been", hash: 21
    "if" collided with "way", hash: 24
    "already" collided with "each", hash: 36
    "correct." collided with "I", hash: 73
    "Please" collided with "routine", hash: 90
    "thanks" collided with "helped", hash: 20
    [/color][/code]

    Try it out! See the "son" and "sony" hash values? ;)


    : When struct nlist *lookup(char *s) routine has to find out if a string has already been recorded in the hashtab array, it just has to search in one location, if my understanding is correct.

    It /starts/ in one location that only has a few elements in the list. Does that help? If not, look at the diagram at the top of page 144 again. Also, I highly recommend getting the book [italic]The Practice of Programming[/italic] by Kernighan and Pike when you are finished with K&R2. It discusses hash tables in more detail than K&R2 (appropriately). Here is a link to the books site, there is a heck of alot of good info in it:

    http://tpop.awl.com


    HTH,
    Will
    --
    http://www.tuxedo.org/~esr/faqs/smart-questions.html
    http://www.eskimo.com/~scs/C-faq/top.html
    http://www.parashift.com/c++-faq-lite/
    http://www.accu.org/


  • p_vp_v Member Posts: 61
    Will,
    Thanks a lot. That reply was indeed an eye-opener.
    regards
    PV
  • Andre YoungAndre Young USAMember Posts: 0

    __ / http://forcoder.org / free video tutorials and ebooks about [ C, MATLAB, Java, Delphi, JavaScript, C++, Assembly, Visual Basic, Visual Basic .NET, Perl, Ruby, Swift, Go, Python, PHP, C#, R, Objective-C, PL/SQL, Scratch Julia, Lisp, LabVIEW, Crystal, Rust, Transact-SQL, Ada, COBOL, SAS, Scala, Erlang, Logo, Prolog, Lua, Hack, F#, D, Clojure, Apex, Scheme, ABAP, FoxPro, Dart, Awk, Kotlin, Bash, ML, VBScript, Fortran, Alice ] ____

Sign In or Register to comment.