Database Structure for Web Search Engine

I'm trying to build a web search engine to index around 2000 web sites. I've chosen MySQL as the database and came up with three table definitions to store the data. The table words contains the list of words, each with an id key. The table webpages contains the list of URLs, each with an id. The table wordlinks contains (wordid, webpageid) pairs, each denoting that the URL whose id is webpageid contains the word whose id is wordid. This sounded like a good solution in theory, but when I implemented it and indexed a single medium-sized web site, the wordlinks table alone grew to 17 MB, which is obviously far from space-efficient. Could anyone suggest an alternative structure?
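
To make the layout concrete, here is a rough sketch of the three tables described above. It is only an illustration: it uses Python's built-in sqlite3 module as a stand-in for MySQL, and the column names word and url, the primary key choices, and the helper functions create_schema, index_page, and search are all assumptions rather than the actual definitions.

```python
import sqlite3


def create_schema(conn):
    """Create the three tables: words, webpages, and wordlinks."""
    conn.executescript(
        """
        CREATE TABLE words (
            id   INTEGER PRIMARY KEY,
            word TEXT NOT NULL UNIQUE          -- column name is an assumption
        );
        CREATE TABLE webpages (
            id  INTEGER PRIMARY KEY,
            url TEXT NOT NULL UNIQUE           -- column name is an assumption
        );
        CREATE TABLE wordlinks (
            wordid    INTEGER NOT NULL REFERENCES words(id),
            webpageid INTEGER NOT NULL REFERENCES webpages(id),
            PRIMARY KEY (wordid, webpageid)    -- one row per (word, page) pair
        );
        """
    )


def index_page(conn, url, text):
    """Store one page and one wordlinks row per distinct word on it."""
    conn.execute("INSERT OR IGNORE INTO webpages (url) VALUES (?)", (url,))
    page_id = conn.execute(
        "SELECT id FROM webpages WHERE url = ?", (url,)
    ).fetchone()[0]
    # Naive whitespace tokenizer; a real indexer would parse the HTML first.
    for word in set(text.lower().split()):
        conn.execute("INSERT OR IGNORE INTO words (word) VALUES (?)", (word,))
        word_id = conn.execute(
            "SELECT id FROM words WHERE word = ?", (word,)
        ).fetchone()[0]
        conn.execute(
            "INSERT OR IGNORE INTO wordlinks (wordid, webpageid) VALUES (?, ?)",
            (word_id, page_id),
        )


def search(conn, word):
    """Return the URLs of all pages containing the given word, sorted."""
    rows = conn.execute(
        """
        SELECT webpages.url
        FROM wordlinks
        JOIN words    ON words.id = wordlinks.wordid
        JOIN webpages ON webpages.id = wordlinks.webpageid
        WHERE words.word = ?
        ORDER BY webpages.url
        """,
        (word.lower(),),
    )
    return [row[0] for row in rows]
```

With this layout, wordlinks gets one row for every distinct word on every page, so its size grows with the number of pages times the vocabulary per page, which is where the space goes.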