Fuzzy Matching in Access

[b][red]This message was edited by OpnSeason at 2006-5-11 2:37:41[/red][/b][hr]
[b][red]This message was edited by OpnSeason at 2006-3-17 17:22:45[/red][/b][hr]
Hello All,

For those who are interested in Approximate String Matching or those who could use these algorithms; I have a complete suite of Approximate String Matching algorithms written in Visual Basic in an Access database.

In 2004 I decided to jump into the world of Fuzzy Matching with both feet.

As it is, I am working for a company that deals with names, addresses, etc. very intensely. It is a fair sized company that
uses Access on a grand scale. Since I am an Access programmer, I work in an Access gold mine!

I knew that if I could get a good handle on Fuzzy Matching, that when I hit the right person at the right time, the company could greatly benefit from my research on Fuzzy Matching. The right time and the right person are not here yet.

Nevertheless, since I have reaped much free source code and information from the Web, it is now time to return the favor.

I developed a package that is sort of a demo/tutorial on Approximate String Matching algorithms in Access that is very
robust in Fuzzy Matching. It would overtax the post in this forum for me to include it in a post.

To summarize, it works with the basic name - Last, First, and Middle. It has a user interface that allows a user to type in
what would be a good name and what would be a questionable name to resemble the good name. The weighted results of all the various algorithms can be chosen, or an individual algorithm can be chosen to display how closely the names match.

In addition, it has a table of 17,295 known good names with unique ID numbers as a reference table, and table of 1200
morphed names that are typical of names entered in a database with no input conventions. These morphed names have typos, transpositions, variations on maiden names, etc. 1200 good names were selected for alteration and the unique ID of each original good name was stored in the table with the altered names to determine the accuracy of the matching process.

The morphed names were compared to the known good names in a query with an approximate join using the suite of algorithms to determine match percentage. The altered names, the ID number of the original good name, the ID number of the name it matched to, and the match percentage were stored in a results table to determine the results of the matching run.

These tables were used to test and tweak the algorithms by comparing the morphed names with the known good names. The results of 1322 names were saved to a results table with match scores.

The matching process was executed in a query with an approximate join using the suite of algorithms.


The match results:

Total Approximate Matches: 1188
(Recall) Precision Pct: 99.00%

Total Unmatched Names: 12
Unmatched Pct: 1.00%

Total Other Matches: 134
Other Matches Pct: .77%


The tables are accessible in the database, so anyone can run their own tests. The interface is set up to accommodate this
as well.

The algorithms used: Dice coefficient as a threshold algorithm, Levenshtein Distance algorithm, Longest Common Subsequence, and the DoubleMetaphone. The names were passed to the algorithms by way of the bigram model.

I will email it to anyone who requests it.

It is in two platforms, Office 97 and Office 2000 as FuzzyMatching97.zip (692 KB) and FuzzyMatching2k.zip (721 KB).
The zip files include ApprxStrMatchingEngine97.pps or ApprxStrMatchingEngine2k.pps respectively, StrMatching97.mde or StrMatching2k.mde respectively, IEEESoundexV5.pdf, and VBAlgorithms.txt.

IEEESoundexV5.pdf is an abstract about Approximate Sting Matching that fired my curiosity about the subject, and pertains to the package.

VBAlgorithms.txt contains the entire suite of algorithms in Visual Basic extracted from the MDB modules.

The PowerPoint presentations describe the workings of the MDE and give a good overview of Fuzzy Matching.

Matching is divine....

It can be downloaded at http://www.kdkeys.net/forums/thread/6450.aspx




«13

Comments

  • Can you send FuzzyMatching for Office 2000 to this email, please?
    [email protected]

    Thanks,

    Eusebio
  • : Can you send FuzzyMatching for Office 2000 to this email, please?
    : [email protected]
    :
    : Thanks,
    :
    : Eusebio
    :

    Hello Eusebio,

    It is on its way to you.

    I am open to discuss any questions about it.

    Thanks,

    OpnSeason

  • [b][red]This message was edited by turkeyboy at 2006-3-16 16:55:42[/red][/b][hr]
    Hi,

    I'm trying to match company names from two databases. The names are not always exactly the same. Would your program be helpful? If so, if you could send it to me at [email protected] I'd appreciate it. All data is currently in Excel.

    Thanks,

    Scott
  • : [b][red]This message was edited by turkeyboy at 2006-3-16 16:55:42[/red][/b][hr]
    : Hi,
    :
    : I'm trying to match company names from two databases. The names are not always exactly the same. Would your program be helpful? If so, if you could send it to me at [email protected] I'd appreciate it. All data is currently in Excel.
    :
    : Thanks,
    :
    : Scott
    :

    Hello Scott,

    It should handle that situation nicely.
    You will need to make some modifications, though.
    I will send instructions with the attached zip file.

    Thanks,

    OpnSeason

  • [b][red]This message was edited by EPEREZ at 2006-3-17 3:37:58[/red][/b][hr]
    Hi OpnSeason,
    My problem is:
    I've a database with 35.000.000 records in SQL.
    Fields: REFNUM,NAME,STREET,CITY,PC,PHONE,FAX,EMAIL,URL

    The problem is that I cannot compare all records (Ref1 with Ref2,Ref1 with Ref3....Refn with Ref1,...,Refn with Refn-1).

    I must find possible candidates for compare.

    Can you help me, please.

    Thanks,

    Eusebio.


  • Hello,

    I am trying to match Facility names from two seperate db that are sometimes not spelled exactly the same. It seem like your program might be able to do just that. I was wondering if you could email me the program and directions to [email protected]

    Thanks,
    C-
  • Could I please get a copy of your process.
    Thanks!

    [email protected]

  • : Hello,
    :
    : I am trying to match Facility names from two seperate db that are sometimes not spelled exactly the same. It seem like your program might be able to do just that. I was wondering if you could email me the program and directions to [email protected]
    :
    : Thanks,
    : C-
    :

    Hello C-,

    The files are on the way.
    Post back with any questions or comments.

    Thanks,

    OpnSeason
  • : Could I please get a copy of your process.
    : Thanks!
    :
    : [email protected]
    :
    :

    Hello phansen,

    The files are on the way.
    Post back with any questions or comments.

    Thanks,

    OpnSeason
  • [b][red]This message was edited by jimbolicious at 2006-3-21 19:22:59[/red][/b][hr]
    Hi Opn,

    can I get a copy?


    Thks,

    jim


  • : [b][red]This message was edited by jimbolicious at 2006-3-21 19:22:59[/red][/b][hr]
    : Hi Opn,
    :
    : can I get a copy?
    : [email protected]
    :
    : Thks,
    :
    : jim
    :
    :
    :

    Hello Jim,

    FuzzyMatching2k.zip is on its way.
    Please post comments, questions, and impression.

    Thanks,

    OpnSeason
  • I've come across a problem where the Access 2k version of your tools would be most valuable.

    Please send to [email protected]

    Thanks in advance.
  • : I've come across a problem where the Access 2k version of your tools would be most valuable.
    :
    : Please send to [email protected]
    :
    : Thanks in advance.
    :

    Hello ewinterhof,

    It is on the way.

    OpnSeason
  • Could you send the Access 2k version to [email protected]

    This is the solution to my biggest problem ever. THANKS!!!

    : [b][red]This message was edited by OpnSeason at 2006-3-17 17:22:45[/red][/b][hr]
    : Hello All,
    :
    : For those who are interested in Approximate String Matching or those who could use these algorithms; I have a complete suite of Approximate String Matching algorithms written in Visual Basic in an Access database.
    :
    : In 2004 I decided to jump into the world of Fuzzy Matching with both feet.
    :
    : As it is, I am working for a company that deals with names, addresses, etc. very intensely. It is a fair sized company that
    : uses Access on a grand scale. Since I am an Access programmer, I work in an Access gold mine!
    :
    : I knew that if I could get a good handle on Fuzzy Matching, that when I hit the right person at the right time, the company could greatly benefit from my research on Fuzzy Matching. The right time and the right person are not here yet.
    :
    : Nevertheless, since I have reaped much free source code and information from the Web, it is now time to return the favor.
    :
    : I developed a package that is sort of a demo/tutorial on Approximate String Matching algorithms in Access that is very
    : robust in Fuzzy Matching. It would overtax the post in this forum for me to include it in a post.
    :
    : To summarize, it works with the basic name - Last, First, and Middle. It has a user interface that allows a user to type in
    : what would be a good name and what would be a questionable name to resemble the good name. The weighted results of all the various algorithms can be chosen, or an individual algorithm can be chosen to display how closely the names match.
    :
    : In addition, it has a table of 17,295 known good names with unique ID numbers as a reference table, and table of 1200
    : morphed names that are typical of names entered in a database with no input conventions. These morphed names have typos, transpositions, variations on maiden names, etc. 1200 good names were selected for alteration and the unique ID of each original good name was stored in the table with the altered names to determine the accuracy of the matching process.
    :
    : The morphed names were compared to the known good names in a query with an approximate join using the suite of algorithms to determine match percentage. The altered names, the ID number of the original good name, the ID number of the name it matched to, and the match percentage were stored in a results table to determine the results of the matching run.
    :
    : These tables were used to test and tweak the algorithms by comparing the morphed names with the known good names. The results of 1322 names were saved to a results table with match scores.
    :
    : The matching process was executed in a query with an approximate join using the suite of algorithms.
    :
    :
    : The match results:
    :
    : Total Approximate Matches: 1188
    : (Recall) Precision Pct: 99.00%
    :
    : Total Unmatched Names: 12
    : Unmatched Pct: 1.00%
    :
    : Total Other Matches: 134
    : Other Matches Pct: .77%
    :
    :
    : The tables are accessible in the database, so anyone can run their own tests. The interface is set up to accommodate this
    : as well.
    :
    : The algorithms used: Dice coefficient as a threshold algorithm, Levenshtein Distance algorithm, Longest Common Subsequence, and the DoubleMetaphone. The names were passed to the algorithms by way of the bigram model.
    :
    : I will email it to anyone who requests it.
    :
    : It is in two platforms, Office 97 and Office 2000 as FuzzyMatching97.zip (692 KB) and FuzzyMatching2k.zip (721 KB).
    : The zip files include ApprxStrMatchingEngine97.pps or ApprxStrMatchingEngine2k.pps respectively, StrMatching97.mde or StrMatching2k.mde respectively, IEEESoundexV5.pdf, and VBAlgorithms.txt.
    :
    : IEEESoundexV5.pdf is an abstract about Approximate Sting Matching that fired my curiosity about the subject, and pertains to the package.
    :
    : VBAlgorithms.txt contains the entire suite of algorithms in Visual Basic extracted from the MDB modules.
    :
    : The PowerPoint presentations describe the workings of the MDE and give a good overview of Fuzzy Matching.
    :
    : Matching is divine....
    :
    :
    :
    :

  • Hello pstover,

    It is on its way.

    OpnSeason

    : Could you send the Access 2k version to [email protected]
    :
    : This is the solution to my biggest problem ever. THANKS!!!
    :
    : : [b][red]This message was edited by OpnSeason at 2006-3-17 17:22:45[/red][/b][hr]
    : : Hello All,
    : :
    : : For those who are interested in Approximate String Matching or those who could use these algorithms; I have a complete suite of Approximate String Matching algorithms written in Visual Basic in an Access database.
    : :
    : : In 2004 I decided to jump into the world of Fuzzy Matching with both feet.
    : :
    : : As it is, I am working for a company that deals with names, addresses, etc. very intensely. It is a fair sized company that
    : : uses Access on a grand scale. Since I am an Access programmer, I work in an Access gold mine!
    : :
    : : I knew that if I could get a good handle on Fuzzy Matching, that when I hit the right person at the right time, the company could greatly benefit from my research on Fuzzy Matching. The right time and the right person are not here yet.
    : :
    : : Nevertheless, since I have reaped much free source code and information from the Web, it is now time to return the favor.
    : :
    : : I developed a package that is sort of a demo/tutorial on Approximate String Matching algorithms in Access that is very
    : : robust in Fuzzy Matching. It would overtax the post in this forum for me to include it in a post.
    : :
    : : To summarize, it works with the basic name - Last, First, and Middle. It has a user interface that allows a user to type in
    : : what would be a good name and what would be a questionable name to resemble the good name. The weighted results of all the various algorithms can be chosen, or an individual algorithm can be chosen to display how closely the names match.
    : :
    : : In addition, it has a table of 17,295 known good names with unique ID numbers as a reference table, and table of 1200
    : : morphed names that are typical of names entered in a database with no input conventions. These morphed names have typos, transpositions, variations on maiden names, etc. 1200 good names were selected for alteration and the unique ID of each original good name was stored in the table with the altered names to determine the accuracy of the matching process.
    : :
    : : The morphed names were compared to the known good names in a query with an approximate join using the suite of algorithms to determine match percentage. The altered names, the ID number of the original good name, the ID number of the name it matched to, and the match percentage were stored in a results table to determine the results of the matching run.
    : :
    : : These tables were used to test and tweak the algorithms by comparing the morphed names with the known good names. The results of 1322 names were saved to a results table with match scores.
    : :
    : : The matching process was executed in a query with an approximate join using the suite of algorithms.
    : :
    : :
    : : The match results:
    : :
    : : Total Approximate Matches: 1188
    : : (Recall) Precision Pct: 99.00%
    : :
    : : Total Unmatched Names: 12
    : : Unmatched Pct: 1.00%
    : :
    : : Total Other Matches: 134
    : : Other Matches Pct: .77%
    : :
    : :
    : : The tables are accessible in the database, so anyone can run their own tests. The interface is set up to accommodate this
    : : as well.
    : :
    : : The algorithms used: Dice coefficient as a threshold algorithm, Levenshtein Distance algorithm, Longest Common Subsequence, and the DoubleMetaphone. The names were passed to the algorithms by way of the bigram model.
    : :
    : : I will email it to anyone who requests it.
    : :
    : : It is in two platforms, Office 97 and Office 2000 as FuzzyMatching97.zip (692 KB) and FuzzyMatching2k.zip (721 KB).
    : : The zip files include ApprxStrMatchingEngine97.pps or ApprxStrMatchingEngine2k.pps respectively, StrMatching97.mde or StrMatching2k.mde respectively, IEEESoundexV5.pdf, and VBAlgorithms.txt.
    : :
    : : IEEESoundexV5.pdf is an abstract about Approximate Sting Matching that fired my curiosity about the subject, and pertains to the package.
    : :
    : : VBAlgorithms.txt contains the entire suite of algorithms in Visual Basic extracted from the MDB modules.
    : :
    : : The PowerPoint presentations describe the workings of the MDE and give a good overview of Fuzzy Matching.
    : :
    : : Matching is divine....
    : :
    : :
    : :
    : :
    :
    :

Sign In or Register to comment.

Howdy, Stranger!

It looks like you're new here. If you want to get involved, click one of these buttons!

Categories