Fast packed BCD multiply on 8051

I'm working on an 8051 microcontroller.
I need a fast multiply for packed BCD numbers.
The only BCD support on the 8051 is the Decimal Adjust after Addition (DA A) instruction.
The hardware multiplier (MUL AB) multiplies 8-bit x 8-bit binary values.
My default procedure multiplies AB*CD (A, B, C, D are decimal digits) in
about 64 instructions. Another procedure (not tested) takes 37
instructions plus about 650 bytes of lookup tables.
Is there a faster method?
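
To make the problem concrete, here is a minimal C sketch of the plain approach: convert each packed byte to binary, do one 8x8 binary multiply, and convert the product back to packed BCD. It is only an illustration, not either of the routines counted above. The function names (bcd_to_bin, bin_to_bcd, bcd_mul) are made up for this example, and inputs are assumed to be valid packed BCD (0x00 to 0x99).

```c
#include <stdint.h>

/* Hypothetical helper: packed BCD byte (0x00..0x99) -> binary 0..99. */
uint8_t bcd_to_bin(uint8_t bcd)
{
    return (uint8_t)((bcd >> 4) * 10u + (bcd & 0x0Fu));
}

/* Hypothetical helper: binary 0..99 -> packed BCD byte. */
uint8_t bin_to_bcd(uint8_t bin)
{
    return (uint8_t)(((bin / 10u) << 4) | (bin % 10u));
}

/* Multiply two packed BCD bytes AB * CD.
 * Returns four packed BCD digits in a 16-bit word (0x0000..0x9801).
 * Assumes both inputs are valid packed BCD. */
uint16_t bcd_mul(uint8_t x, uint8_t y)
{
    /* One 8-bit x 8-bit binary multiply; the product is 0..9801. */
    uint16_t p = (uint16_t)((uint16_t)bcd_to_bin(x) * bcd_to_bin(y));

    /* Split into upper and lower pairs of decimal digits. */
    uint8_t hi = (uint8_t)(p / 100u);   /* 0..98 */
    uint8_t lo = (uint8_t)(p % 100u);   /* 0..99 */

    return (uint16_t)(((uint16_t)bin_to_bcd(hi) << 8) | bin_to_bcd(lo));
}
```

For example, bcd_mul(0x12, 0x34) gives 0x0408, since 12 * 34 = 408. On the 8051 the expensive step is the split of the 16-bit product by 100, because DIV AB only divides an 8-bit value by an 8-bit value.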

