PROJECT ON SOUND EDITOR

kooladi Member Posts: 72
I am a C programmer. I was wondering: if I had to make an editor that takes input from the microphone and writes it on the screen, what kind of knowledge would I need (any books, tutorials, source code)? And do I have to program the sound card too?

Please help.

Comments

  • Tomy Member Posts: 35
    Not that much.

    The first thing to do is set the record source to 'mic' or 'line in'. Then, if you use the WinAPI functions to capture sound (take 8000 Hz, 8 bits, mono) and make the block size exactly the width of your screen (e.g. 800), you get a nice stream of data that you can visualize with a simple routine.

    For example: for (x = 0; x < BlockSize; x++) YourLineTo(x, Data[x]);

    This is called an oscilloscope. If you want to show the spectral data (like Winamp can), you need more complex mathematics, so I suggest starting with this.

    However, this is not an editor until you write routines to modify stored audio data. For now we're talking about a visualiser.
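A minimal sketch of the drawing step described above, assuming the capture buffer holds unsigned 8-bit samples (0..255, with 128 as silence) and one sample per screen column. The names sample_to_row and block_to_rows are made up for illustration; the capture itself and the actual line drawing (the post's YourLineTo) are left out.

```c
#include <assert.h>

/* Map one unsigned 8-bit sample (0..255, 128 = silence) to a pixel row.
   Row 0 is the top of the screen, so larger sample values are drawn higher. */
int sample_to_row(unsigned char sample, int screen_height)
{
    assert(screen_height > 0);
    return (255 - sample) * (screen_height - 1) / 255;
}

/* Convert a captured block into one y coordinate per screen column.
   A real program would then draw a line from (x-1, rows[x-1]) to (x, rows[x]). */
void block_to_rows(const unsigned char *data, int count,
                   int screen_height, int *rows)
{
    int x;

    for (x = 0; x < count; x++)
        rows[x] = sample_to_row(data[x], screen_height);
}
```

Scaling to the screen height first keeps the trace on screen whatever the resolution; passing Data[x] straight to the drawing routine only works if the screen happens to be at least 256 pixels high.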
  • kooladi Member Posts: 72
    Thanks for the help (it's another matter that I didn't understand a word of it; I am just starting out in sound programming). Anyway, are there any books that can help me further?

    Another thing: is it possible to make a TSR in DOS that does the following? The user gives input through the microphone, say "cls"; then cls gets typed on the screen and the command is executed. The objective is to make DOS more user friendly.

    Please help.
  • Tomy Member Posts: 35
    About the manual: try to find Win32.hlp, or go to msdn.microsoft.com. There you'll find the explanation of the functions you need.

    And for the DOS thing:
    1 DOS is not multitasking, so it won't be easy to write a program that waits for something while another program (command.com) is running too. You could do it in fake DOS (like a DOS box in Windows), but then why use command.com at all?
    2 Speech programs are very hard to make. You need two things: (1) a trick to code human speech into something mathematical that can be measured and compared, and (2) a lookup table or a trick to find out what has actually been said. It's not a Sunday-afternoon job but a whole industry!

    So I suggest we start with the basics...

  • kooladi Member Posts: 72
    (This message was edited by kooladi at 2002-4-7 13:48:53.)
    Thanks a lot for the help.
    From your replies I get the impression that you think I am making this thing in Windows, but I am actually working in DOS.

    Well, it is possible to make a program wait for something while command.com executes; that's what a TSR (terminate-and-stay-resident program) does. But TSRs hog a lot of DOS's precious memory. What I want to know is: if somehow, by a miracle :-)), I am able to make such an editor, do you think it will be huge in size because of the mathematical calculations and all that? Please respond.
    You have been a great help.
    Thanks.


  • Tomy Member Posts: 35
    Sorry about the DOS misunderstanding. I'm not experienced with audio in DOS, but as for the speech thing, the technique I've heard about is quite simple and straightforward:

    In your case, it is possible to make a list of words and their representations. You could build it up front (with the most likely commands) or you could detect new words (learning).

    For the recognition technique: a simple method I've heard about is following the amplitude at some key frequencies. You could detect, say, the 8 frequencies that modulate the most in your own voice. Cut the signal into words (assuming a word is a stretch of signal with, let's say, 1 second of silence before and after it), scale the average (not peak!) energy of the word to a fixed level, find the amplitude modulation at the key frequencies, and resample it until you get, let's say, 256 samples per word. That gives you a profile of the word.

    Now you can compare the profile with the existing profiles in a table. There should be a procedure that measures the 'similarity', giving a score between 0 and 1 (1 for completely the same). The profile in the table with the highest score is the word that was said. By setting a minimum score, new words can be detected.

    I've never checked this method, so let me know if it works, or if you find another or better one.
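A rough sketch of the table lookup described above. The post does not say how 'similarity' is computed, so squared cosine similarity is used here purely as an assumption: it gives 1 for identical profiles and stays between 0 and 1. The function names, the flat table layout and the min_score parameter are all made up for illustration, and extracting the per-frequency profile from the audio is not shown.

```c
#include <assert.h>
#include <stddef.h>

/* Scale a profile so that its average absolute value equals target
   (the "average, not peak" normalisation from the post). */
void normalize_energy(float *profile, size_t len, float target)
{
    double sum = 0.0, scale;
    size_t i;

    for (i = 0; i < len; i++)
        sum += profile[i] < 0 ? -profile[i] : profile[i];
    if (sum == 0.0)
        return;                          /* pure silence: nothing to scale */
    scale = (double)target * (double)len / sum;
    for (i = 0; i < len; i++)
        profile[i] = (float)(profile[i] * scale);
}

/* Squared cosine similarity: 1.0 for identical shapes, 0.0 for unrelated. */
double similarity(const float *a, const float *b, size_t len)
{
    double dot = 0.0, na = 0.0, nb = 0.0;
    size_t i;

    for (i = 0; i < len; i++) {
        dot += (double)a[i] * b[i];
        na  += (double)a[i] * a[i];
        nb  += (double)b[i] * b[i];
    }
    if (na == 0.0 || nb == 0.0)
        return 0.0;
    return (dot * dot) / (na * nb);
}

/* table holds `entries` profiles of `len` floats each, back to back.
   Returns the index of the best match, or -1 if no entry reaches
   min_score (which the post suggests treating as a new word). */
int best_match(const float *word, const float *table, int entries,
               size_t len, double min_score)
{
    double best_score = -1.0;
    int best = -1, i;

    assert(entries >= 0);
    for (i = 0; i < entries; i++) {
        double s = similarity(word, table + (size_t)i * len, len);
        if (s >= min_score && s > best_score) {
            best_score = s;
            best = i;
        }
    }
    return best;
}
```

With a table of command profiles (cls, dir, ...) the returned index selects the command to run, and a result of -1 is the cue to ask the user what the new word means.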
  • kooladi Member Posts: 72
    Well, the more you tell me, the more I lose you. I have heard and read something about amplitude modulation, but I have no idea how to do it. Do I need some additional hardware?

    Secondly, are there any books available on speech technology and how to program it?


    P.S.: You must be thinking what a fool I am. Well, bear with me please.


  • Tomy Member Posts: 35
    By amplitude modulation I mean the change in the average absolute value of the audio samples (= the absolute surface under the sample curve).

    approx: Amplitude = N * Amplitude + (1-N) * |NewSample|

    You should look for an N value that is as small as possible but still isn't influenced too much by the rapidly changing input values. You also need to follow 8 different frequencies. For this task I suggest you read websites about it, like this one:

    www.dsptutor.freeuk.com

    There you'll find out that the formula I've written above is actually a lowpass filter that gets rid of the quick changes in the surface under the sample curve. The surface under the curve is the amplitude of the signal, the 'energy'. There is also the peak amplitude, the maximum sample value, but that is only important for avoiding digital overflow.
    It may interest you that an ear can hear from 20 to 20,000 Hz, but that we still understand each other through a telephone (roughly 300-3400 Hz), so concentrate on those frequencies. Second: we are not capable of hearing amplitude modulations faster than 10-50 Hz (depending on the frequency of the signal).

    To find more sites, perform a search on the web with the following keywords:
    'digital' 'filter' 'IIR' & 'design'
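A small sketch of the smoothing formula above as a one-pole lowpass on the sample magnitude, assuming it is meant to be applied to the absolute value of each sample (as the definition of amplitude in the post suggests) and that the input is unsigned 8-bit audio centred on 128. The names envelope_step and follow_envelope are made up for illustration, and this follows the overall level only; splitting the signal into 8 frequency bands first is not shown.

```c
#include <assert.h>
#include <stddef.h>

/* One step of: Amplitude = N * Amplitude + (1 - N) * |NewSample| */
double envelope_step(double amplitude, double n, double sample)
{
    double magnitude = sample < 0.0 ? -sample : sample;

    return n * amplitude + (1.0 - n) * magnitude;
}

/* Run the follower over a block of unsigned 8-bit samples.
   Each sample is re-centred around 0 by subtracting 128.
   If out is not NULL it receives the envelope after every sample.
   Returns the envelope value after the last sample. */
double follow_envelope(const unsigned char *data, size_t count,
                       double n, double start, double *out)
{
    double amplitude = start;
    size_t i;

    assert(n >= 0.0 && n < 1.0);
    for (i = 0; i < count; i++) {
        amplitude = envelope_step(amplitude, n, (double)data[i] - 128.0);
        if (out != NULL)
            out[i] = amplitude;
    }
    return amplitude;
}
```

With n = 0 the output is just the raw magnitude of each sample; the closer n gets to 1, the smoother and slower the envelope becomes, which is the trade-off to tune when picking N.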
  • kooladi Member Posts: 72
    Thanks again. I am starting to get the hang of it, but I still have a long way to go.

    Well, I need a favour from you. While searching on the net, I came across a piece of speech-related software with C code.

    Would you please download it and tell me whether going through it will help me?

    http://home.attbi.com/~wasser/TextToSpeech/

    Although this is text to speech...
  • Tomy Member Posts: 35
    It looks interesting, but it's not the information we need. I suggest skipping the part about working out 'what's actually been said'. We could do that with a lookup table.
  • kooladi Member Posts: 72
    Thanks again.

    Say, would it be easier if I made this whole thing for the Windows environment? I have been searching for C in DOS but couldn't find a function or library that takes input from the mic.

    Is Visual C or C++ any better?
