Voice Recognition Using a Web Camera and Google API - Raspberry Pi 3B (Python)
I was looking for a speech recognition library similar to PySpeech that works on a Raspberry Pi 3. The Google speech API is a good choice for speech recognition, and many speech recognition systems are built on hidden Markov models (HMMs), which are described below. The steps for voice recognition are given below.
Required hardware:
1. Raspberry Pi 3 Model B
2. Logitech web camera (its built-in microphone is used as the audio input)
3. SD card with Raspbian OS installed
Hidden Markov Model:
- A hidden Markov model (HMM) is used to convert speech into text.
- In an HMM-based recognizer, the speech signal is divided into 10-millisecond fragments.
- The power spectrum of each fragment is computed, that is, the signal's power as a function of frequency.
- Each power spectrum is then mapped to a vector of real numbers known as cepstral coefficients.
- The dimension of these vectors is usually small, sometimes as low as 10. More accurate systems may use dimension 32 or more.
- The output of the HMM is a sequence of these vectors.
- To decode the speech into text, groups of vectors are matched to one or more phonemes, the units of sound that distinguish one word from another in a particular language.
- When the Google speech API is used, the recorded audio is sent to Google's servers and the recognition happens there.
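The first steps of the list above (splitting the signal into 10-millisecond fragments and taking the power spectrum of each fragment) can be sketched in plain Python. This is only an illustration on a synthetic 1000 Hz tone, not the pipeline used by Google or by any real recognizer. The function names split_into_frames and power_spectrum and the 8000 Hz sample rate are assumptions chosen for the example.

```python
import cmath
import math


def split_into_frames(samples, sample_rate, frame_ms=10):
    """Cut the signal into consecutive fragments of frame_ms milliseconds."""
    frame_len = int(sample_rate * frame_ms / 1000)
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, frame_len)]


def power_spectrum(frame):
    """Power of the fragment at each frequency bin, using a naive DFT."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(frame))
        spectrum.append(abs(coeff) ** 2 / n)
    return spectrum


# Synthetic input (assumption): 0.1 s of a 1000 Hz tone sampled at 8000 Hz
sample_rate = 8000
tone = [math.sin(2 * math.pi * 1000 * t / sample_rate)
        for t in range(sample_rate // 10)]

frames = split_into_frames(tone, sample_rate)
spectra = [power_spectrum(frame) for frame in frames]

# The strongest bin of the first fragment should sit at the tone's frequency
peak_bin = max(range(len(spectra[0])), key=spectra[0].__getitem__)
peak_hz = peak_bin * sample_rate / len(frames[0])
print(len(frames), "fragments; strongest frequency:", peak_hz, "Hz")
```

A real recognizer would go on to turn each power spectrum into cepstral coefficients. That step is left out here to keep the sketch short.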
Installing SpeechRecognition:
Open the terminal and type:
$ sudo pip install SpeechRecognition
In the Python shell:
>>> import speech_recognition as sr
>>> sr.__version__ # check the installed version
>>> r = sr.Recognizer() # initialize recognizer
In the version of SpeechRecognition used here, the Recognizer class has seven methods for recognizing speech from an audio source:
1. recognize_bing() - Microsoft Bing Speech
2. recognize_google() - Google Web Speech API
3. recognize_google_cloud() - Google Cloud Speech
4. recognize_houndify() - Houndify by SoundHound
5. recognize_ibm() - IBM Speech to Text
6. recognize_sphinx() - CMU Sphinx (requires installing PocketSphinx)
7. recognize_wit() - Wit.ai
- Of these, only Sphinx works offline. The others work online and need an internet connection.
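Since only Sphinx works offline, one option is to try an online recognizer first and fall back to Sphinx when the service cannot be reached. The following is a minimal sketch of that idea, not part of the SpeechRecognition library. The helper recognize_with_fallback and the stand-in class OfflineOnlyRecognizer are hypothetical and exist only so the example runs without a microphone or network.

```python
def recognize_with_fallback(recognizer, audio, method_names, request_error):
    """Try each recognize_* method in order.

    Moves on to the next method when the current one raises request_error
    (for example, because there is no internet connection). Returns a tuple
    (method_name, text), or (None, None) if every method failed.
    """
    for name in method_names:
        method = getattr(recognizer, name)
        try:
            return name, method(audio)
        except request_error:
            continue
    return None, None


class OfflineOnlyRecognizer:
    """Hypothetical stand-in that behaves like a recognizer with no network."""

    def recognize_google(self, audio):
        raise ConnectionError("no internet connection")

    def recognize_sphinx(self, audio):
        return "hello"


used, text = recognize_with_fallback(
    OfflineOnlyRecognizer(),
    b"",  # placeholder for recorded audio
    ["recognize_google", "recognize_sphinx"],
    ConnectionError,
)
print(used, text)
```

With the real library, the call would look like recognize_with_fallback(r, audio, ["recognize_google", "recognize_sphinx"], sr.RequestError), assuming PocketSphinx is installed.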
Next step is to install the dependencies:
- $ sudo pip install SpeechRecognition
- $ sudo apt-get install python-pyaudio python3-pyaudio
- $ sudo apt-get install portaudio19-dev python-all-dev python3-all-dev
- $ sudo pip install pyaudio
- $ sudo apt-get install flac
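After installing, a quick way to confirm that Python can see the packages is to look up their import names. The sketch below uses only the standard library. The helper name missing_modules is hypothetical, and it assumes the import names speech_recognition (for SpeechRecognition) and pyaudio (for PyAudio).

```python
import importlib.util


def missing_modules(names):
    """Return the module names that cannot be found by this Python installation."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Import names assumed for the packages installed above
required = ["speech_recognition", "pyaudio"]
missing = missing_modules(required)

if missing:
    print("Not installed yet:", ", ".join(missing))
else:
    print("All required modules are installed")
```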
Parameters to consider:
1. Configure the microphone (for external microphones): It is advisable to specify the microphone in the program to avoid any glitches.
2. Find the microphone name: Type lsusb in the terminal to confirm that the webcam is connected. A list of connected USB devices will show up. The name used in the program, for example mic_name = "Webcam C110: USB Audio (hw:1,0)", must match an entry returned by sr.Microphone.list_microphone_names().
3. Set chunk size: This specifies how many audio samples we want to read at once. Typically, this value is a power of 2 such as 1024 or 2048.
4. Set sampling rate: The sampling rate defines how often values are recorded for processing, in samples per second (Hz).
5. Set device ID to the selected microphone: In this step, we specify the device ID of the microphone that we wish to use, to avoid ambiguity in case there are multiple microphones.
6. Allow adjusting for ambient noise: Since the surrounding noise varies, we must allow the program a second or two to adjust the energy threshold of the recording according to the external noise level.
7. Speech-to-text translation: This is done with the help of Google Speech Recognition, which requires an active internet connection to work. There are offline recognition systems such as PocketSphinx, but they have a very rigorous installation process that requires several dependencies. Google Speech Recognition is one of the easiest to use.
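To make the chunk size, sampling rate and device ID parameters more concrete, the following minimal sketch works out how much audio one chunk holds for the values used in this post and shows one way to look up a device ID by name. The helper names are hypothetical. It assumes chunk_size is counted in samples and that the audio is 16-bit mono (2 bytes per sample), and the device list is a made-up example rather than real output from a Pi.

```python
def chunk_duration_ms(chunk_size, sample_rate):
    """Length of one chunk in milliseconds (chunk_size counted in samples)."""
    return 1000.0 * chunk_size / sample_rate


def chunk_bytes(chunk_size, bytes_per_sample=2):
    """Size of one chunk in bytes, assuming 16-bit (2-byte) mono samples."""
    return chunk_size * bytes_per_sample


def find_device_index(mic_names, wanted):
    """Index of the first microphone whose name contains `wanted`, or None."""
    for index, name in enumerate(mic_names):
        if wanted.lower() in name.lower():
            return index
    return None


sample_rate = 48000
chunk_size = 2048
print("One chunk lasts about %.1f ms" % chunk_duration_ms(chunk_size, sample_rate))
print("One chunk holds %d bytes" % chunk_bytes(chunk_size))

# Made-up device list; on the Pi it would come from
# sr.Microphone.list_microphone_names()
example_names = ["Built-in Audio (hw:0,0)", "Webcam C110: USB Audio (hw:1,0)"]
print("Device ID:", find_device_index(example_names, "Webcam C110"))
```

Matching on part of the name rather than the full string is a design choice: the "(hw:1,0)" suffix can change when devices are plugged in a different order.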
Run the Code:
Step 1: Import speech_recognition
Step 2: Define the audio input (the webcam microphone)
Step 3: Define the speech sample rate
Step 4: Define the chunk size
Step 5: Initialize the recognizer
Step 6: Open the microphone as the audio source
Step 7: Adjust for ambient noise
Step 8: Speak something and record it
Step 9: Recognize the speech via recognize_google()
Step 10: Print the recognized output
CODE:
import speech_recognition as sr

mic_name = "Webcam C110: USB Audio (hw:1,0)"
sample_rate = 48000
chunk_size = 2048

# Initialize the recognizer
r = sr.Recognizer()

# Find the device ID of the chosen microphone.
# None makes sr.Microphone fall back to the default microphone.
device_id = None
mic_list = sr.Microphone.list_microphone_names()
for i, microphone_name in enumerate(mic_list):
    if microphone_name == mic_name:
        device_id = i

with sr.Microphone(device_index=device_id, sample_rate=sample_rate, chunk_size=chunk_size) as source:
    # Let the recognizer adjust its energy threshold to the ambient noise
    r.adjust_for_ambient_noise(source)
    print("Say Something")
    # Listen for the user's input
    audio = r.listen(source)
    try:
        text = r.recognize_google(audio)
        print("you said: " + text)
    except sr.UnknownValueError:
        print("Google Speech Recognition could not understand audio")
    except sr.RequestError as e:
        print("Could not request results from Google Speech Recognition service; {0}".format(e))
Output:
>>> Say Something
>>> you said: ok google what the current temperature in india
>>> Say Something
>>> you said: hello, juhi where are you