AI Voice Chatbot

In the era of AI and machine learning, chatbots have become part of everyday digital life, powering everything from customer support to personal virtual assistants. Today, I’m thrilled to share my latest project: a voice-based chatbot, built in JavaScript, that lets you hold a conversation with an AI entirely by voice.

Introduction

This voice chatbot goes a step beyond text chat by combining speech recognition, natural language understanding, and speech synthesis in a single loop. It transcribes what the user says, generates a reply, and then speaks that reply aloud, so the whole conversation flows naturally by voice.
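To make that loop concrete, here is a minimal sketch of one conversational turn as a single async function. It is an illustration rather than the project's code: runVoiceTurn and its listen, generateReply, and speak parameters are hypothetical names. Passing the three stages in as functions is an assumption that lets the flow run with stubs instead of a microphone and a network connection.

```javascript
/**
 * Runs one conversational turn: speech -> text -> reply -> speech.
 * The three stages are injected (hypothetical helpers), so the browser
 * APIs can be swapped for stubs when testing.
 */
async function runVoiceTurn({ listen, generateReply, speak }) {
    const userText = await listen();                  // e.g. wraps Web Speech API recognition
    if (!userText || !userText.trim()) {
        return null;                                  // nothing was heard; skip the model call
    }
    const botReply = await generateReply(userText);   // e.g. wraps the GPT-3.5 Turbo request
    await speak(botReply);                            // e.g. wraps speechSynthesis
    return { userText, botReply };
}

// Usage with stub stages (no microphone or network needed)
runVoiceTurn({
    listen: async () => "what time is it",
    generateReply: async (text) => `You asked: ${text}`,
    speak: async (reply) => console.log("Speaking:", reply),
}).then((turn) => console.log(turn));
```

In the browser, listen would wrap speech recognition, generateReply the chat completion request, and speak the speech synthesis call, mirroring the three steps in the Code Implementation section.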

Tech Stack

This chatbot integrates several key tools and technologies:

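Web Speech API (Speech-to-Text): The browser's Web Speech API provides speech recognition through its SpeechRecognition interface. It listens to the user’s voice, transcribes it into text, and passes that text to the conversation engine. Browser support varies: Chrome and Safari expose the interface under the prefixed name webkitSpeechRecognition, and Firefox does not support it by default. In Chrome, recognition is handled by a server-based engine, so it needs a network connection.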
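To show what "transcribes it into text" looks like in practice: a recognition result event carries event.results, a list of results in which each result is itself a list of alternatives (each with transcript and confidence properties) plus an isFinal flag. With the default maxAlternatives of 1 there is only one alternative per result. The helper below is a hypothetical sketch (pickBestTranscript is not part of the API) that joins the final results, taking the most confident alternative from each; plain arrays stand in for the browser's list objects.

```javascript
/**
 * Joins the final results from a SpeechRecognition result event,
 * choosing the highest-confidence alternative in each result.
 * Works on the browser's array-like lists or on plain arrays.
 */
function pickBestTranscript(results) {
    const parts = [];
    for (let i = 0; i < results.length; i++) {
        const result = results[i];
        if (!result.isFinal) continue;            // skip interim (still-changing) guesses
        let best = null;
        for (let j = 0; j < result.length; j++) {
            const alt = result[j];
            if (best === null || alt.confidence > best.confidence) best = alt;
        }
        if (best) parts.push(best.transcript.trim());
    }
    return parts.join(" ");
}

// Plain arrays mimicking event.results, with an isFinal flag attached to each result
const fakeResults = [
    Object.assign([{ transcript: "hello there", confidence: 0.62 },
                   { transcript: "hello bear", confidence: 0.31 }], { isFinal: true }),
    Object.assign([{ transcript: " how are you", confidence: 0.9 }], { isFinal: true }),
    Object.assign([{ transcript: "how are yo", confidence: 0.4 }], { isFinal: false }),
];
console.log(pickBestTranscript(fakeResults)); // hello there how are you
```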

GPT-3.5 Turbo (Text-based Conversation Generation): At the heart of the bot’s intelligence is OpenAI's GPT-3.5 Turbo. It generates coherent responses across a wide range of topics and can follow the context of a conversation, as long as the earlier turns are included in each request.
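That last condition matters because the Chat Completions endpoint is stateless: the model only "remembers" earlier turns if they are sent again in the messages array with every request. Here is a minimal sketch of that bookkeeping, assuming a hypothetical createConversation helper and an arbitrary cap on stored turns to keep requests small.

```javascript
/**
 * Keeps the running message list for a chat model.
 * The system prompt is always kept; only the most recent maxTurns
 * user/assistant pairs are retained (the cap is an arbitrary choice).
 */
function createConversation(systemPrompt, maxTurns = 10) {
    const turns = []; // alternating user/assistant messages

    function trim() {
        // Each turn is two messages (user + assistant); drop the oldest pairs
        while (turns.length > maxTurns * 2) turns.splice(0, 2);
    }

    return {
        // Messages to send with the next request, ending with the new user text
        messagesFor(userText) {
            return [
                { role: "system", content: systemPrompt },
                ...turns,
                { role: "user", content: userText },
            ];
        },
        // Record a completed exchange once the model has replied
        record(userText, botReply) {
            turns.push({ role: "user", content: userText },
                       { role: "assistant", content: botReply });
            trim();
        },
    };
}

const convo = createConversation("You are a helpful assistant.", 2);
convo.record("My name is Ada.", "Nice to meet you, Ada!");
console.log(convo.messagesFor("What is my name?").length); // 4
```

The array returned by messagesFor would replace the two-message messages list in the request body.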

Speech Synthesis API (Text-to-Speech): To turn the text reply back into voice, the bot uses the browser's SpeechSynthesis interface, the text-to-speech half of the Web Speech API. It reads text aloud using the voices installed in the browser or operating system, so the available voices and languages vary from one device to another.
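Because the voice list differs between devices, hard-coding an index such as getVoices()[0] may pick a voice in the wrong language. A hedged alternative is to prefer an exact language match, then a match on the base language, then the platform's default voice. chooseVoice is a hypothetical helper; it only reads the lang and default properties that SpeechSynthesisVoice objects expose, so plain objects can stand in for them here.

```javascript
/**
 * Picks a voice for a BCP 47 language tag such as "en-US".
 * Preference: exact tag match, then same base language ("en"),
 * then the voice marked as default, else null (keep the browser default).
 */
function chooseVoice(voices, lang) {
    const wanted = lang.toLowerCase();
    const base = wanted.split("-")[0];
    // Normalize defensively in case a tag uses "_" instead of "-"
    const norm = (v) => (v.lang || "").toLowerCase().replace(/_/g, "-");

    return (
        voices.find((v) => norm(v) === wanted) ||
        voices.find((v) => norm(v).split("-")[0] === base) ||
        voices.find((v) => v.default) ||
        null
    );
}

const fakeVoices = [
    { name: "Voice A", lang: "fr-FR", default: true },
    { name: "Voice B", lang: "en-GB", default: false },
    { name: "Voice C", lang: "en-US", default: false },
];
console.log(chooseVoice(fakeVoices, "en-US").name); // Voice C
console.log(chooseVoice(fakeVoices, "en-AU").name); // Voice B
console.log(chooseVoice(fakeVoices, "de-DE").name); // Voice A
```

In the browser this would be called as chooseVoice(speechSynthesis.getVoices(), "en-US"), assigning the result to utterance.voice only when it is not null.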

Frontend Framework (React): The chatbot UI is built with React.js. It provides a simple, interactive interface that starts a voice session and displays the bot's replies as they arrive.

Code Implementation

1. Speech-to-Text Conversion Using Web Speech API:

The browser's Web Speech API is used to recognize the user's voice and convert it to text:


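function startSpeechRecognition(onReply) {
    // Use the standard constructor if available, else Chrome/Safari's prefixed one
    const SpeechRecognition = window.SpeechRecognition || window.webkitSpeechRecognition;
    if (!SpeechRecognition) {
        console.error("Speech recognition is not supported in this browser.");
        return;
    }
    const recognition = new SpeechRecognition();
    recognition.lang = 'en-US'; // Set the recognition language
    recognition.interimResults = false; // Only return final results

    recognition.onresult = async function(event) {
        const userInput = event.results[0][0].transcript; // Capture the recognized text
        console.log("User Input:", userInput);
        try {
            const botReply = await handleTextConversation(userInput); // Get the bot's reply
            if (onReply && botReply) onReply(botReply); // Optional callback, e.g. to update the UI
        } catch (err) {
            console.error("Conversation failed:", err);
        }
    };
    recognition.onerror = (event) => console.error("Speech recognition error:", event.error);
    recognition.start(); // Start listening once the handlers are attached
}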
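Since recognition is event-driven, it can be convenient to wrap a single listening session in a Promise so it composes with async/await code like handleTextConversation. This is a sketch under assumptions: recognizeOnce is a hypothetical helper, the recognizer constructor is passed in (in a browser, window.SpeechRecognition || window.webkitSpeechRecognition), and an end event that arrives without any result is treated as "nothing heard" and resolves to an empty string.

```javascript
/**
 * Listens once and resolves with the recognized text.
 * Resolves "" if recognition ends without a result; rejects on an error event.
 * RecognitionCtor is injected: the browser's SpeechRecognition constructor or a test stub.
 */
function recognizeOnce(RecognitionCtor, lang = "en-US") {
    return new Promise((resolve, reject) => {
        const recognition = new RecognitionCtor();
        recognition.lang = lang;
        recognition.interimResults = false;
        let settled = false;

        recognition.onresult = (event) => {
            if (settled) return;
            settled = true;
            resolve(event.results[0][0].transcript);
        };
        recognition.onerror = (event) => {
            if (settled) return;
            settled = true;
            reject(new Error(`Speech recognition error: ${event.error}`));
        };
        recognition.onend = () => {
            // "end" also fires after a successful result, so only resolve if nothing arrived
            if (settled) return;
            settled = true;
            resolve("");
        };
        recognition.start();
    });
}

// Browser usage (not run here):
// const Ctor = window.SpeechRecognition || window.webkitSpeechRecognition;
// const text = await recognizeOnce(Ctor);
```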

2. Text-Based Conversation Generation Using GPT-3.5 Turbo:

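To generate a response, we send the transcribed text to OpenAI's Chat Completions endpoint with the gpt-3.5-turbo model, using the fetch API. Keep in mind that calling the API directly from the browser exposes the API key to anyone who inspects the page, so in production this request belongs on a backend that holds the key: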


async function handleTextConversation(userText) {
    // OPENAI_API_KEY must be defined elsewhere. Any key shipped to the browser is visible
    // to users, so in production proxy this request through your own server instead.
    const response = await fetch('https://api.openai.com/v1/chat/completions', {
        method: 'POST',
        headers: {
            'Content-Type': 'application/json',
            'Authorization': `Bearer ${OPENAI_API_KEY}`
        },
        body: JSON.stringify({
            model: "gpt-3.5-turbo",
            messages: [
                {"role": "system", "content": "You are a helpful assistant."},
                {"role": "user", "content": userText}
            ]
        })
    });

    if (!response.ok) {
        throw new Error(`OpenAI API request failed with status ${response.status}`);
    }

    const data = await response.json();
    const botReply = data.choices[0].message.content;
    console.log("Bot Response:", botReply);
    convertTextToSpeech(botReply); // Pass response to Text-to-Speech conversion
    return botReply; // Return the reply so callers (such as the UI) can display it
}
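Requests to a hosted model can fail transiently; OpenAI, for example, returns HTTP 429 when a rate limit is hit. A common pattern is to retry such failures a few times with exponential backoff. The sketch below is a generic helper, not part of any OpenAI SDK: fetchWithRetry, its options, and the choice to retry on 429, 5xx, and network errors are assumptions, and the fetch function is passed in so it can be stubbed.

```javascript
/**
 * Calls fetchFn(url, options), retrying on HTTP 429, 5xx responses and
 * network errors, waiting baseDelayMs, then 2x, 4x, ... between attempts.
 * Returns the last response; the caller still checks response.ok.
 */
async function fetchWithRetry(fetchFn, url, options, { retries = 3, baseDelayMs = 500 } = {}) {
    const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

    for (let attempt = 0; ; attempt++) {
        try {
            const response = await fetchFn(url, options);
            const retryable = response.status === 429 || response.status >= 500;
            if (!retryable || attempt >= retries) return response;
        } catch (err) {
            if (attempt >= retries) throw err;   // network failure with no attempts left
        }
        await sleep(baseDelayMs * 2 ** attempt); // back off before the next attempt
    }
}
```

Inside handleTextConversation, the fetch(...) call could then become fetchWithRetry(fetch, url, options), with the existing response.ok check left in place for non-retryable errors.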

3. Text-to-Speech Conversion Using Speech Synthesis API:

Once the text-based response is generated, the Speech Synthesis API will convert it back into speech:


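function convertTextToSpeech(text) {
    const utterance = new SpeechSynthesisUtterance(text); // Create a speech request
    utterance.lang = 'en-US';
    // getVoices() can return an empty list until the browser has loaded its voices,
    // so only override the default voice when a matching one is available
    const voice = speechSynthesis.getVoices().find(v => v.lang === 'en-US');
    if (voice) utterance.voice = voice;
    speechSynthesis.speak(utterance); // Speak the response out loud
}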
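Replies from a chat model can run to several paragraphs. One option, sketched here under assumptions, is to split the reply at sentence boundaries and queue each piece as its own utterance: speak() adds utterances to a queue, so they play in order, and shorter utterances may also avoid the cut-offs some browsers have been reported to have with very long ones. splitIntoChunks, speakInChunks, and the 200-character limit are hypothetical choices, and the synthesizer and utterance constructor are passed in so the logic can run outside a browser.

```javascript
/**
 * Splits text into chunks of at most maxLen characters, breaking only
 * between sentences. A single sentence longer than maxLen stays whole.
 */
function splitIntoChunks(text, maxLen = 200) {
    // Naive sentence split: break on whitespace that follows ".", "!" or "?".
    // "3.14" stays intact, but abbreviations such as "e.g. " will be split.
    const sentences = text.split(/(?<=[.!?])\s+/);
    const chunks = [];
    let current = "";
    for (const raw of sentences) {
        const sentence = raw.trim();
        if (!sentence) continue;
        if (current && (current + " " + sentence).length > maxLen) {
            chunks.push(current); // current chunk is full; start a new one
            current = sentence;
        } else {
            current = current ? current + " " + sentence : sentence;
        }
    }
    if (current) chunks.push(current);
    return chunks;
}

/**
 * Queues each chunk as a separate utterance. In a browser, pass
 * window.speechSynthesis and SpeechSynthesisUtterance.
 */
function speakInChunks(text, synth, UtteranceCtor, { lang = "en-US", maxLen = 200 } = {}) {
    const chunks = splitIntoChunks(text, maxLen);
    for (const chunk of chunks) {
        const utterance = new UtteranceCtor(chunk);
        utterance.lang = lang;
        synth.speak(utterance); // speak() queues; utterances play in order
    }
    return chunks.length;
}
```

In the browser, convertTextToSpeech(botReply) could be swapped for speakInChunks(botReply, speechSynthesis, SpeechSynthesisUtterance).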

4. Creating the Frontend with React:

The frontend interface, built with React, allows users to interact with the voice bot in real time. Here’s an example of how the voice recording and response functionality is integrated:


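import React, { useState } from 'react';
// Assumes startSpeechRecognition from step 1 is defined in (or imported into) this module

const VoiceChatBot = () => {
    const [response, setResponse] = useState('');

    const handleResponse = (botReply) => {
        setResponse(botReply); // Store the latest reply so it is rendered below
    };

    const startChat = () => {
        // Pass the callback so the reply is displayed as well as spoken
        startSpeechRecognition(handleResponse);
    };

    return (
        <div>
            <h1>🎙️ Voice ChatBot 🤖</h1>
            <button onClick={startChat}>Start Talking</button>
            <p>{response}</p>
        </div>
    );
};

export default VoiceChatBot;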

The UI consists of a single button that starts speech recognition. Once the reply comes back, it is spoken aloud and its text is displayed below the button.
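A natural next step for the UI is to show what the bot is currently doing and to ignore clicks while a turn is in progress. As a hedged sketch, the reducer below is plain JavaScript that could be passed to React's useReducer; chatReducer, initialState, the status names, and the action types are all hypothetical additions, not part of the original component.

```javascript
/**
 * UI state for one voice turn: idle -> listening -> thinking -> speaking -> idle.
 * Except for FAILED, actions that don't fit the current status are ignored,
 * which also stops a second "Start Talking" click from starting another session.
 */
const initialState = { status: "idle", reply: "", error: null };

function chatReducer(state, action) {
    switch (action.type) {
        case "START":
            return state.status === "idle" ? { ...state, status: "listening", error: null } : state;
        case "HEARD":
            return state.status === "listening" ? { ...state, status: "thinking" } : state;
        case "REPLIED":
            return state.status === "thinking" ? { ...state, status: "speaking", reply: action.reply } : state;
        case "DONE":
            return state.status === "speaking" ? { ...state, status: "idle" } : state;
        case "FAILED":
            return { ...state, status: "idle", error: action.error };
        default:
            return state;
    }
}

// Walk through one turn
let state = initialState;
for (const action of [
    { type: "START" },
    { type: "START" },                 // ignored: already listening
    { type: "HEARD" },
    { type: "REPLIED", reply: "Hello!" },
    { type: "DONE" },
]) {
    state = chatReducer(state, action);
}
console.log(state.status, state.reply); // idle Hello!
```

Inside the component, const [state, dispatch] = useReducer(chatReducer, initialState) would replace the useState call, and the button could be disabled unless state.status is "idle".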

Conclusion

By combining the browser's built-in APIs for speech recognition and synthesis with OpenAI's GPT-3.5 Turbo for conversation generation, this voice chatbot lets users talk to an AI model instead of typing to it. The same pattern applies to many use cases, from customer support to personal virtual assistants.

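Because speech recognition and synthesis come from APIs built into the browser, this approach needs no additional speech libraries or SDKs; the main external dependency is the OpenAI API (plus, in Chrome, the browser's own server-based recognition service). That keeps the setup light for web-based applications.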

Feel free to customize the UI, add features, or integrate this bot into domains such as e-commerce, healthcare, or education.