- OpenAI’s flagship chatbot, ChatGPT, is getting a major tweak that will transform how users interact with it.
- Reportedly, the world-famous chatbot is getting a brand-new text-to-speech model that can quickly generate a human-like voice from a brief speech sample.
- It will now be able to see, hear, and speak, becoming more intuitive.
- The voice feature is currently available only in English.
OpenAI made history last November when it launched its flagship chatbot, ChatGPT. The product took the internet by storm, making text generation and query-solving hassle-free.
The AI firm has been steadily improving the chatbot's features, and it has now announced voice conversation capabilities, with speech recognition handled by Whisper, its open-source speech-to-text system. Speaking with ChatGPT, users can do things like settle a debate or request a bedtime story. OpenAI has also upgraded the app to apply its reasoning skills to queries about images.
For example, it can offer insight into various image types, including photos, screenshots, and documents containing pictures or text. It can also help troubleshoot problems, such as why a grill isn't working, or assist with planning a meal. Moreover, the chatbot can analyze complex graphs.
“You can now use voice to engage in a back-and-forth conversation with your assistant,” OpenAI explained when announcing the new iOS and Android upgrade.
ChatGPT's new text-to-speech model can generate human-like audio from text after hearing only a few seconds of sample speech. This means the world's favorite chatbot is about to become a lot chattier.
Users can send audio queries directly, and the app will respond in a synthesized voice. They can choose among five different voices, which OpenAI created in collaboration with professional voice artists.
In addition, the new version will feature visual smarts: users can snap or upload a photo within ChatGPT, and it will quickly generate a description of the image, much like Google Lens. Using a newly added drawing tool, users can also focus the conversation on a selected image or part of an image, and interact with the app through voice commands. OpenAI has fed audio and visual data into the chatbot's machine-learning models to make it more perceptive and human-like.
Use your voice to engage in a back-and-forth conversation with ChatGPT. Speak with it on the go, request a bedtime story, or settle a dinner table debate. Sound on 🔊 pic.twitter.com/3tuWzX0wtS
— OpenAI (@OpenAI) September 25, 2023
The addition of such a wide range of speech and image-analysis features shows that OpenAI is constantly refining its AI models to make them more intuitive and interactive. With these capabilities, ChatGPT can compete with Amazon's Alexa or Apple's Siri, and may even prove more attractive to consumers than other AI chatbots.
Because ChatGPT's new voice generation technology was built in-house, OpenAI can also license it to other companies. For instance, Spotify can use OpenAI's speech synthesis algorithms to offer a podcast translation feature, delivering content in different languages in the original podcaster's voice.
The new features are initially available only on the subscription-based version of ChatGPT, which costs $20 per month. They will roll out in all markets where ChatGPT is available, but for now the voice feature works only in English.