ChatGPT Can Now See Images & Listen To Your Voice

Microsoft-backed start-up OpenAI has added voice and image capabilities to ChatGPT, its generative AI chatbot, so that it can now see, hear, and speak.

These capabilities offer a new, more intuitive type of interface that allows users to have a voice conversation or show ChatGPT what they are talking about.

Let’s have a look at the new features added to ChatGPT:


Users can now use voice to engage in a back-and-forth conversation with the AI assistant. The feature is powered by a new text-to-speech model that can generate human-like audio from just text and a few seconds of sample speech.

OpenAI collaborated with professional voice actors to create five voice options, including male and female voices. It also uses Whisper, its open-source speech recognition system, to transcribe the user’s spoken words into text.

To get started with voice conversations, open Settings in the mobile app, tap “New Features”, and opt in to voice conversations. Then tap the headphone button in the top-right corner of the home screen and choose your preferred voice from the five options.


ChatGPT can now respond to images uploaded by users. For instance, users can snap a picture of a landmark while traveling to learn more about it, or send pictures of their fridge and pantry so the AI assistant can suggest dishes to cook for dinner with the ingredients on hand.

Image understanding is powered by multimodal GPT-3.5 and GPT-4, which apply their language reasoning skills to a range of images, such as photographs, screenshots, and documents containing both text and pictures.

To get started, tap the photo button to capture or select an image. On an iOS or Android device, tap the plus button first. You can also discuss multiple images or use the drawing tool to guide the assistant.

“Voice and image give you more ways to use ChatGPT in your life. Snap a picture of a landmark while traveling and have a live conversation about what’s interesting about it,” the company announced in a blog post on Monday.

“When you’re home, snap pictures of your fridge and pantry to figure out what’s for dinner (and ask follow up questions for a step by step recipe). After dinner, help your child with a math problem by taking a photo, circling the problem set, and having it share hints with both of you.”


Over the next two weeks, the voice and image features will roll out to ChatGPT Plus and Enterprise customers. The voice feature will be available on iOS and Android (opt in through your settings), while the image feature will be available on all platforms.
