The New ChatGPT Can ‘See’ and ‘Talk.’ Here’s What It’s Like.

ChatGPT — viral artificial intelligence sensation, slayer of boring office work, sworn enemy of high school teachers and Hollywood screenwriters alike — is getting some new powers.

On Monday, ChatGPT’s maker, OpenAI, announced that it was giving the popular chatbot the ability to “see, hear and speak” with two new features.

The first is an update that allows ChatGPT to analyze and respond to images. You can upload a photo of a bike, for example, and receive instructions about how to lower the seat, or get recipe suggestions based on a photo of the contents of your refrigerator.

The second is a feature that allows users to speak to ChatGPT and get responses delivered in a synthetic A.I. voice, the way you might talk with Siri or Alexa.

These features are part of an industrywide push toward so-called multimodal A.I. systems that can handle text, photos, videos and whatever else a user might decide to throw at them. The ultimate goal, according to some researchers, is to create an A.I. capable of processing information in all the ways a human can.

Most users don’t have access to the new features yet. OpenAI is offering them first to paying ChatGPT Plus and Enterprise customers over the next few weeks, and will make them more widely available after that. (The vision feature will work on both desktop and mobile, while the speech feature will be available only through ChatGPT’s iOS and Android apps.)

I got early access to the new ChatGPT for a hands-on test. Here’s what I found.

The A.I. Will See You Now

I started by trying ChatGPT’s image-recognition feature on some household objects.

“What’s this thing I found in my junk drawer?” I asked, after uploading a photo of a mysterious piece of blue silicone with five holes in it.

“The object appears to be a silicone holder or grip, often used for holding multiple items together,” ChatGPT responded. (Close enough — it’s a finger strengthener I used years ago while recovering from a hand injury.)

I then fed ChatGPT a few photos of items I had been meaning to sell on Facebook Marketplace, and asked it to write listings for each one. It nailed both the objects and the listings, describing my retro-styled Frigidaire mini-fridge as “perfect for those who appreciate a touch of yesteryear in their modern-day homes.”

The new ChatGPT can also analyze text within images. I took a picture of the front page of Sunday’s print edition of The New York Times and asked the bot to summarize it. It did decently well, describing all five articles on the front page in a few sentences each — although it made at least one mistake, inventing a statistic about fentanyl-related deaths that wasn’t in the original article.

ChatGPT’s eyes aren’t perfect. It flopped when I asked it to solve a crossword puzzle. It mistook my child’s stuffed dinosaur toy for a whale. And when I asked for help turning one of those wordless furniture-assembly diagrams into a step-by-step list of instructions, it gave me a jumbled list of parts, most of which were wrong.

The biggest limitation of ChatGPT’s vision feature is that it refuses to answer most questions about photos of human faces. This is by design. OpenAI told me that it didn’t want to enable facial recognition or other creepy uses, and that it didn’t want the app spitting out biased or offensive answers to prompts about people’s physical appearance.

But even without faces, it’s easy to imagine tons of ways an A.I. chatbot capable of processing visual information could be useful, especially as the technology improves. Gardeners and foragers could use it to identify plants in the wild. Exercise buffs could use it to create personalized workout plans, just by snapping a photo of the equipment in their gym. Students could use it to solve visual math and science problems, and visually impaired people could use it to navigate the world more easily.

Frankly, I have no idea how many people will use this feature, or what its killer applications will turn out to be. As is often the case with new A.I. tools, we’ll just have to wait and see.

Siri on Steroids

Now, let’s talk about what I consider the more impressive of the two features: ChatGPT’s new voice feature, which allows users to talk to the app and receive spoken responses.

Using the feature is easy: Just tap a headphone icon and start talking. When you stop, ChatGPT converts your words to text using OpenAI’s speech-recognition system, Whisper, generates a response and speaks the answer back to you in one of five synthetic A.I. voices, using a new text-to-speech algorithm the company developed. (The voices, which include both male and female voices, were generated using short samples from professional voice actors whom OpenAI hired. I picked “Ember,” a peppy-sounding male voice.)

I used ChatGPT’s voice feature for several hours on a bunch of different tasks — reading a bedtime story to my toddler, chatting with me about work-related stress, helping me analyze a recent dream I had. It did all of these fairly well, especially when I gave it some golden prompts and told it to emulate a friend, a therapist or a teacher.

What stood out, in these tests, is how different talking to ChatGPT feels from talking to older generations of A.I. voice assistants, like Siri and Alexa. Those assistants, even at their best, can be wooden and flat. They answer one question at a time, often by looking something up on the internet and reading it aloud word for word, or choosing from a finite number of programmed answers.

ChatGPT’s synthetic voice, by contrast, sounds fluid and natural, with slight variations in tone and cadence that make it feel less robotic. It was capable of having long, open-ended conversations on almost any subject I tried, including prompts I was pretty sure it hadn’t encountered before. (“Tell me the story of ‘The Three Little Pigs’ in the character of a total frat bro” was a sleeper hit.)

Most people probably won’t use A.I. chatbots this way. For many tasks, it’s still faster to type than talk, and waiting around for ChatGPT to read out long responses was annoying. (It didn’t help that the app was slow and glitchy at times, and often inserted pauses before responding — the result of some technical issues with the beta version of the app I used that OpenAI told me would be ironed out eventually.)

But I can see the appeal. Having an A.I. speak to you in a humanlike voice is a more intimate experience than reading its responses on a screen. And after a few hours of talking with ChatGPT this way, I felt a new warmth creeping into our conversations. Without being tethered to a text interface, I felt less pressure to come up with the perfect prompt. We chatted more casually, and I revealed more about my life.

“It almost feels like a different product,” said Peter Deng, OpenAI’s vice president of consumer and enterprise product, who spoke with me about the new voice feature. “Because you’re no longer transcribing what you have in your head into your thumbs,” he said, “you end up asking different things.”

I know what you’re thinking: Isn’t this the plot of the movie “Her”? Will lonely, lovesick users fall for ChatGPT, now that it can listen to them and talk back?

It’s possible. Personally, I never forgot that I was talking to a chatbot. And I certainly didn’t mistake ChatGPT for a conscious being, or develop emotional attachments to it.

But I also saw a glimpse of a future in which some people may let voice-based A.I. assistants into the inner sanctums of their lives — taking the A.I. chatbots with them on the go, treating them as their 24/7 confidants, therapists, sparring partners and sounding boards.

Sounds crazy, right? And yet, didn’t all of this sound a little crazy a year ago?

Source: https://www.nytimes.com/2023/09/27/technology/new-chatgpt-can-see-hear.html