OpenAI’s latest upgrade essentially lets users livestream with ChatGPT

(Coin Telegraph)-15/05/2024

ChatGPT creator OpenAI has announced its latest AI model, GPT-4o, a chattier, more humanlike AI chatbot, which can interpret a user’s audio and video and respond in real time.

A series of demos released by the firm shows GPT-4 Omni helping potential users prepare for a job interview by checking that they look presentable, as well as calling a customer service agent to get a replacement iPhone.

Other demos show it can share dad jokes, translate a bilingual conversation in real time, be the judge of a rock-paper-scissors match between two users, and respond with sarcasm when asked. One demo even shows how ChatGPT reacts to being introduced to the user’s puppy for the first time.

“Well hello, Bowser! Aren’t you just the most adorable little thing?” the chatbot exclaimed.

“It feels like AI from the movies; and it’s still a bit surprising to me that it’s real,” said the firm’s CEO, Sam Altman, in a May 13 blog post.

“Getting to human-level response times and expressiveness turns out to be a big change.”

A version limited to text and image input was launched on May 13, with the full version set to roll out in the coming weeks, OpenAI said in a recent X post.

GPT-4o will be available to both paid and free ChatGPT users and will be accessible through OpenAI’s API.

OpenAI said the “o” in GPT-4o stands for “omni,” and described the model as a step toward more natural human-computer interaction.

GPT-4o’s ability to process any combination of text, audio and image inputs at the same time is a considerable advancement compared with OpenAI’s earlier AI tools, such as GPT-4, which often “loses a lot of information” when forced to multi-task.

OpenAI said “GPT-4o is especially better at vision and audio understanding compared to existing models,” which even includes picking up on a user’s emotions and breathing patterns.

It is also “much faster” and “50% cheaper” than GPT-4 Turbo in OpenAI’s API.

The new AI tool can respond to audio inputs in as little as 232 milliseconds, with an average time of 320 milliseconds, OpenAI claims, which it says is similar to human response times in an ordinary conversation.