InDirectica
Innovation

AI Watching And Listening: Cross-Sensory Cognition Work

By admin | September 18, 2023 | 4 Mins Read

It’s easy to forget how much AI is really doing behind the scenes. For a reminder, we need look no further than what came out of Imagination in Action, and everything the experts there showed us.

Large language models are taking our world by storm, with the ability to imitate aspects of human cognition in many different ways. All of this is feeding a massive trend toward digital disruption.

That idea comes through loud and clear as James Glass takes us through some of the intersections between video, audio and new technology.

For example, take a look at the part of the video where he talks about image captioning and the interplay between visuals and text:

“We were interested in seeing if we could take speech and pair it up with vision, and with no other information, see what the machine could learn from raw audio samples and raw pixels,” he explains. “And so since nothing like this existed, we went out and collected about 400,000 or so people talking about images. People like to do this; it’s pretty easy. Then we (built) a deep learning model, having one branch grovel (sic) over the image and another branch grovel (sic) over the audio, and then at a high level, have them connect and try and learn a joint audiovisual semantic latent representation of the signal.”
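To make the two-branch idea concrete, here is a minimal sketch in plain Python. It is not Glass's model: the random linear layers stand in for deep encoders, and the names (`make_projection`, `embed`, `similarity`) and the toy dimensions are assumptions chosen for illustration. It only shows the shape of the design, with one branch per modality and both landing in the same embedding space where they can be compared.

```python
import math
import random

def make_projection(in_dim, out_dim, rng):
    """Random linear layer standing in for a deep encoder branch (hypothetical)."""
    scale = 1.0 / math.sqrt(in_dim)
    return [[rng.gauss(0.0, scale) for _ in range(in_dim)] for _ in range(out_dim)]

def embed(features, weights):
    """Project a feature vector into the joint space and L2-normalize it."""
    projected = [sum(w * x for w, x in zip(row, features)) for row in weights]
    norm = math.sqrt(sum(v * v for v in projected)) or 1.0
    return [v / norm for v in projected]

def similarity(a, b):
    """Dot product of unit vectors, i.e. their cosine similarity."""
    return sum(x * y for x, y in zip(a, b))

rng = random.Random(0)
IMAGE_DIM, AUDIO_DIM, JOINT_DIM = 12, 8, 4  # toy sizes, not taken from the talk

image_branch = make_projection(IMAGE_DIM, JOINT_DIM, rng)  # "grovels" over pixels
audio_branch = make_projection(AUDIO_DIM, JOINT_DIM, rng)  # "grovels" over audio

pixels = [rng.random() for _ in range(IMAGE_DIM)]    # stand-in for raw pixels
waveform = [rng.random() for _ in range(AUDIO_DIM)]  # stand-in for raw audio samples

image_vec = embed(pixels, image_branch)
audio_vec = embed(waveform, audio_branch)
score = similarity(image_vec, audio_vec)  # how well this image and clip "connect"
```

With untrained weights the score is meaningless. Training is what pushes matching image and speech pairs toward high scores and mismatched pairs toward low ones.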

Glass talks about “semantic objects” as versatile units of digital cognition, and shows us how the computer ‘thinks’ by offering a display where you can hear people talking about items in a picture, and see pixels lighting up around those objects.

In a way, it’s kind of like stepping through code in a debugger, where you see what the machine is doing while it’s doing it.

Lighthouses and sunsets are pretty, but Glass suggests there’s more to it than that:

“It’s sort of like somebody shining a flashlight at a picture while you’re talking. And it’s not perfect, but you get a sense that on some of the concepts that you’re hearing, it sort of knows what you’re talking about. You can quantify this a little bit more by looking through a large data set and finding patches (sic) and images that have high correspondence with segments in the speech captions, and pooling them together and then clustering, and you get hundreds and hundreds of these kinds of clusters…”
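The “flashlight” effect can be sketched as a heat map of similarities between image patches and the audio frames of a spoken segment. This is an illustrative toy, assuming each patch and each audio frame already has an embedding in a shared space. The two-dimensional embeddings below are hand-made so the result is easy to follow, and the function names are hypothetical.

```python
def dot(a, b):
    """Similarity between one patch embedding and one audio-frame embedding."""
    return sum(x * y for x, y in zip(a, b))

def heat_map(patch_grid, audio_frames):
    """For each image patch, its best similarity to any frame in the spoken segment."""
    return [[max(dot(patch, frame) for frame in audio_frames) for patch in row]
            for row in patch_grid]

def brightest_patch(heat):
    """(row, col) of the patch the 'flashlight' lights up most strongly."""
    cells = [(r, c) for r in range(len(heat)) for c in range(len(heat[0]))]
    return max(cells, key=lambda rc: heat[rc[0]][rc[1]])

# Hand-made toy embeddings: only the top-right patch resembles "lighthouse".
LIGHTHOUSE = [1.0, 0.0]
SKY = [0.0, 1.0]
patch_grid = [[SKY, LIGHTHOUSE],
              [SKY, SKY]]

# Audio frames covering the moment the speaker says "lighthouse" (assumed values).
spoken_segment = [[0.9, 0.1], [0.8, 0.2]]

heat = heat_map(patch_grid, spoken_segment)
spot = brightest_patch(heat)  # (0, 1): the top-right patch
```

Collecting the strongest patch and segment pairs across a large dataset and clustering them is the quantifying step Glass describes. That part is left out here.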

He talks about the “Rosetta Stone” of language intersection, where some of these new technologies will enable better translations – or more to the point, entirely new kinds of translations transcending text and verbal reading in very sci-fi ways.

But that’s really just the tip of the iceberg. Think about what’s going to happen when we allow AI entities to translate between media, between speech and visuals!

Or to put it another way, think back about a decade to early AI work. We had unsupervised machine learning, and supervised machine learning.

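These paradigms that Glass is talking about are inherently different. They’re based on self-supervised learning, as he mentions several times: the training signal comes from the data itself (here, the natural pairing of an image with someone talking about it) rather than from human-written labels. And that’s critically important. Systems trained this way can learn from far more data than people could ever annotate.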
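One common way to turn that pairing into a training signal is a contrastive loss, sketched below. The article does not say which objective Glass's group used, so treat this InfoNCE-style formulation and the name `contrastive_loss` as assumptions. The point is only that no human label appears anywhere: the “right answer” for each image is simply the caption that was recorded with it.

```python
import math

def contrastive_loss(sim):
    """Average cross-entropy over rows of a similarity matrix.

    sim[i][j] is the similarity of image i with spoken caption j.
    True pairs sit on the diagonal; every other entry in a row is a free negative.
    """
    total = 0.0
    for i, row in enumerate(sim):
        peak = max(row)  # subtract the max for numerical stability
        log_denominator = peak + math.log(sum(math.exp(s - peak) for s in row))
        total += log_denominator - row[i]
    return total / len(sim)

# Images score highest against their own captions: low loss.
aligned = [[5.0, 0.0, 0.0],
           [0.0, 5.0, 0.0],
           [0.0, 0.0, 5.0]]

# Images score highest against someone else's caption: high loss.
shuffled = [[0.0, 5.0, 0.0],
            [0.0, 0.0, 5.0],
            [5.0, 0.0, 0.0]]

low = contrastive_loss(aligned)
high = contrastive_loss(shuffled)
```

Minimizing a loss like this pulls each image toward its own spoken description and away from the others in the batch.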

As an example, Glass talks about scene analysis and perception models. Listen to this part where he discusses a methodology for multimedia analysis:

“You can modify that basic model to have a visual branch that’s processing video, and an audio branch that’s processing speech and the audio sounds, and learn a high-level embedding space. And you can do things like retrieval: play an audio snippet and retrieve the corresponding video snippet, and things like that.”
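Once audio and video live in one embedding space, the retrieval Glass mentions reduces to a nearest-neighbor search. The sketch below assumes the embeddings have already been computed. The clip names, vectors and the `retrieve` helper are invented for illustration.

```python
import math

def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)

def retrieve(query, library, top_k=1):
    """Names of the top_k library items whose embeddings best match the query."""
    ranked = sorted(library, key=lambda name: cosine(query, library[name]), reverse=True)
    return ranked[:top_k]

# Hypothetical video-clip embeddings in the shared space.
video_library = {
    "waves_on_beach": [0.9, 0.1, 0.0],
    "dog_barking": [0.1, 0.9, 0.1],
    "city_traffic": [0.0, 0.2, 0.9],
}

audio_query = [0.2, 0.8, 0.0]  # embedding of an audio snippet (assumed)
best = retrieve(audio_query, video_library, top_k=2)
# best == ["dog_barking", "waves_on_beach"]
```

The same function works in the other direction: pass a video embedding as the query and a library of audio embeddings to find the matching sound.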

Video: These are some very interesting new things that AI has just become capable of

He talks about listening and understanding, and how we can move the ball forward:

“Deep learning has really enabled us to make connections across modalities,” he says. “It’s fascinating: self-supervised learning has let us learn from large quantities of unannotated data. And these newer large language models (are) going to be a really interesting research direction (in which) to connect perception with language: two of the original pillars of artificial intelligence.”

It truly is fascinating. After a while, you might find it almost keeps you up at night. With AI doing all of this – how long until it’s doing it better than us? Anyway, the applications are evident, and the methodology, the cutting-edge research, is starkly impressive.
