Apple's MM1, Grok-1 Released, Google's VLOGGER, and OpenAI's Mira Murati's Fail
Keeping it light this Monday morning with some industry news.
Morning y’all!
Hope you had a wonderful weekend! I, for one, spent the entire week working to catch up on some important things that I had missed in the last few weeks and although it was exhausting it also felt good to be on top of things.
As I mentioned previously, we all know what it’s like to start a new gig, and we naturally want to present our very best to our teammates and leaders. But even more importantly, I like to keep the bar very, very high for my own work ethic and performance, because at the end of the day we each hold the line on our own expectations of ourselves.
Which includes this small newsletter, by the way.
-`ღ´-
Today I’ve got a few apps and some industry news that captured my attention for longer than a TikTok-length video. Not too many items, but important ones. Let’s go.
Apple shared some more info on MM1, a family of multimodal models that combine visual and language understanding. The largest scales to 30B parameters and shows promising results from only a handful of examples. And yes, it competes with GPT-4 and Gemini Pro on performance.
Elon promised that he’d open-source Grok, and it’s here! Massive in size (314B parameters), it’s released under the Apache 2.0 license. Long live open source!
Folks have been talking about Pipio’s new AI-driven translation tool that does precisely what you think it does, even syncing lip movements with really great results. Upload a video and it does all the work for you! You can then export it and do whatever you’d like with it. Spooky how good these are becoming.
Google has developed VLOGGER, an AI model that can create photorealistic talking-avatar videos with full upper-body motion from just an image and an audio clip. Are. You. Kidding. Me.
VLOGGER is a novel framework to synthesize humans from audio. Given a single input image like the ones shown on the first column, and a sample audio input, our method generates photorealistic and temporally coherent videos of the person talking and vividly moving. As seen on the synthesized images in the right columns, we generate head motion, gaze, blinking, lip movement and unlike previous methods, upper-body and hand gestures, thus taking audio-driven synthesis one step further.
Trained on 800k videos, this technology has a lot of obvious applications. I bet a number of startups will build on this type of research for new projects. Let’s just wait and see.
How strange and awkward is this video from the CTO of OpenAI? It’s a perfect example of why one should be professionally coached before doing an interview with a major news publication. Mira Murati’s chat with the WSJ is painful to watch.
Super cringe, and it’s clearly a problem: not because she’s a bad interview subject but because there’s no acknowledgment that it is a problem. OpenAI is being sued left and right at this point, and I’m shocked she can’t answer this question. Weird.
Startups are raising tons of money! As I’ve shared previously, you simply need to follow the money and you’ll know what’s on most venture capitalists’ minds:
Generative AI and AI-related startups raised nearly $50 billion in 2023, per Crunchbase data, with some, including OpenAI, Anthropic and Inflection AI, raking in billions of dollars all by themselves.
While investors have little doubt many AI startups will continue to push their valuations northward — especially those building their own models and platforms — the new year could provide a reckoning and recalibration for a market that seemed to know no bounds.
And if you’re interested in getting into AI then now’s the time. This is going to be a huge and growing industry and at some point it’ll be in every company, everywhere. You can bank on that.
And that’s it! We’re keeping it light so you can get back to work. Have a great one!
※\(^o^)/※
— Summer