Catching up with Scale AI: A Conversation with founder and CEO Alexandr Wang
Index partner Mike Volpi recently caught up with Alexandr Wang, founder and CEO of Scale AI, at Scale’s new San Francisco office. They discussed the future of AI, the critical role of high-quality data, and the challenges of moving AI projects from prototype to production. Alex also shared his thoughts on the importance of AI safety and the broader geopolitical dynamics shaping the industry today.
https://scale.com/
https://www.indexventures.com/
https://www.indexventures.com/perspectives/catching-up-with-scale-ai-founder-and-ceo-alexandr-wang/
Video Transcript
So these are the new digs on Iowa.
New office.
It's been a great setup.
We have a responsibility to produce all the data,
truly generate the data necessary to fuel this next era.
What worked before is not going to keep working.
Our responsibility is to utilize the exact same infrastructure and data foundry that we've built
to support every enterprise in their own journey
to make use of all their proprietary data
towards building customized specialized agents
for their own businesses.
You must still hire a lot right out of college.
I mean, obviously I dropped out to start the company, so I have a strong conviction
in talent right out of college.
We have an office in New York, an office in DC, we just started an office in London.
That's amazing.
Thank you very much.
You were sort of born in the AI era, so where do you think we are in the development of
this?
Are we getting to the stage where, you know, musical chairs is over, we know who the main
players are, this game is done? Or are we in the first inning, the
third inning, are we, you know, getting settled into it?
To me, the first inning was sort of the tinkering phase of modern deep learning. So from ImageNet
and AlexNet: ImageNet was the first large-scale labeled image
dataset, and AlexNet was the very first use of deep neural networks to solve
that problem. There was the Google result where they could recognize cats
in YouTube videos. All of that, call it roughly 2009 through 2020, was the first inning.
And while it lasted quite a while, it was really
a lot of tinkering with different kinds of model architectures and different kinds of data sets.
It was the first demonstration that scaling these models up really worked. A lot of
the progress from GPT-2 through GPT-4 was all in pre-training, all training on
bigger and bigger chunks of the internet with more and more GPUs. And then
basically all the gains from that point, which was, you know, March of last year, through
today, August of 2024, have been through gains in post-training. And this is
through better SFT (supervised fine-tuning), RLHF (reinforcement learning from human feedback),
and DPO (direct preference optimization) on the models,
and the use of data sets of increasing complexity,
going into more and more expert areas
and really driving the performance of the models
with quality over quantity.
So smaller data sets, but of very, very high quality.
Now you're actually creating effectively data sets
that are unique to the explicit purpose
that the model maker wants, right?
Exactly, yeah.
So we think about our role now as more of a data foundry than a data annotator.
The industry is very, very excited about agents.
There's very little data you can train these models on that captures the series of actions
a human takes and their internal thought process as they go through each of those steps.
We actually view one of our most important roles, especially over the next few years,
as laying the groundwork for agents to actually become a possibility.