Claude 3 Obliterates GPT-4 and Gemini: Is AGI Imminent?
Discover how the revolutionary Claude 3 surpasses leading language models GPT-4 and Gemini Ultra, hinting at the dawn of Artificial General Intelligence. Stay tuned as we delve into its extraordinary capabilities and potential implications.
Video Summary & Chapters
1. Claude Opus: The Game Changer
Introduction to Claude Opus dominating GPT-4 and Gemini Ultra.
2. Addressing Serious Allegations
Clarification on allegations and voice authenticity.
3. Claude 3 vs. GPT-4 and Gemini
Comparison of Claude 3 with GPT-4 and Gemini Ultra benchmarks.
4. Impressive Performance Metrics
Highlighting Claude's achievements on various benchmarks.
5. Political Neutrality and Ethical Considerations
Comparison of Claude's responses with Gemini and GPT-4 in different scenarios.
6. Coding Capabilities of Claude
Exploring Claude's coding abilities and accuracy compared to other models.
7. Drawbacks and Limitations
Discussion on the limitations and drawbacks of using Claude.
8. Potential Self-Awareness of Claude
Testing Claude's recall abilities and hints of self-aware behavior.
9. Conclusion and Future of AI
Closing remarks on Claude, AI advancements, and the future.
Video Transcript
Yesterday, Anthropic released its magnum opus, a new large language model that dominates
GPT-4 and Gemini Ultra across the board.
I'm sick of AI just as much as you guys are, but it's time to reset the counter because
it's been zero days since a game-changing AI development.
Claude Opus not only slaps, but it's also been making some weird self-aware remarks,
and could be even more intelligent than what the benchmarks test it for.
In today's video, we'll put it to the test to find out if Claude is really the gigachad
that it claims to be.
It is March 5th, 2024, and you're watching The Code Report.
Before we get into it, I need to address something very serious.
There have been some allegations coming out against me, and what I can tell you is that
these disgusting allegations are 100% false.
I've seen allegations in the comments that I've been using an AI voice in my videos.
I ask all of you to wait and hear the truth before you label or condemn me.
Sometimes my voice sounds weird because I record in the morning, and then later in the
afternoon when my testosterone is lower, my voice gets a bit higher.
But everything I actually recorded in my video is my real voice, and I made a video about
how I do that on my personal channel that you can check out here.
But the allegations are reasonable because I do have access to a high quality AI voice.
But the reason I don't use it to 10X my content is because it still has that uncanny valley
vibe to it, and you can tell it's just ever so slightly off.
Okay back to human mode.
Since the AI hysteria started a year ago, Anthropic and its Claude model have been like the
third wheel to GPT-4 and Gemini.
It's impressive to the tech community, but no one in the mainstream cares.
But yesterday it finally got its big moment with the release of Claude 3, which itself
comes in three sizes: Haiku, Sonnet, and Opus.
The big one is beating GPT-4 and Gemini Ultra on every major benchmark, but most notably,
it's way better on the HumanEval coding benchmark.
What's really crazy to me, though, is that the tiny model Haiku also outperforms all the other
big models when it comes to writing code.
It's extremely impressive for a small model.
What's also hella impressive is that it scores hella high on the HellaSwag benchmark,
which is used to measure common sense in everyday situations.
In comparison, Gemini is hella bad at that.
Now, Claude can also analyze images, but it failed to beat Gemini Ultra on the math benchmark,
which means Gemini is still the best option for cheating on your math homework.
Now, one benchmark they never put in these things, though, is the HellaWoke benchmark.
Unlike Gemini, it did write a poem about Donald Trump for me,
but then followed it up with two paragraphs about why this poem is wrong.
However, it did the same thing for an Obama poem, so it feels relatively balanced politically.
What it wouldn't do, though, is give me tips to overthrow the government,
teach me how to build a s***, or do s***, and even with something relatively benign,
like asking it to rephrase Apex Alpha Male, it refused and responded with a condescending
four-paragraph explanation about how that terminology can be hurtful to other males on the dominance hierarchy.
But Gemini and GPT-4 had no problem with that. It's surprising to say, but GPT-4 is actually
the most based large model out there. But for me, the most important test is whether or not
it can write code. I tried a bunch of different examples, but one thing that really impressed me
is that it wrote nearly perfect code for an obscure Svelte library that I wrote.
No other LLM has ever done that for me in a single shot.