Ending the Spring of AI with a bang

A Google memo leaked, and if you've been reading A Grain of Paprika, its findings won't surprise you.

May 05, 2023

This spring was wild, and I tried to cover the milestones that made LLMs all the rage again. It was a bit too much at times, but I felt that it was worth it. Things have quieted down a bit since.

Recap:

Feb 24: The End of White Collar midwits (very important)
Mar 15: "We can run powerful cognitive pipelines on cheap hardware" (most important)
Apr 2: Remember when ChatGPT sucked at math? (two weeks ago) (very important)
Apr 5: Singularity: we have all the loops (conclusion)
Apr 10: The Mars bar and the A-10 gun projectile, revisited
Apr 30: The Color of Self

As it turns out, drowning you in enthusiastic AI pasta was worth it. A Google memo leaked yesterday, and it echoes my March 15 post from a Big Tech point of view:

The text below is a very recent leaked document, which was shared by an anonymous individual on a public Discord server who has granted permission for its republication. It originates from a researcher within Google. We have verified its authenticity.
— Semianalysis Substack
Google "We Have No Moat, And Neither Does OpenAI"
The uncomfortable truth is, we aren’t positioned to win this arms race and neither is OpenAI. While we’ve been squabbling, a third faction has been quietly eating our lunch.
I’m talking, of course, about open source. Plainly put, they are lapping us. Things we consider “major open problems” are solved and in people’s hands today.
[…]
While our models still hold a slight edge in terms of quality, the gap is closing astonishingly quickly. Open-source models are faster, more customizable, more private, and pound-for-pound more capable. They are doing things with $100 and 13B params that we struggle with at $10M and 540B. And they are doing so in weeks, not months.
[…]
Giant models are slowing us down. In the long run, the best models are the ones which can be iterated upon quickly. We should make small variants more than an afterthought, now that we know what is possible in the <20B parameter regime.
[…]
At the beginning of March the open source community got their hands on their first really capable foundation model, as Meta’s LLaMA was leaked to the public.

And so on and so on. The whole memo tracks the March 15 post (check out the Timeline section at the end!), including examples of the same breakthroughs by hobbyists and academic research teams that were covered here. It also mentions the techniques that I touched on in my follow-up posts up until April 5.

Most importantly, they have solved the scaling problem to the extent that anyone can tinker. Many of the new ideas are from ordinary people. The barrier to entry for training and experimentation has dropped from the total output of a major research organization to one person, an evening, and a beefy laptop.

Directly Competing With Open Source Is a Losing Proposition
This recent progress has direct, immediate implications for our business strategy. Who would pay for a Google product with usage restrictions if there is a free, high quality alternative without them?
And we should not expect to be able to catch up. The modern internet runs on open source for a reason. Open source has some significant advantages that we cannot replicate.

There’s an upside for Big Tech though, as always, with open source:

Fortunately, these high quality datasets are open source, so they are free to use.
[…]
We need them more than they need us

Individuals are not constrained by licenses to the same degree as corporations

So many great quotes. Just read the memo. It’s paywalled, but the whole text is public.

In the end, OpenAI doesn’t matter. They are making the same mistakes we are in their posture relative to open source, and their ability to maintain an edge is necessarily in question. Open source alternatives can and will eventually eclipse them unless they change their stance. In this respect, at least, we can make the first move.

That’s it for AI for a while, I promise. I feel vindicated and complete.

If you agree that this series aged well, please share this post, and if there’s anything else, leave a comment.

One more thing

Help keep open source AI development competitive by contributing to Open Assistant! Human feedback is expensive, which makes your high-quality human help very valuable.

Yes, it’s open source, so Big Tech can also grab it, but that was always the tradeoff. Open source is the best we can hope for, and for the internet, tech-wise, it worked out well. Who wants Google or OpenAI to become the “AI Microsoft” of the 90s? I don’t. Neither do you, trust me.

What’s next?

Tons of drafts to choose from. Not AI, as much as possible.

A Grain of Paprika
