Sparse Expert Models (Switch Transformers, GLAM, and more... w/ the Authors)
#nlp #sparsity #transformers
This video is an interview with Barret Zoph and William Fedus of Google Brain about Sparse Expert Models.
Sparse expert models have been hugely successful at distributing parts of models, mostly Transformers, across large arrays of machines, using a routing function to send each signal to the right part. This means that even though these models have a huge number of parameters, the computational load for a given signal does not increase, because the model is only sparsely activated. Sparse expert models such as Switch Transformers and GLaM can scale up to trillions of parameters and bring a number of desirable properties. We discuss everything from the fundamentals, history, strengths, and weaknesses up to the current state of the art of these models.
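To make the routing idea concrete, here is a minimal sketch of top-1 routing in plain Python. It is an illustration under stated assumptions, not the implementation from any of the papers: the names route_top1, sparse_layer and make_expert are hypothetical, the router weights are random, and the "experts" are toy scaling functions standing in for feed-forward blocks. It only shows that each token triggers exactly one expert call, no matter how many experts exist.

```python
import math
import random

def softmax(logits):
    # Numerically stable softmax over a list of floats.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def route_top1(token, router_weights):
    """Return (expert_index, gate_probability) for a single token."""
    # One router logit per expert: dot product of the token with that expert's row.
    logits = [sum(w * x for w, x in zip(row, token)) for row in router_weights]
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return best, probs[best]

def sparse_layer(tokens, router_weights, experts):
    """Send each token to exactly one expert and scale the output by its gate value."""
    outputs, calls = [], [0] * len(experts)
    for token in tokens:
        idx, gate = route_top1(token, router_weights)
        calls[idx] += 1  # only this one expert does any work for this token
        outputs.append([gate * y for y in experts[idx](token)])
    return outputs, calls

def make_expert(scale):
    # Hypothetical toy expert: a real one would be a feed-forward network.
    return lambda token: [scale * x for x in token]

random.seed(0)
dim, num_experts = 4, 8
router_weights = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(num_experts)]
experts = [make_expert(s + 1) for s in range(num_experts)]
tokens = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(16)]

outputs, calls = sparse_layer(tokens, router_weights, experts)
print("tokens handled per expert:", calls)
```

Raising num_experts adds parameters (more experts and more router rows), but the number of expert calls per token stays at one, which is the sense in which the model is sparsely activated.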
OUTLINE:
0:00 - Intro
0:30 - What are sparse expert models?
4:25 - Start of Interview
5:55 - What do you mean by sparse experts?
8:10 - How does routing work in these models?
12:10 - What is the history of sparse experts?
14:45 - What does an individual expert learn?
19:25 - When are these models appropriate?
22:30 - How comparable are sparse to dense models?
26:30 - How does the pathways system connect to this?
28:45 - What improvements did GLaM make?
31:30 - The "designing sparse experts" paper
37:45 - Can experts be frozen during training?
41:20 - Can the routing function be improved?
47:15 - Can experts be distributed beyond data centers?
50:20 - Are there sparse experts for other domains than NLP?
52:15 - Are sparse and dense models in competition?
53:35 - Where do we go from here?
56:30 - How can people get started with this?
Papers:
Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity (https://arxiv.org/abs/2101.03961)
GLaM: Efficient Scaling of Language Models with Mixture-of-Experts (https://arxiv.org/abs/2112.06905)
Designing Effective Sparse Expert Models (https://arxiv.org/abs/2202.08906)
Links:
Merch: store.ykilcher.com
TabNine Code Completion (Referral): http://bit.ly/tabnine-yannick
YouTube: https://www.youtube.com/c/yannickilcher
Twitter: https://twitter.com/ykilcher
Discord: https://discord.gg/4H8xxDF
BitChute: https://www.bitchute.com/channel/yann...
LinkedIn: https://www.linkedin.com/in/ykilcher
BiliBili: https://space.bilibili.com/2017636191
If you want to support me, the best thing to do is to share out the content :)
If you want to support me financially (completely optional and voluntary, but a lot of people have asked for this):
SubscribeStar: https://www.subscribestar.com/yannick...
Patreon: https://www.patreon.com/yannickilcher
Bitcoin (BTC): bc1q49lsw3q325tr58ygf8sudx2dqfguclvngvy2cq
Ethereum (ETH): 0x7ad3513E3B8f66799f507Aa7874b1B0eBC7F85e2
Litecoin (LTC): LQW2TRyKYetVC8WjFkhpPhtpbDM4Vw7r9m
Monero (XMR): 4ACL8AGrEo5hAir8A9CeVrW8pEauWvnp1WnSDZxW7tziCDLhZAGsgzhRQABDnFy8yuM9fWJDviJPHKRjV4FWt19CJZN9D4n