Unicode at gigabytes per second
We often represent text using Unicode formats (UTF-8 and UTF-16). UTF-8 is increasingly popular (XML, HTML, JSON, Rust, Go, Swift, Ruby). UTF-16 is most common in Java, .NET, and inside operating systems such as Windows. Software systems frequently have to validate text or convert text from one encoding to the other. While recent disks have bandwidths of 5 GB/s or more, conventional approaches transcode non-ASCII text at a fraction of a gigabyte per second. We show that we can transcode between UTF-8 and UTF-16 at gigabytes per second on current systems (x64 and ARM) without sacrificing safety. Our open-source library can be ten times faster than the popular ICU library on non-ASCII strings and even faster on ASCII strings.
Invited talk at SPIRE 2021, 28th International Symposium on String Processing and Information Retrieval (October 4-6th, 2021 - Lille, France)
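To make "validate" and "transcode" concrete, here is a minimal sketch using only the Python standard library. It is not the talk's library or its API, and it makes no attempt at the speeds described above; the function names `is_valid_utf8` and `transcode_utf8_to_utf16le` are hypothetical and chosen for illustration. It only shows what a safe conversion has to do: reject malformed input rather than pass it through.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if `data` is well-formed UTF-8 (hypothetical helper)."""
    try:
        # Strict decoding rejects overlong forms, stray continuation
        # bytes, and encoded surrogate code points.
        data.decode("utf-8", errors="strict")
        return True
    except UnicodeDecodeError:
        return False


def transcode_utf8_to_utf16le(data: bytes) -> bytes:
    """Validate UTF-8 input and return it as UTF-16LE bytes (hypothetical helper).

    Raises UnicodeDecodeError on malformed input instead of emitting
    garbage, which is the "safety" part of a safe transcoder.
    """
    text = data.decode("utf-8", errors="strict")
    # "utf-16-le" writes little-endian code units without a byte-order mark.
    return text.encode("utf-16-le")


if __name__ == "__main__":
    sample = "café 😀".encode("utf-8")
    print(is_valid_utf8(sample))
    print(transcode_utf8_to_utf16le(sample).hex(" "))
    print(is_valid_utf8(b"\xc0\xaf"))  # overlong encoding of "/"
```

ASCII characters take one byte in UTF-8 and two in UTF-16, while a character outside the Basic Multilingual Plane such as 😀 takes four bytes in both (a surrogate pair in UTF-16). That mix of input lengths is part of what makes non-ASCII text slower to convert with conventional byte-at-a-time approaches.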