Unicode at gigabytes per second
We often represent text using Unicode formats (UTF-8 and UTF-16). UTF-8 is increasingly popular (XML, HTML, JSON, Rust, Go, Swift, Ruby). UTF-16 is most common in Java, .NET, and inside operating systems such as Windows. Software systems frequently have to validate text or convert text from one encoding to the other. While recent disks have bandwidths of 5 GB/s or more, conventional approaches transcode non-ASCII text at a fraction of a gigabyte per second. We show that we can transcode between UTF-8 and UTF-16 at gigabytes per second on current systems (x64 and ARM) without sacrificing safety. Our open-source library can be ten times faster than the popular ICU library on non-ASCII strings and even faster on ASCII strings.
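To make the two operations mentioned above concrete (validation and transcoding), here is a minimal sketch using only Python's built-in codecs. It is not the talk's library and makes no attempt at speed; the function names `is_valid_utf8` and `utf8_to_utf16le` are hypothetical, chosen for illustration. It only shows what "safe" transcoding means: malformed input is rejected rather than silently converted.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if `data` is well-formed UTF-8 (hypothetical helper)."""
    try:
        # Strict decoding rejects overlong forms, surrogates, and truncation.
        data.decode("utf-8", errors="strict")
        return True
    except UnicodeDecodeError:
        return False


def utf8_to_utf16le(data: bytes) -> bytes:
    """Validate UTF-8 input and transcode it to UTF-16 little-endian.

    Raises UnicodeDecodeError if the input is not valid UTF-8.
    """
    text = data.decode("utf-8", errors="strict")  # validation happens here
    return text.encode("utf-16-le")               # no byte-order mark emitted


if __name__ == "__main__":
    sample = "é😀".encode("utf-8")  # 2-byte and 4-byte UTF-8 sequences
    print(utf8_to_utf16le(sample).hex())
    print(is_valid_utf8(b"\xc0\x80"))  # overlong encoding of U+0000
```

Note that a character outside the Basic Multilingual Plane, such as the emoji above, occupies four bytes in UTF-8 and becomes a surrogate pair (two 16-bit units) in UTF-16, which is one reason transcoding is more involved than a byte-for-byte copy.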
Invited talk at SPIRE 2021, 28th International Symposium on String Processing and Information Retrieval (October 4-6th, 2021 - Lille, France)