What's new in ECMAScript 2024
The final version of the ECMAScript 2024 Language Specification was approved on the 26th of June. The list of new JavaScript features is now confirmed, and to keep up my annual tradition, I am publishing this yearly recap for you and my future self. For the curious, here are the posts from previous years: 2023, 2022, 2021, 2020, 2019, 2018, 2017 and 2016.
A few handy features became part of the specification, but some are more nuanced, lower-level and outside the set of tools a regular app maker (like me) commonly uses. I did my homework, and in this article, I will explain them to people who rarely delve into the territory of complicated regexes, Unicode character encoding and buffer manipulation.
- Well-Formed Unicode Strings by Guy Bedford, Bradley Farias, Michael Ficarra
- Asynchronous atomic wait for ECMAScript by Shu-yu Guo and Lars T Hansen
- RegExp v flag with set notation + properties of strings by Markus Scherer and Mathias Bynens
- In-Place Resizable and Growable ArrayBuffers by Shu-yu Guo
- ArrayBuffer transfer by Shu-yu Guo, Jordan Harband and Yagiz Nizipli
- Array grouping by Justin Ridgewell and Jordan Harband
- Promise.withResolvers by Peter Klecha
Well-Formed Unicode Strings by Guy Bedford, Bradley Farias, Michael Ficarra
Strings in JavaScript are represented as sequences of UTF-16 code units. The 16 in the name is the number of bits in each code unit, which offers 65,536 possible values (2^16). That is enough for the characters of the Basic Multilingual Plane, which covers the Latin, Greek and Cyrillic alphabets and the most common Chinese, Japanese and Korean ideographs, but not for characters outside it, such as emojis and rarer CJK ideographs. These additional characters are stored as pairs of 16-bit code units, known as surrogate pairs.
'a'.length
// 1
'a'.split('')
// [ 'a' ]
'🥑'.length
// 2
'🥑'.split('')
//[ '\ud83e', '\udd51' ] 👈 surrogate pair
To avoid ambiguity, surrogates occupy ranges of code units that are never used to encode a character on their own: leading (high) surrogates use U+D800–U+DBFF and trailing (low) surrogates use U+DC00–U+DFFF. If a pair is missing its leading or trailing code unit, or the two are in the wrong order, we are dealing with a “lone surrogate”, and the whole string is “ill-formed”. For a string to be “well-formed”, it must not contain lone surrogates.
The Well-Formed Unicode Strings proposal introduces a String.prototype.isWellFormed() method to verify whether a string is well-formed or not. In addition, it comes with a String.prototype.toWellFormed() helper method that replaces all lone surrogates with replacement characters (U+FFFD, �).
'\ud83e\udd51'
// 🥑
'\ud83e\udd51'.isWellFormed()
// true
'\ud83e'.isWellFormed() // without trailing surrogate
// false
'\ud83e'.toWellFormed()
// �
Asynchronous atomic wait for ECMAScript by Shu-yu Guo and Lars T Hansen
Workers enable multi-threading in JavaScript. SharedArrayBuffer is a low-level API that lets agents (the main thread and workers) operate on shared memory. A set of static methods on the Atomics object helps us avoid conflicts between reads and writes.
A common pattern is to put a worker to sleep and wake it up when needed by combining the Atomics.wait() and Atomics.notify() methods. However, this can be limiting because Atomics.wait() is a synchronous, blocking API, and browsers do not allow it on the main thread.
The Asynchronous atomic wait proposal adds Atomics.waitAsync(), which waits without blocking and reports the outcome through a promise (or synchronously, when the outcome is already known). Most importantly, it can be used on the main thread.
// main thread
const w = new Worker("worker.js");
w.onmessage = (event) => {
  // Int32Array view over the worker's SharedArrayBuffer
  const i32a = event.data;
  setTimeout(() => {
    Atomics.store(i32a, 0, 1);
    Atomics.notify(i32a, 0); // wake the waiting worker
  }, 1000);
};
// worker thread (worker.js)
const sab = new SharedArrayBuffer(Int32Array.BYTES_PER_ELEMENT);
const i32a = new Int32Array(sab);
postMessage(i32a);
// wait while i32a[0] is still 0, without blocking the worker
const wait = Atomics.waitAsync(i32a, 0, 0);
// { async: false; value: "not-equal" | "timed-out"; }
// or
// { async: true; value: Promise<"ok" | "timed-out">; }
if (wait.async) {
  wait.value.then((value) => console.log(value));
} else {
  console.log(wait.value);
}
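Because Atomics.waitAsync() does not block, a single thread can even wait on a location and notify it later. The following sketch assumes an environment where SharedArrayBuffer is available (a recent Node.js, or a cross-origin isolated page in browsers) and shows both possible return shapes; the 1000 ms timeout passed as the fourth argument is an arbitrary choice.

```javascript
const i32a = new Int32Array(new SharedArrayBuffer(Int32Array.BYTES_PER_ELEMENT));

// i32a[0] is 0, so this call waits (asynchronously) for a notify or the timeout.
const pending = Atomics.waitAsync(i32a, 0, 0, 1000);
console.log(pending.async); // true
pending.value.then((outcome) => console.log(outcome)); // logs "ok" once notified

// The expected value (42) differs from i32a[0], so the result is synchronous.
const immediate = Atomics.waitAsync(i32a, 0, 42);
console.log(immediate.async, immediate.value); // false not-equal

setTimeout(() => {
  Atomics.store(i32a, 0, 1);
  Atomics.notify(i32a, 0); // wakes the pending waiter and resolves its promise
}, 100);
```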
RegExp v flag with set notation + properties of strings by Markus Scherer and Mathias Bynens
The new RegExp v flag is similar to the Unicode-aware mode (u flag) added in 2015, but it does much more. Because of their similarities and some incompatibilities, the two flags cannot be combined. The v mode enables three features: matching against Unicode properties of strings, set operations (subtraction, intersection and union) within character classes, and improved case-insensitive matching.
// `u` and `v` modes are similar, but they cannot be combined
const pattern = /./vu;
// SyntaxError: Invalid regular expression: invalid flags
Checks against a subset of Unicode string properties
The Unicode standard defines properties that regex patterns can reference with \p{…}. For example, /\p{Math}/u matches mathematical symbols, /\p{Dash}/u matches dash punctuation characters, and /\p{ASCII_Hex_Digit}/u matches the ASCII characters used to write hexadecimal numbers (0–9, A–F and a–f).
const patternMath = /\p{Math}/u;
const patternDash = /\p{Dash}/u;
const patternHex = /\p{ASCII_Hex_Digit}/u;
patternMath.test('+'); // true
patternMath.test('z'); // false
patternDash.test('-'); // true
patternDash.test('z'); // false
patternHex.test('f'); // true
patternHex.test('z'); // false
Most properties apply to individual code points, but a few (for now, mostly emoji-related) apply to strings, that is, sequences of multiple code points. Basic_Emoji, RGI_Emoji and RGI_Emoji_Flag_Sequence are some examples. The u mode doesn’t support these properties, although there have been some discussions about changing that. Luckily, one of the features of the v mode is the ability to match against these properties of strings.
const pattern = /\p{RGI_Emoji}/u
// SyntaxError: Invalid regular expression: /\p{RGI_Emoji}/u: Invalid property name
const pattern = /\p{RGI_Emoji}/v;
// single codepoint emoji
pattern.test('😀') // true
// multiple codepoints emoji
pattern.test('🫶🏾') // true
Subtraction/intersection/union matching
Another feature of the v mode is set operations within character classes: subtraction (--), intersection (&&) and union (operands written next to each other). The new \q{…} syntax for string literals (multi-character strings) within character classes is also worth noting.
// match all emojis except pile of poo
const pattern = /[\p{RGI_Emoji}--\q{💩}]/v;
pattern.test('😜') // true
pattern.test('💩') // false
// Only uppercase hex-digit characters
const pattern = /[\p{Uppercase}&&\p{ASCII_Hex_Digit}]/v;
pattern.test('f') // false
pattern.test('F') // true
// only melons and berries
const pattern = /^[\q{🍈|🍉|🍓|🫐}]$/v;
pattern.test('🥑') // false
pattern.test('🫐') // true
Improved case-insensitivity
How case-insensitive matching works in u mode is confusing. Negated patterns targeting specific case groups (such as Lowercase_Letter or Uppercase_Letter) with the ignore-case flag (i) enabled do not produce intuitive results. The new v flag makes the results much more predictable, which is one reason these two flags cannot be combined.
In-Place Resizable and Growable ArrayBuffers by Shu-yu Guo
The ArrayBuffer object in JavaScript represents a buffer of raw binary data. Before ECMAScript 2024, resizing an ArrayBuffer was a tedious process of creating a new one and copying the data over. Thanks to the “In-Place Resizable and Growable ArrayBuffers” proposal, we can now create a resizable buffer by passing the maxByteLength option to the constructor and resize it in place with the resize() method. SharedArrayBuffer accepts the same option, but a shared buffer can only grow, using its grow() method.
const buffer = new ArrayBuffer(8, { maxByteLength: 16 });
buffer.resizable; // true
buffer.byteLength; // 8
buffer.maxByteLength; // 16
buffer.resize(16);
buffer.byteLength; // 16
buffer.maxByteLength; // 16
ArrayBuffer transfer by Shu-yu Guo, Jordan Harband and Yagiz Nizipli
Building on the new resizing capabilities, the ArrayBuffer.prototype.transfer and friends proposal adds the ability to transfer ownership of a buffer’s contents. The transfer() method moves the bytes into a new buffer that keeps the original’s resizability, while transferToFixedLength() always returns a fixed-length buffer; both detach the original. The new detached getter is a native way to check whether a buffer has been detached.
const buffer = new ArrayBuffer();
buffer.detached; // false
const newBuffer = buffer.transfer();
buffer.detached; // true
Array grouping by Justin Ridgewell and Jordan Harband
Thanks to the array grouping proposal, the groupBy method popularized by Lodash, Ramda and others is now part of ECMAScript. The initial idea was to implement it as Array.prototype.groupBy, which collided with the commonly used Sugar library. Instead, it is implemented as the static methods Object.groupBy and Map.groupBy.
const langs = [
  { name: "Rust", compiled: true, released: 2015 },
  { name: "Go", compiled: true, released: 2009 },
  { name: "JavaScript", compiled: false, released: 1995 },
  { name: "Python", compiled: false, released: 1991 },
];
const callback = ({ compiled }) => (compiled ? "compiled" : "interpreted");
const langsByType = Object.groupBy(langs, callback);
console.log(langsByType);
// {
//   compiled: [
//     { name: "Rust", compiled: true, released: 2015 },
//     { name: "Go", compiled: true, released: 2009 }
//   ],
//   interpreted: [
//     { name: "JavaScript", compiled: false, released: 1995 },
//     { name: "Python", compiled: false, released: 1991 }
//   ]
// }
Promise.withResolvers by Peter Klecha
The Promise.withResolvers proposal adds deferred promises to the language, a popular pattern previously implemented by jQuery, bluebird, p-defer and plenty of other libraries. It returns an object containing a promise together with its resolve and reject functions. You can use it to avoid nesting code in the promise executor, but it really shines when you need to pass resolve or reject to multiple callers. Working with streams or event-based systems is an excellent use case.
Look at this createEventsAggregator example, taken from “Deferred JavaScript promises using Promise.withResolvers”, which I published a few months ago. It returns an add method to push a new event and an abort method that cancels the aggregation. Most importantly, it returns an events promise that resolves when the eventsCount limit is reached or rejects when abort is called.
function createEventsAggregator(eventsCount) {
  const events = [];
  const { promise, resolve, reject } = Promise.withResolvers();
  return {
    add: (event) => {
      if (events.length < eventsCount) events.push(event);
      if (events.length === eventsCount) resolve(events);
    },
    abort: () => reject("Events aggregation aborted."),
    events: promise,
  };
}
const eventsAggregator = createEventsAggregator(3);
eventsAggregator.events
.then((events) => console.log("Resolved:", events))
.catch((reason) => console.error("Rejected:", reason));
eventsAggregator.add("event-one");
eventsAggregator.add("event-two");
eventsAggregator.add("event-three");
// Resolved: [ "event-one", "event-two", "event-three" ]
That's it for 2024. I will catch you next year 👋