Lookahead and lookbehind

Sometimes we need to match a pattern only if it is followed by another pattern. For instance, we’d like to get the price from a string like "1 turkey costs 30€".

We need a number (let’s say a price has no decimal point) followed by the € sign.

That’s what lookahead is for.

Lookahead

The syntax is x(?=y). It means "match x only if it is followed by y".

The euro sign is often written after the amount, so the regexp will be \d+(?=€):

let str = "1 turkey costs 30€";

alert( str.match(/\d+(?=€)/) ); // 30 (correctly skipped the sole number 1)
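One way to see that the lookahead only checks for € without including it in the result is to compare it with a plain pattern that contains the sign. The snippet below is a minimal sketch; the variable names are illustrative and it uses console.log instead of alert so it also runs outside the browser.

```javascript
const priceText = "1 turkey costs 30€";

// Plain pattern: the euro sign is part of the match.
const withSign = priceText.match(/\d+€/);

// Lookahead: the euro sign is only checked, not included in the match.
const withLookahead = priceText.match(/\d+(?=€)/);

console.log(withSign[0]);      // 30€
console.log(withLookahead[0]); // 30

// Because € is not part of the match, a replacement leaves it in place.
const replaced = priceText.replace(/\d+(?=€)/, "25");
console.log(replaced); // 1 turkey costs 25€
```

The replacement case is where this usually matters: only the digits are swapped, and the currency sign stays untouched.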

If we want the quantity instead, we can apply a negative lookahead.

The syntax is x(?!y). It means "match x only if it is not followed by y".

let str = "2 turkeys cost 60€";

alert( str.match(/\d+(?!€)/) ); // 2 (correctly skipped the price)
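The example above returns 2 because match without the g flag stops at the first result. If we search for all matches, the same pattern also finds part of the price, since \d+ can backtrack to a shorter number. The sketch below shows the effect and one possible fix, assuming we also want to forbid a following digit; the variable names are illustrative.

```javascript
const order = "2 turkeys cost 60€";

// Naive negative lookahead with the g flag: \d+ backtracks from "60" to "6",
// and "6" is followed by "0", not "€", so "6" is reported as a match.
const naive = order.match(/\d+(?!€)/g);

// Also forbid a following digit, so a number cannot be cut short.
const strict = order.match(/\d+(?![\d€])/g);

console.log(naive);  // [ '2', '6' ]
console.log(strict); // [ '2' ]
```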

Lookbehind

Lookbehind lets us match a pattern only if something specific comes before it.

The syntax is:

  • Positive lookbehind: (?<=y)x, matches x, but only if it follows after y.
  • Negative lookbehind: (?<!y)x, matches x, but only if there’s no y before.

For example, let’s change the price to US dollars. The dollar sign is usually before the number, so to look for $30 we’ll use (?<=\$)\d+:

let str = "1 turkey costs $30";

alert( str.match(/(?<=\$)\d+/) ); // 30 (correctly skipped the sole number 1)

And for the quantity let’s use a negative lookbehind (?<!\$)\d+:

let str = "2 turkeys cost $60";

alert( str.match(/(?<!\$)\d+/) ); // 2 (correctly skipped the price)
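As with negative lookahead, the negative lookbehind gives 2 here only because we take the first match. With the g flag, the last digit of the price passes the check, because it is preceded by a digit rather than by $. Here is a minimal sketch of the problem and one possible fix, assuming an engine that supports lookbehind; the variable names are illustrative.

```javascript
const bill = "2 turkeys cost $60";

// Naive negative lookbehind: "0" is preceded by "6", not "$",
// so it is reported as a separate match.
const naiveQty = bill.match(/(?<!\$)\d+/g);

// Require that the number is preceded by neither "$" nor another digit.
const strictQty = bill.match(/(?<![\d$])\d+/g);

console.log(naiveQty);  // [ '2', '0' ]
console.log(strictQty); // [ '2' ]
```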

Capture groups

Generally, what’s inside the lookaround (a common name for both lookahead and lookbehind) parentheses does not become a part of the match.

But if we want to capture part of it, that’s doable: we just need to wrap that part in additional parentheses.

For instance, here the currency (€|kr) is captured, along with the amount:

let str = "1 turkey costs 30€";
let reg = /\d+(?=(€|kr))/;

alert( str.match(reg) ); // 30, €

And here’s the same for lookbehind:

let str = "1 turkey costs $30";
let reg = /(?<=(\$|£))\d+/;

alert( str.match(reg) ); // 30, $

Please note that for lookbehind the order stays the same, even though the lookbehind parentheses come before the main pattern.

The result array always starts with the full match, followed by the capture groups, numbered left to right by their opening parenthesis. So the match for \d+ goes in the result first, and then the group (\$|£).
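To see how the result is laid out, we can read the items of the match array one by one. This is a small sketch with illustrative variable names, using console.log so it also runs outside the browser.

```javascript
const text = "1 turkey costs $30";
const result = text.match(/(?<=(\$|£))\d+/);

console.log(result[0]);    // 30 (the full match, without the currency sign)
console.log(result[1]);    // $  (the group captured inside the lookbehind)
console.log(result.index); // 16 (the match starts right after the "$")
```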

Summary

Lookahead and lookbehind (commonly referred to as “lookaround”) are useful when we’d like to match something depending on the context before or after it, without including that context in the match.

Sometimes we can do the same manually, that is: match everything and filter by context in a loop. Remember, str.matchAll and reg.exec return matches with an .index property, so we know exactly where in the text each match is. But generally regular expressions can do it better.
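The manual approach can be sketched as follows: find every number, then use .index to look at the character right after it. The sample string and variable names are assumptions made for illustration.

```javascript
const sentence = "2 turkeys cost 60€, 3 geese cost 45€";

// Manual way: match every number, then keep only those followed by "€".
const prices = [];
for (const m of sentence.matchAll(/\d+/g)) {
  const nextChar = sentence[m.index + m[0].length];
  if (nextChar === "€") prices.push(m[0]);
}

// The same filter expressed with a lookahead.
const viaLookahead = sentence.match(/\d+(?=€)/g);

console.log(prices);       // [ '60', '45' ]
console.log(viaLookahead); // [ '60', '45' ]
```

Both give the same result here, but the lookahead version keeps the context check inside the pattern itself.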

Lookaround types:

Pattern | Type | Matches
x(?=y) | Positive lookahead | x if followed by y
x(?!y) | Negative lookahead | x if not followed by y
(?<=y)x | Positive lookbehind | x if after y
(?<!y)x | Negative lookbehind | x if not after y

Lookahead can also be used to disable backtracking. The next chapter explains why that may be needed.
