Regex

Choice information

A regex will match its pattern anywhere in a line unless it is pinned to a location. Any meta-character to be used literally must be escaped by prepending a \. When a pattern matches too much, it generally helps to find what excludes the bad case.
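A minimal sketch of escaping, assuming Python's re module (other engines behave similarly here but differ in details). The sample line and variable names are made up for illustration.

```python
import re

line = "price: 3.50 (was 3x50)"  # made-up sample text

# Unescaped "." is a meta-character and matches any character, so "3x50" matches too.
loose = re.findall(r"3.50", line)

# Escaped "\." matches only a literal dot.
strict = re.findall(r"3\.50", line)

# re.escape produces the escaped form of arbitrary literal text.
escaped = re.escape("3.50")

# Unpinned, "3" is found mid-line; pinned with ^, it must be at the start.
unpinned = re.search(r"3", line) is not None
pinned = re.search(r"^3", line) is not None

print(loose)             # ['3.50', '3x50']
print(strict)            # ['3.50']
print(unpinned, pinned)  # True False
```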

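Groups can be nested. A group will only capture what is enclosed in its (). Groups can typically be referenced by number (like \1 or \2), counted by the position of the opening parenthesis. Numbering starts at \1; group 0 is the whole matched string, though how it is written varies by engine (\0, $0, $& or \g<0>).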
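A small sketch of nested groups and backreferences, assuming Python's re module. Note that Python exposes the whole match as group(0) and, in replacement strings, as \g<0> rather than \0. The sample strings are made up.

```python
import re

# Groups are numbered by the position of their opening parenthesis.
m = re.search(r"((\d{4})-(\d{2}))-(\d{2})", "released 2024-03-15")

whole = m.group(0)       # the entire match
year_month = m.group(1)  # outer group
year = m.group(2)        # first nested group
day = m.group(4)

# \1 inside a pattern matches the same text group 1 captured.
doubled = re.search(r"\b(\w+) \1\b", "this is is a typo").group(1)

# \1 and \2 inside a replacement string insert the captured text.
swapped = re.sub(r"(\w+)@(\w+)", r"\2@\1", "user@host")

print(whole, year_month, year, day)  # 2024-03-15 2024-03 2024 15
print(doubled)                       # is
print(swapped)                       # host@user
```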

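| has the lowest precedence of any operator: it ORs everything to its left against everything to its right, out to the edge of the enclosing group or of the whole pattern. Whitespace does not limit its scope; wrap the alternatives in () to do that. All set matchers (like \d) can be inverted through capitalization (\D, \W, \S).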
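To make the scope of | concrete, a short sketch assuming Python's re module. The test words are arbitrary.

```python
import re

# Ungrouped: "|" splits the whole pattern into "^cat" OR "dog$".
ungrouped = re.compile(r"^cat|dog$")

# Grouped: start of string, then "cat" or "dog", then end of string.
grouped = re.compile(r"^(cat|dog)$")

ungrouped_hit = ungrouped.search("catalog") is not None  # "^cat" alone is enough
grouped_hit = grouped.search("catalog") is not None      # the whole string must be cat or dog
grouped_dog = grouped.search("dog") is not None

# \D is the inverse of \d, so this strips everything that is not a digit.
digits_only = re.sub(r"\D", "", "tel: 555-0199")

print(ungrouped_hit, grouped_hit, grouped_dog)  # True False True
print(digits_only)                              # 5550199
```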

Flags can be combined like (?im).

Sets

  • \d will match any digit [0-9].
  • \w will match any word character [A-Za-z0-9_] (alphanumerics plus underscore).
  • \s will match any whitespace.
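The three sets above, sketched with Python's re module. One assumption to be aware of: on Python str patterns these classes are Unicode-aware by default, so they match more than the ASCII ranges listed unless re.ASCII is set. The sample text is ASCII only.

```python
import re

text = "id_42 costs 7 USD\n"  # made-up sample text

digits = re.findall(r"\d", text)   # individual digits
words = re.findall(r"\w+", text)   # runs of word characters (underscore included)
spaces = re.findall(r"\s", text)   # spaces and the trailing newline

print(digits)  # ['4', '2', '7']
print(words)   # ['id_42', 'costs', '7', 'USD']
print(spaces)  # [' ', ' ', ' ', '\n']
```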

Globbing

  • [] encapsulating a collection of characters matches any one of them (an OR over single characters): [abc].
  • - can be used inside [] to indicate a sequential range of values to be represented: [a-z].
  • {} can be appended to a character, set, or group to indicate the number of instances to match: {4}.
  • , can be used in some engines inside {} to designate a range: {1,5}.
  • () will capture the matched characters in a group for later use.
  • ?: inside () will result in a non-capturing group: (?:.*).
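A quick sketch of character sets, counts, and capturing versus non-capturing groups, assuming Python's re module. The inputs are arbitrary.

```python
import re

# [] with ranges plus an exact count: exactly four lowercase hex digits.
is_hex4 = re.fullmatch(r"[0-9a-f]{4}", "beef") is not None

# {1,5} allows one to five repetitions, so six letters do not fit.
fits_range = re.fullmatch(r"[a-z]{1,5}", "abcdef") is not None

# A capturing group stores its text (the last repetition when repeated).
capturing = re.search(r"(ab)+(\d)", "ababab7").groups()

# (?:...) groups for repetition without storing anything.
non_capturing = re.search(r"(?:ab)+(\d)", "ababab7").groups()

print(is_hex4, fits_range)  # True False
print(capturing)            # ('ab', '7')
print(non_capturing)        # ('7',)
```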

Pinning

  • ^ as the first character IN [] is a meta-character for NOT: [^a-z].
  • ^ is a meta-character OUTSIDE of [] for start of line.
  • $ is a meta-character for end of line.
  • \b is a zero-width boundary between a word character (\w) and a non-word character (\W), or the start or end of the string.
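The pinning characters above, sketched with Python's re module. Without the multiline flag, Python treats ^ and $ as start and end of the whole string. The sample strings are made up.

```python
import re

# ^ inside [] negates the set: everything that is not a vowel or a space.
consonants = re.findall(r"[^aeiou ]", "a big cat")

# ^ outside [] pins to the start; $ pins to the end.
at_start = re.search(r"^cat", "cat nap") is not None
not_at_start = re.search(r"^cat", "a cat") is not None
at_end = re.search(r"cat$", "a cat") is not None

# \b keeps "cat" from matching inside "concat" or "cats".
sample = "cat concat cats cat."
unbounded = re.findall(r"cat", sample)
whole_words = re.findall(r"\bcat\b", sample)

print(consonants)                      # ['b', 'g', 'c', 't']
print(at_start, not_at_start, at_end)  # True False True
print(len(unbounded), whole_words)     # 4 ['cat', 'cat']
```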

Matching

  • . is a meta-character for any single character except a newline (including \n with (?s)).
  • * is a meta-character for 0 or more repetitions of the preceding token.
  • + is a meta-character for 1 or more repetitions of the preceding token.
  • ? after QUANTIFIER (*, +, ?, {n,m}) is a meta-character for laziness (match as few characters as possible).
  • ? after REGULAR CHARACTER/TOKEN is a meta-character for optionality.
  • | indicates an OR between the whole sequences on either side of it, bounded by the enclosing group.
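The two meanings of ? are easy to mix up, so here is a sketch of greedy versus lazy matching and of optionality, assuming Python's re module. The sample strings are made up.

```python
import re

html = "<b>bold</b> and <i>italic</i>"

# Greedy: .+ grabs as much as it can, from the first "<" to the last ">".
greedy = re.findall(r"<.+>", html)

# Lazy: .+? stops at the first ">" that lets the match succeed.
lazy = re.findall(r"<.+?>", html)

# ? after a regular character makes that character optional.
optional = re.findall(r"colou?r", "color colour colouur")

# * accepts zero repetitions; + needs at least one.
star_ok = re.fullmatch(r"ab*", "a") is not None
plus_ok = re.fullmatch(r"ab+", "a") is not None

print(greedy)            # ['<b>bold</b> and <i>italic</i>']
print(lazy)              # ['<b>', '</b>', '<i>', '</i>']
print(optional)          # ['color', 'colour']
print(star_ok, plus_ok)  # True False
```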

Flags

  • (?i) flag for case-insensitive.
  • (?m) flag for multiline (^ and $ match line boundaries).
  • (?s) flag for dotall, making . match newlines.
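A sketch of the inline flags, alone and combined, assuming Python's re module (which requires these global flags at the very start of the pattern). The sample text is made up.

```python
import re

text = "First line\nsecond LINE\nthird"

# Without (?m), ^ only matches at the start of the whole string.
first_only = re.findall(r"^\w+", text)

# With (?m), ^ matches at the start of every line.
every_line = re.findall(r"(?m)^\w+", text)

# (?i) ignores case.
any_case = re.findall(r"(?i)line", text)

# Without (?s), . stops at newlines; with it, . crosses them.
dot_default = re.search(r"First.*third", text) is not None
dot_all = re.search(r"(?s)First.*third", text) is not None

# Combined: case-insensitive "line" at the end of any line.
combined = re.findall(r"(?im)line$", text)

print(first_only)            # ['First']
print(every_line)            # ['First', 'second', 'third']
print(any_case)              # ['line', 'LINE']
print(dot_default, dot_all)  # False True
print(combined)              # ['line', 'LINE']
```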