|
class | AbstractLexer |
| The abstract lexer class template that is the abstract root class of all reflex-generated scanners.
|
|
class | AbstractMatcher |
| The abstract matcher base class template defines an interface for all pattern matcher engines.
|
|
class | Bits |
| RE/flex Bits class for dynamic bit vectors.
|
|
class | BoostMatcher |
| Boost matcher engine class implements reflex::PatternMatcher pattern matching interface with scan, find, split functors and iterators, using the Boost::regex library.
|
|
class | BoostPerlMatcher |
| Boost matcher engine class, extends reflex::BoostMatcher for Boost Perl regex matching.
|
|
class | BoostPosixMatcher |
| Boost matcher engine class, extends reflex::BoostMatcher for Boost POSIX regex matching.
|
|
class | BufferedInput |
| Buffered input.
|
|
class | FlexLexer |
| Flex-compatible FlexLexer abstract base class template derived from reflex::AbstractMatcher for the reflex-generated yyFlexLexer scanner class.
|
|
class | Input |
| Input character sequence class for unified access to sources of input text.
|
|
struct | lazy_intersection |
| Intersection of two ordered sets, with an iterator to get elements lazily.
|
|
struct | lazy_union |
| Union of two ordered sets, with an iterator to get elements lazily.
|
|
class | Matcher |
| RE/flex matcher engine class, implements reflex::PatternMatcher pattern matching interface with scan, find, split functors and iterators.
|
|
class | ORanges |
| RE/flex ORanges (open-ended, ordinal value range) template class.
|
|
class | Pattern |
| Pattern class holds a regex pattern and its compiled FSM opcode table or code for the reflex::Matcher engine.
|
|
class | PatternMatcher |
| The pattern matcher class template extends abstract matcher base class.
|
|
class | PatternMatcher< std::string > |
| A specialization of the pattern matcher class template for std::string, extends abstract matcher base class.
|
|
class | PCRE2Matcher |
| PCRE2 JIT-optimized matcher engine class implements reflex::PatternMatcher pattern matching interface with scan, find, split functors and iterators, using the PCRE2 library.
|
|
class | PCRE2UTFMatcher |
| PCRE2 JIT-optimized native PCRE2_UTF+PCRE2_UCP matcher engine class, extends PCRE2Matcher.
|
|
struct | range_compare |
| Functor to define a total order on ranges (intervals) represented by pairs.
|
|
class | Ranges |
| RE/flex Ranges template class.
|
|
class | regex_error |
| Regex syntax error exceptions.
|
|
class | StdEcmaMatcher |
| std matcher engine class, extends reflex::StdMatcher for ECMA std::regex::ECMAScript syntax and regex matching.
|
|
class | StdMatcher |
| std matcher engine class implements reflex::PatternMatcher pattern matching interface with scan, find, split functors and iterators, using the C++11 std::regex library.
|
|
class | StdPosixMatcher |
| std matcher engine class, extends reflex::StdMatcher for POSIX ERE std::regex::awk syntax and regex matching.
|
|
struct | TypeOp |
| TypeOp<T>::Type = T, TypeOp<T>::ConstType = const T, TypeOp<T>::NonConstType = non-const T.
|
|
struct | TypeOp< const T > |
| Template specialization of reflex::TypeOp.
|
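A functor that orders ranges stored as pairs lets a plain std::set act as a set of intervals, as range_compare is described to do. This is a sketch under an assumption not taken from the RE/flex source: a closed range [lo, hi] sorts before another only when it ends before the other begins, so overlapping ranges compare as equivalent. RangeLess, RangeSet, and range_set_contains are hypothetical names.

```cpp
#include <set>
#include <utility>

// Hypothetical range ordering in the spirit of reflex::range_compare.
// A closed range [lo, hi] is stored as std::pair(lo, hi); lhs orders before
// rhs only if it ends strictly before rhs starts.
template <typename T>
struct RangeLess {
  bool operator()(const std::pair<T, T>& lhs, const std::pair<T, T>& rhs) const {
    return lhs.second < rhs.first;
  }
};

typedef std::set<std::pair<int, int>, RangeLess<int> > RangeSet;

// Overlapping ranges compare as equivalent, so looking up the single-point
// range [x, x] finds the stored range that contains x, if there is one.
bool range_set_contains(const RangeSet& ranges, int x) {
  return ranges.find(std::make_pair(x, x)) != ranges.end();
}
```

The ordering is only a valid strict weak ordering while the stored ranges are pairwise disjoint, which is why inserting a range that overlaps an existing one is refused rather than merged.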
|
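The TypeOp description states its contract directly: Type is T, ConstType adds const, and NonConstType removes it. A minimal sketch of how a primary template plus a TypeOp<const T> specialization can satisfy that contract, placed in a hypothetical sketch namespace so it is not mistaken for the reflex definition:

```cpp
#include <type_traits>

namespace sketch {  // hypothetical namespace, not reflex

// Primary template: used when T is not const-qualified.
template <typename T>
struct TypeOp {
  typedef T Type;
  typedef const T ConstType;
  typedef T NonConstType;
};

// Partial specialization for const T: NonConstType strips the const qualifier.
template <typename T>
struct TypeOp<const T> {
  typedef const T Type;
  typedef const T ConstType;
  typedef T NonConstType;
};

}  // namespace sketch

// Compile-time checks of the contract for both cases.
static_assert(std::is_same<sketch::TypeOp<int>::ConstType, const int>::value, "ConstType adds const");
static_assert(std::is_same<sketch::TypeOp<const int>::NonConstType, int>::value, "NonConstType removes const");
```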
|
int | isword (int c) |
| Check ASCII word-like character [A-Za-z0-9_], permitting the character range 0..303 (0x12F) and EOF.
|
|
std::string | convert (const char *pattern, const char *signature, convert_flag_type flags=convert_flag::none, const std::map< std::string, std::string > *macros=NULL) |
| Returns the converted regex string given a regex library signature and conversion flags, throws regex_error.
|
|
std::string | convert (const std::string &pattern, const char *signature, convert_flag_type flags=convert_flag::none, const std::map< std::string, std::string > *macros=NULL) |
|
std::string | ztoa (size_t n) |
|
template<typename S1 , typename S2 > |
bool | is_disjoint (const S1 &s1, const S2 &s2) |
| Check if sets s1 and s2 are disjoint.
|
|
template<typename T , typename S > |
bool | is_in_set (const T &x, const S &s) |
| Check if value x is in set s.
|
|
template<typename S1 , typename S2 > |
bool | is_subset (const S1 &s1, const S2 &s2) |
| Check if set s1 is a subset of set s2.
|
|
template<typename S1 , typename S2 > |
void | set_insert (S1 &s1, const S2 &s2) |
| Insert set s2 into set s1.
|
|
template<typename S1 , typename S2 > |
void | set_delete (S1 &s1, const S2 &s2) |
| Delete elements of set s2 from set s1.
|
|
void | timer_start (timer_type &t) |
| Start timer.
|
|
float | timer_elapsed (timer_type &t) |
| Return the elapsed time in milliseconds (ms), with microsecond precision, since the last call, up to 1 minute (the value wraps around if the elapsed time exceeds 1 minute).
|
|
std::string | latin1 (int a, int b, int esc= 'x', bool brackets=true) |
| Convert an 8-bit ASCII + Latin-1 Supplement range [a,b] to a regex pattern.
|
|
std::string | utf8 (int a, int b, int esc= 'x', const char *par="(", bool strict=true) |
| Convert a UCS-4 range [a,b] to a UTF-8 regex pattern.
|
|
size_t | utf8 (int c, char *s) |
| Convert UCS-4 to UTF-8, fills with REFLEX_NONCHAR_UTF8 when out of range, or unrestricted UTF-8 with WITH_UTF8_UNRESTRICTED.
|
|
int | utf8 (const char *s, const char **r=NULL) |
| Convert UTF-8 to UCS, returns REFLEX_NONCHAR for invalid UTF-8 except for MUTF-8 U+0000 and 0xD800-0xDFFF surrogate halves (use WITH_UTF8_UNRESTRICTED to remove any limits on UTF-8 encodings up to 6 bytes).
|
|
std::wstring | wcs (const char *s, size_t n) |
| Convert UTF-8 string to wide string.
|
|
std::wstring | wcs (const std::string &s) |
| Convert UTF-8 string to wide string.
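The set helpers (is_disjoint, is_in_set, is_subset, set_insert, set_delete) are generic over their container types. A sketch of equivalent operations, assuming sorted std::set-like containers that share an element type and ordering; the _sketch suffix marks these as hypothetical stand-ins rather than the reflex templates.

```cpp
#include <algorithm>
#include <set>

// True if the sorted sets share no element (single merge-style pass).
template <typename S1, typename S2>
bool is_disjoint_sketch(const S1& s1, const S2& s2) {
  typename S1::const_iterator i = s1.begin();
  typename S2::const_iterator j = s2.begin();
  while (i != s1.end() && j != s2.end()) {
    if (*i < *j) ++i;
    else if (*j < *i) ++j;
    else return false;  // common element found
  }
  return true;
}

// True if x is an element of s (logarithmic lookup for std::set).
template <typename T, typename S>
bool is_in_set_sketch(const T& x, const S& s) {
  return s.find(x) != s.end();
}

// True if every element of s1 is in s2; std::includes(a, b) tests b within a.
template <typename S1, typename S2>
bool is_subset_sketch(const S1& s1, const S2& s2) {
  return std::includes(s2.begin(), s2.end(), s1.begin(), s1.end());
}

// s1 becomes the union of s1 and s2.
template <typename S1, typename S2>
void set_insert_sketch(S1& s1, const S2& s2) {
  s1.insert(s2.begin(), s2.end());
}

// Remove from s1 every element that also occurs in s2.
template <typename S1, typename S2>
void set_delete_sketch(S1& s1, const S2& s2) {
  for (typename S2::const_iterator j = s2.begin(); j != s2.end(); ++j)
    s1.erase(*j);
}
```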
|
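timer_start and timer_elapsed take a timer_type whose definition is not shown here. As an assumption-laden sketch, std::chrono::steady_clock can provide the same millisecond reading with microsecond resolution. timer_sketch_type and both function names are hypothetical, "since the last call" is read as resetting the start point on each call, and the documented 1-minute wrap is not reproduced.

```cpp
#include <chrono>

// Hypothetical stand-in for timer_type.
typedef std::chrono::steady_clock::time_point timer_sketch_type;

// Record the current time as the starting point.
void timer_start_sketch(timer_sketch_type& t) {
  t = std::chrono::steady_clock::now();
}

// Return milliseconds (microsecond resolution) since t, then move t to now,
// so the next call measures from this one. No 1-minute wrap in this sketch.
float timer_elapsed_sketch(timer_sketch_type& t) {
  std::chrono::steady_clock::time_point now = std::chrono::steady_clock::now();
  long long us = std::chrono::duration_cast<std::chrono::microseconds>(now - t).count();
  t = now;
  return static_cast<float>(us) / 1000.0f;
}
```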
|
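The utf8 and wcs conversions rest on the standard UTF-8 bit layout (1 to 4 bytes for code points up to U+10FFFF). A hedged sketch of that layout, not the reflex implementation: the function names are hypothetical, input is assumed NUL-terminated, and invalid or out-of-range values map to U+FFFD (or -1 from the decoder) because the values of REFLEX_NONCHAR and REFLEX_NONCHAR_UTF8 are not given here. Unlike the documented decoder, this sketch neither rejects overlong forms nor treats MUTF-8 or surrogate halves specially.

```cpp
#include <cstddef>
#include <string>

// UCS-4 -> UTF-8: writes 1 to 4 bytes to s and returns the byte count.
std::size_t utf8_encode_sketch(int c, char* s) {
  if (c < 0 || c > 0x10FFFF)
    c = 0xFFFD;  // out of range: substitute the replacement character
  unsigned int u = static_cast<unsigned int>(c);
  if (u < 0x80) {
    s[0] = static_cast<char>(u);
    return 1;
  }
  if (u < 0x800) {
    s[0] = static_cast<char>(0xC0 | (u >> 6));
    s[1] = static_cast<char>(0x80 | (u & 0x3F));
    return 2;
  }
  if (u < 0x10000) {
    s[0] = static_cast<char>(0xE0 | (u >> 12));
    s[1] = static_cast<char>(0x80 | ((u >> 6) & 0x3F));
    s[2] = static_cast<char>(0x80 | (u & 0x3F));
    return 3;
  }
  s[0] = static_cast<char>(0xF0 | (u >> 18));
  s[1] = static_cast<char>(0x80 | ((u >> 12) & 0x3F));
  s[2] = static_cast<char>(0x80 | ((u >> 6) & 0x3F));
  s[3] = static_cast<char>(0x80 | (u & 0x3F));
  return 4;
}

// UTF-8 -> UCS: decodes one code point at s and, if r is non-null, sets *r
// past it. Returns -1 for a bad lead or continuation byte and skips one byte.
int utf8_decode_sketch(const char* s, const char** r = 0) {
  const unsigned char* p = reinterpret_cast<const unsigned char*>(s);
  int c = p[0];
  int n = 0;  // number of continuation bytes that must follow
  if (c >= 0x80) {
    if ((c & 0xE0) == 0xC0)      { c &= 0x1F; n = 1; }
    else if ((c & 0xF0) == 0xE0) { c &= 0x0F; n = 2; }
    else if ((c & 0xF8) == 0xF0) { c &= 0x07; n = 3; }
    else { if (r) *r = s + 1; return -1; }
  }
  for (int k = 1; k <= n; ++k) {
    // A NUL terminator also fails this test, so truncated input stops here.
    if ((p[k] & 0xC0) != 0x80) { if (r) *r = s + 1; return -1; }
    c = (c << 6) | (p[k] & 0x3F);
  }
  if (r) *r = s + 1 + n;
  return c;
}

// UTF-8 string -> wide string, using surrogate pairs where wchar_t is 16 bits.
std::wstring wcs_sketch(const std::string& s) {
  std::wstring ws;
  const char* p = s.c_str();
  const char* end = p + s.size();
  while (p < end) {
    int c = utf8_decode_sketch(p, &p);
    if (c < 0 || c > 0x10FFFF) c = 0xFFFD;
    if (sizeof(wchar_t) == 2 && c > 0xFFFF) {
      c -= 0x10000;
      ws.push_back(static_cast<wchar_t>(0xD800 + (c >> 10)));
      ws.push_back(static_cast<wchar_t>(0xDC00 + (c & 0x3FF)));
    } else {
      ws.push_back(static_cast<wchar_t>(c));
    }
  }
  return ws;
}
```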
std::string reflex::convert(const char *pattern, const char *signature, convert_flag_type flags = convert_flag::none, const std::map<std::string, std::string> *macros = NULL)
Returns the converted regex string given a regex library signature and conversion flags, throws regex_error.
A regex library signature is a string of the form "decls:escapes?+.".
The optional "decls:" part specifies which modifiers and other special `(?...)` constructs are supported:
- non-capturing group `(?:...)` is supported
- letters and digits specify which modifiers, e.g. `(?ismx)`, are supported:
  - 'i' specifies that `(?i...)` case-insensitive matching is supported
  - 'm' specifies that `(?m...)` multiline mode is supported for the ^ and $ anchors
  - 's' specifies that `(?s...)` dotall mode is supported
  - 'x' specifies that `(?x...)` freespace mode is supported
  - ... any other letter or digit modifier, where digit modifiers support `(?123)`, for example
- `#` specifies that `(?#...)` comments are supported
- `=` specifies that `(?=...)` lookahead is supported
- `'` specifies that `(?'...)` 'name' groups are supported
- `<` specifies that `(?<...)` lookbehind and <name> groups are supported
- `>` specifies that `(?>...)` atomic groups are supported
- `|` specifies that `(?|...)` group resets are supported
- `&` specifies that `(?&...)` subroutines are supported
- `(` specifies that `(?(...)` conditionals are supported
- `!` specifies that `(?!=...)` and `(?!<...)` are supported
- `^` specifies that `(?^...)` negative (reflex) patterns are supported
- `*` specifies that `(*VERB)` verbs are supported
The "escapes" characters specify which standard escapes are supported:
- `a` for `\a` (BEL U+0007)
- `b` for the `\b` word boundary
- `c` for `\cX` control character specified by `X` modulo 32
- `d` for `\d` digit `[0-9]`, ASCII or Unicode digit
- `e` for `\e` ESC U+001B
- `f` for `\f` FF U+000C
- `g` for `\g` group capture, e.g. `\g{1}`
- `h` for `\h` ASCII blank `[ \t]` (SP U+0020 or TAB U+0009)
- `i` for `\i` reflex indent anchor
- `j` for `\j` reflex dedent anchor
- `k` for `\k` reflex undent anchor or group capture, e.g. `\k{1}`
- `l` for `\l` lower case letter `[a-z]`, ASCII or Unicode letter
- `n` for `\n` LF U+000A
- `o` for `\o` octal ASCII or Unicode code
- `p` for `\p{C}` Unicode character classes, also implies Unicode `.`, `\x{X}`, `\l`, `\u`, `\d`, `\s`, `\w`, and UTF-8 patterns
- `r` for `\r` CR U+000D
- `s` for `\s` space (SP, TAB, LF, VT, FF, or CR)
- `t` for `\t` TAB U+0009
- `u` for `\u` ASCII upper case letter `[A-Z]` (when not followed by `{XXXX}`)
- `v` for `\v` VT U+000B
- `w` for `\w` ASCII word-like character `[0-9A-Z_a-z]`
- `x` for `\xXX` 8-bit character encoding in hexadecimal
- `y` for `\y` word boundary
- `z` for `\z` end of input anchor
- `` ` `` for `` \` `` begin of input anchor
- `'` for `\'` end of input anchor
- `<` for `\<` left word boundary
- `>` for `\>` right word boundary
- `A` for `\A` begin of input anchor
- `B` for `\B` non-word boundary
- `D` for `\D` ASCII non-digit `[^0-9]`
- `H` for `\H` ASCII non-blank `[^ \t]`
- `L` for `\L` ASCII non-lower case letter `[^a-z]`
- `N` for `\N` not a newline
- `P` for `\P{C}` Unicode inverse character classes, see 'p'
- `Q` for `\Q...\E` quotations
- `R` for `\R` Unicode line break
- `S` for `\S` ASCII non-space (no SP, TAB, LF, VT, FF, or CR)
- `U` for `\U` ASCII non-upper case letter `[^A-Z]`
- `W` for `\W` ASCII non-word-like character `[^0-9A-Z_a-z]`
- `X` for `\X` any Unicode character
- `Z` for `\Z` end of input anchor, before the final line break
- `0` for `\0nnn` 8-bit character encoding in octal, requires a leading `0`
- '1' to '9' for backreferences (not applicable to lexer specifications)
Note that 'p' is a special case to support Unicode-based matchers that natively support UTF-8 patterns and Unicode classes, such as `\p{C}`, `\P{C}`, and `\x{X}`. In effect, 'p' prevents conversion of Unicode patterns to UTF-8. This special case does not support {NAME} expansions in bracket lists such as [a-z||{upper}] and {lower}{+}{upper} used in lexer specifications.
The optional "?+" characters specify lazy and possessive support:
- `?` lazy quantifiers for repeats are supported
- `+` possessive quantifiers for repeats are supported
An optional "." (dot) specifies that dot matches any character except newline. A dot is implied by the presence of the 's' modifier, and can be omitted in that case.
An optional "[" specifies that bracket list union, intersection, and subtraction are supported, i.e. `[--[a-z]]`.
- Parameters
  - pattern: regex string pattern to convert
  - signature: regex library signature
  - flags: conversion flags
  - macros: {name} macros to expand
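A regex library signature packs several independent feature sets into one string. The following is a hypothetical helper, not part of RE/flex (SignatureSketch and parse_signature_sketch are illustrative names), that splits a signature under one reading of the description above: the ':' ends the decls part and itself indicates `(?:...)` support, the characters '?', '+', '.', and '[' are feature flags, and every other character after the colon names a supported escape. The signature strings in the test are made up and do not correspond to any particular regex library.

```cpp
#include <string>

// Parts of a "decls:escapes?+.[" signature, as read by this sketch.
struct SignatureSketch {
  std::string decls;    // modifiers and (?...) constructs before ':'
  bool noncapturing;    // ':' present, so (?:...) is supported
  std::string escapes;  // escape letters, e.g. "abdnrstw"
  bool lazy;            // '?' lazy quantifiers
  bool possessive;      // '+' possessive quantifiers
  bool dot;             // '.' given, or implied by the 's' modifier
  bool brackets;        // '[' bracket list union, intersection, subtraction
};

SignatureSketch parse_signature_sketch(const std::string& sig) {
  SignatureSketch r;
  std::string::size_type colon = sig.find(':');
  r.noncapturing = colon != std::string::npos;
  r.decls = r.noncapturing ? sig.substr(0, colon) : std::string();
  std::string rest = r.noncapturing ? sig.substr(colon + 1) : sig;
  r.lazy = r.possessive = r.dot = r.brackets = false;
  for (std::string::size_type i = 0; i < rest.size(); ++i) {
    switch (rest[i]) {
      case '?': r.lazy = true; break;
      case '+': r.possessive = true; break;
      case '.': r.dot = true; break;
      case '[': r.brackets = true; break;
      default: r.escapes += rest[i]; break;  // any other character names an escape
    }
  }
  // A dot is implied by the 's' (dotall) modifier.
  if (r.decls.find('s') != std::string::npos) r.dot = true;
  return r;
}
```

For instance, parse_signature_sketch("imsx#=<:abdnrstw?+[") yields decls "imsx#=<", escapes "abdnrstw", lazy, possessive, and bracket support, and dot support implied by 's'. The split works because none of '?', '+', '.', '[' appears among the escape characters listed above.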