C++ Regular Expressions: Symbols, Usage, and Applications - MuQYY's Blog

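Regular expressions are a powerful tool for matching patterns in strings. Since C++11, the standard library provides the <regex> header, which makes pattern matching and text processing convenient in C++. This article covers the common regular expression symbols, how to use regular expressions in C++, and practical examples, including how to convert Markdown to HTML in C++.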

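1. Common regular expression symbols

In a regular expression, symbols and metacharacters define the pattern to match. The most common ones are listed below:

Basic character classes

. : matches any single character except a newline.
\d: matches any single digit; equivalent to [0-9].
\D: matches any single non-digit character.
\w: matches a letter, digit, or underscore; equivalent to [A-Za-z0-9_].
\W: matches any single character that is not a letter, digit, or underscore.
\s: matches any whitespace character, including spaces, tabs, and newlines.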

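Quantifiers

*: matches the preceding element 0 or more times; equivalent to {0,}.
+: matches the preceding element 1 or more times; equivalent to {1,}.
?: matches the preceding element 0 or 1 times; equivalent to {0,1}.
{n}: matches the preceding element exactly n times.
{n,}: matches the preceding element at least n times.
{n,m}: matches the preceding element between n and m times.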

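Anchors and boundaries

^: matches the start of the string.
$: matches the end of the string.
\b: matches a word boundary.
\B: matches a position that is not a word boundary.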

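Grouping and alternation

(): groups several characters so they are treated as a single unit.
| : alternation; matches either the expression on its left or the one on its right.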

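2. Common regular expression symbols with C++ examples

This section revisits the symbols from section 1 and pairs each one with a short C++ example, to show more concretely how std::regex patterns are built. The snippets assume that <iostream>, <regex>, and <string> are included.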

Basic character classes

.: matches any single character except a newline.

std::string text = "a";
std::regex pattern(".");  // matches any single character
if (std::regex_match(text, pattern)) {
  std::cout << "Match: any character" << std::endl;
}

\d: matches any single digit; equivalent to [0-9].

std::string text = "5";
std::regex pattern("\\d");  // matches a single digit
if (std::regex_match(text, pattern)) {
  std::cout << "Match: digit" << std::endl;
}

\D: matches any single non-digit character.

std::string text = "a";
std::regex pattern("\\D");  // matches a single non-digit character
if (std::regex_match(text, pattern)) {
  std::cout << "Match: non-digit" << std::endl;
}

\w: matches a letter, digit, or underscore; equivalent to [A-Za-z0-9_].

std::string text = "hello123";
std::regex pattern("\\w+");  // matches one or more letters, digits, or underscores
if (std::regex_match(text, pattern)) {
  std::cout << "Match: letters, digits, or underscores" << std::endl;
}

\W: matches any single character that is not a letter, digit, or underscore.

std::string text = "#";
std::regex pattern("\\W");  // matches a single non-word character
if (std::regex_match(text, pattern)) {
  std::cout << "Match: not a letter, digit, or underscore" << std::endl;
}

\s: matches any whitespace character, including spaces, tabs, and newlines.

std::string text = " ";
std::regex pattern("\\s");  // matches a single whitespace character
if (std::regex_match(text, pattern)) {
  std::cout << "Match: whitespace" << std::endl;
}

Quantifiers

*: matches the preceding element 0 or more times; equivalent to {0,}.

std::string text = "aaa";
std::regex pattern("a*");  // matches zero or more 'a'
if (std::regex_match(text, pattern)) {
  std::cout << "Match: zero or more 'a'" << std::endl;
}

+: matches the preceding element 1 or more times; equivalent to {1,}.

std::string text = "aaa";
std::regex pattern("a+");  // matches one or more 'a'
if (std::regex_match(text, pattern)) {
  std::cout << "Match: one or more 'a'" << std::endl;
}

?: matches the preceding element 0 or 1 times.

std::string text = "a";
std::regex pattern("a?");  // matches zero or one 'a'
if (std::regex_match(text, pattern)) {
  std::cout << "Match: zero or one 'a'" << std::endl;
}

{n}: matches the preceding element exactly n times.

std::string text = "aaa";
std::regex pattern("a{3}");  // matches exactly three 'a'
if (std::regex_match(text, pattern)) {
  std::cout << "Match: exactly three 'a'" << std::endl;
}

{n,}: matches the preceding element at least n times.

std::string text = "aaa";
std::regex pattern("a{2,}");  // matches at least two 'a'
if (std::regex_match(text, pattern)) {
  std::cout << "Match: at least two 'a'" << std::endl;
}

{n,m}: matches the preceding element between n and m times.

std::string text = "aa";
std::regex pattern("a{1,3}");  // matches one to three 'a'
if (std::regex_match(text, pattern)) {
  std::cout << "Match: one to three 'a'" << std::endl;
}
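Quantifiers are greedy by default: they consume as much input as they can. Appending ? to a quantifier makes it lazy, which is what the _(.+?)_ emphasis pattern in the Markdown example later in this article relies on. The following is a minimal sketch of the difference; the helper name first_match is made up for illustration.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: returns the first substring of `text` that matches
// `pattern`, or an empty string when nothing matches.
std::string first_match(const std::string& text, const std::string& pattern) {
    std::smatch m;
    if (std::regex_search(text, m, std::regex(pattern))) {
        return m.str();  // m.str() is the whole match, same as m[0].str()
    }
    return "";
}
```

For the input "_a_ and _b_", the greedy pattern "_.+_" matches the whole string, while the lazy pattern "_.+?_" stops at the first closing underscore and matches only "_a_".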

Anchors and boundaries

^: matches the start of the string.

std::string text = "hello";
std::regex pattern("^hello");  // 'hello' anchored to the start of the string
if (std::regex_match(text, pattern)) {
  std::cout << "Match: string starts with 'hello'" << std::endl;
}

$: matches the end of the string.

std::string text = "world";
std::regex pattern("world$");  // 'world' anchored to the end of the string
if (std::regex_match(text, pattern)) {
  std::cout << "Match: string ends with 'world'" << std::endl;
}
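With std::regex_match the whole string has to match anyway, so ^ and $ add nothing there. They become useful with std::regex_search, which otherwise accepts a match anywhere in the string. A small sketch under that assumption, with illustrative function names:

```cpp
#include <regex>
#include <string>

// True when `text` begins with "hello"; the rest of the string may be anything.
bool starts_with_hello(const std::string& text) {
    static const std::regex pattern("^hello");
    return std::regex_search(text, pattern);
}

// True when `text` ends with "world"; anything may precede it.
bool ends_with_world(const std::string& text) {
    static const std::regex pattern("world$");
    return std::regex_search(text, pattern);
}
```

Here starts_with_hello("hello world") is true but starts_with_hello("say hello") is false, because the anchor pins the match to the first position.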

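\b: matches a word boundary (in an ordinary C++ string literal the backslash must be doubled, giving "\\b").

std::string text = " hello ";
std::regex pattern("\\bhello\\b");  // matches 'hello' as a standalone word
if (std::regex_search(text, pattern)) {
  std::cout << "Match: standalone word 'hello'" << std::endl;
}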
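The doubled backslashes can be avoided with a C++11 raw string literal, in which backslashes are taken literally. The sketch below writes the same word-boundary pattern that way; the function name is only illustrative.

```cpp
#include <regex>
#include <string>

// True when `text` contains "hello" as a standalone word.
bool contains_word_hello(const std::string& text) {
    // R"(...)" is a raw string literal, so this is the same pattern
    // as the escaped form "\\bhello\\b".
    static const std::regex pattern(R"(\bhello\b)");
    return std::regex_search(text, pattern);
}
```

Raw literals pay off most in patterns with many escapes, such as R"(\[(.+?)\]\((.+?)\))" for the hyperlink pattern used later in this article.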

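\B: matches a position that is not a word boundary.

std::string text = "shell";
std::regex pattern("s\\B");  // matches an 's' that is immediately followed by another word character
if (std::regex_search(text, pattern)) {
  std::cout << "Match: 's' at a non-word boundary" << std::endl;
}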

Grouping and alternation

(): groups several characters so they are matched as a single unit.

std::string text = "hellohello";
std::regex pattern("(hello){2}");  // matches 'hello' repeated twice in a row
if (std::regex_match(text, pattern)) {
  std::cout << "Match: 'hello' twice in a row" << std::endl;
}

|: alternation; matches either the expression on its left or the one on its right.

std::string text = "apple";
std::regex pattern("apple|orange");  // matches 'apple' or 'orange'
if (std::regex_match(text, pattern)) {
  std::cout << "Match: 'apple' or 'orange'" << std::endl;
}
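Alternation has the lowest precedence of all regular expression operators, so it splits the entire pattern unless parentheses limit its scope. The following sketch contrasts the two forms; both function names are invented for this example.

```cpp
#include <regex>
#include <string>

// Grouped: the alternation only covers the fruit name, " juice" is always required.
bool is_juice(const std::string& text) {
    static const std::regex pattern("(apple|orange) juice");
    return std::regex_match(text, pattern);
}

// Ungrouped: the pattern means either "apple" or "orange juice".
bool is_apple_or_orange_juice(const std::string& text) {
    static const std::regex pattern("apple|orange juice");
    return std::regex_match(text, pattern);
}
```

With the grouped pattern both "apple juice" and "orange juice" match. With the ungrouped one, "apple" matches but "apple juice" does not.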

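3. Regular expression functions in C++

To use regular expressions in C++, include the <regex> header and define patterns with std::regex. The most commonly used functions are:

`std::regex_match` - full match

std::regex_match checks whether the entire string matches the regular expression, which suits cases where an exact match is required.

#include <iostream>
#include <regex>
#include <string>

int main() {
    std::string str = "12345";
    std::regex pattern("\\d+");  // matches a string made up entirely of digits

    if (std::regex_match(str, pattern)) {
        std::cout << "Full match!" << std::endl;
    }
}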
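std::regex compiles the pattern when the object is constructed and throws std::regex_error if the pattern is malformed, for example when a parenthesis or bracket is left open. When patterns come from user input, it can help to guard the construction. A minimal sketch, with a hypothetical helper name:

```cpp
#include <regex>
#include <string>

// Hypothetical helper: reports whether `pattern` compiles as a std::regex
// (default ECMAScript grammar).
bool is_valid_pattern(const std::string& pattern) {
    try {
        std::regex compiled(pattern);  // throws std::regex_error on a malformed pattern
        return true;
    } catch (const std::regex_error&) {
        return false;
    }
}
```

For instance, is_valid_pattern("\\d+") is true, while is_valid_pattern("(hello") is false because the group is never closed.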

`std::regex_search` - partial match

std::regex_search checks whether any part of the string matches the pattern, which suits cases where a substring has to be found.

std::string text = "say hello to the world";
std::regex pattern("hello");
std::smatch match;

if (std::regex_search(text, match, pattern)) {
    std::cout << "Found substring 'hello' at position: " << match.position() << std::endl;
}
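The difference between the two functions is easiest to see when the same pattern is applied to the same input both ways. The sketch below does that for \d+; the struct and function names are made up for illustration.

```cpp
#include <regex>
#include <string>

// Result of testing one string with both functions.
struct DigitCheck {
    bool whole;    // std::regex_match: the entire string is digits
    bool partial;  // std::regex_search: the string contains digits somewhere
};

DigitCheck check_digits(const std::string& text) {
    static const std::regex digits("\\d+");
    return { std::regex_match(text, digits), std::regex_search(text, digits) };
}
```

For "12345" both fields are true; for "abc123" only partial is true; for "abc" both are false.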

`std::regex_replace` - string replacement

std::regex_replace replaces every part of the string that matches the pattern with the given text.

std::string str = "abc123xyz";
std::regex pattern("\\d+");
std::string result = std::regex_replace(str, pattern, "456");

std::cout << result << std::endl;  // prints "abc456xyz"
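The replacement text can also refer back to what was matched: with the default ECMAScript format rules, $1, $2, and so on stand for the capture groups and $& for the whole match. The Markdown example later in this article uses the same mechanism. A short sketch, with function names chosen only for this illustration:

```cpp
#include <regex>
#include <string>

// Rewrites dates of the form YYYY-MM-DD as DD/MM/YYYY by reordering the groups.
std::string reformat_date(const std::string& text) {
    static const std::regex pattern("(\\d{4})-(\\d{2})-(\\d{2})");
    return std::regex_replace(text, pattern, "$3/$2/$1");
}

// Wraps every run of digits in square brackets; $& is the whole match.
std::string bracket_numbers(const std::string& text) {
    static const std::regex pattern("\\d+");
    return std::regex_replace(text, pattern, "[$&]");
}
```

Text that does not match is copied through unchanged, so reformat_date("released on 2024-01-31.") gives "released on 31/01/2024.".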

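4. Locating pattern matches

Regular expressions make it easy to find a pattern and get its position. std::regex_search together with the position() method of std::smatch gives the position of the pattern in the string, and a loop can be used to find the position of every occurrence.

Finding the positions of all matches

#include <iostream>
#include <regex>
#include <string>

int main() {
    std::string text = "say hello to the world, hello again!";
    std::regex pattern("hello");
    std::smatch match;
    std::string::const_iterator searchStart(text.cbegin());

    while (std::regex_search(searchStart, text.cend(), match, pattern)) {
        // position() is relative to searchStart, so add the offset of searchStart
        auto pos = match.position() + std::distance(text.cbegin(), searchStart);
        std::cout << "Found 'hello' at position: " << pos << std::endl;
        searchStart = match.suffix().first;  // continue just after this match
    }

    return 0;
}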

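Initialization
- text is the string to search.
- pattern is the regular expression "hello".
- match stores the details of each match that is found.
- searchStart is set to the beginning of text and marks the current search position.
Search loop
- std::regex_search starts at searchStart, finds the first match of pattern, and stores it in match.
- match.position() returns the position of the match relative to searchStart. Adding the offset of searchStart from the beginning of the string gives the actual position of the match within text.
- The position is printed.
- searchStart is advanced to just past the current match, and the search continues until no further match is found.
- match is a std::smatch object that stores the result of a regular expression match, including the matched text and its position. Its main contents are:
  1. The entire matched substring: match[0] holds the whole match.
  2. Capture groups (subexpressions): if the pattern contains groups (), then match[1], match[2], and so on hold the text captured by each group.
  3. Match position: match.position() returns the starting position of the current match within the searched range.
  4. Prefix and suffix:
    - match.prefix(): the part of the searched range before the match.
    - match.suffix(): the part of the searched range after the match.
  In the example code, match.position() is used to work out where the match lies in the original string, and match[0] contains the matched substring "hello".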
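The contents of std::smatch described above are easier to see with a pattern that has capture groups. The sketch below searches for a key=value pair and copies each piece into a small struct; the struct, the function name, and the key=value input format are assumptions made for this example.

```cpp
#include <cstddef>
#include <regex>
#include <string>

// Pieces of a single match, copied out of std::smatch.
struct Assignment {
    bool found = false;
    std::string whole, key, value, prefix, suffix;
    std::ptrdiff_t position = -1;
};

Assignment parse_assignment(const std::string& text) {
    static const std::regex pattern("(\\w+)=(\\d+)");
    Assignment out;
    std::smatch m;
    if (std::regex_search(text, m, pattern)) {
        out.found = true;
        out.whole = m[0].str();          // entire match
        out.key = m[1].str();            // first capture group
        out.value = m[2].str();          // second capture group
        out.prefix = m.prefix().str();   // text before the match
        out.suffix = m.suffix().str();   // text after the match
        out.position = m.position();     // offset of the match in the searched range
    }
    return out;
}
```

For the input "set width=640 now", the whole match is "width=640" at position 4, the groups are "width" and "640", the prefix is "set " and the suffix is " now".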

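Output

For the string "say hello to the world, hello again!", the code prints:

Found 'hello' at position: 4
Found 'hello' at position: 24

After each "hello" is found, the search resumes from the position right after it, until no further match is found.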
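The same result can be obtained without tracking the search position by hand: std::sregex_iterator walks over all non-overlapping matches, and the position() of each match is measured from the start of the range given to the iterator. A minimal sketch, where find_all is a name invented for this example:

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Returns the start offset of every non-overlapping match of `pattern` in `text`.
std::vector<std::size_t> find_all(const std::string& text, const std::regex& pattern) {
    std::vector<std::size_t> positions;
    // A default-constructed sregex_iterator serves as the end-of-sequence marker.
    for (std::sregex_iterator it(text.begin(), text.end(), pattern), end; it != end; ++it) {
        positions.push_back(static_cast<std::size_t>(it->position()));
    }
    return positions;
}
```

Called with the text "say hello to the world, hello again!" and the pattern "hello", it returns the offsets 4 and 24.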

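5. Using regular expressions to convert Markdown to HTML

Regular expressions are useful for text parsing and format conversion, such as turning Markdown into HTML. The code below implements a simple Markdown-to-HTML converter that supports headings, lists, emphasis, and hyperlinks.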

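Example code

#include <iostream>
#include <regex>
#include <string>

std::regex header_pattern("(#+)\\s+(.*)");           // heading: one or more '#', whitespace, text
std::regex list_pattern("\\*\\s+(.*)");              // list item: '*', whitespace, text
std::regex italic_pattern("_(.+?)_");                // emphasis
std::regex link_pattern("\\[(.+?)\\]\\((.+?)\\)");   // hyperlink

std::string process_inline(std::string s) {
    s = std::regex_replace(s, italic_pattern, "<em>$1</em>");
    s = std::regex_replace(s, link_pattern, "<a href=\"$2\">$1</a>");
    return s;
}

int main() {
    std::string line;
    std::smatch m;
    bool in_list = false;  // true while consecutive list items are being printed
    while (std::getline(std::cin, line)) {
        if (std::regex_match(line, m, list_pattern)) {
            if (!in_list) {
                std::cout << "<ul>" << std::endl;
                in_list = true;
            }
            std::cout << "<li>" << process_inline(m[1].str()) << "</li>" << std::endl;
            continue;
        }
        if (in_list) {  // the previous line was the last list item
            std::cout << "</ul>" << std::endl;
            in_list = false;
        }
        if (std::regex_match(line, m, header_pattern)) {
            auto level = m[1].length();  // number of leading '#'
            std::cout << "<h" << level << ">" << process_inline(m[2].str())
                      << "</h" << level << ">" << std::endl;
        }
        else if (!line.empty()) {  // blank lines only separate blocks
            std::cout << "<p>" << process_inline(line) << "</p>" << std::endl;
        }
    }
    if (in_list) {
        std::cout << "</ul>" << std::endl;
    }
    return 0;
}

Example input and output

The Markdown input is:

# Heading One
This is a paragraph with _emphasis_ and a [link](https://example.com).

* List item one
* List item two

The converted HTML output is:

<h1>Heading One</h1>
<p>This is a paragraph with <em>emphasis</em> and a <a href="https://example.com">link</a>.</p>
<ul>
<li>List item one</li>
<li>List item two</li>
</ul>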

