
Cross-platform C SDK

Regular expressions


Functions

RegEx* regex_create (...)
void regex_destroy (...)
bool_t regex_match (...)

Regular expressions define a text pattern that can be used to find or compare strings.


1. Define patterns

We can build a regular expression from a text string, following these simple rules:

  • A string pattern corresponds only to that same string.

    "hello" --> {"hello"}

  • A period '.' is equivalent to "any character".

    "h.llo" --> {"hello", "htllo", "hällo", "h5llo", ...}

  • A dash, as in 'A-Z', defines a range of characters: every character whose ASCII/Unicode code lies between the codes of both ends, inclusive. Because the range is based on codes, it may include non-letters, such as '÷' (247) in 'á-ú'.

    "A-Zello" --> {"Aello", "Bello", "Cello", ..., "Zello"}

    'A-Z': (65-90) (ABCDEFGHIJKLMNOPQRSTUVWXYZ)
    '0-9': (48-57) (0123456789)
    'á-ú': (225-250) (áâãäåæçèéêëìíîïðñòóôõö÷øùú)

As with String objects, patterns are expressed in UTF-8, so the entire Unicode set can be used to create regular expressions.
  • Brackets, as in '[áéíóú]', allow a choice among several characters.

    "h[áéíóú]llo" --> {"hállo", "héllo", "híllo", "hóllo", "húllo"}

  • The asterisk '*' allows the preceding element (a character, a range or a group) to appear zero or more times.

    "he*llo" --> {"hllo", "hello", "heello", "heeello", "heeeello", ...}
    "h.*llo" --> {"hllo", "hello", "hallo", "hillo", "hasello", ...}
    "hA-Z*llo" --> {"hllo", "hAllo", "hABllo", "hVFFRREASllo", "hAQWEDllo", ...}
    "FILE_0-9*.PNG" --> {"FILE_.PNG", "FILE_0.PNG", "FILE_01.PNG", "FILE_456.PNG", ...}

  • Parentheses, as in '(he*llo)', group a regular expression so that it behaves as a single character.

    "[(hello)(bye)]" --> {"hello", "bye"}
    "[(red)(blue)(1*)]" --> {"red", "blue", "", "1", "11", "111", ...}
    "(hello)*" --> {"", "hello", "hellohello", "hellohellohello", ...}
    "(he*llo)ZZ" --> {"hlloZZ", "helloZZ", "heelloZZ", "heeelloZZ", ...}

  • To interpret '.', '-', '[', ']', '*', '(' or ')' as literal characters, escape them with a backslash '\'.

    "\(he\*\-llo\)" --> {"(he*-llo)"}

Remember that in C string constants the backslash must itself be escaped, so every backslash of the pattern is written as a double backslash: "\\(he\\*\\-llo\\)".
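The rules above can be tried out with a small backtracking matcher. This is a hypothetical sketch, not the SDK's implementation (regex_create builds an automaton, as section 2 explains). The name sketch_match is made up; the sketch supports only literal characters, '.', 'X-Y' ranges, '*' and backslash escapes (brackets and parentheses are not implemented); it works byte by byte, so it assumes ASCII patterns rather than full UTF-8; and it requires the whole string to match, as the rule "hello" --> {"hello"} suggests.

```c
#include <stdbool.h>

/* Hypothetical sketch, NOT the SDK's regex implementation. */

/* One pattern element that consumes exactly one input byte. */
typedef struct {
    unsigned char lo, hi;   /* accepted byte range (lo == hi for a literal) */
    bool star;              /* element is followed by '*' */
} Atom;

/* Parses one element starting at p, fills *a, returns the position after it. */
static const char *parse_atom(const char *p, Atom *a)
{
    if (p[0] == '\\' && p[1] != '\0') {          /* "\x": x taken literally */
        a->lo = a->hi = (unsigned char)p[1];
        p += 2;
    } else if (p[0] == '.') {                     /* '.': any byte */
        a->lo = 0;
        a->hi = 255;
        p += 1;
    } else {                                      /* plain character */
        a->lo = a->hi = (unsigned char)p[0];
        p += 1;
        if (p[0] == '-' && p[1] != '\0') {        /* "X-Y": range from X to Y */
            a->hi = (unsigned char)p[1];
            p += 2;
        }
    }
    a->star = (p[0] == '*');                      /* zero or more repetitions */
    if (a->star)
        p += 1;
    return p;
}

static bool atom_accepts(const Atom *a, char c)
{
    unsigned char u = (unsigned char)c;
    return u >= a->lo && u <= a->hi;
}

/* Returns true if the WHOLE string str matches the pattern pat. */
bool sketch_match(const char *pat, const char *str)
{
    if (pat[0] == '\0')
        return str[0] == '\0';                    /* pattern used up: string must be too */

    Atom a;
    const char *rest = parse_atom(pat, &a);

    if (!a.star)                                  /* exactly one occurrence */
        return str[0] != '\0' && atom_accepts(&a, str[0]) && sketch_match(rest, str + 1);

    /* '*': try zero occurrences first, then one more on each iteration */
    for (const char *s = str;; s++) {
        if (sketch_match(rest, s))
            return true;
        if (s[0] == '\0' || !atom_accepts(&a, s[0]))
            return false;
    }
}
```

For example, sketch_match("A-Zello", "Cello") returns true and sketch_match("A-Zello", "cello") returns false. Under these rules the unescaped '.' in "FILE_0-9*.PNG" matches any character, so "FILE_7xPNG" is accepted too; writing the dot as "FILE_0-9*\\.PNG" in C restricts it to a literal period.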

2. Regular languages and automata

Regular languages are defined recursively from the set of available characters (symbols) using three basic operations: union, concatenation and closure. They can be described with the regular expressions discussed above.

  • Each character 'a' forms a regular language A = {a}.
  • The union of two regular languages A and B is a regular language A∪B.
  • The concatenation of two regular languages A and B is a regular language A·B.
  • The closure of a regular language A is a regular language A*. This is where recursion comes in.
In this context, the symbols are all Unicode characters, but languages can be defined over other alphabets, including the binary alphabet {0, 1}.
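As a worked illustration, several patterns from section 1 can be rewritten in terms of these operations. This is a restatement derived from the rules above, not notation used by the SDK; Σ denotes the alphabet (all Unicode characters), ε the empty string and L(p) the language of pattern p:

```latex
\begin{aligned}
L(\texttt{h})              &= \{\mathrm{h}\} \\
L(\texttt{.})              &= \Sigma \\
L(\texttt{A-Z})            &= \{\mathrm{A}\} \cup \{\mathrm{B}\} \cup \dots \cup \{\mathrm{Z}\} \\
L(\texttt{he*llo})         &= \{\mathrm{h}\} \cdot \{\mathrm{e}\}^{*} \cdot \{\mathrm{l}\} \cdot \{\mathrm{l}\} \cdot \{\mathrm{o}\} \\
L(\texttt{[(hello)(bye)]}) &= \{\mathrm{hello}\} \cup \{\mathrm{bye}\} \\
L(\texttt{(hello)*})       &= \{\mathrm{hello}\}^{*} = \{\varepsilon, \mathrm{hello}, \mathrm{hellohello}, \dots\}
\end{aligned}
```

In other words, '.' and ranges are unions of single-character languages, juxtaposition is concatenation and '*' is closure.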

To recognize whether or not a string belongs to a given regular language, a finite automaton must be built following the construction rules shown in Figure 1.

Figure 1: Construction of finite automata to filter regular expressions: concatenation, union and closure.
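The three constructions of Figure 1 can be sketched with the classic Thompson construction, one common way to combine automata by concatenation, union and closure. It is an illustration under assumptions, not necessarily the exact construction in Figure 1 or the one regex_create uses: all names (Nfa, Frag, frag_char, frag_concat, frag_union, frag_star, frag_word, nfa_accepts) are hypothetical, the automaton has a fixed capacity, and symbols are bytes rather than Unicode characters. Each fragment has one start and one final state; a string is accepted if, after reading it, the final state is among the active states.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical sketch of the Thompson construction, NOT the SDK's internals. */
#define MAX_STATES 256
#define EPS (-1)                 /* marks a state whose exits are epsilon moves */

typedef struct {
    int sym;                     /* byte consumed on the way to out1, or EPS */
    int out1, out2;              /* target states, -1 if unused (out2: EPS only) */
} State;

typedef struct {
    State st[MAX_STATES];
    int count;
} Nfa;

/* A partial automaton with a single start state and a single final state. */
typedef struct { int start, final; } Frag;

static int new_state(Nfa *n, int sym, int out1, int out2)
{
    if (n->count >= MAX_STATES)
        abort();                 /* fixed capacity keeps the sketch short */
    n->st[n->count] = (State){ sym, out1, out2 };
    return n->count++;
}

/* Character c:  start --c--> final */
Frag frag_char(Nfa *n, int c)
{
    int f = new_state(n, EPS, -1, -1);
    int s = new_state(n, c, f, -1);
    return (Frag){ s, f };
}

/* Concatenation A·B: A's final state moves to B's start on epsilon. */
Frag frag_concat(Nfa *n, Frag a, Frag b)
{
    n->st[a.final].out1 = b.start;
    return (Frag){ a.start, b.final };
}

/* Union A∪B: a new start branches into A and B; both end in a new final. */
Frag frag_union(Nfa *n, Frag a, Frag b)
{
    int f = new_state(n, EPS, -1, -1);
    int s = new_state(n, EPS, a.start, b.start);
    n->st[a.final].out1 = f;
    n->st[b.final].out1 = f;
    return (Frag){ s, f };
}

/* Closure A*: skip A entirely, or run A and loop back any number of times. */
Frag frag_star(Nfa *n, Frag a)
{
    int f = new_state(n, EPS, -1, -1);
    int s = new_state(n, EPS, a.start, f);
    n->st[a.final].out1 = a.start;
    n->st[a.final].out2 = f;
    return (Frag){ s, f };
}

/* Concatenation of the characters of a NON-EMPTY word. */
Frag frag_word(Nfa *n, const char *w)
{
    Frag f = frag_char(n, (unsigned char)w[0]);
    for (size_t i = 1; w[i] != '\0'; i++)
        f = frag_concat(n, f, frag_char(n, (unsigned char)w[i]));
    return f;
}

/* Adds state s and everything reachable from it through epsilon moves. */
static void add_closure(const Nfa *n, bool *set, int s)
{
    if (s < 0 || set[s])
        return;
    set[s] = true;
    if (n->st[s].sym == EPS) {
        add_closure(n, set, n->st[s].out1);
        add_closure(n, set, n->st[s].out2);
    }
}

/* Tracks all active states at once; accepts if m's final state is active at the end. */
bool nfa_accepts(const Nfa *n, Frag m, const char *str)
{
    bool cur[MAX_STATES] = { false }, next[MAX_STATES];
    add_closure(n, cur, m.start);
    for (const unsigned char *p = (const unsigned char *)str; *p != '\0'; p++) {
        memset(next, 0, sizeof next);
        for (int s = 0; s < n->count; s++)
            if (cur[s] && n->st[s].sym == *p)
                add_closure(n, next, n->st[s].out1);
        memcpy(cur, next, sizeof cur);
    }
    return cur[m.final];
}
```

For example, the pattern "(he*llo)ZZ" corresponds to frag_concat(&n, frag_char(&n, 'h'), frag_concat(&n, frag_star(&n, frag_char(&n, 'e')), frag_word(&n, "lloZZ"))), which accepts "hlloZZ" and "heeelloZZ" but not "helloZ". Tracking the whole set of active states, instead of trying alternatives one by one, keeps the work proportional to the string length times the number of states.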

regex_create ()

Create a regular expression from a pattern.

RegEx*
regex_create(const char_t *pattern);
pattern

Search pattern.

Return

The regular expression (automaton).

Remarks

See Define patterns.


regex_destroy ()

Destroy a regular expression.

void
regex_destroy(RegEx **regex);
regex

The regular expression. It will be set to NULL after destruction.
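regex_destroy receives the address of the caller's pointer (RegEx **), which is what allows it to set that pointer to NULL; in the SDK the call is written regex_destroy(&regex). The idiom can be sketched with a hypothetical Pattern type standing in for the opaque RegEx object. pattern_create and pattern_destroy are made-up names, and the tolerance of an already-NULL pointer shown here is a choice of this sketch, not something the documentation states about regex_destroy.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical object standing in for an opaque handle such as RegEx. */
typedef struct {
    char *text;
} Pattern;

/* Allocates a Pattern holding a copy of text; returns NULL on failure. */
Pattern *pattern_create(const char *text)
{
    Pattern *p = malloc(sizeof *p);
    if (p == NULL)
        return NULL;
    p->text = malloc(strlen(text) + 1);
    if (p->text == NULL) {
        free(p);
        return NULL;
    }
    strcpy(p->text, text);
    return p;
}

/* Receives the ADDRESS of the caller's pointer, so it can reset it to NULL. */
void pattern_destroy(Pattern **p)
{
    if (p == NULL || *p == NULL)
        return;                  /* nothing to free (sketch-specific choice) */
    free((*p)->text);
    free(*p);
    *p = NULL;                   /* the caller's variable no longer dangles */
}
```

After pattern_destroy(&p), the variable p is NULL, so it cannot be mistaken for a valid object and a repeated destroy call does nothing.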


regex_match ()

Check if a string matches the search pattern.

bool_t
regex_match(const RegEx *regex,
            const char_t *str);
regex

The regular expression.

str

String to evaluate.

Return

TRUE if the string is accepted by the regular expression, FALSE otherwise.
