
I think regular expressions are essential when it comes to extracting data from a source.

For example, if you wanted the names of all the files in a folder without their extensions, a regular expression could make the process easier and the code more concise.
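As a rough sketch of that file-name example, here is one way to do it with Python's built-in re module. The file names are made up, and stripping only the last extension (so photo.final.jpg becomes photo.final) is an assumption. Adjust the pattern if you want different behavior.

```python
import re

# Hypothetical file names, as you might get from os.listdir()
filenames = ["report.pdf", "photo.final.jpg", "notes.txt", "README"]

# (.+) greedily captures everything up to the LAST period;
# \.[^.]+$ is that period followed by an extension containing no periods.
# A name with no period at all (like README) does not match and is skipped.
pattern = re.compile(r"^(.+)\.[^.]+$")

names = []
for filename in filenames:
    match = pattern.match(filename)
    if match:
        names.append(match.group(1))

print(names)  # ['report', 'photo.final', 'notes']
```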

The difficulty is finding material to practice on, even with a regex cheatsheet at hand.

I found a site called regexone, and I felt its tutorial reflected real-life situations. Not only did it teach me the symbols, it did so through real-life examples of where you might use a regular expression, such as checking whether an email address or a telephone number is valid. Do check it out!
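The telephone-number use case can be sketched in a few lines of Python. The format below (a 3-digit area code, optionally in parentheses, then 3 and 4 digits with an optional dash or space between groups) is an assumption for illustration only. Real phone formats vary by country, so treat this as a starting point rather than a validator.

```python
import re

# Assumed format, for illustration only:
# area code as 555 or (555), then 3 digits, then 4 digits,
# with an optional dash or space between the groups.
PHONE_PATTERN = re.compile(r"(\d{3}|\(\d{3}\))[- ]?\d{3}[- ]?\d{4}")

def is_valid_phone(text: str) -> bool:
    # fullmatch requires the entire string to match,
    # so no ^ or $ anchors are needed in the pattern.
    return PHONE_PATTERN.fullmatch(text) is not None

for candidate in ["555-123-4567", "(555) 123-4567", "5551234567",
                  "(555 123-4567", "555-12-34567"]:
    print(candidate, is_valid_phone(candidate))
```

The alternation (\d{3}|\(\d{3}\)) is used instead of \(?\d{3}\)? so that an unbalanced parenthesis such as (555 is rejected.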

When crafting a regular expression, you will definitely need to test it before using it. Here is an online tool you might want to use: regex101

Brief recap of what I went through

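  1. ^ (hat) marks the start of a line. As the first character inside square brackets, it instead negates the character set (see point 3).
  2. [abc] square brackets define a character set, which matches exactly one character from the set. In this case only a, b, or c is valid.
  3. [^abc] inverts the logic of the previous point: any single character except a, b, or c.
  4. [a-z]|[0-9] a lowercase letter from a to z, or (|) a digit from 0 to 9. The same thing can be written as a single set: [a-z0-9].
  5. $ marks the end of the line or string being matched.
  6. () creates a capture group, and groups can be nested. On the text Jan 1988, (\w+\s(\d+)) captures Jan 1988 as group 1 and 1988 as group 2.
  7. . matches any single character (except a newline, by default). Escaped with a backslash, \. matches a literal period.
  8. + means ONE or more repetitions. * means ZERO or more repetitions.
  9. \s matches any whitespace character, e.g. a space, \t (tab), \r (carriage return), or \n (newline).
  10. {n} means exactly n repetitions; {a,b} means between a and b repetitions, inclusive.
  11. \w means [A-Za-z0-9_] for ASCII text (engines with Unicode support, such as Python's, also match non-ASCII letters).
  12. \d means [0-9] for ASCII text (with Unicode support, other digit characters may match too).
  13. ab?c means b is optional (? matches zero or one occurrence), so both abc and ac are valid.
  14. \w+\b is useful for getting entire words. \b is a word boundary: the position between a word character and a non-word character (or the start or end of the text).
  15. \W+$ matches one or more non-word characters (anything outside [A-Za-z0-9_]) running up to the end of the line, e.g. &$#*@!.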
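To see the recap in action, here is a small table of examples run through Python's re.findall, each checked with assert. The sample strings are made up for illustration, and the results are for Python's re module; other regex engines may differ in details such as Unicode handling.

```python
import re

# Each row: (recap item, pattern, sample text, expected re.findall result).
# Sample strings are made up for illustration.
EXAMPLES = [
    ("1/5 anchors", r"^\d+$", "2024", ["2024"]),
    ("1/5 anchors", r"^\d+$", "20 24", []),
    ("2 set", r"[abc]", "a cab", ["a", "c", "a", "b"]),
    ("3 negated set", r"[^abc]", "a cab", [" "]),
    ("4 alternation", r"[a-z]|[0-9]", "aB3", ["a", "3"]),
    # With more than one group, findall returns a tuple of groups per match
    ("6 nested groups", r"(\w+\s(\d+))", "Jan 1988", [("Jan 1988", "1988")]),
    ("7 any char", r"a.c", "abc a.c", ["abc", "a.c"]),
    ("7 literal period", r"a\.c", "abc a.c", ["a.c"]),
    ("8 one or more", r"ab+", "a ab abb", ["ab", "abb"]),
    ("8 zero or more", r"ab*", "a ab abb", ["a", "ab", "abb"]),
    ("9 whitespace", r"\s", "a\tb c\n", ["\t", " ", "\n"]),
    ("10 exactly n", r"\d{4}", "1 22 333 4444", ["4444"]),
    ("10 a to b", r"\d{2,3}", "1 22 333 4444", ["22", "333", "444"]),
    ("11 word chars", r"\w+", "user_1 ran!", ["user_1", "ran"]),
    ("12 digits", r"\d", "a1b2", ["1", "2"]),
    ("13 optional", r"ab?c", "abc ac abbc", ["abc", "ac"]),
    ("14 word boundary", r"\w+\b", "hi, there!", ["hi", "there"]),
    ("15 trailing non-word", r"\W+$", "Wow&$#*@!", ["&$#*@!"]),
]

for item, pattern, text, expected in EXAMPLES:
    result = re.findall(pattern, text)
    # Stop loudly if an example does not behave as described
    assert result == expected, (item, pattern, result)
    print(f"{item:<22} {pattern:<14} {text!r:<16} -> {result}")
```

Note how \d{2,3} is greedy: on 4444 it takes three digits, and the leftover single 4 is too short to match again.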