Using Regexes for Batch Changes

While trudging through a very long, very repetitive project, I found myself wondering if there was a faster way to get it done. We were standardizing on a trailing slash for all of our internal links to cut out redirects and reduce load time. This meant that I had to look through each HTML file, find any links that didn't end with "/", and add it. Did I mention this particular site had tens of thousands of pages?

Regex to the Rescue

I started by defining the pattern for every link that could be missing that slash. I had just been doing it by hand, so the rules were easy to identify:

  1. With our framework, links could either be inside of href="" or data-content="" attributes.
  2. Change only relative paths (no external links need to be affected).
  3. Ignore relative paths that already have a trailing slash.
  4. Ignore images and PDFs.

The resulting regular expression was:

((data-content|href)\=\"\/\S+([^\.png|^\.jpg|^\.pdf|^\/])\")
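To see what the pattern picks up outside the editor, here is a minimal Python sketch run against some made-up sample HTML. One caveat it exposes: the bracketed part at the end is a character class, so it excludes single characters (".", "p", "n", "g", "/" and so on) rather than whole extensions. A link whose last character is one of those letters, such as "/blog", is therefore skipped. The second pattern, STRICT, is only an assumption about one way to express rule 4 with a negative lookahead; it is not the pattern used in the workflow described here, and it does not handle query strings or #fragments.

```python
import re

# The pattern from the article, unchanged.
ORIGINAL = re.compile(r'((data-content|href)\=\"\/\S+([^\.png|^\.jpg|^\.pdf|^\/])\")')

# Hypothetical stricter variant (an assumption, not the article's pattern):
# a relative path that does not end in "/" and whose value does not end in
# .png, .jpg or .pdf.
STRICT = re.compile(
    r'(data-content|href)="(?![^"]*\.(?:png|jpg|pdf)")(/[^"\s]*[^"\s/])"'
)

# Made-up sample markup covering each rule in the list above.
SAMPLE = '''
<a href="/about">About</a>
<a href="/blog">Blog</a>
<a href="/docs/">Docs</a>
<a href="https://example.com/page">External</a>
<img data-content="/img/logo.png">
<div data-content="/pricing/plans"></div>
'''

def matched_links(pattern, html):
    """Return the full text of every attribute the pattern matches."""
    return [m.group(0) for m in pattern.finditer(html)]

original_hits = matched_links(ORIGINAL, SAMPLE)
strict_hits = matched_links(STRICT, SAMPLE)

print(original_hits)  # "/blog" is missing: it ends in "g"
print(strict_hits)    # "/blog" is included
```

Both patterns skip the external link, the image, and the link that already ends in a slash. The only difference on this sample is "/blog".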

I felt particularly clever now that my editor could find the links I needed to change, but I still thought I could improve the workflow. Wasn't there a way to make the edits all at once?

Yes, there is a way.

I plugged the regular expression into the "Find" tool with regular expressions turned on, which highlighted every link that was missing a trailing slash. Then, using a multi-cursor selection, I selected all of the matches (in VS Code, that's ALT+ENTER). I moved all the cursors over by one character (the closing "), typed a slash, saved, and called it a day.
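For anyone who would rather script the change than drive it with multi-cursor editing, here is a rough Python equivalent. It is a sketch under assumptions, not the workflow described above: the function names are hypothetical, and rewrite_tree assumes UTF-8 files under a directory you pass in. It reuses the same pattern and inserts the slash just before the closing quote of each match.

```python
import re
from pathlib import Path

# Same pattern as in the article; group 1 is the whole attribute,
# including the closing quote.
LINK = re.compile(r'((data-content|href)\=\"\/\S+([^\.png|^\.jpg|^\.pdf|^\/])\")')

def add_trailing_slashes(html: str) -> str:
    """Insert "/" just before the closing quote of every matched attribute."""
    return LINK.sub(lambda m: m.group(1)[:-1] + '/"', html)

def rewrite_tree(root: str) -> int:
    """Rewrite every .html file under root; return how many files changed.

    Hypothetical helper: run it on a copy or under version control first.
    """
    changed = 0
    for path in Path(root).rglob("*.html"):
        old = path.read_text(encoding="utf-8")
        new = add_trailing_slashes(old)
        if new != old:
            path.write_text(new, encoding="utf-8")
            changed += 1
    return changed

before = '<a href="/about">About</a> <a href="/docs/">Docs</a>'
after = add_trailing_slashes(before)
print(after)
```

Running the function a second time leaves the text unchanged, because links that already end in a slash no longer match.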

The more I've spotted patterns in what I'm editing, the more powerful regexes become for making batch changes. One last example: regex groups.

Group Project

There was a site I worked on that needed translation into six languages to serve its global customers. The problem was that we always had mobile issues with the text when it came back translated into French. Any time there was a product name or some phrase that was in English, the translators put a non-breaking space between the words. That's fine for two words, but the product names we were dealing with could be up to seven words long, which made the text punch through its containers every time on smaller screens (sometimes even at desktop sizes).

Instead of going through and manually deleting the &nbsp; and inserting a space between every word in the HTML file, I discovered regex groups.

I started out by finding the pattern constraints:

  1. Keep non-breaking spaces next to punctuation (French typography puts a space before colons, semicolons, question marks, and exclamation marks, and inside quotation marks).
  2. Find all non-breaking spaces between letters.

Here's the regex:

([a-zA-Z])&nbsp;([a-zA-Z])

The parentheses that you see in the regex above are unnecessary if you're just trying to find all the &nbsp; between characters, but I wanted to replace those &nbsp; with normal spaces. By "grouping" the characters on both sides of the &nbsp; with parentheses, I could now enter this into my "Replace" tool:

$1 $2

Groups work almost like variables: you refer to each one as $ plus a number, determined by the order of the groups from left to right.
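The same find-and-replace can be sketched in Python, assuming the non-breaking spaces appear in the HTML as the &nbsp; entity. Python's re module writes group references as \1 and \2 where the editor's "Replace" box uses $1 and $2. The function name and the sample sentence are made up for illustration.

```python
import re

# A letter, the &nbsp; entity, then another letter. Each letter is captured
# in its own group so it can be put back in the replacement.
NBSP_BETWEEN_LETTERS = re.compile(r'([a-zA-Z])&nbsp;([a-zA-Z])')

def unglue_words(html: str) -> str:
    """Swap &nbsp; for a normal space, but only between two letters."""
    # \1 and \2 are Python's spelling of the editor's $1 and $2.
    return NBSP_BETWEEN_LETTERS.sub(r'\1 \2', html)

sample = 'Essayez&nbsp;Acme&nbsp;Pro&nbsp;Max&nbsp;: maintenant&nbsp;!'
result = unglue_words(sample)
print(result)
```

The &nbsp; before the colon and the exclamation mark survives, because the character after it is not a letter. One thing to watch for: each match consumes the letter on either side, so a single-letter word sitting between two &nbsp; would need a second pass.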

Now, in just a couple of keystrokes, the entire file was rid of any unnecessary &nbsp;. What used to take 10–20 minutes now took seconds.

Keep exploring the uses of regular expressions for batch changes. This is the easiest way to cut some serious time off of your workflow. Let me know in the comments below what batch-change tricks you're using regexes for.