[JavaScript] Detecting Hyphens and Dashes with Regular Expressions Using unicode-regex + Dash_Punctuation
Tadashi Shigeoka · Tue, November 5, 2019
I’ll introduce how to detect delimiter characters like hyphens and dashes with regular expressions using unicode-regex + Dash_Punctuation in JavaScript and Node.js.
Prerequisites: Want to Comprehensively Detect Hyphens and Dashes with Regular Expressions
There are many types of ”―” characters, and to comprehensively detect these input values with regular expressions, I decided to use the npm package unicode-regex.
Examples:
- ― Dash
- − Hyphen Minus
- ‐ Hyphen
Dashes and Hyphens in Google Japanese Input
Even in Google Japanese Input conversion candidates, this many types are displayed on the first page.
unicode-regex + Dash_Punctuation
Sample Code: Regular Expression to Detect Hyphens and Dashes
const unicodeRegex = require('unicode-regex');
const dashRegex = unicodeRegex({
General_Category: ['Dash_Punctuation']
});
console.log(dashRegex.toString());
// [\\u002d\\u058a\\u05be\\u1400\\u1806\\u2010-\\u2015\\u2e17\\u2e1a\\u2e3a-\\u2e3b\\u2e40\\u301c\\u3030\\u30a0\\ufe31-\\ufe32\\ufe58\\ufe63\\uff0d]
Reference: List of Unicode Characters of Category “Dash Punctuation”
That’s all from the Gemba.