Learn how to create and use Hunspell-compatible dictionaries in Vale.
Hunspell is a spell-checking engine known for its flexibility and support
for complex morphological rules. It powers spell-checking in popular
applications like LibreOffice, Mozilla Firefox, and Google Chrome.
Vale uses Hunspell-compatible dictionaries to power its own spell-checking features. This guide covers the basics of creating and using these
dictionaries.
You can find more thorough documentation in the official Hunspell repository.
There’s also a well-documented Python port of the library called spylls.
Vale doesn’t use Hunspell directly and doesn’t require it to be installed on
your system.
Instead, Vale uses a pure-Go package to parse Hunspell-compatible dictionaries
and check the spelling of words. This package supports a (growing) subset of Hunspell’s features.
A Hunspell-compatible dictionary consists of two files:
Affix (.aff) file: This file defines the morphological rules, including prefixes, suffixes, and other language-specific grammar rules that govern how words are formed.
Dictionary (.dic) file: This file contains the list of root words and their associated affix codes to specify valid transformations.
You can name these files whatever you like, as long as the .aff and .dic files share the same base name (for example, en_US.aff and en_US.dic).
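Because the two files are matched by name, a loader only needs the shared base name to find both halves of a dictionary. The Go sketch below illustrates that convention with a hypothetical `pairDictionaries` helper (not part of Vale, and not how Vale's loader is necessarily written): it takes a list of filenames and returns the base names for which both an .aff and a .dic file are present.

```go
package main

import (
	"fmt"
	"path/filepath"
	"sort"
	"strings"
)

// pairDictionaries returns the base names (such as "en_US") that have both
// an .aff and a .dic file in filenames. Hypothetical helper for illustration.
func pairDictionaries(filenames []string) []string {
	seen := map[string]map[string]bool{} // base name -> extensions found
	for _, f := range filenames {
		ext := filepath.Ext(f)
		if ext != ".aff" && ext != ".dic" {
			continue // not part of a Hunspell dictionary
		}
		base := strings.TrimSuffix(filepath.Base(f), ext)
		if seen[base] == nil {
			seen[base] = map[string]bool{}
		}
		seen[base][ext] = true
	}
	var names []string
	for base, exts := range seen {
		if exts[".aff"] && exts[".dic"] {
			names = append(names, base)
		}
	}
	sort.Strings(names) // deterministic order
	return names
}

func main() {
	files := []string{"en_US.aff", "en_US.dic", "custom.dic", "notes.txt"}
	fmt.Println(pairDictionaries(files)) // [en_US]
}
```

Here custom.dic is skipped because it has no matching custom.aff.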
Here’s a minimal example of a dictionary:
```plaintext
1
software/M
```
The first line, “1”, is the number of words in the dictionary, and software/M is the root word “software” with the affix code M. This means that we accept the word
“software” and any variations derived from the affix code M.
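To make the .dic format concrete, here is a minimal Go sketch of a parser for it. The `parseDic` and `DicEntry` names are hypothetical, and the sketch assumes Hunspell's default single-character flags and ignores the optional morphological fields that real Hunspell dictionaries can append after a word. Hunspell describes the first line as an approximate word count used for sizing, so the sketch checks that it is a number but doesn't enforce it.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// DicEntry is a root word plus the affix flags attached to it.
type DicEntry struct {
	Word  string
	Flags string // e.g. "M"; assumes single-character flags
}

// parseDic parses .dic content: a word count line, then one word[/flags] per line.
func parseDic(content string) ([]DicEntry, error) {
	sc := bufio.NewScanner(strings.NewReader(content))
	if !sc.Scan() {
		return nil, fmt.Errorf("empty dictionary")
	}
	// The count is only a size hint, so validate it but don't enforce it.
	if _, err := strconv.Atoi(strings.TrimSpace(sc.Text())); err != nil {
		return nil, fmt.Errorf("first line must be a word count: %w", err)
	}
	var entries []DicEntry
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" {
			continue
		}
		// Keep only the first field; drop any morphological data after whitespace.
		word, flags, _ := strings.Cut(strings.Fields(line)[0], "/")
		entries = append(entries, DicEntry{Word: word, Flags: flags})
	}
	return entries, sc.Err()
}

func main() {
	entries, err := parseDic("1\nsoftware/M\n")
	if err != nil {
		panic(err)
	}
	fmt.Println(entries) // [{software M}]
}
```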
Our affix file would look like this:
```plaintext
SET UTF-8
SFX M Y 1
SFX M 0 's .
```
SFX M Y 1: This line is the header for the suffix (SFX) rules attached to the affix code M. The Y indicates that these suffixes can be combined with prefixes (Hunspell calls this "cross product"), and the 1 indicates that one rule line follows.
SFX M 0 's .: This line defines the rule itself. It says that if a word has
the affix code M, we can add 's to the end of the word. The 0 indicates
that no part of the base word is removed when applying this suffix. The . indicates that there are no conditions for applying this rule.
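The following Go sketch shows how a rule line like `SFX M 0 's .` can be turned into a transformation. The names are hypothetical, and it assumes the condition uses only syntax that Go's `regexp` package also understands (`.`, literal letters, and bracket classes such as `[^aeiou]`), which covers common Hunspell conditions but not every edge case. Any continuation flags after a `/` in the affix field are ignored.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// SuffixRule models one "SFX flag strip add condition" line from an .aff file.
type SuffixRule struct {
	Flag      string
	Strip     string         // removed from the end of the root ("0" in the file means nothing)
	Add       string         // appended after stripping ("0" in the file means nothing)
	Condition *regexp.Regexp // must match the end of the root
}

// parseSFXRule parses a rule line such as "SFX M 0 's .".
func parseSFXRule(line string) (SuffixRule, error) {
	f := strings.Fields(line)
	if len(f) < 5 || f[0] != "SFX" {
		// Header lines like "SFX M Y 1" have only four fields.
		return SuffixRule{}, fmt.Errorf("not an SFX rule line: %q", line)
	}
	zeroToEmpty := func(s string) string {
		if s == "0" {
			return ""
		}
		return s
	}
	add, _, _ := strings.Cut(f[3], "/") // drop continuation flags, if any
	cond, err := regexp.Compile("(?:" + f[4] + ")$")
	if err != nil {
		return SuffixRule{}, fmt.Errorf("unsupported condition %q: %w", f[4], err)
	}
	return SuffixRule{Flag: f[1], Strip: zeroToEmpty(f[2]), Add: zeroToEmpty(add), Condition: cond}, nil
}

// Apply returns the derived form of root, or false if the rule doesn't apply.
func (r SuffixRule) Apply(root string) (string, bool) {
	if !r.Condition.MatchString(root) || !strings.HasSuffix(root, r.Strip) {
		return "", false
	}
	return strings.TrimSuffix(root, r.Strip) + r.Add, true
}

func main() {
	rule, err := parseSFXRule("SFX M 0 's .")
	if err != nil {
		panic(err)
	}
	form, ok := rule.Apply("software")
	fmt.Println(form, ok) // software's true
}
```

A rule with a non-empty strip field works the same way. For example, a hypothetical rule `SFX D y ied [^aeiou]y` would turn "try" into "tried" but leave "play" alone, because its condition requires a consonant before the final y.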
The end result is that the dictionary will accept both “software” and
“software’s”. Other variations like “softwares” or “softwaring” will be
rejected.
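Hunspell-style checkers generally don't enumerate every derived form up front. Instead, they strip a candidate affix from the word being checked and look up the remaining root. The Go sketch below applies that idea to the example files, with the dictionary entry and the single M rule hard-coded rather than parsed. It's a simplified illustration (suffixes only, at most one suffix per word), not Vale's implementation; `check`, `roots`, and `suffixes` are hypothetical names.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// suffix is one SFX rule, hard-coded from the example .aff file.
type suffix struct {
	flag  string
	strip string         // "" because the .aff file says 0
	add   string         // text appended to the root
	cond  *regexp.Regexp // condition anchored to the end of the root
}

// roots maps each root word in the .dic file to its affix flags.
var roots = map[string]string{"software": "M"}

var suffixes = []suffix{
	{flag: "M", strip: "", add: "'s", cond: regexp.MustCompile(`.$`)},
}

// check reports whether word is a root, or a root plus one permitted suffix.
func check(word string) bool {
	if _, ok := roots[word]; ok {
		return true
	}
	for _, s := range suffixes {
		if !strings.HasSuffix(word, s.add) {
			continue
		}
		// Undo the suffix: remove what was added, restore what was stripped.
		root := strings.TrimSuffix(word, s.add) + s.strip
		flags, ok := roots[root]
		if ok && strings.Contains(flags, s.flag) && s.cond.MatchString(root) {
			return true
		}
	}
	return false
}

func main() {
	for _, w := range []string{"software", "software's", "softwares", "softwaring"} {
		fmt.Printf("%-12s %v\n", w, check(w))
	}
}
```

The sketch uses the straight apostrophe from the .aff file, so a word typed with a curly ’ wouldn't match this particular rule.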