Archive
This section stores old posts from different sections of the journal.
Design
Learning Pieces
Improvements
Use a Trie to store word frequency (DONE!)
Concerned files
- modules/utils/*
Description
Currently, we use an unordered_map to store word frequencies. Investigate
whether a better alternative exists.
Task
Evaluate the current implementation and compare it with the two proposed alternatives below.
- Use a list
- Use a trie
According to the word frequency benchmark, the alternative with the best memory-consumption / performance trade-off is the trie.
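For reference, a minimal sketch of a trie that counts word frequencies is shown here. It is not the code in modules/utils/*; the class name WordFrequencyTrie and its methods add and frequency are hypothetical, and it assumes words are plain byte strings keyed one char at a time.

```cpp
#include <cstddef>
#include <map>
#include <memory>
#include <string>

// Hypothetical sketch, not the project's actual implementation.
// Each node stores how many times the word ending at that node was added.
class WordFrequencyTrie {
public:
    // Record one occurrence of `word`, creating nodes along the path as needed.
    void add(const std::string& word) {
        Node* node = &root_;
        for (char c : word) {
            std::unique_ptr<Node>& child = node->children[c];
            if (!child) {
                child = std::make_unique<Node>();
            }
            node = child.get();
        }
        ++node->count;
    }

    // Return how many times `word` was added, or 0 if it was never seen.
    // A prefix of a stored word is not a word unless it was added itself.
    std::size_t frequency(const std::string& word) const {
        const Node* node = &root_;
        for (char c : word) {
            auto it = node->children.find(c);
            if (it == node->children.end()) {
                return 0;
            }
            node = it->second.get();
        }
        return node->count;
    }

private:
    struct Node {
        std::size_t count = 0;
        // Assumption: an ordered map per node keeps the sketch short;
        // a real implementation might use a fixed array or a compact layout.
        std::map<char, std::unique_ptr<Node>> children;
    };

    Node root_;
};
```

Words that share a prefix (for example "tea" and "team") share the nodes for that prefix, which is where a trie can save memory compared with storing every key in full.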
Create post-filter for segmenter (DONE!)
Concerned files
- modules/utils*
Description
The segmenter's role is to return a list of words from an input text. Depending on the application, we may need to filter the resulting list. E.g., we may want to ignore words with fewer than 3 characters.
Task
Delegate the filtering to the application rather than the module.
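One way this split could look is sketched here. The names segment, filter_words and min_length are hypothetical and not taken from modules/utils; segment is only a whitespace-splitting stand-in for the real segmenter, and word length is measured in bytes, which is an assumption that only holds for single-byte text.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <iterator>
#include <sstream>
#include <string>
#include <vector>

using WordList = std::vector<std::string>;
using WordPredicate = std::function<bool(const std::string&)>;

// Stand-in for the module's segmenter (assumption): split on whitespace
// and return every word, with no filtering at all.
WordList segment(const std::string& text) {
    std::istringstream in(text);
    WordList words;
    for (std::string word; in >> word;) {
        words.push_back(word);
    }
    return words;
}

// Application-side post-filter: keep only the words accepted by `keep`.
WordList filter_words(const WordList& words, const WordPredicate& keep) {
    WordList kept;
    std::copy_if(words.begin(), words.end(), std::back_inserter(kept), keep);
    return kept;
}

// Example predicate: accept words with at least `n` bytes.
WordPredicate min_length(std::size_t n) {
    return [n](const std::string& word) { return word.size() >= n; };
}
```

An application that wants to ignore words with fewer than 3 characters would call filter_words(segment(text), min_length(3)), while another application can pass a different predicate without touching the module.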