Archive

This section stores old posts from different sections of the journal.

Design

Learning Pieces

Improvements

Use a Trie to store word frequency (DONE!)

Concerned files

  1. modules/utils/*

Description

Currently, we use an unordered_map to store word frequencies. Investigate whether a better alternative exists.

Task

Evaluate the current implementation and compare it with the two proposed alternatives below.

  1. Use a list
  2. Use a trie

According to the word frequency benchmark, the trie offers the best trade-off between memory consumption and performance.
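For reference, a trie-based frequency counter could look roughly like the sketch below. This is not the code from modules/utils: the class name WordFrequencyTrie, its add and count methods, and the choice of a std::map per node are all assumptions made for illustration. Words that share a prefix share nodes, which is where the memory savings over one hash-map entry per word would come from.

```cpp
#include <cstddef>
#include <map>
#include <memory>
#include <string>

// Hypothetical trie-based word counter; all names here are illustrative
// and not taken from the project.
class WordFrequencyTrie {
public:
    // Increment the count for `word`, creating nodes along the path as needed.
    void add(const std::string& word) {
        Node* node = &root_;
        for (char c : word) {
            std::unique_ptr<Node>& child = node->children[c];
            if (!child) {
                child = std::make_unique<Node>();
            }
            node = child.get();
        }
        ++node->count;
    }

    // Return how many times `word` was added (0 if it was never seen).
    std::size_t count(const std::string& word) const {
        const Node* node = &root_;
        for (char c : word) {
            auto it = node->children.find(c);
            if (it == node->children.end()) {
                return 0;
            }
            node = it->second.get();
        }
        return node->count;
    }

private:
    struct Node {
        // One child per byte; words with a common prefix share these nodes.
        std::map<char, std::unique_ptr<Node>> children;
        // Number of times a word ending exactly at this node was added.
        std::size_t count = 0;
    };

    Node root_;
};
```

A prefix of a stored word is not itself counted unless it was added explicitly, so after adding "tea" twice, count("tea") is 2 while count("te") stays 0. The sketch requires C++14 for std::make_unique.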

Create post-filter for segmenter (DONE!)

Concerned files

  1. modules/utils*

Description

The segmenter's role is to return a list of words from an input text. Depending on the application, we may need to filter the resulting list. For example, we may want to ignore words with fewer than 3 characters.

Task

Delegate the filtering to the application rather than to the module.
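One way to keep the filtering on the application side is to let the application pass its own predicate over the segmenter's output. The sketch below is only an illustration of that idea: post_filter and WordFilter are hypothetical names, and the segmenter output is assumed to be a std::vector<std::string>.

```cpp
#include <algorithm>
#include <functional>
#include <iterator>
#include <string>
#include <vector>

// Hypothetical predicate type: returns true for words the application keeps.
using WordFilter = std::function<bool(const std::string&)>;

// Hypothetical helper: copy only the words accepted by `keep`,
// preserving their original order. The segmenter itself stays unfiltered.
std::vector<std::string> post_filter(const std::vector<std::string>& words,
                                     const WordFilter& keep) {
    std::vector<std::string> kept;
    std::copy_if(words.begin(), words.end(), std::back_inserter(kept), keep);
    return kept;
}
```

An application that wants to drop short words would then call something like post_filter(words, [](const std::string& w) { return w.size() >= 3; }). Note that size() counts bytes, so for multi-byte encodings such as UTF-8 the predicate would need a proper character count.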