Perchance tools journal
2024-06-09
- Write documentation explaining the process for creating a perchance code from a thesaurus.
2024-06-07
- Set up type checking as part of the default tox tasks.
- Add tests if appropriate (digital-notes and perchance-tools).
- Review the code and refactor it if appropriate.
2024-06-05
- Set up and push a version of digital-notes.
2024-06-04
- Add cache capabilities in llm-assistant.
  - It uses LangChain's built-in SQLiteCache (langchain_community.cache.SQLiteCache).
  - Add new parameters in the configuration file: use_cache, cache_file.
- Create a test for the custom method of the llm-assistant API.
- There is a separate cache library: cache3.
  - I might use this for word-def.
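The caching idea in this entry can be sketched with the standard library alone. This is not the LangChain SQLiteCache integration that llm-assistant uses. The `ResponseCache` class, the `cached_call` helper, and the table layout are hypothetical. Only the `use_cache` and `cache_file` names come from the configuration parameters listed above.

```python
import hashlib
import sqlite3


class ResponseCache:
    """Tiny prompt -> response cache stored in SQLite (hypothetical helper)."""

    def __init__(self, cache_file=":memory:"):
        # cache_file mirrors the configuration parameter of the same name.
        self.conn = sqlite3.connect(cache_file)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS responses "
            "(key TEXT PRIMARY KEY, response TEXT)"
        )

    @staticmethod
    def _key(model, prompt):
        # Hash model and prompt together so the same prompt sent to
        # different models is cached separately.
        return hashlib.sha256(f"{model}\n{prompt}".encode("utf-8")).hexdigest()

    def get(self, model, prompt):
        row = self.conn.execute(
            "SELECT response FROM responses WHERE key = ?",
            (self._key(model, prompt),),
        ).fetchone()
        return row[0] if row else None

    def put(self, model, prompt, response):
        self.conn.execute(
            "INSERT OR REPLACE INTO responses (key, response) VALUES (?, ?)",
            (self._key(model, prompt), response),
        )
        self.conn.commit()


def cached_call(llm, model, prompt, cache=None):
    """Call llm(prompt) unless a cached response exists.

    Passing cache=None corresponds to use_cache being disabled.
    """
    if cache is None:
        return llm(prompt)
    hit = cache.get(model, prompt)
    if hit is not None:
        return hit
    response = llm(prompt)
    cache.put(model, prompt, response)
    return response
```

Here `llm` is any callable that takes a prompt string and returns the response text, so the sketch can be tried with a stub before it is wired to a real model.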
2024-05-30
Simplify the WordDict data structure.
- The markdown-to-yml command creates a list of dicts:
    root = [
        {
            "Personnages":
            {
                "Nouns":
                {
                    "words": []
                },
                "Adjectives":
                {
                    ...
                }
            }
        },
        {
            "Places":
            {
                ...
            }
        }
    ]
  This is not necessary. Instead, we can simply have a dictionary of dictionaries.
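The simplification described in this entry could look like the sketch below. The function name `flatten_root` is hypothetical, and the contents of "Adjectives" and "Places" are placeholders because the journal elides them.

```python
def flatten_root(root):
    """Merge a list of single-category dicts into one dict keyed by category."""
    merged = {}
    for entry in root:
        for category, content in entry.items():
            if category in merged:
                # A dictionary cannot hold the same category twice,
                # so fail loudly instead of silently overwriting.
                raise ValueError(f"duplicate category: {category}")
            merged[category] = content
    return merged


# Old layout: a list of dicts (placeholder contents, not real data).
old_root = [
    {"Personnages": {"Nouns": {"words": []}, "Adjectives": {"words": []}}},
    {"Places": {"Nouns": {"words": []}}},
]

# New layout: category names are keys of a single dictionary.
new_root = flatten_root(old_root)
```

With the new layout a category is reached directly, for example `new_root["Personnages"]["Nouns"]["words"]`, instead of by scanning the list for the entry that holds it.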
2024-05-05
Set up a cache mechanism to save the GPT-4 responses. I should actually implement that in llm-assistant.
I could not continue because I don't have the full list of words on this computer and I can't access giove.
2024-04-22
Simplified the YAML format. The category names are now keys of the underlying dictionary.
2024-04-17
Created the perchance-tools repository. It currently has only one working
command: markdown-to-yml.
The correct-words command is partially working. I already have the code that
traverses the yml file and generates a prompt from it. The next step is to
connect with llm-assistant and generate a yml file with the corrections
applied.
After that, the next step is to code the translate-attributes command.
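The traversal half of correct-words might look like the following sketch. It assumes the nested-dictionary layout with word lists stored under a "words" key. The function name `iter_word_lists` and the sample data are hypothetical, not taken from the repository.

```python
def iter_word_lists(node, path=()):
    """Yield (category_path, words) for every "words" list in a nested dict."""
    for key, value in node.items():
        if key == "words" and isinstance(value, list):
            yield path, value
        elif isinstance(value, dict):
            # Descend into the sub-category, remembering how we got here.
            yield from iter_word_lists(value, path + (key,))


# Illustrative data only; the real file is much larger.
sample = {
    "Personnages": {
        "Nouns": {"words": ["donzelle", "jouvenceau"]},
        "Adjectives": {"words": ["petiot"]},
    }
}

word_lists = list(iter_word_lists(sample))
```

Each yielded pair carries the full category path, so a prompt can be generated per word list without losing track of where the corrected words have to be written back.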
2024-04-10
Decided to first create a tool in which I can create custom prompts to interact with OpenAI. The idea is to use this tool to make the corrections in the list of words.
2024-03-29
Error during creation of all.yml
I noticed that the all.yml file stopped at chronology. That also explains the unexpectedly low number of lines (~4K). I need to regenerate everything. This is a good time to stop and think of a better approach for the text correction.
I corrected the creation of the all.yml file.
Corrections made by ChatGPT
I used the original word-guru prompt. This prompt was designed to correct complete
texts, not isolated words. It does not work very well: when given a rarely used
word that is spelled correctly, the LLM tends to replace it with a similar word
that is more popular.
Examples:
- donzelle -> demoiselle
- jouvenceau -> jeunesse
- petiot -> petit
- se manier -> se manifester
- souquenille -> sacoin
It would be better to pass a context, for example by saying that the given list of words belongs to a category (vêtements, adjectives pour décrire la jeunesse...).
There is also a second advantage of doing that: The current prompt sends a lot of context for just a single word. We may get better results and reduce OpenAI consumption by passing the complete list.
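A prompt that carries the category and the complete list could be built roughly as follows. This is only a sketch: the function name `build_correction_prompt`, the wording of the instructions, the assumption that the words are French, and the "jeunesse" category label are all illustrative and not the prompt actually used.

```python
def build_correction_prompt(category, words):
    """Build one prompt that sends a whole word list with its category."""
    lines = [
        f"The following French words belong to the category: {category}.",
        "Some of them are rare but correctly spelled; leave those unchanged.",
        "Fix only genuine spelling mistakes and return one word per line.",
        "",
    ]
    # One bullet per word, in the original order.
    lines.extend(f"- {word}" for word in words)
    return "\n".join(lines)


# Words taken from the examples above, grouped under an assumed category.
prompt = build_correction_prompt("jeunesse", ["donzelle", "jouvenceau", "petiot"])
```

The instructions are sent once per list rather than once per word, which is where the expected saving in consumption comes from.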
2024-03-23
Pipeline
- Digitize the thesaurus from a specialized book.
- Format the digitization using markdown headers.
- Convert the markdown to yml.
- Translate the yml keys to English. Correct any typos in the words (in the target language).
- Convert to perchance format (also sort alphabetically).
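The markdown and Perchance ends of this pipeline can be sketched in a few lines. Both functions are hypothetical and simplified: the YAML file in the middle is skipped, Perchance output is assumed to be a list name followed by items indented two spaces per level, Perchance's own rules for list names are not enforced, and sorting uses plain code-point order, which may place accented words differently from a dictionary.

```python
def markdown_to_dict(text):
    """Parse markdown headers into nested dicts; other lines become words."""
    root = {}
    stack = [(0, root)]  # (header level, dict that belongs to that header)
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("#"):
            level = len(line) - len(line.lstrip("#"))
            name = line[level:].strip()
            # Close headers at the same or a deeper level.
            while stack[-1][0] >= level:
                stack.pop()
            node = {}
            stack[-1][1][name] = node
            stack.append((level, node))
        else:
            word = line.lstrip("- ").strip()
            stack[-1][1].setdefault("words", []).append(word)
    return root


def to_perchance(node, indent=0):
    """Render nested dicts as indented lines, words sorted alphabetically."""
    pad = "  " * indent
    lines = []
    for name, child in node.items():
        if name == "words":
            lines.extend(pad + word for word in sorted(child))
        else:
            lines.append(pad + name)
            lines.extend(to_perchance(child, indent + 1))
    return lines


# Illustrative input, not the real digitized thesaurus.
markdown = "# Personnages\n## Nouns\njouvenceau\ndonzelle\n## Adjectives\npetiot\n"
tree = markdown_to_dict(markdown)
output = "\n".join(to_perchance(tree))
```

Keeping the intermediate structure as plain nested dictionaries means the translation and correction steps can work on it before anything is rendered.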