2025-11-13 19:10:45
Yomitan is a famous dictionary app people use to learn Japanese.
JL is an alternative desktop dictionary app for Windows.
Let's jump right into it.
Yomitan takes around 6 seconds for me to check duplicates in my mining deck.

My mining deck is only 3500 cards, so if you have a larger deck I imagine it's terribly slow.
Secondly, if you take your Yomitan cursor and drag it across some Japanese text it tends to lag.

That's 6 whole seconds to load Yomitan and run a duplication check.
canAddNotes which is deck specific, Yomitan uses noteinfo which is not deck specific.
You can only use Yomitan with a browser.
Yomitan relies on local storage in the browser.
This corrupts – often.

When Yomitan corrupts you have to reinstall all of your dictionaries and settings.
It's good it's a browser extension as it can work on any platform, but it also comes with downsides.
Firstly, let's talk about speed.
JL is really, really fast.
It's actually one of the main reasons you should use it.

Look at this gif. Absolutely no lag. Not to mention that it is technically showing more dictionaries and using a longer scan length (32) than my Yomitan does.
Also, duplicate detection is blazingly fast.
Do you see that small X next to the word? That indicates I can add it to Anki.
If it's red, it means it's a duplicate.
Rewatch the gif. Look at how fast that small red X appears again. Seriously.
This takes minimum 6 seconds in Yomitan for me btw.
It's so unbelievably fast I genuinely have no idea how they are doing this.
I was hesitant to believe they're using AnkiConnect at all, maybe they're talking directly to the DB which would explain why Yomitan is so slow?
But no, they're using AnkiConnect just like Yomitan does.
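For context, an AnkiConnect duplicate check can be a single batched request. Here's a rough sketch that builds one canAddNotes payload for several candidate words; AnkiConnect answers with one boolean per note. The deck, note type and field names are placeholders, and this is not necessarily how JL or Yomitan structure their calls.

```python
import json
import urllib.request

ANKI_CONNECT_URL = "http://127.0.0.1:8765"  # AnkiConnect's default address


def build_can_add_notes(words, deck="Mining", model="Lapis", field="Expression"):
    """Build one canAddNotes request covering every candidate word.

    The deck, model and field defaults are placeholders for illustration.
    """
    notes = [
        {"deckName": deck, "modelName": model, "fields": {field: word}, "tags": []}
        for word in words
    ]
    return {"action": "canAddNotes", "version": 6, "params": {"notes": notes}}


def send(payload, url=ANKI_CONNECT_URL):
    """POST the payload to AnkiConnect (needs Anki running with the add-on)."""
    request = urllib.request.Request(url, data=json.dumps(payload).encode("utf-8"))
    with urllib.request.urlopen(request) as response:
        return json.load(response)


payload = build_can_add_notes(["飲み会", "食べる"])
print(len(payload["params"]["notes"]), "notes checked in one request")
# With Anki open, send(payload)["result"] is a list of booleans, one per note.
```

Batching every candidate into one request keeps the number of round trips to Anki at one per lookup, however many words are on screen.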
In my newbie stages where I read super slowly and I only have 3.5k Anki cards, I have to look up multiple words a sentence.
With Yomitan taking 10+ seconds for every lookup and mine, this slows me down a lot.
Look at this:

Top is my average chars / hour using Yomitan.
Bottom is with JL.
Unironically I have a +700 chars / hour buff by just using another program, simply because it's faster.
I really like this kanji dict:
But in Yomitan it kinda sucks.
It can't be an official Kanji dict, so it has to be a word dict. This is because of the format of Kanji dicts in Yomitan.
But because it's a word dict, it has to compete with all the other Yomitan dicts.
If your friend is inviting you to 飲み会 and you highlight this with Yomitan, you will have to scroll really far to find the Kanji dict for 飲.
To fix this, you have to use profiles and switch profiles per my blog post every time you want to look up a kanji with this dict.
But in JL, you don't have to do that!

In mining mode, simply highlight the kanji you want to look at and click the Kanji dict and you get to see it instantly.
You can even look up words with jmdict, and then highlight the kanji in that definition to get specific Kanji definitions.
Being able to easily click what dict you want to see is so powerful. I believe this will make it easier for monolingual transitions too.
Have JP -> JP dicts show up first, then just use child windows and switch to JP -> EN if you need to using the tabs.
You can kinda make an overlay in JL.

With this mode, the text appears as if it were overlaid on the game, but it isn't a true overlay. You can then "click through" the text itself and it will click the window behind it, allowing you to progress in the story.
You have to enable this with a hotkey:

This kinda breaks for me on some full screen visual novels.
JL works great with GSM.

Set the addresses to use port 55002 and set it to auto reconnect to WebSocket.
Done 🥳
If you have a JL window over your visual novel, GSM uses OBS Window Capture so it won't show up in your screenshots.
Does this replace GSM overlay?
Nah, not really. Overlay overlays the text perfectly on top of the game. It's 100% natural. This is a black box over your visual novel.
You can also use GSM to OCR a game or visual novel, and display it in JL.
In the settings you can enable highlight longest match.

This just makes it easier to go through the text.

There is a custom search feature, allowing you to easily Google sentences or words.

You can change it from Google to whatever you want.

I have it set to an AI prompt that just breaks down words for me. Sometimes I really struggle with accents / slang that aren't in dictionaries, so this helps a lot.
Sadly right now you can only search a specific word and not a whole sentence.
Names, places, spells, and more are very custom to the media you are consuming.
There are things like VNDB name dictionaries, but they're not perfect.
JL has custom dictionaries.
If the word you want to look up does not have a definition, right click it and add it:


Now every time you look up that word you'll see your custom definition:

This is super easy to edit later on. For example, I made a mistake here.
It's not the island name, but the name of a girl on the island.
JL stores these custom dicts in plaintext format. No JSON. Just open it and edit it!

You can even make profiles in JL and have custom dicts per profile!
JL has stats.
Not as good as GSM if I may say so ;)

But what's cool is that you can see how many times you have looked up a word.

I wish this showed up in the popup window so I knew if I should mine a word that appears often or not.

You can also control a caret in the JL window and go keyboard only.

If you enable this setting, you can also just click enter on your keyboard to advance in a visual novel.
If you hate seeing the black box in your game, you can change these settings:

Now you'll only see it when you hover over it!
JL is a Windows-only program. This is where Yomitan is still great.
You need some sort of text input event, like Textractor / Lunahook / GSM OCR.
This is where GSM still works well: it acts as a middleman between getting the text and using dictionary software.
In my opinion JL is perfect for video games / visual novels, but for other things Yomitan still reigns supreme.
Download JL from here:
Extract it and run the .exe every time you want to start JL.
I pinned it to my toolbar.
Right-click to open the settings menu etc.
When you first start JL it'll ask to download dictionaries.
Say yes, they're pretty good.
You can use Yomitan formatted dictionaries with JL.
I used some from Marv's starter pack:
Right click and click "manage audio sources"
If you are using Local Audio Server for Yomitan, enter this:
http://127.0.0.1:5050/?sources=jpod,jpod_alternate,nhk16,forvo&term={Term}&reading={Reading}
Otherwise it's the same as Yomitan.
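To show what the {Term} and {Reading} placeholders in that URL end up as, here's a small sketch that fills them in with URL-encoded values. How JL performs the substitution internally is an assumption; fill_audio_url is a made-up helper.

```python
from urllib.parse import parse_qs, quote, urlsplit

TEMPLATE = ("http://127.0.0.1:5050/?sources=jpod,jpod_alternate,nhk16,forvo"
            "&term={Term}&reading={Reading}")


def fill_audio_url(template, term, reading):
    """Replace the {Term} and {Reading} placeholders with URL-encoded values."""
    return (template
            .replace("{Term}", quote(term))
            .replace("{Reading}", quote(reading)))


url = fill_audio_url(TEMPLATE, "飲み会", "のみかい")
print(url)

# Decoding the query string gives the original term and reading back.
query = parse_qs(urlsplit(url).query)
```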
Go to the Anki tab and enable it.

Here's what I've got for Lapis card type.


That's it! Enjoy playing with JL!
2025-11-12 06:38:39
TLDR - It has the raw data
Other stat apps simply collect data such as how many characters you read and when.
They normally collect data like:
This allows them to calculate stats like:
etc
GSM collects the actual sentences you read. You don't tell GSM anything, GSM stores the actual sentences.
Specifically this data is stored:
This allows GSM to calculate all the same stats as before, but we can get some extra data like:
etc
More importantly many statistic apps use an AFK timer to work out when you're AFK.
If you don't read for say 30 seconds, it considers you AFK.
Because they do not have the raw data, they calculate it once and that's it for life.
In GSM because we have the raw data, you can change your opinion about this anytime.
Other stat apps -> One time statistics, usually without the raw data, which cannot be changed and are inflexible
GSM Stats -> Has raw data, allows you to change your opinion whenever you want about your data
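As a rough illustration of why the raw data matters: with the line timestamps on hand, the AFK timer is just a parameter you can change later. The timestamps below are made up and active_seconds is a hypothetical helper, not GSM's code.

```python
def active_seconds(timestamps, afk_seconds):
    """Sum the gaps between consecutive lines, skipping gaps longer than the AFK timer."""
    total = 0.0
    for earlier, later in zip(timestamps, timestamps[1:]):
        gap = later - earlier
        if gap <= afk_seconds:
            total += gap
    return total


# Raw line timestamps in seconds (made-up data).
line_times = [0, 10, 25, 70, 80, 300, 310]

for afk in (30, 60, 300):
    print(f"AFK timer {afk:>3}s -> {active_seconds(line_times, afk):.0f}s of reading")
```

An app that only stored the 30 second result could never produce the other two numbers.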
2025-11-11 14:46:36
I've been playing with Poe recently:
This is Yomitan for Android but anywhere on the screen.

It's really easy to install. You just have to:
Actually I have no clue what dictionary is used lol. For sure there is no proper noun / name dictionary though.
I got this to help me understand complicated place names as I live in Japan and Google Maps is hard here if you don't know Kanji
I don't know why they only have a singular dictionary
May as well get it. It's free, it's the only app that does this with such an easy install (no termux etc needed).
if you are a developer please build something that works just as easily but is more like yomitan thanks :)
2025-11-11 07:24:56
I use this Anki Addon to reorder my Japanese cards
Specifically I want to reorder them based on 2 things:
This is my current config file:
{
  "normal_prioritization": null,
  "normal_search": "deck:例文マイニング",
  "priority_cutoff": null,
  "priority_limit": null,
  "priority_search": [
    "deck:例文マイニング added:2",
    "deck:例文マイニング occurrences:reflectionblue>5"
  ],
  "priority_search_mode": "sequential",
  "reorder_before_sync": true,
  "search_fields": {
    "expression_field": "Expression",
    "expression_reading_field": "ExpressionReading"
  },
  "shift_existing": true,
  "sort_field": "Frequency",
  "sort_reverse": false
}
Reflection Blue is a Jiten Yomitan freq dict
To get a freq dict go here:

Download deck.
Yomitan Occurrences.

Add this to the user_files folder in the addon (see GitHub readme)
The name reflectionblue in the config comes from the folder name.

Capitalisation doesn't matter here.
The folder should contain the unzipped Yomitan occurrence dictionary.
Inside the folder it should look like:

Since publishing I changed my config, but it's pretty much exactly the same.
{
  "normal_prioritization": null,
  "normal_search": "deck:例文マイニング",
  "priority_cutoff": null,
  "priority_limit": null,
  "priority_search": [
    "deck:例文マイニング added:7 occurrences:reflectionblue>30",
    "deck:例文マイニング added:2",
    "deck:例文マイニング occurrences:reflectionblue>30",
    "deck:例文マイニング occurrences:steinsgate>30",
    "deck:例文マイニング occurrences:limelight>30"
  ],
  "priority_search_mode": "sequential",
  "reorder_before_sync": true,
  "search_fields": {
    "expression_field": "Expression",
    "expression_reading_field": "ExpressionReading"
  },
  "shift_existing": true,
  "sort_field": "Frequency",
  "sort_reverse": false
}
If I have seen a word in the last 7 days and it's highly frequent, prioritise that.
Else prioritise words added in the last 2 days.
Else prioritise my visual novels I want to play / am playing.
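Here's how I'd sketch that "if / else / else" ordering in code: each card lands in the first priority rule it matches, the rules are emitted in config order, and everything else follows. This is a simplified reading of the config, not the addon's actual implementation; the card fields and the sort inside each group are assumptions.

```python
def reorder(cards, priority_rules):
    """Order cards: first matching priority rule wins, the rest come last.

    Each group is sorted by frequency rank (lower rank first).
    """
    buckets = [[] for _ in priority_rules]
    rest = []
    for card in cards:
        for index, rule in enumerate(priority_rules):
            if rule(card):
                buckets[index].append(card)
                break
        else:
            rest.append(card)
    ordered = []
    for group in buckets + [rest]:
        ordered.extend(sorted(group, key=lambda c: c["frequency"]))
    return ordered


# Stand-ins for the first three priority_search entries.
rules = [
    lambda c: c["added_days_ago"] <= 7 and c["occurrences"] > 30,
    lambda c: c["added_days_ago"] <= 2,
    lambda c: c["occurrences"] > 30,
]

cards = [
    {"word": "A", "added_days_ago": 20, "occurrences": 3, "frequency": 500},
    {"word": "B", "added_days_ago": 1, "occurrences": 2, "frequency": 9000},
    {"word": "C", "added_days_ago": 5, "occurrences": 80, "frequency": 1200},
    {"word": "D", "added_days_ago": 40, "occurrences": 45, "frequency": 3000},
]
print([c["word"] for c in reorder(cards, rules)])  # ['C', 'B', 'D', 'A']
```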
I do 20 new cards a day currently, but sometimes I mine say 40 cards a day. I try to only mine things I know already.
2025-11-10 20:06:34
It's nearly been 2 years since I started Japanese, and 3 months since my last Japanese update:
Firstly, let's look at Anki!
In my August 2025 update I had 928 mature cards. I was doing 20 new cards a day.
3 months later and I have:

Almost 3000 more mature cards! That's 34 cards matured a day.
This comes down to reading heavily.
Due to reading way more, my retention is a lot better.
Here's my retention almost 1 year ago:

And now...

Generally speaking my Young cards sit around 77% and my mature cards around 93%
You can see my retention is decreasing, likely because I am encountering harder words now and 20 words / day is a bit of an insane pace.

My average difficulty is 39%
Last update in August I speedran N5 and was doing N4:

And since then:

I've completed N4, and am halfway through N3 grammar!
Since August I started using GSM heavily:
I've played 8 total Visual Novels and I am working on my first visual novel of length 1 milly characters.
Here are my stat pages:
I also ended up contributing to GSM heavily, such as the entire stats page.

I also added this goals page to track my daily reading and show me how much I need to read to achieve my arbitrary goals.

My daily routine is this:
Everything after this is optional.
Sometimes I watch anime or YouTube, sometimes I don't do any more, sometimes I read a visual novel more.
On Sundays I go through all my leech cards in Anki and if I see interesting Kanji I add them using the Kanji dict I talked about:
2025-10-19 10:35:33
Over the course of a weekend in-between job interviews I decided to speed up the loading of statistics in one of my favourite apps, GSM.
GSM is an application designed to make it easy to turn games into flashcards. It records the screen with OBS and uses OCR / Whisper to get text from it. You then hover over a word with a dictionary, click "add to Anki" and GSM sends the full spoken line from the game + a gif of the game to your Anki card.

GSM has a statistics page, contributed by me. Every time you read something in-game, it adds it to a database, which I then generate statistics from.




Some stats for ya
These stats take a while to load.
And I added /overview because /stats was too slow!
.exe, that serves Flask entirely locally. These times are absurd for a local app!
This blog post talks about how I spent my weekend improving the loading speed of the website by around 200%.

The entire database is one very long table called game_lines.
Every single time a game produces a line of text, that is recorded in game_lines with some statistics.
Each line looks like this:
e726c5f5-7d59-11f0-b39e-645d86fdbc49 NEKOPARA vol.3 「もう1回、同じように……」 C:\Users\XXX\AppData\Roaming\Anki2\User 1\collection.media\GSM 2025-08-20 10-35-15.avif C:\Users\XXX\AppData\Roaming\Anki2\User 1\collection.media\NEKOPARAvol.3_2025-08-20-10-35-28-515.opus 1755648553.21247 ebd4b051-27aa-4957-9b50-3495d1586ec1
Or in a more readable version:
🗂 Entry ID: e726c5f5-7d59-11f0-b39e-645d86fdbc49
🕒 Timestamp: 2025-08-20 10:35:15
🔊 Audio: NEKOPARAvol.3_2025-08-20-10-35-28-515.opus
🖼 Screenshot: GSM 2025-08-20 10-35-15.avif
🦜 Game Line: "「もう1回、同じように……」"
📁 File Paths:
C:\XXX\AppData\Roaming\Anki2\User 1\collection.media\GSM 2025-08-20 10-35-15.avif
C:\XXX\AppData\Roaming\Anki2\User 1\collection.media\NEKOPARAvol.3_2025-08-20-10-35-28-515.opus
🧩 Original Game Name: NEKOPARA vol.3
🧠 Translation Source: NULL
🪶 Internal Ref ID: ebd4b051-27aa-4957-9b50-3495d1586ec1
📆 Epoch Timestamp: 1755648553.21247
Then to calculate statistics, we query every gameline.
For me this takes around 10 seconds.
If you play a lot of games it can take around 1 minute....
All the statistics you have seen so far are calculated from this data alone. There are some easy things like:

But in the Japanese learning community there is 1 important bit of data everyone wants.
How many characters do I read per hour on average? What is my reading speed?
This is important because we know how many characters are in a game, so if we know our reading speed we can work out how much of a slog it will be.
On a site like Jiten.moe we can insert our reading speed into the settings and see how long it'll take to read something.
At my very nooby reading speed of 2900 characters / hour, it'll take me 550 hours of non-stop reading to play Fate/Stay Night.
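The arithmetic behind that estimate is just characters divided by speed. A tiny sketch, where the character count is only what 550 hours at 2900 characters / hour implies (not an official figure) and hours_to_finish is a made-up helper:

```python
def hours_to_finish(total_characters, chars_per_hour):
    """Estimated reading time in hours at a constant reading speed."""
    return total_characters / chars_per_hour


# Character count implied by the numbers above: 550 hours * 2900 chars/hour.
implied_characters = 550 * 2900

print(hours_to_finish(implied_characters, 2900))  # 550.0
print(hours_to_finish(implied_characters, 5800))  # 275.0 (double the speed, half the time)
```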

Although this is one of the most famous visual novels of all time and has been made into numerous anime, spending 550 hours slogging through it does not seem good.
Knowing my reading speed allows me to pick games / visual novels that I can do in a few weeks rather than a year or more.
Now looking at our data there is no easy way to calculate this, right? Games do not tell you "Oh yeah in this Call of Duty dialogue you read at this pace".
Other similar sites like ExStatic calculate this:

But interestingly they have sorta the same data as us.
But let's say we get 4 game lines come in. Each one is of length 10.
They come in every 15 minutes.
So our average reading speed is 40 characters per hour.
But then the next day, 24 hours later, we read another line of 10 characters.
Now our average reading speed is skewed to be much lower because in our code it looks like it took us 24 hours to read 10 characters.
The absence of data is data itself here, but how is everyone in the Japanese learning community handling this?

Everyone sets an AFK Timer.
If you do not get a new line within the timer, it assumes you are AFK and stops counting towards your stats.
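Here's the example from above as a sketch: four 10-character lines arriving every 15 minutes, then a fifth line 24 hours later. It assumes the session starts at time 0 and that gaps longer than the AFK timer are dropped entirely (some tools may cap them instead); reading_speed is a hypothetical helper and the 20 minute timer is picked only so the toy data works.

```python
def reading_speed(timestamps, lengths, session_start=0.0, afk_seconds=None):
    """Characters per hour; gaps longer than afk_seconds count as AFK and are ignored."""
    active = 0.0
    previous = session_start
    for ts in timestamps:
        gap = ts - previous
        if afk_seconds is None or gap <= afk_seconds:
            active += gap
        previous = ts
    if active == 0:
        return 0.0
    return sum(lengths) / (active / 3600)


day_one = [900, 1800, 2700, 3600]       # one line every 15 minutes
with_next_day = day_one + [3600 + 86400]  # plus one line 24 hours later

print(reading_speed(day_one, [10] * 4))                           # 40.0
print(reading_speed(with_next_day, [10] * 5))                     # 2.0
print(reading_speed(with_next_day, [10] * 5, afk_seconds=1200))   # 50.0
```

Without the timer, one overnight break drags the average from 40 down to 2 characters / hour.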
This may seem uninteresting now, but this powers many of our design choices later on.
We have a couple of things we can do to speed up the loading of the stats site.
Currently we get all game lines multiple times and calculate the stats that way. It's not as clear cut as 1 bar graph == 1 DB call. It's more like one section grabs all game_line and alters it to work for that section.
This makes a lot of sense, but sadly it doesn't work so well.
I've already tried this:
Ignore my bad PR etiquette. We talked more about this in the Discord. I don't want to write conventional commits + nice PRs for a very niche tool 😅
Firstly, it only saves around a second of time. We still have to pull all the game lines no matter what.
Secondly, this makes it much harder and more rigid to calculate statistics. We had one API call, and then we calculated every possible statistic out of that one call and put it into a dictionary.
It's a bit... hardcore...
We basically had one 1200 line function which calculated every stat and then fed it to each statistic.
We could have broken it up, but to save 1 second of time only? For all that work? Surely there's a faster way.
We've already done this as a little hack. We moved many important statistics from the /statistics page to the /overview page.
This improves loading because instead of loading every stat, we now only load important ones.

Obviously a hack... but it worked.... Load speed went from 7 seconds to 4... Still bad... 🤢
Do we really need to calculate stats on the fly?
What if we were to pre-calculate all of our statistics and then present them to the user?
The final option, pre-calculating stats, is what we will be doing.
Every time GSM runs, let's pre-calculate all previous days' stats for the user and then calculate just today's.
This will save us a lot of time.
Specifically our algorithm will now look like:
"Why calculate today's stats on the fly at all? Why not turn each game_line into a rolled up stats and add it to today's rollup?"
By Jove, a great question!
When GSM receives a line of text from a game it does a lot of processing to make it appear on screen etc, so why not precalculate stats there and then?
This makes a lot of sense!
BUTTTT.....
The absence of data is data!
Each game line looks exactly like this:
🗂 Entry ID: e726c5f5-7d59-11f0-b39e-645d86fdbc49
🕒 Timestamp: 2025-08-20 10:35:15
🔊 Audio: NEKOPARAvol.3_2025-08-20-10-35-28-515.opus
🖼 Screenshot: GSM 2025-08-20 10-35-15.avif
🦜 Game Line: "「もう1回、同じように……」"
📁 File Paths:
C:\XXX\AppData\Roaming\Anki2\User 1\collection.media\GSM 2025-08-20 10-35-15.avif
C:\XXX\AppData\Roaming\Anki2\User 1\collection.media\NEKOPARAvol.3_2025-08-20-10-35-28-515.opus
🧩 Original Game Name: NEKOPARA vol.3
🧠 Translation Source: NULL
🪶 Internal Ref ID: ebd4b051-27aa-4957-9b50-3495d1586ec1
📆 Epoch Timestamp: 1755648553.21247
In the moment this imaginary rollup function only has this data.
When we calculate stats, we are looking at the past. We can see where the absences are to calculate the AFK time.
But in the moment, we don't know if the next game line will be 120 seconds or more later.
So therefore we cannot roll up today's stats because we cannot tell when a user takes an extended break away from the text or not.
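So the split ends up being: roll up every completed day once, and leave today to be calculated on demand. A minimal sketch of the first half, assuming game lines are (epoch timestamp, text) pairs, days are cut in UTC, and every character counts (the real code may filter punctuation and use local time). roll_up_completed_days is a made-up name.

```python
from collections import defaultdict
from datetime import date, datetime, timezone


def roll_up_completed_days(game_lines, today):
    """Group (epoch_timestamp, text) pairs by day, skipping anything from today."""
    rollups = defaultdict(lambda: {"total_lines": 0, "total_characters": 0})
    for timestamp, text in game_lines:
        day = datetime.fromtimestamp(timestamp, tz=timezone.utc).date()
        if day >= today:
            continue  # today is still in progress, so it is calculated live instead
        rollups[day]["total_lines"] += 1
        rollups[day]["total_characters"] += len(text)
    return dict(rollups)


game_lines = [
    (1755648553.21247, "「もう1回、同じように……」"),  # 2025-08-20 (UTC)
    (1755648600.0, "「うん」"),                          # 2025-08-20 (UTC)
    (1755734953.0, "「おはよう」"),                      # 2025-08-21 (UTC), i.e. "today"
]

rollups = roll_up_completed_days(game_lines, today=date(2025, 8, 21))
print(len(rollups), "completed day(s) rolled up")
```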
The next big question is "okay, what do we actually calculate?"
There are 2 types of stats:
I made an original list, booted up Claude and asked it to confirm my list and see if it thinks anything else is important.
Together we made this list:
_fields = [
'date', # str — date
'total_lines', # int — total number of lines read
'total_characters', # int — total number of characters read
'total_sessions', # int — number of reading sessions
'unique_games_played', # int — distinct games played
'total_reading_time_seconds', # float — total reading time (seconds)
'total_active_time_seconds', # float — total active reading time (seconds)
'longest_session_seconds', # float — longest session duration
'shortest_session_seconds', # float — shortest session duration
'average_session_seconds', # float — average session duration
'average_reading_speed_chars_per_hour', # float — average reading speed (chars/hour)
'peak_reading_speed_chars_per_hour', # float — fastest reading speed (chars/hour)
'games_completed', # int — number of games completed
'games_started', # int — number of games started
'anki_cards_created', # int — Anki cards generated
'lines_with_screenshots', # int — lines that include screenshots
'lines_with_audio', # int — lines that include audio
'lines_with_translations', # int — lines that include translations
'unique_kanji_seen', # int — unique kanji encountered
'kanji_frequency_data', # str — kanji frequency JSON
'hourly_activity_data', # str — hourly activity (JSON)
'hourly_reading_speed_data', # str — hourly reading speed (JSON)
'game_activity_data', # str — per-game activity (JSON)
'games_played_ids', # str — list of game IDs (JSON)
'max_chars_in_session', # int — most characters read in one session
'max_time_in_session_seconds', # float — longest single session (seconds)
'created_at', # float — record creation timestamp
'updated_at' # float — last update timestamp
]
Then using this list we can calculate stats like:
We don't need to calculate every single thing, just have enough data to calculate it all in the moment.
If we calculate things like total_active_time_seconds / total_sessions the abstraction becomes kinda too much.
Like come on, we don't need a whole database column just to divide two numbers 😂
In GSM you can also see your stats data in a date range:

So we have all these columns, and each row is 1 day of stats. That way we can easily calculate stats for any date range.
And we just need a special case for today to calculate today's stats.
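A sketch of what that looks like when serving a date range: sum the daily rows, then derive the averages at query time instead of storing them. The column names come from the list above; the numbers are made up, today's row is assumed to have been calculated live and appended, and stats_for_range is a hypothetical helper.

```python
def stats_for_range(daily_rows, start, end):
    """Sum rolled-up rows whose ISO date falls in [start, end] and derive averages."""
    rows = [r for r in daily_rows if start <= r["date"] <= end]
    total_chars = sum(r["total_characters"] for r in rows)
    total_active = sum(r["total_active_time_seconds"] for r in rows)
    total_sessions = sum(r["total_sessions"] for r in rows)
    return {
        "total_characters": total_chars,
        "average_session_seconds": total_active / total_sessions if total_sessions else 0.0,
        "chars_per_hour": total_chars / (total_active / 3600) if total_active else 0.0,
    }


rolled_up = [
    {"date": "2025-10-01", "total_characters": 3000,
     "total_active_time_seconds": 3600.0, "total_sessions": 2},
    {"date": "2025-10-02", "total_characters": 6000,
     "total_active_time_seconds": 7200.0, "total_sessions": 1},
]
# Today's row is calculated live from today's game lines, then treated like any other row.
today_row = {"date": "2025-10-03", "total_characters": 1500,
             "total_active_time_seconds": 1800.0, "total_sessions": 1}

print(stats_for_range(rolled_up, "2025-10-01", "2025-10-02"))
print(stats_for_range(rolled_up + [today_row], "2025-10-01", "2025-10-03"))
```

ISO date strings sort in calendar order, so plain string comparison is enough for the range filter.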
GSM is a Windows executable. Not a fully fledged server.
It could be run every couple of minutes, or run once every couple of months.
We need this code to successfully roll up stats regardless of when it runs, and we need it to be conservative in when it runs.
What we need is some kind of Cron system...
I added a new Database table called cron.
This table just stores information about tasks that GSM wants to run regularly.

We store some simple data:
Then when we start GSM, it:
SELECT * FROM {cls._table} WHERE enabled=1 AND next_run <= ? ORDER BY next_run ASC
Loop through our list and run a basic if statement to see if one of our crons needs to run:
for cron in due_crons:
    detail = {
        'name': cron.name,
        'description': cron.description,
        'success': False,
        'error': None
    }
    try:
        if cron.name == 'jiten_sync':
            from GameSentenceMiner.util.cron.jiten_update import update_all_jiten_games
            result = update_all_jiten_games()
            # Mark as successfully run
            CronTable.just_ran(cron.id)
            executed_count += 1
            detail['success'] = True
            detail['result'] = result
            logger.info(f"✅ Successfully executed {cron.name}")
            logger.info(f" Updated: {result['updated_games']}/{result['linked_games']} games")
    # ... (the rest of the try block is not shown here)
If it needs to run, we import that file (cron files are just python files we import and run. It's really simple)
We then run the command just_ran.
This command:
Sets last_run to current time
Sets next_run based on the schedule type (weekly, monthly etc)
if cron.schedule == 'once':
    # For one-time jobs, disable after running
    cron.enabled = False
    cron.next_run = now  # Set to now since it won't run again
    logger.debug(f"Cron job '{cron.name}' completed (one-time job) and has been disabled")
elif cron.schedule == 'daily':
    next_run_dt = now_dt + timedelta(days=1)
    cron.next_run = next_run_dt.timestamp()
    logger.debug(f"Cron job '{cron.name}' completed, next run scheduled for {next_run_dt}")
elif cron.schedule == 'weekly':
    next_run_dt = now_dt + timedelta(weeks=1)
    cron.next_run = next_run_dt.timestamp()
    logger.debug(f"Cron job '{cron.name}' completed, next run scheduled for {next_run_dt}")
This is just a super simple way to make GSM run tasks on a schedule without running every single time the app starts.
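To see the whole loop in one place, here's a self-contained sketch using an in-memory SQLite database. The table layout is a guess based on the fields used in the snippets above, daily_rollup is a made-up job name, and the functions are simplified stand-ins rather than GSM's actual CronTable.

```python
import sqlite3
import time

SCHEDULE_SECONDS = {"daily": 86400, "weekly": 7 * 86400}

db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE cron (id INTEGER PRIMARY KEY, name TEXT, schedule TEXT, "
    "enabled INTEGER, last_run REAL, next_run REAL)"
)


def due_crons(now):
    """Return (id, name, schedule) for enabled jobs whose next_run has passed."""
    return db.execute(
        "SELECT id, name, schedule FROM cron "
        "WHERE enabled=1 AND next_run <= ? ORDER BY next_run ASC",
        (now,),
    ).fetchall()


def just_ran(cron_id, schedule, now):
    """Record the run and schedule the next one (one-time jobs are disabled)."""
    if schedule == "once":
        db.execute("UPDATE cron SET enabled=0, last_run=?, next_run=? WHERE id=?",
                   (now, now, cron_id))
    else:
        db.execute("UPDATE cron SET last_run=?, next_run=? WHERE id=?",
                   (now, now + SCHEDULE_SECONDS[schedule], cron_id))


now = time.time()
db.execute("INSERT INTO cron (name, schedule, enabled, last_run, next_run) "
           "VALUES ('daily_rollup', 'daily', 1, NULL, ?)", (now - 60,))
db.execute("INSERT INTO cron (name, schedule, enabled, last_run, next_run) "
           "VALUES ('jiten_sync', 'weekly', 1, NULL, ?)", (now + 3600,))

for cron_id, name, schedule in due_crons(now):
    print(f"running {name}")
    just_ran(cron_id, schedule, now)

print(due_crons(now))  # nothing is due any more
```

Because next_run is an absolute timestamp, it doesn't matter whether the app was closed for five minutes or five months: anything overdue runs once on the next start.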
With all of these changes, our API speed is now....
But the webpage itself still loads in 3.5 seconds.
Google Lighthouse rates our website as a 37.
It complains about some simple things like:
So what I did was:
Added rel=preload for important CSS
Added the flask-compress dependency to compress the Flask payload, using Brotli. I read this HN comment thread on Brotli vs zstd and I believe Brotli makes the most sense for now.
The /api/stats endpoint returns a massive JSON payload containing all the stats (rolled up and today's) that's parsed by the frontend into pretty charts. Compressing it makes total sense.
This led to our Lighthouse score becoming 89, with the speed going from 3.5 seconds to 1.4 seconds.
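To get a feel for why compressing a stats payload pays off, here's a quick sketch. It uses gzip from the standard library as a stand-in for Brotli (so the exact ratio will differ), and the rows are made up to resemble repetitive daily stats JSON.

```python
import gzip
import json

# Made-up daily rows: the same keys repeat on every row, which compresses well.
rows = [
    {"date": f"2025-10-{day:02d}", "total_lines": 400 + day,
     "total_characters": 9000 + 10 * day, "total_sessions": 2}
    for day in range(1, 31)
]
raw = json.dumps({"daily": rows}).encode("utf-8")
compressed = gzip.compress(raw)

print(f"raw: {len(raw)} bytes, gzip: {len(compressed)} bytes")
```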


Very speedy!
We successfully doubled the loading speed of the statistics sites, but more importantly here are some key takeaways.