Or, How to turn any text into audio files
I am fond of reading light novels. So much so, that I once wrote a script that pulls chapters from any story I’m currently inhaling from RoyalRoad, which is a website for sharing the same, so I can read when I’m on the go.
I have a couple of variations of said script (in TypeScript and Go); however, both essentially do the same thing: get the story from the website into a local txt file.
I was okay with this setup for a while, until I realized I couldn’t actually “read” them while driving (or any activity where I was otherwise busy).
I’ve tried several options, from the built-in tts (macOS’s say command, or Android’s accessibility features) to websites that offer to convert text to speech, but I was always let down by the robotic read-back, the pricing, the word-count cap, or some other limitation that I couldn’t work around reliably.
And yet, I may have found a solution. Recently, I came across something called kokoro-tts. I fiddled with the source for a bit, but the easiest way to get it on your system (if you have uv installed) is:
uv tool install kokoro-tts
It also requires a couple of helper files that you could download here:
And here’s how you turn an input.txt text file into an output.wav audio file:
kokoro-tts input.txt output.wav \
--model /path/to/kokoro-v1.0.onnx \
--voices /path/to/voices-v1.0.bin
I made the setup more ergonomic for use:
- Move both files into the ~/.local/share/kokoro-tts/ folder (so it aligns with my .dotfiles repository structure)
- Add a shell function in your shell config (my .aliases) to link to said files like so:
# Usage: tts input.txt --> output.wav
tts() {
kokoro-tts "$1" "$(basename "$1" .txt).wav" \
--model ~/.local/share/kokoro-tts/kokoro-v1.0.onnx \
--voices ~/.local/share/kokoro-tts/voices-v1.0.bin \
--voice af_bella
}
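The function above hardcodes af_bella, so trying another voice means editing it. Here is a minimal sketch of a variant that takes the voice as an optional second argument. The name tts_voice and the KOKORO_TTS override are my own assumptions for illustration, not part of kokoro-tts; the flags are the same ones used above.

```shell
# Hypothetical variant of tts(): the voice is an optional second argument.
# Usage: tts_voice input.txt [voice] --> input.wav
tts_voice() {
  # Same location for the helper files as in the function above.
  local data_dir="$HOME/.local/share/kokoro-tts"
  # KOKORO_TTS is an assumed override (not a kokoro-tts feature): setting it
  # to "echo" prints the command instead of running it.
  "${KOKORO_TTS:-kokoro-tts}" "$1" "$(basename "$1" .txt).wav" \
    --model "$data_dir/kokoro-v1.0.onnx" \
    --voices "$data_dir/voices-v1.0.bin" \
    --voice "${2:-af_bella}"
}
```

With this, tts_voice story.txt falls back to af_bella, while tts_voice story.txt am_michael picks a different voice without touching the config.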
And that’s it! To test, try this:
echo -e "Then he spoke. “I agree to go on an adventure on Renavor.”
A strange sensation flowed through him in the same moment, the golden light expanding around their arms before wisps of light floated out and away. Veron held on as searing runes burned into his forearm. He grit his teeth and breathed out. The smell of iron was in the air.
“The pact, is sealed. Rise now, Bestowed, and forge your path, to the end of the world,” Envolon said, his tone carrying weight beyond what he had displayed so far. He stood straight and let go of Veron’s hand. He raised his four arms and started moving his fingers into vaguely geometrical shapes, deep blue light appearing as a strange energy thrummed to life between his hands.
The end of the world? I hope that’s just a figure of speech.
“Any last advice?” Veron asked.
“Be wary of monsters, be wary of those not of your kind, and be wary of the night, when the moon is brightest.”
“That’s all very vague, Envolon.”
“I suppose it is.”" > test-file.txt && tts test-file.txt
It should output a test-file.wav in the same directory.
Here’s the sample with some different voices:
- af_bella
- af_heart
- af_nicole
- am_michael
- bf_emma
- bf_isabella
- bm_george
- bm_lewis
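To compare voices yourself, you can render the same text once per voice. This is a rough sketch under the same assumptions as the setup above (helper files in ~/.local/share/kokoro-tts). The name tts_samples and the KOKORO_TTS override are hypothetical, added only so the loop can be dry-run.

```shell
# Hypothetical helper: render one sample per voice from the same text file.
# Usage: tts_samples input.txt voice1 [voice2 ...] --> input-voice1.wav, ...
tts_samples() {
  local data_dir="$HOME/.local/share/kokoro-tts"
  local input="$1"
  shift  # the remaining arguments are voice names
  for voice in "$@"; do
    # KOKORO_TTS is an assumed override; set it to "echo" for a dry run.
    "${KOKORO_TTS:-kokoro-tts}" "$input" "$(basename "$input" .txt)-$voice.wav" \
      --model "$data_dir/kokoro-v1.0.onnx" \
      --voices "$data_dir/voices-v1.0.bin" \
      --voice "$voice"
  done
}
```

For example, tts_samples test-file.txt af_bella bm_george should leave test-file-af_bella.wav and test-file-bm_george.wav in the current directory.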
So… audio-books?
Yes yes, I’m getting there. Remember how I mentioned I built scrapers to pull the story that I’m reading?
Ah, I see you already connected the dots!
Regardless, here’s a snapshot of how I turn a story from RoyalRoad into an audiobook I can play whilst driving:
go run scrape.go https://www.royalroad.com/fiction/17731/i-never-wanted-you-dead/chapter/209061/i-never-wanted-you-dead && tts story.txt
This story is an absolute treat with the af_nicole:70,af_bella:30 voice.
Yes, that’s all it takes (after you’ve got the rest of the plumbing down, of course).
Besides, you don’t need my scraper to get the text. Websites like webpage-content-extractor (just as an example) can turn a page’s content into text files, which you can then feed to tts.
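If you end up with several text files, say one per chapter, a small loop saves retyping the command. This is a minimal sketch that assumes the tts function from earlier is already loaded in your shell; tts_all is a hypothetical name of mine.

```shell
# Hypothetical helper: convert every .txt file in a directory (default: the
# current one) by calling the tts function defined earlier in this post.
# Usage: tts_all [directory]
tts_all() {
  (
    # Run in a subshell so the cd does not leak into the calling shell.
    cd "${1:-.}" || exit 1
    for f in *.txt; do
      # In bash/sh an unmatched glob stays literal; skip that case.
      [ -e "$f" ] || continue
      # Skip files that already have a matching .wav next to them.
      [ -e "${f%.txt}.wav" ] && continue
      tts "$f"
    done
  )
}
```

Because finished files are skipped, re-running tts_all after adding new chapters only converts the new ones.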
And there you have it - your own audiobooks (with a lovely voice to boot!). Have fun!