Thoughts on the beta of iOS 27
Synfony doesn’t seem to have any crash words as far as I know, but I could be wrong. I’ve been testing it since September. It’s not a bad synthesizer, but it could really benefit from some shaping when it comes to speech rate. I have tested the slowest speed, and it kind of reminds me of setting the vowel factor in SoftVoice to the maximum. Eloquence doesn’t do that, and you can actually make it go a lot slower. It’s too bad we don’t have the multilingual architectures to implement into something like TG Speech Box. It still sounds pretty foreign to me. It’s not a bad synthesizer, but I wish there were more multilingual shaping to straighten it out.
You seem pretty eloquent yourself when it comes to this stuff. I wonder what’s next? Anything big?
Not that I know of, I wish. Something else I’ve definitely noticed is the pitch floor. When you set the voice to a monotone with inflection 0, it sits right at the bottom of the intonation range. That’s how Synfony behaves, so it’s the same idea as Keynote Gold/BestSpeech. If I’m not wrong, I believe Eloquence sat right in the middle.
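To make the floor-versus-middle difference concrete, here is a purely hypothetical pitch model. It is not how Synfony, Keynote Gold/BestSpeech, or Eloquence actually compute pitch; the function name, the normalized contour, and the Hz values are all assumptions for illustration. F0 is a baseline plus an intonation excursion scaled by an inflection setting, so with inflection 0 the output is a flat line at wherever the baseline sits.

```python
# Hypothetical pitch model: f0 = baseline + inflection * excursion.
# Names and numbers are illustrative assumptions, not any real engine's API.

def f0_contour(contour, floor_hz, ceiling_hz, inflection, anchor="floor"):
    """Map a normalized intonation contour (values in [-1, 1]) to Hz.

    anchor="floor": baseline at the bottom of the range, excursions only go up.
    anchor="middle": baseline at the midpoint, excursions go up and down.
    inflection in [0, 1] scales how far the contour moves from the baseline.
    """
    span = ceiling_hz - floor_hz
    if anchor == "floor":
        # Shift the contour from [-1, 1] to [0, 1] so it never dips below the floor.
        return [floor_hz + inflection * span * (c + 1) / 2 for c in contour]
    if anchor == "middle":
        baseline = floor_hz + span / 2
        return [baseline + inflection * (span / 2) * c for c in contour]
    raise ValueError("anchor must be 'floor' or 'middle'")


if __name__ == "__main__":
    rise_fall = [-1.0, 0.0, 1.0, 0.0, -1.0]
    print(f0_contour(rise_fall, 80.0, 160.0, 0.0, "floor"))   # monotone at the floor
    print(f0_contour(rise_fall, 80.0, 160.0, 0.0, "middle"))  # monotone mid-range
```

Under this model both settings are "monotone" at inflection 0; the only difference is where the baseline sits within the pitch range.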
See, you seem really smart about this stuff, so I wonder what it would be like if you built a speech synthesizer. I mean, hell, then you could maybe put it on the iPhone. Man, it’s so cool learning all this fun stuff. A singing speech synthesizer would also be eloquently awesome to have in 2026. I guess that’s what Suno’s for, but it’s no VocalWriter.
Absolutely, and thank you. I believe formant synthesizers are the way to go. I wish I had the knowledge to tune them, but I definitely know how American English should sound. I just don’t like the concatenative ones. And if I had to tune the language rules, you bet I’d be using Eloquence as a reference for American English. Either that, or DECtalk 4.0. I don’t really like the weird accent that version 4.3 has.
Diego is a smart cookie.
Yes, he is, a very smart cookie. I like that guy. I wonder if he knows anything about the Apple Eloquence elf GitHub thing.
I wonder if Diego wants an account on Dane’s TeamTalk server. It’s called Stonercloud! I go up there.
Man, dude, your voice reminds me of John Dowling’s voice as well. I need to tell him about this thread. Oh my God, I have to tell him about this thread. This is so awesome! You and he both know all about this.
I didn’t mean that to offend you, by the way, Diego. I just mean it because it’s awesome. You’re awesome, bro. It’s so cool how you know all that stuff about this beautiful synthesizer. It’s cool to know that you’ve been at this for so long; you’ve probably experienced a lot in the blind community as well. I’m really glad to have met you, man. I really hope we never lose touch, like I hope you use this app a lot, or that we come across each other again.
https://www.synfonicaspeech.com/
Here’s something I’ll have to throw into Pi with a DeepSeek key, or maybe even ChatGPT can make something I can test. The new iOS 27 Siri researched Susan’s thing, and here it is! Firstly, woot to the new Siri; I recommend it.

You are an expert audio DSP engineer and systems architect specializing in speech synthesis. We are building an open-source, next-generation formant speech synthesizer. Our goal is to create a highly efficient, deterministic, and expressive Text-to-Speech (TTS) engine. It should rival the legendary speed and tiny footprint of classic systems like ETI-Eloquence, but utilize modern differentiable DSP techniques to achieve natural, multilingual speech.

### Core Philosophy
1. **Efficiency First:** The DSP backend must be dependency-free, lightweight, and capable of running on embedded devices or in a browser via WebAssembly (WASM).
2. **Predictability:** Unlike neural end-to-end models, this system must be deterministic. The same input parameters must always yield the exact same acoustic output.
3. **Expressive Control:** The architecture must allow explicit, real-time manipulation of pitch, formants, and speaking rate without degrading audio quality.

### Tech Stack
* **Backend (DSP):** Rust or C++ (your choice based on best modern practices for audio).
* **Frontend (Text/Rules):** Python or TypeScript (for easy rule definition and parsing).
* **Build Target:** Native binaries + WebAssembly (WASM) for AudioWorklet integration.

### Architecture Requirements
1. **The Glottal Source:** Implement a differentiable LF (Liljencrants-Fant) glottal pulse model. It must support real-time manipulation of $F_0$ (fundamental frequency) and breathiness (aspiration noise).
2. **The Vocal Tract:** Implement a cascade/parallel filter structure using State Variable Filters (SVF) for stability during rapid parameter modulation. We need at least 5 formants ($F_1$ through $F_5$) with controllable center frequencies, bandwidths, and amplitudes.
3. **The Parameter Generator:** Design a lightweight, rule-based interpolation engine that takes phonetic input (e.g., IPA or custom phoneme codes) and outputs smooth, 100 Hz parameter frames for the DSP.
4. **Coarticulation:** Implement a basic transition matrix that smooths formant frequencies between adjacent phonemes.

### Your First Task
Initialize the repository structure. Write a minimal, single-file prototype in your chosen language (Rust or C++) that generates a 1-second sustained vowel (e.g., /a/ as in "father") using a basic pulse train and 3 resonant filters. Output the result as a raw 16-bit PCM headerless file or a simple WAV. Explain your architectural choices for the filter design and how we will scale this into a full multilingual TTS engine.
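For the glottal source in item 1, a rough sketch. The prompt asks for an LF (Liljencrants-Fant) model, which needs per-period solving for its shape parameters, so this swaps in the simpler Rosenberg glottal pulse as a stand-in, plus aspiration noise drawn from a fixed-seed PRNG so the output stays deterministic as the prompt requires. The function name, the 2/3 opening-to-open-phase split, and the default values are assumptions.

```python
import math
import random


def rosenberg_source(f0_hz, seconds, fs=16000, open_quotient=0.6,
                     breathiness=0.1, seed=1234):
    """Rosenberg glottal flow pulses plus seeded aspiration noise.

    open_quotient: fraction of each period during which the glottis is open.
    breathiness: noise level relative to the pulse peak (0.0 = modal voice).
    seed: fixed PRNG seed, so identical inputs always give identical samples.
    """
    rng = random.Random(seed)
    tp = open_quotient * 2.0 / 3.0  # opening portion (assumed 2/3 of open phase)
    tn = open_quotient - tp         # closing portion
    phase = 0.0                     # position within the current period, in [0, 1)
    out = []
    for _ in range(int(seconds * fs)):
        if phase < tp:
            g = 0.5 * (1.0 - math.cos(math.pi * phase / tp))     # rising flow
        elif phase < open_quotient:
            g = math.cos(math.pi * (phase - tp) / (2.0 * tn))    # falling flow
        else:
            g = 0.0                                              # closed phase
        # Gate the noise by the flow so aspiration is strongest while open.
        out.append(g + breathiness * g * rng.uniform(-1.0, 1.0))
        phase += f0_hz / fs
        if phase >= 1.0:
            phase -= 1.0
    return out


if __name__ == "__main__":
    src = rosenberg_source(120.0, 0.05)
    print(len(src), max(src))
```

Because F0, open quotient, and breathiness are plain per-call arguments, a frame-based caller could change them every 10 ms; making them smooth per-sample controls would be the next step.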
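Item 2 asks for State Variable Filters because they tolerate rapid coefficient changes. A minimal sketch of one formant resonator, assuming Andrew Simper's (Cytomic) trapezoidal-integration SVF and using its band-pass output scaled by k = 1/Q so the gain is 1 at the center frequency. The class name and the F2 glide values are illustrative assumptions; a real engine would stack five of these and add per-formant amplitudes.

```python
import math


class FormantSVF:
    """Trapezoidal (TPT) state variable filter used as a band-pass formant.

    Coefficients are recomputed on every call, so frequency and bandwidth
    can change per sample while the two integrator states carry over.
    """

    def __init__(self, fs):
        self.fs = fs
        self.ic1eq = 0.0  # integrator state 1
        self.ic2eq = 0.0  # integrator state 2

    def process(self, x, freq_hz, bw_hz):
        g = math.tan(math.pi * freq_hz / self.fs)  # prewarped cutoff
        k = bw_hz / freq_hz                        # k = 1/Q, with Q = F / BW
        a1 = 1.0 / (1.0 + g * (g + k))
        a2 = g * a1
        a3 = g * a2
        v3 = x - self.ic2eq
        v1 = a1 * self.ic1eq + a2 * v3             # band-pass node
        v2 = self.ic2eq + a2 * self.ic1eq + a3 * v3  # low-pass node
        self.ic1eq = 2.0 * v1 - self.ic1eq
        self.ic2eq = 2.0 * v2 - self.ic2eq
        return k * v1  # band-pass scaled to unity gain at the center frequency


if __name__ == "__main__":
    fs = 16000
    f2 = FormantSVF(fs)
    out = []
    n_samples = int(0.2 * fs)
    for n in range(n_samples):
        x = 1.0 if n % (fs // 120) == 0 else 0.0              # ~120 Hz impulse train
        freq = 1090.0 + (2290.0 - 1090.0) * n / n_samples     # F2 glide, per sample
        out.append(f2.process(x, freq, 110.0))
    print(max(abs(s) for s in out))
```

Sweeping the center frequency on every sample like this is the "rapid parameter modulation" case the prompt is worried about; the TPT form keeps the state well behaved where a naively updated direct-form biquad can click.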
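Items 3 and 4 (a rule-based generator emitting 100 Hz frames, plus a transition matrix for coarticulation) can be sketched together. Everything named here is hypothetical: the three-vowel target table uses roughly the classic Peterson and Barney adult male formant averages, the "transition matrix" is read as a table of crossfade durations per phoneme pair, and plain linear interpolation stands in for whatever smoothing a real engine would use.

```python
FRAME_MS = 10  # 100 Hz parameter frames, as the prompt specifies

# Hypothetical formant targets (F1, F2, F3) in Hz.
TARGETS = {
    "a": (730.0, 1090.0, 2440.0),
    "i": (270.0, 2290.0, 3010.0),
    "u": (300.0, 870.0, 2240.0),
}

# Hypothetical "transition matrix": crossfade duration in ms per phoneme pair.
TRANSITION_MS = {("a", "i"): 60, ("i", "a"): 60, ("a", "u"): 80}
DEFAULT_TRANSITION_MS = 50


def lerp(a, b, w):
    """Linear interpolation between two formant tuples."""
    return tuple(x + (y - x) * w for x, y in zip(a, b))


def parameter_frames(phonemes):
    """phonemes: list of (symbol, duration_ms). Returns one (F1, F2, F3) per frame."""
    # Each boundary gets a blend window centered on it, clamped to the
    # neighbouring durations so adjacent windows never overlap.
    boundaries = []
    t = 0.0
    for (sym, dur), (nxt, nxt_dur) in zip(phonemes, phonemes[1:]):
        t += dur
        width = min(TRANSITION_MS.get((sym, nxt), DEFAULT_TRANSITION_MS), dur, nxt_dur)
        boundaries.append((t, width, TARGETS[sym], TARGETS[nxt]))

    total_ms = sum(dur for _, dur in phonemes)
    out = []
    for n in range(int(total_ms // FRAME_MS)):
        t = n * FRAME_MS
        # Default: hold the steady-state target of the phoneme containing t.
        acc = 0.0
        value = None
        for sym, dur in phonemes:
            if t < acc + dur:
                value = TARGETS[sym]
                break
            acc += dur
        # Inside a blend window, crossfade linearly from left to right target.
        for b, width, left, right in boundaries:
            start = b - width / 2
            if start <= t < b + width / 2:
                value = lerp(left, right, (t - start) / width)
                break
        out.append(value)
    return out


if __name__ == "__main__":
    for frame in parameter_frames([("a", 100), ("i", 100)]):
        print(frame)
```

For /a/ followed by /i/ (100 ms each), F2 holds at 1090 Hz, ramps through the 60 ms window centered on the boundary, and settles at 2290 Hz, giving 20 frames for 200 ms of speech.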
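The prompt's "first task" is small enough to sketch directly. It asks for Rust or C++; this is the same idea in Python with only the standard library, as a reference point rather than the requested deliverable. An impulse train at a fixed F0 feeds three cascaded two-pole resonators (the classic Klatt-style digital resonator, not the SVF from item 2), a first difference approximates lip radiation, and the result is written as a 16-bit mono WAV. The formant and bandwidth values for /a/, the 16 kHz rate, and the file name are assumptions.

```python
import math
import struct
import wave

SAMPLE_RATE = 16000  # assumption: 16 kHz is plenty for a 3-formant prototype


def resonator(signal, freq_hz, bw_hz, fs=SAMPLE_RATE):
    """Klatt-style two-pole resonator: y[n] = A*x[n] + B*y[n-1] + C*y[n-2]."""
    t = 1.0 / fs
    c = -math.exp(-2.0 * math.pi * bw_hz * t)
    b = 2.0 * math.exp(-math.pi * bw_hz * t) * math.cos(2.0 * math.pi * freq_hz * t)
    a = 1.0 - b - c  # unity gain at 0 Hz
    y1 = y2 = 0.0
    out = []
    for x in signal:
        y = a * x + b * y1 + c * y2
        out.append(y)
        y2, y1 = y1, y
    return out


def pulse_train(f0_hz, seconds, fs=SAMPLE_RATE):
    """Unit impulse once per period, tracked with a fractional phase accumulator."""
    out = [0.0] * int(seconds * fs)
    phase = 1.0  # start with a pulse at sample 0
    for i in range(len(out)):
        if phase >= 1.0:
            out[i] = 1.0
            phase -= 1.0
        phase += f0_hz / fs
    return out


def synthesize_vowel(f0_hz=120.0, seconds=1.0,
                     formants=((730.0, 90.0), (1090.0, 110.0), (2440.0, 170.0))):
    signal = pulse_train(f0_hz, seconds)
    for freq, bw in formants:  # cascade: each resonator feeds the next
        signal = resonator(signal, freq, bw)
    # First difference as a crude lip-radiation model (also removes DC).
    signal = [s - p for s, p in zip(signal, [0.0] + signal[:-1])]
    peak = max(abs(s) for s in signal) or 1.0
    return [s / peak for s in signal]  # normalize to [-1, 1]


def write_wav(path, samples, fs=SAMPLE_RATE):
    frames = b"".join(struct.pack("<h", int(round(s * 32767 * 0.9))) for s in samples)
    with wave.open(path, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)  # 16-bit PCM
        w.setframerate(fs)
        w.writeframes(frames)


if __name__ == "__main__":
    write_wav("vowel_a.wav", synthesize_vowel())
```

Scaling up from here would mean swapping the impulse train for a shaped glottal pulse, adding F4 and F5 plus a parallel branch for fricatives, and driving every parameter from 10 ms frames instead of constants.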
Right, Eloquence didn’t have one to base itself on, did it? So why should we? I mean, we do have AI that could just build one, eh? Hmm. This, I don’t know.
That’s the thing, I don’t think I’d want it to be another TG Speech Box. I’d want to build something that’s efficient all around, though speechplayer is efficient. But it’s very metal, as in very robotic.
I mean, Eloquence is metal too, but man, they really made Eloquence sound human. It doesn’t sound like something from the ’90s; it keeps up with the times.
Man, my AirPods really like to cut in and out, which is damn annoying.
😒😒 I thought nothing from Apple interested or surprised me anymore. Sorry to sound morbid.
I think they weren’t only focusing on this; I think it’s mostly stability fixes as well.