If you're producing audiobooks for Audible, iTunes, or other retailers through ACX, your audio files must meet a specific set of technical requirements before they'll be accepted. These aren't suggestions. Files that don't meet the specs get rejected, and you'll need to re-master and resubmit.
This guide covers every technical specification ACX enforces, what each one means in practice, and why it matters for your audiobook.
The 8 ACX Technical Requirements
ACX runs automated quality checks on every file you submit. Here's what they're measuring.
1. RMS Loudness: -23 to -18 dBFS
RMS (root mean square) measures the average signal level across the whole file, which is a reasonable proxy for how loud the chapter sounds. ACX requires each file to land between -23 and -18 dBFS. This range means listeners don't need to keep adjusting their volume between chapters or between different audiobooks.
Most raw recordings from home studios come in significantly quieter than -23 dBFS. Getting into range requires careful gain adjustment and often light compression to keep the dynamic range consistent without squashing the performance.
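As a rough illustration of the measurement, here is a minimal sketch that computes RMS in dBFS for floating-point samples (full scale = 1.0) and the gain needed to reach a target level. The function names and the -20 dBFS target are assumptions chosen for the example, and ACX's own meter may calculate RMS slightly differently.

```python
import math

def rms_dbfs(samples):
    """RMS level of float samples (full scale = 1.0), in dBFS."""
    if not samples:
        raise ValueError("no samples")
    mean_square = sum(s * s for s in samples) / len(samples)
    if mean_square == 0.0:
        return float("-inf")  # digital silence has no finite dB value
    return 10.0 * math.log10(mean_square)

def gain_to_target_db(samples, target_dbfs=-20.0):
    """Gain in dB that moves the measured RMS to the (assumed) target."""
    return target_dbfs - rms_dbfs(samples)

# A quiet 1 kHz test tone stands in for a raw home-studio take.
rate = 44100
quiet = [0.05 * math.sin(2 * math.pi * 1000 * n / rate) for n in range(rate)]

level = rms_dbfs(quiet)          # about -29 dBFS, below the ACX range
gain = gain_to_target_db(quiet)  # about +9 dB needed
louder = [s * 10 ** (gain / 20.0) for s in quiet]
```

Plain gain raises peaks by the same amount, which is why compression or limiting usually follows in a real mastering chain.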
2. True Peak: Below -3 dBFS
True peak is different from sample peak. It estimates the maximum of the continuous waveform that the digital samples represent, including the parts that fall between samples. A file whose highest sample sits at -3 dBFS can still have a true peak above that threshold, because the reconstructed waveform can overshoot between samples (inter-sample overs).
ACX sets the ceiling at -3 dBFS to leave headroom for the lossy encoding process. MP3 compression can raise peak levels slightly, so the -3 dBFS ceiling prevents clipping in the final delivered file. A brickwall limiter with true peak detection is the standard tool for ensuring compliance.
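To see how far sample peak and true peak can diverge, the sketch below estimates true peak by evaluating a Hann-windowed sinc interpolation at four points per sample period. This is a simplified stand-in for a true peak meter, not the exact filter any particular limiter or standard uses. The function names, the 4x oversampling factor, and the 16-tap kernel are assumptions for illustration.

```python
import math

def to_dbfs(amplitude):
    return 20.0 * math.log10(amplitude)

def sample_peak(samples):
    return max(abs(s) for s in samples)

def true_peak_estimate(samples, oversample=4, taps=16):
    """Largest absolute value of the interpolated waveform."""
    n = len(samples)
    peak = 0.0
    for i in range(n):
        for k in range(oversample):
            t = i + k / oversample          # position in sample periods
            lo = max(0, i - taps + 1)
            hi = min(n - 1, i + taps)
            value = 0.0
            for m in range(lo, hi + 1):
                d = t - m
                if d == 0.0:
                    kernel = 1.0
                else:
                    sinc = math.sin(math.pi * d) / (math.pi * d)
                    window = 0.5 * (1.0 + math.cos(math.pi * d / taps))
                    kernel = sinc * window
                value += samples[m] * kernel
            peak = max(peak, abs(value))
    return peak

# A full-scale sine at one quarter of the sample rate, sampled 45 degrees
# off its crests: every sample lands at about 0.707 (about -3 dBFS), yet
# the waveform between samples reaches close to 0 dBFS.
tone = [math.sin(2 * math.pi * n / 4 + math.pi / 4) for n in range(64)]
print(round(to_dbfs(sample_peak(tone)), 2))
print(round(to_dbfs(true_peak_estimate(tone)), 2))
```

This test tone is a worst case, but it shows why a limiter that only looks at sample values can leave inter-sample overs behind.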
3. Noise Floor: Below -60 dBFS
The noise floor is the level of background sound in your recording when nobody is speaking. ACX requires this to stay below -60 dBFS. This catches problems like audible room tone, HVAC noise, computer fans, electrical hum, and other environmental sounds.
Achieving a low noise floor starts with good recording conditions, but post-production tools like noise gates and noise reduction can help bring marginal recordings into spec. The key is reducing noise without introducing artifacts that sound worse than the original problem.
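One simple way to estimate a noise floor is to measure RMS in short windows and treat the quietest window as room tone. The sketch below does that. The half-second window and the function names are assumptions for illustration, and ACX's own check may be implemented differently.

```python
import math

def window_rms_dbfs(samples, rate, window_seconds=0.5):
    """RMS level in dBFS of each consecutive, non-overlapping window."""
    size = max(1, int(rate * window_seconds))
    levels = []
    for start in range(0, len(samples) - size + 1, size):
        chunk = samples[start:start + size]
        mean_square = sum(s * s for s in chunk) / size
        levels.append(10.0 * math.log10(mean_square)
                      if mean_square > 0.0 else float("-inf"))
    return levels

def noise_floor_estimate(samples, rate, window_seconds=0.5):
    """Assume the quietest window contains only room tone."""
    levels = window_rms_dbfs(samples, rate, window_seconds)
    if not levels:
        raise ValueError("audio is shorter than one window")
    return min(levels)

# One second of faint 60 Hz hum followed by one second of a louder tone
# standing in for speech (8 kHz rate keeps the example fast).
rate = 8000
room_tone = [0.0005 * math.sin(2 * math.pi * 60 * n / rate) for n in range(rate)]
speech = [0.2 * math.sin(2 * math.pi * 200 * n / rate) for n in range(rate)]

floor = noise_floor_estimate(room_tone + speech, rate)  # about -69 dBFS
print(floor < -60.0)
```

A window of digital silence would report negative infinity here, so this estimate is only meaningful on audio that contains real room tone.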
4. Sample Rate: 44,100 Hz
All submitted files must be at a 44.1 kHz sample rate, the same rate used for audio CDs and a long-standing standard for consumer audio distribution. If you recorded at 48 kHz (a common DAW default) or 96 kHz, you'll need to resample before submission.
Resampling should use a high-quality algorithm to avoid aliasing artifacts. Most audio editors handle this well, but it's worth verifying the sample rate of your final output files rather than assuming.
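If your mastering chain exports WAV files before MP3 encoding, Python's standard `wave` module can confirm what was actually written. The following is a minimal sketch with hypothetical function names and a throwaway file path, and it only reads uncompressed PCM WAV files.

```python
import os
import tempfile
import wave

def wav_format(path):
    """Read sample rate, channel count, and bit depth from a WAV header."""
    with wave.open(path, "rb") as wav:
        return {
            "sample_rate": wav.getframerate(),
            "channels": wav.getnchannels(),
            "bit_depth": wav.getsampwidth() * 8,
        }

def needs_resample(info, expected_rate=44100):
    return info["sample_rate"] != expected_rate

# Write a tiny 48 kHz mono file to demonstrate the check.
path = os.path.join(tempfile.mkdtemp(), "chapter_01.wav")
with wave.open(path, "wb") as out:
    out.setnchannels(1)
    out.setsampwidth(2)        # 16-bit samples
    out.setframerate(48000)
    out.writeframes(b"\x00\x00" * 480)

info = wav_format(path)
print(info, needs_resample(info))
```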
5. Channels: Mono
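ACX requires mono audio. A single-voice narration gains nothing from stereo, and an uncompressed mono master is half the size of its stereo equivalent. At a fixed MP3 bitrate, mono also spends every bit on one channel instead of splitting the budget across two. If your recording setup captures stereo (two-channel), you'll need to downmix to a single channel.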
When downmixing, check for phase issues. If your stereo channels are out of phase, summing them to mono can cause cancellation and thin-sounding audio. This is rare with single-mic spoken word recordings but worth being aware of.
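A quick way to spot a phase problem before summing is to measure how strongly the two channels correlate. The sketch below uses a normalized zero-lag correlation (near +1 means in phase, near -1 means polarity-inverted) and an equal-weight downmix. Both function names are assumptions for illustration.

```python
import math

def channel_correlation(left, right):
    """Normalized correlation at zero lag, between -1 and +1."""
    num = sum(l * r for l, r in zip(left, right))
    den = math.sqrt(sum(l * l for l in left) * sum(r * r for r in right))
    return num / den if den else 0.0

def downmix_to_mono(left, right):
    """Average the two channels sample by sample."""
    return [(l + r) / 2.0 for l, r in zip(left, right)]

rate = 8000
tone = [0.5 * math.sin(2 * math.pi * 200 * n / rate) for n in range(rate)]
inverted = [-s for s in tone]

print(channel_correlation(tone, tone))      # close to +1: safe to sum
print(channel_correlation(tone, inverted))  # close to -1: summing cancels
cancelled = downmix_to_mono(tone, inverted)
print(max(abs(s) for s in cancelled))       # 0.0, the signal is gone
```

If the correlation is strongly negative, inverting the polarity of one channel or keeping only one channel is usually a safer route than summing.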
6. Output Format: MP3, 192 kbps CBR
The final deliverable must be an MP3 file encoded at 192 kbps with a constant bit rate (CBR). Variable bit rate (VBR) MP3s will be rejected even if the average bitrate is 192 kbps.
Use a quality MP3 encoder (LAME is the industry standard) and explicitly set CBR mode. Some encoding tools default to VBR, so verify your settings. The 192 kbps rate provides good quality for spoken word while keeping file sizes manageable for streaming.
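If you encode with the LAME command-line tool, a small wrapper makes the CBR setting explicit instead of relying on defaults. The sketch below assumes `lame` is installed and on your PATH. The function names and file paths are hypothetical, and the flags used are LAME's `--cbr` (constant bitrate) and `-b` (bitrate in kbps).

```python
import shutil
import subprocess

def lame_cbr_command(wav_path, mp3_path, bitrate_kbps=192):
    """Build the argument list for a constant-bitrate LAME encode."""
    return ["lame", "--cbr", "-b", str(bitrate_kbps), wav_path, mp3_path]

def encode_cbr(wav_path, mp3_path, bitrate_kbps=192):
    """Run LAME, failing loudly if it is missing or exits with an error."""
    if shutil.which("lame") is None:
        raise RuntimeError("lame was not found on PATH")
    subprocess.run(lame_cbr_command(wav_path, mp3_path, bitrate_kbps), check=True)

# Inspect the command without running it (paths are placeholders).
command = lame_cbr_command("chapter_01.wav", "chapter_01.mp3")
print(" ".join(command))
```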
7. Head and Tail Silence
Each chapter must start with 0.5 to 1 second of silence and end with 1 to 5 seconds of silence. The opening and closing credits files have slightly different requirements, but the per-chapter range is what most producers need to worry about.
This silence must be room tone, not digital silence (absolute zero). A small amount of ambient noise in these sections sounds natural, while a jump from dead silence straight into speech is jarring. Stretches of digital silence can also make noise floor measurements unreliable.
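To check head and tail spacing, you can count how long the audio stays under a level threshold at each end of the file. The sketch below is a minimal version. The -50 dBFS threshold for separating room tone from speech is an assumption, as are the function names, and a real recording may need a threshold tuned to its own noise floor.

```python
import math

def quiet_run_seconds(samples, rate, threshold_dbfs=-50.0, from_end=False):
    """Length in seconds of the below-threshold run at one edge of the file."""
    limit = 10 ** (threshold_dbfs / 20.0)
    ordered = reversed(samples) if from_end else iter(samples)
    count = 0
    for s in ordered:
        if abs(s) >= limit:
            break
        count += 1
    return count / rate

def spacing_ok(samples, rate):
    """Head of 0.5 to 1 second, tail of 1 to 5 seconds."""
    head = quiet_run_seconds(samples, rate)
    tail = quiet_run_seconds(samples, rate, from_end=True)
    return 0.5 <= head <= 1.0 and 1.0 <= tail <= 5.0

def tone(amplitude, freq, seconds, rate):
    return [amplitude * math.sin(2 * math.pi * freq * n / rate)
            for n in range(int(rate * seconds))]

rate = 8000
head_tone = tone(0.0005, 60, 0.75, rate)  # faint hum as room tone
speech = tone(0.3, 200, 2.0, rate)        # louder tone standing in for speech
tail_tone = tone(0.0005, 60, 2.0, rate)
chapter = head_tone + speech + tail_tone

head = quiet_run_seconds(chapter, rate)
tail = quiet_run_seconds(chapter, rate, from_end=True)
print(head, tail, spacing_ok(chapter, rate))
```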
8. Consistency Across Chapters
While not a DSP measurement, ACX also requires consistent audio quality across all chapters. Large variations in loudness, noise floor, or tone between chapters will trigger a manual review and potential rejection.
Processing all chapters through the same mastering chain ensures consistency. Batch processing is particularly useful here because it applies identical settings to every file.
Quick Reference Table
| Requirement | Specification |
|---|---|
| RMS Loudness | -23 to -18 dBFS |
| True Peak | Below -3 dBFS |
| Noise Floor | Below -60 dBFS |
| Sample Rate | 44,100 Hz |
| Channels | Mono |
| Format | MP3, 192 kbps CBR |
| Head Silence | 0.5 to 1 second |
| Tail Silence | 1 to 5 seconds |
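The table translates naturally into a small checklist in code. The sketch below compares measurements you have already taken against the limits above. The dictionary keys and the `check_chapter` function are hypothetical names, and the measurements themselves must come from your own metering tools.

```python
ACX_SPEC = {
    "rms_dbfs": (-23.0, -18.0),
    "peak_dbfs_below": -3.0,
    "noise_floor_dbfs_below": -60.0,
    "sample_rate_hz": 44100,
    "channels": 1,
    "head_silence_s": (0.5, 1.0),
    "tail_silence_s": (1.0, 5.0),
}

def check_chapter(measured, spec=ACX_SPEC):
    """Return the names of every measurement that misses the spec."""
    failures = []
    for key in ("rms_dbfs", "head_silence_s", "tail_silence_s"):
        low, high = spec[key]
        if not low <= measured[key] <= high:
            failures.append(key)
    # "Below" is treated as strictly below, following the table wording.
    if measured["peak_dbfs"] >= spec["peak_dbfs_below"]:
        failures.append("peak_dbfs")
    if measured["noise_floor_dbfs"] >= spec["noise_floor_dbfs_below"]:
        failures.append("noise_floor_dbfs")
    for key in ("sample_rate_hz", "channels"):
        if measured[key] != spec[key]:
            failures.append(key)
    return failures

good = {"rms_dbfs": -20.1, "peak_dbfs": -3.5, "noise_floor_dbfs": -64.0,
        "sample_rate_hz": 44100, "channels": 1,
        "head_silence_s": 0.7, "tail_silence_s": 3.0}
bad = dict(good, rms_dbfs=-25.0, channels=2)

print(check_chapter(good))  # []
print(check_chapter(bad))   # ['rms_dbfs', 'channels']
```

Running the same check over every chapter is also a cheap way to catch one file that drifted out of line with the rest.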
Why These Requirements Exist
Every spec serves a practical purpose in the listener's experience. The loudness range ensures consistent volume across titles. The peak ceiling prevents distortion on playback devices. The noise floor keeps silence actually silent. The format requirements optimize for streaming delivery across Audible's apps and devices.
Meeting these specs isn't just about passing an automated check. It's about delivering professional audio that sounds good on earbuds during a commute, on car speakers during a road trip, and on home systems during quiet evenings. The technical requirements encode decades of audio engineering best practices for the spoken word format.
ACX Pass handles all 8 requirements automatically. Upload your chapters and download compliant MP3s—no manual mastering needed.