Audio watermarking and deepfake detection, explained

Forging a convincing voice now takes minutes and very little money. Proving that a clip is forged, once it has spread, requires an investigation and still yields only a probability. Generation has become cheap while detection has remained expensive, which makes provenance, knowing where a piece of audio came from, a central question for synthetic speech. The two main responses are marking audio at creation and detecting it after the fact.

Limits of deepfake detection

One approach is a detector: a classifier trained to spot the artifacts of synthesis, the faint statistical tells a generator leaves behind. Such a classifier performs well on controlled test data and degrades on real-world audio.

A detector learns the fingerprints of the generators it trained on. A new generator, or an old one with different settings, leaves a different fingerprint, and performance drops sharply out of distribution, on precisely the novel fakes that most need to be caught. In this arms race the defender is at a structural disadvantage: every improvement in generation erases the tells the detector relied on, and because generation keeps improving, a detector always works from older fakes than the ones it now faces.^[1]^[2]^[3]

Watermark durability

Watermarking reverses the problem. Instead of searching for fakes among unknown audio, the generator marks its own output as it is created, embedding a signal below the threshold of hearing that a matching detector can later recover.^[6]^[7] This side of the problem is more tractable, because the defender controls the marking. The work is still difficult: the mark has to survive what the world does to audio.

Audio undergoes heavy processing. A clip gets re-encoded, squeezed to a low bitrate, trimmed, pitch-shifted, mixed with traffic noise, and, in the most demanding case, played through a phone speaker and re-recorded by a second phone across the room, the analog hole. A watermark that survives all of these transformations is useful; one that survives only the pristine original is not, because an adversary will never supply the pristine original.^[4]^[5]
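The durability requirement can be made concrete with a toy example. The sketch below is a classical spread-spectrum mark, not the neural method of any cited system: a secret key seeds a pseudo-random sequence of +1/-1 "chips" that is added to the audio at low amplitude, and the detector correlates the audio against the same sequence. All names (`chips`, `embed`, `is_marked`), the strength, and the threshold are assumptions chosen for illustration, and the strength is not tuned against any model of hearing.

```python
import math
import random

SAMPLE_RATE = 16_000  # assumed sample rate for the toy signal


def chips(key: int, n: int) -> list[float]:
    """Pseudo-random +1/-1 sequence derived from a secret key (hypothetical scheme)."""
    rng = random.Random(key)
    return [1.0 if rng.random() < 0.5 else -1.0 for _ in range(n)]


def embed(samples: list[float], key: int, strength: float = 0.02) -> list[float]:
    """Add the key's chip sequence to the audio at low amplitude."""
    return [s + strength * c for s, c in zip(samples, chips(key, len(samples)))]


def score(samples: list[float], key: int) -> float:
    """Average correlation with the chip sequence: near `strength` if marked, near 0 if not."""
    c = chips(key, len(samples))
    return sum(s * ci for s, ci in zip(samples, c)) / len(samples)


def is_marked(samples: list[float], key: int, threshold: float = 0.01) -> bool:
    return score(samples, key) > threshold


def add_noise(samples: list[float], sigma: float, seed: int = 0) -> list[float]:
    """Stand-in for background noise: additive Gaussian noise."""
    rng = random.Random(seed)
    return [s + rng.gauss(0.0, sigma) for s in samples]


# Three seconds of a 220 Hz tone stands in for speech.
host = [0.5 * math.sin(2 * math.pi * 220 * t / SAMPLE_RATE)
        for t in range(3 * SAMPLE_RATE)]
KEY = 1234  # hypothetical secret key

marked = embed(host, KEY)
noisy = add_noise(marked, sigma=0.05)  # mark plus background noise
trimmed = marked[100:]                 # first 100 samples cut off

for name, clip in [("host", host), ("marked", marked),
                   ("noisy", noisy), ("trimmed", trimmed)]:
    print(f"{name:8s} score={score(clip, KEY):+.4f} detected={is_marked(clip, KEY)}")
```

Under these assumptions the mark survives added noise, but trimming a few samples misaligns the audio with the chip sequence and the correlation collapses. A mark this naive survives only near-pristine copies, which is why practical schemes add synchronization and train against the transformations listed above.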

Audio without watermarks

Watermarking only marks the output of generators that choose to watermark. A responsible system stamps its audio, while a malicious actor can simply download a model that does not, and no authority can compel its use.

Voluntary watermarking applies to honest users and not to malicious ones, the reverse of what a security control should do. The open-source, offline models most attractive for abuse are also the ones least likely to carry a mark.^[9] Watermarking is therefore effective for accountability among cooperating tools and for honest disclosure, but of little value against a determined adversary who routes around it.

Removal and forgery attacks

Even a sturdy watermark can be attacked directly. Adversaries strip the mark, overwrite it, or forge it, stamping the "authentic human audio" or "made by X" signal onto content that is neither.

Removal and forgery differ in severity. Erasing a watermark only removes the ability to trace the audio. Forgery is more dangerous, because the mark then lies, presenting fabricated content as authentic; because people trust the mark, a forged one misleads them more than no mark would.^[8]^[10] Cryptography addresses forgery, which is the idea behind C2PA Content Credentials: a signed record of how a piece of media was made is attached, so provenance can be verified rather than guessed at. But metadata can be stripped from a file trivially. Signed provenance proves origin when present and proves nothing when absent, and an adversary will strip it deliberately.^[11]^[12]
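The "proves origin when present, proves nothing when absent" behavior can be sketched in a few lines. This is not the C2PA format or API: C2PA relies on certificate-based public-key signatures, and the HMAC with a shared demo key below is a stand-in chosen only because it ships in Python's standard library. `make_manifest`, `check_provenance`, and `SIGNING_KEY` are hypothetical names.

```python
import hashlib
import hmac
import json
from typing import Optional

SIGNING_KEY = b"demo-key-not-for-production"  # hypothetical; real systems use private keys


def _sign(claim: dict) -> str:
    payload = json.dumps(claim, sort_keys=True).encode()
    return hmac.new(SIGNING_KEY, payload, hashlib.sha256).hexdigest()


def make_manifest(audio: bytes, tool: str) -> dict:
    """Bind a claim about the audio's origin to a hash of its content, then sign it."""
    claim = {"tool": tool, "sha256": hashlib.sha256(audio).hexdigest()}
    return {"claim": claim, "signature": _sign(claim)}


def check_provenance(audio: bytes, manifest: Optional[dict]) -> str:
    """Return 'verified', 'invalid', or 'absent'."""
    if manifest is None:
        return "absent"  # stripped metadata: nothing can be concluded
    if not hmac.compare_digest(_sign(manifest["claim"]), manifest["signature"]):
        return "invalid"  # the claim was altered after signing
    if manifest["claim"]["sha256"] != hashlib.sha256(audio).hexdigest():
        return "invalid"  # the manifest belongs to different audio
    return "verified"
```

Note that `"absent"` is not evidence of anything: a clip with no manifest may be genuine, synthetic, or deliberately stripped.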

	Watermarking	Detection	Signed provenance (C2PA)
When it acts	At generation	After the fact	At creation, in metadata
Who must cooperate	The generator	Nobody	The whole toolchain
Fails when	The mark is stripped, or was never applied	The generator is newer than the training data	The metadata is removed

Three tactics, three failure modes. No tactic survives an adversary on its own, which is why serious deployments stack them.

The cost of false positives

Tuning a detector to catch more fakes causes it to flag more real audio as synthetic. The two error rates trade against each other, and neither can be driven to zero.
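A small worked example shows the trade-off. The scores below are invented for illustration and do not come from any real detector; `error_rates` is a hypothetical helper that counts both kinds of mistake at a given decision threshold.

```python
def error_rates(real_scores, fake_scores, threshold):
    """Return (false_positive_rate, false_negative_rate) at a decision threshold.

    A clip is flagged as synthetic when its score is >= threshold.
    """
    false_pos = sum(s >= threshold for s in real_scores) / len(real_scores)
    false_neg = sum(s < threshold for s in fake_scores) / len(fake_scores)
    return false_pos, false_neg


# Made-up detector scores (higher means "more likely synthetic").
real_scores = [0.05, 0.10, 0.20, 0.30, 0.45, 0.60]
fake_scores = [0.35, 0.55, 0.70, 0.80, 0.90, 0.95]

for threshold in (0.3, 0.5, 0.7):
    fp, fn = error_rates(real_scores, fake_scores, threshold)
    print(f"threshold={threshold:.1f}  false positives={fp:.2f}  false negatives={fn:.2f}")
```

With these numbers, lowering the threshold to 0.3 catches every fake but flags half of the real clips, while raising it to 0.7 clears every real clip but misses a third of the fakes. Because the two score ranges overlap, no threshold makes both rates zero.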

A false positive is damaging too. Labeling a genuine interview, confession, or piece of evidence as "fake" can do as much harm as missing a fake outright. In a courtroom or a newsroom the wrong call is costly in both directions: a real recording dismissed as a deepfake, or a deepfake admitted as real. For this reason a detector's output should be treated as a probabilistic signal feeding human judgment, not a verdict; "the detector said fake" is not, by itself, proof of anything.^[13]^[14]^[15]
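Bayes' rule gives one way to see why a flag is not a verdict. The sketch below computes the probability that a flagged clip is actually fake, given the detector's error rates and the share of fakes among the clips being checked. The rates used are illustrative assumptions, not measurements of any real detector.

```python
def posterior_fake(prior_fake: float, true_positive_rate: float,
                   false_positive_rate: float) -> float:
    """P(clip is fake | detector flagged it), by Bayes' rule."""
    flagged_fake = true_positive_rate * prior_fake
    flagged_real = false_positive_rate * (1.0 - prior_fake)
    return flagged_fake / (flagged_fake + flagged_real)


# Assumed detector: catches 95% of fakes, wrongly flags 5% of real clips.
for prior in (0.01, 0.10, 0.50):
    p = posterior_fake(prior, true_positive_rate=0.95, false_positive_rate=0.05)
    print(f"share of fakes={prior:.2f}  P(fake | flagged)={p:.3f}")
```

Under these assumptions, when only 1 clip in 100 is fake, a flag means the clip is fake about 16% of the time; most flags land on genuine audio. The same detector is far more informative when fakes are common, which is why the context of use matters as much as the headline accuracy.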

Current research directions

No single tactic is sufficient, so the practical approach combines all of them. Marking at the source is more tractable than detecting afterward, and it covers the large and growing share of audio made by responsible tools. Signed provenance is attached where the workflow allows it, so origin can be verified rather than guessed. Detection remains a fallback for unmarked audio, with the understanding that it lags generation and makes both kinds of mistake. Disclosure law adds further pressure: the European Union's AI Act requires that AI-generated and manipulated media be marked as such, the same regulatory pressure that is being applied to voice cloning.^[9]^[16]
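One way to picture the stacking is as an ordered triage, strongest evidence first. The function below is a hypothetical sketch, not a description of any deployed system; the labels and the ordering are assumptions that follow the reasoning above.

```python
from typing import Optional


def triage(provenance: str, watermark_found: bool,
           detector_score: Optional[float]) -> str:
    """Combine the three signals, preferring verifiable evidence over estimates.

    provenance is one of 'verified', 'invalid', or 'absent' (hypothetical labels).
    """
    if provenance == "verified":
        return "origin verified by signed provenance"
    if provenance == "invalid":
        return "provenance record tampered: escalate to human review"
    if watermark_found:
        return "marked as synthetic by a cooperating generator"
    if detector_score is None:
        return "no signal: origin unknown"
    # The detector is the fallback and yields only an estimate, never a verdict.
    return f"unmarked: detector estimate {detector_score:.2f}, needs human judgment"


print(triage("absent", False, 0.82))
```

The ordering reflects how much each signal can prove: a valid signature is checkable, a recovered watermark depends on a cooperating generator, and a detector score is a probability that still needs a human to weigh it.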

No flawless method for detecting fake audio exists. The achievable goal is narrower: an ecosystem in which honest audio can demonstrate its authenticity and dishonest audio is at least more expensive to pass off. The remainder of the problem is the consent-and-provenance question at the center of voice cloning.

Common questions

What is an audio watermark?

An inaudible signal embedded into synthetic speech at generation, recoverable later to confirm the audio is machine-made or trace its source. The decisive test of a watermark is the analog hole: whether the mark survives being played through a speaker and re-recorded through a microphone.

Why is detecting deepfake audio so hard?

Because detection is an arms race in which the defender is at a structural disadvantage: a detector learns the fingerprints of known generators, and every advance in generation erases exactly those tells on the novel fakes that most need to be caught. For this reason, marking at the source is more tractable than detecting after the fact.

Does watermarking stop voice cloning misuse?

No. It marks only the output of generators that choose to watermark, the reverse of how a security control should behave: it applies to honest users and not to malicious ones, and the open-source offline models most attractive for abuse are the ones least likely to watermark. It raises the baseline for honest use without stopping a determined adversary.

What are Content Credentials (C2PA)?

A standard for attaching a cryptographically signed record of how a piece of media was created, so provenance can be verified rather than guessed at. Its main limitation is that the metadata can be stripped from a file trivially, so it proves origin when present and proves nothing when absent, and a determined adversary will make sure it is absent.

References

[1] Zhang, Z., Hao, W., Sankoh, A., Lin, W., et al. (2025). I can hear you: Selective robust training for deepfake audio detection. International Conference on Learning Representations (ICLR) 2025.
[2] Farooq, M. A., et al. (2025). Transferable Adversarial Attacks on Audio Deepfake Detection. IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) 2025.
[3] Zhang, B., Cui, H., Nguyen, V., & Whitty, M. (2025). Audio Deepfake Detection: What Has Been Achieved and What Lies Ahead. Sensors, 25(7), 1989.
[4] Nadeau, P., & Sharma, G. (2017). An Audio Watermark Designed for Efficient and Robust Analog Playback. IEEE Transactions on Information Forensics and Security.
[5] Cvejic, N. (2004). Algorithms for audio watermarking and steganography. Oulu University Press.
[6] San Roman, R., Fernandez, P., Elsahar, H., et al. (2024). Proactive Detection of Voice Cloning with Localized Watermarking (AudioSeal). International Conference on Machine Learning (ICML) 2024.
[7] Liu, Y., et al. (2023). WavMark: Watermarking for audio generation. arXiv preprint arXiv:2308.12770.
[8] Jiang, Y., et al. (2024). AudioMarkBench: Benchmarking Robustness of Audio Watermarking. arXiv preprint arXiv:2406.06979.
[9] Hacker, P., et al. (2025). Adoption of Watermarking for Generative AI Systems in Practice and Implications Under the New EU AI Act. arXiv preprint arXiv:2503.18156.
[10] Wang, Z., et al. (2025). Yours or Mine? Overwriting Attacks against Neural Audio Watermarks. arXiv preprint arXiv:2509.05835.
[11] C2PA (2024). C2PA and Content Credentials Explainer. Coalition for Content Provenance and Authenticity.
[12] Farid, H. (2026). Verifying Provenance of Digital Media: Why the C2PA Specifications Are Necessary but Insufficient. arXiv preprint arXiv:2604.24890.
[13] Chintha, A., et al. (2020). Recurrent convolutional structures for audio spoof and video deepfake detection. IEEE Journal of Selected Topics in Signal Processing.
[14] Warren, K., et al. (2024). 'Better Be Computer or I'm Dumb': A Large-Scale Evaluation of Humans as Audio Deepfake Detectors. Proceedings of the ACM on Human-Computer Interaction.
[15] Delfino, R. (2025). Deepfakes on Trial 2.0: A Revised Proposal for a New Rule of Evidence. Federal Rules of Evidence Advisory Committee.
[16] European Commission (2024). Code of Practice on Transparency of AI-Generated Content. European Commission, Digital Strategy.