27 measurements. One honest verdict.
Upload a photo and it gets measured — decoder upsampling grids, spectral shape, missing camera-sensor physics, local self-consistency — then answered with one of three verdicts: likely AI-generated, likely real, or an honest inconclusive. Every number behind the answer is shown. No neural networks. Nothing hidden.
Runs the exact pipeline published on GitHub
* Images dragged in from another web page are often resized and re-compressed by the site serving them. For the most reliable verdict, download the original image and upload the file.
How it decides
- Normalize. Center-crop to 512 px and re-encode to a fixed JPEG substrate, so every image is judged on identical footing. Images smaller than 512 px are rejected, not upscaled, because upscaling would destroy the statistics being measured.
- Measure. 27 signal-processing measurements, pure NumPy and SciPy: the upsampling-grid fingerprints diffusion decoders leave behind, spectral shape, camera-noise physics that generators fail to reproduce, and how well neighboring patches agree with each other.
- Score. A logistic regression turns the measurements into a single score. The model was trained once and frozen; it is never retrained to flatter a benchmark.
- Verdict. The score lands on a scale built from real photos from several independent sources. Beyond the 95th percentile of real photos: likely AI-generated. Past the 80th: leaning AI-generated. Well inside the envelope: likely real. Anywhere between: inconclusive, said plainly.
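As a rough illustration of the Verdict step, the sketch below places a score on a scale of reference scores from real photos and maps its percentile to a verdict. The function names, the toy reference scores, and especially the `likely_real_below=50.0` cut-off standing in for "well inside the envelope" are assumptions made for this example; the page states only the 95th and 80th percentile boundaries.

```python
from bisect import bisect_right

def percentile_of(score, real_scores_sorted):
    """Percentage of reference real-photo scores that are <= score."""
    rank = bisect_right(real_scores_sorted, score)
    return 100.0 * rank / len(real_scores_sorted)

def verdict(score, real_scores_sorted, likely_real_below=50.0):
    # likely_real_below is a hypothetical stand-in for "well inside the envelope".
    p = percentile_of(score, real_scores_sorted)
    if p > 95.0:
        return "likely AI-generated"
    if p > 80.0:
        return "leaning AI-generated"
    if p < likely_real_below:
        return "likely real"
    return "inconclusive"

# Toy reference scale: 100 made-up scores for real photos, 0.00 .. 0.99.
reference = sorted(i / 100 for i in range(100))

for s in (0.105, 0.705, 0.855, 0.995):
    print(s, verdict(s, reference))
```

The scale is built only from real photos, so the thresholds describe how unusual a score is for a real photo rather than how typical it is for a generated one.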
The model is deliberately small. Every one of the 27 measurements has a physical meaning you can state in a sentence, so the page can show its reasoning instead of asking to be trusted.
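One way a small linear model can show its reasoning is to report each measurement's contribution to the score. The sketch below assumes standardized inputs; the feature names, weights, and bias are invented for the example and are not the trained model's values.

```python
import math

def logistic_score(features, weights, bias):
    """Return (score in 0..1, per-feature contributions to the logit)."""
    contributions = {name: weights[name] * value for name, value in features.items()}
    logit = bias + sum(contributions.values())
    score = 1.0 / (1.0 + math.exp(-logit))
    return score, contributions

# Hypothetical standardized measurements and weights (illustration only).
weights = {"grid_peak": 1.2, "sensor_noise": -0.8, "patch_agreement": 0.5}
features = {"grid_peak": 2.0, "sensor_noise": 1.0, "patch_agreement": -1.0}

score, parts = logistic_score(features, weights, bias=-0.5)

# List the contributions from largest to smallest in absolute size.
for name, c in sorted(parts.items(), key=lambda kv: -abs(kv[1])):
    print(f"{name:16s} {c:+.2f}")
print(f"score = {score:.3f}")
```

Because the logit is a plain sum, every measurement's push toward or away from "AI-generated" can be read off directly.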
During development, every candidate measurement had to pass a control test: given two collections of only real photos from two different sources, it must not be able to tell them apart. A clue that separates them is reading compression history and website plumbing (the picture frame) rather than anything about AI (the painting).
Eighteen feature families failed that control and were thrown out, several of them with benchmark scores far better than the features I kept. They are published with their rejection reasons in the experiments folder.
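The control test can be sketched as a two-source separability check: measure one feature on two collections of real photos, compute how well it separates them, and reject the feature if the separation strays from chance. The `tolerance=0.05` cut-off and the toy values are assumptions for illustration; the published rejection criteria may differ.

```python
def separation_auc(source_a, source_b):
    """Probability that a value from source_b exceeds one from source_a (ties count half)."""
    pairs = [(a, b) for a in source_a for b in source_b]
    wins = sum(1.0 if b > a else 0.5 if b == a else 0.0 for a, b in pairs)
    return wins / len(pairs)

def passes_control(real_source_1, real_source_2, tolerance=0.05):
    # tolerance is a hypothetical cut-off, not a value taken from the project.
    auc = separation_auc(real_source_1, real_source_2)
    return abs(auc - 0.5) <= tolerance, auc

# Toy feature values measured on real photos from two different sources.
stable_feature = ([0.1, 0.4, 0.6, 0.9], [0.2, 0.3, 0.7, 0.8])
leaky_feature = ([0.1, 0.2, 0.3, 0.4], [0.5, 0.6, 0.7, 0.8])

print(passes_control(*stable_feature))
print(passes_control(*leaky_feature))
```

The second feature separates two sets of real photos perfectly, so whatever it reads cannot be about AI.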
That is also why the verdict does not care where your image has been. WhatsApp, screenshots, ten years on a hard drive — the clues it uses live in the pixels, not in the file's history. The longer version, with the art-expert analogy, is in how-it-works.
What the model reads
| Family | Measurements | What it reads |
|---|---|---|
| decoder grid | 18 | upsampling-grid fingerprints and spectral shape left by diffusion-family decoders |
| sensor absence | 2 | camera noise physics that generators fail to reproduce |
| inversion residual | 2 | how the image responds to resampling round-trips |
| self-consistency | 5 | statistical agreement between neighboring patches |
| visible watermark | +1 | known generator stamps (Gemini’s diamond, Kling’s wordmark) matched at their documented corner positions on the full image |
All 27 measurements read the luminance of the image — the grayscale structure where generator artifacts live. Color statistics were implemented and tested too, as four separate feature families; every one of them turned out to fingerprint the source pipeline instead of the generator, and they were rejected.
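A minimal sketch of reducing an RGB pixel grid to luminance before anything is measured. The Rec. 601 weights (0.299, 0.587, 0.114) are an assumption here; the page does not say which luminance definition the pipeline uses.

```python
def luminance(rgb_rows):
    """Convert rows of (r, g, b) tuples in 0..255 to rows of luminance floats."""
    # Assumed Rec. 601 weighting; the actual pipeline may use another definition.
    return [[0.299 * r + 0.587 * g + 0.114 * b for (r, g, b) in row] for row in rgb_rows]

# Tiny 2x2 test image: white, black, pure red, pure blue.
image = [
    [(255, 255, 255), (0, 0, 0)],
    [(255, 0, 0), (0, 0, 255)],
]

for row in luminance(image):
    print([round(v, 1) for v in row])
```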
One check stands apart from the statistics: known visible watermarks. Some consumer apps stamp their images (Gemini’s diamond, Kling’s wordmark) at exact, documented positions. Finding one settles the verdict on its own, because the thresholds are calibrated so that no clean photo in the reference corpora triggered them. Not finding one means nothing: APIs and paid tiers do not stamp, and a mark can simply be cropped away.
Two more channels are reported as evidence panels, shown beside the verdict but never mixed into it: a frequency-domain signature specific to Google’s image generators (Gemini/Imagen family, measured 0.55–0.70 AUC across surfaces), and provenance metadata — C2PA manifests, IPTC AI markers, generator EXIF tags — which is trivial to strip and therefore never trusted as pixel evidence.
Honest numbers
Most detectors advertise 95%+ accuracy, measured on images similar to their training data. This one is measured the hard way on purpose: trained once, then evaluated only on photos and generators from sources it has never seen.
The numbers are lower — and they are the ones you should expect on your own images, not a lab-only best case. Where current methods genuinely cannot tell, the verdict is “inconclusive” rather than a confident guess.
One more number matters for trust: when the verdict says likely AI-generated, fewer than 9% of real photos land there by mistake, a rate that held across four independent photo collections. The full protocol and per-generator tables are in the accompanying dissertation.
| Generator family | AUC | Status |
|---|---|---|
| 2022-era generators | 0.88 | covered |
| FLUX / Stable Diffusion lineage | 0.66–0.82 | covered |
| Seedream, Kling, Hunyuan | 0.55–0.65 | partial |
| Google family (Gemini, Imagen) | 0.55–0.70 | evidence panel |
| GPT-Image, Midjourney v7, Qwen, GLM | ~0.5 | abstains |
Questions people actually ask
What happens to the images I upload?
Nothing stays on the server. Your image is measured in memory and discarded as soon as the verdict is computed. It is never written to disk, never logged and never used for anything else. The PDF report is built on demand and sent straight back to you. If an image is too private to send anywhere, you can run the whole pipeline locally from the GitHub repo and nothing leaves your machine.
Why does it sometimes say “inconclusive”?
Because sometimes the honest answer is “I can’t tell”, and I would rather say that than guess. Either the image came from one of the newest generators that current methods cannot detect from pixels alone, or it has been re-compressed so many times that the evidence is destroyed. The verdict degrades into “I don’t know” — never into a confident wrong answer.
What is AUC?
Hand the tool one random real photo and one random AI image, and ask which is which. AUC is the probability it ranks the pair correctly: 1.0 means always right, 0.5 means a coin flip. So 0.88 on 2022-era generators means it picks the AI image about 9 times out of 10.
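The pairwise definition above translates directly into code. This is a small sketch with made-up scores, not the project's evaluation script; counting ties as half a correct pair is a common convention assumed here.

```python
def auc(real_scores, ai_scores):
    """Fraction of (real, AI) pairs in which the AI image gets the higher score."""
    correct = 0.0
    for r in real_scores:
        for a in ai_scores:
            if a > r:
                correct += 1.0
            elif a == r:
                correct += 0.5  # assumed convention: a tie is half right
    return correct / (len(real_scores) * len(ai_scores))

# Made-up scores: higher means "more AI-like".
real = [0.1, 0.3, 0.5]
ai = [0.4, 0.6, 0.9]

print(round(auc(real, ai), 3))  # 8 of the 9 pairs are ranked correctly
```

Only one pair is ranked wrongly (the real photo at 0.5 outscores the AI image at 0.4), giving 8/9, or about 0.889.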
Other tools advertise 95%+. Why are these numbers lower?
Because the measurement is different, not the ambition. Numbers measured on data similar to the training set reward memorizing incidental details — which website the photos came from, how they were compressed. This model is scored only on sources it has never seen, and every feature had to prove it reads the image rather than the image’s history. Independent studies keep finding the same gap: detectors advertising 99% drop to roughly 78% on in-the-wild images. Lower and true beats high and lab-only.
Does it matter where my image has been — WhatsApp, screenshots, editing?
Where it has been: no. The features were selected so that compression history, resizing, and the site an image passed through cannot push the verdict either way. Very heavy processing can erase the evidence itself — in which case you get “inconclusive”, not a wrong answer.
Can it be fooled?
Yes, and the page says so. Determined laundering erases pixel evidence, metadata can always be stripped, and the newest closed-lab generators are largely undetectable from pixels alone — for every tool, not just this one. Treat the output as forensic evidence that supports a judgment, not as a verdict machine. More in the full FAQ.