27 measurements. One honest verdict.

Upload a photo and it gets measured — decoder upsampling grids, spectral shape, missing camera-sensor physics, local self-consistency — then answered with one of three verdicts: likely AI-generated, likely real, or an honest inconclusive. Every number behind the answer is shown. No neural networks. Nothing hidden.

Runs the exact pipeline published on GitHub

Drop an image, or browse for one. Accepted formats: JPEG, PNG, WebP. The short side must be at least 512 px.
You can also drag an image straight from another website.*

* Dragged images are often resized and re-compressed by the site serving them. For the most reliable verdict, download the original image and upload the file.

27 hand-crafted measurements, plus 1 watermark check

18 feature families rejected by controls

29 modern generators evaluated

Fewer than 9% of real photos ever misflagged as AI

How it decides

Normalize

Center-crop to 512 px and re-encode to a fixed JPEG substrate, so every image is judged on identical footing. Images smaller than 512 px are rejected, not upscaled — upscaling would destroy the statistics being measured.
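
The crop geometry and the size check described above can be sketched as follows. This is a minimal illustration, not the published pipeline: it assumes the crop is taken directly at native resolution with no resizing, and the function name `center_crop_box` is hypothetical. The JPEG re-encode is left out because the page does not state the quality setting.

```python
TARGET = 512  # side length stated on the page


def center_crop_box(width: int, height: int, target: int = TARGET) -> tuple[int, int, int, int]:
    """Return (left, upper, right, lower) for a centered target x target crop.

    Raises ValueError when the short side is below `target`, mirroring the
    rule that small images are rejected rather than upscaled.
    """
    short_side = min(width, height)
    if short_side < target:
        raise ValueError(f"short side {short_side} px is below {target} px")
    left = (width - target) // 2
    upper = (height - target) // 2
    return left, upper, left + target, upper + target


print(center_crop_box(1024, 768))  # (256, 128, 768, 640)
```

The box uses the (left, upper, right, lower) order that Pillow's `Image.crop` expects, so it could be passed straight to that call before re-encoding.
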
Measure

27 signal-processing measurements, pure NumPy and SciPy: the upsampling-grid fingerprints diffusion decoders leave behind, spectral shape, camera-noise physics that generators fail to reproduce, and how well neighboring patches agree with each other.
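
To give a feel for what an upsampling-grid measurement might look like, here is a small NumPy sketch. It is an assumption-laden illustration, not one of the 27 published features: the function name `grid_peak_ratio`, the 8-pixel period, and the mean-over-median ratio are all hypothetical choices made for the example.

```python
import numpy as np


def grid_peak_ratio(luma: np.ndarray, period: int = 8) -> float:
    """Spectral energy at the `period`-pixel grid frequencies, relative to the median.

    luma: 2-D float array whose sides are multiples of `period` (assumed).
    """
    h, w = luma.shape
    if h % period or w % period:
        raise ValueError("image sides must be multiples of the period")
    # Remove the mean so the DC term does not dominate the spectrum.
    spectrum = np.abs(np.fft.fft2(luma - luma.mean()))
    # A pattern repeating every `period` pixels concentrates its energy at
    # integer multiples of h/period (rows) and w/period (columns).
    rows = np.arange(0, h, h // period)
    cols = np.arange(0, w, w // period)
    mask = np.zeros((h, w), dtype=bool)
    mask[np.ix_(rows, cols)] = True
    mask[0, 0] = False  # ignore DC
    return float(spectrum[mask].mean() / (np.median(spectrum) + 1e-12))


# Synthetic demo: plain noise versus noise with an 8-pixel repeating pattern.
rng = np.random.default_rng(0)
noise = rng.normal(size=(64, 64))
gridded = noise + np.tile(rng.normal(size=(8, 8)), (8, 8))
print(grid_peak_ratio(noise), grid_peak_ratio(gridded))
```

On the synthetic pair, the gridded image should score well above the plain noise, which is the kind of contrast a grid fingerprint relies on.
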
Score

A logistic regression turns the measurements into a single score. The model was trained once and frozen — it is never retrained to flatter a benchmark.
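
As a rough sketch of the scoring step, a logistic regression reduces to a weighted sum passed through a sigmoid. The weights, bias and three-feature vector below are made up for illustration; the frozen model's actual coefficients are not given on this page.

```python
import math


def logistic_score(features: list[float], weights: list[float], bias: float) -> float:
    """Map a feature vector to a score in (0, 1) with a logistic model."""
    if len(features) != len(weights):
        raise ValueError("features and weights must have the same length")
    z = bias + sum(w * x for w, x in zip(weights, features))
    # Numerically stable sigmoid: avoids overflow for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


# Hypothetical 3-feature example (the real model uses 27 measurements).
weights = [1.2, -0.7, 0.4]
print(logistic_score([0.0, 0.0, 0.0], weights, bias=0.0))  # 0.5
```

Because the weights are fixed numbers, the same image always yields the same score, which is what makes a frozen model auditable.
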
Verdict

The score lands on a scale built from real photos from several independent sources. Beyond the 95th percentile of real photos: likely AI-generated. Past the 80th: leaning AI-generated. Well inside the envelope: likely real. Anywhere between: inconclusive, said plainly.
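
The percentile logic above can be sketched like this. The 95th and 80th percentile cut-offs come from the text; the `real_cut` parameter standing in for "well inside the envelope" is a hypothetical value, since the page does not state that threshold, and the function names are illustrative.

```python
from bisect import bisect_left


def percentile_of(score: float, real_scores_sorted: list[float]) -> float:
    """Percentage of reference real-photo scores strictly below `score`."""
    return 100.0 * bisect_left(real_scores_sorted, score) / len(real_scores_sorted)


def verdict(score: float, real_scores_sorted: list[float], real_cut: float = 50.0) -> str:
    """Place a score on the real-photo scale and name the verdict band."""
    p = percentile_of(score, real_scores_sorted)
    if p > 95.0:
        return "likely AI-generated"
    if p > 80.0:
        return "leaning AI-generated"
    if p < real_cut:  # assumed cut-off, not stated on the page
        return "likely real"
    return "inconclusive"


# Toy reference scale: 100 evenly spaced real-photo scores.
reference = [i / 100 for i in range(100)]
for s in (0.975, 0.855, 0.655, 0.205):
    print(s, verdict(s, reference))
```

The key design point is that the thresholds are defined on real photos only, so the false-alarm rate is set by construction rather than tuned against any one generator.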

The model is deliberately small. Every one of the 27 measurements has a physical meaning you can state in a sentence, so the page can show its reasoning instead of asking to be trusted.

During development, every candidate measurement had to pass a control test: given two collections of only real photos from two different sources, it must not be able to tell them apart. A clue that does separate them is reading compression history and website plumbing (the picture frame) rather than anything about AI (the painting).
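
One way such a control could be expressed, as a sketch only: compute how well a single feature separates two real-photo collections, and reject it if the separation is meaningfully different from chance. The `tolerance` of 0.05 and the function names are assumptions; the actual acceptance rule lives in the published experiments.

```python
def auc(source_a: list[float], source_b: list[float]) -> float:
    """Probability that a random value from source_b exceeds one from source_a.

    Ties count as half, which is the usual convention.
    """
    wins = sum((b > a) + 0.5 * (b == a) for b in source_b for a in source_a)
    return wins / (len(source_a) * len(source_b))


def passes_control(source_a: list[float], source_b: list[float], tolerance: float = 0.05) -> bool:
    """True when the feature cannot tell two real-photo sources apart."""
    return abs(auc(source_a, source_b) - 0.5) <= tolerance


# A feature that reads the "frame": it cleanly splits two real-photo sources.
print(passes_control([0.1, 0.2, 0.3, 0.4], [0.6, 0.7, 0.8, 0.9]))  # False
# A feature that ignores the source: values interleave, AUC is 0.5.
print(passes_control([0.1, 0.4, 0.6, 0.9], [0.2, 0.3, 0.7, 0.8]))  # True
```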

Eighteen feature families failed that control and were thrown out, several of them with benchmark scores far better than the features I kept. They are published with their rejection reasons in the experiments folder.

That is also why the verdict does not care where your image has been. WhatsApp, screenshots, ten years on a hard drive — the clues it uses live in the pixels, not in the file's history. The longer version, with the art-expert analogy, is in how-it-works.

What the model reads

Family	Measurements	What it reads
decoder grid	18	upsampling-grid fingerprints and spectral shape left by diffusion-family decoders
sensor absence	2	camera noise physics that generators fail to reproduce
inversion residual	2	how the image responds to resampling round-trips
self-consistency	5	statistical agreement between neighboring patches
visible watermark	+1	known generator stamps (Gemini’s diamond, Kling’s wordmark) matched at their documented corner positions on the full image

All 27 measurements read the luminance of the image — the grayscale structure where generator artifacts live. Color statistics were implemented and tested too, as four separate feature families; every one of them turned out to fingerprint the source pipeline instead of the generator, and they were rejected.

One check stands apart from the statistics: known visible watermarks. Some consumer apps stamp their images (Gemini’s diamond, Kling’s wordmark) at exact, documented positions. Finding one settles the verdict on its own, because the thresholds are calibrated so that no clean photo in the reference corpora triggered them. Not finding one means nothing: APIs and paid tiers do not stamp, and a mark can simply be cropped away.

Two more channels are reported as evidence panels, shown beside the verdict but never mixed into it: a frequency-domain signature specific to Google’s image generators (Gemini/Imagen family, measured 0.55–0.70 AUC across surfaces), and provenance metadata — C2PA manifests, IPTC AI markers, generator EXIF tags — which is trivial to strip and therefore never trusted as pixel evidence.

Honest numbers

Most detectors advertise 95%+ accuracy, measured on images similar to their training data. This one is measured the hard way on purpose: trained once, then evaluated only on photos and generators from sources it has never seen.

The numbers are lower — and they are the ones you should expect on your own images, not a lab-only best case. Where current methods genuinely cannot tell, the verdict is “inconclusive” rather than a confident guess.

One more number matters for trust: when the verdict says likely AI-generated, fewer than 9% of real photos ever land there by mistake, held across four independent photo collections. The full protocol and per-generator tables are in the accompanying dissertation.

AUC on held-out corpora the model never trained on. 0.5 is a coin flip; 1.0 is perfect.
Generator family	AUC	Status
2022-era generators	0.88	covered
FLUX / Stable Diffusion lineage	0.66–0.82	covered
Seedream, Kling, Hunyuan	0.55–0.65	partial
Google family (Gemini, Imagen)	0.55–0.70	evidence panel
GPT-Image, Midjourney v7, Qwen, GLM	~0.5	abstains

Questions people actually ask

What happens to the images I upload?

Nothing stays on the server. Your image is measured in memory and discarded as soon as the verdict is computed. It is never written to disk, never logged and never used for anything else. The PDF report is built on demand and sent straight back to you. If an image is too private to send anywhere, you can run the whole pipeline locally from the GitHub repo and nothing leaves your machine.

Why does it sometimes say “inconclusive”?

Because sometimes the honest answer is “I can’t tell”, and I would rather say that than guess. Either the image came from one of the newest generators that current methods cannot detect from pixels alone, or it has been re-compressed so many times that the evidence is destroyed. In those cases the verdict is designed to degrade into “I don’t know” rather than into a confident wrong answer.

What is AUC?

Hand the tool one random real photo and one random AI image, and ask which is which. AUC is the probability that it ranks the pair correctly: 1.0 means always right, 0.5 means a coin flip. So 0.88 on 2022-era generators means it picks the AI image in about 88 of 100 such pairs, roughly 9 times out of 10.
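
The pairwise definition translates almost directly into code. The sketch below uses made-up scores and a hypothetical function name, and counts ties as half a correct ranking, which is the usual convention.

```python
def pairwise_auc(real_scores: list[float], ai_scores: list[float]) -> float:
    """Fraction of (real, AI) pairs in which the AI image gets the higher score."""
    correct = 0.0
    for ai in ai_scores:
        for real in real_scores:
            if ai > real:
                correct += 1.0
            elif ai == real:
                correct += 0.5  # a tie counts as half
    return correct / (len(ai_scores) * len(real_scores))


# Made-up scores: higher means "more AI-like".
real = [0.10, 0.20, 0.35, 0.50]
ai = [0.30, 0.45, 0.60, 0.90]
print(pairwise_auc(real, ai))  # 0.8125
```

Here 13 of the 16 possible pairs are ranked correctly, so the AUC is 13/16 = 0.8125.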

Other tools advertise 95%+. Why are these numbers lower?

Because the measurement is different, not the ambition. Numbers measured on data similar to the training set reward memorizing incidental details — which website the photos came from, how they were compressed. This model is scored only on sources it has never seen, and every feature had to prove it reads the image rather than the image’s history. Independent studies keep finding the same gap: detectors advertising 99% drop to roughly 78% on in-the-wild images. Lower and true beats high and lab-only.

Does it matter where my image has been — WhatsApp, screenshots, editing?

Where it has been: no. The features were selected so that compression history, resizing, and the site an image passed through cannot push the verdict either way. Very heavy processing can erase the evidence itself — in which case you get “inconclusive”, not a wrong answer.

Can it be fooled?

Yes, and the page says so. Determined laundering erases pixel evidence, metadata can always be stripped, and the newest closed-lab generators are largely undetectable from pixels alone — for every tool, not just this one. Treat the output as forensic evidence that supports a judgment, not as a verdict machine. More in the full FAQ.
