Image and Asset Validation for Google Shopping Feeds

Pre-publish image validation for Google Shopping: dimensions, overlay detection, CDN stability, lifestyle vs gallery split, and recovery from image disapprovals.

Maya Singh on February 28, 2026 · Updated May 9, 2026
Image errors are the most expensive class of Google Shopping disapproval, not because they’re hard to fix, but because they’re the slowest to recover from. Once Google has cached a broken image URL, the SKU stays disapproved until the next full image re-crawl. That can take 24 hours, sometimes longer for low-volume catalogs. Catching image errors before publish is worth more than catching almost any other category of error.

Why image validation is a separate problem

Most feed validation runs at the data layer: does this field exist, is it the right type, does it match the spec? Image validation requires actually fetching the URL, decoding the file, and inspecting the pixels. That makes it expensive, which is why most feed managers don’t do it by default. The result: image errors are the most common surprise at publish time.

A pre-publish image gate isolates the failure mode. Better to spend 5 minutes verifying 100 image URLs than to spend the next 24 hours waiting for a re-crawl after a disapproval wave hits Merchant Center.

The Google image spec in plain English

Google’s image requirements cover technical constraints (dimensions, file size, format) and policy constraints (content, overlays, framing). Both matter, but policy violations are disapproved faster.

Technical

  • Non-apparel: minimum 100x100px, recommended 800x800px+
  • Apparel: minimum 250x250px, recommended 1200x1500px (portrait)
  • Maximum 64 megapixels, maximum 16MB file size
  • Supported formats: JPEG, PNG, non-animated GIF, BMP, TIFF, WebP
  • URL must resolve to a real image (not a placeholder, not a 404)

Policy

  • No overlays of any kind on the primary image_link: no text, logos, watermarks, badges, or borders
  • Primary image must show the actual product; placeholder images, illustrations, and stock photos are disapproved
  • Adult/restricted category imagery has additional rules per region
  • Single product per primary image (no multipack collages in image_link; use additional_image_link for those)

Pre-publish image gate (the checklist)

The minimum set of checks to run before every feed push:

1. URL resolves and image decodes

A surprising number of “image_link_broken” disapprovals come from URLs that return a 200 status but serve HTML (a “soft 404” page) or a corrupted file. Fetch the URL, check that the content-type is image/*, decode the file, and confirm the dimensions are non-zero.
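
A minimal sketch of this check, using only the Python standard library. The function names are hypothetical, and the PNG header parse is a stand-in for a real decode: it only proves the file starts like a PNG and declares non-zero dimensions. In a real pipeline you would likely hand the bytes to a full image decoder that covers every supported format.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_dimensions(body: bytes):
    """Return (width, height) from the PNG IHDR chunk, or None if not a PNG."""
    if len(body) < 24 or not body.startswith(PNG_SIGNATURE):
        return None
    if body[12:16] != b"IHDR":
        return None
    # Width and height are big-endian unsigned 32-bit integers.
    return struct.unpack(">II", body[16:24])


def check_image_response(status: int, content_type: str, body: bytes) -> list[str]:
    """Return a list of problems; an empty list means the response passed."""
    problems = []
    if status != 200:
        problems.append(f"http_status_{status}")
    if not content_type.lower().startswith("image/"):
        # Catches the soft 404: a 200 status with an HTML body.
        problems.append("not_image_content_type")
    dims = png_dimensions(body)
    if dims is None:
        problems.append("undecodable")
    elif 0 in dims:
        problems.append("zero_dimension")
    return problems
```

Passing the status, content-type header, and body in as arguments keeps the check independent of whichever HTTP client fetches the URL.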

2. Dimensions meet category minimum

Apparel and non-apparel have different floors. Most teams set a single high floor (800x800) and don’t worry about the spec minimums, which is easier than tracking per-category rules.
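
One way to express both the spec minimums and a single house floor, as a rough sketch. The category keys, verdict strings, and function names are illustrative assumptions; the numbers come from the spec summary earlier in this post.

```python
# Spec minimums per category, plus one stricter house floor for everything.
SPEC_MINIMUMS = {"apparel": (250, 250), "non_apparel": (100, 100)}
HOUSE_FLOOR = (800, 800)


def meets_floor(width: int, height: int, floor=HOUSE_FLOOR) -> bool:
    """True when both dimensions are at or above the given floor."""
    return width >= floor[0] and height >= floor[1]


def dimension_verdict(width: int, height: int, category: str) -> str:
    """Separate hard spec failures from images that only miss the house floor."""
    if not meets_floor(width, height, SPEC_MINIMUMS[category]):
        return "below_spec_minimum"
    if not meets_floor(width, height):
        return "below_house_floor"
    return "ok"
```

Keeping the two verdicts separate lets you block on spec failures while treating house-floor misses as warnings, if you prefer a softer rollout.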

3. No text overlay

Run OCR (Tesseract is fine for this; cloud OCR APIs are faster for high volume) and flag any image that returns recognisable text. False positives are common: a logo embossed on a product is text but isn’t an overlay. So this becomes a human-review queue rather than an auto-reject. The auto-reject path catches obvious cases (promotional badges, “20% off” text); the manual queue catches edge cases.
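
The triage step after OCR might look something like this. It assumes the OCR text has already been extracted by whichever engine you use; the pattern list and the three verdict names are hypothetical and would need tuning against your own catalog.

```python
import re

# Hypothetical patterns for obviously promotional text.
PROMO_PATTERNS = [
    re.compile(r"\b\d{1,3}\s?%\s?off\b", re.IGNORECASE),
    re.compile(r"\b(sale|free shipping|limited offer)\b", re.IGNORECASE),
]


def triage_ocr_text(ocr_text: str) -> str:
    """Route an image based on the text OCR found in it."""
    text = " ".join(ocr_text.split())  # collapse whitespace and newlines
    if not text:
        return "pass"  # no recognisable text at all
    if any(pattern.search(text) for pattern in PROMO_PATTERNS):
        return "auto_reject"  # clearly promotional
    return "human_review"  # could be on-product text, so a person decides
```

Anything with text that does not match a promotional pattern goes to the review queue rather than being rejected, which mirrors the split described above.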

4. Background is neutral (for non-apparel primary)

Hard to automate fully. A reasonable proxy: sample the corner pixels of the image and confirm they’re within a near-white range. Lifestyle photos fail this check, which is correct: they shouldn’t be the primary image_link.
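
A sketch of the corner-sampling proxy. It assumes the image has already been decoded into rows of (r, g, b) tuples; the threshold of 240 and the 2x2 patch size are arbitrary starting points, not values from Google’s spec.

```python
def corners_near_white(pixels, threshold: int = 240, patch: int = 2) -> bool:
    """Check a small patch in each corner of a decoded image.

    pixels is a row-major list of rows, each row a list of (r, g, b) tuples.
    A pixel counts as near-white when every channel is >= threshold.
    """
    if not pixels or not pixels[0]:
        return False
    height, width = len(pixels), len(pixels[0])
    patch = min(patch, height, width)  # guard against tiny images
    rows = list(range(patch)) + list(range(height - patch, height))
    cols = list(range(patch)) + list(range(width - patch, width))
    # rows x cols covers exactly the four corner patches.
    return all(min(pixels[y][x]) >= threshold for y in rows for x in cols)
```

Only the corners are inspected, so a dark product in the centre of a white background still passes, while an edge-to-edge lifestyle shot usually does not.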

5. URL is stable

Check the image URL against your last successful crawl. If the path changed (different hash, different bucket, different query string), flag it. URL changes are the single most common cause of “random” image disapproval waves after a deploy.
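
As a rough illustration, the comparison against the last successful crawl could be done with the standard library’s URL parser. The function names and the dict-of-SKU-to-URL shape are assumptions made for this sketch.

```python
from urllib.parse import urlsplit


def url_changes(previous_url: str, current_url: str) -> list[str]:
    """List which parts of an image URL differ between two publishes."""
    prev, curr = urlsplit(previous_url), urlsplit(current_url)
    changes = []
    if prev.netloc.lower() != curr.netloc.lower():
        changes.append("host")
    if prev.path != curr.path:
        changes.append("path")
    if prev.query != curr.query:
        changes.append("query")
    return changes


def unstable_skus(last_crawl: dict, current_feed: dict) -> dict:
    """Map each SKU whose URL changed to the parts that changed.

    Both arguments map SKU -> image URL. SKUs that are new in the
    current feed have no baseline and are skipped.
    """
    flagged = {}
    for sku, url in current_feed.items():
        if sku in last_crawl:
            changes = url_changes(last_crawl[sku], url)
            if changes:
                flagged[sku] = changes
    return flagged
```

Reporting which part changed (host, path, or query string) tends to point straight at the cause: a new bucket, a build hash in the path, or a cache-busting parameter.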

Beyond the primary image, additional_image_link accepts up to 10 additional images. Lifestyle photos, detail shots, packaging, and scale references all belong there. Missing additional images don’t disapprove the SKU, but they materially improve CTR because Google’s product detail panel uses them.

Image URL architecture that survives at scale

The teams that don’t fight image disapprovals all do the same things:

Pin image URLs to a stable bucket. Don’t include build hashes, version numbers, or deploy IDs in the path. The image URL for a product should be the same today as it was last year unless the product image actually changed.

Separate marketing images from feed images. Maintain two fields in your CMS: marketing_image_url (with badges, overlays, lifestyle context for site merchandising) and feed_image_url (clean product on neutral background for Merchant Center). The feed pipeline only ever pulls the clean version.

Don’t sign URLs with short TTLs. Google’s image cache uses the URL as the key. If you sign your image URLs with a 1-hour TTL, any re-fetch of that URL after the signature expires returns a 401/403, and the image reads as broken.

Use a CDN with edge caching but pin the canonical origin. Cloudflare, Fastly, BunnyCDN all work. What matters is that the origin path is stable and the CDN doesn’t rewrite URLs.

Recovery from an image disapproval wave

When you wake up to a sea of image_link_broken or invalid_image_overlay disapprovals, the recovery shape is different from data-error recovery.

Step 1: don’t push the corrected feed yet. First, regenerate the image URLs at the CDN and warm the cache by hitting each new URL once. This pre-populates the CDN edge cache, so Google’s image fetcher gets a fast, successful response when it does come around.

Step 2: push the corrected feed. Include only the affected SKUs as a slice. Full feed pushes for image fixes are no faster and cost you visibility into which SKUs recovered first.

Step 3: monitor via Diagnostics, not the UI. The Diagnostics CSV updates every 1-4 hours. The UI’s per-SKU status can lag much longer.

Step 4: budget 24-72 hours. Image re-crawl is slower than data re-crawl. Don’t keep pushing the same fix every 2 hours: it doesn’t help and burns your fetch quota.
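
Steps 1 and 2 can be sketched as a small planning function. This is illustrative only: it assumes feed rows are dicts carrying the id and image_link attributes, and the fetch callable (URL in, HTTP status out) is a hypothetical hook you would back with your own HTTP client.

```python
def build_recovery_slice(feed_rows, disapproved_ids):
    """Keep only the rows for SKUs that were disapproved."""
    wanted = set(disapproved_ids)
    return [row for row in feed_rows if row["id"] in wanted]


def warm_cache(urls, fetch):
    """Hit each URL once; return {url: status} for anything that is not a 200."""
    failures = {}
    for url in urls:
        status = fetch(url)
        if status != 200:
            failures[url] = status
    return failures


def plan_recovery(feed_rows, disapproved_ids, fetch):
    """Warm the affected image URLs, then return the rows that are safe to push."""
    affected = build_recovery_slice(feed_rows, disapproved_ids)
    failures = warm_cache([row["image_link"] for row in affected], fetch)
    ready = [row for row in affected if row["image_link"] not in failures]
    return ready, failures
```

Rows whose image still fails to warm are held back from the slice, so a partly fixed CDN doesn’t restart the 24-72 hour clock for those SKUs.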

What an image audit catches that the data audit misses

A typical feed audit (the kind built into most feed managers) checks the URL is non-empty and matches a regex. That’s not enough. The image-level audit needs to:

  • Decode the file and confirm it’s actually an image
  • Compare current dimensions vs your minimum threshold
  • Detect overlays via OCR
  • Compare URL stability across publishes
  • Track per-SKU image change frequency (a SKU whose image URL changes weekly is almost always a CDN configuration problem, not a product problem)
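
The last item, per-SKU change frequency, is simple to track once you keep a snapshot of SKU-to-URL mappings per publish. A minimal sketch, with hypothetical function names and an arbitrary default threshold:

```python
def image_change_counts(publish_history):
    """Count how often each SKU's image URL changed across publishes.

    publish_history is a list of {sku: image_url} dicts, oldest first.
    """
    last_seen, counts = {}, {}
    for snapshot in publish_history:
        for sku, url in snapshot.items():
            counts.setdefault(sku, 0)
            if sku in last_seen and last_seen[sku] != url:
                counts[sku] += 1
            last_seen[sku] = url
    return counts


def churning_skus(publish_history, max_changes: int = 1):
    """Return SKUs whose URL changed more often than max_changes, sorted."""
    counts = image_change_counts(publish_history)
    return sorted(sku for sku, n in counts.items() if n > max_changes)
```

A SKU that shows up in this list week after week is a signal to look at the CDN or export configuration rather than the product record.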

AI Shopping Feeds runs all five checks pre-publish for catalogs of any size, with an OCR overlay detector that’s tuned to ignore on-product text (embossed logos, model numbers stamped on the product) while flagging genuine promotional badges. Catalogs running this gate typically see image-related disapprovals drop below 0.5% per publish, vs the 3-8% range we see on unmanaged feeds.

Image checklist (printable)

  • Every image URL fetches a real image, content-type starts with image/
  • Every primary image is at least 800x800px
  • No primary image contains promotional text overlay
  • Primary images use a neutral background (or model for apparel)
  • Image URLs are stable across deploys (no hash in path, no signed-URL TTL under 30 days)
  • additional_image_link populated for hero SKUs (lifestyle, detail, scale)
  • Pre-publish cache warm step in the export pipeline
