Schema.org Usage Dataset: What SEOs Can Actually Learn

TL;DR: Google and Schema.org now publish public usage buckets for Schema.org terms seen across unique domains. Use the data as an adoption benchmark for structured data audits, not as a ranking factor list or rich result report.

The useful part is the shape of adoption: a tiny head of common schema terms and a huge tail of rarely used vocabulary. The risky part is forgetting that the unit is domains, not pages.

Google and Schema.org have finally given structured data teams a public adoption reference.

On June 4, 2026, Schema.org announced the Schema.org Usage Statistics Dataset, created with Google and published in the official Schema.org GitHub repository. The files are available as CSV, JSON, and summary JSON. The first public snapshot covers May 2026.

That matters because structured data work has always had a gap between documentation and adoption. We can check Google's rich result rules. We can validate markup. We can audit templates. We can compare competitors. What we did not have was an official, crawl-scale reference showing which Schema.org terms appear across domains.

Now we do.

But the first mistake would be reading this as ranking data.

The dataset does not tell you which schema types boost rankings. It does not tell you which pages get rich results. It does not count JSON-LD blocks. It does not count URLs.

It tells you which Schema.org terms were observed across unique domains in Google's public crawl, grouped into adoption ranges.

That is still useful for schema audits, CMS defaults, ecommerce roadmaps, stakeholder education, and deciding when a niche term deserves custom implementation.

What Google And Schema.org Published

The official Schema.org announcement explains that the dataset reports high-level term usage across millions of domains. The companion documentation at schema.org/docs/usage_stats.html explains the main boundaries:

The data comes from Google's public web crawling infrastructure.
Frequencies are aggregated at the domain level.
Exact counts are not published.
Terms are grouped into buckets.
JSON-LD, Microdata, and RDFa are merged.
Sites blocked from Google's crawl are not included.

The May 2026 data file has three practical columns.

Field	What it means	Why SEOs should care
`Class`	Whether the row is an Itemtype or Predicate.	Helps separate types from properties during audits.
`Name`	The Schema.org term.	Lets you map detected or proposed schema to the official term list.
`Domain Bucket`	The unique-domain range where the term was observed.	Gives a public adoption benchmark, not a page count.

Natzir's Schema.org public usage explorer is a helpful way to browse the data. Use it for searching, filtering, and explaining the buckets to non-technical teams.

For final numeric claims, use the official CSV or JSON from the Schema.org repository.
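
As a minimal sketch of working from the file rather than the explorer, the snippet below reads CSV text into a term-to-bucket lookup. It assumes the header row uses the three column names listed above (`Class`, `Name`, `Domain Bucket`). The inline sample rows are stand-ins built from tables in this article, and `load_buckets` is a hypothetical helper name.

```python
import csv
import io

# Inline stand-in for the official CSV. The header names and the
# Itemtype/Predicate values are assumed from the field table above.
SAMPLE_CSV = """Class,Name,Domain Bucket
Itemtype,BreadcrumbList,10M+
Itemtype,Product,1M - 10M
Predicate,price,1M - 10M
Itemtype,MerchantReturnPolicy,100K - 1M
"""

def load_buckets(csv_text):
    """Return {term name: domain bucket} from CSV text with the assumed header."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["Name"]: row["Domain Bucket"] for row in reader}

buckets = load_buckets(SAMPLE_CSV)
print(buckets["Product"])  # 1M - 10M
```

Term names are case-sensitive here, so `price` and a hypothetical `Price` would be treated as different keys.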

The May 2026 Numbers

The May 2026 snapshot contains 5,545 Schema.org terms.

Metric	Value
Total terms in May 2026 file	5,545
Types / Itemtypes	958
Properties / Predicates	4,587
Terms in `10M+` domain bucket	43
Terms in `1M - 10M` domain bucket	100
Terms in `100K - 1M` domain bucket	158
Terms in `10K - 100K` domain bucket	420
Terms in `1K - 10K` domain bucket	560
Terms in `< 1K` domain bucket	4,264
Share of all terms in `< 1K` bucket	76.9%
Share of all terms below 10K domains	87.0%
Share of all terms at 1M+ domains	2.6%

Figure: Bar chart of Schema.org usage buckets in the May 2026 public stats file. The file groups terms into unique-domain buckets, not page or URL counts, and most terms sit under one thousand domains.

The shape is clear: a small head, a huge tail.

Only 43 terms sit in the 10M+ domain bucket. Only 143 terms sit at 1M+ domains. Most terms sit below 10K domains, and more than three quarters sit in the < 1K bucket.

That does not make the long tail worthless. It means the common web has a small set of reusable structured data patterns, while specialized vocabulary serves specialized content.
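
The share figures in the table can be re-derived from the bucket counts alone. This short sketch uses only the numbers quoted above; `share` is a hypothetical helper name.

```python
# Term counts per bucket, copied from the May 2026 table above.
BUCKET_COUNTS = {
    "10M+": 43,
    "1M - 10M": 100,
    "100K - 1M": 158,
    "10K - 100K": 420,
    "1K - 10K": 560,
    "< 1K": 4264,
}

total = sum(BUCKET_COUNTS.values())

def share(*labels):
    """Percentage of all terms in the given buckets, rounded to one decimal."""
    return round(100 * sum(BUCKET_COUNTS[label] for label in labels) / total, 1)

print(total)                      # 5545
print(share("< 1K"))              # 76.9
print(share("< 1K", "1K - 10K"))  # 87.0
print(share("10M+", "1M - 10M"))  # 2.6
```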

Read The Unit Correctly: Domains, Not Pages

The most important caveat is the unit of measurement.

This is unique-domain data.

If a retail site uses Product schema on 500,000 URLs, it contributes one domain to the Product bucket. If a publisher marks up every article with Article, that domain still contributes one domain to the Article bucket.

That means the dataset answers this question:

Across how many unique domains did Google's public crawl observe this Schema.org term, grouped into ranges?

It does not answer these questions:

How many pages use the term?
How many markup objects exist?
How many rich results were triggered?
How many impressions or clicks came from pages with the term?
Which syntax was used?
Which implementations were valid?

This boundary matters in client work. SEO teams usually audit URLs, templates, and page types. This dataset audits adoption breadth by domain.

Use it as a benchmark. Do not treat it as a crawl report for your site.
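
To make the unit concrete, here is a small sketch of how page-level observations collapse into a unique-domain count. The URLs are invented, and using the URL hostname as the "domain" is an assumption made for illustration; this article does not describe how the dataset normalizes domains.

```python
from urllib.parse import urlparse

# Hypothetical crawl observations: (page URL, Schema.org type seen on it).
observations = [
    ("https://shop.example/p/1", "Product"),
    ("https://shop.example/p/2", "Product"),
    ("https://shop.example/p/3", "Product"),
    ("https://news.example/a/1", "Article"),
    ("https://news.example/a/2", "Product"),
]

def domains_per_term(rows):
    """Map each term to the set of hostnames where it appeared at least once."""
    seen = {}
    for url, term in rows:
        seen.setdefault(term, set()).add(urlparse(url).hostname)
    return seen

for term, hosts in sorted(domains_per_term(observations).items()):
    print(term, len(hosts))
# Article 1
# Product 2
```

Four pages carry Product, but only two hostnames do, so Product counts as two in this toy example.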

Buckets Are Not Exact Counts

The dataset groups terms into ranges:

Bucket	Meaning
`10M+`	Seen on more than ten million domains.
`1M - 10M`	Seen on one million to ten million domains.
`100K - 1M`	Seen on one hundred thousand to one million domains.
`10K - 100K`	Seen on ten thousand to one hundred thousand domains.
`1K - 10K`	Seen on one thousand to ten thousand domains.
`< 1K`	Seen on fewer than one thousand domains.

That is precise enough for prioritization. It is not precise enough for claims like "Google found 8,221,409 Product implementations."

Say instead: "Product is in the 1M - 10M domain bucket in the May 2026 file." Do not invent exact counts.
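
A quick sketch shows why an exact count cannot be recovered from a bucket: very different counts map to the same label. The function name `bucket_for` is hypothetical, and treating each lower bound as inclusive is an assumption, since the table above does not say how edge values are assigned.

```python
# (lower bound, label), highest first. Inclusive lower bounds are an
# assumption made for this sketch, not a documented rule.
BUCKETS = [
    (10_000_000, "10M+"),
    (1_000_000, "1M - 10M"),
    (100_000, "100K - 1M"),
    (10_000, "10K - 100K"),
    (1_000, "1K - 10K"),
    (0, "< 1K"),
]

def bucket_for(domain_count):
    """Return the bucket label for a unique-domain count."""
    if domain_count < 0:
        raise ValueError("domain_count must be non-negative")
    for lower, label in BUCKETS:
        if domain_count >= lower:
            return label

print(bucket_for(1_200_000))  # 1M - 10M
print(bucket_for(8_221_409))  # 1M - 10M
```

Both inputs land in the same bucket, so the published label cannot tell them apart.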

The Dataset Does Not Separate JSON-LD, Microdata, And RDFa

Many SEOs will want to use this dataset to argue about syntax.

Do not.

Schema.org's usage documentation says JSON-LD, Microdata, and RDFa are merged into one statistic. That means the dataset cannot prove which markup syntax is used more.

For syntax decisions, use Google's structured data documentation, your CMS constraints, developer maintenance cost, validation results, and the reliability of your source data.

What The High-Adoption Terms Tell Us

The top bucket is mostly structural.

Type	May 2026 domain bucket	Practical interpretation
`BreadcrumbList`	10M+	Common site navigation and hierarchy markup.
`WebSite`	10M+	Site-level identity and search actions.
`WebPage`	10M+	Generic page description layer.
`Organization`	10M+	Brand, company, publisher, and entity markup.
`Person`	10M+	Author, profile, and person references.
`SearchAction`	10M+	Site search action markup.

That lines up with how structured data ships on many sites: SEO plugins, CMS defaults, article templates, breadcrumbs, organization blocks, and page-level schema.

The next tier moves into commerce, reviews, articles, video, and local business data.

Type	May 2026 domain bucket	Where it usually matters
`Product`	1M - 10M	Product detail pages and ecommerce catalogs.
`Offer`	1M - 10M	Price, availability, and sales terms.
`Review`	1M - 10M	Review content and product feedback.
`AggregateRating`	1M - 10M	Aggregate rating summaries where policy allows.
`FAQPage`	1M - 10M	FAQ content, with current Google visibility limits in mind.
`Article`	1M - 10M	Editorial pages and content publishing.
`BlogPosting`	1M - 10M	Blog templates and post archives.
`VideoObject`	1M - 10M	Pages with real video assets.
`LocalBusiness`	1M - 10M	Local entity pages where the business is represented.
`Service`	1M - 10M	Service pages with clear service information.

This helps with prioritization.

If an ecommerce site has product pages with missing or invalid Product, Offer, price, availability, sku, and brand data, that is a practical schema issue. These concepts sit in high adoption ranges because many sites and platforms need them.

That does not mean every implementation is correct. It means the concepts are common enough that your site should have a clear reason if they are absent from pages that support them.
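
As a rough sketch of that check, the function below reports which of the properties named above are absent from a single Product JSON-LD object. The property lists mirror this article's wording, not Google's required-field rules, and the helper names are hypothetical. It handles only one top-level object with a single `offers` object; `@graph` wrappers, offer lists, and multi-typed nodes would need extra handling.

```python
import json

# Properties named in the paragraph above. Not an official requirements list.
PRODUCT_FIELDS = ("sku", "brand")
OFFER_FIELDS = ("price", "availability")

def is_blank(value):
    """Treat None and empty strings as missing; 0 is a real value."""
    return value is None or value == ""

def missing_product_fields(jsonld_text):
    """List expected Product/Offer properties absent from one JSON-LD object."""
    data = json.loads(jsonld_text)
    if data.get("@type") != "Product":
        return ["Product"]
    missing = [f for f in PRODUCT_FIELDS if is_blank(data.get(f))]
    offer = data.get("offers")
    if not isinstance(offer, dict):
        return missing + ["offers"]
    return missing + [f"offers.{f}" for f in OFFER_FIELDS if is_blank(offer.get(f))]

sample = """{
  "@context": "https://schema.org",
  "@type": "Product",
  "name": "Example mug",
  "sku": "MUG-001",
  "offers": {"@type": "Offer", "price": "12.00"}
}"""
print(missing_product_fields(sample))  # ['brand', 'offers.availability']
```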

This connects with the machine-readable commerce work I covered in "Google Universal Cart: AI Agents Change Ecommerce SEO." The usage dataset does not measure AI agent readiness, but it can help ecommerce teams separate baseline product data from custom schema maturity work.

High Adoption Is Not The Same As SEO Value

A high domain bucket means many domains use a term. It does not mean Google rewards the term with higher rankings.

For example, FAQPage appears in the 1M - 10M domain bucket in the May 2026 snapshot. That does not undo Google's reduced FAQ rich result visibility for most sites.

If your team is reviewing old FAQ markup, pair this dataset with the practical guidance in "Google Removed FAQ Rich Results. What Should SEO Teams Do Now?" Adoption can tell you that a term is widely used. It cannot tell you that a rich result still appears for your page.

The same rule applies across structured data. Common terms deserve attention because they may represent baseline page meaning, CMS defaults, or commerce requirements. They still need page-level validation and a real business case.

The Long Tail Is Where Teams Need Judgment

The May 2026 file puts 4,264 terms in the < 1K bucket. That is 76.9% of all terms.

It would be a mistake to treat that bucket as a discard pile.

Some Schema.org terms are narrow because they belong to narrow content types. Medical, education, government, scientific, event, transport, and dataset vocabulary will never appear on as many domains as name, url, image, or WebPage.

Some terms are newer. Some need source data that many CMS setups do not store. Some lack a visible rich result, so plugin authors rarely make them defaults.

A rare term can still be the right term.

The decision should be based on fit:

Question	Good reason to use a low-bucket term	Weak reason to use it
Does the page truly match the term?	A dataset page using `Dataset`.	A generic blog post adding `Dataset` because the term exists.
Is the information visible or supported?	A course page with real course metadata.	A template creating fields users never see.
Is source data reliable?	A product system with maintained return policy fields.	Manual fields that decay after launch.
Does the business case matter?	A regulated publisher using precise domain vocabulary.	A sitewide schema block copied from a competitor.

Low adoption should trigger better questions, not automatic rejection.

Figure: Vertical infographic on using Schema.org public usage buckets in a structured data audit. The buckets are an input for judgment: adoption context first, then page fit, visible content, source data, validation, and business value.

How To Use This In A Schema Audit

Add a "Public usage bucket" column to your structured data audit.

Then use the bucket as context, not the verdict.

Term	Current site status	May 2026 bucket	Audit action
`BreadcrumbList`	Missing from article templates	10M+	Add if breadcrumbs are visible and template data is stable.
`Product`	Present but missing price and availability	1M - 10M	Fix source data and validation before adding new schema ideas.
`MerchantReturnPolicy`	Not implemented	100K - 1M	Consider if return policy fields are reliable and relevant.
`Dataset`	Proposed for a research library	10K - 100K	Use where the page is a real dataset page.
`MedicalCondition`	Proposed for generic wellness posts	10K - 100K	Use only when the content and policy context support it.

This changes the meeting.

Instead of saying "we should add schema because competitors have it," you can say:

This term is common across domains, our page type supports it, and our CMS has the right fields.

Or:

This term is valid for a niche use case, but our page does not support it and our source data is weak.

That is how the dataset earns its place in SEO work.
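
The audit column described in this section can be filled mechanically. A minimal sketch, assuming you already have a term-to-bucket lookup built from the official file: the lookup, the audit rows, and `add_bucket_column` are all hypothetical, and `ExampleUnknownTerm` is a made-up name used to show the fallback.

```python
# Hypothetical lookup; in practice, build it from the official CSV or JSON.
BUCKET_BY_TERM = {
    "BreadcrumbList": "10M+",
    "Product": "1M - 10M",
    "MerchantReturnPolicy": "100K - 1M",
}

# Hypothetical audit rows: (term, current site status).
audit_rows = [
    ("BreadcrumbList", "Missing from article templates"),
    ("Product", "Present but missing price and availability"),
    ("ExampleUnknownTerm", "Proposed by a vendor"),
]

def add_bucket_column(rows, lookup):
    """Append the public usage bucket to each row; flag terms not in the file."""
    return [(term, status, lookup.get(term, "not in file")) for term, status in rows]

for row in add_bucket_column(audit_rows, BUCKET_BY_TERM):
    print("\t".join(row))
```

The "not in file" flag is worth keeping visible: a term missing from the lookup may be misspelled or may not be a Schema.org term at all.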

A Practical Decision Workflow

Use this sequence before adding or removing structured data.

Step	Question	Output
1. Bucket check	Where does the term sit in the May 2026 public stats file?	Adoption context.
2. Eligibility check	Does Google document a relevant search feature for this page type?	Search feature context.
3. Page-content check	Is the information visible or clearly represented on the page?	Accuracy check.
4. Source-data check	Can the CMS, feed, PIM, CRM, or editorial system maintain it?	Maintenance check.
5. Validation check	Does the markup pass Schema.org and Google validation tools?	Technical QA.
6. Business-value check	Does it help search engines, AI systems, commerce systems, QA, or internal tooling understand the page?	Priority call.
7. Monitoring check	Can you detect regressions after template, plugin, or feed changes?	Ongoing control.

Figure: Structured data audit workflow. Use the public usage bucket as one input, then test the schema against eligibility, page content, source data, validation, business value, and monitoring.

The public bucket is step one. It is not the final answer.
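
One way to keep the sequence honest is to record an answer per step and list whatever is still open. This is only a sketch: the step keys are shorthand for the table rows above, `open_steps` is a hypothetical helper, and the final decision stays with the auditor.

```python
# Shorthand keys for the seven steps in the table above, in order.
WORKFLOW = [
    "bucket",
    "eligibility",
    "page_content",
    "source_data",
    "validation",
    "business_value",
    "monitoring",
]

def open_steps(answers):
    """Return steps, in table order, that are unanswered or answered 'no'."""
    return [step for step in WORKFLOW if not answers.get(step)]

# Hypothetical review of Product markup on one template.
answers = {
    "bucket": "1M - 10M",   # adoption context from the public file
    "eligibility": True,
    "page_content": True,
    "source_data": False,   # e.g. the price field is not maintained
    "validation": True,
    "business_value": True,
}
print(open_steps(answers))  # ['source_data', 'monitoring']
```

The bucket is stored as context only. It never closes or fails a review on its own.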

For teams working on AI search visibility, this same discipline matters. In "Google's AI Search Documentation Is Finally Here," the practical point was that AI search still depends on crawlable, understandable, well-structured content. Schema.org usage data can support that work, but it does not replace content quality, source reliability, or search policy.

What CMS And Plugin Teams Should Do

If you build SEO tooling, WordPress plugins, Shopify apps, headless CMS modules, or ecommerce templates, this dataset is a useful benchmark.

It can help you decide:

Which common terms deserve first-class defaults.
Which fields need admin UI support.
Which schema modules should stay optional.
Which niche terms need warnings, docs, or source-data requirements.
Which existing defaults may be too generic for the page type.

For example, article templates should handle Article or BlogPosting, author, publisher, headline, image, date published, and date modified cleanly. Product templates should not emit half-populated Product markup without reliable offer data. Local templates should not place LocalBusiness markup on every page when the page does not represent a real local entity.

The dataset will not design your defaults for you. It gives you a public adoption reference to test those defaults against.
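
For the Product case, a template-side guard might look like the sketch below: emit nothing when offer data is incomplete. The record keys (`name`, `price`, `currency`, `availability`) are a hypothetical CMS shape, and the required set is an illustrative choice for this sketch, not a Google requirement list.

```python
import json

# Hypothetical CMS record keys that must be present before markup is emitted.
REQUIRED_KEYS = ("name", "price", "currency", "availability")

def product_jsonld(record):
    """Build Product JSON-LD from a record, or return None if data is incomplete."""
    if any(record.get(key) in (None, "") for key in REQUIRED_KEYS):
        return None  # emit nothing rather than half-populated markup
    return json.dumps(
        {
            "@context": "https://schema.org",
            "@type": "Product",
            "name": record["name"],
            "offers": {
                "@type": "Offer",
                "price": str(record["price"]),
                "priceCurrency": record["currency"],
                "availability": record["availability"],
            },
        },
        indent=2,
    )

print(product_jsonld({"name": "Example mug", "price": 12}))  # None
```

Returning `None` lets the template skip the script tag entirely, which is easier to monitor than markup with empty fields.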

What Ecommerce Teams Should Do

Ecommerce teams should read the dataset beside product feed quality, Merchant Center setup, return policies, availability data, review policy, and checkout readiness.

Several commerce terms sit in high adoption ranges:

Commerce term	May 2026 bucket
`Product`	1M - 10M
`Offer`	1M - 10M
`price`	1M - 10M
`priceCurrency`	1M - 10M
`availability`	1M - 10M
`sku`	1M - 10M
`brand`	1M - 10M
`aggregateRating`	1M - 10M
`review`	1M - 10M
`MerchantReturnPolicy`	100K - 1M

This does not prove that stores with those terms perform better.

It does show that product, offer, pricing, availability, brand, SKU, review, and policy data are common machine-readable commerce concepts. If your store lacks them, the next step is not chasing exotic schema. The next step is fixing the product data layer.

What Not To Do With The Dataset

Do not turn adoption into ranking claims.
Do not compare the buckets to page counts.
Do not claim the file proves JSON-LD usage share.
Do not remove accurate niche schema only because it sits in a low bucket.
Do not add a high-bucket term when the page does not support it.
Do not use Natzir's explorer as the final numeric source when the official CSV and JSON are available.
Do not treat the May 2026 snapshot as permanent. Schema.org says the files are planned for monthly updates, so rerun your comparisons when new snapshots arrive.
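
Rerunning comparisons across snapshots can be as simple as diffing two term-to-bucket lookups. In this sketch the June values and `ExampleNewTerm` are invented for illustration; only the May buckets for Product and Dataset come from this article, and `bucket_changes` is a hypothetical helper.

```python
def bucket_changes(old, new):
    """Return {term: (old_bucket, new_bucket)} for terms whose bucket differs.

    None marks a term that is absent from one of the two snapshots.
    """
    changes = {}
    for term in sorted(set(old) | set(new)):
        before, after = old.get(term), new.get(term)
        if before != after:
            changes[term] = (before, after)
    return changes

# Hypothetical snapshots; the June side is made up for this example.
may = {"Product": "1M - 10M", "Dataset": "10K - 100K"}
june = {"Product": "1M - 10M", "Dataset": "100K - 1M", "ExampleNewTerm": "< 1K"}

print(bucket_changes(may, june))
# {'Dataset': ('10K - 100K', '100K - 1M'), 'ExampleNewTerm': (None, '< 1K')}
```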

Practical Takeaway

The Schema.org usage dataset gives SEOs a public adoption benchmark for structured data vocabulary.

Use it to make audits sharper. Use it to help stakeholders understand why baseline schema matters. Use it to separate common template issues from niche enhancements. Use it to improve plugin and CMS defaults.

Keep the boundary clear:

Domains, not pages.
Buckets, not exact counts.
Google's public crawl, not the full web.
Merged formats, not syntax share.
Adoption data, not ranking data.

The best use of the dataset is simple: make structured data decisions less anecdotal.

Start with the bucket. Then validate the page, the policy, the source data, the markup, and the business value.

FAQ

Is the Schema.org usage dataset a Google ranking factor list?

No. It is an adoption dataset. A high domain bucket means a term appears across many unique domains in Google's public crawl. It does not mean the term has ranking weight.

Does the dataset count pages or URLs?

No. It counts unique domains. A term used on many pages of the same domain still contributes one domain to that term's bucket.

Does it separate JSON-LD, Microdata, and RDFa?

No. Schema.org's usage documentation says those formats are combined into one statistic, so the dataset cannot answer syntax-share questions.

Should I remove schema types that sit in the < 1K bucket?

No. Low adoption can mean a term is specialized, new, difficult to populate, or relevant to a narrow content model. Remove it only if it is inaccurate, unsupported, invalid, or has no business case.

Should I add every term in a high bucket?

No. Add schema only when it accurately describes the page and can be maintained from reliable data. A high bucket gives context, not permission to mark up unrelated content.

What is the best way to use this in an SEO audit?

Add the public usage bucket as a column, then review each term against page eligibility, visible content, source data, validation, business value, and monitoring.

Is Natzir's explorer official?

No. It is a helpful third-party interface for exploring the official public stats dataset. Use the official Schema.org GitHub CSV or JSON for final numeric claims.

Does this change Google's structured data guidelines?

No. It gives the community adoption data. You still need to follow Google's structured data documentation, quality rules, and rich result requirements for the specific page type.

About the Author

Francisco Leon de Vivero is VP of Growth at Growing Search and a global SEO expert with 15+ years of experience across enterprise, ecommerce, and international search. He previously led Global SEO at Shopify and has worked on large-scale technical SEO systems across multinational brands.
