From SEO to AI: The Expanding Role of Structured Data
Schema.org launched in 2011 as a collaboration between Google, Bing, and Yahoo, with Yandex joining later that year, to standardize how websites communicate structured information to search engines. JSON-LD (JavaScript Object Notation for Linked Data) was developed separately through the W3C and became a W3C Recommendation in 2014; it is now the syntax Google recommends for Schema.org markup. A webpage about a restaurant can include JSON-LD markup specifying its name, address, cuisine type, opening hours, and price range in a machine-readable format, enabling rich search results.
For a decade, structured data was primarily an SEO tool. In 2026, it is becoming something more fundamental: the primary mechanism by which AI systems understand, cite, and reason about web content. Large language models are trained on web-scale text, but they struggle to reliably extract structured facts from unstructured prose. JSON-LD provides those facts in a format that is unambiguous, machine-readable, and directly consumable by AI systems.
The evidence for this shift is accumulating. Microsoft's guidance for Bing Chat (now Copilot) explicitly recommends structured data markup for AI visibility. Google's Search Generative Experience (now AI Overviews) appears to preferentially cite pages with structured data. Perplexity, Claude, and other AI assistants are increasingly using structured data to extract facts for citation. The ROI of structured data has shifted from "better rich snippets" to "AI system visibility."
JSON-LD Fundamentals for AI
JSON-LD is a method of encoding Linked Data using JSON. It uses a context (the @context key) to map JSON property names to IRIs (Internationalized Resource Identifiers) from a vocabulary like Schema.org, and a type (the @type key) to specify what kind of thing is being described. A JSON-LD block embedded in a webpage's HTML sits inside a <script type="application/ld+json"> element, usually in the page head.
The key advantage of JSON-LD over other structured data formats (Microdata, RDFa) is that it is embedded in a separate script block rather than interleaved with the HTML markup. This makes it easier to maintain, easier to generate programmatically, and less likely to break when the page design changes. It is the format recommended by Google, and it is the format that AI systems are most likely to parse reliably.
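As a minimal sketch of what such a script block contains, the Python below builds a Schema.org Restaurant object and serializes it into a script element. The restaurant details and the to_script_tag helper are hypothetical, chosen only to illustrate the @context and @type keys described above.

```python
import json

# Hypothetical restaurant data; every value here is made up for illustration.
restaurant = {
    "@context": "https://schema.org",  # maps the property names to Schema.org IRIs
    "@type": "Restaurant",             # what kind of thing is being described
    "name": "Example Bistro",
    "address": {
        "@type": "PostalAddress",
        "streetAddress": "123 Example Street",
        "addressLocality": "Miami",
        "addressRegion": "FL",
    },
    "servesCuisine": "French",
    "openingHours": "Tu-Su 17:00-22:00",
    "priceRange": "$$",
}


def to_script_tag(data: dict) -> str:
    """Serialize a JSON-LD object into a script element for the page head."""
    payload = json.dumps(data, indent=2, ensure_ascii=False)
    # Escape "</" so a value containing "</script>" cannot end the element early.
    payload = payload.replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{payload}\n</script>'


script_tag = to_script_tag(restaurant)
print(script_tag)
```

Because the markup lives in one self-contained element, a redesign of the visible page does not touch it.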
For AI visibility specifically, the most important JSON-LD types are: Article and NewsArticle for content pages (enabling AI systems to extract author, publication date, and headline); FAQPage for Q&A content (directly consumable by AI assistants as structured knowledge); HowTo for instructional content; and Dataset for data resources. The DefinedTerm type is particularly relevant for glossary and knowledge base sites — it explicitly marks up term definitions in a format that AI systems can extract and cite.
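To illustrate the FAQPage shape mentioned above, here is a small sketch that turns question and answer pairs into FAQPage markup. The build_faq_page helper and the sample questions are assumptions made for this example, not part of any library.

```python
import json


def build_faq_page(pairs):
    """Build a Schema.org FAQPage object from (question, answer) pairs."""
    return {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            }
            for question, answer in pairs
        ],
    }


# Hypothetical Q&A content for illustration only.
faq = build_faq_page([
    ("What is JSON-LD?", "A JSON-based format for encoding Linked Data."),
    ("Where does JSON-LD go on a page?", "In a script element of type application/ld+json."),
])
print(json.dumps(faq, indent=2))
```

Each question carries its answer inside the same object, so a consumer does not have to infer the pairing from page layout.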
Implementing Structured Data for a Knowledge Base
For a knowledge base or glossary site like semantic.io, the most impactful structured data implementations are: DefinedTerm markup for each glossary entry, Article markup for each in-depth article, FAQPage markup for any Q&A content, and WebSite markup with a SearchAction for the site's search functionality.
A DefinedTerm implementation for a glossary entry looks like this: the @type is "DefinedTerm", the name is the term being defined, the description is the definition, the inDefinedTermSet points to the glossary as a whole (using the DefinedTermSet type), and the url is the canonical URL of the term's page. This markup tells AI systems exactly what the page is defining and provides the definition in a structured, citable format.
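The properties listed above can be sketched as a Python dictionary that serializes to JSON-LD. The term, definition, glossary name, and example.com URLs are placeholders invented for this illustration.

```python
import json

# Placeholder glossary data; the URLs and wording are illustrative assumptions.
defined_term = {
    "@context": "https://schema.org",
    "@type": "DefinedTerm",
    "name": "Knowledge graph",
    "description": "A graph of entities and the relationships between them.",
    "url": "https://example.com/glossary/knowledge-graph",
    # Points to the glossary as a whole.
    "inDefinedTermSet": {
        "@type": "DefinedTermSet",
        "name": "Example Glossary",
        "url": "https://example.com/glossary",
    },
}

print(json.dumps(defined_term, indent=2))
```

Nesting the DefinedTermSet inline keeps the example self-contained; pointing inDefinedTermSet at the glossary URL alone is also valid.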
The Article markup for in-depth articles should include: headline (the article title), description (the excerpt), author (with Person type including name and url), datePublished, dateModified, publisher (with Organization type), and keywords. The author markup is particularly important for AI citation: AI systems that cite sources prefer to attribute them to named authors rather than anonymous organizations. Including the author's name, credentials, and a link to their profile page significantly increases the likelihood of accurate attribution.
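A sketch of Article markup with the fields listed above follows. The author, publisher, dates, and URLs are hypothetical, and jobTitle is used here as one possible way to express credentials.

```python
import json
from datetime import date

# All values below are hypothetical placeholders.
article = {
    "@context": "https://schema.org",
    "@type": "Article",
    "headline": "Structured Data for AI Visibility",
    "description": "How JSON-LD helps AI systems extract and cite facts.",
    "author": {
        "@type": "Person",
        "name": "Jane Example",
        "url": "https://example.com/authors/jane-example",
        "jobTitle": "Head of Search",  # one way to surface credentials
    },
    # Schema.org dates use ISO 8601, which date.isoformat() produces.
    "datePublished": date(2026, 1, 15).isoformat(),
    "dateModified": date(2026, 2, 1).isoformat(),
    "publisher": {
        "@type": "Organization",
        "name": "Example Publishing",
        "url": "https://example.com",
    },
    "keywords": ["JSON-LD", "Schema.org", "structured data"],
}

print(json.dumps(article, indent=2))
```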
For implementation, sites built on Next.js can use next-seo, and other React-based sites can use a custom hook or component that injects JSON-LD script tags into the page head. The key is to generate the structured data dynamically from the same data source that drives the page content, ensuring that the structured data and the visible content are always in sync.
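The same single-source principle applies outside React. The sketch below is a server-side illustration in Python rather than a React hook: one hypothetical record feeds both the visible HTML and the JSON-LD, so the two cannot drift apart. The record fields and the render_entry helper are assumptions for this example.

```python
import html
import json

# Hypothetical CMS record; in practice this would come from the site's data layer.
entry = {
    "term": "Knowledge graph",
    "definition": "A graph of entities and the relationships between them.",
    "url": "https://example.com/glossary/knowledge-graph",
}


def render_entry(record: dict) -> str:
    """Render visible content and JSON-LD from the same record."""
    json_ld = {
        "@context": "https://schema.org",
        "@type": "DefinedTerm",
        "name": record["term"],
        "description": record["definition"],
        "url": record["url"],
    }
    # Escape "</" so the payload cannot close the script element early.
    payload = json.dumps(json_ld, ensure_ascii=False).replace("</", "<\\/")
    return (
        "<html><head>"
        f"<title>{html.escape(record['term'])}</title>"
        f'<script type="application/ld+json">{payload}</script>'
        "</head><body>"
        f"<h1>{html.escape(record['term'])}</h1>"
        f"<p>{html.escape(record['definition'])}</p>"
        "</body></html>"
    )


page = render_entry(entry)
print(page)
```

Editing the record changes the heading, the paragraph, and the markup together, which is the property a React hook reading from page props would also provide.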
Schema.org Extensions for AI and Data
Schema.org's core vocabulary covers most common content types, but the AI and data domains have specific concepts that require extensions. The Schema.org health and life sciences extension (health-lifesci.schema.org) and the pending section (pending.schema.org, which holds terms that are proposed but not yet accepted into the core vocabulary) include types relevant to AI applications.
For AI-specific content, the most useful Schema.org types include: SoftwareApplication for AI tools and platforms (with applicationCategory set to "Artificial Intelligence"); Dataset for training datasets and benchmarks (with the DCAT vocabulary for richer dataset descriptions); and TechArticle for technical documentation.
For vendor comparison and product pages, Product with Review and AggregateRating markup enables rich comparison displays in AI-generated responses. The vendor comparison page on this site uses this markup to ensure that AI systems can extract and compare the structured data about each vendor.
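As a rough sketch of Product markup with Review and AggregateRating, the code below derives the aggregate from the individual reviews so the two stay consistent. The vendor name, reviewers, and ratings are invented, and build_product is a hypothetical helper rather than this site's actual implementation.

```python
import json


def build_product(name: str, reviews: list) -> dict:
    """Build Product markup whose AggregateRating is computed from its reviews."""
    ratings = [r["rating"] for r in reviews]
    return {
        "@context": "https://schema.org",
        "@type": "Product",
        "name": name,
        "aggregateRating": {
            "@type": "AggregateRating",
            "ratingValue": round(sum(ratings) / len(ratings), 1),
            "reviewCount": len(ratings),
        },
        "review": [
            {
                "@type": "Review",
                "author": {"@type": "Person", "name": r["author"]},
                "reviewRating": {"@type": "Rating", "ratingValue": r["rating"]},
                "reviewBody": r["body"],
            }
            for r in reviews
        ],
    }


# Invented review data for illustration only.
product = build_product(
    "Example Vector Database",
    [
        {"author": "Reviewer A", "rating": 5, "body": "Fast to set up."},
        {"author": "Reviewer B", "rating": 4, "body": "Good documentation."},
        {"author": "Reviewer C", "rating": 4, "body": "Solid performance."},
    ],
)
print(json.dumps(product, indent=2))
```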
Measuring AI Visibility Impact
Measuring the impact of structured data on AI visibility is more complex than measuring its impact on traditional SEO. Traditional SEO impact is measured through Google Search Console: rich result impressions, click-through rates, and position changes. AI visibility impact requires different measurement approaches.
Citation tracking monitors when AI systems cite your content. Tools like Brandwatch and manual monitoring of AI assistant responses can track when and how AI systems reference your content. Structured data improves citation accuracy — AI systems are more likely to cite the correct author, date, and source when that information is explicitly marked up.
Structured data validation ensures that your markup is correctly implemented and parseable. Schema.org's Schema Markup Validator checks markup against the vocabulary, and Google's Rich Results Test checks for syntax errors and for the properties Google requires for rich results. Regular validation as part of the deployment pipeline prevents regressions.
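A pipeline check can be as simple as the sketch below, which extracts JSON-LD blocks from rendered HTML, confirms they parse, and checks for properties the site has decided to require. The REQUIRED table is an assumption for this example, not an official Schema.org or Google list, and the check complements the validators above rather than replacing them.

```python
import json
from html.parser import HTMLParser

# Assumed per-type requirements for this site; not an official list.
REQUIRED = {
    "DefinedTerm": {"name", "description", "url"},
    "Article": {"headline", "author", "datePublished"},
}


class JsonLdExtractor(HTMLParser):
    """Collect the text of every script element of type application/ld+json."""

    def __init__(self):
        super().__init__()
        self.blocks = []
        self._in_jsonld = False

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self.blocks.append("")

    def handle_data(self, data):
        if self._in_jsonld:
            self.blocks[-1] += data

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_jsonld = False


def check_page(page_html: str) -> list:
    """Return a list of problems found; an empty list means the page passed."""
    parser = JsonLdExtractor()
    parser.feed(page_html)
    parser.close()
    if not parser.blocks:
        return ["no JSON-LD block found"]
    problems = []
    for raw in parser.blocks:
        try:
            data = json.loads(raw)
        except json.JSONDecodeError as exc:
            problems.append(f"invalid JSON: {exc.msg}")
            continue
        if not isinstance(data, dict):
            problems.append("top-level JSON-LD value is not an object")
            continue
        type_name = data.get("@type")
        required = REQUIRED.get(type_name, set()) if isinstance(type_name, str) else set()
        missing = sorted(required - set(data))
        if missing:
            problems.append(f"{type_name} missing: {', '.join(missing)}")
    return problems
```

In a deployment pipeline, a non-empty result from check_page would fail the build for that page.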
A/B testing structured data is possible but requires careful experimental design. The treatment group (pages with structured data) and control group (pages without) must be comparable in all other respects. The metric to optimize is AI citation rate, which requires a methodology for systematically querying AI assistants about topics covered by your content and recording whether they cite your pages. This is an emerging measurement practice without established tooling, but it is becoming increasingly important as AI assistants become a significant traffic source.
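One way to summarize such an experiment, once the citation observations have been recorded, is to compare citation rates between the two groups. The sketch below uses a standard two-proportion z statistic; the observation counts are invented, and how the observations are gathered from AI assistants is left out because no established tooling exists for it.

```python
import math


def citation_rate(observations: list) -> float:
    """Share of queries in which the assistant cited one of the group's pages."""
    return sum(observations) / len(observations)


def two_proportion_z(treatment: list, control: list) -> float:
    """z statistic for the difference in citation rate between two groups."""
    n1, n2 = len(treatment), len(control)
    p1, p2 = citation_rate(treatment), citation_rate(control)
    pooled = (sum(treatment) + sum(control)) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if se == 0:
        return 0.0  # every observation identical, so no measurable difference
    return (p1 - p2) / se


# Invented observations: True means the page was cited for that query.
treatment = [True] * 18 + [False] * 42  # pages with structured data
control = [True] * 9 + [False] * 51     # comparable pages without it

print(f"treatment rate: {citation_rate(treatment):.2f}")
print(f"control rate:   {citation_rate(control):.2f}")
print(f"z statistic:    {two_proportion_z(treatment, control):.2f}")
```

A z statistic is only meaningful if the two groups are comparable and the queries are sampled consistently, which is the hard part of the design described above.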
About the Author

Nick Eubanks
Entrepreneur, SEO Strategist & AI Infrastructure Builder
Nick Eubanks is a serial entrepreneur and digital strategist with nearly two decades of experience at the intersection of search, data, and emerging technology. He is the Global CMO of Digistore24, founder of IFTF Agency (acquired), and co-founder of the TTT SEO Community (acquired). A former Semrush team member and recognized authority in organic growth strategy, Nick has advised and built companies across SEO, content intelligence, and AI-driven marketing infrastructure. He is the founder of semantic.io — the definitive reference for the semantic AI era — and the Enterprise Risk Association at riskgovernance.com, where he publishes research on agentic AI governance for enterprise executives. Based in Miami, Nick writes at the frontier of semantic technology, AI architecture, and the infrastructure required to make enterprise AI actually work.