The discovery contract (`gtmesh-discovery.yaml`)

gtmesh-discovery.yaml is the structured output of project discovery — the file the discovery skill writes after interviewing you (see step 0 of the walkthrough). gtmesh init --from-discovery reads it and scaffolds a mesh tailored to your business instead of the generic Acme demo.

It is structural by design: it carries only what serialises cleanly into config and seed files (the mesh shape and clustered starter terms). Brand and voice are not here — those stay as foundation files you edit. Discovery is an import, not a plan: the open-ended, networked research happens outside the deterministic engine, which then consumes this lean, predictable shape.

init --from-discovery validates this file against the same column contracts gtmesh validate uses, and exits non-zero if the discovered files don’t conform. A malformed discovery file fails at scaffold time, not silently downstream.

The blocks

`project` (required)

The project slug, e.g. acme-integrations. Always overlaid onto gtmesh.config.yaml’s project.

`kind` (optional)

The site type — the first thing discovery establishes, because it fixes the authority topology (how pages tier and link). One of:

- non_commercial: content hubs discovered from demand; hubs and spokes. This is the scaffold default if kind is omitted.
- commercial_catalog: many product/category pages plus an order action; clusters group them.
- commercial_pages: a handful of core service/product landings; content funnels up into them.

kind drives page-type selection: a commercial project selects the money-layer types (product, tool, directory, trust); a content site selects hubs and spokes (entity-hub, educational, guide, …). See mental models for the authority layer this implies.

`profile` (optional)

The human-facing identity gathered in the interview — name, url, description (all optional). init templates these into the scaffolded README header and the config banner, so a tailored repo doesn’t ship saying “Acme”. This is identity, not brand/voice theory.

`taxonomy` (required)

The mesh’s structural vocabulary. Always overlaid:

- markets: ISO 3166-1 alpha-2 codes (lowercased), defaults to ["us"]. The single source of truth for geo; the Ahrefs pull country derives from it.
- sections: the URL/section vocabulary (at least one).
- clusters: named cluster axes, each mapping to a list of values (e.g. category: [crm, messaging]).
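
Because project and taxonomy are the only required blocks, the smallest file that satisfies the contract is short. The following is a minimal sketch that reuses the acme-integrations slug and the cluster values mentioned above. The section names are placeholders, and since it is not stated here whether clusters may be omitted, the sketch keeps it.

```yaml
# Minimal gtmesh-discovery.yaml: required blocks only.
project: acme-integrations
taxonomy:
  markets: [us]                      # could be omitted; the documented default is ["us"]
  sections: [integrations, guides]   # placeholder names; at least one is required
  clusters:
    category: [crm, messaging]
```

Every optional block left out here falls back to the scaffold default described in the sections that follow.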

`page_types` (optional)

A list of page-type ids selected from the scaffold library (e.g. review, guide, educational, product). init filters the default page_types down to these. This is selection only; init never synthesises new page-type theory. Omit the block to keep the full library.

`sections_map` (optional)

Section routing rules (when → then), the same shape as the config’s sections_map. When provided, it replaces the scaffold default.

`link_rules` (optional)

Mesh link rules (when → up / up_by_subtype / siblings), same shape as the config’s link_rules. When provided it replaces the default — so the skill can author the catalogue’s comparison and dual-parent model from its research rather than leaving it to a config edit you might not know to make.

`discovery` (optional)

Harvest-class configs, same shape as the config’s discovery: block (taxonomy / demand_sources / validation / output). When provided, it is overlaid. Note: every class that has seeds must also be a discovery class, or plan won’t load its term list. To guarantee this, init synthesises a minimal { output } entry for any seed class that lacks one.
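
To illustrate the synthesised entry, here is a rough sketch of a discovery file that seeds a directory class without declaring it under discovery, followed by the kind of entry init would add to the config. The output path is an assumption based on the default seeds/<class>.csv location; the exact form init writes may differ.

```yaml
# In gtmesh-discovery.yaml: a seed class with no matching discovery entry.
seeds:
  directory:
    - { term: best espresso machines under 500, section: directory, page_type: directory, source: harvest }

# In the scaffolded gtmesh.config.yaml: the minimal entry init would synthesise
# (assumed shape, so that plan can load the class's term list).
discovery:
  directory:
    output: seeds/directory.csv
```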

`seeds` (optional)

Clustered starter term lists, keyed by class name. Each class’s rows are written to its seed file, seeds/<class>.csv (or the class’s discovery.<class>.output path if set). Each row carries term (required) plus optional target_keyword, family, section, page_type, source, funnel, funnel_target — and never metrics.

`signals` (optional)

The keyword→role/tier rules, stored as rows of reference/signals.csv. classify matches each keyword against a row’s match (substring or /regex/) and assigns its role/tier. Columns: signal (the unique row key), match, and optional role, tier, note. These are written to reference/signals.csv, replacing the demo’s Acme rules, because leftover demo rules would silently mis-tier an unrelated mesh. If signals is omitted, init writes a header-only stub (an honest empty file beats partial matches).
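
The two match forms can be sketched side by side. The first row is a plain substring rule. The second uses the /regex/ form; its pattern, role, tier, and note are illustrative assumptions rather than rules taken from a real mesh, and the regex dialect that classify accepts is not specified here.

```yaml
signals:
  # Substring match: any keyword containing "review".
  - { signal: review, match: "review", role: spoke, tier: D }
  # Regex match, written between slashes (single-quoted so YAML leaves the backslashes alone).
  - { signal: versus, match: '/\bvs\.?\s/', role: spoke, tier: D, note: "comparison queries" }
```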

`seed_pages` (optional)

Source-less apex/pillar pages — rows of reference/seed-pages.csv. These are the Tier-A pages with no keyword data behind them (the converging hubs and commercial landings the mesh is built around). Required columns: parent_topic, page_type (must be a selected type), slug; optional section, tier, intent, primary_keyword, notes, funnel, funnel_target. Written to reference/seed-pages.csv, replacing the demo pages.

How `init --from-discovery` overlays it

The overlay is comment-preserving — it edits the parsed YAML in place, so the educational comments on untouched blocks survive. Only the values it sets change.

| From discovery | Goes to | When |
| --- | --- | --- |
| `project`, `taxonomy` | `gtmesh.config.yaml` (project, taxonomy) | always |
| `kind` | `gtmesh.config.yaml` (`kind`) | when provided (else scaffold default) |
| `sections_map`, `link_rules`, `discovery` | `gtmesh.config.yaml` | when provided (else default kept) |
| `page_types` | `gtmesh.config.yaml` `page_types` | filtered to the selection |
| `seeds` | `seeds/<class>.csv` | per class with rows |
| `signals` | `reference/signals.csv` | replaces demo (else header-only stub) |
| `seed_pages` | `reference/seed-pages.csv` | replaces demo (else header-only stub) |
| `profile` | README header + config banner | always |

init also resets the demo’s other worked-example data (entities, the demand corpus, the keyed-corrections file) to honest empty templates, and clears the demo’s per-cluster Ahrefs seed queries — so a tailored mesh tiers and scopes itself instead of inheriting Acme’s leftovers.

`funnel` and `funnel_target`

For commercial sites, a page’s place in the funnel is a frozen decision the discovery skill makes by reading the SERP. On seed rows and seed-pages it appears as two fields:

- funnel: one of commercial (the page is a landing; the money layer), content (the page funnels up into a landing), or mixed (the SERP is split ~50/50, which is a human call and never auto-resolved).
- funnel_target: for a content row, the slug (or parent_topic) of the landing it links up to.

This is how a commercial mesh ships its money pages by business priority rather than burying them behind volume-ranked blog content. On a non_commercial mesh nothing carries funnel, so the mechanism is inert.
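
The three funnel values can be sketched on seed rows as follows. The terms and slugs are hypothetical, and leaving funnel_target off the mixed row is an assumption drawn from the field being described only for content rows.

```yaml
seeds:
  blog:
    # content: funnels up into the landing named by funnel_target.
    - { term: small business expense categories, section: blog, page_type: guide, funnel: content, funnel_target: /services/bookkeeping }
    # mixed: the SERP is split, so the row is flagged for a human decision.
    - { term: bookkeeper vs accountant, section: blog, page_type: guide, funnel: mixed }
  services:
    # commercial: the row is itself a landing in the money layer.
    - { term: outsourced bookkeeping services, section: services, page_type: product, funnel: commercial }
```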

Worked examples

Three compact, realistic gtmesh-discovery.yaml files — one per kind.

A content / publisher site (`non_commercial`)

Hubs discovered from demand, with educational spokes funnelling up to them.

```yaml
project: trailhead-gear-guides
kind: non_commercial
profile:
  name: Trailhead Gear Guides
  url: https://trailheadguides.example
  description: Independent reviews and how-tos for backpacking gear.
taxonomy:
  markets: [us]
  sections: [reviews, best, guides, glossary]
  clusters:
    category: [tents, packs, sleeping, cooking]
    topic: [education, comparison, use-case]
page_types: [entity-hub, review, best-for, guide, educational]
sections_map:
  - { when: { match: "review" }, then: { section: reviews } }
  - { when: { match: "best " }, then: { section: best } }
  - { when: { match: "how to" }, then: { section: guides } }
  - { when: { match: "what is" }, then: { section: glossary } }
discovery:
  glossary:
    taxonomy: [fabric, insulation, fit, care]
    demand_sources: [reference/demand-glossary.csv]
    validation: ahrefs
    output: seeds/glossary.csv
seeds:
  glossary:
    - { term: denier, family: fabric, section: glossary, page_type: educational, source: harvest }
    - { term: r-value, family: insulation, section: glossary, page_type: educational, source: harvest }
signals:
  - { signal: review, match: "review", role: spoke, tier: D }
  - { signal: best, match: "best ", role: feeder, tier: B }
```

A product catalogue (`commercial_catalog`)

Many product/category landings plus the trust and directory pages that convert and earn links; content rows funnel up to the money layer.

```yaml
project: brightbrew-coffee
kind: commercial_catalog
profile:
  name: BrightBrew Coffee
  url: https://brightbrew.example
  description: Specialty coffee gear and beans, shipped fresh.
taxonomy:
  markets: [us, gb]
  sections: [products, directory, trust, blog]
  clusters:
    category: [grinders, espresso, pourover, beans]
page_types: [product, trust, directory, tool, educational]
sections_map:
  - { when: { match: "buy " }, then: { section: products } }
  - { when: { match: "best " }, then: { section: directory } }
  - { when: { match: "how to" }, then: { section: blog } }
seed_pages:
  - { parent_topic: espresso machines, page_type: product, section: products, tier: A, slug: /espresso-machines, funnel: commercial }
  - { parent_topic: burr grinders, page_type: product, section: products, tier: A, slug: /grinders, funnel: commercial }
  - { parent_topic: coffee grind guide, page_type: educational, section: blog, slug: /blog/grind-size, funnel: content, funnel_target: /grinders }
seeds:
  directory:
    - { term: best espresso machines under 500, section: directory, page_type: directory, source: harvest }
```

A few service landings + a blog funnel (`commercial_pages`)

A handful of core landings; the blog content funnels up into them.

```yaml
project: northstar-bookkeeping
kind: commercial_pages
profile:
  name: NorthStar Bookkeeping
  url: https://northstarbooks.example
  description: Outsourced bookkeeping for small US businesses.
taxonomy:
  markets: [us]
  sections: [services, blog]
  clusters:
    topic: [tax, payroll, software]
page_types: [product, trust, guide, educational]
sections_map:
  - { when: { match: "services" }, then: { section: services } }
  - { when: { match: "how to" }, then: { section: blog } }
seed_pages:
  - { parent_topic: bookkeeping services, page_type: product, section: services, tier: A, slug: /services/bookkeeping, funnel: commercial }
  - { parent_topic: payroll services, page_type: product, section: services, tier: A, slug: /services/payroll, funnel: commercial }
  - { parent_topic: how to categorize expenses, page_type: guide, section: blog, slug: /blog/categorize-expenses, funnel: content, funnel_target: /services/bookkeeping }
seeds:
  blog:
    - { term: small business expense categories, section: blog, page_type: guide, funnel: content, funnel_target: /services/bookkeeping, source: harvest }
```

From any of these, init --from-discovery scaffolds the tailored repo and you continue with the normal loop in the walkthrough: extract → plan → apply, with harvest growing each class.
