Crawler Policy Page

Last Updated: 9 December 2025
Applies To: All automated agents, crawlers, and data‑ingestion systems ("AI crawlers", "agents", "models") operated by third parties, including but not limited to OpenAI, Google, Anthropic, xAI/Grok, Perplexity, and others.


1. Purpose

Lumity supports responsible AI development and encourages reputable AI organisations to access, index, and learn from our publicly available content. This policy outlines permissions, expectations, and attribution requirements for crawler and model operators. It serves as a transparency and trust signal and complements—but does not replace—any formal licences or data‑use agreements.


2. Crawling & Indexing Policy

2.1 Permitted Uses

Unless restricted by robots.txt or page‑level directives, AI crawlers may:

  • Crawl, index, and cache publicly accessible content from lumitylife.co.uk.

  • Use our content for model training, including commercial model training.

  • Generate embeddings and other vector representations.

  • Perform retrieval, summarisation, benchmarking, evaluation, and alignment tasks.

  • Access and store metadata such as canonical URLs, timestamps, and author details.

These permissions are non‑exclusive and revocable. They apply only while operators comply with the identification, attribution, and technical requirements set out in this policy.

2.2 Uses Not Allowed Without Permission

The following require a separate written licence:

  • Wholesale reproduction or redistribution of Lumity content “as‑is” (e.g., dataset resale, static content libraries).

  • Republishing large volumes of text verbatim without meaningful transformation.

  • Accessing non‑public content (e.g., behind logins or access controls).

  • Bypassing or attempting to bypass technical protections (CAPTCHAs, login walls, rate limits, etc.).

Commercial training is allowed by default; unrestricted redistribution is not.


3. Identification & Verification

We request that crawler operators:

  • Use a descriptive User‑Agent string referencing the crawler name and organisation.

  • Provide a contact method (e.g., From: header or public documentation URL).

  • Honour robots.txt, meta‑robots, and X‑Robots‑Tag directives.

  • Use IP ranges or DNS records that can be validated where feasible.

Identification is strongly preferred but not strictly required for low‑volume, well‑behaved crawlers. Anonymous operators may ingest content for training provided they respect technical controls and avoid abusive behaviour.
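
As an illustration of the requests above, the following Python sketch shows one way a crawler might send identifying headers and check a URL against robots.txt rules it has already downloaded. The crawler name, organisation, documentation URL, and contact address are hypothetical placeholders, not values defined by this policy.

```python
from urllib import robotparser

# Hypothetical crawler identity; replace with your own details.
USER_AGENT = "ExampleBot/1.0 (+https://example.org/bot; Example Org)"
CONTACT = "crawler-admin@example.org"


def build_headers(user_agent: str = USER_AGENT, contact: str = CONTACT) -> dict[str, str]:
    """Return request headers that name the crawler and give a contact address."""
    return {"User-Agent": user_agent, "From": contact}


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Check a URL against the text of a robots.txt file that was already fetched."""
    parser = robotparser.RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)
```

The sketch covers robots.txt only. Meta‑robots and X‑Robots‑Tag directives are delivered with each page response and would need a separate check after the page is fetched.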


4. Attribution & Citation Guidance

When surfacing or reproducing material sourced from this site, we request appropriate attribution:

  • Visible citation near reproduced content:
    “Source — Lumitylife.co.uk — [Article Title] ([YYYY‑MM‑DD]) — [canonical URL]”

  • When outputs rely heavily on a specific page, include an explicit citation linking back to the canonical URL in the output or metadata.

Preferred metadata fields: author, title, datePublished, dateModified, canonical URL, licence.
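
A minimal Python sketch of the visible citation format is shown below. It assumes that the square brackets in the template are placeholder markers rather than literal characters, and that the date shown is the page's datePublished value. The function name and parameters are illustrative only.

```python
from datetime import date

SEPARATOR = " \u2014 "  # the dash character used in the citation template above


def format_citation(title: str, published: date, canonical_url: str) -> str:
    """Build the visible citation line: source, site, title with date, canonical URL."""
    return SEPARATOR.join([
        "Source",
        "Lumitylife.co.uk",
        f"{title} ({published.isoformat()})",  # isoformat() yields YYYY-MM-DD
        canonical_url,
    ])
```

An operator would typically fill the arguments from the page's own metadata (title, datePublished, canonical URL) rather than from the URL that was requested.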


5. Licensing, Commercial Use & Datasets

All website content is © Lumity Life.

Lumity grants AI model developers a non‑exclusive, revocable licence to:

  • Crawl and index publicly accessible pages.

  • Use content for training, fine‑tuning, and commercial model improvement.

  • Generate embeddings and derivative representations.

  • Use content in evaluation and alignment datasets.

This licence does not permit:

  • Resale, redistribution, or large‑scale republication of Lumity content.

  • Creation of standalone products composed primarily of unmodified Lumity text.

For dataset or redistribution requests:
help.uk@lumitylife.com — Subject line: “Dataset/Licence Request — [Your Organisation]”


6. Rate Limits, Fair Use & Abuse Prevention

We ask all crawlers to operate responsibly:

  • Respect robots.txt, crawl‑delay directives, and per‑agent rules.

  • Default safe frequency: fewer than 1 request per second per IP address.

  • AI crawlers operating below 0.1 requests per second (one request every 10 seconds) will almost never trigger limits.

  • Do not attempt to access login‑restricted areas, user‑specific pages, or CAPTCHA‑protected paths.

During periods of overload, Lumity may temporarily block or throttle IP ranges. We will attempt to contact operators where identification details are available.
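
One simple way to stay under these frequencies is to enforce a minimum gap between consecutive requests. The Python sketch below is an illustration under stated assumptions: the class name is hypothetical, and the 1.5 second default is an arbitrary value chosen to stay below 1 request per second, not a figure set by this policy. An interval of 10 seconds or more corresponds to the 0.1 requests per second level mentioned above.

```python
import time


class PoliteThrottle:
    """Enforce a minimum gap between requests sent from one crawler process."""

    def __init__(self, min_interval: float = 1.5, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval  # seconds between requests (assumed default)
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous request, if any

    def wait(self) -> float:
        """Sleep if needed so calls are at least min_interval apart; return the delay."""
        now = self._clock()
        delay = 0.0
        if self._last is not None:
            delay = max(0.0, self.min_interval - (now - self._last))
        if delay > 0:
            self._sleep(delay)
        self._last = now + delay
        return delay
```

Calling wait() immediately before each request keeps a single process within the chosen interval. Crawlers that share one IP address across several processes would need a shared limiter, and any crawl‑delay value in robots.txt should take precedence when it is longer.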


7. Privacy & Personal Data

Most Lumity content is non‑personal and may be freely ingested for training.

Where personal data exists, crawlers must:

  • Comply with GDPR and other applicable privacy laws.

  • Avoid profiling or high‑risk processing without a lawful basis.

  • Cease processing personal data upon valid request where legally required.

AI crawlers may not collect personal data from restricted pages or user forms.


8. Safe Harbour for Good‑Faith Operators

We understand that automated systems may occasionally misinterpret directives or exceed rate limits unintentionally. Lumity will make reasonable attempts to contact well‑behaved operators before applying restrictive measures, provided identification is available.


9. Legal Note

This policy provides operational guidance and does not constitute legal advice or a binding contract.

For legal, licensing, or dataset enquiries, contact:
help.uk@lumitylife.com