mirror of https://github.com/langgenius/dify.git
docs: EU AI Act compliance guide for Dify deployers (#33838)
# EU AI Act Compliance Guide for Dify Deployers

Dify is an LLMOps platform for building RAG pipelines, agents, and AI workflows. If you deploy Dify in the EU — whether self-hosted or using a cloud provider — the EU AI Act applies to your deployment. This guide covers what the regulation requires and how Dify's architecture maps to those requirements.

## Is your system in scope?

The detailed obligations in Articles 12, 13, and 14 only apply to **high-risk AI systems** as defined in Annex III of the EU AI Act. A Dify application is high-risk if it is used for:

- **Recruitment and HR** — screening candidates, evaluating employee performance, allocating tasks
- **Credit scoring and insurance** — assessing creditworthiness or setting premiums
- **Law enforcement** — profiling, criminal risk assessment, border control
- **Critical infrastructure** — managing energy, water, transport, or telecommunications systems
- **Education assessment** — grading students, determining admissions
- **Essential public services** — evaluating eligibility for benefits, housing, or emergency services

Most Dify deployments (customer-facing chatbots, internal knowledge bases, content generation workflows) are **not** high-risk. If your Dify application does not fall into one of the categories above:

- **Article 50** (end-user transparency) still applies if users interact with your application directly. See the [Article 50 section](#article-50-end-user-transparency) below.
- **GDPR** still applies if you process personal data. See the [GDPR section](#gdpr-considerations) below.
- The high-risk obligations (Articles 9-15) are less likely to apply, but risk classification is context-dependent. **Do not self-classify without legal review.** Focus on Article 50 (transparency) and GDPR (data protection) as your baseline obligations.

If you are unsure whether your use case qualifies as high-risk, consult a qualified legal professional before proceeding.

## Self-hosted vs cloud: different compliance profiles

| Deployment | Your role | Dify's role | Who handles compliance? |
|-----------|----------|-------------|------------------------|
| **Self-hosted** | Provider and deployer | Framework provider — obligations under Article 25 apply only if Dify is placed on the market or put into service as part of a complete AI system bearing its name or trademark | You |
| **Dify Cloud** | Deployer | Provider and processor | Shared — Dify handles SOC 2 and GDPR for the platform; you handle AI Act obligations for your specific use case |

Dify Cloud is SOC 2 Type II certified and GDPR-compliant at the platform level. But the EU AI Act adds obligations specific to AI systems that SOC 2 does not cover: risk classification, technical documentation, transparency, and human oversight.

## Supported providers and services

Dify integrates with a broad range of AI providers and data stores. The following are the key ones relevant to compliance:

- **AI providers:** HuggingFace (core), plus integrations with OpenAI, Anthropic, Google, and 100+ models via provider plugins
- **Model identifiers include:** gpt-4o, gpt-3.5-turbo, claude-3-opus, gemini-2.5-flash, whisper-1, and others
- **Vector database connections:** Extensive RAG infrastructure supporting numerous vector stores

Dify's plugin architecture means actual provider usage depends on your configuration. Document which providers and models are active in your deployment.
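
One way to keep that record current is to generate it from your configuration. A minimal sketch — the config structure, field names, and values below are illustrative, not Dify's actual schema:

```python
from datetime import date

def provider_inventory(config: dict) -> str:
    """Render a markdown table of active providers and models for compliance docs."""
    lines = [
        f"# AI provider inventory ({date.today().isoformat()})",
        "",
        "| Provider | Model | Purpose | DPA in place |",
        "|----------|-------|---------|--------------|",
    ]
    for entry in config["providers"]:
        lines.append(
            f"| {entry['provider']} | {entry['model']} "
            f"| {entry['purpose']} | {'yes' if entry['dpa'] else 'NO - action needed'} |"
        )
    return "\n".join(lines)

# Hypothetical deployment config; adapt to however your deployment
# tracks active providers (env vars, a Dify console export, etc.).
deployment = {
    "providers": [
        {"provider": "OpenAI", "model": "gpt-4o", "purpose": "chat generation", "dpa": True},
        {"provider": "OpenAI", "model": "text-embedding-3-small", "purpose": "RAG embeddings", "dpa": True},
    ]
}
print(provider_inventory(deployment))
```

Regenerating this table whenever configuration changes keeps the inventory aligned with what is actually deployed.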

## Data flow diagram

A typical Dify RAG deployment:

```mermaid
graph LR
USER((User)) -->|query| DIFY[Dify Platform]
DIFY -->|prompts| LLM([LLM Provider])
LLM -->|responses| DIFY
DIFY -->|documents| EMBED([Embedding Model])
EMBED -->|vectors| DIFY
DIFY -->|store/retrieve| VS[(Vector Store)]
DIFY -->|knowledge| KB[(Knowledge Base)]
DIFY -->|response| USER

classDef processor fill:#60a5fa,stroke:#1e40af,color:#000
classDef controller fill:#4ade80,stroke:#166534,color:#000
classDef app fill:#a78bfa,stroke:#5b21b6,color:#000
classDef user fill:#f472b6,stroke:#be185d,color:#000

class USER user
class DIFY app
class LLM processor
class EMBED processor
class VS controller
class KB controller
```

**GDPR roles** (the exact role depends on each provider's terms of service and processing purpose; review each provider's DPA):
- **Cloud LLM providers (OpenAI, Anthropic, Google)** typically act as processors — requires DPA.
- **Cloud embedding services** typically act as processors — requires DPA.
- **Self-hosted vector stores (Weaviate, Qdrant, pgvector):** Your organization remains the controller — no third-party transfer.
- **Cloud vector stores (Pinecone, Zilliz Cloud)** typically act as processors — requires DPA.
- **Knowledge base documents:** Your organization is the controller — stored in your infrastructure.

## Article 11: Technical documentation

High-risk systems need Annex IV documentation. For Dify deployments, key sections include:

| Section | What Dify provides | What you must document |
|---------|-------------------|----------------------|
| General description | Platform capabilities, supported models | Your specific use case, intended users, deployment context |
| Development process | Dify's architecture, plugin system | Your RAG pipeline design, prompt engineering, knowledge base curation |
| Monitoring | Dify's built-in logging and analytics | Your monitoring plan, alert thresholds, incident response |
| Performance metrics | Dify's evaluation features | Your accuracy benchmarks, quality thresholds, bias testing |
| Risk management | — | Risk assessment for your specific use case |

Some sections can be derived from Dify's architecture and your deployment configuration, as shown in the table above. The remaining sections require your input.
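
A practical starting point is to scaffold the table above into a fill-in template. A sketch — the section names paraphrase the table, not the Annex IV legal text:

```python
# Maps each documentation section to a prompt describing what the
# deployer must fill in (mirroring the table's right-hand column).
ANNEX_IV_SECTIONS = {
    "General description": "Use case, intended users, deployment context",
    "Development process": "RAG pipeline design, prompt engineering, knowledge base curation",
    "Monitoring": "Monitoring plan, alert thresholds, incident response",
    "Performance metrics": "Accuracy benchmarks, quality thresholds, bias testing",
    "Risk management": "Risk assessment for the specific use case",
}

def annex_iv_skeleton(system_name: str) -> str:
    """Emit a markdown skeleton with a TODO placeholder per section."""
    parts = [f"# Technical documentation: {system_name}"]
    for section, prompt in ANNEX_IV_SECTIONS.items():
        parts += [f"## {section}", f"_TODO: {prompt}_"]
    return "\n\n".join(parts)

print(annex_iv_skeleton("Support chatbot (Dify)"))
```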

## Article 12: Record-keeping

Dify's built-in logging covers several Article 12 requirements:

| Requirement | Dify Feature | Status |
|------------|-------------|--------|
| Conversation logs | Full conversation history with timestamps | **Covered** |
| Model tracking | Model name recorded per interaction | **Covered** |
| Token usage | Token counts per message | **Covered** |
| Cost tracking | Cost per conversation (if provider reports it) | **Partial** |
| Document retrieval | RAG source documents logged | **Covered** |
| User identification | User session tracking | **Covered** |
| Error logging | Failed generation logs | **Covered** |
| Data retention | Configurable | **Your responsibility** |

**Retention periods:** The required retention period depends on your role under the Act. Article 18 requires **providers** of high-risk systems to retain logs and technical documentation for **10 years** after market placement. Article 26(6) requires **deployers** to retain logs for at least **6 months**. If you self-host Dify and have substantially modified the system, you may be classified as a provider rather than a deployer. Confirm the applicable retention period with legal counsel.
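
Once you have chosen a retention window, enforcing it is a periodic pruning job. A sketch of the logic, assuming a simplified SQLite table named `conversation_logs`; Dify's actual storage schema differs, so treat this as an illustration only:

```python
import sqlite3
from datetime import datetime, timedelta, timezone

# Your chosen window. Deployers must retain at least ~6 months
# (Article 26(6)); providers of high-risk systems, 10 years (Article 18).
RETENTION_DAYS = 365

def prune_old_logs(conn: sqlite3.Connection) -> int:
    """Delete log rows older than the retention window; return how many."""
    cutoff = (datetime.now(timezone.utc) - timedelta(days=RETENTION_DAYS)).isoformat()
    cur = conn.execute("DELETE FROM conversation_logs WHERE created_at < ?", (cutoff,))
    conn.commit()
    return cur.rowcount

# Demo with an in-memory database: one row past the window, one recent.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE conversation_logs (id INTEGER PRIMARY KEY, created_at TEXT)")
old = (datetime.now(timezone.utc) - timedelta(days=400)).isoformat()
recent = datetime.now(timezone.utc).isoformat()
conn.executemany("INSERT INTO conversation_logs (created_at) VALUES (?)", [(old,), (recent,)])
deleted = prune_old_logs(conn)
```

ISO-8601 UTC timestamps compare correctly as strings, which is why the `WHERE` clause works without date parsing.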

## Article 13: Transparency to deployers

Article 13 requires providers of high-risk AI systems to supply deployers with the information needed to understand and operate the system correctly. This is a **documentation obligation**, not a logging obligation. For Dify deployments, this means the upstream LLM and embedding providers must give you:

- Instructions for use, including intended purpose and known limitations
- Accuracy metrics and performance benchmarks
- Known or foreseeable risks and residual risks after mitigation
- Technical specifications: input/output formats, training data characteristics, model architecture details

As a deployer, collect model cards, system documentation, and accuracy reports from each AI provider your Dify application uses. Maintain these as part of your Annex IV technical documentation.

Dify's platform features provide **supporting evidence** that can inform Article 13 documentation, but they do not satisfy Article 13 on their own:
- **Source attribution** — Dify's RAG citation feature shows which documents informed the response, supporting deployer-side auditing
- **Model identification** — Dify logs which LLM model generates responses, providing evidence for system documentation
- **Conversation logs** — execution history helps compile performance and behavior evidence

You must independently produce system documentation covering how your specific Dify deployment uses AI, its intended purpose, performance characteristics, and residual risks.

## Article 50: End-user transparency

Article 50 requires deployers to inform end users that they are interacting with an AI system. This is a separate obligation from Article 13 and applies even to limited-risk systems.

For Dify applications serving end users:

1. **Disclose AI involvement** — tell users they are interacting with an AI system
2. **AI-generated content labeling** — identify AI-generated content as such (e.g., clear labeling in the UI)
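
Both steps can be handled in one response-labeling hook. A sketch — the wrapper and disclosure text are hypothetical examples, not Dify APIs or prescribed wording:

```python
# Example disclosure text; adapt to your UI and audience language.
AI_DISCLOSURE = "You are chatting with an AI assistant. Responses are AI-generated."

def label_response(ai_text: str, first_turn: bool) -> dict:
    """Attach a machine-readable flag always, and a human-readable
    disclosure on the first turn of a conversation."""
    payload = {"content": ai_text, "ai_generated": True}
    if first_turn:
        payload["disclosure"] = AI_DISCLOSURE
    return payload

msg = label_response("Our refund policy allows returns within 30 days.", first_turn=True)
```

The `ai_generated` flag lets downstream renderers label the content consistently even when the disclosure banner is not shown.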

Dify's "citation" feature also supports end-user transparency by showing users which knowledge base documents informed the answer.

> **Note:** Article 50 applies to chatbots and other systems that interact directly with natural persons. Its scope is independent of the high-risk designation under Annex III.

## Article 14: Human oversight

Article 14 requires that high-risk AI systems be designed so that natural persons can effectively oversee them. Dify provides **automated technical safeguards** that support human oversight, but they are not a substitute for it:

| Dify Feature | What It Does | Oversight Role |
|-------------|-------------|----------------|
| Annotation/feedback system | Human review of AI outputs | **Direct oversight** — humans evaluate and correct AI responses |
| Content moderation | Built-in filtering before responses reach users | **Automated safeguard** — reduces harmful outputs but does not replace human judgment on edge cases |
| Rate limiting | Controls on API usage | **Automated safeguard** — bounds system behavior, supports overseer's ability to maintain control |
| Workflow control | Insert human review steps between AI generation and output | **Oversight enabler** — allows building approval gates into the pipeline |

These automated controls are necessary building blocks, but Article 14 compliance requires **human oversight procedures** on top of them:
- **Escalation procedures** — define what happens when moderation triggers or edge cases arise (who is notified, what action is taken)
- **Human review pipeline** — for high-stakes decisions, route AI outputs to a qualified person before they take effect
- **Override mechanism** — a human must be able to halt AI responses or override the system's output
- **Competence requirements** — the human overseer must understand the system's capabilities, limitations, and the context of its outputs

### Recommended pattern

For high-risk use cases (HR, legal, medical), configure your Dify workflow to require human approval before the AI response is delivered to the end user or acted upon.

## Knowledge base compliance

Dify's knowledge base feature has specific compliance implications:

1. **Data provenance:** Document where your knowledge base documents come from. Article 10 requires data governance for training data; knowledge bases are analogous.
2. **Update tracking:** When you add, remove, or update documents in the knowledge base, log the change. The AI system's behavior changes with its knowledge base.
3. **PII in documents:** If knowledge base documents contain personal data, GDPR applies to the entire RAG pipeline. Implement access controls and consider PII redaction before indexing.
4. **Copyright:** Ensure you have the right to use the documents in your knowledge base for AI-assisted generation.
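
Points 1 and 2 can be combined into an append-only change log that records provenance alongside each update. A sketch with in-memory storage for illustration; persist entries to your audit store in practice:

```python
from datetime import datetime, timezone

kb_changelog: list[dict] = []

def log_kb_change(action: str, document: str, source: str, actor: str) -> dict:
    """Append one knowledge-base change record and return it."""
    entry = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "action": action,      # "add" | "remove" | "update"
        "document": document,
        "source": source,      # provenance: where the document came from
        "actor": actor,        # who made the change
    }
    kb_changelog.append(entry)
    return entry

log_kb_change("add", "hr_policy_2025.pdf", "internal HR portal", "admin@example.org")
```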

## GDPR considerations

1. **Legal basis** (Article 6): Document why AI processing of user queries is necessary
2. **Data Processing Agreements** (Article 28): Required for each cloud LLM and embedding provider
3. **Data minimization:** Only include necessary context in prompts; avoid sending entire documents when a relevant excerpt suffices
4. **Right to erasure:** If a user requests deletion, ensure their conversations are removed from Dify's logs AND any vector store entries derived from their data
5. **Cross-border transfers:** Providers based outside the EEA — including US-based providers (OpenAI, Anthropic), and any other non-EEA providers you route to — require Standard Contractual Clauses (SCCs) or equivalent safeguards under Chapter V of the GDPR. Review each provider's transfer mechanism individually.
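
Point 4 spans two stores, so erasure must be coordinated. A sketch of that coordination, with in-memory dicts standing in for Dify's conversation store and your vector database client:

```python
def delete_conversations(user_id: str, conversations: dict[str, str]) -> int:
    """Remove every conversation owned by user_id; return the count."""
    doomed = [cid for cid, owner in conversations.items() if owner == user_id]
    for cid in doomed:
        del conversations[cid]
    return len(doomed)

def delete_vectors(user_id: str, vectors: dict[str, str]) -> int:
    """Remove every vector entry derived from user_id's data; return the count."""
    doomed = [vid for vid, owner in vectors.items() if owner == user_id]
    for vid in doomed:
        del vectors[vid]
    return len(doomed)

def erase_user(user_id: str, conversations: dict, vectors: dict) -> dict:
    """Service an erasure request across conversation logs AND vector entries."""
    return {
        "conversations_deleted": delete_conversations(user_id, conversations),
        "vectors_deleted": delete_vectors(user_id, vectors),
    }

convs = {"c1": "alice", "c2": "bob"}
vecs = {"v1": "alice", "v2": "alice", "v3": "bob"}
report = erase_user("alice", convs, vecs)
```

Returning a per-store deletion report gives you evidence that the request was serviced completely.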

## Resources

- [EU AI Act full text](https://artificialintelligenceact.eu/)
- [Dify documentation](https://docs.dify.ai/)
- [Dify SOC 2 compliance](https://dify.ai/trust)

---

*This is not legal advice. Consult a qualified professional for compliance decisions.*