mirror of https://github.com/langgenius/dify.git
docs: EU AI Act compliance guide for Dify deployers (#33838)
# EU AI Act Compliance Guide for Dify Deployers

Dify is an LLMOps platform for building RAG pipelines, agents, and AI workflows. If you deploy Dify in the EU — whether self-hosted or using a cloud provider — the EU AI Act applies to your deployment. This guide covers what the regulation requires and how Dify's architecture maps to those requirements.

## Is your system in scope?

The detailed obligations in Articles 12, 13, and 14 only apply to **high-risk AI systems** as defined in Annex III of the EU AI Act. A Dify application is high-risk if it is used for:

- **Recruitment and HR** — screening candidates, evaluating employee performance, allocating tasks
- **Credit scoring and insurance** — assessing creditworthiness or setting premiums
- **Law enforcement** — profiling, criminal risk assessment, border control
- **Critical infrastructure** — managing energy, water, transport, or telecommunications systems
- **Education assessment** — grading students, determining admissions
- **Essential public services** — evaluating eligibility for benefits, housing, or emergency services

Most Dify deployments (customer-facing chatbots, internal knowledge bases, content generation workflows) are **not** high-risk. If your Dify application does not fall into one of the categories above:

- **Article 50** (end-user transparency) still applies if users interact with your application directly. See the [Article 50 section](#article-50-end-user-transparency) below.
- **GDPR** still applies if you process personal data. See the [GDPR section](#gdpr-considerations) below.
- The high-risk obligations (Articles 9-15) are less likely to apply, but risk classification is context-dependent. **Do not self-classify without legal review.** Focus on Article 50 (transparency) and GDPR (data protection) as your baseline obligations.

If you are unsure whether your use case qualifies as high-risk, consult a qualified legal professional before proceeding.

## Self-hosted vs cloud: different compliance profiles

| Deployment | Your role | Dify's role | Who handles compliance? |
|-----------|----------|-------------|------------------------|
| **Self-hosted** | Provider and deployer | Framework provider — obligations under Article 25 apply only if Dify is placed on the market or put into service as part of a complete AI system bearing its name or trademark | You |
| **Dify Cloud** | Deployer | Provider and processor | Shared — Dify handles SOC 2 and GDPR for the platform; you handle AI Act obligations for your specific use case |

Dify Cloud is SOC 2 Type II certified and GDPR-compliant at the platform level. But the EU AI Act adds obligations specific to AI systems that SOC 2 does not cover: risk classification, technical documentation, transparency, and human oversight.

## Supported providers and services

Dify integrates with a broad range of AI providers and data stores. The following are the key ones relevant to compliance:

- **AI providers:** HuggingFace (core), plus integrations with OpenAI, Anthropic, Google, and 100+ models via provider plugins
- **Model identifiers include:** gpt-4o, gpt-3.5-turbo, claude-3-opus, gemini-2.5-flash, whisper-1, and others
- **Vector database connections:** Extensive RAG infrastructure supporting numerous vector stores

Dify's plugin architecture means actual provider usage depends on your configuration. Document which providers and models are active in your deployment.
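
One way to keep that record current is to generate it from your configuration. A minimal sketch — the config structure, field names, and values below are illustrative, not Dify's actual schema:

```python
from datetime import date

def provider_inventory(config: dict) -> str:
    """Render a markdown table of active providers and models for compliance docs."""
    lines = [
        f"# AI provider inventory ({date.today().isoformat()})",
        "",
        "| Provider | Model | Purpose | DPA in place |",
        "|----------|-------|---------|--------------|",
    ]
    for entry in config["providers"]:
        lines.append(
            f"| {entry['provider']} | {entry['model']} "
            f"| {entry['purpose']} | {'yes' if entry['dpa'] else 'NO - action needed'} |"
        )
    return "\n".join(lines)

# Hypothetical deployment config; adapt to however your deployment
# tracks active providers (env vars, a Dify console export, etc.).
deployment = {
    "providers": [
        {"provider": "OpenAI", "model": "gpt-4o", "purpose": "chat generation", "dpa": True},
        {"provider": "OpenAI", "model": "text-embedding-3-small", "purpose": "RAG embeddings", "dpa": True},
    ]
}
print(provider_inventory(deployment))
```

Regenerating this table whenever configuration changes keeps the inventory aligned with what is actually deployed.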

## Data flow diagram

A typical Dify RAG deployment:

```mermaid
graph LR
USER((User)) -->|query| DIFY[Dify Platform]
DIFY -->|prompts| LLM([LLM Provider])
LLM -->|responses| DIFY
DIFY -->|documents| EMBED([Embedding Model])
EMBED -->|vectors| DIFY
DIFY -->|store/retrieve| VS[(Vector Store)]
DIFY -->|knowledge| KB[(Knowledge Base)]
DIFY -->|response| USER

classDef processor fill:#60a5fa,stroke:#1e40af,color:#000
classDef controller fill:#4ade80,stroke:#166534,color:#000
classDef app fill:#a78bfa,stroke:#5b21b6,color:#000
classDef user fill:#f472b6,stroke:#be185d,color:#000

class USER user
class DIFY app
class LLM processor
class EMBED processor
class VS controller
class KB controller
```

**GDPR roles** (the exact role depends on each provider's terms of service and processing purpose; review each provider's DPA):
- **Cloud LLM providers (OpenAI, Anthropic, Google)** typically act as processors — requires DPA.
- **Cloud embedding services** typically act as processors — requires DPA.
- **Self-hosted vector stores (Weaviate, Qdrant, pgvector):** Your organization remains the controller — no third-party transfer.
- **Cloud vector stores (Pinecone, Zilliz Cloud)** typically act as processors — requires DPA.
- **Knowledge base documents:** Your organization is the controller — stored in your infrastructure.

## Article 11: Technical documentation

High-risk systems need Annex IV documentation. For Dify deployments, key sections include:

| Section | What Dify provides | What you must document |
|---------|-------------------|----------------------|
| General description | Platform capabilities, supported models | Your specific use case, intended users, deployment context |
| Development process | Dify's architecture, plugin system | Your RAG pipeline design, prompt engineering, knowledge base curation |
| Monitoring | Dify's built-in logging and analytics | Your monitoring plan, alert thresholds, incident response |
| Performance metrics | Dify's evaluation features | Your accuracy benchmarks, quality thresholds, bias testing |
| Risk management | — | Risk assessment for your specific use case |

Some sections can be derived from Dify's architecture and your deployment configuration, as shown in the table above. The remaining sections require your input.
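
A practical starting point is to scaffold the table above into a fill-in template. A sketch — the section names paraphrase the table, not the Annex IV legal text:

```python
# Maps each documentation section to a prompt describing what the
# deployer must fill in (mirroring the table's right-hand column).
ANNEX_IV_SECTIONS = {
    "General description": "Use case, intended users, deployment context",
    "Development process": "RAG pipeline design, prompt engineering, knowledge base curation",
    "Monitoring": "Monitoring plan, alert thresholds, incident response",
    "Performance metrics": "Accuracy benchmarks, quality thresholds, bias testing",
    "Risk management": "Risk assessment for the specific use case",
}

def annex_iv_skeleton(system_name: str) -> str:
    """Emit a markdown skeleton with a TODO placeholder per section."""
    parts = [f"# Technical documentation: {system_name}"]
    for section, prompt in ANNEX_IV_SECTIONS.items():
        parts += [f"## {section}", f"_TODO: {prompt}_"]
    return "\n\n".join(parts)

print(annex_iv_skeleton("Support chatbot (Dify)"))
```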

## Article 12: Record-keeping

Dify's built-in logging covers several Article 12 requirements:

| Requirement | Dify Feature | Status |
|------------|-------------|--------|
| Conversation logs | Full conversation history with timestamps | **Covered** |
| Model tracking | Model name recorded per interaction | **Covered** |
| Token usage | Token counts per message | **Covered** |
| Cost tracking | Cost per conversation (if provider reports it) | **Partial** |
| Document retrieval | RAG source documents logged | **Covered** |
| User identification | User session tracking | **Covered** |
| Error logging | Failed generation logs | **Covered** |
| Data retention | Configurable | **Your responsibility** |

**Retention periods:** The required retention period depends on your role under the Act. Article 18 requires **providers** of high-risk systems to retain logs and technical documentation for **10 years** after market placement. Article 26(6) requires **deployers** to retain logs for at least **6 months**. If you self-host Dify and have substantially modified the system, you may be classified as a provider rather than a deployer. Confirm the applicable retention period with legal counsel.
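
Once you have chosen a retention window, enforcing it is a periodic pruning job. A sketch of the logic, assuming a simplified SQLite table named `conversation_logs`; Dify's actual storage schema differs, so treat this as an illustration only:

```python
import sqlite3
from datetime import datetime, timedelta, timezone

# Your chosen window. Deployers must retain at least ~6 months
# (Article 26(6)); providers of high-risk systems, 10 years (Article 18).
RETENTION_DAYS = 365

def prune_old_logs(conn: sqlite3.Connection) -> int:
    """Delete log rows older than the retention window; return how many."""
    cutoff = (datetime.now(timezone.utc) - timedelta(days=RETENTION_DAYS)).isoformat()
    cur = conn.execute("DELETE FROM conversation_logs WHERE created_at < ?", (cutoff,))
    conn.commit()
    return cur.rowcount

# Demo with an in-memory database: one row past the window, one recent.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE conversation_logs (id INTEGER PRIMARY KEY, created_at TEXT)")
old = (datetime.now(timezone.utc) - timedelta(days=400)).isoformat()
recent = datetime.now(timezone.utc).isoformat()
conn.executemany("INSERT INTO conversation_logs (created_at) VALUES (?)", [(old,), (recent,)])
deleted = prune_old_logs(conn)
```

ISO-8601 UTC timestamps compare correctly as strings, which is why the `WHERE` clause works without date parsing.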

## Article 13: Transparency to deployers

Article 13 requires providers of high-risk AI systems to supply deployers with the information needed to understand and operate the system correctly. This is a **documentation obligation**, not a logging obligation. For Dify deployments, this means the upstream LLM and embedding providers must give you:

- Instructions for use, including intended purpose and known limitations
- Accuracy metrics and performance benchmarks
- Known or foreseeable risks and residual risks after mitigation
- Technical specifications: input/output formats, training data characteristics, model architecture details

As a deployer, collect model cards, system documentation, and accuracy reports from each AI provider your Dify application uses. Maintain these as part of your Annex IV technical documentation.

Dify's platform features provide **supporting evidence** that can inform Article 13 documentation, but they do not satisfy Article 13 on their own:
- **Source attribution** — Dify's RAG citation feature shows which documents informed the response, supporting deployer-side auditing
- **Model identification** — Dify logs which LLM model generates responses, providing evidence for system documentation
- **Conversation logs** — execution history helps compile performance and behavior evidence

You must independently produce system documentation covering how your specific Dify deployment uses AI, its intended purpose, performance characteristics, and residual risks.

## Article 50: End-user transparency

Article 50 requires deployers to inform end users that they are interacting with an AI system. This is a separate obligation from Article 13 and applies even to limited-risk systems.

For Dify applications serving end users:

1. **Disclose AI involvement** — tell users they are interacting with an AI system
2. **AI-generated content labeling** — identify AI-generated content as such (e.g., clear labeling in the UI)
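
Both steps can be handled in one response-labeling hook. A sketch — the wrapper and disclosure text are hypothetical examples, not Dify APIs or prescribed wording:

```python
# Example disclosure text; adapt to your UI and audience language.
AI_DISCLOSURE = "You are chatting with an AI assistant. Responses are AI-generated."

def label_response(ai_text: str, first_turn: bool) -> dict:
    """Attach a machine-readable flag always, and a human-readable
    disclosure on the first turn of a conversation."""
    payload = {"content": ai_text, "ai_generated": True}
    if first_turn:
        payload["disclosure"] = AI_DISCLOSURE
    return payload

msg = label_response("Our refund policy allows returns within 30 days.", first_turn=True)
```

The `ai_generated` flag lets downstream renderers label the content consistently even when the disclosure banner is not shown.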

Dify's "citation" feature also supports end-user transparency by showing users which knowledge base documents informed the answer.

> **Note:** Article 50 applies to chatbots and other systems that interact directly with natural persons. Its scope is independent of the high-risk designation under Annex III.

## Article 14: Human oversight

Article 14 requires that high-risk AI systems be designed so that natural persons can effectively oversee them. Dify provides **automated technical safeguards** that support human oversight, but they are not a substitute for it:

| Dify Feature | What It Does | Oversight Role |
|-------------|-------------|----------------|
| Annotation/feedback system | Human review of AI outputs | **Direct oversight** — humans evaluate and correct AI responses |
| Content moderation | Built-in filtering before responses reach users | **Automated safeguard** — reduces harmful outputs but does not replace human judgment on edge cases |
| Rate limiting | Controls on API usage | **Automated safeguard** — bounds system behavior, supports overseer's ability to maintain control |
| Workflow control | Insert human review steps between AI generation and output | **Oversight enabler** — allows building approval gates into the pipeline |

These automated controls are necessary building blocks, but Article 14 compliance requires **human oversight procedures** on top of them:
- **Escalation procedures** — define what happens when moderation triggers or edge cases arise (who is notified, what action is taken)
- **Human review pipeline** — for high-stakes decisions, route AI outputs to a qualified person before they take effect
- **Override mechanism** — a human must be able to halt AI responses or override the system's output
- **Competence requirements** — the human overseer must understand the system's capabilities, limitations, and the context of its outputs

### Recommended pattern

For high-risk use cases (HR, legal, medical), configure your Dify workflow to require human approval before the AI response is delivered to the end user or acted upon.

## Knowledge base compliance

Dify's knowledge base feature has specific compliance implications:

1. **Data provenance:** Document where your knowledge base documents come from. Article 10 requires data governance for training data; knowledge bases are analogous.
2. **Update tracking:** When you add, remove, or update documents in the knowledge base, log the change. The AI system's behavior changes with its knowledge base.
3. **PII in documents:** If knowledge base documents contain personal data, GDPR applies to the entire RAG pipeline. Implement access controls and consider PII redaction before indexing.
4. **Copyright:** Ensure you have the right to use the documents in your knowledge base for AI-assisted generation.
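
Points 1 and 2 can be combined into an append-only change log that records provenance alongside each update. A sketch with in-memory storage for illustration; persist entries to your audit store in practice:

```python
from datetime import datetime, timezone

kb_changelog: list[dict] = []

def log_kb_change(action: str, document: str, source: str, actor: str) -> dict:
    """Append one knowledge-base change record and return it."""
    entry = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "action": action,      # "add" | "remove" | "update"
        "document": document,
        "source": source,      # provenance: where the document came from
        "actor": actor,        # who made the change
    }
    kb_changelog.append(entry)
    return entry

log_kb_change("add", "hr_policy_2025.pdf", "internal HR portal", "admin@example.org")
```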

## GDPR considerations

1. **Legal basis** (Article 6): Document why AI processing of user queries is necessary
2. **Data Processing Agreements** (Article 28): Required for each cloud LLM and embedding provider
3. **Data minimization:** Only include necessary context in prompts; avoid sending entire documents when a relevant excerpt suffices
4. **Right to erasure:** If a user requests deletion, ensure their conversations are removed from Dify's logs AND any vector store entries derived from their data
5. **Cross-border transfers:** Providers based outside the EEA — including US-based providers (OpenAI, Anthropic), and any other non-EEA providers you route to — require Standard Contractual Clauses (SCCs) or equivalent safeguards under Chapter V of the GDPR. Review each provider's transfer mechanism individually.
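
Point 4 spans two stores, so erasure must be coordinated. A sketch of that coordination, with in-memory dicts standing in for Dify's conversation store and your vector database client:

```python
def delete_conversations(user_id: str, conversations: dict[str, str]) -> int:
    """Remove every conversation owned by user_id; return the count."""
    doomed = [cid for cid, owner in conversations.items() if owner == user_id]
    for cid in doomed:
        del conversations[cid]
    return len(doomed)

def delete_vectors(user_id: str, vectors: dict[str, str]) -> int:
    """Remove every vector entry derived from user_id's data; return the count."""
    doomed = [vid for vid, owner in vectors.items() if owner == user_id]
    for vid in doomed:
        del vectors[vid]
    return len(doomed)

def erase_user(user_id: str, conversations: dict, vectors: dict) -> dict:
    """Service an erasure request across conversation logs AND vector entries."""
    return {
        "conversations_deleted": delete_conversations(user_id, conversations),
        "vectors_deleted": delete_vectors(user_id, vectors),
    }

convs = {"c1": "alice", "c2": "bob"}
vecs = {"v1": "alice", "v2": "alice", "v3": "bob"}
report = erase_user("alice", convs, vecs)
```

Returning a per-store deletion report gives you evidence that the request was serviced completely.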

## Resources

- [EU AI Act full text](https://artificialintelligenceact.eu/)
- [Dify documentation](https://docs.dify.ai/)
- [Dify SOC 2 compliance](https://dify.ai/trust)

---

*This is not legal advice. Consult a qualified professional for compliance decisions.*