suguo/dify

Fork 0

Files

hj24 87bf7401f1 feat: add backend-code-review skill (#32719 )

2026-02-28 14:17:48 +08:00

6.8 KiB

Raw Blame History

Rule Catalog — DB Schema Design

Scope

Covers: model/base inheritance, schema boundaries in model properties, tenant-aware schema design, index redundancy checks, dialect portability in models, and cross-database compatibility in migrations.
Does NOT cover: session lifecycle, transaction boundaries, and query execution patterns (handled by sqlalchemy-rule.md).

Rules

Do not query other tables inside `@property`

Category: [maintainability, performance]
Severity: critical
Description: A model @property must not open sessions or query other tables. This hides dependencies across models, tightly couples schema objects to data access, and can cause N+1 query explosions when iterating collections.
Suggested fix:
- Keep model properties pure and local to already-loaded fields.
- Move cross-table data fetching to service/repository methods.
- For list/batch reads, fetch required related data explicitly (join/preload/bulk query) before rendering derived values.

Example:

Bad:

class Conversation(TypeBase):
    __tablename__ = "conversations"

    @property
    def app_name(self) -> str:
        with Session(db.engine, expire_on_commit=False) as session:
            app = session.execute(select(App).where(App.id == self.app_id)).scalar_one()
            return app.name

Good:

class Conversation(TypeBase):
    __tablename__ = "conversations"

    @property
    def display_title(self) -> str:
        return self.name or "Untitled"


# Service/repository layer performs explicit batch fetch for related App rows.

Prefer including `tenant_id` in model definitions

Category: maintainability
Severity: suggestion
Description: In multi-tenant domains, include tenant_id in schema definitions whenever the entity belongs to tenant-owned data. This improves data isolation safety and keeps future partitioning/sharding strategies practical as data volume grows.
Suggested fix:
- Add a tenant_id column and ensure related unique/index constraints include tenant dimension when applicable.
- Propagate tenant_id through service/repository contracts to keep access paths tenant-aware.
- Exception: if a table is explicitly designed as non-tenant-scoped global metadata, document that design decision clearly.

Example:

Bad:

from sqlalchemy.orm import Mapped

class Dataset(TypeBase):
    __tablename__ = "datasets"
    id: Mapped[str] = mapped_column(StringUUID, primary_key=True)
    name: Mapped[str] = mapped_column(sa.String(255), nullable=False)

Good:

from sqlalchemy.orm import Mapped

class Dataset(TypeBase):
    __tablename__ = "datasets"
    id: Mapped[str] = mapped_column(StringUUID, primary_key=True)
    tenant_id: Mapped[str] = mapped_column(StringUUID, nullable=False, index=True)
    name: Mapped[str] = mapped_column(sa.String(255), nullable=False)

Detect and avoid duplicate/redundant indexes

Category: performance
Severity: suggestion
Description: Review index definitions for leftmost-prefix redundancy. For example, index (a, b, c) can safely cover most lookups for (a, b). Keeping both may increase write overhead and can mislead the optimizer into suboptimal execution plans.
Suggested fix:
- Before adding an index, compare against existing composite indexes by leftmost-prefix rules.
- Drop or avoid creating redundant prefixes unless there is a proven query-pattern need.
- Apply the same review standard in both model __table_args__ and migration index DDL.

Example:

Bad:

__table_args__ = (
    sa.Index("idx_msg_tenant_app", "tenant_id", "app_id"),
    sa.Index("idx_msg_tenant_app_created", "tenant_id", "app_id", "created_at"),
)

Good:

__table_args__ = (
    # Keep the wider index unless profiling proves a dedicated short index is needed.
    sa.Index("idx_msg_tenant_app_created", "tenant_id", "app_id", "created_at"),
)

Avoid PostgreSQL-only dialect usage in models; wrap in `models.types`

Category: maintainability
Severity: critical
Description: Model/schema definitions should avoid PostgreSQL-only constructs directly in business models. When database-specific behavior is required, encapsulate it in api/models/types.py using both PostgreSQL and MySQL dialect implementations, then consume that abstraction from model code.
Suggested fix:
- Do not directly place dialect-only types/operators in model columns when a portable wrapper can be used.
- Add or extend wrappers in models.types (for example, AdjustedJSON, LongText, BinaryData) to normalize behavior across PostgreSQL and MySQL.

Example:

Bad:

from sqlalchemy.dialects.postgresql import JSONB
from sqlalchemy.orm import Mapped

class ToolConfig(TypeBase):
    __tablename__ = "tool_configs"
    config: Mapped[dict] = mapped_column(JSONB, nullable=False)

Good:

from sqlalchemy.orm import Mapped

from models.types import AdjustedJSON

class ToolConfig(TypeBase):
    __tablename__ = "tool_configs"
    config: Mapped[dict] = mapped_column(AdjustedJSON(), nullable=False)

Guard migration incompatibilities with dialect checks and shared types

Category: maintainability
Severity: critical
Description: Migration scripts under api/migrations/versions/ must account for PostgreSQL/MySQL incompatibilities explicitly. For dialect-sensitive DDL or defaults, branch on the active dialect (for example, conn.dialect.name == "postgresql"), and prefer reusable compatibility abstractions from models.types where applicable.
Suggested fix:
- In migration upgrades/downgrades, bind connection and branch by dialect for incompatible SQL fragments.
- Reuse models.types wrappers in column definitions when that keeps behavior aligned with runtime models.
- Avoid one-dialect-only migration logic unless there is a documented, deliberate compatibility exception.

Example:

Bad:

with op.batch_alter_table("dataset_keyword_tables") as batch_op:
    batch_op.add_column(
        sa.Column(
            "data_source_type",
            sa.String(255),
            server_default=sa.text("'database'::character varying"),
            nullable=False,
        )
    )

Good:

def _is_pg(conn) -> bool:
    return conn.dialect.name == "postgresql"


conn = op.get_bind()
default_expr = sa.text("'database'::character varying") if _is_pg(conn) else sa.text("'database'")

with op.batch_alter_table("dataset_keyword_tables") as batch_op:
    batch_op.add_column(
        sa.Column("data_source_type", sa.String(255), server_default=default_expr, nullable=False)
    )

6.8 KiB Raw Blame History

Rule Catalog — DB Schema Design

Scope

Rules

Do not query other tables inside @property

Prefer including tenant_id in model definitions

Detect and avoid duplicate/redundant indexes

Avoid PostgreSQL-only dialect usage in models; wrap in models.types

Guard migration incompatibilities with dialect checks and shared types

6.8 KiB

Raw Blame History

Do not query other tables inside `@property`

Prefer including `tenant_id` in model definitions

Avoid PostgreSQL-only dialect usage in models; wrap in `models.types`