End-to-End Data Flow

The FFG pipeline processes data through multiple stages, transforming raw inputs into business-ready analytics.

Data Flow Diagram

flowchart LR
    subgraph "Input Sources"
        A1[Excel Files<br/>Sell-Out Data]
        A2[CSV Files<br/>Mappings]
        A3[SQL Tables<br/>PL Global]
        A4[Excel Files<br/>RSP Data]
        A5[MDM Tables<br/>Master Data]
    end

    subgraph "Bronze Layer"
        B1[bronze_sellout]
        B2[bronze_sellin]
        B3[bronze_rsp]
        B4[Mapping Tables]
    end

    subgraph "Processing"
        C1[Enrichment]
        C2[Validation]
        C3[Aggregation]
        C4[Ventilation]
    end

    subgraph "Gold Layer"
        D1[gold_sellout]
        D2[gold_sellin]
        D3[gold_rsp]
        D4[gold_ventilated]
        D5[gold_consolidated]
    end

    A1 --> B1
    A2 --> B4
    A3 --> B2
    A4 --> B3
    A5 --> C1

    B1 --> C1
    B2 --> C1
    B3 --> C1
    B4 --> C1

    C1 --> C2
    C2 --> C3
    C3 --> C4

    C4 --> D1
    C4 --> D2
    C4 --> D3
    C4 --> D4
    D4 --> D5

Stage 1: Data Ingestion

Sell-Out Data Flow

  1. Source: Excel files uploaded to Azure Blob Storage
  2. Location: /sellout_data/incoming/
  3. Processing:
     • Header detection
     • Column standardization
     • Data type conversion
  4. Output: Bronze sellout table
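The header detection and column standardization steps can be sketched in plain Python. The expected column names and the matching rule are illustrative assumptions, not the pipeline's actual configuration:

```python
# Hypothetical sketch of header detection and column standardization
# for an incoming sell-out Excel sheet (read as a list of rows).

EXPECTED_COLUMNS = {"retailer", "upc", "units", "net_sales"}  # assumed


def standardize(name):
    """Lower-case, trim, and snake-case a raw column name."""
    return name.strip().lower().replace(" ", "_")


def find_header_row(rows):
    """Return the index of the first row containing all expected columns."""
    for i, row in enumerate(rows):
        cells = {standardize(c) for c in row if isinstance(c, str)}
        if EXPECTED_COLUMNS <= cells:
            return i
    raise ValueError("no header row found")


def parse_sheet(rows):
    """Split a raw sheet into a standardized header and its data rows."""
    h = find_header_row(rows)
    header = [standardize(c) for c in rows[h]]
    return header, rows[h + 1:]
```

Type conversion would follow the same pattern, casting each data cell against a per-column schema before the rows land in the bronze table.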

Sell-In Data Flow

  1. Source: SQL queries to the pl_global table
  2. Filters:
     • Date range: previous year to current
     • Division: Luxury only
     • Reporting units in scope
  3. Aggregation: by fiscal year, month, and dimensions
  4. Output: Bronze sellin table
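The filter-then-aggregate step can be sketched in plain Python; the column names are assumptions about the pl_global schema, and the real pipeline presumably runs this as SQL or on DataFrames:

```python
from collections import defaultdict


def aggregate_sellin(rows):
    """Sum amounts by (fiscal_year, month, reporting_unit) after applying
    the division filter. Row keys are illustrative, not the real schema."""
    totals = defaultdict(float)
    for r in rows:
        if r["division"] != "Luxury":  # Luxury-only filter
            continue
        key = (r["fiscal_year"], r["month"], r["reporting_unit"])
        totals[key] += r["amount"]
    return dict(totals)
```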

RSP Data Flow

  1. Source: Excel files with price data
  2. Processing:
     • Market to reporting unit mapping
     • FX rate conversion to EUR
     • Inheritance application
  3. Date expansion: one row per day per EAN
  4. Output: Bronze and gold RSP tables
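The date expansion step can be sketched as follows; treating the validity interval as inclusive on both ends is an assumption:

```python
from datetime import date, timedelta


def expand_rsp(ean, start, end, price_eur):
    """Expand one RSP validity interval into one row per day per EAN.

    The interval is treated as inclusive on both ends (an assumption);
    the output schema is illustrative."""
    rows = []
    d = start
    while d <= end:
        rows.append({"ean": ean, "date": d, "rsp_eur": price_eur})
        d += timedelta(days=1)
    return rows
```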

Stage 2: Enrichment & Mapping

Mapping Application Flow

Source Data → Mapping Join → Validation → Enriched Data

Sell-Out Mappings

  • Retailer Mapping: Links retailers to customers and countries
  • Company Mapping: Maps brands to companies
  • Product Mapping: Adds product details via UPC
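A mapping join of this kind can be sketched in plain Python. Unmapped retailers are kept with null attributes rather than dropped, so they surface at the validation step; the field names are assumptions:

```python
def apply_retailer_mapping(sellout_rows, retailer_map):
    """Left-join sell-out rows to the retailer mapping.

    `retailer_map` maps retailer name -> (customer, country). Unmapped
    retailers keep None so validation can flag them instead of silently
    losing rows. Field names are illustrative."""
    enriched = []
    for r in sellout_rows:
        customer, country = retailer_map.get(r["retailer"], (None, None))
        enriched.append({**r, "customer": customer, "country": country})
    return enriched
```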

Sell-In Mappings

  • Region Mapping: Assigns regions to reporting units
  • Customer Info: Adds customer classifications
  • Freight & Royalty: Applies percentages
  • FX Rates: Converts LC to EUR
  • HFM Groups: Creates financial aggregations
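The FX conversion step can be sketched as a small helper. The rate convention assumed here (units of local currency per 1 EUR) is an assumption; the real rate table may use the inverse:

```python
def to_eur(amount_lc, currency, fx_rates):
    """Convert a local-currency amount to EUR.

    `fx_rates` maps currency -> units of local currency per 1 EUR
    (the rate convention is an assumption)."""
    if currency == "EUR":
        return amount_lc
    return amount_lc / fx_rates[currency]
```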

MDM Enrichment Flow

Raw Data → MDM Lookup → Attribute Addition → Validated Data

Enrichment sources:

  • Material master
  • Customer hierarchy
  • Brand/House hierarchies
  • Product hierarchies

Stage 3: Transformation & Aggregation

HFM Processing

  1. Direct Accounts: Individual HFM account aggregations
  2. PNL Lines: Grouped account calculations
  3. Trade Terms: Category-based groupings
  4. Special Cases: COGS subtractions for specific accounts
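The grouped PNL-line calculation, including the COGS subtraction special case, can be sketched as follows; the account codes and group names are invented for illustration:

```python
def compute_pnl_lines(account_totals, pnl_groups, cogs_subtract=()):
    """Roll HFM account totals up into PNL lines.

    `pnl_groups` maps a line name -> list of accounts to sum; accounts
    listed in `cogs_subtract` are subtracted instead of added, mirroring
    the COGS special case. Account codes here are made up."""
    lines = {}
    for line, accounts in pnl_groups.items():
        total = 0.0
        for acc in accounts:
            sign = -1.0 if acc in cogs_subtract else 1.0
            total += sign * account_totals.get(acc, 0.0)
        lines[line] = total
    return lines
```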

Aggregation Levels

House Level

  • Dimensions: Brand × Retailer × Reporting Unit × House
  • Metrics: Gross sales, on/off invoice, freight
  • Special handling for multiline brands

Product Level

  • Dimensions: All house dimensions + Material key
  • Metrics: Individual product performance
  • Percentage allocations from house level
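The percentage allocation from house level to products can be sketched as: each product receives the house-level amount weighted by its share of product-level sales. The function and field names are illustrative:

```python
def allocate_from_house(house_total, product_sales):
    """Distribute a house-level amount to products by their sales share.

    `product_sales` maps material key -> product-level sales. Each
    product's share is its sales divided by the sum over the house."""
    base = sum(product_sales.values())
    if base == 0:
        return {k: 0.0 for k in product_sales}  # nothing to weight by
    return {k: house_total * v / base for k, v in product_sales.items()}
```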

Stage 4: Ventilation Process

Ventilation Flow

flowchart TB
    A[Gold Sell-In Data]
    B[Compute Derived Columns]
    C[House Level Aggregation]
    D[Product Level Aggregation]
    E[Percentage Calculations]
    F[Allocation Distribution]
    G[KPI Calculation]
    H[Final Ventilated Data]

    A --> B
    B --> C
    B --> D
    C --> E
    D --> E
    E --> F
    F --> G
    G --> H

Key Ventilation Rules

  1. Regular Items Only: Filter article_type = "Regular Item"
  2. Quality Filters: Apply cleaning_total = 1
  3. Not Assigned Handling: Special allocation logic
  4. Exception Cases: Douglas, ICI Paris XL, BENELUX
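Rules 1 and 2 amount to a scope filter at the entry of the ventilation; a minimal sketch, using the column names quoted above:

```python
def ventilation_scope(rows):
    """Keep only rows that enter the ventilation: regular items that
    passed the quality filters (cleaning_total = 1)."""
    return [
        r for r in rows
        if r["article_type"] == "Regular Item" and r["cleaning_total"] == 1
    ]
```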

Stage 5: Consolidation

Master Table Creation

Ventilated Sell-In + Sell-Out + RSP → Master Consolidated

Key joins:

  • Date alignment
  • EAN/UPC matching
  • Reporting unit consistency
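The consolidation join can be sketched as a left join from ventilated sell-in on (date, EAN, reporting unit); the key and column names are assumptions:

```python
def consolidate(sellin, sellout, rsp):
    """Join ventilated sell-in, sell-out, and RSP on
    (date, ean, reporting_unit), left-joining from sell-in so no
    sell-in rows are lost. Key names are illustrative."""
    key = lambda r: (r["date"], r["ean"], r["reporting_unit"])
    so = {key(r): r for r in sellout}
    rp = {key(r): r for r in rsp}
    out = []
    for r in sellin:
        row = dict(r)
        row["sellout_units"] = so.get(key(r), {}).get("units")
        row["rsp_eur"] = rp.get(key(r), {}).get("rsp_eur")
        out.append(row)
    return out
```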

Stage 6: Output & Archive

Output Flow

  1. Gold Tables: Business-ready data
  2. Processing Markers: Completion indicators
  3. Power Automate: Trigger downstream processes
  4. Archival: Move processed files

Archive Structure

/incoming → /processed (overwrite)
         ↘ /archive (timestamp)
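The archival move can be sketched with the standard library; the directory layout mirrors the structure above, while the timestamp format is an assumption:

```python
import shutil
from datetime import datetime
from pathlib import Path


def archive_file(path, processed_dir, archive_dir):
    """Copy a processed file to /processed (overwriting any previous
    version) and to /archive with a timestamp suffix, then remove the
    /incoming copy. Timestamp format is assumed."""
    path = Path(path)
    Path(processed_dir).mkdir(parents=True, exist_ok=True)
    Path(archive_dir).mkdir(parents=True, exist_ok=True)
    shutil.copy2(path, Path(processed_dir) / path.name)  # overwrite
    stamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    shutil.copy2(path, Path(archive_dir) / f"{path.stem}_{stamp}{path.suffix}")
    path.unlink()  # remove from /incoming
```

In the real pipeline the same logic would run against Azure Blob Storage paths rather than a local filesystem.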

Data Quality Checkpoints

Validation Points

  1. Input Validation
     • File format checks
     • Required columns verification
     • Data type validation

  2. Join Validation
     • No data inflation
     • Key matching rates
     • Null handling

  3. Business Rules
     • Threshold checks
     • Exception handling
     • Consistency validation

  4. Output Validation
     • Row count verification
     • KPI reasonability
     • Completeness checks
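The join checks can be reduced to two numbers: did the row count grow (fan-out on the join key), and what fraction of rows found a match. A minimal sketch, with an illustrative column name:

```python
def join_health(before_rows, after_rows, matched_col):
    """Return simple join diagnostics after a left join.

    `matched_col` is a column that is only non-null when the join key
    matched (the column name is illustrative)."""
    matched = sum(1 for r in after_rows if r.get(matched_col) is not None)
    return {
        "inflated": len(after_rows) > len(before_rows),  # data inflation
        "match_rate": matched / len(after_rows) if after_rows else 1.0,
    }
```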

Performance Optimization

Caching Strategy

  • Cache frequently accessed DataFrames
  • Clear cache after major operations
  • Monitor memory usage

Join Optimization

  • Broadcast small lookup tables
  • Use composite keys for efficiency
  • Pre-filter data before joins

Aggregation Optimization

  • Single pass aggregations
  • Minimize shuffles
  • Use window functions efficiently

Error Handling Flow

flowchart LR
    A[Process Step] --> B{Success?}
    B -->|Yes| C[Next Step]
    B -->|No| D[Log Error]
    D --> E{Recoverable?}
    E -->|Yes| F[Retry Logic]
    E -->|No| G[Graceful Failure]
    F --> A
    G --> H[Alert & Exit]
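The retry branch of the flow above can be sketched as a small wrapper; the retry count and delay are illustrative defaults, and the real pipeline's logging and alerting are abstracted to a print:

```python
import time


def with_retry(step, retries=3, delay=0.0):
    """Run a pipeline step, retrying recoverable failures.

    Re-raises after the final attempt so the caller can fail gracefully
    and alert. `step` is any zero-argument callable."""
    for attempt in range(1, retries + 1):
        try:
            return step()
        except Exception as err:
            print(f"attempt {attempt} failed: {err}")  # log error
            if attempt == retries:
                raise  # not recoverable within budget -> graceful failure
            time.sleep(delay)
```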

Data Lineage

Traceability

  • Processing timestamps at each stage
  • Source file tracking
  • Transformation audit trail
  • User/job identification

Versioning

  • Pipeline version in metadata
  • Schema evolution support
  • Backward compatibility