# Data Flow

## End-to-End Data Flow

The FFG pipeline processes data through multiple stages, transforming raw inputs into business-ready analytics.

### Data Flow Diagram
```mermaid
flowchart LR
    subgraph "Input Sources"
        A1[Excel Files<br/>Sell-Out Data]
        A2[CSV Files<br/>Mappings]
        A3[SQL Tables<br/>PL Global]
        A4[Excel Files<br/>RSP Data]
        A5[MDM Tables<br/>Master Data]
    end
    subgraph "Bronze Layer"
        B1[bronze_sellout]
        B2[bronze_sellin]
        B3[bronze_rsp]
        B4[Mapping Tables]
    end
    subgraph "Processing"
        C1[Enrichment]
        C2[Validation]
        C3[Aggregation]
        C4[Ventilation]
    end
    subgraph "Gold Layer"
        D1[gold_sellout]
        D2[gold_sellin]
        D3[gold_rsp]
        D4[gold_ventilated]
        D5[gold_consolidated]
    end
    A1 --> B1
    A2 --> B4
    A3 --> B2
    A4 --> B3
    A5 --> C1
    B1 --> C1
    B2 --> C1
    B3 --> C1
    B4 --> C1
    C1 --> C2
    C2 --> C3
    C3 --> C4
    C4 --> D1
    C4 --> D2
    C4 --> D3
    C4 --> D4
    D4 --> D5
```
## Stage 1: Data Ingestion

### Sell-Out Data Flow

- Source: Excel files uploaded to Azure Blob Storage
- Location: `/sellout_data/incoming/`
- Processing:
    - Header detection
    - Column standardization
    - Data type conversion
- Output: Bronze sellout table
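The three processing steps above can be sketched as follows. This is a minimal illustration, not the pipeline's actual code: the function names and the normalization rules are assumptions.

```python
import re

def standardize_column(raw: str) -> str:
    """Normalize a raw Excel header to snake_case (illustrative rule set)."""
    name = raw.strip().lower()
    name = re.sub(r"[^a-z0-9]+", "_", name)  # collapse spaces/punctuation
    return name.strip("_")

def find_header_row(rows: list[list], required: set[str]) -> int:
    """Header detection: index of the first row containing all required headers."""
    for i, row in enumerate(rows):
        cells = {standardize_column(str(c)) for c in row}
        if required <= cells:
            return i
    raise ValueError("header row not found")
```

Data type conversion would then be applied per column once the header row is known.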
### Sell-In Data Flow

- Source: SQL queries to the `pl_global` table
- Filters:
    - Date range: previous year to current
    - Division: Luxury only
    - Reporting units in scope
- Aggregation: by fiscal year, month, and dimensions
- Output: Bronze sellin table
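The filter set above, expressed as a row predicate. Column names (`fiscal_date`, `division`, `reporting_unit`) are assumptions for illustration; in the pipeline these filters live in the SQL query itself.

```python
from datetime import date

def in_scope(row: dict, scope_rus: set[str], today: date) -> bool:
    """Sell-in filter sketch: previous year to current, Luxury division,
    reporting unit in scope. Column names are hypothetical."""
    return (
        row["fiscal_date"] >= date(today.year - 1, 1, 1)
        and row["division"] == "Luxury"
        and row["reporting_unit"] in scope_rus
    )
```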
### RSP Data Flow

- Source: Excel files with price data
- Processing:
    - Market to reporting unit mapping
    - FX rate conversion to EUR
    - Inheritance application
- Date expansion: one row per day per EAN
- Output: Bronze and gold RSP tables
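The date-expansion step is the one structural transformation here: each validity interval becomes one row per day per EAN. A minimal sketch (field names are illustrative):

```python
from datetime import date, timedelta

def expand_daily(ean: str, price_eur: float, start: date, end: date):
    """Expand one RSP record into one row per day per EAN (end inclusive)."""
    day = start
    while day <= end:
        yield {"ean": ean, "date": day, "price_eur": price_eur}
        day += timedelta(days=1)
```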
## Stage 2: Enrichment & Mapping

### Mapping Application Flow

#### Sell-Out Mappings
- Retailer Mapping: Links retailers to customers and countries
- Company Mapping: Maps brands to companies
- Product Mapping: Adds product details via UPC
#### Sell-In Mappings
- Region Mapping: Assigns regions to reporting units
- Customer Info: Adds customer classifications
- Freight & Royalty: Applies percentages
- FX Rates: Converts LC to EUR
- HFM Groups: Creates financial aggregations
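As one concrete example from the list above, the FX step converts local-currency amounts to EUR via a rate table. The rate convention below (EUR = LC / rate) is an assumption; verify it against the actual FX source before relying on it.

```python
def to_eur(amount_lc: float, currency: str, fx_rates: dict[str, float]) -> float:
    """Convert a local-currency amount to EUR.
    Assumed convention: fx_rates[ccy] is units of LC per 1 EUR."""
    if currency == "EUR":
        return amount_lc
    return amount_lc / fx_rates[currency]
```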
### MDM Enrichment Flow

Enrichment sources:

- Material master
- Customer hierarchy
- Brand/House hierarchies
- Product hierarchies
## Stage 3: Transformation & Aggregation

### HFM Processing
- Direct Accounts: Individual HFM account aggregations
- PNL Lines: Grouped account calculations
- Trade Terms: Category-based groupings
- Special Cases: COGS subtractions for specific accounts
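The grouped-account and special-case logic above can be modeled as a signed sum over account members. The account codes and the helper name are illustrative, not the pipeline's actual mapping:

```python
def pnl_line(accounts: dict[str, float], members: list[str],
             subtract: tuple[str, ...] = ()) -> float:
    """Sum HFM account values into one PNL line; 'subtract' models the
    COGS-subtraction special cases. Account codes are hypothetical."""
    total = sum(accounts.get(a, 0.0) for a in members)
    total -= sum(accounts.get(a, 0.0) for a in subtract)
    return total
```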
### Aggregation Levels

#### House Level

- Dimensions: Brand × Retailer × Reporting Unit × House
- Metrics: Gross sales, on/off invoice, freight
- Special handling for multiline brands
#### Product Level
- Dimensions: All house dimensions + Material key
- Metrics: Individual product performance
- Percentage allocations from house level
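The percentage allocation works by giving each product its share of the house-level total. A minimal sketch (the allocation metric, here sales, is an assumption):

```python
def product_shares(product_sales: dict[str, float]) -> dict[str, float]:
    """Share of the house total per product; zero-total houses get 0.0
    rather than a division-by-zero."""
    house_total = sum(product_sales.values())
    if house_total == 0:
        return {k: 0.0 for k in product_sales}
    return {k: v / house_total for k, v in product_sales.items()}
```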
## Stage 4: Ventilation Process

### Ventilation Flow
```mermaid
flowchart TB
    A[Gold Sell-In Data]
    B[Compute Derived Columns]
    C[House Level Aggregation]
    D[Product Level Aggregation]
    E[Percentage Calculations]
    F[Allocation Distribution]
    G[KPI Calculation]
    H[Final Ventilated Data]
    A --> B
    B --> C
    B --> D
    C --> E
    D --> E
    E --> F
    F --> G
    G --> H
```
### Key Ventilation Rules

- Regular Items Only: filter `article_type = "Regular Item"`
- Quality Filters: apply `cleaning_total = 1`
- Not Assigned Handling: Special allocation logic
- Exception Cases: Douglas, ICI Paris XL, BENELUX
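The first two rules are simple row filters; expressed as a predicate (column names as given in the text, the exception and not-assigned logic is omitted here):

```python
def in_ventilation_scope(row: dict) -> bool:
    """Quality filters from the ventilation rules: regular items that
    passed cleaning. Exception retailers need separate handling."""
    return row["article_type"] == "Regular Item" and row["cleaning_total"] == 1
```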
## Stage 5: Consolidation

### Master Table Creation

Key joins:

- Date alignment
- EAN/UPC matching
- Reporting unit consistency
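The date-alignment and EAN-matching joins amount to a composite-key lookup. A plain-Python sketch of a left join on `(date, ean)` (field names are assumptions; unmatched rows keep a null price):

```python
def consolidate(sellout: list[dict], rsp: list[dict]) -> list[dict]:
    """Left-join sell-out rows to daily RSP prices on (date, ean)."""
    price_by_key = {(r["date"], r["ean"]): r["price_eur"] for r in rsp}
    return [
        {**s, "price_eur": price_by_key.get((s["date"], s["ean"]))}
        for s in sellout
    ]
```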
## Stage 6: Output & Archive

### Output Flow
- Gold Tables: Business-ready data
- Processing Markers: Completion indicators
- Power Automate: Trigger downstream processes
- Archival: Move processed files
### Archive Structure

## Data Quality Checkpoints

### Validation Points
- Input Validation
    - File format checks
    - Required columns verification
    - Data type validation
- Join Validation
    - No data inflation
    - Key matching rates
    - Null handling
- Business Rules
    - Threshold checks
    - Exception handling
    - Consistency validation
- Output Validation
    - Row count verification
    - KPI reasonability
    - Completeness checks
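The "no data inflation" check deserves a concrete shape: a join that adds rows usually means a duplicate-key fan-out on the lookup side. A minimal guard (function name is illustrative):

```python
def assert_no_inflation(rows_before: int, rows_after: int, step: str) -> None:
    """Join-validation guard: fail fast if a join grew the row count."""
    if rows_after > rows_before:
        raise ValueError(
            f"{step}: row count grew from {rows_before} to {rows_after}"
        )
```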
## Performance Optimization

### Caching Strategy
- Cache frequently accessed DataFrames
- Clear cache after major operations
- Monitor memory usage
### Join Optimization
- Broadcast small lookup tables
- Use composite keys for efficiency
- Pre-filter data before joins
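Broadcasting a small lookup table means every worker gets a full copy and the join becomes a local hash lookup, avoiding a shuffle. The same idea in plain Python (a sketch of the mechanism, not the pipeline's Spark code):

```python
def map_side_join(fact: list[dict], lookup: list[dict], key: str) -> list[dict]:
    """Broadcast-join analogue: hash the small table once, then enrich
    the large side row by row with no sort or shuffle."""
    by_key = {row[key]: row for row in lookup}
    return [{**f, **by_key.get(f[key], {})} for f in fact]
```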
### Aggregation Optimization
- Single pass aggregations
- Minimize shuffles
- Use window functions efficiently
## Error Handling Flow

```mermaid
flowchart LR
    A[Process Step] --> B{Success?}
    B -->|Yes| C[Next Step]
    B -->|No| D[Log Error]
    D --> E{Recoverable?}
    E -->|Yes| F[Retry Logic]
    E -->|No| G[Graceful Failure]
    F --> A
    G --> H[Alert & Exit]
```
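The retry branch of the flow above, as a small wrapper. This is a generic sketch, assuming a linear backoff; the pipeline's actual retry policy may differ:

```python
import time

def with_retry(step, max_attempts: int = 3, base_delay: float = 0.0):
    """Run `step` up to max_attempts times, sleeping between attempts.
    On the final failure, re-raise so the alert-and-exit path can fire."""
    for attempt in range(1, max_attempts + 1):
        try:
            return step()
        except Exception:
            if attempt == max_attempts:
                raise
            time.sleep(base_delay * attempt)
```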
## Data Lineage

### Traceability
- Processing timestamps at each stage
- Source file tracking
- Transformation audit trail
- User/job identification
### Versioning
- Pipeline version in metadata
- Schema evolution support
- Backward compatibility