Merge branch 'main' of https://git.parinaam.in/dipanshuk/clickHouseData
Recommended relationship model

1. Use a dimensional/star schema

Create:

DimStore
DimEmployee
DimProduct / SKU
DimPromotionDefinition
DimVisibilityDefinition
DimVisibilityReason
DimDisplay
DimSalesTerritory / territory hierarchy
DimDate
Optionally DimChannel / DimChain / DimStoreType, if you need clean lookup values
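Most dimensions in this list are loaded from a source master, but DimDate is usually generated. The Python sketch below builds its rows for a date range. The column set (date_key, quarter, iso_week, is_weekend and so on) is an assumption and is not part of the design above.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class DimDateRow:
    date_key: int        # YYYYMMDD as an integer, convenient as a join/sort key
    full_date: date
    year: int
    quarter: int
    month: int
    month_name: str      # locale-dependent (English under the default C locale)
    day: int
    iso_week: int
    weekday_name: str    # locale-dependent, like month_name
    is_weekend: bool     # Saturday or Sunday


def build_dim_date(start: date, end: date) -> list[DimDateRow]:
    """Return one row per calendar day from start to end, inclusive."""
    if end < start:
        raise ValueError("end must not be before start")
    rows = []
    current = start
    while current <= end:
        rows.append(
            DimDateRow(
                date_key=current.year * 10000 + current.month * 100 + current.day,
                full_date=current,
                year=current.year,
                quarter=(current.month - 1) // 3 + 1,
                month=current.month,
                month_name=current.strftime("%B"),
                day=current.day,
                iso_week=current.isocalendar()[1],
                weekday_name=current.strftime("%A"),
                is_weekend=current.weekday() >= 5,
            )
        )
        current += timedelta(days=1)
    return rows
```

For example, `build_dim_date(date(2024, 1, 1), date(2024, 12, 31))` returns 366 rows because 2024 is a leap year. The rows can then be bulk-inserted into a small DimDate table.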
2. Map the fact tables to those dimensions

Store dimension

Use Store_Master.store_id as the primary store key.

Fact tables referencing store:

Sales.StoreId
Promotion.store_id
Mapping_StorePromotion.StoreId
Mapping_StoreVisibility.StoreId
Journey_Plan.store_id
Contact Conversion.store_id
PaidVisibility.store_id
PaidVisibility_Compliance.store_id
Coverage.store_id
additional_visibility.store_id
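ClickHouse does not enforce foreign keys, so each store reference listed above has to be checked during ETL. The Python sketch below does that check. It assumes the fact rows and the set of Store_Master keys are already loaded into memory, and the function name is hypothetical. The column map simply mirrors the list above.

```python
from collections.abc import Iterable, Mapping

# Store key column in each table, as listed above.
STORE_KEY_COLUMNS: dict[str, str] = {
    "Sales": "StoreId",
    "Promotion": "store_id",
    "Mapping_StorePromotion": "StoreId",
    "Mapping_StoreVisibility": "StoreId",
    "Journey_Plan": "store_id",
    "Contact Conversion": "store_id",
    "PaidVisibility": "store_id",
    "PaidVisibility_Compliance": "store_id",
    "Coverage": "store_id",
    "additional_visibility": "store_id",
}


def find_orphan_store_ids(
    table: str,
    rows: Iterable[Mapping[str, object]],
    known_store_ids: set[object],
) -> set[object]:
    """Return store IDs referenced in `table` that are missing from Store_Master.

    A missing or NULL store key is reported as None.
    """
    column = STORE_KEY_COLUMNS[table]
    orphans: set[object] = set()
    for row in rows:
        store_id = row.get(column)
        if store_id not in known_store_ids:
            orphans.add(store_id)
    return orphans
```

Keys must already be converted to one type before this check. For example, a store ID read as the string "1" would not match the integer 1 and would be reported as an orphan.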
Employee dimension

Use Employee_Master.employee_id as the employee key.

Fact tables referencing employee:

Sales.EmpID
Promotion.employee_id
PaidVisibility.employee_id
PaidVisibility_Compliance.employee_id
OQaD.employee_id
Journey_Plan.employee_id
Attendance.employee_id
Login.employee_id
Contact Conversion.emp_id
additional_visibility.emp_id
Product/SKU dimension

Use SKU Master.product_id as the product key (or pk, if you want a warehouse surrogate key).

Sales.ProductId should join to SKU Master.product_id.
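A warehouse surrogate key must stay stable across loads, or fact rows loaded earlier will stop joining. Reusing pk is one option. The Python sketch below shows a different technique: generating surrogates in ETL from the natural product_id. It assumes the existing mapping is persisted between runs (storage is not shown), and the function name is hypothetical.

```python
def assign_surrogate_keys(
    natural_keys: list[int],
    existing: dict[int, int],
) -> dict[int, int]:
    """Return a natural-key -> surrogate-key map.

    Keys already present in `existing` keep their surrogate. Each unseen key
    gets the next integer after the current maximum. `existing` is not modified.
    """
    mapping = dict(existing)
    next_key = max(mapping.values(), default=0) + 1
    for key in natural_keys:
        if key not in mapping:
            mapping[key] = next_key
            next_key += 1
    return mapping
```

On each load, you would pass in the previous mapping and the product_id values seen in SKU Master. Then you write the returned mapping back and use it to translate Sales.ProductId before inserting facts.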
Visibility dimension

Use Master_VisibilityDefinition.VisibilityDefinitionid as the visibility key.

Fact tables referencing visibility definition:

Mapping_StoreVisibility.VisibilityDefinitionid
PaidVisibility.Visibility_definition_id
PaidVisibility_Compliance.visibility_definition_id
Reason dimension

Use Master_VisibilityReason.ReasonId as the reason key.

Fact tables referencing reason:

Promotion.ReasonId
PaidVisibility.ReasonId
coverage_remarks.reason_id
Promotion mapping

Likely relationship:

Mapping_StorePromotion.PromotionDefinitionid → Master_PromotionDefinition
Promotion.promo_definition_id → Master_PromotionDefinition
Display mapping

additional_visibility.display_id → display_master.display_id
Territory hierarchy

Master_SalesTerritory and Master_Salesterritorylayer are hierarchical masters.
Join them by matching StLayerOneId … StLayerFourId and project_id.
Enrich Store_Master with the resulting territory hierarchy.
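The Python sketch below flattens the territory hierarchy onto store rows. The data shapes are assumptions, not the real master-table columns. A territory is taken to carry a project_id plus its four layer IDs. Layer names are taken to be keyed by (project_id, layer_id). Store rows are taken to have a hypothetical territory_id column. Each layer is matched together with project_id, as described above.

```python
from typing import Any, Optional

# Assumed shapes (hypothetical):
#   territories: territory_id -> (project_id, (StLayerOneId, ..., StLayerFourId))
#   layer_names: (project_id, layer_id) -> layer name from Master_Salesterritorylayer
Territories = dict[int, tuple[int, tuple[int, ...]]]
LayerNames = dict[tuple[int, int], str]


def territory_path(
    territory_id: int, territories: Territories, layer_names: LayerNames
) -> list[Optional[str]]:
    """Resolve a territory's layer IDs to names, in layer-one-to-four order.

    Each layer ID is looked up together with the territory's project_id,
    so the same ID in two projects can map to different names.
    Layers with no matching name come back as None.
    """
    project_id, layer_ids = territories[territory_id]
    return [layer_names.get((project_id, layer_id)) for layer_id in layer_ids]


def enrich_store(
    store: dict[str, Any], territories: Territories, layer_names: LayerNames
) -> dict[str, Any]:
    """Return a copy of a store row with territory_layer_1..N columns added.

    Assumes the store row has a (hypothetical) territory_id column.
    """
    enriched = dict(store)
    path = territory_path(store["territory_id"], territories, layer_names)
    for level, name in enumerate(path, start=1):
        enriched[f"territory_layer_{level}"] = name
    return enriched
```

Storing the layer names as plain columns on DimStore lets queries group by any territory level without joining the hierarchy masters at query time.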
Best practices for ClickHouse and Generative BI

Use a curated warehouse model, not the raw SQL Server layout

Keep raw source tables as landing tables.
Build cleaned dimension tables and cleaned facts in ClickHouse.
Do not rely on ClickHouse to enforce foreign keys (it does not); use ETL validation and metadata instead.
Standardize keys and naming

Normalize EmpID / employee_id / emp_id to a single warehouse key.
Normalize StoreId / store_id / Unique_Store_ID.
Likewise normalize ProductId / product_id and the channel_id, chain_id and storetype_id columns.
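The Python sketch below renames the differing source columns to one warehouse name during ETL. The canonical names (store_id, employee_id, product_id) are assumptions. Unique_Store_ID is mapped to store_id only on the assumption that it holds the same values as StoreId/store_id, which should be verified first. The function raises an error if two source columns in one row would land on the same name.

```python
# Source column name -> warehouse column name (assumed canonical names).
# Unique_Store_ID -> store_id assumes it carries the same values; verify first.
CANONICAL_COLUMNS: dict[str, str] = {
    "StoreId": "store_id",
    "store_id": "store_id",
    "Unique_Store_ID": "store_id",
    "EmpID": "employee_id",
    "employee_id": "employee_id",
    "emp_id": "employee_id",
    "ProductId": "product_id",
    "product_id": "product_id",
}


def normalize_columns(row: dict[str, object]) -> dict[str, object]:
    """Rename known key columns to their warehouse names; keep others as-is."""
    out: dict[str, object] = {}
    for name, value in row.items():
        target = CANONICAL_COLUMNS.get(name, name)
        if target in out:
            raise ValueError(f"columns collide on {target!r}; check {name!r}")
        out[target] = value
    return out
```

Keeping the map as data, rather than as per-table code, also gives you the metadata a Generative BI layer can use to explain which source column a warehouse key came from.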
Choose consistent types

Many facts use int while masters use bigint; choose one type per key in the warehouse and convert consistently, so joins compare like with like.
In ClickHouse, prefer UInt32 or UInt64 based on the value range (unsigned types only work if the IDs are never negative).
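The choice between UInt32 and UInt64 can be driven by the observed range of each key column. The Python sketch below picks a type from that range. It assumes you can fetch the column's min and max from the source. The growth_factor default of 10 is an arbitrary assumption, meant to leave room for IDs issued after the first load. The function falls back to signed types when negatives occur, because ClickHouse unsigned types cannot hold them.

```python
UINT32_MAX = 2**32 - 1
INT32_MIN, INT32_MAX = -(2**31), 2**31 - 1


def clickhouse_key_type(min_value: int, max_value: int, growth_factor: int = 10) -> str:
    """Pick the narrowest ClickHouse integer type for an ID column.

    The observed maximum is multiplied by growth_factor before checking
    the 32-bit limits. Sources are SQL Server int/bigint columns, so the
    64-bit fallbacks always hold the actual values.
    """
    if min_value > max_value:
        raise ValueError("min_value must not exceed max_value")
    projected_max = max_value * growth_factor
    if min_value >= 0:
        return "UInt32" if projected_max <= UINT32_MAX else "UInt64"
    if min_value >= INT32_MIN and projected_max <= INT32_MAX:
        return "Int32"
    return "Int64"
```

Apply the result to every column that holds the same key, for example all store_id columns in the dimension and the facts, so the types stay consistent across joins.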
Partition and sort facts by date

Partition each fact table by its event date (visit_date, login_date, audit_date), usually by month, e.g. PARTITION BY toYYYYMM(visit_date). Partitioning on the raw date creates one partition per day, which is usually too granular.
ORDER BY should start with the keys most often used in filters and joins, for example:

ORDER BY (project_id, store_id, visit_date)

or ORDER BY (project_id, employee_id, visit_date)
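The Python sketch below combines the partitioning and sorting advice by rendering a ClickHouse MergeTree CREATE TABLE statement. The function name, the fact_promotion table name, and the column types in the example are assumptions for illustration. Only the PARTITION BY and ORDER BY pattern comes from the text above.

```python
def fact_table_ddl(
    table: str,
    columns: list[tuple[str, str]],
    date_column: str,
    order_by: list[str],
) -> str:
    """Render a MergeTree CREATE TABLE partitioned by month of date_column."""
    names = {name for name, _ in columns}
    missing = [c for c in [date_column, *order_by] if c not in names]
    if missing:
        raise ValueError(f"unknown columns: {missing}")
    column_sql = ",\n".join(f"    {name} {ch_type}" for name, ch_type in columns)
    return (
        f"CREATE TABLE {table}\n(\n{column_sql}\n)\n"
        "ENGINE = MergeTree\n"
        f"PARTITION BY toYYYYMM({date_column})\n"
        f"ORDER BY ({', '.join(order_by)})"
    )


if __name__ == "__main__":
    # Hypothetical FactPromotion layout; adjust names and types to the real data.
    print(fact_table_ddl(
        "fact_promotion",
        [
            ("project_id", "UInt32"),
            ("store_id", "UInt32"),
            ("employee_id", "UInt32"),
            ("promo_definition_id", "UInt32"),
            ("visit_date", "Date"),
            ("promotion_status", "LowCardinality(String)"),
        ],
        date_column="visit_date",
        order_by=["project_id", "store_id", "visit_date"],
    ))
```

With monthly partitions, a query filtered on a date range reads only the matching months. The (project_id, store_id) prefix of the sort key lets ClickHouse skip data when filtering on those keys.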
Keep dimension tables narrow

Dimension tables like DimStore, DimEmployee, DimProduct, DimVisibilityDefinition, DimDisplay should be small and stable.
Fact tables should contain foreign keys to dims plus measures.
For Generative BI

A clean, consistent schema is critical, because the model infers meaning from table names, column names and their labels.
Use descriptive dimension columns: store name, region, employee name, product name, visibility name, reason text, etc.
Avoid raw code-only facts; enrich them with lookup labels in ETL or views.
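The Python sketch below adds readable labels next to code-only columns during ETL. A ClickHouse view with LEFT JOINs to the dimension tables would do the same job. The function name, the lookup layout, and the "Unknown" fallback are assumptions. The codes are kept next to the labels so joins still work on keys.

```python
from collections.abc import Mapping

UNKNOWN = "Unknown"


def add_labels(
    fact_row: Mapping[str, object],
    lookups: Mapping[str, tuple[str, Mapping[object, str]]],
) -> dict[str, object]:
    """Return a copy of fact_row with one label column per code column.

    `lookups` maps a code column (e.g. "reason_id") to a pair of
    (label column name, code -> label dict). Codes missing from the
    dimension get UNKNOWN instead of NULL, so they still show up as a
    group when a question is answered over the data.
    """
    enriched = dict(fact_row)
    for code_column, (label_column, labels) in lookups.items():
        code = fact_row.get(code_column)
        enriched[label_column] = labels.get(code, UNKNOWN)
    return enriched
```

The label dicts would be built from DimStore, DimReason and the other dimensions, for example `{"reason_id": ("reason", reason_by_id)}`, where `reason_by_id` is a hypothetical dict loaded from DimReason.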
Practical next step

I recommend this immediate design:

DimStore(store_id, project_id, store_name, region, state, city, channel, distributor, store_type, ...)
DimEmployee(employee_id, project_id, employee_name, manager_id, role, channel_id, ...)
DimProduct(product_id, category, brand, product_name, mrp, ...)
DimVisibility(VisibilityDefinitionid, VisibilityDefinitionName)
DimReason(ReasonId, Reason)
FactSales(StoreId, EmpID, ProductId, ChannelId, VisitDate, Sale, Value, ...)
FactPromotion(store_id, employee_id, promo_definition_id, visit_date, promotion_status, ...)
FactPaidVisibility(...), FactPaidVisibilityCompliance(...), FactCoverage(...), FactAttendance(...), FactLogin(...), FactJourneyPlan(...), FactContactConversion(...)