Data Team Organization

The Three Pillars

Data Engineering owns the infrastructure. They build and maintain ingestion pipelines, data warehouses, streaming systems, and orchestration (Airflow, Dagster, Prefect). Their output is reliable, well-structured data that other teams can use. Think of them as the platform team for data.

Analytics Engineering owns the transformation and modeling layer. They take raw data and turn it into clean, tested, documented datasets. dbt has become the standard tool here. A good analytics engineer knows SQL deeply, understands the business domain, and writes data models with the same rigor a software engineer brings to application code.

Data Science owns experimentation, ML models, and advanced analytics. They need clean, reliable data as input (which is why they depend on the other two groups). Data scientists who spend 80% of their time cleaning data are a sign that your data engineering and analytics engineering functions are understaffed.

Centralized vs Embedded

At 3-5 data people, centralized is the only thing that makes sense. One team, one backlog, one set of standards.

Between 8 and 15 data people, the centralized model starts to crack. Request queues grow long, and product teams complain that data work takes weeks. This is when companies start embedding data people into product teams while keeping a small central platform group that owns shared infrastructure and governance.

Beyond 20 data professionals, you almost certainly need a hub-and-spoke model. The hub maintains the warehouse, shared tooling, and data quality standards. The spokes are embedded analysts and analytics engineers who serve specific product areas.
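
As a rough illustration, the headcount thresholds above can be written down as a small lookup. This is only a sketch of the rules of thumb in this section, not a formula: the function name and the model labels are made up for the example, and the function returns None for team sizes the text does not address (fewer than 3, 6-7, and 16-20) rather than guessing.

```python
from typing import Optional


def suggested_model(headcount: int) -> Optional[str]:
    """Map a data-team headcount to the model this section suggests.

    Hypothetical helper for illustration only. Returns None where the
    section gives no explicit guidance.
    """
    if headcount < 0:
        raise ValueError("headcount must be non-negative")
    if 3 <= headcount <= 5:
        # One team, one backlog, one set of standards.
        return "centralized"
    if 8 <= headcount <= 15:
        # Embedded data people plus a small central platform group.
        return "embedded with central platform group"
    if headcount > 20:
        # Hub owns warehouse, tooling, and standards; spokes serve product areas.
        return "hub-and-spoke"
    # Sizes the text does not cover are left as a judgment call.
    return None


if __name__ == "__main__":
    for n in (4, 7, 12, 18, 25):
        print(n, suggested_model(n))
```

The None cases are deliberate: in the gaps between the stated ranges, the right model depends on how long request queues are and how product teams are experiencing the wait, not on headcount alone.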

When to Split Into Specialized Teams

The signal is usually workload imbalance. If your data engineers are constantly fighting fires on pipelines while data scientists wait around for clean data, you need to formalize the separation. Similarly, if analysts are writing the same complex SQL transformations over and over, that's a sign you need dedicated analytics engineers building reusable data models.

Reporting Structure

There's no single right answer, but the trend among companies that do data well (Airbnb, Spotify, Netflix) is toward a standalone data organization led by a Head of Data or CDO, with dotted-line relationships to the product and engineering teams they serve. This gives data professionals a clear career path and prevents them from being treated as a service desk.
