Unity Catalog
Overview
Skypoint AI leverages Unity Catalog for centralized data governance, ensuring secure management of data and AI assets across Lakehouse SQL and multiple workspaces.
It provides a structured framework for organizing, controlling, and monitoring data access, enhancing consistency, security, transparency, operational efficiency, and compliance across the organization.
Key features of Unity Catalog:
- Centralized Access Control: Administrators can define data access policies in one place, ensuring uniform enforcement across all workspaces, simplifying permission management, and enhancing security.
- Standards-Compliant Security Model: Utilizing standard ANSI SQL, Unity Catalog allows for familiar and straightforward permission management at various levels, including catalogs, schemas, tables, and views.
- Built-in Auditing and Data Lineage: The system automatically captures detailed audit logs and tracks data asset creation and usage across languages and platforms, enhancing compliance and troubleshooting.
- Data Discovery: Users can tag, document, and search for data assets easily, facilitating efficient data discovery and collaboration among teams.
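As an illustration of the ANSI SQL permission model described above, the following minimal sketch renders GRANT statements at the catalog, schema, and table levels. It only builds strings and does not connect to a workspace. The principal name analysts and the combination tenant_instance_main.gold.metrics are assumptions chosen for illustration, not values taken from a real Skypoint AI deployment.

```python
# Sketch: render Unity Catalog style GRANT statements as plain strings.
# The principal "analysts" and the object names are hypothetical examples.

def quote(identifier: str) -> str:
    """Wrap one identifier in backticks, escaping any embedded backtick."""
    return "`" + identifier.replace("`", "``") + "`"


def grant(privilege: str, securable_type: str, name: str, principal: str) -> str:
    """Build: GRANT <privilege> ON <securable_type> <name> TO <principal>."""
    # Quote each dot-separated part of the name on its own.
    qualified = ".".join(quote(part) for part in name.split("."))
    return f"GRANT {privilege} ON {securable_type} {qualified} TO {quote(principal)}"


statements = [
    # A reader needs a usage privilege on each parent before reading a table.
    grant("USE CATALOG", "CATALOG", "tenant_instance_main", "analysts"),
    grant("USE SCHEMA", "SCHEMA", "tenant_instance_main.gold", "analysts"),
    grant("SELECT", "TABLE", "tenant_instance_main.gold.metrics", "analysts"),
]

for statement in statements:
    print(statement)
```

The three statements mirror the hierarchy: read access to a table is only useful when the principal can also use the catalog and schema that contain it.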
Skypoint AI - Unity Catalog model
In Skypoint AI Lakehouse, the Unity Catalog metastore follows a three-tier structure, where catalogs contain schemas that organize data and AI assets such as tables and models. The metastore, catalogs, and schemas help structure and manage stored data while enabling seamless access and manipulation. The object model flows from the metastore to the table as follows:
- Metastore: The metastore (skypoint_metastore) is a central repository that stores metadata about the data stored in Lakehouse SQL. Each metastore exposes a three-level namespace (catalog.schema.table-volumes-views-functions-aimodels) that organizes your data.
Level One:
- Catalogs: The first layer of Unity Catalog’s three-level namespace, which is used to organize your data assets. Each catalog can have any number of schemas. For example, tenant_instance_main.
Level Two:
- Schemas: The second layer of the object hierarchy. A schema contains a set of related objects, such as tables, volumes, views, functions, and AI models, and can hold any number of each. For example, bronze, silver, and gold.
Level Three:
- Tables: The lowest level in the object hierarchy. A table is a collection of data organized into rows and columns. For example, profiles, audience, metrics, and predictions.
- Volumes: Volumes store unstructured, non-tabular data in cloud storage. A volume is either managed, where Unity Catalog controls the data lifecycle, or external, where Unity Catalog governs access from within Azure Databricks but not access by external clients.
- Views: Views are stored queries that retrieve data from one or more tables.
- Functions: Functions are predefined units of logic that return either a single scalar value or a set of rows.
- AI Models: AI models, integrated with MLflow, are registered in Unity Catalog as callable functions.
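The three-level namespace can be sketched as a small value type that builds and parses fully qualified names. This is a simplified illustration: it assumes plain identifiers without dots or backtick quoting, and the pairing of schemas with tables (silver with profiles, gold with predictions) is assumed for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ObjectName:
    """A fully qualified name of the form catalog.schema.object."""

    catalog: str
    schema: str
    name: str

    @classmethod
    def parse(cls, full_name: str) -> "ObjectName":
        """Split 'catalog.schema.object' into its three levels."""
        parts = full_name.split(".")
        if len(parts) != 3 or not all(parts):
            raise ValueError(f"expected catalog.schema.object, got {full_name!r}")
        return cls(*parts)

    def __str__(self) -> str:
        return f"{self.catalog}.{self.schema}.{self.name}"


# Hypothetical placement of the example tables in the example schemas.
profiles = ObjectName("tenant_instance_main", "silver", "profiles")
print(profiles)

parsed = ObjectName.parse("tenant_instance_main.gold.predictions")
print(parsed.schema)
```

A name with fewer than three parts, such as gold.metrics, is rejected by parse because it does not identify a catalog.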
Workspace-Catalog Binding & Privileges in Skypoint AI
Unity Catalog in Skypoint AI supports workspace-catalog binding, which controls the workspaces from which a catalog can be accessed. By default, a catalog is shared with every workspace attached to the metastore. Admins can instead bind a catalog to specific workspaces for data isolation, security, and compliance, so that its data is processed only in designated environments.
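The binding rule can be pictured with a small conceptual model. This is not the Databricks API. The class, its fields, and the workspace names prod_workspace and dev_workspace are hypothetical and serve only to show the access decision.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class Catalog:
    """Conceptual model of a catalog and its workspace bindings."""

    name: str
    # False: shared with every workspace attached to the metastore (the default).
    # True: reachable only from the workspaces listed in bound_workspaces.
    isolated: bool = False
    bound_workspaces: Set[str] = field(default_factory=set)

    def accessible_from(self, workspace: str) -> bool:
        """Return True if the catalog can be used from the given workspace."""
        if not self.isolated:
            return True
        return workspace in self.bound_workspaces


shared = Catalog("tenant_instance_main")
restricted = Catalog(
    "tenant_instance_main",
    isolated=True,
    bound_workspaces={"prod_workspace"},
)

print(shared.accessible_from("dev_workspace"))
print(restricted.accessible_from("prod_workspace"))
print(restricted.accessible_from("dev_workspace"))
```

In this model the binding decides only whether a workspace can reach the catalog at all. Privileges on the catalog and its child objects still apply to each user on top of it.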
When a workspace is enabled for Unity Catalog, it is automatically attached to a metastore, and a workspace catalog is created. Workspace admins are the default owners, managing privileges for the catalog and its child objects. All workspace users receive the USE CATALOG privilege, along with USE SCHEMA, CREATE TABLE, CREATE VOLUME, CREATE MODEL, CREATE FUNCTION, and CREATE MATERIALIZED VIEW privileges on the default schema within the catalog.
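One way to audit these defaults is to compare the privileges actually granted on the default schema against the expected set. The sketch below assumes the granted privileges have already been collected as a list of strings, for instance copied from the output of a SHOW GRANTS query. The observed list here is invented for the example.

```python
# Expected privileges for workspace users on the default schema,
# as listed in the paragraph above.
DEFAULT_SCHEMA_PRIVILEGES = frozenset({
    "USE SCHEMA",
    "CREATE TABLE",
    "CREATE VOLUME",
    "CREATE MODEL",
    "CREATE FUNCTION",
    "CREATE MATERIALIZED VIEW",
})


def missing_privileges(granted):
    """Return the expected default privileges absent from `granted`, sorted."""
    normalized = {privilege.strip().upper() for privilege in granted}
    return sorted(DEFAULT_SCHEMA_PRIVILEGES - normalized)


# Hypothetical privileges observed for workspace users on the default schema.
observed = ["USE SCHEMA", "create table", "CREATE FUNCTION"]
print(missing_privileges(observed))
# prints ['CREATE MATERIALIZED VIEW', 'CREATE MODEL', 'CREATE VOLUME']
```

An empty result means every default privilege is present. The USE CATALOG privilege is granted on the catalog rather than the schema, so it would be checked separately.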
This setup ensures structured governance, controlled access, and seamless data management within Skypoint AI.