Integration with Databricks
Introduction
Outdoo’s platform analyzes your sales call data (e.g. recorded customer meetings) in real time and your deal data (pipeline and CRM opportunities) on a scheduled interval. Integrating this analyzed Outdoo data into your own data warehouse – whether Snowflake, Databricks, or Amazon Redshift – allows RevOps and data teams to unify conversation intelligence with other business data.
This empowers deeper analysis of deal trends, rep performance, customer behavior, and pipeline risks within your existing BI tools and workflows. In short, connecting Outdoo to your warehouse enables you to correlate call insights and AI-derived deal metrics with CRM, product usage, and revenue data for more informed decision-making across your go-to-market operations.
Integration Overview and Supported Methods
Outdoo provides direct integration options to sync data to your warehouse without requiring third-party ETL tools. The following integration methods are supported:
Direct Warehouse Connection: Outdoo connects directly to your warehouse – for Databricks, a SQL Warehouse endpoint over JDBC/HTTP – and writes analytics-ready tables on an ongoing basis, with no intermediate files or separate ETL jobs. This is the primary method described in the platform-specific sections below.
Cloud Storage Stage (S3 or ADLS): As an alternative, Outdoo can export data files (e.g. in JSON format) to a cloud storage bucket that you own (such as Amazon S3). Your warehouse then ingests from this “stage” location – for example, Snowflake and Redshift can bulk-load from S3, and Databricks can auto-load files from cloud storage (an ingestion sketch follows this list). This method is useful if direct database connections are not possible, but it may introduce slight delays (e.g. daily batch file drops).
REST API Pull: Outdoo also offers RESTful APIs to retrieve call and deal data for custom integrations. Advanced customers can use these to script their own data pulls (e.g. via Snowflake External Functions or Databricks notebooks; a hedged sketch also follows this list). This requires more custom development, but provides flexibility if the methods above don’t meet your requirements.
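If you use the cloud storage stage with Databricks, Auto Loader can pick up the dropped files and land them in a Delta table. The sketch below is a minimal example under a few assumptions: Outdoo is configured to drop JSON files under an S3 prefix you own, and the bucket paths, checkpoint locations, and target table name are placeholders rather than values supplied by Outdoo.

```python
# Minimal Auto Loader sketch for ingesting Outdoo's staged JSON exports.
# The S3 paths and target table name below are placeholders you control.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()  # already defined in Databricks notebooks

calls_stream = (
    spark.readStream
    .format("cloudFiles")                       # Databricks Auto Loader
    .option("cloudFiles.format", "json")        # Outdoo exports JSON files
    .option("cloudFiles.schemaLocation", "s3://your-bucket/outdoo/_schemas/calls")
    .load("s3://your-bucket/outdoo/calls/")     # staging prefix Outdoo writes to
)

query = (
    calls_stream.writeStream
    .option("checkpointLocation", "s3://your-bucket/outdoo/_checkpoints/calls")
    .trigger(availableNow=True)                 # process any new files, then stop
    .toTable("outdoo_data.calls_raw")           # lands as a Delta table
)
query.awaitTermination()
```

For the REST API pull, Outdoo’s endpoint paths and response shapes are not documented in this section, so the following is purely illustrative: the base URL, route, query parameter, and response fields are assumptions, not Outdoo’s actual API.

```python
# Illustrative REST pull: the endpoint, parameter names, and response shape
# below are hypothetical placeholders, not Outdoo's documented API.
import requests

BASE_URL = "https://api.outdoo.example.com/v1"  # placeholder base URL
API_KEY = "YOUR_OUTDOO_API_KEY"                 # issued by Outdoo (assumption)

def fetch_calls(updated_since: str) -> list[dict]:
    """Fetch analyzed call records updated since an ISO-8601 timestamp."""
    resp = requests.get(
        f"{BASE_URL}/calls",
        headers={"Authorization": f"Bearer {API_KEY}"},
        params={"updated_since": updated_since},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json().get("results", [])
```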
Each method emphasizes security and minimal maintenance. Next, we detail how to set up integration for each target warehouse, including authentication, connection configuration, table schema, and workflow examples.
Security and Authentication Setup
Before diving into platform-specific steps, ensure you’ve prepared secure access for Outdoo:
Create Dedicated Credentials: It’s recommended to provision a dedicated user/role in your data warehouse for Outdoo. For example, create a Snowflake user with a custom role, a Redshift database user (or IAM role), or a Databricks personal access token with limited scope (or, better, a service principal – see API Keys & Tokens below). Grant only the minimum privileges needed (e.g. the ability to create and write to Outdoo-specific schemas; a Databricks grant sketch follows this list). This principle of least privilege protects your warehouse while still allowing Outdoo to deliver data.
Network Access: If your warehouse is behind a firewall or VPC, you may need to whitelist Outdoo’s egress IP addresses or use a private connectivity method. (For instance, Redshift users should allow inbound access from Outdoo’s known static IP in their cluster’s security group settings.) Ensure any required VPC peering or security group rules are in place so Outdoo’s service can reach your warehouse.
API Keys & Tokens: When using REST API or Databricks connections, generate secure tokens. For Databricks, prefer using a Service Principal + token rather than a personal token for long-term integrations. Outdoo will securely store any tokens or secrets you provide.
Certificates and Encryption: All integrations use encrypted channels (HTTPS for APIs, TLS for JDBC). If you connect over JDBC, you may need to provide SSL certificates or verify that Outdoo’s connection meets your encryption standards.
Auditing and Revocation: Treat the Outdoo integration credentials as you would any service account: monitor their activity and rotate keys/tokens periodically. Outdoo will only access your warehouse for data sync operations, but you should revoke credentials if the integration is ever disabled.
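For the Databricks setup covered below, a least-privilege grant set for the dedicated Outdoo identity might look like the sketch that follows. This is a minimal sketch assuming a Unity Catalog-enabled workspace; the service principal name, catalog, and schema are placeholders for whatever you actually provision.

```python
# Least-privilege Unity Catalog grants for a dedicated Outdoo identity.
# Run in a Databricks notebook (where `spark` is predefined) or paste the SQL
# into the SQL editor. The principal, catalog, and schema names are placeholders.
principal = "`outdoo-service-principal`"   # hypothetical service principal name
catalog = "main"
schema = "outdoo_data"

spark.sql(f"CREATE SCHEMA IF NOT EXISTS {catalog}.{schema}")
spark.sql(f"GRANT USE CATALOG ON CATALOG {catalog} TO {principal}")
spark.sql(f"GRANT USE SCHEMA ON SCHEMA {catalog}.{schema} TO {principal}")
spark.sql(f"GRANT CREATE TABLE ON SCHEMA {catalog}.{schema} TO {principal}")
spark.sql(f"GRANT SELECT, MODIFY ON SCHEMA {catalog}.{schema} TO {principal}")
```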
With security considerations in mind, let’s proceed to the setup for each platform.
Databricks Integration
For Databricks (Lakehouse on AWS or Azure), Outdoo integrates via a direct write into your Databricks environment using the SQL Warehouse interface. In practice, Outdoo connects to a Databricks SQL Warehouse endpoint (formerly known as a SQL endpoint) over JDBC or HTTP and creates the data tables as Delta tables. This eliminates the need for intermediate files or separate ETL jobs – Outdoo pushes the data straight into your lakehouse in an “analytics-ready” form.
Setup Steps:
- Prepare a Databricks SQL Warehouse: In your Databricks workspace, create a new SQL Warehouse if you don’t already have one dedicated to integrations. This can be done from the SQL Warehouses page by clicking Create SQL Warehouse and choosing a size. If your workspace uses Unity Catalog, using it for this integration is recommended for governance – Outdoo can write to a specific catalog and schema you designate. Ensure the warehouse is started and note its Server Hostname and HTTP Path from the warehouse’s Connection details.
- Generate Access Token: Outdoo will need an authentication token to connect. The simplest way is to create a Personal Access Token in Databricks. In your Databricks user settings, generate a new token (with adequate lifetime) and copy it securely. For better security, you can create a dedicated Service Principal for Outdoo, grant it workspace and SQL access, and then generate a token for that principal. This avoids tying the integration to a personal user account. Ensure that the token you use has permission to access the target warehouse and write to the target catalog/schema (see the next step).
- Set Up Target Schema: Decide on a target Catalog and Schema in Databricks where Outdoo should write the data. For example, you might use the main catalog (or a dedicated catalog for third-party data) and create a schema named outdoo_data. If using Unity Catalog, grant the service principal or user the appropriate privileges on the catalog and schema (e.g. USE CATALOG on the catalog, plus USE SCHEMA and CREATE TABLE on the schema – see the grant sketch in the security section above). In particular, ensure the Outdoo identity can CREATE TABLE in that schema, as Outdoo will create the tables on first sync.
- Provide Databricks Connection Info to Outdoo: In the Outdoo admin panel (Data Integration settings), select Databricks as a destination and enter the required fields: Server Hostname (from the SQL Warehouse connection details), HTTP Path (for the warehouse), Port (usually 443), Catalog and Schema name, and the Access Token you generated. Outdoo will validate these and attempt to connect. Click “Connect” to authorize the integration.
- Data Sync Process: Upon successful connection, Outdoo’s service will begin creating tables in the specified Databricks schema. It will typically create Delta tables for calls and deals (and any related tables, such as transcripts). Data then starts loading: Outdoo batch-inserts historical data first, which may involve writing via JDBC or the Databricks REST API for bulk loading. Under the hood, small batch files may be written to your DBFS storage as part of Delta ingestion, but this is managed by Databricks. After the initial load, Outdoo continuously pushes new call records to the calls table (e.g. after each call completes analysis) and updates the deals table on its scheduled interval (e.g. once per day with the latest pipeline info). The integration appends or merges new data appropriately – for example, if a deal record changes stage, Outdoo may update that row or add a historical record, depending on the schema design.
- Verify Tables in Databricks: You can query the Delta tables via Databricks SQL or in a notebook. For example, run SELECT count(*) FROM outdoo_data.calls; and confirm it matches the expected number of calls (a Python connectivity check is also sketched after these steps). The data should update automatically without manual intervention, and Outdoo may also provide a log or status in its UI indicating the last sync time and record counts.
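To confirm that the connection details and token work end to end, you can run a quick check yourself with the databricks-sql-connector package. The hostname, HTTP path, and token below are placeholders for the values gathered in the steps above; the call_datetime column assumes the schema described under Schema and Data Considerations.

```python
# Connectivity and row-count check using databricks-sql-connector
# (pip install databricks-sql-connector). All connection values are placeholders.
from databricks import sql

with sql.connect(
    server_hostname="dbc-xxxxxxxx-xxxx.cloud.databricks.com",  # Server Hostname
    http_path="/sql/1.0/warehouses/xxxxxxxxxxxxxxxx",          # HTTP Path
    access_token="dapiXXXXXXXXXXXXXXXX",                       # token shared with Outdoo
) as conn:
    with conn.cursor() as cur:
        cur.execute("SELECT COUNT(*) FROM outdoo_data.calls")
        print("calls rows:", cur.fetchone()[0])

        # Column name assumed from the schema notes below; verify against your tables.
        cur.execute("SELECT MAX(call_datetime) FROM outdoo_data.calls")
        print("latest call:", cur.fetchone()[0])
```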
Schema and Data Considerations: The schema of Outdoo data on Databricks mirrors what is available in Outdoo’s own system. The calls table will likely contain columns such as call_id, call_datetime, participant_names, duration, sentiment_score, talk_ratio, keywords, etc., and the deals table will contain deal_id, deal_name, owner, stage, amount, close_date, last_call_date, deal_risk_score, etc. If Outdoo tracks historical changes (for example, deal stage history or forecast changes), additional tables may be created, such as deal_history or forecast_submissions. Check Outdoo’s documentation for a database reference covering all available tables and columns. In Databricks these are Delta Lake tables, which support ACID updates, so Outdoo may perform upserts (MERGE) for certain tables (e.g. updating a deal’s information). It may also mark deleted records – for example, if a call or deal is deleted in Outdoo (or filtered out), the integration might use a soft-delete flag such as is_deleted, i.e. a column indicating the record is no longer active, allowing you to mirror those changes.

Example Workflow: With Outdoo data in Databricks, you can combine it with any other data in your Lakehouse. For example, your data science team might use a Python notebook to join the outdoo_data.calls table with product usage logs in your bronze layer to see whether higher call sentiment correlates with higher product adoption (a minimal notebook sketch follows). Your analytics team might build a dashboard in a BI tool connected to Databricks SQL that blends Outdoo’s conversation insights with financial data from Redshift or Snowflake (since Databricks can also query external sources). As another example, you could create a machine learning pipeline in Databricks ML that trains on Outdoo’s call transcripts (from the call_transcripts table, stored as a JSON field) combined with deal outcomes to predict deal success. Having Outdoo data in Databricks means these advanced use cases are possible without manual CSV exports – the data flows directly into the Lakehouse environment.
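As one concrete version of the sentiment-vs-adoption workflow above, the notebook sketch below joins Outdoo call data with a product usage table. It is a sketch under stated assumptions: the analytics.product_usage_daily table and its columns, the account_id column on calls, and the is_deleted soft-delete flag are hypothetical names for illustration; check the Outdoo schema reference for the actual columns in your deployment.

```python
# Notebook sketch: does higher call sentiment correlate with product adoption?
# outdoo_data.calls comes from the integration; analytics.product_usage_daily,
# account_id, and is_deleted are assumed names used for illustration only.
# `spark` is predefined in Databricks notebooks.
from pyspark.sql import functions as F

calls = (
    spark.table("outdoo_data.calls")
    .filter(F.col("is_deleted") == False)              # drop soft-deleted rows, if the flag exists
    .withColumn("call_date", F.to_date("call_datetime"))
)

usage = spark.table("analytics.product_usage_daily")   # hypothetical usage table

sentiment_vs_usage = (
    calls.join(
        usage,
        (calls.account_id == usage.account_id) & (calls.call_date == usage.usage_date),
    )
    .groupBy(usage.account_id)
    .agg(
        F.avg("sentiment_score").alias("avg_sentiment"),
        F.avg("active_users").alias("avg_active_users"),
    )
)

display(sentiment_vs_usage)   # display() is a Databricks notebook helper; use .show() elsewhere
```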