Considerations When Writing Custom Code (Beta)
To write custom code with the code extension feature in Data 360, follow these best practices and keep in mind these architectural constraints and Spark API limitations.
Code extension is a pilot or beta service that is subject to the Beta Services Terms at Agreements - Salesforce.com or a written Unified Beta Agreement if executed by Customer, and applicable terms in the Product Terms Directory. Use of this pilot or beta service is at the Customer's sole discretion.
| Editions |
|---|
| Available in: Developer, Enterprise, Performance, and Unlimited Editions. See Data 360 edition availability. |
Keep your custom code logic in DataFrame and Dataset operations. When you write custom code for batch data transforms, use DataFrame operations (such as filter(), groupBy(), agg(), join(), and withColumn()) rather than iterating over data row by row. These operations are optimized for server-side execution and provide the best performance in Data 360's execution environment.
Prefer supported Spark-native expressions and built-in functions that run on the server over client-side iteration and custom UDFs (user-defined functions). Use functions from pyspark.sql.functions and other Spark-native APIs in your custom code. This approach maximizes performance and ensures compatibility with Data 360's execution model.
Keep your transformations free of side effects. The same execution plan can be analyzed or optimized multiple times. Eliminating side effects ensures your code produces consistent results when run with the same input data.
Treat code extension like a separate API connection: not all Spark features are available in code extension, so test your code thoroughly in a sandbox before deploying to production.
In your custom code, log important steps in your transformation logic, handle errors gracefully, and provide meaningful error messages. Review execution logs in the Data 360 UI to troubleshoot issues.
For more information, see Write and Validate Custom Scripts (Beta).
These features are not supported in the Data 360 execution environment:
- Custom UDFs
- Spark Listeners
- Spark Extensions
- Full access to configuration options
Code extension is built on DataFrame and Dataset APIs. RDDs are not supported. Use DataFrame operations instead.
Many SparkContext-era patterns, such as custom accumulators, arbitrary driver-side callbacks, and some listeners, don't map cleanly to Data 360's execution model. Use DataFrame and Dataset APIs and supported patterns instead.
Operations such as calling a service per row, writing to a database inside a map operation, or incrementing counters can behave unexpectedly due to retries or replanning. Call external services in a controlled manner, outside of row-level transformations.
If downstream processes need data, write results to a Data 360 object (DLO or DMO) rather than returning them to the driver with collect(). Large result sets can cause memory issues and performance problems.
Data 360's execution environment has different performance characteristics from in-process Spark. Extra serialization and network overhead mean that chatty patterns (many tiny actions) can hurt performance more than in in-process Spark. Design your code for efficient batch operations. Performance when you run locally with the SDK can differ from performance when the same code runs in production in Data 360.
Even printSchema() or schema analysis can trigger resolution paths that surface issues, such as duplicate columns, earlier than you expect. These operations also have performance implications in the Data 360 execution environment.