Considerations When Writing Custom Chunking Functions

Review guardrails and constraints when writing custom chunking functions for code extension in Data 360.

Code extension isn’t currently supported in orgs that have BYOK enabled.

Follow these practices while implementing your chunking logic.

Keep payload/entrypoint.py easy to review and test. Place configuration defaults near the top, move reusable logic into helper functions, and keep the top-level function focused on request validation, chunk assembly, and response creation.
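One way to lay out such a file is sketched below. The function names, the `elements` and `text` request keys, and the `chunks` response shape are hypothetical placeholders for illustration, not the Data 360 contract; adapt them to the request and response structure your function actually receives.

```python
# payload/entrypoint.py (illustrative layout; all names are hypothetical)

# Configuration defaults live near the top so reviewers can find them quickly.
DEFAULT_MAX_CHARS = 500


def split_text(text, max_chars=DEFAULT_MAX_CHARS):
    """Reusable helper: split text into pieces of at most max_chars characters."""
    return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]


def validate_request(request):
    """Return the list of elements, or an empty list if the request is unusable."""
    elements = request.get("elements") if isinstance(request, dict) else None
    return elements if isinstance(elements, list) else []


def chunk(request):
    """Top-level function: validate the request, assemble chunks, build the response."""
    chunks = []
    for element in validate_request(request):
        text = element.get("text") if isinstance(element, dict) else None
        if not isinstance(text, str):
            continue  # Skip elements without usable text.
        for piece in split_text(text):
            chunks.append({"seq_no": len(chunks), "text": piece})
    return {"chunks": chunks}
```

Keeping the splitting logic in `split_text` means it can be unit tested without building a full request.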

List any Python libraries your function needs in requirements.txt (or the equivalent in your payload). Use only pip-installable dependencies. Libraries that require operating system-level installations or system configuration aren’t supported in the Data 360 execution environment.

Handle missing or malformed input elements gracefully and avoid uncaught exceptions that can fail the entire invocation.
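A minimal sketch of defensive input handling follows. It assumes, purely for illustration, that each element is a dictionary with a `text` string; the helper names are hypothetical. Bad elements are skipped and counted instead of raising.

```python
import logging

logger = logging.getLogger("chunking")


def extract_text(element):
    """Return the element's text, or None if the element is missing or malformed."""
    if not isinstance(element, dict):
        return None
    text = element.get("text")
    if not isinstance(text, str) or not text.strip():
        return None
    return text


def safe_chunks(elements):
    """Chunk each usable element; skip bad ones rather than raising."""
    if not isinstance(elements, list):
        elements = []
    chunks = []
    skipped = 0
    for element in elements:
        text = extract_text(element)
        if text is None:
            skipped += 1
            continue
        chunks.append({"text": text.strip()})
    if skipped:
        # Log a count only, never the offending content.
        logger.warning("skipped %d malformed elements", skipped)
    return chunks
```

With this approach, one malformed element in a batch costs only that element rather than the whole invocation.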

For the same input, return stable chunking output so that indexing behavior is predictable across runs. Avoid nondeterministic logic, such as random ordering or unstable sequence assignment, that can produce different chunk boundaries for identical input.
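As a sketch of deterministic chunking, the example below derives both the boundaries and the sequence numbers from the input alone. The function names and the `seq_no` field are hypothetical. The point is to avoid sources of variation such as random values, timestamps, or iteration over unordered sets.

```python
def assign_sequence(pieces):
    """Number chunks by their position in the input, which is stable across runs."""
    return [{"seq_no": i, "text": piece} for i, piece in enumerate(pieces)]


def chunk_paragraphs(text):
    """Split on blank lines; the same text always yields the same boundaries."""
    pieces = [p.strip() for p in text.split("\n\n") if p.strip()]
    return assign_sequence(pieces)
```

Calling `chunk_paragraphs` twice on the same text returns equal lists, which makes a simple regression test possible.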

Execution logs may be visible to users with access to the Code Extension logs data lake object (DLO). Do not log personally identifiable information (PII), credentials, or other sensitive data from your chunking logic. Prefer structured logger calls over ad hoc debug prints so that you can control log level and output.
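The following sketch uses Python's standard `logging` module to record counts and sizes without writing any document text to the log. The logger name and the `log_summary` helper are hypothetical, and how log levels are configured in the execution environment is not covered here.

```python
import logging

logger = logging.getLogger("chunking")


def log_summary(elements, chunks):
    """Log counts and sizes only; never the text itself."""
    logger.info(
        "chunked elements=%d chunks=%d total_chars=%d",
        len(elements),
        len(chunks),
        sum(len(c.get("text", "")) for c in chunks),
    )
```

Because the message carries only aggregate numbers, it stays safe to emit even when the input contains PII.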

Be aware of these constraints for custom chunking functions.

Chunking functions run in a request-response model. They receive a batch of document elements and return a list of chunks. Data 360 can invoke your function multiple times in a single search index run depending on batching. Your function must not read from or write to DLOs or data model objects (DMOs) directly. Use only the request input and return the required output structure. API calls from inside the function aren’t supported.

Keep your deployment package under the applicable size limit (for example, 5 GB). Include only the code and dependencies required for chunking. Unnecessary files increase upload time and can affect startup performance.
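To check the package size before you upload, a small local helper like the one below can total the files in the payload directory. This is a sketch that you run on your own machine, not part of the chunking function; the helper name is hypothetical, and the limit you compare against is whatever applies to your org.

```python
import os


def directory_size_bytes(root):
    """Sum the sizes of all regular files under root, skipping symbolic links."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):
                total += os.path.getsize(path)
    return total
```

For example, `directory_size_bytes("payload") / 1024**3` gives the size in GiB for comparison with the limit.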

Avoid these common mistakes.

  • Using nondeterministic logic that produces different chunk boundaries for identical input.
  • Calling APIs from inside the function.
  • Logging sensitive data, such as PII, credentials, or other restricted content.
  • Including unnecessary files or oversized dependencies in the deployment package.
  • Changing payload/config.json unintentionally.
  • Relying on mutable global state across invocations instead of processing each request independently.
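The last point is easy to miss, so here is a sketch that contrasts the two approaches. Both functions and the `seq_no` field are hypothetical. The first keeps a module-level counter, so its output depends on how many requests the process has already served; the second keeps all state local to the call.

```python
# Anti-pattern: a module-level counter leaks across invocations.
_counter = 0


def chunk_with_global(texts):
    """Sequence numbers continue from earlier calls, so output is not repeatable."""
    global _counter
    out = []
    for text in texts:
        out.append({"seq_no": _counter, "text": text})
        _counter += 1
    return out


# Preferred: each request is processed independently.
def chunk_stateless(texts):
    """Sequence numbers restart at 0 on every call."""
    return [{"seq_no": i, "text": text} for i, text in enumerate(texts)]
```

Two identical calls to `chunk_stateless` return equal results, while two identical calls to `chunk_with_global` don't.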