Sending Data Differences for Fields of Updated Records
Differences are computed for each line in the text value. The diff algorithm breaks the field value into lines by using the line breaks found in the value.
If sending the diff for an update to a large text field does not reduce the field size, the entire value is sent instead. The diff value is not sent under the following conditions.
- The length of the field value is less than 1,000 characters.
- The difference between the old and new values is greater than 50% in length.
- More than 25% of the total number of lines in the old and new values are changed.
- The diff’s length is greater than the length of the new value.
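The conditions above can be sketched as a single check. This is only an illustration, not the platform's implementation: the function name `should_send_diff` is hypothetical, and the way each measure is computed is an assumption. The sketch assumes that the length difference is measured relative to the old value and that changed lines are counted with Python's `difflib.SequenceMatcher`. Only the thresholds come from the list above.

```python
import difflib

# Thresholds taken from the conditions listed above.
MIN_LENGTH = 1000          # values shorter than this are sent in full
MAX_LENGTH_CHANGE = 0.50   # maximum relative change in length
MAX_CHANGED_LINES = 0.25   # maximum fraction of changed lines


def should_send_diff(old: str, new: str, diff_text: str) -> bool:
    """Return True if a diff, rather than the full value, would be sent.

    Hypothetical helper: how each condition is measured is an assumption.
    """
    # Condition 1: the field value is too short to benefit from a diff.
    if len(new) < MIN_LENGTH:
        return False

    # Condition 2 (assumed to be relative to the old value's length).
    if abs(len(new) - len(old)) > MAX_LENGTH_CHANGE * len(old):
        return False

    # Condition 3: count lines that are not part of an "equal" block,
    # out of the total number of lines in both values.
    old_lines = old.splitlines()
    new_lines = new.splitlines()
    matcher = difflib.SequenceMatcher(a=old_lines, b=new_lines, autojunk=False)
    changed = 0
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            changed += (i2 - i1) + (j2 - j1)
    total = len(old_lines) + len(new_lines)
    if total and changed > MAX_CHANGED_LINES * total:
        return False

    # Condition 4: the diff must not be longer than the new value.
    if len(diff_text) > len(new):
        return False

    return True
```

With these assumptions, a one-line edit to a value of a few thousand characters passes the check, while a rewrite that touches every line does not.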
For more information about the unified diff format and the diff utility, see the Diff Utility Wikipedia article.
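To get a feel for the unified diff format, you can generate a line-based diff locally. The following sketch uses Python's standard `difflib` module. The function name `make_unified_diff` and the `old` and `new` file labels are illustrative, and the exact headers and context in a diff that you receive may differ from what `difflib` produces.

```python
import difflib


def make_unified_diff(old: str, new: str) -> str:
    """Build a line-based unified diff between two text values."""
    # Split on the line breaks found in each value, keeping the line endings
    # so that the diff lines can be joined back together unchanged.
    old_lines = old.splitlines(keepends=True)
    new_lines = new.splitlines(keepends=True)
    diff_lines = difflib.unified_diff(
        old_lines, new_lines, fromfile="old", tofile="new"
    )
    return "".join(diff_lines)


if __name__ == "__main__":
    print(make_unified_diff("a\nb\nc\n", "a\nB\nc\n"))
```

In the output, lines that start with `-` were removed from the old value, lines that start with `+` were added in the new value, and lines that start with a space are unchanged context.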
The diff value includes an SHA-256 hash value that is computed on the entire updated value. Use the hash value to verify that the reconstructed value matches the original value before it was converted to a diff. To do so, compute the SHA-256 hash after expanding the diff value. Then compare the two hash values to ensure that they’re equal. If the reconstructed content is different from the original content, the hash value is different. To compute an SHA-256 hash value, you can use a utility such as the UNIX sha256sum command or the DigestUtils class from the Apache Commons library.
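The verification step can be sketched in Python with the standard `hashlib` module. The function names are illustrative. The sketch assumes that the hash is delivered as a hexadecimal string and that it was computed over the UTF-8 bytes of the value, so confirm both against the data that you receive.

```python
import hashlib


def sha256_hex(text: str) -> str:
    """Return the SHA-256 digest of the text as a lowercase hex string."""
    # Assumption: the hash is computed over the UTF-8 encoding of the value.
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def verify_reconstructed(reconstructed: str, expected_hash: str) -> bool:
    """Check a reconstructed value against the hash sent with the diff."""
    # Normalize case and whitespace so the comparison does not depend on
    # how the expected hash was formatted.
    return sha256_hex(reconstructed) == expected_hash.strip().lower()
```

If `verify_reconstructed` returns `False`, the diff was not applied correctly or was applied to the wrong base value, and the reconstructed content should not be used.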