Task #54730

Epic #55070: Workpackages

Epic #54260: WP: FAL Missing Issues / Features / API

Story #54266: As an User I want FAL to be performant

Task #51094: SQL-Optimize the FAL

sys_file_processedfile.checksum shorten DB field

Added by Ingo Schmitt over 1 year ago. Updated 6 months ago.

Status: Resolved
Start date: 2014-01-03
Priority: Should have
Due date:
Assigned To: Ingo Schmitt
% Done: 100%
Category: File Abstraction Layer (FAL)
Spent time: -
Target version: 7.1 (Cleanup)
TYPO3 Version: 6.2
Complexity:
PHP Version:
Sprint Focus: On Location Sprint

Description

The contents of sys_file_processedfile.checksum are created by \TYPO3\CMS\Core\Resource\Processing\AbstractTask::getConfigurationChecksum(), which calls \TYPO3\CMS\Core\Utility\GeneralUtility::shortMD5(implode('|', $this->getChecksumData())).

Since shortMD5 is called here without a length argument, it returns its default of 10 characters, so the size of the database field could be reduced to 10 characters.
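To make the length argument concrete, here is a minimal sketch in Python that models the PHP behaviour described above. It assumes shortMD5 is simply the first 10 hexadecimal characters of the MD5 hash. The function names and the sample checksum data are hypothetical stand-ins, not TYPO3 code.

```python
import hashlib
from typing import Sequence


def short_md5(value: str, length: int = 10) -> str:
    """Model of shortMD5: the first `length` hex characters of the MD5 digest."""
    return hashlib.md5(value.encode("utf-8")).hexdigest()[:length]


def configuration_checksum(checksum_data: Sequence[str]) -> str:
    """Model of getConfigurationChecksum: join the parts with '|' and hash them.

    Assumption: every part is already a string (PHP's implode would cast them).
    """
    return short_md5("|".join(checksum_data))


if __name__ == "__main__":
    # Hypothetical checksum data; the real parts come from getChecksumData().
    example = ["42", "Image.CropScaleMask", "width=200"]
    checksum = configuration_checksum(example)
    print(checksum, len(checksum))
```

Under this assumption the stored value is always exactly 10 characters long, whatever the input, which is why a fixed-width 10-character column is sufficient.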

Associated revisions

Revision fd19c522
Added by Mathias Schreiber 6 months ago

[BUGFIX] Reduced sys_file_processedfile.checksum to correct size

Since only a shortMD5 (of 10 characters length) is used in this field
the size is changed to 10 characters and the field type has been set to
char instead of varchar.

Resolves: #54730
Releases: master, 6.2
Change-Id: I8e846786230b55d42464f6ea791202579e6d7873
Reviewed-on: http://review.typo3.org/36388
Reviewed-by: Ingo Schmitt
Tested-by: Ingo Schmitt
Reviewed-by: Cedric Ziel
Reviewed-by: Frans Saris
Tested-by: Frans Saris
Reviewed-by: Michael Oehlhof
Reviewed-by: Mathias Schreiber
Tested-by: Mathias Schreiber

History

#1 Updated by Steffen Ritter over 1 year ago

  • Parent task set to #51094

#2 Updated by Steffen Ritter over 1 year ago

Is there any benefit to using a shortened MD5 instead of a full one? I would rather change that.

#3 Updated by Ingo Schmitt over 1 year ago

A short MD5 uses a smaller field size, so the index is smaller and faster.
While reading the code, I had a similar thought: normally we could use the full MD5, since the calculation is fast and byte size is not the problem with B-trees; what matters more is how much the keys differ.
With a full MD5 the size would be 32 characters, so the field would have to be a 32-character varchar.

#4 Updated by Christoph Dörfel over 1 year ago

Ingo Schmitt wrote:

A short MD5 uses a smaller field size, so the index is smaller and faster.
While reading the code, I had a similar thought: normally we could use the full MD5, since the calculation is fast and byte size is not the problem with B-trees; what matters more is how much the keys differ.
With a full MD5 the size would be 32 characters, so the field would have to be a 32-character varchar.

While looking through the code, yet another idea came to mind. Why use sha1, md5 or shortMd5 when the only thing we are looking for is "change", not "uniqueness"? The generated hashes are not used as unique identifiers but to make sure that the data is still valid and doesn't have to be recalculated. So we could just use a simple checksum, crc32 in this case. crc32 is used in many places where, for example, file transfers have to be checked for errors.
The benefit of using crc32 is that it's a simple 32-bit integer. Comparisons and DB searches can't get faster than that :)
Your opinions?

Edit:
Also in http://forge.typo3.org/issues/54729 with file checksums, "originalfilesha1" and similar.
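The crc32 idea from this comment can be sketched as follows. This is only an illustration in Python using the standard zlib module. The function name and the way the parts are joined are assumptions carried over from the issue description, not an existing TYPO3 API.

```python
import zlib
from typing import Sequence


def crc32_checksum(checksum_data: Sequence[str]) -> int:
    """Return an unsigned 32-bit CRC of the '|'-joined configuration parts.

    Assumption: the parts are strings, joined the same way as for shortMD5.
    """
    joined = "|".join(checksum_data).encode("utf-8")
    # Mask to guarantee an unsigned 32-bit result on every platform.
    return zlib.crc32(joined) & 0xFFFFFFFF


if __name__ == "__main__":
    # Hypothetical configurations that differ in a single value.
    before = crc32_checksum(["42", "Image.CropScaleMask", "width=200"])
    after = crc32_checksum(["42", "Image.CropScaleMask", "width=300"])
    print(before, after, before != after)
```

The result fits into an unsigned 32-bit integer column. It carries 32 bits, compared with the 40 bits kept by a 10-character shortMD5, so the smaller column comes with a somewhat higher chance of two different configurations sharing a value.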

#5 Updated by Ingo Schmitt over 1 year ago

If this field is used only to detect "change", then we should name it accordingly. Looking at the field from the outside right now, it seems that a file hash is stored. So an extension developer might use this field for detecting duplicates or similar.

@Steffen: What do you think about it?

#6 Updated by Mathias Schreiber 7 months ago

  • Status changed from New to Accepted
  • Target version changed from 6.2.0 to 7.1 (Cleanup)
  • Sprint Focus set to On Location Sprint

#7 Updated by Mathias Schreiber 7 months ago

  • Category changed from Performance to File Abstraction Layer (FAL)

#8 Updated by Gerrit Code Review 6 months ago

  • Status changed from Accepted to Under Review

Patch set 1 for branch master of project Packages/TYPO3.CMS has been pushed to the review server.
It is available at http://review.typo3.org/36388

#9 Updated by Gerrit Code Review 6 months ago

Patch set 1 for branch TYPO3_6-2 of project Packages/TYPO3.CMS has been pushed to the review server.
It is available at http://review.typo3.org/36433

#10 Updated by Mathias Schreiber 6 months ago

  • Status changed from Under Review to Resolved
  • % Done changed from 0 to 100
