Bug #56726

MetaDataExtraction isn't triggered after a file is uploaded

Added by Frans Saris over 1 year ago. Updated 8 months ago.

Status: New
Priority: Should have
Assigned To: -
Category: File Abstraction Layer (FAL)
Target version: -
TYPO3 Version: 6.2
PHP Version:
Complexity: easy
Start date: 2014-03-10
Due date:
% Done: 0%
Spent time: -
Is Regression: No
Sprint Focus:

Description

Currently, metadata extraction is only called through the scheduler task. So when an editor uploads a new file, they have to wait until the scheduler task runs again.

It would be a great improvement if it were called directly for every newly uploaded file.


Related issues

duplicates Core - Bug #57408: Call of the meta extractor services for local storage pos... New 2014-03-28
duplicated by Core - Task #57546: Call ExtractionService on new files and not only Indexer... Accepted 2014-04-02

History

#1 Updated by Steffen Ritter over 1 year ago

  • Status changed from New to Needs Feedback

metadata vs. indexing

Metadata extraction should always be asynchronous because it can be very heavy.

#2 Updated by Frans Saris over 1 year ago

  • Category set to File Abstraction Layer (FAL)

I know it could be heavy, but I guess for one file at a time it should not be a problem.

Heavy services could provide their own check in canProcess() so that they are only executed in CLI context, etc.
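
As a rough illustration of this suggestion, here is a minimal, language-neutral sketch in Python. It is not TYPO3 code: the class names, the `is_cli` flag, and the `extract_on_upload` helper are hypothetical stand-ins, and only the idea of a `canProcess()`-style check comes from the comment above.

```python
from dataclasses import dataclass


@dataclass
class File:
    """Hypothetical stand-in for an uploaded file."""
    name: str


class LightExtractor:
    """Cheap extractor that is willing to run in any context."""

    def can_process(self, file: File) -> bool:
        return True

    def extract_meta_data(self, file: File) -> dict:
        return {"title": file.name}


class HeavyExtractor:
    """Expensive extractor that only agrees to run in CLI context."""

    def __init__(self, is_cli: bool):
        # Assumption: the caller knows whether the request is CLI or web.
        self.is_cli = is_cli

    def can_process(self, file: File) -> bool:
        return self.is_cli

    def extract_meta_data(self, file: File) -> dict:
        return {"full_text": "content of " + file.name}


def extract_on_upload(file: File, extractors: list) -> dict:
    """Run every extractor whose can_process() check accepts the file."""
    metadata = {}
    for extractor in extractors:
        if extractor.can_process(file):
            metadata.update(extractor.extract_meta_data(file))
    return metadata
```

Under these assumptions, an upload in a web request would only run the light extractor, while the scheduler (CLI) run would pick up the heavy one as well.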

#3 Updated by Steffen Ritter over 1 year ago

Frans Saris wrote:

Heavy services could provide their own check in canProcess() so that they are only executed in CLI context, etc.

No, that's exactly why this "processing" has been detached from the indexing process (even though it was part of the old indexer).

#4 Updated by Alexander Opitz about 1 year ago

Hi,

what's the state of this issue?

#5 Updated by Xavier Perseguers about 1 year ago

It was done on purpose, so this should not be changed.

If you really want to index right away, EXT:extractor lets you do that.

#6 Updated by Fabien Udriot about 1 year ago

+1 for metadata extraction upon upload. The current situation is not satisfying, IMO: users do not want to wait until the next cron run.

If there is a fear of overloading the system, a threshold (number of files per upload) could be added above which metadata extraction is disabled. However, I believe that in the majority of cases that won't be a problem.
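
To make the threshold idea concrete, here is a minimal Python sketch. It is only an illustration of the proposal, not existing TYPO3 behavior: the function names and the default threshold of 10 files are assumptions chosen for the example.

```python
def should_extract_on_upload(uploaded_file_count: int, threshold: int = 10) -> bool:
    """Allow immediate extraction only for uploads at or below the threshold."""
    return uploaded_file_count <= threshold


def plan_extraction(file_names: list, threshold: int = 10) -> dict:
    """Split an upload into files extracted right away and files left to the scheduler."""
    if should_extract_on_upload(len(file_names), threshold):
        return {"immediate": list(file_names), "deferred": []}
    # Large batch: leave everything to the next scheduler run (hypothetical policy).
    return {"immediate": [], "deferred": list(file_names)}
```

With this policy, a single uploaded file gets its metadata immediately, while a bulk upload above the threshold falls back to the scheduler task.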

#7 Updated by Xavier Perseguers about 1 year ago

Just to be complete here: automatic metadata extraction is not only a problem of overloading the system; it also slows down the upload itself a lot if you rely on binaries such as tika (Java-based). Test for yourself, you'll see.

#8 Updated by Fabien Udriot about 1 year ago

(By overloading the system, I meant slowing down the upload, just for the sake of clarity.)

By far not everyone has Tika deployed, which is reserved for some advanced setups. Furthermore, PHP-based metadata extraction is quite fast in my experience.

Could we make it an opt-out option: by default, indexing runs after upload, and it can be disabled by some configuration? This would be a compromise. Again, as a user it is unsatisfying to have to wait for the next cron cycle to get the metadata.

#9 Updated by Ingo Renner about 1 year ago

FWIW: Tika can also be run in server mode, which saves the startup time of the JVM and makes it a lot faster. It's just that EXT:tika does not support server mode (yet).

#10 Updated by Frans Saris about 1 year ago

Maybe we can add a checkbox to the storage settings to enable automatic metadata extraction for that storage?

#11 Updated by Alexander Opitz 8 months ago

  • Status changed from Needs Feedback to New
