Bug #56726
MetaDataExtraction isn't triggered after a file is uploaded
| Status: | New | Start date: | 2014-03-10 |
|---|---|---|---|
| Priority: | Should have | Due date: | |
| Assigned To: | - | % Done: | 0% |
| Category: | File Abstraction Layer (FAL) | Spent time: | - |
| Target version: | - | | |
| TYPO3 Version: | 6.2 | Is Regression: | No |
| PHP Version: | | Sprint Focus: | |
| Complexity: | easy | | |
Description
Currently the metadata extraction is only called through the scheduler task. So when an editor uploads a new file, they have to wait until the scheduler task is triggered again.
It would be a great improvement if it were called directly for every newly uploaded file.
History
#1 Updated by Steffen Ritter over 1 year ago
- Status changed from New to Needs Feedback
Metadata extraction vs. indexing:
metadata extraction should always be asynchronous because it can be very heavy.
#2 Updated by Frans Saris over 1 year ago
- Category set to File Abstraction Layer (FAL)
I know it could be heavy, but I guess for one file at a time it should not be a problem.
Heavy services could provide their own check in canProcess() so that they are only executed in CLI context, etc.
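
The suggestion above can be illustrated with a minimal, language-neutral sketch. It is not the TYPO3 extractor API: `HeavyExtractor`, `cli_context`, and the `.pdf` restriction are hypothetical names and values chosen only to show a service declining work outside a CLI run.

```python
from dataclasses import dataclass


@dataclass
class HeavyExtractor:
    """Hypothetical stand-in for a heavy extraction service (not the TYPO3 API)."""

    # Supplied by the caller: True for a command-line run (e.g. a scheduler task),
    # False for a web request such as a file upload.
    cli_context: bool

    def can_process(self, file_name: str) -> bool:
        # Decline outside CLI so an upload request never waits on heavy work.
        if not self.cli_context:
            return False
        # Assumed file-type restriction, purely for illustration.
        return file_name.lower().endswith(".pdf")


def usable_extractors(file_name: str, extractors: list) -> list:
    """Return only the services that agree to process this file right now."""
    return [e for e in extractors if e.can_process(file_name)]
```

With such a guard, a call made during an upload would skip the heavy service, and the next scheduler run would still pick the file up.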
#3 Updated by Steffen Ritter over 1 year ago
Frans Saris wrote:
Heavy services could provide their own check in canProcess() so that they are only executed in CLI context, etc.
No, that's exactly why this "processing" has been detached from the indexing process (even though it was part of the old indexer).
#4 Updated by Alexander Opitz about 1 year ago
Hi,
what's the state of this issue?
#5 Updated by Xavier Perseguers about 1 year ago
It was done on purpose, so this should not be changed.
If you really want to index right away, EXT:extractor lets you do that.
#6 Updated by Fabien Udriot about 1 year ago
+1 for metadata extraction upon upload. The current situation is not satisfying, IMO: users do not want to wait until the next cron run.
If there is a fear of overloading the system, a threshold (number of files per upload) could be added above which metadata extraction is disabled. However, I believe that in the majority of cases this won't be a problem.
#7 Updated by Xavier Perseguers about 1 year ago
Just to be complete here: automatic metadata extraction is not only a matter of overloading the system; it also slows down the upload itself considerably if you rely on external binaries such as Tika (Java-based). Test for yourself, you'll see.
#8 Updated by Fabien Udriot about 1 year ago
(By overloading the system, I meant slowing down the upload, just for the sake of clarity.)
By far not everyone has Tika deployed; it is reserved for some advanced set-ups. Furthermore, PHP-based metadata extraction is quite fast in my experience.
Could we make it an opt-out option: by default, metadata is extracted after upload, and this can be disabled by some configuration. This would be a compromise. Again, as a user it is unsatisfying to have to wait for the next cron cycle to get the metadata.
#9 Updated by Ingo Renner about 1 year ago
FWIW: Tika can also be run in server mode, which saves the start-up time of the JVM and thus makes it a lot faster. It's just that EXT:tika does not support server mode (yet).
#10 Updated by Frans Saris about 1 year ago
Maybe we can add a checkbox to the storage settings to enable automatic metadata extraction for that storage?
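
The proposals in this thread (a per-storage switch, an opt-out default, and a file-count threshold) could be combined roughly as follows. This is only a sketch under stated assumptions: `StorageSettings`, `auto_extract_metadata`, `max_files_per_upload`, and the default of 10 are hypothetical and do not correspond to any existing TYPO3 setting.

```python
from dataclasses import dataclass


@dataclass
class StorageSettings:
    """Hypothetical per-storage configuration (not an existing TYPO3 record)."""

    # Opt-out default: extraction on upload stays on unless an admin disables it.
    auto_extract_metadata: bool = True
    # Assumed threshold; above this many files per upload, defer to the scheduler.
    max_files_per_upload: int = 10


def should_extract_on_upload(settings: StorageSettings, uploaded_file_count: int) -> bool:
    """Decide whether to extract metadata synchronously for this upload."""
    if not settings.auto_extract_metadata:
        return False  # storage opted out; the scheduler task handles it later
    return uploaded_file_count <= settings.max_files_per_upload


# A single-file upload is extracted immediately; a bulk upload is deferred.
single = should_extract_on_upload(StorageSettings(), 1)
bulk = should_extract_on_upload(StorageSettings(), 50)
```

Files skipped by this check would not be lost; they would simply wait for the next scheduler run, which is the behaviour described in the original report.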
#11 Updated by Alexander Opitz 8 months ago
- Status changed from Needs Feedback to New