Bug #56726: MetaDataExtraction isn't triggered after file is uploaded - Core - TYPO3 Forge



Added by Frans Saris over 1 year ago. Updated 8 months ago.

Status: New
Start date: 2014-03-10
Priority: Should have
Category: File Abstraction Layer (FAL)
TYPO3 Version: 6.2
Complexity: easy

Description

Currently, metadata extraction is only triggered by the scheduler task. So when an editor uploads a new file, they have to wait until the scheduler task runs again.

It would be a great improvement if extraction were called directly for every newly uploaded file.

Related issues

History

#1 Updated by Steffen Ritter over 1 year ago

Status changed from New to Needs Feedback

Metadata vs. indexing:

Metadata extraction should always be asynchronous, because it can be very heavy.

#2 Updated by Frans Saris over 1 year ago

Category set to File Abstraction Layer (FAL)

I know it could be heavy, but I guess for one file at a time it should not be a problem.

Heavy services could implement their own check in canProcess() so that they are only executed in CLI context, etc.
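A minimal Python sketch of the idea above, assuming a hypothetical extractor interface: each extractor gets a `can_process(file, context)` check, and a heavy one refuses unless it runs in "cli" context (i.e. from the scheduler). The names `UploadedFile`, `LightExtractor`, `HeavyExtractor`, `run_extractors` and the `context` strings are illustrative stand-ins, not TYPO3's PHP API.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class UploadedFile:
    # Hypothetical stand-in for a FAL file; only the fields this sketch needs.
    identifier: str
    mime_type: str


class LightExtractor:
    """Cheap extractor: acceptable to run during the upload request."""

    name = "light"

    def can_process(self, file: UploadedFile, context: str) -> bool:
        # Accepts images regardless of whether we run in "web" or "cli" context.
        return file.mime_type.startswith("image/")

    def extract(self, file: UploadedFile) -> Dict[str, str]:
        return {"source": file.identifier}


class HeavyExtractor:
    """Expensive extractor (e.g. one calling an external binary): CLI only."""

    name = "heavy"
    handled_types = {"application/pdf"}

    def can_process(self, file: UploadedFile, context: str) -> bool:
        # The proposed check: refuse unless running from the CLI,
        # i.e. from the scheduler task rather than an editor's upload.
        return file.mime_type in self.handled_types and context == "cli"

    def extract(self, file: UploadedFile) -> Dict[str, str]:
        return {"source": file.identifier}


def run_extractors(file: UploadedFile, extractors: List, context: str) -> Dict[str, Dict[str, str]]:
    """Collect metadata from every extractor that accepts the file in this context."""
    metadata = {}
    for extractor in extractors:
        if extractor.can_process(file, context):
            metadata[extractor.name] = extractor.extract(file)
    return metadata


if __name__ == "__main__":
    extractors = [LightExtractor(), HeavyExtractor()]
    pdf = UploadedFile("/user_upload/report.pdf", "application/pdf")
    print(run_extractors(pdf, extractors, context="web"))  # {} (heavy extractor deferred)
    print(run_extractors(pdf, extractors, context="cli"))  # {'heavy': {'source': '/user_upload/report.pdf'}}
```

In this sketch the same extractor list is run both on upload and by the scheduler; only the context differs, so each service decides for itself whether it is cheap enough to run inline.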

#3 Updated by Steffen Ritter over 1 year ago

Frans Saris wrote:

Heavy services could implement their own check in canProcess() so that they are only executed in CLI context, etc.

No, that's exactly why this "processing" has been detached from the indexing process (even though it was part of the old indexer).

#4 Updated by Alexander Opitz about 1 year ago

Hi,

what's the state of this issue?

#5 Updated by Xavier Perseguers about 1 year ago

It was done on purpose, so this should not be changed.

If you really want to index right away, EXT:extractor lets you do that.

#6 Updated by Fabien Udriot about 1 year ago

+1 for metadata extraction upon upload. The current situation is not satisfying, IMO: users do not want to wait until the next cron run.

If there is a fear of overloading the system, a threshold (number of files per upload) could be added, above which metadata extraction is disabled. However, I believe that in the majority of cases this won't be a problem.

#7 Updated by Xavier Perseguers about 1 year ago

Just to be complete here: automatic metadata extraction is not only a problem of overloading the system; it also slows down the upload itself considerably if you rely on external binaries such as Tika (Java-based). Test it yourself and you'll see.

#8 Updated by Fabien Udriot about 1 year ago

(By overloading the system, I meant slowing down the upload, just for the sake of clarity.)

Far from everyone has Tika deployed; it is reserved for some advanced setups. Furthermore, PHP-based metadata extraction is quite fast in my experience.

Could we make it an opt-out option: indexing after upload by default, which can be disabled by some configuration? This would be a compromise. Again, from a user's perspective it is unsatisfying to have to wait for the next cron cycle to get the metadata.

#9 Updated by Ingo Renner about 1 year ago

FWIW: Tika can also be run in server mode, which saves the JVM startup time and makes it a lot faster. It's just that EXT:tika does not support server mode (yet).

#10 Updated by Frans Saris about 1 year ago

Maybe we can add a checkbox to the storage settings to enable automatic metadata extraction for that storage?
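The compromise proposals in this thread (a per-storage checkbox, enabled by default as an opt-out, plus a file-count threshold per upload) can be combined into one decision. A minimal Python sketch, assuming a hypothetical `StorageSettings` flag and a hypothetical threshold of 5 files; neither name nor value exists in TYPO3, they only illustrate the logic.

```python
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical threshold (comment #6): above this many files in one upload,
# extraction is left to the scheduler task. The value 5 is an assumption.
MAX_FILES_FOR_UPLOAD_EXTRACTION = 5


@dataclass
class StorageSettings:
    # Hypothetical per-storage checkbox (comment #10). Defaulting to True
    # models the opt-out behaviour suggested in comment #8.
    extract_metadata_on_upload: bool = True


def plan_extraction(storage: StorageSettings, uploaded: List[str]) -> Dict[str, List[str]]:
    """Decide which uploaded files get metadata now and which wait for the scheduler."""
    if storage.extract_metadata_on_upload and len(uploaded) <= MAX_FILES_FOR_UPLOAD_EXTRACTION:
        return {"on_upload": list(uploaded), "scheduler": []}
    # Disabled for this storage, or too many files at once: keep today's behaviour.
    return {"on_upload": [], "scheduler": list(uploaded)}


if __name__ == "__main__":
    default_storage = StorageSettings()
    opted_out = StorageSettings(extract_metadata_on_upload=False)
    print(plan_extraction(default_storage, ["photo.jpg"]))
    print(plan_extraction(opted_out, ["photo.jpg"]))
    print(plan_extraction(default_storage, [f"img_{i}.jpg" for i in range(10)]))
```

The sketch treats each upload as all-or-nothing; extracting the first N files immediately and deferring the rest would be an equally valid variant.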

#11 Updated by Alexander Opitz 8 months ago

Status changed from Needs Feedback to New


duplicates Core - Bug #57408: Call of the meta extractor services for local storage pos... (New, 2014-03-28)
duplicated by Core - Task #57546: Call ExtractionService on new files and not only Indexer... (Accepted, 2014-04-02)
