Task #51530

Improve speed of Files::readDirectoryRecursively using RecursiveDirectoryIterator?

Added by Jacob Floyd almost 2 years ago. Updated almost 2 years ago.

Status: New
Start date: 2013-08-29
Priority: Should have
Due date:
Assigned To: -
% Done: 0%
Category: -
Target version: -
Sprint:
Has patch: No
PHP Version:
Complexity:

Description

Basics (TL;DR)

TYPO3/Flow/Utility/Files::readDirectoryRecursively uses \DirectoryIterator instead of \RecursiveDirectoryIterator. I suspect that an implementation using \RecursiveDirectoryIterator would be faster.

Let's write a version of readDirectoryRecursively that uses \RecursiveDirectoryIterator and profile the two versions to see which one is faster.
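As a starting point for that profiling, here is a rough micro-benchmark sketch. It compares a hand-written recursion over \DirectoryIterator with a \RecursiveIteratorIterator walk over \RecursiveDirectoryIterator on a throwaway tree of 1000 files. Both walkers (walkWithDirectoryIterator and walkWithRecursiveDirectoryIterator) are hypothetical, simplified stand-ins: they only collect file paths and skip the suffix, dot-file and realpath options, so they are not Flow's actual code. The tree size and run count are arbitrary assumptions.

```php
<?php
// Rough benchmark sketch: manual \DirectoryIterator recursion vs.
// \RecursiveDirectoryIterator. Both walkers are hypothetical, simplified
// stand-ins (no suffix, dot-file or realpath handling), not Flow's code.

function walkWithDirectoryIterator($path, array &$filenames = array())
{
    foreach (new \DirectoryIterator($path) as $fileInfo) {
        if ($fileInfo->isDot()) {
            continue; // skip "." and ".."
        }
        $pathname = $path . '/' . $fileInfo->getFilename();
        if ($fileInfo->isDir()) {
            walkWithDirectoryIterator($pathname, $filenames); // recurse by hand
        } else {
            $filenames[] = $pathname;
        }
    }
    return $filenames;
}

function walkWithRecursiveDirectoryIterator($path)
{
    $filenames = array();
    $iterator = new \RecursiveIteratorIterator(
        new \RecursiveDirectoryIterator($path, \FilesystemIterator::UNIX_PATHS | \FilesystemIterator::SKIP_DOTS)
    );
    foreach ($iterator as $pathname => $fileInfo) {
        $filenames[] = $pathname; // the key is the full path (KEY_AS_PATHNAME)
    }
    return $filenames;
}

// Throwaway tree: 20 directories x 1 subdirectory x 50 files = 1000 files.
$root = sys_get_temp_dir() . '/rdi-bench-' . uniqid();
for ($d = 0; $d < 20; $d++) {
    $dir = $root . '/dir' . $d . '/sub';
    mkdir($dir, 0777, TRUE);
    for ($f = 0; $f < 50; $f++) {
        touch($dir . '/file' . $f . '.txt');
    }
}

// Time 10 runs of each walker and keep the sorted result for comparison.
$results = array();
foreach (array('walkWithDirectoryIterator', 'walkWithRecursiveDirectoryIterator') as $walker) {
    $start = microtime(TRUE);
    for ($run = 0; $run < 10; $run++) {
        $files = $walker($root);
    }
    $elapsed = microtime(TRUE) - $start;
    sort($files);
    $results[$walker] = $files;
    printf("%-36s %.4f s for 10 runs (%d files)\n", $walker, $elapsed, count($files));
}

// Remove the tree again, deepest entries first.
$cleanup = new \RecursiveIteratorIterator(
    new \RecursiveDirectoryIterator($root, \FilesystemIterator::SKIP_DOTS),
    \RecursiveIteratorIterator::CHILD_FIRST
);
foreach ($cleanup as $pathname => $fileInfo) {
    if ($fileInfo->isDir()) {
        rmdir($pathname);
    } else {
        unlink($pathname);
    }
}
rmdir($root);
```

The first walker runs against a colder filesystem and stat cache than the second, so swap the order or add a warm-up pass before reading anything into the numbers; checking that both walkers return the same file list keeps the comparison honest.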

Background

When I asked about it in IRC, ChristianM said this:

ChristianM: if I remember right there was some oddness to the RecursiveDirectoryIterator
that we wanted to avoid but I don't remember the specifics

Consider this timeline of selected PHP releases:

01 May 2008 - PHP 5.2.6
10 May 2008 - Flow's Files::readDirectoryRecursively introduced
30 Jun 2009 - PHP 5.3.0
17 Sep 2009 - PHP 5.2.11
19 Nov 2009 - PHP 5.3.1

According to the changelog for \RecursiveDirectoryIterator:

5.2.11 - Introduced RecursiveDirectoryIterator::FOLLOW_SYMLINKS (5.2 branch)
5.3.0  - Extends FilesystemIterator (which extends DirectoryIterator) instead of DirectoryIterator
5.3.0  - Implements SeekableIterator
5.3.1  - Introduced RecursiveDirectoryIterator::FOLLOW_SYMLINKS (5.3 branch)
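Whether these changelog entries hold on a given installation can be checked directly with is_subclass_of(), class_implements() and defined(). A minimal sketch; the labels in $checks are just descriptive strings, not anything defined by PHP:

```php
<?php
// Check the class relationships from the changelog on the PHP version at hand.
$class = 'RecursiveDirectoryIterator';
$checks = array(
    'extends FilesystemIterator' => is_subclass_of($class, 'FilesystemIterator'),
    'extends DirectoryIterator (via FilesystemIterator)' => is_subclass_of($class, 'DirectoryIterator'),
    'implements SeekableIterator' => in_array('SeekableIterator', class_implements($class), TRUE),
    'FOLLOW_SYMLINKS constant defined' => defined($class . '::FOLLOW_SYMLINKS'),
);

printf("PHP %s\n", PHP_VERSION);
foreach ($checks as $label => $result) {
    printf("  %-52s %s\n", $label, $result ? 'yes' : 'no');
}

// Print the ancestor classes of RecursiveDirectoryIterator.
print_r(class_parents($class));
```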

We don't even support 5.2 anymore, and haven't for a long time. I suspect that the oddness with \RecursiveDirectoryIterator has been fixed already, so I think that using \RecursiveDirectoryIterator is probably feasible now.

Stub Method

This is the same function, except that it only filters out hidden files; hidden directories are still traversed.

    static public function readDirectoryRecursively($path, $suffix = NULL, $returnRealPath = FALSE, $returnDotFiles = FALSE, &$filenames = array()) {
        if (!is_dir($path)) {
            throw new Exception('"' . $path . '" is no directory.', 1207253462);
        }

        $directoryIterator = new \RecursiveIteratorIterator(
            new \RecursiveDirectoryIterator(
                $path,
                \FilesystemIterator::UNIX_PATHS|\FilesystemIterator::SKIP_DOTS|\FilesystemIterator::FOLLOW_SYMLINKS
            )//,
            //\RecursiveIteratorIterator::SELF_FIRST //returns directories as well.
        );
        $suffixLength = strlen($suffix);

        foreach ($directoryIterator as $pathname => $fileInfo) {
            $filename = $fileInfo->getFilename();
            if ($returnDotFiles === FALSE && $filename[0] === '.') {
                continue;
            }
            //This is needed if directories are included (see SELF_FIRST above)
            //if ($fileInfo->isFile() && ($suffix === NULL || substr($filename, -$suffixLength) === $suffix)) {
            if ($suffix === NULL || substr($filename, -$suffixLength) === $suffix) {
                $filenames[] = self::getUnixStylePath(($returnRealPath === TRUE ? realpath($pathname) : $pathname));
            }
        }
        return $filenames;
    }
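The commented-out SELF_FIRST argument and the matching isFile() check in the stub come down to the iteration mode of \RecursiveIteratorIterator. The sketch below (listTree is a hypothetical helper) lists the same small tree in the default LEAVES_ONLY mode and in SELF_FIRST mode. LEAVES_ONLY yields only a.txt, sub/b.txt and .hidden/c.txt (the empty directory never appears), while SELF_FIRST additionally yields .hidden/, empty/ and sub/ themselves, which is why the isFile() check becomes necessary. It also shows why the stub returns .hidden/c.txt: only the filename c.txt is tested for a leading dot.

```php
<?php
// Throwaway tree: a.txt, sub/b.txt, .hidden/c.txt and an empty directory.
$root = sys_get_temp_dir() . '/rii-modes-' . uniqid();
mkdir($root . '/sub', 0777, TRUE);
mkdir($root . '/.hidden');
mkdir($root . '/empty');
touch($root . '/a.txt');
touch($root . '/sub/b.txt');
touch($root . '/.hidden/c.txt');

// Hypothetical helper: list the tree relative to $root in the given mode,
// marking directories with a trailing "/".
function listTree($root, $mode)
{
    $iterator = new \RecursiveIteratorIterator(
        new \RecursiveDirectoryIterator($root, \FilesystemIterator::UNIX_PATHS | \FilesystemIterator::SKIP_DOTS),
        $mode
    );
    $entries = array();
    foreach ($iterator as $pathname => $fileInfo) {
        $entries[] = substr($pathname, strlen($root) + 1) . ($fileInfo->isDir() ? '/' : '');
    }
    sort($entries);
    return $entries;
}

$leavesOnly = listTree($root, \RecursiveIteratorIterator::LEAVES_ONLY); // the default mode
$selfFirst = listTree($root, \RecursiveIteratorIterator::SELF_FIRST);   // directories too
print_r($leavesOnly);
print_r($selfFirst);

// Clean up, deepest entries first.
$cleanup = new \RecursiveIteratorIterator(
    new \RecursiveDirectoryIterator($root, \FilesystemIterator::SKIP_DOTS),
    \RecursiveIteratorIterator::CHILD_FIRST
);
foreach ($cleanup as $pathname => $fileInfo) {
    if ($fileInfo->isDir()) {
        rmdir($pathname);
    } else {
        unlink($pathname);
    }
}
rmdir($root);
```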

History

#1 Updated by Jacob Floyd almost 2 years ago

OK. This one is functionally equivalent, including skipping hidden directories. However, it uses \RecursiveCallbackFilterIterator, which is only available as of PHP 5.4.

    static public function readDirectoryRecursively($path, $suffix = NULL, $returnRealPath = FALSE, $returnDotFiles = FALSE) {
        if (!is_dir($path)) {
            throw new Exception('"' . $path . '" is no directory.', 1207253462);
        }

        $suffixLength = strlen($suffix);
        $filter = function ($fileInfo, $pathname, $iterator) use ($suffix, $suffixLength, $returnDotFiles) {
            $filename = $fileInfo->getFilename();
            if ($returnDotFiles === FALSE && $filename[0] === '.') {
                return FALSE;
            }
            if (($fileInfo->isFile() && ($suffix === NULL || substr($filename, -$suffixLength) === $suffix)) || $iterator->hasChildren()) {
                return TRUE;
            }
            return FALSE;
        };

        $directoryIterator = new \RecursiveIteratorIterator(
            new \RecursiveCallbackFilterIterator(
                new \RecursiveDirectoryIterator(
                    $path,
                    //defaults: \FilesystemIterator::KEY_AS_PATHNAME|\FilesystemIterator::CURRENT_AS_FILEINFO
                    \FilesystemIterator::UNIX_PATHS|\FilesystemIterator::SKIP_DOTS|\FilesystemIterator::FOLLOW_SYMLINKS
                ), $filter
            )
        );

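        // Initialize so that an empty result returns array() instead of an undefined variable.
        $filenames = array();
        foreach ($directoryIterator as $pathname => $fileInfo) {
            $filenames[] = self::getUnixStylePath(($returnRealPath === TRUE ? realpath($pathname) : $pathname));
        }
        return $filenames;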
    }
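If PHP 5.3 compatibility matters, the same pruning can likely be done without \RecursiveCallbackFilterIterator by subclassing \RecursiveFilterIterator (available since PHP 5.1) and moving the closure's logic into accept(). This is a sketch under assumptions: the class HiddenAndSuffixFilterIterator and the function readDirectoryRecursivelyPhp53 are hypothetical names, and realpath, getUnixStylePath and FOLLOW_SYMLINKS handling are left out. getChildren() is overridden because the inherited one re-creates the filter with only the inner iterator, which would drop the options. The #[\ReturnTypeWillChange] lines are read as comments before PHP 8 and silence the tentative-return-type deprecation on 8.1+.

```php
<?php
// Hypothetical PHP 5.3-compatible stand-in for the closure-based filter above.
class HiddenAndSuffixFilterIterator extends \RecursiveFilterIterator
{
    protected $suffix;
    protected $returnDotFiles;

    public function __construct(\RecursiveIterator $iterator, $suffix = NULL, $returnDotFiles = FALSE)
    {
        parent::__construct($iterator);
        $this->suffix = $suffix;
        $this->returnDotFiles = $returnDotFiles;
    }

    #[\ReturnTypeWillChange]
    public function accept()
    {
        $fileInfo = $this->current();
        $filename = $fileInfo->getFilename();
        if ($this->returnDotFiles === FALSE && $filename[0] === '.') {
            return FALSE; // skips hidden files and prunes hidden directories
        }
        if ($this->hasChildren()) {
            return TRUE; // keep directories so the walk descends into them
        }
        return $fileInfo->isFile()
            && ($this->suffix === NULL || substr($filename, -strlen($this->suffix)) === $this->suffix);
    }

    // Pass the filter options down to every subdirectory level.
    #[\ReturnTypeWillChange]
    public function getChildren()
    {
        return new self($this->getInnerIterator()->getChildren(), $this->suffix, $this->returnDotFiles);
    }
}

// Hypothetical helper mirroring the traversal of the closure-based version.
function readDirectoryRecursivelyPhp53($path, $suffix = NULL, $returnDotFiles = FALSE)
{
    $filenames = array();
    $iterator = new \RecursiveIteratorIterator(
        new HiddenAndSuffixFilterIterator(
            new \RecursiveDirectoryIterator($path, \FilesystemIterator::UNIX_PATHS | \FilesystemIterator::SKIP_DOTS),
            $suffix,
            $returnDotFiles
        )
    );
    foreach ($iterator as $pathname => $fileInfo) {
        $filenames[] = $pathname;
    }
    return $filenames;
}

// Usage on a throwaway tree.
$root = sys_get_temp_dir() . '/rfi-demo-' . uniqid();
mkdir($root . '/src', 0777, TRUE);
mkdir($root . '/.git');
touch($root . '/src/Files.php');
touch($root . '/src/README.md');
touch($root . '/src/.Hidden.php');
touch($root . '/.git/config.php');

$found = array();
foreach (readDirectoryRecursivelyPhp53($root, '.php') as $pathname) {
    $found[] = substr($pathname, strlen($root) + 1);
}
sort($found);
print_r($found); // .git/ is pruned; .Hidden.php and README.md are filtered out

// Clean up, deepest entries first.
$cleanup = new \RecursiveIteratorIterator(
    new \RecursiveDirectoryIterator($root, \FilesystemIterator::SKIP_DOTS),
    \RecursiveIteratorIterator::CHILD_FIRST
);
foreach ($cleanup as $pathname => $fileInfo) {
    if ($fileInfo->isDir()) {
        rmdir($pathname);
    } else {
        unlink($pathname);
    }
}
rmdir($root);
```

Only src/Files.php survives: the hidden .git directory is rejected in accept() before the iterator ever descends into it, which is the same pruning the callback version gets from returning FALSE for dot-prefixed names.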

#2 Updated by Jacob Floyd almost 2 years ago

I've been working on this because I needed a more robust filtering mechanism when using readDirectoryRecursively.

You can see my final version, based on the one in the previous comment, here.
