Dear all,

We experienced a severe performance impact in a PHP application writing small data chunks to our GlusterFS network file system. Some details can be found here and there - the bottom line is that file_exists(), is_dir() and all other functions using the system call lstat cannot be cached in APC and cause GlusterFS to check with the quorum of servers during the operation.

Python and native Linux applications (C) do not have this problem, so there is no obvious need for a fix on the system side. The problem did not arise during xRes data imports, where only single big writes are placed on GlusterFS. Experimenting with PHP's realpath_cache_size and opcache settings did not help.
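
For anyone who wants to check the per-call cost of the stat-based functions on their own mount, here is a minimal sketch. The helper name time_stat_calls() and the example path are assumptions for illustration and not part of our application. clearstatcache() is called in every iteration so that PHP's own per-request stat cache does not hide the calls.

```php
<?php
// Hypothetical helper (not from the original application): returns the
// average time in seconds for one file_exists() + is_dir() pair on $path.
function time_stat_calls(string $path, int $iterations = 1000): float
{
    $start = microtime(true);
    for ($i = 0; $i < $iterations; $i++) {
        // Drop PHP's per-request stat cache so every iteration really
        // asks the file system again.
        clearstatcache();
        file_exists($path);
        is_dir($path);
    }
    return (microtime(true) - $start) / $iterations;
}

// The path is an assumption: run it once against a directory on the
// GlusterFS mount and once against a local directory to compare.
$perCall = time_stat_calls(sys_get_temp_dir());
printf("%.6f sec per file_exists()+is_dir() pair\n", $perCall);
```

Comparing the figure for a GlusterFS path with the one for a local path should give a rough idea of how much each stat call costs on the network file system.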

Jacob implemented simple in-memory buffering that collects file writes into bigger chunks, which turned out to be the single most helpful workaround. It also helps not to call file_put_contents() inside big loops, but to open and close files only once, outside the loop. See examples:

Best: buffering and fopen()/fclose() called only once, 0.8 sec:

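// One test line: 128 characters plus line ending.
$line = str_repeat('x', 128) . PHP_EOL;

$buffer = '';
// Open the file once, before the loop.
$fh = fopen('/data/php_test.txt', 'w+');
for ($i = 0; $i < 100000; $i++) {
    $buffer .= $line;
    // Flush the collected lines every 1000 iterations.
    if ($i % 1000 === 0) {
        fwrite($fh, $buffer);
        $buffer = '';
    }
}
// Write whatever is left in the buffer, then close once.
fwrite($fh, $buffer);
fclose($fh);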

2.5 times slower: buffering and open/write/close inside loop, 2.0 sec:

$line = str_repeat('x', 128) . PHP_EOL;

$buffer = '';
for ($i = 0; $i < 100000; $i++) {
    $buffer .= $line;
    if ($i % 1000 === 0) {
        // file_put_contents() opens, writes and closes the file on every call.
        file_put_contents('/data/php_test.txt', $buffer, FILE_APPEND);
        $buffer = '';
    }
}
file_put_contents('/data/php_test.txt', $buffer, FILE_APPEND);

Worst: unbuffered plain open/write/close inside loop, 186 sec:

$line = str_repeat('x', 128) . PHP_EOL;

for ($i = 0; $i < 100000; $i++) {
    // 100000 separate open/write/close cycles, one per line.
    file_put_contents('/data/php_test.txt', $line, FILE_APPEND);
}

Having fopen()/fclose() outside of loops may help when writing to local hard disks, too.
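
The buffering idea can also be wrapped in a small reusable class. The following is only a sketch under assumptions: the class name BufferedFileWriter, the 128 KiB flush threshold (roughly 1000 of the test lines) and the temp file path are made up for illustration, and this is not Jacob's actual implementation.

```php
<?php
// Hypothetical helper, not the original implementation: collects small
// writes in memory and flushes them to one open file handle in big chunks.
final class BufferedFileWriter
{
    private $handle;
    private string $buffer = '';
    private int $flushBytes;

    public function __construct(string $path, int $flushBytes = 131072)
    {
        // Open the file exactly once for the lifetime of the writer.
        $handle = fopen($path, 'w');
        if ($handle === false) {
            throw new RuntimeException("Cannot open $path for writing");
        }
        $this->handle = $handle;
        $this->flushBytes = $flushBytes;
    }

    public function write(string $data): void
    {
        $this->buffer .= $data;
        // Only touch the file system once enough data has piled up.
        if (strlen($this->buffer) >= $this->flushBytes) {
            $this->flush();
        }
    }

    public function flush(): void
    {
        if ($this->buffer !== '') {
            fwrite($this->handle, $this->buffer);
            $this->buffer = '';
        }
    }

    public function close(): void
    {
        // Write the remainder, then close the handle once.
        $this->flush();
        fclose($this->handle);
    }
}

// Usage with the same 100000 test lines as in the examples; the temp
// path is an assumption for illustration.
$path = sys_get_temp_dir() . '/php_buffer_test.txt';
$line = str_repeat('x', 128) . PHP_EOL;
$writer = new BufferedFileWriter($path);
for ($i = 0; $i < 100000; $i++) {
    $writer->write($line);
}
$writer->close();
```

Flushing by byte count instead of by loop counter keeps the chunk size stable even when the individual writes differ in length.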

Any more ideas?

-- 

Kind regards

Gunnar Mann 

- System Administration -
________________________________________________________ 

TraSo GmbH

Nonnenstraße 42
D-04229 Leipzig

Tel.: +49 341 355 740 76 
Fax: +49 341 355 740 21 
E-Mail: g.mann@traso.de 


________________________________________________________
Managing Director: Haiko Gerdes
Commercial Register: Amtsgericht Leipzig, HRB 21850