Last revised 28 Feb 2005. If you want to see what has changed, search for this date in this article.
If you like this article, visit my blog, PHP Everywhere for related articles.
A HOWTO on Optimizing PHP
PHP is a very fast programming language, but there is more to optimizing PHP
than just speed of code execution.
In this chapter, we explain why optimizing PHP involves many factors which
are not code related, and why tuning PHP requires an understanding of how PHP
performs in relation to all the other subsystems on your server, and then identifying
bottlenecks caused by these subsystems and fixing them. We also cover how to
tune and optimize your PHP scripts so they run even faster.
Achieving High Performance
When we talk about good performance, we are not talking about how fast your
PHP scripts will run. Performance is a set of tradeoffs between scalability
and speed. Scripts tuned to use fewer resources might be slower than scripts
that perform caching, but more copies of the same script can be run at one time
on a web server.
In the example below, A.php is a sprinter that can run fast, and B.php is a
marathon runner that can jog forever at nearly the same speed. For light
loads, A.php is substantially faster, but as the web traffic increases, the
performance of B.php only drops a little bit while A.php just runs out of steam.
Let us take a more realistic example to clarify matters further. Suppose we need to write a PHP script that reads a 250K file and generates an HTML summary of the file. We write two scripts that do the same thing: hare.php, which reads the whole file into memory at once and processes it in one pass, and tortoise.php, which reads the file one line at a time, never keeping more than the longest line in memory. Tortoise.php will be slower, as multiple reads are issued, requiring more system calls.
Hare.php requires 0.04 seconds of CPU time and 10 MB of RAM, while tortoise.php requires 0.06 seconds of CPU time and 5 MB of RAM. The server has 100 MB of free physical RAM and its CPU is 99% idle. To simplify things, assume that no memory fragmentation occurs.
At 10 concurrent scripts running, hare.php will run out of memory (10 x 10
= 100). At that point, tortoise.php will still have 50 MB of free memory. The
11th concurrent script will bring hare.php to its knees as it starts
using virtual memory, slowing it down to perhaps half its original speed; each
invocation of hare.php now takes 0.08 seconds of CPU time. Meanwhile, tortoise.php
will still be running at its normal 0.06 seconds of CPU time.
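The two I/O strategies might be sketched as follows (the file name and the line-counting "summary" are hypothetical stand-ins for the real processing):

```
<?php
# hare.php style: slurp the whole file into memory in one pass.
# Fastest per request, but memory use grows with the file size.
$lines = file("data.txt");          # file() reads every line at once
$count = count($lines);

# tortoise.php style: read one line at a time.
# More system calls, but never holds more than one line in memory.
$count = 0;
$fp = fopen("data.txt", "r");
while ($line = fgets($fp, 4096)) {
    $count++;
}
fclose($fp);
?>
```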
In the table below, the faster PHP script for each load is marked with an
asterisk:

                  CPU seconds required to satisfy
                  1 request     10 requests     11 requests
   hare.php       *0.04         *0.40            0.88 (runs out of RAM)
   tortoise.php    0.06          0.60           *0.66
As the above example shows, obtaining good performance is not merely writing
fast PHP scripts. High performance PHP requires a good understanding of the
underlying hardware, the operating system and supporting software such as the
web server and database.
Bottlenecks
The hare and tortoise example has shown us that bottlenecks cause slowdowns.
With infinite RAM, hare.php will always be faster than tortoise.php. Unfortunately,
the above model is a bit simplistic and there are many other bottlenecks to
performance apart from RAM:
(a) Networking
Your network is probably the biggest bottleneck. Let us say you have a 10 Mbit
link to the Internet, over which you can pump 1 megabyte of data per second.
If each web page is 30k, a mere 33 web pages per second will saturate the line.
More subtle networking bottlenecks include frequent access to slow network
services such as DNS, or allocating insufficient memory for networking software.
(b) CPU
If you monitor your CPU load, you will find that sending plain HTML pages over
a network does not tax your CPU at all because, as we mentioned earlier, the
bottleneck will be the network. However, for the complex dynamic web pages that
PHP generates, your CPU speed will normally become the limiting factor. Having
a server with multiple processors or having a server farm can alleviate this.
(c) Shared Memory
Shared memory is used for inter-process communication, and to store resources
that are shared between multiple processes, such as cached data and code. If
insufficient shared memory is allocated, any attempt to access resources that
use shared memory, such as database connections or executable code, will perform
poorly.
(d) File System
Accessing a hard disk can be 50 to 100 times slower than reading data from
RAM. File caches using RAM can alleviate this. However low memory conditions
will reduce the amount of memory available for the file-system cache, slowing
things down. File systems can also become heavily fragmented, slowing down disk
accesses. Heavy use of symbolic links on Unix systems can slow down disk accesses
too.
Default Linux installs are also notorious for hard disk settings that are tuned
for compatibility rather than speed. Use the hdparm command to tune your Linux
hard disk settings.
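For example, a typical tuning session on an IDE disk might look like the following (the device name /dev/hda is an assumption, and the -c1/-d1 flags, while common, should be tested carefully on your hardware before being made permanent):

```shell
# measure the current buffered read speed
hdparm -t /dev/hda

# enable 32-bit I/O support (-c1) and DMA (-d1)
hdparm -c1 -d1 /dev/hda

# measure again to confirm the improvement
hdparm -t /dev/hda
```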
(e) Process Management
On some operating systems, such as Windows, creating new processes is a slow
operation. This means CGI applications that fork a new process on every invocation
will run substantially slower on these operating systems. Running PHP in multi-threaded
mode should improve response times (note: older versions of PHP are not stable
in multi-threaded mode).
Avoid overcrowding your web server with too many unneeded processes. For example,
if your server is purely for web serving, avoid running (or even installing)
X-Windows on the machine. On Windows, avoid running Microsoft Find Fast (part
of Office) and 3-dimensional screen savers that result in 100% CPU utilization.
Some of the programs that you can consider removing include unused networking
protocols, mail servers, antivirus scanners, hardware drivers for mice, infrared
ports and the like. On Unix, I assume you are accessing your server using SSH.
Then you can consider removing:
daemons such as telnetd, inetd, atd, ftpd, lpd and smbd (Samba)
sendmail for incoming mail
portmap for NFS
xfs, fvwm, xinit, X
You can also disable at startup various programs by modifying the startup files
which are usually stored in the /etc/init* or /etc/rc*/init* directory.
Also review your cron jobs to see if you can remove them or reschedule them
for off-peak periods.
(f) Connecting to Other Servers
If your web server requires services running on other servers, it is possible
that those servers become the bottleneck. The most common example of this is
a slow database server that is servicing too many complicated SQL requests from
multiple web servers.
When to Start Optimizing?
Some people say that it is better to defer tuning until after the coding
is complete. This advice only makes sense if your programming team's coding
is of a high quality to begin with, and you already have a good feel for
the performance parameters of your application. Otherwise you are exposing
yourselves to the risk of having to rewrite substantial portions of your
code after testing.
My advice is that before you design a software application, you should
do some basic benchmarks on the hardware and software to get a feel for
the maximum performance you might be able to achieve. Then as you design
and code the application, keep the desired performance parameters in mind,
because at every step of the way there will be tradeoffs between performance,
availability, security and flexibility.
Also choose good test data. If your database is expected to hold 100,000
records, avoid testing with only a 100 record database – you will regret
it. This once happened to one of the programmers in my company; we did
not detect the slow code until much later, causing a lot of wasted time
as we had to rewrite a lot of code that worked but did not scale.
Tuning Your Web Server for PHP
We will cover how to get the best PHP performance for the two most common web
servers in use today, Apache 1.3 and IIS. A lot of the advice here
is relevant for serving HTML also.
The authors of PHP have stated that there is no performance or
scalability advantage in using Apache 2.0 over Apache 1.3 with PHP,
especially in multi-threaded mode. When running Apache 2.0 in pre-forking
mode, the following discussion is still relevant (21 Oct 2003).
(a) Apache 1.3/2.0
Apache is available on both Unix and Windows. It is the most popular
web server in the world. Apache 1.3 uses a pre-forking model
for web serving. When Apache starts up, it creates multiple child
processes that handle HTTP requests. The initial parent process
acts like a guardian angel, making sure that all the child processes
are working properly and coordinating everything. As more HTTP requests
come in, more child processes are spawned to process them. As the
HTTP requests slow down, the parent will kill the idle child processes,
freeing up resources for other processes. The beauty of this scheme
is that it makes Apache extremely robust. Even if a child process
crashes, the parent and the other child processes are insulated
from the crashing child.
The pre-forking model is not as fast as some other possible designs,
but to me it is "much ado about nothing" on a server serving
PHP scripts, because other bottlenecks will kick in long before Apache
performance issues become significant. The robustness and reliability
of Apache is more important.
Apache 2.0 offers operation in multi-threaded mode. My benchmarks
indicate there is little performance advantage in this mode. Also
be warned that many PHP extensions are not compatible (e.g. GD and
IMAP). Tested with Apache 2.0.47 (21 Oct 2003).
Apache is configured using the httpd.conf file. The following parameters are
particularly important in configuring child processes:
MaxClients (default: 256)
The maximum number of child processes to create. The default means that
up to 256 HTTP requests can be handled concurrently. Any further connection
requests are queued.

StartServers (default: 5)
The number of child processes to create on startup.

MinSpareServers (default: 5)
The minimum number of idle child processes to keep in reserve. If the number
of idle children falls below this value, 1 new child is created initially,
then 2 after another second, then 4 after another second, and so forth up
to a maximum of 32 new children per second.

MaxSpareServers (default: 10)
If more than this number of idle child processes are alive, the extra
processes are terminated.

MaxRequestsPerChild (default: 0)
Sets the number of HTTP requests a child can handle before terminating.
A setting of 0 means the child never terminates. Set this to a value between
100 and 10000 if you suspect memory leaks are occurring, or to free
under-utilized resources.
For large sites, values close to the following might be better:
MinSpareServers 32
MaxSpareServers 64
Apache on Windows behaves differently. Instead of using child processes, Apache uses threads. The above parameters are not used. Instead we have one parameter: ThreadsPerChild, which defaults to 50. This parameter sets the number of threads that can be spawned by Apache. As there is only one child process in the Windows version, the default setting of 50 means only 50 concurrent HTTP requests can be handled. For web servers experiencing heavier traffic, increase this value to between 256 and 1024.
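In httpd.conf on Windows, this is a single-directive change (the value 512 here is an illustrative choice within the suggested range, not a universal recommendation):

```
# httpd.conf (Apache on Windows): allow up to 512 concurrent HTTP requests
ThreadsPerChild 512
```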
Other useful performance parameters you can change include:
SendBufferSize (default: OS default)
Determines the size of the output buffer (in bytes) used in TCP/IP connections.
This is primarily useful for congested or slow networks where packets need
to be buffered; in that case, set this parameter close to the size of the
largest file normally downloaded. One TCP/IP buffer is created per client
connection.

KeepAlive [on|off] (default: On)
In the original HTTP specification, every HTTP request had to establish
a separate connection to the server. To reduce the overhead of frequent
connects, the keep-alive header was developed. Keep-alives tell the server
to reuse the same socket connection for multiple HTTP requests. If a
separate dedicated web server serves all images, you can disable this
option; doing so can substantially improve resource utilization.

KeepAliveTimeout (default: 15)
The number of seconds to keep the socket connection alive. This time
includes the generation of content by the server and acknowledgements
by the client. If the client does not respond in time, it must make a
new connection. Keep this value low, as otherwise sockets will sit idle
for extended periods.

MaxKeepAliveRequests (default: 100)
Socket connections are terminated when the number of requests set
by MaxKeepAliveRequests is reached. Keep this at a high value below
MaxClients or ThreadsPerChild.

TimeOut (default: 300)
Disconnect when the idle time exceeds this value. You can set this value
lower if your clients have low latencies.

LimitRequestBody (default: 0)
The maximum size in bytes of a PUT or POST request body. 0 means there
is no limit.
If you do not require DNS lookups, and you are not using .htaccess files to configure Apache settings for individual directories, you can set:
# disable DNS lookups: PHP scripts only get the IP address
HostnameLookups off
# disable htaccess checks
<Directory />
AllowOverride none
</Directory>
If you are not worried about the directory security when accessing symbolic links, turn on FollowSymLinks and turn off SymLinksIfOwnerMatch to prevent additional lstat() system calls from being made:
Options FollowSymLinks
#Options SymLinksIfOwnerMatch
(b) IIS Tuning
IIS is a multi-threaded web server available on Windows NT and 2000. From the Internet Services Manager, it is possible to tune the following parameters:
Performance Tuning (Performance tab)
Based on the number of hits per day; determines how much memory to preallocate for IIS.

Bandwidth throttling (Performance tab)
Controls the bandwidth per second allocated per web site.

Process throttling (Performance tab)
Controls the CPU % available per web site.

Connection Timeout (Web Site tab)
Default is 900 seconds. Set this to a lower value on a Local Area Network.

HTTP Compression
In IIS 5, you can compress dynamic pages, HTML and images, and the server can be configured to cache compressed static HTML and images. By default, compression is off. HTTP compression has to be enabled for the entire physical server. To turn it on, open the IIS console, right-click on the server (not any of the subsites, but the server itself in the left-hand pane) and choose Properties. Click on the Service tab, then select "Compress application files" to compress dynamic content, and "Compress static files" to compress static content.
You can also configure the default isolation level of your web site. In the Home Directory tab, under Application Protection, you can define the level of isolation. A highly isolated web site will run slower because it runs as a separate process from IIS, while running the web site inside the IIS process is fastest but will bring down the server if there are serious bugs in the web site code. Currently I recommend running PHP web sites using CGI, or using ISAPI with Application Protection set to high.
You can also use regedit.exe to modify the following IIS 5 registry settings, which are stored at this location:
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Inetinfo\Parameters\
MemCacheSize
Sets the amount of memory (in megabytes) that IIS will use for its file
cache. By default IIS will use 50% of available memory. Increase this if
IIS is the only application on the server.

MaxCachedFileSize
Determines the maximum size in bytes of a file cached in the file cache.
Default is 262,144 (256K).

ObjectCacheTTL
Sets the length of time (in milliseconds) that objects in the cache are
held in memory. Default is 30,000 milliseconds (30 seconds).

MaxPoolThreads
Sets the number of pool threads to create per processor. This determines
how many CGI applications can run concurrently. Default is 4. Increase
this value if you are using PHP in CGI mode.

ListenBackLog
Specifies the maximum number of active keep-alive connections that IIS
maintains in the connection queue. Default is 15; increase this to the
number of concurrent connections you want to support. Maximum is 250.
If the settings are missing from this registry location, the defaults are being
used.
High Performance on Windows: IIS and FastCGI
After much testing, I find that the best PHP performance on Windows
is offered by using IIS with FastCGI. CGI is a protocol for calling
external programs from a web server. It is not very fast because
CGI programs are terminated after every page request. FastCGI modifies
this protocol for high performance, by making the CGI program persist
after a page request, and reusing the same CGI program when a new
page request comes in.
As the installation of FastCGI with IIS is complicated, you should
use the EasyWindows PHP Installer. This will install PHP, FastCGI
and Turck MMCache for the best performance possible. The installer
can also install PHP for Apache 1.3/2.0.
This section on FastCGI added 21 Oct 2003.
PHP4's Zend Engine
The Zend Engine is the internal compiler and runtime engine used by PHP4. Developed by Zeev Suraski and Andi Gutmans, its name is a contraction of their first names. In the early days of PHP4, it worked in the following fashion:
The PHP script was loaded by the Zend Engine and compiled into Zend opcode. Opcodes, short for operation codes, are low-level binary instructions. The opcode was then executed, and the generated HTML sent to the client. The opcode was flushed from memory after execution.
Today, there are a multitude of products and techniques to help you speed up this process. In the following diagram, we show how modern PHP scripts are processed; all the shaded boxes are optional.
PHP Scripts are loaded into memory and compiled into Zend opcodes. These opcodes
can now be optimized using an optional peephole optimizer called Zend Optimizer.
Depending on the script, it can increase the speed of your PHP code by 0-50%.
Formerly after execution, the opcodes were discarded. Now the opcodes can be
optionally cached in memory using several alternative open source products and
the Zend Accelerator (formerly Zend Cache), which is a commercial closed source
product. The only opcode cache that is compatible with the Zend Optimizer is
the Zend Accelerator. An opcode cache speeds execution by removing the script
loading and compilation steps. Execution times can improve between 10-200% using
an opcode cache.
One of the secrets of high performance is not to write faster PHP code, but
to avoid executing PHP code by caching generated HTML in a file or in shared
memory. The PHP script is only run once and the HTML is captured, and future
invocations of the script will load the cached HTML. If the data needs to be
updated regularly, an expiry value is set for the cached HTML. HTML caching
is not part of the PHP language nor Zend Engine, but implemented using PHP code.
There are many class libraries that do this. One of them is the PEAR Cache,
which we will cover in the next section. Another is the Smarty
template library.
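The idea can be sketched with a minimal file-based cache (the cache file name and the 10-minute expiry are arbitrary illustrations; libraries such as the PEAR Cache handle keys, expiry and locking more robustly):

```
<?php
$cachefile = "cache/homepage.html";
$expiry    = 600;  # seconds before the cached HTML goes stale

if (file_exists($cachefile) && time() - filemtime($cachefile) < $expiry) {
    # cache hit: send the stored HTML and skip all PHP work
    readfile($cachefile);
    exit;
}

# cache miss: capture everything the script prints
ob_start();

# ... expensive page generation goes here ...

# save the generated HTML for future requests, then send it to the client
$html = ob_get_contents();
ob_end_flush();
$fp = fopen($cachefile, "w");
fwrite($fp, $html);
fclose($fp);
?>
```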
Finally, the HTML sent to a web client can be compressed. This is enabled by
placing the following code at the beginning of your PHP script:
<?php
ob_start("ob_gzhandler");

# ... the rest of the script outputs HTML as usual ...

?>
If your HTML is highly compressible, it is possible to reduce the size of your HTML file by 50-80%, reducing network bandwidth requirements and latencies. The downside is that you need to have some CPU power to spare for compression.
HTML Caching with PEAR Cache
The PEAR Cache is a set of caching classes that allows you to cache multiple types of data, including HTML and images.
The most common use of the PEAR Cache is to cache HTML text. To do this, we use the Output buffering class which caches all text printed or echoed between the start() and end() functions:
<?php
require_once("Cache/Output.php");

$cache = new Cache_Output("file", array("cache_dir" => "cache/"));

if ($contents = $cache->start(md5("this is a unique key!"))) {
    #
    # aha, cached data returned
    #
    print $contents;
    print "<p>Cache Hit</p>";
} else {
    #
    # no cached data, or cache expired
    #
    print "<p>Don't leave home without it…</p>"; # placed in cache
    print "<p>Stand and deliver</p>";           # placed in cache
    print $cache->end(10);
}
?>
Since I wrote these lines, a superior PEAR cache system has been developed: Cache Lite;
and for more sophisticated distributed caching, see memcached (Added 28 Feb 2005).
The Cache constructor takes the storage driver to use as the first parameter. File, database and shared memory storage drivers are available; see the pear/Cache/Container directory. Benchmarks by Ulf Wendel suggest that the "file" storage driver offers the best performance. The second parameter is the storage driver options. The options are "cache_dir", the location of the caching directory, and "filename_prefix", which is the prefix to use for all cached files. Strangely enough, cache expiry times are not set in the options parameter.
To cache some data, you generate a unique id for the cached data using a key. In the above example, we used md5("this is a unique key!").
The start() function uses the key to find a cached copy of the contents. If the contents are not cached, an empty string is returned by start(), and all future echo() and print() statements will be buffered in the output cache, until end() is called.
The end() function returns the contents of the buffer, and ends output buffering. The end() function takes as its first parameter the expiry time of the cache. This parameter can be the seconds to cache the data, or a Unix integer timestamp giving the date and time to expire the data, or zero to default to 24 hours.
Another way to use the PEAR cache is to store variables or other data. To do so, you can use the base Cache class:
<?php
require_once("Cache.php");

$cache = new Cache("file", array("cache_dir" => "cache/"));

$id = $cache->generateID("this is a unique key");

if ($data = $cache->get($id)) {
    print "Cache hit.<br>Data: $data";
} else {
    $data = "The quality of mercy is not strained...";
    $cache->save($id, $data, $expires = 60);
    print "Cache miss.<br>";
}
?>
To save the data we use save(). If your unique key is already a legal file name, you can bypass the generateID() step. Objects and arrays can be saved because save() will serialize the data for you. The last parameter controls when the data expires; this can be the seconds to cache the data, or a Unix integer timestamp giving the date and time to expire the data, or zero to use the default of 24 hours. To retrieve the cached data we use get().
You can delete a cached data item using $cache->delete($id)
and remove all cached items using $cache->flush().
New: A faster Caching class is Cache-Lite.
Highly recommended.
Using Benchmarks
In the earlier sections we covered many performance issues. Now we come to the meat and bones: how to measure and benchmark your code so you can obtain decent information on what to tune.
If you want to perform realistic benchmarks on a web server, you will need a tool to send HTTP requests to the server. On Unix, common tools to perform benchmarks include ab (short for apachebench) which is part of the Apache release, and the newer flood (httpd.apache.org/test/flood). On Windows NT/2000 you can use Microsoft's free Web Application Stress Tool (webtool.rte.microsoft.com).
These programs can make multiple concurrent HTTP requests, simulating multiple web clients, and present you with detailed statistics on completion of the tests.
You can monitor how your server behaves as the benchmarks are conducted on Unix using "vmstat 1". This prints out a status report every second on the performance of your disk i/o, virtual memory and CPU load. Alternatively, you can use "top d 1" which gives you a full screen update on all processes running sorted by CPU load every 1 second.
On Windows 2000, you can use the Performance Monitor or the Task Manager to view your system statistics.
If you want to test a particular aspect of your code without having to worry about HTTP overhead, you can benchmark it using microtime(), which returns the current time, accurate to the microsecond, as a string. The following function converts it into a number suitable for calculations.
function getmicrotime()
{
list($usec, $sec) = explode(" ",microtime());
return ((float)$usec + (float)$sec);
}
$time = getmicrotime();
#
# benchmark code here
#
echo "<p>Time elapsed: ",getmicrotime() - $time, " seconds";
Alternatively, you can use a profiling
tool such as APD
or XDebug. Also see my article squeezing code with xdebug.
Benchmarking Case Study
This case study details a real benchmark we did for a client. In this instance, the customer wanted a guaranteed response time of 5 seconds for all PHP pages that did not involve running long SQL queries. The server configuration was an Apache 1.3.20 server running PHP 4.0.6 on Red Hat Linux 7.2. The hardware was a twin Pentium III 933 MHz beast with 1 GB of RAM. The HTTP requests were for the PHP script "testmysql.php", which reads and processes about 20 records from a MySQL database running on another server. For the sake of simplicity, we assume that all graphics are downloaded from another web server.
We used "ab" as the benchmarking tool. We set "ab" to perform 1000 requests (-n1000), using 10 simultaneous connections (-c10). Here are the results:
# ab -n1000 -c10 http://192.168.0.99/php/testmysql.php
This is ApacheBench, Version 1.3
Copyright (c) 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Copyright (c) 1998-1999 The Apache Group, http://www.apache.org/
Server Software: Apache/1.3.20
Server Hostname: 192.168.0.99
Server Port: 80
Document Path: /php/testmysql.php
Document Length: 25970 bytes
Concurrency Level: 10
Time taken for tests: 128.672 seconds
Complete requests: 1000
Failed requests: 0
Total transferred: 26382000 bytes
HTML transferred: 25970000 bytes
Requests per second: 7.77
Transfer rate: 205.03 kb/s received
Connnection Times (ms)
min avg max
Connect: 0 9 114
Processing: 698 1274 2071
Total: 698 1283 2185
While running the benchmark, on the server side we monitored the resource utilization
using the command "top d 1". The parameters "d 1" mean to delay 1 second between
updates. The output is shown below.
10:58pm up 3:36, 2 users, load average: 9.07, 3.29, 1.79
74 processes: 63 sleeping, 11 running, 0 zombie, 0 stopped
CPU0 states: 92.0% user, 7.0% system, 0.0% nice, 0.0% idle
CPU1 states: 95.0% user, 4.0% system, 0.0% nice, 0.0% idle
Mem: 1028484K av, 230324K used, 798160K free, 64K shrd, 27196K buff
Swap: 2040244K av, 0K used, 2040244K free 30360K cached
PID USER PRI NI SIZE RSS SHARE STAT %CPU %MEM TIME COMMAND
1142 apache 20 0 7280 7280 3780 R 21.2 0.7 0:20 httpd
1154 apache 17 0 8044 8044 3788 S 19.3 0.7 0:20 httpd
1155 apache 20 0 8052 8052 3796 R 19.3 0.7 0:20 httpd
1141 apache 15 0 6764 6764 3780 S 14.7 0.6 0:20 httpd
1174 apache 14 0 6848 6848 3788 S 12.9 0.6 0:20 httpd
1178 apache 13 0 6864 6864 3804 S 12.9 0.6 0:19 httpd
1157 apache 15 0 7536 7536 3788 R 11.0 0.7 0:19 httpd
1159 apache 15 0 7540 7540 3788 R 11.0 0.7 0:19 httpd
1148 apache 11 0 6672 6672 3784 S 10.1 0.6 0:20 httpd
1158 apache 14 0 7400 7400 3788 R 10.1 0.7 0:19 httpd
1163 apache 20 0 7540 7540 3788 R 10.1 0.7 0:19 httpd
1169 apache 12 0 6856 6856 3796 S 10.1 0.6 0:20 httpd
1176 apache 16 0 8052 8052 3796 R 10.1 0.7 0:19 httpd
1171 apache 15 0 7984 7984 3780 S 9.2 0.7 0:18 httpd
1170 apache 16 0 7204 7204 3796 R 6.4 0.7 0:20 httpd
1168 apache 10 0 6856 6856 3796 S 4.6 0.6 0:20 httpd
1377 natsoft 11 0 1104 1104 856 R 2.7 0.1 0:02 top
1152 apache 9 0 6752 6752 3788 S 1.8 0.6 0:20 httpd
1167 apache 9 0 6848 6848 3788 S 0.9 0.6 0:19 httpd
1 root 8 0 520 520 452 S 0.0 0.0 0:04 init
2 root 9 0 0 0 0 SW 0.0 0.0 0:00 keventd
Looking at the output of "top", the
twin CPU Apache server is running flat out with 0% idle time. What is worse
is that the load average is 9.07 for the past minute (and 3.29 for the
past 5 minutes, 1.79 for the past 15 minutes). The load average is the average
number of processes that are ready to be run. For a twin processor server, any
load above 2.0 means that the system is being overloaded. You might notice that
there is a close relationship between load (9.07) and the number of simultaneous
connections (10) that we have defined with ab.
Luckily we have plenty of physical
memory, with about 798 MB (798,160K) free and no virtual memory used.
Further down we can see the processes
ordered by CPU utilization. The most active ones are the Apache httpd processes.
The first httpd task is using 7280K of memory, and is taking an average of 21.2%
of CPU and 0.7% of physical memory. The STAT column indicates the status: R
is runnable, S is sleeping, and W means that the process is swapped out.
Given the above figures, and assuming this is a typical peak load, we can do
some capacity planning. If the load average is 9.0 for a twin-CPU server, and
assuming each task takes about the same amount of time to complete, then a
lightly loaded server should be 9.0 / 2 CPUs = 4.5 times faster. So an HTTP
request that took 1.283 seconds to satisfy at peak load should take about
1.283 / 4.5 = 0.285 seconds to complete.
To verify this, we benchmarked with 2 simultaneous client connections (instead
of 10 in the previous benchmark) to give an average of 0.281 seconds, very close
to the 0.285 seconds prediction!
# ab -n100 -c2 http://192.168.0.99/php/testmysql.php
[ some lines omitted for brevity ]
Requests per second: 7.10
Transfer rate: 187.37 kb/s received
Connnection Times (ms)
min avg max
Connect: 0 2 40
Processing: 255 279 292
Total: 255 281 332
Conversely, doubling the connections, we can predict that the average connection
time should double from 1.283 to 2.566 seconds. In the benchmarks, the actual
time was 2.570 seconds.
Overload on 40 connections
When we pushed the benchmark to 40 connections, the server became overloaded,
with 35% of requests failing. On further investigation, this was because
persistent connections to the MySQL server were failing with "Too many
connections" errors.
The benchmark also demonstrates the lingering behavior of Apache child
processes. Each PHP script uses 2 persistent connections, so at 40 connections
we should be using at most 80 persistent connections, well below the default
MySQL max_connections of 100. However, idle Apache child processes are not
assigned immediately to new requests due to latencies, keep-alives and other
technical reasons; these lingering child processes held the remaining 20+
persistent connections that were the straws that broke the camel's back.
The Fix
By switching to non-persistent database connections, we were able to fix this
problem and obtained a result of 5.340 seconds. An alternative solution would
have been to increase the MySQL max_connections parameter from the default of
100.
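In PHP code, switching away from persistent connections amounts to a one-line change per connection: mysql_pconnect() opens a persistent connection that survives between requests, while a mysql_connect() connection is released when the script ends (the host name and credentials below are placeholders):

```
<?php
# before: persistent connection, kept open by lingering Apache children
# $db = mysql_pconnect("dbserver", "user", "password");

# after: ordinary connection, released when the script finishes
$db = mysql_connect("dbserver", "user", "password");
?>
```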
Conclusions
The above case study once again shows us that optimizing your performance is
extremely complex. It requires an understanding of multiple software subsystems
including network routing, the TCP/IP stack, the amount of physical and virtual
memory, the number of CPUs, the behavior of Apache child processes, your PHP
scripts, and the database configuration.
In this case the PHP code was quite well tuned, so the first bottleneck was
the CPU, which caused a slowdown in response time. As the load increased, the
system slowed down in a near linear fashion (which is a good sign) until we
encountered the more serious bottleneck of MySQL client connections. This caused
multiple errors in our PHP pages until we fixed it by switching to non-persistent
connections.
From the above figures, we can calculate, for a given desired response time,
how many simultaneous HTTP connections the server can handle. Assuming two-way
network latencies of 0.5 seconds on the Internet (0.25s each way), we can
estimate response times under load from the benchmark results.
As our client wanted a maximum response time of 5 seconds, the server can handle
up to 34 simultaneous connections. Since each request takes about 5 seconds at
that load, this works out to a peak capacity of 34/5 = 6.8 page views per second.
To get the maximum number of page views a day that the server can handle, multiply
the peak capacity per second by 50,000 (this technique is suggested by the webmasters
at pair.com, a large web hosting company), to give 340,000 page views a day.
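As a sanity check, the arithmetic above can be reproduced in a few lines (all figures are the illustrative ones from this case study, not universal constants):

```php
<?php
// Back-of-the-envelope capacity estimate using the case study's figures.
$maxResponseTime = 5;    // seconds our client will tolerate
$maxConnections  = 34;   // simultaneous connections at that response time

$peakPerSecond = $maxConnections / $maxResponseTime;        // 6.8 page views/sec
$pagesPerDay   = $maxConnections * 50000 / $maxResponseTime; // pair.com rule of thumb

echo "$peakPerSecond pages/sec, $pagesPerDay pages/day\n";
```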
Code Optimizations
The patient reader who is still wondering why so much emphasis is given to
discussing non-PHP issues is reminded that PHP is a fast language, and many
of the likely bottlenecks causing slow speeds lie outside PHP.
Most PHP scripts are simple. They involve reading some session information,
loading some data from a content management system or database, formatting the
appropriate HTML and echoing the results to the HTTP client. Assuming that a
typical PHP script completes in 0.1 seconds and the Internet latency is 0.2
seconds, only 33% of the 0.3 seconds response time that the HTTP client sees
is actual PHP computation. So if you improve a script's speed by 20%, the HTTP
client will see response times drop to 0.28 seconds, which is an insignificant
improvement. Of course the server can probably handle 20% more requests for
the same page, so scalability has improved.
The above example does not mean we should throw our hands up and give up. It
means that we should not feel proud tweaking the last 1% of speed from our code,
but we should spend our time optimizing worthwhile areas of our code to get
higher returns.
High Return Code Optimizations
The places where such high returns are achievable are the while and for
loops that litter our code, where each slowdown is magnified by
the number of times the loop iterates. The best way of understanding what
can be optimized is to look at a few examples:
Example 1
Here is one simple example that prints an array:
for ($j=0; $j<sizeof($arr); $j++)
    echo $arr[$j]."<br>";
This can be substantially sped up by changing the code to:
for ($j=0, $max = sizeof($arr), $s = ''; $j<$max; $j++)
    $s .= $arr[$j]."<br>";
echo $s;
First we need to understand that the expression $j<sizeof($arr) is
evaluated multiple times, once per loop iteration. As sizeof($arr) is actually a constant
(invariant) within the loop, we cache its value in the $max variable. In technical
terms, this is called loop-invariant optimization.
The second issue is that in PHP 4, echoing multiple times is slower than storing
everything in a string and echoing it in one call. This is because
echo is an expensive operation that could involve sending
TCP/IP packets to a HTTP client. Of course accumulating the string
in $s has some scalability issues as it will use up more memory,
so you can see a trade-off is involved here.
An alternate way of speeding up the above code would be to use output buffering.
This will accumulate the output string internally, and send the output in one
shot at the end of the script. This reduces networking overhead substantially
at the cost of more memory and an increase in latency. In some of my code consisting
entirely of echo statements, performance improvements of 15% have been observed.
ob_start();
for ($j = 0, $max = sizeof($arr); $j < $max; $j++)
    echo $arr[$j]."<br>";
Note that output buffering with ob_start() can be used as a global optimization
for all PHP scripts. In long-running scripts, you will also want to flush the
output buffer periodically so that some feedback is sent to the HTTP client.
This can be done with ob_end_flush(). This function also turns off output buffering,
so you might want to call ob_start() again immediately after the flush.
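The flush-then-restart pattern described above might be sketched as follows; do_work() is a hypothetical stand-in for whatever generates each piece of output:

```php
<?php
// Sketch of periodic flushing in a long-running script.
// do_work() is a hypothetical stand-in for real per-item output.
function do_work($i) { return "item $i\n"; }

ob_start();
$flushes = 0;
for ($i = 0; $i < 5000; $i++) {
    echo do_work($i);
    if ($i % 1000 == 999) {   // every 1000 items...
        ob_end_flush();       // ...send what we have (this stops buffering)
        flush();              // push it down to the HTTP client
        ob_start();           // resume buffering for the next batch
        $flushes++;
    }
}
ob_end_flush();               // send the final batch
```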
In summary, this example has shown us how to optimize loop invariants and how
to use output buffering to speed up our code.
Example 2
In the following code, we iterate through a PEAR DB recordset, using a special
formatting function to format a row, and then we echo the results. This time,
I benchmarked the execution time at 10.2 ms (this excludes the database connection
and SQL execution time):
function FormatRow(&$recordSet)
{
    $arr = $recordSet->fetchRow();
    return '<b>'.$arr[0].'</b><i>'.$arr[1].'</i>';
}

for ($j = 0; $j < $rs->numRows(); $j++) {
    print FormatRow($rs);
}
From example 1, we learnt that we can optimize the code by changing the code
to the following (execution time: 8.7 ms):
function FormatRow(&$recordSet)
{
    $arr = $recordSet->fetchRow();
    return '<b>'.$arr[0].'</b><i>'.$arr[1].'</i>';
}

ob_start();
for ($j = 0, $max = $rs->numRows(); $j < $max; $j++) {
    print FormatRow($rs);
}
My benchmarks showed me that the use of $max contributed 0.5 ms and ob_start
contributed 1 ms to the 1.5 ms speedup.
However by changing the looping algorithm we can simplify and speed up the
code. In this case, execution time is reduced to 8.5 ms:
function FormatRow($arr)
{
    return '<b>'.$arr[0].'</b><i>'.$arr[1].'</i>';
}

ob_start();
while ($arr = $rs->fetchRow()) {
    print FormatRow($arr);
}
One last optimization is possible here. We can remove the overhead of the function
call (potentially sacrificing maintainability for speed) to shave off another
0.1 milliseconds (execution time: 8.4 ms):
ob_start();
while ($arr = $rs->fetchRow()) {
    print '<b>'.$arr[0].'</b><i>'.$arr[1].'</i>';
}
By switching to PEAR Cache, execution time dropped again to 3.5 ms for cached
data:
require_once("Cache/Output.php");

ob_start();
$cache = new Cache_Output("file", array("cache_dir" => "cache/"));

$t = getmicrotime();
if ($contents = $cache->start(md5("this is a unique key!"))) {
    print "<p>Cache Hit</p>";
    print $contents;
} else {
    print "<p>Cache Miss</p>";
    ##
    ## Code to connect and query database omitted
    ##
    while ($arr = $rs->fetchRow()) {
        print '<b>'.$arr[0].'</b><i>'.$arr[1].'</i>';
    }
    print $cache->end(100);
}
print (getmicrotime()-$t);
We summarize the optimization methods below:
Execution Time (ms) | Optimization Method
9.9 | Initial code, no optimizations, excluding database connection and SQL execution times.
9.2 | Using ob_start
8.7 | Optimizing loop invariants ($max) and using ob_start
8.5 | Changing from for-loop to while-loop, passing an array to FormatRow(), and using ob_start
8.4 | Removing FormatRow() and using ob_start
3.5 | Using PEAR Cache and using ob_start
From the above figures, you can see that the biggest speed improvements are derived
not from tweaking the code, but from simple global optimizations such as ob_start(),
or from radically different algorithms such as HTML caching.
Optimizing Object-oriented Programming
In March 2001, I conducted some informal benchmarks with classes on PHP
4.0.4pl1, and I derived some advice from the results. The three main points
are:
1. Initialise all variables before use.
2. Dereference all global/property variables that are frequently used
in a method and put the values in local variables if you plan to access
the value more than twice.
3. Try placing frequently used methods in the derived classes.
Warning: as PHP is going through a continuous improvement process, things
might change in the future.
More Details
I have found that calling an object method (a function defined in a class)
is about twice as slow as a normal function call. To me that's quite
acceptable and comparable to other OOP languages.
Inside a method (the following ratios are approximate only):
- Incrementing a local variable in a method is the fastest, nearly the
same as incrementing a local variable in a function.
- Incrementing a global variable is 2 times slower than a local variable.
- Incrementing an object property (eg. $this->prop++) is 3 times slower
than a local variable.
- Incrementing an undefined local variable is 9-10 times slower than
a pre-initialized one.
- Just declaring a global variable without using it in a function also
slows things down (by about the same amount as incrementing a local
var). PHP probably does a check to see if the global exists.
- Method invocation appears to be independent of the number of methods
defined in the class because I added 10 more methods to the test class
(before and after the test method) with no change in performance.
- Methods in derived classes run faster than ones defined in the base
class.
- A function call with one parameter and an empty function body takes
about the same time as doing 7-8 $localvar++ operations. A similar method
call is of course about 15 $localvar++ operations.
Update: 11 July 2004:
The above test was on PHP 4.0.4, about 3 years ago.
I tested this again in PHP 4.3.3 and calling a function
now takes about 20 $localvar++
operations, and calling a method takes about 30 $localvar++ operations. This could be
because $localvar++ runs faster now, or functions are slower.
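If you want to repeat these measurements on your own setup, a minimal harness along these lines will do. Note it uses closures and microtime(true), which postdate the PHP 4 versions originally tested, so expect different absolute ratios:

```php
<?php
// A small timing harness in the spirit of the tests above.
// The closure-call overhead dominates very cheap operations, so treat
// the results as rough ratios, not precise figures.
class T {
    function m() { }
}

function bench($fn, $n = 200000) {
    $start = microtime(true);
    for ($i = 0; $i < $n; $i++) {
        $fn();
    }
    return microtime(true) - $start;
}

$local = 0;
$tIncr = bench(function () use (&$local) { $local++; });

$obj   = new T;
$tCall = bench(function () use ($obj) { $obj->m(); });

printf("local++ %.4fs, method call %.4fs\n", $tIncr, $tCall);
```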
Summary of Tweaks
- The more you understand the software you are using (Apache, PHP, IIS, your
database) and the deeper your knowledge of the operating system, networking
and server hardware, the better you can perform global optimizations on your
code and your system.
- For PHP scripts, the most expensive bottleneck is normally the CPU. Twin
CPUs are probably more useful than two Gigabytes of RAM.
- Compile PHP with the "configure --enable-inline-optimization" option to
generate the fastest possible PHP executable.
- Tune your database and index the fields that are commonly used in your SQL
WHERE criteria. ADOdb, the very
popular database abstraction library, provides a SQL
tuning mode, where you can view your invalid, expensive and suspicious
SQL, their execution plans and in which PHP script the SQL was executed.
- Use HTML caching if you have data that rarely changes. Even if the data
changes every minute, caching can help provided the data is synchronized with
the cache. Depending on your code complexity, it can improve your performance
by a factor of 10.
- Benchmark your most complex code early (or at least a prototype), so you
get a feel of the expected performance before it is too late to fix. Try to
use realistic amounts of test data to ensure that it scales properly.
Updated 11 July 2004: To benchmark with an execution profile of all function calls, you can try the xdebug extension. For a brief tutorial on
how I use xdebug, see squeezing code with xdebug. There are commercial products to do this also, eg.
Zend Studio.
- Consider using an opcode cache. This gives a speedup of between 10-200%,
depending on the complexity of your code. Make sure you do some stress tests
before you install a cache because some are more reliable than others.
- Use ob_start() at the beginning of your code. This gives you a 5-15% boost
in speed for free on Apache. You can also use gzip compression for extra fast
downloads (this requires spare CPU cycles).
- Consider installing Zend Optimizer. This is free and does some optimizations,
but be warned that some scripts actually slow down when Zend Optimizer is
installed. The consensus is that Zend Optimizer is good when your code has
lots of loops. Today many opcode accelerators have similar features (added
this sentence 21 Oct 2003).
- Optimize your loops first. Move loop invariants (constants) outside the loop.
- Use the array and string functions where possible. They are faster than
writing equivalent code in PHP.
- The fastest way to concatenate multiple small strings into one large string is to create an output buffer (ob_start) and to echo into the buffer.
At the end get the contents using ob_get_contents. This works because memory allocation is normally the killer in string concatenation, and output buffering allocates a large 40K initial
buffer that grows in 10K chunks. Added 22 June 2004.
- Pass objects and arrays using references
in functions. Return objects and arrays as references where possible
also. If this is a short script, and code maintenance is not an issue, you
can consider using global variables to hold the objects or arrays.
- If you have many PHP scripts that use session variables, consider
recompiling PHP using the shared memory module for sessions, or use a RAM Disk. Enable this with
"configure --with-mm", then re-compile PHP, and set session.save_handler=mm
in php.ini.
- For searching for substrings, the fastest function is strpos(), followed
by preg_match() and lastly ereg(). Similarly, str_replace() is faster than
preg_replace(), which is faster than ereg_replace().
- Added 11 July 2004: Order large switch statements with the most frequently occurring cases on top. If some
of the most common cases are in the default section, consider explicitly defining these
cases at the top of the switch statement.
- For processing XML, parsing with regular expressions is significantly
faster than using DOM or SAX.
- Unset() variables that are not used anymore to reduce memory usage. This
is mostly useful for resources and large arrays.
- For classes with deep hierarchies, functions defined in derived classes
(child classes) are invoked faster than those defined in base class (parent
class). Consider replicating the most frequently used code in the base class
in the derived classes too.
- Consider writing your code as a PHP extension or a Java class or a COM object
if you need that extra bit of speed. Be careful of the overhead of marshalling
data between COM and Java.
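To illustrate the array- and string-function tip above, here is a tiny comparison; range() merely stands in for real data:

```php
<?php
// Built-in array functions run as compiled C code, so a single call
// beats an equivalent loop written in PHP.
$nums = range(1, 1000);

// Hand-written loop:
$total = 0;
foreach ($nums as $n) {
    $total += $n;
}

// One call into a C-implemented built-in:
$fast = array_sum($nums);

echo $fast, "\n";   // both yield 500500
```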
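The output-buffer concatenation trick mentioned above looks like this in practice (a sketch; the 1000-line loop is just filler data):

```php
<?php
// Building one large string through the output buffer instead of
// repeated $s .= ... concatenation.
ob_start();
for ($i = 0; $i < 1000; $i++) {
    echo "line $i\n";        // goes into the buffer, not to the client
}
$big = ob_get_contents();    // grab the accumulated string
ob_end_clean();              // discard the buffer without sending it

echo strlen($big), " bytes built\n";
```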
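And a short sketch of the substring-search tip, including the classic strpos() pitfall of a match at position 0:

```php
<?php
// strpos() is the cheapest substring test, but a match at position 0
// returns int(0), so always compare with === / !== rather than ==.
$haystack = "optimizing php";

if (strpos($haystack, "optimizing") !== false) {
    echo "found\n";
}

// For literal replacements, prefer str_replace() over preg_replace():
echo str_replace("php", "PHP", $haystack), "\n";
```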
Useless Optimizations
Some optimizations are useful. Others are a waste of time - sometimes the improvement is negligible, and sometimes the PHP internals change, rendering the tweak obsolete.
Here are some common PHP legends:
a. echo is faster than print
Echo is supposed to be faster because it doesn't return a value while print does. From my benchmarks with PHP 4.3, the difference is negligible. And under some situations, print is faster than echo (when ob_start is enabled).
b. strip off comments to speed up code
If you use an opcode cache, comments are already ignored. This is a myth from PHP 3 days, when each line of PHP was interpreted at run-time.
c. 'var='.$var is faster than "var=$var"
This used to be true in PHP 4.2 and earlier. This was fixed in PHP 4.3. Note (22 June 2004): apparently the 4.3 fix reduced the overhead, but not completely. However I find the performance difference to be negligible.
Do References Speed Your Code?
References do not provide any performance benefits for strings,
integers and other basic data types. For example, consider
the following code:
function TestRef(&$a)
{
    $b = $a;
    $c = $a;
}
$one = 1;
TestRef($one);
And the same code without references:
function TestNoRef($a)
{
    $b = $a;
    $c = $a;
}
$one = 1;
TestNoRef($one);
PHP does not actually create duplicate variables when "pass by value"
is used, but uses high speed reference counting internally. So in TestRef(),
$b and $c take longer to set because the references have to be tracked,
while in TestNoRef(), $b and $c just point to the original value of $a,
and the reference counter is incremented. So TestNoRef() will execute
faster than TestRef().
In contrast, functions that accept array and object
parameters have a performance advantage when references are
used. This is because arrays and objects do not use reference
counting, so multiple copies of an array or object are created
if "pass by value" is used. So the following code:
function ObjRef(&$o)
{
    $a = $o->name;
}
is faster than:
function ObjNoRef($o)
{
    $a = $o->name;
}
Note: In PHP 5, all objects are passed by reference automatically,
without the need for an explicit & in the parameter list.
PHP 5 object performance should be significantly faster.
Many thanks also to Andrei Zmievski for
reviewing this article.
(c) 2001-2005 John Lim. No reproduction of this
article
is permitted without written permission from the author.