Last revised 28 Feb 2005. If you want to see what has changed, search for this date in this article.
If you like this article, visit my blog, PHP Everywhere for related articles.
A HOWTO on Optimizing PHP
PHP is a very fast programming language, but there is more to optimizing PHP
than just speed of code execution.
In this chapter, we explain why optimizing PHP involves many factors which
are not code related, and why tuning PHP requires an understanding of how PHP
performs in relation to all the other subsystems on your server, and then identifying
bottlenecks caused by these subsystems and fixing them. We also cover how to
tune and optimize your PHP scripts so they run even faster.
Achieving High Performance
When we talk about good performance, we are not talking about how fast your
PHP scripts will run. Performance is a set of tradeoffs between scalability
and speed. Scripts tuned to use fewer resources might be slower than scripts
that perform caching, but more copies of the same script can be run at one time
on a web server.
In the example below, A.php is a sprinter that can run fast, and B.php is a
marathon runner that can jog forever at nearly the same speed. For light
loads, A.php is substantially faster, but as the web traffic increases, the
performance of B.php only drops a little bit while A.php just runs out of steam.
Let us take a more realistic example to clarify matters further. Suppose we need to write a PHP script that reads a 250K file and generates an HTML summary of the file. We write two scripts that do the same thing: hare.php, which reads the whole file into memory at once and processes it in one pass, and tortoise.php, which reads the file one line at a time, never keeping more than the longest line in memory. Tortoise.php will be slower, as multiple reads are issued, requiring more system calls.
Hare.php requires 0.04 seconds of CPU time and 10 MB of RAM, while tortoise.php requires 0.06 seconds of CPU time and 5 MB of RAM. The server has 100 MB of free physical RAM and its CPU is 99% idle. To simplify things, assume that no memory fragmentation occurs.
At 10 concurrent scripts running, hare.php will run out of memory (10 x 10
= 100). At that point, tortoise.php will still have 50 MB of free memory. The
11th concurrent script will bring hare.php to its knees as it starts
using virtual memory, slowing it down to perhaps half its original speed; each
invocation of hare.php now takes 0.08 seconds of CPU time. Meanwhile, tortoise.php
will still be running at its normal 0.06 seconds of CPU time.
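The two I/O strategies might be sketched as follows (the file name and the line-counting "summary" are hypothetical stand-ins for the real processing):

```
<?php
# hare.php style: slurp the whole file into memory in one pass.
# Fastest per request, but memory use grows with the file size.
$lines = file("data.txt");          # file() reads every line at once
$count = count($lines);

# tortoise.php style: read one line at a time.
# More system calls, but never holds more than one line in memory.
$count = 0;
$fp = fopen("data.txt", "r");
while ($line = fgets($fp, 4096)) {
    $count++;
}
fclose($fp);
?>
```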
In the table below, the faster PHP script for each load is marked with an
asterisk:

                  CPU seconds required to satisfy
                  1 request     10 requests     11 requests
   hare.php       *0.04         *0.40            0.88 (runs out of RAM)
   tortoise.php    0.06          0.60           *0.66
As the above example shows, obtaining good performance is not merely writing
fast PHP scripts. High performance PHP requires a good understanding of the
underlying hardware, the operating system and supporting software such as the
web server and database.
Bottlenecks
The hare and tortoise example has shown us that bottlenecks cause slowdowns.
With infinite RAM, hare.php will always be faster than tortoise.php. Unfortunately,
the above model is a bit simplistic and there are many other bottlenecks to
performance apart from RAM:
(a) Networking
Your network is probably the biggest bottleneck. Let us say you have a 10 Mbit
link to the Internet, over which you can pump 1 megabyte of data per second.
If each web page is 30k, a mere 33 web pages per second will saturate the line.
More subtle networking bottlenecks include frequent access to slow network
services such as DNS, or allocating insufficient memory for networking software.
(b) CPU
If you monitor your CPU load, you will find that sending plain HTML pages over
a network does not tax your CPU at all because, as we mentioned earlier, the
bottleneck will be the network. However, for the complex dynamic web pages that
PHP generates, your CPU speed will normally become the limiting factor. Having
a server with multiple processors or having a server farm can alleviate this.
(c) Shared Memory
Shared memory is used for inter-process communication, and to store resources
that are shared between multiple processes, such as cached data and code. If
insufficient shared memory is allocated, any attempt to access resources that
use shared memory, such as database connections or executable code, will perform
poorly.
(d) File System
Accessing a hard disk can be 50 to 100 times slower than reading data from
RAM. File caches using RAM can alleviate this. However low memory conditions
will reduce the amount of memory available for the file-system cache, slowing
things down. File systems can also become heavily fragmented, slowing down disk
accesses. Heavy use of symbolic links on Unix systems can slow down disk accesses
too.
Default Linux installs are also notorious for hard disk settings that are tuned
for compatibility rather than speed. Use the hdparm command to tune your Linux
hard disk settings.
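For example, a typical tuning session on an IDE disk might look like the following (the device name /dev/hda is an assumption, and the -c1/-d1 flags, while common, should be tested carefully on your hardware before being made permanent):

```shell
# measure the current buffered read speed
hdparm -t /dev/hda

# enable 32-bit I/O support (-c1) and DMA (-d1)
hdparm -c1 -d1 /dev/hda

# measure again to confirm the improvement
hdparm -t /dev/hda
```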
(e) Process Management
On some operating systems, such as Windows, creating new processes is a slow
operation. This means CGI applications that fork a new process on every invocation
will run substantially slower on these operating systems. Running PHP in multi-threaded
mode should improve response times (note: older versions of PHP are not stable
in multi-threaded mode).
Avoid overcrowding your web server with too many unneeded processes. For example,
if your server is purely for web serving, avoid running (or even installing)
X-Windows on the machine. On Windows, avoid running Microsoft Find Fast (part
of Office) and 3-dimensional screen savers that result in 100% CPU utilization.
Some of the programs that you can consider removing include unused networking
protocols, mail servers, antivirus scanners, hardware drivers for mice, infrared
ports and the like. On Unix, I assume you are accessing your server using SSH.
Then you can consider removing:
daemons such as telnetd, inetd, atd, ftpd, lpd and smbd (Samba)
sendmail for incoming mail
portmap for NFS
xfs, fvwm, xinit, X
You can also disable at startup various programs by modifying the startup files
which are usually stored in the /etc/init* or /etc/rc*/init* directory.
Also review your cron jobs to see if you can remove them or reschedule them
for off-peak periods.
(f) Connecting to Other Servers
If your web server requires services running on other servers, it is possible
that those servers become the bottleneck. The most common example of this is
a slow database server that is servicing too many complicated SQL requests from
multiple web servers.
When to Start Optimizing?
Some people say that it is better to defer tuning until after the coding
is complete. This advice only makes sense if your programming team's coding
is of a high quality to begin with, and you already have a good feel for
the performance parameters of your application. Otherwise you are exposing
yourselves to the risk of having to rewrite substantial portions of your
code after testing.
My advice is that before you design a software application, you should
do some basic benchmarks on the hardware and software to get a feel for
the maximum performance you might be able to achieve. Then as you design
and code the application, keep the desired performance parameters in mind,
because at every step of the way there will be tradeoffs between performance,
availability, security and flexibility.
Also choose good test data. If your database is expected to hold 100,000
records, avoid testing with only a 100 record database – you will regret
it. This once happened to one of the programmers in my company; we did
not detect the slow code until much later, causing a lot of wasted time
as we had to rewrite a lot of code that worked but did not scale.
Tuning Your Web Server for PHP
We will cover how to get the best PHP performance for the two most common web
servers in use today, Apache 1.3 and IIS. A lot of the advice here
is relevant for serving HTML also.
The authors of PHP have stated that there is no performance or
scalability advantage in using Apache 2.0 over Apache 1.3 with PHP,
especially in multi-threaded mode. When running Apache 2.0 in pre-forking
mode, the following discussion is still relevant (21 Oct 2003).
(a) Apache 1.3/2.0
Apache is available on both Unix and Windows. It is the most popular
web server in the world. Apache 1.3 uses a pre-forking model
for web serving. When Apache starts up, it creates multiple child
processes that handle HTTP requests. The initial parent process
acts like a guardian angel, making sure that all the child processes
are working properly and coordinating everything. As more HTTP requests
come in, more child processes are spawned to process them. As the
HTTP requests slow down, the parent will kill the idle child processes,
freeing up resources for other processes. The beauty of this scheme
is that it makes Apache extremely robust. Even if a child process
crashes, the parent and the other child processes are insulated
from the crashing child.
The pre-forking model is not as fast as some other possible designs,
but to me it is "much ado about nothing" on a server serving
PHP scripts, because other bottlenecks will kick in long before Apache
performance issues become significant. The robustness and reliability
of Apache is more important.
Apache 2.0 offers operation in multi-threaded mode. My benchmarks
indicate there is little performance advantage in this mode. Also
be warned that many PHP extensions are not compatible (e.g. GD and
IMAP). Tested with Apache 2.0.47 (21 Oct 2003).
Apache is configured using the httpd.conf file. The following parameters are
particularly important in configuring child processes:
MaxClients (default: 256)
The maximum number of child processes to create. The default means that
up to 256 HTTP requests can be handled concurrently. Any further connection
requests are queued.

StartServers (default: 5)
The number of child processes to create on startup.

MinSpareServers (default: 5)
The minimum number of idle child processes to keep in reserve. If the number
of idle children falls below this value, 1 new child is created initially,
then 2 after another second, then 4 after another second, and so forth up
to a maximum of 32 new children per second.

MaxSpareServers (default: 10)
If more than this number of idle child processes are alive, the extra
processes are terminated.

MaxRequestsPerChild (default: 0)
Sets the number of HTTP requests a child can handle before terminating.
A setting of 0 means the child never terminates. Set this to a value between
100 and 10000 if you suspect memory leaks are occurring, or to free
under-utilized resources.
For large sites, values close to the following might be better:
MinSpareServers 32
MaxSpareServers 64
Apache on Windows behaves differently. Instead of using child processes, Apache uses threads. The above parameters are not used. Instead we have one parameter: ThreadsPerChild, which defaults to 50. This parameter sets the number of threads that can be spawned by Apache. As there is only one child process in the Windows version, the default setting of 50 means only 50 concurrent HTTP requests can be handled. For web servers experiencing heavier traffic, increase this value to between 256 and 1024.
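In httpd.conf on Windows, this is a single-directive change (the value 512 here is an illustrative choice within the suggested range, not a universal recommendation):

```
# httpd.conf (Apache on Windows): allow up to 512 concurrent HTTP requests
ThreadsPerChild 512
```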
Other useful performance parameters you can change include:
SendBufferSize (default: OS default)
Determines the size of the output buffer (in bytes) used in TCP/IP connections.
This is primarily useful for congested or slow networks where packets need
to be buffered; in that case, set this parameter close to the size of the
largest file normally downloaded. One TCP/IP buffer is created per client
connection.

KeepAlive [on|off] (default: On)
In the original HTTP specification, every HTTP request had to establish
a separate connection to the server. To reduce the overhead of frequent
connects, the keep-alive header was developed. Keep-alives tell the server
to reuse the same socket connection for multiple HTTP requests. If a
separate dedicated web server serves all images, you can disable this
option; doing so can substantially improve resource utilization.

KeepAliveTimeout (default: 15)
The number of seconds to keep the socket connection alive. This time
includes the generation of content by the server and acknowledgements
by the client. If the client does not respond in time, it must make a
new connection. Keep this value low, as otherwise sockets will sit idle
for extended periods.

MaxKeepAliveRequests (default: 100)
Socket connections are terminated when the number of requests set
by MaxKeepAliveRequests is reached. Keep this at a high value below
MaxClients or ThreadsPerChild.

TimeOut (default: 300)
Disconnect when the idle time exceeds this value. You can set this value
lower if your clients have low latencies.

LimitRequestBody (default: 0)
The maximum size in bytes of a PUT or POST request body. 0 means there
is no limit.
If you do not require DNS lookups, and you are not using .htaccess files to configure Apache settings for individual directories, you can set:
# disable DNS lookups: PHP scripts only get the IP address
HostnameLookups off
# disable htaccess checks
<Directory />
AllowOverride none
</Directory>
If you are not worried about the directory security when accessing symbolic links, turn on FollowSymLinks and turn off SymLinksIfOwnerMatch to prevent additional lstat() system calls from being made:
Options FollowSymLinks
#Options SymLinksIfOwnerMatch
(b) IIS Tuning
IIS is a multi-threaded web server available on Windows NT and 2000. From the Internet Services Manager, it is possible to tune the following parameters:
Performance Tuning (Performance tab)
Based on the number of hits per day; determines how much memory to preallocate for IIS.

Bandwidth throttling (Performance tab)
Controls the bandwidth per second allocated per web site.

Process throttling (Performance tab)
Controls the CPU % available per web site.

Connection Timeout (Web Site tab)
Default is 900 seconds. Set this to a lower value on a Local Area Network.

HTTP Compression
In IIS 5, you can compress dynamic pages, HTML and images, and the server can be configured to cache compressed static HTML and images. By default, compression is off. HTTP compression has to be enabled for the entire physical server. To turn it on, open the IIS console, right-click on the server (not any of the subsites, but the server itself in the left-hand pane) and choose Properties. Click on the Service tab, then select "Compress application files" to compress dynamic content, and "Compress static files" to compress static content.
You can also configure the default isolation level of your web site. In the Home Directory tab, under Application Protection, you can define the level of isolation. A highly isolated web site will run slower because it runs as a separate process from IIS, while running the web site inside the IIS process is fastest but will bring down the server if there are serious bugs in the web site code. Currently I recommend running PHP web sites using CGI, or using ISAPI with Application Protection set to high.
You can also use regedit.exe to modify the following IIS 5 registry settings, which are stored at this location:
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Inetinfo\Parameters\
MemCacheSize
Sets the amount of memory (in megabytes) that IIS will use for its file
cache. By default IIS will use 50% of available memory. Increase this if
IIS is the only application on the server.

MaxCachedFileSize
Determines the maximum size in bytes of a file cached in the file cache.
Default is 262,144 (256K).

ObjectCacheTTL
Sets the length of time (in milliseconds) that objects in the cache are
held in memory. Default is 30,000 milliseconds (30 seconds).

MaxPoolThreads
Sets the number of pool threads to create per processor. This determines
how many CGI applications can run concurrently. Default is 4. Increase
this value if you are using PHP in CGI mode.

ListenBackLog
Specifies the maximum number of active keep-alive connections that IIS
maintains in the connection queue. Default is 15; increase this to the
number of concurrent connections you want to support. Maximum is 250.
If the settings are missing from this registry location, the defaults are being
used.
High Performance on Windows: IIS and FastCGI
After much testing, I find that the best PHP performance on Windows
is offered by using IIS with FastCGI. CGI is a protocol for calling
external programs from a web server. It is not very fast because
CGI programs are terminated after every page request. FastCGI modifies
this protocol for high performance, by making the CGI program persist
after a page request, and reusing the same CGI program when a new
page request comes in.
As the installation of FastCGI with IIS is complicated, you should
use the EasyWindows PHP Installer. This will install PHP, FastCGI
and Turck MMCache for the best performance possible. The installer
can also install PHP for Apache 1.3/2.0.
This section on FastCGI added 21 Oct 2003.
PHP4's Zend Engine
The Zend Engine is the internal compiler and runtime engine used by PHP4. Developed by Zeev Suraski and Andi Gutmans, its name is a contraction of their first names. In the early days of PHP4, it worked in the following fashion:
The PHP script was loaded by the Zend Engine and compiled into Zend opcode. Opcodes, short for operation codes, are low-level binary instructions. The opcode was then executed, and the generated HTML sent to the client. The opcode was flushed from memory after execution.
Today, there are a multitude of products and techniques to help you speed up this process. In the following diagram, we show how modern PHP scripts are processed; all the shaded boxes are optional.
PHP Scripts are loaded into memory and compiled into Zend opcodes. These opcodes
can now be optimized using an optional peephole optimizer called Zend Optimizer.
Depending on the script, it can increase the speed of your PHP code by 0-50%.
Formerly after execution, the opcodes were discarded. Now the opcodes can be
optionally cached in memory using several alternative open source products and
the Zend Accelerator (formerly Zend Cache), which is a commercial closed source
product. The only opcode cache that is compatible with the Zend Optimizer is
the Zend Accelerator. An opcode cache speeds execution by removing the script
loading and compilation steps. Execution times can improve between 10-200% using
an opcode cache.
One of the secrets of high performance is not to write faster PHP code, but
to avoid executing PHP code by caching generated HTML in a file or in shared
memory. The PHP script is only run once and the HTML is captured, and future
invocations of the script will load the cached HTML. If the data needs to be
updated regularly, an expiry value is set for the cached HTML. HTML caching
is not part of the PHP language nor Zend Engine, but implemented using PHP code.
There are many class libraries that do this. One of them is the PEAR Cache,
which we will cover in the next section. Another is the Smarty
template library.
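The idea can be sketched with a minimal file-based cache (the cache file name and the 10-minute expiry are arbitrary illustrations; libraries such as the PEAR Cache handle keys, expiry and locking more robustly):

```
<?php
$cachefile = "cache/homepage.html";
$expiry    = 600;  # seconds before the cached HTML goes stale

if (file_exists($cachefile) && time() - filemtime($cachefile) < $expiry) {
    # cache hit: send the stored HTML and skip all PHP work
    readfile($cachefile);
    exit;
}

# cache miss: capture everything the script prints
ob_start();

# ... expensive page generation goes here ...

# save the generated HTML for future requests, then send it to the client
$html = ob_get_contents();
ob_end_flush();
$fp = fopen($cachefile, "w");
fwrite($fp, $html);
fclose($fp);
?>
```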
Finally, the HTML sent to a web client can be compressed. This is enabled by
placing the following code at the beginning of your PHP script:
<?php
ob_start("ob_gzhandler");

# ... the rest of the script outputs HTML as usual ...

?>
If your HTML is highly compressible, it is possible to reduce the size of your HTML file by 50-80%, reducing network bandwidth requirements and latencies. The downside is that you need to have some CPU power to spare for compression.
HTML Caching with PEAR Cache
The PEAR Cache is a set of caching classes that allows you to cache multiple types of data, including HTML and images.
The most common use of the PEAR Cache is to cache HTML text. To do this, we use the Output buffering class which caches all text printed or echoed between the start() and end() functions:
<?php
require_once("Cache/Output.php");

$cache = new Cache_Output("file", array("cache_dir" => "cache/"));

if ($contents = $cache->start(md5("this is a unique key!"))) {
    #
    # aha, cached data returned
    #
    print $contents;
    print "<p>Cache Hit</p>";
} else {
    #
    # no cached data, or cache expired
    #
    print "<p>Don't leave home without it…</p>"; # placed in cache
    print "<p>Stand and deliver</p>";           # placed in cache
    print $cache->end(10);
}
?>
Since I wrote these lines, a superior PEAR cache system has been developed: Cache Lite;
and for more sophisticated distributed caching, see memcached (Added 28 Feb 2005).
The Cache constructor takes the storage driver to use as the first parameter. File, database and shared memory storage drivers are available; see the pear/Cache/Container directory. Benchmarks by Ulf Wendel suggest that the "file" storage driver offers the best performance. The second parameter is the storage driver options. The options are "cache_dir", the location of the caching directory, and "filename_prefix", which is the prefix to use for all cached files. Strangely enough, cache expiry times are not set in the options parameter.
To cache some data, you generate a unique id for the cached data using a key. In the above example, we used md5("this is a unique key!").
The start() function uses the key to find a cached copy of the contents. If the contents are not cached, an empty string is returned by start(), and all future echo() and print() statements will be buffered in the output cache, until end() is called.
The end() function returns the contents of the buffer, and ends output buffering. The end() function takes as its first parameter the expiry time of the cache. This parameter can be the seconds to cache the data, or a Unix integer timestamp giving the date and time to expire the data, or zero to default to 24 hours.
Another way to use the PEAR cache is to store variables or other data. To do so, you can use the base Cache class:
<?php
require_once("Cache.php");

$cache = new Cache("file", array("cache_dir" => "cache/"));

$id = $cache->generateID("this is a unique key");

if ($data = $cache->get($id)) {
    print "Cache hit.<br>Data: $data";
} else {
    $data = "The quality of mercy is not strained...";
    $cache->save($id, $data, $expires = 60);
    print "Cache miss.<br>";
}
?>
To save the data we use save(). If your unique key is already a legal file name, you can bypass the generateID() step. Objects and arrays can be saved because save() will serialize the data for you. The last parameter controls when the data expires; this can be the seconds to cache the data, or a Unix integer timestamp giving the date and time to expire the data, or zero to use the default of 24 hours. To retrieve the cached data we use get().
You can delete a cached data item using $cache->delete($id)
and remove all cached items using $cache->flush().
New: A faster Caching class is Cache-Lite.
Highly recommended.
Using Benchmarks
In the earlier sections we covered many performance issues. Now we come to the meat and bones: how to measure and benchmark your code so you can obtain decent information on what to tune.
If you want to perform realistic benchmarks on a web server, you will need a tool to send HTTP requests to the server. On Unix, common tools to perform benchmarks include ab (short for apachebench) which is part of the Apache release, and the newer flood (httpd.apache.org/test/flood). On Windows NT/2000 you can use Microsoft's free Web Application Stress Tool (webtool.rte.microsoft.com).
These programs can make multiple concurrent HTTP requests, simulating multiple web clients, and present you with detailed statistics on completion of the tests.
You can monitor how your server behaves as the benchmarks are conducted on Unix using "vmstat 1". This prints out a status report every second on the performance of your disk i/o, virtual memory and CPU load. Alternatively, you can use "top d 1" which gives you a full screen update on all processes running sorted by CPU load every 1 second.
On Windows 2000, you can use the Performance Monitor or the Task Manager to view your system statistics.
If you want to test a particular aspect of your code without having to worry about HTTP overhead, you can benchmark it using microtime(), which returns the current time, accurate to the microsecond, as a string. The following function converts it into a number suitable for calculations.
function getmicrotime()
{
list($usec, $sec) = explode(" ",microtime());
return ((float)$usec + (float)$sec);
}
$time = getmicrotime();
#
# benchmark code here
#
echo "<p>Time elapsed: ",getmicrotime() - $time, " seconds";
Alternatively, you can use a profiling
tool such as APD
or XDebug. Also see my article squeezing code with xdebug.
Benchmarking Case Study
This case study details a real benchmark we did for a client. In this instance, the customer wanted a guaranteed response time of 5 seconds for all PHP pages that did not involve running long SQL queries. The server configuration was an Apache 1.3.20 server running PHP 4.0.6 on Red Hat Linux 7.2. The hardware was a twin Pentium III 933 MHz beast with 1 GB of RAM. The HTTP requests were for the PHP script "testmysql.php", which reads and processes about 20 records from a MySQL database running on another server. For the sake of simplicity, we assume that all graphics are downloaded from another web server.
We used "ab" as the benchmarking tool. We set "ab" to perform 1000 requests (-n1000), using 10 simultaneous connections (-c10). Here are the results:
# ab -n1000 -c10 http://192.168.0.99/php/testmysql.php
This is ApacheBench, Version 1.3
Copyright (c) 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Copyright (c) 1998-1999 The Apache Group, http://www.apache.org/
Server Software: Apache/1.3.20
Server Hostname: 192.168.0.99
Server Port: 80
Document Path: /php/testmysql.php
Document Length: 25970 bytes
Concurrency Level: 10
Time taken for tests: 128.672 seconds
Complete requests: 1000
Failed requests: 0
Total transferred: 26382000 bytes
HTML transferred: 25970000 bytes
Requests per second: 7.77
Transfer rate: 205.03 kb/s received
Connnection Times (ms)
min avg max
Connect: 0 9 114
Processing: 698 1274 2071
Total: 698 1283 2185
While running the benchmark, on the server side we monitored the resource utilization
using the command "top d 1". The parameters "d 1" mean to delay 1 second between
updates. The output is shown below.
10:58pm up 3:36, 2 users, load average: 9.07, 3.29, 1.79
74 processes: 63 sleeping, 11 running, 0 zombie, 0 stopped
CPU0 states: 92.0% user, 7.0% system, 0.0% nice, 0.0% idle
CPU1 states: 95.0% user, 4.0% system, 0.0% nice, 0.0% idle
Mem: 1028484K av, 230324K used, 798160K free, 64K shrd, 27196K buff
Swap: 2040244K av, 0K used, 2040244K free 30360K cached
PID USER PRI NI SIZE RSS SHARE STAT %CPU %MEM TIME COMMAND
1142 apache 20 0 7280 7280 3780 R 21.2 0.7 0:20 httpd
1154 apache 17 0 8044 8044 3788 S 19.3 0.7 0:20 httpd
1155 apache 20 0 8052 8052 3796 R 19.3 0.7 0:20 httpd
1141 apache 15 0 6764 6764 3780 S 14.7 0.6 0:20 httpd
1174 apache 14 0 6848 6848 3788 S 12.9 0.6 0:20 httpd
1178 apache 13 0 6864 6864 3804 S 12.9 0.6 0:19 httpd
1157 apache 15 0 7536 7536 3788 R 11.0 0.7 0:19 httpd
1159 apache 15 0 7540 7540 3788 R 11.0 0.7 0:19 httpd
1148 apache 11 0 6672 6672 3784 S 10.1 0.6 0:20 httpd
1158 apache 14 0 7400 7400 3788 R 10.1 0.7 0:19 httpd
1163 apache 20 0 7540 7540 3788 R 10.1 0.7 0:19 httpd
1169 apache 12 0 6856 6856 3796 S 10.1 0.6 0:20 httpd
1176 apache 16 0 8052 8052 3796 R 10.1 0.7 0:19 httpd
1171 apache 15 0 7984 7984 3780 S 9.2 0.7 0:18 httpd
1170 apache 16 0 7204 7204 3796 R 6.4 0.7 0:20 httpd
1168 apache 10 0 6856 6856 3796 S 4.6 0.6 0:20 httpd
1377 natsoft 11 0 1104 1104 856 R 2.7 0.1 0:02 top
1152 apache 9 0 6752 6752 3788 S 1.8 0.6 0:20 httpd
1167 apache 9 0 6848 6848 3788 S 0.9 0.6 0:19 httpd
1 root 8 0 520 520 452 S 0.0 0.0 0:04 init
2 root 9 0 0 0 0 SW 0.0 0.0 0:00 keventd
Looking at the output of "top", the
twin CPU Apache server is running flat out with 0% idle time. What is worse
is that the load average is 9.07 for the past minute (and 3.29 for the
past 5 minutes, 1.79 for the past 15 minutes). The load average is the average
number of processes that are ready to be run. For a twin processor server, any
load above 2.0 means that the system is being overloaded. You might notice that
there is a close relationship between load (9.07) and the number of simultaneous
connections (10) that we have defined with ab.
Luckily we have plenty of physical
memory, with about 798 MB (798,160K) free and no virtual memory used.
Further down we can see the processes
ordered by CPU utilization. The most active ones are the Apache httpd processes.
The first httpd task is using 7280K of memory, and is taking an average of 21.2%
of CPU and 0.7% of physical memory. The STAT column indicates the status: R
is runnable, S is sleeping, and W means that the process is swapped out.
Given the above figures, and assuming this is a typical peak load, we can do
some capacity planning. If the load average is 9.0 for a twin-CPU server, and
assuming each task takes about the same amount of time to complete, then a
lightly loaded server should be 9.0 / 2 CPUs = 4.5 times faster. So an HTTP
request that took 1.283 seconds to satisfy at peak load should take about
1.283 / 4.5 = 0.285 seconds to complete.
To verify this, we benchmarked with 2 simultaneous client connections (instead
of 10 in the previous benchmark) to give an average of 0.281 seconds, very close
to the 0.285 seconds prediction!
# ab -n100 -c2 http://192.168.0.99/php/testmysql.php
[ some lines omitted for brevity ]
Requests per second: 7.10
Transfer rate: 187.37 kb/s received
Connnection Times (ms)
min avg max
Connect: 0 2 40
Processing: 255 279 292
Total: 255 281 332
Conversely, doubling the connections, we can predict that the average connection
time should double from 1.283 to 2.566 seconds. In the benchmarks, the actual
time was 2.570 seconds.
Overload on 40 connections
When we pushed the benchmark to 40 connections, the server became overloaded,
with 35% of requests failing. On further investigation, this was because
persistent connections to the MySQL server were failing with "Too many
connections" errors.
The benchmark also demonstrates the lingering behavior of Apache child
processes. Each PHP script uses 2 persistent connections, so at 40 connections
we should be using at most 80 persistent connections, well below the default
MySQL max_connections of 100. However, idle Apache child processes are not
assigned immediately to new requests due to latencies, keep-alives and other
technical reasons; these lingering child processes held the remaining 20+
persistent connections that were the straws that broke the camel's back.
The Fix
By switching to non-persistent database connections, we were able to fix this
problem and obtained a result of 5.340 seconds. An alternative solution would
have been to increase the MySQL max_connections parameter from the default of
100.
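In PHP code, switching away from persistent connections amounts to a one-line change per connection: mysql_pconnect() opens a persistent connection that survives between requests, while a mysql_connect() connection is released when the script ends (the host name and credentials below are placeholders):

```
<?php
# before: persistent connection, kept open by lingering Apache children
# $db = mysql_pconnect("dbserver", "user", "password");

# after: ordinary connection, released when the script finishes
$db = mysql_connect("dbserver", "user", "password");
?>
```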
Conclusions
The above case study once again shows us that optimizing your performance is
extremely complex. It requires an understanding of multiple software subsystems
including network routing, the TCP/IP stack, the amount of physical and virtual
memory, the number of CPUs, the behavior of Apache child processes, your PHP
scripts, and the database configuration.
In this case the PHP code was quite well tuned, so the first bottleneck was
the CPU, which caused a slowdown in response time. As the load increased, the
system slowed down in a near linear fashion (which is a good sign) until we
encountered the more serious bottleneck of MySQL client connections. This caused
multiple errors in our PHP pages until we fixed it by switching to non-persistent
connections.
From the above figures, we can calculate, for a given desired response time,
how many simultaneous HTTP connections the server can handle. Assuming two-way
network latencies of 0.5 seconds on the Internet (0.25s each way), we can
estimate response times under load from the benchmark results.
As our client wanted a maximum response time of 5 seconds, the server can handle
up to 34 simultaneous connections. Since each request takes about 5 seconds at
that load, this works out to a peak capacity of 34/5 = 6.8 page views per second.
To get the maximum number of page views a day that the server can handle, multiply
the peak capacity per second by 50,000 (this technique is suggested by the webmasters
at pair.com, a large web hosting company), to give 340,000 page views a day.
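As a sanity check, the arithmetic above can be reproduced in a few lines (all figures are the illustrative ones from this case study, not universal constants):

```php
<?php
// Back-of-the-envelope capacity estimate using the case study's figures.
$maxResponseTime = 5;    // seconds our client will tolerate
$maxConnections  = 34;   // simultaneous connections at that response time

$peakPerSecond = $maxConnections / $maxResponseTime;        // 6.8 page views/sec
$pagesPerDay   = $maxConnections * 50000 / $maxResponseTime; // pair.com rule of thumb

echo "$peakPerSecond pages/sec, $pagesPerDay pages/day\n";
```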
Code Optimizations
The patient reader who is still wondering why so much emphasis is given to
discussing non-PHP issues is reminded that PHP is a fast language, and many
of the likely bottlenecks causing slow speeds lie outside PHP.
Most PHP scripts are simple. They involve reading some session information,
loading some data from a content management system or database, formatting the
appropriate HTML and echoing the results to the HTTP client. Assuming that a
typical PHP script completes in 0.1 seconds and the Internet latency is 0.2
seconds, only 33% of the 0.3 seconds response time that the HTTP client sees
is actual PHP computation. So if you improve a script's speed by 20%, the HTTP
client will see response times drop to 0.28 seconds, which is an insignificant
improvement. Of course the server can probably handle 20% more requests for
the same page, so scalability has improved.
The above example does not mean we should throw our hands up and give up. It
means that we should not feel proud tweaking the last 1% of speed from our code,
but we should spend our time optimizing worthwhile areas of our code to get
higher returns.
High Return Code Optimizations
The places where such high returns are achievable are the while and for
loops that litter our code, where each slowdown is magnified by
the number of times the loop iterates. The best way of understanding what
can be optimized is to look at a few examples:
Example 1
Here is one simple example that prints an array:
for ($j=0; $j<sizeof($arr); $j++)
    echo $arr[$j]."<br>";
This can be substantially sped up by changing the code to:
for ($j=0, $max = sizeof($arr), $s = ''; $j<$max; $j++)
    $s .= $arr[$j]."<br>";
echo $s;
First we need to understand that the expression $j<sizeof($arr) is
evaluated multiple times, once per loop iteration. As sizeof($arr) is actually a constant
(invariant) within the loop, we cache its value in the $max variable. In technical
terms, this is called loop-invariant optimization.
The second issue is that in PHP 4, echoing multiple times is slower than storing
everything in a string and echoing it in one call. This is because
echo is an expensive operation that could involve sending
TCP/IP packets to a HTTP client. Of course accumulating the string
in $s has some scalability issues as it will use up more memory,
so you can see a trade-off is involved here.
An alternate way of speeding up the above code would be to use output buffering.
This will accumulate the output string internally, and send the output in one
shot at the end of the script. This reduces networking overhead substantially
at the cost of more memory and an increase in latency. In some of my code consisting
entirely of echo statements, performance improvements of 15% have been observed.
ob_start();
for ($j = 0, $max = sizeof($arr); $j < $max; $j++)
    echo $arr[$j]."<br>";
Note that output buffering with ob_start() can be used as a global optimization
for all PHP scripts. In long-running scripts, you will also want to flush the
output buffer periodically so that some feedback is sent to the HTTP client.
This can be done with ob_end_flush(). This function also turns off output buffering,
so you might want to call ob_start() again immediately after the flush.
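The flush-then-restart pattern described above might be sketched as follows; do_work() is a hypothetical stand-in for whatever generates each piece of output:

```php
<?php
// Sketch of periodic flushing in a long-running script.
// do_work() is a hypothetical stand-in for real per-item output.
function do_work($i) { return "item $i\n"; }

ob_start();
$flushes = 0;
for ($i = 0; $i < 5000; $i++) {
    echo do_work($i);
    if ($i % 1000 == 999) {   // every 1000 items...
        ob_end_flush();       // ...send what we have (this stops buffering)
        flush();              // push it down to the HTTP client
        ob_start();           // resume buffering for the next batch
        $flushes++;
    }
}
ob_end_flush();               // send the final batch
```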
In summary, this example has shown us how to optimize loop invariants and how
to use output buffering to speed up our code.
Example 2
In the following code, we iterate through a PEAR DB recordset, using a special
formatting function to format a row, and then we echo the results. This time,
I benchmarked the execution time at 10.2 ms (this excludes the database connection
and SQL execution time):
function FormatRow(&$recordSet)
{
    $arr = $recordSet->fetchRow();
    return '<b>'.$arr[0].'</b><i>'.$arr[1].'</i>';
}

for ($j = 0; $j < $rs->numRows(); $j++) {
    print FormatRow($rs);
}
From example 1, we learnt that we can optimize the code by changing the code
to the following (execution time: 8.7 ms):
function FormatRow(&$recordSet)
{
    $arr = $recordSet->fetchRow();
    return '<b>'.$arr[0].'</b><i>'.$arr[1].'</i>';
}

ob_start();
for ($j = 0, $max = $rs->numRows(); $j < $max; $j++) {
    print FormatRow($rs);
}
My benchmarks showed me that the use of $max contributed 0.5 ms and ob_start
contributed 1 ms to the 1.5 ms speedup.
However by changing the looping algorithm we can simplify and speed up the
code. In this case, execution time is reduced to 8.5 ms:
function FormatRow($arr)
{
    return '<b>'.$arr[0].'</b><i>'.$arr[1].'</i>';
}

ob_start();
while ($arr = $rs->fetchRow()) {
    print FormatRow($arr);
}
One last optimization is possible here. We can remove the overhead of the function
call (potentially sacrificing maintainability for speed) to shave off another
0.1 milliseconds (execution time: 8.4 ms):
ob_start();
while ($arr = $rs->fetchRow()) {
    print '<b>'.$arr[0].'</b><i>'.$arr[1].'</i>';
}
By switching to PEAR Cache, execution time dropped again to 3.5 ms for cached
data:
require_once("Cache/Output.php");

ob_start();
$cache = new Cache_Output("file", array("cache_dir" => "cache/"));

$t = getmicrotime();
if ($contents = $cache->start(md5("this is a unique key!"))) {
    print "<p>Cache Hit</p>";
    print $contents;
} else {
    print "<p>Cache Miss</p>";
    ##
    ## Code to connect and query database omitted
    ##
    while ($arr = $rs->fetchRow()) {
        print '<b>'.$arr[0].'</b><i>'.$arr[1].'</i>';
    }
    print $cache->end(100);
}
print (getmicrotime()-$t);
We summarize the optimization methods below:
Execution Time (ms) | Optimization Method
9.9 | Initial code, no optimizations, excluding database connection and SQL execution times.
9.2 | Using ob_start
8.7 | Optimizing loop invariants ($max) and using ob_start
8.5 | Changing from for-loop to while-loop, passing an array to FormatRow(), and using ob_start
8.4 | Removing FormatRow() and using ob_start
3.5 | Using PEAR Cache and using ob_start
From the above figures, you can see that the biggest speed improvements are derived
not from tweaking the code, but from simple global optimizations such as ob_start(),
or from radically different algorithms such as HTML caching.
Optimizing Object-oriented Programming
In March 2001, I conducted some informal benchmarks with classes on PHP
4.0.4pl1, and I derived some advice from the results. The three main points
are:
1. Initialise all variables before use.
2. Dereference all global/property variables that are frequently used
in a method and put the values in local variables if you plan to access
the value more than twice.
3. Try placing frequently used methods in the derived classes.
Warning: as PHP is going through a continuous improvement process, things
might change in the future.
More Details
I have found that calling an object method (a function defined in a class)
is about twice as slow as a normal function call. To me that's quite
acceptable and comparable to other OOP languages.
Inside a method (the following ratios are approximate only):
- Incrementing a local variable in a method is the fastest, nearly the
same as incrementing a local variable in a function.
- Incrementing a global variable is 2 times slower than a local variable.
- Incrementing an object property (eg. $this->prop++) is 3 times slower
than a local variable.
- Incrementing an undefined local variable is 9-10 times slower than
a pre-initialized one.
- Just declaring a global variable without using it in a function also
slows things down (by about the same amount as incrementing a local
var). PHP probably does a check to see if the global exists.
- Method invocation appears to be independent of the number of methods
defined in the class because I added 10 more methods to the test class
(before and after the test method) with no change in performance.
- Methods in derived classes run faster than ones defined in the base
class.
- A function call with one parameter and an empty function body takes
about the same time as doing 7-8 $localvar++ operations. A similar method
call is of course about 15 $localvar++ operations.
Update: 11 July 2004:
The above test was on PHP 4.0.4, about 3 years ago.
I tested this again in PHP 4.3.3 and calling a function
now takes about 20 $localvar++
operations, and calling a method takes about 30 $localvar++ operations. This could be
because $localvar++ runs faster now, or functions are slower.
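If you want to repeat these measurements on your own setup, a minimal harness along these lines will do. Note it uses closures and microtime(true), which postdate the PHP 4 versions originally tested, so expect different absolute ratios:

```php
<?php
// A small timing harness in the spirit of the tests above.
// The closure-call overhead dominates very cheap operations, so treat
// the results as rough ratios, not precise figures.
class T {
    function m() { }
}

function bench($fn, $n = 200000) {
    $start = microtime(true);
    for ($i = 0; $i < $n; $i++) {
        $fn();
    }
    return microtime(true) - $start;
}

$local = 0;
$tIncr = bench(function () use (&$local) { $local++; });

$obj   = new T;
$tCall = bench(function () use ($obj) { $obj->m(); });

printf("local++ %.4fs, method call %.4fs\n", $tIncr, $tCall);
```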
Summary of Tweaks
- The more you understand the software you are using (Apache, PHP, IIS, your
database) and the deeper your knowledge of the operating system, networking
and server hardware, the better you can perform global optimizations on your
code and your system.
- For PHP scripts, the most expensive bottleneck is normally the CPU. Twin
CPUs are probably more useful than two Gigabytes of RAM.
- Compile PHP with the "configure --enable-inline-optimization" option to
generate the fastest possible PHP executable.
- Tune your database and index the fields that are commonly used in your SQL
WHERE criteria. ADOdb, the very
popular database abstraction library, provides a SQL
tuning mode, where you can view your invalid, expensive and suspicious
SQL, their execution plans and in which PHP script the SQL was executed.
- Use HTML caching if you have data that rarely changes. Even if the data
changes every minute, caching can help provided the data is synchronized with
the cache. Depending on your code complexity, it can improve your performance
by a factor of 10.
- Benchmark your most complex code early (or at least a prototype), so you
get a feel of the expected performance before it is too late to fix. Try to
use realistic amounts of test data to ensure that it scales properly.
Updated 11 July 2004: To benchmark with an execution profile of all function calls, you can try the xdebug extension. For a brief tutorial on
how I use xdebug, see squeezing code with xdebug. There are commercial products to do this also, eg.
Zend Studio.
- Consider using an opcode cache. This gives a speedup of between 10-200%,
depending on the complexity of your code. Make sure you do some stress tests
before you install a cache because some are more reliable than others.
- Use ob_start() at the beginning of your code. This gives you a 5-15% boost
in speed for free on Apache. You can also use gzip compression for extra fast
downloads (this requires spare CPU cycles).
- Consider installing Zend Optimizer. This is free and does some optimizations,
but be warned that some scripts actually slow down when Zend Optimizer is
installed. The consensus is that Zend Optimizer is good when your code has
lots of loops. Today many opcode accelerators have similar features (added
this sentence 21 Oct 2003).
- Optimize your loops first. Move loop invariants (constants) outside the loop.
- Use the array and string functions where possible. They are faster than
writing equivalent code in PHP.
- The fastest way to concatenate multiple small strings into one large string is to create an output buffer (ob_start) and to echo into the buffer.
At the end get the contents using ob_get_contents. This works because memory allocation is normally the killer in string concatenation, and output buffering allocates a large 40K initial
buffer that grows in 10K chunks. Added 22 June 2004.
- Pass objects and arrays using references
in functions. Return objects and arrays as references where possible
also. If this is a short script, and code maintenance is not an issue, you
can consider using global variables to hold the objects or arrays.
- If you have many PHP scripts that use session variables, consider
recompiling PHP using the shared memory module for sessions, or use a RAM Disk. Enable this with
"configure --with-mm", then re-compile PHP, and set session.save_handler=mm
in php.ini.
- For searching for substrings, the fastest function is strpos(), followed
by preg_match() and lastly ereg(). Similarly, str_replace() is faster than
preg_replace(), which is faster than ereg_replace().
- Added 11 July 2004: Order large switch statements with the most frequently occurring cases on top. If some
of the most common cases are in the default section, consider explicitly defining these
cases at the top of the switch statement.
- For processing XML, parsing with regular expressions is significantly
faster than using DOM or SAX.
- Unset() variables that are not used anymore to reduce memory usage. This
is mostly useful for resources and large arrays.
- For classes with deep hierarchies, functions defined in derived classes
(child classes) are invoked faster than those defined in base class (parent
class). Consider replicating the most frequently used code in the base class
in the derived classes too.
- Consider writing your code as a PHP extension or a Java class or a COM object
if you need that extra bit of speed. Be careful of the overhead of marshalling
data between COM and Java.
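To illustrate the array- and string-function tip above, here is a tiny comparison; range() merely stands in for real data:

```php
<?php
// Built-in array functions run as compiled C code, so a single call
// beats an equivalent loop written in PHP.
$nums = range(1, 1000);

// Hand-written loop:
$total = 0;
foreach ($nums as $n) {
    $total += $n;
}

// One call into a C-implemented built-in:
$fast = array_sum($nums);

echo $fast, "\n";   // both yield 500500
```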
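The output-buffer concatenation trick mentioned above looks like this in practice (a sketch; the 1000-line loop is just filler data):

```php
<?php
// Building one large string through the output buffer instead of
// repeated $s .= ... concatenation.
ob_start();
for ($i = 0; $i < 1000; $i++) {
    echo "line $i\n";        // goes into the buffer, not to the client
}
$big = ob_get_contents();    // grab the accumulated string
ob_end_clean();              // discard the buffer without sending it

echo strlen($big), " bytes built\n";
```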
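And a short sketch of the substring-search tip, including the classic strpos() pitfall of a match at position 0:

```php
<?php
// strpos() is the cheapest substring test, but a match at position 0
// returns int(0), so always compare with === / !== rather than ==.
$haystack = "optimizing php";

if (strpos($haystack, "optimizing") !== false) {
    echo "found\n";
}

// For literal replacements, prefer str_replace() over preg_replace():
echo str_replace("php", "PHP", $haystack), "\n";
```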
Useless Optimizations
Some optimizations are useful. Others are a waste of time - sometimes the improvement is negligible, and sometimes the PHP internals change, rendering the tweak obsolete.
Here are some common PHP legends:
a. echo is faster than print
Echo is supposed to be faster because it doesn't return a value while print does. From my benchmarks with PHP 4.3, the difference is negligible. And under some situations, print is faster than echo (when ob_start is enabled).
b. strip off comments to speed up code
If you use an opcode cache, comments are already ignored. This is a myth from PHP 3 days, when each line of PHP was interpreted at run-time.
c. 'var='.$var is faster than "var=$var"
This used to be true in PHP 4.2 and earlier. This was fixed in PHP 4.3. Note (22 June 2004): apparently the 4.3 fix reduced the overhead, but not completely. However I find the performance difference to be negligible.
Do References Speed Your Code?
References do not provide any performance benefits for strings,
integers and other basic data types. For example, consider
the following code:
function TestRef(&$a)
{
    $b = $a;
    $c = $a;
}
$one = 1;
TestRef($one);
And the same code without references:
function TestNoRef($a)
{
    $b = $a;
    $c = $a;
}
$one = 1;
TestNoRef($one);
PHP does not actually create duplicate variables when "pass by value"
is used, but uses high speed reference counting internally. So in TestRef(),
$b and $c take longer to set because the references have to be tracked,
while in TestNoRef(), $b and $c just point to the original value of $a,
and the reference counter is incremented. So TestNoRef() will execute
faster than TestRef().
In contrast, functions that accept array and object
parameters have a performance advantage when references are
used. This is because arrays and objects do not use reference
counting, so multiple copies of an array or object are created
if "pass by value" is used. So the following code:
function ObjRef(&$o)
{
    $a = $o->name;
}
is faster than:
function ObjNoRef($o)
{
    $a = $o->name;
}
Note: In PHP 5, all objects are passed by reference automatically,
without the need for an explicit & in the parameter list.
PHP 5 object performance should be significantly faster.
Many thanks also to Andrei Zmievski for
reviewing this article.
(c) 2001-2005 John Lim. No reproduction of this
article
is permitted without written permission from the author.