Performance Tuning Example

(Difference between revisions)
Jump to: navigation, search
m (Performance Tuning by Example)
m (Starting Point)
Line 32: Line 32:
 
It may be easier to see what we're doing by looking at the /etc/httpd/conf.d/opencms.conf file:
 
It may be easier to see what we're doing by looking at the /etc/httpd/conf.d/opencms.conf file:
  
<code>
+
<pre>
 
<VirtualHost *:80>
 
<VirtualHost *:80>
 
   <Directory "/usr/local/tomcat-staging/apache-tomcat-5.5.25/webapps/ROOT/">
 
   <Directory "/usr/local/tomcat-staging/apache-tomcat-5.5.25/webapps/ROOT/">
Line 66: Line 66:
 
   ProxyPassReverse  /portal/                  http://localhost:8081/portal/
 
   ProxyPassReverse  /portal/                  http://localhost:8081/portal/
 
</VirtualHost>
 
</VirtualHost>
</code>
+
</pre>
  
 
= Tuning Setup =  
 
= Tuning Setup =  

Revision as of 07:49, 5 February 2008

Contents

Performance Tuning by Example

Please note that this page is being created now (5 feb 2008) and currently includes only the first pass of tuning

Rather than a list of recipes to follow, this page takes you through a real world example.

If you have any disagreements or suggestions about performance tuning, please feel free to add them to the page, but for the convenience of anyone that is following this, mark them as suggestions - please don't just change the example as then the ab results won't be correct for the procedure followed.

Starting Point

There are already a bunch of tutorials out there about getting an initial configuration. At the start of this process, I expect that you will have verything setup and working to your satisfaction.

We have the following major components:

1. apache-tomcat-5.5.25
2. httpd-2.2.3-7
3. OpenCMS 7.03
4. mysql-server-5.0.22-2.1

We have used the Alkacon documentation to setup apache in front of OpenCMS using mod_proxy. We do have a slightly weird naming convention as (for historical reasons) we separate our 'public' and 'customer' sites such that the public site is totally exported while the customer site is mediated by opencms.

Our Customer area is named /portal.

Thus we have a site structure something like:

/ -> mod_rewrite /export/aunz
/portal -> mod_proxy -> opencms
/export/aunz (all static pages)
/export/system
/resources

It may be easier to see what we're doing by looking at the /etc/httpd/conf.d/opencms.conf file:

<VirtualHost *:80>
   <Directory "/usr/local/tomcat-staging/apache-tomcat-5.5.25/webapps/ROOT/">
       Options Indexes Includes FollowSymLinks MultiViews
       Order allow,deny
       Allow from all
   </Directory>
   ServerName dev1.local  # Change this to your  server name
   ServerAdmin webmaster@wranglers dot com dot au  # obfuscated - and change this as well.
   DocumentRoot "/usr/local/tomcat-staging/apache-tomcat-5.5.25/webapps/ROOT/"
   ErrorLog logs/error.log

   RewriteEngine on
   
   RewriteRule ^/(.*):SSL$ https://%{SERVER_NAME}/$1 [R,L]
   RewriteRule ^/(.*):NOSSL$ http://%{SERVER_NAME}/$1 [R,L]

   RewriteCond %{REQUEST_URI} ^/export/.*$
   RewriteCond "%{DOCUMENT_ROOT}%{REQUEST_FILENAME}" !-f
   RewriteCond "%{DOCUMENT_ROOT}%{REQUEST_FILENAME}/index_export.html" !-f
   RewriteRule .* /portal/handle404?exporturi=%{REQUEST_URI}&%{QUERY_STRING} [P,L]

   # Log only non-redirect requests in "normal" log file
   SetEnvIf Request_URI "\/opencms\/*" redirect
   CustomLog logs/localhost-access.log common env=!redirect

   RedirectMatch ^/$ /portal/

   ProxyPass         /resources/               !
   ProxyPass         /export/                  !

   ProxyPass         /portal/                  http://localhost:8081/portal/
   ProxyPassReverse  /portal/                  http://localhost:8081/portal/
</VirtualHost>

Tuning Setup

The first thing to do in any tuning is to get a sense of what you want your site to do. I would imagine that scalability, performance and stability are import to you. They are to me.

We are aiming to have the site tuned for 1,000 concurrent users hitting a dynamically generated page with back-end data every 2 seconds. This doesn't sound like much, but we have a 1970s legacy system that we're integrating with (anyone want a job as a UniBasic Programmer?), so this is pretty significant load against the legacy system. Fortunately there are not that many pages that require a back-end hit, so for the other pages we want 10,000 concurrent users (still hitting a page every 2 seconds).

We're going to start by tuning the non-back-ended pages.

Before we start actual tuning there are some measurement tools that are going to be important to us.

Firstly, a naive http load. This is just users clicking on links. Great for getting a sense of the simplest type of load. The tool for generating this kind of load is provided by apache in the httpd package. If you have httpd, you probably have apache bench (or ab). For the more complicated loads we'll be using jmeter (but more on that later).

apache bench on the command line:

ab -c 60 -n1000 http://dev1.local/portal/index.html

(don't forget to change your derver name)

This will make 60 concurrent connections and request the page http://dev1.local/portal/index.html up to a maximum of 10000 pages.

This gives you a baseline. Save it.

My Baseline is:

SServer Software:        Apache-Coyote/1.1
Server Hostname:        dev1.local
Server Port:            80

Document Path:          /portal/index.html
Document Length:        5986 bytes

Concurrency Level:      10
Time taken for tests:   95.435580 seconds
Complete requests:      1000
Failed requests:        0
Write errors:           0
Total transferred:      6260000 bytes
HTML transferred:       5986000 bytes
Requests per second:    10.48 [#/sec] (mean)
Time per request:       954.356 [ms] (mean)
Time per request:       95.436 [ms] (mean, across all concurrent requests)
Transfer rate:          64.05 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.1      0       3
Processing:    96  952 2084.1    676   36504
Waiting:       95  951 2084.1    675   36503
Total:         96  952 2084.1    676   36504

Percentage of the requests served within a certain time (ms)
  50%    676
  66%    846
  75%    931
  80%    996
  90%   1360
  95%   1884
  98%   3008
  99%   8735
 100%  36504 (longest request)

Tuning MySQL

OK, so now that all of that is out of the way let's start looking at the back of the stack. Now I should point out here that I am not an expert in MySQL. However there are loads of them out there and lots of them write webpages. Yipee! Thanks lazyweb.

For background take a look at [1] and [2].

There are also useful tools such as:

[3] [4]

So for our purposes I'm just going to show the changes I made to my.cnf:

[mysqld]
datadir=/u01/mysql
socket=/u01/mysql/mysql.sock
user=mysql

# Default to using old password format for compatibility with mysql 3.x
# clients (those using the mysqlclient10 compatibility package).
old_passwords=1

skip-locking
table_cache = 256

# Try number of CPU's*2 for thread_concurrency
thread_concurrency = 4

# enable the slow query log, default 10 seconds
log-slow-queries
# log queries taking longer than 5 seconds
long_query_time = 5
# log queries that don't use indexes even if they take less than long_query_time
log-queries-not-using-indexes

# Add an in memory cache so that we can keep the content in memory
query_cache_size = 32M

# Increase packet size limit due to OpenCms requirements
max_allowed_packet=32M

# Setup some reasonable defaults

set-variable=max_connections=300
set-variable=wait_timeout=500
max_connect_errors = 100

# Start a thread cache

thread_cache = 30

# Start Replication Config Settings
log-bin=mysql-bin
server-id=1
innodb_flush_log_at_trx_commit=1
sync_binlog=1
# End Replication Config Settings
     
[mysql.server]
user=mysql
basedir=/u01/mysql

[mysqld_safe]
log-error=/var/log/mysqld.log
pid-file=/var/run/mysqld/mysqld.pid

[client]
user=mysql
socket=/u01/mysql/mysql.sock

Feel free to ignore the replication section - as we're running a backup hot standby server as well. Oh and you probably have different paths - I keep my data on a separate file system called /u01.

More interesting is the section for loggin slow queries - keep an eye on these for later tuning work.

You can see the section:

; Setup some reasonable defaults

set-variable=max_connections=300
set-variable=wait_timeout=500
max_connect_errors = 100

Max connections stops mysql from going crazy and eating all the resources on the machine. For our expected load of 1000 concurrent users, 300 connections will probably be sufficient - but we'll see later.

OpenCMS connects to MySQL using the apache commons code dbcp. This seems to take a while to connect even while running locally, so we're upped the wait_timeout to 500. This is half a second and should be fine. If you start seeing errors in your catalina logs complaining about " Failed to write locks to database " then this is an area to look at.

Max connect errors is a bit of a magical incantation. While you would expect that an error either happens or it doesn't we actually allow a larger than standard number of errors to occur before we kill a connection. This is to allow a bit more time for mysql to recover from transient issues than the default of max_connect_errors = 10

After MySQL Tuning:

Server Software:        Apache-Coyote/1.1
Server Hostname:        dev1.local
Server Port:            80

Document Path:          /portal/index.html
Document Length:        5986 bytes

Concurrency Level:      10
Time taken for tests:   53.987987 seconds
Complete requests:      1000
Failed requests:        0
Write errors:           0
Total transferred:      6260000 bytes
HTML transferred:       5986000 bytes
Requests per second:    18.52 [#/sec] (mean)
Time per request:       539.880 [ms] (mean)
Time per request:       53.988 [ms] (mean, across all concurrent requests)
Transfer rate:          113.23 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0   71 107.2      0     338
Processing:    52  466 2258.0    228   47924
Waiting:        0  442 2260.6    213   47910
Total:         52  538 2258.5    372   47924

Percentage of the requests served within a certain time (ms)
  50%    372
  66%    380
  75%    381
  80%    383
  90%    508
  95%    674
  98%   1239
  99%   6706
 100%  47924 (longest request)

So, pretty reasonable performance for at least a single page requested a lot of times. Don't forget we are currently asking mysql for this page on every hit as we have no http or tomcat caching yet.

Also remember that we are doing a single select a lot of times - your mysql tools might say that you are over allocating memory to caches at the moment, but (unless you have a single page on your site) this is not a very good approximation of reality yet.

App Server

So let's move though the stack to the application server. This is tuning both OpenCMS and Tomcat. The first thing to do is to look at the connection between OpenCMS and the database. This is a component from the apache commons project called DataBase Connection Pool or commons-dbcp.

It is configured in ${OPENCMS_HOME}/WEB-INF/config/opencms.properties

Invalid language.

You need to specify a language like this: <source lang="html">...</source>

Supported languages for syntax highlighting:

actionscript, ada, apache, applescript, asm, asp, autoit, bash, blitzbasic, bnf, c, c_mac, caddcl, cadlisp, cfdg, cfm, cpp, cpp-qt, csharp, css, d, delphi, diff, div, dos, eiffel, fortran, freebasic, gml, groovy, html4strict, idl, ini, inno, io, java, java5, javascript, latex, lisp, lua, matlab, mirc, mpasm, mysql, nsis, objc, ocaml, ocaml-brief, oobas, oracle8, pascal, perl, php, php-brief, plsql, python, qbasic, reg, robots, ruby, sas, scheme, sdlbasic, smalltalk, smarty, sql, tcl, text, thinbasic, tsql, vb, vbnet, vhdl, visualfoxpro, winbatch, xml, z80

I've selected 200 concurrent connections as maximum because I am expecting a pretty high concurrent load on the system. I'll be looking in the logs for connection pool issues such as 'unable to create locks' indicating that the pool is becoming exhausted.

You will also remember above that we configured a maximum number of mysql connections to 300. Try to make sure that dbcp is always less than the mysql connections for obvious reasons. If you increase db.pool.default.maxActive then go back and check you my.cnf again.

There are a bunch of other details in opencms.properties for dbcp that I'm not going to go into. They may go into a separate post called Failing Elegantly.

Flex Cache

One of the truisms of performance is to shorten the data path if possible. This means that if you can reuse data then do. Unfortunately a bunch of pages in our site are dynamic. Or are they?

Interesting question. But let's put it aside for a moment because we already know that there is a chunk of public static pages.

So we can cache all of those - but we won't get much of a performance gain as they're already going to be written out to the static area. But we may as well. In our case we have a 4GB machine, so we have memory to burn.

As a side issue - you might want to check out the FlexCache administration area in the OpenCMS Workspace to ensure that you have allocated enough memory to the flexcache. There's not a lot of use of having an 8MB flexcache if you have 16MB of pages to cache.

I'm also not going to repeat the flexcache documentation that you can see elsewhere - if you don't know the difference between caching on 'always' versus 'uri' then read the documentation.

So - we've set the page bodies to be cache 'always'.

We designed the pages for flexcaching - separating the template items such as navigation, headers, footers etc into separate files. All of these template items can also be cached - generally with the 'uri' variation caching.

After all this caching, let's run the tests again.

Server Software:        Apache-Coyote/1.1
Server Hostname:        dev1.local
Server Port:            80

Document Path:          /portal/opportunity/become_a_consultant/new_car/
Document Length:        7595 bytes

Concurrency Level:      10
Time taken for tests:   15.280626 seconds
Complete requests:      1000
Failed requests:        0
Write errors:           0
Total transferred:      7869000 bytes
HTML transferred:       7595000 bytes
Requests per second:    65.44 [#/sec] (mean)
Time per request:       152.806 [ms] (mean)
Time per request:       15.281 [ms] (mean, across all concurrent requests)
Transfer rate:          502.86 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.0      0       0
Processing:    33  151 159.3    111    1488
Waiting:       32  151 159.3    111    1487
Total:         33  151 159.3    111    1488

Percentage of the requests served within a certain time (ms)
  50%    111
  66%    135
  75%    157
  80%    176
  90%    242
  95%    323
  98%    563
  99%   1369
 100%   1488 (longest request)

Ouch! that's looking better. There is an ititial hit while creating the page and then it is super fast.

Actually, let's try it again...

Server Software:        Apache-Coyote/1.1
Server Hostname:        dev1.local
Server Port:            80

Document Path:          /portal/opportunity/become_a_consultant/new_car/
Document Length:        7595 bytes

Concurrency Level:      10
Time taken for tests:   12.512816 seconds
Complete requests:      1000
Failed requests:        0
Write errors:           0
Total transferred:      7869000 bytes
HTML transferred:       7595000 bytes
Requests per second:    79.92 [#/sec] (mean)
Time per request:       125.128 [ms] (mean)
Time per request:       12.513 [ms] (mean, across all concurrent requests)
Transfer rate:          614.09 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.0      0       0
Processing:    28  124 111.0     89    1044
Waiting:       28  124 111.0     89    1044
Total:         28  124 111.0     89    1044

Percentage of the requests served within a certain time (ms)
  50%     89
  66%    107
  75%    127
  80%    140
  90%    194
  95%    330
  98%    556
  99%    661
 100%   1044 (longest request)

Because I ran it a second time, there was no hit to create the page in the flexcache.

Now what about those dynamic pages?

Well we've done this before. And so we designed the pages so that we separate the truly dynamic content from any shared content. So if we have a table based report with some explanation in it, the table content is in a separate jsp page that is included by a template. The explanation, decoration etc is in static pages. So we can flexcache everything possible.

I'm not going to show you an ab test against the legacy system, as it is too embarrassing. And probably not useful to you. Except to say that the flexcache is your friend and so you should try to use it as much as possible.

So we're pretty happy with a small number of concurrent users (an ab concurrency of 10 is probably 200 concurrent users hitting refresh once every 2 seconds).

From here on in, we're going to up the number of concurrent requests. Our estimate is that 200 concurrent users in ab is representative of 5000 real concurrent users hitting a page every 2 seconds or so. At least that seems consistent with what we see in junit.

ab -c 200 -n10000 http://dev1.local/portal/index.html SPLAT

Tomcat falls in a heap. What is happening?

Tomcat Tuning

Looking in the catalina.out log shows that we have thread exhaustion. Let's look at our server.xml file:


<Server port="8005" shutdown="SHUTDOWN" debug="0">                    
    <Service name="Tomcat-Standalone">

        <!-- Define a connector for the "public visible" server name -->
        <Connector className="org.apache.coyote.tomcat5.CoyoteConnector"
            port="8081" minProcessors="5" maxProcessors="75"
            proxyName="dev1.local" proxyPort="80"
            useBodyEncodingForURI="true" enableLookups="true" redirectPort="8443"
            acceptCount="100" debug="0" connectionTimeout="20000"
            useURIValidationHack="false" disableUploadTimeout="true" />    
	
	<!-- Define a non-SSL HTTP/1.1 Connector on port 8080 for debugging purposes only-->
    	<Connector port="8080" maxHttpHeaderSize="8192"
               maxThreads="150" minSpareThreads="25" maxSpareThreads="75"
              enableLookups="false" redirectPort="8443" acceptCount="100"
              connectionTimeout="20000" disableUploadTimeout="true" />

        <Engine name="Standalone" defaultHost="localhost" debug="0">
            <Host name="localhost" debug="0" appBase="webapps" unpackWARs="true" autoDeploy="true">
            <!-- You can change "unpackWARs" and "autoDeploy" to false after the installation is finished -->        
                <Logger className="org.apache.catalina.logger.FileLogger"
                    directory="logs"  prefix="localhost_log." suffix=".txt"
                    timestamp="true"/>
            </Host>        
        </Engine>    
    </Service>    
</Server>

As you can see we have two connectors running. The first is the proxy endpoint from Apache. The second is one we use internally for debugging and the manager application etc.

We're not giving tomcat enough resources to deal with our expected load.

Let's start by giving tomcat more memory:

in catalina.sh we add a CATALINA_OPTS line:

Invalid language.

You need to specify a language like this: <source lang="html">...</source>

Supported languages for syntax highlighting:

actionscript, ada, apache, applescript, asm, asp, autoit, bash, blitzbasic, bnf, c, c_mac, caddcl, cadlisp, cfdg, cfm, cpp, cpp-qt, csharp, css, d, delphi, diff, div, dos, eiffel, fortran, freebasic, gml, groovy, html4strict, idl, ini, inno, io, java, java5, javascript, latex, lisp, lua, matlab, mirc, mpasm, mysql, nsis, objc, ocaml, ocaml-brief, oobas, oracle8, pascal, perl, php, php-brief, plsql, python, qbasic, reg, robots, ruby, sas, scheme, sdlbasic, smalltalk, smarty, sql, tcl, text, thinbasic, tsql, vb, vbnet, vhdl, visualfoxpro, winbatch, xml, z80

Also we change our server.xml file:

<Server port="8005" shutdown="SHUTDOWN" debug="0">
   <Service name="Tomcat-Standalone">

       <!-- Define a connector for the "public visible" server name -->
       <Connector className="org.apache.coyote.tomcat5.CoyoteConnector"
           port="8081" minProcessors="5" maxThreads="500" minSpareThreads="15" maxSpareThreads="200" 
           maxProcessors="200"
           proxyName="dev1.local" proxyPort="80"
           useBodyEncodingForURI="true" enableLookups="true" redirectPort="8443"
           acceptCount="200" debug="0" connectionTimeout="20000"
           useURIValidationHack="false" disableUploadTimeout="true" />
	
	<!-- Define a non-SSL HTTP/1.1 Connector on port 8080 for debugging purposes only-->
   	<Connector port="8080" maxHttpHeaderSize="8192"
             maxThreads="15" minSpareThreads="2" maxSpareThreads="10"
             enableLookups="false" redirectPort="8443" acceptCount="100"
             connectionTimeout="20000" disableUploadTimeout="true" />

       <Engine name="Standalone" defaultHost="localhost" debug="0">
           <Host name="localhost" debug="0" appBase="webapps" unpackWARs="true" autoDeploy="true">
           <!-- You can change "unpackWARs" and "autoDeploy" to false after the installation is finished -->
               <Logger className="org.apache.catalina.logger.FileLogger"
                   directory="logs"  prefix="localhost_log." suffix=".txt"
                   timestamp="true"/>
           </Host>
       </Engine>
   </Service>
</Server>

And we'll run the ab test that failed:

Server Software:        Apache-Coyote/1.1
Server Hostname:        dev1.local
Server Port:            80

Document Path:          /portal/index.html
Document Length:        6005 bytes

Concurrency Level:      200
Time taken for tests:   92.178489 seconds
Complete requests:      10000
Failed requests:        0
Write errors:           0
Total transferred:      62140000 bytes
HTML transferred:       60050000 bytes
Requests per second:    108.49 [#/sec] (mean)
Time per request:       1843.570 [ms] (mean)
Time per request:       9.218 [ms] (mean, across all concurrent requests)
Transfer rate:          658.32 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    6 169.5      0    8997
Processing:    16 1828 9496.1     64   92026
Waiting:       16 1828 9496.0     64   92024
Total:         16 1834 9505.2     64   92039

Percentage of the requests served within a certain time (ms)
  50%     64
  66%     83
  75%    105
  80%    130
  90%    533
  95%   2363
  98%  40927
  99%  61524
 100%  92039 (longest request)

OK - so first pass we're happy with the naive performance. There is still oddness in the 98%+ bracket where 2% of requests takes up to an order of magnitude longer to complete - but that may be an ab artefact.

Next we'll start to do some more complex testing using junit and monitoring the server performance with jmx.

Personal tools
Namespaces
Variants
Actions
Navigation
Toolbox