Performance Tuning Example

From OpenCms Wiki
Revision as of 13:18, 17 July 2008 by 87.187.37.67 (Talk)

Rather than a list of recipes to follow, this page takes you through a real world example.

If you have any disagreements or suggestions about performance tuning, please feel free to add them to the page. For the convenience of anyone following along, though, mark them as suggestions - please don't just change the example, as then the ab results won't match the procedure followed.


Starting Point

There are already plenty of tutorials out there about getting an initial configuration. At the start of this process, I expect that you will have everything set up and working to your satisfaction.

We have the following major components:

1. jdk1.5.0_14 
2. apache-tomcat-5.5.25
3. httpd-2.2.3-7
4. OpenCMS 7.03
5. mysql-server-5.0.22-2.1

We have used the Alkacon documentation to set up Apache in front of OpenCms using mod_proxy. We do have a slightly unusual naming convention: for historical reasons we separate our 'public' and 'customer' sites, such that the public site is totally exported while the customer site is mediated by OpenCms.

Our Customer area is named /portal.

Thus we have a site structure something like:

/ -> mod_rewrite /export/aunz
/portal -> mod_proxy -> opencms
/export/aunz (all static pages)
/export/system
/resources

It may be easier to see what we're doing by looking at the /etc/httpd/conf.d/opencms.conf file:

<VirtualHost *:80>
   <Directory "/usr/local/tomcat-staging/apache-tomcat-5.5.25/webapps/ROOT/">
       Options Indexes Includes FollowSymLinks MultiViews
       Order allow,deny
       Allow from all
   </Directory>
   # Change this to your server name
   ServerName dev1.local
   # Obfuscated - change this as well
   ServerAdmin webmaster@wranglers dot com dot au
   DocumentRoot "/usr/local/tomcat-staging/apache-tomcat-5.5.25/webapps/ROOT/"
   ErrorLog logs/error.log

   RewriteEngine on
   
   RewriteRule ^/(.*):SSL$ https://%{SERVER_NAME}/$1 [R,L]
   RewriteRule ^/(.*):NOSSL$ http://%{SERVER_NAME}/$1 [R,L]

   RewriteCond %{REQUEST_URI} ^/export/.*$
   RewriteCond "%{DOCUMENT_ROOT}%{REQUEST_FILENAME}" !-f
   RewriteCond "%{DOCUMENT_ROOT}%{REQUEST_FILENAME}/index_export.html" !-f
   RewriteRule .* /portal/handle404?exporturi=%{REQUEST_URI}&%{QUERY_STRING} [P,L]

   # Log only non-redirect requests in "normal" log file
   SetEnvIf Request_URI "\/opencms\/*" redirect
   CustomLog logs/localhost-access.log common env=!redirect

   RedirectMatch ^/$ /portal/

   ProxyPass         /resources/               !
   ProxyPass         /export/                  !

   ProxyPass         /portal/                  http://localhost:8081/portal/
   ProxyPassReverse  /portal/                  http://localhost:8081/portal/
</VirtualHost>

Tuning Setup

The first thing to do in any tuning is to get a sense of what you want your site to do. I would imagine that scalability, performance and stability are important to you. They are to me.

We are aiming to have the site tuned for 1,000 concurrent users hitting a dynamically generated page with back-end data every 2 seconds. This doesn't sound like much, but we have a 1970s legacy system that we're integrating with (anyone want a job as a UniBasic Programmer?), so this is pretty significant load against the legacy system. Fortunately there are not that many pages that require a back-end hit, so for the other pages we want 10,000 concurrent users (still hitting a page every 2 seconds).

We're going to start by tuning the non-back-ended pages.

Before we start actual tuning there are some measurement tools that are going to be important to us.

Firstly, a naive HTTP load. This is just users clicking on links - great for getting a sense of the simplest type of load. The tool for generating this kind of load is provided by Apache in the httpd package: if you have httpd, you probably have ApacheBench (ab). For the more complicated loads we'll be using jmeter (but more on that later).

apache bench on the command line:

ab -c 10 -n 1000 http://dev1.local/portal/index.html

(don't forget to change your server name)

This will make 10 concurrent connections and request the page http://dev1.local/portal/index.html a total of 1000 times, matching the baseline below.

This gives you a baseline. Save it.

My Baseline is:

Server Software:        Apache-Coyote/1.1
Server Hostname:        dev1.local
Server Port:            80

Document Path:          /portal/index.html
Document Length:        5986 bytes

Concurrency Level:      10
Time taken for tests:   95.435580 seconds
Complete requests:      1000
Failed requests:        0
Write errors:           0
Total transferred:      6260000 bytes
HTML transferred:       5986000 bytes
Requests per second:    10.48 [#/sec] (mean)
Time per request:       954.356 [ms] (mean)
Time per request:       95.436 [ms] (mean, across all concurrent requests)
Transfer rate:          64.05 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.1      0       3
Processing:    96  952 2084.1    676   36504
Waiting:       95  951 2084.1    675   36503
Total:         96  952 2084.1    676   36504

Percentage of the requests served within a certain time (ms)
  50%    676
  66%    846
  75%    931
  80%    996
  90%   1360
  95%   1884
  98%   3008
  99%   8735
 100%  36504 (longest request)
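If you save each ab run to a file, a small script makes it easy to compare the headline numbers between tuning steps. A minimal sketch (parse_ab_summary is my own helper, not part of ab):

```python
import re

def parse_ab_summary(text):
    """Pull the headline numbers out of saved ab output."""
    metrics = {}
    for label, key in [("Requests per second", "rps"),
                       ("Time per request", "ms_per_request"),
                       ("Failed requests", "failed")]:
        # ab prints "Label:    <number> ..."; take the first occurrence
        m = re.search(re.escape(label) + r":\s+([\d.]+)", text)
        if m:
            metrics[key] = float(m.group(1))
    return metrics

# Abridged output from the baseline run above:
sample = """Requests per second:    10.48 [#/sec] (mean)
Time per request:       954.356 [ms] (mean)
Failed requests:        0"""
print(parse_ab_summary(sample))
```

Run it over each saved baseline and you can watch the requests-per-second figure move as you tune.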

Tuning MySQL

OK, so now that all of that is out of the way, let's start looking at the back of the stack. I should point out here that I am not an expert in MySQL. However, there are loads of experts out there, and lots of them write web pages. Yippee! Thanks, lazyweb.

For background, take a look at [http://www.ibm.com/developerworks/library/l-tune-lamp-3.html] and [http://ebergen.net/wordpress/2006/03/06/3-minute-mysql-tuning/].

There are also useful tools such as:

[http://www.day32.com/MySQL/tuning-primer.sh] and [http://hackmysql.com/mysqlreport]

So for our purposes I'm just going to show the changes I made to my.cnf:

[mysqld]
datadir=/u01/mysql
socket=/u01/mysql/mysql.sock
user=mysql

# Default to using old password format for compatibility with mysql 3.x
# clients (those using the mysqlclient10 compatibility package).
old_passwords=1

skip-locking
table_cache = 256

# Try number of CPU's*2 for thread_concurrency
thread_concurrency = 4

# enable the slow query log, default 10 seconds
log-slow-queries
# log queries taking longer than 5 seconds
long_query_time = 5
# log queries that don't use indexes even if they take less than long_query_time
log-queries-not-using-indexes

# Add an in memory cache so that we can keep the content in memory
query_cache_size = 32M

# Increase packet size limit due to OpenCms requirements
max_allowed_packet=32M

# Setup some reasonable defaults

set-variable=max_connections=300
set-variable=wait_timeout=500
max_connect_errors = 100

# Start a thread cache

thread_cache = 30

# Start Replication Config Settings
log-bin=mysql-bin
server-id=1
innodb_flush_log_at_trx_commit=1
sync_binlog=1
# End Replication Config Settings
     
[mysql.server]
user=mysql
basedir=/u01/mysql

[mysqld_safe]
log-error=/var/log/mysqld.log
pid-file=/var/run/mysqld/mysqld.pid

[client]
user=mysql
socket=/u01/mysql/mysql.sock

Feel free to ignore the replication section - it is there because we're running a hot standby backup server as well. You probably have different paths, too - I keep my data on a separate file system called /u01.

More interesting is the section for logging slow queries - keep an eye on these for later tuning work.

You can see the section:

# Setup some reasonable defaults

set-variable=max_connections=300
set-variable=wait_timeout=500
max_connect_errors = 100

Max connections stops MySQL from going crazy and eating all the resources on the machine. For our expected load of 1000 concurrent users, 300 connections will probably be sufficient - but we'll see later.

OpenCms connects to MySQL using the Apache Commons connection pool, dbcp. This seems to take a while to connect even when running locally, so we've upped wait_timeout to 500. Note that wait_timeout is measured in seconds, so this is generous and should be fine. If you start seeing errors in your catalina logs complaining about "Failed to write locks to database" then this is an area to look at.

Max connect errors is a bit of a magical incantation. While you would expect that an error either happens or it doesn't, we actually allow a larger-than-standard number of connection errors to occur before MySQL blocks the host. This gives MySQL a bit more room to recover from transient issues than the default of max_connect_errors = 10.

After MySQL Tuning:

Server Software:        Apache-Coyote/1.1
Server Hostname:        dev1.local
Server Port:            80

Document Path:          /portal/index.html
Document Length:        5986 bytes

Concurrency Level:      10
Time taken for tests:   53.987987 seconds
Complete requests:      1000
Failed requests:        0
Write errors:           0
Total transferred:      6260000 bytes
HTML transferred:       5986000 bytes
Requests per second:    18.52 [#/sec] (mean)
Time per request:       539.880 [ms] (mean)
Time per request:       53.988 [ms] (mean, across all concurrent requests)
Transfer rate:          113.23 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0   71 107.2      0     338
Processing:    52  466 2258.0    228   47924
Waiting:        0  442 2260.6    213   47910
Total:         52  538 2258.5    372   47924

Percentage of the requests served within a certain time (ms)
  50%    372
  66%    380
  75%    381
  80%    383
  90%    508
  95%    674
  98%   1239
  99%   6706
 100%  47924 (longest request)

So, pretty reasonable performance, at least for a single page requested a lot of times. Don't forget that we are currently asking MySQL for this page on every hit, as we have no HTTP or Tomcat caching yet.

Also remember that we are doing a single select a lot of times - your MySQL tools might say that you are over-allocating memory to caches at the moment, but (unless you have a single page on your site) this is not a very good approximation of reality yet. Come back to this once you have got to the end of this document and are using jmeter to test more of your site.
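One number worth revisiting then is the query cache hit rate. A common approximation divides Qcache_hits by Qcache_hits plus Com_select, since Com_select only counts SELECTs that missed (or bypassed) the cache. A sketch of the arithmetic, with illustrative numbers rather than figures from our server:

```python
def query_cache_hit_rate(qcache_hits, com_select):
    """Fraction of SELECT traffic answered from the MySQL query cache.

    Com_select counts only queries that missed or bypassed the cache,
    so hits + Com_select approximates total SELECT traffic."""
    total = qcache_hits + com_select
    return qcache_hits / total if total else 0.0

# Hypothetical values read from SHOW STATUS:
print(round(query_cache_hit_rate(qcache_hits=9000, com_select=1000), 2))  # 0.9
```

A low rate under a realistic jmeter workload suggests the query_cache_size memory could be better spent elsewhere.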

App Server

So let's move through the stack to the application server. This means tuning both OpenCms and Tomcat. The first thing to do is to look at the connection between OpenCms and the database. This is handled by a component from the Apache Commons project called DataBase Connection Pool, or commons-dbcp.

It is configured in ${OPENCMS_HOME}/WEB-INF/config/opencms.properties


db.pool.default.maxActive=200

I've selected 200 concurrent connections as maximum because I am expecting a pretty high concurrent load on the system. I'll be looking in the logs for connection pool issues such as 'unable to create locks' indicating that the pool is becoming exhausted.

You will also remember that above we configured MySQL for a maximum of 300 connections. Make sure the dbcp maximum is always less than the MySQL connection limit, for obvious reasons. If you increase db.pool.default.maxActive then go back and check your my.cnf again.
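That sizing rule can be captured in a one-line sanity check. The 20-connection reserve for replication threads and admin sessions is my own rule of thumb, not an OpenCms requirement:

```python
def check_pool_sizing(dbcp_max_active, mysql_max_connections, reserve=20):
    """DBCP must never be able to exhaust MySQL: leave headroom for
    replication, admin logins and anything else sharing the server."""
    return dbcp_max_active + reserve <= mysql_max_connections

# Values from opencms.properties and my.cnf above:
print(check_pool_sizing(200, 300))  # True
```

If this ever comes back False after an edit to either file, revisit the other one.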

There are a bunch of other details in opencms.properties for dbcp that I'm not going to go into. They may go into a separate post called Failing Elegantly.

Flex Cache

One of the truisms of performance is to shorten the data path if possible. This means that if you can reuse data then do. Unfortunately a bunch of pages in our site are dynamic. Or are they?

Interesting question. But let's put it aside for a moment because we already know that there is a chunk of public static pages.

So we can cache all of those - but we won't get much of a performance gain, as they're already being written out to the static area. Still, we may as well: in our case we have a 4GB machine, so we have memory to burn.

As a side issue - you might want to check out the FlexCache administration area in the OpenCms Workplace to ensure that you have allocated enough memory to the flexcache. There's not a lot of use in having an 8MB flexcache if you have 16MB of pages to cache.
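A quick back-of-envelope for sizing: multiply page count by average rendered size by the number of cached variations. The numbers below are illustrative, not measurements from our site:

```python
def flexcache_bytes_needed(page_count, avg_page_bytes, variations_per_page=1):
    """Rough lower bound on FlexCache memory: every cached variation of
    every page has to fit, or entries get evicted and re-rendered."""
    return page_count * avg_page_bytes * variations_per_page

# e.g. 2048 pages averaging 8 KB, each cached in 2 variations:
print(flexcache_bytes_needed(2048, 8 * 1024, 2) // (1024 * 1024), "MB")  # 32 MB
```

Compare the result against the size configured in the FlexCache administration area.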

I'm also not going to repeat the flexcache documentation that you can see elsewhere - if you don't know the difference between caching on 'always' versus 'uri' then read the documentation.

So - we've set the page bodies to be cache 'always'.

We designed the pages for flexcaching - separating the template items such as navigation, headers, footers etc into separate files. All of these template items can also be cached - generally with the 'uri' variation caching.

After all this caching, let's run the tests again.

Server Software:        Apache-Coyote/1.1
Server Hostname:        dev1.local
Server Port:            80

Document Path:          /portal/opportunity/become_a_consultant/new_car/
Document Length:        7595 bytes

Concurrency Level:      10
Time taken for tests:   15.280626 seconds
Complete requests:      1000
Failed requests:        0
Write errors:           0
Total transferred:      7869000 bytes
HTML transferred:       7595000 bytes
Requests per second:    65.44 [#/sec] (mean)
Time per request:       152.806 [ms] (mean)
Time per request:       15.281 [ms] (mean, across all concurrent requests)
Transfer rate:          502.86 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.0      0       0
Processing:    33  151 159.3    111    1488
Waiting:       32  151 159.3    111    1487
Total:         33  151 159.3    111    1488

Percentage of the requests served within a certain time (ms)
  50%    111
  66%    135
  75%    157
  80%    176
  90%    242
  95%    323
  98%    563
  99%   1369
 100%   1488 (longest request)

Ouch! That's looking better. There is an initial hit while creating the page, and then it is super fast.

Actually, let's try it again...

Server Software:        Apache-Coyote/1.1
Server Hostname:        dev1.local
Server Port:            80

Document Path:          /portal/opportunity/become_a_consultant/new_car/
Document Length:        7595 bytes

Concurrency Level:      10
Time taken for tests:   12.512816 seconds
Complete requests:      1000
Failed requests:        0
Write errors:           0
Total transferred:      7869000 bytes
HTML transferred:       7595000 bytes
Requests per second:    79.92 [#/sec] (mean)
Time per request:       125.128 [ms] (mean)
Time per request:       12.513 [ms] (mean, across all concurrent requests)
Transfer rate:          614.09 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.0      0       0
Processing:    28  124 111.0     89    1044
Waiting:       28  124 111.0     89    1044
Total:         28  124 111.0     89    1044

Percentage of the requests served within a certain time (ms)
  50%     89
  66%    107
  75%    127
  80%    140
  90%    194
  95%    330
  98%    556
  99%    661
 100%   1044 (longest request)

Because I ran it a second time, there was no hit to create the page in the flexcache.

Now what about those dynamic pages?

Well, we've done this before, and so we designed the pages to separate the truly dynamic content from any shared content. So if we have a table-based report with some explanation around it, the table content is in a separate JSP page that is included by a template; the explanation, decoration etc. are in static pages. So we can flexcache everything possible.

I'm not going to show you an ab test against the legacy system, as it is too embarrassing. And probably not useful to you. Except to say that the flexcache is your friend and so you should try to use it as much as possible.

So we're pretty happy with a small number of concurrent users (an ab concurrency of 10 is probably equivalent to 200 real users hitting refresh once every 2 seconds).

From here on in, we're going to up the number of concurrent requests. Our estimate is that 200 concurrent users in ab is representative of 5000 real concurrent users hitting a page every 2 seconds or so. At least that seems consistent with what we see in jmeter.
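Those conversions are just a Little's-law back-of-envelope: an ab worker keeps a request in flight continuously, while a real user occupies the server for only the mean response time out of each think-time cycle. A sketch, assuming roughly 100 ms and 80 ms mean responses respectively (my assumed figures, not measured ones):

```python
def equivalent_users(ab_concurrency, mean_response_s, think_time_s=2.0):
    """Estimate how many real users (one click per think_time_s) produce
    the same server-side concurrency as ab_concurrency busy ab workers."""
    return ab_concurrency * think_time_s / mean_response_s

# ~100 ms mean response, a click every 2 s: one ab worker is ~20 users
print(round(equivalent_users(10, 0.1)))    # 200
# ~80 ms mean response at higher concurrency:
print(round(equivalent_users(200, 0.08)))  # 5000
```

Treat the output as an order-of-magnitude guide; real users arrive in bursts that ab does not model.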

ab -c 200 -n 10000 http://dev1.local/portal/index.html

SPLAT! Tomcat falls in a heap. What is happening?

Tomcat Tuning

Looking in the catalina.out log shows that we have thread exhaustion. Let's look at our server.xml file:


<Server port="8005" shutdown="SHUTDOWN" debug="0">                    
    <Service name="Tomcat-Standalone">

        <!-- Define a connector for the "public visible" server name -->
        <Connector className="org.apache.coyote.tomcat5.CoyoteConnector"
            port="8081" minProcessors="5" maxProcessors="75"
            proxyName="dev1.local" proxyPort="80"
            useBodyEncodingForURI="true" enableLookups="true" redirectPort="8443"
            acceptCount="100" debug="0" connectionTimeout="20000"
            useURIValidationHack="false" disableUploadTimeout="true" />    
	
	<!-- Define a non-SSL HTTP/1.1 Connector on port 8080 for debugging purposes only-->
    	<Connector port="8080" maxHttpHeaderSize="8192"
               maxThreads="150" minSpareThreads="25" maxSpareThreads="75"
              enableLookups="false" redirectPort="8443" acceptCount="100"
              connectionTimeout="20000" disableUploadTimeout="true" />

        <Engine name="Standalone" defaultHost="localhost" debug="0">
            <Host name="localhost" debug="0" appBase="webapps" unpackWARs="true" autoDeploy="true">
            <!-- You can change "unpackWARs" and "autoDeploy" to false after the installation is finished -->        
                <Logger className="org.apache.catalina.logger.FileLogger"
                    directory="logs"  prefix="localhost_log." suffix=".txt"
                    timestamp="true"/>
            </Host>        
        </Engine>    
    </Service>    
</Server>

As you can see we have two connectors running. The first is the proxy endpoint from Apache. The second is one we use internally for debugging and the manager application etc.

We're not giving tomcat enough resources to deal with our expected load.

Let's start by giving tomcat more memory:

in catalina.sh we add a CATALINA_OPTS line:

CATALINA_OPTS="$CATALINA_OPTS -server -Xms1536m -Xmx1536m"

Also we change our server.xml file:

<Server port="8005" shutdown="SHUTDOWN" debug="0">
   <Service name="Tomcat-Standalone">

       <!-- Define a connector for the "public visible" server name -->
       <Connector className="org.apache.coyote.tomcat5.CoyoteConnector"
           port="8081" minProcessors="5" maxThreads="500" minSpareThreads="15" maxSpareThreads="200" 
           maxProcessors="200"
           proxyName="dev1.local" proxyPort="80"
           useBodyEncodingForURI="true" enableLookups="true" redirectPort="8443"
           acceptCount="200" debug="0" connectionTimeout="20000"
           useURIValidationHack="false" disableUploadTimeout="true" />
	
	<!-- Define a non-SSL HTTP/1.1 Connector on port 8080 for debugging purposes only-->
   	<Connector port="8080" maxHttpHeaderSize="8192"
             maxThreads="15" minSpareThreads="2" maxSpareThreads="10"
             enableLookups="false" redirectPort="8443" acceptCount="100"
             connectionTimeout="20000" disableUploadTimeout="true" />

       <Engine name="Standalone" defaultHost="localhost" debug="0">
           <Host name="localhost" debug="0" appBase="webapps" unpackWARs="true" autoDeploy="true">
           <!-- You can change "unpackWARs" and "autoDeploy" to false after the installation is finished -->
               <Logger className="org.apache.catalina.logger.FileLogger"
                   directory="logs"  prefix="localhost_log." suffix=".txt"
                   timestamp="true"/>
           </Host>
       </Engine>
   </Service>
</Server>

And we'll run the ab test that failed:

Server Software:        Apache-Coyote/1.1
Server Hostname:        dev1.local
Server Port:            80

Document Path:          /portal/index.html
Document Length:        6005 bytes

Concurrency Level:      200
Time taken for tests:   92.178489 seconds
Complete requests:      10000
Failed requests:        0
Write errors:           0
Total transferred:      62140000 bytes
HTML transferred:       60050000 bytes
Requests per second:    108.49 [#/sec] (mean)
Time per request:       1843.570 [ms] (mean)
Time per request:       9.218 [ms] (mean, across all concurrent requests)
Transfer rate:          658.32 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    6 169.5      0    8997
Processing:    16 1828 9496.1     64   92026
Waiting:       16 1828 9496.0     64   92024
Total:         16 1834 9505.2     64   92039

Percentage of the requests served within a certain time (ms)
  50%     64
  66%     83
  75%    105
  80%    130
  90%    533
  95%   2363
  98%  40927
  99%  61524
 100%  92039 (longest request)

OK - so on the first pass we're happy with the naive performance. There is still some oddness in the 98%+ bracket, where 2% of requests take up to an order of magnitude longer to complete - but that may be an ab artefact.

Getting More Sophisticated

Next we'll start to do some more complex testing using jmeter and monitoring the server performance with JMX.

Now if you checked our setup above, you will have noted that we are not running Tomcat natively (i.e. we're not using APR). This means that Tomcat does not have access to the underlying O/S. This is OK, as for a production server you will be using some other monitoring tool. (Check out OpenNMS or Nagios if you're not already.)

So using JMX we can only get at VM statistics. First let's make sure that we have some JMX listeners configured. The easiest way to do so is to ensure that your Tomcat Manager application is installed and configured. (Go and do so now.)

Check out /manager/status. All this data is available via xml, so we can do stuff with it.
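For example, we can pull heap usage out of the status XML with a few lines of Python. The sample below is an abridged sketch of the format returned by /manager/status?XML=true - check your own Tomcat's output, as the elements and attributes can differ between versions:

```python
import xml.etree.ElementTree as ET

# Abridged, illustrative sample of the manager status XML:
sample = ("<status><jvm>"
          "<memory free='52428800' total='1610612736' max='1610612736'/>"
          "</jvm></status>")

root = ET.fromstring(sample)
mem = root.find("jvm/memory")
free, total = int(mem.get("free")), int(mem.get("total"))
print("heap used: %d%%" % (100 * (total - free) // total))  # heap used: 96%
```

In practice you would fetch the document with the manager credentials and poll it on a schedule, feeding the numbers to whatever monitoring you run.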

Let's start by grabbing jmeter from the apache site. Next look at the documentation for setting up a monitoring thread. Do so.

Now let's set up some tests. I'm going to set them up long hand. Feel free to use the Proxy setup if you want to save some time (check out the jmeter documentation for how to do so).

We have a web workflow that we expect our customers to go through:

Login -> Customer Status Page -> Order step 1 ... Order step 6 -> Completion -> logout.

And before you ask, yes a 6 step order process is pretty long. We're working on that.

In jmeter, add a new thread group and call it something useful. Have it start one thread and loop once.

Next add a HTTP Request Defaults configuration element. This saves on typing and adds to reuse. Put in the domain name and the port of your application.

If you use cookies (and you may be unaware that your application uses cookies for session management) then add a cookie manager to the thread group as well.

Now it is time to add a separate HTTP request for each page. This is time consuming and boring. And can be tricky if you have a lot of parameters to pass. My suggestion is to go and read the documentation for the jmeter proxy.

Finally, add a View Results Tree Listener and hit Run. With luck your results in the listener will show each page visited in turn. Use the view response / render HTML to see the detail for your page. Make sure you're not seeing non 200 results (these generally appear in red).

What you should notice is that the pages are requested in order. Very useful.

Once you're happy with the flow, you can add listeners and threads and do some playing around...

OK - now back to the serious testing.

In our case, we can't supply enough load from a single machine to test the server. As our servers are located in a distant colo facility, we're going to have to set up multiple machines at multiple sites to provide enough load, as each individual site is network bound. Fortunately jmeter supports remote invocation.

Firstly, set up your production test plan. You can do some useful things like providing a list of parameters for jmeter to use to simulate different users / orders etc. As an example, I have 10,000 of our customers split between 10 CSV files. I named these users.csv and popped them into the same directory as my test plan. Then I added a CSV data set configuration element to the test plan and named the variables USER,PASS. Then on the login page HTTP request, in the pane where it says 'send parameters with request', I changed the username and password data to be ${USER} and ${PASS}.

This tells jmeter to iterate through the CSV file, replacing the variables with values from the CSV file. Easy. To make it even simpler, I tell the CSV data set not to recycle at EOF and to stop threads at EOF. This means I can ensure that each user is checked only once.
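Producing the per-client CSV files is easy to script. The round-robin split below is a hypothetical helper - the filenames are my convention, not anything jmeter requires:

```python
import csv
import os

def split_users(users, n_files, out_dir):
    """Deal (user, password) rows round-robin into one CSV file per
    jmeter client, so each client exercises a disjoint set of users."""
    paths = [os.path.join(out_dir, "users%d.csv" % i)
             for i in range(1, n_files + 1)]
    files = [open(p, "w", newline="") for p in paths]
    try:
        writers = [csv.writer(f) for f in files]
        for i, row in enumerate(users):
            writers[i % n_files].writerow(row)
    finally:
        for f in files:
            f.close()
    return paths

# e.g. 10,000 (user, pass) pairs across 10 files -> 1,000 rows each
```

Each client then gets one file, renamed to whatever the CSV data set element in the test plan expects.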

Now it is time to set up the remote testing. If you are using a single network segment, this is simple. Otherwise it is tricky. This can be particularly painful if your network administrator is not excited by the opportunity. Buy him/her chocolate now. Then tell him/her that you are going to do some fun network testing and as a favour you'll share the fun. You never know, it might work.

Jmeter communicates via RMI. By default it uses port 1099, so your network admin will need to allow traffic between the nodes on this port, and also make sure that any QoS implications are understood.

Now for some nomenclature. I'm using the name controller for the machine that you're using to build test plans and start and stop them on all your testing clients. I use the name client to refer to the dumb machines that are running jmeter as controlled by your controller. And the target is the poor unsuspecting machine that you are trying to load test.

Now make sure all your clients have a copy of jmeter and your data file. Your data file should be in the same directory as the jmeter executable. On each client, start jmeter as a server (${PATH_TO_JMETER}/bin/jmeter-server).

Next, go to your controller machine and edit jmeter.properties. Add each of your clients to the remote_hosts property as a single, comma-separated list. You may choose not to have the controller run tests; if so, make sure you don't have 127.0.0.1 in the list of remote hosts.

Now set the thread group to run a single test and start it. You should see the results turn up in your controller's listener. This is both good and bad: good in that you can see the results, bad in that you have to decide how the results are going to be sent to you. There's not a lot of point in saturating the network with results instead of load testing your target. You can experiment with the remote batching configuration. My setup has the clients send summary reports, as I can check the back-end for correctness and am really only interested in failures.

Now I can change the test plan to run 50 threads, loop forever (each client will stop when it runs out of users to test) and set the thing going. Instant, sophisticated, repeatable load testing.

I can then run that for a while, repeating and checking each component as I go. Don't forget to check for O/S level errors, and to look in catalina.out, opencms.log and httpd/error.log.
