Performance Tuning Example
Contents |
Performance Tuning by Example
Please note that this page is being created now (5 feb 2008) and it is probably not worth reading until I've finished my first edits
Rather than a list of recipes to follow, this page takes you through a real world example.
If you have any disagreements or suggestions about performance tuning, please feel free to add them to the page, but for the convenience of anyone that is following this, mark them as suggestions - please don't just change the text.
Starting Point
There are already a bunch of tutorials out there about getting an initial configuration. At the start of this process, I expect that you will have verything setup and working to your satisfaction.
We have the following major components:
1. apache-tomcat-5.5.25 2. httpd-2.2.3-7 3. OpenCMS 7.03 4. mysql-server-5.0.22-2.1
We have used the Alkacon documentation to setup apache in front of OpenCMS using mod_proxy. We do have a slightly weird naming convention as (for historical reasons) we separate our 'public' and 'customer' sites such that the public site is totally exported while the customer site is mediated by opencms.
Thus we have something like:
/ -> mod_rewrite /export/aunz /portal -> mod_proxy -> opencms /export/aunz (all static pages) /export/system /resources
It may be easier to see what we're doing by looking at the /etc/httpd/conf.d/opencms.conf file:
Invalid language.
You need to specify a language like this: <source lang="html">...</source>
Supported languages for syntax highlighting:
actionscript, ada, apache, applescript, asm, asp, autoit, bash, blitzbasic, bnf, c, c_mac, caddcl, cadlisp, cfdg, cfm, cpp, cpp-qt, csharp, css, d, delphi, diff, div, dos, eiffel, fortran, freebasic, gml, groovy, html4strict, idl, ini, inno, io, java, java5, javascript, latex, lisp, lua, matlab, mirc, mpasm, mysql, nsis, objc, ocaml, ocaml-brief, oobas, oracle8, pascal, perl, php, php-brief, plsql, python, qbasic, reg, robots, ruby, sas, scheme, sdlbasic, smalltalk, smarty, sql, tcl, text, thinbasic, tsql, vb, vbnet, vhdl, visualfoxpro, winbatch, xml, z80
Tuning Setup
The first thing to do in any tuning is to get a sense of what you want your site to do. I would imagine that scalability, performance and stability are import to you. They are to me.
We are aiming to have the site tuned for 1,000 concurrent users hitting a dynamically generated page with back-end data every 2 seconds. This doesn't sound like much, but we have a 1970s legacy system that we're integrating with (anyone want a job as a UniBasic Programmer?), so this is pretty significant load against the legacy system. Fortunately there are not that many pages that require a back-end hit, so for the other pages we want 10,000 concurrent users (still hitting a page every 2 seconds).
We're going to start by tuning the non-back-ended pages.
Before we start actual tuning there are some measurement tools that are going to be important to us.
Firstly, a naive http load. This is just users clicking on links. Great for getting a sense of the simplest type of load. The tool for generating this kind of load is provided by apache in the httpd package. If you have httpd, you probably have apache bench (or ab). For the more complicated loads we'll be using jmeter (but more on that later).
apache bench on the command line:
ab -c 60 -n1000 http://dev1.local/portal/index.html
(don't forget to change your derver name)
This will make 60 concurrent connections and request the page http://dev1.local/portal/index.html up to a maximum of 10000 pages.
This gives you a baseline. Save it.
My Baseline is:
SServer Software: Apache-Coyote/1.1 Server Hostname: dev1.local Server Port: 80 Document Path: /portal/index.html Document Length: 5986 bytes Concurrency Level: 10 Time taken for tests: 95.435580 seconds Complete requests: 1000 Failed requests: 0 Write errors: 0 Total transferred: 6260000 bytes HTML transferred: 5986000 bytes Requests per second: 10.48 [#/sec] (mean) Time per request: 954.356 [ms] (mean) Time per request: 95.436 [ms] (mean, across all concurrent requests) Transfer rate: 64.05 [Kbytes/sec] received Connection Times (ms) min mean[+/-sd] median max Connect: 0 0 0.1 0 3 Processing: 96 952 2084.1 676 36504 Waiting: 95 951 2084.1 675 36503 Total: 96 952 2084.1 676 36504 Percentage of the requests served within a certain time (ms) 50% 676 66% 846 75% 931 80% 996 90% 1360 95% 1884 98% 3008 99% 8735 100% 36504 (longest request)
Tuning MySQL
OK, so now that all of that is out of the way let's start looking at the back of the stack. Now I should point out here that I am not an expert in MySQL. However there are loads of them out there and lots of them write webpages. Yipee! Thanks lazyweb.
For background take a look at [1] and [2].
There are also useful tools such as:
So for our purposes I'm just going to show the changes I made to my.cnf:
[mysqld] datadir=/u01/mysql socket=/u01/mysql/mysql.sock user=mysql # Default to using old password format for compatibility with mysql 3.x # clients (those using the mysqlclient10 compatibility package). old_passwords=1 skip-locking table_cache = 256 # Try number of CPU's*2 for thread_concurrency thread_concurrency = 4 # enable the slow query log, default 10 seconds log-slow-queries # log queries taking longer than 5 seconds long_query_time = 5 # log queries that don't use indexes even if they take less than long_query_time log-queries-not-using-indexes # Add an in memory cache so that we can keep the content in memory query_cache_size = 32M # Increase packet size limit due to OpenCms requirements max_allowed_packet=32M # Setup some reasonable defaults set-variable=max_connections=300 set-variable=wait_timeout=500 max_connect_errors = 100 # Start a thread cache thread_cache = 30 # Start Replication Config Settings log-bin=mysql-bin server-id=1 innodb_flush_log_at_trx_commit=1 sync_binlog=1 # End Replication Config Settings [mysql.server] user=mysql basedir=/u01/mysql [mysqld_safe] log-error=/var/log/mysqld.log pid-file=/var/run/mysqld/mysqld.pid [client] user=mysql socket=/u01/mysql/mysql.sock
Feel free to ignore the replication section - as we're running a backup hot standby server as well. Oh and you probably have different paths - I keep my data on a separate file system called /u01.
More interesting is the section for loggin slow queries - keep an eye on these for later tuning work.
You can see the section:
; Setup some reasonable defaults set-variable=max_connections=300 set-variable=wait_timeout=500 max_connect_errors = 100
Max connections stops mysql from going crazy and eating all the resources on the machine. For our expected load of 1000 concurrent users, 300 connections will probably be sufficient - but we'll see later.
OpenCMS connects to MySQL using the apache commons code dbcp. This seems to take a while to connect even while running locally, so we're upped the wait_timeout to 500. This is half a second and should be fine. If you start seeing errors in your catalina logs complaining about " Failed to write locks to database " then this is an area to look at.
Max connect errors is a bit of a magical incantation. While you would expect that an error either happens or it doesn't we actually allow a larger than standard number of errors to occur before we kill a connection. This is to allow a bit more time for mysql to recover from transient issues than the default of max_connect_errors = 10
After MySQL Tuning:
Server Software: Apache-Coyote/1.1 Server Hostname: dev1.local Server Port: 80 Document Path: /portal/index.html Document Length: 5986 bytes Concurrency Level: 10 Time taken for tests: 53.987987 seconds Complete requests: 1000 Failed requests: 0 Write errors: 0 Total transferred: 6260000 bytes HTML transferred: 5986000 bytes Requests per second: 18.52 [#/sec] (mean) Time per request: 539.880 [ms] (mean) Time per request: 53.988 [ms] (mean, across all concurrent requests) Transfer rate: 113.23 [Kbytes/sec] received Connection Times (ms) min mean[+/-sd] median max Connect: 0 71 107.2 0 338 Processing: 52 466 2258.0 228 47924 Waiting: 0 442 2260.6 213 47910 Total: 52 538 2258.5 372 47924 Percentage of the requests served within a certain time (ms) 50% 372 66% 380 75% 381 80% 383 90% 508 95% 674 98% 1239 99% 6706 100% 47924 (longest request)
So, pretty reasonable performance for at least a single page requested a lot of times. Don't forget we are currently asking mysql for this page on every hit as we have no http or tomcat caching yet.
Also remember that we are doing a single select a lot of times - your mysql tools might say that you are over allocating memory to caches at the moment, but (unless you have a single page on your site) this is not a very good approximation of reality yet.
App Server
So let's move though the stack to the application server. This is tuning both OpenCMS and Tomcat. The first thing to do is to look at the connection between OpenCMS and the database. This is a component from the apache commons project called DataBase Connection Pool or commons-dbcp.
It is configured in ${OPENCMS_HOME}/WEB-INF/config/opencms.properties
Invalid language.
You need to specify a language like this: <source lang="html">...</source>
Supported languages for syntax highlighting:
actionscript, ada, apache, applescript, asm, asp, autoit, bash, blitzbasic, bnf, c, c_mac, caddcl, cadlisp, cfdg, cfm, cpp, cpp-qt, csharp, css, d, delphi, diff, div, dos, eiffel, fortran, freebasic, gml, groovy, html4strict, idl, ini, inno, io, java, java5, javascript, latex, lisp, lua, matlab, mirc, mpasm, mysql, nsis, objc, ocaml, ocaml-brief, oobas, oracle8, pascal, perl, php, php-brief, plsql, python, qbasic, reg, robots, ruby, sas, scheme, sdlbasic, smalltalk, smarty, sql, tcl, text, thinbasic, tsql, vb, vbnet, vhdl, visualfoxpro, winbatch, xml, z80
I've selected 200 concurrent connections as maximum because I am expecting a pretty high concurrent load on the system. I'll be looking in the logs for connection pool issues such as 'unable to create locks' indicating that the pool is becoming exhausted.
You will also remember above that we configured a maximum number of mysql connections to 300. Try to make sure that dbcp is always less than the mysql connections for obvious reasons. If you increase db.pool.default.maxActive then go back and check you my.cnf again.
There are a bunch of other details in opencms.properties for dbcp that I'm not going to go into. They may go into a separate post called Failing Elegantly.
Flex Cache
One of the truisms of performance is to shorten the data path if possible. This means that if you can reuse data then do. Unfortunately a bunch of pages in our site are dynamic. Or are they?
Interesting question. But let's put it aside for a moment because we already know that there is a chunk of public static pages.
So we can cache all of those - but we won't get much of a performance gain as they're already going to be written out to the static area. But we may as well. In our case we have a 4GB machine, so we have memory to burn.
As a side issue - you might want to check out the FlexCache administration area in the OpenCMS Workspace to ensure that you have allocated enough memory to the flexcache. There's not a lot of use of having an 8MB flexcache if you have 16MB of pages to cache.
I'm also not going to repeat the flexcache documentation that you can see elsewhere - if you don't know the difference between caching on 'always' versus 'uri' then read the documentation.
So - we've set the page bodies to be cache 'always'.
We designed the pages for flexcaching - separating the template items such as navigation, headers, footers etc into separate files. All of these template items can also be cached - generally with the 'uri' variation caching.
After all this caching, let's run the tests again.
Server Software: Apache-Coyote/1.1 Server Hostname: www.nutrimetics.com.au Server Port: 80 Document Path: /portal/opportunity/become_a_consultant/new_car/ Document Length: 7595 bytes Concurrency Level: 10 Time taken for tests: 15.280626 seconds Complete requests: 1000 Failed requests: 0 Write errors: 0 Total transferred: 7869000 bytes HTML transferred: 7595000 bytes Requests per second: 65.44 [#/sec] (mean) Time per request: 152.806 [ms] (mean) Time per request: 15.281 [ms] (mean, across all concurrent requests) Transfer rate: 502.86 [Kbytes/sec] received Connection Times (ms) min mean[+/-sd] median max Connect: 0 0 0.0 0 0 Processing: 33 151 159.3 111 1488 Waiting: 32 151 159.3 111 1487 Total: 33 151 159.3 111 1488 Percentage of the requests served within a certain time (ms) 50% 111 66% 135 75% 157 80% 176 90% 242 95% 323 98% 563 99% 1369 100% 1488 (longest request)
Ouch! that's looking better. There is an ititial hit while creating the page and then it is super fast.
Actually, let's try it again...
Server Software: Apache-Coyote/1.1 Server Hostname: www.nutrimetics.com.au Server Port: 80 Document Path: /portal/opportunity/become_a_consultant/new_car/ Document Length: 7595 bytes Concurrency Level: 10 Time taken for tests: 12.512816 seconds Complete requests: 1000 Failed requests: 0 Write errors: 0 Total transferred: 7869000 bytes HTML transferred: 7595000 bytes Requests per second: 79.92 [#/sec] (mean) Time per request: 125.128 [ms] (mean) Time per request: 12.513 [ms] (mean, across all concurrent requests) Transfer rate: 614.09 [Kbytes/sec] received Connection Times (ms) min mean[+/-sd] median max Connect: 0 0 0.0 0 0 Processing: 28 124 111.0 89 1044 Waiting: 28 124 111.0 89 1044 Total: 28 124 111.0 89 1044 Percentage of the requests served within a certain time (ms) 50% 89 66% 107 75% 127 80% 140 90% 194 95% 330 98% 556 99% 661 100% 1044 (longest request)
Because I ran it a second time, there was no hit to create the page in the flexcache.
Now what about those dynamic pages?
Well we've done this before. And so we designed the pages so that we separate the truly dynamic content from any shared content. So if we have a table based report with some explanation in it, the table content is in a separate jsp page that is included by a template. The explanation, decoration etc is in static pages. So we can flexcache everything possible.
I'm not going to show you an ab test against the legacy system, as it is too embarrassing. And probably not useful to you. Except to say that the flexcache is your friend and so you should try to use it as much as possible.
So we're pretty happy with a small number of concurrent users (an ab concurrency of 10 is probably 200 concurrent users hitting refresh once every 2 seconds).
From here on in, we're going to up the number of concurrent requests. Our estimate is that 200 concurrent users in ab is representative of 5000 real concurrent users hitting a page every 2 seconds or so. At least that seems consistent with what we see in junit.
ab -c 200 -n10000 http://dev1.local/portal/index.html SPLAT
Tomcat falls in a heap. What is happening?
Tomcat Tuning
Looking in the catalina.out log shows that we have thread exhaustion. Let's look at our server.xml file:
<Server port="8005" shutdown="SHUTDOWN" debug="0"> <Service name="Tomcat-Standalone"> <!-- Define a connector for the "public visible" server name --> <Connector className="org.apache.coyote.tomcat5.CoyoteConnector" port="8081" minProcessors="5" maxProcessors="75" proxyName="dev1.local" proxyPort="80" useBodyEncodingForURI="true" enableLookups="true" redirectPort="8443" acceptCount="100" debug="0" connectionTimeout="20000" useURIValidationHack="false" disableUploadTimeout="true" /> <!-- Define a non-SSL HTTP/1.1 Connector on port 8080 for debugging purposes only--> <Connector port="8080" maxHttpHeaderSize="8192" maxThreads="150" minSpareThreads="25" maxSpareThreads="75" enableLookups="false" redirectPort="8443" acceptCount="100" connectionTimeout="20000" disableUploadTimeout="true" /> <Engine name="Standalone" defaultHost="localhost" debug="0"> <Host name="localhost" debug="0" appBase="webapps" unpackWARs="true" autoDeploy="true"> <!-- You can change "unpackWARs" and "autoDeploy" to false after the installation is finished --> <Logger className="org.apache.catalina.logger.FileLogger" directory="logs" prefix="localhost_log." suffix=".txt" timestamp="true"/> </Host> </Engine> </Service> </Server>
As you can see we have two connectors running. The first is the proxy endpoint from Apache. The second is one we use internally for debugging and the manager application etc.
We're not giving tomcat enough resources to deal with our expected load.
Let's start by giving tomcat more memory:
in catalina.sh we add a CATALINA_OPTS line:
Invalid language.
You need to specify a language like this: <source lang="html">...</source>
Supported languages for syntax highlighting:
actionscript, ada, apache, applescript, asm, asp, autoit, bash, blitzbasic, bnf, c, c_mac, caddcl, cadlisp, cfdg, cfm, cpp, cpp-qt, csharp, css, d, delphi, diff, div, dos, eiffel, fortran, freebasic, gml, groovy, html4strict, idl, ini, inno, io, java, java5, javascript, latex, lisp, lua, matlab, mirc, mpasm, mysql, nsis, objc, ocaml, ocaml-brief, oobas, oracle8, pascal, perl, php, php-brief, plsql, python, qbasic, reg, robots, ruby, sas, scheme, sdlbasic, smalltalk, smarty, sql, tcl, text, thinbasic, tsql, vb, vbnet, vhdl, visualfoxpro, winbatch, xml, z80
Also we change our server.xml file:
<Server port="8005" shutdown="SHUTDOWN" debug="0"> <Service name="Tomcat-Standalone"> <!-- Define a connector for the "public visible" server name --> <Connector className="org.apache.coyote.tomcat5.CoyoteConnector" port="8081" minProcessors="5" maxThreads="500" minSpareThreads="15" maxSpareThreads="200" maxProcessors="200" proxyName="dev1.local" proxyPort="80" useBodyEncodingForURI="true" enableLookups="true" redirectPort="8443" acceptCount="200" debug="0" connectionTimeout="20000" useURIValidationHack="false" disableUploadTimeout="true" /> <!-- Define a non-SSL HTTP/1.1 Connector on port 8080 for debugging purposes only--> <Connector port="8080" maxHttpHeaderSize="8192" maxThreads="15" minSpareThreads="2" maxSpareThreads="10" enableLookups="false" redirectPort="8443" acceptCount="100" connectionTimeout="20000" disableUploadTimeout="true" /> <Engine name="Standalone" defaultHost="localhost" debug="0"> <Host name="localhost" debug="0" appBase="webapps" unpackWARs="true" autoDeploy="true"> <!-- You can change "unpackWARs" and "autoDeploy" to false after the installation is finished --> <Logger className="org.apache.catalina.logger.FileLogger" directory="logs" prefix="localhost_log." suffix=".txt" timestamp="true"/> </Host> </Engine> </Service> </Server>
And we'll run the ab test that failed:
Server Software: Apache-Coyote/1.1 Server Hostname: www.nutrimetics.com.au Server Port: 80 Document Path: /portal/index.html Document Length: 6005 bytes Concurrency Level: 200 Time taken for tests: 92.178489 seconds Complete requests: 10000 Failed requests: 0 Write errors: 0 Total transferred: 62140000 bytes HTML transferred: 60050000 bytes Requests per second: 108.49 [#/sec] (mean) Time per request: 1843.570 [ms] (mean) Time per request: 9.218 [ms] (mean, across all concurrent requests) Transfer rate: 658.32 [Kbytes/sec] received Connection Times (ms) min mean[+/-sd] median max Connect: 0 6 169.5 0 8997 Processing: 16 1828 9496.1 64 92026 Waiting: 16 1828 9496.0 64 92024 Total: 16 1834 9505.2 64 92039 Percentage of the requests served within a certain time (ms) 50% 64 66% 83 75% 105 80% 130 90% 533 95% 2363 98% 40927 99% 61524 100% 92039 (longest request)
OK - so first pass we're happy with the naive performance. There is still oddness in the 98%+ bracket where 2% of requests takes up to an order of magnitude longer to complete - but that may be an ab artefact.
Next we'll start to do some more complex testing using junit and monitoring the server performance with jmx.