Saturday, November 8, 2008

Top PHP Scalability Mistakes

John Coggeshall, CTO of Automotive Computer Services and author of Zend PHP Certification Practice Book and PHP5 Unleashed, gave a talk at OSCON 2008 on the top 10 scalability mistakes. I wasn’t there, but he posted the slides for everybody to follow. Here are some lessons learned.

1. Define the scalability goals for your application. If you don’t know how many requests you’re shooting for, you don’t know whether you’ve built something that works, or how long it’s going to last you.
2. Measure everything: CPU usage, memory usage, disk I/O, network I/O and requests per second, the last one being the most important. If you don’t know the baseline, you don’t know whether you’ve improved.
3. Design your database with scalability in mind. Assume you’ll have to implement replication.
4. Do not rely on NFS for code sharing on a server farm. It’s slow and it has locking issues. Keeping one copy of the code and letting the rest of the servers load it via NFS might seem very convenient, but it doesn’t work in practice. Stick to tried-and-true tools like rsync. Keep the code local to the machine serving it, even if it means a longer push process.
5. Play around with I/O buffers. If you’ve got tons of memory, experiment with TCP buffer sizes; your defaults are likely to be set conservatively. See your tax dollars at work and use the Linux TCP Tuning guide. If your site is written in PHP, use the output buffering functions.
6. Use RAM disks for any data that’s disposable, but only if you have a lot of spare RAM lying around.
7. Optimize bandwidth consumption by enabling compression: mod_deflate for Apache, zlib.output_compression set to On for PHP sites, or Tidy content reduction for PHP+Tidy sites.
8. Configure PHP for speed. Turn off register_globals, magic_quotes_gpc, expose_php, register_argc_argv, always_populate_raw_post_data, session.use_trans_sid and session.auto_start, but leave auto_globals_jit on: it creates $_SERVER, $_ENV and $_REQUEST only when a script first uses them. In John’s example, session.gc_divisor is set to 10000 and output_buffering to 4096.
9. Do not use blocking I/O, such as fetching another remote page via curl while the request waits. Make all such calls non-blocking; otherwise the wait is something you can’t really optimize away. Rely on background scripts to pull down the data needed to process the request.
10. Don’t underestimate caching. If a page is cached for 5 minutes, and you get even 10 requests per second for a given page, that’s 3,000 requests your database doesn’t have to process.
11. Consider a PHP opcode cache. One will be available off-the-shelf with PHP6.
12. For content sites, consider taking static content out of the dynamic context. Let’s say you run a content site where the article content stays the same, while the rest of the page is personalized for each user, with a My Articles section and so on. Instead of pulling everything dynamically from the DB, consider generating another PHP file on the first request, with the article text stored as raw HTML and dynamic data pulled only for logged-in users. This way the generated PHP file pulls out only the data that’s actually dynamic.
13. Pay great attention to database design. Learn indexes and know how to use them properly. InnoDB outperforms MyISAM in almost all contexts, but doesn’t do full-text searching. (Use Sphinx if your search needs get out of control.)
14. Design PHP applications in an abstract way, so that the app never needs to know the IP address of the MySQL server. Logical host names such as ‘mysql-writer-db’ and ‘mysql-reader-db’ are all a PHP app needs.
15. Run external scripts that monitor system health. Have the scripts update the hosts file if things get out of control.
16. Do not make database connectivity decisions in PHP. Don’t spend time coding fallbacks for when your primary DB is down. Consider running MySQL Proxy to simplify DB connectivity.
17. For super-fast reads, consider SQLite, but don’t forget that it’s horrible with writes.
18. Use Keep-Alive properly. Use it when both static and dynamic files are served from the same server and you can control the timeouts, so that a pile of Keep-Alive connections doesn’t overwhelm your system. John’s rule? No Keep-Alive connection should last more than 10 seconds.
19. Monitor with familiar Linux commands such as iostat and vmstat. iostat monitors system input/output device loading by observing the time the devices are active in relation to their average transfer rates; its reports can be used to change the system configuration to better balance the I/O load between physical disks. vmstat reports information about processes, memory, paging, block I/O, traps and CPU activity.
20. Make sure you’re logging relevant information right away; otherwise, debugging issues is going to get tricky.
21. Prioritize your optimizations. Making code that runs on 2% of the pages 50% faster yields a 1% overall improvement. Making code that runs on 80% of the pages 10% faster yields an 8% overall improvement.
22. Use profilers. They draw pretty graphs and are generally easy to use.
23. Keep track of your system performance. Keep a spreadsheet of the common stats you track, so that you can say authoritatively how much performance you gained by getting a faster CPU, installing extra RAM, or upgrading your Linux kernel.
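Item 1 becomes actionable once the goal is a number. This minimal Python sketch turns a daily request target into a peak requests-per-second goal; the 8.64 million requests per day and the peak-to-average factor of 3 are made-up assumptions, not figures from John’s slides.

```python
def required_peak_rps(daily_requests: int, peak_factor: float = 3.0) -> float:
    """Turn a daily request target into the requests/second to sustain at peak.

    peak_factor is an assumed peak-to-average traffic ratio; measure your own.
    """
    seconds_per_day = 24 * 60 * 60  # 86,400
    average_rps = daily_requests / seconds_per_day
    return average_rps * peak_factor


# Hypothetical goal: 8.64 million requests per day -> 100 requests/second on average.
goal = required_peak_rps(8_640_000)
print(f"Design for at least {goal:.0f} requests/second at peak")
```

Once the peak figure is known, the requests-per-second measurements from item 2 tell you how far you are from it.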
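For item 2, the number that matters most is requests per second. The Python sketch below measures it in-process for a hypothetical render_page() function. It leaves out the web server and the network, so treat it as a baseline for one code path and use a load generator such as ApacheBench (ab) against the real site for end-to-end figures.

```python
import time
from typing import Callable


def requests_per_second(handler: Callable[[], object], duration: float = 1.0) -> float:
    """Call handler repeatedly for about `duration` seconds; return completed calls per second."""
    calls = 0
    start = time.perf_counter()
    deadline = start + duration
    while time.perf_counter() < deadline:
        handler()
        calls += 1
    return calls / (time.perf_counter() - start)


def render_page() -> str:
    # Hypothetical stand-in for the code path that serves one request.
    return "".join(f"<li>{i}</li>" for i in range(100))


baseline = requests_per_second(render_page, duration=0.5)
print(f"baseline: {baseline:.0f} requests/second")
```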
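Item 4’s rsync push can be as simple as a loop over the web servers. In this Python sketch the host names (web1, web2), the build directory and /var/www/app are hypothetical, and it assumes SSH access from the deploy machine to every server.

```python
import subprocess
from typing import List, Sequence


def rsync_command(source_dir: str, host: str, dest_dir: str) -> List[str]:
    """Build the rsync argv that mirrors source_dir onto one server's local disk."""
    return [
        "rsync",
        "-az",       # -a: archive mode (recursive, keeps permissions/times/links); -z: compress in transit
        "--delete",  # remove files on the server that are gone from the release
        source_dir.rstrip("/") + "/",  # trailing slash: copy the directory's contents, not the directory
        f"{host}:{dest_dir}",          # host:path makes rsync use a remote shell (ssh by default)
    ]


def push_release(source_dir: str, hosts: Sequence[str], dest_dir: str) -> None:
    """Copy the release to every web server in turn; stop at the first failure."""
    for host in hosts:
        subprocess.run(rsync_command(source_dir, host, dest_dir), check=True)


# Example (not run here): push_release("build", ["web1", "web2"], "/var/www/app")
print(rsync_command("build", "web1", "/var/www/app"))
```

Each server ends up with its own local copy, so PHP reads code from local disk at request time; the cost moves into the longer push, which is the trade-off item 4 accepts.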
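Item 7’s mod_deflate and zlib.output_compression both gzip the response when the browser says it can handle it. This Python sketch shows the idea on a repetitive, made-up HTML page; the Accept-Encoding check is deliberately simplified and ignores q-values.

```python
import gzip
from typing import Dict, Tuple


def maybe_compress(body: bytes, accept_encoding: str) -> Tuple[bytes, Dict[str, str]]:
    """Gzip the response body if the client's Accept-Encoding header allows it."""
    # Simplified check: a real server also honours q-values such as "gzip;q=0".
    if "gzip" in accept_encoding.lower():
        return gzip.compress(body), {"Content-Encoding": "gzip", "Vary": "Accept-Encoding"}
    return body, {}


# Made-up, highly repetitive HTML; markup like this compresses well.
page = ("<tr><td class='cell'>row</td><td>same markup again</td></tr>\n" * 500).encode()
compressed, headers = maybe_compress(page, "gzip, deflate")
print(len(page), "->", len(compressed), "bytes")
```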
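Item 9’s advice to let background scripts do the remote fetching can be sketched as two functions: one run from cron (or a daemon) that performs the slow call and writes a local file, and one called per request that only reads that file. The Python below is a minimal sketch; the file name and the fake fetcher are hypothetical stand-ins for the real remote call.

```python
import json
import tempfile
import time
from pathlib import Path
from typing import Callable


def refresh_cache(fetch: Callable[[], dict], cache_file: Path) -> None:
    """Run from cron or a daemon: the slow remote call happens here, outside any web request."""
    data = fetch()
    tmp = cache_file.with_name(cache_file.name + ".tmp")
    tmp.write_text(json.dumps({"fetched_at": time.time(), "data": data}))
    tmp.replace(cache_file)  # atomic rename: readers never see a half-written file


def handle_request(cache_file: Path) -> dict:
    """Called per web request: reads the local copy and never waits on the network."""
    try:
        return json.loads(cache_file.read_text())["data"]
    except FileNotFoundError:
        return {}  # nothing fetched yet: render the page without this data


# Demo: a lambda stands in for the real remote call (for example a curl request).
feed_file = Path(tempfile.mkdtemp()) / "feed.json"
before = handle_request(feed_file)
refresh_cache(lambda: {"headline": "hello"}, feed_file)
after = handle_request(feed_file)
print(before, after)
```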
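Item 10’s arithmetic is easy to check with a tiny time-based cache. This Python sketch replays 10 requests per second for 5 minutes against a cache with a 300-second lifetime, using a fake clock; the page key and load_page() are hypothetical, and a shared cache such as memcached would play this role across a server farm.

```python
import time
from typing import Any, Callable, Dict, Tuple


class TTLCache:
    """Tiny in-process cache: each key is recomputed at most once per ttl seconds."""

    def __init__(self, ttl: float, clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl
        self.clock = clock
        self._store: Dict[str, Tuple[float, Any]] = {}

    def get(self, key: str, compute: Callable[[], Any]) -> Any:
        now = self.clock()
        hit = self._store.get(key)
        if hit is not None and now - hit[0] < self.ttl:
            return hit[1]
        value = compute()  # miss or expired entry: the only time the database is hit
        self._store[key] = (now, value)
        return value


fake_now = 0.0
cache = TTLCache(ttl=300, clock=lambda: fake_now)
db_queries = 0


def load_page() -> str:
    global db_queries
    db_queries += 1  # stands in for the database work behind one page
    return "<html>...</html>"


for i in range(10 * 300):  # 10 requests/second for 300 seconds
    fake_now = i / 10      # request i arrives at t = i/10 seconds
    cache.get("/article/42", load_page)

print(db_queries, "database query for", 10 * 300, "requests")
```

Only the first request reaches the database, so 2,999 of the 3,000 requests are served from the cache.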
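Item 12 describes generating a file on the first request that holds the article as raw HTML and leaves holes only for per-user data. This Python sketch uses string.Template files in place of generated PHP files; the cache directory, file naming and fake_load_article() are hypothetical.

```python
import html
import string
import tempfile
from pathlib import Path
from typing import Callable, List


def page_for(article_id: int, user: str, cache_dir: Path,
             load_article: Callable[[int], str]) -> str:
    """Serve an article page whose static body is baked into a file on first request."""
    template_file = cache_dir / f"article-{article_id}.tmpl"
    if not template_file.exists():
        # First request only: fetch the unchanging article HTML and store it in the file.
        body = load_article(article_id).replace("$", "$$")  # escape $ for string.Template
        tmp = template_file.with_name(template_file.name + ".tmp")
        tmp.write_text(f"<div id='my-articles'>$user_box</div>\n<article>{body}</article>\n")
        tmp.replace(template_file)  # atomic rename so readers never see a partial file
    template = string.Template(template_file.read_text())
    # Every request: only the personalized part is computed.
    return template.substitute(user_box=f"My Articles for {html.escape(user)}")


db_calls: List[int] = []


def fake_load_article(article_id: int) -> str:
    db_calls.append(article_id)  # stands in for the database query
    return "<p>Article text that mentions $5.</p>"


cache_dir = Path(tempfile.mkdtemp())
first = page_for(7, "alice", cache_dir, fake_load_article)
second = page_for(7, "bob", cache_dir, fake_load_article)
print(second)
```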
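To see what item 13 means by using indexes properly, ask the database how it plans to run a query. This Python sketch uses an in-memory SQLite database as a stand-in for MySQL (where EXPLAIN plays the same role); the articles table and the index name are hypothetical.

```python
import sqlite3


def query_plan(conn: sqlite3.Connection, sql: str) -> str:
    """Return SQLite's EXPLAIN QUERY PLAN description for a statement."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(str(row[-1]) for row in rows)  # last column holds the description


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE articles (id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT)")
conn.executemany(
    "INSERT INTO articles (author_id, title) VALUES (?, ?)",
    [(i % 100, f"title {i}") for i in range(1000)],
)

lookup = "SELECT title FROM articles WHERE author_id = 42"
before = query_plan(conn, lookup)  # no index yet: full table scan
conn.execute("CREATE INDEX idx_articles_author ON articles (author_id)")
after = query_plan(conn, lookup)   # index lookup on author_id
print("before:", before)
print("after: ", after)
```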
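Item 14’s logical host names can be the only database addresses the application knows about. This Python sketch routes read-only statements to mysql-reader-db and everything else to mysql-writer-db. The prefix check is a deliberate simplification, and reads that must see a just-committed write are better sent to the writer, because replicas can lag.

```python
from typing import Dict

# The only database addresses the application knows: the logical names from item 14.
# DNS or the hosts file decides which machine each name points at.
DB_HOSTS: Dict[str, str] = {"write": "mysql-writer-db", "read": "mysql-reader-db"}

# Simplification: a locking read such as SELECT ... FOR UPDATE really needs the writer.
READ_ONLY_PREFIXES = ("SELECT", "SHOW", "EXPLAIN")


def host_for(sql: str) -> str:
    """Send read-only statements to the replica name and everything else to the writer."""
    role = "read" if sql.lstrip().upper().startswith(READ_ONLY_PREFIXES) else "write"
    return DB_HOSTS[role]


print(host_for("SELECT title FROM articles"))       # mysql-reader-db
print(host_for("UPDATE articles SET title = 'x'"))  # mysql-writer-db
```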
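Item 15’s monitoring scripts can be as simple as: probe each replica, pick a healthy one, and rewrite the hosts-file line for mysql-reader-db. This Python sketch assumes a TCP connect to MySQL’s default port 3306 is an acceptable health check and uses made-up 10.0.0.x addresses. It builds the new hosts text rather than writing /etc/hosts, which needs root.

```python
import socket
from typing import Callable, Dict, List


def port_open(ip: str, port: int = 3306, timeout: float = 1.0) -> bool:
    """Crude health check: can a TCP connection to the MySQL port be opened?"""
    try:
        with socket.create_connection((ip, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable or timed out
        return False


def pick_reader(replicas: List[str], healthy: Callable[[str], bool]) -> str:
    """Return the first healthy replica; refuse to guess if none are up."""
    for ip in replicas:
        if healthy(ip):
            return ip
    raise RuntimeError("no healthy replica; leave the hosts file alone and alert an operator")


def rewrite_hosts(existing: str, mapping: Dict[str, str]) -> str:
    """Replace the entries for the managed names and keep every other hosts line."""
    managed = set(mapping)
    kept = []
    for line in existing.splitlines():
        names = line.split("#", 1)[0].split()[1:]  # fields after the IP, ignoring comments
        if not managed.intersection(names):
            kept.append(line)
    kept += [f"{ip}\t{name}" for name, ip in sorted(mapping.items())]
    return "\n".join(kept) + "\n"


hosts_before = "127.0.0.1\tlocalhost\n10.0.0.11\tmysql-reader-db\n"
# Simulate replica 10.0.0.11 failing its check; a real script would pass healthy=port_open.
reader = pick_reader(["10.0.0.11", "10.0.0.12"], healthy=lambda ip: ip != "10.0.0.11")
hosts_after = rewrite_hosts(hosts_before, {"mysql-reader-db": reader})
print(hosts_after)
```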
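Item 17’s split between fast reads and slow writes suggests a read-mostly pattern: a background job builds the SQLite file, and web requests open it read-only. SQLite lets only one writer at a time modify the whole database file, which is why it suits lookups better than write-heavy workloads. The Python sketch below uses a hypothetical countries table.

```python
import sqlite3
import tempfile
from pathlib import Path

# A background job builds (or periodically rebuilds) the lookup database.
db_path = Path(tempfile.mkdtemp()) / "countries.db"
build = sqlite3.connect(db_path)
with build:  # commits the transaction on success
    build.execute("CREATE TABLE countries (code TEXT PRIMARY KEY, name TEXT)")
    build.executemany("INSERT INTO countries VALUES (?, ?)",
                      [("DE", "Germany"), ("FR", "France"), ("US", "United States")])
build.close()


def open_read_only(path: Path) -> sqlite3.Connection:
    """Open an existing SQLite file read-only via a URI (mode=ro); writes raise errors."""
    return sqlite3.connect(path.resolve().as_uri() + "?mode=ro", uri=True)


reader = open_read_only(db_path)
name = reader.execute("SELECT name FROM countries WHERE code = ?", ("FR",)).fetchone()[0]
print(name)

try:
    reader.execute("INSERT INTO countries VALUES ('IT', 'Italy')")
    write_blocked = False
except sqlite3.OperationalError:  # "attempt to write a readonly database"
    write_blocked = True
print("write blocked:", write_blocked)
```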
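Item 21’s numbers follow from multiplying the share of total request time a piece of code accounts for by how much less time it takes after optimization (the time-saved form of Amdahl’s law, reading "50% faster" as "50% less time"). This Python sketch assumes every page costs about the same to serve, so "2% of the pages" means 2% of total request time.

```python
def overall_gain(share_of_time: float, time_reduction: float) -> float:
    """Fraction of total request time saved when code taking `share_of_time` of it
    is made to run in `time_reduction` less time (0.5 means 50% less)."""
    return share_of_time * time_reduction


print(f"{overall_gain(0.02, 0.50):.0%}")  # 50% faster on 2% of the pages  -> 1%
print(f"{overall_gain(0.80, 0.10):.0%}")  # 10% faster on 80% of the pages -> 8%
```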
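For item 22, PHP has profilers such as Xdebug, whose output can be viewed as call graphs in tools like KCachegrind. The same workflow in Python, using the standard library’s cProfile, looks like this; slow_lookup() is a hypothetical hot spot planted so that it shows up near the top of the report.

```python
import cProfile
import io
import pstats


def slow_lookup(n: int) -> int:
    # Deliberately quadratic so it stands out in the profile.
    return sum(1 for i in range(n) for j in range(n) if i == j)


def handle_request() -> int:
    return slow_lookup(300) + len(str(list(range(1000))))


profiler = cProfile.Profile()
profiler.enable()
handle_request()
profiler.disable()

report = io.StringIO()
# Sort by cumulative time so the most expensive call paths are listed first.
pstats.Stats(profiler, stream=report).sort_stats("cumulative").print_stats(5)
print(report.getvalue())
```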
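Item 23’s spreadsheet can start life as a CSV file that any spreadsheet program opens. In this Python sketch the change labels and the requests-per-second figures (200, 230, 260) are invented placeholders, and every row is compared against the first, baseline row.

```python
import csv
import tempfile
from pathlib import Path
from typing import Dict


def record(log: Path, change: str, rps: float) -> None:
    """Append one measurement; write the header row the first time."""
    new_file = not log.exists()
    with log.open("a", newline="") as f:
        writer = csv.writer(f)
        if new_file:
            writer.writerow(["change", "requests_per_second"])
        writer.writerow([change, rps])


def gain_vs_baseline(log: Path) -> Dict[str, float]:
    """Percent change in requests/second of every row relative to the first (baseline) row."""
    with log.open(newline="") as f:
        rows = list(csv.DictReader(f))
    baseline = float(rows[0]["requests_per_second"])
    return {row["change"]: (float(row["requests_per_second"]) / baseline - 1) * 100
            for row in rows}


stats_file = Path(tempfile.mkdtemp()) / "perf.csv"
record(stats_file, "baseline", 200)  # invented numbers for illustration
record(stats_file, "extra RAM", 230)
record(stats_file, "faster CPU", 260)
gains = gain_vs_baseline(stats_file)
print(gains)
```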

Reader's kind attention: the articles in this blog may be taken from other web sites, as the main intention of this blog is to bring all sides of web technologies under a single roof. If you find a duplicate or copy of your article in this blog and want it removed, kindly inform me and I will remove it. Alternatively, if you want me to link back to your site from the article, that can also be done.

Thanks,
Webnology Blog Administrator