Web Application Performance Optimization: Brainstorming Results
Recently one of my friends asked me to share a few points about performance optimization that he needs to add to a lecture. Here are the major points I could think of. Feel free to add another point in the comments (I will add it to the post with thanks)!
Web application performance can be optimized at different tiers. I will share some points for each tier. Some are simple to-do items; others are a little advanced and require a protocol-level understanding of the web.
Web Browser
- Compress everything that moves from server to browser (CSS, HTML, JS, etc.). See https://developers.google.com/speed/articles/gzip for details.
- Use AJAX where possible to avoid page refreshes, reduce load on the server, and provide a better user experience.
- Minify JS and CSS; see various tools here: http://www.devcurry.com/2009/11/7-free-tools-to-minify-your-scripts-and.html
- Let the browser cache resources that you believe will not be updated soon (using HTTP headers), e.g. CSS, images, JS, etc.
- Merge JS files into one; see http://www.lateralcode.com/combine-javascript-jmerge/
- The IETF recently standardized the WebSocket protocol (see http://tools.ietf.org/html/rfc6455). It provides socket-based two-way communication between browser and server, which is ideal for apps that require fast updates (e.g. chat, stocks, games, etc.). Make sure your web server and target web browsers support it. The W3C is almost done with its API standardization (see http://www.w3.org/TR/websockets/).
- Google recently developed the SPDY protocol for faster page loads (see the performance results in http://dev.chromium.org/spdy/spdy-whitepaper). It uses compression, multiplexing, and prioritization to increase page load speed. The latest versions of Chrome and Mozilla Firefox support it. The current version of HTTP is 1.1, and SPDY is expected to be added in HTTP 2.0 (ref: http://bit.ly/yvr44B). The IETF draft is available here: http://bit.ly/zHgp5z
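To get a feel for the compression and browser-caching points above, here is a minimal Python sketch. It assumes gzip as the compression scheme and a one-day cache lifetime. The helper names `compress_body` and `static_asset_headers` are made up for illustration, and in practice your web server or framework would normally do this for you through configuration.

```python
import gzip


def compress_body(body: bytes) -> bytes:
    """Gzip-compress a response body, as a server would before sending it."""
    return gzip.compress(body)


def static_asset_headers(max_age_seconds: int = 86400) -> dict:
    """Build headers that let the browser cache a gzipped static asset.

    The one-day default is an arbitrary assumption; pick a value per resource.
    """
    return {
        "Content-Encoding": "gzip",
        "Cache-Control": f"public, max-age={max_age_seconds}",
    }


# A repetitive stylesheet stands in for a real CSS file (hypothetical content).
css = b"body { margin: 0; padding: 0; font-family: sans-serif; }\n" * 200
compressed = compress_body(css)

print(f"original: {len(css)} bytes, gzipped: {len(compressed)} bytes")
print(static_asset_headers())
```

Text assets such as CSS, HTML, and JS are highly repetitive, which is why they tend to shrink a lot; already-compressed formats such as JPEG or PNG usually gain little from gzip.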
Web Server
Web server performance generally depends on its configuration, the application architecture, and the algorithms implemented. For example, I am using Tomcat for one of my production sites, and the following are the things I care about. (I would love it if you shared your experience with Apache, Passenger, IIS, etc.)
- How many concurrent web requests are coming to my site, and how many the server is capable of handling.
- The amount of memory allocated to Tomcat (or Apache).
- Whether the server is thread-based or process-based (process-based servers are largely a thing of the past, but you should still be careful about this).
- Disabling hot deployment also improves performance (the server no longer needs to run threads that continuously check for resource updates).
- The OS on which you run the web server also matters (though this is a generic point that applies to all server-side components).
- Be careful to choose the right architecture for the web tier: use better algorithms, remember that MVC may not always be the best fit, choose a good software design for the given project, and when you have alternatives (e.g. for image processing) use libraries that are well proven from a performance perspective.
- Some data may be available on an external server and publicly accessible, for example user images fetched from their FB profiles using the Graph API. Fetching it on every request may slow your page load, so always profile such use cases. Copy that data (images) to your own server once and serve it from there. Do not lose your users because someone else's servers are slow.
- Cluster your application servers and add a load balancer that distributes requests based on each server's load.
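As a rough illustration of load-based distribution, here is a small Python sketch of a "least loaded" picker. It assumes that load means the number of in-flight requests per server, which is only one possible measure. The `Backend` and `LeastLoadedBalancer` names and the server names are hypothetical; a real deployment would use a dedicated load balancer rather than application code like this.

```python
from dataclasses import dataclass


@dataclass
class Backend:
    """One server in the cluster, tracked by its in-flight request count."""
    name: str
    active_requests: int = 0


class LeastLoadedBalancer:
    """Route each request to the backend with the fewest in-flight requests."""

    def __init__(self, backends):
        self.backends = list(backends)

    def acquire(self) -> Backend:
        # min() returns the first backend with the lowest count,
        # so ties are broken by list order.
        backend = min(self.backends, key=lambda b: b.active_requests)
        backend.active_requests += 1
        return backend

    def release(self, backend: Backend) -> None:
        # Call this when the request has finished.
        backend.active_requests -= 1


balancer = LeastLoadedBalancer(
    [Backend("app-1"), Backend("app-2"), Backend("app-3")]
)
first = balancer.acquire()
second = balancer.acquire()
third = balancer.acquire()
balancer.release(second)     # app-2 finishes its request early
fourth = balancer.acquire()  # so the next request goes back to app-2

print([b.name for b in (first, second, third, fourth)])
# prints ['app-1', 'app-2', 'app-3', 'app-2']
```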
Application Server
- Using better algorithms and the right architectural components also applies here.
- Use caching at the application level for data that is fetched from the DB. Fetching data from persistent storage (e.g. the DB) may slow your web application if the number of hits is high.
- Optimize your caching service as well. There are multiple strategies, e.g. first in first out, most recently used, etc. Implement the right strategy. Some data elements may need more storage than others, so allocate cache memory to the different types of data based on profiling/analytics instead of a random guess.
- Some data is used very frequently (cities list, countries list, categories of different items, products list, heads of accounts, etc.); you must cache such data at the web server layer.
- Understand the domain well and use the right components for the solution, e.g. you may be using a DB where a queuing server is the right solution.
- Do not synchronously process things that take a long time (say more than 200ms). Schedule such jobs, process them asynchronously, and notify the user later. This improves both the user experience and system throughput.
- Interacting with an external server always takes a little longer than processing the request locally. When you need to interact with external servers (Facebook server, FTP server, Google services, etc.), use optimized and well-tested APIs.
- Some tasks can be done when user requests are low. For example, you may process reports, send bulk emails, etc. from 1am to 3am when load is very low. So use the scheduler intelligently, and do not schedule jobs in peak hours.
- A relational database may not be the only right solution for persistent storage. Think of NoSQL DBs, directory servers, etc. as well. The choice depends on how big your infrastructure is and on the nature of the data you need to persist.
- Cluster your application servers and add a load balancer that distributes requests based on each server's load.
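To make the application-level caching points more concrete, here is a minimal Python sketch of a bounded cache in front of a slow lookup. It uses least-recently-used (LRU) eviction purely as one example strategy; the right one depends on your access pattern. `LRUCache`, `load_cities`, and the sample data are hypothetical stand-ins, with `load_cities` playing the role of a DB query.

```python
from collections import OrderedDict


class LRUCache:
    """Bounded cache that evicts the least recently used entry when full."""

    def __init__(self, capacity, loader):
        self.capacity = capacity
        self.loader = loader          # called on a cache miss
        self.entries = OrderedDict()  # oldest entry first, newest last
        self.misses = 0

    def get(self, key):
        if key in self.entries:
            # Cache hit: mark the entry as most recently used.
            self.entries.move_to_end(key)
            return self.entries[key]
        # Cache miss: load from the slow source and remember the result.
        self.misses += 1
        value = self.loader(key)
        self.entries[key] = value
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # drop the least recently used
        return value


def load_cities(country_code):
    """Hypothetical stand-in for a slow DB query."""
    data = {"FR": ["Paris", "Lyon"], "US": ["New York"], "DE": ["Berlin"]}
    return data.get(country_code, [])


cache = LRUCache(capacity=2, loader=load_cities)
cache.get("FR")  # miss, loaded from the "DB"
cache.get("US")  # miss
cache.get("FR")  # hit, FR becomes the most recently used entry
cache.get("DE")  # miss, cache is full so US is evicted

print(cache.misses, list(cache.entries))
# prints 3 ['FR', 'DE']
```

Counting misses like this is also a cheap way to get the profiling data mentioned above before deciding how much memory each kind of data deserves.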
Database Server
- A normalized DB design is good, but sometimes we need to denormalize to improve performance.
- Make sure that table columns used in WHERE clauses are indexed. This can improve fetch performance dramatically, especially on large tables, because the DB no longer has to scan every row.
- If more than one column appears in the SQL WHERE clause, consider a composite index covering those columns.
- Use relationships to ensure data integrity (though this has little to do with performance).
- Optimize the DB server configuration for your application (e.g. buffer sizes, number of concurrent connections, locking, engine type, query timeouts, query cache size, read buffer size, thread cache size, etc.).
- Use DB replication when needed. You may send updates to one server and read queries to the other servers.
- For very large applications, you may need clustering at DB level too.
- Use stored procedures where appropriate. They can improve performance because they are processed once and run on the DB server, where the data resides.
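The indexing advice above can be checked on a small scale. The sketch below uses Python's built-in `sqlite3` module with an in-memory database and asks SQLite for its query plan before and after adding a composite index. The `users` table, its columns, the index name, and the sample rows are all assumptions made for illustration; other databases have their own EXPLAIN syntax and the exact plan wording varies between SQLite versions.

```python
import sqlite3


def query_plan(conn, sql, params=()):
    """Return SQLite's plan for a query as a single readable string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The human-readable detail is the last column of each plan row.
    return " | ".join(row[-1] for row in rows)


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users ("
    "id INTEGER PRIMARY KEY, name TEXT, country TEXT, city TEXT)"
)
conn.executemany(
    "INSERT INTO users (name, country, city) VALUES (?, ?, ?)",
    [(f"user{i}", f"country{i % 10}", f"city{i % 50}") for i in range(300)],
)

sql = "SELECT name FROM users WHERE country = ? AND city = ?"
params = ("country3", "city13")

# Without an index, SQLite has to scan the whole table.
plan_before = query_plan(conn, sql, params)

# A composite index covers both columns used in the WHERE clause.
conn.execute("CREATE INDEX idx_users_country_city ON users (country, city)")
plan_after = query_plan(conn, sql, params)

print("before:", plan_before)
print("after: ", plan_after)
```

The plan should change from a full table scan to a search that uses `idx_users_country_city`. Running the same kind of check against your own slow queries is a quick way to confirm that an index is actually being used.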