Web applications for dummies
Introduction
This document deals with common issues encountered when moving from dedicated
applications to web applications.
General Tips
To check for syntax errors in a PHP script: php -l myscript.php
Performance Tips
- Use AJAX where users won't want to open multiple pages in tabs with
CTRL-click (eg. opening multiple articles at once to avoid waiting for each to load):
useful to reduce the amount of data to handle and send
- GET to read, POST to modify
- Write a prototype in PHP and then as a long-running process, and see
if the latter is faster
- If the application is DB-intensive, plan ahead so that it can scale
easily (load balancer, multiple web servers, multiple DB servers, multiple
image servers, data caching, etc.)
- Pay attention to any input from the user (XSS, SQL injection, etc.)
- Cache JavaScript + images in the browser (see the sketch below)
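For the last point, a dynamic PHP script can also ask the browser to cache what
it serves. A minimal sketch, where the one-week lifetime and the logo.png file
are arbitrary examples:

    <?php
    // Sketch: ask the browser to cache this response (lifetime is an example).
    $lifetime = 7 * 24 * 3600;  // one week, chosen arbitrarily
    header('Cache-Control: max-age=' . $lifetime);
    header('Expires: ' . gmdate('D, d M Y H:i:s', time() + $lifetime) . ' GMT');
    header('Content-Type: image/png');
    readfile('logo.png');  // hypothetical static asset
    ?>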
Things to watch out for when writing web applications
Here's a list of things to pay attention to when moving from the world of dedicated
applications to web applications:
- Don't ever trust user input. Whenever you receive data from the user
(usually through a posted form, or through GET variables), clean, check, and double-check
the data for validity. Don't ever assume that the user has put a valid value
in a form input field (see the sketch at the end of this list)
- Every page is a *different* run of your program. HTTP is a stateless
protocol. That means: Be careful when passing data from one page to another,
and be careful about race conditions
- (from "The no-framework PHP MVC framework") As for MVC, if you use it carefully,
it can be useful in a web application. Just make sure you avoid the temptation
of creating a single monolithic controller. A web application by its very
nature is a series of small discrete requests. If you send all of your requests
through a single controller on a single machine you have just defeated this
very important architecture. Discreteness gives you scalability and modularity.
You can break large problems up into a series of very small and modular
solutions and you can deploy these across as many servers as you like. You
need to tie them together to some extent most likely through some backend
datastore, but keep them as separate as possible. This means you want your
views and controllers very close to each other and you want to keep your
controllers as small as possible
- If using page-based applications, make sure that files that contain
sensitive data (eg. database connection info) are located in a protected
directory, out of the web server's /htdocs directory
- (from "Working with a stateless protocol") Use sessions to solve HTTP's
statelessness; use GET to read data, and POST to update data; use a single
script to GET a form and POST the update, as the relevant data live in a single
$_SESSION area; one page = one script; in sections off-limits to unauthorized
users, each page must check with the $_SESSION data whether the user has been
authenticated, and bump him back to the logon page otherwise; check that you
don't expose unnecessary information from the database such as primary keys;
make sure a transaction isn't replayed by clicking on the Back button (use a
stack to keep track of past transactions); watch out for a user opening multiple
windows of the browser (same set of cookies), or multiple browsers (different
sets of cookies), and use session data to tell them apart
- (from "Saving PHP's Session data to a database") If you need to run the application
on multiple servers with load balancing, sessions should be saved in a database
instead of session files on each computer's hard disk
- (from "Pagination - what it is and how to do it") Shows how to display a
large number of rows from a database
- (from "Back Button Blues") Disable the browser's cache, and maintain a program stack
(moving forward adds to the stack, going back removes from it)
- (from "Client Clones and Server Sessions") Remember that a user may open multiple windows
of his browser, or even different browsers on the same computer, all pointing
to the same application, so you must provide a link such as "New session"
to let the user start afresh in a new window by generating a new session
name, and setting it to a new session ID, so that each window gets its own
session; if you don't want to use session cookies, PHPSESSID can be passed
in the URL or in the form data as a hidden field
- HTTP is a stateless protocol, ie. each page is independent from the other
pages that make up your site, which causes issues when passing parameters
(eg. a connection to the DB made in page1, a "select" made in page2, etc.). Use cookies,
session IDs, or a persistence engine like ZODB (otherwise, restarting
the server would delete all the sessions).
From Quixote's documentation: "We used Quixote and the ZODB to fix this, setting a randomly generated
session cookie on Web browsers that access our site and using the cookie's
value to look up a corresponding Session object. Session objects are also
stored persistently in our ZODB and committed at the end of every HTTP transaction,
meaning that we can restart the Quixote server at any time to load updated
code, and we can reboot the machine without losing the collection of user
sessions."
- Random access: Users can bookmark a page and hit it directly. If using
an application server, make sure only public methods are callable, and only
by authenticated users
- Access control is a pain: A plain-vanilla web server can just allow/forbid
access to a given directory. More sophisticated tools like Zope offer enhanced
access-control
- Less client-side control, eg. no event triggered when the user moves the mouse
over a web page, unless you include some JavaScript in the page
- Back button: You can't forbid users from clicking on the Back button.
How do developers display an error when the user backs up?
- Incompatibilities and small differences between browsers. Don't make
your interface more sophisticated than necessary
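A minimal sketch tying several of these points together (one page = one script,
GET to read / POST to update, validate input, per-page authentication check).
The table, field, and session-variable names are invented for illustration, and
a DB connection is assumed to have been opened in an include:

    <?php
    // edit_article.php -- sketch only; names are invented for illustration.
    session_start();

    // Each restricted page checks $_SESSION and bumps the user to the logon page.
    if (empty($_SESSION['user_id'])) {
        header('Location: login.php');
        exit;
    }

    if ($_SERVER['REQUEST_METHOD'] == 'POST') {
        // Never trust user input: check it, then escape it before it reaches SQL.
        $title = isset($_POST['title']) ? trim($_POST['title']) : '';
        if ($title == '' || strlen($title) > 200) {
            die('Invalid title');
        }
        $title = mysql_real_escape_string($title);  // assumes an open connection
        mysql_query("UPDATE articles SET title='$title' WHERE id=" . (int)$_SESSION['article_id']);
        header('Location: edit_article.php');  // redirect so Back can't replay the POST
        exit;
    }

    // GET: just display the form.
    echo '<form method="post"><input name="title"><input type="submit"></form>';
    ?>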
Investigating slow performance
- Bandwidth: ping www.acme.com and see how fast the server replies
- Server: vmstat, iostat
- Web server: Upload a basic, "Hello, world!" HTML page
- PHP: Write a simple PHP script with start/end time to print "Hello,
world!"
- MySQL: Connect to the server with the mysql client, and use EXPLAIN,
SHOW STATUS, etc. to profile queries
- Needed? PHP + MySQL: Run the guilty queries identified previously
- uptime to check load average: If > 5, add more CPU
- ps aux: Check CPU time + RAM/virtual memory used by programs,
and their state (running, runnable, sleeping)
- pstree
- top: if lots of sleeping processes but CPU somewhat idle, probably due
to I/O blocks
General/hardware/OS
- Watch out for frequent page-outs, ie. if the server needs to move pages
from RAM to disk too often ("thrashing"). Use vmstat ("virtual memory statistics")
to check it, and top and
ps to identify the processes that are using the most memory
- /sbin/hdparm -d1 -Tt /dev/hda
- time and clock (to tell how long a process runs?)
/proc/stat
(From "/proc/stat explained"):
- cpu 2255 34 2290 22625563 6290 127 456 : user, nice, system,
idle, iowait, irq, softirq
- The "intr" line gives counts of interrupts serviced since
boot time, for each of the possible system interrupts. The first column
is the total of all interrupts serviced; each subsequent column is the total
for that particular interrupt.
- The "ctxt" line gives the total number of context switches
across all CPUs.
- The "processes" line gives the number of processes and threads
created, which includes (but is not limited to) those created by calls to
the fork() and clone() system calls.
- The "procs_blocked" line gives the number of processes currently
blocked, waiting for I/O to complete.
- In Linux 2.6, cpu lines have 8 numeric fields: The 8th column is called
steal_time. It counts the ticks spent executing other virtual hosts (in
virtualised environments like Xen)
(From Red
Hat Enterprise Linux 5.1 Deployment Guide)
/proc/interrupts records the number of interrupts per IRQ on the x86 architecture
/proc/ide/ contains information about IDE devices on the system. Each IDE
channel is represented as a separate directory, such as /proc/ide/ide0 and /proc/ide/ide1.
Many chipsets also provide a file in this directory with additional data concerning
the drives connected through the channels (eg. is DMA enabled, etc.) Within
each IDE channel directory is a device directory. The name of the device directory
corresponds to the drive letter in the /dev/ directory. For instance, the first
IDE drive on ide0 would be hda. Each device directory contains a collection
of information and statistics.
top/uptime
Load averages: The same information can be had by running "uptime".
The load average is the number of processes waiting in the run-queue plus the
number currently executing, averaged over 1-, 5-, and 15-minute periods.
If the number of running processes is significantly and steadily higher than
the number of CPUs (eg. if you have one CPU but a load average of 20), you're
in trouble. OTOH, if most of the processes are waiting to run, it might mean
that they're waiting for the DBMS to reply, the hard disk to complete a task,
or the network to send/receive data. Ideally, the load average should be
consistently lower than the number of CPUs you have.
Too much idle time means nothing is being done; too much system time indicates
a need for faster I/O or additional devices to spread the load. Watch these
numbers over time to determine what's normal for that system, and watch for
changes.
Watch for interrupts coming from peripherals to get a rough idea of
how much load the associated device is handling.
- us -- User CPU time: The time the CPU has spent running
users' processes that are not niced
- sy -- System CPU time: The time the CPU has spent running
the kernel and its processes
- ni -- Nice CPU time: The time the CPU has spent running
users' processes that have been niced
- wa -- iowait: Amount of time the CPU has been waiting for
I/O to complete
- hi -- Hardware IRQ: The amount of time the CPU has been
servicing hardware interrupts
- si -- Software Interrupts: The amount of time the CPU has
been servicing software interrupts.
vmstat -n 5
shows IO, swap, memory and processor resources
hdparm -Tt /dev/sda
To benchmark hard disk (and buffer cache) read speeds
iostat -dx 5 5
Information on disk I/O
mpstat -P ALL 5 5
Per-processor statistics
free
"Free reports on memory, both real and swap. You get a snapshot of the
amount of real memory split across programs sharing the same memory space (shared),
buffers used by the kernel (buffers) and what has been cached to disk. The "-/+"
line reflects the total vs. used memory as reflected by the combination of the
disk buffer cache and memory actually written to disk."
ps
netstat
netstat -nap --tcp (or netstat -napc --tcp for a continuous refresh)
time <application>
Resources
Apache
ps and top: to check the number of processes, and how hard they're working
logs: how many active connections the server is maintaining
HTTPD-Test's Flood
PHP
Just put a few calls to microtime() to check how much time is spent in the
different parts of a PHP page:
    $starttimer = microtime(true);  // Unix timestamp with microseconds

    $db = mysql_connect("localhost", "root", "test");
    mysql_select_db("mysql", $db);
    $result = mysql_query("SELECT * FROM user", $db);
    printf("Host: %s<br>\n", mysql_result($result, 0, "Host"));
    printf("User: %s<br>\n", mysql_result($result, 0, "User"));
    mysql_free_result($result);
    mysql_close($db);

    $stoptimer = microtime(true);
    $timer = round($stoptimer - $starttimer, 4);
    echo "Page created in $timer seconds.";
Other profiling tools: Benchmark, Xdebug, DBG, Advanced PHP Debugger (APD).
See also "Profiling PHP".
MySQL
- Find the queries that impact the server most (general log, slow queries
log)
- Check their execution plans with EXPLAIN
- Tune if necessary (SHOW STATUS/mysqladmin, SHOW PROCESSLIST, innotop,
FLUSH STATUS/SHOW SESSION STATUS, etc.)
Communication/latency? Database design? Write access? Read access? Caching?
Bad query design? What does EXPLAIN tell you? Do your databases have enough
memory in the different caches?
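As a sketch of the EXPLAIN step above, run from PHP (the connection parameters,
table, and column names are invented):

    <?php
    // Sketch: check the execution plan of a suspect query.
    $db = mysql_connect('localhost', 'user', 'password');  // invented credentials
    mysql_select_db('mydb', $db);
    $result = mysql_query('EXPLAIN SELECT * FROM articles WHERE author_id = 42', $db);
    while ($row = mysql_fetch_assoc($result)) {
        // type=ALL with a large "rows" estimate usually means a missing index
        printf("table=%s type=%s key=%s rows=%s\n",
               $row['table'], $row['type'], $row['key'], $row['rows']);
    }
    ?>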
Improving performance
ToCheck
Lighttpd is fine. It is recommended that you run both Apache and Lighttpd
on port 80, so you need to get separate IP addresses for each server. You may
want to read this article for more on this, specifically point 5:
http://www.websiteoptimization.com/speed/tweak/parallel/
Provided the bottleneck is definitely the server, and it's due to the process
overhead of spawning forked Apache copies, a lightweight server that has less
in it may actually spawn better (try Zeus).
A first list of things to try
- Investigate: iostat, vmstat, top, ps, netstat, xdebug, MySQL Explain,
etc.
- Profile PHP + MySQL; Check that MySQL doesn't have run-away processes
- Check system/application logs for possible hints
- The disk controller (either SCSI or RAID) should favor reads over writes,
as web servers send more output to clients than they receive input; use
enterprise-grade disks
- Check if a bigger swap partition would help
- Software RAID instead of HW RAID (why faster?)
- RAID1 (why not RAID0?)
- Optimize the filesystem (disable atime, etc. "The relatime option just
updates times if the access time is newer; it is similar to noatime but does
not break applications which need to know the last read time of a particular
file")
- Get a dual/quad CPU board (eg. Intel Harpertowns) + lots of RAM to avoid
swapping/thrashing
- Uninstall any unneeded program (CUPS, etc.)
- Good NIC, set up right (no autosensing); Organize switch/LAN to avoid
collisions; Each www server connected to shared DB server through cross-over
Ethernet?
- Check DNS/mail settings to avoid resolve and other configuration problems
(relaying, etc.)
- If sending lots of mail, consider a dedicated server
- Check total bandwidth at colo (switch, router, etc.)
- In PHP, don't include unnecessarily heavy add-on's
- Resize uploaded pictures when writing them to disk, so this doesn't need
to be done on every hit
- Consider load balancing only after checking the above (HAProxy,
UltraMonkey, Linux LVS, etc.)
General/hardware/OS
- The application can be I/O-bound due to low RAM and too many disk accesses,
process-bound due to too many processes for the RAM available, or CPU-bound
due to too much computation
- Get a faster disk, or even better, a bunch of disks in RAID in one of
the striping modes
- Check solid-state disks instead of hard-drives
- Mostly you don't need huge CPU power in a *server*; a fast disk (or better,
several disks) and huge amounts of RAM are the key, as is a fast network: A
web server should never ever have to swap, as swapping increases the latency
of each request. Only in some of the nastier SQL queries does CPU power become
an issue, and those are generally fixed by rewriting the query or adding an
index
- MemcacheD ("a high-performance, distributed memory object caching system,
generic in nature, but intended for use in speeding up dynamic web applications
by alleviating database load"): It makes your overall site faster by caching
the majority of your database data in a large memory pool (see the sketch
after this list). Note that MySQL and the OS both cache data, and generally
do a better job at a lower cost than memcached. Danga Interactive developed
memcached to enhance the speed of LiveJournal.com. A local alternative to
memcached is APC, where each instance of APC runs on a single web server (so
there needs to be a front-end that keeps track of session IDs to know which
web server handled the previous connection)
- Off-load code to the client side with JavaScript, including AJAX to
fetch data without refreshing the page
- /etc/fstab: /dev/hda5 /data1 ext2 defaults,noatime 1 2
- Hardware is cheap
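The memcached item above usually takes the form of a read-through cache. A
minimal sketch with the PHP Memcache extension, where the key, query, and TTL
are invented examples and an open MySQL connection is assumed:

    <?php
    // Sketch: read-through cache in front of the database.
    $cache = new Memcache();
    $cache->connect('localhost', 11211);   // default memcached port

    $key = 'article:42';                   // invented cache key
    $article = $cache->get($key);
    if ($article === false) {              // cache miss: query the database...
        $res = mysql_query('SELECT * FROM articles WHERE id = 42');
        $article = mysql_fetch_assoc($res);
        $cache->set($key, $article, 0, 300);  // ...and keep it for 5 minutes
    }
    ?>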
Apache
- To serve static content and use less RAM, there are better web servers
like lighttpd, although Apache with an event Multi-Processing Module (MPM)
is on par with lighttpd in speed and memory usage -- ie. it's a bit larger
than lighty, but does contain more features
- If the site relies mostly on static files, consider using mod_cache;
if plenty of RAM is available, consider mod_mem_cache
- Serving static content will have an impact on Apache, so take a look
at using mod_expires or similar. If you're compressing CSS/JS on the fly,
that can be quite CPU-intensive; consider pre-compressing it and serving
up the .css.gz directly -- mod_gzip/mod_deflate might help
- Look at whether a caching proxy is a possibility for you (Squid, or Apache
has some mods)
- Run Apache's ab to benchmark pages: ab -kc 10 -t 30 http://localhost/
(or run it from another host to test the network as well)
- Remove any unneeded modules
- Enable and configure the memory cache for static files (mod_mem_cache),
and possibly the file cache (mod_disk_cache)
- Client side, make sure you're caching static files like images, js,
css. Apache's mod_expires gives you good control over this in Apache config
- Find a web server that can maintain connections to the DB, so that there's
no need to handle this in the web application
MySQL
- (Caution!) Use mysql_pconnect() instead of mysql_connect() to use PHP's
persistent connections. These make life harder for you, but eliminate a
lot of connection setup/tear-down overhead with the database. You need to
be able to configure your own Apache and MySQL for this to really work
(http://php.net/manual/en/features.persistent-connections.php).
Note that this INCREASES resource usage on the server. With persistent
connections, you must have the maximum number of connections *ever* required
allocated *all of the time* - even if no one is using your server. MySQL
persistent connections should not be used except in extreme cases - like
when you're running 100+ connections per second
- If the query turns out to be the problem, then verify that your table
schema has the appropriate indexes, and check your query to make sure it is
optimized. Only after you've done this should you worry about caching
- Check for messy joins or subselects. Consider running one-table SELECTs,
and performing the join in the application (see the sketch after this list)
- Run explain on your queries to see if they can be optimized. Check the
slow query log
- Check the write/read ratio to see if write accesses are the bottleneck
- Limits on queries help a good bit. Many queries per page equals a slow page
- If your db connections are expensive, look at sqlrelay
- Use a query cache for your db
- Cache write data in RAM, save the log on disk, and commit writes to
the DB later through a batch script
- If your application and database share the same server, it's faster
to use a socket (instead of TCP/IP). You can accomplish this by specifying
the socket manually; simply connecting to "localhost" instead
of 127.0.0.1 achieves the same effect
- Look into a multiple-master MySQL setup (Oracle is better at this). Have a
look at http://highscalability.com/ and High Performance MySQL (replication
isn't a great option for scaling writes.)
- Other things to try:
- Profile the application: After you've checked that the application
is not CPU-bound but rather I/O-bound, check the queries per second
(qps) on average and at peak load (look for high concurrency accessing
the database)
- Check the ratio between reads (SELECT) and writes (INSERT, UPDATE, DELETE)
- Check if some specific queries are particularly heavy to run, and whether
they use indexes properly
- Check which storage engine MySQL is using, and whether another engine
would be more appropriate
- Generally speaking, it's easier to scale up, ie. beef up the master
server, than to scale out, ie. spread the write load over multiple
servers either by locating databases or tables on different hosts or by
using sharding, but sharding is a nasty solution if you need to do joins,
etc. Before going this way, check whether it is a problem if different
users see different data because the DB server from which their web server
is reading isn't yet up to date
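A sketch of the "join in the application" idea mentioned above (table and
column names are invented, and an open connection is assumed); whether this
beats the server-side join depends on the data volume, so measure both:

    <?php
    // Sketch: two one-table SELECTs instead of a two-table join.
    $authors = array();
    $res = mysql_query('SELECT id, name FROM authors');
    while ($row = mysql_fetch_assoc($res)) {
        $authors[$row['id']] = $row['name'];
    }
    $res = mysql_query('SELECT title, author_id FROM articles');
    while ($row = mysql_fetch_assoc($res)) {
        echo $row['title'] . ' by ' . $authors[$row['author_id']] . "\n";  // the "join"
    }
    ?>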
PHP
- On a test server, run the application through a profiler to see which
portions of the code are causing the slowdown (eg. echo() the microtime()
after each significant logic block; you can also use Xdebug for this, with
KCachegrind (or WinCacheGrind) to look at what the code is doing and whether
it can be optimized. Other profilers are Benchmark and Advanced PHP Debugger
a.k.a. APD (both from PEAR), and DBG). You should use those tools on a test
server onto which you've dumped the database and the PHP scripts; if you
can't, and really have to install a profiler on the production server, a
useful little trick to minimize the impact of the profiler is to activate
profiling only for page requests coming from your specific IP address
- Don't generate images dynamically, and move static images off the box
- If the page does not need to be dynamic (ie. something new on every
load), then consider caching the entire page as a static version of itself
for a period of time, using a templating engine like Smarty, Savant,
CakePHP, etc. (see the sketch at the end of this list)
Also take a look at APC ("Alternative PHP Cache"), eAccelerator (Turck
MMCache is no longer maintained, and was forked into eAccelerator), Ioncube,
Zend Encoder, XCache, etc. APC offers an easy gain, but it also gives you
the ability to cache values in shared memory; you could consider using this
to store frequently accessed data (file or database) that doesn't change
very often. You might experience Apache crashes while using eAccelerator,
APC or Zend Optimizer, so be sure to also run some kind of watchdog that
will restart httpd if (when) the stack crashes
- Use an array and implode it instead of appending to strings, especially
in a loop
- Look closely at all loops and see if they can be cleaned up
- PHP applications often perform processing tasks on every page-load that
really only need to be processed periodically. Things like expiring user
sessions, rendering BBCode, pruning inactive users, etc. Take all of those
periodic tasks and put them into a cron job, running it just once each minute.
- Benchmark pages and functions using eg. APD to help identify the bottlenecks
and frequently called functions
- Calculation-heavy routines can be rewritten in C
- Share variables with MemcacheD or Redis
- To find out when you can no longer scale up (more RAM, faster CPUs,
multi-core CPUs, faster disks, RAID disks) and need to move beyond a single
master-multiple slaves setup to handle more write queries:
- SHOW GLOBAL STATUS LIKE 'Com%'
- SHOW VARIABLES LIKE '%buffer%'
- slow log: SHOW TABLE STATUS; SHOW CREATE TABLE foo \G; EXPLAIN SELECT ...
- Places to ask for help: DevShed, PHPFreaks
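A minimal sketch of the whole-page caching idea above, using plain output
buffering; the cache path and 10-minute lifetime are arbitrary examples:

    <?php
    // Sketch: serve a cached copy of the page if it is fresh enough.
    $cachefile = '/tmp/cache_' . md5($_SERVER['REQUEST_URI']) . '.html';
    if (file_exists($cachefile) && time() - filemtime($cachefile) < 600) {
        readfile($cachefile);
        exit;
    }
    ob_start();                                       // buffer the generated page
    echo 'expensive dynamic content';                 // ... normal page logic ...
    file_put_contents($cachefile, ob_get_contents()); // save it for the next hit
    ob_end_flush();                                   // and send it to this client
    ?>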
Resources
Notes from "Apache Cookbook"
Notes from "Apache Server Bible"
Notes from "Essential System Administration"
The most important system resources from a performance perspective are CPU,
memory, and disk and network I/O, although sometimes other device I/O can also
be relevant.
Put simply, when you don't have enough of something, there are only a few
options: get more, use less, eliminate inefficiency and waste to make the most
of what you have, or ration what you have.
Tuning tools in Linux: files under /proc/sys
uptime, top: The load average is a rough measure of CPU use. These three figures report the
average number of processes active during the last minute, the last five
minutes, and the last 15 minutes. High load averages usually mean that the
system is being used heavily and the response time is correspondingly slow.
ps aux: produces a report summarizing execution statistics for current processes.
pstree: displays system processes in a tree-like structure, and it is accordingly useful
for illuminating the relationships between processes and for a quick, pictorial
snapshot of what is running on the system.
vmstat interval count: information on processes, CPU, memory, and disk
iostat interval: show current disk usage as the number of transfers/sec (tps) and MB/sec.
Notes from "Building Scalable Web Sites" By Cal Henderson
Application =
- data
- business logic (the business logic lives, by convention, in .inc
files outside of the web root)
- page/interaction logic (The page logic resides in .php files under the
web root)
- markup layer (Smarty, Savant,
CakePHP)
- presentation (CSS)
Tips to build scalable web apps:
- Separate logic code from
markup code
- Split markup code into one
file per page
- Switch to a templating
system, and split static/dynamic parts that make up a page; Use CSS
- Separate page logic from business logic
- Keep hardware in mind, as it influences the whole architecture
- "Premature optimization is the root of all evil"... but make
sure you don't make obvious, major architecture errors
When dealing with the filtering of incoming data, the best approach is to group
it into three categories: good, valid, and invalid.
Good data is the kind of data you expect and
want. Valid data is the kind of data that your
application can process, store, and manipulate, but that might not make
contextual sense. Invalid data, finally, is data
that breaks some element of your application's storage, processing, or output:
- Filtering UTF-8
- Filtering Control Characters
- Filtering HTML
- Cross-Site Scripting (XSS)
- SQL Injection Attacks
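A toy sketch of the good/valid/invalid distinction for, say, an age field (the
field name and thresholds are invented):

    <?php
    // Sketch: classify an incoming "age" value.
    $age = isset($_GET['age']) ? $_GET['age'] : '';
    if (!preg_match('/^\d+$/', $age)) {
        $class = 'invalid';   // breaks storage/processing assumptions
    } elseif ((int)$age > 150) {
        $class = 'valid';     // storable, but makes no contextual sense
    } else {
        $class = 'good';      // expected and wanted
    }
    ?>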
CPU
CPU processing speed, contrary to popular belief, is almost never a bottleneck
in web applications. To get an overview of what's eating your CPU time, you can run top on
Unix machines. When the load average exceeds the number of processors in a box, there are
processes in the queue waiting to run. If you find that you're spending CPU time inside your own application
components, then you'll need to drill down to the next level and figure out
exactly where within your code the time is being spent.
For PHP programmers, the open source Xdebug suite of tools (
http://xdebug.org/) includes a
powerful code profiler. After installing the Xdebug extension, you can call the
xdebug_start_profiling() function to start profiling the running
script. Once you reach the end of your script, you can call
xdebug_dump_function_trace() to dump out a table of profiling data in
HTML (assuming you're running from within
mod_php). By adding a couple of lines to our
php.ini or
.htaccess files, we can enable transparent
profiling. Every time a script on the server is executed, the code is profiled
and the results are saved to disk.
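A sketch following the book's description; the two function names below are the
old Xdebug 1.x API it documents (current Xdebug versions produce cachegrind
files instead, so treat this as historical):

    <?php
    // Sketch using the Xdebug 1.x API described above.
    xdebug_start_profiling();
    // ... the code being measured ...
    xdebug_dump_function_trace();  // HTML table of profiling data (under mod_php)
    ?>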
PHP doesn't have any built-in method for saving the compiled version of a
script, so we need extra tools to accomplish this. These tools are called opcode
caches because they compile the source code down to opcodes (virtual code
that the PHP runtime engine can execute) and cache the opcode version of the
script.
I/O
The portion of time that we spend actually within the dynamic web serving layer
can be bound by CPU, I/O, and context switching. In general, the bottleneck in
systems using a data store
layer is I/O of some kind (network, disk, memory, etc.),
although given enough I/O bandwidth, you'll start to see CPU-bound processes or
memory limits, depending on the size of your working set. The serving of static content is very rarely CPU bound, but is often I/O bound
(and sometimes incurs context-switching overhead). Once we run out of memory and start to swap, our previously fast memory I/O and
CPU-bound operations become disk I/O bound. Although running out of memory
manifests itself as a disk I/O problem (assuming you have swap enabled), it can
be useful to treat it as a separate problem. Deciding carefully what data to
keep in memory can drastically reduce your memory usage and swapping.
External Services
Solving the capacity question for external services depends very much on the
nature of the service. Some services can be easily scaled out horizontally where
multiple nodes are completely independent. Some services are designed for
horizontal scaling, with infrastructure for node communication built in. Some
services just won't scale. Unfortunately, if the latter is the case, then
there's just not a lot you can do.
MySQL has an option to create a log of all the queries it runs. If we turn on
this log for a while, then we can get a
snapshot of the queries produced by real production traffic.
MySQL
- The fewer indexes the
better
- Keep your most narrowing fields
on the left of the key
- Avoid file sorts, temporary
tables, and table scans
Scaling in a Nutshell
- Design components that can scale linearly by adding more hardware.
- If you can't scale linearly, figure out the return for each piece of
hardware added.
- Load balance requests between clusters of components.
- Take redundancy into account as a percentage of your platform, rather
than as a fixed number.
- Design your components to be fault-tolerant and easy to recover.
- Federate large datasets into fixed-size chunks.
Using templates
- "There are very simple tests to see if template system separates
programming code from the html well. If you can not change behaviour
of your app without editing the files that contain html, than they are not
sparated well. If you can not change the some parts of html template without
editing the file that contains programming constructs like if's, loops etc,
they are not separated well."
- "Write function or class libraries to encapsulate your business
logic. These should be placed outside the document root for security (if
php gets turned off no-one can download the source)."
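A minimal layout sketch for the last quote; the paths and library functions are
invented:

    <?php
    // /var/www/htdocs/article.php -- page logic, web-accessible.
    // Business logic lives outside the document root, so it can never be
    // served as source even if PHP gets turned off.
    require_once '/var/www/includes/article_lib.inc';  // invented path

    $article = load_article((int)$_GET['id']);  // invented library functions
    echo render_article($article);
    ?>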
Why use an application server vs. PHP?
- Even when using a mod_ where the interpreter is compiled into the web
server and is launched at boot time, a page-based application like PHP is
usually slower than an application server, since the latter doesn't need
to rerun/reparse the code with each new connection
- Retrieved data can be cached more efficiently
- An application server like CherryPy doesn't tie you to a single templating
style: CP starts with no template and lets you add one; PHP starts with a
template and lets you replace it
- An application server scales better: PHP is much easier for smaller apps,
and so many web apps are developed incrementally
- Why should I use eg. CherryPy instead of mod_python? Better API
More information: http://www.google.com/search?q=servlets+versus+cgi, including:
- Servlets vs CGI
- Servlets and JSP: An Overview
- PHP versus J2EE
- The PHP Scalability Myth
- Why PHP Scales - A Cranky, Snarky Answer
- CherryPy for CGI programmers
- http://en.wikipedia.org/wiki/FastCGI
- Zope for the Perl/CGI programmer
- A Comparison of Portable Dynamic Web Content Technologies for the Apache Server
- Introducing mod_python
- How to use Django with FastCGI
- Introducing WSGI: Python's Secret Web Weapon
- Python Web Programming: Web Application Frameworks
- Five Minutes to a Python CGI
- (From "A Comparison of Portable Dynamic Web Content Technologies for the
Apache Server"): "One major shortcoming of PHP and the like is that the developer is forced
to think in web pages: The application starts when a web page is requested,
and it terminates as soon as the web page has been delivered. But you don't
really want to design applications around the user interface; the focus
of the design should be on the task the application is supposed to do.
Having
a permanently running process makes it easier for the developer to design
the application in a more structured way, because he has a clear separation
between the program doing the actual work and the mark-up of the results
of that work.
But the most important advantage, gained by FastCGI
programs outliving the web page, is a different one: You don't lose your
context. The application does not need to read its configuration file again
and again, it doesn't have to open a connection to the database every time
a request comes in, and it certainly doesn't need to pass a myriad of different
variables from page to page just because all variables are gone by the time
the user requests the next page. One can implement efficient caching, share
results between requests, etc."
- "as great as mod_python is, there are lots of restrictions and
limitations to what youc an do with it because of limitations of apache
itself, and I am refereing to apache 2.x as well as 1.x, like others are
saying if you don't need apache specific things it will just be one more
thing to work around the design constraints it causes, twisted will be you
best bet"
- "there are lots of things you can't do or can't do easily or can't
do at efficiently in Apache using python as cgi or as anyone would more
likely assume mod_python. anything that requires any shared state or shared
resources in Apache is next to impossible. Doing similar things in an app
server or network application framework like twisted is trivial."
- "Where the state is served purely by the use of the database, there
is nothing wrong with Apache/mod_python in itself being used. Where the
state also embodies the need to have active agents which operate on that
data, possibly independent of web requests, then a back end application
server which embeds the core functionality of the application would be better.
In doing that though, it doesn't mean that Apache has to be discarded as
the web server, and a web server embedded in the application server be adopted.
Depending on the requirements, you will get a more flexible, more easily
maintained system if a combination of the two technologies is used. "