- How does my web server handle such 100 simultaneous requests? Will web server generate one process/thread for each request? (if yes, process or thread?)
Depends on the webserver (and sometimes configuration of such).Apache has both threads and processes for handling requests. A description of various models:
- Apache with mpm_prefork (default on unix): Process per request. To minimize startup time, Apache keeps a pool of idle processes waiting to handle new requests (which you configure the size of). When a new request comes in, the master process delegates it to an available worker, otherwise spawns up a new one. If 100 requests came in, unless you had 100 idle workers, some forking would need to be done to handle the load. If the number of idle processes exceeds the MaxSpare value, some will be reaped after finishing requests until there are only so many idle processes.
- Apache with mpm_event, mpm_worker, mpm_winnt: Thread per request. Similarly, apache keeps a pool of idle threads in most situations, also configurable. (A small detail, but functionally the same: mpm_worker runs several processes, each of which is multi-threaded).
- Nginx/Lighttpd: These are lightweight event-based servers which use select()/epoll()/poll() to multiplex a number of sockets without needing multiple threads or processes. Through very careful coding and use of non-blocking APIs, they can scale to thousands of simultaneous requests on commodity hardware, provided available bandwidth and correctly configured file-descriptor limits. The caveat is that implementing traditional embedded scripting languages is almost impossible within the server context, this would negate most of the benefits. Both support FastCGI however for external scripting languages.
- How does the interpreter of the backend language do? How will it handle the request and generate the proper html? Will the interpreter generate a process/thread for each request?(if yes, process or thread?)
Depends on the language, or in some languages, on which deployment model you use. Some server configurations only allow certain deployment models.
- Apache mod_php, mod_perl, mod_python: These modules run a separate interpreter for each apache worker. Most of these cannot work with mpm_worker very well (due to various issues with threadsafety in client code), thus they are mostly limited to forking models. That means that for each apache process, you have a php/perl/python interpreter running inside. This severely increases memory footprint: if a given apache worker would normally take about 4MB of memory on your system, one with PHP may take 15mb and one with Python may take 20-40MB for an average application. Some of this will be shared memory between processes, but in general, these models are very difficult to scale very large.
- If the interpreter will generate a process/thread for each request, how about these processes(threads)? Will they share some code space? Will they communicate with each other? How to handle the global variables in the backend codes? Or they are independent processes(threads)? How long is the duration of the process/thread? Will they be destroyed when the request is handled and the response is returned?
We will split the question based on hosting models for where this is applicable.
- mod_php, mod_python, mod_perl: Only the C libraries of your application will generally be shared at all between apache workers. This is because apache forks first, then loads up your dynamic code (which due to subtleties, is mostly not able to use shared pages). Interpreters do not communicate with each other within this model. No global variables are generally shared. In the case of mod_python, you can have globals stay between requests within a process, but not across processes. This can lead to some very weird behaviours (browsers rarely keep the same connection forever, and most open several to a given website) so be very careful with how you use globals. Use something like memcached or a database or files for things like session storage and other cache bits that need to be shared.
- FastCGI/SCGI/AJP/Proxied HTTP: Because your application is essentially a server in and of itself, this depends on the language the server is written in (usually the same language as your code, but not always) and various factors. For example, most Java deployment use a thread-per-request. Python and its “flup” FastCGI library can run in either prefork or threaded mode, but since Python and its GIL are limiting, you will likely get the best performance from prefork.
- mod_wsgi/passenger: mod_wsgi in server mode can be configured how it handles things, but I would recommend you give it a fixed number of processes. You want to keep your python code in memory, spun up and ready to go. This is the best approach to keeping latency predictable and low.
In almost all models mentioned above, the lifetime of a process/thread is longer than a single request. Most setups follow some variation on the apache model: Keep some spare workers around, spawn up more when needed, reap when there are too many, based on a few configurable limits. Most of these setups -do not- destroy a process after a request, though some may clear out the application code (such as in the case of PHP fastcgi).
- Suppose the web server can only support 100 simultaneous requests, but now it got 1000 simultaneous requests. How does it handle such situation? Will it handle them like a queue and handle the request when the server is available? Or other approaches?
If you say “the web server can only handle 100 requests” it depends on whether you mean the actual webserver itself or the dynamic portion of the webserver. There is also a difference between actual and functional limits.
In the case of Apache for example, you will configure a maximum number of workers (connections). If this number of connections was 100 and was reached, no more connections will be accepted by apache until someone disconnects. With keep-alive enabled, those 100 connections may stay open for a long time, much longer than a single request, and those other 900 people waiting on requests will probably time out.
If you do have limits high enough, you can accept all those users. Even with the most lightweight apache however, the cost is about 2-3mb per worker, so with apache alone you might be talking 3gb+ of memory just to handle the connections, not to mention other possibly limited OS resources like process ids, file descriptors, and buffers, and this is before considering your application code.
For lighttpd/Nginx, they can handle a large number of connections (thousands) in a tiny memory footprint, often just a few megs per thousand connections (depends on factors like buffers and how async IO apis are set up). If we go on the assumption most your connections are keep-alive and 80% (or more) idle, this is very good, as you are not wasting dynamic process time or a whole lot of memory.
In any external hosted model (mod_wsgi/fastcgi/ajp/proxied http), say you only have 10 workers and 1000 users make a request, your webserver will queue up the requests to your dynamic workers. This is ideal: if your requests return quickly you can keep handling a much larger user load without needing more workers. Usually the premium is memory or DB connections, and by queueing you can serve a lot more users with the same resources, rather than denying some users.
Be careful: say you have one page which builds a report or does a search and takes several seconds, and a whole lot of users tie up workers with this: someone wanting to load your front page may be queued for a few seconds while all those long-running requests complete. Alternatives are using a separate pool of workers to handle URLs to your reporting app section, or doing reporting separately (like in a background job) and then polling its completion later. Lots of options there, but require you to put some thought into your application