Do not miss other articles on this subject:
Benchmarks: Truth and Lies
So we have a language+engine combination that, at its root, is faster. However we want to build website & webapp, so for this particular task, which one is faster?
Here it will be VERY difficult to come with a meaningful benchmark, because of the PHP paradigm's very nature. PHP's performance are tied with the actual web-server (Apache? NginX? Lighttpd?), how it interacts with it (mod_php? cgi? fast-cgi? php-fpm?), and the opcode cache (none? APC? OPcache?).
You can find benchmark all over the internet comparing PHP & Node.js, but be sure that there are all biased, if not deliberatly lying in favor of the author's language of choice.
You will find that most of time, there are in favor of Node.js... but truth should be told: an "Hello World" benchmark comparing Node.js vs Apache Prefork MPM + mod_php is just bottomless stupidity... And sadly that's what you will get if you google "php nodejs benchmark". At least give PHP a chance to shine!
Hey! If you are one of those awful benchmarks' author, here is a hint about how works a real world PHP stack: first a pool of NginX front server, then a pool of PHP-FPM server in the back-end. So at least for your glorious benchmark, create 2 VM, one with NginX and one with PHP-FPM... Seriously: Apache Prefork MPM *alone* (serving a static "Hello world" HTML) is already slower than Node.js! Do you even know what your benchmarks are measuring?
Beyond Benchmarks: Some Rationale
-- The benchmark game
But that does not mean that we do not have any clue!
We said the very nature of PHP makes things hard to test, however, this is exactly this way of doing things that leads to a strong belief: the PHP paradigm itself is way slower than the Node.js paradigm.
A typical PHP app works this way:
- the webserver (Apache or NGinX or Lighttpd) receives a client HTTP request for some kind of dynamic content
- the webserver dispatches the request to an idle PHP process
- the PHP process loads the cached opcode of the script
- the PHP process runs the script
- the script runs its startup/init code:
- if a framework is involved, the framework core is loaded and its startup code is run: this is usually the part that hurts performance the most
- the PHP script may load session data attached to the requesting user/client, involving a memcached or a database round-trip
- some application-specific startup may eventually run
- the script processes the request, and eventually send content back to the webserver
- the webserver forwards the response to the client
- the PHP process is recycled, waiting for the next request
A request performed by PHP is more or less like launching a program just for a request, then terminate the program when the client is served, launch again the program for the next request, terminate it, and so on.
The process itself can be recycled thanks to PHP-FPM, and most database drivers offer the persistent connection feature so connection get recycled too: that saves a lot of resources. But ultimately, your script get started and terminated for each single request.
It's a fact that the PHP Zend engine spend way more time executing startup code (mostly because of frameworks) than running the actual business logic of the application.
A typical Node.js app works this way:
- optionnal: NginX (used as a load balancer, HTTPS to HTTP bridge, or websocket bridge) receives a request or establish a connection with the client
- then it dispatch the request to one of the Node.js service worker
- the Node.js service processes the request or accepts the client connection (by the way all startup code have already been executed when the service had been launched)
A Node.js service can run days!
In fact there is no particular limit, it all depends on the quality of your code.
If your code is full of bug, then probably the service will crash and will be resurrected again by forever every two or three minutes. If your code is clean enough, you can expect it to serve thousands or millions of clients before exiting cleanly.
So the time V8 processes your startup code is really nothing compared to the time it processes the actual business logic of your application.
On the contrary, for each line of your business logic the Zend engine processes, it processes 10, 20, 50 or 100 lines of your framework startup code.
With that in mind, do the math, and you will understand easily why Node.js is so powerful.
The PHP paradigm is defeated.
PHP is rarely the bottleneck?
Some will argue that PHP is rarely the bottleneck.
That's true, PHP, in the PHP's paradigm, is rarely the bottleneck. I mean, Zend engine is fast enough, not as fast as V8, but it is already pretty fast (you may know that PHP is actually faster than Python). However, that's not PHP/Zend itself that is slow, that's the PHP's way of doing things that is not relevant anymore.
The CGI way is doomed... You can evolve it into Fast-CGI/PHP-FPM, that will improve things, but the main issue remains unsolved.
Most probably, if you turn your PHP code into C++, and port all your framework to C++ as well, you will get slighly better performance, but you will not get the best your hardware can deliver anyway.
The truth is that most of the time, your PHP script is waiting for I/O, and I/O are the most time-consuming things. That IS the reasoning behind “PHP is rarely the bottleneck”.
However, it only applies to web application that are I/O bound (which is the case for at least 95% of the web), but if your application is CPU bound PHP will surely become the bottleneck. Web applications that need some complex data processing, Artificial Intelligence (e.g. board games), file-sharing with a complex layer of user access management, file-synchronization... are applications that are CPU-bound. I should mention here that Node.js has hard-time with CPU-bound tasks too, and some strategies should be employed or the event-loop will be drop dead.
However, being the bottleneck or not is one thing, but that's just one aspect of the performance that matters to us.
That suggests that the time to process a single request is not so much tied to the speed of PHP. But what happened when your application is literally besieged by your clients/users? How much time each request will take? How many request will be dropped?
There is indeed at least one bottleneck: the number of PHP process available.
The Power of Event-driven & non-Blocking I/O
Long-lived services capable of handling thousands of concurrent connections taking advantage of a non-blocking evented I/O system are *THE* way to do it now.
When your service is waiting for a response from the database, PHP is just lazily idling, while Node.js will simply accept another concurrent request, start to process it until a new database round-trip is needed... When the database response is available, the request processing is resumed, and so on. That's why a single PHP process can accept one request at a time while a single Node.js process can accept thousands of them.
Creating a process is pretty damn slow, so the PHP strategy is to create a pool of reusable processes.
However each PHP process has its own share of memory, so spawning thousands of them has a cost. Not to mention that CPU context-switching has a cost too.
With Node.js, you can save a lot of CPU. Just spawn as many Node.js services as CPU core available on your system. Evented non-blocking I/O will do the tricks for you: each client's request will trigger your request callback, it's easy and reliable.
Hanging requests waiting for I/O do not take much memory, it costs far less than spawning processes, there is almost no context switching since everything happens into the same Node.js process and it has a dedicated CPU core (remember: we spawn as many Node.js services as available core).
More interesting: your service can happily eat as much resources as available on your system. Your database back-end is engorged? No problem, Node.js can still accept client's request, while PHP would be limited to the number of process of the pool. With Node.js, there is always room for another request. Whenever a request get its precious I/O, your application can serve it to your client.
However like mentioned earlier, Node.js can have some trouble to be aware of, when your application has many CPU-bound task.
The event-loop MUST keep spinning fast, that's the key for blazing fast web application. If some CPU-bound task kick in without giving control back to the event-loop, the event-loop is blocked. That's it: other lightweight tasks waiting for their I/O will freeze, until the CPU-bound task ends (or at least give control back to the event-loop). Node.js is single-threaded, the tasks don't run in a parallel fashion, tasks are concurrents.
If you have a CPU-bound task, you should run it on another Node.js process. A pool of worker is what you need.
This article explains how to overcome those issues.
There is no real metrics for that, but I think I made enough good points here, so let's cut it out now: Node.js is way faster than PHP.
But we don't want to code CLI programs or 3D games: that's out of the focus of PHP anyway.
So... for the back-end of a Software as a Service, powering a modern web app? Now that's involving the full PHP stack (Nginx, php-fpm, Memcached/Redis, databases, and eventually a PHP framework), not PHP alone.
In my experience, you can expect Node.js to be 5 times, 10 times or 20 times faster. Even more if the PHP team have chosen one of the full-featured frameworks (e.g. Zend or Symfony2) over a micro-framework. Even more if they have chosen one of the slowest framework (e.g. CakePHP, etc).
One may argue that frameworks should not be taken into consideration. However there are some facts: almost any PHP projects uses a framework. Frameworks are part of the PHP ecosystem, they are generally slow, and because of the PHP paradigm, a PHP framework has more impact on performance than a Node.js framework (mainly because of startup code).
Finally PHP can overcome its greatest weakness, I will discuss another day what the PHP of the futur should be, if it want to survive in the long run. But I keep that for one of the next articles on this subject!