Handling 1 Billion requests a week with Symfony2


Some say that Symfony2, like every complex framework, is slow. Our answer is that everything depends on you 😉 In this post, we’ll reveal some software-architecture details of a Symfony2-based application that handles more than 1,000,000,000 requests every week.

Following great community feedback after tweeting…

…we’ve decided to share some more information about how we achieved such performance results.

This article gives you some insights into an application built on Symfony2 and Redis. We won’t describe many low-level internals; instead, we’ll show you the big picture and the Symfony2 features we especially liked during development.

For low-level Symfony2 performance optimization practices, check out our dedicated posts from the Mastering Symfony2 Performance series — Internals and Doctrine.

First, some numbers from the described application

Performance statistics from a single application’s node:

  • A single Symfony2 instance handles 700 req/s with an average response time of 30 ms
  • Varnish – more than 12,000 req/s (achieved during a stress test)

Note that, as we describe below, the whole platform consists of many such nodes 😉

Redis metrics:

  • More than 160,000,000 keys (98% of them in persistent storage!)
  • 89% hit ratio – which means that only 11% of transactions go to the MySQL servers

Stack architecture

Application

The whole traffic goes to HAProxy, which distributes it to the application servers.

A Varnish reverse proxy sits in front of every application instance.

We keep Varnish on every application server to maintain high availability without a single point of failure (SPOF). Routing all traffic through a single Varnish instance would be riskier. Separate Varnish instances mean a lower cache-hit rate, but we’re OK with that. We needed availability over performance, and as you can see from the numbers, even performance isn’t a problem 😉
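
For illustration only – this is not our production configuration, and all names, addresses, and ports are made up – an HAProxy setup distributing traffic to per-node Varnish instances could be sketched like this:

```
# Hypothetical haproxy.cfg fragment (illustrative names and ports)
frontend http-in
    bind *:80
    default_backend varnish_nodes

backend varnish_nodes
    balance roundrobin
    option httpchk GET /health
    # one Varnish instance per application server - no SPOF
    server node1 10.0.0.11:6081 check
    server node2 10.0.0.12:6081 check
```

With health checks enabled, HAProxy simply stops sending traffic to a node whose Varnish goes down.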

Application server configuration:

  • Xeon E5-1620@3.60GHz, 64GB RAM, SATA
  • Apache2 (we don’t even use nginx)
  • PHP 5.4.X running as PHP-FPM, with APC

Data storage

We use Redis and MySQL for storing data. The numbers from those are also quite big:

  • Redis:
    • 15,000 hits/sec
    • 160,000,000 keys
  • MySQL:
    • over 400 GB of data
    • 300,000,000 records

We use Redis both as persistent storage (for the most-used resources) and as a cache layer in front of MySQL. The ratio of stored data to typical cache data is high – we keep more than 155,000,000 persistent keys and only 5,000,000 cache keys. So in fact you can use Redis as a primary data store 🙂
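
If you treat Redis as a primary store, persistence has to be enabled. A minimal redis.conf sketch (the values are illustrative, not our production settings) could look like:

```
# RDB: snapshot to disk if at least 1000 keys changed within 60 s
save 60 1000

# AOF: append every write to the log, fsync once per second
appendonly yes
appendfsync everysec
```

RDB gives cheap point-in-time snapshots; AOF narrows the window of data you can lose to roughly one second.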

Redis is configured in a master-slave setup. That way we achieve HA – during an outage we can quickly swap the master node with one of the slaves. It’s also needed for administrative tasks like upgrades: while upgrading nodes we can elect a new master, then upgrade the previous one, and finally switch them back.
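
Such a manual switch can be sketched with plain redis-cli commands (the host names here are invented for the example):

```
# promote a slave to master
redis-cli -h redis-node-2 SLAVEOF NO ONE

# after upgrading, re-attach the old master as a slave of the new one
redis-cli -h redis-node-1 SLAVEOF redis-node-2 6379
```

The application’s Redis endpoint then has to be repointed at the new master, which is the part you’d want to script carefully.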

We’re still waiting for a production-ready Redis Cluster, which will provide features like automatic failover (and even manual failover, which is great for e.g. upgrading nodes). Unfortunately, no official release date has been given.

MySQL is mostly used as a third-tier cache layer (Varnish > Redis > MySQL) for non-expiring resources. All tables are InnoDB, and most queries are simple SELECT ... WHERE id = {ID} statements returning a single result. We haven’t noticed any performance problems with this setup yet.
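
The Redis > MySQL part of that flow can be sketched in PHP like this – a hedged example, not our actual code; the class, key, and table names are assumptions:

```php
// Read-through lookup: try Redis first, fall back to MySQL (DBAL)
// and backfill Redis. All names here are illustrative.
public function find($id)
{
    $key = 'resource:' . $id;

    $cached = $this->redis->get($key);          // Predis client
    if (null !== $cached) {
        return json_decode($cached, true);
    }

    // Simple indexed lookup, as described above
    $row = $this->connection->fetchAssoc(
        'SELECT * FROM resources WHERE id = ?',
        array($id)
    );

    if (false !== $row) {
        // No TTL - the resource never expires
        $this->redis->set($key, json_encode($row));
    }

    return $row ?: null;
}
```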

In contrast to the Redis setup, MySQL runs in a master-master configuration, which besides High Availability gives us better write performance (that’s not a problem in Redis, as you’re unlikely to exhaust its performance capabilities anyway 😉 )

Application’s Architecture

Symfony2 features

Symfony2 comes out of the box with some great features that facilitate the development process. We’ll show you the ones our developers like the most…

Annotations

We use Symfony2 Standard Distribution with annotations:

  • Routing – @Route for defining the application’s URLs – we also tested dumping the routing rules to Apache, but it didn’t result in any major optimizations
  • Service Container – we define our DI container using @Service annotations from the great JMSDiExtraBundle – that speeds up development and lets us keep service definitions within PHP code, which we find more readable
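
A controller using both annotation types might look roughly like this (the route, class, and service names are invented for the example):

```php
use JMS\DiExtraBundle\Annotation\Service;
use Sensio\Bundle\FrameworkExtraBundle\Configuration\Route;

/**
 * @Service("acme.api.item_controller")
 */
class ItemController
{
    /**
     * @Route("/items/{id}", requirements={"id" = "\d+"})
     */
    public function getItemAction($id)
    {
        // ... fetch the resource and return a JSON response
    }
}
```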

As the application serves as a REST API, we mostly don’t use templating (like Twig). We keep it only for some internal dashboard panels.

We haven’t seen any performance impact compared to the other configuration formats (YAML/XML). That’s not surprising, as every annotation is nicely cached – in the end everything compiles down to pure PHP code.

Take a look at a sample service configuration achieved with JMSDiExtraBundle:

/**
 * Constructor uses JMSDiExtraBundle for dependency injection.
 *
 * @InjectParams({
 *      "em"         = @Inject("doctrine.orm.entity_manager"),
 *      "security"   = @Inject("security.context")
 * })
 */
public function __construct(EntityManager $em, SecurityContext $security)
{
    $this->em = $em;
    $this->security = $security;
}

That way, changing a class’s dependencies requires changes in code only.

Symfony2 monitoring – Monolog, Stopwatch

The application makes heavy use of Monolog to log every unexpected behavior and to catch anything that goes wrong. We use multiple channels to keep logs from different application modules separate.

We stopped using the FingersCrossed handler, as it comes with higher memory usage (and could lead to memory leaks). Instead we simply use the StreamHandler with an appropriate verbosity level. The trade-off is that we have to add verbose, additional context to every single log line.
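
In configuration terms, the switch boils down to something like the following (the channel names, paths, and levels are examples, not our real config):

```yaml
# app/config/config_prod.yml - illustrative Monolog setup
monolog:
    channels: ["api", "import"]
    handlers:
        main:
            type:  stream
            path:  "%kernel.logs_dir%/%kernel.environment%.log"
            level: warning
        api:
            type:     stream
            path:     "%kernel.logs_dir%/api.log"
            level:    info
            channels: ["api"]
```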

We also use the Stopwatch component in many places to keep an eye on some characteristic application methods. That lets us nicely spot weak points in the bigger parts of our custom logic.

For example, we track the duration of requests to some external web services:

if (null !== $this->stopwatch) {
    $this->stopwatch->start('my_webservice', 'request');
}

// Make a cURL request to the my_webservice endpoint
$response = $this->request($args);

if (null !== $this->stopwatch) {
    $this->stopwatch->stop('my_webservice');
}

Console Component

During development and maintenance, we especially liked the Symfony Console component, which provides a nice object-oriented interface for creating CLI tools. About 50% of the new features added to the application are CLI commands, mostly administrative ones or commands for analyzing the application’s internals.

The Console component takes care of properly handling a command’s arguments and options – you can set default values and mark which ones are optional or required. It’s good practice to always document them in the code – you can set the main description of a command and of its options. That way, commands are mostly self-documenting: adding the --help option outputs a nicely formatted description of the command.

$ php app/console octivi:test-command --help
Usage:
 octivi:test-command [-l|--limit[="..."]] [-o|--offset[="..."]] table

Arguments:
 table                 Database table to process

Options:
 --limit (-l)          Limit per SQL query. (default: 10)
 --offset (-o)         Offset for the first statement (default: 0)
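
The --help output above could come from a configure() method along these lines – a sketch reconstructed from the output, not the actual command class:

```php
use Symfony\Component\Console\Command\Command;
use Symfony\Component\Console\Input\InputArgument;
use Symfony\Component\Console\Input\InputOption;

class TestCommand extends Command
{
    protected function configure()
    {
        $this
            ->setName('octivi:test-command')
            ->addArgument('table', InputArgument::REQUIRED,
                'Database table to process')
            ->addOption('limit', 'l', InputOption::VALUE_OPTIONAL,
                'Limit per SQL query.', 10)
            ->addOption('offset', 'o', InputOption::VALUE_OPTIONAL,
                'Offset for the first statement', 0);
    }
}
```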

Remember to always run commands with an explicitly set environment. The default one is dev, which can cause problems such as memory leaks (because of more verbose log collection and the storing of debugging information).

$ php app/console octivi:test-command --env=prod

To get more verbose output, just add the -v option:

$ php app/console octivi:test-command --env=prod -vvv

A nice piece of eye candy is the Progress Bar helper, which adds… a progress bar 😉 It even takes the verbosity level into account: when it’s set low, only some basic information is output, but at a higher level you also get the elapsed time and even the memory consumption.

By the way, we had some long migration processes that ran for about ~2 days – with zero memory leaks – and without a progress bar, monitoring them would have been a nightmare.
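
A minimal usage sketch (assuming Symfony ≥ 2.5, where the ProgressBar helper was introduced; the data-fetching and processing methods are hypothetical):

```php
use Symfony\Component\Console\Helper\ProgressBar;

// Inside a command's execute() method
$rows = $this->fetchRows();            // hypothetical data source
$progress = new ProgressBar($output, count($rows));
$progress->start();

foreach ($rows as $row) {
    $this->process($row);              // hypothetical per-record work
    $progress->advance();
}

$progress->finish();
```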

Data layer

For Redis, we use the PredisBundle.

We rejected the Doctrine ORM entirely, as it would add overhead and we just don’t need any advanced object-style manipulation. Instead we use pure Doctrine DBAL with its features:

  • Query Builder
  • Prepared statements
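
A typical lookup using both features might look like this (the table and column names are illustrative):

```php
// Prepared statement via Doctrine DBAL - no ORM hydration overhead
$stmt = $connection->prepare('SELECT * FROM items WHERE id = :id');
$stmt->bindValue('id', $id, \PDO::PARAM_INT);
$stmt->execute();
$item = $stmt->fetch(\PDO::FETCH_ASSOC);

// The same query built with the QueryBuilder
$item = $connection->createQueryBuilder()
    ->select('*')
    ->from('items')
    ->where('id = :id')
    ->setParameter('id', $id)
    ->execute()
    ->fetch(\PDO::FETCH_ASSOC);
```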

Using the PredisBundle and Doctrine’s bundle also lets us monitor slow queries, as we make extensive use of the Profiler toolbar.

Summary

This setup keeps Performance and Availability high while, thanks to Symfony2, still giving us a nice development environment – maintainable and stable. In fact, those are the key business needs for an application that serves as a mission-critical subsystem of an eCommerce website.

So at the end of the article we can demystify some of the biggest myths:

  • You can’t use Redis as a primary store – as we’ve shown above, of course you can! It’s already a very stable and mature technology which, with its persistence mechanisms, won’t lose any of your critical data.
  • Symfony2 is so feature-rich that it must be slow – if you avoid some of the most time- and memory-consuming tools like the ORM, you can achieve performance similar to microframeworks like Silex (yep, we tested it 🙂 ).

Photo by Jared Tarbell


Antoni is a Software Architect and Scrum Master at Octivi. He is responsible for the software architecture of our key projects and also holds a Professional Scrum Master certificate.

  • Konrad Podgórski

    Nice app!

    Removing unused bundles from registerBundles is the fastest way to increase app performance.

    Do you connect to MySQL through some other service?

    my minimal required stack to serve json rest api is

    $bundles = array(
        new Symfony\Bundle\FrameworkBundle\FrameworkBundle(),
        new Symfony\Bundle\MonologBundle\MonologBundle(),
        new Acme\TopSecretBundle(),
    );

    Also, migrating to 5.5 and OPcache would give you another nice boost. However, it might not be worth it – decreasing response time from 30 to 25 ms is a nice percentage boost, but it’s still only 5 ms faster 🙂

    • Hi Konrad!

      What do you mean by cli env? just another env like prod or dev?

      • Konrad Podgórski

        Yes, “cli” is just an example name.

        I didn’t do any testing on how big a performance gain is achieved by removing 1-2 bundles, but it makes sense that there should be some.

        • Got it, thanks!

        • gggeek

          Indeed, I often find myself wanting different configurations for CLI scripts and web pages.
          To avoid using a “cli” environment, I tried tricks with bundles loading their config, but then the values would get cached and strange things would happen (cli settings used in web or vice versa), so I reluctantly adopted the practice.
          My main qualm is that when you already have 3 environments (dev, test, prod), you end up with 6 just to keep a couple of different configs.

          • Yep, we’ve also tested configs with additional environments like “api” and “cli”, but as you say – you end up with a quite complicated setup and 4 additional ymls. But if you do it smart, you still have one major config with the common settings, so the additional ymls stay almost empty.

        • Interesting idea! Could you provide us with some benchmarks so we can have a clear idea on how this (removing unused bundles and using a cli env) can impact performances?

  • lsmith77

    Have you tried JMSAopBundle to move things like logging and stopwatch out of the controller code?

    • We’re thinking about it (and in fact using AOP in other projects, like Security) but in that one, logging and stopwatch resides in more generic (reusable) classes, not exactly associated with Symfony2 bundles – we aren’t using such components in controller classes.

  • Usama Ahmed

    I support your decision to reject the Doctrine ORM. I don’t think any PHP ORM can stand in front of huge data volumes.

    • I think it’s mostly not about the volume of stored data but about the performance requirements for retrieving >single< records – the ORM adds a big overhead while hydrating objects (deserialization). If you accept that overhead, you can use the Doctrine ORM even with 200+ million records – ORM performance will be the same as with 1000 🙂

      • Ziad Jammal

        Do you have any numbers by how much was the orm usage slower?

  • How many application servers does the whole infrastructure consist of? If it’s not a secret, of course.

    • Hmmh let’s say more than 2 ;-))

      • Okay, good answer 🙂 But have you achieved horizontal scalability with such architecture? (ok, almost horizontal to be precise I suppose)

        • Yeah, like a linear one 😉
          It isn’t a problem to add another application server to HAProxy, so scalability is ensured. And of course, if one HAProxy isn’t enough, you can add another one and do load balancing on a DNS server.

  • Bruno Seixas

    Great article =) Thanks for sharing and the tips.

  • Nice article, lol @ apache comment 😉

  • Also the question: how do you guys do validation? With standard component or manually?

  • sandip

    Thanks for sharing. I have a question: why did you choose Predis over phpredis?
    And for redis persistence, which method did you use – RDB file or AOF?

    • JoshWorden

      For Predis vs phpredis the likely answer is performance. phpredis is a PHP code implementation of a redis driver, whereas Predis is a PHP extension (therefore in C, and faster than any PHP implementation could be).

      • sandip

        I think you said exactly the opposite. Anyway, I have seen the library has support for both predis and phpredis.
        Also, can you help with the redis persistence option?

        • JoshWorden

          Sorry, you are correct. phpredis is the extension and predis is the library. Same reasoning though 🙂

          We aren’t using redis as a persistent store; or rather we don’t care whether any data stored in redis is retained. We’re using it like memcache, really. We went with RDB.

        • We’re using both AOF and RDB on the master node and only RDB on the slaves.
          We chose Predis because in our internal performance tests we didn’t notice much gain from phpredis. We also try to avoid any unnecessary 3rd-party extensions to simplify development environments and keep better stability and maintainability.
          Anyway, if there are any performance problems, we can easily switch to phpredis 🙂


  • 0_amit_saxena_0

    You can use Redis Sentinel for redis HA and auto failover: http://redis.io/topics/sentinel

    It was designed for that purpose. Redis cluster is more of a distributed redis. Sentinel was prioritized over cluster due to the reasons described here:
    https://groups.google.com/d/msg/redis-db/KK7LW0dBD5Q/mJQD5g58TsMJ

  • ShuichiAy

    Your app is mainly fast because there is no Twig ^^
    Twig would add some hundreds of ms, I guarantee it

    • Bart

      I fail to see how twig would make any noticeable impact on a website that makes proper use of a reverse proxy cache. If you have to fetch from the back-end in only 1% of the cases the difference won’t be noticeable.

      Also, twig templates compile to pure PHP. The overhead is minimal as is.

  • jverdeyen

    Which settings are you using to speed up development page loads? I’m also using JMSDiExtraBundle, it feels slower when using annotations to inject services. Are you developing on a Vagrant VM? I’m curious about your development setup. Thanks for sharing this article/knowledge!

  • Reynier Pérez Mira

    Could you explain more about “Requests are handled by a HAProxy load balancer which distributes them to Varnish reverse proxies”? For example, which tool was used for the HAProxy load balancer (mod_proxy from Apache, any other?), and some tips on configuration if possible?

  • At this point after your experience with Redis, why are you still keeping MySQL in the loop? Does it “feel safer” somehow than trusting Redis as your system of record? I’m in a similar situation with an earlier stage system and I’ll be honest, I’m leaning towards just leaving Redis in charge but there’s a nagging voice telling me to keep a data set in MySQL as well – for tradition, maybe?

  • Great Article. I was wondering what you use for storage?

  • Chris

    hi, great post! you mention only using Doctrine DBAL and removing ORM, how did you go about this? thanks

    • $container->get(‘database_connection’); ?

      • Chris

        ok, I read that but I thought they actually removed the ORM bit so less code was being loaded?

        • Actually doctrine-dbal is a separate project, so it’s easy to use it as a standalone package.

  • Kamil Endruszkiewicz

    Handling 700 req/s is not amazing on such a machine – http://www.techempower.com/benchmarks/#section=data-r9&hw=peak&test=db – and that’s using MySQL, not Redis, as a backend. With such a use case you should see 50,000-70,000 req/s (look at Go/Java). You are using Symfony as a builder for keys. It’s a slow, resource-hungry beast.

  • Harry

    So basically, what you have done has nothing to do with Symfony per se. You’re not using the Doctrine ORM but DBAL (for which you lose some of the features, but you said you don’t need them anyway), and you don’t use rendering (nor templating). So basically it’s a very stripped-down version of the SF framework. I’m quite sure you would be able to achieve the same performance using any other modern framework – ZF2, Yii2, Laravel…
    In any case – kudos for the architecture.

    • Fakeer

      Stripping it down to nothing makes it fast. Enough said.

    • Symfony doesn’t have to do anything with Twig or Doctrine ORM. Just because they come with Symfony Standard Edition, it doesn’t mean they’re part of the Symfony core.

  • trgadmin2015

    Very interested in how you went around Doctrine ORM, any details on this?

  • Alex

    Your network diagram implies more than one server as a front facing web server, yet you only give the specs implying ONE server. Which is it? … one or many servers (web servers) ?

    • We always use at least two servers to achieve redundancy and remove the Single Point of Failure (SPoF).

      • Alex

        Thanks for the reply, so it’s 350req/s per server or 700 req/s per server?

        I don’t know how your monitoring works – whether it’s an overview of the entire app or just a single server/node?