The connected visitor

We are working on some interesting connections between user visits to a website and the development of that user’s digital profile.  What I mean is that a visitor to your website generally takes the form of a hit or visit.  But most users are now starting to connect with your website via some form of authentication or single sign on through their social network accounts. This enables us to build intelligence on that user so we can instantly respond to that user’s likes and needs.

Behind the scenes we can use intelligence to join the dots around that user and build a digital footprint that enables us to blend content and navigation around the user.  We can also use this intelligence to understand the user (taking into account their consent) and segment according to their likes and dislikes.

The next generation website will use a combination of big data and connected intelligence to provide a more tailored browsing experience with content that is both more useful and intuitive.

This type of technology pays dividends when built into e-commerce solutions, both in the templates that display product information and in the dashboard technology that enables e-commerce managers to keep track of the performance of their retail operations. It gives them a digital nerve centre monitoring all aspects of the commerce site.

The technology to do this exists in the identity layer of the site. We are seeing the development of SaaS-based identity management providers that can start to provide connected sign on, intelligence and reporting for your digital estate.  We will report more on this in the next few weeks.

So are you interested in how Facebook scales?

If, like me, you are interested in scaling big platforms, we’ve been doing some research into exactly how the Facebook techies scale their infrastructure. There are some useful techniques here if you are using the LAMP technology stack. Interesting reading…

Facebook’s scaling challenge

Before we get into the details, here are a few factoids to give you an idea of the scaling challenge that Facebook has to deal with:

  • Facebook serves 570 billion page views per month (according to Google Ad Planner).
  • There are more photos on Facebook than all other photo sites combined (including sites like Flickr).
  • More than 3 billion photos are uploaded every month.
  • Facebook’s systems serve 1.2 million photos per second. This doesn’t include the images served by Facebook’s CDN.
  • More than 25 billion pieces of content (status updates, comments, etc) are shared every month.
  • Facebook has more than 30,000 servers (and this number is from last year!)
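To put the monthly figures above into per-second terms, a quick back-of-the-envelope conversion helps. The sketch below assumes a 30-day month and perfectly even traffic, which are our simplifications and not figures from Facebook; real peaks will be higher than these averages.

```python
# Back-of-the-envelope conversion of the monthly figures into average per-second rates.
# Assumption (ours, not from the source figures): a 30-day month with even traffic.
SECONDS_PER_MONTH = 30 * 24 * 60 * 60  # 2,592,000 seconds


def per_second(monthly_total: float) -> float:
    """Average rate per second for a total spread evenly over a 30-day month."""
    return monthly_total / SECONDS_PER_MONTH


page_views_per_second = per_second(570e9)    # 570 billion page views per month
photo_uploads_per_second = per_second(3e9)   # 3 billion photo uploads per month
content_items_per_second = per_second(25e9)  # 25 billion pieces of content per month

print(f"Page views/s:    {page_views_per_second:,.0f}")
print(f"Photo uploads/s: {photo_uploads_per_second:,.0f}")
print(f"Content items/s: {content_items_per_second:,.0f}")
```

On those assumptions the page view figure works out to roughly 220,000 page views per second on average.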

Software that helps Facebook scale

In some ways Facebook is still a LAMP site (kind of), but it has had to change and extend its operation to incorporate a lot of other elements and services, and modify the approach to existing ones.

For example:

  • Facebook still uses PHP, but it has built a compiler for it so it can be turned into native code on its web servers, thus boosting performance.
  • Facebook uses Linux, but has optimized it for its own purposes (especially in terms of network throughput).
  • Facebook uses MySQL, but primarily as a key-value persistent storage, moving joins and logic onto the web servers since optimizations are easier to perform there (on the “other side” of the Memcached layer).
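To illustrate the last point, here is a minimal sketch of what "key-value storage with the joins done on the web server" can look like. It is not Facebook's code: an in-memory SQLite table stands in for MySQL, and the table name, key format and helper functions are all hypothetical.

```python
import json
import sqlite3

# Stand-in for MySQL: an in-memory SQLite table used purely as key -> value storage.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE kv (k TEXT PRIMARY KEY, v TEXT NOT NULL)")


def put(key: str, value: dict) -> None:
    """Store a JSON-serialisable value under a primary key."""
    db.execute("INSERT OR REPLACE INTO kv (k, v) VALUES (?, ?)", (key, json.dumps(value)))


def get_many(keys: list) -> dict:
    """Fetch several values by primary key; no joins happen in the database."""
    if not keys:
        return {}
    placeholders = ",".join("?" for _ in keys)
    rows = db.execute(f"SELECT k, v FROM kv WHERE k IN ({placeholders})", keys)
    return {k: json.loads(v) for k, v in rows}


def friends_of(user_id: int) -> list:
    """The 'join' lives in application code: two key lookups, combined here."""
    key = f"user:{user_id}"
    user = get_many([key]).get(key)
    if user is None:
        return []
    friend_keys = [f"user:{fid}" for fid in user["friend_ids"]]
    friends = get_many(friend_keys)
    # Keep the order of friend_ids and skip any rows that are missing.
    return [friends[k]["name"] for k in friend_keys if k in friends]


put("user:1", {"name": "Alice", "friend_ids": [2, 3]})
put("user:2", {"name": "Bob", "friend_ids": [1]})
put("user:3", {"name": "Carol", "friend_ids": [1]})

print(friends_of(1))  # ['Bob', 'Carol']
```

Because every read is a lookup by primary key, each value is also easy to cache, which is what makes this approach pair well with a caching layer.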

Then there are the custom-written systems, like Haystack, a highly scalable object store used to serve Facebook’s immense amount of photos, or Scribe, a logging system that can operate at the scale of Facebook (which is far from trivial).

But enough of that. Let’s present (some of) the software that Facebook uses to provide us all with the world’s largest social network site.

Memcached

Memcached is by now one of the most famous pieces of software on the internet. It’s a distributed memory caching system which Facebook (and a ton of other sites) use as a caching layer between the web servers and MySQL servers (since database access is relatively slow). Through the years, Facebook has made a ton of optimizations to Memcached and the surrounding software (like optimizing the network stack).

Facebook runs thousands of Memcached servers with tens of terabytes of cached data at any one point in time. It is likely the world’s largest Memcached installation.
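The general idea of a caching layer in front of a database is often called the cache-aside pattern. The sketch below shows the pattern only: plain Python dictionaries stand in for Memcached and MySQL, and the function names are our own, not anything from Facebook's code base or a Memcached client library.

```python
from typing import Optional

# Plain dicts stand in for the Memcached cluster and the MySQL database.
cache = {}
database = {"user:1:name": "Alice"}
stats = {"db_reads": 0}  # counts how often we fall through to the slow store


def db_get(key: str) -> Optional[str]:
    """Simulated (slow) database read."""
    stats["db_reads"] += 1
    return database.get(key)


def cached_get(key: str) -> Optional[str]:
    """Cache-aside read: try the cache first, fall back to the database on a miss."""
    value = cache.get(key)
    if value is None:
        value = db_get(key)
        if value is not None:
            cache[key] = value  # populate the cache for the next reader
    return value


def update(key: str, value: str) -> None:
    """Write to the database, then invalidate the cached copy."""
    database[key] = value
    cache.pop(key, None)


cached_get("user:1:name")  # miss: goes to the database
cached_get("user:1:name")  # hit: served from the cache
print(stats["db_reads"])   # 1
```

Invalidating on write (rather than updating the cached value) is one common choice; it keeps the write path simple at the cost of one extra database read afterwards.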

HipHop for PHP

PHP, being a scripting language, is relatively slow when compared to code that runs natively on a server. HipHop converts PHP into C++ code which can then be compiled for better performance. This has allowed Facebook to get much more out of its web servers since Facebook relies heavily on PHP to serve content.

A small team of engineers (initially just three of them) at Facebook spent 18 months developing HipHop, and it is now live in production.

Haystack

Haystack is Facebook’s high-performance photo storage/retrieval system (strictly speaking, Haystack is an object store, so it doesn’t necessarily have to store photos). It has a ton of work to do; there are more than 20 billion uploaded photos on Facebook, and each one is saved in four different resolutions, resulting in more than 80 billion photos.

And it’s not just about being able to handle billions of photos; performance is critical. As we mentioned previously, Facebook serves around 1.2 million photos per second, a number which doesn’t include images served by Facebook’s CDN. That’s a staggering number.

BigPipe

BigPipe is a dynamic web page serving system that Facebook has developed. Facebook uses it to serve each web page in sections (called “pagelets”) for optimal performance.

For example, the chat window is retrieved separately, the news feed is retrieved separately, and so on. These pagelets can be retrieved in parallel, which is where the performance gain comes in, and it also gives users a site that works even if some part of it would be deactivated or broken.
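The two benefits described here, parallel retrieval and a page that survives a broken section, can be sketched in a few lines. This is only an illustration of the idea using Python's asyncio, with made-up pagelet names; it is not how BigPipe itself is implemented, and it leaves out the part where pagelets are delivered to the browser as they become ready.

```python
import asyncio


async def render_news_feed() -> str:
    await asyncio.sleep(0.02)  # simulated backend latency
    return "<div id='feed'>news feed</div>"


async def render_chat() -> str:
    await asyncio.sleep(0.01)
    raise RuntimeError("chat backend unavailable")  # simulated broken pagelet


async def render_profile() -> str:
    await asyncio.sleep(0.01)
    return "<div id='profile'>profile</div>"


async def render_page() -> list:
    """Fetch all pagelets concurrently; a failed pagelet becomes an empty placeholder."""
    pagelets = {"feed": render_news_feed, "chat": render_chat, "profile": render_profile}
    results = await asyncio.gather(
        *(fetch() for fetch in pagelets.values()), return_exceptions=True
    )
    html = []
    for name, result in zip(pagelets, results):
        if isinstance(result, Exception):
            html.append(f"<div id='{name}'></div>")  # page still works without it
        else:
            html.append(result)
    return html


page = asyncio.run(render_page())
print("\n".join(page))
```

Because the three simulated fetches overlap, the whole page takes about as long as the slowest pagelet rather than the sum of all three.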

Cassandra

Cassandra is a distributed storage system with no single point of failure. It’s one of the poster children for the NoSQL movement and has been made open source (it’s even become an Apache project). Facebook uses it for its Inbox search.

Other than Facebook, a number of other services use it, for example Digg. We’re even considering some uses for it here at Pingdom.

Scribe

Scribe is a flexible logging system that Facebook uses for a multitude of purposes internally. It’s been built to be able to handle logging at the scale of Facebook, and automatically handles new logging categories as they show up (Facebook has hundreds).

Hadoop and Hive

Hadoop is an open source map-reduce implementation that makes it possible to perform calculations on massive amounts of data. Facebook uses this for data analysis (and as we all know, Facebook has massive amounts of data). Hive originated from within Facebook, and makes it possible to use SQL queries against Hadoop, making it easier for non-programmers to use.

Both Hadoop and Hive are open source (Apache projects) and are used by a number of big services, for example Yahoo and Twitter.
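For readers who have not met map-reduce before, the model can be shown on a single machine in plain Python. This sketch does not use Hadoop at all; it only mirrors the three stages (map, shuffle, reduce) that Hadoop distributes across many servers, and the sample documents are invented.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document: str):
    """Map: emit a (word, 1) pair for every word in one document."""
    for word in document.lower().split():
        yield (word, 1)


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list):
    """Reduce: collapse the values for one key into a single result."""
    return key, sum(values)


documents = ["status update", "photo comment", "status comment status"]
mapped = chain.from_iterable(map_phase(doc) for doc in documents)
counts = dict(reduce_phase(key, values) for key, values in shuffle(mapped).items())
print(counts)  # {'status': 3, 'update': 1, 'photo': 1, 'comment': 2}
```

The appeal of Hive is that the same question can be asked as a familiar SQL-style "count, grouped by word" query instead of hand-written map and reduce functions.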

Thrift

Facebook uses several different languages for its different services. PHP is used for the front-end, Erlang is used for Chat, and Java and C++ are also used in several places (and perhaps other languages as well). Thrift is an internally developed cross-language framework that ties all of these different languages together, making it possible for them to talk to each other. This has made it much easier for Facebook to keep up its cross-language development.

Facebook has made Thrift open source and support for even more languages has been added.

Varnish

Varnish is an HTTP accelerator which can act as a load balancer and also cache content which can then be served lightning-fast.

Facebook uses Varnish to serve photos and profile pictures, handling billions of requests every day. Like almost everything Facebook uses, Varnish is open source.

Other things that help Facebook run smoothly

We have mentioned some of the software that makes up Facebook’s system(s) and helps the service scale properly. But handling such a large system is a complex task, so we thought we would list a few more things that Facebook does to keep its service running smoothly.

Gradual releases and dark launches

Facebook has a system called Gatekeeper that lets them run different code for different sets of users (it basically introduces different conditions in the code base). This lets Facebook do gradual releases of new features, A/B testing, activate certain features only for Facebook employees, etc.

Gatekeeper also lets Facebook do something called “dark launches”, which is to activate elements of a certain feature behind the scenes before it goes live (without users noticing since there will be no corresponding UI elements). This acts as a real-world stress test and helps expose bottlenecks and other problem areas before a feature is officially launched. Dark launches are usually done two weeks before the actual launch.
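Gatekeeper itself is internal to Facebook, so we can only sketch the general technique. The example below shows one common way to build such a gate: hash the user ID into a stable bucket and compare it with a rollout percentage. The class names, fields and the feature name are all hypothetical.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class User:
    user_id: int
    is_employee: bool = False


@dataclass(frozen=True)
class Gate:
    name: str
    rollout_percent: int = 0      # share of all users (0 to 100) who see the feature
    employees_only: bool = False  # restrict the feature to staff accounts

    def _bucket(self, user: User) -> int:
        """Map a user to a stable bucket 0..99; the same user always gets the same bucket."""
        digest = hashlib.sha256(f"{self.name}:{user.user_id}".encode()).hexdigest()
        return int(digest, 16) % 100

    def allows(self, user: User) -> bool:
        if self.employees_only:
            return user.is_employee
        return self._bucket(user) < self.rollout_percent


new_chat = Gate("new_chat", rollout_percent=10)
enabled = sum(new_chat.allows(User(uid)) for uid in range(10_000))
print(f"{enabled} of 10,000 users see the new chat")  # roughly one in ten

staff_only = Gate("experimental_feed", employees_only=True)
print(staff_only.allows(User(1, is_employee=True)))  # True
print(staff_only.allows(User(2)))                    # False
```

Including the gate name in the hash means a user who lands in the first 10% for one feature is not automatically in the first 10% for every other feature. A dark launch fits the same shape: the gated code path does the backend work but renders no UI.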

Profiling of the live system

Facebook carefully monitors its systems (something we here at Pingdom of course approve of), and interestingly enough it also monitors the performance of every single PHP function in the live production environment. This profiling of the live PHP environment is done using an open source tool called XHProf.
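XHProf is a PHP tool, so the following is not its API. It is a small Python analogue of the same idea, assuming all we want is a call count and total wall-clock time per function; the decorator and the sample function are our own invention.

```python
import time
from collections import defaultdict
from functools import wraps

# Per-function totals: number of calls and accumulated wall-clock seconds.
profile_stats = defaultdict(lambda: {"calls": 0, "seconds": 0.0})


def profiled(fn):
    """Record call count and elapsed time for every call to fn."""
    @wraps(fn)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return fn(*args, **kwargs)
        finally:
            entry = profile_stats[fn.__name__]
            entry["calls"] += 1
            entry["seconds"] += time.perf_counter() - start
    return wrapper


@profiled
def render_profile_page(user_id: int) -> str:
    return f"<h1>user {user_id}</h1>"


for uid in range(3):
    render_profile_page(uid)

for name, entry in profile_stats.items():
    print(f"{name}: {entry['calls']} calls, {entry['seconds']:.6f}s total")
```

In production you would normally sample only a fraction of requests so that the measurement itself does not become a performance problem.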

Gradual feature disabling for added performance

If Facebook runs into performance issues, there are a large number of levers that let them gradually disable less important features to boost performance of Facebook’s core features.
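We don't know what Facebook's levers actually look like, but the principle can be sketched as a simple load-shedding table. The feature names and load thresholds below are hypothetical, chosen only to show less important features switching off first while the core feature stays on.

```python
# Hypothetical features, listed from most to least important.
ALL_FEATURES = ["news_feed", "photo_tagging", "chat", "people_you_may_know"]

# Load level (0.0 to 1.0) at which each non-core feature is switched off.
# Features without an entry (here: news_feed) are never shed.
SHED_THRESHOLDS = {
    "people_you_may_know": 0.70,
    "chat": 0.80,
    "photo_tagging": 0.90,
}


def enabled_features(load: float) -> list:
    """Return the features that stay on at the given load level."""
    return [
        feature
        for feature in ALL_FEATURES
        if load < SHED_THRESHOLDS.get(feature, float("inf"))
    ]


print(enabled_features(0.50))  # ['news_feed', 'photo_tagging', 'chat', 'people_you_may_know']
print(enabled_features(0.75))  # ['news_feed', 'photo_tagging', 'chat']
print(enabled_features(0.95))  # ['news_feed']
```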

The things we didn’t mention

We didn’t go much into the hardware side in this article, but of course that is also an important aspect when it comes to scalability. For example, like many other big sites, Facebook uses a CDN to help serve static content. And then of course there is the huge data center Facebook is building in Oregon to help it scale out with even more servers.

And aside from what we have already mentioned, there is of course a ton of other software involved. However, we hope we were able to highlight some of the more interesting choices Facebook has made.

Facebook’s love affair with open source

We can’t complete this article without mentioning how much Facebook likes open source. Or perhaps we should say, “loves”.

Not only is Facebook using (and contributing to) open source software such as Linux, Memcached, MySQL, Hadoop, and many others, it has also made much of its internally developed software available as open source.

Examples of open source projects that originated from inside Facebook include HipHop, Cassandra, Thrift and Scribe. Facebook has also open-sourced Tornado, a high-performance web server framework developed by the team behind FriendFeed (which Facebook bought in August 2009).

(A list of open source software that Facebook is involved with can be found on Facebook’s Open Source page.)

More scaling challenges to come

Facebook has been growing at an incredible pace. Its user base is increasing almost exponentially and is now close to half a billion active users, and who knows what it will be by the end of the year. The site seems to be growing by about 100 million users every six months or so.

Facebook even has a dedicated “growth team” that constantly tries to figure out how to make people use and interact with the site even more.

This rapid growth means that Facebook will keep running into various performance bottlenecks as it’s challenged by more and more page views, searches, uploaded images, status messages, and all the other ways that Facebook users interact with the site and each other.

But this is just a fact of life for a service like Facebook. Facebook’s engineers will keep iterating and coming up with new ways to scale (it’s not just about adding more servers). For example, Facebook’s photo storage system has already been completely rewritten several times as the site has grown.

So, we’ll see what the engineers at Facebook come up with next. We bet it’s something interesting. After all, they are scaling a mountain that most of us can only dream of; a site with more users than most countries. When you do that, you better get creative.

Building a viable e-commerce business model

For the last week I’ve been helping to produce a viable e-commerce business model using the latest enterprise-class commerce platforms, a well-informed SEO and media strategy and a decent runway to break even. This is not the first time, may I add, but it does take into account some new challenges.

The SaaS world, e-commerce 2, global markets and the improved use of pay per click have made it even more of a numbers game to make the business model work. Obviously your product needs to be good, but if it is, and you have the right partners and the right technology in place, is it simply a case of playing the numbers to get a return on investment? Sounds easy? Not quite that easy. It’s a fine art, where you need to apply your skills and know-how to finely tune your system to get the most out of it. It’s only through experience and having the right partners that you can guarantee the numbers game will work.
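As a rough illustration of what playing the numbers means, here is a small sketch of the arithmetic for a single traffic channel. Every figure in it (spend, visits, conversion rate, order value, margin) is hypothetical, and the model ignores repeat purchases, returns and fixed costs.

```python
from dataclasses import dataclass


@dataclass
class Channel:
    name: str
    spend: float                # monthly spend on this channel
    visits: int                 # visits the channel delivers per month
    conversion_rate: float      # share of visits that end in an order
    average_order_value: float  # revenue per order
    gross_margin: float         # share of revenue left after cost of goods

    def orders(self) -> float:
        return self.visits * self.conversion_rate

    def gross_profit(self) -> float:
        return self.orders() * self.average_order_value * self.gross_margin

    def roi(self) -> float:
        """Return on the channel spend: (gross profit - spend) / spend."""
        return (self.gross_profit() - self.spend) / self.spend

    def break_even_conversion_rate(self) -> float:
        """Conversion rate at which gross profit exactly covers the spend."""
        return self.spend / (self.visits * self.average_order_value * self.gross_margin)


channels = [
    Channel("PPC", spend=5000, visits=10000, conversion_rate=0.02,
            average_order_value=80, gross_margin=0.5),
    Channel("SEO", spend=2000, visits=8000, conversion_rate=0.015,
            average_order_value=80, gross_margin=0.5),
]

for channel in channels:
    print(f"{channel.name}: ROI {channel.roi():.0%}, "
          f"break-even conversion {channel.break_even_conversion_rate():.2%}")
```

The fine tuning is in the inputs: a small shift in conversion rate moves a channel from one side of break even to the other, which is why the conversion path matters so much.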

Getting it right depends on your conversion path, and this starts with a coordinated approach between brand, traditional advertising, pay per click, social channels and traditional product channels. All need to be identified and a strategy developed to create a buying conveyor belt to your e-commerce cart. It doesn’t start with just your site; it starts a lot earlier.

Choosing the right platform and partner to route this conveyor belt through the buying process will lead to successful conversion.

The platform needs to be able to take the feeds of potential customers from all these conveyor belts. It then needs to showcase the product well, display options and link to other products to engage the user and stimulate their buying emotions. This is where SaaS platforms that have one model to fit all fail to deliver. They fail to capture the channels and trigger the buying responses. A wholly owned platform tailored to your product, brand and customer will win hands down when it comes to maximising conversion. It requires a multi-channel approach to commerce and there are only a few platforms out there that do this well.

So if you are looking to play the numbers conversion game you need to be in full control of the whole engine, from advertising spend, PPC and SEO to platform and design. You need to control it all in order to fine tune the animal. This I believe is critical to achieving your business plan. However, think SaaS when you implement.  Build the as-a-service element for your global markets but own the technology yourself.

Can social really predict the future?

With the London mayoral contest being accurately predicted by social network sentiment analysis, should businesses really start to rely on the intelligence collected from social networks to help plan their business models?

For some time I have taken a balanced view of key trends emerging from social networks to predict what’s going to be hot in the technology space. Most of the time it’s been accurate, even though most of the time it’s been predicting the rise of social networks.

The key thing though is blending the intelligence gathered with authentic sources to get the balance right.

Analysis of intelligence from the social graph can help inform your thinking, but if you are doing something new you need to lead the pack. You could follow Steve Jobs’s mantra and tell your customers what they need. But to do this you need to be brave.

It’s important to get the right tools set up to give you the necessary intelligence gathering capability from the social graph and the wider internet, so you can bring all the information together in one clear dashboard to study and make your own decisions.

Another advantage of using the right tools is that they amplify your message and back you up as an authority in your world. Amplifying the positive statistics in real time backs up your messaging. What the social graph can give us is an instant poll of a large targeted constituency without actually polling them. Doing this gives you real-time assurance around your decision-making process. Social graphs will never make the decision for you, but they will give confidence and backing to your ideas.

Splinternet Takes Hold

For the past 15 years, websites have been the prevalent digital modus operandi for businesses. However, all of that changed in 2010, with social, mobile and applications becoming mainstream. As these digital technologies continue to fragment, it has become increasingly difficult for brands to manage and monitor their online presence across multiple markets, let alone understand and act upon the opportunities that are available to them.

‘The Splinternet’ is the term we’ve been using to describe this effect where the internet is being segmented into walled gardens; walled gardens that are being created around users, either through subscriber or device driven communities. The two biggest examples are Facebook and new apps that have been created for the mobile community, such as Flipboard and the many gaming apps that come integrated with something like Apple’s Game Center. The Splinternet is affecting the internet as we know it, creating a number of challenges for agencies and technology players alike who wish to continue to reach users moving forward.

The following top five considerations are underpinning the strategies and decision making process of technologists and agencies like ourselves that are navigating this new world.

1. Awareness

How can you be made aware of what’s going on in this world? How can you watch, protect and react to your own brand and product challenges? We are seeing a lot of paralysis out there, with large organisations understanding the significance of the impact and growing momentum of social media, but with no clear insight on where to start and what to do. The tools now available and the strength and importance of the communities that are in conversation about your brand should not be ignored or underestimated.

The first part of any organisation’s strategy should be to gain an awareness of what’s going on within these communities. This is not necessarily just about listening, it should be focused on trying to make sense of what’s going on and where, and putting the right framework in place to monitor this new world without having to watch it constantly. Creating a model that monitors and makes sense of the Splinternet is the first step towards easing the paralysis, from which a defined strategy to deal with this new world can be created. Once you know how users are responding, you can create a strategy around them to amplify your reach and begin the process of engagement.

2. Governance

Once you have reached this level of awareness, there is temptation to jump right in. However, you need to be careful as social networks are great for building relationships but they can be even better at ending them – it’s like the Wild West out there. If you’re good – or lucky – you can prosper, but it’s essential to create a framework to facilitate your engagement in this world.

To do this, look at how you control your brand’s real estate and keep track of what’s being said. How many Facebook passwords do you have and who has them? What happens if they get lost? Who’s keeping track of what’s being published? Is it effective and how do you monitor it? The list of questions is endless. Now multiply them by every territory or company in your group with all those eager social network engagers and you have a growing nightmare.

Governance is beginning to become a consideration, but it is something that needs to be acted on early before it’s too late. For many businesses, policy is not enough and tools are required to ease the transition into social success. Until recently, however, these tools didn’t exist and so to address the awareness and governance challenges, we developed an innovative infrastructure that overcomes both.

3. Penetration and engagement

How can you reach users within this new world and close the gap between domain based investments, such as .com websites that cost a fortune, and a Facebook page set up in minutes? How do you penetrate these walled gardens and reach the users and communities within?

How do we target our advertising in this space and how do we operate our tools in this space? Take Facebook: its pages are one of the fastest growing areas to have come out of the Splinternet era, but how do we take their potential to a new level by building Facebook apps that integrate with existing content or business systems? Once your awareness and governance tools are in place, this has to be the next stage of your thinking and strategy.

It’s a complex process – pages, apps and content all need to be managed – so to discover that there is a tool being developed that facilitates this will be music to many ears.

4. Connecting intelligence (making sense of it all)

The one thing the Splinternet is enabling us to do is connect intelligence. Even through walled gardens, communities are providing a level of information and human behaviour understanding that so far has not been seen on the web. This is resulting in product and brand strategies informed by the likes and dislikes and content generated by the web 3.0 and 4.0 generations.

Whilst crowd surfing has been long talked about, we are now able to listen and connect the intelligence with the cloud to truly target brand messaging. But to really understand the community, you need to understand who the members are and how they behave, how active they are, what the size of their network is and whether they are mobile. Once we understand this, we can interact with them in the most appropriate ways, such as triggering engagement with an upset or happy customer, either automated or through a real customer services representative.

5. Convergence

The Splinternet is also starting to enable what we call device and application convergence. Whilst desktop and mobile will remain the top targeted devices in business, at home it’s a very different story. Smart televisions (TVs) and tablet devices are big game changers that are forcing the convergence of media. For example, the very latest smart TVs are bringing together smart remotes and mobile apps with the TV experience.

Media channels can also be converged with social, with on demand or live TV programming closely integrated on one screen. This convergence provides the most exciting after-effect of the Splinternet, as it’s the biggest opportunity to date for brands to reach into the world of above-the-line targeting. You can find out who your potential customer is in real-time and connect with them instantly whilst they consume a particular media.

So, whilst the Splinternet effect has changed the face of digital once again, freeing and restricting us simultaneously, there are steps that every brand can take to make sense of and interact with those in this new world.

Facebook Predictions

With talk of a Facebook IPO in the near future and rumours that the Facebook community is showing signs of plateauing, what are the likely predictions for where Facebook is going with its functionality?  This is our guess…

1) Improved chat with audio and video.  With Microsoft’s purchase of Skype, you can see that there is an increased appetite to push video into social applications.  Facebook’s approach of creating a central messaging hub, with Facebook email and chat integrated into one messaging platform and combined with online presence, naturally raises the question: where’s video?  We think it will come and it will be a big hit with users.  The biggest current challenge to video is its ease of use.  But now that most laptops have video capability built in, and with smart TVs and tablet devices arriving, video calling will start to grow.  It is one of the natural steps for Facebook to make.  There are already a number of Facebook video conferencing apps in the app store that show the capabilities of combining video with social networking.
2) Music and media.  Closer alignment with Spotify shows that the music and media industries fit naturally with social networking; both can benefit from each other.  Spotify integration with Facebook brings together a user’s likes and dislikes with their online presence, indicating what they are listening to.  Moving forwards we can start to look at streaming of content such as movies and TV programmes.  Facebook’s unique ability to really tune into users makes it a fantastic media platform for the future. Expanding this into the living room through smart TVs combined with social usage will make Facebook a two-way social media channel platform.
3) Check-in. Facebook officially launched its check-in service last August.  At the moment it updates your status and brings together people that are in the same location as you. But look at the potential behind this service to connect people with other people in the same location, or to enable businesses to target people in a particular place in real time, and it brings a whole new dimension to this service.  Both Facebook and its development partners are starting to get to grips with this capability, so we can expect to see more exciting developments in the near future.
4) Social commerce.  This is one of the obvious beneficiaries of the 600 million+ community of Facebook users; however, Facebook is in a difficult space.  The only advantage they can gain from any social commerce activity is via advertising or providing a platform for payment, analytics, user information and storefront functionality.  However, Facebook could provide more intelligence to group or individual buying.  We don’t know what Facebook will do in this area but we know what they can do; it’s just a question of how it will be perceived by the user community if they go too far.  We can expect to see more tools for businesses to provide apps advertising.
5) Facebook intelligence.  Facebook already employs a tonne of intelligent analytics and algorithms to provide search and relationship suggestions.  The recently launched auto photo tagging service starts to utilise facial recognition, but where could Facebook go if it really starts to make sense of what’s going on in the community?  As we enter the web 3.0 and web 4.0 generations, making sense of the web, the community and the conversations will enable platforms like Facebook to add new levels of usefulness.  Products and brands can find you rather than you searching for them. Information can find you, communities can find you. Facebook with intelligence could be even more powerful than the network already is.  Apply the concept to knowledge and learning, and the network could aid the knowledge expansion of individuals within the community.

Who knows where Facebook will go next?  What we do know is that the platform will continue to grow in significance and usefulness to the user community.  There is no significant challenge to Facebook, so it really is their battleground to innovate in. Let’s just hope that, through their partners and their own innovations, any future initial public offering will not get in the way of this social revolution.