rakoo/rproxy

RProxy

rproxy was first invented by Andrew Tridgell, following his seminal work on rsync. rsync has become a central piece of software on Unix-like systems: it lets you synchronise two computers while sending the minimum amount of data over the wire, thanks to simple and smart techniques. If you haven't already, I suggest you read the paper describing how rsync works; anyone can understand it, and anyone can learn from it.

rproxy is an adaptation of the rsync algorithm to the web, in order to reduce overall bandwidth consumption.

How it works

(For a more in-depth description, don't hesitate to read the original project's page)

The idea is to have a pair of proxies between the client and the server.

One of the proxies is located close to the client (by close, I mean that the network between them is virtually free and fast; think LAN) while the other one is close to the server. Now, instead of having the client talk to the server directly, communication goes from the client to the client rproxy to the server rproxy to the server, and vice versa.

browser -------- client rproxy -------- server rproxy -------- server

(I will now use the term browser instead of simply client, so that one can rapidly grasp the situation)

When the browser first requests something from the server, the GET goes through the proxy pair to the server, and the response is forwarded to the browser and cached in the client rproxy.

[Figure: rproxy diagram]

I use the term client rproxy instead of decoding rproxy, and server rproxy instead of encoding rproxy, because encoding and decoding actually happen in both directions, so the original terms can be confusing.

When the browser requests the same resource from the server again (1), the client rproxy intercepts the request and checks whether there is something similar in the cache (2). If there is nothing, everything happens as before. If there is something, the client rproxy calculates the rsync signature of the cached data and adds this signature (URL-safe base64-encoded) as an X-RProxy-Sig header in the request to the server (3). The server rproxy GETs the resource as usual (4), and does some calculation when it receives the response (5).
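Step (3) can be sketched in Go as follows. This is only an illustration, not the code from this repository: the helper names (newSignedRequest, readSignature) are hypothetical, and I assume the padded URL-safe alphabet from encoding/base64; only the header name comes from the description above.

```go
package main

import (
	"encoding/base64"
	"fmt"
	"net/http"
)

// sigHeader is the header name mentioned in the text.
const sigHeader = "X-RProxy-Sig"

// newSignedRequest (hypothetical helper) builds a GET for url and attaches
// sig as a URL-safe base64 string, as the client rproxy would do.
func newSignedRequest(url string, sig []byte) (*http.Request, error) {
	req, err := http.NewRequest(http.MethodGet, url, nil)
	if err != nil {
		return nil, err
	}
	req.Header.Set(sigHeader, base64.URLEncoding.EncodeToString(sig))
	return req, nil
}

// readSignature (hypothetical helper) is the server rproxy side: it returns
// the decoded signature, or nil if the request carries none.
func readSignature(req *http.Request) ([]byte, error) {
	v := req.Header.Get(sigHeader)
	if v == "" {
		return nil, nil
	}
	return base64.URLEncoding.DecodeString(v)
}

func main() {
	// The signature bytes here are placeholders, not a real rsync signature.
	req, err := newSignedRequest("http://example.com/", []byte{0x01, 0x02, 0xfb, 0xff})
	if err != nil {
		panic(err)
	}
	fmt.Println("header value:", req.Header.Get(sigHeader))

	sig, err := readSignature(req)
	fmt.Println("decoded:", sig, "error:", err)
}
```

The URL-safe alphabet avoids the + and / characters of standard base64, which keeps the value safe to copy into URLs and logs as well as headers.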

Upon receiving the response, the server rproxy has the new file and the old file's signature, which allow it to calculate a delta from the old file to the new file. It then sends this delta to the client rproxy (6).

The client rproxy has the old file and the delta, so it patches the old file to produce the new file, which it can now serve to the browser (7) (and also save in the cache for further requests).
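To make the signature / delta / patch cycle concrete, here is a deliberately simplified sketch in Go. It is not the rsync algorithm as implemented by librsync or by this project: it hashes every window with SHA-256 instead of using a rolling weak checksum (so it is slow), and the block size, type names and function names are all assumptions made for the example.

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

const blockSize = 4 // tiny on purpose, for illustration only

// op is one delta instruction: copy a block of the old file,
// or emit literal bytes that the old file does not contain.
type op struct {
	block   int    // index of the old block to copy, or -1 for a literal
	literal []byte // used only when block == -1
}

// signature hashes each full block of the old data (client rproxy side).
func signature(old []byte) map[[32]byte]int {
	sig := make(map[[32]byte]int)
	for i := 0; (i+1)*blockSize <= len(old); i++ {
		sig[sha256.Sum256(old[i*blockSize:(i+1)*blockSize])] = i
	}
	return sig
}

// delta describes newData in terms of old blocks (server rproxy side).
// It only needs the signature, never the old data itself.
func delta(sig map[[32]byte]int, newData []byte) []op {
	var ops []op
	var lit []byte
	flush := func() {
		if len(lit) > 0 {
			ops = append(ops, op{block: -1, literal: lit})
			lit = nil
		}
	}
	for i := 0; i < len(newData); {
		if i+blockSize <= len(newData) {
			if b, ok := sig[sha256.Sum256(newData[i:i+blockSize])]; ok {
				flush()
				ops = append(ops, op{block: b})
				i += blockSize
				continue
			}
		}
		lit = append(lit, newData[i]) // no match at this offset: slide by one byte
		i++
	}
	flush()
	return ops
}

// patch rebuilds the new data from the old data and the delta (client rproxy side).
func patch(old []byte, ops []op) []byte {
	var out []byte
	for _, o := range ops {
		if o.block >= 0 {
			out = append(out, old[o.block*blockSize:(o.block+1)*blockSize]...)
		} else {
			out = append(out, o.literal...)
		}
	}
	return out
}

func main() {
	old := []byte("the quick brown fox jumps over the lazy dog")
	updated := []byte("the quick red fox jumps over the lazy dog")

	ops := delta(signature(old), updated)
	literal := 0
	for _, o := range ops {
		literal += len(o.literal)
	}
	fmt.Println("literal bytes in delta:", literal, "of", len(updated))
	fmt.Println("patched correctly:", string(patch(old, ops)) == string(updated))
}
```

The important property is visible in the function signatures: delta never sees the old file, only its signature, which is why the server rproxy can compute it from a header alone.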

In the diagram, you can see that the large data exchanges happen on the fast links, while the slow link (across the internet) carries only a little data: goal achieved.

Usage

This project is written in Go, so the usual Go installation steps apply.

Dependencies:

To run the demo:

  • In 1 terminal, run the dummy server:
  • In a second terminal, run the server rproxy (in the server folder)
  • In a third terminal, run the client rproxy (in the client folder)

Now use curl to send a GET to localhost:2424 (the client rproxy). Nothing special will be visible the first time (apart from the data being retrieved), but on subsequent calls you will see that the server side sends less data:

2013/12/11 20:59:34 C -> S: 32B
2013/12/11 20:59:34 S -> C: 118716B
2013/12/11 20:59:35 C -> S: 944B
2013/12/11 20:59:35 S -> C: 2067B

C -> S is the length of the signature sent by the client rproxy; S -> C is the size of the data sent back to the client rproxy. You can see that in the first iteration an "empty" signature is sent, so the full content is retrieved, but on later calls a non-empty signature allows the server side to send less data.

When to use

As you can see, this setup needs no modification in the browser or in the server: you only have to set the browser's proxy to the client rproxy and put the server rproxy in front of the actual server. However, a better solution would be to integrate these two parts directly into the respective components for better performance (although there could still be value in having a domain-wide client rproxy, so that requests to the same URLs from different browsers can be effectively deduplicated).

This functionality allows a server to send (far) less content when said content is dynamic. I believe this is well suited to the current state of the Web, where most content is dynamic, developers mix endless AJAX calls to incrementally update a page, and everything moves so fast.

On the other hand, rproxy will be a waste of bandwidth and CPU time if the content is static, for instance for a CDN that distributes static assets such as images or JavaScript files. If you have one of these, don't use something like rproxy.

Some numbers

The repo currently contains a dummy server that serves two pages: index.html and index2.html, which are 2 very similar versions of the webpage at www.cnn.com. Whenever a GET is satisfied, one of the 2 pages is served, and the other one is queued for the next request (so that 2 consecutive calls basically test the diffing from one to the other).
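The alternating behaviour of the dummy server can be sketched like this. It is not the dummy server shipped in the repo: the alternator type is hypothetical, the two pages are short in-memory placeholders rather than index.html and index2.html, and the example runs against a throwaway httptest server so that it terminates on its own.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"sync"
)

// alternator (hypothetical) serves two versions of a page in turn.
type alternator struct {
	mu    sync.Mutex
	pages [2][]byte
	turn  int
}

// next returns the page for this request and queues the other one
// for the following request.
func (a *alternator) next() []byte {
	a.mu.Lock()
	defer a.mu.Unlock()
	p := a.pages[a.turn]
	a.turn = 1 - a.turn
	return p
}

// ServeHTTP makes alternator usable as an http.Handler.
func (a *alternator) ServeHTTP(w http.ResponseWriter, r *http.Request) {
	w.Write(a.next())
}

func main() {
	// Placeholder contents standing in for the two saved pages.
	a := &alternator{pages: [2][]byte{[]byte("version one"), []byte("version two")}}
	srv := httptest.NewServer(a)
	defer srv.Close()

	for i := 0; i < 3; i++ {
		resp, err := http.Get(srv.URL)
		if err != nil {
			panic(err)
		}
		body, err := io.ReadAll(resp.Body)
		resp.Body.Close()
		if err != nil {
			panic(err)
		}
		fmt.Printf("request %d got: %s\n", i+1, body)
	}
}
```

Because consecutive responses always differ, every request after the first one exercises the diff from one version to the other.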

I have chosen this page because it is huge (116 kB!) and extremely dynamic: the 2 pages were fetched ~1 min apart, and differ by a mere 3 bytes (!). This means that if you browse to this website, you will have to download an enormous blob for a change you can barely notice.

With the rsync algorithm (and thus with rproxy), a signature of either page amounts to 708B, which means 944B after base64-encoding, and the delta produced by the algorithm is 2067B long.

So, instead of downloading 116 kB of data just to get those 3 new bytes, you will send 944B upstream and receive 2067B of data. That is still a lot of data, but far less than the original amount, and you won't be able to do much better unless you start uploading more (which is a bad idea when you look at the current abysmal asymmetry of the typical ADSL line).
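The arithmetic behind these numbers can be checked with a few lines of Go. The byte counts are the ones quoted in this README (118716B is the full response size from the log in the Usage section); the function names are only there for the example, and I assume padded base64, which makes no difference here because 708 is a multiple of 3.

```go
package main

import (
	"encoding/base64"
	"fmt"
)

const (
	pageBytes  = 118716 // full page, as logged on the first request
	sigBytes   = 708    // raw rsync signature
	deltaBytes = 2067   // delta sent back by the server rproxy
)

// encodedSigLen returns the size of an n-byte signature once base64-encoded.
func encodedSigLen(n int) int {
	return base64.URLEncoding.EncodedLen(n)
}

// exchanged returns the total number of bytes crossing the slow link:
// the encoded signature going up plus the delta coming down.
func exchanged(sigLen, deltaLen int) int {
	return encodedSigLen(sigLen) + deltaLen
}

func main() {
	total := exchanged(sigBytes, deltaBytes)
	fmt.Printf("signature on the wire: %dB\n", encodedSigLen(sigBytes))
	fmt.Printf("total with rproxy: %dB, i.e. %.1f%% of %dB\n",
		total, 100*float64(total)/pageBytes, pageBytes)
}
```

Base64 turns every 3 bytes into 4 characters, so 708B becomes 944B, and the whole exchange is 3011B, about 2.5% of the full page.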

LICENCE

CC0

To the extent possible under law, the author(s) have dedicated all copyright and related and neighboring rights to this software to the public domain worldwide. This software is distributed without any warranty.

You should have received a copy of the CC0 Public Domain Dedication along with this software. If not, see http://creativecommons.org/publicdomain/zero/1.0/.

About

An implementation of the dormant rproxy idea
