Hendrik Weimer's Quantenblog

Having fun with science and technology.

Read Scientific Papers from Anywhere

2007-01-03



Accessing scientific papers online is great because it saves you a trip to the library. Unfortunately, most journals make their articles available only to paying subscribers (unlike Open Access journals). So if you want to read an article at home or at a conference, you have a problem. This post shows how to access it anyway, just by clicking on the download link on the journal's site.

For the method to work you need a few prerequisites:

  • Apache or any other webserver serving CGI scripts
  • Squid
  • Perl with Net::SSH installed
  • An SSH account on a machine that is eligible to access the journal's articles.

Still on board? Great. We will create a setup that uses Squid as a proxy for your favorite browser. Squid feeds every request to a redirector. Based on a list of regular expressions, the redirector script decides whether the request is for a paper. If it is, the script returns a URL that points to the Apache server. The CGI script there runs an SSH command on the remote machine, which downloads the paper, and then delivers the result to the browser. In a nutshell, the whole process looks like this:
Browser -> Squid -> Redirector -> Apache -> SSH tunnel -> PDF article
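
To make the redirector step concrete, here is a minimal sketch of the decision it makes. The original redirector is a Perl script; this illustration is in Python, and the pattern list, the HTTP_HOST value, and the /cgi-bin/get-paper?url= path are assumptions chosen for the example. It follows the classic Squid redirector convention of one request per input line, answered by either a new URL or an empty line.

```python
import re
from urllib.parse import quote

# Hypothetical settings: adjust the patterns to the journals you read.
HTTP_HOST = "localhost"
PAPER_PATTERNS = [
    re.compile(r"^http://journal\.example\.org/.*\.pdf$"),
]


def rewrite(url):
    """Return the CGI URL for a paper request, or "" to leave the URL alone."""
    if any(p.search(url) for p in PAPER_PATTERNS):
        # Percent-encode the whole original URL so it survives as one parameter.
        return "http://%s/cgi-bin/get-paper?url=%s" % (HTTP_HOST, quote(url, safe=""))
    return ""


def serve(instream, outstream):
    """Answer each request line; the URL is the first whitespace-separated field."""
    for line in instream:
        fields = line.split()
        url = fields[0] if fields else ""
        outstream.write(rewrite(url) + "\n")
        outstream.flush()  # Squid waits for each answer, so never buffer
```

To run this as an actual redirector, you would call serve(sys.stdin, sys.stdout) at the end of the script. An empty answer tells Squid to pass the request through unchanged.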

First of all, install and configure Squid so that you can use it as an HTTP proxy. Don't forget to update your browser's settings as well. To use Squid's redirector feature, add the following line to your squid.conf (Squid 2.6 and later call this directive url_rewrite_program):

redirect_program /usr/local/bin/redirector

The redirector is a Perl script, which can be found here. At the top of the script you will see the URL patterns of the journals to be fetched via the SSH tunnel; adjust them to fit your needs. If you want to redirect to a machine other than localhost, change the $httphost variable. Then copy the get-paper Perl script into your CGI directory and edit the top of the script to enter your SSH server host and username.
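
The CGI side can be sketched in the same way. This is not the get-paper script itself (which is Perl and uses Net::SSH); it is a small Python illustration that assumes the redirector passes the paper's address in a query parameter named url, and SSH_USER and SSH_HOST are placeholder values. The check matters because the remote shell expands the string unquoted, so whitespace would let a caller smuggle extra options into the download command.

```python
import re
from urllib.parse import parse_qs

# Hypothetical account details; the real script keeps its own at the top.
SSH_USER = "alice"
SSH_HOST = "gateway.example.edu"


def ssh_command(query_string):
    """Build the ssh argv that fetches the requested paper, or None if invalid."""
    values = parse_qs(query_string).get("url", [])
    # Require exactly one http(s) URL that contains no whitespace at all.
    if len(values) != 1 or not re.fullmatch(r"https?://\S+", values[0]):
        return None
    # BatchMode makes ssh fail instead of prompting for a password.
    return ["ssh", "-o", "BatchMode=yes", "%s@%s" % (SSH_USER, SSH_HOST), values[0]]
```

The returned list can be handed to subprocess.run(argv, capture_output=True). The CGI would answer with an error page whenever the function returns None.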

Now we are almost there. We only need to set up the SSH connection so that the CGI script can perform downloads via the SSH tunnel. For this, you first need to open a shell as the user running the web server (i.e. the Apache user). The usual way to achieve this is to become root and then switch to that user with su - www-data, or whatever your Apache user is called. You might need to specify a different shell, though, since this account often has no login shell. As the Apache user, create an SSH key (using ssh-keygen) and add the public key to ~/.ssh/authorized_keys on the SSH server. Prepend a "command" option to the new line so that it calls Wget to download the document. The line should look like this:

no-port-forwarding,no-X11-forwarding,no-agent-forwarding,
no-pty,command="wget --quiet --save-headers
-O - $SSH_ORIGINAL_COMMAND" ssh-rsa AAAA[...]

Make sure that everything is placed on a single line.
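
Because Wget is called with --save-headers and -O -, the SSH command returns the HTTP response headers, a blank line, and then the document itself. A rough sketch of how the receiving side could separate the two is shown below. It is written in Python for illustration, the function name split_response is made up for this example, and it assumes the output contains a single response.

```python
def split_response(raw):
    """Split raw bytes into (headers dict with lowercase names, body bytes)."""
    # Use whichever blank-line separator occurs first, since the body
    # (for instance a PDF) may itself contain such byte sequences.
    hits = []
    for sep in (b"\r\n\r\n", b"\n\n"):
        pos = raw.find(sep)
        if pos >= 0:
            hits.append((pos, len(sep)))
    if not hits:
        return {}, raw  # no header block found, treat everything as body
    pos, length = min(hits)
    head, body = raw[:pos], raw[pos + length:]

    headers = {}
    for line in head.splitlines()[1:]:  # the first line is the status line
        name, sep, value = line.partition(b":")
        if sep:
            key = name.strip().lower().decode("latin-1")
            headers[key] = value.strip().decode("latin-1")
    return headers, body
```

The CGI script can then repeat the content-type header from the dictionary and send the body to the browser unchanged.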

Okay, now everything is ready and you should be able to read papers just by clicking on the download link in your browser. If you encounter problems, feel free to post a comment explaining your difficulties.


Copyright 2006--2011 Hendrik Weimer. This document is available under the terms of the GNU Free Documentation License. See the licensing terms for further details.