Monday, 1 August 2011

Unix Made Easy: Tutorial for Squid


Tutorial for Squid


Introduction

    Squid is a high-performance proxy caching server for web clients, supporting FTP, gopher, and HTTP data objects. Unlike traditional caching software, Squid handles all requests in a single, non-blocking, I/O-driven process.
    Squid keeps metadata and especially hot objects cached in RAM, caches DNS lookups, supports non-blocking DNS lookups, and implements negative caching of failed requests. It supports SSL, extensive access controls, and full request logging. By using the lightweight Internet Cache Protocol, Squid caches can be arranged in a hierarchy or mesh for additional bandwidth savings.
    Squid consists of a main server program squid, a Domain Name System lookup program dnsserver, some optional programs for rewriting requests and performing authentication, and some management and client tools. When squid starts up, it spawns a configurable number of dnsserver processes, each of which can perform a single, blocking Domain Name System (DNS) lookup. This reduces the amount of time the cache waits for DNS lookups.
    This web caching software works on a variety of platforms including Linux, FreeBSD, and Windows. Squid was created by Duane Wessels.



Operating Systems Supported by Squid 

  • Linux
  • FreeBSD
  • NetBSD
  • OpenBSD
  • BSDI
  • Mac OS/X
  • OSF/Digital Unix/Tru64
  • IRIX
  • SunOS/Solaris
  • NeXTStep
  • SCO Unix
  • AIX
  • HP-UX
  • OS/2
  • Cygwin
Installing Squid
Downloading Squid 

Squid can be downloaded as a source archive in gzipped tarball form (e.g. squid-*-src.tar.gz), available at http://www.squid-cache.org/ or from ftp://www.squid-cache.org/pub

Squid can also be downloaded as a binary from http://www.squid-cache.org/binaries.html

Installing Squid from Source

1. Extract the source
    tar xzf squid-*-src.tar.gz


2. Change the current directory to squid-*
    cd squid-*


3. Compile and install squid
    ./configure
    make
    make install


Note:
By default, Squid gets installed in "/usr/local/squid".
For more help on the compile-time options available in Squid, run:
./configure --help
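
Several features used later in this tutorial (delay pools, SNMP, ARP ACLs, certain authentication helpers) must be enabled at compile time. The following is only a sketch of such a build, assuming the default /usr/local/squid prefix; pick only the options you actually need:

./configure --prefix=/usr/local/squid \
            --enable-delay-pools \
            --enable-snmp \
            --enable-arp-acl \
            --enable-basic-auth-helpers="LDAP,NCSA,PAM"
make
make install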


Creating Squid Swap Directories

The Squid swap directories could be created by the following command

#/usr/local/squid/sbin/squid -z 

Start, Stop & Restarting Squid

Start Squid
#/usr/local/squid/sbin/squid

Stop Squid
#/usr/local/squid/sbin/squid -k shutdown

Restart Squid
Stop squid:  #/usr/local/squid/sbin/squid -k shutdown
Start squid: #/usr/local/squid/sbin/squid

Options Available
-k reconfigure|rotate|shutdown|interrupt|kill|debug|check|parse
                 Parse configuration file, then send signal to
                 running copy (except -k parse) and exit.
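
For example, after editing squid.conf you can apply the changes without fully restarting the running process:

#/usr/local/squid/sbin/squid -k reconfigure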


Running Squid as a Daemon

Squid runs as a daemon (a background process) by default when started as shown above. To keep it in the foreground instead (no daemon mode), for example while testing, start it with the -N option:

#/usr/local/squid/sbin/squid -N

Starting Squid in Debugging Mode

Squid can be started in debugging mode by running squid as given below.

#/usr/local/squid/sbin/squid -NCd1

which gives debugging output on the terminal.
If everything is working, it will print "Ready to serve requests.".

Check Squid Status

To check whether squid is running, the following command could be used.

#/usr/local/squid/sbin/squid -k check
 
Basic Configuration
 
Squid Listening on a Particular Port
 
The http_port option specifies the port number on which squid will listen for HTTP client requests. If this option is set to port 80, the client will have the illusion of being connected to the actual web server. Squid listens on port 3128 by default.
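
For example, to listen on the default port and additionally on port 8080 (an extra port chosen here purely for illustration), squid.conf would contain:

http_port 3128
http_port 8080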
 
Different modes of Squid Configuration
 
Squid can be configured in three different modes: direct proxy, reverse proxy and transparent proxy. 
 
Direct Proxy Cache

A direct proxy cache is used to cache static web pages (HTML and images) on a squid machine. When a page is requested a second time, the browser returns the data from the proxy instead of the origin web server. The browser is explicitly configured to direct all HTTP requests to the proxy cache, rather than the target web server. The cache then either satisfies the request itself or passes the request on to the target server.

Configuring as Direct Proxy
By default, squid is configured in proxy mode. In order to cache web traffic and use the squid system as a proxy, you have to configure your browser, which needs at least two pieces of information (a minimal server-side sketch follows this list):
  • The proxy server's host name
  • The port on which the proxy server is accepting requests
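
On the squid side, a minimal squid.conf sketch for a direct proxy might look as follows; the network 192.168.0.0/24 is a placeholder for your own client network, and the acl/http_access directives are covered in the Filtering section below:

http_port 3128
# allow only our own clients to use the proxy
acl our_network src 192.168.0.0/24
http_access allow our_network
http_access deny all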

Transparent Cache

Transparent cache achieves the same goal as a standard proxy cache, but operates transparently to the browser. The browser does not need to be explicitly configured to access the cache. Instead, the transparent cache intercepts network traffic, filters HTTP traffic (on port 80) and handles the request if the object is in the cache. If the object is not in the cache, the packets are forwarded to the origin web server.


Configuring as Transparent Proxy

Using squid transparently is a two-part process, requiring first that squid be configured properly to accept non-proxy requests (performed in squid.conf) and second that web traffic gets redirected to the squid port (achieved in three ways, namely policy-based routing, using smart switching, or by setting the squid box up as a gateway).
 
Getting transparent caching to work requires the following steps
 
For some operating systems, you have to configure and build a version of Squid which can recognize the hijacked connections and discern the destination addresses. For Linux this seems to work automatically. For BSD-based systems, you probably have to configure squid with the --enable-ipf-transparent option. In addition, you have to configure squid as
 
httpd_accel_host virtual
httpd_accel_port 80
httpd_accel_with_proxy on
httpd_accel_uses_host_header on

 
You have to configure your cache host to accept the redirected packets - any IP address, on port 80 - and deliver them to your cache application. This is typically done with IP filtering/forwarding features built into the kernel. On Linux this is iptables/netfilter (kernel 2.4.x), ipchains (2.2.x) or ipfwadm (2.0.x). On FreeBSD and other BSD systems it is called ipfilter or ipnat; on many systems, it may require rebuilding the kernel or adding a new loadable kernel module.
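
On a Linux 2.4.x host where squid runs on the same box that the client traffic passes through, the redirection can be sketched with a single iptables rule; eth0 and port 3128 are assumptions, so adjust them to your inside interface and http_port:

# redirect HTTP traffic arriving on eth0 to the local squid port
iptables -t nat -A PREROUTING -i eth0 -p tcp --dport 80 -j REDIRECT --to-ports 3128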
 

Reverse Proxy Cache

A reverse proxy cache differs from direct and transparent caches in that it reduces load on the origin web server, rather than reducing upstream network bandwidth on the client side. Reverse proxy caches offload client requests for static content from the web server, preventing unforeseen traffic surges from overloading the origin server. The proxy server sits between the Internet and the Web site and handles all traffic before it can reach the Web server. A reverse proxy server intercepts requests to the Web server and instead responds to them out of a store of cached pages. This method improves performance by reducing the number of pages actually generated "fresh" by the Web server.

 
Configuring as Reverse Proxy 
 
To set Squid up to run as an accelerator, you probably want it to listen on port 80. Finally you have to define the machine you are accelerating for. This is done in squid.conf:
http_port 80
httpd_accel_host visolve.com
httpd_accel_port 81
httpd_accel_single_host on
httpd_accel_with_proxy on

If you are using Squid as an accelerator for a virtual host system, then instead of a 'hostname' here you have to use the word virtual as:
 
http_port 80
httpd_accel_host virtual
httpd_accel_port 81
httpd_accel_with_proxy on
 
Different Methods of Intercepting HTTP Traffic 
 
These methods can be found in detail at the following link:
 
http://www.visolve.com/squid/whitepapers/trans_caching.php   



WCCP configuration

Does Squid support WCCP?
 
Yes, Squid supports WCCP. Routers that support WCCP can be configured to direct traffic to one or more web caches using an efficient load-balancing mechanism. WCCP also provides for automatic bypassing of an unavailable cache in the event of a failure.
 
Configuring Squid for WCCP Support
 
Patches to be applied to the Linux kernel:

The Linux kernel on the squid machine should be patched with ip_wccp, as ip_gre is somewhat broken. Recompile the kernel enabling ip_gre and ip_wccp.

Now install squid from source and configure squid.conf to point to the WCCP router.
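
A sketch of the relevant squid.conf lines, assuming a hypothetical router address of 192.168.1.1 (the exact WCCP directives available depend on your Squid version):

wccp_router 192.168.1.1
wccp_version 4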

Squid machine configuration:
The following iptables rule redirects all HTTP traffic to the squid port 3128:
iptables -t nat -A PREROUTING -p tcp --dport 80 -j REDIRECT --to-ports 3128

Cache Inside the Router's Network

If the cache is inside the router's network, packets coming from the caches should be prevented from being redirected back to the caches again. So the following firewall rule has to be prepended on the router machine.

iptables -t mangle -I PREROUTING 1 -p tcp --dport 80 -s <ip-squid> -j ACCEPT

 
SNMP Configuration
 
Enabling SNMP support to Squid
 
To use SNMP with squid, it must be enabled with the configure script and squid rebuilt. To enable SNMP in squid, go to the squid source directory and follow the steps given below:

./configure --enable-snmp [ ... other configure options ]

make all
make install

Then edit the following tags in the squid.conf file:

acl aclname snmp_community public
snmp_access aclname

Once you have configured squid and the SNMP server, start SNMP and squid. 
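
As a quick test you can walk Squid's MIB from another machine. This sketch assumes the net-snmp tools, Squid's default SNMP port 3401, the community string "public" configured above, and a hypothetical host name; the path to Squid's mib.txt may differ on your installation:

snmpwalk -v 1 -c public -m /usr/local/squid/share/mib.txt squidhost.example.com:3401 .1.3.6.1.4.1.3495.1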

 
Why should I go for SNMP?
 
SNMP in squid is useful for a longer-term overview of how the proxy is doing. It can also be used as a problem solver. For example: how is your file descriptor usage doing? Or how much does your LRU age vary over a day? This information cannot easily be monitored otherwise.
 
Monitoring Squid 
 
There are a number of tools for monitoring Squid via SNMP, of which MRTG is the most widely used. The Multi Router Traffic Grapher (MRTG) is a tool that monitors squid information and generates a real-time graphical status in a dynamic view by sampling data every five minutes (the interval may vary according to your need). MRTG shows activity over the last 24 hours and also in weekly, monthly and yearly graphs. 
 
Parameters Monitored
 
Squid runtime information like CPU usage, Memory usage, Cache Hit, Miss etc., can be monitored using SNMP. 
 
Delay Pools Configuration


Limiting Bandwidth
 
Delay Classes are generally used in places where bandwidth is expensive. They let you slow down access to specific sites (so that other downloads can happen at a reasonable rate), and they allow you to stop a small number of users from using all your bandwidth (at the expense of those just trying to use the Internet for work).
To ensure that some bandwidth is available for work-related downloads, you can use delay-pools. By classifying downloads into segments, and then allocating these segments a certain amount of bandwidth (in kilobytes per second), your link can remain uncongested for useful traffic.
To use delay pools you need to have compiled Squid with the appropriate source code: you will have to have used the --enable-delay-pools option when running the configure program.

An acl-operator (delay_access) is used to split requests into pools. Since we are using acls, you can split up requests by source address, destination url or more.

 
Configuring Squid with Delay Pools
 
To enable delay pools option,
Compile squid with --enable-delay-pools
Example 
acl tech src 192.168.0.1-192.168.0.20/32
acl no_hotmail url_regex -i hotmail
acl all src 0.0.0.0/0.0.0.0
delay_pools 1 #Number of delay_pool 1
delay_class 1 1 #pool 1 is a delay_class 1
delay_parameters 1 100/100
delay_access 1 allow no_hotmail !tech

In the above example, hotmail users are limited to the rate specified in delay_parameters (100 bytes per second here), while IPs in the acl tech get normal bandwidth. You can see the bandwidth usage through cachemgr.cgi.
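
Delay classes 2 and 3 additionally allow per-host (and per-network) limits. Below is a sketch of a class 2 pool that leaves the aggregate unlimited but restricts each client IP; the numbers are purely illustrative (restore rate in bytes per second / bucket size in bytes):

delay_pools 1
delay_class 1 2
# -1/-1 = unlimited aggregate; each client limited to about 16 KB/s after a 64 KB burst
delay_parameters 1 -1/-1 16000/64000
delay_access 1 allow all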



Caching

Can squid cache FTP contents?
 
Squid is an HTTP proxy with FTP support, not a real FTP proxy. It can download from FTP servers and upload to some of them, but it cannot delete or rename files on remote FTP servers. So if ports 20 and 21 are blocked and all FTP access goes through Squid, users will not be able to delete or rename files on remote FTP servers. Squid speaks FTP on the server side, but not on the client side.

Can squid cache dynamic pages?
 
Squid will not cache pages that are dynamically generated by scripts. It caches only static pages.
 
Deleting Objects from the Cache
Deleting an object from the cache is possible using the "purge" method.

Squid does not allow you to purge objects unless it is configured with access controls in squid.conf. First you must add the following tags to squid.conf:

acl PURGE method PURGE
acl localhost src 127.0.0.1
http_access allow PURGE localhost
http_access deny PURGE

The above allows purge requests which come from the local host and denies all other purge requests.

/usr/local/squid/bin/client -m PURGE <URL>
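
For example, to remove a single cached object (the URL below is purely illustrative):

/usr/local/squid/bin/client -m PURGE http://www.example.com/images/logo.gif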

 
Specifying Cache Size
The cache size can be specified using the cache_dir directive in squid.conf:

cache_dir ufs /usr/local/squid/cache 100 16 256 

 
Here ufs is the squid storage scheme, /usr/local/squid/cache is the cache directory, 100 is the cache size in MB, and 16 and 256 are the numbers of first-level and second-level subdirectories inside the cache directory.
 
Squid Swap Formats
 
The squid storage (swap) schemes available are

ufs, aufs, diskd and coss

Authentication 
 
Configuring Squid for authenticating users
 
Squid allows you to configure user authentication using the auth_param directive. This directive defines parameters for the various authentication schemes supported by Squid.
 
Proxy authentication in transparent mode
 
Authentication can't be used in a transparently intercepting proxy, as the client then thinks it is talking to an origin server and not the proxy. This is a limitation of bending the TCP/IP protocol to transparently intercept port 80, not a limitation in Squid.
 
Authentication schemes available for squid
 
The Squid source code comes with a few authentication processes for Basic authentication. These include
LDAP: Uses the Lightweight Directory Access Protocol
NCSA: Uses an NCSA-style username and password file.
MSNT: Uses a Windows NT authentication domain.
PAM: Uses the Linux Pluggable Authentication Modules scheme.
SMB: Uses a SMB server like Windows NT or Samba.
getpwnam: Uses the old-fashioned Unix password file.
sasl: Uses SASL libraries.
winbind: Uses Samba to authenticate in a Windows NT domain.

In addition Squid also supports the NTLM and Digest authentication schemes which both provide more secure authentication methods where the password is not exchanged in plain text.

 
Configuring squid for LDAP authentication

Compiling squid with ldap support.
./configure --enable-basic-auth-helpers="LDAP"

In the squid.conf file, add the following.

For example:
auth_param basic program /usr/local/squid/libexec/squid_ldap_auth -b dc=visolve,dc=com -f uid=%s -h visolve.com
acl password proxy_auth REQUIRED
http_access allow password
http_access deny all
  
 
Check Squid working with LDAP auth
 
To check whether the Squid machine can communicate with the LDAP server, use the command below on the command line.

Example:
# /usr/local/squid/libexec/squid_ldap_auth -b dc=visolve,dc=com -f uid=%s visolve.com

This waits for input: type the uid, a space, and the password. If the helper is able to connect to the LDAP server and authenticate, it will return "OK".

 
 LDAP group authentication
 
Compiling squid with ldap support.
./configure --enable-basic-auth-helpers="LDAP" --enable-external-acl-helpers=ldap_group

In the configuration file (squid.conf):

external_acl_type group_auth %LOGIN /usr/local/squid/libexec/squid_ldap_group -b "dc=visolve,dc=com" -f " (&(objectclass=groupOfUniqueNames)(cn=%a)(uniqueMember=uid=%v,cn=accounts,dc=visolve,dc=com))" -h visolve.com

acl gsrc external group_auth accounts
http_access allow gsrc

 
Configuring Squid for NCSA
 
NCSA Authentication

This is the easiest to implement and probably the preferred choice for many environments. This type of authentication uses an Apache-style htpasswd file, which is checked whenever anyone logs in. This is the best-supported option, and a web-based password changing program is provided to make it easy for users to maintain their own passwords.

To turn on NCSA authentication, edit some directives in squid.conf

authenticate_program /usr/local/squid/bin/ncsa_auth /usr/local/squid/etc/passwd

This tells Squid where to find the authenticator. Next we have to create an ACL.

Acl configuration for ncsa_auth :

acl auth_users proxy_auth REQUIRED
http_access allow auth_users
http_access deny all
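
The password file referenced by ncsa_auth can be created and maintained with Apache's htpasswd utility; this is a sketch assuming htpasswd is installed and the path used above (user1 is a placeholder username):

# -c creates the file; omit -c when adding further users
htpasswd -c /usr/local/squid/etc/passwd user1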

Configuring Squid for SMB

SMB Auth Module :

smb_auth is a proxy authentication module. With smb_auth we can authenticate proxy users against an SMB server like Windows NT or Samba.

Adding smb_auth in Squid.conf :

Squid Configuration :

To turn on SMB authentication, edit some directives in squid.conf.

authenticate_program /usr/local/squid/bin/smb_auth -W domain -S /share/path/to/proxyauth

This tells Squid where to find the authenticator. Next we have to create an ACL .

Acl configuration for smb_auth :

acl domainusers proxy_auth REQUIRED
http_access allow domainusers
http_access deny all

 
Configuring squid for MSNT

MSNT Auth Module :

MSNT is a Squid web proxy authentication module. It allows a Unix web proxy to authenticate users with their Windows NT domain credentials.

Adding msnt_auth in Squid.conf :

Squid Configuration :

To turn on MSNT authentication, edit some directives in squid.conf

auth_param basic program /usr/local/squid/libexec/msnt_auth
auth_param basic children 5
auth_param basic realm Squid proxy-caching web server
auth_param basic credentialsttl 2 hours

This tells Squid where to find the authenticator. Next we have to create an ACL

Acl configuration for msnt_auth :

acl auth_users proxy_auth REQUIRED
http_access allow auth_users
http_access deny all

Configure squid for PAM
PAM Auth Module :

This program authenticates users against a PAM configured authentication service "squid". This allows us to authenticate Squid users to any authentication source for which we have a PAM module.

Adding pam_auth in Squid.conf

Squid Configuration

To turn on PAM authentication, edit some directives in squid.conf.

authenticate_program /usr/local/squid/bin/pam_auth

 This tells Squid where to find the authenticator. Next we have to create an ACL .

Acl configuration for pam_auth :

 acl auth_users proxy_auth REQUIRED
 http_access allow auth_users
 http_access deny all
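
The PAM service named "squid" is defined by a file on the proxy host, typically /etc/pam.d/squid. A minimal sketch using the standard Unix password module (your system may require different PAM modules):

# /etc/pam.d/squid
auth     required   pam_unix.so
account  required   pam_unix.so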

 
Configure squid for NTLM 

NTLM authentication is a challenge-response authentication type. NTLM is a bit different and does not obey the standard rules of HTTP connection management. The authentication is a three-step handshake per TCP connection, not per request, as outlined below.

1a. Client sends unauthenticated request to the proxy / server.

1b. Proxy / server responds with "Authentication required" of type NTLM.

2a. The client responds with a request for NTLM negotiation

2b. The server responds with a NTLM challenge

3a. The client responds with a NTLM response

3b. If successful, the connection is authenticated for this request and onwards. No further authentication exchanges take place on THIS TCP connection. 

Adding ntlm_auth and passwd file in Squid.conf

Squid Configuration:

To turn on NTLM authentication, edit some directives in squid.conf.

auth_param ntlm program /usr/local/squid/libexec/ntlm_auth (domainname)/(pdc name)
auth_param ntlm children 5
auth_param ntlm max_challenge_reuses 0
auth_param ntlm max_challenge_lifetime 2 minutes

This tells Squid where to find the authenticator. Next we have to create an ACL.

Acl configuration for ntlm_auth :

acl auth_users proxy_auth REQUIRED
http_access allow auth_users
http_access deny all

Filtering

Filtering a website
 
Websites can be filtered with ACLs (Access Control Lists). Here is an example of denying a group of IP addresses access to a specific domain.

acl block_ips src <ipaddr1-ipaddr2>
acl block_domain dstdomain <domainname>

http_access deny block_ips block_domain
http_access allow all
  
 
Denying a user from accessing a particular site
 
Denying a user from accessing a particular site can also be done with ACLs, using the 'dstdomain' acl type.

For example:

acl sites dstdomain .gap.com .realplayer.com .yahoo.com

http_access deny sites 

 
Filter a particular port
 
Filtering a particular port can be done with an ACL as follows:

acl block_port port 3456
http_access deny block_port
http_access allow all

 
Denying or allowing users 
 
Denying access to websites during particular hours can be done as follows.

To restrict a client source IP from accessing a particular domain during 9am-5pm on Monday:

acl names src <ipaddr>
acl site dstdomain <domainname>
acl acltime time M 9:00-17:00

http_access deny names site acltime
http_access allow all
  
 
What can't Squid filter?
 
Squid cannot filter viruses, nor can it filter web pages based on their content.
 
Filtering a Particular MAC address

To use ARP (MAC) access controls, you first need to compile in the optional code. Do this with the --enable-arp-acl configure option.

Example:

acl M1 arp 01:02:03:04:05:06
acl M2 arp 11:12:13:14:15:16
http_access allow M1
http_access allow M2
http_access deny all
Performance
 
Monitoring Squid Performance
 
Squid performance is monitored by using cache manager and SNMP.
Cache Manager:
This provides access to certain information needed by the cache administrator. A companion program, cachemgr.cgi, can be used to make this information available via a Web browser. Cache manager requests to Squid are made with a special URL of the form 
 
        cache_object://hostname/operation
 
The cache manager provides essentially ``read-only'' access to information. It does not provide a method for configuring Squid while it is running. 
 
SNMP: 
 
SNMP could be used for monitoring squid runtime information like CPU usage, Memory usage, Cache Hit, Miss etc. The Multi Router Traffic Grapher (MRTG) is a tool to monitor squid information which generates a real-time status (graphical representation), in dynamic view by sampling data every five minutes.
 
Improving Squid Performance
 
Squid performance can be improved by gathering performance data for your particular environment and tuning the hardware and kernel parameters for peak performance.
 
Does the cache directory filesystem impact performance?
 
The cache_dir directive uses the ufs storage scheme by default. When it is changed to
 
cache_dir aufs 
 
the aufs storage scheme improves Squid's disk I/O response time by using a number of threads for disk I/O operations. The aufs code requires a pthreads library, the standard threads interface defined by POSIX. To use aufs, squid must be compiled with the --enable-storeio option.
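
For example, the cache_dir line shown earlier can be switched to aufs as below; this is a sketch and assumes squid was built with an --enable-storeio list that includes aufs:

cache_dir aufs /usr/local/squid/cache 100 16 256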
 
Note:
 
If disk caching is not used, it should be disabled by setting the cache_dir type to null, e.g. 'cache_dir null /tmp'.
This eliminates the memory used by squid for the cache metadata index.
 
 Log files
 
Log files produced by squid
 
The log files produced by squid are
 
squid.out, cache.log, useragent.log, store.log, hierarchy.log, access.log.
 
Monitoring User Access
 
The access information gets stored in the access.log file.
 
Rotating Log
Larger log files can be handled by rotating them. This can be done with the following command:

squid -k rotate

To specify the number of logfile rotations to make when you run 'squid -k rotate', configure the logfile_rotate directive in squid.conf.

Rotation can be scheduled with a cron entry that rotates the logs at midnight:

0 0 * * * /usr/local/squid/bin/squid -k rotate

 
Can squid support logs of size greater than 2 GB?
 
Squid by default does not support log files larger than 2 GB. To make squid support files larger than 2 GB, compile it with the --with-large-files option.
 
Disabling Squid Log Files
 
Log files can be disabled as follows:
To disable access.log
        cache_access_log none
To disable store.log
        cache_store_log none
To disable cache.log
        cache_log /dev/null 
Tools
 
Cache Manager (cachemgr.cgi)
  
The cache manager (cachemgr.cgi) is a CGI utility for displaying statistics about the squid process as it runs. The cache manager is a convenient way to manage the cache and view statistics without logging into the server. 


Tools For Configuring Squid
 
There are many tools available to configure squid, such as Webmin and others.

You can get these tools from

http://www.squid-cache.org/related-software.html
 
  
Log Analysers

Calamaris 

Calamaris is a commonly used tool to analyze Squid's access.log. It supports many features, such as generating status reports of incoming UDP and TCP requests, both in total and on a per-host basis; reports about requested second-level and top-level domains; and reports about requested content types, file extensions and protocols. It generates ASCII or HTML reports. For a full list of features, please visit the Calamaris home page. 
  
Weblog 
  
  WebLog is a group of Python modules containing several class definitions that are useful for parsing and manipulating common Web and Web proxy logfile formats. 
  
The Webalizer 
  
The Webalizer is a fast, free web server log file analysis program. It is written in C to be extremely fast and highly portable. The results are presented in both columnar and graphical format. Yearly, monthly, daily and hourly usage statistics are presented, along with the ability to display usage by site, URL, referrer, user agent, search string, entry/exit page, username and country. Processed data may also be exported into most database and spreadsheet programs that support tab-delimited data formats. In addition, wu-ftpd xferlog formatted logs and squid proxy logs are supported. 
  
SARG 

Sarg is a Squid Analysis Report Generator that allows you to view "where" your users are going on the Internet. Sarg generates reports in HTML, with many fields, such as users, IP addresses, bytes, sites and times. 

  
Tools to generate user web access report
Webmin is a web-based tool for generating web access reports. Using any browser that supports tables and forms (and Java for the File Manager module), you can set up user accounts, Apache, DNS, file sharing and so on. 
  
Webmin consists of a simple web server and a number of CGI programs which directly update system files like /etc/inetd.conf and /etc/passwd. The web server and all CGI programs are written in Perl version 5 and use no non-standard Perl modules.
 

Miscellaneous


Controlling Uploads
  
Uploads can be controlled by using an acl of type req_header, for example:
  
acl upload_control req_header Content-Length [1-9][0-9][0-9][0-9][0-9]{3,}
http_access deny upload_control
http_access allow all

Controlling Downloads
  
Downloads can be controlled using the following directive:

reply_body_max_size     bytes allow|deny acl
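
For example, the following sketch limits reply bodies to roughly 10 MB for all clients; the size and the use of the 'all' acl are illustrative, and the exact argument form varies between Squid versions:

reply_body_max_size 10485760 allow all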