There are various advanced ways to configure LinkController. Most of them are not needed for simple checking of a small collection of web pages. For larger sites and special situations, however, they can make life much easier.
3.1 Advanced Infostructure Configuration
3.2 Authorisation Configuration
With more advanced configuration it is possible to skip over certain resources during link extraction and to ignore some of the links. You may want to skip this section initially and come back to it only when you find that links or pages are being checked that you would rather avoid.
For this section, we assume that you already know how to write basic Perl code. If not, please read through the Perl manual pages `perl', `perlsyn' and `perldata'. Alternatively, you may find that the examples given below are sufficient to get you started.
In order to get extract-links to extract links using an
advanced infostructure, you must list it with the advanced keyword in the
`infostrucs' file.  Advanced infostructures not listed there will be ignored,
but won't cause any harm.
Advanced configuration is done in the `.link-controller.pl'
configuration file by adding definitions to the %::infostrucs
hash.  These look like the following:
| $::infostrucs{'http://www.mypages.org/'} = {
   mode => "directory",
   file_base => "/home/myself/www",
   prune_re => "^(/home/myself/www/statistics)" #ignore referrals
              . "|(cgi-bin)", #do CGIs separately
   exclude_re => '\.secret$', #secrets shouldn't get into link database
};
$::infostrucs{'http://www.mypages.org/cgi-bin/'} = {
   mode => "www",
   exclude_re => "query", #query space is infinite!!
};
 | 
There are a number of keywords that can be used.
N.B. The exclude and include regular expressions can be used together. For a resource to be accepted, the include regular expression must match and the exclude regular expression must not match. In other words, excludes override includes.
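The rule above can be sketched in a few lines of Perl. This is only an illustration of the logic, not LinkController's own code: the function name `passes_filters' is hypothetical, and treating a missing include pattern as "include everything" is an assumption made for the example.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of LinkController): decide whether a name
# passes an include/exclude pair of regular expressions.
sub passes_filters {
    my ($name, $include_re, $exclude_re) = @_;
    # Assumption: an undefined include pattern means "include everything".
    return 0 if defined $include_re && $name !~ /$include_re/;
    # The exclude pattern wins even when the include pattern matched.
    return 0 if defined $exclude_re && $name =~ /$exclude_re/;
    return 1;
}

for my $file ('index.html', 'notes.secret', 'plan.html.secret', 'logo.png') {
    printf "%-18s %s\n", $file,
        passes_filters($file, '\.html', '\.secret$') ? 'kept' : 'skipped';
}
```

Here only `index.html' is kept: `plan.html.secret' matches the include pattern but is still rejected because the exclude pattern also matches.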
In order for the infostructure to be used by extract-links, an
entry must still be made in the `infostrucs' file.  For this, use the
advanced keyword.  The second argument is a URL used to look up
the definition in the %::infostrucs hash.
| advanced http://www.mypages.org/
advanced http://www.mypages.org/cgi-bin/ | 
The URL used here must match exactly the one used in the hash. It is important to note that `directory' and `www' definitions in the `infostrucs' file will override any advanced configuration given.
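Because the lookup is an exact string match, a missing trailing slash is enough for a definition not to be found. The following sketch shows one way to check for that. It is not part of LinkController: the function `unmatched_advanced' is hypothetical, and the line layout (keyword, whitespace, URL) is assumed from the example above.

```perl
use strict;
use warnings;

# Example definitions, as they might appear in .link-controller.pl.
our %infostrucs = (
    'http://www.mypages.org/'         => { mode => 'directory' },
    'http://www.mypages.org/cgi-bin/' => { mode => 'www' },
);

# Lines as they might appear in the infostrucs file
# (assumed layout: keyword, whitespace, URL).
my @lines = (
    'advanced http://www.mypages.org/',
    'advanced http://www.mypages.org/cgi-bin',    # trailing slash missing
);

# Hypothetical check: return the URLs of "advanced" lines that have no
# exactly matching key in the definitions hash.
sub unmatched_advanced {
    my ($defs, @config_lines) = @_;
    my @missing;
    for my $line (@config_lines) {
        my ($keyword, $url) = split ' ', $line;
        next unless defined $url && $keyword eq 'advanced';
        push @missing, $url unless exists $defs->{$url};
    }
    return @missing;
}

print "no definition for: $_\n" for unmatched_advanced(\%infostrucs, @lines);
```

The second line is reported because `http://www.mypages.org/cgi-bin' and `http://www.mypages.org/cgi-bin/' are different hash keys.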
One problem when checking links, especially within an intranet, is that some pages can be protected with basic authentication. In order to extract links from those pages, or simply to know that they are there, we have to get through that authentication. By using the advanced Authorisation Configuration we can give LinkController authority to access these pages and allow link checking to work as normal.
| Using this method to allow LinkController to work in an environment with authentication is inherently a security issue, since authentication tokens must be stored, effectively in plaintext, in files. This risk may, however, not be much higher than the one that you currently accept, so the mechanism can still be useful. | 
We can store the authentication tokens in the %::credentials hash, which we can create in the `.link-controller.pl' configuration file. The keys of the hash are the exact realm strings which will be sent by the web server. Each value of this hash is itself a hash with a pair of keys. The `credential' key should be associated with the authentication token. The `uri_re' key should be a regular expression which matches the web pages you want to visit. For security reasons it shouldn't match any others.
| %::credentials = (
  my_realm => { uri_re => "https://myhost.example.com",
                credential => "my_secret" }
);
 | 
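To make the roles of the realm key and `uri_re' concrete, here is a minimal sketch of how such a table could be consulted. It is an illustration only, not LinkController's implementation: the function `credential_for' is hypothetical, the `credential' key name follows the example above, and the anchored, escaped pattern is a stricter variant chosen for this sketch.

```perl
use strict;
use warnings;

# A credentials table in the shape described above.  The pattern is
# anchored and escaped for this sketch so that it matches one host only.
our %credentials = (
    my_realm => { uri_re     => '^https://myhost\.example\.com/',
                  credential => 'my_secret' },
);

# Hypothetical lookup (not LinkController's own code): hand out a token
# only when the realm is known AND the URL matches that realm's uri_re.
sub credential_for {
    my ($realm, $url) = @_;
    my $entry = $credentials{$realm} or return;
    return unless $url =~ /$entry->{uri_re}/;
    return $entry->{credential};
}

for my $url ('https://myhost.example.com/private/',
             'https://other.example.com/private/') {
    my $token = credential_for('my_realm', $url);
    printf "%-40s %s\n", $url, defined $token ? 'token sent' : 'no token';
}
```

The point of the second test is that a server on another host cannot obtain the token merely by announcing the same realm string.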
As a minor sanity check, every `uri_re' will be tested against the strings `http://3133t3hax0rs.rhere.com' and `http://3133t3hax0rs.rhere.com/secretstuff/www.goodplace.com/'. If it matches either of them, those credentials will be disallowed. The owners of `3133t3hax0rs.rhere.com' will just have to hack the code.
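The effect of this check can be tried out on your own patterns before putting them in the configuration file. The sketch below re-creates the test as described in the text; the function name `uri_re_is_too_loose' is hypothetical and the code is not taken from LinkController.

```perl
use strict;
use warnings;

# The two test strings named in the text above.
my @hostile = (
    'http://3133t3hax0rs.rhere.com',
    'http://3133t3hax0rs.rhere.com/secretstuff/www.goodplace.com/',
);

# Hypothetical re-creation of the check: true if the pattern matches
# either of the hostile test strings.
sub uri_re_is_too_loose {
    my ($uri_re) = @_;
    return scalar grep { $_ =~ /$uri_re/ } @hostile;
}

for my $re ('www\.goodplace\.com', '^http://www\.goodplace\.com/', '.') {
    printf "%-32s %s\n", $re,
        uri_re_is_too_loose($re) ? 'rejected' : 'accepted';
}
```

The unanchored pattern `www\.goodplace\.com' is rejected because it matches the host name when it appears in the path of a hostile URL, whereas the version anchored with `^' is accepted.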
For more discussion of the security risks and how to mitigate them, see the file `authorisation.pod' included with the LinkController distribution. If you didn't understand the security risk from the above description, you should probably avoid using this mechanism.