Fravia's comment

I will just publish, from now on, the essays I like exactly as I get them.
Authors are invited to send me corrections, and I'll update the essay.
Note that if the essay should infringe on anyone's copyrights, and if I receive a legitimate request to nuke it, it will immediately disappear from my site, so you had better always write software reversing essays that are not "specific" target related... so, pointing out deficiencies is OK, individuating "software black holes" is a must, but explaining to lamers how to register (or, even more silly, how to make a coward keygen for the idiots) is definitely NOT "Fraviatiquette".
Indeed from now on I want to HELP, not to damage programmers.
This said, I publish this perfectly formatted and extremely useful essay by Kuririn with pleasure. Archie searching is becoming something of a "lost" art, and I'm glad that Kuririn has agreed to help all Internet newbies understand it... hope more will follow!


Archie



The File Transfer Protocol (FTP)

FTP is a way of moving computer files from one site to another; anonymous FTP is our concern here, since it is open to anyone. The administrators of computer sites around the world have made directories full of information available to anyone who logs in as 'anonymous' (by convention, using your email address as the password). Since no special permission is required to get into these sites, they constitute a kind of "swap box" where everything is freely available. The range of programs available at these sites is already enormous, and it grows by the day.
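The anonymous login convention described above can be sketched with Python's standard ftplib module. This is a minimal illustration, not part of the original essay: the host name, directory and email address are placeholders, and many of the old anonymous FTP archives no longer exist. The FTP class is passed in as a parameter only so the function can be tried without a network connection.

```python
from ftplib import FTP


def anonymous_listing(host, directory="/", email="you@example.org", ftp_class=FTP):
    """Log in anonymously and return the names in one directory.

    By convention the user name is 'anonymous' and the password is your
    email address. `ftp_class` defaults to ftplib.FTP; it is a parameter
    only so the function can be exercised without a live server.
    """
    ftp = ftp_class(host)  # ftplib.FTP(host) opens the control connection
    try:
        ftp.login(user="anonymous", passwd=email)
        ftp.cwd(directory)
        return ftp.nlst()  # bare file and directory names
    finally:
        ftp.quit()  # send QUIT and close the connection
```

Against a real server you would call something like anonymous_listing("ftp.example.org", "/pub"), where the host is a placeholder for a site that still offers anonymous FTP.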

A Short History

Archie was developed at McGill University's School of Computer Science by Alan Emtage, Bill Heelan, and Peter Deutsch. Apparently this computer tool, like many others, was devised out of the university's need to save money. Looking for public domain software, the McGill group began searching anonymous FTP archive sites, and eventually began to automate the process of scanning their findings. From this evolved an information tool that is among the most widely used on the Internet. Deutsch and Emtage went on to found Bunyip Information Systems of Montreal, which licenses the archie server system and provides upgrades and support (the newest version to date is 3.5).

A Short Definition

Think of archie as a set of related functions. The software maintains a list of Internet FTP sites and their contents, known as the Internet Archives database (the name archie is a play on the word "archive"). The database can be searched through a variety of servers across the Internet. Note well that archie does not make data sweeps across the entire Internet; rather, it targets specific sites, with the permission of their administrators, and searches them. Going through those directories by hand would take constant vigilance, since new files and directories keep being added to one site or another. By automating the process, archie makes its own sweep of the FTP directories, compiles them and stores the results in its database. Each site's holdings therefore stay almost up to date at any given moment, with a slight lag: an archie search could point to a file that was removed shortly after the last database update, and, vice versa, a file added to a site after that update will not yet be known to archie. If you want to see the listings that archie creates for its database, they are available at the individual server sites.
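The sweep-and-snapshot behaviour described above, including the lag between a site changing and archie noticing, can be modelled in a few lines of Python. This is a toy sketch under stated assumptions: the class and method names are hypothetical, sweeps are triggered by hand with a plain integer timestamp, and real archie servers recorded much more (sizes, dates, permissions).

```python
from dataclasses import dataclass, field


@dataclass
class ArchiveDatabase:
    """Hypothetical toy model of archie's Internet Archives database."""

    # host -> (time of last sweep, list of full paths seen on that sweep)
    snapshots: dict[str, tuple[int, list[str]]] = field(default_factory=dict)

    def sweep(self, host: str, listing: list[str], now: int) -> None:
        """Replace the stored listing for one site with a fresh one."""
        self.snapshots[host] = (now, list(listing))

    def search(self, name: str) -> list[tuple[str, str, int]]:
        """Exact file-name match against the stored snapshots only."""
        hits = []
        for host, (updated, paths) in sorted(self.snapshots.items()):
            for path in paths:
                if path.rsplit("/", 1)[-1] == name:
                    hits.append((host, path, updated))
        return hits
```

Because search only ever looks at the last sweep, a file deleted from the site keeps showing up until the next sweep replaces the snapshot, which is exactly the stale-hit case mentioned above.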

Using Archie (via telnet) kinda remedial

The first thing, of course, is to use a telnet client (for the sake of simplicity, let's say we're using the plain ol' UNIX telnet client). Then you need a server to telnet into; let's use archie.th-darmstadt.de. Issue the command telnet archie.th-darmstadt.de, log in as archie, and you see the following:


Welcome to Archie!

Vers 3.3


Currently the help system provides support for the following languages:

deutsch english francais

Use 'set language' to change the language

# Bunyip Information Systems, Inc., 1993, 1994, 1995

# Terminal type set to `vt100 24 80'.
# `erase' character is `^?'.
# `search' (type string) has the value `exact'.
archie>
Aside from identifying the version of this archie server, it is important to note the default search type, which is `exact' here: anything you type in will be looked for as an exact match. So this is what +orc meant by nomen omen, eh? If you type in pumpkin you will not get hits such as pumpkin.txt or pumpkin.tar.Z (simple enough). To change this parameter you use the set command, where search is the variable and a value such as exact or sub tells archie how the search is to be applied. In other words, the 'search' variable determines the kind of search performed on the database by the 'prog' command (and by 'find', which the examples here use), giving you flexibility over search times and ranges. set search exact is what you have now; set search sub is the alternative. All of this happens on the archie> command line before you conduct your search: set the variables first, search later.

The command set search sub, which is the default on some servers, retrieves any file or directory name that contains your search term anywhere within it, ignoring case. So if you search for tin you will get hits ranging from bulletin to tinman. For example, to search for orc.htm you first type set search sub and then find orc.htm. Here is this simple process (truncated):


archie> set search sub
archie> find orc.htm
# Search type: sub.
working...

Host ftp.imag.fr (129.88.30.10)
Last updated 00:30 21 Jan 1999

Location: /pub1/mail/mh/book/mh
FILE -rw-rw-r-- 14997 23:00 27 Sep 1997 nmenorc.htm

Host ftp.sics.se (193.10.66.43)
Last updated 15:54 23 Feb 1998

Location: /www/people/orc
FILE -rw-r----- 17024 00:00 26 Sep 1994 orc.html
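The difference between the exact and sub search types can be pinned down with two tiny Python predicates. This sketch models only the matching rule described above; the sample names are illustrative, and treating exact as case-sensitive is an assumption, since the session does not show it.

```python
# Sketch of the `exact` and `sub` search types as the text describes them.
# Only the matching rule is modelled; the sample names are illustrative.


def matches_exact(term: str, name: str) -> bool:
    """set search exact: the whole name must equal the term (assumed case-sensitive)."""
    return name == term


def matches_sub(term: str, name: str) -> bool:
    """set search sub: the term may appear anywhere in the name, ignoring case."""
    return term.lower() in name.lower()


names = ["pumpkin", "pumpkin.txt", "pumpkin.tar.Z", "bulletin", "Tinman",
         "orc.html", "nmenorc.htm"]

print([n for n in names if matches_exact("pumpkin", n)])  # ['pumpkin']
print([n for n in names if matches_sub("tin", n)])        # ['bulletin', 'Tinman']
print([n for n in names if matches_sub("orc.htm", n)])    # ['orc.html', 'nmenorc.htm']
```

The last line mirrors the session above: a sub search for orc.htm finds both orc.html and nmenorc.htm.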

Other types of search include set search subcase, which works like a regular substring search except that it differentiates between upper- and lowercase letters, and set search regex, which uses UNIX regular expressions to conduct the search. Used without further specification, a regex search behaves like a substring search, because an unanchored regular expression can match anywhere in the name, as if a wild card stood at its beginning and end. Using the caret (^) and the dollar sign ($) you can specify that the search term must appear at the beginning or at the end of the retrieved file or directory name: ^eros returns names that begin with eros, and eros$ returns names that end with eros (more on this later).
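The subcase and regex search types, and the effect of the ^ and $ anchors, can be tried out with Python's re module. This is a sketch, not archie's own matcher: Python's regular-expression dialect is not identical to the UNIX one archie used, although the basic operators shown here (^, $) behave the same way. The file names are made up.

```python
import re


def matches_subcase(term: str, name: str) -> bool:
    """set search subcase: substring match that respects upper/lowercase."""
    return term in name


def matches_regex(pattern: str, name: str) -> bool:
    """set search regex: re.search matches anywhere in the name,
    unless ^ or $ pin the pattern to the start or end."""
    return re.search(pattern, name) is not None


print(matches_subcase("Tin", "Tinman"), matches_subcase("Tin", "bulletin"))  # True False

names = ["eros.txt", "heros", "generoso"]
print([n for n in names if matches_regex("eros", n)])   # ['eros.txt', 'heros', 'generoso']
print([n for n in names if matches_regex("^eros", n)])  # ['eros.txt']
print([n for n in names if matches_regex("eros$", n)])  # ['heros']
```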

Some other search parameters explained



If you want to examine your results (since they can sometimes scroll by too rapidly to read), you can type set mailto your@email.address at the prompt. Now you can use the mail command to send yourself and others the results of your query. Alternatively, you can use the set pager command, which displays material one page at a time (using a pager such as less under UNIX); you advance through the pages by pressing the spacebar, and when you are finished you enter q followed by RETURN.

The list command shows all the sites in the database at the server site, or, combined with a UNIX regular expression, only the sites in particular domains. For example, you can use list to find all sites in Switzerland with a regex term: list .*ch$ This returns any site whose name ends in ch (the Swiss domain) and excludes the others. The $ sign (again) specifies that no text may follow the search term; the .* allows any text to come before it. A full example:

archie> list .*ch$
# Your queue position: 1
# Estimated time for completion: 5 seconds.
working... O

aragorn.unibe.ch 130.92.9.51 14:30 22 Jan 1999
bandon.unisg.ch 130.82.101.96 00:37 29 Oct 1997
claude.ifi.unizh.ch 130.60.48.8 04:49 27 Jan 1999
ftp.inf.ethz.ch 129.132.167.2 12:08 22 Jan 1999
domreg.nic.ch 130.59.1.80 12:08 22 Jan 1999
iacrs1.unibe.ch 130.92.11.3 14:28 22 Jan 1999
liasun3.epfl.ch 128.178.155.12 05:13 29 Jan 1997
liaftp.epfl.ch 128.178.155.15 23:42 24 Feb 1998
lucy.ifi.unibas.ch 131.152.81.1 04:59 29 Jan 1997
ftp.switch.ch 130.59.10.32 22:20 26 Feb 1997
iamftp.unibe.ch 130.92.64.5 14:29 22 Jan 1999
ftp.cscs.ch 148.187.10.13 14:30 22 Jan 1999
ftp.unizh.ch 130.60.68.41 04:46 27 Jan 1999
rd24.cern.ch 137.138.61.190 23:42 24 Feb 1998
sunsite.cnlab-switch.ch 195.176.255.9 12:09 22 Jan 1999
ftp.unibe.ch 130.92.6.40 14:30 22 Jan 1999
ftp.unige.ch 129.194.17.1 12:07 22 Jan 1999
ftp.ethz.ch 129.132.1.45 23:40 24 Feb 1998
archie>
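The filtering that list .*ch$ performs can be reproduced with Python's re module, which also shows a subtlety: because . and .* match anything, the pattern accepts any host whose name merely ends in the letters ch. The first three hosts come from the list output above; ftp.example.de and ftp.example.tech are hypothetical. Escaping the dot (\.ch$) is a refinement in standard regular-expression syntax, not something from the original session, and whether a given archie server accepted it is not shown here.

```python
import re

# Three host names from the list output above, plus two hypothetical ones
# (ftp.example.de, ftp.example.tech) to show what the pattern lets through.
hosts = [
    "ftp.switch.ch",
    "ftp.unige.ch",
    "rd24.cern.ch",
    "ftp.example.de",
    "ftp.example.tech",
]

# list .*ch$ : any text, then "ch" at the very end of the name.
loose = [h for h in hosts if re.search(r".*ch$", h)]

# Escaping the dot pins the match to the ".ch" domain itself.
strict = [h for h in hosts if re.search(r"\.ch$", h)]

print(loose)   # ['ftp.switch.ch', 'ftp.unige.ch', 'rd24.cern.ch', 'ftp.example.tech']
print(strict)  # ['ftp.switch.ch', 'ftp.unige.ch', 'rd24.cern.ch']
```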

The last few commands I will cover are help (at the help prompt you can type ? to get a list of available subtopics), quit to exit (of course! ;-) and servers, which generates a list of the publicly available archie servers known to the site you are currently using. You can also type manpage to read the manual page for archie!

regex and whatis

The whatis command: archie maintains a second set of data called the Software Description Database, which holds the names of numerous files stored around the Internet together with short descriptions of them. As with archie's Internet Archives database, not all of these files are executable programs (some are documentation and other stored data). Whatever you are after, searching the Software Description Database with the whatis command can help: type whatis followed by the term you are looking for. For example, I might want to know which programs deal with the moon:

archie> whatis moon
astro Computes astronomical data about the sun, moon, and planets
jupmoons Jupiter's major moons simple plotter [in perl]
moon A phase-of-the-moon-program
moontool The moon on a Sun
phoon Phase of the moon, date routines
rise_set Sun and Moon rise/set program
xmoon Dynamically display astronomical data concerning the moon and the sun
xphoon Draw the current phase of the moon on the root window (under X11)

Next, you search for it!
archie> find jupmoons
# Search type: exact.
working...

Host ftp.uni-koeln.de (134.95.100.202)
Last updated 00:32 23 Jan 1999

Location: /usenet/comp.sources.misc/volume13
DIRECTORY drwxrwxr-x 2048 23:00 22 Apr 1993 jupmoons

Host scitsc.wlv.ac.uk (134.220.4.1)
Last updated 09:07 13 Feb 1997

Location: /pub/infomagic/usenet/misc/volume13
FILE -rw-r--r-- 8934 01:00 25 Aug 1991 jupmoons
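The two-step workflow above (whatis to learn a program's name, then find to locate it) can be modelled as two lookups over small dictionaries. The entries are copied from the sessions above, but the data structures are hypothetical, and treating whatis as a case-insensitive substring match over both names and descriptions is an assumption; the session does not show archie's exact rule.

```python
# Toy two-step lookup: whatis over a Software Description Database, then an
# exact find over an archive index. Entries come from the sessions above;
# the data structures and matching rules are assumptions for illustration.

DESCRIPTIONS = {
    "astro": "Computes astronomical data about the sun, moon, and planets",
    "jupmoons": "Jupiter's major moons simple plotter [in perl]",
    "phoon": "Phase of the moon, date routines",
    "rise_set": "Sun and Moon rise/set program",
}

# host -> paths, as an archie sweep might have recorded them
ARCHIVE = {
    "ftp.uni-koeln.de": ["/usenet/comp.sources.misc/volume13/jupmoons"],
    "scitsc.wlv.ac.uk": ["/pub/infomagic/usenet/misc/volume13/jupmoons"],
}


def whatis(term):
    """Case-insensitive keyword match over names and descriptions (assumed)."""
    term = term.lower()
    return sorted(name for name, desc in DESCRIPTIONS.items()
                  if term in name.lower() or term in desc.lower())


def find_exact(name):
    """Exact match on the last path component, like `set search exact`."""
    return [(host, path) for host, paths in sorted(ARCHIVE.items())
            for path in paths if path.rsplit("/", 1)[-1] == name]


print(whatis("jupiter"))  # ['jupmoons']
for host, path in find_exact(whatis("jupiter")[0]):
    print(host, path)
```

Feeding the name returned by whatis straight into an exact find is the same move as the find jupmoons session above.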

Regex expressions, or UNIX regular expressions (basic stuff). As I said before, a search term used without any further regex operators is treated as a substring search; the effect is much like set search sub. So if you do a find orc (you will probably get "hey fella, there's no FTPs on the moon" ;-) but, seriously, it is the same as typing find .*orc.*, the kind of pattern covered under the list command above.

Aside from the signs $, ^, . and *, let me say one thing about the asterisk and the period. The period matches any single character, and the asterisk stands for zero or more occurrences of the preceding regular expression. In the example .*orc.*, the period lets the search term be preceded by any one character, and the asterisk after it means that any number of such characters can occur before the orc string. So the asterisk looks to the preceding expression, which is a period, and lets it occur any number of times (dig it?). The same goes for the end of the term, so any number of characters can follow orc as well.

Use [brackets] to give a set of characters you want to match. For example, [smt]end will match send, mend and tend, matching any one of the three bracketed characters followed by end. It will actually find much more than that, because a regular expression behaves as if it had .* at the beginning and at the end unless a caret (^) or dollar sign ($) appears. There are many more expressions like these.

I end now! Yet the end is good, no? Heh, I steal Nexor's form. Bye, friends, hope this helps (I know it's all out there, yet I also know it's good to have a quick printed reference ;)
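Two claims from this section can be checked with Python's re module, whose basic operators (., *, [], ^, $) behave like the UNIX ones described here: an unanchored term selects the same names as .*term.*, and a bracket set also matches inside longer words unless the pattern is anchored. The word lists are illustrative, not archie output.

```python
import re

names = ["orc.html", "nmenorc.htm", "forces", "fravia.txt"]

# An unanchored pattern already matches anywhere, so "orc" and ".*orc.*"
# select the same names.
plain = [n for n in names if re.search(r"orc", n)]
wild = [n for n in names if re.search(r".*orc.*", n)]
print(plain == wild, plain)  # True ['orc.html', 'nmenorc.htm', 'forces']

words = ["send", "mend", "tend", "bend", "attendance", "tender"]

# [smt] matches exactly one character: s, m or t.
print([w for w in words if re.search(r"[smt]end", w)])
# ['send', 'mend', 'tend', 'attendance', 'tender']

# Anchoring both ends keeps only the three words themselves.
print([w for w in words if re.search(r"^[smt]end$", w)])
# ['send', 'mend', 'tend']
```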

