Connected: An Internet Encyclopedia
Frequently Asked Questions


Q: How did the Internet get started?

A: A 1969 U.S. Department of Defense study led to the deployment of an experimental packet-switched network (the ARPANET) that eventually evolved into the Internet.

Q: Why was it created?

A: The military theorized that a distributed data network would be more fault-tolerant than a telephone network, which could be disabled simply by attacking its central office. The ARPANET was created to test this theory.

Q: Who was involved?

A: The initial contractor to construct the ARPANET hardware was BBN. Important UNIX software development was done by the University of California, Berkeley.

Q: I want to know how e-mail messages reach their destination. Does a mail server establish a direct connection to the destination address and transfer the data? If a direct connection is not made, how is a message routed, and who determines the next domain it has to hop to?

A: Current Internet e-mail technology supports a variety of delivery mechanisms to reach user@host.domain:

First, a DNS lookup is done on host.domain. This is not an ordinary A-record lookup but an MX-record lookup, and the listed host is called the mail exchanger.

Next, the SMTP protocol is used to transfer the message to the mail exchanger. If the mail exchanger is unreachable, the sender can either try an alternate mail exchanger (if one is listed), or queue the message and try again later.

The mail exchanger is not usually the user's computer. Rather, it is a domain-wide server, typically a UNIX host running the sendmail program, that is programmed with delivery rules for local e-mail addresses. In a LAN environment, the mail exchanger might look up user in a table, then use another SMTP transfer to complete delivery to the desktop.

In a dial-up environment, SMTP can't be used for the final transfer, since the destination machine is typically off-line, and only connected for short periods. In this case, SMTP is still used to transfer the e-mail to a POP or IMAP server, where it is queued. The user dials in with an Internet e-mail client, which uses either the POP or IMAP protocol to retrieve the e-mail.
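The try-an-alternate-or-queue logic described above can be sketched in a few lines of Python. This is only an illustration, not how sendmail does it: it assumes the MX lookup has already returned a list of (preference, host) pairs, with the lowest preference number tried first, and try_smtp is a hypothetical stand-in for a real SMTP transfer. The host names are made up.

```python
from typing import Callable, Iterable, List, Tuple

MXRecord = Tuple[int, str]  # (preference, mail exchanger hostname)


def order_exchangers(records: Iterable[MXRecord]) -> List[str]:
    """Return exchanger hostnames, most preferred (lowest number) first."""
    return [host for _, host in sorted(records)]


def deliver(records: Iterable[MXRecord],
            try_smtp: Callable[[str], bool]) -> str:
    """Try each exchanger in turn and report where the message ended up.

    try_smtp is a hypothetical callable standing in for a real SMTP
    transfer. It returns True if the exchanger accepted the message.
    """
    for host in order_exchangers(records):
        if try_smtp(host):
            return "delivered via " + host
    # No exchanger was reachable: keep the message and try again later.
    return "queued for retry"


# Example with made-up hosts: the primary is down, the backup accepts.
records = [(20, "backup.example.com"), (10, "mail.example.com")]
result = deliver(records, lambda host: host == "backup.example.com")
```

A real mailer would also put a time limit on the retry queue and bounce the message once it expires, which this sketch leaves out.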

Q: How can I start an Internet Service Provider (ISP)?

A: Starting an Internet service provider - that can be a daunting undertaking.

My personal advice is - don't do it. I say this simply because I think the market is flooded right now. Internet service has become a commodity item. It is very difficult to distinguish yourself from the competition on anything but price. To compete at the $20/mo level now prevailing in the consumer industry, you'll need a user base of several hundred to run in the black, and who knows how long before you can pay off the initial investment in hardware, which could easily run over $10,000 - not to mention payroll, rent, headaches, etc, etc.

All that being said, if you really want to do it, well, hell - Do It! Here's a sketch of a minimal ISP configuration:


   MODEM  ...  MODEM                        Fractional /
     \           /                             T-1    /
      \         /                                    /
       +--------+                                   /
       |Terminal|                            +------+
       | Server |                            |Router|
       +--------+                            +------+
           |                                    |
  [======================================================]
                 |
              +-----+
              |Linux|
              +-----+
           Min 3G Hard Disk
             Tape Backup

Starting at the upper right, you've got a fractional T-1 coming into a router. This is your connection to the global Internet. Where do you get this? Well, from a larger Internet provider, of course. Like I said - Internet service is a commodity market. You buy a big chunk of Internet access, resell a whole bunch of small chunks, and try to make a profit by being a middleman. The T-1 line itself is a dedicated line leased from the phone company that gives you a guaranteed, 24-hour-a-day connection to a single other location. There are other options for this, such as frame relay, or metropolitan-area fiber networks, and you may be able to get a good deal on one of these alternative technologies.

The fractional T-1 goes through a device called a CSU/DSU, which is basically a fast modem, and is then connected to a router. This doesn't have to be a fancy router, but it's nice to have something with extra slots in it, so that if you want to upgrade later, you can. On the other hand, you could just get a low-end model with one high-speed serial connection (for the CSU/DSU), and one Ethernet connection.

The router is plugged into a LAN, usually an Ethernet. I recommend 10-Base-T for new installations, simply because it is easier to diagnose, but 10-Base-2 coaxial "thinnet" is cheaper. This LAN is your backbone. Keep it small, because if it goes down, you're off the air.

The next thing you need is a terminal server, with a bunch of modems plugged into it. Again, there is flexibility here. I think highly of ISDN, and wish every Internet installation was done solely with ISDN, but the current pricing arrangements for ISDN make that impractical. More likely, you'll be stuck with a couple dozen flaky modems, so make life easier for yourself - get rack-mounted, SNMP-manageable modems if at all possible. I say this simply because I detest modems. They are the most unreliable devices in common use on the net today.

Finally, you need at least one host. It's nice to have several, just for redundancy in case something breaks, as well as security considerations, but you can get away with just one if you set it up right and it's fast enough. You'll need a lot of disk space, because this machine will be a news server. Yes, these disks will store alt.sex.pictures.erotica for all your high brow techie clients to download and gawk at. As far as operating systems go, some people now think you can build an ISP using only Windows NT, but I'm a free software aficionado, and just don't understand why I should pay Bill Gates another billion dollars when Linux works just fine.

Well, that's my two cents. Oh yeah, there's one thing I forgot - you need several different skill sets, including an Internet engineer that really knows his stuff. I suggest you get someone who:

  1. Has run an ISP before;
  2. Has done real programming before; and
  3. Doesn't mind being leashed to a beeper for a good solid year.

Starting an Internet provider is a lot like starting any other business. You're not going to make money right away; in fact, you're going to need some money to burn before you start turning a profit. You'll need sales and marketing and all that jazz, too. And you'll need to be ruthless at times. You'll have to be able to say, "I'm sorry sir, but your credit card charge was declined, your account has been deactivated, and no, you can't read your email until I've got another 50 bucks in my hand". In short, it's something I tried for a while, discovered that I had absolutely no stomach for, decided to quit and do something I really wanted - writing the Internet Encyclopedia.

Q: If it's not worth starting an Internet provider, would it be more beneficial to activate a local BBS and offer email access? How would I go about doing that?

A: With a BBS all you need is a computer and some phone lines. You don't have "real" Internet access, but you can get e-mail and news. The "gotcha" is that mail gets queued until the BBS dials in to the Internet.

The simplest way to establish an e-mail-linked BBS is to treat it as an end node in the Internet, connected via a modem link.

The "traditional" way to do this used a protocol suite called UUCP (Unix-to-Unix CoPy), which has been ported to MS-DOS, incidentally. UUCP is designed for a network of store-and-forward systems, all interconnected via periodic (and hopefully local) phone calls. A decade ago, UUCP was the core protocol in the USENET, which supported e-mail and news. The USENET core nodes migrated to Internet technology, TCP/IP-based protocols were developed to transport both email and news, and USENET "just faded away". UUCP is still in use in lieu of POP/IMAP in some locales.

The contemporary approach to building an e-mail-linked BBS would be to register it in the DNS either as a domain or as a single host, and address all of its users as user@bbs.domain.com. In theory, POP/IMAP could be used to perform mail transfer between the BBS and Internet, but SMTP is typically used instead, because the SMTP software (Sendmail) is much older and more mature, though very difficult to configure.

An interesting and largely untried approach would be to offer BBS users Internet-style PPP access - but only to a small, BBS LAN! Users could then use standard tools like Eudora to access their email on the BBS. In addition, the BBS's dial-up link to the Internet could be brought up "on demand" to provide all connected users with Internet access! I would recommend an ISDN Internet connection for the BBS. In fact, this might be a good way for a dozen or so people to split the costs and still receive much of the benefit from an ISDN line. (maybe)

Q: I'm trying to program a PPP/IP/TCP/SNMP application, and finding the low down nitty gritty on this stuff is next to impossible.

A: Ah, network management (the SNMP component). You should have a lucrative career future. Take a good look at the CMU SNMP package, and its port as a Perl 5 module. Unless you're trying to build an embedded system up from scratch, most of your work might already be done for you.

Hopefully the encyclopedia can help answer your technical questions. Try these pages, and don't forget DHCP!

Q: I understand there is a proposal to upgrade the Internet's classic 4-byte IP addresses (such as 200.9.41.20) to a new 16-byte format. I believe it is called either "IPng" or "IPv6". What I need to know is the status of this proposal: will it be implemented anytime soon? Will it require a major engineering effort to deploy? (not unlike the "year 2000 crisis" that seems to threaten the computer world.)

A: Many major Internet vendors are working on IPv6 products, as are free software developers (Linux, for example). It's still not clear how well IPv6 will be accepted as a replacement for "classic" Internet, referred to as IPv4.

The new address format provides a mechanism to encode an IPv4 address as a larger IPv6 address. This means that the Internet core could migrate to IPv6, while still allowing IPv4 transport. It would even be possible to assign a host both IPv6 and IPv4 addresses, allowing multi-version interoperability.
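Here's a small Python sketch of the embedding idea, using the standard ipaddress module. It assumes the "IPv4-mapped" layout (::ffff:a.b.c.d), which is only one of the encodings that have been defined; the function names are mine, chosen for illustration.

```python
import ipaddress


def map_ipv4_to_ipv6(v4_text: str) -> ipaddress.IPv6Address:
    """Embed an IPv4 address in the low 32 bits of an IPv6 address.

    Uses the ::ffff:a.b.c.d "IPv4-mapped" layout. That choice is an
    assumption made for this sketch.
    """
    v4 = ipaddress.IPv4Address(v4_text)
    return ipaddress.IPv6Address((0xFFFF << 32) | int(v4))


def recover_ipv4(v6: ipaddress.IPv6Address):
    """Return the embedded IPv4 address, or None if there isn't one."""
    return v6.ipv4_mapped


mapped = map_ipv4_to_ipv6("200.9.41.20")   # the 4-byte address, now 16 bytes
original = recover_ipv4(mapped)
```

The 16-byte result carries the original 4 bytes unchanged in its tail, which is what lets a dual-version host recover the IPv4 address.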

The problem arises for devices that have only IPv6 addresses, with no corresponding IPv4 address. They can not communicate with IPv4 devices without the help of some kind of translational gateway.

At this time (early 1997), I envision IPv6 deployment going roughly like this. Within the next two years, most major Internet vendors (including Linux) will release interoperable IPv4/IPv6 implementations. Then there will be a big push to get all the routers, and then the servers, into multi-version interoperability. This should take about a decade. By roughly 2005 (give or take a few years), we'll be running out of IPv4 addresses "for real", and will start assigning IPv6-only addresses to clients, which will then only be able to access IPv6-speaking servers (hopefully most servers) via IPv6-speaking routers (hopefully all routers). IPv4 will continue to be real popular for servers who want to handle any client. Eventually, most clients will be IPv6, and then IPv4 will fade out onto the periphery, much as UUCP has done over the last decade.

Oh yes, there's one other possibility - IPv6 could be a total flop. Could this happen? Sure. People will be reluctant to migrate to pure IPv6 solutions, because this would cut them off from the IPv4 world. Also, the predictions of global IP address doom might be based on false assumptions. In my mind, the biggest of these potential false assumptions is the notion that we can't manage our address space any better than we are now. In a few more years, we might realize that 32 bits is more than we'll ever really need. Keep in mind also that the IPv6 header is more than twice the size of the IPv4 header, though header compression might compensate for this.

Q: How do I uncompress a .TGZ or tar file? What program do I require? Will Winzip do the job?

A: The encyclopedia is distributed in several formats (TGZ, TAR, ZIP), all of which are some type of archive file. The encyclopedia's .TGZ file is for use on a UNIX computer, where you should use GNU tar to unpack the archive. If you don't have GNU tar, a regular tar can be used on a .TAR file, which is simply an uncompressed version of a .TGZ file. Under MS-DOS, use either pkzip or Winzip to disassemble the ZIP file.

Q: I have downloaded your "Connected" kit (the master kit), but searches don't work for me.

A: The search engine is currently broken.

Q: Connected is DANGEROUS. Why: I have not checked all other scripts yet, but there is one script in which there is something like this: ...

A: The various scripts that run the encyclopedia haven't been checked for security, awaiting a major redesign that should address the security problems completely.

In the meantime, do what I do - run the web server software under a UNIX id that has basically no permissions to do anything at all.

Q: I would like to register the domain: xxx.yyy.zzz

A: First, make sure that the last component (zzz) in your domain name is either a two-letter ISO country code or a three-letter Internet top-level domain. For three-letter domains, contact http://rs.internic.net/. For two-letter country codes, contact your top-level administrator.

Q: What is a checksum, how does it differ from a CRC, and where can I get source code for CRC-16/CRC-32?

A: Checksums and Cyclic Redundancy Checks (CRCs) are methods used to verify the integrity of data by computing some extra bits and transmitting them along with the data. The receiver applies the same algorithm to regenerate the extra information and verify that it is correct. Note that this is not a cryptographic signature, and can not be used to verify the identity of the sender, since the algorithms used are well known, though much debated. The reliability of the check depends heavily on the exact algorithm, of which the most important are the one's complement checksum, HDLC-CRC-16, CRC-16, and CRC-32. Since these checks are performed quite often (every TCP/IP packet is checksummed), the relative performance of otherwise identical algorithms is also of interest.

The one's complement checksum, used extensively by TCP/IP, is computed by splitting the packet into 16-bit words, adding them all together using one's complement math, and complementing the result. One's complement math is almost identical to normal (two's complement) math, except that a carry out the high bit is folded back into the low bit (i.e. FFFF+0001=0001). In addition to the theoretical advantage of preserving all the bits, this algorithm also has the convenient feature that any valid packet header will checksum to zero, since it contains a complemented one's complement checksum as one of its fields!
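As a rough illustration, here's the algorithm in Python. It's a sketch for clarity rather than speed, and the function names are my own; production code (the Linux kernel, for instance) is far more heavily optimized.

```python
def ones_complement_add(a: int, b: int) -> int:
    """Add two 16-bit values, folding any carry back into the low bit."""
    total = a + b
    return (total & 0xFFFF) + (total >> 16)


def internet_checksum(data: bytes) -> int:
    """Return the complemented one's complement sum of 16-bit words."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        word = (data[i] << 8) | data[i + 1]  # big-endian 16-bit word
        total = ones_complement_add(total, word)
    return ~total & 0xFFFF


# A sample 20-byte IP header with its checksum field (bytes 10-11) zeroed.
header = bytes.fromhex("450000730000400040110000c0a80001c0a800c7")
checksum = internet_checksum(header)

# Write the checksum into the header, then verify: the result is zero.
filled = header[:10] + checksum.to_bytes(2, "big") + header[12:]
verified = internet_checksum(filled) == 0
```

The receiver never has to pick the checksum field out of the header. It just runs the same routine over the whole thing and checks for zero.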

However, the TCP/IP checksum algorithm has some disadvantages. Some relatively simple mistakes (like swapping information exactly four bytes apart) go unnoticed. Over the last two decades, a better understanding of these problems has led to the development of CRC algorithms. A CRC algorithm isn't more reliable per se (you still run the same statistical chance of error), it's just designed to require a far more bizarre combination of events to produce a false positive. These algorithms are based on multiples of binary polynomials, of which an almost infinite variety can be constructed. Common CRC algorithms are CRC-16 and CRC-32, which generate 16- and 32-bit check sequences, respectively. Ethernet, Token Ring, FDDI, and PPP all use CRC check sequences, so the Internet's link layer protocols tend to be its most reliable.
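The swapping weakness is easy to demonstrate. The Python sketch below swaps two 16-bit words four bytes apart in an arbitrary sample payload, then compares a one's complement checksum against the CRC-32 from the standard zlib module. The helper names and the sample data are mine, for illustration only.

```python
import zlib


def internet_checksum(data: bytes) -> int:
    """One's complement sum of big-endian 16-bit words, complemented."""
    if len(data) % 2:
        data += b"\x00"
    total = sum((data[i] << 8) | data[i + 1] for i in range(0, len(data), 2))
    while total >> 16:                # fold carries back into the low bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def swap_words(data: bytes, i: int, j: int) -> bytes:
    """Return a copy with the 16-bit words at byte offsets i and j swapped."""
    buf = bytearray(data)
    buf[i:i + 2], buf[j:j + 2] = data[j:j + 2], data[i:i + 2]
    return bytes(buf)


original = b"ABCDEFGHIJKL"             # arbitrary sample payload
damaged = swap_words(original, 2, 6)   # swap two words four bytes apart

# The sum doesn't care what order the words arrive in; the CRC does.
same_checksum = internet_checksum(original) == internet_checksum(damaged)
same_crc32 = zlib.crc32(original) == zlib.crc32(damaged)
```

Here same_checksum comes out True and same_crc32 comes out False: addition is order-blind, while the CRC depends on the position of every bit.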

RFC 1071 is devoted entirely to discussing Internet checksum generation. The TCP/IP code in the Linux kernel is a good source for freely available checksumming code, including assembly language implementations for several common architectures. C language code implementing the CRC-16 and CRC-32 polynomials can be downloaded from ftp://nic.funet.fi/pub/crypt/hash/crc.

Q: What function does IP header compression serve, for both the client and the ISP?

A: Header compression, developed by Van Jacobson and documented in RFC 1144, is a popular means for improving throughput over low-speed links by reducing the size of TCP/IP packet headers, though no attempt is made to compress the data itself. Think of it as writing a short note on a postcard, instead of mailing a regular letter - the content is the same, there's just less overhead. Jacobson's algorithm operates at the link layer, so it isn't noticeable to any systems except the two that are participating, and any link layer protocol can be used, so you will also hear about CSLIP (compressed SLIP) and CPPP (compressed PPP).

TCP/IP header compression (UDP packets don't get compressed) reduces the packet header from a nominal 40 bytes (20 bytes IP and 20 bytes TCP) to perhaps half a dozen, or maybe just two or three - exact performance varies. For interactive terminal traffic, where a user is typing on a keyboard and every character gets sent as the keys are struck, this can be a big win. Normally, a packet containing a single byte would introduce 4000 percent (no typo - four thousand percent!) overhead, but header compression can reduce this to perhaps 300 percent. Bulk data transfers such as file and web downloads won't see so much of an improvement, and LAN technology such as Ethernet is fast enough that the high overhead of standard TCP/IP usually isn't noticeable.
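The arithmetic behind those percentages is simple enough to check in Python. The 3-byte compressed header is the optimistic end of the range quoted above, and the 512-byte bulk payload is just a figure I picked for comparison.

```python
def header_overhead_percent(header_bytes: int, payload_bytes: int) -> float:
    """Header size expressed as a percentage of the payload size."""
    return 100.0 * header_bytes / payload_bytes


UNCOMPRESSED_HEADER = 20 + 20   # IP header plus TCP header, in bytes
COMPRESSED_HEADER = 3           # optimistic figure from the text

# One typed character per packet: the interactive case.
keystroke_plain = header_overhead_percent(UNCOMPRESSED_HEADER, 1)
keystroke_compressed = header_overhead_percent(COMPRESSED_HEADER, 1)

# A 512-byte payload (an assumed size): the bulk-transfer case.
bulk_plain = header_overhead_percent(UNCOMPRESSED_HEADER, 512)
bulk_compressed = header_overhead_percent(COMPRESSED_HEADER, 512)
```

For the single keystroke the overhead drops from 4000 percent to 300 percent. For the 512-byte payload it was under 8 percent to begin with, which is why bulk transfers gain so little.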

If you want to learn more, I suggest you read Van Jacobson's excellent RFC 1144.

Q: I work with Network Management on an SNMP-based NMS. Sometimes I get unknown traps from a private MIB that I don't know of. Do you know if there is an index to private MIBs?

A: Private MIBs are, well, private. Apart from the distinguishing Enterprise Numbers (which are indexed here, part of the Assigned Numbers RFC), you'll need to email the contact person to find out the exact structure, though I would try the company's web site and FTP server first.

Q: I've had a lot of trouble downloading Connected.

A: Well, you're not the only one. I'm not sure what the problem is, but some people can download fine and others not at all. There is now a list of mirror sites on the home page at http://www.FreeSoft.org/Connected/index.htm, so you might want to try one of them.

If all else fails, you'll just have to order a CD :-)

Q: Something is wrong with your TeX scripts. I see a bunch of strange escape sequences like \begin{soapbox}. What do they mean?

A: I don't use TeX for this project. The soapboxes are how I distinguish my opinions from objective facts, like this:

Q: You've mentioned PPP/IP/TCP/HTTP. But where is the V.34 in the dial-up link?

A: Names like V.34 refer to international CCITT modem standards, some of the few international telecommunications standards to have caught on in the U.S. These standards are physical layer protocols that describe how bit sequences are translated into analog sound waves for transmission over telephone lines.

Here are the important V-series standards:

V.22        1200 bps
V.22 bis    2400 bps
V.32        9600 bps
V.32 bis   14.4 kbps
V.34       28.8 kbps
V.42       error correction (MNP4)
V.42 bis   compression (MNP5)

Since the V-series standards are implemented in the modem, and are translated into RS-232 serial signals by the time they reach the computer, we rarely discuss them in the context of Internet networking, though most people use them every time they dial into the 'net.

Q: I'm trying to ping my server and don't understand what's happening:

a.      ping www.xyz.com.uk

        bad address

b.      ping www.xyz.com.uk.xyz.com.uk

        Pinging www.xyz.com.uk.xyz.com.uk [203.127.154.129]


        Reply from 203.127.154.129: bytes=32 time=1ms TTL=128
        Reply from 203.127.154.129: bytes=32 time=1ms TTL=128

A: You're missing a single period somewhere in a DNS record.

DNS servers (such as bind) can be configured with an $ORIGIN field that will append a domain at the end of every hostname - unless that name ends in a period. This makes it easy to write DNS records like:

           $ORIGIN xyz.com.uk.

           www              A      203.127.154.129

Obviously, www is intended to mean www.xyz.com.uk, or something similar. The problem arises when you write:

           $ORIGIN xyz.com.uk.

           www.xyz.com.uk   A      203.127.154.129
                         ^
                 Period missing here

See the problem? DNS will interpret this as a record for www.xyz.com.uk.xyz.com.uk! An even more subtle problem occurs when you write:

           $ORIGIN xyz.com.uk.

           www             CNAME    www.otherdomain.com
                                                       ^
                                        Period missing here

This record will probably never work, since www.xyz.com.uk is now an alias for www.otherdomain.com.xyz.com.uk
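The expansion rule behind all three examples fits in a few lines of Python. This is a simplified model of the rule as described here, not bind's actual parser, and it ignores other zone-file syntax; the function name is my own.

```python
def qualify(name: str, origin: str) -> str:
    """Expand a zone-file name according to the $ORIGIN rule.

    A name ending in a period is already complete. Anything else has
    the origin appended to it.
    """
    if name.endswith("."):
        return name
    return name + "." + origin


ORIGIN = "xyz.com.uk."

short_name = qualify("www", ORIGIN)                    # what was intended
missing_period = qualify("www.xyz.com.uk", ORIGIN)     # the doubled name
with_period = qualify("www.xyz.com.uk.", ORIGIN)       # the fix
bad_alias = qualify("www.otherdomain.com", ORIGIN)     # the broken CNAME
```

Running a suspect name through a rule like this shows exactly which records end up with the domain doubled.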

Q: What is IP spoofing?

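A: IP spoofing, as the term is used here, is a technique that allows a small number of IP addresses (a class C, for example) to be used by a large number of hosts (a thousand, for example), even though a thousand hosts could never fit in a single class C. This technique is more widely known as network address translation (NAT); the term "IP spoofing" more often refers to forging the source address on a packet.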

The trick is to use a router that actually changes the IP addresses as it passes the packets on to the global Internet. Thus, a host might be assigned an IP address of 10.10.55.2. This is completely bogus, and in fact the entire 10 network (all addresses like 10.X.X.X) has been reserved for the creation of these bogus addresses. The router connecting such a host to the Internet must then perform IP spoofing. A valid IP address is drawn from a pool and temporarily mapped to 10.10.55.2, and the packets' source IP addresses are changed to reflect this. As the reply packets come back, the router changes their destination addresses to 10.10.55.2, and forwards them into the internal network.

Thus, a thousand hosts can share a single class C, so long as no more than about 250 are using the global Internet at one time.
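The mapping table the router keeps can be modeled with a toy Python class. This is a sketch of the idea only: the class and method names are invented, a real router also has to time out idle mappings, and the 192.0.2.x addresses are documentation placeholders standing in for the valid pool.

```python
class AddressTranslator:
    """Toy model of the translating router: one public address per active host."""

    def __init__(self, public_pool):
        self.free = list(public_pool)   # public addresses not in use
        self.out = {}                   # private address -> public address
        self.back = {}                  # public address -> private address

    def outbound(self, private_src: str) -> str:
        """Return the public source address to write into an outgoing packet."""
        if private_src not in self.out:
            if not self.free:
                raise RuntimeError("address pool exhausted")
            public = self.free.pop(0)
            self.out[private_src] = public
            self.back[public] = private_src
        return self.out[private_src]

    def inbound(self, public_dst: str) -> str:
        """Return the private destination address for a reply packet."""
        return self.back[public_dst]

    def release(self, private_src: str) -> None:
        """Return a host's public address to the pool when it goes idle."""
        public = self.out.pop(private_src)
        del self.back[public]
        self.free.append(public)


router = AddressTranslator(["192.0.2.1", "192.0.2.2"])
public = router.outbound("10.10.55.2")     # outgoing packet gets a real address
private = router.inbound(public)           # reply is steered back inside
```

The pool size is the hard limit: once every public address is mapped, the next internal host has to wait for a release.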

The disadvantages of this scheme lie in its complex router configuration, made even more complex (or perhaps impossible) if several routers handle outbound Internet traffic, as well as the client's lack of a fixed IP address. For example, IP spoofing can not be used for Internet servers of any kind (DNS, web, FTP), since they require fixed, reachable addresses.

Q: Is there a situation with IP addressing where a subnet mask like 250.255.112.0 is legal?

A: No. Think of the subnet mask as a string of 32 bits. Valid subnet masks should start with one bits, then switch at some point to zero bits and end that way. The first number in this example (250 decimal) translates to binary as 11111010. Notice that it switches to zero, then back to one, then back to zero again. 250 is not a valid component in a subnet mask.

To put it another way, only nine numbers may legally appear in a subnet mask - 0, 128, 192, 224, 240, 248, 252, 254, and 255. Furthermore, three of the four numbers used in any subnet mask must be either 0 or 255. The mask must start with zero or more 255s, then possibly one of the other numbers (but only one), and then end with zero or more 0s.

Here is a complete list of all valid subnet masks, though of course, only one will be right for any given network:

        0.0.0.0                  255.255.0.0
        128.0.0.0                255.255.128.0
        192.0.0.0                255.255.192.0
        224.0.0.0                255.255.224.0
        240.0.0.0                255.255.240.0
        248.0.0.0                255.255.248.0
        252.0.0.0                255.255.252.0
        254.0.0.0                255.255.254.0
        255.0.0.0                255.255.255.0
        255.128.0.0              255.255.255.128
        255.192.0.0              255.255.255.192
        255.224.0.0              255.255.255.224
        255.240.0.0              255.255.255.240
        255.248.0.0              255.255.255.248
        255.252.0.0              255.255.255.252
        255.254.0.0              255.255.255.254
                                 255.255.255.255
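The ones-then-zeros rule is easy to test mechanically. Here's a Python sketch with function names of my own choosing: it checks a single mask with a bit trick and regenerates the 33 masks in the list above.

```python
def mask_to_int(mask: str) -> int:
    """Convert dotted-decimal text such as 255.255.240.0 to a 32-bit integer."""
    parts = [int(p) for p in mask.split(".")]
    if len(parts) != 4 or any(p < 0 or p > 255 for p in parts):
        raise ValueError("not a dotted quad: " + mask)
    value = 0
    for p in parts:
        value = (value << 8) | p
    return value


def is_valid_mask(mask: str) -> bool:
    """True if the mask is a run of one bits followed only by zero bits."""
    inverted = ~mask_to_int(mask) & 0xFFFFFFFF
    # For a valid mask the inverted value is 2**k - 1, so adding one
    # gives a power of two that shares no bits with it.
    return (inverted & (inverted + 1)) == 0


def all_valid_masks():
    """Generate every valid mask, from 0.0.0.0 up to 255.255.255.255."""
    for ones in range(33):
        value = (0xFFFFFFFF << (32 - ones)) & 0xFFFFFFFF
        yield ".".join(str((value >> shift) & 0xFF) for shift in (24, 16, 8, 0))


question_mask_ok = is_valid_mask("250.255.112.0")   # the mask from the question
masks = list(all_valid_masks())
```

The mask from the question fails the check, and the generator yields exactly the 33 entries listed above.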

Q: If I use SMTP to send e-mail between hosts, what is to stop me forging the sender name?

A: Almost nothing. SMTP can not be relied upon for security purposes. If you wish to verify a sender's identity, you should look at something like PEM (Privacy Enhanced Mail), or some other system which attaches a cryptographic signature to an e-mail message.

Q: What is your connection with the CIA? The building you are in is well-known to have ties to the CIA. I suspect that you are just one of the many proprietaries of the agency and what I'll probably do for you is spread the word on the internet that you are a front organization for the agency. (Ed: I actually received this email)

A: So that's why I can't get off the elevator on the third floor!

I disavow all knowledge of the CIA. A friend of mine who ran a business in that building offered us free web service for a while, but our server is now elsewhere.

Either that, or FreeSoft is a sinister attempt to infiltrate and expose those red traitors who have organized the free software movement. :-)

Q: I want to process MIME (multimedia mail) extensions. The problem is that I can't figure out how to decode the attachments (Content-Transfer-Encoding: base64, etc.) Is there a program that does this?

A: Try mimedecode.c.
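If you'd rather script it, Python's standard email module will also undo the transfer encoding. The sketch below is not mimedecode.c, just an illustration of the same job; the sample message, its boundary string, and the file name are all made up.

```python
import base64
import email

# A tiny hand-built message; the boundary and file name are invented.
RAW_MESSAGE = "\n".join([
    "From: sender@example.com",
    "To: user@example.com",
    "Subject: demo",
    "MIME-Version: 1.0",
    'Content-Type: multipart/mixed; boundary="XYZ"',
    "",
    "--XYZ",
    "Content-Type: text/plain",
    "",
    "See attachment.",
    "--XYZ",
    'Content-Type: application/octet-stream; name="hello.bin"',
    "Content-Transfer-Encoding: base64",
    'Content-Disposition: attachment; filename="hello.bin"',
    "",
    base64.b64encode(b"hello, world").decode("ascii"),
    "--XYZ--",
    "",
])


def extract_attachments(raw: str) -> dict:
    """Return {filename: decoded bytes} for every part that has a file name."""
    message = email.message_from_string(raw)
    found = {}
    for part in message.walk():
        name = part.get_filename()
        if name:
            # decode=True undoes the Content-Transfer-Encoding (base64 here).
            found[name] = part.get_payload(decode=True)
    return found


attachments = extract_attachments(RAW_MESSAGE)
```

Each value in the returned dictionary holds the original bytes, ready to be written to a file.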

