The HTTP/1.0 draft, proposed by the HTTP Working Group of the Internet Engineering Task Force (IETF), gave initial suggestions about the possible security threats involved in HTTP.
Among the threats it mentions are:
1. Client/session authentication. The Basic authentication scheme used by HTTP/1.0 does not provide a secure method of user authentication: the user name and password are only base64-encoded, not encrypted, so anyone who can see the request can read them.
2. Idempotent Methods. Writers of client software should make sure that any actions taken by their software are safe and otherwise idempotent. The user of the software should be fully aware of any actions the software may take on his behalf.
3. Abuse of Server Log Information. Servers are in a position to collect data about the information requested by clients. This information is confidential in nature, and its handling may be restricted by law. Server providers should make sure that logging information is not distributed.
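The first threat is easy to demonstrate. Below is a minimal Python sketch of how a client builds a Basic authorization header and how any eavesdropper can reverse it. The function names are hypothetical, and the UTF-8 encoding of the user name and password is an assumption. The base64 step itself is what HTTP/1.0 Basic authentication uses.

```python
import base64
from typing import Tuple

def basic_auth_header(user: str, password: str) -> str:
    # Basic authentication sends "user:password" base64-encoded, not encrypted.
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return "Basic " + token

def recover_credentials(header: str) -> Tuple[str, str]:
    # An eavesdropper needs no key at all to undo the encoding.
    scheme, _, token = header.partition(" ")
    if scheme != "Basic":
        raise ValueError("not a Basic authorization header")
    user, _, password = base64.b64decode(token).decode("utf-8").partition(":")
    return user, password

header = basic_auth_header("Aladdin", "open sesame")
print(header)                       # Basic QWxhZGRpbjpvcGVuIHNlc2FtZQ==
print(recover_credentials(header))  # ('Aladdin', 'open sesame')
```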
More threats and problems are:
- Insecure transport of documents. The body of an HTTP message is transmitted across the network as unencrypted text, so anything it carries can be read on the way. This includes credit card numbers and other private or secret details. Several proposed standards aim to solve this and allow secure transport, such as SSL, SHTTP, and Shen.
- How to set up access to the WWW from behind security firewalls.
- How to restrict a robot (an automated program that surfs the net, for indexing, etc.) from accessing a server, and what to do if a robot is affecting your server in a negative manner (reading the same file over and over again, for example).
As we said, SSL, SHTTP and Shen are proposed encryption and user authentication standards for the Web. SSL is supported by Netscape and therefore has a better chance of becoming a standard. We mention the other two because you can't predict what will happen on the Internet in a few months' time.
None of them is yet the universal solution to the secure data transmission problem, and to use any of them you need the right combination of a WWW browser and a server.
Secure Servers/HTTP
SSL - Secure Sockets Layer
SSL is the scheme proposed by Netscape Communications Corporation, which has made it available for free use.
Netscape has designed and specified a protocol for providing data security layered between application protocols (such as HTTP, Telnet, NNTP, or FTP) and TCP/IP.
This security protocol, called Secure Sockets Layer (SSL), provides data encryption, server authentication, message integrity, and optional client authentication for a TCP/IP connection.
SSL is currently implemented commercially in several browsers, including the two most popular, Netscape Navigator and Internet Explorer, as well as Secure Mosaic. It is also implemented in many servers, including ones from Netscape, Microsoft, IBM, Quarterdeck, OpenMarket and O'Reilly & Associates.
The main goal of the SSL Protocol is to provide privacy and reliability between two communicating applications.
The SSL Record Protocol is used for encapsulation of various higher level protocols. One such encapsulated protocol, the SSL Handshake Protocol, allows the server and client to authenticate each other and to negotiate an encryption algorithm and cryptographic keys before the application protocol transmits or receives its first byte of data. One advantage of SSL is that it is application protocol independent. A higher level protocol can layer on top of the SSL Protocol transparently.
The three basic things the SSL protocol provides are:
- The connection is private. Encryption is used after an initial handshake to define a secret key. Symmetric cryptography is used for data encryption.
- The peer's identity can be authenticated using asymmetric, or public key, cryptography.
- The connection is reliable. Message transport includes a message integrity check using a keyed MAC.
The current SSL version is 3.0.
How does it work?
SSL uses RSA public key cryptography, which is widely used for authentication and encryption in the computer industry.
Public key encryption is a technique that uses a pair of asymmetric keys for encryption and decryption. Each pair consists of a public key and a private key. The public key is made public by distributing it widely. The private key is never distributed; it is always kept secret.
Data that is encrypted with the public key can be decrypted only with the private key.
Conversely, data encrypted with the private key can be decrypted only with the public key.
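These two rules can be checked with textbook RSA on tiny numbers. This Python sketch uses the classic toy key p = 61, q = 53. It is purely illustrative: real keys are hundreds of digits long, and real systems add padding. `apply_key` is a hypothetical helper name.

```python
# Textbook RSA with tiny primes: for illustration only, never for real use.
p, q = 61, 53
n = p * q                # 3233, the public modulus
phi = (p - 1) * (q - 1)  # 3120
e = 17                   # public exponent, chosen coprime with phi
d = pow(e, -1, phi)      # private exponent, 2753 (modular inverse, Python 3.8+)

def apply_key(value: int, exponent: int) -> int:
    # Encrypting and decrypting are both modular exponentiation.
    return pow(value, exponent, n)

message = 65
c = apply_key(message, e)          # encrypted with the public key
assert apply_key(c, d) == message  # decrypted with the private key

s = apply_key(message, d)          # encrypted with the private key
assert apply_key(s, e) == message  # decrypted with the public key
print(c)                           # 2790
```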
Public Key Cryptography Can Be Used for Authentication
Authentication means verifying an identity: checking that someone is who he claims to be.
Here's an example of using public key cryptography for authentication:
The basic idea is this:
Say Yogev wants to authenticate Ron. Ron has 2 keys, a public key and a private key.
Ron gives Yogev his public key (more on that later). Yogev then generates a random message and sends it to Ron. Ron encrypts the message with his private key and sends the encrypted message back. All Yogev has to do now is decrypt what Ron sent, using Ron's public key. If the decrypted message is identical to the message Yogev generated in the beginning, he knows it's really Ron. Nobody else should know Ron's private key, so nobody else could have produced an encryption that Ron's public key decrypts correctly.
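The basic idea can be sketched in Python with a toy RSA key for Ron (n = 3233, e = 17, d = 2753, textbook numbers). The function names are hypothetical, and the numbers are far too small for real use. The only point is that Yogev's check succeeds when the response was made with Ron's private key.

```python
import secrets

# Ron's toy key pair (textbook numbers, illustration only).
N = 3233   # modulus, public
E = 17     # public exponent, known to Yogev
D = 2753   # private exponent, known only to Ron

def ron_respond(challenge: int) -> int:
    # Ron encrypts the challenge with his private key.
    return pow(challenge, D, N)

def yogev_checks(challenge: int, response: int) -> bool:
    # Yogev decrypts the response with Ron's public key and compares.
    return pow(response, E, N) == challenge

challenge = secrets.randbelow(N)  # Yogev's random message
print(yogev_checks(challenge, ron_respond(challenge)))  # True
```

Note that in this naive version Ron encrypts whatever Yogev sends him.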
That's the basic idea, but there are some twists in how it is actually done.
It's not a very good idea to encrypt something with your private key and send it to someone else without knowing what you're encrypting. Someone can use the encrypted value against you, since only you could have done the encryption with your private key.
So Ron, instead of encrypting the original message Yogev sent him, takes that message, constructs a message digest out of it, and encrypts the digest.
What's a message digest, you may ask (unless you already know)? A message digest is a value derived from the original message that has to be:
- Difficult to reverse - someone trying to impersonate Ron shouldn't be able to get the original message back from the digest.
- Hard to find a different message that has the same digest value.
In this way Ron can protect himself. He computes a digest of the random message sent by Yogev, encrypts the result with his private key, and sends the encrypted digest back to Yogev.
Yogev can then compute the same digest himself. He decrypts Ron's reply with Ron's public key and authenticates Ron by comparing the two values.
What we have just described is called a digital signature. Ron has signed a message Yogev generated, and that's as dangerous as encrypting a random value. So the protocol takes another twist: some (or all) of the data needs to originate with Ron.
Yogev -> Ron: Hey, is that you, Ron?
Ron -> Yogev: Yogev, it's me, Ron {digest[Yogev, it's me, Ron]}Ron's private key
Now Yogev can easily verify that Ron is Ron, and Ron hasn't signed anything he didn't write himself.
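The final exchange can be sketched in Python with the toy RSA key from before. Ron signs a SHA-256 digest of text he wrote himself. SHA-256 is our choice of digest, not something the protocol above fixes. Because the toy modulus is tiny, we reduce the digest modulo N, which throws away almost all of its collision resistance. A real implementation uses a full-size key and a standard padding scheme. All names are hypothetical.

```python
import hashlib

N, E, D = 3233, 17, 2753  # Ron's toy RSA key pair (illustration only)

def digest(message: bytes) -> int:
    # SHA-256, squeezed into the toy modulus (insecure, for illustration only).
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % N

def ron_reply():
    # The signed text originates with Ron, not with Yogev.
    text = b"Yogev, it's me, Ron"
    signature = pow(digest(text), D, N)  # digest encrypted with Ron's private key
    return text, signature

def yogev_accepts(text: bytes, signature: int) -> bool:
    # Yogev recomputes the digest and compares it with the decrypted signature.
    return pow(signature, E, N) == digest(text)

text, signature = ron_reply()
print(yogev_accepts(text, signature))  # True
```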
The Protocol itself
The record format and handshake described in this section are those of SSL version 2.0. SSL messages are sent as records of up to 32,767 bytes. Each record has a header of 2 or 3 bytes. The header holds the length of the record (including padding) and a flag that tells which header form is used. In the 3-byte form it also holds a security escape bit and the amount of padding. The 2 byte header is used when there is no padding; the 3 byte header is used when there is.
The header fields are:
# (the first bit) gives the number of bytes in the header:
1 indicates a 2 byte header, max record length 32,767 bytes.
0 indicates a 3 byte header, max record length 16,383 bytes.
S (the second bit, 3 byte header only) indicates the presence of a security escape, although none are currently implemented.
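Here is a Python sketch of reading this header, following the SSL 2.0 specification's layout. When the top bit of the first byte is set, the header is 2 bytes and the length takes the remaining 15 bits. Otherwise the header is 3 bytes: the length takes 14 bits, the second bit is the security escape flag, and the third byte is the padding count. The function name is hypothetical.

```python
from typing import Tuple

def parse_record_header(data: bytes) -> Tuple[int, int, int, bool]:
    """Return (header_length, record_length, padding_length, is_escape)."""
    if len(data) < 2:
        raise ValueError("a record header is at least 2 bytes")
    if data[0] & 0x80:
        # 2-byte header: 15-bit length, no padding, no escape flag.
        length = ((data[0] & 0x7F) << 8) | data[1]
        return 2, length, 0, False
    if len(data) < 3:
        raise ValueError("a 3-byte header needs 3 bytes")
    # 3-byte header: 14-bit length, escape bit, then the padding count.
    length = ((data[0] & 0x3F) << 8) | data[1]
    is_escape = bool(data[0] & 0x40)
    return 3, length, data[2], is_escape

print(parse_record_header(bytes([0xFF, 0xFF])))        # (2, 32767, 0, False)
print(parse_record_header(bytes([0x3F, 0xFF, 0x07])))  # (3, 16383, 7, False)
```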
The record itself has 3 components: MAC-DATA, Actual-Data, and Padding-Data.
The first is the Message Authentication Code, the second is the actual data that's sent, and the last is padding.
The MAC-DATA is a hash of a key, the data, the padding and a sequence number.
The sequence number is an unsigned 32 bit integer incremented with each message (wrapping around to 0 after 0xFFFFFFFF).
Failure to authenticate, decrypt, or otherwise get correct answers in a cryptographic operation results in I/O errors and the closing of the connection.
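The MAC computation, the sequence-number wrap-around and the close-on-failure rule can be sketched in Python. We assume MD5 as the hash, the sender's write key as the secret, and a big-endian 4-byte sequence number, as in SSL 2.0. The function names are hypothetical.

```python
import hashlib
import hmac
import struct

def next_sequence_number(seq: int) -> int:
    # Unsigned 32-bit counter that wraps to 0 after 0xFFFFFFFF.
    return (seq + 1) & 0xFFFFFFFF

def compute_mac(secret: bytes, data: bytes, padding: bytes, seq: int) -> bytes:
    # MAC-DATA = HASH(secret, actual data, padding data, sequence number),
    # with MD5 assumed as HASH and the sequence number packed big-endian.
    return hashlib.md5(secret + data + padding + struct.pack(">I", seq)).digest()

def check_record(secret: bytes, data: bytes, padding: bytes,
                 seq: int, received_mac: bytes) -> None:
    expected = compute_mac(secret, data, padding, seq)
    # Constant-time comparison avoids leaking how many bytes matched.
    if not hmac.compare_digest(expected, received_mac):
        # A failed check is treated as an I/O error that closes the connection.
        raise ConnectionError("MAC check failed; closing connection")
```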
The SSL Handshake
There are 3 kinds of SSL handshakes:
One when no session existed recently (the suggested limit for "recently" is 100 seconds), one when the session identifier still exists on both sides, and one when client authentication is desired.
1. If there's no session-identifier:
The handshake looks like this:
CLIENT-HELLO C -> S: challenge, cipher_specs
SERVER-HELLO S -> C: connection-id,server_certificate,cipher_specs
CLIENT-MASTER-KEY C -> S: {master_key} server_public_key
CLIENT-FINISH C -> S: {connection-id} client_write_key
SERVER-VERIFY S -> C: {challenge} server_write_key
SERVER-FINISH S -> C: {new_session_id} server_write_key
Explanation:
When a client wants to establish a secure connection, it sends a CLIENT-HELLO message. The message includes a challenge and information about the encryption systems the client is willing or able to support. The server then sends a SERVER-HELLO back to the client, which includes a connection-id, its certificate, and information about the encryption it supports.
The client first verifies the server's public key. It then responds with a CLIENT-MASTER-KEY message, which contains a randomly generated master key encrypted with the server's public key. This is followed by a CLIENT-FINISH message, which is the connection-id encrypted with the client-write-key.
The server then replies with a SERVER-VERIFY, in which it proves its identity by returning the challenge encrypted with its server-write-key. The server does not receive this key directly. It must have the private key matching its certificate to decrypt the CLIENT-MASTER-KEY message and obtain the master key, from which it derives the server-write-key. Finally, the server sends a SERVER-FINISH with a new session-id, encrypted with the server-write-key.
2. If a session-identifier was found by both client and server
The handshake looks like this:
CLIENT-HELLO C -> S: challenge, session_id, cipher_specs
SERVER-HELLO S -> C: connection-id, session_id_hit
CLIENT-FINISH C -> S: {connection-id}client_write_key
SERVER-VERIFY S -> C: {challenge}server_write_key
SERVER-FINISH S -> C: {session_id}server_write_key
Explanation:
In this case the client already has a session id.
The client sends a different CLIENT-HELLO. It includes a challenge (as in the first case), the session identifier, and information about the encryption methods the client knows or wants to use.
The server responds with a SERVER-HELLO that includes session_id_hit, a flag indicating that the session_id was found.
The client answers with a CLIENT-FINISH message containing the encrypted connection-id. No CLIENT-MASTER-KEY message is needed, because the master key of the cached session is reused.
The server then sends a SERVER-VERIFY and a SERVER-FINISH message as in the first case, except that the SERVER-FINISH contains the existing session-id instead of a new one.
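Session-id lookup on the server side can be sketched as a small cache that honours the 100-second suggestion given earlier for how long a session counts as recent. This is a hypothetical Python class, not part of any SSL library. The clock is injectable so the expiry can be tested.

```python
import time
from typing import Callable, Dict, Optional, Tuple

SESSION_TIMEOUT = 100.0  # seconds; the suggested limit for a "recent" session

class SessionCache:
    """Hypothetical server-side cache of session-id -> master key."""

    def __init__(self, timeout: float = SESSION_TIMEOUT,
                 clock: Callable[[], float] = time.monotonic):
        self.timeout = timeout
        self.clock = clock
        self._entries: Dict[bytes, Tuple[bytes, float]] = {}

    def store(self, session_id: bytes, master_key: bytes) -> None:
        self._entries[session_id] = (master_key, self.clock())

    def lookup(self, session_id: bytes) -> Tuple[bool, Optional[bytes]]:
        """Return (session_id_hit, master key or None)."""
        entry = self._entries.get(session_id)
        if entry is None:
            return False, None
        master_key, stored_at = entry
        if self.clock() - stored_at > self.timeout:
            # Expired: forget it, so the client falls back to a full handshake.
            del self._entries[session_id]
            return False, None
        return True, master_key
```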
3. If a session-identifier was found and client authentication is used
The handshake looks like this:
CLIENT-HELLO C -> S: challenge, session_id, cipher_specs
SERVER-HELLO S -> C: connection-id, session_id_hit
CLIENT-FINISH C -> S: {connection-id}client_write_key
SERVER-VERIFY S -> C: {challenge}server_write_key
REQUEST-CERTIFICATE S -> C: {auth_type,challenge-tag}server_write_key
CLIENT-CERTIFICATE C -> S: {cert_type,client_cert, response_data}client_write_key
SERVER-FINISH S -> C: {session_id}server_write_key
Explanation:
In this case the client authentication is in use.
The server sends a REQUEST-CERTIFICATE message, which contains a challenge (challenge-tag).
The client responds with a CLIENT-CERTIFICATE message which contains the client certificate's type, the certificate and some response data.
The server then sends a SERVER-FINISH message.
Explanation about the keys
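The client-write-key and client-read-key are computed from the master key, an ordinal character, the challenge, and the connection-id via a secure hash. The server's write key equals the client's read key and vice versa. The master key is reused across connections that resume the same session. The read and write keys are generated anew for each connection, since the challenge and connection-id change.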
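As an illustration, here is a Python sketch modelled on SSL 2.0's derivation for 128-bit RC4, which uses MD5 and the ordinal characters "0" and "1". Treat it as an illustration of the idea rather than a complete implementation, since other ciphers need more key material. The function name is hypothetical.

```python
import hashlib
from typing import Dict

def derive_keys(master_key: bytes, challenge: bytes,
                connection_id: bytes) -> Dict[str, bytes]:
    # Each key is MD5 over: master key, an ordinal character ("0" or "1"),
    # the client's challenge and the server's connection-id.
    material_0 = hashlib.md5(master_key + b"0" + challenge + connection_id).digest()
    material_1 = hashlib.md5(master_key + b"1" + challenge + connection_id).digest()
    return {
        "client_read_key": material_0,
        "client_write_key": material_1,
        # The server writes what the client reads, and vice versa.
        "server_write_key": material_0,
        "server_read_key": material_1,
    }
```

Because the challenge and connection-id are fresh for each connection, reusing the same master key still yields new read and write keys.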
Is it really safe?
The version of SSL that is exportable from the United States is restricted to 40 bit keys (SSL itself can also use 128 bit keys). This means exported keys can be broken by anyone with access to a reasonable amount of computing power, for example a Computer Science student at Tel-Aviv University. The breaking is done by brute force, which means, in simple words, trying all the possible keys.
Come to think of it, a Pentium PC can crack a 40 bit key in about a month.
A new Intel Pentium 166MMX costs 4,890 NIS in Israel (have a look at Excellnet... you even get some free games). A criminal who buys one can break 12 keys a year, at about 407 NIS per key...
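The arithmetic behind these figures is easy to check. This Python snippet assumes a 30-day month and the worst case of trying every key. On average, an attacker finds the key after searching half the key space.

```python
KEY_BITS = 40
keys_to_try = 2 ** KEY_BITS                 # 1,099,511,627,776 possible keys
seconds_per_month = 30 * 24 * 60 * 60       # assume a 30-day month
rate_needed = keys_to_try / seconds_per_month
print(f"{rate_needed:,.0f} keys/second to finish within a month")  # 424,194

pc_price_nis = 4890
keys_per_year = 12                          # one key per month
print(f"{pc_price_nis / keys_per_year:.1f} NIS per key")           # 407.5

# Each extra key bit doubles the work: 128-bit keys take 2**88 times as long.
slowdown_for_128_bits = 2 ** (128 - KEY_BITS)
```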
SHTTP
SHTTP (Secure HTTP) is the scheme designed by Enterprise Integration Technologies (EIT). It is a higher level protocol that only works with the HTTP protocol, but is potentially more extensible than SSL.
S-HTTP is backwards compatible with HTTP. It is designed to incorporate different cryptographic message formats into WWW browsers and servers, including PEM, PGP, and PKCS-7. Non-S-HTTP browsers and servers should be able to communicate with S-HTTP ones without a discernible difference, unless they request protected documents.
SHTTP provides a wide variety of mechanisms to provide for confidentiality, authentication, and integrity. SHTTP is not tied to any particular cryptographic system, key infrastructure, or cryptographic format.
Shen
Shen is a security scheme proposed by CERN. Shen provides for three separate security related mechanisms:
- Weak Authentication with low maintenance overhead and without patent or export restrictions.
- Strong Authentication via public key exchange.
- Strong Encryption of message content.
For some strange reason, it is almost impossible to find information about Shen on the WWW.
Firewalls and WWW proxies
A firewall is any one of several ways of protecting one network from another untrusted network. The actual mechanism whereby this is accomplished varies widely, but in principle, the firewall can be thought of as a pair of mechanisms: one which exists to block traffic, and the other which exists to permit traffic. Some firewalls place a greater emphasis on blocking traffic, while others emphasize permitting traffic.
You can use a firewall to enhance your site's security in a number of ways. The most straightforward use of a firewall is to create an "internal site" that is accessible only to computers within your own LAN. For that, you just need to place the server inside the firewall.
However, chances are you'd like to make your server available to the rest of the world, which means you'll have to put it outside the firewall. The safest way to do so is to put it completely outside of the LAN.
This is called a "sacrificial lamb" configuration. The server is at risk of being broken into, but at least when it's broken into it doesn't breach the security of the inner network.
In order to connect from the LAN to the outside world, a proxy is often installed on the firewall machine.
A proxy is a small program that can see both sides of the firewall. Requests for information from a Web server are intercepted by the proxy and forwarded to the server machine, and the response is forwarded back to the requester.
A proxy server mediates traffic between a protected network and the Internet.
Many proxies contain extra logging or support for user authentication.
Since proxies must "understand" the application protocol being used, they can also implement protocol specific security (e.g., an FTP proxy might be configurable to permit incoming FTP and block outgoing FTP).
Another way of contacting the outside world from behind a firewall is allowing the firewall to pass requests for port 80 that are bound to or returning from the WWW server machine. This has the effect of poking a small hole in the dike through which the rest of the world can send and receive requests to the WWW server machine.
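The port-80 rule just described can be sketched as a toy packet filter in Python. The WWW server's address is a hypothetical documentation address. A real firewall would also track connection state and TCP flags rather than trusting port numbers alone.

```python
from dataclasses import dataclass
from ipaddress import ip_address

WWW_SERVER = ip_address("192.0.2.10")  # hypothetical address of the WWW server machine
HTTP_PORT = 80

@dataclass
class Packet:
    src: str
    dst: str
    src_port: int
    dst_port: int

def firewall_allows(pkt: Packet) -> bool:
    src, dst = ip_address(pkt.src), ip_address(pkt.dst)
    # Requests bound for port 80 on the WWW server machine.
    if dst == WWW_SERVER and pkt.dst_port == HTTP_PORT:
        return True
    # Replies returning from port 80 on the WWW server machine.
    if src == WWW_SERVER and pkt.src_port == HTTP_PORT:
        return True
    return False  # everything else is blocked
```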
You can have a look at some more information on firewalls on the Firewalls FAQ on http://www.tis.com/
The connection between Security, spiders and robots
Robots, webcrawlers, web wanderers, ants, worms and spiders are programs that automatically surf the web and collect information. The information can be used for indexing, HTML validation, link validation, "What's New" monitoring, or mirroring.
Well-known robots include those of Webcrawler and Alta Vista.
These programs can be quite useful; it would be almost impossible to search for information on the web without them. But poorly written "depth first" searching robots can overload servers. They may, for example, recursively download information from CGI scripts that generate an infinite number of links. Robots can also overload servers by producing too many requests in a very short time (called "rapid-fire"). The suggested retrieval rate for a robot is no more than 1 document per minute from the same server. For additional information, see Ethical Web Agents.
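A robot can respect the one-document-per-minute suggestion by remembering when it last contacted each server. This hypothetical Python helper only computes how long to wait. The caller passes in the current time (for example from time.monotonic()) and does the actual sleeping and fetching.

```python
from typing import Dict

MIN_INTERVAL = 60.0  # seconds: at most one document per minute per server

class PoliteScheduler:
    """Hypothetical helper: how long a robot should wait before its next request to a host."""

    def __init__(self, min_interval: float = MIN_INTERVAL):
        self.min_interval = min_interval
        self._last: Dict[str, float] = {}  # host -> time of the last request

    def wait_time(self, host: str, now: float) -> float:
        last = self._last.get(host)
        if last is None:
            return 0.0  # never contacted: no need to wait
        return max(0.0, self.min_interval - (now - last))

    def record(self, host: str, now: float) -> None:
        self._last[host] = now
```

Keeping one timestamp per host lets the robot fetch from other servers while it waits for a busy one.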
It's a good idea to protect some areas of your server from access by robots. This is done by writing a file called /robots.txt that looks like this:
# robots.txt for http://www.rad.co.il/

# Allows robotiti to look everywhere it wants
User-agent: robotiti
Disallow:

# Disallows spiderico from accessing the /tmp and /log directories
User-agent: spiderico
Disallow: /tmp
Disallow: /log

# Disallows any other user-agent from accessing the /tmp directory tree
User-agent: *
Disallow: /tmp/
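Python's standard library includes a parser for this format, urllib.robotparser. The sketch below feeds it the example file, with records separated by blank lines as the robots exclusion convention expects, and asks which fetches are allowed. The robot names and paths come from the example. "otherbot" is a hypothetical robot covered by the * record.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
# robots.txt for http://www.rad.co.il/
User-agent: robotiti
Disallow:

User-agent: spiderico
Disallow: /tmp
Disallow: /log

User-agent: *
Disallow: /tmp/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

base = "http://www.rad.co.il"
for agent, path in [("robotiti", "/tmp/a.html"),         # allowed
                    ("spiderico", "/log/access.html"),   # blocked
                    ("otherbot", "/tmp/a.html"),         # blocked by the * record
                    ("otherbot", "/log/access.html")]:   # allowed
    print(agent, path, parser.can_fetch(agent, base + path))
```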
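Another possibility is to control what robots do with a single HTML page by adding a META tag to its HEAD section. This tag, for example, asks robots not to follow the links on the page (CONTENT="NOINDEX" asks them not to index the page itself):
<META NAME="ROBOTS" CONTENT="NOFOLLOW">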
