
Introduction to Entropy
A new open source distributed and encrypted P2P network for anonymous
and uncensored communication.
Credits:
- Entropy by Juergen Buchmueller (pullmoll)
- Stop1984 content by Bettina Jodda (Twister)
- Document by Joey Stanford (Rescue)
Table of Contents:
- What is Entropy?
- Why a "net inside the net"?
- Entropy History
- Stop1984 History
- What is Stop1984?
- What does Stop1984 want to accomplish?
- Political Goals
- Net inside a Net
- Why should I use Entropy?
- Stop1984 materials
- How Does Entropy Work?
- What does Entropy look like?
- How Does Entropy differ from Freenet?
- What 3rd Party clients are available?
- System Requirements
- How can I get more information?
- Inner workings of Entropy
- i18n: Supported Languages
- P2P
1) What is Entropy?
ENTROPY stands for Emerging Network To Reduce Orwellian Potency Yield
and as such describes the main goal of the project.
ENTROPY was developed as a response to increasing censorship and
surveillance on the Internet. The program connects your computer to a
network of machines which all run this software. The ENTROPY network
runs parallel to the WWW and to other Internet services like FTP,
email, ICQ, etc.
To the user, the ENTROPY network looks like a collection of WWW pages.
The difference from the WWW, however, is that there are no accesses to
central servers, and so there is no site operator who could log who
downloaded what and when. Every computer taking part in the ENTROPY
network (every node) is at the same time a server, a router for other
nodes, a caching proxy and a client for the user: that is, you.
After you have gained some experience with the ENTROPY network, there
are command line tools for you to insert whole directory trees into
the network as an ENTROPY site. So ENTROPY does for you what a
webspace provider does in the WWW - but without the storage and
bandwidth costs and without any regulation or policy as to what kind
of content you are allowed to publish. Everyone can contribute his own
ENTROPY site for everybody else to browse through. The content is
stored in a distributed manner across all available and reachable
nodes, and no one can find out who put what content into the network.
Even if your node is not actively running, your content can be
retrieved by others - without anyone knowing that it was actually you
who published the files. Of course this is only true if you do not
publish your name (or leave your name or other personal data in the
files you publish).
2) Why a "net inside the net"?
a) Entropy History
A little history by PullMoll:
"In spring 2001, I had a time when I was drinking way too much, hanging
around, sick of the world and the people surrounding me, but I could
not tell the real reason.
In the time shortly before easter 2001, I decided to take a break and
even practice the abstinence (or lent) that Christians hold in this
time. After a few days my body was weeping all that waste that I put
into it before. I had deep thoughts, great experiences and some
insights on the things going on around me.
I felt, for the first time, what it really means to be a part of this
world. It's not that you're "just there". Every single wink has
enormous consequences for everything else around you. You are not just
an external part of the world, but a very important wheel in this
clockwork.
Now, what has this all to do with Entropy? I 'saw' the things that
inflow you every day. I could really 'feel' the pain while looking at
TV ads, seeing the crying colors of leaflets with advertisements, the
hypnotizing impact of the holographic effect on the new Euro
banknotes... and I realized how much and how often all the companies
and state authorities are invading your privacy, just to sell more,
know more, keep an eye on you.
I hated it. I wanted to be alone and no one should know what I did, how
I felt. I wanted to speak freely, read freely and discuss with others
without having to fear any form of surveillance, any harassment because
of my non-conforming ideas etc.
This was the time when I decided to entirely refrain from using Windows
- and even Linux. I began using FreeBSD. I also cared, for the first
time, about tools to increase my privacy: I began using GPG. I used
Freenet before, but didn't think too much about what it really was good
for - whenever it was in a working state.
I realized the many things that always scared me when using Windows.
I was a little bit paranoid before, but now I saw many things that
really scared me. I didn't want to be the slave of my operating
system, not even for money (money is what makes you a slave, not other
people. They only use money as a whip to make you jump over their
barricades and hurdles).
Then, one day, after reading some things about encryption, I had that
idea of transmitting a replacement for random data for a one-time-pad
within the same stream of data that contains the encrypted messages. I
had the idea to duplicate or 'blow up' the safely transmitted random
values, so that they could be used to refill the one-time-pad data that
was used to encrypt both the plain text and the new random data...
some kind of perpetuum mobile.
Well, this idea was not really good, as some people told me. I'm still
not fully convinced that you can't make it work. Anyway, I wanted to
try things out and therefore I needed a tool to play with my ideas for
encryption. Freenet was there, but I never (really) understood how the
algorithms used there did work. I still don't understand them in all
detail - they're too much mathematics for my taste. I'm the algebra
type and big numbers, primes or even elliptic curves make me nervous :-)
I liked the simple, clean approach of the one-time-pad and there is
something very similar that is cryptographically not bad either: stream
ciphers. They have the advantage of using new (pseudo) random data all
the time, instead of one fixed, unchanged key for big amounts of
transmitted data. You "only" have to be sure to a) use a good quality
random source and b) have no stupid bugs sitting in your code, and you
are very safe against unwanted listeners, wiretapping, surveillance etc.
I started working on a project that would be in many ways just like
Freenet, while it should use only simple to understand (in my opinion)
and still strong cryptographic algorithms. I decided to re-implement
the front end interface: FCP. So I could use existing client software
with my code.
From my Freenet experiences I knew about some of the pitfalls of P2P
networks. The main problem is availability of keys. You cannot run a
network assuming that data will be available; you have to think of the
network as a very oblivious mind. I first thought I could cure this by
using a Hamming code to put some redundancy in the data. It took some
time until I saw that Hamming codes were not suitable for the purpose.
FEC (forward error correction) or - to be more specific - an erasure
codec was what the network needed.
Freenet supports FEC on the client side of things which is, in my
humble opinion, a bad idea. I already had implemented the low-level
redundancy code, so all I had to do was switch over from Hamming codes
to FEC using a fast erasure codec.
This gave (one of) the boost(s) to the network. Now any node that
retrieves a piece of data would need only two thirds of the chunks (8
out of 12) and could, at the same time, regenerate up to one third (4
of 12) and put the chunks of data back into the network. Every piece of
data that is successfully retrieved will spread more of its redundancy
chunks around, keeping it intact and healing the holes in the
network.
All this happens invisibly for the user, on the lowest level of the
implementation. No front end or client author would have to care about
this.
There was one problem remaining, though. Every single file was encoded
into a) the XML containing a list of hashes of the at least 12 chunks,
the fragment and the whole document, and b) the 12 chunks of data bits
and FEC bits. So even a 1-byte file took 13 chunks, and you needed 9
out of 13 (the XML plus 8 data or FEC chunks) to reconstruct it.
After I implemented the encryption of the XML texts (the main keys,
i.e. CHK@, SSK@ and KSK@), there was no good reason not to keep short
files inside the XML.
Now every short file that can be squeezed into one chunk will require
only one key. This was a great improvement, as it reduced the network
load for the many, many small files quite a bit.
This is where we are today and I think that Entropy is a nice example
or 'proof of concept' for my ideas. There are some things left to do
like handling the retrieval of large files. But this is all about
timing, retries, delays and won't lead to changes in the fundamental
code.
I forgot to mention one of the other goals I had in mind when I
designed the data layout: no one should have to fear having 'illegal
data bits' on his hard drive. In a big network, where data is spread
like it is supposed to be, you may have bit 0, bit 3-7, check bit 8
and b of some Britney-Spears.mp3 in your data store, but you do not
have a copy of a copyright-protected work. And since you cannot even
tell if you have, perhaps, half a bit of something, you cannot be held
responsible. Only the one who actively asks Entropy to reconstruct
some data that he knows the key of (the CHK@) will then - perhaps - be
committing a crime. IMO this is the safest way to handle or condone
unwanted, censored, 'illegal' data.
I don't even accept the term 'illegal' for any collection of data bits,
but that's just my personal point of view and it is not widely
accepted."
b) Stop 1984 History
1) What is STOP1984? Who are the people behind STOP1984?
STOP1984 - these are people who work for informational
self-determination, data security and free speech; people who for
these reasons reject surveillance, censorship and data abuse.
STOP1984 is an open project.
Everyone who would like to help could and should do so!
2) What does STOP1984 want to achieve? What are the goals?
We would like to contribute by helping people to be conscious of:
1. the value of their own privacy
2. the value of their own data
3. the dangers of the abuse of data
4. the consequences of the loss of privacy
5. the political, social and personal consequences of an increasing
surveillance
6. the dangers of political lack of interest
3) Our political goals are:
1. A transparent examination and, if necessary, cancellation of the
TKUEV as well as the European data retention directive, which approves
provisional data storage on the European Union level
2. Transparency regarding successes and failures of surveillance
3. Transparency regarding kinds and extent of past and current
surveillance
4. A right to data protection and informational self-determination
fixed in the constitution as well as on the European Union level.
4) Net inside a Net
The Entropy-net can hardly be put under surveillance; this is the
difference between the "direct" Internet and the Entropy-net. Nobody
is able to know who has up- or downloaded which content, as there are
no central log files and no central servers.
5) Why should I use Entropy?
If you do not want to accept the growing surveillance of all
communication and the growing censorship - often in the name of
copyright, trademark or similar laws which are used to restrict
communication - you should protect your communication.
Software to encrypt your e-mail (PGP, GnuPG) does help. But: anyone who
is interested can still see that person A has communicated with person
B. File transfer is usually unencrypted. HTTP, FTP etc. are protocols
which are not encrypted. They leave traces, and so someone can find
out which files have been up- or downloaded, by whom, and to or from
which server. Entropy tries to plug these holes (holes in the sense of
data security) by hiding connection details.
6) Stop 1984 materials
a) Humans - a private being
Is it, nowadays, when people reveal their most intimate secrets in
talk shows, actually still important to have any privacy?
The answer is simple:
Even if many people give up their privacy all too voluntarily and
carelessly, the total abolition of privacy for everyone cannot be the
result.
Privacy simply means there are areas of life into which, without our
permission, nobody should have insight:
1. The private telephone call.
2. The short flirt in an Internet chat room.
3. One's own preferences (not only, but also, in sexual regard).
4. The daily mail.
5. The critical book about which one converses with the neighbor.
6. The walk across the marketplace.
Privacy refers to data protection, but it covers as well things like
the secrecy of communication and the secrecy of letters.
Having privacy means, being able to say:
Stop! Until this point and no further!
b) Surveillance
Do you know whether you are monitored?
In most cases, you don't! Unannounced video surveillance, undisclosed
telephone monitoring...
The list of secret surveillance is long. And the uncertainty about
being observed or not naturally also affects us. Personal contact and
social relations become more difficult through the developing
distrust.
Examples and forms:
1. Video surveillance
Video surveillance (also called CCTV) is used for the surveillance of
objects or places, mainly with the argument of preventing and
monitoring criminal activities. Video surveillance is not only
expensive, but also often only leads to a displacement of criminality
into unsupervised districts. With missing signs pointing out the
cameras, as well as a lack of clarity on details (are the films
stored, for how long, and who gets access to them?), citizens are
left without transparency concerning by whom and for what purpose
they are being watched.
2. Pattern search
This is the search for criminals and possible terrorists in national
data bases.
According to fixed methods (the pattern) a group of persons is examined
individually.
The patterns are often arbitrarily or vaguely fixed, so that
law-abiding citizens may get stuck in the pattern easily.
3. Internet surveillance
Surveillance of connection data - and of content, too - is desired.
The TKUEV (German telecommunication monitoring act) is a notorious
example of it, and at the same time of provisional data storage. One
goal is the collection of data on provision, in order to search this
data for potential perpetrators if necessary. At the same time,
electronic profiles of Internet users are developed. Such profiles
may also be politically motivated and evaluated accordingly.
This is incompatible with the presumption of innocence, the
individual's privacy and informational self-determination.
4. Data mining systems
Widely known are particularly "Echelon" and the planned "Total
Information Awareness" system. Such mechanisms are maintained mainly
by secret services, as for example the US-American NSA. The goal is
the comprehensive collecting of data from the most varied sources,
like travel reservations, financial transactions or email and
telephone services.
These systems are also incompatible with democratic principles such
as the presumption of innocence.
c) Data security
Everyone has private data, but only few care about it.
Our own world of data starts with one's housing lease, recently
includes motor traffic as well (for example in London) and extends to
completely everyday data. This means, for example, the connection
data of telephone and Internet communication, but also the details of
financial transactions, even for small payments - the credit card
makes life easier but also contributes to the databases of financial
corporations.
Another example would be personal data concerning health insurance -
maybe your employer would like to take a look at these?
Who cares about one's data and its protection...
Examples from the world of data:
1. Data protection
Data protection concerns everybody, because in our world every human
is defined by his data. Or don't you have anything to hide? Isn't it
annoying, for example, to have unwanted guests trying to penetrate
your home computer in order to spy on your data there?
Thus data avoidance and data protection are important - on the
Internet, for example, by suitable technical measures like firewalls
and proxies, as well as skillful configuration of the software.
Thereby unpleasant contemporaries (viruses and worms, or the alleged
hacker) are kept away.
2. Security of telecommunications and TKUEV
The TKUEV (see above) renders the security of telecommunication
useless.
The legislator obligates the telephone companies and Internet
providers to store connection data of telephone calls and Internet
connections. Allegedly, this is done only to search for terrorists or
potential criminals communicating.
Like a grand prize in a lottery, this occurrence is rare, but the
players try again and again...
3. Databases and electronic profiles
In our digital world, in which everything can be noted and stored,
any protection is almost completely omitted. The data flood we
produce is recorded automatically and stored arbitrarily long.
Whether credit card usage, presenting the loyalty card in the
supermarket or a simple phone call, data is collected everywhere,
stored and connected, thus perfecting the citizen's profile more and
more. This way the respective organization is able to find the - from
its point of view - ideal way of communicating with us.
d) Censorship
Censorship is the control of the information that citizens are able
to get, by deciding which information is released and how it is
accessible. Not only powerful groups use that technique; human beings
also censor themselves.
Examples for external censorship:
1. Censorship on the Internet
Censorship on the Internet is realized by using technical measures
like IP-blocking and content filters. Widely known is the censorship
in China and Saudi-Arabia, where religious, dissident and/or
pornographic sites are blocked. In Germany, for example, Mr. Büssow
(president of the government of the state of NRW), along with others,
is trying to force ISPs to block US-based websites because of
xenophobic, racist and neo-Nazi content, as well as a site which they
claim is made up of inhuman content.
Not to mention Spain, where the LSSI Act is in effect and is used to
stop unlicensed websites by applying political and economic pressure,
which has caused many websites to go offline.
2. Censorship in the mass media
The "classic" media censor information, too. On the one hand they
ignore possibly important news for merely economic reasons; on the
other hand they try to control information for current political or
economic reasons.
The result is obvious: the freedom of information and opinion is
restricted, and unreflective, uniform thinking is maintained -
whereas the opportunity to inform oneself from freely accessible and
uncensored sources is necessary, whenever and however one wants to do
so.
Internal Censorship - The bars in your own mind
As we know that information is filtered and censored, we should keep
in mind that the most important censorship happens in our own minds.
We have to learn to tolerate and to consider the attitudes and
opinions of other people. There is no other way to free us from our
own intrinsic censorship.
It's a main target of STOP1984 to educate people toward that sort of
freedom!
e) Information for anybody
1) Information for anybody
Imagine a world of tomorrow where there is no free flow of
information at all. No free TV, no free radio stations, and on the
shelves of the libraries there is just dust.
Unbelievable? Yet possible!
It is not reality yet:
The wired world delivers more and more information to us on a daily
basis. But the right to receive that information is at risk,
especially on the Internet. No other medium is faster and more
borderless, none is cheaper and more flexible. None is more open and
none seems to be a bigger threat to governments and the powerful. The
well-informed citizen seems to have become a threat! Without a doubt,
countries are trying to restrict the free flow of information and to
manipulate their citizens. In addition, political, economic and
religious groups, associations and companies are trying to restrict
the freedom of information and opinion.
2) Freedom of opinion - any opinion is of value!
Why such efforts to protect freedom of information, security of
telecommunications and privacy, and to fight data retention?
Only with free and open sources of information, powerfully protected
from censorship, and the right to use those sources with full respect
for privacy, is it possible for anybody to make up his own mind on
politics, daily news, religious beliefs, etc.
3) Freedom of thought - and of action - is what matters! These are
basic civil rights! It is our freedom! Only by maintaining a free and
open society can we reach a society rich in different attitudes,
ideals and beliefs, far beyond an all-uniform society.
Is this a society worth being engaged in?
We think: YES!
3) How does Entropy Work? (General)
Entropy is software that performs several tasks, all for the purpose
of anonymous and uncensored communication:
1. it connects (via the Internet) your computer with other computers
where Entropy is installed and running (P2P - peer to peer)
2. it distributes or downloads pieces of data (chunks) to/from
computers taking part in the Entropy network
3. data loss caused by "nodes" being (permanently or temporarily)
offline is avoided by redundancy (FEC = forward error correction)
4. data exchange between nodes is encrypted
5. one file (the cache) is used as the store for the data - both your
own and that of other nodes (you can, of course, define the size of
this cache)
6. it does not store complete files or file names in legible form(!)
on a single computer
Entropy supports the Freenet Client Protocol (FCP) so that existing
clients can easily and quickly be used for Entropy. Freenet and
Entropy can be used at the same time.
One example of such a client is Frost, a program originally written
for Freenet. Frost can be used for exchanging news and files (it
serves as a message board and a file-sharing client at the same time)
and can be used with both Freenet and Entropy.
4) What does Entropy look like?
Entropy can be used in different ways. The average user will probably
use the web interface (proxy). In this case, Entropy will not look
much different from the familiar pages on the WWW. The difference is
that there are no central servers and no "active content" which could
be used to track users' surfing behavior.
Clients like Frost are also aimed at the "average user" - at those
users who want to exchange opinions, texts, pictures, documents,
music etc. or who want to have access to "blocked" information.
For the sophisticated user, who is able to design HTML pages himself,
there are helper programs (tools) to put one or more websites into
the Entropy network. The difference between this and a hosting
provider is that there is no regulation of the content, because there
is no chance for regulation (as mentioned above: any down- or upload
is made anonymously).
Other tools are conceivable, and some of them are already work in
progress (a shared SQL database, an HTTP proxy using Entropy as a
cache). The Freenet Client Protocol offers relatively simple ways to
develop new applications if you are interested in creating them.
5) How does Entropy differ from Freenet?
a) Why develop Entropy further when Freenet exists?
Choice and freedom. It's always good to have more than one option
available to you.
b) Why use Entropy when Freenet exists?
Entropy is faster and simpler than Freenet.
c) What is the relationship between Entropy and Freenet?
Entropy considers Freenet to be another means to the same end. We
have separate development paths, but our end goals are generally the
same: anonymous communication. Ian Clarke of Freenet has,
unfortunately, not been very kind in his remarks about Entropy.
d) In what instances should you use Freenet instead of Entropy?
If you need bullet-proof cryptography, use Freenet. Otherwise, we
think that Entropy is very good and fast, and has a reasonably good
set of stream ciphers.
6) What 3rd Party clients are available?
Entropy also supports Freenet's client protocol (FCP), so programs
designed to use it should work with Entropy as well. The only major
difference is the default FCP port number, which is 8482 for Entropy
(while it is 8481 for Freenet). You can either configure the clients
to use port 8482 for Entropy, or - if you don't use Freenet at all -
configure Entropy to use port 8481 (by changing the line fcpport=8482
in entropy.conf).
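If you take the second route, the relevant fragment of entropy.conf
might look like this (the fcpport key is named in the text above; any
other settings in the file are left untouched):

```
# entropy.conf - let Entropy answer on Freenet's default FCP port,
# so existing Freenet clients work without reconfiguration.
# Only do this if you do not run Freenet on the same machine.
fcpport=8481
```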
a) Samizdat
Samizdat is an NNTP gateway for Entropy (and Freenet). It is designed
so that you can use a standard newsreader (such as Mozilla
Newsgroups, KNode, or tin) to read and post news articles securely
and anonymously. You just have to create a new news server entry for
localhost using port (or service) number 1119 (usually news is on
port 119, but that is a privileged port). Then you choose a made-up
email address and/or name and can subscribe to the configured
newsgroups. While the Samizdat and Samizdat-nntp daemons are running,
they will collect new messages and insert your postings in the
background.
b) Frost
Frost is a Java program that is something between a message board
(comparable to discussion forums) and a file-sharing client. There's
a search function, too. However, quite different from other P2P
networks such as Gnutella, eDonkey or Morpheus/KaZaA, these searches
take place on your own machine after downloading a list of keys from
the network.
That is why your own node must run for some time before it can find
some of these lists (so-called index files). Only then can a search
be successful. In other words: after starting Frost, you should watch
the log output for a while and wait. At some point Frost will output
a series of stars and then dots (.), showing that a list of keys has
been stored on your local drive. Texts on the message boards also
take some time to arrive; Frost looks backward from today up to three
days.
I suggest you manually add a board by the name of 'test' (without the
quotes) and write a "Hello World" there, just to see if you can
upload and thus - most probably - also download messages.
c) Freenet Tools
The most important tools for those who want to insert their own
content as a website into Freenet or Entropy are the Freenet Tools
(or similar tools from other authors :). For Freenet, there are some
such programs linked from the http://freenetproject.org pages. Not
many of them will work with Entropy out of the box, as they sometimes
rely on minor deviations in the FCP interface. Specifically, the
newer tools supporting FEC FCP v1.1 will fail with Entropy, as
Entropy does not yet fully support the changes to the Freenet Client
Protocol. So I suggest you use ft for Entropy for now, since I can
help you there with problems or questions.
d) Freemail
Freemail allows you to send encrypted, private, and anonymous email
over Entropy and Freenet. It's written in Python.
7) System Requirements
Basic requirements:
You need the following environment and libraries on your *nix system:
1. GNU C Compiler (gcc)
2. GNU Make (e.g. gmake on systems where GNU make is not the default)
3. zlib compression library (http://www.gzip.org/zlib/) 1.1.3 or newer
4. Expat XML library (http://expat.sourceforge.net/) 1.95.2 or newer
System requirements:
1. *nix box, Windows PC, or Mac OS X/Darwin with an Internet
connection (at least a 56K modem)
2. at least 100 MB of free disk space (for the cache described above)
3. the Entropy software (download of ca. 556.4K source or 1.5M Windows
setup)
4. some online time with an IP address not changing too quickly
5. a free TCP port (you may have to adapt your firewall or NAT)
6. a web browser addressing localhost/127.0.0.1 without a proxy (for
security)
8) How can I get more information on Entropy? How can I help?
The Entropy homepage can be found online at the following URL:
http://entropy.stop1984.com/
On the Entropy homepage you can find more information about how Entropy
works as well as download various tools which work with Entropy.
If you have questions you can:
1. post a message to
http://f27.parsimony.net/forum66166/
2. post a message to the "entropy" news board (from inside entropy)
9) Inner workings of Entropy (detailed)
Entropy is designed to meet several goals at once:
1. Distribute documents in the network in a wide-spread manner, so
that it is practically impossible to locate them (i.e. make
censorship impossible).
2. Keep documents retrievable even if a part of the nodes keeping
copies of them is offline.
3. Hide from users, admins or authorities which node on the network
actually keeps parts (fragments, bit-chunks) of which documents (key
encryption).
4. Optimize data retrieval for the most common user base with ADSL,
where the downstream bandwidth is several times the upstream
bandwidth (e.g. 768 kbit/s vs. 128 kbit/s with many German ISPs).
5. Hide from network operators what kind of communication happens
between nodes of the network (transport encryption).
These and some more goals have been achieved. Some still have to be
tested and optimized on a large scale network.
Entropy is divided into several modules, which are run as separate
processes (forked) that use shared memory to communicate with each
other. These modules are:
1. Peer management with bandwidth limiter
2. Peer outgoing connections launcher
3. Peer incoming connections listener
4. Data store management
5. Freenet Client Protocol (FCP) server
6. HTTP gateway, aka proxy
a) Peer management with bandwidth limiter
This module and its process are rather simple. You can configure a
maximum bandwidth to use for incoming and outgoing connections: you
specify a total number of bytes per second in entropy.conf. The
limiter runs with 10 ticks per second (in the file include/config.h
there's a line #define TICKS 10) and, for each direction, adds 1/10th
of the per-second bandwidth to a shared memory variable. The socket
I/O functions grab their required bandwidth from these variables. If,
e.g., a send buffer is 10K bytes and there are currently only 1000
bytes/s available, the sock_writeall() function will sleep and later
try to get more bandwidth until it has finished writing all the data
out. The various processes of the connections concurrently try to
lock the bandwidth variables and each takes a fraction (80%,
hard-coded in src/sock.c) of its required bandwidth until it is done.
This method is perhaps far from perfect, and some operating systems
may have much better ways to limit a socket's or network's bandwidth.
However, Entropy runs on systems where such things are not available,
and there is no general standard anyway. Entropy at least allows you
to keep some spare bandwidth for your other jobs, and that is all the
bandwidth limiter is intended to do.
b) Peer outgoing connections launcher
Entropy has its very own way of finding out about possible peer nodes
it can contact. Besides the initial list, which is defined by listing
some hostname:port or ipaddr:port lines in your seed.txt file (or
whatever you want to call it in your entropy.conf), a node tells its
outgoing connections about other connections it has every now and then.
These so-called node announcements are made inside the network. For
this purpose, a node creates a file with zero contents but meta-data
only. In this meta-data there is an XML text specifying some
information about a node:
1.node's IP address
2.node's contact port (world accessible port)
3.node's preferred encryption module
This information is packed into an XML text of the form:
<?xml version="1.0" standalone='yes'?>
<p2p>
  <peer hostname='aaa.bbb.ccc.ddd' port='nnnn' />
  <crypto module='something' />
</p2p>
The hostname= attribute here actually contains an IP address. It could
contain a hostname as well, but all hostnames are resolved prior to
spreading info around, since this saves many hostname lookups.
The crypto module name is a hint for a node wanting to contact this
peer on how to talk to it. The peer might simply ignore the incoming
connection if it does not (actually) want to support the incoming
node's crypto module.
These announcements are now kept in documents (with zero length) in the
meta-data part. The documents have known names. The current names are:
entropy:KSK@utc-timeslice-ipaddr:port
entropy:KSK@utc-timeslice-2-digit-hex-value
The first form is just used to derive a hash value from it. The first
byte of this hash value is then used to create the second form. This
KSK (key signed key) is then redirected to the content hash key that
contains the meta-data with p2p information. The timeslice part of both
keys is a hex number derived from the current UTC (or GMT) time rounded
to the next 10-minute timeslice. So an active node can be found in
another node's data store under some key
KSK@utc-current-timeslice-some-2-digit-hex-value.
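A minimal sketch of this two-step key derivation, assuming the timeslice is the UTC time in seconds rounded to a 10-minute boundary and written in hex (the exact string layout and rounding direction in the Entropy sources may differ):

```python
import hashlib
import time

def announcement_keys(ipaddr, port, now=None):
    """Derive the two announcement key names described above.
    Illustrative only: the precise formatting used by Entropy may
    differ; this shows the timeslice + first-hash-byte scheme."""
    if now is None:
        now = time.time()
    # UTC time rounded to a 10-minute (600 s) timeslice, as hex
    timeslice = "%x" % (int(now) // 600 * 600)
    first = "entropy:KSK@%s-%s:%d" % (timeslice, ipaddr, port)
    # the first byte of the SHA1 hash of the first form gives the
    # 2-digit hex value of the second form
    digest = hashlib.sha1(first.encode()).digest()
    second = "entropy:KSK@%s-%02x" % (timeslice, digest[0])
    return first, second
```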
A node's outgoing connections launcher looks up all 256 possible
keys of the current timeslice until it finds an entry that is:
1. not yet connected
2. not blocked (due to errors or permanently)
3. resolvable (if it is a hostname)
and then tries to make a new outgoing connection to that peer. The rate
at which outgoing connections are sought decreases with the number of
already existing outgoing connections. In other words: the more
connections your node already has, the slower it will make new
connections. If a node has all of its outgoing connections used up
(currently 32), it does not search for more until one or more
connections are dropped.
If a connection fails, the peer's IP address and port are entered into
a failure list (in shared memory). If this happens a second time
within some period (currently 1 hour), the peer's IP address and port
are blocked for 10 minutes. They are therefore put into the blocked
list with a timestamp of now + 10 minutes, which means the peer_node_search()
function in src/peer.c will skip this ip/port for some time.
You might wonder if two nodes could produce the same 2-digit hex value
for a specific time slice. Yes, they could, and sometimes they will.
It just isn't much of a problem, since one of the keys will end up
somewhere else on the network. The next time slice changes the resulting
numbers, and the probability that two nodes constantly collide in their
announcement keys is very low (unless they had the same ipaddr/port -
which cannot happen if you think about it :). Should the net ever
become so big that 256 slots are not sufficient for every node to find
enough outgoing connections after some time, I might as well add
another digit of the hash and then search 4096 slots. I don't
believe that this will ever be needed, though.
c) Peer incoming connections listener
This module and process is rather simple. It sets up a socket, binds it
to the special IP address 0.0.0.0 (INADDR_ANY) with the specified port
number for Entropy (nodeport= from entropy.conf), so that anyone can
contact it and then listens for incoming connections. Whenever the
listen() call returns, a new connection is accept()ed and a new child
process is started to handle the incoming connection.
This child process first checks that there is a free slot for incoming
connections. If a node has 32 incoming connections, it won't accept any
more. This has not happened yet, since the network is still too small,
but it is expected to eventually.
Then the child process reads a first, initial message from the peer
node that contains an XML text block of the form:
<?xml version="1.0" standalone='yes'?>
<p2p>
<peer hostname='node.somewhere.net' port='nnnn' />
<node type='entropy' major='0' minor='0.30' build='xyz' />
<store fingerprint='[32 hex digits]' />
<crypto module='something' size='xxx' initial='[2 * size hex digits]' />
</p2p>
Do some of the tags and attributes look familiar? Yes, the message
is similar to the node announcement's XML text and is in fact parsed by
the same functions. It carries some more information, since in this case
the node hostname= itself is contacting us. At least, that is what we
assume. One of the first things the peer_in_child() function in
src/peer.c does is verify the hostname against the incoming IP
address. The incoming address is set by the listener process during the
accept() and kept in the connection info. If a hostname lookup matches
the IP address, then Entropy updates its internal lists of contacted
peers to show hostname:port for every connection to that IP address.
But this is just cosmetics. If a node claims to be whitehouse.gov and
is (most probably) not at this address, it will still work with
Entropy. And since its address, not its hostname, will be spread around
inside the net, it will also be contacted by other nodes - sooner or
later.
The node tag and its attributes should be obvious. They are purely
informational for now but might be used to e.g. block certain
implementations or (broken) builds from polluting the network. Of
course, someone with malicious intent could fake these fields, as they
are not verified in any way.
The fingerprint attribute of the store tag serves an important
purpose. A node's fingerprint determines its place in the routing
inside the network. Such a fingerprint consists of 16 byte-sized
values, i.e. unsigned numbers between 0 and 255. The node derives its
own fingerprint by weighing the number of keys of certain kinds in its
data store. For routing purposes and the fingerprint, Entropy simply
uses the last digit of the SHA1 hashes of the keys. So if a node has a
maximum of 1000 keys ending in '9', 500 ending in '3' and 100 ending
in 'f', plus some more keys in unimportant amounts (say below 10), its
fingerprint will look like this:
01 03 00 7f 00 02 00 00 01 ff 00 00 02 00 00 19
You see the ff in 'slot' number 9 (counted from zero), the 7f in slot
number 3 and the 19 in slot number f. These numbers tell when a node is
asked for a key (request) and when it is told about a new key
(advertise).
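A sketch of how such a fingerprint could be computed from a set of keys, assuming counts are scaled linearly so that the largest bucket becomes 0xff (the actual weighting in the Entropy sources may differ; the function name is invented):

```python
import hashlib

def store_fingerprint(keys):
    """Count keys by the last hex digit of their SHA1 hash and scale
    the counts so the fullest bucket maps to 0xff. Illustrative only,
    not the exact weighting used by Entropy."""
    counts = [0] * 16
    for key in keys:
        last_digit = hashlib.sha1(key).hexdigest()[-1]
        counts[int(last_digit, 16)] += 1
    top = max(counts) or 1   # avoid division by zero for an empty store
    return [c * 0xff // top for c in counts]
```

With 1000 keys ending in '9', 500 in '3' and 100 in 'f', this scaling maps slot 9 to ff, slot 3 to 7f and slot f to 19, matching the example above.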
Fingerprints are updated from time to time (currently every five
minutes) when a node sends its current fingerprint as a message to all
contacted peers. The difference from other messages is that this
special message carries a hops-to-live value of zero (messages with
a hops-to-live of zero wouldn't normally leave a node). So if a node's
fingerprint changes - slightly or dramatically, e.g. because it
collected a lot of data locally or from other peers - its peer nodes
will be informed about what to expect from it or what to send to it.
The whole story with fingerprints is nothing more (and nothing less)
than a way to load-balance the network and to diverge keys, so that
they are kept in different places in the network.
In the final tag of the initial message (crypto) a node tells its peer
about what crypto module it intends to use and what initial data
(comparable to a session key) it will use. The crypto module= names are
simply placeholders for the implemented methods. I am experimenting
with some stream-cipher number generators and Entropy currently prefers
a module 'crypt3', which is an implementation of an S-box algorithm.
Other modules do nothing (crypt0 is a null-layer), are obsolete (crypt1
is the old, now unused method of Entropy up to 0.2.x) or are
experimental and not working.
Since this initial message is currently not encrypted, the whole
communication between two nodes could be eavesdropped by just logging
the initial= attribute of a connection and re-running the same
algorithm. Although the communications encryption will only truly make
sense once node-key handling is in place, so that the initial messages
can be sent encrypted too, it already makes it impractical for a
casual listener to follow the communications on the line.
For the time being, this communications layer encryption is nothing
more than hiding away what goes on on a connection. Still, this is much
better than what most P2P networks do about their users' privacy:
nothing.
d) Data store management
Entropy's data store is kept in a tree of directories below a
configurable base path; storepath=store is the default in entropy.conf.
Depending on the setting of storedepth=x in the configuration, there is
a zero, one, two or three level tree of directories. Every directory's
name is just one lower case hex digit, that is a number between 0-9 or
a-f.
The best storedepth to choose depends on the abilities of your
filesystem. For most Unix filesystems and also NTFS, a storedepth of 1
seems to be a good choice. Whenever Entropy is looking for a key or
looking for a place to store a key, your system has to traverse the
directory contents to see if a given filename already exists. If you
intend to have a huge store, it will probably be wise to not use a flat
directory, that is: all files in one directory (storedepth=0), because
then there will be several hundred thousand or even million filenames
to be scanned for every action.
On the other hand it is not wise to choose a deep nesting for the
directories, if you're going with the default store size or not much
more. The reason is that your system will be able to keep some
directories in memory. The system will try to do this for some recently
accessed directories, but perhaps not for 256 (storedepth=2) or even
4096 (storedepth=3). On FreeBSD on a UFS file system I got the best
results with storedepth=1.
Now how does Entropy decide where to look for a file? Each file's name
is a 40 hex digit string. It represents the SHA1 hash value for the
file (= key). For most files, this is the SHA1 hash of the contents of
the file: for the bit chunks. For redirecting keys (CHK@, SSK@ and
KSK@) however, the file's name is the SHA1 hash of the contents of the
file you would get if you reconstructed the contents from the list of
chunks inside the key.
Right now the contents of the CHK@, SSK@ and KSK@ keys are not
encrypted. They are stored as plain XML text files - possibly gzipped
for larger files. In the next stage of Entropy's development, those
keys will be encrypted. Only then will no one (not me, not you, no
authority) be able to tell exactly what is contained in your data
store.
Until then it is possible to scan for the XML texts, reconstruct a
key hierarchy and then check whether and which chunks of the keys are
in your data store. This is bad and will thus become impossible soon.
For the details of how a directory for a key is chosen by Entropy, take
a look into src/store.c. The last four digits of the SHA1 hash are used
to pre-select a directory and determine the routing. The fourth digit
from the right is used to determine the first directory level, the
third digit for the second level (if any), the second digit for the
third level (if any). The last digit is not used in the store, but to
build fingerprints and to decide on routes.
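Based on that description, the directory selection can be sketched like this (a simplified reading of src/store.c with an invented function name; the real code may differ in detail):

```python
def store_path(base, sha1hex, storedepth):
    """Pick the store path for a key from its SHA1 hex digits, as
    described above: the 4th digit from the right selects directory
    level 1, the 3rd digit level 2, the 2nd digit level 3. The last
    digit is left for fingerprints and routing decisions."""
    assert len(sha1hex) == 40 and 0 <= storedepth <= 3
    levels = [sha1hex[-4], sha1hex[-3], sha1hex[-2]][:storedepth]
    return "/".join([base] + levels + [sha1hex])
```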
e) Freenet Client Protocol
The Freenet Client Protocol, as the name suggests, was designed by the
developers of Freenet. It is a specification for a set of names and
conventions for a client application to talk to a Freenet node. Entropy
implements a very similar interface, except for some minor differences
that should not hurt any well designed client (some differences do hurt
FCP clients, but that is because they are relying on undocumented
details or definitions of Freenet).
A Freenet node usually listens on 127.0.0.1:8481 for FCP clients,
that's why I've chosen the default port number (or service) 8482 for
Entropy. You can run both, Freenet and Entropy, on the same machine
without collisions.
After a client made a socket connection to the FCP port, it will send a
request that is plain text for the most part. But prior to any text, it
must send a header of four bytes. This header might be used for
protocol identification in the future. Right now, the only possible and
accepted header is 00 00 00 02 - this is true for Freenet as well as
for Entropy and all FCP clients seem to have this hardcoded.
The text part of every request consists of two or more lines,
terminated with a newline (\n in C notation). The first line is the type
of request that is sent. Then there can be zero or more lines with
parameters for the request, and finally there is a message-terminating
line.
The list of commands or requests that Entropy understands is as follows:
ClientHello ... EndMessage
ClientGet ... EndMessage
ClientPut ... Data
GenerateCHK ... EndMessage
ClientDelete ... EndMessage
GenerateSVKPair ... EndMessage
FECSegmentFile ... EndMessage
FECEncodeSegment ... Data
FECDecodeSegment ... EndMessage
FECMakeMetadata ... Data
GenerateSHA1 ... Data
ClientHello
The client says "Hello!" to the node and expects the node to reply.
Nodes are friendly beings and usually tell who they are and what they
are willing to do. The request has no parameters, so the second line
is an EndMessage text. After sending this line, you should read from the
socket until end-of-file and/or until you receive an EndMessage line.
So this is what your client application should send on the socket
connection to an FCP server (values in square brackets are binary, i.e.
bytes):
[00][00][00][02]
ClientHello
EndMessage
Then you can expect the server to reply with some lines like this:
NodeHello
NodeHello
Node=ENTROPY,0,3.0,215
Protocol=1.2
MaxFilesize=1fffff
EndMessage
This is the reply you'll receive from Entropy. The Node= line contains
the node's name plus version and build numbers (they will increase with
every release). The Protocol= line describes the current version of the
implemented FCP features. Entropy is not fully compatible with Freenet
(yet) but replies with 1.2 anyway so as not to confuse some clients. The
MaxFilesize= line is the smaller of 2GB - 1 and storesize - 1.
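The exchange above is simple to build and parse by hand. A minimal sketch (both function names are invented for illustration; a real client would also need the socket handling and the other reply types):

```python
def build_client_hello():
    # 4-byte protocol header, then the plain-text request
    return b"\x00\x00\x00\x02" + b"ClientHello\nEndMessage\n"

def parse_node_hello(reply):
    """Turn the Field=value lines of a NodeHello reply into a dict.
    Illustrative only: no handling of other reply types or errors."""
    fields = {}
    for line in reply.decode().splitlines():
        if line in ("NodeHello", "EndMessage"):
            continue
        name, _, value = line.partition("=")
        fields[name] = value
    return fields
```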
As of build number 284 and newer, Entropy defines an additional
protocol header that is used to support clients which understand some
additional replies sent by the node. The header for this extension is:
[00][00][01][02]
ClientHello
EndMessage
If your client sends this header, there will be two more fields in the
NodeHello reply:
NodeHello
Node=ENTROPY,0,3.0,215
Protocol=1.2
MaxFilesize=1fffff
MaxHopsToLive=a
SVKExtension=BCMA
EndMessage
These two fields, MaxHopsToLive and SVKExtension, are unique to
Entropy. Freenet does not return them for a NodeHello. The default
SVKExtension for a client should be PagM (if it is intended to run on
both Freenet and Entropy), because this is how Freenet extends a
public sub space key (SSK@). For Entropy this extension is BCMA. So if
your client sees the line SVKExtension=BCMA, you should change your
(otherwise hard-coded?) defaults. The MaxHopsToLive value can be used
to scale your client's range of retries to insert or fetch data.
Freenet usually is configured to support a maximum HopsToLive of 25
(decimal; 19 hex). However, there is no way for a client application to
know the current setting.
ClientGet
ClientGet is the work-horse of the FCP. It is used to request data that
was inserted under a specific key. Each ClientGet request must at
least specify the URI of the request (uniform resource identifier) and
should specify a HopsToLive= value. In case of Entropy, the URI is of
the form entropy:CHK@xxxxxxxx,yyyyy for a content hash key,
entropy:SSK@xxxxxxxx,yyyyyy/path/file for a file below a sub space key
or entropy:KSK@somename for a key signed key (which in Entropy is
nothing but a redirecting key to the CHK@ of the contents of the file).
The HopsToLive= value is specified as hexadecimal digits, e.g.
HopsToLive=a for a request that should go 10 hops at most. So a request
looks like this:
ClientGet
URI=entropy:KSK@gpl.txt
HopsToLive=19
EndMessage
This would request the infamous test key gpl.txt with a hops-to-live
value of 25 (19 hex is 25 decimal). The reply from the node depends on
several things: whether your request was formally okay, whether the key
can be requested (outgoing connections with sufficient routes for the
key), and whether the data is available in the local store or arrives -
perhaps - after some incoming connection sent it. Here's the list of
possible replies:
NodeFailed
NodeFailed
Reason=Some description of the reason why the request failed EndMessage
The node failed to successfully complete the get request, usually
because of some internal problem. You shouldn't see this reply too
often, except for broken builds or bad configurations.
URIError
URIError
URI=xxx
EndMessage
Your URI was wrong. This could be for one of several reasons:
1. You specified a CHK@ in the wrong format (is it perhaps a Freenet
CHK@?).
2. You specified an SSK@ without the correct SVKExtension (BCMA for
Entropy).
3. The URI is too long or contains a typo. Watch out for URL-encoded
strings; you will have to decode them. Characters of an Entropy URI
that might be URL-encoded include the at sign (@), the tilde (~) and
the colon (:).
RouteNotFound (aka RNF)
RouteNotFound
Reason=No route found
EndMessage
A suitable route for requesting a key (either the main key or one or
more of its internally following bit chunk keys) could not be found,
even after some retries. This message is to be expected more often,
especially in these cases:
1. Your node has no or very few outgoing connections. Be sure to check
that your seed.txt has some valid entries and that your DNS is able to
resolve the nodes listed there.
2. Your node is overloaded with requests and hardly finds enough free
queue entries in outgoing connections to handle your local requests.
You could try to lower your inbound bandwidth or increase your outbound
bandwidth. As a last resort you might try to use HopsToLive values
closer to the maximum, because this leads to a broader spreading of
keys (to not so well matching routes, too).
DataNotFound (aka DNF)
DataNotFound
Reason=No data found
EndMessage
This is the most common message. A key you requested could not be
found. This could be for one or more of several reasons:
1. You have no inbound connections (look at /node/peers.html).
2. The key never existed.
3. The key existed but it fell out of the network, because all
participating nodes' data stores were full.
4. The key is somewhere, but it did not make it to the nodes that your
node has contacts to (or to be more specific: nodes that did contact
your node).
Only in the last case does it make sense to try fetching the key
again, perhaps with an increased HopsToLive value. It can also make
sense to retry with the same or an even lower HopsToLive value, because
it could simply be high network load that kept the key from arriving
close to your node yet.
The general format of a ClientGet request is:
ClientGet
URI=xxx
[HopsToLive=xx]
[MaxFileSize=xxxx]
[Verbose={boolean}]
EndMessage
The lines in square brackets are optional fields for the request. Note
that the MaxFileSize= and Verbose= fields are Entropy extensions;
Freenet does not support them. MaxFileSize= expects a hexadecimal value
for the maximum size your application wants to receive. This can be used
to limit the impact of retrieving unknown keys in some polling client
application, if there are malicious spammers polluting your name space.
Entropy uses this option itself, internally, to limit the size of news
messages to 64K. No message longer than this limit will be retrieved or
displayed.
The Verbose= option is also Entropy specific. Freenet sends a reply
only in case of errors or retries (Restarted message). Entropy can be
more verbose. With Verbose=true (or Verbose=yes, Verbose=1), your
client will receive a NodeGet message for every fragment that Entropy
is trying to collect and reconstruct. The format is this:
NodeGet
URI={internal-fragment-key}
Offset=xxx
DataLength=xxx
Percent=dd%
EndMessage
You can use this info to display some kind of progress information for
long lasting downloads. Note that the Offset and Percent values may
"jump back" after an internal retry.
The general reply for successful request is:
DataFound
DataLength=xxxx
MetadataLength=xxxx
EndMessage
This is the general format of the node's reply if the data you
requested could be retrieved. The DataLength and MetadataLength lines
tell your application what amount of meta data and data will follow
now. The DataLength is the total number of bytes following, including
meta data! So if you get a reply with DataLength=123 and
MetadataLength=23, this means you should expect 23 hex (35 decimal)
bytes of meta data first, followed by 100 hex (256 decimal) bytes of
document data. I don't know who invented this specification - I would
have designed it differently; anyway, the meta data and data now follow
in packets:
The DataChunk reply
DataChunk
Length=xxxx
Data
{raw binary data}
This is how the node sends the meta data and data to your client
application. There is no guarantee about alignment of meta data (if
any) and data chunks. For Entropy the meta data part will come in its
own chunk (or multiple chunks), but you cannot rely on this. You will
have to count the number of bytes that were given in the DataFound
header to know when meta data ends and raw document data starts.
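Once all DataChunk payloads are concatenated, splitting meta data from document data is just byte counting with the two hexadecimal lengths from the DataFound header. A sketch (the helper name is invented):

```python
def split_payload(datalength_hex, metadatalength_hex, payload):
    """Split the reassembled byte stream that follows DataFound.
    As described above, DataLength is the TOTAL byte count including
    meta data, and both length fields are hexadecimal."""
    total = int(datalength_hex, 16)
    meta_len = int(metadatalength_hex, 16)
    if len(payload) != total:
        raise ValueError("short or long read: got %d of %d bytes"
                         % (len(payload), total))
    return payload[:meta_len], payload[meta_len:]
```

For DataLength=123 and MetadataLength=23 this yields 35 bytes of meta data and 256 bytes of document data, as in the example above.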
ClientPut
GenerateCHK
ClientDelete
GenerateSVKPair
FECSegmentFile
FECEncodeSegment
FECDecodeSegment
FECMakeMetadata
GenerateSHA1
This command is a helper for clients that do not have a native function
to generate a SHA1 hash for a sequence of data bytes. Many languages
have this type of function, or a library you can include, and it will
usually be faster to use one of those. Sending huge files over a
socket can be adventurous, so use this only as a last resort.
10) i18: Supported Languages
Currently English and German are supported. Entropy has an external
translation file that allows for easy translation into additional
languages. Entropy supports UTF-8.
11) P2P
There have always been ways to protect one's e-mail communication. You
have probably heard about Pretty Good Privacy (PGP) from Phil
Zimmermann; or you have heard about the GNU Privacy Guard (GnuPG or
short GPG), which comes as free software under the GPL (GNU General
Public License). Those programs are very well suited to make it hard
for eavesdroppers to read all your electronic communication with others.
But there are a lot of privacy concerns in unveiling with whom you had
e-mail contact and when, even if the contents of the communication are
encrypted. That's why I was looking 'for more'. Please don't take this
too seriously ;-) It is not that I want to replace or overcome PGP or
GPG, but rather look for another way to anonymize data and information
exchange. And I want to achieve impossibility of censorship, if this
goal is reachable at all.
a) What is a Peer-To-Peer network (P2P)?
Peer-To-Peer means 'hand-in-hand' or 'face-to-face' and describes how
computers are connected to each other. The most commonly used form of a
connection on the Internet is where one computer does the job of a
server and where there are one or more (many) clients. This approach
makes a distinction between who serves and who requests. True
peer-to-peer solutions on the other hand let computers play all roles:
the server, the client and sometimes also the router or forwarder. That
means that while many protocols such as HTTP or FTP have dedicated
machines for the server job, where data is stored in one location and
always retrieved from there, many peer-to-peer networks have no
dedicated servers and data is instead spread over many computers in the
network. This is the main difference between the re-invented p2p
technology and many established services.
The difference between a server and a client on the Internet isn't
actually too big. For a small server, which does not have to serve
hundreds or thousands of requests, any 'normal' Internet connection
would be sufficient. Now a p2p network uses the fact that small
networks with few connections can be handled by almost any
(non-dedicated) computer. The network as a whole may hold a lot of data
and also give a high speed at accessing the data, but these accesses do
not have to be handled by a single, big machine. The 'nodes' of a p2p
network are serving the content together, in a distributed manner. And
the nodes also play the role of forwarding or routing the data. The
main job of peer-to-peer software is to define and implement a
routing protocol by which data can be put into and retrieved from a
network of nodes. This routing does connect 'neighbor' nodes to each
other, where neighbor does not necessarily mean a geographically short
distance, but a short distance in the network topology.
If some connections between computers are established they can be used
to exchange texts like e-mails or messages like ICQ, or data and files.
The files could be pictures, music, films... anything which you can
store electronically on your computer. The p2p network software now
assigns a key (or a shortcut) to any such file or message, and it is
these keys that make their way through the network first. It would not
make much sense to simply use one's local filenames as keys. Many
people will use the same names for their files, like 'text1.txt', and
still have different contents in those files. So what is needed is a
system to categorize and label files in a usable way. And it must be
possible to look for a certain file or message on the network and
identify it correctly.
b) About keys and hashes - specifically SHA1
The key to bringing some order into the chaos is the key, or hash
value, of a file. A hash value is something like a shortcut or a handle
describing the contents of a file in a short but unique way. There are
several methods to find such unique hashes for a file's contents. One of
the newer algorithms used for this is SHA1 (secure hash algorithm 1),
which is publicly specified.
To describe the function of a hash value in a non-technical way, you
could assume the aim was to find a unique name for every file on your
hard disc. Unique not only for your hard disc, but unique in the whole
world (or even unique in the universe and all other universes). If
you were going to use filenames like 'file1.txt', 'file2.txt' and so on,
you would not get far. The 'trick' of SHA1 is that it does
something like adding up the characters (or bytes) of a file. Of course
it does not simply add the values of the bytes, because otherwise a file
containing '12' would yield the same SHA1 hash as a file containing
'21' (and it does not). The effect however should be clear: every file
gets its own unique 'number' assigned based on its contents. This
'number' is the hash value, which in the case of SHA1 is a 160-bit
number.
It might seem astonishing that it should be possible to uniquely
identify any file, but the length of the hash value and the quality of
the hash algorithm are the reasons why unique hashes are possible. As I
said before, the SHA1 hash length is 160 bits and so there are 2^160
possible keys.
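The '12' versus '21' example above is easy to check with any SHA1 implementation, for instance Python's hashlib:

```python
import hashlib

# Two files that would "add up" to the same byte sum still get
# completely different SHA1 hashes, because SHA1 depends on the order
# of the bytes, not just on their sum.
h12 = hashlib.sha1(b"12").hexdigest()
h21 = hashlib.sha1(b"21").hexdigest()

assert h12 != h21
assert len(h12) == 40   # 160 bits = 40 hex digits
```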
Do you remember the story of the emperor of China, who was told about a
new game 'chess' and who wanted to give something to the inventor of
this game? The inventor seemed to wish only a small fee for his
invention: he asked the emperor to put one rice corn on the first
field, two corns on the second field, four corns on the third field and
so forth... The emperor was first impressed by the modesty of the
inventor. However, he did not expect what this wish would really mean
when he agreed to pay the price. The number of corns on the fields is
2^64 - 1, which is somewhat above 18'446'744'073'000'000'000 - and I
don't know if that many rice corns have grown on earth since rice was
first cultivated. Now, if you got that figure, imagine as many universes
with
earths, as the number of rice corns would have been and in every
universe a Chinese emperor with a problem understanding the exponential
growth. And if you finally have that idea embraced somehow, imagine as
many collections of universes each with gazillions of emperors with
unavailable amounts of rice corns: this is SHA1.
So we have solved one problem: we can assign a unique number to any
file, any name and even any version of any text where only a single
letter is modified. This is what we need to tell anyone else in the
world exactly which file (text, picture, music) he could get from us.
Nobody can really remember 40-digit numbers (which is the length of
160-bit numbers written down in hexadecimal notation). Even the fact
that ENTROPY, just like Freenet, uses a different notation called
'base64', which reduces the length of the numbers to 27 digits, doesn't
help too much. Computers, however, have no problem juggling 160-bit
numbers. It is their everyday job, and so the numbers aren't a problem
at all. And for the human beings handling files, there is another type
of key, which I will describe now: key signed keys.
c) Content Hash Keys and Key Signed Keys (CHKs and KSKs)
The keys described in the previous paragraph are called content hash
keys in technical terms. This term isn't restricted to Freenet or
ENTROPY but is rather a widely used term for the functionality of this
type of key, which assigns a unique key to the contents of a file. This
is why ENTROPY, too, uses the key type CHK, even if the technical
details differ from those found in Freenet.
A name in Freenet, and thus in the ENTROPY project, results in a unique
SHA1 hash value, too. For this purpose the name itself is treated as
file contents, that is, the characters making up the name are the
contents. Such a name does not reference a file's contents directly,
though. It references another SHA1 hash where the contents can be found.
This kind of key is called a Key Signed Key, or KSK for short. Entropy
uses this terminology just like Freenet does, though there are some
differences in the implementation.
You can find the contents of the GNU General Public License below a KSK
with the name gpl.txt. In a browser window you can retrieve /gpl.txt,
/KSK@gpl.txt or, written in the long notation, /Freenet:KSK@gpl.txt. If
you retrieve this key, the Entropy code will quietly forward your
request to the content hash key of the file gpl.txt. You can try it
yourself if your setup uses the default values for the fcpproxy address
and port. With a system like this I could, in theory, insert the
contents of my entire hard disc into Entropy below KSKs like
Freenet:KSK@pullmoll/harddisc/file1.txt etc. Anyone who knows the first
part of the name (pullmoll/harddisc) and the filenames would then be
able to fetch the files.
A very welcome and positive side effect of splitting the names from the
contents is that if two people insert exactly the same file under
different names, the network will only hold one copy of the data plus
the two (or more) references to the contents. In fact
re-using identical data goes even further, because files are split into
fragments and chunks when they're inserted, so even identical fragments
of two files will share the same data blocks (chunks).
It is important to keep in mind that this forwarding of names to
contents, which key signed keys permit, is also a weak point. Keys of
the type KSK are insecure, because there is no guarantee that you will
find what a name suggests below the key. Simply put, there could be two
people running nodes, unconnected and unknown to each other, who insert
two different files under the same name. What a requester would
retrieve when asking for a KSK would then depend on which node replied
quicker to his request. If you want to be sure to retrieve a specific
file, you have to request a CHK key. Those are impossible (as far
as I know) to fake. But it would be quite possible to insert e.g. a
picture below the KSK@gpl.txt from a newly connected (and not yet well
connected) node. This applies to Freenet, too, and recently someone
managed to fake the Freenet copy of KSK@gpl.txt, so you got a picture
of some naked girl when you asked for that key.
So you should always be aware that a KSK is even less secure and more
questionable than a CHK. It is practically impossible to fake a CHK,
and furthermore ENTROPY checks that only valid blocks (with matching
SHA1 hashes) are inserted into a node's data store.
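That validity check can be pictured as a node recomputing the SHA1 hash of an incoming block and comparing it with the claimed key. This is a minimal sketch of the principle, not Entropy's actual block format:

```python
import hashlib

def verify_block(claimed_key: str, block: bytes) -> bool:
    """A node accepts a block only if its SHA1 hash matches the claimed key."""
    return hashlib.sha1(block).hexdigest() == claimed_key

data = b"GNU GENERAL PUBLIC LICENSE ..."
key = hashlib.sha1(data).hexdigest()     # the CHK-style key for this block
assert verify_block(key, data)           # a genuine block passes
assert not verify_block(key, b"a fake")  # a substituted block is rejected
```

A faked CHK would require finding different data with the same SHA1 hash, which is why CHKs are so much harder to attack than KSKs.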
Fortunately there is a way out of this dilemma, which is induced by the
KSK key type, and it is called Sub Space Keys, or SSK for short. A sub
space is a region of the network where only one user - the one who
knows a private key - can insert files.
d) Sub Space Keys (SSKs)
If every user of Freenet or ENTROPY had his own space, where only he
had the right to publish data, there would be no more name collisions.
This is what Sub Space Keys achieve. An SSK is really a key prepended
to a file or path name, and only one person has the right to use this
particular sub space. Of course, several persons could share a private
key and publish under the same SSK. However, to keep things simple, we
assume for now that the private key is in one person's hands.
Freenet uses a method known as DSA (Digital Signature Algorithm). An
electronic signature of the sender, i.e. a signed file or message, can
be verified against a known signature template. It means that in
Freenet your node can verify - with very high probability - that a
file really comes from its claimed sender. However, dealing with
electronic signatures and verifying them at the receiver's end is a
very time-consuming job. It is all about calculations with very big
numbers, which are difficult to handle even on today's computers. I am
not yet sure whether we need this scheme for Entropy, too. The reason I
doubt it is that ENTROPY's main goal is to give you an anonymous way to
communicate, not to give you guaranteed authenticity of content. There
are other tools (e.g. GnuPG) to take care of that part of
communications. And finally, you can never trust content from an
anonymous source 100% anyway. If you want to ensure authenticity of
messages, you should create a private and public key pair with GnuPG
specifically for Freenet or Entropy, just as you would for the
(untrustable) e-mail transfers.
What Entropy does is give the average user no chance to accidentally
overwrite another user's sub space keys. It does so by simply hashing
(SHA1) a random number (the private key), which yields the public key.
This is a non-reversible operation, so no one is able to guess the
private key from a publicly visible sub space key. However, since
mapping an SSK@something/file is done by simply creating the content
hash key for this string, a malicious attacker could guess e.g. the
date-based redirect or next-edition key of a sub space and fake it by
widely distributing data under a wrong CHK@. I want to see this happen
before I continue to think about a solution to this type of attack.
For now, just be sure not to assume anything about files popping up
under a certain sub space key. An Entropy sub space key tells you
nothing about the authenticity of the contents. And I must admit that
this could even be seen as an advantage, since no one can prove that a
file under a certain SSK@ came from you either.
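The private-to-public derivation described above - hashing a random private key with SHA1 to obtain the public sub space key - can be sketched like this. Key size and helper names are illustrative assumptions:

```python
import hashlib
import os

def new_subspace():
    """Sketch of the principle: derive the public sub space key by one-way
    hashing a random private key. Sizes and names are assumptions, not
    Entropy's exact key format."""
    private_key = os.urandom(20)                        # secret random number
    public_key = hashlib.sha1(private_key).hexdigest()  # non-reversible step
    return private_key, public_key

private_key, public_key = new_subspace()
# Publishing under SSK@<public_key>/... reveals only the hash; recovering
# private_key from public_key would require inverting SHA1.
```

Because the derivation is one-way, publishing the public key gives an attacker nothing to work with; only guessing the full random private key would let him write into the sub space.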
To sum it up: SSKs in ENTROPY are used as a means to avoid name
collisions, not as certificates of authenticity. Deriving any form of
security from SSKs is the wrong approach. No one can say that content
which appears under my SSKs really came from me - and this applies to
anyone's content, too. We will see whether this is a problem or an
advantage. As long as cooperation and exchange of data is the main
interest of the majority, it should not be a problem.
I can't stress the consequences enough: do not execute programs
downloaded from this network. You should never run programs from an
unknown, untrusted source. Anyone who does could just as well hand his
door keys to strangers on the street. You can do this, but think about
what you are doing.
Finally, a comment on Freenet, DSA and signed contents. If you trust
this system, it means you trust the algorithm, the implementation, and
the current source code or binary on your machine, which (supposedly)
does DSA. Did you understand DSA? Did you understand the
implementation? Did you read the source code and verify that it really
detects wrong signatures? What I want to say is this: there is a long
chain of things to verify and double check. Security is not in a
program; security is a whole system. Or how do you verify that pgp.exe
or /usr/local/bin/gpg is still the binary that came out of verified
source code, and not a Trojan? I admit: I don't...
e) McEliece Crypto & MECH
Error correcting codes in a public key algorithm. McEliece
cryptography is used in Entropy and in MECH (McEliece Crypto Harness).
Entropy uses this method to encrypt the initial node communications.
MECH is a simple PGP or GnuPG substitute featuring the crypto routines
from Entropy: the McEliece public key cryptosystem for the public and
private keys, and the Lili2 PRNG bit stream used to encrypt messages
and the secret keys.
Starting from the observation that intentionally induced errors in a
cipher text make it much harder for a cryptanalyst to decipher the
message, McEliece in 1978 had the idea of using error correcting codes
as the basis for a crypto system. In this system the generator matrix
of a Goppa code is converted into a general linear code by matrix
multiplication. Since decoding a general linear code is NP-hard [1],
while decoding a Goppa code takes linear time, the matrix
multiplication can be seen as a one-way function, and the secret
decomposition into the individual matrices as the trapdoor information.
[1] Arto Salomaa, Public-Key Cryptography. Springer, EATCS Monographs
on Theoretical Computer Science, Vol. 23 (1990).
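The key construction described above can be illustrated with a toy example. Note the deliberate simplification: a tiny (7,4) Hamming generator matrix stands in for the large Goppa-code generator a real McEliece system uses, and all helper names are hypothetical. Only the public product S*G*P and the encryption step are shown; decryption would need the secret decoder:

```python
import random

# Generator matrix G of a (7,4) Hamming code - a small stand-in for the
# Goppa code generator used in real McEliece systems.
G = [[1, 0, 0, 0, 1, 1, 0],
     [0, 1, 0, 0, 1, 0, 1],
     [0, 0, 1, 0, 0, 1, 1],
     [0, 0, 0, 1, 1, 1, 1]]

def matmul(A, B):
    """Matrix product over GF(2)."""
    return [[sum(A[i][k] & B[k][j] for k in range(len(B))) % 2
             for j in range(len(B[0]))] for i in range(len(A))]

def gf2_rank(M):
    """Rank of a matrix over GF(2) by Gaussian elimination (destroys M)."""
    rank = 0
    for col in range(len(M[0])):
        pivot = next((r for r in range(rank, len(M)) if M[r][col]), None)
        if pivot is None:
            continue
        M[rank], M[pivot] = M[pivot], M[rank]
        for r in range(len(M)):
            if r != rank and M[r][col]:
                M[r] = [a ^ b for a, b in zip(M[r], M[rank])]
        rank += 1
    return rank

def random_invertible(n):
    """Random invertible n x n scrambler matrix S over GF(2)."""
    while True:
        S = [[random.randint(0, 1) for _ in range(n)] for _ in range(n)]
        if gf2_rank([row[:] for row in S]) == n:
            return S

def permutation(n):
    """Random n x n permutation matrix P."""
    perm = list(range(n))
    random.shuffle(perm)
    return [[1 if perm[i] == j else 0 for j in range(n)] for i in range(n)]

S, P = random_invertible(4), permutation(7)
G_pub = matmul(matmul(S, G), P)   # public key; S and P stay secret (trapdoor)

m = [1, 0, 1, 1]                  # 4-bit message
c = matmul([m], G_pub)[0]         # encode with the scrambled generator
c[random.randrange(7)] ^= 1       # deliberately add one bit error
```

Without knowing the decomposition into S, G and P, an attacker faces decoding of an apparently unstructured linear code, which is the hard problem the scheme rests on.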
Key creation, Encryption, Decryption, and Security details as well as a
mathematical example are available on the Entropy website.