October 24, 2004
  1. Raining Sockets – Raining Sockets is a non-blocking sockets framework which eases the job of creating a highly scalable application that can receive and send over 10000 socket connections. The SocketBlaster (NIO based Load Tester) and Server (very basic NIO Server) program (both extended from NioSocket.java) have been successfully tested with 13000 connections on 2 Xeon machines. The Raining HTTP Server is an NIO based HTTP Server that aims at implementing most of HTTP 1.1
  2. Reattore – Reattore is a simple single threaded HTTP server written in Java. Unlike most Java server applications, Reattore uses the socket channel features added in Java 1.4 to serve all requests from one thread, instead of spawning each request off to a new thread. In theory this provides better performance and allows the system to degrade well under high load.
  3. UberMQ – UberMQ is a clean room implementation of the Java Message Service specification. JMS is a part of the Java 2 Enterprise Edition. We wrote UberMQ because many of the established JMS vendors have turned their back on the core tenets of distributed computing: fast and simple. Our technology is implemented with Java NIO, allowing you to scale your message traffic without thousands of threads and the associated performance penalties.
  4. EIO – The EIO package (which stands for “Event Input/Output”) offers a networking API in Java that is efficient and very simple. This provides a third alternative to the two major Java networking APIs: the traditional one-thread-per-connection model (java.net), and the new non-blocking model (java.nio). EIO provides a simplified wrapper around java.nio.
  5. Mule – Mule is a light-weight, event-driven component technology. It is highly scalable, using ideas from SEDA
  6. EmberIO – EmberIO is an I/O event system centered around Java 1.4 NIO. Supports both request-process-response models and many asynchronous models. Configurable thread pooling mechanisms and event processing models
  7. Bamboo DHT – A distributed hash table, or DHT, is a building block for peer-to-peer applications. At the most basic level, it allows a group of distributed hosts to collectively manage a mapping from keys to data values, without any fixed hierarchy, and with very little human assistance. This building block can then be used to ease the implementation of a diverse variety of peer-to-peer applications such as file sharing services, DNS replacements, web caches, etc.
  8. JicarillaHTTP – A componentized, scriptable, event-based webserver, based on the JDK 1.4 New I/O (nio) package.
  9. NBServer – A framework that takes most of the pain out of writing non-blocking network servers in Java 1.4. It’ll shield you from low-level details of output buffering, selector management etc. leaving you to concentrate solely on writing the network protocol.
  10. JADE Struct/Union – Java classes for direct interoperability with C/C++ applications. Base classes analogous to C/C++ struct & union (same storage layout, alignment rules, bit-field support, etc.). Memory sharing between Java applications and native libraries. Direct encoding/decoding of streams for which the structure is defined by legacy C/C++ code. Serialization/deserialization of Java objects (complete control, e.g. no class header). Mapping of Java objects to physical addresses (with JNI).
  11. Kowari – The storage engine of Kowari is a transactional triplestore known as the XA Triplestore. All relevant fields of in-memory and on-disk data structures are 64 bits wide. System crashes caused by power failures and some types of hardware fault will not cause data corruption. The on-disk data structures of the triplestore are designed to be kept in a consistent state at all times while minimizing the overhead required to achieve this. NIO file channels allow multiple threads to concurrently read and write different parts of the same file without having to use thread synchronization to protect the current file position. On 32 bit platforms the amount of virtual memory that is available for mapping files is usually limited to less than 2 GB.
  12. Netty2 – Netty 2 provides an easy event-based API (like Swing) to develop high-performance, maintainable TCP/IP server/client applications. Netty handles many essential features such as readiness selection, thread pooling, and buffer reuse, which are required to build high-performance, high-capacity network applications.
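
The Kowari entry above mentions that NIO file channels let several threads read and write different parts of one file without synchronizing on a shared file position. A minimal sketch of that idea, using the positional read(ByteBuffer, long) and write(ByteBuffer, long) methods of FileChannel, might look like this. The class and method names are made up for illustration and are not taken from Kowari, and the code uses lambdas and java.nio.file, which arrived in Java versions later than the 1.4 release discussed here.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical illustration; not Kowari code.
final class PositionalIoSketch {

    /** Writes all of data at an absolute offset, leaving the channel's own position untouched. */
    static void writeAt(FileChannel ch, long offset, byte[] data) {
        try {
            ByteBuffer buf = ByteBuffer.wrap(data);
            while (buf.hasRemaining()) {
                offset += ch.write(buf, offset); // positional write
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Reads exactly length bytes starting at an absolute offset. */
    static byte[] readAt(FileChannel ch, long offset, int length) {
        try {
            ByteBuffer buf = ByteBuffer.allocate(length);
            while (buf.hasRemaining()) {
                if (ch.read(buf, offset + buf.position()) < 0) {
                    throw new IOException("unexpected end of file");
                }
            }
            return buf.array();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Two threads write disjoint regions of one temp file; returns the resulting contents. */
    static String demo() {
        try {
            Path file = Files.createTempFile("positional", ".bin");
            try (FileChannel ch = FileChannel.open(file,
                    StandardOpenOption.READ, StandardOpenOption.WRITE)) {
                Thread a = new Thread(() -> writeAt(ch, 0, "AAAA".getBytes(StandardCharsets.US_ASCII)));
                Thread b = new Thread(() -> writeAt(ch, 4, "BBBB".getBytes(StandardCharsets.US_ASCII)));
                a.start();
                b.start();
                a.join();
                b.join();
                return new String(readAt(ch, 0, 8), StandardCharsets.US_ASCII);
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }
}
```

Because each call carries its own offset, neither writer depends on (or disturbs) the position the other one sees, which is the property the Kowari description relies on.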
September 21, 2004
Thread per connection : NIO, Linux NPTL and epoll
Author: b_rahul
Jun 18, 2004 3:27 PM    
I have been benchmarking Java NIO with various JDKs on Linux. The server is
running on a 2-CPU 1.7 GHz machine with 1GB RAM and an Ultra160 SCSI 36GB disk.

With Linux kernel 2.6.5 (Gentoo) I had NPTL turned on and support for
epoll compiled in. The server application was designed to support
multiple dispatch models:

1. Reactor with Iterative Dispatch with multiple selector threads. Essentially
the accepted connections were load-balanced between varying number of
selector threads. The benchmark then applied a step function to experimentally
determine the optimal # of threads and connection per selector ratio.

2. Also a simple concurrent blocking dispatch model was supported. This is
essentially a reader thread per connection model.
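
For readers who want to picture the second model, here is a minimal sketch of a reader thread per connection server. It is not the benchmark code from this post: the class name is invented, the server echoes bytes back instead of un-marshalling messages, and it uses lambdas and try-with-resources from later Java versions than the 1.4.2 JVMs tested here.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration of the blocking, thread-per-connection model.
final class ThreadPerConnectionSketch {

    /** Starts an accept loop on an ephemeral port and returns the listening socket. */
    static ServerSocket start() throws IOException {
        ServerSocket server = new ServerSocket(0);
        Thread acceptor = new Thread(() -> {
            try {
                while (true) {
                    Socket client = server.accept(); // blocks until a client connects
                    Thread reader = new Thread(() -> serve(client));
                    reader.setDaemon(true);
                    reader.start();                  // one dedicated thread per connection
                }
            } catch (IOException e) {
                // accept() fails once the server socket is closed; the loop ends here.
            }
        });
        acceptor.setDaemon(true);
        acceptor.start();
        return server;
    }

    /** Blocking read loop for a single connection; echoes whatever it receives. */
    private static void serve(Socket client) {
        try (Socket s = client) {
            InputStream in = s.getInputStream();
            OutputStream out = s.getOutputStream();
            byte[] buf = new byte[4096];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
        } catch (IOException e) {
            // Peer went away; dropping the connection is enough for this sketch.
        }
    }

    /** Sends "ping" over a loopback connection and returns the echoed reply. */
    static String demo() {
        try (ServerSocket server = start();
             Socket s = new Socket(InetAddress.getLoopbackAddress(), server.getLocalPort())) {
            s.getOutputStream().write("ping".getBytes(StandardCharsets.US_ASCII));
            s.getOutputStream().flush();
            byte[] reply = new byte[4];
            new DataInputStream(s.getInputStream()).readFully(reply);
            return new String(reply, StandardCharsets.US_ASCII);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The appeal of this model is visible in serve(): the code is a plain loop with no selector, no interest sets, and no partial-read bookkeeping, at the cost of one thread (and its stack) per connection.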

Client application opens concurrent persistent connections to the server
and starts blasting messages. Server just reads the messages and does
basic un-marshalling to ensure message is ok.

Results were interesting:

1. With NPTL on, Sun and Blackwidow JVM 1.4.2 scaled easily to 5000+ threads. The blocking
model was consistently 25-35% faster than using NIO selectors. A lot of the techniques suggested
by the EmberIO folks were employed: using multiple selectors, doing multiple (2) reads if the first
read returned the Java equivalent of EAGAIN. Yet we couldn’t beat the plain thread per connection model
with Linux NPTL.

2. To work around the not so performant/scalable poll() implementation on Linux, we tried using
epoll with the Blackwidow JVM on a 2.6.5 kernel. While epoll improved the overall scalability, the
performance still remained 25% below the vanilla thread per connection model. With epoll
we needed a lot fewer threads to get to the best performance mark that we could get out of NIO.

Here are some numbers:

(cc = Concurrent Persistent Connections, bs = Is blocking server mode on Flag,
st = Number of server threads, ct = Connections handled per thread,
thruput = thruput of the server )

cc,bs,st,ct,thruput
1700,N,2,850,1379
1700,N,4,425,1214
1700,N,8,212,1240
1700,N,16,106,1140
1700,N,32,53,1260
1700,N,64,26,1115
1700,N,128,13,886
1700,N,256,6,618
1700,N,512,3,184
1700,Y,1700,1,1737

As you can see the last line indicates vanilla blocking server (thread per connection)
produced the best thruput even with 1700 threads active in the JVM.

With epoll, the best run was with 2 threads each handling around 850 connections in
their selector set. But the thruput is below the blocking server thruput by 25%!

The results show that the cost of NIO selectors coupled with the OS polling mechanism (in
this case the efficient epoll vs. selector/poll) has a significant overhead compared to
the cost of context switching 1700 threads on an NPTL Linux kernel.

Without NPTL of course it’s a different story. The blocking server just melts at 400 concurrent
connections! We have run the test up to 10K connections and the blocking server outperformed
the NIO-driven, selector-based server by the same margin. Moral of the story: NIO arrives at the scene
a little too late. With adequate RAM and better threading models (NPTL), the performance gains
of NIO don’t show up.

Sun’s JVM doesn’t support epoll() so we couldn’t use epoll with it. The normal poll() based
selector from Sun didn’t perform as well. We needed to reduce the number of connections
per thread to a small number (~ 6-10) to get numbers comparable to the epoll based selector.
That meant running a lot more selector threads, which kind of defeats the purpose of multiplexed IO.
The benchmarks also dispel the myth created by Matt Welsh et al (SEDA) that a single
threaded reactor can keep up with the network. On 100Mbps ethernet that was true: the network
got saturated prior to the server CPUs, but with a > 1Gbps network, we needed multiple selectors
to saturate the network. A single selector’s performance was abysmal (5-6x slower than
concurrent connections).

For applications that want to have fewer threads for debuggability etc., NIO may be
the way to go. The 25-35% performance hit may be acceptable to many apps. Fewer threads
also means easier debugging; it’s a pain to attach a profiler or a debugger to a server hosting
1000+ threads :-) . Bottom line: with better MT support in kernels (Linux already has it with NPTL), one
needs to reconsider the thread per connection model.

Rahul Bhargava
CTO, Rascal Systems

 

From: http://forum.java.sun.com/thread.jsp?forum=17&thread=531781

From: http://www.onjava.com/pub/a/onjava/2004/09/01/nio.html

by Nuno Santos
09/01/2004

About one year ago, a client of the company where I work asked us to develop a router for telephony protocols (i.e., protocols used for communication between an SMS center and external applications). Among the requirements was that a single router should be able to support at least 3,000 simultaneous connections.

It was clear that we could not use the traditional thread-pooling approach. Most thread libraries do not scale well, because the time required for context switching increases significantly with the number of active threads. With a few hundred active threads, most CPU time is wasted in context switching, with very little time remaining for doing real work. As an alternative to thread pooling, we decided to use I/O multiplexing. In this approach, a single thread is used to handle an arbitrary number of sockets. This allows servers to keep their thread count low, even when operating on thousands of sockets, thereby improving scalability and performance significantly. Unfortunately, there is a price to pay: an architecture based on I/O multiplexing is significantly harder to understand and to implement correctly than one based on thread pooling.

The support for I/O multiplexing is a new feature of Java 1.4. It builds on two features of the Java NIO (New I/O) API: selectors and non-blocking I/O. The article “Introducing Nonblocking Sockets” provides a good introduction to these two features.

In this article, we describe the lessons we learned while designing and implementing our router, focusing on architectural issues such as I/O event dispatching, threading, management of client data, and protocol state. This is not an introductory article; the intended audience is developers that already have a basic knowledge of I/O multiplexing and Java NIO, but haven’t yet used those technologies to develop a full-scale server.

The article includes the source code for an echo server and client based on the architecture described. Both the server and the client are functional and can be compiled and executed without any modification. The source code can also be used as a starting point to develop a full server.

I/O Event Handling

The I/O architecture of our router was strongly inspired by the Swing event-dispatch model. In Swing, events generated by the user interface are received by the JVM and stored in an event queue. Inside of the JVM, an event dispatch thread (implemented in the class java.awt.EventQueue) monitors this queue and dispatches incoming events to interested listeners. This is a typical example of the Observer pattern.

In our router, there is also an event dispatch thread, implemented in the class SelectorThread. As the name suggests, this class encapsulates a selector and a thread. The thread monitors the selector, waiting for incoming I/O events and dispatching them to the appropriate handlers.

The SelectorThread class generates four types of events, corresponding to the operations defined on java.nio.channels.SelectionKey: connect, accept, read, and write. Handlers register with the SelectorThread class to receive events. Depending on the type of events they are interested in, they must implement one of the following interfaces:

  • ConnectSelectorHandler: For establishing outgoing connections.
  • AcceptSelectorHandler: For receiving incoming connections.
  • ReadWriteSelectorHandler: For reading or writing data to a connection.

Figure 1 describes the class hierarchy of these interfaces. We chose not to define a single interface for all of the possible events because a single handler will likely be only interested in some of the events. For instance, a handler that accepts connections will most likely not need to establish connections. This separation allows handlers to implement only the operations they really need.

Figure 1. Class hierarchy for I/O event handlers

The Life of a Handler

One important difference from the thread-per-client model is that all read and write operations are non-blocking, forcing the programmer to deal with partial reads and writes. When the handler receives a read event, it means only that there are some bytes available in the socket’s read buffer. This data may contain either part of a packet, a full packet, or more than one packet. All cases have to be considered while reading. A similar situation occurs when writing. It is only possible to write as much as the space available on the socket’s write buffer. A call to write will return as soon as the buffer space is exhausted, regardless of whether the data has been fully written or not. This has a direct impact on the lifecycle of a handler, which needs to deal with all of these situations.
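
To make the partial-read cases concrete, here is a minimal sketch of packet reassembly. The article does not describe its wire format, so the 2-byte big-endian length prefix is an assumption made purely for illustration, and the PacketAssembler class is hypothetical rather than part of the article's source code.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; assumes each packet is a 2-byte length followed by that many bytes.
final class PacketAssembler {

    // Holds bytes that do not yet form a complete packet. Kept in "write mode".
    private ByteBuffer pending = ByteBuffer.allocate(64);

    /** Appends newly read bytes and returns every packet that is now complete. */
    List<byte[]> feed(ByteBuffer chunk) {
        ensureCapacity(chunk.remaining());
        pending.put(chunk);
        pending.flip(); // switch to reading what has accumulated so far

        List<byte[]> packets = new ArrayList<>();
        while (pending.remaining() >= 2) {
            pending.mark();
            int length = pending.getShort() & 0xFFFF;
            if (pending.remaining() < length) {
                pending.reset(); // body incomplete: un-read the prefix and wait for more data
                break;
            }
            byte[] body = new byte[length];
            pending.get(body);
            packets.add(body);
        }

        pending.compact(); // keep any partial packet and return to write mode
        return packets;
    }

    /** Grows the pending buffer when the incoming chunk would not fit. */
    private void ensureCapacity(int extra) {
        if (pending.remaining() < extra) {
            int needed = pending.position() + extra;
            ByteBuffer bigger = ByteBuffer.allocate(Math.max(pending.capacity() * 2, needed));
            pending.flip();
            bigger.put(pending);
            pending = bigger;
        }
    }
}
```

A single call to feed() may return no packets (only part of a packet has arrived), one packet, or several, which are exactly the three cases described above.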

A handler is basically a state machine reacting to I/O events. Its typical lifecycle is the following:

  1. Waiting for data
    The handler is interested in reading but not in writing, since there is nothing to send to the client. Therefore, it activates read interest and waits for the read event.

  2. Reading
    After receiving the read event, the handler retrieves the available data from the socket and starts reassembling the request. During this state, it is not interested in receiving any type of event. If a packet is fully reassembled, it starts processing it (state 3). Otherwise, it saves the partial packet, reactivates read interest, and continues waiting for data (state 1).

  3. Processing request
    The handler enters this state whenever a request is fully reassembled. While here, the handler is not interested in either reading or writing (assuming that it only processes a request at a time). All interest in I/O events is disabled.

  4. Writing
    When the reply is ready, the handler tries to send it immediately using a non-blocking write. If there is not enough space on the socket’s write buffer to hold the entire packet, it will be necessary to send the rest later (state 5). Otherwise, the packet is sent and the handler can reactivate read interest, waiting for the next packet.

  5. Waiting to write
    When a non-blocking write returns without having written all of the data, the handler activates interest in the write event. Later, when there is space available in the write buffer, the write event will be raised and the handler will continue writing the packet.

Figure 2 shows the state transition diagram of a handler.

Figure 2. State transition diagram of a handler
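
The five states and their transitions can also be written down as a small state machine. The sketch below is an assumption-laden illustration, not the article's handler code: the HandlerLifecycle class, its method names, and the choice to store the interest set on each state are all invented here. Only the states, the transitions, and the SelectionKey constants come from the description above.

```java
import java.nio.channels.SelectionKey;

// Hypothetical model of the handler lifecycle; performs no I/O itself.
final class HandlerLifecycle {

    /** The five states described above, each with the interest set it keeps active. */
    enum State {
        WAITING_FOR_DATA(SelectionKey.OP_READ),
        READING(0),
        PROCESSING(0),
        WRITING(0),
        WAITING_TO_WRITE(SelectionKey.OP_WRITE);

        final int interestOps;

        State(int interestOps) {
            this.interestOps = interestOps;
        }
    }

    private State state = State.WAITING_FOR_DATA;

    State state() {
        return state;
    }

    /** State 1 to 2: the selector reported a read event. */
    void onReadEvent() {
        move(State.WAITING_FOR_DATA, State.READING);
    }

    /** State 2 to 3 when a packet is complete, otherwise back to state 1. */
    void onDataRead(boolean packetComplete) {
        move(State.READING, packetComplete ? State.PROCESSING : State.WAITING_FOR_DATA);
    }

    /** State 3 to 4: the reply is ready to be sent. */
    void onReplyReady() {
        move(State.PROCESSING, State.WRITING);
    }

    /** State 4 to 1 when everything was written, otherwise to state 5. */
    void onWriteAttempt(boolean fullyWritten) {
        move(State.WRITING, fullyWritten ? State.WAITING_FOR_DATA : State.WAITING_TO_WRITE);
    }

    /** State 5 to 4: the selector reported that the socket is writable again. */
    void onWriteEvent() {
        move(State.WAITING_TO_WRITE, State.WRITING);
    }

    private void move(State expected, State next) {
        if (state != expected) {
            throw new IllegalStateException("expected " + expected + " but was " + state);
        }
        state = next;
    }
}
```

Keeping the interest set next to each state makes the rule from the list easy to check: read interest is on only while waiting for data, write interest only while waiting to write, and everything is off in between.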

Dispatching I/O Events

The SelectorThread class is responsible for supporting the lifecycle of the handlers. For that, it manages the following information for each handler:

  • A SelectableChannel: The channel to be monitored for events.
  • The handler itself: The object to be notified of events.
  • An interest set: The set of operations to be monitored.

Handlers must provide these elements when they register themselves. The SelectorThread will then register the channel with the internal selector, using the interest set to activate monitoring of the corresponding I/O operations. The handler is stored as an attachment, which is a convenient way of associating application data with a registered channel. Internally, the following method call is performed to register a handler:

channel.register(selector, interestSet, handler);


After being registered, handlers can activate or deactivate interest in specific I/O events by updating their interest sets. Internally, the SelectorThread class updates the interest set of the corresponding SelectionKey. There is no support for de-registering a channel, since this can be easily accomplished by closing the socket.
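
As a rough illustration of registration and interest-set updates, the sketch below uses only standard java.nio.channels calls (register(), attachment(), interestOps()). The InterestSetSketch class and its helper names are hypothetical, and the bit-mask arithmetic is an assumption about how such updates are typically done, not a copy of what SelectorThread does internally. A Pipe stands in for a socket so that the example needs no network.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.channels.Pipe;
import java.nio.channels.SelectableChannel;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;

// Hypothetical helpers; the article's SelectorThread may differ in detail.
final class InterestSetSketch {

    /** Registers a channel with the handler stored as the key's attachment. */
    static SelectionKey register(Selector selector, SelectableChannel channel,
                                 int interestSet, Object handler) throws IOException {
        channel.configureBlocking(false); // selectors only accept non-blocking channels
        return channel.register(selector, interestSet, handler);
    }

    /** Turns on one operation without disturbing the others. */
    static void addInterest(SelectionKey key, int op) {
        key.interestOps(key.interestOps() | op);
    }

    /** Turns off one operation without disturbing the others. */
    static void removeInterest(SelectionKey key, int op) {
        key.interestOps(key.interestOps() & ~op);
    }

    /** Registers a pipe's read end, toggles read interest, and reports whether all checks held. */
    static boolean demo() {
        try (Selector selector = Selector.open()) {
            Pipe pipe = Pipe.open();
            try {
                Object handler = new Object();
                SelectionKey key = register(selector, pipe.source(), 0, handler);
                boolean attached = key.attachment() == handler;

                addInterest(key, SelectionKey.OP_READ);
                boolean added = key.interestOps() == SelectionKey.OP_READ;

                removeInterest(key, SelectionKey.OP_READ);
                boolean removed = key.interestOps() == 0;

                return attached && added && removed;
            } finally {
                pipe.source().close(); // closing the channel also cancels its key
                pipe.sink().close();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

These helpers are only safe when called from the thread that owns the selector, which is the constraint the *Now() and *Later() variants in the class below are meant to address.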

Here is what the SelectorThread class looks like:
public class SelectorThread implements Runnable {

  /**
   * Graceful shutdown.
   */
  public void requestClose() { ... }

  /**
   * Adds a new interest to the list of events
   * where a channel is registered.
   */
  public void addChannelInterestNow(
      SelectableChannel channel,
      int interest) throws IOException { ... }

  /**
   * Like addChannelInterestNow(), but executed
   * asynchronously on the selector thread.
   */
  public void addChannelInterestLater(
      SelectableChannel channel,
      int interest,
      CallbackErrorHandler errorHandler) { ... }

  /**
   * Removes an interest from the list of events
   * where a channel is registered.
   */
  public void removeChannelInterestNow(
      SelectableChannel channel,
      int interest) throws IOException { ... }

  /**
   * Like removeChannelInterestNow(), but executed
   * asynchronously on the selector thread.
   */
  public void removeChannelInterestLater(
      SelectableChannel channel,
      int interest,
      CallbackErrorHandler errorHandler) { ... }

  /**
   * Like registerChannelNow(), but executed
   * asynchronously on the selector thread.
   */
  public void registerChannelLater(
      SelectableChannel channel,
      int selectionKeys,
      SelectorHandler handlerInfo,
      CallbackErrorHandler errorHandler) { ... }

  /**
   * Registers a SelectableChannel with this
   * selector.
   */
  public void registerChannelNow(
      SelectableChannel channel,
      int selectionKeys,
      SelectorHandler handlerInfo) throws IOException { ... }

  /**
   * Executes the given task in the selector
   * thread. Does not wait for its execution.
   */
  public void invokeLater(Runnable run) { ... }

  /**
   * Executes the given task synchronously in the
   * selector thread, waiting for its execution.
   */
  public void invokeAndWait(final Runnable task)
      throws InterruptedException { ... }

  /**
   * Main cycle. This is where event processing
   * and dispatching happens.
   */
  public void run() { ... }
}

The purpose of the invoke*() methods and of the two variants (*Now() and *Later()) for most of the public methods will be explained shortly.

Threading in a Multiplexed World

In theory, with I/O multiplexing it is possible to have a single thread do all of the work in a server application. In practice, that is a very bad idea. When using a single thread, it is not possible to hide the latency of disk I/O (Java NIO does not support non-blocking file operations) or to take advantage of systems with multiple CPUs. As a rough guideline, a server application should have at least 2*n threads, with n being the number of execution units available. Therefore, we had to implement a way of dividing the work among threads.

Getting the threading model right was the hardest part of the development. We considered the following architectures:

  • M dispatchers/no workers
    Several event-dispatch threads doing all the work (dispatching events,
    reading, processing requests, and writing).

  • 1 dispatcher/N workers
    A single event-dispatch thread and multiple worker threads.

  • M dispatchers/N workers
    Multiple event dispatch threads and multiple worker threads.

In all cases, incoming connections are assigned to an event-dispatch thread (a SelectorThread) for the duration of their lives. In the first architecture, I/O events are fully processed by the event-dispatch thread. In the other two, the processing is delegated to worker threads.

The Complex Solution

Our initial approach was based on the M-N architecture. This proved to be a bad option. The main problem was keeping the whole system thread-safe. There were many interaction points between dispatcher and worker threads, all of them requiring careful synchronization. An even worse problem is that the group formed by a selector and its associated selection keys is not safe for multithreaded access. If a selection key is changed in any way by a thread while another thread is calling its selector’s select() method, the typical result is the select() call aborting with an ugly exception. This happened often when worker threads closed channels and indirectly cancelled the corresponding selection keys. If select() is being called at that time, it will find a selection key that was unexpectedly cancelled and abort with a CancelledKeyException. This is perhaps the most important lesson we learned about I/O multiplexing and Java NIO: A selector, its selection keys, and registered channels should never be accessed by more than one thread.
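
One common way to honor that rule is to let other threads hand work to the selector thread instead of touching the selector themselves. The excerpt above stops before the article explains its invoke*() methods, so the following is only a guess at the general shape: a task queue drained by the selector thread, plus Selector.wakeup() to interrupt a blocked select(). The class name and every implementation detail are assumptions; only the method names invokeLater() and requestClose() are borrowed from the SelectorThread listing.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.channels.Selector;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch; not the article's SelectorThread implementation.
final class SingleThreadedSelectorLoop implements Runnable {

    private final Selector selector;
    private final Queue<Runnable> pendingTasks = new ConcurrentLinkedQueue<>();
    private volatile boolean closeRequested;

    SingleThreadedSelectorLoop() throws IOException {
        selector = Selector.open();
    }

    /** Safe to call from any thread: queues the task and wakes the selector thread. */
    void invokeLater(Runnable task) {
        pendingTasks.add(task);
        selector.wakeup(); // makes the current or next select() return promptly
    }

    /** Asks the loop to stop after its current iteration. */
    void requestClose() {
        closeRequested = true;
        selector.wakeup();
    }

    @Override
    public void run() {
        try {
            while (!closeRequested) {
                // Run queued tasks here, on the only thread that touches the selector.
                Runnable task;
                while ((task = pendingTasks.poll()) != null) {
                    task.run();
                }
                selector.select();
                selector.selectedKeys().clear(); // event dispatching omitted in this sketch
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            try {
                selector.close();
            } catch (IOException ignored) {
                // Nothing useful to do if closing fails during shutdown.
            }
        }
    }

    /** Submits one task from the calling thread and reports whether it ran on the selector thread. */
    static boolean demo() {
        try {
            SingleThreadedSelectorLoop loop = new SingleThreadedSelectorLoop();
            Thread selectorThread = new Thread(loop, "selector");
            selectorThread.start();

            CountDownLatch done = new CountDownLatch(1);
            boolean[] ranOnSelectorThread = new boolean[1];
            loop.invokeLater(() -> {
                ranOnSelectorThread[0] = Thread.currentThread() == selectorThread;
                done.countDown();
            });

            boolean finished = done.await(5, TimeUnit.SECONDS);
            loop.requestClose();
            selectorThread.join(5000);
            return finished && ranOnSelectorThread[0];
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }
}
```

With this shape, a worker thread that wants to close a channel or change an interest set wraps the operation in a Runnable and passes it to invokeLater(), so select() never observes a key changed behind its back.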

IBM Asynchronous IO for Java

http://www.alphaworks.ibm.com/tech/aio4j

SEDA: An Architecture for Highly Concurrent Server Applications

http://www.eecs.harvard.edu/~mdw/proj/seda/

Coconut investigates technologies for implementing highly concurrent Internet services.


Coconut is not one single project but a number of projects that together enable the construction of highly concurrent systems.

All projects are based on Java 1.5 and there are currently no plans to port any of the projects to previous versions. The primary reasons for choosing Java 1.5 are the new concurrency constructs (which fit nicely into Coconut), JMX support (because it’s all about control) and new language features.

To get started with the Coconut platform check out the Getting Started guide or as an alternative choose one of the specific Coconut projects listed below.

Coconut Projects

Currently Coconut consists of the following projects

Project: Description
Coconut Core: Common Coconut interfaces that are used across almost all other Coconut projects
Coconut AIO: A package for performing I/O on sockets and files asynchronously

The following projects will be made available for general use within the next few months.

Project: Description
Coconut Cache: An adaptive, high-performance cache library which automatically selects cache management schemes to dynamically adapt to access patterns.
Coconut Staged: An event-driven architecture used for constructing highly concurrent Internet services

Project Keywords

Performance, Robustness to load, Generic service platforms, Self-optimization, Scalability, Event-driven architectures, Asynchronous I/O.


Project despot: Kasper Nielsen

Book table of contents:

Table of Contents


Preface 

1. Introduction: I/O Versus CPU Time; No Longer CPU Bound; Getting to the Good Stuff; I/O Concepts; Summary

2. Buffers: Buffer Basics; Creating Buffers; Duplicating Buffers; Byte Buffers; Summary

3. Channels: Channel Basics; Scatter/Gather; File Channels; Memory-Mapped Files; Socket Channels; Pipes; The Channels Utility Class; Summary

4. Selectors: Selector Basics; Using Selection Keys; Using Selectors; Asynchronous Closability; Selection Scaling; Summary

5. Regular Expressions: Regular Expression Basics; The Java Regular Expression API; Regular Expression Methods of the String Class; Java Regular Expression Syntax; An Object-Oriented File Grep; Summary

6. Character Sets: Character Set Basics; Charsets; The Charset Service Provider Interface; Summary

A. NIO and the JNI

B. Selectable Channels SPI

C. NIO Quick Reference 

The book's website:

http://www.javanio.info/filearea/bookexamples/

Example code:

http://www.javanio.info/filearea/bookexamples/

Sun example code:

http://java.sun.com/j2se/1.4.2/docs/guide/nio/example/

The C10K problem

http://www.kegel.com/c10k.html

The Java Developers Almanac 1.4