Some questions about the ProtocolParser interface

classic Classic list List threaded Threaded
3 messages Options
Reply | Threaded
Open this post in threaded view
|

Some questions about the ProtocolParser interface

Erik Svensson-2
Howdy all!

I'm currently implementing a protocol parser and looking at the java doc for
the interface some questions comes up.

I'm a bit unclear about the contract implicit in the ProtocolParsers
interface. If we start from the start, with startBuffer() (:-)) it's almost
clear. It basically states that the ByteBuffer parameter is a byte source to
be parsed. It's also stated that startBuffer() 'should store the buffer and
its state so that subsquent calls to getNextMessage() will return distinct
messages'. I take this to mean that the contents of the source should be
appended or added to some ProtocolParser local storage (in my case appended
to a local ByteBuffer). However releaseBuffer() states that 'No more parsing
will be done... . Set up the buffer  so that its position is the first byte
that was not part of a full message'. This indicates that the full content
of the source meantioned above should NOT be copied into local storage when
the startBuffer() method is invoked. releaseBuffer() also states
'[Returns] true if the parser has saved some state [in the source buffer]'
which contradicts the documentation from startBuffer(). There is also a
problem with this. Unless framework is ready to grow the source buffer
(which it is not according to some mails on the list) this might be a
problem. Some clients send large messages (I know of one protocol where the
messages can be arbitrarly large. In one case the message was 21 MB.). Since
it's bad to make assumptions about the frameworks behaviour (which might
change at the drop of a hat) the protocol parser has to keep state itself
(and store all incoming bytes). Ie, I think releaseBuffer() should work as
'be done with this buffer'-message from the ParserProtcolFilter.

getNextMessage() is another puzzle. The first sentence 'Get the next
complete message from the buffer, which can then be processed by the next
filter in the protocol chain.' I understand as ' Get the next complete
message from the buffer, parse it to non-byte format and pass it on to be
processed by the next filter in the protocol chain'. However, the next
sentence makes no sense to me: ' Because not all filters will understand
protocol messages, this method should also set the position and limit of the
buffer at the start and end boundaries of the message'. Does this mean that
the ProtocolParser should NOT consume the bytes in the buffer? Isn't the
point of all this to turn a stream of bytes into a buffer? Further, if the
ProtocolParser has stored bytes locally then the buffer is full of data that
makes no sense without the locally stored data and might not even make sense
with that data as the message might not be complete.
In that vein, when you add filters to the protocol chain, can you assume
that they will invoked in the order added?

This is a bit rambling as I try to work while writing it but I would really
like to see a bit of idea beyond the tutorial on how this is supposed to
work.

Also, the javadocs is bit old.In the comments to hasMoreBytesToParse() the
method 'parseBytes()' is mentioned. parseBytes() cannot be found however, so
I'm guessing getNextMessage() is what's meant.
There's a bunch of others.


---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

sdo
Reply | Threaded
Open this post in threaded view
|

Re: Some questions about the ProtocolParser interface

sdo


> I'm a bit unclear about the contract implicit in the ProtocolParsers
> interface. If we start from the start, with startBuffer() (:-)) it's almost
> clear. It basically states that the ByteBuffer parameter is a byte source to
> be parsed. It's also stated that startBuffer() 'should store the buffer and
> its state so that subsquent calls to getNextMessage() will return distinct
> messages'. I take this to mean that the contents of the source should be
> appended or added to some ProtocolParser local storage (in my case appended
> to a local ByteBuffer).

There are different ways to do this based on the needs of your parser.
In the simple case, you could just store a reference to the buffer, its
initial position, limit, and so on so that you can keep track of the
buffer as you go through the parsing.

You can copy out the data, but that is probably inefficient. Also, it
leaves the question of partial messages (below). So maybe we should make
it clearer that by store the buffer, we really mean store a reference to
the buffer.

So a startBuffer() could do:
    bb.flip()
    savedBuffer = bb;
    origLimit = bb.limit();
    ... and perhaps copy out the data, or if the buffer has a backing
store, just get that array
    for future use

>  However releaseBuffer() states that 'No more parsing
> will be done... . Set up the buffer  so that its position is the first byte
> that was not part of a full message'. This indicates that the full content
> of the source meantioned above should NOT be copied into local storage when
> the startBuffer() method is invoked. releaseBuffer() also states
> '[Returns] true if the parser has saved some state [in the source buffer]'
> which contradicts the documentation from startBuffer().

There are two cases here -- if there is no partial message left over in
the buffer, then you can just clear out the buffer. If there is a
partial message, then the position should be set to the beginning of the
message (so that next time the parser is called, it will process that
message), and the limit should be set to the original limit (so that
additional data is read after that limit).

>  There is also a
> problem with this. Unless framework is ready to grow the source buffer
> (which it is not according to some mails on the list) this might be a
> problem. Some clients send large messages (I know of one protocol where the
> messages can be arbitrarly large. In one case the message was 21 MB.). Since
> it's bad to make assumptions about the frameworks behaviour (which might
> change at the drop of a hat) the protocol parser has to keep state itself
> (and store all incoming bytes). Ie, I think releaseBuffer() should work as
> 'be done with this buffer'-message from the ParserProtcolFilter.

Yes, it's true that there is no real buffer management that can grow the
buffer yet, and it's true that's a problem. If you have a protocol that
has 21MB messages, then perhaps the ParserProtocolFilter isn't the best
solution for you -- or at least, it will probably require some work
extending the ParserProtocolFilter class. The ORB team using grizzly has
done that because some of their requirements don't necessarily fit this
parsing framework.

But for releaseBuffer to be completely done with the buffer, then the
parser would have to copy data out of the bytebuffer into temporary
storage somewhere, which is something we'd like to avoid -- data copying
between buffers can get to be quite expensive. So leaving the partial
data in the buffer and setting up that buffer to process additional
messages is a way to avoid that copying.

Another thing you might be able to do is compact the buffer in
releaseBuffer -- if you find that there isn't enough room between the
current position and the capacity for a message, compacting the buffer
will make more room (at the expense of copying data).

> getNextMessage() is another puzzle. The first sentence 'Get the next
> complete message from the buffer, which can then be processed by the next
> filter in the protocol chain.' I understand as ' Get the next complete
> message from the buffer, parse it to non-byte format and pass it on to be
> processed by the next filter in the protocol chain'. However, the next
> sentence makes no sense to me: ' Because not all filters will understand
> protocol messages, this method should also set the position and limit of the
> buffer at the start and end boundaries of the message'. Does this mean that
> the ProtocolParser should NOT consume the bytes in the buffer? Isn't the
> point of all this to turn a stream of bytes into a buffer? Further, if the
> ProtocolParser has stored bytes locally then the buffer is full of data that
> makes no sense without the locally stored data and might not even make sense
> with that data as the message might not be complete.
> In that vein, when you add filters to the protocol chain, can you assume
> that they will invoked in the order added?

You can assume that filters are invoked in order.

The getNextMessage() returns a message object; what that object is is
completely up to you (which is what I meant by non-byte format, but
that's quite a bad choice of words on my part!). That message is passed
down the filter chain.

The subsequent filters in the chain have two options: they can call
context.getAttribute(ProtocolParser.MESSAGE) if they know a protocol
parser is in use; then they can simply process the message. But some
filters are dumb -- like the LogFilter. It doesn't know about messages
or anything, so it will just get the buffer from the context and process
the buffer. That's why you should also set the position/limit of the
buffer to reflect the message boundaries -- so the LogFilter (or
whatever) can deal with just the subset of data that you're dealing
with.

On the other hand, while that was working for me with the LogFilter when
I proposed these changes, it isn't anymore. I'll have to go debug that,
but that at least is the intent of why you might want to set up the
buffer bounds to reflect the message bounds.

> This is a bit rambling as I try to work while writing it but I would really
> like to see a bit of idea beyond the tutorial on how this is supposed to
> work.
>
> Also, the javadocs is bit old.In the comments to hasMoreBytesToParse() the
> method 'parseBytes()' is mentioned. parseBytes() cannot be found however, so
> I'm guessing getNextMessage() is what's meant.
> There's a bunch of others.

I'll work on updated both the javadoc and the tutorial code.

-Scott


---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

Re: Some questions about the ProtocolParser interface

Erik Svensson-2
On 2/7/08 10:47 PM, "Scott Oaks" <[hidden email]> wrote:

>
>
>> I'm a bit unclear about the contract implicit in the ProtocolParsers
>> interface. If we start from the start, with startBuffer() (:-)) it's almost
>> clear. It basically states that the ByteBuffer parameter is a byte source to
>> be parsed. It's also stated that startBuffer() 'should store the buffer and
>> its state so that subsquent calls to getNextMessage() will return distinct
>> messages'. I take this to mean that the contents of the source should be
>> appended or added to some ProtocolParser local storage (in my case appended
>> to a local ByteBuffer).
>
> There are different ways to do this based on the needs of your parser.
> In the simple case, you could just store a reference to the buffer, its
> initial position, limit, and so on so that you can keep track of the
> buffer as you go through the parsing.
>
> You can copy out the data, but that is probably inefficient. Also, it
> leaves the question of partial messages (below). So maybe we should make
> it clearer that by store the buffer, we really mean store a reference to
> the buffer.
>
> So a startBuffer() could do:
>     bb.flip()
>     savedBuffer = bb;
>     origLimit = bb.limit();
>     ... and perhaps copy out the data, or if the buffer has a backing
> store, just get that array
>     for future use

Here's another thing I don't quite like and that's the ProtocolParser
flipping the bytebuffer for reading. IMHO the bytebuffer should be given to
to PP set for reading. The Filter is responsible for gathering bytes and
presenting them to the PP in a way that the PP can start reading.
In the same vein I feel that the PP should do a compact() before returning
the buffer (if it consumed anything, otherwise do a rewind()).
Might be nitpicking here but (once again) IMHO it's important to keep
responsibilities well definied. It only leads to confusion and chaos
especially for the maintainers coming in afterwards.
The ProtocolFilter flips the ByteBuffer before calling the PP and flips it
back afterwards.

>>  However releaseBuffer() states that 'No more parsing
>> will be done... . Set up the buffer  so that its position is the first byte
>> that was not part of a full message'. This indicates that the full content
>> of the source meantioned above should NOT be copied into local storage when
>> the startBuffer() method is invoked. releaseBuffer() also states
>> '[Returns] true if the parser has saved some state [in the source buffer]'
>> which contradicts the documentation from startBuffer().
>
> There are two cases here -- if there is no partial message left over in
> the buffer, then you can just clear out the buffer. If there is a
> partial message, then the position should be set to the beginning of the
> message (so that next time the parser is called, it will process that
> message), and the limit should be set to the original limit (so that
> additional data is read after that limit).

But why not just do a compact()?
Right now we're now halfway into a producer/consumer pattern. I'd prefer
that the PP does it's parsing, consuming the bytes and returning the
bytebuffer unflipped but compacted() (in the most common case).
If somebody else needs the raw bytes down the line then some other mechanism
should be implemented to furnish them with that.
 

>>  There is also a
>> problem with this. Unless framework is ready to grow the source buffer
>> (which it is not according to some mails on the list) this might be a
>> problem. Some clients send large messages (I know of one protocol where the
>> messages can be arbitrarly large. In one case the message was 21 MB.). Since
>> it's bad to make assumptions about the frameworks behaviour (which might
>> change at the drop of a hat) the protocol parser has to keep state itself
>> (and store all incoming bytes). Ie, I think releaseBuffer() should work as
>> 'be done with this buffer'-message from the ParserProtcolFilter.
>
> Yes, it's true that there is no real buffer management that can grow the
> buffer yet, and it's true that's a problem. If you have a protocol that
> has 21MB messages, then perhaps the ParserProtocolFilter isn't the best
> solution for you -- or at least, it will probably require some work
> extending the ParserProtocolFilter class. The ORB team using grizzly has
> done that because some of their requirements don't necessarily fit this
> parsing framework.
>
> But for releaseBuffer to be completely done with the buffer, then the
> parser would have to copy data out of the bytebuffer into temporary
> storage somewhere, which is something we'd like to avoid -- data copying
> between buffers can get to be quite expensive. So leaving the partial
> data in the buffer and setting up that buffer to process additional
> messages is a way to avoid that copying.

While creating some sort of local storage might have a cost associated with
it, the act of actually copying more data into it can't really cost all that
much. System.arraycopy() is one of the most optimized routines in the vm.
But I do agree that gathering enough bytes for a complete message should not
really be the providence of the procotol parser.

> Another thing you might be able to do is compact the buffer in
> releaseBuffer -- if you find that there isn't enough room between the
> current position and the capacity for a message, compacting the buffer
> will make more room (at the expense of copying data).

I would say that compacting the buffer after reading a message should be
mandatory.
 

>> getNextMessage() is another puzzle. The first sentence 'Get the next
>> complete message from the buffer, which can then be processed by the next
>> filter in the protocol chain.' I understand as ' Get the next complete
>> message from the buffer, parse it to non-byte format and pass it on to be
>> processed by the next filter in the protocol chain'. However, the next
>> sentence makes no sense to me: ' Because not all filters will understand
>> protocol messages, this method should also set the position and limit of the
>> buffer at the start and end boundaries of the message'. Does this mean that
>> the ProtocolParser should NOT consume the bytes in the buffer? Isn't the
>> point of all this to turn a stream of bytes into a buffer? Further, if the
>> ProtocolParser has stored bytes locally then the buffer is full of data that
>> makes no sense without the locally stored data and might not even make sense
>> with that data as the message might not be complete.
>> In that vein, when you add filters to the protocol chain, can you assume
>> that they will invoked in the order added?
>
> The subsequent filters in the chain have two options: they can call
> context.getAttribute(ProtocolParser.MESSAGE) if they know a protocol
> parser is in use; then they can simply process the message. But some
> filters are dumb -- like the LogFilter. It doesn't know about messages
> or anything, so it will just get the buffer from the context and process
> the buffer. That's why you should also set the position/limit of the
> buffer to reflect the message boundaries -- so the LogFilter (or
> whatever) can deal with just the subset of data that you're dealing
> with.

To be frank here: I don't like adjusting position/limit to align with
message boundaries at all. In my mind that's changing the semantics of the
ByteBuffer class. I would rather wrap the byte buffer class in something
like a RawMessage class that contains the complete (raw) message as a byte
buffer gotten through the ByteBuffer.slice() operation and using fields in
that class as indicators for message start/end.

Once again rambling trying to work at the same time. :-)

cheers
Erik Svensson, SIX AB


---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]